WiMi Hologram Cloud Built an Intelligent Virtual Interaction System Based on Gesture Recognition
WiMi Hologram Cloud, a leading global Hologram Augmented Reality (“AR”) Technology provider, announced the development of a virtual interaction system based on gesture recognition. The system recognizes the user’s body and gesture movements through a recognition device and uses immersive devices to feed the processing results back to the user, which significantly enriches the interaction semantics and makes the interaction feel more natural. Its stated advantages are realistic visualization, a friendly interaction mode, and good compatibility.
First, the system acquires real-time hand data through the acquisition device and determines the continuous points of each hand gesture action. It then performs coarse-grained recognition of the gesture actions using a weighted dynamic time warping (DTW) algorithm. After recognizing the basic gesture actions, the system extracts and matches feature vectors to label the main local image regions. Next, the system performs fine-grained recognition of the gesture actions using deep learning methods. The coarse-grained and fine-grained results are then combined to produce a detailed gesture translation. Finally, the recognition results drive the interaction with the virtual environment to give the user an immersive experience.
Gesture recognition based on the weighted dynamic time warping (DTW) algorithm is divided into three main steps: building a gesture sample library, training the gesture sample model, and measuring the similarity between the sample to be recognized and the samples in the library. During recognition, a sequence of gesture frames is first captured as the test sample and then compared with the samples in the library to find the best match.
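The matching step can be sketched with dynamic time warping (DTW), the standard name for this kind of time-alignment algorithm. The following is a minimal illustration, not WiMi's implementation: the announcement does not say what is weighted, so the per-feature `weights`, the function names, and the structure of the sample library are all assumptions made for the example.

```python
import math


def weighted_dtw(seq_a, seq_b, weights):
    """Weighted DTW distance between two sequences of feature frames.

    Each frame is a list of numbers (e.g. hand coordinates). `weights`
    holds one assumed importance weight per feature.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of seq_a[:i] with seq_b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Weighted Euclidean distance between the two frames
            d = math.sqrt(sum(
                w * (x - y) ** 2
                for w, x, y in zip(weights, seq_a[i - 1], seq_b[j - 1])
            ))
            # Extend the cheapest of the three allowed warping moves
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def best_match(test_sample, library, weights):
    """Return (label, distance) of the closest template in the library.

    `library` maps a gesture label to a list of template sequences.
    """
    best_label, best_dist = None, float("inf")
    for label, templates in library.items():
        for template in templates:
            dist = weighted_dtw(test_sample, template, weights)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label, best_dist
```

Because DTW aligns sequences of different lengths, a test sample captured at a different speed than the library template can still match it.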
In the fine-grained gesture recognition process, the system uses convolutional neural networks to extract image features automatically and to select the appropriate gesture actions from candidate regions. The algorithm includes a feature extraction network, a region proposal network, region-of-interest (RoI) pooling, and fully connected layers. The feature extraction network consists of several convolutional layers alternating with pooling layers. Each unit in a convolutional layer is connected to a local region of the previous layer through a convolution kernel. After the activation function is applied, the layer yields a feature map of the fine-grained gesture. Each feature map in a pooling layer corresponds to a feature map in the previous layer. The pooling layer downsamples the gesture features to obtain features with spatial invariance. In the fully connected layer, the system integrates the gesture features that best discriminate between fine-grained classes. In the output layer, a classifier processes these features and the system returns the classification result.
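To make the convolution and activation steps concrete, here is a small sketch of one convolutional layer producing a feature map. It is an illustration only: the function name `conv2d_relu`, the choice of ReLU as the activation function, and the hand-written kernel are assumptions, since the announcement does not specify them and a real network learns its kernels during training.

```python
def conv2d_relu(image, kernel, bias=0.0):
    """Slide `kernel` over `image` (no padding, stride 1) and apply ReLU.

    `image` and `kernel` are 2D lists of numbers. Each output value
    depends only on the local patch of the input under the kernel.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = bias
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            # ReLU activation (assumed): negative responses become 0
            row.append(max(0.0, total))
        feature_map.append(row)
    return feature_map
```

Stacking several such layers, each followed by pooling, gives the alternating structure described above.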
In the feature extraction network, the system focuses only on the local details of the hand, such as finger positions, movements, and motion trajectories. After the pooling operation, the feature map is reduced to half the size of the original image while the essential information of the image is preserved, which effectively reduces the computation time. The region proposal network takes the final output of the feature extraction network as input and slides a fixed-size window over the feature map. Each slide yields a low-dimensional vector, and in the last layer of the region proposal network each point is mapped back to its location in the original image.
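The pooling and sliding-window steps can be sketched as follows. This is a simplified illustration under stated assumptions: a 2x2 max pooling with stride 2 (which halves each side), a 3x3 window, and a plain flattening of each window in place of the learned projection a real network would use to produce its low-dimensional vector. The function names and the `stride_to_image` parameter are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2: halves the height and the width."""
    h, w = len(feature_map) // 2, len(feature_map[0]) // 2
    return [
        [
            max(feature_map[2 * r][2 * c], feature_map[2 * r][2 * c + 1],
                feature_map[2 * r + 1][2 * c], feature_map[2 * r + 1][2 * c + 1])
            for c in range(w)
        ]
        for r in range(h)
    ]


def slide_window(feature_map, window=3, stride_to_image=2):
    """Slide a fixed-size window over the feature map.

    Returns a list of (vector, (x, y)) pairs, where `vector` is the
    flattened window content and (x, y) is the window centre mapped back
    to original-image coordinates. `stride_to_image` is the assumed total
    downsampling factor between the image and the feature map.
    """
    h, w = len(feature_map), len(feature_map[0])
    results = []
    for r in range(h - window + 1):
        for c in range(w - window + 1):
            vector = [feature_map[r + i][c + j]
                      for i in range(window) for j in range(window)]
            center_r = r + window // 2
            center_c = c + window // 2
            image_xy = (center_c * stride_to_image,
                        center_r * stride_to_image)
            results.append((vector, image_xy))
    return results
```

Mapping each window centre back to image coordinates is what lets candidate regions found on the small feature map be expressed as regions of the original frame.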
The system takes the extracted feature map and the candidate regions as input. Based on the input image, the system selects the region of interest, i.e., the location on the feature map that corresponds to the finger region. The system performs a max pooling operation on this region and feeds the pooled data to the fully connected layer. The fully connected layer then outputs the class with the highest probability, while bounding-box regression yields a position offset that gives a more accurate target detection frame. According to WiMi, the system can significantly improve the recognition rate of gesture actions and enhance the realism and immersion of human-computer interaction.
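The three operations in this step (max pooling over a region of interest, picking the most probable class, and refining the box with a regressed offset) can be sketched as below. This is an illustrative sketch, not the system's code: the function names are hypothetical, the class scores and offsets would come from trained fully connected layers rather than being passed in by hand, and the offset format follows the common centre/size parametrization used in region-based detectors, which the announcement does not confirm.

```python
import math


def roi_max_pool(feature_map, roi, out_size=2):
    """Max-pool a region of interest into an out_size x out_size grid.

    `roi` is (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    Assumes the region is at least out_size cells in each direction.
    """
    r0, c0, r1, c1 = roi
    pooled = []
    for i in range(out_size):
        rs = r0 + (i * (r1 - r0)) // out_size
        re = max(r0 + ((i + 1) * (r1 - r0)) // out_size, rs + 1)
        row = []
        for j in range(out_size):
            cs = c0 + (j * (c1 - c0)) // out_size
            ce = max(c0 + ((j + 1) * (c1 - c0)) // out_size, cs + 1)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_box_offsets(box, offsets):
    """Refine a box (x0, y0, x1, y1) with regressed (dx, dy, dw, dh)."""
    x0, y0, x1, y1 = box
    dx, dy, dw, dh = offsets
    w, h = x1 - x0, y1 - y0
    cx, cy = x0 + 0.5 * w, y0 + 0.5 * h
    # Shift the centre by a fraction of the size, scale size exponentially
    new_cx, new_cy = cx + dx * w, cy + dy * h
    new_w, new_h = w * math.exp(dw), h * math.exp(dh)
    return (new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
            new_cx + 0.5 * new_w, new_cy + 0.5 * new_h)
```

Pooling every region to the same fixed grid is what allows candidate regions of different sizes to share one fully connected classifier.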
Immersive virtual interaction based on gesture recognition is a hot topic in virtual reality, computer vision, and computer graphics. With the rapid development of technology and the gradual improvement of human-computer interaction, people’s expectations for visualization have also changed. The mouse, keyboard, and monitor can no longer give the user a three-dimensional experience, and a two-dimensional interaction system alone makes certain interactive information difficult to express accurately. WiMi’s system is therefore expected to have significant application value.