PhD Thesis Defense of Mr. Shome Subhra Das
November 17, 2021 @ 4:45 PM - 6:00 PM IST
Abstract: Currently, we interact with computers, robots, drones, and virtual reality interfaces using pointing devices such as a mouse, touchpad, joystick, virtual reality wand, or drone controller. These devices suffer from one or more of the following limitations: they are cumbersome, non-immersive, or immobile, or they have a steep learning curve. The goal of this work is to explore ways of replacing existing pointing devices with interfaces based on pointing gestures.
This thesis addresses two problems: estimating the direction indicated by a pointing gesture (pointing direction estimation, PDE) and detecting pointing gestures. The proposed techniques use a single depth sensor and only the hand region. To our knowledge, this is the first attempt at creating accurate, depth- and orientation-tolerant methods for estimating the pointing direction using only depth images of the hand region. The proposed methods achieve accuracies comparable to or better than those of existing methods while avoiding their limitations.
Significant contributions of the thesis:
(i) Proposing a real-time technique for estimating the pointing direction using a nine-axis inertial motion unit (IMU) and an RGB-D sensor. It is the first method to compute the pointing direction (PD) by finding the axis vector of the index finger, and the first to fuse information from the IMU and the depth sensor to obtain the PD. It is also the first method to obtain the ground-truth pointing direction of pointing gestures from depth data of the index finger region.
(ii) Creation of a large dataset (more than 100k samples) with accurate ground truth for PDE from depth images. Each sample consists of the segmented depth image of a hand, the fingertip location (2D and 3D), the pointing vector (as a unit vector and as yaw and pitch values), and the mean depth of the hand. This is the first public dataset for depth image-based PDE that combines accurate ground truth with a large number of samples.
(iii) Proposing a new method based on a 3D convolutional neural network to estimate the pointing direction. This is the first deep learning-based PDE method that uses only depth images of the hand region, without RGB data. It is tolerant to variations in the orientation and depth of the hand with respect to the camera and is suitable for real-time applications.
(iv) Proposing another technique for estimating the pointing direction by globally registering the test point cloud with a model of the pointing hand captured using a KinectFusion-based method. It is tolerant to variations in the orientation and depth of the hand with respect to the RGB-D sensor. It avoids the limitations of the previously proposed methods, since it requires neither an attached device such as an IMU nor a training dataset. It achieves lower net angular error than most techniques in the literature that use only the hand region.
(v) Creation of a large dataset of positive and negative samples for detecting pointing gestures from depth images of the hand region, together with a deep learning-based technique for distinguishing pointing gestures from other hand gestures. This technique achieves higher accuracy than the only other existing technique for detecting pointing gestures from depth images of the hand, by Cordo et al.
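Contribution (i) obtains the pointing direction from the axis vector of the index finger. The abstract does not say how that axis is computed, so the following is only a minimal sketch under an assumed technique: fitting the dominant direction (first principal component) of an index-finger point cloud and orienting it toward the fingertip. The function names `principal_axis` and `finger_pointing_direction` are hypothetical, finger segmentation is assumed to have been done already, and the IMU fusion step is not modeled at all.

```python
import math

# Sketch under an ASSUMED technique (principal-axis line fit); this is
# not claimed to be the method used in the thesis.

def principal_axis(points, iterations=100):
    """Return (centroid, unit vector) for the dominant direction of a
    sequence of 3D points, i.e. its first principal component, found by
    power iteration on the 3x3 covariance matrix."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    cov = [[sum(q[r] * q[c] for q in centered) / n for c in range(3)]
           for r in range(3)]
    # Start from the point farthest from the centroid; for an elongated
    # cloud such as a finger this already lies near the long axis.
    v = max(centered, key=lambda q: q[0] ** 2 + q[1] ** 2 + q[2] ** 2)
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("all points coincide; axis is undefined")
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(cov[r][c] * v[c] for c in range(3)) for r in range(3)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # degenerate covariance; keep the current estimate
        v = [x / norm for x in w]
    return centroid, v


def finger_pointing_direction(finger_points, fingertip):
    """Unit vector along the (hypothetical) segmented finger's axis,
    oriented so that it points toward the given fingertip position."""
    centroid, axis = principal_axis(finger_points)
    to_tip = [fingertip[i] - centroid[i] for i in range(3)]
    # A principal axis has no inherent sign; flip it toward the tip.
    if sum(a * t for a, t in zip(axis, to_tip)) < 0:
        axis = [-a for a in axis]
    return axis
```

For a point cloud lying roughly along the line x = -y with the fingertip at its positive-x end, `finger_pointing_direction` returns a unit vector close to (0.707, -0.707, 0). Power iteration is used only to keep the sketch free of dependencies; a library eigen-solver would normally be preferred.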
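Contribution (ii) stores each pointing vector both as a unit vector and as yaw and pitch values, and contribution (iv) is assessed by net angular error. The sketch below shows one way these quantities relate. It assumes a camera frame with x to the right, y up, and z along the optical axis, and it assumes that angular error means the angle between the predicted and ground-truth vectors. The abstract states neither convention, and the helper names are hypothetical.

```python
import math

# ASSUMED convention (not stated in the abstract): x right, y up,
# z along the optical axis. Yaw is measured from +z toward +x;
# pitch is the elevation above the x-z plane. Angles are in degrees.

def vector_to_yaw_pitch(v):
    """Convert a (not necessarily unit) pointing vector to (yaw, pitch)."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    x, y, z = x / norm, y / norm, z / norm
    yaw = math.degrees(math.atan2(x, z))
    # Clamp to guard against rounding slightly outside [-1, 1].
    pitch = math.degrees(math.asin(max(-1.0, min(1.0, y))))
    return yaw, pitch


def yaw_pitch_to_vector(yaw_deg, pitch_deg):
    """Inverse of vector_to_yaw_pitch: return the unit pointing vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.sin(yaw),
            math.sin(pitch),
            math.cos(pitch) * math.cos(yaw))


def angular_error_deg(pred, truth):
    """Angle in degrees between predicted and ground-truth vectors."""
    dot = sum(p * t for p, t in zip(pred, truth))
    n_pred = math.sqrt(sum(p * p for p in pred))
    n_truth = math.sqrt(sum(t * t for t in truth))
    cos_angle = max(-1.0, min(1.0, dot / (n_pred * n_truth)))
    return math.degrees(math.acos(cos_angle))
```

For example, `angular_error_deg((0, 0, 1), (1, 0, 1))` gives approximately 45 degrees. Converting a vector to yaw and pitch and back recovers the normalized vector for any pitch strictly between -90 and 90 degrees; at exactly ±90 degrees the yaw is undefined.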