BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//EE - ECPv5.10.0//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:EE
X-ORIGINAL-URL:https://ee.iisc.ac.in
X-WR-CALDESC:Events for EE
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
BEGIN:STANDARD
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
DTSTART:20210101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20211117T164500
DTEND;TZID=Asia/Kolkata:20211117T180000
DTSTAMP:20260616T092156
CREATED:20211108T230750Z
LAST-MODIFIED:20211116T032837Z
UID:238966-1637167500-1637172000@ee.iisc.ac.in
SUMMARY:PhD Thesis Defense of Mr. Shome Subhra Das
DESCRIPTION:Date and Time: November 17\, 2021 (Wednesday)\, 11.15 AM\nExternal Examiner: Prof. Gaurav Harit\, IIT Jodhpur\n\nTitle: Techniques for estimating the direction of pointing gestures using depth images in the presence of orientation and distance variations from the depth sensor\nAbstract: Currently\, we interact with computers\, robots\, drones\, and virtual reality interfaces using pointing devices such as a mouse\, touchpad\, joystick\, virtual reality wand\, or drone controller. These devices have one or more of the following limitations: they are cumbersome\, non-immersive\, or immobile\, or they have a steep learning curve. The aim of this work is to explore ways to replace existing pointing devices with pointing gesture-based interfaces.\nThis thesis addresses two problems: estimating the direction indicated by a pointing gesture (pointing direction estimation\, PDE) and detecting pointing gestures. The proposed techniques use a single depth sensor and only the hand region. To our knowledge\, this is the first attempt at creating depth- and orientation-tolerant\, accurate methods for estimating the pointing direction using only depth images of the hand region. The proposed methods achieve accuracies comparable to or better than those of existing methods while avoiding their limitations.\nSignificant contributions of the thesis:\n(i) Proposing a real-time technique for estimating the pointing direction using a nine-axis inertial measurement unit (IMU) and an RGB-D sensor. It is the first method to compute the pointing direction (PD) by finding the axis vector of the index finger. It is also the first method to fuse information from the IMU and the depth sensor to obtain the PD. Further\, this is the first method to obtain the ground-truth pointing direction of pointing gestures using depth data of the index finger region.\n(ii) Creation of a large (100k+ samples) dataset with accurate ground truth for PDE from depth images. 
Each sample consists of the segmented depth image of a hand\, the fingertip location (2D + 3D)\, the pointing vector (as a unit vector and in terms of the yaw and pitch values)\, and the mean depth of the hand. This is the first public dataset for depth-image-based PDE that has accurate ground truth and a large number of samples.\n(iii) Proposing a new 3D convolutional neural network-based method to estimate the pointing direction. This is the first deep learning-based method for PDE that uses only depth images of the hand region\, without any RGB data. It is tolerant to variation in the orientation and depth of the hand with respect to the camera and is suitable for real-time applications.\n(iv) Proposing another technique for estimating the pointing direction using global registration of the test data point cloud with a pointing hand model captured using a Kinect fusion-based method. It is tolerant to variation in the orientation and depth of the hand with respect to the RGB-D sensor. It avoids the limitations of the previously proposed methods since it requires neither the attachment of a device such as an IMU nor a dataset for training. It achieves a lower net angular error than most techniques in the literature while using only the hand region.\n(v) Creation of a large dataset of positive and negative samples for detection of pointing gestures from depth images of the hand region. A deep learning-based technique is also proposed to distinguish pointing gestures from other hand gestures. It achieves higher accuracy than the only other existing technique\, by Cordo et al.\, for detection of pointing gestures from depth images of the hand.
URL:https://ee.iisc.ac.in/event/phd-thesis-defense-of-mr-shome-subhra-das/
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20211117T193000
DTEND;TZID=Asia/Kolkata:20211117T203000
DTSTAMP:20260616T092156
CREATED:20211108T224619Z
LAST-MODIFIED:20211108T225102Z
UID:238961-1637177400-1637181000@ee.iisc.ac.in
SUMMARY:M.Tech.(Research) Thesis Defense of Mr. Vinayak Killedar
DESCRIPTION:Title of the thesis: Solving Inverse Problems Using a Deep Generative Prior\nSupervisor: Prof. Chandra Sekhar Seelamantula (EE)\nExaminer: Prof. Sumohana Channappayya (EE\, IIT Hyderabad)\nAbstract: The objective in an inverse problem is to recover a signal from its measurements\, given knowledge of the measurement operator. In this thesis\, we address the problems of compressive sensing (CS) and compressive phase retrieval (CPR) using a generative prior model with sparse latent sampling. These problems are ill-posed and admit infinitely many solutions. Structural assumptions such as smoothness\, sparsity\, and non-negativity are imposed on the solution to make it unique.\nThe standard CS and CPR formulations impose a sparsity prior on the signal. Recently\, generative modeling approaches have removed the sparsity constraint and shown superior performance over traditional CS and CPR techniques in recovering signals from fewer measurements. A generative model uses a pre-trained network\, either the generator of a Generative Adversarial Network (GAN) or the decoder of a Variational Autoencoder (VAE)\, to model the distribution of the signal and impose a Set-Restricted Eigenvalue Condition (S-REC) on the measurement operator. The S-REC property places a condition on the l-2 norm of the difference in the signal and measurement domains for signals coming from the set S. Solving CS and CPR using generative models has some limitations. The reconstructed signal is constrained to lie in the range space of the generator. The reconstruction process is slow because the latent space is optimized through gradient descent (GD) and requires several restarts. It has been argued that the distribution of natural images is not confined to a single manifold\, but to a union of submanifolds. To take advantage of this property\, we propose a sparsity-driven latent space sampling (SDLSS) framework\, where sparsity is imposed in the latent space. 
The effect is to divide the latent space into subspaces such that the generator models map each subspace into a submanifold. We propose a proximal meta-learning (PML) algorithm to optimize the parameters of the generative model along with the latent code. The PML algorithm reduces the number of gradient steps required during testing and imposes sparsity in the latent space. We derive the sample complexity bounds within the SDLSS framework for the linear CS model\, which is a generalization of the result available in the literature. The results demonstrate that\, for a higher degree of compression\, the SDLSS method is more efficient than the state-of-the-art deep compressive sensing (DCS) method. We consider both linear and learned nonlinear sensing mechanisms\, where the nonlinear operator is a learned fully connected neural network or a convolutional neural network\, and show that the learned nonlinear version is superior to the linear one.\nAs an application of the nonlinear sensing operator\, we consider compressive phase retrieval\, wherein the problem is to reconstruct a signal from the magnitude of its compressed linear measurements. We adapt the S-REC imposed on the measurement operator and propose a novel cost function. The SDLSS framework along with the PML algorithm is applied to optimize the sparse latent space such that the adapted S-REC loss and data-fitting error are minimized. The reconstruction process is fast and requires few gradient steps during testing compared with the state-of-the-art deep phase retrieval technique.\nExperiments are conducted on standard datasets such as MNIST\, Fashion-MNIST\, CIFAR-10\, and CelebA to validate the efficiency of the SDLSS framework for CS and CPR. The results show that\, for a given dataset\, there exists an effective input latent dimension for the generative model. 
Performance is quantified using three objective metrics: peak signal-to-noise ratio (PSNR)\, structural similarity index measure (SSIM)\, and reconstruction error (RE) per pixel\, each averaged across the test dataset.\nAbout the speaker: Vinayak Killedar obtained a B.E. (ECE) degree from M. S. Ramaiah Institute of Technology (MSRIT)\, Bangalore in 2008. During 2008-2010\, he worked for Robert Bosch Engineering and Business Solution (RBEI)\, Coimbatore. He joined the M.Tech.(Signal Processing) program at the National Institute of Technology (NIT) Calicut and graduated in 2013. He worked for Continental AG during 2014-2018 in the areas of autonomous driving and radar signal processing. Subsequently\, he joined the Spectrum Lab\, Department of Electrical Engineering\, Indian Institute of Science for the M.Tech.(Research) program and specialized in Compressed Sensing and Machine Learning. He is presently a Senior Technical Specialist at Ansys\, Kempten\, Germany.
URL:https://ee.iisc.ac.in/event/m-tech-research-thesis-defense-of-mr-vinayak-killedar/
LOCATION:Online\, India
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20211124T163000
DTEND;TZID=Asia/Kolkata:20211124T173000
DTSTAMP:20260616T092156
CREATED:20211122T233509Z
LAST-MODIFIED:20211122T233826Z
UID:239276-1637771400-1637775000@ee.iisc.ac.in
SUMMARY:PhD Thesis Defense of Mr. Jitendra Kumar Dhiman
DESCRIPTION:Date and Time: November 24\, 2021\, 11 AM\nTitle of the thesis: Spectrotemporal Processing of Speech Signals Using the Riesz Transform\nExaminer: Prof. S. R. Mahadeva Prasanna\, IIT Dharwad and IIT Guwahati\nAbstract: Speech signals have time-varying spectra. Spectrograms have served as a useful tool for the visualization and analysis of speech signals in the joint time-frequency plane. In this thesis\, we consider 2-D analysis of speech spectrograms. We consider a spectrotemporal patch and model it as a 2-D amplitude-modulated and frequency-modulated (AM-FM) sinusoid. Demodulation of the spectrogram yields the 2-D AM and FM components\, which correspond to the slowly varying vocal-tract envelope and the excitation\, respectively. For solving the demodulation problem\, we rely on the complex Riesz transform\, which is a 2-D extension of the 1-D Hilbert transform. The demodulation viewpoint brings forth many interesting properties of the speech signal. The spectrotemporal carrier helps us identify the regions that are coherent and those that are not. Based on this idea\, we introduce the coherencegram corresponding to a given spectrogram. The temporal evolution of the pitch harmonics can also be characterized by the orientation at each time-frequency coordinate\, resulting in the orientationgram. We show that these features collectively enable solutions for the important problems of voiced/unvoiced segmentation\, aperiodicity estimation\, periodic/aperiodic signal separation\, and pitch tracking. We compare the performance of the proposed methods with benchmark methods. The spectrotemporal amplitude characterizes the time-varying magnitude response of the vocal-tract filter. We show how the formants and their bandwidths manifest in the spectrotemporal amplitude. It turns out that the formant bandwidths are mildly overestimated\, which is perceptible when one performs speech synthesis using the estimated parameters. 
We propose a method for correcting the formant bandwidths\, which also restores the speech quality. Finally\, we use the curated spectrotemporal amplitude\, pitch\, aperiodicity\, and voiced/unvoiced decisions for the task of speech reconstruction in a spectral synthesis model and a neural vocoder\, namely\, WaveNet. We show that conditioning WaveNet on the spectrotemporal features results in high-quality speech synthesis. The quality of the synthesized speech is assessed using both objective and subjective measures.\nWe rely on the Perceptual Evaluation of Speech Quality (PESQ) measure and the standard Mean Opinion Score (MOS) test for objective and subjective evaluation\, respectively. The performance of the proposed parameters is evaluated in a vocoder framework that uses the spectral synthesis model for speech reconstruction. The objective evaluation shows that the performance of the Riesz transform-based speech parameters is on par with the baseline systems. Using the spectral synthesis model\, we report an average PESQ score in the range from 2.30 to 3.45 over a total of 200 speech waveforms taken from the CMU-ARCTIC database comprising both male and female speakers. In comparison\, WaveNet-based speech reconstruction gave an average PESQ score of 3.65.\nSubjective evaluation was carried out through listening tests conducted in an acoustic test chamber on volunteers in the age group of 21 to 30. The average MOS was 4.30 when the Riesz transform-based features were used in WaveNet for speech reconstruction\, which was also comparable with the baseline systems\, STRAIGHT and WORLD. Both objective and subjective evaluations also showed that the quality of reconstructed speech waveforms was better with the proposed features in a WaveNet vocoder than in the spectral synthesis model.\nAn audio demonstration is available at the GitHub link: http://jitendradhiman.github.io/Demo\nBiography of Jitendra Kumar Dhiman: Jitendra Kumar Dhiman received his B.Tech. 
degree in Electronics and Telecommunication Engineering from the Institution of Electronics and Telecommunication Engineering\, Delhi\, India\, in 2010\, and his M.Tech. degree in Signal Processing from the Indian Institute of Technology Hyderabad in 2013. Subsequently\, he joined Spectrum Lab (EE Department\, IISc) as a project assistant and worked on prosody modification of speech signals\, and then continued as a PhD student working on spectrotemporal models for speech processing. His research interests include speech and audio signal processing and machine learning. He will soon be joining Samsung Research and Innovation\, Bangalore (SRIB) as Chief Engineer.
URL:https://ee.iisc.ac.in/event/phd-thesis-defense-of-mr-jitendra-kumar-dhiman/
END:VEVENT
END:VCALENDAR