Faculty Candidate Talk on Blind speaker separation from noisy speech mixtures
June 4 @ 3:00 PM - 5:00 PM IST
Faculty Candidate Talk
Title: Blind speaker separation from noisy speech mixtures
Date and Time: June 4, 2024, 3:00 PM IST
Location: MMCR, EE dept (Online link)
Abstract:
Blind separation of speech mixtures into individual speaker signals is crucial for several speech processing applications, including teleconferencing. These applications require blind speech separation (BSS), i.e., separation without any additional information about the speakers in the mixture or their count, for both transcription and communication. The task becomes particularly difficult when the number of speakers in the mixture is unknown and the recordings are made with a single microphone. In recent work, we developed a deep-learning-based system for BSS from noisy single-channel mixtures with an unknown number of speakers. The system employs a transformer-based neural network architecture with an attractor generation scheme, which allows it to count the speakers and separate their signals simultaneously. In my presentation, I will share results from experimental validation on simulated speech mixtures. Our findings show that the system can achieve an improvement of 18 dB or more in signal-to-distortion ratio and 99% accuracy in speaker counting for mixtures with up to three speakers. I will also discuss the insights gained into the model’s internal mechanics by examining the attention patterns computed in the transformers. We observed that these findings also hold across different transformer configurations used in other tasks, such as ambisonic-to-ambisonic and multi-channel speech separation.
Bio: Srikanth Raj Chetupalli received the Master of Engineering and Doctor of Philosophy degrees from the Division of Electrical Sciences, Indian Institute of Science (IISc), Bengaluru, India, in 2011 and 2020, respectively. He is currently a Postdoctoral Researcher with the International Audio Laboratories Erlangen (a joint institution of the Friedrich-Alexander-Universität Erlangen-Nürnberg and the Fraunhofer Institute for Integrated Circuits IIS), Erlangen, Germany. His research interests include speech processing, multi-microphone processing, and spatial audio processing, in particular source extraction, speech dereverberation, acoustic parameter estimation, and speaker diarization. He was the recipient of the Tata Consultancy Services Research Scholarship from 2015 to 2019.