Colloquium

March 20 @ 11:30 AM - 1:00 PM IST

Date & Time: 20th March 2025, at 11:30 am

Speaker: Mr. Varun Krishna P S

Venue: EE, MMCR [1st Floor, C241]

 

Title: Self-Supervised Learning Approaches for Content Factor Extraction from Raw Speech

 

Abstract:

The rapid expansion of digital data has led to growing interest in self-supervised learning (SSL) techniques, particularly for speech processing tasks where labeled data is often scarce. SSL enables models to learn meaningful representations directly from raw data by capturing inherent structures and patterns without requiring explicit supervision. To be effective, speech representations must not only capture content-related information (such as phonetic, lexical, and semantic features) but also remain robust against speaker variations, co-articulation effects, channel distortions, and background noise. This talk describes our efforts to develop self-supervised learning techniques for extracting semantic content from raw speech while remaining invariant to non-semantic speech factors.

 

In the first part of the talk, we propose the Hidden Unit Clustering (HUC) framework, which integrates contrastive learning with deep clustering techniques to enhance representation quality. A speaker normalization strategy is incorporated to mitigate speaker variability, ensuring that the extracted representations focus primarily on content-related information. Additionally, a heuristic data sampling method is introduced to generate pseudo-targets for deep clustering, further refining the learned representations. The framework is evaluated across multiple SSL models, demonstrating significant improvements on phonetic and semantic benchmarks, as well as superior performance in low-resource ASR settings.

 

The second part of the talk focuses on learning context-invariant representations that address the challenges posed by co-articulation effects and by variations in speaker and channel characteristics. To this end, a pseudo-con loss framework is proposed, which leverages pseudo-targets to guide contrastive learning and enhance robustness. This approach serves as an effective auxiliary module that can be integrated into SSL models based on deep clustering. Extensive evaluations demonstrate state-of-the-art performance across multiple ZeroSpeech 2021 sub-tasks, as well as significant improvements in phoneme recognition and ASR performance.
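
As a rough illustration of how pseudo-targets can guide a contrastive objective, the sketch below computes a supervised-contrastive-style loss in which frames sharing a pseudo-target (for example, a cluster index) are treated as positives. This is not the pseudo-con loss from the talk, whose exact form is not given here; the function names, the cosine similarity, and the temperature value are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors (small epsilon avoids 0/0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)

def pseudo_label_contrastive_loss(frames, pseudo_labels, temperature=0.1):
    """Hypothetical loss: pull together frames that share a pseudo-target.

    For each anchor, frames with the same pseudo-label are positives and all
    other frames are negatives. Anchors without any positive are skipped.
    """
    n = len(frames)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and pseudo_labels[j] == pseudo_labels[i]]
        if not positives:
            continue
        # Temperature-scaled similarities between the anchor and every other frame.
        logits = {j: cosine(frames[i], frames[j]) / temperature
                  for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(v) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        anchor_loss = -sum(logits[j] - log_denominator for j in positives) / len(positives)
        losses.append(anchor_loss)
    return sum(losses) / len(losses) if losses else 0.0

# Toy frames: two near the first axis, two near the second.
toy_frames = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_matching = pseudo_label_contrastive_loss(toy_frames, [0, 0, 1, 1])
loss_mismatched = pseudo_label_contrastive_loss(toy_frames, [0, 1, 0, 1])
```

In this toy setup the loss is lower when the pseudo-labels agree with the geometry of the features than when they cut across it, which is the signal such an auxiliary objective would provide to the encoder.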

 

In the final part of the talk, we explore the integration of adversarial learning to obtain semantic representations that are invariant to non-semantic factors. A gradient-reversal mechanism is employed to explicitly suppress non-semantic variations within the SSL models, thereby refining the learned representations. The proposed adversarial learning approach effectively disentangles content from non-semantic factors, leading to robust semantic representations. Experimental results confirm that the proposed approach enhances the ability of speech processing models to generalize across different acoustic conditions while preserving linguistic information.
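
For readers unfamiliar with gradient reversal, the following is a minimal framework-free sketch of the general idea: the layer passes features through unchanged in the forward pass and flips the sign of the gradient (optionally scaled) in the backward pass, so that an encoder trained through it is pushed to make an adversary's task harder. The class name, the `scale` parameter, and the list-based interface are assumptions for illustration and do not describe the speaker's implementation.

```python
class GradientReversal:
    """Illustrative gradient-reversal layer with hand-written forward/backward."""

    def __init__(self, scale=1.0):
        # Hypothetical knob controlling how strongly the reversed gradient is applied.
        self.scale = scale

    def forward(self, features):
        # Identity in the forward pass: the adversary sees the features unchanged.
        return list(features)

    def backward(self, grad_output):
        # Negate (and scale) the gradient flowing back toward the encoder.
        return [-self.scale * g for g in grad_output]

# Toy usage: the gradient of an adversary loss with respect to the features.
layer = GradientReversal(scale=0.5)
features_out = layer.forward([1.0, -2.0])
grad_to_encoder = layer.backward([2.0, -4.0])
```

The adversary itself is updated with the ordinary gradient, while the encoder receives the reversed one; in an automatic-differentiation framework the same behaviour would typically be written as a custom backward function.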

 

 

Bio:

Varun Krishna is a PhD scholar at the Department of Electrical Engineering, Indian Institute of Science, Bengaluru. He obtained his B.Tech from the Department of Electronics and Communications at NITK, Surathkal, in 2017, and completed his M.Tech in Signal Processing at the Department of Electrical Engineering, Indian Institute of Science, Bengaluru, in 2019. His research interests include self-supervised representation learning, large language models, and generative AI.

 
