[PhD Colloquium Talk by Prachi Singh] – 21-7 in MMCR, EE @ 330-430pm {Graph Clustering Approaches for Speaker Diarization of Conversational Speech}

July 21 @ 11:00 AM - 1:00 PM IST

Dear All,
We are pleased to invite you to the following PhD colloquium talk,
Who: Ms. Prachi Singh, PhD candidate, EE.
When: 21/7/2023 at 11 AM [Note the updated time]. High Tea at 10:45 AM.
Where: MMCR, EE, IISc, and online via the Teams link
What: Graph Clustering Approaches for Speaker Diarization of Conversational Speech
In this era of advanced machine intelligence, real-world speech applications need to be equipped to deal with conversations involving multiple speakers. An essential first step in extracting information from conversational speech is the task of finding “who spoke when”, also referred to as speaker diarization. The focus of this talk is to describe our efforts in investigating graph clustering techniques for this problem. While graph models have been used in several other domains, their application to temporal segmentation of speech is the first of its kind.
The talk is divided into three main parts. In the first part, I will describe a novel proposal on self-supervised learning to perform joint representation learning and clustering, called self-supervised clustering (SSC) for diarization. On the learned representations, we explore path integral clustering (PIC), a graph-based clustering algorithm. PIC is an agglomerative graph clustering method that performs clustering based on the edge connections of a node, called the path integral. The proposed SSC with path integral clustering (SSC-PIC) is shown to achieve state-of-the-art performance on benchmark datasets.
The second part of the talk is an extension of SSC-PIC to incorporate metric learning. We design a neural version of the probabilistic linear discriminant analysis (PLDA) approach with learnable parameters to compute a log-likelihood score between embeddings from two segments of the recording. We propose a joint self-supervised representation learning and metric learning approach called Selfsup-PLDA-PIC.
In the third part of the talk, we introduce an end-to-end supervised graph clustering approach. We develop a supervised learning setup using labeled conversational data for training this model. In this setting, we propose a supervised clustering approach called Supervised HierArchical gRaph Clustering (SHARC) for speaker diarization. This approach uses Graph Neural Networks (GNNs) to capture the similarity between the speaker embeddings and to perform hierarchical clustering. An extension of this work is the joint training of the speaker embedding extractor along with the GNN module, referred to as end-to-end SHARC (E-SHARC). I will also illustrate how to extend the E-SHARC model to incorporate overlapped speech detection, enabling diarization of recordings with overlapped speech.

The talk will conclude with a summary of our key contributions, while highlighting the pros and cons of using graph-based models for speaker diarization.

All are welcome.



MMCR, Hall C 241, 1st floor, EE department