BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//EE - ECPv5.10.0//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:EE
X-ORIGINAL-URL:https://ee.iisc.ac.in
X-WR-CALDESC:Events for EE
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
BEGIN:STANDARD
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
DTSTART:20240101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20240206T113000
DTEND;TZID=Asia/Kolkata:20240206T130000
DTSTAMP:20260419T132033
CREATED:20240201T101134Z
LAST-MODIFIED:20240206T040039Z
UID:241319-1707219000-1707224400@ee.iisc.ac.in
SUMMARY:[EE Defense Talk] Graph Based Approaches for Diarization of Conversational Speech
DESCRIPTION:The thesis defense talk of Ms. Prachi Singh (PhD candidate\, EE dept.) will be held with the following details.\nTitle: Graph Clustering Approaches for Speaker Diarization of Conversational Speech\nDate and time: February 6\, 2024 (11:30 am)\nVenue: MMCR\, EE (C241)\, and on Teams\nAbstract:\nIn this era of advanced machine intelligence\, real-world speech applications still find it challenging to deal with conversations involving multiple speakers. An essential first step in extracting information from conversational speech is the task of finding “who spoke when”\, also referred to as speaker diarization. The focus of this talk is to describe our efforts in investigating graph representation learning and clustering techniques for this problem. While graph models have been used in several other domains\, our work on their application to temporal segmentation of speech is the first of its kind.\nThe talk is divided into three main parts. In the first part\, I will describe a novel self-supervised learning approach for diarization\, called self-supervised clustering (SSC)\, that performs joint representation learning and clustering. On the learned representations\, we explore path integral clustering (PIC)\, a graph-based clustering algorithm. PIC is an unsupervised agglomerative graph clustering method that performs clustering based on the edge connections of a node\, called the path integral. The proposed SSC with path integral clustering (SSC-PIC) is shown to achieve state-of-the-art performance on benchmark datasets.\nThe second part of the talk is an extension of SSC-PIC to incorporate metric learning. We design a neural version of the probabilistic linear discriminant analysis (PLDA) approach with learnable parameters to compute a log-likelihood score between embeddings from two segments of the recording. We propose a joint self-supervised representation learning and metric learning approach called Selfsup-PLDA-PIC.\nIn the third part of the talk\, we develop a supervised learning setup using labeled conversational data for training. In this setting\, we propose a supervised clustering approach called Supervised HierArchical gRaph Clustering (SHARC) for speaker diarization. This approach uses Graph Neural Networks (GNN) to capture the similarity between the speaker embeddings and performs hierarchical clustering. An extension of this work is the joint training of the speaker embedding extractor along with the GNN module\, referred to as end-to-end SHARC (E-SHARC). I will also illustrate how to extend the E-SHARC model for diarization of overlapped speech recordings.\nThe talk will conclude with a summary of our key contributions\, highlighting the pros and cons of using graph-based models for speaker diarization.\n\n----------\nCoffee/tea will be served before the talk. All are welcome.
URL:https://ee.iisc.ac.in/event/ee-defense-talk-graph-based-approaches-for-diarization-of-conversational-speech/
LOCATION:Multi-Media Class Room (MMCR)\, EE Department (Hybrid mode)
END:VEVENT
END:VCALENDAR