BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//EE - ECPv5.10.0//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:EE
X-ORIGINAL-URL:https://ee.iisc.ac.in
X-WR-CALDESC:Events for EE
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
BEGIN:STANDARD
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
DTSTART:20230101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20230811T150000
DTEND;TZID=Asia/Kolkata:20230811T170000
DTSTAMP:20260419T204255Z
CREATED:20230809T114701Z
LAST-MODIFIED:20230814T084627Z
UID:240977-1691766000-1691773200@ee.iisc.ac.in
SUMMARY:Ph.D. Thesis Oral Defense of Mr. Anoop C. S.: Automatic speech recognition for low-resource Indian languages
DESCRIPTION:Name of the student: ANOOP C. S.\n\nAdvisors: Prof. A. G. Ramakrishnan & Dr. G. N. Rathna\n\nExternal examiner: Prof. Umesh S\, Dept. of EE\, IIT Madras\n\nDate and time: 11 August 2023 (Friday)\, 3:00 PM\n\nVenue (hybrid): MMCR\, C 241\, First Floor\, Dept. of EE\nAND\nMicrosoft Teams meeting link:\n\nTITLE: Automatic speech recognition for low-resource Indian languages\n\nBuilding good models for automatic speech recognition (ASR) requires large amounts of annotated speech data. Most Indian languages are low-resourced and lack enough training data to build robust and efficient ASR systems. However\, many have an overlapping phoneme set and a strong correspondence between their character sets and pronunciations. This thesis exploits such similarities among the Indian languages to improve speech recognition in low-resource settings.\n\nSignificant contributions of the thesis:\n\nExploiting the pronunciation similarities across multiple Indian languages through shared label sets:\n\nA common set of tokens is proposed for multiple Indian languages\, and its performance is analyzed in monolingual and multilingual settings.\n- It is found that Sanskrit Library Phonetic Encoding (SLP1) tokens\, which exploit the pronunciation-based structuring of character Unicodes in Indian languages\, perform better than other grapheme-to-phoneme (G2P) based tokens in monolingual ASR settings.\n- Syllable-based sub-words perform better than character-based token units in monolingual speech recognition. 
 However\, character-based SLP1 tokens perform better in cross-lingual transfer.\n\nStrategies for improving the performance of ASR systems in low-resource target languages by exploiting annotated data from high-resource source languages:\n\nThree low-resource settings have been studied:\n\nA) No labelled audio data is available in the target language. Only a limited amount of unlabelled data is available. Unsupervised domain adaptation (UDA) schemes popular in image classification have been adopted for this case.\n- Adversarial training with gradient reversal layers (GRL) and domain separation networks (DSN) provides word error rate (WER) improvements of 6.71% and 7.32%\, respectively\, on Sanskrit over a baseline hybrid DNN-HMM system trained on Hindi.\n- The UDA models outperform multi-task training with language recognition as the auxiliary task.\n- The choice of source language is critical in UDA systems.\n\nB) The target language has only a small amount of labelled speech data and some text data for building language models. In this case\, the available data in high-resource languages is used through shared label sets to build unified acoustic models (AM) and language models (LM).\n- A unified language-agnostic AM + LM performs better than a monolingual AM + LM when (a) only limited speech data is available for training the acoustic models and (b) the test speech is from domains different from those seen in training.\n- In general\, a multilingual AM + monolingual LM performs the best.\n\nC) There are N target languages with limited training data and several source languages with large training sets. 
 In this case\, the usefulness of model-agnostic meta-learning (MAML) pre-training is established for Indian languages\, and improvements are proposed with text-similarity-based loss weighting.\n- MAML beats joint multilingual pre-training by an average of 5.4% in character error rate (CER) and 20.3% in WER.\n- With just 25% of the data\, MAML matches joint multilingual models trained on the whole target data.\n- Similarity with the source languages impacts the target language’s ASR performance.\n- Text similarity\, measured through cosine and Mahalanobis distances\, is used to weight the losses during MAML pre-training\, yielding a mean absolute improvement of 1% in WER.\n\nALL ARE WELCOME ONLINE!\n\nMeeting recording: https://ee.iisc.ac.in/wp-content/uploads/2023/08/Anoop-PhD-Viva-20230811_150431-Meeting-Recording_Cut.mp4
URL:https://ee.iisc.ac.in/event/ph-d-thesis-oral-defense-of-mr-anoop-c-s-automatic-speech-recognition-for-low-resource-indian-languages/
LOCATION:MMCR\, C 241\, First Floor\, Dept. of EE
END:VEVENT
END:VCALENDAR