BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//EE - ECPv5.10.0//NONSGML v1.0//EN
CALSCALE:GREGORIAN
METHOD:PUBLISH
X-WR-CALNAME:EE
X-ORIGINAL-URL:https://ee.iisc.ac.in
X-WR-CALDESC:Events for EE
BEGIN:VTIMEZONE
TZID:Asia/Kolkata
BEGIN:STANDARD
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
TZNAME:IST
DTSTART:20230101T000000
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID=Asia/Kolkata:20230410T210000
DTEND;TZID=Asia/Kolkata:20230410T230000
DTSTAMP:20260419T132559Z
CREATED:20230410T000426Z
LAST-MODIFIED:20230410T001059Z
UID:240584-1681160400-1681167600@ee.iisc.ac.in
SUMMARY:Thesis colloquium of Mr. Anoop C. S.
DESCRIPTION:Advisor: Prof. A. G. Ramakrishnan\n\nDate and Time: 10 April 2023 (Monday)\, 3:30 PM\n\nMeeting link: teams.microsoft.com\n\nTITLE: Automatic speech recognition for low-resource Indian languages\n\nBuilding good models for automatic speech recognition (ASR) requires large amounts of annotated speech data. Most Indian languages are low-resource and lack enough training data to build robust and efficient ASR systems. However\, many have an overlapping phoneme set and a strong correspondence between their character sets and pronunciations. In this thesis\, we exploit such similarities among the Indian languages to improve speech recognition in low-resource settings.\n\nSignificant contributions of the thesis:\n\n1. Exploiting the pronunciation similarities across multiple Indian languages through shared label sets:\n- We propose the use of a common set of tokens across multiple Indian languages and analyze their performance in monolingual and multilingual settings.\n- We find that the Sanskrit Library Phonetic Encoding (SLP1) tokens\, which exploit the pronunciation-based structuring of character Unicodes in Indian languages\, perform better than some other grapheme-to-phoneme (G2P) based tokens in monolingual ASR settings.\n- Syllable-based sub-words perform better than character-based token units in monolingual speech recognition. However\, character-based SLP1 tokens perform better in cross-lingual transfer.\n\n2. Strategies for improving the performance of ASR systems in low-resource target languages by exploiting annotated data from high-resource source languages. We study three low-resource settings:\n\nA) No labelled audio data is available in the target language\, and only a limited amount of unlabelled data is available. We adopt the unsupervised domain adaptation (UDA) schemes popular in image classification problems to tackle this case.\n- Adversarial training with gradient reversal layers (GRL) and domain separation networks (DSN) provides word error rate (WER) improvements of 6.71% and 7.32%\, respectively\, in Sanskrit compared to a baseline hybrid DNN-HMM system trained on Hindi.\n- The UDA models outperform multi-task training with language recognition as the auxiliary task.\n- The choice of source language is critical in UDA systems.\n\nB) The target language has only a small amount of labelled data\, along with some text data for building language models. We benefit from the available data in high-resource languages through shared label sets to build unified acoustic models (AM) and language models (LM).\n- A unified\, language-agnostic AM + LM performs better than a monolingual AM + LM when (a) only limited speech data is available for training the acoustic models and (b) the speech data comes from domains different from those used in training.\n- In general\, a multilingual AM with a monolingual LM performs best.\n\nC) There are N target languages with limited training data and several source languages with large training sets. Here\, we establish the usefulness of model-agnostic meta-learning (MAML) pre-training in Indian languages and propose improvements with text-similarity-based loss weightings.\n- MAML beats joint multilingual pre-training by an average of 5.4% in CER and 20.3% in WER.\n- With just 25% of the data\, MAML's performance matches that of joint multilingual models trained on the whole target data.\n- Similarity with the source languages impacts the target language’s ASR performance.\n- We use text similarity measured through cosine and Mahalanobis distances to weight the losses during MAML pre-training\, yielding a mean absolute improvement of 1% in WER.\n\nALL ARE WELCOME ONLINE!
URL:https://ee.iisc.ac.in/event/thesis-colloquium-of-mr-anoop-c-s/
END:VEVENT
END:VCALENDAR