
Ph.D. Thesis Oral Defense of Mr. Anoop C. S.: Automatic speech recognition for low-resource Indian languages

Start: 2023-08-11T15:00:00+05:30
End: 2023-08-11T17:00:00+05:30
Location: MMCR, Hall C 241, 1st floor, EE department

August 11, 2023 @ 3:00 PM - 5:00 PM IST

Name of the student: ANOOP C. S.

Advisors: Prof. A. G. Ramakrishnan & Dr. G. N. Rathna

External examiner: Prof. Umesh S, Dept of EE, IIT Madras

Date and Time: 11 August 2023 (Friday) 3:00 PM

Venue (hybrid): MMCR, C 241, First Floor, Dept. of EE

AND

Microsoft Teams meeting link: Join conversation (teams.microsoft.com)

TITLE: Automatic speech recognition for low-resource Indian languages

Building good models for automatic speech recognition (ASR) requires large amounts of annotated speech data. Most Indian languages are low-resourced and lack enough training data to build robust and efficient ASR systems. However, many have an overlapping phoneme set and a strong correspondence between their character sets and pronunciations. This thesis exploits such similarities among the Indian languages to improve speech recognition in low-resource settings.

Significant contributions of the thesis:

Exploiting the pronunciation similarities across multiple Indian languages through shared label sets:

A common set of tokens is proposed across multiple Indian languages, and its performance is analyzed in monolingual and multilingual settings.

The Sanskrit Library Phonetic Encoding (SLP1) tokens, which exploit the pronunciation-based structuring of Unicode code points for Indian-language characters, perform better than other grapheme-to-phoneme (G2P) based tokens in monolingual ASR settings.
Syllable-based sub-words perform better than the character-based token units in monolingual speech recognition. However, character-based SLP1 tokens perform better in cross-lingual transfer.
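The pronunciation-based structuring of character Unicodes mentioned above can be illustrated with a minimal sketch. It assumes only the standard Unicode layout, in which the major Indic script blocks from Devanagari (U+0900) to Malayalam (U+0D7F) occupy 0x80 code points each and place corresponding letters at roughly the same offset. The helper names `common_label` and `to_devanagari` are hypothetical, and the sketch does not produce SLP1 symbols or reproduce the thesis's token sets; the alignment across blocks is also only approximate for script-specific characters.

```python
# Sketch only: shared labels derived from the parallel layout of Indic Unicode blocks.
INDIC_START = 0x0900  # first code point of the Devanagari block
INDIC_END = 0x0D7F    # last code point of the Malayalam block
BLOCK_SIZE = 0x80     # each script block spans 128 code points


def common_label(ch: str) -> int:
    """Return the offset of `ch` inside its script block, used here as a shared label."""
    cp = ord(ch)
    if not INDIC_START <= cp <= INDIC_END:
        raise ValueError(f"{ch!r} is outside the Indic blocks covered by this sketch")
    return (cp - INDIC_START) % BLOCK_SIZE


def to_devanagari(text: str) -> str:
    """Project Indic-script text onto Devanagari code points; other characters pass through."""
    out = []
    for ch in text:
        if INDIC_START <= ord(ch) <= INDIC_END:
            out.append(chr(INDIC_START + common_label(ch)))
        else:
            out.append(ch)
    return "".join(out)


# The letter "ka" in Devanagari, Bengali, Telugu, Kannada and Malayalam.
ka_variants = ["\u0915", "\u0995", "\u0C15", "\u0C95", "\u0D15"]
labels = {common_label(ch) for ch in ka_variants}
```

In this toy case all five spellings of "ka" collapse to a single label, which is the kind of cross-script sharing that a common token set relies on.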

Strategies for improving the performance of ASR systems in low-resource scenarios (target languages) exploiting the annotated data from high-resource languages (source languages):

Three different low-resource settings have been studied:

A) Labeled audio data is not available in the target language; only a limited amount of unlabeled data is available. Unsupervised domain adaptation (UDA) schemes popular in image classification problems have been adopted to tackle this case.

Adversarial training with gradient reversal layers (GRL) and domain separation networks (DSN) provide word error rate (WER) improvements of 6.71% and 7.32%, respectively, on Sanskrit compared to a baseline hybrid DNN-HMM system trained on Hindi.
The UDA models outperform multi-task training with language recognition as the auxiliary task.
Selection of the source language is critical in UDA systems.
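The gradient reversal idea behind the adversarial UDA scheme above can be sketched without any deep learning framework. This is a hand-written illustration, not the thesis's implementation: the class name `GradientReversal` and the scaling factor `lam` are assumptions made here, and a real system would rely on an autograd library for the backward pass. The layer passes features through unchanged, but flips the sign of the gradient coming back from a domain (language) classifier, so the shared encoder is pushed toward features the classifier cannot separate.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam: float = 1.0):
        self.lam = lam  # hypothetical trade-off weight for the adversarial signal

    def forward(self, features):
        # Features reach the domain classifier unchanged.
        return list(features)

    def backward(self, grad_from_domain_classifier):
        # The encoder receives the negated (and scaled) domain-classifier gradient.
        return [-self.lam * g for g in grad_from_domain_classifier]


grl = GradientReversal(lam=0.5)
feats = [0.2, -1.0, 3.0]
out = grl.forward(feats)
grad_to_encoder = grl.backward([1.0, -2.0, 4.0])
```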

B) The target language has only a small amount of labeled speech data and some text data to build language models. In this case, available data in high-resource languages is used through shared label sets to build unified acoustic models (AM) and language models (LM).

Unified language-agnostic AM + LM performs better than monolingual AM + LM in cases where (a) only limited speech data is available for training the acoustic models and (b) the test speech data is from domains different from those used in training.
In general, multilingual AM + monolingual LM performs the best.

C) There are N target languages with limited training data and several source languages with large training sets. In this case, the usefulness of model-agnostic meta-learning (MAML) pretraining is established for Indian languages, and improvements are proposed with text-similarity-based loss weightings.

MAML beats joint multilingual pretraining by an average of 5.4% in CER and 20.3% in WER.
With just 25% of the data, MAML performance matches joint multilingual models trained on the whole target data.
Similarity with the source languages impacts the target language’s ASR performance.
Text similarity, measured through cosine and Mahalanobis distances, is used to weight the losses during MAML pretraining. It yields a mean absolute improvement of 1% in WER.
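The similarity-based loss weighting described above can be sketched as follows. This covers only the cosine case, and several choices are assumptions made for illustration rather than details from the thesis: the character-bigram features, the normalization of similarities into weights that sum to one, the function names, and the toy romanized strings standing in for language text.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 2) -> Counter:
    """Count character n-grams; a stand-in for whatever text features are actually used."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def loss_weights(target_text: str, source_texts: dict) -> dict:
    """Turn target-to-source similarities into weights that sum to one."""
    target = char_ngrams(target_text)
    sims = {lang: cosine_similarity(target, char_ngrams(t)) for lang, t in source_texts.items()}
    total = sum(sims.values())
    if total == 0:
        return {lang: 1.0 / len(sims) for lang in sims}  # fall back to uniform weights
    return {lang: s / total for lang, s in sims.items()}


def weighted_loss(losses: dict, weights: dict) -> float:
    """Combine per-source-language losses using the similarity weights."""
    return sum(weights[lang] * losses[lang] for lang in losses)


# Toy data: placeholder strings, not real corpora.
weights = loss_weights("namaste", {"src_a": "namaskar", "src_b": "vanakkam"})
total_loss = weighted_loss({"src_a": 1.2, "src_b": 0.9}, weights)
```

Under these assumptions, the source language whose text looks more like the target contributes more to the combined pretraining loss.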

ALL ARE WELCOME ONLINE!

Meeting Recording
