
Thesis colloquium of Mr. Anoop C. S.

Start: 2023-04-10T21:00:00+05:30
End: 2023-04-10T23:00:00+05:30

April 10, 2023 @ 9:00 PM - 11:00 PM IST

Advisor: Prof. A. G. Ramakrishnan

Date and Time: 10 April 2023 (Monday) 3:30 PM

Meeting link: Join conversation (teams.microsoft.com)

TITLE: Automatic speech recognition for low-resource Indian languages

Building good models for automatic speech recognition (ASR) requires large amounts of annotated speech data. Most Indian languages are low-resourced and lack enough training data to build robust and efficient ASR systems. However, many have an overlapping phoneme set and a strong correspondence between their character sets and pronunciations. In this thesis, we exploit such similarities among the Indian languages to improve speech recognition in low-resource settings.

Significant contributions of the thesis:

Exploiting the pronunciation similarities across multiple Indian languages through shared label sets:

We propose a common set of tokens shared across multiple Indian languages and analyze its performance in monolingual and multilingual settings.

We find that Sanskrit Library Phonetic Encoding (SLP1) tokens, which exploit the pronunciation-based structuring of Unicode code points in Indian languages, perform better than some other grapheme-to-phoneme (G2P) based tokens in monolingual ASR settings.
Syllable-based sub-words perform better than character-based token units in monolingual speech recognition. However, character-based SLP1 tokens perform better in cross-lingual transfer.
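The idea of a shared label set can be illustrated with a minimal sketch. It is not the SLP1 mapping used in the thesis. It only relies on the fact that the Unicode blocks for the major Indian scripts (Devanagari at U+0900 through Malayalam ending at U+0D7F) are 128 code points wide and laid out in parallel, so corresponding letters sit at the same offset within each block. The function names `shared_label` and `to_shared_labels` are hypothetical and chosen only for this illustration.

```python
# Illustrative only: map characters from parallel Indic Unicode blocks to a
# script-independent offset. This is NOT the SLP1 encoding from the thesis.
INDIC_START = 0x0900  # first code point of the Devanagari block
INDIC_END = 0x0D7F    # last code point of the Malayalam block
BLOCK_SIZE = 0x80     # each of these script blocks spans 128 code points


def shared_label(ch):
    """Return the within-block offset of an Indic character, or None otherwise."""
    cp = ord(ch)
    if INDIC_START <= cp <= INDIC_END:
        return (cp - INDIC_START) % BLOCK_SIZE
    return None


def to_shared_labels(text):
    """Convert a string to a list of shared labels, skipping non-Indic characters."""
    labels = (shared_label(ch) for ch in text)
    return [label for label in labels if label is not None]


if __name__ == "__main__":
    # The letter KA in Devanagari, Kannada, and Telugu maps to the same label.
    for ka in ("\u0915", "\u0C95", "\u0C15"):
        print(hex(ord(ka)), shared_label(ka))
```

A real shared token set such as SLP1 also has to handle script-specific letters and inherent vowels, which this offset trick does not attempt.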

Strategies for improving the performance of ASR systems in low-resource scenarios (target languages) exploiting the annotated data from high-resource languages (source languages):

We study three different low-resource settings:

A) No labeled audio data is available in the target language; only a limited amount of unlabeled data is available. We adopt the unsupervised domain adaptation (UDA) schemes popular in image classification problems to tackle this case.

Adversarial training with gradient reversal layers (GRL) and domain separation networks (DSN) provide word error rate (WER) improvements of 6.71% and 7.32%, respectively, in Sanskrit over a baseline hybrid DNN-HMM system trained on Hindi.
The UDA models outperform multi-task training with language recognition as the auxiliary task.
Selection of the source language is critical in UDA systems.
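For readers unfamiliar with the metric, word error rate is conventionally defined as the word-level edit distance (substitutions, deletions, and insertions) between the reference and the recognized text, divided by the number of reference words. The sketch below follows that standard definition. It is not the scoring tool used in the thesis, and the function names are hypothetical. The same routine applied to characters gives the character error rate (CER).

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, using a rolling row."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance over reference length."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate: character-level edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(wer("a b c d", "a x c"))
```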

B) The target language has only a small amount of labeled data, along with some text data for building language models. We try to benefit from the available data in high-resource languages through shared label sets to build unified acoustic models (AM) and language models (LM).

Unified language-agnostic AM + LM performs better than monolingual AM + LM in cases where (a) only limited speech data is available for training the acoustic models and (b) the speech data comes from domains different from those used in training.
In general, multilingual AM + monolingual LM performs the best.

C) There are N target languages with limited training data and several source languages with large training sets. Here, we establish the usefulness of model-agnostic meta-learning (MAML) pretraining in Indian languages and propose improvements with text-similarity-based loss weighting.

MAML beats joint multilingual pretraining by an average of 5.4% in CER and 20.3% in WER.
With just 25% of the data, MAML performance matches joint multilingual models trained on the whole target data.
Similarity with the source languages impacts the target language’s ASR performance.
We use text similarity, measured through cosine and Mahalanobis distances, to weight the losses during MAML pretraining. This yields a mean absolute improvement of 1% in WER.
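One way to picture similarity-based loss weighting is sketched below. This is an assumption-laden illustration, not the scheme from the thesis: it assumes the source and target texts are already written in a shared token set, represents each text by character bigram counts, uses only cosine similarity (Mahalanobis distance is omitted), and normalizes the similarities into weights that sum to one. All function names and the toy strings are hypothetical.

```python
import math
from collections import Counter


def char_ngram_counts(text, n=2):
    """Count character n-grams in a text (bigrams by default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_weights(target_text, source_texts):
    """Map each source language to a weight proportional to its similarity."""
    target = char_ngram_counts(target_text)
    sims = {lang: cosine_similarity(target, char_ngram_counts(text))
            for lang, text in source_texts.items()}
    total = sum(sims.values())
    if total == 0:
        # Fall back to uniform weights when nothing overlaps.
        return {lang: 1.0 / len(sims) for lang in sims}
    return {lang: s / total for lang, s in sims.items()}


def weighted_loss(per_language_losses, weights):
    """Combine per-language losses using the similarity weights."""
    return sum(weights[lang] * loss for lang, loss in per_language_losses.items())


if __name__ == "__main__":
    # Toy strings standing in for text in a shared token set.
    weights = similarity_weights("namaste", {"src_a": "namaskara", "src_b": "vanakkam"})
    print(weights)
    print(weighted_loss({"src_a": 2.0, "src_b": 4.0}, weights))
```

In an actual MAML setup the weights would scale each source task's contribution to the meta-update; how that is done in the thesis is not specified in this abstract.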

ALL ARE WELCOME ONLINE!
