Prasanta Kumar Ghosh

E9 261 (JAN) 3:1 Speech Information Processing

Speech Information Processing
January-April, 2020

Announcements:

January 1, 2020: First lecture will be held in EE B 308 on January 6, 2020 (Monday) at 9AM.
January 5, 2020: If you are attending the course (credit or audit), please fill up this form (on or before January 12, 2020) to join the class email list.
January 13, 2020: The first of the extra DSP lectures will be held on January 16, 2020 (Thursday) at 5:30PM in EE B 308.
February 12, 2020: The first midterm will be held on February 26, 2020 (Wednesday) at 4:00PM in EE B 308. The syllabus for the first midterm will be the topics covered until February 19, 2020.
February 15, 2020: You need to send an email to confirm your selection of project in this course on or before February 29, 2020.

Instructors:

Prasanta Kumar Ghosh
Office: EE C 330
Phone: +91 (80) 2293 2694
prasantg AT iisc.ac.in

Sriram Ganapathy
Office: EE C 334
Phone: +91 (80) 2293 2433
sriramg AT iisc.ac.in

Teaching Assistant(s):

Class meetings:

4:00pm to 5:30pm every Monday and Wednesday (Venue: EE B 308)

Course Content:

Speech communication and overview
Time-varying signals/systems
Spectrograms and applications
Speech parameterization/representation
AM-FM, sinusoidal models for speech
Linear Prediction, AR and ARMA modeling of speech.
Sequence Modeling of Speech - Dynamic Time Warping, Introduction to Hidden Markov Models
Deep learning for Sequence Modeling - Recurrent neural networks, attention-based models.
Speech applications - Automatic speech recognition.

Prerequisites:

Digital Signal Processing, Probability and Random Processes

Textbooks:

Fundamentals of speech recognition, Rabiner and Juang, Prentice Hall, 1993.
Automatic Speech Recognition: A Deep Learning Approach, Dong Yu and Li Deng, Springer, 2014.
Discrete-Time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri, Prentice Hall, 2001.
Digital Processing of Speech Signals, Lawrence R. Rabiner and Ronald W. Schafer, Pearson Education, 2008.

Web Links:

The Edinburgh Speech Tools Library
Speech Signal Processing Toolkit (SPTK)
Hidden Markov Model Toolkit (HTK)
ICSI Speech Group Tools
VOICEBOX: Speech Processing Toolbox for MATLAB
Praat: doing phonetics by computer
Audacity
SoX - Sound eXchange
HMM-based Speech Synthesis System (HTS)
International Phonetic Association (IPA)
Type IPA phonetic symbols
CMU dictionary
Co-articulation and phonology by Ohala
Assisted Listening Using a Headset
Headphone-Based Spatial Sound
Pitch Perception
Head-Related Transfer Functions and Virtual Auditory Display
Signal reconstruction from STFT magnitude: a state of the art
On the usefulness of STFT phase spectrum in human listening tests
Experimental comparison between stationary and nonstationary formulations of linear prediction applied to voiced speech analysis
A modified autocorrelation method of linear prediction for pitch-synchronous analysis of voiced speech
Linear prediction: A tutorial review
Energy separation in signal modulations with application to speech analysis
Nonlinear Speech Modeling and Applications

Grading:

Assignments, including recordings (20 points) - The average of all assignments will be considered. Assignments will include associated recordings. Cheating or violating academic integrity (see below) will result in failing the course. Turning in identical homework sets counts as cheating.
Midterm exams (20 points) - Two midterm exams. Missed exams earn 0 points. No make-up exams. The average of the two midterm scores will be considered.
Final exam. (35 points)
Project (25 points) - Quality/Quantity of work (10 points), Report (5 points), Presentation (5 points), Recording (5 points).
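
As an illustration of how the components above combine, here is a minimal sketch in Python. It assumes every score is first expressed as a fraction (0 to 1) of its own maximum, and that the components are simply added. The function name `course_total` and the sample scores are hypothetical and are not part of the official grading procedure.

```python
from statistics import mean

# Weights taken from the grading scheme above (points out of 100).
ASSIGNMENT_POINTS = 20
MIDTERM_POINTS = 20
FINAL_POINTS = 35
PROJECT_POINTS = 25


def course_total(assignments, midterms, final_exam, project):
    """Return the course total out of 100.

    Every score is a fraction in [0, 1] of its own maximum (an assumption
    of this sketch). `midterms` must have two entries; a missed midterm
    is entered as 0.0, since there are no make-up exams.
    """
    if len(midterms) != 2:
        raise ValueError("expected exactly two midterm scores")
    scores = [*assignments, *midterms, final_exam, project]
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must be fractions between 0 and 1")
    return (
        ASSIGNMENT_POINTS * mean(assignments)   # average of all assignments
        + MIDTERM_POINTS * mean(midterms)       # average of the two midterms
        + FINAL_POINTS * final_exam
        + PROJECT_POINTS * project
    )


# Hypothetical student: three assignments, one missed midterm.
total = course_total(
    assignments=[0.5, 1.0, 0.75],
    midterms=[0.5, 0.0],
    final_exam=0.8,
    project=0.6,
)
print(round(total, 2))
```

For the sample scores, the four components contribute 15, 5, 28 and 15 points respectively, for a total of 63. Note how the missed midterm halves the midterm contribution, because the average is taken over both exams.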

Topics covered:

Date	Topics	Remarks
Jan 6	Course logistics, Information in speech, speech chain, speech research - science and technology	Introductory lecture, code First Day Questions
Jan 8	Phonemes, allophones, diphones, morphemes, lexicon, consonant cluster.	IPA, ARPABET, Grapheme-to-Phoneme conversion
Jan 13	Summary of phonetics and phonology, manner and place of articulation, intonation, stress, co-articulation, Assimilation, Elision, speech production models, formants, Human auditory system, auditory modeling, Cochlear signal processing.	Notes
Jan 15	Speech perception theories, Fletcher-Munson curves, Perceptual unit of loudness, Pitch Perception, Timbre, Masking, critical band, BARK, HRTF, Categorical Perception.	Notes
Jan 16	Extra DSP class.	-
Jan 20	McGurk Effect, distorted speech perception, Time-varying signal, time-varying system, temporal and frequency resolution.	-
Jan 21	Extra DSP class.	code
Jan 22	Short-time Fourier transform (STFT), properties of STFT, inverse STFT.	Notes
Jan 23	Extra DSP class.	code
Jan 27	Short-time Fourier transform (STFT)	Notes
Jan 28	Extra DSP class.	-
Jan 29	Short-time Fourier transform (STFT) - Perfect reconstruction conditions	Notes
Jan 30	Extra DSP class.	-
Feb 3	Overlap Add method, reconstruction from STFT magnitude, Wideband and Narrowband spectrogram, Spectrograms of different sounds -- vowel, fricative, semivowel, nasal, stops, Spectrogram reading, formants, pattern playback, weighted overlap add method, spectrogram re-assignment, speech denoising, time-scale modification.	Notes Notes
Feb 4	Extra DSP class.	-
Feb 5	Time-frequency representation, time-bandwidth product, Gabor transform, time-frequency tile, auditory filterbank, auditory filter modeling, wavelet based auditory filter, auditory model.	Notes
Feb 6	Extracting formants and pitch using Praat script; introduction to librosa library.	Tutorial by Aravind Illa
Feb 10	Homomorphic filtering, cepstrum, properties of cepstrum, uniqueness of cepstrum, Motivation for extraction of excitation and vocal tract response using cepstrum.	Notes
Feb 12	Properties of cepstrum, derivation of the cepstrum for all pole-zero transfer function.	Notes
Feb 17	derivation of the cepstrum for periodic pulse train and white noise, liftering, homomorphic vocoder, mel-frequency cepstral coefficients.	Notes
Feb 19	AM-FM model, non-linear models, signal subspace approach, Sinusoidal model, its applications, Chirp model, short-time chirp transform, mixture Gaussian envelope chirp model, group delay analysis. Linked papers: Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?; The evolution of the Lombard effect: 100 years of psychoacoustic research; Reverberant Speech Enhancement Using Cepstral Processing; Enhancement of Reverberant Speech Using LP Residual Signal; Reverberant Speech Enhancement by Temporal and Spectral Processing; Joint Dereverberation and Noise Reduction Using Beamforming and a Single-Channel Speech Enhancement Scheme; Acoustic characteristics related to the perceptual pitch in whispered vowels; A Comprehensive Vowel Space for Whispered Speech; Fundamental Frequency Generation for Whisper-to-Audible Speech Conversion; Silent Communication: whispered speech-to-clear speech conversion; Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease; Seeing Speech: Capturing Vocal Tract Shaping Using Real-Time Magnetic Resonance Imaging; Speech production, syntax comprehension, and cognitive deficits in Parkinson's disease; Speech production knowledge in automatic speech recognition; Knowledge from Speech Production Used in Speech Technology: Articulatory Synthesis; Speech Production and Speech Modelling	Notes
Feb 24	Introduction to linear prediction. Filter analogy. Orthogonality principle, Solution to LP. Yule-Walker system of equations. Properties of LP filter - Minimum Phase. Reference: Chapter 2 of Theory of Linear Prediction	Notes
Feb 26	-	Midterm #1
Mar 2	Relationship between eigenvalues of autocorrelation matrix and power spectrum. Augmented normal equations, Line Spectral Processes. Reference: Chapter 2 of Theory of Linear Prediction	Notes
Mar 4	Autocorrelation matrix estimation, Estimation of LP coefficients using Levinson-Durbin recursion. Reflection coefficients. Properties of Error Stalling. White residual signal. Reference: Chapter 3 of Theory of Linear Prediction	Notes
Mar 9	AR (N) processes, relationship to linear prediction. AR approximation of a WSS sequence. Spectral estimation using linear prediction. Applications of LP for speech processing. Reference: Chapter 5 of Theory of Linear Prediction	Notes
Mar 11	Spectral transform linear prediction. Perceptual Linear Prediction. Comparing speech sequences. Time alignment and Normalization. Dynamic Programming, PLP paper	Notes

Your Voice/files to upload:

Click to upload (max 10 MB)

Transcripts for recording:

Click here

Your feedback on this course (any time):

Click here

Academic Honesty:

As students of IISc, you are expected to adhere to the highest standards of academic honesty and integrity.
Please read the IISc academic integrity policy.