Announcements:
|
January 1, 2019: First lecture will be held in EE B 308 on January 7, 2019 (Monday) at 3:30pm.
January 1, 2019: Please send an email to the instructors (before January 14, 2019) if you are interested in attending the course. In the email, include your complete name, SR No, department, program, and credit/audit status. The subject line of your email must contain E9_261_2019.
February 18, 2019: First midterm will be held in EE B 308 on February 22, 2019 (Friday) at 3:30pm.
March 28, 2019: Project proposal submission deadline is March 29, 2019. Format: one paragraph on the topic (see the list of potential projects) and one paragraph on prior work from the reference paper(s).
March 28, 2019: Project midterm evaluation (April 12, 2019): a 5-minute presentation per student detailing the student's progress.
March 28, 2019: Project final presentation (May 6, 2019): a 10-minute presentation per student detailing the work done. Mark distribution: reproduction of the reference paper (70%), novelty (30%).
March 28, 2019: Second midterm: April 5, 2019 (2:00pm-3:30pm) in class.
March 28, 2019: Final exam: April 24, 2019, afternoon (2:00pm-5:00pm).
April 10, 2019: A short evaluation of your course project will take place from 2:00pm to 2:30pm on April 12, 2019 in the regular classroom.
|
Instructors:
|
Prasanta Kumar Ghosh
Office: EE C 330
Phone: +91 (80) 2293 2694
prasantg AT iisc.ac.in
Sriram Ganapathy
Office: EE C 334
Phone: +91 (80) 2293 2433
sriramg AT iisc.ac.in
|
Teaching Assistant(s):
|
|
Class meetings:
|
3:30pm to 5:00pm every Monday and Wednesday (Venue: EE B 308)
|
Course Content:
|
- Speech communication and overview
- Time-varying signals and systems
- Spectrograms and applications
- Speech parameterization/representation
- AM-FM, sinusoidal models for speech
- Linear Prediction, AR and ARMA modeling of speech.
- Sequence Modeling of Speech - Dynamic Time Warping, Introduction to Hidden Markov Models
- Deep learning for Sequence Modeling - Recurrent neural networks, attention-based models.
- Speech applications - Automatic speech recognition.
|
Prerequisites:
|
Digital Signal Processing, Probability and Random Processes
|
Textbooks:
|
- Fundamentals of Speech Recognition, Lawrence R. Rabiner and Biing-Hwang Juang, Prentice Hall, 1993.
- Automatic Speech Recognition: A Deep Learning Approach, Dong Yu and Li Deng, Springer, 2014.
- Discrete-Time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri, Prentice Hall, 2001.
- Digital Processing of Speech Signals, Lawrence R. Rabiner and Ronald W. Schafer, Pearson Education, 2008.
|
Web Links:
|
The Edinburgh Speech Tools Library
Speech Signal Processing Toolkit (SPTK)
Hidden Markov Model Toolkit (HTK)
ICSI Speech Group Tools
VOICEBOX: Speech Processing Toolbox for MATLAB
Praat: doing phonetics by computer
Audacity
SoX - Sound eXchange
HMM-based Speech Synthesis System (HTS)
International Phonetic Association (IPA)
Type IPA phonetic symbols
CMU dictionary
Co-articulation and phonology by Ohala
Assisted Listening Using a Headset
Headphone-Based Spatial Sound
Pitch Perception
Head-Related Transfer Functions and Virtual Auditory Display
Signal reconstruction from STFT magnitude: a state of the art
On the usefulness of STFT phase spectrum in human listening tests
Experimental comparison between stationary and nonstationary formulations of linear prediction applied to voiced speech analysis
A modified autocorrelation method of linear prediction for pitch-synchronous analysis of voiced speech
Linear prediction: A tutorial review
Energy separation in signal modulations with application to speech analysis
Nonlinear Speech Modeling and Applications
|
Grading:
|
- Assignments including recording (10 points) - 6 assignments. The average of all assignments will be considered. Assignments will include associated recordings. Cheating or violating academic integrity (see below) will result in failing the course. Turning in identical homework sets counts as cheating.
- Midterm exams (20 points) - 2 midterm exams. Missed exams earn 0 points. No make-up exams. The average of the midterm scores will be considered.
- Final exam (50 points)
- Project (20 points) - Quality/Quantity of work (5 points), Report (5 points), Presentation (5 points), Recording (5 points).
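As a rough illustration of how the components above combine into a total out of 100, here is a minimal Python sketch. The point weights come from the grading list; the function name, the dictionary keys, and the convention that every score is entered as a fraction between 0 and 1 are assumptions made for this example only, and the instructors' actual computation may differ.

```python
# Point weights from the grading list above.
# Assumption: every score is given as a fraction in [0, 1].
WEIGHTS = {"assignments": 10, "midterms": 20, "final": 50}
PROJECT_PARTS = {"work": 5, "report": 5, "presentation": 5, "recording": 5}


def mean(values):
    """Plain arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def course_total(assignments, midterms, final, project):
    """Return a total out of 100 (hypothetical helper, not an official formula).

    assignments: list of assignment scores (averaged)
    midterms:    list of midterm scores (averaged; a missed exam is entered as 0.0)
    final:       final exam score
    project:     dict with one score per key of PROJECT_PARTS
    """
    total = WEIGHTS["assignments"] * mean(assignments)
    total += WEIGHTS["midterms"] * mean(midterms)
    total += WEIGHTS["final"] * final
    total += sum(points * project[part] for part, points in PROJECT_PARTS.items())
    return total
```

For example, a student with full marks everywhere except one missed midterm would pass `midterms=[1.0, 0.0]`, which halves the midterm component from 20 to 10 points.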
|
Topics covered:
|
Date | Topics | Remarks
Jan 7 | Course logistics, information in speech, speech chain, speech research - science and technology | Introductory lecture
Jan 9 | Phonemes, allophones, diphones, morphemes, lexicon, consonant clusters | IPA, ARPABET, grapheme-to-phoneme conversion
Jan 14 | Summary of phonetics and phonology, manner and place of articulation, intonation, stress, co-articulation, assimilation, elision, speech production models, formants, human auditory system, auditory modeling, cochlear signal processing | Notes
Jan 16 | Speech perception theories, Fletcher-Munson curve, perceptual unit of loudness, pitch perception, timbre, masking, critical band, Bark scale, HRTF, categorical perception | Notes
Jan 21 | McGurk effect, distorted speech perception, time-varying signal, time-varying system, temporal and frequency resolution | -
Jan 23 | Short-time Fourier transform (STFT), properties of STFT, inverse STFT | Notes
Jan 28 | Overlap-add method, reconstruction from STFT magnitude, wideband and narrowband spectrograms, spectrograms of different sounds: vowel, fricative, semivowel, nasal, stops | Notes
Jan 30 | Spectrogram reading, formants, pattern playback, weighted overlap-add method, spectrogram reassignment, speech denoising, time-scale modification | Notes
Feb 4 | Time-frequency representation, time-bandwidth product, Gabor transform, time-frequency tile, auditory filterbank, auditory filter modeling, wavelet-based auditory filter, auditory model | Notes
Feb 6 | Homomorphic filtering, cepstrum, properties of cepstrum, uniqueness of cepstrum, motivation for extracting the excitation and vocal tract response using the cepstrum | Notes
Feb 11 | Derivation of the cepstrum for a pole-zero transfer function and for a periodic impulse train | Notes
Feb 13 | Derivation of the cepstrum for white noise, liftering, homomorphic vocoder, mel-frequency cepstral coefficients | Notes
Feb 18 | AM-FM model, non-linear models, signal subspace approach, sinusoidal model and its applications, chirp model, short-time chirp transform, mixture Gaussian envelope chirp model, group delay analysis. Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?; The evolution of the Lombard effect: 100 years of psychoacoustic research; Reverberant Speech Enhancement Using Cepstral Processing; Enhancement of Reverberant Speech Using LP Residual Signal; Reverberant Speech Enhancement by Temporal and Spectral Processing; Joint Dereverberation and Noise Reduction Using Beamforming and a Single-Channel Speech Enhancement Scheme; Acoustic characteristics related to the perceptual pitch in whispered vowels; A Comprehensive Vowel Space for Whispered Speech; Fundamental Frequency Generation for Whisper-to-Audible Speech Conversion; Silent Communication: whispered speech-to-clear speech conversion; Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease; Seeing Speech: Capturing Vocal Tract Shaping Using Real-Time Magnetic Resonance Imaging; Speech production, syntax comprehension, and cognitive deficits in Parkinson's disease; Speech production knowledge in automatic speech recognition; Knowledge from Speech Production Used in Speech Technology: Articulatory Synthesis; Speech Production and Speech Modelling | Notes
Feb 22 | - | Midterm #1
Feb 25 | Introduction to linear prediction (LP), LP as a filtering problem, orthogonality principle, optimal linear predictor, Yule-Walker equations, properties of the autocorrelation matrix. Reference: Chapter 2 of Theory of Linear Prediction | Notes
Feb 27 | Relationship between eigenvalues of the autocorrelation matrix and the power spectrum. Augmented normal equations, line spectral processes. Reference: Chapter 2 of Theory of Linear Prediction | -
Mar 6 | Autocorrelation matrix estimation, estimation of LP coefficients using the Levinson-Durbin recursion. Reflection coefficients. Properties of error stalling. White residual signal. Reference: Chapter 3 of Theory of Linear Prediction | -
Mar 11 | AR(N) processes, relationship to linear prediction. AR approximation of a WSS sequence. Spectral estimation using linear prediction. Reference: Chapter 5 of Theory of Linear Prediction | -
Mar 13 | Spectral estimation using AR modeling. Spectral transform linear prediction. Speech feature extraction using linear prediction, perceptual linear prediction (PLP) features | Notes
Mar 18 | Time alignment and normalization of two sequences of variable length. Dynamic programming principles - recursive optimization in sequential problems. Introduction to dynamic time warping. Reference: Rabiner and Juang, Fundamentals of Speech Recognition, Chapter 4 | Notes
Mar 22 | Dynamic time warping (DTW) - end point constraints, local and global constraints. Optimization algorithm for DTW. Applications of DTW for speech signal processing. Reference: Rabiner and Juang, Fundamentals of Speech Recognition, Chapter 4 | -
Mar 25 | Introduction to Hidden Markov Models (HMMs). Definition of HMM. Three problems in HMM. Likelihood estimation - brute force and forward/backward modeling | HMM Tutorial
Mar 27 | Problem of state alignment - Viterbi decoding. HMM model training (Problem 3) for the discrete case. Application to word recognition in speech. Deep neural networks, supervised and unsupervised learning. Multi-layer perceptrons. Hidden layer activations and output layer non-linearities | -
Apr 1 | Connected word models - HMM with non-emitting states. Decoding over connected HMMs. Examples | Notes; extra reading material
Apr 8 | Neural networks in speech recognition. Multi-layer perceptron, gradient-based learning. Convergence issues in neural networks. Activation functions and output activation. Estimating posteriors with neural networks | Notes; extra reading material
Apr 10 | Deep neural networks. Hierarchical feature processing. Applying neural networks for speech recognition. State-of-the-art speech recognition engine | -
|
Your Voice/files to upload:
|
Click to upload (max 10 MB)
|
Transcripts for recording:
|
Click here
|
Your feedback on this course (any time):
|
Click here
|
Academic Honesty:
|
As students of IISc, you are expected to adhere to the highest standards of academic honesty and integrity.
Please read the IISc academic integrity guidelines.
|