E9 261 (JAN) 3:1 Speech Information Processing
January-April, 2016

Announcements:
December 28, 2015: First lecture will be held in EE 307 on January 4, 2016 (Monday) at 4:00pm.
January 11, 2016: Homework# 1 is due on January 18, 2016 (Monday) at 4:00pm.
January 25, 2016: Homework# 2 is due on February 1, 2016 (Monday) at 4:00pm.
January 29, 2016: Every Friday's class will now be held from 1:30pm to 2:30pm.
February 12, 2016: First midterm will be held from 1:30pm to 3:00pm at EE307 on Feb 19, 2016 (Friday).
February 16, 2016: Homework# 3 is due on February 18, 2016 (Thursday) at 6:00pm.
April 1, 2016: Second midterm will be held from 4:00pm to 5:30pm at EE307 on April 4, 2016 (Monday).
April 1, 2016: Final examination will be held from 2:00pm to 5:00pm at EE307 on April 22, 2016 (Friday).
April 1, 2016: Project interim report should be submitted on or before April 15, 2016 (Friday).
April 1, 2016: Final project report submission and presentation will be held on May 2, 2016 (Monday) from 2pm.
April 13, 2016: There will be a take-home exam on April 15, 2016, and the answer sheets need to be turned in by 12 noon on April 18, 2016.


Instructor:
Prasanta Kumar Ghosh
Office: EE 320
Phone: +91 (80) 2293 2694
prasantg AT ee.iisc.ernet.in


Sriram Ganapathy
Office: EE 322
Phone: +91 (80) 2293 2433
sriram AT ee.iisc.ernet.in
Teaching Assistant(s):


Class meetings:
4:00pm to 5:00pm every Monday, Wednesday and 1:30pm to 2:30pm every Friday (Venue: EE 307)


Course Content:
  • Speech communication and overview
  • Time-varying signals and systems
  • Spectrograms and applications
  • Speech parameterization/representation
  • AR, ARMA, sinusoidal and time-varying models for speech
  • Speech acoustic modeling methods
  • Neural Network models in speech
  • Speech recognition systems
  • Speech Applications


Prerequisites:
Digital Signal Processing, Probability and Random Processes


Textbooks:
    • Fundamentals of Speech Recognition, Lawrence R. Rabiner and Biing-Hwang Juang, Prentice Hall, 1993.
    • Automatic Speech Recognition: A Deep Learning Approach, Dong Yu and Li Deng, Springer, 2014.
    • Discrete-Time Speech Signal Processing: Principles and Practice, Thomas F. Quatieri, Prentice Hall, 2001.
    • Digital Processing of Speech Signals, Lawrence R. Rabiner and Ronald W. Schafer, Pearson Education, 2008.


Web Links:
The Edinburgh Speech Tools Library
Speech Signal Processing Toolkit (SPTK)
Hidden Markov Model Toolkit (HTK)
ICSI Speech Group Tools
VOICEBOX: Speech Processing Toolbox for MATLAB
Praat: doing phonetics by computer
Audacity
SoX - Sound eXchange
HMM-based Speech Synthesis System (HTS)
International Phonetic Association (IPA)
Type IPA phonetic symbols
CMU dictionary
Co-articulation and phonology by Ohala
Assisted Listening Using a Headset
Headphone-Based Spatial Sound
Pitch Perception
Head-Related Transfer Functions and Virtual Auditory Display
Signal reconstruction from STFT magnitude: a state of the art
On the usefulness of STFT phase spectrum in human listening tests
Experimental comparison between stationary and nonstationary formulations of linear prediction applied to voiced speech analysis
A modified autocorrelation method of linear prediction for pitch-synchronous analysis of voiced speech
Linear prediction: A tutorial review
Energy separation in signal modulations with application to speech analysis
Nonlinear Speech Modeling and Applications


Grading:
  • Surprise exams (5 points) - 6 surprise exams, 10 minutes and 5 points each; the average of the six scores will be considered. No make-up exams; a missed exam earns 0 points. Class attendance is mandatory, and an unexcused absence earns a score of zero on any surprise exam held that session.
  • Assignments (5 points) - 6 assignments; the average of all assignments will be considered. Assignments are meant for learning and preparation for exams. Students may discuss homework problems among themselves, but each student must do his or her own work; turning in identical homework sets counts as cheating, and cheating or any other violation of academic integrity (see below) will result in failing the course.
  • Midterm exams (20 points) - 2 midterm exams; the average of the two scores will be considered. No make-up exams; a missed exam earns 0 points.
  • Final exam. (50 points)
  • Project (20 points) - Quality/Quantity of work (5 points), Report (5 points), Presentation (5 points), Recording (5 points).


Topics covered:
(Each entry lists the date, the topics covered, and remarks/materials.)
Jan 4
Course logistics, information in speech, speech chain, speech research - science and technology
Introductory lecture
Part 1, Part 2, Part 3
Jan 6
Phonemes, allophones, diphones, morphemes, lexicon, consonant cluster, summary of phonetics and phonology, manner and place of articulation, intonation, stress, IPA, ARPABET, Co-articulation, Assimilation, Elision.
Notes
Jan 8
Speech production models, Human auditory system, auditory modeling, Cochlear signal processing, Speech perception theories, Fletcher-Munson curve, Perceptual unit of loudness.
Notes
Jan 11
Pitch perception, timbre, masking, critical band, Bark scale, HRTF, distorted speech perception.
Notes
HW# 1
Jan 13
Time-varying signal, time-varying system, temporal and frequency resolution, short-time Fourier transform (STFT), properties of STFT, inverse STFT.
Notes
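
An illustrative NumPy sketch of the analysis STFT discussed above; the 25 ms Hamming window, 10 ms hop, and 512-point FFT are typical choices, not values prescribed by the course:

    import numpy as np

    def stft(x, win, hop, nfft):
        """Short-time Fourier transform: window each frame, then take its DFT."""
        frames = []
        for start in range(0, len(x) - len(win) + 1, hop):
            frames.append(np.fft.rfft(x[start:start + len(win)] * win, nfft))
        return np.array(frames)                 # shape: (num_frames, nfft//2 + 1)

    # Example: one second of a 440 Hz tone at 16 kHz
    fs = 16000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 440 * t)
    X = stft(x, np.hamming(int(0.025 * fs)), int(0.010 * fs), 512)
    print(X.shape)                              # (num_frames, 257)
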
Jan 15
Filtering and Filterbank Interpretation of STFT, Filter Bank Synthesis and Introduction to Overlap Add method.
Notes
Jan 18
Overlap Add method, reconstruction from STFT magnitude, Wideband and Narrowband spectrogram, Spectrograms of different sounds -- vowel, fricative, semivowel, nasal, stops, spectrogram reading, formants, pattern playback.
Notes
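
A matching synthesis sketch using weighted overlap-add with squared-window normalization; it assumes the stft() sketch above, and is one common formulation rather than the only one covered in class:

    import numpy as np

    def istft_ola(X, win, hop, nfft):
        """Inverse STFT by overlap-add with squared-window normalization."""
        n = (len(X) - 1) * hop + len(win)
        y = np.zeros(n)
        norm = np.zeros(n)
        for m, spec in enumerate(X):
            frame = np.fft.irfft(spec, nfft)[:len(win)]   # undo DFT, drop zero padding
            y[m * hop : m * hop + len(win)] += frame * win
            norm[m * hop : m * hop + len(win)] += win ** 2
        norm[norm < 1e-10] = 1.0                          # guard the frame edges
        return y / norm

    # y = istft_ola(X, np.hamming(400), 160, 512) recovers x up to edge effects
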
Jan 20
Spectrogram reading, weighted overlap add method, spectrogram re-assignment, speech denoising, time-scale modification.
Notes
Jan 22
Time-frequency representation, time-bandwidth product, Gabor transform, time-frequency tile, auditory filterbank, auditory filter modeling, wavelet based auditory filter, auditory model.
Notes
Surprise Test 1
Jan 25
Time-varying parameters in speech, using Praat to estimate time-varying parameters, homomorphic filtering, cepstrum, properties of cepstrum, uniqueness of cepstrum.
Notes
HW# 2
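
A small sketch of the real cepstrum and low-time liftering discussed above, assuming a single windowed speech frame; the cutoff of 30 quefrency bins is an illustrative value:

    import numpy as np

    def real_cepstrum(frame, nfft=1024):
        """c[n] = IDFT(log |DFT(x)|): the log-magnitude spectrum in quefrency."""
        mag = np.abs(np.fft.rfft(frame, nfft))
        return np.fft.irfft(np.log(np.maximum(mag, 1e-10)), nfft)

    def cepstral_envelope(frame, cutoff=30, nfft=1024):
        """Low-time liftering keeps the slowly varying vocal-tract envelope."""
        c = real_cepstrum(frame, nfft)
        lifter = np.zeros(nfft)
        lifter[:cutoff] = 1.0
        lifter[-(cutoff - 1):] = 1.0            # keep the symmetric mirror half too
        return np.real(np.fft.rfft(c * lifter, nfft))   # smoothed log spectrum
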
Jan 27
Motivation for separating the excitation from the vocal tract response using the cepstrum; derivation of the cepstrum for a pole-zero transfer function, a periodic impulse train, and white noise; liftering, homomorphic vocoder, mel-frequency cepstral coefficients.
Notes
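
A compact sketch of MFCC extraction as outlined above (mel filterbank on the power spectrum, log, then DCT); the 26 filters and 13 coefficients are common defaults, not values fixed by the lecture:

    import numpy as np
    from scipy.fftpack import dct

    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    def mel_filterbank(n_filt, nfft, fs):
        """Triangular filters with centers uniformly spaced on the mel scale."""
        mels = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filt + 2)
        bins = np.floor((nfft + 1) * mel_to_hz(mels) / fs).astype(int)
        fbank = np.zeros((n_filt, nfft // 2 + 1))
        for i in range(1, n_filt + 1):
            l, c, r = bins[i - 1], bins[i], bins[i + 1]
            fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
            fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
        return fbank

    def mfcc(frame, fs=16000, nfft=512, n_filt=26, n_ceps=13):
        power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)), nfft)) ** 2
        logE = np.log(np.maximum(mel_filterbank(n_filt, nfft, fs) @ power, 1e-10))
        return dct(logE, type=2, norm='ortho')[:n_ceps]   # DCT decorrelates log energies
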
Jan 29
All-pole model, Linear prediction, autocorrelation and covariance formulation, gain computation, effect of LP order on spectral modeling, illustration of LPC for voiced and unvoiced segments, comparison of spectral envelope from cepstral smoothing and linear prediction, LPC synthesis.
Notes
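
A sketch of the autocorrelation formulation of linear prediction from this lecture, solving the normal equations directly (the Levinson-Durbin recursion, covered later, does the same thing more efficiently):

    import numpy as np

    def lpc_autocorr(frame, order):
        """Autocorrelation method: solve R a = r for the predictor coefficients."""
        r = np.correlate(frame, frame, mode='full')[len(frame) - 1:]
        R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
        a = np.linalg.solve(R, r[1:order + 1])
        gain = np.sqrt(r[0] - a @ r[1:order + 1])   # residual power gives the gain
        return a, gain

    def lp_envelope(a, gain, nfft=512):
        """LP spectral envelope: gain / |A(e^jw)| with A(z) = 1 - sum_k a_k z^-k."""
        return gain / np.abs(np.fft.rfft(np.concatenate(([1.0], -a)), nfft))
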
Feb 1
Properties of linear prediction - global and local spectral properties, linear prediction using continuous and discrete spectra, minimum error, LP order selection, Levinson-Durbin recursion, Burg method, example spectra computed using the autocorrelation normal equations (ANE) and the Burg method.
Notes
Feb 3
Levinson-Durbin recursion, Burg method, forward and backward prediction, order recursion, lattice formulation, stability condition, ARMA model.
Notes
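
A sketch of the Levinson-Durbin order recursion from these lectures; it returns the reflection (PARCOR) coefficients alongside the predictor, and |k_i| < 1 for all i is exactly the stability condition mentioned above:

    import numpy as np

    def levinson_durbin(r, order):
        """Order-recursive solution of the Toeplitz normal equations in O(p^2)."""
        a = np.zeros(order)                 # a[j-1] holds predictor coefficient a_j
        k = np.zeros(order)                 # reflection (PARCOR) coefficients
        e = r[0]                            # order-0 prediction-error power
        for i in range(order):
            k[i] = (r[i + 1] - a[:i] @ r[i:0:-1]) / e
            a_prev = a[:i].copy()
            a[i] = k[i]
            a[:i] = a_prev - k[i] * a_prev[::-1]
            e *= 1.0 - k[i] ** 2            # error power shrinks with each order
        return a, k, e
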
Feb 5
Warped LP, ARMA models, autocorrelation domain relation, sequential solution of ARMA, Itakura-Saito distance, different assessment methods for speech enhancement, multi-taper cepstrum estimation, mel generalized cepstrum.
Notes
Feb 8
Mel generalized cepstrum, quantization of LPC parameters, reflection coefficients, log area ratios, line spectral frequencies, mixed excitation linear prediction (MELP), multi-pulse linear prediction (MP-LP), code excited linear prediction (CELP) speech coders.
Notes
Surprise Test 2
Feb 10
Time-varying signal models, time-varying linear prediction, time-varying ARMA, AM-FM model, non-linear models, signal subspace approach.

Feb 12
Sinusoidal model, its applications, Chirp model, short-time chirp transform, mixture Gaussian envelope chirp model, group delay analysis.

Feb 15
Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?
The evolution of the Lombard effect: 100 years of psychoacoustic research
Reverberant Speech Enhancement Using Cepstral Processing
Enhancement of Reverberant Speech Using LP Residual Signal
Reverberant Speech Enhancement by Temporal and Spectral Processing
Joint Dereverberation and Noise Reduction Using Beamforming and a Single-Channel Speech Enhancement Scheme
Acoustic characteristics related to the perceptual pitch in whispered vowels
A Comprehensive Vowel Space for Whispered Speech
Fundamental Frequency Generation for Whisper-to-Audible Speech Conversion
Silent Communication: whispered speech-to-clear speech conversion
Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease
Seeing Speech: Capturing Vocal Tract Shaping Using Real-Time Magnetic Resonance Imaging
Speech production, syntax comprehension, and cognitive deficits in Parkinson's disease
Speech production knowledge in automatic speech recognition
Knowledge from Speech Production Used in Speech Technology: Articulatory Synthesis
Speech Production and Speech Modelling

Surprise Test 3
HW# 3
Feb 17
Revision of the topics covered so far.

Feb 19
-
Midterm# 1
Feb 22
Gaussian distribution, Maximum likelihood estimation, MLE for the Gaussian distribution, Gaussian Mixture Models (GMM), Expectation Maximization (EM) algorithm, EM for GMMs.
Notes
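
A sketch of EM for a diagonal-covariance GMM as derived above; initializing the means from random data points is a simplification (K-means initialization, covered in the next lecture, is the more common choice):

    import numpy as np

    def gmm_em(X, n_comp, n_iter=50, seed=0):
        """EM for a diagonal-covariance GMM: E-step posteriors, M-step updates."""
        rng = np.random.default_rng(seed)
        n, d = X.shape
        mu = X[rng.choice(n, n_comp, replace=False)]   # crude initialization
        var = np.tile(X.var(axis=0), (n_comp, 1))
        w = np.full(n_comp, 1.0 / n_comp)
        for _ in range(n_iter):
            # E-step: log w_k + log N(x | mu_k, var_k), normalized to posteriors
            logp = (np.log(w)
                    - 0.5 * np.log(2 * np.pi * var).sum(axis=1)
                    - 0.5 * (((X[:, None, :] - mu) ** 2) / var).sum(axis=2))
            logp -= logp.max(axis=1, keepdims=True)    # stabilize before exp
            gamma = np.exp(logp)
            gamma /= gamma.sum(axis=1, keepdims=True)
            # M-step: soft counts, weighted means and variances
            nk = gamma.sum(axis=0)
            w = nk / n
            mu = (gamma.T @ X) / nk[:, None]
            var = (gamma.T @ X ** 2) / nk[:, None] - mu ** 2 + 1e-6
        return w, mu, var
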
Feb 24
GMM-EM reestimation formulae, Gaussian model and GMM application to data, K-means clustering.
Time alignment and normalization, synchronous and asynchronous dynamic programming algorithms. [Textbook: Rabiner and Juang, pp. 191-217].
Notes
Reference1
Reference2
Feb 26
Dynamic time warping, path constraints, local and global constraints, slope weight, DTW recursion.
Notes
Reference1
Reference2
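
A sketch of the DTW recursion with the simplest local continuity constraint (left, down, diagonal); the slope weights and global band constraints from the lecture are omitted for brevity:

    import numpy as np

    def dtw(A, B):
        """DTW distance between two feature sequences A (n x d) and B (m x d)."""
        n, m = len(A), len(B)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(A[i - 1] - B[j - 1])     # local distance
                D[i, j] = d + min(D[i - 1, j],              # vertical move
                                  D[i, j - 1],              # horizontal move
                                  D[i - 1, j - 1])          # diagonal move
        return D[n, m]
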
Feb 29
Introduction to Markov Models, Extension to Hidden Markov Models (HMMs). Definition of model parameters, Three problems of HMMs - Likelihood computation, optimal state sequence estimation and Parameter reestimation.
Notes
Reference1
Mar 2
Solutions to Problems I and II of HMMs - Likelihood computation using the forward and backward variables. Viterbi algorithm for best state sequence determination.
Reference1
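
Sketches of the forward recursion and Viterbi decoding for a discrete-observation HMM; pi is the initial distribution, A the transition matrix, B the emission matrix, all assumed strictly positive here so the logarithms are safe:

    import numpy as np

    def forward_likelihood(pi, A, B, obs):
        """P(O | model) via the forward variable alpha[t, j]."""
        alpha = pi * B[:, obs[0]]
        for o in obs[1:]:
            alpha = (alpha @ A) * B[:, o]                  # induction step
        return alpha.sum()

    def viterbi(pi, A, B, obs):
        """Best state sequence: forward recursion with max in place of sum."""
        delta = np.log(pi) + np.log(B[:, obs[0]])
        psi = []
        for o in obs[1:]:
            scores = delta[:, None] + np.log(A)            # (from-state, to-state)
            psi.append(scores.argmax(axis=0))
            delta = scores.max(axis=0) + np.log(B[:, o])
        path = [int(delta.argmax())]
        for bp in reversed(psi):
            path.append(int(bp[path[-1]]))
        return path[::-1]
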
Mar 4
Solution to Problem III of HMMs - Parameter re-estimation using the EM algorithm - Baum-Welch re-estimation. Estimating the Q function (E-step) and maximizing the Q function (M-step) to iteratively update the parameters.

HW# 4
Mar 7
Continuous density HMMs (HMM-GMMs). Parameter re-estimation using EM algorithm. Types of HMMs - (Ergodic versus left-to-right HMMs). Implementation issues - scaling.
Notes

Mar 9
Duration modeling in HMMs, Initialization using segmental K-means, Connected Word Models, Introducing null/non-emission states.
Notes

Mar 11
Re-estimating forward and backward variables with non-emitting states, Viterbi algorithm with non-emitting states. Parameter reestimation using multiple trials.
Notes
CMU reading material

Mar 14
Dealing with silence, need for sub-word modeling, Multiple pronunciations, Alternate models of phonetic states using neural networks, Hybrid models, Introduction to neural networks, Need for hidden units and non-linearity, solving the XOR problem. Deep Neural Networks.
External reading

Mar 16
How does a neural network estimate posterior probabilities? Why do we need deep neural networks (DNNs)? Types of non-linearities and their properties. Choice of cost function. Regression versus classification using DNNs.
External reading1 External reading2

Mar 18
Learning the parameters of a DNN using the error backpropagation (BP) algorithm. Equivalence between the cross-entropy and mean square error cost functions in BP learning.
Slide External reading
Surprise Test 4
HW# 5
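
A sketch of one BP step for a single-hidden-layer network with a softmax output and cross-entropy loss, where the gradient at the output simplifies to (P - Y); the tanh non-linearity and learning rate are illustrative choices:

    import numpy as np

    def bp_step(X, Y, W1, b1, W2, b2, lr=0.1):
        """One gradient step; X is (batch, d_in), Y is one-hot (batch, d_out)."""
        # forward pass
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)    # stabilize softmax
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # backward pass: softmax + cross-entropy gives dL/dlogits = (P - Y)/batch
        dlogits = (P - Y) / len(X)
        dW2, db2 = H.T @ dlogits, dlogits.sum(axis=0)
        dH = dlogits @ W2.T
        dpre = dH * (1.0 - H ** 2)                     # tanh'(z) = 1 - tanh(z)^2
        dW1, db1 = X.T @ dpre, dpre.sum(axis=0)
        for p, g in ((W1, dW1), (b1, db1), (W2, dW2), (b2, db2)):
            p -= lr * g                                # in-place gradient step
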

Mar 26
MATLAB discussion on MFCC features, standardization of features, GMM training and likelihood computation using diagonal and full covariance.

Mar 28
Practical Considerations in DNN implementation. Data preprocessing, Model initialization, Overfitting and Underfitting, Dropout and Weight Regularization. Learning parameters with regularization. Batch size selection - mini batch training and stochastic gradient descent (SGD).
Slide
Surprise Test 5
HW# 6
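
A sketch of the mini-batch SGD loop discussed above; grad_fn is a hypothetical callback that returns a dict of gradients matching the params dict, and the batch size and learning rate are illustrative:

    import numpy as np

    def sgd(X, Y, params, grad_fn, lr=0.01, batch_size=64, n_epochs=10, seed=0):
        """Mini-batch SGD: shuffle every epoch, update on small batches."""
        rng = np.random.default_rng(seed)
        for _ in range(n_epochs):
            order = rng.permutation(len(X))            # fresh shuffle each epoch
            for start in range(0, len(X), batch_size):
                idx = order[start:start + batch_size]
                grads = grad_fn(X[idx], Y[idx], params)
                for name in params:
                    params[name] -= lr * grads[name]   # plain SGD step
        return params
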

Mar 30
Restricted Boltzmann Machine - definition of free energy and probability density function with RBMs, conditional independence, equivalence with sigmoidal DNN, Properties of Gaussian-Bernoulli RBM, similarities with GMMs. Applications of RBM. Deep Belief Networks (DBNs) as initialization for DNNs. Denoising Autoencoders. Discriminative layerwise pretraining techniques.
External reading
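
A sketch of one contrastive-divergence (CD-1) update for a Bernoulli-Bernoulli RBM, the usual training rule behind the RBM and DBN pretraining described above; V is a batch of binary visible vectors:

    import numpy as np

    def cd1_step(V, W, b, c, lr=0.01, rng=np.random.default_rng(0)):
        """W: (n_vis, n_hid); b, c: visible and hidden biases."""
        sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
        ph = sigmoid(V @ W + c)                        # positive phase
        h = (rng.random(ph.shape) < ph).astype(float)  # sample hidden states
        pv = sigmoid(h @ W.T + b)                      # one Gibbs step down...
        ph2 = sigmoid(pv @ W + c)                      # ...and back up
        # CD-1 approximation to the log-likelihood gradient
        W += lr * (V.T @ ph - pv.T @ ph2) / len(V)
        b += lr * (V - pv).mean(axis=0)
        c += lr * (ph - ph2).mean(axis=0)
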

Mar 31
Hybrid HMM-ANN models. Decoding word sequences using DNN-HMM systems. Language modeling with N-grams, backoff and smoothing. Recurrent neural networks, parameter learning with backpropagation, vanishing and exploding gradients, Introduction to long short term memory (LSTM) networks. Introduction to convolutional neural networks.
Slide External example
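
A sketch of a bigram language model with add-one smoothing, a simple stand-in for the N-gram backoff and smoothing schemes mentioned above:

    from collections import Counter

    def bigram_lm(sentences):
        """Returns a smoothed P(w | w_prev) over tokenized training sentences."""
        unigrams, bigrams = Counter(), Counter()
        for sent in sentences:
            words = ['<s>'] + sent + ['</s>']
            unigrams.update(words)
            bigrams.update(zip(words, words[1:]))
        V = len(unigrams)                              # vocabulary size
        def prob(w_prev, w):
            return (bigrams[(w_prev, w)] + 1) / (unigrams[w_prev] + V)
        return prob

    p = bigram_lm([['the', 'cat'], ['the', 'dog']])
    print(p('the', 'cat'))                             # 2/7, smoothed
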

April 4
-
Midterm# 2

April 6
-
Discussion on Midterm# 2

April 11
Introduction to i-vectors. Summary of the applications - speech recognition, speaker recognition, language identification and speech activity detection.
Slide




Academic Honesty:
As students of IISc, we expect you to adhere to the highest standards of academic honesty and integrity.
Please read the IISc academic integrity policy.