Announcements:
|
December 28, 2015: First lecture will be held in EE 307 on January 4, 2016 (Monday) at 4:00pm.
January 11, 2016: Homework# 1 is due on January 18, 2016 (Monday) at 4:00pm.
January 25, 2016: Homework# 2 is due on February 1, 2016 (Monday) at 4:00pm.
January 29, 2016: Every Friday's class will now be held from 1:30pm to 2:30pm.
February 12, 2016: First midterm will be held from 1:30pm to 3:00pm in EE 307 on February 19, 2016 (Friday).
February 16, 2016: Homework# 3 is due on February 18, 2016 (Thursday) at 6:00pm.
April 1, 2016: Second midterm will be held from 4:00pm to 5:30pm in EE 307 on April 4, 2016 (Monday).
April 1, 2016: Final examination will be held from 2:00pm to 5:00pm in EE 307 on April 22, 2016 (Friday).
April 1, 2016: Project interim report should be submitted on or before April 15, 2016 (Friday).
April 1, 2016: Final project report submission and presentation will be held on May 2, 2016 (Monday) from 2pm.
April 13, 2016: There will be a take-home exam on April 15, 2016, and the answer sheets need to be turned in by 12 noon on April 18, 2016.
|
Topics covered:
|
Date
|
Topics
|
Remarks
|
Jan 4
|
Course logistics, information in speech, speech chain, speech research - science and technology
|
Introductory lecture
Part 1, Part 2, Part 3
|
Jan 6
|
Phonemes, allophones, diphones, morphemes, lexicon, consonant cluster, summary of phonetics and phonology, manner and place of articulation, intonation, stress, IPA, ARPABET, Co-articulation, Assimilation, Elision.
|
Notes
|
Jan 8
|
Speech production models, Human auditory system, auditory modeling, Cochlear signal processing, Speech perception theories, Fletcher Munson curve, Perceptual unit of loudness.
|
Notes
|
Jan 11
|
Pitch perception, timbre, masking, critical band, Bark scale, HRTF, distorted speech perception.
|
Notes HW# 1
|
Jan 13
|
Time-varying signal, time-varying system, temporal and frequency resolution, short-time Fourier transform (STFT), properties of STFT, inverse STFT.
|
Notes
|
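As a companion to the STFT lecture, a minimal NumPy sketch of analysis and weighted overlap-add synthesis (illustrative only, not the course's reference code; the Hann window and 50% hop are arbitrary choices, and reconstruction is exact only where the accumulated squared window is nonzero):

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier transform: slide the window by `hop`
    samples and take the real FFT of each windowed frame."""
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(X, win, hop):
    """Inverse STFT by weighted overlap-add: window each inverse
    FFT again and normalize by the summed squared windows."""
    n = len(win)
    out = np.zeros(hop * (len(X) - 1) + n)
    norm = np.zeros_like(out)
    for k, F in enumerate(X):
        s = k * hop
        out[s:s + n] += np.fft.irfft(F, n) * win
        norm[s:s + n] += win ** 2
    return out / np.maximum(norm, 1e-12)
```

Dividing by the accumulated squared window makes the synthesis exact in the interior without requiring the window to satisfy a constant-overlap-add constraint.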
Jan 15
|
Filtering and Filterbank Interpretation of STFT, Filter Bank Synthesis and Introduction to Overlap Add method.
|
Notes
|
Jan 18
|
Overlap Add method, reconstruction from STFT magnitude, Wideband and Narrowband spectrogram, Spectrograms of different sounds -- vowel, fricative, semivowel, nasal, stops, spectrogram reading, formants, pattern playback.
|
Notes
|
Jan 20
|
Spectrogram reading, weighted overlap add method, spectrogram re-assignment, speech denoising, time-scale modification.
|
Notes
|
Jan 22
|
Time-frequency representation, time-bandwidth product, Gabor transform, time-frequency tile, auditory filterbank, auditory filter modeling, wavelet based auditory filter, auditory model.
|
Notes
Surprise Test 1
|
Jan 25
|
Time-varying parameters in speech, using Praat to estimate time-varying parameters, homomorphic filtering, cepstrum, properties of cepstrum, uniqueness of cepstrum.
|
Notes
HW# 2
|
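A small NumPy sketch of the real cepstrum and low-time liftering covered in this lecture (illustrative; the liftering cutoff `n_low` is an arbitrary choice, not a course-specified value):

```python
import numpy as np

def real_cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    return np.fft.ifft(log_mag).real

def lifter_envelope(frame, n_low=30):
    """Low-time liftering: keep only the first n_low quefrency bins
    (and their symmetric counterparts) to estimate the smooth
    spectral envelope attributed to the vocal tract."""
    c = real_cepstrum(frame)
    c_lift = np.zeros_like(c)
    c_lift[:n_low] = c[:n_low]
    c_lift[-n_low + 1:] = c[-n_low + 1:]
    return np.fft.fft(c_lift).real   # smoothed log-magnitude spectrum
```

The high-quefrency part discarded here carries the excitation (e.g. the pitch peak for voiced speech), which is the separation the lecture motivates.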
Jan 27
|
Motivation for separating the excitation and vocal tract response using the cepstrum, derivation of the cepstrum for a general pole-zero transfer function, periodic impulse train, white noise, liftering, homomorphic vocoder, mel-frequency cepstral coefficients.
|
Notes
|
Jan 29
|
All-pole model, Linear prediction, autocorrelation and covariance formulation, gain computation, effect of LP order on spectral modeling, illustration of LPC for voiced and unvoiced segments, comparison of spectral envelope from cepstral smoothing and linear prediction, LPC synthesis.
|
Notes
|
Feb 1
|
Properties of linear prediction - global and local spectral properties, linear prediction using continuous and discrete spectra, minimum error, LP order selection, Levinson-Durbin recursion, Burg method, example spectra computed using ANE and Burg method.
|
Notes
|
Feb 3
|
Levinson-Durbin recursion, Burg method, forward and backward prediction, order recursion, lattice formulation, stability condition, ARMA model.
|
Notes
|
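The Levinson-Durbin recursion covered here can be sketched in a few lines of NumPy (an illustrative implementation assuming a valid autocorrelation sequence `r` with `r[0] > 0`):

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion: solve the autocorrelation normal
    equations for LP coefficients a (with a[0] = 1) and reflection
    coefficients k; returns the final prediction error as well."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        ki = -acc / err                      # i-th reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):                # order update of the predictor
            a[j] = a_prev[j] + ki * a_prev[i - j]
        a[i] = ki
        k[i - 1] = ki
        err *= (1.0 - ki * ki)               # prediction error shrinks
    return a, k, err
```

The stability condition from the lecture shows up directly: all reflection coefficients of a valid autocorrelation sequence have magnitude below one, which keeps the error positive at every order.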
Feb 5
|
Warped LP, ARMA models, Autocorrelation domain relation, sequential solution of ARMA, Itakura Saito distance, Different assessment methods for speech enhancement, multi-taper cepstrum estimation, Mel generalized cepstrum.
|
Notes
|
Feb 8
|
Mel generalized cepstrum, Quantization of LPC parameters, Reflection coefficients, Log Area Ratios, Line Spectral Frequency, Mixed excitation linear prediction (MELP), multi-pulse linear prediction (MP-LP), code excited linear prediction (CELP) speech coders.
|
Notes
Surprise Test 2
|
Feb 10
|
Time-varying signal models, time-varying linear prediction, time-varying ARMA, AM-FM model, non-linear models, signal subspace approach.
|
|
Feb 12
|
Sinusoidal model, its applications, Chirp model, short-time chirp transform, mixture Gaussian envelope chirp model, group delay analysis.
|
|
Feb 15
|
Speaking in noise: How does the Lombard effect improve acoustic contrasts between speech and ambient noise?
The evolution of the Lombard effect: 100 years of psychoacoustic research
REVERBERANT SPEECH ENHANCEMENT USING CEPSTRAL PROCESSING
Enhancement of Reverberant Speech Using LP Residual Signal
Reverberant Speech Enhancement by Temporal and Spectral Processing
JOINT DEREVERBERATION AND NOISE REDUCTION USING BEAMFORMING AND A SINGLE-CHANNEL SPEECH ENHANCEMENT SCHEME
Acoustic characteristics related to the perceptual pitch in whispered vowels
A Comprehensive Vowel Space for Whispered Speech
FUNDAMENTAL FREQUENCY GENERATION FOR WHISPER-TO-AUDIBLE SPEECH CONVERSION
Silent Communication: whispered speech-to-clear speech conversion
Novel speech signal processing algorithms for high-accuracy classification of Parkinson's disease
Seeing Speech: Capturing Vocal Tract Shaping Using Real-Time Magnetic Resonance Imaging
Speech production, syntax comprehension, and cognitive deficits in Parkinson's disease
Speech production knowledge in automatic speech recognition
Knowledge from Speech Production Used in Speech Technology: Articulatory Synthesis
Speech Production and Speech Modelling
|
Surprise Test 3
HW# 3
|
Feb 17
|
Revision of the topics covered so far.
|
|
Feb 19
|
-
|
Midterm# 1
|
Feb 22
|
Gaussian distribution, Maximum likelihood estimation, MLE for Gaussian distribution, Gaussian Mixture Models (GMM), Expectation Maximization (EM) Algorithm, EM for GMMs.
|
Notes
|
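The GMM-EM material can be made concrete with a one-dimensional sketch (illustrative only: quantile-based initialization is an arbitrary choice, and the full-covariance multivariate case follows the same pattern with matrix updates):

```python
import numpy as np

def gmm_em_1d(x, n_comp=2, n_iter=50):
    """Fit a 1-D Gaussian mixture model with the EM algorithm."""
    # Initialize means at evenly spaced quantiles (deterministic but
    # arbitrary), variances at the global variance, weights uniform.
    mu = np.quantile(x, (np.arange(n_comp) + 0.5) / n_comp)
    var = np.full(n_comp, np.var(x))
    w = np.full(n_comp, 1.0 / n_comp)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        lik = w * np.exp(-0.5 * (x[:, None] - mu) ** 2 / var) \
              / np.sqrt(2 * np.pi * var)
        gamma = lik / lik.sum(axis=1, keepdims=True)
        # M-step: closed-form re-estimation of weights, means, variances
        Nk = gamma.sum(axis=0)
        w = Nk / len(x)
        mu = (gamma * x[:, None]).sum(axis=0) / Nk
        var = (gamma * (x[:, None] - mu) ** 2).sum(axis=0) / Nk
    return w, mu, var
```

With hard (0/1) responsibilities instead of soft ones, the same loop reduces to K-means, which is why K-means is often used to initialize GMM training.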
Feb 24
|
GMM-EM re-estimation formulae, Gaussian model and GMM application to data, K-means clustering. Time alignment and normalization, synchronous and asynchronous dynamic programming algorithms. [Textbook - Rabiner and Juang, pp. 191-217].
|
Notes
Reference1
Reference2
|
Feb 26
|
Dynamic time warping, path constraints, local and global constraints, slope weight, DTW recursion.
|
Notes
Reference1
Reference2
|
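The DTW recursion from this lecture, sketched in NumPy (a minimal version using the basic symmetric local constraint, with unit slope weights and no global path constraint):

```python
import numpy as np

def dtw(x, y, dist=lambda a, b: abs(a - b)):
    """Dynamic time warping cost between sequences x and y using
    diagonal, horizontal, and vertical steps (symmetric constraint)."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)   # accumulated-cost matrix
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(x[i - 1], y[j - 1])  # local distance
            D[i, j] = c + min(D[i - 1, j - 1],   # match
                              D[i - 1, j],       # deletion in y
                              D[i, j - 1])       # insertion in y
    return D[n, m]
```

Slope weights and global constraints (e.g. the Sakoe-Chiba band) modify the `min` term and restrict which `(i, j)` cells are filled, respectively.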
Feb 29
|
Introduction to Markov Models, Extension to Hidden Markov Models (HMMs). Definition of model parameters. Three problems of HMMs - likelihood computation, optimal state sequence estimation and parameter re-estimation.
|
Notes
Reference1
|
Mar 2
|
Solution to Problem I and II of HMMs - Likelihood computation using forward and backward variable. Viterbi algorithm for best state sequence determination.
|
Reference1
|
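A log-domain sketch of the Viterbi algorithm from this lecture (illustrative; it assumes a discrete-observation HMM with strictly positive probabilities so the logs are finite):

```python
import numpy as np

def viterbi(pi, A, B, obs):
    """Most likely HMM state sequence for a discrete observation
    sequence. pi: initial probs (N,), A: transition probs (N, N),
    B: emission probs (N, M), obs: list of symbol indices."""
    logA, logB = np.log(A), np.log(B)
    T, N = len(obs), len(pi)
    delta = np.log(pi) + logB[:, obs[0]]   # best log-prob ending in each state
    psi = np.zeros((T, N), dtype=int)      # backpointers
    for t in range(1, T):
        scores = delta[:, None] + logA     # scores[i, j]: transition i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + logB[:, obs[t]]
    path = [int(delta.argmax())]           # backtrack from best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(delta.max())
```

Replacing `max`/`argmax` with a log-sum-exp over predecessor states turns this into the forward algorithm for likelihood computation.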
Mar 4
|
Solution to Problem III of HMMs - Parameter re-estimation using the EM algorithm - Baum-Welch re-estimation algorithm. Estimating the Q function (E-step) and maximizing the Q function (M-step) to iteratively update the parameters.
|
HW# 4
|
Mar 7
|
Continuous density HMMs (HMM-GMMs). Parameter re-estimation using EM algorithm. Types of HMMs - (Ergodic versus left-to-right HMMs). Implementation issues - scaling.
|
Notes
|
Mar 9
|
Duration modeling in HMMs, Initialization using segmental K-means, Connected Word Models, Introducing null/non-emission states.
|
Notes
|
Mar 11
|
Re-estimating forward and backward variables with non-emitting states, Viterbi algorithm with non-emitting states. Parameter re-estimation using multiple trials.
|
Notes
CMU reading material
|
Mar 14
|
Dealing with silence, need for sub-word modeling, Multiple pronunciations, Alternate models of phonetic states using neural networks, Hybrid models, Introduction to neural networks, Need for hidden units and non-linearity, solving the XOR problem. Deep Neural Networks.
|
External reading
|
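The XOR problem from this lecture can be solved by a tiny 2-2-1 sigmoid network. The weights below are hand-picked for illustration (an assumption, not trained values): one hidden unit approximates OR, the other AND, and the output computes OR AND NOT(AND), i.e. XOR, which no single-layer linear classifier can represent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (illustrative assumption, not learned):
W1 = np.array([[20.0, 20.0],    # hidden unit 0: OR-like
               [20.0, 20.0]])   # hidden unit 1: AND-like
b1 = np.array([-10.0, -30.0])
W2 = np.array([20.0, -20.0])    # output: OR AND NOT(AND)
b2 = -10.0

def xor_net(x):
    """Forward pass of the 2-2-1 sigmoid network on input x (length 2)."""
    h = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ h + b2)
```

The large weight magnitudes push the sigmoids close to 0/1, so the hidden layer effectively computes the two logical functions whose combination gives XOR.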
Mar 16
|
How does a neural network estimate posterior probabilities? Why do we need deep neural networks (DNNs)? Types of non-linearities and their properties. Choice of cost function. Regression versus classification using DNNs.
|
External reading1
External reading2
|
Mar 18
|
Learning parameters of DNN using the error backpropagation (BP) algorithm. Equivalence between the cross-entropy and mean-square-error cost functions in BP learning.
|
Slide
External reading
Surprise Test 4 HW# 5
|
Mar 26
|
MATLAB discussion on MFCC features, standardization of features, GMM training and likelihood computation using diagonal and full covariances.
|
|
Mar 28
|
Practical Considerations in DNN implementation. Data preprocessing, Model initialization, Overfitting and Underfitting, Dropout and Weight Regularization. Learning parameters with regularization. Batch size selection - mini batch training and stochastic gradient descent (SGD).
|
Slide
Surprise Test 5 HW# 5
|
Mar 30
|
Restricted Boltzmann Machine - definition of free energy and probability density function with RBMs, conditional independence, equivalence with sigmoidal DNN, Properties of Gaussian-Bernoulli RBM, similarities with GMMs. Applications of RBM. Deep Belief Networks (DBNs) as initialization for DNNs. Denoising Autoencoders. Discriminative layerwise pretraining techniques.
|
External reading
|
Mar 31
|
Hybrid HMM-ANN models. Decoding word sequences using DNN-HMM systems. Language modeling with N-grams, backoff and smoothing. Recurrent neural networks, parameter learning with backpropagation, vanishing and exploding gradients, Introduction to long short-term memory (LSTM) networks. Introduction to convolutional neural networks.
|
Slide
External example
|
April 4
|
-
|
Midterm# 2
|
April 6
|
-
|
Discussion on Midterm# 2
|
April 11
|
Introduction to i-vectors. Summary of the applications - speech recognition, speaker recognition, language identification and speech activity detection.
|
Slide
|