This course outline is a guide to the topics that will be covered in the course and when they will be discussed:
Introduction: application background; the big picture; speech sounds; spoken language (reading assignment)
Math Background: probabilities; Bayes' theorem; statistics; estimation; regression; hypothesis testing; entropy; mutual information; decision trees; optimization theory and convex optimization
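As a minimal illustration of the Bayes' theorem topic above, the sketch below computes posterior probabilities for two competing classes; the priors and likelihoods are made-up numbers, not course data:

```python
# Bayes' theorem: P(class | obs) = P(obs | class) * P(class) / P(obs)
def bayes_posterior(prior, likelihood, evidence):
    """Posterior probability of a class given an observation."""
    return likelihood * prior / evidence

# Illustrative priors P(class) and likelihoods P(obs | class) for two classes.
p_a, p_b = 0.3, 0.7
l_a, l_b = 0.8, 0.1
evidence = l_a * p_a + l_b * p_b   # total probability P(obs)

post_a = bayes_posterior(p_a, l_a, evidence)
post_b = bayes_posterior(p_b, l_b, evidence)
```

Note that the posteriors sum to one, and the class with the larger likelihood-weighted prior wins.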
Pattern Classification: pattern classification & pattern verification; Bayesian decision theory
Generative Models: model estimation: maximum likelihood, Bayesian learning, and the EM algorithm; multivariate Gaussian, Gaussian mixture model, multinomial, Markov chain model, etc.
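As a small taste of the maximum-likelihood estimation topic, the sketch below fits a 1-D Gaussian by ML; the data points are illustrative, not from the course:

```python
def fit_gaussian(xs):
    """Maximum-likelihood estimates of a 1-D Gaussian's mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    # ML variance divides by n (biased), not n - 1 (the unbiased estimator).
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

mean, var = fit_gaussian([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
```

The same closed-form ML estimates reappear as the M-step when EM fits each component of a Gaussian mixture.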
Discriminative Models: linear discriminant functions; support vector machines (SVM); large-margin classifiers; sparse kernel machines; neural networks
Hidden Markov Model (HMM): HMM vs. Markov chains; HMM concepts; three algorithms: forward-backward, Viterbi decoding, and Baum-Welch learning
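Of the three HMM algorithms listed above, Viterbi decoding is the most compact to sketch. The toy weather HMM below (umbrella/walk observations) is a standard textbook example, not course material; the code works in the log domain to avoid underflow:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely hidden-state path for an observation sequence (log domain)."""
    # V[t][s] = best log score of any path ending in state s at time t.
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

log = math.log
states = ("Rain", "Sun")
log_start = {"Rain": log(0.6), "Sun": log(0.4)}
log_trans = {"Rain": {"Rain": log(0.7), "Sun": log(0.3)},
             "Sun":  {"Rain": log(0.4), "Sun": log(0.6)}}
log_emit = {"Rain": {"umbrella": log(0.9), "walk": log(0.1)},
            "Sun":  {"umbrella": log(0.2), "walk": log(0.8)}}
path = viterbi(("umbrella", "umbrella", "walk"), states, log_start, log_trans, log_emit)
```

Forward-backward replaces the `max` with a sum to get posteriors, and Baum-Welch uses those posteriors as soft counts to re-estimate the model.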
Midterm presentation
Automatic Speech Recognition (ASR) (I): Acoustic and Language Modeling: HMM for ASR; ASR as an example of pattern classification; acoustic modeling: HMM learning (ML, MAP, DT), parameter tying (decision-tree-based state tying); n-gram models: smoothing, learning, perplexity, class-based models
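The n-gram smoothing and perplexity topics above fit in a few lines of code. The sketch below uses add-one (Laplace) smoothing, the simplest of the smoothing schemes; the tiny corpus is illustrative only:

```python
import math
from collections import Counter

def bigram_perplexity(train, test, vocab_size):
    """Perplexity of `test` under an add-one smoothed bigram model of `train`."""
    bigrams = Counter(zip(train, train[1:]))
    contexts = Counter(train[:-1])
    log_prob = 0.0
    for w1, w2 in zip(test, test[1:]):
        # Add-one smoothing: every bigram gets one extra pseudo-count.
        p = (bigrams[(w1, w2)] + 1) / (contexts[w1] + vocab_size)
        log_prob += math.log(p)
    # Perplexity = inverse geometric mean of the bigram probabilities.
    return math.exp(-log_prob / (len(test) - 1))

train = "<s> the cat sat </s> <s> the dog sat </s>".split()
vocab_size = len(set(train))
ppl_seen = bigram_perplexity(train, "<s> the cat sat </s>".split(), vocab_size)
ppl_novel = bigram_perplexity(train, "<s> the dog barked </s>".split(), vocab_size)
```

A sentence made of seen bigrams gets a lower perplexity than one with unseen words, which is the quantity used to compare language models.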
Automatic Speech Recognition (ASR) (II): Search: why search; Viterbi decoding in a large HMM; beam search; tree-based lexicons; dynamic decoding; static decoding; weighted finite-state transducers (WFST)
Spoken Language Processing (I): (1) Text categorization: classifying text documents (call/email routing, topic detection, etc.); vector-based approaches, the Naïve Bayes classifier, Bayesian networks, etc. (2) HMM applications: statistical part-of-speech (POS) tagging; language understanding: hidden concept models
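The Naïve Bayes classifier named above, applied to call routing, can be sketched in a few lines. The two routing categories and training snippets below are invented for illustration:

```python
import math
from collections import Counter, defaultdict

def train_nb(docs):
    """docs: list of (label, tokens). Returns a classifier closure
    using class priors and add-one smoothed word likelihoods."""
    label_counts = Counter(label for label, _ in docs)
    word_counts = defaultdict(Counter)
    for label, tokens in docs:
        word_counts[label].update(tokens)
    vocab = {w for _, tokens in docs for w in tokens}

    def classify(tokens):
        best, best_score = None, -math.inf
        for label in label_counts:
            score = math.log(label_counts[label] / len(docs))  # log prior
            total = sum(word_counts[label].values())
            for w in tokens:
                # "Naïve" assumption: words are independent given the class.
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
            if score > best_score:
                best, best_score = label, score
        return best

    return classify

classify = train_nb([
    ("billing", ["refund", "charge", "invoice"]),
    ("support", ["crash", "error", "restart"]),
])
```

Each incoming message is routed to the class with the highest posterior score, e.g. `classify(["refund", "invoice"])` routes to billing.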
Spoken Language Processing (II): statistical machine translation; IBM's models for machine translation: lexicon model, alignment model, language model; training process; generation & search
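The lexicon-model training process above can be glimpsed in IBM Model 1, the simplest of the IBM models, whose EM training fits in a short sketch. The three German-English sentence pairs are a common toy example, not course data:

```python
from collections import defaultdict

def ibm_model1(pairs, iterations=10):
    """EM training of IBM Model 1 lexicon probabilities t(f | e).
    pairs: list of (foreign_tokens, english_tokens)."""
    f_vocab = {f for fs, _ in pairs for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)   # expected co-occurrence counts
        total = defaultdict(float)
        for fs, es in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)  # normalize over alignments
                for e in es:
                    c = t[(f, e)] / z           # E-step: fractional count
                    count[(f, e)] += c
                    total[e] += c
        for f, e in count:
            t[(f, e)] = count[(f, e)] / total[e]  # M-step: re-estimate
    return t

pairs = [("das haus".split(), "the house".split()),
         ("das buch".split(), "the book".split()),
         ("ein buch".split(), "a book".split())]
t = ibm_model1(pairs)
```

After a few EM iterations the lexicon probabilities concentrate on the correct word pairs ("haus"-"house", "buch"-"book"), even though no word alignments were given.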
Student presentation