6.345 | Spring 2003 | Graduate
Automatic Speech Recognition


Course Meeting Times

Lectures: 2 sessions / week, 1.5 hours / session


This course introduces students to the rapidly developing field of automatic speech recognition. Its content is divided into three parts. Part I deals with background material in the acoustic theory of speech production, acoustic-phonetics, and signal representation. Part II describes algorithmic aspects of speech recognition systems including pattern classification, search algorithms, stochastic modelling, and language modelling techniques. Part III compares and contrasts the various approaches to speech recognition, and describes advanced techniques used for acoustic-phonetic modelling, robust speech recognition, speaker adaptation, processing paralinguistic information, speech understanding, and multimodal processing.


There will be two 90-minute lectures per week. To help cover the large quantity of material, copies of the lecture viewgraphs will be handed out. There will be no final exam for the course. Instead, there will be two in-class quizzes, each counting approximately 15% towards the final grade.

There will be weekly assignments consisting of both problems and mandatory laboratory work, so that students gain hands-on experience with the material covered. Linux workstations will be made available for laboratory work, and a sign-up mechanism on the 6.345 website will allow students to reserve time on these machines. Assignments must be turned in by the due date. Solutions will be provided along with the graded assignments. Each of the nine assignments will count approximately 5% towards the final grade.

During the last quarter of the course, assignments will end, and students will work on a term project that will count approximately 25% towards the final grade. Projects will be chosen in consultation with staff members and typically involve creating and evaluating a speech recognizer along a dimension of interest to the student. Toolkits of key recognizer components will be provided, so that only minimal programming skills are required.


A detailed outline of the class lectures and assignments is also available.


Lecturer: Jim Glass


Huang, Acero, and Hon. Spoken Language Processing. Upper Saddle River, NJ: Prentice-Hall, 2001. ISBN: 0130226165.

Jelinek. Statistical Methods for Speech Recognition. Cambridge, MA: MIT Press, 1998. ISBN: 0262100665.

Rabiner & Juang. Fundamentals of Speech Recognition. Upper Saddle River, NJ: Prentice-Hall, 1993. ISBN: 0130151572.

Duda, Hart, and Stork. Pattern Classification. New York, NY: Wiley & Sons, 2000. ISBN: 0471056693.

Stevens. Acoustic Phonetics. Cambridge, MA: MIT Press, 1998. ISBN: 0262692503.
