Course Meeting Times

Lectures: 2 sessions / week, 1.5 hours / session

Labs: 2 sessions / week, for 2 hours / session
One session / week, for 1 hour / session


Students should have some experience in a programming language. 6.034 is listed as a prerequisite but can be waived by permission of the instructor.

The material covered in this course is selected in such a way that at its completion you should be able to understand current papers in the field of Natural Language Processing (NLP). No background in NLP is necessary.



Jurafsky, D. and Martin, J. H. Speech and Language Processing. Prentice Hall: 2000. ISBN: 0130950696

Recommended and Reference (on Library Reserve)

Manning, C. D. and H. Schütze: Foundations of Statistical Natural Language Processing. The MIT Press. 1999. ISBN 0-262-13360-1.

Barton, E., Berwick, R., and Ristad, E. Computational Complexity and Natural Language. The MIT Press. 1987. ISBN 0-262-02266-4.

Allen, J. Natural Language Understanding. The Benjamin/Cummings Publishing Company Inc. 1994. ISBN 0-8053-0334-0.

Brady, J., and Berwick, R. Computational Models of Discourse. The MIT Press. 1983. ISBN 0-262-02183-8.

Supplementary readings will be posted as needed as PDF files on the course website. These will mostly be background readings for the laboratory assignments.

Additional Resources

  • Proceedings of major conferences (related to Natural Language Processing):
    • ACL (Association for Computational Linguistics)
    • European Chapter of the ACL
    • COLING (International Conference on Computational Linguistics)
    • ANLP (Applied Natural Language Processing, by ACL)
    • ACL SIGDAT, other SIG (Special Interest Groups) Workshops, such as WVLC (Workshop on Very Large Corpora)
    • EMNLP (Empirical Methods in Natural Language Processing)
    • Linguist List (for everything you always wanted to say)
    • DARPA HLT (Defense Advanced Research Projects Agency Human Language Technology Workshops)


This course is lab-oriented; that is, the work of the course is done via a series of laboratory exercises, handed out approximately once every two weeks. There are no exams; in particular, there will be no final exam. The final project will involve an element of nondeterminism, i.e., so-called 'free will', in that you will be able to choose your own project, either combining elements from the previous laboratories or doing something completely new. For the final project, we will have people work in teams of 2 or 3 (but not more, and at my urging, not fewer: solos are discouraged and, as with all labs, collaboration is encouraged; see below).

In addition, new this year, there may be an additional 'lab' project: an engineering-oriented exploration based on an extension of any of the laboratory topics in the course or of the additional readings associated with that topic. The format will be a short, analytical paper (fewer than 10 pages). Like the other labs, this is to be done solo, though this is 'under construction' and may be morphed into 1- or 2-person reports. It can be completed at any time during the term. I would exhort you to grapple with it before the very end of the term, when the final project will also be on your mind.

The laboratory exercises are designed to be carried out on Athena, but this year we may be able to distribute packaged files for home/laptop use. The software, along with related software you may find helpful, is listed under the tools section of this site. If you are clever and adventuresome, you are certainly free to download the software used and get it running on your own PC/laptop, but this must be on 'your own nickel', i.e., we cannot guarantee that you will succeed, nor can we offer technical support to do so. Stay tuned for further announcements.

Turning in the Assignments


The assignments are due at the end of class on the due date.


Please construct (simple) web pages for your lab reports, and email the root URL to the TA. If you do not know how to construct web pages like this one, this might be a good time to learn.


Late Assignments

You have up to 30 (thirty) late days to use, which can be distributed among your laboratory projects. However, the last project must be turned in during the last week of class (Week #26), even if you have not used all of your days by then.

Once you use up your late days, late projects will not earn any points, although they might be considered in borderline cases for the final grade. So try to turn in all projects, even if you feel they will not be counted. If you do not turn in a final (joint) project, you will receive an I (incomplete) for the class, and will have to make this up by next term (the incomplete will note that 80% of the coursework has been completed).
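As a rough illustration of how the late-day budget plays out over a term, here is a minimal Python sketch. The function name apply_late_days and the tie-breaking rule are assumptions made for illustration only: the policy above does not say what happens when a lab is later than the days you have left, so the sketch assumes such a lab earns no points and consumes no days.

```python
TOTAL_LATE_DAYS = 30  # budget stated in the late policy above


def apply_late_days(days_late_per_lab, budget=TOTAL_LATE_DAYS):
    """Return (earns_points, remaining_budget) for labs in submission order.

    Assumption (not stated in the syllabus): a lab earns points only if
    its full lateness fits in the remaining budget; otherwise it earns
    no points and leaves the budget unchanged.
    """
    remaining = budget
    earns_points = []
    for days in days_late_per_lab:
        if days <= remaining:
            remaining -= days
            earns_points.append(True)
        else:
            earns_points.append(False)
    return earns_points, remaining


# Four labs turned in 0, 10, 15, and 8 days late, in that order.
flags, left = apply_late_days([0, 10, 15, 8])
print(flags, left)
```

In this example the first three labs fit within the budget, leaving 5 days, so the fourth lab (8 days late) would not earn points under the assumed rule.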

Cooperative Work and Plagiarism

Cooperative work is strongly encouraged; you are free to work together on laboratory assignments. However, aside from the final project, you must write up and turn in your own work. Please write the names of the people with whom you worked at the top of the first page. Exact copies of laboratory reports will not be acceptable. (Something other than your name and those of your co-workers must be different!) The aim of the course (and its pedagogical philosophy) is to learn about natural language processing. You will learn more if you actually do the laboratory assignments.


Final grades will be determined on the basis of the following weighting scheme.

Laboratory Assignments 55%
Final Project 35%
Class Participation and Discussion 10%
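To see how the weighting works out in practice, here is a small Python sketch. The weights come from the scheme above; the function name final_score, the 0-100 scale for each component, and the example scores are assumptions for illustration, not the course's actual grading procedure.

```python
# Weights taken from the grading scheme above.
WEIGHTS = {
    "labs": 0.55,
    "final_project": 0.35,
    "participation": 0.10,
}


def final_score(scores):
    """Weighted sum of component scores, each assumed to be on a 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student: 80 on labs, 90 on the final project, 100 on participation.
example = {"labs": 80.0, "final_project": 90.0, "participation": 100.0}
print(round(final_score(example), 2))  # 0.55*80 + 0.35*90 + 0.10*100 = 85.5
```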