Course Meeting Times
Lectures: 2 sessions / week, 1.5 hours / session
Recitations: 1 session / week, 1 hour / session
This course covers the algorithmic and machine learning foundations of computational biology, combining theory with practice. We cover both foundational topics in computational biology and current research frontiers. We study fundamental techniques and recent advances in the field, and we work directly with current large-scale biological datasets.
Genomes: Biological sequence analysis, hidden Markov models, gene finding, comparative genomics, RNA structure, sequence alignment, hashing
Networks: Gene expression, clustering / classification, EM / Gibbs sampling, motifs, Bayesian networks, microRNAs, regulatory genomics, epigenomics
Evolution: Gene / species trees, phylogenomics, coalescent, personal genomics, population genomics, human ancestry, recent selection, disease mapping
In addition to the technical material in the course, the term project provides practical experience in the following:
- Writing a National Institutes of Health (NIH)-style research proposal
- Reviewing peer proposals
- Planning and carrying out independent research
- Presenting research results orally in a conference setting
- Writing results in a journal-style scientific paper
There will be five problem sets during the semester, each including 3–5 problems for all students and a lab problem which is optional for undergraduate students. The problem sets will include both theoretical and programming problems. For programming problems, we provide skeleton code in Python, but you are welcome to write solutions in any language.
There will be one in-class quiz covering all material presented up to that point. There will be no final exam. The quiz will include true / false questions, short answer questions, practical problems using algorithms covered in class, and one or two problems extending ideas seen in class.
Students will work on a final project with deliverables due at several milestones during the term as marked on the course schedule. The first part of the term will be spent identifying a topic relevant to the course materials, planning the project, writing an NIH-style proposal, and reviewing the proposals of your peers. The second part of the term will be focused on completing the project, writing the report, and presenting the results. Details of what is expected by each milestone will be posted on the course website.
You may work either alone or with one partner; however, teams and graduate students will be expected to undertake more ambitious projects. Part of the final project grade will depend on the challenge and originality of your project.
We anticipate projects of a few types:
- Identify a biological problem, gather relevant datasets, design and implement new algorithms, apply the methods, and interpret the results.
- Rigorously compare several algorithms that solve the same biological problem, in terms of their performance and the quality of their outputs on synthetic and real datasets.
Each student will contribute to the scribe notes, which are chapters of the course textbook. Several students may be assigned to work together on a single lecture / chapter depending on course enrollment. As a scribe, you are expected to do the following:
- Before the lecture, familiarize yourself with the materials.
- During the lecture, take note of ideas covered in lecture that are missing or explained poorly in the text, questions asked in lecture that are not answered in the text, digressions from the lecture material that are worth elaborating on, figures / illustrations that are confusing or missing important elements, etc.
- After the lecture, edit the text to address the points you identify.
The course textbook is the compiled scribe notes. The entire course textbook is available in the readings section.
You may also find the following optional texts helpful:
Durbin, Richard, Sean R. Eddy, Anders Krogh, et al. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, 1998.
Jones, Neil C., and Pavel Pevzner. An Introduction to Bioinformatics Algorithms. MIT Press, 2004. ISBN: 9780262101066.
Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern Classification. John Wiley & Sons, 2003. ISBN: 9789814126021.
Recitations will be held on Fridays. In recitation, we will review the lecture material and discuss additional aspects of it. Because there is only one recitation section, we cannot accommodate all scheduling conflicts, so attendance is not mandatory. However, material in the recitation notes may appear on the quiz.
You are welcome to collaborate on problem sets and the final project. However:
- You must work independently on each problem before you discuss it with others.
- You must write the solutions on your own.
- You must acknowledge outside sources and collaborators.