|Four Problem Sets||60%|
Lectures: 1 session / week, 1.5 hours / session
Lectures will be held weekly for 1.5 hours each session. This term, 6.096 is being taught in conjunction with 6.046J. Students are encouraged to enroll in both courses simultaneously and learn both the foundations of algorithms, together with Computational Biology. In addition, tutorials and office hours will be held weekly.
6.001 (Structure and Interpretation of Computer Programs), 7.01
Grading will be calculated based on the following formula:
|Four Problem Sets||60%|
The field of computational biology is rich in algorithmic challenges, stemming from the sheer size of massive datasets, and also the noisy nature of biological signals. For example, the human genome contains 3 billion letters, 25 thousand genes, and several thousand regulatory signals governing the gene usage. The genes themselves interact in complex ways, forming dense regulatory networks, shaped by relentless evolutionary forces.
In this course, we will address several algorithmic challenges in computational biology. We will study the principles of algorithmic design for biological datasets, analyze existing algorithms, and apply these to real datasets. The lectures will cover:
Genomes are several billion letters long, and yet sequencing technologies only read a few hundred nucleotides at a time. Assembling the pieces into a complete genome is especially challenging, given the highly repetitive nature of the human genome, and the error-prone nature of sequencing reads.
The cell recognizes several biological signals guiding the processes of transcription, splicing, and translation. We will see how to map these signals onto the genome, and recognize complete gene structures in presence of vast non-coding regions.
In response to environmental changes, the cell reads a complex code of regulatory signals which dictate gene usage. These sequence patterns are extremely short (10 nucleotides), often degenerate, and occur at varying distances from the gene start. We will develop algorithms for detecting these elements amidst oceans of non-functional nucleotides.
The forces of natural selection and random mutation have shaped genome evolution and the modern species. Comparison of related genomes can yield insights into their evolutionary constraints, and reveal the functional elements within them. We will study algorithms for sequence alignment, both at the nucleotide level, and at the genome level.
The genes of a species rarely act in isolation. Complex regulatory networks govern their usage, and complex interaction networks govern their cooperation patterns. We will study algorithms to infer modules of cooperating genes, reconstruct regulatory networks, and study their emerging properties.
Perhaps the most intriguing aspect of biological systems is their ability to evolve. We will study algorithms for inferring evolutionary events, understanding the ancestry of species, and their evolutionary relationships.
References will be taken from:
Gusfield, Dan. Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge, UK: Cambridge University Press, 1997. ISBN: 0521585198.
Waterman, Michael. Introduction to Computational Biology: Maps, Sequences, and Genomes. Boca Raton, FL: CRC Press, 1995. ISBN: 0412993910.
Durbin, Richard, Graeme Mitchison, S. Eddy, A. Krogh, and G. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press, 1997. ISBN: 0521629713.