18.417 | Fall 2004 | Graduate

Introduction to Computational Molecular Biology

Lecture Notes

LEC # TOPICS    
Introduction to Biology  
1 The Central Dogma: Some Algorithms Introduction    
Enumerative Solutions: Partial Digest Problem and Median Strings  
2 Partial Digest Problem    
3

Motifs and Median Strings (PDF) (Courtesy of Jerome Mettetal. Used with permission.)

A Review Paper on Regulatory Motifs

   
Dynamic Programming: Sequence Alignments  
4 Global Alignment (PDF) (Courtesy of Russ Cox and Michael Walfish. Used with permission.)    
5

Local Alignment (PDF) (Courtesy of Anthony W. Scelfo and Athicha Muthitacharoen. Used with permission.)

More on Alignment Statistics

   
6

Spliced Alignment (PDF) (Courtesy of Lele Yu and Sam Kaufman. Used with permission.)

A Paper on the Subtle Problem of Aligning Proteins to DNA

Spliced Alignment with SIM4

   
7

More Efficient Alignment (PDF) (Courtesy of Mark Halsey and Robert Edward Beverly. Used with permission.)

A Recent Paper on Sub-quadratic Time Alignment: Crochemore, Maxime, Gad M. Landau, and Michal Ziv-Ukelson. “A Subquadratic Sequence Alignment Algorithm for Unrestricted Scoring Matrices.” SIAM J. COMPUT. 32, no. 6: 1654-1673.

Gene Myers on the State of Alignment in 1991: Myers, E. W. An overview of sequence comparison algorithms in molecular biology. University of Arizona, Department of Computer Science, Technical Report TR 91-29, 1991.

   
Graph Theory: Sequencing Genes and Proteins  
8

Genomics and SBH Graphs

A Nice Paper on Getting SBH to Work: Lengauer, Thomas, David Sankoff, Sorin Istrail, Pavel Peuvzner, and Michael Waterman. “Annual Conference on Research in Computational Molecular Biology Proceedings of the fifth annual international conference on Computational biology Montreal, Quebec, Canada.” New York: ACM Press, 2001, pp. 269-277. ISBN: 1581133537.

   
9 Peptide Graphs (PDF) (Courtesy of Robert Edward Beverly and Christopher J. Peikert. Used with permission.)    
Pattern Matching: Exact Matches, Gapless Alignments, and BLAST  
10 Exact Pattern Matching (PDF) (Courtesy of Lele Yu and Mark Halsey. Used with permission.)    
11 Suffix Trees (PDF) (Courtesy of Athicha Muthitacharoen and Anthony W. Scelfo. Used with permission.)    
12

Suffix Arrays and BWTs (PDF) (Courtesy of Tushara C. Karunaratna and Peter Wai Kei Lee. Used with permission.)

A Nice Paper on CSAs/BWTs

   
13

BLAST (PDF) (Courtesy of Eitan Reich and Peter Lee. Used with permission.)

The BLAST Paper

BLAST Statistics and PAMs: F., Altschul S. “Amino acid substitution matrices from an information theoretic perspective.” J Mol Biol. 219, no. 3 (1991): 555-65.

Introduction of Gapped Seed Filtration

   
Clustering: Microarrays and Phylogeny  
14 Clustering (Guest Lecturer)    
15 Trees (PDF) (Courtesy of Eugenia Lyashenko. Used with permission.)    
Neighbor Joining
16

A Nice Review of Phylogenetic Analysis

A Very Complicated Paper on Phylogeny: K. Rice, and T. Warnow. “Parsimony is Hard to Beat, Source Lecture Notes.” In Computer Science 1276 (1997): 124-133. Archive Proceedings of the Third Annual International Conference on Computing and Combinatorics table of contents. ISBN: 354063357X.

A Review of Coalescent Theory in Biology

 
17

Application: Microarrays (Guest Lecturer)

The Classic AML/ALL Cancer Classification Paper: Golub, T. R., D. K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J. P. Mesirov, H. Coller, M. L. Loh, J. R. Downing, M. A. Caligiuri, C. D. Bloomfield, and E. S. Lander. “Molecular Classification of Cancer: class Discovery and Class Prediction by Gene Expression Monitoring.” Science 286 (15 October 1999).

Finding Weak Signals in Microarray Data (PDF)

A Review Article on Mining Microarray Data

   
Probabilistic Models and Machine Learning: Gene Annotation and Evolution
18

Hidden Markov Models I (PDF) (Courtesy of Han-Pang Chiu and Russ Cox. Used with permission.)

An HMM Paper, focused on Recognizing Folds: Krogh, Anders, Michael Brown, I. Saira Mian, Kimmen Sjölander, and David Haussler. “Hidden Markov Models in Computational Biology: Applications to Protein Modeling.” Journal of Molecular Biology 235, no. 5 (Feb 3, 1994): 1501-1531.

   
19

Hidden Markov Models II (PDF) (Courtesy of Chris Peikert. Used with permission.)

One of the GENSCAN Papers: Burge, Chris, and Samuel Karlin. “Prediction of Complete Gene Structures in Human Genomic DNA.” J. Mol. Biol. 268 (1997): 78-94.

An HMM Searching for Coding RNA

   
20

Gibbs Sampling (PDF) (Courtesy of Tushara C. Karunaratna. Used with permission.)

A Seminal Paper on Gibbs Sampling

   
21 Random Projections (PDF) (Courtesy of Eugenia Lyashenko. Used with permission.)  
22 MCMC and Bayesian Networks  
Another Probabilistic Method to Phase Haplotype Data
   
Horizons
23 The Future: Protein Structure (Guest Lecturer)    
24

The Future: Haplotype Mapping (Guest Lecturer)

Ancestor Reconstruction by Haplotype Coloring: Schwartz, R., and A. Clark, S. Istrail. “Methods for inferring block-wise ancestral history from haploid sequences: The haplotype coloring problem.” Proceedings of the 2nd Workshop on Algorithms in Bioinformatics, (WABI02), Springer Lecture Notes in Computer Science (2002): 44-59.

   
25 Presentations of Final Projects    
26 Presentations of Final Projects (cont.)    

Course Info

Instructor
Departments
As Taught In
Fall 2004
Level
Learning Resource Types
Lecture Notes
Projects with Examples
Problem Sets