6.047 | Fall 2015 | Undergraduate

Computational Biology

Readings

Readings are from the course textbook, which has been transcribed and compiled by students in this course over many years. The instructions for student “scribes,” and the templates they used, are linked below.

The entire course textbook is available, courtesy of the professor and the students.

Kellis, Manolis, ed. Computational Biology: Genomes, Networks, Evolution. MIT course 6.047 / 6.878. 2016 (PDF - 43.5MB).

LEC # TOPICS READINGS
1 Introduction: Course Overview, Biology, Algorithms, Machine Learning

Chapter 1: Introduction to the Course

  • 1.1 Introduction and Goals
  • 1.2 Final Project: Introduction to Research in Computational Biology
  • 1.3 Additional Materials
  • 1.4 Crash Course in Molecular Biology
  • 1.5 Introduction to Algorithms and Probabilistic Inference
2 Alignment I: Dynamic Programming, Global and Local Alignment

Chapter 2: Sequence Alignment and Dynamic Programming

  • 2.1 Introduction
  • 2.2 Aligning Sequences
  • 2.3 Problem Formulations
  • 2.4 Dynamic Programming
  • 2.5 The Needleman-Wunsch Algorithm
  • 2.6 Multiple Alignment
  • 2.7 Current Research Directions
  • 2.8 Further Reading
  • 2.9 Tools and Techniques
  • 2.10 What Have We Learned?
  • 2.11 Appendix
3 Alignment II: Database Search, Rapid String Matching, BLAST, BLOSUM

Chapter 3: Rapid Sequence Alignment and Database Search

  • 3.1 Introduction
  • 3.2 Global Alignment vs. Local Alignment vs. Semi-global Alignment
  • 3.3 Linear-time Exact String Matching
  • 3.4 The BLAST (Basic Local Alignment Search Tool) Algorithm
  • 3.5 Pre-processing for Linear-time String Matching
  • 3.6 Probabilistic Foundations of Sequence Alignment
  • 3.7 Current Research Directions
  • 3.8 Further Readings
  • 3.9 Tools and Techniques
  • 3.10 What Have We Learned?
4 Hidden Markov Models Part 1: Evaluation / Parsing, Viterbi, Forward Algorithms

Chapter 7: Hidden Markov Models I

  • 7.1 Introduction
  • 7.2 Motivation
  • 7.3 Markov Chains and HMMs: From Example to Formalizing
  • 7.4 Apply HMM to Real World: From Casino to Biology
  • 7.5 Algorithmic Settings for HMMs
  • 7.6 An Interesting Question: Can We Incorporate Memory in Our Model?
  • 7.7 Further Reading
  • 7.8 Current Research Directions
  • 7.9 Tools and Techniques
  • 7.10 What Have We Learned?
5 Hidden Markov Models Part 2: Posterior Decoding, Learning, Baum-Welch

Chapter 8: Hidden Markov Models II: Posterior Decoding and Learning

  • 8.1 Review of Previous Lecture
  • 8.2 Posterior Decoding
  • 8.3 Encoding Memory in an HMM: Detection of CpG Islands
  • 8.4 Learning
  • 8.5 Using HMMs to Align Sequences with Affine Gap Penalties
  • 8.6 Current Research Directions
  • 8.7 Further Reading
  • 8.8 Tools and Techniques
  • 8.9 What Have We Learned?
6 Transcript Structure: GENSCAN, RNA-seq, Mapping, De Novo Assembly, Diff Expr

Chapter 12: Large Intergenic Non-coding RNAs

  • 12.3 Practical Topic: RNAseq
7 Expression Analysis: Clustering / Classification, K-Means, Hierarchical, Bayesian

Chapter 15: Gene Regulation 1: Gene Expression Clustering

  • 15.1 Introduction
  • 15.2 Methods for Measuring Gene Expression
  • 15.3 Clustering Algorithms
  • 15.4 Current Research Directions
  • 15.5 Further Reading
  • 15.6 Resources
  • 15.7 What Have We Learned? 

Chapter 16: Gene Regulation 2: Classification

  • 16.1 Introduction
  • 16.2 Classification - Bayesian Techniques
  • 16.3 Classification - Support Vector Machines
  • 16.4 Tumor Classification with SVMs
  • 16.5 Semi-Supervised Learning
  • 16.6 Current Research Directions
  • 16.7 Further Reading
  • 16.8 Resources
8 Networks I: Bayesian Inference, Deep Learning, Network Dynamics

Chapter 20: Networks I: Inference, Structure, Spectral Methods

  • 20.1 Introduction
  • 20.2 Network Centrality Measures
  • 20.3 Linear Algebra Review
  • 20.4 Sparse Principal Component Analysis
  • 20.5 Network Communities and Modules
  • 20.6 Network Diffusion Kernels
  • 20.7 Neural Networks
  • 20.8 Open Issues and Challenges
  • 20.9 Current Research Directions
  • 20.10 Further Reading
  • 20.11 Tools and Techniques
  • 20.12 What Have We Learned?

Chapter 21: Regulatory Networks: Inferences, Analysis, Application

  • 21.1 Introduction
  • 21.2 Structure Inference
  • 21.3 Overview of the PGM Learning Task
  • 21.4 Applications of Networks
  • 21.5 Structural Properties of Networks
  • 21.6 Network Clustering
9 Networks II: Network Learning, Structure, Spectral Methods
10 Regulatory Motifs: Discovery, Representation, PBMs, Gibbs Sampling, EM

Chapter 17: Regulatory Motifs, Gibbs Sampling, and EM

  • 17.1 Introduction to Regulatory Motifs and Gene Regulation
  • 17.2 Expectation Maximization
  • 17.3 Gibbs Sampling: Sample from Joint (M, Zij) Distribution
  • 17.4 De Novo Motif Discovery
  • 17.5 Evolutionary Signatures for Instance Identification
  • 17.6 Phylogenies, Branch Length Score, Confidence Score
  • 17.7 Possibly Deprecated Stuff Below
  • 17.8 Comparing Different Methods
  • 17.9 OOPS, ZOOPS, TCM
  • 17.10 Extension of the EM Approach
  • 17.11 Motif Representation and Information Content
11 Epigenomics: ChIP-Seq, Read Mapping, Peak Calling, IDR, Chromatin States

Chapter 19: Epigenomics / Chromatin States

  • 19.1 Introduction
  • 19.2 Epigenetic Information in Nucleosomes
  • 19.3 Epigenomic Assays
  • 19.4 Primary Data Processing of ChIP Data
  • 19.5 Annotating the Genome Using Chromatin Signatures
  • 19.6 Current Research Directions
  • 19.7 Further Reading
  • 19.8 Tools and Techniques
  • 19.9 What Have We Learned?
12 RNA Modifications: RNA Editing, Translation Regulation, Splicing Regulation

Chapter 11: RNA Modifications

  • 11.1 Introduction
  • 11.2 Post-transcriptional Regulation
  • 11.3 Current Research Directions
  • 11.4 Further Reading
  • 11.5 Tools and Techniques
  • 11.6 What Have We Learned?
13 Resolving Human Ancestry and Human History from Genetic Data

Chapter 29: Population History

  • 29.1 Introduction
  • 29.2 Quick Survey of Human Genetic Variation
  • 29.3 African European Gene Flow
  • 29.4 Gene Flow on the Indian Subcontinent
  • 29.5 Gene Flow Between Archaic Human Populations
  • 29.6 European Ancestry and Migrations
  • 29.7 Tools and Techniques
  • 29.8 Further Directions
  • 29.9 Further Reading
14 Disease Association Mapping, GWAS, Organismal Phenotypes

Chapter 31: Medical Genetics: The Past to the Present

  • 31.1 Introduction
  • 31.2 Goals of Investigating the Genetic Basis of Disease
  • 31.3 Mendelian Traits
  • 31.4 Complex Traits
  • 31.5 Genome-wide Association Studies
  • 31.6 Current Research Directions
  • 31.7 Further Reading
  • 31.8 Tools and Techniques
  • 31.9 What Have We Learned?
15 Quantitative Trait Mapping, Molecular Traits, eQTLs

Chapter 32: Variation 2: Quantitative Trait Mapping, eQTLs, Molecular Trait Variation

  • 32.1 Introduction
  • 32.2 eQTL Basics
  • 32.3 Structure of an eQTL Study
  • 32.4 Current Research Directions
  • 32.5 What Have We Learned?
  • 32.6 Further Reading
  • 32.7 Tools and Resources
16 Missing Heritability, Complex Traits, Interpret GWAS, Rank-based Enrichment

Chapter 33: Missing Heritability

  • 33.1 Introduction
  • 33.2 Current Research Directions
  • 33.3 Further Reading
  • 33.4 Tools and Techniques
  • 33.5 What Have We Learned?
17 Comparative Genomics and Evolutionary Signatures

Chapter 4: Comparative Genomics I: Genome Annotation

  • 4.1 Introduction
  • 4.2 Conservation of Genomic Sequences
  • 4.3 Excess Constraint
  • 4.4 Diversity of Evolutionary Signatures: An Overview of Selection Patterns
  • 4.5 Protein-coding Signatures
  • 4.6 MicroRNA (miRNA) Gene Signatures
  • 4.7 Regulatory Motifs
  • 4.8 Current Research Directions
  • 4.9 Further Reading
  • 4.10 Tools and Techniques
  • 4.11 Bibliography
18 Phylogenetics: Molecular Evolution, Tree Building, Phylogenetic Inference

Chapter 27: Molecular Evolution and Phylogenetics

  • 27.1 Introduction
  • 27.2 Basics of Phylogeny
  • 27.3 Distance Based Methods
  • 27.4 Character Based Methods
  • 27.5 Possible Theoretical and Practical Issues with Discussed Approach
  • 27.6 Towards Final Project
  • 27.7 What Have We Learned?
19 Phylogenomics: Gene / Species Trees, Reconciliation, Recombination Graphs

Chapter 28: Phylogenomics II

  • 28.1 Introduction
  • 28.2 Inferring Orthologs / Paralogs, Gene Duplication and Loss
  • 28.3 Reconstruction
  • 28.4 Modeling Population and Allele Frequencies
  • 28.5 SPIDIR
  • 28.6 Ancestral Recombination Graphs
  • 28.7 Conclusion
  • 28.8 Current Research Directions
  • 28.9 Further Reading
  • 28.10 Tools and Techniques
  • 28.11 What Have We Learned?
20 Personal Genomics, Disease Epigenomics: Systems Approaches to Disease

Chapter 34: Personal Genomes, Synthetic Genomes, Computing in C vs. Si

  • 34.1 Introduction
  • 34.2 Reading and Writing Genomes
  • 34.3 Personal Genomes
  • 34.4 Current Research Directions
  • 34.5 Further Reading
  • 34.6 Tools and Techniques
  • 34.7 What Have We Learned?

Chapter 36: Cancer Genomics

  • 36.1 Introduction
  • 36.2 Characterization
  • 36.3 Interpretation
  • 36.4 Current Research Directions
  • 36.5 Further Reading
  • 36.6 Tools and Techniques
  • 36.7 What Have We Learned?
21 Three-Dimensional Chromatin Interactions: 3C, 5C, HiC, ChIA-Pet

Chapter 30: Population Genetic Variation

  • 30.1 Introduction
  • 30.2 Population Selection Basics
  • 30.3 Genetic Linkage
  • 30.4 Natural Selection
  • 30.5 Human Evolution
  • 30.6 Current Research
  • 30.7 Further Reading
22 Genome Engineering with CRISPR / Cas9 and Related Technologies No readings for this lecture.