6.877J | Fall 2005 | Graduate

Computational Evolutionary Biology

Labs

Overview

Instructions for using the software for each laboratory will be provided. Most of the programs and all the data will be available for download to PCs, Macintosh®, and Linux®-based systems. Alternatively, you may use MIT servers’ computational resources.

  • Goals: The aim of the lab work is to make sure that you get a chance to apply the theory to real data. We can talk and talk about evolutionary models, but in our experience, getting your hands dirty may be the best way to learn.
  • Mechanics: In general, the labs will involve computer work with modeling and application to ‘live’ data sets. The computational models and data will always be available for download to your own computer (PC, Macintosh®, or Linux® box), and the data will also be available on the MIT server. We will arrange for a separate laboratory section in the MIT cluster in the basement of the Stata building that will serve as a question-and-answer time in general and for lab questions in particular. During the latter part of the course, we can use this time to talk about projects.
  • Format: Each lab will be divided into three parts: first, a ‘warmup’ section to get familiar with the computational tools; second, brief questions that review and extend the lecture material; and third, a substantive application to a real data set drawn from the literature. We anticipate that each laboratory will take about two weeks, unless otherwise noted. In general, you won’t need to do your own programming in C++, Perl, Java®, etc.; rather, we will use off-the-shelf tools. However, some later questions and projects may require elementary data-manipulation skills, which you should be able to exercise with Perl, MATLAB®, or almost any similar set of tools. If you find that challenging, do not despair. Please just talk to me: we can team up people who have complementary skill sets, substitute alternative questions, etc.
  • Lab Reports: We will strive to make this a ‘paperless’ class. Lab reports should be written up as Web pages, with the final URL sent to me.
  • Policy on Cooperative Work: We strongly encourage cooperation. You are free to work in groups, but please write up your lab yourself. The exception is the Final Project, which is a team effort and requires only one write-up.

Lab 1: Forces of Evolution

Part 1 (PDF)
Part 2 (PDF)

Lab 2: Evolution, Polymorphism and the Coalescent

Part 1 (PDF)
Part 2 (PDF)

Supporting files for Lab 2 (Part 1):

5-sequence-data.txt (TXT)
PiS-awk.txt (TXT)
sites-testdata.txt (TXT)

Supporting files for Lab 2 (Part 2):

ms is from Professor Richard Hudson at the University of Chicago. The following compiled binaries and source files are provided courtesy of Richard Hudson and are used with permission.

msmac.tar.gz (GZ) (The GZ file contains: params, clms, migmat, col1, ms, stats, sample_stats, seedms, ._.DS_Store, and .DS_Store.)
mslinux.tar.gz (GZ) (The GZ file contains: clms, col1, seedms, ms, stats, and sample_stats.)
mslinuxsource.tar.gz (GZ) (The GZ file contains: 9 .c files, 1 .h file, clms, col1, migmat, params, readme, and seedms.)
msmacsource.tar.gz (GZ) (The GZ file contains: 10 .c files, ._.DS_Store, .DS_Store, clms, migmat, params, readme, seedms, and 1 .h file.)
mssource.zip (ZIP) (The ZIP file contains: 9 .c files, clms, col1, migmat, params, readme, seedms, and 1 .h file.)
ms - A Program for Generating Samples under Neutral Models (PDF)

Lab 3: Detecting Selection

Part 1 (PDF)
Part 2 (PDF)

Supporting files for Lab 3 (Part 1):

falciparum.fas (FAS)
pf11_0344.fasta (FASTA)
Pfa3D7_chr11ORFs50.fasta (FASTA)

Supporting files for Lab 3 (Part 2):

PAML Description (PDF)
paml-exercise1.ctl (CTL)
codonmlpair.ctl (CTL)
mhc.phy (PHY)
codonmlsites.ctl (CTL)
codonmllineage.ctl (CTL)
paml_gstd1_seqfile.txt (TXT)

Lab Supporting Files

The supporting files are needed to complete the lab exercises. Each lab explains the files and procedures needed to perform the various exercises and comparative tests in PAML, the software used in the course. To use PAML, users must provide two files: a control (.ctl) file, which specifies the particular model and estimation method to use, and a sequence data (.phy) file in the proper format.
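
As a rough illustration of what a control file contains, the Python sketch below writes and re-reads a small set of "name = value" options in the style of PAML's codeml control files. The option names are ones codeml recognizes, but the file names and values are hypothetical placeholders, not the settings in the course's .ctl files. Consult the PAML Description and each lab's instructions for the actual settings.

```python
# Sketch only: option names follow PAML's codeml control-file conventions;
# all file names and values are hypothetical, not the course's settings.


def render_ctl(options):
    """Return control-file text with one 'name = value' line per option."""
    width = max(len(name) for name in options)
    lines = [f"{name.rjust(width)} = {value}" for name, value in options.items()]
    return "\n".join(lines) + "\n"


def parse_ctl(text):
    """Read 'name = value' lines back into a dict, ignoring '*' comments."""
    options = {}
    for line in text.splitlines():
        line = line.split("*", 1)[0].strip()  # '*' starts a comment
        if "=" in line:
            name, value = line.split("=", 1)
            options[name.strip()] = value.strip()
    return options


example_options = {
    "seqfile": "example.phy",      # hypothetical sequence file (PHYLIP format)
    "treefile": "example.tree",    # hypothetical tree file
    "outfile": "example_out.txt",  # hypothetical results file
    "runmode": 0,                  # 0: evaluate the user-supplied tree
    "seqtype": 1,                  # 1: codon sequences
    "CodonFreq": 2,                # 2: F3X4 codon frequencies
    "model": 0,                    # 0: one omega ratio for all branches
    "NSsites": 0,                  # 0: one omega ratio for all sites
    "fix_omega": 0,                # 0: estimate omega rather than fix it
    "omega": 0.4,                  # starting value for omega (dN/dS)
}

if __name__ == "__main__":
    print(render_ctl(example_options), end="")
```

Changing the model or estimation method then amounts to editing a few values in the control file (for example, model or NSsites) while the sequence file stays the same.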

The Population Genetics and Evolution Simulation

Version 5.3 of the Populus Java® computer program, which is needed to complete Lab 1, is available at the University of Minnesota Web site. Make sure you have checked for Java® compatibility according to the instructions on that page. A PDF file of instructions for all modules is available (PDF).