Course Meeting Times
Lectures: 2 sessions / week, 1.5 hours / session
Course Description
Biology and medicine are entering a new era characterized as “data-rich.” In biological research, a single laboratory can produce terabytes of data per month that need to be shared across the research community. Drug development involves testing hundreds of compounds with laboratory tests that generate huge amounts of data, all of which must be analyzed and shared. Clinical trials assay thousands of individual data elements on hundreds of patients over many time points.
The objective of this course is to provide the students with the knowledge to address these challenges. We focus on the storage, integration, querying and management of heterogeneous, voluminous, geographically dispersed biomedical data. In addition to primary data, such as experimental data, the methods also address derived data such as those from analyzed microscope images. Examples of pathway analysis methods and the sharing and storage of the data that they generate will be presented. Querying across multiple databases is described, where the databases can be as diverse as microarray experiments, curated databases compiled by domain experts, or biomedical images. Other data sources include medical records, information on disease, references to literature, and biological pathways predicting protein expression. Several current examples from biological research will be presented.
Prerequisites
1.00 Introduction to Computers and Engineering Problem Solving; 6.001 Structure and Interpretation of Computer Programs; or experience with Web-based computing.
Reference Materials
There is no recommended textbook for this course because, to the best of our knowledge, no single textbook addresses the breadth and depth of the issues covered. The reference materials will therefore be the lecture notes, the course instructors' research experience in biomedical data management, and a set of research papers.
Term Paper
A term paper is required of all students. The subject is the student's choice; examples include a driving problem in research, a new idea for managing biomedical data, or an improvement on an existing system.
Calendar
Instructor Key:
CFD = Prof. C. Forbes Dewey, Jr. (MIT)
SSB = Prof. Sourav S. Bhowmick (NTU, Singapore)
HY = Prof. Hanry Yu (NUS, Singapore)
LEC # | TOPICS | INSTRUCTORS | KEY DATES |
---|---|---|---|
Part 1: Introduction | |||
1 | Biomedical information technology today - Grand challenge problems in biology and medicine | CFD | |
Part 2: Biological and Medical Data | |||
2 | Types and characteristics of biological and medical data - Distributed data systems | HY | Assignment 1 out |
3 | Examples from liver fibrosis - Gel electrophoresis | HY | |
Part 3: Storing, Querying, and Integrating Biomedical Data | |||
4 | Data avalanche in the biomedical world and role of databases - Relational data model | SSB | |
5 | Designing good database schema - Functional dependencies | SSB | |
6 | Querying relational databases using SQL | SSB | |
7 | Querying relational databases using SQL (cont.) - Limitations of relational data | SSB | |
8 | Issues in querying XML data using XPath and XQuery - XML query languages | SSB | Assignment 1A due |
9 | Querying XML data (cont.) - XML and relational databases | SSB | |
10 | Querying graphs (molecular networks) - Querying pathways and protein sources | SSB | |
11 | Data integration without semantics - Issues related to biological data integration | SSB | Assignment 1B due |
Part 4: Ontology Management in Systems Biology | |||
12 | Definitions and importance of ontologies - Standards for publishing and sharing ontologies (OWL, RDF) | CFD | |
13 | Database approach to ontology storage and inference | SSB | Assignment 2 out |
14 | Creating relational databases from ontologies - OWLdb | CFD | |
15 | Querying ontologies with SPARQL - Integrating ontologies and XML query processing | SSB | |
Part 5: Biological Pathways | |||
16 | Modeling and computing pathways - Modeling and representation of pathways (SBML, CellML) | CFD | |
17 | Discussions with TAs related to assignments and term project | ||
18 | Molecular network comparisons - Importance of molecular network comparison | SSB | |
Part 6: Biological and Medical Data Integration | |||
19 | SWAN: An advanced architecture for sharing scientific information - The stakeholders, requirements, and functionality | Guest lecturer: Tim Clark, Harvard University | Assignment 2 due 3 days later |
20 | Building a distributed pathway-enabled information system for biological research - Scope of the data sources and the application constraints | CFD | |
Part 7: Grand Challenges | |||
21 | Predicting drug efficacy by modeling - Current limits of predictability | CFD | |
22 | Revolutionizing the drug discovery pipeline - The need for change | CFD | |
Part 8: Term Paper Presentations | |||
24-26 | Term paper presentations |