WEBVTT

00:00:01.000 --> 00:00:06.000
Good morning. Welcome back.
So, the Red Sox won, it's pretty

00:00:06.000 --> 00:00:13.000
convincing, yeah,
very good. Yay Red Sox.

00:00:13.000 --> 00:00:20.000
So, as you can also tell,
I have something of a cold,

00:00:20.000 --> 00:00:27.000
so I'll see if I, if my voice makes
it through, but what I wanted to do

00:00:27.000 --> 00:00:34.000
today, if the voice allows,
was to talk about genomics.

00:00:34.000 --> 00:00:38.000
Now, this is a little bit different
than what we normally do in the

00:00:38.000 --> 00:00:42.000
class because, I
work on genomics,

00:00:42.000 --> 00:00:46.000
it's something I'm
extremely interested in.

00:00:46.000 --> 00:00:50.000
And so, what I wanted to do today,
and I'll do it one more time before

00:00:50.000 --> 00:00:54.000
the end of the term, is to
talk about research that's

00:00:54.000 --> 00:00:58.000
going on in genomics, give
you a sense of what's really

00:00:58.000 --> 00:01:02.000
going on. I can assure you that
what I say is not going to be in the

00:01:02.000 --> 00:01:05.000
text book, or any other text book.
And, I'm not entirely sure how this

00:01:05.000 --> 00:01:08.000
might appear on an exam, so
don't ask, because I'm really

00:01:08.000 --> 00:01:12.000
just going to talk about
research that's going on today.

00:01:12.000 --> 00:01:15.000
And part of the purpose in doing
that is to a, show you that it's

00:01:15.000 --> 00:01:18.000
possible for you to understand the
kind of research that's going on in

00:01:18.000 --> 00:01:21.000
this field, and b, to excite
you about what's going on

00:01:21.000 --> 00:01:25.000
in this field. So each
year I pick different

00:01:25.000 --> 00:01:28.000
things to talk about, and
I've picked a few things,

00:01:28.000 --> 00:01:32.000
and we'll see. So feel
free to interrupt and to ask

00:01:32.000 --> 00:01:36.000
questions, and all of that, but
this is very much more, sort of

00:01:36.000 --> 00:01:40.000
the edge of genomics,
including stuff that's going on,

00:01:40.000 --> 00:01:44.000
you know, right now as we
speak. So, we'll fire away.

00:01:44.000 --> 00:01:48.000
So a little introductory stuff.
I call this, we can actually keep

00:01:48.000 --> 00:01:52.000
the lights up,
I think people,

00:01:52.000 --> 00:01:56.000
can people read that?
Yeah, it's fine, good,

00:01:56.000 --> 00:02:00.000
so we'll leave the lights
up and I can see people.

00:02:00.000 --> 00:02:04.000
So, I think the thing that sets
apart this revolution of biology

00:02:04.000 --> 00:02:08.000
that we're living through right
now, is the transformation of biology,

00:02:08.000 --> 00:02:12.000
not just from being the
study of living organisms,

00:02:12.000 --> 00:02:16.000
to the study of chemicals and
enzymes, to the study of molecules,

00:02:16.000 --> 00:02:20.000
but to the study of biology
as information. That is what's

00:02:20.000 --> 00:02:24.000
distinctive about this decade,
is the idea that the information

00:02:24.000 --> 00:02:28.000
sciences have begun to merge with
biology, or biology merged with

00:02:28.000 --> 00:02:32.000
information sciences, and
that it's having a profound

00:02:32.000 --> 00:02:36.000
effect on driving biomedicine. In
both of the two talks I'll give,

00:02:36.000 --> 00:02:40.000
this one and near the end of the
term, that will be the common theme,

00:02:40.000 --> 00:02:44.000
because I think that's the most
important thing that's going on

00:02:44.000 --> 00:02:48.000
right now. Now,
just to remind you,

00:02:48.000 --> 00:02:52.000
of course, the idea that biology
is about information is an old one,

00:02:52.000 --> 00:02:56.000
it goes back to my hero, Gregor
Mendel, with the recognition that

00:02:56.000 --> 00:03:00.000
information was passed from parent
to offspring, according to rules.

00:03:00.000 --> 00:03:04.000
And, as you know, the
history of biology in the 20th

00:03:04.000 --> 00:03:08.000
century can be read as the
development of biology as information.

00:03:08.000 --> 00:03:12.000
The first quarter of the 20th
century was the development of the

00:03:12.000 --> 00:03:16.000
idea that the information lives
in chromosomes. The next quarter of

00:03:16.000 --> 00:03:20.000
the 20th century, the idea
that the information of the

00:03:20.000 --> 00:03:24.000
chromosomes resides in the DNA
double-helix, and that information

00:03:24.000 --> 00:03:28.000
was contained in this molecule,
and somehow in its sequence, and

00:03:28.000 --> 00:03:31.000
you know all of this. And
the next quarter of the 20th

00:03:31.000 --> 00:03:35.000
century, basically from 1950 to
1975, understanding how it is that the

00:03:35.000 --> 00:03:39.000
cell reads out that information,
from DNA to RNA to protein, how it

00:03:39.000 --> 00:03:43.000
uses a genetic code to
translate RNAs into proteins,

00:03:43.000 --> 00:03:46.000
and the development of the tools
of recombinant DNA that made it

00:03:46.000 --> 00:03:50.000
possible for us to read out the
information that the cell reads out.
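
NOTE
Editorial sketch, not part of the spoken lecture: a minimal illustration of the DNA to RNA to protein readout described above.
It assumes the standard genetic code and simplifies transcription to swapping T for U on the coding strand.
The names CODON_TABLE, transcribe and translate are hypothetical helpers chosen for this example.
```python
# Standard genetic code, built from the usual compact 64-letter
# amino-acid string with codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}
def transcribe(coding_dna: str) -> str:
    """Simplification: write the coding strand as RNA by swapping T for U."""
    return coding_dna.upper().replace("T", "U")
def translate(mrna: str) -> str:
    """Read codons three letters at a time until a stop codon ('*')."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        amino = CODON_TABLE[mrna[pos:pos + 3]]
        if amino == "*":
            break
        protein.append(amino)
    return "".join(protein)
```
Calling translate(transcribe("ATGGCCTAA")) walks the codons AUG, GCC, UAA and stops at the stop codon.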

00:03:50.000 --> 00:03:54.000
So that brought us ¾ of the
way through the 20th century,

00:03:54.000 --> 00:03:58.000
with the ability to read out genetic
information, at least in little ways,

00:03:58.000 --> 00:04:02.000
but they were little ways.
You could write a PhD thesis,

00:04:02.000 --> 00:04:07.000
around that time, for
sequencing 200 letters of DNA.

00:04:07.000 --> 00:04:12.000
That would be, you know,
considered an amazingly exciting PhD

00:04:12.000 --> 00:04:17.000
thesis. The next quarter of the
20th century, the last quarter of

00:04:17.000 --> 00:04:22.000
the 20th century, was
characterized by a voracious

00:04:22.000 --> 00:04:27.000
appetite to read as much of
this information as possible.

00:04:27.000 --> 00:04:32.000
It started, first, with trying to
read out the sequence of individual

00:04:32.000 --> 00:04:37.000
genes, then sets of genes,
then genomes of small organisms,

00:04:37.000 --> 00:04:41.000
bacteria, medium-sized
organisms. And then, you know,

00:04:41.000 --> 00:04:45.000
in a wonderful closure to the 20th
century, the reading out of the

00:04:45.000 --> 00:04:48.000
nearly complete genetic information
of the human being in the closing

00:04:48.000 --> 00:04:52.000
weeks of the 20th century. When
you remember that, that Mendel

00:04:52.000 --> 00:04:55.000
was rediscovered in January of 1900,
that's when the papers rediscovering

00:04:55.000 --> 00:04:59.000
Mendel came out, and you
figure you've got perfect

00:04:59.000 --> 00:05:02.000
bookends from the rediscovery
of Mendel in January of 1900,

00:05:02.000 --> 00:05:06.000
to the sequencing of the
human genome in around 2000.

00:05:06.000 --> 00:05:09.000
You realize what a century can do.
It's not bad, as centuries go, you

00:05:09.000 --> 00:05:12.000
know, to accomplish all that, and
it gives, you know, as students,

00:05:12.000 --> 00:05:15.000
you get a point estimate in
time of what science knows,

00:05:15.000 --> 00:05:18.000
but you guys aren't old enough yet
and haven't lived long enough yet,

00:05:18.000 --> 00:05:22.000
to measure the derivative, and
see how rapidly it's changing.

00:05:22.000 --> 00:05:25.000
But just look at what happened
over the course of that century,

00:05:25.000 --> 00:05:28.000
and then just project forward
to what that can mean for

00:05:28.000 --> 00:05:32.000
the next century. So what
that's done is it's brought

00:05:32.000 --> 00:05:36.000
us to the next picture. I
have a picture in my head,

00:05:36.000 --> 00:05:40.000
of biology as a vast library
of information, a library of

00:05:40.000 --> 00:05:44.000
information in which evolution
has been taking patient notes.

00:05:44.000 --> 00:05:48.000
Evolution is a very
good experimentalist,

00:05:48.000 --> 00:05:52.000
and it's a very patient note taker.
Its notes, of course, are written

00:05:52.000 --> 00:05:56.000
in the genomes, and
every day evolution wakes up,

00:05:56.000 --> 00:06:00.000
changes a few nucleotides,
sees how the organism works,

00:06:00.000 --> 00:06:04.000
if it was an improvement,
evolution keeps the notes,

00:06:04.000 --> 00:06:08.000
if it was disadvantageous,
evolution discards the notes.
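
NOTE
Editorial sketch, not part of the spoken lecture: the keep-or-discard note taking described above, written as a toy loop.
The fitness function (matches to a fixed target string) is an invented assumption; real selection has no target sequence.
For simplicity it changes one nucleotide per "day" rather than a few, and evolve and fitness are hypothetical names.
```python
import random
NUCLEOTIDES = "ACGT"
def fitness(genome: str, target: str) -> int:
    """Toy fitness (an assumption): number of positions matching a target."""
    return sum(g == t for g, t in zip(genome, target))
def evolve(genome: str, target: str, days: int, seed: int = 0) -> str:
    """Each 'day', change one nucleotide; keep the change only if it is not worse."""
    rng = random.Random(seed)
    for _ in range(days):
        pos = rng.randrange(len(genome))
        mutant = genome[:pos] + rng.choice(NUCLEOTIDES) + genome[pos + 1:]
        if fitness(mutant, target) >= fitness(genome, target):
            genome = mutant  # improvement or neutral change: keep the note
        # otherwise the note is discarded and the old genome carries on
    return genome
```
Because worse mutants are always discarded, the fitness of the returned genome can never be lower than that of the starting genome.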

00:06:08.000 --> 00:06:11.000
That, by the way, for those
of you working in labs,

00:06:11.000 --> 00:06:14.000
is no longer considered
appropriate laboratory practice.

00:06:14.000 --> 00:06:17.000
You're obliged to keep your
laboratory notes from failed

00:06:17.000 --> 00:06:20.000
experiments, as well, but
evolution got into this before

00:06:20.000 --> 00:06:23.000
those rules were codified, and
so it discards the notes from

00:06:23.000 --> 00:06:26.000
unsuccessful experiments,
and keeps the notes from the

00:06:26.000 --> 00:06:29.000
successful experiments. But
nonetheless, we have all the

00:06:29.000 --> 00:06:32.000
notes from the successful
experiments, and we can learn a

00:06:32.000 --> 00:06:35.000
tremendous amount from it.
There's a volume on the shelf

00:06:35.000 --> 00:06:38.000
corresponding to each species on
the planet. There's a volume on the

00:06:38.000 --> 00:06:41.000
shelf corresponding to each
individual within each species,

00:06:41.000 --> 00:06:44.000
to each tissue within each
individual within each species,

00:06:44.000 --> 00:06:47.000
and there's information
there about the DNA sequence,

00:06:47.000 --> 00:06:50.000
about the RNA readouts, about
the protein expression levels,

00:06:50.000 --> 00:06:53.000
and in principle, even if not yet
in practice, we can pull down any

00:06:53.000 --> 00:06:56.000
volume we want,
and interrogate it,

00:06:56.000 --> 00:06:59.000
and compare it for related species,
for individuals within a species,

00:06:59.000 --> 00:07:02.000
some of whom might have a disease,
some of whom might not, for

00:07:02.000 --> 00:07:06.000
different kinds of tissues
treated in different ways.

00:07:06.000 --> 00:07:09.000
That is, I think, going
to be a tremendous theme of

00:07:09.000 --> 00:07:12.000
biology going forward, and
that's why it's a particular

00:07:12.000 --> 00:07:16.000
pleasure to teach biology at MIT,
where you guys understand what that

00:07:16.000 --> 00:07:19.000
could mean, that fusion could
mean. Now, this idea of extracting

00:07:19.000 --> 00:07:23.000
genomic information on a large scale,
is a relatively new one. In the

00:07:23.000 --> 00:07:26.000
mid-1980's, the scientific community
began debating what was a pretty

00:07:26.000 --> 00:07:30.000
radical idea, sequencing
the human genome.

00:07:30.000 --> 00:07:33.000
This was floated in a couple of
places, in 1984 at one meeting,

00:07:33.000 --> 00:07:37.000
somebody raised the idea, you've got
to realize that sequencing itself,

00:07:37.000 --> 00:07:41.000
that sequencing DNA, only
came from the late 70's,

00:07:41.000 --> 00:07:45.000
so within six, seven years of
being able to sequence anything,

00:07:45.000 --> 00:07:49.000
people were now saying,
let's sequence everything.

00:07:49.000 --> 00:07:52.000
That was a reasonably audacious
thing to do, and it was

00:07:52.000 --> 00:07:56.000
controversial. There
were many people who felt

00:07:56.000 --> 00:08:00.000
that the human genome
project was a terrible idea,

00:08:00.000 --> 00:08:04.000
and with good reason, because
the initial version of the

00:08:04.000 --> 00:08:08.000
human genome project was, kind
of, a blunderbuss approach.

00:08:08.000 --> 00:08:11.000
It was, let's immediately mount a
massive factory and start sequencing

00:08:11.000 --> 00:08:15.000
the human genome with the just
horrible technologies of the

00:08:15.000 --> 00:08:19.000
mid-80's, with radioactive
sequencing gels,

00:08:19.000 --> 00:08:22.000
and you know, lots and
lots of people doing stuff.

00:08:22.000 --> 00:08:26.000
And so, you know, many people in
science were, were concerned that an

00:08:26.000 --> 00:08:30.000
entire generation of students
would need to be chained to the

00:08:30.000 --> 00:08:33.000
bench, sequencing
DNA. Sydney Brenner,

00:08:33.000 --> 00:08:37.000
a great molecular biologist,
proposed the whole thing be done at

00:08:37.000 --> 00:08:41.000
institutions [LAUGHTER],
because you know, people could be

00:08:41.000 --> 00:08:45.000
sentenced to, 20 million bases,
with time off for accuracy, or

00:08:45.000 --> 00:08:48.000
things like that [LAUGHTER].
And so what happened was, the

00:08:48.000 --> 00:08:52.000
scientific community came
together well, in its best form.

00:08:52.000 --> 00:08:56.000
A group was put together by
the National Academy of Sciences,

00:08:56.000 --> 00:09:00.000
who said, well look, this
is a really good idea,

00:09:00.000 --> 00:09:04.000
but we also need a carefully
thought-through program to do it.

00:09:04.000 --> 00:09:07.000
We need intermediate goals that will
get us things that will advance the

00:09:07.000 --> 00:09:10.000
science along the way, we need
to improve the technologies,

00:09:10.000 --> 00:09:13.000
and laid out a plan. The goals of
that plan, to develop a genetic map,

00:09:13.000 --> 00:09:16.000
a map showing the locations
of DNA polymorphisms,

00:09:16.000 --> 00:09:19.000
sites of variation, genetic
markers, just like Sturtevant

00:09:19.000 --> 00:09:22.000
did with fruit flies,
but to do it with humans,

00:09:22.000 --> 00:09:25.000
and with DNA sequence differences,
to be used to trace inheritance.
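
NOTE
Editorial sketch, not part of the spoken lecture: how a genetic map orders markers, in the spirit of the fruit fly mapping mentioned above.
It assumes three linked markers and that the pair that recombines most often lies at the two ends.
The function name and the recombination fractions are hypothetical, invented for illustration.
```python
def order_three_markers(rf: dict[tuple[str, str], float]) -> list[str]:
    """Order three linked markers from pairwise recombination fractions.
    The pair that recombines most often is assumed to sit at the two ends."""
    (left, right), _ = max(rf.items(), key=lambda item: item[1])
    markers = {m for pair in rf for m in pair}
    (middle,) = markers - {left, right}
    return [left, middle, right]
# Hypothetical recombination fractions between markers A, B and C.
example = {("A", "B"): 0.10, ("B", "C"): 0.15, ("A", "C"): 0.23}
```
With these made-up numbers, order_three_markers(example) places B between A and C, since A and C recombine most often.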

00:09:25.000 --> 00:09:28.000
That, that genetic map could
be used to map human diseases,

00:09:28.000 --> 00:09:31.000
and if all you accomplished was to get
a genetic map of the human being,

00:09:31.000 --> 00:09:34.000
that would be a good thing. Then
you could get a physical map of

00:09:34.000 --> 00:09:38.000
the human being, all the
pieces of DNA overlapping

00:09:38.000 --> 00:09:41.000
each other, so that you would know
if you had a genetic marker linked

00:09:41.000 --> 00:09:44.000
to cystic fibrosis, you
would be able to get the piece

00:09:44.000 --> 00:09:48.000
of DNA that contains the gene.
Then, if we managed to pull that

00:09:48.000 --> 00:09:51.000
off, we could get a sequence of
the human genome, all three billion

00:09:51.000 --> 00:09:54.000
nucleotides, on the web, so
that you could go to just any

00:09:54.000 --> 00:09:58.000
place on the genome,
double-click, and up would pop the

00:09:58.000 --> 00:10:01.000
sequence. Now,
you guys of course,

00:10:01.000 --> 00:10:04.000
don't laugh at that, but about eight
years ago, when I would give talks

00:10:04.000 --> 00:10:07.000
about this, I would speak about,
oh you'll be able to go double-click

00:10:07.000 --> 00:10:10.000
and up will pop the sequence,
and of course, everybody thought

00:10:10.000 --> 00:10:13.000
that was really funny, and
that, that was something people

00:10:13.000 --> 00:10:16.000
laughed at. But of course,
you can just do that today, if

00:10:16.000 --> 00:10:19.000
anybody has a wireless you can just
double-click, and up will pop the

00:10:19.000 --> 00:10:22.000
sequence. And then, of
course, a complete inventory of

00:10:22.000 --> 00:10:25.000
all the genes within that sequence.
And, very importantly, and from

00:10:25.000 --> 00:10:28.000
the very beginning, the notion
that all this information

00:10:28.000 --> 00:10:31.000
should be completely,
freely available to anybody,

00:10:31.000 --> 00:10:34.000
regardless of where they were,
whether in academia, or industry,

00:10:34.000 --> 00:10:37.000
in first world, third world
countries, that everybody should

00:10:37.000 --> 00:10:40.000
have free and unrestricted
access to that information.

00:10:40.000 --> 00:10:43.000
So a plan was laid out, I
won't go into the details here,

00:10:43.000 --> 00:10:46.000
but the plan was laid out that
involved work constructing genetic

00:10:46.000 --> 00:10:49.000
maps, physical maps,
sequence maps, in the human,

00:10:49.000 --> 00:10:53.000
the mouse, and some model organisms,
including bacteria, yeast, fruit

00:10:53.000 --> 00:10:56.000
flies, worms. And, quite
remarkably, it largely went

00:10:56.000 --> 00:11:00.000
according to plan, over the
course of about 15 years.

00:11:00.000 --> 00:11:03.000
A lot of people in the scientific
community came together and took up

00:11:03.000 --> 00:11:06.000
different tasks. I should
say, with some pride,

00:11:06.000 --> 00:11:09.000
that MIT was by far, one of the
leading contributors to this effort,

00:11:09.000 --> 00:11:13.000
having been involved in
essentially every stage of this,

00:11:13.000 --> 00:11:16.000
the genetic mapping of human and
mouse, the physical mapping of human

00:11:16.000 --> 00:11:19.000
and mouse, and the
sequencing of human and mouse,

00:11:19.000 --> 00:11:23.000
and having been the leading
contributor to the latter,

00:11:23.000 --> 00:11:26.000
and it's not an accident because
MIT's a marvelous environment in

00:11:26.000 --> 00:11:30.000
which to undertake
this kind of research.

00:11:30.000 --> 00:11:33.000
It involved changing the way we
do biology. Back in the mid-80's,

00:11:33.000 --> 00:11:37.000
when we sequenced DNA, we
did it with radioactivity,

00:11:37.000 --> 00:11:40.000
remember I taught you how to
sequence using radioactive labels and

00:11:40.000 --> 00:11:44.000
a gel, and all that.
That's how we did it,

00:11:44.000 --> 00:11:48.000
stood behind this plastic shield,
and you loaded the gels. Of course,

00:11:48.000 --> 00:11:51.000
now it's done in a highly automated
fashion. This is the production

00:11:51.000 --> 00:11:55.000
floor at the Broad Institute,
which is here at MIT, where robots

00:11:55.000 --> 00:11:59.000
prepare all the DNA samples, so
E. coli's grown up, and then you

00:11:59.000 --> 00:12:02.000
have to crack open the cells,
purify the DNA, purify the plasmid,

00:12:02.000 --> 00:12:06.000
do a sequencing reaction, etc.,
etc. it's all done robotically there,

00:12:06.000 --> 00:12:10.000
and this is capable of processing,
and does process, in a given day,

00:12:10.000 --> 00:12:13.000
about 200,000 samples per day.
They then go, and this is all

00:12:13.000 --> 00:12:17.000
equipment designed by people here at
MIT, and then commercially built for

00:12:17.000 --> 00:12:21.000
us. They then go to the
back room where, actually,

00:12:21.000 --> 00:12:24.000
these are the previous
generation of DNA sequencers,

00:12:24.000 --> 00:12:28.000
commercial detectors, those
capillary detectors that have

00:12:28.000 --> 00:12:32.000
little lasers on them, there's
a whole farm of them that

00:12:32.000 --> 00:12:36.000
sit there, and are
able to get data out.

00:12:36.000 --> 00:12:39.000
In the course of a single day, we
can now generate about 40 billion

00:12:39.000 --> 00:12:43.000
bases, I'm sorry, in the
course of a single year we

00:12:43.000 --> 00:12:46.000
can generate about 40
billion bases of DNA sequence.
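
NOTE
Editorial sketch, not part of the spoken lecture: back-of-the-envelope arithmetic on the throughput figures quoted above.
The three quoted numbers come from the lecture; running 365 days a year and one read per sample are assumptions.
These are raw bases, and every base has to be read many times over, so this is not a count of finished genomes.
```python
GENOME_SIZE = 3e9          # "all three billion nucleotides" (from the lecture)
BASES_PER_YEAR = 40e9      # "about 40 billion bases" per year (from the lecture)
SAMPLES_PER_DAY = 200_000  # "about 200,000 samples per day" (from the lecture)
DAYS_PER_YEAR = 365        # assumption: the floor runs every day of the year
genome_equivalents = BASES_PER_YEAR / GENOME_SIZE
bases_per_day = BASES_PER_YEAR / DAYS_PER_YEAR
bases_per_sample = BASES_PER_YEAR / (SAMPLES_PER_DAY * DAYS_PER_YEAR)
print(f"{genome_equivalents:.1f} human-genome equivalents of raw sequence per year")
print(f"{bases_per_day / 1e6:.0f} million bases per day")
print(f"{bases_per_sample:.0f} bases per sample, if every sample yields one read")
```
Under those assumptions the figures work out to a few hundred bases per sample.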

00:12:46.000 --> 00:12:50.000
The genome project itself, was
a collaboration involving 20

00:12:50.000 --> 00:12:53.000
different groups around the world,
groups in the United States, United

00:12:53.000 --> 00:12:57.000
Kingdom, France,
Germany, and Japan,

00:12:57.000 --> 00:13:01.000
and China. They were of different
sizes, they used different

00:13:01.000 --> 00:13:04.000
approaches, but everybody was
committed to one common cause of

00:13:04.000 --> 00:13:08.000
producing this information,
and making it freely available,

00:13:08.000 --> 00:13:11.000
and everybody worked together.
And for the rest of my life,

00:13:11.000 --> 00:13:15.000
when it comes to Friday, at 11
o'clock, I will always think genome

00:13:15.000 --> 00:13:19.000
project, because we had a weekly
conference call of all the groups in

00:13:19.000 --> 00:13:23.000
the world working on this Fridays,
at eleven, and it was a fascinating

00:13:23.000 --> 00:13:26.000
experience, there were many,
many years of that. So a draft

00:13:26.000 --> 00:13:30.000
sequence, a rough draft
sequence of the human genome,

00:13:30.000 --> 00:13:34.000
was published in the year,
in February of 2001, it was

00:13:34.000 --> 00:13:38.000
announced with some fanfare in June
of 2000, but the real scientific

00:13:38.000 --> 00:13:42.000
paper came out in
February of 2001.

00:13:42.000 --> 00:13:45.000
This was not a perfect
sequence of the human genome,

00:13:45.000 --> 00:13:48.000
by any means. We discovered about
90% of the sequence of the human

00:13:48.000 --> 00:13:51.000
genome. It still had about 150,000
gaps in it, it had errors. But,

00:13:51.000 --> 00:13:54.000
it still did have 90% of the
sequence of the human genome.

00:13:54.000 --> 00:13:57.000
For the next three years,
people worked very hard,

00:13:57.000 --> 00:14:00.000
and, as of last April, a
finished sequence of the human

00:14:00.000 --> 00:14:03.000
genome was produced, and was
published a couple weeks ago,

00:14:03.000 --> 00:14:06.000
and it contains, our
best guess, about 99.

00:14:06.000 --> 00:14:09.000
% of the human genome, and
it still has about 343 gaps,

00:14:09.000 --> 00:14:12.000
they're, we know what they are,
we know where they are, but they're

00:14:12.000 --> 00:14:16.000
not sequenceable with
current technology.

00:14:16.000 --> 00:14:19.000
That's the finished human genome.
What is it like? Well, this is a

00:14:19.000 --> 00:14:23.000
picture of the genome,
do we have a pointer, yes,

00:14:23.000 --> 00:14:27.000
I see here we do have a pointer.
This is your genome here, this is

00:14:27.000 --> 00:14:31.000
chromosome number 11, and
I'll call attention to some

00:14:31.000 --> 00:14:34.000
interesting bits. So
these colored lines here,

00:14:34.000 --> 00:14:38.000
represent genes, or
gene-predictions, based on both,

00:14:38.000 --> 00:14:42.000
sequencing of the DNA, and
mapping them back to the genome,

00:14:42.000 --> 00:14:46.000
as well as computer programs
that analyze the genome.

00:14:46.000 --> 00:14:49.000
And, right here, you have
a big pileup of lots of

00:14:49.000 --> 00:14:52.000
genes, very few genes over here.
Lots of genes, few genes. Notice

00:14:52.000 --> 00:14:55.000
the places where there are lots
of genes, match up with these

00:14:55.000 --> 00:14:58.000
light-grey bands, which
are the light-grey bands seen under

00:14:58.000 --> 00:15:01.000
the microscope, on
chromosomes. The places with

00:15:01.000 --> 00:15:04.000
very few genes match up with
the dark bands in the chromosome.

00:15:04.000 --> 00:15:08.000
Do you know why that is, that
the gene-rich regions are these

00:15:08.000 --> 00:15:12.000
light bands, and the gene-poor
regions are the chromosome dark

00:15:12.000 --> 00:15:16.000
bands? Me neither. Nobody
has a clue. It's really,

00:15:16.000 --> 00:15:20.000
it's really just one of these things.
We had no reason to expect that

00:15:20.000 --> 00:15:24.000
we'd see these striking patterns,
and other genomes, like E. coli, don't

00:15:24.000 --> 00:15:28.000
have this dense, urban
cluster, and these big,

00:15:28.000 --> 00:15:32.000
rural plains that are gene-poor.
This is very weird, and it's

00:15:32.000 --> 00:15:35.000
distinctive to mammals.
You'll also notice that the

00:15:35.000 --> 00:15:38.000
gene-rich regions, here,
are rich in G's and C's,

00:15:38.000 --> 00:15:41.000
they have different distributions
of some repeat elements,

00:15:41.000 --> 00:15:43.000
it's all sorts of weirdness that
comes from just looking at the

00:15:43.000 --> 00:15:46.000
genome. The biggest weirdness
was the number of genes,

00:15:46.000 --> 00:15:49.000
the count of genes is,
our best guess, about 22,000

00:15:49.000 --> 00:15:52.000
genes, if I had to pick a number
today, it would be our count of

00:15:52.000 --> 00:15:55.000
genes, and of course,
that's down from the 100,000

00:15:55.000 --> 00:15:58.000
that was in some textbooks,
and it's down from even 30 to 40,000

00:15:58.000 --> 00:16:01.000
that was in the genome
paper of February, 2001.

00:16:01.000 --> 00:16:04.000
Our best guess is that it's
really just about that range.

00:16:04.000 --> 00:16:07.000
Genes, themselves,
are very interesting.

00:16:07.000 --> 00:16:11.000
When you look at, you know, if
we only have 22,000 genes we know

00:16:11.000 --> 00:16:15.000
of, how do we manage to run a
human being with so few genes?

00:16:15.000 --> 00:16:19.000
It is, by the way, probably
fewer genes than the mustard weed,

00:16:19.000 --> 00:16:22.000
or Arabidopsis thaliana. So,
what do we do? Well, humans, one

00:16:22.000 --> 00:16:26.000
thing we may take comfort in,
is that we, although we only have

00:16:26.000 --> 00:16:30.000
about 22,000 genes,
there's a lot of alternative

00:16:30.000 --> 00:16:34.000
splicing, on average the
typical gene, on average,

00:16:34.000 --> 00:16:38.000
has about two alternative
splice products.
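
NOTE
Editorial sketch, not part of the spoken lecture: the arithmetic linking gene count to protein count through alternative splicing.
Whether "about two" means two products in total or two beyond the primary one is not stated, so both readings are shown.
The gene count and the 70,000 to 80,000 range are the lecture's figures; protein_count is a hypothetical helper.
```python
GENES = 22_000  # the lecture's best-guess gene count
def protein_count(genes: int, products_per_gene: float) -> int:
    """Distinct proteins if each gene averages this many splice products."""
    return round(genes * products_per_gene)
for products in (2, 3):
    print(products, "products per gene:", protein_count(GENES, products))
# Per-gene averages implied by the 70,000 to 80,000 range quoted in the lecture.
low, high = 70_000 / GENES, 80_000 / GENES
print(f"implied average: {low:.1f} to {high:.1f} products per gene")
```
The quoted range corresponds to a little over three products per gene on average.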

00:16:38.000 --> 00:16:41.000
Some have many, some
have few, but probably,

00:16:41.000 --> 00:16:45.000
when you're all done, those 22,000
genes may encode 70-80,000

00:16:45.000 --> 00:16:48.000
different proteins, and
it could be more than that

00:16:48.000 --> 00:16:52.000
because we don't know all the
alternative splice products,

00:16:52.000 --> 00:16:55.000
and what they do. But, if you ask, do
humans get credit for being really

00:16:55.000 --> 00:16:59.000
inventive or creative, for
having lots of new genes that

00:16:59.000 --> 00:17:03.000
make us human,
the answer is, no.

00:17:03.000 --> 00:17:06.000
Not only are humans not different
in their gene complement from other

00:17:06.000 --> 00:17:10.000
mammals, mammals, as a
group, really haven't invented

00:17:10.000 --> 00:17:14.000
that much, when you get down to it.
Most of the recognizable sub-domains

00:17:14.000 --> 00:17:18.000
of proteins, proteins are
built up of sub-domains,

00:17:18.000 --> 00:17:22.000
recognizable sequences that have
certain motifs that fold up in

00:17:22.000 --> 00:17:26.000
certain ways, or carry out
certain enzymatic functions.

00:17:26.000 --> 00:17:30.000
And it looks like our genomes,
our genes, are mixed-and-matched

00:17:30.000 --> 00:17:34.000
combinations of many domains that
were invented a long time ago,

00:17:34.000 --> 00:17:38.000
in invertebrates and before, and
that most of evolutionary innovation

00:17:38.000 --> 00:17:42.000
in the more complex,
multi-cellular animals,

00:17:42.000 --> 00:17:46.000
has simply been mixing-and-matching
these domains in new ways,

00:17:46.000 --> 00:17:50.000
to get slightly
different functions.

00:17:50.000 --> 00:17:54.000
You don't get a lot of points for
creativity, but it does seem to work.

00:17:54.000 --> 00:17:58.000
By far, the most derivative of all,
and what characterizes our genome

00:17:58.000 --> 00:18:02.000
tremendously is,
when a gene works,

00:18:02.000 --> 00:18:07.000
make extra copies of it,
and let it diverge slightly,

00:18:07.000 --> 00:18:11.000
and take up new functions. Really,
your genome is just characterized by

00:18:11.000 --> 00:18:15.000
large expansions of families,
immunoglobulin-like genes,

00:18:15.000 --> 00:18:20.000
intermediate filament proteins
holding together the cytoskeleton.

00:18:20.000 --> 00:18:23.000
There are 111 different
keratin-like genes in your genome.

00:18:23.000 --> 00:18:26.000
They're all different,
they do different things,

00:18:26.000 --> 00:18:29.000
but they all came from one
gene that was copied, copied,

00:18:29.000 --> 00:18:33.000
copied, at random, randomly
duplicated, and then diverged to

00:18:33.000 --> 00:18:36.000
take up new functions. Growth
factors, flies and worms

00:18:36.000 --> 00:18:39.000
managed to get by just fine,
thank you, with two growth factors

00:18:39.000 --> 00:18:43.000
of the TGF-beta class,
whatever that is. You have 42

00:18:43.000 --> 00:18:46.000
growth factors of this TGF-beta
class, all of which help

00:18:46.000 --> 00:18:50.000
communicate, cells
communicate, in different ways.

00:18:50.000 --> 00:18:53.000
And then, of course, all
the olfactory receptors.

00:18:53.000 --> 00:18:57.000
In your genome, you have about 1,000
genes for olfactory, for smell

00:18:57.000 --> 00:19:00.000
receptors. This is what Richard
Axel and Linda Buck won a Nobel

00:19:00.000 --> 00:19:04.000
Prize for this year, was
their work on the olfactory

00:19:04.000 --> 00:19:07.000
receptors. Sad to say though, out
of all your olfactory receptor

00:19:07.000 --> 00:19:11.000
genes, most of them are broken.
They're mostly pseudogenes.

00:19:11.000 --> 00:19:15.000
It's not true in dogs and mice,
who keep their olfactory receptor

00:19:15.000 --> 00:19:18.000
genes in pretty fine working order,
but it's very clear that in primates

00:19:18.000 --> 00:19:22.000
with color vision, our
olfactory receptor genes have

00:19:22.000 --> 00:19:25.000
been going to seed. They've
been piling up mutations,

00:19:25.000 --> 00:19:28.000
and there's no selective
pressure to keep many of them.

00:19:28.000 --> 00:19:32.000
And, in fact, we've now shown, in
a paper that will come out soon,

00:19:32.000 --> 00:19:35.000
that this process is accelerating
dramatically in the last 7 million

00:19:35.000 --> 00:19:38.000
years since we diverged from
chimps. And so, humans have almost

00:19:38.000 --> 00:19:42.000
completely lost interest in smell,
that's not totally true, some of

00:19:42.000 --> 00:19:45.000
these olfactory receptors surely
matter for various processes,

00:19:45.000 --> 00:19:48.000
but most of them are
probably irrelevant right now.

00:19:48.000 --> 00:19:52.000
And so, anyway, that's the
nature of the genes there.

00:19:52.000 --> 00:19:56.000
Anyway, another interesting fact
that's worth mentioning about your

00:19:56.000 --> 00:20:01.000
genome is half of your genome
consists of transposable elements,

00:20:01.000 --> 00:20:05.000
elements that simply duplicate
themselves, and hop around the

00:20:05.000 --> 00:20:10.000
genome. Elements that are
like viruses, they make a copy,

00:20:10.000 --> 00:20:14.000
sometimes in RNA, the RNA is copied
back into DNA and slammed elsewhere

00:20:14.000 --> 00:20:19.000
in your genome.
These elements,

00:20:19.000 --> 00:20:24.000
well the, there
are four classes.

00:20:24.000 --> 00:20:27.000
Alu elements, LINE elements,
retrovirus-like elements, all these

00:20:27.000 --> 00:20:30.000
go through RNA intermediates,
and use reverse transcription.

00:20:30.000 --> 00:20:34.000
And then there's certain DNA
transposons, that go through a DNA

00:20:34.000 --> 00:20:37.000
intermediate. The number of
copies of the Alu element,

00:20:37.000 --> 00:20:40.000
the Alu element that's
hopped around your genome,

00:20:40.000 --> 00:20:44.000
you have about a million, you
have a million fossils of this

00:20:44.000 --> 00:20:47.000
element. You say, why is
it there, and the answer is,

00:20:47.000 --> 00:20:50.000
because it's there. Because
anything that knows how to make a

00:20:50.000 --> 00:20:54.000
copy of itself, and insert
itself into the genome,

00:20:54.000 --> 00:20:57.000
you can't get rid of.
You can consider it,

00:20:57.000 --> 00:21:00.000
if you wish, an infection, but
half of your genome consists of

00:21:00.000 --> 00:21:03.000
an infection, with these
kinds of transposable elements.
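
NOTE
Editorial sketch, not part of the spoken lecture: a toy model of the copy-and-paste behavior of transposable elements described above.
It assumes insertion at a uniformly random site and ignores mutation and selection entirely.
The names retrotranspose and spread, and any sequences passed to them, are hypothetical.
```python
import random
def retrotranspose(genome: str, element: str, rng: random.Random) -> str:
    """Copy-and-paste: the source copy stays put and a new copy lands at a random site."""
    site = rng.randrange(len(genome) + 1)
    return genome[:site] + element + genome[site:]
def spread(genome: str, element: str, hops: int, seed: int = 0) -> str:
    """Apply the given number of hops, each adding one more copy of the element."""
    rng = random.Random(seed)
    for _ in range(hops):
        genome = retrotranspose(genome, element, rng)
    return genome
```
Nothing in this model ever removes a copy, so every hop lengthens the genome by exactly one element.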

00:21:03.000 --> 00:21:12.000
Now that's
it, yes?

00:21:12.000 --> 00:21:16.000
Well, it's very interesting,
what's the effect? Well, they do,

00:21:16.000 --> 00:21:20.000
some of them are transcribed
and, it's very interesting.

00:21:20.000 --> 00:21:24.000
Sometimes it's bad, one of them
will hop into a gene and mutate it,

00:21:24.000 --> 00:21:28.000
and that's bad, that person
will have a lethal mutation,

00:21:28.000 --> 00:21:32.000
but the genome has probably begun
to use them, and count on their being

00:21:32.000 --> 00:21:36.000
there. So, when a bunch,
when a transposable element goes in,

00:21:36.000 --> 00:21:40.000
and creates a spacing, if you,
for example, if an engineering

00:21:40.000 --> 00:21:44.000
committee came in and cleaned up
the genome by getting rid of all the

00:21:44.000 --> 00:21:48.000
transposable elements,
it would surely not work.

00:21:48.000 --> 00:21:51.000
Because we have evolutionarily
come to count on the spacing there.

00:21:51.000 --> 00:21:55.000
It's sort of like, if in some very,
some very messy attic, you put a cup

00:21:55.000 --> 00:21:58.000
of coffee down on top of a stack of
papers, those papers may be utterly

00:21:58.000 --> 00:22:02.000
irrelevant, but now they're holding
up that cup of coffee that you put

00:22:02.000 --> 00:22:06.000
down on it. And if you were to just,
poof, magically get rid of them,

00:22:06.000 --> 00:22:10.000
the cup of coffee would
come crashing to the ground.

00:22:10.000 --> 00:22:13.000
So, you know it,
they're just there,

00:22:13.000 --> 00:22:17.000
taking up space. Now sometimes,
even more than that, a few of them

00:22:17.000 --> 00:22:20.000
have actually been co-opted
into being human genes.

00:22:20.000 --> 00:22:24.000
We know that a few of these
transposable elements have mutated

00:22:24.000 --> 00:22:27.000
into being our genes
that do something for us.

00:22:27.000 --> 00:22:31.000
And others of them may do things in
affecting the general neighborhood

00:22:31.000 --> 00:22:35.000
with regard to transcription,
and so, instead of it being a

00:22:35.000 --> 00:22:38.000
parasite, think of them as a
symbiont, that's a genomic symbiont,

00:22:38.000 --> 00:22:42.000
which takes some advantage of us,
and we may, you know, have worked

00:22:42.000 --> 00:22:46.000
out a compromise to take
some advantage of it.

00:22:46.000 --> 00:22:49.000
Every time a copy is made of these,
and it hops in the genome, some

00:22:49.000 --> 00:22:53.000
mutations may happen in the master
element, but when it lands in the

00:22:53.000 --> 00:22:56.000
new place, we have a record of
that hop. And if you reconstruct the

00:22:56.000 --> 00:23:00.000
sequence of the million Alu
elements, you can see which ones are

00:23:00.000 --> 00:23:04.000
very close relatives of each other,
and had to have hopped recently, and

00:23:04.000 --> 00:23:08.000
which ones are somewhat
more distant relatives.

00:23:08.000 --> 00:23:11.000
And you can build an evolutionary
tree connecting all of the repeat

00:23:11.000 --> 00:23:14.000
elements that have hopped around
your genome, and thereby attaching a

00:23:14.000 --> 00:23:17.000
date to each of them,
as to when they hopped.
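
NOTE
A minimal sketch of the dating idea described here, not the method actually
used: measure how far a copy has drifted from a reference copy, then divide by
a neutral substitution rate. The function names, the toy sequences, and the
rate of 2e-9 substitutions per site per year are all assumptions made for
illustration, and repeated hits at the same site are ignored.
```python
def divergence(copy_seq, reference_seq):
    """Fraction of aligned positions where two equal-length sequences differ."""
    if len(copy_seq) != len(reference_seq):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(copy_seq, reference_seq))
    return diffs / len(copy_seq)
def estimated_age_years(div, subs_per_site_per_year):
    """Rough age: divergence divided by an assumed neutral substitution rate."""
    return div / subs_per_site_per_year
# Hypothetical inputs: a 4-base toy alignment and an assumed rate of 2e-9.
d = divergence("ACGT", "ACGA")
age = estimated_age_years(d, 2e-9)
```
Copies with small divergence would be recent hops, and copies with large
divergence would be old ones, which is what makes the repeats usable as fossils.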

00:23:17.000 --> 00:23:20.000
So it really is a fossil record,
and you can figure out how many of

00:23:20.000 --> 00:23:23.000
them have been hopping at
different times over history.

00:23:23.000 --> 00:23:27.000
And we can even make a plot
of that, this is long ago,

00:23:27.000 --> 00:23:30.000
sometime here, some 30 million years
ago, there was a huge explosion

00:23:30.000 --> 00:23:33.000
in transposition,
of transposons, in our genome.

00:23:33.000 --> 00:23:36.000
We don't know why that happened,
but it's very interesting, it does

00:23:36.000 --> 00:23:40.000
correspond to very interesting
periods of primate evolution.

00:23:40.000 --> 00:23:43.000
And then, interestingly,
there's been a huge crash,

00:23:43.000 --> 00:23:46.000
and transposition has dropped
dramatically. We have no clue why

00:23:46.000 --> 00:23:49.000
this is, but we have a whole
fossil record here of the rate of

00:23:49.000 --> 00:23:52.000
transposition of different kinds of
repeat elements around our genome,

00:23:52.000 --> 00:23:55.000
and people are now starting to try
to figure out what in the world this

00:23:55.000 --> 00:23:58.000
means. So all this is sort of
there, inherent in the sequence,

00:23:58.000 --> 00:24:01.000
and if you want the sequence, as
I say, you can go to the web and

00:24:01.000 --> 00:24:04.000
pull all this stuff now.
So how do we understand the

00:24:04.000 --> 00:24:07.000
sequence? Well, I've told
you a little bit about it,

00:24:07.000 --> 00:24:10.000
from the simple things that we've
done, but there's a lot more that

00:24:10.000 --> 00:24:13.000
needs to be learned about the
sequence, so what I really want to

00:24:13.000 --> 00:24:16.000
turn to, is how we're extracting
information out of this sequence.

00:24:16.000 --> 00:24:19.000
So, DNA sequence is long and
boring, it's only marginally more

00:24:19.000 --> 00:24:23.000
interesting than reading your hard
disk, because it has four letters,

00:24:23.000 --> 00:24:27.000
instead of ones and zeros, but it's,
you know, well, it's pretty really

00:24:27.000 --> 00:24:30.000
boring if you take a look at it.
How do you attach meaning to all

00:24:30.000 --> 00:24:34.000
this stuff? One of the most
powerful ways is by comparison with

00:24:34.000 --> 00:24:38.000
other genomes. And so,
comparing the human genome

00:24:38.000 --> 00:24:42.000
to the mouse genome is very
informative in many ways.

00:24:42.000 --> 00:24:45.000
So, as soon as the human genome
was far along, a portion of the

00:24:45.000 --> 00:24:49.000
international consortium, set
to work getting a sequence of

00:24:49.000 --> 00:24:52.000
the mouse genome. And that
was published in December

00:24:52.000 --> 00:24:56.000
of 2002. We have a nice map of the
mouse genome, with all these things,

00:24:56.000 --> 00:24:59.000
it, too, shows these gene-rich
regions, gene-poor regions,

00:24:59.000 --> 00:25:03.000
all sorts of funny things. And if
we look closely at a portion of the

00:25:03.000 --> 00:25:06.000
human genome over here, I've
picked about a million bases of

00:25:06.000 --> 00:25:10.000
the human genome, and we
take any little spot in that

00:25:10.000 --> 00:25:14.000
million bases of the human
genome, let's say over here.

00:25:14.000 --> 00:25:17.000
And we take the DNA sequence
corresponding to this spot,

00:25:17.000 --> 00:25:20.000
and we run it in the computer
against the mouse genome,

00:25:20.000 --> 00:25:23.000
and ask where in the mouse genome
do we get the best match for this,

00:25:23.000 --> 00:25:26.000
the best match to this is here.
Now let's do it for this piece,

00:25:26.000 --> 00:25:29.000
here. The best match anywhere in
the mouse genome lands in the same

00:25:29.000 --> 00:25:32.000
million bases here of
the mouse genome. In fact,

00:25:32.000 --> 00:25:36.000
for every single sequence that we
pull out from this million bases in

00:25:36.000 --> 00:25:39.000
the human genome, the best
match is in this million

00:25:39.000 --> 00:25:42.000
bases of the mouse genome.
That's very interesting. Why is

00:25:42.000 --> 00:25:45.000
that? Sorry? No,
people do know.

00:25:45.000 --> 00:25:49.000
It, it was a good try, though.
[LAUGHTER]. This million bases in

00:25:49.000 --> 00:25:52.000
the mouse genome, and this
million bases in the human

00:25:52.000 --> 00:25:56.000
genome, represent the evolutionary
descendants of a common million

00:25:56.000 --> 00:25:59.000
bases that occurred in our common
ancestor 75 million years ago.

00:25:59.000 --> 00:26:03.000
This is clear evidence
of evolution here,

00:26:03.000 --> 00:26:06.000
because we can see that this is
a segment of DNA from our common

00:26:06.000 --> 00:26:10.000
ancestor that really hasn't
undergone much rearrangement,

00:26:10.000 --> 00:26:14.000
and we can just line up
the sequences and see.

00:26:14.000 --> 00:26:17.000
In fact, we can build a whole map
across the mouse genome like this.

00:26:17.000 --> 00:26:20.000
For any bit of the mouse genome,
I don't know, here's a bit on mouse

00:26:20.000 --> 00:26:24.000
chromosome 17, this whole
stretch corresponds to a

00:26:24.000 --> 00:26:27.000
portion of human chromosome
number eight. This stretch here,

00:26:27.000 --> 00:26:30.000
I don't know, this green color
here on chromosome number six,

00:26:30.000 --> 00:26:34.000
corresponds to chromosome
four in the human. And so,

00:26:34.000 --> 00:26:37.000
we can build a look-up table that
says, for any portion of the human

00:26:37.000 --> 00:26:40.000
genome, what's the corresponding
portion of the mouse genome that

00:26:40.000 --> 00:26:44.000
came from the same ancestor, has
basically the same complement of

00:26:44.000 --> 00:26:47.000
genes in it. And there's
only about 330 such

00:26:47.000 --> 00:26:50.000
regions that we need to
cut-and-paste the human genome order

00:26:50.000 --> 00:26:53.000
to the mouse genome order,
roughly speaking. There's a lot of

00:26:53.000 --> 00:26:56.000
little local rearrangements,
but at this gross level. So now,

00:26:56.000 --> 00:26:59.000
if we go back more closely and
we look at this, and we say,

00:26:59.000 --> 00:27:03.000
OK, so now we look at this region,
we now know these two regions

00:27:03.000 --> 00:27:06.000
descend from a common ancestor,
if we do a careful evolutionary

00:27:06.000 --> 00:27:09.000
analysis, lining up all
the sequences, and see how

00:27:09.000 --> 00:27:12.000
well-preserved the sequences are,
some are much better preserved than

00:27:12.000 --> 00:27:16.000
others. Evolution
has been much more

00:27:16.000 --> 00:27:20.000
lovingly conserving some
sequences than others, and so,

00:27:20.000 --> 00:27:24.000
so let's now zoom-in on a gene,
this is a gene that goes by the name,

00:27:24.000 --> 00:27:28.000
PPAR-gamma, I'm fond of this gene but,
it doesn't matter. If we look, I've

00:27:28.000 --> 00:27:32.000
indicated all the regions here, in
which there's a heightened degree

00:27:32.000 --> 00:27:36.000
of conservation. The sequence
is well-conserved here,

00:27:36.000 --> 00:27:40.000
here, here, here, here,
here, here, and here,

00:27:40.000 --> 00:27:44.000
here, here, here, here, here.
These correspond to the exons of

00:27:44.000 --> 00:27:48.000
the PPAR-gamma gene, they
encode the protein of the gene,

00:27:48.000 --> 00:27:52.000
then the splicing goes like this, OK?
These things here do not correspond

00:27:52.000 --> 00:27:56.000
to the exons. People have
no idea what they are,

00:27:56.000 --> 00:28:00.000
in fact, this is not supposed to be
here. The official textbook picture

00:28:00.000 --> 00:28:04.000
says, the vast majority
of what matters for a gene,

00:28:04.000 --> 00:28:09.000
what evolution should preserve,
is the exons plus the promoter.

00:28:09.000 --> 00:28:13.000
Here's the promoter. But in
fact, what we found is that

00:28:13.000 --> 00:28:17.000
an awful lot more is being
preserved. In fact, across the genome,

00:28:17.000 --> 00:28:21.000
our best estimate is there are about
500,000 conserved elements across

00:28:21.000 --> 00:28:26.000
the genome, and only 1/3 of
them are protein-coding exons.

00:28:26.000 --> 00:28:30.000
That means 2/3 of the stuff
evolution has been interested in,

00:28:30.000 --> 00:28:34.000
is not protein-coding exons, and the
truth is, we do not know what it is,

00:28:34.000 --> 00:28:38.000
this was a very radical finding,
when this mouse paper came out,

00:28:38.000 --> 00:28:43.000
about a year and a half,
about two years ago now.

00:28:43.000 --> 00:28:47.000
What it must be, I
think, but we're guessing,

00:28:47.000 --> 00:28:51.000
are regulatory signals, the
structural elements in chromosomes,

00:28:51.000 --> 00:28:56.000
RNA genes, but there's an awful
lot more of it than we had imagined.

00:28:56.000 --> 00:28:59.000
And we've, now we're in
this fascinating situation,

00:28:59.000 --> 00:29:03.000
where computational analysis has
told us what's on evolution's mind,

00:29:03.000 --> 00:29:07.000
and now we have to go to the lab and
figure out what in the world it does.

00:29:07.000 --> 00:29:10.000
But there's no doubt that it must
do something, because evolution has

00:29:10.000 --> 00:29:14.000
preserved it quite well. Now,
I oversimplified greatly in

00:29:14.000 --> 00:29:18.000
this discussion,
let me first say,

00:29:18.000 --> 00:29:21.000
and I'll come back to that. We
do know, if we take some of those

00:29:21.000 --> 00:29:24.000
elements, here's one, there's
a 481 base-pair element

00:29:24.000 --> 00:29:27.000
that's 84% identical between human
and mouse. You could write yourself

00:29:27.000 --> 00:29:31.000
a little statistical model to say
that's way unusual to have something

00:29:31.000 --> 00:29:34.000
that's so well preserved. When
Eddy Rubin and his colleagues

00:29:34.000 --> 00:29:37.000
from Berkeley made a knockout
mouse that deleted that segment,

00:29:37.000 --> 00:29:40.000
this knockout mouse loses regulation
of three different genes in the

00:29:40.000 --> 00:29:43.000
neighborhood, saying that this
must be a regulatory sequence that

00:29:43.000 --> 00:29:47.000
affects multiple genes
in the neighborhood. That,

00:29:47.000 --> 00:29:50.000
that's one, with about 300,000
such elements to go, in order to

00:29:50.000 --> 00:29:54.000
attach meaning to them. So
doing this entirely by knocking

00:29:54.000 --> 00:29:58.000
out mice will be a slow process,
one's going to need other ways to be

00:29:58.000 --> 00:30:02.000
able to attach meaning,
but there's no doubt. Now,

00:30:02.000 --> 00:30:06.000
there's some other interesting
papers where people have knocked

00:30:06.000 --> 00:30:10.000
some of these things out, and
they've seen no effect on the

00:30:10.000 --> 00:30:14.000
mouse. They get a totally viable
mouse. Can you conclude from that,

00:30:14.000 --> 00:30:18.000
that they have no function? Why
not? The knockout mouse is viable.

00:30:18.000 --> 00:30:22.000
Could be redundant, it
could even not be redundant,

00:30:22.000 --> 00:30:26.000
but yes, it could be redundant,
but you couldn't knock out both of

00:30:26.000 --> 00:30:29.000
two things. It turns
out, suppose knocking it

00:30:29.000 --> 00:30:33.000
out affected the mouse's viability
by one part in ten to the third,

00:30:33.000 --> 00:30:37.000
it was only 99.9% as fertile,
would you be able to see that in the

00:30:37.000 --> 00:30:41.000
laboratory? No. Would
that matter to evolution?

00:30:41.000 --> 00:30:44.000
It would be lethal, in
an evolutionary sense.

00:30:44.000 --> 00:30:48.000
Such a mutation could never
propagate through a population.
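
NOTE
A rough sketch of why a one-in-a-thousand disadvantage is invisible in the lab
but decisive over evolutionary time. It assumes a deliberately simple model
(haploid, deterministic, no drift) in which carriers leave (1 - s) as many
offspring per generation; the function name and the generation counts are
illustrative assumptions.
```python
def odds_after(generations, s, initial_odds=1.0):
    """Odds (carriers : non-carriers) of a variant whose carriers leave (1 - s) as many offspring."""
    return initial_odds * (1.0 - s) ** generations
s = 1e-3  # one part in ten to the third
lab = odds_after(10, s)        # a handful of generations in a mouse colony
deep = odds_after(10_000, s)   # evolutionary time
```
Over ten generations the odds barely move, while over ten thousand generations
they fall by several orders of magnitude under this model.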

00:30:48.000 --> 00:30:52.000
One part in ten to the third,
is massive selection against, from

00:30:52.000 --> 00:30:56.000
an evolutionary point of view,
but almost undetectable in a

00:30:56.000 --> 00:31:00.000
laboratory batch. Evolution
has a far more sensitive

00:31:00.000 --> 00:31:04.000
assay than we do. Now,
I won't go into detail,

00:31:04.000 --> 00:31:09.000
but for the mathematically inclined
here, showing that there really were

00:31:09.000 --> 00:31:13.000
about 5% of the human genome under,
under evolutionary selection, it was

00:31:13.000 --> 00:31:18.000
a complicated affair,
because with only two genomes,

00:31:18.000 --> 00:31:23.000
what we really had to do, and if
this doesn't make sense, ignore it.

00:31:23.000 --> 00:31:26.000
We looked at the background
distribution of conservation of the

00:31:26.000 --> 00:31:29.000
genome in unimportant elements,
in those repeat elements that we

00:31:29.000 --> 00:31:32.000
knew to be functionally
broken. We looked at the overall

00:31:32.000 --> 00:31:35.000
conservation of the genome, and
found that the overall genome

00:31:35.000 --> 00:31:38.000
has this rightward tail, by
subtracting the distributions we

00:31:38.000 --> 00:31:41.000
were able to see how much
excess conservation there was.
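
NOTE
A simplified sketch of the subtraction described here, under stated
assumptions: the whole-genome histogram of conservation scores is treated as a
mix of a neutral part (shaped like the broken repeats) and a selected part, and
the lowest-score bins are assumed to contain only neutral sequence. The
histograms below are made up for illustration; they are not data from the
mouse paper.
```python
def excess_conservation(whole, neutral, neutral_bins):
    """Estimate the non-neutral share of a conservation-score histogram.
    whole, neutral: lists of bin frequencies, each summing to 1.
    neutral_bins: indices of low-score bins assumed to hold only neutral sequence."""
    scale = sum(whole[i] for i in neutral_bins) / sum(neutral[i] for i in neutral_bins)
    excess = [max(0.0, w - scale * n) for w, n in zip(whole, neutral)]
    return scale, excess
# Made-up histograms, for illustration only.
neutral = [0.50, 0.30, 0.15, 0.05, 0.00]
selected = [0.00, 0.00, 0.10, 0.40, 0.50]
whole = [0.95 * n + 0.05 * s for n, s in zip(neutral, selected)]
scale, excess = excess_conservation(whole, neutral, neutral_bins=[0, 1])
fraction_under_selection = sum(excess)
```
The sum of the excess is the estimated share of sequence that is better
conserved than the neutral background would predict.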

00:31:41.000 --> 00:31:44.000
That's because we only had two
genomes, we had to draw inferences.

00:31:44.000 --> 00:31:47.000
If we had more genomes,
like the mouse and the rat,

00:31:47.000 --> 00:31:50.000
and the dog and the-this-and-the-that,

00:31:50.000 --> 00:31:54.000
we would be able to
extract signal from noise.

00:31:54.000 --> 00:31:57.000
We would be able to see right away,
which bits were well-conserved, and

00:31:57.000 --> 00:32:01.000
we wouldn't have to do this as
a sensitive statistical analysis.

00:32:01.000 --> 00:32:05.000
So, in fact, we need more mammalian
genomes, so, so right now there's

00:32:05.000 --> 00:32:09.000
been a sequence of the rat
genome in the past year or so,

00:32:09.000 --> 00:32:12.000
there's a sequence of the dog genome,
we're writing up that paper now,

00:32:12.000 --> 00:32:16.000
but it's on the web already.
There's a sequence of the chimpanzee

00:32:16.000 --> 00:32:20.000
genome we're writing up a paper
on that, in collaboration with our

00:32:20.000 --> 00:32:24.000
friends in the
genome-sequencing community.

00:32:24.000 --> 00:32:27.000
We're currently sequencing
a variety of other organisms,

00:32:27.000 --> 00:32:30.000
as well. And if you had enough
organisms, you ought to be able to

00:32:30.000 --> 00:32:34.000
just line it up and say,
what has evolution preserved,

00:32:34.000 --> 00:32:37.000
and figure out exactly
which nucleotides matter,

00:32:37.000 --> 00:32:40.000
and which nucleotides don't, are
allowed to drift freely, at the

00:32:40.000 --> 00:32:44.000
background rate. How far
could you go with this?

00:32:44.000 --> 00:32:47.000
Well, we decided to try
an interesting experiment.

00:32:47.000 --> 00:32:50.000
We said, since mammals are very big,
then we're going to need a lot of

00:32:50.000 --> 00:32:54.000
genome sequences, how about
we try a small organism,

00:32:54.000 --> 00:32:58.000
like yeast? What if we
were to try to do this,

00:32:58.000 --> 00:33:02.000
this kind of evolutionary, genomic
analysis on something like the yeast

00:33:02.000 --> 00:33:06.000
genome? And so, this is
work that I'll describe,

00:33:06.000 --> 00:33:10.000
that was between a bunch of people
here at MIT who do genome-sequencing,

00:33:10.000 --> 00:33:14.000
and a student in computer science,
Manolis Kellis, was a PhD student in

00:33:14.000 --> 00:33:18.000
computer science, he now
just joined the faculty here

00:33:18.000 --> 00:33:21.000
at MIT in computer science. But
it was a really great example of

00:33:21.000 --> 00:33:25.000
how biology and computer
science could come together.

00:33:25.000 --> 00:33:28.000
So, the genome-sequencing folks
sequenced three related species,

00:33:28.000 --> 00:33:32.000
to our friend, the baker's
yeast, Saccharomyces cerevisiae,

00:33:32.000 --> 00:33:35.000
workhorse of geneticists. These
three different species are

00:33:35.000 --> 00:33:39.000
separated by different evolutionary
distances, from Saccharomyces

00:33:39.000 --> 00:33:42.000
cerevisiae. When you line up their
genomes, just like with human and

00:33:42.000 --> 00:33:46.000
mouse, you find the genes
occur largely in the same order,

00:33:46.000 --> 00:33:49.000
and it's not hard to pick out,
oh there's this gene there, there,

00:33:49.000 --> 00:33:53.000
it's all lined up, you've got
these evolutionary segments,

00:33:53.000 --> 00:33:56.000
and very few rearrangements have
occurred across these species,

00:33:56.000 --> 00:34:00.000
despite the fact that they're about
20 million years apart in history.

00:34:00.000 --> 00:34:05.000
But here's an interesting
thing. When the yeast genome,

00:34:05.000 --> 00:34:11.000
Saccharomyces cerevisiae,
was first published in 1995,

00:34:11.000 --> 00:34:16.000
the paper describing it reported
6,200 genes. Now, how did they know

00:34:16.000 --> 00:34:22.000
there were 6,200 genes? They
ran a computer program looking

00:34:22.000 --> 00:34:28.000
for open reading frames. Any
open reading frame, consecutive

00:34:28.000 --> 00:34:34.000
codons without a stop sufficiently
long, was called a gene.

00:34:34.000 --> 00:34:37.000
But statistically,
you could, by chance,

00:34:37.000 --> 00:34:41.000
just have a long stretch of
codons without a stop codon.

00:34:41.000 --> 00:34:44.000
And so, if I saw 100 codons
in a row, without a stop,

00:34:44.000 --> 00:34:48.000
they called it a gene, but
it might just be chance.
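
NOTE
A minimal sketch of the open-reading-frame scan described here, assuming the
simplest reading of it: any run of codons with no stop, at least min_codons
long, counts. It scans only the forward strand and does not require a start
codon; the function names and the toy sequence are illustrative assumptions.
The second function shows why chance alone can produce such runs, assuming
every base is equally likely.
```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
def open_reading_frames(seq, min_codons=100):
    """Return (start, end) spans of stop-free codon runs, forward strand only."""
    seq = seq.upper()
    spans = []
    for frame in range(3):
        run_start = frame
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                if (i - run_start) // 3 >= min_codons:
                    spans.append((run_start, i))
                run_start = i + 3
        end = frame + ((len(seq) - frame) // 3) * 3  # last complete codon boundary
        if (end - run_start) // 3 >= min_codons:
            spans.append((run_start, end))
    return spans
def chance_of_stop_free_run(n_codons):
    """Probability that n random codons (uniform bases) contain no stop codon."""
    return (61 / 64) ** n_codons
toy = "ATG" + "GCT" * 5 + "TAA" + "GC"
toy_orfs = open_reading_frames(toy, min_codons=5)
```
Even the toy sequence yields stop-free runs in more than one frame, which is
the same effect that inflates a gene count based on open reading frames alone.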

00:34:48.000 --> 00:34:52.000
And they knew that, of course,
they wrote that in the paper, but

00:34:52.000 --> 00:34:55.000
for many years,
people then had 6,200

00:34:55.000 --> 00:34:59.000
open reading frames,
which were the yeast's genes.

00:34:59.000 --> 00:35:02.000
Could evolution now tell us which
ones of them were real and which

00:35:02.000 --> 00:35:06.000
weren't? Well, it turns
out that evolution was

00:35:06.000 --> 00:35:10.000
tremendously powerful
in doing that.

00:35:10.000 --> 00:35:14.000
If you take something that's
a well-known gene that has been

00:35:14.000 --> 00:35:19.000
extensively studied by yeast
geneticists, you line it up across

00:35:19.000 --> 00:35:23.000
all four species, you
almost never see deletions.

00:35:23.000 --> 00:35:28.000
And when you do see deletions,
here in grey, they're always a

00:35:28.000 --> 00:35:33.000
multiple of three. Why are
they a multiple of three?

00:35:33.000 --> 00:35:37.000
They preserve the reading frame.
By contrast, if I take some clear,

00:35:37.000 --> 00:35:42.000
intergenic DNA, that's
not protein-coding,

00:35:42.000 --> 00:35:47.000
and I compare it across these four
species, I see lots and lots of

00:35:47.000 --> 00:35:52.000
frame shifting
deletions that occur.

00:35:52.000 --> 00:35:54.000
Evolution tolerates frame shifting
deletions, and if I just write down

00:35:54.000 --> 00:35:57.000
the rates, frame shifting deletions
are 75x more common in intergenic

00:35:57.000 --> 00:36:00.000
DNA, than genic DNA. This
provides a very powerful test.
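
NOTE
A small sketch of the frame-shift test, assuming the aligned sequences are
given as equal-length strings with '-' marking deleted bases. A gap whose
length is not a multiple of three would shift the reading frame. Function
names and the toy alignments are illustrative assumptions.
```python
import re
def gap_lengths(aligned_row):
    """Lengths of each run of '-' (deleted bases) in one row of an alignment."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned_row)]
def frameshifting_gap_count(alignment):
    """Count gaps whose length is not a multiple of three across all rows."""
    return sum(1 for row in alignment for n in gap_lengths(row) if n % 3 != 0)
# Toy alignments (made up): the first keeps the reading frame, the second does not.
in_frame = ["ATGGCT---GCTTAA", "ATGGCTGCAGCTTAA"]
shifted = ["ATGGC-TGCAGCTTAA", "ATGGCTTG--GCTTAA"]
```
A candidate gene whose alignment is dense with frame-shifting gaps would look
like intergenic DNA under this test.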

00:36:00.000 --> 00:36:03.000
Run this test across the genome,
looking for the density of frame

00:36:03.000 --> 00:36:06.000
shifting deletions, any
place that doesn't tolerate

00:36:06.000 --> 00:36:09.000
frame shifting deletions is probably
a real gene, anything that does

00:36:09.000 --> 00:36:12.000
tolerate it is probably not.
When you sorted through all this,

00:36:12.000 --> 00:36:15.000
it turned out that 528 of the
official yeast genes were clearly

00:36:15.000 --> 00:36:18.000
not real, not real genes. They
were just chock-a-block full

00:36:18.000 --> 00:36:22.000
of these frame shifting deletions.
And, and a bunch of others could be

00:36:22.000 --> 00:36:26.000
confirmed. So the yeast gene
count, and I won't tell you all the

00:36:26.000 --> 00:36:30.000
experimental and other evidence
that shows this is right,

00:36:30.000 --> 00:36:34.000
but the yeast genome has now
been revised downward to 5,700

00:36:34.000 --> 00:36:38.000
genes, and we have great
confidence that almost all of those

00:36:38.000 --> 00:36:42.000
are real genes, there
are 20 whose origins

00:36:42.000 --> 00:36:46.000
we're not sure of, and new
genes could be found in this

00:36:46.000 --> 00:36:50.000
way. Here's a really
audacious thing.

00:36:50.000 --> 00:36:51.000
This graduate student in
computer science said, I think,

00:36:51.000 --> 00:36:53.000
based on these other species,
there was a mistake made in the

00:36:53.000 --> 00:36:55.000
sequencing of the first yeast, and
that the reason these things are

00:36:55.000 --> 00:36:57.000
called two separate genes, is
that somebody made a sequencing

00:36:57.000 --> 00:36:58.000
error that got a stop codon here,
but I think these are really part of

00:36:58.000 --> 00:37:00.000
one gene. And so, somebody
went back and re-sequenced

00:37:00.000 --> 00:37:02.000
some of these,
and sure enough,

00:37:02.000 --> 00:37:04.000
he had correctly predicted that
there had been a mistake made at

00:37:04.000 --> 00:37:06.000
that letter, and that these
were in fact, a single gene.

00:37:06.000 --> 00:37:11.000
The computational analysis was
incredibly powerful in this regard,

00:37:11.000 --> 00:37:17.000
it could go further than this, you
could ask, could I also figure out

00:37:17.000 --> 00:37:23.000
the way genes are regulated in
this fashion, could I work out the

00:37:23.000 --> 00:37:29.000
intergenic signals in the
promoter regions? Remember that lac

00:37:29.000 --> 00:37:35.000
repressor binds to a certain operator
site, well, all of these regulatory

00:37:35.000 --> 00:37:41.000
proteins bind to different sequences,
could we figure out what the

00:37:41.000 --> 00:37:46.000
sequences were, computationally?
Well, if we look closely at a genic,

00:37:46.000 --> 00:37:50.000
intergenic region, here's one where
there's two genes being transcribed

00:37:50.000 --> 00:37:54.000
in opposite directions, gal-1
and gal-10, both involved in

00:37:54.000 --> 00:37:58.000
galactose metabolism, and
there's a particular protein,

00:37:58.000 --> 00:38:03.000
a transcription factor here,
called Gal-4, in this region,

00:38:03.000 --> 00:38:07.000
and it has a particular
sequence that it likes,

00:38:07.000 --> 00:38:11.000
CGG, 11 bases, CCG.
So, that Gal-4 we see,

00:38:11.000 --> 00:38:16.000
is very well preserved
across all of the species.

00:38:16.000 --> 00:38:20.000
So, a known regulatory
sequence is well-preserved,

00:38:20.000 --> 00:38:24.000
now let's look at that closely.
This Gal-4 binding site is a measly,

00:38:24.000 --> 00:38:29.000
crummy, six nucleotides
of information. At random,

00:38:29.000 --> 00:38:33.000
it's going to occur in many
places in the yeast genome,

00:38:33.000 --> 00:38:38.000
but not be a real, important Gal-4,
right? Some of them matter, some of
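
NOTE
A back-of-the-envelope sketch of why a site carrying six nucleotides of
information turns up at random. It assumes every base is independent and
equally likely, and a yeast genome of roughly 12 million bases; that genome
size is an assumption supplied here, not a figure from the lecture.
```python
def expected_random_hits(genome_length, informative_bases):
    """Expected matches on one strand if every base is independent and equally likely."""
    return genome_length * 0.25 ** informative_bases
# Assumption: a yeast genome of roughly 12 million bases.
hits = expected_random_hits(12_000_000, 6)
```
Under these assumptions the expected count of chance matches is in the
thousands, far more than the number of real binding sites one would expect.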

00:38:38.000 --> 00:38:42.000
them don't. How do we figure out
which of these occurrences are real

00:38:42.000 --> 00:38:46.000
Gal-4, well, if we look across all
four species, what we find is that

00:38:46.000 --> 00:38:51.000
those occurrences that
occur in promoter regions,

00:38:51.000 --> 00:38:55.000
are much more likely to be
conserved by evolution than those

00:38:55.000 --> 00:39:00.000
that don't. So there's
a special property here,

00:39:00.000 --> 00:39:04.000
conservation of the motif
in the promoter regions.

00:39:04.000 --> 00:39:08.000
In fact, this particular sequence
is four times more likely to be

00:39:08.000 --> 00:39:12.000
preserved when it occurs
in a promoter region,

00:39:12.000 --> 00:39:16.000
than when it occurs in a coding
region. And for a typical control

00:39:16.000 --> 00:39:20.000
region, the opposite is true.
Since genes, since coding sequences

00:39:20.000 --> 00:39:24.000
are better preserved in general,
for a randomly chosen sequence, I

00:39:24.000 --> 00:39:28.000
don't know, ATGGCAT, it's
more likely to be preserved in

00:39:28.000 --> 00:39:32.000
coding regions than
non-coding regions.
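
NOTE
A sketch of the conservation test for a motif, under simplifying assumptions:
the species are given as a gap-free alignment (same coordinates in every
sequence), and a hit counts as conserved only if the motif matches at the same
position in every other species. The pattern is a parameter; the default is an
assumed form of the site (two three-base half-sites separated by 11 arbitrary
bases). Names and toy sequences are illustrative.
```python
import re
# Assumed form of the site: two three-base half-sites separated by 11 arbitrary bases.
GAL4_LIKE = re.compile(r"CGG[ACGT]{11}CCG")
def conservation_rate(aligned_seqs, motif=GAL4_LIKE):
    """Fraction of motif hits in the first sequence that also match, at the same
    position, in every other sequence. Assumes a gap-free alignment."""
    reference, others = aligned_seqs[0], aligned_seqs[1:]
    starts = [m.start() for m in motif.finditer(reference)]
    if not starts:
        return None
    kept = sum(1 for s in starts if all(motif.match(o, s) for o in others))
    return kept / len(starts)
site = "CGG" + "A" * 11 + "CCG"
kept_case = ["TT" + site + "TT", "GA" + site + "AC"]
lost_case = ["TT" + site + "TT", "TT" + "CGA" + "A" * 11 + "CCG" + "TT"]
```
Running this separately on promoter alignments and on coding alignments, and
comparing the two rates, gives the kind of ratio quoted for the motif.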

00:39:32.000 --> 00:39:35.000
So this Gal-4 motif has a
very funky property that,

00:39:35.000 --> 00:39:38.000
on average, it's 12x more
likely than background,

00:39:38.000 --> 00:39:41.000
to be preserved when it
occurs in a promoter. Now,

00:39:41.000 --> 00:39:44.000
that's a test you apply to
another motif, and another motif.

00:39:44.000 --> 00:39:47.000
In fact, you could, by computer,
test all possible motifs, and ask

00:39:47.000 --> 00:39:50.000
which ones have that property?
Make a scatter plot, most motifs

00:39:50.000 --> 00:39:53.000
are better conserved when
they occur in coding regions,

00:39:53.000 --> 00:39:56.000
than when they occur in
promoter regions, some however,

00:39:56.000 --> 00:40:00.000
are better preserved in promoter
regions than in coding regions.

00:40:00.000 --> 00:40:04.000
Our friend, Gal-4, is up
there, but there are a lot

00:40:04.000 --> 00:40:09.000
more things like it, that
are better preserved by

00:40:09.000 --> 00:40:14.000
evolution in promoters.
You can make a list of them. You

00:40:14.000 --> 00:40:19.000
can get about 72 well-conserved,
regulatory motifs and it turns out

00:40:19.000 --> 00:40:24.000
that 20 years of yeast work produced
knowledge about things like the

00:40:24.000 --> 00:40:29.000
Gal-4 site, and other sites.
Almost all the known regulatory

00:40:29.000 --> 00:40:34.000
sites that had been discovered
over the course of 20 years of

00:40:34.000 --> 00:40:39.000
experimental work appear on this
list that falls out of the computer

00:40:39.000 --> 00:40:44.000
analysis of evolutionary
comparison of genomes.

00:40:44.000 --> 00:40:48.000
You can actually go a step further,
I'll hesitate to tell you, but I'll

00:40:48.000 --> 00:40:53.000
try anyway. If you wanted to find
out, without knowing in advance,

00:40:53.000 --> 00:40:57.000
what these motifs were doing,
what their biological function was,

00:40:57.000 --> 00:41:02.000
you can do that informationally,
too. It turns out that if I take my

00:41:02.000 --> 00:41:06.000
motif, Gal-4, and I ask, which
genes does it occur in front

00:41:06.000 --> 00:41:11.000
of? Well, across Saccharomyces
cerevisiae, you find this crummy

00:41:11.000 --> 00:41:15.000
little motif in many, many
places because, as I said,

00:41:15.000 --> 00:41:20.000
most of it's just noise. But
if I ask, which genes have this

00:41:20.000 --> 00:41:24.000
motif in all four species, these
genes, there's a huge overlap

00:41:24.000 --> 00:41:28.000
with a class of genes involved
in carbohydrate metabolism.

00:41:28.000 --> 00:41:33.000
So, if I didn't know in advance
that the Gal-4 motif was involved in

00:41:33.000 --> 00:41:37.000
regulating genes in carbohydrate
metabolism, I could tell,

00:41:37.000 --> 00:41:41.000
just from the fact that the
genes that have conserved it,

00:41:41.000 --> 00:41:46.000
are genes involved in
carbohydrate metabolism.
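
NOTE
One common way to put a number on "a huge overlap" is a hypergeometric tail
probability: if the genes with the conserved motif were a random draw from the
genome, how often would they overlap a functional category this much? This is
a swapped-in standard test, offered as a sketch; the lecture does not say which
statistic was used, and the parameter names are assumptions.
```python
from math import comb
def overlap_p_value(total_genes, category_size, motif_genes, overlap):
    """P(overlap or more) when motif_genes are drawn at random from total_genes."""
    upper = min(category_size, motif_genes)
    favourable = sum(
        comb(category_size, i) * comb(total_genes - category_size, motif_genes - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(total_genes, motif_genes)
```
A very small value would suggest the motif and the category are linked, which
is the hypothesis the computer hands back to the lab.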

00:41:46.000 --> 00:41:50.000
You can do that using all sorts
of tricks, expression of genes,

00:41:50.000 --> 00:41:54.000
protein mass spec, blah, blah,
blah, and the short answer is, for

00:41:54.000 --> 00:41:58.000
almost all of those motifs that
you can find in the computer,

00:41:58.000 --> 00:42:02.000
by consulting public databases of
sets of genes that are co-expressed,

00:42:02.000 --> 00:42:06.000
or have similar properties and all
that, the computer can also offer

00:42:06.000 --> 00:42:10.000
you a pretty good hypothesis about
what that motif is associated with.

00:42:10.000 --> 00:42:14.000
You can even go a step further
than that. You can begin to look at

00:42:14.000 --> 00:42:18.000
pairs of motifs, you can
say, if I have a certain

00:42:18.000 --> 00:42:23.000
regulatory sequence, number
one, and a second regulatory

00:42:23.000 --> 00:42:27.000
sequence, number two, do
they tend to be preserved in

00:42:27.000 --> 00:42:31.000
front of the same genes as each
other? Is their conservation

00:42:31.000 --> 00:42:36.000
correlated? And you can
build a map of these two

00:42:36.000 --> 00:42:40.000
guys: when this guy's
conserved, this guy tends to be

00:42:40.000 --> 00:42:44.000
conserved. And you can say, oh
those proteins must be talking to

00:42:44.000 --> 00:42:48.000
each other, and you can read that
off from the patterns of evolution,

00:42:48.000 --> 00:42:52.000
as well. There are two
regulators, one called Sterile 12,

00:42:52.000 --> 00:42:57.000
one called Tec1. This computational
analysis shows that they tend to

00:42:57.000 --> 00:43:01.000
co-occur in a conserved fashion,
far more often than you'd expect by

00:43:01.000 --> 00:43:05.000
chance. And when you do the
analysis, you find that those genes

00:43:05.000 --> 00:43:09.000
that just have a conserved
Sterile 12, those genes tend to

00:43:09.000 --> 00:43:13.000
be involved in mating. Genes
that just have a conserved

00:43:13.000 --> 00:43:16.000
instance of Tec1 tend to be
involved in the budding of the yeast,

00:43:16.000 --> 00:43:20.000
and those genes that have conserved
the occurrences of both tend to be

00:43:20.000 --> 00:43:23.000
involved in filamentation.
Now all that can be read out,

00:43:23.000 --> 00:43:26.000
which is way cool, this is not
the way we used to do biology.
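
NOTE
A sketch of the bookkeeping behind the two-regulator example, assuming you
already have the set of genes with a conserved site for each regulator. It
splits genes into the three groups mentioned here and compares the observed
overlap with what independence would predict. The gene names are placeholders,
not real yeast genes, and the function names are assumptions.
```python
def classify_by_regulators(ste12_genes, tec1_genes):
    """Split genes by which conserved sites sit in front of them."""
    a, b = set(ste12_genes), set(tec1_genes)
    return {"ste12_only": a - b, "tec1_only": b - a, "both": a & b}
def cooccurrence_fold(ste12_genes, tec1_genes, total_genes):
    """Observed overlap divided by the overlap expected if the two sites were independent."""
    a, b = set(ste12_genes), set(tec1_genes)
    expected = len(a) * len(b) / total_genes
    return len(a & b) / expected
# Placeholder gene names, not real yeast genes.
groups = classify_by_regulators({"g1", "g2", "g3"}, {"g3", "g4"})
fold = cooccurrence_fold({"g1", "g2", "g3"}, {"g3", "g4"}, total_genes=12)
```
A fold value well above one is the signal that two regulators tend to be
conserved in front of the same genes.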

00:43:26.000 --> 00:43:30.000
Now don't get me wrong, there's
a ton of experiments that underlie

00:43:30.000 --> 00:43:33.000
creating these databases, and
there's a ton of experiments

00:43:33.000 --> 00:43:36.000
that have to be done to check any
of these things. But what we have is

00:43:36.000 --> 00:43:40.000
one of the most powerful
hypothesis generators that's ever

00:43:40.000 --> 00:43:44.000
been seen here. Evolution,
by telling us what to

00:43:44.000 --> 00:43:48.000
focus on, is giving us, on
a silver platter, hundreds of

00:43:48.000 --> 00:43:52.000
hypotheses about who's interacting
with whom, and sending us back to

00:43:52.000 --> 00:43:56.000
the lab then, to test
these hypotheses. Now,

00:43:56.000 --> 00:44:00.000
what are the implications of
all of this for the human genome?

00:44:00.000 --> 00:44:04.000
Could we do this for
the human genome? Well,

00:44:04.000 --> 00:44:08.000
these species,
Saccharomyces cerevisiae, S.

00:44:08.000 --> 00:44:12.000
paradoxus, S. mikatae and S.
bayanus, are they a good model for

00:44:12.000 --> 00:44:15.000
mammals? Well it
turns out that their

00:44:15.000 --> 00:44:19.000
evolutionary distance from each
other is the same as the distance of

00:44:19.000 --> 00:44:23.000
human to lemur,
to dog, to mouse.

00:44:23.000 --> 00:44:27.000
So they were chosen with a purpose.
Those are actually fairly good

00:44:27.000 --> 00:44:30.000
models for the human. So
could we do exactly the same

00:44:30.000 --> 00:44:34.000
analysis for the human,
for the entire human genome?

00:44:34.000 --> 00:44:38.000
If we had, human, lemur, dog,
and mouse, or basically four

00:44:38.000 --> 00:44:42.000
species, human,
mouse, rat, and dog.

00:44:42.000 --> 00:44:46.000
Well, there's one little fly in the
ointment. The human genome is 20x

00:44:46.000 --> 00:44:50.000
bigger than the yeast genome.
If I want to analyze the whole

00:44:50.000 --> 00:44:54.000
human genome, I have a
problem of signal-to-noise.

00:44:54.000 --> 00:44:58.000
The genome is 20x bigger, I've
got 20x as much noise to get

00:44:58.000 --> 00:45:03.000
rid of. I won't walk you through
it, but I need more evolutionary

00:45:03.000 --> 00:45:07.000
information to get rid of all that
noise. And, you can do a simple

00:45:07.000 --> 00:45:11.000
calculation that says, my
evolutionary tree needs to be

00:45:11.000 --> 00:45:15.000
bigger, its branch length needs to
be bigger by about the natural log

00:45:15.000 --> 00:45:20.000
of 20, to get rid of
20-fold more noise.

00:45:20.000 --> 00:45:24.000
And that would mean I'd need more
species, I'd need about 16 species,

00:45:24.000 --> 00:45:28.000
or something like that to be
able to do that. But if I built an

00:45:28.000 --> 00:45:32.000
evolutionary tree that had
a branch length of four,

00:45:32.000 --> 00:45:36.000
that is, four substitutions per
base across this evolutionary tree,

00:45:36.000 --> 00:45:40.000
as indicated by these colored lines
here, I should have enough power to

00:45:40.000 --> 00:45:44.000
analyze the entire human genome,
the way we just did the yeast genome.
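
NOTE
A rough sketch of why the extra branch length scales with the natural log of the noise.
It assumes a simple model that the lecture does not spell out: a neutral base stays unchanged across
a tree of total branch length T with probability about e^(-T), ignoring back mutations.
The baseline T = 1.0 and the function name are hypothetical, chosen only for illustration.
```python
import math
def extra_branch_length(noise_fold):
    """Extra substitutions per site needed so that e^(-T) shrinks by noise_fold."""
    return math.log(noise_fold)
delta = extra_branch_length(20)
print(f"extra branch length for 20-fold more noise: {delta:.2f} substitutions per site")
# Chance that a neutral base looks conserved, before and after adding the extra branch length.
baseline = 1.0  # hypothetical starting tree
for total in (baseline, baseline + delta):
    print(f"T = {total:.2f}: unchanged by chance with probability {math.exp(-total):.4f}")
```
Under this model, adding about three substitutions per site cuts the chance-conservation rate 20-fold.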

00:45:44.000 --> 00:45:48.000
So we currently have human,
chimp, mouse, rat, dog. As of this

00:45:48.000 --> 00:45:52.000
fall, during in fact, right
at the beginning of this term,

00:45:52.000 --> 00:45:56.000
the National Institutes of Health
signed off on the sequencing of

00:45:56.000 --> 00:46:00.000
these additional eight mammals.
These mammals are now in process,

00:46:00.000 --> 00:46:04.000
and in fact, the elephant is done,
and the armadillo is in process,

00:46:04.000 --> 00:46:08.000
and the tree shrew, I think,
is being caught at the moment.

00:46:08.000 --> 00:46:12.000
[LAUGHTER]. The ten-,
don't talk about the tree

00:46:12.000 --> 00:46:18.000
shrews. The tenrec is actually
being tested right now,

00:46:18.000 --> 00:46:24.000
etc, and all this is going
on right now, as we speak,

00:46:24.000 --> 00:46:29.000
and I think that by next summer,
we should have much of it, and

00:46:29.000 --> 00:46:35.000
certainly by a year from now, we
should have all this information

00:46:35.000 --> 00:46:41.000
to do such an analysis.
That said, we're of course,

00:46:41.000 --> 00:46:47.000
very impatient people, you
could just take the human,

00:46:47.000 --> 00:46:51.000
the mouse, the rat, and the dog.
And I said that's not enough if you

00:46:51.000 --> 00:46:55.000
wanted to analyze the whole genome,
but suppose you just wanted to

00:46:55.000 --> 00:46:59.000
analyze a portion of the genome,
maybe about a yeast-size piece of

00:46:59.000 --> 00:47:03.000
the genome, well let's see,
at 20,000 genes, I don't know,

00:47:03.000 --> 00:47:06.000
suppose I take, I don't know,
two kilobases around each of 20,000

00:47:06.000 --> 00:47:10.000
genes, well that's, you know,
40 megabases of DNA, it's only a

00:47:10.000 --> 00:47:14.000
couple-fold more than yeast.
Maybe, if I just focus on a limited

00:47:14.000 --> 00:47:18.000
region around each promoter,
I could start reading out these

00:47:18.000 --> 00:47:22.000
regulatory signals,
with just four species.
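
NOTE
The back-of-the-envelope sizing above, written out as a short sketch.
The gene count and window size are the figures quoted in the lecture; the yeast genome size of
roughly 12 megabases is an outside approximation and an assumption here, and the names are illustrative.
```python
GENES = 20_000                # gene count quoted in the lecture
WINDOW_BP = 2_000             # two kilobases around each gene
YEAST_GENOME_BP = 12_000_000  # assumption: approximate S. cerevisiae genome size
target_bp = GENES * WINDOW_BP
fold_vs_yeast = target_bp / YEAST_GENOME_BP
print(f"promoter-proximal sequence: {target_bp / 1e6:.0f} Mb")
print(f"relative to the yeast genome: {fold_vs_yeast:.1f}x")
```
That is the sense in which the search space is only a couple-fold larger than yeast.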

00:47:22.000 --> 00:47:26.000
So in fact, a postdoctoral
fellow has been working on this

00:47:26.000 --> 00:47:30.000
problem over the summer, and
a little bit, too, through the

00:47:30.000 --> 00:47:34.000
spring and summer, together
with Manolis Kellis,

00:47:34.000 --> 00:47:38.000
who's now in the computer science
department. And I think we have a

00:47:38.000 --> 00:47:42.000
preliminary list for the human
genome that's fallen out over the

00:47:42.000 --> 00:47:46.000
course of the past couple of months,
and we're in the process, right now,

00:47:46.000 --> 00:47:50.000
of finishing up a paper that we're
hoping to get submitted by Friday,

00:47:50.000 --> 00:47:54.000
with a preliminary list of
regulatory signals in the human

00:47:54.000 --> 00:47:58.000
genome, read out from evolution
of human, mouse, rat, and dog.

00:47:58.000 --> 00:48:01.000
It won't be everything, we
don't have full power to pick up

00:48:01.000 --> 00:48:04.000
all possible signals, but
we're picking up a lot of the

00:48:04.000 --> 00:48:08.000
signals, we're picking up a
very large fraction of previously

00:48:08.000 --> 00:48:11.000
discovered signals, and
lots more new signals,

00:48:11.000 --> 00:48:14.000
as well, are falling out
of that analysis. So anyway,

00:48:14.000 --> 00:48:18.000
I can assure you that that's
not in the textbooks because,

00:48:18.000 --> 00:48:21.000
actually, it hasn't been submitted
yet. This other stuff I've

00:48:21.000 --> 00:48:25.000
described about the yeast analysis,
this, you do want to look it up,

00:48:25.000 --> 00:48:28.000
there's a paper in Nature
about a year and change ago,

00:48:28.000 --> 00:48:32.000
Kellis et al. describes this
yeast work. This is what's going on.

00:48:32.000 --> 00:48:36.000
This is what's fun about teaching
at MIT, is I can tell you this stuff,

00:48:36.000 --> 00:48:41.000
and you guys have a sense for the
convergence that's going on in our

00:48:41.000 --> 00:48:45.000
field. Much of what I've
tried to make the biology,

00:48:45.000 --> 00:48:50.000
you know, in making the biology
clear, I've talked about how the

00:48:50.000 --> 00:48:54.000
different directions,
genetics, biochemistry,

00:48:54.000 --> 00:48:59.000
have converged together. What
we're really seeing now is

00:48:59.000 --> 00:49:03.000
information sciences converging with
that as well, and I've got to say,

00:49:03.000 --> 00:49:08.000
it's a tremendous amount of
fun. See you on Monday, good

00:49:08.000 --> 00:49:13.000
luck on
the quiz.