WEBVTT

00:00:06.594 --> 00:00:07.550
ERIC LANDER: Good morning.

00:00:07.550 --> 00:00:08.800
Good morning.

00:00:11.870 --> 00:00:16.120
So we've been talking about
recombinant DNA.

00:00:18.910 --> 00:00:22.520
And really what it does
to our picture here--

00:00:22.520 --> 00:00:25.080
function, gene, protein--

00:00:25.080 --> 00:00:28.790
is for the first time take
something that's a theoretical

00:00:28.790 --> 00:00:32.130
relationship and make
it operational.

00:00:32.130 --> 00:00:42.160
Being able to go from a function
like the ability to

00:00:42.160 --> 00:00:46.550
make your own arginine, to a
specific gene, to a specific

00:00:46.550 --> 00:00:50.250
protein, and to be able
to connect those up.

00:00:50.250 --> 00:00:53.340
In principle, by the time we're
done with recombinant

00:00:53.340 --> 00:00:56.310
DNA, one should be able to go
from any vertex of that

00:00:56.310 --> 00:00:59.190
triangle to any other vertex
of that triangle.

00:00:59.190 --> 00:01:01.680
Given a function,
find the genes.

00:01:01.680 --> 00:01:03.290
Given a gene, find
the proteins.

00:01:03.290 --> 00:01:04.890
Given a protein,
find the genes.

00:01:04.890 --> 00:01:07.700
Given a protein, find
the function.

00:01:07.700 --> 00:01:11.630
That's really the goal of
recombinant DNA is to be able

00:01:11.630 --> 00:01:14.750
to start at any vertex
and reach any other

00:01:14.750 --> 00:01:16.670
vertex of that triangle.

00:01:16.670 --> 00:01:17.910
We're not there yet.

00:01:17.910 --> 00:01:20.590
But we will be in the next
couple of days, to the point

00:01:20.590 --> 00:01:24.750
where we can move freely about
this whole picture.

00:01:24.750 --> 00:01:29.450
So we've talked about
DNA sequencing.

00:01:29.450 --> 00:01:31.840
And I want to pick up a little
bit with DNA sequencing.

00:01:31.840 --> 00:01:35.570
And I'll probably end with DNA
sequencing again, as I tell

00:01:35.570 --> 00:01:37.455
you where things stand today.

00:01:40.510 --> 00:01:43.860
How did we use DNA sequencing?

00:01:43.860 --> 00:01:45.780
Well, we found us a clone.

00:01:45.780 --> 00:01:49.620
Maybe our clone was a clone that
conferred the ability to

00:01:49.620 --> 00:01:50.810
grow without arginine.

00:01:50.810 --> 00:01:53.420
That was cloning by
complementation.

00:01:53.420 --> 00:01:57.160
Maybe it was a clone that
encoded beta globin.

00:01:57.160 --> 00:02:01.200
There we found an antibody that
recognized beta globin.

00:02:01.200 --> 00:02:05.170
We made a cDNA library with
an appropriate promoter.

00:02:05.170 --> 00:02:08.440
And we asked E. coli to produce
those human proteins

00:02:08.440 --> 00:02:10.259
from those cDNAs.

00:02:10.259 --> 00:02:13.560
And then we used our antibody
to recognize which clone.

00:02:13.560 --> 00:02:16.430
One way or the other, we found
ourselves a clone.

00:02:19.860 --> 00:02:21.870
The clone had this vector.

00:02:21.870 --> 00:02:23.520
It had this insert.

00:02:23.520 --> 00:02:25.750
The insert is now of
interest to us.

00:02:25.750 --> 00:02:27.410
We wish to sequence it.

00:02:27.410 --> 00:02:32.460
And we talked before about
taking that piece of DNA and

00:02:32.460 --> 00:02:33.710
subjecting it to sequencing.

00:02:36.820 --> 00:02:40.140
We started with a primer.

00:02:40.140 --> 00:02:41.765
From that primer, we extended.

00:02:45.200 --> 00:02:52.550
And we hit a point, let's say,
opposite an A or maybe

00:02:52.550 --> 00:03:02.950
opposite the next A or opposite
the A after that or

00:03:02.950 --> 00:03:05.300
opposite the A after that.

00:03:05.300 --> 00:03:10.450
And we had this clever trick,
for which Fred Sanger actually

00:03:10.450 --> 00:03:15.830
won a Nobel Prize, of using
a defective version of the

00:03:15.830 --> 00:03:20.930
nucleotide T that would
stop at that point.

00:03:20.930 --> 00:03:24.980
But remember, we didn't use only
defective T. We used a

00:03:24.980 --> 00:03:30.820
mixture of good T and defective
T. Let's say

00:03:30.820 --> 00:03:33.370
defective T was 1% of the whole
mixture. Whenever we

00:03:33.370 --> 00:03:36.420
encountered a defective T and
put it in, the chain would

00:03:36.420 --> 00:03:38.510
stop because it couldn't
be extended.

00:03:38.510 --> 00:03:41.160
Whenever we put in a good
T, it would keep going.

00:03:41.160 --> 00:03:42.840
And so since we're making--

00:03:42.840 --> 00:03:46.050
Of course, we have millions and
millions of copies of our

00:03:46.050 --> 00:03:48.000
template sitting there
in the test tube.

00:03:48.000 --> 00:03:50.500
We're always working-- when I
talk about "a" molecule, and I

00:03:50.500 --> 00:03:53.110
draw a picture of a molecule,
we should always know that

00:03:53.110 --> 00:03:55.210
there's millions of copies
of those things.

00:03:55.210 --> 00:03:56.370
Some of them are
stopping here.

00:03:56.370 --> 00:03:57.410
Some of them are going
here, et cetera,

00:03:57.410 --> 00:03:58.900
et cetera, et cetera.

00:03:58.900 --> 00:04:02.400
And we end up with a
large collection.

00:04:02.400 --> 00:04:06.610
And if we separate it on a gel,
we can detect the lengths

00:04:06.610 --> 00:04:07.860
of those fragments.
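
NOTE
A minimal sketch of the chain-termination idea described above, not the lab protocol itself.
The function name, the toy template, and the choice to count length from the first base after
the primer are all assumptions made for illustration. Each simulated copy is extended until a
dideoxy T happens to be incorporated opposite a template A.
```python
import random
def sanger_fragments(template, base="A", dd_fraction=0.01, copies=20_000, seed=0):
    """Simulate chain termination opposite one template base.
    Returns the sorted fragment lengths seen across many template copies."""
    rng = random.Random(seed)
    lengths = set()
    for _ in range(copies):
        for i, b in enumerate(template, start=1):
            # Opposite the chosen template base, a small fraction of
            # incorporations use the dideoxy nucleotide and stop the chain.
            if b == base and rng.random() < dd_fraction:
                lengths.add(i)
                break
    return sorted(lengths)
template = "GATTACAGATA"  # toy template, A at positions 2, 5, 7, 9, 11
print(sanger_fragments(template))
```
With enough copies, every A in the template ends up marked by at least one stopped fragment,
which is why the mixture of good and defective T matters.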

00:04:13.250 --> 00:04:17.360
If we attach a fluorescent dye,
then those fragments are

00:04:17.360 --> 00:04:18.959
fluorescently labeled.

00:04:18.959 --> 00:04:24.660
We could run them all
in the same lane

00:04:24.660 --> 00:04:25.910
with different colors.

00:04:28.530 --> 00:04:32.060
And we could put our little
fluorescence detector and see

00:04:32.060 --> 00:04:32.990
what goes by.

00:04:32.990 --> 00:04:36.340
And the traces you would get,
the actual pictures that

00:04:36.340 --> 00:04:41.356
emerge from this, look
something like this.

00:04:55.920 --> 00:04:59.300
And you'd see colored traces
like that emerging from that

00:04:59.300 --> 00:05:00.840
electrophoretic detector.

00:05:00.840 --> 00:05:03.770
And you could read off the
sequence by the colors--

00:05:03.770 --> 00:05:07.190
really very gorgeous, very
simple, beautiful technology,

00:05:07.190 --> 00:05:09.890
all this taking place in a
little capillary tube.
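
NOTE
A small sketch of reading the sequence off the trace, assuming each peak has already been
reduced to a (fragment length, terminal base) pair. The function name and the toy peaks are
hypothetical. Real traces need peak calling and dye-to-base mapping first.
```python
def read_sequence(fragments):
    """fragments: iterable of (length, terminal_base) pairs, one per peak.
    Shorter fragments pass the detector first, so sorting by length
    gives the newly synthesized strand in order."""
    return "".join(base for _, base in sorted(fragments))
peaks = [(3, "A"), (1, "C"), (4, "A"), (2, "T"), (5, "T")]
print(read_sequence(peaks))  # CTAAT
```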

00:05:13.030 --> 00:05:15.840
You remember how we made
a defective T, right?

00:05:15.840 --> 00:05:19.630
How did we make a defective T?

00:05:19.630 --> 00:05:20.270
Sorry?

00:05:20.270 --> 00:05:21.840
STUDENT: Dideoxy.

00:05:21.840 --> 00:05:22.930
ERIC LANDER: Dideoxy.

00:05:22.930 --> 00:05:25.520
Because remember, we need
that 3-prime hydroxyl in

00:05:25.520 --> 00:05:26.870
order to extend it.

00:05:26.870 --> 00:05:29.590
No 3-prime hydroxyl,
no extension.

00:05:29.590 --> 00:05:32.150
So if you make that deoxy
at the 3-prime

00:05:32.150 --> 00:05:34.220
position, you can't extend.

00:05:34.220 --> 00:05:36.780
And since it was originally
2-prime deoxy, it's now

00:05:36.780 --> 00:05:38.740
2-prime, 3-prime dideoxy.

00:05:38.740 --> 00:05:41.440
That's all it takes
in order to block

00:05:41.440 --> 00:05:43.360
the extension reaction.

00:05:43.360 --> 00:05:46.380
And they sell the stuff,
and you can use it.

00:05:46.380 --> 00:05:47.215
All right.

00:05:47.215 --> 00:05:49.712
I had another question
for you guys.

00:05:49.712 --> 00:05:51.460
Where did the primer
come from?

00:05:55.550 --> 00:05:57.520
Sorry?

00:05:57.520 --> 00:05:57.930
Sorry?

00:05:57.930 --> 00:05:58.340
STUDENT: We put it in.

00:05:58.340 --> 00:05:58.970
ERIC LANDER: We put it in.

00:05:58.970 --> 00:06:01.605
How did we know what
to put in?

00:06:01.605 --> 00:06:03.000
STUDENT: [INAUDIBLE] catalog.

00:06:03.000 --> 00:06:04.950
ERIC LANDER: Well, the catalog's
very smart, but it

00:06:04.950 --> 00:06:06.440
doesn't tell us what we need.

00:06:06.440 --> 00:06:08.352
How do we know what
sequence it is?

00:06:12.510 --> 00:06:15.920
Actually, it turns out that so
many different sequences might

00:06:15.920 --> 00:06:18.910
be needed for different purposes
in molecular biology

00:06:18.910 --> 00:06:22.160
that they don't stock them all
in the catalog because you

00:06:22.160 --> 00:06:23.950
might order any sequence.

00:06:23.950 --> 00:06:26.610
So it really turns out that if
you want to order a specific

00:06:26.610 --> 00:06:31.320
20-letter sequence, you go on
the web, type it in, and then

00:06:31.320 --> 00:06:33.080
the machine will make it for
you, and you get it the next

00:06:33.080 --> 00:06:34.980
day, it turns out.

00:06:34.980 --> 00:06:36.790
But they don't actually put it
in the catalog because it

00:06:36.790 --> 00:06:38.740
would be too big an inventory.

00:06:38.740 --> 00:06:41.440
Although ones that people use
a lot they keep in the

00:06:41.440 --> 00:06:44.604
catalog, otherwise they just
make it on the spot for you.

00:06:44.604 --> 00:06:46.030
But how do we know
what sequence

00:06:46.030 --> 00:06:46.870
we're supposed to use?

00:06:46.870 --> 00:06:50.990
Here's my clone, let's say
arginine, the ARG1 gene.

00:06:50.990 --> 00:06:53.200
How did I know what sequence
to start with?

00:06:56.672 --> 00:06:59.120
You let me get away with just
putting a primer there.

00:06:59.120 --> 00:07:00.370
And what does it match?

00:07:07.216 --> 00:07:08.683
Yeah?

00:07:08.683 --> 00:07:12.110
STUDENT: Could you use the
EcoRI [INAUDIBLE]?

00:07:12.110 --> 00:07:13.360
ERIC LANDER: I did use EcoRI.

00:07:16.746 --> 00:07:18.730
STUDENT: [INAUDIBLE]

00:07:18.730 --> 00:07:23.778
ERIC LANDER: EcoRI site
here, GAATTC--

00:07:23.778 --> 00:07:25.028
GAATTC--

00:07:27.090 --> 00:07:31.090
I'm going to cut in this site.

00:07:31.090 --> 00:07:34.130
I'm sorry, I'm going
to cut in this site

00:07:34.130 --> 00:07:35.715
like that, let's say.

00:07:35.715 --> 00:07:37.430
Yeah, like that.

00:07:37.430 --> 00:07:43.210
And this fragment here--

00:07:43.210 --> 00:07:44.590
sorry, like that--

00:07:44.590 --> 00:07:50.840
the fragment here that I'm going
to start sequencing from

00:07:50.840 --> 00:07:53.470
it starts with a G, right?

00:07:53.470 --> 00:07:57.100
Because it's opposite that C.

00:07:57.100 --> 00:08:00.240
So because I cut with EcoRI, I'm
pretty sure it starts with

00:08:00.240 --> 00:08:05.970
a G. It's not a very big
primer to use, though.

00:08:05.970 --> 00:08:08.080
I could start with a C, but I
don't think that's going to

00:08:08.080 --> 00:08:09.800
have enough binding
energy to do it.

00:08:18.249 --> 00:08:23.230
STUDENT: Doesn't a bacteria
already replicate that DNA?

00:08:23.230 --> 00:08:24.260
ERIC LANDER: It replicates
that DNA just fine.

00:08:24.260 --> 00:08:26.232
STUDENT: So there's already
a primer in there.

00:08:26.232 --> 00:08:27.220
Why do you [INAUDIBLE]?

00:08:27.220 --> 00:08:28.100
ERIC LANDER: Well,
because I've now

00:08:28.100 --> 00:08:29.836
purified out my insert.

00:08:29.836 --> 00:08:31.410
And I'm going to subject
it to sequencing.

00:08:31.410 --> 00:08:33.630
And I need to start
with a primer.

00:08:33.630 --> 00:08:34.630
I've got that fragment.

00:08:34.630 --> 00:08:37.710
I need to know how that
fragment starts.

00:08:37.710 --> 00:08:41.366
And I don't know how that
fragment starts.

00:08:41.366 --> 00:08:43.843
STUDENT: Is there a place you
could cut in a little bit

00:08:43.843 --> 00:08:44.159
after that?

00:08:44.159 --> 00:08:45.610
ERIC LANDER: Oh!

00:08:45.610 --> 00:08:51.640
What if I was really smart and
put in a different restriction

00:08:51.640 --> 00:08:54.520
site back here?

00:08:54.520 --> 00:08:59.070
And I use that restriction
enzyme, and I cut there?

00:08:59.070 --> 00:09:01.330
Then what would you be
able to tell me?

00:09:01.330 --> 00:09:03.380
Well, then the fragment
would start with a

00:09:03.380 --> 00:09:05.900
known vector sequence.

00:09:05.900 --> 00:09:07.250
Bingo.

00:09:07.250 --> 00:09:08.290
That'll work.

00:09:08.290 --> 00:09:08.660
Good.

00:09:08.660 --> 00:09:10.360
Good engineering.

00:09:10.360 --> 00:09:12.800
Since I don't know what the
sequence is of the thing I'm

00:09:12.800 --> 00:09:17.180
reading, I'd better back up a
little bit and use a known

00:09:17.180 --> 00:09:18.640
sequence from the vector.
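
NOTE
A sketch of the "back up into the vector" trick, assuming the known vector sequence just
upstream of the insert is available as a string on the strand being read. The function name
and the flank below are invented for illustration and are not a real vector.
```python
def vector_primer(vector_flank, length=20):
    """Return the last `length` bases of the known vector sequence that sits
    just upstream of the insert; extension from it runs into the unknown insert."""
    if len(vector_flank) < length:
        raise ValueError("vector flank shorter than requested primer")
    return vector_flank[-length:]
flank = "CCATGGTACCAGCTGTTCCTGTGTGAATTC"  # illustrative flank ending in GAATTC
primer = vector_primer(flank)
print(primer, len(primer))
```
Because the primer depends only on the vector, the same one works for every insert cloned
into that vector.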

00:09:18.640 --> 00:09:20.600
And then I keep going.

00:09:20.600 --> 00:09:22.850
This is all just to give
you a sense of all the

00:09:22.850 --> 00:09:24.430
tricks you can do.

00:09:24.430 --> 00:09:25.530
So now it's easy.

00:09:25.530 --> 00:09:28.700
Now, in fact if that's the
vector I use a lot, all the

00:09:28.700 --> 00:09:30.540
time, then bingo!

00:09:30.540 --> 00:09:31.740
I can go to the catalog
because they

00:09:31.740 --> 00:09:33.330
will stock that one.

00:09:33.330 --> 00:09:34.400
And I'll use it.

00:09:34.400 --> 00:09:39.210
And I can use my green primer
there to get going.

00:09:39.210 --> 00:09:39.970
All right.

00:09:39.970 --> 00:09:44.130
Now here's the problem.

00:09:44.130 --> 00:09:49.380
As I start sequencing, coming
down this capillary tube are

00:09:49.380 --> 00:09:52.970
fragments of different
lengths.

00:09:52.970 --> 00:09:58.860
The speed of migration depends
on the logarithm of the length

00:09:58.860 --> 00:10:00.110
of the fragment.

00:10:02.250 --> 00:10:03.940
Big fragments go slower.

00:10:03.940 --> 00:10:07.110
It's inversely proportional
to the log of the

00:10:07.110 --> 00:10:08.140
length of the fragment.

00:10:08.140 --> 00:10:10.900
So big fragments go slower.

00:10:10.900 --> 00:10:12.590
And as they get bigger and
bigger and bigger, they go

00:10:12.590 --> 00:10:13.720
slower and slower and slower.

00:10:13.720 --> 00:10:20.190
But the difference between log
of 1,000 and log of 1,001 is

00:10:20.190 --> 00:10:22.220
pretty small.

00:10:22.220 --> 00:10:25.890
And then log of 1,001 and
1,002, very small.

00:10:25.890 --> 00:10:29.430
Actually, those peaks over there
start bunching up, and I

00:10:29.430 --> 00:10:31.760
can't tell them apart.

00:10:31.760 --> 00:10:35.270
So I can actually only go with
this electrophoretic process

00:10:35.270 --> 00:10:39.170
maybe 1,000 letters before the
peaks get too bunched up

00:10:39.170 --> 00:10:42.630
because the different speeds are
not so different anymore.
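
NOTE
A quick numerical check of the bunching-up argument, taking the difference of logarithms as a
rough stand-in for how far apart neighboring peaks land. The function name is an assumption.
The log model is the one stated in the lecture.
```python
import math
def log_gap(n):
    """Difference log(n + 1) - log(n): a rough proxy for the separation between
    fragments of length n and n + 1 under the log model."""
    return math.log(n + 1) - math.log(n)
for n in (10, 100, 1000):
    print(n, round(log_gap(n), 5))
```
The gap shrinks roughly as 1/n, so by 1,000 bases neighboring fragments differ by only about
a tenth of a percent.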

00:10:42.630 --> 00:10:51.940
So I can only read 1,000
bases, let's say.

00:10:51.940 --> 00:10:55.420
In practice, we would tend to
read 700, 800 bases because it

00:10:55.420 --> 00:10:56.520
started getting scruffy.

00:10:56.520 --> 00:10:59.190
But let's make it round, and
we'll say 1,000 bases.

00:10:59.190 --> 00:11:02.850
Now, suppose this fragment that
I got that's the ARG1

00:11:02.850 --> 00:11:06.970
gene is 3,000 bases.

00:11:06.970 --> 00:11:09.830
Well, we've got our clever trick
that you've introduced

00:11:09.830 --> 00:11:15.920
here of cutting back over here
at some previous site, using

00:11:15.920 --> 00:11:18.090
our primer here and reading.

00:11:21.760 --> 00:11:25.290
But it kind of dies at
about 1,000 bases.

00:11:25.290 --> 00:11:26.540
I can't read any further.

00:11:29.460 --> 00:11:30.710
What do I do?

00:11:33.260 --> 00:11:34.510
STUDENT: [INAUDIBLE]

00:11:47.470 --> 00:11:48.750
ERIC LANDER: What if there's
not a perfect

00:11:48.750 --> 00:11:52.260
restriction site there?

00:11:52.260 --> 00:11:53.770
But you're on the right track.

00:11:53.770 --> 00:11:55.676
Keep going.

00:11:55.676 --> 00:12:01.190
Once I've sequenced the first
1,000 bases, what do I know?

00:12:01.190 --> 00:12:04.870
The sequence of the
first 1,000 bases.

00:12:04.870 --> 00:12:07.364
What primer could I use then?

00:12:07.364 --> 00:12:08.360
STUDENT: [INAUDIBLE].

00:12:08.360 --> 00:12:10.410
ERIC LANDER: I can just make
a new primer based on those

00:12:10.410 --> 00:12:11.830
bases, right?

00:12:11.830 --> 00:12:13.730
So I don't even need my
restriction site anymore.

00:12:13.730 --> 00:12:14.680
You've got it exactly right.

00:12:14.680 --> 00:12:15.920
I use my knowledge.

00:12:15.920 --> 00:12:18.910
And then I could use
a new primer, and I

00:12:18.910 --> 00:12:23.130
could go more bases.

00:12:23.130 --> 00:12:31.360
Then I could use a new primer
and go more bases.

00:12:31.360 --> 00:12:36.550
Then a new primer, and
go more bases.

00:12:36.550 --> 00:12:38.060
And I can do what's
called "primer

00:12:38.060 --> 00:12:41.400
walking" along the clone.
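
NOTE
A sketch of primer walking as a serial loop, assuming a fixed read length of 1,000 bases and
primers of 20 bases taken from the very end of the known sequence. In practice the primer would
be placed a little further back, where the read is still clean. The names are hypothetical.
```python
import random
def primer_walk(clone, read_length=1000, primer_length=20):
    """Read a clone one round at a time.
    Returns (assembled sequence, number of sequencing rounds)."""
    known = clone[:read_length]  # round 1: primed from known vector sequence
    rounds = 1
    while len(known) < len(clone):
        primer = known[-primer_length:]   # designed from the newest known bases
        site = clone.find(primer)         # where that primer anneals on the clone
        start = site + primer_length      # first base read after the primer
        known += clone[start:start + read_length]
        rounds += 1
    return known, rounds
rng = random.Random(1)
clone = "".join(rng.choice("ACGT") for _ in range(3000))
seq, rounds = primer_walk(clone)
print(rounds, seq == clone)
```
Each round has to wait for the previous one, which is the serial bottleneck discussed next.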

00:12:41.400 --> 00:12:43.170
Will that work?

00:12:43.170 --> 00:12:43.940
You bet.

00:12:43.940 --> 00:12:45.190
That works just fine.

00:12:47.820 --> 00:12:51.100
It's also very slow.

00:12:51.100 --> 00:12:56.050
Because I had to get my first
bases, analyze them, order a

00:12:56.050 --> 00:12:58.760
new primer, and the next day
set up my next reaction.

00:12:58.760 --> 00:13:01.130
Take a couple days, get
my next reaction.

00:13:01.130 --> 00:13:03.110
Take a couple days,
next reaction.

00:13:03.110 --> 00:13:06.550
Imagine sequencing the human
genome like this.

00:13:06.550 --> 00:13:11.720
This could take a long time
if I do it in serial.

00:13:11.720 --> 00:13:13.150
So what else could I do?

00:13:13.150 --> 00:13:13.930
This works, by the way.

00:13:13.930 --> 00:13:15.000
This totally works.

00:13:15.000 --> 00:13:16.110
It's a good procedure, and it's

00:13:16.110 --> 00:13:17.470
used for certain purposes.

00:13:17.470 --> 00:13:18.720
But what else could I do?

00:13:24.790 --> 00:13:26.320
Here's the cool thing.

00:13:26.320 --> 00:13:28.220
We've got biology.

00:13:28.220 --> 00:13:30.490
But you guys, being MIT
students, you've also got

00:13:30.490 --> 00:13:33.260
computer science and other
tricks available.

00:13:33.260 --> 00:13:34.510
Here's a cool trick.

00:13:37.160 --> 00:13:39.820
I have my clone, 3,000 bases.

00:13:39.820 --> 00:13:41.690
I like my clone.

00:13:41.690 --> 00:13:47.030
I'm going to take my clone, my
fragment, of 3,000 bases, and

00:13:47.030 --> 00:13:51.180
maybe instead of sequencing it,
I'm going to shred it up

00:13:51.180 --> 00:13:54.060
into a lot of smaller pieces.

00:13:54.060 --> 00:13:59.180
Suppose I shred this up
into fragments of,

00:13:59.180 --> 00:14:01.600
say, size 800 bases.

00:14:01.600 --> 00:14:03.300
This was 3,000 bases.

00:14:03.300 --> 00:14:04.420
Now let me shred it up.

00:14:04.420 --> 00:14:06.750
I'll just take 800
as a number.

00:14:06.750 --> 00:14:09.740
Now remember, I had a lot of
copies of this clone, right?

00:14:09.740 --> 00:14:12.795
So I'm going to get shreds like
this and shreds like that

00:14:12.795 --> 00:14:14.060
and a shred like that.

00:14:14.060 --> 00:14:15.130
It's not just one copy.

00:14:15.130 --> 00:14:18.130
I've got a lot of copies of
this piece of DNA there.

00:14:18.130 --> 00:14:19.300
I'm going to shred it up.

00:14:19.300 --> 00:14:21.600
And there are ways to shred it
up by forcing it through a

00:14:21.600 --> 00:14:25.210
needle or treating it meanly,
or things like that.

00:14:25.210 --> 00:14:27.550
I'm going to get lots
of little fragments.

00:14:27.550 --> 00:14:33.540
And what I could do is I could
clone all of those little

00:14:33.540 --> 00:14:35.990
subfragments.

00:14:35.990 --> 00:14:39.040
I take my big fragment, and what
I'm going to do is I'm

00:14:39.040 --> 00:14:46.400
going to make a new library
of subfragments.

00:14:46.400 --> 00:14:48.230
Got it?

00:14:48.230 --> 00:14:51.340
Now I have a whole lot of
subfragments, each taken from

00:14:51.340 --> 00:14:52.580
my 3,000 bases.

00:14:52.580 --> 00:14:53.830
And they're all kind
of smallish.

00:14:56.910 --> 00:14:58.720
What could I do to all of those
little subfragments?

00:15:01.910 --> 00:15:04.520
They're all living
in a vector.

00:15:04.520 --> 00:15:07.060
Each is in its own vector.

00:15:07.060 --> 00:15:07.570
I've spread them out.

00:15:07.570 --> 00:15:08.630
I've made a library.

00:15:08.630 --> 00:15:10.080
They're each in their
own bacteria.

00:15:10.080 --> 00:15:11.370
They're each in a vector.

00:15:11.370 --> 00:15:13.240
That vector has a known
sequence at its

00:15:13.240 --> 00:15:16.530
end, the green primer.

00:15:16.530 --> 00:15:22.430
Couldn't I just sequence a lot
of different random subclones

00:15:22.430 --> 00:15:24.155
and paste them together?

00:15:24.155 --> 00:15:27.440
See, that's where it pays to
have computer science as well.

00:15:27.440 --> 00:15:34.380
Because what I could do is
by subcloning these into

00:15:34.380 --> 00:15:36.380
individual little
random pieces--

00:15:36.380 --> 00:15:38.210
I have no idea how they've
been broken up.

00:15:38.210 --> 00:15:38.610
I don't care--

00:15:38.610 --> 00:15:41.730
I just take the total DNA, shred
it up into pieces, subclone

00:15:41.730 --> 00:15:44.460
it into a vector.

00:15:44.460 --> 00:15:47.610
And now because it's in the
vector, I could read this one

00:15:47.610 --> 00:15:49.760
and that one and this one
and that one and this

00:15:49.760 --> 00:15:50.510
one and that one.

00:15:50.510 --> 00:15:53.980
Do I actually know which
ones I'm reading?

00:15:53.980 --> 00:15:54.330
No.

00:15:54.330 --> 00:15:57.130
It's totally random.

00:15:57.130 --> 00:16:00.500
I take my 3,000 bases, shred
it up into lots of smaller

00:16:00.500 --> 00:16:03.800
pieces, and I just read
a lot of them.
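
NOTE
A sketch of random shredding, assuming fragments of exactly 800 bases drawn at uniformly random
positions from many copies of the clone. The function names and the fragment count are
assumptions. Start positions are returned only so coverage can be checked; the real experiment
does not know them.
```python
import random
def shred(clone, n_fragments=40, size=800, seed=0):
    """Draw random fragments of a fixed size from a clone."""
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(clone) - size + 1) for _ in range(n_fragments)]
    return [(s, clone[s:s + size]) for s in starts]
def covered(clone_length, fragments):
    """Fraction of clone positions touched by at least one fragment."""
    seen = [False] * clone_length
    for start, frag in fragments:
        for k in range(start, start + len(frag)):
            seen[k] = True
    return sum(seen) / clone_length
rng = random.Random(2)
clone = "".join(rng.choice("ACGT") for _ in range(3000))
frags = shred(clone)
print(len(frags), covered(len(clone), frags))
```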

00:16:03.800 --> 00:16:06.090
When I get a whole lot of
these pieces, maybe 800

00:16:06.090 --> 00:16:09.410
letters each, what do I do?

00:16:09.410 --> 00:16:13.330
I write me a piece of code
that looks for overlaps

00:16:13.330 --> 00:16:16.020
between them and starts
pasting them together.

00:16:18.540 --> 00:16:20.180
And that's called assembly.

00:16:20.180 --> 00:16:23.250
You assemble the sequence out
of its little pieces.
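
NOTE
A toy version of "look for overlaps and paste them together", written as a greedy merge. This
is one simple strategy chosen for illustration, not the assembler used in practice. It assumes
error-free reads on one strand and no repeats. The function names and reads are hypothetical.
```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` long), or 0 if there is none."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0
def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best = (0, None, None)
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:
            break  # nothing overlaps: leave the remaining pieces as separate contigs
        merged = reads[i] + reads[j][n:]
        reads = [r for k, r in enumerate(reads) if k not in (i, j)] + [merged]
    return reads
reads = ["ATGGCGTAC", "GTACCTTGA", "TTGATTCAG"]
print(greedy_assemble(reads))  # ['ATGGCGTACCTTGATTCAG']
```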

00:16:23.250 --> 00:16:26.510
And so you can assemble
things.

00:16:26.510 --> 00:16:31.301
And this gets referred to
as shotgun sequencing.

00:16:31.301 --> 00:16:33.940
That's what it's really called,
because it's like you

00:16:33.940 --> 00:16:35.630
shoot it out of the end of
the shotgun or something.

00:16:35.630 --> 00:16:37.160
It's broken up into
a lot of pieces.

00:16:37.160 --> 00:16:41.310
It's just a shotgun, random
approach where I take

00:16:41.310 --> 00:16:43.920
individual random clones,
and I assemble them.

00:16:43.920 --> 00:16:44.660
Any questions about that?

00:16:44.660 --> 00:16:46.440
It's really a way to do it.

00:16:46.440 --> 00:16:49.650
And the big difference there is
you can do it in parallel.

00:16:49.650 --> 00:16:52.820
Rather than doing it one step
at a time, which sounds so

00:16:52.820 --> 00:16:54.990
logical but takes
so much time--

00:16:54.990 --> 00:16:55.960
easier.

00:16:55.960 --> 00:17:01.220
Just shred it up, read lots of
them all the same afternoon,

00:17:01.220 --> 00:17:03.810
and then assemble them
by computer.

00:17:03.810 --> 00:17:06.099
So that's nice.

00:17:06.099 --> 00:17:06.849
So I do that.

00:17:06.849 --> 00:17:08.150
I get my clone.

00:17:08.150 --> 00:17:10.579
I'm going to now do my computer
assembly of it.

00:17:10.579 --> 00:17:14.780
And I'm going to get
my 3,000 bases.

00:17:14.780 --> 00:17:20.820
How do I analyze my clone,
my clone sequence?

00:17:24.319 --> 00:17:29.818
Now I have 3,000 letters
in order, nicely done.

00:17:29.818 --> 00:17:31.230
What do I do with it?

00:17:36.460 --> 00:17:39.680
That clone, let's say, is able
to confer the ability to grow

00:17:39.680 --> 00:17:41.270
without arginine.

00:17:41.270 --> 00:17:45.120
It encodes some enzyme that
lets you make arginine.

00:17:45.120 --> 00:17:46.260
I've got 3,000 letters.

00:17:46.260 --> 00:17:48.680
How do I tell what it's doing?

00:17:48.680 --> 00:17:50.040
What do I look for?

00:17:50.040 --> 00:17:50.757
Yep?

00:17:50.757 --> 00:17:52.705
STUDENT: Compare it with
something that doesn't have

00:17:52.705 --> 00:17:54.660
[INAUDIBLE]?

00:17:54.660 --> 00:17:56.700
ERIC LANDER: Well, so tell
me what I'm looking for?

00:17:56.700 --> 00:17:59.172
I'm looking for a gene?

00:17:59.172 --> 00:18:02.136
STUDENT: Yes, [INAUDIBLE].

00:18:02.136 --> 00:18:02.630
ERIC LANDER: Yeah.

00:18:02.630 --> 00:18:05.320
So what about that gene?

00:18:05.320 --> 00:18:06.860
What's distinctive
about genes?

00:18:06.860 --> 00:18:09.391
How do I recognize a gene?

00:18:09.391 --> 00:18:10.792
It's tricky.

00:18:10.792 --> 00:18:13.127
STUDENT: The sequence?

00:18:13.127 --> 00:18:13.594
ERIC LANDER: How can I just

00:18:13.594 --> 00:18:14.630
recognize it from the sequence?

00:18:14.630 --> 00:18:16.938
Can I tell that something
is a gene?

00:18:16.938 --> 00:18:18.460
STUDENT: Start codon.

00:18:18.460 --> 00:18:20.660
ERIC LANDER: I could look
for a start codon, ATG.

00:18:20.660 --> 00:18:24.970
Do you think that'll happen
just by chance, though?

00:18:24.970 --> 00:18:28.390
There'll be a lot of ATGs running around, because you've got two
around, because you've got two

00:18:28.390 --> 00:18:30.120
strands, three reading frames.

00:18:30.120 --> 00:18:32.390
It'll happen pretty often.

00:18:32.390 --> 00:18:33.570
But that's a start.

00:18:33.570 --> 00:18:35.362
What happens after
the start codon?

00:18:35.362 --> 00:18:36.580
STUDENT: There's a stop codon.

00:18:36.580 --> 00:18:38.450
ERIC LANDER: There's a stop
codon, at some point.

00:18:38.450 --> 00:18:39.660
And what's in between
the start codon

00:18:39.660 --> 00:18:41.768
and the stop codon?

00:18:41.768 --> 00:18:44.000
STUDENT: [INAUDIBLE].

00:18:44.000 --> 00:18:47.310
ERIC LANDER: Well,
no stop codons.

00:18:47.310 --> 00:18:52.220
A gene should look like ATG
and a whole lot of codons

00:18:52.220 --> 00:18:55.360
without any stops in the
reading frame, until

00:18:55.360 --> 00:18:57.390
you get to a stop.

00:18:57.390 --> 00:19:01.320
That's called an open reading
frame, a long stretch without

00:19:01.320 --> 00:19:02.780
stop codons.

00:19:02.780 --> 00:19:04.610
So I could look for an
open reading frame.
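
NOTE
A sketch of an open reading frame scan over both strands and all three frames, assuming DNA
written with the letters A, C, G, T and no introns. The function names, the coordinate
convention, and the default threshold of 100 codons are assumptions for illustration.
```python
STOPS = {"TAA", "TAG", "TGA"}
def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]
def find_orfs(seq, min_codons=100):
    """Return (strand, start, end, codon_count) for every ATG...stop stretch
    with at least `min_codons` codons (ATG included, stop excluded).
    Coordinates are 0-based on the strand that was scanned."""
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    n = (i - start) // 3
                    if n >= min_codons:
                        orfs.append((strand, start, i + 3, n))
                    start = None
    return orfs
seq = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(seq, min_codons=3))  # [('+', 2, 23, 6)]
```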

00:19:18.560 --> 00:19:21.020
So by an open reading frame,
I mean a long stretch that

00:19:21.020 --> 00:19:24.370
starts with an ATG and
then goes on and on

00:19:24.370 --> 00:19:25.570
and on and on with--

00:19:25.570 --> 00:19:26.630
How frequent are stops?

00:19:26.630 --> 00:19:28.500
There are three stops
out of 64.

00:19:28.500 --> 00:19:31.340
About one codon in 20 is a
stop, on average.

00:19:31.340 --> 00:19:35.030
So if I go 20 codons, I might
see a stop, on average.

00:19:35.030 --> 00:19:40.990
But suppose I run for 100
codons, and there's no stop--

00:19:40.990 --> 00:19:43.460
without a stop codon.

00:19:43.460 --> 00:19:48.130
That's pretty impressive, isn't
it, if I can read 100

00:19:48.130 --> 00:19:50.370
codons in a row, and
I never see a stop,

00:19:50.370 --> 00:19:51.760
that's pretty unusual.

00:19:51.760 --> 00:19:53.110
So I say, that's an open
reading frame.
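
NOTE
A back-of-the-envelope check of why 100 codons without a stop is unusual, assuming all 64
codons are equally likely and independent, which real genomes only approximate. The function
name is hypothetical.
```python
def p_no_stop(n_codons, stops=3, codons=64):
    """Chance that n random codons in a row contain no stop codon."""
    return (1 - stops / codons) ** n_codons
print(round(64 / 3, 1))          # average spacing between stops, about 21 codons
print(round(p_no_stop(20), 3))   # a stop-free run of 20 is still fairly common
print(round(p_no_stop(100), 4))  # a stop-free run of 100 is rare, under 1%
```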

00:19:56.370 --> 00:19:58.960
That's one way to recognize
the gene in there.

00:19:58.960 --> 00:20:01.770
The problem is introns.

00:20:01.770 --> 00:20:04.760
What happens if there's
an intron?

00:20:04.760 --> 00:20:05.900
Yikes.

00:20:05.900 --> 00:20:08.120
Then it'll be spliced there.

00:20:08.120 --> 00:20:10.490
That'll be spliced out, but I
won't initially know that,

00:20:10.490 --> 00:20:11.350
reading the sequence.

00:20:11.350 --> 00:20:14.730
And there could be stop codons
in the intron because it's not

00:20:14.730 --> 00:20:17.510
part of the final message.

00:20:17.510 --> 00:20:18.780
So I'm in trouble.

00:20:18.780 --> 00:20:22.050
So happily, in yeast, which has
very small introns, not

00:20:22.050 --> 00:20:25.140
very many introns, I can
actually almost get away by

00:20:25.140 --> 00:20:26.860
looking for open
reading frames.

00:20:26.860 --> 00:20:31.332
In human DNA, this
is kind of lousy.

00:20:31.332 --> 00:20:33.760
Well, it's really problematic
because

00:20:33.760 --> 00:20:35.420
there'll be too many introns.

00:20:35.420 --> 00:20:38.070
There, other tricks
that get used--

00:20:38.070 --> 00:20:42.090
I could make cDNA and compare
it to cDNAs, which have

00:20:42.090 --> 00:20:44.210
already spliced everything out,
and look for the open

00:20:44.210 --> 00:20:45.390
reading frame.
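
NOTE
A toy illustration of why the spliced message helps, assuming the exon boundaries are already
known (in practice that is what the cDNA comparison tells you). The sequences, coordinates, and
function names are made up for the example.
```python
STOPS = {"TAA", "TAG", "TGA"}
def splice(genomic, exons):
    """Join exon intervals (0-based, end-exclusive) to mimic the spliced message."""
    return "".join(genomic[a:b] for a, b in exons)
def first_stop(seq):
    """Index of the first in-frame stop codon reading from position 0, or None."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOPS:
            return i
    return None
exon1, intron, exon2 = "ATGGCTGAA", "GTAAGTTAGCAG", "CCGTTAGGCTGA"
genomic = exon1 + intron + exon2
cdna = splice(genomic, [(0, 9), (21, 33)])
print(first_stop(genomic), first_stop(cdna))  # 15 18
```
Read straight through, the genomic sequence hits a stop inside the intron. The spliced
sequence runs on to the real stop at the end of the second exon.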

00:20:45.390 --> 00:20:47.330
Other tricks that get used--

00:20:47.330 --> 00:20:49.590
I can compare it to the
database of everything

00:20:49.590 --> 00:20:52.830
everybody has ever sequenced
before and start looking for

00:20:52.830 --> 00:20:54.010
similarities.

00:20:54.010 --> 00:20:56.480
And today there are
massive databases.

00:20:56.480 --> 00:21:01.490
So many years ago, a
postdoctoral fellow in my lab

00:21:01.490 --> 00:21:04.640
cloned a gene related
to a human disease.

00:21:04.640 --> 00:21:07.000
And she didn't know
what the gene did.

00:21:07.000 --> 00:21:09.630
And she found it.

00:21:09.630 --> 00:21:11.935
And it had exons, but
she didn't know

00:21:11.935 --> 00:21:14.250
where they were yet.

00:21:14.250 --> 00:21:17.610
But she just took the whole
sequence and said, this

00:21:17.610 --> 00:21:21.430
sequence here, is it similar to
anything that's ever been

00:21:21.430 --> 00:21:22.820
seen before?

00:21:22.820 --> 00:21:25.580
This was a gene that was in
people who had a really severe

00:21:25.580 --> 00:21:28.820
form of dwarfism with twisted
bones and things, called

00:21:28.820 --> 00:21:30.370
diastrophic dysplasia.

00:21:30.370 --> 00:21:33.060
And she put it against the
computer database.

00:21:33.060 --> 00:21:36.910
And the computer came back and
said, the sequence you just

00:21:36.910 --> 00:21:42.120
gave me has a whole lot of
patches that look just like

00:21:42.120 --> 00:21:46.680
sulfate transporters
in a fungus.

00:21:46.680 --> 00:21:51.030
She instantly knew what
her gene did.

00:21:51.030 --> 00:21:53.830
Because it turns out that bones
have a lot of sulfated

00:21:53.830 --> 00:21:56.780
proteoglycans, et cetera, et
cetera, whatever those are.

00:21:56.780 --> 00:21:59.370
And she instantly knew, because
my sequence was

00:21:59.370 --> 00:22:01.190
similar to something-- it's a
human sequence similar to

00:22:01.190 --> 00:22:04.430
something in a fungus that does
sulfate transport, I've

00:22:04.430 --> 00:22:05.800
probably got a sulfate
transporter.

00:22:05.800 --> 00:22:07.700
That's probably the basis
of my disease.

00:22:07.700 --> 00:22:10.900
She took her cells from her
patients, added sulfate, found

00:22:10.900 --> 00:22:14.060
that the cells couldn't take
up sulfate very well, and

00:22:14.060 --> 00:22:16.490
bingo-- had found the cause
of her disease.

00:22:16.490 --> 00:22:18.670
One of the most powerful
ways--

00:22:18.670 --> 00:22:21.120
it's sort of Google,
of course, right?--

00:22:21.120 --> 00:22:22.480
it's Google before Google.

00:22:22.480 --> 00:22:24.700
You take your sequence and you
Google it against all other

00:22:24.700 --> 00:22:27.440
sequences and see
what it's like.

00:22:27.440 --> 00:22:30.540
And by googling all of life's
sequences against each other,

00:22:30.540 --> 00:22:33.740
if somebody else has already
solved your problem for you,

00:22:33.740 --> 00:22:36.350
you can find out about
your problem.

00:22:36.350 --> 00:22:39.340
And it's just this wonderful
network effect that is so

00:22:39.340 --> 00:22:41.940
characteristic of information
technologies.

00:22:41.940 --> 00:22:47.320
So anyway, you can do that
by searching databases.
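
NOTE
A toy sketch of a similarity search, scoring database entries by shared short words. This is a
stand-in chosen for illustration, not the method any real sequence database uses. The entry
names, sequences, and function names are invented.
```python
def kmers(seq, k):
    """Set of all k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}
def rank_hits(query, database, k=4):
    """Score each database entry by the number of k-letter words it shares
    with the query, best first. `database` maps a name to a sequence."""
    q = kmers(query, k)
    scores = {name: len(q & kmers(seq, k)) for name, seq in database.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
database = {
    "toy_transporter": "ATGGCTCTGGTTCCGAAA",
    "toy_kinase": "ATGTTTAAACGCGCGTAT",
}
print(rank_hits("GCTCTGGTTCCG", database))
```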

00:22:47.320 --> 00:22:48.815
And we will not,
in this class.

00:22:51.650 --> 00:22:53.830
So you can write code for
looking for open reading

00:22:53.830 --> 00:22:57.150
frames, you can search databases
for similarities

00:22:57.150 --> 00:23:00.610
across organisms or
within organisms,

00:23:00.610 --> 00:23:01.750
or things like that.

00:23:01.750 --> 00:23:04.850
We won't here, but there are at
MIT some great courses on

00:23:04.850 --> 00:23:08.550
computational biology where,
for example, you can write

00:23:08.550 --> 00:23:10.620
algorithms for detecting
these sorts of things.

00:23:10.620 --> 00:23:11.490
It's an interesting question.

00:23:11.490 --> 00:23:14.340
How do you write an algorithm
for comparing two strings

00:23:14.340 --> 00:23:17.200
which might have insertions and
deletions and changes and

00:23:17.200 --> 00:23:17.780
things like that?

00:23:17.780 --> 00:23:20.820
There's a whole rich field of
computational mathematics

00:23:20.820 --> 00:23:23.060
associated with genome
comparisons.
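
One classic answer to the question of comparing two strings with insertions, deletions, and changes is edit distance (Levenshtein distance), computed by dynamic programming. This is a minimal sketch with unit costs, which is an assumption; real sequence aligners use more elaborate scoring.

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


one_deletion = edit_distance("GATTACA", "GATACA")
one_change = edit_distance("GATTACA", "GACTACA")
```

A single deleted base and a single changed base each cost one edit, so two sequences that differ only by a few such events score as close.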

00:23:23.060 --> 00:23:23.910
All right.

00:23:23.910 --> 00:23:26.240
So we've got it.

00:23:26.240 --> 00:23:27.348
Bingo!

00:23:27.348 --> 00:23:30.910
Now, here's our next problem.

00:23:30.910 --> 00:23:38.010
Our next problem, we
cloned the gene for

00:23:38.010 --> 00:23:41.730
beta globin from you.

00:23:41.730 --> 00:23:42.770
You were kind enough.

00:23:42.770 --> 00:23:46.020
You signed an informed consent
allowing us to take some DNA,

00:23:46.020 --> 00:23:47.370
prepare a library.

00:23:47.370 --> 00:23:49.220
We made our antibody,
we found your beta

00:23:49.220 --> 00:23:51.360
globin gene, et cetera.

00:23:51.360 --> 00:23:55.310
Now we're going to conduct a
study of beta globin in a

00:23:55.310 --> 00:23:58.010
larger population.

00:23:58.010 --> 00:24:01.740
Maybe we're going to ask
multiple people in the class,

00:24:01.740 --> 00:24:04.420
would they be willing to sign
an informed consent to have

00:24:04.420 --> 00:24:06.740
their beta globin
gene sequenced?

00:24:06.740 --> 00:24:07.740
It's an interesting gene.

00:24:07.740 --> 00:24:10.320
There are variants in it that
confer risk of sickle cell.

00:24:10.320 --> 00:24:13.040
There are variants in it that
confer risks of other things.

00:24:13.040 --> 00:24:14.790
There are fascinating things
about that gene.

00:24:14.790 --> 00:24:17.110
Maybe we'd like to see
the beta globin

00:24:17.110 --> 00:24:18.130
sequence of many people.

00:24:18.130 --> 00:24:20.380
How do we get the beta globin
sequence of a second person?

00:24:24.430 --> 00:24:25.690
Well, how'd we get the
beta globin sequence

00:24:25.690 --> 00:24:27.890
from the first person?

00:24:27.890 --> 00:24:34.520
Took DNA, cut it up, cloned it
in our vector, spread it out

00:24:34.520 --> 00:24:37.230
on the plate, washed
over the antibody--

00:24:37.230 --> 00:24:39.200
actually, we took cDNA--

00:24:39.200 --> 00:24:42.400
washed it over, et cetera,
et cetera, et cetera.

00:24:42.400 --> 00:24:43.890
It's a lot of work.

00:24:43.890 --> 00:24:46.410
If we wanted to do 100 people
in this class, do we have to

00:24:46.410 --> 00:24:50.820
get DNA from each of you,
prepare a library, maybe a

00:24:50.820 --> 00:24:54.880
cDNA library even, and do the
same exact process to discover

00:24:54.880 --> 00:24:57.200
your beta globin gene?

00:24:57.200 --> 00:25:00.000
Or is there any way where after
we've done your beta

00:25:00.000 --> 00:25:03.740
globin gene, we could now do
everybody's beta globin gene a

00:25:03.740 --> 00:25:04.990
lot easier?

00:25:07.440 --> 00:25:09.440
STUDENT: [INAUDIBLE]

00:25:09.440 --> 00:25:11.040
ERIC LANDER: Actually, I
do know the sequence.

00:25:11.040 --> 00:25:12.930
That's the thing that's
different is having found it

00:25:12.930 --> 00:25:15.970
once, I know the
whole sequence.

00:25:15.970 --> 00:25:19.640
The question is, can I use the
sequence to save me the

00:25:19.640 --> 00:25:24.060
trouble of making an entire
library of everything?

00:25:24.060 --> 00:25:27.130
How can I use the knowledge I've
just gained to make it so

00:25:27.130 --> 00:25:28.560
much easier?

00:25:28.560 --> 00:25:32.510
Well, the answer occurred to
a chemist working at Cetus

00:25:32.510 --> 00:25:35.590
Corporation in the mid-1980s.

00:25:35.590 --> 00:25:38.580
He was driving along, and he was
thinking about this very

00:25:38.580 --> 00:25:42.410
problem and thinking about
sequencing and how they do it.

00:25:42.410 --> 00:25:45.650
And he had the following
thought.

00:25:45.650 --> 00:25:49.630
His following thought was,
suppose we've got the whole

00:25:49.630 --> 00:25:51.250
human genome.

00:25:51.250 --> 00:25:53.160
There's a whole human genome.

00:25:53.160 --> 00:25:56.380
And I'm just going to melt it
for you, for a second, into

00:25:56.380 --> 00:25:58.016
two strands.

00:25:58.016 --> 00:25:59.670
And suppose we've already
discovered

00:25:59.670 --> 00:26:01.780
the beta globin gene.

00:26:01.780 --> 00:26:05.590
The beta globin gene is right
over here, it turns out.

00:26:05.590 --> 00:26:06.840
That's beta globin.

00:26:10.540 --> 00:26:11.790
Well, we know this
whole sequence.

00:26:16.950 --> 00:26:18.100
This is total DNA.

00:26:18.100 --> 00:26:19.640
I haven't done anything
right now.

00:26:19.640 --> 00:26:21.140
This is the whole human
genome that runs

00:26:21.140 --> 00:26:22.960
over 3 billion bases.

00:26:22.960 --> 00:26:24.413
But it's the whole genome.

00:26:24.413 --> 00:26:28.010
I know the sequence, right?

00:26:28.010 --> 00:26:32.500
I could make a primer to that
part of the sequence.

00:26:32.500 --> 00:26:34.590
And suppose I just make a primer
to that part of the

00:26:34.590 --> 00:26:37.990
sequence, throw it into your
total DNA, and I add

00:26:37.990 --> 00:26:40.410
polymerase and nucleotides.

00:26:40.410 --> 00:26:47.850
Maybe it'll start copying.

00:26:47.850 --> 00:26:48.970
At some point, it'll fall off.

00:26:48.970 --> 00:26:57.630
But notice, I've made an extra
copy of beta globin--

00:26:57.630 --> 00:27:01.210
of course, mixed into the whole,
total human genome.

00:27:01.210 --> 00:27:04.130
But there's a little bit
extra beta globin now.

00:27:04.130 --> 00:27:06.600
Suppose I also made a
primer over here.

00:27:13.070 --> 00:27:14.320
I'd get that.

00:27:16.320 --> 00:27:18.810
I'd now have two double strands
of beta globin,

00:27:18.810 --> 00:27:20.080
whereas before, I
only had one.

00:27:23.930 --> 00:27:25.180
Let's call this step 1.

00:27:28.190 --> 00:27:31.030
What do you think step 1
should be followed by?

00:27:31.030 --> 00:27:32.030
STUDENT: Step 2.

00:27:32.030 --> 00:27:33.010
ERIC LANDER: Step 2.

00:27:33.010 --> 00:27:34.490
Very good.

00:27:34.490 --> 00:27:35.140
Excellent.

00:27:35.140 --> 00:27:38.190
You guys have learned
induction.

00:27:38.190 --> 00:27:59.230
So let me melt the DNA and
now throw back my primer.

00:28:03.240 --> 00:28:07.440
Actually, if you'll allow me,
I'm going to make the two

00:28:07.440 --> 00:28:10.230
primers different colors.

00:28:10.230 --> 00:28:13.470
Let's make that one
a different color.

00:28:13.470 --> 00:28:14.720
There we go.

00:28:20.270 --> 00:28:27.240
So now what will happen is
this primer goes here.

00:28:27.240 --> 00:28:30.280
This green primer goes here.

00:28:34.220 --> 00:28:37.420
This guy sits down here.

00:28:42.060 --> 00:28:44.306
And this guy goes like that.

00:28:44.306 --> 00:28:45.950
Well, that didn't come
out very good.

00:28:45.950 --> 00:28:47.945
I'll just draw it a little
more clearly here.

00:28:50.560 --> 00:28:58.160
What happens is this guy will
start copying this way.

00:28:58.160 --> 00:29:02.600
This guy starts copying
this way.

00:29:02.600 --> 00:29:06.470
This guy starts copying
this way.

00:29:06.470 --> 00:29:11.860
This guy starts copying
this way.

00:29:11.860 --> 00:29:16.910
Now, after step 2, how
many copies of

00:29:16.910 --> 00:29:19.670
beta globin do I have?

00:29:19.670 --> 00:29:20.920
Four copies.

00:29:24.500 --> 00:29:26.424
What's the next step?

00:29:26.424 --> 00:29:27.300
STUDENT: Step 3.

00:29:27.300 --> 00:29:28.190
ERIC LANDER: Step 3.

00:29:28.190 --> 00:29:28.940
Very good.

00:29:28.940 --> 00:29:30.810
No putting anything
over on you.

00:29:30.810 --> 00:29:33.900
And after we melt the DNA and
we add back the primers, how

00:29:33.900 --> 00:29:36.031
many copies of beta globin
will we now have?

00:29:38.917 --> 00:29:39.400
STUDENT: Eight.

00:29:39.400 --> 00:29:41.700
ERIC LANDER: Eight, because
it's doubling every time.

00:29:41.700 --> 00:29:44.240
Step 4?

00:29:44.240 --> 00:29:46.050
Step 10?

00:29:46.050 --> 00:29:47.910
2 to the 10th.

00:29:47.910 --> 00:29:48.860
2 to the 10th, because
we're doubling.

00:29:48.860 --> 00:29:50.270
2 to the 10th.

00:29:50.270 --> 00:29:51.520
Step 20?

00:29:53.990 --> 00:29:58.790
2 to the 20th, which
is about a million.

00:29:58.790 --> 00:30:01.800
Step 30?

00:30:01.800 --> 00:30:04.030
2 to the 30th is about
a billion.

00:30:07.410 --> 00:30:08.660
Oh.

00:30:11.810 --> 00:30:20.480
After 30 steps, I have 2 to
the 30th copies, which is

00:30:20.480 --> 00:30:24.080
about a billion copies.

00:30:24.080 --> 00:30:26.970
And at that point, the majority
of the DNA in my tube

00:30:26.970 --> 00:30:28.470
is beta globin.

00:30:28.470 --> 00:30:30.855
The rest of the human genome
is still there, but beta

00:30:30.855 --> 00:30:33.635
globin started out being one

00:30:33.635 --> 00:30:37.190
one-hundred-millionth of the genome.

00:30:37.190 --> 00:30:40.590
And I've just amplified
it a billionfold.

00:30:40.590 --> 00:30:45.080
So it's now 90% of what's
in the tube.
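
The doubling arithmetic can be checked with a few lines of code. This is an idealized sketch: it assumes every cycle doubles the target perfectly and nothing else in the tube is copied, which real reactions only approximate. The function names are made up for illustration.

```python
def copies_after(cycles, starting_copies=1):
    """Ideal PCR: the target doubles every cycle."""
    return starting_copies * 2 ** cycles


def target_fraction(cycles, initial_fraction):
    """Fraction of the DNA in the tube that is target after `cycles`
    rounds, assuming only the target is amplified."""
    target = initial_fraction * 2 ** cycles
    background = 1.0 - initial_fraction
    return target / (target + background)


for n in (10, 20, 30):
    print(n, copies_after(n))

# Starting fraction taken from the lecture: one one-hundred-millionth.
fraction_30 = target_fraction(30, 1e-8)
```

With that starting fraction, 30 ideal cycles leave the target at roughly nine tenths of the DNA in the tube, in line with the figure quoted above.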

00:30:45.080 --> 00:30:47.070
Pretty cool.

00:30:47.070 --> 00:30:48.400
This is like a chain reaction.

00:30:48.400 --> 00:30:49.890
You do it once, you do
it again, you do it

00:30:49.890 --> 00:30:50.770
again, you do it again.

00:30:50.770 --> 00:30:54.280
You just throw in polymerase,
and you run a chain reaction.

00:30:54.280 --> 00:31:10.610
This therefore is called the
polymerase chain reaction, or

00:31:10.610 --> 00:31:12.862
as it is universally
known, PCR.

00:31:16.040 --> 00:31:17.670
That's PCR.

00:31:17.670 --> 00:31:21.020
That's the polymerase
chain reaction.

00:31:21.020 --> 00:31:23.830
Kary Mullis, who invented this
thing, won a Nobel Prize in

00:31:23.830 --> 00:31:24.890
chemistry for it.

00:31:24.890 --> 00:31:26.930
Because notice what
he's just done.

00:31:26.930 --> 00:31:31.400
He's cloned your beta globin
gene without cloning.

00:31:31.400 --> 00:31:33.050
It's cloning without cloning.

00:31:33.050 --> 00:31:34.900
I didn't need any vectors, I
didn't need any bacteria, I

00:31:34.900 --> 00:31:35.730
didn't need no nothing.

00:31:35.730 --> 00:31:39.960
All I needed was the sequence
that I got once by cloning,

00:31:39.960 --> 00:31:41.890
and then I'm off to the races.

00:31:41.890 --> 00:31:43.603
I throw in two primers--

00:31:43.603 --> 00:31:46.370
choop-choop-choop-choop-choop--

00:31:46.370 --> 00:31:47.750
bingo!

00:31:47.750 --> 00:31:49.880
Where do my primers come from?

00:31:49.880 --> 00:31:51.020
They're not in the catalogs.

00:31:51.020 --> 00:31:52.140
We don't keep all
that inventory.

00:31:52.140 --> 00:31:54.440
You just type them in, and they
come to you the next day

00:31:54.440 --> 00:31:56.640
by an automatic synthesis
machine.

00:31:56.640 --> 00:31:59.370
So anyplace in the human genome
or the yeast genome or

00:31:59.370 --> 00:32:02.930
any other thing that you want
to PCR, just give me the two

00:32:02.930 --> 00:32:04.620
primers and PCR it out.
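
Here is a small sketch of what "give me the two primers" means in sequence terms. The forward primer copies the start of the region you want, and the reverse primer is the reverse complement of its end, so the two point toward each other. The function names and the 5-base primer length are assumptions for illustration; real primers are longer, typically around 20 bases.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_primers(target, length=5):
    """Forward primer matches the start of the target's top strand;
    reverse primer is the reverse complement of its end."""
    return target[:length], reverse_complement(target[-length:])


def in_silico_pcr(template, forward, reverse):
    """Return the stretch of template bounded by the two primers,
    or None if either primer site is missing."""
    start = template.find(forward)
    if start == -1:
        return None
    site = reverse_complement(reverse)  # where the reverse primer binds
    end = template.find(site, start)
    if end == -1:
        return None
    return template[start:end + len(site)]


target = "GATTACAGGCCATCGA"            # made-up stand-in for a known gene
template = "CCCCC" + target + "CCCCC"  # made-up stand-in for total DNA
forward, reverse = design_primers(target)
product = in_silico_pcr(template, forward, reverse)
```

The product is exactly the stretch between the two primer sites, which is why knowing the sequence once is enough to pull the same region out of anyone's DNA.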

00:32:04.620 --> 00:32:05.360
How do we do this?

00:32:05.360 --> 00:32:07.750
There are a couple of the
details that I have to worry

00:32:07.750 --> 00:32:09.000
about here.

00:32:11.290 --> 00:32:14.990
Cooking details here
for the recipe.

00:32:14.990 --> 00:32:18.415
What I have to do is I have to
take my test tube, and I have

00:32:18.415 --> 00:32:24.290
to heat it up so high that the
double helix melts and comes

00:32:24.290 --> 00:32:26.960
apart, so that the primers
can get in there.

00:32:26.960 --> 00:32:32.120
So I have to heat
to 97 degrees.

00:32:35.230 --> 00:32:38.360
Then it comes apart.

00:32:38.360 --> 00:32:42.410
I cool it down, I add
my polymerase.

00:32:42.410 --> 00:32:45.990
I cool it, I add polymerase
and nucleotides.

00:32:54.690 --> 00:32:56.270
And then it does
its extension.

00:32:56.270 --> 00:32:58.630
Then I heat it up to 97.

00:32:58.630 --> 00:33:00.320
Now the problem is when I heat
it up to 97, you know what

00:33:00.320 --> 00:33:02.256
happens to my polymerase?

00:33:02.256 --> 00:33:02.700
STUDENT: Denatured.

00:33:02.700 --> 00:33:03.490
ERIC LANDER: It gets denatured,

00:33:03.490 --> 00:33:04.660
and it doesn't come back.

00:33:04.660 --> 00:33:05.820
It's ruined.

00:33:05.820 --> 00:33:08.250
So what I have to do is I have
to pop open my tube--

00:33:08.250 --> 00:33:09.690
Sorry, you had a question?

00:33:09.690 --> 00:33:10.940
STUDENT: [INAUDIBLE]

00:33:13.110 --> 00:33:14.120
ERIC LANDER: Why do
I heat it up?

00:33:14.120 --> 00:33:16.320
There are ways you might
be able to avoid it.

00:33:16.320 --> 00:33:18.530
But the traditional technique
is you heat it up.

00:33:18.530 --> 00:33:20.230
But you're right, there might
be other solutions.

00:33:20.230 --> 00:33:22.110
But now I'm going to heat it up.

00:33:22.110 --> 00:33:24.280
And my polymerase
gets denatured.

00:33:24.280 --> 00:33:26.880
So I have to pop open the test
tube, throw in some more

00:33:26.880 --> 00:33:30.750
polymerase, let it do its work,
heat it up again, pop

00:33:30.750 --> 00:33:32.750
open the test tube, put some
more polymerase in.

00:33:32.750 --> 00:33:33.780
And it gets really boring.

00:33:33.780 --> 00:33:35.820
Every one of these 30 steps,
I have to keep adding

00:33:35.820 --> 00:33:37.070
polymerase.

00:33:41.780 --> 00:33:44.180
So the engineers in you will
say, why don't we just design

00:33:44.180 --> 00:33:47.430
a polymerase that doesn't
denature at 97 degrees?

00:33:47.430 --> 00:33:49.800
So we should go to an expert
and say, please make us a

00:33:49.800 --> 00:33:52.980
polymerase that doesn't denature
at 97 degrees, even

00:33:52.980 --> 00:33:55.630
one that doesn't mind
being boiled.

00:33:55.630 --> 00:33:59.220
So what expert do we go to?

00:33:59.220 --> 00:33:59.980
STUDENT: Bacteria.

00:33:59.980 --> 00:34:00.960
ERIC LANDER: Bacteria.

00:34:00.960 --> 00:34:03.650
What bacteria do you think has
a DNA polymerase that doesn't

00:34:03.650 --> 00:34:05.058
mind being boiled?

00:34:05.058 --> 00:34:06.730
STUDENT: [INAUDIBLE].

00:34:06.730 --> 00:34:08.350
ERIC LANDER: Bacteria
that live in, say,

00:34:08.350 --> 00:34:10.070
geysers, hot springs.

00:34:10.070 --> 00:34:11.340
So you go to a hot spring.

00:34:11.340 --> 00:34:14.370
You go to a geyser, you go to
Yellowstone, and you fish out

00:34:14.370 --> 00:34:16.920
some water, and you see what's
growing there, and you find

00:34:16.920 --> 00:34:19.719
the bacteria growing
there that has the

00:34:19.719 --> 00:34:25.380
name Thermus aquaticus.

00:34:31.070 --> 00:34:34.300
And you purify DNA polymerase from
Thermus aquaticus.

00:34:34.300 --> 00:34:38.900
Thermus aquaticus just goes by
the name TAQ, T-A-Q. You purify

00:34:38.900 --> 00:34:40.929
TAQ polymerase.

00:34:40.929 --> 00:34:44.060
And now, no problem.

00:34:44.060 --> 00:34:46.610
You just use TAQ polymerase,
throw it in your test tube.

00:34:46.610 --> 00:34:48.290
And you go heat, cool, heat,
cool, heat, cool, heat, cool,

00:34:48.290 --> 00:34:51.000
heat, cool, and you're
all done.

00:34:51.000 --> 00:34:52.690
You just put it on a little
heating block.

00:34:52.690 --> 00:34:54.650
And the heating block
automatically goes hot, cold,

00:34:54.650 --> 00:34:56.969
hot, cold, hot, cold,
hot, cold.

00:34:56.969 --> 00:34:58.840
And that's called the
thermocycler.

00:34:58.840 --> 00:35:00.490
The thermocycler does it.
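
What the thermocycler runs is basically a loop. The sketch below writes out a two-step heat and cool program. The 97-degree melting step comes from the lecture. The 60-degree cooling temperature and the function name are assumptions for illustration only, since the lecture does not give a number for the cool step.

```python
def thermocycler_program(cycles=30, melt_c=97, cool_c=60):
    """Return a list of (cycle, step, temperature_C) tuples."""
    program = []
    for cycle in range(1, cycles + 1):
        # Heat: the double helix melts into single strands.
        program.append((cycle, "heat", melt_c))
        # Cool: primers bind and the heat-stable polymerase extends them.
        program.append((cycle, "cool", cool_c))
    return program


program = thermocycler_program()
```

Because the polymerase survives the heat step, nothing has to be added between cycles, so the whole run is just this list executed in order.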

00:35:00.490 --> 00:35:02.670
And of course, nowadays,
do you have to yourself

00:35:02.670 --> 00:35:05.740
personally go to the hot spring
and risk your life

00:35:05.740 --> 00:35:07.740
fishing out the bacteria?

00:35:07.740 --> 00:35:08.640
No.

00:35:08.640 --> 00:35:09.950
Because it's in--?

00:35:09.950 --> 00:35:10.380
STUDENT: The catalog.

00:35:10.380 --> 00:35:11.350
ERIC LANDER: The catalog.

00:35:11.350 --> 00:35:12.620
Exactly.

00:35:12.620 --> 00:35:13.240
Very good.

00:35:13.240 --> 00:35:16.740
TAQ polymerase is
in the catalog.

00:35:16.740 --> 00:35:18.470
All right.

00:35:18.470 --> 00:35:19.310
I'll tell you a story.

00:35:19.310 --> 00:35:22.330
The statute of limitations has
already expired, so it's OK.

00:35:22.330 --> 00:35:25.060
TAQ polymerase used to
be very expensive.

00:35:25.060 --> 00:35:29.000
So we needed a lot
of it in our lab.

00:35:29.000 --> 00:35:33.100
And we couldn't afford
all of it.

00:35:33.100 --> 00:35:39.360
So what we did was we just
looked up the sequence of the

00:35:39.360 --> 00:35:44.100
TAQ polymerase in Thermus
aquaticus, got primers, used

00:35:44.100 --> 00:35:49.640
PCR to get the gene for TAQ
polymerase, and then expressed

00:35:49.640 --> 00:35:51.940
it to make a lot of
TAQ polymerase.

00:35:51.940 --> 00:35:54.500
So it's kind of cool.

00:35:54.500 --> 00:35:57.040
Anyway, that was about
15 years ago.

00:35:57.040 --> 00:35:59.830
We produced in a few days what
was then about $4

00:35:59.830 --> 00:36:01.610
million worth of
TAQ polymerase.

00:36:01.610 --> 00:36:02.402
[LAUGHTER]

00:36:02.402 --> 00:36:05.030
That was why we went to the
trouble of doing it.

00:36:07.810 --> 00:36:09.900
We didn't end up getting in
any real trouble about it.

00:36:09.900 --> 00:36:10.250
OK.

00:36:10.250 --> 00:36:13.770
So now, why is this
stuff cool?

00:36:13.770 --> 00:36:19.290
This stuff is cool because
you're able to amplify tiny

00:36:19.290 --> 00:36:20.240
amounts of DNA.

00:36:20.240 --> 00:36:22.720
So if I want to purify any human
gene now, and I know its

00:36:22.720 --> 00:36:25.480
sequence initially,
PCR it out.

00:36:25.480 --> 00:36:27.730
No problem.

00:36:27.730 --> 00:36:32.300
Suppose a patient presents with
a bacterial infection.

00:36:32.300 --> 00:36:34.730
And you're a physician.

00:36:34.730 --> 00:36:37.530
And you suspect that there might
be a specific bacterial

00:36:37.530 --> 00:36:40.960
infection or maybe a specific
viral infection.

00:36:40.960 --> 00:36:43.115
So applications of PCR.

00:36:50.260 --> 00:36:52.180
Application of PCR?

00:36:52.180 --> 00:36:55.780
Well, resequencing a
known gene, yes.

00:36:55.780 --> 00:36:58.160
Resequencing beta globin.

00:36:58.160 --> 00:37:00.325
But infectious disease.

00:37:08.340 --> 00:37:09.910
I have a patient.

00:37:09.910 --> 00:37:11.450
I think there might be a
bacteria, there might be a

00:37:11.450 --> 00:37:13.230
virus in the blood.

00:37:13.230 --> 00:37:14.480
What do I do?

00:37:16.650 --> 00:37:19.700
Make primers, do PCR.

00:37:19.700 --> 00:37:21.570
I have a detection technique.

00:37:21.570 --> 00:37:25.530
I can detect the presence
of a viral infection

00:37:25.530 --> 00:37:27.320
or a bacterial infection.

00:37:27.320 --> 00:37:31.540
For example, HIV testing
can be done by PCR.

00:37:31.540 --> 00:37:33.170
Because it doesn't take
very much there in

00:37:33.170 --> 00:37:35.900
order to detect it.

00:37:35.900 --> 00:37:37.620
Water contamination.

00:37:37.620 --> 00:37:40.930
You can test for bugs that
shouldn't be in the water by

00:37:40.930 --> 00:37:43.070
PCR because you don't
need much.

00:37:43.070 --> 00:37:45.160
How little do you need?

00:37:45.160 --> 00:37:49.540
Suppose I take a tube of DNA,
and I start diluting it and

00:37:49.540 --> 00:37:52.330
diluting it and diluting it.

00:37:52.330 --> 00:37:55.660
How far down do you think I can
go and still PCR back up,

00:37:55.660 --> 00:37:58.670
say, beta globin?

00:37:58.670 --> 00:38:01.360
Suppose I dilute it so that
there's only like 1,000 copies

00:38:01.360 --> 00:38:04.040
of beta globin left,
on average.

00:38:04.040 --> 00:38:06.590
Can I still PCR it?

00:38:06.590 --> 00:38:09.100
100 copies?

00:38:09.100 --> 00:38:11.260
10 copies?

00:38:11.260 --> 00:38:13.920
Suppose I dilute it so on
average there's only one copy

00:38:13.920 --> 00:38:16.280
of the beta globin
molecule there.

00:38:16.280 --> 00:38:19.810
Can I PCR it?

00:38:19.810 --> 00:38:22.410
How can I prove that?

00:38:22.410 --> 00:38:23.590
An easy way to prove that--

00:38:23.590 --> 00:38:25.930
I could do it statistically by
just diluting it so that on

00:38:25.930 --> 00:38:27.380
average there's only
one beta globin.
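
The catch with the statistical approach is that "one copy on average" does not mean one copy in every tube. A quick sketch, assuming the number of copies per tube follows a Poisson distribution (a standard model for heavy dilution, but an assumption here):

```python
import math


def poisson_pmf(k, mean):
    """Probability of seeing exactly k copies when the average is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


p_zero = poisson_pmf(0, 1.0)           # tube has no template at all
p_one = poisson_pmf(1, 1.0)            # tube has exactly one copy
p_two_or_more = 1.0 - p_zero - p_one   # tube has more than one copy
```

Under this model only a bit over a third of the tubes hold exactly one copy, and about the same share hold none, so a cleaner way of getting exactly one copy is worth looking for.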

00:38:27.380 --> 00:38:29.950
But how can I get one
copy of beta globin

00:38:29.950 --> 00:38:31.260
packaged up very nicely?

00:38:34.410 --> 00:38:35.310
STUDENT: Order it.

00:38:35.310 --> 00:38:36.146
ERIC LANDER: Sorry?

00:38:36.146 --> 00:38:37.394
STUDENT: Order it.

00:38:37.394 --> 00:38:39.550
ERIC LANDER: No.

00:38:39.550 --> 00:38:40.800
Can't order that.

00:38:45.040 --> 00:38:47.660
Where do you know that there's
just exactly one copy of the

00:38:47.660 --> 00:38:49.950
beta globin gene?

00:38:49.950 --> 00:38:53.510
One human sperm.

00:38:53.510 --> 00:38:55.570
Suppose with a micromanipulator,
I purify a

00:38:55.570 --> 00:38:58.450
single human sperm, one sperm.

00:39:03.420 --> 00:39:03.895
It's haploid.

00:39:03.895 --> 00:39:05.940
It's got exactly one
beta globin.

00:39:05.940 --> 00:39:10.020
Throw it in a test tube, crack
it open, do PCR, it works.

00:39:10.020 --> 00:39:13.230
That's how I demonstrate that
a single copy is enough.

00:39:13.230 --> 00:39:14.240
I can make that work.

00:39:14.240 --> 00:39:15.550
Pretty impressive.

00:39:15.550 --> 00:39:18.130
Not only that, that
I can do it from a

00:39:18.130 --> 00:39:19.380
single copy in a sperm.

00:39:25.390 --> 00:39:27.576
If someone is doing in
vitro fertilization--

00:39:32.350 --> 00:39:34.070
remember, in vitro fertilization
won the Nobel

00:39:34.070 --> 00:39:37.396
Prize this semester--

00:39:37.396 --> 00:39:38.810
in vitro.

00:39:38.810 --> 00:39:42.960
Suppose a couple has a 1 in 4
chance of having a baby with

00:39:42.960 --> 00:39:45.520
some terrible lethal disorder.

00:39:45.520 --> 00:39:49.150
The couple might use in vitro
fertilization to make multiple

00:39:49.150 --> 00:39:52.300
independent embryos.
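
A quick sketch of why making several embryos helps, assuming each embryo independently has a 1 in 4 chance of carrying the disorder. The function names are made up for illustration.

```python
from fractions import Fraction


def prob_at_least_one_unaffected(n_embryos, risk=Fraction(1, 4)):
    """Chance that not every embryo is affected."""
    return 1 - risk ** n_embryos


def expected_unaffected(n_embryos, risk=Fraction(1, 4)):
    """Average number of unaffected embryos out of n_embryos."""
    return n_embryos * (1 - risk)


chance_with_three = prob_at_least_one_unaffected(3)
average_of_eight = expected_unaffected(8)
```

With three embryos the chance that at least one is unaffected is already 63 in 64, but the arithmetic alone cannot say which one. That still takes a test on each embryo.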

00:39:52.300 --> 00:39:56.540
The doctor, then, is deciding
which embryo should we implant

00:39:56.540 --> 00:39:58.750
back in mom?

00:39:58.750 --> 00:40:03.660
Well, how could they tell at
this eight-cell stage which

00:40:03.660 --> 00:40:07.030
embryo carries the
genetic disease?

00:40:07.030 --> 00:40:08.930
Suppose for this genetic disease
they already knew the

00:40:08.930 --> 00:40:10.702
molecular mutation causing it.

00:40:13.600 --> 00:40:14.850
Pull off a cell.

00:40:19.400 --> 00:40:25.200
Pull off one cell, pull off one
cell, pull off one cell.

00:40:25.200 --> 00:40:27.760
This is at the eight-cell
stage, let's say.

00:40:27.760 --> 00:40:30.620
If I pull off one of those cells
at the eight-cell stage,

00:40:30.620 --> 00:40:33.180
does that mean the baby doesn't
have an ear or an arm

00:40:33.180 --> 00:40:34.860
or something?

00:40:34.860 --> 00:40:35.640
No, it doesn't.

00:40:35.640 --> 00:40:37.350
Because at that stage, none
of the cells have

00:40:37.350 --> 00:40:39.030
taken up any identity.

00:40:39.030 --> 00:40:39.830
It regulates.

00:40:39.830 --> 00:40:40.630
There's no problem.

00:40:40.630 --> 00:40:43.550
It turns out you can pull off an
individual cell, and it has

00:40:43.550 --> 00:40:45.960
no impact on the embryo.

00:40:45.960 --> 00:40:47.210
And I do PCR.

00:40:49.710 --> 00:40:53.330
And I can figure out that that
one carries the severe genetic

00:40:53.330 --> 00:40:55.090
disease that's going
to cause the baby

00:40:55.090 --> 00:40:56.630
to die at five months.

00:40:56.630 --> 00:40:59.120
And the couple says, we're not
going to implant that one.

00:40:59.120 --> 00:41:00.580
We'll implant the other ones.

00:41:00.580 --> 00:41:02.580
That's called preimplantation
diagnostics.

00:41:13.830 --> 00:41:17.310
Or suppose somebody's being
treated for cancer.

00:41:17.310 --> 00:41:22.840
And the cancer cells that had
previously been there are no

00:41:22.840 --> 00:41:25.070
longer detectable.

00:41:25.070 --> 00:41:27.820
The drug therapy has apparently
killed this blood

00:41:27.820 --> 00:41:29.130
cancer that somebody has.

00:41:29.130 --> 00:41:31.620
Maybe they have a cancer
of the blood.

00:41:31.620 --> 00:41:34.670
Now what I'm going to do is
monitor that patient every

00:41:34.670 --> 00:41:38.250
several months by getting a
blood sample and seeing if the

00:41:38.250 --> 00:41:40.110
distinct mutations that
were present in

00:41:40.110 --> 00:41:41.830
the cancer were there.

00:41:41.830 --> 00:41:44.750
And I can begin to see if that's
coming back, if those

00:41:44.750 --> 00:41:46.840
cells are now recurring.

00:41:46.840 --> 00:41:49.020
It's an incredibly sensitive
technique.

00:41:49.020 --> 00:41:51.420
And then of course, where does
this stuff get used that you

00:41:51.420 --> 00:41:55.280
guys all surely know about?

00:41:55.280 --> 00:41:57.040
Forensics.

00:41:57.040 --> 00:41:59.120
CSI and all that
kind of stuff.

00:42:02.660 --> 00:42:04.900
If I lick an envelope--

00:42:04.900 --> 00:42:08.100
I used to say, licking a stamp,
but stamps just peel

00:42:08.100 --> 00:42:08.700
off these days.

00:42:08.700 --> 00:42:10.610
But people still do lick
an envelope sometimes.

00:42:10.610 --> 00:42:14.590
If you lick an envelope, more
than enough DNA comes off when

00:42:14.590 --> 00:42:18.720
you lick the envelope that you
can use PCR to determine who

00:42:18.720 --> 00:42:21.680
did the licking.

00:42:21.680 --> 00:42:22.930
ERIC LANDER: You can.

00:42:22.930 --> 00:42:24.590
It works.

00:42:24.590 --> 00:42:25.270
All right.

00:42:25.270 --> 00:42:26.520
So that's PCR.