WEBVTT

00:00:03.000 --> 00:00:17.000
Good morning.

00:00:17.000 --> 00:00:23.000
It can't go without at least
some acknowledgement.

00:00:23.000 --> 00:00:29.000
If you should ever
find yourself in

00:00:29.000 --> 00:00:35.000
life in a situation where you have
or are about to give up all hope,

00:00:35.000 --> 00:00:42.000
you think things are utterly
impossible and there's no way,

00:00:42.000 --> 00:00:49.000
you will remember this week
that nothing is impossible.

00:00:49.000 --> 00:00:53.000
It is possible to come back three
games down in the bottom of the

00:00:53.000 --> 00:00:58.000
ninth inning, you've got
to believe you can do it,

00:00:58.000 --> 00:01:03.000
and remember to have
Dave Roberts pinch run.

00:01:03.000 --> 00:01:06.000
Just a general bit of good
advice. What an amazing week, just

00:01:06.000 --> 00:01:09.000
absolutely amazing week. Wow.
There are lessons in life to

00:01:09.000 --> 00:01:12.000
be taken from it.
Please do take them.

00:01:12.000 --> 00:01:15.000
You know, there really are. I
mean I'd given up hope by that

00:01:15.000 --> 00:01:18.000
point, I confess. I
wish I could say oh,

00:01:18.000 --> 00:01:22.000
I knew they were going to
pull it out, but I didn't.

00:01:22.000 --> 00:01:25.000
And, boy, they pulled it
out one game at a time.

00:01:25.000 --> 00:01:28.000
So, all of you think good thoughts
this week. This could be a historic

00:01:28.000 --> 00:01:33.000
week, you know, you were
here. Anyway, onward.

00:01:33.000 --> 00:01:45.000
We were talking
last time about how

00:01:45.000 --> 00:01:49.000
to analyze your clone. The
notion of cloning random pieces

00:01:49.000 --> 00:01:54.000
of DNA, identifying your
clone within a library,

00:01:54.000 --> 00:01:59.000
purifying the DNA from a clone,
doing some preliminary analysis by

00:01:59.000 --> 00:02:03.000
maybe cutting with a restriction
enzyme, then sequencing it using

00:02:03.000 --> 00:02:08.000
these techniques that I'd described
would allow you to take the clone

00:02:08.000 --> 00:02:13.000
that was, say, able to
rescue the yeast that

00:02:13.000 --> 00:02:17.000
couldn't grow without arginine
and figure out what its

00:02:17.000 --> 00:02:22.000
DNA sequence was. You could
take the clone that you

00:02:22.000 --> 00:02:26.000
had obtained by hybridizing with
the DNA sequence corresponding to the

00:02:26.000 --> 00:02:31.000
protein sequence for beta-globin and
sequence it and see the beta-globin

00:02:31.000 --> 00:02:36.000
gene sequence perhaps.
This is very powerful.

00:02:36.000 --> 00:02:40.000
I want to take a brief moment,
we'll come back to it in more detail

00:02:40.000 --> 00:02:44.000
in a subsequent lecture, but I
really described how you would

00:02:44.000 --> 00:02:49.000
sequence one clone. I
just want to make a note,

00:02:49.000 --> 00:02:53.000
because someone asked about it last
time, about how you would sequence

00:02:53.000 --> 00:02:58.000
an entire genome.
Someone asked about this.

00:02:58.000 --> 00:03:04.000
Remember before we pulled out
our clone, we sequenced it,

00:03:04.000 --> 00:03:10.000
we got its DNA sequence. What
if I wanted to sequence the

00:03:10.000 --> 00:03:17.000
entirety of a genome?
Yeah. Do a lot of this,

00:03:17.000 --> 00:03:23.000
right, basically if I
got a whole genome. Well,

00:03:23.000 --> 00:03:30.000
somebody asked could I put a
primer here and just sequence?

00:03:30.000 --> 00:03:34.000
It would take a very long time.
And it turns out that it wouldn't

00:03:34.000 --> 00:03:38.000
work because the separation that
you can achieve through gels is a

00:03:38.000 --> 00:03:43.000
function, the separation between N
and N plus 1 in length goes like the

00:03:43.000 --> 00:03:47.000
logarithm of the ratio. So,
it turns out that when N and N

00:03:47.000 --> 00:03:52.000
plus 1 get to like about a thousand,
you can achieve very little physical

00:03:52.000 --> 00:03:56.000
separation between them. And
so, DNA sequencing runs cannot

00:03:56.000 --> 00:04:01.000
go much past the thousand bases.
So, the problem with sequencing a

00:04:01.000 --> 00:04:05.000
genome by putting down a primer on
an extraordinarily long piece of DNA,

00:04:05.000 --> 00:04:09.000
a hundred million bases, is
you cannot separate the little

00:04:09.000 --> 00:04:14.000
fragments like that. So,
what you do is you break up

00:04:14.000 --> 00:04:18.000
your genome into lots of pieces.
One strategy, break it up into a

00:04:18.000 --> 00:04:22.000
library of some very big pieces.
It turns out you can make pieces at

00:04:22.000 --> 00:04:27.000
random of a hundred
thousand base pairs.

00:04:27.000 --> 00:04:31.000
Cloning these in bacterial
artificial chromosomes,

00:04:31.000 --> 00:04:36.000
as we talked about before.
Take a library of bacterial

00:04:36.000 --> 00:04:40.000
artificial chromosomes and
then begin sequencing them.

00:04:40.000 --> 00:04:45.000
And take any given bacterial
artificial chromosome and break it

00:04:45.000 --> 00:04:49.000
up into a whole lot of pieces that
are maybe a thousand bases long,

00:04:49.000 --> 00:04:54.000
and you could sequence all of those.
How do you arrange to get just a

00:04:54.000 --> 00:04:58.000
perfect overlapping set of thousand
base-pair clones that perfectly

00:04:58.000 --> 00:05:03.000
tile across the sequence
with no redundancy?

00:05:03.000 --> 00:05:06.000
You don't. That's the correct
answer. That's how you do it,

00:05:06.000 --> 00:05:10.000
you don't. Instead you just
randomly take a bunch of things.

00:05:10.000 --> 00:05:13.000
And, in fact, typically you might
take clones that give you six or

00:05:13.000 --> 00:05:17.000
eight-fold redundancy. You
just sequence a lot of clones

00:05:17.000 --> 00:05:20.000
and then you ask the
computer to reassemble it.
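
NOTE
A rough sketch of why six- to eight-fold redundancy is a reasonable target.
It assumes reads land uniformly at random, so per-base read depth is roughly
Poisson; that model is an idealization added here, not something stated in the
lecture. Under it, the fraction of bases missed at coverage c is about e^(-c).
The function name is illustrative.
```python
import math
def fraction_uncovered(coverage: float) -> float:
    """Expected fraction of bases hit by no read, under a Poisson model."""
    return math.exp(-coverage)
for c in (1, 6, 8):
    print(f"{c}x coverage: about {fraction_uncovered(c):.4%} of bases unsequenced")
```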

00:05:20.000 --> 00:05:24.000
And, in fact, all that overlap is
very good for being able to stick

00:05:24.000 --> 00:05:27.000
these pieces together. Sometimes
people do such things as

00:05:27.000 --> 00:05:31.000
take pieces that might be four
thousand bases long and sequence a

00:05:31.000 --> 00:05:35.000
thousand bases here and a thousand
bases here by using a primer that

00:05:35.000 --> 00:05:39.000
starts there and a primer that
starts there. And then you can get

00:05:39.000 --> 00:05:42.000
DNA sequences from two ends of
a clone. And if you had that for

00:05:42.000 --> 00:05:46.000
zillions of clones your computer
program might do an even better job

00:05:46.000 --> 00:05:50.000
of linking things up. It's
one very big crossword puzzle

00:05:50.000 --> 00:05:54.000
of putting together all of these
pieces, a jigsaw puzzle of putting

00:05:54.000 --> 00:05:58.000
together all these pieces.
But, in effect, this is how you

00:05:58.000 --> 00:06:01.000
sequence a big piece of DNA.
You chop it up into medium-sized

00:06:01.000 --> 00:06:05.000
pieces of DNA and then tiny
pieces of DNA, you sequence them,

00:06:05.000 --> 00:06:09.000
and you use computational
science to reassemble it.
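
NOTE
A toy sketch of the reassembly step, assuming error-free reads from a single
strand and a greedy merge of the best suffix/prefix overlap. Real assemblers
are far more elaborate; the function names and the example reads are made up
for illustration. Near-identical repeats in a genome would mislead a greedy
merge like this one.
```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0
def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of sequences with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
    return contigs
print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTGA"]))
```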

00:06:09.000 --> 00:06:13.000
Some people, for some genomes,
take the whole big genome and

00:06:13.000 --> 00:06:16.000
immediately go to lots of
little pieces. That can work,

00:06:16.000 --> 00:06:20.000
too. It depends on exactly how
complicated your genome is.

00:06:20.000 --> 00:06:24.000
In the human genome, there are some
parts of a human genome that are

00:06:24.000 --> 00:06:28.000
almost identical that might be like
99.91% identical in two different

00:06:28.000 --> 00:06:32.000
parts of the genome.
And so, if you do that,

00:06:32.000 --> 00:06:36.000
you may have trouble telling
those pieces apart. So,

00:06:36.000 --> 00:06:40.000
for really complicated genomes
people like sometimes breaking it up

00:06:40.000 --> 00:06:45.000
into intermediate-sized pieces.
But basically the idea of

00:06:45.000 --> 00:06:49.000
sequencing a big piece of DNA
by this process is referred to as

00:06:49.000 --> 00:06:53.000
shotgun sequencing.
Shotgun sequencing,

00:06:53.000 --> 00:06:58.000
in fact, was developed in
about 1980 by Fred Sanger,

00:06:58.000 --> 00:07:02.000
the same guy who developed the DNA
sequencing technique that I told you

00:07:02.000 --> 00:07:07.000
about using polymerase
and dideoxynucleotides.

00:07:07.000 --> 00:07:09.000
Sanger very quickly wanted to go
from sequencing a single piece to

00:07:09.000 --> 00:07:12.000
sequencing pieces, and so
he developed the shotgun

00:07:12.000 --> 00:07:15.000
technique there. And it's
now been applied in many

00:07:15.000 --> 00:07:18.000
different forms of intermediate
shotguns, whole genome shotguns,

00:07:18.000 --> 00:07:21.000
et cetera. So, that's in reply to
the question someone asked last time

00:07:21.000 --> 00:07:24.000
about, well, how would
you do a whole genome? And,

00:07:24.000 --> 00:07:27.000
as a matter of fact, this
is not theoretical because,

00:07:27.000 --> 00:07:30.000
in fact, people do whole genomes
this way. And we do this at MIT.

00:07:30.000 --> 00:07:35.000
Lots of genomes get done here in
this fashion. Someone else asked

00:07:35.000 --> 00:07:40.000
how would you analyze your clone.
And, again, I'll just make a brief

00:07:40.000 --> 00:07:46.000
remark on that in response
to the question. So,

00:07:46.000 --> 00:07:51.000
analyzing some DNA sequence.
So, suppose we got some DNA

00:07:51.000 --> 00:07:57.000
sequence, A-A-T-A, don't
bother writing this down.

00:07:57.000 --> 00:08:03.000
I'm just making up letters here.
How would we make any sense of it?

00:08:03.000 --> 00:08:09.000
Suppose I give you the ones
and zeros from your hard-drive,

00:08:09.000 --> 00:08:15.000
how would you make any sense out of
them? This is about as interesting

00:08:15.000 --> 00:08:21.000
as the ones and zeros from
your hard-drive, right?

00:08:21.000 --> 00:08:27.000
It's got four letters, not
two, but this is actually what

00:08:27.000 --> 00:08:31.000
you get out of any project. You
want to sequence beta-globin?

00:08:31.000 --> 00:08:35.000
You'll get something like this.
You want to sequence the arginine

00:08:35.000 --> 00:08:39.000
gene? You want to sequence the
human genome? You get a very long

00:08:39.000 --> 00:08:43.000
string of four letters.
What do you do with it?

00:08:43.000 --> 00:08:46.000
Oh, well, you could compare it
to a normal copy of the gene.

00:08:46.000 --> 00:08:50.000
And if I did that I might
find a bunch of differences.

00:08:50.000 --> 00:08:54.000
But how would I even know where
the beta-globin gene was within this

00:08:54.000 --> 00:08:58.000
sequence? This clone contains
beta-globin. How would I even find

00:08:58.000 --> 00:09:02.000
the exons? Yes? Or
whatever? Look at codons.

00:09:02.000 --> 00:09:08.000
So, let's start looking at
the codons. This codon here?

00:09:08.000 --> 00:09:13.000
Well, or this, maybe it's
this codon here. Sorry?

00:09:13.000 --> 00:09:19.000
Find it. Do you see any
start codons here? Oh,

00:09:19.000 --> 00:09:24.000
there's an ATG there. So,
maybe that's the start codon or

00:09:24.000 --> 00:09:30.000
maybe not. How often do we
expect to find an ATG in some

00:09:30.000 --> 00:09:35.000
reading frame? You know,
it could happen fairly

00:09:35.000 --> 00:09:39.000
easily. Also, how do we
know it's going this way?

00:09:39.000 --> 00:09:44.000
Maybe we should look for
an ATG, we'll put it there,

00:09:44.000 --> 00:09:48.000
going this way. Sorry?
I drew the arrow there?

00:09:48.000 --> 00:09:53.000
Well, that's because it's where
the sequence started out on my page.

00:09:53.000 --> 00:09:57.000
It doesn't tell me my gene
runs that way. Yes? From five

00:09:57.000 --> 00:10:02.000
prime to three prime. Ah, but
it's a double-stranded piece

00:10:02.000 --> 00:10:06.000
of DNA. You see, if it's
five prime to three prime on

00:10:06.000 --> 00:10:11.000
this strand, the genome has a,
another strand that reads the other

00:10:11.000 --> 00:10:16.000
way. What did I get?
C-A-T-A, right, C-C-T,

00:10:16.000 --> 00:10:20.000
et cetera. And the gene could
be encoded on this strand.

00:10:20.000 --> 00:10:25.000
This could be the coding strand,
that could be the coding strand, and

00:10:25.000 --> 00:10:30.000
looking for a mere ATG in one of
three possible reading frames on one

00:10:30.000 --> 00:10:35.000
of two possible strands
I'll find all sorts of stuff.
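
NOTE
A small sketch of the point about three reading frames on each of two strands:
it lists every ATG in all six frames. The helper names and the example string
are illustrative; offsets on the minus strand are measured along the reverse
complement, read five prime to three prime.
```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
def reverse_complement(seq: str) -> str:
    """The other strand, read five prime to three prime."""
    return seq.translate(COMPLEMENT)[::-1]
def atg_positions(seq: str) -> dict[tuple[str, int], list[int]]:
    """Map (strand, frame) to the 0-based offsets of every in-frame ATG."""
    hits = {}
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            hits[(strand, frame)] = [
                i for i in range(frame, len(s) - 2, 3) if s[i:i + 3] == "ATG"
            ]
    return hits
print(atg_positions("AATATGCCCATTT"))
```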

00:10:35.000 --> 00:10:38.000
So, sorry? Guess.
Guess. Guess is good.

00:10:38.000 --> 00:10:42.000
Don't, don't. You
remember we talked about getting

00:10:42.000 --> 00:10:46.000
papers accepted. If you
were to write up the paper

00:10:46.000 --> 00:10:50.000
that way the reviewers would
probably ding it and say,

00:10:50.000 --> 00:10:54.000
you know, the guess isn't,
isn't good enough. So, that's

00:10:54.000 --> 00:10:58.000
actually very interesting. How
do you actually find the gene

00:10:58.000 --> 00:11:02.000
sequence? Well, it
turns out to be a

00:11:02.000 --> 00:11:06.000
non-trivial problem which often
gets glossed over in the textbooks.

00:11:06.000 --> 00:11:11.000
What you might do is if
something really were exonic,

00:11:11.000 --> 00:11:16.000
if this were an exon, does
it have any properties that you

00:11:16.000 --> 00:11:20.000
can think of? It shouldn't have
a stop codon. No stop codon.

00:11:20.000 --> 00:11:25.000
How often does a stop codon occur
at random in a given reading frame?

00:11:25.000 --> 00:11:30.000
How many stop codons are there?
Three out of 64 possible codons.

00:11:30.000 --> 00:11:33.000
There's about one in 20 codons in
any given reading frame is a stop

00:11:33.000 --> 00:11:37.000
codon. So, that means if
I read for about 20 codons,

00:11:37.000 --> 00:11:41.000
and I don't encounter a stop,
it's beginning to get more likely

00:11:41.000 --> 00:11:45.000
that that's not random. If
I read for, say, 60 codons,

00:11:45.000 --> 00:11:49.000
180 bases and I've encountered no
stop codon in that reading frame,

00:11:49.000 --> 00:11:53.000
the chance of that occurring is
about e to the minus three or so,

00:11:53.000 --> 00:11:57.000
right? Because if I went through
three characteristic lengths,

00:11:57.000 --> 00:12:01.000
e to the minus three, you know,
and I don't know, about 5% or

00:12:01.000 --> 00:12:05.000
something like that. If I
went for thousands of bases

00:12:05.000 --> 00:12:09.000
without any stop codon,
would you be impressed? That's

00:12:09.000 --> 00:12:14.000
pretty impressive. So, all
I have to do is find the

00:12:14.000 --> 00:12:18.000
few thousand bases with no
stop codon. The problem with that is

00:12:18.000 --> 00:12:23.000
that in bacteria there are some
genes that are a thousand bases long

00:12:23.000 --> 00:12:27.000
and you, there, you can
read them and they have no

00:12:27.000 --> 00:12:33.000
stop codon. What's the
problem with the human

00:12:33.000 --> 00:12:39.000
genome? Introns. It
turns out that because the

00:12:39.000 --> 00:12:46.000
coding sequences are broken up into
small exons, if I found a thousand

00:12:46.000 --> 00:12:53.000
bases with no stop codons then
it's very likely coding sequence.

00:12:53.000 --> 00:13:00.000
But a typical human exon is on
the order of 150 to 200 bases.
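
NOTE
A quick check of the stop-codon arithmetic, assuming all 64 codons are equally
likely and independent in random sequence (a simplification). With 3 stops in
64, the chance of reading n codons without a stop is (61/64)^n. The name
prob_open is made up for this sketch.
```python
P_NO_STOP = 61 / 64  # 3 of the 64 codons are stop codons
def prob_open(codons: int) -> float:
    """Chance that a random reading frame stays open for this many codons."""
    return P_NO_STOP ** codons
for n in (20, 60, 333):  # 333 codons is roughly a thousand bases
    print(f"{n} codons with no stop: probability {prob_open(n):.2e}")
```
An exon of 50 to 70 codons therefore sits in the range where an open stretch
is only mildly surprising, which is why exon length alone is a weak signal.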

00:13:00.000 --> 00:13:04.000
Very inconvenient because,
you know, it's a typical exon

00:13:04.000 --> 00:13:08.000
encodes 50, 60, 70 codons.
So, it turns out that

00:13:08.000 --> 00:13:12.000
even that is not so easy to do.
Well, the answer is it's not a

00:13:12.000 --> 00:13:17.000
trivial problem. People
do all sorts of things to

00:13:17.000 --> 00:13:21.000
figure out how to decode sequences
of genomes. You do run computer

00:13:21.000 --> 00:13:25.000
filters across there that
say, look, there are a bunch of

00:13:25.000 --> 00:13:30.000
consecutive codons
without stop codons.
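
NOTE
A minimal sketch of such a filter, assuming plain A/C/G/T input: it reports
stop-free runs of at least min_codons codons in all six reading frames. The
function name, the threshold, and the output tuple layout are illustrative
choices, not any particular gene-finding program.
```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
def open_stretches(seq: str, min_codons: int = 60):
    """Yield (strand, frame, start, length_in_codons) for stop-free runs."""
    for strand, s in (("+", seq), ("-", seq.translate(COMPLEMENT)[::-1])):
        for frame in range(3):
            run_start, run_len = frame, 0
            for i in range(frame, len(s) - 2, 3):
                if s[i:i + 3] in STOPS:
                    if run_len >= min_codons:
                        yield strand, frame, run_start, run_len
                    run_start, run_len = i + 3, 0  # restart after the stop
                else:
                    run_len += 1
            if run_len >= min_codons:
                yield strand, frame, run_start, run_len
example = "ATG" + "GCC" * 5 + "TAA"
for hit in open_stretches(example, min_codons=6):
    print(hit)
```
On a sequence this short several frames come out open, which is the same
difficulty the lecture describes for short human exons.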

00:13:30.000 --> 00:13:33.000
There tend to be little preferences,
like amongst the synonymous choices

00:13:33.000 --> 00:13:36.000
of codons, humans tend
to prefer one codon,

00:13:36.000 --> 00:13:39.000
one codon for a specific
amino acid over others. So,

00:13:39.000 --> 00:13:43.000
there are some biases as
to which codons get used.
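
NOTE
A small sketch of measuring that bias: count how often each codon appears in a
sequence already known to be in frame. The function name and the five-codon
example (all leucine codons) are made up for illustration.
```python
from collections import Counter
def codon_usage(coding_seq: str) -> Counter:
    """Count codons in an in-frame sequence; trailing partial codons are ignored."""
    usable = len(coding_seq) - len(coding_seq) % 3
    return Counter(coding_seq[i:i + 3] for i in range(0, usable, 3))
usage = codon_usage("CTGCTGCTACTGTTA")
print(usage.most_common())
```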

00:13:43.000 --> 00:13:46.000
And the computer can kind of take
a little bit of account of that.

00:13:46.000 --> 00:13:49.000
Then you can also have made a
library of cDNAs and sequence

00:13:49.000 --> 00:13:53.000
cDNA, the mRNA, which will help
you a lot and look for where they

00:13:53.000 --> 00:13:56.000
match up. Then you can take
sequences from the human

00:13:56.000 --> 00:13:59.000
and the mouse. And it
turns out that the sequences

00:13:59.000 --> 00:14:03.000
in the mouse and the sequences
in the human, if you line them up,

00:14:03.000 --> 00:14:06.000
the exons tend to match up better
than the introns because evolution

00:14:06.000 --> 00:14:09.000
cares a lot about the exons.
But it turns out this is not a

00:14:09.000 --> 00:14:13.000
trivial problem. And even
today, if I give you a

00:14:13.000 --> 00:14:16.000
random stretch of human DNA,
it's not, there is no simple

00:14:16.000 --> 00:14:20.000
computer program that,
on its own, not even a

00:14:20.000 --> 00:14:23.000
complicated computer program,
but on its own would be able,

00:14:23.000 --> 00:14:27.000
without auxiliary data, to
accurately pick out all the genes.

00:14:27.000 --> 00:14:30.000
Even for simple bacteria, we
cannot nail perfectly all the

00:14:30.000 --> 00:14:34.000
genes. Although, the lack
of these introns means that

00:14:34.000 --> 00:14:38.000
the exons tend to be pretty big,
it means the coding regions are pretty big

00:14:38.000 --> 00:14:42.000
and we can kind of do it. So,
I just wanted to point out that,

00:14:42.000 --> 00:14:46.000
that there's a lot still to be
done there. The cell manages,

00:14:46.000 --> 00:14:50.000
thank you, to read this just fine,
but we're not as smart yet as the

00:14:50.000 --> 00:14:54.000
cell, and so we're not totally
able to read out all this stuff.

00:14:54.000 --> 00:14:58.000
We'll come back to genomics in a,
in a further lecture. Yes? Yeah,

00:14:58.000 --> 00:15:01.000
wouldn't --
What a cool idea.

00:15:01.000 --> 00:15:04.000
Yes. There are actually some
experiments, which maybe if you

00:15:04.000 --> 00:15:08.000
remind me we can, I can
work it into a subsequent

00:15:08.000 --> 00:15:11.000
lecture, but people have some
experiments where they can randomly

00:15:11.000 --> 00:15:14.000
mutagenize zillions of bacteria and
determine which ones will grow and

00:15:14.000 --> 00:15:18.000
which ones won't. And they
can do it all in parallel

00:15:18.000 --> 00:15:21.000
in a single test-tube. And
thereby you can tell which,

00:15:21.000 --> 00:15:24.000
which nucleotides in the
genome matter and which don't.

00:15:24.000 --> 00:15:28.000
It's a kind of cool procedure.
OK. Anyway, I just wanted to sort

00:15:28.000 --> 00:15:31.000
of tie up that bit here. Now,
let's move on to re-sequencing

00:15:31.000 --> 00:15:35.000
a gene. So, let's
suppose we've managed to

00:15:35.000 --> 00:15:39.000
sequence, I don't know, the
human genome, the entirety of

00:15:39.000 --> 00:15:43.000
the human genome we have before
us, OK? Actually, next week in the

00:15:43.000 --> 00:15:47.000
journal Nature will appear
a paper reporting, in fact,

00:15:47.000 --> 00:15:51.000
today, yester, no, yesterday, in
fact, it was yesterday, yesterday

00:15:51.000 --> 00:15:55.000
appeared in the journal Nature
a paper reporting the finished

00:15:55.000 --> 00:15:59.000
sequence of the
human genome.

00:15:59.000 --> 00:16:03.000
http://www.nature.com/nature/journal/v431/n7011/pdf/nature03001.pdf

00:16:03.000 --> 00:16:07.000
And so, anybody
who wants to go

00:16:07.000 --> 00:16:11.000
online, in fact, we can
get, we'll get copies for the

00:16:11.000 --> 00:16:14.000
class. Why don't we get you copies
of the paper? It's not as long as

00:16:14.000 --> 00:16:17.000
the last one. But, in fact,
I didn't realize it was

00:16:17.000 --> 00:16:21.000
yesterday. I thought it was next
week. Yesterday came out the final

00:16:21.000 --> 00:16:24.000
report on the finished
sequence of the human genome,

00:16:24.000 --> 00:16:28.000
which a number of us have been
laboring on for quite a long time.

00:16:28.000 --> 00:16:31.000
And it just appeared. So,
we actually have that now.

00:16:31.000 --> 00:16:35.000
It actually, it's been on the
Web for a while, but the paper

00:16:35.000 --> 00:16:39.000
describing it took a while to
write up and it came out yesterday.

00:16:39.000 --> 00:16:43.000
So, we'll get you a copy of that.
But now you've got that whole

00:16:43.000 --> 00:16:46.000
sequence of the human genome here.
I've been, you know, I've been

00:16:46.000 --> 00:16:50.000
working on this paper with
people for so long that,

00:16:50.000 --> 00:16:54.000
you know, I hadn't actually paid
attention to the fact that it just

00:16:54.000 --> 00:16:58.000
came out. You don't want to know
how long it took to write this.

00:16:58.000 --> 00:17:01.000
The paper, actually, is unusual.
It's the only paper that Nature has

00:17:01.000 --> 00:17:05.000
ever published where the author list
is sufficiently long that we don't

00:17:05.000 --> 00:17:09.000
have it in the Journal. There's
a website that contains the

00:17:09.000 --> 00:17:13.000
author list. There are, I
believe, I don't have the final

00:17:13.000 --> 00:17:16.000
count, but something in the
neighborhood of about 200 authors to

00:17:16.000 --> 00:17:20.000
the paper. We decided that
everybody who'd worked on it should

00:17:20.000 --> 00:17:24.000
be a co-author of the paper, and
we just put it all on a website.

00:17:24.000 --> 00:17:28.000
So, anyway, I digress. So, suppose
we have the beta-globin gene here.

00:17:28.000 --> 00:17:31.000
So, I've got that in -- I've
got the normal form of the

00:17:31.000 --> 00:17:34.000
beta-globin gene, or I've
got one person's form,

00:17:34.000 --> 00:17:37.000
in any case, in the human genome
sequence. Now I want to take a

00:17:37.000 --> 00:17:41.000
patient with sickle cell anemia and
I want to re-sequence their gene.

00:17:41.000 --> 00:17:44.000
Now, remember what we said, we
would, we would make a library from

00:17:44.000 --> 00:17:47.000
that person, right? So,
we'd get that person's blood,

00:17:47.000 --> 00:17:50.000
we'd purify DNA, we'd cut it, we'd
clone it, we'd probe the library

00:17:50.000 --> 00:17:53.000
with a radioactive probe
for the beta-globin gene,

00:17:53.000 --> 00:17:56.000
we'd pull out the gene
and we'd re-sequence it.

00:17:56.000 --> 00:18:00.000
Suppose we wanted to do
that to a hundred patients.

00:18:00.000 --> 00:18:03.000
For every patient we'd get blood,
we'd make DNA, we'd clone in a

00:18:03.000 --> 00:18:07.000
plasmid, we'd make a whole plasmid
library, plate it out on filter,

00:18:07.000 --> 00:18:10.000
probe it with a radioactive probe,
pull out the clone and sequence it.

00:18:10.000 --> 00:18:14.000
Now, for any such library you
probably need to look through a

00:18:14.000 --> 00:18:18.000
couple hundred thousand
clones to find beta-globin.

00:18:18.000 --> 00:18:21.000
So, for your DNA and your
DNA and your DNA and your DNA,

00:18:21.000 --> 00:18:25.000
we're going to make libraries
of a hundred thousand clones,

00:18:25.000 --> 00:18:29.000
that's a couple, that's
a lot of plates, right?

00:18:29.000 --> 00:18:32.000
We're going to put them all
on, on nylon filters in these

00:18:32.000 --> 00:18:35.000
Seal-a-Meal bags with
these radioactive probes,

00:18:35.000 --> 00:18:38.000
and we're going to look
for your beta-globin clone,

00:18:38.000 --> 00:18:42.000
your beta-globin clone,
your beta-globin clone, your

00:18:42.000 --> 00:18:45.000
beta-globin clone,
et cetera, et cetera,

00:18:45.000 --> 00:18:48.000
et cetera. This is really boring.
Do you realize how off-putting it

00:18:48.000 --> 00:18:51.000
would be to study sickle cell
anemia if we had to do that for each

00:18:51.000 --> 00:18:55.000
successive patient,
make a whole library?

00:18:55.000 --> 00:18:58.000
But that was what you had to do in
molecular biology because that was

00:18:58.000 --> 00:19:02.000
how you got the gene.
You build a whole library,

00:19:02.000 --> 00:19:07.000
you withdraw it from the library.
However, if you wanted to do this,

00:19:07.000 --> 00:19:12.000
could you manage to get the
beta-globin sequence from your

00:19:12.000 --> 00:19:16.000
genome without having to
make the whole library?

00:19:16.000 --> 00:19:21.000
It turns out, and I know it's been
covered at least in some of the

00:19:21.000 --> 00:19:26.000
sections, there's a cool
technique to do that. And what is

00:19:26.000 --> 00:19:32.000
that technique? PCR. So,
it turns out that the next

00:19:32.000 --> 00:19:39.000
really great advance in molecular
biology was the technique of PCR.

00:19:39.000 --> 00:19:47.000
And what PCR is, is
a way to obtain a piece of DNA

00:19:47.000 --> 00:19:54.000
corresponding to an already known
gene, you have to already know the

00:19:54.000 --> 00:20:01.000
gene, and what it allows you to
do is then obtain that piece of DNA

00:20:01.000 --> 00:20:09.000
based on knowing at least
some of its sequence.

00:20:09.000 --> 00:20:14.000
It allows you to amplify just that
DNA from a, from any individual.

00:20:14.000 --> 00:20:20.000
So, as compared to the experiment
where I make a library for you and a

00:20:20.000 --> 00:20:25.000
library for you and a library
from you and a library from you,

00:20:25.000 --> 00:20:31.000
each of which could take a month,
PCR would allow us to do it in

00:20:31.000 --> 00:20:36.000
principle in five
minutes. And, actually,

00:20:36.000 --> 00:20:41.000
there are machines that would
let you do it in five minutes.

00:20:41.000 --> 00:20:46.000
So, let's discuss how this PCR
works. Nobody uses the five minute

00:20:46.000 --> 00:20:52.000
machines because you usually
will then wait an hour or so,

00:20:52.000 --> 00:20:57.000
but anyway. Suppose I take
my DNA sequence here from

00:20:57.000 --> 00:21:04.000
the human genome. Five
prime to three prime.

00:21:04.000 --> 00:21:14.000
Five prime to three prime.
This sequence here beta-globin.

00:21:14.000 --> 00:21:23.000
I want to obtain that sequence.
The first thing I do is I'm going to

00:21:23.000 --> 00:21:33.000
heat my DNA sample to maybe
97 degrees Celsius to denature.

00:21:33.000 --> 00:21:42.000
Denaturing means, of
course, breaking the hydrogen

00:21:42.000 --> 00:21:51.000
bonds that hold the two strands together
so that the strands come apart,

00:21:51.000 --> 00:22:00.000
five prime to three prime,
five prime to three prime.

00:22:00.000 --> 00:22:09.000
Now, what I then do is I take a
specific DNA primer matching this

00:22:09.000 --> 00:22:19.000
stretch just before the
beta-globin gene starts.

00:22:19.000 --> 00:22:22.000
Or just before where I'm interested
in. How do I make a primer that

00:22:22.000 --> 00:22:26.000
matches just that sequence? I
order, well, how do I know what

00:22:26.000 --> 00:22:30.000
to order? I know
the sequence, right?

00:22:30.000 --> 00:22:34.000
I've got the sequence already. I
just look at it and I say I want

00:22:34.000 --> 00:22:38.000
that sequence. And
then how do I get it?

00:22:38.000 --> 00:22:42.000
I order it. I type it into the Web
and the machine will synthesize me

00:22:42.000 --> 00:22:47.000
this, this primer. Typically
a 20-base stretch will

00:22:47.000 --> 00:22:51.000
suffice. So, I'll get me a
twentymer, a 20 base oligonucleotide

00:22:51.000 --> 00:22:55.000
complementary to the sequence
on this side of the gene.

00:22:55.000 --> 00:23:00.000
What I'm also going to do
is the same thing over here.

00:23:00.000 --> 00:23:06.000
I'm going to get a second primer.
This is primer number one. This is

00:23:06.000 --> 00:23:12.000
primer number two.
OK? Now, let's see.

00:23:12.000 --> 00:23:19.000
Five prime. This is five prime,
five prime. Now what I'd like to do

00:23:19.000 --> 00:23:25.000
is add polymerase,
I'd like to add dNTPs.

00:23:25.000 --> 00:23:32.000
So, plus DNA
polymerase plus dNTPs.

00:23:32.000 --> 00:23:36.000
And what will happen?
Polymerase will come along and

00:23:36.000 --> 00:23:41.000
start copying my DNA, but
it will only copy it starting

00:23:41.000 --> 00:23:45.000
from the primers. Now,
this will keep going,

00:23:45.000 --> 00:23:50.000
of course, but DNA
polymerase doesn't go forever,

00:23:50.000 --> 00:23:55.000
you know, the reactions
sort of stops at some point.

00:23:55.000 --> 00:23:59.000
And so you'll get a strand
going off here and a strand

00:23:59.000 --> 00:24:04.000
going off there. Now,
notice what I've done.

00:24:04.000 --> 00:24:10.000
I started with an entire human
genome, and the number of copies of

00:24:10.000 --> 00:24:15.000
beta-globin was one per genome.
When I'm done with this process,

00:24:15.000 --> 00:24:21.000
how many copies, how many
double-stranded copies of

00:24:21.000 --> 00:24:27.000
beta-globin do I have? Two.
That's still very little,

00:24:27.000 --> 00:24:33.000
but it's more than I had
before. So, what do I do next?

00:24:33.000 --> 00:24:41.000
Repeat. So, let's heat up that
sample again. We'll denature at 97

00:24:41.000 --> 00:24:48.000
degrees, and now we have our initial
strand here, we have our strand that

00:24:48.000 --> 00:24:56.000
came off this primer that runs
to here and maybe goes forward,

00:24:56.000 --> 00:25:05.000
we have this strand here.
We have this strand here.

00:25:05.000 --> 00:25:15.000
And this was five prime, five
prime, five prime, five prime.

00:25:15.000 --> 00:25:25.000
Now what do we do? We repeat.
We'll take our primer, this is

00:25:25.000 --> 00:25:33.000
primer number one, let's
see. It matches over there.

00:25:33.000 --> 00:25:41.000
Primer number two over here.
Number one over here. Number two.

00:25:41.000 --> 00:25:48.000
Have I got this right? Yes. Good.
Then where does this guy stop?

00:25:48.000 --> 00:25:56.000
Right at the end where my
other primer was. This guy

00:25:56.000 --> 00:26:02.000
runs along here. That guy
stops right at the end.

00:26:02.000 --> 00:26:07.000
That guy might go a little further.
How many copies of the beta-globin

00:26:07.000 --> 00:26:12.000
gene do I have now?
Four. Two of which,

00:26:12.000 --> 00:26:17.000
by the way, perfectly sit between
my pink primers. What's going to

00:26:17.000 --> 00:26:22.000
happen if I do this again?
How many copies will I get?

00:26:22.000 --> 00:26:27.000
Eight, six of which will sit
perfectly and two might be a little

00:26:27.000 --> 00:26:32.000
ragged as to where
they go. So, initially,

00:26:32.000 --> 00:26:38.000
after cycle number zero,
that is initial conditions,

00:26:38.000 --> 00:26:44.000
the number of copies relative
to the genome was one.

00:26:44.000 --> 00:26:50.000
After one cycle it's two.
After two cycles it's four.

00:26:50.000 --> 00:26:56.000
After N cycles it's two to
the N copies. Is that clear

00:26:56.000 --> 00:27:01.000
how the PCR works? And
that on every round you're

00:27:01.000 --> 00:27:07.000
doubling. And, with the
exception of those two

00:27:07.000 --> 00:27:13.000
white things that go off to the
side, they're going back and forth and

00:27:13.000 --> 00:27:18.000
back and forth between the two
primers you chose to put in.

00:27:18.000 --> 00:27:24.000
When N equals ten, what
do you get? A thousand copies.

00:27:24.000 --> 00:27:30.000
What happens when N equals
20? A million copies.
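
NOTE
The doubling arithmetic, assuming every cycle copies every template perfectly
(real reactions fall short of this). The function name is illustrative. The
exact powers of two are what the lecture rounds to a thousand and a million.
```python
def copies_after(cycles: int, starting_copies: int = 1) -> int:
    """Idealized copy number after the given number of perfect doublings."""
    return starting_copies * 2 ** cycles
for n in (0, 1, 2, 10, 20):
    print(f"after {n} cycles: {copies_after(n):,} copies")
```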

00:27:30.000 --> 00:27:34.000
What is the copy number
of beta-globin? Beta,

00:27:34.000 --> 00:27:38.000
let's suppose beta-globin,
for the sake of the argument,

00:27:38.000 --> 00:27:42.000
is about one thousand bases.

00:27:42.000 --> 00:27:46.000
What fraction of the human
genome does beta-globin represent?

00:27:46.000 --> 00:27:50.000
Yeah, about a millionth of
the genome. No, actually,

00:27:50.000 --> 00:27:54.000
one three-millionth, but
we'll call it a millionth.

00:27:54.000 --> 00:27:58.000
So, after I've made a million-fold
amplification of beta-globin,

00:27:58.000 --> 00:28:03.000
beta-globin now represents half
of the stuff that's in the tube.

00:28:03.000 --> 00:28:08.000
What would happen if I go another
ten rounds? How many copies do I

00:28:08.000 --> 00:28:13.000
have? A billion copies. So,
in other words, I started with

00:28:13.000 --> 00:28:18.000
something that was only present at
about one one-millionth of what was

00:28:18.000 --> 00:28:23.000
in my test tube. If I could
make a billion copies of

00:28:23.000 --> 00:28:28.000
that specific molecule, now it
so dominates the mixture that

00:28:28.000 --> 00:28:33.000
it is a thousand times more
abundant than the rest of the genome.
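
NOTE
Editor's sketch, not part of the spoken lecture. The enrichment argument above
can be written as simple arithmetic. Assumptions: the target starts at the
lecture's rounded figure of one millionth of the DNA, each cycle doubles only
the target, and the rest of the genome is not amplified. The function names
are hypothetical.
```python
def target_to_background_ratio(cycles: int, start_fraction: float = 1e-6) -> float:
    """Amount of target DNA relative to the unamplified rest of the genome."""
    target = start_fraction * 2 ** cycles
    background = 1.0 - start_fraction  # assumed constant: no primers match it
    return target / background
def target_share(cycles: int, start_fraction: float = 1e-6) -> float:
    """Fraction of all DNA in the tube that is target after the given cycles."""
    ratio = target_to_background_ratio(cycles, start_fraction)
    return ratio / (1.0 + ratio)
for n in (0, 10, 20, 30):
    print(f"{n:2d} cycles: ratio {target_to_background_ratio(n):.6g}, share {target_share(n):.4f}")
```
Under these assumptions the target is roughly half of the tube after 20 cycles
and roughly a thousand times more abundant than everything else after 30.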

00:28:33.000 --> 00:28:39.000
It works. That's the
remarkable thing, this works.

00:28:39.000 --> 00:28:45.000
Any questions about the technique?
Now, yes? Well, I need two primers

00:28:45.000 --> 00:28:52.000
in their sequence. How many
copies do I need of each

00:28:52.000 --> 00:28:58.000
of those primers? Well, I,
I obviously need a lot of

00:28:58.000 --> 00:29:03.000
copies of those primers.
So, primer number one,

00:29:03.000 --> 00:29:06.000
it's a single sequence, but
when I order it from the company,

00:29:06.000 --> 00:29:10.000
I'm going to order me a boat
load, a lot of that primer. So, I'm

00:29:10.000 --> 00:29:13.000
adding, I better add a billion
molecules of that primer because I'm

00:29:13.000 --> 00:29:16.000
going to make a billion copies
starting from such primers.

00:29:16.000 --> 00:29:20.000
But if I have a billion copies of,
of number one and a billion copies

00:29:20.000 --> 00:29:23.000
of number two and,
you know, these days,

00:29:23.000 --> 00:29:26.000
billions aren't such a big deal. You
know, molecules are Avogadro's

00:29:26.000 --> 00:29:30.000
number and all that. It's
not hard to get things.

00:29:30.000 --> 00:29:34.000
So, you throw in huge excess, a
massive excess of primer number

00:29:34.000 --> 00:29:38.000
one, a massive excess of primer
number two, and you just do this.

00:29:38.000 --> 00:29:42.000
Now, I mean, what does it cost
to make such a massive excess of a

00:29:42.000 --> 00:29:46.000
primer? It's about ten cents
a base, so it's two bucks,

00:29:46.000 --> 00:29:50.000
two bucks per primer give or take.
You know, so I can get you a better

00:29:50.000 --> 00:29:54.000
price if you want,
but, you know, anyway.

00:29:54.000 --> 00:29:58.000
It's not a bad price to, to
buy primer. So, you can just go

00:29:58.000 --> 00:30:01.000
out and order a pair of primers.
You can have them tomorrow.

00:30:01.000 --> 00:30:05.000
And then all you have to do
is add the primers to the,

00:30:05.000 --> 00:30:09.000
so I take DNA. Do I,
I, I need DNA from you.

00:30:09.000 --> 00:30:13.000
It turns out I could draw your
blood and purify DNA and all that.

00:30:13.000 --> 00:30:16.000
But it turns out that if all I
wanted to do was amplify one locus,

00:30:16.000 --> 00:30:20.000
I could actually take a Popsicle
stick and ask you to scrape the

00:30:20.000 --> 00:30:24.000
inside of your cheek. That'll
get enough cells off from

00:30:24.000 --> 00:30:28.000
the inside of your cheek, stick
it in a test-tube, and it'll

00:30:28.000 --> 00:30:31.000
actually have enough DNA there.
It turns out this is a very

00:30:31.000 --> 00:30:35.000
sensitive and powerful technique,
so, but before we get to that notice

00:30:35.000 --> 00:30:39.000
what we had to do. We
had to heat our DNA to 97

00:30:39.000 --> 00:30:43.000
degrees and add polymerase.
Then we heat again to 97, add

00:30:43.000 --> 00:30:46.000
polymerase, heat again to 97,
add polymerase. Why do I have to

00:30:46.000 --> 00:30:50.000
keep adding polymerase? Because
polymerase gets ruined at

00:30:50.000 --> 00:30:54.000
97 degrees so it's denatured.
So, the nuisance about PCR is I

00:30:54.000 --> 00:30:57.000
have to go to my Eppendorf
plastic tube, pop open the lid,

00:30:57.000 --> 00:31:01.000
stick in some DNA polymerase,
close it up, stick it back in a

00:31:01.000 --> 00:31:05.000
heating bath, let it go for a while,
take it out, pop it open, add some

00:31:05.000 --> 00:31:09.000
more polymerase, put it
back in the heating bath,

00:31:09.000 --> 00:31:13.000
pop it out. And this
is actually the way

00:31:13.000 --> 00:31:17.000
primitive scientists did PCR not
so long ago, OK? Wouldn't it be cool

00:31:17.000 --> 00:31:22.000
if we could engineer a DNA
polymerase that didn't denature at

00:31:22.000 --> 00:31:26.000
97 degrees? Because then what we
could do is just add the polymerase,

00:31:26.000 --> 00:31:31.000
close up the tube, put it
in a machine that goes heat,

00:31:31.000 --> 00:31:35.000
cool, heat, cool, heat, cool,
heat, cool, but you would have,

00:31:35.000 --> 00:31:40.000
so how do we, what kind of clever
biological engineering do we use to

00:31:40.000 --> 00:31:45.000
modify polymerase so it
won't denature at 97 degrees?

00:31:45.000 --> 00:31:49.000
Yes? Get it from a bacteria.
What kind of a bacteria would you

00:31:49.000 --> 00:31:53.000
ask for an enzyme that could work
in, in basically boiling water?

00:31:53.000 --> 00:31:57.000
Bacteria that basically live
in boiling water. Where would

00:31:57.000 --> 00:32:03.000
you look for such?
Thermal vents.

00:32:03.000 --> 00:32:09.000
Geysers, things
like that. Life lives

00:32:09.000 --> 00:32:15.000
everywhere. What you do is
you find yourself a bacterium,

00:32:15.000 --> 00:32:21.000
so you find bacteria that live in
geysers or in thermal vents and you

00:32:21.000 --> 00:32:27.000
purify their DNA polymerase. The
most famous one comes from the

00:32:27.000 --> 00:32:34.000
organism, the bacterium called
Thermus aquaticus, aquaticus.

00:32:34.000 --> 00:32:38.000
Which of course means hot water,
right? That's what the bacteria is

00:32:38.000 --> 00:32:43.000
called, Thermus aquaticus. And
its enzyme is called

00:32:43.000 --> 00:32:47.000
Taq, Taq. So, we'll
refer to it often,

00:32:47.000 --> 00:32:52.000
Taq polymerase, meaning from
this bacterium, Thermus aquaticus,

00:32:52.000 --> 00:32:57.000
OK? So, that's Taq. So, it turns
out that you can do this now without

00:32:57.000 --> 00:33:02.000
having to open and
close the test-tubes.

00:33:02.000 --> 00:33:11.000
Oops, I meant to put that here.
How sensitive is PCR? It's very

00:33:11.000 --> 00:33:20.000
sensitive. You could do,
so applications of PCR. Very

00:33:20.000 --> 00:33:30.000
versatile. First let's
just re-sequence a gene.

00:33:30.000 --> 00:33:36.000
Gene from yeast or from human.
You just need, you know, any DNA

00:33:36.000 --> 00:33:42.000
sample. Get my gene, get
my primers, and as I was

00:33:42.000 --> 00:33:48.000
indicating with a Popsicle stick,
I don't have to have it very pure,

00:33:48.000 --> 00:33:54.000
although in a laboratory you go
to the trouble of making it pure

00:33:54.000 --> 00:34:00.000
because you want it to
be pure and all that. Yes?

00:34:00.000 --> 00:34:06.000
Correct. Yeah. So,
remember I was making a fuss

00:34:06.000 --> 00:34:12.000
over the accuracy of replication,
right? And I said that on its own a

00:34:12.000 --> 00:34:19.000
polymerase might have an error rate
of about ten to the minus five.

00:34:19.000 --> 00:34:25.000
So, now, what were
the two mechanisms for,

00:34:25.000 --> 00:34:32.000
for repairing DNA,
for proofreading DNA?

00:34:32.000 --> 00:34:35.000
One was a built-in proofreading
activity that the enzyme had.

00:34:35.000 --> 00:34:38.000
The enzyme would put in
a base, would check the base,

00:34:38.000 --> 00:34:42.000
and that actually helped by
an order of magnitude or two.

00:34:42.000 --> 00:34:45.000
And some of these polymerases
have a proofreading activity.

00:34:45.000 --> 00:34:49.000
But then we also discussed the
mismatch repair activity that would

00:34:49.000 --> 00:34:52.000
later come along and detect
mismatches. You're absolutely right,

00:34:52.000 --> 00:34:56.000
PCR is not as accurate as cells
because it doesn't have that

00:34:56.000 --> 00:35:00.000
mismatch repair activity. So,
when you take a PCR product,

00:35:00.000 --> 00:35:05.000
if I were to clone, so if I
were to take all the PCR product,

00:35:05.000 --> 00:35:10.000
say, from my beta-globin gene,
so I'm going to take my test-tube,

00:35:10.000 --> 00:35:15.000
I'm going to add my primers and
everything, I'm going to PCR,

00:35:15.000 --> 00:35:20.000
I'm going to PCR, and then I'm
going to get a lot of copies of

00:35:20.000 --> 00:35:25.000
beta-globin. If I were to take
that beta-globin and just directly

00:35:25.000 --> 00:35:30.000
sequence the DNA in the test-tube.
Here's my pieces of beta-globin.

00:35:30.000 --> 00:35:34.000
I can now sequence it by adding
a primer and doing my fluorescent

00:35:34.000 --> 00:35:39.000
sequencing and running it
on a sequencer and all that.

00:35:39.000 --> 00:35:44.000
Sorry, going the other way.
I'll run a sequencing reaction.

00:35:44.000 --> 00:35:48.000
I could actually do it, and what I
do it on is the whole population of

00:35:48.000 --> 00:35:53.000
a million or a billion molecules.
If any one of them is wrong it's

00:35:53.000 --> 00:35:58.000
going to be swamped out by
others, OK? Because I could do my

00:35:58.000 --> 00:36:03.000
sequencing reaction on
the whole PCR product.

00:36:03.000 --> 00:36:07.000
And random mistakes in one molecule
or the other will still be a tiny

00:36:07.000 --> 00:36:12.000
minority of the votes at any given
base, right? But suppose I were to

00:36:12.000 --> 00:36:17.000
take my PCR product, all
these amplified molecules here,

00:36:17.000 --> 00:36:22.000
and suppose I were to clone them
individually and I were to sequence

00:36:22.000 --> 00:36:27.000
each of those individual
clones instead of sequencing a,

00:36:27.000 --> 00:36:32.000
a mixture of all the
products. I would, in fact,

00:36:32.000 --> 00:36:36.000
see a higher mutation rate.
And you're absolutely right.
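
NOTE
Editor's sketch, not part of the spoken lecture. The "votes at any given base"
argument above can be illustrated with a toy simulation: sequencing the whole
PCR product behaves like a majority vote per position, while individual clones
carry their own errors. Assumptions: errors are independent per base, the
error rate is deliberately exaggerated so a small simulation shows them, and
error propagation across cycles is ignored. All names are hypothetical.
```python
import random
from collections import Counter
BASES = "ACGT"
def copy_with_errors(template: str, error_rate: float, rng: random.Random) -> str:
    """Copy the template, miscopying each base with probability error_rate."""
    out = []
    for base in template:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)
def consensus(molecules: list[str]) -> str:
    """Most common base at each position across the whole population."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*molecules))
rng = random.Random(0)  # fixed seed so the sketch is repeatable
template = "".join(rng.choice(BASES) for _ in range(50))
# 0.01 per base is far higher than a real polymerase; chosen only for visibility.
population = [copy_with_errors(template, 0.01, rng) for _ in range(1000)]
wrong_clones = sum(molecule != template for molecule in population)
print("consensus matches template:", consensus(population) == template)
print("molecules with at least one error:", wrong_clones, "of", len(population))
```
The pooled consensus recovers the template even though many single molecules
differ from it, which is why cloned PCR products have to be checked one by one.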

00:36:36.000 --> 00:36:41.000
When people clone PCR products they
have to check them afterwards and

00:36:41.000 --> 00:36:46.000
throw out the ones that are wrong,
OK? Absolutely right. Good, good,

00:36:46.000 --> 00:36:50.000
good. So, you guys are, you know,
right on top of the important issues

00:36:50.000 --> 00:36:55.000
about, about DNA. So, so I
can, I can take a gene and

00:36:55.000 --> 00:37:00.000
I can re-sequence it. I
can also do things like take

00:37:00.000 --> 00:37:05.000
blood and look for the
presence of a virus.

00:37:05.000 --> 00:37:11.000
So, I could re-sequence beta-globin
and study people and see who's got

00:37:11.000 --> 00:37:18.000
sickle cell anemia and all that.
I could take blood and I might want

00:37:18.000 --> 00:37:24.000
to say do I see the HIV virus
present in someone's blood?

00:37:24.000 --> 00:37:31.000
For example, HIV testing can be
done by making PCR primers for the

00:37:31.000 --> 00:37:37.000
sequence of the HIV
virus. It has a genome.

00:37:37.000 --> 00:37:41.000
Taking a human's blood sample and
PCR-ing it. If you get a positive

00:37:41.000 --> 00:37:45.000
PCR product, a PCR product that is
made by these two primers and if,

00:37:45.000 --> 00:37:49.000
for example, you checked that it,
that it gives you the HIV sequence

00:37:49.000 --> 00:37:53.000
then you know that that blood sample
has, that person has the HIV virus.

00:37:53.000 --> 00:37:57.000
This is a way to do this. The
PCR reaction itself is fast.

00:37:57.000 --> 00:38:01.000
Typically takes hours. In
fact, can be forced to go much

00:38:01.000 --> 00:38:04.000
more quickly by machines
that rapidly thermocycle.

00:38:04.000 --> 00:38:08.000
And you can actually PCR in five
minutes, although people don't do it

00:38:08.000 --> 00:38:11.000
very often, but if you put a
thin glass capillary and go heat,

00:38:11.000 --> 00:38:14.000
cold, heat, cold very, very
quickly, there's a machine from Idaho

00:38:14.000 --> 00:38:18.000
Technologies that can do it in five
minutes, but it's usually not worth the

00:38:18.000 --> 00:38:21.000
trouble. And you just put it in and,
you know, in a couple hours you'll

00:38:21.000 --> 00:38:24.000
get an answer there as to
whether or not somebody has HIV,

00:38:24.000 --> 00:38:28.000
for example. So, you can do
that to detect relatively low

00:38:28.000 --> 00:38:33.000
quantities of virus.
How low can you go?

00:38:33.000 --> 00:38:39.000
Well, it turns out, what's
the limit? What's the

00:38:39.000 --> 00:38:45.000
smallest number of molecules you
might be able to detect in a sample?

00:38:45.000 --> 00:38:51.000
Theoretically. One. You can't
detect fewer than one molecule,

00:38:51.000 --> 00:38:57.000
right? So, one might be the limit.
So, how could I arrange to have a

00:38:57.000 --> 00:39:01.000
single molecule in a test-tube?
I would like to have a test-tube

00:39:01.000 --> 00:39:04.000
that has exactly one copy
of the beta-globin gene.

00:39:04.000 --> 00:39:08.000
What's
the best way to get exactly

00:39:08.000 --> 00:39:11.000
one copy of beta-globin
and put it in the test-tube?

00:39:11.000 --> 00:39:14.000
Sorry? You can't.
Why? Just one molecule.

00:39:14.000 --> 00:39:18.000
I want to get exactly one
copy of beta-globin. I could,

00:39:18.000 --> 00:39:21.000
I could just take total DNA
and dilute it so, on average,

00:39:21.000 --> 00:39:24.000
there's only one copy. Or,
actually, is there any way to,

00:39:24.000 --> 00:39:28.000
I mean can I, I'd just like to
buy a package that contains exactly

00:39:28.000 --> 00:39:32.000
one beta-globin. Sorry?
Bind it to something big.
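
NOTE
Editor's sketch, not part of the spoken lecture. Diluting total DNA so that
there is one copy "on average" does not give exactly one copy per tube. If the
number of copies per tube is assumed to follow a Poisson distribution (an
assumption of this note, not a statement from the lecture), the chance of
each count can be computed directly. The function name is hypothetical.
```python
import math
def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k molecules in a tube when the average is mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)
# With an average of one copy per tube, list the chance of 0, 1, 2, 3 copies.
for k in range(4):
    print(f"P({k} copies) = {poisson_pmf(k, 1.0):.3f}")
```
With a mean of one, only about 37 percent of tubes hold exactly one copy and
about as many hold none, which is why a biological package with a guaranteed
single copy is the more attractive route.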

00:39:32.000 --> 00:39:36.000
Let's think biologically. Does
biology package up a single copy of

00:39:36.000 --> 00:39:41.000
beta-globin? Sorry?
Gametes. How about a sperm?

00:39:41.000 --> 00:39:46.000
Let's grab a sperm by its tail
here, put it in the test-tube.

00:39:46.000 --> 00:39:50.000
It's one copy of beta-globin.
So, you can actually take cell

00:39:50.000 --> 00:39:55.000
sorters and have them sort
sperm into individual test-tubes.

00:39:55.000 --> 00:40:00.000
You now know there's
one copy of beta-globin.

00:40:00.000 --> 00:40:05.000
Heat it up, it will crack open
the sperm, add your primers,

00:40:05.000 --> 00:40:10.000
you can amplify beta-globin, it's
a single copy. That proves its

00:40:10.000 --> 00:40:16.000
extraordinary sensitivity. You
can do it with a single sperm.

00:40:16.000 --> 00:40:21.000
You can do it with a single
egg also, but harder to come by.

00:40:21.000 --> 00:40:27.000
So, with that level of sensitivity,
you could do the following. So,

00:40:27.000 --> 00:40:32.000
single sperm typing. Now,
single sperm typing is cool but

00:40:32.000 --> 00:40:36.000
sort of useless. What are
you going to do with it,

00:40:36.000 --> 00:40:41.000
right? But here's another thing
you could do. Embryo typing.

00:40:41.000 --> 00:40:45.000
Suppose someone has a genetic
disease in their family,

00:40:45.000 --> 00:40:50.000
maybe it's Huntington's disease.
And suppose that the individual with

00:40:50.000 --> 00:40:54.000
Huntington's disease wants to
have kids. Or the individual,

00:40:54.000 --> 00:40:59.000
sorry, the individual who is at risk
for Huntington's disease or breast

00:40:59.000 --> 00:41:04.000
cancer or whatever
wants to have kids.

00:41:04.000 --> 00:41:13.000
What you can do is with an in vitro
fertilization clinic you're able to

00:41:13.000 --> 00:41:23.000
obtain eggs, fertilize eggs in vitro,
and grow them up in a Petri plate to

00:41:23.000 --> 00:41:33.000
8 or 16 cell stage before
re-implanting embryos

00:41:33.000 --> 00:41:41.000
back in the mother. Wouldn't
it be cool if we could

00:41:41.000 --> 00:41:48.000
choose to only re-implant an
embryo that did not have the genetic

00:41:48.000 --> 00:41:56.000
disease? How are we going
to do that? PCR. How are we,

00:41:56.000 --> 00:42:02.000
so what do we do?
We take the embryo.

00:42:02.000 --> 00:42:06.000
We make DNA from the embryo. We
do PCR and we say, ah-ha, this

00:42:06.000 --> 00:42:10.000
embryo did not have the genetic
disease. Problem is it has killed

00:42:10.000 --> 00:42:15.000
the, the cells there,
right, it killed the embryo.

00:42:15.000 --> 00:42:19.000
Any ideas? Pull off one cell.
Remove a single cell. It turns out

00:42:19.000 --> 00:42:24.000
that at that stage the cells
are not differentiated.

00:42:24.000 --> 00:42:28.000
If I remove one cell from an
embryo at that very early stage,

00:42:28.000 --> 00:42:33.000
the other cells will make a
perfectly happy, healthy baby.

00:42:33.000 --> 00:42:37.000
That cell is not necessary.
This single cell sensitivity is

00:42:37.000 --> 00:42:41.000
very valuable because I can actually
do single cell genotyping on in

00:42:41.000 --> 00:42:45.000
vitro fertilized embryos and
be able to offer parents a chance,

00:42:45.000 --> 00:42:49.000
the opportunity to re-implant only
those embryos that do not have the

00:42:49.000 --> 00:42:53.000
genetic defect. That's
cool. That's really cool.

00:42:53.000 --> 00:42:57.000
There are other things you might
be able to do. If you're treating a

00:42:57.000 --> 00:43:01.000
patient with cancer, a
patient, a cancer patient and

00:43:01.000 --> 00:43:05.000
you've given chemotherapy you want
to know have I managed to eradicate

00:43:05.000 --> 00:43:11.000
the cancer cells? And six
months later have any of the

00:43:11.000 --> 00:43:19.000
cancer cells come back?
I could look for very low

00:43:19.000 --> 00:43:27.000
quantities of cancer cells. I
can, I can do surveillance for

00:43:27.000 --> 00:43:35.000
low quantities of cancer
cells following chemotherapy.

00:43:35.000 --> 00:43:39.000
And, of course, I
can also do forensics.

00:43:39.000 --> 00:43:44.000
I could take a small sample of
blood from the scene of a crime or

00:43:44.000 --> 00:43:49.000
saliva from the back of an
envelope that someone has licked,

00:43:49.000 --> 00:43:53.000
and I could do PCR and look for
genetic variations that distinguish

00:43:53.000 --> 00:43:58.000
people. And, presumably, you
see all that stuff on television

00:43:58.000 --> 00:44:04.000
all the time. So, that's
what PCR is good for.

00:44:04.000 --> 00:44:10.000
It's good. All right. Last topic,
very brief topic, but I do want to

00:44:10.000 --> 00:44:17.000
mention it. This is being able to
analyze a gene by directed mutagenesis.

00:44:17.000 --> 00:44:23.000
And I won't go through the details
of all this, but I just want to at

00:44:23.000 --> 00:44:30.000
least basically
describe the concept.

00:44:30.000 --> 00:44:36.000
I could take any piece of DNA,
say from Drosophila, and I can

00:44:36.000 --> 00:44:42.000
mutate the DNA in vitro. I can
change this base from a G to

00:44:42.000 --> 00:44:48.000
a C. There's a right,
there's a proper protocol and

00:44:48.000 --> 00:44:54.000
cooking trick for doing that. It
involves putting a certain oligo

00:44:54.000 --> 00:45:00.000
over it and extending, and
it doesn't matter exactly how.

00:45:00.000 --> 00:45:04.000
I could insert an extra gene
into that. I could use a little

00:45:04.000 --> 00:45:08.000
restriction enzyme to open
it up and stuff something in.

00:45:08.000 --> 00:45:13.000
I could delete something from this.
Maybe I'll use a restriction enzyme

00:45:13.000 --> 00:45:17.000
to cut it open, et
cetera. Basically I could,

00:45:17.000 --> 00:45:22.000
I can fuse genes together. I can
do whatever kind of construction of

00:45:22.000 --> 00:45:26.000
pieces of DNA and modifications
of pieces of DNA that I would

00:45:26.000 --> 00:45:31.000
like to do in vitro. I can
then take that mutated gene,

00:45:31.000 --> 00:45:36.000
let's say the gene is an
enzyme, encodes an enzyme,

00:45:36.000 --> 00:45:41.000
and the enzyme has an active site.
I could change the code for the

00:45:41.000 --> 00:45:46.000
amino acid right at the active site
to see if that amino acid really

00:45:46.000 --> 00:45:52.000
matters or not. I can
do any of those things.

00:45:52.000 --> 00:45:57.000
And I can put this back in an
organism. Remember that you,

00:45:57.000 --> 00:46:02.000
I said you could transform
DNA back into bacteria?

00:46:02.000 --> 00:46:07.000
Well, you can also do such
things as simply inject DNA into

00:46:07.000 --> 00:46:11.000
a fertilized egg. In fact,
at the stage where there's

00:46:11.000 --> 00:46:15.000
a male and a female pronucleus
that haven't fused yet right after

00:46:15.000 --> 00:46:19.000
fertilization. You can
take your little pipette

00:46:19.000 --> 00:46:22.000
and a needle and you can inject some
of the DNA you want into the male

00:46:22.000 --> 00:46:26.000
pronucleus, and then when the male
pronucleus and the female pronucleus

00:46:26.000 --> 00:46:30.000
fuse and the embryo grows
it will have your DNA.

00:46:30.000 --> 00:46:36.000
You can make mice that carry
whatever gene you've modified like

00:46:36.000 --> 00:46:42.000
this. You can also not, you,
you can also not just modify a

00:46:42.000 --> 00:46:49.000
piece of DNA and add,
this is gene addition,

00:46:49.000 --> 00:46:55.000
you can also do gene subtraction.
You can do gene subtraction and,

00:46:55.000 --> 00:47:01.000
again, I won't worry about
the details here, by taking

00:47:01.000 --> 00:47:07.000
embryonic stem cells. Much
in the news these days,

00:47:07.000 --> 00:47:11.000
and we may come back to them.
And in vitro, working with

00:47:11.000 --> 00:47:15.000
embryonic stem cells, to
transform a piece of DNA that has

00:47:15.000 --> 00:47:19.000
been arranged to recombine into the
gene of interest and knock it out.

00:47:19.000 --> 00:47:24.000
So, if you build, if you build a
piece of DNA in vitro and you put it

00:47:24.000 --> 00:47:28.000
into a whole bunch of embryonic
stem cells you can select,

00:47:28.000 --> 00:47:32.000
by various clever techniques,
for those embryonic stem cells that

00:47:32.000 --> 00:47:38.000
have taken up your gene. And
not just taken it up but slammed

00:47:38.000 --> 00:47:45.000
it into the normal locus in
place of the normal locus.

00:47:45.000 --> 00:47:52.000
And that way you can knock out
a gene. You can do gene knockout.

00:47:52.000 --> 00:47:59.000
So, the basic point of this now,
to summarize these many lectures is

00:47:59.000 --> 00:48:06.000
we're now at the point where
this picture that we saw at the,

00:48:06.000 --> 00:48:13.000
at the beginning, function, gene,
protein, that we understood now

00:48:13.000 --> 00:48:20.000
first as a methodology,
genetics, biochemistry.

00:48:20.000 --> 00:48:26.000
And then we understood how genes
encode proteins through molecular

00:48:26.000 --> 00:48:32.000
biology. These tools of
recombinant DNA allow us to move

00:48:32.000 --> 00:48:37.000
in any direction. You want
to find the gene underlying

00:48:37.000 --> 00:48:41.000
a function, find the gene
for Huntington's disease?

00:48:41.000 --> 00:48:45.000
We could do it. Clone it
based solely on its linkage.

00:48:45.000 --> 00:48:49.000
You want to find the gene encoding
a protein? If I know its amino acid

00:48:49.000 --> 00:48:53.000
sequence, I can find the DNA
sequence that corresponds to it.

00:48:53.000 --> 00:48:57.000
If I want to find what a certain
protein does, its function,

00:48:57.000 --> 00:49:01.000
I could get the gene for that
protein. I could knock out the gene

00:49:01.000 --> 00:49:05.000
for that protein and
see what its function is.

00:49:05.000 --> 00:49:08.000
Suddenly, for the
mathematicians amongst the group,

00:49:08.000 --> 00:49:12.000
this becomes a commutative diagram,
which you can chase around in any

00:49:12.000 --> 00:49:15.000
direction. That is, in a
sense, what the 20th century

00:49:15.000 --> 00:49:19.000
was about, was intellectually these
two disciplines merging through

00:49:19.000 --> 00:49:22.000
molecular biology and then
recombinant DNA giving you all the

00:49:22.000 --> 00:49:26.000
tools that if you're sitting at any
place in this triangle you can move

00:49:26.000 --> 00:49:29.000
this way and that way,
from a gene to a protein,

00:49:29.000 --> 00:49:33.000
from a protein to a gene, from
a function to a gene, from a

00:49:33.000 --> 00:49:36.000
function to a protein. For much
of the rest of the course we'll

00:49:36.000 --> 00:49:40.000
talk about how you use these tools,
but this brings to a close this

00:49:40.000 --> 00:49:44.000
first chunk of the course about the
concepts and the methodologies of

00:49:44.000 --> 00:49:48.000
molecular biology. Now, if
you hang on one more minute,

00:49:48.000 --> 00:49:52.000
this is my last lecture for a while.
I won't be, we're having an exam on,

00:49:52.000 --> 00:49:56.000
we have a quiz on Monday, and
then Bob's taking over again.

00:49:56.000 --> 00:50:00.000
So, I won't see you for the
next week or so. So, two things.

00:50:00.000 --> 00:50:04.000
One, I won't see you before the
World Series is over so everyone

00:50:04.000 --> 00:50:09.000
please think good thoughts
about the Red Sox. Number two,

00:50:09.000 --> 00:50:13.000
I will not see you
before the election. Vote.

00:50:13.000 --> 00:50:18.000
It's your choice who you
vote for, but vote. Good-bye.