1 00:00:01 --> 00:00:05 Good morning. Good morning. 2 00:00:05 --> 00:00:10 I don't know about you, but I can't take too many more nights like this. 3 00:00:10 --> 00:00:15 I confess, I haven't gotten a thing done for so many nights in a row now, 4 00:00:15 --> 00:00:20 but what a game! How many of you saw the game? Excellent. 5 00:00:20 --> 00:00:25 Very good, very good. You have your priorities straight in 6 00:00:25 --> 00:00:30 the world. Very good. Well, if it's possible to get your 7 00:00:30 --> 00:00:35 minds off Curt Schilling last night, and off more importantly tonight. 8 00:00:35 --> 00:00:40 Perhaps we can spend a bit of time this morning in the meanwhile with 9 00:00:40 --> 00:00:45 whatever spare neurons you have talking about recombinant DNA for a 10 00:00:45 --> 00:00:50 bit, OK? What we talked about last time was different ways to clone 11 00:00:50 --> 00:00:55 your gene based on its properties. We started off with cloning by 12 00:00:55 --> 00:01:00 complementation, right, the idea that if you took a 13 00:01:00 --> 00:01:05 library of clones, you would be able to put it into 14 00:01:05 --> 00:01:10 bacteria and select a bacterium whose phenotype had been restored by 15 00:01:10 --> 00:01:14 virtue of having the plasmid. You would complement the defect. 16 00:01:14 --> 00:01:18 You'd find the clone you wanted because it complemented the defect. 17 00:01:18 --> 00:01:22 That's great if you can put it into an organism that has a defect. 18 00:01:22 --> 00:01:25 You can do it with bacteria. You can do that with yeast. 19 00:01:25 --> 00:01:29 It's harder to do with large organisms because you can't inject 20 00:01:29 --> 00:01:33 enough of them with different clones to be able to make that practical 21 00:01:33 --> 00:01:37 unless you're working in cell culture or some very small, 22 00:01:37 --> 00:01:41 fast growing organism. 
We talked about being able to use a 23 00:01:41 --> 00:01:45 protein sequence, reverse translating that protein 24 00:01:45 --> 00:01:50 sequence in the computer from amino acid sequence to nucleotide sequence, 25 00:01:50 --> 00:01:54 and using the nucleotide sequence to design a probe to hybridize back to 26 00:01:54 --> 00:01:59 the genome. That works fine if you have a protein sequence. 27 00:01:59 --> 00:02:02 But the last topic we talked about that I wanted to just touch on again 28 00:02:02 --> 00:02:05 this morning was suppose you were trying to clone the gene that causes 29 00:02:05 --> 00:02:08 a certain human disease, and you have no idea what the 30 00:02:08 --> 00:02:11 protein was. Then, you can't use its amino acid 31 00:02:11 --> 00:02:15 sequence because you don't have the protein. What can you possibly do 32 00:02:15 --> 00:02:18 when all you know is that you have a gene which causes a genetic defect 33 00:02:18 --> 00:02:21 that causes a disease? And I said you could clone it using 34 00:02:21 --> 00:02:24 the ideas of genetic mapping, position, the things that Sturtevant 35 00:02:24 --> 00:02:28 developed. And, I touched on it briefly, 36 00:02:28 --> 00:02:31 and I want to just touch on it a bit more because some people had some 37 00:02:31 --> 00:02:34 questions about it. And I've set up a very simple 38 00:02:34 --> 00:02:38 example to show you. Suppose that, to make it easy, 39 00:02:38 --> 00:02:41 we're working in a fruit fly first. We're working drosophila, and 40 00:02:41 --> 00:02:44 suppose that the true picture of the underlying chromosome is like this. 41 00:02:44 --> 00:02:48 There's a locus that could either have a mutant allele M or the wild 42 00:02:48 --> 00:02:51 type allele plus. There's a bunch of other loci along 43 00:02:51 --> 00:02:54 the chromosome. And, let's suppose we know all of 44 00:02:54 --> 00:02:58 where they are and all that. And, they have two alternative 45 00:02:58 --> 00:03:01 alleles. 
At this locus the alleles are orange 46 00:03:01 --> 00:03:05 or pink. At this locus I'll call the alleles orange or pink. 47 00:03:05 --> 00:03:08 Now, these are different loci. These are different alleles. I've 48 00:03:08 --> 00:03:11 just called them orange and pink in both cases so I don't have a rainbow 49 00:03:11 --> 00:03:15 of colors up here to confuse us. But all I mean is there's two 50 00:03:15 --> 00:03:18 possible alleles here, two alleles here, two alleles here. 51 00:03:18 --> 00:03:21 This is the diseased gene we're interested in, 52 00:03:21 --> 00:03:25 and these are passive markers. These are other markers along the 53 00:03:25 --> 00:03:28 chromosome. If we were to set up a cross between 54 00:03:28 --> 00:03:32 heterozygotes, a heterozygote here, 55 00:03:32 --> 00:03:36 and a heterozygote here, and it were the case that on the 56 00:03:36 --> 00:03:40 chromosome bearing the mutant allele, it happened that at these three 57 00:03:40 --> 00:03:44 markers we had orange alleles. I don't know what they are, but 58 00:03:44 --> 00:03:48 whatever these orange alleles are, they might be a visible phenotype, 59 00:03:48 --> 00:03:52 forked or yellow or bristled. They could be a DNA sequence 60 00:03:52 --> 00:03:56 difference. They could be whatever you want, but let's suppose the M 61 00:03:56 --> 00:04:00 chromosome has a set of alleles that are different in each location than 62 00:04:00 --> 00:04:03 the plus chromosome. Then, when we look at the offspring 63 00:04:03 --> 00:04:07 that come out of this cross, let's only, for the sake of 64 00:04:07 --> 00:04:11 simplicity, look at those offspring who are homozygous mutants. 65 00:04:11 --> 00:04:15 Well, in general, if there's been no crossover here, 66 00:04:15 --> 00:04:18 then the M chromosome will have orange, orange, 67 00:04:18 --> 00:04:22 orange, orange, orange, orange. 
If there's been a crossover, 68 00:04:22 --> 00:04:26 however, it could go orange, orange, pink on one of those 69 00:04:26 --> 00:04:30 chromosomes. Or if there's been crossovers like this, 70 00:04:30 --> 00:04:34 it could go orange, orange, pink on one chromosome, and orange, 71 00:04:34 --> 00:04:37 pink, pink on the other chromosome. It could even, 72 00:04:37 --> 00:04:41 in the extreme, have had crossovers very close to 73 00:04:41 --> 00:04:44 the gene maybe here, and even maybe here. And you've got 74 00:04:44 --> 00:04:47 orange, pink, pink, and pink, pink, pink. 75 00:04:47 --> 00:04:51 But if we look at the many segregates, you know from genetic 76 00:04:51 --> 00:04:54 mapping that the closer the locus is to the disease gene, 77 00:04:54 --> 00:04:57 the more strongly correlated the inheritance will be, 78 00:04:57 --> 00:05:01 the tighter the linkage will be. This is nothing more than linkage 79 00:05:01 --> 00:05:04 mapping. But now, suppose we were doing 80 00:05:04 --> 00:05:08 linkage mapping, but for the sake of argument the 81 00:05:08 --> 00:05:12 whole genome had already been sequenced. Suppose the genome had 82 00:05:12 --> 00:05:15 been sequenced in a cross, and the whole genome of the fruit 83 00:05:15 --> 00:05:19 fly had been sequenced which it has been sequenced. 84 00:05:19 --> 00:05:22 And, we looked at a cross and we looked at the mutants. 85 00:05:22 --> 00:05:26 And what we did was we tried different positions along the genome. 86 00:05:26 --> 00:05:30 And at each position, we had some genetic marker. 87 00:05:30 --> 00:05:34 And that genetic marker might be as simple as the fact that at that 88 00:05:34 --> 00:05:38 position, maybe there is an A in the DNA sequence on one of the 89 00:05:38 --> 00:05:42 chromosomes, and maybe I don't know a G in the other sequence. 
90 00:05:42 --> 00:05:47 And over here, this marker might be, there's a T in some particular 91 00:05:47 --> 00:05:51 position, and there's a C in some particular position. 92 00:05:51 --> 00:05:55 If we could assay that, if we could tell, we could look 93 00:05:55 --> 00:06:00 whether this spelling variation is closely correlated with the mutant. 94 00:06:00 --> 00:06:03 And this spelling variation is closely correlated with the 95 00:06:03 --> 00:06:07 inheritance of the mutant allele. And we could just try up and down 96 00:06:07 --> 00:06:11 the genome, different sites of spelling difference as if they were 97 00:06:11 --> 00:06:15 genetic markers in our cross because they are genetic markers in our 98 00:06:15 --> 00:06:18 cross, and see which one is most tightly correlated. 99 00:06:18 --> 00:06:22 The minute we get any genetic sequence difference, 100 00:06:22 --> 00:06:26 that shows co-inheritance linkage in this cross, we know that this spot 101 00:06:26 --> 00:06:30 in the genome must be nearby our mutation. 102 00:06:30 --> 00:06:33 So, we'll try one closer, and we'll try one on the other side. 103 00:06:33 --> 00:06:37 And, what you do is you test sites of genetic variation, 104 00:06:37 --> 00:06:40 first to find one that shows any co-inheritance. 105 00:06:40 --> 00:06:44 And once you've got that, you try ones closer, and closer, 106 00:06:44 --> 00:06:48 and closer. Last time I talked about the process of, 107 00:06:48 --> 00:06:51 if you had one of those markers you could use it to isolate the next 108 00:06:51 --> 00:06:55 clone and the next clone and the next clone. But you know what I 109 00:06:55 --> 00:06:59 realized? That's so old fashioned. We might as well deal with the fact 110 00:06:59 --> 00:07:02 we have a sequence of the genome. No more would you ever isolate the 111 00:07:02 --> 00:07:05 next clone and the next clone and the next clone. 112 00:07:05 --> 00:07:08 You just look it up in the computer. 
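The marker scan described here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each mutant chromosome from the cross has been scored at every marker, and we know which allele sat on the original M chromosome. The function names, positions, and allele calls below are all hypothetical, not data from the lecture.

```python
from typing import Dict, List


def recombinant_fraction(scored_alleles: List[str], linked_allele: str) -> float:
    """Fraction of mutant chromosomes that no longer carry the allele
    that started out on the M chromosome at this marker."""
    if not scored_alleles:
        raise ValueError("no chromosomes scored at this marker")
    recombinants = sum(1 for allele in scored_alleles if allele != linked_allele)
    return recombinants / len(scored_alleles)


def tightest_marker(scores: Dict[int, List[str]], linked: Dict[int, str]) -> int:
    """Return the marker position whose allele is most tightly co-inherited
    with the mutation (lowest recombinant fraction)."""
    return min(scores, key=lambda pos: recombinant_fraction(scores[pos], linked[pos]))


# Hypothetical spelling differences at three genome positions, scored on
# eight mutant chromosomes from the cross.
scores = {
    1000: ["A", "A", "G", "A", "G", "A", "A", "G"],
    5000: ["T", "T", "T", "T", "C", "T", "T", "T"],
    9000: ["C", "C", "C", "C", "C", "C", "C", "C"],
}
linked = {1000: "A", 5000: "T", 9000: "C"}  # alleles on the original M chromosome

best = tightest_marker(scores, linked)
print(best, recombinant_fraction(scores[best], linked[best]))
```

A real analysis would use a proper linkage statistic and far more chromosomes. The point is only that "try sites up and down the genome and keep the most tightly correlated one" is a simple search.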
So, even if you have the whole 113 00:07:08 --> 00:07:11 sequence of the genome, we have to figure out what part of 114 00:07:11 --> 00:07:14 it was co-inherited along with this disease, and that's the way you do 115 00:07:14 --> 00:07:17 it, OK? Genetic mapping, just as Sturtevant invented it, 116 00:07:17 --> 00:07:20 can be applied if you have a whole sequence of the genome, 117 00:07:20 --> 00:07:23 and enough sites of variation. And, I've drawn it for a fruit fly 118 00:07:23 --> 00:07:26 cross, but this could equally well be cystic fibrosis. 119 00:07:26 --> 00:07:29 The only difference is if we're doing this in human families and 120 00:07:29 --> 00:07:33 it's cystic fibrosis we don't have as many offspring. 121 00:07:33 --> 00:07:36 So, we have to pool data from many families. And, 122 00:07:36 --> 00:07:40 we can't arrange it so that every family has exactly the same orange 123 00:07:40 --> 00:07:43 alleles up here and pink alleles down there, but computers can deal 124 00:07:43 --> 00:07:47 with that. They can still figure out the correlation across many 125 00:07:47 --> 00:07:50 families, and you find the spot in the genome where for many, 126 00:07:50 --> 00:07:54 many, many families the kids who all got the disease show correlated 127 00:07:54 --> 00:07:57 inheritance with this marker. And that eventually pins you down 128 00:07:57 --> 00:08:01 to a region of the genome. It pins you down to those genetic 129 00:08:01 --> 00:08:04 markers that show the absolute tightest correlation, 130 00:08:04 --> 00:08:08 tight correlation, and that's where you look. 131 00:08:08 --> 00:08:12 And in that fashion, people went from being able to map the 132 00:08:12 --> 00:08:16 location of Huntington's Disease in 1984 to, by now, 133 00:08:16 --> 00:08:20 mapping the locations of more than 1,000 different human genetic diseases 134 00:08:20 --> 00:08:25 where people didn't know the protein in advance. 
They did it entirely 135 00:08:25 --> 00:08:29 based on this positional mapping. So, Sturtevant's idea, which I like 136 00:08:29 --> 00:08:33 so much, has played itself out so beautifully now in the area of 137 00:08:33 --> 00:08:38 modern molecular medicine. OK. So, onward. 138 00:08:38 --> 00:08:43 I want to talk about a few other variations on the theme rather 139 00:08:43 --> 00:08:48 quickly, and then I think I want to talk about how you analyze your 140 00:08:48 --> 00:08:53 clones. First, variations on cloning, 141 00:08:53 --> 00:08:58 I should just at least mention it. We talked about cloning in an 142 00:08:58 --> 00:09:04 autonomously replicating plasmid in a bacterium. 143 00:09:04 --> 00:09:07 So, you go to a bacterium. They have some autonomously 144 00:09:07 --> 00:09:10 replicating pieces of DNA. They're circles. You can clone 145 00:09:10 --> 00:09:13 in them, and you can typically, these things are on the order of, 146 00:09:13 --> 00:09:17 I don't know, 1,000 to 2,000 to 5,000 bases can be readily cloned in 147 00:09:17 --> 00:09:20 these plasmids. You can do more, 148 00:09:20 --> 00:09:23 but that's a typical kind of number for the insert size, 149 00:09:23 --> 00:09:27 typically. But we in the lab go up to much higher numbers 150 00:09:27 --> 00:09:31 like 10,000 sometimes. You can also, if you wanted to study 151 00:09:31 --> 00:09:36 yeast, it turns out yeast happily have plasmids as well, 152 00:09:36 --> 00:09:41 and you can do a similar sort of thing for yeast. 153 00:09:41 --> 00:09:47 It turns out that instead of using plasmids, you can use bacterial 154 00:09:47 --> 00:09:52 viruses. These bacterial viruses have all different shapes as we've 155 00:09:52 --> 00:09:57 talked about, circular or linear, and they can typically hold, oh, 15,000 156 00:09:57 --> 00:10:02 to 40,000. Some of these viruses are quite big. 157 00:10:02 --> 00:10:06 The bacteriophage lambda tends to carry a lot of stuff. 
158 00:10:06 --> 00:10:11 And, it can replicate. So, you could do the same thing to 159 00:10:11 --> 00:10:15 that. You can even use viruses that infect mammalian cells and there are 160 00:10:15 --> 00:10:19 all sorts of viruses now that people clone in again, 161 00:10:19 --> 00:10:24 linear or circular. I don't know, for mammalian cells, 162 00:10:24 --> 00:10:28 you often, the viruses like 1,000-5,000. You can even make artificial 163 00:10:28 --> 00:10:33 whole chromosomes now. You can do this in yeast. 164 00:10:33 --> 00:10:39 Yeast artificial chromosomes are called YACs. They have all the little 165 00:10:39 --> 00:10:44 machinery, little telomeres on them, little centromeres. They have a 166 00:10:44 --> 00:10:50 selectable marker, and then you can clone into it your 167 00:10:50 --> 00:10:55 piece of DNA. And these can take up to a million bases of DNA. 168 00:10:55 --> 00:11:01 So, if you wanted, there are bacterial artificial chromosomes. 169 00:11:01 --> 00:11:05 They're called BACs if they're in bacteria. And recently, 170 00:11:05 --> 00:11:09 people have developed artificial chromosome systems for mammalian 171 00:11:09 --> 00:11:13 cells, and specifically human cells. And they're called unfortunately 172 00:11:13 --> 00:11:17 MACs and HACs and things like that. Basically, any molecule that can 173 00:11:17 --> 00:11:21 replicate in any system, some smart molecular biologist will 174 00:11:21 --> 00:11:25 come along and say, how do I use that for my purpose, 175 00:11:25 --> 00:11:30 to stick my DNA in it, and get it to replicate in this organism? 176 00:11:30 --> 00:11:36 And so, if something's not on this list, it will be soon, 177 00:11:36 --> 00:11:43 OK? Now, here's another thing. This is cloning chunks of DNA. 
178 00:11:43 --> 00:11:50 Just to have the piece of DNA in a library, but suppose we want to do 179 00:11:50 --> 00:11:57 more than just have the DNA sitting there in the bacterium, 180 00:11:57 --> 00:12:04 suppose what I'd really like to do is take a bacterium, 181 00:12:04 --> 00:12:10 E coli, and put it to work for us. Maybe what I'd like to do is take a 182 00:12:10 --> 00:12:14 plasmid and insert in that plasmid the gene for human insulin. 183 00:12:14 --> 00:12:19 So, I'm going to take the DNA locus corresponding to human insulin, 184 00:12:19 --> 00:12:23 clone it into my plasmid. Maybe I'll have isolated it from my 185 00:12:23 --> 00:12:28 library because, let's see, insulin's protein 186 00:12:28 --> 00:12:32 sequence is known so I could reverse translate it to a nucleotide 187 00:12:32 --> 00:12:36 sequence. So, I could probe a library. 188 00:12:36 --> 00:12:40 So, I could find the clone that has insulin. Now what I'd like to do is 189 00:12:40 --> 00:12:43 persuade this bacteria not just to carry the DNA but to make insulin 190 00:12:43 --> 00:12:47 for me. Would that be useful? Yeah, how did people used to get 191 00:12:47 --> 00:12:51 insulin? Cadavers, dead bodies; it would be much easier 192 00:12:51 --> 00:12:54 to get them from a fermenter, right, to get insulin from a 193 00:12:54 --> 00:12:58 fermenter, if you could just ask E coli to make it. 194 00:12:58 --> 00:13:02 So, if we put it into E coli, will it make insulin for us? 195 00:13:02 --> 00:13:09 Here's the human locus, DNA for insulin. Will it make 196 00:13:09 --> 00:13:17 insulin? Let's see, how do you make a protein? 197 00:13:17 --> 00:13:24 You've got to start by making RNA, right? You've got to transcribe the 198 00:13:24 --> 00:13:32 gene. Will E coli transcribe this gene? 199 00:13:32 --> 00:13:36 Well, why? It's got a promoter, right? It's got the insulin 200 00:13:36 --> 00:13:41 promoter. There we go. The insulin promoter is here. 
201 00:13:41 --> 00:13:45 So, E coli will come along to the insulin promoter and start making 202 00:13:45 --> 00:13:50 RNA? No, it turns out that promoters in humans and promoters in 203 00:13:50 --> 00:13:55 bacteria are sufficiently different. They don't work across species. 204 00:13:55 --> 00:14:00 They won't recognize the human promoter. Too bad. Any ideas? 205 00:14:00 --> 00:14:05 Yep? Stick a bacterial promoter there. Good, you're acting like a 206 00:14:05 --> 00:14:10 good molecular biology designer here. Let's put a bacterial promoter here. 207 00:14:10 --> 00:14:15 It will recognize its own promoter. That's great. Then, let's put the 208 00:14:15 --> 00:14:21 DNA for the human insulin gene here. And now, maybe we'll put the Lac 209 00:14:21 --> 00:14:26 operon, and when it has lactose it'll start making RNA from the 210 00:14:26 --> 00:14:32 human insulin gene. And it'll start translating it. 211 00:14:32 --> 00:14:38 And, we get insulin. Any problems? Well, 212 00:14:38 --> 00:14:44 will it make any, for starters? What's another aspect of mammalian 213 00:14:44 --> 00:14:50 genes that's different from bacterial genes? 214 00:14:50 --> 00:14:56 Processing, what kind of processing with the RNA? And the splicing, 215 00:14:56 --> 00:15:02 ooh, the insulin gene has introns that have to be spliced out. 216 00:15:02 --> 00:15:06 So, this is going to make some RNA, insulin RNA, and it needs to be 217 00:15:06 --> 00:15:10 processed like this. Will bacteria carry on our splicing 218 00:15:10 --> 00:15:14 for us? They don't do splicing. Yep? Well, that's a very 219 00:15:14 --> 00:15:18 interesting question because we haven't. But, 220 00:15:18 --> 00:15:22 what do you propose? You see, I've just taken a piece of 221 00:15:22 --> 00:15:26 human DNA from the human genome, which encodes the introns and the 222 00:15:26 --> 00:15:30 exons. But, you seem to have a solution to our problem, 223 00:15:30 --> 00:15:35 and what would that be? 
So, instead of making a library of 224 00:15:35 --> 00:15:42 genomic DNA, what you're suggesting is a radical idea. 225 00:15:42 --> 00:15:50 Let's instead take human RNA. Here's some human RNA, lots of 226 00:15:50 --> 00:15:57 human RNA, a big collection of human RNA. What was at the end of the 227 00:15:57 --> 00:16:02 human RNA: a poly(A) tail. And what I understand you to be 228 00:16:02 --> 00:16:06 suggesting is if we take human mRNAs, a whole collection of them, 229 00:16:06 --> 00:16:10 you want me to turn these mRNAs back into DNA and clone them instead of 230 00:16:10 --> 00:16:14 using the chromosomal DNA. How do I turn an RNA back to DNA? 231 00:16:14 --> 00:16:18 Is that possible? What do you use: reverse transcriptase. 232 00:16:18 --> 00:16:22 We have to give it a primer. So remember, five prime to three 233 00:16:22 --> 00:16:26 prime, we'd like to put a primer going over here. 234 00:16:26 --> 00:16:30 Any ideas for a good primer? Poly(T), isn't that convenient? 235 00:16:30 --> 00:16:35 One of the reasons that mammalian messages have poly(A) tails is so 236 00:16:35 --> 00:16:41 that we are able to reverse transcribe them using poly(T) 237 00:16:41 --> 00:16:46 primers. No, that's actually not true. So, we use reverse 238 00:16:46 --> 00:16:52 transcriptase. And what we can do is we'll copy 239 00:16:52 --> 00:16:58 this RNA into a strand of DNA. There we go. 240 00:16:58 --> 00:17:03 Then what we'll do, next step, is we'll take the DNA, 241 00:17:03 --> 00:17:09 and we'll copy back into a second strand of DNA. 242 00:17:09 --> 00:17:15 And now, we have double-stranded DNA whose sequence matches the 243 00:17:15 --> 00:17:21 already-processed mRNAs. Sorry? So, the sequences would 244 00:17:21 --> 00:17:27 match the mRNAs. 
So what you could do is instead of 245 00:17:27 --> 00:17:32 taking human DNA from the nucleus, you could take RNAs, 246 00:17:32 --> 00:17:38 turn them back into DNA by reverse transcriptase, 247 00:17:38 --> 00:17:43 and make a library now that consists of zillions of inserts, 248 00:17:43 --> 00:17:49 each of which has what's called a cDNA, a copied DNA, 249 00:17:49 --> 00:17:54 copied back from the RNA. The great advantage of this is that 250 00:17:54 --> 00:18:00 the human cell has already done the splicing, and so there 251 00:18:00 --> 00:18:05 are no introns left. Now, when you stick it in a 252 00:18:05 --> 00:18:09 bacterium, the bacterium is able to express this. It's able, 253 00:18:09 --> 00:18:13 if you give it its own bacterial promoter, to make an RNA. 254 00:18:13 --> 00:18:17 And if you don't ask the bacteria to have to splice, 255 00:18:17 --> 00:18:21 if you just give it a pre-spliced piece of DNA that doesn't need 256 00:18:21 --> 00:18:25 splicing, it can translate that DNA. Now, notice we used all of our 257 00:18:25 --> 00:18:29 tricks. You had to know about reverse transcriptase, 258 00:18:29 --> 00:18:34 poly(A) tails, structures of genes, introns, exons, yes, question? 259 00:18:34 --> 00:18:38 It doesn't. You do this in the test tube. You purify human mRNA in the 260 00:18:38 --> 00:18:42 test tube. You take that mRNA in a test tube, add reverse transcriptase, 261 00:18:42 --> 00:18:47 add poly(T), make this reaction of RNA to DNA in the test tube go back. 262 00:18:47 --> 00:18:51 Where does it come from? Viruses that copy themselves back for a 263 00:18:51 --> 00:18:56 living, right? So, again, every single thing we're 264 00:18:56 --> 00:19:00 using comes from some living organism that does this 265 00:19:00 --> 00:19:04 kind of stuff. 
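As a string-level sketch of the two copying steps just described (mRNA to first-strand DNA, then first strand to second strand), the Python below may help. It assumes an idealized reaction: the oligo(dT) primer anneals at the very end of the poly(A) tail and both strands are copied end to end. The function names and the toy message are hypothetical.

```python
RNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "U": "A"}  # RNA base -> complementary DNA base
DNA_TO_DNA = {"A": "T", "C": "G", "G": "C", "T": "A"}  # DNA base -> complementary DNA base


def first_strand(mrna: str, primer_len: int = 6) -> str:
    """Reverse-transcribe an mRNA (written 5'->3') into first-strand cDNA (5'->3').

    The oligo(dT) primer needs a poly(A) tail to anneal to, so we insist on one.
    """
    if not mrna.endswith("A" * primer_len):
        raise ValueError("no poly(A) tail for the oligo(dT) primer to anneal to")
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(first: str) -> str:
    """Copy the first strand back into its complementary DNA strand (5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(first))


# Hypothetical, already-spliced message: a tiny coding region plus a poly(A) tail.
mrna = "AUGGCUUAA" + "A" * 8
cdna_first = first_strand(mrna)
cdna_second = second_strand(cdna_first)
print(cdna_first)
print(cdna_second)
```

The second strand reads exactly like the message with T in place of U, which is why a cDNA insert carries no introns: the cell had already spliced the RNA before it was copied.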
And, when I teach you about the 266 00:19:04 --> 00:19:08 facts of how viruses replicate or what the structure of mRNAs look 267 00:19:08 --> 00:19:11 like or whatever, it's because every bit of knowledge 268 00:19:11 --> 00:19:14 we get about the way biology works turns into an incredibly powerful 269 00:19:14 --> 00:19:18 tool as it's turning out for us to actually be able to further study 270 00:19:18 --> 00:19:21 biology. So, great. So, where does reverse 271 00:19:21 --> 00:19:24 transcriptase come from now? Originally they come from viruses 272 00:19:24 --> 00:19:28 that turn themselves back from RNA to DNA. Now, how do you get reverse 273 00:19:28 --> 00:19:32 transcriptase? Catalog, right, 274 00:19:32 --> 00:19:38 very good. All right, so this is called, finally, 275 00:19:38 --> 00:19:44 a cDNA library. And, if you had made a cDNA library, 276 00:19:44 --> 00:19:49 you would be able to screen the cDNA library to find the gene for insulin. 277 00:19:49 --> 00:19:55 Is this useful? This happens to be, 278 00:19:55 --> 00:20:01 for example, one of the consequences of this was the biotechnology 279 00:20:01 --> 00:20:06 industry. OK, so if you have any doubts about 280 00:20:06 --> 00:20:10 the usefulness of understanding these abstract things about E coli 281 00:20:10 --> 00:20:14 and bacteria and stuff like that, one of the consequences was 282 00:20:14 --> 00:20:18 Genentech, Biogen, and Amgen, and if you just simply 283 00:20:18 --> 00:20:22 walk around Kendall Square, within a mile of this place you will 284 00:20:22 --> 00:20:26 see laid out before you the consequences of this ability, 285 00:20:26 --> 00:20:30 OK? It's transforming Cambridge. Yes? 286 00:20:30 --> 00:20:36 And the world. Yeah. Indeed. 287 00:20:36 --> 00:20:43 It might be that producing large amounts of insulin was bad for the 288 00:20:43 --> 00:20:50 bacteria because there would be so much protein it would clump and kill 289 00:20:50 --> 00:20:57 the bacteria. 
It might be that insulin, for various reasons, 290 00:20:57 --> 00:21:04 might not fold appropriately in the bacterial environment. 291 00:21:04 --> 00:21:07 And, this is why the biotechnology industry has lots of smart people 292 00:21:07 --> 00:21:10 working in it because you're totally, 100% right. You might decide that 293 00:21:10 --> 00:21:13 instead of cloning it in bacteria it's better to clone it in some 294 00:21:13 --> 00:21:16 insect cell in culture which, in fact, people like to work with, 295 00:21:16 --> 00:21:19 or some other cell, or a mammalian cell. And so, 296 00:21:19 --> 00:21:23 I simplify by saying put it in coli, but in fact they might test six 297 00:21:23 --> 00:21:26 different cell lines, six different host possibilities. 298 00:21:26 --> 00:21:29 They might have to take the insulin out and refold it in vitro 299 00:21:29 --> 00:21:33 and things like that. You're totally right. 300 00:21:33 --> 00:21:37 This is actually something that requires work to do it right, 301 00:21:37 --> 00:21:42 just like building an airplane requires work. 302 00:21:42 --> 00:21:47 I could tell you Bernoulli's principles, but then Boeing does 303 00:21:47 --> 00:21:51 more than just write down Bernoulli's principles. 304 00:21:51 --> 00:21:56 OK, so onward. Now, I'd like to turn next to analyzing your clone. 305 00:21:56 --> 00:22:00 Analyzing the clone, so suppose we have, maybe it's by positional 306 00:22:00 --> 00:22:05 cloning, maybe it's by cDNA cloning, but one way or the other we've got 307 00:22:05 --> 00:22:10 us a clone that we're very interested in. 308 00:22:10 --> 00:22:14 Maybe it has the insulin gene. Maybe it has the Huntington's 309 00:22:14 --> 00:22:18 disease gene. Whatever it is, we're going to want to study it. 310 00:22:18 --> 00:22:22 And at the moment, I haven't told you how I would even read its DNA 311 00:22:22 --> 00:22:26 sequence or analyze its DNA. 
So, the first step is, of course, 312 00:22:26 --> 00:22:31 I have to purify the plasmid. And, it turns out that that can be done. 313 00:22:31 --> 00:22:34 There are simple biochemical techniques, as I mentioned in a 314 00:22:34 --> 00:22:37 previous lecture, that allow you to grow up a lot of 315 00:22:37 --> 00:22:40 the bacteria, crack them open, and the plasmid being a little 316 00:22:40 --> 00:22:43 circle, and being a little more tightly super-coiled and wound up 317 00:22:43 --> 00:22:46 has somewhat different physical properties. And you can use those 318 00:22:46 --> 00:22:50 to purify the plasmid. So, plasmid preps are not hard to 319 00:22:50 --> 00:22:53 do. You can get a fairly pure collection of the plasmid. 320 00:22:53 --> 00:22:56 Now, suppose I've done this for, oh, I don't know, let's take my 321 00:22:56 --> 00:23:00 first example, arg mutants. Suppose I tried to rescue bacteria 322 00:23:00 --> 00:23:04 that were arg minus, and suppose I found that 50 323 00:23:04 --> 00:23:08 different plasmids rescued my arg mutant because I transformed a lot 324 00:23:08 --> 00:23:12 of plasmids in, I plated it, and 50 colonies grew up. 325 00:23:12 --> 00:23:16 Are they all the same thing or are they different? 326 00:23:16 --> 00:23:20 Is there any quickie way to take a look at these 50 plasmids and see if 327 00:23:20 --> 00:23:24 they're identical or fairly close, or obviously different? Well, I'd 328 00:23:24 --> 00:23:28 like to find some way to take the DNA from the plasmid and analyze it 329 00:23:28 --> 00:23:32 kind of easily. I might want to see, 330 00:23:32 --> 00:23:37 like, how big is the insert? Right, that'd be one way, 331 00:23:37 --> 00:23:43 if they had different sized inserts then they couldn't be the same thing. 332 00:23:43 --> 00:23:49 So, maybe what I could do is, how did I clone this? I used EcoRI sites I 333 00:23:49 --> 00:23:55 recall. So, I have EcoRI sites here. 
Suppose I were to take this DNA, 334 00:23:55 --> 00:24:02 and I were to now cut the DNA from the plasmid with EcoRI. 335 00:24:02 --> 00:24:09 Then, what I would get is two separate molecules. 336 00:24:09 --> 00:24:16 I would get the vector and the insert. How could I see how big 337 00:24:16 --> 00:24:24 they were? Gels, gel electrophoresis is the way to do 338 00:24:24 --> 00:24:29 that. So, I take a gel. A gel is a slab of gelatin, 339 00:24:29 --> 00:24:33 Jell-O, OK, and normally it's laid flat, but I'm going to do it 340 00:24:33 --> 00:24:37 vertically here. I load into the top of it here a 341 00:24:37 --> 00:24:41 little bit of my DNA, this whole mixture. I take the 342 00:24:41 --> 00:24:45 plasmid. I cut it. I put it in here. DNA's positive 343 00:24:45 --> 00:24:49 charge or negative charge? Negative. So, where should I put 344 00:24:49 --> 00:24:53 the positive pole? On the bottom, well done. 345 00:24:53 --> 00:24:57 That's often not done, and to the detriment of the experiment. 346 00:24:57 --> 00:25:01 If you put the positive pole here, it goes the wrong way, and everybody 347 00:25:01 --> 00:25:05 has to do that at least once. So, what'll happen is the DNA 348 00:25:05 --> 00:25:11 fragments move through, and the smaller fragments move 349 00:25:11 --> 00:25:16 faster than the big fragments, right? If something's little, it'll 350 00:25:16 --> 00:25:22 move fast. If something's big, it moves slowly: little, big. 351 00:25:22 --> 00:25:27 Smaller moves faster because it wiggles through the little pores in 352 00:25:27 --> 00:25:33 the gel better. So, suppose I were to do this for a 353 00:25:33 --> 00:25:39 bunch of plasmids, and what I saw was this. 354 00:25:39 --> 00:25:47 First order, what do you guess? Sorry? Top row's probably the 355 00:25:47 --> 00:25:55 plasmid vector. This is probably the vector, 356 00:25:55 --> 00:26:03 and what do I know about the inserts? At least two inserts, 357 00:26:03 --> 00:26:09 at least two distinct inserts. 
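To make the EcoRI digest concrete, here is a small Python sketch that cuts a circular sequence at every GAATTC and reports the fragment lengths you would expect to see as bands. It is a simplification: it treats the cut as falling at the start of the recognition site (the real enzyme cuts after the G, which shifts positions but not fragment lengths), and the plasmid sequence is a made-up toy, not a real vector.

```python
from typing import List


def circular_digest(plasmid: str, site: str = "GAATTC") -> List[int]:
    """Fragment lengths (largest first) from cutting a circular sequence
    at every occurrence of the recognition site."""
    n = len(plasmid)
    # Append the first few bases so a site spanning the origin is still found.
    wrapped = plasmid + plasmid[: len(site) - 1]
    cuts = [i for i in range(n) if wrapped.startswith(site, i)]
    if not cuts:
        return [n]  # uncut circle
    lengths = []
    for k, cut in enumerate(cuts):
        next_cut = cuts[(k + 1) % len(cuts)]
        # Distance around the circle; a single cut yields one full-length linear piece.
        lengths.append((next_cut - cut) % n or n)
    return sorted(lengths, reverse=True)


# Toy plasmid: a 30-base "vector" and a 10-base "insert", each preceded by an EcoRI site.
vector = "C" * 30
insert = "AT" * 5
plasmid = "GAATTC" + vector + "GAATTC" + insert

print(circular_digest(plasmid))  # [36, 16]: the vector band and the insert band
```

Two EcoRI sites on a circle give exactly two fragments, which is why the digest separates vector from insert.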
Now, if I wanted to be sure that was 358 00:26:09 --> 00:26:13 the vector, maybe what I could do is take another row, 359 00:26:13 --> 00:26:17 and run a known amount of the vector, take the vector alone and I could 360 00:26:17 --> 00:26:21 check that the vector alone runs over here. And maybe I might take 361 00:26:21 --> 00:26:25 some other known molecules. These would be called molecular 362 00:26:25 --> 00:26:29 weight standards. So, if I run some knowns in one of 363 00:26:29 --> 00:26:33 the lanes of the gel, I can even measure and say, 364 00:26:33 --> 00:26:37 ah-ha, the insert is somewhere between the size of this one and the 365 00:26:37 --> 00:26:40 size of that one. And so, I get a little ruler that I 366 00:26:40 --> 00:26:43 can put on the gel. So, in fact, that's the first thing 367 00:26:43 --> 00:26:46 you would do is you digest your clone that way. 368 00:26:46 --> 00:26:49 Now, does the fact that these guys have exactly the same, 369 00:26:49 --> 00:26:52 apparently, size on the gel mean that they're the exact same piece of 370 00:26:52 --> 00:26:55 DNA? No, because you can't even actually tell it's exactly the same. 371 00:26:55 --> 00:26:59 There's a limit to how precisely you can measure it. 372 00:26:59 --> 00:27:04 So, what else could you do? You could try another restriction 373 00:27:04 --> 00:27:10 enzyme. It turns out that since there are so many restriction 374 00:27:10 --> 00:27:15 enzymes in the catalog, if I take a piece of DNA, 375 00:27:15 --> 00:27:21 maybe that Eco fragment, I could try cutting it with HindIII. 376 00:27:21 --> 00:27:26 And when I cut it with HindIII, I'm going to get three distinct 377 00:27:26 --> 00:27:32 lengths. I could try cutting it with, oh, I don't know, 378 00:27:32 --> 00:27:37 pick another enzyme, BamHI. When I cut it with BamHI, 379 00:27:37 --> 00:27:43 I'll get some other lengths. 
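The "little ruler" from the molecular weight standards can be turned into a rough size estimate. The sketch below assumes the common rule of thumb that, over the useful range of a gel, migration distance is roughly linear in the logarithm of fragment length, and it interpolates between the two standards that bracket the unknown band. The function name, distances, and sizes are invented for illustration.

```python
import math
from typing import List, Tuple


def estimate_size(distance: float, ladder: List[Tuple[float, float]]) -> float:
    """Estimate fragment length (bp) from migration distance.

    ladder holds (distance migrated, known size in bp) for each standard.
    Interpolates log10(size) linearly between the two bracketing standards.
    """
    points = sorted(ladder)  # order the standards by distance migrated
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance <= d2:
            t = (distance - d1) / (d2 - d1)
            log_size = math.log10(s1) + t * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the range of the standards")


# Hypothetical standards lane: distance in mm, size in base pairs.
ladder = [(10.0, 10000.0), (30.0, 1000.0), (50.0, 100.0)]

# A band halfway between the 10,000 bp and 1,000 bp standards.
print(round(estimate_size(20.0, ladder)))
```

As the lecture notes, this only brackets the size; two bands that look identical on the gel can still differ by more than the measurement can resolve.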
And how do I get these lengths? By 380 00:27:43 --> 00:27:48 running them out on a gel and looking at their sizes. 381 00:27:48 --> 00:27:54 What if I added both HindIII and BamHI to my test tube? 382 00:27:54 --> 00:28:00 I'd cut at both sites. So, I'd cut here, here, 383 00:28:00 --> 00:28:06 here, here, here. So, this is cut with HindIII, 384 00:28:06 --> 00:28:12 here cut with BamHI, here cut with both and I could measure these 385 00:28:12 --> 00:28:19 lengths. So, suppose I gave you this as a computer problem: 386 00:28:19 --> 00:28:25 I have a string and it's an unknown string, and I cut it at two places 387 00:28:25 --> 00:28:31 and I get these lengths, X1, X2, X3. And then I take that same string and 388 00:28:31 --> 00:28:35 I cut it at other positions, Y1, Y2, and Y3 are the lengths that 389 00:28:35 --> 00:28:39 result. And then suppose I now cut it at both of the sites, 390 00:28:39 --> 00:28:43 and I measure it, and I get Z1, Z2, Z3, Z4, Z5. If I gave you all 391 00:28:43 --> 00:28:48 those numbers, could you figure out where the sites 392 00:28:48 --> 00:28:52 must be? Probably. It turns out to be a reasonably 393 00:28:52 --> 00:28:56 doable computer problem, although it can get a little hard in 394 00:28:56 --> 00:29:00 places. And you could try a third enzyme and 395 00:29:00 --> 00:29:03 a fourth enzyme, and it's a cute exercise to write 396 00:29:03 --> 00:29:07 yourself a little piece of code that will figure out where the sites are 397 00:29:07 --> 00:29:10 based on the lengths. The reason it occasionally gets 398 00:29:10 --> 00:29:13 funny is, what if Z3 and Z4 are exactly the same length and they run on top 399 00:29:13 --> 00:29:16 of each other in the gel? There are special cases. 400 00:29:16 --> 00:29:20 But you can kind of reconstruct where those restriction sites must 401 00:29:20 --> 00:29:23 be just by writing a good piece of code that'll put these pieces 402 00:29:23 --> 00:29:26 together. 
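Here is one way the "little piece of code" might look: a brute-force sketch, not a polished tool. It assumes a linear piece of DNA, exact fragment lengths with no measurement error, and no shared cut positions between the two enzymes. The function names and the example numbers are invented for illustration.

```python
from itertools import permutations


def cut_positions(ordered_lengths):
    """Turn fragment lengths, in left-to-right order, into interior cut positions."""
    positions, running_total = [], 0
    for length in ordered_lengths[:-1]:
        running_total += length
        positions.append(running_total)
    return positions


def fragments_from_cuts(cuts, total_length):
    """Fragment lengths produced by cutting a linear molecule at the given positions."""
    edges = [0] + sorted(cuts) + [total_length]
    return [right - left for left, right in zip(edges, edges[1:])]


def solve_double_digest(x_lengths, y_lengths, z_lengths):
    """Try every ordering of the two single digests; keep those that explain the double digest."""
    total = sum(x_lengths)
    if sum(y_lengths) != total or sum(z_lengths) != total:
        return []  # the three digests do not describe the same molecule
    target = sorted(z_lengths)
    solutions = set()
    for x_order in set(permutations(x_lengths)):
        x_cuts = cut_positions(x_order)
        for y_order in set(permutations(y_lengths)):
            y_cuts = cut_positions(y_order)
            if set(x_cuts) & set(y_cuts):
                continue  # assumption: the two enzymes never cut at the same spot
            if sorted(fragments_from_cuts(x_cuts + y_cuts, total)) == target:
                solutions.add((tuple(x_cuts), tuple(y_cuts)))
    return sorted(solutions)


# Made-up example: a 10-unit string, first enzyme cutting at 2 and 7,
# second enzyme cutting at 4 and 9.
x = [2, 5, 3]        # X1, X2, X3
y = [4, 5, 1]        # Y1, Y2, Y3
z = [2, 2, 3, 2, 1]  # Z1 .. Z5
for x_sites, y_sites in solve_double_digest(x, y, z):
    print("first enzyme at", x_sites, "second enzyme at", y_sites)
```

Every map comes back together with its mirror image, because the fragment lengths alone cannot say which end of the string is which. Equal-length fragments that run on top of each other, as in the Z3 and Z4 case, can add further ambiguity, and trying every ordering grows factorially with the number of sites.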
This is called restriction mapping, 403 00:29:26 --> 00:29:30 and it's great fun. Everybody likes to do this once. 404 00:29:30 --> 00:29:33 But, it's only a limited amount of information, right, 405 00:29:33 --> 00:29:36 because you get where the sites are, and I guess if I gave you ten clones 406 00:29:36 --> 00:29:40 and they all had exactly the same restriction maps, 407 00:29:40 --> 00:29:43 the exact same positions of these restriction sites, 408 00:29:43 --> 00:29:47 you'd feel pretty confident they were the same clone. 409 00:29:47 --> 00:29:50 But you still wouldn't really know much about the clone other than it 410 00:29:50 --> 00:29:53 had two HindIII sites and two BamHI sites, and here's where they were. 411 00:29:53 --> 00:29:57 What do you really want to know about this clone? Its 412 00:29:57 --> 00:30:02 DNA sequence, right? Let's not settle for anything less 413 00:30:02 --> 00:30:10 than the exact nucleotide sequence of the clone. So, 414 00:30:10 --> 00:30:18 that's really the last key topic: sequencing DNA. 415 00:30:18 --> 00:30:26 How are you going to sequence DNA? Well, suppose I give you some 416 00:30:26 --> 00:30:34 double strand of DNA, five prime to three prime, 417 00:30:34 --> 00:30:42 five prime, three prime, double stranded DNA. 418 00:30:42 --> 00:30:47 Let me heat it up. What happens when I heat up DNA? 419 00:30:47 --> 00:30:52 It melts the hydrogen bonds, the non-covalent hydrogen bonds here 420 00:30:52 --> 00:30:57 break, and I got my two strands separated. Now, 421 00:30:57 --> 00:31:02 what I'd like to do is I want to start reading out this DNA sequence. 422 00:31:02 --> 00:31:08 So, I'm going to make me a primer. Now, golly, here's a primer. 423 00:31:08 --> 00:31:14 You're going to ask me, how did I even know what primer to 424 00:31:14 --> 00:31:21 use if I don't know the DNA sequence? How can I make a primer? 425 00:31:21 --> 00:31:27 Hold that question. 
Make sure I remember to come back and answer 426 00:31:27 --> 00:31:34 that, OK? But for the moment, grant me that I have a primer here. 427 00:31:34 --> 00:31:39 What I'd like to do is add DNA polymerase. So, 428 00:31:39 --> 00:31:45 let's add some DNA polymerase. And, I'd like to add nucleotide 429 00:31:45 --> 00:31:50 triphosphates, dNTPs, the dATP, 430 00:31:50 --> 00:31:56 dCTP, the dGTP, dTTP, and if I add DNA polymerase and I 431 00:31:56 --> 00:32:02 add my nucleotides, what does Arthur Kornberg tell us 432 00:32:02 --> 00:32:07 will happen? It'll start polymerizing, 433 00:32:07 --> 00:32:13 right? And, it'll stop there. So, the polymerase knows the bases, 434 00:32:13 --> 00:32:18 right? It knows what base to put in because polymerase is very smart. 435 00:32:18 --> 00:32:24 So, the bases get put in correctly. The only problem is, how do we get 436 00:32:24 --> 00:32:30 polymerase to tell us what it just did? 437 00:32:30 --> 00:32:37 Here's a cute trick. This is, by the way, 438 00:32:37 --> 00:32:44 a cute trick that won the Nobel Prize. So, suppose my primer is 439 00:32:44 --> 00:32:51 like this: five prime, T, A, A, T, T, C, T, and the 440 00:32:51 --> 00:32:58 template strand here, A, T, T, A, A, G, A, now let's keep 441 00:32:58 --> 00:33:05 going, A, T, G, C, C, A, A, T, G, 442 00:33:05 --> 00:33:14 G, A, T, T, A, five prime. So, there's my primer. 443 00:33:14 --> 00:33:26 There's my template. I'm going to start adding. Well, let's 444 00:33:26 --> 00:33:37 add our polymerase. Let's add our dNTPs, 445 00:33:37 --> 00:33:47 polymerase, dATP, dCTP, dTTP, dGTP, and then I want to add a 446 00:33:47 --> 00:33:58 special extra good old ingredient into this. 447 00:33:58 --> 00:34:07 The special extra ingredient I want to add is a defective T, 448 00:34:07 --> 00:34:17 a defective dTTP. What do I mean by defective? 
I mean chemically 449 00:34:17 --> 00:34:27 modified in such a way that it can't be extended, that you 450 00:34:27 --> 00:34:36 can't extend past it. So now, let's follow my reaction. 451 00:34:36 --> 00:34:44 I'm going to start with, I'm just going to write them down here, 452 00:34:44 --> 00:34:52 T, A, A, T, T, C, T. What's the next base I'm going to put in? 453 00:34:52 --> 00:35:00 T, OK? Is that a defective T or a good T? I don't know. 454 00:35:00 --> 00:35:05 It could be. Maybe it's a defective T, which I'll put a little star 455 00:35:05 --> 00:35:11 there, OK? If so, what happens to my polymerase? 456 00:35:11 --> 00:35:17 It stops. It can't go any further. It can't go any further because the 457 00:35:17 --> 00:35:22 T's defective. But what if it wasn't a defective T? 458 00:35:22 --> 00:35:28 What if it was a good T? Then what goes on? The polymerase 459 00:35:28 --> 00:35:34 will put in, keep going guys. A, C, G, G, and what does it put in 460 00:35:34 --> 00:35:39 now? T, right? Now, 461 00:35:39 --> 00:35:45 is that a defective T? Maybe. We don't know. 462 00:35:45 --> 00:35:50 If it is a defective T, it stops there. Otherwise, 463 00:35:50 --> 00:35:56 polymerase goes here, and the next base is what? 464 00:35:56 --> 00:36:02 T, and is that a defective T? Maybe. 465 00:36:02 --> 00:36:09 And, if it's not a defective T, then polymerase goes on, puts in an 466 00:36:09 --> 00:36:16 A, puts in a C, a C, and then a T. 467 00:36:16 --> 00:36:23 And maybe that's defective. All right, which of these 468 00:36:23 --> 00:36:31 possibilities is what polymerase does when I throw it in? 469 00:36:31 --> 00:36:35 Well, all of them. There's a lot of molecules there. 470 00:36:35 --> 00:36:39 Some of the molecules, by chance, happen to install a defective T, 471 00:36:39 --> 00:36:43 and they grind to a halt here. Sometimes, a good T's put in and the 472 00:36:43 --> 00:36:47 molecules stop here. 
Sometimes they stop here, 473 00:36:47 --> 00:36:51 and if I start with a big collection of primers and a lot of my template 474 00:36:51 --> 00:36:55 DNA, I'm going to get this whole collection of different molecules of 475 00:36:55 --> 00:36:59 different lengths. What lengths do I get? 476 00:36:59 --> 00:37:04 The lengths correspond precisely to the positions of the Ts. 477 00:37:04 --> 00:37:10 I get a series of molecules whose lengths perfectly match the 478 00:37:10 --> 00:37:16 positions of Ts. Well, first off, 479 00:37:16 --> 00:37:23 how do I measure their lengths? Run a gel, bingo, run a gel. 480 00:37:23 --> 00:37:30 So, if I could run a gel that could separate molecules based on length, 481 00:37:30 --> 00:37:37 I'd see two next to each other, another one up there. I'd see a 482 00:37:37 --> 00:37:44 small molecule, length one, two, 483 00:37:44 --> 00:37:51 three, four, five, six, seven, eight, I'd see one of 484 00:37:51 --> 00:37:58 length eight. I'd see one of length, what's the next one, 485 00:37:58 --> 00:38:05 13, eight, nine, ten, 13, 14, so eight, nine, 486 00:38:05 --> 00:38:12 ten, 11, 12, 13, 14, 15, what's that, 13, 14, 15, 16, 487 00:38:12 --> 00:38:18 17, 18. OK, those would be the positions at 488 00:38:18 --> 00:38:22 which I would see a T. So, I'd need to have a special kind 489 00:38:22 --> 00:38:26 of gel that's so accurate that it can separate lengths that differ by a single nucleotide, 490 00:38:26 --> 00:38:31 right, but that can be done. 491 00:38:31 --> 00:38:36 There's acrylamide, a polymer that will do that. 492 00:38:36 --> 00:38:41 That'll tell me the exact lengths of the T's. What else do I do? 493 00:38:41 --> 00:38:46 Well, let's obviously do it for the other bases. 494 00:38:46 --> 00:38:52 Let's try defective A, defective C, defective G. 495 00:38:52 --> 00:38:57 Let's see, if I got it right, which we'll try, it ought to end up 496 00:38:57 --> 00:39:02 looking something like that. 
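The four reactions can be played out in code. This sketch uses the primer and template as read out in the lecture and assumes ideal chemistry: every possible stopping point shows up, and nothing else does. The function names are invented for the example. It first works out where chains ending in each defective base would stop, then reads the ladder back from shortest to longest.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

PRIMER = "TAATTCT"                          # written 5' to 3'
TEMPLATE_3_TO_5 = "ATTAAGAATGCCAATGGATTA"   # written 3' to 5', as on the board


def full_extension(primer, template_3_to_5):
    """Primer plus everything polymerase would add, written 5' to 3'."""
    paired = "".join(COMPLEMENT[b] for b in template_3_to_5[:len(primer)])
    if paired != primer:
        raise ValueError("primer does not pair with the start of the template")
    added = "".join(COMPLEMENT[b] for b in template_3_to_5[len(primer):])
    return primer + added


def termination_lengths(product, primer_length, base):
    """Lengths of the chains that end in a defective copy of `base`."""
    return [i + 1 for i in range(primer_length, len(product)) if product[i] == base]


def read_ladder(lanes, primer_length):
    """Rebuild the added sequence by stepping up the gel one length at a time."""
    base_at_length = {
        length: base for base, lengths in lanes.items() for length in lengths
    }
    sequence = ""
    length = primer_length + 1
    while length in base_at_length:
        sequence += base_at_length[length]
        length += 1
    return sequence


product = full_extension(PRIMER, TEMPLATE_3_TO_5)
lanes = {base: termination_lengths(product, len(PRIMER), base) for base in "ACGT"}
for base in "TACG":
    print("defective", base, "stops at lengths", lanes[base])
print("read off the gel:", read_ladder(lanes, len(PRIMER)))
```

For the T reaction the first stops come out at lengths 8, 13, 14, and 18, matching the positions counted out above, with one more at the far end of this particular template.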
And if not, you get the picture, 497 00:39:02 --> 00:39:07 that this ought to match up as to which columns have which lengths. 498 00:39:07 --> 00:39:13 OK, I think I got it right. That tells me the lengths of the 499 00:39:13 --> 00:39:18 molecules. So, I could read off the sequence. 500 00:39:18 --> 00:39:23 The sequence of that molecule ought to be, starting over there, 501 00:39:23 --> 00:39:29 the sequence of what I've added in, ought to be something like T, A, C, 502 00:39:29 --> 00:39:34 G, G, T, T, A, C, C, T, yep, it worked. 503 00:39:34 --> 00:39:39 It's exactly right. Bingo. I can now read the sequence. 504 00:39:39 --> 00:39:44 Fred Sanger, a brilliant scientist, thought up this method of just 505 00:39:44 --> 00:39:49 exploiting E. coli's own polymerase or other organisms' own polymerases. 506 00:39:49 --> 00:39:54 So, it's just copying, and all the chemistry that had to be done was thinking up 507 00:39:54 --> 00:40:00 a defective nucleotide that could not be extended. 508 00:40:00 --> 00:40:10 It could obviously be inserted. It can't be extended. So, one 509 00:40:10 --> 00:40:20 question is, what's a defective nucleotide? Well, 510 00:40:20 --> 00:40:30 you will recall that our nucleotide in the sugar phosphate chain 511 00:40:30 --> 00:40:37 is sitting like this. Let's see, hanging off the one prime 512 00:40:37 --> 00:40:42 carbon is the base. This is the one prime carbon, 513 00:40:42 --> 00:40:47 the two prime carbon, the three prime carbon, the four prime carbon, 514 00:40:47 --> 00:40:52 the five prime carbon. What do we know in DNA at the two prime carbon? 515 00:40:52 --> 00:40:57 Normally in ribose there would be a hydroxyl here, 516 00:40:57 --> 00:41:02 right? But in deoxyribose, there's just a hydrogen. 517 00:41:02 --> 00:41:11 So, if this is deoxyribose, so a dNTP really means a two prime 518 00:41:11 --> 00:41:20 deoxyribose, where do I now attach my next base in the sugar phosphate 519 00:41:20 --> 00:41:30 chain? 
Three prime end, and what do I attach it to: the OH. 520 00:41:30 --> 00:41:35 What do you think would happen if there's no OH there? 521 00:41:35 --> 00:41:40 You're stuck. All you've got to do is take off that hydroxyl. 522 00:41:40 --> 00:41:45 No hydroxyl group. If you made nucleotides that don't have that 523 00:41:45 --> 00:41:50 hydroxyl group, they can't be extended. 524 00:41:50 --> 00:41:55 So, instead of these being just deoxy at the two prime position, 525 00:41:55 --> 00:42:00 they are dideoxy, deoxy at two positions. 526 00:42:00 --> 00:42:04 They are two prime, three prime, dideoxynucleotides. 527 00:42:04 --> 00:42:09 That's it. Now, if you needed to get two prime three prime 528 00:42:09 --> 00:42:13 dideoxynucleotides, they're in the catalogue of course, 529 00:42:13 --> 00:42:18 right, because Fred Sanger had to make them himself and all that, 530 00:42:18 --> 00:42:23 but you can just buy them now. And so, you can do the sequence. 531 00:42:23 --> 00:42:28 A few other little details here, though, guys. 532 00:42:28 --> 00:42:32 How do we see the DNA in the gel? One possibility would be staining 533 00:42:32 --> 00:42:37 it. There are some dyes like ethidium bromide, 534 00:42:37 --> 00:42:42 and for doing your restriction mapping, using a dye that sticks to 535 00:42:42 --> 00:42:47 DNA like ethidium bromide does is pretty good. And then you put it 536 00:42:47 --> 00:42:52 under fluorescent light and you look. For sequencing, 537 00:42:52 --> 00:42:57 the amount of DNA is so little that it's hard to see with a dye by the 538 00:42:57 --> 00:43:02 naked eye, which is what you do with a restriction map. So, sorry? 539 00:43:02 --> 00:43:06 So, the first thing people did was radioactive. What they did was they 540 00:43:06 --> 00:43:10 took a primer, made it radioactive, 541 00:43:10 --> 00:43:14 and you did this whole sequencing reaction with radioactive primer. 
542 00:43:14 --> 00:43:18 Then, when you run the gel, you take your gel and you expose it for 543 00:43:18 --> 00:43:22 some number of hours, eight hours maybe, a piece of x-ray 544 00:43:22 --> 00:43:26 film, develop the x-ray film, and you'll see that picture. So, 545 00:43:26 --> 00:43:30 one solution that you could do to visualize is using radioactive 546 00:43:30 --> 00:43:37 nucleotides. So, we got the defective nucleotide. 547 00:43:37 --> 00:43:45 We now need to visualize our DNA. Let's visualize the sequence. 548 00:43:45 --> 00:43:54 One possibility: radioactive. The second possibility, someone 549 00:43:54 --> 00:44:03 already mentioned it, a fluorescent dye. 550 00:44:03 --> 00:44:10 Now, here, a fluorescent dye could be put on, and you can't read it 551 00:44:10 --> 00:44:17 with your eye, but lasers are very good at reading. 552 00:44:17 --> 00:44:24 So, you might run a whole gel here and have lasers scan it. 553 00:44:24 --> 00:44:31 But, you can actually do better than that. Suppose I put my 554 00:44:31 --> 00:44:39 fluorescent dye on my dideoxynucleotides. 555 00:44:39 --> 00:44:45 Suppose I put it on my dideoxynucleotides, 556 00:44:45 --> 00:44:52 and suppose I even had enough chemistry at my disposal that I 557 00:44:52 --> 00:44:59 could put a different color of fluorescent dye on each 558 00:44:59 --> 00:45:05 of my nucleotides. Then, whenever the dideoxy is put in 559 00:45:05 --> 00:45:09 to terminate the chain, it carries with it its own color. 560 00:45:09 --> 00:45:14 Wouldn't that be cool? And, that's what's done. Not just can you buy 561 00:45:14 --> 00:45:18 dideoxynucleotides now, but you can buy the four different 562 00:45:18 --> 00:45:23 dideoxynucleotides each with its own dye attached to it. 563 00:45:23 --> 00:45:27 So, there are di-dideoxies I guess, sorry, but it's different di's, 564 00:45:27 --> 00:45:32 right? They're dye-dideoxies. So, you could do that. 
565 00:45:32 --> 00:45:36 And then what you get would be that in this column you get this color. 566 00:45:36 --> 00:45:40 And in this column, you'd get this color. And in this column you'd get 567 00:45:40 --> 00:45:44 this color, etc. I'm not worrying about where they 568 00:45:44 --> 00:45:48 are here. And they'd all be different colors and it would be 569 00:45:48 --> 00:45:52 very pretty. You know what? Why do we need to run separate 570 00:45:52 --> 00:45:56 lanes anymore? If we got a laser, 571 00:45:56 --> 00:46:00 we can have the laser scan it and tell them apart. Stick 572 00:46:00 --> 00:46:05 it all in one lane. In fact, what's done is stick it in 573 00:46:05 --> 00:46:13 a capillary tube, throw in all four at the same time 574 00:46:13 --> 00:46:20 now, and as these fragments come by, each has its own color. And all we 575 00:46:20 --> 00:46:28 need is a laser scanner capable of sitting right here. Here's 576 00:46:28 --> 00:46:34 my laser scanner. And the laser scanner, 577 00:46:34 --> 00:46:40 positive here, negative here, as the DNA flows by through this 578 00:46:40 --> 00:46:46 polymer, the laser scanner reads off which colors just went by. 579 00:46:46 --> 00:46:51 And it goes A color, C color, T color, G color. That's it. So, 580 00:46:51 --> 00:46:57 there are actually machines now that have 96 different capillaries. 581 00:46:57 --> 00:47:04 These are called capillary tubes. And you can have 96 of them with 582 00:47:04 --> 00:47:12 laser scanning across, and in each column now, 583 00:47:12 --> 00:47:20 it turns out that you can read almost 1,000 letters, 584 00:47:20 --> 00:47:28 1,000 bases per column per capillary times about 100 capillaries. 585 00:47:28 --> 00:47:36 Or in other words, you can read out about 10^5 bases of information. 586 00:47:36 --> 00:47:40 You can read out 10^5 bases of information in about two hours. 587 00:47:40 --> 00:47:45 Of course, you can do that ten times a day. 
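The running tally is easy to check with a few lines of arithmetic. The sketch below uses the rounded figures quoted so far (about 1,000 bases per capillary, about 100 capillaries, ten two-hour runs a day); the variable names are just for the example.

```python
bases_per_capillary = 1_000  # "almost 1,000" per read, rounded up
capillaries = 100            # the machines have 96; rounded to about 100
runs_per_day = 10            # each run takes about two hours

bases_per_run = bases_per_capillary * capillaries
bases_per_machine_per_day = bases_per_run * runs_per_day

print(f"{bases_per_run:,} bases per run")
print(f"{bases_per_machine_per_day:,} bases per machine per day")
```

These are order-of-magnitude figures. With the actual 96 capillaries and reads a little under 1,000 bases, each number comes out somewhat lower.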
So, 588 00:47:45 --> 00:47:50 you can actually read out 10^6 or about a million bases of information 589 00:47:50 --> 00:47:55 per machine. And here at MIT, we have 100 of these machines. So, 590 00:47:55 --> 00:48:00 we actually can read out a little shy of 100 million letters of DNA 591 00:48:00 --> 00:48:05 sequence per day, which I mean is a lot. 592 00:48:05 --> 00:48:11 We read about 40 billion letters per year here at MIT, 593 00:48:11 --> 00:48:17 and this is how we do it. How much does a machine cost? 594 00:48:17 --> 00:48:23 List, or do you want a deal? They list for $300,000, 595 00:48:23 --> 00:48:29 but if you buy in bulk, I can do better. [LAUGHTER] We buy 596 00:48:29 --> 00:48:34 it in bulk, by the way. So now, how are we going to get our 597 00:48:34 --> 00:48:38 primer there? That was the only little bit we were missing: where 598 00:48:38 --> 00:48:42 did our primer come from? The last little detail: here's my 599 00:48:42 --> 00:48:47 vector, remember, and I want to sequence this insert. 600 00:48:47 --> 00:48:51 How am I going to get a primer in the insert? I don't know what its 601 00:48:51 --> 00:48:56 sequence is. How do I even start this? Sorry? Well, 602 00:48:56 --> 00:49:00 but that won't tell me what the sequence is that I have to, 603 00:49:00 --> 00:49:05 I mean, I was looking to try to get a primer that matches the insert. 604 00:49:05 --> 00:49:09 And I don't know what the insert is. So, how am I going to get a primer? 605 00:49:09 --> 00:49:13 Oh, I know the vector. The vector is well known. 606 00:49:13 --> 00:49:17 Its sequence is in the catalog. Let me instead just use a primer 607 00:49:17 --> 00:49:21 that happens to sit in the vector, and I'll match to a known sequence 608 00:49:21 --> 00:49:25 to start with, and then I'll sequence into my 609 00:49:25 --> 00:49:29 unknown territory. 
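The vector-primer trick can be shown with a toy example. Everything here is hypothetical: the vector and insert sequences are invented, the clone is treated as a simple linear string read along one strand, and `read_from_primer` is just a name for the sketch. The point is only that the primer is found in known vector sequence and the read then runs on into the insert.

```python
VECTOR_LEFT = "GGCCTTAACGTAGCTAGG"  # hypothetical known vector sequence
INSERT = "ATGCCAATGGATTA"           # stands in for the unknown insert
VECTOR_RIGHT = "TTGACCGGA"          # hypothetical vector on the far side
PRIMER = "AACGTAGC"                 # chosen from the known vector sequence


def read_from_primer(clone, primer, read_length):
    """Return the bases a sequencing read would report just past the primer."""
    start = clone.find(primer)
    if start == -1:
        raise ValueError("primer does not match anywhere in the clone")
    first_new_base = start + len(primer)
    return clone[first_new_base:first_new_base + read_length]


clone = VECTOR_LEFT + INSERT + VECTOR_RIGHT
read = read_from_primer(clone, PRIMER, 12)
print(read)
```

The first few bases of the read are still vector, which is a handy check that the reaction started where expected; everything after that is new sequence from the insert.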
So, this is how you get the initial 610 00:49:29 --> 00:49:33 primer: you arrange that your initial primer is sitting in known 611 00:49:33 --> 00:49:37 vector sequence. All right, so you can now sequence 612 00:49:37 --> 00:49:40 DNA. I've got to say, I've taught this course for a little 613 00:49:40 --> 00:49:44 more than a decade, and being able to say, 614 00:49:44 --> 00:49:47 now we can routinely sequence about a million letters per machine, 615 00:49:47 --> 00:49:50 and 100 million letters per day, and things like this was not 616 00:49:50 --> 00:49:53 routinely the case. When we started teaching this 617 00:49:53 --> 00:49:58 course, I was describing what we di