1 00:00:01 --> 00:00:05 Good morning. So, 2 00:00:05 --> 00:00:09 we are going to see if my voice holds up through this lecture today. 3 00:00:09 --> 00:00:13 It is a casualty of having been at Foxborough yesterday, 4 00:00:13 --> 00:00:17 and then staying up rather late watching the Red Sox game. 5 00:00:17 --> 00:00:21 On the whole, both seem to have come through successfully, 6 00:00:21 --> 00:00:25 but my voice is a bit of a casualty of the events. 7 00:00:25 --> 00:00:29 So, we'll see. But I'm going to sound a lot 8 00:00:29 --> 00:00:35 scratchier than normal. 9 00:00:35 --> 00:00:41 So, how many of you stayed up to the end of the game last night? 10 00:00:41 --> 00:00:49 Good, excellent. I approve. 11 00:00:49 --> 00:00:54 OK, last time, we spoke about the idea of cloning DNA 12 00:00:54 --> 00:00:59 to create libraries of molecules. 13 00:00:59 --> 00:01:03 And again, I think this is just one of the most clever inventions 14 00:01:03 --> 00:01:07 because it's a completely new way to think about purifying molecules. 15 00:01:07 --> 00:01:11 Rather than purifying molecules by separating them based on their 16 00:01:11 --> 00:01:15 biochemical properties, it purifies molecules by diluting 17 00:01:15 --> 00:01:20 them into single components and then amplifying each back up 18 00:01:20 --> 00:01:24 from its own source. It's really quite a beautiful idea. 19 00:01:24 --> 00:01:28 And just to go over it, we take, say, human DNA, 20 00:01:28 --> 00:01:32 or we could take Drosophila DNA, or we could take yeast DNA, or we 21 00:01:32 --> 00:01:37 could take any other DNA we feel like. 22 00:01:37 --> 00:01:42 We cut it up in some fashion with a restriction enzyme. 23 00:01:42 --> 00:01:48 We'll use our favorite restriction enzyme here, EcoRI, 24 00:01:48 --> 00:01:54 which cuts at a defined site, GAATTC. We take that. We add our 25 00:01:54 --> 00:02:00 insert DNA.
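A minimal sketch of a complete EcoRI digest, to make the cutting step concrete. It assumes only the recognition site named here (GAATTC) plus the standard cut position between the G and the first A (G^AATTC), which leaves four-base 5' AATT overhangs. It follows only the top strand of a linear sequence, and the function names are hypothetical. In a random sequence with equal base frequencies, a given six-base site turns up about once every 4^6 = 4,096 bases.

```python
import re

ECORI_SITE = "GAATTC"
CUT_OFFSET = 1  # EcoRI cuts G^AATTC: after the first base of the site


def ecori_cut_positions(seq: str) -> list[int]:
    """Top-strand cut positions for every EcoRI site in an upper-case DNA string."""
    # GAATTC is its own reverse complement, so one scan finds sites on both strands.
    return [m.start() + CUT_OFFSET for m in re.finditer(ECORI_SITE, seq)]


def digest(seq: str) -> list[str]:
    """Complete digest: split the top strand at every EcoRI cut position."""
    bounds = [0] + ecori_cut_positions(seq) + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


print(digest("AAGAATTCTTGAATTCCC"))  # ['AAG', 'AATTCTTG', 'AATTCCC']
```

Each fragment after the first begins with AATT, the single-stranded overhang that lets any EcoRI-cut insert pair with an EcoRI-cut vector before ligase seals it.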
These are referred to as inserts because they're going to 26 00:02:00 --> 00:02:05 be inserted into a plasmid. We take a plasmid vector. 27 00:02:05 --> 00:02:11 The plasmid vector here is a naturally occurring, 28 00:02:11 --> 00:02:16 although sometimes modified, piece of DNA that bacteria have. It 29 00:02:16 --> 00:02:22 has an origin of replication that allows it to grow autonomously when 30 00:02:22 --> 00:02:28 put in a bacterial cell, and a selectable marker. 31 00:02:28 --> 00:02:32 The selectable marker is, for example, ampicillin resistance, 32 00:02:32 --> 00:02:37 or some other resistance. We add these, and then we seal up the pieces 33 00:02:37 --> 00:02:42 of the DNA using the enzyme ligase. Ligase joins the ends, producing 34 00:02:42 --> 00:02:47 for us molecules of this sort. We make zillions of them in 35 00:02:47 --> 00:02:52 parallel in one test tube. We then transform them by adding 36 00:02:52 --> 00:02:57 these molecules to bacterial cells that have been appropriately 37 00:02:57 --> 00:03:02 prepared to be transformed; that is, their membranes have been 38 00:03:02 --> 00:03:07 treated in such a way that they're going to be most likely to 39 00:03:07 --> 00:03:11 suck up pieces of DNA. We then plate them at a 40 00:03:11 --> 00:03:15 density such that individual bacterial cells are well separated from each 41 00:03:15 --> 00:03:19 other. You try a bunch of different densities so you get one right. 42 00:03:19 --> 00:03:23 And, you let them grow up. And, every colony here, as we discussed, 43 00:03:23 --> 00:03:27 is the descendant of a single bacterial cell, 44 00:03:27 --> 00:03:31 ideally carrying a single plasmid. 45 00:03:31 --> 00:03:35 And, we know each colony is carrying a 46 00:03:35 --> 00:03:39 plasmid because we were clever enough to put ampicillin or some other 47 00:03:39 --> 00:03:44 selective agent on this plate.
And so, only bacteria that have 48 00:03:44 --> 00:03:48 picked up the plasmid are ampicillin-resistant. And there you go. 49 00:03:48 --> 00:03:53 This is called a library. And, at the end of the day, you may have 50 00:03:53 --> 00:03:57 a library that contains one plate of clones or a library containing 51 00:03:57 --> 00:04:02 hundreds of plates of clones. We're going to see how we last 52 00:04:02 --> 00:04:08 through this. Now, a few people asked me at the end of 53 00:04:08 --> 00:04:13 the last lecture, well, OK, but what about the details? 54 00:04:13 --> 00:04:19 Is it really going to work like this? How come some of these 55 00:04:19 --> 00:04:24 plasmid molecules don't automatically get closed back up by 56 00:04:24 --> 00:04:30 ligase? Why is it that there's always an insert in the plasmid? 57 00:04:30 --> 00:04:34 What's the answer to that question? Sorry? There isn't always an insert, 58 00:04:34 --> 00:04:38 because sometimes ligase might close up that molecule. 59 00:04:38 --> 00:04:42 Now, that would be unfortunate because it would mean that a bunch 60 00:04:42 --> 00:04:46 of the things in your library just had the vector without any insert. 61 00:04:46 --> 00:04:50 So, these are details, but over the course of years, 62 00:04:50 --> 00:04:54 recombinant DNA specialists have worked out lots of cute tricks to 63 00:04:54 --> 00:04:58 make better and better libraries. I'll just give you an example of 64 00:04:58 --> 00:05:03 the kinds of things. Remember that in order to ligate DNA, 65 00:05:03 --> 00:05:09 we have a five prime end here. We have a phosphate group here, 66 00:05:09 --> 00:05:16 three prime hydroxyl, phosphate here, double strand of DNA here. We have 67 00:05:16 --> 00:05:23 a phosphate here. We have a hydroxyl here, 68 00:05:23 --> 00:05:30 phosphate, five prime, three prime.
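Whether a library needs one plate or hundreds of plates depends on genome size and insert size. A standard back-of-envelope estimate, not given in the lecture, is the Clarke-Carbon formula: the number of random clones N needed so that any given sequence is present with probability P is N = ln(1 - P) / ln(1 - f), where f is the fraction of the genome carried by one insert. This sketch assumes random, equal-sized inserts; the genome and insert sizes below are illustrative inputs, not figures from the lecture.

```python
import math


def clones_needed(genome_bp: float, insert_bp: float, prob: float) -> int:
    """Clarke-Carbon estimate of library size.

    N = ln(1 - P) / ln(1 - f), with f = insert_bp / genome_bp.
    Assumes inserts are random and all the same size.
    """
    f = insert_bp / genome_bp
    return math.ceil(math.log(1 - prob) / math.log(1 - f))


# Illustrative inputs (assumed): 5 kb inserts, 99% chance of containing a given gene.
print(clones_needed(4.6e6, 5e3, 0.99))  # E. coli-sized genome (~4.6 Mb): a few thousand clones
print(clones_needed(3e9, 5e3, 0.99))    # human-sized genome (~3 Gb): a few million clones
```

The roughly thousand-fold gap between the two results is why a bacterial library can fit on a handful of plates while a human library with small inserts cannot.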
69 00:05:30 --> 00:05:37 If ligase is going to come along, it turns out that ligase needs the 70 00:05:37 --> 00:05:45 phosphate there in order to seal it up and make a chain. 71 00:05:45 --> 00:05:53 So, for example, suppose we were to arrange that the plasmid vector 72 00:05:53 --> 00:06:01 didn't have phosphates on its two ends. Then ligase would not be able 73 00:06:01 --> 00:06:06 to re-seal the plasmid vector. That's a cute trick. 74 00:06:06 --> 00:06:10 This is just cooking, but I'm giving you an idea of the 75 00:06:10 --> 00:06:13 kind of cooking tricks we use in all this. So, ideally, 76 00:06:13 --> 00:06:17 you would like an enzyme that can remove phosphate groups from the ends 77 00:06:17 --> 00:06:21 of DNA. How are you going to invent such an enzyme? 78 00:06:21 --> 00:06:24 The answer to all these questions is that it already exists. 79 00:06:24 --> 00:06:28 Bacteria have such an enzyme that can remove phosphate groups. 80 00:06:28 --> 00:06:32 It just removes phosphate groups. And of course bacteria evolved these enzymes 81 00:06:32 --> 00:06:36 because they need them in the course of DNA 82 00:06:36 --> 00:06:41 metabolism. And, what do you think the enzyme is 83 00:06:41 --> 00:06:45 called? Phosphatase, of course. So, that's what we do: 84 00:06:45 --> 00:06:49 we treat the vector with phosphatase, and it doesn't seal back up. Now, 85 00:06:49 --> 00:06:54 somebody will say to me, well, OK, but now I've got my vector 86 00:06:54 --> 00:06:58 here, and I don't have a phosphate on it, and so this is 87 00:06:58 --> 00:07:03 my vector DNA. And then, I've got my insert DNA 88 00:07:03 --> 00:07:08 here; it has a hydroxyl here and a phosphate 89 00:07:08 --> 00:07:13 here. So, the vector has no phosphate. But, 90 00:07:13 --> 00:07:19 when ligase wants to attach an insert, it's got a phosphate here 91 00:07:19 --> 00:07:24 but not here. What's going to happen?
Well, 92 00:07:24 --> 00:07:29 it turns out that ligase will seal this one up because it's got a phosphate, 93 00:07:29 --> 00:07:34 but it'll leave this one open. Now, is that a problem? 94 00:07:34 --> 00:07:38 It turns out, if you just transform it into the bacteria with that nick 95 00:07:38 --> 00:07:42 there on one strand but not both strands, it's still a covalently 96 00:07:42 --> 00:07:47 closed circle on one of its strands. The bacteria will repair it. So, 97 00:07:47 --> 00:07:51 you can take advantage of the bacteria's own DNA repair mechanisms: 98 00:07:51 --> 00:07:55 just throw the molecule in, sealed up on one strand, and let the repair 99 00:07:55 --> 00:08:00 mechanisms do the rest. We play all these tricks to our advantage. 100 00:08:00 --> 00:08:04 Someone else asked after class: what happens if the gene I'm 101 00:08:04 --> 00:08:08 interested in studying has... here's my gene, let's say, that I'm 102 00:08:08 --> 00:08:13 interested in studying. I take human DNA. I cut it with 103 00:08:13 --> 00:08:17 EcoRI. So, I have cut it at all the EcoRI sites. 104 00:08:17 --> 00:08:22 Well, golly, what happens if my gene happened to have an EcoRI site 105 00:08:22 --> 00:08:26 in it? Then my gene's going to be cut up into two pieces. 106 00:08:26 --> 00:08:30 Isn't that bad? What do I do about that? 107 00:08:30 --> 00:08:34 Do I know in advance if my gene has an EcoRI site? Well, 108 00:08:34 --> 00:08:38 no, I don't, because I don't know what my gene is. 109 00:08:38 --> 00:08:42 I'm making a library of everything in the genome. 110 00:08:42 --> 00:08:46 So, some genes will have it, and some won't. And, I might not 111 00:08:46 --> 00:08:50 know the gene I'm looking for. So, how do I avoid that? Sorry? 112 00:08:50 --> 00:08:54 Oh, you'd try another enzyme. You'd try BamHI and HindIII, 113 00:08:54 --> 00:08:58 and make libraries with different enzymes. That's one 114 00:08:58 --> 00:09:02 way. That works.
Another way, just to give you a 115 00:09:02 --> 00:09:06 sense of how clever molecular biologists are with this: 116 00:09:06 --> 00:09:10 suppose when we add EcoRI we don't let the reaction go to 117 00:09:10 --> 00:09:14 completion. Suppose we run the reaction under conditions where it's 118 00:09:14 --> 00:09:18 somewhat inefficient, and instead of managing to cleave 119 00:09:18 --> 00:09:22 every EcoRI site, on average it cleaves, 120 00:09:22 --> 00:09:26 say, one out of every three EcoRI sites. You can do that. 121 00:09:26 --> 00:09:30 So, that means you can arrange, just by your reaction conditions, to 122 00:09:30 --> 00:09:35 randomly cleave some sites but not others. 123 00:09:35 --> 00:09:38 And, these are called partial digestions. So, 124 00:09:38 --> 00:09:41 it turns out that for all of the kinds of things that people were asking me 125 00:09:41 --> 00:09:44 about afterwards (and I was very glad people were thinking 126 00:09:44 --> 00:09:47 about whether this would really work), there are tricks to get around all 127 00:09:47 --> 00:09:50 of it, and there's a whole fat book of protocols on, if you want to 128 00:09:50 --> 00:09:54 make a library really, really carefully, how you would do 129 00:09:54 --> 00:09:57 that: how you make sure the vector doesn't re-close, 130 00:09:57 --> 00:10:00 how you make sure that you cut random sites rather than every site, 131 00:10:00 --> 00:10:04 and things like that. And, all of these rely on lots of 132 00:10:04 --> 00:10:08 enzymes and things that bacteria have already invented. 133 00:10:08 --> 00:10:12 So, I'm just going to put these down as cooking tips. 134 00:10:12 --> 00:10:16 I don't necessarily care whether you know the 135 00:10:16 --> 00:10:20 details or not; the point is that there exist some 15 136 00:10:20 --> 00:10:24 or 20 years' worth of ways to make the best possible libraries. 137 00:10:24 --> 00:10:28 And so, it's quite routine now to be able to make good libraries.
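A toy simulation of a partial digest, to show why it rescues a gene that contains a site. It assumes each site is cut independently with probability 1/3, matching the "one out of every three" in the lecture; the molecule length, site positions, gene coordinates, and function names are all made up for illustration. With one internal site, the gene stays intact whenever that site happens to escape cutting, so about two-thirds of the time.

```python
import random


def partial_digest(length, sites, p_cut, rng):
    """Cut each site independently with probability p_cut.

    Returns fragments as (start, end) intervals along a linear molecule.
    """
    cuts = sorted(s for s in sites if rng.random() < p_cut)
    bounds = [0] + cuts + [length]
    return list(zip(bounds, bounds[1:]))


def gene_intact(fragments, gene_start, gene_end):
    """True if one fragment spans the whole gene."""
    return any(a <= gene_start and gene_end <= b for a, b in fragments)


# Hypothetical 10 kb molecule; the gene (4,000-5,000) contains the site at 4,500.
SITES = [1000, 3000, 4500, 7000, 9000]
GENE = (4000, 5000)


def fraction_intact(p_cut, trials=10_000, seed=0):
    """Fraction of simulated digests in which the gene survives on one fragment."""
    rng = random.Random(seed)
    hits = sum(gene_intact(partial_digest(10_000, SITES, p_cut, rng), *GENE)
               for _ in range(trials))
    return hits / trials


print(fraction_intact(1.0))    # complete digest: the gene is always split
print(fraction_intact(1 / 3))  # partial digest: roughly 2/3 of molecules keep it whole
```

The sketch ignores fragment size, which also matters in practice because the vector can only carry inserts within some size range.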
138 00:10:28 --> 00:10:34 All right, so, having made a library, 139 00:10:34 --> 00:10:40 the challenge is finding your clone. How to find your clone, the clone 140 00:10:40 --> 00:10:46 of interest. So, I need to describe a number of ways 141 00:10:46 --> 00:10:52 that people have for finding a clone of interest. And here, 142 00:10:52 --> 00:10:58 of course, up to this point, the DNA could be zebra DNA, or it 143 00:10:58 --> 00:11:04 could be human DNA or yeast DNA, and it could be something that encodes an 144 00:11:04 --> 00:11:11 enzyme for arginine, or this, or that. 145 00:11:11 --> 00:11:18 But now we have to be specific. So, let's suppose we go back to a 146 00:11:18 --> 00:11:25 problem we talked about before: say, auxotrophy for a nutrient. 147 00:11:25 --> 00:11:32 So, let's suppose that I have a bacterium, maybe even E. coli itself, 148 00:11:32 --> 00:11:40 where I have selected mutants that are auxotrophic for arginine. 149 00:11:40 --> 00:11:52 So, arginine auxotrophs will grow on rich medium, but on minimal medium 150 00:11:52 --> 00:12:00 they don't grow. But, they would grow if I added 151 00:12:00 --> 00:12:04 arginine to that medium. They don't grow because they have a 152 00:12:04 --> 00:12:09 mutation in a gene. We know it's a gene because we 153 00:12:09 --> 00:12:13 crossed together the mutant and the wild type. We showed that we can 154 00:12:13 --> 00:12:18 define this phenotype to be a recessive phenotype. 155 00:12:18 --> 00:12:22 We can map it in the yeast genome by showing it has linkage to other 156 00:12:22 --> 00:12:27 phenotypes. That's all great. We can do classical genetics, à la 157 00:12:27 --> 00:12:32 Mendel, à la Morgan, à la Sturtevant. 158 00:12:32 --> 00:12:37 But, how are we going to find the gene? How are we going to, 159 00:12:37 --> 00:12:42 now, use our tools of recombinant DNA to get physically into our hands 160 00:12:42 --> 00:12:47 the piece of DNA that encodes the gene that is defective in the strain?
161 00:12:47 --> 00:12:52 So, we have a mutant bacterium. It can't make arginine. It can't 162 00:12:52 --> 00:12:57 grow in minimal medium. Somewhere in there, you know 163 00:12:57 --> 00:13:02 there's a mutation in the DNA sequence. 164 00:13:02 --> 00:13:07 How do we find it? What should we do? 165 00:13:07 --> 00:13:13 This is the whole point of recombinant DNA: 166 00:13:13 --> 00:13:18 to make this abstract notion, that there exist genes and they transmit 167 00:13:18 --> 00:13:24 all this kind of stuff, concrete. How are you going to find 168 00:13:24 --> 00:13:30 it? Any takers? Sorry? Run a gel. 169 00:13:30 --> 00:13:34 So, I take DNA, cut it up, run a gel. 170 00:13:34 --> 00:13:38 I have all the DNA from the bacteria schmeered (sic) out. 171 00:13:38 --> 00:13:42 And somewhere in that schmeer is the gene. So, 172 00:13:42 --> 00:13:46 I take normal DNA from normal bacteria. I take mutant DNA. 173 00:13:46 --> 00:13:50 One nucleotide is different in the mutant DNA. I run them out, 174 00:13:50 --> 00:13:54 and I assure you, they just look like a schmeer. 175 00:13:54 --> 00:13:58 It's just a big schmeer of DNA. It's hard to see a one-nucleotide 176 00:13:58 --> 00:14:03 difference out of the 4 million nucleotides 177 00:14:03 --> 00:14:07 of E. coli. So, how are we going to get that? 178 00:14:07 --> 00:14:11 This is good. We're thinking practically here. 179 00:14:11 --> 00:14:15 What else? Sorry? Sorry? Cut it up. I'm assuming 180 00:14:15 --> 00:14:19 she wanted it cut up and run out on the gel. It will still look like a 181 00:14:19 --> 00:14:23 schmeer. Forget the gel. Cut it up. Make a library. 182 00:14:23 --> 00:14:27 OK, so we're going to make a library. Let's assume now we have a 183 00:14:27 --> 00:14:31 library of different E. coli cells containing individual plasmids, 184 00:14:31 --> 00:14:37 containing random bits of E. coli DNA. How's that going to help? 185 00:14:37 --> 00:14:46 Splice it back in. How do I know if I spliced it back in?
186 00:14:46 --> 00:14:55 Ooh, that's an interesting thought. Suppose I were to make my library 187 00:14:55 --> 00:15:04 using wild-type DNA, DNA from the wild-type strain. 188 00:15:04 --> 00:15:09 So, I'm going to make a library containing lots and lots of 189 00:15:09 --> 00:15:14 fragments of normal E. coli DNA. This is my library. I'm going to 190 00:15:14 --> 00:15:20 transform it into... what kind of bacteria should I 191 00:15:20 --> 00:15:25 transform it into, wild type or mutant? 192 00:15:25 --> 00:15:31 Who votes mutant? Who votes wild type? 193 00:15:31 --> 00:15:38 We'll go with mutant, then. Mutant. We'll put it in 194 00:15:38 --> 00:15:45 the mutant. So now, all of these mutant cells, 195 00:15:45 --> 00:15:52 each one is going to suck up a plasmid. We then are going to plate 196 00:15:52 --> 00:15:59 this and let colonies grow up. So, this mutant 197 00:15:59 --> 00:16:06 is arg minus. And, one of these colonies is going 198 00:16:06 --> 00:16:12 to contain the arg plus gene here. How are we going to know which one? 199 00:16:12 --> 00:16:18 Sorry? How are we going to know which one has the arg plus gene? 200 00:16:18 --> 00:16:24 Yes? So, plate it on minimal medium. If I plate it on minimal 201 00:16:24 --> 00:16:31 medium, what will happen to most of my mutant bacteria? 202 00:16:31 --> 00:16:34 They're not going to grow. But, what's going to happen to the 203 00:16:34 --> 00:16:38 bacterium that happens to be lucky enough to have picked up the plasmid 204 00:16:38 --> 00:16:41 that contains the arg plus gene? It'll grow. So, whatever grows on 205 00:16:41 --> 00:16:45 minimal medium has been rescued. In fact, we've complemented the 206 00:16:45 --> 00:16:49 defect. Remember, we talked about complementation 207 00:16:49 --> 00:16:52 tests? In a way, the plasmid is 208 00:16:52 --> 00:16:56 complementing the defect. Bingo, that's it. So, we can 209 00:16:56 --> 00:17:00 actually find that gene functionally.
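The selection logic of cloning by complementation can be written as a filter over the library. This is a toy model: the clone names and gene labels (including `argX`) are hypothetical, and it assumes a single recessive defect that any plasmid carrying a wild-type copy of the gene fully rescues.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clone:
    name: str
    genes: frozenset  # wild-type genes carried on this clone's insert (toy labels)


def grows_on_minimal(host_defects: set, clone: Clone) -> bool:
    """Recessive defects are complemented when the plasmid supplies wild-type copies."""
    return host_defects <= clone.genes


def complementation_screen(library, host_defects):
    """Names of clones whose transformants would grow on minimal medium."""
    return [c.name for c in library if grows_on_minimal(host_defects, c)]


library = [
    Clone("clone_1", frozenset({"geneQ"})),
    Clone("clone_2", frozenset({"argX", "geneY"})),
    Clone("clone_3", frozenset()),
]
print(complementation_screen(library, {"argX"}))  # ['clone_2']
```

The plate does the filtering for free: cells whose plasmid lacks the wild-type gene simply never form a colony.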
210 00:17:00 --> 00:17:09 We plate on minimal medium, and we look for growth. The only 211 00:17:09 --> 00:17:18 things that will grow have been rescued. So, this is called cloning 212 00:17:18 --> 00:17:27 by complementation because we are complementing the defect 213 00:17:27 --> 00:17:34 in this strain. All right. So, 214 00:17:34 --> 00:17:38 any time I have a functional defect in my bacteria, 215 00:17:38 --> 00:17:43 I can find the gene for that functional defect by simply taking a 216 00:17:43 --> 00:17:48 total library from normal, wild-type bacteria, 217 00:17:48 --> 00:17:52 transforming it into mutant bacteria, and looking for which 218 00:17:52 --> 00:17:57 bacteria have suddenly been rescued. Then I'll purify that bacterium, 219 00:17:57 --> 00:18:05 and I'll purify out the plasmid. And that plasmid will contain the 220 00:18:05 --> 00:18:16 DNA for the gene. That's pretty cool. 221 00:18:16 --> 00:18:28 Let's try another one. Suppose... yes? OK, great. 222 00:18:28 --> 00:18:32 I've got my plate here, and I've said only one of these 223 00:18:32 --> 00:18:36 bacteria will grow. It's the one that happens to have 224 00:18:36 --> 00:18:41 within it the plasmid containing the arg gene. And, 225 00:18:41 --> 00:18:45 you're fine with that, but you're saying, how would I 226 00:18:45 --> 00:18:50 get that plasmid back out of the bacterium, because the bacterium's got 227 00:18:50 --> 00:18:54 its own chromosome, and I'm making this big deal about 228 00:18:54 --> 00:18:59 how we purified stuff away from all this other DNA. 229 00:18:59 --> 00:19:03 But, I've thrown this plasmid back into a bacterium that has all 230 00:19:03 --> 00:19:08 its chromosomal DNA. So, who am I kidding? 231 00:19:08 --> 00:19:13 How are we going to purify out just that plasmid? If I could purify the 232 00:19:13 --> 00:19:18 plasmid, it would be OK, right? It turns out I can. Plasmids are 233 00:19:18 --> 00:19:22 little circles of DNA. Chromosomes are big pieces of DNA.
234 00:19:22 --> 00:19:27 It turns out that the coiling of the plasmid as a little circle gives 235 00:19:27 --> 00:19:32 it a different density and different physical-chemical properties from big 236 00:19:32 --> 00:19:37 chunks of chromosomal DNA, which get broken up. And so, there are a bunch of tricks 237 00:19:37 --> 00:19:41 that allow me to get a pretty high purification of a plasmid away from 238 00:19:41 --> 00:19:46 chromosomal DNA based on the different physical properties of a 239 00:19:46 --> 00:19:50 small circle versus a big chromosome. But, good question. Otherwise, how 240 00:19:50 --> 00:19:55 would I get that plasmid out? But it turns out, you can purify 241 00:19:55 --> 00:20:00 plasmids. Good question. OK, so now, let's try another one. 242 00:20:00 --> 00:20:05 Next cloning expedition: we're going to go to the library, 243 00:20:05 --> 00:20:10 and we want to withdraw a volume from the library. 244 00:20:10 --> 00:20:15 And now, instead of bacteria that can't make arginine, 245 00:20:15 --> 00:20:20 let's go with human DNA. Let's try human DNA. And, 246 00:20:20 --> 00:20:25 I would like you to now please find the gene that encodes beta-globin. 247 00:20:25 --> 00:20:30 Beta-globin, of course, is one of the two proteins in hemoglobin. 248 00:20:30 --> 00:20:34 Hemoglobin is a tetramer. It has alpha-globin and beta-globin. 249 00:20:34 --> 00:20:39 This tetramer is the oxygen carrier in your blood. 250 00:20:39 --> 00:20:43 It carries oxygen. Beta-globin happens to be the site 251 00:20:43 --> 00:20:48 of some very important mutations. We know that sickle cell anemia is 252 00:20:48 --> 00:20:52 caused by mutations in beta-globin. We know that diseases like 253 00:20:52 --> 00:20:57 thalassemia are caused by mutations in beta-globin. 254 00:20:57 --> 00:21:01 And, people knew this before they had recombinant DNA because they 255 00:21:01 --> 00:21:06 could study red blood cells. There's lots of beta-globin in red 256 00:21:06 --> 00:21:10 blood cells.
They could see that something was funny about the 257 00:21:10 --> 00:21:14 protein. They could even see that in sickle cell anemia the protein 258 00:21:14 --> 00:21:19 had a different net charge, and it would run differently. 259 00:21:19 --> 00:21:23 So, they knew something was funny with the beta-globin protein. 260 00:21:23 --> 00:21:27 All I want you to do now is clone beta-globin for me. 261 00:21:27 --> 00:21:32 Could we do the same thing? Why not? 262 00:21:32 --> 00:21:40 Bacteria don't make beta-globin. So, what can we do? Well, we could 263 00:21:40 --> 00:21:49 make a library of human DNA. And, we could throw it into the 264 00:21:49 --> 00:21:58 bacteria. So, why don't we just select for a 265 00:21:58 --> 00:22:05 bacterium that makes beta-globin? Could we do that? 266 00:22:05 --> 00:22:11 I don't know, how? Do you see how? How would we 267 00:22:11 --> 00:22:16 select for that? I mean, there, we could see who 268 00:22:16 --> 00:22:21 grows without arginine. But how are we going to tell which 269 00:22:21 --> 00:22:27 bacterium has picked up beta-globin? I don't know. Yeah? Use 270 00:22:27 --> 00:22:32 mammals. We could take a mouse that did not 271 00:22:32 --> 00:22:37 make beta-globin, a mouse that had, say, 272 00:22:37 --> 00:22:41 thalassemia: isolate a naturally occurring mouse with a defect in 273 00:22:41 --> 00:22:46 beta-globin. Then, do injections of plasmids into mouse 274 00:22:46 --> 00:22:51 eggs, grow up the mouse eggs by implanting them back into 275 00:22:51 --> 00:22:55 pseudo-pregnant females, do this for 10^8 individual plasmids 276 00:22:55 --> 00:23:00 with 10^8 individual mice, and look for the mouse that is 277 00:23:00 --> 00:23:04 rescued. Intellectually, 278 00:23:04 --> 00:23:08 you're absolutely right; it works. So, that's exactly the 279 00:23:08 --> 00:23:12 cloning by complementation we talked about for bacteria, 280 00:23:12 --> 00:23:16 and you're dead-on right. That would work.
Getting it funded 281 00:23:16 --> 00:23:19 is another matter because it's a hugely expensive experiment to shoot 282 00:23:19 --> 00:23:23 up each egg with this, but it could work. So, 283 00:23:23 --> 00:23:27 we need another solution, because rescuing the function in mice 284 00:23:27 --> 00:23:31 is just not practical. 285 00:23:31 --> 00:23:35 Of course, if we could do this in mouse cells, maybe we could make it 286 00:23:35 --> 00:23:40 work in mouse cell culture. But, let's suppose we don't have a 287 00:23:40 --> 00:23:44 cell culture phenotype. We just have an organism phenotype. 288 00:23:44 --> 00:23:49 So, it's not going to work to just do this by complementation. 289 00:23:49 --> 00:23:53 But, good thinking, guys. This is good. So, the next trick we might have 290 00:23:53 --> 00:23:58 at our disposal: suppose that, because beta-globin is so abundant in red 291 00:23:58 --> 00:24:02 blood cells, we have purified beta-globin, and we've done amino 292 00:24:02 --> 00:24:09 acid sequencing of the protein. By Edman degradation, 293 00:24:09 --> 00:24:17 you can work out the sequence of globin. And, you can learn that 294 00:24:17 --> 00:24:25 beta-globin has, here at its amino terminus, 295 00:24:25 --> 00:24:33 val, leu, ser, pro, ala, asp, lys, threonine, dot, dot, dot, dot, 296 00:24:33 --> 00:24:41 dot, off to the carboxy terminus, OK? 297 00:24:41 --> 00:24:46 If I knew that this was the amino acid sequence of the beginning, 298 00:24:46 --> 00:24:51 just the beginning of beta-globin, couldn't I figure out what that 299 00:24:51 --> 00:24:57 initial portion of the DNA sequence must be? 300 00:24:57 --> 00:25:01 Wouldn't this give me a clue? If I knew a little bit of the 301 00:25:01 --> 00:25:05 protein sequence, wouldn't this give me a clue about 302 00:25:05 --> 00:25:09 the nucleotide sequence that must be there in the human genome to encode 303 00:25:09 --> 00:25:13 this protein?
So, a biochemist has purified the 304 00:25:13 --> 00:25:17 protein. Biochemists have studied the protein well enough to know some 305 00:25:17 --> 00:25:21 of its amino acid sequence. Can I infer the DNA sequence from 306 00:25:21 --> 00:25:25 the amino acid sequence, or at least a little snippet of it? 307 00:25:25 --> 00:25:30 Sorry? Multiple possibilities, 308 00:25:30 --> 00:25:35 but an infinite number? No. How do you encode valine? Well, 309 00:25:35 --> 00:25:40 GT something, and the something could be A, T, 310 00:25:40 --> 00:25:45 C, or G. What about leucine? Well, the first position is either a T or a C, 311 00:25:45 --> 00:25:50 and the second position is always a T. 312 00:25:50 --> 00:25:55 There you go, and it can be either of those. For serine, there's a T, 313 00:25:55 --> 00:26:01 a C, anything, or an A, a G, and a T or a C. 314 00:26:01 --> 00:26:07 Here, we have C, C, anything. Here we have a G, 315 00:26:07 --> 00:26:13 a C, anything. We have a G, an A, and a T or a C. For lysine it's an A, an A, 316 00:26:13 --> 00:26:19 and either an A or a G. Here, it's an A, a C, 317 00:26:19 --> 00:26:25 anything. Here, it's an A, an A, and a T or a C; here a G, a T, 318 00:26:25 --> 00:26:31 anything; then an A, an A, and an A or a G. You're right. There are 319 00:26:31 --> 00:26:36 multiple possibilities. But, it's not an infinite number, 320 00:26:36 --> 00:26:41 right? There are certain possible DNA sequences that might encode 321 00:26:41 --> 00:26:46 this stretch. If I just work it out, there are two choices here. There 322 00:26:46 --> 00:26:52 are four choices here. There are two choices here. 323 00:26:52 --> 00:26:57 There are four choices here. There are two choices here, two 324 00:26:57 --> 00:27:02 choices, etc. If I just look at... 325 00:27:02 --> 00:27:08 let's take a segment of this. Let's try one, two, three, these 326 00:27:08 --> 00:27:14 six amino acids.
Four choices here, 327 00:27:14 --> 00:27:20 so how many possible DNA sequences could encode these six amino acids 328 00:27:20 --> 00:27:26 in this order? Four times four times two times two 329 00:27:26 --> 00:27:32 times four times two: what is that? 330 00:27:32 --> 00:27:38 Let's see, that's two squared, times two squared, times two, 331 00:27:38 --> 00:27:44 times two, times two squared, times two: two to the ninth, 332 00:27:44 --> 00:27:50 512. That's 512 possibilities. 333 00:27:50 --> 00:27:56 So, 512 possible nucleotide sequences could work here. 334 00:27:56 --> 00:28:02 Well, 512's not infinite. There are 18 bases of sequence, 335 00:28:02 --> 00:28:09 so 512 possible 18-base-long nucleotide sequences. 336 00:28:09 --> 00:28:14 Just suppose that you knew which one it was. Now, you have to suspend 337 00:28:14 --> 00:28:19 your disbelief for a second. I'm not going to tell you how you 338 00:28:19 --> 00:28:24 might know, but suppose you knew which of the 512 it was. 339 00:28:24 --> 00:28:29 OK, could we use that little fact of knowing a stretch of about 18 340 00:28:29 --> 00:28:35 bases of the sequence to find the clone? 341 00:28:35 --> 00:28:39 How could we find the clone in our library that has those 18 bases of 342 00:28:39 --> 00:28:43 sequence? Google. [LAUGHTER] And, of course, 343 00:28:43 --> 00:28:47 you are totally right because, as we'll come back to, 344 00:28:47 --> 00:28:51 that is the way you would do it today if it's the human genome, 345 00:28:51 --> 00:28:55 because the entire sequence of the human genome is on the web. 346 00:28:55 --> 00:29:00 But, you might have an organism where it's not on the web. 347 00:29:00 --> 00:29:04 But, we'll come back to this because, of course, the Human Genome Project 348 00:29:04 --> 00:29:09 changes everything about how you would approach this. 349 00:29:09 --> 00:29:13 Google is how you would do it today.
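The codon counting can be checked mechanically with a sketch built on the standard genetic code (NCBI translation table 1, written in the conventional TCAG ordering). The lecture does not name the six residues; this assumes they are Pro-Ala-Asp-Lys-Thr-Asn, the stretch whose codon counts multiply exactly as 4 x 4 x 2 x 2 x 4 x 2 and which matches the codons read out for the sequence. The function names are illustrative.

```python
from itertools import product
from math import prod

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1). Codons are listed in TCAG
# order (TTT, TTC, TTA, TTG, TCT, ..., GGG); '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(aa: str) -> list[str]:
    """All codons for one amino acid (one-letter code)."""
    return [codon for codon, a in CODON_TABLE.items() if a == aa]


def count_encodings(peptide: str) -> int:
    """Number of distinct DNA sequences encoding the peptide: product of codon counts."""
    return prod(len(codons_for(aa)) for aa in peptide)


def all_encodings(peptide: str) -> list[str]:
    """Every DNA sequence that encodes the peptide (the full degenerate pool)."""
    return ["".join(combo) for combo in product(*(codons_for(aa) for aa in peptide))]


segment = "PADKTN"  # assumed stretch: Pro-Ala-Asp-Lys-Thr-Asn
print([len(codons_for(aa)) for aa in segment])  # [4, 4, 2, 2, 4, 2]
print(count_encodings(segment))                 # 512
print(len(all_encodings(segment)[0]))           # 18
```

Leucine and serine each have six codons, so a stretch that includes them grows the pool of candidate sequences faster; a stretch of low-degeneracy residues keeps it manageable.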
But, in the absence of Google, or of 350 00:29:13 --> 00:29:18 the entire sequence of the human genome 351 00:29:18 --> 00:29:23 (and I'm glad you raised it, because it's absolutely right), 352 00:29:23 --> 00:29:27 how could I find the clone that has that specific 18-base-pair sequence? 353 00:29:27 --> 00:29:33 Who has my 18-base sequence? Well, here's a trick. 354 00:29:33 --> 00:29:41 I could chemically synthesize an oligonucleotide that matches my 355 00:29:41 --> 00:29:48 sequence: an 18-base-long oligonucleotide encoding my sequence. 356 00:29:48 --> 00:29:56 What I'd like to do is use this oligonucleotide as a chemical probe 357 00:29:56 --> 00:30:02 to wash over my library. And, by washing it over my library, 358 00:30:02 --> 00:30:07 I'd like to see where it sticks. Now, that's kind of interesting. 359 00:30:07 --> 00:30:12 What do I mean by that? What I'd really like to do would be to kind 360 00:30:12 --> 00:30:18 of crack open all the cells of my library, and then the DNA would be 361 00:30:18 --> 00:30:23 sitting there. And, I'd like to take my 362 00:30:23 --> 00:30:28 oligonucleotide probe for a little snippet of the gene and wash it over 363 00:30:28 --> 00:30:33 the library. And then, by the amazing powers of 364 00:30:33 --> 00:30:39 Crick and Watson base pairing, it should stick to the right place. 365 00:30:39 --> 00:30:44 Could it do that? It turns out DNA, given time to wash around, 366 00:30:44 --> 00:30:49 will stick to its own complement. So that's the idea. How in the 367 00:30:49 --> 00:30:55 world do I do this in practice? So, here's what you do in practice. 368 00:30:55 --> 00:31:00 In practice, let us grow our 369 00:31:00 --> 00:31:06 bacteria. Let's plate the bacteria on an agar plate on which we have 370 00:31:06 --> 00:31:12 put a membrane, a nitrocellulose filter or some other kind of filter. 371 00:31:12 --> 00:31:18 Just imagine it being a piece of filter paper.
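Why should 18 bases be enough to single out one clone? A rough estimate, assuming a random genome with equal base frequencies (real genomes are not random, and repeated sequences break this assumption): a specific n-base sequence is expected to occur by chance about 2G / 4^n times in a genome of G base pairs, counting both strands. The function name is illustrative, and 3e9 bp is the approximate size of the human genome.

```python
def expected_chance_matches(probe_len: int, genome_bp: float) -> float:
    """Expected chance occurrences of one specific probe_len-mer in a random
    genome with equal base frequencies, counting both strands."""
    return 2 * genome_bp / 4 ** probe_len


for n in (8, 12, 18):
    print(n, round(expected_chance_matches(n, 3e9), 3))
```

An 8-mer is expected to appear by chance tens of thousands of times, a 12-mer a few hundred times, and an 18-mer less than once, so an 18-base probe should normally pick out just its own target.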
And, 372 00:31:18 --> 00:31:24 I'm going to plate my bacteria on the filter paper that's here. 373 00:31:24 --> 00:31:30 I'll let them grow up because there are nutrients here. 374 00:31:30 --> 00:31:35 The nutrients diffuse through the filter paper. And then, 375 00:31:35 --> 00:31:40 I have a piece of filter paper that I can pick up with my tweezers, 376 00:31:40 --> 00:31:45 and on that filter paper are bacterial colonies growing. 377 00:31:45 --> 00:31:50 So, this is a filter. Then, what I'm going to do is I'm going to 378 00:31:50 --> 00:31:55 take this filter with these glistening bacterial colonies, 379 00:31:55 --> 00:32:00 and I'm going to stick it in the autoclave. 380 00:32:00 --> 00:32:04 And, I'm going to heat it up with wet heat, 381 00:32:04 --> 00:32:09 and the bacterial cells will crack open. And, under these conditions, 382 00:32:09 --> 00:32:13 the DNA will tend to stick to the filter because I've picked a 383 00:32:13 --> 00:32:18 filter that DNA tends to stick to. And, I'm going to wash this 384 00:32:18 --> 00:32:23 filter in such a way that all the usual junk, the proteins and 385 00:32:23 --> 00:32:27 cell-surface junk, washes off. And, the DNA from each bacterial 386 00:32:27 --> 00:32:33 colony will stick. So now, I have the DNA from each 387 00:32:33 --> 00:32:39 colony sticking to its spot. Then, what I'm going to do is I'm 388 00:32:39 --> 00:32:45 going to take my filter and I'm going to add my oligo probe. 389 00:32:45 --> 00:32:51 This thing is now called a probe. I'm going to add the probe to the 390 00:32:51 --> 00:32:57 filter, and I'm going to put this in a... I need some sort of a 391 00:32:57 --> 00:33:03 hybridization device in which the filter, the oligonucleotide probe, and a 392 00:33:03 --> 00:33:07 little water can swish around.
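As a string-matching analogy, hybridization asks which inserts contain a stretch complementary to the probe. Because the single-stranded probe pairs with whichever strand is complementary to it, a clone counts as a hit if either the probe or its reverse complement appears in the insert's top strand. This sketch assumes exact matches only; real hybridization tolerates some mismatches depending on conditions. The probe and clone sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def probe_hits(probe: str, clones: dict) -> list:
    """Names of clones whose insert has a perfect match to the probe on either strand."""
    rc = reverse_complement(probe)
    return [name for name, insert in clones.items() if probe in insert or rc in insert]


PROBE = "CCAGCGGATAAGACGAAT"  # hypothetical 18-mer
clones = {
    "clone_1": "GGGTTTAAACCC",
    "clone_2": "TTAC" + PROBE + "GGA",                 # match on the top strand
    "clone_3": "CA" + reverse_complement(PROBE) + "T",  # match on the bottom strand
}
print(probe_hits(PROBE, clones))  # ['clone_2', 'clone_3']
```

On the filter, the "search" runs in parallel over every colony at once; the probe simply sticks wherever a match exists.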
And here, we use a technical device called a baggie: basically, a Ziploc bag, or you can heat-seal it with something like a Seal-a-Meal. In fact, that's actually what's used in the lab: Seal-a-Meal. You get these Seal-a-Meal bags, you toss your filter in, you squirt a little bit of your probe in, you seal it up in the Seal-a-Meal bag, and then you put it in a water bath. And, it swishes back and forth, and the probe just goes washing all over the place. And, wherever the probe finds its corresponding cognate sequence by Crick and Watson pairing, it'll stick. And there you go. That clone contains your sequence.

Now, we have a few problems here, don't we? What are some of the problems with this? Yeah?

Sorry, what if it sticks what? So, the probe: I thought this filter likes DNA, so why won't the probe just stick nonspecifically everywhere? We treat it in some way so that, after we've got the DNA adhering to it, it's now not going to stick everywhere. Good, next problem.

Well, even before that, yes? No, we'll take the whole library. We've gotten the library scattered out on this filter. Good, so hang on to that one for a second. First off, do we even know where that clone is? How did we know where the piece of DNA stuck? I mean, I drew it as red. But, how do we know where that red spot is? Yeah?
Oh yeah, you see, the problem is if I just wash it over there, unless you have, you know, Superman vision, you're not going to know where that probe is. So, you're proposing that the first thing we'd better do is radioactively label the probe. So, let's put a radioactive label on the probe, OK? Radiolabel. It turns out you can radiolabel probes by using enzymes that can add a radioactive phosphate group, etc. So, now, when it's radioactive, we put it here, and now we have a radioactive signal here.

How are we going to find our radioactive signal? We put it up against x-ray film. We take our filter. We dry it off. We slap it onto a piece of x-ray film. We let it expose overnight. We develop the x-ray film. And, we'll see a black dot. We'd better actually have taken some care to take a little radioactive pen and make a couple of fiducial marks around the corners. Otherwise, we're not going to know what our black dot corresponds to. But, assume we've made a couple of dots and we know how to line up our x-ray film to our filter. Now, we go back to our filter. We say, uh-huh, there is a black dot corresponding to the location of the radioactive probe right there. That was, as you said, where the colony used to be, the colony we wish we still had [LAUGHTER] because we cooked it in the autoclave, which is too bad. So, what should we do about that? Yep?
So, if I did it one colony at a time, I would know exactly which one it came from. But, it could take a long time. Sorry? So, plate it first onto a plate of agar, take a filter, press the filter up against the plate, and make a copy of it. Replica-plate it. It turns out that'll work.

There are two different approaches, and both of you were right. One approach is to replica-plate it. Plate it first on a normal plate, lay a piece of filter on top of it, and a few bacteria from each colony will stick in the same pattern. Peel it off, and you now have it. Alternatively, now that we have robotics, you can use a robot to pick these colonies into microtiter plates, and you can screen the individual wells by stamping them onto a filter, things like that. And frankly, that's how we do it now. If you want to screen the human genome, you set up a library with a few tens of thousands or hundreds of thousands of such clones. We can read off from a grid which one it was, and we go back to our master microtiter plates. But, either way, we need to have a living copy of the library. But, that's how you do it.

So now, we're in business. We have a living copy of the library. We make a filter containing that. We cook the filter in the autoclave. We add a radioactive probe. Wherever it sticks, it matches by the wonders of Crick-Watson base pairing. We're in business. Yes?
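The gridded-library readout can be sketched as a bookkeeping exercise. This is a hypothetical model, not a lab protocol: it assumes clones are arrayed in order across standard 96-well microtiter plates (rows A to H, columns 1 to 12), treats each insert as a known string, and counts a hit only for an exact match of the probe or its reverse complement. The point it illustrates is that a dark spot on the filter is only useful because its grid position maps back to a living well on a master plate.

```python
from typing import List

ROWS = "ABCDEFGH"  # assumption: 96-well plates, 8 rows x 12 columns
COLS = 12
WELLS_PER_PLATE = len(ROWS) * COLS

_PAIR = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    # Complement every base, then reverse, because the strands are antiparallel.
    return seq.upper().translate(_PAIR)[::-1]


def well_label(index: int) -> str:
    """Map a 0-based position in the arrayed library to 'plateN-RowCol'."""
    plate, offset = divmod(index, WELLS_PER_PLATE)
    row, col = divmod(offset, COLS)
    return f"plate{plate + 1}-{ROWS[row]}{col + 1}"


def screen(library: List[str], probe: str) -> List[str]:
    """Return master-plate addresses of clones whose insert pairs with the probe.

    Stands in for the filter plus x-ray film step: a clone 'lights up' if
    either strand of its insert contains a perfect complement of the probe.
    """
    probe = probe.upper()
    target = reverse_complement(probe)
    hits = []
    for index, insert in enumerate(library):
        top = insert.upper()
        if target in top or probe in top:
            hits.append(well_label(index))
    return hits


# Hypothetical three-clone library; only the second clone carries the site.
print(screen(["AAAAAAAA", "TTGGCATCTT", "CCCCCCCC"], "GATGCC"))  # ['plate1-A2']
```

Every hit is reported as a plate-and-well address rather than a filter coordinate, since the filter copy has been cooked and only the master plate still holds living bacteria.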
So now, there was this issue. I mean, how do I know that that sequence doesn't appear multiple times in the human genome? That's one issue. So, I'm going to have to pull out each of the positive hits I get and check it out. I'm going to have to analyze the clone, because just knowing that it hybridized to that probe might not tell me it's the beta-globin gene, but at least it's probably a good start, right? I've narrowed it down.

But, yes? Wait a second, right. We said there were 512 possibilities, and I said, bear with me, let's suppose we knew which one it was and we used it. Well, how are we going to know which one it is? Well, we could do the experiment 512 times, and one of them would work. That's lousy. We could go and make 512 oligos and simultaneously throw them in the same Seal-a-Meal bag. That actually works.

How do you make 512 oligos? How do you make an oligo, by the way? To make an oligonucleotide, there's very fancy chemistry that's been developed, for which someone won a Nobel Prize. Nowadays, of course, if you need an oligo made, how do you do it? Go to the catalog, that's right. In fact, you can go on the web, type in the sequence you want, and there's a machine that will make it. You can have it tomorrow. So, it turns out, that's how you make oligonucleotides today. There are good machines for it.
And, it turns out that if you wanted to, here's what you do: you type the following into the computer. You type in: please make me an oligo that starts, put a C in the first position, a C in the second position. And, what are you going to put in the third position? Just tell the computer to put in a random mix of all four. Then, a G in this position, a C in that position, and then a random mix of all four. Then, put in a G and an A, and then put in a 50/50 mix of T and A. In fact, in one synthesis, by telling the computer to just add a mixture at certain steps, it'll simultaneously synthesize a mixture of all 512 possibilities for you. So actually, a single synthesis will suffice to get a mixture of 512. You take your mixture of 512 and wash it over the filter, etc.

Now, your point still stands. How do we know that there's not something else in the genome that has this sequence, etc.? But at least we can find all the specific positives associated with this probe, and we can analyze them further; we'll talk next time about how you actually analyze them. And, of course, whether 18 is the right number of bases, or whether you might prefer a longer probe, shorter probes, or two probes: these are all the cooking tips molecular biologists worry about. But, given an amino acid sequence, you can infer, although with redundancy, a nucleotide sequence.
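The back-translation from protein and the mixed-base synthesis can both be sketched in a few lines of Python. This is an illustration under stated assumptions: it uses the standard genetic code, the IUPAC letters for mixed positions (N for any base, W for A or T, and so on), and a made-up peptide, not the one from the lecture. The positions spelled out above (C, C, N, G, C, N, G, A, W) give 4 × 4 × 2 = 32 oligos on their own; a count like 512 comes from multiplying the choices at every degenerate position along the whole probe.

```python
from itertools import product
from typing import Dict, List

# Standard genetic code in the conventional T, C, A, G ordering ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS: Dict[str, List[str]] = {}
for codon, aa in zip((a + b + c for a in BASES for b in BASES for c in BASES),
                     AMINO_ACIDS):
    CODONS.setdefault(aa, []).append(codon)

# IUPAC letters for positions where the synthesizer adds a mix of bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "N": "ACGT"}


def back_translate(peptide: str) -> List[str]:
    """Every DNA sequence that could encode the peptide (the code is redundant)."""
    choices = [CODONS[aa] for aa in peptide.upper()]
    return ["".join(combo) for combo in product(*choices)]


def mixture(pattern: str) -> List[str]:
    """Every oligo produced by one synthesis run using IUPAC mixed positions."""
    return ["".join(combo) for combo in product(*(IUPAC[c] for c in pattern.upper()))]


# Hypothetical peptide Met-Trp-Cys-Asp-Glu-Phe: 1*1*2*2*2*2 = 16 sequences of 18 bases.
print(len(back_translate("MWCDEF")))  # 16
# The positions spelled out in the lecture, C C N G C N G A W: 4*4*2 = 32 oligos.
print(len(mixture("CCNGCNGAW")))  # 32
```

A column-by-column mixture is exact for amino acids whose codons differ only at the third position, but for six-codon amino acids such as leucine it over-generates: mixing T/C at the first position and all four bases at the third also yields TTT and TTC, which encode phenylalanine. That is one reason to pick probe regions rich in low-degeneracy residues such as methionine and tryptophan.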
Given a nucleotide sequence, you can make an oligonucleotide probe. Given an oligonucleotide probe, you can wash it over the filter. You can find the colonies that have it, and therefore you can clone by hybridization. So, we'll call this one cloning by hybridization, or cloning by sequence. OK, now, there are other ways to do it. Of course, as someone correctly noted, if the entire human genome has already been sequenced, as it has been right now, and you knew the amino acid sequence, you could do this hybridization not using filters and radioactive probes, but just doing it in silico. You can do it in the computer, and that will work as well.

So now, let's do the next one. Last cloning expedition: I'd like to clone the gene for Huntington's disease or cystic fibrosis or something like that. Cloning a disease gene: Huntington's disease, for example, is a dominantly inherited disorder, passed to some of the offspring, that causes a brain degeneration with onset typically in the fifth decade of life. Let's clone that gene. Can we do it by method number one, cloning by complementation? No, because we don't have bacteria that have Huntington's disease. We don't have mice that have Huntington's disease. And, we certainly can't shoot up people and try to rescue the phenotype and all that. That's not going to work. How about doing it by method number two?
Let's just get the protein for Huntington's disease, get its amino acid sequence, and then find its nucleotide sequence. Pretty good. What's the protein for Huntington's disease? Huntase? No, it's actually called huntingtin, it turns out. But, at the time that people went off trying to find the gene for Huntington's disease, I'm afraid they didn't know. They had no idea what the gene was that caused Huntington's disease. That was the point. They wanted to use molecular biology to find the gene when they didn't even know the protein. So, we can't use our method number two.

So, how are we going to find it? The disease does lead to degeneration of nerve cells. Study nerve cells? So, we could take brain tissue from patients who have died of Huntington's disease, and people did that. But, in nerve cells that die, a lot of stuff goes on. All sorts of proteins go wrong, and it's a mess. The problem with studying tissue from people who have a disease is that it's diseased tissue. And, just because you see something wrong doesn't mean it's a cause rather than an effect of the disease. That's why we really want to find the gene and find its mutation, because then we know that's the primary cause. But, how are we going to do that? We don't know its sequence. We can't rescue it by complementation. As a pure geneticist, what can we do? Yeah, we know the sequence of the human genome.
So, we just sequence the entire genome of somebody with Huntington's disease and compare it to normal. That actually may become a reasonable way to do things, but the first sequence of the human genome cost a couple of billion dollars. Doing it again would be cheaper. We'd spend about $30 million or so, but it's pricey. Also, there would be a lot of genetic variation: just random, meaningless polymorphism between individuals. The human genome differs between any two people by about one letter in 1,000. So, we would see about 3 million differences between the person with Huntington's and the wild-type reference sequence on Google. We wouldn't know which one causes it.

Suppose you have a family tree. How could we use it? Compare the children and the parents? That's all right. What does a geneticist do with a family tree? What did Sturtevant teach us? Genetic mapping. Suppose we were to study a family tree of individuals with Huntington's disease. And suppose, on the chromosome where the Huntington's disease gene lives, we were to look at genetic markers. Could we do genetic linkage analysis: genetic linkage analysis that would allow us to know that there was a marker here, some kind of a marker, a DNA marker, a DNA variation that was co-inherited with Huntington's disease, that showed linkage with it?
We could do that just by finding that, across a family, there tended to be very little genetic recombination between this marker and Huntington's disease. Now, how would we know to look here? You wouldn't. We'd try markers all over the genome: next chromosome, next chromosome. If we tried genetic variations all over the human genome, we would eventually find that some genetic markers tended to be co-inherited along with Huntington's disease. It turns out that that's enough. This will tell us approximately where this unknown gene must live.

Here's a portion of the chromosome where the unknown Huntington's disease gene lives. Here's a genetic variant, and here's a genetic variant, a marker, that shows correlation. Maybe there's only 1% recombination here, and 1% recombination here. And, that's the powerful thing about Sturtevant's idea. It works in fruit flies. It works in humans. If I have any genetic variation and it's 99% correlated, or only recombines 1% of the time, it tells me that this unknown gene must be nearby.

So, I could use this genetic marker as a DNA probe to wash over a library to get a big piece of DNA from this region. I can take this piece of DNA and use it as a probe, a radioactive probe, to get an overlapping piece of DNA. I can use the end of this DNA as a probe to wash over a library and get the next piece of DNA.
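The linkage argument can be turned into a small calculation. The sketch below is a simplification that assumes every scored meiosis is informative and phase-known, so each child can be classified as recombinant or not for a given marker; real human pedigree analysis uses likelihood methods (LOD scores) to cope with missing and ambiguous information. Marker names and data are hypothetical.

```python
from typing import Dict, List


def recombination_fraction(meioses: List[bool]) -> float:
    """Fraction of scored meioses in which marker and disease were separated.

    Each entry is one informative, phase-known meiosis: True if the marker
    allele and the disease allele were passed on together as they sat in the
    parent (non-recombinant), False if crossing over split them.
    """
    if not meioses:
        raise ValueError("need at least one informative meiosis")
    recombinants = sum(1 for together in meioses if not together)
    return recombinants / len(meioses)


def closest_marker(scores: Dict[str, List[bool]]) -> str:
    """Marker with the lowest recombination fraction, i.e. the tightest linkage."""
    return min(scores, key=lambda name: recombination_fraction(scores[name]))


# Hypothetical data: 100 scored meioses per marker.
scores = {
    "marker_far": [True] * 50 + [False] * 50,  # 50%: behaves as unlinked
    "marker_near": [True] * 99 + [False] * 1,  # 1%: very close to the gene
}
print(closest_marker(scores))  # marker_near
```

A marker on another chromosome, or far away on the same one, recombines with the disease about half the time, so only markers well below 50% carry positional information.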
And, I can do the same thing here. Once I have any piece of DNA that's even vaguely in the neighborhood, I can use it as a probe to wash over a library and get a piece of DNA, use it to get the next piece, the next piece, the next piece, in a process that was called chromosomal walking. That gives me a series of clones that I know must cover the region for this unknown gene. I then begin to analyze them and I say, let's look at some more genetic markers: a genetic marker a little closer, and a little closer, and a little closer. Which ones show perfect correlation with Huntington's disease? And, that narrows me down to a small number of clones that must contain the gene, even though I had no idea in advance what that gene was.

This is called cloning by position. And, that's a very powerful technique of genetics, because you don't need to know in advance what's wrong with a diseased gene. You first figure out where it is, and then you get the clones to figure out what it is. So, this actually works. Now, the process of getting the next piece, and the next clone, and the next clone, is unbelievably boring and tedious. And, for Huntington's disease, this process took nine years. Of course, now, how would you do it? Go to the web, because with all of this work on the human genome, you've got all these clones laid out already.
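Chromosomal walking can be simulated by modeling clones as intervals on the chromosome. In a real walk those coordinates are unknown, which is the whole problem; here they are hypothetical numbers used only to decide which clones "light up" when the end of the current clone is used as a probe. The sketch walks in one direction only and, at each step, keeps the hybridizing clone that extends furthest.

```python
from typing import List, Tuple

Clone = Tuple[int, int]  # (start, end) in hypothetical chromosome coordinates


def walk(library: List[Clone], start: Clone, target: int,
         probe_len: int = 500) -> List[Clone]:
    """Walk rightward from `start` until some clone covers position `target`."""
    path = [start]
    current = start
    while not (current[0] <= target <= current[1]):
        # Use the last probe_len bases of the current clone as the next probe.
        probe_start, probe_end = current[1] - probe_len, current[1]
        # Clones containing that whole end segment would hybridize on the filter;
        # keep only those that also extend past the current clone.
        hits = [c for c in library
                if c[0] <= probe_start and c[1] >= probe_end and c[1] > current[1]]
        if not hits:
            raise RuntimeError("walk stalled: no clone extends past the current end")
        current = max(hits, key=lambda c: c[1])  # take the biggest step
        path.append(current)
    return path


library = [(0, 40_000), (30_000, 70_000), (35_000, 45_000), (60_000, 100_000)]
print(walk(library, (0, 40_000), target=95_000))
# [(0, 40000), (30000, 70000), (60000, 100000)]
```

Greedily taking the longest step mirrors why large-insert libraries made walking faster: each round of screening covers more new ground.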
And so, the work that used to take years is now this: once you have a genetic marker that's close to Huntington's, you can just look up all the clones in the neighborhood, and actually all the sequences in the neighborhood. So, this process has gone from nine years to, if you had to do this again, a couple of weeks to get that region for Huntington's disease. Now the question is, how do you analyze that region? How do you know what's in that region? How do you know what the genes are that are in that region? And that's what we'll talk about next time.
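Once clones and sequences are laid out along the genome, looking up the neighborhood of a linked marker is just a range query on sorted coordinates. The sketch below assumes a hypothetical, position-sorted list of annotated features and a window chosen by the user; as a rough guide, 1% recombination corresponds on average to something on the order of a million bases in humans, though that ratio varies along the genome.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Feature = Tuple[int, str]  # (position in bases, name); hypothetical annotation


def neighborhood(features: List[Feature], marker_pos: int, window: int) -> List[str]:
    """Names of features within `window` bases of a linked marker.

    `features` must be sorted by position, as an assembled genome map would be.
    """
    positions = [pos for pos, _ in features]
    lo = bisect_left(positions, marker_pos - window)
    hi = bisect_right(positions, marker_pos + window)
    return [name for _, name in features[lo:hi]]


# Hypothetical clones and genes along a stretch of chromosome.
features = [(100_000, "clone_1"), (1_500_000, "gene_x"),
            (2_200_000, "clone_2"), (6_000_000, "gene_y")]
print(neighborhood(features, marker_pos=2_000_000, window=1_000_000))
# ['gene_x', 'clone_2']
```

Binary search keeps each lookup logarithmic in the number of features, which is the computational counterpart of skipping the years of clone-by-clone walking.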