1 00:00:15 --> 00:00:19 Professor Jacks is out of town so I am going to tell you about 2 00:00:19 --> 00:00:24 Recombinant DNA 3, then he's going to come back and 3 00:00:24 --> 00:00:29 tell you about Cell Biology, and then you will have finished the 4 00:00:29 --> 00:00:34 foundations part of the course. And we'll move onto things that 5 00:00:34 --> 00:00:38 build on the foundation, the Formation Module and the part of 6 00:00:38 --> 00:00:43 the Systems Module, which I'll be teaching you for the 7 00:00:43 --> 00:00:48 next few weeks, but today is Recombinant DNA 3. 8 00:00:48 --> 00:00:52 And, as you've been hearing for the last couple of lectures, 9 00:00:52 --> 00:00:57 this is one of the How-To Modules that we've put in the course. 10 00:00:57 --> 00:01:01 How to make use of the information that you have been learning in 11 00:01:01 --> 00:01:06 Molecular Biology and in Biochemistry and in Genetics to use 12 00:01:06 --> 00:01:11 these disciplines or these pieces of information to do something useful. 13 00:01:11 --> 00:01:15 And recombinant DNA is really an extraordinary set of technologies 14 00:01:15 --> 00:01:19 that just keeps getting more and more extraordinary. 15 00:01:19 --> 00:01:23 And the way one can manipulate biological systems now is really 16 00:01:23 --> 00:01:27 very exciting. And it continues to be exciting. 17 00:01:27 --> 00:01:31 When I was a beginning graduate student we were able to clone the 18 00:01:31 --> 00:01:35 first pieces of DNA. And now we can really do a lot more 19 00:01:35 --> 00:01:39 than just clone DNA. So I want to tell you about some of 20 00:01:39 --> 00:01:44 the things that are really essential to understand about this technology, 21 00:01:44 --> 00:01:48 and then take you through some of the forefronts of where recombinant 22 00:01:48 --> 00:01:52 DNA technology is now. We're going to cover three things 23 00:01:52 --> 00:02:00 in this lecture. 
24 00:02:00 --> 00:02:10 DNA sequencing, using genetic polymorphisms for 25 00:02:10 --> 00:02:20 various genotyping analyses, and then I'm going to try to touch 26 00:02:20 --> 00:02:30 on, and we'll have to see how we do here, making animals that are 27 00:02:30 --> 00:02:38 so-called transgenic. So transgenic technology. 28 00:02:38 --> 00:02:44 And I'm going to use PowerPoint pretty much for most of the lecture, 29 00:02:44 --> 00:02:50 so you have most of the relevant stuff in front of you. 30 00:02:50 --> 00:02:56 I'm going to frame this in terms of a human disease, familial 31 00:02:56 --> 00:03:02 hypercholesterolemia. So you may remember way back when in 32 00:03:02 --> 00:03:06 biochemistry we talked about cholesterol. Anyone remember what 33 00:03:06 --> 00:03:10 class of macromolecules cholesterol belongs to? Lipids. 34 00:03:10 --> 00:03:14 Thank you. Lipids. OK. I'm not even going to give a frog 35 00:03:14 --> 00:03:19 for that. And we have this sense of cholesterol being a really bad kind 36 00:03:19 --> 00:03:23 of molecule but, in fact, cholesterol is an essential 37 00:03:23 --> 00:03:27 lipid. It's extremely important. Without cholesterol you'd die and 38 00:03:27 --> 00:03:32 you need it for many things. Not only for building membranes in 39 00:03:32 --> 00:03:36 your cells but also, if you think way back, 40 00:03:36 --> 00:03:40 you may remember me telling you that cholesterol was part of or had a 41 00:03:40 --> 00:03:44 chemical structure that was very similar to the steroid hormone 42 00:03:44 --> 00:03:48 family. And steroid hormones, and we'll discuss this more in the 43 00:03:48 --> 00:03:52 future, are very important molecules that tell one part of the body what 44 00:03:52 --> 00:03:56 to do, that regulate what different parts of the body are doing. 45 00:03:56 --> 00:04:00 So cholesterol is part of this whole signaling system. 
46 00:04:00 --> 00:04:04 And really it's not actually understood all of what cholesterol 47 00:04:04 --> 00:04:08 does, but it's very important. However, too much of it is not good. 48 00:04:08 --> 00:04:12 And it's probably not good because, but it's not actually clear. I'll 49 00:04:12 --> 00:04:16 tell you what happens if you have too much cholesterol, 50 00:04:16 --> 00:04:20 but actually why it happens is not that clear. So let me talk about 51 00:04:20 --> 00:04:24 this slide up here, and then we'll talk about what too 52 00:04:24 --> 00:04:28 much cholesterol does for you. So familial hypercholesterolemia is 53 00:04:28 --> 00:04:33 an inherited disease, and it's caused by mutations in a 54 00:04:33 --> 00:04:39 gene called the LDL receptor, that encodes for something called 55 00:04:39 --> 00:04:44 the LDL receptor. Now, LDL stands for low density 56 00:04:44 --> 00:04:49 lipoprotein. And you had this in a previous lecture because I'd 57 00:04:49 --> 00:04:55 mentioned these to you. Low density lipoproteins. 58 00:04:55 --> 00:05:00 And these bind to various lipids, including cholesterol, and are taken 59 00:05:00 --> 00:05:05 up into the cell. And some of them are OK, 60 00:05:05 --> 00:05:09 you probably need some LDLs, but too much LDL is bad. And if you 61 00:05:09 --> 00:05:14 have too much LDL receptor, the thing that actually binds to the 62 00:05:14 --> 00:05:18 LDLs, you get too much LDL taken up into the cell. 63 00:05:18 --> 00:05:23 So this LDL receptor, you'll talk more about this in cell 64 00:05:23 --> 00:05:27 biology, this LDL receptor, and you've already had some of this, 65 00:05:27 --> 00:05:32 the LDL receptor is a protein that binds to these LDLs, 66 00:05:32 --> 00:05:37 takes them into the cell, and then your cell gets full of LDLs. 67 00:05:37 --> 00:05:41 OK? And as a consequence of this, your cholesterol levels go way up. 
68 00:05:41 --> 00:05:46 Now, you can be heterozygote or homozygote for familial 69 00:05:46 --> 00:05:50 hypercholesterolemia, for the LDL receptor gene. 70 00:05:50 --> 00:05:55 OK? For the familial hypercholesterolemia gene. 71 00:05:55 --> 00:06:00 Try to say that one quickly. All right. 72 00:06:00 --> 00:06:06 So if you're heterozygote, you have an increased risk of heart 73 00:06:06 --> 00:06:13 disease. In particular for this thing called atherosclerosis that I'll 74 00:06:13 --> 00:06:19 talk more about in a moment. If you are homozygote, so you have 75 00:06:19 --> 00:06:26 two copies of a mutated LDL receptor gene, you get severe heart symptoms 76 00:06:26 --> 00:06:32 and you die early. OK? What is atherosclerosis? 77 00:06:32 --> 00:06:38 Atherosclerosis is a disease that occurs because you get these 78 00:06:38 --> 00:06:44 buildups of stuff in the blood vessels. And the stuff is fat and 79 00:06:44 --> 00:06:49 it's proteins, and it basically makes a big lump 80 00:06:49 --> 00:06:55 that eventually occludes or blocks the blood vessel. 81 00:06:55 --> 00:07:01 And so atherosclerosis is bad because it impedes blood flow. 82 00:07:01 --> 00:07:06 And if you impede blood flow, eventually your heart will seize up 83 00:07:06 --> 00:07:12 and you will have a heart attack, and that can have, obviously, very 84 00:07:12 --> 00:07:17 severe consequences. So atherosclerosis occurs because 85 00:07:17 --> 00:07:23 you have high levels of LDL. And it's really, the actual 86 00:07:23 --> 00:07:29 etiology of atherosclerosis is not really clear. 87 00:07:29 --> 00:07:33 Part of it may be that there's just too much fat around and that starts 88 00:07:33 --> 00:07:37 actually getting deposited out of solution, but it's much more 89 00:07:37 --> 00:07:42 complicated than that. 
And there seems to be a very 90 00:07:42 --> 00:07:46 complicated chain of events by which you get these atherosclerosis 91 00:07:46 --> 00:07:50 plaques sitting on the lining of blood vessels and impeding blood 92 00:07:50 --> 00:07:55 flow. OK. So there is a lot of interest medically in 93 00:07:55 --> 00:07:59 atherosclerosis, particularly in countries such as 94 00:07:59 --> 00:08:04 ours where food is plentiful and people tend to have too much. 95 00:08:04 --> 00:08:08 And obesity is a problem anyway because that is part of the set of 96 00:08:08 --> 00:08:13 risk factors for atherosclerosis. So here are the risk factors. High 97 00:08:13 --> 00:08:18 levels of LDL, high blood pressure, 98 00:08:18 --> 00:08:23 diabetes, cigarette smoke and so on. And familial hypercholesterolemia 99 00:08:23 --> 00:08:28 is contributory to high levels of LDL and atherosclerosis. OK. 100 00:08:28 --> 00:08:34 So one of the things I want to do is to keep thinking about this disorder 101 00:08:34 --> 00:08:40 and walk you through how you figure out who's got FH. 102 00:08:40 --> 00:08:46 OK. What you can do is to get blood cells from people at-risk, 103 00:08:46 --> 00:08:52 and you can actually examine the LDL receptor gene in the blood cells of 104 00:08:52 --> 00:08:59 people who are at-risk for familial hypercholesterolemia. 105 00:08:59 --> 00:09:03 And what I tell you about is how you can actually sequence the gene, 106 00:09:03 --> 00:09:08 the FH gene, see if you can find the mutation and see whether or not you 107 00:09:08 --> 00:09:13 can then identify people who are at-risk for the disorder. 108 00:09:13 --> 00:09:18 So the first thing I want to tell you about today is DNA sequencing. 109 00:09:18 --> 00:09:23 DNA sequencing. What is DNA sequencing? Does someone care to 110 00:09:23 --> 00:09:28 give me a definition or think about what I might mean by 111 00:09:28 --> 00:09:33 DNA sequencing? 
In particular, 112 00:09:33 --> 00:09:38 what part of the DNA are we sequencing? Thank you, 113 00:09:38 --> 00:09:43 Jamie. You want to say it louder? The bases. Yes. So in DNA 114 00:09:43 --> 00:09:48 sequencing, and maybe I even wrote this, what is this, 115 00:09:48 --> 00:09:53 what you want to do is to determine the base sequence of the DNA. 116 00:09:53 --> 00:09:58 OK? You want to determine the sequence of AGCT along 117 00:09:58 --> 00:10:03 a DNA fragment. This technique is powerful beyond 118 00:10:03 --> 00:10:07 almost anything else. It's an extraordinary technique. 119 00:10:07 --> 00:10:11 The ability to sequence DNA is extraordinary. 120 00:10:11 --> 00:10:16 And it's extraordinary because you can get out of it information that 121 00:10:16 --> 00:10:20 is absolutely essential for understanding life. 122 00:10:20 --> 00:10:25 What you can get from DNA sequencing is an understanding of 123 00:10:25 --> 00:10:29 the coding capacity of a gene. So, just like you did in your exam, 124 00:10:29 --> 00:10:33 we gave you a string of DNA and you conceptually translated 125 00:10:33 --> 00:10:38 it into the protein. Well, you can do that in real life 126 00:10:38 --> 00:10:42 by looking through the genome, the human genome and finding 127 00:10:42 --> 00:10:46 stretches of DNA and conceptually turning them into RNA and into 128 00:10:46 --> 00:10:50 protein and saying, OK, is this a gene? 129 00:10:50 --> 00:10:54 Does it code for something? And what does it code for? So you 130 00:10:54 --> 00:10:58 can figure out the coding capacity of a gene. Part of that is actually 131 00:10:58 --> 00:11:02 identifying is a gene a gene? So we've sequenced the entire human 132 00:11:02 --> 00:11:06 genome. And I've told you previously that only about 5% of the 133 00:11:06 --> 00:11:10 genome is actually genes and the rest is other stuff. 134 00:11:10 --> 00:11:14 So one of the things you want to do with DNA sequencing is to identify 135 00:11:14 --> 00:11:18 genes. 
And that's actually very difficult to do it turns out. 136 00:11:18 --> 00:11:22 But that's one of the things you can do with DNA sequencing. 137 00:11:22 --> 00:11:26 I'll talk more about identifying genes that are associated with 138 00:11:26 --> 00:11:30 disease, that are causative of disease. 139 00:11:30 --> 00:11:33 And particularly alleles that are associated with disease such as in 140 00:11:33 --> 00:11:37 the case of familial hypercholesterolemia. 141 00:11:37 --> 00:11:40 One can figure out evolutionary relationships between organisms. 142 00:11:40 --> 00:11:44 So you've probably heard for years about how similar we are to 143 00:11:44 --> 00:11:48 chimpanzees or how similar we are to dogs or to dolphins or whatever. 144 00:11:48 --> 00:11:51 But, actually, we didn't really know. Now we can sequence a human 145 00:11:51 --> 00:11:55 genome, we can sequence a chimp genome, a dog genome, 146 00:11:55 --> 00:11:59 a dolphin genome, and we can actually look and see 147 00:11:59 --> 00:12:02 how similar we are. And we can try to figure out, 148 00:12:02 --> 00:12:06 in evolutionary time, what's changed between the dolphin and ourselves 149 00:12:06 --> 00:12:09 and what makes a dolphin a dolphin and ourselves ourselves. 150 00:12:09 --> 00:12:13 It's a very tough question, but DNA sequencing is essential for 151 00:12:13 --> 00:12:16 trying to answer that kind of question. And then one can ask 152 00:12:16 --> 00:12:20 about the genome in other ways. Can one find the promoters of all 153 00:12:20 --> 00:12:23 the different genes? Remember promoters that make genes 154 00:12:23 --> 00:12:27 be transcribed? The centromeres, 155 00:12:27 --> 00:12:31 the middle of chromosomes. Various other elements in the genome 156 00:12:31 --> 00:12:36 that are essential for its function. 
So I'm going to spend quite some 157 00:12:36 --> 00:12:41 time talking about DNA sequencing and tell you that DNA sequencing, 158 00:12:41 --> 00:12:45 most of the DNA sequencing we do uses a trick. And it's a terrific 159 00:12:45 --> 00:12:50 trick. It really is. So this DNA sequencing, 160 00:12:50 --> 00:12:55 I'll write it because I don't think I have this on one of 161 00:12:55 --> 00:13:01 your PowerPoints. The method of DNA sequencing I'm 162 00:13:01 --> 00:13:09 going to tell you about was devised by a scientist called Fred Sanger. 163 00:13:09 --> 00:13:17 So I'll tell you about it. It's called dideoxy, 164 00:13:17 --> 00:13:25 it's also called chain termination, and it's also called Sanger 165 00:13:25 --> 00:13:30 sequencing. Professor Sanger is a British 166 00:13:30 --> 00:13:34 scientist who received two Nobel Prizes. The first was for figuring 167 00:13:34 --> 00:13:37 out how proteins, how to sequence proteins, 168 00:13:37 --> 00:13:41 and the second was for figuring out how to sequence DNA. 169 00:13:41 --> 00:13:44 When I was a student, I heard Professor Sanger talk. 170 00:13:44 --> 00:13:48 And he gave a lecture which was really memorable. 171 00:13:48 --> 00:13:51 It was packed, a packed auditorium. And he spoke the entire time like 172 00:13:51 --> 00:13:55 this. I don't think he looked up once. He gave the entire lecture 173 00:13:55 --> 00:13:59 like this, and he was barely audible. 174 00:13:59 --> 00:14:03 But at the end of the lecture he got a standing ovation from everybody 175 00:14:03 --> 00:14:07 because really what he's done, figuring out how to sequence 176 00:14:07 --> 00:14:12 proteins and how to sequence DNA was really an extraordinary 177 00:14:12 --> 00:14:16 accomplishment. So that's the method I'll tell you 178 00:14:16 --> 00:14:21 about. And it uses a cool trick. 
So you know now that the sugar in 179 00:14:21 --> 00:14:25 DNA has a 3 prime hydroxyl group, and that hydroxyl group is the group 180 00:14:25 --> 00:14:30 onto which the phosphate gets added. 181 00:14:30 --> 00:14:35 Right? And without that hydroxyl group you could not add on the next 182 00:14:35 --> 00:14:40 nucleotide, right? It's a question. Think about it. 183 00:14:40 --> 00:14:46 OK? I don't mean it to be rhetorical. I want you to really be 184 00:14:46 --> 00:14:51 thinking, OK, about this, because otherwise you won't 185 00:14:51 --> 00:14:57 understand the method. So here's the 3 prime hydroxyl on 186 00:14:57 --> 00:15:02 regular deoxyribose. OK? In the Sanger or dideoxy method one 187 00:15:02 --> 00:15:07 uses in the reaction mix, and I'll go through this with you in 188 00:15:07 --> 00:15:13 a moment, a sugar or nucleotide that's a dideoxy nucleotide. 189 00:15:13 --> 00:15:18 In other words, on both the 2 prime and the 3 prime of the sugar, 190 00:15:18 --> 00:15:23 of the ribose there is no hydroxyl group. There are just 191 00:15:23 --> 00:15:28 those hydrogens. Now, a dideoxy nucleotide such as 192 00:15:28 --> 00:15:34 this one can get incorporated into DNA just fine because this phosphate, 193 00:15:34 --> 00:15:40 the triphosphate here can react with a regular nucleotide that's got a 3 194 00:15:40 --> 00:15:46 prime hydroxyl. However, once it's been 195 00:15:46 --> 00:15:52 incorporated you cannot elongate the chain anymore because there is no 196 00:15:52 --> 00:15:58 reactive hydroxyl group. OK. So based on this principle let 197 00:15:58 --> 00:16:03 me explain. I've got one of your handouts here. 198 00:16:03 --> 00:16:07 OK. So here we go. Revision, your template, your primer, 199 00:16:07 --> 00:16:11 here's your template strand, always goes 3 prime to 5 prime. 200 00:16:11 --> 00:16:15 Here's your 5 prime to 3 prime primer. 
If you add nucleotides, 201 00:16:15 --> 00:16:19 deoxynucleotide triphosphates and DNA polymerase, 202 00:16:19 --> 00:16:24 you will polymerize the whole fragment. 203 00:16:24 --> 00:16:29 If you add, however, to the mix of dNTPs and DNA 204 00:16:29 --> 00:16:35 polymerase a low-level of dideoxy nucleotide triphosphates, 205 00:16:35 --> 00:16:41 every time you add on a nucleotide the polymerase can either use a 206 00:16:41 --> 00:16:46 regular nucleotide triphosphate, in which case the chain can elongate 207 00:16:46 --> 00:16:52 subsequently, or it can use a dideoxy nucleotide triphosphate. 208 00:16:52 --> 00:16:58 If it uses one of the dideoxy NTPs the chain will terminate. 209 00:16:58 --> 00:17:03 It cannot be elongated any further. So you get something like this. 210 00:17:03 --> 00:17:08 And the trick here is really this low-level of ddNTPs. 211 00:17:08 --> 00:17:14 OK? So if you have your template and your primer and you do a 212 00:17:14 --> 00:17:19 reaction with your dNTPs at a reasonable level and you spike the 213 00:17:19 --> 00:17:24 reaction with a low-level of dideoxy NTPs, you get a whole bunch of 214 00:17:24 --> 00:17:30 different length chains polymerized. 215 00:17:30 --> 00:17:35 Because there is some probability, at every position, that you're 216 00:17:35 --> 00:17:40 either going to get a ddNTP incorporated, in which case the 217 00:17:40 --> 00:17:45 chain terminates, or you're going to get a regular 218 00:17:45 --> 00:17:50 nucleotide incorporated in which case the chain can continue for a 219 00:17:50 --> 00:17:56 bit. OK? So that is paramount to dideoxy sequencing. 220 00:17:56 --> 00:18:01 So let's continue now by looking at a specific polymer and following 221 00:18:01 --> 00:18:06 through exactly what happens. So here I've given you a template 222 00:18:06 --> 00:18:10 and a primer. And we're going to do the same reaction that we just did 223 00:18:10 --> 00:18:15 conceptually. 
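Why the ddNTP level has to be low can be sketched numerically. This is a minimal illustration, not something from the handout: it assumes that at every candidate site the polymerase picks the dideoxy nucleotide with the same small probability `p` (the name `p`, the function name, and the value 0.1 are all assumptions chosen for illustration).

```python
def termination_fractions(n_sites, p):
    """Fraction of chains that stop at each of n_sites candidate positions.

    Illustrative assumption: each site independently incorporates a
    dideoxy nucleotide with probability p.
    Returns (fractions_per_site, fraction_that_reads_through_all_sites).
    """
    fractions = []
    still_growing = 1.0  # fraction of chains that have not terminated yet
    for _ in range(n_sites):
        fractions.append(still_growing * p)  # these chains stop at this site
        still_growing *= 1.0 - p             # the rest keep elongating
    return fractions, still_growing


fractions, read_through = termination_fractions(n_sites=5, p=0.1)
for site, frac in enumerate(fractions, start=1):
    print(f"site {site}: {frac:.4f} of chains terminate here")
print(f"chains that pass all sites: {read_through:.4f}")
```

With a small `p`, every site gets a usable share of terminated chains. If `p` were close to 1, nearly all chains would stop at the first site and the later positions would be invisible.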
We're going to do it again conceptually except with 224 00:18:15 --> 00:18:19 letters. We're going to mix together. And we're going to do, 225 00:18:19 --> 00:18:24 and I see a mistake up here already, but that's OK. You'll bear with me. 226 00:18:24 --> 00:18:29 What I've done here is to put in some dideoxy ATP. 227 00:18:29 --> 00:18:33 And I meant to say here I've got dATP at high levels. 228 00:18:33 --> 00:18:37 And I've got all the other nucleotides here, 229 00:18:37 --> 00:18:41 too, at high levels. OK? That's my error and I will 230 00:18:41 --> 00:18:45 correct it. You should correct it now in your handout. 231 00:18:45 --> 00:18:49 So where it says dATP high, that should actually say dNTPs high, 232 00:18:49 --> 00:18:53 not just dATP. OK? All right. So let's look and see what happens to 233 00:18:53 --> 00:18:57 this reaction. And I've noted here that this 234 00:18:57 --> 00:19:02 dideoxy ATP can be radioactive or fluorescent. 235 00:19:02 --> 00:19:06 Or actually it doesn't have to work that way but let's just leave it 236 00:19:06 --> 00:19:10 that way for now. OK. That actually is not 237 00:19:10 --> 00:19:15 necessarily true. So let's just focus on the ddATP 238 00:19:15 --> 00:19:19 plus the high dNTPs, and let's see what happens. 239 00:19:19 --> 00:19:24 OK. So one thing that can happen is that, here's your primer in red 240 00:19:24 --> 00:19:28 and here's the polymerized DNA in blue, you get a bit 241 00:19:28 --> 00:19:33 of DNA polymerized. Now here's an A. 242 00:19:33 --> 00:19:38 See? It goes GAGTAA. And I've given you a reaction where 243 00:19:38 --> 00:19:42 the first two As use regular dATP. And so the chain will continue 244 00:19:42 --> 00:19:47 after that. All right? So here we go, GAGTA. And then the 245 00:19:47 --> 00:19:52 next A that's put in is a dideoxy A. 
And that's the end of that 246 00:19:52 --> 00:19:57 polymerization reaction, and the fragments you're going to 247 00:19:57 --> 00:20:02 get out of it is this little red and blue composite there. 248 00:20:02 --> 00:20:06 You can do the same thing where you say actually in some molecules you 249 00:20:06 --> 00:20:10 get polymerization past the second A, and you keep going until you get to 250 00:20:10 --> 00:20:15 the next A. And at that point, by chance, you get a dideoxy ATP 251 00:20:15 --> 00:20:19 added to some molecules. That is the end of polymerization 252 00:20:19 --> 00:20:24 for those molecules. The chain terminates. 253 00:20:24 --> 00:20:28 For some molecules, however, you'll put in a regular dATP and the 254 00:20:28 --> 00:20:33 chain will continue. But it will terminate, 255 00:20:33 --> 00:20:38 excuse me, at the next A that's put in because you put a dideoxy A in. 256 00:20:38 --> 00:20:43 So in different molecules you're going to land up with a spectrum of 257 00:20:43 --> 00:20:47 elongated products of different length. All right? 258 00:20:47 --> 00:20:52 And what's crucial here is that the length of the molecules that chain 259 00:20:52 --> 00:20:57 terminate, because they incorporated dideoxy nucleotide, 260 00:20:57 --> 00:21:02 correspond to the position of that particular nucleotide 261 00:21:02 --> 00:21:07 along the chain. So you're only going to get a 262 00:21:07 --> 00:21:13 molecule chain terminating with A when there was a T on the template 263 00:21:13 --> 00:21:18 strand. OK? And so you can map the positions of the T on the template 264 00:21:18 --> 00:21:23 or the A on the elongated strand by the length of the elongated products 265 00:21:23 --> 00:21:29 that come out of this reaction. I'm going to assume you're with me 266 00:21:29 --> 00:21:33 here. OK. So the point is the polymerized 267 00:21:33 --> 00:21:37 fragments terminate where dideoxy A incorporates. 
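The link between fragment length and base position can be sketched in a few lines. This is only an illustration under stated assumptions: lengths are counted from the end of the primer (real fragments also include the primer), the function name is hypothetical, and the 15-base string is the newly made strand that starts GAGTAA in the handout example and is read off the gel later in the lecture.

```python
def termination_lengths(new_strand, dideoxy_base):
    """Possible fragment lengths when the reaction is spiked with one ddNTP.

    Lengths are 1-based and counted from the end of the primer, so a
    length of 2 means the chain stopped at the second added nucleotide.
    """
    return [position
            for position, base in enumerate(new_strand, start=1)
            if base == dideoxy_base]


new_strand = "GAGTAACGGTATGCA"  # 5 prime to 3 prime, primer not included
print(termination_lengths(new_strand, "A"))  # [2, 5, 6, 11, 15]
```

Each length in the list is a place where the new strand has an A, which is the same as a T on the template strand.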
Now, 268 00:21:37 --> 00:21:40 you've got to do four reactions to determine the sequence of something. 269 00:21:40 --> 00:21:44 OK. And I've noted here. And the length of the terminated fragment 270 00:21:44 --> 00:21:48 indicates the position of A. You may need to go and work with 271 00:21:48 --> 00:21:51 this a bit. OK? It's a very clever method but it 272 00:21:51 --> 00:21:55 may not be something that's immediately apparent, 273 00:21:55 --> 00:21:59 so go and work with it if you need to. 274 00:21:59 --> 00:22:03 So the length of the terminated fragments indicates the positions of 275 00:22:03 --> 00:22:07 A in the elongated strand, or if you want of T in the template 276 00:22:07 --> 00:22:11 strand. In order to get the positions of all the different 277 00:22:11 --> 00:22:16 nucleotides along that DNA fragment you have to do four separate 278 00:22:16 --> 00:22:20 reactions. One that includes dideoxy ATP, one that includes dideoxy CTP, 279 00:22:20 --> 00:22:25 one dideoxy GTP and one dideoxy TTP. 280 00:22:25 --> 00:22:29 And you do those separately so that you can monitor the positions of 281 00:22:29 --> 00:22:33 each of those four nucleotides by the position of chain terminating as 282 00:22:33 --> 00:22:38 you're going along. OK. So assuming that you guys are 283 00:22:38 --> 00:22:42 with me here at this point, are you? No. That's an honest 284 00:22:42 --> 00:22:46 answer. Raise your hands if you're with me. OK. If you're not with me, 285 00:22:46 --> 00:22:51 don't worry about it. You have to go work with it. 286 00:22:51 --> 00:22:55 It's not intuitive. It's very clever. I mean there's a reason 287 00:22:55 --> 00:23:00 this guy got the Nobel Prize for this. OK? 288 00:23:00 --> 00:23:03 It's a really clever method. OK. So the deal is this. 
So now 289 00:23:03 --> 00:23:07 what you get out of this is a whole mix of fragments of different 290 00:23:07 --> 00:23:11 lengths that have terminated at positions of particular nucleotides, 291 00:23:11 --> 00:23:14 depending on how you've spiked the reaction. And you've got to 292 00:23:14 --> 00:23:18 separate them from one another somehow to figure out what those 293 00:23:18 --> 00:23:22 positions are. And you can do this in a couple of 294 00:23:22 --> 00:23:26 ways. You can use gel electrophoresis, 295 00:23:26 --> 00:23:31 which was discussed with you previously, where you separate the 296 00:23:31 --> 00:23:36 DNA on the basis of size where the DNA migrates in a gel in an electric 297 00:23:36 --> 00:23:40 field and long fragments stay near the top of the gel and short 298 00:23:40 --> 00:23:45 fragments go to the bottom of the gel because they migrate quickly. 299 00:23:45 --> 00:23:50 And what you can do on a gel, and you've somehow labeled, 300 00:23:50 --> 00:23:55 don't worry about this right now, but somehow you're able to detect 301 00:23:55 --> 00:24:00 each of the fragments that has come out of your mix. 302 00:24:00 --> 00:24:04 OK? So remember you're doing the sequencing reaction on millions and 303 00:24:04 --> 00:24:08 millions or billions of molecules. And so you've got this kind of 304 00:24:08 --> 00:24:12 stochastic mix of molecules of different lengths. 305 00:24:12 --> 00:24:16 And you want to separate this mix of molecules of different lengths. 306 00:24:16 --> 00:24:20 OK. So what you can end up with, once you've separated all these 307 00:24:20 --> 00:24:24 different molecules, is in your dideoxy A reaction mix a 308 00:24:24 --> 00:24:28 series of one, two, three, four, 309 00:24:28 --> 00:24:33 five different sized fragments. In your ddG mix, 310 00:24:33 --> 00:24:37 you got out of that also a series of five different sized fragments. 
311 00:24:37 --> 00:24:42 And notice that they're different in size from the ones in the ddA 312 00:24:42 --> 00:24:47 lane, the ones in the ddC lane and the ones in the ddT lane. 313 00:24:47 --> 00:24:51 And the reason they're different in size is because their size indicates 314 00:24:51 --> 00:24:56 the position of where a particular nucleotide is in the DNA fragment or 315 00:24:56 --> 00:25:01 particular bases in the DNA fragment. 316 00:25:01 --> 00:25:06 And then the trick is you could look at this gel and you could read off 317 00:25:06 --> 00:25:11 the sequence. So the shortest fragments that you're going to get 318 00:25:11 --> 00:25:16 are the ones that are nearest the beginning of that molecule you made, 319 00:25:16 --> 00:25:21 nearest the 5 prime end. So the bottom one is G, 320 00:25:21 --> 00:25:26 here's the band in the ddG lane. Then up above it there is this band 321 00:25:26 --> 00:25:32 indicating a fragment in the ddA lane. 322 00:25:32 --> 00:25:38 Above it there's one in the G lane again. Above it there's one in the 323 00:25:38 --> 00:25:44 T lane. So the sequence goes G-A-G-T, and then you can keep 324 00:25:44 --> 00:25:50 reading A-A-C-G-G-T-A-T-G-C-A. OK? Literally like that on a gel. 325 00:25:50 --> 00:25:56 OK? So you can do that on a gel. It's really fantastic. 326 00:25:56 --> 00:26:02 And this is what old sequencing gels look like. 327 00:26:02 --> 00:26:05 And, actually, I used to run them. 328 00:26:05 --> 00:26:09 I used to spend hours and hours running these gels. 329 00:26:09 --> 00:26:13 They're very, very thin. They're about a millimeter thick 330 00:26:13 --> 00:26:16 acrylamide so that you can resolve the fragments that are one 331 00:26:16 --> 00:26:20 nucleotide different in size. Think about that. OK? Each of 332 00:26:20 --> 00:26:24 these fragments, indicated by a band, 333 00:26:24 --> 00:26:28 is one nucleotide different in size. Otherwise, you couldn't get the one 334 00:26:28 --> 00:26:32 nucleotide resolution. 
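Reading a four-lane gel from the bottom up amounts to sorting all the bands by fragment length and noting which lane each one is in. Here is a minimal sketch of that idea; the function name and the dictionary layout are assumptions for illustration, and the band lengths are the ones implied by the sequence read out above, counted from the end of the primer.

```python
def read_gel(lanes):
    """Read a sequence off a four-lane dideoxy gel.

    lanes maps each dideoxy base to the fragment lengths seen in its lane.
    The shortest fragment runs furthest, so sorting by length reads the
    gel from the bottom (5 prime end) to the top (3 prime end).
    """
    bands = [(length, base)
             for base, lengths in lanes.items()
             for length in lengths]
    bands.sort()  # shortest fragment first, i.e. bottom of the gel
    return "".join(base for _, base in bands)


lanes = {
    "A": [2, 5, 6, 11, 15],
    "C": [7, 14],
    "G": [1, 3, 8, 9, 13],
    "T": [4, 10, 12],
}
print(read_gel(lanes))  # GAGTAACGGTATGCA
```

The sketch relies on single-nucleotide resolution: every length from 1 to 15 appears in exactly one lane, so no two bands are ever at the same height.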
So you do that by running very, 335 00:26:32 --> 00:26:37 very thin gels so that you can resolve the fragments well, 336 00:26:37 --> 00:26:42 and then you read off the bottom. OK? I've thrown out all my old 337 00:26:42 --> 00:26:46 sequencing gels. And the reason that I have is that 338 00:26:46 --> 00:26:51 there is new technology where you don't use this kind of display 339 00:26:51 --> 00:26:56 anymore. This is a display where your fragments were labeled with 340 00:26:56 --> 00:27:01 radioactivity and you exposed them to x-ray film and you read the 341 00:27:01 --> 00:27:06 sequence after exposure. Nowadays this is done by machine. 342 00:27:06 --> 00:27:11 And the dideoxy nucleotides are labeled fluorescently. 343 00:27:11 --> 00:27:15 OK? So they're not labeled with radioactivity. 344 00:27:15 --> 00:27:20 They're literally labeled with labels that fluoresce with different 345 00:27:20 --> 00:27:25 colors when you put UV light on them. And you do your dideoxy reaction 346 00:27:25 --> 00:27:30 and you run a gel. Again, it's a gel. 347 00:27:30 --> 00:27:35 It's actually a very thin tube of a gel mostly, but you run your gel. 348 00:27:35 --> 00:27:41 And, again, it's the same idea. You resolve fragments at single base 349 00:27:41 --> 00:27:46 resolution, single nucleotide resolution, and they keep, 350 00:27:46 --> 00:27:52 the gel keeps running and running. And single fragments actually run 351 00:27:52 --> 00:27:57 off the bottom of the gel. And as they're passing down the gel 352 00:27:57 --> 00:28:03 they are detected by a laser. A laser excites the fluorochrome. 353 00:28:03 --> 00:28:06 And the detector, there is a detector which will 354 00:28:06 --> 00:28:10 detect whether or not it's yellow, orange, blue or green. OK? And 355 00:28:10 --> 00:28:14 that will tell you which base is being, has been incorporated at that 356 00:28:14 --> 00:28:18 position. So you get things that come out. 
It's kind of small but 357 00:28:18 --> 00:28:22 you can go back and look, where instead of getting a gel with 358 00:28:22 --> 00:28:26 those bands that I showed you, you get these peaks and valleys that 359 00:28:26 --> 00:28:30 are different colors. And that's what current DNA 360 00:28:30 --> 00:28:35 sequencing readout looks like. And, in fact, there are machines. 361 00:28:35 --> 00:28:40 What did I do? Lots of primers. Well, it depends. 362 00:28:40 --> 00:28:45 Many copies of the same primer, right. Yes. Dr. Gardel is pointing 363 00:28:45 --> 00:28:50 out that there are many copies of the same primer in a reaction mix. 364 00:28:50 --> 00:28:55 Certainly there are. There are billions of molecules in the 365 00:28:55 --> 00:29:00 reaction mix, and so there are billions of primers. 366 00:29:00 --> 00:29:03 OK, so you have to have a primer for each molecule. 367 00:29:03 --> 00:29:06 OK. And each band, you should realize, is not a single 368 00:29:06 --> 00:29:09 molecule. It's a composite of many, many molecules, many thousands of 369 00:29:09 --> 00:29:12 molecules that have all chain terminated at the same position. 370 00:29:12 --> 00:29:15 So what I want to point out here is that this is what today's readout 371 00:29:15 --> 00:29:19 looks like. And, in fact, nowadays you just get a 372 00:29:19 --> 00:29:22 printout from the company or from the machine that tells 373 00:29:22 --> 00:29:26 you a DNA sequence. And it's this improvement in 374 00:29:26 --> 00:29:31 technology, but that basically uses this chain termination method, 375 00:29:31 --> 00:29:36 that has allowed one to sequence, rapidly enough to sequence the human 376 00:29:36 --> 00:29:41 genome and to sequence multiple human genomes and multiple animals. 377 00:29:41 --> 00:29:46 OK. So let's see. Actually, I have a movie. I guess we can take 378 00:29:46 --> 00:29:51 the time to watch this movie. Let's see if it will work. All 379 00:29:51 --> 00:29:56 right. So primer template. 
Four reactions, each with lots of 380 00:29:56 --> 00:30:01 molecules, each with their primer. DNA polymerase, 381 00:30:01 --> 00:30:05 dNTPs, dATP, dGTP, dCTP, dTTP, dCTP, excuse me. 382 00:30:05 --> 00:30:09 OK. They're your four reactions. OK. I think is a less dorky movie 383 00:30:09 --> 00:30:13 than some. OK. So here we go. Here's your primer 384 00:30:13 --> 00:30:17 and your template, and here's polymerization. 385 00:30:17 --> 00:30:21 And, ah, there we go, chain termination, dideoxy nucleotide 386 00:30:21 --> 00:30:25 incorporation, and you cannot get elongation. 387 00:30:25 --> 00:30:30 The poor G is thwarted in its desire to elongate. OK? 388 00:30:30 --> 00:30:34 So you land up with this mix, just like I showed you, and you land 389 00:30:34 --> 00:30:38 up with a set of four reactions, each with molecules of different 390 00:30:38 --> 00:30:42 lengths in them. And here's your gel, 391 00:30:42 --> 00:30:47 and you load them on your gel, and they migrate through your 392 00:30:47 --> 00:30:51 electric field. And there you have your things, 393 00:30:51 --> 00:30:55 you have your fragments. This is a piece of x-ray film you put on top. 394 00:30:55 --> 00:31:00 There are your little bands, your radioactive bands, and here we go. 395 00:31:00 --> 00:31:04 GT, you can read it. OK. Enough. Enough. 396 00:31:04 --> 00:31:09 OK. You can go and look at this yourself. This is an old gel 397 00:31:09 --> 00:31:13 apparatus that one used to do DNA sequencing on. 398 00:31:13 --> 00:31:18 This was the first generation of machine that you could do the 399 00:31:18 --> 00:31:22 fluorescent sequencing on. This is a room full of sequencing 400 00:31:22 --> 00:31:27 machines of the kind that was used to sequence the human genome. 401 00:31:27 --> 00:31:30 In fact, many rooms of machines going all day and all night 402 00:31:30 --> 00:31:34 sequencing and sequencing and sequencing. We have a lot of 403 00:31:34 --> 00:31:37 nucleotides. 
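The chain termination idea shown in the movie can be sketched in a few lines, assuming an idealized experiment in which every possible termination occurs and every fragment resolves cleanly. The sequence and function names are made up for illustration.

```python
def chain_termination_fragments(new_strand):
    """Return, per dideoxy reaction, the lengths at which chains terminate.

    `new_strand` is the sequence synthesized off the primer (A, C, G, T only).
    In the ddG reaction, for example, some chains stop at every G position.
    """
    reactions = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        reactions[base].append(position)
    return reactions

def read_gel(reactions):
    """Read the sequence from the shortest fragment to the longest."""
    bands = [
        (length, base)
        for base, lengths in reactions.items()
        for length in lengths
    ]
    return "".join(base for _, base in sorted(bands))

strand = "GTCAGGT"  # made-up sequence
lanes = chain_termination_fragments(strand)
print(lanes["G"])       # [1, 5, 6]
print(read_gel(lanes))  # GTCAGGT
```

Each list in `lanes` plays the role of one lane on the gel, and sorting all bands by length corresponds to reading the film from the bottom up.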
And it takes a long time to sequence. 404 00:31:37 --> 00:31:41 Although, in retrospect it's not such a long time. 405 00:31:41 --> 00:31:45 And now all the sequencing machines that sequence the human genome are 406 00:31:45 --> 00:31:48 sitting around looking for other work because they all exist. 407 00:31:48 --> 00:31:52 And so that is why we are sequencing things like dolphins and 408 00:31:52 --> 00:31:56 dogs and multiple strains of dogs, multiple breeds, excuse me, of dogs 409 00:31:56 --> 00:32:00 because we have all these sequencing machines sitting around. 410 00:32:00 --> 00:32:04 OK. Honestly, I think that's true, 411 00:32:04 --> 00:32:08 not that it's not useful. All right. So I'm going to move on 412 00:32:08 --> 00:32:13 here. This is Professor Jacks' joke that I decided to use also. 413 00:32:13 --> 00:32:17 OK. This is something about DNA sequencing and the implications of 414 00:32:17 --> 00:32:21 being able to use DNA sequencing for genotyping. So I'm going to use 415 00:32:21 --> 00:32:26 that. You can go and read that on your thing. I'm going to move on 416 00:32:26 --> 00:32:30 right to talking about familial hypercholesterolemia and the notion 417 00:32:30 --> 00:32:35 of a disease allele. So here's part of the normal FH gene, 418 00:32:35 --> 00:32:40 the LDL receptor gene, and here it is. And there is a T 419 00:32:40 --> 00:32:45 here in red. And here is the mutant gene sequence and there is an A. 420 00:32:45 --> 00:32:50 So if you're wild type you have a T at this position that's arrowed and 421 00:32:50 --> 00:32:55 if you're a mutant you have an A. And if you do your conceptual 422 00:32:55 --> 00:33:00 protein translation here you get your amino acid, part of 423 00:33:00 --> 00:33:05 the amino acid chain. Obviously it's not at the beginning. 424 00:33:05 --> 00:33:09 And obviously this is DNA and this is protein, so we've removed the RNA 425 00:33:09 --> 00:33:14 here, the RNA step.
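A minimal sketch of the conceptual translation, showing how a single T to A change can turn a codon into a stop codon and cut the protein short. The nine-base sequence is invented for illustration and is not the actual LDL receptor sequence from the slide; the codon table is deliberately partial.

```python
# Partial codon table covering only the codons in this made-up example.
CODON_TABLE = {"GGC": "Gly", "TGT": "Cys", "AAC": "Asn", "TGA": "Stop"}

def translate(dna):
    """Translate DNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

wild_type = "GGCTGTAAC"                       # hypothetical sequence
mutant = wild_type[:5] + "A" + wild_type[6:]  # T -> A at one position

print(translate(wild_type))  # ['Gly', 'Cys', 'Asn']
print(translate(mutant))     # ['Gly']
```

As in the lecture, the sketch goes straight from DNA to protein and skips the RNA step.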
And you can see here is the amino 426 00:33:14 --> 00:33:19 acid of your wild type, the sequence of your wild type gene. 427 00:33:19 --> 00:33:23 And in your LDL receptor mutant there is a stop codon at this 428 00:33:23 --> 00:33:28 position that terminates the LDL receptor. And so the receptor gene 429 00:33:28 --> 00:33:33 is mutant and does not function as it should. 430 00:33:33 --> 00:33:38 OK. All right. So let me move onto the next thing 431 00:33:38 --> 00:33:43 I want to talk about, which is this question of 432 00:33:43 --> 00:33:48 polymorphisms. What is a polymorphism? 433 00:33:48 --> 00:34:03 Anyone. All right. 434 00:34:03 --> 00:34:07 I'll tell you what a polymorphism is. A polymorphism is defined as 435 00:34:07 --> 00:34:12 some kind of variation in DNA sequence. 436 00:34:12 --> 00:34:23 And it's defined as a variation in 437 00:34:23 --> 00:34:27 DNA sequence at a particular position. 438 00:34:27 --> 00:34:40 So our DNA, all of us have very 439 00:34:40 --> 00:34:45 similar DNA. If we were to sequence me and we were to sequence you and 440 00:34:45 --> 00:34:49 we were to sequence you, we would find that our DNA was 441 00:34:49 --> 00:34:54 greater than 99% identical. If we lined up our three times ten 442 00:34:54 --> 00:34:59 to the ninth base pairs in a very long line, we would find 443 00:34:59 --> 00:35:04 it was very similar. There was about 1% difference in 444 00:35:04 --> 00:35:10 sequence between each of us. And most of that, some of that 445 00:35:10 --> 00:35:15 corresponds to disease gene alleles. We all are supposed to carry about 446 00:35:15 --> 00:35:20 a thousand bad genes, or a thousand genes that if 447 00:35:20 --> 00:35:26 homozygous would give us something bad, and sometimes do. 448 00:35:26 --> 00:35:31 And some of those correspond to changes in differences in DNA 449 00:35:31 --> 00:35:37 sequence that are not directly in genes. 
450 00:35:37 --> 00:35:41 All of these differences between different individuals are called 451 00:35:41 --> 00:35:46 polymorphisms, DNA sequence variation. 452 00:35:46 --> 00:35:50 And you can use these to help figure out whether or not someone 453 00:35:50 --> 00:35:55 has a particular disease allele, and also you can use it to figure 454 00:35:55 --> 00:35:59 out where the DNA from a sample comes from me or from you 455 00:35:59 --> 00:36:04 or from Dr. Gardel. OK? And I'll talk about this, 456 00:36:04 --> 00:36:08 using polymorphisms to map genotype. I'm going to talk about a 457 00:36:08 --> 00:36:12 particular kind of polymorphism, and these are called SNPs which is 458 00:36:12 --> 00:36:17 pronounced “snip”. This stands for single nucleotide 459 00:36:17 --> 00:36:21 polymorphisms. So I've said again that human 460 00:36:21 --> 00:36:25 genomes are 99% identical, but there are throughout the genome 461 00:36:25 --> 00:36:30 changes, differences between regions. 462 00:36:30 --> 00:36:34 Single nucleotide polymorphisms are variations in one region. 463 00:36:34 --> 00:36:38 Here's a sample sequence I made up. Here's a G in one individual and an 464 00:36:38 --> 00:36:42 A in another individual. And if you take the population, 465 00:36:42 --> 00:36:47 you find very often that there just is a choice of two, 466 00:36:47 --> 00:36:51 sometimes more, but often just a choice of two nucleotides in one 467 00:36:51 --> 00:36:55 position. Most of the genomes are identical, but you find these little 468 00:36:55 --> 00:36:59 regions where in many individuals of a population there are 469 00:36:59 --> 00:37:04 these variations. In fact, these variations have to be 470 00:37:04 --> 00:37:08 present in more than 1% of the population for this thing to be 471 00:37:08 --> 00:37:12 called a SNP. This is a definition that humans have given but it's a 472 00:37:12 --> 00:37:16 useful definition as a genetic tool. 
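A rough sketch of the 1% rule, assuming we have two-letter genotypes at one position for a sample of individuals. It simplifies "present in more than 1% of the population" to "the rarer base makes up more than 1% of the chromosomes sampled"; the function names and sample data are made up.

```python
from collections import Counter

def allele_frequencies(genotypes):
    """Frequency of each base at one position, from two-letter genotypes."""
    counts = Counter(base for genotype in genotypes for base in genotype)
    total = sum(counts.values())
    return {base: count / total for base, count in counts.items()}

def looks_like_snp(genotypes, threshold=0.01):
    """True if the rarest base observed exceeds the frequency threshold."""
    freqs = allele_frequencies(genotypes)
    if len(freqs) < 2:
        return False  # everyone carries the same base: no variation
    return min(freqs.values()) > threshold

# Made-up sample: 96 people are GG, 4 people are GA.
sample = ["GG"] * 96 + ["GA"] * 4
print(allele_frequencies(sample)["A"])  # 0.02
print(looks_like_snp(sample))           # True
```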
So if there is a polymorphism 473 00:37:16 --> 00:37:20 present in about 1% of the population, whereby I might have an 474 00:37:20 --> 00:37:24 A here, excuse me, and Dr. Gardel has a G at that 475 00:37:24 --> 00:37:28 position, that would be a SNP, and we would be polymorphic for that 476 00:37:28 --> 00:37:32 SNP. In fact, my two chromosomes, 477 00:37:32 --> 00:37:38 OK, that are homologous chromosomes might on one copy carry an A and on 478 00:37:38 --> 00:37:43 the other copy carry a G. Now, these different bases are 479 00:37:43 --> 00:37:49 present at different frequencies. So, for example, it might be very 480 00:37:49 --> 00:37:54 common to have a G at this position in the sequence and it might be very 481 00:37:54 --> 00:38:00 rare to have an A at that position. All right? 482 00:38:00 --> 00:38:04 And that's useful because you can use the frequency of these different 483 00:38:04 --> 00:38:09 nucleotides, these different bases to help you use the SNP to genotype. 484 00:38:09 --> 00:38:13 And I want to point out that usually SNPs occur outside coding 485 00:38:13 --> 00:38:18 regions because 95%, actually more than that, 486 00:38:18 --> 00:38:22 99% of the genome is not coding per se. 95% is not genes, 487 00:38:22 --> 00:38:27 but then if you remove all the introns and promoters and so on, 488 00:38:27 --> 00:38:32 99% does not code for any protein. 489 00:38:32 --> 00:38:36 OK. So usually these SNPs are present outside coding regions. 490 00:38:36 --> 00:38:40 So here's to explore this a bit more. You can find lots of these 491 00:38:40 --> 00:38:44 SNPs. There are about three million SNPs in the human genome, 492 00:38:44 --> 00:38:49 and a very large percentage of those SNPs has been identified by DNA 493 00:38:49 --> 00:38:53 sequencing. So you can get the idea. You have to sequence DNA from lots 494 00:38:53 --> 00:38:57 and lots of individuals to identify these SNPs, but people 495 00:38:57 --> 00:39:02 have done it. 
And we know now more than a million 496 00:39:02 --> 00:39:06 SNPs in the human genome that are located all over different 497 00:39:06 --> 00:39:10 chromosomes, and we know where they're located on different 498 00:39:10 --> 00:39:14 chromosomes. And so you can use these SNPs to make kind of a map, 499 00:39:14 --> 00:39:19 I'll tell you in a moment. So here are some possible genotypes. 500 00:39:19 --> 00:39:23 I've given you a choice of two for each of these. 501 00:39:23 --> 00:39:27 OK? So, for example, for this red SNP here you can be AA, 502 00:39:27 --> 00:39:32 AC or CC on the two homologous chromosomes. 503 00:39:32 --> 00:39:36 All right. So let's keep going with this thread. So because you have 504 00:39:36 --> 00:39:41 these SNPs all over your genome and you know where they are, 505 00:39:41 --> 00:39:46 you can use them to make a map of your entire genome. 506 00:39:46 --> 00:39:51 That doesn't depend on the genes. It just depends on the sequence. 507 00:39:51 --> 00:39:56 And knowing these SNPs is a lot easier to work with than having to 508 00:39:56 --> 00:40:01 sequence the entire genome of somebody every time you 509 00:40:01 --> 00:40:06 want some information. So you can use these SNPs to 510 00:40:06 --> 00:40:11 identify each person. So I have a SNP map of all these 511 00:40:11 --> 00:40:16 hundreds of thousands of SNPs, or up to a million. The usual maps 512 00:40:16 --> 00:40:20 presently used are about 300,000 SNPs per genome. I have a map of 513 00:40:20 --> 00:40:25 300,000 SNPs where there are different, actually, 514 00:40:25 --> 00:40:30 I don't, but I could, where there are different alleles at 515 00:40:30 --> 00:40:35 different frequencies, different bases present at different 516 00:40:35 --> 00:40:40 frequencies at specific positions. And we could pick any one of you and 517 00:40:40 --> 00:40:44 make a SNP map for you.
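A small sketch of why a SNP map can distinguish individuals, assuming each SNP offers a choice of two bases. With two bases there are three possible genotypes per SNP, so the number of possible combinations grows as a power of three. The function names are made up for illustration.

```python
from itertools import combinations_with_replacement

def possible_genotypes(alleles):
    """All unordered pairs of bases across the two homologous chromosomes."""
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]

def possible_profiles(number_of_snps, genotypes_per_snp=3):
    """Number of distinct genotype combinations across independent SNPs."""
    return genotypes_per_snp ** number_of_snps

print(possible_genotypes("AC"))  # ['AA', 'AC', 'CC']
print(possible_profiles(4))      # 81
```

This counts combinations only. How likely two people are to share a combination also depends on how common each base is, which the count ignores.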
And it would look really different 518 00:40:44 --> 00:40:48 from mine, not because the SNPs themselves are different, 519 00:40:48 --> 00:40:53 they'd be the same SNPs, but the actual bases and the combination of 520 00:40:53 --> 00:40:57 bases between all these different SNPs would be different between 521 00:40:57 --> 00:41:01 different individuals. And this SNP-type map is the basis 522 00:41:01 --> 00:41:05 for DNA fingerprinting that is used in forensics and to figure out 523 00:41:05 --> 00:41:09 disease alleles. I'll talk more about this in a 524 00:41:09 --> 00:41:13 second. I want to point out that there are other kinds of 525 00:41:13 --> 00:41:16 polymorphisms that are used in genotyping, restriction fragment 526 00:41:16 --> 00:41:20 length polymorphisms and things called simple repeat polymorphisms. 527 00:41:20 --> 00:41:24 And you can look in your book for these restriction fragment length 528 00:41:24 --> 00:41:28 polymorphisms, but let's talk more about SNPs. 529 00:41:28 --> 00:41:32 So SNP genotyping, here's a whole list, 530 00:41:32 --> 00:41:36 but the ones I'm going to focus on are disease gene mapping and 531 00:41:36 --> 00:41:41 forensics. Also, you use SNP genotyping for paternity 532 00:41:41 --> 00:41:45 suits. OK? So if someone comes and, you know, if someone says it's my 533 00:41:45 --> 00:41:50 kid and the other one says it's my kid, you can figure out very easily 534 00:41:50 --> 00:41:54 whose it is by looking at these various SNPs and figuring out what 535 00:41:54 --> 00:41:59 pattern of SNPs is present in the offspring. OK. 536 00:41:59 --> 00:42:02 So let me actually consider, let me not deal with genotyping for 537 00:42:02 --> 00:42:06 disease alleles at this point. Let me talk about forensics a bit 538 00:42:06 --> 00:42:09 because it's kind of interesting. So how do you do this? Let's look 539 00:42:09 --> 00:42:13 through this slide. You have it as a handout. 540 00:42:13 --> 00:42:17 Here are SNPs. 
And I've just given you two chromosomes each with two 541 00:42:17 --> 00:42:20 SNPs. OK? And different people will have different bases at these 542 00:42:20 --> 00:42:24 particular SNPs, or they'll have different 543 00:42:24 --> 00:42:28 combinations of these bases. So here's the spot of blood at the 544 00:42:28 --> 00:42:32 crime scene. OK? Our red blood cells do not have 545 00:42:32 --> 00:42:37 nuclei so you cannot get DNA from those, but there are enough white 546 00:42:37 --> 00:42:42 blood cells that do have nuclei so you can. And, 547 00:42:42 --> 00:42:47 actually, you know from PCR now that you need very little to amplify 548 00:42:47 --> 00:42:52 something up by PCR. One cell is sufficient, 549 00:42:52 --> 00:42:58 right? It's pushing the technology, but you can really use one cell. 550 00:42:58 --> 00:43:03 So there are plenty of cells in a spot of blood at a crime scene to 551 00:43:03 --> 00:43:08 isolate the DNA and to PCR amplify the regions surrounding the SNP. 552 00:43:08 --> 00:43:13 So you're not just dealing with these two nucleotides or the choice 553 00:43:13 --> 00:43:19 of these two nucleotides at the SNP. You've got a little piece of DNA 554 00:43:19 --> 00:43:24 that's usually maybe 20 or so bases that includes this choice of single 555 00:43:24 --> 00:43:29 nucleotide polymorphism. So you amplify the SNP region, 556 00:43:29 --> 00:43:34 OK, a region that's constant, that includes the nucleotide polymorphism, 557 00:43:34 --> 00:43:39 and you determine the sequence at the different single nucleotide 558 00:43:39 --> 00:43:44 polymorphism regions. So you might get someone who, 559 00:43:44 --> 00:43:49 at the red position you can be A or C, at the green you can be G. 560 00:43:49 --> 00:43:54 OK, let's have an example here. You can get genotypes where at red 561 00:43:54 --> 00:43:59 you're A or C, green you're G or G, 562 00:43:59 --> 00:44:04 purple GT, and yellow you can be A or C.
563 00:44:04 --> 00:44:07 And here the example is C and C. So here are the four suspects, 564 00:44:07 --> 00:44:11 numbers one to four. OK. And here are their genotypes. 565 00:44:11 --> 00:44:15 OK. And here is the spot of blood at the crime scene that actually has 566 00:44:15 --> 00:44:18 this genotype. OK. So let me go back here. 567 00:44:18 --> 00:44:22 This is the genotype in the blood at the crime scene. 568 00:44:22 --> 00:44:26 OK. So the red sequence on one chromosome is an A, 569 00:44:26 --> 00:44:30 on the other is a C, so you have AC. 570 00:44:30 --> 00:44:34 On the other, the green sequence you have GG, purple you have GT, 571 00:44:34 --> 00:44:38 and yellow CC. So you're looking to see whether or not any of the 572 00:44:38 --> 00:44:43 suspect genotypes map up with a spot of blood, right? 573 00:44:43 --> 00:44:47 So we're assuming that a spot of blood, you know, 574 00:44:47 --> 00:44:52 comes from one of the suspects that was attacked by the person who was 575 00:44:52 --> 00:44:56 the victim. OK. So you have a victim with scratch. 576 00:44:56 --> 00:45:01 Someone has a spot of blood. And you see whether or not, 577 00:45:01 --> 00:45:05 or you can use semen samples, you can see whether or not the DNA 578 00:45:05 --> 00:45:09 in the human tissue that is believed to come from the attacker is 579 00:45:09 --> 00:45:13 matching of any of the suspects' genotypes. So there are a lot of 580 00:45:13 --> 00:45:17 assumptions there, right? You have to have tissue at 581 00:45:17 --> 00:45:21 the crime scene that you believe to come from the attacker. 582 00:45:21 --> 00:45:25 And then, once you have that, you can determine its genotype and 583 00:45:25 --> 00:45:30 compare it to the genotypes of the suspects. 
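The comparison step can be sketched as follows. The crime-scene genotype (red AC, green GG, purple GT, yellow CC) is the one read out in the lecture; the four suspect profiles are hypothetical, since the transcript does not list them, and are chosen so that suspect three matches as the slide indicates.

```python
def normalize(genotype):
    """Sort the two bases so that 'CA' and 'AC' compare as equal."""
    return "".join(sorted(genotype))

def matching_suspects(crime_scene, suspects):
    """Return the names of suspects whose genotype matches at every SNP."""
    target = {snp: normalize(g) for snp, g in crime_scene.items()}
    return [
        name
        for name, profile in suspects.items()
        if {snp: normalize(g) for snp, g in profile.items()} == target
    ]

crime_scene = {"red": "AC", "green": "GG", "purple": "GT", "yellow": "CC"}

# Hypothetical suspect genotypes, invented for this example.
suspects = {
    "suspect 1": {"red": "AA", "green": "GG", "purple": "GT", "yellow": "AC"},
    "suspect 2": {"red": "AC", "green": "GG", "purple": "TT", "yellow": "CC"},
    "suspect 3": {"red": "CA", "green": "GG", "purple": "TG", "yellow": "CC"},
    "suspect 4": {"red": "CC", "green": "GG", "purple": "GG", "yellow": "AA"},
}

print(matching_suspects(crime_scene, suspects))  # ['suspect 3']
```

Genotypes are normalized before comparison because which chromosome carries which base does not matter for the match.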
584 00:45:30 --> 00:45:35 And you find, for example, here that, let's see, yeah, 585 00:45:35 --> 00:45:40 so I believe the suspect number three has the same genotype as the 586 00:45:40 --> 00:45:45 DNA that was in the spot of blood at the crime scene. 587 00:45:45 --> 00:45:50 And that would be some evidence that this suspect number three was 588 00:45:50 --> 00:45:55 the person who did it. Now, in actual fact, you do this 589 00:45:55 --> 00:46:00 not just for four SNPs, you do it for thousands of SNPs. 590 00:46:00 --> 00:46:04 You don't usually do this for 300,000 SNPs because that's expensive and 591 00:46:04 --> 00:46:08 it's a lot of work. And forensics doesn't put that much 592 00:46:08 --> 00:46:12 money into this. However, the more SNPs you use for 593 00:46:12 --> 00:46:17 genotyping the more sure you are of the suspect's identity. 594 00:46:17 --> 00:46:21 OK? Because it's really a matter of frequency of whether or not 595 00:46:21 --> 00:46:25 you're going to get the same combination of these different SNP 596 00:46:25 --> 00:46:30 bases in different potential suspects. 597 00:46:30 --> 00:46:33 So the greater the spectrum of SNPs you look at, the more sure you are 598 00:46:33 --> 00:46:37 of the suspect's identity. Now, in some cases this has been 599 00:46:37 --> 00:46:41 very, very useful. And there are a number of people on 600 00:46:41 --> 00:46:45 Death Row who have been exonerated by going back to DNA recovered from 601 00:46:45 --> 00:46:49 the crime scene sometimes years ago, doing SNP mapping and showing that 602 00:46:49 --> 00:46:53 they really couldn't have done it because the genotypes did not match 603 00:46:53 --> 00:46:57 up. Usually these were rape cases and the semen genotype just did not 604 00:46:57 --> 00:47:01 match up with the semen genotype of the person on Death Row. 605 00:47:01 --> 00:47:05 So this is very valuable technology. OK. It was used in the O.J.
606 00:47:05 --> 00:47:10 Simpson trial, but not as well as it could have 607 00:47:10 --> 00:47:14 been which led to equivocation there. OK. So time is fleeting. 608 00:47:14 --> 00:47:19 I'm going to mention a technology to you in the last couple of minutes, 609 00:47:19 --> 00:47:24 and then we'll come back to it as we go on through later parts of the 610 00:47:24 --> 00:47:29 course. So I've talked today about DNA sequencing. 611 00:47:29 --> 00:47:33 I've talked about using polymorphisms to genotype people 612 00:47:33 --> 00:47:38 either, well, for disease alleles I focused on who-done-its. 613 00:47:38 --> 00:47:43 Something else that I want to throw out at you at this point is the 614 00:47:43 --> 00:47:48 notion of transgenic technology. And I'm going to tell you what 615 00:47:48 --> 00:47:52 transgenic organisms are as part of completing the Recombinant DNA 616 00:47:52 --> 00:47:57 Module. And then we'll come back in future modules and talk more about 617 00:47:57 --> 00:48:02 how you make these things. But I want to have this as part of 618 00:48:02 --> 00:48:07 your compendium now. A transgenic animal or transgenic 619 00:48:07 --> 00:48:12 organism is an organism where you have manipulated its genome in some 620 00:48:12 --> 00:48:17 way, where you've either inserted extra DNA into its genome or you've 621 00:48:17 --> 00:48:22 removed DNA from its genome or you've done something to its genome 622 00:48:22 --> 00:48:27 such that it was not the organism that you started off with. 623 00:48:27 --> 00:48:32 Genetically modified organisms. The food that you eat that is 624 00:48:32 --> 00:48:36 genetically modified has had its genome tampered with. 625 00:48:36 --> 00:48:40 This type of transgenic technology is very, very useful, 626 00:48:40 --> 00:48:44 not only for creating genetically modified foods, 627 00:48:44 --> 00:48:49 but it's very, very useful for creating disease models of animals.
628 00:48:49 --> 00:48:53 And I'll tell you now that there is a mouse model of human familial 629 00:48:53 --> 00:48:57 hypercholesterolemia that has been created by making a specific 630 00:48:57 --> 00:49:02 mutation, that T to A mutation in the mouse LDL receptor gene. 631 00:49:02 --> 00:49:06 Another thing that is extremely useful about transgenic animals is 632 00:49:06 --> 00:49:10 that you can get them to make specific proteins. 633 00:49:10 --> 00:49:14 So, for example, there are goats that have had inserted into their 634 00:49:14 --> 00:49:18 genomes genes that encode for particular medications, 635 00:49:18 --> 00:49:22 for particular drugs. And you can get these drugs out of the milk of 636 00:49:22 --> 00:49:26 the goats usually or out of the serum of the goats because they are 637 00:49:26 --> 00:49:30 constitutively producing them because you've put various genes 638 00:49:30 --> 00:49:35 into their genome. So I'm going to leave it there and 639 00:49:35 --> 00:49:38 we'll talk about how to make transgenics in a future lecture.