1 00:00:01 --> 00:00:08 Good morning. Good morning. 2 00:00:08 --> 00:00:14 So, I'd like to pick up where we left off last time and just finish 3 00:00:14 --> 00:00:20 off translation and then step back and look at how this central dogma 4 00:00:20 --> 00:00:26 of DNA is replicated into DNA, is read into RNA, and is translated 5 00:00:26 --> 00:00:30 into protein. Or, actually, as Francis Crick 6 00:00:30 --> 00:00:33 really put it, all information flow from nucleic 7 00:00:33 --> 00:00:37 acid to protein. How that varies amongst organisms. 8 00:00:37 --> 00:00:40 Because first we're going through it and looking at the absolutely 9 00:00:40 --> 00:00:43 common features, DNA replication, so it's five prime 10 00:00:43 --> 00:00:46 to three prime, et cetera, et cetera, 11 00:00:46 --> 00:00:50 transcription, translation. But in a moment I'd like to turn to 12 00:00:50 --> 00:00:53 the variations between different kinds of organisms. 13 00:00:53 --> 00:00:56 But let me briefly finish up, if I may, the bit about translation 14 00:00:56 --> 00:01:00 in general so we can look at its variation. 15 00:01:00 --> 00:01:06 As we talked about last time, we have a messenger RNA that has 16 00:01:06 --> 00:01:13 been transcribed from a specific region of the chromosome starting at 17 00:01:13 --> 00:01:20 a promoter and going to some stop of transcription. 18 00:01:20 --> 00:01:27 And that messenger RNA will include some particular sequence, 19 00:01:27 --> 00:01:34 and I'll copy one here, A-U-A-C-G-A-U-G-A-A-G-A-G-G-C-C-C, 20 00:01:34 --> 00:01:41 et cetera, et cetera, et cetera, out to a UAG. 21 00:01:41 --> 00:01:45 And this is the direction five prime to three prime. 22 00:01:45 --> 00:01:49 We'll remember that all nucleic acid polymerization goes five prime 23 00:01:49 --> 00:01:53 to three prime. So, what happens is the cell begins 24 00:01:53 --> 00:01:57 scanning this message. 
And it does that by this message 25 00:01:57 --> 00:02:01 being exported into the cytoplasm of the cell. The ribosome coming along 26 00:02:01 --> 00:02:05 and glomming onto this message and scanning on for the 27 00:02:05 --> 00:02:09 place to start. It looks, it looks, 28 00:02:09 --> 00:02:13 it looks, it looks, and it finds the first AUG. Footnote, 29 00:02:13 --> 00:02:18 this isn't 100% true. There are occasional messages that start their 30 00:02:18 --> 00:02:22 translation not at an AUG, and there are even occasional, 31 00:02:22 --> 00:02:27 there are even more messages that don't quite start at the first AUG 32 00:02:27 --> 00:02:31 because the ribosome really is looking for something a little bit 33 00:02:31 --> 00:02:36 special, but to a first-order approximation. 34 00:02:36 --> 00:02:40 Good enough for the textbooks. It goes along to the first AUG. 35 00:02:40 --> 00:02:44 In reality it's a little more subtle than that. 36 00:02:44 --> 00:02:48 But it starts at the first AUG. And what it does is it builds a 37 00:02:48 --> 00:02:52 protein that corresponds to it according to a three letter genetic 38 00:02:52 --> 00:02:56 code. And you all know the lookup table. It's in your book. 39 00:02:56 --> 00:03:00 AUG, always the first amino acid put in. A methionine. 40 00:03:00 --> 00:03:04 Then AAG. Lysine, I think. Then arginine. 41 00:03:04 --> 00:03:08 Then a proline. Now, I mean this is this particular sequence. 42 00:03:08 --> 00:03:12 Any other sequence would be different. Et cetera. 43 00:03:12 --> 00:03:17 How does it accomplish this matching between three letters of 44 00:03:17 --> 00:03:21 the genetic code? Oh, and when it gets to UAG, 45 00:03:21 --> 00:03:25 that is one of the three signals for stop, don't put in any more amino 46 00:03:25 --> 00:03:30 acids. There are three such stop signals. 47 00:03:30 --> 00:03:37 AUG, sorry, UAG, UGG and U, oops, what did I just do 48 00:03:37 --> 00:03:44 here? Let's get that right. UAG, UGG and UGA. 
Those are the 49 00:03:44 --> 00:03:51 three stop codons. So, how many total codons are there? 50 00:03:51 --> 00:03:59 64 codons. Three of them spell stop. 61 of them spell 51 00:03:59 --> 00:04:05 specific amino acids. And how many amino acids are there? 52 00:04:05 --> 00:04:11 20. So, the average redundancy is three. Some are specified by 53 00:04:11 --> 00:04:16 multiple codons. The most extreme is some amino 54 00:04:16 --> 00:04:21 acids are specified by as many as six codons. Did I, 55 00:04:21 --> 00:04:27 oh, thank you. Come back down. Of course. U-A, so it's UAG, right? 56 00:04:27 --> 00:04:32 Sorry, UAA and UGA and UAG. 57 00:04:32 --> 00:04:37 Thank you. Very good. All right. So, now, how does it accomplish 58 00:04:37 --> 00:04:42 this feat of taking amino acids, of taking nucleotide sequence, RNA 59 00:04:42 --> 00:04:47 sequence and converting it into the sequence of amino acids? 60 00:04:47 --> 00:04:52 As I mentioned last time, there was lots of original somewhat 61 00:04:52 --> 00:04:57 nutty thinking about some looping codes that would make the RNA fold 62 00:04:57 --> 00:05:03 up in such a way to bind the amino acids and all that. 63 00:05:03 --> 00:05:12 But, as Francis Crick thought up, there had to be some kind of an 64 00:05:12 --> 00:05:21 adapter molecule that would take the RNA sequence and would somehow 65 00:05:21 --> 00:05:30 connect it up to the correct amino acid, and that was UAC. 66 00:05:30 --> 00:05:35 A particular transfer RNA molecule. And the tRNA molecule is an adapter 67 00:05:35 --> 00:05:40 sequence that has three nucleotides here that match up to the three 68 00:05:40 --> 00:05:45 nucleotides of the codon that we're trying to translate, 69 00:05:45 --> 00:05:50 and it has the appropriate amino acid that's been stuck on the end of 70 00:05:50 --> 00:05:55 it. And how does it get there? How does the right tRNA, the tRNA 71 00:05:55 --> 00:06:00 to match this codon have the right amino acid put on it? 
72 00:06:00 --> 00:06:03 There's a dedicated enzyme that recognizes that tRNA and puts on 73 00:06:03 --> 00:06:07 that amino acid. It's aminoacyl-tRNA synthetase. 74 00:06:07 --> 00:06:11 It sticks the right amino on the right transfer RNA. 75 00:06:11 --> 00:06:15 So, that's how it accomplishes the physical recognition of these three 76 00:06:15 --> 00:06:18 bases and has the right amino acid attached to it. 77 00:06:18 --> 00:06:22 There's an enzymatic machinery that has all of these tRNAs floating 78 00:06:22 --> 00:06:26 around in the cell which can be used for this translation here. 79 00:06:26 --> 00:06:30 How does this actually happen physically? 80 00:06:30 --> 00:06:35 It happens in this vast machine called the ribosome. 81 00:06:35 --> 00:06:41 In the ribosome, if we have, say, our codon here and we have a 82 00:06:41 --> 00:06:47 tRNA that, well, we'll put that actually in the 83 00:06:47 --> 00:06:53 ribosome that, say, has the first amino acid here, 84 00:06:53 --> 00:06:59 methionine, there's a cavity for this guy and there's a cavity 85 00:06:59 --> 00:07:05 for the next guy. And other tRNAs come into the cell 86 00:07:05 --> 00:07:11 carrying their next amino acid. Maybe it will be here a lysine that 87 00:07:11 --> 00:07:17 matches up with the codon and the anti-codon. And when the right tRNA 88 00:07:17 --> 00:07:23 fits in the next cavity over, the ribosome itself catalyzes a 89 00:07:23 --> 00:07:30 peptide bond between these amino acids. 90 00:07:30 --> 00:07:35 Then it chugs over by one, it translocates by one moving this 91 00:07:35 --> 00:07:40 bit of the complex to the left, and the peptide chain continues to 92 00:07:40 --> 00:07:45 grow out this end as each new codon is moved into position, 93 00:07:45 --> 00:07:50 a tRNA comes in bringing the right amino acid until finally a stop 94 00:07:50 --> 00:07:55 codon is hit. And what happens when you hit a stop codon? 95 00:07:55 --> 00:08:00 It stops. And is there a tRNA for a stop? 
96 00:08:00 --> 00:08:02 It turns out there's not. There actually isn't. There's some 97 00:08:02 --> 00:08:05 other factor. There's a protein factor that helps recognize the 98 00:08:05 --> 00:08:08 stops. So, that just continues to chug on. Those of you who are 99 00:08:08 --> 00:08:11 computer scientists or mathematicians will recognize this 100 00:08:11 --> 00:08:14 is a two-tape Turing machine. It is the smallest two-tape Turing 101 00:08:14 --> 00:08:17 machine that I know to exist. If you don't know what that means, 102 00:08:17 --> 00:08:20 you can forget about that comment. In any case, but some of you know 103 00:08:20 --> 00:08:23 what that is. So, that's how it proceeds. 104 00:08:23 --> 00:08:26 That is your basic protein translation. 105 00:08:26 --> 00:08:28 And, I must say, what I really love about this is 106 00:08:28 --> 00:08:31 that Francis Crick kind of figured out what had to happen just on first 107 00:08:31 --> 00:08:34 principles and was able to think through it much more clearly and 108 00:08:34 --> 00:08:37 direct people to know what to look for in the laboratory. 109 00:08:37 --> 00:08:40 And if people had not had the clarity of thinking that Crick 110 00:08:40 --> 00:08:43 provided by saying, look, there's got to be this kind of 111 00:08:43 --> 00:08:46 adapter, I don't think they would have found it as quickly. 112 00:08:46 --> 00:08:49 But once he said this is what you've got to look for, 113 00:08:49 --> 00:08:52 golly, it was there. You can't do that very often, 114 00:08:52 --> 00:08:55 but Francis Crick seemed to have a very good track record of doing 115 00:08:55 --> 00:08:58 those things. OK. So, that was just finishing off 116 00:08:58 --> 00:09:02 translation. Now what I'd like to do is turn to 117 00:09:02 --> 00:09:08 variations on the theme as the major issue for today. 
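The translation procedure described above (scan to the first AUG, read three letters at a time, stop at UAA, UAG or UGA) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the codon table is deliberately partial and holds just the codons from the blackboard example, the "et cetera" in the example message is assumed to be empty so that UAG directly follows CCC, and the names `translate`, `CODON_TABLE` and `STOP_CODONS` are made up for this sketch. It also uses the textbook rule of starting at the first AUG, which the lecture notes is only a first-order approximation.

```python
# Partial codon table: only the codons in the lecture's example message.
# A full table would have 64 entries (61 amino-acid codons plus 3 stops).
CODON_TABLE = {
    "AUG": "Met",
    "AAG": "Lys",
    "AGG": "Arg",
    "CCC": "Pro",
}
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna):
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")  # textbook rule: begin at the first AUG
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    # Step through the message one complete codon (three bases) at a time.
    for i in range(start, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # a stop codon adds no amino acid
        # "Xaa" is a placeholder for codons missing from this partial table.
        peptide.append(CODON_TABLE.get(codon, "Xaa"))
    return peptide


# Example message from the lecture, with the elided middle assumed empty.
message = "AUACGAUGAAGAGGCCCUAG"
print(translate(message))
```

The leading A-U-A-C-G is skipped because it sits before the first AUG, and the final UAG ends the chain without contributing an amino acid.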
118 00:09:08 --> 00:09:14 How does this central dogma, DNA replicates, is transcribed into 119 00:09:14 --> 00:09:20 RNA and is translated into protein, vary amongst the different kinds of 120 00:09:20 --> 00:09:26 organisms that we might be interested in? 121 00:09:26 --> 00:09:32 The kinds of organisms we might be interested in, 122 00:09:32 --> 00:09:39 eukaryotes, prokaryotes, viruses. Sample eukaryote, 123 00:09:39 --> 00:09:46 MIT undergraduate. Prokaryote, E. coli. And virus, many possible 124 00:09:46 --> 00:09:53 viruses. The eukaryotes: big nucleated cells. 125 00:09:53 --> 00:10:00 So, in here we're going to have our nucleated cells. 126 00:10:00 --> 00:10:04 DNA living in there. In our prokaryotes we have no 127 00:10:04 --> 00:10:09 distinct nucleus. The DNA is not in a distinct 128 00:10:09 --> 00:10:14 nucleus, although it's not entirely freely floating around. 129 00:10:14 --> 00:10:19 It tends to be clustered together. In the virus the nucleic acid 130 00:10:19 --> 00:10:24 resides in some kind of a capsid, some kind of a, it could be a 131 00:10:24 --> 00:10:29 protein capsid. There are some of them that have 132 00:10:29 --> 00:10:34 lipid capsids with lipid particles around them, but some kind of a coat 133 00:10:34 --> 00:10:39 around nucleic acid there. Do they all do exactly the same 134 00:10:39 --> 00:10:44 things with regard to DNA replication, RNA transcription and 135 00:10:44 --> 00:10:49 protein translation? Well, not entirely. So, 136 00:10:49 --> 00:10:54 as a way, in a way to reinforce what we know about these, 137 00:10:54 --> 00:11:00 let's look at how they differ. DNA replication. Eukaryotes. 138 00:11:00 --> 00:11:06 What's the structure of one of your chromosomes? Is it a long line, 139 00:11:06 --> 00:11:12 a long linear molecule, or is it a circular molecule? 140 00:11:12 --> 00:11:18 How many of you have linear chromosomes? How many of you have 141 00:11:18 --> 00:11:24 circular chromosomes? 
I heard there were some people with 142 00:11:24 --> 00:11:30 circular. And how many of you are unsure about your chromosomes? 143 00:11:30 --> 00:11:38 OK. That's good. Well, then I'm pleased to inform 144 00:11:38 --> 00:11:47 you that you have long linear chromosomes. Every human chromosome 145 00:11:47 --> 00:11:55 is a long double-stranded molecule of DNA. Linear double-stranded DNA. 146 00:11:55 --> 00:12:02 They can be extremely long. You have 23 chromosomes, 147 00:12:02 --> 00:12:08 and together they make up three billion nucleotides of DNA. 148 00:12:08 --> 00:12:13 A typical chromosome could be 150 million bases long as an average 149 00:12:13 --> 00:12:19 size for a chromosome. And it's a single connected 150 00:12:19 --> 00:12:24 molecule. 150 million bases long in the human is a typical chromosome. 151 00:12:24 --> 00:12:30 One tricky little bit about replicating DNA. 152 00:12:30 --> 00:12:34 Let's just think back to our little model of replicating DNA. 153 00:12:34 --> 00:12:38 Let's come to the chromosome end here. It's five prime to three 154 00:12:38 --> 00:12:42 prime. Five prime to three prime. We're going to start replicating. 155 00:12:42 --> 00:12:47 We're getting to the end of chromosome number one. 156 00:12:47 --> 00:12:51 We've got a primer here, and the primer is going to be used 157 00:12:51 --> 00:12:55 to extend, extend, extend. We get right to the end. 158 00:12:55 --> 00:13:00 That's good. Tell me how we're going to replicate back. 159 00:13:00 --> 00:13:04 We need a little primer to start it, right? And where's that primer 160 00:13:04 --> 00:13:09 going to land? Maybe over here it will start 161 00:13:09 --> 00:13:14 replicating back. Oh, boy, we haven't done this 162 00:13:14 --> 00:13:19 figure. So, what do we have to do there? So, we need a primer a 163 00:13:19 --> 00:13:24 little further back. OK. 
But, you know what, 164 00:13:24 --> 00:13:29 the chance that we're going to get that right at the end, 165 00:13:29 --> 00:13:34 that we're going to get a primer exactly at the end is pretty low. 166 00:13:34 --> 00:13:37 And if we don't have a primer exactly at the end, 167 00:13:37 --> 00:13:41 what's going to be wrong with that copy of the chromosome? 168 00:13:41 --> 00:13:45 Too short. Now, big deal. So, it's short by maybe 20 bases. 169 00:13:45 --> 00:13:49 But that's just this cell division. What about the next cell division? It 170 00:13:49 --> 00:13:53 will be short on average by a little bit, and then the next cell division 171 00:13:53 --> 00:13:57 and the next cell division. It's actually pretty tricky to 172 00:13:57 --> 00:14:01 replicate a linear chromosome on the lagging strand, 173 00:14:01 --> 00:14:05 unless you can land the primer in exactly the right place, 174 00:14:05 --> 00:14:09 which doesn't happen. So, a special little solution is 175 00:14:09 --> 00:14:15 used. The ends of chromosomes here are called telomeres, 176 00:14:15 --> 00:14:21 telo meaning end. These telomeres have very specific structures. 177 00:14:21 --> 00:14:27 In the human they repeat, T-T-A-G-G-G, again and 178 00:14:27 --> 00:14:32 again and again. At the end of the chromosome there's 179 00:14:32 --> 00:14:36 a special enzyme that will come along and add some extra telomere to 180 00:14:36 --> 00:14:41 the chromosome. That, sorry? Did I say leading 181 00:14:41 --> 00:14:46 strand? It's the, oh, yeah, sorry. It's the lagging, 182 00:14:46 --> 00:14:50 sorry. It's the leading strand. No, no, no, this is the lagging strand. 183 00:14:50 --> 00:14:55 This is the leading strand because it's running along happily not 184 00:14:55 --> 00:15:00 having to make a primer. The Okazaki fragment should be here. 185 00:15:00 --> 00:15:04 I'll stick by that. We'll debate it later. 186 00:15:04 --> 00:15:08 Anyway, they, we get the point. 
But it's lagging because you've got 187 00:15:08 --> 00:15:12 the Okazaki fragments there. So, anyway, we have a problem of 188 00:15:12 --> 00:15:16 replication. And the way the cell solves it is the actual replication 189 00:15:16 --> 00:15:20 is shorter, but since it manages to stick some repeat at the end of the 190 00:15:20 --> 00:15:24 chromosome it adds back some more T-T-A-G-G-G, T-T-A-G-G-G, 191 00:15:24 --> 00:15:29 T-T-A-G-G-G, and it keeps dynamically adding more. 192 00:15:29 --> 00:15:33 What do you think would happen if you didn't, or what's the enzyme 193 00:15:33 --> 00:15:37 that adds telomeres? Telomerase. Telomerase adds that. 194 00:15:37 --> 00:15:41 What cells do you think need to have active telomerase? 195 00:15:41 --> 00:15:45 Rapidly dividing cells would need to have telomerase. 196 00:15:45 --> 00:15:49 Cells that are not rapidly dividing, cells that have stopped dividing can 197 00:15:49 --> 00:15:53 shut off their telomerase. But if a cell is going to go 198 00:15:53 --> 00:15:57 through lots and lots of cell divisions it's got to, 199 00:15:57 --> 00:16:01 it's got to tidy up its telomeres each time because they're 200 00:16:01 --> 00:16:06 getting too short. You've got to have an enzyme that's 201 00:16:06 --> 00:16:10 adding back ends of chromosomes. What cells do you think 202 00:16:10 --> 00:16:14 particularly care about having telomerase on? 203 00:16:14 --> 00:16:19 Cancers. It turns out that this is not a trivial point. 204 00:16:19 --> 00:16:23 More than 90% of cancers actively turn on the telomerase gene, 205 00:16:23 --> 00:16:27 which would be shut off in normal cells because the cell is 206 00:16:27 --> 00:16:32 not dividing anymore. Part of becoming a cancer is having 207 00:16:32 --> 00:16:36 to turn on this repair mechanism for the ends, this extension mechanism 208 00:16:36 --> 00:16:40 for the ends of your chromosomes. 
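The end-replication argument is really a piece of arithmetic, so a toy simulation may help. The sketch below assumes the lecture's "maybe 20 bases" lost per division and the human T-T-A-G-G-G repeat; the starting telomere length of 6,000 bases and the four repeats added back per division are hypothetical numbers chosen only to make the trend visible, and the function names are invented for this example.

```python
REPEAT = "TTAGGG"  # human telomere repeat named in the lecture


def divide(telomere_len, loss_per_division=20, telomerase_repeats=0):
    """Return the telomere length (in bases) after one cell division.

    loss_per_division: bases lost because the last lagging-strand primer
        does not land exactly at the chromosome end (lecture: "maybe 20").
    telomerase_repeats: number of TTAGGG repeats telomerase adds back
        (hypothetical value; 0 models a cell with telomerase shut off).
    """
    shortened = max(telomere_len - loss_per_division, 0)
    return shortened + telomerase_repeats * len(REPEAT)


def simulate(start_len, divisions, **kwargs):
    """Apply `divide` repeatedly and return the final telomere length."""
    length = start_len
    for _ in range(divisions):
        length = divide(length, **kwargs)
    return length


# Hypothetical starting length of 6,000 bases, followed for 100 divisions.
print(simulate(6000, 100))                        # telomerase off
print(simulate(6000, 100, telomerase_repeats=4))  # telomerase on
```

With telomerase off the end shrinks by a fixed amount every division, while adding back four repeats (24 bases) slightly outpaces the 20-base loss. That is the sense in which a cell that keeps dividing needs the enzyme switched on.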
And so, various people are trying 209 00:16:40 --> 00:16:44 to make drugs to inhibit cancers by inhibiting this telomerase enzyme. 210 00:16:44 --> 00:16:49 So, understanding just your linear replication of chromosomes is a kind 211 00:16:49 --> 00:16:53 of useful thing even in dealing with things like cancer. 212 00:16:53 --> 00:16:57 Genome sizes. I mentioned, how big was the human genome? 213 00:16:57 --> 00:17:02 Three times ten to the ninth bases. The mouse genome? 214 00:17:02 --> 00:17:06 It's almost as big, about 2.7 times ten to the ninth 215 00:17:06 --> 00:17:11 bases, 2.7 billion bases. The elephant genome? I actually 216 00:17:11 --> 00:17:15 just found this out last week because we just finished sequencing 217 00:17:15 --> 00:17:20 elephant DNA, and I can now tell you I think it's 3. 218 00:17:20 --> 00:17:25 . The dog is 2. times ten to the ninth. 219 00:17:25 --> 00:17:29 Anyway, it's about, for most mammals it's pretty close 220 00:17:29 --> 00:17:33 to three billion bases. And there is some fluctuation. 221 00:17:33 --> 00:17:37 Some are a little bigger. Some are a little smaller. 222 00:17:37 --> 00:17:41 It doesn't scale with the size of the animal, though, 223 00:17:41 --> 00:17:45 because the dog has a smaller genome, for example, than the mouse does, 224 00:17:45 --> 00:17:48 but the elephant is a bit bigger than us. And check in later in the 225 00:17:48 --> 00:17:52 term, I'll tell you about the aardvark. We should know in a 226 00:17:52 --> 00:17:56 little while. But here are, for example, fruit flies. The fruit 227 00:17:56 --> 00:18:00 fly, it has a genome of two times ten to the eighth. 228 00:18:00 --> 00:18:04 I'm giving, I'm being quite approximate. In fact, 229 00:18:04 --> 00:18:08 I'll make it, I'll give you 1.5 times ten to the eighth. 150 230 00:18:08 --> 00:18:12 million bases. Yeast, by contrast, 231 00:18:12 --> 00:18:17 has a genome of 1.2 times ten to the seventh. 
So, that's 12 million, 232 00:18:17 --> 00:18:21 150 million give or take, and about three billion, 233 00:18:21 --> 00:18:25 so 3,000 million. So, genome sizes can vary quite 234 00:18:25 --> 00:18:30 dramatically amongst different eukaryotes. 235 00:18:30 --> 00:18:36 Now, what about prokaryotes? How do the prokaryotes differ? 236 00:18:36 --> 00:18:43 Prokaryotes differ because their genomes are typically not linear 237 00:18:43 --> 00:18:49 chromosomes. The typical prokaryotic chromosome is a 238 00:18:49 --> 00:18:56 double-stranded circle. It's a double-stranded circular DNA. 239 00:18:56 --> 00:19:02 Now, the double-stranded circular 240 00:19:02 --> 00:19:07 DNA doesn't have this problem of telomeres. You just keep 241 00:19:07 --> 00:19:12 replicating around and you get to the end. So, there you have a much 242 00:19:12 --> 00:19:17 simpler replication system than having to worry about your ends of 243 00:19:17 --> 00:19:22 chromosomes. You also have much smaller genomes. 244 00:19:22 --> 00:19:27 The typical prokaryotic genome size, it's on the order of a few million 245 00:19:27 --> 00:19:31 bases. E. coli, 4.6 million bases. There are, for example, 246 00:19:31 --> 00:19:35 mycobacteria, such as the mycobacteria that cause 247 00:19:35 --> 00:19:39 tuberculosis or leprosy, have on the order of, well, 248 00:19:39 --> 00:19:43 actually, not quite them, but other mycobacteria have on the order of 249 00:19:43 --> 00:19:46 about a million bases or so. Mycoplasma, M. genitalium, has 250 00:19:46 --> 00:19:50 actually slightly less than a million bases. 251 00:19:50 --> 00:19:54 So, these are basically several million bases. 252 00:19:54 --> 00:19:58 So, there's a huge variation in genome size. 253 00:19:58 --> 00:20:02 Your genome is about a thousand times bigger than E. 254 00:20:02 --> 00:20:07 coli's genome. Now, you do actually have one circular chromosome. 255 00:20:07 --> 00:20:11 Do you know what it is? 
I speak about the 23 pairs of human 256 00:20:11 --> 00:20:16 chromosomes. There's actually one more human chromosome. 257 00:20:16 --> 00:20:20 The mitochondria have their own chromosome. It's a circle. 258 00:20:20 --> 00:20:25 That's very odd that you would have one chromosome that's a circle that 259 00:20:25 --> 00:20:30 looks like a bacterial chromosome. Do you know why that is? 260 00:20:30 --> 00:20:34 The mitochondria arose as a symbiotic bacterium that became a 261 00:20:34 --> 00:20:38 symbiont of eukaryotic cells about 1. billion years ago. 262 00:20:38 --> 00:20:42 It was a bacterium taken up into another cell, and that's how 263 00:20:42 --> 00:20:46 eukaryotes evolved. And we can even see that little 264 00:20:46 --> 00:20:50 signature of it having been a prokaryote from the fact that it's 265 00:20:50 --> 00:20:54 got one of these circular prokaryotic-looking chromosomes. 266 00:20:54 --> 00:20:58 Now, it, because it's living in your cells, has thrown out all sorts 267 00:20:58 --> 00:21:02 of genes that it doesn't need anymore because the main, 268 00:21:02 --> 00:21:06 the nucleus supplies most of the proteins. 269 00:21:06 --> 00:21:10 So, your mitochondrial genome is a circle that's a mere 16, 270 00:21:10 --> 00:21:14 00 bases long. It's a very small circle encoding a very limited 271 00:21:14 --> 00:21:18 number of genes, but it's, in fact, 272 00:21:18 --> 00:21:22 the residue of the bacterial symbiont that led to the formation 273 00:21:22 --> 00:21:26 of euks. Now, viruses, what do viruses have? 274 00:21:26 --> 00:21:30 Do they have double-stranded linear chromosomes? Which is it? 275 00:21:30 --> 00:21:38 Is it double-stranded linear DNA or is it double-stranded circular DNA? 276 00:21:38 --> 00:21:46 Circular DNA. So, who votes for linear? Who votes for circular? 277 00:21:46 --> 00:21:54 Who's undecided? Ah, the undecided are very large here. So, 278 00:21:54 --> 00:22:01 the answer is both. 
Some viruses have double-stranded 279 00:22:01 --> 00:22:07 linear DNA. Some viruses have double-stranded circular DNA. 280 00:22:07 --> 00:22:14 It's worse than that, though. Some viruses have single-stranded 281 00:22:14 --> 00:22:20 linear, circular DNA. Ha? How does that work? 282 00:22:20 --> 00:22:26 Some viruses actually infect the cell injecting DNA, 283 00:22:26 --> 00:22:32 and it's just single-stranded. As soon as it gets into the cell, 284 00:22:32 --> 00:22:36 however, it's replicated to make a double-stranded DNA which can then 285 00:22:36 --> 00:22:41 be transcribed, et cetera, et cetera, 286 00:22:41 --> 00:22:46 et cetera. But it travels around as a single-stranded piece of DNA. 287 00:22:46 --> 00:22:50 And it's actually weirder than that. Some viruses, 288 00:22:50 --> 00:22:55 viruses being very small can experiment with all 289 00:22:55 --> 00:23:02 sorts of things. Some viruses actually consist not of 290 00:23:02 --> 00:23:10 DNA at all but of RNA, single-stranded RNA. How does it do 291 00:23:10 --> 00:23:18 that? So, in other words, in the capsid there's a single 292 00:23:18 --> 00:23:26 strand of RNA. When it gets into the cell, 293 00:23:26 --> 00:23:32 what does it do? Sorry? It creates DNA. 294 00:23:32 --> 00:23:36 How does it create DNA? From the RNA. How's it going to do 295 00:23:36 --> 00:23:41 that? Well, how is it going to turn itself into DNA? 296 00:23:41 --> 00:23:46 It needs an enzyme to do that? Reverse transcriptase. You would 297 00:23:46 --> 00:23:50 like to reverse the transcription process, and you would like to name 298 00:23:50 --> 00:23:55 that reverse transcriptase. And where are you going to get this 299 00:23:55 --> 00:24:00 reverse transcriptase from? Laying around. Laying around where? 300 00:24:00 --> 00:24:04 I mean the cell is just sitting there with reverse transcriptase 301 00:24:04 --> 00:24:08 waiting to obligingly reverse transcribe this virus? 302 00:24:08 --> 00:24:12 Your RNA. So make it how? 
With ribosomes. So, in other words, 303 00:24:12 --> 00:24:17 if I'm an RNA, why don't I encode the sequence for 304 00:24:17 --> 00:24:21 reverse transcriptase and actually translate myself? 305 00:24:21 --> 00:24:25 So, if you were really clever, you might decide to put in the 306 00:24:25 --> 00:24:30 genetic code for reverse transcriptase. 307 00:24:30 --> 00:24:36 And when that message gets into the cell, it will first act as an mRNA, 308 00:24:36 --> 00:24:42 a messenger RNA, translate, make, here's the reverse transcriptase 309 00:24:42 --> 00:24:48 enzyme, which is then going to go, and it's going to reverse transcribe 310 00:24:48 --> 00:24:54 this thing into, say, DNA. So, wow. 311 00:24:54 --> 00:25:00 Now, that's a good one. This is a plus-strand virus. 312 00:25:00 --> 00:25:05 It encodes its own reverse transcriptase in its instructions. 313 00:25:05 --> 00:25:10 There actually are minus-strand viruses that don't, 314 00:25:10 --> 00:25:15 but what they do is instead in their own code, in their own package bring 315 00:25:15 --> 00:25:20 along a reverse transcriptase. So, either you can encode your own 316 00:25:20 --> 00:25:25 reverse transcriptase or in the package you can include your own 317 00:25:25 --> 00:25:30 reverse transcriptase. Do you know any viruses? 318 00:25:30 --> 00:25:36 And then the reverse transcriptase is then used to transcribe the DNA, 319 00:25:36 --> 00:25:43 the RNA into DNA, and eventually into a double-stranded DNA which, 320 00:25:43 --> 00:25:49 in some of the viruses, can then be slammed into and inserted into your 321 00:25:49 --> 00:25:56 own chromosomes. So, a DNA copy of the virus can be 322 00:25:56 --> 00:26:03 installed into your own chromosomes, which is somewhat insidious. 323 00:26:03 --> 00:26:08 What viruses do you know that do this? HIV. 
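At the level of sequence, the RNA-to-DNA step just described amounts to base pairing run in the unusual direction. A minimal sketch, assuming a short made-up RNA and ignoring everything about the real enzyme (primers, RNA degradation, error rate): `reverse_transcribe` builds the complementary DNA strand, and `second_strand` builds the partner that makes it double-stranded. Both function names are hypothetical.

```python
# Base-pairing rules: an RNA template pairs A-T, U-A, G-C, C-G with new DNA.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_transcribe(rna):
    """Return the DNA strand complementary to `rna`, written 5' to 3'."""
    # The new strand is antiparallel to its template, hence reversed().
    return "".join(RNA_TO_DNA[base] for base in reversed(rna))


def second_strand(dna):
    """Return the DNA strand complementary to `dna`, written 5' to 3'."""
    return "".join(DNA_TO_DNA[base] for base in reversed(dna))


# Made-up viral RNA fragment, used purely for illustration.
rna = "AUGAAGAGGCCC"
cdna = reverse_transcribe(rna)
print(cdna)
print(second_strand(cdna))
```

The second DNA strand comes out identical to the original RNA with T in place of U, which is why a double-stranded DNA copy carries the same information as the viral RNA.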
More generally, 324 00:26:08 --> 00:26:13 retroviruses are the class of these viruses that can, 325 00:26:13 --> 00:26:19 in fact, run this replication process from RNA to DNA and install 326 00:26:19 --> 00:26:24 DNA copies of themselves in your genome. And how do you then get the DNA 327 00:26:24 --> 00:26:30 copy out of your genome? You don't. 328 00:26:30 --> 00:26:33 It doesn't come out. Retroviral insertions don't come 329 00:26:33 --> 00:26:37 out. That's one of the issues in dealing with AIDS is once this DNA 330 00:26:37 --> 00:26:40 copy is in a cell it's not coming out. We have no way to remove it. 331 00:26:40 --> 00:26:44 We have to make sure that the virus is shut down by other mechanisms 332 00:26:44 --> 00:26:47 that might inhibit its products, et cetera, but once it's stuck a DNA 333 00:26:47 --> 00:26:51 copy into your chromosomes, you know, there's no way of getting 334 00:26:51 --> 00:26:55 it out. So, if we had to try to inhibit the 335 00:26:55 --> 00:27:00 action of the AIDS virus, we might wish to make inhibitors of 336 00:27:00 --> 00:27:05 this aspect of replication, inhibitors of reverse transcription. 337 00:27:05 --> 00:27:10 And, of course, as probably many of you know, some of the important AIDS 338 00:27:10 --> 00:27:15 drugs are reverse transcriptase inhibitors, very important to 339 00:27:15 --> 00:27:20 limiting the replication of the AIDS virus. And there are many other 340 00:27:20 --> 00:27:25 kinds of weirdnesses. Viruses pretty much explore, 341 00:27:25 --> 00:27:30 everything you possibly can do, viruses come up with ways to do. 342 00:27:30 --> 00:27:35 Let's take now the process of transcription. 343 00:27:35 --> 00:27:40 We have replication up there. Let's look at transcription. And 344 00:27:40 --> 00:27:45 this time let's start with prokaryotes. For the simple aspect 345 00:27:45 --> 00:27:50 of transcribing genes, the prokaryotic genome looks just 346 00:27:50 --> 00:27:55 like the simple model I gave you. 
There is some kind of a promoter 347 00:27:55 --> 00:28:00 that tells RNA polymerase to come sit down here. 348 00:28:00 --> 00:28:07 RNA polymerase hops on, RNA polymerase begins to copy in RNA, 349 00:28:07 --> 00:28:15 and eventually it hits the signal that says to terminate transcription. 350 00:28:15 --> 00:28:22 OK. This is not a stop codon, which is about translation. 351 00:28:22 --> 00:28:30 This is a termination of transcription. 352 00:28:30 --> 00:28:35 And this RNA then goes off. A perfectly happy thing, a 353 00:28:35 --> 00:28:40 messenger RNA, mRNA. So, there's nothing weird 354 00:28:40 --> 00:28:46 about proks compared to the simple description that we gave before. 355 00:28:46 --> 00:28:51 But eukaryotes are different. There are some funny things that 356 00:28:51 --> 00:28:57 happen in the eukaryote. Well, first off it starts the same. 357 00:28:57 --> 00:29:03 There's a promoter. RNA polymerase sits down there, 358 00:29:03 --> 00:29:09 it starts transcribing, it makes an mRNA, it hits the transcriptional 359 00:29:09 --> 00:29:16 termination signal, it stops, and then this RNA gets 360 00:29:16 --> 00:29:22 processed in interesting ways. The first thing that happens is 361 00:29:22 --> 00:29:29 three modifications happen. The first one is at the five prime 362 00:29:29 --> 00:29:35 end, remember five prime to three prime, a funny modification is put 363 00:29:35 --> 00:29:41 on. It's a, if the message, say, were A-U-C-U-G-G-C et cetera, 364 00:29:41 --> 00:29:47 a G triphosphate is put on backwards. It's actually a methyl G 365 00:29:47 --> 00:29:53 triphosphate is put on backwards, so going in the other direction. 366 00:29:53 --> 00:30:00 You have the triphosphate bond there, a methyl G. 367 00:30:00 --> 00:30:04 And the only thing that you should care about there, 368 00:30:04 --> 00:30:09 I don't care if you know the structure, is that there's a funny 369 00:30:09 --> 00:30:13 cap. This thing is called a cap that is put on this message. 
370 00:30:13 --> 00:30:18 And that cap is very important to signaling to the cell this is a 371 00:30:18 --> 00:30:23 messenger RNA to be dealt with in a certain way, to get the ribosome to 372 00:30:23 --> 00:30:27 hop on, to get this thing processed properly, et cetera. 373 00:30:27 --> 00:30:32 At the other end of the message a long string of As is added 374 00:30:32 --> 00:30:37 to messenger RNAs. This long string of As is called, 375 00:30:37 --> 00:30:41 very sensibly, a poly A tail. The poly A tail is added to the 376 00:30:41 --> 00:30:46 messenger RNA, and very often, 377 00:30:46 --> 00:30:51 I mean it's, if you wanted to purify messenger RNAs from your own human 378 00:30:51 --> 00:30:55 cells, you can actually use poly T as a reagent because it turns out, 379 00:30:55 --> 00:31:00 because messenger RNAs have a poly A tail, they'll bind to 380 00:31:00 --> 00:31:04 and stick to poly T. So, people actually purify messenger 381 00:31:04 --> 00:31:08 RNAs by binding them to poly T and they get the poly A tail. 382 00:31:08 --> 00:31:11 But it is broadly believed that the reason for this poly A tail is not 383 00:31:11 --> 00:31:15 to make things convenient for molecular biologists to purify 384 00:31:15 --> 00:31:18 messages. To the contrary, it is an important function for the 385 00:31:18 --> 00:31:22 cell. And it turns out that this is important in regulating the 386 00:31:22 --> 00:31:25 stability of messages. If, in fact, you don't have a poly 387 00:31:25 --> 00:31:29 A tail, if you contrive to make the same message without the poly A tail, 388 00:31:29 --> 00:31:33 the message will be degraded rather rapidly. 389 00:31:33 --> 00:31:36 And the lengths of the poly A tails control aspects of the degradation, 390 00:31:36 --> 00:31:39 et cetera. 
So, in a complex eukaryotic cell, 391 00:31:39 --> 00:31:43 already it's had to attach a little signal at the front, 392 00:31:43 --> 00:31:46 some signals at the back that say process me in a certain way, 393 00:31:46 --> 00:31:49 et cetera, don't degrade me yet. You could even imagine that this 394 00:31:49 --> 00:31:53 poly A tail could serve as a little bit of a clock for how long that 395 00:31:53 --> 00:31:56 message sticks around. It's not quite that simple but 396 00:31:56 --> 00:31:59 there are ways to do it. But all of these pale in comparison 397 00:31:59 --> 00:32:03 to the third way in which eukaryotic messages differ from prokaryotic 398 00:32:03 --> 00:32:10 messages. These small modifications are, 399 00:32:10 --> 00:32:21 as I say, small. The most striking way in which they differ is that 400 00:32:21 --> 00:32:33 often only a small portion of the gene, here's my gene, 401 00:32:33 --> 00:32:44 matters for the protein that is made. So, my mRNA gets made. 402 00:32:44 --> 00:32:54 It includes the whole long sequence. And then the cell comes along and 403 00:32:54 --> 00:33:05 splices this message together. So, this is the immature RNA. 404 00:33:05 --> 00:33:13 It is processed by clipping out this, clipping out this, 405 00:33:13 --> 00:33:22 clipping out this. And what you get is a splice where the mature message 406 00:33:22 --> 00:33:30 throws this stuff out, splices between here and here, 407 00:33:30 --> 00:33:39 splices here, splices here, splices here, and you get a 408 00:33:39 --> 00:33:47 much shorter mRNA. And this is a mature mRNA. 409 00:33:47 --> 00:33:53 This splicing is a remarkable phenomenon. In fact, 410 00:33:53 --> 00:34:00 it was discovered by Phil Sharp here, for which he won a Nobel prize. 411 00:34:00 --> 00:34:04 This splicing is a very complex operation. First off, 412 00:34:04 --> 00:34:08 how does, well, actually, what accomplishes splicing? 413 00:34:08 --> 00:34:12 It should be splicase, right?
But it turns out it's not a single 414 00:34:12 --> 00:34:17 enzyme. It's a big body of stuff. So, instead it's the spliceosome, OK. 415 00:34:17 --> 00:34:21 Everything is either ase or some or something like that. 416 00:34:21 --> 00:34:25 So, it turns out it's the spliceosome that does that. 417 00:34:25 --> 00:34:30 It's just wonderful how all those names work out. The spliceosome. 418 00:34:30 --> 00:34:36 The spliceosome comes along and splices it. How does the spliceosome 419 00:34:36 --> 00:34:42 know how to do this? Well, there are kind of codes. 420 00:34:42 --> 00:34:48 It turns out that there is some information encoded along in these 421 00:34:48 --> 00:34:54 messages. It turns out that there are, you know, slight biases. 422 00:34:54 --> 00:35:00 Typically the sequence just after where the splice starts here is a GU 423 00:35:00 --> 00:35:06 and the sequence here is an AG, but that's obviously not enough 424 00:35:06 --> 00:35:10 information, right? It's not enough bases of information 425 00:35:10 --> 00:35:14 to get this right. And so there are a few more 426 00:35:14 --> 00:35:18 preferences for which bases are used, but the truth is we don't fully know. 427 00:35:18 --> 00:35:21 Our best picture right now involves some cellular factors that help 428 00:35:21 --> 00:35:25 recognize the parts that are supposed to stay in, some sequences 429 00:35:25 --> 00:35:29 here. But the truth is we don't have the simple codes. 430 00:35:29 --> 00:35:33 Because if we had the simple codes, we'd be able to take a long stretch 431 00:35:33 --> 00:35:37 of DNA and figure out exactly where the splices go based on just 432 00:35:37 --> 00:35:42 computer analysis. And we can't do that so well. 433 00:35:42 --> 00:35:46 These bits that stay in are called exons. The bits that go out are 434 00:35:46 --> 00:35:51 called introns.
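To make the clipping concrete, here is a minimal Python sketch of splicing as a string operation. It assumes the intron positions are already known, which is exactly the part the cell works out and we cannot yet compute reliably. The toy sequences and the names `splice`, `segments`, and `pre_mrna` are made up for illustration, and the GU...AG check only reflects the typical boundary bias mentioned above, not a real rule set.

```python
def splice(pre_mrna, introns):
    """Remove introns, given as (start, end) half-open index pairs, and join the exons."""
    exons = []
    position = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Typical (not universal) boundary signals: intron begins GU, ends AG.
        if not (intron.startswith("GU") and intron.endswith("AG")):
            raise ValueError(f"intron at {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # keep the stretch before the intron
        position = end                          # skip over the intron itself
    exons.append(pre_mrna[position:])           # keep whatever follows the last intron
    return "".join(exons)


# Hypothetical toy transcript: exon, intron, exon, intron, exon.
segments = [
    ("exon", "AUGAAG"),
    ("intron", "GUAAGUCCUUAG"),
    ("exon", "AGGCCC"),
    ("intron", "GUCCAUAG"),
    ("exon", "UAG"),
]

pre_mrna = ""
introns = []
for kind, sequence in segments:
    if kind == "intron":
        introns.append((len(pre_mrna), len(pre_mrna) + len(sequence)))
    pre_mrna += sequence

mature_mrna = splice(pre_mrna, introns)
print(len(pre_mrna), "bases before splicing,", len(mature_mrna), "after")
print(mature_mrna)
```

The mature message here is just the three exons joined in order; the hard biological problem is choosing `introns`, not applying them.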
This is the source of extraordinary 435 00:35:51 --> 00:35:55 confusion for students because you might think that the bits that are 436 00:35:55 --> 00:36:00 excised are the exons, but they're not. 437 00:36:00 --> 00:36:04 The bits that stay in are the exons. Why are they called exons if they 438 00:36:04 --> 00:36:08 stay in and ex is a prefix meaning out? Well, because the introns are 439 00:36:08 --> 00:36:12 named because they're intervening sequences. Once the introns, 440 00:36:12 --> 00:36:17 the intervening sequences were named as intervening sequences or introns, 441 00:36:17 --> 00:36:21 you were stuck then having to name the things that stay in as exons. 442 00:36:21 --> 00:36:25 This was all done by a Harvard professor, don't blame me. 443 00:36:25 --> 00:36:30 In any case, a good friend Harvard professor. 444 00:36:30 --> 00:36:37 But, nonetheless, I'm not sure that this was the best 445 00:36:37 --> 00:36:44 way to name them. But you're stuck with it. 446 00:36:44 --> 00:36:52 So, for a typical human gene, typical human gene, the length of 447 00:36:52 --> 00:36:59 the gene itself might be 30,000 bases. But the mature RNA, 448 00:36:59 --> 00:37:07 the mature mRNA might be one and a half, 1,500 bases. 449 00:37:07 --> 00:37:11 That's remarkable. Out of 30,000 letters in the 450 00:37:11 --> 00:37:15 initial transcript that is made, the gene starts at the promoter, and 451 00:37:15 --> 00:37:19 the transcription will stop 30,000 bases away. The cell goes 452 00:37:19 --> 00:37:24 through the trouble of making an RNA 30,000 bases long, 453 00:37:24 --> 00:37:28 and then it trims it down by throwing out 28,500 454 00:37:28 --> 00:37:33 of the bases, keeping only 1,500 bases at the end. 455 00:37:33 --> 00:37:37 Now, this may seem profligate but it ain't nothing compared to some 456 00:37:37 --> 00:37:42 extreme cases.
The clotting factor gene, 457 00:37:42 --> 00:37:47 the factor 8 gene, the gene that is mutated in individuals with 458 00:37:47 --> 00:37:52 hemophilia, that gene is 200,000 bases long, and it gets spliced 459 00:37:52 --> 00:37:57 down to a mere 10,000 bases. 190,000 bases are thrown 460 00:37:57 --> 00:38:02 away. But that's nothing compared to 461 00:38:02 --> 00:38:08 Duchenne muscular dystrophy. The Duchenne muscular dystrophy gene is 462 00:38:08 --> 00:38:13 the all time winner. That gene makes an immature initial 463 00:38:13 --> 00:38:19 RNA of 2 million bases. RNA polymerase hops on at the 464 00:38:19 --> 00:38:24 promoter and it gets off at the end of the Boston Marathon here 2 465 00:38:24 --> 00:38:30 million bases later having made an RNA 2 million bases long. 466 00:38:30 --> 00:38:36 Calculate the speed of RNA polymerase and you'll find out that 467 00:38:36 --> 00:38:42 it's at it for hours. It hops on and it stays on for 468 00:38:42 --> 00:38:48 hours until it gets to the other end. And then for all its troubles this 469 00:38:48 --> 00:38:54 gene is spliced down to 16,000 bases in the mature message. 470 00:38:54 --> 00:39:00 Yup? How would it increase the chance of mutations? Yup. 471 00:39:00 --> 00:39:05 So, splicing mutations could be a problem. Some diseases could arise 472 00:39:05 --> 00:39:10 from errors in splicing. Do you think that happens? 473 00:39:10 --> 00:39:15 Sure does. There could be mutations that create, 474 00:39:15 --> 00:39:20 that change a splice site, or mutations that create a new 475 00:39:20 --> 00:39:25 splice site, and all of that could screw up the gene. 476 00:39:25 --> 00:39:30 Why do this? What in the world is going on? 477 00:39:30 --> 00:39:35 Just think about the energetic cost. I mean count up the ATPs involved 478 00:39:35 --> 00:39:40 in synthesizing a nucleotide, and then the ATPs involved in adding 479 00:39:40 --> 00:39:45 nucleotides up.
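The arithmetic behind these three examples is easy to check. The sketch below uses the gene and message lengths quoted in the lecture (30,000 to 1,500 bases for a typical gene, 200,000 to 10,000 for factor 8, 2 million to 16,000 for the Duchenne muscular dystrophy gene). The transcription speed is not given in the lecture: `ASSUMED_BASES_PER_SECOND = 40` is an assumed round figure used only to show the order of magnitude, and the function names are made up for illustration.

```python
# (primary transcript length, mature mRNA length) in bases, as quoted in the lecture
GENES = {
    "typical human gene": (30_000, 1_500),
    "factor 8": (200_000, 10_000),
    "Duchenne muscular dystrophy gene": (2_000_000, 16_000),
}

# Assumption: a round elongation rate, NOT a number from the lecture.
ASSUMED_BASES_PER_SECOND = 40


def fraction_kept(primary_length, mature_length):
    """Fraction of the primary transcript that survives splicing."""
    return mature_length / primary_length


def transcription_hours(primary_length, bases_per_second=ASSUMED_BASES_PER_SECOND):
    """Hours needed to transcribe the gene end to end at a constant rate."""
    return primary_length / bases_per_second / 3600


for name, (primary, mature) in GENES.items():
    kept = fraction_kept(primary, mature)
    hours = transcription_hours(primary)
    print(f"{name}: keeps {kept:.1%}, discards {primary - mature:,} bases, "
          f"about {hours:.1f} h to transcribe at the assumed rate")
```

Under that assumed rate the 2 million base transcript takes on the order of half a day, which is consistent with the polymerase being "at it for hours".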
You know, think about this totally 480 00:39:45 --> 00:39:50 wasted energy. What is the point? 481 00:39:50 --> 00:39:55 I might be able to encode multiple proteins with the same gene. 482 00:39:55 --> 00:40:00 How would I do that? Ooh, wouldn't that be clever? 483 00:40:00 --> 00:40:05 I might be able to take a single gene and make a mix and match 484 00:40:05 --> 00:40:10 product. It might be, do you mean like one type of cell 485 00:40:10 --> 00:40:15 might splice up that message one way to produce a certain protein, 486 00:40:15 --> 00:40:20 but a different cell type might splice the same gene another way to 487 00:40:20 --> 00:40:25 produce a different protein? Ooh. So, you're proposing, if I 488 00:40:25 --> 00:40:30 understand you correctly, alternative splicing. 489 00:40:30 --> 00:40:34 Alternative splicing could create multiple proteins, 490 00:40:34 --> 00:40:38 multiple distinct proteins. It might be, for example, that you 491 00:40:38 --> 00:40:42 might make one protein that has a cytoplasmic tail and another protein 492 00:40:42 --> 00:40:46 that doesn't have a cytoplasmic tail or a different tail or, 493 00:40:46 --> 00:40:50 or, this is true. This actually happens. It's very clever. 494 00:40:50 --> 00:40:54 Anything that can happen does happen somewhere, 495 00:40:54 --> 00:40:58 and it's fairly regularly used. A typical gene in the human being 496 00:40:58 --> 00:41:02 has at least two alternative splice forms, on average. 497 00:41:02 --> 00:41:05 Most, many don't, but there are some that have large 498 00:41:05 --> 00:41:08 numbers. The most extreme is there's a gene known, in 499 00:41:08 --> 00:41:11 Drosophila, that has more than a thousand alternative splice forms. 500 00:41:11 --> 00:41:15 How does it know, how does the cell know whether to splice it one way in 501 00:41:15 --> 00:41:18 the liver and one way in a heart or something?
We don't fully know but 502 00:41:18 --> 00:41:21 there's machinery and signals people are trying to work out for that. 503 00:41:21 --> 00:41:25 Now, I don't want to confuse you too much about it. 504 00:41:25 --> 00:41:28 You know, mostly, when we give you a gene, you should think about it 505 00:41:28 --> 00:41:31 spliced out introns, exons. But the truth is it is more 506 00:41:31 --> 00:41:35 complicated than that. There can be alternative splicing 507 00:41:35 --> 00:41:39 that allows genes to be used in multiple ways. 508 00:41:39 --> 00:41:43 Sometimes they don't make multiple proteins. They may splice into 509 00:41:43 --> 00:41:46 portions of the mRNA that are not translated, but, 510 00:41:46 --> 00:41:50 there is that, but, boy, it's a huge amount of overhead 511 00:41:50 --> 00:41:54 here just to do that. Is it justified? Yes? 512 00:41:54 --> 00:41:58 That is by computer if I just gave you the sequence? Not quite. 513 00:41:58 --> 00:42:01 Almost. Maybe. Sort of. It turns out that the 514 00:42:01 --> 00:42:04 computer programs for automatically recognizing the matter of the human 515 00:42:04 --> 00:42:07 genome are sort of, they're mediocre, not very good. 516 00:42:07 --> 00:42:09 We have some idea of the signals, and various people have been trying to 517 00:42:09 --> 00:42:12 write better and better algorithms for doing that, 518 00:42:12 --> 00:42:15 but the cell knows what it's doing and we don't fully know, 519 00:42:15 --> 00:42:18 as evidenced by the fact that we can't write a clean computer program 520 00:42:18 --> 00:42:21 to do it yet. We need to get information from the cell or from 521 00:42:21 --> 00:42:24 evolution or various other things like that, and that's the ultimate 522 00:42:24 --> 00:42:27 test. If we knew what we were talking about we'd just be able to 523 00:42:27 --> 00:42:30 write a computer program and splice it out. 524 00:42:30 --> 00:42:34 And we don't.
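One way to see why the GU and AG boundary signals alone cannot drive such a program is to count how often they turn up by chance. The sketch below is only an illustration under a strong assumption: it builds a random 30,000 base sequence with all four bases equally likely, which real genes are not. A given two-letter signal is then expected about once every 16 positions. The names `random_transcript` and `count_dinucleotide` are made up for this example.

```python
import random


def count_dinucleotide(rna, dinucleotide):
    """Count occurrences of a two-base signal (GU and AG cannot overlap themselves)."""
    return rna.count(dinucleotide)


random.seed(0)  # fixed seed so the run is repeatable
GENE_LENGTH = 30_000  # the typical gene length quoted in the lecture

# Assumption: a uniformly random sequence stands in for a real transcript.
random_transcript = "".join(random.choices("ACGU", k=GENE_LENGTH))

gu_sites = count_dinucleotide(random_transcript, "GU")
ag_sites = count_dinucleotide(random_transcript, "AG")
expected_per_signal = (GENE_LENGTH - 1) / 16  # one chance in 16 at each position

print(f"GU sites found: {gu_sites}, AG sites found: {ag_sites}")
print(f"expected by chance for each signal: about {expected_per_signal:.0f}")
```

With well over a thousand chance matches for each signal and only a handful of real splice sites, a scanner that looks for GU and AG alone would be swamped by false positives, which is the sense in which two bases are "not enough information".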
There's another reason why people think these big 525 00:42:34 --> 00:42:38 introns and exons, these big introns are helpful, 526 00:42:38 --> 00:42:42 and that is an evolutionary reason. The evolutionary reason is a little 527 00:42:42 --> 00:42:47 bit harder to follow, but let me try it on you. 528 00:42:47 --> 00:42:51 Suppose a random event happens and a chromosome breaks, 529 00:42:51 --> 00:42:55 that happens, and suppose a random breakage sticks one part of a 530 00:42:55 --> 00:43:00 chromosome to some other part of the chromosome. 531 00:43:00 --> 00:43:04 If it lands smack dab in the middle of the coding sequence of a gene 532 00:43:04 --> 00:43:09 that's bad news. But it turns out that if it lands 533 00:43:09 --> 00:43:13 in the introns of two different genes and sticks them together it 534 00:43:13 --> 00:43:18 could make a new gene that would still work. By having a random 535 00:43:18 --> 00:43:23 break between two genes in their introns and slamming them together, 536 00:43:23 --> 00:43:27 you could make a gene that had a bunch of exons from one gene and a 537 00:43:27 --> 00:43:32 bunch of exons from another gene. And this intervening sequence in the 538 00:43:32 --> 00:43:38 middle and it would get spliced up. Evolution might like that because 539 00:43:38 --> 00:43:44 it would be a very easy way for evolution to build new genes that 540 00:43:44 --> 00:43:49 had a portion of one protein and a portion of another protein. 541 00:43:49 --> 00:43:55 This kind of mix and match domain swapping could be very useful. 542 00:43:55 --> 00:44:01 And when we look across genomes, we see lots and lots of examples of 543 00:44:01 --> 00:44:07 genes that have a similar first half but different second halves. 544 00:44:07 --> 00:44:10 Or have some portion in the middle, a domain that we recognize, that we 545 00:44:10 --> 00:44:13 see in multiple proteins.
And so, in fact, an argument for 546 00:44:13 --> 00:44:17 why we have all of this intronic DNA, one that's impossible to prove but 547 00:44:17 --> 00:44:20 is an argument, is that from an evolutionary point of view, 548 00:44:20 --> 00:44:24 this allows a great deal of evolutionary innovation. 549 00:44:24 --> 00:44:27 You have to be careful that you say those organisms that have this extra 550 00:44:27 --> 00:44:31 space are able to mix and match and create more new kinds of combination 551 00:44:31 --> 00:44:34 proteins, et cetera, et cetera, and therefore survived 552 00:44:34 --> 00:44:38 better, et cetera, et cetera, et cetera. 553 00:44:38 --> 00:44:41 Why don't bacteria have this? Sorry? They're not as complicated. 554 00:44:41 --> 00:44:45 That's one thought, is we can take a sort of condescending attitude to 555 00:44:45 --> 00:44:48 these bacteria. They're not very, 556 00:44:48 --> 00:44:52 they're just not so complicated. There's another point of view which 557 00:44:52 --> 00:44:56 is bacteria are far more sophisticated than we are because 558 00:44:56 --> 00:45:00 they're under incredibly rigorous evolutionary selection. 559 00:45:00 --> 00:45:04 You might argue that if I'm a bacterium, can I really afford all 560 00:45:04 --> 00:45:08 this extra DNA? Now, the metabolic cost of all that 561 00:45:08 --> 00:45:12 extra DNA is huge to a bacterium which competes on replication. 562 00:45:12 --> 00:45:16 It's got to divide every 20 minutes, and trying to put in all these extra 563 00:45:16 --> 00:45:20 bases would be bad news. So, you might imagine, just to be, 564 00:45:20 --> 00:45:24 you know, stand things on its head, that early life all had introns and 565 00:45:24 --> 00:45:28 bacteria, in the process of competing to be more and more 566 00:45:28 --> 00:45:31 efficient, got rid of their introns.
There's actually a large camp of 567 00:45:31 --> 00:45:34 people who think it went that way, that early life evolved with introns, 568 00:45:34 --> 00:45:37 and then bacteria, in the pressure to compete, 569 00:45:37 --> 00:45:41 got rid of them. And there's some evidence to support that. 570 00:45:41 --> 00:45:44 Bacteria don't have introns. Small eukaryotes like yeast that 571 00:45:44 --> 00:45:47 sort of do compete on replication have some introns, 572 00:45:47 --> 00:45:50 but a small number. There are only about 250 introns in 573 00:45:50 --> 00:45:53 yeast. Only about 5% of the genes have an intron and they're small. 574 00:45:53 --> 00:45:57 Bigger eukaryotes have bigger introns. And the bigger you get, 575 00:45:57 --> 00:46:00 often on average the bigger the genome sizes are the more 576 00:46:00 --> 00:46:03 you can tolerate it. And so I actually think, 577 00:46:03 --> 00:46:07 I actually probably favor this notion that introns were the 578 00:46:07 --> 00:46:10 original state and they've been gotten rid of. 579 00:46:10 --> 00:46:14 And the more pressure you're under to replicate rapidly the less you 580 00:46:14 --> 00:46:17 can tolerate this interesting and complicated innovation. 581 00:46:17 --> 00:46:21 Anyway, that's another way that things differ. 582 00:46:21 --> 00:46:24 And then, finally, viruses can do it either way. 583 00:46:24 --> 00:46:28 Viruses, depending on whether they are prokaryotic viruses or 584 00:46:28 --> 00:46:31 eukaryotic viruses, are able to replicate, 585 00:46:31 --> 00:46:35 are able to either do or don't have splicing. 586 00:46:35 --> 00:46:41 Last topic. Translation. Here eukaryotes are relatively 587 00:46:41 --> 00:46:48 simple. You get a message, you get a gene, you get an mRNA. 588 00:46:48 --> 00:46:54 The mRNA goes to the ribosome. Here's a ribosome. 589 00:46:54 --> 00:47:01 The ribosome goes to the mRNA, actually, and it starts turning out 590 00:47:01 --> 00:47:09 one protein as it chugs along. 
Prokaryotes differ in an interesting 591 00:47:09 --> 00:47:18 way. I get a promoter here that is transcribed into my mRNA, 592 00:47:18 --> 00:47:27 but it turns out that the mRNA can encode multiple independent proteins, 593 00:47:27 --> 00:47:36 protein one, protein two, protein three on the same mRNA. 594 00:47:36 --> 00:47:40 And a ribosome will hop on here and synthesize this one. 595 00:47:40 --> 00:47:44 A ribosome will hop on here and synthesize this one, 596 00:47:44 --> 00:47:48 and a ribosome will hop on here and synthesize that one. 597 00:47:48 --> 00:47:52 And you have what is called a polycistronic message. 598 00:47:52 --> 00:47:56 Poly, many. Cistronic, cistrons were an old name for coding 599 00:47:56 --> 00:48:00 regions of genes here. Polycistronic messages. 600 00:48:00 --> 00:48:03 Why would you want to do that, have a single mRNA that encodes 601 00:48:03 --> 00:48:06 multiple distinct proteins, each starting with its own ribosome 602 00:48:06 --> 00:48:09 start site there? Efficiency. Maybe, 603 00:48:09 --> 00:48:12 in fact, these would be, how about, oh, this would be clever, 604 00:48:12 --> 00:48:15 make them multiple steps in a biochemical pathway? 605 00:48:15 --> 00:48:18 Have them coded on a single messenger so then you'd only have to 606 00:48:18 --> 00:48:21 worry about regulating that once. If you have the regulatory 607 00:48:21 --> 00:48:24 machinery to turn on, you'll make all the enzymes for the 608 00:48:24 --> 00:48:27 pathway. And that's exactly what bacteria do. They tend to put all 609 00:48:27 --> 00:48:30 the enzymes for a pathway on a single message so when they want to 610 00:48:30 --> 00:48:33 call up, let's digest hexose this morning, they have a whole thing 611 00:48:33 --> 00:48:37 that will let them be able to do that, polycistronic. 612 00:48:37 --> 00:48:41 That's because they have small genomes. They're pressed for space.
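A polycistronic message can be pictured as one RNA string holding several separate start-to-stop coding regions. The sketch below is a deliberately simplified Python illustration: it treats every AUG found while scanning as a ribosome start site and reads triplets until a stop codon (UAA, UAG, or UGA in the standard code). Real bacterial ribosomes pick their start sites using additional sequence signals, so this is an assumption made for clarity. The function name `find_coding_regions` and the toy message are made up.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}  # stop codons of the standard genetic code


def find_coding_regions(mrna):
    """Return each AUG-to-stop stretch found while scanning left to right.

    Simplification: any AUG met between coding regions counts as a start site.
    """
    regions = []
    i = 0
    while i + 3 <= len(mrna):
        if mrna[i:i + 3] != "AUG":
            i += 1  # keep scanning one base at a time for the next start
            continue
        for j in range(i, len(mrna) - 2, 3):  # read in-frame triplets from the AUG
            if mrna[j:j + 3] in STOP_CODONS:
                regions.append(mrna[i:j + 3])
                i = j + 3  # resume scanning after this coding region
                break
        else:
            break  # a start with no in-frame stop: nothing more to report
    return regions


# Hypothetical three-protein message: spacers separate three short coding regions.
message = "CC" + "AUGAAAUAA" + "GGAGG" + "AUGCCCGGGUGA" + "CACU" + "AUGUUUUAG"

for number, region in enumerate(find_coding_regions(message), start=1):
    print(f"protein {number}: {region}")
```

One message, three independent coding regions, each translated from its own start, is the arrangement described above for the enzymes of a single pathway.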
613 00:48:41 --> 00:48:46 And, because of that, they have to slam a lot into a single unit. 614 00:48:46 --> 00:48:50 And this single unit that has multiple genes encoded in a single 615 00:48:50 --> 00:48:55 message is called an operon, and we'll talk more about that. 616 00:48:55 --> 00:49:00 Last of all viruses. Viruses. Viruses have very little room. 617 00:49:00 --> 00:49:05 Their genomes can be tiny. A typical virus might have a genome 618 00:49:05 --> 00:49:10 of 5,000 bases to 10,000 bases to, in some cases, 619 00:49:10 --> 00:49:15 200,000 bases, but it hasn't got a lot of room. It wants to pack a lot 620 00:49:15 --> 00:49:21 of protein-coding information in. And some viruses have come up with 621 00:49:21 --> 00:49:26 the most extraordinary way of doing that. Some viruses have gone to the 622 00:49:26 --> 00:49:32 extreme of having RNAs that get made from them that have a sequence -- 623 00:49:32 --> 00:49:39 I'm just going to pick up in the middle of the sequence here. 624 00:49:39 --> 00:49:46 A-C-U-A-C-U-A-C-U-A-C-U. You might decide to read the sequence like 625 00:49:46 --> 00:49:53 this, that those are the codons, and you'd get a certain protein. 626 00:49:53 --> 00:50:00 But I might also decide to read that sequence C-U-A-C-U-A-C-U-A. 627 00:50:00 --> 00:50:04 And, of course, I'm giving this in a repeating form 628 00:50:04 --> 00:50:08 because it's easy to note. I could give you any sequence and I 629 00:50:08 --> 00:50:12 could read it in this reading frame, I could read it in this reading 630 00:50:12 --> 00:50:17 frame, or I could read it as U-A-C-U-A-C-U-A-C. 631 00:50:17 --> 00:50:21 In other words, there are three reading frames that, 632 00:50:21 --> 00:50:25 in principle, you could translate a protein from. In a typical 633 00:50:25 --> 00:50:30 prokaryotic gene or eukaryotic gene only one of those is used. 634 00:50:30 --> 00:50:34 You start at the first AUG and that sets the reading frame.
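The three reading frames can be written out mechanically. This short Python sketch takes the repeating example sequence from the lecture and chops it into triplets starting at offset 0, 1, or 2. The function name `codons_in_frame` is made up for illustration, and leftover bases that do not fill a whole triplet are simply dropped.

```python
def codons_in_frame(rna, frame):
    """Split an RNA string into complete triplets, starting at offset 0, 1, or 2."""
    return [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]


sequence = "ACUACUACUACU"  # the repeating example read out in the lecture

frames = {frame: codons_in_frame(sequence, frame) for frame in range(3)}
for frame, codons in frames.items():
    print(f"frame {frame}: {'-'.join(codons)}")
```

The same bases give ACU codons in one frame, CUA in the next, and UAC in the third, so each frame would be translated into a different protein.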
635 00:50:34 --> 00:50:39 But some viruses are so pressed for space and are so clever and are so 636 00:50:39 --> 00:50:43 efficient that they make messages that have tricks that they actually 637 00:50:43 --> 00:50:48 use two or, in some cases, all three reading frames, which is 638 00:50:48 --> 00:50:53 an extraordinary packing of information density into a simple 639 00:50:53 --> 00:50:57 message. So, the basic point. We have a simple model. DNA is 640 00:50:57 --> 00:51:02 replicated. Transcribed into RNA. 641 00:51:02 --> 00:51:06 Translated into protein. But there are a lot of important 642 00:51:06 --> 00:51:11 variations between eukaryotes, prokaryotes and viruses. And 643 00:51:11 --> 00:51:15 understanding them can be useful for treating cancer, 644 00:51:15 --> 00:51:20 for treating AIDS, and for treating viral and bacterial 645 00:51:20 --> 00:51:25 infections. Next time.