1 00:00:15 --> 00:00:21 OK. And here we are in the molecular biology section. 2 00:00:21 --> 00:00:27 And the goal of this section, as Professor Jacks started to tell 3 00:00:27 --> 00:00:33 you during the Genetics module and Professor Baker told you at the 4 00:00:33 --> 00:00:39 beginning of last lecture is try to link together in molecular terms the 5 00:00:39 --> 00:00:45 question of genotype and the question of phenotype. 6 00:00:45 --> 00:00:51 And we presented to you this notion that goes by the ponderous name of 7 00:00:51 --> 00:00:58 the central dogma that the link between genotype and phenotype is 8 00:00:58 --> 00:01:04 related to DNA as a genetic material that then proceeds to transmit its 9 00:01:04 --> 00:01:11 information to a final outcome, which is very often a protein, 10 00:01:11 --> 00:01:17 through an RNA intermediate. And the point of these molecular 11 00:01:17 --> 00:01:22 biology lectures is to tell you about the molecular biology, 12 00:01:22 --> 00:01:27 and then at the end to try to bring together this genotype and phenotype 13 00:01:27 --> 00:01:32 in molecular terms. Now, last lecture you talked about 14 00:01:32 --> 00:01:36 DNA replication, DNA as the genetic material required 15 00:01:36 --> 00:01:41 to be replicated faithfully and accurately so it can transmit its 16 00:01:41 --> 00:01:46 information to the next generation. Professor Baker I know stressed the 17 00:01:46 --> 00:01:50 requirement for accurate replication, but she did not do one part of this 18 00:01:50 --> 00:01:55 lecture that I want to spend a couple of minutes now discussing 19 00:01:55 --> 00:02:00 with you. And that is the question of DNA repair. 20 00:02:00 --> 00:02:13 So there are two types of DNA repair. 21 00:02:13 --> 00:02:19 Actually, three types that I want to talk to you about. 22 00:02:19 --> 00:02:25 And the first pertains to the accuracy of the DNA polymerase that 23 00:02:25 --> 00:02:32 replicates the DNA. 
So DNA polymerase -- 24 00:02:32 --> 00:02:40 -- three, or the DNA polymerase that 25 00:02:40 --> 00:02:45 replicates the DNA makes mistakes. It puts in the wrong nucleotide. 26 00:02:45 --> 00:02:51 It puts in the wrong base. And it does so about one in ten to 27 00:02:51 --> 00:02:56 the fifth bases. OK? So one in a hundred thousand 28 00:02:56 --> 00:03:01 bases is wrong. Now, if you think about the fact 29 00:03:01 --> 00:03:05 that there are more than ten to the ninth bases in a human genome, 30 00:03:05 --> 00:03:09 every cell cycle that translates to ten thousand or so mistakes, 31 00:03:09 --> 00:03:14 that's a lot of changes in the DNA. That's not a very faithful kind of 32 00:03:14 --> 00:03:18 DNA replication. So this has been selected against 33 00:03:18 --> 00:03:22 evolutionarily. And there is a mechanism that's 34 00:03:22 --> 00:03:30 called proofreading -- 35 00:03:30 --> 00:03:35 -- that allows this high error rate to be corrected. And it's actually 36 00:03:35 --> 00:03:40 very clever. So this DNA polymerase has what is called an 37 00:03:40 --> 00:03:45 exonuclease activity. Exo meaning out. Nuclease meaning 38 00:03:45 --> 00:03:50 to break down nucleic acids. And this exonuclease proceeds from 39 00:03:50 --> 00:03:55 the 3 prime to the 5 prime direction, the 3 prime nucleotide being the one 40 00:03:55 --> 00:04:00 that was added last as you should now know. 41 00:04:00 --> 00:04:04 And so what DNA polymerase does as it is replicating is it kind of 42 00:04:04 --> 00:04:08 feels whether or not the double helix has reformed in a smooth way. 43 00:04:08 --> 00:04:12 And if it feels that there is a bubble there, a bubble where the two 44 00:04:12 --> 00:04:17 bases, actually look at me rather than the diagram. 45 00:04:17 --> 00:04:21 I think it's easier. I can do it better with my hands. 
46 00:04:21 --> 00:04:25 Where you've got a nice smooth helix, if there is a mismatched 47 00:04:25 --> 00:04:30 nucleotide, the bases do not pair, there will be a bubble. 48 00:04:30 --> 00:04:34 OK? There will be a bubble in the helix or a space in the helix. 49 00:04:34 --> 00:04:39 The two strands will not be joined together. And the DNA polymerase 50 00:04:39 --> 00:04:43 can sense this and it can go back and it excises the wrong nucleotide 51 00:04:43 --> 00:04:48 and puts in the correct one. OK? This is called proofreading. 52 00:04:48 --> 00:04:53 And it's extremely necessary and it's actually very good. 53 00:04:53 --> 00:04:57 And what it does is to decrease the error rate to one in ten 54 00:04:57 --> 00:05:02 to the ninth bases. OK? So you get four orders of 55 00:05:02 --> 00:05:07 magnitude improvement in the accuracy of DNA replication. 56 00:05:07 --> 00:05:12 Now, there is another set of things that can go wrong. 57 00:05:12 --> 00:05:18 And these actually fall under the heading of mutagens. 58 00:05:18 --> 00:05:23 Mutagens, as Professor Jacks mentioned to you, 59 00:05:23 --> 00:05:28 being agents which change the base sequence of the DNA once 60 00:05:28 --> 00:05:33 the DNA is there. And these can either be chemical or 61 00:05:33 --> 00:05:37 these can be ionizing radiation. And in those cases also the helix 62 00:05:37 --> 00:05:41 gets changed because the wrong base gets put in. No, 63 00:05:41 --> 00:05:45 not because the wrong base gets put in. But because there is a chemical 64 00:05:45 --> 00:05:49 reaction which might modify a base, which might, for example, covalently 65 00:05:49 --> 00:05:53 link two bases. thymine for example. 66 00:05:53 --> 00:05:57 If two thymines are sitting next to one another in the helix, 67 00:05:57 --> 00:06:02 ultraviolet light is very good at cross linking those. 68 00:06:02 --> 00:06:06 And you now have something called a thymine dimer. 
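The error-rate arithmetic from the proofreading discussion can be checked with a short sketch. This is only an illustration, not part of the lecture: the function name `expected_errors` is hypothetical, and the genome size of ten to the ninth bases is the round figure quoted in the lecture, not a precise measurement.

```python
# Illustrative sketch of the lecture's error-rate arithmetic.
# All names here are hypothetical; the numbers are the lecture's round figures.

GENOME_BASES = 10**9            # "more than ten to the ninth bases" (round figure)
RAW_BASES_PER_ERROR = 10**5     # polymerase alone: one error per 10^5 bases
PROOFREAD_BASES_PER_ERROR = 10**9  # with proofreading: one error per 10^9 bases


def expected_errors(genome_bases: int, bases_per_error: int) -> int:
    """Expected number of wrong bases per round of replication."""
    return genome_bases // bases_per_error


raw = expected_errors(GENOME_BASES, RAW_BASES_PER_ERROR)
proofread = expected_errors(GENOME_BASES, PROOFREAD_BASES_PER_ERROR)
improvement = PROOFREAD_BASES_PER_ERROR // RAW_BASES_PER_ERROR

print(raw)          # 10000 mistakes per cell cycle without proofreading
print(proofread)    # 1 mistake per cell cycle with proofreading
print(improvement)  # 10000, i.e. four orders of magnitude
```

Integer division is used so that the powers of ten stay exact.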
69 00:06:06 --> 00:06:10 And that is very bad because that is not a normal base sequence. 70 00:06:10 --> 00:06:14 And when replication time comes along that DNA helix is abnormal and 71 00:06:14 --> 00:06:19 the replication machinery doesn't know what to do about it, 72 00:06:19 --> 00:06:23 and that can lead to all sorts of problems and to mutations. 73 00:06:23 --> 00:06:28 So there are mechanisms that can get rid of abnormal bases. 74 00:06:28 --> 00:06:34 So mutagens can chemically, actually, maybe not say chemically. 75 00:06:34 --> 00:06:41 Let me just say change bases. They change base structure either to 76 00:06:41 --> 00:06:48 something that looks like another normal new base or to something that 77 00:06:48 --> 00:06:55 looks abnormal. And there are two mechanisms to get 78 00:06:55 --> 00:07:02 rid of this. One is called excision repair and the other is called 79 00:07:02 --> 00:07:08 mismatch repair. I have them written in the reverse 80 00:07:08 --> 00:07:12 order than is on this diagram from your book. In mismatch repair there 81 00:07:12 --> 00:07:16 is one nucleotide that looks normal, but it's different. It doesn't 82 00:07:16 --> 00:07:20 match the, usually it looks normal. It doesn't match the one opposite 83 00:07:20 --> 00:07:24 to it. And in that case the repair machinery can go in and remove the 84 00:07:24 --> 00:07:28 abnormal or the mismatched nucleotide, and there's another 85 00:07:28 --> 00:07:33 enzyme that will go and correct it. In excision repair, 86 00:07:33 --> 00:07:37 one very often, excision repair occurs when, for example, 87 00:07:37 --> 00:07:41 two nucleotides have become covalently linked to one another, 88 00:07:41 --> 00:07:45 and the one strand of the helix is just a mess. And there is an enzyme, 89 00:07:45 --> 00:07:49 or enzyme complex that will go in and actually excise a little chunk 90 00:07:49 --> 00:07:53 of the helix. 
And then another enzyme will come in and fill in the 91 00:07:53 --> 00:07:58 gap so that you get the helix repaired. 92 00:07:58 --> 00:08:02 Now, the challenge in this, and you may be asking yourselves 93 00:08:02 --> 00:08:07 this, is how does this repair machinery know which the correct 94 00:08:07 --> 00:08:12 strand was? In the case of proofreading it's very interesting 95 00:08:12 --> 00:08:17 because initially after replication the newly synthesized DNA strand is 96 00:08:17 --> 00:08:22 not modified. It's just a normal nucleotide polymer. 97 00:08:22 --> 00:08:27 However, the template strand, the template strands, the parental 98 00:08:27 --> 00:08:32 strands over time become chemically modified. 99 00:08:32 --> 00:08:35 The bases actually get, especially adenine gets some methyl 100 00:08:35 --> 00:08:39 groups added to it. And this is different than the 101 00:08:39 --> 00:08:43 newly synthesized strand, which doesn't have these methyl groups. 102 00:08:43 --> 00:08:47 And so the polymerase knows which strand is the old strand and the 103 00:08:47 --> 00:08:50 correct one and which is the new strand and the incorrect one. 104 00:08:50 --> 00:08:54 In the case of excision and mismatch repair, 105 00:08:54 --> 00:08:58 that's sometimes not clear. Where you've got these thymine 106 00:08:58 --> 00:09:02 dimers, these Ts that are joined together then that's clearly the 107 00:09:02 --> 00:09:06 wrong, that's clearly wrong. OK? The enzymatic machinery can 108 00:09:06 --> 00:09:10 take that out and copy the other strand. Sometimes, 109 00:09:10 --> 00:09:14 though, if you just have a chemical conversion of one base to another, 110 00:09:14 --> 00:09:18 the repair machinery does not know which strand is the correct one and 111 00:09:18 --> 00:09:22 which isn't. 
And that's when you'll get mutations fixed in the DNA 112 00:09:22 --> 00:09:26 because at replication you really may get the changing, 113 00:09:26 --> 00:09:30 you may get the incorrect, you may not get the correct base 114 00:09:30 --> 00:09:36 repairs. And then that incorrect base will be 115 00:09:36 --> 00:09:42 passed on through the next generation. OK. 116 00:09:42 --> 00:09:48 So this is a very rapid zip through DNA repair that I wanted you to be 117 00:09:48 --> 00:09:54 able to think about. I want to move onto the next step 118 00:09:54 --> 00:10:01 in the transmission of information from gene to final product today. 119 00:10:01 --> 00:10:06 And I want to talk to you about the generation of RNA. 120 00:10:06 --> 00:10:11 And so let us begin with a quiz. And I have for you a new incentive 121 00:10:11 --> 00:10:17 to pay attention, a new prize that you can use to 122 00:10:17 --> 00:10:22 think about the conversion of potential to kinetic energy, 123 00:10:22 --> 00:10:27 and also you can use to amuse yourself when you're downloading 124 00:10:27 --> 00:10:33 very poor, when you're downloading things from the Internet and have 125 00:10:33 --> 00:10:38 nothing better to do. I can usually get this right across 126 00:10:38 --> 00:10:42 the room. There you go. You can also use it to think about 127 00:10:42 --> 00:10:46 the nature of amphibians, they're nice flying frogs. 128 00:10:46 --> 00:10:51 OK. So let us pose the question here, what is RNA? 129 00:10:51 --> 00:11:01 And you've had some of this on a 130 00:11:01 --> 00:11:06 problem set, but you really need to know what I'm talking about. 131 00:11:06 --> 00:11:12 This is a ribonucleotide. How do I know that this is a ribonucleotide? 132 00:11:12 --> 00:11:17 Think about it. You can put your hands up, but I want everyone to 133 00:11:17 --> 00:11:22 think about it. OK. And you need to identify the 134 00:11:22 --> 00:11:28 precise chemical group, please, that tells me. 
I saw you 135 00:11:28 --> 00:11:38 two first, so yes. 136 00:11:38 --> 00:11:43 What does the lower right mean? Give me a name. It's the? There's 137 00:11:43 --> 00:11:49 a number there. The? Ah, we have a discrepancy of 138 00:11:49 --> 00:11:55 opinion here. Someone says it's a 3 prime hydroxyl on the ribose and 139 00:11:55 --> 00:12:01 someone says it's the 2 prime hydroxyl on the ribose. 140 00:12:01 --> 00:12:04 Let's take a vote. Who thinks that this is identified 141 00:12:04 --> 00:12:08 as a ribose because of this 2 prime hydroxyl? Thank you. 142 00:12:08 --> 00:12:12 And who believes it's the 3 prime hydroxyl that identified riboses? 143 00:12:12 --> 00:12:16 OK. We have a smaller but firm contingent. In fact, 144 00:12:16 --> 00:12:20 it's the 2 prime hydroxyl that identifies this is ribose. 145 00:12:20 --> 00:12:24 You remember, and you really need to remember that this three prime 146 00:12:24 --> 00:12:28 hydroxyl is the reactive group that allows the sugar phosphate backbone 147 00:12:28 --> 00:12:32 to polymerize. This 2 prime hydroxyl is a reactive 148 00:12:32 --> 00:12:38 group. It identifies this as ribose rather than deoxyribose, 149 00:12:38 --> 00:12:43 and it also is an additional reactive group. 150 00:12:43 --> 00:12:49 And the fact that it is a reactive group makes RNA rather labile. 151 00:12:49 --> 00:12:54 OK? So let's write a couple of important things here. 152 00:12:54 --> 00:13:00 So this is RNA as the nucleic acid polymer. 153 00:13:00 --> 00:13:06 You should really know this. Ribose has both a 3 prime hydroxyl 154 00:13:06 --> 00:13:13 and this 2 prime hydroxyl. And this is a reactive group. 155 00:13:13 --> 00:13:19 And because of this RNA is a much less stable polymer than DNA. 156 00:13:19 --> 00:13:26 Here's another one. What type of polynucleotide is this and how do 157 00:13:26 --> 00:13:33 you know? Yes. You. OK. It's RNA. 158 00:13:33 --> 00:13:41 And it's RNA we know because of these uracil groups. 
159 00:13:41 --> 00:13:49 OK? So uracil is an alternate base to thymine that's found only in RNA. 160 00:13:49 --> 00:13:57 Here are the Us. It tells you it's RNA. OK? So you need to know those 161 00:13:57 --> 00:14:07 facts about RNA. 162 00:14:07 --> 00:14:12 Good. So let me pose a question to you. In this litany that you've had 163 00:14:12 --> 00:14:18 several times now where the flow of information moves from DNA to RNA to 164 00:14:18 --> 00:14:24 protein, why is the RNA there? This is a rhetorical question. 165 00:14:24 --> 00:14:30 I'm going to try to answer it for you. 166 00:14:30 --> 00:14:34 Why is the RNA there? Why is there an RNA intermediate? 167 00:14:34 --> 00:14:47 You could imagine that the DNA 168 00:14:47 --> 00:14:52 double helix could open up and that nucleic acid could be directly 169 00:14:52 --> 00:14:57 translated or could be directly converted or the code could be 170 00:14:57 --> 00:15:03 changed to form a protein without any RNA intermediate. 171 00:15:03 --> 00:15:07 But, in fact, universally throughout biology, throughout our world anyway, 172 00:15:07 --> 00:15:12 throughout our earth, RNA is there as an intermediate. 173 00:15:12 --> 00:15:16 Why? Well, I think the answer actually lies in evolution. 174 00:15:16 --> 00:15:21 RNA is probably the most ancient of the information polymers. 175 00:15:21 --> 00:15:26 That is widely believed now. So RNA is ancient. It was the 176 00:15:26 --> 00:15:30 first, strongly believed now that it was the first information 177 00:15:30 --> 00:15:35 carrying polymer. RNAs themselves were catalytic. 178 00:15:35 --> 00:15:39 They became able to replicate. And they also probably became able to be 179 00:15:39 --> 00:15:43 translated into protein before DNA was invented. OK? 180 00:15:43 --> 00:15:47 So DNA's chemical structure is different and it's a derivative of 181 00:15:47 --> 00:15:52 ribonucleic acid, and undoubtedly came second. 
182 00:15:52 --> 00:15:56 There was an advantage of having DNA because it's so much more stable, 183 00:15:56 --> 00:16:00 and it made the hereditary material much more stable and much more 184 00:16:00 --> 00:16:05 faithfully transmitted from generation to generation. 185 00:16:05 --> 00:16:10 So RNA was ancient. And the relationship between RNA 186 00:16:10 --> 00:16:15 and protein is probably a very old one, and we'll talk about this 187 00:16:15 --> 00:16:20 relationship next lecture. And I believe that that 188 00:16:20 --> 00:16:25 relationship has persisted, and then DNA was kind of an add-on. 189 00:16:25 --> 00:16:30 And the DNA to RNA to protein does not necessarily reflect the only way 190 00:16:30 --> 00:16:36 or the best way to do things. Evolution is a capitalization of 191 00:16:36 --> 00:16:42 various changes. And RNA to DNA, DNA to RNA to 192 00:16:42 --> 00:16:48 protein is how things work now. But this, I think, is a consequence 193 00:16:48 --> 00:16:54 of the evolutionary past. Now, however, in our modern world 194 00:16:54 --> 00:16:59 RNA serves two main purposes. One of the things it does is to 195 00:16:59 --> 00:17:03 allow one to use just a subset of the genes to make proteins. 196 00:17:03 --> 00:17:08 So, as you've been told several times, you and I have about 30,000 197 00:17:08 --> 00:17:13 genes in our genomes. Not all of those genes, and we will discuss 198 00:17:13 --> 00:17:17 this in great depth as the course goes on. Not all of those genes are 199 00:17:17 --> 00:17:22 used at any one time. We use just a subset of the genes. 200 00:17:22 --> 00:17:27 And having them converted into an 201 00:17:27 --> 00:17:32 RNA intermediate is one of the ways that you can allow just a subset of 202 00:17:32 --> 00:17:38 the genes to be used. So I'm going to write here subset. 203 00:17:38 --> 00:17:54 Subset of gene usage. 204 00:17:54 --> 00:17:57 OK? 
Because you can turn just some of those genes, 205 00:17:57 --> 00:18:00 or you can convert some of those genes into RNA, 206 00:18:00 --> 00:18:04 the information in some of those genes into RNA. 207 00:18:04 --> 00:18:08 And the other thing it lets you do is to amplify the signal from each 208 00:18:08 --> 00:18:12 gene. So there are two copies of each gene in a diploid cell. 209 00:18:12 --> 00:18:17 When it comes to RNA there can be up to 10,000 copies of RNA per cell 210 00:18:17 --> 00:18:21 of a particular RNA. OK? So you can get an 211 00:18:21 --> 00:18:34 amplification of the signal -- 212 00:18:34 --> 00:18:42 -- from each gene. RNA copy number per cell ranges 213 00:18:42 --> 00:18:50 from about one copy to about 10,000, that's rare, copies per cell. 214 00:18:50 --> 00:18:59 All right. So here we are. Why RNA? We've dealt with that. 215 00:18:59 --> 00:19:02 So I want to talk to you about two things. I want to talk to you about 216 00:19:02 --> 00:19:06 synthesizing the RNA, and then I'm going to talk to you 217 00:19:06 --> 00:19:10 about modifying the RNA a bit. And the first thing I want to cover 218 00:19:10 --> 00:19:18 is something called transcription. 219 00:19:18 --> 00:19:22 Which is also known as RNA synthesis. And you all should have this 220 00:19:22 --> 00:19:26 handout. So I'm not going to draw it but I will write some salient 221 00:19:26 --> 00:19:35 features on the board for you. 222 00:19:35 --> 00:19:42 And we're not quite ready to use that. I'm going to leave this up 223 00:19:42 --> 00:19:49 here, but I'm going to work on the board for a little bit. 224 00:19:49 --> 00:19:56 The basic idea behind transcription, RNA synthesis, 225 00:19:56 --> 00:20:04 is that one copies a DNA template into a complementary RNA strand, 226 00:20:04 --> 00:20:09 complementary RNA. And one does this, 227 00:20:09 --> 00:20:13 as I've alluded to, only from the genes. 
228 00:20:13 --> 00:20:24 And this is an interesting point 229 00:20:24 --> 00:20:31 because although you have 30,000 genes in your genome, in fact, 230 00:20:31 --> 00:20:37 those 30,000 genes only take up about 5% of the total amount of DNA 231 00:20:37 --> 00:20:44 in each of your cells. So 5% of your total DNA of your 232 00:20:44 --> 00:20:51 genome comprises the genes, the information carrying entities in 233 00:20:51 --> 00:20:58 your DNA. And the rest is other stuff. 234 00:20:58 --> 00:21:03 So the 95% is not genes. It consists of various repeats, 235 00:21:03 --> 00:21:09 repetitive DNA that can be there at just a few copies per genome or at 236 00:21:09 --> 00:21:14 10,000 copies per genome. They can be real little, 10 base 237 00:21:14 --> 00:21:20 pairs, six base pair repeats, or they can be a few kilobases 238 00:21:20 --> 00:21:25 repeated many times. Oris, Origins of Replication that 239 00:21:25 --> 00:21:31 you talked about last time are not genes. 240 00:21:31 --> 00:21:36 Those are there, too. Centromeres, 241 00:21:36 --> 00:21:42 the middles of chromosomes. Telomeres, the ends of chromosomes. 242 00:21:42 --> 00:21:47 All of these things are not genes, and they comprise the bulk of your 243 00:21:47 --> 00:21:53 DNA. Now, this isn't true in all organisms. OK? 244 00:21:53 --> 00:21:59 Some organisms have got very little of this repetitive extra DNA. 245 00:21:59 --> 00:22:06 We happen to have a great deal of it. OK. So let's pursue this a bit 246 00:22:06 --> 00:22:14 more. And let's think a bit more about these genes. 247 00:22:14 --> 00:22:22 And in particular let's think about the kinds of RNAs that those genes 248 00:22:22 --> 00:22:30 make. So I'm going to talk about gene classes or classes. 249 00:22:30 --> 00:22:37 And this is with respect to the RNA and the functional RNA that comes 250 00:22:37 --> 00:22:44 from those sets of genes. And I want to distinguish two major 251 00:22:44 --> 00:22:52 classes of genes. 
The first are the protein encoding 252 00:22:52 --> 00:22:59 genes. And protein encoding genes move through a type of RNA that is 253 00:22:59 --> 00:23:07 called messenger RNA, abbreviated mRNA. 254 00:23:07 --> 00:23:16 Messenger RNAs comprise about 1% of the total amount of RNA in a cell. 255 00:23:16 --> 00:23:25 And they can range in size from let's say 100 base pairs to 10,000 256 00:23:25 --> 00:23:32 base pairs. OK? So there's a very wide size range. 257 00:23:32 --> 00:23:38 No, not base pairs. Yell at me. Why not base pairs? 258 00:23:38 --> 00:23:43 Why was I wrong saying base pairs? Tell me about RNA. Raise your hand. 259 00:23:43 --> 00:23:49 This is worth a frog. I caught myself, but if you can 260 00:23:49 --> 00:23:54 catch me, too. Yes. You. Good. 261 00:23:54 --> 00:24:00 OK. Generally RNA is single, woops. 262 00:24:00 --> 00:24:04 RNA is single-stranded. It does not form, it can form a 263 00:24:04 --> 00:24:08 double helix, OK? It's not as stable as the DNA 264 00:24:08 --> 00:24:12 double helix, and many RNAs, probably most RNAs have some 265 00:24:12 --> 00:24:16 double-strandedness to them, but that is an intramolecular double 266 00:24:16 --> 00:24:20 strand in this. There are some RNAs that form 267 00:24:20 --> 00:24:24 intermolecular double strands, but in general I'm going to assume 268 00:24:24 --> 00:24:28 that RNAs are single-stranded. So we talk about 100 bases rather 269 00:24:28 --> 00:24:33 than 100 base pairs. OK? Second class of genes are the 270 00:24:33 --> 00:24:39 ones that do not code for protein, and in this case the RNA is the 271 00:24:39 --> 00:24:45 final product. And this litany of DNA to RNA to 272 00:24:45 --> 00:24:51 protein doesn't hold. You just stop at the RNA. 273 00:24:51 --> 00:24:58 And the RNA is the functional thing. So here RNA is the final product. 274 00:24:58 --> 00:25:06 And we can break these into a bunch 275 00:25:06 --> 00:25:12 of different classes. 
Ribosomal RNAs, abbreviated rRNA 276 00:25:12 --> 00:25:19 are a very abundant class of RNA that comprise about, 277 00:25:19 --> 00:25:25 I've moved over here, let me move here, 98% of total RNA. 278 00:25:25 --> 00:25:31 And they are a few thousand bases in length, let's say 2,000 279 00:25:31 --> 00:25:37 to 4,000 bases in length. OK? So this is 98% ribosomal RNA. 280 00:25:37 --> 00:25:42 This is fascinating. I'll tell you next time. This is the RNA that 281 00:25:42 --> 00:25:47 comprises a very large proportion of the ribosome that is the factory 282 00:25:47 --> 00:25:52 that makes the proteins. OK? And so I will tell you more 283 00:25:52 --> 00:25:57 about these next time. Some other ones, tRNA, 284 00:25:57 --> 00:26:03 the T for transfer RNA. tRNAs comprise about 1% of all RNA 285 00:26:03 --> 00:26:11 and are about 100 base pairs, 100 bases long. OK? And then an 286 00:26:11 --> 00:26:19 interesting one that MIT has had a huge role in discovering and 287 00:26:19 --> 00:26:27 studying, these things called micro RNAs, abbreviated miRNAs, 288 00:26:27 --> 00:26:35 which are, they're at relatively low abundance. 289 00:26:35 --> 00:26:43 Less than 1% of total RNAs. And these are small. In their 290 00:26:43 --> 00:26:51 mature form they're about 22 bases in length. OK. 291 00:26:51 --> 00:27:00 So now, and I believe I cannot do anything with these boards. 292 00:27:00 --> 00:27:03 Ah, I can do something with this one, but that one is stuck. 293 00:27:03 --> 00:27:07 All right. So I'm going to do something with this one. 294 00:27:07 --> 00:27:10 And then I'm afraid it's going to disappear, but it's not going to 295 00:27:10 --> 00:27:14 matter because you have the handout in front of you. 296 00:27:14 --> 00:27:18 So now I'm ready to move on with you to the basic idea of 297 00:27:18 --> 00:27:21 transcription. 
And I'm going to write some facts 298 00:27:21 --> 00:27:25 on the board, and we're going to look at these cartoons that I drew 299 00:27:25 --> 00:27:29 for you together because I think your book is kind of difficult. 300 00:27:29 --> 00:27:36 So I decided to draw some cartoons to help you with the basic idea. 301 00:27:36 --> 00:27:44 Transcription or RNA synthesis takes place in the nucleus. 302 00:27:44 --> 00:27:59 Anyone else need a handout? 303 00:27:59 --> 00:28:05 Why don't you come on down. Actually, one of the TAs, could you 304 00:28:05 --> 00:28:11 be an emissary and just hand out to those people with raised hands? 305 00:28:11 --> 00:28:18 Thanks. Transcription takes place in the nucleus. 306 00:28:18 --> 00:28:24 And the idea is really analogous to DNA replication with a difference. 307 00:28:24 --> 00:28:30 The analogy is the synthesis of a complementary strand of nucleic acid 308 00:28:30 --> 00:28:37 on a template strand. So this is an enormously important 309 00:28:37 --> 00:28:43 principle that you need to have. Super important that you get the 310 00:28:43 --> 00:28:50 principle. The basic idea involves synthesis of a complementary strand 311 00:28:50 --> 00:28:57 of nucleic acid from a template strand. The template, 312 00:28:57 --> 00:29:04 actually, let me start even earlier than that. 313 00:29:04 --> 00:29:09 We start with a gene that generally comprises double-stranded DNA. 314 00:29:09 --> 00:29:14 There are exceptions to almost everything that I will tell you, 315 00:29:14 --> 00:29:19 or that Professor Jacks will tell you. You should understand that 316 00:29:19 --> 00:29:25 there are exceptions. Some organisms, particularly 317 00:29:25 --> 00:29:30 viruses have genomes that are RNA that can be single-stranded or 318 00:29:30 --> 00:29:35 double-stranded RNA. Some have genomes that are 319 00:29:35 --> 00:29:40 single-stranded DNA. But in general most genomes are 320 00:29:40 --> 00:29:45 double-stranded DNA. And the deal is this. 
321 00:29:45 --> 00:29:50 The double-stranded DNA separates its strands, and one of the strands, 322 00:29:50 --> 00:29:55 and this is the difference between DNA replication and transcription, 323 00:29:55 --> 00:30:01 one of the strands becomes the template strand. 324 00:30:01 --> 00:30:14 And this template is copied to form 325 00:30:14 --> 00:30:22 a complementary strand. And it's copied by an enzyme called 326 00:30:22 --> 00:30:31 RNA polymerase. So RNA polymerase synthesizes the 327 00:30:31 --> 00:30:40 complementary strand to the template strand. 328 00:30:40 --> 00:30:45 And it does so, of course, 329 00:30:45 --> 00:30:51 as RNA, because we're talking about RNA synthesis and this is RNA 330 00:30:51 --> 00:30:57 polymerase. It does not, unlike DNA polymerization, 331 00:30:57 --> 00:31:03 require a primer. So this does not require a primer. 332 00:31:03 --> 00:31:07 OK. You should know, and it should be getting deep within 333 00:31:07 --> 00:31:11 your neural circuitry that polymerization occurs by adding 334 00:31:11 --> 00:31:15 nucleotides to the 3 prime end of the growing polymer. 335 00:31:15 --> 00:31:20 Yes. If that didn't make, you know, if you didn't say "yeah" 336 00:31:20 --> 00:31:24 to that, go back and think about it, go back and look at problem sets and 337 00:31:24 --> 00:31:28 you'll get more practice in this. But you really need to know that 338 00:31:28 --> 00:31:33 the growing chain adds onto the 3 prime end. 339 00:31:33 --> 00:31:39 OK. So after the polymer, after the RNA polymer is made the 340 00:31:39 --> 00:31:46 RNA is released from the template strand. As it's being transcribed it 341 00:31:46 --> 00:31:52 forms this complementary strand. And, as you know, complementary 342 00:31:52 --> 00:31:59 strands can base pair. After it's made it is released from 343 00:31:59 --> 00:32:05 the template. And it usually then goes into the 344 00:32:05 --> 00:32:09 cytoplasm where it does its thing. 
So if you look at the diagram I 345 00:32:09 --> 00:32:13 gave you, that's what's up here, here's your double-stranded DNA, 346 00:32:13 --> 00:32:17 your gene. The strands separate. One strand is transcribed into RNA. 347 00:32:17 --> 00:32:21 The RNA is released. Obviously, your double-stranded template, 348 00:32:21 --> 00:32:25 or what was your double-stranded template will reform 349 00:32:25 --> 00:32:30 its double strand. So perhaps that's not so obvious, 350 00:32:30 --> 00:32:36 but the double-stranded, originally double-stranded template will reform 351 00:32:36 --> 00:32:41 its double strands. The released RNA then goes into 352 00:32:41 --> 00:32:47 the cytoplasm where it is translated into protein, or where the RNA is 353 00:32:47 --> 00:32:53 the final product. So let's look at that in a bit more 354 00:32:53 --> 00:32:58 detail. I've got here a template strand 355 00:32:58 --> 00:33:03 shown in red. This is, again, the second picture in the 356 00:33:03 --> 00:33:09 handout in front of you. And I've got three features added 357 00:33:09 --> 00:33:14 here. I have got a precise start site of transcription. 358 00:33:14 --> 00:33:19 I've indicated elongation where the polymer is elongating. 359 00:33:19 --> 00:33:24 And I have a precise termination site where transcription ends. 360 00:33:24 --> 00:33:30 OK? Now, let me see what I have here. 361 00:33:30 --> 00:33:35 I have a movie here. Watch the movie. I'll show it to 362 00:33:35 --> 00:33:41 you once, and then you can go and watch it at your leisure. 363 00:33:41 --> 00:33:47 This is meant to be RNA polymerase. There's the helix opening up 364 00:33:47 --> 00:33:53 locally. Here are ribonucleotide triphosphates coming in, 365 00:33:53 --> 00:33:59 and RNA polymerase is catalyzing their synthesis. OK? 366 00:33:59 --> 00:34:03 So the template strand is the bottom and here is the RNA being released. 367 00:34:03 --> 00:34:08 There's RNA polymerase moving along the helix. 
And the depiction is 368 00:34:08 --> 00:34:13 that the helix is opening locally and then closing again behind the 369 00:34:13 --> 00:34:18 RNA polymerase. At transcription termination, 370 00:34:18 --> 00:34:23 the helix, the gene helix zips up again and the transcript is released. 371 00:34:23 --> 00:34:28 So this is a very much simplified story. 372 00:34:28 --> 00:34:32 But it is the basic principle of transcription. 373 00:34:32 --> 00:34:37 And you should know it. And in particular I have put onto 374 00:34:37 --> 00:34:41 this second diagram, and because you have it in front of 375 00:34:41 --> 00:34:46 you I'm not going to write this on the board, I'm going to use this as 376 00:34:46 --> 00:34:50 something to tell you, I have put the directionality of the 377 00:34:50 --> 00:34:55 strands of the double-helix on this diagram. This should be something 378 00:34:55 --> 00:35:00 you can deal with. 5 prime to 3 prime on one strand. 379 00:35:00 --> 00:35:04 The other strand is anti-parallel. RNA, any nucleic acid is 380 00:35:04 --> 00:35:08 synthesized by adding onto the 3 prime end. And that newly 381 00:35:08 --> 00:35:12 synthesizing nucleic acid polymer is anti-parallel to the template. 382 00:35:12 --> 00:35:16 This is also something that you should be familiar with. 383 00:35:16 --> 00:35:20 And you will have, will have, have not yet, will have practice on 384 00:35:20 --> 00:35:24 doing this kind of polymerization, but it should be something you 385 00:35:24 --> 00:35:28 really, really should be familiar with, this anti-parallel 386 00:35:28 --> 00:35:32 requirement. So, in fact, you can tell the 387 00:35:32 --> 00:35:36 direction of transcription because of the directionality of the 388 00:35:36 --> 00:35:40 template strand. OK. So this is very important for 389 00:35:40 --> 00:35:45 you to go and think about after class the directionality of the 390 00:35:45 --> 00:35:49 template and of the newly synthesized polymer. 
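The directionality rule just described (the new RNA is complementary and anti-parallel to the template, and grows at its 3 prime end) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not something from the handout: the names `DNA_TO_RNA` and `transcribe` are hypothetical, and the input is assumed to be the template strand written 5 prime to 3 prime using only the letters A, T, G and C.

```python
# Minimal sketch of transcription directionality (hypothetical helper names).
# Assumption: the template strand is given as text written 5' -> 3'.

# Base-pairing rules for DNA template -> RNA; RNA uses uracil (U) opposite A.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_5_to_3: str) -> str:
    """Return the RNA transcript, written 5' -> 3', for a DNA template.

    The polymerase reads the template 3' -> 5' and adds each new
    ribonucleotide to the 3' end of the growing RNA, so we walk the
    template in reverse and append the complementary base each time.
    """
    rna = []
    for base in reversed(template_5_to_3.upper()):
        rna.append(DNA_TO_RNA[base])  # new base joins the 3' end
    return "".join(rna)


# Template 5'-ATGC-3' is read as 3'-CGTA-5', giving RNA 5'-GCAU-3'.
print(transcribe("ATGC"))  # GCAU
```

Choosing the other DNA strand as the template would reverse the direction of transcription along the helix, which is the point made about telling direction from the template strand.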
391 00:35:49 --> 00:35:54 These are some diagrams from your book, and you can go and look at 392 00:35:54 --> 00:35:58 them. I'm not going to dwell on them. They indicate the difference 393 00:35:58 --> 00:36:02 between, or the steps in transcription initiation, 394 00:36:02 --> 00:36:06 elongation and termination. And I've put them up there just to 395 00:36:06 --> 00:36:10 tell you there are these diagrams in your book and you can go and take a 396 00:36:10 --> 00:36:14 look at them and read the accompanying text. 397 00:36:14 --> 00:36:18 OK. So I see three problems with transcription that are very 398 00:36:18 --> 00:36:28 interesting problems. 399 00:36:28 --> 00:36:33 One is how to find the genes. I'll write them on the board and 400 00:36:33 --> 00:36:43 then we'll go through them. 401 00:36:43 --> 00:36:47 5% of the genome is genes. So most of it is not genes. 402 00:36:47 --> 00:36:51 How does the transcription enzyme, how does the RNA polymerase know 403 00:36:51 --> 00:36:56 which is a gene and which is not a gene? How does it know, 404 00:36:56 --> 00:37:00 even if it finds the gene, which strand is the template strand 405 00:37:00 --> 00:37:05 and which is not the template strand? 406 00:37:05 --> 00:37:08 I could have drawn your previous diagram where the top strand was the 407 00:37:08 --> 00:37:11 template, and then what would have happened would be that the RNA 408 00:37:11 --> 00:37:15 synthesis went in the other direction. So which strand 409 00:37:15 --> 00:37:31 is the template? 410 00:37:31 --> 00:37:38 And that also gives you the direction, of course, 411 00:37:38 --> 00:37:46 of transcription. And the third one I'm going to write, 412 00:37:46 --> 00:37:54 and then I'll tell you about this in a moment. I'm going to write how to 413 00:37:54 --> 00:38:00 unwrap chromatin. OK. So in each of your cells, 414 00:38:00 --> 00:38:05 you have to look at me for this. In each of your cells there is this 415 00:38:05 --> 00:38:11 length of DNA.
One meter. This is a little over, 416 00:38:11 --> 00:38:16 but one meter of DNA. How big is the average cell in diameter? 417 00:38:16 --> 00:38:21 Give it to me in micrometers. Worth a frog. On average. Well, 418 00:38:21 --> 00:38:27 that's actually a really big cell. It's about ten times 419 00:38:27 --> 00:38:32 less than that. But whoever that was, 420 00:38:32 --> 00:38:36 who was it? No way. These are very bad to throw. Very bad to throw. 421 00:38:36 --> 00:38:41 You can have it because you caught it. See me afterwards. 422 00:38:41 --> 00:38:46 I'll give you one. OK. [LAUGHTER] OK. So how do you pack 423 00:38:46 --> 00:38:50 a meter of DNA into a cell that is about ten microns in diameter? 424 00:38:50 --> 00:38:55 OK. So, OK, Jamie, you want to hazard an answer here? 425 00:38:55 --> 00:39:06 Your hand was up. 426 00:39:06 --> 00:39:10 OK. Good. You can wind it up. OK. The other thing you have to do, 427 00:39:10 --> 00:39:14 of course, is to make it really thin. It has to be a lot thinner than my 428 00:39:14 --> 00:39:18 piece of rope. But once you've made it really thin 429 00:39:18 --> 00:39:22 you can then wind it up. OK. It's logical and this is how 430 00:39:22 --> 00:39:26 it's done. And you can wind it up and then it will fit into 431 00:39:26 --> 00:39:31 your ten micron cell. Now, in actual fact, 432 00:39:31 --> 00:39:36 there's a whole process to do that. And I'm going to go through them as 433 00:39:36 --> 00:39:41 we go through these problems here. So here is problem one exemplified. 434 00:39:41 --> 00:39:46 I've got red little dots for each of the genes. How does RNA 435 00:39:46 --> 00:39:51 polymerase find these genes in this vast amount of DNA that is not genes? 436 00:39:51 --> 00:39:56 Here's the other one. Which strand is the template? 437 00:39:56 --> 00:40:01 Oh. 
And here is a nice problem that in 438 00:40:01 --> 00:40:05 the interest of time I am not going to do here in class with you, 439 00:40:05 --> 00:40:10 but I want you guys to go and do this as an exercise. 440 00:40:10 --> 00:40:14 I will tell you that the answer is not on your handout on the Web. 441 00:40:14 --> 00:40:18 I took it off. Sneaky, ha? So that you can go and think about this. 442 00:40:18 --> 00:40:23 I want you to go and understand that the products of synthesis from 443 00:40:23 --> 00:40:27 either strand of a DNA double-stranded helix 444 00:40:27 --> 00:40:32 are not the same. OK? And I'm going to zip through 445 00:40:32 --> 00:40:37 this because I want to move on here. OK. And I want to move to problem 446 00:40:37 --> 00:40:42 three which is this thing I called chromatin. DNA is wound up around 447 00:40:42 --> 00:40:46 proteins. These are called histones, and we'll have more to say about 448 00:40:46 --> 00:40:51 them later in the course. And wound up and wound up and wound 449 00:40:51 --> 00:40:56 up. And there is a very set number and type of proteins that the DNA is 450 00:40:56 --> 00:41:01 wound around. And once the DNA has been wound 451 00:41:01 --> 00:41:05 around once, those DNA protein complexes are wound up some more, 452 00:41:05 --> 00:41:10 and then wound up some more. And eventually you get them wound up and 453 00:41:10 --> 00:41:14 wrapped up so much you get the characteristic rather large 454 00:41:14 --> 00:41:19 chromosomes which are very much packed DNA. Now, 455 00:41:19 --> 00:41:23 this is a great way to fit DNA into a cell. However, 456 00:41:23 --> 00:41:28 this wrapping up of the chromatin into, the wrapping up of the DNA 457 00:41:28 --> 00:41:33 into this chromatin structure inhibits transcription. 458 00:41:33 --> 00:41:37 And in order to allow transcription to proceed, you have to remove these 459 00:41:37 --> 00:41:42 proteins from the DNA and allow it to unwind locally. 
460 00:41:42 --> 00:41:47 And that takes a whole series of enzymatic steps, 461 00:41:47 --> 00:41:51 again that we'll explore more later in the course. 462 00:41:51 --> 00:41:56 But the problem I throw out at you now is how do you unwrap the 463 00:41:56 --> 00:42:01 chromatin where transcription is needed? And the answer to all of 464 00:42:01 --> 00:42:06 these things lies in a specific, no, stop. 465 00:42:06 --> 00:42:11 Stop. Down. Up. OK. The answer to all of these 466 00:42:11 --> 00:42:17 questions lies in a specific DNA sequence or a series of specific DNA 467 00:42:17 --> 00:42:23 sequences that are collectively called -- 468 00:42:23 --> 00:42:34 -- the promoter. 469 00:42:34 --> 00:42:41 Here's another one. What is a promoter? And I need to 470 00:42:41 --> 00:42:49 make the distinction now between transcribed DNA of a gene and 471 00:42:49 --> 00:42:56 untranscribed DNA of a gene. The promoter is part of a gene but 472 00:42:56 --> 00:43:03 it is not transcribed. It usually, though it depends on the gene and 473 00:43:03 --> 00:43:09 the type of gene, it usually lies 5 prime to the 474 00:43:09 --> 00:43:16 transcriptional start site. And it is a DNA sequence that says 475 00:43:16 --> 00:43:23 this is a gene, and it also says transcription 476 00:43:23 --> 00:43:30 should proceed in this direction. OK. 477 00:43:30 --> 00:43:34 And the way it does these things, I'm going to go to each of the answers to 478 00:43:34 --> 00:43:39 each of the problems now, is that it binds proteins that 479 00:43:39 --> 00:43:44 specifically recognize the sequence of the promoter. 480 00:43:44 --> 00:43:48 So you talked about the DNA replication origin and proteins that 481 00:43:48 --> 00:43:53 specifically recognize the nucleotide sequence of the origin. 482 00:43:53 --> 00:43:58 This is analogous. There are proteins that recognize promoter 483 00:43:58 --> 00:44:03 sequences which are similar but not identical from gene to gene. 484 00:44:03 --> 00:44:09 So it binds proteins.
And these are called transcription 485 00:44:09 --> 00:44:21 factors. 486 00:44:21 --> 00:44:25 And these transcription factors bind in a DNA sequence specific way. 487 00:44:25 --> 00:44:30 OK. It also binds RNA polymerase which I'm going to abbreviate RNA 488 00:44:30 --> 00:44:37 polymerase, RNA pol. OK? And the answers to the three 489 00:44:37 --> 00:44:46 questions are that firstly the protein-DNA interaction is sequence 490 00:44:46 --> 00:44:55 specific, sequence specific, and so this allows you to actually 491 00:44:55 --> 00:45:02 find the genes. Secondly, and this is cool and I'll 492 00:45:02 --> 00:45:07 show you a picture of this in a moment, the proteins interact with the 493 00:45:07 --> 00:45:12 promoter DNA differently on different strands of the helix so 494 00:45:12 --> 00:45:17 they bind asymmetrically. They may bind more to one strand 495 00:45:17 --> 00:45:22 than to the other strand. And this gives directionality to 496 00:45:22 --> 00:45:27 the transcription so the protein binding is asymmetric 497 00:45:27 --> 00:45:36 or strand specific. 498 00:45:36 --> 00:45:41 Not for all of these proteins but for a significant number. 499 00:45:41 --> 00:45:46 And that helps you decide which strand you're going to use as the 500 00:45:46 --> 00:45:52 template. And thirdly these proteins have got associated with 501 00:45:52 --> 00:45:57 them activities that will unwrap the chromatin, that will unwrap the DNA 502 00:45:57 --> 00:46:03 from its protein complexes and allow it to be accessible to the 503 00:46:03 --> 00:46:16 transcription machinery. 504 00:46:16 --> 00:46:20 OK. Let's zip so I can show you this. You can look at this on your 505 00:46:20 --> 00:46:25 slides. Here are some pictures from your book. Really important in this, 506 00:46:25 --> 00:46:30 don't move. Really important in this is a protein called TFIID which 507 00:46:30 --> 00:46:35 recognizes a sequence called, that goes T-A-T-A-A-A.
508 00:46:35 --> 00:46:40 This is called the TATA binding protein. And it's really important. 509 00:46:40 --> 00:46:45 And it's the one thing that, the major thing that gives asymmetry to 510 00:46:45 --> 00:46:50 this transcription, set of transcription factors on the 511 00:46:50 --> 00:46:55 promoter. Once TFIID has bound to the promoter, other proteins come 512 00:46:55 --> 00:47:01 along, including these various other things called B, F, H, G and so on. 513 00:47:01 --> 00:47:05 And here's RNA polymerase. And you can see this complex 514 00:47:05 --> 00:47:10 positioned asymmetrically on the DNA. And this complex you should know 515 00:47:10 --> 00:47:15 the name of, I'm going to put it in its own box here, is the 516 00:47:15 --> 00:47:23 initiation complex. 517 00:47:23 --> 00:47:28 And I want to show you a crystallographic rendition of the 518 00:47:28 --> 00:47:34 TATA binding protein called TBP, also sometimes called TFIID. But 519 00:47:34 --> 00:47:40 TATA binding protein here shown in purple. And if you look here, 520 00:47:40 --> 00:47:45 you're looking head on at the double helix. OK? Here's the helix. 521 00:47:45 --> 00:47:51 You're looking down the helix. And you can see that this protein 522 00:47:51 --> 00:47:57 is positioned on just one side of the helix, so that gives 523 00:47:57 --> 00:48:02 you asymmetry. Here's another transcription factor 524 00:48:02 --> 00:48:06 bound to DNA. This is a protein called GAL4. It binds as a dimer. 525 00:48:06 --> 00:48:11 And you can see that GAL4 is the blue. And here it is contacting 526 00:48:11 --> 00:48:15 just one side, one strand of this red double helix. 527 00:48:15 --> 00:48:20 And I'm going to stop there and finish off the last little 528 00:48:20 --> 00:48:23 bit on Monday.
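As a rough illustration of why a sequence like T-A-T-A-A-A can mark both where a gene is and which way to transcribe, here is a small Python sketch. It is a toy model under stated assumptions, not how the TATA binding protein actually finds its site: it only does an exact string search, the function names are invented for this example, and the test sequences are made up. The point is that TATAAA is not its own reverse complement, so a match tells you which strand it sits on.

```python
# Toy model only: exact-match search for the TATAAA motif on both strands.
# Function names and sequences are hypothetical illustrations.
TATA_MOTIF = "TATAAA"
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna_5to3: str) -> str:
    """Return the partner DNA strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna_5to3))


def find_tata(top_5to3: str) -> list[tuple[str, int]]:
    """Return (strand, position) for every TATAAA match.

    Positions are 0-based and counted 5'->3' along the strand named
    in the tuple ("top" or "bottom").
    """
    hits = []
    strands = (("top", top_5to3), ("bottom", reverse_complement(top_5to3)))
    for strand_name, sequence in strands:
        start = sequence.find(TATA_MOTIF)
        while start != -1:
            hits.append((strand_name, start))
            start = sequence.find(TATA_MOTIF, start + 1)
    return hits


print(find_tata("GGCTATAAAGGC"))  # motif written on the top strand
print(find_tata("GCCTTTATAGCC"))  # same motif, but on the bottom strand
```

Because the motif reads differently on the two strands, the strand on which it is found fixes an orientation, which is the asymmetry the promoter-binding proteins exploit.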