So, for today's lecture, as you can see up there, the topic is molecular evolution and ecology. What I mean by this is basically what we try to figure out in molecular evolution and ecology: what genes or gene sequences can tell us about the evolution, and ultimately also the ecology, of organisms in the environment. It's particularly relevant for thinking about microorganisms, prokaryotes, in the environment, and I hope I can convince you today that this is interesting. The topics that I want to cover today are, first of all, a brief review of what we know about life on Earth, sort of an overview of the evolution of life on Earth. Then I want to go into a specific topic that's of particular relevance for the evolution of eukaryotes: the endosymbiosis theory. And then I'll explain how we can use gene sequences to reconstruct events that happened a very, very long time ago. OK, so we'll look at what we call molecular phylogenies, the use of gene sequences to reconstruct the evolutionary history of organisms on Earth. Derived from that, we'll look at what we call the tree of life. That's sort of the big-picture overview of the evolutionary relationships of all organisms on the planet. And then finally, I'll introduce you to a topic called molecular ecology.
Again, that's how we can use gene sequences to learn something about the diversity of microorganisms in the environment, which then leads us next time, when I come back on Monday, into the big topic of environmental genomics: how we can expand this analysis to learn much more about organisms in the environment. So, first of all, let's look at life on Earth. Does anybody know how old we think Earth is? Say again? Yeah, 4.5 to 4.6; I have 4.6 in my notes. So, Earth is thought to have originated about 4.6 billion years ago. When did the first solid rocks appear on Earth? So, when did the surface kind of solidify? Anybody know? About 3.9 billion years ago, OK? And when do we think life started to develop on the planet? Any ideas? Take a guess. Two? One? 3.5 billion years ago, OK? So, this is really remarkable. Of course it took a long time, because we're talking about millions and hundreds of millions of years, but still, if you look at the big picture, it didn't actually take life that long to evolve on the planet. So, why do we think that is the case? What's the evidence for that? Well, we look into sedimentary rocks, old rocks that arose from sediments, and around this time you find that chemicals start to appear: organic molecules that really resemble organic molecules in modern life.
So, we have sort of chemical tracers, or chemical fossils: tracers that indicate the presence of organisms. But what we also find are so-called microfossils, and I have a picture of that here. When you take rocks and slice them into very, very thin slices, you can put them under specific microscopes. What you then find is that many very, very old rocks have these kinds of inclusions in them, and these things closely resemble modern prokaryotic cells, modern bacterial cells, for example. And so those microfossils are generally also taken as an indication that life was already present during those times. Now, let's take a quick overview of the evolution of life on the planet. Again, this graph summarizes roughly the last four billion years, covering the time when life originated. We see that there was a period of chemical evolution, and then somewhere in this region the origin of life is placed, although of course it's not really well understood exactly when that happened. But I want to alert you to a couple of really critical steps shown on this graph, which we'll talk more about. It is thought that very early on life split into three major lineages: the bacteria, the archaea, and what is called here the nuclear line. I'll come back to that in a minute or so.
Then, a further major event, which you may remember, is that oxygenic photosynthesis evolved, which means that cyanobacteria evolved that started to produce oxygen as a byproduct of photosynthesis. That fundamentally changed the chemistry of the Earth: the atmosphere became oxidizing. What you see here is that once the oxygen concentration went over a certain level, it allowed the development of an ozone shield. Now, what does that mean? What was the critical significance of the ozone shield? Does anybody know? What does it block out? Anybody remember that? What's the big significance of the ozone hole over Antarctica, for example? It allows UV radiation to reach the Earth's surface, and in fact, if there were no ozone, the UV radiation would be so strong that no life would be possible on land. So once the ozone shield developed, organisms could basically conquer and settle the land surface. And this, then, is thought to be at least correlated with the development of endosymbiosis. I'll explain what I mean by that, but it basically led to the origin of modern eukaryotes, so your ancestors, essentially. But there was obviously still a long time until humans appeared. We have here the origin of animals, the metazoans, and then the age of the dinosaurs is already a very small blip on this graph.
And humans don't even get featured on it, because we are so recent. So, what I want to show you here is that three major lineages evolved early on: the bacteria, the archaea, and what we call a nuclear lineage. The significance of the nuclear lineage is that it basically combined with bacteria to form the modern eukaryotic cell, the eukarya, or eukaryotes as they're also called. And it is this combination that we call the endosymbiosis event. I want to explain this a little bit more, and then I'll finally show you why we know that these things very likely occurred a long time ago. Yes? It means the bacteria and the nuclear lineage combined to form a eukaryote, OK? I'm going to explain this on the slide here, so if you have any more questions after that, please let me know. So, again, this shows you the early evolution, this early split into archaea, bacteria, and this sort of nuclear line. It is thought that this nuclear line consisted of single-celled organisms that increased in cell size and then partitioned their DNA into a nucleus, basically exactly as you find it in modern eukaryotic cells. But then what happened is that this cell took up a bacterial cell, and over time this bacterial cell became a symbiont. In fact, it became the mitochondrion.
And what this mitochondrion does in the modern eukaryotic cell, as you all know, is that it really took over the energy metabolism. So, the proto-eukaryotic cell took up a heterotrophic bacterium that formed the mitochondrion, and this ultimately gave rise to protozoa and to modern-day animals. But there was a second symbiotic event. This cell, once it had taken up a heterotrophic bacterium, took up an autotrophic bacterium, a cyanobacterium, an oxygenic photosynthesizer, and that led to the development of modern algae and modern plants. So what we can say is that mitochondria are ancient heterotrophic bacteria, and chloroplasts are ancient cyanobacteria, that is, oxygenic photosynthetic bacteria. These have obviously coevolved with their hosts to form animals and, finally, plants. Now, obviously we are talking here about events that happened a very, very long time ago, and so the big question is: how do we really know this? This takes me to the third topic, molecular evolution. So, let's state the problem again. Very simply put, evolution is incredibly slow, OK? Therefore its processes are not directly observable, and we need to use inference techniques to reconstruct evolutionary processes.
Now, what do we usually use when we want to reconstruct the evolutionary history of animals and plants? Anybody? Fossils. Exactly. So you take a shovel, essentially, and dig down into the different layers. There are different techniques to determine the age of different sedimentary rocks, for example, and then, if you're lucky and find enough fossils of a particular lineage, you can reconstruct the evolution of that lineage. I'm sure you have all seen the example of the horse, where we have quite good evidence of what ancient horses looked like, and we can reconstruct the sequence of events that led to the evolution of modern-day horses. Now, you can imagine, though, that for such ancient events as these there really is no fossil record. So what people figured out, and this was really a stroke of genius that came about in the late '60s, is that DNA molecules can act as evolutionary chronometers. OK, now what do I mean by that? I mean that you can take DNA sequences, or gene sequences, from different kinds of organisms, and based on those gene sequences you can reconstruct their relationships to each other. You can determine whether two organisms are closely related or whether they are only very distantly related.
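The chronometer idea can be illustrated with the simplest molecular-clock relation. This is a minimal sketch, assuming a constant rate of change per site per year that is the same in both lineages; the function names and the rate value are hypothetical, chosen only for illustration. Because both lineages keep changing after they split, the distance between them grows as roughly twice the rate times the time, at least until repeated changes at the same site start to hide earlier ones.

```python
def expected_distance(rate_per_site_per_year, years):
    """Expected fraction of differing sites after `years` of separate evolution.

    Assumes a constant clock: each lineage changes at `rate_per_site_per_year`,
    so two lineages that split `years` ago differ at about twice that rate.
    Only reasonable while few sites have changed more than once.
    """
    return 2 * rate_per_site_per_year * years


def divergence_time(distance, rate_per_site_per_year):
    """Invert the clock: estimated years since the last common ancestor."""
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return distance / (2 * rate_per_site_per_year)


# Hypothetical rate, for illustration only: 1e-9 changes per site per year.
rate = 1e-9
d = expected_distance(rate, 10_000_000)  # two lineages separated 10 million years
print(d)                                 # roughly 0.02
print(divergence_time(d, rate))          # roughly 10 million years
```

Real genes do not evolve at one constant rate in every lineage, so this relation is an approximation; the point is only that distance and time are treated as proportional.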
And the underlying mechanism is that mutations happen with a certain probability all the time. So, the idea is that as time passes, DNA molecules change: they accumulate mutations, and the amount of change in a particular DNA sequence is proportional to the time of separate evolution of two different lineages or two different organisms. So, the amount of change is more or less proportional to the time since the last common ancestor. Let me explain how this is actually done. What you really need in order to do this is genes that are related to each other, OK? The genes need to be universally distributed, which means that all organisms that you want to compare need to have this type of gene. And those genes need to have a conserved function. These genes we can then compare to each other, and I will explain how this is done. Any questions so far? OK, so the example that I want to bring up is the 16S ribosomal RNA genes. We often abbreviate ribosomal RNA as rRNA. Now, does anybody remember what the ribosomal RNAs are and what they do? What's the ribosome? Yes? Right, and what does it do? Exactly, it's the location where messenger RNA is translated into protein. Now, the ribosomal RNAs are an integral part of the ribosome.
They play both a catalytic and a structural role in the ribosome. And because this is such a fundamental organelle, all living organisms possess it. This allows us to use these genes to compare all living organisms to each other. OK, so this is a very important point. I wanted to show you, OK, if it wakes up... there we go, an example of these ribosomal RNA genes. What you see here is the secondary structure of the actual RNA, the ribosomal RNA. These molecules have a secondary structure because they play a catalytic and structural role, and the structure really determines the function of these molecules in different organisms. And now look at this. We have here a bacterium, and here an archaeon. If you think back to the first couple of slides, what I showed you is that these organisms have not shared a common evolutionary history for about 3 billion years. But if you just glance very quickly at the structures, you see that they look very similar to each other. So, there's an indication that the function of these molecules is really very highly conserved.
However, when you look at the sequences in detail, what you'll find is that there are different regions. I've given some examples here, denoted by A, B, and C in these molecules, and these different regions are really the key to the molecule's usefulness in figuring out the evolution and ecology of many organisms. Region A denotes sequence stretches that are the same in all living organisms. They are universally conserved, which means that if you get a mutation in a gene in that particular region, you are dead. OK, that's essentially why it's conserved. Then we have the regions B, where the length is conserved but the sequence is not: sequence changes are allowed, but the length needs to be conserved. And then there are the regions C, where neither length nor sequence is conserved, and where we get a lot of variation. So, let me write this down. We have three types of sequence stretches: A, what I called the universally conserved sequences; B, where length but not sequence is conserved; and C, where neither length nor sequence is conserved. The first two types of sequence stretches are very important in figuring out the phylogeny, or the evolutionary relationships, among organisms.
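The three types of sequence stretches can be illustrated on a toy alignment. This is a minimal sketch, assuming the sequences are already aligned to equal length with `-` marking a gap; the function name `column_types` and the sequences are hypothetical. Each column is labeled A if every sequence has the same base, B if no sequence has a gap but the bases differ, and C if any sequence has a gap (so length is not conserved there). Labeling single columns rather than whole stretches is a simplification.

```python
def column_types(alignment):
    """Label each column of aligned sequences as region type A, B or C.

    A: same base in every sequence (universally conserved)
    B: no gaps, but the bases differ (length conserved, sequence not)
    C: at least one gap '-' (length not conserved)
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    labels = []
    for column in zip(*alignment):
        if "-" in column:
            labels.append("C")
        elif len(set(column)) == 1:
            labels.append("A")
        else:
            labels.append("B")
    return "".join(labels)


# Hypothetical toy alignment of three short sequences.
toy = ["ACGT-A",
       "ACTTGA",
       "ACGA-A"]
print(column_types(toy))  # AABBCA
```

Runs of A columns are the stretches that can anchor an alignment, B columns record substitutions that can be compared column by column, and C columns are the highly variable stretches.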
The sequence stretches of type C, on the other hand, because they vary so dramatically, are very important in identifying organisms. We'll talk more about this next time. So what can we now do with these sequences? Well, the first step is to generate an alignment. This is shown here, where each row denotes a gene from a particular organism; these are all abbreviated here. These actually aren't ribosomal RNA genes, but other genes. What you will see is that we can recognize the three different regions that I pointed out before. You have the regions A, which tell you which nucleotides line up with each other, so you use them as a sort of anchor, because these sequences never vary among organisms. And then the regions B, where you line up stretches that vary in sequence but not in length. Now, why is this important? It's important because in each column you then have nucleotides that originated from a common ancestral nucleotide, and whose variation over time you can monitor. Is everybody with that? Any questions? OK, great. The second step, then, is the calculation of a similarity. This is shown here. Again, we have a very simplified alignment, now of four different organisms. Here, we have the sequences that we want to compare.
And what you'll see is that they're overall very similar, but some nucleotides differ. So what we simply do is, for each pair of sequences, calculate a sequence similarity value. You have 12 nucleotides, and the first pair differs at three of them. OK, so that tells us... actually, it's called a distance here, I'm sorry. Let me write this down: the distance is simply one minus the similarity. So basically a quarter of the nucleotides differ here, whereas between A and C a third of the nucleotides differ, and so on. You do this for each pair of sequences. The third step, then, is to calculate a correction for multiple mutations affecting the same nucleotide. You can imagine that over time there's a probability that a particular nucleotide mutates, say, twice. So in the first instance it may change from an A to a G, but then it changes to a C. When you look at the modern-day sequences, you don't know that this happened. And so there are ways to statistically estimate how likely it is that a sequence contains such multiple events. This we then call the corrected evolutionary distance, and what you will note is that the corrected evolutionary distance is invariably larger than the observed one.
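The distance and correction steps can be sketched on toy data. This is a minimal illustration, not the method behind the slide: the observed distance is one minus the similarity, counted over columns without gaps, and for the multiple-hit correction it uses the Jukes-Cantor model, one standard choice that the lecture does not name. The function names and sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Observed distance = 1 - similarity, over aligned columns without gaps."""
    pairs = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    differences = sum(1 for a, b in pairs if a != b)
    return differences / len(pairs)


def jukes_cantor(p):
    """Correct an observed distance for multiple changes at the same site.

    Jukes-Cantor model: d = -3/4 * ln(1 - 4/3 * p); undefined for p >= 0.75.
    """
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


# Toy example: two 12-nucleotide sequences that differ at 3 sites.
seq_a = "ACGTACGTACGT"
seq_b = "ACGAACGAACGA"
p = p_distance(seq_a, seq_b)
print(p)                          # 0.25
print(round(jukes_cantor(p), 3))  # 0.304
```

The corrected value (about 0.30) is larger than the observed 0.25, as described above; for a pair where a third of the nucleotides differ, the same formula gives about 0.44.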
Now, what can we do with those distances? We can fit them into a best-fit tree of relationships, and that's shown here. We have our four organisms, and when you look at the branches of the tree, you'll see that they add up roughly to the corrected evolutionary distances. So between A and B we have two branches, one of them 0.08, which together give you roughly 0.3 here, OK, whereas between A and C the tree is constrained such that we have 0.31 plus a further branch, so that overall you again roughly get the distance that we calculated. What this means is that you have ordered the organisms by their calculated evolutionary distances, and so you have now obtained a very intuitive picture of the relationships of the organisms to each other, where A and B are obviously the most closely related, and A and D are the most distantly related. Is everybody with it? Any questions? OK, now, this best-fit tree is what we call a phylogeny. These techniques really revolutionized the study of evolutionary relationships, and one of the things they allowed us to do is to construct universal phylogenetic trees, or what we can also call the tree of life. I will show you this on the next slide, and then I want to make a few general statements about it.
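The lecture does not say which method fits the distances into the best-fit tree. As one illustration, here is a minimal sketch of neighbor joining, a widely used distance-based tree method, swapped in here as an assumption. The distances in the example are hypothetical, chosen so that they fit the tree ((A:0.1,B:0.2):0.3,(C:0.15,D:0.25)) exactly; on such perfectly tree-like data neighbor joining recovers those branch lengths, whereas on real data the branches only add up approximately to the distances.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree (Newick string) from pairwise distances.

    names: list of at least three taxon labels.
    dist:  dict mapping a pair (a, b) of labels to their distance;
           each unordered pair appears once.
    """
    d = {frozenset(pair): value for pair, value in dist.items()}

    def D(a, b):
        return d[frozenset((a, b))]

    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        # Total distance from each node to all other current nodes.
        r = {a: sum(D(a, b) for b in nodes if b != a) for a in nodes}
        # Choose the pair that minimises the neighbor-joining criterion Q.
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * D(a, b) - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node.
        la = D(a, b) / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = D(a, b) - la
        parent = f"({a}:{la:.3f},{b}:{lb:.3f})"
        # Distance from the new parent node to every remaining node.
        for k in nodes:
            if k not in (a, b):
                d[frozenset((parent, k))] = (D(a, k) + D(b, k) - D(a, b)) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [parent]

    # Three nodes left: connect them to one central node.
    a, b, c = nodes
    la = (D(a, b) + D(a, c) - D(b, c)) / 2
    lb = (D(a, b) + D(b, c) - D(a, c)) / 2
    lc = (D(a, c) + D(b, c) - D(a, b)) / 2
    return f"({a}:{la:.3f},{b}:{lb:.3f},{c}:{lc:.3f});"


# Hypothetical corrected distances for four organisms A-D.
distances = {
    ("A", "B"): 0.30, ("A", "C"): 0.55, ("A", "D"): 0.65,
    ("B", "C"): 0.65, ("B", "D"): 0.75, ("C", "D"): 0.40,
}
print(neighbor_joining(["A", "B", "C", "D"], distances))
```

The output is an unrooted tree in Newick notation with a three-way split at the center, because neighbor joining does not place a root; A and B come out as nearest neighbors, as in the lecture's example.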
So first of all, when you analyze all known organisms, or rather, since that would obviously be a big task, representatives of all known organisms, what you'll find is that indeed we have three major lineages: the bacteria, the archaea, and the eukarya. So we have what we call the three domains of life: the archaea, the bacteria, and the eukarya. This really is the evidence that life split very, very early on into the three lineages that I showed you before. Interestingly, two of these major domains are prokaryotic, OK? So, two of the domains are prokaryotes. Moreover, if you look at the types of organisms on here, you'll notice that even on the eukaryotic side of the tree, most of the organisms are actually microbial, single-celled organisms. That means that most of the life on the planet is microbial: the vast majority of the diversity of organisms on the planet consists of microorganisms. So, we can say that most life is microbial. And when you then look at the analysis of mitochondria and chloroplasts, which have their own genetic machinery and therefore also their own ribosomes, you'll see that the mitochondrion and the chloroplast both branch within the bacteria.
So we really have an amazing confirmation of the endosymbiont theory, which was developed in the absence of gene sequences by Russian scientists in the early 20th century. Mitochondria and chloroplasts branch within the bacteria, and this really supports the endosymbiont theory. So really, you could say eukaryotes are just walking, swimming, and flying incubators for bacteria, right? Just hosts for microorganisms. OK, so what you should take home from this is: there are three domains of life; two of them are prokaryotic, and, even more, most of the diversity that we find is microbial; and finally, the endosymbiont theory is confirmed by these phylogenies. Now, what I want to cover in the remaining time is how we can use these sequences to learn something about organisms in the environment. That's the topic of molecular ecology. To introduce this, I just want to show you a couple of slides that capture the big problem we're facing here. When we look at the abundance of prokaryotic cells in different types of environments, what we see is that there is an enormous number of prokaryotes out there. This slide summarizes different types of environments.
We have the marine environment, freshwater, sediments and soils, subsurface sediments, and animal guts. This number here gives you the average number of prokaryotic cells, either per milliliter or per gram, and here we have the total number of cells, obtained by multiplying the average number by the total volume of the particular environment. So what you can see is that in the marine environment, we have an average of half a million cells per milliliter of water, OK? In freshwater, we have about a million cells per milliliter. What is that telling you? There's a ton of prokaryotes out there. When you go swimming and take a little gulp of water, you've probably swallowed several million prokaryotes. But it's nothing to worry about, because what this also tells us is that very, very few prokaryotes out there are pathogens; otherwise you'd be sick all the time. Now, in sediments and soils, in as little as a gram you have almost five times 10^9 prokaryotic cells, that's 5 billion, and even in very, very deep sediments that reach down to 3,000 m, you have a substantial number of prokaryotic cells. And here are your guts: 10^5 times 10^6 gives you 10^11 per gram. So again, you're just a walking incubator for a very complex microbial community. Here's the global abundance.
You see that deep subsurface sediments and the marine environment are, at least in terms of numbers, probably the most important microbial environments. Now, faced with this enormous abundance of prokaryotes, a very important question is how many different kinds there are. Or: how diverse are prokaryotes in the environment? That's important if you want to figure out their function in the environment, and also if you want to understand their evolution. What I want to show you here is that we've gone through an amazing development in our understanding of prokaryotic diversity in the environment over the last 10 to 15 years or so. Who knows about E. O. Wilson here? One person? So, he wrote a very famous book on biodiversity, published in 1988, where he tried to summarize how diverse the known organisms on the planet are, and also tried to extrapolate to the total diversity. What you see is that he came up with about 1.4 million different species, mostly dominated by insects; that's the big section of this pie chart. The plants are also very important. And if you look, the prokaryotes feature with only a few thousand different species. So, in 1988 we thought there were very few prokaryotic species out there.
If you look about 10 years later 390 00:39:32 --> 00:39:36 and take the assessment here, which exemplifies how the 391 00:39:36 --> 00:39:41 thinking has changed, you see that we now think there 392 00:39:41 --> 00:39:45 are about 11 million different species out there, 393 00:39:45 --> 00:39:50 and that the vast majority of them are prokaryotic, 394 00:39:50 --> 00:39:54 OK, 10 million. So, this big part of the pie chart is 395 00:39:54 --> 00:39:59 really the prokaryotic diversity. Now, what really has changed is 396 00:39:59 --> 00:40:03 that we've started to use molecular techniques to determine 397 00:40:03 --> 00:40:08 the diversity of prokaryotes in the environment. 398 00:40:08 --> 00:40:18 So molecular ecology is really the use of gene sequences 399 00:40:18 --> 00:40:29 obtained directly from the environment 400 00:40:29 --> 00:40:42 to learn about the 401 00:40:42 --> 00:40:54 prokaryotic 402 00:40:54 --> 00:40:58 diversity out there. Now, this slide just quickly 403 00:40:58 --> 00:41:03 summarizes this. Basically, the idea is that you go 404 00:41:03 --> 00:41:08 out into the environment and collect either water or soil samples, which, 405 00:41:08 --> 00:41:13 as I just showed you, invariably contain a lot of different 406 00:41:13 --> 00:41:17 prokaryotic cells. You then lyse the cells and purify 407 00:41:17 --> 00:41:22 their DNA, so that you end up with a mixture of DNA that 408 00:41:22 --> 00:41:27 represents the organisms out there. Then you can use universal PCR 409 00:41:27 --> 00:41:32 primers to amplify ribosomal RNA genes from all the 410 00:41:32 --> 00:41:37 organisms that are present in your sample. 411 00:41:37 --> 00:41:42 Now, why can you use universal PCR primers? Well, 412 00:41:42 --> 00:41:48 they target the regions labeled A that I showed you before. 413 00:41:48 --> 00:41:53 Those regions of the gene are highly conserved across all organisms.
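The logic behind universal primers can be illustrated by scanning a multiple alignment for stretches that are identical in every sequence, since only such stretches can be matched by one primer in all organisms. A minimal Python sketch, assuming the sequences are already aligned; the toy fragments and the window length `k` are hypothetical, and real primer design also tolerates some mismatches and uses degenerate bases:

```python
def conserved_windows(aligned: list[str], k: int) -> list[int]:
    """Return start positions of length-k windows that are identical in every sequence.

    Assumes the sequences have already been aligned to a common length.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to equal length")
    # A column is conserved when every sequence carries the same base there.
    conserved = [len({seq[i] for seq in aligned}) == 1 for i in range(length)]
    # A window is a candidate primer site only if all of its columns are conserved.
    return [start for start in range(length - k + 1) if all(conserved[start:start + k])]


# Hypothetical toy alignment of three short rRNA-like fragments.
TOY_ALIGNMENT = [
    "ACGTTGCAAGCT",
    "ACGTAGCAAGCA",
    "ACGTCGCAAGCG",
]

if __name__ == "__main__":
    print(conserved_windows(TOY_ALIGNMENT, 4))  # [0, 5, 6, 7]
```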
414 00:41:53 --> 00:41:59 You guys all remember how PCR works, right? We covered this. 415 00:41:59 --> 00:42:04 OK? Yes? No? Who doesn't? You don't? All right, 416 00:42:04 --> 00:42:09 come to the board. Just kidding. OK, you should look it up. I don't 417 00:42:09 --> 00:42:15 have time to cover this, unfortunately, but basically it's a 418 00:42:15 --> 00:42:20 technique that allows you to amplify specific genes a million- to a 419 00:42:20 --> 00:42:25 billion-fold. And once you have done this, what you can do is 420 00:42:25 --> 00:42:31 purify the genes on gels, and then separate them by cloning 421 00:42:31 --> 00:42:36 them into individual plasmids. Those plasmids are then 422 00:42:36 --> 00:42:41 inserted into E. coli cells, and the 423 00:42:41 --> 00:42:46 E. coli cells are grown up individually so that each culture 424 00:42:46 --> 00:42:50 contains only a single type of plasmid. You can then sequence these 425 00:42:50 --> 00:42:55 ribosomal DNAs, or ribosomal RNA genes, from those clones. 426 00:42:55 --> 00:43:00 And so, you have obtained a library of the ribosomal RNA genes 427 00:43:00 --> 00:43:08 from the environment. So, we use environmental ribosomal 428 00:43:08 --> 00:43:18 RNA gene libraries, from which we can then compare how many 429 00:43:18 --> 00:43:28 different types of genes are out there. 430 00:43:28 --> 00:43:32 So let me show you an example of this. Recently, 431 00:43:32 --> 00:43:37 we carried out one of the first really comprehensive samplings of 432 00:43:37 --> 00:43:42 coastal bacterioplankton, which means the bacteria that are 433 00:43:42 --> 00:43:47 living free in ocean water.
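The "million- to billion-fold" figure follows from the exponential doubling in PCR. A minimal Python sketch, assuming idealized amplification in which every template molecule is copied once per cycle; real reactions run below 100% efficiency and eventually plateau, so the `efficiency` parameter here is an illustrative assumption:

```python
def pcr_copies(initial: int, cycles: int, efficiency: float = 1.0) -> float:
    """Expected number of copies after a given number of PCR cycles.

    Each cycle multiplies the template by (1 + efficiency); efficiency = 1.0 is
    the idealized case in which every molecule is copied in every cycle.
    """
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must be between 0 and 1")
    return initial * (1.0 + efficiency) ** cycles


if __name__ == "__main__":
    for n in (20, 30):
        print(f"{n} cycles: {pcr_copies(1, n):,.0f}-fold")
    # 20 cycles: 1,048,576-fold
    # 30 cycles: 1,073,741,824-fold
```

Under ideal doubling, about 20 cycles give roughly a million-fold amplification and about 30 cycles roughly a billion-fold.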
And so, we've done this: we've 434 00:43:47 --> 00:43:52 collected all those clones, and then we constructed 435 00:43:52 --> 00:43:57 the kind of phylogenetic trees that I showed you before, which really allow 436 00:43:57 --> 00:44:02 us to see how many different types are out there, and how closely related 437 00:44:02 --> 00:44:07 they are to one another. Now, you might think that this 438 00:44:07 --> 00:44:12 environment is very simple because it's just the 439 00:44:12 --> 00:44:17 water column, right? Not much structure in there. 440 00:44:17 --> 00:44:22 But we found over 1500 bacterial 16S ribosomal RNA sequences, 441 00:44:22 --> 00:44:27 so an enormous diversity of prokaryotes, of bacteria, in that 442 00:44:27 --> 00:44:32 particular environment. And the important point is that when 443 00:44:32 --> 00:44:36 you look at a collection of studies like the one I just showed you, 444 00:44:36 --> 00:44:40 what you find is that the vast majority of microorganisms in the 445 00:44:40 --> 00:44:44 environment have never been cultured. Traditionally, of course, 446 00:44:44 --> 00:44:49 to learn about microorganisms such as 447 00:44:49 --> 00:44:53 E. coli, you grow them on culture plates. 448 00:44:53 --> 00:44:57 You grow lots of cells, and that allows you to study some of 449 00:44:57 --> 00:45:02 their properties. But look, 450 00:45:02 --> 00:45:06 for example, at results from the ocean. This summarizes coastal 451 00:45:06 --> 00:45:10 and open-ocean environments; again, the bacterioplankton are 452 00:45:10 --> 00:45:15 the free-floating bacterial cells in the water. 453 00:45:15 --> 00:45:19 Compare this to what we've actually been able to culture from 454 00:45:19 --> 00:45:23 those environments. What you see is that you have some 455 00:45:23 --> 00:45:27 dominant groups here.
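The lecture does not say how distinct sequence types were defined or counted. One common approach, shown here as an assumption rather than the lab's method, is to group aligned 16S sequences into operational taxonomic units (OTUs) at an identity cutoff (97% is a widely used convention) and then estimate how many types went unsampled with the Chao1 estimator. The toy sequences and the 0.9 cutoff in the example are hypothetical, chosen so that short strings cluster visibly:

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list[str], threshold: float = 0.97) -> list[int]:
    """Greedy clustering: each sequence joins the first OTU whose representative
    it matches at >= threshold identity; otherwise it founds a new OTU."""
    reps: list[str] = []
    labels: list[int] = []
    for seq in seqs:
        for otu, rep in enumerate(reps):
            if identity(seq, rep) >= threshold:
                labels.append(otu)
                break
        else:  # no existing OTU matched, so this sequence becomes a new representative
            reps.append(seq)
            labels.append(len(reps) - 1)
    return labels


def chao1(labels: list[int]) -> float:
    """Chao1 richness estimate from OTU labels.

    Uses S_obs + F1^2 / (2 F2), where F1 and F2 count OTUs seen exactly once and
    twice; falls back to the bias-corrected S_obs + F1 (F1 - 1) / 2 when F2 == 0.
    """
    abundances = list(Counter(labels).values())
    s_obs = len(abundances)
    f1 = sum(1 for n in abundances if n == 1)
    f2 = sum(1 for n in abundances if n == 2)
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


# Hypothetical toy "clone library" of aligned 10-base sequences.
TOY_CLONES = [
    "ACGTACGTAA",
    "ACGTACGTAT",  # differs from the first clone at one position (90% identity)
    "TTGCATGCAA",
    "TTGCATGCAA",
    "GGCCGGCCTT",
    "CATGCATGCC",
]

if __name__ == "__main__":
    labels = greedy_otus(TOY_CLONES, threshold=0.9)
    print("OTU labels:", labels)
    print("Observed OTUs:", len(set(labels)))
    print("Chao1 estimate:", chao1(labels))
```

Greedy clustering is order-dependent, which is one reason published studies use more careful clustering methods; the sketch only shows the idea of turning a clone library into a count of types and an estimate of the unseen remainder.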
Most of them have funny names, 456 00:45:27 --> 00:45:32 because they are known only as clones in clone libraries. 457 00:45:32 --> 00:45:36 But these are the dominant groups that show up in clone libraries. 458 00:45:36 --> 00:45:40 Here's their relative representation in different clone 459 00:45:40 --> 00:45:44 libraries from a variety of environments. And so here you have 460 00:45:44 --> 00:45:48 one very important one, the SAR11 group, or this one, 461 00:45:48 --> 00:45:53 the SAR86 group, which always show up in clone libraries. 462 00:45:53 --> 00:45:57 But we've never seen them in culture. So the important point to realize 463 00:45:57 --> 00:46:01 here is that whenever we go out, 464 00:46:01 --> 00:46:05 we find a great diversity of bacteria out there, 465 00:46:05 --> 00:46:10 but we have no idea what they actually do. 466 00:46:10 --> 00:46:14 And this is one of the big questions that we need to answer to understand, 467 00:46:14 --> 00:46:18 really, how the planet works. What are those uncultured 468 00:46:18 --> 00:46:22 microorganisms out in the environment really doing, 469 00:46:22 --> 00:46:26 and what is their importance? We'll talk about this next time, 470 00:46:26 --> 00:46:30 when we get to environmental genomics, because 471 00:46:30 --> 00:46:34 we now have techniques available that 472 00:46:34 --> 00:46:38 allow us to isolate at least large fragments of their genomes, 473 00:46:38 --> 00:46:42 sequence those, and look at what kinds of genes are present. 474 00:46:42 --> 00:46:46 And that allows us, then, to infer some of their 475 00:46:46 --> 00:46:51 functions in the biogeochemical cycles in the environment. 476 00:46:51 --> 00:46:55 OK, so with this I'm going to close today, unless you have 477 00:46:55 --> 00:46:58 any more questions.