1 00:00:00 --> 00:00:04 So what I want to do today is recap a little bit what we talked about 2 00:00:04 --> 00:00:08 last time, reiterate some of the important points, 3 00:00:08 --> 00:00:13 and then show you how we can learn something about microorganisms in 4 00:00:13 --> 00:00:17 the environment by talking about in-situ identification of 5 00:00:17 --> 00:00:21 microorganisms as well as genomics. We'll first talk about genomics in 6 00:00:21 --> 00:00:26 general and then talk about some applications of genomics to 7 00:00:26 --> 00:00:30 environmental microbiology because I think some of the most 8 00:00:30 --> 00:00:35 exciting new developments are in that area, actually. 9 00:00:35 --> 00:00:44 So last time we talked about molecular evolution and ecology. 10 00:00:44 --> 00:00:53 And just to recap, some of the main points were that we can actually use 11 00:00:53 --> 00:01:02 genes or gene sequences for a couple of very important questions that we 12 00:01:02 --> 00:01:15 want to explore. The first one was that gene sequences act 13 00:01:15 --> 00:01:33 as evolutionary chronometers. 14 00:01:33 --> 00:01:40 Now, what do I mean by that? Basically what we said last time 15 00:01:40 --> 00:01:48 was that each gene, each sequence in the genome 16 00:01:48 --> 00:01:55 accumulates mutations with a certain probability. So what we mean is 17 00:01:55 --> 00:02:03 that all genes accumulate mutations over time. 18 00:02:03 --> 00:02:07 Now, these of course are the mutations that do not kill the 19 00:02:07 --> 00:02:12 organisms, so not the deleterious mutations, but these are mutations 20 00:02:12 --> 00:02:16 that are either slightly deleterious, or don't matter, 21 00:02:16 --> 00:02:21 or are beneficial mutations. OK, and what this entails is that 22 00:02:21 --> 00:02:25 each gene accumulates mutations with a certain probability over time.
23 00:02:25 --> 00:02:30 It basically means that two organisms that come from species 24 00:02:30 --> 00:02:34 that are relatively closely related to each other have gene sequences 25 00:02:34 --> 00:02:39 that will be much more similar to each other than genes from an 26 00:02:39 --> 00:02:44 organism that comes from a species that's much more distantly related. 27 00:02:44 --> 00:02:48 So, in practical terms what this means is your genes are much, 28 00:02:48 --> 00:02:52 much more similar to those of a monkey than they are to those of a crocodile, 29 00:02:52 --> 00:02:56 for example. And we can take advantage of that by applying some 30 00:02:56 --> 00:03:00 algorithms, some mathematical modeling essentially, 31 00:03:00 --> 00:03:04 to constrain these relationships in those phylogenetic trees that we 32 00:03:04 --> 00:03:09 talked about last time. And I also mentioned that the 33 00:03:09 --> 00:03:14 ribosomal RNA genes are particularly important for that process. 34 00:03:14 --> 00:03:19 In principle, you could do it with any protein-coding gene or any 35 00:03:19 --> 00:03:24 kind of gene in the genome, but we use the ribosomal RNA genes 36 00:03:24 --> 00:03:29 in particular because all organisms have them. They're part of a 37 00:03:29 --> 00:03:34 handful of genes that are what we called universally distributed 38 00:03:34 --> 00:03:38 last time. And what this allows us to do is 39 00:03:38 --> 00:03:42 then construct phylogenetic relationships for all living 40 00:03:42 --> 00:03:46 organisms. And I just want to remind you of the tree of life that 41 00:03:46 --> 00:03:50 I showed you last time where we can really explore the relationships 42 00:03:50 --> 00:03:54 amongst all living organisms.
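To make the chronometer idea concrete, here is a minimal sketch of the simplest possible comparison: counting matching positions between aligned sequences. The three fragments are made up for illustration (they are not real ribosomal RNA data), the function name `percent_identity` is hypothetical, and real phylogenetic methods use proper alignments and substitution models rather than raw identity.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Made-up aligned fragments, for illustration only (assumption, not real data).
human = "ACGTTGCAAGGCTTAACGGA"
monkey = "ACGTTGCAAGGCTTAACGTA"
crocodile = "ACGATGCTAGGCATAACTTA"

if __name__ == "__main__":
    print("human vs monkey:   ", percent_identity(human, monkey))
    print("human vs crocodile:", percent_identity(human, crocodile))
```

In this toy data the more closely related pair shares more positions, which is the pattern the tree-building algorithms exploit.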
And some of the important points 43 00:03:54 --> 00:03:58 there that we made were, for example, that the tree of life 44 00:03:58 --> 00:04:02 supports the endosymbiont theory, that when you actually look at 45 00:04:02 --> 00:04:06 where on the tree the mitochondria and the chloroplasts are, they fall 46 00:04:06 --> 00:04:10 within the bacteria. Now, there was a question that 47 00:04:10 --> 00:04:15 somebody asked in the online survey, can the mitochondria and 48 00:04:15 --> 00:04:21 chloroplasts still live outside of the eukaryotic cell? 49 00:04:21 --> 00:04:26 And the answer is no, they can't anymore because over 50 00:04:26 --> 00:04:31 evolutionary time the two organisms have become so integrated that the 51 00:04:31 --> 00:04:37 mitochondria and chloroplasts both lost their ability to live outside 52 00:04:37 --> 00:04:42 of the eukaryotic host cell. Another important point that we made 53 00:04:42 --> 00:04:46 last time that I want to reiterate here is that gene sequences, 54 00:04:46 --> 00:04:51 when we go into the environment and obtain them directly from the 55 00:04:51 --> 00:04:55 environment, act as a proxy for microbial diversity in 56 00:04:55 --> 00:05:08 the environment. So, the number of genes recovered 57 00:05:08 --> 00:05:30 directly from the environment is a measure of diversity. 58 00:05:30 --> 00:05:34 And we said that this actually plays a very, very important role in the 59 00:05:34 --> 00:05:39 analysis of microbial communities, and I showed you the example here 60 00:05:39 --> 00:05:43 where we went and took some ocean water and basically applied this 61 00:05:43 --> 00:05:48 technique that I outlined last time where we can actually amplify 62 00:05:48 --> 00:05:53 ribosomal RNA genes from environmental samples, 63 00:05:53 --> 00:05:57 clone them, determine the sequence, and then construct phylogenetic 64 00:05:57 --> 00:06:01 trees.
And what you see here is a tree 65 00:06:01 --> 00:06:05 where we summarize the major groups that we found in the sample, and 66 00:06:05 --> 00:06:09 it's only for two of those groups that we show the entire set of 67 00:06:09 --> 00:06:13 sequences that we actually obtained because there were so many of them 68 00:06:13 --> 00:06:16 out there. And what we basically found was that over 1500 bacterial 69 00:06:16 --> 00:06:20 16S ribosomal RNA gene sequences coexist in this environment. 70 00:06:20 --> 00:06:24 And what we said also last time was that analyses like these have 71 00:06:24 --> 00:06:28 really taught us that microorganisms are the most diverse organisms 72 00:06:28 --> 00:06:32 on the planet. So, most diversity is amongst the 73 00:06:32 --> 00:06:36 microorganisms, and one of the big questions now is 74 00:06:36 --> 00:06:40 what are all those microorganisms doing in the environment? 75 00:06:40 --> 00:06:44 And so, today what I want to do with you is basically explore this 76 00:06:44 --> 00:06:48 question of how we can actually figure out what those microorganisms 77 00:06:48 --> 00:06:52 are all doing in environmental samples. 78 00:06:52 --> 00:07:09 So we can say we are exploring the 79 00:07:09 --> 00:07:21 function of microbes in the environment. First, 80 00:07:21 --> 00:07:33 I want to cover how we can actually identify them in the environment. 81 00:07:33 --> 00:07:38 And I want to show you one specific example, and then I want to talk 82 00:07:38 --> 00:07:44 about genomics in general, and then basically end with an 83 00:07:44 --> 00:07:50 application of genomics to environmental questions. 84 00:07:50 --> 00:07:56 So, let's first talk about the in-situ identification 85 00:07:56 --> 00:08:14 of microorganisms.
86 00:08:14 --> 00:08:18 And the basic problem that I alluded to already before is that most 87 00:08:18 --> 00:08:23 microbes are only known [SOUND OFF/THEN ON] 88 00:08:23 --> 00:08:33 -- from 16S ribosomal 89 00:08:33 --> 00:08:48 RNA clone libraries. 90 00:08:48 --> 00:09:03 And we basically want to search and identify them in the environment. 91 00:09:03 --> 00:09:07 OK, and I'll show you a specific example of that later on. 92 00:09:07 --> 00:09:12 Now last time, we said that in the ribosomal RNA sequences, 93 00:09:12 --> 00:09:17 really like in all gene sequences, in fact, we identified several 94 00:09:17 --> 00:09:22 types of stretches of nucleotides that can be 95 00:09:22 --> 00:09:26 found. We said there are the A type stretches and B type stretches, which are very 96 00:09:26 --> 00:09:31 important for construction of phylogenetic relationships, 97 00:09:31 --> 00:09:36 because we can align them and look for changes in the nucleotide 98 00:09:36 --> 00:09:41 sequences because they are the same length and only differ by mutations, 99 00:09:41 --> 00:09:46 by single nucleotide changes. 100 00:09:46 --> 00:09:50 But then there are also those C type stretches, if you remember, 101 00:09:50 --> 00:09:55 and those we said vary at much faster rates because they are not 102 00:09:55 --> 00:10:00 functionally constrained in those genes. 103 00:10:00 --> 00:10:08 OK, so they can actually also accumulate length changes. 104 00:10:08 --> 00:10:17 And, it's these C type stretches that we can use sort of as 105 00:10:17 --> 00:10:25 diagnostic sequence stretches for microorganisms. 106 00:10:25 --> 00:10:34 So, what we can say is we identify organisms by the C type stretches, 107 00:10:34 --> 00:10:47 C type sequence stretches. And we call those signature 108 00:10:47 --> 00:11:03 sequences.
OK, and they allow the differentiation 109 00:11:03 --> 00:11:16 of closely related organisms, -- because they vary at very fast 110 00:11:16 --> 00:11:24 rates between organisms. And the way we do this is that we 111 00:11:24 --> 00:11:32 construct so-called phylogenetic probes. I should probably 112 00:11:32 --> 00:11:38 write this over here. Now what are those phylogenetic 113 00:11:38 --> 00:11:42 probes? They're basically short pieces of DNA that have a 114 00:11:42 --> 00:11:46 fluorescent molecule attached to them. 115 00:11:46 --> 00:12:03 -- DNA molecules that are roughly 20 116 00:12:03 --> 00:12:11 nucleotides in length, and they carry a fluorescent molecule. 117 00:12:11 --> 00:12:19 Now what these short, single-stranded stretches of DNA 118 00:12:19 --> 00:12:27 basically are is they are complementary to those C type 119 00:12:27 --> 00:12:45 sequence stretches -- 120 00:12:45 --> 00:12:52 -- in the ribosomal RNA. And so basically what we can do is 121 00:12:52 --> 00:12:59 we can collect microbial cells from the environment -- 122 00:12:59 --> 00:13:17 -- make them permeable -- 123 00:13:17 --> 00:13:33 -- and then basically mix them with 124 00:13:33 --> 00:13:48 those phylogenetic probes. 125 00:13:48 --> 00:13:52 And these probes will then permeate into the cell and bind to their 126 00:13:52 --> 00:14:15 complementary sequences. 127 00:14:15 --> 00:14:17 Then we wash away the unbound probe -- 128 00:14:17 --> 00:14:33 -- and we can view it in a 129 00:14:33 --> 00:14:53 microscope under UV light. 130 00:14:53 --> 00:14:57 Let me show you an example of this. What you see here is basically a 131 00:14:57 --> 00:15:01 light micrograph. So this is what you see basically 132 00:15:01 --> 00:15:06 when you collect microbial cells from the environment under the 133 00:15:06 --> 00:15:10 microscope. Most bacteria look the same, so you cannot actually 134 00:15:10 --> 00:15:15 differentiate them all by just looking at them.
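The complementarity between a probe and its signature sequence can be sketched in a few lines. This is only an illustration under stated assumptions: the 20-nucleotide target below is invented, the function names `design_probe` and `probe_binds` are hypothetical, and "binding" is modeled as a perfect match, which ignores the hybridization conditions that matter in a real experiment.

```python
# Base pairing between an RNA target and a DNA probe.
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def design_probe(rrna_target: str) -> str:
    """Return the DNA probe (5'->3') complementary to an rRNA stretch (5'->3')."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(rrna_target.upper()))


def probe_binds(probe: str, rrna: str) -> bool:
    """True if the rRNA contains a perfect-match site for the probe (toy model)."""
    site = "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(probe.upper()))
    return site in rrna.upper()


# Invented 20-nucleotide signature stretch (assumption, not a real sequence).
signature = "GCUAACGGAUUCCAGUGACU"
probe = design_probe(signature)

if __name__ == "__main__":
    print("probe:", probe)
    print("binds target organism: ", probe_binds(probe, "AAAA" + signature + "CCCC"))
    print("binds related organism:", probe_binds(probe, "AAAA" + "GCUAACGGAUACCAGUGACU" + "CCCC"))
```

The second organism differs from the first at a single position in the signature stretch, which is enough for this toy model to tell them apart.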
135 00:15:15 --> 00:15:19 But then these cells were fixed and permeabilized and then basically 136 00:15:19 --> 00:15:24 mixed with two different phylogenetic probes that identified 137 00:15:24 --> 00:15:28 two different types of organisms. One was labeled with a red fluor, 138 00:15:28 --> 00:15:33 the other one with a green fluor. 139 00:15:33 --> 00:15:42 And what you see is that you can now differentiate those two organisms. 140 00:15:42 --> 00:15:52 Now, why is this especially interesting? Well here's just a 141 00:15:52 --> 00:16:02 specific example where people were looking for bacteria capable of 142 00:16:02 --> 00:16:12 nitrogen oxidation. These are bacteria that are very 143 00:16:12 --> 00:16:22 important in, for example, sewage treatment. And it was known 144 00:16:22 --> 00:16:32 that there were two different types out there, one that oxidizes 145 00:16:32 --> 00:16:41 ammonia to nitrite, -- and a second one that 146 00:16:41 --> 00:16:47 oxidizes nitrite to nitrate. And by doing this type of analysis 147 00:16:47 --> 00:16:53 what people basically learned is that those two organisms live in 148 00:16:53 --> 00:17:00 very, very close proximity at all times. 149 00:17:00 --> 00:17:04 So the organisms that oxidize ammonia to nitrite are really 150 00:17:04 --> 00:17:08 attached to, and oftentimes even surrounded by, the organisms that take 151 00:17:08 --> 00:17:12 the nitrite to nitrate.
So, what you have is a very close 152 00:17:12 --> 00:17:16 cooperation between two different types of microorganisms, 153 00:17:16 --> 00:17:21 and the transfer of one of the substrates that's a product of the 154 00:17:21 --> 00:17:25 metabolism of one of the organisms to another one: so an extremely 155 00:17:25 --> 00:17:29 efficient process that really is very important to take into 156 00:17:29 --> 00:17:33 consideration when you want to understand processes like sewage 157 00:17:33 --> 00:17:37 treatment, but also nitrogen biogeochemistry in 158 00:17:37 --> 00:17:45 the environment. Any questions? 159 00:17:45 --> 00:17:55 OK, so for the remainder of the lecture I want to talk 160 00:17:55 --> 00:18:03 about genomics, -- and then in particular also its 161 00:18:03 --> 00:18:07 application to questions of environmental microbiology and 162 00:18:07 --> 00:18:12 environmental science. So first, what I want to do is give 163 00:18:12 --> 00:18:16 you a little bit of the definition of genomics, and then cover how it 164 00:18:16 --> 00:18:21 is actually possible that we can sequence entire genomes, 165 00:18:21 --> 00:18:25 and I want to give you some highlights of what we have found by 166 00:18:25 --> 00:18:30 comparing different genomes to each other. 167 00:18:30 --> 00:18:36 And then I want to talk about this field of environmental genomics 168 00:18:36 --> 00:18:43 where we can use genomic techniques to actually learn something about 169 00:18:43 --> 00:18:49 the function of different uncultured microorganisms in the environment. 170 00:18:49 --> 00:18:56 So first, our definition, it's basically to interpret 171 00:18:56 --> 00:19:03 or to sequence, -- interpret, and compare whole 172 00:19:03 --> 00:19:11 genomes.
And as you will see the comparison part actually plays an 173 00:19:11 --> 00:19:18 increasingly important role because we have now actually genome 174 00:19:18 --> 00:19:26 sequences available from almost all, or from at least some of the major 175 00:19:26 --> 00:19:32 groups of life. So this, again, 176 00:19:32 --> 00:19:36 is a different kind of representation of the tree of life. 177 00:19:36 --> 00:19:40 You have bacteria, archaea, and eukarya again. 178 00:19:40 --> 00:19:44 And as you can see, we have a lot of representatives. 179 00:19:44 --> 00:19:49 In fact, this doesn't even come close to the diversity that we have 180 00:19:49 --> 00:19:53 now sequenced. There are well over a hundred bacterial genome sequences now, 181 00:19:53 --> 00:19:57 several archaeal genomes, and increasingly also 182 00:19:57 --> 00:20:02 eukaryotic genomes. Now, genomes, so how is this done? 183 00:20:02 --> 00:20:08 How can we actually sequence genomes? Well, 184 00:20:08 --> 00:20:13 on the face of it we use very large facilities where you have sequencing 185 00:20:13 --> 00:20:19 machines present. There's one very important one at 186 00:20:19 --> 00:20:24 MIT, actually at the Broad Institute, and here you see all those really 187 00:20:24 --> 00:20:30 industrial scale production lines actually. 188 00:20:30 --> 00:20:40 But the basic problem is that genomes are large. 189 00:20:40 --> 00:20:50 E. coli, for example, has roughly 4.4 million base pairs, 190 00:20:50 --> 00:21:00 and the human genome is even much, much larger. 191 00:21:00 --> 00:21:08 It has about 3 billion base pairs. OK, so genomes are very, very large. 192 00:21:08 --> 00:21:22 But a single sequencing reaction-- 193 00:21:22 --> 00:21:29 -- gives you only roughly 500-1,000 nucleotides or base pairs. So 194 00:21:29 --> 00:21:36 how is it that we can actually sequence entire genomes?
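The size of the problem follows directly from those numbers. The sketch below works out the bare minimum number of sequencing reactions for the two genomes mentioned, using the genome sizes and read lengths quoted above. The function name `min_reads` and the `coverage` parameter are assumptions added for illustration: in practice reads must overlap to be pieced together, so each base is sequenced several times, but the lecture gives no figure for that.

```python
import math


def min_reads(genome_bp: int, read_bp: int, coverage: float = 1.0) -> int:
    """Number of reads of length read_bp needed to cover genome_bp `coverage` times.

    coverage=1.0 is the theoretical floor with perfectly tiled, non-overlapping reads.
    """
    if genome_bp <= 0 or read_bp <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(genome_bp * coverage / read_bp)


E_COLI_BP = 4_400_000      # roughly 4.4 million base pairs
HUMAN_BP = 3_000_000_000   # about 3 billion base pairs

if __name__ == "__main__":
    for name, size in [("E. coli", E_COLI_BP), ("human", HUMAN_BP)]:
        low = min_reads(size, 1000)   # longest reads quoted
        high = min_reads(size, 500)   # shortest reads quoted
        print(f"{name}: between {low:,} and {high:,} reads at the very least")
```

Even at this floor, a bacterial genome needs thousands of reactions and the human genome needs millions, which is why the work is done on production lines.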
195 00:21:36 --> 00:21:43 I'm going to walk you through this, and there is some variation on the 196 00:21:43 --> 00:21:50 theme, but this is still a major approach that's still used in some 197 00:21:50 --> 00:21:57 of the sequencing facilities. Now, you start out by extracting 198 00:21:57 --> 00:22:04 genomic DNA from organisms, and then you use restriction enzymes 199 00:22:04 --> 00:22:11 to cut the DNA into relatively large pieces of DNA, so about 160 200 00:22:11 --> 00:22:17 kilobase pairs long on average. This is shown here. 201 00:22:17 --> 00:22:22 Kilo means a thousand, so 160,000 base pairs long. 202 00:22:22 --> 00:22:28 These pieces are then cloned into specific cloning vectors that are 203 00:22:28 --> 00:22:43 called BAC vectors. 204 00:22:43 --> 00:23:01 So they're for cloning large pieces of DNA, and BAC stands for Bacterial 205 00:23:01 --> 00:23:17 Artificial Chromosome. And what they basically are, 206 00:23:17 --> 00:23:31 are plasmids, very special plasmids that can carry large pieces of 207 00:23:31 --> 00:23:40 genome, or large genome fragments. So, by cloning into those BAC 208 00:23:40 --> 00:23:45 vectors, what you do is you basically divide up the genome, 209 00:23:45 --> 00:23:50 and then the step number three is mostly done for eukaryotic genomes 210 00:23:50 --> 00:23:55 because they are so much larger. You can actually map and analyze 211 00:23:55 --> 00:24:00 the fragments, and map them onto genome maps where 212 00:24:00 --> 00:24:05 you know the location of different restriction fragments and different 213 00:24:05 --> 00:24:10 genes, actually. For bacteria, this step is mostly 214 00:24:10 --> 00:24:15 skipped, actually. What you do with each one of those 215 00:24:15 --> 00:24:20 BACs, is you cut them further up into 1 kilobase pair fragments, 216 00:24:20 --> 00:24:25 so much smaller fragments. And these are called, 217 00:24:25 --> 00:24:30 and these are cloned then into normal plasmid vectors.
218 00:24:30 --> 00:24:35 And so you generate what are called shotgun clones. 219 00:24:35 --> 00:24:40 So, these are then cloned into E. coli, you go through the same type 220 00:24:40 --> 00:24:45 of steps that we discussed before already with environmental clone 221 00:24:45 --> 00:24:50 libraries. And you can actually determine the sequence of each one 222 00:24:50 --> 00:24:55 of those pieces of DNA. And what you will then get, 223 00:24:55 --> 00:25:00 is small fragments of overlapping DNA sequences. 224 00:25:00 --> 00:25:04 That is shown here. You'll find overlaps, 225 00:25:04 --> 00:25:09 basically, which let you piece together the whole genome. And so, 226 00:25:09 --> 00:25:14 first, to assemble, you piece together these genome fragments that 227 00:25:14 --> 00:25:19 are present in the BACs, and then finally you piece together 228 00:25:19 --> 00:25:24 the entire genome from those large sequence pieces, 229 00:25:24 --> 00:25:29 and you get a so-called draft genome sequence. 230 00:25:29 --> 00:25:39 The next step in this analysis, then, is that you do so-called 231 00:25:39 --> 00:25:49 genome annotation. And the first very important step 232 00:25:49 --> 00:26:00 is that you translate the gene sequences into amino acids. 233 00:26:00 --> 00:26:05 So, the nucleotide sequences into amino acids. Particularly in 234 00:26:05 --> 00:26:10 prokaryotes, this step can be done right away -- 235 00:26:10 --> 00:26:31 -- and what this allows you to do, 236 00:26:31 --> 00:26:37 is you can look for what we call open reading frames, 237 00:26:37 --> 00:26:44 or ORFs. And what you look for is a start codon and a stop codon that 238 00:26:44 --> 00:26:50 basically bracket or frame a stretch of amino acids encoded by 239 00:26:50 --> 00:26:57 the nucleotides. So you look for ORFs. 240 00:26:57 --> 00:27:13 And these are your putative genes.
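The search for open reading frames can be sketched as a scan for a start codon followed, in the same reading frame, by a stop codon. This is a deliberately simplified illustration: it assumes the standard start codon ATG and stop codons TAA, TAG, and TGA, looks only at the three frames of one strand, and uses an invented test sequence. The function name `find_orfs` and the `min_codons` threshold are hypothetical, and real annotation tools also scan the opposite strand and apply much longer length cutoffs.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for each ATG...stop stretch on the given strand.

    start is the index of the A in ATG, end is one past the stop codon,
    and min_codons is the minimum number of codons before the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Walk through the sequence one codon at a time in this frame.
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


if __name__ == "__main__":
    toy_sequence = "CCATGGCTAAAGGTTGACC"  # invented sequence, for illustration only
    for start, end, frame in find_orfs(toy_sequence):
        print(f"frame {frame}: {toy_sequence[start:end]}")
```

Each stretch reported this way is only a candidate gene, which is why the next step compares it against what is already known.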
241 00:27:13 --> 00:27:26 The next step that you can do, 242 00:27:26 --> 00:27:32 then, is you can go to databases and now you compare your ORFs to 243 00:27:32 --> 00:27:39 information that is present in the databases. So basically, 244 00:27:39 --> 00:27:45 you query the database and ask, is a gene sequence present that is 245 00:27:45 --> 00:27:52 statistically significantly similar to the one that I have, and that 246 00:27:52 --> 00:27:58 allows me to say something about the function of this particular gene? 247 00:27:58 --> 00:28:05 So function can then be identified by comparison with databases. 248 00:28:05 --> 00:28:29 Any questions? 249 00:28:29 --> 00:28:34 OK, so that allows you, then, to basically say something 250 00:28:34 --> 00:28:39 about the different genes that you have found in the genome, 251 00:28:39 --> 00:28:44 but to give you an impression of how new this field really is and how 252 00:28:44 --> 00:28:49 little we still know about the diversity of genes and organisms, 253 00:28:49 --> 00:28:54 on average when we sequence a new bacterial genome we find about 30% 254 00:28:54 --> 00:28:59 of the genes, or a third of the genes have no known functional 255 00:28:59 --> 00:29:05 analog in the databases. OK, so there's a lot to learn about 256 00:29:05 --> 00:29:11 the diversity of life and about the functional diversity of life. 257 00:29:11 --> 00:29:18 In eukaryotes, there are some little twists, 258 00:29:18 --> 00:29:24 as you all know. And basically, that is that genes 259 00:29:24 --> 00:29:31 of course consist of introns and exons, right? 260 00:29:31 --> 00:29:35 And so it's basically relatively difficult to directly identify those 261 00:29:35 --> 00:29:40 open reading frames. And what you have to do is that you 262 00:29:40 --> 00:29:45 have to actually oftentimes, so let's write this down.
263 00:29:45 --> 00:30:01 And what people oftentimes do, 264 00:30:01 --> 00:30:12 then, is that they search for matching sequences in so-called cDNA 265 00:30:12 --> 00:30:24 libraries. Now what are cDNA libraries? Let me just show you 266 00:30:24 --> 00:30:32 this on the next slide. Skip this. Basically what you can do is you can 267 00:30:32 --> 00:30:38 isolate messenger RNA from cells and then copy the messenger RNA by 268 00:30:38 --> 00:30:43 a process called reverse transcription, using a viral enzyme 269 00:30:43 --> 00:30:49 that copies RNA into DNA, so you can turn it into DNA 270 00:30:49 --> 00:30:54 fragments. And you can then clone those DNA fragments into plasmids, 271 00:30:54 --> 00:31:00 sequence those, and then basically see what are the pieces that are 272 00:31:00 --> 00:31:06 actually, what are the introns in the genes? 273 00:31:06 --> 00:31:11 What are the pieces that are excised when the messenger RNA is actually 274 00:31:11 --> 00:31:29 created from the genome? 275 00:31:29 --> 00:31:33 And so, let me just cover now a few of the major insights that people 276 00:31:33 --> 00:31:37 have come up with. Of course, it's a fast-growing 277 00:31:37 --> 00:31:41 field and a lot of excitement is coming out. 278 00:31:41 --> 00:31:58 And I first want to talk about 279 00:31:58 --> 00:32:09 bacteria and archaea -- 280 00:32:09 --> 00:32:13 -- and then say a few words also about eukaryotes or eukarya. 281 00:32:13 --> 00:32:17 First of all, what we learned about bacteria and archaea 282 00:32:17 --> 00:32:21 is that their genomes are very compact. 283 00:32:21 --> 00:32:35 Whenever they have pieces of DNA 284 00:32:35 --> 00:32:43 that are not frequently used, they're actually lost from the 285 00:32:43 --> 00:32:51 genome. OK, so they lose genes, I should say, relatively easily, and 286 00:32:51 --> 00:32:59 we can see this in that the genome size is correlated to metabolic 287 00:32:59 --> 00:33:12 diversity.
288 00:33:12 --> 00:33:23 So, for example, we have Mycoplasma genitalium and 289 00:33:23 --> 00:33:37 Streptomyces -- 290 00:33:37 --> 00:33:42 coelicolor, which are two very different bacteria. The first one is an 291 00:33:42 --> 00:34:01 obligate intracellular parasite. 292 00:34:01 --> 00:34:08 OK, so, which means it's actually bathed in a nutrient solution in the 293 00:34:08 --> 00:34:16 eukaryotic cells that it invades. It doesn't have to make amino acids. 294 00:34:16 --> 00:34:23 It gets them just from the host cell. And it turns out it has a very 295 00:34:23 --> 00:34:31 small genome, so only 0.58 megabase pairs, so 580,000 296 00:34:31 --> 00:34:37 base pairs, and only 517 genes. And interestingly, 297 00:34:37 --> 00:34:41 actually people are now using this organism to try and ask, 298 00:34:41 --> 00:34:46 well, what's the minimum number of genes that an organism can actually 299 00:34:46 --> 00:34:50 live with? And so, they are deleting in a 300 00:34:50 --> 00:34:55 stepwise fashion the different genes in this organism, 301 00:34:55 --> 00:34:59 and it turns out that you need about 200 to 300 genes minimum in order to 302 00:34:59 --> 00:35:03 make the thing survive. On the other hand, 303 00:35:03 --> 00:35:15 Streptomyces is a soil bacterium -- 304 00:35:15 --> 00:35:20 -- it has a very complex lifestyle, can degrade a lot of environmental 305 00:35:20 --> 00:35:26 substrates, and it has a very big genome, one of the biggest bacterial 306 00:35:26 --> 00:35:31 genomes. And so, those two organisms basically span 307 00:35:31 --> 00:35:37 pretty much the range of bacterial genome sizes. 308 00:35:37 --> 00:35:41 And so, it's thought that it has about 7,846 genes. 309 00:35:41 --> 00:35:57 Now, we also have a very large 310 00:35:57 --> 00:36:09 genetic diversity -- 311 00:36:09 --> 00:36:23 -- between species. And typically what you find is that 312 00:36:23 --> 00:36:38 roughly 15 to 30% of genes are unique to a specific species.
313 00:36:38 --> 00:36:44 And that's really because bacteria and archaea have the capability to 314 00:36:44 --> 00:36:50 carry out a lot of chemical reactions that eukaryotes, 315 00:36:50 --> 00:36:56 for example, cannot. There are about 20 million known 316 00:36:56 --> 00:37:02 organic substances, organic chemicals, and almost all of 317 00:37:02 --> 00:37:07 them are biodegradable by bacteria. Even the minutest compound, if it 318 00:37:07 --> 00:37:12 were not biodegradable by bacteria, would build up in the environment, 319 00:37:12 --> 00:37:16 OK? So, if it just were a cofactor that some organism produces, because 320 00:37:16 --> 00:37:21 we have such a long period of time of evolution on this planet and 321 00:37:21 --> 00:37:26 evolutionary history, you probably would be able to dig it 322 00:37:26 --> 00:37:32 up in your backyard. One of the other very important and 323 00:37:32 --> 00:37:39 interesting insights that has come out of comparing genomes of 324 00:37:39 --> 00:37:46 microorganisms is that lateral gene transfer is a very important process 325 00:37:46 --> 00:38:07 amongst microorganisms. 326 00:38:07 --> 00:38:11 Now what do I mean by lateral gene transfer? It basically means that 327 00:38:11 --> 00:38:16 we find evidence among bacterial genomes that they have actually 328 00:38:16 --> 00:38:20 taken genes from completely unrelated organisms. 329 00:38:20 --> 00:38:25 And I just want to show you one example here, that of 330 00:38:25 --> 00:38:38 Thermotoga maritima -- 331 00:38:38 --> 00:38:47 -- which lives in hot springs. This is a very interesting 332 00:38:47 --> 00:38:56 bacterium that lives in hot water of around 80°C and thrives only in 333 00:38:56 --> 00:39:05 those kinds of environments. And they coexist there with many 334 00:39:05 --> 00:39:14 archaea.
And when people sequenced the genome of Thermotoga maritima 335 00:39:14 --> 00:39:23 what they found was that about 25% of the genes have their closest 336 00:39:23 --> 00:39:32 relatives in archaeal genomes. So roughly 25% of genes in 337 00:39:32 --> 00:39:39 Thermotoga are of archaeal origin. And how can we actually figure 338 00:39:39 --> 00:39:44 something like that out? Well, the most important technique 339 00:39:44 --> 00:39:49 is, again, phylogenetic tree construction. And so when you have, 340 00:39:49 --> 00:39:54 for example, gene A, well let me draw this, actually, 341 00:39:54 --> 00:40:10 on a new board. 342 00:40:10 --> 00:40:15 So you're comparing, say, three organisms, 343 00:40:15 --> 00:40:21 organism A, B, and C and you compare gene one with gene two. 344 00:40:21 --> 00:40:27 And you notice that most genes adhere to this pattern, 345 00:40:27 --> 00:40:33 but that every now and then there's a gene that gives you this 346 00:40:33 --> 00:40:38 type of pattern. What you can then conclude is that 347 00:40:38 --> 00:40:43 this gene, C, has not coevolved with the other genes in the genome of 348 00:40:43 --> 00:40:48 these organisms but was actually transferred into it from another 349 00:40:48 --> 00:40:53 source. And I don't have time to go actually into the mechanisms. 350 00:40:53 --> 00:40:58 If you're interested, I teach a graduate class that undergraduates 351 00:40:58 --> 00:41:03 actually take in our department, environmental microbiology, where we 352 00:41:03 --> 00:41:08 discuss a lot of the mechanisms. It's basically that a lot of viruses can 353 00:41:08 --> 00:41:13 effect gene transfer, but also plasmids and transposons. 354 00:41:13 --> 00:41:18 But for bacteria, again, you should remember that new function 355 00:41:18 --> 00:41:23 actually oftentimes arises by lateral gene transfer.
356 00:41:23 --> 00:41:28 And one of the interesting things is that lateral gene transfer is 357 00:41:28 --> 00:41:34 actually very important in the evolution of pathogenic bacteria. 358 00:41:34 --> 00:41:48 So, the so-called virulence genes, 359 00:41:48 --> 00:41:57 which are the genes that basically affect pathogenesis. Do 360 00:41:57 --> 00:42:13 you have a question? Among pathogenic bacteria, 361 00:42:13 --> 00:42:35 often arise by lateral gene transfer. OK. Any questions? 362 00:42:35 --> 00:42:43 OK, now for eukarya, I just want to make the point that 363 00:42:43 --> 00:42:52 their genomes are generally orders of magnitude larger -- 364 00:42:52 --> 00:43:16 OK, and that the exons, 365 00:43:16 --> 00:43:23 so the stretches that really encode the proteins that make up the 366 00:43:23 --> 00:43:30 organism, the exons are only typically a few percent 367 00:43:30 --> 00:43:37 of the genome. That's particularly in higher 368 00:43:37 --> 00:43:44 eukaryotes. Yeasts, for example, have a much more 369 00:43:44 --> 00:43:50 compact genome also. We, for example, are full of DNA 370 00:43:50 --> 00:43:57 that people still have a very hard time figuring out what that actually 371 00:43:57 --> 00:44:04 does. But it seems that the majority of the genome is 372 00:44:04 --> 00:44:10 so-called repeated sequences -- -- many of which seem to be ancient 373 00:44:10 --> 00:44:15 retroviruses that have inserted themselves into the genome and have 374 00:44:15 --> 00:44:21 since then lost actually their function. OK, 375 00:44:21 --> 00:44:26 so the remaining time I want to just give you an example of how we can 376 00:44:26 --> 00:44:31 now use these techniques that I outlined before to learn something 377 00:44:31 --> 00:44:37 about microorganisms in the environment. 378 00:44:37 --> 00:44:47 It's called environmental genomics.
379 00:44:47 --> 00:45:04 Basically, the way this all started 380 00:45:04 --> 00:45:08 was by going into the environment and extracting nucleic acids and 381 00:45:08 --> 00:45:13 treating them exactly the same way as if you had a single genome. 382 00:45:13 --> 00:45:18 But, again, remember, we have a very large mixture of microorganisms 383 00:45:18 --> 00:45:22 present in the environment. And where this was mostly done was 384 00:45:22 --> 00:45:27 in the ocean, actually. And what people did, was they 385 00:45:27 --> 00:45:32 constructed those BAC clones directly from the environment and 386 00:45:32 --> 00:45:36 then looked amongst those BAC clones for specific 16S ribosomal 387 00:45:36 --> 00:45:41 RNA genes. Remember, this is the marker that we 388 00:45:41 --> 00:45:45 have for microorganisms in the environment. We know the diversity 389 00:45:45 --> 00:45:49 of microorganisms through those types of genes, 390 00:45:49 --> 00:45:53 and we have a lot of the data available. And so, 391 00:45:53 --> 00:45:57 in order to link a specific function to such an organism that we only 392 00:45:57 --> 00:46:01 know from the 16S ribosomal RNA genes, 393 00:46:01 --> 00:46:06 so, to ask the question of what this organism might be carrying 394 00:46:06 --> 00:46:12 out in the environment, it's very useful to sequence BAC 395 00:46:12 --> 00:46:17 clones that have 16S ribosomal RNA genes on them, 396 00:46:17 --> 00:46:23 and determine what kinds of protein coding genes are on there that might 397 00:46:23 --> 00:46:28 reveal some of the function of the organism in the environment. 398 00:46:28 --> 00:46:34 And one example that I want to show you is that of the proteorhodopsin. 399 00:46:34 --> 00:46:45 So basically, the initial task was to sequence BAC clones containing 400 00:46:45 --> 00:46:57 ribosomal RNA genes, and look for other genes that might 401 00:46:57 --> 00:47:15 reveal some of the function.
402 00:47:15 --> 00:47:18 So, you don't want to look for all the genes that encode proteins that 403 00:47:18 --> 00:47:22 are important to the cell cycle and things like that, 404 00:47:22 --> 00:47:25 but really sort of metabolic genes that might tell you something about 405 00:47:25 --> 00:47:29 the type of metabolism that this organism carries out 406 00:47:29 --> 00:47:33 in the environment. And so, the first example that 407 00:47:33 --> 00:47:39 turned out to be really, really important is that people 408 00:47:39 --> 00:47:44 found rhodopsin genes on one of those BAC fragments, 409 00:47:44 --> 00:47:50 and it turns out that these rhodopsin genes 410 00:47:50 --> 00:47:55 produce a protein that inserts itself into the bacterial membrane, 411 00:47:55 --> 00:48:01 and it's a photoreceptor that, when it's hit by light, 412 00:48:01 --> 00:48:06 actually acts as a proton pump. So, it expels protons from the cell 413 00:48:06 --> 00:48:11 interior to the outside, and you already know that this is 414 00:48:11 --> 00:48:16 important in energy generation in all living cells. 415 00:48:16 --> 00:48:20 So proton gradients across membranes basically give the cells sort of a 416 00:48:20 --> 00:48:25 battery status that can be exploited by ATPase molecules or ATPase 417 00:48:25 --> 00:48:30 proteins that equalize the proton gradient and drive ATP 418 00:48:30 --> 00:48:35 synthesis in doing so. Now, why is this so important? 419 00:48:35 --> 00:48:40 Well, it turned out that this type of protein is present in almost all 420 00:48:40 --> 00:48:45 microbial cells that were previously thought to be purely heterotrophs, 421 00:48:45 --> 00:48:49 in the parts of the ocean that receive enough light. 422 00:48:49 --> 00:48:54 And what this means is that our estimates of the global carbon 423 00:48:54 --> 00:48:59 budget of the ocean were basically wrong because most microorganisms in 424 00:48:59 --> 00:49:12 the ocean have this.
425 00:49:12 --> 00:49:31 So most prokaryotes in the ocean have a light-driven proton pump 426 00:49:31 --> 00:49:52 which is called proteorhodopsin. And it basically allows them to gain 427 00:49:52 --> 00:50:06 energy from sunlight. And there's an increasing number of 428 00:50:06 --> 00:50:12 such examples now where we are learning to interpret environmental 429 00:50:12 --> 00:50:18 communities, and the function of environmental microbial communities, 430 00:50:18 --> 00:50:23 through those genomic approaches. And it reveals basically an 431 00:50:23 --> 00:50:29 enormous diversity of organisms out there. And what we also are 432 00:50:29 --> 00:50:35 learning to do now is to assemble entire genomes from those samples by 433 00:50:35 --> 00:50:40 applying genomic techniques. And this is an example here, 434 00:50:40 --> 00:50:44 published last year, where people went out and 435 00:50:44 --> 00:50:49 basically were able to piece together, from pieces of genes 436 00:50:49 --> 00:50:53 obtained from the environment, entire genomes or fragments of 437 00:50:53 --> 00:50:58 entire genomes. And that's shown here. 438 00:50:58 --> 00:51:02 Those are contiguous sequences. OK, so if you have any questions 439 00:51:02 --> 00:51:07 let me know by e-mail, or if you're interested in pursuing 440 00:51:07 --> 00:51:11 this further I also teach another class in Civil and Environmental 441 00:51:11 --> 00:51:14 Engineering.