[SQUEAKING] [RUSTLING] [CLICKING]

JASON KU: Welcome, everybody, to the second-to-last lecture of 6.006. At this point we have covered essentially all of the testable material that will appear on the final, or on quiz 3. Today what we're really doing is putting all the material we've learned over the course of the term into context at a high level, and talking about where we can go from here in terms of other theory classes and other classes in the department related to this material. Now, most things in the department are in some way related to this material; that's why this is a foundational course. But we're going to try to talk about it from a high level and discuss how some future topics you might be interested in relate.

OK, so we started out the term, in lecture one, talking about 6.006, and we had four main goals for our course, really three main goals. Does anyone remember what those goals were?

So you got to the last one first. The first one was to solve hard computational problems, to be able to solve problems.
So this is kind of the "let's make an algorithm" part of the course. One: solve hard computational problems. I guess "hard" here should maybe be in quotes, because we saw in the last lecture what hard means in a technical sense: hard can mean that we know of no efficient algorithm for solving a problem. But that's getting a little bit ahead of ourselves. Solving computational problems with algorithms is really the key part of this goal. It's kind of the same goal that you have in a class like 6.0001 or 6.009: you're trying to convince a computer that you've solved a problem on a finite set of inputs. But really, what this class is about is two other things, which are more about communication with people rather than with computers. Your algorithm might be correct or efficient, but you need to be able to communicate that to humans, and that's what the other two goals are. So the second one is: argue correctness. Basically, that the thing I'm doing to my inputs is always going to lead me to a correct output.
No matter what valid input I give it (and there can be an infinite space of possible inputs; in this class that's the case, because we want our input size to grow arbitrarily large), we need to be able to argue correctness: that the algorithm returns the correct thing no matter what the input is. And the way to do that is essentially 6.042. This whole class has basically been applied 6.042: I've given you some procedures, and you have to prove things about those procedures. Or, most of the time, we proved it for you, and then you used them as black boxes. But that's a lot of what this class is about. And the third one is efficiency: argue that the algorithm is "good," for lack of a better word. What does "good" mean? Well, that was hard to know at the beginning of our class. And so we set up a model of computation, a framework through which we could determine how good or bad our algorithms were, by defining a model of computation, saying what things we can do in constant time, and then building off of that.
So this is basically our model plus some asymptotics, or something like that. Ran out of space. What?

AUDIENCE: It's about scalability.

JASON KU: Yeah, this is about scalability. A model of computation tells us how much time each operation costs, but efficiency is measured relative to our input size. This is always about how our algorithm performs relative to the rate at which our problem size grows. And so that's what we mean by "good." And in this class, we don't tend to talk about constant-size problems. It's about how algorithms scale as you get arbitrarily large inputs. That's why we need recursion and induction to be able to prove things about our algorithms: they work for arbitrary n. And that's why we measure the growth of our algorithm's performance relative to its input size. OK, and then the last thing is, to me, one of the most important things: communicating these ideas to another human. So communication is key here. If you can always write good code that's always right, good for you. I can't do that all the time.
But that might mean you could be a very competent, independent computer programmer. You are still going to be limited in what you can do if you're only able to rely on yourself. A lot of computer science is working with others to solve computational problems. And when you're working with others to solve computational problems, you need to be able to communicate with them, both about what it is you're doing and about why you're doing it: that you're doing the correct thing, and that it's efficient. And so that's a big part of what this course is. At the end of the day, on your quiz, if you write down a Python script for a correct algorithm, but we can't tell what it's doing, even though it's correct, we're not going to give you full points, because you're not satisfying the conditions of this class. It's really about the communication here. OK, so, just to review, since we haven't discussed how the most recent lecture fits into your problem sets: we didn't have any problem sets covering complexity, so how does that fit in?
Well, goal three was to argue that the ways we're solving our problems are good. What we proved in the last lecture was that most problems cannot be solved "well": they can't be solved in polynomial time with respect to the size of the input. However, for most of the problems we actually think about, I can, in a sense, prove to you that a yes answer is correct. I can show you a simple path in a graph that has a certain length, or I can show you a subset that sums to a certain value in a particular problem. I can give you a certificate, checkable in a reasonable amount of time, proving to you that the answer to the problem is correct. And that's what we talked about in the last lecture. So: there are not always "good" algorithms to solve problems, but many problems we think about can either be checked in polynomial time. This is the concept of having a certificate of polynomial size that I could give you, one that can be checked in polynomial time. This leads to our class of decision problems, NP.
Or the problem can be solved by brute force in exponential time. Most of the things we've talked about in this class fall into one of these two categories. We can just brute force over the combinatorial space of possible outputs and check whether each one is correct. Actually, anything of the first form can be solved in the second form, because there are only exponentially many possible certificates of polynomial length to check. But basically, this says that the problems we think about mostly fall into these two categories. And so there usually are algorithms to solve the problems we care about, even though the analysis we gave in the last lecture, over random problems viewed as bit strings, actually proves that most random problems are not solvable at all. In a sense, the problems we think about are not random; they have structure that lets their solutions be checked pretty quickly. OK, so that's what we mean when we talk about complexity.
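The certificate idea here can be sketched in Python. This is my own minimal illustration for subset-sum, not code from the lecture: checking a claimed certificate takes polynomial time, while deciding the problem from scratch by brute force takes exponential time.

```python
from itertools import combinations

def check_certificate(numbers, target, certificate):
    """Verify a claimed subset in polynomial time.

    The certificate is a list of distinct indices into `numbers`;
    checking it costs O(n) additions and comparisons.
    """
    return (len(set(certificate)) == len(certificate)
            and all(0 <= i < len(numbers) for i in certificate)
            and sum(numbers[i] for i in certificate) == target)

def brute_force(numbers, target):
    """Decide subset-sum by trying all 2^n subsets (exponential time)."""
    for r in range(len(numbers) + 1):
        for subset in combinations(range(len(numbers)), r):
            if sum(numbers[i] for i in subset) == target:
                return list(subset)  # a certificate for a yes answer
    return None
```

For example, on `[3, 9, 8, 4]` with target 12, the brute-force search returns a certificate, and `check_certificate` confirms it quickly.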
For the purposes of the final, you'll be able to see from the practice problems we're going to give you that most of what we cover on the final from the lecture 19 material will be in terms of the definitions. Do you understand what the decision problem class NP is? What EXP is? Do you know how these relate to each other? EXP is definitely a superset of NP here; NP nests inside it. They could be equal, though probably not. Those are the types of things we would ask about. Another is knowing the directionality of a reduction. Suppose you have a problem A and a problem B, and I know that one of them, say A, is difficult by some measure: I already happen to know that it's very hard, like NP-hard or something like that.
If B is the problem whose complexity I'm interested in, and I can prove that I could solve A given any black box that solves B, where I can make this reduction in polynomial time, then if A is hard, I had better not be able to solve B in polynomial time, because then I would be able to solve A in polynomial time. So that's basically the type of argument, usually in a true/false question we might have on the final exam, to check that you understand the basic high-level definitions involved in what was talked about in lecture 19. Hardness and completeness: a problem is hard for one of these classes if it is at least as hard as anything in the class, even if it lies outside the class, whereas the complete problems are the ones that are in the class and at least as hard as anything in it. So that's just a brief overview of the only material that hasn't been tested yet but might be tested on the final.
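The shape of that argument can be sketched with a concrete classic pair of problems. This is my own illustration, not one worked in this lecture: PARTITION reduces in polynomial time to SUBSET-SUM, with the black box played here by a brute-force stand-in.

```python
from itertools import combinations

def subset_sum_black_box(numbers, target):
    """Stand-in black box for SUBSET-SUM (here just brute force)."""
    return any(sum(c) == target
               for r in range(len(numbers) + 1)
               for c in combinations(numbers, r))

def partition(numbers):
    """Decide PARTITION via a polynomial-time reduction to SUBSET-SUM.

    The reduction only computes sum(numbers) and makes one black-box
    call, so a polynomial-time algorithm for SUBSET-SUM would yield
    one for PARTITION; since PARTITION is known hard, SUBSET-SUM
    cannot have one either (assuming P != NP).
    """
    total = sum(numbers)
    if total % 2:                 # an odd total can never split evenly
        return False
    return subset_sum_black_box(numbers, total // 2)
```

Note the directionality: the hardness flows from the problem being reduced (PARTITION) to the problem being called (SUBSET-SUM), never the other way.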
So when we don't have a good algorithm for a problem, we can sometimes actually prove that it probably doesn't have a good algorithm. And that's a kind of problem you'll be able to work on in future classes, if you continue along this track. OK, so what's the actual content we talked about? That was a very high-level overview of why we're teaching this class, why you're taking this class. But what is the content we actually covered? I like to break it up into three units and, in a sense, two subunits. So quiz 1 material and quiz 2 material were about showing you some nice black boxes. Basically, if I'm going to have inputs of non-constant size, it's going to be useful for me to be able to find things among those elements. So that's really what quiz 1 was all about: data structures for finding things in a non-constant-size database. Sure. And when we store these things, we want to support maybe two different types of queries: ones that are intrinsic to the items, what the items are, and ones based on an extrinsic order placed on those items.
And that was the way in which we broke down how to approach this problem. I want to be able to support queries and maintain an extrinsic order on these things: I might want a sequence. That's the sequence interface, an extrinsic order. Or I want to be able to look up whether something is in my set by a key that we identify each item with, a unique key. That's intrinsic queries, and often an intrinsic order. A hash table doesn't maintain any order on my keys, but it does support intrinsic queries: is this thing in my set or not? But we did show you other set data structures that do support an intrinsic order, which allows me to find the next larger and the next smaller item in my set. So here's a summary of the data structures we covered. I'm not going to go into how to use these things or how to choose among them here; that's what your quiz 1 review lecture was all about. But basically, the idea here is: if we have a sequence, most of the time when you're programming, being able to push and pop at the end of a list is pretty good.
Which is why in Python, the most fundamental data structure you have is the list: it's a super useful thing. I just want to store a bunch of things and have random access to, say, the 10th element, but I'm not necessarily having to dynamically update the order of those things. I don't necessarily have to insert something in the middle of the list. Most of the time, what I can do is put it at the end of the list and maybe swap it down into place if I need to. So that's why a list is super useful. A sequence AVL tree is useful, but not as ubiquitous as a dynamic array (I said linked list at first; I meant a Python list, which is a dynamic array). So the dynamic array tended to be, in your coding practice, your most common sequence data structure here. Though we can get pretty good performance for the insert-in-the-middle operation with the sequence AVL tree. OK, then on the set data structure side, I categorize these into a couple of different categories in terms of the operations we can support on them.
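That append-at-the-end versus insert-in-the-middle distinction can be illustrated with a minimal sketch of my own, not code from the lecture: appending to a Python list is amortized constant time, because the dynamic array over-allocates and only occasionally reallocates, while inserting at the front shifts every existing element and costs linear time per operation.

```python
def build_by_append(n):
    """Amortized O(1) per operation: grow at the end of the dynamic array."""
    seq = []
    for i in range(n):
        seq.append(i)          # occasional reallocation, cheap on average
    return seq

def build_by_front_insert(n):
    """O(n) per operation: every insert shifts all existing elements."""
    seq = []
    for i in range(n):
        seq.insert(0, i)       # quadratic total work across all inserts
    return seq
```

Both build a sequence of n items, but the second does quadratic total work, which is exactly why the dynamic array's "cheap at the end" behavior matters.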
These are all intrinsic operations: finding things, inserting things, deleting things. I think of the first three as dictionary operations: I just want to look up whether something's there. The last two are order operations, where it matters what order these things are stored in. And as you can see from the asymptotic complexity of the various operations here, the hash table is actually super good if you just want to support dictionary operations. But in the cases where you need to maintain order dynamically, a set AVL tree is the way to go. And if you don't need dynamic updates but still need those order operations, a sorted array is good enough, as long as the items don't change. So that's a quick overview of the quiz 1 data structures material. But then we used most of these data structures to get faster sorting algorithms in different contexts. Basically, everything on this list involved building a data structure and exploiting it to get a better running time, all except for merge sort, really.
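A minimal sketch of that trade-off, using Python built-ins as stand-ins for the lecture's data structures (an assumption of mine, not the course's code): a hash table answers membership in expected constant time but carries no order information, while a static sorted array answers order queries like find-next in logarithmic time via binary search.

```python
import bisect

items = [37, 5, 19, 88, 42]

# Hash table: expected O(1) membership, but no order information.
table = set(items)
assert 19 in table and 20 not in table

# Sorted array: O(log n) membership and order queries, but O(n)
# insert/delete, so it shines when the set of items is static.
sorted_items = sorted(items)             # [5, 19, 37, 42, 88]

def find_next(sorted_seq, key):
    """Smallest item strictly greater than key, or None if none exists."""
    i = bisect.bisect_right(sorted_seq, key)
    return sorted_seq[i] if i < len(sorted_seq) else None
```

So `find_next(sorted_items, 19)` walks straight to 37, a query the hash table simply cannot answer without scanning everything.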
The first two we presented in terms of a priority queue, whether we used a sorted array or an unsorted array. We presented that at the end of lecture eight to get n-squared running time, and we improved that to n log n by using a heap. That was a nice optimization. But we also got interesting sorting algorithms using an AVL tree, because of the power of maintaining a dynamic order over time. And then we exploited a direct access array to be able to sort in linear time for ranges of numbers bounded polynomially in terms of the input size. So we leveraged that direct access array to get counting sort. And then we amplified that effect by sorting on a bunch of digits multiple times, allowing a polynomial blow-up in the size of the numbers we could sort in linear time. So that's an overview of the content of quiz 1. In quiz 2, we said: OK, now you know how to find things within a flat list of items that you can put into a data structure.
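The counting sort idea can be sketched as follows. This is my own minimal version, keyed on small non-negative integers, not the lecture's exact pseudocode:

```python
def counting_sort(keys, max_key):
    """Sort non-negative integer keys in O(n + u) time, u = max_key + 1.

    A direct-access array of counts replaces comparisons entirely,
    which is how we beat the n log n comparison-sort lower bound
    when keys are polynomially bounded in the input size.
    """
    counts = [0] * (max_key + 1)
    for k in keys:                  # tally each key into its slot
        counts[k] += 1
    out = []
    for k, c in enumerate(counts):  # read slots back in increasing order
        out.extend([k] * c)
    return out
```

Radix sort then amplifies this by applying a stable version of the same pass once per digit, extending linear-time sorting to polynomially larger key ranges.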
But in a sense, a graph is a special kind of data structure, one that relates the different things in your input. So if you've got a bunch of vertices, there's now a relation between those vertices: your edges. And this is a super useful framework for talking about discrete systems, because you can think of a vertex as a state of your system and then connect the transitions between states as a graph. That's the reason why... I mean, graphs are awesome, but they're awesome because they can be used to model so many different things in our world. It's not just about road networks. It can also be about playing your favorite turn-based game, like Tilt. OK, so we talked about a lot of different types of problems that you could solve with various algorithms, with a focus on a bunch of different ways of solving single-source shortest paths. And again, just like the sorting algorithms and just like the data structures, we presented multiple of them, because there's a trade-off between the generality of the graphs they apply to and their running time.
So I guess, in particular, the top line there is, in some sense, the most restrictive: we don't have any cycles in our graph. That's a very special type of graph, and for it we're able to get linear time. But even if we do have cycles in our graph, we can do better if we have a bound on the weights: either there's an easy conversion to a linear-time algorithm by treating the graph as unweighted, or the weights are non-negative, so there can't be negative-weight cycles, and we don't have to deal with those. OK, so that's quiz 2 material. And then quiz 3 material was kind of applying this graph material to a recursive framework. What was our recursive framework? Everyone say it with me.

AUDIENCE: Dynamic programming.

JASON KU: Dynamic programming, and the framework was SRT BOT, right? Missing a letter, but SORT BOT, right? You can actually think of the quiz 3 material as really an application of the graph material. What are we doing in SRT BOT? We're defining a set of subproblems. These are a set of vertices in a graph.
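The most restrictive row just mentioned, a graph with no cycles, can be sketched as follows. This is my own minimal version, which assumes the DAG comes with adjacency lists and an already-computed topological order:

```python
import math

def dag_shortest_paths(topo_order, adj, source):
    """Single-source shortest paths in a DAG in O(V + E) time.

    adj maps each vertex to a list of (neighbor, weight) edges, and
    topo_order lists the vertices in topological order, so every edge
    is relaxed only after its tail's distance is final.
    """
    dist = {v: math.inf for v in topo_order}
    dist[source] = 0
    for u in topo_order:                 # process in topological order
        for v, w in adj.get(u, []):
            if dist[u] + w < dist[v]:    # relax edge (u, v)
                dist[v] = dist[u] + w
    return dist
```

One relaxation pass suffices precisely because acyclicity gives us a valid processing order; with cycles we would need BFS, Dijkstra, or Bellman-Ford instead, depending on the weights.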
What is the relate step doing? It's saying what the relations between the subproblems are, essentially defining the edges of a graph. And then the topological order and the base cases, all of these things, are just saying: what is the problem I want to solve on this graph, and how do I compute it for subproblems that don't have any outgoing edges? I need to start writing on the board again. This is graphs. There was sorting in here, too. This is basically an application. OK, graphs were basically a relationship on these non-constant-size collections of things. So quiz 1 and quiz 2 were kind of like useful black boxes that you can just bundle up: stick in some inputs, get out some outputs, and you're golden. Whereas quiz 3 was very different; the material in quiz 3 is very different. Dynamic programming, while it was in some sense related to this graph material (I'm constructing a graph), requires that I construct that graph. There's a creative process in trying to construct that graph. I don't give you a set of vertices.
Usually what I give you is a sequence or something like that, and you have to construct vertices, subproblems, that can be related in a recursive way so that you can solve the problem. This is a much more difficult thing than those other tasks, I think, because there's a lot more creativity in it. In the same way, just reducing to the graph algorithms we have is fairly easy, but actually doing graph transformations, changing the shape of the graph so that you can apply those algorithms, is a harder thing to do. The difficulty of these two sets of material is very similar. Figuring out what the graph should be, figuring out what the subproblems should be and how they relate, is really the entire difficulty of solving problems recursively. And we've only given you a taste of solving problems recursively. In future classes, like 6.046, which is the follow-on to this one in the undergraduate curriculum: this one is an introduction to algorithms, and the next one is about the design and analysis of algorithms.
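The SRT BOT framework can be sketched on a small example of my own, bounded coin change, which was not worked in this lecture; each letter of the framework appears as a comment:

```python
import functools

def min_coins(coins, amount):
    """Fewest coins summing to amount, or None if impossible.

    SRT BOT:
      Subproblems:  x(a) = fewest coins making value a
      Relate:       x(a) = 1 + min(x(a - c) for coins c <= a)
      Topo. order:  increasing a (the subproblem graph is acyclic)
      Base:         x(0) = 0
      Original:     x(amount)
      Time:         O(amount * len(coins))
    """
    @functools.lru_cache(maxsize=None)
    def x(a):
        if a == 0:                          # base case
            return 0
        best = None
        for c in coins:                     # relate: brute force the last coin
            if c <= a and x(a - c) is not None:
                cand = 1 + x(a - c)
                if best is None or cand < best:
                    best = cand
        return best
    return x(amount)
```

Each subproblem is a vertex, each recursive call an edge, and memoization ensures each vertex is evaluated once, exactly the graph view described above.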
447 00:26:27,210 --> 00:26:34,130 It's quite a bit more difficult, because we've mostly 448 00:26:34,130 --> 00:26:39,710 left it to you to use the things that we gave you 449 00:26:39,710 --> 00:26:42,350 or make your own algorithms based 450 00:26:42,350 --> 00:26:46,910 on this very nice cookbook-like framework 451 00:26:46,910 --> 00:26:49,610 that you can plug in a recursive algorithm to. 452 00:26:49,610 --> 00:26:51,320 Now actually, that cookbook is super 453 00:26:51,320 --> 00:26:56,210 nice for any way of looking at a problem recursively, 454 00:26:56,210 --> 00:27:03,300 but while in dynamic programming, 455 00:27:03,300 --> 00:27:08,730 the inductive hypothesis of combining your subproblems 456 00:27:08,730 --> 00:27:12,850 is almost trivial, in other types 457 00:27:12,850 --> 00:27:17,120 of recursive algorithms, that's not necessarily the case. 458 00:27:17,120 --> 00:27:19,120 Especially when instead of looking 459 00:27:19,120 --> 00:27:22,560 at all possible choices, for example, 460 00:27:22,560 --> 00:27:24,380 in a greedy algorithm where you're just 461 00:27:24,380 --> 00:27:27,620 looking at one of the choices, the locally best thing, 462 00:27:27,620 --> 00:27:31,550 and recursing forward, you're not doing all the work. 463 00:27:31,550 --> 00:27:33,050 You're not locally brute-forcing. 464 00:27:33,050 --> 00:27:37,070 You're picking an optimal thing locally and hoping that 465 00:27:37,070 --> 00:27:38,630 will lead you to a good thing. 466 00:27:38,630 --> 00:27:44,510 That's a much harder algorithmic paradigm to operate under. 467 00:27:44,510 --> 00:27:47,120 And so that's more like the material 468 00:27:47,120 --> 00:27:50,180 that you'll be talking about in 6.046. 469 00:27:50,180 --> 00:27:59,560 So that's 006, a very quick overview of the content of this 470 00:27:59,560 --> 00:28:00,800 class.
471 00:28:00,800 --> 00:28:05,230 And we really like the structure of how this class is laid out, 472 00:28:05,230 --> 00:28:08,140 because it gives you a fundamental idea of the things 473 00:28:08,140 --> 00:28:13,990 people use to store information on a computer and a sense 474 00:28:13,990 --> 00:28:16,690 of how you solve problems computationally 475 00:28:16,690 --> 00:28:19,780 and how to argue that they're correct and efficient. 476 00:28:19,780 --> 00:28:21,640 That's really what this problem-- 477 00:28:21,640 --> 00:28:25,650 this course is about. 478 00:28:25,650 --> 00:28:30,380 And if you feel like you enjoy this kind of stuff, 479 00:28:30,380 --> 00:28:35,060 that's where you go to take 6.046. 480 00:28:35,060 --> 00:28:39,080 And 6.046 was actually the first algorithms class 481 00:28:39,080 --> 00:28:43,745 I ever took here at MIT, as a grad student actually. 482 00:28:46,370 --> 00:28:49,310 This was hard for me. 483 00:28:49,310 --> 00:28:53,390 It's actually hard to look at these problems, these types, 484 00:28:53,390 --> 00:28:55,400 and think in a computational way, 485 00:28:55,400 --> 00:28:59,450 especially having not taken this class, 6.006. 486 00:28:59,450 --> 00:29:02,420 So hopefully you guys are all in a better position 487 00:29:02,420 --> 00:29:04,510 than I was when I took it. 488 00:29:04,510 --> 00:29:09,280 There's two ways I like to think of the content in 6.046. 489 00:29:09,280 --> 00:29:12,880 One is kind of just as an extension of 006. 490 00:29:12,880 --> 00:29:14,800 It's the natural follow-on to the things 491 00:29:14,800 --> 00:29:16,570 that we do in this class. 492 00:29:16,570 --> 00:29:19,247 They still talk about data structures. 493 00:29:23,150 --> 00:29:28,030 This isn't the core part of 046, but they do touch on data 494 00:29:28,030 --> 00:29:31,180 structures for more complicated-- 495 00:29:31,180 --> 00:29:33,970 that have more complicated analyses involved in them. 
496 00:29:33,970 --> 00:29:37,990 It's really about-- usually in 046, 497 00:29:37,990 --> 00:29:42,920 stating what the algorithm is doing is not so hard. 498 00:29:42,920 --> 00:29:45,980 Basically, giving you the algorithm, 499 00:29:45,980 --> 00:29:49,190 number one here, is not so difficult, 500 00:29:49,190 --> 00:29:52,430 to state what's happening in the algorithm. 501 00:29:52,430 --> 00:29:55,730 But the number two and number three here, 502 00:29:55,730 --> 00:29:59,060 arguing that that thing is correct and arguing that thing 503 00:29:59,060 --> 00:30:04,430 is efficient, that's where the complexity comes in in 046. 504 00:30:04,430 --> 00:30:08,360 The analysis part is quite a bit more complicated in 046 than 505 00:30:08,360 --> 00:30:09,920 in 006. 506 00:30:09,920 --> 00:30:17,240 So they solve a problem called union-find and give a much-- 507 00:30:17,240 --> 00:30:19,700 we talked a little bit about amortization. 508 00:30:19,700 --> 00:30:24,290 This goes into a much better-- a much more formal way of proving 509 00:30:24,290 --> 00:30:29,840 things run in amortized time. 510 00:30:29,840 --> 00:30:38,690 So this is basically amortization via what 511 00:30:38,690 --> 00:30:40,370 we call a potential analysis. 512 00:30:46,030 --> 00:30:51,052 It's basically making that notion 513 00:30:51,052 --> 00:30:52,760 that we talked about when we were talking 514 00:30:52,760 --> 00:30:57,140 about dynamic arrays of, we're not doing this expensive thing 515 00:30:57,140 --> 00:30:58,310 too often. 516 00:30:58,310 --> 00:31:00,170 Basically what we do is we keep track 517 00:31:00,170 --> 00:31:03,140 of the cost of all sequence of operations 518 00:31:03,140 --> 00:31:06,740 and prove that the average cost is small. 519 00:31:06,740 --> 00:31:10,238 That's kind of what this potential analysis is doing. 
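As a sketch of what that potential analysis looks like for the dynamic-array example mentioned above (the particular potential function is a standard textbook choice, not one given in the lecture):

```latex
% Potential method: the amortized cost of operation i is
%   \hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1}).
% For append with table doubling, take \Phi(D) = 2n - m
% (n = items stored, m = capacity; \Phi \ge 0 once n \ge m/2).
\hat{c}_i = c_i + \Phi(D_i) - \Phi(D_{i-1})
% Cheap append (no resize):  c_i = 1,  \Delta\Phi = 2
%   \Rightarrow \hat{c}_i = 3.
% Doubling append (n = m):   c_i = n + 1  (copy n items, insert one),
%   \Phi: (2n - n) \to (2(n+1) - 2n), i.e. n \to 2
%   \Rightarrow \hat{c}_i = (n + 1) + (2 - n) = 3.
% So every append has amortized cost O(1), and the total cost of any
% sequence of appends is bounded by the sum of the amortized costs.
```

The expensive doubling operation "pays down" the potential built up by the cheap ones, which is the formal version of "we're not doing this expensive thing too often."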
520 00:31:10,238 --> 00:31:11,780 It's a little bit more formal process 521 00:31:11,780 --> 00:31:15,770 for making that argument a little more formal. 522 00:31:15,770 --> 00:31:16,270 Right. 523 00:31:16,270 --> 00:31:16,770 OK. 524 00:31:16,770 --> 00:31:19,340 So then on the graph side, this is 525 00:31:19,340 --> 00:31:22,610 kind of an extension of quiz 1-type material. 526 00:31:22,610 --> 00:31:25,140 This is, what is this union-find data structure? 527 00:31:25,140 --> 00:31:31,400 It's basically-- it's a set type thing, where 528 00:31:31,400 --> 00:31:37,300 I can make a set of just a single element, 529 00:31:37,300 --> 00:31:42,220 I can take two sets, merge them together, make them 530 00:31:42,220 --> 00:31:47,440 their union, and then given an object, I say, 531 00:31:47,440 --> 00:31:51,970 which set am I part of, essentially by electing 532 00:31:51,970 --> 00:31:53,920 a leader within a set and saying, 533 00:31:53,920 --> 00:31:57,280 return me a pointer to that one. 534 00:31:57,280 --> 00:32:00,730 And so this can be useful in dynamically maintaining, 535 00:32:00,730 --> 00:32:05,770 say, the connected components in a dynamically changing graph 536 00:32:05,770 --> 00:32:09,520 supporting the query of, am I in the same component 537 00:32:09,520 --> 00:32:11,350 as this other guy? 538 00:32:11,350 --> 00:32:13,870 That could be a very useful thing to know about a graph 539 00:32:13,870 --> 00:32:14,800 as it's changing. 540 00:32:14,800 --> 00:32:17,140 So that's an application of this problem. 541 00:32:17,140 --> 00:32:20,320 And they get near-constant performance 542 00:32:20,320 --> 00:32:22,000 for a lot of these queries. 543 00:32:22,000 --> 00:32:25,360 It's not quite, but pretty close. 544 00:32:25,360 --> 00:32:27,370 OK, on the graph side, they solve 545 00:32:27,370 --> 00:32:32,350 a number of very useful problems on graphs. 
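The union-find interface described above (make a singleton set, union two sets, find the elected leader) might be sketched like this; the path-compression and union-by-size tricks are the standard ones whose amortized analysis 6.046 works through, assumed here rather than stated in the lecture:

```python
class UnionFind:
    """Disjoint sets with path compression and union by size."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def make_set(self, x):
        # A new set containing just the single element x.
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        # Return the set's elected leader (the tree root).
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:      # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        # Merge the two sets containing x and y.
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return
        if self.size[rx] < self.size[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx               # hang smaller tree under larger
        self.size[rx] += self.size[ry]

    def connected(self, x, y):
        return self.find(x) == self.find(y)
```

Maintaining the components of a dynamically growing graph is then just: `make_set` each vertex, `union` the endpoints of each new edge, and answer "am I in the same component as this other guy?" with `connected`.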
546 00:32:32,350 --> 00:32:35,770 Minimum Spanning Tree-- so I'm trying 547 00:32:35,770 --> 00:32:39,730 to find a tree connecting all of the vertices 548 00:32:39,730 --> 00:32:42,190 in a connected component of my graph. 549 00:32:42,190 --> 00:32:44,170 And I'm trying to find-- in a weighted graph, 550 00:32:44,170 --> 00:32:46,960 I'm trying to find the spanning tree that 551 00:32:46,960 --> 00:32:49,240 has minimum total weight. 552 00:32:49,240 --> 00:32:51,020 So that's a problem-- 553 00:32:51,020 --> 00:32:55,630 a fundamental problem in weighted-graph algorithms. 554 00:32:55,630 --> 00:32:59,680 They solve this via a greedy algorithm. 555 00:32:59,680 --> 00:33:10,360 And network flows and I guess cuts. 556 00:33:10,360 --> 00:33:12,030 So this is-- what is this? 557 00:33:12,030 --> 00:33:15,840 This is, I'm given a weighted graph. 558 00:33:15,840 --> 00:33:21,120 Basically, each of the weights corresponds to a capacity. 559 00:33:21,120 --> 00:33:23,880 I could push water through along this edge. 560 00:33:23,880 --> 00:33:28,830 And I may be given a source vertex and a sink vertex. 561 00:33:28,830 --> 00:33:31,500 And I want to say, I want to shove water 562 00:33:31,500 --> 00:33:35,820 through the source vertex along the edges 563 00:33:35,820 --> 00:33:37,890 with their various capacities. 564 00:33:37,890 --> 00:33:40,140 And I'll get some amount of water 565 00:33:40,140 --> 00:33:43,140 on the other end, at the sink. 566 00:33:43,140 --> 00:33:46,472 So the question is, what's the most amount of water 567 00:33:46,472 --> 00:33:47,680 that I can push through this? 568 00:33:47,680 --> 00:33:50,643 Well, I could build that pipe network 569 00:33:50,643 --> 00:33:53,060 with the different things and just do this experimentally. 570 00:33:53,060 --> 00:33:55,020 I just stick a bunch of-- 571 00:33:55,020 --> 00:33:56,820 maybe-- I'm a mechanical engineer, 572 00:33:56,820 --> 00:33:58,440 so that maybe makes sense to me.
573 00:33:58,440 --> 00:34:01,200 But you want to be able to just look at those numbers 574 00:34:01,200 --> 00:34:04,260 and be able to tell me how much water can I push through. 575 00:34:04,260 --> 00:34:08,730 That's what the max flow in a network is talking about. 576 00:34:08,730 --> 00:34:12,060 And we give you some polynomial time algorithms in this class, 577 00:34:12,060 --> 00:34:16,320 basically incremental algorithms that, kind of like Dijkstra, 578 00:34:16,320 --> 00:34:22,080 or kind of like Bellman-Ford, will incrementally update 579 00:34:22,080 --> 00:34:25,320 estimates to-- 580 00:34:25,320 --> 00:34:28,949 of a max flow and improve them over time. 581 00:34:31,980 --> 00:34:42,480 Then on the, basically, design paradigms, 582 00:34:42,480 --> 00:34:46,590 you've got more involved making your own divide-and-conquer 583 00:34:46,590 --> 00:34:57,930 algorithms, dynamic programming algorithms, greedy algorithms. 584 00:34:57,930 --> 00:35:00,900 Basically, they go a lot more in depth 585 00:35:00,900 --> 00:35:04,290 in terms of how to design these algorithms and these paradigms 586 00:35:04,290 --> 00:35:07,080 than we do in this class. 587 00:35:07,080 --> 00:35:10,170 And then the last thing is-- 588 00:35:10,170 --> 00:35:13,980 we only touched on complexity. 589 00:35:13,980 --> 00:35:18,400 And in a sense, 046 is only going to touch on complexity. 590 00:35:18,400 --> 00:35:20,520 It's a very big field. 591 00:35:20,520 --> 00:35:24,110 But it will give you the tools to be 592 00:35:24,110 --> 00:35:28,130 able to prove that something is NP-hard, whereas we just kind 593 00:35:28,130 --> 00:35:32,090 of say that, oh, there's this thing called a reduction. 594 00:35:32,090 --> 00:35:36,320 We didn't give you any problems in which you actually 595 00:35:36,320 --> 00:35:38,610 had to reduce one problem to another. 596 00:35:38,610 --> 00:35:40,740 And you'll do a lot more of that here. 597 00:35:40,740 --> 00:35:44,360 So, reductions. 
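To make the earlier max-flow discussion concrete, here is a hedged sketch of one classical incremental scheme: Ford-Fulkerson with shortest (BFS) augmenting paths, i.e. Edmonds-Karp. The choice of variant is mine; the lecture doesn't name which one 6.046 covers. Each round finds a source-to-sink path with leftover residual capacity and pushes flow along it, incrementally improving the flow estimate.

```python
from collections import deque, defaultdict

def max_flow(capacity, source, sink):
    """Max flow via repeated BFS augmenting paths (Edmonds-Karp).
    capacity: dict mapping directed edge (u, v) -> its capacity."""
    residual = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        residual[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)              # reverse edge, for undoing flow
    flow = 0
    while True:
        # BFS for a shortest path with positive residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow            # no augmenting path remains
        # Recover the path, find its bottleneck, push that much flow.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[e] for e in path)
        for (u, v) in path:
            residual[(u, v)] -= bottleneck
            residual[(v, u)] += bottleneck
        flow += bottleneck
```

When no augmenting path remains, the saturated edges witness a cut of the same value, which is the flow/cut connection mentioned above.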
598 00:35:44,360 --> 00:35:52,420 So in a big sense, 046 is really just a natural extension 599 00:35:52,420 --> 00:35:55,727 to the 006 material, plus some additional stuff, 600 00:35:55,727 --> 00:35:57,310 which I'm going to get to in a second. 601 00:35:57,310 --> 00:35:57,940 Yeah, question? 602 00:35:57,940 --> 00:35:59,690 AUDIENCE: Do you want to add randomization 603 00:35:59,690 --> 00:36:01,360 for time paradigms? 604 00:36:01,360 --> 00:36:03,250 JASON KU: I'm going to talk about that 605 00:36:03,250 --> 00:36:05,680 slightly in a separate-- 606 00:36:05,680 --> 00:36:07,660 I'll get to your question in just a second. 607 00:36:07,660 --> 00:36:11,410 I like to think of it as a separate topic, which 608 00:36:11,410 --> 00:36:13,600 I will go into right now. 609 00:36:13,600 --> 00:36:16,900 The separate topic I like to think of it as, instead 610 00:36:16,900 --> 00:36:23,040 of being the natural extension to the things in the 006 units, 611 00:36:23,040 --> 00:36:28,840 what I'm going to do is kind of relax either what it means 612 00:36:28,840 --> 00:36:33,155 to have a correct algorithm or relax what it means to-- 613 00:36:35,780 --> 00:36:39,110 what my model of computation is. 614 00:36:39,110 --> 00:36:51,300 So 006, this is kind of as an extension of 006. 615 00:36:51,300 --> 00:37:02,970 And this is kind of like 6.046 as change 616 00:37:02,970 --> 00:37:06,874 my definition of what it means to be correct or efficient. 617 00:37:10,590 --> 00:37:21,490 So we've already kind of done this a little bit in 006. 
618 00:37:21,490 --> 00:37:23,540 Basically, one of the things that we can do, 619 00:37:23,540 --> 00:37:27,010 which is what the student's question was 620 00:37:27,010 --> 00:37:29,110 about: randomized algorithms, 621 00:37:29,110 --> 00:37:32,800 which is a big part of 046 actually-- 622 00:37:32,800 --> 00:37:36,610 randomized analysis of algorithms 623 00:37:36,610 --> 00:37:38,410 that are not deterministic. 624 00:37:38,410 --> 00:37:41,140 It's not guaranteed that it'll give you the same output 625 00:37:41,140 --> 00:37:43,930 every time or not guaranteed that it 626 00:37:43,930 --> 00:37:46,870 will do the same computations over the course 627 00:37:46,870 --> 00:37:48,670 of the algorithm every time. 628 00:37:48,670 --> 00:37:50,680 But it exploits some randomization. 629 00:38:01,650 --> 00:38:03,990 And in 006, this is-- 630 00:38:03,990 --> 00:38:07,530 we've mostly not touched on this, except in one area. 631 00:38:07,530 --> 00:38:10,200 Where did we use randomization? 632 00:38:10,200 --> 00:38:12,270 In hashing, right? 633 00:38:12,270 --> 00:38:16,080 When we used hashing, what were we doing? 634 00:38:16,080 --> 00:38:19,230 We changed the definition of correct versus efficient. 635 00:38:19,230 --> 00:38:22,200 We didn't really change the definition; what we did was 636 00:38:22,200 --> 00:38:26,700 we said that it was OK that sometimes our algorithm was 637 00:38:26,700 --> 00:38:29,260 slower than 638 00:38:29,260 --> 00:38:32,200 its expected running time. 639 00:38:32,200 --> 00:38:33,930 That's what we meant there. 640 00:38:33,930 --> 00:38:36,850 We're relaxing the idea of efficient, 641 00:38:36,850 --> 00:38:38,830 but we're still saying it's good, because most 642 00:38:38,830 --> 00:38:41,050 of the time it is good. 643 00:38:41,050 --> 00:38:44,860 So there's two types of randomized algorithms.
644 00:38:44,860 --> 00:38:47,200 They have these weird names based 645 00:38:47,200 --> 00:38:52,060 on betting regions of the world, shall we say? 646 00:38:52,060 --> 00:38:58,058 There are-- this is L-O? 647 00:38:58,058 --> 00:38:59,360 Los Vegas? 648 00:38:59,360 --> 00:39:04,670 It is Las, OK, Vegas algorithms. 649 00:39:04,670 --> 00:39:13,978 These are always correct, but probably efficient. 650 00:39:18,860 --> 00:39:21,980 In a sense, that's what hashing is. 651 00:39:21,980 --> 00:39:25,100 I'm always going to give you the right thing, 652 00:39:25,100 --> 00:39:27,140 whether this thing is in my set or not. 653 00:39:27,140 --> 00:39:30,410 But some of the time, it's inefficient. 654 00:39:30,410 --> 00:39:32,870 I have to look through a chain of length 655 00:39:32,870 --> 00:39:36,110 of-- that's linear in the size of the things that I'm storing. 656 00:39:36,110 --> 00:39:43,890 And this is in contrast to a Monte Carlo algorithm, 657 00:39:43,890 --> 00:39:52,100 which is always efficient for some definition of efficient, 658 00:39:52,100 --> 00:39:53,840 but only probably correct. 659 00:39:58,640 --> 00:40:03,170 And I mean, I could define you a hash table that has Monte Carlo 660 00:40:03,170 --> 00:40:04,910 semantics instead. 661 00:40:04,910 --> 00:40:08,835 Say, for example, I say that I'm going-- 662 00:40:08,835 --> 00:40:11,760 it's going to be exactly the same as a hash table, 663 00:40:11,760 --> 00:40:14,160 except instead of storing all the things that 664 00:40:14,160 --> 00:40:18,450 collide in a place, I just store the first two, say. 665 00:40:21,750 --> 00:40:24,900 Well, actually, that's actually going to be always efficient. 666 00:40:24,900 --> 00:40:26,400 I'm going to look through the things 667 00:40:26,400 --> 00:40:29,120 and see if it's in there. 668 00:40:29,120 --> 00:40:32,513 And the chains that I'm storing there only have two things. 669 00:40:32,513 --> 00:40:33,930 It's going to be always efficient. 
670 00:40:33,930 --> 00:40:36,150 It's always going to give me constant time. 671 00:40:36,150 --> 00:40:38,660 But some of the time, it's going to be the wrong thing, 672 00:40:38,660 --> 00:40:41,630 because I'm not storing everything in that chain. 673 00:40:41,630 --> 00:40:45,200 So there's some probability that that's not going to be correct. 674 00:40:45,200 --> 00:40:47,480 And so that's a different kind of-- 675 00:40:47,480 --> 00:40:50,330 maybe I want my hash tables to always be fast, 676 00:40:50,330 --> 00:40:54,348 but I can afford to be wrong some of the time. 677 00:40:54,348 --> 00:40:54,890 I don't know. 678 00:40:54,890 --> 00:40:57,890 In practice, this is actually sometimes a good trade-off 679 00:40:57,890 --> 00:40:59,900 in real systems. 680 00:40:59,900 --> 00:41:02,490 Sometimes it's OK to be wrong some of the times, 681 00:41:02,490 --> 00:41:05,230 if we get good performance. 682 00:41:05,230 --> 00:41:23,770 OK, but generally can do better if you allow randomization. 683 00:41:23,770 --> 00:41:28,120 And by better I mean, usually we can get faster bounds 684 00:41:28,120 --> 00:41:32,770 on a lot of problems if we allow randomization and things aren't 685 00:41:32,770 --> 00:41:36,670 necessarily always correct or always efficient. 686 00:41:36,670 --> 00:41:42,130 So this is a big area in 046 that requires a lot more 687 00:41:42,130 --> 00:41:47,620 analysis using randomness and probability. 688 00:41:47,620 --> 00:41:52,910 So if you need some primers on that-- 689 00:41:52,910 --> 00:41:57,645 we didn't have a lot of this in 006, but if you go on to 046, 690 00:41:57,645 --> 00:41:59,770 that's going to be a really important thing for you 691 00:41:59,770 --> 00:42:02,140 to brush up on. 692 00:42:02,140 --> 00:42:08,110 The next part on 006 is kind of changing what our definition 693 00:42:08,110 --> 00:42:10,210 of correct or efficient means. 
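The Monte Carlo hash table sketched above -- keep only the first two items that collide in a slot, so every lookup is constant time but occasionally wrong -- might look like this (the concrete details are my own filling-in of the lecture's sketch):

```python
import random

class MonteCarloSet:
    """Hash set with Monte Carlo semantics: each slot keeps at most
    the first two items that hash there, so add/contains are always
    O(1), but contains can return a false negative for an item that
    was dropped when its slot overflowed."""
    def __init__(self, num_slots=1024):
        self.slots = [[] for _ in range(num_slots)]
        # A random salt is the randomness the analysis would exploit:
        # no fixed input is bad for every choice of salt.
        self.salt = random.getrandbits(32)

    def _slot(self, x):
        return hash((self.salt, x)) % len(self.slots)

    def add(self, x):
        chain = self.slots[self._slot(x)]
        if len(chain) < 2 and x not in chain:   # overflow items are dropped
            chain.append(x)

    def contains(self, x):
        return x in self.slots[self._slot(x)]   # chain length is at most 2
```

Contrast with the Las Vegas version (ordinary chaining): that one is always correct, but a chain can grow long, so it is only probably efficient.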
694 00:42:10,210 --> 00:42:12,850 I mean, we've restricted ourselves 695 00:42:12,850 --> 00:42:16,120 in this class to a class of problems where we only 696 00:42:16,120 --> 00:42:19,030 talk about integers. 697 00:42:19,030 --> 00:42:20,900 But there's tons of problems in this world, 698 00:42:20,900 --> 00:42:22,900 especially in scientific computing, 699 00:42:22,900 --> 00:42:28,510 where I want to be able to find out what this real number is. 700 00:42:28,510 --> 00:42:31,950 And I can't even store a real number on my computer. 701 00:42:31,950 --> 00:42:33,460 So what the hell, Jason? 702 00:42:33,460 --> 00:42:35,710 What are you talking about? 703 00:42:35,710 --> 00:42:37,420 I can't do that on a computer. 704 00:42:37,420 --> 00:42:41,680 But what I can do is basically compute things 705 00:42:41,680 --> 00:42:45,790 in a numerical sense-- numerical algorithms. 706 00:42:49,020 --> 00:42:52,710 And in 046, a lot of times we put this in the context 707 00:42:52,710 --> 00:43:01,890 of continuous optimization, continuous being the operative 708 00:43:01,890 --> 00:43:03,750 word here, not discrete systems. 709 00:43:03,750 --> 00:43:08,220 You have a continuum of possible solutions, real numbers 710 00:43:08,220 --> 00:43:09,930 essentially. 711 00:43:09,930 --> 00:43:12,345 How do we do this on a computer that's a discrete system? 712 00:43:16,350 --> 00:43:20,550 Basically, in 046 what you can do, 713 00:43:20,550 --> 00:43:24,420 and in other numerical methods classes, what you can say is, 714 00:43:24,420 --> 00:43:28,290 well, I know that you can't return me a real number. 715 00:43:28,290 --> 00:43:28,920 I got that. 716 00:43:28,920 --> 00:43:31,890 Or you can maybe have a model of computation 717 00:43:31,890 --> 00:43:34,770 that allows integers to represent 718 00:43:34,770 --> 00:43:38,402 other kinds of real numbers, like radicals or rationals 719 00:43:38,402 --> 00:43:39,360 or something like that.
720 00:43:39,360 --> 00:43:41,220 And I can do manipulations on those. 721 00:43:41,220 --> 00:43:44,040 But really what these algorithms are usually about 722 00:43:44,040 --> 00:43:48,660 is computing real numbers not completely, 723 00:43:48,660 --> 00:43:51,570 but to some bounded precision. 724 00:43:51,570 --> 00:43:54,330 And I pay for that precision. 725 00:43:54,330 --> 00:43:57,450 The more bits of precision I want on my number, 726 00:43:57,450 --> 00:43:59,030 I have to pay for them. 727 00:43:59,030 --> 00:44:00,960 So this is basic-- 728 00:44:00,960 --> 00:44:02,910 I think of these as an approximation-- 729 00:44:05,730 --> 00:44:21,600 approximation of real number to some precision, 730 00:44:21,600 --> 00:44:33,490 and I pay for precision with time. 731 00:44:33,490 --> 00:44:37,320 So let's say I wanted to compute the square root of a number. 732 00:44:37,320 --> 00:44:40,650 I could have an algorithm just like the algorithms-- 733 00:44:40,650 --> 00:44:45,000 or I guess division, right, long division. 734 00:44:45,000 --> 00:44:47,400 You all know the algorithm of long division. 735 00:44:47,400 --> 00:44:51,960 You put the quotient under here with these-- an AB 736 00:44:51,960 --> 00:44:55,117 and you get the C on top or whatever. 737 00:44:55,117 --> 00:44:55,950 That's an algorithm. 738 00:44:55,950 --> 00:44:59,760 That's a procedure using essentially small numbers. 739 00:44:59,760 --> 00:45:02,550 I'm only talking about the digits zero to nine 740 00:45:02,550 --> 00:45:04,470 here when I'm doing that algorithm. 741 00:45:04,470 --> 00:45:07,140 So it's a procedure that only uses 742 00:45:07,140 --> 00:45:10,110 small integers to compute arbitrary 743 00:45:10,110 --> 00:45:13,590 precision of a division. 744 00:45:13,590 --> 00:45:15,030 So that's an algorithm, and I have 745 00:45:15,030 --> 00:45:18,690 to pay time to get more digits. 
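In the same spirit as the long-division example, here is a sketch of paying for precision with time: Newton's method for square roots over exact rationals, iterating until the answer is within a requested number of bits. The example and stopping rule are my own, not the lecture's.

```python
from fractions import Fraction

def sqrt_approx(n, bits):
    """Approximate sqrt(n) to within 2**-bits using Newton's method
    on exact rationals. Each iteration roughly doubles the number of
    correct bits, so more bits of precision cost more iterations:
    precision is paid for with time."""
    assert n >= 0
    x = Fraction(max(n, 1))                 # start at or above sqrt(n)
    target = Fraction(1, 2 ** bits)
    # For x >= sqrt(n):  x - sqrt(n) = (x*x - n)/(x + sqrt(n)) <= (x*x - n)/x,
    # so this loop condition bounds the error in x itself.
    while x * x - n >= target * x:
        x = (x + Fraction(n) / x) / 2       # Newton step for x*x = n
    return x
```

The returned `Fraction` over-approximates the true root by less than `2**-bits`, and asking for more bits simply runs the loop longer.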
746 00:45:18,690 --> 00:45:22,140 So that's an example of this kind 747 00:45:22,140 --> 00:45:25,890 of-- how we live in the world of real numbers 748 00:45:25,890 --> 00:45:29,050 when all we have is a discrete system. 749 00:45:29,050 --> 00:45:33,810 And then the last category I'd like to talk about here 750 00:45:33,810 --> 00:45:37,230 is really approximation algorithms. 751 00:45:40,830 --> 00:45:43,530 Whereas this is kind of an approximation algorithm, 752 00:45:43,530 --> 00:45:45,870 I'm approximating my outputs, this 753 00:45:45,870 --> 00:45:50,680 is an approximation algorithm from the standpoint of, 754 00:45:50,680 --> 00:45:53,790 well, there's a lot of problems that I can't solve efficiently. 755 00:45:53,790 --> 00:45:54,540 They're NP-hard. 756 00:45:54,540 --> 00:46:00,690 They're in EXP or even harder problems. 757 00:46:00,690 --> 00:46:06,180 But maybe I'm OK with not getting the optimal solution. 758 00:46:06,180 --> 00:46:09,095 So this is in the domain of optimization problems. 759 00:46:15,090 --> 00:46:18,272 So most of the dynamic programming 760 00:46:18,272 --> 00:46:20,480 problems that we gave you were optimization problems. 761 00:46:20,480 --> 00:46:21,710 They're the shortest paths problems. 762 00:46:21,710 --> 00:46:23,043 Those are optimization problems. 763 00:46:23,043 --> 00:46:26,870 Basically, the possible outputs are ranked in some way-- 764 00:46:26,870 --> 00:46:32,210 the distance of a path that you return or something like that. 765 00:46:32,210 --> 00:46:33,570 They're ranked in some way. 766 00:46:33,570 --> 00:46:35,060 There is an optimal one-- 767 00:46:35,060 --> 00:46:38,510 the one with the smallest metric or something like that. 
768 00:46:41,050 --> 00:46:43,600 Well, in an approximation algorithm what I do is, OK, 769 00:46:43,600 --> 00:46:46,720 I get that it's computationally difficult for you 770 00:46:46,720 --> 00:46:49,990 to give me the longest simple path in this graph, 771 00:46:49,990 --> 00:46:53,680 or the shortest possible route for my traveling salesman, 772 00:46:53,680 --> 00:46:57,730 but maybe that's OK. 773 00:46:57,730 --> 00:47:01,120 I mean, my engineering Spidey-sense 774 00:47:01,120 --> 00:47:04,480 tells me that within 10% is fine. 775 00:47:04,480 --> 00:47:08,470 So maybe instead of giving me the most optimal thing, 776 00:47:08,470 --> 00:47:09,910 can I give you an algorithm that's 777 00:47:09,910 --> 00:47:12,520 guaranteed to be within a certain distance 778 00:47:12,520 --> 00:47:15,010 from the optimal thing? 779 00:47:15,010 --> 00:47:17,860 Usually, we're looking for constant factor approximations 780 00:47:17,860 --> 00:47:22,630 which have low constant, or maybe 781 00:47:22,630 --> 00:47:26,020 even have to do for worse if such things don't exist. 782 00:47:26,020 --> 00:47:28,570 OK, so that's approximation algorithms. 783 00:47:28,570 --> 00:47:33,530 Can we get close to an optimal solution in polynomial time? 784 00:47:33,530 --> 00:47:35,080 OK. 785 00:47:35,080 --> 00:47:39,430 And then the last way we could change things in, especially 786 00:47:39,430 --> 00:47:42,970 future classes, though sometimes they talk about this in 046 787 00:47:42,970 --> 00:47:46,870 as well, is we could change the model of computation. 788 00:47:46,870 --> 00:47:50,140 We could basically change something about our computer 789 00:47:50,140 --> 00:47:54,910 to be put in some other weird paradigm of solving problems 790 00:47:54,910 --> 00:47:57,543 with more power essentially, or you're 791 00:47:57,543 --> 00:47:59,210 in a situation where there's less power. 792 00:47:59,210 --> 00:48:02,195 OK, so change in the model of computation. 
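As one concrete instance of the constant-factor idea just described, the classic 2-approximation for minimum vertex cover (my choice of example): vertex cover is NP-hard, yet this runs in linear time and is provably within a factor of 2 of optimal.

```python
def vertex_cover_2approx(edges):
    """Matching-based 2-approximation for minimum vertex cover:
    repeatedly pick an uncovered edge and take both its endpoints.
    The picked edges are vertex-disjoint, and any optimal cover must
    contain at least one endpoint of each of them, so this cover has
    at most twice the optimal size."""
    cover = set()
    for (u, v) in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover
```

So instead of the optimal answer, which we don't know how to compute efficiently, we get a guaranteed-to-be-close answer in polynomial time.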
793 00:48:12,590 --> 00:48:16,370 So what we've been talking to you about in terms of model 794 00:48:16,370 --> 00:48:18,420 of computation is our word-RAM-- 795 00:48:18,420 --> 00:48:18,920 word-RAM. 796 00:48:21,830 --> 00:48:25,520 And that essentially says I can do arithmetic operations, 797 00:48:25,520 --> 00:48:28,310 and I can look up stuff in my memory in constant time. 798 00:48:28,310 --> 00:48:30,928 But if I allocate a certain amount, 799 00:48:30,928 --> 00:48:32,970 I have to pay that amount and that kind of thing. 800 00:48:32,970 --> 00:48:36,320 So that's this word-RAM model. 801 00:48:36,320 --> 00:48:40,490 But in actuality, on all of your computers, 802 00:48:40,490 --> 00:48:43,340 it's a lot easier for me to figure-- 803 00:48:43,340 --> 00:48:48,770 to find and read memory that's on my CPU in a register 804 00:48:48,770 --> 00:48:52,070 than it is for me to go out to the hard disk, ask this-- 805 00:48:52,070 --> 00:48:56,090 well, in my day, it used to be this movable mechanical head 806 00:48:56,090 --> 00:49:00,800 that had to go and scan over a bit on a CD-ROM drive 807 00:49:00,800 --> 00:49:05,930 and actually read what that thing was. 808 00:49:05,930 --> 00:49:09,050 So we can add complexity to our model 809 00:49:09,050 --> 00:49:14,600 to better account for the costs of operations on my machine. 810 00:49:14,600 --> 00:49:17,360 One of those models is called the cache model-- 811 00:49:21,830 --> 00:49:24,320 cache model. 812 00:49:24,320 --> 00:49:28,820 It's basically a hierarchy of memory. 813 00:49:28,820 --> 00:49:32,240 I have my registers on board my CPU. 814 00:49:32,240 --> 00:49:36,110 I have maybe an L1 cache that's close to my CPU. 815 00:49:36,110 --> 00:49:37,550 Then I have another set of caches 816 00:49:37,550 --> 00:49:40,310 and another set of caches maybe out to RAM.
817 00:49:40,310 --> 00:49:43,520 And reading from a hard disk, a solid state drive of some kind, 818 00:49:43,520 --> 00:49:46,340 that's the slowest thing to access. 819 00:49:46,340 --> 00:49:49,820 And I can put a cost associated with each of those things. 820 00:49:49,820 --> 00:49:51,350 And instead of having to-- 821 00:49:51,350 --> 00:49:57,760 having all of our operations be said to be constant, 822 00:49:57,760 --> 00:49:59,860 the constants are actually different, 823 00:49:59,860 --> 00:50:02,290 and I have to pay for that difference. 824 00:50:02,290 --> 00:50:04,210 And so that's extending our model 825 00:50:04,210 --> 00:50:07,150 to be a little bit more realistic to our machine. 826 00:50:07,150 --> 00:50:11,620 Another one is we have computers right now that 827 00:50:11,620 --> 00:50:14,920 operate in classical physics, that exploit 828 00:50:14,920 --> 00:50:16,690 things in classical physics. 829 00:50:16,690 --> 00:50:21,280 But in actuality, our world allows 830 00:50:21,280 --> 00:50:27,310 for even more complicated types of operations, 831 00:50:27,310 --> 00:50:30,790 like quantum operations, where you're exploiting entanglement 832 00:50:30,790 --> 00:50:33,980 and superposition of different atoms 833 00:50:33,980 --> 00:50:39,290 to potentially get operations that I can act on my data that 834 00:50:39,290 --> 00:50:43,010 are actually provably stronger than the classical models 835 00:50:43,010 --> 00:50:44,660 in some sense. 836 00:50:44,660 --> 00:50:48,560 So this is a huge reason why there's 837 00:50:48,560 --> 00:50:51,140 a lot of work being done in, say, 838 00:50:51,140 --> 00:50:56,960 lots of industry research facilities in figuring out 839 00:50:56,960 --> 00:50:57,590 these models. 840 00:50:57,590 --> 00:51:00,750 Because maybe if you can make a big enough quantum computer, 841 00:51:00,750 --> 00:51:03,800 you can break encryption and stuff in polynomial time. 
842 00:51:03,800 --> 00:51:06,620 And that's something that maybe the NSA is interested in. 843 00:51:06,620 --> 00:51:08,690 And I'm not going to go into that. 844 00:51:08,690 --> 00:51:11,390 But, you know. 845 00:51:11,390 --> 00:51:12,800 I mean, some people-- 846 00:51:12,800 --> 00:51:15,200 you look at artificial intelligence and things like 847 00:51:15,200 --> 00:51:17,630 discussions around artificial intelligence, 848 00:51:17,630 --> 00:51:24,430 my brain might be doing things that a classical computer 849 00:51:24,430 --> 00:51:26,290 cannot. 850 00:51:26,290 --> 00:51:29,890 It could be using quantum superposition in some way. 851 00:51:29,890 --> 00:51:34,147 And our computers that are in your phone and your laptop 852 00:51:34,147 --> 00:51:36,480 and things like that aren't exploiting those operations, 853 00:51:36,480 --> 00:51:38,063 so how could we ever get intelligence, 854 00:51:38,063 --> 00:51:41,620 because in some sense, our brains are more powerful. 855 00:51:41,620 --> 00:51:45,070 And so a lot of what AI should be looking into 856 00:51:45,070 --> 00:51:50,050 is, what is the actual model of computation of our brains 857 00:51:50,050 --> 00:51:54,010 that can give us the power to have sentience? 858 00:51:54,010 --> 00:51:56,440 OK, so that's kind of quantum computing. 859 00:51:56,440 --> 00:51:58,720 I don't know much about it actually. 860 00:51:58,720 --> 00:52:04,950 And then there's things like, maybe I have more than one CPU. 861 00:52:04,950 --> 00:52:07,660 I mean, most computers-- all the computers you have, 862 00:52:07,660 --> 00:52:11,080 even the ones in your phone, probably have multiple cores. 863 00:52:11,080 --> 00:52:14,750 In a sense, you have lots of CPUs running in parallel. 864 00:52:14,750 --> 00:52:17,120 So this is like par-- 865 00:52:17,120 --> 00:52:18,535 there's one R in parallel? 
866 00:52:28,080 --> 00:52:31,410 Parallel computing basically says, it's cheap for me 867 00:52:31,410 --> 00:52:34,650 to make another computer potentially. 868 00:52:34,650 --> 00:52:38,130 If I have two computers running on the same problem, 869 00:52:38,130 --> 00:52:41,772 maybe I can get a two-fold speed-up 870 00:52:41,772 --> 00:52:46,670 on the time it takes to solve my problem. 871 00:52:46,670 --> 00:52:51,950 Now, suppose I had 100 CPUs running on a machine. 872 00:52:51,950 --> 00:52:53,690 Maybe I can get a 100-fold speed-up. 873 00:52:53,690 --> 00:52:58,140 And actually, in real life, a 100-fold speed-up 874 00:52:58,140 --> 00:53:00,000 makes a difference. 875 00:53:00,000 --> 00:53:03,060 It's, am I waiting for this for 10 minutes? 876 00:53:03,060 --> 00:53:07,110 Or am I waiting for this for 1,000 minutes? 877 00:53:07,110 --> 00:53:09,030 That's, like, all day. 878 00:53:09,030 --> 00:53:10,140 I don't want to do that. 879 00:53:10,140 --> 00:53:11,220 Maybe it's on the order of weeks. 880 00:53:11,220 --> 00:53:13,050 I don't even remember. 881 00:53:13,050 --> 00:53:17,040 But parallel computing, if I can get a 100-fold speed-up, 882 00:53:17,040 --> 00:53:18,870 that might be a huge win. 883 00:53:18,870 --> 00:53:22,780 But for some problems, it's not possible-- 884 00:53:22,780 --> 00:53:29,200 if I have k CPUs, can I get a k-factor speed-up? 885 00:53:29,200 --> 00:53:32,390 That's not always possible. 886 00:53:32,390 --> 00:53:35,330 And so parallel computing is another paradigm 887 00:53:35,330 --> 00:53:40,920 in which there's a lot of interesting theory going on. 888 00:53:40,920 --> 00:53:42,680 There are a lot of complications there, 889 00:53:42,680 --> 00:53:45,020 because there are a couple different models. 890 00:53:45,020 --> 00:53:46,790 You can have a multicore setup, where 891 00:53:46,790 --> 00:53:48,800 you have a lot of computers that are accessing 892 00:53:48,800 --> 00:53:50,930 the same bank of memory.
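[The point that k CPUs don't automatically give a k-factor speed-up is usually quantified by Amdahl's law, which is not covered in the lecture but is the standard way to state the limit: if only a fraction p of the work can be parallelized, the speed-up can never exceed 1/(1-p), no matter how many CPUs you add. A minimal sketch:]

```python
def amdahl_speedup(p, k):
    """Amdahl's law: overall speed-up from k processors when a fraction p
    of the work parallelizes perfectly and the rest stays sequential."""
    return 1.0 / ((1.0 - p) + p / k)

# Perfectly parallel work (p = 1) gets the full factor of k.
full = amdahl_speedup(1.0, 100)
# But even a small sequential fraction caps the benefit: with p = 0.9,
# 100 CPUs give well under a 10-fold speed-up, and adding more CPUs
# can never push it past 1 / (1 - 0.9) = 10.
capped = amdahl_speedup(0.9, 100)
```

[This is one precise sense in which, for some problems, the k-factor speed-up the lecture asks about is simply out of reach.]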
893 00:53:50,930 --> 00:53:53,690 And then you don't want them all to be reading and writing 894 00:53:53,690 --> 00:53:56,720 the same memory at the same time, because you don't necessarily 895 00:53:56,720 --> 00:53:59,373 know what its state is, and you get these collisions, 896 00:53:59,373 --> 00:54:01,040 which are something that you really have 897 00:54:01,040 --> 00:54:03,320 to think about in this world. 898 00:54:03,320 --> 00:54:05,960 Or you have situations where maybe I 899 00:54:05,960 --> 00:54:08,990 have a bunch of nano-flies or something 900 00:54:08,990 --> 00:54:12,050 that are going around, and they have very small computer 901 00:54:12,050 --> 00:54:15,110 brains themselves. 902 00:54:15,110 --> 00:54:17,662 But they can talk to each other and pass information 903 00:54:17,662 --> 00:54:19,370 to each other, but they don't have access 904 00:54:19,370 --> 00:54:22,230 to one central network repository of information. 905 00:54:22,230 --> 00:54:25,460 That's what we call a distributed parallel system, 906 00:54:25,460 --> 00:54:28,940 where all of the CPUs that you have 907 00:54:28,940 --> 00:54:31,730 can interact with each other maybe locally, 908 00:54:31,730 --> 00:54:35,880 but don't have access to the same memory system. 909 00:54:35,880 --> 00:54:40,760 So they have to work together to learn information 910 00:54:40,760 --> 00:54:41,840 about the system. 911 00:54:41,840 --> 00:54:48,260 OK, so that's a brief overview of the different directions 912 00:54:48,260 --> 00:54:51,530 this class, 6.006, and theory in general, 913 00:54:51,530 --> 00:54:55,820 could lead you-- into a huge array of different branches 914 00:54:55,820 --> 00:54:58,280 of theory and different problems that you 915 00:54:58,280 --> 00:55:01,790 could address with different types of computers. 916 00:55:01,790 --> 00:55:07,220 So I know this is a very high-level lecture and maybe 917 00:55:07,220 --> 00:55:10,730 less applied than some of you might like.
918 00:55:10,730 --> 00:55:15,530 But I hope this gives you a good understanding of the directions 919 00:55:15,530 --> 00:55:19,400 you can go after this class that I think 920 00:55:19,400 --> 00:55:23,120 are really exciting in terms of how to solve problems 921 00:55:23,120 --> 00:55:24,890 computationally. 922 00:55:24,890 --> 00:55:29,410 So with that, I'd like to end there.