1 00:00:00 --> 00:00:00 2 00:00:00 --> 00:00:02 ANNOUNCER: Open content is provided under a Creative 3 00:00:02 --> 00:00:03 Commons license. 4 00:00:03 --> 00:00:06 Your support will help MIT OpenCourseWare continue to 5 00:00:06 --> 00:00:10 offer high quality educational resources for free. 6 00:00:10 --> 00:00:13 To make a donation, or view additional materials from 7 00:00:13 --> 00:00:16 hundreds of MIT courses, visit MIT OpenCourseWare 8 00:00:16 --> 00:00:19 at ocw.mit.edu. 9 00:00:19 --> 00:00:23 PROFESSOR JOHN GUTTAG: OK. 10 00:00:23 --> 00:00:27 I finished up last time talking about lists. 11 00:00:27 --> 00:00:30 And I pointed out that lists are mutable, showed you 12 00:00:30 --> 00:00:34 some examples of mutation. 13 00:00:34 --> 00:00:39 We can look at it here; we looked at append, which added 14 00:00:39 --> 00:00:43 things to lists, we looked at delete, deleting 15 00:00:43 --> 00:00:45 things from a list. 16 00:00:45 --> 00:00:52 You can also assign to a list, or to an element of a list. 17 00:00:52 --> 00:00:58 So ivys sub 1, for example, could be assigned minus 15, 18 00:00:58 --> 00:01:03 and that will actually mutate the list. 19 00:01:03 --> 00:01:08 So heretofore, when we wrote assignment, what we always 20 00:01:08 --> 00:01:13 meant was changing the binding of a variable 21 00:01:13 --> 00:01:15 to a different object. 22 00:01:15 --> 00:01:22 Here, we are overloading the notation to say, no, no, ivys 23 00:01:22 --> 00:01:29 is still bound to the same object, but an element of ivys 24 00:01:29 --> 00:01:33 is bound to a different object. 25 00:01:33 --> 00:01:37 If you think about it, that makes sense, because when we 26 00:01:37 --> 00:01:49 have a list, what a list is, is a sequence of objects. 27 00:01:49 --> 00:01:55 And what this says is, the object named by the expression 28 00:01:55 --> 00:02:03 ivys sub 1 is now bound to the object, if you will, named 29 00:02:03 --> 00:02:08 by the constant minus 15.
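A minimal sketch of the element assignment described above. The contents of ivys are hypothetical, since the transcript never shows them, and same_list is a name introduced here only to make the point visible.

```python
# Hypothetical contents; the lecture's actual list is not shown in the transcript.
ivys = ['Harvard', 'Yale', 'Brown']
same_list = ivys            # a second name bound to the same list object

ivys[1] = -15               # rebinds element 1; the name ivys is not rebound

print(ivys)                 # ['Harvard', -15, 'Brown']
print(same_list is ivys)    # True: both names still reach the same object
```

The list now mixes strings and an integer, and both names still refer to one object, which is what "mutating the list" means here.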
30 00:02:08 --> 00:02:20 So we can watch this run here. 31 00:02:20 --> 00:02:25 Idle can-- that's exciting. 32 00:02:25 --> 00:02:30 I hadn't expected that answer. 33 00:02:30 --> 00:02:31 All right, your question. 34 00:02:31 --> 00:02:31 STUDENT: [INAUDIBLE] 35 00:02:31 --> 00:02:34 four elements to ivys, and you tell it to change the fifth 36 00:02:34 --> 00:02:42 element of ivys to negative 15, will it add it or [INAUDIBLE] 37 00:02:42 --> 00:02:44 PROFESSOR JOHN GUTTAG: Well, I'll tell you how ol-- let's 38 00:02:44 --> 00:02:48 answer that the easy way. 39 00:02:48 --> 00:02:55 We'll start up a shell and we'll try it. 40 00:02:55 --> 00:02:59 All right, we'll just get out of what we were doing here. 41 00:02:59 --> 00:03:05 And so, we now have some things, so we, for example, 42 00:03:05 --> 00:03:11 have ivys, I can print ivys , and it's only got three 43 00:03:11 --> 00:03:14 elements but your question probably is just as good for 44 00:03:14 --> 00:03:18 adding the fourth as adding the fifth, so what would happen if 45 00:03:18 --> 00:03:24 we say ivys sub 3-- because that of course is the 46 00:03:24 --> 00:03:32 fourth element, right? 47 00:03:32 --> 00:03:36 Let's find out. 48 00:03:36 --> 00:03:36 OK. 49 00:03:36 --> 00:03:45 Because what that does is, it's changing the binding of the 50 00:03:45 --> 00:03:49 name ivys, in this case, sub 1. 51 00:03:49 --> 00:03:52 What it looked at here, with the name ivys sub 3, and said 52 00:03:52 --> 00:03:56 that name doesn't-- isn't bound, right? 53 00:03:56 --> 00:03:58 That isn't there. 54 00:03:58 --> 00:04:00 So it couldn't do it, so instead that's what append is 55 00:04:00 --> 00:04:05 for, is to stick things on to the end of the list. 56 00:04:05 --> 00:04:08 But a very good question. 
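The student's question can be tried directly. This sketch assumes a three-element list with made-up contents; assigning to an index that does not exist yet raises an IndexError, and append is the way to grow the list.

```python
ivys = ['Harvard', 'Yale', 'Brown']   # hypothetical three-element list

try:
    ivys[3] = -15                     # index 3 would be a fourth element
except IndexError as err:
    print('assignment failed:', err)  # the list is left unchanged

ivys.append(-15)                      # append sticks a new element on the end
print(len(ivys))                      # 4
```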
57 00:04:08 --> 00:04:16 So we can see what we did here, and, but of course I can now, 58 00:04:16 --> 00:04:25 if I choose, say something like, ivys sub 1 is assigned 59 00:04:25 --> 00:04:32 minus 15, and now if I print ivys, there it is. 60 00:04:32 --> 00:04:35 And again, this points out something I 61 00:04:35 --> 00:04:41 mentioned last time: lists can be heterogeneous, in the sense 62 00:04:41 --> 00:04:45 that the elements can be of multiple different types. 63 00:04:45 --> 00:04:47 As you see here, some of the elements are strings and some 64 00:04:47 --> 00:04:51 of the elements are integers. 65 00:04:51 --> 00:04:56 Let's look at another example. 66 00:04:56 --> 00:05:08 Let's suppose, we'll start with the list, I'll call it l 1. 67 00:05:08 --> 00:05:12 This, by the way, is a really bad thing I just did. 68 00:05:12 --> 00:05:15 What was-- what's really bad about calling a list l 1? 69 00:05:15 --> 00:05:18 STUDENT: [INAUDIBLE] 70 00:05:18 --> 00:05:22 PROFESSOR JOHN GUTTAG: Is it l 1, or is it 11, or is it l l? 71 00:05:22 --> 00:05:27 It's a bad habit to get into when you write programs, so I 72 00:05:27 --> 00:05:30 never use lowercase L except when I'm spelling the word 73 00:05:30 --> 00:05:33 where it's obvious, because otherwise I get all sorts 74 00:05:33 --> 00:05:35 of crazy things going on. 75 00:05:35 --> 00:05:41 All right, so let's make it the list 1, 2, 3. 76 00:05:41 --> 00:05:43 All right? 77 00:05:43 --> 00:05:50 Now, I'll say L 2 equals L 1. 78 00:05:50 --> 00:05:54 Now I'll print L 2. 79 00:05:54 --> 00:05:57 Kind of what you'd guess, but here's the interesting 80 00:05:57 --> 00:06:05 question: if I say L 1 sub 0 is 81 00:06:05 --> 00:06:10 assigned 4, and I print L 1. 82 00:06:10 --> 00:06:13 That's what you expect, but what's going to happen 83 00:06:13 --> 00:06:22 if I print L 2?
84 00:06:22 --> 00:06:27 423 as well, and that's because what happened is I had this 85 00:06:27 --> 00:06:34 model, which we looked at last time, where I had the list L 1, 86 00:06:34 --> 00:06:46 which was bound to an object, and then the assignment L 2 87 00:06:46 --> 00:06:54 gets L 1, bound the name L 2 to the same object, so when I 88 00:06:54 --> 00:07:00 mutated this object, which I reached through the name L 1 to 89 00:07:00 --> 00:07:06 make that 4, since this name was bound to the same object, 90 00:07:06 --> 00:07:12 when I print it, I got 423. 91 00:07:12 --> 00:07:17 So that's the key thing to-- to realize; that what the 92 00:07:17 --> 00:07:23 assignment did was have two separate paths to 93 00:07:23 --> 00:07:25 the same object. 94 00:07:25 --> 00:07:28 So I could get to that object either through this path or 95 00:07:28 --> 00:07:31 through that path, it didn't matter which path I use to 96 00:07:31 --> 00:07:35 modify it, I would see it when I looked at the other. 97 00:07:35 --> 00:07:36 Yes. 98 00:07:36 --> 00:07:44 STUDENT: [INAUDIBLE] 99 00:07:44 --> 00:07:50 PROFESSOR JOHN GUTTAG: So the question, if I said a is 100 00:07:50 --> 00:07:56 assigned 2, b is assigned a, and then a is assigned 3. 101 00:07:56 --> 00:07:58 Is that your question? 102 00:07:58 --> 00:08:06 So the question is, a is assigned 1, b is assigned a, 103 00:08:06 --> 00:08:18 a is assigned 2, and then if I print b, I'll get 1. 
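The two-paths-to-one-object picture can be reproduced in a few lines. The names L1 and L2 follow the lecture; the copy at the end (L3) is an addition for contrast and is not part of what was shown in class.

```python
L1 = [1, 2, 3]
L2 = L1              # binds the name L2 to the same list object as L1

L1[0] = 4            # mutate the object, reaching it through the name L1

print(L1)            # [4, 2, 3]
print(L2)            # [4, 2, 3]: the change is visible through either name
print(L2 is L1)      # True

# Contrast (not from the lecture): list(L1) builds a new, separate object.
L3 = list(L1)
L1[0] = 99
print(L3)            # [4, 2, 3]: the copy is unaffected
```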
104 00:08:18 --> 00:08:25 Because these are not mutable, this is going to be assigned to 105 00:08:25 --> 00:08:29 an object in the store, so we'll draw the picture over 106 00:08:29 --> 00:08:36 here, that we had initially a is bound to an object with 1 in 107 00:08:36 --> 00:08:46 it, and then b got bound to the same object, but then when I 108 00:08:46 --> 00:08:51 did the assignment, what that did was it broke this 109 00:08:51 --> 00:08:58 connection, and now had a assigned to a different object, 110 00:08:58 --> 00:09:04 with the number, in this case, 2 in it. 111 00:09:04 --> 00:09:12 Whereas the list assignment you see here did not rebind the 112 00:09:12 --> 00:09:18 object l 1, it changed this. 113 00:09:18 --> 00:09:18 OK? 114 00:09:18 --> 00:09:21 Now formally I could have had this pointing off to another 115 00:09:21 --> 00:09:26 object containing 4, but that just seemed excessive, right? 116 00:09:26 --> 00:09:28 But you see the difference. 117 00:09:28 --> 00:09:32 Great question, and a very important thing to understand, 118 00:09:32 --> 00:09:36 and that's why I'm belaboring this point, since this is where 119 00:09:36 --> 00:09:39 people tend to get pretty confused, and this is 120 00:09:39 --> 00:09:42 why mutation is very important to understand. 121 00:09:42 --> 00:09:43 Yeah. 122 00:09:43 --> 00:09:45 STUDENT: [UNINTELLIGIBLE] 123 00:09:45 --> 00:09:46 PROFESSOR JOHN GUTTAG: I'm just assuming it'll 124 00:09:46 --> 00:09:47 be a great question. 125 00:09:47 --> 00:09:57 STUDENT: [INAUDIBLE] 126 00:09:57 --> 00:09:59 PROFESSOR JOHN GUTTAG: Exactly. 127 00:09:59 --> 00:10:04 So if-- very good question-- so, for example, we 128 00:10:04 --> 00:10:07 can just do it here. 129 00:10:07 --> 00:10:19 The question was, suppose I now type L 1 equals the empty list. 
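A short sketch of the difference between rebinding a name and mutating an object, using the a and b example from the discussion above. The second half tries the question that closes this passage, assigning the empty list to L1, under the assumption that L1 and L2 were set up as in the earlier example.

```python
a = 1
b = a          # a and b are bound to the same object
a = 2          # rebinds a to a different object; b is untouched
print(b)       # 1

L1 = [4, 2, 3]
L2 = L1        # two names, one list object
L1 = []        # rebinds the name L1 to a new, empty list; nothing is mutated
print(L1)      # []
print(L2)      # [4, 2, 3]
```

Plain assignment to a name only swings that name's binding. Only an operation such as element assignment or append changes the object that other names may share.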
130 00:10:19 --> 00:10:30 I can print L 1, and I can print L 2, because again, 131 00:10:30 --> 00:10:34 that's analogous to this example, where I just swung the 132 00:10:34 --> 00:10:37 binding of the identifier. 133 00:10:37 --> 00:10:40 So this is important, it's a little bit subtle, but if you 134 00:10:40 --> 00:10:44 don't really understand this deeply, you'll find yourself 135 00:10:44 --> 00:10:48 getting confused a lot. 136 00:10:48 --> 00:10:50 All right? 137 00:10:50 --> 00:10:51 OK. 138 00:10:51 --> 00:10:59 Let me move on, and I want to talk about one more type. 139 00:10:59 --> 00:11:02 By the way, if you look at the handout from last time, you'll 140 00:11:02 --> 00:11:05 see that there's some other examples of mutation, including 141 00:11:05 --> 00:11:07 a function that does mutation. 142 00:11:07 --> 00:11:09 It's kind of interesting, but I don't think we need-- think 143 00:11:09 --> 00:11:15 we've probably done enough here that I hope it now make sense. 144 00:11:15 --> 00:11:34 That one type I want to talk about still is dictionaries. 145 00:11:34 --> 00:11:47 Like lists, dictionaries are mutable, like lists, they can 146 00:11:47 --> 00:11:58 be heterogeneous, but unlike lists, they're not ordered. 147 00:11:58 --> 00:12:04 The elements in them don't have an order, and furthermore, we 148 00:12:04 --> 00:12:17 have generalized the indexing. 149 00:12:17 --> 00:12:24 So lists and strings, we can only get at elements by 150 00:12:24 --> 00:12:29 numbers, by integers, really. 151 00:12:29 --> 00:12:35 Here what we use is, think of every element of the dictionary 152 00:12:35 --> 00:12:51 as a key value pair, where the keys are used as the indices. 153 00:12:51 --> 00:13:01 So we can have an example, let's look at it. 
154 00:13:01 --> 00:13:07 So, if you look at the function show dics here, you'll see I've 155 00:13:07 --> 00:13:12 declared a variable called e to f, ah, think of that as English 156 00:13:12 --> 00:13:18 to French, and I've defined a dictionary to do translations. 157 00:13:18 --> 00:13:25 And so, we see that the key 158 00:13:25 --> 00:13:32 one corresponds to the value un, the key soccer corresponds 159 00:13:32 --> 00:13:39 to the French word football, et cetera. 160 00:13:39 --> 00:13:45 It's kind of bizarre, but the French call soccer football. 161 00:13:45 --> 00:13:49 And then I can index in it. 162 00:13:49 --> 00:13:52 So if I print e to f of soccer, it will print 163 00:13:52 --> 00:13:56 the string football. 164 00:13:56 --> 00:14:07 So you can imagine that this is a very powerful mechanism. 165 00:14:07 --> 00:14:29 So let's look at what happens when I run-- start to run this. 166 00:14:29 --> 00:14:30 All right. 167 00:14:30 --> 00:14:35 So, it says not defined-- and why did it say not defined, 168 00:14:35 --> 00:14:36 there's an interesting question. 169 00:14:36 --> 00:14:41 Let's just make sure we get this right, and we start the 170 00:14:41 --> 00:14:59 shell up again-- All right, so, I run it, and sure enough, 171 00:14:59 --> 00:15:02 it shows football. 172 00:15:02 --> 00:15:10 What happens if I go e to f of 0? 173 00:15:10 --> 00:15:13 I get a key error. 174 00:15:13 --> 00:15:16 Because, remember, these things are not ordered. 175 00:15:16 --> 00:15:19 There is no 0th element. 176 00:15:19 --> 00:15:24 0 is not a key of this particular object. 177 00:15:24 --> 00:15:28 Now I could have made 0 a key, keys don't have to be strings, 178 00:15:28 --> 00:15:32 but as it happened, I didn't. 179 00:15:32 --> 00:15:55 So let's comment that out, so we don't get stuck again. 180 00:15:55 --> 00:15:59 Where we were before, I've printed it here, you might be a 181 00:15:59 --> 00:16:01 little surprised by the order.
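A sketch of the English-to-French dictionary, limited to the two pairs the lecture names (the "et cetera" entries are left out rather than guessed). The use of get at the end is an addition, shown as one way to avoid the KeyError.

```python
# Only the two key/value pairs mentioned in the lecture.
e_to_f = {'one': 'un', 'soccer': 'football'}

print(e_to_f['soccer'])        # football

try:
    e_to_f[0]                  # 0 is not a key, so this raises KeyError
except KeyError as err:
    print('KeyError:', err)

# Not from the lecture: get returns None instead of raising for a missing key.
missing = e_to_f.get(0)
print(missing)                 # None
```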
182 00:16:01 --> 00:16:04 Why is soccer first? 183 00:16:04 --> 00:16:07 Because the order of this doesn't matter. 184 00:16:07 --> 00:16:10 That's why it's using set braces, so don't 185 00:16:10 --> 00:16:12 worry about that. 186 00:16:12 --> 00:16:19 The next thing I'm doing is-- so that's that, and then-- I'm 187 00:16:19 --> 00:16:25 now going to create another one, n to s, for numbers to 188 00:16:25 --> 00:16:29 strings, where my keys are numbers, in this case the 189 00:16:29 --> 00:16:33 number 1 corresponds to the word one, and interestingly 190 00:16:33 --> 00:16:36 enough, I'm also going to have the word one corresponding 191 00:16:36 --> 00:16:38 to the number 1. 192 00:16:38 --> 00:16:40 I can use anything I want for keys, I can use anything 193 00:16:40 --> 00:16:45 I want for values. 194 00:16:45 --> 00:16:50 And now if we look at this, we see, I can get this. 195 00:16:50 --> 00:16:51 All right. 196 00:16:51 --> 00:16:56 So these are extremely valuable. 197 00:16:56 --> 00:16:59 I can do lots of things with these, and you'll see that as 198 00:16:59 --> 00:17:01 we get to future assignments, we'll make heavy use 199 00:17:01 --> 00:17:03 of dictionaries. 200 00:17:03 --> 00:17:04 Yeah. 201 00:17:04 --> 00:17:04 Question. 202 00:17:04 --> 00:17:07 STUDENT: [INAUDIBLE] 203 00:17:07 --> 00:17:09 PROFESSOR JOHN GUTTAG: You can, but you don't know what 204 00:17:09 --> 00:17:11 order you'll get them in. 205 00:17:11 --> 00:17:16 What you can do is you can iterate keys, which gives you 206 00:17:16 --> 00:17:20 the keys in the dictionary, and then you can choose them, but 207 00:17:20 --> 00:17:25 there's no guarantee in the order in which you get keys. 208 00:17:25 --> 00:17:28 Now you might wonder, why do we have dictionaries? 
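A sketch of the numbers-to-strings dictionary with mixed key types, plus iteration over the keys. Two caveats that go beyond the lecture: keys have to be hashable, so a list cannot serve as a key, and since Python 3.7 a dictionary iterates in insertion order, whereas the Python used in this lecture made no ordering promise.

```python
n_to_s = {1: 'one', 'one': 1}     # an int key and a string key in one dictionary

print(n_to_s[1])                  # one
print(n_to_s['one'])              # 1

# Iterate over the keys and look each value up.
for key in n_to_s.keys():
    print(key, '->', n_to_s[key])

# Keys must be hashable: a list is rejected.
try:
    bad = {[1, 2]: 'pair'}
except TypeError as err:
    print('TypeError:', err)
```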
209 00:17:28 --> 00:17:33 It would be pretty easy to implement them with lists, 210 00:17:33 --> 00:17:37 because you could have a list where each element of the list 211 00:17:37 --> 00:17:41 was a key value pair, and if I wanted to find the value 212 00:17:41 --> 00:17:46 corresponding to a key, I could say for e in the list, if the 213 00:17:46 --> 00:17:49 first element of e is the key, then I get the value, 214 00:17:49 --> 00:17:52 otherwise I look at the next element in the list. 215 00:17:52 --> 00:17:56 So adding dictionaries, as Professor Grimson said with 216 00:17:56 --> 00:17:58 so many other things, doesn't give you any more 217 00:17:58 --> 00:18:01 computational power. 218 00:18:01 --> 00:18:04 It gives you a lot of expressive convenience, you 219 00:18:04 --> 00:18:08 can write the programs much more cleanly, but most 220 00:18:08 --> 00:18:12 importantly, it's fast. 221 00:18:12 --> 00:18:15 Because if you did what I suggested with the list, the 222 00:18:15 --> 00:18:17 time to look up the key would be linear in the 223 00:18:17 --> 00:18:18 length of the list. 224 00:18:18 --> 00:18:20 You'd have to look at each element until 225 00:18:20 --> 00:18:21 you found the key. 226 00:18:21 --> 00:18:25 Dictionaries are implemented using a magic technique called 227 00:18:25 --> 00:18:32 hashing, which we'll look at a little bit later in the term, 228 00:18:32 --> 00:18:36 which allows us to look up keys in constant time. 229 00:18:36 --> 00:18:39 So it doesn't matter how big the dictionary is, you can 230 00:18:39 --> 00:18:45 retrieve the value associated with the key in about the same time. 231 00:18:45 --> 00:18:46 Extremely powerful. 232 00:18:46 --> 00:18:49 Not in the next problem set but in the problem set after 233 00:18:49 --> 00:18:54 that, we'll be exploiting that facility of dictionaries. 234 00:18:54 --> 00:18:55 All right. 235 00:18:55 --> 00:18:58 Any questions about this?
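The list-of-pairs alternative described above can be sketched as follows. The function name lookup_in_pairs and the sample data are illustrative, not from the handout. The loop inspects elements one at a time, so its cost grows with the length of the list, while the dictionary lookup relies on hashing and takes roughly constant time on average.

```python
def lookup_in_pairs(pairs, key):
    """Linear search through a list of (key, value) pairs."""
    for e in pairs:
        if e[0] == key:        # the first element of e is the key
            return e[1]        # the second element is the value
    return None                # key not found

pairs = [('one', 'un'), ('soccer', 'football')]
as_dict = dict(pairs)          # the same associations as a dictionary

print(lookup_in_pairs(pairs, 'soccer'))   # football, after scanning the list
print(as_dict['soccer'])                  # football, via a hashed lookup
```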
236 00:18:58 --> 00:19:02 If not, I will turn the podium over to Professor Grimson. 237 00:19:02 --> 00:19:04 PROFESSOR ERIC GRIMSON: I've stolen it. 238 00:19:04 --> 00:19:24 This is like tag team wrestling, right? 239 00:19:24 --> 00:19:31 Professor Guttag has you on the ropes, I get to finish you off. 240 00:19:31 --> 00:19:34 Try this again. 241 00:19:34 --> 00:19:35 OK. 242 00:19:35 --> 00:19:37 We wanted to finish up that section, we're now going to 243 00:19:37 --> 00:19:42 start on a new section, and I want to try and do one and a 244 00:19:42 --> 00:19:44 half things in the remaining time. 245 00:19:44 --> 00:19:46 I'm going to introduce one topic that we're going to deal 246 00:19:46 --> 00:19:50 with fairly quickly, and then we tackle the second topic, 247 00:19:50 --> 00:19:52 it's going to start today, and we're going to carry on. 248 00:19:52 --> 00:19:54 So let me tell the two things I want to do. 249 00:19:54 --> 00:19:57 I want to talk a little bit about how you use the things 250 00:19:57 --> 00:20:00 we've been building in terms of functions to help you structure 251 00:20:00 --> 00:20:01 and organize your code. 252 00:20:01 --> 00:20:05 It's a valuable tool that you want to have as a programmer. 253 00:20:05 --> 00:20:07 And then we're going to turn to the question of efficiency. 254 00:20:07 --> 00:20:09 How do we measure efficiency of algorithms? 255 00:20:09 --> 00:20:11 Which is going to be a really important thing that we want to 256 00:20:11 --> 00:20:13 deal with, and we'll start it today, it's undoubtedly going 257 00:20:13 --> 00:20:16 to take us a couple more lectures to finish it off. 258 00:20:16 --> 00:20:19 Right, so how do you use the idea of functions 259 00:20:19 --> 00:20:21 to organize code? 260 00:20:21 --> 00:20:24 We've been doing it implicitly, ever since 261 00:20:24 --> 00:20:25 we introduced functions. 
262 00:20:25 --> 00:20:27 I want to make it a little more explicit, and I want to show 263 00:20:27 --> 00:20:29 you a tool for doing that. 264 00:20:29 --> 00:20:31 And I think the easy way to do is-- is to do 265 00:20:31 --> 00:20:32 it with an example. 266 00:20:32 --> 00:20:34 So let's take a really simple example. 267 00:20:34 --> 00:20:37 I want to compute the length of the hypotenuse of 268 00:20:37 --> 00:20:39 a right triangle. 269 00:20:39 --> 00:20:41 And yeah, I know you know how to do it, but let's think 270 00:20:41 --> 00:20:43 about what might happen if I wanted to do that. 271 00:20:43 --> 00:20:47 And in particular, if I think about that problem-- actually I 272 00:20:47 --> 00:20:57 want to do this-- if I think about that problem, I'm 273 00:20:57 --> 00:20:59 going to write a little piece of pseudo code. 274 00:20:59 --> 00:21:03 Just to think about how I would break that problem up. 275 00:21:03 --> 00:21:04 Pseudo code. 276 00:21:04 --> 00:21:08 Now, you're all linguistic majors, pseudo means false, 277 00:21:08 --> 00:21:10 this sounds like code that ain't going to run, and that's 278 00:21:10 --> 00:21:12 not the intent of the term. 279 00:21:12 --> 00:21:14 When I say pseudo code, what I mean is, I'm going to write a 280 00:21:14 --> 00:21:18 description of the steps, but not in a particular 281 00:21:18 --> 00:21:19 programming language. 282 00:21:19 --> 00:21:20 I'm going to simply write a description of what 283 00:21:20 --> 00:21:22 do I want to do. 284 00:21:22 --> 00:21:24 So if I were to solve this problem, here's 285 00:21:24 --> 00:21:24 the way I would do it. 286 00:21:24 --> 00:21:29 I would say, first thing I want to do, is I want to input a 287 00:21:29 --> 00:21:38 value for the base as a float. 288 00:21:38 --> 00:21:40 Need to get the base in. 
289 00:21:40 --> 00:21:43 Second thing I want to do, I need to get the height, so I'm 290 00:21:43 --> 00:21:49 going to input a value for the height, also as a float, 291 00:21:49 --> 00:21:51 a floating point. 292 00:21:51 --> 00:21:52 OK. 293 00:21:52 --> 00:21:54 I get the two values in, what do I need to do, well, you 294 00:21:54 --> 00:21:55 sort of know that, right? 295 00:21:55 --> 00:22:04 I want to then do, I need to find the square root of b 296 00:22:04 --> 00:22:07 squared plus h squared, right? 297 00:22:07 --> 00:22:09 The base squared plus the height squared, that's the thing I want for the 298 00:22:09 --> 00:22:17 hypotenuse-- and I'm going to save that as a float in 299 00:22:17 --> 00:22:20 hyp, for hypotenuse. 300 00:22:20 --> 00:22:27 And then finally I need to print something out, 301 00:22:27 --> 00:22:34 using the value in hyp. 302 00:22:34 --> 00:22:34 OK. 303 00:22:34 --> 00:22:35 Whoop-dee-doo, right? 304 00:22:35 --> 00:22:36 Come on. 305 00:22:36 --> 00:22:37 We know how to do this. 306 00:22:37 --> 00:22:40 But notice what I did. 307 00:22:40 --> 00:22:42 First of all, I've used the notion of modularity. 308 00:22:42 --> 00:22:45 I've listed a sequence of modules, the things 309 00:22:45 --> 00:22:46 that I want to do. 310 00:22:46 --> 00:22:50 Second thing to notice, is that little piece of pseudo code is 311 00:22:50 --> 00:22:53 telling me things about values. 312 00:22:53 --> 00:22:55 I need to have a float. 313 00:22:55 --> 00:22:57 I need to have another float here, it's giving 314 00:22:57 --> 00:22:58 me some information. 315 00:22:58 --> 00:23:01 Third thing to notice is, there's a flow of control. 316 00:23:01 --> 00:23:04 The order in which these things are going to happen. 317 00:23:04 --> 00:23:07 And the fourth thing to notice is, I've used abstraction. 318 00:23:07 --> 00:23:11 I've said nothing about how I'm going to make square root.
319 00:23:11 --> 00:23:13 I'm using it as an abstraction, saying I'm going to have square 320 00:23:13 --> 00:23:15 root from somewhere, maybe I'll build it myself, maybe somebody 321 00:23:15 --> 00:23:18 gives it to me as part of a library, so I'm burying 322 00:23:18 --> 00:23:20 the details inside of it. 323 00:23:20 --> 00:23:23 I know this is a simple example, but when you mature as 324 00:23:23 --> 00:23:26 a programmer, one of the first things you should do when you 325 00:23:26 --> 00:23:28 sit down to tackle some problem is write something like 326 00:23:28 --> 00:23:30 this pseudo code. 327 00:23:30 --> 00:23:31 I know Professor Guttag does it all the time. 328 00:23:31 --> 00:23:33 I know, for a lot of you, it's like, OK, I got 329 00:23:33 --> 00:23:34 a heavy problem. 330 00:23:34 --> 00:23:39 Let's see, def Foobar open paren, a bunch of parameters. 331 00:23:39 --> 00:23:40 Wrong way to start. 332 00:23:40 --> 00:23:41 Start by thinking about what are the sequences. 333 00:23:41 --> 00:23:44 This also, by the way, in some sense, gives me the beginnings 334 00:23:44 --> 00:23:47 of my comments for what the structure of my code 335 00:23:47 --> 00:23:48 is going to be. 336 00:23:48 --> 00:23:49 OK. 337 00:23:49 --> 00:23:52 If we do that, if you look at the handout then, I can now 338 00:23:52 --> 00:23:54 start implementing this. 339 00:23:54 --> 00:23:56 I wanted to show you that, so, first thing I'm going to do is 340 00:23:56 --> 00:23:58 say, all right, I know I'm going to need square root in 341 00:23:58 --> 00:24:03 here, so I'm going to, in fact, import math. 342 00:24:03 --> 00:24:05 That's a little different from other import statements. 343 00:24:05 --> 00:24:08 This says I'm going to get the entire math library and 344 00:24:08 --> 00:24:10 bring it in so I can use it. 345 00:24:10 --> 00:24:12 And then, what's the first thing I wanted to do? 346 00:24:12 --> 00:24:15 I need to get a value for base as a float. 
347 00:24:15 --> 00:24:16 Well OK, and that sounds like I'm going to need to do input 348 00:24:16 --> 00:24:20 of something, you can see that statement there, it's-- got the 349 00:24:20 --> 00:24:22 wrong glasses on but right there-- I'm going to do 350 00:24:22 --> 00:24:24 an input with a little message, and I'm going 351 00:24:24 --> 00:24:26 to store it in base. 352 00:24:26 --> 00:24:28 But here's where I'm going to practice a little bit 353 00:24:28 --> 00:24:30 of defensive programming. 354 00:24:30 --> 00:24:32 I can't rely on Professor Guttag if I give this-- if this 355 00:24:32 --> 00:24:36 code to him, I can't rely on him to type in a float. 356 00:24:36 --> 00:24:38 Actually I can, because he's a smart guy, but in general, 357 00:24:38 --> 00:24:39 I can't rely on the user-- 358 00:24:39 --> 00:24:42 PROFESSOR JOHN GUTTAG: I wouldn't do it right 359 00:24:42 --> 00:24:42 to see if you did. 360 00:24:42 --> 00:24:43 PROFESSOR ERIC GRIMSON: Actually, he's right, you know. 361 00:24:43 --> 00:24:45 He would not do it, just to see if I'm doing it right. 362 00:24:45 --> 00:24:47 I can't rely on the user. 363 00:24:47 --> 00:24:48 I want to make sure I get a float in it, 364 00:24:48 --> 00:24:49 so how do I do that? 365 00:24:49 --> 00:24:52 Well, here's one nice little trick. 366 00:24:52 --> 00:24:56 First of all, having read in that value, I can check to 367 00:24:56 --> 00:24:57 see, is it of the right type? 368 00:24:57 --> 00:24:59 Now, this is not the nicest way to do it but it'll work. 369 00:24:59 --> 00:25:03 I can look at the type of the value of base and compare it to 370 00:25:03 --> 00:25:05 the type of an actual float and see, are they the same? 371 00:25:05 --> 00:25:08 Is this a real or a float? 372 00:25:08 --> 00:25:10 If it is, I'm done. 373 00:25:10 --> 00:25:12 How do I go back if it isn't? 374 00:25:12 --> 00:25:14 Well, I'm going to create a little infinite loop. 375 00:25:14 --> 00:25:16 Not normally a good idea. 
376 00:25:16 --> 00:25:19 I set up a variable here, called input OK. 377 00:25:19 --> 00:25:21 Initially it's false, because I have no input. 378 00:25:21 --> 00:25:25 And then I run a loop in which I read something in, I check to 379 00:25:25 --> 00:25:28 see if it's the right type, if it is, I change that variable 380 00:25:28 --> 00:25:31 to say it's now the correct type, which means the next time 381 00:25:31 --> 00:25:34 through the loop, I'm going to say I'm all set and I'm 382 00:25:34 --> 00:25:35 going to bounce out. 383 00:25:35 --> 00:25:38 But if it is not, it's going to print out a message here 384 00:25:38 --> 00:25:42 saying, you screwed up, somewhat politely, and it's 385 00:25:42 --> 00:25:43 going to go back around. 386 00:25:43 --> 00:25:47 So it'll just cycle until I get something of the right type. 387 00:25:47 --> 00:25:49 Nice way of doing it. 388 00:25:49 --> 00:25:50 Right, what's the second thing I do? 389 00:25:50 --> 00:25:53 Well, I get the same sort of thing to read in the height, 390 00:25:53 --> 00:25:56 once I have that I'm going to take base squared plus height 391 00:25:56 --> 00:25:59 squared, and there's a form that we've just seen once 392 00:25:59 --> 00:26:02 before, and I'm going to repeat it, that is math.sqrt, 393 00:26:02 --> 00:26:07 and it says the following: it says, take from the math 394 00:26:07 --> 00:26:11 library the function called sqrt. 395 00:26:11 --> 00:26:12 OK. 396 00:26:12 --> 00:26:13 We're going to come back to this when we get to objects, 397 00:26:13 --> 00:26:16 it's basically picking up that object and it's applying that, 398 00:26:16 --> 00:26:19 putting that value into hyp, and then just printing 399 00:26:19 --> 00:26:21 something out.
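A sketch of the program as described so far, with the validation loop written out twice. Several things here are assumptions rather than the handout's code: the handout is not reproduced in the transcript; the lecture compares type(base) with the type of a float, which suited Python 2's input, whereas this version targets Python 3, where input returns a string, so it converts with float and catches ValueError instead (which also accepts an entry like 3); the message strings, the function name run_hypotenuse, and the read parameter (a hook so the loop can be driven without a keyboard) are all hypothetical.

```python
import math

def run_hypotenuse(read=input):
    """Unrefactored version: the same validation loop appears twice."""
    # Step 1: input a value for the base as a float.
    input_ok = False                     # no valid input yet
    while not input_ok:
        text = read('Enter base: ')
        try:
            base = float(text)
            input_ok = True              # right type: leave the loop next time round
        except ValueError:
            print('Error. Base must be a floating point number.')

    # Step 2: input a value for the height as a float (same pattern again).
    input_ok = False
    while not input_ok:
        text = read('Enter height: ')
        try:
            height = float(text)
            input_ok = True
        except ValueError:
            print('Error. Height must be a floating point number.')

    # Step 3: square root of base squared plus height squared, saved in hyp.
    hyp = math.sqrt(base * base + height * height)

    # Step 4: print something out using the value in hyp.
    print('Base: ' + str(base) + ', height: ' + str(height) + ', hyp: ' + str(hyp))
    return hyp
```

Calling run_hypotenuse() with no argument reads from the keyboard and keeps asking until each entry parses as a float.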
400 00:26:21 --> 00:26:23 And again, if I just run this, just to show that it's going to 401 00:26:23 --> 00:26:28 do the right thing, it says enter base, I'm obnoxious, it 402 00:26:28 --> 00:26:33 says oops, wasn't a float, so we'll be nice about it, and 403 00:26:33 --> 00:26:37 I enter a height, and it prints out what I expected. 404 00:26:37 --> 00:26:38 I just concatenated those strings together, by 405 00:26:38 --> 00:26:39 the way, at the end. 406 00:26:39 --> 00:26:41 All right. 407 00:26:41 --> 00:26:43 Notice what I did. 408 00:26:43 --> 00:26:44 OK. 409 00:26:44 --> 00:26:47 I went from this description, it gives me [UNINTELLIGIBLE] 410 00:26:47 --> 00:26:48 some information. 411 00:26:48 --> 00:26:49 I need to have a particular type. 412 00:26:49 --> 00:26:51 I made sure I had the particular type. 413 00:26:51 --> 00:26:55 I've used some abstraction to suppress some details here. 414 00:26:55 --> 00:26:58 Now if you look at that list, there is actually something I 415 00:26:58 --> 00:27:01 didn't seem to check, which is, I said I wanted a 416 00:27:01 --> 00:27:04 float stored in hyp. 417 00:27:04 --> 00:27:07 How do I know I've got a float in hyp? 418 00:27:07 --> 00:27:11 Well I'm relying on the contract, if you like, that the 419 00:27:11 --> 00:27:13 manufacturer of square root put together, which is, if I know 420 00:27:13 --> 00:27:16 I'm giving it two floats, which I do because I make sure 421 00:27:16 --> 00:27:18 they're floats, the contract, if you like, of square root 422 00:27:18 --> 00:27:20 says I'll give you back a float. 423 00:27:20 --> 00:27:24 So I can guarantee I've got something of the right type. 424 00:27:24 --> 00:27:24 OK. 425 00:27:24 --> 00:27:26 I know this is boring as whatever. 426 00:27:26 --> 00:27:28 But there's an important point here. 427 00:27:28 --> 00:27:31 Having now used this pseudo code to line things up, I can 428 00:27:31 --> 00:27:33 start putting some additional structure on this. 
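The "contract" argument can be checked in a couple of lines. This is a small illustration rather than anything from the handout: math.sqrt returns a float for any non-negative number it is given, so hyp needs no separate type check.

```python
import math

hyp = math.sqrt(3.0 * 3.0 + 4.0 * 4.0)

print(hyp)                       # 5.0
print(type(hyp) == type(1.0))    # True: the style of check described in the lecture
print(isinstance(hyp, float))    # True: the more common spelling today
print(type(math.sqrt(9)))        # <class 'float'>, even for an int argument
```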
429 00:27:33 --> 00:27:37 And in particular, I'm sure you're looking at this going-- 430 00:27:37 --> 00:27:39 will look at it if we look at the right piece-- 431 00:27:39 --> 00:27:41 going, wait a minute. 432 00:27:41 --> 00:27:46 This chunk of code and this chunk of code, they're really 433 00:27:46 --> 00:27:48 doing the same thing. 434 00:27:48 --> 00:27:49 And this is something I want to use. 435 00:27:49 --> 00:27:52 If I look at those two pieces of computation, I can 436 00:27:52 --> 00:27:53 see a pattern there. 437 00:27:53 --> 00:27:56 It's an obvious pattern of what I'm doing. 438 00:27:56 --> 00:27:59 And in particular, I can then ask the following question, 439 00:27:59 --> 00:28:01 which is, what's different between those two 440 00:28:01 --> 00:28:03 pieces of code? 441 00:28:03 --> 00:28:05 And I suggest two things, right? 442 00:28:05 --> 00:28:08 One is, what's the thing I print out when I 443 00:28:08 --> 00:28:09 ask for the input? 444 00:28:09 --> 00:28:12 The second thing is, what do I print out if I actually don't 445 00:28:12 --> 00:28:14 get the right input in? 446 00:28:14 --> 00:28:18 And so the only two differences are, right there, and there 447 00:28:18 --> 00:28:20 versus here and here. 448 00:28:20 --> 00:28:22 So this is a good place to think about, OK, 449 00:28:22 --> 00:28:24 let me capture that. 450 00:28:24 --> 00:28:26 Let me write a function, in fact the literal thing I would 451 00:28:26 --> 00:28:30 do is to say, identify the things that change, give each 452 00:28:30 --> 00:28:34 of them a variable name because I want to refer to them, and 453 00:28:34 --> 00:28:36 then write a function that captures the rest of that 454 00:28:36 --> 00:28:39 computation just with those variable names inside. 
455 00:28:39 --> 00:28:42 And in fact, if you look down-- and I'm just going to highlight 456 00:28:42 --> 00:28:44 this portion, I'm not going to run it-- but if you look down 457 00:28:44 --> 00:28:48 here, that's exactly what that does. 458 00:28:48 --> 00:28:50 I happen to have it commented out, right? 459 00:28:50 --> 00:28:50 What does it do? 460 00:28:50 --> 00:28:52 It has height, it says, I've got two names of things: the 461 00:28:52 --> 00:28:54 request message and the error message. 462 00:28:54 --> 00:28:58 The body of that function looks exactly like the computation up 463 00:28:58 --> 00:29:02 above, except I'm simply using those in place of the specific 464 00:29:02 --> 00:29:04 messages I had before. 465 00:29:04 --> 00:29:05 And then the only other difference is obviously, 466 00:29:05 --> 00:29:07 since it's a function, I need to return a value. 467 00:29:07 --> 00:29:10 So when I'm done, I'm going to give the value back out. 468 00:29:10 --> 00:29:11 All right? 469 00:29:11 --> 00:29:17 And that then lets me get to, basically, this code. 470 00:29:17 --> 00:29:21 Having done that, I simply get base with get float, I get 471 00:29:21 --> 00:29:23 height with get float, and do the rest of the work. 472 00:29:23 --> 00:29:24 All right. 473 00:29:24 --> 00:29:27 What's the point of doing this? 474 00:29:27 --> 00:29:28 Well, notice again. 475 00:29:28 --> 00:29:28 What have I done? 476 00:29:28 --> 00:29:31 I've captured a module inside of a function. 477 00:29:31 --> 00:29:33 And even though it's a simple little thing here, there are 478 00:29:33 --> 00:29:36 a couple of really nice advantages to this. 479 00:29:36 --> 00:29:37 All right? 480 00:29:37 --> 00:29:39 First one is there's less code to read. 481 00:29:39 --> 00:29:40 It's easier to debug. 482 00:29:40 --> 00:29:42 I don't have as much to deal with.
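The refactoring described above can be sketched as follows: the two things that differ between the loops, the request message and the error message, become parameters, and the shared loop becomes one function that returns the value. As before, this is an assumed Python 3 rendering rather than the handout's code; get_float follows the name used in the lecture, while hypotenuse_from_user, the message strings, and the read hook are hypothetical.

```python
import math

def get_float(request_msg, error_msg, read=input):
    """Keep asking until the reply parses as a float, then return it."""
    input_ok = False
    while not input_ok:
        text = read(request_msg)
        try:
            val = float(text)
            input_ok = True
        except ValueError:
            print(error_msg)
    return val                    # a function has to hand its result back

def hypotenuse_from_user(read=input):
    """Same computation as before, with the duplicated loops factored out."""
    base = get_float('Enter base: ', 'Error: base must be a float.', read)
    height = get_float('Enter height: ', 'Error: height must be a float.', read)
    hyp = math.sqrt(base * base + height * height)
    print('Base: ' + str(base) + ', height: ' + str(height) + ', hyp: ' + str(hyp))
    return hyp
```

If a better way of validating input turns up, only the body of get_float changes, and hypotenuse_from_user stays as written.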
483 00:29:42 --> 00:29:45 But the more important thing is, I've now separated out 484 00:29:45 --> 00:29:48 implementation from functionality, or 485 00:29:48 --> 00:29:50 implementation from use. 486 00:29:50 --> 00:29:51 What does that mean? 487 00:29:51 --> 00:29:55 It means anybody using that little function get float 488 00:29:55 --> 00:29:57 doesn't have to worry about what's inside of it. 489 00:29:57 --> 00:29:59 So for example, I decide I want to change the message I print 490 00:29:59 --> 00:30:03 out, I don't have to change the function, I just pass in 491 00:30:03 --> 00:30:04 a different parameter. 492 00:30:04 --> 00:30:08 Well if I-- you know, with [UNINTELLIGIBLE PHRASE] sorry, 493 00:30:08 --> 00:30:09 let me say it differently. 494 00:30:09 --> 00:30:11 I don't need to worry about how checking is done, it's handled 495 00:30:11 --> 00:30:14 inside of that function. 496 00:30:14 --> 00:30:18 If I decide there's a better way to get input, and there is, 497 00:30:18 --> 00:30:21 then I can make that change and I don't have to change 498 00:30:21 --> 00:30:23 the code that uses the input. 499 00:30:23 --> 00:30:25 So, if you like, I've built a separation between the 500 00:30:25 --> 00:30:28 user and the implementer. 501 00:30:28 --> 00:30:31 And that's exactly one of the reasons why I want to have the 502 00:30:31 --> 00:30:33 functions, because I've separated those out. 503 00:30:33 --> 00:30:37 Another way of saying it is, anything that uses get float 504 00:30:37 --> 00:30:39 doesn't care what the details are inside or shouldn't, and if 505 00:30:39 --> 00:30:42 I change that definition, I don't have to change anything 506 00:30:42 --> 00:30:46 elsewhere in my code, whereas if I just have the raw code 507 00:30:46 --> 00:30:48 in there, I have to go off and do it.
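As a rough sketch of the refactoring just described, the two things that change (the prompt and the complaint) become parameters, and the loop around them becomes the function body. This is a Python 3 reconstruction, not the code shown on screen in the lecture: the name get_float follows the spoken "get float", while request_msg, error_msg, and the read parameter (added here so the function can be exercised without a keyboard) are assumptions.

```python
def get_float(request_msg, error_msg, read=input):
    """Keep asking until the reply parses as a float, then return it.

    request_msg and error_msg are the two parts that differed between
    the duplicated chunks of code. read is a hypothetical hook that
    defaults to the built-in input().
    """
    while True:
        reply = read(request_msg)
        try:
            return float(reply)   # the function has to hand a value back
        except ValueError:
            print(error_msg)      # bad input: complain and ask again
```

A caller would then write something like base = get_float("Enter base: ", "Error: base must be a float"), and the same again for height. If the checking strategy changes later, only the body of get_float changes; the callers stay as they are.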
508 00:30:48 --> 00:30:51 Right, so the things we want you to take away from this are, 509 00:30:51 --> 00:30:54 get into the habit of using pseudo code when you sit down 510 00:30:54 --> 00:30:57 to start a problem, write out what are the steps. 511 00:30:57 --> 00:31:00 I will tell you that a good programmer, at least in my 512 00:31:00 --> 00:31:03 mind, may actually go back and modify the pseudo code as they 513 00:31:03 --> 00:31:05 realize they're missing things, but it's easier to do that when 514 00:31:05 --> 00:31:08 you're looking at a simple set of steps, than when you're in 515 00:31:08 --> 00:31:10 the middle of a pile of code. 516 00:31:10 --> 00:31:13 And get into the habit of using it to help you define what 517 00:31:13 --> 00:31:14 is the flow of control. 518 00:31:14 --> 00:31:17 What are the basic modules, what information needs to be 519 00:31:17 --> 00:31:22 passed between those modules in order to make the code work. 520 00:31:22 --> 00:31:23 OK. 521 00:31:23 --> 00:31:25 That was the short topic. 522 00:31:25 --> 00:31:26 I will come back to this some more and you're going to get 523 00:31:26 --> 00:31:28 lots of practice with this. 524 00:31:28 --> 00:31:30 What I want to do is to start talking about 525 00:31:30 --> 00:31:31 a different topic. 526 00:31:31 --> 00:31:33 Which is efficiency. 527 00:31:33 --> 00:31:36 And this is going to sound like a weird topic, we're going 528 00:31:36 --> 00:31:39 to see why it's of value in a second. 529 00:31:39 --> 00:31:43 I want to talk about efficiency, and we're going to, 530 00:31:43 --> 00:31:47 or at least I'm going to, at times also refer to this as 531 00:31:47 --> 00:31:50 orders of growth, for reasons that you'll see over 532 00:31:50 --> 00:31:54 the next few minutes. 
533 00:31:54 --> 00:31:57 Now, efficiency is obviously an important consideration when 534 00:31:57 --> 00:32:00 you're designing code, although I have to admit, at least for 535 00:32:00 --> 00:32:03 me, I usually want to at least start initially with code that 536 00:32:03 --> 00:32:06 works, and then worry about how I might go back and come up 537 00:32:06 --> 00:32:07 with more efficient implementation. 538 00:32:07 --> 00:32:09 I like to have something I can rely on, but it 539 00:32:09 --> 00:32:11 is an important issue. 540 00:32:11 --> 00:32:14 And our goal over the next couple of lectures, is 541 00:32:14 --> 00:32:16 basically to give you a sense of this. 542 00:32:16 --> 00:32:17 So we're not going to turn you into an expert on 543 00:32:17 --> 00:32:19 computational efficiency. 544 00:32:19 --> 00:32:21 That's, there are whole courses on that, there's some great 545 00:32:21 --> 00:32:23 courses here on that, it takes some mathematical 546 00:32:23 --> 00:32:26 sophistication, we're going to push that off a little bit. 547 00:32:26 --> 00:32:28 But what we-- what we do want to do, is to give you some 548 00:32:28 --> 00:32:31 intuition about how to approach questions of efficiency. 549 00:32:31 --> 00:32:36 We want you to have a sense of why some programs complete 550 00:32:36 --> 00:32:39 almost before you're done typing it. 551 00:32:39 --> 00:32:42 Some programs run overnight. 552 00:32:42 --> 00:32:45 Some programs won't stop until I'm old and gray. 553 00:32:45 --> 00:32:49 Some programs won't stop until you're old and gray. 554 00:32:49 --> 00:32:51 And these are really different efficiencies, and we want to 555 00:32:51 --> 00:32:53 give you a sense of how do you reason about those different 556 00:32:53 --> 00:32:55 kinds of programs. 
557 00:32:55 --> 00:32:58 And part of it is we want you to learn how to have a catalog, 558 00:32:58 --> 00:33:01 if you like, of different classes of algorithms, so that 559 00:33:01 --> 00:33:05 when you get a problem, you try and map it into an appropriate 560 00:33:05 --> 00:33:08 class, and use the leverage, if you like, of that 561 00:33:08 --> 00:33:10 class of algorithms. 562 00:33:10 --> 00:33:12 Now. 563 00:33:12 --> 00:33:13 It's a quick sidebar, I've got to say, I'm sure talking about 564 00:33:13 --> 00:33:17 efficiency to folks like you probably seems really strange. 565 00:33:17 --> 00:33:20 I mean, you grew up in an age when computers were blazingly 566 00:33:20 --> 00:33:23 fast, and have tons of memory, so why in the world do you 567 00:33:23 --> 00:33:24 care about efficiency? 568 00:33:24 --> 00:33:27 Some of us were not so lucky. 569 00:33:27 --> 00:33:32 So I'll admit, my first computer I program was a PDP6, 570 00:33:32 --> 00:33:34 only Professor Guttag even knows what PDP stands for, it 571 00:33:34 --> 00:33:37 was made by Digital Equipment Company, which does not exist 572 00:33:37 --> 00:33:39 anymore, is now long gone. 573 00:33:39 --> 00:33:41 It had, I know, this is old guy stories, but 574 00:33:41 --> 00:33:46 it had 160k of memory. 575 00:33:46 --> 00:33:46 Yeah. 576 00:33:46 --> 00:33:46 160k. 577 00:33:46 --> 00:33:50 160 kilobits of memory. 578 00:33:50 --> 00:33:53 I mean, your flash cards have more than that, right? 579 00:33:53 --> 00:33:56 It had a processor speed of one megahertz. 580 00:33:56 --> 00:34:00 It did a million operations per second. 581 00:34:00 --> 00:34:02 So let's think about it. 582 00:34:02 --> 00:34:04 This sucker, what's it got in there? 583 00:34:04 --> 00:34:07 That Air Mac, it's, see, it's got, its go-- my Air Mac, I 584 00:34:07 --> 00:34:09 don't know about John's, his is probably better, mine 585 00:34:09 --> 00:34:12 has 1.8 gigahertz speed. 586 00:34:12 --> 00:34:15 That's 1800 times faster. 
587 00:34:15 --> 00:34:18 But the real one that blows me away is, it has 2 gig 588 00:34:18 --> 00:34:19 of memory inside of it. 589 00:34:19 --> 00:34:23 That's 12 thousand times more memory. 590 00:34:23 --> 00:34:24 Oh, and by the way? 591 00:34:24 --> 00:34:27 The PDP6, it was in a rack about this tall. 592 00:34:27 --> 00:34:29 From the floor, not from the table. 593 00:34:29 --> 00:34:31 All right, so you didn't grow up in the late 1800s like I 594 00:34:31 --> 00:34:33 did, you don't have to worry about this sort 595 00:34:33 --> 00:34:34 of stuff, right? 596 00:34:34 --> 00:34:36 But a point I'm trying to make is, it sounds like anymore 597 00:34:36 --> 00:34:39 computers have gotten so blazingly fast, why should 598 00:34:39 --> 00:34:39 you worry about it? 599 00:34:39 --> 00:34:42 Let me give you one other anecdote that I can't resist. 600 00:34:42 --> 00:34:44 This is the kind of thing you can use at cocktail parties to 601 00:34:44 --> 00:34:45 impress your friends from Harvard. 602 00:34:45 --> 00:34:47 OK. 603 00:34:47 --> 00:34:50 Imagine I have a little lamp, a little goose-- one of those 604 00:34:50 --> 00:34:52 little gooseneck lamps, I'd put it on the table here, I'd put 605 00:34:52 --> 00:34:55 the height about a f-- about a foot off the table. 606 00:34:55 --> 00:34:58 And if I was really good, I could hit, or time it so that 607 00:34:58 --> 00:35:00 when I hurt-- yeah, try again. 608 00:35:00 --> 00:35:03 When I turn this on switch on in the lamp, at exactly the 609 00:35:03 --> 00:35:05 same time, I'm going to hit a key on my computer and 610 00:35:05 --> 00:35:07 start it running. 611 00:35:07 --> 00:35:07 OK. 612 00:35:07 --> 00:35:12 In the length of time it takes for the light to get from that 613 00:35:12 --> 00:35:19 bulb to the table, this machine processes two operations. 614 00:35:19 --> 00:35:21 Oh come on, that's amazing. 615 00:35:21 --> 00:35:22 Two operations. 
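The two-operations figure can be checked with a few lines of arithmetic. The sketch below assumes one operation per clock tick, which is a simplification, and uses the exact definitions of the metre and the foot; the function name is made up for illustration.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458   # exact, by definition of the metre
METRES_PER_FOOT = 0.3048               # exact, by definition of the foot

def ops_while_light_travels(distance_ft, clock_hz):
    """Operations finished while light covers distance_ft.

    Assumes one operation per clock tick (an idealisation).
    """
    travel_time_s = distance_ft * METRES_PER_FOOT / SPEED_OF_LIGHT_M_PER_S
    return travel_time_s * clock_hz

# Time for light to cover one foot, in nanoseconds (a little over 1 ns).
travel_time_ns = METRES_PER_FOOT / SPEED_OF_LIGHT_M_PER_S * 1e9

# With the clock rounded to 2 GHz this comes to roughly two operations.
ops_rounded_clock = ops_while_light_travels(1, 2e9)

# With the 1.8 GHz figure quoted for the laptop it is slightly under two.
ops_quoted_clock = ops_while_light_travels(1, 1.8e9)
```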
616 00:35:22 --> 00:35:24 You know, you can do the simple numbers, right? 617 00:35:24 --> 00:35:25 [UNINTELLIGIBLE PHRASE] 618 00:35:25 --> 00:35:28 Light travels basically a foot in a nanosecond. 619 00:35:28 --> 00:35:30 Simple rule of thumb. 620 00:35:30 --> 00:35:32 Now, the nanosecond is what, 10 to the minus 9 seconds. 621 00:35:32 --> 00:35:38 This thing does 2 gig worth of operations. 622 00:35:38 --> 00:35:40 A gig is 10 to the 9, so it does two operations in the 623 00:35:40 --> 00:35:43 length of time it takes light to get from one foot off the 624 00:35:43 --> 00:35:44 table down to the table. 625 00:35:44 --> 00:35:45 That's amazing. 626 00:35:45 --> 00:35:48 So why in the world do you care about efficiency? 627 00:35:48 --> 00:35:52 Well the problem is that the problems grow faster than 628 00:35:52 --> 00:35:55 the computers speed up. 629 00:35:55 --> 00:35:56 I'll give you two examples. 630 00:35:56 --> 00:35:57 I happen to work in medical imaging. 631 00:35:57 --> 00:35:58 Actually, so does Professor Guttag. 632 00:35:58 --> 00:36:02 In my area of research, it's common for us to want to 633 00:36:02 --> 00:36:04 process about 100 images a second in order to get 634 00:36:04 --> 00:36:05 real time display. 635 00:36:05 --> 00:36:08 Each image has about a million elements in it. 636 00:36:08 --> 00:36:11 I've got to process about a half a gig of data a 637 00:36:11 --> 00:36:14 second in order to get anything out of it. 638 00:36:14 --> 00:36:15 Second example. 639 00:36:15 --> 00:36:18 Maybe one that'll hit a little more home to you. 640 00:36:18 --> 00:36:20 I'm sure you all use Google, I'm sure it's a verb in 641 00:36:20 --> 00:36:22 your vocabulary, right? 642 00:36:22 --> 00:36:24 Now, Google processes-- ten million? 643 00:36:24 --> 00:36:25 Ten billion pages? 644 00:36:25 --> 00:36:25 John? 645 00:36:25 --> 00:36:27 I think ten billion was the last number I heard. 646 00:36:27 --> 00:36:28 Does that sound about right?
647 00:36:28 --> 00:36:30 PROFESSOR JOHN GUTTAG: I think it might actually 648 00:36:30 --> 00:36:31 be more by now. 649 00:36:31 --> 00:36:32 PROFESSOR ERIC GRIMSON: Maybe more by now. 650 00:36:32 --> 00:36:34 But let's, for the sake of argument, ten billion pages. 651 00:36:34 --> 00:36:38 Imagine you want to search through Google to find 652 00:36:38 --> 00:36:39 a particular page. 653 00:36:39 --> 00:36:41 You want to do it in a second. 654 00:36:41 --> 00:36:43 And you're going to just do it the brute force way, assuming 655 00:36:43 --> 00:36:46 you could even reach all of those pages in that time. 656 00:36:46 --> 00:36:49 Well, if you're going to do that, you've got to be able 657 00:36:49 --> 00:36:53 to find what you're looking for in a page in two steps. 658 00:36:53 --> 00:36:57 Where a step is a comparison or an arithmetic operation. 659 00:36:57 --> 00:36:58 Ain't going to happen, right? 660 00:36:58 --> 00:36:59 You just can't do it. 661 00:36:59 --> 00:37:03 So again, part of the point here is that things grow-- or 662 00:37:03 --> 00:37:05 to rephrase it, interesting things grow at an 663 00:37:05 --> 00:37:06 incredible rate. 664 00:37:06 --> 00:37:09 And as a consequence, brute force methods are typically 665 00:37:09 --> 00:37:12 not going to work. 666 00:37:12 --> 00:37:12 OK. 667 00:37:12 --> 00:37:14 So that then leads to the question about what 668 00:37:14 --> 00:37:14 should we do about this? 669 00:37:14 --> 00:37:17 And probably the obvious thing you'll think about 670 00:37:17 --> 00:37:19 is, we'll come up with a clever algorithm. 671 00:37:19 --> 00:37:21 And I want to disabuse you of that notion. 672 00:37:21 --> 00:37:23 It's a great idea if you can do it. 673 00:37:23 --> 00:37:25 The guy who-- I think I'm going to say this right, John, right? 674 00:37:25 --> 00:37:26 Sanjay?
675 00:37:26 --> 00:37:29 Ghemawat?-- with a guy who was a graduate of our department, 676 00:37:29 --> 00:37:31 who is the heart and soul behind Google's really fast 677 00:37:31 --> 00:37:34 search, is an incredibly smart guy, and he did come up with a 678 00:37:34 --> 00:37:36 really clever algorithm about how you structure that search, 679 00:37:36 --> 00:37:37 in order to make it happen. 680 00:37:37 --> 00:37:39 And he probably made a lot of money along the way. 681 00:37:39 --> 00:37:41 So if you have a great idea, you know, talk to a good 682 00:37:41 --> 00:37:44 patent attorney and get it locked away. 683 00:37:44 --> 00:37:46 But in general, it's hard to come up with the 684 00:37:46 --> 00:37:48 really clever algorithm. 685 00:37:48 --> 00:37:51 What you're much better at doing is saying how do I take 686 00:37:51 --> 00:37:54 the problem I've got and map it into a class of algorithms 687 00:37:54 --> 00:37:58 about which I know and use the efficiencies of those to try 688 00:37:58 --> 00:38:00 and figure out how to make it work. 689 00:38:00 --> 00:38:03 So what we want to do, is, I guess another way of saying 690 00:38:03 --> 00:38:12 it is, efficiency is really about choice of algorithm. 691 00:38:12 --> 00:38:18 And we want to help you learn how to map a problem into a 692 00:38:18 --> 00:38:24 class of algorithms of some efficiency. 693 00:38:24 --> 00:38:27 That's our goal. 694 00:38:27 --> 00:38:28 OK. 695 00:38:28 --> 00:38:31 So to do this, we need a little more abstract way of talking 696 00:38:31 --> 00:38:34 about efficiency, and so, the question is, how do we 697 00:38:34 --> 00:38:35 think about efficiency? 698 00:38:35 --> 00:38:39 Typically there's two things we want to measure. 699 00:38:39 --> 00:38:43 Space and time. 700 00:38:43 --> 00:38:46 Sounds like an astrophysics course, right? 701 00:38:46 --> 00:38:51 Now, space usually we-- ach, try it again. 
702 00:38:51 --> 00:38:54 When we talk about space, what we usually refer to is, how 703 00:38:54 --> 00:38:57 much computer memory does it take to complete a computation 704 00:38:57 --> 00:38:59 of a particular size? 705 00:38:59 --> 00:39:08 So let me write that down, it's how much memory do I need 706 00:39:08 --> 00:39:09 to complete a computation. 707 00:39:09 --> 00:39:13 And by that, I mean, not how much memory do I need to store 708 00:39:13 --> 00:39:16 the size of the input, it's really how much internal 709 00:39:16 --> 00:39:19 memory do I use up as I go through the computation? 710 00:39:19 --> 00:39:20 I've got some internal variables I have to store, 711 00:39:20 --> 00:39:24 what kinds of things do I have to keep track of? 712 00:39:24 --> 00:39:26 You're going to see the arguments about space if you 713 00:39:26 --> 00:39:28 take some of the courses that follow on, and again, some 714 00:39:28 --> 00:39:29 nice courses about that. 715 00:39:29 --> 00:39:31 For this course, we're not going to worry 716 00:39:31 --> 00:39:33 about space that much. 717 00:39:33 --> 00:39:36 What we're really going to focus on is time. 718 00:39:36 --> 00:39:37 OK. 719 00:39:37 --> 00:39:39 So we're going to focus here. 720 00:39:39 --> 00:39:44 And the obvious question I could start with is, and 721 00:39:44 --> 00:39:49 suppose I ask you, how long does the algorithm implemented 722 00:39:49 --> 00:39:52 by this program take to run? 723 00:39:52 --> 00:39:56 How might I answer that question? 724 00:39:56 --> 00:40:01 Any thoughts? 725 00:40:01 --> 00:40:01 Yeah. 726 00:40:01 --> 00:40:06 STUDENT: [INAUDIBLE] 727 00:40:06 --> 00:40:06 PROFESSOR ERIC GRIMSON: Ah, you're jumping 728 00:40:06 --> 00:40:07 ahead of me, great. 729 00:40:07 --> 00:40:09 The answer was, find a mathematical expression 730 00:40:09 --> 00:40:11 depending on the number of inputs. 731 00:40:11 --> 00:40:12 It was exactly where I want to go. 732 00:40:12 --> 00:40:13 Thank you. 
733 00:40:13 --> 00:40:18 I was hoping for a simpler answer, which is, just run it. 734 00:40:18 --> 00:40:20 Which is, yeah I know, seems like a dumb 735 00:40:20 --> 00:40:21 thing to say, right? 736 00:40:21 --> 00:40:23 One of the things you could imagine is just try it on an 737 00:40:23 --> 00:40:25 input, see how long it takes. 738 00:40:25 --> 00:40:28 You're all cleverer than that, but I want to point out why 739 00:40:28 --> 00:40:29 that's not a great idea. 740 00:40:29 --> 00:40:30 First of all, that depends on which input I've picked. 741 00:40:30 --> 00:40:32 All right? 742 00:40:32 --> 00:40:34 Obviously the algorithm is likely to depend on the 743 00:40:34 --> 00:40:36 size of the input, so this is not a great idea. 744 00:40:36 --> 00:40:38 Second one is, it depends on which machine I'm running on. 745 00:40:38 --> 00:40:42 If I'm using a PDP6, it's going to take a whole lot longer 746 00:40:42 --> 00:40:44 than if I'm using an Air Mac. 747 00:40:44 --> 00:40:45 All right? 748 00:40:45 --> 00:40:47 Third one is, it may depend on which version of 749 00:40:47 --> 00:40:48 Python I'm running. 750 00:40:48 --> 00:40:51 Depends on how clever the implementer of Python was. 751 00:40:51 --> 00:40:53 Fourth one is, it may depend on which programming 752 00:40:53 --> 00:40:54 language I'm doing it in. 753 00:40:54 --> 00:40:57 So I could do it empirically, but I don't want to do that 754 00:40:57 --> 00:40:58 typically, it's just not a great way to get at it. 755 00:40:58 --> 00:41:02 And so in fact, what we want is exactly what 756 00:41:02 --> 00:41:02 the young lady said. 757 00:41:02 --> 00:41:06 I'm going to ask the following question, which is-- let me 758 00:41:06 --> 00:41:16 write it down-- what is the number of the basic steps 759 00:41:16 --> 00:41:31 needed as a function of the input size? 760 00:41:31 --> 00:41:35 That's the question we're going to try and address.
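One way to make "number of basic steps as a function of input size" concrete is to instrument a small function with a counter instead of a stopwatch. The sketch below is illustrative only: sum_with_step_count, and the decision to count one step per addition, are assumptions made here rather than anything shown in the lecture.

```python
def sum_with_step_count(values):
    """Add up values and report how many additions were performed."""
    total = 0
    steps = 0
    for v in values:
        total += v    # the one basic step counted in this sketch
        steps += 1
    return total, steps

# The step count tracks the length of the input, whatever machine,
# Python version, or time of day the code happens to run on.
for n in (10, 100, 1000):
    _, steps = sum_with_step_count(list(range(n)))
    print(n, steps)
```

Unlike a wall-clock measurement, the count printed here would come out the same on a PDP6 and on a laptop, which is the property the question is after.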
761 00:41:35 --> 00:41:37 If we can do this, this is good, because first of all, it 762 00:41:37 --> 00:41:40 removes any questions about what machine I'm running on, 763 00:41:40 --> 00:41:42 it's talking about fundamentally, how hard is this 764 00:41:42 --> 00:41:44 problem, and the second thing is, it is going to do 765 00:41:44 --> 00:41:47 it specifically in terms of the input. 766 00:41:47 --> 00:41:50 Which is one of the things that I was worried about. 767 00:41:50 --> 00:41:51 OK. 768 00:41:51 --> 00:41:54 So to do this, we're going to have to do a couple of things. 769 00:41:54 --> 00:41:57 All right, the first one is, what do we mean by input size? 770 00:41:57 --> 00:42:00 And unfortunately, this depends on the problem. 771 00:42:00 --> 00:42:03 It could be what's the size of the integer I pass in 772 00:42:03 --> 00:42:04 as an argument, if that's what I'm passing in. 773 00:42:04 --> 00:42:08 It could be, how long is the list, if I'm processing a list 774 00:42:08 --> 00:42:10 or a tuple. It could be, how many bits are there 775 00:42:10 --> 00:42:11 in something. 776 00:42:11 --> 00:42:13 So it-- that is something where we have to simply be clear 777 00:42:13 --> 00:42:17 about specifying what we're using as input size. 778 00:42:17 --> 00:42:19 And we want to characterize it mathematically as some number, 779 00:42:19 --> 00:42:22 or some variable rather, the length of the list, the size of 780 00:42:22 --> 00:42:25 the integer, would be the thing we'd want to do. 781 00:42:25 --> 00:42:29 Second thing we've got to worry about is, what's a basic step? 782 00:42:29 --> 00:42:33 All right, if I bury a whole lot of computation inside of 783 00:42:33 --> 00:42:36 something, I can say, wow, this program, you know, 784 00:42:36 --> 00:42:37 runs in one step. 785 00:42:37 --> 00:42:39 Unfortunately, that one step calls the Oracle at Delphi 786 00:42:39 --> 00:42:41 and gets an answer back. 787 00:42:41 --> 00:42:43 Maybe not quite what you want.
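The three measures of input size mentioned above can give very different numbers for closely related inputs, which is why the choice has to be stated up front. A small sketch using only Python built-ins; the sample values and variable names are arbitrary.

```python
n = 1000
items = [3, 1, 4, 1, 5, 9]

size_as_magnitude = n               # the value of the integer passed in
size_as_length = len(items)         # how long the list (or tuple) is
size_as_bits = n.bit_length()       # how many bits it takes to write n in binary

print(size_as_magnitude, size_as_length, size_as_bits)   # 1000 6 10
```

A loop that runs n times looks cheap when the size is taken to be the value 1000, and much more expensive when the size is taken to be the 10 bits needed to write that value down.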
788 00:42:43 --> 00:42:46 We're typically going to use as basic steps the 789 00:42:46 --> 00:42:49 built-in primitives that a machine comes with. 790 00:42:49 --> 00:42:51 Or another way of saying it is, we're going to use as the basic 791 00:42:51 --> 00:42:53 steps, those operations that run in constant time, so 792 00:42:53 --> 00:42:55 arithmetic operations. 793 00:42:55 --> 00:42:57 Comparisons. 794 00:42:57 --> 00:42:59 Memory access, and in fact one of the things we're going to do 795 00:42:59 --> 00:43:08 here, is we're going to assume a particular model, called a 796 00:43:08 --> 00:43:16 random access model, which basically says, we're going to 797 00:43:16 --> 00:43:20 assume that the length of time it takes me to get to any 798 00:43:20 --> 00:43:23 location in memory is constant. 799 00:43:23 --> 00:43:25 It's not true, by the way, of all programming languages. 800 00:43:25 --> 00:43:26 In fact, Professor Guttag already talked about that, in 801 00:43:26 --> 00:43:30 some languages lists take a time linear with the 802 00:43:30 --> 00:43:31 length to get to it. 803 00:43:31 --> 00:43:33 So we're going to assume we can get to any piece of data, any 804 00:43:33 --> 00:43:36 instruction in constant time, and the second assumption we're 805 00:43:36 --> 00:43:39 going to make is that the basic primitive steps take constant 806 00:43:39 --> 00:43:42 time, same amount of time to compute. 807 00:43:42 --> 00:43:44 Again, not completely true, but it's a good model, so 808 00:43:44 --> 00:43:47 arithmetic operations, comparisons, things of that 809 00:43:47 --> 00:43:50 sort, we're all going to assume are basically in that 810 00:43:50 --> 00:43:52 particular model. 811 00:43:52 --> 00:43:52 OK. 812 00:43:52 --> 00:43:54 Having done that, then, there are three things that 813 00:43:54 --> 00:43:56 we're going to look at.
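To see what the random access assumption buys, it helps to compare it with a structure where it does not hold. The sketch below builds a bare-bones linked list and counts the hops needed to reach a given position; Node and linked_get are hypothetical names introduced for illustration, not anything from the lecture.

```python
class Node:
    """One cell of a singly linked list (illustrative only)."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

def linked_get(head, index):
    """Return (value at index, number of hops taken to reach it)."""
    node = head
    hops = 0
    for _ in range(index):
        node = node.next   # the only way forward is one cell at a time
        hops += 1
    return node.value, hops

# Build the linked list 0 -> 1 -> 2 -> 3 -> 4 from the back.
head = None
for v in reversed(range(5)):
    head = Node(v, head)

print(linked_get(head, 3))   # (3, 3): reaching position 3 costs 3 hops
```

Under the random access model, reaching position 3 or position 3 million is charged as a single step, which is how indexing into a Python list is usually treated in this kind of analysis.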
814 00:43:56 --> 00:43:58 As I said, what we want to do is, we want to count the number 815 00:43:58 --> 00:44:01 of basic steps it takes to compute a computation as a 816 00:44:01 --> 00:44:02 function of input size. 817 00:44:02 --> 00:44:04 And the question is, what do we want to count? 818 00:44:04 --> 00:44:10 Now, one possibility is to do best case. 819 00:44:10 --> 00:44:13 Over all possible inputs to this function, what's 820 00:44:13 --> 00:44:14 the fastest it runs? 821 00:44:14 --> 00:44:19 The fewest, so the minimum, if you like. 822 00:44:19 --> 00:44:21 It's nice, but not particularly helpful. 823 00:44:21 --> 00:44:27 The other obvious one to do would be worst case. 824 00:44:27 --> 00:44:30 Again, over all possible inputs to this function, what's the 825 00:44:30 --> 00:44:34 most number of steps it takes to do the computation? 826 00:44:34 --> 00:44:38 And the third possibility, is to do the expected case. 827 00:44:38 --> 00:44:43 The average. 828 00:44:43 --> 00:44:46 I'm going to think of it that way. 829 00:44:46 --> 00:44:49 In general, people focus on worst case. 830 00:44:49 --> 00:44:50 For a couple of reasons. 831 00:44:50 --> 00:44:53 In some ways, this would be nicer, do expected cases, it's 832 00:44:53 --> 00:44:56 going to tell you on average how much you expect to take, 833 00:44:56 --> 00:44:58 but it tends to be hard to compute, because to compute 834 00:44:58 --> 00:45:02 that, you have to know a distribution on input. 835 00:45:02 --> 00:45:04 How likely are all the inputs, are they all equally likely, 836 00:45:04 --> 00:45:05 or are they going to depend on other things? 837 00:45:05 --> 00:45:07 And that may depend on the user, so you can't 838 00:45:07 --> 00:45:08 kind of get at that. 839 00:45:08 --> 00:45:12 We're, as a consequence, going to focus on worst case. 840 00:45:12 --> 00:45:13 This is handy for a couple of reasons. 841 00:45:13 --> 00:45:16 One, it means there are no surprises. 
842 00:45:16 --> 00:45:17 All right? 843 00:45:17 --> 00:45:20 If you run it, you have a sense of the upper bound, about how 844 00:45:20 --> 00:45:22 much time it's going to take to do this computation, so you're 845 00:45:22 --> 00:45:25 not going to get surprised by something showing up. 846 00:45:25 --> 00:45:27 The second one is, a lot of the time, the worst case 847 00:45:27 --> 00:45:29 is the one that happens. 848 00:45:29 --> 00:45:31 Professor Guttag used an example of looking in the 849 00:45:31 --> 00:45:33 dictionary for something. 850 00:45:33 --> 00:45:35 Now, imagine that dictionary actually has something that's a 851 00:45:35 --> 00:45:38 linear search to go through it, as opposed to the hashing 852 00:45:38 --> 00:45:40 he did, so it's a list, for example. 853 00:45:40 --> 00:45:42 If it's in there, you'll find it perhaps very quickly. 854 00:45:42 --> 00:45:44 If it's not there, you've got to go through everything 855 00:45:44 --> 00:45:45 to say it's not there. 856 00:45:45 --> 00:45:47 And so the worst case often is the one that shows up, 857 00:45:47 --> 00:45:50 especially in things like search. 858 00:45:50 --> 00:45:54 So, as a consequence, we're going to stick with the 859 00:45:54 --> 00:45:57 worst case analysis. 860 00:45:57 --> 00:45:59 Now, I've got two minutes left. 861 00:45:59 --> 00:46:01 I was going to start showing you some examples, but I think, 862 00:46:01 --> 00:46:04 rather than doing that, I'm going to stop here, I'm going 863 00:46:04 --> 00:46:06 to give you two minutes back of time, but I want to just point 864 00:46:06 --> 00:46:11 out to you that we are going to have fun next week, because I'm 865 00:46:11 --> 00:46:15 going to show you what in the world that has to do 866 00:46:15 --> 00:46:16 with efficiency. 867 00:46:16 --> 00:46:19 So with that, we'll see you next time. 868 00:46:19 --> 00:46:21
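The best, worst, and expected cases described above can be made concrete with the linear search from the dictionary example. This is a minimal sketch: linear_search and the word list are invented for illustration, and the "average" assumes every stored word is equally likely to be the one asked for, which is exactly the kind of distribution assumption that makes the expected case hard to pin down in practice.

```python
def linear_search(items, target):
    """Scan items left to right; return (found, comparisons made)."""
    comparisons = 0
    for item in items:
        comparisons += 1
        if item == target:
            return True, comparisons
    return False, comparisons

words = ["ant", "bee", "cat", "dog", "eel"]

best = linear_search(words, "ant")[1]       # target is the first element
worst = linear_search(words, "zebra")[1]    # target is absent: scan everything
average_if_present = sum(linear_search(words, w)[1] for w in words) / len(words)

print(best, worst, average_if_present)      # 1 5 3.0
```

The absent word forces a comparison against every element, so the worst case here is the length of the list.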