1 00:00:00,000 --> 00:00:12,220 [SQUEAKING] [RUSTLING] [CLICKING] 2 00:00:12,220 --> 00:00:12,720 JUSTIN: OK. 3 00:00:12,720 --> 00:00:15,360 So it's a pleasure to see all of you guys. 4 00:00:15,360 --> 00:00:16,450 I'm Justin. 5 00:00:16,450 --> 00:00:20,700 I'm your third instructor for 6.006. 6 00:00:20,700 --> 00:00:22,540 This is my first time with this course. 7 00:00:22,540 --> 00:00:24,415 Although, of course, this is material that we 8 00:00:24,415 --> 00:00:27,660 all know and love in the computer science department. 9 00:00:27,660 --> 00:00:29,880 I'll admit, I find the prospect of teaching 10 00:00:29,880 --> 00:00:32,910 sorting to 400 people all at once is mildly, 11 00:00:32,910 --> 00:00:33,960 low key terrifying. 12 00:00:33,960 --> 00:00:35,670 But we're going to give it a shot. 13 00:00:35,670 --> 00:00:38,520 And hopefully, that will subside as the lecture goes on today, 14 00:00:38,520 --> 00:00:40,080 all right? 15 00:00:40,080 --> 00:00:43,140 So we're going to pick up where we left off in our last lecture 16 00:00:43,140 --> 00:00:45,300 and continue on with a similar theme 17 00:00:45,300 --> 00:00:47,700 that we're going to see throughout our algorithms class 18 00:00:47,700 --> 00:00:49,050 here in 6.006. 19 00:00:49,050 --> 00:00:51,480 I think Jason and colleagues have 20 00:00:51,480 --> 00:00:54,180 done a really great job of organizing this class 21 00:00:54,180 --> 00:00:55,890 around some interesting themes. 22 00:00:55,890 --> 00:00:58,650 And so I thought I'd start with just a tiny bit of review 23 00:00:58,650 --> 00:01:00,780 of some key vocabulary words. 24 00:01:00,780 --> 00:01:04,030 Incidentally, typically, I teach the intro graphics class, 25 00:01:04,030 --> 00:01:04,980 the geometry course. 26 00:01:04,980 --> 00:01:07,020 And last year, I got feedback that 27 00:01:07,020 --> 00:01:08,850 said I have serial killer handwriting. 28 00:01:08,850 --> 00:01:11,620 I'm not 100% sure what that means. 
29 00:01:11,620 --> 00:01:13,890 But we're going to use the slides a tiny bit more 30 00:01:13,890 --> 00:01:15,990 than normal, just to make sure you guys can read. 31 00:01:15,990 --> 00:01:17,407 And when I'm writing on the board, 32 00:01:17,407 --> 00:01:19,920 at any point, if you can't tell what I wrote, 33 00:01:19,920 --> 00:01:21,720 it's definitely me and not you. 34 00:01:21,720 --> 00:01:23,590 So just let me know. 35 00:01:23,590 --> 00:01:28,485 But in any event, in 6.006, all the way back in our lecture 1-- 36 00:01:28,485 --> 00:01:29,860 I know that was a long time ago-- 37 00:01:29,860 --> 00:01:34,080 we introduced two big keywords that are closely related, 38 00:01:34,080 --> 00:01:35,460 but not precisely the same. 39 00:01:35,460 --> 00:01:37,450 Hopefully, I've gotten this right. 40 00:01:37,450 --> 00:01:39,610 But roughly, there's a theme here 41 00:01:39,610 --> 00:01:42,960 which is that there's an object called an interface, which is 42 00:01:42,960 --> 00:01:44,795 just a program specification. 43 00:01:44,795 --> 00:01:46,170 It's just telling us that there's 44 00:01:46,170 --> 00:01:48,940 a collection of operations that we want to implement. 45 00:01:48,940 --> 00:01:51,630 So for example, a set, as we're going to see today, 46 00:01:51,630 --> 00:01:53,550 is like a big pile of things. 47 00:01:53,550 --> 00:01:56,550 And behind the scenes, how I choose to implement 48 00:01:56,550 --> 00:02:01,100 it can affect the runtime and how efficient my set is. 49 00:02:01,100 --> 00:02:02,850 But the actual way that I interact with it 50 00:02:02,850 --> 00:02:06,060 is the same, whether I use an unsorted array, a sorted array, 51 00:02:06,060 --> 00:02:07,380 what have you. 
52 00:02:07,380 --> 00:02:10,350 On the other hand, what happens behind the scenes 53 00:02:10,350 --> 00:02:12,430 is something called a data structure, 54 00:02:12,430 --> 00:02:14,670 which is a way to actually, in some sense, 55 00:02:14,670 --> 00:02:15,900 implement an interface. 56 00:02:15,900 --> 00:02:17,940 So this is the object that on my computer 57 00:02:17,940 --> 00:02:19,830 is actually storing the information 58 00:02:19,830 --> 00:02:22,260 and implementing the set of operations 59 00:02:22,260 --> 00:02:24,970 that I've laid out in my interface. 60 00:02:24,970 --> 00:02:26,820 And so this sort of distinction, I think, 61 00:02:26,820 --> 00:02:29,370 is a critical theme in this course 62 00:02:29,370 --> 00:02:31,412 because, for instance, in the first couple weeks, 63 00:02:31,412 --> 00:02:33,828 we're going to talk about many different ways to implement 64 00:02:33,828 --> 00:02:34,350 a set. 65 00:02:34,350 --> 00:02:35,970 I'm going to see that there's a bunch of tradeoffs. 66 00:02:35,970 --> 00:02:38,130 Some of them are really fast for certain operations 67 00:02:38,130 --> 00:02:39,790 and slow for others. 68 00:02:39,790 --> 00:02:42,660 And so essentially, we have two different decisions 69 00:02:42,660 --> 00:02:44,370 to make when we choose an algorithm. 70 00:02:44,370 --> 00:02:46,320 One is making sure that the interface 71 00:02:46,320 --> 00:02:48,540 is correct for the problem that we're concerned with. 72 00:02:48,540 --> 00:02:51,390 And the other is choosing an appropriate data structure 73 00:02:51,390 --> 00:02:55,548 whose efficiency, and memory usage, and so on aligns 74 00:02:55,548 --> 00:02:57,840 with the priorities that we have for the application we 75 00:02:57,840 --> 00:02:58,990 have in mind. 76 00:02:58,990 --> 00:03:01,270 So hopefully, this high level theme makes sense. 
77 00:03:01,270 --> 00:03:04,380 And really, spiritually, I think this is the main message 78 00:03:04,380 --> 00:03:06,700 to get out of this course in the first couple of weeks, 79 00:03:06,700 --> 00:03:08,250 even though these O's, and thetas, 80 00:03:08,250 --> 00:03:12,660 and so on are easy to lose the forest through the trees. 81 00:03:12,660 --> 00:03:15,730 In any event, today, in our lecture, 82 00:03:15,730 --> 00:03:17,793 we're concerned with one particular interface, 83 00:03:17,793 --> 00:03:18,710 which is called a set. 84 00:03:18,710 --> 00:03:20,680 A set is exactly what it sounds like. 85 00:03:20,680 --> 00:03:22,840 It's a big pile of things. 86 00:03:22,840 --> 00:03:26,648 And so a set interface is like an object 87 00:03:26,648 --> 00:03:28,440 that just you can keep adding things to it. 88 00:03:28,440 --> 00:03:31,890 And then querying inside of my set, is this object here? 89 00:03:31,890 --> 00:03:32,730 Can I find it? 90 00:03:32,730 --> 00:03:35,110 And then maybe I associate with my objects 91 00:03:35,110 --> 00:03:36,610 in my set different information. 92 00:03:36,610 --> 00:03:38,867 So for example, maybe I have a set 93 00:03:38,867 --> 00:03:40,950 which represents all the students in our classroom 94 00:03:40,950 --> 00:03:42,300 today. 95 00:03:42,300 --> 00:03:45,540 Yeah, and all of you guys are associated with your student 96 00:03:45,540 --> 00:03:48,480 ID, which I believe at MIT is a number, 97 00:03:48,480 --> 00:03:50,430 which has less than sign, which is convenient. 98 00:03:50,430 --> 00:03:52,320 So we can sort all of you guys. 99 00:03:52,320 --> 00:03:54,180 And that might be the key that's associated 100 00:03:54,180 --> 00:03:56,095 to every object in the room. 101 00:03:56,095 --> 00:03:57,720 And so when I'm searching for students, 102 00:03:57,720 --> 00:03:59,970 maybe I enter in the student number. 
103 00:03:59,970 --> 00:04:02,280 And then I want to ask my set, does this number 104 00:04:02,280 --> 00:04:05,700 exist in the set of students that are in 6.006? 105 00:04:05,700 --> 00:04:07,890 And if it does, then I can pull that student back. 106 00:04:07,890 --> 00:04:09,600 And then associated with that object 107 00:04:09,600 --> 00:04:12,183 is a bunch of other information that I'm not using to search-- 108 00:04:12,183 --> 00:04:13,908 so for instance, your name, your-- 109 00:04:13,908 --> 00:04:16,200 I don't know-- your social security number, your credit 110 00:04:16,200 --> 00:04:17,825 card number, all the other stuff that I 111 00:04:17,825 --> 00:04:21,120 need to have a more interesting profession. 112 00:04:21,120 --> 00:04:26,940 So in any event, let's fill in the details of our set 113 00:04:26,940 --> 00:04:28,980 interface a little bit more. 114 00:04:28,980 --> 00:04:31,620 So our set is a container. 115 00:04:31,620 --> 00:04:34,710 It contains all of the students in this classroom, 116 00:04:34,710 --> 00:04:37,060 in some virtual sense at least. 117 00:04:37,060 --> 00:04:40,150 And so to build up our set, of course, 118 00:04:40,150 --> 00:04:43,260 we need an operation that takes some iterable object A 119 00:04:43,260 --> 00:04:44,550 and builds a set out of it. 120 00:04:44,550 --> 00:04:47,040 So in other words, I have all the students 121 00:04:47,040 --> 00:04:49,560 in this classroom represented maybe in some other fashion. 122 00:04:49,560 --> 00:04:52,050 And I have to insert them all into my set. 123 00:04:52,050 --> 00:04:54,510 I can also ask my set for how much stuff is in it. 124 00:04:54,510 --> 00:04:56,190 Personally, I would call that size. 125 00:04:56,190 --> 00:04:58,272 But length is cool, too. 126 00:04:58,272 --> 00:05:00,480 And then of course, there are a lot of different ways 127 00:05:00,480 --> 00:05:02,590 that we can interact with our set. 
128 00:05:02,590 --> 00:05:08,400 So for instance, we could say, is this student taking 6.006? 129 00:05:08,400 --> 00:05:10,830 So in set language, one way to understand that 130 00:05:10,830 --> 00:05:13,460 is to say that the key-- 131 00:05:13,460 --> 00:05:16,490 each person in this classroom is associated with a key. 132 00:05:16,490 --> 00:05:19,460 Does that key k exist in my set? 133 00:05:19,460 --> 00:05:22,290 In which case, I'll call this find function, 134 00:05:22,290 --> 00:05:26,420 which will give me back the item with key k or maybe null 135 00:05:26,420 --> 00:05:29,660 or something if it doesn't exist. 136 00:05:29,660 --> 00:05:33,770 Maybe I can delete an object from my set or insert it. 137 00:05:33,770 --> 00:05:36,120 Notice that these are dynamic operations, 138 00:05:36,120 --> 00:05:39,690 meaning that they actually edit what's inside of my set. 139 00:05:39,690 --> 00:05:42,860 And then finally, there are all kinds of different operations 140 00:05:42,860 --> 00:05:45,320 that I might want to do to interact with my set beyond 141 00:05:45,320 --> 00:05:46,840 is this thing inside of it. 142 00:05:46,840 --> 00:05:51,680 So for instance, so for the student ID example, 143 00:05:51,680 --> 00:05:54,440 probably finding the minimum ID number in a class 144 00:05:54,440 --> 00:05:56,000 isn't a terribly exciting exercise. 145 00:05:56,000 --> 00:05:57,917 But maybe I'm trying to find the student who's 146 00:05:57,917 --> 00:05:59,000 been at MIT the longest. 147 00:05:59,000 --> 00:06:01,245 And so that would be a reasonable heuristic. 148 00:06:01,245 --> 00:06:03,920 I actually have no idea whether MIT student IDs 149 00:06:03,920 --> 00:06:05,310 are assigned linearly or not. 150 00:06:05,310 --> 00:06:09,680 But in any event, I could find the smallest key, the largest 151 00:06:09,680 --> 00:06:11,480 key, and so on in my set. 
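The operations listed so far can be written down as a specification with no storage behind it. This is only a sketch: the class name SetInterface and the method names build, find, insert, delete, find_min and find_max are assumptions chosen to mirror the operations named in the lecture, not names taken from the course code.

```python
from abc import ABC, abstractmethod


class SetInterface(ABC):
    """Says WHAT a set can do; says nothing about HOW items are stored."""

    @abstractmethod
    def build(self, items):
        """Fill the set from an iterable of items, each carrying a .key."""

    @abstractmethod
    def __len__(self):
        """Return how many items are stored."""

    @abstractmethod
    def find(self, k):
        """Return the stored item with key k, or None if there is none."""

    @abstractmethod
    def insert(self, x):
        """Add item x (dynamic: changes the contents of the set)."""

    @abstractmethod
    def delete(self, k):
        """Remove and return the item with key k (dynamic)."""

    @abstractmethod
    def find_min(self):
        """Return the stored item with the smallest key."""

    @abstractmethod
    def find_max(self):
        """Return the stored item with the largest key."""
```

Any data structure that fills in these methods, such as an unsorted array or a sorted array, would count as an implementation of the same interface.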
152 00:06:11,480 --> 00:06:13,760 And these are all reasonable operations 153 00:06:13,760 --> 00:06:16,040 to query, where my object is just 154 00:06:16,040 --> 00:06:20,330 a thing that stores a lot of different entities inside it. 155 00:06:20,330 --> 00:06:22,490 Now, this description here-- 156 00:06:22,490 --> 00:06:25,130 notice that I've labeled this as a set interface. 157 00:06:25,130 --> 00:06:28,270 This is not a set data structure. 158 00:06:28,270 --> 00:06:31,270 And the way to remember that is that I haven't told you how 159 00:06:31,270 --> 00:06:32,920 I've actually implemented this. 160 00:06:32,920 --> 00:06:35,630 I haven't told you that I'm going to behind the scenes 161 00:06:35,630 --> 00:06:38,380 have an array of information, and look inside of it, 162 00:06:38,380 --> 00:06:41,110 and that's how I'm going to implement find min or find max 163 00:06:41,110 --> 00:06:42,360 with a for loop or whatever. 164 00:06:42,360 --> 00:06:44,500 All I'm telling you is that a set is a thing that 165 00:06:44,500 --> 00:06:46,000 implements these operations. 166 00:06:46,000 --> 00:06:48,640 And behind the scenes, my computer does what it does. 167 00:06:48,640 --> 00:06:50,200 Now, it might sound abstract. 168 00:06:50,200 --> 00:06:51,783 But it's more or less what you guys do 169 00:06:51,783 --> 00:06:53,240 when you write code in Python. 170 00:06:53,240 --> 00:06:56,110 I think in Python what we're calling 171 00:06:56,110 --> 00:06:57,880 a set is maybe a dictionary. 172 00:06:57,880 --> 00:06:59,350 I'm a Matlab Coder. 173 00:06:59,350 --> 00:07:00,040 I'm sorry. 174 00:07:00,040 --> 00:07:02,020 I'm a numerical analysis kind of guy. 175 00:07:02,020 --> 00:07:06,250 But essentially, one of the beautiful things 176 00:07:06,250 --> 00:07:08,800 about coding in these high level programming languages 177 00:07:08,800 --> 00:07:11,410 is that they take care of these ugly details. 
178 00:07:11,410 --> 00:07:12,910 And what you're left with is just 179 00:07:12,910 --> 00:07:16,420 the high level interfacing with this object 180 00:07:16,420 --> 00:07:18,790 that you need at the end of the day. 181 00:07:18,790 --> 00:07:21,550 So of course, in today's lecture, now that we set out 182 00:07:21,550 --> 00:07:23,740 our goal, which is to fill in-- 183 00:07:23,740 --> 00:07:27,258 if I wanted to write code for a set, how could I do it? 184 00:07:27,258 --> 00:07:29,800 Now, of course, our goal is to give different data structures 185 00:07:29,800 --> 00:07:31,717 that implement these, and then understand them 186 00:07:31,717 --> 00:07:34,750 in terms of their efficiency, data storage, correctness, 187 00:07:34,750 --> 00:07:36,830 all that good stuff. 188 00:07:36,830 --> 00:07:39,657 So before we get into all these ugly details, 189 00:07:39,657 --> 00:07:40,740 let me pause for a second. 190 00:07:40,740 --> 00:07:44,307 Are there any questions about this basic interface? 191 00:07:44,307 --> 00:07:46,140 You all should feel free to stop me any time 192 00:07:46,140 --> 00:07:48,390 because this is going to be hella boring if you're not 193 00:07:48,390 --> 00:07:51,050 getting the first slide or two. 194 00:07:51,050 --> 00:07:52,304 Yes? 195 00:07:52,304 --> 00:07:54,096 AUDIENCE: Can you explain how [INAUDIBLE].. 196 00:08:01,080 --> 00:08:02,630 JUSTIN: That's a good question. 197 00:08:02,630 --> 00:08:04,630 So the question was, what exactly 198 00:08:04,630 --> 00:08:06,270 is this insert operation doing? 199 00:08:06,270 --> 00:08:07,900 That's why working on the analogy 200 00:08:07,900 --> 00:08:10,900 of the students in this classroom is a reasonable one. 201 00:08:10,900 --> 00:08:13,600 So I'm going to build up an object, which is a student. 202 00:08:13,600 --> 00:08:16,420 So in this lecture notes, I think we've been consistent. 203 00:08:16,420 --> 00:08:17,620 I caught one or two typos. 
204 00:08:17,620 --> 00:08:20,050 We're going to think of x as the object that contains 205 00:08:20,050 --> 00:08:21,670 all of the information. 206 00:08:21,670 --> 00:08:24,555 And then associated with that is one piece, 207 00:08:24,555 --> 00:08:25,555 which is called the key. 208 00:08:25,555 --> 00:08:27,957 That's where we're going to use the letter k. 209 00:08:27,957 --> 00:08:29,290 And that's like your student ID. 210 00:08:29,290 --> 00:08:31,340 That's the thing I'm going to use to search. 211 00:08:31,340 --> 00:08:33,530 So what the insert operation does 212 00:08:33,530 --> 00:08:35,289 is take this whole student object 213 00:08:35,289 --> 00:08:37,789 x, which includes your ID, your name, your phone number, all 214 00:08:37,789 --> 00:08:40,358 that good stuff, and insert it into the set 215 00:08:40,358 --> 00:08:42,400 with the understanding that when I search my set, 216 00:08:42,400 --> 00:08:43,995 I'm going to be searching by key. 217 00:08:43,995 --> 00:08:45,370 So when I want to find a student, 218 00:08:45,370 --> 00:08:47,670 I have to put in my ID number. 219 00:08:47,670 --> 00:08:49,660 Does that make sense? 220 00:08:49,660 --> 00:08:50,187 Cool. 221 00:08:50,187 --> 00:08:51,020 Any other questions? 222 00:08:51,020 --> 00:08:53,330 That's great. 223 00:08:53,330 --> 00:08:54,420 Fabulous. 224 00:08:54,420 --> 00:08:54,920 OK. 225 00:08:54,920 --> 00:08:58,250 So now, let's talk about how to actually implement this thing.
226 00:08:58,250 --> 00:09:01,640 And thankfully, we're already equipped with at least a very 227 00:09:01,640 --> 00:09:04,905 simple way that we can implement a set based on what you've 228 00:09:04,905 --> 00:09:07,280 already seen in your previous programming classes or even 229 00:09:07,280 --> 00:09:10,820 just in the last two lectures, which is one way to understand 230 00:09:10,820 --> 00:09:14,150 a set or to implement it rather would be to just store 231 00:09:14,150 --> 00:09:21,080 a giant array of objects that are in my set. 232 00:09:21,080 --> 00:09:24,350 I suppose continuing with the theme of the last two lectures, 233 00:09:24,350 --> 00:09:26,300 this is not a space in memory, but rather 234 00:09:26,300 --> 00:09:30,172 a metaphorical array, a theoretical array. 235 00:09:30,172 --> 00:09:31,380 But it doesn't really matter. 236 00:09:31,380 --> 00:09:35,390 And so one way to store my set would 237 00:09:35,390 --> 00:09:39,920 be to just store a bunch of x's in no particular order. 238 00:09:39,920 --> 00:09:40,800 Does that make sense? 239 00:09:40,800 --> 00:09:42,290 So I have a big piece of memory. 240 00:09:42,290 --> 00:09:44,593 Every piece of memory is associated 241 00:09:44,593 --> 00:09:46,010 with a different object in my set. 242 00:09:46,010 --> 00:09:48,000 Obviously, this is quite easy to build. 243 00:09:48,000 --> 00:09:51,140 I just make a big array and dump everything in there. 244 00:09:51,140 --> 00:09:53,450 And the question is, is this particularly efficient 245 00:09:53,450 --> 00:09:57,400 or a useful way to implement a set? 246 00:09:57,400 --> 00:10:00,650 So for instance, let's say that I 247 00:10:00,650 --> 00:10:03,118 have a set of all the students in this classroom. 248 00:10:03,118 --> 00:10:04,910 There's some ridiculous number of you guys. 249 00:10:04,910 --> 00:10:07,340 So actually, asymptotic efficiency maybe 250 00:10:07,340 --> 00:10:08,960 actually matters a little bit. 
251 00:10:08,960 --> 00:10:13,640 And I want to query, does this student exist in my class? 252 00:10:13,640 --> 00:10:16,310 Is Erik Demaine taking 6.006? 253 00:10:16,310 --> 00:10:18,410 The answer is no, I think. 254 00:10:18,410 --> 00:10:19,160 Teaching, taking? 255 00:10:19,160 --> 00:10:20,700 I don't know. 256 00:10:20,700 --> 00:10:23,330 But in any event, how do I implement it 257 00:10:23,330 --> 00:10:26,340 if my set is unordered? 258 00:10:26,340 --> 00:10:28,970 We'll think about it for a second. 259 00:10:28,970 --> 00:10:31,232 Yeah? 260 00:10:31,232 --> 00:10:35,580 AUDIENCE: [INAUDIBLE] 261 00:10:35,580 --> 00:10:37,580 JUSTIN: It's actually an interesting suggestion, 262 00:10:37,580 --> 00:10:39,705 which is going to anticipate what's happening later 263 00:10:39,705 --> 00:10:42,240 in this lecture, which was to sort the set and then 264 00:10:42,240 --> 00:10:43,297 binary search. 265 00:10:43,297 --> 00:10:45,630 But let's say that actually I only have to do this once. 266 00:10:45,630 --> 00:10:47,370 For some reason, I built up a whole set of the people 267 00:10:47,370 --> 00:10:48,540 in this classroom. 268 00:10:48,540 --> 00:10:51,100 And I just want to know, is Erik Demaine in this class? 269 00:10:51,100 --> 00:10:53,760 So then that algorithm would take n log n time 270 00:10:53,760 --> 00:10:55,560 because I've got to sort everybody. 271 00:10:55,560 --> 00:10:57,570 And then I have to do binary search, which 272 00:10:57,570 --> 00:10:59,310 is maybe log n time. 273 00:10:59,310 --> 00:11:01,350 But I claim that, if the only thing I care about 274 00:11:01,350 --> 00:11:03,480 is building up my entire set and searching it once, 275 00:11:03,480 --> 00:11:04,980 there's actually a faster algorithm. 
276 00:11:04,980 --> 00:11:06,960 This is going to be needlessly confusing because we're 277 00:11:06,960 --> 00:11:09,585 going to see that this is really not the right way to implement 278 00:11:09,585 --> 00:11:11,550 it in about 38 seconds. 279 00:11:11,550 --> 00:11:12,305 Yes? 280 00:11:12,305 --> 00:11:13,180 AUDIENCE: [INAUDIBLE] 281 00:11:13,180 --> 00:11:13,722 JUSTIN: Yeah. 282 00:11:13,722 --> 00:11:15,940 Just iterate from the beginning to the end of this array and say, 283 00:11:15,940 --> 00:11:17,060 is this guy Erik? 284 00:11:17,060 --> 00:11:17,560 No. 285 00:11:17,560 --> 00:11:18,290 Is this guy Erik? 286 00:11:18,290 --> 00:11:18,790 No. 287 00:11:18,790 --> 00:11:19,600 Is this guy Erik? 288 00:11:19,600 --> 00:11:20,110 Yes. 289 00:11:20,110 --> 00:11:22,540 And then return it. 290 00:11:22,540 --> 00:11:26,260 So in the worst case, how long will that algorithm take? 291 00:11:26,260 --> 00:11:29,530 Well, in the worst case of really bad luck, 292 00:11:29,530 --> 00:11:33,250 your instructor is all the way at the end of the list. 293 00:11:33,250 --> 00:11:35,480 So in this case, what is that going to mean? 294 00:11:35,480 --> 00:11:38,230 That means that I have to walk along the entire array 295 00:11:38,230 --> 00:11:39,640 before I find him. 296 00:11:39,640 --> 00:11:42,020 So that algorithm takes order n time. 297 00:11:42,020 --> 00:11:43,480 And so your colleague's intuition 298 00:11:43,480 --> 00:11:45,370 that somehow this is quite inefficient 299 00:11:45,370 --> 00:11:46,590 is absolutely correct. 300 00:11:46,590 --> 00:11:48,340 If I know that I'm going to have to search 301 00:11:48,340 --> 00:11:51,260 my array many, many times for different people, then probably 302 00:11:51,260 --> 00:11:53,593 it makes sense to do a little bit of work ahead of time, 303 00:11:53,593 --> 00:11:54,940 like sorting the list. 304 00:11:54,940 --> 00:11:58,430 And then my query is much more efficient.
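The "is this guy Erik?" scan over an unordered array can be sketched as follows. The Student record, the sample roster and its ID numbers are made up for illustration; only the idea of walking the array from front to back comes from the lecture.

```python
from collections import namedtuple

# Hypothetical item type: the key is the student ID used for searching.
Student = namedtuple("Student", ["key", "name"])


def find(items, k):
    """Return the item with key k, or None.

    Checks slots one by one, so the worst case (key in the last slot,
    or absent) looks at all n items: order n time.
    """
    for x in items:
        if x.key == k:
            return x
    return None


# Unordered roster: items sit in no particular order.
roster = [Student(907, "Ada"), Student(412, "Grace"), Student(655, "Erik")]
erik = find(roster, 655)      # found only after scanning the whole array
missing = find(roster, 100)   # None, also after scanning the whole array
```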
305 00:11:58,430 --> 00:12:00,710 But this is all just to say that an unordered array is 306 00:12:00,710 --> 00:12:04,250 a perfectly reasonable way to implement this set interface. 307 00:12:04,250 --> 00:12:07,010 And then searching that array will take linear time 308 00:12:07,010 --> 00:12:09,680 every single time I search. 309 00:12:09,680 --> 00:12:11,420 And of course, if you go down your list 310 00:12:11,420 --> 00:12:13,253 of all of the different operations you might 311 00:12:13,253 --> 00:12:16,885 want to do on a set, you'll see that they all take linear time. 312 00:12:16,885 --> 00:12:18,510 So for instance, how do I build my set? 313 00:12:18,510 --> 00:12:22,400 Well, I have to reserve n slots in memory. 314 00:12:22,400 --> 00:12:25,250 And at least according to our model 315 00:12:25,250 --> 00:12:28,480 of computation in this class, that takes order n time. 316 00:12:28,480 --> 00:12:31,070 Then I'm going to copy everything into the set. 317 00:12:31,070 --> 00:12:33,165 Similarly, if I want to insert or delete, 318 00:12:33,165 --> 00:12:34,040 what do I have to do? 319 00:12:34,040 --> 00:12:37,003 Well, I have to reserve memory, stick something 320 00:12:37,003 --> 00:12:37,670 inside of there. 321 00:12:37,670 --> 00:12:40,940 In the worst case, we saw this amortized argument before, 322 00:12:40,940 --> 00:12:43,460 if your set is allowed to grow dynamically. 323 00:12:43,460 --> 00:12:46,350 And finally, if I wanted to find the minimum student 324 00:12:46,350 --> 00:12:49,960 ID in my classroom, the only algorithm 325 00:12:49,960 --> 00:12:53,320 I can have if my list of students isn't sorted 326 00:12:53,320 --> 00:12:53,950 is to what? 327 00:12:53,950 --> 00:12:56,313 Just iterate over every single student in the class. 328 00:12:56,313 --> 00:12:57,730 And if the guy that I'm looking at 329 00:12:57,730 --> 00:13:01,903 has a smaller ID than the smallest one that I've found so far, replace it.
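A minimal sketch of that find-min loop on an unsorted array, assuming a made-up Student record whose key field is the ID number:

```python
from collections import namedtuple

# Hypothetical item type: the key is the student ID.
Student = namedtuple("Student", ["key", "name"])


def find_min(items):
    """Return the item with the smallest key, or None if items is empty.

    With no ordering to exploit, every item must be inspected once,
    so this takes order n time.
    """
    if not items:
        return None
    smallest = items[0]
    for x in items[1:]:
        if x.key < smallest.key:   # smaller than the best seen so far
            smallest = x           # replace it
    return smallest


roster = [Student(907, "Ada"), Student(412, "Grace"), Student(655, "Erik")]
lowest = find_min(roster)
```

Finding the maximum is the same loop with the comparison flipped.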
330 00:13:01,903 --> 00:13:03,320 Does that make sense to everybody? 331 00:13:03,320 --> 00:13:06,070 So basically, everything you can do in a set you can implement-- 332 00:13:06,070 --> 00:13:07,778 and I think all of you guys are more than 333 00:13:07,778 --> 00:13:10,300 qualified to implement-- as an unordered array. 334 00:13:10,300 --> 00:13:11,640 It's just going to be slow. 335 00:13:11,640 --> 00:13:13,054 Yes? 336 00:13:13,054 --> 00:13:17,235 AUDIENCE: [INAUDIBLE] 337 00:13:17,235 --> 00:13:18,360 JUSTIN: Yeah, that's right. 338 00:13:18,360 --> 00:13:21,675 So actually, I don't know in this class. 339 00:13:21,675 --> 00:13:23,550 I guess, the interface and the way that we've 340 00:13:23,550 --> 00:13:24,758 described it here is dynamic. 341 00:13:24,758 --> 00:13:26,970 We can just keep adding stuff to it. 342 00:13:26,970 --> 00:13:29,040 In that case, remember this amortized argument 343 00:13:29,040 --> 00:13:31,070 from Erik's lecture says that on average 344 00:13:31,070 --> 00:13:34,080 that it will take order n time. 345 00:13:34,080 --> 00:13:35,840 AUDIENCE: [INAUDIBLE] 346 00:13:35,840 --> 00:13:37,177 JUSTIN: What was that? 347 00:13:37,177 --> 00:13:38,300 AUDIENCE: [INAUDIBLE] 348 00:13:38,300 --> 00:13:39,300 JUSTIN: Oh, that's true. 349 00:13:39,300 --> 00:13:40,890 That's an even better-- sorry. 350 00:13:40,890 --> 00:13:43,560 Even if it weren't dynamic. 351 00:13:43,560 --> 00:13:46,523 If I wanted to replace an existing key-- 352 00:13:46,523 --> 00:13:48,690 like, for some reason, two students had the same ID. 353 00:13:48,690 --> 00:13:50,010 This is a terrible analogy. 354 00:13:50,010 --> 00:13:51,310 I'm sorry. 355 00:13:51,310 --> 00:13:53,610 But in any event, if I wanted to replace 356 00:13:53,610 --> 00:13:56,153 an object with a new one, well, what would I have to do? 357 00:13:56,153 --> 00:13:57,570 I'd have to search for that object 358 00:13:57,570 --> 00:13:59,130 first, and then replace it. 
359 00:13:59,130 --> 00:14:01,830 And that search is going to take order n time from our argument 360 00:14:01,830 --> 00:14:02,370 before. 361 00:14:02,370 --> 00:14:04,300 Thank you. 362 00:14:04,300 --> 00:14:04,800 OK. 363 00:14:04,800 --> 00:14:07,450 So in some sense, we're done. 364 00:14:07,450 --> 00:14:09,150 We've now implemented the interface. 365 00:14:09,150 --> 00:14:10,788 Life is good. 366 00:14:10,788 --> 00:14:12,330 And of course, this is the difference 367 00:14:12,330 --> 00:14:16,290 between existence and actually caring about the details 368 00:14:16,290 --> 00:14:17,170 inside of this thing. 369 00:14:17,170 --> 00:14:19,578 We've shown that one can implement a set. 370 00:14:19,578 --> 00:14:21,120 But it's not a terribly efficient way 371 00:14:21,120 --> 00:14:24,210 to do it by just storing a big, hot mess, disorganized list 372 00:14:24,210 --> 00:14:26,850 of numbers without any order. 373 00:14:26,850 --> 00:14:29,242 So instead of that, conveniently, 374 00:14:29,242 --> 00:14:30,700 our colleague in the front row here 375 00:14:30,700 --> 00:14:33,700 has already suggested a different data structure, 376 00:14:33,700 --> 00:14:37,530 which is to store our set not as just a disorganized array 377 00:14:37,530 --> 00:14:41,670 in any arbitrary order, but rather to keep the items 378 00:14:41,670 --> 00:14:44,690 in our set organized by key. 379 00:14:44,690 --> 00:14:47,020 So in other words, if I have this array of all 380 00:14:47,020 --> 00:14:48,700 of the students in our classroom, 381 00:14:48,700 --> 00:14:50,180 the very first element in my array 382 00:14:50,180 --> 00:14:52,660 is going to be the student with the smallest ID number, 383 00:14:52,660 --> 00:14:54,677 the second is the second smallest number, all 384 00:14:54,677 --> 00:14:56,260 the way to the end of the array, which 385 00:14:56,260 --> 00:14:59,080 is the student with the biggest ID number. 
386 00:14:59,080 --> 00:15:02,170 Now, does that mean I want to do arithmetic 387 00:15:02,170 --> 00:15:03,130 on student ID numbers? 388 00:15:03,130 --> 00:15:04,060 Absolutely not. 389 00:15:04,060 --> 00:15:06,910 But it's just a way to impose order on that list 390 00:15:06,910 --> 00:15:09,810 so that I can search it very quickly later. 391 00:15:09,810 --> 00:15:10,310 OK. 392 00:15:10,310 --> 00:15:13,640 So if I want to fill in the set interface 393 00:15:13,640 --> 00:15:17,790 and I have somehow a sorted array of students-- 394 00:15:17,790 --> 00:15:20,960 so again, they're organized by student ID number-- 395 00:15:20,960 --> 00:15:24,440 then my runtime starts to get a little more interesting. 396 00:15:24,440 --> 00:15:24,940 Yeah. 397 00:15:24,940 --> 00:15:27,272 So now, insertion, deletion they'd still 398 00:15:27,272 --> 00:15:28,480 take the same amount of time. 399 00:15:28,480 --> 00:15:29,855 But let's say that I want to find 400 00:15:29,855 --> 00:15:34,290 the student with the minimum ID number, this find min function. 401 00:15:34,290 --> 00:15:36,900 Well, how could I do it in a sorted array? 402 00:15:36,900 --> 00:15:39,490 Keyword is sorted here. 403 00:15:39,490 --> 00:15:41,280 Where's the min element of an array? 404 00:15:41,280 --> 00:15:42,500 Yes? 405 00:15:42,500 --> 00:15:44,458 AUDIENCE: [INAUDIBLE] 406 00:15:44,458 --> 00:15:45,000 JUSTIN: Yeah. 407 00:15:45,000 --> 00:15:47,333 In fact, I can give a moderately faster algorithm, which 408 00:15:47,333 --> 00:15:49,170 is just look at the first one. 409 00:15:49,170 --> 00:15:52,740 If I want the minimum element of an array and the array 410 00:15:52,740 --> 00:15:55,760 is in sorted order, I know that's the first thing. 411 00:15:55,760 --> 00:15:58,260 So that's order 1 time to answer that kind of a question. 412 00:15:58,260 --> 00:16:00,840 And similarly, if I want the thing with the biggest ID 413 00:16:00,840 --> 00:16:03,240 number, I look all the way at the end. 
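Once the array is kept in sorted order by key, the minimum and maximum need no loop at all. The sketch below assumes a made-up Student record and uses Python's built-in sorted only to produce a sorted array for the demonstration.

```python
from collections import namedtuple

# Hypothetical item type: the key is the student ID.
Student = namedtuple("Student", ["key", "name"])


def find_min(sorted_items):
    """Smallest key sits in the first slot: order 1 time."""
    return sorted_items[0] if sorted_items else None


def find_max(sorted_items):
    """Largest key sits in the last slot: order 1 time."""
    return sorted_items[-1] if sorted_items else None


roster = [Student(907, "Ada"), Student(412, "Grace"), Student(655, "Erik")]
by_key = sorted(roster, key=lambda x: x.key)   # the up-front sorting work
first, last = find_min(by_key), find_max(by_key)
```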
414 00:16:03,240 --> 00:16:06,300 Now, in 6.006-- 415 00:16:06,300 --> 00:16:10,320 MIT student class numbers are super confusing to me. 416 00:16:10,320 --> 00:16:12,930 In 6.0001, 6.042, you guys already 417 00:16:12,930 --> 00:16:14,910 I think learned about binary search 418 00:16:14,910 --> 00:16:18,318 and even may have implemented it. 419 00:16:18,318 --> 00:16:19,110 So what do we know? 420 00:16:19,110 --> 00:16:21,330 If my array is sorted, how long does it 421 00:16:21,330 --> 00:16:24,950 take for me to search for any given element? 422 00:16:24,950 --> 00:16:25,600 Yes? 423 00:16:25,600 --> 00:16:26,300 AUDIENCE: Log n time. 424 00:16:26,300 --> 00:16:27,092 JUSTIN: Log n time. 425 00:16:27,092 --> 00:16:29,690 That's absolutely right because I can cut my array in half. 426 00:16:29,690 --> 00:16:31,970 If my key is bigger or smaller, then I 427 00:16:31,970 --> 00:16:33,960 look on the left or the right. 428 00:16:33,960 --> 00:16:38,360 And so this is a much more efficient means 429 00:16:38,360 --> 00:16:40,850 of searching a set. 430 00:16:40,850 --> 00:16:44,870 So in particular, 6.006 this year has 400 students. 431 00:16:44,870 --> 00:16:46,430 Maybe next year, it has 4,000. 432 00:16:46,430 --> 00:16:49,850 And eventually, it's going to have billions. 433 00:16:49,850 --> 00:16:51,360 Then what's going to happen? 434 00:16:51,360 --> 00:16:53,477 Well, if I use my unordered array 435 00:16:53,477 --> 00:16:55,310 and I have a billion students in this class, 436 00:16:55,310 --> 00:16:56,990 what's going to happen? 437 00:16:56,990 --> 00:16:59,960 Well, then it's going to take me roughly a billion computations 438 00:16:59,960 --> 00:17:02,570 to find any one student in this course, 439 00:17:02,570 --> 00:17:06,619 whereas log of a billion is a heck of a lot faster. 
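The halving idea described above is binary search. This is a sketch under the assumption that items are stored in a Python list already sorted by a key field; the Student record and the sample IDs are made up.

```python
from collections import namedtuple

# Hypothetical item type: the key is the student ID.
Student = namedtuple("Student", ["key", "name"])


def find(sorted_items, k):
    """Binary search for key k in an array sorted by key.

    Each comparison discards half of the remaining range, so the loop
    runs about log2(n) times in the worst case.
    """
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_items[mid].key == k:
            return sorted_items[mid]
        if sorted_items[mid].key < k:
            lo = mid + 1    # k can only be in the right half
        else:
            hi = mid - 1    # k can only be in the left half
    return None


by_key = [Student(412, "Grace"), Student(655, "Erik"), Student(907, "Ada")]
hit = find(by_key, 655)
miss = find(by_key, 500)
```

For a billion students, the range can be halved only about 30 times before it is empty, compared with roughly a billion checks for the linear scan.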
440 00:17:06,619 --> 00:17:10,250 On the other hand, there's something I've swept under the rug 441 00:17:10,250 --> 00:17:12,920 here, which is how do I actually get 442 00:17:12,920 --> 00:17:14,128 a sorted array to begin with. 443 00:17:14,128 --> 00:17:16,045 And what we're going to see in today's lecture 444 00:17:16,045 --> 00:17:18,770 is that that takes more time than building if I just 445 00:17:18,770 --> 00:17:19,823 have a disorganized list. 446 00:17:19,823 --> 00:17:21,990 Building a disorganized list is an easy thing to do. 447 00:17:21,990 --> 00:17:25,575 You probably all do it at home when you're cleaning house. 448 00:17:25,575 --> 00:17:27,200 But actually, sorting a list of numbers 449 00:17:27,200 --> 00:17:29,072 requires a little bit more work. 450 00:17:29,072 --> 00:17:31,280 And so this is a great example where there's at least 451 00:17:31,280 --> 00:17:33,710 a tiny amount of tradeoff. 452 00:17:33,710 --> 00:17:36,218 Now, building my sorted array to represent my set 453 00:17:36,218 --> 00:17:38,010 is going to take a little more computation. 454 00:17:38,010 --> 00:17:39,980 We're going to see it's n log n time. 455 00:17:39,980 --> 00:17:42,740 But then once I've done that step 0, 456 00:17:42,740 --> 00:17:44,828 now a lot of these other operations 457 00:17:44,828 --> 00:17:46,370 that I typically care about in a set, 458 00:17:46,370 --> 00:17:48,020 like searching it for a given key, 459 00:17:48,020 --> 00:17:53,740 are going to go a lot faster using binary search. 460 00:17:53,740 --> 00:17:56,690 So this is our basic motivator here. 461 00:17:56,690 --> 00:17:59,530 And so now, we've seen the set interface and two 462 00:17:59,530 --> 00:18:01,000 potential data structures. 463 00:18:01,000 --> 00:18:02,620 And our goal for the day is going 464 00:18:02,620 --> 00:18:06,040 to be to fill in the details of that second one.
465 00:18:06,040 --> 00:18:08,440 And since you all have already seen binary search, 466 00:18:08,440 --> 00:18:10,420 you've probably also already seen sorting. 467 00:18:10,420 --> 00:18:12,520 But in any event, today, we're going 468 00:18:12,520 --> 00:18:15,760 to focus mostly on the lower left square 469 00:18:15,760 --> 00:18:19,990 here, on just how can I take a disorganized list of objects 470 00:18:19,990 --> 00:18:23,650 and put it into sorted order so that I can search for it later. 471 00:18:23,650 --> 00:18:27,820 So in other words, our big problem for lecture today 472 00:18:27,820 --> 00:18:30,865 is the second thing here, this sorting. 473 00:18:30,865 --> 00:18:32,740 Incidentally, in the next couple of lectures, 474 00:18:32,740 --> 00:18:34,810 we're going to see other data sets-- or data 475 00:18:34,810 --> 00:18:35,950 structures, rather. 476 00:18:35,950 --> 00:18:37,180 Sorry, data sets. 477 00:18:37,180 --> 00:18:39,610 I used to teach a machine learning class. 478 00:18:39,610 --> 00:18:43,998 And we'll see that they have different efficiency operations 479 00:18:43,998 --> 00:18:45,290 that we can fill in this table. 480 00:18:45,290 --> 00:18:46,207 So we're not done yet. 481 00:18:46,207 --> 00:18:48,670 But this is one step forward. 482 00:18:48,670 --> 00:18:49,480 OK. 483 00:18:49,480 --> 00:18:52,330 So hopefully, I have ad nauseam justified 484 00:18:52,330 --> 00:18:54,327 why one might want to sort things. 485 00:18:54,327 --> 00:18:56,410 And indeed, there are a couple of vocabulary words 486 00:18:56,410 --> 00:18:57,560 that are worth noting. 487 00:18:57,560 --> 00:19:00,760 So one, so remember that your sorting algorithm 488 00:19:00,760 --> 00:19:03,940 is pretty straightforward in terms of how you specify it. 489 00:19:03,940 --> 00:19:15,550 So in sorting, your input is an array of n numbers.
490 00:19:15,550 --> 00:19:17,170 I suppose actually really that we 491 00:19:17,170 --> 00:19:18,462 should think of them like keys. 492 00:19:18,462 --> 00:19:21,100 It's not going to matter a whole lot. 493 00:19:21,100 --> 00:19:21,790 And our output-- 494 00:19:24,662 --> 00:19:26,120 I'm always very concerned that if I 495 00:19:26,120 --> 00:19:30,650 write on the board on the back, I have to cover it up-- 496 00:19:30,650 --> 00:19:41,520 is going to be a sorted array. 497 00:19:41,520 --> 00:19:44,130 And we'll call this guy B. We'll call 498 00:19:44,130 --> 00:19:49,370 this one A. This classroom is not optimized for short people. 499 00:19:51,890 --> 00:19:54,140 So there's a lot of variations on the basic sorting 500 00:19:54,140 --> 00:19:56,613 problem and the different algorithms that are out there. 501 00:19:56,613 --> 00:19:59,030 Two vocabulary words I'm going to highlight really quick-- 502 00:19:59,030 --> 00:20:01,100 one is if your sort is destructive, 503 00:20:01,100 --> 00:20:04,820 what that means is that rather than reserving some new memory 504 00:20:04,820 --> 00:20:08,930 for my sorted array B and then putting a sorted version of A 505 00:20:08,930 --> 00:20:13,250 into B, a destructive algorithm is one that just overwrites A 506 00:20:13,250 --> 00:20:18,300 with a sorted version of A. Certainly the C++ interface 507 00:20:18,300 --> 00:20:19,080 does this. 508 00:20:19,080 --> 00:20:20,970 I assume the Python one does, too. 509 00:20:20,970 --> 00:20:25,350 I always forget this detail. 510 00:20:25,350 --> 00:20:27,990 In addition to destructive sorts, 511 00:20:27,990 --> 00:20:31,880 some sorts are in place, meaning that not only are 512 00:20:31,880 --> 00:20:34,490 they destructive, but they also don't use extra memory 513 00:20:34,490 --> 00:20:35,825 in the process of sorting.
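On the Python detail mentioned above: Python offers both flavors. The built-in `sorted()` is non-destructive and returns a new list, while the `list.sort()` method is destructive and overwrites the list it is called on. A small sketch, where the sample numbers are just an illustration:

```python
A = [8, 2, 4, 9, 3]

B = sorted(A)       # non-destructive: returns a new sorted list, A is untouched
print(A)            # [8, 2, 4, 9, 3]
print(B)            # [2, 3, 4, 8, 9]

result = A.sort()   # destructive: overwrites A with its sorted version
print(A)            # [2, 3, 4, 8, 9]
print(result)       # None, because list.sort() returns nothing
```

Being destructive does not by itself make a sort in place in the constant-extra-space sense used in this lecture, so it is safer not to assume that about `list.sort()`.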
514 00:20:35,825 --> 00:20:37,700 Really, you could imagine a sorting algorithm 515 00:20:37,700 --> 00:20:40,850 that has to reserve a bunch of scratch space to do its work, 516 00:20:40,850 --> 00:20:44,060 and then put it back into A. 517 00:20:44,060 --> 00:20:47,240 For instance, the world's dumbest destructive sort 518 00:20:47,240 --> 00:20:49,070 might be to call your non-destructive sort 519 00:20:49,070 --> 00:20:52,820 and then copy it back into A. But that would 520 00:20:52,820 --> 00:20:55,220 require order n space to do. 521 00:20:55,220 --> 00:20:56,960 So if my algorithm additionally has 522 00:20:56,960 --> 00:20:59,510 the property that it doesn't reserve any extra space, 523 00:20:59,510 --> 00:21:03,740 at least up to a constant, then we call that in place. 524 00:21:03,740 --> 00:21:04,240 OK. 525 00:21:04,240 --> 00:21:05,800 So those are our basic vocabulary words. 526 00:21:05,800 --> 00:21:07,600 And they're ways to understand the differences 527 00:21:07,600 --> 00:21:09,142 between different sorting algorithms. 528 00:21:09,142 --> 00:21:09,662 Yes? 529 00:21:09,662 --> 00:21:12,760 AUDIENCE: [INAUDIBLE] 530 00:21:12,760 --> 00:21:14,890 JUSTIN: Why do they end up using extra O(1) space? 531 00:21:14,890 --> 00:21:15,550 Oh yeah, sure. 532 00:21:15,550 --> 00:21:18,040 Any time I just make a temporary variable 533 00:21:18,040 --> 00:21:20,110 like a loop counter, that's going 534 00:21:20,110 --> 00:21:21,547 to count toward that order 1. 535 00:21:21,547 --> 00:21:24,130 But the important thing is that the number of variables I need 536 00:21:24,130 --> 00:21:28,030 doesn't scale in the length of the list. 537 00:21:28,030 --> 00:21:28,630 OK. 538 00:21:28,630 --> 00:21:30,400 So I present to you the beginning and end 539 00:21:30,400 --> 00:21:33,610 of our sorting lecture, which is the world's simplest sorting 540 00:21:33,610 --> 00:21:34,750 algorithm. 541 00:21:34,750 --> 00:21:36,648 I call it permutation sort.
542 00:21:36,648 --> 00:21:38,440 I think it's very easy to prove correctness 543 00:21:38,440 --> 00:21:40,630 for this particular technique. 544 00:21:40,630 --> 00:21:43,250 So in permutation sort, what can I do? 545 00:21:43,250 --> 00:21:47,920 Well, I know that if I have an input that's a list of numbers, 546 00:21:47,920 --> 00:21:51,250 there exists a permutation of that list of numbers that 547 00:21:51,250 --> 00:21:54,720 is sorted by definition because a sort is 548 00:21:54,720 --> 00:21:57,300 a permutation of your original list. 549 00:21:57,300 --> 00:21:59,600 So what's a very simple sorting algorithm? 550 00:21:59,600 --> 00:22:04,840 Well, list every possible permutation, 551 00:22:04,840 --> 00:22:08,370 and then just double check which one's in the right order. 552 00:22:08,370 --> 00:22:11,948 So there's two key pieces to this particular technique, 553 00:22:11,948 --> 00:22:12,990 if we want to analyze it. 554 00:22:12,990 --> 00:22:15,240 I don't see a reason to belabor it too much. 555 00:22:15,240 --> 00:22:22,560 But one is that we have to enumerate the permutations. 556 00:22:25,090 --> 00:22:27,010 Now, if I have a list of n numbers, 557 00:22:27,010 --> 00:22:30,160 how many different permutations of n numbers are there? 558 00:22:30,160 --> 00:22:30,880 Yes? 559 00:22:30,880 --> 00:22:31,810 AUDIENCE: n factorial. 560 00:22:31,810 --> 00:22:33,970 JUSTIN: n factorial. 561 00:22:33,970 --> 00:22:38,570 So just by virtue of calling this permutations function, 562 00:22:38,570 --> 00:22:42,200 I know that I incur at least n factorial time. 563 00:22:42,200 --> 00:22:43,140 It might be worse. 564 00:22:43,140 --> 00:22:45,530 It might be that like actually listing permutations takes 565 00:22:45,530 --> 00:22:47,738 a lot of time for some reason, like every permutation 566 00:22:47,738 --> 00:22:49,340 itself takes order n time.
567 00:22:49,340 --> 00:22:52,310 But at the very least, each one of these things 568 00:22:52,310 --> 00:22:54,500 looks like n factorial. 569 00:22:54,500 --> 00:22:57,210 I warned you my handwriting is terrible. 570 00:22:57,210 --> 00:23:00,920 So that's what this omega thing is doing, if I recall properly. 571 00:23:00,920 --> 00:23:03,170 And then secondarily, well, we've 572 00:23:03,170 --> 00:23:08,600 got to check if that particular permutation is sorted. 573 00:23:16,730 --> 00:23:19,370 How are we going to do that? 574 00:23:19,370 --> 00:23:21,950 There's a very easy way to check if a list is sorted. 575 00:23:21,950 --> 00:23:29,210 I'm going to do maybe for i equals 1 to n minus 1. 576 00:23:29,210 --> 00:23:30,350 Notice I'm not a Python coder. 577 00:23:30,350 --> 00:23:32,230 It's going to look different. 578 00:23:32,230 --> 00:23:44,000 Then check, is Bi less than or equal to Bi plus 1? 579 00:23:44,000 --> 00:23:48,050 And so if this relationship is true for every single i-- 580 00:23:48,050 --> 00:23:49,660 that's supposed to be a question mark. 581 00:23:49,660 --> 00:23:52,160 This was less than or equal to with a question mark over it. 582 00:23:52,160 --> 00:23:53,755 There's my special notation. 583 00:23:53,755 --> 00:23:55,880 So if I get all the way to the end of this for loop 584 00:23:55,880 --> 00:23:58,550 and this is true everywhere, then my list is sorted 585 00:23:58,550 --> 00:24:01,770 and life is good. 586 00:24:01,770 --> 00:24:03,390 So how long does this algorithm take? 587 00:24:03,390 --> 00:24:04,950 Well, it's staring you right in the face 588 00:24:04,950 --> 00:24:06,450 because you have an algorithm, which 589 00:24:06,450 --> 00:24:09,030 is looping from 1 to n minus 1. 590 00:24:09,030 --> 00:24:13,920 So this step incurs order n time-- theta of n time-- 591 00:24:13,920 --> 00:24:16,870 because it's got to go all the way to the end of the list.
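The two pieces just described, enumerating permutations and checking each one, can be sketched in Python as below. This is an assumption-laden illustration rather than the code from the slides: it uses the standard library's `itertools.permutations` as the "magic" permutations function, and the names `is_sorted` and `permutation_sort` are chosen here for the example.

```python
from itertools import permutations

def is_sorted(B):
    """Check B[i] <= B[i+1] for every adjacent pair; up to n - 1 comparisons."""
    return all(B[i] <= B[i + 1] for i in range(len(B) - 1))

def permutation_sort(A):
    """Try every ordering of A and return the first one that is sorted."""
    for B in permutations(A):      # n! candidate orderings
        if is_sorted(B):           # linear-time check per candidate
            return list(B)

print(permutation_sort([8, 2, 4, 9, 3]))  # [2, 3, 4, 8, 9]
```

This version is non-destructive: it returns a new list and leaves `A` alone. It is only usable for very short inputs, which is the point of the example.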
592 00:24:16,870 --> 00:24:20,040 So when I put these things together, permutation sort-- 593 00:24:20,040 --> 00:24:23,430 well, remember that this check if sorted happens 594 00:24:23,430 --> 00:24:25,830 for every single permutation. 595 00:24:25,830 --> 00:24:27,510 So at the end of the day, our algorithm 596 00:24:27,510 --> 00:24:34,535 takes at least n factorial times n time. 597 00:24:34,535 --> 00:24:35,910 It's a great example of something 598 00:24:35,910 --> 00:24:39,160 that's even worse than n factorial, which 599 00:24:39,160 --> 00:24:43,490 somehow in my head is like the worst possible algorithm. 600 00:24:43,490 --> 00:24:46,470 So do you think that Python implements permutation sort? 601 00:24:46,470 --> 00:24:48,390 I certainly hope not. 602 00:24:48,390 --> 00:24:49,640 Yes? 603 00:24:49,640 --> 00:24:51,477 AUDIENCE: [INAUDIBLE] 604 00:24:51,477 --> 00:24:52,060 JUSTIN: Right. 605 00:24:52,060 --> 00:24:54,370 So the question was, why is it omega and not big O? 606 00:24:54,370 --> 00:24:56,750 Which is a fabulous question in this course. 607 00:24:56,750 --> 00:24:58,300 So here's the basic issue. 608 00:24:58,300 --> 00:25:00,040 I haven't given you an algorithm for how 609 00:25:00,040 --> 00:25:02,123 to compute the set of permutations 610 00:25:02,123 --> 00:25:03,040 for a list of numbers. 611 00:25:03,040 --> 00:25:06,310 I just called some magic function that I made up. 612 00:25:06,310 --> 00:25:09,490 But I know that that algorithm takes at least n factorial time 613 00:25:09,490 --> 00:25:10,210 in some sense. 614 00:25:10,210 --> 00:25:12,190 Or if nothing else, the list of permutations 615 00:25:12,190 --> 00:25:14,200 is n factorial big because that's 616 00:25:14,200 --> 00:25:16,290 all the stuff it has to compute. 617 00:25:16,290 --> 00:25:18,290 So I haven't told you how to solve this problem. 618 00:25:18,290 --> 00:25:20,750 But I'm convinced that it's at least this amount of time.
619 00:25:20,750 --> 00:25:23,350 So remember that omega means lower bound. 620 00:25:23,350 --> 00:25:26,380 So when I put it all together, in some sense-- 621 00:25:26,380 --> 00:25:28,075 OK, this isn't satisfying in the sense 622 00:25:28,075 --> 00:25:30,700 that I didn't give you precisely the runtime of this algorithm. 623 00:25:30,700 --> 00:25:33,650 But hopefully, I've convinced you that it's super useless. 624 00:25:33,650 --> 00:25:35,300 Yeah, OK. 625 00:25:35,300 --> 00:25:38,660 Any other questions about that? 626 00:25:38,660 --> 00:25:39,380 But great. 627 00:25:39,380 --> 00:25:42,507 So if we go back to our table for the set interface, 628 00:25:42,507 --> 00:25:44,090 well, in some sense, if we implemented 629 00:25:44,090 --> 00:25:47,720 it using this goofy algorithm, then the lower left entry 630 00:25:47,720 --> 00:25:49,610 in our table would be n factorial times 631 00:25:49,610 --> 00:25:51,410 n, which wouldn't be so hot. 632 00:25:51,410 --> 00:25:54,860 But notice that actually all the rest of our operations 633 00:25:54,860 --> 00:25:55,920 are now quite efficient. 634 00:25:55,920 --> 00:25:57,020 I can use binary search. 635 00:25:57,020 --> 00:25:59,900 I just obtained the algorithm that-- 636 00:25:59,900 --> 00:26:03,860 rather, I obtained the sorted array in a funny fashion. 637 00:26:03,860 --> 00:26:04,360 OK. 638 00:26:04,360 --> 00:26:06,880 So let's fill in some more interesting algorithms. 639 00:26:06,880 --> 00:26:08,620 As usual, I'm talking too much. 640 00:26:08,620 --> 00:26:09,940 And I'm nervous about the time. 641 00:26:09,940 --> 00:26:13,120 But we can skip one of them if we need to. 642 00:26:13,120 --> 00:26:17,490 So how many of us have seen selection sort before? 643 00:26:17,490 --> 00:26:18,370 I see your hand. 644 00:26:18,370 --> 00:26:20,530 But we're going to defer for a little bit. 645 00:26:20,530 --> 00:26:21,950 I'm sorry? 
646 00:26:21,950 --> 00:26:23,340 AUDIENCE: [INAUDIBLE] 647 00:26:23,340 --> 00:26:24,340 JUSTIN: That's fabulous. 648 00:26:24,340 --> 00:26:26,048 Why don't we defer to the end of lecture? 649 00:26:26,048 --> 00:26:27,320 And we'll do it then. 650 00:26:27,320 --> 00:26:27,820 OK. 651 00:26:27,820 --> 00:26:31,120 So the first algorithm that we'll talk about for sorting, 652 00:26:31,120 --> 00:26:33,250 which is somewhat sensible, is something 653 00:26:33,250 --> 00:26:34,840 called selection sort. 654 00:26:34,840 --> 00:26:37,310 Selection sort is exactly what it sounds like. 655 00:26:37,310 --> 00:26:40,840 So let's say that we have a list of-- whoops, my laptop 656 00:26:40,840 --> 00:26:42,250 and the screen are not agreeing. 657 00:26:42,250 --> 00:26:42,650 OK. 658 00:26:42,650 --> 00:26:44,150 Let's say I have a list of numbers-- 659 00:26:44,150 --> 00:26:46,510 8, 2, 4, 9, 3. 660 00:26:46,510 --> 00:26:48,333 There's a message that Jason I think 661 00:26:48,333 --> 00:26:49,750 is sending me in the course notes. 662 00:26:49,750 --> 00:26:52,430 But I haven't figured it out. 663 00:26:52,430 --> 00:26:56,658 But in any event, I want to sort this list of numbers. 664 00:26:56,658 --> 00:26:58,450 Here's a simple algorithm for how to do it, 665 00:26:58,450 --> 00:27:02,020 which is I can find the biggest number in this whole list 666 00:27:02,020 --> 00:27:04,110 and stick it at the end. 667 00:27:04,110 --> 00:27:05,860 So in this case, what's the biggest number 668 00:27:05,860 --> 00:27:07,840 in this list everybody? 669 00:27:07,840 --> 00:27:08,480 9. 670 00:27:08,480 --> 00:27:08,980 Good. 671 00:27:08,980 --> 00:27:10,960 See, this is why you go to MIT. 672 00:27:10,960 --> 00:27:13,120 So I'm going to take that 9. 673 00:27:13,120 --> 00:27:13,720 I find it. 674 00:27:13,720 --> 00:27:17,760 And then swap it out with the 3, which is at the end. 675 00:27:17,760 --> 00:27:19,762 And now, what's my inductive hypothesis? 
676 00:27:19,762 --> 00:27:21,470 Well, in some sense, it's that everything 677 00:27:21,470 --> 00:27:23,845 to the right of this little red line that I've drawn here 678 00:27:23,845 --> 00:27:27,270 is in sorted order, in this case because there's only one thing. 679 00:27:27,270 --> 00:27:28,520 So now, what am I going to do? 680 00:27:28,520 --> 00:27:30,437 I'm going to look to the left of the red line, 681 00:27:30,437 --> 00:27:32,200 find the next biggest thing. 682 00:27:32,200 --> 00:27:35,050 What's that? 683 00:27:35,050 --> 00:27:35,800 Come on. 684 00:27:35,800 --> 00:27:36,917 AUDIENCE: 8. 685 00:27:36,917 --> 00:27:37,750 JUSTIN: There we go. 686 00:27:37,750 --> 00:27:38,690 Yeah, wake up. 687 00:27:38,690 --> 00:27:39,190 OK. 688 00:27:39,190 --> 00:27:41,300 So right, the next biggest one is the 8. 689 00:27:41,300 --> 00:27:44,560 So we're going to swap it with the 3, put it at the end, 690 00:27:44,560 --> 00:27:45,160 and so on. 691 00:27:45,160 --> 00:27:47,268 I think you guys could all finish this off. 692 00:27:47,268 --> 00:27:49,810 I suppose there should be one last line here where everything 693 00:27:49,810 --> 00:27:51,190 is green and we're happy. 694 00:27:51,190 --> 00:27:54,610 But in some sense, we're pretty sure that an array of one item 695 00:27:54,610 --> 00:27:56,680 is in sorted order. 696 00:27:56,680 --> 00:27:59,490 And so essentially, from a high level, what does selection sort 697 00:27:59,490 --> 00:28:00,370 do? 698 00:28:00,370 --> 00:28:03,520 Well, it just kept choosing the element which was the biggest 699 00:28:03,520 --> 00:28:07,060 and swapping it into the back and then iterating. 700 00:28:07,060 --> 00:28:10,270 Now, in 6.006, we're going to write selection sort in a way 701 00:28:10,270 --> 00:28:12,640 that you might not be familiar with. 702 00:28:12,640 --> 00:28:15,280 In some sense, this is not so hard to implement with two 703 00:28:15,280 --> 00:28:15,790 for loops. 
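For reference, the two-for-loop version mentioned here might look like the following. It is a sketch under the assumption that we sort destructively and always move the largest remaining item to the back, as in the 8, 2, 4, 9, 3 walkthrough; the name `selection_sort_loops` is made up for this example.

```python
def selection_sort_loops(A):
    """Destructively sort A by repeatedly swapping the largest remaining item to the back."""
    for i in range(len(A) - 1, 0, -1):   # invariant: A[i+1:] is already sorted
        j = i
        for k in range(i):               # find the index of the max of A[0..i]
            if A[k] > A[j]:
                j = k
        A[i], A[j] = A[j], A[i]          # swap the max into position i
    return A

print(selection_sort_loops([8, 2, 4, 9, 3]))  # [2, 3, 4, 8, 9]
```

On the example list, the first pass swaps the 9 with the 3 and the second pass swaps the 8 with the 3, matching the steps on the slide.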
704 00:28:15,790 --> 00:28:17,582 I think you guys could all do this at home. 705 00:28:17,582 --> 00:28:18,850 In fact, you may have already. 706 00:28:18,850 --> 00:28:20,600 But in this class, because we're concerned 707 00:28:20,600 --> 00:28:22,990 with proving correctness, proving efficiency, 708 00:28:22,990 --> 00:28:25,000 all that good stuff, we're going to write it 709 00:28:25,000 --> 00:28:29,120 in kind of a funny way, which is recursive. 710 00:28:29,120 --> 00:28:33,770 Now, I can't emphasize strongly enough how little you guys 711 00:28:33,770 --> 00:28:35,130 should implement this at home. 712 00:28:35,130 --> 00:28:37,730 This is mostly a theoretical version of selection sort 713 00:28:37,730 --> 00:28:38,990 rather than one that you would actually 714 00:28:38,990 --> 00:28:40,910 want to write in code because there's obviously 715 00:28:40,910 --> 00:28:41,720 a much better way to do it. 716 00:28:41,720 --> 00:28:43,762 And you'll see that in your recitation this week, 717 00:28:43,762 --> 00:28:45,150 I believe. 718 00:28:45,150 --> 00:28:47,060 But in terms of analysis, there's 719 00:28:47,060 --> 00:28:49,370 a nice, easy way to write it down. 720 00:28:49,370 --> 00:28:53,150 So we're going to take the selection sort algorithm. 721 00:28:53,150 --> 00:28:55,770 And we're going to divide it into two chunks. 722 00:28:55,770 --> 00:29:00,350 One of them is find me the biggest thing in the first k 723 00:29:00,350 --> 00:29:01,402 elements of my array. 724 00:29:01,402 --> 00:29:03,110 I shouldn't use k because that means key. 725 00:29:03,110 --> 00:29:05,690 The first i elements of my array. 726 00:29:05,690 --> 00:29:07,628 And the next one is to swap it into place 727 00:29:07,628 --> 00:29:09,170 and then sort everything to the left. 728 00:29:09,170 --> 00:29:10,348 That's the two pieces here. 729 00:29:10,348 --> 00:29:11,390 So let's write that down. 730 00:29:14,450 --> 00:29:16,670 So what did I do? 
731 00:29:16,670 --> 00:29:20,180 Well, in some sense, in step 1 here, I 732 00:29:20,180 --> 00:29:31,240 found the biggest with index less than or equal to i. 733 00:29:31,240 --> 00:29:35,860 So I started at the end of the list, and then moved backward. 734 00:29:35,860 --> 00:29:41,292 And then step 2 was to swap that into place. 735 00:29:41,292 --> 00:29:43,000 Notice when I say swap-- so for instance, 736 00:29:43,000 --> 00:29:44,803 when I put the 8 there, well, I had 737 00:29:44,803 --> 00:29:45,970 to do something with that 3. 738 00:29:45,970 --> 00:29:48,740 So I just put it where the 8 used to be. 739 00:29:48,740 --> 00:29:53,540 And then finally, well, am I done? 740 00:29:53,540 --> 00:29:56,120 No, I just put the biggest thing at the end of my array. 741 00:29:56,120 --> 00:30:00,350 So now, I have to sort from index 1 to i minus 1 742 00:30:00,350 --> 00:30:03,260 because now I know that the last guy is in sorted order. 743 00:30:03,260 --> 00:30:03,770 I see you. 744 00:30:03,770 --> 00:30:05,690 I'll turn it over to you in just a sec. 745 00:30:11,170 --> 00:30:12,436 Yes? 746 00:30:12,436 --> 00:30:13,750 AUDIENCE: [INAUDIBLE] 747 00:30:13,750 --> 00:30:15,688 JUSTIN: You can't read the handwriting? 748 00:30:15,688 --> 00:30:17,020 AUDIENCE: [INAUDIBLE] 749 00:30:17,020 --> 00:30:19,750 JUSTIN: This is index less than or equal to i. 750 00:30:19,750 --> 00:30:21,788 Great question. 751 00:30:21,788 --> 00:30:22,330 I warned you. 752 00:30:22,330 --> 00:30:25,270 It's going to be a problem. 753 00:30:25,270 --> 00:30:29,600 So let's do step 1 first. 754 00:30:29,600 --> 00:30:31,300 So I'm going to put code on the board. 755 00:30:31,300 --> 00:30:34,860 And then we're going to fill in the details. 756 00:30:34,860 --> 00:30:36,900 Erik is posting on Facebook. 757 00:30:36,900 --> 00:30:39,780 I'm going to turn that feature off on my watch later. 758 00:30:39,780 --> 00:30:45,960 So right, let's implement this helper function here. 
759 00:30:45,960 --> 00:30:48,253 This is something we're going to call prefix max. 760 00:30:48,253 --> 00:30:49,920 And this is going to find me the biggest 761 00:30:49,920 --> 00:30:54,970 element of array between index 0 and index i inclusive, 762 00:30:54,970 --> 00:30:55,470 I believe. 763 00:30:55,470 --> 00:30:56,272 Yeah? 764 00:30:56,272 --> 00:30:57,690 AUDIENCE: [INAUDIBLE] 765 00:30:57,690 --> 00:30:59,730 JUSTIN: Well, here's an interesting observation, 766 00:30:59,730 --> 00:31:09,200 really a deep one, which is that the biggest element from 0 767 00:31:09,200 --> 00:31:10,970 to i-- 768 00:31:10,970 --> 00:31:13,670 that's an i, sorry. 769 00:31:13,670 --> 00:31:15,950 There's two cases. 770 00:31:15,950 --> 00:31:27,370 Either it's at index i, meaning I have the first 10 771 00:31:27,370 --> 00:31:28,720 elements of my array-- 772 00:31:28,720 --> 00:31:34,760 either it is element number 10 or what's the other case? 773 00:31:34,760 --> 00:31:36,220 It ain't. Yeah? 774 00:31:36,220 --> 00:31:47,020 In other words, it has index less than i. 775 00:31:47,020 --> 00:31:48,440 This is a tautology, right? 776 00:31:48,440 --> 00:31:50,840 Either the biggest thing is at this index or it's not. 777 00:31:50,840 --> 00:31:52,600 In which case, it has to be to the left. 778 00:31:52,600 --> 00:31:54,690 Does that make sense? 779 00:31:54,690 --> 00:31:57,200 So this gives us a really simple algorithm 780 00:31:57,200 --> 00:32:00,950 for finding the biggest element in the array between index 0 781 00:32:00,950 --> 00:32:03,647 and index i, which is what I've shown you on the screen here. 782 00:32:03,647 --> 00:32:04,730 I'd write it on the board. 783 00:32:04,730 --> 00:32:08,070 But I am a slow writer and already low on time. 784 00:32:08,070 --> 00:32:10,490 And so essentially, what did I implement? 785 00:32:10,490 --> 00:32:15,770 Well, I found the biggest element between index 0 786 00:32:15,770 --> 00:32:18,170 and index i minus 1.
787 00:32:18,170 --> 00:32:20,345 So let's say that I have an array-- 788 00:32:22,873 --> 00:32:24,290 I forget the sequence of numbers-- 789 00:32:24,290 --> 00:32:27,560 8, 3, 5, 7, 9. 790 00:32:27,560 --> 00:32:29,270 That'll do it. 791 00:32:29,270 --> 00:32:33,680 And so like I give a pointer here, which is i. 792 00:32:33,680 --> 00:32:35,690 And the very first thing that I do 793 00:32:35,690 --> 00:32:38,150 is I compute the biggest number all the way to the left 794 00:32:38,150 --> 00:32:39,320 of this stuff. 795 00:32:39,320 --> 00:32:40,430 In this case, that is? 796 00:32:40,430 --> 00:32:41,370 AUDIENCE: 8. 797 00:32:41,370 --> 00:32:41,970 JUSTIN: 8. 798 00:32:41,970 --> 00:32:44,340 There we go. 799 00:32:44,340 --> 00:32:48,210 Now, I look at the very last element of my array, which is-- 800 00:32:48,210 --> 00:32:48,820 9. 801 00:32:48,820 --> 00:32:51,390 You're killing me today, guys. 802 00:32:51,390 --> 00:32:52,600 And then what do I return? 803 00:32:52,600 --> 00:32:57,370 Well, I want the biggest one between 0 and index i. 804 00:32:57,370 --> 00:32:59,490 So in this case, I return the 9. 805 00:32:59,490 --> 00:33:02,570 Does that make sense? 806 00:33:02,570 --> 00:33:04,850 So I know Jerry Cain at Stanford likes 807 00:33:04,850 --> 00:33:09,230 to talk about the recursive leap of faith that happens. 808 00:33:09,230 --> 00:33:11,810 Another term for this is induction. 809 00:33:11,810 --> 00:33:14,613 So we want to prove that our algorithm works. 810 00:33:14,613 --> 00:33:15,780 Well, what do we have to do? 811 00:33:15,780 --> 00:33:17,930 We have to show that when I call this function, 812 00:33:17,930 --> 00:33:22,010 it gives me the max of my array between index 0 813 00:33:22,010 --> 00:33:24,747 and index i for all i. 814 00:33:24,747 --> 00:33:27,330 So let's maybe do this inductive proof a little bit carefully. 815 00:33:27,330 --> 00:33:29,790 And then the rest, we'll be sloppy about it. 
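The slide itself is not captured in this transcript, so here is a hedged Python sketch of a recursive prefix max consistent with the description: recurse on the indices to the left, then compare with the item at index i. Returning an index rather than a value is an assumption, based on the later remark that the base case "just returns i".

```python
def prefix_max(A, i):
    """Return the index of a largest element of A[0..i], inclusive."""
    if i > 0:
        j = prefix_max(A, i - 1)   # index of the max of A[0..i-1]
        if A[i] < A[j]:            # case 2: the max is strictly left of i
            return j
    return i                       # case 1 (or base case): the max is at index i

A = [8, 3, 5, 7, 9]
print(prefix_max(A, 3))  # 0, since A[0] == 8 is the max of 8, 3, 5, 7
print(prefix_max(A, 4))  # 4, since A[4] == 9 beats the 8
```

The second call mirrors the board example: the recursive call reports the 8, the last element is the 9, and the larger of the two wins.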
816 00:33:29,790 --> 00:33:36,200 So the base case is i equals 0. 817 00:33:36,200 --> 00:33:38,595 Well, in this case, there's only one element in my array. 818 00:33:38,595 --> 00:33:40,220 So it's pretty clear that it's the max. 819 00:33:46,110 --> 00:33:51,150 And now, we have to do our inductive step, which 820 00:33:51,150 --> 00:33:54,570 means that if I call prefix max with i minus 1, 821 00:33:54,570 --> 00:33:58,230 I really do get the max of my array between 0 822 00:33:58,230 --> 00:34:00,330 and index i minus 1. 823 00:34:00,330 --> 00:34:06,190 And then really, I can just look at my very deep statement, 824 00:34:06,190 --> 00:34:09,239 which is that either my object is at the end of the array 825 00:34:09,239 --> 00:34:09,830 or it's not. 826 00:34:12,443 --> 00:34:13,860 And this is precisely what we need 827 00:34:13,860 --> 00:34:16,020 to justify the inductive step. 828 00:34:16,020 --> 00:34:17,489 Essentially, there are two cases. 829 00:34:17,489 --> 00:34:20,520 Either the biggest element of my array is the last one or it's 830 00:34:20,520 --> 00:34:21,909 not. 831 00:34:21,909 --> 00:34:24,100 We already, by our inductive hypothesis, 832 00:34:24,100 --> 00:34:25,719 have argued that our code can find 833 00:34:25,719 --> 00:34:31,380 the biggest element between index 0 and index i minus 1. 834 00:34:31,380 --> 00:34:33,960 So as long as we take the max of that and the very last guy, 835 00:34:33,960 --> 00:34:35,620 we're in good shape. 836 00:34:35,620 --> 00:34:40,449 So this is our very informal proof of correctness. 837 00:34:40,449 --> 00:34:40,949 OK. 838 00:34:40,949 --> 00:34:44,070 So now, we have to justify runtime for this algorithm. 839 00:34:44,070 --> 00:34:46,199 And that's actually not 100% obvious from the way 840 00:34:46,199 --> 00:34:47,074 I've written it here. 841 00:34:47,074 --> 00:34:49,330 There's no for loop. 842 00:34:49,330 --> 00:34:50,170 But what do I do?
843 00:34:50,170 --> 00:34:52,679 Well, in some sense, if my run time 844 00:34:52,679 --> 00:34:56,350 is a function s, well, for one thing, 845 00:34:56,350 --> 00:34:59,500 if my array has one element in it, 846 00:34:59,500 --> 00:35:02,680 well, my run time might be 7, might be 23. 847 00:35:02,680 --> 00:35:04,970 But at the end of the day, it only does one thing. 848 00:35:04,970 --> 00:35:06,970 It just returns i. 849 00:35:06,970 --> 00:35:11,270 So in other words, it's theta of 1. 850 00:35:11,270 --> 00:35:12,950 This isn't terribly insightful. 851 00:35:12,950 --> 00:35:14,220 But what else do we know? 852 00:35:14,220 --> 00:35:17,990 Well, when I call my function, I call it recursively on 853 00:35:17,990 --> 00:35:19,430 one smaller index. 854 00:35:19,430 --> 00:35:21,480 And then I do a constant amount of work. 855 00:35:21,480 --> 00:35:26,870 So I know that s of n is equal to s of n minus 1 856 00:35:26,870 --> 00:35:29,180 plus theta of 1. 857 00:35:29,180 --> 00:35:32,046 I do a little bit of extra computation on top of that. 858 00:35:32,046 --> 00:35:36,950 Can anybody guess what this total runtime is going to be? 859 00:35:36,950 --> 00:35:37,685 Yes? 860 00:35:37,685 --> 00:35:38,560 AUDIENCE: [INAUDIBLE] 861 00:35:38,560 --> 00:35:40,090 JUSTIN: Yeah, order n. 862 00:35:40,090 --> 00:35:42,600 So let's say that we hypothesize that this takes n time. 863 00:35:42,600 --> 00:35:46,240 You can see that because at step n we call n minus 1, 864 00:35:46,240 --> 00:35:51,226 we call n minus 2, and so on, all the way down to 1. 865 00:35:51,226 --> 00:35:53,458 If we want to prove this, one of the ways that we-- 866 00:35:53,458 --> 00:35:55,750 I think, in theory, you guys have learned in the past-- 867 00:35:55,750 --> 00:35:58,270 and you're going to cover it in recitation-- 868 00:35:58,270 --> 00:36:00,590 is a technique called substitution. 869 00:36:00,590 --> 00:36:03,170 What we do is we're going to look at this relationship.
870 00:36:03,170 --> 00:36:06,700 And we're going to hypothesize that we think s of n 871 00:36:06,700 --> 00:36:11,590 may look something like cn for some constant c that 872 00:36:11,590 --> 00:36:13,417 doesn't depend on n. 873 00:36:13,417 --> 00:36:15,000 Then all we have to do is double check 874 00:36:15,000 --> 00:36:16,710 that that relationship is consistent 875 00:36:16,710 --> 00:36:19,440 with our inductive hypothesis, or rather 876 00:36:19,440 --> 00:36:20,830 just as a recursive function. 877 00:36:20,830 --> 00:36:23,290 And if it is, then we're in good shape. 878 00:36:23,290 --> 00:36:26,260 So in this case, well, what do I know? 879 00:36:26,260 --> 00:36:32,610 I've guessed that s of n is theta of n. 880 00:36:32,610 --> 00:36:36,000 In particular, if I plug into this recursive relationship 881 00:36:36,000 --> 00:36:39,428 here, on the left-hand side, I'm going to get cn. 882 00:36:39,428 --> 00:36:44,290 On the right-hand side, I'm going to get c n minus 1 883 00:36:44,290 --> 00:36:47,800 plus theta of 1. 884 00:36:47,800 --> 00:36:51,080 We just have to make sure that this is an OK equal sign. 885 00:36:51,080 --> 00:36:51,870 So what can I do? 886 00:36:51,870 --> 00:36:54,260 I can subtract cn from both sides, 887 00:36:54,260 --> 00:36:56,580 maybe put that 1 on the other side here. 888 00:36:56,580 --> 00:37:00,090 Then we get that c equals big O of 1. 889 00:37:00,090 --> 00:37:01,432 c is, of course, a constant. 890 00:37:01,432 --> 00:37:02,390 So we're in good shape. 891 00:37:05,230 --> 00:37:06,850 My undergrad algorithms professor 892 00:37:06,850 --> 00:37:09,880 told me never to write a victory mark at the end of a proof. 893 00:37:09,880 --> 00:37:11,800 You have to do a little square. 894 00:37:11,800 --> 00:37:13,770 But he's not here. 895 00:37:16,650 --> 00:37:18,008 So now, I see you. 896 00:37:18,008 --> 00:37:19,300 But we're a little low on time. 897 00:37:19,300 --> 00:37:21,550 So we'll save it for the lecture.
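Written out, the substitution check for the prefix max runtime goes roughly as follows. This is a sketch of the spoken argument rather than a copy of the board, with c standing for the guessed constant.

```latex
\begin{align*}
S(1) &= \Theta(1), \qquad S(n) = S(n-1) + \Theta(1) \\
\text{Guess: } S(n) &= cn \\
cn &= c(n-1) + \Theta(1) \\
c &= \Theta(1)
\end{align*}
```

The last line holds for a constant c, so the guess S(n) = cn is consistent with the recurrence and S(n) is linear in n.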
898 00:37:21,550 --> 00:37:22,050 OK. 899 00:37:22,050 --> 00:37:25,470 So if we want to implement the selection sort algorithm, well, 900 00:37:25,470 --> 00:37:26,560 what do we do? 901 00:37:26,560 --> 00:37:30,420 Well, we're going to think of i as the index of that red line 902 00:37:30,420 --> 00:37:32,100 that I was showing you before. 903 00:37:32,100 --> 00:37:35,620 Everything beyond i is already sorted. 904 00:37:35,620 --> 00:37:37,990 So in selection sort, the first thing I'm going to do 905 00:37:37,990 --> 00:37:41,110 is find the max element between 0 and i. 906 00:37:41,110 --> 00:37:44,290 And then I'm going to swap it into place. 907 00:37:44,290 --> 00:37:46,770 So this is just a code version of the technique 908 00:37:46,770 --> 00:37:48,240 we've already talked about. 909 00:37:48,240 --> 00:37:49,590 Hopefully, this makes sense. 910 00:37:49,590 --> 00:37:53,310 So you find the biggest element between 0 and index i. 911 00:37:53,310 --> 00:37:55,800 That's what we're going to call j here. 912 00:37:55,800 --> 00:37:58,350 I swap that with the one in index i. 913 00:37:58,350 --> 00:38:00,480 That's step 2. 914 00:38:00,480 --> 00:38:02,670 And then step 3 is I still have to sort everything 915 00:38:02,670 --> 00:38:06,760 to the left of index i and that's that recursive call. 916 00:38:06,760 --> 00:38:09,940 So if I want to justify the runtime 917 00:38:09,940 --> 00:38:12,910 of this particular technique, well, now 918 00:38:12,910 --> 00:38:15,110 let's call that t for time. 919 00:38:18,280 --> 00:38:19,310 Well, what do I do? 920 00:38:19,310 --> 00:38:23,350 Well, for one, I call selection sort with index i minus 1. 921 00:38:23,350 --> 00:38:27,390 So that incurs time that looks like this. 922 00:38:27,390 --> 00:38:30,942 But I also call that prefix max function. 923 00:38:30,942 --> 00:38:32,370 And how much time does that take? 924 00:38:32,370 --> 00:38:35,300 That takes order n time. 
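[Editor's sketch] The three steps just described (find the max of the prefix, swap it into index i, recurse on everything to the left) might look like this in Python. The function names, and the default argument that starts i at the last index, are assumptions made so the example is self-contained.

```python
def prefix_max(A, i):
    """Return the index of a largest element among A[0], ..., A[i]."""
    if i > 0:
        j = prefix_max(A, i - 1)
        if A[i] < A[j]:
            return j
    return i


def selection_sort(A, i=None):
    """Sort A[0..i] in place; everything beyond index i is assumed sorted."""
    if i is None:
        i = len(A) - 1               # start with the "red line" at the far right
    if i > 0:
        j = prefix_max(A, i)         # step 1: biggest element between 0 and i
        A[i], A[j] = A[j], A[i]      # step 2: swap it into place
        selection_sort(A, i - 1)     # step 3: sort everything to the left of i


A = [7, 1, 5, 6, 2, 4, 9, 3]
selection_sort(A)
print(A)
```

The recursion is one level deep per element, so in practice this form is only suitable for short lists; it is written this way to mirror the recurrence t(n) = t(n-1) + theta(n).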
925 00:38:35,300 --> 00:38:40,135 So at the end of the day, I have some relationship 926 00:38:40,135 --> 00:38:41,010 that looks like this. 927 00:38:41,010 --> 00:38:43,170 Does that make sense? 928 00:38:43,170 --> 00:38:45,740 So by the way, notice that this order n swallowed 929 00:38:45,740 --> 00:38:51,960 up the order 1 computations that I had to do to swap and so on. 930 00:38:51,960 --> 00:38:56,190 So remember, there's this nice relationship, which 931 00:38:56,190 --> 00:38:58,710 you probably learned in your combinatorics class, which 932 00:38:58,710 --> 00:39:01,690 is that 1 plus 2 plus dot, dot, dot plus n. 933 00:39:01,690 --> 00:39:02,190 OK. 934 00:39:02,190 --> 00:39:03,898 I can never remember exactly the formula. 935 00:39:03,898 --> 00:39:07,980 But I'm pretty sure that it looks like n squared. 936 00:39:07,980 --> 00:39:10,170 So based on that and taking a look 937 00:39:10,170 --> 00:39:12,820 at this recursive thing, which is essentially doing exactly 938 00:39:12,820 --> 00:39:13,320 that-- 939 00:39:13,320 --> 00:39:16,140 n plus n minus 1 plus n minus 2, and so on-- 940 00:39:16,140 --> 00:39:19,950 I might hypothesize that this thing is really 941 00:39:19,950 --> 00:39:22,587 order n squared. 942 00:39:22,587 --> 00:39:24,170 So if I'm going to do that, then again 943 00:39:24,170 --> 00:39:26,510 if I want to use the same technique for proof, 944 00:39:26,510 --> 00:39:29,300 I have to plug this relationship in, and then double 945 00:39:29,300 --> 00:39:30,580 check that it is consistent. 946 00:39:30,580 --> 00:39:37,400 So maybe I hypothesize that t of n equals cn squared. 947 00:39:37,400 --> 00:39:41,400 In which case, I plug it in here. 948 00:39:41,400 --> 00:39:45,500 I have cn squared equals with a question mark over it 949 00:39:45,500 --> 00:39:52,552 c times n minus 1, squared, plus big O or even theta n here.
950 00:39:52,552 --> 00:39:54,520 So if I expand the square, notice 951 00:39:54,520 --> 00:39:57,150 I'm going to get c times n squared 952 00:39:57,150 --> 00:39:59,610 plus a bunch of linear stuff. 953 00:39:59,610 --> 00:40:04,170 This is really cn squared-- 954 00:40:04,170 --> 00:40:05,880 I should be careful with that-- 955 00:40:05,880 --> 00:40:12,020 minus 2 cn plus c plus theta of n. 956 00:40:14,575 --> 00:40:18,520 Notice that there's a cn squared on both sides of this equation. 957 00:40:18,520 --> 00:40:19,700 They go away. 958 00:40:19,700 --> 00:40:23,410 And what I'm left with is a nice, consistent formula 959 00:40:23,410 --> 00:40:32,320 that theta of n equals 2 cn minus c. 960 00:40:32,320 --> 00:40:34,185 And indeed, this is an order n expression. 961 00:40:34,185 --> 00:40:35,560 So there's order in the universe. 962 00:40:35,560 --> 00:40:36,287 Life is good. 963 00:40:36,287 --> 00:40:37,870 Yeah, this is the substitution method. 964 00:40:37,870 --> 00:40:41,085 And again, I think you'll cover it more in your recitation. 965 00:40:41,085 --> 00:40:41,960 So what have we done? 966 00:40:41,960 --> 00:40:43,840 We have derived the selection sort. 967 00:40:43,840 --> 00:40:46,810 We've checked that it runs in n squared time. 968 00:40:46,810 --> 00:40:49,910 And by this nice, inductive strategy, 969 00:40:49,910 --> 00:40:52,220 we know that it's correct. 970 00:40:52,220 --> 00:40:53,200 So life is pretty good. 971 00:40:53,200 --> 00:40:55,367 Unfortunately, I promised you guys on the slides 972 00:40:55,367 --> 00:40:57,130 that sorting really takes n log n time. 973 00:40:57,130 --> 00:40:59,100 And this is an order n squared algorithm. 974 00:40:59,100 --> 00:41:01,270 So we're not quite done yet. 975 00:41:01,270 --> 00:41:02,570 I'm way over time. 976 00:41:02,570 --> 00:41:05,470 So we're going to skip a different algorithm, which 977 00:41:05,470 --> 00:41:07,750 is called insertion sort, also runs in n squared time.
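[Editor's sketch] For reference, the substitution check for selection sort, written out. This assumes the guess t(n) = cn^2 for a constant c that does not depend on n, with the theta(n) term standing for the prefix max call plus the swap.

```latex
\begin{align*}
T(n) &= T(n-1) + \Theta(n) \\
c n^2 &\overset{?}{=} c (n-1)^2 + \Theta(n) \\
      &= c n^2 - 2 c n + c + \Theta(n) \\
\Longrightarrow \quad \Theta(n) &= 2 c n - c
\end{align*}
```

The right-hand side 2cn - c is linear in n, so the guess is consistent with the recurrence.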
978 00:41:10,890 --> 00:41:13,140 Essentially, insertion sort runs in the reverse order. 979 00:41:13,140 --> 00:41:14,848 I'm going to sort everything to the left, 980 00:41:14,848 --> 00:41:17,433 and then insert a new object, whereas, in selection, I'm 981 00:41:17,433 --> 00:41:18,850 going to choose the biggest object 982 00:41:18,850 --> 00:41:20,858 and then sort everything to the left. 983 00:41:20,858 --> 00:41:22,900 But I'll let you guys piece through that at home. 984 00:41:22,900 --> 00:41:24,640 It's essentially the same argument. 985 00:41:24,640 --> 00:41:27,460 And instead, we should jump to an algorithm 986 00:41:27,460 --> 00:41:31,150 that actually matters, which is something called merge sort. 987 00:41:31,150 --> 00:41:34,300 How many of us have encountered merge sort before? 988 00:41:34,300 --> 00:41:34,880 Fabulous. 989 00:41:34,880 --> 00:41:35,380 Good. 990 00:41:35,380 --> 00:41:37,810 So then I'm done. 991 00:41:37,810 --> 00:41:39,342 So let's say that I have a list. 992 00:41:39,342 --> 00:41:41,050 Now, I'm sending a message back to Jason. 993 00:41:41,050 --> 00:41:42,640 I made this one up last night. 994 00:41:42,640 --> 00:41:45,820 So I have 7, 1, 5, 6, 2, 4, 9, 3. 995 00:41:45,820 --> 00:41:47,980 This is not in sorted order. 996 00:41:47,980 --> 00:41:50,090 But I can make a very deep observation, 997 00:41:50,090 --> 00:41:52,690 which is that every number by itself is in sorted order 998 00:41:52,690 --> 00:41:56,380 if I think of it as an array of length 1. 999 00:41:56,380 --> 00:42:00,670 It's really deep, like deep learning deep. 1000 00:42:00,670 --> 00:42:02,240 So now, what can I do? 1001 00:42:02,240 --> 00:42:06,167 Well, I could take every pair of numbers, draw a little red box. 1002 00:42:06,167 --> 00:42:07,750 Well, now, they're not in sorted order 1003 00:42:07,750 --> 00:42:09,230 any more inside of the red boxes. 1004 00:42:09,230 --> 00:42:10,972 So I'm going to sort inside of every box. 
1005 00:42:10,972 --> 00:42:12,430 In this case, it's not too exciting 1006 00:42:12,430 --> 00:42:14,560 because it's just pairs. 1007 00:42:14,560 --> 00:42:17,350 And now, they're in sorted order because they said they were. 1008 00:42:17,350 --> 00:42:19,570 Now, I'm going to keep doubling the size of my boxes. 1009 00:42:19,570 --> 00:42:22,600 So now, let's say I have a box of length 4. 1010 00:42:22,600 --> 00:42:25,300 What do I know about the left and right-hand sides 1011 00:42:25,300 --> 00:42:28,230 of the dotted lines here? 1012 00:42:28,230 --> 00:42:30,750 On the two sides of the dotted lines, 1013 00:42:30,750 --> 00:42:32,400 the array is in sorted order. 1014 00:42:32,400 --> 00:42:33,585 There's a 1 and then a 7. 1015 00:42:33,585 --> 00:42:35,800 Those are in sorted order, 5 and a 6. 1016 00:42:35,800 --> 00:42:39,650 That's because, in the previous step, I sorted every pair. 1017 00:42:39,650 --> 00:42:43,580 So when I merge these two sides together, 1018 00:42:43,580 --> 00:42:46,700 I have an additional useful piece of information, 1019 00:42:46,700 --> 00:42:49,460 namely that the two sides of the dotted line 1020 00:42:49,460 --> 00:42:50,690 are already in sorted order. 1021 00:42:50,690 --> 00:42:54,530 That's going to be our basic inductive step here. 1022 00:42:54,530 --> 00:42:56,520 So in this case, I merge the two sides. 1023 00:42:56,520 --> 00:42:59,010 I get 1, 5, 6, 7, and 2, 3, 4, 9. 1024 00:42:59,010 --> 00:43:01,380 Then finally, I put these two things together. 1025 00:43:01,380 --> 00:43:03,650 And I have to sort these two. 1026 00:43:03,650 --> 00:43:06,380 I have to merge these two sorted lists. 1027 00:43:06,380 --> 00:43:08,180 But they're in sorted order. 1028 00:43:08,180 --> 00:43:11,290 And that's going to give me a big advantage because-- oops, 1029 00:43:11,290 --> 00:43:15,080 I lost my chalk. 1030 00:43:15,080 --> 00:43:18,770 I suppose I've got space on this board here. 1031 00:43:18,770 --> 00:43:21,050 Oh no.
1032 00:43:21,050 --> 00:43:29,666 So if I want to merge 1, 5, 6, 7 and 2, 3, 4, 9, 1033 00:43:29,666 --> 00:43:31,430 there's a nice, clever technique that we 1034 00:43:31,430 --> 00:43:34,305 can do that's going to take just linear time. 1035 00:43:34,305 --> 00:43:36,180 Jason tells me it's the two finger algorithm. 1036 00:43:36,180 --> 00:43:37,715 I think that's a cute analogy here. 1037 00:43:37,715 --> 00:43:38,840 So here are my two fingers. 1038 00:43:38,840 --> 00:43:40,790 They're going to point at the end of the list. 1039 00:43:40,790 --> 00:43:44,150 And I'm going to construct the merged array backwards. 1040 00:43:44,150 --> 00:43:46,610 So how many elements are in my merged array, if I'm 1041 00:43:46,610 --> 00:43:49,400 merging two things of length 4? 1042 00:43:49,400 --> 00:43:52,020 I don't ask you guys hard questions. 1043 00:43:52,020 --> 00:43:52,920 It's 8, yeah? 1044 00:43:52,920 --> 00:43:54,076 4 plus 4. 1045 00:43:54,076 --> 00:43:55,920 8, yeah? 1046 00:43:55,920 --> 00:43:56,930 So what do I know? 1047 00:43:56,930 --> 00:44:00,000 I know that my merged array-- 1048 00:44:00,000 --> 00:44:04,240 5, 6, 7-- has eight elements. 1049 00:44:04,240 --> 00:44:07,800 And now, I'm going to have two fingers at the end of my array. 1050 00:44:07,800 --> 00:44:11,490 Which one should I put at the end of the merged guy? 1051 00:44:11,490 --> 00:44:12,390 The 7 or the 9? 1052 00:44:15,318 --> 00:44:16,300 AUDIENCE: The 9. 1053 00:44:16,300 --> 00:44:17,010 JUSTIN: The 9. 1054 00:44:17,010 --> 00:44:19,040 Right, thank you. 1055 00:44:19,040 --> 00:44:23,737 So now, I can move my lower finger to the left 1056 00:44:23,737 --> 00:44:25,070 because I've already added that. 1057 00:44:25,070 --> 00:44:27,278 Notice that I never need to look to the left of where 1058 00:44:27,278 --> 00:44:30,020 my finger is because they're already in sorted order. 1059 00:44:30,020 --> 00:44:31,993 Now what should I add, the 4 or the 7?
1060 00:44:31,993 --> 00:44:32,820 AUDIENCE: 7. 1061 00:44:32,820 --> 00:44:35,150 JUSTIN: The 7. 1062 00:44:35,150 --> 00:44:37,888 And so on, dot, dot, dot, yeah? 1063 00:44:37,888 --> 00:44:40,180 So that's going to be the basic idea of the merge sort. 1064 00:44:40,180 --> 00:44:41,920 I'm going to take two sorted lists. 1065 00:44:41,920 --> 00:44:43,837 And I'm going to make a new sorted list, which 1066 00:44:43,837 --> 00:44:46,210 is twice as long, by using two fingers 1067 00:44:46,210 --> 00:44:49,530 and moving from the end backward. 1068 00:44:49,530 --> 00:44:51,720 So that's the basic intuition here. 1069 00:44:51,720 --> 00:44:53,220 Indeed, there's our sorted list. 1070 00:44:53,220 --> 00:44:55,440 It's stressing me out that there's no eight. 1071 00:44:55,440 --> 00:44:58,500 I need the power of 2. 1072 00:44:58,500 --> 00:45:00,930 So I think merge sort, we're going to present it 1073 00:45:00,930 --> 00:45:02,790 in a backward way from the previous one, 1074 00:45:02,790 --> 00:45:04,620 where I'm going to give you the high level algorithm. 1075 00:45:04,620 --> 00:45:06,670 And then actually, the headache is that merging step, 1076 00:45:06,670 --> 00:45:07,920 which I have four minutes for. 1077 00:45:07,920 --> 00:45:10,000 And I apologize for it. 1078 00:45:10,000 --> 00:45:11,340 So what does the merge sort do? 1079 00:45:11,340 --> 00:45:13,200 Well, it computes an index c, which 1080 00:45:13,200 --> 00:45:15,142 is the middle of my array. 1081 00:45:15,142 --> 00:45:17,350 And it's going to make a recursive call which is sort 1082 00:45:17,350 --> 00:45:22,423 the left, which is everything between index A and index C. 1083 00:45:22,423 --> 00:45:24,840 And then sort everything on the right, which is everything 1084 00:45:24,840 --> 00:45:28,620 from index C to index B. I know this is confusing 1085 00:45:28,620 --> 00:45:30,840 because usually letters appear in order.
1086 00:45:30,840 --> 00:45:33,600 But C, if you think of it as standing for center, 1087 00:45:33,600 --> 00:45:34,650 then it makes sense. 1088 00:45:34,650 --> 00:45:35,340 Here's my array. 1089 00:45:38,310 --> 00:45:40,530 I'm going to choose an index right in the middle. 1090 00:45:40,530 --> 00:45:44,190 I've done myself a disservice by not using a power of 2. 1091 00:45:44,190 --> 00:45:45,360 But that's OK. 1092 00:45:45,360 --> 00:45:48,870 I'm going to say sort everything to the left of the dotted line 1093 00:45:48,870 --> 00:45:49,810 first. 1094 00:45:49,810 --> 00:45:52,200 Sort everything to the right of the dotted line second. 1095 00:45:52,200 --> 00:45:54,930 Now, I have two sorted lists on the two 1096 00:45:54,930 --> 00:45:56,040 sides of the dotted line. 1097 00:45:56,040 --> 00:45:59,392 And then I'm going to use my two fingers to put them together. 1098 00:45:59,392 --> 00:46:01,100 So that's what this is implementing here. 1099 00:46:01,100 --> 00:46:03,080 See, there's two recursive calls-- 1100 00:46:03,080 --> 00:46:05,402 sort from A to C, and then sort from C to B. Oops, 1101 00:46:05,402 --> 00:46:06,610 I didn't actually label this. 1102 00:46:06,610 --> 00:46:11,870 So this is A, C, B. And then I've got to call merge. 1103 00:46:16,130 --> 00:46:18,890 Now, our implementation of merge-- 1104 00:46:18,890 --> 00:46:21,440 well, we can also do this in a recursive fashion. 1105 00:46:21,440 --> 00:46:23,640 But personally, I find this a little complicated. 1106 00:46:23,640 --> 00:46:24,590 I'm going to admit. 1107 00:46:24,590 --> 00:46:28,700 But here's the basic idea here, which I'm now rushing. 1108 00:46:31,830 --> 00:46:35,820 So I'm going to think of my upper finger as finger i 1109 00:46:35,820 --> 00:46:38,060 and my lower finger as finger j. 1110 00:46:38,060 --> 00:46:40,320 Does that make sense? 1111 00:46:40,320 --> 00:46:43,830 So I have two sorted lists. 1112 00:46:43,830 --> 00:46:46,380 So maybe like that.
1113 00:46:46,380 --> 00:46:49,350 I don't know, 1, 3, 5, 7. 1114 00:46:49,350 --> 00:46:52,240 And then I have a second sorted list here, 1115 00:46:52,240 --> 00:46:58,950 which is maybe 2, 4, 6, 72, as one does. 1116 00:46:58,950 --> 00:47:03,900 Then I'm going to have one pointer like this, which is i, 1117 00:47:03,900 --> 00:47:06,970 and a pointer down here, which is j. 1118 00:47:06,970 --> 00:47:13,526 And my goal is to construct an array A 1119 00:47:13,526 --> 00:47:15,960 with a bunch of elements in it. 1120 00:47:15,960 --> 00:47:18,200 And the way that I'm going to do it is I'm 1121 00:47:18,200 --> 00:47:21,440 going to use exactly the same kind of recursive argument, 1122 00:47:21,440 --> 00:47:24,950 that I can either have the biggest element of my merged array 1123 00:47:24,950 --> 00:47:27,050 be the last element of the first guy 1124 00:47:27,050 --> 00:47:30,890 or be the last element of the second one. 1125 00:47:30,890 --> 00:47:33,012 So here's going to be our recursive call. 1126 00:47:33,012 --> 00:47:34,970 And in addition to that, for convenience, we'll 1127 00:47:34,970 --> 00:47:39,710 have a third index, which is B, which is pointing to this thing 1128 00:47:39,710 --> 00:47:43,710 inside of my sorted array that I'm currently processing. Yeah? 1129 00:47:43,710 --> 00:47:46,112 It's going to start at A, go to B. 1130 00:47:46,112 --> 00:47:47,570 Incidentally, I see a lot of people 1131 00:47:47,570 --> 00:47:48,830 taking photos of the slides. 1132 00:47:48,830 --> 00:47:51,980 These are just copy pasted from the notes. 1133 00:47:51,980 --> 00:47:52,790 OK. 1134 00:47:52,790 --> 00:47:57,960 So in this case, what should I put in B for my two arrays? 1135 00:47:57,960 --> 00:48:02,340 I have 1, 3, 5, 7; 2, 4, 6, 72. 1136 00:48:02,340 --> 00:48:05,170 72, yeah? 1137 00:48:05,170 --> 00:48:05,770 Great. 1138 00:48:05,770 --> 00:48:07,070 So now, what am I going to do? 1139 00:48:07,070 --> 00:48:10,180 I'm just going to call the merge function.
1140 00:48:10,180 --> 00:48:14,340 But I'm going to decrement B because now I'm 1141 00:48:14,340 --> 00:48:15,960 happy with that last element. 1142 00:48:15,960 --> 00:48:19,620 And in addition to that, I'm going to decrement j 1143 00:48:19,620 --> 00:48:22,710 because I already used it up. 1144 00:48:22,710 --> 00:48:24,900 And so that's our recursive call here. 1145 00:48:24,900 --> 00:48:27,270 It's saying, if j is less than or equal to 0-- 1146 00:48:27,270 --> 00:48:28,770 so in other words, I have an element 1147 00:48:28,770 --> 00:48:31,830 to use in one of the lists or the other. 1148 00:48:31,830 --> 00:48:36,223 And maybe the left one is bigger than the right one. 1149 00:48:36,223 --> 00:48:37,140 That's our first case. 1150 00:48:37,140 --> 00:48:39,630 That does not apply in this example here. 1151 00:48:39,630 --> 00:48:41,400 Well, then I should make the last element 1152 00:48:41,400 --> 00:48:44,520 of A from the first list and then recurse 1153 00:48:44,520 --> 00:48:48,210 with one fewer element i, and similarly 1154 00:48:48,210 --> 00:48:50,580 the reverse case for j. 1155 00:48:50,580 --> 00:48:53,760 So if we do our runtime in two minutes or less-- 1156 00:48:53,760 --> 00:48:56,190 bear with me guys-- 1157 00:48:56,190 --> 00:48:59,580 well, what is this merge function going to do? 1158 00:48:59,580 --> 00:49:02,700 Well, in some sense, there's two branches. 1159 00:49:02,700 --> 00:49:04,380 There's an if statement with two pieces. 1160 00:49:04,380 --> 00:49:07,350 But both of those pieces call merge 1161 00:49:07,350 --> 00:49:10,520 with one fewer piece in it. 1162 00:49:10,520 --> 00:49:15,470 So in some sense, we have s of n equals s of n minus 1 1163 00:49:15,470 --> 00:49:18,680 plus theta of 1, which we already 1164 00:49:18,680 --> 00:49:24,230 know from our previous proof means that s of n 1165 00:49:24,230 --> 00:49:27,870 is equal to theta of n.
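[Editor's sketch] A Python version of the recursive two-finger merge and the merge sort that calls it, under some assumptions: the argument order `(L, R, A, i, j, a, b)`, the half-open range `A[a:b]`, and the choice of midpoint are illustrative guesses, since the slide itself is not reproduced in the transcript. Here `i` and `j` count how many elements of `L` and `R` are still unused, and `b` marks the slot in `A` being filled, moving from the end backward.

```python
def merge(L, R, A, i, j, a, b):
    """Merge sorted L[:i] and R[:j] into A[a:b], filling from the back."""
    if a < b:
        # Take from L when R is used up, or when L's last unused
        # element is bigger than R's last unused element.
        if (j <= 0) or (i > 0 and L[i - 1] > R[j - 1]):
            A[b - 1] = L[i - 1]
            i = i - 1
        else:
            A[b - 1] = R[j - 1]
            j = j - 1
        merge(L, R, A, i, j, a, b - 1)  # one fewer slot left to fill


def merge_sort(A, a=0, b=None):
    """Sort A[a:b] in place."""
    if b is None:
        b = len(A)
    if 1 < b - a:
        c = (a + b + 1) // 2           # c for "center"
        merge_sort(A, a, c)            # sort the left half
        merge_sort(A, c, b)            # sort the right half
        L, R = A[a:c], A[c:b]          # copies of the two sorted halves
        merge(L, R, A, len(L), len(R), a, b)


out = [None] * 8
merge([1, 3, 5, 7], [2, 4, 6, 72], out, 4, 4, 0, 8)
print(out)

A = [7, 1, 5, 6, 2, 4, 9, 3]
merge_sort(A)
print(A)
```

Because `merge` recurses once per output element, this form would hit Python's default recursion limit on long lists; a loop over `b` would avoid that while keeping the same two-finger logic.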
1166 00:49:27,870 --> 00:49:30,972 So in other words, it takes linear time to merge. 1167 00:49:30,972 --> 00:49:33,180 It makes sense intuitively because essentially you're 1168 00:49:33,180 --> 00:49:34,597 touching every one of these things 1169 00:49:34,597 --> 00:49:38,440 once with your two fingers. 1170 00:49:38,440 --> 00:49:40,660 And now, probably the hardest part 1171 00:49:40,660 --> 00:49:42,700 of the lecture, which I left zero time for, 1172 00:49:42,700 --> 00:49:47,300 is deriving the runtime for the actual merge sort algorithm. 1173 00:49:47,300 --> 00:49:48,680 And what does that look like? 1174 00:49:48,680 --> 00:49:52,870 Well, that one's a little bit trickier because, of course, 1175 00:49:52,870 --> 00:49:56,290 I call the merge sort algorithm twice, each time 1176 00:49:56,290 --> 00:49:58,240 on a list that's half the size. 1177 00:49:58,240 --> 00:50:00,740 In this class, we're going to assume that our list is always 1178 00:50:00,740 --> 00:50:02,360 a power of 2 in its length. 1179 00:50:02,360 --> 00:50:06,100 Otherwise, this analysis is an itty bitty bit more 1180 00:50:06,100 --> 00:50:07,190 of a headache. 1181 00:50:07,190 --> 00:50:09,190 So first of all, how long does it 1182 00:50:09,190 --> 00:50:11,780 take to sort an array of length 1? 1183 00:50:11,780 --> 00:50:14,600 I am not going to ask hard questions. 1184 00:50:14,600 --> 00:50:16,750 Everybody? 1185 00:50:16,750 --> 00:50:19,060 Yeah, it's just 1, right? 1186 00:50:19,060 --> 00:50:20,835 Because there's nothing to do. 1187 00:50:20,835 --> 00:50:23,620 An array of length 1 has one element and it's sorted. 1188 00:50:23,620 --> 00:50:28,540 It's also the biggest element and the smallest element. 1189 00:50:28,540 --> 00:50:31,000 And now, what does our algorithm do? 1190 00:50:31,000 --> 00:50:33,370 Well, it makes two recursive calls 1191 00:50:33,370 --> 00:50:34,810 on lists that are half the length.
1192 00:50:40,290 --> 00:50:42,040 And then it calls that merge function. 1193 00:50:42,040 --> 00:50:46,858 And we know that the merge function takes theta of n time. 1194 00:50:46,858 --> 00:50:50,150 Does that make sense? 1195 00:50:50,150 --> 00:50:53,200 So one thing we might do, because we have some intuition 1196 00:50:53,200 --> 00:50:56,470 from your 6.042 course, is that we 1197 00:50:56,470 --> 00:51:06,990 think that this thing is order n log n because it makes the two 1198 00:51:06,990 --> 00:51:07,708 recursive calls. 1199 00:51:07,708 --> 00:51:09,000 And then it puts them together. 1200 00:51:09,000 --> 00:51:11,417 And let's double check that that's true really quick using 1201 00:51:11,417 --> 00:51:12,670 the substitution method. 1202 00:51:12,670 --> 00:51:14,700 So in particular, on the left-hand side 1203 00:51:14,700 --> 00:51:18,990 here, maybe I have cn log n. 1204 00:51:18,990 --> 00:51:21,420 Now, I have 2 c. 1205 00:51:21,420 --> 00:51:29,250 Well, I have to put an n over 2 log n over 2 plus theta of n. 1206 00:51:29,250 --> 00:51:32,740 And I want to double check that this expression is consistent. 1207 00:51:32,740 --> 00:51:36,330 I've got about a foot to do it in. 1208 00:51:36,330 --> 00:51:37,920 So remember-- let's see. 1209 00:51:37,920 --> 00:51:42,690 If we use our favorite identities from high school 1210 00:51:42,690 --> 00:51:44,760 class that you probably forgot, remember 1211 00:51:44,760 --> 00:51:46,890 that log of two things divided by each other 1212 00:51:46,890 --> 00:51:48,720 is the difference of the logs. 1213 00:51:48,720 --> 00:51:50,830 So this is really 2. 1214 00:51:50,830 --> 00:51:51,330 OK. 1215 00:51:51,330 --> 00:51:53,310 2 divided by 2 is 1. 1216 00:51:53,310 --> 00:52:05,900 So this is c times n times log n minus log of 2 plus theta n. 1217 00:52:05,900 --> 00:52:07,290 I'm already out of time. 1218 00:52:07,290 --> 00:52:12,290 But notice that there's a c n log n on the right-hand side.
1219 00:52:12,290 --> 00:52:15,360 There's a c n log n on the left-hand side. 1220 00:52:15,360 --> 00:52:17,760 So those two things go away. 1221 00:52:17,760 --> 00:52:19,500 And what am I going to be left with? 1222 00:52:19,500 --> 00:52:29,630 I'm going to be left with theta of n equals cn log of 2. 1223 00:52:29,630 --> 00:52:32,547 Notice that c and log 2 are both constants. 1224 00:52:32,547 --> 00:52:34,380 We have a theta of n on the left-hand side. 1225 00:52:34,380 --> 00:52:35,755 So there's order in the universe. 1226 00:52:35,755 --> 00:52:38,120 And we've derived our runtime. 1227 00:52:38,120 --> 00:52:40,310 So I know I rushed a little bit through merge sort. 1228 00:52:40,310 --> 00:52:42,890 I'm sure that Erik and Jason can review this a little bit 1229 00:52:42,890 --> 00:52:44,000 next time. 1230 00:52:44,000 --> 00:52:46,970 But with that, we'll see you, what? 1231 00:52:46,970 --> 00:52:48,140 Thursday and Friday. 1232 00:52:48,140 --> 00:52:51,100 And it's been a pleasure to talk to you all.
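[Editor's sketch] The merge sort substitution check, written out. This assumes, as in the lecture, that n is a power of 2, and that the guess is t(n) = cn log n for a constant c that does not depend on n.

```latex
\begin{align*}
T(1) &= \Theta(1), \qquad T(n) = 2\,T(n/2) + \Theta(n) \\
c n \log n &\overset{?}{=} 2 c \cdot \frac{n}{2} \log\frac{n}{2} + \Theta(n) \\
           &= c n (\log n - \log 2) + \Theta(n) \\
           &= c n \log n - c n \log 2 + \Theta(n) \\
\Longrightarrow \quad \Theta(n) &= c n \log 2
\end{align*}
```

Since c and log 2 are both constants, the right-hand side is linear in n, which matches the theta(n) on the left.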