SRINI DEVADAS: Good morning, everyone. Thanks for making it here through the snow and sleet. You will be, quote unquote, rewarded with a cool little puzzle that has both recreational value and a deep connection to the most popular sorting algorithm, or at least the most commonly used sorting algorithm, called quicksort.

And so what we have up here is what I call the disorganized handyman puzzle. This is not a puzzle of my invention, but I called it this because I think at some point when I read this, it was a carpenter who had a bunch of nuts and bolts in his bag, and they were all mixed up. So rather than having these nuts attached to the bolts, he was disorganized and the nuts and bolts were separate. And then they all got mixed up together in a bag, OK?

Now, obviously, all puzzles are a little bit contrived. And so we're going to assume here that there are 100 different nuts and 100 associated bolts. And these 200 objects are all mixed up in this one bag that the carpenter is carrying. And not only that, each of the bolts is unique in terms of its size.
And as I mentioned, there is a nut associated with each unique bolt. And so, as you can imagine, the goal of this puzzle is going to be finding the most efficient way of creating some organization in this bag by attaching each nut to the corresponding bolt, or vice versa. And you can assume that there's no ambiguity because of the uniqueness of the nuts and the bolts. There are 100 pairs waiting to be discovered, and you can try to make that process as efficient as possible.

As with all of the puzzles that we look at here, there's going to be a naive, or straightforward, way of doing this. You're going to go ahead and figure that out pretty quickly. We're going to analyze the complexity of that, or how long it takes for a specific example of 100 nuts and 100 bolts, and then we're going to scratch our heads and try to do better, all right? And as I mentioned, obviously, there's going to be some programming associated with this. We're not going to represent nuts and bolts in programs or code. We're going to switch gears and talk about sorting once we're done with this puzzle. Good.
The only way that you or the carpenter (if you're the carpenter's helper) have of checking whether a nut attaches to a bolt is to try it out. And the nuts and bolts are different enough in size that even if you had your eyes closed or it was a dark room, if the nut attaches to the bolt, it would screw on perfectly. If the bolt is smaller than the nut, it would just go through, and it would be obvious that the bolt is smaller than the nut. And if the nut is smaller than the bolt, it would also be obvious, because the bolt won't go through it, which makes perfect sense from a physical standpoint, right? And you've all tried this before. Well, maybe not all of you, but I've certainly had occasion to discover pairs of nuts and bolts, though it was never 100 of them. So that's kind of a little bit contrived, as I said. All right, good. So the setup is clear? We're good on the setup? Excellent.

So what is the straightforward way of doing this? Someone. Yeah, Fadi back there.

AUDIENCE: So like, choose a nut at random and try every different bolt in turn.
SRINI DEVADAS: Right, we could choose a nut at random and try every different bolt, or choose a bolt at random and try every different nut. And we're guaranteed, in this case, that there is a pairing. And so after we do that, we can put this paired nut and bolt aside, and we've shrunk the problem down to one less than the original problem. If I started out with 100 nuts and 100 bolts that are not paired together, then I get one pair which is a correct pair, and I have 99 nuts left and 99 bolts left. And I keep going. Perfectly reasonable way of doing things.

And let me show you how that would work. These slides were made by my daughters many years ago, you know, back when they listened to me. I have a fat chance of getting them to do any work for me anymore. But they are busier than they were years ago, I guess. At least they pretend to be.

And there you go. In this particular example, which obviously has many fewer than 100 nuts and bolts, you end up checking this. And in the worst case, how many comparisons would I have had to do if I had 100 nuts and 100 bolts?
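Here is a minimal Python sketch of the straightforward strategy, assuming nuts and bolts are modeled as integer sizes and that a nut fits a bolt exactly when the sizes are equal. The function name `naive_match` is illustrative, not from the lecture. It counts every trial, including the final "confirmation" comparison at each step.

```python
def naive_match(nuts, bolts):
    """Pair each bolt with its nut by trying nuts one at a time.

    Nuts and bolts are modeled as integer sizes (an assumption); a nut
    fits a bolt exactly when the sizes are equal. Returns (pairs, comparisons).
    """
    remaining = list(nuts)
    pairs, comparisons = [], 0
    for bolt in bolts:
        for i, nut in enumerate(remaining):
            comparisons += 1          # one nut-bolt trial
            if nut == bolt:           # screws on: a correct pair
                pairs.append((nut, bolt))
                del remaining[i]      # the problem shrinks by one
                break
    return pairs, comparisons


sizes = list(range(1, 101))
# Worst case: each bolt's nut is the last one tried.
_, worst = naive_match(sizes, list(reversed(sizes)))
# Best case: each bolt's nut is the first one tried.
_, best = naive_match(sizes, sizes)
print(worst, best)  # 5050 100
```

In the worst case the first bolt needs 100 trials, the next 99, and so on down to 1.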
In the worst case, I would have to--

AUDIENCE: Do all 100.

SRINI DEVADAS: I have to do all 100, because I might just have gotten unlucky. Like I said, we're not eyeballing this at all. You could probably prune the search a little bit by putting things aside without having to do an exact comparison. But let's just say that even looking at a nut, as opposed to touching it with the bolt that you've chosen, is, in fact, a comparison. You're just eyeballing it as opposed to physically touching the bolt and nut together, and we're going to call that a comparison. And if you make that assumption, then obviously, you're going to have to look at all 100 nuts for a random bolt, in the worst case, OK? You might get lucky. On average, it might be only 50. We're not going to do that type of analysis here.

So that's good. And what is the complexity, in terms of the growth rate, of this particular algorithm? If I had N nuts and bolts, then I do N comparisons in the worst case to find the first pair.
And when I say first pair, I mean the correct pairing that's associated with the nut and the bolt. And what happens-- yeah, Kevin, you have a question?

AUDIENCE: Wouldn't it be less than N squared, because once you find each pair, you have one less?

SRINI DEVADAS: That's exactly right. It's absolutely less than N squared in terms of numerics. The growth rate is a slightly different question, right? It's related, obviously. And so if you do this and you get the first pair, you now have a problem that is of size N minus 1, right? And so in this case, how many comparisons am I going to make when I have N minus 1 nuts and N minus 1 bolts, in the worst case? I'm going to do N minus 1 comparisons, right? And then I keep going down.

And you could argue that at the very end, when I have two nuts and two bolts, one comparison is enough. If I assume that this was a perfect set of nuts and bolts, that we had all pairs right at the beginning, you could argue that that small problem corresponding to two nuts and bolts can be solved using one comparison. And you immediately know which nut pairs with which bolt, and the other one as well.
But let's call it a confirmation comparison, and essentially say you need two comparisons here and one here. You can obviously shave the number down a little bit. But this goes back to-- Kevin is right. It's less than N squared if you look at it numerically. But the growth rate is N squared, because we know that N + (N - 1) + (N - 2) + ... + 2 + 1 is N(N + 1)/2. And sure, you could take this 1 off and this would become N(N - 1)/2, but that growth rate is N squared. It grows as N squared.

So if you had 100 nuts and 100 bolts, you're talking about something of the order of 5,000 comparisons, if you want to do it numerically. And it grows as N squared. If you had 1,000, then it would be 1,000 times 1,000, which is 1 million, divided by 2; that's 500,000. So that is astronomical in terms of its growth. You don't want to do that. Obviously it's a contrived problem, but N squared, when you talk about manual labor, is generally not very good. So we'd like to improve this.
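Written out, the sum discussed above, together with the "take the 1 off" variant that skips the final confirmation comparison at each step, is:

$$
N + (N-1) + \cdots + 2 + 1 = \sum_{k=1}^{N} k = \frac{N(N+1)}{2},
\qquad
\sum_{k=1}^{N} (k-1) = \frac{N(N-1)}{2}.
$$

For $N = 100$ these give 5,050 and 4,950 (the "order of 5,000" above); for $N = 1000$ they give 500,500 and 499,500. Both are about $N^2/2$, which is why the growth rate is $N^2$ either way.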
And we've been talking about recursion. We've been talking about divide and conquer. This is not divide and conquer in the true sense, although you are going to a smaller problem, admittedly. But recall I said that divide and conquer is usually used when you break the problem up into fractional pieces. In mergesort, for example, or in the tiling puzzle, where we took the courtyard and broke it up into four courtyards. So essentially, we had four quarters in the case of the tiling puzzle. We had two halves in the case of mergesort. Here we are solving, basically, one nut and bolt in the original puzzle, and then we're going from N to N minus 1, and so there are many more steps, right?

The one thing to remember when you think about complexity is this. When you go from N to N minus 1 and keep going all the way down to 1, there's a linear number of steps. But when you go from N to N/2 to N/4, and you keep going all the way down to 1, how many steps does it take to get to 1? If you start with N, how many steps do you have?
If this were 64-- or let's just take a simpler one. If this were 4, how many steps would I have? I would have two steps. If this were 8, I'd have three steps. And so what's that formula?

AUDIENCE: Logarithm.

SRINI DEVADAS: Logarithm. Log to the base 2, right? So that's the power of fractional sizes. And this is actually a very fundamental notion that is going to appear over and over if you do any algorithms work or take any algorithms classes: divide and conquer is very efficient, because the number of steps needed to get down to small problems is relatively small. It's a logarithm. Now, if you broke this up into three parts and went to N/3, et cetera, then this would be log to some other base, log to the base 3. So log to the base 2 came because we broke it up and went down to half the size each time.

Now remember, of course, it's not that the complexity here is just logarithmic. It's that you do have two problems. This is a function of the particular algorithm that corresponds to divide and conquer.
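A small Python sketch of the step counting above: it repeatedly cuts a problem of size `n` down to a fraction of its size and counts the cuts until size 1. The name `steps_to_one` and the round-down behavior are illustrative assumptions; for exact powers of the base the count equals the logarithm.

```python
def steps_to_one(n, parts=2):
    """Count how many times a problem of size n can be cut down to
    1/parts of its size (rounding down) before it reaches size 1."""
    steps = 0
    while n > 1:
        n //= parts
        steps += 1
    return steps


print(steps_to_one(4), steps_to_one(8), steps_to_one(64))  # 2 3 6
print(steps_to_one(27, parts=3))                           # 3
```

By contrast, shrinking a problem of size 64 by one at a time takes 63 steps.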
But in the case of mergesort, it's not that we just went from N to N/2. Well, we went from N to N/2, but there were two problems of size N/2. But the beauty of divide and conquer is this. Let's go back to nuts and bolts. Suppose I magically broke up the problem of size N into two problems of size N/2, so each problem has N/2 nuts and N/2 bolts. I'll repeat that: each problem has N/2 nuts and N/2 bolts. And this is magical, right? I don't quite know how to do that yet. But if I did that, then think about what happens with respect to these comparisons.

So I said when N was 100, I needed about 5,000 comparisons for this naive algorithm. But now if I did this and I got two N/2's, then I have 50; call that M equals 50. And then I have another one which is M equals 50. And so roughly how many comparisons would I need for a problem of size 50, 50 nuts and 50 bolts, using our original naive strategy? Roughly?

AUDIENCE: 1,200.

SRINI DEVADAS: 1,225, right? Roughly. And this one would be 1,225 too. And so, together, that is about 2,500.
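The arithmetic above, as a tiny Python check. It assumes the n(n-1)/2 count (skipping the confirmation comparison), which is the convention that gives 1,225 for 50 pairs. The "magic" split itself is treated as free here, which is exactly the part not yet explained. The name `naive_worst` is illustrative.

```python
def naive_worst(n):
    """Worst-case trials for the one-at-a-time strategy, skipping the final
    confirmation trial at each step: (n-1) + (n-2) + ... + 0 = n(n-1)/2."""
    return n * (n - 1) // 2


whole = naive_worst(100)        # one pile of 100 pairs
halves = 2 * naive_worst(50)    # two piles of 50, split "by magic" for free
print(whole, halves)            # 4950 2450
```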
So of course, I haven't quite told you how to do this. This is still magic. But I was upfront about it. But clearly, I've gotten an improvement if I have this magic, OK? And so let's talk about that. Let's take this nuts and bolts puzzle, or the solution to this puzzle, and try to figure out a divide and conquer strategy which is distinctly different from the brute force strategy that just reduced the problem by one, all right?

Now, this slide just continues what we've been going over. The number of comparisons in the worst case is what I just said, and it grows as N squared. So big O of N squared means it just grows as N squared. That's asymptotic notation that we won't go into, but you understand what it means now.

So suppose I do a straightforward divide and conquer, like I did with mergesort, where I just took the array and split it in half. If I take these nuts and bolts and separate the nuts out, and put 50 on this side and 50 on the other side, and take the bolts and put 50 on this side and 50 on the other side, is that going to work? No, that's not going to work.
And the reason is this. Suppose I do this arbitrary partition, and the two of you get together, you're friends and partners, and you say, let's do this in parallel and save some time. You're still going to be counting the number of comparisons, and so, obviously, you would also like to reduce the number of comparisons. But you can't even use a helper here, in the sense that if you split this up-- obviously, this was not 50 and 50, but three nuts and three bolts on one side and four and four on the other side. As you can see from this example, you had a situation where the matching nut was in the pile on the left for a bolt that was on the right-hand side. And that could happen not just for one pair, which alone would kill the process, but for many nut-bolt pairs. So we can't do that. We cannot use the straightforward divide and conquer approach.

So this brings us to the first interesting question that we have here, which is: how do we exploit the fact that all of you are going to help me? So I'm the disorganized carpenter or handyman.
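A quick Python sketch of why the arbitrary split fails, assuming integer sizes where equal sizes match. It splits the nuts and the bolts in half independently and counts the nuts whose bolt landed in the other pile. The helper names are illustrative, and the random seed is arbitrary.

```python
import random


def split_in_half(items):
    mid = len(items) // 2
    return items[:mid], items[mid:]


def stranded_nuts(nuts, bolts):
    """Split nuts and bolts in half independently and count the nuts whose
    matching bolt ended up in the other pile (equal sizes match)."""
    left_nuts, right_nuts = split_in_half(nuts)
    left_bolts, right_bolts = split_in_half(bolts)
    left_bolt_set, right_bolt_set = set(left_bolts), set(right_bolts)
    stranded = sum(1 for nut in left_nuts if nut not in left_bolt_set)
    stranded += sum(1 for nut in right_nuts if nut not in right_bolt_set)
    return stranded


rng = random.Random(0)
nuts, bolts = list(range(1, 101)), list(range(1, 101))
rng.shuffle(nuts)
rng.shuffle(bolts)
print(stranded_nuts(nuts, bolts))  # typically many stranded nuts, rarely zero
```

A single stranded nut is enough to make the two piles unsolvable on their own.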
And I'd like to break this up, first into two piles, such that I can send one or more of you off and say, OK, I can guarantee that this is a subproblem, in the sense that, just as the original problem had all of the bolts that matched all of the nuts in my pile, so does this one. I want to break it up into two problems-- and this is the magic that we have; we need to figure out this magic-- such that if I give you 50 nuts and 50 bolts, and I keep 50 nuts and 50 bolts, you can solve that problem. It is a problem. I mean, there's a matching there, right? For those 50 nuts, you have the 50 bolts in your pile. Same thing with me. So how do I do that? Yeah, go ahead, Josh.

AUDIENCE: I have a question. Can you compare the sizes of nuts to each other?

SRINI DEVADAS: No, you cannot. Yeah, good question. So the only thing you can do is take a nut and a bolt: if the bolt goes through, then the nut is bigger. If it doesn't, then the bolt is bigger. And if it fits exactly, you have a match, all right?
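One way to pin down the rule in Python: model the single allowed operation as a function with three outcomes, and provide nothing that compares two nuts or two bolts. The names `Fit` and `try_fit` and the integer-size model are assumptions made for illustration.

```python
from enum import Enum


class Fit(Enum):
    """The three outcomes of trying one nut on one bolt."""
    NUT_BIGGER = "bolt slides straight through"
    BOLT_BIGGER = "bolt won't go in"
    MATCH = "screws on perfectly"


def try_fit(nut_size: int, bolt_size: int) -> Fit:
    """The only allowed operation: one nut against one bolt.

    Sizes are modeled as integers (an assumption); there is deliberately
    no function for comparing two nuts or two bolts directly.
    """
    if nut_size > bolt_size:
        return Fit.NUT_BIGGER
    if nut_size < bolt_size:
        return Fit.BOLT_BIGGER
    return Fit.MATCH


print(try_fit(5, 3).value)  # bolt slides straight through
```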
Great. So yeah, go ahead, Ganatra.

AUDIENCE: So once you get one to fit, you can use that to sort of order the other ones. Because it fits in perfectly, right? So you just go through all of them, and if it goes through the hole, then you know those are smaller-- well, they're bigger. And then if it doesn't fit through, then you know that those--

SRINI DEVADAS: Right, great. Excellent. So Ganatra has discovered this notion of pivoting. And pivoting is essentially something that is best described here in the animation, and it gives you a divide and conquer strategy. Now, I answered Josh's question, and that was a key question. And I'm not going to violate the answer to that question by using some other strategy that does not correspond to this nut-bolt check, right? So the only thing I can do in this puzzle is a nut-bolt check. But I do get three possible outcomes whenever I do a nut-bolt check. I get the information about whether there's a perfect match, whether the nut is smaller, or whether the nut is bigger.
And that's all I need in order to do pivoting, right? And so what's going to happen here is I go ahead and choose an arbitrary bolt. And I don't actually need to find the match before I start this process. So it's a small variant on what Ganatra said, but it's really a pretty small variant. When I see that this is not a match, and in fact the nut is bigger, I put the nut in the right-hand pile. When I see that the nut is smaller, I put the nut in the left-hand pile. And if I see a match, I just put that nut aside. I don't put it in either of the piles, because it turns out I'm not going to get piles of 50 and 50; or, in this case, I'm actually going to get something like three and three, because I'm going to get one match out of it. So in general, I'm not necessarily going to get equal-size piles. That's actually a little bit of an issue, and we'll get back to that maybe later. But I will get a match, and I'll get a pile on the left that is the perfect problem, a perfect subproblem.
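The first pivoting pass, sketched in Python under the assumption that sizes are unique integers. Each nut is tried once against the pivot bolt and goes left, right, or aside. `split_nuts` is an illustrative name, and the seven sizes are made up to mirror the slide example's seven pairs.

```python
def split_nuts(nuts, pivot_bolt):
    """Partition nuts around one pivot bolt, using only nut-bolt trials.

    Sizes are modeled as unique integers (an assumption). Returns
    (smaller_nuts, matching_nut, bigger_nuts).
    """
    smaller, bigger, matching = [], [], None
    for nut in nuts:
        if nut < pivot_bolt:       # bolt won't go in: nut is smaller
            smaller.append(nut)
        elif nut > pivot_bolt:     # bolt slides through: nut is bigger
            bigger.append(nut)
        else:                      # screws on: set it aside, in neither pile
            matching = nut
    return smaller, matching, bigger


# Seven nuts with made-up sizes; the pivot bolt has size 4.
print(split_nuts([5, 2, 7, 4, 1, 6, 3], 4))  # ([2, 1, 3], 4, [5, 7, 6])
```

With a middle-sized pivot, the six remaining nuts split three and three, as in the example.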
I can hand it off to any of you, and you can go off and solve it, and it will work. And I'll get a pile on the right, same thing for that. So this was a match, but I don't stop. I don't put the nut in either of the two piles. It's just going to stay up there. And then I keep going, and I make up my right pile and my left pile. Now, I'm not quite done yet. What do I do now? Yeah, back there.

AUDIENCE: You would test every screw in that nut to see which one--

SRINI DEVADAS: That's right. So remember, I kept the-- I guess you used a different term here. I said bolt, you said screw. That's fine. But we're talking about a nut here, so let's go with the nut. This nut that I put aside, I'm going to compare not with the pivot bolt that I picked, which I now set aside and never use again, but with the other bolts. I'm going to compare each of the bolts with this pivot nut that I have. So the pivot bolt was picked, and I discovered the associated pivot nut.
And now I'm going to use that pivot nut, which is the nut that I have up here, the light green one, and I'm going to do the same thing. And all of this does not violate the answer to Josh's question. And it's going to do exactly the right thing in terms of giving me two piles that are beautiful problems that are going to be solvable.

There's nothing stopping me from repeating this process. When we talk about divide and conquer, usually we talk about recursion, and we talk about repeatedly doing divide and conquer. I wrote this up here in terms of the problem of size N turned into two problems of size N/2. Well, each of those N/2 problems I could turn into two N/4 problems, so I could have four N/4's, and so on and so forth. And so I could clearly do that. And especially if N is large, I want to get a reduction in comparisons. And by the way, I'd like to get this to grow-- and we won't do this analysis-- not as N squared, but as N log N. And that's the kind of asymptotic analysis that you'll have to do if you take a class like 6.006.
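Putting both pivoting passes together with recursion gives a sketch like the following. It assumes unique integer sizes, a pivot bolt chosen at random (the lecture only says "arbitrary"), and a counter for nut-bolt trials. `match_nuts_and_bolts` is an illustrative name, not code from the course.

```python
import random


def match_nuts_and_bolts(nuts, bolts, rng):
    """Recursive pivoting for the nuts-and-bolts puzzle.

    Sizes are modeled as unique integers (an assumption); only nut-versus-bolt
    comparisons are made. Returns (pairs, comparisons).
    """
    if not nuts:
        return [], 0
    i = rng.randrange(len(bolts))          # random pivot bolt
    pivot_bolt = bolts[i]
    comparisons = 0

    # Pass 1: try the pivot bolt in every nut.
    small_nuts, big_nuts, pivot_nut = [], [], None
    for nut in nuts:
        comparisons += 1
        if nut < pivot_bolt:
            small_nuts.append(nut)
        elif nut > pivot_bolt:
            big_nuts.append(nut)
        else:
            pivot_nut = nut

    # Pass 2: try every other bolt in the pivot nut.
    small_bolts, big_bolts = [], []
    for j, bolt in enumerate(bolts):
        if j == i:
            continue                        # the pivot bolt is already paired
        comparisons += 1
        if bolt < pivot_nut:                # bolt slides through the pivot nut
            small_bolts.append(bolt)
        else:                               # bolt won't go in
            big_bolts.append(bolt)

    # Each pile is a self-contained subproblem: recurse on both.
    left, left_cost = match_nuts_and_bolts(small_nuts, small_bolts, rng)
    right, right_cost = match_nuts_and_bolts(big_nuts, big_bolts, rng)
    pairs = left + [(pivot_nut, pivot_bolt)] + right
    return pairs, comparisons + left_cost + right_cost


rng = random.Random(2024)
nuts, bolts = list(range(1, 101)), list(range(1, 101))
rng.shuffle(nuts)
rng.shuffle(bolts)
pairs, cost = match_nuts_and_bolts(nuts, bolts, rng)
print(cost, "comparisons vs", 100 * 99 // 2, "for the naive worst case")
```

Every pair in the left pile is smaller than the pivot, and every pair in the right pile is larger. As a result, the pairs come out in increasing order of size, a first hint of the connection to quicksort mentioned at the start. With random pivots the count is usually well below the naive worst case, but no particular number is guaranteed.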
444 00:24:57,420 --> 00:25:01,830 But the general sense that you should take away from this 445 00:25:01,830 --> 00:25:05,040 is that you're going to have a logarithmic number of steps. 446 00:25:05,040 --> 00:25:07,680 And obviously, that doesn't imply a logarithmic number 447 00:25:07,680 --> 00:25:10,950 of comparisons, because the number of subproblems in each 448 00:25:10,950 --> 00:25:12,630 of those steps is doubling. 449 00:25:12,630 --> 00:25:15,390 Initially, you had a problem, then you have two subproblems, 450 00:25:15,390 --> 00:25:17,010 and then you have four subproblems. 451 00:25:17,010 --> 00:25:22,380 Each of them is small in size, but together, 452 00:25:22,380 --> 00:25:27,390 assuming these problems are N/2 and N/2, then 453 00:25:27,390 --> 00:25:30,210 you can get something that's substantially smaller than N 454 00:25:30,210 --> 00:25:33,690 squared, namely N log N, OK? 455 00:25:33,690 --> 00:25:35,070 And you can do that numerically. 456 00:25:35,070 --> 00:25:37,290 And I don't want to get into that too much, 457 00:25:37,290 --> 00:25:40,380 but I'd be happy to talk to you about this after lecture 458 00:25:40,380 --> 00:25:43,640 or during office hours. 459 00:25:43,640 --> 00:25:46,370 There are a couple of caveats. 460 00:25:46,370 --> 00:25:49,490 One of them is there's no guarantee here 461 00:25:49,490 --> 00:25:53,510 if I pick a random pivot bolt that I'm actually 462 00:25:53,510 --> 00:25:57,980 going to get N/2 and N/2. 463 00:25:57,980 --> 00:25:59,930 It's going to be one less than that. 464 00:25:59,930 --> 00:26:02,960 Maybe it'd be N/2 and N/2 minus 1. 465 00:26:02,960 --> 00:26:07,250 I could get something-- if I picked a large pivot bolt, 466 00:26:07,250 --> 00:26:13,010 I might get a very skewed pair of piles, right?
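To put numbers on N squared versus N log N, here is a rough Python sketch that counts comparisons for a quicksort-style pivoting recursion. It assumes each pivoting step on m items costs m - 1 comparisons (the pivot against everything else) and that a chosen rule decides how the remaining items split into two piles. The names `count_comparisons`, `balanced`, `skewed_80_20`, and `worst_case` are hypothetical. The nuts-and-bolts version needs two passes per step (nuts against the pivot bolt, then bolts against the pivot nut), which roughly doubles every count without changing how it grows.

```python
def count_comparisons(n, left_size):
    """Comparisons needed to fully pivot-sort n distinct items.

    A pivoting step on m items compares the pivot with the other m - 1 items;
    left_size(m) decides how many of those m - 1 land in the left pile.
    """
    if n <= 1:
        return 0
    left = left_size(n)
    right = n - 1 - left
    return (n - 1) + count_comparisons(left, left_size) + count_comparisons(right, left_size)


def balanced(m):
    return (m - 1) // 2          # piles of about N/2 and N/2


def skewed_80_20(m):
    return (m - 1) * 8 // 10     # piles of about 80% and 20%


def worst_case(m):
    return m - 1                 # the pivot is always the largest item


for name, rule in [("balanced", balanced), ("80/20", skewed_80_20), ("worst case", worst_case)]:
    print(name, count_comparisons(100, rule))
```

For N = 100, perfectly balanced piles come to 480 comparisons, while always picking the largest pivot gives 100 * 99 / 2 = 4,950; an 80/20 split lands in between.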
467 00:26:13,010 --> 00:26:15,227 One of them could have 80 in them-- 468 00:26:15,227 --> 00:26:17,060 80 nuts and bolts in them, and the other one 469 00:26:17,060 --> 00:26:19,880 could have 20 nuts and bolts in them, OK? 470 00:26:19,880 --> 00:26:23,150 So that's something to worry about. 471 00:26:23,150 --> 00:26:25,280 We won't worry about that. 472 00:26:25,280 --> 00:26:27,719 But we'll talk about it when we move 473 00:26:27,719 --> 00:26:29,510 to sorting, which is what we're going to do 474 00:26:29,510 --> 00:26:31,980 in just a couple of minutes. 475 00:26:31,980 --> 00:26:35,510 So do people buy the solution to the nuts and bolts puzzle? 476 00:26:35,510 --> 00:26:37,520 Clearly, it's going to give you some efficiency. 477 00:26:37,520 --> 00:26:40,660 If you pick a middling-size bolt, 478 00:26:40,660 --> 00:26:41,880 maybe you can eyeball that. 479 00:26:41,880 --> 00:26:45,980 It's a little bit of a violation of what we've talked about, 480 00:26:45,980 --> 00:26:46,940 but you could clearly-- 481 00:26:46,940 --> 00:26:50,120 I mean, let's assume that you can make out 482 00:26:50,120 --> 00:26:52,220 the difference between something that's as 483 00:26:52,220 --> 00:26:54,530 thick as this and that finger. 484 00:26:54,530 --> 00:26:56,520 And then you pick something in the middle. 485 00:26:56,520 --> 00:26:59,360 And then you can get two large piles 486 00:26:59,360 --> 00:27:02,510 that are roughly equal in size from the original really large 487 00:27:02,510 --> 00:27:03,320 pile. 488 00:27:03,320 --> 00:27:05,330 And at least at the beginning, you 489 00:27:05,330 --> 00:27:08,030 could probably get, in the context 490 00:27:08,030 --> 00:27:11,605 of this puzzle, piles that are roughly similar in size. 491 00:27:11,605 --> 00:27:12,980 And after that, it doesn't really 492 00:27:12,980 --> 00:27:14,900 matter once you've broken things up. 493 00:27:14,900 --> 00:27:18,407 So N is not 100; N is more like 5 or 10.
494 00:27:18,407 --> 00:27:20,990 At that point, it doesn't really matter what strategy you use, 495 00:27:20,990 --> 00:27:24,770 because the numbers aren't really different, regardless 496 00:27:24,770 --> 00:27:27,750 of the strategy. 497 00:27:27,750 --> 00:27:29,440 So good, all right. 498 00:27:29,440 --> 00:27:31,870 So I promised you a relationship to sorting. 499 00:27:31,870 --> 00:27:35,540 And obviously, we want to do some programming here. 500 00:27:35,540 --> 00:27:39,310 And it turns out that just like we 501 00:27:39,310 --> 00:27:41,200 had mergesort, which was a divide 502 00:27:41,200 --> 00:27:43,630 and conquer algorithm, it turns out 503 00:27:43,630 --> 00:27:49,360 this pivoting strategy turns into a strategy for divide 504 00:27:49,360 --> 00:27:52,300 and conquer that is quite different from mergesort, 505 00:27:52,300 --> 00:27:57,490 but is equally applicable to sorting numbers. 506 00:27:57,490 --> 00:28:01,510 And so let me remind you what mergesort was. 507 00:28:01,510 --> 00:28:04,420 And then I'm going to contrast mergesort 508 00:28:04,420 --> 00:28:09,910 with quicksort, which is a pivot-based sorting algorithm. 509 00:28:09,910 --> 00:28:11,830 So forget about nuts and bolts. 510 00:28:11,830 --> 00:28:17,090 We're now down to boring numbers, integers. 511 00:28:17,090 --> 00:28:22,000 And we want to just sort these numbers in ascending order. 512 00:28:22,000 --> 00:28:35,640 So if I have a bunch of numbers and I 513 00:28:35,640 --> 00:28:40,800 want to sort these in ascending order, mergesort would say, 514 00:28:40,800 --> 00:28:48,090 I'm going to go ahead and break it up into halves. 515 00:28:48,090 --> 00:28:50,580 And let's just do one level of recursion. 516 00:28:50,580 --> 00:28:52,650 With all of these things you can always do more, 517 00:28:52,650 --> 00:28:54,810 but these problems aren't large enough 518 00:28:54,810 --> 00:28:58,860 that you really want to do that in this example.
519 00:28:58,860 --> 00:29:01,920 And the important thing is you just broke it in half. 520 00:29:01,920 --> 00:29:02,880 But it was easy. 521 00:29:02,880 --> 00:29:04,680 The split was easy. 522 00:29:04,680 --> 00:29:09,030 You slice the list, splice the list, what have you. 523 00:29:09,030 --> 00:29:13,860 And you assume that somehow you can sort this 524 00:29:13,860 --> 00:29:14,870 in ascending order. 525 00:29:14,870 --> 00:29:21,580 And in this case, you need to go to that. 526 00:29:21,580 --> 00:29:23,520 And then you sort this in ascending order. 527 00:29:23,520 --> 00:29:26,680 So you go like that. 528 00:29:26,680 --> 00:29:29,160 And then you need to do a merge. 529 00:29:29,160 --> 00:29:32,880 And we used what we call the two-finger algorithm to do 530 00:29:32,880 --> 00:29:35,970 the merge that essentially says, I'm 531 00:29:35,970 --> 00:29:38,850 going to assume that this array is sorted 532 00:29:38,850 --> 00:29:41,400 and this array is sorted, and I'm 533 00:29:41,400 --> 00:29:48,210 going to put pointers up at the beginning of these two arrays. 534 00:29:48,210 --> 00:29:50,322 And I'm going to do comparisons. 535 00:29:50,322 --> 00:29:51,780 And I'm essentially going to assume 536 00:29:51,780 --> 00:29:59,490 that I have a blank array of size six, blank list of size 537 00:29:59,490 --> 00:30:02,080 six. 538 00:30:02,080 --> 00:30:05,220 And I'm going to be writing into this blank array 539 00:30:05,220 --> 00:30:09,450 the result of the comparison, that is, whichever one is 540 00:30:09,450 --> 00:30:10,950 less in each comparison. 541 00:30:10,950 --> 00:30:12,840 So I'm going to get minus 31. 542 00:30:12,840 --> 00:30:14,940 And then I compare 0 with minus 4, 543 00:30:14,940 --> 00:30:19,140 and I'm going to get minus 4, 0, et cetera, et cetera. 544 00:30:19,140 --> 00:30:20,220 So that was our merge.
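For comparison, a short Python sketch of mergesort with the two-finger merge, assuming plain Python lists. The function names `merge` and `mergesort` are illustrative, and the six numbers in the usage line are made up apart from the -31, -4, and 0 mentioned above. The new `result` list plays the role of the "blank list" that the merge writes into.

```python
def merge(left, right):
    """Two-finger merge of two ascending lists into a new list."""
    result = []          # the auxiliary "blank list"
    i = j = 0            # one finger per input list
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # One list is exhausted; copy whatever remains of the other.
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def mergesort(lst):
    """Easy divide (slice in half); the real work happens in the merge."""
    if len(lst) <= 1:
        return list(lst)
    mid = len(lst) // 2
    return merge(mergesort(lst[:mid]), mergesort(lst[mid:]))


print(mergesort([0, 65, -31, -4, 2, 99]))  # [-31, -4, 0, 2, 65, 99]
```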
545 00:30:20,220 --> 00:30:21,840 And the important thing to remember 546 00:30:21,840 --> 00:30:28,170 is that our mergesort algorithm had an easy divide step, 547 00:30:28,170 --> 00:30:32,130 and it had a more difficult step that 548 00:30:32,130 --> 00:30:36,420 corresponded to taking the subarrays that were sorted, 549 00:30:36,420 --> 00:30:39,510 and the work was all in the merge, 550 00:30:39,510 --> 00:30:40,680 putting the things together. 551 00:30:40,680 --> 00:30:44,080 Because the divide, obviously, is just a chop. 552 00:30:44,080 --> 00:30:48,630 Now, as you can see from the nuts and bolts, 553 00:30:48,630 --> 00:30:51,060 division was non-trivial, right? 554 00:30:51,060 --> 00:30:54,120 Division required pivoting. 555 00:30:54,120 --> 00:30:56,850 But there was no real merge after that. 556 00:30:56,850 --> 00:31:00,630 And I handed one side off, let's say, to Ganatra, 557 00:31:00,630 --> 00:31:04,620 these 50 nuts and bolts, after I did the work in pivoting. 558 00:31:04,620 --> 00:31:08,609 And I kept 50 nuts and bolts as well. 559 00:31:08,609 --> 00:31:09,150 We were done. 560 00:31:09,150 --> 00:31:11,760 I mean, maybe I wanted the nuts and bolts back. 561 00:31:11,760 --> 00:31:13,530 I don't want him to run away with it. 562 00:31:13,530 --> 00:31:16,980 But it wasn't like I had to process anything 563 00:31:16,980 --> 00:31:18,822 that he gave me back, right? 564 00:31:18,822 --> 00:31:20,280 If he paired them up, then I didn't 565 00:31:20,280 --> 00:31:21,960 have to process that, right? 566 00:31:21,960 --> 00:31:24,788 So this is mergesort. 567 00:31:28,810 --> 00:31:34,120 The quicksort algorithm, which I'll now describe to you, 568 00:31:34,120 --> 00:31:37,960 is divide and conquer, two-way divide and conquer, 569 00:31:37,960 --> 00:31:42,340 same as mergesort, but it kind of flips the work.
570 00:31:42,340 --> 00:31:45,790 And it does more work up front before the division, 571 00:31:45,790 --> 00:31:49,760 and then there's little or no work at the end. 572 00:31:49,760 --> 00:31:51,700 So what happens here is-- 573 00:31:51,700 --> 00:31:55,060 the way we're going to think about quicksort 574 00:31:55,060 --> 00:31:58,900 is we're going to choose a pivot. 575 00:31:58,900 --> 00:32:00,890 You can have some array here. 576 00:32:00,890 --> 00:32:05,710 And I can go ahead and just call these a, b, c, d, e, f, g. 577 00:32:05,710 --> 00:32:09,420 And I'm going to choose a pivot. 578 00:32:09,420 --> 00:32:12,440 And in the code that I'm going to show you, 579 00:32:12,440 --> 00:32:15,460 we're just going to go ahead and choose the last element. 580 00:32:15,460 --> 00:32:17,530 We assume that this is in random order. 581 00:32:17,530 --> 00:32:20,170 We're going to choose the last element of this array 582 00:32:20,170 --> 00:32:22,000 as the pivot. 583 00:32:22,000 --> 00:32:25,660 And what did we do with the pivot back in our nuts 584 00:32:25,660 --> 00:32:27,110 and bolts example? 585 00:32:27,110 --> 00:32:31,736 We-- someone, what did we do with the pivot? 586 00:32:34,640 --> 00:32:35,659 Yeah, go ahead. 587 00:32:35,659 --> 00:32:36,950 AUDIENCE: We spliced around it. 588 00:32:36,950 --> 00:32:39,366 SRINI DEVADAS: Yeah, you just basically spliced around it. 589 00:32:39,366 --> 00:32:40,780 You compared it with-- 590 00:32:40,780 --> 00:32:43,690 we took the pivot bolt, and in the case of the puzzle, 591 00:32:43,690 --> 00:32:44,940 you compared it with the nuts. 592 00:32:44,940 --> 00:32:47,274 Here we don't have nuts and bolts, we just have numbers. 593 00:32:47,274 --> 00:32:48,856 And so you're going to take that pivot 594 00:32:48,856 --> 00:32:51,230 and you're going to start comparing it with the other numbers.
595 00:32:51,230 --> 00:32:54,350 And you're exactly right in that we're going to get to the point 596 00:32:54,350 --> 00:32:56,660 where we have to splice around it, 597 00:32:56,660 --> 00:33:00,230 which isn't your standard Python slicing, 598 00:33:00,230 --> 00:33:03,860 because that requires contiguous locations. 599 00:33:03,860 --> 00:33:06,770 But what you want to do is you want to get to a situation 600 00:33:06,770 --> 00:33:09,290 where you have something like this, where you have 601 00:33:09,290 --> 00:33:13,160 g somewhere in the middle and all elements 602 00:33:13,160 --> 00:33:15,230 less than g are to the left. 603 00:33:15,230 --> 00:33:17,960 And let's assume they're all unique elements. 604 00:33:17,960 --> 00:33:21,980 All elements greater than g are to the right, OK? 605 00:33:21,980 --> 00:33:26,330 So this is exactly pivoting around g. 606 00:33:26,330 --> 00:33:34,390 So it's referred to as pivoting around g. 607 00:33:34,390 --> 00:33:38,470 Now, the nice thing here, when you do this, 608 00:33:38,470 --> 00:33:44,530 is that you can go off and you can fix the location of g. 609 00:33:44,530 --> 00:33:49,630 In fact, g's location is settled, just like the pair-- 610 00:33:49,630 --> 00:33:54,900 the nut-bolt pair corresponding to the pivot nut 611 00:33:54,900 --> 00:33:57,881 and the pivot bolt was determined during the pivoting 612 00:33:57,881 --> 00:33:58,380 step. 613 00:33:58,380 --> 00:34:02,420 And you never had to check whether-- 614 00:34:02,420 --> 00:34:04,170 you never had to discover that pair again. 615 00:34:04,170 --> 00:34:06,240 You can put that pair aside. 616 00:34:06,240 --> 00:34:09,330 The location of g, the pivot chosen, 617 00:34:09,330 --> 00:34:13,650 in the final sorted array, is determined 618 00:34:13,650 --> 00:34:15,989 by this pivoting step, OK? 619 00:34:15,989 --> 00:34:18,020 That makes sense?
620 00:34:18,020 --> 00:34:21,330 And so this location, whatever that index is, 621 00:34:21,330 --> 00:34:23,270 it might be right in the middle. 622 00:34:23,270 --> 00:34:24,949 It might be a little bit skewed. 623 00:34:24,949 --> 00:34:26,750 But that location is determined. 624 00:34:26,750 --> 00:34:31,730 And if you did this with roughly equal piles, if you will, 625 00:34:31,730 --> 00:34:35,090 or equal sides, if there were 100 elements here, 626 00:34:35,090 --> 00:34:38,980 you might see 50-odd here and 40-odd over there. 627 00:34:38,980 --> 00:34:42,620 Now, of course, if g happened to be the largest number, 628 00:34:42,620 --> 00:34:46,320 then g would be all the way to the right. 629 00:34:46,320 --> 00:34:47,670 So there's that to worry about. 630 00:34:50,690 --> 00:34:53,810 But this is the divide step. 631 00:34:53,810 --> 00:34:57,290 So the divide step is the pivoting step. 632 00:34:57,290 --> 00:35:07,000 And then you can go off and sort each of those sublists. 633 00:35:07,000 --> 00:35:09,010 Because those less than g are not 634 00:35:09,010 --> 00:35:13,060 necessarily in ascending order. 635 00:35:13,060 --> 00:35:14,390 You just dumped them. 636 00:35:14,390 --> 00:35:16,160 There's still work to be done. 637 00:35:16,160 --> 00:35:17,890 And then the greater than g, likewise. 638 00:35:17,890 --> 00:35:19,820 They're not necessarily in ascending order. 639 00:35:19,820 --> 00:35:23,530 But you know that there's 42 elements corresponding 640 00:35:23,530 --> 00:35:25,937 to less than g, and you can go sort that array. 641 00:35:25,937 --> 00:35:27,520 And those are going to be the first 42 642 00:35:27,520 --> 00:35:30,310 elements of your final array. 643 00:35:30,310 --> 00:35:34,456 And the g is going to be at the 43rd position. 644 00:35:34,456 --> 00:35:36,580 And then the ones that are greater than g are going 645 00:35:36,580 --> 00:35:37,840 to be to the right of that. 646 00:35:37,840 --> 00:35:40,300 All make sense? 
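A minimal out-of-place Python sketch of this divide step, assuming the last element is the pivot as described above. It builds new lists for clarity, so it is not an in-place sort; `quicksort_simple` is a hypothetical name. All the work happens before the recursive calls, and the pivot lands at index `len(less)`, which is exactly its position in the final sorted output.

```python
def quicksort_simple(lst):
    """Return a sorted copy of lst, pivoting on the last element."""
    if len(lst) <= 1:
        return list(lst)
    pivot = lst[-1]
    rest = lst[:-1]
    less = [x for x in rest if x < pivot]    # ends up to the left of the pivot
    more = [x for x in rest if x >= pivot]   # ends up to the right (ties included)
    # The pivot's final index is len(less); no merge step is needed afterwards.
    return quicksort_simple(less) + [pivot] + quicksort_simple(more)


print(quicksort_simple([4, 65, 2, -31, 0, 99, 83, 782, 1]))
```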
647 00:35:40,300 --> 00:35:41,556 Yeah, go ahead, Fadi. 648 00:35:41,556 --> 00:35:44,639 AUDIENCE: So like if we were very unlucky and at each step, 649 00:35:44,639 --> 00:35:46,555 we just picked the largest number, the largest 650 00:35:46,555 --> 00:35:48,060 number, the largest number, then it's 651 00:35:48,060 --> 00:35:49,160 got to be N squared, correct? 652 00:35:49,160 --> 00:35:49,630 SRINI DEVADAS: That's right. 653 00:35:49,630 --> 00:35:50,920 So that's pathological. 654 00:35:50,920 --> 00:35:54,130 Now, it turns out that, amazingly, in order 655 00:35:54,130 --> 00:35:57,190 to avoid that situation when people 656 00:35:57,190 --> 00:36:00,580 use quicksort in practice, which is essentially this algorithm 657 00:36:00,580 --> 00:36:04,240 that I described, they take the array that they're given 658 00:36:04,240 --> 00:36:06,130 and they actually randomize it. 659 00:36:06,130 --> 00:36:08,450 They actually make it-- 660 00:36:08,450 --> 00:36:10,120 they shake it up. 661 00:36:10,120 --> 00:36:12,185 It's like you get these nuts and bolts, 662 00:36:12,185 --> 00:36:13,810 and you want to close your eyes and you 663 00:36:13,810 --> 00:36:16,210 want to pick up a nut that's middling in size, 664 00:36:16,210 --> 00:36:18,544 and maybe all the small ones are up at the top 665 00:36:18,544 --> 00:36:20,210 and the big ones are down at the bottom. 666 00:36:20,210 --> 00:36:21,940 You want things in the middle, so you 667 00:36:21,940 --> 00:36:23,200 go, you shake, shake, shake. 668 00:36:23,200 --> 00:36:24,179 You randomize. 669 00:36:24,179 --> 00:36:25,720 And then you go stick your hand in about 670 00:36:25,720 --> 00:36:28,090 halfway through the pile and pick up a nut, 671 00:36:28,090 --> 00:36:29,410 or pick up a bolt, right? 672 00:36:29,410 --> 00:36:31,380 That's kind of what happens here.
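A rough Python sketch of the "shake the bag first" idea: shuffle a copy of the input with `random.shuffle`, then run a last-element-pivot quicksort that also counts pivot comparisons. The names are hypothetical, and counting one comparison per element per pivoting step is a simplifying assumption. On already-sorted input the last element is always the largest, which is exactly the pathological case in Fadi's question.

```python
import random


def quicksort_count(lst):
    """Return (sorted copy of lst, pivot comparisons), pivoting on the last element."""
    if len(lst) <= 1:
        return list(lst), 0
    pivot, rest = lst[-1], lst[:-1]
    less = [x for x in rest if x < pivot]
    more = [x for x in rest if x >= pivot]
    sorted_less, c_less = quicksort_count(less)
    sorted_more, c_more = quicksort_count(more)
    # Count one comparison per remaining element against the pivot.
    return sorted_less + [pivot] + sorted_more, len(rest) + c_less + c_more


def randomized_quicksort(lst):
    """Shuffle a copy first ("shake the bag"), then sort it."""
    shuffled = list(lst)
    random.shuffle(shuffled)
    return quicksort_count(shuffled)


already_sorted = list(range(300))
_, naive_count = quicksort_count(already_sorted)
_, shuffled_count = randomized_quicksort(already_sorted)
print(naive_count, shuffled_count)
```

The first count is 300 * 299 / 2 = 44,850 comparisons; the second varies from run to run but is typically a small fraction of that.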
673 00:36:31,380 --> 00:36:33,400 So quicksort, it turns out-- 674 00:36:33,400 --> 00:36:36,160 and I'll just say this, and if this doesn't make sense, 675 00:36:36,160 --> 00:36:37,910 ask me later. 676 00:36:37,910 --> 00:36:40,900 But it should give you some intuition. 677 00:36:40,900 --> 00:36:46,720 The randomized quicksort, where you have this random input, 678 00:36:46,720 --> 00:36:49,240 and you have some probabilistic guarantee that you're 679 00:36:49,240 --> 00:36:53,920 picking a pivot that is middling in size in terms of an integer 680 00:36:53,920 --> 00:36:58,700 size, is going to be N log N in expected complexity. 681 00:36:58,700 --> 00:37:00,910 But the worst-case complexity of quicksort 682 00:37:00,910 --> 00:37:04,180 is exactly as you said, it's N squared. 683 00:37:04,180 --> 00:37:07,570 But that's really not something that you 684 00:37:07,570 --> 00:37:11,530 need to understand deeply to understand, well, 685 00:37:11,530 --> 00:37:14,800 either the puzzle or the rest of this lecture. 686 00:37:14,800 --> 00:37:16,534 You do get a sense of-- 687 00:37:16,534 --> 00:37:18,700 as long as you have a sense of if you pick something 688 00:37:18,700 --> 00:37:21,400 in the middle and these piles are roughly equal in size, 689 00:37:21,400 --> 00:37:23,200 I'm going to get some improvement. 690 00:37:23,200 --> 00:37:25,660 The guarantee that the piles are roughly 691 00:37:25,660 --> 00:37:29,950 equal in size, all the way down to the depths of recursion, 692 00:37:29,950 --> 00:37:33,880 is a difficult one to achieve in a deterministic way. 693 00:37:33,880 --> 00:37:37,780 But it's not hard to achieve in a probabilistic way 694 00:37:37,780 --> 00:37:39,289 by doing this randomization. 695 00:37:39,289 --> 00:37:40,330 You had another question? 696 00:37:40,330 --> 00:37:40,860 Yeah.
697 00:37:40,860 --> 00:37:43,235 AUDIENCE: So then why do we use this more than mergesort, 698 00:37:43,235 --> 00:37:43,860 which is like-- 699 00:37:43,860 --> 00:37:45,443 SRINI DEVADAS: Ah, that's exactly-- so 700 00:37:45,443 --> 00:37:48,300 that is the rest of the lecture, which only has 10 minutes left. 701 00:37:48,300 --> 00:37:51,750 But OK, so that's the rest of the lecture. 702 00:37:51,750 --> 00:37:55,810 So one of the things that I said here, 703 00:37:55,810 --> 00:38:00,460 I said a blank list of size six. 704 00:38:00,460 --> 00:38:03,480 So what mergesort requires, in order for it 705 00:38:03,480 --> 00:38:07,630 to be efficient, is auxiliary storage 706 00:38:07,630 --> 00:38:10,750 that corresponded to the size of the list. 707 00:38:10,750 --> 00:38:13,660 Because right at the first level of recursion, 708 00:38:13,660 --> 00:38:17,140 when you got two arrays that were N/2 and N/2, 709 00:38:17,140 --> 00:38:20,290 and they were both sorted, and you had to merge the two 710 00:38:20,290 --> 00:38:23,080 together to create the final result corresponding 711 00:38:23,080 --> 00:38:25,570 to the sorted array, you ended up 712 00:38:25,570 --> 00:38:29,140 requiring storage that was N in size 713 00:38:29,140 --> 00:38:32,740 in order to actually do this two-finger algorithm 714 00:38:32,740 --> 00:38:35,050 and compare the minus 31's with the zeros 715 00:38:35,050 --> 00:38:37,540 and then write them into this array. 716 00:38:37,540 --> 00:38:44,800 Now, the obvious way of taking this and going to here 717 00:38:44,800 --> 00:38:49,570 is code that I'm going to show you that is easy to write. 718 00:38:49,570 --> 00:38:55,516 And it requires a blank list as well.
719 00:38:55,516 --> 00:38:56,890 I mean, the easy way to do that-- 720 00:38:56,890 --> 00:38:59,380 and I'll just show you the code, because it's easier 721 00:38:59,380 --> 00:39:06,040 to show you the code as opposed to waving my hands here 722 00:39:06,040 --> 00:39:08,240 and potentially confusing people. 723 00:39:08,240 --> 00:39:13,600 But let me show you first the quicksort divide and conquer, 724 00:39:13,600 --> 00:39:15,910 which is absolutely trivial. 725 00:39:15,910 --> 00:39:17,640 I mean, just like mergesort was. 726 00:39:17,640 --> 00:39:22,660 But pivot partition is this step here. 727 00:39:22,660 --> 00:39:24,820 This is the pivoting step. 728 00:39:24,820 --> 00:39:28,630 And this is what that procedure does. 729 00:39:28,630 --> 00:39:30,950 So we'll get to that in just a second. 730 00:39:30,950 --> 00:39:34,420 But the rest of it is simply I'm going to go ahead, 731 00:39:34,420 --> 00:39:37,010 and once I do the pivot partition, 732 00:39:37,010 --> 00:39:39,290 I have something that looks like this. 733 00:39:39,290 --> 00:39:41,290 And then I just go off and run quicksort on this 734 00:39:41,290 --> 00:39:42,670 and quicksort on that. 735 00:39:42,670 --> 00:39:45,850 And I'm doing that on the array itself. 736 00:39:45,850 --> 00:39:47,980 It's just like I have this array, 737 00:39:47,980 --> 00:39:51,150 and I'm just saying I'm going to call quicksort 738 00:39:51,150 --> 00:39:53,560 with the indices that are associated 739 00:39:53,560 --> 00:39:55,841 with the beginning and the end of this array. 740 00:39:55,841 --> 00:39:57,340 And then I'm going to call quicksort 741 00:39:57,340 --> 00:40:00,460 with the indices associated with the beginning and the end 742 00:40:00,460 --> 00:40:02,860 of this array. 743 00:40:02,860 --> 00:40:07,960 And that same array is going to be my final output, right? 744 00:40:07,960 --> 00:40:10,470 So everything is done-- 745 00:40:10,470 --> 00:40:13,570 the term that's used here is "in place."
746 00:40:13,570 --> 00:40:15,550 So in-place sorting-- and insertion 747 00:40:15,550 --> 00:40:17,950 sort, which is also N squared, and a few other things 748 00:40:17,950 --> 00:40:21,100 are in-place sorting algorithms, which essentially say, look, 749 00:40:21,100 --> 00:40:25,960 I need to sort all of you in some way, 750 00:40:25,960 --> 00:40:28,240 perhaps by age or something. 751 00:40:28,240 --> 00:40:31,480 And I don't want another room, right? 752 00:40:31,480 --> 00:40:33,430 I don't want to say, who's the youngest here, 753 00:40:33,430 --> 00:40:35,342 and then go over to that room, and then 754 00:40:35,342 --> 00:40:37,800 who's the next youngest and go over to the room, et cetera. 755 00:40:37,800 --> 00:40:40,265 I just want you to start swapping positions here. 756 00:40:40,265 --> 00:40:41,890 And we're all going to be in this room, 757 00:40:41,890 --> 00:40:43,840 and somehow we're going to get sorted. 758 00:40:43,840 --> 00:40:45,260 So that's in place. 759 00:40:45,260 --> 00:40:49,030 And when I have an array that corresponds to integers, 760 00:40:49,030 --> 00:40:51,850 I don't want another blank list or blank array 761 00:40:51,850 --> 00:40:53,650 to use as storage. 762 00:40:53,650 --> 00:40:55,870 And so that answers your question 763 00:40:55,870 --> 00:40:58,810 at the top level as to why quicksort is used. 764 00:40:58,810 --> 00:41:01,780 And it's used because the memory requirements of quicksort 765 00:41:01,780 --> 00:41:03,610 are substantially less than the memory 766 00:41:03,610 --> 00:41:05,380 requirements of mergesort. 767 00:41:05,380 --> 00:41:09,280 And in fact, you can do this pivoting step 768 00:41:09,280 --> 00:41:14,375 using one integer as storage. 769 00:41:14,375 --> 00:41:16,000 You need the original array, of course, 770 00:41:16,000 --> 00:41:18,700 but you needed that to store all of the numbers.
771 00:41:18,700 --> 00:41:21,850 And the auxiliary storage, the extra storage that you need, 772 00:41:21,850 --> 00:41:23,785 is exactly the storage for one integer 773 00:41:23,785 --> 00:41:26,110 where you store the pivot, OK? 774 00:41:26,110 --> 00:41:28,200 And I'm going to show you the naive way, which 775 00:41:28,200 --> 00:41:30,970 is the regular quicksort. 776 00:41:30,970 --> 00:41:34,690 And that is a trivial pivot partition. 777 00:41:34,690 --> 00:41:37,070 It's not the clever one, which essentially says, 778 00:41:37,070 --> 00:41:38,620 look, I'm going to go ahead and I'm 779 00:41:38,620 --> 00:41:41,350 going to create a new array. 780 00:41:41,350 --> 00:41:44,110 In fact, I have two arrays that are less and more, 781 00:41:44,110 --> 00:41:45,190 or two lists. 782 00:41:45,190 --> 00:41:47,830 Together the sizes of less and more 783 00:41:47,830 --> 00:41:52,300 are going to be the size of the original, maybe minus 1. 784 00:41:52,300 --> 00:41:54,940 I'm adding to less and adding to more, 785 00:41:54,940 --> 00:41:57,460 depending on whether the element is less than the pivot 786 00:41:57,460 --> 00:41:59,060 or greater than the pivot. 787 00:41:59,060 --> 00:42:00,490 And then I'm good. 788 00:42:00,490 --> 00:42:03,790 So there's that algorithm, which would just be, oh yeah, 789 00:42:03,790 --> 00:42:06,010 I'm going to create a new list here. 790 00:42:06,010 --> 00:42:07,960 And I'm going to compare a to g. 791 00:42:07,960 --> 00:42:11,740 And then I'm going to compare b to g, et cetera, et cetera. 792 00:42:11,740 --> 00:42:13,870 So this does not give you anything. 793 00:42:13,870 --> 00:42:16,000 This is an uninteresting algorithm. 794 00:42:16,000 --> 00:42:17,830 I'm sorry. 795 00:42:17,830 --> 00:42:20,680 Implementation-- this is an uninteresting implementation, 796 00:42:20,680 --> 00:42:22,150 OK? 
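A minimal Python sketch of the two pieces just described: a `quicksort` driver that recurses on index ranges of the same list, and the straightforward (not in-place) pivot partition that collects `less` and `more` in temporary lists and writes them back. The names and the inclusive `start`/`end` convention are assumptions, not necessarily the book's code; sending elements equal to the pivot into `less` is another arbitrary choice.

```python
def pivot_partition(lst, start, end):
    """Naive partition of lst[start..end] (inclusive) around the pivot lst[end].

    Uses temporary lists, so it needs O(n) extra storage. Returns the
    pivot's final index.
    """
    pivot = lst[end]
    less, more = [], []
    for x in lst[start:end]:
        if x <= pivot:
            less.append(x)
        else:
            more.append(x)
    # Write back in order: less, then the pivot, then more.
    lst[start:end + 1] = less + [pivot] + more
    return start + len(less)


def quicksort(lst, start, end):
    """Sort lst[start..end] by partitioning, then recursing on each side."""
    if start < end:
        split = pivot_partition(lst, start, end)
        quicksort(lst, start, split - 1)
        quicksort(lst, split + 1, end)


data = [4, 65, 2, -31, 0, 99, 83, 782, 1]
quicksort(data, 0, len(data) - 1)
print(data)
```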
797 00:42:22,150 --> 00:42:23,410 It is much more-- 798 00:42:27,620 --> 00:42:30,620 you need to be much more clever-- 799 00:42:30,620 --> 00:42:33,020 I don't need this anymore-- 800 00:42:33,020 --> 00:42:36,260 to do this in place. 801 00:42:36,260 --> 00:42:45,770 So what I want to do is I'm going to take an example here. 802 00:42:45,770 --> 00:42:53,503 I want to take this array. 803 00:43:00,330 --> 00:43:03,390 And I'm going to not talk about the recursive 804 00:43:03,390 --> 00:43:05,940 sorting or anything like that, because that's easy. 805 00:43:05,940 --> 00:43:07,470 We kind of know how to do that. 806 00:43:07,470 --> 00:43:12,460 What I want to do is I want to choose this as the pivot. 807 00:43:12,460 --> 00:43:16,950 And I want to translate that somehow 808 00:43:16,950 --> 00:43:23,800 into the final pivoted output, which 809 00:43:23,800 --> 00:43:37,560 is going to be 0, minus 31, 1, 2, 65, 99. 810 00:43:41,326 --> 00:43:47,165 Oops, I have something wrong here. 811 00:43:47,165 --> 00:43:48,790 So this should be the other way around. 812 00:43:48,790 --> 00:43:49,920 83, right? 813 00:43:49,920 --> 00:43:51,056 OK. 814 00:43:51,056 --> 00:43:51,800 Mm, bug. 815 00:43:58,190 --> 00:43:59,779 Oh, I'm sorry, I'm sorry. 816 00:43:59,779 --> 00:44:00,320 This is fine. 817 00:44:00,320 --> 00:44:00,770 This is fine. 818 00:44:00,770 --> 00:44:01,436 I just need it-- 819 00:44:01,436 --> 00:44:03,031 I don't need it to be sorted. 820 00:44:03,031 --> 00:44:03,530 OK, good. 821 00:44:03,530 --> 00:44:04,850 Whew. 822 00:44:04,850 --> 00:44:07,760 All right. 823 00:44:07,760 --> 00:44:10,920 I just thought I found the first bug in my book. 824 00:44:10,920 --> 00:44:14,420 So there's probably many bugs in the book, 825 00:44:14,420 --> 00:44:16,000 but I don't want to know. 826 00:44:20,420 --> 00:44:22,570 So I don't need this to be sorted. 827 00:44:22,570 --> 00:44:24,910 Yeah, so there's a 0, a minus 31. 
828 00:44:24,910 --> 00:44:27,680 So the key is that I've discovered this. 829 00:44:27,680 --> 00:44:30,650 And everything to the left of that is less than 1. 830 00:44:30,650 --> 00:44:33,110 And everything to the right of that is greater than 1. 831 00:44:33,110 --> 00:44:36,290 And that's the only requirement that I have. 832 00:44:36,290 --> 00:44:39,020 And as I said, I mean, I said this to begin with 833 00:44:39,020 --> 00:44:41,182 and then I forgot, I said we're not going 834 00:44:41,182 --> 00:44:42,390 to worry about the recursion. 835 00:44:42,390 --> 00:44:45,620 So clearly, I have to do more work here 836 00:44:45,620 --> 00:44:49,880 with respect to taking this set of numbers 837 00:44:49,880 --> 00:44:52,640 and sorting them in ascending order, et cetera. 838 00:44:52,640 --> 00:44:57,620 But what I've done here is I know that 1 is fixed in place, 839 00:44:57,620 --> 00:44:59,690 and 1 will never move. 840 00:44:59,690 --> 00:45:03,500 And I just have to turn this into minus 31 and 0. 841 00:45:03,500 --> 00:45:05,090 And I have to do some reordering here. 842 00:45:05,090 --> 00:45:08,150 But 1 is fixed in place. 843 00:45:08,150 --> 00:45:10,879 Now, the algorithm that I'm going to show you, 844 00:45:10,879 --> 00:45:12,170 I'm going to show you the code. 845 00:45:12,170 --> 00:45:16,410 And it's probably code that you won't 846 00:45:16,410 --> 00:45:24,050 be able to parse, at least in minutes or seconds. 847 00:45:24,050 --> 00:45:28,490 And this code is in place pivoting. 848 00:45:28,490 --> 00:45:31,310 It's clever code. 849 00:45:31,310 --> 00:45:37,610 And what it does is it has one additional variable 850 00:45:37,610 --> 00:45:42,800 worth of storage that we're going to call the pivot. 851 00:45:42,800 --> 00:45:46,190 So you see the variable pivot there. 
852 00:45:46,190 --> 00:45:51,860 And just using that one additional integer of storage, 853 00:45:51,860 --> 00:45:57,590 it manages to transform this array using a linear number 854 00:45:57,590 --> 00:46:01,160 of steps into this array. 855 00:46:01,160 --> 00:46:03,380 So you can see that this is non-trivial. 856 00:46:03,380 --> 00:46:06,530 And I need to move from here to there without having 857 00:46:06,530 --> 00:46:07,350 this extra storage. 858 00:46:07,350 --> 00:46:09,141 If you had extra storage, it would be easy. 859 00:46:09,141 --> 00:46:10,260 We know how to do that. 860 00:46:10,260 --> 00:46:12,890 But if you don't, you need this code 861 00:46:12,890 --> 00:46:15,020 that has an outer while loop. 862 00:46:15,020 --> 00:46:16,430 You see "while not done." 863 00:46:16,430 --> 00:46:18,560 And then it's got two inner while loops in it. 864 00:46:18,560 --> 00:46:21,590 And it's got a couple of counters, a couple of pointers, 865 00:46:21,590 --> 00:46:23,790 I should say, to indices. 866 00:46:23,790 --> 00:46:29,150 And you go left and right, and you magically get this answer. 867 00:46:29,150 --> 00:46:32,720 But there's no magic here, because we have the code up 868 00:46:32,720 --> 00:46:33,750 and we can run it. 869 00:46:33,750 --> 00:46:37,430 And computers aren't smart, so clearly, 870 00:46:37,430 --> 00:46:38,930 that code is doing something clever, 871 00:46:38,930 --> 00:46:41,192 and we just need to understand that, all right? 872 00:46:41,192 --> 00:46:41,900 So I'm going to-- 873 00:46:41,900 --> 00:46:43,562 in the couple of minutes I have left, 874 00:46:43,562 --> 00:46:45,020 this is the last thing I want to do: 875 00:46:45,020 --> 00:46:48,590 I want to tell you how this code works. 876 00:46:48,590 --> 00:46:50,440 And it's really pretty code. 877 00:46:50,440 --> 00:46:52,130 And it's clever code. 878 00:46:52,130 --> 00:46:59,430 So what I have is I have two pointers, top and bottom.
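Here is a Python sketch reconstructed from that description (an outer `while not done`, two inner while loops, and two index pointers, with `bottom` starting one before the range and `top` at the pivot), not a copy of the book's code; the names `pivot_partition_clever` and `quicksort_in_place` are hypothetical. Copying the pivot into the variable `pivot` leaves a "hole" at the end of the range, and the two inner loops take turns filling that hole from the opposite side. The example array is an assumption, chosen to be consistent with the partially pivoted output read out in the lecture (0, -31, 1, 2, 65, 99, ...).

```python
def pivot_partition_clever(lst, start, end):
    """In-place partition of lst[start..end] (inclusive) around the pivot lst[end].

    Beyond the list itself, only `pivot` and the two indices are stored.
    Returns the pivot's final index.
    """
    pivot = lst[end]      # copying the pivot out leaves a "hole" at lst[end]
    bottom = start - 1    # incremented before its first use
    top = end             # starts at the hole
    done = False
    while not done:
        # Move bottom rightward, looking for an element that belongs right of the pivot.
        while not done:
            bottom += 1
            if bottom == top:
                done = True
                break
            if lst[bottom] > pivot:
                lst[top] = lst[bottom]   # fill the hole at top; the hole moves to bottom
                break
        # Move top leftward, looking for an element that belongs left of the pivot.
        while not done:
            top -= 1
            if top == bottom:
                done = True
                break
            if lst[top] < pivot:
                lst[bottom] = lst[top]   # fill the hole at bottom; the hole moves to top
                break
    lst[top] = pivot      # the pointers met at the hole: the pivot's final position
    return top


def quicksort_in_place(lst, start, end):
    if start < end:
        split = pivot_partition_clever(lst, start, end)
        quicksort_in_place(lst, start, split - 1)
        quicksort_in_place(lst, split + 1, end)


example = [4, 65, 2, -31, 0, 99, 83, 782, 1]
split = pivot_partition_clever(example, 0, len(example) - 1)
print(split, example)
```

For that example, the sketch returns index 2 and leaves the list as [0, -31, 1, 2, 65, 99, 83, 782, 4]: everything left of the pivot 1 is smaller, everything right of it is larger, and the pivot never moves again.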
879 00:46:59,430 --> 00:47:02,510 So I'm going to, essentially, say that top is-- 880 00:47:02,510 --> 00:47:05,700 bottom is 0. 881 00:47:05,700 --> 00:47:08,840 Initially, it's minus 1, but we go ahead and increment it. 882 00:47:08,840 --> 00:47:10,700 And top is 8. 883 00:47:10,700 --> 00:47:13,130 And so how many-- 884 00:47:13,130 --> 00:47:20,870 I have 9 elements-- 1, 2, 3, 4, 5, 6, 7, 8, 9. 885 00:47:20,870 --> 00:47:24,110 So top is pointing to the pivot. 886 00:47:24,110 --> 00:47:26,990 The pivot is at index 8, so top is 8. 887 00:47:26,990 --> 00:47:31,240 And bottom is pointing to the first element. 888 00:47:31,240 --> 00:47:34,250 So these two while loops are essentially 889 00:47:34,250 --> 00:47:40,160 going this way from the left of the array and that way 890 00:47:40,160 --> 00:47:42,200 from the right of the array. 891 00:47:42,200 --> 00:47:45,800 So what we're going to do-- 892 00:47:45,800 --> 00:47:48,770 and don't worry about exactly how the code does this. 893 00:47:48,770 --> 00:47:52,130 I'm going to show you the way this array is transformed 894 00:47:52,130 --> 00:47:56,990 and the way these pointers are changed. 895 00:47:56,990 --> 00:48:00,200 And then you'll get a sense of how this code works. 896 00:48:00,200 --> 00:48:01,910 And each step of this transformation 897 00:48:01,910 --> 00:48:04,500 happens in one of those inner while loops. 898 00:48:04,500 --> 00:48:08,270 So we have the pivot corresponding to 1. 899 00:48:08,270 --> 00:48:10,910 And bottom equals 0 and top equals 8. 900 00:48:10,910 --> 00:48:12,980 What I'm going to do is I'm going 901 00:48:12,980 --> 00:48:19,610 to start moving leftward from-- 902 00:48:19,610 --> 00:48:21,845 this would be the second inner while loop. 903 00:48:21,845 --> 00:48:22,970 I'm going to start moving-- 904 00:48:29,718 --> 00:48:32,490 oh, I'm sorry. 905 00:48:32,490 --> 00:48:35,832 I'll start with bottom. 906 00:48:35,832 --> 00:48:38,040 I'm going to increment-- bottom is initially minus 1.
907 00:48:38,040 --> 00:48:40,410 I increment it to 0. 908 00:48:40,410 --> 00:48:47,850 And I start moving rightward from the left of the list 909 00:48:47,850 --> 00:48:53,490 and try and find an element that is greater than 1. 910 00:48:53,490 --> 00:48:55,530 And that is immediate. 911 00:48:55,530 --> 00:48:59,460 I realize that 4 is greater than 1, OK? 912 00:48:59,460 --> 00:49:02,430 So when I realize that 4 is greater than 1, 913 00:49:02,430 --> 00:49:05,670 I'm going to copy 4 over to where top is pointing. 914 00:49:05,670 --> 00:49:08,640 And this doesn't mean that I'm copying the entire array. 915 00:49:08,640 --> 00:49:10,350 I'm just going to copy over 4. 916 00:49:10,350 --> 00:49:12,960 But I'm writing out what the array looks like over here, 917 00:49:12,960 --> 00:49:16,080 because that's important. 918 00:49:16,080 --> 00:49:18,860 And you might say-- 919 00:49:18,860 --> 00:49:19,950 pivot equals 1. 920 00:49:19,950 --> 00:49:23,910 You might say, oh, but I overwrote the location that held 1. 921 00:49:23,910 --> 00:49:26,680 And I'm like, don't worry about it. 922 00:49:26,680 --> 00:49:30,570 I do have 1 stored in my pivot. 923 00:49:30,570 --> 00:49:33,600 So my pivot has 1 in it. 924 00:49:33,600 --> 00:49:35,310 So I haven't lost anything here. 925 00:49:35,310 --> 00:49:37,350 It's not like I threw away any locations. 926 00:49:37,350 --> 00:49:39,290 I do have that extra location. 927 00:49:39,290 --> 00:49:44,100 So I did this rightward move going from the left. 928 00:49:44,100 --> 00:49:52,890 Now I'm going to go leftward from the right. 929 00:49:52,890 --> 00:49:56,765 And I'm going to look for something that is less than 1 930 00:49:56,765 --> 00:49:58,140 and I'm going to try and move it.
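The rightward move described above corresponds to the first inner loop. As a sketch, here it is isolated as a hypothetical helper `scan_right`, which is not a name from the lecture, and run once on the board. Afterward the value 1 no longer appears in the list. It survives only in the `pivot` variable, which is why nothing has been lost.

```python
def scan_right(lst, bottom, top, pivot):
    """One pass of the first inner loop (hypothetical helper).

    Advances bottom until it finds an element larger than the pivot,
    copies that element into lst[top], and returns the new bottom.
    If bottom reaches top first, returns that meeting index instead.
    """
    while True:
        bottom += 1
        if bottom == top:
            return bottom
        if lst[bottom] > pivot:
            lst[top] = lst[bottom]
            return bottom


board = [4, 65, 2, -31, 0, 99, 83, 782, 1]
pivot = board[8]                      # save 1 before its slot is overwritten
bottom = scan_right(board, -1, 8, pivot)
print(bottom, board)  # 0 [4, 65, 2, -31, 0, 99, 83, 782, 4]
```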
931 00:49:58,140 --> 00:49:59,880 So basically, what this algorithm does 932 00:49:59,880 --> 00:50:04,230 is it tries to find things that are-- 933 00:50:04,230 --> 00:50:09,349 if a is greater than the pivot g, it moves a to the rightmost part 934 00:50:09,349 --> 00:50:09,890 of the array. 935 00:50:12,670 --> 00:50:14,450 And similarly, if d is less than g, 936 00:50:14,450 --> 00:50:17,640 then it moves d to the left part of the array. 937 00:50:17,640 --> 00:50:21,420 So depending on whether the comparison to g 938 00:50:21,420 --> 00:50:23,550 is greater or less, you want each element to end up 939 00:50:23,550 --> 00:50:25,020 at one of the edges of the array. 940 00:50:25,020 --> 00:50:27,480 And you're going to try and get the edges of the array 941 00:50:27,480 --> 00:50:31,470 to be correct in the sense that they have elements 942 00:50:31,470 --> 00:50:33,294 on the left that are less than the pivot, 943 00:50:33,294 --> 00:50:34,710 and elements on the right that 944 00:50:34,710 --> 00:50:35,980 are greater than the pivot. 945 00:50:35,980 --> 00:50:38,850 And you're trying to get to the middle. 946 00:50:38,850 --> 00:50:41,370 And when your two pointers converge, 947 00:50:41,370 --> 00:50:45,060 your top and bottom pointers converge, you're done. 948 00:50:45,060 --> 00:50:50,010 So let's do a couple more steps and close this lecture. 949 00:50:50,010 --> 00:50:53,520 So I did the first step, 950 00:50:53,520 --> 00:50:56,020 going rightward. 951 00:50:56,020 --> 00:50:57,630 And now I'm going to go leftward. 952 00:50:57,630 --> 00:51:01,740 What is the first element that is less than 1-- 953 00:51:04,590 --> 00:51:06,960 no, less than 1. 954 00:51:06,960 --> 00:51:10,654 0, right? 955 00:51:10,654 --> 00:51:11,820 Yes, but I'm going this way. 956 00:51:11,820 --> 00:51:12,990 AUDIENCE: Oh! 957 00:51:12,990 --> 00:51:15,370 SRINI DEVADAS: Yeah, I'm going-- 958 00:51:15,370 --> 00:51:17,850 I'm glad you pointed that out.
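The idea of getting the edges of the array correct can be written as an invariant. The sketch below assumes distinct values, and `edges_ok` is a hypothetical name. The `states` list holds the (bottom, top, array) snapshots that the two-pointer scan passes through on the board example, each taken right after a copy. The invariant holds at every one of them.

```python
def edges_ok(lst, bottom, top, pivot):
    """Invariant of the two-pointer scan (hypothetical helper):
    everything strictly left of bottom is smaller than the pivot,
    and everything strictly right of top is larger."""
    return (all(x < pivot for x in lst[:bottom])
            and all(x > pivot for x in lst[top + 1:]))


# (bottom, top, array) after each copy on the board example, pivot = 1
states = [
    (0, 8, [4, 65, 2, -31, 0, 99, 83, 782, 4]),
    (0, 4, [0, 65, 2, -31, 0, 99, 83, 782, 4]),
    (1, 4, [0, 65, 2, -31, 65, 99, 83, 782, 4]),
    (1, 3, [0, -31, 2, -31, 65, 99, 83, 782, 4]),
    (2, 3, [0, -31, 2, 2, 65, 99, 83, 782, 4]),
]
print(all(edges_ok(lst, b, t, 1) for b, t, lst in states))  # True
```

When the pointers finally meet, the same check on the meeting index confirms the partition requirement. At that point nothing is left between the two correct edges except the slot the pivot goes into.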
959 00:51:17,850 --> 00:51:19,740 I needed to make sure. 960 00:51:19,740 --> 00:51:22,920 So I went rightward the first time, 961 00:51:22,920 --> 00:51:25,130 and I want to go leftward this time. 962 00:51:25,130 --> 00:51:29,050 So this would be-- so I come here and I see 0. 963 00:51:29,050 --> 00:51:31,470 When I see 0, what I'm going to do 964 00:51:31,470 --> 00:51:34,950 is I'm going to transform the array. 965 00:51:34,950 --> 00:51:38,060 I'll write this out to make sure I get this right. 966 00:51:38,060 --> 00:51:48,180 And so right now I have 4, 65, 2, minus 31, 0, et cetera. 967 00:51:48,180 --> 00:51:52,940 And I'm going to go this way, and I'm going to put-- 968 00:51:52,940 --> 00:51:58,290 when I see the 0, I'm going to copy 0 over here 969 00:51:58,290 --> 00:52:04,570 and leave the 0 in here for a second. 970 00:52:04,570 --> 00:52:09,590 And there's no problem here, because 4 was copied over here. 971 00:52:09,590 --> 00:52:13,030 And so now I overwrote 4 with 0. 972 00:52:13,030 --> 00:52:15,444 OK, now maybe you get some sense 973 00:52:15,444 --> 00:52:16,610 of how this algorithm works. 974 00:52:16,610 --> 00:52:21,050 This is the first interesting step where I took 0, 975 00:52:21,050 --> 00:52:23,350 and because 0 is less than 1, as I mentioned, 976 00:52:23,350 --> 00:52:26,200 I want to jam it all the way to the left. 977 00:52:26,200 --> 00:52:28,030 Just put it all the way to the left. 978 00:52:28,030 --> 00:52:31,540 And I'm fine with overwriting 4 here, because 4 has already been put 979 00:52:31,540 --> 00:52:33,650 all the way to the right, OK? 980 00:52:33,650 --> 00:52:37,690 Now, at this point, let's look at 981 00:52:37,690 --> 00:52:40,630 where my bottom and top are. 982 00:52:40,630 --> 00:52:45,820 My bottom is still 0, but the top is 4, 983 00:52:45,820 --> 00:52:50,530 because I moved-- top was 8, and I decremented top all the way 984 00:52:50,530 --> 00:52:51,940 to the point where it became 4.
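The leftward move is the mirror image. The second inner loop decrements `top` until it finds something smaller than the pivot, then copies that element into the slot at `bottom`. Here is a sketch using the hypothetical helper `scan_left`, starting from the array after 4 was copied to the right end.

```python
def scan_left(lst, bottom, top, pivot):
    """One pass of the second inner loop (hypothetical helper).

    Moves top leftward until it finds an element smaller than the pivot,
    copies that element into lst[bottom], and returns the new top.
    If top reaches bottom first, returns that meeting index instead.
    """
    while True:
        top -= 1
        if top == bottom:
            return top
        if lst[top] < pivot:
            lst[bottom] = lst[top]
            return top


# State after the first rightward step: 4 now also occupies the pivot's slot.
board = [4, 65, 2, -31, 0, 99, 83, 782, 4]
top = scan_left(board, 0, 8, 1)
print(top, board)  # 4 [0, 65, 2, -31, 0, 99, 83, 782, 4]
```

The 0 at index 4 is deliberately left in place. That slot is the next one allowed to be overwritten, because its value has already been copied to the left end.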
985 00:52:51,940 --> 00:52:54,340 And I realized that 0 was less than 1. 986 00:52:54,340 --> 00:52:59,540 And then I copied that value over to the bottom. 987 00:52:59,540 --> 00:53:03,030 You can kind of see the code up there. 988 00:53:03,030 --> 00:53:05,350 This is actually an output of that code. 989 00:53:05,350 --> 00:53:07,330 And I put a 0 in here. 990 00:53:07,330 --> 00:53:09,250 So that's what my counters look like. 991 00:53:09,250 --> 00:53:11,920 And it's only a couple more steps. 992 00:53:11,920 --> 00:53:13,900 We're now going to go right. 993 00:53:13,900 --> 00:53:18,130 And I'm going to do this again, except that 0 is already 994 00:53:18,130 --> 00:53:21,850 taken care of, so now I see 65. 995 00:53:21,850 --> 00:53:28,270 Clearly I'm going to have a situation where 996 00:53:28,270 --> 00:53:32,710 if I go this way, 65 is greater than the pivot, 997 00:53:32,710 --> 00:53:39,670 so I'm going to go 0, 65, 2, minus 31. 998 00:53:39,670 --> 00:53:41,810 And because 65 is greater than the pivot, 999 00:53:41,810 --> 00:53:46,580 I'm going to write it into the slot top points to, whose value obviously 1000 00:53:46,580 --> 00:53:48,080 already got copied over that way. 1001 00:53:48,080 --> 00:53:54,820 So I'm going to have 65 in here, 99, 83, 782, and 4. 1002 00:53:54,820 --> 00:54:00,370 So now after this step, bottom equals 1. 1003 00:54:00,370 --> 00:54:04,450 And the reason for that is that 65 was in location 1, 1004 00:54:04,450 --> 00:54:07,730 and top equals 4. 1005 00:54:07,730 --> 00:54:09,680 And then, does someone want 1006 00:54:09,680 --> 00:54:13,550 to tell me what the next one is going to be? 1007 00:54:13,550 --> 00:54:15,410 If I go leftward. 1008 00:54:15,410 --> 00:54:17,990 I'm going to start from here, top was 4. 1009 00:54:17,990 --> 00:54:20,690 So I was pointing at 65. 1010 00:54:20,690 --> 00:54:33,050 And when I went rightward, 1011 00:54:33,050 --> 00:54:35,090 I saw that 65 was greater than 1.
1012 00:54:35,090 --> 00:54:38,690 Now I'm looking for something that is less than 1. 1013 00:54:38,690 --> 00:54:41,256 And minus 31 is less than 1, correct? 1014 00:54:41,256 --> 00:54:43,880 When I go this way, I'm looking for something that's less than the pivot. 1015 00:54:43,880 --> 00:54:48,560 So given that it's minus 31, I'm going to copy minus 31 1016 00:54:48,560 --> 00:54:50,630 over to where bottom is pointing. 1017 00:54:50,630 --> 00:55:00,890 So I have 0, minus 31, 2, minus 31, 65, 99, 83, 782, and 4. 1018 00:55:00,890 --> 00:55:04,280 And at this point, I'm going to have bottom 1019 00:55:04,280 --> 00:55:08,750 equals 1 and top equals 3. 1020 00:55:08,750 --> 00:55:16,420 Going one more step, I get to the point where I see 2. 1021 00:55:16,420 --> 00:55:18,070 2 is greater than 1. 1022 00:55:18,070 --> 00:55:21,220 0 and minus 31 are less than 1. 1023 00:55:21,220 --> 00:55:23,770 And so I take 2 and I copy it over here. 1024 00:55:31,220 --> 00:55:34,650 And now at this point, when I do an increment, 1025 00:55:34,650 --> 00:55:41,820 I realize that bottom is 2 and top is 3. 1026 00:55:41,820 --> 00:55:44,940 And in the very next step, bottom will equal top. 1027 00:55:44,940 --> 00:55:48,360 And I'm sitting up here, and so I copy-- 1028 00:55:48,360 --> 00:55:49,770 I make two-- 1029 00:55:49,770 --> 00:55:53,400 I write the pivot into the location that 1030 00:55:53,400 --> 00:55:58,950 corresponds to bottom after top got decremented. 1031 00:55:58,950 --> 00:56:01,620 And in that case, both bottom and top are 2. 1032 00:56:01,620 --> 00:56:06,090 So the pivot gets written into index 2, counting 0, 1, and 2. 1033 00:56:06,090 --> 00:56:07,732 And so that's exactly what I had, 1034 00:56:07,732 --> 00:56:10,380 which I wrote right at the beginning. 1035 00:56:10,380 --> 00:56:16,860 So I would say this is probably the most complicated code 1036 00:56:16,860 --> 00:56:20,150 from a control flow standpoint that I've ever shown you.
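The whole walkthrough can be checked mechanically. Below is an instrumented sketch of the same two-pointer partition, under the hypothetical name `partition_trace`. It records bottom, top, and a copy of the array after every copy and after the final pivot write. On the board example it reproduces the pointer pairs (0, 8), (0, 4), (1, 4), (1, 3), (2, 3) and finally (2, 2), with the pivot landing at index 2.

```python
def partition_trace(lst, start, end):
    """In-place partition around lst[end] that also records a snapshot
    (bottom, top, copy of lst) after every copy and after the final write.
    Returns (pivot_index, snapshots)."""
    snapshots = []
    pivot = lst[end]
    bottom, top = start - 1, end
    done = False
    while not done:
        while not done:                 # rightward scan
            bottom += 1
            if bottom == top:
                done = True
            elif lst[bottom] > pivot:
                lst[top] = lst[bottom]
                snapshots.append((bottom, top, lst[:]))
                break
        while not done:                 # leftward scan
            top -= 1
            if top == bottom:
                done = True
            elif lst[top] < pivot:
                lst[bottom] = lst[top]
                snapshots.append((bottom, top, lst[:]))
                break
    lst[top] = pivot
    snapshots.append((bottom, top, lst[:]))
    return top, snapshots


board = [4, 65, 2, -31, 0, 99, 83, 782, 1]
p, snaps = partition_trace(board, 0, len(board) - 1)
for b, t, arr in snaps:
    print(f"bottom={b} top={t} {arr}")
```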
1037 00:56:20,150 --> 00:56:24,510 This algorithm is an in-place algorithm that does not 1038 00:56:24,510 --> 00:56:26,710 require any extra array storage. 1039 00:56:26,710 --> 00:56:29,400 And that's exactly why it's so popular. 1040 00:56:29,400 --> 00:56:33,420 People are sorting billions of elements in lists, 1041 00:56:33,420 --> 00:56:35,850 and you can't use gigabytes of storage 1042 00:56:35,850 --> 00:56:37,680 during the sorting process. 1043 00:56:37,680 --> 00:56:41,340 And what this clever strategy tells you is you 1044 00:56:41,340 --> 00:56:42,780 don't need all of that storage. 1045 00:56:42,780 --> 00:56:44,310 You can do things in place. 1046 00:56:44,310 --> 00:56:46,710 And as long as you choose the pivot reasonably well, 1047 00:56:46,710 --> 00:56:49,510 you get your average case N log N complexity, 1048 00:56:49,510 --> 00:56:52,620 which is the same as the worst case complexity of mergesort. 1049 00:56:52,620 --> 00:56:55,660 So you've got to be a little bit careful with the pivot choice, because consistently bad pivots make quicksort take N squared time.
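Putting the pieces together gives a minimal Python sketch of the full recursive quicksort built on the in-place partition. The lecture only says to choose the pivot "reasonably well". Swapping a uniformly random element into the pivot slot is one common way to do that, and it is an assumption here, as are the names `partition` and `quicksort`. With distinct values, random pivots give expected N log N time on every input ordering, although an unlucky run can still be quadratic. Arrays with many repeated values can also be slow with this particular partition scheme.

```python
import random


def partition(lst, start, end):
    """In-place two-pointer partition around lst[end]; returns the pivot index."""
    pivot = lst[end]
    bottom, top = start - 1, end
    done = False
    while not done:
        while not done:                 # rightward: look for an element > pivot
            bottom += 1
            if bottom == top:
                done = True
            elif lst[bottom] > pivot:
                lst[top] = lst[bottom]
                break
        while not done:                 # leftward: look for an element < pivot
            top -= 1
            if top == bottom:
                done = True
            elif lst[top] < pivot:
                lst[bottom] = lst[top]
                break
    lst[top] = pivot
    return top


def quicksort(lst, start=0, end=None):
    """Sort lst in place. A random element is swapped into the pivot slot
    first (an assumed pivot rule), then both sides are sorted recursively."""
    if end is None:
        end = len(lst) - 1
    if start < end:
        r = random.randint(start, end)   # inclusive on both ends
        lst[r], lst[end] = lst[end], lst[r]
        p = partition(lst, start, end)   # lst[p] is now in its final place
        quicksort(lst, start, p - 1)
        quicksort(lst, p + 1, end)


data = [4, 65, 2, -31, 0, 99, 83, 782, 1]
quicksort(data)
print(data)  # [-31, 0, 1, 2, 4, 65, 83, 99, 782]
```

Recursing on both sides keeps the sketch short. A production version would typically recurse only on the smaller side and loop on the larger one, which bounds the stack depth at about log N.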