ERIK DEMAINE: All right. Welcome to practice problem session 3 for 6.006. Today, we are going to go through a bunch of problems, which you should already have. I was thinking of skipping the very first problem, because it's just a mechanical thing. If we have time at the end, we can come back to it. But there's not really any insight I can give you into how to approach that problem. It's just: do you understand hashing? So I want to go into the more creative problems first. Let's start with problem 3.2, hash sequence.

So I'll just read it. Our first task is to convert the word problem into a concise, formal statement of what the algorithm needs to achieve. Then we need to come up with ideas for how to achieve it, and we need to check the details. That'll be our general pattern. So this problem says: hash tables are not only useful for implementing set operations; they can also be used to implement sequences.
Remember from lecture 2, we have a set interface, which is about querying items by their key, with a sort of intrinsic order that comes from the items themselves. That's versus the sequence interface that we started out with, with linked lists and arrays and so on, where we're given an order and we want to maintain that order. That order may not have anything to do with the items themselves, so that's what we call an extrinsic order. We're told what the order is by being told: insert this item after this one, or append this one to the end, or prepend it to the beginning.

So in lecture last week, we saw that hash tables implement sets. Let me just remind you of some things they can do. So we have, on the one hand, set hashing. We're going to need this in a moment; it's just a reminder from lecture. We can build one in linear expected time. We can find an item by key in constant expected time. And we can insert or delete an item in constant expected amortized time. OK, so this is a black box that we're given.
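To make the black box concrete, here is a minimal sketch of the set interface recalled above, assuming Python's built-in `dict` as the underlying hash table. The names `HashSet`, `build`, `find`, `insert`, `delete`, and the `Item` record are hypothetical stand-ins that mirror the interface described in the session, not course code. CPython hashes integers deterministically, so the "expected" bounds are an assumption carried over from the lecture's randomized model, not a guarantee of this code.

```python
class Item:
    """Hypothetical record: a set stores items and looks them up by .key."""

    def __init__(self, key, value=None):
        self.key = key
        self.value = value


class HashSet:
    """Sketch of the black-box set; a dict maps each key to its item."""

    def __init__(self):
        self._table = {}

    def build(self, items):
        # One hash-table insertion per item: linear expected time.
        self._table = {}
        for x in items:
            self._table[x.key] = x

    def find(self, k):
        # Lookup by key in constant expected time; None if k is absent.
        return self._table.get(k)

    def insert(self, x):
        # Constant expected amortized time (the table occasionally resizes).
        self._table[x.key] = x

    def delete(self, k):
        # Remove and return the item with key k (KeyError if absent).
        return self._table.pop(k)

    def __len__(self):
        return len(self._table)

    def __iter__(self):
        # Yields every stored item; the set interface promises no key order.
        return iter(self._table.values())
```

The sequence operations discussed next only need these calls, which is what makes the construction a reduction.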
And the problem statement says: imagine you're given a hash table as a black box, which means we're given a thing that behaves just like a-- thank you. We're given something that is a hash table, but it's a black box in the sense that we're not allowed to reach in and change the implementation details. We're supposed to use it as is, just by calling its interface. So in particular, we're given these three operations. I'll maybe also use iter to iterate through the items. So we're allowed to build in linear expected time, find in constant expected time, and insert and delete in constant expected amortized time. And what the problem is asking is to build, out of this data structure, a sequence with particular time bounds.

So this is what we call a reduction. Technically, we're reducing the sequence problem to the set problem, because we're showing how to solve the sequence problem using a solution to the set problem. But the way we'll think about it is in the other direction: we're given a data structure that solves set, and we're going to convert it into a data structure that solves sequence.
So, given that we already know from lecture how to implement a set with hashing, we're now going to learn how to implement a sequence with it. This problem set is teaching you new material. The specific bounds that we're told to achieve are: build in linear expected time; get_at and set_at in constant expected time; insert_at and delete_at in linear expected time; and insert and delete first and last in (we're running out of room here) constant expected amortized time. OK.

So this is just what we're told to do, and now we start thinking. We're given this; we want to build this. I'm going to tell you a little bit about my thought process. When I'm presented with a problem like this, the first thing is to read the problem and ask: what's the hard part here? What are the challenges? Clearly, we have to do all four of these types of operations. Build in linear expected time: that's what basically every structure we've seen achieves. Get_at or set_at in constant expected time: that's fast, and it feels a lot like this find operation. So both of these seem pretty matchy-matchy.
So that looks like a good mapping: I'm going to try to build these operations using those operations. Insert and delete at a specific location in linear expected time: that's big. Linear expected time means I can rebuild the entire data structure every time I do one of these operations. So this is easy. That's the first thing to realize: this bound is big. Great. So I don't really have to worry about these operations. I mean, I do have to implement them, but it's not hard to do it that fast, because I can rebuild.

And then here, insert and delete at the beginning and the end of the sequence. These are the deque (double-ended queue) operations, insert and delete at either end, in constant expected amortized time. This, I feel like, is the tricky one. You've seen one way to do this in a problem set, but now we're going to see another way.

OK, the other thing to notice is these "expected" words. In this case, we're told to use hashing. But in a lot of problems, you're not told how to solve it or what you should be basing your solution on.
So "expected" is always a good keyword, because it means randomization is involved somehow. If you're told the bound is expected, you probably need to use randomization. And in this class, the only form of randomization you will use is essentially hashing. So that's a good hint. In this case, we know we're supposed to use hashing. All right, so the deque operations are going to be the challenge.

But any ideas on how we might tackle this problem? Remember, in a set, every item has a key. In a sequence, items are just items: we're told to insert and delete them at particular locations, but they don't have keys. So one of the challenges is going to be to take our items and give them keys so that we can store them in a set. Otherwise, we can't use find: if there are no keys, there's no way to search by key. Ideas?

So let's think about what we want to do. Build, I think, is fine. If you just want to build a data structure, you don't need to do anything special.
The hard parts are the queries and updates you want to be able to do on your data structure. Let's start with get_at and set_at. Remember, for get_at you're given an index i, and you want to find the item at position i. For set_at, we're given a position, and we want to change the item stored at that position, at index i. Now, over here, in what we're given, we can insert and delete. But for the main lookup, let's think about get_at first. A natural mapping, given this arrow, is find. Find searches for an item by key. So, just staring at that and looking at all the possible pairings: we have find by key over here, and we need to implement get_at by index. So let's make the indices keys.

OK, so this is idea number one: assign a key to each item equal to its index in the sequence. Then, to implement get_at, I can just call find(i), because i is also a key, and that should give me the thing that I want. Maybe for this to make sense, let me tell you how I'm building.
Say I'm given an array A of items. The only name conflict here is build, so let me call this one sequence build, and I'm going to implement it using set build. I'll use some shorthand notation here. I want to make an object that has a key equal to i and a value equal to A[i] (that's my object notation), for i equals 0, 1, up to the size of A minus 1. That's a little bit code-like, but not quite literal code. I'm just using it to say: let's make an object that has two parts. One is called the key, so we can talk about object.key, which is what sets want. And we're also going to store a value, which is the actual item that we're given. Because the items are given in sequence, I'm representing that sequence order by assigning i to be the key. So now, if I want to find the item at index i, I can do find(i), and technically I should then take .value. That gives me the actual item stored at that position.
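Idea number one, together with the sequence build just described, might look like the sketch below. A `dict` stands in for the black-box set (an assumption), and `Item`, `sequence_build`, and `get_at` are illustrative names; each object carries the index as its key and the original element as its value.

```python
class Item:
    """Hypothetical object with two parts: .key (the index) and .value."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


def sequence_build(A):
    """Sequence build via set build: store A[i] under key i."""
    table = {}                       # stand-in for the black-box hash set
    for i in range(len(A)):          # i = 0, 1, ..., len(A) - 1
        obj = Item(i, A[i])
        table[obj.key] = obj
    return table


def get_at(table, i):
    """find(i) returns the whole object; .value is the stored element."""
    return table[i].value


table = sequence_build(["x0", "x1", "x2"])
print(get_at(table, 1))              # prints x1
```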
When I do find(i), I get this whole object with key i, and then I want the value part of it. Then for set_at, I can use the same find operation to get the object and set its value to x. Boom. We've implemented array-like semantics, get_at(i) and set_at(i), using a set.

If you've ever programmed in JavaScript, this should feel very familiar, because JavaScript actually implements arrays, at least at the conceptual level, as general mapping types. They call them objects, but they are basically sets. And it's even grosser: they convert the integer indices into strings and then index everything by the strings, semantically, anyway. Implementations can be more efficient, but conceptually, that's what's going on. And that's the idea we're using here, which seems great.

Any problems? So let's see. There's insert_at and delete_at. As I mentioned, what I'm going to do for those operations is just rebuild the entire structure. I'll just write that briefly. Basically, let's iterate through all the items.
Iterate through all the items, let's say, into an array. Insert or delete the one item. And then rebuild. If I were writing a P set answer, I would explain in a little more detail what I mean in this step. I've done it in the notes; it's not that hard. But we can afford linear expected time, so I can afford to call build again (technically, sequence build). So I can afford to extract things into an array, do the linear-time operation on the array with the shifting and everything, and then just call build again. Yeah, question?

AUDIENCE: I had a question about get_at.

ERIK DEMAINE: Yeah.

AUDIENCE: [INAUDIBLE] get_at.

ERIK DEMAINE: Sorry, no. These are separate definitions, yeah? Sorry, they got a little close.

AUDIENCE: Oh.

ERIK DEMAINE: So this is the definition of get_at, and this is the definition of sequence build.

AUDIENCE: Oh, I see.

ERIK DEMAINE: Yeah. Thanks for asking. OK, all good? Yeah?
AUDIENCE: Can you explain insert and delete again?

ERIK DEMAINE: Explain insert and delete. OK, maybe I should actually write one of them down, or I'll just draw a picture. So we have this data structure, which is now a sequence data structure; it represents some sequence of items. And my goal is to, say, delete the i-th item. So there are some items in here, x_0 up to x_(n-1), and I want to remove x_i from the sequence. Or I guess I should draw it this way: it's coming out. So what I'm going to do is first extract all the items from the sequence. I didn't write it, but there's an interface over here called iter, which just gives me all the items in order. So I'm going to extract them into an array. Let's say I'll just build a static array of size n. I also have a length operation that tells me how many items are in here, and the iter operation will give me all the items in order. So I'll put x_0 into my array, then x_1, and so on, as they come out. Then I go to position i, and I want to delete that item and shift all the others over.
This is the boring part. I think we even said how to do delete_at in dynamic arrays in recitation 2, pretty sure. So I'm just mimicking that. I'm building this array just to get the new order of things. And then, via the build operation, I'm building a totally new sequence. That's one way I would implement delete_at; there are other ways. Yeah?

AUDIENCE: Do you have [INAUDIBLE] space [INAUDIBLE], or--

ERIK DEMAINE: How much space is this using? Oh, a problem with space if you're inserting? If you're inserting, you probably want to allocate a static array of size n plus 1. You know exactly what's going to happen, so just allocate a little bit bigger, and then you can do the shift. You could also use dynamic arrays, but then you would get maybe an-- well, it's not an amortized bound, because you're only doing one insertion. The point is, this is really easy. We can spend linear time, so we could rebuild this array three times if we wanted. Question?
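The rebuild strategy for insert_at and delete_at can be sketched as follows, assuming for brevity a `dict` that maps each key directly to its element rather than to a key/value object. A hash table's iteration order need not follow the keys, so the hypothetical helper `to_array` places each element by its key instead of relying on iteration order. Each operation does linear work and then rebuilds, which fits the linear expected bound the problem allows.

```python
def sequence_build(A):
    """Key i -> A[i]; a dict stands in for the black-box hash set."""
    return {i: A[i] for i in range(len(A))}


def to_array(table):
    """Extract the elements into a static-size list, in sequence order."""
    out = [None] * len(table)
    for k, v in table.items():       # any iteration order works:
        out[k] = v                   # the key says where each element goes
    return out


def delete_at(table, i):
    """Extract, shift everything after i one slot left, then rebuild."""
    arr = to_array(table)
    x = arr[i]
    for j in range(i, len(arr) - 1):
        arr[j] = arr[j + 1]
    return sequence_build(arr[:-1]), x   # drop the now-stale last slot


def insert_at(table, i, x):
    """Allocate n + 1 slots, shift everything from i one slot right, rebuild."""
    old = to_array(table)
    arr = [None] * (len(old) + 1)
    for j in range(len(arr)):
        if j < i:
            arr[j] = old[j]
        elif j == i:
            arr[j] = x
        else:
            arr[j] = old[j - 1]
    return sequence_build(arr)
```

Both functions return a freshly built table, mirroring the "just call build again" approach; the caller replaces its old table with the returned one.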
AUDIENCE: What if you weren't allowed non-constant external space?

ERIK DEMAINE: Huh. You're going to throw me an open problem. What if you only have constant extra space? Right. Then I think we need to use insert and delete. Good question. We could conceptually do this shifting, but do it using insert and delete. So let's do the delete case again. Here's x_i. We want to replace it with x_(i+1), and so on. We can start out by deleting the item with key i; that gets rid of this guy. Then we can delete the item with key i plus 1, which gives us that item. Then we reassign its key to i instead of i plus 1 and reinsert it. To draw this properly: we take out the object with key i plus 1 and value x_(i+1) stored in this data structure, update its key to i, and reinsert it, and it takes the place of the deleted item. So you could do that.
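Repeating the delete, re-key, reinsert step for every later key gives a delete_at that uses only the set's own delete and insert plus a constant number of extra variables. The sketch below assumes a `dict` of hypothetical `Item` objects keyed 0 to n - 1. It is still linear expected time overall, since each of the n - i - 1 later items is deleted and reinserted once.

```python
class Item:
    """Hypothetical key/value object stored in the set."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


def delete_at_in_place(table, i):
    """Remove index i using only set delete/insert and O(1) extra variables.

    `table` is a dict (stand-in for the hash set) mapping key -> Item,
    with keys 0, 1, ..., n - 1 before the call.
    """
    n = len(table)
    removed = table.pop(i)           # delete the item with key i
    for j in range(i + 1, n):
        item = table.pop(j)          # delete the item with key j ...
        item.key = j - 1             # ... relabel it one position earlier ...
        table[item.key] = item       # ... and reinsert it under the new key
    return removed.value
```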
You could go through the items (not a list, really) by iterating for j equals i to n minus 1: for each of those items, delete it, change its key, and reinsert it with the new key. Then you don't have to build the intermediate array. So if you're told to minimize space, great. And maybe you think of that as simpler. I like to think of the rebuild as simpler, because the point is, I have linear time. I can do crazy, silly, very non-data-structures-y things, where I just start from scratch.

OK, great. But there's one more set of operations: insert and delete, first and last. Are these easy? Good? Shall we try? We can do insert_last. Given an item x, we want to add it to the end of the structure. Because we start at 0, its index is going to be equal to the current length of the structure. So let's just insert a new object with key equal to the length and value equal to x. We're done.
Delete_last is similar: just delete the item with key length minus 1. OK, what about first? Insert_first is supposed to add x to the beginning of my sequence. Well, now I realize I have a problem, because I want this new item to have key 0: after I do an insert_first, get_at(0) should return this item. But I already have an item with key 0, an item with key 1, an item with key 2, and so on down the line. So if I wanted to give x a key of 0, I would have to shift the keys of all of those items, just like we were doing here. That's going to take linear time, but we were supposed to do this in constant expected amortized time. So that's no good. This idea is not enough. It's not a bad idea; it's still a good idea. But it's no longer exactly what we want to do; it's only morally what we want to do. So do you have any thoughts on how we might get around this problem? It seems like, inserting at position 0, I need to shift everything down, which is linear time. That really sucks. Yeah?
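Under idea number one, the two operations at the end are one-liners, and the front is where it breaks down. The sketch below (same `dict`-as-set assumption, hypothetical names) shows insert_last and delete_last keyed by the current length, plus a deliberately naive insert_first that must shift every existing key up by one, which is exactly the linear-time cost described above.

```python
class Item:
    """Hypothetical key/value object stored in the set."""

    def __init__(self, key, value):
        self.key = key
        self.value = value


def insert_last(table, x):
    """Indices start at 0, so the new last element gets key len(table)."""
    n = len(table)
    table[n] = Item(n, x)


def delete_last(table):
    """The last element has key len(table) - 1."""
    return table.pop(len(table) - 1).value


def insert_first_naive(table, x):
    """Give x key 0 by shifting every existing key up by one: linear time."""
    for k in range(len(table) - 1, -1, -1):   # from the end, to avoid clobbering
        item = table.pop(k)
        item.key = k + 1
        table[item.key] = item
    table[0] = Item(0, x)
```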
AUDIENCE: You could create some sort of link to something else.

ERIK DEMAINE: Link this data structure with another one. So we could build more than one set; that's certainly allowed. I don't know how to do-- oh, I see. You're saying maybe build a whole other structure for the items that come before 0?

AUDIENCE: Yeah.

ERIK DEMAINE: Yeah, actually, that would work, I think, maybe. It's like in the P set. Then you have to deal with the case where you delete from one of them and it becomes empty. Then things get messy. Delete_first is also going to be a problem, because if I delete the beginning of this data structure, I lose my 0 item, and I want the new 0 item to be the old 1 item. And again, all the indices shift. So inserting and deleting at the front is hard. We could do that trick like in the P set, or like in the last problem session, and so on. But there's a much simpler idea.

AUDIENCE: Can you have an extra variable to keep track of where the beginning is?

ERIK DEMAINE: Nice.
I can have an extra variable to keep track of where the beginning is. Call it first. This is going to be the key of the first item, the one at index 0. Another way to say this is: let's just use negative integers, right? Sets work for any keys, any integer keys. OK, technically we said to use keys 0 to u minus 1. But if you have negative numbers, you can easily fold--

AUDIENCE: Wait, doesn't it [INAUDIBLE] to, like, [INAUDIBLE]?

ERIK DEMAINE: Ah. In Python, negative indices mean something else. But we're not using a Python interface. We're using our custom magical set interface, which we show how to implement in the recitation notes, and which can take an arbitrary key. It hashes that key and finds a place to put that item. So we're not actually storing things in order here; we're storing things in a hash table. But we're not supposed to get into the implementation details. I think the way we presented hashing, with our universal hash functions, we only allowed non-negative keys.
451 00:21:59,220 --> 00:22:01,980 So maybe, technically, I should point out, 452 00:22:01,980 --> 00:22:07,430 if you have positive and negative numbers, 453 00:22:07,430 --> 00:22:15,260 you can fold this in half by mapping 0 to 0, 1 to 2, 2 to 4, 454 00:22:15,260 --> 00:22:17,300 spreading it out. 455 00:22:17,300 --> 00:22:19,760 And then you can take minus 1 and map it 456 00:22:19,760 --> 00:22:23,750 to plus 1, and minus 2 and map it to plus 3. 457 00:22:23,750 --> 00:22:26,540 So this is like multiplying each of these guys by 2, 458 00:22:26,540 --> 00:22:30,200 and multiplying each of these guys by minus 2 and subtracting 1. 459 00:22:30,200 --> 00:22:35,960 And then you get non-negative integers out of all integers. 460 00:22:35,960 --> 00:22:40,100 This is a typical math trick for showing 461 00:22:40,100 --> 00:22:42,320 that the number of integers is equal to the number 462 00:22:42,320 --> 00:22:45,140 of non-negative integers, which may seem weird to you. 463 00:22:45,140 --> 00:22:48,410 But they're both countably infinite. 464 00:22:48,410 --> 00:22:50,690 So you could-- if your structure only supports 465 00:22:50,690 --> 00:22:53,930 non-negative keys, you could map negative keys in this way 466 00:22:53,930 --> 00:22:57,200 and throw them into the hash table, OK? 467 00:22:57,200 --> 00:23:00,170 So now, I allow negative things for-- 468 00:23:03,490 --> 00:23:04,250 like that. 469 00:23:04,250 --> 00:23:05,230 And so, great. 470 00:23:05,230 --> 00:23:08,080 If I want to insert at the beginning, what I can do 471 00:23:08,080 --> 00:23:14,080 is just decrement my first variable, which 472 00:23:14,080 --> 00:23:15,450 is keeping track of the index. 473 00:23:15,450 --> 00:23:17,920 So initially, first is going to be 0. 474 00:23:17,920 --> 00:23:20,590 So I'm going to add into my build.
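A minimal Python sketch of the folding map just described, assuming plain Python integers as keys; the names `fold` and `unfold` are illustrative, not from the course code. Non-negative k maps to 2k and negative k maps to -2k - 1, so -1 maps to 1 and -2 maps to 3. Note that the fold interleaves negative and non-negative keys: that is fine for hashing, but it does not preserve order.

```python
def fold(k: int) -> int:
    """Map any integer key to a distinct non-negative integer.

    0, 1, 2, ... go to the even numbers 0, 2, 4, ... (k -> 2k);
    -1, -2, -3, ... go to the odd numbers 1, 3, 5, ... (k -> -2k - 1).
    """
    return 2 * k if k >= 0 else -2 * k - 1


def unfold(m: int) -> int:
    """Inverse of fold: even m came from m // 2, odd m from -(m + 1) // 2."""
    if m < 0:
        raise ValueError("folded keys are non-negative")
    return m // 2 if m % 2 == 0 else -(m + 1) // 2


# A set that only accepts non-negative keys could store key k under fold(k).
print([fold(k) for k in range(-3, 4)])  # prints [5, 3, 1, 0, 2, 4, 6]
```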
475 00:23:20,590 --> 00:23:23,500 First, I'm going to say first equals 476 00:23:23,500 --> 00:23:28,270 0, because I start with key 0 when I initially 477 00:23:28,270 --> 00:23:29,680 build a structure. 478 00:23:29,680 --> 00:23:31,930 And if I want to-- if I need more room before 0, 479 00:23:31,930 --> 00:23:34,240 I just set first to minus 1. 480 00:23:34,240 --> 00:23:36,400 And if I already have a minus 1 element, 481 00:23:36,400 --> 00:23:38,040 I'll decrement it to minus 2. 482 00:23:38,040 --> 00:23:40,600 Decrement means decrease by 1-- 483 00:23:40,600 --> 00:23:42,640 shows my assembly language programming. 484 00:23:42,640 --> 00:23:45,550 This is usually a built-in operation on most computers. 485 00:23:45,550 --> 00:23:55,690 And then I can insert an item with key first and value x. 486 00:23:58,410 --> 00:23:58,910 Great. 487 00:23:58,910 --> 00:24:01,020 And if I want to delete the first item, 488 00:24:01,020 --> 00:24:05,850 I would delete the item with key first and then increment first. 489 00:24:05,850 --> 00:24:08,720 And now all of my operations have to change a little bit-- 490 00:24:08,720 --> 00:24:11,030 let me use another color-- 491 00:24:11,030 --> 00:24:13,190 because I was implicitly assuming here 492 00:24:13,190 --> 00:24:15,320 that all my indices started at 0. 493 00:24:15,320 --> 00:24:16,850 But now they start at first. 494 00:24:16,850 --> 00:24:22,730 The index 0 maps to key first. 495 00:24:22,730 --> 00:24:24,680 And so the right thing to do here 496 00:24:24,680 --> 00:24:31,370 is plus first and plus first. 497 00:24:31,370 --> 00:24:35,930 Basically, add a whole bunch of plus firsts throughout. 498 00:24:35,930 --> 00:24:37,080 This one is probably fine. 499 00:24:37,080 --> 00:24:40,550 If I'm globally rebuilding, I can reassign all my labels. 500 00:24:40,550 --> 00:24:44,840 But this one should be first plus length.
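The first-offset idea can be sketched in Python as follows. This is a sketch under stated assumptions: a built-in `dict` stands in for the hash-table set black box (it accepts negative keys directly, so no folding is needed here), the class and method names are illustrative, and insert_at/delete_at, which the discussion handles by global rebuilding, are omitted.

```python
class HashSequence:
    """Sequence stored in a hash table keyed by integer position.

    Invariant: the stored keys are exactly first, first + 1, ...,
    first + len - 1, and sequence index i lives under key first + i.
    """

    def __init__(self, items=()):
        # build: keys 0..n-1, so first starts at 0
        self.first = 0
        self.table = {i: x for i, x in enumerate(items)}

    def __len__(self):
        return len(self.table)

    def _key(self, i):
        # Translate a sequence index into a hash-table key ("plus first").
        if not 0 <= i < len(self):
            raise IndexError(i)
        return self.first + i

    def get_at(self, i):
        return self.table[self._key(i)]

    def set_at(self, i, x):
        self.table[self._key(i)] = x

    def insert_first(self, x):
        self.first -= 1                 # decrement first, then store there
        self.table[self.first] = x

    def delete_first(self):
        x = self.table.pop(self._key(0))
        self.first += 1                 # the old index-1 item is now index 0
        return x

    def insert_last(self, x):
        # The next free key is first plus length.
        self.table[self.first + len(self)] = x

    def delete_last(self):
        return self.table.pop(self._key(len(self) - 1))
```

Each operation does a constant number of dict operations, so it inherits whatever expected (amortized) bounds the underlying hash table provides.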
501 00:24:47,630 --> 00:24:52,760 OK, so just by keeping track of where my keys are starting, 502 00:24:52,760 --> 00:24:56,510 I can do this shifting and not have to worry about stuff. 503 00:24:56,510 --> 00:24:59,630 And this is a lot easier than having to worry about 504 00:24:59,630 --> 00:25:03,290 maintaining two structures, and keeping them both non-empty, 505 00:25:03,290 --> 00:25:06,070 and stuff like that, because of-- 506 00:25:06,070 --> 00:25:08,780 if I assume my set has this power 507 00:25:08,780 --> 00:25:11,420 of dealing with negative integers, and strings, 508 00:25:11,420 --> 00:25:14,450 and whatever else. 509 00:25:14,450 --> 00:25:15,400 Cool? 510 00:25:15,400 --> 00:25:15,900 Yeah? 511 00:25:15,900 --> 00:25:17,330 AUDIENCE: Is there a reason why you didn't do 512 00:25:17,330 --> 00:25:18,490 like the sorting-- 513 00:25:18,490 --> 00:25:21,870 like, have [INAUDIBLE]? 514 00:25:21,870 --> 00:25:24,130 ERIK DEMAINE: Oh, why didn't I use a linked list? 515 00:25:24,130 --> 00:25:27,790 Because this. 516 00:25:27,790 --> 00:25:32,440 Linked lists are very bad at get and set at a given index. 517 00:25:32,440 --> 00:25:33,400 AUDIENCE: Is that the-- 518 00:25:33,400 --> 00:25:35,025 the bottom idea, is that a linked list? 519 00:25:35,025 --> 00:25:36,692 ERIK DEMAINE: This is not a linked list. 520 00:25:36,692 --> 00:25:38,200 This is just storing a single number 521 00:25:38,200 --> 00:25:41,050 as an integer in your data structure that says, 522 00:25:41,050 --> 00:25:44,660 what is the smallest key in my data structure? 523 00:25:44,660 --> 00:25:45,970 That's all it is. 524 00:25:45,970 --> 00:25:47,078 It's a counter. 525 00:25:47,078 --> 00:25:48,302 AUDIENCE: Ah. 526 00:25:48,302 --> 00:25:50,027 ERIK DEMAINE: OK, so the data structure 527 00:25:50,027 --> 00:25:51,110 keeps track of its length. 528 00:25:51,110 --> 00:25:52,910 And it keeps track of the minimum key.
529 00:25:52,910 --> 00:25:55,100 And so it will always consist-- the invariant is, 530 00:25:55,100 --> 00:25:57,530 you will always have keys from first 531 00:25:57,530 --> 00:26:00,290 up to first plus length minus 1. 532 00:26:00,290 --> 00:26:04,800 And that's what we're exploiting here. 533 00:26:04,800 --> 00:26:06,330 We have no idea where first will be. 534 00:26:06,330 --> 00:26:08,430 It depends how many operations you've done, 535 00:26:08,430 --> 00:26:10,510 how many inserts at the beginning, and so on. 536 00:26:10,510 --> 00:26:23,520 But the keys-- keys will always be first to first plus 537 00:26:23,520 --> 00:26:26,433 length minus 1. 538 00:26:26,433 --> 00:26:27,850 This is what we call an invariant. 539 00:26:27,850 --> 00:26:31,200 Useful to write these things down so 540 00:26:31,200 --> 00:26:32,970 you can understand what the heck-- 541 00:26:32,970 --> 00:26:35,077 why is your data structure correct? 542 00:26:35,077 --> 00:26:36,660 Because of invariants like this, which 543 00:26:36,660 --> 00:26:39,072 you can prove by induction, by showing, 544 00:26:39,072 --> 00:26:40,530 each time you do an operation, this 545 00:26:40,530 --> 00:26:44,130 is maintained, even when I'm changing first in order 546 00:26:44,130 --> 00:26:47,820 to maintain this invariant. 547 00:26:47,820 --> 00:26:48,352 Cool. 548 00:26:48,352 --> 00:26:50,310 Sometimes you come up with the invariant first. 549 00:26:50,310 --> 00:26:57,010 In this case, I came up with it post facto, after the fact. 550 00:26:57,010 --> 00:26:58,510 Cool. 551 00:26:58,510 --> 00:27:18,760 Let's move on to problem 3, which is called critter sort. 552 00:27:18,760 --> 00:27:21,600 And the other key thing I want you to learn about-- 553 00:27:21,600 --> 00:27:22,100 question? 554 00:27:22,100 --> 00:27:22,600 Sorry. 555 00:27:22,600 --> 00:27:23,480 AUDIENCE: Yeah. 
556 00:27:23,480 --> 00:27:25,547 So when you do first, first plus 1, 557 00:27:25,547 --> 00:27:28,247 is that a rebuilding of the [INAUDIBLE]? 558 00:27:28,247 --> 00:27:29,830 ERIK DEMAINE: This is just a sentence. 559 00:27:29,830 --> 00:27:32,720 It is not an algorithm or data structure. 560 00:27:32,720 --> 00:27:34,115 This is a mathematical property. 561 00:27:34,115 --> 00:27:34,990 AUDIENCE: [INAUDIBLE] 562 00:27:34,990 --> 00:27:36,770 ERIK DEMAINE: This is not an assignment. 563 00:27:36,770 --> 00:27:38,800 This is a mathematical "is equal to." 564 00:27:38,800 --> 00:27:41,920 AUDIENCE: But you are re-indexing it though, 565 00:27:41,920 --> 00:27:43,630 because you're doing first plus 1. 566 00:27:43,630 --> 00:27:46,480 ERIK DEMAINE: So are you asking about one of these operations, 567 00:27:46,480 --> 00:27:48,142 like this one? 568 00:27:48,142 --> 00:27:48,850 AUDIENCE: Oh, OK. 569 00:27:48,850 --> 00:27:49,090 Never mind. 570 00:27:49,090 --> 00:27:49,530 I get it. 571 00:27:49,530 --> 00:27:50,030 OK. 572 00:27:50,030 --> 00:27:51,830 ERIK DEMAINE: Yeah. 573 00:27:51,830 --> 00:27:52,600 OK. 574 00:27:52,600 --> 00:27:54,160 So the other important takeaway I 575 00:27:54,160 --> 00:27:56,350 want you to get about reading our problem sets 576 00:27:56,350 --> 00:27:58,630 is that they have hidden humor inside. 577 00:27:58,630 --> 00:28:00,470 I don't know if you've noticed. 578 00:28:00,470 --> 00:28:02,950 But here's an example of a problem called critter sort. 579 00:28:02,950 --> 00:28:05,770 Ashley Gettem collects and trains pocket critters 580 00:28:05,770 --> 00:28:07,750 to fight other pocket critters in battle. 581 00:28:07,750 --> 00:28:09,235 What is this a reference to? 582 00:28:09,235 --> 00:28:10,313 AUDIENCE: Digimon. 583 00:28:10,313 --> 00:28:11,230 ERIK DEMAINE: Digimon. 584 00:28:11,230 --> 00:28:13,150 Wow, you guys are so young. 585 00:28:13,150 --> 00:28:16,990 Pokemon, the ancient form.
586 00:28:16,990 --> 00:28:19,690 Pokemon is short for pocket monsters. 587 00:28:19,690 --> 00:28:20,950 And in fact, in the original-- 588 00:28:20,950 --> 00:28:21,490 AUDIENCE: [INAUDIBLE] 589 00:28:21,490 --> 00:28:22,448 ERIK DEMAINE: --anime-- 590 00:28:22,448 --> 00:28:24,160 AUDIENCE: Actually, there's [INAUDIBLE]. 591 00:28:24,160 --> 00:28:25,285 ERIK DEMAINE: I don't know. 592 00:28:25,285 --> 00:28:27,880 This is all after my time. 593 00:28:27,880 --> 00:28:28,845 We can debate after. 594 00:28:28,845 --> 00:28:30,220 So pocket critters is a reference 595 00:28:30,220 --> 00:28:33,130 to pocket monsters, which is Pokemon. 596 00:28:33,130 --> 00:28:35,240 Who's Ashley Gettem? 597 00:28:35,240 --> 00:28:36,400 AUDIENCE: Ash. 598 00:28:36,400 --> 00:28:38,470 ERIK DEMAINE: Ash Ketchum is his full name 599 00:28:38,470 --> 00:28:40,952 in the English version. 600 00:28:40,952 --> 00:28:42,910 Totally different name in the Japanese version. 601 00:28:42,910 --> 00:28:46,090 But they're both puns on collect them all, right? 602 00:28:46,090 --> 00:28:48,040 All right, so that's the important stuff. 603 00:28:48,040 --> 00:28:50,240 We'll see more jokes later. 604 00:28:50,240 --> 00:28:51,640 So there's this setup. 605 00:28:51,640 --> 00:28:54,280 But basically, we have n critters. 606 00:28:54,280 --> 00:28:57,290 And we want to sort them by four different things. 607 00:28:57,290 --> 00:28:59,590 And so I'm just going to abstract this problem 608 00:28:59,590 --> 00:29:04,438 to sort n objects by the following types of keys. 609 00:29:04,438 --> 00:29:06,730 And for each one, we want to know what the best sorting 610 00:29:06,730 --> 00:29:08,200 algorithm is. 611 00:29:08,200 --> 00:29:10,990 And there's this footnote that's very important that says, 612 00:29:10,990 --> 00:29:13,420 faster correct algorithms will receive more points 613 00:29:13,420 --> 00:29:15,938 than slower correct algorithms.
614 00:29:15,938 --> 00:29:17,980 Also, correct algorithms will receive more points 615 00:29:17,980 --> 00:29:19,150 than incorrect algorithms. 616 00:29:19,150 --> 00:29:20,230 But that's implicit. 617 00:29:20,230 --> 00:29:22,330 Incorrect generally gets zero. 618 00:29:22,330 --> 00:29:27,340 OK, so part a, it says, species ID. 619 00:29:27,340 --> 00:29:36,400 But basically, we have integers in the range minus n to n. 620 00:29:36,400 --> 00:29:40,900 So if I want to sort n integers in the range minus n to n, 621 00:29:40,900 --> 00:29:42,340 what should I do? 622 00:29:42,340 --> 00:29:45,550 This is a reference to yesterday's lecture. 623 00:29:50,170 --> 00:29:51,350 Yeah? 624 00:29:51,350 --> 00:29:52,800 Radix sort, yeah. 625 00:29:52,800 --> 00:29:54,005 Always a good answer. 626 00:29:54,005 --> 00:29:56,990 Or almost always a good answer when you have integers. 627 00:29:56,990 --> 00:29:59,640 It's a good answer whenever you have small integers. 628 00:29:59,640 --> 00:30:01,590 Now, radix sort, the way we phrased it-- 629 00:30:01,590 --> 00:30:04,010 let me maybe put it down here. 630 00:30:04,010 --> 00:30:20,270 Radix sort sorts n integers in the range 0 to u minus 1 in n 631 00:30:20,270 --> 00:30:25,910 plus n log base n of u time. 632 00:30:28,460 --> 00:30:31,800 And in particular, this is linear time 633 00:30:31,800 --> 00:30:38,420 if u is n to some constant power. 634 00:30:38,420 --> 00:30:43,430 OK, so can I just apply this as is to these integers? 635 00:30:43,430 --> 00:30:45,987 No, because they're negative. 636 00:30:45,987 --> 00:30:46,820 So what should I do? 637 00:30:46,820 --> 00:30:48,278 Maybe I should do my folding trick. 638 00:30:48,278 --> 00:30:51,440 We just saw how to take negative numbers and fold them in, 639 00:30:51,440 --> 00:30:53,160 interspersed with positive numbers. 640 00:30:53,160 --> 00:30:57,900 If I sort that, will that work? 641 00:30:57,900 --> 00:31:00,960 No, because that does not preserve order.
642 00:31:00,960 --> 00:31:03,335 It would intersperse. 643 00:31:03,335 --> 00:31:05,460 We want all the negative numbers to come before all 644 00:31:05,460 --> 00:31:06,240 the positive numbers. 645 00:31:06,240 --> 00:31:06,630 Yeah? 646 00:31:06,630 --> 00:31:08,672 AUDIENCE: Can you just add n to all the integers? 647 00:31:08,672 --> 00:31:10,485 ERIK DEMAINE: Just add n, yep. 648 00:31:10,485 --> 00:31:11,190 Boom. 649 00:31:11,190 --> 00:31:11,970 Plus n. 650 00:31:11,970 --> 00:31:14,730 Now we have integers in the range-- 651 00:31:14,730 --> 00:31:16,020 let's be careful-- 652 00:31:16,020 --> 00:31:17,640 0 to 2n. 653 00:31:21,130 --> 00:31:21,630 Cool. 654 00:31:21,630 --> 00:31:22,650 Now we can apply this. 655 00:31:22,650 --> 00:31:25,470 Now u equals, technically, 2n plus 1, 656 00:31:25,470 --> 00:31:27,833 because we're only supposed to go to u minus 1. 657 00:31:27,833 --> 00:31:28,500 But that's fine. 658 00:31:28,500 --> 00:31:29,490 That's linear. 659 00:31:29,490 --> 00:31:32,220 And so we can sort in linear time, easy. 660 00:31:32,220 --> 00:31:34,850 This is a super easy problem. 661 00:31:34,850 --> 00:31:37,500 But in each one, we might need to do some transformation. 662 00:31:37,500 --> 00:31:41,410 Part b is a little more interesting. 663 00:31:41,410 --> 00:31:58,230 So we have strings over 26 letters of length at most 664 00:31:58,230 --> 00:32:00,930 10 ceiling log n. 665 00:32:03,640 --> 00:32:05,160 OK, this is a little trickier. 666 00:32:05,160 --> 00:32:06,300 What could I do? 667 00:32:06,300 --> 00:32:10,350 Again, I'd like to see whether radix sort applies. 668 00:32:10,350 --> 00:32:12,120 I should say radix sort sorts. 669 00:32:15,420 --> 00:32:17,130 I'd like to see if radix sort applies. 670 00:32:17,130 --> 00:32:20,550 To do that, I have to map these strings into integers somehow. 671 00:32:20,550 --> 00:32:21,510 Any way to do that? 672 00:32:24,405 --> 00:32:26,155 This is easy if you understand radix sort. 
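For part a, a minimal Python sketch of the approach above: shift every key by +n so the range minus n to n becomes 0 to 2n, radix sort, and shift back. The radix sort shown is one possible implementation, assuming base-n digits and a stable bucket-based counting sort per digit; all function names are illustrative. The same `radix_sort` also covers part c (keys below n squared), where it makes at most two counting-sort passes.

```python
def counting_sort(items, key, k):
    """Stable counting sort of items by key(item), which lies in 0..k-1."""
    buckets = [[] for _ in range(k)]
    for item in items:
        buckets[key(item)].append(item)   # appending keeps equal keys in order
    return [item for bucket in buckets for item in bucket]


def radix_sort(nums):
    """Sort non-negative integers with base-n digits, where n = len(nums).

    One stable counting-sort pass per digit, least significant first,
    so keys below u take about log base n of u passes.
    """
    n = len(nums)
    if n <= 1:
        return list(nums)
    result = list(nums)
    u = max(result) + 1
    place = 1
    while place < u:
        result = counting_sort(result, lambda x, p=place: (x // p) % n, n)
        place *= n
    return result


def sort_range_minus_n_to_n(nums):
    """Part a: keys in [-n, n] are shifted into [0, 2n], sorted, shifted back."""
    n = len(nums)
    return [x - n for x in radix_sort([x + n for x in nums])]
```

Shifting by a constant preserves order, which is exactly what the folding trick fails to do.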
673 00:32:26,155 --> 00:32:26,655 Yeah? 674 00:32:26,655 --> 00:32:29,180 AUDIENCE: Can you just index the letters? 675 00:32:29,180 --> 00:32:30,618 ERIK DEMAINE: Index the letters. 676 00:32:30,618 --> 00:32:31,118 Yeah. 677 00:32:34,610 --> 00:32:35,910 Yeah, we can map-- 678 00:32:35,910 --> 00:32:36,410 right. 679 00:32:36,410 --> 00:32:39,150 So we can map A to 0, B to 1. 680 00:32:39,150 --> 00:32:39,650 Then what? 681 00:32:42,288 --> 00:32:43,080 AUDIENCE: Oh, wait. 682 00:32:43,080 --> 00:32:45,170 Length-- 683 00:32:45,170 --> 00:32:48,050 ERIK DEMAINE: So that's for each letter. 684 00:32:48,050 --> 00:32:52,520 But we have a lot of letters. 685 00:32:52,520 --> 00:32:53,660 There are only 26 letters. 686 00:32:53,660 --> 00:32:58,070 But then we have 10 log n letters in a string. 687 00:32:58,070 --> 00:33:00,920 That is, together, a single key that we need to sort. 688 00:33:05,130 --> 00:33:05,630 Yeah? 689 00:33:05,630 --> 00:33:09,995 AUDIENCE: Can't we just sort by the first letter first, then-- 690 00:33:09,995 --> 00:33:12,620 ERIK DEMAINE: Sort by the first letter, then the second letter. 691 00:33:12,620 --> 00:33:14,517 That is exactly the opposite of radix sort. 692 00:33:14,517 --> 00:33:16,100 Remember, radix sort, we want to start 693 00:33:16,100 --> 00:33:18,980 by the last letter, and then the next-to-last letter, 694 00:33:18,980 --> 00:33:20,450 and finally, the first letter. 695 00:33:20,450 --> 00:33:21,530 AUDIENCE: But you want to sort by the first one 696 00:33:21,530 --> 00:33:23,233 in order to alphabetize things. 697 00:33:23,233 --> 00:33:24,650 ERIK DEMAINE: No, to alphabetize-- 698 00:33:24,650 --> 00:33:27,230 we do want to, in the end, sort by the first letter. 699 00:33:27,230 --> 00:33:29,150 But that's at the end.
700 00:33:29,150 --> 00:33:30,690 So at the end-- remember, radix sort 701 00:33:30,690 --> 00:33:32,690 always goes backwards from the least significant 702 00:33:32,690 --> 00:33:34,340 to the most significant. 703 00:33:34,340 --> 00:33:36,090 And so indeed, that is what we want to do. 704 00:33:36,090 --> 00:33:37,250 You're just saying, use radix sort. 705 00:33:37,250 --> 00:33:38,690 But what am I radix sort on? 706 00:33:38,690 --> 00:33:41,150 What am I radix sorting on? 707 00:33:41,150 --> 00:33:44,353 AUDIENCE: Yeah, on the last letters, not the first letters. 708 00:33:44,353 --> 00:33:45,770 ERIK DEMAINE: So technically, that 709 00:33:45,770 --> 00:33:47,953 would be using counting sort on the last letter, 710 00:33:47,953 --> 00:33:49,870 counting sort on the next-to-last letter, dot, 711 00:33:49,870 --> 00:33:53,330 dot, dot, counting sort on the first letter. 712 00:33:53,330 --> 00:33:56,510 But that is, together, radix sort on something, 713 00:33:56,510 --> 00:34:01,580 or Jason likes to call this tuple sorting. 714 00:34:01,580 --> 00:34:03,150 Tuple sort is the thing-- 715 00:34:03,150 --> 00:34:05,600 is the algorithm that says, sort by the last thing, 716 00:34:05,600 --> 00:34:08,580 then sort by the previous thing, and so on. 717 00:34:08,580 --> 00:34:10,820 You can also think of this as radix sorting 718 00:34:10,820 --> 00:34:13,025 on a number written in base 26. 719 00:34:15,550 --> 00:34:16,639 They're the same thing. 720 00:34:24,440 --> 00:34:28,227 But in the end, we can sort in linear time. 721 00:34:28,227 --> 00:34:30,830 AUDIENCE: How do you ensure that the letters are 722 00:34:30,830 --> 00:34:33,818 sorted in order, though? 723 00:34:33,818 --> 00:34:36,975 Like, how do you ensure that-- how do you tell the algorithm 724 00:34:36,975 --> 00:34:39,850 that you want A to come-- 725 00:34:39,850 --> 00:34:42,596 just like not-- 0 is less than 1, A is less than B. 726 00:34:42,596 --> 00:34:43,429 ERIK DEMAINE: Right.
727 00:34:43,429 --> 00:34:45,230 So I mean, technically, when you call 728 00:34:45,230 --> 00:34:47,270 something like tuple sort-- 729 00:34:47,270 --> 00:34:49,810 or maybe it's even clearer when you call radix sort. 730 00:34:49,810 --> 00:34:51,810 Radix sort, you're giving it a bunch of numbers. 731 00:34:51,810 --> 00:34:55,560 So you're taking these strings and mapping them to numbers. 732 00:34:55,560 --> 00:34:57,775 And when you do that, you get to decide which letter 733 00:34:57,775 --> 00:34:59,150 is the most significant, which is 734 00:34:59,150 --> 00:35:01,100 the least significant, right? 735 00:35:01,100 --> 00:35:04,700 So you will choose to always map the first letter in your string 736 00:35:04,700 --> 00:35:07,250 to position-- 737 00:35:07,250 --> 00:35:13,130 to value, or the-- what do you call it? 738 00:35:13,130 --> 00:35:15,050 Position in positional notation. 739 00:35:15,050 --> 00:35:19,590 Position 26 to the power 10 log n minus 1 as the most significant. 740 00:35:19,590 --> 00:35:21,090 So it's always the most significant. 741 00:35:21,090 --> 00:35:22,580 Even if your string is of length 1, 742 00:35:22,580 --> 00:35:25,070 you want to put that in the most significant digit. 743 00:35:25,070 --> 00:35:26,870 And you'll pad with zeros at the end 744 00:35:26,870 --> 00:35:29,340 if you run out of letters in your string. 745 00:35:29,340 --> 00:35:31,970 AUDIENCE: How many times are you running counting sort here? 746 00:35:31,970 --> 00:35:34,303 ERIK DEMAINE: How many times am I running counting sort? 747 00:35:34,303 --> 00:35:35,680 Oh, 10 log n times. 748 00:35:35,680 --> 00:35:36,775 Whoops. 749 00:35:36,775 --> 00:35:37,880 Yeah, good question. 750 00:35:37,880 --> 00:35:39,240 Good point. 751 00:35:39,240 --> 00:35:40,890 Yeah, I computed this wrong. 752 00:35:40,890 --> 00:35:42,980 So right. 753 00:35:42,980 --> 00:35:47,180 There are 10 log n digits in the string. 754 00:35:47,180 --> 00:35:49,310 So that is bad.
755 00:35:49,310 --> 00:35:51,270 I mean, it's OK. 756 00:35:51,270 --> 00:35:57,480 We'll end up with n log n running time by the tuple sort. 757 00:36:01,850 --> 00:36:03,920 However-- so that's the tuple sort. 758 00:36:03,920 --> 00:36:07,380 So I should really make this not equivalent. 759 00:36:07,380 --> 00:36:10,410 If I run tuple sort letter by letter, I'm going to do-- 760 00:36:10,410 --> 00:36:12,270 I'm running counting sort log n times. 761 00:36:12,270 --> 00:36:16,310 And so I get n log n, because each one takes linear time. 762 00:36:16,310 --> 00:36:20,630 If I map my strings into numbers first, 763 00:36:20,630 --> 00:36:22,355 radix sort doesn't use base 26. 764 00:36:22,355 --> 00:36:24,500 It uses base n. 765 00:36:24,500 --> 00:36:28,070 And then it will only run a constant number of times, 766 00:36:28,070 --> 00:36:38,520 because 26 to the 10 log n is n to the 10 log 26. 767 00:36:38,520 --> 00:36:42,800 And so the numbers that we're sorting are between 0 and n 768 00:36:42,800 --> 00:36:44,040 to a constant power. 769 00:36:44,040 --> 00:36:46,290 And so u is polynomial in n. 770 00:36:46,290 --> 00:36:49,380 And so that's the case when radix sort runs in linear time. 771 00:36:49,380 --> 00:36:53,210 So if you run tuple sort letter by letter, it's slow. 772 00:36:53,210 --> 00:36:55,700 If you run radix sort, it's doing a whole bunch 773 00:36:55,700 --> 00:36:56,720 of letters at once. 774 00:36:56,720 --> 00:36:58,340 Effectively, it's doing log n letters 775 00:36:58,340 --> 00:37:01,700 at a time in a single call to counting sort. 776 00:37:01,700 --> 00:37:05,930 And so the radix sort will actually win and get linear. 777 00:37:09,883 --> 00:37:11,300 There's a subtlety here, which is, 778 00:37:11,300 --> 00:37:13,940 I'm assuming that we can actually take these strings 779 00:37:13,940 --> 00:37:18,380 and convert them into integers in constant time each. 780 00:37:18,380 --> 00:37:20,750 And this problem set was ambiguous.
781 00:37:20,750 --> 00:37:22,610 And both answers were accepted. 782 00:37:22,610 --> 00:37:25,010 If you assume these letters are nice and compactly 783 00:37:25,010 --> 00:37:27,800 stored, and they fit in a constant number of words, 784 00:37:27,800 --> 00:37:31,280 because a word is at least log n bits long, 785 00:37:31,280 --> 00:37:33,560 then you can actually do this. 786 00:37:33,560 --> 00:37:36,650 If you store each letter in a separate word, 787 00:37:36,650 --> 00:37:40,496 then just reading the entire input will take n log n time. 788 00:37:40,496 --> 00:37:44,947 So that's a subtlety which we don't need to worry too much 789 00:37:44,947 --> 00:37:45,780 about in this class. 790 00:37:45,780 --> 00:37:46,280 Yeah? 791 00:37:46,280 --> 00:37:49,340 AUDIENCE: So [INAUDIBLE] bounding the letters 792 00:37:49,340 --> 00:37:50,312 to numbers. 793 00:37:50,312 --> 00:37:52,267 And like, how would that help? 794 00:37:52,267 --> 00:37:55,130 Because we still have to do 26-- 795 00:37:55,130 --> 00:37:58,340 ERIK DEMAINE: Yeah, there are 26 possible letters, 796 00:37:58,340 --> 00:38:01,260 numbering them 0 to 25. 797 00:38:01,260 --> 00:38:05,450 And then when we take a string, like AA, 798 00:38:05,450 --> 00:38:09,740 we map this into 00 in base 26. 799 00:38:12,350 --> 00:38:13,430 That's a number. 800 00:38:13,430 --> 00:38:18,350 If we do BB, for example, this maps to 11 801 00:38:18,350 --> 00:38:25,820 in base 26, which means 1 times 26 plus 1, which is 27. 802 00:38:25,820 --> 00:38:28,958 OK, so that's the mapping that I mean. 803 00:38:28,958 --> 00:38:31,250 AUDIENCE: You're mapping the whole string [INAUDIBLE]? 804 00:38:31,250 --> 00:38:34,130 ERIK DEMAINE: The whole string to a single number, yeah. 805 00:38:34,130 --> 00:38:36,740 And there's a subtlety, because I want lexicographic order.
806 00:38:36,740 --> 00:38:38,540 I need to pad things with spaces at the end 807 00:38:38,540 --> 00:38:40,850 or pad them with As at the end in case 808 00:38:40,850 --> 00:38:44,410 they're shorter than 10 log n. 809 00:38:44,410 --> 00:38:46,150 OK, cool. 810 00:38:46,150 --> 00:38:46,990 That was b. 811 00:38:50,330 --> 00:38:52,610 c is not very interesting. 812 00:38:52,610 --> 00:39:00,410 It's integers in the range 0 to n squared. 813 00:39:00,410 --> 00:39:02,420 This, I can just solve with radix sort, 814 00:39:02,420 --> 00:39:03,990 because my radix sort, at this point, 815 00:39:03,990 --> 00:39:06,160 we've done it a third time. 816 00:39:06,160 --> 00:39:09,050 Radix sort, we can sort as long as the integers are 817 00:39:09,050 --> 00:39:10,400 bounded by a polynomial. 818 00:39:10,400 --> 00:39:13,940 Here, it's a fixed polynomial with a constant exponent. 819 00:39:13,940 --> 00:39:16,370 So this will-- and this is radix sort, 820 00:39:16,370 --> 00:39:19,220 like we saw, that just calls counting sort twice, 821 00:39:19,220 --> 00:39:21,290 linear time. 822 00:39:21,290 --> 00:39:24,240 d is where things get more interesting. 823 00:39:24,240 --> 00:39:29,100 Let me get this phrasing the same. 824 00:39:29,100 --> 00:39:36,230 So in d, we have rational numbers of the form w over f. 825 00:39:36,230 --> 00:39:40,310 This is some win ratio. 826 00:39:40,310 --> 00:39:42,230 Always in the range 0 to 1. 827 00:39:42,230 --> 00:39:44,420 So we saw w is at most f. 828 00:39:44,420 --> 00:39:49,550 And 0 is less than w, is less than f, is less than n 829 00:39:49,550 --> 00:39:51,380 squared, because the-- 830 00:39:51,380 --> 00:39:53,185 that is really confusing-- 831 00:39:55,920 --> 00:39:57,470 is less than n squared-- 832 00:39:57,470 --> 00:40:00,200 those are separate statements-- 833 00:40:00,200 --> 00:40:02,240 because the f actually comes from part c. 834 00:40:02,240 --> 00:40:04,800 And c is really a setup for this one. 
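For part b, a sketch of mapping a whole string to a single integer, assuming uppercase A to Z input; `string_key` is an illustrative helper name. It departs slightly from the A-to-0 numbering above: letters become digits 1 to 26 and the pad symbol is digit 0, read in base 27, so a string sorts before any longer string it is a prefix of (padding with A as digit 0 would make "A" and "AA" collide). Here Python's built-in `sorted` only checks that integer order matches dictionary order; in the solution, these integers would be handed to radix sort.

```python
def string_key(s: str, length: int) -> int:
    """Map an uppercase string of at most `length` letters to an integer.

    Letters A..Z become digits 1..26, the string is read as a base-27
    number with the first letter most significant, and shorter strings
    are padded on the right with digit 0, which sorts before A.
    """
    if len(s) > length:
        raise ValueError("string longer than the stated bound")
    value = 0
    for ch in s.ljust(length, " "):          # pad on the right with blanks
        digit = 0 if ch == " " else ord(ch) - ord("A") + 1
        value = value * 27 + digit           # shift left one base-27 digit
    return value


names = ["PIKACHU", "BULBASAUR", "ABRA", "AB", "A"]
print(sorted(names, key=lambda s: string_key(s, 10)))
```

With length = 10 ceiling log n, every key is below 27 to the power length, which is polynomial in n, so radix sort with base-n digits needs only a constant number of counting-sort passes.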
835 00:40:04,800 --> 00:40:06,770 Doesn't really matter what this means. 836 00:40:06,770 --> 00:40:09,050 It's just that we have numbers w and f, 837 00:40:09,050 --> 00:40:10,460 where w is always less than f. 838 00:40:10,460 --> 00:40:12,420 And they're between 0 and n squared. 839 00:40:12,420 --> 00:40:15,500 So you should think, this is a good range for me, right? 840 00:40:15,500 --> 00:40:17,240 That I'm representing this rational 841 00:40:17,240 --> 00:40:19,650 in terms of two numbers between 0 and n squared. 842 00:40:19,650 --> 00:40:22,700 So there's like n to the 4th possible choices 843 00:40:22,700 --> 00:40:24,500 for what w and f are. 844 00:40:24,500 --> 00:40:26,780 So the range of my values is n to the 4th. 845 00:40:26,780 --> 00:40:29,780 That's the setting where radix sort should run fast. 846 00:40:29,780 --> 00:40:33,290 Unfortunately, these numbers-- what I want to sort by 847 00:40:33,290 --> 00:40:34,500 is not an integer. 848 00:40:34,500 --> 00:40:36,650 It's a rational. 849 00:40:36,650 --> 00:40:39,020 And that's annoying. 850 00:40:39,020 --> 00:40:43,850 So there are a couple of ways to solve this problem. 851 00:40:43,850 --> 00:40:48,980 In general, a good way to solve sorting is to use merge sort. 852 00:40:48,980 --> 00:40:50,590 Merge sort is always a good answer. 853 00:40:50,590 --> 00:40:51,770 It's not the best answer. 854 00:40:51,770 --> 00:40:54,540 In these cases, we shaved off a log. 855 00:40:54,540 --> 00:40:55,730 We got to linear time. 856 00:40:55,730 --> 00:40:57,330 But n log n is pretty good. 857 00:40:57,330 --> 00:40:58,530 It's pretty close to n. 858 00:40:58,530 --> 00:41:01,820 So first goal might be, can we even achieve 859 00:41:01,820 --> 00:41:03,428 n log n via merge sort? 860 00:41:06,710 --> 00:41:13,805 What would I need to do in order to actually apply merge sort 861 00:41:13,805 --> 00:41:14,840 to this instance? 862 00:41:18,700 --> 00:41:21,690 What does merge sort do to its keys? 
863 00:41:32,110 --> 00:41:32,610 Sorry? 864 00:41:32,610 --> 00:41:34,230 AUDIENCE: Isolate and compare them. 865 00:41:34,230 --> 00:41:36,930 ERIK DEMAINE: It isolates and compares them, yeah, right. 866 00:41:36,930 --> 00:41:39,030 So there's an array data structure. 867 00:41:39,030 --> 00:41:40,410 And it indexes into the array. 868 00:41:40,410 --> 00:41:41,820 That's the isolation. 869 00:41:41,820 --> 00:41:45,330 But then the thing it actually does with the items themselves 870 00:41:45,330 --> 00:41:46,427 is always a comparison. 871 00:41:46,427 --> 00:41:48,510 And this is why we introduced the comparison model 872 00:41:48,510 --> 00:41:51,150 and proved an n log n lower bound in the comparison model, 873 00:41:51,150 --> 00:41:53,010 because merge sort, and insertion sort, 874 00:41:53,010 --> 00:41:55,920 and selection sort are all comparison algorithms. 875 00:41:55,920 --> 00:41:57,443 Radix sort is not. 876 00:41:57,443 --> 00:41:58,110 But this one is. 877 00:41:58,110 --> 00:42:03,150 But to apply merge sort, I need to say, how do I compare wi 878 00:42:03,150 --> 00:42:09,375 over fi versus wj over fj? 879 00:42:12,850 --> 00:42:15,580 My computer only deals with integers. 880 00:42:15,580 --> 00:42:19,090 We can't actually represent wi over fi 881 00:42:19,090 --> 00:42:23,710 explicitly in binary, because it has infinitely many bits. 882 00:42:23,710 --> 00:42:25,240 But I can represent it implicitly 883 00:42:25,240 --> 00:42:27,080 by storing wi and fi. 884 00:42:27,080 --> 00:42:27,580 Yeah? 885 00:42:27,580 --> 00:42:29,620 AUDIENCE: Multiply by fi and fj. 886 00:42:29,620 --> 00:42:31,570 ERIK DEMAINE: Multiply by fi and fj, yeah. 
887 00:42:31,570 --> 00:42:33,220 When I went-- I didn't go to school, 888 00:42:33,220 --> 00:42:37,123 but then we learned cross multiplication, 889 00:42:37,123 --> 00:42:39,165 which is the same as multiplying both sides by fi 890 00:42:39,165 --> 00:42:41,530 and multiplying both sides by fj, as you said. 891 00:42:41,530 --> 00:42:48,880 So then we get wi fj less than, question mark, 892 00:42:48,880 --> 00:42:53,433 fi wj. 893 00:42:53,433 --> 00:42:55,600 When we do that, we better make sure that the things 894 00:42:55,600 --> 00:42:57,183 we're multiplying by are non-negative. 895 00:42:57,183 --> 00:42:58,750 Otherwise, the sign flips. 896 00:42:58,750 --> 00:43:01,870 But here, we assume they're all non-negative. 897 00:43:01,870 --> 00:43:02,988 So this is good. 898 00:43:02,988 --> 00:43:05,030 And now we're just multiplying two integers here, 899 00:43:05,030 --> 00:43:06,970 multiplying two integers here, and comparing. 900 00:43:06,970 --> 00:43:08,803 Those are all things I can do in a word RAM. 901 00:43:11,440 --> 00:43:13,600 So this was actually the intended solution 902 00:43:13,600 --> 00:43:14,980 when this problem was posed. 903 00:43:14,980 --> 00:43:16,510 Here's a way to do comparison sort. 904 00:43:16,510 --> 00:43:17,890 We get n log n. 905 00:43:17,890 --> 00:43:21,140 But in fact, you can achieve linear time. 906 00:43:21,140 --> 00:43:21,640 Yeah? 907 00:43:21,640 --> 00:43:23,723 AUDIENCE: [INAUDIBLE] that solution, how would you 908 00:43:23,723 --> 00:43:26,110 quickly say which one's bigger? 909 00:43:26,110 --> 00:43:29,573 Because wi times fj, one of them 910 00:43:29,573 --> 00:43:32,320 belongs to one of the Pokemons, and the other one 911 00:43:32,320 --> 00:43:34,852 is [INAUDIBLE]. 912 00:43:34,852 --> 00:43:37,060 ERIK DEMAINE: I feel like there's a joke here, like-- 913 00:43:37,060 --> 00:43:38,523 AUDIENCE: [INAUDIBLE] 914 00:43:38,523 --> 00:43:39,940 ERIK DEMAINE: Pikachu is superior.
915 00:43:39,940 --> 00:43:42,860 That's always the answer. 916 00:43:42,860 --> 00:43:44,950 So how do I tell whether one Pokemon 917 00:43:44,950 --> 00:43:46,730 is superior to the other? 918 00:43:46,730 --> 00:43:51,510 If I multiply my-- 919 00:43:51,510 --> 00:43:56,020 I multiply i's f value with j's w value. 920 00:43:56,020 --> 00:43:59,890 And I see whether that's greater than i's w 921 00:43:59,890 --> 00:44:02,510 value times j's f value. 922 00:44:02,510 --> 00:44:03,830 And if it is-- 923 00:44:03,830 --> 00:44:06,718 so these are equivalent. 924 00:44:06,718 --> 00:44:08,260 If this one is greater than this one, 925 00:44:08,260 --> 00:44:10,180 I know that this is greater than this. 926 00:44:10,180 --> 00:44:14,350 These are equivalent sentences by mathematics, by algebra. 927 00:44:14,350 --> 00:44:16,630 And so this is what I want to know. 928 00:44:16,630 --> 00:44:18,760 This would say j is superior to i. 929 00:44:18,760 --> 00:44:22,010 And so I determine that by actually doing this. 930 00:44:22,010 --> 00:44:25,060 So then I don't have to divide and deal with real numbers, 931 00:44:25,060 --> 00:44:27,790 because I don't know how, because I'm a computer. 932 00:44:31,110 --> 00:44:32,430 We're all computers in the end. 933 00:44:39,345 --> 00:44:39,845 OK. 934 00:44:53,050 --> 00:44:57,100 So it would be great if my numbers all 935 00:44:57,100 --> 00:44:58,670 had the same denominator. 936 00:44:58,670 --> 00:45:02,530 If they all had the same f, then I could just compare the w's. 937 00:45:02,530 --> 00:45:06,220 So that's one intuition for why we can actually 938 00:45:06,220 --> 00:45:09,070 do this in linear time. 939 00:45:09,070 --> 00:45:11,440 But the way I like to think about it-- so let's 940 00:45:11,440 --> 00:45:16,240 just draw the real interval from 0 to 1. 941 00:45:16,240 --> 00:45:23,040 And there are various spots all over here that represent-- 942 00:45:23,040 --> 00:45:24,310 I can't actually compute this. 
943 00:45:24,310 --> 00:45:27,310 But conceptually, each of these ratios wi over fi falls somewhere 944 00:45:27,310 --> 00:45:29,790 in that interval from 0 to 1. 945 00:45:29,790 --> 00:45:34,012 And I want to sort them somehow. 946 00:45:34,012 --> 00:45:35,470 So one thing that would be great is 947 00:45:35,470 --> 00:45:37,030 if I could take these real numbers 948 00:45:37,030 --> 00:45:42,830 and somehow map them to integers, 949 00:45:42,830 --> 00:45:47,560 which are uniformly spaced, maybe a few more of them. 950 00:45:47,560 --> 00:45:50,620 But these go from 0 to u minus 1. 951 00:45:50,620 --> 00:45:53,080 And if I could get u relatively small, 952 00:45:53,080 --> 00:45:55,550 and I could map each of these-- 953 00:45:55,550 --> 00:45:58,450 so I want that mapping to be order preserving. 954 00:45:58,450 --> 00:46:03,160 And I want two very close but distinct items to map to-- 955 00:46:03,160 --> 00:46:04,600 distinct keys here. 956 00:46:04,600 --> 00:46:06,760 I want them to map to distinct integers down here. 957 00:46:06,760 --> 00:46:08,968 If I could do that, then I just sort by the integers. 958 00:46:08,968 --> 00:46:13,070 And that's the same as sorting by the real numbers. 959 00:46:13,070 --> 00:46:17,650 And so at this point, I wonder, how close 960 00:46:17,650 --> 00:46:20,350 can two of these numbers be? 961 00:46:20,350 --> 00:46:30,010 So how close can two keys be? 962 00:46:30,010 --> 00:46:37,510 So I want to consider wi over fi minus wj over fj 963 00:46:37,510 --> 00:46:38,580 in absolute value. 964 00:46:41,110 --> 00:46:43,550 Now I do algebra. 965 00:46:43,550 --> 00:46:44,830 So this is-- 966 00:46:44,830 --> 00:46:47,900 I'd like to bring this into one ratio. 967 00:46:47,900 --> 00:46:49,090 So this is-- 968 00:46:49,090 --> 00:46:52,782 I can do that by multiplying the first fraction top and bottom by fj, and the second by fi. 969 00:46:52,782 --> 00:46:58,360 Now that's wi fj minus wj fi, over fi fj, which should 970 00:46:58,360 --> 00:47:01,030 look a lot like something here.
971 00:47:01,030 --> 00:47:02,740 But never mind. 972 00:47:02,740 --> 00:47:04,520 I'm sure there's a deep connection here. 973 00:47:04,520 --> 00:47:08,430 I can probably use this to prove that and vice versa. 974 00:47:08,430 --> 00:47:08,930 Cool. 975 00:47:08,930 --> 00:47:11,980 So with some absolute values, same thing. 976 00:47:11,980 --> 00:47:15,160 The denominators are positive, so I can actually 977 00:47:15,160 --> 00:47:18,220 just put absolute values on the top part. 978 00:47:18,220 --> 00:47:24,070 And OK, wi is an integer, fj is an integer, wj is an integer, 979 00:47:24,070 --> 00:47:27,310 fi is an integer, all greater than or equal to 0. 980 00:47:27,310 --> 00:47:30,560 So this thing is an integer. 981 00:47:30,560 --> 00:47:33,405 So it could be equal to 0. 982 00:47:33,405 --> 00:47:35,530 It's a non-negative integer, because it's 983 00:47:35,530 --> 00:47:36,400 an absolute value. 984 00:47:36,400 --> 00:47:37,390 It could be equal to 0. 985 00:47:37,390 --> 00:47:38,765 But if it's equal to 0, that's 986 00:47:38,765 --> 00:47:41,590 actually identical ratios, right? 987 00:47:41,590 --> 00:47:43,240 If this is 0, the whole thing is 0. 988 00:47:43,240 --> 00:47:46,180 And so these two values were the same. 989 00:47:46,180 --> 00:47:47,560 OK, but let's suppose it's not 0. 990 00:47:47,560 --> 00:47:50,170 If it's not 0, it's actually at least 1, 991 00:47:50,170 --> 00:47:53,950 the absolute value, because it's an integer. 992 00:47:53,950 --> 00:47:55,210 What about the bottom? 993 00:47:55,210 --> 00:47:57,950 fi-- so now we want this-- 994 00:47:57,950 --> 00:48:00,040 I want to know how small this ratio can be. 995 00:48:00,040 --> 00:48:02,860 It's going to be small when this is small and this is big. 996 00:48:02,860 --> 00:48:05,030 How big could fi fj be? 997 00:48:05,030 --> 00:48:07,720 Well, we're told that all the f's are less than n squared.
998 00:48:07,720 --> 00:48:10,610 So this thing is at most 999 00:48:10,610 --> 00:48:14,380 n squared minus 1, squared, 1000 00:48:14,380 --> 00:48:17,620 which is less than n to the 4th. 1001 00:48:17,620 --> 00:48:20,050 AUDIENCE: [INAUDIBLE] 1002 00:48:20,050 --> 00:48:21,910 ERIK DEMAINE: fi is at most n squared. 1003 00:48:21,910 --> 00:48:24,230 fj is at most n squared. 1004 00:48:24,230 --> 00:48:26,450 So it's n squared squared. 1005 00:48:26,450 --> 00:48:29,570 So this is at least 1 over n to the 4th. 1006 00:48:29,570 --> 00:48:32,830 So the closest the two points can get here 1007 00:48:32,830 --> 00:48:34,810 is 1 over n to the 4th. 1008 00:48:34,810 --> 00:48:39,430 So what can I do to scale that up to make them 1009 00:48:39,430 --> 00:48:41,860 kind of like integers? 1010 00:48:41,860 --> 00:48:45,400 Multiply by n to the 4th. 1011 00:48:45,400 --> 00:48:54,220 So just multiply by n to the 4th and then floor. 1012 00:48:54,220 --> 00:48:59,170 So we're going to take each fi over-- 1013 00:48:59,170 --> 00:49:00,640 I'd like to compute this ratio. 1014 00:49:00,640 --> 00:49:02,110 But I don't know how. 1015 00:49:02,110 --> 00:49:05,320 So instead, I'm going to take fi, multiply-- 1016 00:49:05,320 --> 00:49:05,890 OK. 1017 00:49:05,890 --> 00:49:08,890 Conceptually, what I want to do is multiply by n to the 4th 1018 00:49:08,890 --> 00:49:10,510 and take the floor. 1019 00:49:10,510 --> 00:49:16,840 How do I actually do this in a machine that doesn't 1020 00:49:16,840 --> 00:49:18,310 have real numbers like this? 1021 00:49:20,900 --> 00:49:23,575 So I don't have a floor operation. 1022 00:49:23,575 --> 00:49:26,200 I just have integer operations. 1023 00:49:26,200 --> 00:49:36,610 Then I can take fi, multiply it by n to the 4th, 1024 00:49:36,610 --> 00:49:38,450 and integer divide by wj.
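The closeness argument and the resulting integer key can be written out as follows, assuming (as in the discussion) that every w is a non-negative integer and every f is an integer with 1 ≤ f < n². When the two ratios differ, the numerator is a nonzero integer, so it is at least 1:

```latex
\begin{align*}
\left|\frac{w_i}{f_i} - \frac{w_j}{f_j}\right|
  &= \frac{|w_i f_j - w_j f_i|}{f_i f_j}
  \;\ge\; \frac{1}{f_i f_j}
  \;>\; \frac{1}{n^4},
  \qquad\text{since } f_i f_j \le (n^2 - 1)^2 < n^4, \\[4pt]
k_i &= \left\lfloor \frac{w_i \, n^4}{f_i} \right\rfloor .
\end{align*}
```

After scaling by n⁴, distinct ratios are more than 1 apart, so their floors are distinct integers; since the floor is monotone, the keys k_i keep the ratios' order, and equal ratios get equal keys.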
1025 00:49:41,290 --> 00:49:42,400 That is the same-- 1026 00:49:42,400 --> 00:49:45,160 that computes exactly this, because I 1027 00:49:45,160 --> 00:49:46,910 can do the multiplication and the division 1028 00:49:46,910 --> 00:49:50,192 in either order in real space. 1029 00:49:50,192 --> 00:49:52,400 And then this does the floor at the appropriate time. 1030 00:49:52,400 --> 00:49:54,620 But this is just operations on integers. 1031 00:49:54,620 --> 00:49:57,010 And now these are integers representing 1032 00:49:57,010 --> 00:49:59,080 how good my Pokemon are. 1033 00:49:59,080 --> 00:50:02,110 They have the property that any two distinct ones-- 1034 00:50:02,110 --> 00:50:04,690 before I take the floor, any two distinct ones are at least 1 1035 00:50:04,690 --> 00:50:05,493 apart. 1036 00:50:05,493 --> 00:50:07,660 So after I take the floor, they will still be at least 1 apart. 1037 00:50:07,660 --> 00:50:09,760 They will remain distinct integers. 1038 00:50:09,760 --> 00:50:12,430 And so I have successfully mapped my real numbers 1039 00:50:12,430 --> 00:50:14,710 to integers where distinct real numbers map to 1040 00:50:14,710 --> 00:50:15,590 distinct integers. 1041 00:50:15,590 --> 00:50:16,090 Yeah? 1042 00:50:16,090 --> 00:50:16,715 AUDIENCE: Wait. 1043 00:50:16,715 --> 00:50:18,870 So why is fi now in the numerator, and wi 1044 00:50:18,870 --> 00:50:19,760 in the denominator? 1045 00:50:19,760 --> 00:50:21,010 ERIK DEMAINE: Did I flip them? 1046 00:50:21,010 --> 00:50:22,450 Yeah, sorry. 1047 00:50:22,450 --> 00:50:26,620 Please invert everything-- just here. 1048 00:50:26,620 --> 00:50:27,670 This is wi over fi. 1049 00:50:27,670 --> 00:50:28,630 That was just a typo. 1050 00:50:33,390 --> 00:50:35,016 That's all of them. 1051 00:50:35,016 --> 00:50:36,480 OK. 1052 00:50:36,480 --> 00:50:38,200 AUDIENCE: Are they both i's or j's? 1053 00:50:41,840 --> 00:50:45,310 ERIK DEMAINE: These are supposed to both be i's, yeah. 1054 00:50:45,310 --> 00:50:46,410 Thank you.
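A minimal Python sketch of the linear-time version, under the same assumptions as the discussion (hypothetical (w, f) pairs with 0 ≤ w < n² and 1 ≤ f < n²): compute each key as (w · n⁴) // f with integer arithmetic only, then run a least-significant-digit radix sort in base n. Every key is below n⁶, so there are at most six passes. The per-digit bucket lists stand in for counting sort; both are stable, which is what LSD radix sort requires. The function names are illustrative, not from the course.

```python
def ratio_keys(mons, n):
    """Integer key floor(w * n**4 / f) for each hypothetical (w, f) pair.

    Assumes 0 <= w < n**2 and 1 <= f < n**2, so distinct ratios get
    distinct keys, key order matches ratio order, and every key < n**6.
    """
    scale = n ** 4
    # Multiply first, then floor-divide: integer operations only.
    return [(w * scale) // f for (w, f) in mons]


def radix_sort_by_keys(items, keys, base):
    """Stable LSD radix sort of items by non-negative integer keys."""
    order = list(range(len(items)))
    max_key = max(keys, default=0)
    place = 1
    while place <= max_key:
        # One stable bucketing pass on the current base-`base` digit.
        buckets = [[] for _ in range(base)]
        for idx in order:
            buckets[(keys[idx] // place) % base].append(idx)
        order = [idx for bucket in buckets for idx in bucket]
        place *= base
    return [items[idx] for idx in order]


def sort_mons(mons, n):
    """Sort (w, f) pairs by w / f: O(n) work per pass, at most 6 passes."""
    keys = ratio_keys(mons, n)
    return radix_sort_by_keys(mons, keys, max(n, 2))


print(sort_mons([(3, 4), (1, 3), (2, 2), (5, 10)], n=4))
# [(1, 3), (5, 10), (3, 4), (2, 2)]
```

Using base n keeps each bucketing pass at O(n + n) = O(n) work, and the bound of six passes comes directly from the keys being less than n⁶.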
1055 00:50:46,410 --> 00:50:49,680 So for each Pokemon i, we're 1056 00:50:49,680 --> 00:50:52,595 going to compute this as our key. 1057 00:50:52,595 --> 00:50:54,720 And then we're going to sort by those integer keys. 1058 00:50:54,720 --> 00:50:56,678 And that will sort the Pokemon by their ratios. 1059 00:51:00,930 --> 00:51:05,080 Let's write mon for monster. 1060 00:51:05,080 --> 00:51:05,580 Yeah? 1061 00:51:05,580 --> 00:51:10,560 AUDIENCE: [INAUDIBLE] u minus 1 [INAUDIBLE]? 1062 00:51:10,560 --> 00:51:12,960 ERIK DEMAINE: So u was just a-- 1063 00:51:12,960 --> 00:51:16,028 sorry, this is-- a label on this thing might help you. 1064 00:51:16,028 --> 00:51:16,570 AUDIENCE: Oh. 1065 00:51:16,570 --> 00:51:19,050 ERIK DEMAINE: Yeah. 1066 00:51:19,050 --> 00:51:21,720 So now my u-- oh, right. 1067 00:51:21,720 --> 00:51:23,110 What is my u? 1068 00:51:23,110 --> 00:51:24,480 What is my largest key? 1069 00:51:32,048 --> 00:51:36,145 It occurs to me, I really would like fi to be bigger than 0. 1070 00:51:36,145 --> 00:51:40,540 But let's not worry about it. 1071 00:51:40,540 --> 00:51:42,040 How big can u be? 1072 00:51:42,040 --> 00:51:45,410 Well, the biggest this can be is if fi is small, 1073 00:51:45,410 --> 00:51:46,240 and this is big. 1074 00:51:46,240 --> 00:51:47,920 Let's say fi can only go down to 1. 1075 00:51:47,920 --> 00:51:49,645 Otherwise, we'll get a division by 0. 1076 00:51:49,645 --> 00:51:51,910 We'd have to deal with infinity specially. 1077 00:51:51,910 --> 00:51:54,250 Probably, the problem isn't even well defined then. 1078 00:51:54,250 --> 00:51:55,270 How big could this be? 1079 00:51:55,270 --> 00:51:57,049 Well, I know the wi's-- 1080 00:51:57,049 --> 00:51:58,632 AUDIENCE: f's are defined as positive. 1081 00:51:58,632 --> 00:51:59,890 ERIK DEMAINE: Oh, good. 1082 00:51:59,890 --> 00:52:00,498 Thank you. 1083 00:52:00,498 --> 00:52:02,290 So there's also a positive constraint here.
1084 00:52:02,290 --> 00:52:07,720 I just failed to preserve that constraint in my mapping 1085 00:52:07,720 --> 00:52:10,150 from the word problem into the formal problem. 1086 00:52:12,950 --> 00:52:14,800 So f is at least 1. 1087 00:52:14,800 --> 00:52:15,490 Good. 1088 00:52:15,490 --> 00:52:17,890 But worst case is when it's 1. 1089 00:52:17,890 --> 00:52:19,730 And when wi-- how big could it be? 1090 00:52:19,730 --> 00:52:21,340 Well, n squared minus 1. 1091 00:52:21,340 --> 00:52:23,680 So this could be, basically, n squared 1092 00:52:23,680 --> 00:52:27,415 times n to the 4th divided by 1, which is n to the 6th. 1093 00:52:27,415 --> 00:52:31,720 So w-- or sorry, u, the largest key I can have plus 1, 1094 00:52:31,720 --> 00:52:33,550 is at most n to the 6th. 1095 00:52:33,550 --> 00:52:36,400 But that's OK, because radix sort can handle 1096 00:52:36,400 --> 00:52:38,780 any fixed polynomial in n. 1097 00:52:38,780 --> 00:52:41,305 So it's going to end up doing six counting sort passes. 1098 00:52:46,960 --> 00:52:51,820 OK, that's problem 3. 1099 00:52:51,820 --> 00:52:54,640 Let's move on to problem 4. 1100 00:53:18,390 --> 00:53:22,260 So problem 4, MIT has employed Gank Frehry. 1101 00:53:22,260 --> 00:53:22,800 Who's that? 1102 00:53:26,610 --> 00:53:27,790 Frank Gehry, yeah. 1103 00:53:27,790 --> 00:53:34,970 This is a common encoding that Jason really likes. 1104 00:53:34,970 --> 00:53:38,820 I've grown to like it. 1105 00:53:38,820 --> 00:53:42,720 This is called a spoonerism, where you swap the beginnings 1106 00:53:42,720 --> 00:53:45,580 of two words. 1107 00:53:45,580 --> 00:53:47,410 OK, that's one joke. 1108 00:53:47,410 --> 00:53:49,357 There's another joke in this problem. 1109 00:53:49,357 --> 00:53:51,690 Anyway, they're building a new wing of the Stata Center, 1110 00:53:51,690 --> 00:53:53,280 as one does. 1111 00:53:53,280 --> 00:53:55,510 We have a bunch of cubes.
1112 00:53:55,510 --> 00:53:58,620 If you read long enough, you realize that's a red herring. 1113 00:53:58,620 --> 00:54:00,810 Cubes do not play a role in this problem. 1114 00:54:00,810 --> 00:54:03,870 In the end, what we have is a bunch 1115 00:54:03,870 --> 00:54:10,197 of integers, which happen to be the side lengths of the cubes. 1116 00:54:10,197 --> 00:54:12,030 But we just care about the side lengths, not 1117 00:54:12,030 --> 00:54:14,310 their volume or anything-- 1118 00:54:14,310 --> 00:54:21,640 s0, s1, up to s n minus 1. 1119 00:54:21,640 --> 00:54:32,015 And we want two numbers in S summing to h. 1120 00:54:32,015 --> 00:54:34,440 AUDIENCE: This is dumb, but how can cubes 1121 00:54:34,440 --> 00:54:36,553 have more than six sides? 1122 00:54:36,553 --> 00:54:38,220 ERIK DEMAINE: This is a side length, not 1123 00:54:38,220 --> 00:54:39,610 the number of sides. 1124 00:54:39,610 --> 00:54:40,470 So a cube-- 1125 00:54:40,470 --> 00:54:43,518 AUDIENCE: Oh, OK. 1126 00:54:43,518 --> 00:54:44,310 ERIK DEMAINE: Cool. 1127 00:54:44,310 --> 00:54:46,470 I didn't know we'd be doing 3D geometry today. 1128 00:54:46,470 --> 00:54:48,090 That's si. 1129 00:54:48,090 --> 00:54:49,800 OK, so you got little cubes. 1130 00:54:49,800 --> 00:54:52,020 You've got big cubes. 1131 00:54:52,020 --> 00:54:53,100 This is a small si. 1132 00:54:53,100 --> 00:54:53,910 This is a big si. 1133 00:54:53,910 --> 00:54:54,868 Doesn't matter, though. 1134 00:54:54,868 --> 00:54:56,163 They're just numbers. 1135 00:54:56,163 --> 00:54:57,330 We're not using them at all. 1136 00:54:57,330 --> 00:54:58,650 In the problem, you're trying to 1137 00:54:58,650 --> 00:54:59,860 stack one cube on the other. 1138 00:54:59,860 --> 00:55:01,820 But all we really care about is two numbers 1139 00:55:01,820 --> 00:55:06,930 whose sum, regular old sum, is exactly h, ideally. 1140 00:55:06,930 --> 00:55:09,250 There are going to be two versions of this problem.
1141 00:55:09,250 --> 00:55:12,360 And so the first goal is to solve this exactly 1142 00:55:12,360 --> 00:55:17,550 in linear expected time. 1143 00:55:17,550 --> 00:55:20,100 That's what the problem says. 1144 00:55:20,100 --> 00:55:23,110 So what do we know? 1145 00:55:23,110 --> 00:55:24,523 Well, linear time, that's-- 1146 00:55:24,523 --> 00:55:26,940 can't get much faster than that, because we need that just 1147 00:55:26,940 --> 00:55:28,170 to read the input. 1148 00:55:28,170 --> 00:55:32,220 Expected time-- hashing, right? 1149 00:55:32,220 --> 00:55:34,680 We're told, basically, we should use hashing. 1150 00:55:34,680 --> 00:55:36,540 Now, if we're really annoying, maybe we 1151 00:55:36,540 --> 00:55:38,290 throw that in even when you don't need it. 1152 00:55:38,290 --> 00:55:40,750 But that's pretty rare. 1153 00:55:40,750 --> 00:55:43,728 So when we see expected, we should, in a problem 1154 00:55:43,728 --> 00:55:45,270 set setting like this-- in real life, 1155 00:55:45,270 --> 00:55:46,728 you never know what you should use. 1156 00:55:46,728 --> 00:55:49,110 But in our-- with your learning in this class, 1157 00:55:49,110 --> 00:55:51,082 we're going to tell you basically what 1158 00:55:51,082 --> 00:55:52,290 tricks you're allowed to use. 1159 00:55:52,290 --> 00:55:54,040 Here, you're allowed to use randomization. 1160 00:55:54,040 --> 00:55:55,590 So probably, we need it. 1161 00:55:55,590 --> 00:55:59,760 Indeed, you need it to achieve this bound. 1162 00:55:59,760 --> 00:56:01,650 Cool. 1163 00:56:01,650 --> 00:56:03,210 Hashing. 1164 00:56:03,210 --> 00:56:05,460 Not obvious how to approach this problem with hashing. 1165 00:56:05,460 --> 00:56:08,640 So I'm going to give you the way I-- 1166 00:56:08,640 --> 00:56:10,800 it's hard for me to not know this algorithm. 
1167 00:56:10,800 --> 00:56:14,400 But to me, the first thing you should think about 1168 00:56:14,400 --> 00:56:17,760 is if I have linear time and n things, 1169 00:56:17,760 --> 00:56:22,800 and I'm going to use hashing, the obvious thing to do 1170 00:56:22,800 --> 00:56:25,590 is to take those n things and put them in a hash table. 1171 00:56:25,590 --> 00:56:26,400 Build. 1172 00:56:26,400 --> 00:56:28,540 Why not? 1173 00:56:28,540 --> 00:56:38,400 So let's just build a hash table on all the keys in s. 1174 00:56:38,400 --> 00:56:39,540 That's idea one. 1175 00:56:45,510 --> 00:56:46,980 Seems like the first thing to try. 1176 00:56:46,980 --> 00:56:49,440 So what does that let me do? 1177 00:56:49,440 --> 00:56:50,640 It lets me-- 1178 00:56:50,640 --> 00:56:52,950 I just erased the interface for hash tables. 1179 00:56:52,950 --> 00:56:55,170 But I can build a sequence out of it. 1180 00:56:55,170 --> 00:56:57,630 But normally, it gives me a set interface. 1181 00:56:57,630 --> 00:56:59,880 So I can call find now in constant time. 1182 00:56:59,880 --> 00:57:03,330 It lets me, given the number, determine immediately 1183 00:57:03,330 --> 00:57:06,050 whether that number is in s. 1184 00:57:06,050 --> 00:57:07,800 Well, that sounds interesting, because I'm 1185 00:57:07,800 --> 00:57:09,990 looking for two numbers in s. 1186 00:57:09,990 --> 00:57:11,740 So it lets me find one of them. 1187 00:57:11,740 --> 00:57:13,710 So I call it twice? 1188 00:57:13,710 --> 00:57:14,880 No. 1189 00:57:14,880 --> 00:57:17,342 Calling it twice and only spending constant time 1190 00:57:17,342 --> 00:57:19,050 on this beautiful data structure will not 1191 00:57:19,050 --> 00:57:20,220 give you anything useful. 1192 00:57:23,310 --> 00:57:25,070 But we have linear time, right? 
1193 00:57:25,070 --> 00:57:26,940 So in addition to building a table, 1194 00:57:26,940 --> 00:57:29,730 we could call find on that table a linear number 1195 00:57:29,730 --> 00:57:32,610 of times, because each find only takes constant expected 1196 00:57:32,610 --> 00:57:33,910 amortized time. 1197 00:57:33,910 --> 00:57:37,410 So if I do n of them, that will take linear expected time. 1198 00:57:37,410 --> 00:57:40,850 The amortization disappears, because I'm using it n times. 1199 00:57:40,850 --> 00:57:41,725 AUDIENCE: [INAUDIBLE] 1200 00:57:41,725 --> 00:57:42,725 ERIK DEMAINE: Oh, right. 1201 00:57:42,725 --> 00:57:44,020 Find never has amortization. 1202 00:57:44,020 --> 00:57:46,740 So it doesn't disappear, because it was never there. 1203 00:57:46,740 --> 00:57:47,610 Never mind. 1204 00:57:47,610 --> 00:57:52,530 I can afford n calls, or 5n calls, to find, 1205 00:57:52,530 --> 00:57:55,290 because each one costs constant expected. 1206 00:57:55,290 --> 00:57:57,250 And the total for that will be linear time. 1207 00:57:57,250 --> 00:58:04,966 So the next idea is let's just somehow call find 1208 00:58:04,966 --> 00:58:12,280 a linear number of times, OK? 1209 00:58:12,280 --> 00:58:16,440 So I want to find two numbers summing to a given value, h. 1210 00:58:16,440 --> 00:58:19,540 That wasn't maybe clear, but h is given. 1211 00:58:19,540 --> 00:58:20,440 AUDIENCE: Sorry. 1212 00:58:20,440 --> 00:58:22,720 How long does it take to build the hash table? 1213 00:58:22,720 --> 00:58:24,130 ERIK DEMAINE: How long does it take to build a hash table? 1214 00:58:24,130 --> 00:58:26,422 It was previously on this board-- linear expected time. 1215 00:58:29,360 --> 00:58:32,421 See previous lecture. 1216 00:58:32,421 --> 00:58:35,220 No, two years ago. 1217 00:58:35,220 --> 00:58:35,860 OK. 1218 00:58:35,860 --> 00:58:37,410 Well, if we're going to do this a linear number of times, 1219 00:58:37,410 --> 00:58:39,310 I guess we should have a for loop. 
1220 00:58:39,310 --> 00:58:41,145 Let's do a for loop over the numbers. 1221 00:58:41,145 --> 00:58:44,370 That's the next idea. 1222 00:58:44,370 --> 00:58:48,247 Loop over s. 1223 00:58:48,247 --> 00:58:49,830 And at this point, we're done, almost. 1224 00:58:52,560 --> 00:58:55,330 I want space. 1225 00:58:55,330 --> 00:58:57,000 So I want to loop over the numbers. 1226 00:58:57,000 --> 00:58:58,860 And each one, I want to do a find. 1227 00:58:58,860 --> 00:59:00,900 That's kind of all I have time to do. 1228 00:59:00,900 --> 00:59:04,900 So seems like a natural thing to try. 1229 00:59:04,900 --> 00:59:06,670 This is by no means easy. 1230 00:59:06,670 --> 00:59:07,890 Don't get me wrong. 1231 00:59:07,890 --> 00:59:09,570 Having these ideas is-- 1232 00:59:09,570 --> 00:59:12,150 while I'm explaining them as the obvious ideas, 1233 00:59:12,150 --> 00:59:13,600 they're not obvious. 1234 00:59:13,600 --> 00:59:17,040 But they are easy, at least, just not 1235 00:59:17,040 --> 00:59:19,300 obvious to come up with the easy ideas. 1236 00:59:19,300 --> 00:59:22,590 So let's loop over s, somehow call find, 1237 00:59:22,590 --> 00:59:23,670 using our hash table. 1238 00:59:23,670 --> 00:59:25,860 So the order is actually, we're going to build the hash table, 1239 00:59:25,860 --> 00:59:26,500 then loop. 1240 00:59:26,500 --> 00:59:28,583 And inside the loop, we're going to call find once 1241 00:59:28,583 --> 00:59:31,000 per loop iteration. 1242 00:59:31,000 --> 00:59:32,020 So let's do it. 1243 00:59:32,020 --> 00:59:37,410 Let's say, for si in S-- 1244 00:59:40,320 --> 00:59:42,210 so I want to find two numbers. 1245 00:59:42,210 --> 00:59:46,050 Here, I have exhaustively looped over one number. 1246 00:59:46,050 --> 00:59:47,850 I just need to find the second number that 1247 00:59:47,850 --> 00:59:49,600 can possibly add up, right? 
1248 00:59:49,600 --> 00:59:57,390 I want to find whether there's an sj in S such 1249 00:59:57,390 --> 01:00:05,090 that si plus sj equals h. 1250 01:00:08,030 --> 01:00:09,530 Can I do that query with find? 1251 01:00:12,290 --> 01:00:12,790 How? 1252 01:00:16,350 --> 01:00:17,790 So what does find do? 1253 01:00:17,790 --> 01:00:22,590 Find says, if I give you a key, it will tell me whether-- like, 1254 01:00:22,590 --> 01:00:25,800 if I knew what sj was, it would tell me 1255 01:00:25,800 --> 01:00:29,346 whether it's in S. Yeah? 1256 01:00:29,346 --> 01:00:31,776 AUDIENCE: Can't you just subtract h from si 1257 01:00:31,776 --> 01:00:33,450 and then see if [INAUDIBLE]? 1258 01:00:33,450 --> 01:00:38,860 ERIK DEMAINE: Subtract h from si and see whether that exists. 1259 01:00:38,860 --> 01:00:39,720 Did I get it right? 1260 01:00:39,720 --> 01:00:40,890 AUDIENCE: h minus si. 1261 01:00:40,890 --> 01:00:42,092 ERIK DEMAINE: h minus si. 1262 01:00:42,092 --> 01:00:43,582 I always get it wrong. 1263 01:00:46,500 --> 01:00:48,330 Don't feel bad that you also got it wrong. 1264 01:00:48,330 --> 01:00:51,040 It makes me feel better, because I always get it wrong. 1265 01:00:51,040 --> 01:00:52,650 So the claim is this. 1266 01:00:52,650 --> 01:00:53,460 Why? 1267 01:00:53,460 --> 01:00:57,570 Because what we want to do is find-- 1268 01:00:57,570 --> 01:00:58,890 well, OK. 1269 01:00:58,890 --> 01:01:01,040 Let's see what it says over here. 1270 01:01:01,040 --> 01:01:05,100 So if we do h minus si equals sj-- 1271 01:01:05,100 --> 01:01:07,740 so these are equivalent statements, 1272 01:01:07,740 --> 01:01:10,320 just by moving the si over. 1273 01:01:10,320 --> 01:01:12,460 And this is a query we can do. 1274 01:01:12,460 --> 01:01:15,330 So let's remember, these are things we know. 1275 01:01:19,830 --> 01:01:21,870 And sj is something we don't know. 1276 01:01:21,870 --> 01:01:24,140 All that we know is that it's in S.
1277 01:01:24,140 --> 01:01:25,650 OK, so we know these two things. 1278 01:01:25,650 --> 01:01:27,970 So if we bring them over to the same side, 1279 01:01:27,970 --> 01:01:30,220 we're searching for an unknown thing, 1280 01:01:30,220 --> 01:01:33,030 which is equal to exactly this thing that we can compute. 1281 01:01:33,030 --> 01:01:35,190 So we just compute h minus si. 1282 01:01:35,190 --> 01:01:35,910 We call find. 1283 01:01:35,910 --> 01:01:40,230 That will tell us whether there is an sj equal to this. 1284 01:01:40,230 --> 01:01:43,590 OK, so this is like a comment. 1285 01:01:43,590 --> 01:01:45,990 And this is what we actually do. 1286 01:01:45,990 --> 01:01:49,050 And if there is a pair of numbers summing to h, 1287 01:01:49,050 --> 01:01:50,400 this will find it. 1288 01:01:50,400 --> 01:01:52,090 How much time did it take? 1289 01:01:52,090 --> 01:01:54,690 Well, we're doing n iterations of this loop. 1290 01:01:54,690 --> 01:01:57,450 Each one, we're calling a single find operation. 1291 01:01:57,450 --> 01:02:01,760 And find costs constant expected time. 1292 01:02:01,760 --> 01:02:05,700 And so the total is linear expected time. 1293 01:02:05,700 --> 01:02:07,050 Great. 1294 01:02:07,050 --> 01:02:08,136 Part A done. 1295 01:02:19,560 --> 01:02:24,330 Then they throw part b at us, make it harder. 1296 01:02:24,330 --> 01:02:25,950 Those pesky instructors. 1297 01:02:33,680 --> 01:02:37,010 So we read part b. 1298 01:02:37,010 --> 01:02:41,240 Part b says two things to make it harder. 1299 01:02:41,240 --> 01:02:45,200 So first of all, we want linear worst-case time. 1300 01:02:50,290 --> 01:02:54,360 And furthermore-- so we can't use hashing anymore. 1301 01:02:54,360 --> 01:02:57,270 Furthermore-- so here, we just needed 1302 01:02:57,270 --> 01:02:59,460 to solve the exact problem to find whether the two 1303 01:02:59,460 --> 01:03:01,470 numbers sum exactly to h. 
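The part (a) algorithm above (build a hash table on S, then loop over S and do one find for h minus si per iteration) can be sketched in Python. This is a minimal sketch, not the official solution: a `collections.Counter` (a dict, so a hash table) stands in for the hash table, and it assumes the two numbers must come from two different entries of S, which is why it tracks multiplicities instead of mere membership.

```python
from collections import Counter


def pair_summing_to(s, h):
    """Return (si, sj) with si + sj == h using two entries of s, or None.

    Counter is a hash table: building it is expected O(n), and each
    lookup is expected O(1), so the whole loop is expected O(n).
    Assumes the two numbers must be different entries of s.
    """
    counts = Counter(s)             # build: value -> multiplicity
    for si in s:                    # n iterations
        sj = h - si                 # the only partner that can work for si
        needed = 2 if sj == si else 1  # pairing si with itself needs a second copy
        if counts[sj] >= needed:    # one find per iteration
            return (si, sj)
    return None


print(pair_summing_to([1, 5, 3, 9], 8))  # (5, 3)
```

Looking up a missing key in a `Counter` returns 0 without inserting it, so the table is not modified during the loop.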
1304 01:03:01,470 --> 01:03:04,350 Now we would like to find the best solution smaller than 1305 01:03:04,350 --> 01:03:05,940 or equal to h. 1306 01:03:05,940 --> 01:03:17,730 So find the biggest pairwise sum that's 1307 01:03:17,730 --> 01:03:22,890 less than or equal to h if there's no perfect pair. 1308 01:03:22,890 --> 01:03:28,620 But we're given a little bit of extra information, which 1309 01:03:28,620 --> 01:03:37,122 is, we can assume h equals 600 n to the 6th. 1310 01:03:37,122 --> 01:03:40,620 That's a weird polynomial. 1311 01:03:40,620 --> 01:03:43,380 Took me a while to even notice that that was a joke in here-- 1312 01:03:43,380 --> 01:03:46,950 6006, hiding in a polynomial. 1313 01:03:46,950 --> 01:03:49,470 All right, so polynomial. 1314 01:03:49,470 --> 01:03:51,060 Hm. 1315 01:03:51,060 --> 01:03:52,680 That should make you think radix sort. 1316 01:03:52,680 --> 01:03:54,160 It is radix sort week. 1317 01:03:54,160 --> 01:03:56,160 So that is a natural thing to try. 1318 01:03:56,160 --> 01:03:58,530 But in general, even later in the semester, 1319 01:03:58,530 --> 01:04:02,580 when you see a nice polynomial with a fixed 1320 01:04:02,580 --> 01:04:04,178 constant like this, and it's somehow 1321 01:04:04,178 --> 01:04:05,970 related to the integers we're dealing with, 1322 01:04:05,970 --> 01:04:07,137 you should think radix sort. 1323 01:04:07,137 --> 01:04:09,730 Especially because now we want linear worst-case time, 1324 01:04:09,730 --> 01:04:11,820 radix sort seems like a good thing to do. 1325 01:04:11,820 --> 01:04:13,380 Don't know what to do with it yet. 1326 01:04:13,380 --> 01:04:15,810 In fact, I can't even apply radix sort. 1327 01:04:15,810 --> 01:04:19,170 But idea one is radix sort. 1328 01:04:19,170 --> 01:04:21,000 Just because I see that polynomial, 1329 01:04:21,000 --> 01:04:24,600 I think maybe I should try it.
1330 01:04:24,600 --> 01:04:27,690 Now, there's a problem here, because we're given some 1331 01:04:27,690 --> 01:04:31,080 numbers, some integers, si's. 1332 01:04:31,080 --> 01:04:32,010 We're also given h. 1333 01:04:32,010 --> 01:04:34,510 We're told now that it is a nice, small polynomial. 1334 01:04:34,510 --> 01:04:37,710 But we have no idea how big these numbers are. 1335 01:04:37,710 --> 01:04:39,540 So the problem with this idea is that 1336 01:04:43,180 --> 01:04:46,260 si could be bigger than h. 1337 01:04:46,260 --> 01:04:48,420 We have no idea how big the si's are. 1338 01:04:51,140 --> 01:04:53,510 What can I say about si's that are 1339 01:04:53,510 --> 01:04:57,400 bigger than h for this problem? 1340 01:05:00,100 --> 01:05:02,360 Summing to h. 1341 01:05:02,360 --> 01:05:02,860 Oh. 1342 01:05:02,860 --> 01:05:05,193 I didn't say it, but all these numbers are non-negative. 1343 01:05:05,193 --> 01:05:06,490 That's important. 1344 01:05:06,490 --> 01:05:09,160 That looks like [INAUDIBLE]. 1345 01:05:09,160 --> 01:05:11,581 Greater than or equal to 0. 1346 01:05:16,580 --> 01:05:17,080 Yeah? 1347 01:05:17,080 --> 01:05:18,640 AUDIENCE: [INAUDIBLE] solution [INAUDIBLE]. 1348 01:05:18,640 --> 01:05:19,420 ERIK DEMAINE: Right. 1349 01:05:19,420 --> 01:05:21,587 If I'm finding a sum that's less than or equal to h, 1350 01:05:21,587 --> 01:05:22,810 and they're non-negative, 1351 01:05:22,810 --> 01:05:28,540 then any number that's greater than h, I can just throw away. 1352 01:05:28,540 --> 01:05:29,920 They'll never be in a solution. 1353 01:05:29,920 --> 01:05:32,670 Such a number on its own is already bigger than h. 1354 01:05:32,670 --> 01:05:35,175 Adding a second, non-negative number only makes the sum bigger. 1355 01:05:38,090 --> 01:05:41,830 So idea number two is let's just throw out all the big si's, 1356 01:05:41,830 --> 01:05:43,180 anything bigger than h.
1357 01:05:43,180 --> 01:05:46,090 Now, that won't change the answer, because those can never 1358 01:05:46,090 --> 01:05:47,320 be in a solution. 1359 01:05:47,320 --> 01:05:49,120 And now I have all the si's having 1360 01:05:49,120 --> 01:05:51,700 the property that they're less than or equal to h. 1361 01:05:51,700 --> 01:05:56,150 And so they are small, bounded by a fixed polynomial. 1362 01:05:56,150 --> 01:05:57,600 And now I can apply radix sort. 1363 01:05:57,600 --> 01:06:01,370 So after this idea, I can apply this idea. 1364 01:06:01,370 --> 01:06:02,830 OK, this gives you a flavor of how 1365 01:06:02,830 --> 01:06:04,720 I like to think about problems. 1366 01:06:04,720 --> 01:06:06,820 I see clues, like a polynomial. 1367 01:06:06,820 --> 01:06:08,860 I think radix sort, but it doesn't work directly. 1368 01:06:08,860 --> 01:06:12,655 But with some more ideas, I can get it to work. 1369 01:06:12,655 --> 01:06:14,370 OK. 1370 01:06:14,370 --> 01:06:16,890 What goes with the-- so now I've sorted the si's. 1371 01:06:16,890 --> 01:06:18,380 OK, great. 1372 01:06:18,380 --> 01:06:19,550 S is sorted. 1373 01:06:23,650 --> 01:06:27,250 I guess we can try to do the same algorithm, 1374 01:06:27,250 --> 01:06:29,540 except I don't have a hash table anymore. 1375 01:06:29,540 --> 01:06:34,820 So let's just try doing a for loop over S. Why not? 1376 01:06:34,820 --> 01:06:40,270 So let's do for si in S. But now it's sorted. 1377 01:06:40,270 --> 01:06:42,920 So presumably, I should exploit the sorted order. 1378 01:06:42,920 --> 01:06:45,860 So let's do them in order. 1379 01:06:45,860 --> 01:06:49,870 So i equals 0, 1, up to n minus 1. 1380 01:06:49,870 --> 01:06:52,030 Let's say that s0 is the smallest. 1381 01:06:52,030 --> 01:06:53,770 s1 is the next smallest. 1382 01:06:53,770 --> 01:06:56,260 sn minus 1 is the biggest. 1383 01:06:56,260 --> 01:06:58,120 So I want to do something with-- 1384 01:06:58,120 --> 01:06:59,890 so I have si.
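The two ideas for part (b) so far (throw away every value above h, then radix sort the survivors) can be sketched in Python. This is a minimal sketch under the stated assumptions: all values are non-negative integers and h = 600 n⁶. The function names are hypothetical. With base n, every survivor has at most about 6 + log_n 600 + 1 digits, a constant, so the sort makes O(1) passes of O(n) work each.

```python
def radix_sort(nums, base):
    """Stable LSD radix sort of non-negative integers in the given base."""
    place = 1
    biggest = max(nums, default=0)
    while place <= biggest:
        # One stable bucketing pass on the current digit.
        buckets = [[] for _ in range(base)]
        for x in nums:
            buckets[(x // place) % base].append(x)
        nums = [x for bucket in buckets for x in bucket]
        place *= base
    return nums


def filter_and_sort(s, h, n):
    """Drop values above h, then radix sort the survivors in base n.

    Assumes every value is non-negative, so a value above h can never be
    part of a pair whose sum is at most h. With h = 600 * n**6, each
    survivor has O(1) base-n digits, so the whole sort is O(n).
    """
    small = [x for x in s if x <= h]
    return radix_sort(small, max(n, 2))


n = 3
print(filter_and_sort([10**9, 5, 600 * n**6, 0, 7, 600 * n**6 + 1], 600 * n**6, n))
# [0, 5, 7, 437400]
```

The filter is what makes radix sort applicable: without it, a single huge si could force arbitrarily many passes.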
1385 01:06:59,890 --> 01:07:05,200 And I want to figure out whether h minus si is in there. 1386 01:07:08,680 --> 01:07:12,040 Hard to do that better than-- 1387 01:07:12,040 --> 01:07:15,730 actually, I could do this with binary search. 1388 01:07:15,730 --> 01:07:18,730 I'm looking for this value. 1389 01:07:18,730 --> 01:07:20,260 And I have a sorted array now. 1390 01:07:20,260 --> 01:07:22,960 So I could binary search for h minus si. 1391 01:07:22,960 --> 01:07:26,600 And in log n time, I will find whether that guy is in there. 1392 01:07:26,600 --> 01:07:27,820 And if not, keep looping. 1393 01:07:27,820 --> 01:07:29,960 I can keep track of the best thing that I found. 1394 01:07:29,960 --> 01:07:32,430 And so in n log n time, I can definitely solve this. 1395 01:07:32,430 --> 01:07:35,020 But I'd like to get linear time. 1396 01:07:35,020 --> 01:07:36,070 Do you have a question? 1397 01:07:36,070 --> 01:07:37,270 AUDIENCE: Well, I'm just wondering, 1398 01:07:37,270 --> 01:07:38,512 how would you [INAUDIBLE]? 1399 01:07:38,512 --> 01:07:40,622 Like, why would you [INAUDIBLE] whether that 1400 01:07:40,622 --> 01:07:41,728 is in there [INAUDIBLE]? 1401 01:07:41,728 --> 01:07:43,270 ERIK DEMAINE: I'm not looking for si. 1402 01:07:43,270 --> 01:07:45,910 I'm going to compute h minus si. 1403 01:07:45,910 --> 01:07:48,550 So this is-- maybe I shouldn't even write this down, but-- 1404 01:07:48,550 --> 01:07:52,090 AUDIENCE: She's asking about the [INAUDIBLE] constraint of, 1405 01:07:52,090 --> 01:07:53,290 we're not looking for h. 1406 01:07:53,290 --> 01:07:55,990 We're looking for something smaller than h. 1407 01:07:55,990 --> 01:07:57,020 ERIK DEMAINE: This one? 1408 01:07:57,020 --> 01:07:57,520 Or-- 1409 01:07:57,520 --> 01:07:58,690 AUDIENCE: Something larger. 1410 01:07:58,690 --> 01:07:59,950 ERIK DEMAINE: Oh, this thing. 1411 01:07:59,950 --> 01:08:01,010 AUDIENCE: A large thing less than h. 1412 01:08:01,010 --> 01:08:01,843 ERIK DEMAINE: Right. 
1413 01:08:01,843 --> 01:08:06,100 So in particular, if there are two items that sum to h, 1414 01:08:06,100 --> 01:08:08,920 I want to find them. 1415 01:08:08,920 --> 01:08:09,970 So let's start with that. 1416 01:08:09,970 --> 01:08:14,710 So I'm binary searching for h minus si in S. 1417 01:08:14,710 --> 01:08:16,330 So I could certainly do that. 1418 01:08:16,330 --> 01:08:19,737 And if I find it, great. 1419 01:08:19,737 --> 01:08:21,279 I found a pair that sums to exactly h. 1420 01:08:21,279 --> 01:08:24,670 If I don't find it, binary search tells me not only 1421 01:08:24,670 --> 01:08:26,229 that it's not there, but it tells me 1422 01:08:26,229 --> 01:08:28,850 what the previous and next value are. 1423 01:08:28,850 --> 01:08:30,520 So even though h minus si isn't there, 1424 01:08:30,520 --> 01:08:33,430 I can get the next largest thing and the next smallest thing. 1425 01:08:33,430 --> 01:08:35,319 What I want is the next smallest thing. 1426 01:08:35,319 --> 01:08:39,970 And that gives the largest sum I can get using si. 1427 01:08:39,970 --> 01:08:43,499 And so then that's one candidate for a sum less than 1428 01:08:43,499 --> 01:08:44,498 or equal to h. 1429 01:08:44,498 --> 01:08:45,790 I want to find the largest one. 1430 01:08:45,790 --> 01:08:46,899 So I do a for loop. 1431 01:08:46,899 --> 01:08:48,010 I always keep track of-- 1432 01:08:48,010 --> 01:08:49,930 I keep a list of all the candidates I got. 1433 01:08:49,930 --> 01:08:53,529 Each time I do an iteration of this loop, I get one candidate. 1434 01:08:53,529 --> 01:08:56,229 Then I take the largest one. 1435 01:08:56,229 --> 01:08:59,593 OK, so return the largest candidate. 1436 01:09:06,859 --> 01:09:12,910 So this gives me a candidate, just the previous item. 1437 01:09:12,910 --> 01:09:15,700 This is what we called find previous, or find prev, 1438 01:09:15,700 --> 01:09:18,220 probably, in our set interface.
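The O(n log n) binary-search version can be sketched in Python. The standard-library `bisect_right` acts as "find prev" here: it locates the largest value that is at most h minus si. The sketch assumes S is already sorted, non-negative, and filtered so that every value is at most h. It also assumes that the two chosen items must sit at different positions. Instead of storing a list of candidates, it keeps a running maximum. The function name is illustrative.

```python
from bisect import bisect_right


def best_pair_sum_binary_search(s, h):
    """Largest s[i] + s[j] <= h over positions i != j, or None if no pair fits.

    s must be sorted. Each iteration runs one 'find prev' query in
    O(log n) time, so the whole loop takes O(n log n).
    """
    best = None
    for i in range(len(s)):
        target = h - s[i]
        j = bisect_right(s, target) - 1  # index of the largest value <= target
        if j == i:                       # an item may not pair with itself
            j -= 1                       # s[i-1] is the next-largest choice
        if j >= 0:
            cand = s[i] + s[j]
            if best is None or cand > best:
                best = cand
    return best
```

Keeping only the running maximum gives the same answer as collecting every candidate and taking the max at the end.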
1439 01:09:18,220 --> 01:09:21,350 And if you have a sorted set, you can do that in log n time. 1440 01:09:21,350 --> 01:09:27,382 So this is an n log n solution, because we do n iterations 1441 01:09:27,382 --> 01:09:28,090 through the loop. 1442 01:09:28,090 --> 01:09:29,920 Each binary search takes log n. 1443 01:09:29,920 --> 01:09:32,510 I want to get linear. 1444 01:09:32,510 --> 01:09:35,455 This is not obvious. 1445 01:09:40,689 --> 01:09:43,600 The best intuition I can think of for this next idea 1446 01:09:43,600 --> 01:09:49,750 is, well, I start with the very smallest item in S. 1447 01:09:49,750 --> 01:09:53,029 And I want to sum up to something that's kind of big, 1448 01:09:53,029 --> 01:09:53,529 right? 1449 01:09:53,529 --> 01:09:56,920 I threw away all the items bigger than h. 1450 01:09:56,920 --> 01:10:00,730 If s0 is tiny, like close to 0, 1451 01:10:00,730 --> 01:10:02,680 because it's the smallest one, then maybe I 1452 01:10:02,680 --> 01:10:05,380 should look at the end of the array, 1453 01:10:05,380 --> 01:10:07,150 because I want to compare, or I want 1454 01:10:07,150 --> 01:10:08,670 to add the smallest thing probably 1455 01:10:08,670 --> 01:10:09,820 with the biggest thing. 1456 01:10:09,820 --> 01:10:12,940 That's as close as I can imagine. 1457 01:10:12,940 --> 01:10:20,530 So then-- so here's my sorted S. Here's the smallest item, and here's the biggest 1458 01:10:20,530 --> 01:10:21,320 item. 1459 01:10:21,320 --> 01:10:25,150 So I'm going to loop over these items one by one. 1460 01:10:25,150 --> 01:10:27,850 So let's start by comparing the first one with the last one. 1461 01:10:31,690 --> 01:10:34,630 The two-finger algorithm, OK? 1462 01:10:41,950 --> 01:10:43,210 This is the big idea. 1463 01:10:43,210 --> 01:10:45,400 You're doing it all the time in this class. 1464 01:10:45,400 --> 01:10:46,150 It's super useful. 1465 01:10:46,150 --> 01:10:48,880 We saw it in merge sort, for example, when merging two lists.
1466 01:10:48,880 --> 01:10:51,190 We have fingers in two lists that advance. 1467 01:10:51,190 --> 01:10:54,582 And because they only advance, it takes linear total time. 1468 01:10:54,582 --> 01:10:57,040 So we're going to do this kind of folded in backwards here. 1469 01:10:57,040 --> 01:10:58,550 We're going to start here. 1470 01:10:58,550 --> 01:11:00,700 This seems like a good candidate to start with. 1471 01:11:00,700 --> 01:11:03,400 Now, what else could this add with? 1472 01:11:03,400 --> 01:11:05,650 Well, maybe smaller items. 1473 01:11:05,650 --> 01:11:07,930 And maybe I have to go all the way through here. 1474 01:11:07,930 --> 01:11:10,345 And then I've got to advance my left finger. 1475 01:11:10,345 --> 01:11:11,080 Yeah, OK. 1476 01:11:11,080 --> 01:11:15,020 So here's the idea. 1477 01:11:15,020 --> 01:11:19,240 So let's look at-- 1478 01:11:19,240 --> 01:11:22,650 so I'm going to call this finger i and this finger j. 1479 01:11:22,650 --> 01:11:25,070 So we want to sum two things. 1480 01:11:25,070 --> 01:11:27,040 So I guess one other inspiration here is, 1481 01:11:27,040 --> 01:11:28,390 we want to add two things up. 1482 01:11:28,390 --> 01:11:31,170 And we have one algorithm that has the word "two" in it. 1483 01:11:31,170 --> 01:11:33,080 And it's the two-finger algorithm. 1484 01:11:33,080 --> 01:11:34,978 So let's try that. 1485 01:11:34,978 --> 01:11:36,520 So we're going to start at i equals 0 1486 01:11:36,520 --> 01:11:38,050 and j equals n minus 1. 1487 01:11:38,050 --> 01:11:44,770 We're going to look at si plus sj and see, how good is it? 1488 01:11:44,770 --> 01:11:46,900 How close to summing to h is it? 1489 01:11:46,900 --> 01:11:51,400 Well, in particular, it's either less than or equal to h 1490 01:11:51,400 --> 01:11:52,330 or bigger than h. 1491 01:11:57,940 --> 01:11:59,980 If it's bigger than h-- 1492 01:11:59,980 --> 01:12:01,300 so this sum is too big. 
1493 01:12:01,300 --> 01:12:03,520 I can't even use it as a candidate. 1494 01:12:03,520 --> 01:12:06,640 Well, that means I really don't need this guy, right? 1495 01:12:06,640 --> 01:12:08,020 It's too big overall. 1496 01:12:08,020 --> 01:12:10,450 I'm adding the smallest item to this item. 1497 01:12:10,450 --> 01:12:11,362 And it's too big. 1498 01:12:11,362 --> 01:12:12,820 Well, then I should go to the left. 1499 01:12:12,820 --> 01:12:14,960 I should move my right finger to the left. 1500 01:12:14,960 --> 01:12:19,750 So in this case, we decrement j. 1501 01:12:19,750 --> 01:12:22,520 Move the right finger to the left. 1502 01:12:22,520 --> 01:12:26,760 So I'm guessing, in the other case, I increment i. 1503 01:12:26,760 --> 01:12:28,660 Why? 1504 01:12:28,660 --> 01:12:32,395 If I add these two items up, and the sum is not too big, 1505 01:12:32,395 --> 01:12:36,370 it's at most h, then this left item was probably too small. 1506 01:12:36,370 --> 01:12:38,215 It might actually-- it's an OK solution. 1507 01:12:38,215 --> 01:12:39,423 It's less than or equal to h. 1508 01:12:39,423 --> 01:12:41,965 So I should keep it as a candidate. 1509 01:12:41,965 --> 01:12:47,800 Let's say add candidate. 1510 01:12:47,800 --> 01:12:51,220 So I'm just going to keep a list of candidates that I see. 1511 01:12:51,220 --> 01:12:52,900 So this is a possible solution. 1512 01:12:52,900 --> 01:12:54,430 It might not be the best one. 1513 01:12:54,430 --> 01:12:55,810 But it's one to add to my list. 1514 01:12:55,810 --> 01:13:00,400 And then I'm going to increase i and now work on this sub-array, 1515 01:13:00,400 --> 01:13:02,510 because that will be a little bit bigger. 1516 01:13:02,510 --> 01:13:04,160 I can't move j to the right to make it bigger, 1517 01:13:04,160 --> 01:13:05,860 because j is at the last item. 1518 01:13:05,860 --> 01:13:07,690 And it's not obvious that this works. 1519 01:13:07,690 --> 01:13:15,920 I think there's a nice invariant that will help somewhere.
1520 01:13:15,920 --> 01:13:17,691 Where'd I put my piece of paper? 1521 01:13:22,430 --> 01:13:22,930 Yeah. 1522 01:13:26,325 --> 01:13:28,750 So here's an invariant. 1523 01:13:37,076 --> 01:13:38,010 Oh, yes. 1524 01:14:00,672 --> 01:14:02,380 It's really clear this is the right thing 1525 01:14:02,380 --> 01:14:03,460 to do in the first step. 1526 01:14:03,460 --> 01:14:06,340 And the tricky part is to argue that it works in all steps, 1527 01:14:06,340 --> 01:14:09,243 because when I really have the smallest item and the largest 1528 01:14:09,243 --> 01:14:11,535 item, it's clear that I should advance one or the other 1529 01:14:11,535 --> 01:14:13,060 if I'm too small or too big. 1530 01:14:13,060 --> 01:14:15,520 But the way to prove it in general by induction 1531 01:14:15,520 --> 01:14:17,290 is to show this invariant that-- 1532 01:14:17,290 --> 01:14:19,870 so at some point through this execution, 1533 01:14:19,870 --> 01:14:21,070 i and j are somewhere. 1534 01:14:21,070 --> 01:14:23,830 And I want to say that if I take any j from the right-- 1535 01:14:23,830 --> 01:14:26,860 any j prime to the right of j and any i prime to the left 1536 01:14:26,860 --> 01:14:30,310 of i, unstrictly, then all of those pairs, 1537 01:14:30,310 --> 01:14:33,010 all those pairwise sums, are either too big-- 1538 01:14:33,010 --> 01:14:34,750 and that's when we decrease j-- 1539 01:14:34,750 --> 01:14:37,450 or they're less than or equal to the largest candidate 1540 01:14:37,450 --> 01:14:39,490 that we've seen so far. 1541 01:14:39,490 --> 01:14:42,140 That's because we added these candidates in there. 1542 01:14:42,140 --> 01:14:45,280 So that invariant will hold by induction, 1543 01:14:45,280 --> 01:14:47,950 because whenever there's a possible thing that's good, 1544 01:14:47,950 --> 01:14:49,577 I add it to my candidate list. 
1545 01:14:49,577 --> 01:14:51,160 And then, at the end of the algorithm, 1546 01:14:51,160 --> 01:14:54,250 I just loop through my candidate list, compute the max, 1547 01:14:54,250 --> 01:14:55,090 return that pair. 1548 01:14:58,390 --> 01:15:01,600 OK, so that is the two-finger algorithm, 1549 01:15:01,600 --> 01:15:04,360 which solves the non-exact problem 1550 01:15:04,360 --> 01:15:06,070 in linear worst-case time. 1551 01:15:06,070 --> 01:15:06,570 Yeah? 1552 01:15:06,570 --> 01:15:08,170 AUDIENCE: i cannot equal j, right? 1553 01:15:08,170 --> 01:15:09,040 ERIK DEMAINE: Oh, i cannot-- right. 1554 01:15:09,040 --> 01:15:10,900 So what are the termination conditions? 1555 01:15:10,900 --> 01:15:14,740 When i equals j, that's probably when you want to stop. 1556 01:15:14,740 --> 01:15:15,350 It depends. 1557 01:15:15,350 --> 01:15:19,540 You could say, if i is greater than j, stop. 1558 01:15:19,540 --> 01:15:21,230 Return the max candidate. 1559 01:15:24,005 --> 01:15:25,880 There are two ways to interpret this problem. 1560 01:15:25,880 --> 01:15:29,290 One is that the two values you choose in S 1561 01:15:29,290 --> 01:15:31,840 need to be different values, or you 1562 01:15:31,840 --> 01:15:35,080 allow them to be the same value, like they can both be h over 2. 1563 01:15:35,080 --> 01:15:36,730 And either way is easy to solve. 1564 01:15:36,730 --> 01:15:39,490 If you want to allow h over 2, then I 1565 01:15:39,490 --> 01:15:40,660 would put greater than here. 1566 01:15:40,660 --> 01:15:44,470 If you don't want to allow h over 2, 1567 01:15:44,470 --> 01:15:46,990 then I would put greater than or equal to-- 1568 01:15:46,990 --> 01:15:47,590 either way. 1569 01:15:47,590 --> 01:15:49,870 Both of these problems, you can solve either way. 1570 01:15:49,870 --> 01:15:54,820 Both algorithms can handle both situations. 1571 01:15:54,820 --> 01:15:57,790 OK, one more problem. 1572 01:16:00,950 --> 01:16:01,640 All right.
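The two-finger algorithm, with both termination conditions, can be sketched in Python. The sketch assumes S is sorted, non-negative, and already filtered to values at most h. The `allow_same` flag is a hypothetical parameter that selects between the two interpretations: with it, the loop stops once i is greater than j; without it, the loop stops once i is greater than or equal to j. A running maximum stands in for the candidate list.

```python
def best_pair_sum_two_finger(s, h, allow_same=False):
    """Largest s[i] + s[j] <= h using two fingers, or None if no pair fits.

    s must be sorted. Each step moves one finger inward, so the loop
    makes at most n steps and the worst-case time is O(n).
    allow_same=True lets one item pair with itself (e.g. h/2 + h/2).
    """
    i, j = 0, len(s) - 1
    best = None
    while i < j or (allow_same and i == j):
        total = s[i] + s[j]
        if total > h:
            j -= 1          # s[j] is too big even with the smallest remaining s[i]
        else:
            if best is None or total > best:
                best = total  # candidate: a sum that is at most h
            i += 1          # s[j] is the best partner left for s[i]
    return best
```

For example, with S = [2, 5, 9] and h = 10, the distinct-items reading gives 7 (2 + 5). Allowing the same item twice gives 10 (5 + 5).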
1573 01:16:01,640 --> 01:16:04,100 Yeah, I'm all out of time. 1574 01:16:04,100 --> 01:16:06,740 But I'm getting faster and faster. 1575 01:16:06,740 --> 01:16:08,180 Of course, on the hardest problem, 1576 01:16:08,180 --> 01:16:09,590 I can do it the fastest. 1577 01:16:09,590 --> 01:16:14,180 All right, so Meff Ja-- 1578 01:16:14,180 --> 01:16:16,880 this is a reference to Jeff Ma of the MIT Blackjack 1579 01:16:16,880 --> 01:16:19,910 Team, who I got to speak here at LSC a bunch of years ago. 1580 01:16:19,910 --> 01:16:22,880 But he's featured in the movie 21 and so on-- 1581 01:16:22,880 --> 01:16:25,910 fictionalized. 1582 01:16:25,910 --> 01:16:27,860 So I was playing this game. 1583 01:16:27,860 --> 01:16:29,270 It's a great setup. 1584 01:16:29,270 --> 01:16:31,220 You should definitely read this problem-- 1585 01:16:31,220 --> 01:16:35,120 Po-k-er. 1586 01:16:35,120 --> 01:16:38,930 And he has a deck of cards, where each card has 1587 01:16:38,930 --> 01:16:40,400 a letter of the alphabet on it. 1588 01:16:40,400 --> 01:16:42,510 I guess this is the right way up. 1589 01:16:42,510 --> 01:16:44,930 So I, of course, have such a deck. 1590 01:16:44,930 --> 01:16:46,040 Doesn't everyone? 1591 01:16:46,040 --> 01:16:47,000 You can buy these. 1592 01:16:47,000 --> 01:16:49,370 I have several, actually. 1593 01:16:49,370 --> 01:16:50,990 And so we can do a quick magic trick, 1594 01:16:50,990 --> 01:16:54,272 like pick a card, any card-- here, pick a card. 1595 01:16:54,272 --> 01:16:57,710 [INAUDIBLE] OK, good choice. 1596 01:16:57,710 --> 01:16:59,960 I can't force, so it doesn't really matter. 1597 01:16:59,960 --> 01:17:04,160 OK, and so this is your card, right? 1598 01:17:04,160 --> 01:17:07,680 And your card is an s, right? 1599 01:17:07,680 --> 01:17:09,560 OK, good. 1600 01:17:09,560 --> 01:17:11,180 No, not all the cards are s's. 1601 01:17:11,180 --> 01:17:14,923 [LAUGHTER] 1602 01:17:14,923 --> 01:17:16,340 But he has mirrors in his glasses.
1603 01:17:16,340 --> 01:17:17,404 No. 1604 01:17:17,404 --> 01:17:20,630 I can reveal later how that's done. 1605 01:17:20,630 --> 01:17:22,350 OK, so a deck of cards. 1606 01:17:22,350 --> 01:17:25,730 Each card has one of 26 possible letters on it. 1607 01:17:25,730 --> 01:17:29,090 And there's this weird dealing process. 1608 01:17:29,090 --> 01:17:30,590 Even just defining this problem is 1609 01:17:30,590 --> 01:17:31,370 going to take a little while. 1610 01:17:31,370 --> 01:17:32,578 Oh, here's my piece of paper. 1611 01:17:35,930 --> 01:17:38,000 So we have this dealing process. 1612 01:17:38,000 --> 01:17:40,310 Here's an example that's in the problem, abcdbc. 1613 01:17:45,020 --> 01:17:47,330 So you know the order of the cards. 1614 01:17:47,330 --> 01:17:48,350 This is the top card. 1615 01:17:48,350 --> 01:17:49,940 This is the bottom card. 1616 01:17:49,940 --> 01:17:53,450 And now, randomly, you do a cut. 1617 01:17:53,450 --> 01:17:56,910 A cut is this. 1618 01:17:56,910 --> 01:18:00,140 So I take some chunk off the top, move it to the bottom, 1619 01:18:00,140 --> 01:18:02,030 once, randomly. 1620 01:18:02,030 --> 01:18:06,080 So for example, I could take this cut. 1621 01:18:06,080 --> 01:18:13,190 And then what I would get is cdbc for this part that's 1622 01:18:13,190 --> 01:18:18,840 copied here, and ab as the-- 1623 01:18:18,840 --> 01:18:23,490 so this is-- so the first thing we do is cut at i. 1624 01:18:23,490 --> 01:18:24,780 This is position i. 1625 01:18:24,780 --> 01:18:28,860 In this example, i equals 2. 1626 01:18:28,860 --> 01:18:33,610 OK, then we deal the top k cards. 1627 01:18:33,610 --> 01:18:38,610 So let's say we deal the top four cards, k equals 4. 1628 01:18:38,610 --> 01:18:42,960 So this is deal k. 1629 01:18:42,960 --> 01:18:45,900 So we get cdbc, in that order.
1630 01:18:45,900 --> 01:18:48,270 But the order doesn't matter, because the last operation 1631 01:18:48,270 --> 01:18:53,882 we do in the problem is sort them, which is bccd, 1632 01:18:53,882 --> 01:18:55,590 like you do when you get a hand of cards. 1633 01:18:55,590 --> 01:18:57,420 You tend to sort them. 1634 01:18:57,420 --> 01:18:58,860 OK, so this is a process. 1635 01:18:58,860 --> 01:19:03,330 Given a deck-- so the deck here is fixed. 1636 01:19:03,330 --> 01:19:10,200 We call this process, I think, P of D comma i comma k. 1637 01:19:10,200 --> 01:19:11,320 We're told what D is. 1638 01:19:11,320 --> 01:19:12,720 We're told what k is. 1639 01:19:12,720 --> 01:19:15,210 i is chosen randomly. 1640 01:19:15,210 --> 01:19:18,090 And we'd like to know what happens with different i's. 1641 01:19:21,160 --> 01:19:24,265 So if you stare at this problem enough, it begins to simplify. 1642 01:19:24,265 --> 01:19:26,382 So this is a complicated setup. 1643 01:19:26,382 --> 01:19:28,840 But what's really going on is we're starting at position i. 1644 01:19:28,840 --> 01:19:32,830 And we're taking the next k cards from there cyclically. 1645 01:19:32,830 --> 01:19:34,270 So here, we just took those four. 1646 01:19:34,270 --> 01:19:38,410 If i equaled 3, we would deal d, then b, then c, then a. 1647 01:19:38,410 --> 01:19:40,290 But then we sort them. 1648 01:19:40,290 --> 01:19:43,240 OK, so we're getting different substrings of length k, 1649 01:19:43,240 --> 01:19:44,350 cyclically. 1650 01:19:44,350 --> 01:19:46,710 But then we're sorting those letters. 1651 01:19:46,710 --> 01:19:48,460 Sorting is really crucial for this problem 1652 01:19:48,460 --> 01:19:50,890 to at all be feasible. 1653 01:19:50,890 --> 01:19:53,780 It took me a while even to see how to solve this problem. 
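Before optimizing anything, it helps to write down the hand P(D, i, k) exactly as defined. Here is a naive Python sketch. The function name `hand` is illustrative, and it assumes 0 <= i < len(D) and k <= len(D). Cutting at i and then dealing the top k cards is the same as reading k cards cyclically starting at position i.

```python
def hand(deck, i, k):
    """P(D, i, k) computed directly: cut at i, deal the top k, sort.

    Assumes 0 <= i < len(deck) and k <= len(deck). Reading k cards
    cyclically from position i is equivalent to cutting and dealing.
    This version costs O(k log k) per call.
    """
    n = len(deck)
    dealt = [deck[(i + t) % n] for t in range(k)]  # cyclic window from i
    return "".join(sorted(dealt))                  # the hand is sorted
```

On the example deck abcdbc with k = 4, starting at i = 2 gives bccd, and i = 3 deals d, b, c, a, which sorts to abcd.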
1654 01:19:53,780 --> 01:19:56,950 But the key is sorting, that they get sorted, 1655 01:19:56,950 --> 01:19:59,350 because that means-- 1656 01:19:59,350 --> 01:20:01,630 because we sort, it doesn't matter whether you 1657 01:20:01,630 --> 01:20:08,050 have aaba, or baaa, or abaa. 1658 01:20:08,050 --> 01:20:10,490 These are all the same. 1659 01:20:10,490 --> 01:20:12,550 If you take these cards dealt, you 1660 01:20:12,550 --> 01:20:15,130 sort them to the same thing, which is 1661 01:20:15,130 --> 01:20:17,650 the one I didn't write, aaab. 1662 01:20:17,650 --> 01:20:20,150 All of these get sorted to the same thing. 1663 01:20:20,150 --> 01:20:23,970 So we lose some information when we sort: we lose the order. 1664 01:20:23,970 --> 01:20:25,680 The first question to get you thinking 1665 01:20:25,680 --> 01:20:29,970 in this direction, part a, says, build a data structure 1666 01:20:29,970 --> 01:20:36,600 given D and k that lets me know, given two indices, i and j, 1667 01:20:36,600 --> 01:20:39,670 do I end up with the exact same hand? 1668 01:20:39,670 --> 01:20:43,080 This thing is called a hand. 1669 01:20:43,080 --> 01:20:46,770 And it's exactly this P(D, i, k). 1670 01:20:46,770 --> 01:20:50,610 So I want to compute P(D, i, k) and P(D, j, k). 1671 01:20:50,610 --> 01:20:51,480 And I want to know-- 1672 01:20:51,480 --> 01:20:52,260 JK. 1673 01:20:52,260 --> 01:20:54,480 And I want to know whether those two things are 1674 01:20:54,480 --> 01:20:57,340 equal in constant time. 1675 01:20:57,340 --> 01:20:58,380 That's what this says-- 1676 01:20:58,380 --> 01:21:00,240 constant time. 1677 01:21:00,240 --> 01:21:02,835 Doesn't say worst case, but worst case is possible. 1678 01:21:06,577 --> 01:21:08,160 And that sounds hard, because, I mean, 1679 01:21:08,160 --> 01:21:10,560 there are k symbols for one of them, another k symbols 1680 01:21:10,560 --> 01:21:11,312 for the other guy. 1681 01:21:11,312 --> 01:21:13,020 But we don't have to compare the symbols.
1682 01:21:13,020 --> 01:21:17,220 We just need to compare the sorted versions of those strings. 1683 01:21:17,220 --> 01:21:19,620 And this, we can compress. 1684 01:21:19,620 --> 01:21:21,570 So this is a subtlety. 1685 01:21:21,570 --> 01:21:24,210 But all I really need to know is that there are three a's 1686 01:21:24,210 --> 01:21:30,260 here, and one b, and zero c's, and zero d's, and zero 1687 01:21:30,260 --> 01:21:31,980 e's, and so on. 1688 01:21:31,980 --> 01:21:35,310 But because there are only 26 letters in this deck-- 1689 01:21:35,310 --> 01:21:38,480 and indeed, in this deck, it happens to be upper and lowercase a 1690 01:21:38,480 --> 01:21:39,180 through z. 1691 01:21:39,180 --> 01:21:41,220 We might have n cards, 1692 01:21:41,220 --> 01:21:43,190 but there are only 26 possible labels. 1693 01:21:43,190 --> 01:21:44,648 So in fact, a lot of them are going 1694 01:21:44,648 --> 01:21:46,390 to be equal if n is large. 1695 01:21:46,390 --> 01:21:47,940 So this is a good compression scheme, 1696 01:21:47,940 --> 01:21:52,620 because to represent the things I get after sorting, I just 1697 01:21:52,620 --> 01:21:54,830 need to give you 26 numbers. 1698 01:21:54,830 --> 01:21:58,050 And for us, 26 is small, because 26 is a constant. 1699 01:21:58,050 --> 01:22:00,690 Independent of the number of cards, I just need to say, 1700 01:22:00,690 --> 01:22:01,710 how many a's are there? 1701 01:22:01,710 --> 01:22:03,630 It could be anywhere between 0 and n. 1702 01:22:03,630 --> 01:22:04,710 How many b's are there? 1703 01:22:04,710 --> 01:22:05,642 Between 0 and n. 1704 01:22:05,642 --> 01:22:06,600 How many c's are there? 1705 01:22:06,600 --> 01:22:07,800 Between 0 and n. 1706 01:22:07,800 --> 01:22:15,690 So 26 numbers in the range 0 to n-- 1707 01:22:15,690 --> 01:22:24,480 I like to think of this as a 26-digit number base n plus 1. 1708 01:22:24,480 --> 01:22:29,220 We can map this into base n plus 1.
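The compression step can be sketched in Python: build the frequency table, then read its 26 counts as the digits of one number in base n + 1. Both helper names are illustrative. The sketch assumes cards are lowercase letters a through z. Python's arbitrary-precision integers hold the whole key in a single value, whereas the word-RAM argument stores it as 26 words.

```python
def counts_of(cards):
    """Frequency table: counts[0] is the number of 'a's, ..., counts[25] of 'z's.

    Assumes every card is a lowercase letter a..z.
    """
    counts = [0] * 26
    for ch in cards:
        counts[ord(ch) - ord("a")] += 1
    return counts


def encode_counts(counts, n):
    """Read 26 counts, each in 0..n, as one 26-digit number in base n + 1.

    The result lies in [0, (n + 1) ** 26). Equal tables give equal keys,
    and different tables give different keys.
    """
    key = 0
    for c in counts:
        assert 0 <= c <= n, "each count must be a valid base-(n+1) digit"
        key = key * (n + 1) + c  # shift left one digit, then append c
    return key
```

With n = 4, the hands aaab and baaa both encode to 3 * 5**25 + 5**24, while aabb encodes to a different key.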
1709 01:22:29,220 --> 01:22:36,360 And we get 26 digits in that base. 1710 01:22:36,360 --> 01:22:39,907 Another way to say it is that the number of possible 1711 01:22:39,907 --> 01:22:42,240 combinations here-- how many a's, how many b's, how many 1712 01:22:42,240 --> 01:22:42,960 c's-- 1713 01:22:42,960 --> 01:22:45,480 is not just theta of something. 1714 01:22:45,480 --> 01:22:51,120 It is at most n plus 1, anything between 0 and n, to the power of 26. 1715 01:22:51,120 --> 01:22:54,270 This is a good polynomial. 1716 01:22:54,270 --> 01:22:58,630 So I can do stuff like radix sort. 1717 01:23:01,300 --> 01:23:02,200 Cool. 1718 01:23:02,200 --> 01:23:07,330 So let me summarize a little bit how we solve part a. 1719 01:23:07,330 --> 01:23:10,440 So I want to build a data structure, which is, 1720 01:23:10,440 --> 01:23:12,960 for each value i, I know I'm going 1721 01:23:12,960 --> 01:23:17,610 to end up dealing these four cards, or in general, k cards. 1722 01:23:17,610 --> 01:23:20,915 So for those cards, I would like to compute 1723 01:23:20,915 --> 01:23:23,040 how many a's, how many b's, how many c's are there? 1724 01:23:23,040 --> 01:23:25,770 And then just write down this number. 1725 01:23:25,770 --> 01:23:27,470 This is a number which I can write down 1726 01:23:27,470 --> 01:23:32,020 in at most 26 words, because we can represent numbers between 0 1727 01:23:32,020 --> 01:23:33,660 and n in a single word. 1728 01:23:33,660 --> 01:23:37,020 That's the assumption that w is at least log n. 1729 01:23:37,020 --> 01:23:38,520 So it's constant size. 1730 01:23:38,520 --> 01:23:41,130 With a constant number of numbers, I 1731 01:23:41,130 --> 01:23:44,640 can represent all I need to know about a thing of size-- 1732 01:23:44,640 --> 01:23:46,480 of length k here. 1733 01:23:46,480 --> 01:23:49,088 So I don't need to know which letters are where. 1734 01:23:49,088 --> 01:23:50,630 I just need to know the sorted order.
1735 01:23:50,630 --> 01:23:53,490 So I just need to know-- this is called a frequency table-- 1736 01:23:53,490 --> 01:23:55,230 how many a's, how many b's? 1737 01:23:55,230 --> 01:23:59,010 And so if I can compute those, then given that representation 1738 01:23:59,010 --> 01:24:02,130 for starting at i, and given that representation 1739 01:24:02,130 --> 01:24:04,300 for starting at j, say, which would 1740 01:24:04,300 --> 01:24:07,920 be these two and these two, I can compare them 1741 01:24:07,920 --> 01:24:09,780 by just comparing those 26 numbers. 1742 01:24:09,780 --> 01:24:11,370 If they're all equal, then they're 1743 01:24:11,370 --> 01:24:13,230 the same string after sorting. 1744 01:24:13,230 --> 01:24:15,510 And if there's any difference, then they're different. 1745 01:24:15,510 --> 01:24:17,343 So that's how I could do it in constant time 1746 01:24:17,343 --> 01:24:19,410 if I can compute these representations. 1747 01:24:19,410 --> 01:24:21,660 And it's not hard to do that. 1748 01:24:21,660 --> 01:24:24,420 It's called the sliding window technique, where you compute it 1749 01:24:24,420 --> 01:24:26,640 for the first k guys. 1750 01:24:26,640 --> 01:24:29,370 And then you remove this item and add this item. 1751 01:24:29,370 --> 01:24:32,010 And just by incrementing the counter for b, 1752 01:24:32,010 --> 01:24:33,810 decrementing the counter for a, now 1753 01:24:33,810 --> 01:24:37,180 I know the representation for these guys. 1754 01:24:37,180 --> 01:24:39,690 Make a copy of that, which is a copy of those 26 numbers, 1755 01:24:39,690 --> 01:24:40,530 constant time. 1756 01:24:40,530 --> 01:24:43,110 Then I add on c, remove b. 1757 01:24:43,110 --> 01:24:49,530 Then I add on a, remove c, add on b, remove d, add on c, 1758 01:24:49,530 --> 01:24:53,983 remove b, and add on d, and remove c. 1759 01:24:53,983 --> 01:24:55,400 Well, I got back to the beginning. 1760 01:24:55,400 --> 01:24:57,425 So now I have the representation of all of those.
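Part a's sliding window can be sketched in Python. The first window is counted directly. Each later shift then decrements the count of the card leaving the window and increments the count of the card entering it (cyclically), and stores a constant-size copy. The function names are illustrative, and the sketch assumes lowercase letters a through z and 1 <= k <= n.

```python
def all_hand_counts(deck, k):
    """Frequency tables of P(D, i, k) for every start i, in O(n) total time.

    Assumes lowercase letters a..z and 1 <= k <= n = len(deck).
    Entry i is a tuple of 26 counts for the window starting at i.
    """
    n = len(deck)
    idx = [ord(ch) - ord("a") for ch in deck]
    counts = [0] * 26
    for t in range(k):                     # window for i = 0: cards 0..k-1
        counts[idx[t]] += 1
    tables = [tuple(counts)]               # copying 26 numbers is O(1)
    for i in range(1, n):
        counts[idx[i - 1]] -= 1            # card i-1 leaves the window
        counts[idx[(i - 1 + k) % n]] += 1  # card i+k-1 (mod n) enters it
        tables.append(tuple(counts))
    return tables


def same_hand(tables, i, j):
    """Part a query: do starts i and j produce the same hand? O(1)."""
    return tables[i] == tables[j]
```

On abcdbc with k = 4, `same_hand(tables, 0, 3)` is true (abcd versus dbca), while starts 0 and 1 differ (abcd versus bcdb).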
1761 01:24:57,425 --> 01:25:00,000 OK, so by sliding this window, I'm 1762 01:25:00,000 --> 01:25:02,640 only changing at the two ends. 1763 01:25:02,640 --> 01:25:03,690 I add one guy on. 1764 01:25:03,690 --> 01:25:05,580 I increment one of these counters. 1765 01:25:05,580 --> 01:25:07,140 I decrement one of these counters. 1766 01:25:07,140 --> 01:25:09,620 So in constant time, given the representation 1767 01:25:09,620 --> 01:25:12,120 of one of these substrings, I can compute the representation 1768 01:25:12,120 --> 01:25:13,095 of the next one. 1769 01:25:13,095 --> 01:25:14,760 And that's how I, in linear time, 1770 01:25:14,760 --> 01:25:16,470 can build such a data structure that 1771 01:25:16,470 --> 01:25:19,860 lets me tell whether any two hands are equal. 1772 01:25:19,860 --> 01:25:23,040 The next problem, part b, is, given 1773 01:25:23,040 --> 01:25:25,650 all these representations, can you 1774 01:25:25,650 --> 01:25:27,892 find which one is the most common? 1775 01:25:27,892 --> 01:25:29,850 Because we were choosing i uniformly at random, 1776 01:25:29,850 --> 01:25:33,930 I want to know what the most likely hand that you get is. 1777 01:25:33,930 --> 01:25:35,910 And I think the easiest way to say 1778 01:25:35,910 --> 01:25:38,670 this is you can do that by radix sorting. 1779 01:25:38,670 --> 01:25:40,180 You take all these representations. 1780 01:25:40,180 --> 01:25:42,480 They are nice numbers in the range 0 1781 01:25:42,480 --> 01:25:45,010 to n plus 1 to the 26th power. 1782 01:25:45,010 --> 01:25:47,490 So I can just run radix sort and sort them all. 1783 01:25:47,490 --> 01:25:49,830 And then with a single scan through the array, 1784 01:25:49,830 --> 01:25:51,990 I can see which one is the most common. 1785 01:25:51,990 --> 01:25:53,120 Or rather, I can-- 1786 01:25:53,120 --> 01:25:56,610 in a single scan, I can compute, OK, how many of the same things 1787 01:25:56,610 --> 01:25:57,330 are at the front? 
1788 01:25:57,330 --> 01:26:00,310 If they're sorted, then all the equal ones will be together. 1789 01:26:00,310 --> 01:26:01,630 So how many are there? 1790 01:26:01,630 --> 01:26:03,550 Then how many equal ones next? 1791 01:26:03,550 --> 01:26:05,190 And how many equal ones next? 1792 01:26:05,190 --> 01:26:07,950 Each time, comparing each item to the previous one. 1793 01:26:07,950 --> 01:26:11,400 Then I get frequency counts for all of these hands. 1794 01:26:11,400 --> 01:26:13,770 And then I do another scan to find the most common one. 1795 01:26:13,770 --> 01:26:16,980 And I can do another scan to find the lexically best one, 1796 01:26:16,980 --> 01:26:19,005 lexically last one. 1797 01:26:19,005 --> 01:26:22,370 And that's how you solve problem 5.
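Part b can be sketched in Python in the same spirit. Each 26-count tuple is treated as a 26-digit number in base n + 1 and radix sorted with 26 stable bucket passes, least significant digit first. One scan over runs of equal tuples then finds the most frequent hand. The function name is illustrative. The tie-breaking here (the first run in sorted order wins) is an assumption. The lexical tie-break that the transcript mentions would be one more scan and is not implemented.

```python
def most_common_hand(tables, n):
    """Radix sort 26-count tuples, then scan runs of equal tuples.

    Every count lies in 0..n, so each tuple is a 26-digit number in
    base n + 1. That means 26 stable bucket passes, each O(n), for O(n)
    total. Returns (table, frequency) for a most frequent hand.
    """
    order = list(tables)
    for d in range(25, -1, -1):          # least significant digit first
        buckets = [[] for _ in range(n + 1)]
        for t in order:
            buckets[t[d]].append(t)      # stable within each bucket
        order = [t for bucket in buckets for t in bucket]
    best, best_count = None, 0
    run_start = 0
    for p in range(1, len(order) + 1):   # equal tuples are now adjacent
        if p == len(order) or order[p] != order[run_start]:
            if p - run_start > best_count:
                best, best_count = order[run_start], p - run_start
            run_start = p
    return best, best_count
```

For the example deck abcdbc with k = 4, the six windows sort to abcd, bbcd, bccd, abcd, abbc, and abcc. The most common hand is therefore abcd, which appears twice.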