1 00:00:00,000 --> 00:00:01,976 [SQUEAKING] 2 00:00:01,976 --> 00:00:04,446 [RUSTLING] 3 00:00:04,446 --> 00:00:06,422 [CLICKING] 4 00:00:13,360 --> 00:00:17,970 JASON KU: Welcome to our fourth problem session. 5 00:00:17,970 --> 00:00:21,460 We're going to be talking about binary trees mostly, today. 6 00:00:21,460 --> 00:00:24,430 We'll talk a little bit about binary heaps, which 7 00:00:24,430 --> 00:00:27,650 is a topic we won't cover until next Tuesday, 8 00:00:27,650 --> 00:00:32,710 but it will appear in very small ways 9 00:00:32,710 --> 00:00:36,460 on your problem set 4, which will be due next Friday. 10 00:00:36,460 --> 00:00:40,660 So I'm going to go over a little bit of that material today. 11 00:00:40,660 --> 00:00:43,150 But it's mostly concerning-- 12 00:00:43,150 --> 00:00:48,880 the subject material for today is mostly binary trees, 13 00:00:48,880 --> 00:00:52,450 specifically, being applied to set data structures 14 00:00:52,450 --> 00:00:57,400 and sequence data structures, as Professor Demaine talked 15 00:00:57,400 --> 00:01:01,340 to you earlier this week. 16 00:01:01,340 --> 00:01:06,100 But for now-- actually, as of yesterday-- 17 00:01:06,100 --> 00:01:08,410 you've seen all of the data structures that we're going 18 00:01:08,410 --> 00:01:10,090 to cover to-- 19 00:01:10,090 --> 00:01:13,180 that will implement the set interface and the sequence 20 00:01:13,180 --> 00:01:14,470 interface. 21 00:01:14,470 --> 00:01:21,160 Those nice tables that Professor Demaine has been showing you, 22 00:01:21,160 --> 00:01:22,960 those are now complete. 23 00:01:22,960 --> 00:01:25,210 We have some data structures that 24 00:01:25,210 --> 00:01:28,930 are really good-- constant time operations for some operations. 25 00:01:28,930 --> 00:01:32,740 So we might use them for some applications. 
26 00:01:32,740 --> 00:01:36,520 And this week, we've been describing 27 00:01:36,520 --> 00:01:40,150 to you trees, which achieve, really, pretty 28 00:01:40,150 --> 00:01:43,660 good, for any type of query operation on my sets 29 00:01:43,660 --> 00:01:44,980 or sequences-- 30 00:01:44,980 --> 00:01:48,320 pretty good meaning logarithmic time, not quite constant. 31 00:01:48,320 --> 00:01:51,895 But for our purposes, log n is-- 32 00:01:51,895 --> 00:01:54,580 I mean, on your computer, practically-- 33 00:01:54,580 --> 00:01:56,890 not asymptotically, but practically-- log 34 00:01:56,890 --> 00:02:00,430 n is going to be at most what on your computer? 35 00:02:03,520 --> 00:02:07,030 Something like 64, right? 36 00:02:07,030 --> 00:02:11,620 Any input that you're operating on with, in machine words, 37 00:02:11,620 --> 00:02:14,000 is your input. 38 00:02:14,000 --> 00:02:16,000 You need to be able to address all those machine 39 00:02:16,000 --> 00:02:18,550 words in your input. 40 00:02:18,550 --> 00:02:22,840 And on your computer, the size of your machine word addresses 41 00:02:22,840 --> 00:02:25,030 is 64 bits, right? 42 00:02:25,030 --> 00:02:27,760 And we assume that the word size is at least 43 00:02:27,760 --> 00:02:33,700 log the size of your input so that you can address the input. 44 00:02:33,700 --> 00:02:37,060 So for your purposes, on your computer, 45 00:02:37,060 --> 00:02:39,100 log n is going to be no more than 64, 46 00:02:39,100 --> 00:02:42,460 which means you would get maybe a 50 times overhead, 47 00:02:42,460 --> 00:02:45,670 or for smaller instances, it could be more like 10, 48 00:02:45,670 --> 00:02:49,300 if you've got 1,000 things that you're working on. 49 00:02:49,300 --> 00:02:50,620 It's not that bad, right? 
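The "log n is at most about 64" remark can be checked with a little integer arithmetic. This is a minimal sketch; the helper name `ceil_log2` is made up for illustration and the sample sizes are arbitrary.

```python
def ceil_log2(n):
    """Smallest k with 2**k >= n, for n >= 1, using integer arithmetic only."""
    return (n - 1).bit_length()

# A thousand items, a million items, and every address a 64-bit word can name.
for n in (1_000, 1_000_000, 2**64):
    print(n, ceil_log2(n))
```

With 1,000 items the logarithmic factor is about 10, and it cannot pass 64 for any input that 64-bit addresses can index.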
50 00:02:50,620 --> 00:02:53,740 It's a constant-- it's not a constant factor for theory 51 00:02:53,740 --> 00:02:57,970 purposes, but for your purposes, log n is much better than 52 00:02:57,970 --> 00:02:59,980 a polynomial factor-- 53 00:02:59,980 --> 00:03:03,250 another factor of n. 54 00:03:03,250 --> 00:03:05,800 You've seen all of the code. 55 00:03:05,800 --> 00:03:08,950 You've seen implementations of all of these set and sequence 56 00:03:08,950 --> 00:03:10,300 interfaces, right? 57 00:03:10,300 --> 00:03:12,550 So I went ahead and wrote a little-- 58 00:03:12,550 --> 00:03:15,490 I compiled all of that code from your recitation notes, 59 00:03:15,490 --> 00:03:18,610 of all of the different interface implementations. 60 00:03:18,610 --> 00:03:22,720 And what I did was, I wrote a little test program 61 00:03:22,720 --> 00:03:25,960 to see how they ran on a real machine. 62 00:03:25,960 --> 00:03:30,340 I have a little test code here. 63 00:03:30,340 --> 00:03:35,410 I have a little folder that lists an array implementing 64 00:03:35,410 --> 00:03:37,270 a sequence, a binary tree implementing 65 00:03:37,270 --> 00:03:39,400 a sequence, a dynamic array implementing-- 66 00:03:39,400 --> 00:03:41,170 all of these kinds of things. 67 00:03:41,170 --> 00:03:45,250 Then set things-- a sorted array being a set in a binary tree, 68 00:03:45,250 --> 00:03:46,600 and a hash table. 69 00:03:46,600 --> 00:03:48,010 These are our implementations. 70 00:03:48,010 --> 00:03:50,710 I'm not using Python dictionaries for hash tables, 71 00:03:50,710 --> 00:03:55,330 I'm using the implementations that are in your recitation. 72 00:03:55,330 --> 00:03:56,980 And I'm going to run this little test 73 00:03:56,980 --> 00:03:59,440 efficiency Python code that basically is just 74 00:03:59,440 --> 00:04:00,550 going to, for each one--
75 00:04:00,550 --> 00:04:03,370 It's going to do a bunch of these different operations 76 00:04:03,370 --> 00:04:05,650 and measure to see how much time it took. 77 00:04:05,650 --> 00:04:07,720 I'm just logging how much time it took. 78 00:04:07,720 --> 00:04:11,380 It's not an asymptotic analysis, but hopefully, we 79 00:04:11,380 --> 00:04:13,330 see some separation. 80 00:04:13,330 --> 00:04:16,390 So when you press that, it runs a bunch of tests. 81 00:04:16,390 --> 00:04:19,269 Let's take a look. 82 00:04:19,269 --> 00:04:20,620 OK. 83 00:04:20,620 --> 00:04:23,660 I've got a bunch of sequence operations. 84 00:04:23,660 --> 00:04:28,150 We've got build, set_at, get_at, insert, delete, 85 00:04:28,150 --> 00:04:30,880 at the various places. 86 00:04:30,880 --> 00:04:36,250 And these are the actual timings to some scale-- 87 00:04:36,250 --> 00:04:39,610 to some resolution that I had for these data structures. 88 00:04:39,610 --> 00:04:41,680 And you can see build-- 89 00:04:41,680 --> 00:04:44,170 actually, build, on this machine, 90 00:04:44,170 --> 00:04:46,270 just allocating some array and clearing it, 91 00:04:46,270 --> 00:04:49,850 is a really efficient thing that Python is going to do for me. 92 00:04:49,850 --> 00:04:54,520 And so that's actually-- it's mislabelling that as log n. 93 00:04:54,520 --> 00:04:57,970 But these other things, get_at and set_at-- 94 00:04:57,970 --> 00:04:59,680 really, really fast, right? 95 00:04:59,680 --> 00:05:00,910 That's constant time. 96 00:05:00,910 --> 00:05:02,800 And then these other things, I essentially 97 00:05:02,800 --> 00:05:05,020 can't do better than loop through the thing, 98 00:05:05,020 --> 00:05:06,730 and so it takes linear time. 99 00:05:06,730 --> 00:05:11,500 And again, sequence stuff, setting_at and getting_at 100 00:05:11,500 --> 00:05:15,250 is slow, but deleting and removing from the first thing, 101 00:05:15,250 --> 00:05:17,800 I'm just re-linking the pointer, right?
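A timing harness in the spirit of the one described above might look roughly like this. It is only a sketch and not the course's test program: it uses Python's built-in `list` and `collections.deque` as stand-ins for the recitation's array and linked-list sequences, and the names `time_op` and `measure` are invented for illustration.

```python
import time
from collections import deque

def time_op(make, op, n, repeats=200):
    """Average wall-clock seconds per call of op on a container of size n."""
    container = make(n)
    start = time.perf_counter()
    for _ in range(repeats):
        op(container)
    return (time.perf_counter() - start) / repeats

def insert_delete_first_list(a):
    # Inserting at the front of an array shifts every element: linear time.
    a.insert(0, None)
    a.pop(0)

def insert_delete_first_deque(d):
    # A deque only re-links its ends (stand-in for the linked list).
    d.appendleft(None)
    d.popleft()

def measure(n=100_000):
    return {
        "list": time_op(lambda k: list(range(k)), insert_delete_first_list, n),
        "deque": time_op(lambda k: deque(range(k)), insert_delete_first_deque, n),
    }

if __name__ == "__main__":
    for name, seconds in measure().items():
        print(f"{name:>5}: {seconds:.2e} s per insert/delete at the front")
```

Like the numbers on the screen, these are raw timings averaged over many runs, not an asymptotic analysis, so they will vary from machine to machine.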
102 00:05:17,800 --> 00:05:18,910 Dynamic arrays. 103 00:05:18,910 --> 00:05:23,440 Again, set_at, get_at is fast, because it's just 104 00:05:23,440 --> 00:05:24,670 regular arrays. 105 00:05:24,670 --> 00:05:27,340 And then inserting- and deleting-last, 106 00:05:27,340 --> 00:05:30,010 that's getting, essentially, constant time. 107 00:05:30,010 --> 00:05:32,680 Now, I'm actually-- when I'm running these tests to deal 108 00:05:32,680 --> 00:05:34,180 with averages, I'm actually running 109 00:05:34,180 --> 00:05:36,980 these things, a lot of times, and testing their performance. 110 00:05:36,980 --> 00:05:41,500 And so I'm not seeing the worst case happen here, right? 111 00:05:41,500 --> 00:05:43,420 I'm averaging over all of the things, which 112 00:05:43,420 --> 00:05:46,630 is exactly what amortization means. 113 00:05:46,630 --> 00:05:50,560 That's why I'm getting good performance here. 114 00:05:50,560 --> 00:05:51,940 A hash table. 115 00:05:51,940 --> 00:05:56,710 Again, really-- oh, so this is what 116 00:05:56,710 --> 00:05:59,320 we talked about in problem session 117 00:05:59,320 --> 00:06:04,180 last week, implementing kind of a double-ended queue 118 00:06:04,180 --> 00:06:06,070 with a hash table. 119 00:06:06,070 --> 00:06:07,450 This is that implementation. 120 00:06:07,450 --> 00:06:09,130 I just wanted to show it to you. 121 00:06:09,130 --> 00:06:10,460 But it's actually pretty good. 122 00:06:10,460 --> 00:06:14,080 This is what JavaScript uses for arrays. 123 00:06:14,080 --> 00:06:19,240 And then a binary sequence represented as a binary tree-- 124 00:06:19,240 --> 00:06:20,550 a balanced binary tree. 125 00:06:20,550 --> 00:06:22,480 This is our AVL code that I had. 126 00:06:22,480 --> 00:06:25,720 And all of the other things have been really pretty bad 127 00:06:25,720 --> 00:06:28,338 at insert_at and delete_at, but this one 128 00:06:28,338 --> 00:06:30,130 does comparable to all of the other things. 
129 00:06:30,130 --> 00:06:34,060 Now, you see these are a little bit more machine 130 00:06:34,060 --> 00:06:39,230 cycles than the other things, but not so bad, actually. 131 00:06:39,230 --> 00:06:41,710 And then, on the set side of things, 132 00:06:41,710 --> 00:06:44,420 again, we had a sorted array. 133 00:06:44,420 --> 00:06:46,840 Sorry, this is a set from an array. 134 00:06:46,840 --> 00:06:48,970 Basically, it's an unsorted array. 135 00:06:48,970 --> 00:06:52,480 I'm just looking for all of the things-- that's very bad times. 136 00:06:52,480 --> 00:06:54,970 Sorted array does these find operations great, 137 00:06:54,970 --> 00:06:58,040 but inserting and deleting is poor. 138 00:06:58,040 --> 00:07:00,820 That's why we need binary trees. 139 00:07:00,820 --> 00:07:05,080 Hash tables get good dictionary operations, but really 140 00:07:05,080 --> 00:07:08,410 bad order operations. 141 00:07:08,410 --> 00:07:13,990 And then the binary search tree, a set binary tree, again, 142 00:07:13,990 --> 00:07:16,490 does quite good on all of these things. 143 00:07:16,490 --> 00:07:20,450 In fact, it's getting really quite good-- 144 00:07:20,450 --> 00:07:22,300 it's getting better, for some reason, 145 00:07:22,300 --> 00:07:26,200 than the sorted array even did. 146 00:07:26,200 --> 00:07:27,730 I don't know why. 147 00:07:27,730 --> 00:07:30,220 Our implementations are not optimized at all. 148 00:07:30,220 --> 00:07:31,890 But it does pretty well asymptotically. 149 00:07:31,890 --> 00:07:32,160 Yeah? 150 00:07:32,160 --> 00:07:33,535 AUDIENCE: Could you explain again 151 00:07:33,535 --> 00:07:38,470 why the first data structure [INAUDIBLE] log n? 152 00:07:38,470 --> 00:07:41,830 JASON KU: This is just labeled based off the timings.
153 00:07:41,830 --> 00:07:43,930 It happens to be that, probably, there's 154 00:07:43,930 --> 00:07:47,200 a C intrinsic underneath Python that allocates this thing, 155 00:07:47,200 --> 00:07:49,060 and so it does it really fast. 156 00:07:49,060 --> 00:07:51,460 My program that's looking at these numbers 157 00:07:51,460 --> 00:07:54,350 and trying to guess what the asymptotic running time is, 158 00:07:54,350 --> 00:07:57,590 these are just labels based on these things, ranges. 159 00:07:57,590 --> 00:07:59,113 I just-- it's mischaracterized. 160 00:07:59,113 --> 00:08:01,605 AUDIENCE: [INAUDIBLE] 161 00:08:01,605 --> 00:08:02,230 JASON KU: Yeah. 162 00:08:02,230 --> 00:08:06,940 Well, I mean, in actuality, if it was C code-- 163 00:08:06,940 --> 00:08:09,730 if all of this stuff was in C, probably, 164 00:08:09,730 --> 00:08:12,820 we would see that bar be longer, because it's actually 165 00:08:12,820 --> 00:08:15,370 having to go through and touch all of that memory. 166 00:08:15,370 --> 00:08:18,250 It's still doing that here, but all of the Python stuff 167 00:08:18,250 --> 00:08:19,810 is super crufty. 168 00:08:19,810 --> 00:08:23,110 It's, like, 100 times slower than anything that C does. 169 00:08:23,110 --> 00:08:26,730 And so you're seeing that disparity. 170 00:08:26,730 --> 00:08:27,920 Does that make sense? 171 00:08:27,920 --> 00:08:28,420 OK. 172 00:08:28,420 --> 00:08:30,610 I just wanted to show you that. 173 00:08:30,610 --> 00:08:33,850 We might release this for you to play around with, 174 00:08:33,850 --> 00:08:36,309 but just wanted to give you a taste of that. 175 00:08:36,309 --> 00:08:41,010 OK, any questions before we move on? 176 00:08:41,010 --> 00:08:42,879 How do I turn this off? 177 00:08:42,879 --> 00:08:46,960 Up, and off. 178 00:08:46,960 --> 00:08:47,680 Shut down. 179 00:08:51,750 --> 00:08:52,780 Yes. 180 00:08:52,780 --> 00:08:53,280 OK. 181 00:08:58,440 --> 00:09:02,640 Moving on to problems-- working some problems. 
182 00:09:02,640 --> 00:09:04,660 You have your set of problems here. 183 00:09:04,660 --> 00:09:09,483 The first one is, we're going to look at a sequence AVL tree. 184 00:09:09,483 --> 00:09:10,650 This is a sequence AVL tree. 185 00:09:10,650 --> 00:09:11,442 How do I know that? 186 00:09:15,200 --> 00:09:16,760 You don't, necessarily. 187 00:09:16,760 --> 00:09:19,460 But these things are certainly not 188 00:09:19,460 --> 00:09:23,450 in sorted order of the things I'm storing in them, yeah? 189 00:09:23,450 --> 00:09:26,420 So it had better not be a set AVL tree. 190 00:09:26,420 --> 00:09:27,530 Is it an AVL tree? 191 00:09:27,530 --> 00:09:30,510 Is it balanced-- height balanced? 192 00:09:30,510 --> 00:09:32,220 Yeah, basically. 193 00:09:32,220 --> 00:09:36,810 Actually, if you compute the size of each subtree, 194 00:09:36,810 --> 00:09:39,120 the left and right subtrees on all of these-- you 195 00:09:39,120 --> 00:09:41,400 can confirm for yourself-- are balanced. 196 00:09:41,400 --> 00:09:43,950 They're within plus or minus 1 of each other. 197 00:09:43,950 --> 00:09:46,500 Actually, this is about as far away from balanced 198 00:09:46,500 --> 00:09:50,550 as you could get for this many nodes 199 00:09:50,550 --> 00:09:52,600 while still maintaining height balance-- 200 00:09:52,600 --> 00:09:54,720 maintaining AVL property-- which is why 201 00:09:54,720 --> 00:09:56,040 this is an instructive example. 202 00:09:56,040 --> 00:09:58,740 It's kind of at the limit. 203 00:09:58,740 --> 00:10:02,370 And what am I going to do? 204 00:10:02,370 --> 00:10:04,980 What's missing in this picture if I'm claiming 205 00:10:04,980 --> 00:10:08,790 this is a sequence AVL tree? 206 00:10:08,790 --> 00:10:11,850 Any ideas what's missing? 207 00:10:11,850 --> 00:10:14,700 What does the sequence AVL tree store that I'm not 208 00:10:14,700 --> 00:10:16,360 showing in this picture? 209 00:10:16,360 --> 00:10:17,190 AUDIENCE: Counts.
210 00:10:17,190 --> 00:10:17,490 JASON KU: What? 211 00:10:17,490 --> 00:10:18,282 AUDIENCE: Counts. 212 00:10:18,282 --> 00:10:18,990 JASON KU: Counts. 213 00:10:18,990 --> 00:10:19,490 And? 214 00:10:22,520 --> 00:10:25,360 It's a sequence AVL tree. 215 00:10:25,360 --> 00:10:27,320 Heights, right? 216 00:10:27,320 --> 00:10:29,470 Sequence AVL trees, different than set AVL trees, 217 00:10:29,470 --> 00:10:31,000 are augmented by two things, right? 218 00:10:31,000 --> 00:10:34,690 Because I need to be maintaining balance during 219 00:10:34,690 --> 00:10:36,910 rotations, and so I need to store heights. 220 00:10:36,910 --> 00:10:39,490 I need to be able to tell what the heights of these subtrees 221 00:10:39,490 --> 00:10:41,440 are in constant time when I'm walking up 222 00:10:41,440 --> 00:10:43,090 the tree, fixing things. 223 00:10:43,090 --> 00:10:47,440 And the sequence requires me to store their subtree numbers 224 00:10:47,440 --> 00:10:50,200 there. 225 00:10:50,200 --> 00:10:52,030 I don't know-- I'm not going to draw it 226 00:10:52,030 --> 00:10:55,270 for all of these things, but how about for number four? 227 00:10:55,270 --> 00:10:56,050 What's its height? 228 00:10:59,670 --> 00:11:00,840 1, 2, 3. 229 00:11:00,840 --> 00:11:05,200 That's the longest path from the root subtree of that. 230 00:11:05,200 --> 00:11:09,210 So this is height equals 3. 231 00:11:09,210 --> 00:11:13,900 That came from height equals 2 and height equals 1. 232 00:11:13,900 --> 00:11:15,000 Does everyone see that? 233 00:11:15,000 --> 00:11:15,690 Yeah? 234 00:11:15,690 --> 00:11:18,390 And then the size here, how big is that? 235 00:11:18,390 --> 00:11:22,440 That's 1, 2, 3, 4, 5, 6, 7. 236 00:11:22,440 --> 00:11:27,060 This is-- I'm going to put size equals 7 here. 237 00:11:27,060 --> 00:11:31,770 And that's coming from this guy, 1, 2, 3, 4. 238 00:11:31,770 --> 00:11:33,105 And this guy is 2. 239 00:11:36,300 --> 00:11:39,700 So how do I compute the subtree size? 
240 00:11:39,700 --> 00:11:43,830 It's my left subtree size plus my right subtree size plus 1. 241 00:11:43,830 --> 00:11:47,100 And my height is taking the max of the 2 plus 1. 242 00:11:47,100 --> 00:11:47,600 All right. 243 00:11:47,600 --> 00:11:50,280 So we did all that yesterday. 244 00:11:50,280 --> 00:11:52,860 I'm just labeling these things. 245 00:11:52,860 --> 00:11:59,730 And what I'm asking of you is to perform a delete operation. 246 00:11:59,730 --> 00:12:02,250 This is a sequence tree. 247 00:12:02,250 --> 00:12:06,190 So I'm finding things by their index in the tree. 248 00:12:06,190 --> 00:12:08,790 So I'm going to ask you to delete the eighth thing 249 00:12:08,790 --> 00:12:10,980 in my sequence. 250 00:12:10,980 --> 00:12:14,770 What is the 8th thing in my sequence? 251 00:12:14,770 --> 00:12:15,370 Yeah? 252 00:12:15,370 --> 00:12:17,890 AUDIENCE: Just to clarify, since delete-8 is not delete 253 00:12:17,890 --> 00:12:18,520 the number. 254 00:12:18,520 --> 00:12:19,270 JASON KU: Correct. 255 00:12:19,270 --> 00:12:21,360 Well, delete_at 8. 256 00:12:21,360 --> 00:12:22,237 Do you see that? 257 00:12:22,237 --> 00:12:23,320 It's a sequence operation. 258 00:12:23,320 --> 00:12:23,650 AUDIENCE: Oh, OK. 259 00:12:23,650 --> 00:12:24,195 [INAUDIBLE] 260 00:12:24,195 --> 00:12:24,820 JASON KU: Yeah. 261 00:12:24,820 --> 00:12:27,160 So this is very important, that you 262 00:12:27,160 --> 00:12:30,340 differentiate between sequence and set semantics. 263 00:12:30,340 --> 00:12:32,140 If I'm dealing with the sequence, 264 00:12:32,140 --> 00:12:35,440 I had better not be looking up intrinsic things 265 00:12:35,440 --> 00:12:38,390 on this data structure, because it's not an intrinsic data 266 00:12:38,390 --> 00:12:38,890 structure. 267 00:12:38,890 --> 00:12:39,910 It doesn't support that. 
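The two recurrences just stated (size is left size plus right size plus 1, height is the max of the two child heights plus 1) can be sketched in Python as follows. The `Node` class, its field names, and the helper functions are illustrative assumptions, not the recitation code. The conventions for an empty subtree (size 0, height -1) are chosen so that a leaf gets size 1 and height 0, which reproduces the height 3 and size 7 worked out on the board.

```python
class Node:
    """Hypothetical sequence-AVL node; names are illustrative only."""

    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right
        self.update()

    def update(self):
        # Throw away the old values and recompute from the children only.
        self.size = 1 + size(self.left) + size(self.right)
        self.height = 1 + max(height(self.left), height(self.right))

def size(node):
    return node.size if node else 0

def height(node):
    return node.height if node else -1

# A made-up shape with the same numbers as the board: children of
# (size 4, height 2) and (size 2, height 1) give a parent of (size 7, height 3).
left = Node("a", Node("b", Node("c")), Node("d"))
right = Node("e", Node("f"))
root = Node("r", left, right)
print(root.size, root.height)
```

Each update reads only the two children's stored values, so it takes constant time per node.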
268 00:12:39,910 --> 00:12:46,870 If I wanted to support find, say, the index of key 8, 269 00:12:46,870 --> 00:12:50,410 or something like that, then all I could do 270 00:12:50,410 --> 00:12:51,993 is-- it's similar to an array. 271 00:12:51,993 --> 00:12:54,160 I would just have to loop through the whole sequence 272 00:12:54,160 --> 00:12:56,290 and tell me if the thing is in it. 273 00:12:56,290 --> 00:12:58,165 Can't really do better than linear time. 274 00:12:58,165 --> 00:13:00,040 This data structure is not designed for that. 275 00:13:00,040 --> 00:13:01,040 What is it designed for? 276 00:13:01,040 --> 00:13:05,140 It's designed to find things by their index in the sequence. 277 00:13:05,140 --> 00:13:12,130 So how do I find the 8th index? 278 00:13:12,130 --> 00:13:14,110 Well, I mean, I'm looking at the tree, 279 00:13:14,110 --> 00:13:17,140 and I can just count along in-order traversal. 280 00:13:17,140 --> 00:13:19,990 What's the in-order traversal? 281 00:13:19,990 --> 00:13:24,850 0, 1, 2, 3, 4, 5, 6, 7. 282 00:13:24,850 --> 00:13:26,530 OK, found 8. 283 00:13:26,530 --> 00:13:29,470 But what does a sequence AVL tree do? 284 00:13:32,620 --> 00:13:34,990 I'm storing subtree sizes, and when I'm here, 285 00:13:34,990 --> 00:13:38,110 I don't know what index I'm at. 286 00:13:38,110 --> 00:13:40,900 How can I find out what index I'm at from the root? 287 00:13:40,900 --> 00:13:44,980 I look at my left subtree, see how many it is. 288 00:13:44,980 --> 00:13:47,990 There are seven things here. 289 00:13:47,990 --> 00:13:52,910 1, 2, 3, 4, 5, 6, 7. 290 00:13:52,910 --> 00:13:53,410 Yeah. 291 00:13:53,410 --> 00:13:57,280 Because I'm looking for the 9th item by index 8. 292 00:13:57,280 --> 00:14:01,060 This is saying that I'm the 8th item. 293 00:14:01,060 --> 00:14:03,485 I'm the guy at index 7. 294 00:14:03,485 --> 00:14:04,360 Does that make sense? 295 00:14:04,360 --> 00:14:07,310 Because I'm looking at the subtree size here. 
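Finding a node by index from the stored subtree sizes can be sketched like this. The names `Node`, `build`, and `subtree_at` are assumptions made for illustration, and `build` simply produces some balanced shape rather than the tree on the board. A node's own index within its subtree equals the size of its left subtree, which is why a root with seven things to its left sits at index 7.

```python
class Node:
    """Hypothetical sequence-tree node that stores its subtree size."""

    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right
        self.size = 1 + (left.size if left else 0) + (right.size if right else 0)

def build(items):
    """Build a balanced tree whose in-order traversal is exactly items."""
    if not items:
        return None
    mid = len(items) // 2
    return Node(items[mid], build(items[:mid]), build(items[mid + 1:]))

def subtree_at(node, i):
    """Return the node at in-order index i, assuming 0 <= i < node.size."""
    left_size = node.left.size if node.left else 0
    if i < left_size:
        return subtree_at(node.left, i)
    if i > left_size:
        # Skip the whole left subtree and this node, then search on the right.
        return subtree_at(node.right, i - left_size - 1)
    return node

seq = [7, 3, 9, 1, 12, 4, 11, 6, 8, 10]  # sequence order, deliberately unsorted
root = build(seq)
print([subtree_at(root, i).item for i in range(len(seq))] == seq)
```

For the lookup in the lecture, index 8 at a root whose left subtree has size 7 becomes index 8 - 7 - 1 = 0 in the right subtree.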
296 00:14:07,310 --> 00:14:08,930 So what do I know? 297 00:14:08,930 --> 00:14:13,240 I know that the index that I'm looking for is to my right. 298 00:14:13,240 --> 00:14:17,350 I go down over here and I happen to know-- 299 00:14:17,350 --> 00:14:23,470 what index am I looking for in this subtree? 300 00:14:23,470 --> 00:14:25,090 0, right? 301 00:14:25,090 --> 00:14:27,490 I want the first thing in the subtree. 302 00:14:27,490 --> 00:14:32,380 My search index has changed now, because I essentially dealt 303 00:14:32,380 --> 00:14:35,740 with all of those eight items. 304 00:14:35,740 --> 00:14:38,990 Here, I'm looking for the 0th thing in my index. 305 00:14:38,990 --> 00:14:40,290 I look to my left-- 306 00:14:40,290 --> 00:14:43,930 if I didn't have a left subtree, I would be the 0th thing, 307 00:14:43,930 --> 00:14:45,010 and I would return me. 308 00:14:45,010 --> 00:14:46,330 But there is stuff in here. 309 00:14:46,330 --> 00:14:51,850 So I'm looking for the 0th thing in here, which is just him, 310 00:14:51,850 --> 00:14:53,560 and I return it. 311 00:14:53,560 --> 00:14:56,890 And actually, what I'm doing is, I'm deleting it. 312 00:14:56,890 --> 00:15:00,190 So I delete it. 313 00:15:00,190 --> 00:15:02,610 Yuck. 314 00:15:02,610 --> 00:15:05,382 What's the problem here? 315 00:15:05,382 --> 00:15:06,260 AUDIENCE: [INAUDIBLE] 316 00:15:06,260 --> 00:15:07,920 JASON KU: Not height balanced. 317 00:15:07,920 --> 00:15:10,610 What's not height balanced here? 318 00:15:10,610 --> 00:15:12,510 AUDIENCE: The left subtree is-- 319 00:15:12,510 --> 00:15:13,150 or, sorry. 320 00:15:13,150 --> 00:15:13,830 [INAUDIBLE] 321 00:15:13,830 --> 00:15:15,280 JASON KU: This guy is not height balanced, right? 322 00:15:15,280 --> 00:15:17,100 AUDIENCE: --right subtree of the right [INAUDIBLE].. 323 00:15:17,100 --> 00:15:19,558 JASON KU: This guy's subtree is not height balanced, right? 324 00:15:19,558 --> 00:15:20,250 This guy's 2. 
325 00:15:20,250 --> 00:15:22,380 This guy's 1. 326 00:15:22,380 --> 00:15:24,300 So how do we fix it? 327 00:15:24,300 --> 00:15:25,740 AUDIENCE: Rotate. 328 00:15:25,740 --> 00:15:27,690 JASON KU: We do some rotations. 329 00:15:27,690 --> 00:15:30,470 This is actually the bad case that-- 330 00:15:30,470 --> 00:15:33,570 the third bad case that we talked about yesterday. 331 00:15:33,570 --> 00:15:37,260 If I tried to just left-rotate this guy, 332 00:15:37,260 --> 00:15:38,540 what would it look like? 333 00:15:38,540 --> 00:15:40,740 It would put 12 here. 334 00:15:40,740 --> 00:15:42,240 It would put 10 here. 335 00:15:42,240 --> 00:15:45,360 And 8 would be attached to that. 336 00:15:45,360 --> 00:15:48,090 Now it's height balanced wrong in the other direction, right? 337 00:15:48,090 --> 00:15:49,950 That's no good. 338 00:15:49,950 --> 00:15:53,490 So the way to handle this case, where 339 00:15:53,490 --> 00:15:59,100 I am badly skewed to the right but my right subtree 340 00:15:59,100 --> 00:16:03,900 is skewed to the left, I have to do a rotation here, 341 00:16:03,900 --> 00:16:06,810 right rotation, and then do a rotation. 342 00:16:06,810 --> 00:16:08,880 That's the formula. 343 00:16:08,880 --> 00:16:10,920 Here, we first do a right rotation 344 00:16:10,920 --> 00:16:21,440 at 10, which gives me something that looks like 8, 10. 345 00:16:21,440 --> 00:16:25,150 Now, obviously, this is not better than what was before, 346 00:16:25,150 --> 00:16:28,750 but it was an intermediate step so that we can fix it. 347 00:16:28,750 --> 00:16:32,050 We right-rotate here and then we left-rotate here. 348 00:16:32,050 --> 00:16:34,290 The default is that we would left-rotate here, 349 00:16:34,290 --> 00:16:38,440 but because this had the skew in the wrong direction, 350 00:16:38,440 --> 00:16:41,390 I need to right-rotate this one first and then we can do it. 
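The rebalancing rule described here (skewed right with a right child that is skewed left means rotate right at the child, then rotate left at the node) can be sketched as below. This is an assumption-laden sketch rather than the recitation code: the rotations return the new subtree root instead of relinking in place, and all names are illustrative.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right = item, left, right
        self.update()

    def update(self):
        self.height = 1 + max(height(self.left), height(self.right))
        self.size = 1 + size(self.left) + size(self.right)

def height(n):
    return n.height if n else -1

def size(n):
    return n.size if n else 0

def skew(n):
    return height(n.right) - height(n.left)

def rotate_right(x):
    y = x.left
    x.left, y.right = y.right, x
    x.update()  # x is now the lower node, so fix it first
    y.update()
    return y

def rotate_left(y):
    x = y.right
    y.right, x.left = x.left, y
    y.update()
    x.update()
    return x

def rebalance(n):
    """Fix a node whose skew is +2 or -2 with one or two rotations."""
    if skew(n) == 2:
        if skew(n.right) < 0:  # right child leans the wrong way
            n.right = rotate_right(n.right)
        return rotate_left(n)
    if skew(n) == -2:
        if skew(n.left) > 0:
            n.left = rotate_left(n.left)
        return rotate_right(n)
    return n

# The three nodes from the board: 12, with right child 10, whose left child is 8.
fixed = rebalance(Node(12, right=Node(10, left=Node(8))))
print(fixed.item, fixed.left.item, fixed.right.item)
```

In this sketch the single left rotation alone would leave 10 on top with skew -2, which is the "wrong in the other direction" outcome mentioned above.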
351 00:16:41,390 --> 00:16:43,570 So now I rotate all of these guys 352 00:16:43,570 --> 00:16:50,005 over and put 12 down here, 8 here, 10 here. 353 00:16:53,010 --> 00:16:56,700 Everyone see that that's what a rotation looks like? 354 00:16:56,700 --> 00:16:57,230 OK. 355 00:16:57,230 --> 00:17:00,210 It takes a little while to get your mind wrapped around what 356 00:17:00,210 --> 00:17:02,970 the transformation is, but hopefully, you guys all 357 00:17:02,970 --> 00:17:04,283 followed that transformation. 358 00:17:04,283 --> 00:17:06,450 There was a little magic while I was trying to draw. 359 00:17:06,450 --> 00:17:06,930 Yeah? 360 00:17:06,930 --> 00:17:08,500 AUDIENCE: I still don't feel like this tree is 361 00:17:08,500 --> 00:17:09,270 height balanced. 362 00:17:09,270 --> 00:17:10,980 JASON KU: It's not. 363 00:17:10,980 --> 00:17:12,630 Good observation. 364 00:17:12,630 --> 00:17:13,890 Why is that? 365 00:17:13,890 --> 00:17:15,750 This thing still has height 3. 366 00:17:15,750 --> 00:17:17,550 What is the height of this thing? 367 00:17:17,550 --> 00:17:19,020 1, right? 368 00:17:19,020 --> 00:17:20,700 This is height 1. 369 00:17:20,700 --> 00:17:23,400 And actually, when I was doing that rotation, 370 00:17:23,400 --> 00:17:27,318 I needed to update all of these augmentations. 371 00:17:27,318 --> 00:17:28,860 Which augmentations did I really need 372 00:17:28,860 --> 00:17:31,620 to-- which subtrees have changed during those things? 373 00:17:36,380 --> 00:17:38,810 I don't remember what the thing looked like. 374 00:17:38,810 --> 00:17:41,780 What did the thing look like? 375 00:17:41,780 --> 00:17:46,860 10 had 8 in its subtree, so its subtree definitely changed. 376 00:17:46,860 --> 00:17:48,140 8's subtree changed. 377 00:17:48,140 --> 00:17:49,220 AUDIENCE: [INAUDIBLE] 378 00:17:49,220 --> 00:17:51,000 JASON KU: 12 didn't change-- 379 00:17:51,000 --> 00:17:51,500 eventually. 
380 00:17:51,500 --> 00:17:54,498 AUDIENCE: These are 10 and 8 [INAUDIBLE].. 381 00:17:54,498 --> 00:17:55,040 JASON KU: OK. 382 00:17:55,040 --> 00:17:57,920 So there's the case analysis that's in your lecture notes 383 00:17:57,920 --> 00:18:00,050 and was done in recitation. 384 00:18:00,050 --> 00:18:03,800 It tells you that these A, B, C, D 385 00:18:03,800 --> 00:18:06,860 kind of subtrees, the ones that could 386 00:18:06,860 --> 00:18:09,650 change in these things, those subtrees don't change. 387 00:18:09,650 --> 00:18:12,290 The only subtree that changed during one 388 00:18:12,290 --> 00:18:15,620 of these fix operations, when you do one or two rotations, 389 00:18:15,620 --> 00:18:18,860 is either two nodes or three nodes 390 00:18:18,860 --> 00:18:20,240 whose subtree has changed. 391 00:18:20,240 --> 00:18:23,900 Here, it could have been the case that three subtrees have 392 00:18:23,900 --> 00:18:26,000 changed. 393 00:18:26,000 --> 00:18:29,780 But in the easy case, only two nodes-- x and y, I think, 394 00:18:29,780 --> 00:18:31,130 in the notes-- 395 00:18:31,130 --> 00:18:32,040 could have changed. 396 00:18:32,040 --> 00:18:36,710 And so when I do that, I have to recompute their augmentations 397 00:18:36,710 --> 00:18:39,200 from their augmentations of their children, 398 00:18:39,200 --> 00:18:42,110 but it's only a constant number of those, 399 00:18:42,110 --> 00:18:45,780 so I just recompute them, because the subtrees below me 400 00:18:45,780 --> 00:18:47,430 haven't changed. 401 00:18:47,430 --> 00:18:47,930 OK. 402 00:18:47,930 --> 00:18:50,280 So we have a height mismatch here. 403 00:18:50,280 --> 00:18:50,926 Yeah? 404 00:18:50,926 --> 00:18:52,165 AUDIENCE: [INAUDIBLE] 405 00:18:56,743 --> 00:18:57,410 JASON KU: Right. 406 00:18:57,410 --> 00:19:01,730 So, originally, in the picture, 12 407 00:19:01,730 --> 00:19:04,010 has a bunch of things in its subtree-- 408 00:19:04,010 --> 00:19:08,350 10 and 8, and we just deleted 7.
409 00:19:08,350 --> 00:19:09,850 So its subtree definitely changed. 410 00:19:09,850 --> 00:19:10,980 There used to be three-- 411 00:19:10,980 --> 00:19:12,040 AUDIENCE: [INAUDIBLE] 412 00:19:12,040 --> 00:19:14,130 JASON KU: Oh, no, sorry, it did. 413 00:19:14,130 --> 00:19:14,680 Yeah. 414 00:19:14,680 --> 00:19:18,040 So here, three node subtrees have changed. 415 00:19:18,040 --> 00:19:20,890 But that's actually the most. 416 00:19:20,890 --> 00:19:23,620 I'm showing you the worst case. 417 00:19:23,620 --> 00:19:25,870 Only three nodes possible in doing 418 00:19:25,870 --> 00:19:28,143 one of these double-rotation things 419 00:19:28,143 --> 00:19:29,560 could have changed their subtrees. 420 00:19:29,560 --> 00:19:32,740 And so we just have to fix the augmentation of those three 421 00:19:32,740 --> 00:19:33,610 things. 422 00:19:33,610 --> 00:19:36,880 In the easy case, it's just two things. 423 00:19:36,880 --> 00:19:38,050 All right. 424 00:19:38,050 --> 00:19:39,580 We have unbalanced. 425 00:19:39,580 --> 00:19:40,600 How can we fix this? 426 00:19:43,520 --> 00:19:45,890 I could have been mean. 427 00:19:45,890 --> 00:19:49,070 I want to be able to right-rotate here, 428 00:19:49,070 --> 00:19:54,140 to re-balance. I could have been mean and switched these two. 429 00:19:54,140 --> 00:19:56,150 If I switched those two, then I would 430 00:19:56,150 --> 00:19:58,100 have to do two rotations to fix this thing, 431 00:19:58,100 --> 00:20:01,040 because the middle one is heavier than the left one 432 00:20:01,040 --> 00:20:04,250 against what I'm doing. 433 00:20:04,250 --> 00:20:09,140 But I'm not that mean, so I'm going to right-rotate. 434 00:20:09,140 --> 00:20:10,370 How do I do that? 435 00:20:10,370 --> 00:20:13,130 Well, a right-rotate at 6 is going 436 00:20:13,130 --> 00:20:16,940 to bring all of this down below 4 437 00:20:16,940 --> 00:20:20,075 and stick this subtree as the left child of 6.
438 00:20:20,075 --> 00:20:21,830 Does that make sense? 439 00:20:21,830 --> 00:20:22,438 Yuck. 440 00:20:22,438 --> 00:20:23,730 That's going to be fun to draw. 441 00:20:28,100 --> 00:20:30,580 I'm just going to redraw it. 442 00:20:30,580 --> 00:20:32,390 That makes more sense, right? 443 00:20:32,390 --> 00:20:54,020 4, 11, 3, 2, 1, and then 6, 5, 9, 8, 12, 10. 444 00:20:54,020 --> 00:20:57,730 That's the right rotation at 6. 445 00:20:57,730 --> 00:21:00,520 Is everyone cool with this? 446 00:21:00,520 --> 00:21:05,140 The rotation-- my x is 6, my y is 4. 447 00:21:05,140 --> 00:21:09,400 I have A, B, C subtrees. 448 00:21:09,400 --> 00:21:12,610 What I'm doing is kind of switching which of x and y 449 00:21:12,610 --> 00:21:15,610 is the root here. 450 00:21:15,610 --> 00:21:23,080 So now y is the root, and B and C subtrees here now become 451 00:21:23,080 --> 00:21:25,480 the children of x underneath y. 452 00:21:25,480 --> 00:21:29,110 And notice that, hopefully, through all of that process, 453 00:21:29,110 --> 00:21:33,150 my in-order traversal has not changed. 454 00:21:33,150 --> 00:21:35,600 We had to update our augmentations along the way, 455 00:21:35,600 --> 00:21:38,630 but it's a constant every time we walk up the tree. 456 00:21:38,630 --> 00:21:41,510 And we walk up the tree only a logarithmic number of times. 457 00:21:41,510 --> 00:21:42,460 Yeah? 458 00:21:42,460 --> 00:21:42,960 Yes. 459 00:21:42,960 --> 00:21:43,880 AUDIENCE: [INAUDIBLE]. 460 00:21:43,880 --> 00:21:46,778 So every time we do a rotation, do you just 461 00:21:46,778 --> 00:21:48,570 update the augmentation via the [INAUDIBLE] 462 00:21:48,570 --> 00:21:49,850 before we do any other rotation? 463 00:21:49,850 --> 00:21:50,145 JASON KU: Exactly. 464 00:21:50,145 --> 00:21:51,375 AUDIENCE: The second part.
465 00:21:51,375 --> 00:21:53,870 Updating the augmentation just means 466 00:21:53,870 --> 00:21:57,380 updating the count, and the height, and just, 467 00:21:57,380 --> 00:21:58,510 the properties that stay-- 468 00:21:58,510 --> 00:21:59,135 JASON KU: Yeah. 469 00:21:59,135 --> 00:22:00,650 Basically what we did, we defined-- 470 00:22:00,650 --> 00:22:02,390 when we augmented-- 471 00:22:02,390 --> 00:22:04,610 Professor Demaine yesterday defined for you 472 00:22:04,610 --> 00:22:07,100 what a subtree property is. 473 00:22:07,100 --> 00:22:10,490 It meant a property that I can compute only 474 00:22:10,490 --> 00:22:12,590 by looking at my children-- 475 00:22:12,590 --> 00:22:16,250 the augmentations of my children, recursively. 476 00:22:16,250 --> 00:22:20,570 So here, instead of trying to increment or try 477 00:22:20,570 --> 00:22:24,440 to think about, locally, what this augmentation should be, 478 00:22:24,440 --> 00:22:26,990 I'm going to throw away my old augmentation 479 00:22:26,990 --> 00:22:28,670 and just recompute it from my children, 480 00:22:28,670 --> 00:22:31,905 because those, recursively, had better be correct. 481 00:22:31,905 --> 00:22:32,780 Does that make sense? 482 00:22:32,780 --> 00:22:33,463 Yeah? 483 00:22:33,463 --> 00:22:35,630 AUDIENCE: So just looking at how the rotation works. 484 00:22:35,630 --> 00:22:38,420 I'm having trouble wrapping my head around. 485 00:22:38,420 --> 00:22:41,870 So basically, you're swapping 4 and 6, and that way, 486 00:22:41,870 --> 00:22:47,390 4 becomes the parent node and 6 becomes the right node. 487 00:22:47,390 --> 00:22:50,480 JASON KU: I'm going to draw this picture. 488 00:22:50,480 --> 00:22:52,280 It's just something you've got to memorize. 489 00:22:55,420 --> 00:23:05,640 This is x, B, C, and A. Can you see that picture? 490 00:23:05,640 --> 00:23:07,495 AUDIENCE: [INAUDIBLE] 491 00:23:07,495 --> 00:23:08,120 JASON KU: What? 
492 00:23:08,120 --> 00:23:09,355 AUDIENCE: [INAUDIBLE] 493 00:23:09,355 --> 00:23:09,980 JASON KU: Yeah. 494 00:23:09,980 --> 00:23:11,090 It's in your notes. 495 00:23:11,090 --> 00:23:12,780 It's not a big deal. 496 00:23:12,780 --> 00:23:15,140 But if you've got this structure, 497 00:23:15,140 --> 00:23:19,010 where x has a left child-- 498 00:23:19,010 --> 00:23:21,150 and these subtrees may be empty or not. 499 00:23:21,150 --> 00:23:22,100 Doesn't really matter. 500 00:23:22,100 --> 00:23:25,490 What I can do is, I can move from here to there-- 501 00:23:25,490 --> 00:23:30,780 it has the same in-order traversal order, 502 00:23:30,780 --> 00:23:32,140 but it's got a different shape. 503 00:23:32,140 --> 00:23:34,050 And in particular, subtree heights 504 00:23:34,050 --> 00:23:37,740 have changed, which means it can help us re-balance the tree. 505 00:23:37,740 --> 00:23:39,960 And that's the whole point of AVL. 506 00:23:39,960 --> 00:23:41,350 Does that make sense? 507 00:23:41,350 --> 00:23:43,320 AUDIENCE: That's the right rotation? 508 00:23:43,320 --> 00:23:45,210 JASON KU: This one is-- 509 00:23:45,210 --> 00:23:46,680 this is a right rotation. 510 00:23:46,680 --> 00:23:48,960 This is a left rotation. 511 00:23:48,960 --> 00:23:50,860 Any other questions? 512 00:23:50,860 --> 00:23:51,360 Yeah? 513 00:23:51,360 --> 00:23:54,776 AUDIENCE: [INAUDIBLE] 514 00:23:58,530 --> 00:24:00,510 JASON KU: As I'm walking up the tree, 515 00:24:00,510 --> 00:24:03,720 every node I might have to fix with a re-balance, 516 00:24:03,720 --> 00:24:07,290 but that re-balance does at most two rotations, 517 00:24:07,290 --> 00:24:11,070 and there are at most log n ancestors that I have, 518 00:24:11,070 --> 00:24:16,350 because my tree was height balanced at 2 519 00:24:16,350 --> 00:24:19,110 log n or something like that.
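The right rotation in that picture can be sketched in Python roughly as follows. This is a minimal sketch, not the course's code: the `Node` class, its `update` method, and `rotate_right` are hypothetical names, and the function hands back the new subtree root instead of relinking a parent pointer. It asserts that x has a left child, and it throws away the old augmentations and recomputes them from the children, lower node first.

```python
class Node:
    """Binary tree node augmented with subtree height and size (hypothetical names)."""

    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right
        self.update()

    def update(self):
        # Recompute the augmentations only from the children's augmentations.
        left_height = self.left.height if self.left else -1
        right_height = self.right.height if self.right else -1
        self.height = 1 + max(left_height, right_height)
        left_size = self.left.size if self.left else 0
        right_size = self.right.size if self.right else 0
        self.size = 1 + left_size + right_size


def rotate_right(x):
    """Rotate right at x and return y, the new root of this subtree."""
    assert x.left is not None, "a right rotation needs a left child"
    y = x.left
    x.left = y.right  # subtree B moves from under y to under x
    y.right = x       # x becomes the right child of y
    x.update()        # x is now the lower node, so fix it first
    y.update()        # then fix y, which depends on x
    return y


def inorder(node):
    """Return the items of the subtree in traversal order."""
    if node is None:
        return []
    return inorder(node.left) + [node.item] + inorder(node.right)
```

Only x and y have their subtrees changed by a single rotation, so two `update` calls are enough here; the caller would still recompute the ancestors while walking up the tree.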
520 00:24:19,110 --> 00:24:21,810 Which means that, at max, I might 521 00:24:21,810 --> 00:24:26,310 have to do four log n rotations, because each one could 522 00:24:26,310 --> 00:24:27,310 do two rotations. 523 00:24:27,310 --> 00:24:28,230 Does that make sense? 524 00:24:28,230 --> 00:24:31,680 Now, in actuality, you can prove that, in a delete operation, 525 00:24:31,680 --> 00:24:34,050 it's possible that you have to do a logarithmic number 526 00:24:34,050 --> 00:24:35,430 of these rotations up the tree. 527 00:24:35,430 --> 00:24:38,190 This was that bad case. 528 00:24:38,190 --> 00:24:40,980 The original tree I gave you is called a Fibonacci tree. 529 00:24:40,980 --> 00:24:45,090 It's the few-- it's the highest height-balanced tree 530 00:24:45,090 --> 00:24:47,385 you can have on a given number of nodes. 531 00:24:52,620 --> 00:24:55,440 Yeah, the fewest nodes for a certain height. 532 00:24:55,440 --> 00:24:56,820 You can think of it either way. 533 00:24:56,820 --> 00:24:59,530 And if you generalize that to a large enough thing, 534 00:24:59,530 --> 00:25:01,470 then that thing will take a logarithmic number 535 00:25:01,470 --> 00:25:02,760 of rotations going up. 536 00:25:02,760 --> 00:25:05,070 Now, actually, with an insertion, 537 00:25:05,070 --> 00:25:09,270 you can actually prove-- you can go through the case analysis. 538 00:25:09,270 --> 00:25:12,090 An insertion operation will always re-balance the tree 539 00:25:12,090 --> 00:25:15,668 after one re-balance operation, which 540 00:25:15,668 --> 00:25:16,835 could include two rotations. 541 00:25:16,835 --> 00:25:19,350 Does that make sense? 542 00:25:19,350 --> 00:25:20,053 Yeah? 543 00:25:20,053 --> 00:25:21,930 AUDIENCE: [INAUDIBLE] 544 00:25:27,280 --> 00:25:27,920 JASON KU: Yeah. 545 00:25:27,920 --> 00:25:30,415 Right rotation, this guy becomes a right child. 546 00:25:30,415 --> 00:25:31,040 AUDIENCE: Yeah.
547 00:25:31,040 --> 00:25:33,550 So can you not-- are there certain rotations that you 548 00:25:33,550 --> 00:25:36,730 can't perform, depending on whether you had a child-- 549 00:25:36,730 --> 00:25:37,390 JASON KU: Yeah. 550 00:25:37,390 --> 00:25:43,070 If I didn't have a left subtree, I can't perform a right rotation 551 00:25:43,070 --> 00:25:44,680 there. 552 00:25:44,680 --> 00:25:47,340 A right rotation necessitates that I have a left child. 553 00:25:47,340 --> 00:25:48,520 So if you're doing it-- 554 00:25:48,520 --> 00:25:50,440 and you'll see, our code actually 555 00:25:50,440 --> 00:25:52,420 checks to make sure you have a left child. 556 00:25:57,500 --> 00:25:59,870 That's an assertion that you might 557 00:25:59,870 --> 00:26:02,960 want to fire before you ever do one of these rotations. 558 00:26:02,960 --> 00:26:03,700 Anything else? 559 00:26:03,700 --> 00:26:04,560 Yeah? 560 00:26:04,560 --> 00:26:06,393 AUDIENCE: Just to reiterate, so an insertion 561 00:26:06,393 --> 00:26:09,097 may take two rotations at most [INAUDIBLE]? 562 00:26:09,097 --> 00:26:09,680 JASON KU: Mhm. 563 00:26:09,680 --> 00:26:11,680 A constant number of rotations, and the deletion 564 00:26:11,680 --> 00:26:14,540 could take a logarithmic number of rotations. 565 00:26:14,540 --> 00:26:16,490 Now, that's not something you need to know. 566 00:26:16,490 --> 00:26:19,460 It's not something I'm proving here to you. 567 00:26:19,460 --> 00:26:21,650 Just something that's interesting. 568 00:26:21,650 --> 00:26:26,390 There are rebalancing schemes, like in CLRS. 569 00:26:26,390 --> 00:26:31,310 They introduce a red-black tree to introduce balance. 570 00:26:31,310 --> 00:26:34,670 And those trees actually have a weaker bound on-- 571 00:26:34,670 --> 00:26:39,680 it's not as tightly balanced as an AVL tree is. 572 00:26:39,680 --> 00:26:44,030 It allows higher than skew 2.
573 00:26:44,030 --> 00:26:46,400 And because it's kind of a weaker restriction, 574 00:26:46,400 --> 00:26:49,470 they get away with only doing a constant number of rotations-- 575 00:26:49,470 --> 00:26:54,560 that they can afford that before they fix the tree. 576 00:26:54,560 --> 00:26:56,150 But a little more complicated. 577 00:26:56,150 --> 00:26:57,680 AUDIENCE: [INAUDIBLE] 578 00:27:00,457 --> 00:27:01,290 JASON KU: Very nice. 579 00:27:01,290 --> 00:27:01,790 OK. 580 00:27:01,790 --> 00:27:02,780 Any questions on this? 581 00:27:06,090 --> 00:27:10,550 OK, so now, this is more of a mechanical question you'll 582 00:27:10,550 --> 00:27:11,750 get on your problem sets. 583 00:27:11,750 --> 00:27:15,950 And now we get more on to the theory-type questions. 584 00:27:15,950 --> 00:27:19,230 These are going to be reduction-type questions. 585 00:27:19,230 --> 00:27:19,730 OK. 586 00:27:19,730 --> 00:27:25,250 This first problem, Fick Nury. 587 00:27:25,250 --> 00:27:26,840 This is-- anyone? 588 00:27:26,840 --> 00:27:27,650 Nick Fury, right? 589 00:27:27,650 --> 00:27:31,280 So it's an Avengers reference. 590 00:27:31,280 --> 00:27:33,800 So basically, what happens in this thing, 591 00:27:33,800 --> 00:27:38,210 he's got a list of superheroes that each 592 00:27:38,210 --> 00:27:41,150 have an opinion on whether they should go fight Sanos. 593 00:27:43,820 --> 00:27:49,220 And their opinion can be strongly positive 594 00:27:49,220 --> 00:27:50,730 or strongly negative. 595 00:27:50,730 --> 00:27:54,320 And so what Fick is trying to do is 596 00:27:54,320 --> 00:27:59,960 find, from among his Revengers, what 597 00:27:59,960 --> 00:28:04,820 the log n most extreme-opinion Revengers are so 598 00:28:04,820 --> 00:28:05,888 that he can talk to them. 599 00:28:05,888 --> 00:28:07,430 He doesn't want to talk to everybody. 600 00:28:07,430 --> 00:28:10,040 He wants to talk to a logarithmic number of them. 601 00:28:10,040 --> 00:28:11,030 OK.
602 00:28:11,030 --> 00:28:14,720 It's kind of-- whatever. 603 00:28:14,720 --> 00:28:18,800 Basically, we have a classified situation 604 00:28:18,800 --> 00:28:22,760 where you're given, as a read-only input, a data store 605 00:28:22,760 --> 00:28:23,980 of these things in an array. 606 00:28:26,900 --> 00:28:31,975 And I want to find the log n ones with the strongest 607 00:28:31,975 --> 00:28:32,475 opinions. 608 00:28:32,475 --> 00:28:34,138 Does that make sense? 609 00:28:34,138 --> 00:28:35,930 And I want to do it-- and the first problem 610 00:28:35,930 --> 00:28:39,440 is in linear time. 611 00:28:39,440 --> 00:28:42,770 You actually don't know how to do this yet. 612 00:28:42,770 --> 00:28:47,810 You'll know how to do it with material that you cover in-- 613 00:28:47,810 --> 00:28:50,420 well, they teach you one way to do it in 046, 614 00:28:50,420 --> 00:28:53,690 but we're not going to get you there right now. 615 00:28:53,690 --> 00:28:59,400 We'll teach you another way to do it on Tuesday, 616 00:28:59,400 --> 00:29:01,370 which is via binary heaps. 617 00:29:01,370 --> 00:29:03,980 Binary heaps are an interesting thing. 618 00:29:03,980 --> 00:29:14,390 It implements a subset of the set interface. 619 00:29:14,390 --> 00:29:23,150 Really, it just-- you can build on some iterable x. 620 00:29:23,150 --> 00:29:25,270 I collect a bunch of things. 621 00:29:25,270 --> 00:29:26,780 These items have keys. 622 00:29:26,780 --> 00:29:29,090 It's a keyed data structure in the same way. 623 00:29:29,090 --> 00:29:32,100 It's implementing what we call a priority queue interface. 624 00:29:32,100 --> 00:29:33,860 I can build these things. 625 00:29:33,860 --> 00:29:39,380 I can insert things, but I'm not going to do that here. 626 00:29:39,380 --> 00:29:43,490 All I really need here, for this situation, 627 00:29:43,490 --> 00:29:47,150 is a delete_superlative kind of operation-- 628 00:29:47,150 --> 00:29:50,870 in this case, probably max.
629 00:29:50,870 --> 00:29:52,115 Delete_max. 630 00:29:57,560 --> 00:29:58,400 So this is like-- 631 00:29:58,400 --> 00:30:01,713 I've got a data structure, I'm calling these things. 632 00:30:01,713 --> 00:30:02,630 Does that make sense? 633 00:30:02,630 --> 00:30:02,890 Yeah? 634 00:30:02,890 --> 00:30:04,190 AUDIENCE: What's a priority queue? 635 00:30:04,190 --> 00:30:04,520 JASON KU: Yes. 636 00:30:04,520 --> 00:30:06,560 A priority queue is essentially something that 637 00:30:06,560 --> 00:30:08,450 implements these two things. 638 00:30:08,450 --> 00:30:11,570 Actually, there's a third one where I can insert a new thing, 639 00:30:11,570 --> 00:30:13,280 but I'm not going to need that right now. 640 00:30:13,280 --> 00:30:14,738 So that's what a priority queue is. 641 00:30:14,738 --> 00:30:15,890 And actually, a set-- 642 00:30:15,890 --> 00:30:19,180 this is a subset of the set interface. 643 00:30:19,180 --> 00:30:20,300 Right? 644 00:30:20,300 --> 00:30:22,250 The nice thing about a heap-- 645 00:30:22,250 --> 00:30:25,160 which, I won't show you how it's done, 646 00:30:25,160 --> 00:30:27,380 but what a heap can do-- 647 00:30:27,380 --> 00:30:29,870 if I had both of these operations implemented 648 00:30:29,870 --> 00:30:34,520 using a set AVL tree, how long would these things take me? 649 00:30:41,430 --> 00:30:43,566 How long does it take to build a set AVL tree? 650 00:30:46,970 --> 00:30:48,620 n log n, right? 651 00:30:48,620 --> 00:30:50,960 Because essentially, I'm getting a sorted order 652 00:30:50,960 --> 00:30:53,300 out of this thing if I'm inserting these things 653 00:30:53,300 --> 00:30:55,220 one at a time-- 654 00:30:55,220 --> 00:30:57,320 or I could sort them, and then put them 655 00:30:57,320 --> 00:31:00,500 in a tree in linear time, like you saw a couple of days 656 00:31:00,500 --> 00:31:03,780 ago in recitation. 657 00:31:03,780 --> 00:31:05,840 But I have to sort them at some point, right?
658 00:31:05,840 --> 00:31:07,340 I'm kind of-- 659 00:31:07,340 --> 00:31:10,170 I need to take at least n log n time, 660 00:31:10,170 --> 00:31:14,120 because if I'm going to be able to return their traversal order 661 00:31:14,120 --> 00:31:18,110 in linear time, and I have this lower bound of n log 662 00:31:18,110 --> 00:31:21,800 n on sorting, I kind of needed to spend n log n time here, 663 00:31:21,800 --> 00:31:23,030 right? 664 00:31:23,030 --> 00:31:26,524 And how long would this delete_max take? 665 00:31:26,524 --> 00:31:28,632 AUDIENCE: It's sorted, so log n. 666 00:31:28,632 --> 00:31:29,590 JASON KU: Log n, right? 667 00:31:29,590 --> 00:31:31,020 So it's a set AVL tree. 668 00:31:31,020 --> 00:31:31,950 Where is my max? 669 00:31:31,950 --> 00:31:33,180 It's the right-most thing. 670 00:31:33,180 --> 00:31:35,520 I can just walk down the thing, take it off. 671 00:31:35,520 --> 00:31:37,050 Maybe I have to re-balance. 672 00:31:37,050 --> 00:31:39,420 But that's a log n operation. 673 00:31:39,420 --> 00:31:44,400 It's the same as insert-last in my subtree. 674 00:31:44,400 --> 00:31:46,860 For a set AVL tree, this is n log n. 675 00:31:46,860 --> 00:31:47,670 This is log n. 676 00:31:50,610 --> 00:31:53,970 Now, there's another data structure that does better 677 00:31:53,970 --> 00:31:55,560 for one of these operations. 678 00:31:55,560 --> 00:32:01,020 And the same for the other one that we've learned earlier. 679 00:32:01,020 --> 00:32:03,600 Anyone remember? 680 00:32:03,600 --> 00:32:05,430 Set AVL tree didn't actually give us 681 00:32:05,430 --> 00:32:11,910 anything over a sorted array in a dynamic array. 682 00:32:11,910 --> 00:32:14,805 What that did was, we have a-- 683 00:32:14,805 --> 00:32:17,730 we could sort it in n log n time using merge sort or something 684 00:32:17,730 --> 00:32:18,790 like that. 685 00:32:18,790 --> 00:32:22,500 And then we could just pop off the last one n times. 
686 00:32:22,500 --> 00:32:23,790 That would be an amortized-- 687 00:32:23,790 --> 00:32:26,838 I mean, if I didn't care about taking up 688 00:32:26,838 --> 00:32:29,130 that size, I could do it, in worst case, constant time. 689 00:32:29,130 --> 00:32:31,380 I just read off the first-- the last one. 690 00:32:31,380 --> 00:32:33,750 I don't need to resize the array, ever. 691 00:32:33,750 --> 00:32:35,940 I can just ignore that. 692 00:32:35,940 --> 00:32:37,150 Does that make sense? 693 00:32:37,150 --> 00:32:37,650 OK. 694 00:32:37,650 --> 00:32:39,130 But that's-- OK. 695 00:32:39,130 --> 00:32:43,230 If I had a data structure that implemented these two 696 00:32:43,230 --> 00:32:49,770 operations, can someone tell me an algorithm 697 00:32:49,770 --> 00:32:52,025 to generate Fick's list-- 698 00:32:52,025 --> 00:32:54,150 don't worry about running time right now-- but that 699 00:32:54,150 --> 00:32:55,800 just uses these two operations? 700 00:32:59,270 --> 00:33:00,350 Yeah? 701 00:33:00,350 --> 00:33:03,530 AUDIENCE: So we build this data structure. 702 00:33:03,530 --> 00:33:05,135 It's ordered from least to greatest 703 00:33:05,135 --> 00:33:07,830 toward absolute value of x. 704 00:33:07,830 --> 00:33:11,880 JASON KU: Don't worry about where things are 705 00:33:11,880 --> 00:33:13,130 ordered or anything like that. 706 00:33:13,130 --> 00:33:16,310 I don't tell you how these things are implemented, right? 707 00:33:16,310 --> 00:33:20,330 All I'm saying is, I can accept a bunch of these things, 708 00:33:20,330 --> 00:33:24,500 and I can remove the maximum and return it. 709 00:33:24,500 --> 00:33:25,330 OK? 710 00:33:25,330 --> 00:33:26,830 AUDIENCE: I think we just build it-- 711 00:33:26,830 --> 00:33:29,712 make sure that you build it such that the opinion levels are 712 00:33:29,712 --> 00:33:31,670 the absolute value of the opinion levels, not-- 713 00:33:31,670 --> 00:33:32,060 JASON KU: Sure. 714 00:33:32,060 --> 00:33:32,560 OK.
715 00:33:32,560 --> 00:33:33,720 So that's a nice thing. 716 00:33:33,720 --> 00:33:36,410 What I'm going to do, as your colleague is saying, 717 00:33:36,410 --> 00:33:39,830 is, I'm going to look through all of the things in my input. 718 00:33:39,830 --> 00:33:44,510 I'm going to copy it over to some writable memory store. 719 00:33:44,510 --> 00:33:47,150 That read-only thing is not relevant to this part 720 00:33:47,150 --> 00:33:48,808 of the problem. 721 00:33:48,808 --> 00:33:49,850 What I'm going to do is-- 722 00:33:56,950 --> 00:33:57,550 right. 723 00:33:57,550 --> 00:33:58,050 Sorry. 724 00:33:58,050 --> 00:34:00,790 I'm thinking about your problem set that we're writing. 725 00:34:00,790 --> 00:34:01,750 I'm mixing it up. 726 00:34:01,750 --> 00:34:09,790 OK, so we copy it over to our new linear-sized array. 727 00:34:09,790 --> 00:34:11,679 But instead of putting their values there, 728 00:34:11,679 --> 00:34:13,424 I'm going to put the absolute values of their values. 729 00:34:13,424 --> 00:34:14,170 Does that make sense? 730 00:34:14,170 --> 00:34:15,420 I just check if it's negative. 731 00:34:15,420 --> 00:34:17,949 If it is, I put the positive thing there. 732 00:34:17,949 --> 00:34:19,300 OK? 733 00:34:19,300 --> 00:34:23,440 And then I stick that array in this build. 734 00:34:23,440 --> 00:34:26,620 I put that there. 735 00:34:26,620 --> 00:34:29,199 That will take some-- whatever this build time is. 736 00:34:29,199 --> 00:34:33,489 And then I can delete max k times. 737 00:34:33,489 --> 00:34:38,170 Or I can delete max some number of times, however many things 738 00:34:38,170 --> 00:34:38,739 that I need. 739 00:34:38,739 --> 00:34:39,760 Right? 740 00:34:39,760 --> 00:34:44,050 If I want log n highest things, I can just do that log n times, 741 00:34:44,050 --> 00:34:44,860 right? 
742 00:34:44,860 --> 00:34:48,219 So for this-- if I had such a data structure, 743 00:34:48,219 --> 00:34:51,580 I could do this in one run of this operation 744 00:34:51,580 --> 00:34:55,175 and log n runs of this operation. 745 00:34:55,175 --> 00:34:56,050 Does that make sense? 746 00:34:56,050 --> 00:34:57,610 I could solve this problem reducing 747 00:34:57,610 --> 00:34:59,290 to this data structure. 748 00:34:59,290 --> 00:35:05,980 Now, for a sorted array or a set AVL tree, 749 00:35:05,980 --> 00:35:08,155 this operation kind of kills me already. 750 00:35:08,155 --> 00:35:10,420 It takes n log n time. 751 00:35:10,420 --> 00:35:13,360 The nice thing about a binary heap 752 00:35:13,360 --> 00:35:17,100 is, it does this operation in linear time. 753 00:35:20,060 --> 00:35:21,850 You will see that on Tuesday. 754 00:35:21,850 --> 00:35:26,290 And it does this operation in log n time. 755 00:35:30,650 --> 00:35:34,220 What's the running time if I use a binary heap 756 00:35:34,220 --> 00:35:37,520 to implement this data structure? 757 00:35:37,520 --> 00:35:42,020 Order of n plus order log n times log n. 758 00:35:42,020 --> 00:35:44,990 How big is log n squared-- 759 00:35:44,990 --> 00:35:48,551 log squared n-- compared to n? 760 00:35:48,551 --> 00:35:49,835 It's smaller, right? 761 00:35:49,835 --> 00:35:51,710 So if I add those two running times together, 762 00:35:51,710 --> 00:35:54,260 it's still linear. 763 00:35:54,260 --> 00:35:57,020 That's how you solve the first problem. 764 00:35:57,020 --> 00:35:59,750 I didn't have to tell you what a binary heap was 765 00:35:59,750 --> 00:36:00,842 or how it did what it did. 766 00:36:00,842 --> 00:36:02,300 All I needed to tell you is that it 767 00:36:02,300 --> 00:36:04,220 did this operation in linear time 768 00:36:04,220 --> 00:36:05,990 and it did this operation in log n time. 769 00:36:10,550 --> 00:36:11,720 All right. 770 00:36:11,720 --> 00:36:14,780 The magic will be shown to you on Tuesday.
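The reduction for the first part can be sketched with Python's `heapq` module, which provides a binary min-heap with a linear-time `heapify`. This is a sketch under stated assumptions rather than the course's solution: `most_extreme` is a hypothetical name, the default of roughly log base 2 of n results is a guess at what "log n" means here, keys are negated because `heapq` only deletes the minimum, and each entry carries an index so the original signed opinion can be reported (the session itself only stores absolute values).

```python
import heapq
import math


def most_extreme(opinions, k=None):
    """Return the k opinions of largest absolute value, strongest first."""
    n = len(opinions)
    if k is None:
        # Assumption: "log n" is taken as floor(log2(n)), at least 1.
        k = max(1, int(math.log2(n))) if n > 0 else 0
    # Writable linear-size copy, keyed on negated absolute value so that
    # the min-heap's delete-min behaves like delete_max.
    heap = [(-abs(value), index) for index, value in enumerate(opinions)]
    heapq.heapify(heap)  # the build step: O(n)
    result = []
    for _ in range(min(k, n)):  # delete_max, k times: O(k log n)
        _, index = heapq.heappop(heap)
        result.append(opinions[index])
    return result
```

With k about log n, the total is O(n) for the build plus O(log n) deletions at O(log n) each, and log squared n is smaller than n, so the whole thing stays linear.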
771 00:36:14,780 --> 00:36:19,730 Part B says, now, suppose Fick's computer is only 772 00:36:19,730 --> 00:36:22,550 allowed to write to, at most, log n space. 773 00:36:22,550 --> 00:36:23,210 Well, OK. 774 00:36:23,210 --> 00:36:24,260 That's a problem here. 775 00:36:24,260 --> 00:36:29,570 Because before, we copied over the entire array, 776 00:36:29,570 --> 00:36:34,137 filtered it out, and then did some operations. 777 00:36:34,137 --> 00:36:35,720 But we couldn't even afford this if we 778 00:36:35,720 --> 00:36:40,560 couldn't store the whole thing externally in writable memory. 779 00:36:40,560 --> 00:36:42,090 So we can't do that. 780 00:36:42,090 --> 00:36:46,820 So in some sense, this is a more restrictive environment. 781 00:36:46,820 --> 00:36:49,590 I can do less things. 782 00:36:49,590 --> 00:36:53,540 It's less powerful than my previous situation, where 783 00:36:53,540 --> 00:36:57,500 I had as much space as I wanted to use. 784 00:36:57,500 --> 00:37:02,000 So it kind of makes sense that I, maybe, 785 00:37:02,000 --> 00:37:07,490 couldn't get the running time bound that we had before. 786 00:37:07,490 --> 00:37:10,040 Maybe I have to sacrifice something 787 00:37:10,040 --> 00:37:13,060 because I'm in a more restricted computational setting. 788 00:37:17,120 --> 00:37:19,563 Now, this is something you could solve with binary heaps, 789 00:37:19,563 --> 00:37:20,480 but you don't have to. 790 00:37:20,480 --> 00:37:27,860 You can solve it with set AVL trees. 791 00:37:27,860 --> 00:37:29,960 Does anyone have an idea of how you could solve 792 00:37:29,960 --> 00:37:32,750 this using a set AVL tree? 793 00:37:32,750 --> 00:37:37,430 I'm limited by my number of-- my space is, at most, log n. 794 00:37:40,710 --> 00:37:43,473 AUDIENCE: So how much space does a set AVL tree take? 795 00:37:43,473 --> 00:37:44,140 JASON KU: Right. 
796 00:37:44,140 --> 00:37:47,080 Space-- there's constant number of pointers 797 00:37:47,080 --> 00:37:49,030 for each one of these nodes. 798 00:37:49,030 --> 00:37:52,510 And I'm storing n nodes, so n space. 799 00:37:52,510 --> 00:37:57,520 Basically every data structure we've shown you takes space-- 800 00:37:57,520 --> 00:38:00,130 the order of the things that we're storing. 801 00:38:00,130 --> 00:38:02,270 It's not using additional space. 802 00:38:02,270 --> 00:38:05,500 It might take more time to do certain things, 803 00:38:05,500 --> 00:38:09,457 but the space takes the number of items 804 00:38:09,457 --> 00:38:11,540 that we're storing plus, maybe, a constant factor. 805 00:38:14,260 --> 00:38:18,430 So I'm going to draw my input here, which I can only read-- 806 00:38:18,430 --> 00:38:20,073 I can't write. 807 00:38:20,073 --> 00:38:20,740 Do I give it a-- 808 00:38:20,740 --> 00:38:26,170 I'm just going to call it A. So this is my list 809 00:38:26,170 --> 00:38:29,020 of all the Revenger opinions. 810 00:38:29,020 --> 00:38:31,660 I can only read it. 811 00:38:31,660 --> 00:38:36,140 But my computer can only write to this logarithmic amount 812 00:38:36,140 --> 00:38:36,640 of space. 813 00:38:42,660 --> 00:38:46,280 What can I put in that space? 814 00:38:46,280 --> 00:38:49,500 AUDIENCE: The log [INAUDIBLE]? 815 00:38:49,500 --> 00:38:53,250 JASON KU: Well, I can certainly put log n things in there. 816 00:38:53,250 --> 00:38:55,090 So if I'm given that restriction, 817 00:38:55,090 --> 00:38:58,985 I probably want to build a data structure of that size, 818 00:38:58,985 --> 00:39:00,360 containing that number of things. 819 00:39:00,360 --> 00:39:02,797 Does that make sense? 820 00:39:02,797 --> 00:39:04,380 Because what else are you going to do? 821 00:39:08,000 --> 00:39:10,380 So I gave you an idea. 822 00:39:10,380 --> 00:39:12,800 Maybe we could use a set AVL here. 823 00:39:12,800 --> 00:39:16,460 I see a logarithm in my answer.
824 00:39:16,460 --> 00:39:21,170 It's very possible that we might have sorted arrays 825 00:39:21,170 --> 00:39:23,300 or set AVL things. 826 00:39:23,300 --> 00:39:26,360 Those things give me a log somewhere in my running times, 827 00:39:26,360 --> 00:39:27,440 right? 828 00:39:27,440 --> 00:39:30,930 So kind of makes sense that I might have, 829 00:39:30,930 --> 00:39:32,600 maybe, a set AVL tree here. 830 00:39:32,600 --> 00:39:37,810 Why would a set AVL tree be helpful for me? 831 00:39:37,810 --> 00:39:38,310 Yeah? 832 00:39:38,310 --> 00:39:39,750 AUDIENCE: Because it's sorted and you 833 00:39:39,750 --> 00:39:41,285 don't have the traversal order, you 834 00:39:41,285 --> 00:39:45,305 can calculate the traversal order and insert [INAUDIBLE]?? 835 00:39:45,305 --> 00:39:45,930 JASON KU: Sure. 836 00:39:45,930 --> 00:39:47,410 I can do all of those things. 837 00:39:47,410 --> 00:39:49,320 But in particular, it's going to help 838 00:39:49,320 --> 00:39:54,860 me be able to find a large one quickly, right? 839 00:39:54,860 --> 00:40:00,340 If I have a set of things, it's going to be-- 840 00:40:00,340 --> 00:40:02,150 and I'm maintaining this data structure 841 00:40:02,150 --> 00:40:05,960 by adding things incrementally to it, 842 00:40:05,960 --> 00:40:08,780 I can find out what the biggest one is-- 843 00:40:08,780 --> 00:40:12,710 or the smallest one-- pretty fast in log n time. 844 00:40:12,710 --> 00:40:17,780 So if I have log n things in a tree here, 845 00:40:17,780 --> 00:40:19,242 what's the height of that thing? 846 00:40:19,242 --> 00:40:20,186 AUDIENCE: [INAUDIBLE] 847 00:40:20,186 --> 00:40:21,470 JASON KU: Log log n. 848 00:40:21,470 --> 00:40:23,325 That looks familiar. 849 00:40:23,325 --> 00:40:24,200 So what can I afford? 850 00:40:24,200 --> 00:40:31,760 I can afford a linear number of opt set AVL tree operations 851 00:40:31,760 --> 00:40:34,231 on this data structure. 852 00:40:34,231 --> 00:40:35,455 OK, you had a question? 
853 00:40:35,455 --> 00:40:36,330 AUDIENCE: [INAUDIBLE] 854 00:40:36,330 --> 00:40:37,330 JASON KU: OK, I'm sorry. 855 00:40:37,330 --> 00:40:38,008 Yeah? 856 00:40:38,008 --> 00:40:39,550 AUDIENCE: For that to be an AVL tree, 857 00:40:39,550 --> 00:40:42,680 does it have to be a BTS tree? 858 00:40:42,680 --> 00:40:43,960 JASON KU: Uh, BTS-- 859 00:40:43,960 --> 00:40:45,810 BSTs. 860 00:40:45,810 --> 00:40:48,110 So when I talk about-- 861 00:40:48,110 --> 00:40:49,760 someone likes Korean K-Pop. 862 00:40:49,760 --> 00:40:50,610 OK. 863 00:40:50,610 --> 00:40:55,820 So BST-- but in natural-- 864 00:40:55,820 --> 00:40:58,610 kind of in the lingo that you're probably 865 00:40:58,610 --> 00:41:00,860 used to hearing in other contexts, 866 00:41:00,860 --> 00:41:04,310 what we mean, in this class, is a set AVL tree. 867 00:41:04,310 --> 00:41:08,240 Now, sometimes, what people refer to as a 868 00:41:08,240 --> 00:41:12,500 binary search tree doesn't have balance semantics-- 869 00:41:12,500 --> 00:41:16,190 so what we might call, in this class, a set binary tree. 870 00:41:16,190 --> 00:41:20,150 But really, they're useful because they're balanced. 871 00:41:20,150 --> 00:41:22,310 So we're going to usually just assume 872 00:41:22,310 --> 00:41:24,450 that we're talking about balanced things here. 873 00:41:24,450 --> 00:41:28,730 Now, a set AVL tree has these binary search tree semantics 874 00:41:28,730 --> 00:41:30,790 where the keys are ordered. 875 00:41:30,790 --> 00:41:32,540 These items have keys and they're ordered. 876 00:41:32,540 --> 00:41:34,370 It's a set interface. 877 00:41:34,370 --> 00:41:36,980 Whereas we also presented to you a sequence 878 00:41:36,980 --> 00:41:41,900 interface, for which these things don't even have keys. 879 00:41:41,900 --> 00:41:45,440 How could I store set semantics there? 
880 00:41:45,440 --> 00:41:47,060 So that's the distinction that we 881 00:41:47,060 --> 00:41:53,360 mean when we say binary search tree versus, really, a set AVL 882 00:41:53,360 --> 00:41:54,660 tree versus [INAUDIBLE]. 883 00:41:54,660 --> 00:41:55,160 Yeah? 884 00:41:55,160 --> 00:41:57,743 AUDIENCE: So if we look at it to make an AVL tree out of this, 885 00:41:57,743 --> 00:42:00,650 would that mean that, when we make a node, we tell it, 886 00:42:00,650 --> 00:42:03,830 we are keying on the absolute value of [INAUDIBLE]?? 887 00:42:03,830 --> 00:42:09,350 JASON KU: OK, when you're making a set AVL tree, 888 00:42:09,350 --> 00:42:11,943 you've got to tell us what-- if you're storing objects, 889 00:42:11,943 --> 00:42:13,610 you've got to tell me what their key is. 890 00:42:13,610 --> 00:42:15,170 You're just storing some numbers, 891 00:42:15,170 --> 00:42:18,110 like what I'm doing here. 892 00:42:18,110 --> 00:42:19,970 Now, this isn't a set AVL tree. 893 00:42:19,970 --> 00:42:21,560 But if I'm just storing numbers, I 894 00:42:21,560 --> 00:42:25,400 have to tell you the items that I'm storing are the keys. 895 00:42:25,400 --> 00:42:26,685 And then everything follows. 896 00:42:26,685 --> 00:42:28,310 But if you've got an object that you're 897 00:42:28,310 --> 00:42:30,750 trying to sort, like the students in this room, 898 00:42:30,750 --> 00:42:33,130 you've got a lot of properties. 899 00:42:33,130 --> 00:42:38,600 I want all of the people with phone number-- 900 00:42:38,600 --> 00:42:41,300 maybe I want to key you on phone number for some reason. 901 00:42:41,300 --> 00:42:44,150 That's going to help me find out where you live? 902 00:42:44,150 --> 00:42:45,680 I don't-- this is getting a little-- 903 00:42:45,680 --> 00:42:47,270 I don't want to go there. 904 00:42:47,270 --> 00:42:50,283 But if I give you a set AVL tree, 905 00:42:50,283 --> 00:42:51,950 I've got to tell you what it's keyed on. 
906 00:42:51,950 --> 00:42:55,790 If I give you a sequence AVL tree, 907 00:42:55,790 --> 00:42:57,920 it's obvious what my traversal order 908 00:42:57,920 --> 00:43:00,320 is going to be because I'm giving you a sequence. 909 00:43:00,320 --> 00:43:03,740 That's what the input was. 910 00:43:03,740 --> 00:43:05,150 Does that make sense? 911 00:43:05,150 --> 00:43:10,250 All right, so I've got this set AVL tree of size log n. 912 00:43:10,250 --> 00:43:12,678 What should it be keyed by? 913 00:43:12,678 --> 00:43:13,720 AUDIENCE: Absolute value. 914 00:43:13,720 --> 00:43:15,920 JASON KU: The absolute value of their preference-- 915 00:43:15,920 --> 00:43:18,100 or of their opinion. 916 00:43:18,100 --> 00:43:20,770 I don't remember what this is called. 917 00:43:20,770 --> 00:43:23,490 But what log n things do I put in here? 918 00:43:28,240 --> 00:43:30,550 I don't know. 919 00:43:30,550 --> 00:43:32,260 I don't know anything about these things. 920 00:43:32,260 --> 00:43:35,270 What makes one better than another? 921 00:43:35,270 --> 00:43:37,900 Let's just put the first log n things. 922 00:43:37,900 --> 00:43:39,680 Does that make sense? 923 00:43:39,680 --> 00:43:40,180 All right. 924 00:43:40,180 --> 00:43:43,900 What could that tell me? 925 00:43:43,900 --> 00:43:45,490 Now, I've put this thing in it. 926 00:43:45,490 --> 00:43:47,811 How long did that take? 927 00:43:47,811 --> 00:43:49,640 AUDIENCE: [INAUDIBLE] 928 00:43:49,640 --> 00:43:55,760 JASON KU: Log n times log log n time, right? 929 00:43:55,760 --> 00:43:58,520 But that's much less than our running time 930 00:43:58,520 --> 00:44:01,800 that we're looking for, so I don't really care. 931 00:44:01,800 --> 00:44:04,760 I mean, I want you to say how long it took, 932 00:44:04,760 --> 00:44:06,527 but for my purposes, I know that it's 933 00:44:06,527 --> 00:44:08,360 lower than the running time I'm looking for. 934 00:44:08,360 --> 00:44:10,160 And I did that operation once. 
935 00:44:10,160 --> 00:44:12,140 I don't really care about it anymore. 936 00:44:12,140 --> 00:44:12,860 Yeah? 937 00:44:12,860 --> 00:44:14,720 AUDIENCE: How did you get the log n times log log n? 938 00:44:14,720 --> 00:44:16,303 JASON KU: Because the number of things 939 00:44:16,303 --> 00:44:19,110 I'm storing in this thing is log n. 940 00:44:19,110 --> 00:44:24,050 And so if I pattern match the build time of an AVL tree 941 00:44:24,050 --> 00:44:29,870 and I stick log n in there, then it's log n times log log n. 942 00:44:29,870 --> 00:44:31,220 OK. 943 00:44:31,220 --> 00:44:33,440 AUDIENCE: So that's just for one iteration? 944 00:44:33,440 --> 00:44:36,920 JASON KU: Well, right now, I've just built this thing. 945 00:44:36,920 --> 00:44:40,460 Maybe-- I just built it once. 946 00:44:40,460 --> 00:44:44,630 I'm asserting, too, that maybe I don't need to build it again. 947 00:44:44,630 --> 00:44:47,660 What could I-- so, by now, I know-- 948 00:44:47,660 --> 00:44:51,290 I haven't filtered my data at all. 949 00:44:51,290 --> 00:44:55,660 I'm just storing these things in sorted order in some way. 950 00:44:55,660 --> 00:44:59,780 What can I do to maybe start processing 951 00:44:59,780 --> 00:45:01,670 the rest of the data? 952 00:45:01,670 --> 00:45:02,360 Yeah? 953 00:45:02,360 --> 00:45:06,490 AUDIENCE: [INAUDIBLE] try to scroll through the list A 954 00:45:06,490 --> 00:45:09,830 and try to find someone that's bigger than-- 955 00:45:09,830 --> 00:45:13,430 try to keep the maximum [INAUDIBLE]. 956 00:45:13,430 --> 00:45:16,100 JASON KU: Sweep this guy over inserting things, 957 00:45:16,100 --> 00:45:18,980 and always maintaining-- 958 00:45:18,980 --> 00:45:20,990 if I do that, and I keep sticking things 959 00:45:20,990 --> 00:45:23,490 in, I'll have this sorted thing at the end. 960 00:45:23,490 --> 00:45:28,910 And now I can just read off the biggest k things.
961 00:45:28,910 --> 00:45:33,290 However, as I'm inserting things across here, 962 00:45:33,290 --> 00:45:34,610 my thing's growing. 963 00:45:34,610 --> 00:45:35,810 AUDIENCE: Well, just delete the smallest one. 964 00:45:35,810 --> 00:45:37,185 JASON KU: Oh, delete small stuff. 965 00:45:37,185 --> 00:45:38,293 I like that idea. 966 00:45:38,293 --> 00:45:39,710 AUDIENCE: So basically replace it. 967 00:45:39,710 --> 00:45:40,940 JASON KU: Yeah, basically replace it. 968 00:45:40,940 --> 00:45:41,630 Right. 969 00:45:41,630 --> 00:45:43,010 What I'm going to do-- 970 00:45:43,010 --> 00:45:44,810 here's a proposal. 971 00:45:44,810 --> 00:45:49,070 We're going to take the next guy, stick it in. 972 00:45:49,070 --> 00:45:51,520 Awesome. 973 00:45:51,520 --> 00:45:54,010 Which one don't I care about now? 974 00:45:54,010 --> 00:45:55,600 The smallest one there. 975 00:45:55,600 --> 00:45:57,850 So kick the smallest one out. 976 00:45:57,850 --> 00:46:00,100 Now, this one that I stuck in may 977 00:46:00,100 --> 00:46:01,550 be the smallest. 978 00:46:01,550 --> 00:46:03,580 So I just kind of passed it through this thing, 979 00:46:03,580 --> 00:46:05,300 but how long did that take me? 980 00:46:10,032 --> 00:46:11,490 It took me the height of this tree. 981 00:46:11,490 --> 00:46:12,195 What's the height of this tree? 982 00:46:12,195 --> 00:46:13,070 AUDIENCE: [INAUDIBLE] 983 00:46:13,070 --> 00:46:15,300 JASON KU: Log log n. 984 00:46:15,300 --> 00:46:18,570 So I put one in, I popped one out. 985 00:46:18,570 --> 00:46:20,638 That's the smallest, right? 986 00:46:20,638 --> 00:46:22,680 And I keep doing that all the way down the thing. 987 00:46:22,680 --> 00:46:23,763 How long did that take me? 988 00:46:23,763 --> 00:46:25,070 AUDIENCE: [INAUDIBLE] 989 00:46:28,415 --> 00:46:29,040 JASON KU: Yeah. 990 00:46:29,040 --> 00:46:34,890 Processing n minus log n things, which is basically n.
991 00:46:34,890 --> 00:46:37,230 And each one of those operations took me 992 00:46:37,230 --> 00:46:39,120 height of the tree time. 993 00:46:39,120 --> 00:46:40,920 That gives me the running time that we're 994 00:46:40,920 --> 00:46:45,960 looking for, n log log n. 995 00:46:45,960 --> 00:46:46,500 Make sense? 996 00:46:46,500 --> 00:46:48,840 AUDIENCE: Is this reminiscent of a sliding window technique? 997 00:46:48,840 --> 00:46:48,960 JASON KU: Yeah. 998 00:46:48,960 --> 00:46:50,760 It's kind of a sliding window technique. 999 00:46:50,760 --> 00:46:53,100 You may have been using one recently. 1000 00:46:56,470 --> 00:46:56,970 OK. 1001 00:46:56,970 --> 00:46:58,322 Everyone OK with this? 1002 00:46:58,322 --> 00:46:58,858 Yeah? 1003 00:46:58,858 --> 00:47:00,900 AUDIENCE: Could you just remind me of the context 1004 00:47:00,900 --> 00:47:05,170 that we're talking about is log log n, like tree and where-- 1005 00:47:05,170 --> 00:47:07,170 JASON KU: So this thing-- the size of this thing 1006 00:47:07,170 --> 00:47:08,495 is log log n? 1007 00:47:08,495 --> 00:47:09,120 AUDIENCE: Yeah. 1008 00:47:09,120 --> 00:47:10,453 JASON KU: I mean-- sorry, log n. 1009 00:47:10,453 --> 00:47:15,438 And the height of this thing is the log of the size. 1010 00:47:15,438 --> 00:47:19,065 AUDIENCE: I'm sorry, in relation to our little log n size 1011 00:47:19,065 --> 00:47:25,270 [INAUDIBLE] small log log n trees, or-- 1012 00:47:25,270 --> 00:47:27,150 JASON KU: No, so-- sorry. 1013 00:47:27,150 --> 00:47:29,790 I'm taking this stuff-- 1014 00:47:29,790 --> 00:47:32,040 there's no intermediate data structure here-- 1015 00:47:32,040 --> 00:47:37,105 I'm just sticking all of these things into set AVL. 1016 00:47:37,105 --> 00:47:37,605 Yeah? 1017 00:47:37,605 --> 00:47:39,030 AUDIENCE: [INAUDIBLE] 1018 00:47:39,030 --> 00:47:43,200 JASON KU: Into one set AVL of size log n. 1019 00:47:43,200 --> 00:47:48,497 I'm sticking a guy in, popping the worst guy out.
1020 00:47:48,497 --> 00:47:49,830 Going through all of the things. 1021 00:47:49,830 --> 00:47:52,410 I need to make sure, when I'm sticking it in, 1022 00:47:52,410 --> 00:47:55,740 I'm keeping track of which Revenger it is, 1023 00:47:55,740 --> 00:47:57,720 and that I'm taking the absolute value, 1024 00:47:57,720 --> 00:48:00,450 and all of those nitty gritty kind of things. 1025 00:48:00,450 --> 00:48:02,250 But that's the basic idea. 1026 00:48:02,250 --> 00:48:05,730 I'm just taking this, I'm sliding the window in, 1027 00:48:05,730 --> 00:48:07,830 putting something in, taking something out 1028 00:48:07,830 --> 00:48:11,460 that may or may not-- probably is not-- the same thing. 1029 00:48:11,460 --> 00:48:14,490 And at the end of this procedure, 1030 00:48:14,490 --> 00:48:17,070 the invariant I'm maintaining here 1031 00:48:17,070 --> 00:48:23,730 is that my thing always has the k largest opinions of the ones 1032 00:48:23,730 --> 00:48:24,925 that I've processed so far. 1033 00:48:24,925 --> 00:48:26,550 That's obviously true at the beginning, 1034 00:48:26,550 --> 00:48:27,930 when I build this thing. 1035 00:48:27,930 --> 00:48:31,320 And when I get to the end, I've processed all of the things, 1036 00:48:31,320 --> 00:48:38,310 and this has size log n, and so I have the log n largest-- 1037 00:48:38,310 --> 00:48:41,730 highest-- extremist opinions. 1038 00:48:41,730 --> 00:48:43,990 And then I can just do an in-order traversal 1039 00:48:43,990 --> 00:48:46,140 of this thing and return. 1040 00:48:46,140 --> 00:48:47,290 Does that make sense? 1041 00:48:47,290 --> 00:48:50,130 And I've only used logarithmic space. 1042 00:48:53,210 --> 00:48:53,710 Yeah? 1043 00:48:53,710 --> 00:48:55,002 AUDIENCE: Wait, I don't get it. 1044 00:48:55,002 --> 00:49:00,090 Are all of the opinions in that AVL tree? 1045 00:49:00,090 --> 00:49:02,340 JASON KU: Are all of the opinions in that AVL tree? 
1046 00:49:02,340 --> 00:49:05,520 All of these opinions are in the AVL tree. 1047 00:49:05,520 --> 00:49:12,180 And at some point, I will insert every opinion 1048 00:49:12,180 --> 00:49:14,040 into this AVL tree. 1049 00:49:14,040 --> 00:49:17,640 But I'll be removing the ones that I don't care about 1050 00:49:17,640 --> 00:49:18,223 as I go. 1051 00:49:18,223 --> 00:49:19,140 Does that make sense? 1052 00:49:19,140 --> 00:49:22,860 I'm always maintaining the invariant 1053 00:49:22,860 --> 00:49:26,010 that this thing, before I insert something, 1054 00:49:26,010 --> 00:49:29,100 has exactly log n items in it, and then I'm 1055 00:49:29,100 --> 00:49:31,890 maintaining that invariant by sticking one in, taking one 1056 00:49:31,890 --> 00:49:32,530 out. 1057 00:49:32,530 --> 00:49:35,178 AUDIENCE: Oh, OK, so then which one are you deleting? 1058 00:49:35,178 --> 00:49:36,720 JASON KU: Always the min, because I'm 1059 00:49:36,720 --> 00:49:38,667 wanting the largest ones. 1060 00:49:38,667 --> 00:49:40,500 AUDIENCE: And the min of the absolute value. 1061 00:49:40,500 --> 00:49:42,330 JASON KU: Yeah. 1062 00:49:42,330 --> 00:49:45,210 I'm keying by the absolute value of these opinions. 1063 00:49:45,210 --> 00:49:45,710 Yeah? 1064 00:49:45,710 --> 00:49:48,510 AUDIENCE: [INAUDIBLE]? 1065 00:49:48,510 --> 00:49:51,390 JASON KU: Total runtime here? 1066 00:49:51,390 --> 00:49:52,830 It's bookkeeping. 1067 00:49:52,830 --> 00:50:01,470 It took me log n times log log n time 1068 00:50:01,470 --> 00:50:05,430 to build this data structure at the beginning, 1069 00:50:05,430 --> 00:50:11,040 plus n times log log n. 1070 00:50:11,040 --> 00:50:15,510 I did, basically, n operations-- asymptotically, n operations. 1071 00:50:15,510 --> 00:50:19,620 This way, it's actually n minus log n operations.
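The insert-then-evict sweep described here can be sketched as follows. This is an illustration under assumptions, not the session's reference solution: Python's standard library has no AVL tree, so a binary min-heap from `heapq` stands in for the set AVL tree keyed on absolute value, and the function name `most_extreme` is made up.

```python
import heapq
import math

def most_extreme(opinions, k=None):
    """Return the k opinions of largest absolute value, smallest |x| first.

    k defaults to floor(log2 n), matching the log n window in the problem.
    """
    n = len(opinions)
    if k is None:
        k = int(math.log2(n)) if n > 1 else n
    # Build the window from the first k items, keyed on (|x|, index).
    # The index keeps track of which item it is and breaks ties.
    window = [(abs(x), i, x) for i, x in enumerate(opinions[:k])]
    heapq.heapify(window)
    # Sweep the rest: stick one in, kick the smallest out.
    # Invariant: window holds the k largest |x| among items seen so far.
    for i in range(k, n):
        x = opinions[i]
        heapq.heappushpop(window, (abs(x), i, x))
    # Read the survivors off in key order (the in-order traversal step).
    return [x for _, _, x in sorted(window)]
```

With a window of size log n, each push-and-pop costs O(log log n), so the sweep over the remaining n - log n items costs O(n log log n), and only O(log n) extra space is used.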
1072 00:50:19,620 --> 00:50:22,740 And each one of those tree operations-- 1073 00:50:22,740 --> 00:50:25,350 doing one insert, one delete-- 1074 00:50:25,350 --> 00:50:28,750 each one of those took the height of the tree time. 1075 00:50:28,750 --> 00:50:31,440 And so this is that. 1076 00:50:31,440 --> 00:50:32,820 Good? 1077 00:50:32,820 --> 00:50:33,420 Yeah? 1078 00:50:33,420 --> 00:50:36,249 AUDIENCE: If, instead of one, we just insert and delete, can 1079 00:50:36,249 --> 00:50:39,030 you do a comparison and then-- 1080 00:50:39,030 --> 00:50:42,720 JASON KU: The inserting and deleting 1081 00:50:42,720 --> 00:50:45,930 a set AVL tree is actually doing comparisons 1082 00:50:45,930 --> 00:50:47,070 within its data structures. 1083 00:50:47,070 --> 00:50:47,910 AUDIENCE: Just compare with the min. 1084 00:50:47,910 --> 00:50:48,410 And then-- 1085 00:50:48,410 --> 00:50:50,490 JASON KU: Sure, you could do that. 1086 00:50:50,490 --> 00:50:52,410 I could do it the other way. 1087 00:50:52,410 --> 00:50:56,790 I could remove the smallest element here, 1088 00:50:56,790 --> 00:50:58,860 to start with, right? 1089 00:50:58,860 --> 00:51:00,870 And then I compare it with this guy, 1090 00:51:00,870 --> 00:51:04,370 and then whichever is bigger, I stick it back in. 1091 00:51:04,370 --> 00:51:06,160 Same thing. 1092 00:51:06,160 --> 00:51:09,260 It's just, am I doing the delete first and then the insertion, 1093 00:51:09,260 --> 00:51:12,940 or am I doing the insertion first and then the deletion? 1094 00:51:12,940 --> 00:51:14,830 Any other questions? 1095 00:51:14,830 --> 00:51:15,590 Lots of questions. 1096 00:51:15,590 --> 00:51:16,090 All right. 1097 00:51:16,090 --> 00:51:18,940 Well, I'm probably going to have to skip a problem. 1098 00:51:18,940 --> 00:51:22,060 We're going to move on to CS-- 1099 00:51:22,060 --> 00:51:23,530 no, SCLR. 1100 00:51:27,040 --> 00:51:28,396 What's the reference here? 
1101 00:51:28,396 --> 00:51:30,100 AUDIENCE: [INAUDIBLE] 1102 00:51:30,100 --> 00:51:32,230 JASON KU: Yeah, CLRS. 1103 00:51:32,230 --> 00:51:34,630 These are four academics who wrote a popular textbook 1104 00:51:34,630 --> 00:51:37,520 in computer science. 1105 00:51:37,520 --> 00:51:39,880 This is the same kind of k kind of thing. 1106 00:51:39,880 --> 00:51:44,770 They found first editions and they want to auction them off. 1107 00:51:44,770 --> 00:51:46,553 People can go on to their website. 1108 00:51:46,553 --> 00:51:47,470 They have a bidder ID. 1109 00:51:47,470 --> 00:51:49,900 It's a unique identifier. 1110 00:51:49,900 --> 00:51:53,830 And they can place a bid for one of these books. 1111 00:51:53,830 --> 00:51:57,400 And they can change it during the bidding period, 1112 00:51:57,400 --> 00:52:00,640 but at the end of the bidding period, 1113 00:52:00,640 --> 00:52:06,112 the academics want to know who the-- 1114 00:52:06,112 --> 00:52:08,080 what is the expected revenue I'll 1115 00:52:08,080 --> 00:52:10,060 get by selling to the k highest bidders? 1116 00:52:10,060 --> 00:52:13,470 Does that make sense? 1117 00:52:13,470 --> 00:52:15,930 Yeah? 1118 00:52:15,930 --> 00:52:16,590 OK. 1119 00:52:16,590 --> 00:52:22,560 Note that, before I build this data structure, 1120 00:52:22,560 --> 00:52:24,690 I know what k is. k is a fixed thing. 1121 00:52:27,360 --> 00:52:31,230 Because my running time of this get-revenue depends on this k, 1122 00:52:31,230 --> 00:52:34,470 it's not an input to that operation. 1123 00:52:34,470 --> 00:52:37,530 So k is kind of-- I don't know what it is, a priori. 1124 00:52:37,530 --> 00:52:39,630 It could be n over 2. 1125 00:52:39,630 --> 00:52:42,180 It could be log n. 1126 00:52:42,180 --> 00:52:43,590 It could be 1.
1127 00:52:46,560 --> 00:52:48,540 But the data structure I build needs 1128 00:52:48,540 --> 00:52:50,940 to satisfy these running time properties, 1129 00:52:50,940 --> 00:52:55,330 no matter what choice of k that the academics told me. 1130 00:52:55,330 --> 00:52:57,900 Does that make sense? 1131 00:52:57,900 --> 00:53:03,090 What I need to do is, as far as time is going on, 1132 00:53:03,090 --> 00:53:07,920 people are placing new bids and updating their bids. 1133 00:53:07,920 --> 00:53:11,890 And those updates can take log n time. 1134 00:53:11,890 --> 00:53:14,640 But as soon as I close the window, 1135 00:53:14,640 --> 00:53:21,250 I want to be able to tell, in constant time, what 1136 00:53:21,250 --> 00:53:24,670 the k highest bidders are. 1137 00:53:24,670 --> 00:53:26,460 Any ideas on how to do this? 1138 00:53:26,460 --> 00:53:28,210 What are the operations that I have to do? 1139 00:53:28,210 --> 00:53:32,920 I have to be able to place a new bid. 1140 00:53:32,920 --> 00:53:37,690 Associated with a bidder is an ID and a bid, 1141 00:53:37,690 --> 00:53:39,440 which is also an integer-- 1142 00:53:39,440 --> 00:53:41,440 how many dollars I'm going to pay for this book. 1143 00:53:44,050 --> 00:53:45,700 Update the bid. 1144 00:53:45,700 --> 00:53:48,460 In some sense, I need to find whether that person placed 1145 00:53:48,460 --> 00:53:51,830 the bid before, in my data structure. 1146 00:53:51,830 --> 00:53:54,700 So at some point, I'm going to need a find on bidder ID. 1147 00:53:54,700 --> 00:53:57,500 Does that seem possible? 1148 00:53:57,500 --> 00:53:59,800 So I might want to have some kind of dictionary 1149 00:53:59,800 --> 00:54:01,930 on bidder IDs. 1150 00:54:01,930 --> 00:54:05,260 When I say that I want to have a dictionary on something, 1151 00:54:05,260 --> 00:54:08,800 right, I'm not specifying to you yet 1152 00:54:08,800 --> 00:54:11,080 how I'm going to implement that dictionary.
1153 00:54:11,080 --> 00:54:13,015 What are my usual options? 1154 00:54:13,015 --> 00:54:14,780 A hash table. 1155 00:54:14,780 --> 00:54:16,750 But what if I need worst-case time? 1156 00:54:19,390 --> 00:54:21,120 A set AVL tree, right? 1157 00:54:21,120 --> 00:54:23,740 That's going to be your go-to for a dictionary, 1158 00:54:23,740 --> 00:54:26,115 because that's going to give me log n time to find things 1159 00:54:26,115 --> 00:54:28,620 via a key. 1160 00:54:28,620 --> 00:54:31,560 It's the only thing-- well, except for a sorted-- 1161 00:54:31,560 --> 00:54:33,870 you could also use a sorted array, 1162 00:54:33,870 --> 00:54:36,180 but that's going to not be dynamic. 1163 00:54:36,180 --> 00:54:38,700 And here, we're updating who's in my data structure 1164 00:54:38,700 --> 00:54:39,780 all the time. 1165 00:54:39,780 --> 00:54:43,850 People are going in and placing bids-- 1166 00:54:43,850 --> 00:54:45,120 new people placing bids. 1167 00:54:45,120 --> 00:54:47,280 So my set of things that I care about 1168 00:54:47,280 --> 00:54:48,572 is changing all of the time. 1169 00:54:48,572 --> 00:54:50,280 So that's probably going to steer me away 1170 00:54:50,280 --> 00:54:52,440 from sorted arrays, because they're not 1171 00:54:52,440 --> 00:54:55,920 good with dynamic operations. 1172 00:54:55,920 --> 00:54:59,610 So I'm going to need some kind of dictionary on bidder IDs, 1173 00:54:59,610 --> 00:55:06,120 but I'm also going to need to maintain the sum of the k 1174 00:55:06,120 --> 00:55:06,810 highest bidders. 1175 00:55:06,810 --> 00:55:08,710 Does that makes sense? 1176 00:55:08,710 --> 00:55:10,650 And so, in some sense, I need to keep 1177 00:55:10,650 --> 00:55:18,878 track of an ordered notion of the bidders, the bids, that 1178 00:55:18,878 --> 00:55:19,920 are in my data structure. 1179 00:55:19,920 --> 00:55:22,080 Does that make sense? 1180 00:55:22,080 --> 00:55:24,460 So order is going to be important on the bids. 
1181 00:55:24,460 --> 00:55:28,410 I'm going to need a look-up on bidder ID. 1182 00:55:28,410 --> 00:55:31,330 And that's about it, right? 1183 00:55:31,330 --> 00:55:31,830 OK. 1184 00:55:31,830 --> 00:55:32,100 Yeah? 1185 00:55:32,100 --> 00:55:32,670 AUDIENCE: Just checking. 1186 00:55:32,670 --> 00:55:34,045 So if something that's worst-case 1187 00:55:34,045 --> 00:55:38,550 runs at worst-case time, runs expected [INAUDIBLE]. 1188 00:55:38,550 --> 00:55:39,150 JASON KU: Yes. 1189 00:55:39,150 --> 00:55:40,260 Correct. 1190 00:55:40,260 --> 00:55:42,390 Yeah. 1191 00:55:42,390 --> 00:55:44,190 That's a very good observation. 1192 00:55:44,190 --> 00:55:46,590 If it runs in worst case time, it also 1193 00:55:46,590 --> 00:55:47,972 runs in expected time, right? 1194 00:55:47,972 --> 00:55:49,805 Because there's essentially no randomization 1195 00:55:49,805 --> 00:55:51,650 that I'm talking about here. 1196 00:55:51,650 --> 00:55:54,620 AUDIENCE: [INAUDIBLE]? 1197 00:55:54,620 --> 00:55:55,450 JASON KU: Yeah. 1198 00:55:55,450 --> 00:55:58,890 And so there's a stronger notion which we want you to specify, 1199 00:55:58,890 --> 00:56:01,770 which is that, actually, there is no randomization here. 1200 00:56:01,770 --> 00:56:04,260 We're not using a hash table. 1201 00:56:04,260 --> 00:56:07,973 In this class, really, that's the only situation where 1202 00:56:07,973 --> 00:56:09,140 that's going to be an issue. 1203 00:56:11,880 --> 00:56:16,450 But if it is, what this problem is saying for each [INAUDIBLE], 1204 00:56:16,450 --> 00:56:19,170 whether your running time is worst case, expected, and/or 1205 00:56:19,170 --> 00:56:22,350 amortized, what we're really trying to get you to say is 1206 00:56:22,350 --> 00:56:23,010 what's the-- 1207 00:56:26,220 --> 00:56:28,170 evaluate the running time of your algorithm 1208 00:56:28,170 --> 00:56:30,840 with the proper qualifications.
1209 00:56:30,840 --> 00:56:32,280 If it took worst case, I want you 1210 00:56:32,280 --> 00:56:35,030 to say that it took worst case. 1211 00:56:35,030 --> 00:56:38,300 If it took-- if you used a hash table, 1212 00:56:38,300 --> 00:56:39,620 I want you to say expected. 1213 00:56:39,620 --> 00:56:43,850 And if these operations were sometimes really bad, 1214 00:56:43,850 --> 00:56:46,430 but on average, they're really good-- if I did a lot of them, 1215 00:56:46,430 --> 00:56:49,550 that's amortized. 1216 00:56:49,550 --> 00:56:53,180 Or if I reduced to using a dynamic array, 1217 00:56:53,180 --> 00:56:54,920 or if I reduced to using a hash table, 1218 00:56:54,920 --> 00:56:59,530 those dynamic operations would still be amortized. 1219 00:56:59,530 --> 00:57:00,350 OK. 1220 00:57:00,350 --> 00:57:01,700 The dynamic ones. 1221 00:57:01,700 --> 00:57:04,520 The nice thing about linked data structures 1222 00:57:04,520 --> 00:57:07,148 is that dynamic operations aren't amortized. 1223 00:57:07,148 --> 00:57:08,690 So we're going to be able to get one. 1224 00:57:08,690 --> 00:57:12,080 Now, for this problem, we can actually get worst case bounds, 1225 00:57:12,080 --> 00:57:13,860 so we're going to try for that. 1226 00:57:13,860 --> 00:57:16,700 You can also do it in expected using some hash tables 1227 00:57:16,700 --> 00:57:20,240 for that dictionary. 1228 00:57:20,240 --> 00:57:21,710 When you approach a data structures 1229 00:57:21,710 --> 00:57:26,630 problem in this class, you want to tell me 1230 00:57:26,630 --> 00:57:30,800 what it is you're storing, first off. 
1231 00:57:30,800 --> 00:57:34,300 Tell me what's supposed to be in those things-- 1232 00:57:34,300 --> 00:57:35,970 some invariants on this data structure 1233 00:57:35,970 --> 00:57:40,520 to make sure that, when I do queries later, 1234 00:57:40,520 --> 00:57:42,560 that these things are being maintained, 1235 00:57:42,560 --> 00:57:49,520 so that if I'm maintaining a sorted array, 1236 00:57:49,520 --> 00:57:52,070 and I'm supporting an operation to find the maximum, 1237 00:57:52,070 --> 00:57:52,850 I had better-- 1238 00:57:52,850 --> 00:57:54,380 anything I do to this data structure 1239 00:57:54,380 --> 00:57:56,448 had better be maintaining the invariants 1240 00:57:56,448 --> 00:57:57,990 that these things are in sorted order 1241 00:57:57,990 --> 00:58:00,620 and the last thing has the maximum item. 1242 00:58:00,620 --> 00:58:03,770 Because my max-return thing is going 1243 00:58:03,770 --> 00:58:05,540 to look there and return that. 1244 00:58:05,540 --> 00:58:07,290 Does that make sense? 1245 00:58:07,290 --> 00:58:10,520 So you want to tell me what is being stored 1246 00:58:10,520 --> 00:58:13,280 at a generic point in time during your data 1247 00:58:13,280 --> 00:58:15,860 structure, what is being maintained-- so 1248 00:58:15,860 --> 00:58:20,060 that, when I support the dynamic operation or a query, 1249 00:58:20,060 --> 00:58:23,630 in a dynamic operation, where I'm inserting and deleting 1250 00:58:23,630 --> 00:58:26,180 things from this thing, I need to make sure that I'm 1251 00:58:26,180 --> 00:58:28,508 maintaining those invariants. 1252 00:58:28,508 --> 00:58:30,050 And when I'm querying, I can actually 1253 00:58:30,050 --> 00:58:34,250 rely on those invariants to answer my query. 1254 00:58:34,250 --> 00:58:36,380 Does that make sense? 1255 00:58:36,380 --> 00:58:40,030 So, for this problem-- this is 4-3. 1256 00:58:43,830 --> 00:58:45,410 Any ideas?
1257 00:58:45,410 --> 00:58:50,030 I have two kind of keys that I might have to deal with. 1258 00:58:50,030 --> 00:58:55,520 One's a bidder ID and one's a bid, right? 1259 00:58:55,520 --> 00:59:00,380 So how could I, if I have two keys that I might want to, 1260 00:59:00,380 --> 00:59:03,620 maybe, order on one and look up on another, 1261 00:59:03,620 --> 00:59:06,187 how many data structures do you think I'm going to use? 1262 00:59:06,187 --> 00:59:06,770 AUDIENCE: Two. 1263 00:59:06,770 --> 00:59:07,353 JASON KU: Two. 1264 00:59:07,353 --> 00:59:10,130 That's a pretty good guess. 1265 00:59:10,130 --> 00:59:11,510 So one of them-- 1266 00:59:11,510 --> 00:59:12,860 let's just guess, right? 1267 00:59:12,860 --> 00:59:15,920 I need to be able to look up on bidder ID, 1268 00:59:15,920 --> 00:59:24,110 so let's store these bidders in some kind of dictionary 1269 00:59:24,110 --> 00:59:26,540 that's going to be able to look up those things fast. 1270 00:59:26,540 --> 00:59:28,880 So two data structures. 1271 00:59:28,880 --> 00:59:41,222 One is a dictionary keyed on bidder ID. 1272 00:59:44,990 --> 00:59:47,890 What else am I going to want? 1273 00:59:47,890 --> 00:59:48,390 What's up? 1274 00:59:52,330 --> 00:59:56,380 The other way around, a dictionary stored on the bids? 1275 00:59:56,380 --> 00:59:57,910 Is a dictionary what I want here? 1276 00:59:57,910 --> 01:00:01,480 AUDIENCE: [INAUDIBLE] set AVL tree [INAUDIBLE]? 1277 01:00:01,480 --> 01:00:03,490 JASON KU: I want to maintain order somehow. 1278 01:00:03,490 --> 01:00:06,250 Because I want to maintain the biggest 1279 01:00:06,250 --> 01:00:07,780 things that I've seen so far. 1280 01:00:07,780 --> 01:00:12,102 Right now, if I have-- at some point in time, 1281 01:00:12,102 --> 01:00:13,060 what's going to happen?
1282 01:00:13,060 --> 01:00:16,780 If I'm maintaining the k largest at any point in time, 1283 01:00:16,780 --> 01:00:19,630 it's possible that one of those bidders 1284 01:00:19,630 --> 01:00:24,400 maybe decreases his bid so it's no longer in the highest. 1285 01:00:24,400 --> 01:00:27,310 I'm going to also need to keep track of the other guys 1286 01:00:27,310 --> 01:00:31,840 to see who I should add back into that set, for example. 1287 01:00:31,840 --> 01:00:33,880 Here's an idea. 1288 01:00:33,880 --> 01:00:37,823 I'm going to keep not just one other data structure, but two 1289 01:00:37,823 --> 01:00:38,740 other data structures. 1290 01:00:38,740 --> 01:00:40,810 Maybe this is a leap. 1291 01:00:40,810 --> 01:00:42,010 You don't have to do this. 1292 01:00:42,010 --> 01:00:44,140 There's a way to do it with just one other. 1293 01:00:44,140 --> 01:00:46,910 But I'm going to store two more. 1294 01:00:46,910 --> 01:00:49,900 One is kind of an-- 1295 01:00:49,900 --> 01:01:01,100 a data structure to store bidders with a-- 1296 01:01:01,100 --> 01:01:14,020 store the k highest bidders, and a data structure 1297 01:01:14,020 --> 01:01:22,570 to store the n minus k lowest bidders. 1298 01:01:22,570 --> 01:01:25,640 Does that make sense? 1299 01:01:25,640 --> 01:01:31,190 This separates my problem quite nicely, right? 1300 01:01:31,190 --> 01:01:33,830 Every time someone does an interaction with this data 1301 01:01:33,830 --> 01:01:41,360 structure, I can check to see whether it's 1302 01:01:41,360 --> 01:01:44,600 bigger than the smallest thing in here. 1303 01:01:44,600 --> 01:01:48,320 If it is, I can do the same kind of trick I did before. 1304 01:01:48,320 --> 01:01:52,610 I can remove it and stick my new one in there. 1305 01:01:52,610 --> 01:01:54,200 And we're going-- but I removed it. 1306 01:01:54,200 --> 01:01:56,660 I have to maintain this property. 1307 01:01:56,660 --> 01:01:58,575 So I stick it in here.
1308 01:01:58,575 --> 01:01:59,450 There's another case. 1309 01:01:59,450 --> 01:02:00,367 What's the other case? 1310 01:02:03,190 --> 01:02:05,460 It's smaller. 1311 01:02:05,460 --> 01:02:07,090 In which case, I don't do anything 1312 01:02:07,090 --> 01:02:09,340 to this data structure, and I just stick it into here. 1313 01:02:09,340 --> 01:02:12,390 Does that make sense? 1314 01:02:12,390 --> 01:02:14,460 What are the operations these data structures 1315 01:02:14,460 --> 01:02:16,380 need to maintain? 1316 01:02:16,380 --> 01:02:19,410 Finding the minimum or the maximum of these two sets. 1317 01:02:19,410 --> 01:02:20,820 Does that make sense? 1318 01:02:20,820 --> 01:02:25,230 Actually, really, the-- where are those operations? 1319 01:02:25,230 --> 01:02:26,640 I don't have them anymore. 1320 01:02:26,640 --> 01:02:29,070 But they were the priority queue operations. 1321 01:02:29,070 --> 01:02:33,750 They had delete_max-- and also insert-- 1322 01:02:33,750 --> 01:02:36,930 were things that it did well on. 1323 01:02:36,930 --> 01:02:40,200 So any priority queue, anything that 1324 01:02:40,200 --> 01:02:45,270 can deal with maxes and mins, is good. 1325 01:02:45,270 --> 01:02:47,610 And what's a data structure you know 1326 01:02:47,610 --> 01:02:50,910 that can deal with maxes and mins pretty efficiently? 1327 01:02:50,910 --> 01:02:51,870 The set AVL, right? 1328 01:02:51,870 --> 01:02:55,830 So instead of data structure here, I'm 1329 01:02:55,830 --> 01:02:59,075 going to say, set AVL. 1330 01:03:03,420 --> 01:03:12,168 And obviously, it's going to be keyed by bid. 1331 01:03:12,168 --> 01:03:14,460 Because that's the thing that I'm going to want to find 1332 01:03:14,460 --> 01:03:15,390 maxes and mins over. 1333 01:03:18,070 --> 01:03:19,690 Everyone following the logic here 1334 01:03:19,690 --> 01:03:23,583 of why I'm maintaining these things?
1335 01:03:23,583 --> 01:03:25,000 This is the level of an invariants 1336 01:03:25,000 --> 01:03:28,400 that I want to maintain, because when I go to, 1337 01:03:28,400 --> 01:03:33,430 for example, do this query, get-revenue, 1338 01:03:33,430 --> 01:03:36,890 I can just run through and sum all of these things. 1339 01:03:36,890 --> 01:03:38,260 Oh, wait. 1340 01:03:38,260 --> 01:03:40,780 How much time do I have? 1341 01:03:40,780 --> 01:03:42,250 Do I have k time? 1342 01:03:42,250 --> 01:03:44,210 No, I don't have k time. 1343 01:03:44,210 --> 01:03:48,790 So I don't-- I can't afford to sum up all of these things 1344 01:03:48,790 --> 01:03:49,880 at the end of my thread. 1345 01:03:49,880 --> 01:03:52,720 I have to return it to you in constant time. 1346 01:03:52,720 --> 01:03:54,880 Any ideas? 1347 01:03:54,880 --> 01:03:59,230 Yeah, just compute-- update a sum. 1348 01:03:59,230 --> 01:04:02,830 Along with this data structure, I'm 1349 01:04:02,830 --> 01:04:05,500 going to keep a fourth thing, which 1350 01:04:05,500 --> 01:04:13,120 is just total of their bids. 1351 01:04:13,120 --> 01:04:14,380 I'm going to call it t. 1352 01:04:18,670 --> 01:04:20,380 And that's something I'm maintaining. 1353 01:04:20,380 --> 01:04:22,780 It's part of my data structure. 1354 01:04:22,780 --> 01:04:24,370 You can think of it as, I'm augmenting 1355 01:04:24,370 --> 01:04:27,310 this thing with a number. 1356 01:04:27,310 --> 01:04:30,130 And the point of augmenting this thing with a number 1357 01:04:30,130 --> 01:04:31,510 is that I can just-- 1358 01:04:31,510 --> 01:04:34,120 if I need to know what the total of this stuff is, 1359 01:04:34,120 --> 01:04:35,800 I can just look at that number. 1360 01:04:35,800 --> 01:04:38,260 Does that make sense? 1361 01:04:38,260 --> 01:04:39,100 All right. 1362 01:04:39,100 --> 01:04:41,920 So now, I think, we're almost done. 1363 01:04:41,920 --> 01:04:44,050 We're basically done, right? 
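Putting the pieces together, here is a rough sketch of the four things being stored: a dictionary on bidder ID, the k highest bids, the remaining bids, and the running total t. Several things here are assumptions for illustration only: the class and method names are invented, a Python `dict` stands in for the dictionary (so lookups are expected time rather than worst case), and sorted lists managed with `bisect` stand in for the two set AVL trees (so inserts cost O(n) here rather than the O(log n) a balanced tree would give).

```python
import bisect

class Auction:
    def __init__(self, k):
        self.k = k
        self.bid_of = {}   # bidder ID -> current bid
        self.top = []      # sorted (bid, bidder ID): the k highest
        self.rest = []     # sorted (bid, bidder ID): everyone else
        self.total = 0     # t: sum of the bids stored in self.top

    def _insert(self, item):
        if len(self.top) < self.k:
            bisect.insort(self.top, item)
            self.total += item[0]
        elif self.top and item > self.top[0]:
            # Beats the smallest of the top k: swap it in, demote the min.
            demoted = self.top.pop(0)
            bisect.insort(self.rest, demoted)
            bisect.insort(self.top, item)
            self.total += item[0] - demoted[0]
        else:
            bisect.insort(self.rest, item)

    def _remove(self, item):
        i = bisect.bisect_left(self.top, item)
        if i < len(self.top) and self.top[i] == item:
            self.top.pop(i)
            self.total -= item[0]
            if self.rest:
                # Promote the largest of the others to restore k items.
                promoted = self.rest.pop()
                bisect.insort(self.top, promoted)
                self.total += promoted[0]
        else:
            self.rest.pop(bisect.bisect_left(self.rest, item))

    def new_bid(self, bidder_id, bid):
        self.bid_of[bidder_id] = bid
        self._insert((bid, bidder_id))

    def update_bid(self, bidder_id, bid):
        self._remove((self.bid_of[bidder_id], bidder_id))
        self.new_bid(bidder_id, bid)

    def get_revenue(self):
        return self.total  # constant time: rely on the invariant
```

Storing `(bid, bidder ID)` pairs makes every stored key distinct even when two bidders place the same bid, which is one simple way around non-unique bids in this sketch.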
1364 01:04:44,050 --> 01:04:45,860 How do we do this? 1365 01:04:45,860 --> 01:04:51,220 Someone walk through to me how I would get revenue 1366 01:04:51,220 --> 01:04:54,270 with this data structure. 1367 01:04:54,270 --> 01:04:56,870 I basically kind of told you. 1368 01:04:56,870 --> 01:04:57,800 Look at this number. 1369 01:04:57,800 --> 01:04:59,240 Return it. 1370 01:04:59,240 --> 01:05:02,580 Because that's the invariant that I've maintained on my data 1371 01:05:02,580 --> 01:05:03,080 structure. 1372 01:05:03,080 --> 01:05:04,760 I'm relying on this invariant. 1373 01:05:04,760 --> 01:05:06,860 Now, I'd better make sure this is good 1374 01:05:06,860 --> 01:05:09,350 when I do dynamic operations. 1375 01:05:09,350 --> 01:05:11,030 I make sure I maintain it. 1376 01:05:11,030 --> 01:05:15,973 But if I, by induction, I ensure that all of this stuff is good, 1377 01:05:15,973 --> 01:05:18,140 and when I do a dynamic operation, all of that stuff 1378 01:05:18,140 --> 01:05:21,230 is maintained, then I'm all good. 1379 01:05:21,230 --> 01:05:28,050 So get-revenue, after I did all this extra work, is very easy. 1380 01:05:28,050 --> 01:05:31,730 I just look at this number and return it. 1381 01:05:31,730 --> 01:05:33,620 When we're grading a data structures problem, 1382 01:05:33,620 --> 01:05:36,710 usually we give you some points, first, for setting up your data 1383 01:05:36,710 --> 01:05:39,110 structure, separately from the operations, 1384 01:05:39,110 --> 01:05:42,320 and then we give you points per operation 1385 01:05:42,320 --> 01:05:44,360 that you successfully deal with, and then 1386 01:05:44,360 --> 01:05:47,130 some points for correctness and running time. 1387 01:05:47,130 --> 01:05:49,620 Yeah, you had a question?
1388 01:05:49,620 --> 01:05:51,560 AUDIENCE: So would total be a thing 1389 01:05:51,560 --> 01:05:54,770 that we update whenever we mess around 1390 01:05:54,770 --> 01:05:58,871 with the highest bidder tree and the n minus k bidder tree? 1391 01:05:58,871 --> 01:06:00,760 JASON KU: I'm sorry, say that again? 1392 01:06:00,760 --> 01:06:03,790 AUDIENCE: Are we treating the total as an augmentation 1393 01:06:03,790 --> 01:06:05,540 that we update every time we do something? 1394 01:06:05,540 --> 01:06:05,750 JASON KU: Yeah. 1395 01:06:05,750 --> 01:06:06,250 Yeah. 1396 01:06:06,250 --> 01:06:07,885 So it's just one number. 1397 01:06:07,885 --> 01:06:09,260 It's not really a data structure, 1398 01:06:09,260 --> 01:06:12,770 it's just one number that I'm storing with my database. 1399 01:06:16,820 --> 01:06:17,420 All right. 1400 01:06:17,420 --> 01:06:23,010 How do I implement a new bid operation? 1401 01:06:23,010 --> 01:06:23,510 Yeah? 1402 01:06:23,510 --> 01:06:24,677 AUDIENCE: I have a question. 1403 01:06:24,677 --> 01:06:27,302 Can we assume that the bids will also be unique? 1404 01:06:27,302 --> 01:06:29,510 JASON KU: Can you assume that the bids may be unique? 1405 01:06:29,510 --> 01:06:30,010 No. 1406 01:06:32,420 --> 01:06:39,050 That's actually something that is a really useful observation. 1407 01:06:39,050 --> 01:06:42,290 We've been talking about set data structures 1408 01:06:42,290 --> 01:06:47,390 as requiring unique keys. 1409 01:06:47,390 --> 01:06:48,890 How can I deal with non-unique keys? 1410 01:06:48,890 --> 01:06:51,140 It actually turns out that, hash table, 1411 01:06:51,140 --> 01:06:54,650 it's really important that these be unique keys. 1412 01:06:54,650 --> 01:06:56,930 Because I need to check whether it's in there. 1413 01:06:56,930 --> 01:06:59,110 I'm looking for that single key. 1414 01:06:59,110 --> 01:07:01,340 When I find it, I have to return.
1415 01:07:01,340 --> 01:07:03,230 If I had multiple things with that key, 1416 01:07:03,230 --> 01:07:05,450 I might not return the one that I'm looking for. 1417 01:07:05,450 --> 01:07:08,240 Doesn't even make sense. 1418 01:07:08,240 --> 01:07:17,240 But you can generalize the set interface 1419 01:07:17,240 --> 01:07:19,230 to deal with multi-sets. 1420 01:07:19,230 --> 01:07:21,420 How can I do that? 1421 01:07:21,420 --> 01:07:24,860 Well, with each key-- 1422 01:07:24,860 --> 01:07:26,720 again, I'm storing unique keys. 1423 01:07:26,720 --> 01:07:30,110 With each key, I can link it to a sequence data structure, 1424 01:07:30,110 --> 01:07:32,258 or any other data structure. 1425 01:07:32,258 --> 01:07:34,550 And what I'm going to do is, I'm going to do-- anything 1426 01:07:34,550 --> 01:07:36,440 that has that key, I'm going to stick it 1427 01:07:36,440 --> 01:07:38,558 in that data structure. 1428 01:07:38,558 --> 01:07:40,100 So instead of storing one item there, 1429 01:07:40,100 --> 01:07:42,267 I have the possibility of storing many things there. 1430 01:07:42,267 --> 01:07:44,700 Now, I have to change the semantics here. 1431 01:07:44,700 --> 01:07:48,800 If I'm saying, find on this key, well, now, I could say, 1432 01:07:48,800 --> 01:07:51,800 I'm going to return all of the things with that key, 1433 01:07:51,800 --> 01:07:55,010 or I'm going to store some thing with that key. 1434 01:07:55,010 --> 01:07:56,150 But you get the idea. 1435 01:07:56,150 --> 01:07:59,970 All I have to do is map it to some other data structure 1436 01:07:59,970 --> 01:08:01,160 to maintain [INAUDIBLE]. 1437 01:08:01,160 --> 01:08:06,380 Like, maybe, I want all of the things with that key. 1438 01:08:06,380 --> 01:08:09,440 I want to find the one with this other key. 1439 01:08:09,440 --> 01:08:11,475 So maybe I link to a set data structure 1440 01:08:11,475 --> 01:08:13,100 that can search on other things, right?
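A minimal sketch of the multi-set idea just described, assuming Python: each unique key maps to a secondary container holding every item that shares it. The class `MultiSet` and its method names are hypothetical, and a built-in `dict` of lists stands in for the outer set data structure and the linked sequence, so the bounds are those of a hash table rather than a tree.

```python
class MultiSet:
    """Hypothetical multi-set: unique keys, each linked to a list of items."""

    def __init__(self):
        self.buckets = {}  # unique key -> list of all items with that key

    def insert(self, key, item):
        self.buckets.setdefault(key, []).append(item)

    def find_all(self, key):
        # One choice of semantics: return everything stored under the key.
        return list(self.buckets.get(key, []))

    def find_any(self, key):
        # Another choice: return some single item with the key, or None.
        bucket = self.buckets.get(key)
        return bucket[0] if bucket else None

    def delete(self, key, item):
        bucket = self.buckets[key]  # raises KeyError if the key is absent
        bucket.remove(item)
        if not bucket:
            del self.buckets[key]   # drop keys that no longer hold anything
```

The outer structure still sees only unique keys; the relaxed semantics live entirely in what `find_all` and `find_any` choose to return.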
1441 01:08:13,100 --> 01:08:15,410 But the idea here is, we maintain 1442 01:08:15,410 --> 01:08:18,109 this uniqueness key property. 1443 01:08:18,109 --> 01:08:20,810 I have to relax my semantics so that I'm 1444 01:08:20,810 --> 01:08:23,609 storing multiple things at that key location. 1445 01:08:23,609 --> 01:08:24,550 Does that make sense? 1446 01:08:24,550 --> 01:08:25,250 Yeah? 1447 01:08:25,250 --> 01:08:27,250 AUDIENCE: Why does it matter whether the set AVL 1448 01:08:27,250 --> 01:08:28,512 tree has unique keys or not? 1449 01:08:31,220 --> 01:08:35,750 JASON KU: It's going to matter here because I have bids. 1450 01:08:35,750 --> 01:08:38,899 And the bids could be non-unique. 1451 01:08:38,899 --> 01:08:43,580 Two people could have the same bid. 1452 01:08:43,580 --> 01:08:46,850 And by our definition of a set data structure, 1453 01:08:46,850 --> 01:08:48,149 it had to have unique keys. 1454 01:08:48,149 --> 01:08:50,870 So if I stuck in all of these things keyed by bidder, 1455 01:08:50,870 --> 01:08:53,510 you've got a problem. 1456 01:08:53,510 --> 01:08:54,979 Now, in actuality, we can get away 1457 01:08:54,979 --> 01:08:57,439 with that by storing, basically, a linked 1458 01:08:57,439 --> 01:09:00,529 list of all of the things with that key, and we would be fine. 1459 01:09:03,109 --> 01:09:06,260 And then, whenever I want to return one, I could just do it. 1460 01:09:06,260 --> 01:09:10,550 But actually, a binary tree actually 1461 01:09:10,550 --> 01:09:12,950 is flexible enough that, in most implementations, 1462 01:09:12,950 --> 01:09:15,920 you can just store a bunch of those things. 1463 01:09:15,920 --> 01:09:20,840 But, actually, our run times do worse. 1464 01:09:20,840 --> 01:09:24,439 What does it mean to find-next in my sequence? 1465 01:09:24,439 --> 01:09:29,750 What does it mean to return the next larger 1466 01:09:29,750 --> 01:09:32,029 thing above this key?
1467 01:09:32,029 --> 01:09:33,859 Doesn't really make sense, because there 1468 01:09:33,859 --> 01:09:34,819 could be multiple ones. 1469 01:09:34,819 --> 01:09:36,200 Which one do I return? 1470 01:09:36,200 --> 01:09:41,270 And if I repeatedly do find-next on this data structure, 1471 01:09:41,270 --> 01:09:44,720 I might not loop through all of the things. 1472 01:09:44,720 --> 01:09:49,170 So some stuff breaks down in our interface. 1473 01:09:49,170 --> 01:09:54,980 So I would prefer you use unique keys in this kind of situation. 1474 01:09:54,980 --> 01:09:58,580 Next Tuesday, I think, with binary heaps, 1475 01:09:58,580 --> 01:10:02,720 we'll deal with non-unique keys. 1476 01:10:02,720 --> 01:10:04,230 That's fine. 1477 01:10:04,230 --> 01:10:09,440 But if you're going to use non-unique keys in here, 1478 01:10:09,440 --> 01:10:11,480 you've just got to be a little bit 1479 01:10:11,480 --> 01:10:13,920 careful about the semantics. 1480 01:10:13,920 --> 01:10:14,420 Yeah? 1481 01:10:14,420 --> 01:10:15,410 AUDIENCE: [INAUDIBLE]? 1482 01:10:24,500 --> 01:10:27,620 JASON KU: You would get the same running 1483 01:10:27,620 --> 01:10:29,960 time-- you have to change the semantics 1484 01:10:29,960 --> 01:10:32,510 on what you mean by "find something." 1485 01:10:32,510 --> 01:10:35,570 I just want to return anything with this key. 1486 01:10:35,570 --> 01:10:37,490 AUDIENCE: What if everything has the same key. 1487 01:10:37,490 --> 01:10:38,190 Then-- 1488 01:10:38,190 --> 01:10:39,900 JASON KU: Then it takes constant time. 1489 01:10:39,900 --> 01:10:41,150 I just return the first thing. 1490 01:10:45,110 --> 01:10:48,410 I mean, these are special cases that you 1491 01:10:48,410 --> 01:10:51,140 have to think about, right? 1492 01:10:51,140 --> 01:10:52,920 I don't like thinking about them. 1493 01:10:52,920 --> 01:10:55,460 So I just like having unique keys. 
1494 01:10:55,460 --> 01:10:59,570 And if I want a situation where I have non-unique keys, 1495 01:10:59,570 --> 01:11:02,870 I'm going to basically put collisions at that key 1496 01:11:02,870 --> 01:11:04,910 into a new data structure. 1497 01:11:04,910 --> 01:11:06,540 It's just easier for me to separate out 1498 01:11:06,540 --> 01:11:08,300 in my head on what's happening. 1499 01:11:08,300 --> 01:11:11,030 Because, all of the running times that we proposed, 1500 01:11:11,030 --> 01:11:14,540 there's very strong definitions for unique key. 1501 01:11:14,540 --> 01:11:16,970 When you're dealing with a multi-set, 1502 01:11:16,970 --> 01:11:20,150 it's a little bit more prevalent. 1503 01:11:20,150 --> 01:11:21,890 Any other questions? 1504 01:11:21,890 --> 01:11:25,910 We really need to kind of move on here, right? 1505 01:11:25,910 --> 01:11:27,650 Dictionary keyed on bidder. 1506 01:11:27,650 --> 01:11:31,670 We still haven't implemented any dynamic operations. 1507 01:11:31,670 --> 01:11:32,360 New bid. 1508 01:11:32,360 --> 01:11:34,890 What do I do? 1509 01:11:34,890 --> 01:11:38,030 What am I going to need for my update? 1510 01:11:38,030 --> 01:11:42,020 I'm going to be able to need to essentially find, 1511 01:11:42,020 --> 01:11:44,120 in each of these data structures, 1512 01:11:44,120 --> 01:11:47,510 where that bidder is. 1513 01:11:47,510 --> 01:11:51,890 And if I just have a thing keyed on their bid, 1514 01:11:51,890 --> 01:11:54,770 the interface doesn't tell me what their old bid was. 1515 01:11:54,770 --> 01:11:57,200 It just tells me what their bidder ID is. 1516 01:11:57,200 --> 01:12:02,030 So if I just had their bidder ID and their new bid, how the heck 1517 01:12:02,030 --> 01:12:04,430 am I going to find out which of these data-- where 1518 01:12:04,430 --> 01:12:07,190 in these data structures they are? 
1519 01:12:07,190 --> 01:12:11,630 What I can do is, I can store, in this dictionary-- 1520 01:12:11,630 --> 01:12:15,680 which I can look up in some amount of time-- 1521 01:12:15,680 --> 01:12:18,050 a pointer to where it exists in these things. 1522 01:12:18,050 --> 01:12:20,690 Does that make sense? 1523 01:12:20,690 --> 01:12:21,920 This is called cross-linking. 1524 01:12:21,920 --> 01:12:24,800 You may have done that a little bit in problem 1525 01:12:24,800 --> 01:12:26,940 set 2, or something like that. 1526 01:12:26,940 --> 01:12:27,440 Yeah? 1527 01:12:27,440 --> 01:12:30,590 AUDIENCE: We're storing a pointer to a specific bidder? 1528 01:12:30,590 --> 01:12:32,780 JASON KU: Yeah, exactly. 1529 01:12:32,780 --> 01:12:34,178 The invariant we have is that all 1530 01:12:34,178 --> 01:12:35,720 of the bidders we've processed so far 1531 01:12:35,720 --> 01:12:38,420 exist in these data structures-- 1532 01:12:38,420 --> 01:12:41,060 in one of these data structures-- 1533 01:12:41,060 --> 01:12:43,850 because we've used a set AVL tree. 1534 01:12:43,850 --> 01:12:45,800 In particular, it exists in a node of one 1535 01:12:45,800 --> 01:12:47,660 of these data structures. 1536 01:12:47,660 --> 01:12:50,780 What we can do is, in this thing, 1537 01:12:50,780 --> 01:12:56,480 maintain pointers mapping each of the bidder IDs 1538 01:12:56,480 --> 01:12:58,910 to their location in these data structures. 1539 01:12:58,910 --> 01:13:00,920 And why is that going to be a useful thing? 1540 01:13:04,220 --> 01:13:06,290 Say I map this dictionary. 1541 01:13:06,290 --> 01:13:07,790 What could I use for this dictionary 1542 01:13:07,790 --> 01:13:10,580 to get the running time we're looking for? 1543 01:13:10,580 --> 01:13:13,340 I could use a hash table or a set AVL. 1544 01:13:13,340 --> 01:13:16,520 If it's a set AVL, I'm going to get logarithmic time, worst 1545 01:13:16,520 --> 01:13:17,705 case.
1546 01:13:17,705 --> 01:13:19,580 With a hash table, I'm getting constant time, 1547 01:13:19,580 --> 01:13:21,120 but it's expected. 1548 01:13:21,120 --> 01:13:24,920 So it could be linear time in the worst case. 1549 01:13:24,920 --> 01:13:27,020 We're going to use a set AVL tree, because that's 1550 01:13:27,020 --> 01:13:30,330 what we do right now. 1551 01:13:30,330 --> 01:13:32,810 And that's going to give us worst case. 1552 01:13:32,810 --> 01:13:35,060 What I'm going to do is, for each one of these things, 1553 01:13:35,060 --> 01:13:36,393 I'm going to store that pointer. 1554 01:13:36,393 --> 01:13:39,683 What I'm going to do is, first, I'm 1555 01:13:39,683 --> 01:13:41,100 going to do that operation we had. 1556 01:13:41,100 --> 01:13:49,100 If I'm adding a new bidder, I'm going to take the D and B-- 1557 01:13:49,100 --> 01:13:53,840 these two values, that object, that bidder object, 1558 01:13:53,840 --> 01:13:54,820 or whatever-- 1559 01:13:54,820 --> 01:13:58,040 I'm going to look at the smallest thing in this data 1560 01:13:58,040 --> 01:14:06,470 structure, see if its bid is bigger than the thing 1561 01:14:06,470 --> 01:14:07,683 I'm inserting. 1562 01:14:07,683 --> 01:14:10,100 If it is, then I'm not going to touch this data structure. 1563 01:14:10,100 --> 01:14:11,840 I'm just going to insert it in here. 1564 01:14:11,840 --> 01:14:13,992 And now, after I insert in here, I 1565 01:14:13,992 --> 01:14:15,950 know exactly where it is in the data structure. 1566 01:14:15,950 --> 01:14:19,370 I just inserted it. 1567 01:14:19,370 --> 01:14:22,010 So now, holding that in my hand-- 1568 01:14:22,010 --> 01:14:27,680 the node-- I can go and insert that bidder into here 1569 01:14:27,680 --> 01:14:29,990 by bidder ID. 1570 01:14:29,990 --> 01:14:31,820 So it's going to take logarithmic time. 1571 01:14:31,820 --> 01:14:36,410 And now I can store, with that node, my pointer 1572 01:14:36,410 --> 01:14:37,565 to this data structure.
1573 01:14:37,565 --> 01:14:38,820 Does that make sense? 1574 01:14:38,820 --> 01:14:41,780 And in the other case, I kind of do the same thing. 1575 01:14:41,780 --> 01:14:45,050 If it's bigger than the smallest thing here, I pop that smaller 1576 01:14:45,050 --> 01:14:48,620 thing out, stick it in there, and I stick my new guy in here, 1577 01:14:48,620 --> 01:14:52,860 cross-linking each of those pointers along the way. 1578 01:14:52,860 --> 01:14:57,470 Does that make sense, hopefully? 1579 01:14:57,470 --> 01:14:59,280 Kind of? 1580 01:14:59,280 --> 01:14:59,780 Kind of? 1581 01:14:59,780 --> 01:15:00,860 OK. 1582 01:15:00,860 --> 01:15:03,470 And for update, very similar. 1583 01:15:03,470 --> 01:15:06,050 If I want to update a certain bidder, 1584 01:15:06,050 --> 01:15:11,550 I look in this data structure, find the bidder, 1585 01:15:11,550 --> 01:15:13,440 traverse that pointer to wherever 1586 01:15:13,440 --> 01:15:16,980 it is in one of these AVL trees, right? 1587 01:15:16,980 --> 01:15:22,050 If it's in this one, I just remove it from the tree, 1588 01:15:22,050 --> 01:15:23,790 or I remove it from the tree and then 1589 01:15:23,790 --> 01:15:29,070 I re-insert with whatever the new bid is. 1590 01:15:29,070 --> 01:15:32,010 And if it's in this one, again, I 1591 01:15:32,010 --> 01:15:35,190 remove it from the tree, re-insert in whichever 1592 01:15:35,190 --> 01:15:36,840 of these things is, and then I might 1593 01:15:36,840 --> 01:15:39,720 have to swap a constant number of things back 1594 01:15:39,720 --> 01:15:43,020 and forth here to maintain that this has the k highest. 1595 01:15:43,020 --> 01:15:47,610 And when I do those dynamic operations, 1596 01:15:47,610 --> 01:15:50,370 I'm always removing some constant number 1597 01:15:50,370 --> 01:15:53,340 of nodes in each of these trees and adding back 1598 01:15:53,340 --> 01:15:55,213 in a constant number of things.
1599 01:15:55,213 --> 01:15:57,630 And while I do that, I just make sure to update this total 1600 01:15:57,630 --> 01:15:58,980 as I go. 1601 01:15:58,980 --> 01:16:04,680 This total was the sum of all of the bids in here. 1602 01:16:04,680 --> 01:16:07,320 And if I insert a new bid in here, 1603 01:16:07,320 --> 01:16:09,520 I have to add to that total. 1604 01:16:09,520 --> 01:16:11,858 And if I remove one, I have to remove from that total. 1605 01:16:11,858 --> 01:16:13,650 But again, it's a constant number of things 1606 01:16:13,650 --> 01:16:15,900 I'm moving in and out of these data structures, 1607 01:16:15,900 --> 01:16:18,060 and so I can update this in constant time. 1608 01:16:18,060 --> 01:16:19,800 Does that make sense? 1609 01:16:19,800 --> 01:16:23,760 Now, the lookup here, and the insertion and deletion 1610 01:16:23,760 --> 01:16:29,130 in here, those each took logarithmic time, worst case. 1611 01:16:29,130 --> 01:16:30,850 But I did a constant number of them. 1612 01:16:30,850 --> 01:16:32,780 So again, logarithmic time. 1613 01:16:32,780 --> 01:16:33,930 Does that make sense? 1614 01:16:33,930 --> 01:16:37,680 That's, essentially, this problem. 1615 01:16:37,680 --> 01:16:39,280 It's difficult, right? 1616 01:16:39,280 --> 01:16:41,470 There's a lot of moving parts here. 1617 01:16:41,470 --> 01:16:44,610 But if you just break it up and describe to me-- 1618 01:16:44,610 --> 01:16:47,280 like, you really do a good job on this part, 1619 01:16:47,280 --> 01:16:50,790 describe well to me what your data structure has, then 1620 01:16:50,790 --> 01:16:52,380 those descriptions of those algorithms 1621 01:16:52,380 --> 01:16:53,920 can be pretty brief, actually.
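The whole bidding structure walked through above can be sketched as follows, under some loud assumptions. The class `BidDB` and its helper names are hypothetical. Python lists kept sorted with `bisect` stand in for the two set AVL trees, so insertions cost linear time here instead of logarithmic. Entries are keyed on the pair `(bid, bidder_id)`, which is one way to make keys unique when bids tie. In place of a node pointer, the cross-link records which structure holds the bidder and their current bid.

```python
import bisect


class BidDB:
    """Hypothetical sketch: k highest bids, the rest, cross-links, and total t."""

    def __init__(self, k):
        self.k = k
        self.top = []    # sorted (bid, bidder_id): the k highest bids
        self.rest = []   # sorted (bid, bidder_id): everyone else
        self.where = {}  # cross-link: bidder_id -> (list holding them, bid)
        self.t = 0       # invariant: sum of the bids stored in self.top

    def _add(self, lst, d, b):
        bisect.insort(lst, (b, d))
        self.where[d] = (lst, b)
        if lst is self.top:
            self.t += b

    def _remove(self, d):
        lst, b = self.where.pop(d)                # follow the cross-link
        lst.pop(bisect.bisect_left(lst, (b, d)))  # exact position of (b, d)
        if lst is self.top:
            self.t -= b

    def _rebalance(self):
        # Pull up the best of rest if top has room or it beats top's smallest.
        if self.rest and (len(self.top) < self.k or
                          (self.top and self.rest[-1] > self.top[0])):
            b, d = self.rest[-1]
            self._remove(d)
            self._add(self.top, d, b)
        # Push down top's smallest if top is now over capacity.
        if len(self.top) > self.k:
            b, d = self.top[0]
            self._remove(d)
            self._add(self.rest, d, b)

    def new_bid(self, d, b):
        self._add(self.rest, d, b)
        self._rebalance()

    def update_bid(self, d, b):
        self._remove(d)
        self._add(self.rest, d, b)
        self._rebalance()

    def get_revenue(self):
        return self.t
```

Each operation moves only a constant number of entries between the two structures, which is why swapping in real set AVL trees would give logarithmic worst-case updates while `get_revenue` stays constant time.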
1622 01:16:57,120 --> 01:16:59,940 In this one, you tell me these three data structures, 1623 01:16:59,940 --> 01:17:03,630 you tell me this guy's mapping to its location 1624 01:17:03,630 --> 01:17:07,770 and these things, I'm maintaining this guy, 1625 01:17:07,770 --> 01:17:11,520 and then you just maintain those things with dynamic operations 1626 01:17:11,520 --> 01:17:14,910 and then use those things for query operations. 1627 01:17:14,910 --> 01:17:15,900 Does that make sense? 1628 01:17:19,630 --> 01:17:20,960 Wow, we have 10 more minutes? 1629 01:17:25,810 --> 01:17:32,830 I'm going to briefly do 4-4 for you. 1630 01:17:32,830 --> 01:17:33,700 Receiver roster. 1631 01:17:33,700 --> 01:17:36,880 We've got a coach. 1632 01:17:36,880 --> 01:17:38,980 She's got a bunch of football players-- 1633 01:17:38,980 --> 01:17:41,650 receivers. 1634 01:17:41,650 --> 01:17:45,790 And wanting to start on her team, 1635 01:17:45,790 --> 01:17:50,380 some number of players that have the highest performance. 1636 01:17:50,380 --> 01:17:52,930 And by performance, we mean the average number 1637 01:17:52,930 --> 01:17:54,970 of points they've played in games that they 1638 01:17:54,970 --> 01:17:56,980 have logged in their system. 1639 01:17:56,980 --> 01:18:00,280 But actually, their data is incomplete. 1640 01:18:00,280 --> 01:18:02,680 They don't know which games, and how much they scored, 1641 01:18:02,680 --> 01:18:03,680 and all of these things. 1642 01:18:03,680 --> 01:18:05,090 There could be errors. 1643 01:18:05,090 --> 01:18:08,320 And so these interns, they're constantly 1644 01:18:08,320 --> 01:18:15,370 updating this database with queries like, oh, never mind, 1645 01:18:15,370 --> 01:18:17,980 this person didn't play in this game, 1646 01:18:17,980 --> 01:18:21,850 or actually, they did, and they scored this number of points. 1647 01:18:21,850 --> 01:18:25,780 That's the-- clear and record things. 
1648 01:18:25,780 --> 01:18:28,360 And then, at some point in time, like 1649 01:18:28,360 --> 01:18:31,870 when we want to play a game, I want 1650 01:18:31,870 --> 01:18:36,760 to be able to return the jersey with the kth highest 1651 01:18:36,760 --> 01:18:40,270 performance in log n time. 1652 01:18:40,270 --> 01:18:45,370 This is kind of a rank query, right? 1653 01:18:45,370 --> 01:18:46,135 The kth highest. 1654 01:18:48,670 --> 01:18:53,500 Now, in actuality, I might want to return all k highest players 1655 01:18:53,500 --> 01:18:56,050 so that that might be my roster. 1656 01:18:56,050 --> 01:19:00,070 But this is a more generalized query. 1657 01:19:00,070 --> 01:19:02,000 It's more specific, more-- 1658 01:19:02,000 --> 01:19:04,030 it's not really comparable. 1659 01:19:04,030 --> 01:19:07,358 But you get an idea for why that might be useful to the coach. 1660 01:19:07,358 --> 01:19:07,900 I don't know. 1661 01:19:07,900 --> 01:19:10,090 Maybe not. 1662 01:19:10,090 --> 01:19:11,667 So what's the idea here? 1663 01:19:11,667 --> 01:19:13,750 We have a lot of different things floating around. 1664 01:19:13,750 --> 01:19:15,280 We've got games. 1665 01:19:15,280 --> 01:19:17,180 They have IDs-- unique IDs. 1666 01:19:17,180 --> 01:19:18,040 We've got receivers. 1667 01:19:18,040 --> 01:19:19,630 They have unique IDs. 1668 01:19:19,630 --> 01:19:23,890 And each receiver could play in many games. 1669 01:19:23,890 --> 01:19:26,590 Oh, that's kind of worrisome. 1670 01:19:26,590 --> 01:19:30,990 And many receivers could play in the same game. 1671 01:19:30,990 --> 01:19:32,680 These kind of many-to-one mappings 1672 01:19:32,680 --> 01:19:34,490 are a little confusing. 1673 01:19:34,490 --> 01:19:39,110 And then we've got each player-- 1674 01:19:39,110 --> 01:19:42,200 receiver-- having a certain number of points per game. 
1675 01:19:42,200 --> 01:19:45,930 And we're trying to sort them, kind of, by their performance, 1676 01:19:45,930 --> 01:19:47,944 which is a rational number. 1677 01:19:47,944 --> 01:19:50,480 Ugh, right? 1678 01:19:50,480 --> 01:19:52,880 Which has to do with the number of games they've played 1679 01:19:52,880 --> 01:19:55,880 and the total number of points. 1680 01:19:55,880 --> 01:19:59,420 Now, I see rational number, I can't compute that. 1681 01:19:59,420 --> 01:20:02,990 That's what we're talking about last problem session, right? 1682 01:20:02,990 --> 01:20:04,448 But what I can do is, I could store 1683 01:20:04,448 --> 01:20:05,990 the total number of games they played 1684 01:20:05,990 --> 01:20:07,730 and the total number of points they have. 1685 01:20:07,730 --> 01:20:09,830 And you can imagine, by augmentation 1686 01:20:09,830 --> 01:20:14,690 similar to this, every time I add a game, one 1687 01:20:14,690 --> 01:20:18,980 of these small operations, I can update that information 1688 01:20:18,980 --> 01:20:19,700 for each player. 1689 01:20:22,850 --> 01:20:25,040 If one of these dynamic operations 1690 01:20:25,040 --> 01:20:27,440 is affecting only one receiver, I 1691 01:20:27,440 --> 01:20:31,550 can update whatever it is in constant time, 1692 01:20:31,550 --> 01:20:35,060 probably, if I just store with the player what 1693 01:20:35,060 --> 01:20:38,732 their total number of games is as recorded by the database 1694 01:20:38,732 --> 01:20:40,190 and how many points they've scored. 1695 01:20:40,190 --> 01:20:43,070 Then, if I have a data structure that 1696 01:20:43,070 --> 01:20:47,790 needs to sort the receivers by their performance 1697 01:20:47,790 --> 01:20:50,210 so I might be able to find the kth one-- 1698 01:20:50,210 --> 01:20:56,740 the kth largest-- then I can't compute that performance. 1699 01:20:56,740 --> 01:20:59,290 But what can I do? 
1700 01:20:59,290 --> 01:21:02,820 I can compare two players based on their performance using 1701 01:21:02,820 --> 01:21:05,430 cross-multiplication. 1702 01:21:05,430 --> 01:21:07,740 Because I have the numerator and denominator 1703 01:21:07,740 --> 01:21:09,240 of each of these rationals and I can 1704 01:21:09,240 --> 01:21:11,910 cross-multiply and figure out whether one's 1705 01:21:11,910 --> 01:21:13,330 bigger or smaller. 1706 01:21:13,330 --> 01:21:15,030 And as long as I have a comparator, 1707 01:21:15,030 --> 01:21:17,850 I can do set AVL stuff. 1708 01:21:17,850 --> 01:21:19,120 Does that make sense? 1709 01:21:19,120 --> 01:21:20,610 OK. 1710 01:21:20,610 --> 01:21:25,440 I'm just going to outline kind of the components of this data 1711 01:21:25,440 --> 01:21:26,240 structure. 1712 01:21:28,770 --> 01:21:32,730 Well, first off, I'm going to need to record a receiver. 1713 01:21:32,730 --> 01:21:34,780 And a receiver could have a lot of games. 1714 01:21:34,780 --> 01:21:37,650 But the important-- this is kind of a receiver-centric kind 1715 01:21:37,650 --> 01:21:38,230 of problem. 1716 01:21:38,230 --> 01:21:39,930 Does that make sense to you guys? 1717 01:21:39,930 --> 01:21:42,960 I'm not ever wanting to filter on all of the receivers 1718 01:21:42,960 --> 01:21:44,490 playing a game. 1719 01:21:44,490 --> 01:21:50,490 I'm never removing a game from the system, 1720 01:21:50,490 --> 01:21:53,040 I'm removing a receiver from ever 1721 01:21:53,040 --> 01:21:54,420 playing in a specific game. 1722 01:21:54,420 --> 01:21:55,630 Does that make sense? 1723 01:21:55,630 --> 01:21:59,430 So if I'm storing a receiver, and each receiver 1724 01:21:59,430 --> 01:22:01,980 has some games associated with them, 1725 01:22:01,980 --> 01:22:05,370 kind of makes sense I might want to have a nested data 1726 01:22:05,370 --> 01:22:08,160 structure where with-- 1727 01:22:08,160 --> 01:22:11,550 maybe I have a dictionary on receivers.
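The cross-multiplication comparison mentioned above can be written out as a short Python sketch. The function name `compare_performance` and the `(points, games)` pair layout are assumptions for illustration; the only requirement is that each game count is positive. It decides whether one average is larger than another using integer products only, so no rational number is ever computed.

```python
from functools import cmp_to_key


def compare_performance(a, b):
    """Compare two (points, games) pairs by points/games, with games > 0.

    Returns -1, 0, or 1, like a classic three-way comparator.
    """
    pa, ga = a
    pb, gb = b
    # pa/ga < pb/gb  is equivalent to  pa*gb < pb*ga  when ga, gb > 0.
    lhs, rhs = pa * gb, pb * ga
    return (lhs > rhs) - (lhs < rhs)


# Sorting by performance using only the comparator, never a division.
players = [(7, 3), (5, 2), (9, 4)]
ranked = sorted(players, key=cmp_to_key(compare_performance))
```

Any comparison-based structure, a set AVL tree included, only needs a comparator like this one to keep its items in order.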
1728 01:22:11,550 --> 01:22:14,400 And for each one, I store all the games 1729 01:22:14,400 --> 01:22:17,220 that they've played in some other data structure. 1730 01:22:17,220 --> 01:22:20,347 With each receiver, I store another-- its own data 1731 01:22:20,347 --> 01:22:21,930 structure containing all of its games. 1732 01:22:21,930 --> 01:22:23,640 Does that make sense? 1733 01:22:23,640 --> 01:22:26,280 OK, so that's the idea. 1734 01:22:26,280 --> 01:22:27,540 We have some kind of-- 1735 01:22:27,540 --> 01:22:31,590 I need to be able to look up receivers, 1736 01:22:31,590 --> 01:22:35,970 because I'm clearing them or I'm recording them. 1737 01:22:35,970 --> 01:22:40,320 So I'm going to have a dictionary or-- 1738 01:22:40,320 --> 01:22:42,970 here, I'm looking for worst-case log n time. 1739 01:22:42,970 --> 01:22:45,870 So I'm going to skip the dictionary abstraction 1740 01:22:45,870 --> 01:22:49,260 and go straight for the set AVL. 1741 01:22:49,260 --> 01:23:00,150 AVL keyed on receivers. 1742 01:23:00,150 --> 01:23:06,730 I before E, except after C. It is I-- 1743 01:23:06,730 --> 01:23:07,230 E-I? 1744 01:23:10,650 --> 01:23:11,850 That rule never works. 1745 01:23:11,850 --> 01:23:13,200 OK. 1746 01:23:13,200 --> 01:23:17,940 Set AVL tree on receivers, and each one of those nodes 1747 01:23:17,940 --> 01:23:23,010 with each one of those receivers, I'm going to store-- 1748 01:23:23,010 --> 01:23:32,250 for each, store a set AVL on games. 1749 01:23:36,330 --> 01:23:38,460 Why do I store a set AVL on games? 1750 01:23:38,460 --> 01:23:42,480 Why don't I just store a list of all of the games? 1751 01:23:42,480 --> 01:23:45,360 Because if I want to remove this game from a receiver, 1752 01:23:45,360 --> 01:23:47,250 I need to do that in log n time.
1753 01:23:47,250 --> 01:23:51,520 And here, what we're saying is that n is the number of games, 1754 01:23:51,520 --> 01:23:54,000 but that the number of receivers on the team 1755 01:23:54,000 --> 01:23:56,760 is always less than the number of games. 1756 01:23:56,760 --> 01:24:02,040 If I search in this AVL tree and I search in its AVL tree, 1757 01:24:02,040 --> 01:24:07,050 I can be assured that those two searches was only log n time. 1758 01:24:07,050 --> 01:24:08,840 Because I need to remove game, right? 1759 01:24:08,840 --> 01:24:11,710 So there you go. 1760 01:24:11,710 --> 01:24:13,290 Then what am I doing? 1761 01:24:13,290 --> 01:24:15,550 I'm returning the kth highest performance. 1762 01:24:15,550 --> 01:24:21,100 Well, I need-- with each one of these guys, I also store-- 1763 01:24:21,100 --> 01:24:22,630 what was this augmentation? 1764 01:24:22,630 --> 01:24:28,030 The sum of the points stored in these games. 1765 01:24:28,030 --> 01:24:32,755 Sum of points and-- 1766 01:24:35,350 --> 01:24:40,000 what was it-- number games. 1767 01:24:40,000 --> 01:24:43,130 Because if I store both of those things in constant time, 1768 01:24:43,130 --> 01:24:46,450 I'm going to be able to compute their performance, 1769 01:24:46,450 --> 01:24:49,000 where I'm going to be able to have the data 1770 01:24:49,000 --> 01:24:51,290 I need to compare performances. 1771 01:24:51,290 --> 01:24:52,600 AUDIENCE: [INAUDIBLE]? 1772 01:24:52,600 --> 01:24:53,530 JASON KU: Yeah, it is. 1773 01:24:53,530 --> 01:24:54,970 Just numbers. 1774 01:24:54,970 --> 01:24:56,090 These are data structures. 1775 01:24:56,090 --> 01:24:56,970 This is a data structure. 1776 01:24:56,970 --> 01:24:57,928 These are just numbers. 1777 01:25:00,400 --> 01:25:02,190 And I'm storing that with each receiver. 1778 01:25:05,110 --> 01:25:08,860 But that's not going to help me find the kth highest player. 1779 01:25:08,860 --> 01:25:13,580 None of these things are sorted by performance. 
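The nested structure and its two per-receiver numbers might look like this in Python. Treat it as a sketch under assumptions: the class `ReceiverStats` and the signatures `record(r, g, p)` and `clear(r, g)` are hypothetical, and built-in dictionaries stand in for both the outer set AVL tree on receivers and each inner set AVL tree on games, so the bounds are expected rather than worst case.

```python
class ReceiverStats:
    """Hypothetical nested structure: receiver -> games, plus two totals."""

    def __init__(self):
        # receiver_id -> {"games": {game_id: points}, "points": int, "count": int}
        self.receivers = {}

    def record(self, r, g, p):
        rec = self.receivers.setdefault(
            r, {"games": {}, "points": 0, "count": 0})
        old = rec["games"].get(g)
        if old is not None:      # re-recording a game replaces its old score
            rec["points"] -= old
            rec["count"] -= 1
        rec["games"][g] = p
        rec["points"] += p       # augmentation: sum of points
        rec["count"] += 1        # augmentation: number of games

    def clear(self, r, g):
        rec = self.receivers[r]
        p = rec["games"].pop(g)  # raises KeyError if the game is not logged
        rec["points"] -= p
        rec["count"] -= 1
        if rec["count"] == 0:
            del self.receivers[r]  # a receiver with no games is dropped

    def totals(self, r):
        rec = self.receivers[r]
        return rec["points"], rec["count"]
```

Each dynamic operation touches one receiver and one game, so the two totals can be adjusted in constant time once the lookups are done.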
1780 01:25:13,580 --> 01:25:15,635 So I need a last data structure. 1781 01:25:22,760 --> 01:25:31,130 Five, I need to store something dynamically sorted 1782 01:25:31,130 --> 01:25:32,970 by performance. 1783 01:25:32,970 --> 01:25:33,720 AUDIENCE: Set AVL? 1784 01:25:33,720 --> 01:25:35,810 JASON KU: Set AVL, yeah. 1785 01:25:35,810 --> 01:25:50,940 Set AVL storing receivers keyed on performance. 1786 01:25:50,940 --> 01:25:55,580 Now, when I say keyed on performance, 1787 01:25:55,580 --> 01:25:57,200 you want to mention something about 1788 01:25:57,200 --> 01:25:58,910 the cross-multiplication. 1789 01:25:58,910 --> 01:26:01,220 Like, I'm storing, with each one of these things, 1790 01:26:01,220 --> 01:26:04,070 this augmentation, and when I'm comparing two things, 1791 01:26:04,070 --> 01:26:05,900 I'm using cross-multiplication. 1792 01:26:05,900 --> 01:26:08,840 But other than that, then we can abstract it away, right? 1793 01:26:08,840 --> 01:26:10,700 We've abstracted that function call. 1794 01:26:10,700 --> 01:26:13,700 And I can imagine comparing two keys. 1795 01:26:13,700 --> 01:26:14,992 I can do this. 1796 01:26:14,992 --> 01:26:15,950 This is a theory thing. 1797 01:26:15,950 --> 01:26:18,200 I'm not asking you to implement that. 1798 01:26:18,200 --> 01:26:22,280 But that's sufficient for me, as a reader of your solution, 1799 01:26:22,280 --> 01:26:26,810 to be able to say, yeah, you know what you're talking about. 1800 01:26:26,810 --> 01:26:28,100 All right. 1801 01:26:28,100 --> 01:26:30,050 How do I connect these things? 1802 01:26:30,050 --> 01:26:32,480 The thing is, I'm going to need to be-- 1803 01:26:32,480 --> 01:26:38,800 I need to update these things when I insert or remove a game. 1804 01:26:41,360 --> 01:26:47,600 So how do I know where these receivers are in this thing? 1805 01:26:47,600 --> 01:26:50,690 I store a pointer into this data structure, right?
1806 01:26:50,690 --> 01:26:57,710 So up here, I store a pointer to where 1807 01:26:57,710 --> 01:26:59,570 it is in the data structure. 1808 01:26:59,570 --> 01:27:01,280 Again, I'm storing all of the receivers. 1809 01:27:01,280 --> 01:27:04,070 This has the same size as the number 1 data 1810 01:27:04,070 --> 01:27:06,530 structure up there-- has the same number of receivers. 1811 01:27:09,590 --> 01:27:13,220 But we're not quite done yet, because I'm not 1812 01:27:13,220 --> 01:27:21,350 wanting to know who has the best performance. 1813 01:27:21,350 --> 01:27:25,170 I want to know who has the kth best performance. 1814 01:27:25,170 --> 01:27:27,410 Ugh. 1815 01:27:27,410 --> 01:27:30,450 How do I find the kth best thing in this tree? 1816 01:27:30,450 --> 01:27:31,220 I've got a tree. 1817 01:27:34,070 --> 01:27:35,030 Set AVL tree. 1818 01:27:35,030 --> 01:27:37,130 It's mapped on performance. 1819 01:27:37,130 --> 01:27:38,990 I know where the last one is. 1820 01:27:38,990 --> 01:27:41,390 But if I want to find the kth one from the end, 1821 01:27:41,390 --> 01:27:44,360 how do I do that? 1822 01:27:44,360 --> 01:27:45,890 It's an AVL tree-- 1823 01:27:45,890 --> 01:27:47,300 a set AVL tree. 1824 01:27:47,300 --> 01:27:50,070 All I'm storing is heights. 1825 01:27:50,070 --> 01:27:54,690 Is there an operation that you've thought about? 1826 01:27:54,690 --> 01:27:58,790 AUDIENCE: [INAUDIBLE] you're not storing the size of each. 1827 01:27:58,790 --> 01:28:00,770 JASON KU: A set AVL tree, by default, 1828 01:28:00,770 --> 01:28:02,300 does not store sizes, right? 1829 01:28:02,300 --> 01:28:04,220 That's what a sequence does. 1830 01:28:04,220 --> 01:28:08,370 But you think maybe that would be helpful in this situation? 1831 01:28:08,370 --> 01:28:08,870 Yeah. 
1832 01:28:08,870 --> 01:28:12,500 So, actually, if I decided to augment by sizes also, 1833 01:28:12,500 --> 01:28:17,240 I could do the exact same kind of sequence find_at operation, 1834 01:28:17,240 --> 01:28:22,670 and I could be able to look up the n minus kth item 1835 01:28:22,670 --> 01:28:28,190 in here using the exact same function for subtree at that I 1836 01:28:28,190 --> 01:28:32,000 had in the sequence AVL tree stuff. 1837 01:28:32,000 --> 01:28:33,920 Actually, in CLRS, they don't even 1838 01:28:33,920 --> 01:28:35,930 bother with sequence AVL trees. 1839 01:28:35,930 --> 01:28:39,710 They go straight to, if I wanted this rank-find functionality 1840 01:28:39,710 --> 01:28:44,180 on a sorted order of things, then 1841 01:28:44,180 --> 01:28:47,330 I could augment the subtree sizes. 1842 01:28:47,330 --> 01:28:51,320 But it's actually a much more useful general property, 1843 01:28:51,320 --> 01:28:54,500 so we decided to present it to you in the context of sequence 1844 01:28:54,500 --> 01:28:56,780 AVL trees, because then I can just basically reduce 1845 01:28:56,780 --> 01:28:58,520 to it when I get to here. 1846 01:28:58,520 --> 01:29:01,130 So that's kind of a structure of a data structure 1847 01:29:01,130 --> 01:29:02,420 that works on this problem. 1848 01:29:02,420 --> 01:29:04,820 I leave it to you as an exercise to implement 1849 01:29:04,820 --> 01:29:06,920 all of these operations for yourself, 1850 01:29:06,920 --> 01:29:08,790 or take a look at the solutions. 1851 01:29:08,790 --> 01:29:13,460 The last one is going to be put online-- 1852 01:29:13,460 --> 01:29:15,140 the solution. 1853 01:29:15,140 --> 01:29:16,500 It's pretty complicated. 1854 01:29:16,500 --> 01:29:19,490 It's what's called-- you can think of the size augmentation 1855 01:29:19,490 --> 01:29:22,370 finding-rank as a one-sided range query. 1856 01:29:22,370 --> 01:29:27,410 It's basically, how many things are to the right of this value?
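[Editor's sketch] The size-augmented lookup described above might look roughly like this. It is a simplified illustration, not the course's implementation: the tree is built once from sorted keys and never rebalanced, and the names `Node`, `size`, `build`, and `kth_best` are assumptions. Only `subtree_at` echoes the function name used in the session.

```python
class Node:
    """Binary tree node augmented with the size of its subtree."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)


def size(node):
    """Number of nodes in the subtree; an empty subtree has size 0."""
    return node.size if node else 0


def build(keys):
    """Build a balanced tree from already-sorted keys (static, for illustration)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def subtree_at(node, i):
    """Return the node at 0-based in-order index i, in time O(height)."""
    while node:
        left = size(node.left)
        if i < left:
            node = node.left
        elif i == left:
            return node
        else:
            i -= left + 1  # skip the left subtree and this node
            node = node.right
    raise IndexError("index out of range")


def kth_best(root, k):
    """Key of the k-th largest item (k = 1 is the best): index n - k."""
    return subtree_at(root, size(root) - k).key
```

With n items stored in increasing order of performance, the k-th best sits at in-order index n - k, which is the "n minus kth item" lookup. In a real AVL tree the `size` field would also have to be maintained through insertions, deletions, and rotations.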
1857 01:29:27,410 --> 01:29:29,780 What the last problem does is walks you 1858 01:29:29,780 --> 01:29:32,330 through a two-sided range query, where 1859 01:29:32,330 --> 01:29:37,070 I want to know how many nodes are between these two values. 1860 01:29:37,070 --> 01:29:39,150 So it's a walkthrough. 1861 01:29:39,150 --> 01:29:39,650 All right. 1862 01:29:39,650 --> 01:29:40,550 Thanks, guys. 1863 01:29:40,550 --> 01:29:42,100 AUDIENCE: Thank you.
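[Editor's sketch] One way to picture the two-sided range query is as the difference of two one-sided counts on a size-augmented tree. This is not the posted problem set solution, only an illustration under simplifying assumptions: a static tree built from sorted distinct keys, no rebalancing, and `lo <= hi`. All names here (`Node`, `size`, `build`, `count_less`, `count_between`) are hypothetical.

```python
class Node:
    """Binary search tree node augmented with its subtree size."""

    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        self.size = 1 + size(left) + size(right)


def size(node):
    return node.size if node else 0


def build(keys):
    """Build a balanced BST from already-sorted keys (static, for illustration)."""
    if not keys:
        return None
    mid = len(keys) // 2
    return Node(keys[mid], build(keys[:mid]), build(keys[mid + 1:]))


def count_less(node, x, inclusive=False):
    """One-sided query: count keys < x (or <= x if inclusive), in O(height)."""
    count = 0
    while node:
        if node.key < x or (inclusive and node.key == x):
            # This node and its whole left subtree are on the counted side.
            count += size(node.left) + 1
            node = node.right
        else:
            node = node.left
    return count


def count_between(root, lo, hi):
    """Two-sided query: count keys k with lo <= k <= hi (assumes lo <= hi)."""
    return count_less(root, hi, inclusive=True) - count_less(root, lo)
```

Each one-sided count walks a single root-to-leaf path, so on a balanced tree the two-sided count takes logarithmic time regardless of how many keys fall in the range.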