1 00:00:00,000 --> 00:00:12,580 [SQUEAKING] [RUSTLING] [CLICKING] 2 00:00:12,580 --> 00:00:15,970 ERIK DEMAINE: All right, welcome back to 006 Data Structures. 3 00:00:15,970 --> 00:00:19,330 Today, we're going to cover a different kind 4 00:00:19,330 --> 00:00:22,910 of tree-like data structure called a heap-- a binary heap. 5 00:00:22,910 --> 00:00:26,710 It's going to let us solve sorting problem in a new way. 6 00:00:26,710 --> 00:00:30,228 Let me first remind you of a portion-- 7 00:00:30,228 --> 00:00:32,020 the problem we're going to be solving today 8 00:00:32,020 --> 00:00:33,760 is called priority queue. 9 00:00:33,760 --> 00:00:34,803 This is the interface. 10 00:00:34,803 --> 00:00:36,220 We'll see several data structures, 11 00:00:36,220 --> 00:00:40,070 but one main data structure for today. 12 00:00:40,070 --> 00:00:44,650 And this is a subset of the set interface. 13 00:00:47,230 --> 00:00:49,420 And subsets are interesting because, potentially, we 14 00:00:49,420 --> 00:00:53,750 can solve them better, faster, simpler, something. 15 00:00:53,750 --> 00:00:57,880 And so you'll recognize-- you should recognize 16 00:00:57,880 --> 00:01:00,670 all of these operations, except we didn't normally 17 00:01:00,670 --> 00:01:02,930 highlight the max operation. 18 00:01:02,930 --> 00:01:06,190 So here, we're interested in storing a bunch of items. 19 00:01:06,190 --> 00:01:10,060 They have keys which we think of as priorities. 20 00:01:10,060 --> 00:01:13,840 And we want to be able to identify the maximum priority 21 00:01:13,840 --> 00:01:17,350 item in our set and remove it. 22 00:01:17,350 --> 00:01:19,250 And so there's lots of motivations for this. 23 00:01:19,250 --> 00:01:21,370 Maybe you have a router, packets going into the router. 24 00:01:21,370 --> 00:01:23,060 They have different priorities assigned to them. 25 00:01:23,060 --> 00:01:25,090 You want to route the highest priority first. 
26 00:01:25,090 --> 00:01:28,360 Or you have processes on your computer trying 27 00:01:28,360 --> 00:01:32,620 to run on your single threaded-- single core, 28 00:01:32,620 --> 00:01:34,850 and you've got to choose which one to run next. 29 00:01:34,850 --> 00:01:37,840 And you usually run higher priority processes first. 30 00:01:37,840 --> 00:01:42,730 Or you're trying to simulate a system where events happen 31 00:01:42,730 --> 00:01:44,800 at different times, and you want to process 32 00:01:44,800 --> 00:01:46,990 the next event ordered by time. 33 00:01:46,990 --> 00:01:50,470 All of these are examples of the priority queue interface. 34 00:01:50,470 --> 00:01:52,420 We'll even see applications within this class 35 00:01:52,420 --> 00:01:54,160 when we get to graph algorithms. 36 00:01:54,160 --> 00:01:56,740 But the main two things we want to be able to support 37 00:01:56,740 --> 00:01:59,140 are inserting an item, which includes a key, 38 00:01:59,140 --> 00:02:01,930 and deleting the maximum item, and also returning it 39 00:02:01,930 --> 00:02:03,670 at the same time. 40 00:02:03,670 --> 00:02:07,600 We'll also talk some about being able to build the structure 41 00:02:07,600 --> 00:02:09,164 faster than just inserting it. 42 00:02:09,164 --> 00:02:10,539 But of course, we could implement 43 00:02:10,539 --> 00:02:14,260 build by starting empty and repeatedly inserting. 44 00:02:14,260 --> 00:02:17,500 And also the complexity of just finding the max 45 00:02:17,500 --> 00:02:20,410 without deleting it, this you could simulate with these two 46 00:02:20,410 --> 00:02:24,190 operations by deleting the max and then reinserting it, 47 00:02:24,190 --> 00:02:25,930 which works. 48 00:02:25,930 --> 00:02:28,360 But often, we can do faster. 49 00:02:28,360 --> 00:02:32,470 But the two key, main operations are insert and delete max. 50 00:02:32,470 --> 00:02:37,180 And we're going to see a few data structures to do this. 
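The operations just described can be pinned down as a small Python sketch. This is my own illustration, not code from the lecture; the class and method names simply mirror the operations named above. It shows the two reductions mentioned: build by starting empty and repeatedly inserting, and find_max simulated by delete_max followed by reinsertion. The storage is a plain unsorted Python list only to make the sketch runnable.

```python
class PriorityQueue:
    """Sketch of the priority queue interface (names are mine, not the
    lecture's). The backing store here is an unsorted Python list just so
    the sketch runs; real implementations override insert/delete_max."""

    def __init__(self):
        self.A = []

    def insert(self, x):
        self.A.append(x)                 # add item x to the set

    def delete_max(self):
        # linear scan for the max, swap it to the end, then pop it off
        i = max(range(len(self.A)), key=lambda j: self.A[j])
        self.A[i], self.A[-1] = self.A[-1], self.A[i]
        return self.A.pop()

    @classmethod
    def build(cls, X):
        # default build: start empty and repeatedly insert
        pq = cls()
        for x in X:
            pq.insert(x)
        return pq

    def find_max(self):
        # simulate find_max with delete_max followed by reinsertion
        x = self.delete_max()
        self.insert(x)
        return x
```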
51 00:02:37,180 --> 00:02:39,718 Any suggestions among the data structures 52 00:02:39,718 --> 00:02:40,760 we've seen in this class? 53 00:02:40,760 --> 00:02:45,604 What should we use to solve the priority queue interface? 54 00:02:53,840 --> 00:02:54,770 Many possible answers. 55 00:02:57,882 --> 00:02:58,840 AUDIENCE: Sequence AVL? 56 00:02:58,840 --> 00:02:59,965 ERIK DEMAINE: Sequence AVL? 57 00:02:59,965 --> 00:03:01,540 Ooh, that's interesting! 58 00:03:01,540 --> 00:03:07,030 Sequence AVL is a good answer, but maybe the fancier version. 59 00:03:07,030 --> 00:03:07,830 Yeah? 60 00:03:07,830 --> 00:03:08,770 AUDIENCE: Set AVL? 61 00:03:08,770 --> 00:03:10,960 ERIK DEMAINE: Set AVL sounds good. 62 00:03:10,960 --> 00:03:14,090 Set AVL supports these operations and many more. 63 00:03:14,090 --> 00:03:18,190 All in log n time except for build, which takes n log n time 64 00:03:18,190 --> 00:03:20,980 because you have to sort first. 65 00:03:20,980 --> 00:03:23,530 So set AVL is a good way to do this. 66 00:03:23,530 --> 00:03:27,580 We'll come back to your sequence AVL idea later. 67 00:03:27,580 --> 00:03:31,650 This gets log n per operation. 68 00:03:31,650 --> 00:03:32,150 Great. 69 00:03:32,150 --> 00:03:33,310 I mean, this is-- 70 00:03:33,310 --> 00:03:35,370 set AVL is our most powerful data structure. 71 00:03:35,370 --> 00:03:38,407 It does all the operations we care about on the set side. 72 00:03:38,407 --> 00:03:40,240 And the sequence AVL does all the operations 73 00:03:40,240 --> 00:03:41,115 on the sequence side. 74 00:03:41,115 --> 00:03:43,090 But note that this is a set, not a sequence. 75 00:03:43,090 --> 00:03:44,350 We care about keys. 76 00:03:44,350 --> 00:03:46,600 There are hacks to get around that with sequence AVLs, 77 00:03:46,600 --> 00:03:50,420 but let's do that later. 78 00:03:50,420 --> 00:03:52,880 So great, if we wanted to, for example, 79 00:03:52,880 --> 00:03:55,840 speed up find_max in a set AVL. 
80 00:03:55,840 --> 00:03:58,420 We could add augmentation. 81 00:03:58,420 --> 00:04:05,410 We could-- remember subtree property augmentations? 82 00:04:05,410 --> 00:04:10,600 We can use that to get constant time find_max 83 00:04:10,600 --> 00:04:14,260 by storing in every node the maximum key item 84 00:04:14,260 --> 00:04:15,550 within the subtree. 85 00:04:15,550 --> 00:04:16,850 And that's a subtree property. 86 00:04:16,850 --> 00:04:18,740 It's one we mentioned last class. 87 00:04:18,740 --> 00:04:21,279 So we could even improve that to constant time. 88 00:04:21,279 --> 00:04:22,690 Great. 89 00:04:22,690 --> 00:04:23,400 So we're done. 90 00:04:23,400 --> 00:04:24,025 End of lecture. 91 00:04:24,025 --> 00:04:27,040 [CHUCKLES] In some sense, that's true. 92 00:04:27,040 --> 00:04:28,540 But what we're going to see today 93 00:04:28,540 --> 00:04:30,800 is another data structure called a binary heap, 94 00:04:30,800 --> 00:04:34,780 which is, in some sense, a simplification of set AVL. 95 00:04:34,780 --> 00:04:37,480 It achieves basically the same time bounds. 96 00:04:37,480 --> 00:04:41,240 Build will be faster by a log factor. 97 00:04:41,240 --> 00:04:44,680 But that's not the main reason we care about them. 98 00:04:44,680 --> 00:04:46,960 The main advantage is that they're simpler 99 00:04:46,960 --> 00:04:51,070 and they give us an in-place sorting algorithm. 100 00:04:54,220 --> 00:04:58,150 So I have up here three of the operations 101 00:04:58,150 --> 00:05:01,120 I've been talking about-- build, insert, and delete_max. 102 00:05:01,120 --> 00:05:05,470 So we have set AVL trees there-- n log n build, log n insert, 103 00:05:05,470 --> 00:05:07,180 log n delete. 104 00:05:07,180 --> 00:05:12,310 So along the way to our heap, I want to mention two other data 105 00:05:12,310 --> 00:05:13,670 structures. 106 00:05:13,670 --> 00:05:18,010 One is a dynamic but unsorted array. 
107 00:05:18,010 --> 00:05:20,320 And the other is a dynamic sorted array. 108 00:05:25,850 --> 00:05:27,530 These are simpler data structures 109 00:05:27,530 --> 00:05:29,300 we've talked about many times before. 110 00:05:29,300 --> 00:05:32,000 And they're useful kind of motivations for getting 111 00:05:32,000 --> 00:05:36,770 started, because a heap is going to be built on top of arrays 112 00:05:36,770 --> 00:05:38,060 instead of-- 113 00:05:38,060 --> 00:05:42,720 well, it's sort of a fusion between arrays and trees. 114 00:05:42,720 --> 00:05:47,400 So if I have an unsorted array, this is very easy 115 00:05:47,400 --> 00:05:49,320 to insert into, right? 116 00:05:49,320 --> 00:05:51,580 I just append to the end. 117 00:05:51,580 --> 00:05:53,650 This is what we called insert last. 118 00:05:53,650 --> 00:05:58,750 So insert is fast, constant amortized. 119 00:05:58,750 --> 00:06:00,180 We might have to resize the array, 120 00:06:00,180 --> 00:06:02,520 but so that's the amortized part. 121 00:06:02,520 --> 00:06:03,577 But delete max is slow. 122 00:06:03,577 --> 00:06:05,910 In an unsorted array, I don't know where the maximum is. 123 00:06:05,910 --> 00:06:07,770 So I have to scan through the whole array. 124 00:06:11,240 --> 00:06:12,865 So I scan through the array, identify 125 00:06:12,865 --> 00:06:15,700 that the max is somewhere in the middle, and then, 126 00:06:15,700 --> 00:06:16,990 if I want to delete it-- 127 00:06:22,780 --> 00:06:24,700 I want to delete that maximum element, well, 128 00:06:24,700 --> 00:06:26,230 in a dynamic array, all I can really 129 00:06:26,230 --> 00:06:28,940 do is delete the last element efficiently. 130 00:06:28,940 --> 00:06:33,870 So I could, for example, swap it with the last element. 
131 00:06:33,870 --> 00:06:36,940 So I take this element and put it here, and then 132 00:06:36,940 --> 00:06:40,900 delete the last element in that array, which is pop in Python 133 00:06:40,900 --> 00:06:43,640 or delete_last in our world. 134 00:06:43,640 --> 00:06:50,160 So overall, this is linear time, which is bad. 135 00:06:50,160 --> 00:06:51,910 But I wanted to highlight exactly how it's 136 00:06:51,910 --> 00:06:54,220 done for a reason we'll get to in a moment. 137 00:06:54,220 --> 00:06:56,200 A sorted array is sort of the reverse. 138 00:06:56,200 --> 00:06:58,180 It's very easy to find the max. 139 00:06:58,180 --> 00:07:01,110 Where is it? 140 00:07:01,110 --> 00:07:02,580 At the end. 141 00:07:02,580 --> 00:07:06,210 delete_max, the maximum element is always the last element 142 00:07:06,210 --> 00:07:07,920 in a increasing sorted array. 143 00:07:11,440 --> 00:07:14,040 I guess that's constant amortized, because then I 144 00:07:14,040 --> 00:07:18,030 have to delete it, which may incur resizing. 145 00:07:18,030 --> 00:07:21,630 Insert, though, is going to be linear, 146 00:07:21,630 --> 00:07:25,680 because maybe I can binary search to find 147 00:07:25,680 --> 00:07:28,200 where the added item belongs. 148 00:07:28,200 --> 00:07:35,445 Let's say I just added this item here. 149 00:07:35,445 --> 00:07:36,820 I could binary search to find it, 150 00:07:36,820 --> 00:07:38,695 but then I'm going to have to do a big shift. 151 00:07:38,695 --> 00:07:41,590 So I might as well just swap repeatedly 152 00:07:41,590 --> 00:07:47,560 until I find the position where the added item x belongs. 153 00:07:47,560 --> 00:07:49,330 And now I've restored sorted order. 154 00:07:49,330 --> 00:07:52,300 That takes linear time, which is bad. 155 00:07:52,300 --> 00:07:56,110 And what we want is somehow the best of these two worlds. 156 00:07:56,110 --> 00:07:59,470 Insert is fast for array. 157 00:07:59,470 --> 00:08:01,750 Delete is fast for a sorted array. 
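The two array-based implementations just described can be sketched as plain Python functions (the function names are mine, not the lecture's): the unsorted array appends for insert and scans, swaps with the last slot, and pops for delete_max; the sorted array pops for delete_max and swaps a new item leftward for insert, exactly as described above.

```python
def unsorted_insert(A, x):
    A.append(x)                   # insert_last: constant amortized

def unsorted_delete_max(A):
    # linear scan to find the max somewhere in the middle
    m = max(range(len(A)), key=lambda i: A[i])
    A[m], A[-1] = A[-1], A[m]     # swap the max with the last element
    return A.pop()                # delete_last (pop in Python)

def sorted_insert(A, x):
    A.append(x)
    j = len(A) - 1
    while j > 0 and A[j - 1] > A[j]:      # swap leftward until
        A[j - 1], A[j] = A[j], A[j - 1]   # sorted order is restored
        j -= 1

def sorted_delete_max(A):
    return A.pop()                # the max of an increasing array is last
```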
158 00:08:01,750 --> 00:08:03,460 We can't get constant time for both. 159 00:08:03,460 --> 00:08:04,960 But we can get log n time for both. 160 00:08:04,960 --> 00:08:07,870 We already know how with set AVL trees. 161 00:08:07,870 --> 00:08:10,870 But we're going to see a different way to do it today. 162 00:08:10,870 --> 00:08:16,780 And the main motivation for a different way to do this 163 00:08:16,780 --> 00:08:18,670 is sorting. 164 00:08:18,670 --> 00:08:21,490 So I want to define a priority queue sort. 165 00:08:29,280 --> 00:08:33,120 So given any data structure that implements a priority queue 166 00:08:33,120 --> 00:08:36,419 interface, in particular insert and delete_max, 167 00:08:36,419 --> 00:08:38,220 I can make a sorting algorithm. 168 00:08:38,220 --> 00:08:39,270 What do I do? 169 00:08:39,270 --> 00:08:42,270 Insert all the items, delete all the items. 170 00:08:42,270 --> 00:08:45,990 But because when I delete them they come out largest first, 171 00:08:45,990 --> 00:08:47,610 I get them in reverse sorted order. 172 00:08:47,610 --> 00:08:51,840 Then I could reverse in linear time and I've sorted my items. 173 00:08:51,840 --> 00:09:04,740 So we can insert (x) for x in A, or (build(A)), 174 00:09:04,740 --> 00:09:10,210 and then repeatedly delete_max. 175 00:09:17,170 --> 00:09:19,520 How much time does this algorithm take? 176 00:09:19,520 --> 00:09:21,270 I'm going to introduce some notation here. 177 00:09:21,270 --> 00:09:25,080 It takes however long it takes to build n items, 178 00:09:25,080 --> 00:09:29,460 call that T sub build (n) plus-- 179 00:09:32,280 --> 00:09:42,655 sorry-- plus n times the time to do a delete_max. 180 00:09:45,510 --> 00:09:49,710 Or we can write this as n times time 181 00:09:49,710 --> 00:09:54,267 to do an insert, plus time to do a delete_max. 
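The generic reduction just described might look like this in Python (a sketch under my own naming; ListPQ is a hypothetical minimal priority queue included only to exercise the sorter):

```python
def priority_queue_sort(A, PQ):
    """Sort list A using any class PQ that supports insert and delete_max.
    Total cost: T_build(n) + n * T_delete_max(n), as in the analysis above."""
    pq = PQ()
    for x in A:                   # insert all n items (or use a build method)
        pq.insert(x)
    out = [pq.delete_max() for _ in range(len(A))]
    out.reverse()                 # deletions emerge largest first
    return out

class ListPQ:
    """Hypothetical stand-in: an unsorted-list priority queue."""
    def __init__(self):
        self.A = []

    def insert(self, x):
        self.A.append(x)

    def delete_max(self):
        m = max(self.A)           # linear scan
        self.A.remove(m)
        return m
```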
182 00:09:58,350 --> 00:10:01,590 So I'm using these T functions to just abstract what 183 00:10:01,590 --> 00:10:04,770 are the running times provided by my data structure that 184 00:10:04,770 --> 00:10:06,210 implements this interface. 185 00:10:06,210 --> 00:10:07,830 The interface says what's correct, 186 00:10:07,830 --> 00:10:10,990 and these T functions give me my performance bounds. 187 00:10:10,990 --> 00:10:13,860 So if I plug in each of these data structures, 188 00:10:13,860 --> 00:10:16,560 I get a sorting algorithm. 189 00:10:16,560 --> 00:10:18,840 I get AVL sort, I get array sort, 190 00:10:18,840 --> 00:10:20,280 I get a sorted array sort. 191 00:10:20,280 --> 00:10:21,478 What do those look like? 192 00:10:21,478 --> 00:10:23,145 It turns out many of these are familiar. 193 00:10:26,180 --> 00:10:29,530 So set AVLs take log n per operation. 194 00:10:29,530 --> 00:10:33,490 So we get an n log n sorting algorithm out of them, which 195 00:10:33,490 --> 00:10:36,580 is insert all of the items into the AVL tree. 196 00:10:36,580 --> 00:10:39,930 I don't want to use AVL build because that uses sort, and we're 197 00:10:39,930 --> 00:10:42,220 not allowed to sort in order to implement sort. 198 00:10:42,220 --> 00:10:43,990 But we saw how to insert into an AVL tree 199 00:10:43,990 --> 00:10:45,880 and keep the thing balanced. 200 00:10:45,880 --> 00:10:47,350 So that takes log n each. 201 00:10:47,350 --> 00:10:50,800 And then we can find the max, delete it, rebalance, 202 00:10:50,800 --> 00:10:51,430 and so on. 203 00:10:51,430 --> 00:10:52,780 Total time will be n log n. 204 00:10:52,780 --> 00:10:55,600 This is an algorithm we call AVL sort. 205 00:10:55,600 --> 00:10:58,630 It's a bit complicated, because AVL trees are complicated. 206 00:10:58,630 --> 00:11:04,150 But it gives us the optimal comparison bound of n log n. 207 00:11:04,150 --> 00:11:08,200 Now, what about array sort? 208 00:11:08,200 --> 00:11:13,310 So suppose I use an unsorted array. 
209 00:11:13,310 --> 00:11:14,580 I insert the item. 210 00:11:14,580 --> 00:11:15,957 So if I insert the items-- 211 00:11:15,957 --> 00:11:18,540 so I'm doing all the insertions here before all the deletions. 212 00:11:18,540 --> 00:11:20,330 So what's going to happen is I just insert the items 213 00:11:20,330 --> 00:11:21,780 in the original array order. 214 00:11:21,780 --> 00:11:23,720 In other words, I just take the array. 215 00:11:23,720 --> 00:11:30,460 And then what I do is repeatedly extract the maximum item 216 00:11:30,460 --> 00:11:34,570 by searching for it, moving it to the end of the array, 217 00:11:34,570 --> 00:11:36,190 and then repeating that process. 218 00:11:36,190 --> 00:11:38,530 That sound familiar? 219 00:11:38,530 --> 00:11:43,960 That's selection sort from lecture three. 220 00:11:43,960 --> 00:11:47,800 So this-- arrays give us selection sort. 221 00:11:53,260 --> 00:11:54,880 This is a new way to think about what 222 00:11:54,880 --> 00:11:57,070 we were doing way back then. 223 00:11:57,070 --> 00:12:01,360 With a sorted array, what are we doing? 224 00:12:01,360 --> 00:12:02,800 We insert all the items. 225 00:12:02,800 --> 00:12:05,050 That's actually where all the work happens, because we 226 00:12:05,050 --> 00:12:06,490 maintain the sorted array. 227 00:12:06,490 --> 00:12:07,870 So we start with an empty array. 228 00:12:07,870 --> 00:12:08,590 It's sorted. 229 00:12:08,590 --> 00:12:09,353 We add an item. 230 00:12:09,353 --> 00:12:10,270 OK, it's still sorted. 231 00:12:10,270 --> 00:12:11,710 We add a second item, and we swap 232 00:12:11,710 --> 00:12:13,390 if we need to in order to sort. 233 00:12:13,390 --> 00:12:16,420 In general, when we add an item, we swap it to the left 234 00:12:16,420 --> 00:12:17,530 until it's sorted again. 235 00:12:17,530 --> 00:12:19,810 That is insertion sort. 
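Both recovered algorithms can be written directly as the in-place swap procedures described above; a Python sketch (my code, not the lecture's):

```python
def selection_sort(A):
    # repeatedly find the max of the unsorted prefix and swap it to the end
    for end in range(len(A) - 1, 0, -1):
        m = max(range(end + 1), key=lambda i: A[i])
        A[m], A[end] = A[end], A[m]

def insertion_sort(A):
    # insert each new item into the sorted prefix by swapping it leftward
    for j in range(1, len(A)):
        while j > 0 and A[j - 1] > A[j]:
            A[j - 1], A[j] = A[j], A[j - 1]
            j -= 1
```

Both run in place, using only a constant number of indices beyond the array itself, at the cost of n squared time.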
236 00:12:26,080 --> 00:12:30,670 Kind of cool, this is a unifying framework for three sorting 237 00:12:30,670 --> 00:12:32,290 algorithms that we saw before. 238 00:12:32,290 --> 00:12:34,880 We didn't actually talk about AVL sort last time, 239 00:12:34,880 --> 00:12:36,190 but it was in the notes. 240 00:12:36,190 --> 00:12:39,110 And so that is the right part of this table. 241 00:12:39,110 --> 00:12:42,430 So of course, these array data structures are not efficient. 242 00:12:42,430 --> 00:12:44,710 They take linear time for some of the operations. 243 00:12:44,710 --> 00:12:46,570 So the sorting algorithms are not efficient. 244 00:12:46,570 --> 00:12:48,028 But they're ones we've seen before, 245 00:12:48,028 --> 00:12:49,938 so it's neat to see how they fit in here. 246 00:12:49,938 --> 00:12:51,730 They had the-- selection sort and insertion 247 00:12:51,730 --> 00:12:53,800 sort had the advantage that they were in place. 248 00:12:53,800 --> 00:12:57,490 You just needed a constant number of pointers or indices 249 00:12:57,490 --> 00:12:59,540 beyond the array itself. 250 00:12:59,540 --> 00:13:01,070 So they're very space efficient. 251 00:13:01,070 --> 00:13:02,348 So that was a plus for them. 252 00:13:02,348 --> 00:13:04,390 But they take n squared time, so you should never 253 00:13:04,390 --> 00:13:09,310 use them, except for n, at most, 100 or something. 254 00:13:09,310 --> 00:13:14,393 AVL tree sort is great in that it gets n log n time, but probably 255 00:13:14,393 --> 00:13:16,060 more complicated than merge sort, so you 256 00:13:16,060 --> 00:13:17,660 could stick to merge sort. 257 00:13:17,660 --> 00:13:21,950 But neither merge sort nor set AVL tree sort are in place. 
258 00:13:21,950 --> 00:13:24,910 And so the goal of today is to get 259 00:13:24,910 --> 00:13:26,770 the best of all those worlds in sorting 260 00:13:26,770 --> 00:13:28,390 to get n log n comparisons, which 261 00:13:28,390 --> 00:13:30,910 is optimal in the comparison model, 262 00:13:30,910 --> 00:13:33,367 but get it to be in place. 263 00:13:33,367 --> 00:13:35,575 And that's what we're going to get with binary heaps. 264 00:13:38,750 --> 00:13:40,970 We're going to design a data structure that 265 00:13:40,970 --> 00:13:43,220 happens to build a little bit faster-- as I mentioned, 266 00:13:43,220 --> 00:13:45,470 linear time building. 267 00:13:45,470 --> 00:13:48,110 So it's not representing a sorted order in the same way 268 00:13:48,110 --> 00:13:49,880 that AVL trees are. 269 00:13:49,880 --> 00:13:51,410 But it will be kind of tree-based. 270 00:13:51,410 --> 00:13:53,687 It will also be array-based. 271 00:13:53,687 --> 00:13:56,270 We're going to get logarithmic time for insert and delete_max. 272 00:13:56,270 --> 00:14:00,230 It happens to be amortized, because we use arrays. 273 00:14:00,230 --> 00:14:04,010 But the key thing is that it's an in-place data structure. 274 00:14:04,010 --> 00:14:07,970 It only consists of an array of the items. 275 00:14:07,970 --> 00:14:11,120 And so, when we plug it into our sorting algorithm-- 276 00:14:11,120 --> 00:14:13,460 priority queue sort or generic sorting algorithm-- 277 00:14:13,460 --> 00:14:15,650 not only do we get n log n performance, 278 00:14:15,650 --> 00:14:18,530 but we also get an in-place sorting algorithm. 279 00:14:18,530 --> 00:14:21,510 This will be our first and only-- to this class-- 280 00:14:21,510 --> 00:14:24,570 n log n in-place sorting algorithm. 281 00:14:24,570 --> 00:14:27,250 Cool. 282 00:14:27,250 --> 00:14:29,890 That's the goal. 283 00:14:29,890 --> 00:14:31,300 Let's do it. 
284 00:14:31,300 --> 00:14:36,420 So what we're going to do, because we're in place, 285 00:14:36,420 --> 00:14:39,690 basically we have to have an array storing our n items. 286 00:14:39,690 --> 00:14:41,580 That's sort of the definition of in-place, 287 00:14:41,580 --> 00:14:44,820 just using n slots of memory exactly 288 00:14:44,820 --> 00:14:47,918 the size of the number of items in our structure. 289 00:14:47,918 --> 00:14:50,460 But we're obviously not going to use a regular unsorted array 290 00:14:50,460 --> 00:14:52,998 or a regular sorted array. 291 00:14:52,998 --> 00:14:54,540 We're going to use the array just as sort 292 00:14:54,540 --> 00:14:58,050 of the underlying technology for how things are stored. 293 00:14:58,050 --> 00:15:00,810 But we'd really like logarithmic performance, which 294 00:15:00,810 --> 00:15:02,850 should make you think tree. 295 00:15:02,850 --> 00:15:06,510 The only way to get a log is a binary tree, more or less. 296 00:15:06,510 --> 00:15:14,740 So somehow, we want to embed a tree into an array. 297 00:15:14,740 --> 00:15:16,050 Let me grab an example. 298 00:15:23,960 --> 00:15:26,480 Let me draw a tree. 299 00:15:44,680 --> 00:15:47,110 If I got to choose any old tree I want, 300 00:15:47,110 --> 00:15:51,550 I would choose a tree that's basically perfectly balanced. 301 00:15:51,550 --> 00:15:55,120 Perfectly balanced would be like this, where-- 302 00:15:55,120 --> 00:15:56,050 what's the property? 303 00:15:56,050 --> 00:15:58,900 That I have all of these levels-- 304 00:15:58,900 --> 00:16:01,810 all of these depths are completely filled with nodes. 305 00:16:01,810 --> 00:16:03,025 This is depth 0. 306 00:16:05,800 --> 00:16:11,340 Remember, this is depth 1, this is depth 2, this is depth 3. 307 00:16:11,340 --> 00:16:15,850 So what I'd really like is to have 2 to the i 308 00:16:15,850 --> 00:16:20,860 nodes at depth i. 309 00:16:20,860 --> 00:16:25,180 That would be a perfect binary tree. 
310 00:16:25,180 --> 00:16:29,050 But that only works when n is 1 less than a power of 2, right? 311 00:16:29,050 --> 00:16:31,630 I can't always achieve that for any n. 312 00:16:31,630 --> 00:16:34,090 And so the next best thing I could hope for 313 00:16:34,090 --> 00:16:36,850 is 2 to the i nodes at depth i 314 00:16:36,850 --> 00:16:40,340 until the very last i-- the largest depth. 315 00:16:40,340 --> 00:16:43,640 And in that level, I'm still going to restrict things. 316 00:16:43,640 --> 00:16:45,410 I'm going to force all of the nodes 317 00:16:45,410 --> 00:16:48,950 to be as far left as possible. 318 00:16:48,950 --> 00:17:03,550 So I want to say, except at max depth where nodes are-- 319 00:17:03,550 --> 00:17:05,050 I'll call them left justified. 320 00:17:10,660 --> 00:17:12,640 And these two properties together 321 00:17:12,640 --> 00:17:15,280 is what I call a complete binary tree. 322 00:17:27,839 --> 00:17:29,580 Why is this interesting? 323 00:17:29,580 --> 00:17:35,400 Because I claim I can represent a tree like this as an array. 324 00:17:35,400 --> 00:17:39,720 I've narrowed things down enough that I can draw an array down 325 00:17:39,720 --> 00:17:41,440 here. 326 00:17:41,440 --> 00:17:43,560 And what I'm going to do is write these nodes 327 00:17:43,560 --> 00:17:45,250 in depth order. 328 00:17:45,250 --> 00:17:47,910 So I write A first, because that's depth 0. 329 00:17:47,910 --> 00:17:50,850 Then B, C, that's depth 1. 330 00:17:50,850 --> 00:17:53,140 Then, well, they're alphabetical. 331 00:17:53,140 --> 00:17:54,880 I made it that way. 332 00:17:54,880 --> 00:17:58,680 D, E, F, G is depth 2. 333 00:17:58,680 --> 00:18:03,150 And then H, I, J is depth 3. 334 00:18:03,150 --> 00:18:06,210 This is very different from traversal order of a tree. 335 00:18:06,210 --> 00:18:10,500 Traversal order would have been H, D, I, B, J, E, A, F, C, 336 00:18:10,500 --> 00:18:11,790 G, OK? 
337 00:18:11,790 --> 00:18:15,330 But this is what we might call depth order, 338 00:18:15,330 --> 00:18:17,670 do the lowest depth nodes first-- 339 00:18:17,670 --> 00:18:24,440 very different way to lay things out or to linearize our data. 340 00:18:24,440 --> 00:18:27,260 And this is what a heap is going to look like. 341 00:18:27,260 --> 00:18:33,310 So the cool thing is, between complete binary 342 00:18:33,310 --> 00:18:36,400 trees and arrays is a bijection. 343 00:18:36,400 --> 00:18:39,760 For every array, there's a unique complete binary tree. 344 00:18:39,760 --> 00:18:43,340 And for every complete binary tree, there's a unique array. 345 00:18:43,340 --> 00:18:43,880 Why? 346 00:18:43,880 --> 00:18:46,700 Because the complete constraint forces 347 00:18:46,700 --> 00:18:48,660 everything-- forces my hand. 348 00:18:48,660 --> 00:18:50,870 There's only-- if I give you a number n, 349 00:18:50,870 --> 00:18:53,810 there is one tree shape of size n, right? 350 00:18:53,810 --> 00:18:57,043 You just fill in the nodes top down until you 351 00:18:57,043 --> 00:18:57,960 get to the last level. 352 00:18:57,960 --> 00:18:59,960 And then you have to fill them in left to right. 353 00:18:59,960 --> 00:19:03,830 It's what you might call reading order for writing down nodes. 354 00:19:03,830 --> 00:19:07,070 And the array is telling you which keys go where. 355 00:19:07,070 --> 00:19:09,210 This is the first node you write down at the root, 356 00:19:09,210 --> 00:19:10,790 this is the next node you write down 357 00:19:10,790 --> 00:19:13,670 at the left child of the root, and so on. 358 00:19:13,670 --> 00:19:16,820 So here we have a binary tree represented as an array, 359 00:19:16,820 --> 00:19:19,280 or array representing a binary tree. 360 00:19:19,280 --> 00:19:21,830 The very specific binary tree, it 361 00:19:21,830 --> 00:19:25,580 has a clear advantage, which is it is guaranteed balance. 
362 00:19:25,580 --> 00:19:29,007 No rotations necessary in heaps, because complete binary trees 363 00:19:29,007 --> 00:19:29,840 are always balanced. 364 00:19:29,840 --> 00:19:33,770 In fact, they have the best height they possibly could, 365 00:19:33,770 --> 00:19:35,420 which is ceiling of log n. 366 00:19:39,560 --> 00:19:42,290 Balanced, remember, just meant you were big O of log n. 367 00:19:42,290 --> 00:19:44,330 This is 1 times log n. 368 00:19:44,330 --> 00:19:47,900 So it's the best level of balance you could hope for. 369 00:19:47,900 --> 00:19:52,910 So somehow, I claim, we can maintain a complete binary tree 370 00:19:52,910 --> 00:19:54,145 for solving priority queues. 371 00:19:54,145 --> 00:19:55,520 This would not be possible if you 372 00:19:55,520 --> 00:19:57,620 were trying to solve the whole set interface. 373 00:19:57,620 --> 00:19:59,660 And that's kind of the cool thing about heaps, 374 00:19:59,660 --> 00:20:03,030 is that by just focusing on the subset of the set interface, 375 00:20:03,030 --> 00:20:05,030 we can do more. 376 00:20:05,030 --> 00:20:07,493 We can maintain this very strong property. 377 00:20:07,493 --> 00:20:09,410 And because we have this very strong property, 378 00:20:09,410 --> 00:20:10,993 we don't even need to store this tree. 379 00:20:10,993 --> 00:20:13,610 We're not going to store left and right pointers and parent 380 00:20:13,610 --> 00:20:16,620 pointers, we're just going to store the array. 381 00:20:16,620 --> 00:20:28,960 This is what we call an implicit data structure, which 382 00:20:28,960 --> 00:20:38,530 basically means no pointers, just an array of the n items. 383 00:20:44,080 --> 00:20:46,630 How are we going to get away without storing pointers? 384 00:20:46,630 --> 00:20:48,790 I'd still like to treat it like a tree. 385 00:20:48,790 --> 00:20:52,150 I'd still like to know the left child of B is D 386 00:20:52,150 --> 00:20:56,350 and the right child B is E. We'll see why in a moment. 
387 00:20:56,350 --> 00:21:01,990 Well, we can do this with index arithmetic. 388 00:21:01,990 --> 00:21:06,710 So maybe I should add some labels before I get there. 389 00:21:11,420 --> 00:21:14,170 So this array naturally has indices. 390 00:21:14,170 --> 00:21:16,630 This is index 0. 391 00:21:16,630 --> 00:21:21,580 This is index 1, index 2, index 3, index 4, index 5, index 6, 392 00:21:21,580 --> 00:21:27,610 7, 8, 9, because there are 10 items, 0 through 9. 393 00:21:27,610 --> 00:21:29,650 And I can apply those labels up here, too. 394 00:21:29,650 --> 00:21:32,590 These are the same nodes, so 0, 1, 2. 395 00:21:32,590 --> 00:21:34,300 This is just a depth order. 396 00:21:36,742 --> 00:21:38,200 But once I have this labeling, it's 397 00:21:38,200 --> 00:21:40,130 going to be a lot easier to figure things out. 398 00:21:40,130 --> 00:21:42,790 So if I wanted to know the left child of B is D, 399 00:21:42,790 --> 00:21:48,596 somehow, given the number 1, I want to compute the number 3. 400 00:21:48,596 --> 00:21:52,655 Add 2, there are all sorts-- multiply by 3, 401 00:21:52,655 --> 00:21:54,030 there are all sorts of operations 402 00:21:54,030 --> 00:21:55,770 that take 1 and turn it into 3. 403 00:21:55,770 --> 00:21:58,590 But there's only one that's going to work in all cases. 404 00:21:58,590 --> 00:22:00,840 And the intuition here is, well, I have 2 405 00:22:00,840 --> 00:22:02,265 to the i nodes at level i. 406 00:22:02,265 --> 00:22:04,530 If I want to go to the child level, 407 00:22:04,530 --> 00:22:06,990 there's 2 to the i plus 1 nodes down there-- 408 00:22:06,990 --> 00:22:08,310 exactly double. 409 00:22:08,310 --> 00:22:10,980 Unless it's the very last one, but that won't really matter. 410 00:22:10,980 --> 00:22:13,350 If there is a left child, it will behave the same. 411 00:22:13,350 --> 00:22:15,810 And so, intuitively, I have this space of size 2 to the i. 
412 00:22:15,810 --> 00:22:19,350 I have to expand it to a space of size 2 to the i plus 1, 413 00:22:19,350 --> 00:22:22,580 So I should multiply by 2. 414 00:22:22,580 --> 00:22:27,110 And that's almost right, but then there's some constants. 415 00:22:27,110 --> 00:22:29,450 So I'd like to say 2 times i. 416 00:22:29,450 --> 00:22:32,900 But if we look at the examples here, 1 times 2 417 00:22:32,900 --> 00:22:36,170 is 2, which is 1 less than 3. 418 00:22:36,170 --> 00:22:38,420 2 times 2 is 4, which is 1 less than 5. 419 00:22:38,420 --> 00:22:39,770 Hey, we almost got it right. 420 00:22:39,770 --> 00:22:43,055 It's just off by 1. 421 00:22:43,055 --> 00:22:45,680 Off by 1 is-- 422 00:22:45,680 --> 00:22:50,840 index errors are the most common things in computer science. 423 00:22:50,840 --> 00:22:53,760 What about the right child? 424 00:22:53,760 --> 00:22:56,540 If the left child is a 2i plus 1, where is the right child? 425 00:22:59,640 --> 00:23:00,630 I hear lots of mumbles. 426 00:23:00,630 --> 00:23:03,930 2i plus 2-- one more. 427 00:23:03,930 --> 00:23:06,600 Because we're writing things left to right in depth order, 428 00:23:06,600 --> 00:23:09,830 the right child is the right sibling of the left child. 429 00:23:09,830 --> 00:23:12,000 So it's just one larger, OK? 430 00:23:12,000 --> 00:23:15,990 Given those rules, we can also compute parent. 431 00:23:15,990 --> 00:23:18,120 It's just whatever is the inverse 432 00:23:18,120 --> 00:23:22,500 of both of these functions, which I want 433 00:23:22,500 --> 00:23:25,890 to divide by 2 at some point. 434 00:23:25,890 --> 00:23:29,400 I want to get back to i given 2i plus 1 or given 2i plus 2. 435 00:23:29,400 --> 00:23:36,530 And so if I subtract 1 from i, then I either 436 00:23:36,530 --> 00:23:38,150 get 2i or 2i plus 1. 437 00:23:38,150 --> 00:23:42,680 And then, if I take an integer division by 2, I get i-- 438 00:23:42,680 --> 00:23:43,720 the original i. 
439 00:23:43,720 --> 00:23:46,790 Sorry, maybe I'll call this j to be clearer. 440 00:23:46,790 --> 00:23:49,580 So j is the left or right child. 441 00:23:49,580 --> 00:23:53,210 Then I can reconstruct i, which was the parent. 442 00:23:53,210 --> 00:23:56,160 So this is a constant number of arithmetic operations. 443 00:23:56,160 --> 00:23:58,535 So I don't have to store left and right pointers. 444 00:23:58,535 --> 00:24:00,410 I can just compute them whenever I need them. 445 00:24:00,410 --> 00:24:03,140 Whenever I'm at some node like E, 446 00:24:03,140 --> 00:24:05,540 and I want to know what's its left child-- 447 00:24:05,540 --> 00:24:08,630 sorry, given the node index 4, which 448 00:24:08,630 --> 00:24:11,127 happens to contain the item E, and I 449 00:24:11,127 --> 00:24:13,460 want to know what's its left child, I just multiply by 2 450 00:24:13,460 --> 00:24:13,960 and add 1. 451 00:24:13,960 --> 00:24:14,930 I get 9. 452 00:24:14,930 --> 00:24:17,360 And then, I can index into this array at position 9. 453 00:24:17,360 --> 00:24:20,630 Because I don't-- this is just in my head, remember. 454 00:24:20,630 --> 00:24:22,770 We're just thinking that there's a tree here. 455 00:24:22,770 --> 00:24:26,340 But in reality, on the computer, there's just the array. 456 00:24:26,340 --> 00:24:30,530 So if we want to go from E to J, we can, from 4 to 9. 457 00:24:30,530 --> 00:24:33,620 If we try to go to the right child, we multiply by 2-- 458 00:24:33,620 --> 00:24:35,430 8-- and add 2-- 10. 459 00:24:35,430 --> 00:24:37,820 And we see, oh, 10 is beyond the end of the array. 460 00:24:37,820 --> 00:24:40,280 But our array stores its size, so we realize, oh, E 461 00:24:40,280 --> 00:24:42,110 does not have a right child. 462 00:24:42,110 --> 00:24:45,050 This is something you can only do in a complete binary tree. 463 00:24:45,050 --> 00:24:46,490 In a general binary tree you don't 464 00:24:46,490 --> 00:24:49,650 have these nice properties.
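As a quick sketch of this index arithmetic (in Python, with function names left, right, and parent of my own choosing-- the lecture just writes the formulas on the board):

```python
def left(i):
    # index of the left child of node i in the implicit complete binary tree
    return 2 * i + 1

def right(i):
    # the right child is the right sibling of the left child: one index larger
    return 2 * i + 2

def parent(j):
    # inverse of both child maps: subtract 1, then integer-divide by 2
    return (j - 1) // 2
```

On the example above, node 1 (item B) has children 3 (D) and 4 (E), node 4 (E) has left child 9 (J), and right(4) = 10 falls past the end of the 10-item array, which signals a missing right child.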
465 00:24:49,650 --> 00:24:53,970 Cool, so this is basically a heap. 466 00:24:53,970 --> 00:24:58,372 I just need to add one more property, 467 00:24:58,372 --> 00:24:59,830 naturally called the heap property. 468 00:25:04,880 --> 00:25:09,520 So there are multiple types of heaps. 469 00:25:09,520 --> 00:25:11,890 This type of heap is called a binary heap. 470 00:25:11,890 --> 00:25:14,430 We will talk about others in future lectures. 471 00:25:14,430 --> 00:25:28,690 I'm going to call it Q. Explicit thing-- 472 00:25:28,690 --> 00:25:47,913 this is an array representing a complete binary tree, 473 00:25:47,913 --> 00:26:02,270 called the array Q. And we want every node to satisfy 474 00:26:02,270 --> 00:26:20,770 the so-called max-heap property, which says Q[i] is greater than 475 00:26:20,770 --> 00:26:29,360 or equal to Q[j] for both children left of i and right 476 00:26:29,360 --> 00:26:29,860 of i. 477 00:26:37,160 --> 00:26:41,690 So we have a node i. 478 00:26:41,690 --> 00:26:44,750 And it has two children-- 479 00:26:44,750 --> 00:26:48,290 2i plus 1 and 2i plus 2. 480 00:26:48,290 --> 00:26:50,360 These are two values of j. 481 00:26:53,360 --> 00:26:57,140 What we want is a greater than or equal to 482 00:26:57,140 --> 00:26:59,940 relation here and here. 483 00:26:59,940 --> 00:27:01,940 So this node should be bigger than both this one 484 00:27:01,940 --> 00:27:02,520 and this one. 485 00:27:02,520 --> 00:27:03,890 Which of these is larger? 486 00:27:03,890 --> 00:27:06,230 We don't know, and we don't care-- 487 00:27:06,230 --> 00:27:08,360 very different from binary search trees 488 00:27:08,360 --> 00:27:11,640 or set binary trees, where we said these guys were less than 489 00:27:11,640 --> 00:27:13,613 or equal to this one, this one was less than 490 00:27:13,613 --> 00:27:15,530 or equal to all the nodes in the subtree here. 
491 00:27:15,530 --> 00:27:17,240 We're just locally saying, this node 492 00:27:17,240 --> 00:27:20,000 is greater than or equal to this node and this node. 493 00:27:20,000 --> 00:27:21,800 So the biggest is at the top. 494 00:27:26,270 --> 00:27:34,010 So one nice lemma about these heaps-- this is weird. 495 00:27:34,010 --> 00:27:36,590 Let me give you some more intuition. 496 00:27:36,590 --> 00:27:39,680 If you are a binary heap, if you satisfy this max-heap property 497 00:27:39,680 --> 00:27:41,960 everywhere, then in fact, you learn 498 00:27:41,960 --> 00:27:47,510 that every node i is greater than or equal to all nodes 499 00:27:47,510 --> 00:27:48,320 in its subtree. 500 00:27:48,320 --> 00:27:55,760 These are what we call descendants in subtree of i. 501 00:28:00,750 --> 00:28:03,390 Let me look at this example. 502 00:28:03,390 --> 00:28:05,180 So I haven't written any numbers here. 503 00:28:05,180 --> 00:28:06,650 You can imagine. 504 00:28:06,650 --> 00:28:10,880 So A here is greater than or equal to both B and C, 505 00:28:10,880 --> 00:28:12,950 and B is greater than or equal to D and E, 506 00:28:12,950 --> 00:28:14,757 and C is greater than or equal to F and G, 507 00:28:14,757 --> 00:28:16,340 D is greater than or equal to H and I, 508 00:28:16,340 --> 00:28:17,840 and E is greater than or equal to J. 509 00:28:17,840 --> 00:28:20,030 That would make this structure a heap, 510 00:28:20,030 --> 00:28:23,640 not just a complete binary tree. 511 00:28:23,640 --> 00:28:24,640 So what does that imply? 512 00:28:24,640 --> 00:28:27,380 It implies that A must be the maximum. 513 00:28:27,380 --> 00:28:29,802 So you look at any node here, like J, A 514 00:28:29,802 --> 00:28:32,260 is greater than or equal to B is greater than or equal to E 515 00:28:32,260 --> 00:28:35,400 is greater than or equal to J. 
And in general, what we're 516 00:28:35,400 --> 00:28:38,100 saying is that A is greater than or equal to all nodes 517 00:28:38,100 --> 00:28:38,730 in the tree. 518 00:28:38,730 --> 00:28:41,190 B is greater than or equal to all nodes in its subtree 519 00:28:41,190 --> 00:28:42,340 down here. 520 00:28:42,340 --> 00:28:44,760 C is greater than or equal to all nodes in its subtree. 521 00:28:44,760 --> 00:28:47,220 That's what this lemma is saying. 522 00:28:47,220 --> 00:28:50,790 You can prove this lemma by induction. 523 00:28:50,790 --> 00:28:55,280 But it's really simple. 524 00:28:55,280 --> 00:28:57,830 If you have two nodes, i and j, and j is somewhere 525 00:28:57,830 --> 00:28:59,450 in the subtree, that means there's 526 00:28:59,450 --> 00:29:03,770 some downward path from i to j. 527 00:29:03,770 --> 00:29:05,510 And you know that, for every edge 528 00:29:05,510 --> 00:29:08,060 we traverse on a downward path, our key 529 00:29:08,060 --> 00:29:10,280 is going down non-strictly. 530 00:29:10,280 --> 00:29:12,692 So every child is less than or equal to its parent. 531 00:29:12,692 --> 00:29:14,150 i is greater than or equal to this, 532 00:29:14,150 --> 00:29:14,960 is greater than or equal to this, 533 00:29:14,960 --> 00:29:16,335 is greater than or equal to this, 534 00:29:16,335 --> 00:29:17,900 is greater than or equal to j, OK? 535 00:29:17,900 --> 00:29:21,470 So by transitivity of less than or equal to, 536 00:29:21,470 --> 00:29:24,710 you know that i is, in fact, greater than or equal to j. 537 00:29:24,710 --> 00:29:27,080 Or sorry, the key in i is greater than 538 00:29:27,080 --> 00:29:28,940 or equal to the key in j. 539 00:29:28,940 --> 00:29:30,960 This is what we're calling i, the index. 540 00:29:30,960 --> 00:29:34,170 This is what we would call Q[i]. 541 00:29:34,170 --> 00:29:37,142 And this is index j, with Q[j].
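A small sketch of what the lemma buys us-- checking only the local parent-child inequalities is enough to certify the whole array (the name is_max_heap is mine, and keys here are plain numbers rather than items with a .key):

```python
def is_max_heap(Q):
    # check the local max-heap property: Q[i] >= Q[j] for each existing child j
    n = len(Q)
    for i in range(n):
        for j in (2 * i + 1, 2 * i + 2):  # left and right child indices
            if j < n and Q[i] < Q[j]:
                return False
    return True
```

By the lemma, whenever this returns True, Q[0] is a maximum of the entire array, even though siblings can appear in either order.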
542 00:29:40,010 --> 00:29:46,660 Very different way to organize keys in a tree, 543 00:29:46,660 --> 00:29:49,360 but as you might imagine, this is going 544 00:29:49,360 --> 00:29:50,680 to be good for priority queues. 545 00:29:50,680 --> 00:29:52,060 Because priority queues just need 546 00:29:52,060 --> 00:29:54,493 to find the maximum element. 547 00:29:54,493 --> 00:29:55,660 Then they need to delete it. 548 00:29:55,660 --> 00:29:57,785 That's going to be harder, because deleting the root 549 00:29:57,785 --> 00:30:00,880 is, like-- that's the hardest node to delete, intuitively. 550 00:30:00,880 --> 00:30:02,950 I'd really prefer to delete leaves. 551 00:30:02,950 --> 00:30:06,970 But deleting leaves and keeping a complete binary tree 552 00:30:06,970 --> 00:30:08,470 is actually kind of hard. 553 00:30:08,470 --> 00:30:10,930 If I want to delete H, that doesn't 554 00:30:10,930 --> 00:30:12,760 look like a binary tree, or it doesn't 555 00:30:12,760 --> 00:30:14,468 look like a complete binary tree anymore. 556 00:30:14,468 --> 00:30:16,555 It's not left justified. 557 00:30:16,555 --> 00:30:18,430 Similarly, if I want to delete F, that's bad. 558 00:30:18,430 --> 00:30:20,800 Because now, I don't have four nodes here. 559 00:30:20,800 --> 00:30:24,870 The one node that's easy to delete is J, right? 560 00:30:24,870 --> 00:30:27,380 If I remove that node, I still have a complete tree. 561 00:30:27,380 --> 00:30:31,700 The last leaf, the last position in my array, 562 00:30:31,700 --> 00:30:33,200 is the one that's easy to delete. 563 00:30:33,200 --> 00:30:36,390 That's good, because arrays are good at deleting the last item. 564 00:30:36,390 --> 00:30:39,770 But what I've set up here is it's easy to find the max. 565 00:30:39,770 --> 00:30:41,660 It's going to be up here at the root. 566 00:30:41,660 --> 00:30:43,580 Deleting it is annoying.
567 00:30:43,580 --> 00:30:48,200 I'd like to somehow take that key and put it at position-- 568 00:30:48,200 --> 00:30:50,540 at the last position at the last leaf, 569 00:30:50,540 --> 00:30:53,683 because that's the one that's easy to delete. 570 00:30:53,683 --> 00:30:55,100 And that's indeed what we're going 571 00:30:55,100 --> 00:30:58,250 to do in a delete algorithm. 572 00:30:58,250 --> 00:30:59,600 Let me first do insert. 573 00:30:59,600 --> 00:31:09,450 I guess that's a little simpler, kind of symmetric 574 00:31:09,450 --> 00:31:11,410 to what we just said. 575 00:31:11,410 --> 00:31:17,130 So if I want to insert a key or an item x which has some key, 576 00:31:17,130 --> 00:31:21,210 again, the only thing I really can do in an array-- 577 00:31:21,210 --> 00:31:23,532 if I want to add a new item, it has to go at the end. 578 00:31:23,532 --> 00:31:24,990 The only thing we know how to do is 579 00:31:24,990 --> 00:31:26,650 insert at the end of an array. 580 00:31:26,650 --> 00:31:28,742 This is what we called insert_last. 581 00:31:33,820 --> 00:31:34,320 What does this correspond to? 582 00:31:34,320 --> 00:31:39,270 It corresponds to adding a node containing x-- 583 00:31:39,270 --> 00:31:44,380 the item x-- in the very last level of the complete binary 584 00:31:44,380 --> 00:31:44,880 tree. 585 00:31:44,880 --> 00:31:47,130 Either it goes to the right of all the existing nodes, 586 00:31:47,130 --> 00:31:48,150 or it starts a new level. 587 00:31:48,150 --> 00:31:50,238 But it's always going to be the last leaf. 588 00:31:50,238 --> 00:31:51,780 After we do the insertion, it will be 589 00:31:51,780 --> 00:31:54,030 at position size of Q minus 1. 590 00:31:57,805 --> 00:31:59,830 This is probably not enough, though. 591 00:31:59,830 --> 00:32:01,840 We just inserted an arbitrary item in a leaf. 592 00:32:01,840 --> 00:32:04,890 And now, it may not satisfy the max-heap property anymore. 593 00:32:04,890 --> 00:32:08,110 So let's just check if it does, and if it doesn't, fix it.
594 00:32:08,110 --> 00:32:10,490 That's what we know how to do. 595 00:32:10,490 --> 00:32:12,220 But this time, we're not even going 596 00:32:12,220 --> 00:32:23,960 to need rotations, which is cool. 597 00:32:23,960 --> 00:32:25,710 So I'm going to define an operation called 598 00:32:25,710 --> 00:32:29,190 max_heapify_up. 599 00:32:29,190 --> 00:32:33,170 This will make things more like a max-heap. 600 00:32:33,170 --> 00:32:39,810 We're going to start at size of Q minus 1 for our value i. 601 00:32:39,810 --> 00:32:43,860 But it's going to be recursive, so what we're going to do 602 00:32:43,860 --> 00:32:49,290 is look at a node i, in particular the one 603 00:32:49,290 --> 00:32:50,490 that just got inserted. 604 00:32:50,490 --> 00:32:52,230 And where could it violate things? 605 00:32:52,230 --> 00:32:58,790 Well, with its parent, because we have no idea what key 606 00:32:58,790 --> 00:32:59,750 we just put here. 607 00:32:59,750 --> 00:33:01,220 Maybe it's less than our parent. 608 00:33:01,220 --> 00:33:02,210 Then we're happy. 609 00:33:02,210 --> 00:33:05,180 But if it's greater than our parent, we're in trouble 610 00:33:05,180 --> 00:33:06,950 and we should fix it. 611 00:33:06,950 --> 00:33:23,090 So if the item in the parent has a key less than i's key-- 612 00:33:25,736 --> 00:33:28,705 ah, I see I forgot to write key in all these spots. 613 00:33:28,705 --> 00:33:32,000 This should be dot key and dot key, 614 00:33:32,000 --> 00:33:37,190 because Q[i] is an item, and dot key gets its key. 615 00:33:37,190 --> 00:33:38,660 So this is the bad case. 616 00:33:38,660 --> 00:33:41,360 This is if the parent is smaller than the child. 617 00:33:41,360 --> 00:33:43,400 We wanted the parent to always be greater 618 00:33:43,400 --> 00:33:45,240 than or equal to its children. 619 00:33:45,240 --> 00:33:49,310 So in that case, what could we do? 620 00:33:49,310 --> 00:33:50,150 Swap them.
621 00:33:53,690 --> 00:33:58,640 Let's swap Q parent of i-- 622 00:33:58,640 --> 00:34:03,970 excellent, more chalk-- with Q[i]. 623 00:34:03,970 --> 00:34:06,100 Now they're in the right order. 624 00:34:06,100 --> 00:34:08,650 Now, we need to think about what about the other child 625 00:34:08,650 --> 00:34:10,449 of that node? 626 00:34:10,449 --> 00:34:13,040 And what about its parent? 627 00:34:13,040 --> 00:34:15,889 So I have some numbers here. 628 00:34:15,889 --> 00:34:22,120 Let's say this was 5 and this was 10. 629 00:34:22,120 --> 00:34:24,909 What do I know about this picture before? 630 00:34:24,909 --> 00:34:29,590 Well, I know that 10 is this newly inserted item. 631 00:34:29,590 --> 00:34:32,380 It's the only one that could have caused violations 632 00:34:32,380 --> 00:34:33,980 when I first inserted it. 633 00:34:33,980 --> 00:34:37,420 So I know that before this-- before I moved 10 around, 634 00:34:37,420 --> 00:34:40,090 I knew all the things in this left subtree 635 00:34:40,090 --> 00:34:44,199 are less than or equal to 5, and everything up here 636 00:34:44,199 --> 00:34:47,560 is greater than or equal to 5. 637 00:34:47,560 --> 00:34:50,590 I also know that the nodes in here, in fact, were less than 638 00:34:50,590 --> 00:34:51,219 or equal to 5. 639 00:34:51,219 --> 00:34:55,540 Other than this node 10 that we just inserted, 640 00:34:55,540 --> 00:34:57,139 this was a correct heap. 641 00:34:57,139 --> 00:34:59,320 So 5 was a separator between-- 642 00:34:59,320 --> 00:35:01,090 things above it on the ancestor chain 643 00:35:01,090 --> 00:35:04,180 are greater than or equal to 5, and things in its subtree 644 00:35:04,180 --> 00:35:06,580 are less than or equal to it. 645 00:35:06,580 --> 00:35:11,920 So after I do this swap, which I'm just going to do-- 646 00:35:15,280 --> 00:35:21,310 after I swap the items 5 and 10, 10 is up here, 5 is here.
647 00:35:21,310 --> 00:35:23,350 And now, I realize, OK, great, this edge 648 00:35:23,350 --> 00:35:26,260 is happy, because now 10 is greater than or equal to 5. 649 00:35:26,260 --> 00:35:28,690 But also this edge is happy, because it used to be happy, 650 00:35:28,690 --> 00:35:31,600 and we only made its parent larger. 651 00:35:31,600 --> 00:35:34,540 Now this edge maybe is bad. 652 00:35:34,540 --> 00:35:35,665 And so we need to recurse-- 653 00:35:39,140 --> 00:35:41,810 recurse on the parent. 654 00:35:45,070 --> 00:35:46,470 But that's it. 655 00:35:46,470 --> 00:35:47,845 So we fixed this one edge. 656 00:35:47,845 --> 00:35:49,720 Initially, this happens way down at the leaf. 657 00:35:49,720 --> 00:35:52,150 But in general, we're taking our item 658 00:35:52,150 --> 00:35:56,350 that we inserted, which is x, and it starts at the last leaf, 659 00:35:56,350 --> 00:35:58,150 and it maybe bubbles up for a while. 660 00:35:58,150 --> 00:35:59,858 And maybe it gets all the way to the root 661 00:35:59,858 --> 00:36:01,660 if we inserted a new maximum item. 662 00:36:01,660 --> 00:36:03,610 But in each step, it goes up one. 663 00:36:03,610 --> 00:36:06,730 And so the running time of all this stuff 664 00:36:06,730 --> 00:36:12,228 is the height of the tree, which is log n. 665 00:36:12,228 --> 00:36:14,770 And because there's only this one item that could potentially 666 00:36:14,770 --> 00:36:16,270 be wrong, if it ever stops moving, 667 00:36:16,270 --> 00:36:18,190 we've just checked that it satisfies 668 00:36:18,190 --> 00:36:19,990 the max-heap property. 669 00:36:19,990 --> 00:36:21,610 If it gets to the root, you can also 670 00:36:21,610 --> 00:36:23,368 check it satisfies the max-heap property. 671 00:36:23,368 --> 00:36:25,160 So there's a base case I didn't write here, 672 00:36:25,160 --> 00:36:30,760 which is if i equals 0, we're at the root, we're done. 673 00:36:30,760 --> 00:36:34,413 And then you can prove this correct by induction.
674 00:36:34,413 --> 00:36:36,830 There's just one item that's in the wrong spot, initially. 675 00:36:36,830 --> 00:36:38,480 And we put it into a right spot. 676 00:36:38,480 --> 00:36:42,770 There are many places it could go, but we will move it to the, 677 00:36:42,770 --> 00:36:46,040 I guess, unique ancestor position that is correct-- 678 00:36:46,040 --> 00:36:48,500 that satisfies max-heap property, OK? 679 00:36:48,500 --> 00:36:50,180 So that's insert. 680 00:36:50,180 --> 00:37:03,000 Delete is going to be almost the same, delete_min, that is-- 681 00:37:14,010 --> 00:37:17,307 sorry, delete_max, thank you. 682 00:37:17,307 --> 00:37:19,140 You can of course define all of these things 683 00:37:19,140 --> 00:37:20,130 for min instead of max. 684 00:37:20,130 --> 00:37:21,360 Everything works the same. 685 00:37:21,360 --> 00:37:23,160 I just have a hard time remembering 686 00:37:23,160 --> 00:37:25,320 which one we're doing. 687 00:37:25,320 --> 00:37:28,220 Just don't switch: you can't use a max-heap to do delete_min. 688 00:37:28,220 --> 00:37:29,970 You can't use a min-heap to do delete_max, 689 00:37:29,970 --> 00:37:32,790 but you can use a min-heap to do delete_min. 690 00:37:32,790 --> 00:37:35,200 That's fine. 691 00:37:35,200 --> 00:37:40,680 So like I said, the only node we really know how to delete 692 00:37:40,680 --> 00:37:42,520 is the last leaf on the last level, 693 00:37:42,520 --> 00:37:43,920 which is the end of the array. 694 00:37:43,920 --> 00:37:47,550 Because that's what arrays can delete efficiently. 695 00:37:47,550 --> 00:37:50,100 And what we need to delete is the root item, 696 00:37:50,100 --> 00:37:52,380 because that's always the maximum one, which 697 00:37:52,380 --> 00:37:54,490 is at the first position in the array. 698 00:37:54,490 --> 00:37:55,890 So what do we do? 699 00:37:55,890 --> 00:38:00,323 Swap them, our usual trick.
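Before the deletion details, here is a sketch of the insert path just described, in Python; Q is a plain list whose entries stand in for items and are compared directly (in the lecture they would be items compared by .key):

```python
def max_heapify_up(Q, i):
    # bubble the item at index i up until the max-heap property holds
    if i == 0:                    # base case: at the root, done
        return
    p = (i - 1) // 2              # parent index
    if Q[p] < Q[i]:               # parent smaller than child: violation
        Q[p], Q[i] = Q[i], Q[p]   # swap, then recurse on the parent
        max_heapify_up(Q, p)

def insert(Q, x):
    # append x as the last leaf, then restore the heap property upward
    Q.append(x)
    max_heapify_up(Q, len(Q) - 1)
```

Each recursive call moves up one level, so the total work is O(log n), the height of the tree.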
700 00:38:00,323 --> 00:38:01,740 I think the cool thing about heaps 701 00:38:01,740 --> 00:38:03,370 is we never have to do rotations. 702 00:38:03,370 --> 00:38:05,370 We're only going to do swaps, which is something 703 00:38:05,370 --> 00:38:06,840 we had to do with trees also-- 704 00:38:06,840 --> 00:38:07,650 binary trees. 705 00:38:11,087 --> 00:38:18,780 Q[0] with Q of the last item-- 706 00:38:18,780 --> 00:38:19,440 great, done. 707 00:38:19,440 --> 00:38:22,330 Now we have the last item is the one we want to delete. 708 00:38:22,330 --> 00:38:30,420 So we do delete_last, or pop in Python, and boom, we've got-- 709 00:38:30,420 --> 00:38:32,340 we've now deleted the maximum item. 710 00:38:32,340 --> 00:38:36,210 Of course, we may have also messed up the max-heap property 711 00:38:36,210 --> 00:38:40,230 just like we did with insert. 712 00:38:40,230 --> 00:38:43,410 So with insert, we were adding a last leaf. 713 00:38:43,410 --> 00:38:46,050 Now, what we're doing is swapping the last leaf 714 00:38:46,050 --> 00:38:46,710 with the-- 715 00:38:46,710 --> 00:38:48,300 I'm pointing at the wrong picture. 716 00:38:48,300 --> 00:38:50,610 Let me go back to this tree. 717 00:38:50,610 --> 00:38:55,530 What we did is swap item J with A. So the problem is now-- 718 00:38:55,530 --> 00:38:57,450 and then we deleted this node. 719 00:38:57,450 --> 00:38:59,910 The problem is now that that root node 720 00:38:59,910 --> 00:39:01,830 has maybe a very small key. 721 00:39:01,830 --> 00:39:04,900 Because the key that's here now is whatever was down here, 722 00:39:04,900 --> 00:39:06,150 which is very low in the tree. 723 00:39:06,150 --> 00:39:08,910 So intuitively, that's a small value. 724 00:39:08,910 --> 00:39:10,630 This is supposed to be the maximum value, 725 00:39:10,630 --> 00:39:12,700 and we just put a small value in the root. 726 00:39:12,700 --> 00:39:14,040 So what do we do? 727 00:39:14,040 --> 00:39:16,540 Heapify down. 
728 00:39:16,540 --> 00:39:19,050 We're going to take that item and somehow push it 729 00:39:19,050 --> 00:39:21,390 down to the tree until the-- 730 00:39:21,390 --> 00:39:24,760 down in the tree until max-heap property is satisfied. 731 00:39:24,760 --> 00:39:30,960 So this is going to be max_heapify_down. 732 00:39:30,960 --> 00:39:34,590 And we will start at position 0, which is the root. 733 00:39:38,120 --> 00:39:45,840 And max_heapify_down is going to be a recursive algorithm. 734 00:39:45,840 --> 00:39:47,990 So we'll start at some position i. 735 00:39:47,990 --> 00:39:49,940 And initially, that's the root. 736 00:39:49,940 --> 00:39:53,955 And what we're going to do is look at position i and its two 737 00:39:53,955 --> 00:39:54,455 children. 738 00:39:57,350 --> 00:40:01,090 So let's say we put a very small value up here, like 0. 739 00:40:01,090 --> 00:40:04,180 And let's say we have our children, 5 and 10. 740 00:40:04,180 --> 00:40:06,220 We don't know-- maybe I'll swap their order just 741 00:40:06,220 --> 00:40:10,420 to be more generic, because that looks like not quite 742 00:40:10,420 --> 00:40:12,700 a binary search tree, but we don't 743 00:40:12,700 --> 00:40:14,060 know their relative order. 744 00:40:14,060 --> 00:40:17,070 But one of them is greater than or equal to the other 745 00:40:17,070 --> 00:40:18,920 in some order. 746 00:40:18,920 --> 00:40:21,910 And so what would I like to do to fix this local picture? 747 00:40:25,370 --> 00:40:26,840 Yeah, I want to swap. 748 00:40:26,840 --> 00:40:28,490 And I could swap-- 749 00:40:28,490 --> 00:40:29,900 0 is clearly in the wrong spot. 750 00:40:29,900 --> 00:40:31,400 It needs to go lower in the tree. 751 00:40:31,400 --> 00:40:34,280 I can swap 0 with 5 or 0 with 10. 752 00:40:34,280 --> 00:40:35,810 Which one? 753 00:40:35,810 --> 00:40:36,310 10. 754 00:40:39,010 --> 00:40:43,435 I could draw the picture with 5, but it will not be happy. 755 00:40:46,180 --> 00:40:46,888 Why 10? 
756 00:40:46,888 --> 00:40:48,430 We want to do it with the larger one, 757 00:40:48,430 --> 00:40:51,010 because then this edge will be happy, 758 00:40:51,010 --> 00:40:52,660 and also this edge will be happy. 759 00:40:52,660 --> 00:40:56,710 If I swapped 5 up there instead, the 5/10 edge would be unhappy. 760 00:40:56,710 --> 00:40:58,460 It wouldn't satisfy the max-heap property. 761 00:40:58,460 --> 00:41:01,270 So I can do one swap and fix max-heap property. 762 00:41:01,270 --> 00:41:05,680 Except that, again, 0 may be unhappy with its children. 763 00:41:05,680 --> 00:41:08,380 0 was this one item that was in the wrong spot. 764 00:41:08,380 --> 00:41:10,210 And so it made it have to go farther down. 765 00:41:10,210 --> 00:41:11,560 But 5 will be-- 766 00:41:11,560 --> 00:41:12,790 5 didn't even move. 767 00:41:12,790 --> 00:41:13,600 So it's happy. 768 00:41:13,600 --> 00:41:16,960 Everything in this subtree is good. 769 00:41:16,960 --> 00:41:18,460 What about the parent? 770 00:41:18,460 --> 00:41:20,950 Well, if you think about it, because everything 771 00:41:20,950 --> 00:41:23,890 was a correct heap before we added 0, 772 00:41:23,890 --> 00:41:26,770 or before we put 0 too high, all of these nodes 773 00:41:26,770 --> 00:41:33,820 will be greater than or equal to 10 on the ancestor path. 774 00:41:33,820 --> 00:41:36,550 And all of these nodes were less than or equal to 10 775 00:41:36,550 --> 00:41:38,380 before, including the 5. 776 00:41:38,380 --> 00:41:41,510 So that's still true. 777 00:41:41,510 --> 00:41:43,880 But you see, this tree is happy. 778 00:41:43,880 --> 00:41:45,200 This tree still may be unhappy. 779 00:41:45,200 --> 00:41:47,450 0 still might need to push down farther. 780 00:41:47,450 --> 00:41:50,300 That's going to be the recursion. 781 00:41:50,300 --> 00:41:56,650 So we check down here. 782 00:42:06,470 --> 00:42:09,650 There's a base case, which is if i is a leaf, we're done.
783 00:42:09,650 --> 00:42:11,270 Because there's nothing below them. 784 00:42:13,900 --> 00:42:17,100 So we satisfy the max-heap property at i 785 00:42:17,100 --> 00:42:18,750 because there are no children. 786 00:42:18,750 --> 00:42:24,400 Otherwise, let's look at the leaf among the left-- 787 00:42:24,400 --> 00:42:26,700 sorry, left not leaf-- 788 00:42:26,700 --> 00:42:29,640 among the two children left and right of i. 789 00:42:29,640 --> 00:42:30,960 right(i) might not exist-- 790 00:42:30,960 --> 00:42:32,000 then ignore it. 791 00:42:32,000 --> 00:42:36,000 But among the two children that exist, 792 00:42:36,000 --> 00:42:42,327 find the one that has maximum key value, Q[j].key. 793 00:42:46,840 --> 00:42:49,090 That was 10 in our example. 794 00:42:49,090 --> 00:42:52,040 And then, if these items are out of order, 795 00:42:52,040 --> 00:42:54,460 if we do not satisfy-- 796 00:42:54,460 --> 00:42:56,710 so greater than or equal would satisfy it. 797 00:42:56,710 --> 00:43:00,550 Less than Q[j] would be the opposite of the max-heap 798 00:43:00,550 --> 00:43:03,850 property here. 799 00:43:03,850 --> 00:43:06,560 If max-heap property is violated, 800 00:43:06,560 --> 00:43:15,160 then we fix it by swapping Q[i] with Q[j], 801 00:43:15,160 --> 00:43:17,590 and then we recurse on j-- 802 00:43:21,700 --> 00:43:24,980 call max_heapify_down of j. 803 00:43:24,980 --> 00:43:25,560 That's it. 804 00:43:25,560 --> 00:43:26,460 So pretty symmetric. 805 00:43:26,460 --> 00:43:28,460 Insert was a little bit simpler, because we only 806 00:43:28,460 --> 00:43:29,780 have one parent. 807 00:43:29,780 --> 00:43:31,610 Delete_max, because we're pushing down, 808 00:43:31,610 --> 00:43:32,760 we have two children. 809 00:43:32,760 --> 00:43:34,140 We have to pick one. 810 00:43:34,140 --> 00:43:36,470 But there's a clear choice-- the bigger one.
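Putting delete_max and max_heapify_down together as a sketch (same conventions as before: Q is a Python list of bare keys, and the function names are mine):

```python
def max_heapify_down(Q, i):
    # push the item at index i down until the max-heap property holds
    n = len(Q)
    l, r = 2 * i + 1, 2 * i + 2
    if l >= n:                    # base case: i is a leaf, done
        return
    j = l                         # child with the maximum key
    if r < n and Q[r] > Q[l]:
        j = r
    if Q[i] < Q[j]:               # max-heap property violated at i
        Q[i], Q[j] = Q[j], Q[i]   # swap with the larger child
        max_heapify_down(Q, j)

def delete_max(Q):
    # swap the root with the last leaf, delete the last item, re-heapify down
    Q[0], Q[-1] = Q[-1], Q[0]
    x = Q.pop()                   # arrays are good at deleting the last item
    if Q:
        max_heapify_down(Q, 0)
    return x
```

Repeatedly calling delete_max returns the keys in decreasing order, each call costing O(log n).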
811 00:43:36,470 --> 00:43:38,930 And again, this algorithm-- this whole thing-- 812 00:43:38,930 --> 00:43:40,980 will take order h time, the height of the tree, 813 00:43:40,980 --> 00:43:44,410 which is log n, because our node just sort of bubbles down. 814 00:43:44,410 --> 00:43:45,410 At some point, it stops. 815 00:43:45,410 --> 00:43:47,480 When it stops, we know the max-heap property 816 00:43:47,480 --> 00:43:49,050 was satisfied there. 817 00:43:49,050 --> 00:43:51,470 And if you check along the way, by induction, 818 00:43:51,470 --> 00:43:53,000 all the other max-heap properties 819 00:43:53,000 --> 00:43:57,400 will be satisfied, because they were before. 820 00:43:57,400 --> 00:43:59,670 So it's almost forced what we could do here. 821 00:43:59,670 --> 00:44:01,420 The amazing thing is that you can actually 822 00:44:01,420 --> 00:44:04,240 maintain a complete binary tree that satisfies 823 00:44:04,240 --> 00:44:05,800 the max-heap property. 824 00:44:05,800 --> 00:44:08,560 But once you're told that, the algorithm kind of falls out. 825 00:44:08,560 --> 00:44:09,805 Because we have an array. 826 00:44:09,805 --> 00:44:11,680 The only thing we can do is insert and delete 827 00:44:11,680 --> 00:44:12,585 the last item. 828 00:44:12,585 --> 00:44:14,710 And so we've got to swap things to there in order-- 829 00:44:14,710 --> 00:44:18,170 or out of there in order to make that work. 830 00:44:18,170 --> 00:44:20,500 And then, the rest is just checking locally 831 00:44:20,500 --> 00:44:22,330 that you can fix the property. 832 00:44:24,930 --> 00:44:27,000 Cool. 833 00:44:27,000 --> 00:44:32,300 So that's almost it, not quite what we wanted. 834 00:44:32,300 --> 00:44:36,370 So we now have log n amortized insert and delete_max 835 00:44:36,370 --> 00:44:39,330 in our heap. 836 00:44:39,330 --> 00:44:41,260 We did not yet cover linear build. 837 00:44:41,260 --> 00:44:45,070 Right now, it's n log n if you insert n times.
838 00:44:45,070 --> 00:44:47,830 And we did not yet cover how to make this an in-place sorting 839 00:44:47,830 --> 00:44:49,030 algorithm. 840 00:44:49,030 --> 00:44:52,660 So let me sketch each of those. 841 00:44:52,660 --> 00:44:56,490 I think first is in-place. 842 00:44:56,490 --> 00:45:01,530 So how do we make this algorithm in-place? 843 00:45:01,530 --> 00:45:04,215 I guess I want that, but I don't need this. 844 00:45:06,720 --> 00:45:09,820 So we want to follow priority queue sort. 845 00:45:09,820 --> 00:45:10,760 Maybe I do want that. 846 00:45:16,570 --> 00:45:22,060 But I don't want to have to grow and shrink my array. 847 00:45:22,060 --> 00:45:24,235 I would just like to start with the array itself. 848 00:45:36,500 --> 00:45:37,675 So this is in place. 849 00:45:41,977 --> 00:45:43,810 So what we're going to do is say, OK, here's 850 00:45:43,810 --> 00:45:47,548 my array that I want to sort. 851 00:45:47,548 --> 00:45:48,340 That's given to me. 852 00:45:48,340 --> 00:45:50,400 That's the input to priority queue sort. 853 00:45:53,650 --> 00:45:56,320 And what I'd like is to build a priority queue out of it. 854 00:45:56,320 --> 00:45:57,820 Initially, it's empty. 855 00:45:57,820 --> 00:46:02,600 And then I want to insert the items one at a time, let's say. 856 00:46:02,600 --> 00:46:05,000 So in general, what I'm going to do 857 00:46:05,000 --> 00:46:09,018 is maintain that Q is some prefix of A. 858 00:46:09,018 --> 00:46:10,560 That's going to be my priority queue. 859 00:46:10,560 --> 00:46:15,110 It's going to live in this sub-array-- this prefix. 860 00:46:15,110 --> 00:46:17,240 So how do I insert a new item? 861 00:46:17,240 --> 00:46:20,510 Well, I just increment. 862 00:46:20,510 --> 00:46:29,090 So to do an insert, the first step is increment size of Q. 863 00:46:29,090 --> 00:46:31,025 Then I will have taken the next item from A 864 00:46:31,025 --> 00:46:32,540 and injected it into this Q.
865 00:46:32,540 --> 00:46:37,430 And conveniently, if we look at our insert code, which is here, 866 00:46:37,430 --> 00:46:39,470 the first thing we wanted to do was add an item 867 00:46:39,470 --> 00:46:40,600 at the end of the array. 868 00:46:40,600 --> 00:46:43,940 So we just did it without any actual work, just 869 00:46:43,940 --> 00:46:44,697 conceptual work. 870 00:46:44,697 --> 00:46:46,280 We just said, oh, our Q is one bigger. 871 00:46:46,280 --> 00:46:46,780 Boom! 872 00:46:46,780 --> 00:46:49,430 Now this is at the end of the array. 873 00:46:49,430 --> 00:46:52,480 There's no more amortization, in fact, because we're not ever 874 00:46:52,480 --> 00:46:54,230 resizing our array, we're just saying, oh, 875 00:46:54,230 --> 00:46:57,050 now Q is a little bit bigger of a prefix. 876 00:46:57,050 --> 00:46:59,450 It just absorbs the next item of A. 877 00:46:59,450 --> 00:47:06,330 Similarly, delete_max is going to, 878 00:47:06,330 --> 00:47:12,260 at the end, decrement the size of Q. Why is that OK? 879 00:47:12,260 --> 00:47:16,520 Because at the end of our delete_max operation-- 880 00:47:16,520 --> 00:47:19,520 not quite at the end, but almost the end-- 881 00:47:19,520 --> 00:47:22,160 we deleted the last item from our array. 882 00:47:22,160 --> 00:47:25,310 So we just replaced that delete last with a decrement, 883 00:47:25,310 --> 00:47:28,290 and that's going to shrink the Q by 1. 884 00:47:28,290 --> 00:47:31,430 It has the exact same impact as deleting the last item. 885 00:47:31,430 --> 00:47:34,520 But now, it's constant time, worst case, not amortized. 886 00:47:34,520 --> 00:47:37,790 And the result is we never actually build a dynamic array. 887 00:47:37,790 --> 00:47:40,140 We just use a portion of A to do it. 888 00:47:40,140 --> 00:47:41,645 So what's going to happen is we're 889 00:47:41,645 --> 00:47:43,895 going to absorb all the items into the priority queue, 890 00:47:43,895 --> 00:47:45,620 and then start kicking them out.
891 00:47:45,620 --> 00:47:49,790 As we kick them out, we kick out the largest key item first, 892 00:47:49,790 --> 00:47:52,100 and we put it here, then the next largest, then 893 00:47:52,100 --> 00:47:53,420 the next largest, and so on. 894 00:47:53,420 --> 00:47:55,010 The minimum item is going to be here. 895 00:47:55,010 --> 00:47:56,150 And, boom, it's sorted. 896 00:47:56,150 --> 00:48:00,080 This is the whole reason I did max-heaps instead of min-heaps, 897 00:48:00,080 --> 00:48:04,100 is that in the end, this will be an upward-sorted array 898 00:48:04,100 --> 00:48:05,300 with the max at the end. 899 00:48:05,300 --> 00:48:07,400 Because we always kick out items at the end. 900 00:48:07,400 --> 00:48:10,200 We delete the max first. 901 00:48:10,200 --> 00:48:15,180 So that is what's normally called heapsort. 902 00:48:15,180 --> 00:48:18,750 You can apply this same trick to insertion sort and selection 903 00:48:18,750 --> 00:48:21,360 sort, and you actually get the insertion sort and selection 904 00:48:21,360 --> 00:48:23,820 sort algorithms that we've seen, which operate 905 00:48:23,820 --> 00:48:26,920 on prefixes of the array. 906 00:48:26,920 --> 00:48:29,280 Cool, so now we have-- 907 00:48:29,280 --> 00:48:31,110 we've achieved the goal up there, which 908 00:48:31,110 --> 00:48:34,390 is an n log n sorting algorithm that is in-place. 909 00:48:34,390 --> 00:48:35,640 So that was our main goal-- 910 00:48:35,640 --> 00:48:37,160 heapsort. 911 00:48:37,160 --> 00:48:46,150 Let me very quickly mention you can build a heap in linear time 912 00:48:46,150 --> 00:48:47,930 with a clever trick. 913 00:48:47,930 --> 00:48:50,140 So if you insert the items one at a time, 914 00:48:50,140 --> 00:48:53,230 that would correspond to inserting down the array. 915 00:48:53,230 --> 00:48:57,040 And every time I insert an item, I have to walk up the tree. 916 00:48:57,040 --> 00:49:04,170 So this would be the sum of the depth of each node.
917 00:49:04,170 --> 00:49:08,520 If you do that, you get n log n. 918 00:49:08,520 --> 00:49:12,150 This is the sum over i of log i. 919 00:49:12,150 --> 00:49:14,610 That turns out to be n log n. 920 00:49:14,610 --> 00:49:17,310 It's the log of n factorial. 921 00:49:17,310 --> 00:49:20,220 The cool trick is to, instead, imagine 922 00:49:20,220 --> 00:49:21,810 adding all the items at once and not 923 00:49:21,810 --> 00:49:24,805 heapifying anything, and then heapify up-- sorry, 924 00:49:24,805 --> 00:49:28,420 heapify down from the bottom up. 925 00:49:28,420 --> 00:49:31,530 So here, we're heapifying up. 926 00:49:31,530 --> 00:49:35,190 Now, we're going to heapify down. 927 00:49:35,190 --> 00:49:37,080 And surprisingly, that's better. 928 00:49:37,080 --> 00:49:42,150 Because this is the sum of the heights of the nodes. 929 00:49:42,150 --> 00:49:43,810 And that turns out to be linear. 930 00:49:43,810 --> 00:49:45,090 It's not obvious. 931 00:49:45,090 --> 00:49:49,740 But intuitively, for depth, this is 0 at the root, this is log n 932 00:49:49,740 --> 00:49:51,450 at the leaves, and we've got a whole ton of leaves. 933 00:49:51,450 --> 00:49:54,220 So right at the leaf level, you can see, we're paying n log n, 934 00:49:54,220 --> 00:49:54,720 right? 935 00:49:54,720 --> 00:49:57,540 Because there are n of them, and each one costs log n. 936 00:49:57,540 --> 00:49:59,850 With height, down here at the leaf level, we're paying constant. 937 00:49:59,850 --> 00:50:03,480 Because the height of the leaves is 1. 938 00:50:03,480 --> 00:50:05,250 Here, the height of the root is log n. 939 00:50:05,250 --> 00:50:06,480 And this is better. 940 00:50:06,480 --> 00:50:09,530 Now we're paying a small amount for the thing 941 00:50:09,530 --> 00:50:11,040 of which there are many. 942 00:50:11,040 --> 00:50:12,730 It's not quite a geometric series, 943 00:50:12,730 --> 00:50:15,090 but it turns out this is linear. 944 00:50:15,090 --> 00:50:17,790 So that's how you can do linear heap building.
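A sketch of that bottom-up linear-time build in Python: put all the items in the array as-is, then heapify down every non-leaf from the bottom up, so each node pays its height rather than its depth. The helper name max_heapify_down is my own, not necessarily the lecture's code.

```python
def max_heapify_down(A, n, i):
    """Swap A[i] down while a child (within the first n slots) beats it."""
    l, r = 2 * i + 1, 2 * i + 2
    c = i                                 # index of the largest of i, l, r
    if l < n and A[l] > A[c]:
        c = l
    if r < n and A[r] > A[c]:
        c = r
    if c != i:
        A[i], A[c] = A[c], A[i]
        max_heapify_down(A, n, c)

def build_max_heap(A):
    """Heapify down every non-leaf, bottom up: O(n) total, since the
    sum of the nodes' heights is linear."""
    n = len(A)
    for i in range(n // 2 - 1, -1, -1):   # indices >= n // 2 are leaves,
        max_heapify_down(A, n, i)         # which are already valid heaps
```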
945 00:50:17,790 --> 00:50:23,450 To come back to your question about sequence AVL trees, 946 00:50:23,450 --> 00:50:25,500 it turns out you can get all of the same bounds 947 00:50:25,500 --> 00:50:27,750 as heaps, except for the in-place part, 948 00:50:27,750 --> 00:50:30,330 by taking a sequence AVL tree, storing 949 00:50:30,330 --> 00:50:33,150 the items in an arbitrary order, and augmenting 950 00:50:33,150 --> 00:50:36,210 by max, which is a crazy idea. 951 00:50:36,210 --> 00:50:38,490 But it also gives you a linear build time. 952 00:50:38,490 --> 00:50:40,710 And yeah, there's other fun stuff in your notes. 953 00:50:40,710 --> 00:50:42,980 But I'll stop there.
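As a rough illustration of that max augmentation (my own sketch, not the lecture's code): each node stores the max key in its subtree, which can be recomputed in constant time from its children, so rotations and updates can maintain it cheaply. The sequence AVL machinery itself, including rebalancing, is omitted here.

```python
class Node:
    """A binary tree node augmented by the max key in its subtree."""
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.subtree_max = key            # max key in this node's subtree

def update(node):
    """Recompute node.subtree_max from its children in O(1). Calling this
    on every node along a leaf-to-root path (after an insert, delete, or
    rotation) keeps the augmentation correct."""
    m = node.key
    if node.left and node.left.subtree_max > m:
        m = node.left.subtree_max
    if node.right and node.right.subtree_max > m:
        m = node.right.subtree_max
    node.subtree_max = m

def find_max(node):
    """Walk from the root toward a node holding the max key: O(h) time."""
    while True:
        if node.left and node.left.subtree_max == node.subtree_max:
            node = node.left
        elif node.right and node.right.subtree_max == node.subtree_max:
            node = node.right
        else:
            return node
```

Since the items are stored in arbitrary order, delete_max removes the node found this way and fixes up subtree_max along the path back to the root, all in O(log n).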