ERIK DEMAINE: All right, welcome back to data structures land. Today we continue and complete our segment on binary trees. So this is part two. If you missed part one, go back and watch part one.

Last time, we talked about binary trees in general. Each node stored an item, a left pointer and a right pointer to other nodes, and a parent pointer to another node. This was an example of a tree. B and C are A's children. A is the parent of B and C, and also the root of the entire tree. We defined the height of a node. We haven't used it much yet, but we're going to use it a lot today. So remember, the height is as drawn in red here. The height of a node is the length of the longest downward path, counting edges. So B, for example, has a downward path of length 2, so we write a 2 here. You can also think of it this way, if you prefer: live only within the subtree rooted at B, and ask what the maximum depth of those nodes is. Either way is fine.
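A minimal Python sketch of the height definition, for illustration only. The `Node` class and `height` function are hypothetical names, and the exact tree shape is an assumption reconstructed from the traversal order F, D, B, E, A, C (so F is taken to be D's left child). An empty subtree is given height -1, so a leaf has height 0 and each level adds one edge.

```python
class Node:
    """Binary tree node with an item and left/right/parent pointers."""
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right
        self.parent = None
        # Wire up parent pointers for any children passed in.
        for child in (left, right):
            if child is not None:
                child.parent = self

def height(node):
    """Length (in edges) of the longest downward path from node.

    An empty subtree gets height -1 so that a leaf has height 0.
    """
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

# Example tree (shape assumed from the traversal order F, D, B, E, A, C):
#         A
#        / \
#       B   C
#      / \
#     D   E
#    /
#   F
F = Node("F")
D = Node("D", left=F)
E = Node("E")
B = Node("B", left=D, right=E)
C = Node("C")
A = Node("A", left=B, right=C)

print(height(B))  # 2
print(height(A))  # 3
```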
In particular, we distinguished h, the height of the root node, as the height of the entire tree. What we achieved last time was that basically all of our operations ran in order h time. We had subtree insert, subtree delete, and subtree first and last, and we could compute the predecessor and successor of a node, all in order h time. So as long as h was small, we were happy.

And remember, what do predecessor and successor mean? They refer to an implicit order in the tree, which we call the traversal order. It is defined recursively: traverse the left subtree, then output the root, then traverse the right subtree. So in this example, F comes first in the traversal order, because if you go all the way left, that's the first node. Let me make some space here. Then we have D, then B. Then we do the right subtree of B, which is E. Then we have the root, A, because we finished the left subtree of the root. And then we have C. So there's an implicit linear order, F, D, B, E, A, C, encoded by this tree.
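The recursive definition of traversal order translates almost directly into code. This sketch uses a hypothetical minimal `Node` class and a generator named `traversal_order`; the tree shape is the same assumed reconstruction of the board example.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

def traversal_order(node):
    """Yield items in traversal order: left subtree, then root, then right subtree."""
    if node is None:
        return
    yield from traversal_order(node.left)
    yield node.item
    yield from traversal_order(node.right)

tree = Node("A",
            left=Node("B", left=Node("D", left=Node("F")), right=Node("E")),
            right=Node("C"))
print(list(traversal_order(tree)))  # ['F', 'D', 'B', 'E', 'A', 'C']
```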
The whole point of binary trees is that we can update the tree much faster than we could update an explicit order written down in an array or something like that. So binary trees let us do this quickly. Well, quickly is not so quick right now, because everything is order h, and in the worst case, h is linear, because we can have a tree like this, a single long path. But today, we're going to guarantee that h is order log n. So the goal of today is to take all of these operations that run in order h time and get them to run in order log n time, just by modifying the data structure we've already seen. We've done a lot of the hard work; there's just a little more work to do today, on something called AVL trees, which keep the tree height balanced.

But before we get there, I want to talk a little more about what we covered at the very end of the last lecture: once you have these subtree operations, so I can insert and delete in a subtree, how do I actually use them to solve the problems we care about in this class, namely the sequence data structure and the set data structure? We talked mostly about the set data structure last time.
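To see why h matters, here is a small illustrative sketch (hypothetical helper names) comparing two trees with the same traversal order 0, 1, ..., n-1: a degenerate path, and a tree rooted at the middle item recursively, the way binary search splits an array. The path has height n - 1, while the middle-rooted tree has height floor(log2 n). Balancing schemes such as AVL trees aim to keep the height within a constant factor of the second case.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right

def height(node):
    # Height in edges; an empty subtree has height -1.
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def build_chain(items):
    """Degenerate tree: each node's only child is its right child."""
    root = None
    for item in reversed(items):
        root = Node(item, right=root)
    return root

def build_balanced(items):
    """Root each subtree at its middle item, as binary search would."""
    if not items:
        return None
    mid = len(items) // 2
    return Node(items[mid],
                left=build_balanced(items[:mid]),
                right=build_balanced(items[mid + 1:]))

for n in (15, 100):
    items = list(range(n))
    print(n, height(build_chain(items)), height(build_balanced(items)))
# 15 14 3
# 100 99 6
```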
So in general, we get to define which traversal order a binary tree maintains. For the set interface, we're interested in queries like find_next and find_previous: given a key, if it's not there, tell me the previous one or the next one. This is something we could do with binary search. So the big, cool thing that binary trees let us do is this: if we let the traversal order always be all of the stored items in increasing key order, then we are effectively maintaining the items in sorted order, in the traversal order sense. Again, we're not explicitly maintaining them in order; we're maintaining a tree that represents the items in key order. And this lets us do a subtree_find operation, which you could easily use to implement find, find_previous, and so on, as follows. We start at the root of the tree, so initially node equals root. Then we recursively search for a key k. We check whether the item at the node has a key that's bigger than k. Let me draw a little picture. So we're at some node here.
This is a node. It has a left subtree and a right subtree, and there's some item with some key. If the key we're looking for is less than the node's key, that means it's down here in the left subtree, so we recurse on node.left. If they're equal, this item is the item we're looking for, so we can just return it, or the node, depending on what you want. And if the key we're looking for is greater than the node's key, then we recurse to the right.

If you think about it a little bit, this is exactly binary search on an array; it just happens to be on a tree instead. Think of an array like this. What does binary search do? It first looks at the key in the middle. I'm going to draw that as the root. Then it recurses either on the left chunk, which I will draw recursively, or on the right chunk. So if you happen to have a perfect binary tree like this one, it is simulating exactly binary search in this array. But the tree we're going to be able to maintain dynamically: not perfect anymore, but close.
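Here is a minimal recursive sketch of `subtree_find` as described, assuming (for brevity) that each node stores its key directly in a `key` field rather than in `node.item.key`. The example tree is the perfect tree obtained by drawing binary search over the array [1, ..., 7].

```python
class Node:
    """Set binary tree node; stores a key directly (a simplification)."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def subtree_find(node, k):
    """Return the node with key k in node's subtree, or None if k is absent."""
    if node is None:
        return None                         # fell off the tree: not present
    if k < node.key:
        return subtree_find(node.left, k)   # k can only be on the left
    if k > node.key:
        return subtree_find(node.right, k)  # k can only be on the right
    return node                             # keys are equal: found it

# A perfect tree over keys 1..7, i.e. binary search on [1, 2, ..., 7] drawn as a tree.
root = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
print(subtree_find(root, 5).key)  # 5
print(subtree_find(root, 8))      # None
```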
Whereas the sorted array we could not maintain efficiently as items are inserted and deleted. So this is a generalization of binary search to work on trees instead of on arrays. For this reason, set binary trees are called binary search trees: they're the tree version of binary search. There are many equivalent names; binary search tree is just another name for set binary tree. The key thing that makes this algorithm work is the so-called binary search tree property: all the keys in the left subtree of a node are less than the key of that node, and that key is less than all the keys in the right subtree. And this is true recursively, all the way down. That property is how you prove this algorithm is correct. Why does the property hold? Because we maintain the traversal order to be increasing key order, and that's exactly what traversal order means: all the things in the left subtree come before the root, which comes before all the things in the right subtree. So the increasing traversal order implies the binary search tree property. And how do you maintain things in increasing key order? It's pretty easy.
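One way to check the binary search tree property, following the argument above: list the keys in traversal order and confirm they strictly increase (keys are assumed distinct, as in a set). The helper names are hypothetical. The `bad` tree shows why the property must hold for whole subtrees, not just parent/child pairs: every edge looks locally fine, yet 5 sits in the left subtree of 4.

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def keys_in_order(node):
    """Keys in traversal order: left subtree, root, right subtree."""
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)

def has_bst_property(node):
    """True iff the traversal order lists the keys in strictly increasing order."""
    keys = keys_in_order(node)
    return all(a < b for a, b in zip(keys, keys[1:]))

good = Node(4, Node(2, Node(1), Node(3)), Node(6, Node(5), Node(7)))
# 5 > 2 is fine for the edge 2 -> 5, but 5 is in the left subtree of 4.
bad = Node(4, Node(2, Node(1), Node(5)), Node(6))
print(has_bst_property(good), has_bst_property(bad))  # True False
```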
If you want to insert an item, where does it belong? Well, you do this search to find where it would be if it were there. If it's there, you can overwrite the value stored with that key. If it's not, the search will fall off the tree at some point, and that's where you insert a new node in your tree. That was covered in recitation, so I don't want to dwell on it.

What I want to focus on today is the other application. Representing a set this way is relatively easy. The harder challenge, which needs a little more work, is to make sequence binary trees. So suppose I have a binary tree. What I would like, as we mentioned at the end of last time, is for the traversal order of my tree to be the sequence order, the order I'm trying to represent, which is changed by operations like insert_at. So I'd like to do the same thing as for sets, but now I have to think about how to do a search, how to do an insert_at, that sort of thing. Here is an algorithm for what I would like to work, but it's not quite going to work yet.
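A sketch of the set insertion described at the start of this passage: search until you either find the key or fall off the tree, then attach a new leaf there. This is an illustrative, unbalanced version with hypothetical names `bst_insert` and `keys_in_order`; it assumes nodes carry a `key`, a `value`, and a parent pointer.

```python
class Node:
    def __init__(self, key, value=None):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.parent = None

def bst_insert(root, key, value=None):
    """Insert (key, value) into the BST rooted at root; return the (possibly new) root."""
    if root is None:
        return Node(key, value)
    node = root
    while True:
        if key == node.key:
            node.value = value          # key already present: overwrite its value
            return root
        child = node.left if key < node.key else node.right
        if child is None:               # the search fell off the tree here
            new = Node(key, value)
            new.parent = node
            if key < node.key:
                node.left = new
            else:
                node.right = new
            return root
        node = child

def keys_in_order(node):
    if node is None:
        return []
    return keys_in_order(node.left) + [node.key] + keys_in_order(node.right)

root = None
for k in [5, 2, 8, 1, 9, 3]:
    root = bst_insert(root, k, str(k))
root = bst_insert(root, 8, "eight")     # overwrites the value stored with key 8
print(keys_in_order(root))  # [1, 2, 3, 5, 8, 9]
print(root.right.value)     # eight
```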
Suppose I give you a subtree, specified by a node: all the descendants of that node. Its traversal order starts here, ends here, and the root will be somewhere in the middle. I'd like you to give me the ith node in that traversal order. So if I ask for i equals 0, I want to get this leftmost descendant. If I ask for i equals the size of the subtree minus 1, I want to get the rightmost descendant. Those were the subtree first and last operations that we talked about, and we know how to find them: just walk left or walk right. But we don't know how to find the ith node in order h time; order h is the goal right now, not log n. And the idea is, well, it seems like size matters. [CHUCKLES] Sorry if you heard otherwise. In particular, I mentioned size when I was talking about the last node in the sequence: the index of that node is the size of the subtree minus 1. So let's define the size of a node to be the number of nodes in its subtree, which we were calling subtree(node), including the node itself.
If I somehow knew the size, that seems important for understanding indexes. Let's just assume that I know it magically, in constant time. Then I claim that the size of the left subtree is what I need. Let me expand this diagram a little bit. We have node as before, but now we have a left subtree and a right subtree. This node here is node.left, and this node here is node.right. They might not exist, but let's ignore those exceptional cases for now. Suppose we knew not only the size of node, but also the size of node.left, the size of this tree on the left. I'm going to call that nL: there are nL nodes down here. I claim that lets me do the equivalent of a binary search. I'm looking for some index i. If i is less than nL, then I know the node must be down here. For example, if i equals 0, it's going to be in the left subtree, as long as nL is greater than 0, right? So that's exactly this check. If i is less than nL, I recurse to the left and call subtree_at(node.left, i). That's what's written here.
If i equals nL, think about it for a second: nL is the number of nodes here, so this node has index nL. If the index we're looking for is that one, we just return this node. We're done. Otherwise, i is greater than nL, and that means the node we're looking for is in the right subtree, because it comes after the root. Again, that's what traversal order means. If we define the traversal order to be the sequence order, then all the things that come after this node, which has index nL, must be over here. Now, when we recurse down here, our numbering system changes: for node, index 0 is here, but for node.right, index 0 is here. So we need to do a little bit of subtraction. When we recurse to the right, we use index i minus nL minus 1: minus nL for these left nodes, and minus 1 for the root node. That gives us the index we're looking for within the right subtree. So my point is, this algorithm is basically the same as the search algorithm for sets.
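A sketch of `subtree_at` following the three cases above. As a simplifying assumption, each hypothetical `Node` computes its stored `size` once, when it is built from already-built children; keeping those sizes correct after later insertions and deletions is the problem the lecture takes up next. An empty subtree counts as size 0, which handles the missing-child cases set aside above.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right
        # Stored subtree size, computed once here from the children.
        self.size = size(left) + size(right) + 1

def size(node):
    """Subtree size in O(1); an empty subtree has size 0."""
    return node.size if node is not None else 0

def subtree_at(node, i):
    """Return the node at index i of node's subtree, in traversal order."""
    if not 0 <= i < size(node):
        raise IndexError(i)
    nL = size(node.left)
    if i < nL:
        return subtree_at(node.left, i)         # in the left subtree
    if i == nL:
        return node                             # this subtree's root has index nL
    return subtree_at(node.right, i - nL - 1)  # skip nL left nodes and the root

seq = Node("A",
           left=Node("B", left=Node("D", left=Node("F")), right=Node("E")),
           right=Node("C"))
print([subtree_at(seq, i).item for i in range(size(seq))])
# ['F', 'D', 'B', 'E', 'A', 'C']
```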
But that one uses keys, because we're dealing with a set, and in sets we assume items have keys. Here, items don't have to have keys. In fact, we're not touching the items at all. We're just asking: what is the ith item in my sequence? That's the same as asking for the ith item in my traversal order, which is the same as asking for the ith node in the traversal order. And this algorithm gives it to you exactly the same way, in order h time.

Now, I'm not going to show you all the operations. But you can use subtree_at to implement get_at and set_at: you just find the appropriate node and return the item or modify the item. Most crucially, you can use it to do insert_at and delete_at, which is something we've never been able to do efficiently before. What do you do? Just like with sets, if I'm trying to insert an item, I search for its position first. So if I'm trying to insert at i, I look for node i. Then, for insert_at(i), I want to add a new item just before that node. And conveniently, we already have, though I didn't mention it, a subtree insert.
We had two versions, before and after. I think we covered after, which works with the successor; before is symmetric and works with the predecessor. So we can just call subtree insert before at that node, and boom, we will have added a new item just before it. Great: somehow, magically, we have inserted in the middle of this sequence. And all of the indices update, because I'm not storing indices. Instead, to find the item at index i, I'm using the search algorithm.

But there's a problem. What's the problem? This seems a little too good to be true. I insert in the middle of this tree, and then somehow I can magically search and still find the ith item, even though all the indices to the right of that item incremented by 1. It's almost true. Answer? Yeah?

AUDIENCE: [INAUDIBLE]

ERIK DEMAINE: Because we have to update the sizes, right. I didn't say how to compute the size of the left subtree. So that is the next topic. We're almost done. This will actually work. It's really quite awesome.
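Putting the pieces together, here is an unbalanced sketch of a sequence binary tree with `get_at`, `set_at`, and `insert_at`. `Node`, `SequenceTree`, `subtree_last`, and `update_ancestors` are hypothetical names; `subtree_insert_before` follows the description above (attach as the left child if there is none, otherwise as the right child of the predecessor). The `update_ancestors` walk is the size fix raised in the answer above: after a new leaf is attached, it recomputes the stored size at each ancestor, bottom-up. Every operation costs O(h); nothing here keeps h small.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None
        self.parent = None
        self.size = 1  # stored subtree size (the augmentation)

def size(node):
    return node.size if node is not None else 0

def update_ancestors(node):
    """Recompute stored sizes at node and every ancestor, bottom-up: O(h)."""
    while node is not None:
        node.size = size(node.left) + size(node.right) + 1
        node = node.parent

def subtree_at(node, i):
    """Node at index i of node's subtree traversal order, in O(h)."""
    nL = size(node.left)
    if i < nL:
        return subtree_at(node.left, i)
    if i == nL:
        return node
    return subtree_at(node.right, i - nL - 1)

def subtree_last(node):
    while node.right is not None:
        node = node.right
    return node

def subtree_insert_before(node, new):
    """Make new the traversal-order predecessor of node, as a new leaf."""
    if node.left is None:
        node.left = new
    else:
        # The predecessor is the last node of the left subtree; it has no right child.
        node = subtree_last(node.left)
        node.right = new
    new.parent = node
    update_ancestors(node)

class SequenceTree:
    def __init__(self):
        self.root = None

    def __len__(self):
        return size(self.root)

    def get_at(self, i):
        return subtree_at(self.root, i).item

    def set_at(self, i, x):
        subtree_at(self.root, i).item = x

    def insert_at(self, i, x):
        new = Node(x)
        if self.root is None:
            self.root = new
        elif i == len(self):
            # Appending: new becomes the right child of the last node.
            last = subtree_last(self.root)
            last.right = new
            new.parent = last
            update_ancestors(last)
        else:
            subtree_insert_before(subtree_at(self.root, i), new)

s = SequenceTree()
for x in "ace":
    s.insert_at(len(s), x)
s.insert_at(1, "b")
s.insert_at(3, "d")
s.insert_at(0, "_")
s.insert_at(3, "X")
print("".join(s.get_at(i) for i in range(len(s))))  # _abXcde
```

No node stores its index: inserting "_" at index 0 shifted every later index, yet `get_at` still finds the right items because `subtree_at` recomputes positions from the stored sizes.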
312 00:15:16,780 --> 00:15:18,790 But for it to work, we need something 313 00:15:18,790 --> 00:15:22,837 called subtree augmentation, which 314 00:15:22,837 --> 00:15:23,920 I'll talk about generally. 315 00:15:23,920 --> 00:15:25,390 And then, we'll apply it to size. 316 00:15:30,370 --> 00:15:32,590 So the idea with subtree augmentation 317 00:15:32,590 --> 00:15:37,900 is that each node in our binary tree 318 00:15:37,900 --> 00:15:45,100 can store a constant number of extra fields. 319 00:15:45,100 --> 00:15:46,780 Why not? 320 00:15:46,780 --> 00:15:49,750 And in particular, if these fields 321 00:15:49,750 --> 00:15:51,610 are of a particular type-- 322 00:15:51,610 --> 00:15:53,260 maybe I'll call them properties. 323 00:15:57,700 --> 00:16:10,030 I'm going to define a subtree property 324 00:16:10,030 --> 00:16:21,330 to be something that can be computed 325 00:16:21,330 --> 00:16:25,520 from the properties of the nodes' children. 326 00:16:28,740 --> 00:16:30,780 So I should say this is of a node. 327 00:16:37,230 --> 00:16:42,030 So children are node.left and node.right. 328 00:16:42,030 --> 00:16:44,655 You can also access constant amount of other stuff, 329 00:16:44,655 --> 00:16:47,620 for example the node itself. 330 00:16:47,620 --> 00:16:49,140 But the point of a subtree property 331 00:16:49,140 --> 00:16:50,490 is it's downward looking. 332 00:16:50,490 --> 00:16:57,360 If I have a node here, and I want 333 00:16:57,360 --> 00:17:00,750 to compute some property about it-- 334 00:17:00,750 --> 00:17:06,000 call it, we want to store P of the node-- 335 00:17:06,000 --> 00:17:09,270 and suppose we already know P over here, 336 00:17:09,270 --> 00:17:11,910 the property computed for the left subtree 337 00:17:11,910 --> 00:17:13,710 or for the left node, and we already 338 00:17:13,710 --> 00:17:17,970 know the property for the right node, then what I'd like 339 00:17:17,970 --> 00:17:20,369 is for this to be computable in constant time. 
So I can compute P of this node given P of the left node and P of the right node. That's a subtree property. Now, in particular, size is a subtree property. Why? Because I can write this kind of recurrence: node.size equals node.left.size (this is very tedious to write) plus node.right.size, plus... 1, thank you. The size of the entire subtree rooted at node is the size of the left subtree plus the size of the right subtree, plus 1 for the node itself. So this is an update rule, and it takes constant time to evaluate: it's two additions. Sorry, my t's look kind of like plus signs; I'll make the pluses a little bigger. So we're just summing those three things, and boom, we get node.size. I claim that as long as my property has this feature, I can maintain it dynamically as I'm changing the tree. Now, this is a little bit of a forward reference, because we haven't said exactly how we're going to change the tree yet. But, question?

AUDIENCE: If node.size is recursive, then how is it happening in constant time?
Wouldn't it be happening [INAUDIBLE]?

ERIK DEMAINE: Why is this... OK, good question. One natural way to read this is as a recursion, which gives you a recursive algorithm. I didn't write it, but I could have written size(node) equals size(node.left) plus size(node.right) plus 1, and that would give you a linear-time algorithm to count the size. If you don't have any stored information, that is what you would have to do, and it would be very painful: it would make subtree_at really slow. Calling size as a recursive function is bad. What I'm doing instead is storing the size at every single node, precomputed. So this is the mathematical definition of the size of a node, but the algorithm for this function is just going to be return node.size. That's constant time. So the challenge now is to keep these sizes up to date, no matter what I do to the tree. And you could look back at last lecture to see which changes we made to a tree.
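The distinction in this answer, as a sketch with hypothetical names: `count_nodes` is the linear-time recursive count, while `size` just reads a stored field, and `update` is the constant-time rule node.size = left size + right size + 1, with an empty subtree counting as 0. Here the stored sizes are initialized once by applying `update` bottom-up, children before parents.

```python
class Node:
    def __init__(self, item, left=None, right=None):
        self.item = item
        self.left = left
        self.right = right
        self.size = None  # filled in by update()

def size(node):
    """O(1): read the stored field (0 for an empty subtree)."""
    return node.size if node is not None else 0

def update(node):
    """Constant-time update rule: size from the children's stored sizes."""
    node.size = size(node.left) + size(node.right) + 1

def count_nodes(node):
    """O(n) recursive count: what we'd have to do without stored sizes."""
    if node is None:
        return 0
    return count_nodes(node.left) + count_nodes(node.right) + 1

def fill_sizes(node):
    """Initialize every stored size bottom-up (children before parent)."""
    if node is None:
        return
    fill_sizes(node.left)
    fill_sizes(node.right)
    update(node)

root = Node("A", Node("B", Node("D", Node("F")), Node("E")), Node("C"))
fill_sizes(root)
print(size(root), size(root.left), count_nodes(root))  # 6 4 6
```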
We only made changes during insert and delete. And I will just claim to you that when we did insert and delete, what they ended up doing in the end is adding or removing a leaf of the tree. Remember, a leaf is a node with no children. So let's think about adding a new leaf to a tree. Here's a tree; suppose I add a leaf here. Which subtrees change? Well, which subtrees contain that node? It is its own new subtree. Then it's in its parent's subtree, its grandparent's subtree, and so on up to the overall tree. In general, these nodes are called the ancestors of the node we added, and those are the ones that update. This subtree over here didn't change; it didn't change size. And because we're dealing with subtree properties, no subtree property will change over here, because the subtree was untouched. So when I touch this node, I just have to update the subtree property here, here, and here. How many of these are there? Yeah? h. I'll say order h to be safe; it's at most h, one per ancestor of the new leaf.
Also, when I remove a leaf, it's the same thing: if I remove this leaf, the subtrees that change are exactly those of its former ancestors. Cool, so we're going to update those order h ancestors in order, going up the tree. What do I mean by update? I mean apply the update rule. For size, it's this rule, but in general, a subtree property gives me an update rule that takes constant time. So I apply that update rule to this node, which fixes whatever property is stored there; maybe there's more than one property. Then I apply it to this node. Because this child is already correct by induction, and this other child is already correct because I didn't touch its subtree (it's unchanged), I can update the property at this node in constant time. Then I update this one. Again, this child is already correct by induction, and this one is already correct because its subtree is unchanged, so I can update the property correctly here in constant time.
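A sketch of the bottom-up repair after adding or removing a leaf, using size as the maintained property. The names `update`, `update_ancestors`, `add_leaf`, and `remove_leaf` are hypothetical. Each call re-applies the O(1) rule only along the path from the changed spot to the root and returns how many nodes it touched, so you can see that the count is the number of ancestors rather than n.

```python
class Node:
    def __init__(self, item):
        self.item = item
        self.left = None
        self.right = None
        self.parent = None
        self.size = 1  # a lone leaf has subtree size 1

def size(node):
    return node.size if node is not None else 0

def update(node):
    """The O(1) update rule for the size property."""
    node.size = size(node.left) + size(node.right) + 1

def update_ancestors(node):
    """Apply update() at node, then its parent, and so on up to the root.

    Returns the number of nodes updated, i.e. the length of that path.
    """
    count = 0
    while node is not None:
        update(node)  # children are already correct: unchanged or just fixed
        node = node.parent
        count += 1
    return count

def add_leaf(parent, leaf, side):
    """Attach leaf as parent's 'left' or 'right' child, which must be empty."""
    assert getattr(parent, side) is None
    setattr(parent, side, leaf)
    leaf.parent = parent
    return update_ancestors(parent)

def remove_leaf(leaf):
    """Detach a non-root leaf and repair its former ancestors."""
    assert leaf.left is None and leaf.right is None and leaf.parent is not None
    parent = leaf.parent
    if parent.left is leaf:
        parent.left = None
    else:
        parent.right = None
    leaf.parent = None
    return update_ancestors(parent)

A, B, C, D, E, F = (Node(x) for x in "ABCDEF")
add_leaf(A, B, "left")
add_leaf(A, C, "right")
add_leaf(B, D, "left")
add_leaf(B, E, "right")
print(add_leaf(D, F, "left"), A.size)  # 3 6   (updated D, B, A)
print(remove_leaf(F), A.size, C.size)  # 3 5 1
```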
447 00:23:00,500 --> 00:23:03,170 So when I make a change, in order h time-- 448 00:23:03,170 --> 00:23:06,200 because I'm making h calls to this constant time algorithm-- 449 00:23:06,200 --> 00:23:11,570 I can update a constant number of subtree properties. 450 00:23:11,570 --> 00:23:12,620 This is very powerful. 451 00:23:12,620 --> 00:23:14,660 Data structure augmentation is super useful. 452 00:23:14,660 --> 00:23:16,160 You will use it on your problem set. 453 00:23:16,160 --> 00:23:19,080 We will use it again today. 454 00:23:19,080 --> 00:23:22,550 Let me give you some examples of subtree properties. 455 00:23:27,980 --> 00:23:35,060 They could be-- common ones are, like, the sum, or the product, 456 00:23:35,060 --> 00:23:42,290 or the min, or the max, or sum of squares, 457 00:23:42,290 --> 00:23:50,090 or all sorts of things, of some feature 458 00:23:50,090 --> 00:23:52,055 of every node in the subtree. 459 00:23:57,860 --> 00:24:01,310 In fact, subtree size is an example of such a thing. 460 00:24:01,310 --> 00:24:05,990 It is the sum over all nodes in the subtree of the value 1. 461 00:24:05,990 --> 00:24:08,510 It's another way to say count the number of nodes. 462 00:24:08,510 --> 00:24:10,460 But you could also say, what's the sum 463 00:24:10,460 --> 00:24:11,697 of the keys in these nodes? 464 00:24:11,697 --> 00:24:14,030 Or you could say, what's the maximum key in these nodes? 465 00:24:14,030 --> 00:24:21,737 Or you could say, what is the maximum value in these nodes? 466 00:24:21,737 --> 00:24:22,820 You can take any property. 467 00:24:22,820 --> 00:24:23,900 It doesn't have to be the key. 468 00:24:23,900 --> 00:24:25,775 It doesn't have to be anything in particular. 469 00:24:25,775 --> 00:24:26,730 It's very powerful.
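All of these examples share one pattern: a constant-time rule that combines the node's own value with values already stored at its children. A sketch assuming a hypothetical `Node` with a numeric `key`, maintaining size, key sum, key max, and sum of squares at once (the field names are made up for illustration):

```python
def update(node):
    """Recompute every augmented field of `node` from its children in O(1)."""
    kids = [c for c in (node.left, node.right) if c is not None]
    node.size = 1 + sum(c.size for c in kids)                   # node count
    node.key_sum = node.key + sum(c.key_sum for c in kids)      # sum of keys
    node.key_max = max([node.key] + [c.key_max for c in kids])  # max key
    node.sum_sq = node.key ** 2 + sum(c.sum_sq for c in kids)   # sum of squares

class Node:
    """Hypothetical node; built bottom-up, so children are already correct."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right
        update(self)

root = Node(5, Node(2, Node(1)), Node(9))
print(root.size, root.key_sum, root.key_max, root.sum_sq)  # 4 17 9 111
```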
470 00:24:26,730 --> 00:24:28,808 You can take all sums, products and maintain them 471 00:24:28,808 --> 00:24:30,350 as long as they're downward looking-- 472 00:24:30,350 --> 00:24:33,890 as long as you're only thinking about the subtree. 473 00:24:33,890 --> 00:24:38,860 Some examples of things you cannot maintain are-- 474 00:24:41,880 --> 00:24:45,345 not a node's index. 475 00:24:48,480 --> 00:24:51,750 So if you get a little bit too excited about augmentation, 476 00:24:51,750 --> 00:24:54,150 you might think, oh, I could do everything. 477 00:24:54,150 --> 00:24:56,850 If I needed to support subtree_at, or let's 478 00:24:56,850 --> 00:24:59,370 just say, get_at globally-- I wanted 479 00:24:59,370 --> 00:25:01,500 to know what is the ith node in my tree-- 480 00:25:01,500 --> 00:25:03,690 well, I'll just use data structure augmentation 481 00:25:03,690 --> 00:25:09,240 and store in every node what is its index, 0 through n minus 1. 482 00:25:09,240 --> 00:25:11,370 I can't maintain that efficiently. 483 00:25:11,370 --> 00:25:13,890 Because if I insert at the beginning 484 00:25:13,890 --> 00:25:18,340 of my traversal order, then all the indices change. 485 00:25:18,340 --> 00:25:20,910 So that's an example of an edit. 486 00:25:20,910 --> 00:25:23,530 So if I insert a new node over here, 487 00:25:23,530 --> 00:25:25,650 then this guy's index was 0, now it's 1. 488 00:25:25,650 --> 00:25:27,780 This guy's index was 1, now it's 2. 489 00:25:27,780 --> 00:25:29,430 This was 2, now it's 3, and so on. 490 00:25:29,430 --> 00:25:31,380 Every node changes its index. 491 00:25:31,380 --> 00:25:33,750 Index is not a subtree property, and that's 492 00:25:33,750 --> 00:25:35,280 why we can't maintain it. 493 00:25:35,280 --> 00:25:37,497 Because it depends on all of the nodes in the tree. 494 00:25:37,497 --> 00:25:39,330 Or it depends on all the nodes to its left-- 495 00:25:39,330 --> 00:25:40,900 all the predecessors.
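A small experiment contrasting the two, with hypothetical helper names (`build`, `insert_first`, `subtree_sizes`): build a perfect 15-node tree, insert a new first element as a leaf, and count how many existing nodes see their stored index change versus their subtree size change. Both quantities are recomputed from scratch here purely for comparison.

```python
class Node:
    """Hypothetical bare node: no augmentation stored."""
    def __init__(self, item):
        self.item, self.left, self.right = item, None, None

def build(items, lo, hi):
    """Balanced tree whose traversal order is items[lo:hi]."""
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    node = Node(items[mid])
    node.left = build(items, lo, mid)
    node.right = build(items, mid + 1, hi)
    return node

def traversal(node):
    if node is None:
        return []
    return traversal(node.left) + [node] + traversal(node.right)

def subtree_sizes(root):
    """Subtree size of every node, keyed by node object."""
    sizes = {}
    def walk(node):
        if node is None:
            return 0
        sizes[node] = walk(node.left) + walk(node.right) + 1
        return sizes[node]
    walk(root)
    return sizes

def insert_first(root, item):
    """New leaf at the front of traversal order: left child of the leftmost node."""
    node = root
    while node.left is not None:
        node = node.left
    node.left = Node(item)

root = build(list(range(15)), 0, 15)
old_index = {n: i for i, n in enumerate(traversal(root))}
old_size = subtree_sizes(root)
insert_first(root, -1)
new_index = {n: i for i, n in enumerate(traversal(root))}
new_size = subtree_sizes(root)

index_changed = sum(old_index[n] != new_index[n] for n in old_index)
size_changed = sum(old_size[n] != new_size[n] for n in old_size)
print(index_changed, size_changed)  # 15 4
```

Every one of the 15 old indices shifts, while only the 4 ancestors of the new leaf change size.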
496 00:25:40,900 --> 00:25:43,440 So for example, this guy's index depends 497 00:25:43,440 --> 00:25:46,140 on how many nodes are over here on the left, which is not 498 00:25:46,140 --> 00:25:47,980 in the subtree of that node. 499 00:25:47,980 --> 00:25:50,400 So that's where you have to be careful. 500 00:25:50,400 --> 00:25:52,630 Don't use global properties of the tree. 501 00:25:52,630 --> 00:25:57,760 You can only use subtree properties. 502 00:25:57,760 --> 00:26:01,755 Another example is depth. 503 00:26:01,755 --> 00:26:08,220 Depth is annoying to maintain, but it's not obvious why yet. 504 00:26:08,220 --> 00:26:09,780 We will see that in a moment. 505 00:26:14,610 --> 00:26:20,220 The rest of today is about going from order h to order log 506 00:26:20,220 --> 00:26:23,770 n, which is what this slide is showing us. 507 00:26:23,770 --> 00:26:26,460 So at this point, you should believe that we can do all 508 00:26:26,460 --> 00:26:30,180 of the sequence data structure operations in order h time-- 509 00:26:30,180 --> 00:26:33,130 except for build and iterate, which take linear time-- 510 00:26:33,130 --> 00:26:36,750 and that we can do all of the set operations in order h time, 511 00:26:36,750 --> 00:26:38,460 except build and iterate, which take 512 00:26:38,460 --> 00:26:40,620 n log n and n respectively. 513 00:26:40,620 --> 00:26:50,790 And our goal is to now bound h by log n. 514 00:26:50,790 --> 00:26:53,670 We know it's possible at some level, 515 00:26:53,670 --> 00:26:57,450 because there are trees that have logarithmic height. 516 00:26:57,450 --> 00:27:00,565 That's like this perfect tree here. 517 00:27:00,565 --> 00:27:02,190 But we also know we have to be careful, 518 00:27:02,190 --> 00:27:04,320 because there are some bad trees, like this chain. 519 00:27:07,170 --> 00:27:15,140 So if h equals order log n, we call this a balanced binary tree.
520 00:27:19,430 --> 00:27:21,200 There are many balanced binary trees 521 00:27:21,200 --> 00:27:24,302 in the world, maybe a dozen or two-- 522 00:27:24,302 --> 00:27:25,760 a lot of different data structures. 523 00:27:25,760 --> 00:27:26,260 Question? 524 00:27:26,260 --> 00:27:28,825 AUDIENCE: [INAUDIBLE] you said not 525 00:27:28,825 --> 00:27:30,933 to think about things on a global level 526 00:27:30,933 --> 00:27:32,433 so we'll think of them [INAUDIBLE]. 527 00:27:32,433 --> 00:27:34,382 Can you explain what that means a little more? 528 00:27:34,382 --> 00:27:35,840 ERIK DEMAINE: OK, what does it mean 529 00:27:35,840 --> 00:27:40,520 for a property to be local to a subtree versus global? 530 00:27:40,520 --> 00:27:43,075 The best answer is this definition. 531 00:27:43,075 --> 00:27:45,200 But that's maybe not the most intuitive definition. 532 00:27:45,200 --> 00:27:46,033 This is what I mean. 533 00:27:46,033 --> 00:27:48,782 Something that can be computed just knowing information 534 00:27:48,782 --> 00:27:50,240 about your left and right children, 535 00:27:50,240 --> 00:27:52,220 that is the meaning of such a property. 536 00:27:52,220 --> 00:27:54,595 And those are the only things you're allowed to maintain. 537 00:27:54,595 --> 00:27:57,320 Because those are the only things 538 00:27:57,320 --> 00:28:00,500 that are easy to update by walking up this path. 539 00:28:00,500 --> 00:28:06,050 And in contrast, a global property like index-- 540 00:28:06,050 --> 00:28:08,000 it's global, in particular, because I 541 00:28:08,000 --> 00:28:12,140 can do one change, add one node, and all of the nodes' 542 00:28:12,140 --> 00:28:13,030 properties change. 543 00:28:13,030 --> 00:28:15,470 So that's an extreme example of global. 544 00:28:15,470 --> 00:28:20,180 We want this very particular notion of local, 545 00:28:20,180 --> 00:28:23,570 because that's what we can actually afford to recompute.
546 00:28:23,570 --> 00:28:24,920 Hopefully that clarifies. 547 00:28:24,920 --> 00:28:27,008 Yeah? 548 00:28:27,008 --> 00:28:31,630 AUDIENCE: Doesn't size not work with that [INAUDIBLE]? 549 00:28:31,630 --> 00:28:33,700 ERIK DEMAINE: You're right that if we added-- 550 00:28:33,700 --> 00:28:34,255 oh, no. 551 00:28:34,255 --> 00:28:35,380 OK, let's think about that. 552 00:28:35,380 --> 00:28:37,340 If we added a new parent to the tree-- 553 00:28:37,340 --> 00:28:40,600 this is not something that we've ever done. 554 00:28:40,600 --> 00:28:44,500 But even if we did that, which subtrees change? 555 00:28:44,500 --> 00:28:47,740 Only this one. 556 00:28:47,740 --> 00:28:50,410 This node, it's a totally new subtree. 557 00:28:50,410 --> 00:28:52,780 But the subtree of this node is completely unchanged. 558 00:28:52,780 --> 00:28:54,655 Because subtrees are always downward looking, 559 00:28:54,655 --> 00:28:58,270 if I added a new root, I didn't change any subtrees 560 00:28:58,270 --> 00:28:59,890 except for one. 561 00:28:59,890 --> 00:29:01,790 So size is a subtree property. 562 00:29:01,790 --> 00:29:03,490 Now, there are-- 563 00:29:03,490 --> 00:29:05,510 I mean, I could completely redraw the tree. 564 00:29:05,510 --> 00:29:07,510 And that's an operation that requires everything 565 00:29:07,510 --> 00:29:08,980 to be recomputed. 566 00:29:08,980 --> 00:29:11,740 So it does limit exactly what I'm allowed to do in the tree. 567 00:29:11,740 --> 00:29:13,960 But I claim everything we'll do, last class 568 00:29:13,960 --> 00:29:18,410 and today, we can afford this augmentation. 569 00:29:18,410 --> 00:29:21,100 So it's a feature, not of all binary trees necessarily, 570 00:29:21,100 --> 00:29:22,600 but of the ones that we will cover. 571 00:29:22,600 --> 00:29:23,100 Yeah? 572 00:29:23,100 --> 00:29:24,395 AUDIENCE: What is a min? 573 00:29:24,395 --> 00:29:25,562 ERIK DEMAINE: What is a min?
574 00:29:25,562 --> 00:29:27,370 AUDIENCE: [INAUDIBLE] 575 00:29:27,370 --> 00:29:31,360 ERIK DEMAINE: Binary tree, yeah. 576 00:29:31,360 --> 00:29:35,260 OK, this will make a little more sense in a moment 577 00:29:35,260 --> 00:29:38,100 when I say what we're actually going to do with trees. 578 00:29:49,060 --> 00:29:53,830 We need a new tool for manipulating a tree. 579 00:29:53,830 --> 00:29:56,530 What we've done so far, we've done some swapping of items. 580 00:29:56,530 --> 00:29:59,110 And we did adding and removing a leaf. 581 00:29:59,110 --> 00:30:00,010 That's not enough. 582 00:30:00,010 --> 00:30:01,990 We're going to need something else to let us 583 00:30:01,990 --> 00:30:05,320 guarantee logarithmic height. 584 00:30:05,320 --> 00:30:07,900 And that something else is called a rotation. 585 00:30:12,050 --> 00:30:15,350 What does this something else need to do? 586 00:30:15,350 --> 00:30:18,470 This is just a tool for rebalancing the tree. 587 00:30:18,470 --> 00:30:22,280 So it should not change the data that's represented by the tree. 588 00:30:22,280 --> 00:30:24,320 What is the data represented by the tree? 589 00:30:24,320 --> 00:30:25,940 Traversal order. 590 00:30:25,940 --> 00:30:27,660 Traversal order is sacrosanct. 591 00:30:27,660 --> 00:30:28,910 We're not allowed to touch it. 592 00:30:28,910 --> 00:30:31,250 It's already defined two different ways, 593 00:30:31,250 --> 00:30:34,410 depending on whether you're using a set or sequence. 594 00:30:34,410 --> 00:30:37,130 So we want to modify the tree in a way that doesn't 595 00:30:37,130 --> 00:30:39,380 modify the traversal order. 596 00:30:39,380 --> 00:30:41,510 So we're exploiting redundancy. 597 00:30:41,510 --> 00:30:43,820 If you wrote down the traversal order in an array, 598 00:30:43,820 --> 00:30:45,590 there's exactly one representation 599 00:30:45,590 --> 00:30:46,940 of a given order. 600 00:30:46,940 --> 00:30:49,340 But in a tree, there's many representations. 
601 00:30:49,340 --> 00:30:50,840 It could be a long line. 602 00:30:50,840 --> 00:30:52,520 It could be a balanced thing. 603 00:30:52,520 --> 00:30:54,770 They could represent the exact same order on the nodes 604 00:30:54,770 --> 00:30:56,600 if you label them right. 605 00:30:56,600 --> 00:30:59,300 In fact, there are exponentially many different representations 606 00:30:59,300 --> 00:31:00,050 of the same thing. 607 00:31:00,050 --> 00:31:04,880 And we're going to exploit that, the same order, and define-- 608 00:31:04,880 --> 00:31:06,560 this is just a thing you need to know. 609 00:31:20,364 --> 00:31:27,820 Let me label, A, x, B, y, C. You can tell I've 610 00:31:27,820 --> 00:31:31,570 drawn this diagram before many, many times. 611 00:31:31,570 --> 00:31:35,060 This is a very powerful tool in all tree data structures, 612 00:31:35,060 --> 00:31:37,060 which are most data structures. 613 00:31:37,060 --> 00:31:46,940 And they are called right rotate of y and left rotate of x. 614 00:31:53,860 --> 00:31:55,730 So if I have this tree-- 615 00:31:55,730 --> 00:31:58,150 in which I'm just black boxing some of the subtrees 616 00:31:58,150 --> 00:31:59,410 into little triangles. 617 00:31:59,410 --> 00:32:02,860 If I have a node, and it has a left child, 618 00:32:02,860 --> 00:32:05,590 then I'm allowed to right rotate this edge, which means take 619 00:32:05,590 --> 00:32:07,000 this edge and go like this-- 620 00:32:07,000 --> 00:32:09,058 90 degrees, kind of. 621 00:32:09,058 --> 00:32:11,350 Or you could just think of it as rewriting it this way. 622 00:32:11,350 --> 00:32:14,290 Now, if you're also keeping track of the parent pointer, 623 00:32:14,290 --> 00:32:16,720 the parent pointer moves around. 624 00:32:16,720 --> 00:32:18,610 Before, this was the parent of y. 625 00:32:18,610 --> 00:32:20,980 Now it's the parent of x. 626 00:32:20,980 --> 00:32:23,980 So x and y are switching places.
627 00:32:23,980 --> 00:32:27,610 But we couldn't just swap these items around, 628 00:32:27,610 --> 00:32:29,980 because that would change traversal order. 629 00:32:29,980 --> 00:32:32,440 In this picture, x comes before y in traversal order, 630 00:32:32,440 --> 00:32:36,850 because x is in the left subtree of y. 631 00:32:36,850 --> 00:32:40,000 And over here, now y is in the right subtree of x. 632 00:32:40,000 --> 00:32:41,350 So it comes after x. 633 00:32:41,350 --> 00:32:43,600 So in both cases, x comes before y. 634 00:32:43,600 --> 00:32:48,610 And indeed, in all of these pictures the traversal order-- 635 00:32:48,610 --> 00:32:51,520 I mean, not just for x and y, but also for A, B, and C, 636 00:32:51,520 --> 00:32:53,170 the traversal order is consistent. 637 00:32:53,170 --> 00:33:01,975 It is A, x, B, y, C, where, when I write a triangle, 638 00:33:01,975 --> 00:33:03,850 I mean recursively the traversal order of all 639 00:33:03,850 --> 00:33:05,578 the things in the triangle. 640 00:33:05,578 --> 00:33:07,870 So if you just apply the traversal order algorithm here 641 00:33:07,870 --> 00:33:09,830 versus here, you get the same output, 642 00:33:09,830 --> 00:33:12,970 which means these operations preserve traversal order. 643 00:33:18,770 --> 00:33:21,380 Great, so this is a thing that we 644 00:33:21,380 --> 00:33:24,890 can do in a tree that won't affect any of the stuff 645 00:33:24,890 --> 00:33:26,330 we've done so far. 646 00:33:26,330 --> 00:33:28,670 It's a tool that we can use to rebalance. 647 00:33:28,670 --> 00:33:34,190 Notice that how deep things are in the tree changes. 648 00:33:34,190 --> 00:33:37,310 Our problem with this linear tree 649 00:33:37,310 --> 00:33:39,200 is that there are some nodes of linear depth. 650 00:33:39,200 --> 00:33:40,440 We want to get rid of those. 651 00:33:40,440 --> 00:33:40,940 How? 652 00:33:40,940 --> 00:33:44,030 Well, we could take these edges and start rotating them up.
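A minimal sketch of both rotations, assuming a hypothetical `Node` with parent pointers. This version relinks the nodes themselves (rather than, say, swapping items so the subtree root object stays put); that is one reasonable choice among several, not necessarily the lecture's implementation. In the docstrings, `y(x(A, B), C)` means node y with left child x (whose subtrees are A and B) and right subtree C.

```python
class Node:
    """Hypothetical node; children passed to the constructor get their parent set."""
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right, self.parent = item, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self

def _relink_parent(old, new):
    """Make old's parent (if any) point at new instead, and set new.parent."""
    parent = old.parent
    new.parent = parent
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new

def right_rotate(y):
    """y(x(A, B), C) -> x(A, y(B, C)). Returns x, the new subtree root."""
    x = y.left
    B = x.right
    _relink_parent(y, x)
    x.right, y.parent = y, x
    y.left = B  # subtree B switches parents; A and C are untouched
    if B is not None:
        B.parent = y
    return x

def left_rotate(x):
    """x(A, y(B, C)) -> y(x(A, B), C). Returns y, the new subtree root."""
    y = x.right
    B = y.left
    _relink_parent(x, y)
    y.left, x.parent = x, y
    x.right = B
    if B is not None:
        B.parent = x
    return y

def traversal(node):
    """Traversal order: left subtree, then node, then right subtree."""
    if node is None:
        return []
    return traversal(node.left) + [node.item] + traversal(node.right)

y = Node("y", Node("x", Node("A"), Node("B")), Node("C"))
p = Node("p", y)                   # a parent above y, to exercise relinking
before = traversal(p)              # ['A', 'x', 'B', 'y', 'C', 'p']
x = right_rotate(y)
assert traversal(p) == before      # traversal order is preserved
assert left_rotate(x) is y         # and left rotation undoes it
```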
653 00:33:44,030 --> 00:33:47,850 If you look at depths, in this picture, 654 00:33:47,850 --> 00:33:51,650 A and B are deeper than C. And in this picture, 655 00:33:51,650 --> 00:33:55,100 B and C are deeper than A. So it's a trade off. 656 00:33:55,100 --> 00:33:57,470 This one moved up. 657 00:33:57,470 --> 00:33:58,490 This one moved down. 658 00:33:58,490 --> 00:34:01,040 This one stayed at the same depth. 659 00:34:01,040 --> 00:34:07,130 So hopefully, if A is too deep and C is too shallow, 660 00:34:07,130 --> 00:34:08,935 they can trade off like this. 661 00:34:08,935 --> 00:34:11,120 It may sound difficult, but in fact, there's 662 00:34:11,120 --> 00:34:18,770 a pretty simple way, called AVL trees, that 663 00:34:18,770 --> 00:34:24,439 maintain balance in a particular way called height balance. 664 00:34:34,730 --> 00:34:42,710 This is if we take the height of node.left-- 665 00:34:46,130 --> 00:34:47,500 actually, I'd prefer to-- 666 00:34:50,080 --> 00:35:03,190 node.right, minus height of node.left, 667 00:35:03,190 --> 00:35:11,050 so this thing is called the skew of the node. 668 00:35:11,050 --> 00:35:15,040 I want this to always be minus 1, 0, or plus 1. 669 00:35:17,890 --> 00:35:21,930 So this is saying that if I have any node, 670 00:35:21,930 --> 00:35:25,372 and I look at its left subtree and its right subtree, 671 00:35:25,372 --> 00:35:26,830 I measure their heights-- remember, 672 00:35:26,830 --> 00:35:30,670 that's downward distance, maximum distance to a leaf-- 673 00:35:30,670 --> 00:35:32,138 I measure the height of this tree-- 674 00:35:32,138 --> 00:35:34,180 maximum height-- and I measure the maximum height 675 00:35:34,180 --> 00:35:39,012 of this subtree, I want these to be within 1 of each other. 676 00:35:39,012 --> 00:35:39,970 Ideally, they're equal. 677 00:35:39,970 --> 00:35:41,740 That would be the perfect case. 678 00:35:41,740 --> 00:35:44,140 But let's let them differ by 1.
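The definitions of height, skew, and height balance can be written directly, as a sketch. It assumes a hypothetical bare `Node` and the convention that an empty subtree has height -1, so a leaf has height 0 when counting edges. Computing height recursively like this costs linear time per call, which is exactly why the lecture later stores it as an augmentation.

```python
class Node:
    """Hypothetical bare binary tree node."""
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def height(node):
    """Longest downward path counting edges; empty subtree is -1 (assumed)."""
    if node is None:
        return -1
    return 1 + max(height(node.left), height(node.right))

def skew(node):
    """height(node.right) - height(node.left)."""
    return height(node.right) - height(node.left)

def is_height_balanced(node):
    """True if every node in the subtree has skew -1, 0, or +1."""
    if node is None:
        return True
    return (skew(node) in (-1, 0, 1)
            and is_height_balanced(node.left)
            and is_height_balanced(node.right))

chain = Node(None, Node(None, Node()))  # a 3-node rightward chain
print(skew(chain), is_height_balanced(chain))  # 2 False
```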
679 00:35:44,140 --> 00:35:48,220 So maybe this is k and this is k plus 1. 680 00:35:48,220 --> 00:35:51,670 Or maybe this is k and this is k minus 1. 681 00:35:51,670 --> 00:35:53,730 In this picture, what is the height of this node? 682 00:35:53,730 --> 00:35:56,812 It's good practice. 683 00:35:56,812 --> 00:35:59,560 k plus 2, good. 684 00:35:59,560 --> 00:36:01,780 What's the longest path from this node to a leaf? 685 00:36:01,780 --> 00:36:03,460 Well, it could go through this subtree. 686 00:36:03,460 --> 00:36:06,160 And that would be length k plus 1, because it's k in here 687 00:36:06,160 --> 00:36:07,570 plus 1 for this edge. 688 00:36:07,570 --> 00:36:10,450 Or it could be through here, and that's k plus 1 plus 1. 689 00:36:10,450 --> 00:36:12,380 So the biggest is to go to the right. 690 00:36:12,380 --> 00:36:14,860 So the height-- if I told you the height of these subtrees, 691 00:36:14,860 --> 00:36:16,443 we can derive the height of this node. 692 00:36:16,443 --> 00:36:19,490 We're going to use that a lot in a moment. 693 00:36:19,490 --> 00:36:21,490 So, the first claim is that if I could 694 00:36:21,490 --> 00:36:26,620 maintain height balance, then I will 695 00:36:26,620 --> 00:36:28,640 guarantee that h equals order log n. 696 00:36:28,640 --> 00:36:32,180 So in other words, height balance implies balance. 697 00:36:32,180 --> 00:36:36,260 So let's prove that first quickly. 698 00:36:36,260 --> 00:36:40,505 And then, the interesting part is how do we actually prove-- 699 00:36:44,690 --> 00:36:47,825 or how do we actually maintain the height balance property? 700 00:36:47,825 --> 00:36:49,450 We're going to do that using rotations. 701 00:36:49,450 --> 00:36:51,490 But how is a big question. 702 00:37:01,720 --> 00:37:07,420 So why does height balance imply balance? 703 00:37:16,550 --> 00:37:21,530 So what this is saying is that all height balanced trees 704 00:37:21,530 --> 00:37:24,810 have logarithmic height.
705 00:37:24,810 --> 00:37:26,390 So what I'd like to think about is 706 00:37:26,390 --> 00:37:30,890 sort of the least balanced height balanced tree. 707 00:37:30,890 --> 00:37:36,070 The least balanced one is going to have a mismatch at every node. 708 00:37:36,070 --> 00:37:39,170 Let's say the left subtree is shallower 709 00:37:39,170 --> 00:37:42,800 than the right subtree by 1, and recursively all the way down. 710 00:37:42,800 --> 00:37:45,650 So every node has a gap here, a-- 711 00:37:49,522 --> 00:37:51,480 what do we call it-- 712 00:37:51,480 --> 00:37:56,520 a skew of 1, which I'm going to write-- 713 00:37:56,520 --> 00:37:58,090 I'm going to introduce some notation. 714 00:37:58,090 --> 00:38:00,360 I'll write a descending rightward arrow to mean this one 715 00:38:00,360 --> 00:38:03,165 is higher than the left subtree. 716 00:38:06,470 --> 00:38:08,270 So the easy way to think about this 717 00:38:08,270 --> 00:38:10,120 is this is sort of our worst case. 718 00:38:10,120 --> 00:38:14,260 This is going to be the fewest nodes for the maximum depth. 719 00:38:14,260 --> 00:38:18,370 Let's just count how many nodes are in this tree. 720 00:38:18,370 --> 00:38:21,310 I'm going to write that as a recurrence, which 721 00:38:21,310 --> 00:38:25,660 is the number of nodes in a tree of height h. 722 00:38:25,660 --> 00:38:32,680 So if this whole tree has height h, as we said in this picture, 723 00:38:32,680 --> 00:38:35,750 if I just subtract 2 from all these numbers, 724 00:38:35,750 --> 00:38:40,270 then this one has height h minus 2, 725 00:38:40,270 --> 00:38:44,020 and this one has height h minus 1. 726 00:38:44,020 --> 00:38:45,833 So how many nodes are in here? 727 00:38:45,833 --> 00:38:47,750 Well, this is a recurrence I'm going to write. 728 00:38:47,750 --> 00:38:52,060 So this will be N sub h minus 2. 729 00:38:52,060 --> 00:38:55,360 This will be N sub h minus 1.
730 00:38:55,360 --> 00:38:57,910 And then I just count how many nodes are in this picture. 731 00:38:57,910 --> 00:39:06,100 It is Nh minus 1 plus Nh minus 2 plus 1, for this node. 732 00:39:06,100 --> 00:39:09,520 Now you might ask, what is Nh a recurrence for? 733 00:39:09,520 --> 00:39:15,760 Well, it is the number of nodes in this sort of worst case 734 00:39:15,760 --> 00:39:18,460 if the worst case has total height h. 735 00:39:18,460 --> 00:39:20,350 So you can also think of it as what 736 00:39:20,350 --> 00:39:22,090 is the minimum number of nodes I could 737 00:39:22,090 --> 00:39:25,780 have in an AVL tree, which is a height balanced tree, that 738 00:39:25,780 --> 00:39:38,610 has height h? 739 00:39:38,610 --> 00:39:40,920 OK, so now I just need to solve this recurrence. 740 00:39:40,920 --> 00:39:42,450 This recurrence looks familiar-ish? 741 00:39:48,050 --> 00:39:49,550 It's like Fibonacci numbers. 742 00:39:49,550 --> 00:39:52,130 If I remove the plus 1, it's Fibonacci. 743 00:39:52,130 --> 00:39:54,320 And if you happen to know the Fibonacci numbers grow 744 00:39:54,320 --> 00:39:56,690 as, like, a golden ratio to the n, 745 00:39:56,690 --> 00:39:58,490 then we know that this is exponential, 746 00:39:58,490 --> 00:39:59,540 which is what we want. 747 00:39:59,540 --> 00:40:02,690 Because if Nh is exponential in h, 748 00:40:02,690 --> 00:40:04,670 that means h is logarithmic in N, 749 00:40:04,670 --> 00:40:06,618 because log is inverse of exponential. 750 00:40:06,618 --> 00:40:08,660 But maybe you don't know about Fibonacci numbers. 751 00:40:08,660 --> 00:40:14,300 And so we can still easily show that this is exponential 752 00:40:14,300 --> 00:40:15,960 as follows. 753 00:40:15,960 --> 00:40:18,260 I want to prove that it's at least an exponential, 754 00:40:18,260 --> 00:40:22,830 because that gives me that h is at most logarithmic. 755 00:40:22,830 --> 00:40:24,183 So we need a lower bound.
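A quick numerical check of the recurrence, as a sketch. It uses the lecture's base case N_0 = 1 plus N_1 = 2 (a root with a single child), which is an assumption added here. It confirms the Fibonacci connection (in fact N_h = F_{h+3} - 1 with F_1 = F_2 = 1), the exponential lower bound 2^(h/2), and that the "every node leans right by exactly 1" tree really has N_h nodes. All function names are hypothetical.

```python
def min_avl_nodes(h):
    """N_h = N_{h-1} + N_{h-2} + 1, with N_0 = 1 and N_1 = 2 (assumed)."""
    if h == 0:
        return 1
    prev, cur = 1, 2  # N_0, N_1
    for _ in range(h - 1):
        prev, cur = cur, cur + prev + 1
    return cur

def fib(n):
    """Fibonacci numbers with fib(1) = fib(2) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def sparsest(h):
    """Worst-case tree as nested (left, right) tuples; None is empty.
    Left subtree has height h - 2, right subtree height h - 1."""
    if h < 0:
        return None
    return (sparsest(h - 2), sparsest(h - 1))

def count(t):
    return 0 if t is None else 1 + count(t[0]) + count(t[1])

print([min_avl_nodes(h) for h in range(6)])  # [1, 2, 4, 7, 12, 20]
```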
756 00:40:24,183 --> 00:40:26,600 And so we have these two terms which are hard to compare-- 757 00:40:26,600 --> 00:40:28,850 Nh minus 1 and Nh minus 2. 758 00:40:28,850 --> 00:40:30,212 It's kind of ugly. 759 00:40:30,212 --> 00:40:31,670 But if we're allowed to be sloppy-- 760 00:40:31,670 --> 00:40:33,460 and we'll see if we're not too sloppy-- 761 00:40:33,460 --> 00:40:35,570 and still get an exponential answer, 762 00:40:35,570 --> 00:40:40,115 let's just make them equal like so. 763 00:40:44,150 --> 00:40:48,140 So this is a true statement, in fact, strictly greater than. 764 00:40:48,140 --> 00:40:48,680 Why? 765 00:40:48,680 --> 00:40:50,420 Because I removed the plus 1. 766 00:40:50,420 --> 00:40:52,310 That should only make something smaller. 767 00:40:52,310 --> 00:40:56,600 And I replaced Nh minus 1 with Nh minus 2. 768 00:40:56,600 --> 00:40:58,310 Here, I'm implicitly using a fact, 769 00:40:58,310 --> 00:41:03,170 which is obvious by induction, that this tree on height-- 770 00:41:03,170 --> 00:41:05,270 if I take this tree versus this tree, 771 00:41:05,270 --> 00:41:07,850 this one has more nodes than this one. 772 00:41:07,850 --> 00:41:10,130 If I have larger height, this construction 773 00:41:10,130 --> 00:41:13,250 is going to build a bigger tree, at least as big. 774 00:41:13,250 --> 00:41:15,710 It doesn't even need to be strictly bigger. 775 00:41:15,710 --> 00:41:18,140 So certainly, Nh minus 1 is greater than or equal 776 00:41:18,140 --> 00:41:20,060 to Nh minus 2. 777 00:41:20,060 --> 00:41:24,230 Now, this is 2 times Nh minus 2. 778 00:41:24,230 --> 00:41:25,710 And this is an easy recurrence. 779 00:41:25,710 --> 00:41:27,920 This is just powers of 2. 780 00:41:27,920 --> 00:41:31,460 I keep multiplying by 2, and subtracting 2 from h. 781 00:41:31,460 --> 00:41:35,150 So this solves to 2 to the h over 2, 782 00:41:35,150 --> 00:41:37,880 maybe with a floor or something. 
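The board argument, written out as one derivation. The first inequality is strict for h at least 2, as stated above. The base case N_1 = 2 (a root with one child) is added here so that odd h is covered too; it is what turns the floor into the clean bound 2^(h/2).

```latex
\begin{aligned}
N_h &= N_{h-1} + N_{h-2} + 1 \\
    &> 2\,N_{h-2}
      && \text{drop the } {+}1 \text{ and use } N_{h-1} \ge N_{h-2} \\
    &\ge 2^{\lfloor h/2 \rfloor}\, N_{h \bmod 2}
      && \text{repeat } \lfloor h/2 \rfloor \text{ times} \\
    &\ge 2^{h/2}
      && \text{since } N_0 = 1,\ N_1 = 2 \\[4pt]
n \ge N_h \ge 2^{h/2} \;&\Longrightarrow\; h \le 2 \log_2 n
\end{aligned}
```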
783 00:41:37,880 --> 00:41:43,490 But I'm using a base case here, which is N sub 0 equals 1. 784 00:41:46,187 --> 00:41:47,270 Maybe it's a ceiling then. 785 00:41:47,270 --> 00:41:49,380 But the point is this is exponential. 786 00:41:49,380 --> 00:41:53,810 So this implies that the height is always, at most, 2 times 787 00:41:53,810 --> 00:41:55,730 log n. 788 00:41:55,730 --> 00:41:57,810 This 2 corresponds to this 2. 789 00:41:57,810 --> 00:41:59,750 If you just invert this formula, this 790 00:41:59,750 --> 00:42:04,500 was the number of nodes is going to be at least 2 791 00:42:04,500 --> 00:42:05,600 to the h over 2. 792 00:42:05,600 --> 00:42:07,710 And so h is, at most, 2 log n. 793 00:42:07,710 --> 00:42:08,913 So it's not log n. 794 00:42:08,913 --> 00:42:09,830 That would be perfect. 795 00:42:09,830 --> 00:42:12,020 But it's within a factor of 2 of log n. 796 00:42:12,020 --> 00:42:15,510 So AVL trees are always quite balanced. 797 00:42:15,510 --> 00:42:17,090 The number of levels is at most double 798 00:42:17,090 --> 00:42:19,710 what you need to store n nodes. 799 00:42:19,710 --> 00:42:20,210 Great. 800 00:42:23,980 --> 00:42:27,730 We're left with the main magic-- 801 00:42:27,730 --> 00:42:28,780 not Demaine magic. 802 00:42:28,780 --> 00:42:31,060 That's different. 803 00:42:31,060 --> 00:42:34,180 And let's see, we're going to use subtree augmentation. 804 00:42:37,550 --> 00:42:38,150 Keep that. 805 00:42:45,510 --> 00:42:48,330 Big remaining challenge is how do we 806 00:42:48,330 --> 00:42:51,240 maintain this height balance property using rotations? 807 00:42:51,240 --> 00:42:54,190 We have all the ingredients lined up for us. 808 00:42:54,190 --> 00:42:56,670 We have subtree augmentation. 809 00:42:56,670 --> 00:42:57,810 What does that let me do? 810 00:43:00,820 --> 00:43:02,710 It's relevant to AVL trees. 811 00:43:02,710 --> 00:43:06,880 Well, it lets me store height.
812 00:43:06,880 --> 00:43:11,230 I need to be able to compute the height of a node. 813 00:43:11,230 --> 00:43:13,090 That, in general, takes linear time, 814 00:43:13,090 --> 00:43:15,173 because I have to look at all the downward paths-- 815 00:43:15,173 --> 00:43:16,660 all the leaves within that subtree. 816 00:43:16,660 --> 00:43:20,520 But height is a subtree property-- 817 00:43:20,520 --> 00:43:27,470 so, yes-- height. 818 00:43:27,470 --> 00:43:28,920 Why? 819 00:43:28,920 --> 00:43:33,670 Because-- let me just write it here-- 820 00:43:33,670 --> 00:43:49,140 node.height equals 1 plus max of node.left.height 821 00:43:49,140 --> 00:43:57,150 and node.right.height, end of max. 822 00:43:57,150 --> 00:44:00,390 Let me put this in a box. 823 00:44:00,390 --> 00:44:05,550 This equation, or I guess it's an assignment operation-- 824 00:44:05,550 --> 00:44:08,267 this is a 1-- 825 00:44:08,267 --> 00:44:10,100 is the thing we've been doing over and over. 826 00:44:10,100 --> 00:44:11,660 When I said what is the height of this node, 827 00:44:11,660 --> 00:44:12,890 you just figured that out, right? 828 00:44:12,890 --> 00:44:14,495 You took the height of the left subtree maxed 829 00:44:14,495 --> 00:44:15,995 with the height of the right subtree 830 00:44:15,995 --> 00:44:19,250 and added 1 to account for these edges. 831 00:44:19,250 --> 00:44:20,930 So this is a general update rule. 832 00:44:20,930 --> 00:44:23,540 It matches this subtree property pattern. 833 00:44:23,540 --> 00:44:25,280 If I have the property of left and right, 834 00:44:25,280 --> 00:44:27,800 I can compute it for the node. 835 00:44:27,800 --> 00:44:29,570 And this takes constant time to do. 836 00:44:29,570 --> 00:44:30,900 And so it's a subtree property. 837 00:44:30,900 --> 00:44:33,388 And so I can maintain, through all the things I'm doing, 838 00:44:33,388 --> 00:44:34,430 the height of every node.
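The boxed rule as code, in a minimal sketch assuming a hypothetical `Node` with a stored `height` field and parent pointers, and the convention that an empty subtree has height -1. Once heights are stored, skew becomes an O(1) lookup.

```python
class Node:
    """Hypothetical node storing its height as an augmentation."""
    def __init__(self, item, parent=None):
        self.item = item
        self.left = self.right = None
        self.parent = parent
        self.height = 0  # a new leaf has height 0

def height(node):
    """Stored height in O(1); an empty subtree counts as -1 (assumed)."""
    return node.height if node is not None else -1

def subtree_update(node):
    """The boxed rule: node.height = 1 + max(left.height, right.height)."""
    node.height = 1 + max(height(node.left), height(node.right))

def skew(node):
    return height(node.right) - height(node.left)

def add_leaf(parent, item, side):
    """Attach a leaf, then re-apply the rule on each ancestor, bottom-up."""
    leaf = Node(item, parent)
    setattr(parent, side, leaf)
    node = parent
    while node is not None:
        subtree_update(node)
        node = node.parent
    return leaf

root = Node(0)
a = add_leaf(root, 1, "right")
add_leaf(a, 2, "right")
print(root.height, skew(root))  # 2 2
```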
839 00:44:34,430 --> 00:44:37,800 Oh by the way, whenever I do a rotation, 840 00:44:37,800 --> 00:44:41,180 I'm also going to have to update my subtree properties. 841 00:44:41,180 --> 00:44:45,720 When I rotate this edge, A does not change, B does not change, 842 00:44:45,720 --> 00:44:46,710 C does not change. 843 00:44:46,710 --> 00:44:48,080 So that's good. 844 00:44:48,080 --> 00:44:49,685 But x's subtree changes. 845 00:44:49,685 --> 00:44:50,840 It now has y. 846 00:44:50,840 --> 00:44:52,320 It didn't before. 847 00:44:52,320 --> 00:44:56,620 So we're going to have to also update the augmentation here 848 00:44:56,620 --> 00:44:58,970 in y. 849 00:44:58,970 --> 00:45:03,550 And we're going to have to update the augmentation in x. 850 00:45:03,550 --> 00:45:04,940 And we're going to have to update 851 00:45:04,940 --> 00:45:10,280 the augmentation of all of the ancestors of x eventually. 852 00:45:10,280 --> 00:45:12,660 So rotation is locally just changing 853 00:45:12,660 --> 00:45:13,910 a constant number of pointers. 854 00:45:13,910 --> 00:45:18,050 So I usually think of rotations as taking constant time. 855 00:45:18,050 --> 00:45:20,835 But eventually, we will have to do-- 856 00:45:20,835 --> 00:45:22,085 this is constant time locally. 857 00:45:25,250 --> 00:45:35,720 But we will need to update h ancestors in order 858 00:45:35,720 --> 00:45:39,162 to store all of-- keep all of our augmentations up to date. 859 00:45:39,162 --> 00:45:40,370 We'll worry about that later. 860 00:45:43,380 --> 00:45:44,250 All right, so great. 861 00:45:44,250 --> 00:45:45,917 Now we have the height of all the nodes. 862 00:45:45,917 --> 00:45:50,860 We can compute the skew of all the nodes, cool. 863 00:45:50,860 --> 00:45:52,570 We have this rotation operation. 864 00:45:52,570 --> 00:45:58,320 And we want to maintain this height balance property. 
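A sketch of a right rotation that also refreshes the height augmentation, assuming a hypothetical `Node` with parent pointers and a stored `height` (empty subtree taken as -1). The order matters: after rotating, y sits below x, so y is recomputed first, then x, then the O(h) ancestors of x.

```python
def h(node):
    """Stored height; an empty subtree counts as -1 (assumed convention)."""
    return node.height if node is not None else -1

def update(node):
    node.height = 1 + max(h(node.left), h(node.right))

class Node:
    """Hypothetical height-augmented node with parent pointers."""
    def __init__(self, item, left=None, right=None):
        self.item, self.left, self.right, self.parent = item, left, right, None
        for child in (left, right):
            if child is not None:
                child.parent = self
        update(self)

def right_rotate(y):
    """y(x(A, B), C) -> x(A, y(B, C)); pointers in O(1), heights in O(h)."""
    x, p = y.left, y.parent
    B = x.right
    x.parent = p  # x takes y's place under p
    if p is not None:
        if p.left is y:
            p.left = x
        else:
            p.right = x
    x.right, y.parent = y, x
    y.left = B
    if B is not None:
        B.parent = y
    update(y)  # y is now the lower node: fix it first
    update(x)  # then x, whose right child is y
    node = p
    while node is not None:  # the ancestors of x, bottom-up
        update(node)
        node = node.parent
    return x

p = Node("p", Node("y", Node("x", Node("A"))))  # a left-leaning chain
right_rotate(p.left)
print(p.left.item, p.height)  # x 2
```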
865 00:45:58,320 --> 00:46:02,130 The skew-- height of right minus height of left, at every node-- 866 00:46:02,130 --> 00:46:03,630 is plus or minus 1, or 0. 867 00:46:06,210 --> 00:46:13,740 Cool, so I said over here somewhere, whenever we-- 868 00:46:13,740 --> 00:46:15,365 so the only things that change the tree 869 00:46:15,365 --> 00:46:17,630 are when we insert or delete a new node. 870 00:46:17,630 --> 00:46:20,210 And the way that we implemented those so far 871 00:46:20,210 --> 00:46:21,950 is to add or remove a leaf. 872 00:46:21,950 --> 00:46:24,660 So we should still be thinking about adding or removing 873 00:46:24,660 --> 00:46:25,160 a leaf. 874 00:46:25,160 --> 00:46:27,410 The problem is, when I add a new leaf, 875 00:46:27,410 --> 00:46:31,320 now maybe this tree is higher than it used to be. 876 00:46:31,320 --> 00:46:35,270 So some node here may no longer be height balanced. 877 00:46:35,270 --> 00:46:37,220 But because height is a subtree property, 878 00:46:37,220 --> 00:46:39,590 the only nodes we need to check are 879 00:46:39,590 --> 00:46:42,330 the ones up this ancestor path. 880 00:46:42,330 --> 00:46:44,900 And there's only order log n of them, because now height is order log n. 881 00:46:44,900 --> 00:46:47,660 That's what we just proved as long as we have this property. 882 00:46:47,660 --> 00:46:50,450 Now, we right now don't have it for, like, 883 00:46:50,450 --> 00:46:51,620 maybe these few nodes. 884 00:46:51,620 --> 00:46:53,570 But it was log n before. 885 00:46:53,570 --> 00:46:54,470 It's at most log n-- 886 00:46:54,470 --> 00:46:58,250 2 log n plus 1 right now, because we just added a node. 887 00:46:58,250 --> 00:47:00,920 So what I want to do is check all of these ancestor 888 00:47:00,920 --> 00:47:04,310 nodes in sequence from bottom up, and find 889 00:47:04,310 --> 00:47:06,550 one that's out of balance. 890 00:47:06,550 --> 00:47:15,710 So let's take the lowest out of balance node.
891 00:47:21,540 --> 00:47:24,860 I'm going to call that x. 892 00:47:24,860 --> 00:47:28,370 Now, because we just inserted or deleted a single leaf, 893 00:47:28,370 --> 00:47:32,090 it's only out of balance by 1, because we only 894 00:47:32,090 --> 00:47:33,590 changed height-- 895 00:47:33,590 --> 00:47:36,710 one height went up by 1, or one height went down by 1. 896 00:47:36,710 --> 00:47:40,730 And before, all of our skews were plus or minus 1, or 0. 897 00:47:40,730 --> 00:47:44,223 So now, it's-- the bad case is when it's plus or minus 2. 898 00:47:44,223 --> 00:47:46,640 If it happens to still be in this range for all the nodes, 899 00:47:46,640 --> 00:47:47,510 we're happy. 900 00:47:47,510 --> 00:47:51,080 But if it's outside this range, it's only going to be out by 1. 901 00:47:51,080 --> 00:47:58,550 So this means the skew is plus 2 or minus 2. 902 00:47:58,550 --> 00:48:01,910 And let's say that it's 2 by symmetry. 903 00:48:01,910 --> 00:48:06,098 So my picture is-- 904 00:48:06,098 --> 00:48:10,010 I'm going to draw a double right arrow 905 00:48:10,010 --> 00:48:18,590 to say that this subtree is 2 higher than this subtree. 906 00:48:18,590 --> 00:48:21,320 OK, so that's bad and we want to fix it. 907 00:48:21,320 --> 00:48:25,265 The obvious thing to do is to rotate this edge. 908 00:48:25,265 --> 00:48:26,390 Because that'll make this-- 909 00:48:29,030 --> 00:48:31,590 this is too high and this is too low. 910 00:48:31,590 --> 00:48:33,410 So if we rotate, this should go down by 1 911 00:48:33,410 --> 00:48:34,550 and this should go up by 1. 912 00:48:34,550 --> 00:48:38,070 And that works most of the time. 913 00:48:38,070 --> 00:48:45,200 So case one is the skew of y. 914 00:48:45,200 --> 00:48:45,860 What is y? 915 00:48:45,860 --> 00:48:50,090 I want y to be the right child of x. 916 00:48:50,090 --> 00:48:54,260 Because we have a positive skew, we know there is a right child.
917 00:48:54,260 --> 00:48:57,120 Now, because this was the lowest bad node, 918 00:48:57,120 --> 00:48:58,770 we know that y is actually good. 919 00:48:58,770 --> 00:49:02,300 It's either right-heavy-- or even the two subtrees 920 00:49:02,300 --> 00:49:04,820 have the same height-- or left-heavy. 921 00:49:04,820 --> 00:49:13,940 The easy cases are when the skew of y 922 00:49:13,940 --> 00:49:22,720 is either 1 or 0, which I will draw. 923 00:49:29,980 --> 00:49:34,630 So a double right arrow, let's say single right arrow-- 924 00:49:40,430 --> 00:49:44,020 so I'm just going to add some labels here 925 00:49:44,020 --> 00:49:48,562 to make this picture consistent-- 926 00:49:48,562 --> 00:49:51,210 k plus 1, k plus 2. 927 00:49:51,210 --> 00:49:52,770 I'm writing the heights. 928 00:49:52,770 --> 00:49:56,010 So this is an example where C is taller 929 00:49:56,010 --> 00:49:58,947 than B. A and B are the same height. 930 00:49:58,947 --> 00:50:00,780 And then if you compute the heights up here, 931 00:50:00,780 --> 00:50:03,480 indeed this one is right-leaning. 932 00:50:03,480 --> 00:50:05,220 This one is doubly right-leaning. 933 00:50:05,220 --> 00:50:06,850 Because this one has height k plus 1. 934 00:50:06,850 --> 00:50:08,100 This one has height k minus 1. 935 00:50:08,100 --> 00:50:08,890 That's bad. 936 00:50:08,890 --> 00:50:11,370 But if we do this left rotation on x, 937 00:50:11,370 --> 00:50:14,100 we get exactly what we want. 938 00:50:18,010 --> 00:50:20,940 So I'm just going to copy the labels on A, B, C-- 939 00:50:20,940 --> 00:50:23,760 we have k minus 1, k minus 1, and k-- 940 00:50:23,760 --> 00:50:24,630 and then recompute. 941 00:50:24,630 --> 00:50:26,400 That means this guy has height k, 942 00:50:26,400 --> 00:50:29,040 this one has height k plus 1. 943 00:50:29,040 --> 00:50:31,653 And now, all the nodes in this picture that I've highlighted-- 944 00:50:31,653 --> 00:50:32,820 A, B, and C haven't changed.
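The height bookkeeping in the two easy cases is plain arithmetic on the three hanging subtrees A, B, and C. Here is a minimal Python check, assuming heights are plain integers. The function name `left_rotation_heights` is hypothetical. It models rotating the edge between x and its right child y, so that y moves up.

```python
def left_rotation_heights(hA, hB, hC):
    """Before: x has subtrees A and y; y has subtrees B and C.
    After rotating the edge x-y: y has subtrees x and C; x has subtrees A and B.
    Returns (height(x), skew(x), height(y), skew(y)) after the rotation."""
    h_x = 1 + max(hA, hB)
    h_y = 1 + max(h_x, hC)
    return h_x, hB - hA, h_y, hC - h_x


k = 10
print(left_rotation_heights(k - 1, k - 1, k))  # skew(y) = +1: (10, 0, 11, 0)
print(left_rotation_heights(k - 1, k, k))      # skew(y) =  0: (11, 1, 12, -1)
```

In both cases every skew lands in {-1, 0, +1}. When skew(y) is +1, the subtree's height drops from k + 2 to k + 1. When skew(y) is 0, it stays at k + 2.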
945 00:50:32,820 --> 00:50:34,195 They were height balanced before. 946 00:50:34,195 --> 00:50:35,400 They still are. 947 00:50:35,400 --> 00:50:36,720 But now, x and y-- 948 00:50:36,720 --> 00:50:39,030 x wasn't height balanced before, y was. 949 00:50:39,030 --> 00:50:42,420 Now, both x and y are height balanced. 950 00:50:42,420 --> 00:50:44,160 That's case one. 951 00:50:44,160 --> 00:50:49,390 In case two, the skew of y is flat, 952 00:50:49,390 --> 00:50:54,870 which means that this is a k, and this is a k, 953 00:50:54,870 --> 00:50:58,570 and this is a k plus 1, and this is a k plus 2. 954 00:50:58,570 --> 00:51:00,743 But still, all the nodes are balanced-- 955 00:51:00,743 --> 00:51:01,410 height balanced. 956 00:51:01,410 --> 00:51:03,090 They're still plus or minus 1. 957 00:51:03,090 --> 00:51:04,440 So those are the easy cases. 958 00:51:04,440 --> 00:51:07,240 Unfortunately, there is a hard case-- 959 00:51:07,240 --> 00:51:08,280 case three. 960 00:51:08,280 --> 00:51:11,280 But there's only one, and it's not that much harder. 961 00:51:16,320 --> 00:51:21,780 So it's when skew of y is minus 1. 962 00:51:21,780 --> 00:51:24,750 In this case, we need to look at the left child of y. 963 00:51:28,590 --> 00:51:31,830 And to be alphabetical, I'm going to rename this to z. 964 00:51:34,470 --> 00:51:36,780 So this one, again, is double right arrow. 965 00:51:36,780 --> 00:51:39,540 This one is now left arrow. 966 00:51:39,540 --> 00:51:41,910 And this is letter y. 967 00:51:41,910 --> 00:51:49,950 And so we have A, B, C, and D potential subtrees hanging off 968 00:51:49,950 --> 00:51:50,550 of them. 969 00:51:50,550 --> 00:51:54,310 And I'm going to label the heights of these things. 970 00:51:54,310 --> 00:51:59,010 These are each k minus 1 or k minus 2. 971 00:51:59,010 --> 00:52:00,420 This one's k minus 1. 972 00:52:00,420 --> 00:52:01,770 And now, compute the inside. 
973 00:52:01,770 --> 00:52:06,690 So this is going to have height k for this to be left-leaning. 974 00:52:06,690 --> 00:52:11,520 So this is k plus 1, and this is k plus 2. 975 00:52:11,520 --> 00:52:13,680 But the problem is this is 2 higher than this. 976 00:52:13,680 --> 00:52:17,370 The height of z is 2 higher than the height of A. 977 00:52:17,370 --> 00:52:18,990 In this case, if I do this rotation, 978 00:52:18,990 --> 00:52:20,910 things get worse, actually. 979 00:52:20,910 --> 00:52:26,610 I'll just tell you the right thing to do is-- 980 00:52:26,610 --> 00:52:28,650 this is the one thing you need to memorize. 981 00:52:41,700 --> 00:52:43,860 And let me draw the results. 982 00:52:43,860 --> 00:52:47,300 You can also just think of it as redrawing the tree like this. 983 00:52:47,300 --> 00:52:49,940 But it's easier from an analysis perspective 984 00:52:49,940 --> 00:52:51,680 to think about it as two rotations. 985 00:52:51,680 --> 00:52:52,940 Then we can just reduce. 986 00:52:52,940 --> 00:52:54,800 As long as we know how rotations work, 987 00:52:54,800 --> 00:52:56,460 then we know that this thing works-- 988 00:52:56,460 --> 00:52:59,060 "works" meaning it preserves traversal order 989 00:52:59,060 --> 00:53:01,160 and we can maintain all the augmentations. 990 00:53:01,160 --> 00:53:04,340 So now, if I copy over these labels-- the height labels-- 991 00:53:04,340 --> 00:53:05,390 I have k minus 1. 992 00:53:05,390 --> 00:53:08,600 I have, for these two guys, k minus 1 or k minus 2. 993 00:53:08,600 --> 00:53:11,100 The biggest one is k minus 1. 994 00:53:11,100 --> 00:53:13,190 This is k minus 1. 995 00:53:13,190 --> 00:53:15,410 And so this will be k. 996 00:53:15,410 --> 00:53:17,450 This will be k. 997 00:53:17,450 --> 00:53:19,070 This will be k plus 1. 998 00:53:19,070 --> 00:53:22,490 And lo and behold, we have a nice, height-balanced tree 999 00:53:22,490 --> 00:53:25,760 in all three cases for this one node.
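Case three can be checked the same way. Take A, B, C, and D to be the hanging subtrees in traversal order, with y the left child of z and z the right child of x, as in the picture described above. In this minimal Python sketch (function names hypothetical, heights as plain integers), a single rotation of the x-z edge leaves the new root z with skew -2, so it is still out of balance. The standard double rotation, a right rotation at z followed by a left rotation at x, puts y on top and gives heights k, k, and k + 1 with every skew in {-1, 0, +1}. This holds for all three allowed combinations of heights for B and C.

```python
def single_rotation_root_skew(hA, hB, hC, hD):
    """Rotate the x-z edge only: z becomes the root, and x keeps A
    and adopts y (which still holds B and C). Returns skew of z."""
    h_y = 1 + max(hB, hC)
    h_x = 1 + max(hA, h_y)
    return hD - h_x


def double_rotation(hA, hB, hC, hD):
    """Right rotation at z, then left rotation at x: y becomes the root with
    children x (subtrees A, B) and z (subtrees C, D).
    Returns (height, skew) pairs for x, z, and y."""
    h_x = 1 + max(hA, hB)
    h_z = 1 + max(hC, hD)
    h_y = 1 + max(h_x, h_z)
    return (h_x, hB - hA), (h_z, hD - hC), (h_y, h_z - h_x)


k = 10
for hB, hC in [(k - 1, k - 1), (k - 1, k - 2), (k - 2, k - 1)]:
    print(single_rotation_root_skew(k - 1, hB, hC, k - 1),  # -2 every time
          double_rotation(k - 1, hB, hC, k - 1))            # heights k, k, k + 1
```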
1000 00:53:25,760 --> 00:53:27,020 Now, this was the lowest out-of-balance node. 1001 00:53:27,020 --> 00:53:28,880 Once we update this one, it could 1002 00:53:28,880 --> 00:53:32,090 be that we changed the height of this subtree's root. 1003 00:53:32,090 --> 00:53:35,750 Before, it was k plus 2, now it's k plus 1. 1004 00:53:35,750 --> 00:53:39,483 Or sometimes, we keep it the same, like over in this case. 1005 00:53:39,483 --> 00:53:41,150 And so now, we have to check the parent. 1006 00:53:41,150 --> 00:53:42,740 Maybe the parent is out of balance. 1007 00:53:42,740 --> 00:53:44,720 And we just keep walking up the tree, 1008 00:53:44,720 --> 00:53:47,660 and also maintain all the augmentations as we go. 1009 00:53:47,660 --> 00:53:49,790 That way, we keep track of height and subtree size 1010 00:53:49,790 --> 00:53:51,980 if we want them, or any other augmentations. 1011 00:53:51,980 --> 00:53:54,425 And after order h operations, we will 1012 00:53:54,425 --> 00:53:56,300 have restored the height balance property, which 1013 00:53:56,300 --> 00:53:59,280 means all the way through, h equals order log n. 1014 00:53:59,280 --> 00:54:03,250 And so all of our operations now are magically order log n.
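Putting the pieces together, here is a compact Python sketch of insertion with this bottom-up repair. It makes several assumptions. Keys are ordered with `<`, as in a binary search tree. Height and subtree size are the only augmentations. All names (`Node`, `rebalance`, `insert`, and the helpers) are hypothetical rather than the lecture's code. Each `rebalance` call does O(1) work, so the walk up the ancestor path costs O(h). The `rebalance` step also handles a child with skew 0 (case two), which can arise after deletions.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = self.right = self.parent = None
        self.height, self.size = 0, 1  # a new leaf

def h(node):   # height of a possibly empty subtree
    return node.height if node is not None else -1

def sz(node):  # size of a possibly empty subtree
    return node.size if node is not None else 0

def skew(node):
    return h(node.right) - h(node.left)

def update(node):
    # O(1): recompute both augmentations from the children.
    node.height = 1 + max(h(node.left), h(node.right))
    node.size = 1 + sz(node.left) + sz(node.right)

def _relink(parent, old, new):
    # Put `new` where `old` hung under `parent` (parent is None at the root).
    new.parent = parent
    if parent is not None:
        if parent.left is old:
            parent.left = new
        else:
            parent.right = new

def rotate_right(y):
    x = y.left
    _relink(y.parent, y, x)
    y.left = x.right
    if y.left is not None:
        y.left.parent = y
    x.right, y.parent = y, x
    update(y)
    update(x)
    return x

def rotate_left(x):
    y = x.right
    _relink(x.parent, x, y)
    x.right = y.left
    if x.right is not None:
        x.right.parent = x
    y.left, x.parent = x, y
    update(x)
    update(y)
    return y

def rebalance(node):
    """Fix a node whose children are balanced and whose skew is in [-2, 2];
    return the root of its (possibly rotated) subtree."""
    if skew(node) == 2:
        if skew(node.right) == -1:  # case three: rotate the right child first
            rotate_right(node.right)
        return rotate_left(node)    # cases one and two
    if skew(node) == -2:            # mirror image of the above
        if skew(node.left) == 1:
            rotate_left(node.left)
        return rotate_right(node)
    update(node)
    return node

def insert(root, key):
    """Add key as a new leaf in BST order, then rebalance every ancestor
    bottom-up. Returns the (possibly new) root of the whole tree."""
    leaf = Node(key)
    if root is None:
        return leaf
    node = root
    while True:
        side = "left" if key < node.key else "right"
        child = getattr(node, side)
        if child is None:
            setattr(node, side, leaf)
            leaf.parent = node
            break
        node = child
    top = leaf
    while top.parent is not None:   # O(h) ancestors, O(1) work each
        top = rebalance(top.parent)
    return top

root = None
for key in range(1, 8):
    root = insert(root, key)
print(root.key, root.height, root.size)  # 4 2 7
```

Inserting keys in sorted order, the worst case for an unbalanced binary search tree, here yields a perfectly balanced tree of height 2 for seven keys.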