Today we're going to talk about sorting, which may not come as such a big surprise. We've talked about sorting for a while, but we're going to talk about it at a somewhat higher level and question some of the assumptions that we've been making so far. And we're going to ask the question: how fast can we sort? A pretty natural question. You may think you know the answer. Perhaps you do. Any suggestions on what the answer to this question might be? There are several possible answers. Many of them are partially correct. Let's hear any kinds of answers you'd like and start waking up this fresh morning. Sorry? Theta(n lg n). That's a good answer. That's often correct. Any other suggestions? n^2. That's correct if all you're allowed to do is swap adjacent elements. Good. That was close. I will see if I can make every answer correct. Usually n^2 is not the right answer, but in some models it is. Yeah? Theta(n) is also sometimes the right answer. The real answer is "it depends". That's the point of today's lecture. It depends on what we call the computational model: what you're allowed to do.
And, in particular here, with sorting, what we care about is the order of the elements: how are you allowed to manipulate the elements, what are you allowed to do with them to find out their order. The model is what you can do with the elements.

Now, we've seen several sorting algorithms. Do you want to shout some out? I think we've seen four, but maybe you know even more algorithms. Quicksort. Keep going. Heapsort. Merge sort. You can remember all the way back to Lecture 1. Any others? Insertion sort. All right. You're on top of it today. I don't know exactly why, but two of these are single words and two are two words. That's the style.

What is the running time of quicksort? This is a bit tricky. n lg n in the average case. Or, if we randomize quicksort, randomized quicksort runs in n lg n expected time for any input sequence. Let's say n lg n randomized. That's theta. And the worst case with plain old quicksort, where you just pick the first element as the partition element, is n^2. Heapsort, what's the running time there? n lg n always. Merge sort, I hope you can remember that as well: n lg n. And insertion sort? n^2. All of these algorithms run no faster than n lg n, so we might ask: can we do better than n lg n?
And that is a question we will, in some sense, answer both yes and no to today. But all of these algorithms have something in common in terms of the model of what you're allowed to do with the elements. Any guesses on what that model might be? Yeah? You compare pairs of elements, exactly. That is indeed the model used by all four of these algorithms. And in that model, n lg n is the best you can do.

We have so far just looked at what are called comparison sorting algorithms, or "comparison sorts". And this is a model for the sorting problem of what you're allowed to do. Here all you can do is use comparisons, meaning less than, greater than, less than or equal to, greater than or equal to, or equal to, to determine the relative order of elements. This is a restriction on algorithms. It is, in some sense, stating what kinds of elements we're dealing with. They are elements that we can somehow compare. They have a total order: some are less, some are bigger. But it also restricts the algorithm. You could say, well, I'm sorting integers, but still I'm only allowed to do comparisons with them. I'm not allowed to multiply the integers or do other weird things. That's the comparison sorting model.
And this lecture, in some sense, follows the standard mathematical progression where you have a theorem, then you have a proof, then you have a counterexample. It's always a good way to have a math lecture. We're going to prove the theorem that no comparison sorting algorithm uses fewer than n lg n comparisons. State the theorem, prove it, and then we'll give a counterexample in the sense that, if you go outside the comparison sorting model, you can do better: you can get linear time in some cases, better than n lg n. So, that is what we're doing today.

But first we're going to stick to this comparison model and try to understand why we need n lg n comparisons if that's all we're allowed to do. And for that we're going to look at something called decision trees, which in some sense is another model of what you're allowed to do in an algorithm, but it's more general than the comparison model. And let's try an example to get some intuition. Suppose we want to sort three elements. This is not very challenging, but we'll get to draw the decision tree that corresponds to sorting three elements. Here is one solution, I claim. This is, in a certain sense, an algorithm, but it's drawn as a tree instead of pseudocode.
What this tree means is that at each node you're making a comparison. This one says compare a_1 versus a_2. If a_1 is smaller than a_2 you go this way; if it is bigger than a_2 you go that way, and then you proceed. When you get down to a leaf, that is the answer. Remember, the sorting problem is that you're trying to find a permutation of the inputs that puts them in sorted order.

Let's try it with some sequence of numbers, say 9, 4 and 6. We want to sort 9, 4 and 6, so first we compare the first element with the second element. 9 is bigger than 4, so we go down this way. Then we compare the first element with the third element, that's 9 versus 6. 9 is bigger than 6, so we go this way. And then we compare the second element with the third element; 4 is less than 6, so we go this way. And the claim is that this is the correct permutation of the elements. You take a_2, which is 4, then you take a_3, which is 6, and then you take a_1, which is 9, so indeed that works out. And if I wrote this down right, this is a sorting algorithm in the decision tree model.

In general, let me just say the rules of this game. In general, we have n elements we want to sort. And I only drew the n = 3 case because these trees get very big very quickly.
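The board drawing itself isn't in the transcript, but a decision tree of this shape for n = 3 can be sketched as nested comparisons: each `if` below plays the role of one internal node, and each `return` is a leaf giving the permutation. The exact leaf layout here is one consistent choice, not necessarily the one drawn in lecture.

```python
def decision_tree_sort3(a1, a2, a3):
    """Sort three elements using a comparison decision tree for n = 3."""
    if a1 <= a2:                      # internal node 1:2
        if a2 <= a3:                  # internal node 2:3
            return (a1, a2, a3)       # leaf: permutation <1,2,3>
        elif a1 <= a3:                # internal node 1:3
            return (a1, a3, a2)       # leaf: permutation <1,3,2>
        else:
            return (a3, a1, a2)       # leaf: permutation <3,1,2>
    else:
        if a1 <= a3:                  # internal node 1:3
            return (a2, a1, a3)       # leaf: permutation <2,1,3>
        elif a2 <= a3:                # internal node 2:3
            return (a2, a3, a1)       # leaf: permutation <2,3,1>
        else:
            return (a3, a2, a1)       # leaf: permutation <3,2,1>

# The lecture's example input 9, 4, 6 follows the path 1:2, 1:3, 2:3
# and lands on the leaf <2,3,1>, i.e. a_2, a_3, a_1:
print(decision_tree_sort3(9, 4, 6))  # → (4, 6, 9)
```

Note that every input makes at most three comparisons, matching the height of the n = 3 tree.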
Each internal node, so every non-leaf node, has a label of the form i : j, where i and j are between 1 and n. And this means that we compare a_i with a_j. And we have two subtrees from every such node. We have the left subtree, which tells you what the algorithm does, what subsequent comparisons it makes, if the comparison comes out less than. And we have to be a little bit careful because it could also come out equal. What we will do is this: the left subtree corresponds to less than or equal to, and the right subtree corresponds to strictly greater than. That is a little bit more precise than what we were doing here. Here all the elements were distinct, so no problem. But, in general, we care about the equality case too, to be general.

So, that was the internal nodes. And then each leaf node gives you a permutation. So, in order to be the answer to the sorting problem, that permutation had better have the property that it orders the elements. This is from the first lecture, when we defined the sorting problem: some permutation pi on n things such that a_pi(1) <= a_pi(2) <= ... <= a_pi(n). So, that is the definition of a decision tree.
Any binary tree with these kinds of labels satisfying all these properties is, in some sense, a sorting algorithm. It's a sorting algorithm in the decision tree model. Now, as you might expect, this is really not too different from the comparison model. If I give you a comparison sorting algorithm, and we have these four, quicksort, heapsort, merge sort and insertion sort, all of them can be translated into the decision tree model. It's sort of a graphical representation of what the algorithm does. It's not a terribly useful one for writing down an algorithm. Any guesses why? Why do we not draw these pictures as a definition of quicksort or a definition of merge sort? It depends on the size of the input, that's a good point. This tree is specific to the value of n, so it is, in some sense, not as generic. Now, we could try to write down a construction, for an arbitrary value of n, of one of these decision trees, and that would give us sort of a real algorithm that works for any input size. But even then this is not a terribly convenient representation for writing down an algorithm. Well, let's write down a transformation that converts a comparison sorting algorithm to a decision tree, and then maybe you will see why. This is not a useless model; obviously, I wouldn't be telling you about it otherwise.
It will be very powerful for proving that we cannot do better than n lg n. But for writing down an algorithm, if you were going to implement something, this tree is not so useful, even if you had a decision tree computer, whatever that is. But let's prove this theorem: that decision trees, in some sense, model comparison sorting algorithms, which we just call comparison sorts.

This is a transformation. And we're going to build one tree for each value of n. The decision trees depend on n. The algorithm, well, it depends on n, but it works for all values of n. And we're just going to think of the algorithm as splitting into two forks, the left subtree and the right subtree, whenever it makes a comparison. So we take a comparison sort like merge sort. And it does lots of stuff. It does index arithmetic, it does recursion, whatever. But at some point it makes a comparison, and then we say, OK, there are two halves of the algorithm: there is what the algorithm would do if the comparison came out less than or equal to, and what the algorithm would do if the comparison came out greater than. So, you can build a tree in this way.
In some sense, what this tree is doing is listing all possible executions of this algorithm, considering what would happen for all possible outcomes of those comparisons. We will call these all possible instruction traces. If you write down all the instructions that are executed by this algorithm, for all possible input arrays a_1 to a_n, and see how all the comparisons could come out and what the algorithm does, in the end you will get a tree.

Now, how big will that tree be, roughly, as a function of n? Yeah? Right. If it's got to be able to sort every possible list of length n, at the leaves I have to have all the permutations of those elements. That is a lot. There are a lot of permutations of n elements: there are n factorial of them. n factorial is exponential; it's really big. So, this tree is huge. It's going to be exponential in the input size n. That is why we don't normally write algorithms down as a decision tree, even though in some cases maybe we could. It's not a very compact representation. These algorithms, you write them down in pseudocode, they have constant length. It's a very succinct representation of the algorithm.
Here the length depends on n, and it depends exponentially on n, which is not useful if you wanted to implement the algorithm, because writing down the algorithm would take a long time. But, nonetheless, we can use this as a tool to analyze these comparison sorting algorithms. We have all of these. Any comparison sorting algorithm can be transformed in this way into a decision tree. And now we have this observation that the number of leaves in this decision tree has to be really big.

Let me come back to leaves in a second. Before we get to leaves, let's talk about the depth of the tree. This decision tree represents all possible executions of the algorithm. If I look at a particular execution, which corresponds to some root-to-leaf path in the tree, the running time, or the number of comparisons made by that execution, is just the length of the path. And, therefore, the worst-case running time, over all possible inputs of length n, is going to be... n - 1? Could be. It depends on the decision tree. But, as a function of the decision tree? The longest path, right, which is called the height of the tree. So, this is what we want to measure.
We want to claim that the height of the tree has to be at least n lg n, with an omega in front. That is what we'll prove. And the only thing we're going to use is that the number of leaves in that tree has to be big: it has to be at least n factorial.

This is a lower bound on decision tree sorting. And the lower bound says that any decision tree that sorts n elements has height at least n lg n, up to constant factors. So, that is the theorem. Now we're going to prove the theorem. And we're going to use the fact that the number of leaves in that tree must be at least n factorial. Because there are n factorial permutations of the inputs, and all of them could happen. And so, for this algorithm to be correct, it has to detect every one of those permutations in some way. Now, it may do it very quickly. We had better only need n lg n comparisons, because we know that's possible. The depth of the tree may not be too big, but it has to have a huge number of leaves down there. It has to branch enough to get n factorial leaves, because it has to give the right answer on all possible inputs. This is, in some sense, counting the number of possible inputs that we have to distinguish. This is the number of leaves.
What we care about is the height of the tree. Let's call the height of the tree h. Now, if I have a tree of height h, how many leaves could it have? What's the maximum number of leaves it could have? 2^h, exactly. Because this is a binary tree, and comparison trees always have a branching factor of 2, the number of leaves has to be at most 2^h if I have a height-h tree. Now, this gives me a relation. The number of leaves has to be greater than or equal to n factorial, and the number of leaves has to be less than or equal to 2^h. Therefore, n factorial is less than or equal to 2^h, if I got that right.

Now, again, we care about h in terms of n factorial, so we solve this by taking logs. And I am also going to flip sides. Now h is at least log base 2, because there is a 2 over here, of n factorial. There is a property that I'm using here in order to derive this inequality from that inequality. This is a technical aside, but it's important that you realize there is a technical issue here. The general principle I'm applying is: I have some inequality, I do the same thing to both sides, and hopefully the inequality should still be true.
But, in order for that to be the case, I need a property of the operation that I'm performing: it has to be a monotonic transformation. Here what I'm using is that log is a monotonically increasing function. That is important. If I multiplied both sides by -1, which is a decreasing function, the inequality would have to get flipped. For the inequality not to flip here, I need to know that log is monotonically increasing. If you see log, that's true. But we need to be careful here.

Now we need some approximation of n factorial in order to figure out what its log is. Does anyone know a good approximation for n factorial? Not necessarily the equation, but the name. Stirling's formula. Good. You all remember Stirling. And I just need the highest-order term, which I believe is that n factorial is at least (n/e)^n. So, that's all we need here. Now I can use properties of logs to bring the n outside. This is n lg(n/e). And then lg(n/e) I can simplify. That is just lg n - lg e. So, this is n(lg n - lg e). lg e is a constant, so it's really tiny compared to this lg n, which is growing with n. This is Omega(n lg n). All we care about is the leading term.
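Written out in one chain, the derivation from the board is:

```latex
h \;\ge\; \lg(n!) \;\ge\; \lg\!\left(\left(\frac{n}{e}\right)^{\!n}\right)
  \;=\; n \lg\frac{n}{e} \;=\; n(\lg n - \lg e) \;=\; \Omega(n \lg n)
```

where the first step uses n! <= 2^h with log monotonically increasing, and the second uses Stirling's bound n! >= (n/e)^n.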
It is actually Theta(n lg n), but because we have it greater than or equal to, all we care about is the omega. A theta here wouldn't give us anything stronger. Of course, not all algorithms have n lg n running time or make n lg n comparisons. Some of them do, some of them are worse, but this proves that all of them require a height of at least n lg n. There you see the proof, once you observe the fact about the number of leaves, and if you remember Stirling's formula. So, you should know this proof. You can show that all sorts of problems require n lg n time with this kind of technique, provided you're in some kind of a decision tree model. That's important. We really need that our algorithm can be phrased as a decision tree. And, in particular, we know from this transformation that all comparison sorts can be represented as decision trees. But there are some sorting algorithms which cannot be represented as a decision tree. And we will turn to those momentarily. But before we get there: I phrased this theorem as a lower bound on decision tree sorting. But, of course, we also get a lower bound on comparison sorting. And, in particular, it tells us that merge sort and heapsort are asymptotically optimal.
Their dependence on n, in terms of asymptotic notation, so ignoring constant factors, is optimal in terms of growth with n, but this is only in the comparison model. So, among comparison sorting algorithms, which these are, they are asymptotically optimal. They use the minimum number of comparisons, up to constant factors. In fact, their whole running time is dominated by the number of comparisons. It's all Theta(n lg n). So, this is good news.

And I should probably mention a little bit about what happens with randomized algorithms. What I've described here really only applies, in some sense, to deterministic algorithms. Does anyone see what would change with randomized algorithms, or where I've assumed that I have a deterministic comparison sort? This is a bit subtle. And I only noticed it reading the notes this morning: oh, wait. I will give you a hint. It's over here, the right-hand side of the world. If I have a deterministic algorithm, what the algorithm does is completely determined at each step. As long as I know all the comparisons that it made up to some point, what the algorithm will do next is determined. But, if I have a randomized algorithm, it also depends on the outcomes of some coin flips.
Any suggestions of what breaks over here? There is more than one tree, exactly. So, we had this assumption that we only have one tree for each n. In fact, what we get is a probability distribution over trees. For each value of n, if you take all the possible executions of that algorithm, all the instruction traces, well, now, in addition to branching on comparisons, we also branch on whether a coin flip came out heads or tails, or, however we're generating random numbers, on whether it came out with some value between 1 and n. So, we get a probability distribution over trees.

This lower bound still applies, though. Because, no matter what tree we get, I don't really care: I get at least one tree for each n. And this proof applies to every tree. So, no matter what tree you get, if it is a correct tree, it has to have height Omega(n lg n). This lower bound applies even for randomized algorithms. You cannot do better than n lg n, because no matter what tree the algorithm comes up with, no matter how those coin flips come out, this argument still applies. Every tree that comes out has to be correct, so the statement becomes "there is at least one tree for each n", and the proof goes through. We also get the fact that randomized quicksort is asymptotically optimal in expectation.
But, in order to say that randomized quicksort is asymptotically optimal, we needed to know that all randomized algorithms require Omega(n lg n) comparisons. Now we know that, so all is well. That is the comparison model. Any questions before we go on? Good.

The next topic is to burst outside of the comparison model and try to sort in linear time. It is pretty clear that, as long as you don't have some kind of a parallel algorithm or something really fancy, you cannot sort any better than linear time, because you've at least got to look at the data. No matter what you're doing with the data, you've got to look at it; otherwise you're not sorting it correctly. So, linear time is the best we could hope for. n lg n is pretty close. How could we sort in linear time? Well, we're going to need some more powerful assumption. And this is the counterexample. We're going to have to move outside the comparison model and do something else with our elements. And what we're going to do is assume that they're integers in a particular range, and we will use that to sort in linear time. We're going to see two algorithms for sorting faster than n lg n. The first one is pretty simple, and we will use it in the second algorithm. It's called counting sort.
The input to counting sort is 470 00:32:40 --> 00:32:44 an array, as usual, but we're going to assume what 471 00:32:44 --> 00:32:49 those array elements look like. Each A[i] is an integer from 472 00:32:49 --> 00:32:52 the range of 1 to k. This is a pretty strong 473 00:32:52 --> 00:32:55 assumption. And the running time is 474 00:32:55 --> 00:33:01 actually going to depend on k. If k is small it is going to be 475 00:33:01 --> 00:33:06 a good algorithm. If k is big it's going to be a 476 00:33:06 --> 00:33:10 really bad algorithm, worse than n lg n. 477 00:33:10 --> 00:33:15 Our goal is to output some sorted version of this array. 478 00:33:15 --> 00:33:20 Let's call this sorting of A. It's going to be easier to 479 00:33:20 --> 00:33:25 write down the output directly instead of writing down 480 00:33:25 --> 00:33:32 permutation for this algorithm. And then we have some auxiliary 481 00:33:32 --> 00:33:36 storage. I'm about to write down the 482 00:33:36 --> 00:33:41 pseudocode, which is why I'm declaring all my variables here. 483 00:33:41 --> 00:33:45 And the auxiliary storage will have length k, 484 00:33:45 --> 00:33:48 which is the range on my input values. 485 00:33:48 --> 00:33:52 Let's see the algorithm. 486 00:33:52 --> 00:34:07 487 00:34:07 --> 00:34:09 This is counting sort. 488 00:34:09 --> 00:34:17 489 00:34:17 --> 00:34:20 And it takes a little while to write down but it's pretty 490 00:34:20 --> 00:34:22 straightforward. 491 00:34:22 --> 00:34:28 492 00:34:28 --> 00:34:32 First we do some initialization. 493 00:34:32 --> 00:34:36 Then we do some counting. 494 00:34:36 --> 00:35:04 495 00:35:04 --> 00:35:06 Then we do some summing. 496 00:35:06 --> 00:35:50 497 00:35:50 --> 00:35:54 And then we actually write the output. 498 00:35:54 --> 00:36:28 499 00:36:28 --> 00:36:30 Is that algorithm perfectly clear to everyone? 500 00:36:30 --> 00:36:30 No one. Good. 501 00:36:30 --> 00:36:33 This should illustrate how obscure pseudocode can be. 
502 00:36:33 --> 00:36:36 And when you're solving your problem sets, 503 00:36:36 --> 00:36:39 you should keep in mind that it's really hard to understand 504 00:36:39 --> 00:36:41 an algorithm just given pseudocode like this. 505 00:36:41 --> 00:36:45 You need some kind of English description of what's going on 506 00:36:45 --> 00:36:48 because, while you could work through and figure out what this 507 00:36:48 --> 00:36:51 means, it could take half an hour to an hour. 508 00:36:51 --> 00:36:53 And that's not a good way of expressing yourself. 509 00:36:53 --> 00:36:57 And so what I will give you now is the English description, 510 00:36:57 --> 00:37:01 but we will refer back to this to understand. 511 00:37:01 --> 00:37:05 This is sort of our bible of what the algorithm is supposed 512 00:37:05 --> 00:37:07 to do. Let me go over it briefly. 513 00:37:07 --> 00:37:11 The first step is just some initialization. 514 00:37:11 --> 00:37:15 The C[i]'s are going to count some things, count occurrences 515 00:37:15 --> 00:37:18 of values. And so first we set them to 516 00:37:18 --> 00:37:20 zero. Then, for every value we see 517 00:37:20 --> 00:37:25 A[j], we're going to increment the counter for that value A[j]. 518 00:37:25 --> 00:37:30 Then the C[i]s will give me the number of elements equal to a 519 00:37:30 --> 00:37:35 particular value i. Then I'm going to take prefix 520 00:37:35 --> 00:37:39 sums, which will make it so that C[i] gives me the number of 521 00:37:39 --> 00:37:42 keys, the number of elements less than or equal to [i] 522 00:37:42 --> 00:37:45 instead of equals. And then, finally, 523 00:37:45 --> 00:37:49 it turns out that's enough to put all the elements in the 524 00:37:49 --> 00:37:52 right place. This I will call distribution. 525 00:37:52 --> 00:37:56 This is the distribution step. And it's probably the least 526 00:37:56 --> 00:38:01 obvious of all the steps. 
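The board pseudocode itself isn't captured in this transcript, so here is a minimal runnable sketch of counting sort in Python following the four steps just described: initialization, counting, prefix sums, distribution. The names A, B, C, and k follow the lecture; the 0-versus-1 indexing details are my own choice, not necessarily what was on the board.

```python
def counting_sort(A, k):
    """Stably sort a list A of integers in the range 1..k.

    Four steps from the lecture: initialize C, count occurrences,
    take prefix sums, then distribute into the output B.
    """
    n = len(A)
    B = [None] * n          # output array, same length as A
    C = [0] * (k + 1)       # C[i] counts keys equal to i (index 0 unused)

    # Counting: C[i] = number of elements equal to i.
    for j in range(n):
        C[A[j]] += 1

    # Prefix sums: now C[i] = number of elements <= i.
    for i in range(2, k + 1):
        C[i] += C[i - 1]

    # Distribution: walk A from the end so equal keys keep
    # their relative order (this is what makes the sort stable).
    for j in range(n - 1, -1, -1):
        B[C[A[j]] - 1] = A[j]   # C[A[j]] is a 1-based position
        C[A[j]] -= 1
    return B
```

On the lecture's example, `counting_sort([4, 1, 3, 4, 3], 4)` returns `[1, 3, 3, 4, 4]`.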
And let's do an example to make 527 00:38:01 --> 00:38:04 it more obvious what's going on. 528 00:38:04 --> 00:38:12 529 00:38:12 --> 00:38:30 Let's take an array A = [4, 1, 3, 4, 3]. 530 00:38:30 --> 00:38:36 And then I want some array C. And let me add some indices 531 00:38:36 --> 00:38:43 here so we can see what the algorithm is really doing. 532 00:38:43 --> 00:38:50 Here it turns out that all of my numbers are in the range 1 to 533 00:38:50 --> 00:38:54 4, so k = 4. My array C has four values. 534 00:38:54 --> 00:39:00 Initially, I set them all to zero. 535 00:39:00 --> 00:39:03 That's easy. And now I want to count through 536 00:39:03 --> 00:39:07 everything. And let me not cheat here. 537 00:39:07 --> 00:39:10 I'm in the second step, so to speak. 538 00:39:10 --> 00:39:13 And I look for each element in order. 539 00:39:13 --> 00:39:17 I look at the C[i] value. The first element is 4, 540 00:39:17 --> 00:39:20 so I look at C4. That is 0. 541 00:39:20 --> 00:39:24 I increment it to 1. Then I look at element 1. 542 00:39:24 --> 00:39:28 That's 0. I increment it to 1. 543 00:39:28 --> 00:39:30 Then I look at 3 and that's here. 544 00:39:30 --> 00:39:33 It is also 0. I increment it to 1. 545 00:39:33 --> 00:39:37 Not so exciting so far. Now I see 4, 546 00:39:37 --> 00:39:40 which I've seen before, how exciting. 547 00:39:40 --> 00:39:44 I had value 1 in here, I increment it to 2. 548 00:39:44 --> 00:39:48 Then I see value 3, which also had a value of 1. 549 00:39:48 --> 00:39:51 I increment that to 2. The result is [1, 550 00:39:51 --> 00:39:55 0, 2, 2]. That's what my array C looks 551 00:39:55 --> 00:40:00 like at this point in the algorithm. 552 00:40:00 --> 00:40:04 Now I do a relatively simple transformation of taking prefix 553 00:40:04 --> 00:40:05 sums. 
I want to know, 554 00:40:05 --> 00:40:09 instead of these individual values, the sum of this prefix, 555 00:40:09 --> 00:40:13 the sum of this prefix, the sum of this prefix and the 556 00:40:13 --> 00:40:17 sum of this prefix. I will call that C prime just 557 00:40:17 --> 00:40:21 so we don't get too lost in all these different versions of C. 558 00:40:21 --> 00:40:23 This is just 1. And 1 plus 0 is 1. 559 00:40:23 --> 00:40:25 1 plus 2 is 3. 3 plus 2 is 5. 560 00:40:25 --> 00:40:30 So, these are sort of the running totals. 561 00:40:30 --> 00:40:33 There are five elements total, there are three elements less 562 00:40:33 --> 00:40:37 than or equal to 3, there is one element less than 563 00:40:37 --> 00:40:38 or equal to 2, and so on. 564 00:40:38 --> 00:40:40 Now, the fun part, the distribution. 565 00:40:40 --> 00:40:43 And this is where we get our array B. 566 00:40:43 --> 00:40:46 B better have the same size, every element better appear 567 00:40:46 --> 00:40:50 here somewhere and they should come out in sorted order. 568 00:40:50 --> 00:40:54 Let's just run the algorithm. j is going to start at the end 569 00:40:54 --> 00:40:58 of the array and work its way down to 1, the beginning of the 570 00:40:58 --> 00:41:02 array. And what we do is we pick up 571 00:41:02 --> 00:41:05 the last element of A, A[n]. 572 00:41:05 --> 00:41:11 We look at the counter. We look at the C vector for 573 00:41:11 --> 00:41:14 that value. Here the value is 3, 574 00:41:14 --> 00:41:19 and this is the third column, so that has number 3. 575 00:41:19 --> 00:41:24 And the claim is that's where it belongs in B. 576 00:41:24 --> 00:41:29 You take this number 3, you put it in index 3 of the 577 00:41:29 --> 00:41:34 array B. And then you decrement the 578 00:41:34 --> 00:41:37 counter. I'm going to replace 3 here 579 00:41:37 --> 00:41:40 with 2. And the idea is these numbers 580 00:41:40 --> 00:41:44 tell you where those values should go. 
581 00:41:44 --> 00:41:48 Anything of value 1 should go at position 1. 582 00:41:48 --> 00:41:53 Anything with value 3 should go at position 3 or less. 583 00:41:53 --> 00:41:59 This is going to be the last place that a 3 should go. 584 00:41:59 --> 00:42:02 And then anything with value 4 should go at position 5 or less, 585 00:42:02 --> 00:42:06 definitely should go at the end of the array because 4 is the 586 00:42:06 --> 00:42:09 largest value. And this counter will work out 587 00:42:09 --> 00:42:13 perfectly because these counts have left enough space in each 588 00:42:13 --> 00:42:15 section of the array. Effectively, 589 00:42:15 --> 00:42:18 this part is reserved for ones, there are no twos, 590 00:42:18 --> 00:42:21 this part is reserved for threes, and this part is 591 00:42:21 --> 00:42:24 reserved for fours. You can check if that's really 592 00:42:24 --> 00:42:27 what this array means. Let's finish running the 593 00:42:27 --> 00:42:31 algorithm. That was the last element. 594 00:42:31 --> 00:42:34 I won't cross it off, but we've sort of done that. 595 00:42:34 --> 00:42:36 Now I look at the next to last element. 596 00:42:36 --> 00:42:38 That's a 4. Fours go in position 5. 597 00:42:38 --> 00:42:42 So, I put my 4 here in position 5 and I decrement that counter. 598 00:42:42 --> 00:42:45 Next I look at another 3. Threes now go in position 2, 599 00:42:45 --> 00:42:48 so that goes there. And then I decrement that 600 00:42:48 --> 00:42:50 counter. I won't actually use that 601 00:42:50 --> 00:42:53 counter anymore, but let's decrement it because 602 00:42:53 --> 00:42:57 that's what the algorithm says. I look at the previous element. 603 00:42:57 --> 00:43:00 That's a 1. Ones go in position 1, 604 00:43:00 --> 00:43:04 so I put it here and decrement that counter. 605 00:43:04 --> 00:43:09 And finally I have another 4. And fours go in position 4 now, 606 00:43:09 --> 00:43:13 position 4 is here, and I decrement that counter. 
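The board work just traced can be checked mechanically. This short sketch reproduces the intermediate arrays from the example (1-indexed on the board; plain Python lists here, with index 0 of C unused):

```python
A = [4, 1, 3, 4, 3]
k = 4

# Counting step: occurrences of each value 1..k.
C = [0] * (k + 1)
for x in A:
    C[x] += 1
print(C[1:])   # [1, 0, 2, 2], as on the board

# Prefix sums: C[i] = number of elements <= i.
for i in range(2, k + 1):
    C[i] += C[i - 1]
print(C[1:])   # [1, 1, 3, 5], the running totals

# Distribution, scanning A right to left and decrementing counters.
B = [None] * len(A)
for x in reversed(A):
    B[C[x] - 1] = x
    C[x] -= 1
print(B)       # [1, 3, 3, 4, 4]
```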
607 00:43:13 --> 00:43:18 So, that's counting sort. And you'll notice that all the 608 00:43:18 --> 00:43:23 elements appear and they appear in order, so that's the 609 00:43:23 --> 00:43:26 algorithm. Now, what's the running time of 610 00:43:26 --> 00:43:31 counting sort? kn is an upper bound. 611 00:43:31 --> 00:43:35 It's a little bit better than that. 612 00:43:35 --> 00:43:43 Actually, quite a bit better. This requires some summing. 613 00:43:43 --> 00:43:49 Let's go back to the top of the algorithm. 614 00:43:49 --> 00:43:53 How much time does this step take? 615 00:43:53 --> 00:43:57 k. How much time does this step 616 00:43:57 --> 00:44:00 take? n. 617 00:44:00 --> 00:44:05 How much time does this step take? 618 00:44:05 --> 00:44:10 k. Each of these operations in the 619 00:44:10 --> 00:44:17 for loops is taking constant time, so it is how many 620 00:44:17 --> 00:44:22 iterations of that for loop are there? 621 00:44:22 --> 00:44:29 And, finally, this step takes n. 622 00:44:29 --> 00:44:35 So, the total running time of counting sort is k + n. 623 00:44:35 --> 00:44:43 And this is a great algorithm if k is relatively small, 624 00:44:43 --> 00:44:49 like at most n. If k is big like n^2 or 2^n or 625 00:44:49 --> 00:44:54 whatever, this is not such a good algorithm, 626 00:44:54 --> 00:45:01 but if k = O(n) this is great. And we get our linear time 627 00:45:01 --> 00:45:04 sorting algorithm. Not only do we need the 628 00:45:04 --> 00:45:08 assumption that our numbers are integers, but we need that the 629 00:45:08 --> 00:45:12 range of the integers is pretty small for this algorithm to 630 00:45:12 --> 00:45:14 work. If all the numbers are between 631 00:45:14 --> 00:45:17 1 and order n then we get a linear time algorithm. 632 00:45:17 --> 00:45:20 But as soon as they're up to n lg n we're toast. 633 00:45:20 --> 00:45:24 We're back to n lg n sorting. It's not so great. 
634 00:45:24 --> 00:45:27 So, you could write a combination algorithm that says, 635 00:45:27 --> 00:45:31 well, if k is bigger than n lg n, then I will just use merge 636 00:45:31 --> 00:45:35 sort. And if it's less than n lg n 637 00:45:35 --> 00:45:38 I'll use counting sort. And that would work, 638 00:45:38 --> 00:45:42 but we can do better than that. How's the time? 639 00:45:42 --> 00:45:46 It is worth noting that we've beaten our bound, 640 00:45:46 --> 00:45:51 but only assuming that we're outside the comparison model. 641 00:45:51 --> 00:45:55 We haven't really contradicted the original theorem, 642 00:45:55 --> 00:46:00 we're just changing the model. And it's always good to 643 00:46:00 --> 00:46:04 question what you're allowed to do in any problem scenario. 644 00:46:04 --> 00:46:07 In, say, some practical scenarios, this would be great 645 00:46:07 --> 00:46:10 if the numbers you're dealing with are, say, 646 00:46:10 --> 00:46:12 a byte long. Then k is only 2^8, 647 00:46:12 --> 00:46:15 which is 256. You need this auxiliary array 648 00:46:15 --> 00:46:17 of size 256, and this is really fast. 649 00:46:17 --> 00:46:21 256 + n, no matter how big n is it's linear in n. 650 00:46:21 --> 00:46:24 If you know your numbers are small, it's great. 651 00:46:24 --> 00:46:27 But if your numbers are bigger, say you still know 652 00:46:27 --> 00:46:30 they're integers but they fit in like 32 bit words, 653 00:46:30 --> 00:46:35 then life is not so easy. Because k is then 2^32, 654 00:46:35 --> 00:46:39 which is 4.2 billion or so, which is pretty big. 655 00:46:39 --> 00:46:43 And you would need this auxiliary array of 4.2 billion 656 00:46:43 --> 00:46:46 words, which is probably like 16 gigabytes. 657 00:46:46 --> 00:46:51 So, you just need to initialize that array before you can even 658 00:46:51 --> 00:46:54 get started.
Unless n is like much, 659 00:46:54 --> 00:46:58 much more than 4 billion and you have 16 gigabytes of storage 660 00:46:58 --> 00:47:02 just to throw away, and I don't even have any 661 00:47:02 --> 00:47:06 machines with 16 gigabytes of RAM, this is not such a great 662 00:47:06 --> 00:47:10 algorithm. Just to get a feel, 663 00:47:10 --> 00:47:13 it's good if the numbers are really small. 664 00:47:13 --> 00:47:18 What we're going to do next is come up with a fancier algorithm 665 00:47:18 --> 00:47:22 that uses this as a subroutine on small numbers and combines 666 00:47:22 --> 00:47:25 this algorithm to handle larger numbers. 667 00:47:25 --> 00:47:29 That algorithm is called radix sort. 668 00:47:29 --> 00:47:34 But we need one important property of counting sort before 669 00:47:34 --> 00:47:36 we can go there. 670 00:47:36 --> 00:47:42 671 00:47:42 --> 00:47:45 And that important property is stability. 672 00:47:45 --> 00:47:50 673 00:47:50 --> 00:47:58 A stable sorting algorithm preserves the order of equal 674 00:47:58 --> 00:48:05 elements, let's say the relative order. 675 00:48:05 --> 00:48:19 676 00:48:19 --> 00:48:21 This is a bit subtle because usually we think of elements 677 00:48:21 --> 00:48:24 just as numbers. And, yeah, we had a couple 678 00:48:24 --> 00:48:25 threes and we had a couple fours. 679 00:48:25 --> 00:48:28 It turns out, if you look at the order of 680 00:48:28 --> 00:48:31 those threes and the order of those fours, we kept them in 681 00:48:31 --> 00:48:33 order. Because we took the last three 682 00:48:33 --> 00:48:36 and we put it here. Then we took the next to the 683 00:48:36 --> 00:48:39 last three and we put it to the left of that, because we're 684 00:48:39 --> 00:48:42 decrementing our counter and moving from the end of the array 685 00:48:42 --> 00:48:45 to the beginning of the array.
No matter how we do that, 686 00:48:45 --> 00:48:49 the orders of those threes are preserved, the orders of the 687 00:48:49 --> 00:48:51 fours are preserved. This may seem like a relatively 688 00:48:51 --> 00:48:54 simple thing, but if you look at the other 689 00:48:54 --> 00:48:57 four sorting algorithms we've seen, not all of them are 690 00:48:57 --> 00:49:00 stable. So, this is an exercise. 691 00:49:00 --> 00:49:06 692 00:49:06 --> 00:49:11 Exercise is figure out which other sorting algorithms that 693 00:49:11 --> 00:49:15 we've seen are stable and which are not. 694 00:49:15 --> 00:49:21 695 00:49:21 --> 00:49:25 I encourage you to work that out because this is the sort of 696 00:49:25 --> 00:49:29 thing that we ask on quizzes. But for now all we need is that 697 00:49:29 --> 00:49:33 counting sort is stable. And I won't prove this, 698 00:49:33 --> 00:49:37 but it should be pretty obvious from the algorithm. 699 00:49:37 --> 00:49:41 Now we get to talk about radix sort. 700 00:49:41 --> 00:49:55 701 00:49:55 --> 00:50:01 Radix sort is going to work for a much larger range of numbers 702 00:50:01 --> 00:50:04 in linear time. Still it has to have an 703 00:50:04 --> 00:50:09 assumption about how big those numbers are, but it will be a 704 00:50:09 --> 00:50:13 much more lax assumption. Now, to increase suspense even 705 00:50:13 --> 00:50:18 further, I am going to tell you some history about radix sort. 706 00:50:18 --> 00:50:22 This is one of the oldest sorting algorithms. 707 00:50:22 --> 00:50:26 It's probably the oldest implemented sorting algorithm. 708 00:50:26 --> 00:50:32 It was implemented around 1890. This is Herman Hollerith. 709 00:50:32 --> 00:50:35 Let's say around 1890. Has anyone heard of Hollerith 710 00:50:35 --> 00:50:37 before? A couple people. 711 00:50:37 --> 00:50:41 Not too many. He is sort of an important guy. 712 00:50:41 --> 00:50:43 He was a lecturer at MIT at some point. 
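Back to the stability exercise for a moment: a concrete way to see the definition is to attach a tag to each record and watch what happens to equal keys. The tags here are made-up labels purely for illustration; Python's built-in `sorted()` is stable, just like counting sort.

```python
# Each record is (key, tag); the tag marks the original position so we
# can observe whether equal keys keep their relative order.
records = [(3, 'a'), (1, 'b'), (3, 'c'), (2, 'd'), (1, 'e')]

# Sorting on the key alone: a stable sort must keep (3,'a') before
# (3,'c') and (1,'b') before (1,'e').
out = sorted(records, key=lambda r: r[0])
print(out)  # [(1, 'b'), (1, 'e'), (2, 'd'), (3, 'a'), (3, 'c')]
```

An unstable sort would be free to emit `(3, 'c')` before `(3, 'a')`; both orders are "sorted," but only one preserves the relative order of equal keys.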
713 00:50:43 --> 00:50:47 He developed an early version of punch cards. 714 00:50:47 --> 00:50:51 Punch card technology. This is before my time so I 715 00:50:51 --> 00:50:54 even have to look at my notes to remember. 716 00:50:54 --> 00:50:57 Oh, yeah, they're called punch cards. 717 00:50:57 --> 00:51:02 You may have seen them. If not they're in the 718 00:51:02 --> 00:51:06 PowerPoint lecture notes. There's this big grid. 719 00:51:06 --> 00:51:11 These days, if you've used a modern punch card recently, 720 00:51:11 --> 00:51:16 they are 80 characters wide and, I don't know, 721 00:51:16 --> 00:51:21 I think it's something like 16, I don't remember exactly. 722 00:51:21 --> 00:51:25 And then you punch little holes here. 723 00:51:25 --> 00:51:30 You have this magic machine. It's like a typewriter. 724 00:51:30 --> 00:51:34 You press a letter and that corresponds to some character. 725 00:51:34 --> 00:51:38 Maybe it will punch out a hole here, punch out a hole here. 726 00:51:38 --> 00:51:42 You can see the website if you want to know exactly how this 727 00:51:42 --> 00:51:46 works for historical reasons. You don't see these too often 728 00:51:46 --> 00:51:49 anymore, but this is in particular the reason why most 729 00:51:49 --> 00:51:53 terminals are 80 characters wide because that was how things 730 00:51:53 --> 00:51:55 were. Hollerith actually didn't 731 00:51:55 --> 00:51:59 develop these punch cards exactly, although eventually he 732 00:51:59 --> 00:52:01 did. In the beginning, 733 00:52:01 --> 00:52:04 in 1890, the big deal was the US Census. 734 00:52:04 --> 00:52:07 If you watched the news, I guess like a year or two ago, 735 00:52:07 --> 00:52:10 the US Census was a big deal because it's really expensive to 736 00:52:10 --> 00:52:12 collect all this data from everyone. 737 00:52:12 --> 00:52:15 And the Constitution says you've got to collect data about 738 00:52:15 --> 00:52:18 everyone every ten years. And it was getting hard. 
739 00:52:18 --> 00:52:20 In particular, in 1880, they did the census. 740 00:52:20 --> 00:52:24 And it took them almost ten years to complete the census. 741 00:52:24 --> 00:52:27 The population kept going up, and ten years to do a ten-year 742 00:52:27 --> 00:52:30 census, that's going to start getting expensive when they 743 00:52:30 --> 00:52:34 overlap with each other. So, for 1890 they wanted to do 744 00:52:34 --> 00:52:37 something fancier. And Hollerith said, 745 00:52:37 --> 00:52:40 OK, I'm going to build a machine that you take in the 746 00:52:40 --> 00:52:42 data. It was a modified punch card 747 00:52:42 --> 00:52:46 where you would mark out particular squares depending on 748 00:52:46 --> 00:52:50 your status, whether you were single or married or whatever. 749 00:52:50 --> 00:52:53 All the things they wanted to know on the census they would 750 00:52:53 --> 00:52:57 encode in binary onto this card. And then he built a machine 751 00:52:57 --> 00:53:02 that would sort these cards so you could do counting. 752 00:53:02 --> 00:53:05 And, in some sense, these are numbers. 753 00:53:05 --> 00:53:10 And the numbers aren't too big, but they're big enough that 754 00:53:10 --> 00:53:15 counting sort wouldn't work. I mean if there were a hundred 755 00:53:15 --> 00:53:18 numbers here, 2^100 is pretty overwhelming, 756 00:53:18 --> 00:53:24 so we cannot use counting sort. The first idea was the wrong 757 00:53:24 --> 00:53:27 idea. I'm going to think of these as 758 00:53:27 --> 00:53:30 numbers. Let's say each of these columns 759 00:53:30 --> 00:53:34 is one number. And so there's sort of the most 760 00:53:34 --> 00:53:38 significant number out here and there is the least significant 761 00:53:38 --> 00:53:40 number out here. The first idea was you sort by 762 00:53:40 --> 00:53:43 the most significant digit first. 
763 00:53:43 --> 00:53:50 764 00:53:50 --> 00:53:53 That's not such a great algorithm, because if you sort 765 00:53:53 --> 00:53:58 by the most significant digit you get a bunch of buckets each 766 00:53:58 --> 00:54:01 with a pile of cards. And this was a physical device. 767 00:54:01 --> 00:54:04 It wasn't exactly an electronically controlled 768 00:54:04 --> 00:54:06 computer. It was a human that would push 769 00:54:06 --> 00:54:09 down some kind of reader. It would see which holes in the 770 00:54:09 --> 00:54:12 first column are punched. And then it would open a 771 00:54:12 --> 00:54:15 physical bin in which the person would sort of swipe it and it 772 00:54:15 --> 00:54:17 would just fall into the right bin. 773 00:54:17 --> 00:54:20 It was a semi-automated. I mean the computer was the 774 00:54:20 --> 00:54:22 human plus the machine, but never mind. 775 00:54:22 --> 00:54:25 This was the procedure. You sorted it into bins. 776 00:54:25 --> 00:54:28 Then you had to go through and sort each bin by the second 777 00:54:28 --> 00:54:32 digit. And pretty soon the number of 778 00:54:32 --> 00:54:36 bins gets pretty big. And if you don't have too many 779 00:54:36 --> 00:54:40 digits this is OK, but it's not the right thing to 780 00:54:40 --> 00:54:41 do. The right idea, 781 00:54:41 --> 00:54:45 which is what Hollerith came up with after that, 782 00:54:45 --> 00:54:50 was to sort by the least significant digit first. 783 00:54:50 --> 00:55:00 784 00:55:00 --> 00:55:03 And you should also do that using a stable sorting 785 00:55:03 --> 00:55:05 algorithm. Now, Hollerith probably didn't 786 00:55:05 --> 00:55:08 call it a stable sorting algorithm at the time, 787 00:55:08 --> 00:55:11 but we will. And this won Hollerith lots of 788 00:55:11 --> 00:55:14 money and good things. 
He founded this tabulating 789 00:55:14 --> 00:55:17 machine company in 1911, and that merged with several 790 00:55:17 --> 00:55:21 other companies to form something you may have heard of 791 00:55:21 --> 00:55:24 called IBM in 1924. That may be the context in 792 00:55:24 --> 00:55:28 which you've heard of Hollerith, or if you've done punch cards 793 00:55:28 --> 00:55:32 before. The whole idea is that we're 794 00:55:32 --> 00:55:37 doing a digit by digit sort. I should have mentioned that at 795 00:55:37 --> 00:55:40 the beginning. And we're going to do it from 796 00:55:40 --> 00:55:43 least significant to most significant. 797 00:55:43 --> 00:55:48 It turns out that works. And to see that let's do an 798 00:55:48 --> 00:55:50 example. I think I'm going to need a 799 00:55:50 --> 00:55:55 whole two boards ideally. First we'll see an example. 800 00:55:55 --> 00:55:59 Then we'll prove the theorem. The proof is actually pretty 801 00:55:59 --> 00:56:03 darn easy. But, nonetheless, 802 00:56:03 --> 00:56:07 it's rather counterintuitive that this works if you haven't seen 803 00:56:07 --> 00:56:10 it before. Certainly, the first time I saw 804 00:56:10 --> 00:56:14 it, it was quite a surprise. The nice thing also about this 805 00:56:14 --> 00:56:19 algorithm is there are no bins. It's all one big bin at all 806 00:56:19 --> 00:56:21 times. Let's take some numbers. 807 00:56:21 --> 00:56:23 329. This is a three digit number. 809 00:56:23 --> 00:56:28 I'm spacing out the digits so we can see them a little bit better. 811 00:56:28 --> 00:56:30 457. 812 00:56:30 --> 00:56:33 657, 839, 436, 720 and 355. 813 00:56:33 --> 00:56:38 I'm assuming here we're using decimal numbers. 814 00:56:38 --> 00:56:43 Why not? Hopefully these are not yet 815 00:56:43 --> 00:56:47 sorted. We'd like to sort them. 816 00:56:47 --> 00:56:54 The first thing we do is take the least significant digit, 817 00:56:54 --> 00:57:00 sort by the least significant digit.
818 00:57:00 --> 00:57:04 And whenever we have equal elements like these two nines, 819 00:57:04 --> 00:57:07 we preserve their relative order. 820 00:57:07 --> 00:57:11 So, 329 is going to remain above 839. 821 00:57:11 --> 00:57:16 It doesn't matter here because we're doing the first sort, 822 00:57:16 --> 00:57:20 but in general we're always using a stable sorting 823 00:57:20 --> 00:57:23 algorithm. When we sort by this column, 824 00:57:23 --> 00:57:27 first we get the zero, so that's 720, 825 00:57:27 --> 00:57:30 then we get 5, 355. 827 00:57:30 --> 00:57:31 Then we get 6, 436. 829 00:57:31 --> 00:57:36 Stop me if I make a mistake. Then we get the 7s, 830 00:57:36 --> 00:57:42 and we preserve the order. Here it happens to be the right 831 00:57:42 --> 00:57:47 order, but it may not be at this point. 832 00:57:47 --> 00:57:51 We haven't even looked at the other digits. 833 00:57:51 --> 00:57:54 Then we get 9s, there are two 9s, 834 00:57:54 --> 00:57:57 329 and 839. All right so far? 835 00:57:57 --> 00:58:03 Good. Now we sort by the middle 836 00:58:03 --> 00:58:07 digit, the next least significant. 837 00:58:07 --> 00:58:12 And we start out with what looks like the 2s. 838 00:58:12 --> 00:58:17 There is a 2 up here and a 2 down here. 839 00:58:17 --> 00:58:23 Of course, we write the first 2 first, 720, then 329. 840 00:58:23 --> 00:58:30 Then we have the 3s, so we have 436 and 839. 841 00:58:30 --> 00:58:33 Then we have a bunch of 5s it looks like. 842 00:58:33 --> 00:58:36 Have I missed anyone so far? No. 843 00:58:36 --> 00:58:38 Good. We have three 5s, 844 00:58:38 --> 00:58:42 355, 457 and 657. I like to check that I haven't 845 00:58:42 --> 00:58:45 lost any elements. We have seven here, 846 00:58:45 --> 00:58:48 seven here and seven elements here. 847 00:58:48 --> 00:58:51 Good. Finally, we sort by the last 848 00:58:51 --> 00:58:53 digit.
One thing to notice, 849 00:58:53 --> 00:59:00 by the way, is before we sorted by the last digit -- 850 00:59:00 --> 00:59:05 Currently these numbers don't resemble sorted order at all. 851 00:59:05 --> 00:59:10 But if you look at everything beyond the digit we haven't yet 852 00:59:10 --> 00:59:15 sorted, so these two digits, that's nice and sorted, 853 00:59:15 --> 00:59:17 20, 29, 36, 39, 55, 57, 57. 854 00:59:17 --> 00:59:20 Pretty cool. Let's finish it off. 855 00:59:20 --> 00:59:23 We stably sort by the first digit. 856 00:59:23 --> 00:59:29 And the smallest number we get is a 3, so we get 329 and then 857 00:59:29 --> 00:59:36 355. Then we get some 4s, 859 00:59:36 --> 00:59:45 436 and 457, then we get a 6, 860 00:59:45 --> 00:59:55 657, then a 7, and then we have an 8. 861 00:59:55 --> 1:00:01.631 And check. I still have seven elements. 862 1:00:01.631 --> 1:00:03.203 Good. I haven't lost anyone. 863 1:00:03.203 --> 1:00:05.533 And, indeed, they're now in sorted order. 864 1:00:05.533 --> 1:00:08.097 And you can start to see why this is working. 865 1:00:08.097 --> 1:00:11.417 When I have equal elements here, I have already sorted the 866 1:00:11.417 --> 1:00:13.398 suffix. Let's write down a proof of 867 1:00:13.398 --> 1:00:15.029 that. What is nice about this 868 1:00:15.029 --> 1:00:17.65 algorithm is we're not partitioning into bins. 869 1:00:17.65 --> 1:00:20.97 We always keep the huge batch of elements in one big pile, 870 1:00:20.97 --> 1:00:23.65 but we're just going through it multiple times. 871 1:00:23.65 --> 1:00:27.087 In general, we sort of need to go through it multiple times. 872 1:00:27.087 --> 1:00:32.006 Hopefully not too many times. But let's first argue 873 1:00:32.006 --> 1:00:36.019 correctness. To analyze the running time is 874 1:00:36.019 --> 1:00:41.751 a little bit tricky here because it depends how you partition 875 1:00:41.751 --> 1:00:44.808 into digits. Correctness is easy.
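The three passes we just ran on the board can be reproduced in a few lines. This sketch substitutes Python's stable `sorted()` for the per-digit sort, which is enough to illustrate the least-significant-digit-first idea; counting sort comes back when we care about the running time.

```python
def radix_sort_decimal(nums, num_digits):
    """LSD radix sort: stably sort by each decimal digit,
    least significant digit first."""
    for t in range(num_digits):
        # sorted() is stable, so ties on this digit keep their order,
        # preserving the work done by the earlier passes.
        nums = sorted(nums, key=lambda x: (x // 10 ** t) % 10)
    return nums

A = [329, 457, 657, 839, 436, 720, 355]
print(radix_sort_decimal(A, 3))
# [329, 355, 436, 457, 657, 720, 839]
```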
876 1:00:44.808 --> 1:00:50.159 We just induct on the digit position that we're currently 877 1:00:50.159 --> 1:00:55.891 sorting, so let's call that t. And we can assume by induction 878 1:00:55.891 --> 1:01:02.656 that it's sorted beyond digit t. This is our induction 879 1:01:02.656 --> 1:01:07.841 hypothesis. We assume that we're sorted on 880 1:01:07.841 --> 1:01:14.924 the low-order t - 1 digits. And then the next thing we do 881 1:01:14.924 --> 1:01:21.501 is sort on the t-th digit. We just need to check that 882 1:01:21.501 --> 1:01:26.561 things work. And we restore the induction 883 1:01:26.561 --> 1:01:32 hypothesis for t instead of t - 1. 885 1:01:32 --> 1:01:36.009 When we sort on the t-th digit there are two cases. 886 1:01:36.009 --> 1:01:40.981 If we look at any two elements, we want to know whether they're 887 1:01:40.981 --> 1:01:45.15 put in the right order. If two elements are the same, 888 1:01:45.15 --> 1:01:49 let's say they have the same t-th digit -- 889 1:01:49 --> 1:01:58 890 1:01:58 --> 1:02:02 This is the tricky case. If they have the same t-th 891 1:02:02 --> 1:02:05.519 digit then their order should not be changed. 892 1:02:05.519 --> 1:02:09.36 So, by stability, we know that they remain in the 893 1:02:09.36 --> 1:02:14.4 same order because stability is supposed to preserve things that 894 1:02:14.4 --> 1:02:17.519 have the same key that we're sorting on. 895 1:02:17.519 --> 1:02:21.92 And then, by the induction hypothesis, we know that that 896 1:02:21.92 --> 1:02:26.239 keeps them in sorted order because induction hypothesis 897 1:02:26.239 --> 1:02:30 says that they used to be sorted. 898 1:02:30 --> 1:02:35.369 Adding on this value in the front that's the same in both 899 1:02:35.369 --> 1:02:39.684 doesn't change anything so they remain sorted.
900 1:02:39.684 --> 1:02:44 And if they have differing t-th digits -- 901 1:02:44 --> 1:02:54 902 1:02:54 --> 1:03:00 -- then this sorting step will put them in the right order. 903 1:03:00 --> 1:03:03.189 Because that's what sorting does. 904 1:03:03.189 --> 1:03:08.87 This is the most significant digit, so you've got to order 905 1:03:08.87 --> 1:03:12.558 them by the t-th digit if they differ. 906 1:03:12.558 --> 1:03:17.84 The rest are irrelevant. So, proof here of correctness 907 1:03:17.84 --> 1:03:22.026 is very simple once you know the algorithm. 908 1:03:22.026 --> 1:03:25.514 Any questions before we go on? Good. 909 1:03:25.514 --> 1:03:30 We're going to use counting sort. 910 1:03:30 --> 1:03:30.344 We could use any sorting algorithm we want for individual 911 1:03:30.344 --> 1:03:30.713 digits, but the only algorithm that we know that runs in less 912 1:03:30.713 --> 1:03:30.916 than n lg n time is counting sort. 913 1:03:30.916 --> 1:03:31.267 So, we better use that one to sort of bootstrap and get an 914 1:03:31.267 --> 1:03:31.501 even faster and more general algorithm. 915 1:03:31.501 --> 1:03:31.883 I just erased the running time. Counting sort runs in order k + 916 1:03:31.883 --> 1:03:36.003 n time. We need to remember that. 917 1:03:36.003 --> 1:03:44.329 And the range of the numbers is 1 to k or 0 to k - 1. 918 1:03:44.329 --> 1:03:53.616 When we sort by a particular digit, we shouldn't use n lg n 919 1:03:53.616 --> 1:04:02.743 algorithm because then this thing will take n lg n for one 920 1:04:02.743 --> 1:04:09.788 round and it's going to have multiple rounds. 921 1:04:09.788 --> 1:04:15.552 That's going to be worse than n lg n. 922 1:04:15.552 --> 1:04:25 We're going to use counting sort for each round. 923 1:04:25 --> 1:04:32 924 1:04:32 --> 1:04:34.931 We use counting sort for each digit. 925 1:04:34.931 --> 1:04:40.125 And we know the running time of counting sort here is order k + 926 1:04:40.125 --> 1:04:42.973 n . 
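The induction hypothesis in the correctness proof can also be checked exhaustively on small inputs. This sketch verifies, for every triple of two-digit decimal numbers up to 49, that after stably sorting on digits 1 through t the array is sorted on its low-order t digits, which is exactly the invariant used in the proof above.

```python
from itertools import product

def low(x, t):
    """Low-order t decimal digits of x."""
    return x % 10 ** t

# Check every triple of numbers 0..49 (small enough to be exhaustive).
for nums in product(range(50), repeat=3):
    arr = list(nums)
    for t in (1, 2):
        # Stable sort on digit t (counted from the least significant).
        arr = sorted(arr, key=lambda x: (x // 10 ** (t - 1)) % 10)
        lows = [low(x, t) for x in arr]
        assert lows == sorted(lows)   # induction hypothesis holds
print("invariant verified")
```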
But I don't want to assume that 927 1:04:42.973 --> 1:04:46.324 my integers are split into digits for me. 928 1:04:46.324 --> 1:04:50.261 That's sort of giving away too much flexibility. 929 1:04:50.261 --> 1:04:55.287 Because if I have some number written in whatever form it is, 930 1:04:55.287 --> 1:05:00.062 probably written in binary, I can cluster together some of 931 1:05:00.062 --> 1:05:04 those bits and call that a digit. 932 1:05:04 --> 1:05:07.415 Let's think of our numbers as binary. 933 1:05:07.415 --> 1:05:12.442 Suppose we have n integers. And they're in some range. 934 1:05:12.442 --> 1:05:16.901 And we want to know how big a range they can be. 935 1:05:16.901 --> 1:05:21.264 Let's say, a sort of practical way of thinking, 936 1:05:21.264 --> 1:05:26.577 you know, we're in a binary world, each integer is b bits 937 1:05:26.577 --> 1:05:29.774 long. So, in other words, 938 1:05:29.774 --> 1:05:35.283 the range is from 0 to 2^b - 1. I will assume that my numbers 939 1:05:35.283 --> 1:05:39.765 are non-negative. It doesn't make much difference 940 1:05:39.765 --> 1:05:42.006 if they're negative, too. 941 1:05:42.006 --> 1:05:47.515 I want to know how big a b I can handle, but I don't want to 942 1:05:47.515 --> 1:05:52.65 split into bits as my digits because then I would have b 943 1:05:52.65 --> 1:05:59 digits and I would have to do b rounds of this algorithm. 944 1:05:59 --> 1:06:02.839 The number of rounds of this algorithm is the number of 945 1:06:02.839 --> 1:06:05.754 digits that I have. And each one costs me, 946 1:06:05.754 --> 1:06:08.598 let's hope, for linear time. And, indeed, 947 1:06:08.598 --> 1:06:10.589 if I use a single bit, k = 2. 948 1:06:10.589 --> 1:06:14.428 And so this is order n. But then the running time would 949 1:06:14.428 --> 1:06:17.557 be order n per round. And there are b digits, 950 1:06:17.557 --> 1:06:21.183 if I consider them to be bits, order n times b time.
951 1:06:21.183 --> 1:06:24.24 And even if b is something small like log n, 952 1:06:24.24 --> 1:06:27.866 if I have log n bits, then these are numbers between 953 1:06:27.866 --> 1:06:32.549 0 and n - 1. I already know how to sort 954 1:06:32.549 --> 1:06:36.666 numbers between 0 and n - 1 in linear time. 955 1:06:36.666 --> 1:06:41.372 Here I'm spending n lg n time, so that's no good. 956 1:06:41.372 --> 1:06:47.549 Instead, what we're going to do is take a bunch of bits and call 957 1:06:47.549 --> 1:06:51.47 that a digit, the most bits we can handle 958 1:06:51.47 --> 1:06:56.078 with counting sort. The notation will be I split 959 1:06:56.078 --> 1:07:01.847 each integer into b/r digits. Each r bits long. 960 1:07:01.847 --> 1:07:06.63 In other words, I think of my number as being 961 1:07:06.63 --> 1:07:11.086 in base 2^r. And I happen to be writing it 962 1:07:11.086 --> 1:07:15.869 down in binary, but I cluster together r bits 963 1:07:15.869 --> 1:07:20.108 and I get a bunch of digits in base 2^r. 964 1:07:20.108 --> 1:07:26.195 And then there are b/r digits. This b/r is the number of 965 1:07:26.195 --> 1:07:30 rounds. And this base -- 966 1:07:30 --> 1:07:34.104 This is the maximum value I have in one of these digits. 967 1:07:34.104 --> 1:07:37.537 It's between 0 and 2^r - 1. This is, in some sense, 968 1:07:37.537 --> 1:07:40 k for a run of counting sort. 969 1:07:40 --> 1:07:49 970 1:07:49 --> 1:07:54.673 What is the running time? Well, I have b/r rounds. 971 1:07:54.673 --> 1:08:00 It's b/r times the running time for a round. 972 1:08:00 --> 1:08:05.83 Which I have n numbers and my value of k is 2^r. 973 1:08:05.83 --> 1:08:10.917 This is the running time of counting sort, 974 1:08:10.917 --> 1:08:18.236 n + k, this is the number of rounds, so this is (b/r)(n + 2^r). 975 1:08:18.236 --> 1:08:23.199 And I am free to choose r however I want.
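The whole scheme can be sketched in Python (my own sketch; the function name and parameters are illustrative). Each r-bit digit is extracted by shifting and masking, and each round is a stable counting sort, low-order digit first:

```python
def radix_sort(a, b, r):
    """Sort non-negative b-bit integers using ceil(b/r) rounds of a
    stable counting sort on r-bit digits (i.e., base 2^r)."""
    mask = (1 << r) - 1                  # r low-order ones
    for shift in range(0, b, r):         # one round per digit, low to high
        # Stable counting sort on digit d(x) = (x >> shift) & mask.
        count = [0] * (1 << r)
        for x in a:
            count[(x >> shift) & mask] += 1
        pos, total = [0] * (1 << r), 0
        for v in range(1 << r):          # prefix sums -> first slot per value
            pos[v], total = total, total + count[v]
        out = [0] * len(a)
        for x in a:                      # left-to-right pass keeps stability
            d = (x >> shift) & mask
            out[pos[d]] = x
            pos[d] += 1
        a = out
    return a
```

Each round costs order n + 2^r, and there are b/r rounds, matching the (b/r)(n + 2^r) bound.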
976 1:08:23.199 --> 1:08:30.146 What I would like to do is minimize this run time over my 977 1:08:30.146 --> 1:08:35.704 choices of r. Any suggestions on how I might 978 1:08:35.704 --> 1:08:40.303 find the minimum running time over all choices of r? 979 1:08:40.303 --> 1:08:44 Techniques, not necessarily solutions. 980 1:08:44 --> 1:08:53 981 1:08:53 --> 1:08:55.488 We're not used to this because it's asymptotic, 982 1:08:55.488 --> 1:08:58.288 but forget the big O here. How do I minimize a function 983 1:08:58.288 --> 1:09:01.336 with respect to one variable? Take the derivative, 984 1:09:01.336 --> 1:09:03.541 yeah. I can take the derivative of 985 1:09:03.541 --> 1:09:06.08 this function by r, differentiate by r, 986 1:09:06.08 --> 1:09:10.022 set the derivative equal to 0, and that should be a critical 987 1:09:10.022 --> 1:09:13.496 point in this function. It turns out this function is 988 1:09:13.496 --> 1:09:16.369 unimodal in r and you will find the minimum. 989 1:09:16.369 --> 1:09:19.51 We could do that. I'm not going to do it because 990 1:09:19.51 --> 1:09:23.385 it takes a little bit more work. You should try it at home. 991 1:09:23.385 --> 1:09:27.06 It will give you the exact minimum, which is good if you 992 1:09:27.06 --> 1:09:32.284 know what this constant is. Differentiate with respect to r 993 1:09:32.284 --> 1:09:35.305 and set to 0. I am going to do it a little 994 1:09:35.305 --> 1:09:39.063 bit more intuitively, in other words less precisely, 995 1:09:39.063 --> 1:09:41.789 but I will still get the right answer. 996 1:09:41.789 --> 1:09:46.21 And definitely I will get an upper bound because I can choose 997 1:09:46.21 --> 1:09:50.115 r to be whatever I want. It turns out this will be the 998 1:09:50.115 --> 1:09:53.21 right answer. Let's just think about growth 999 1:09:53.21 --> 1:09:56.526 in terms of r. There are essentially two terms 1000 1:09:56.526 --> 1:10:00.024 here.
I have (b/r)n and I have 1001 1:10:00.024 --> 1:10:03.315 (b/r)2^r. Now, (b/r)n would like r to be 1002 1:10:03.315 --> 1:10:07.364 as big as possible. The bigger r is the number of 1003 1:10:07.364 --> 1:10:10.992 rounds goes down. This number in front of n, 1004 1:10:10.992 --> 1:10:16.138 this coefficient in front of n goes down, so I would like r to 1005 1:10:16.138 --> 1:10:18.669 be big. So, (b/r)n wants r big. 1006 1:10:18.669 --> 1:10:23.478 However, r cannot be too big. This is saying I want digits 1007 1:10:23.478 --> 1:10:28.54 that have a lot of bits in them. It cannot be too big because 1008 1:10:28.54 --> 1:10:34.465 there's a 2^r term out here. If this happens to be bigger 1009 1:10:34.465 --> 1:10:39.22 than n then this will dominate in terms of growth of r. 1010 1:10:39.22 --> 1:10:43.182 This is going to be b times 2^r over r. 1011 1:10:43.182 --> 1:10:46.264 2^r is much, much bigger than r, 1012 1:10:46.264 --> 1:10:50.49 so it's going to grow much faster is what I mean. 1013 1:10:50.49 --> 1:10:55.949 And so I really don't want r to be too big for this other term. 1014 1:10:55.949 --> 1:11:00 So, that is (b/r)2^r wants r small. 1015 1:11:00 --> 1:11:06.684 Provided that this term is bigger or equal to this term 1016 1:11:06.684 --> 1:11:11.758 then I can set r pretty big for that term. 1017 1:11:11.758 --> 1:11:16.71 What I want is the n to dominate the 2^r. 1018 1:11:16.71 --> 1:11:23.641 Provided I have that then I can set r as large as I want. 1019 1:11:23.641 --> 1:11:30.697 Let's say I want to choose r to be maximum subject to this 1020 1:11:30.697 --> 1:11:38 condition that n is greater than or equal to 2^r. 1021 1:11:38 --> 1:11:42.291 This is an upper bound on 2^r, and an upper bound on r. 1022 1:11:42.291 --> 1:11:44.899 In other words, I want r = lg n. 1023 1:11:44.899 --> 1:11:49.948 This turns out to be the right answer up to constant factors. 1024 1:11:49.948 --> 1:11:53.566 There we go.
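You can also check this choice numerically instead of differentiating (a quick sketch of my own; the values of b and n here are made up for illustration):

```python
def cost(b, n, r):
    # rounds times per-round counting-sort work: (b/r)(n + 2^r)
    return (b / r) * (n + 2 ** r)

b, n = 64, 1 << 16   # hypothetical: 64-bit keys, n = 65536, so lg n = 16
best_r = min(range(1, b + 1), key=lambda r: cost(b, n, r))
# The minimizer sits just below lg n = 16 (the additive n term pulls it
# down slightly), which matches r = lg n up to constant factors.
```

Sweeping r from 1 to b and taking the argmin is exactly the "unimodal in r" picture: the (b/r)n term falls as r grows while the (b/r)2^r term eventually explodes.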
And definitely choosing r to be 1025 1:11:53.566 --> 1:11:58.951 lg n will give me an upper bound on the best running time I could 1026 1:11:58.951 --> 1:12:04 get because I can choose it to be whatever I want. 1027 1:12:04 --> 1:12:10.564 If you differentiate you will indeed get the same answer. 1028 1:12:10.564 --> 1:12:15.956 This was not quite a formal argument but close, 1029 1:12:15.956 --> 1:12:21.699 because the big O is all about what grows fastest. 1030 1:12:21.699 --> 1:12:26.036 If we plug in r = lg n we get bn/lg n. 1031 1:12:26.036 --> 1:12:31.78 The n and the 2^r are equal, that's a factor of 2, 1032 1:12:31.78 --> 1:12:38.704 2 times n, not a big deal. It comes out into the O. 1033 1:12:38.704 --> 1:12:44.788 We have bn/r, which is bn/lg n. We have to think about what 1034 1:12:44.788 --> 1:12:49.859 this means and translate it in terms of range. 1035 1:12:49.859 --> 1:12:56.957 b was the number of bits in our number, which corresponds to the 1036 1:12:56.957 --> 1:13:03.417 range of the number. I've got 20 minutes under so 1037 1:13:03.417 --> 1:13:08.543 far in lecture so I can go 20 minutes over, 1038 1:13:08.543 --> 1:13:11.228 right? No, I'm kidding. 1039 1:13:11.228 --> 1:13:15.988 Almost done. Let's say that our numbers, 1040 1:13:15.988 --> 1:13:21.724 our integers, are in the range 0 to 2^b, 1041 1:13:21.724 --> 1:13:26.606 and I'm going to say that the range is 0 to n^d. 1042 1:13:26.606 --> 1:13:33.449 This should be a -1 here. If I have numbers that are 1043 1:13:33.449 --> 1:13:38.632 between 0 and n^d - 1 where d is a constant or d is some 1044 1:13:38.632 --> 1:13:42.306 parameter, so this is a polynomial in n, 1045 1:13:42.306 --> 1:13:45.604 then you work out this running time. 1046 1:13:45.604 --> 1:13:49.844 It is order dn. This is the way to think about 1047 1:13:49.844 --> 1:13:54.179 it because now we can compare to counting sort.
1048 1:13:54.179 --> 1:13:59.644 Counting sort could handle 0 up to some constant times n in 1049 1:13:59.644 --> 1:14:04.501 linear time. Now I can handle 0 up to n to 1050 1:14:04.501 --> 1:14:07.434 some constant power in linear time. 1051 1:14:07.434 --> 1:14:12.178 So if d = order 1 then we get a linear time sorting 1052 1:14:12.178 --> 1:14:15.543 algorithm. And that is cool as long as d 1053 1:14:15.543 --> 1:14:19.511 is at most lg n. As long as your numbers are at 1054 1:14:19.511 --> 1:14:24.255 most n^(lg n) then we have something that beats our n lg n 1055 1:14:24.255 --> 1:14:29 sorting algorithms. And this is pretty nice. 1056 1:14:29 --> 1:14:33.099 Whenever you know that your numbers are order lg n bits 1057 1:14:33.099 --> 1:14:36.048 long we are happy, and you get some smooth 1058 1:14:36.048 --> 1:14:37.99 tradeoff there. For example, 1059 1:14:37.99 --> 1:14:42.018 if we have our 32 bit numbers and we split into let's say 1060 1:14:42.018 --> 1:14:46.262 eight bit chunks then we'll only have to do four rounds each 1061 1:14:46.262 --> 1:14:49.57 linear time and we have just 256 counters of working space. 1062 1:14:49.57 --> 1:14:52.735 We were doing four rounds for 32 bit numbers. 1063 1:14:52.735 --> 1:14:56.835 If you use an n lg n algorithm, you're going to be doing lg n 1064 1:14:56.835 --> 1:15:00.941 rounds through your numbers. n is like 2000, 1065 1:15:00.941 --> 1:15:03.515 and that's at least 11 rounds for example. 1066 1:15:03.515 --> 1:15:07.281 You would think this algorithm is going to be much faster for 1067 1:15:07.281 --> 1:15:09.038 small numbers. Unfortunately, 1068 1:15:09.038 --> 1:15:11.612 counting sort is not very good on a cache. 1069 1:15:11.612 --> 1:15:14.311 In practice, radix sort is not that fast an 1070 1:15:14.311 --> 1:15:17.199 algorithm unless your numbers are really small. 1071 1:15:17.199 --> 1:15:19.584 Something like quicksort can do better.
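The arithmetic behind that comparison can be written out in a few lines (a sketch; n = 2000 is the example figure from the lecture):

```python
import math

b, r = 32, 8                     # 32-bit keys split into 8-bit digits
radix_rounds = b // r            # 4 linear-time passes over the data
table_size = 2 ** r              # 256 counters of working space per pass

n = 2000
merge_rounds = math.ceil(math.log2(n))  # roughly 11 levels for an n lg n sort
```

So radix sort makes 4 passes where a comparison sort makes about 11, which is why you would expect it to win on small keys were it not for its poor cache behavior.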
It's sort of a shame, but theoretically this is very 1073 1:15:22.66 --> 1:15:25.045 beautiful. And there are contexts where 1074 1:15:25.045 --> 1:15:29 this is really the right way to sort things. 1075 1:15:29 --> 1:15:34.352 I will mention finally what happens if you have arbitrary integers that 1076 1:15:34.352 --> 1:15:39.1 are one word long. Here we're assuming that there 1077 1:15:39.1 --> 1:15:44.28 are b bits in a word and we have some dependence indirectly on b 1078 1:15:44.28 --> 1:15:46.093 here. But, in general, 1079 1:15:46.093 --> 1:15:51.1 if you have a bunch of integers and they're one word 1080 1:15:51.1 --> 1:15:55.589 long, and you can manipulate a word in constant time, 1081 1:15:55.589 --> 1:16:00.597 then the best algorithm we know for sorting runs in n times 1082 1:16:00.597 --> 1:16:05 square root of lg lg n time expected. 1083 1:16:05 --> 1:16:08.719 It is a randomized algorithm. We're not going to cover that 1084 1:16:08.719 --> 1:16:11.798 algorithm in this class. It's rather complicated. 1085 1:16:11.798 --> 1:16:15.068 I didn't even cover it in Advanced Algorithms when I 1086 1:16:15.068 --> 1:16:17.57 taught it. If you want something easier, 1087 1:16:17.57 --> 1:16:21.289 you can get n times square root of lg lg n time worst-case. 1088 1:16:21.289 --> 1:16:23.406 And that paper is almost readable. 1089 1:16:23.406 --> 1:16:26.035 I have taught that in Advanced Algorithms. 1090 1:16:26.035 --> 1:16:28.729 If you're interested in this kind of stuff, 1091 1:16:28.729 --> 1:16:32 take Advanced Algorithms next fall. 1092 1:16:32 --> 1:16:34.552 It's one of the follow-ons to this class. 1093 1:16:34.552 --> 1:16:38.317 These are much more complicated algorithms, but it gives you 1094 1:16:38.317 --> 1:16:40.87 some sense. You can even break out of the 1095 1:16:40.87 --> 1:16:43.742 dependence on b, as long as you know that b is 1096 1:16:43.742 --> 1:16:46.486 at most a word.
And I will stop there unless 1097 1:16:46.486 --> 1:16:49 there are any questions. Then see you Wednesday.