1 00:00:01,550 --> 00:00:03,920 The following content is provided under a Creative 2 00:00:03,920 --> 00:00:05,310 Commons license. 3 00:00:05,310 --> 00:00:07,520 Your support will help MIT OpenCourseWare 4 00:00:07,520 --> 00:00:11,610 continue to offer high-quality educational resources for free. 5 00:00:11,610 --> 00:00:14,180 To make a donation or to view additional materials 6 00:00:14,180 --> 00:00:18,140 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,140 --> 00:00:19,026 at ocw.mit.edu. 8 00:00:22,250 --> 00:00:24,020 JULIAN SHUN: Good afternoon, everybody. 9 00:00:24,020 --> 00:00:29,590 So welcome to the third lecture of 6.172. 10 00:00:29,590 --> 00:00:31,340 Today we're going to talk about bit hacks, 11 00:00:31,340 --> 00:00:33,380 and today's going to be a really fun lecture. 12 00:00:36,510 --> 00:00:39,860 So, first of all, let's recall the binary representation 13 00:00:39,860 --> 00:00:41,250 of a word. 14 00:00:41,250 --> 00:00:45,390 So a w-bit word is represented as follows. 15 00:00:45,390 --> 00:00:49,940 So we're going to number the bits from x0 to xw minus 1 16 00:00:49,940 --> 00:00:53,370 starting from the rightmost side. 17 00:00:53,370 --> 00:00:55,850 And the unsigned integer value stored 18 00:00:55,850 --> 00:00:57,650 in x with this binary representation 19 00:00:57,650 --> 00:00:59,820 can be computed as follows. 20 00:00:59,820 --> 00:01:03,200 So it's essentially the sum of a whole bunch of powers of 2. 21 00:01:03,200 --> 00:01:08,320 And you sum the product of the bit with the appropriate power 22 00:01:08,320 --> 00:01:08,820 of 2. 23 00:01:08,820 --> 00:01:10,610 So if the bit is 1 in position k, 24 00:01:10,610 --> 00:01:12,950 then you multiply by 2 to the k. 25 00:01:12,950 --> 00:01:15,350 And if it's 0, then you just add 0. 26 00:01:15,350 --> 00:01:19,760 So, for example, let's say we have this 8-bit word here. 
27 00:01:19,760 --> 00:01:23,840 And if we apply this equation, we get-- 28 00:01:23,840 --> 00:01:28,620 first we get 2 because there is a 1 bit in the first position. 29 00:01:28,620 --> 00:01:32,090 So we multiply 1 by 2 to the 1, which is 2. 30 00:01:32,090 --> 00:01:34,730 Then in the second position, we also have a 1. 31 00:01:34,730 --> 00:01:39,330 So we multiply 1 by 2 to the 2, which is 4. 32 00:01:39,330 --> 00:01:41,630 And then we have 16 and 128. 33 00:01:41,630 --> 00:01:44,840 So we just sum up all of these powers of 2 34 00:01:44,840 --> 00:01:49,490 and that gives us the unsigned integer value. 35 00:01:49,490 --> 00:01:54,480 And that 0b prefix here represents a binary constant. 36 00:01:54,480 --> 00:01:56,690 So that means we're going to interpret this constant 37 00:01:56,690 --> 00:01:58,280 as a binary value. 38 00:02:01,580 --> 00:02:03,490 There are also signed integers. 39 00:02:03,490 --> 00:02:05,800 So you can also represent negative numbers, which 40 00:02:05,800 --> 00:02:07,780 is useful, and this is called the two's 41 00:02:07,780 --> 00:02:09,370 complement representation. 42 00:02:09,370 --> 00:02:11,590 And here's the formula for computing 43 00:02:11,590 --> 00:02:14,290 the signed value of a two's complement word. 44 00:02:14,290 --> 00:02:19,030 So for bit 0 all the way up to bit w minus 2, 45 00:02:19,030 --> 00:02:21,460 you do the same thing as above. 46 00:02:21,460 --> 00:02:26,530 But for the leftmost bit or bit w minus 1, 47 00:02:26,530 --> 00:02:31,984 you subtract that bit multiplied by 2 to the w minus 1. 48 00:02:31,984 --> 00:02:38,680 So for this example here, we saw 2 plus 4 plus 16. 49 00:02:38,680 --> 00:02:40,210 That's the same as above. 50 00:02:40,210 --> 00:02:43,000 But for the leftmost bit, since we have a 1 here, 51 00:02:43,000 --> 00:02:47,170 we're going to subtract 2 to the 7, which is 128. 
52 00:02:47,170 --> 00:02:49,870 And this gives us the signed value 53 00:02:49,870 --> 00:02:54,100 for the integer, which is negative 106. 54 00:02:54,100 --> 00:02:55,520 Does that make sense? 55 00:02:55,520 --> 00:02:57,978 Any questions about this representation? 56 00:03:02,140 --> 00:03:05,410 So the leftmost bit is known as a sign bit 57 00:03:05,410 --> 00:03:09,730 because it tells you whether you need to subtract 58 00:03:09,730 --> 00:03:12,298 by this negative value or not. 59 00:03:12,298 --> 00:03:14,590 So if it's 0, then you don't have to subtract anything. 60 00:03:14,590 --> 00:03:20,500 If it's 1, then you subtract by a large integer value. 61 00:03:20,500 --> 00:03:25,180 So in two's complement, the all 0's word is just 0. 62 00:03:25,180 --> 00:03:28,150 So you just apply the above formula and everything is 0. 63 00:03:28,150 --> 00:03:31,530 So you just get 0. 64 00:03:31,530 --> 00:03:33,360 What's the value of the all 1's word? 65 00:03:36,760 --> 00:03:37,260 Yes. 66 00:03:37,260 --> 00:03:38,580 AUDIENCE: 1. 67 00:03:38,580 --> 00:03:40,950 JULIAN SHUN: Negative 1, right? 68 00:03:40,950 --> 00:03:45,060 So the reason why it's negative 1, so you can just 69 00:03:45,060 --> 00:03:47,190 use the formula. 70 00:03:47,190 --> 00:03:50,340 And we're going to sum up a bunch of powers of 2. 71 00:03:50,340 --> 00:03:53,550 All of the x sub k's are going to be 1. 72 00:03:53,550 --> 00:03:58,170 So we're summing up 2 to the k from k equals 0 to w minus 2, 73 00:03:58,170 --> 00:04:01,830 and that's a geometric series which sums to 2 to the w 74 00:04:01,830 --> 00:04:03,700 minus 1 minus 1. 75 00:04:03,700 --> 00:04:05,690 And then for the sign bit, we're going 76 00:04:05,690 --> 00:04:08,160 to subtract 2 to the w minus 1. 77 00:04:08,160 --> 00:04:10,800 So now the 2 to the w minus 1's cancel out 78 00:04:10,800 --> 00:04:13,840 and we're just left with negative 1. 
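The two value formulas just described can be sketched in C for an 8-bit word. This is only an illustration: the function names are made up here, and the test word 0x96 (binary 10010110) is reconstructed from the spoken sum 2 + 4 + 16 + 128 rather than read off the slide.

```c
#include <stdint.h>

/* Unsigned value of an 8-bit word: sum of x_k * 2^k for k = 0..7.
   (Hypothetical helper name, not from the lecture.) */
unsigned unsigned_value8(uint8_t x) {
  unsigned sum = 0;
  for (int k = 0; k < 8; k++) {
    unsigned bit = (x >> k) & 1u;  /* x_k, either 0 or 1 */
    sum += bit << k;               /* x_k * 2^k */
  }
  return sum;
}

/* Two's complement value: the same sum over bits 0..6,
   minus the sign bit x_7 multiplied by 2^7. */
int signed_value8(uint8_t x) {
  int sum = 0;
  for (int k = 0; k < 7; k++) {
    sum += (int)((x >> k) & 1u) << k;
  }
  sum -= (int)((x >> 7) & 1u) << 7;  /* subtract x_7 * 128 */
  return sum;
}
```

Under these assumptions, unsigned_value8(0x96) gives 150 and signed_value8(0x96) gives negative 106, matching the numbers in the lecture, and signed_value8(0xFF) gives negative 1 for the all 1's word.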
79 00:04:13,840 --> 00:04:17,160 So this is an important property to know about two's 80 00:04:17,160 --> 00:04:19,370 complement representation. 81 00:04:19,370 --> 00:04:24,342 The all 1's word is just negative 1. 82 00:04:24,342 --> 00:04:28,800 And this leads to an important identity which says that x plus 83 00:04:28,800 --> 00:04:31,990 the one's complement of x-- the one's complement is just all 84 00:04:31,990 --> 00:04:33,900 the bits of x flipped-- 85 00:04:33,900 --> 00:04:36,255 is equal to negative 1. 86 00:04:36,255 --> 00:04:40,860 This is because if you add x with all of its bits flipped, 87 00:04:40,860 --> 00:04:43,260 then you're just going to end up with the all 1's word. 88 00:04:43,260 --> 00:04:45,330 And we saw on the previous slide that that's 89 00:04:45,330 --> 00:04:46,980 equal to negative 1. 90 00:04:46,980 --> 00:04:50,190 And from this identity, we have that negative x 91 00:04:50,190 --> 00:04:52,990 is equal to the one's complement of x plus 1. 92 00:04:52,990 --> 00:04:54,930 So this relates the two's complement 93 00:04:54,930 --> 00:04:58,360 to the one's complement representation. 94 00:04:58,360 --> 00:05:00,640 Let's look at an example. 95 00:05:00,640 --> 00:05:02,580 So let's look at-- 96 00:05:02,580 --> 00:05:05,790 let's say x is equal to this constant here. 97 00:05:05,790 --> 00:05:09,240 The one's complement of x or tilde of x 98 00:05:09,240 --> 00:05:12,390 is just all of the bits of x flipped. 99 00:05:12,390 --> 00:05:16,320 And then to get negative x, we add 1 100 00:05:16,320 --> 00:05:17,940 to the one's complement of x. 101 00:05:17,940 --> 00:05:20,520 And the effect of adding 1 here is we're 102 00:05:20,520 --> 00:05:25,290 going to take the rightmost 0 bit in the one's complement, 103 00:05:25,290 --> 00:05:26,770 flip that to a 1. 104 00:05:26,770 --> 00:05:29,100 And then for all of the bits to the right of that, 105 00:05:29,100 --> 00:05:30,750 we flip them to 0's. 
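As a small check on the identity, here is a sketch in C. The function names are illustrative, and it uses unsigned 32-bit words (an assumption made here) so that wraparound is well defined; the all 1's word then shows up as 0xFFFFFFFF, the unsigned counterpart of negative 1.

```c
#include <stdint.h>

/* x plus its one's complement: every bit position holds exactly one 1,
   so the result is the all 1's word regardless of x. */
uint32_t x_plus_complement(uint32_t x) {
  return x + ~x;
}

/* Negation via the identity -x == ~x + 1 (arithmetic modulo 2^32). */
uint32_t negate(uint32_t x) {
  return ~x + 1u;
}
```

For example, negate(5) should produce the same bit pattern as casting negative 5 to uint32_t, and x_plus_complement should return 0xFFFFFFFF for any input.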
106 00:05:33,910 --> 00:05:37,330 So another way to see this is you 107 00:05:37,330 --> 00:05:41,110 look at the representation of x and you flip all of the bits 108 00:05:41,110 --> 00:05:45,185 up to the rightmost 1 but not including that rightmost 1 bit, 109 00:05:45,185 --> 00:05:46,810 and then you just copy everything over. 110 00:05:49,830 --> 00:05:51,290 So any questions about this? 111 00:05:54,490 --> 00:05:54,990 OK. 112 00:05:57,760 --> 00:06:03,370 So this is a table showing the relationship between hex 113 00:06:03,370 --> 00:06:04,570 and binary representation. 114 00:06:04,570 --> 00:06:07,690 So hex representation is base 16. 115 00:06:07,690 --> 00:06:11,740 And the reason why we use hex is because sometimes we 116 00:06:11,740 --> 00:06:15,310 have these big binary constants and we don't want to write-- 117 00:06:15,310 --> 00:06:18,100 have to type all of these symbols into our code. 118 00:06:18,100 --> 00:06:20,800 And hex gives us a more compact format 119 00:06:20,800 --> 00:06:23,350 to write these constants. 120 00:06:23,350 --> 00:06:26,170 And this table, you can basically just 121 00:06:26,170 --> 00:06:29,680 look up, for each possible hex value, what 122 00:06:29,680 --> 00:06:31,360 its binary representation is. 123 00:06:31,360 --> 00:06:36,190 And for the values from 0 to 9, we're 124 00:06:36,190 --> 00:06:39,340 just going to use the same as decimal representation for hex. 125 00:06:39,340 --> 00:06:41,040 And then for values 10 to 15, we're 126 00:06:41,040 --> 00:06:47,460 going to use the characters from A to F. 127 00:06:47,460 --> 00:06:52,020 To translate from hex to binary, you just take each hex digit, 128 00:06:52,020 --> 00:06:55,620 look it up in this table, write out the binary equivalent, 129 00:06:55,620 --> 00:06:58,000 and then you concatenate together 130 00:06:58,000 --> 00:06:59,850 all of the binary values you've got. 
131 00:06:59,850 --> 00:07:03,960 So in this example I have this hex constant 132 00:07:03,960 --> 00:07:07,380 which says DEC1DE2C0DE4F00D. 133 00:07:07,380 --> 00:07:11,970 So now I just look up each of these hex values in this table. 134 00:07:11,970 --> 00:07:18,940 So D is 1101, E is 1110, C is 1100, and so on. 135 00:07:18,940 --> 00:07:22,260 And I just concatenate all of these values together 136 00:07:22,260 --> 00:07:26,192 and that gives me my binary representation. 137 00:07:26,192 --> 00:07:27,900 And you can also go the other way around, 138 00:07:27,900 --> 00:07:30,720 converting binary to hex. 139 00:07:30,720 --> 00:07:33,120 And you do the same thing, just look it up in this table. 140 00:07:36,020 --> 00:07:39,890 And the prefix 0x here designates a hex constant, 141 00:07:39,890 --> 00:07:43,730 just like 0b designates a binary constant. 142 00:07:43,730 --> 00:07:45,867 So if you're using these constants in your code 143 00:07:45,867 --> 00:07:47,450 and you're writing it in hex, then you 144 00:07:47,450 --> 00:07:49,370 should use the 0x prefix. 145 00:07:55,400 --> 00:07:58,640 So C has a bunch of bitwise operators. 146 00:07:58,640 --> 00:08:00,950 And here's a table describing what 147 00:08:00,950 --> 00:08:02,300 these bitwise operators do. 148 00:08:02,300 --> 00:08:05,510 So the ampersand is just bitwise AND. 149 00:08:05,510 --> 00:08:09,590 The vertical bar is bitwise OR. 150 00:08:09,590 --> 00:08:13,850 This caret sign is the XOR or exclusive OR. 151 00:08:13,850 --> 00:08:17,660 And XOR just says that if exactly one of the two bits is 1, 152 00:08:17,660 --> 00:08:18,820 then we return 1. 153 00:08:18,820 --> 00:08:22,550 And if both of the bits are 0 or both of them are 1, 154 00:08:22,550 --> 00:08:24,760 then we return 0. 155 00:08:24,760 --> 00:08:30,800 The tilde sign is the one's complement or the not. 156 00:08:30,800 --> 00:08:35,659 And then we have left shift and right shift operators. 
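Going back to the hex-to-binary lookup for a moment, the table-and-concatenate procedure can be sketched as follows. The table name NIBBLE and the function name hex_to_binary are made up for this illustration; only the constant DEC1DE2C0DE4F00D comes from the lecture.

```c
#include <stdint.h>
#include <string.h>

/* Lookup table: binary equivalent of each hex digit 0..F. */
static const char *const NIBBLE[16] = {
  "0000", "0001", "0010", "0011", "0100", "0101", "0110", "0111",
  "1000", "1001", "1010", "1011", "1100", "1101", "1110", "1111"
};

/* Write the 64-character binary string of x into out,
   one hex digit (4 bits) at a time, most significant digit first.
   out must have room for 64 characters plus the terminator. */
void hex_to_binary(uint64_t x, char out[65]) {
  for (int i = 0; i < 16; i++) {
    unsigned digit = (unsigned)((x >> (60 - 4 * i)) & 0xF);
    memcpy(out + 4 * i, NIBBLE[digit], 4);  /* concatenate the 4 bits */
  }
  out[64] = '\0';
}
```

Calling hex_to_binary(0xDEC1DE2C0DE4F00DULL, buf) should produce a string that starts with 1101 1110 1100 (without the spaces), the D, E, and C looked up in the lecture.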
157 00:08:35,659 --> 00:08:39,030 So let's look at how these operators work on this example 158 00:08:39,030 --> 00:08:39,530 here. 159 00:08:39,530 --> 00:08:41,210 So we have these two 8-bit words, 160 00:08:41,210 --> 00:08:47,750 A and B. To compute A AND B, we just look at every two bits 161 00:08:47,750 --> 00:08:51,170 in the same position in A and B and compute 162 00:08:51,170 --> 00:08:52,610 the AND of those two bits. 163 00:08:52,610 --> 00:08:57,130 So 1 ANDed with 0 is 0, so we get 0 here. 164 00:08:57,130 --> 00:08:59,210 0 ANDed with 1 is 0. 165 00:08:59,210 --> 00:09:03,512 1 ANDed with 1 is 1, and so on. 166 00:09:03,512 --> 00:09:07,760 A OR B is similar but now you apply the OR operator 167 00:09:07,760 --> 00:09:09,290 instead of the AND operator. 168 00:09:09,290 --> 00:09:11,720 So if either one of the two positions is 1, 169 00:09:11,720 --> 00:09:13,100 then you return 1. 170 00:09:13,100 --> 00:09:14,900 And if both are 0, then you return 0. 171 00:09:14,900 --> 00:09:19,880 So in A OR B, all of the bits except for this bit here 172 00:09:19,880 --> 00:09:23,340 are 1, and this one is 0 because in the original two words 173 00:09:23,340 --> 00:09:27,410 both of the corresponding bits were 0. 174 00:09:27,410 --> 00:09:32,570 For A XOR B, we check if exactly one of the two bits is 1. 175 00:09:32,570 --> 00:09:37,790 So for the leftmost bit, we have 1 and 0, 176 00:09:37,790 --> 00:09:42,140 so we have exactly one bit set to 1 and we get a 1 here. 177 00:09:42,140 --> 00:09:44,910 The second bit is 0 and 1, so that's 1. 178 00:09:44,910 --> 00:09:48,820 The third bit is 1, 1, so that's 0, and so on. 179 00:09:48,820 --> 00:09:51,330 Tilde of A is just the one's complement of A. We saw 180 00:09:51,330 --> 00:09:51,830 that before. 181 00:09:51,830 --> 00:09:54,080 We just flip all the bits. 
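To make the operator table concrete, here is a sketch that wraps each C bitwise operator in an 8-bit helper. The helper names are invented for this note, and the sample words A = 0xB3 (10110011) and B = 0x69 (01101001) are an assumption: they agree with the leading bits read out in the lecture, but the transcript does not state the full words.

```c
#include <stdint.h>

/* C promotes uint8_t operands to int, so each result is cast
   back to uint8_t to keep only the low 8 bits. */
uint8_t and8(uint8_t a, uint8_t b) { return (uint8_t)(a & b); }
uint8_t or8(uint8_t a, uint8_t b)  { return (uint8_t)(a | b); }
uint8_t xor8(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }
uint8_t not8(uint8_t a)            { return (uint8_t)~a; }

/* Shifts on unsigned values fill the vacated positions with 0's. */
uint8_t shr8(uint8_t a, unsigned k) { return (uint8_t)(a >> k); }
uint8_t shl8(uint8_t a, unsigned k) { return (uint8_t)(a << k); }
```

With the assumed A and B, and8 gives 0x21, or8 gives 0xFB (every bit set except one), xor8 gives 0xDA, and not8(A) gives 0x4C.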
182 00:09:54,080 --> 00:09:58,100 A right shifted by 3, we just shift the bit string 183 00:09:58,100 --> 00:10:03,800 to the right by 3, and then we fill in the digits or the bits 184 00:10:03,800 --> 00:10:05,880 on the left with 0's. 185 00:10:05,880 --> 00:10:09,140 And then A left shifted by 2, we do the same thing 186 00:10:09,140 --> 00:10:10,400 but to the left. 187 00:10:10,400 --> 00:10:12,665 And then we fill in these empty bits with 0's. 188 00:10:15,440 --> 00:10:19,915 So these are the bitwise operators in C. Any questions? 189 00:10:28,160 --> 00:10:30,100 AUDIENCE: They're not [INAUDIBLE]? 190 00:10:34,470 --> 00:10:37,330 JULIAN SHUN: For a right shift, there is a-- 191 00:10:37,330 --> 00:10:42,190 there is a shift that will fill in the upper digits 192 00:10:42,190 --> 00:10:45,970 with whatever the leftmost digit was. 193 00:10:45,970 --> 00:10:47,980 But if you're working with unsigned integers, 194 00:10:47,980 --> 00:10:49,272 then it's not going to do that. 195 00:10:49,272 --> 00:10:51,430 For signed integers it will. 196 00:10:51,430 --> 00:10:53,980 And when we're doing bit manipulations, 197 00:10:53,980 --> 00:10:56,350 we're usually going to stick to unsigned integers, 198 00:10:56,350 --> 00:10:57,892 so we don't have to worry about that. 199 00:11:04,430 --> 00:11:06,410 So now let's look at some common idioms 200 00:11:06,410 --> 00:11:09,290 that you can do using these bitwise operators. 201 00:11:09,290 --> 00:11:11,720 So the first one we'll look at is setting 202 00:11:11,720 --> 00:11:15,380 the kth bit in a word x to 1. 203 00:11:15,380 --> 00:11:20,210 So the idea here is to use a shift followed by an OR. 204 00:11:20,210 --> 00:11:25,130 So we're going to compute 1 left-shifted by k if we 205 00:11:25,130 --> 00:11:27,410 want to set the kth bit to a 1. 206 00:11:27,410 --> 00:11:31,370 And this gives us a mask with a 1 in exactly the kth position, 207 00:11:31,370 --> 00:11:34,050 and 0's everywhere else. 
208 00:11:34,050 --> 00:11:36,530 And then now when we OR that in to x, 209 00:11:36,530 --> 00:11:40,070 that's going to change the bit from a 0 to a 1 if it was a 0. 210 00:11:40,070 --> 00:11:41,930 And if that bit was already set to 1, 211 00:11:41,930 --> 00:11:43,263 then this doesn't do anything. 212 00:11:43,263 --> 00:11:44,930 And then for all of the other positions, 213 00:11:44,930 --> 00:11:46,520 since we're doing an OR with 0, we're 214 00:11:46,520 --> 00:11:51,480 just copying over the bits from x. 215 00:11:51,480 --> 00:11:54,210 So that's setting the kth bit. 216 00:11:54,210 --> 00:11:55,830 We can also clear the kth bit. 217 00:11:55,830 --> 00:11:59,190 And the idea here is to use a shift, a complement, and then 218 00:11:59,190 --> 00:12:00,440 an AND. 219 00:12:00,440 --> 00:12:05,175 So again we're going to generate this mask, 1 left-shifted by k. 220 00:12:05,175 --> 00:12:07,300 But now we're going to take the complement of this. 221 00:12:07,300 --> 00:12:11,100 So now we have a 0 in exactly the kth position and 1's 222 00:12:11,100 --> 00:12:14,300 everywhere else. 223 00:12:14,300 --> 00:12:18,670 And now when we AND this mask with x, in the kth position 224 00:12:18,670 --> 00:12:20,420 it's going to clear that bit because we're 225 00:12:20,420 --> 00:12:21,500 ANDing it with a 0. 226 00:12:21,500 --> 00:12:24,380 So the result is going to be 0 no matter 227 00:12:24,380 --> 00:12:25,400 what was there before. 228 00:12:25,400 --> 00:12:27,440 And then for all the remaining bits, 229 00:12:27,440 --> 00:12:30,110 since we're ANDing with 1, we're just copying it over 230 00:12:30,110 --> 00:12:31,220 from the original word. 231 00:12:36,890 --> 00:12:40,250 You can toggle the kth bit or flip the kth bit 232 00:12:40,250 --> 00:12:42,920 using a shift and then an XOR. 233 00:12:42,920 --> 00:12:46,550 So, again, we're going to generate this mask. 
234 00:12:46,550 --> 00:12:48,920 And then now, when we do an XOR with this mask, 235 00:12:48,920 --> 00:12:53,510 it's going to change a bit from a 0 to 1, or from a 1 to a 0, 236 00:12:53,510 --> 00:12:56,330 because that's what XOR does. 237 00:12:56,330 --> 00:12:58,400 So in this example, it's changing from a 0 to 1. 238 00:12:58,400 --> 00:12:59,930 But if it was already a 1, then it's 239 00:12:59,930 --> 00:13:02,180 going to toggle it back to 0. 240 00:13:04,840 --> 00:13:05,780 Any questions? 241 00:13:12,290 --> 00:13:16,760 So let's look at another bit trick. 242 00:13:16,760 --> 00:13:21,500 So here we're trying to extract a bit field from a word x. 243 00:13:21,500 --> 00:13:26,090 And this is important if you're working with encoded data. 244 00:13:26,090 --> 00:13:29,720 And the idea here is to do a mask and a shift. 245 00:13:29,720 --> 00:13:34,820 So we're going to generate a mask with 1's in exactly 246 00:13:34,820 --> 00:13:38,720 the positions that we want to extract out of this word, 247 00:13:38,720 --> 00:13:41,330 and then 0's everywhere else. 248 00:13:41,330 --> 00:13:44,220 And then we're going to AND the x with the mask, 249 00:13:44,220 --> 00:13:47,480 and that's going to give us the bits in the four positions 250 00:13:47,480 --> 00:13:49,400 that we wanted to extract in this example, 251 00:13:49,400 --> 00:13:51,080 and then we have 0's everywhere else. 252 00:13:53,950 --> 00:13:56,950 And then now we're going to right-shift this value 253 00:13:56,950 --> 00:14:00,790 that we extracted so that it appears in the least 254 00:14:00,790 --> 00:14:06,730 significant digits so that we can use it in our computation. 255 00:14:06,730 --> 00:14:09,220 So this is a very useful bit trick 256 00:14:09,220 --> 00:14:13,390 to know if you're working with compressed or encoded data. 
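The four idioms covered so far (set, clear, toggle, and extract) can be written as one-line C functions. This is a sketch with made-up function names and an assumed 32-bit unsigned word; the mask and shift passed to extract_field are whatever the caller's encoding requires.

```c
#include <stdint.h>

/* Set the kth bit: shift, then OR. */
uint32_t set_bit(uint32_t x, unsigned k) {
  return x | (UINT32_C(1) << k);
}

/* Clear the kth bit: shift, complement, then AND. */
uint32_t clear_bit(uint32_t x, unsigned k) {
  return x & ~(UINT32_C(1) << k);
}

/* Toggle the kth bit: shift, then XOR. */
uint32_t toggle_bit(uint32_t x, unsigned k) {
  return x ^ (UINT32_C(1) << k);
}

/* Extract a bit field: mask, then right-shift down to the
   least significant positions. */
uint32_t extract_field(uint32_t x, uint32_t mask, unsigned shift) {
  return (x & mask) >> shift;
}
```

As a usage example, extract_field(0xABCD, 0x0F00, 8) pulls out the third hex digit from the right and should return 0xB. All of these assume k and shift are less than 32, since shifting by the word width or more is undefined in C.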
257 00:14:13,390 --> 00:14:16,240 And if you use the bit field facilities in C, 258 00:14:16,240 --> 00:14:21,280 it's actually going to generate assembly code that will do 259 00:14:21,280 --> 00:14:22,840 masking and shifting for you. 260 00:14:27,880 --> 00:14:31,110 You can also set a bit field in a word. 261 00:14:31,110 --> 00:14:36,300 So let's say we want to set a bit field in x to some value y. 262 00:14:36,300 --> 00:14:40,440 The idea is to first invert this mask to clear those bits 263 00:14:40,440 --> 00:14:41,820 we want to set in x. 264 00:14:41,820 --> 00:14:45,950 And then we OR in the shifted value of y. 265 00:14:45,950 --> 00:14:50,180 So let's say we have these two words, x and y here. 266 00:14:50,180 --> 00:14:52,273 We're going to generate the mask as we did before, 267 00:14:52,273 --> 00:14:54,440 but now we're going to flip all the bits in the mask 268 00:14:54,440 --> 00:14:57,440 by taking the one's complement. 269 00:14:57,440 --> 00:15:00,920 And then we AND the-- 270 00:15:00,920 --> 00:15:03,260 we AND the one's complement of the mask 271 00:15:03,260 --> 00:15:09,230 with x, and that's going to clear the bits in x because we 272 00:15:09,230 --> 00:15:11,750 have 0's in exactly those positions in that mask, 273 00:15:11,750 --> 00:15:14,840 and when you AND that into x it will return to 0. 274 00:15:14,840 --> 00:15:16,880 And then for all the other positions, 275 00:15:16,880 --> 00:15:18,920 we're just copying in the bits of x. 276 00:15:21,500 --> 00:15:25,000 And then, finally, we're going to left-shift y 277 00:15:25,000 --> 00:15:26,660 by an appropriate amount so that we 278 00:15:26,660 --> 00:15:29,870 can line up the value with these four bit positions here. 279 00:15:29,870 --> 00:15:33,260 And then we can now just OR those values in. 280 00:15:33,260 --> 00:15:36,890 And this will set the positions in x to the value y. 
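Setting a bit field can be sketched the same way. The name set_field and the 32-bit word are assumptions for illustration; the extra AND on the shifted y is a precaution so that stray high bits in y cannot spill outside the field.

```c
#include <stdint.h>

/* Set the bit field of x selected by mask to the value y.
   shift is the position of the field's lowest bit. */
uint32_t set_field(uint32_t x, uint32_t y, uint32_t mask, unsigned shift) {
  uint32_t cleared = x & ~mask;           /* clear the field in x */
  uint32_t aligned = (y << shift) & mask; /* line y up and trim it to the field */
  return cleared | aligned;               /* OR the new value in */
}
```

For instance, set_field(0xABCD, 0x5, 0x0F00, 8) should return 0xA5CD, and passing y = 0xF5 gives the same result because the upper bits of y are masked off.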
281 00:15:44,790 --> 00:15:46,530 In order to be safe, you should actually 282 00:15:46,530 --> 00:15:49,800 do a mask on the shifted y value before you OR it in, 283 00:15:49,800 --> 00:15:51,600 because you don't know that the value of y 284 00:15:51,600 --> 00:15:53,910 is within the range of the mask. 285 00:15:53,910 --> 00:15:58,680 So if y has some garbage values in the higher bits, 286 00:15:58,680 --> 00:16:00,300 when you OR this in it might pollute 287 00:16:00,300 --> 00:16:01,830 the original value of x. 288 00:16:01,830 --> 00:16:03,750 So, for safety, you should actually 289 00:16:03,750 --> 00:16:08,030 do a mask before you OR the value, the shifted value of y 290 00:16:08,030 --> 00:16:08,530 in. 291 00:16:15,330 --> 00:16:17,040 So any questions on this? 292 00:16:21,990 --> 00:16:24,710 So now let's look at how we can swap two integers. 293 00:16:24,710 --> 00:16:29,480 So we want to swap the values of x and y. 294 00:16:29,480 --> 00:16:33,200 The standard way to do this is to use a temporary variable t. 295 00:16:33,200 --> 00:16:37,490 So we set t equal to x, x equal to y, and then y equal to t. 296 00:16:40,610 --> 00:16:43,570 This does involve a temporary variable, however. 297 00:16:43,570 --> 00:16:45,170 So now the question is whether we 298 00:16:45,170 --> 00:16:48,290 can do a swap without using a temporary variable. 299 00:16:48,290 --> 00:16:51,940 It turns out that you can using bit tricks. 300 00:16:51,940 --> 00:16:55,580 So here's the code for doing a no-temp swap. 301 00:16:55,580 --> 00:16:58,210 So you first set x equal to x XOR 302 00:16:58,210 --> 00:17:04,780 y, then y equal to x XOR y, and then x equal to x XOR y. 303 00:17:04,780 --> 00:17:08,210 So has anyone seen this before? 304 00:17:08,210 --> 00:17:08,800 OK, good. 305 00:17:08,800 --> 00:17:11,180 So some of you have seen this before. 
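Written out in C, the three XOR assignments look like this. The function name xor_swap and the pointer-based signature are choices made for this sketch. The aliasing guard is an addition not mentioned in the lecture: if both pointers refer to the same variable, the first XOR would zero it.

```c
#include <stdint.h>

/* Swap *x and *y without a temporary variable. */
void xor_swap(uint32_t *x, uint32_t *y) {
  if (x == y) return;  /* guard added here: x ^ x would wipe the value */
  *x = *x ^ *y;        /* x now holds the mask of differing bits */
  *y = *x ^ *y;        /* y now holds the original x */
  *x = *x ^ *y;        /* x now holds the original y */
}
```

After xor_swap(&a, &b) with a = 10 and b = 3, a should be 3 and b should be 10.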
306 00:17:11,180 --> 00:17:12,670 And for the rest of you all, I'll 307 00:17:12,670 --> 00:17:14,839 tell you how it works in the next couple slides. 308 00:17:14,839 --> 00:17:18,160 So let's first look at an example 309 00:17:18,160 --> 00:17:20,079 of how to run this code before we 310 00:17:20,079 --> 00:17:21,680 go into why it actually works. 311 00:17:21,680 --> 00:17:27,069 So we're going to start with these two words in x and y. 312 00:17:27,069 --> 00:17:31,210 We're first going to do x equal x XOR y. 313 00:17:31,210 --> 00:17:33,520 And now we store the result in x. 314 00:17:33,520 --> 00:17:37,630 And this is the result when you do the XOR of these two words. 315 00:17:37,630 --> 00:17:41,120 And then now we do y equal to x XOR y. 316 00:17:41,120 --> 00:17:44,650 And notice how the value of x here has already changed. 317 00:17:44,650 --> 00:17:47,650 So we're doing the XOR of these two words 318 00:17:47,650 --> 00:17:49,990 and setting that to y. 319 00:17:49,990 --> 00:17:53,200 And here this value is actually the same as x. 320 00:17:53,200 --> 00:17:57,040 So we've already placed x in y. 321 00:17:57,040 --> 00:17:59,170 And, finally, we do another XOR. 322 00:17:59,170 --> 00:18:03,460 We set x equal to x XOR y. 323 00:18:03,460 --> 00:18:06,430 And then this gives us this value, which is y. 324 00:18:06,430 --> 00:18:08,500 So at the end, we've just swapped 325 00:18:08,500 --> 00:18:11,185 x and y without using any temporary variable. 326 00:18:14,150 --> 00:18:18,940 So the reason why this works is because XOR is its own inverse. 327 00:18:18,940 --> 00:18:23,260 So if you do x XOR y, and then XOR the result of that with y, 328 00:18:23,260 --> 00:18:24,950 you just get back x itself. 329 00:18:24,950 --> 00:18:27,800 So let's look at the truth table to see why this is true. 330 00:18:27,800 --> 00:18:33,770 So in the x and y columns, I've shown all the possibilities. 
331 00:18:33,770 --> 00:18:37,450 So there are four different possibilities of x and y. 332 00:18:37,450 --> 00:18:40,160 And then I also have the values of x XOR y. 333 00:18:40,160 --> 00:18:45,670 So it's 1 in the rows where I have exactly one 1, 334 00:18:45,670 --> 00:18:48,070 and then 0 in the remaining rows. 335 00:18:48,070 --> 00:18:53,170 And then now if I do x XOR y XORed with y, 336 00:18:53,170 --> 00:18:55,930 I'm going to XOR these values with y. 337 00:18:55,930 --> 00:18:57,880 0 XOR 0 is 0. 338 00:18:57,880 --> 00:19:00,250 1 XOR 1 is 0. 339 00:19:00,250 --> 00:19:01,690 1 XOR 0 is 1. 340 00:19:01,690 --> 00:19:04,010 And 0 XOR 1 is 1. 341 00:19:04,010 --> 00:19:09,310 And notice that these values are the same as the values of x. 342 00:19:09,310 --> 00:19:14,950 So when I XOR something in twice, it just cancels out 343 00:19:14,950 --> 00:19:16,410 and I get back the original thing. 344 00:19:20,720 --> 00:19:26,780 So now let's go into why this bit trick actually does a swap. 345 00:19:26,780 --> 00:19:29,330 So in the first line, what we're doing is 346 00:19:29,330 --> 00:19:34,730 we're generating a mask with 1's where the bits in x and y 347 00:19:34,730 --> 00:19:35,240 differ. 348 00:19:35,240 --> 00:19:37,130 Because that's what XOR is going to give you. 349 00:19:37,130 --> 00:19:39,620 It's going to return a 1 if the bits are different, 350 00:19:39,620 --> 00:19:40,670 and 0 otherwise. 351 00:19:40,670 --> 00:19:43,970 So this is a mask that tells us in which positions 352 00:19:43,970 --> 00:19:48,340 the bits in x and y differ. 353 00:19:48,340 --> 00:19:50,900 And I'm going to store that into x. 
354 00:19:50,900 --> 00:19:54,830 And then in the second line, when I do x XOR y, 355 00:19:54,830 --> 00:19:57,380 this is going to flip the bits in y 356 00:19:57,380 --> 00:19:59,510 that are different from x, because I'm 357 00:19:59,510 --> 00:20:02,150 XORing with this mask, which tells me which of the bits 358 00:20:02,150 --> 00:20:03,590 differ from x. 359 00:20:03,590 --> 00:20:06,350 And then if I XOR with that mask, 360 00:20:06,350 --> 00:20:10,580 I'm flipping the bits in y that differ from x, 361 00:20:10,580 --> 00:20:13,550 and this will just give me back x. 362 00:20:13,550 --> 00:20:15,140 And I store that in y. 363 00:20:15,140 --> 00:20:20,270 So we see that the original value of x is in y now. 364 00:20:20,270 --> 00:20:22,610 And then in the last line, I do the same thing but now 365 00:20:22,610 --> 00:20:24,920 I'm flipping the bits in x that are different from y. 366 00:20:24,920 --> 00:20:28,250 So I still have the mask that's stored in x. 367 00:20:28,250 --> 00:20:31,090 And then I can XOR that mask with y, 368 00:20:31,090 --> 00:20:33,380 and y has the original value of x. 369 00:20:33,380 --> 00:20:37,280 So this is flipping the bits in x that differ from y, 370 00:20:37,280 --> 00:20:40,760 and now I have the original value of y stored in x. 371 00:20:44,380 --> 00:20:47,480 So this is a pretty cool trick, right? 372 00:20:47,480 --> 00:20:50,180 Any questions on why this works? 373 00:20:58,050 --> 00:21:00,190 So one thing about this bit trick 374 00:21:00,190 --> 00:21:03,080 here is that it's actually poor at exploiting 375 00:21:03,080 --> 00:21:06,650 instruction-level parallelism, so it's actually 376 00:21:06,650 --> 00:21:08,750 going to be slower than the naive code that 377 00:21:08,750 --> 00:21:10,760 uses a temporary variable. 378 00:21:10,760 --> 00:21:14,240 Because in the original code I had, 379 00:21:14,240 --> 00:21:16,910 I could actually execute two lines in parallel. 
380 00:21:16,910 --> 00:21:19,160 I can store a value into the temporary 381 00:21:19,160 --> 00:21:21,110 and then also change one of the values 382 00:21:21,110 --> 00:21:23,330 of x and y at the same time. 383 00:21:23,330 --> 00:21:25,040 Whereas in this code here, there's 384 00:21:25,040 --> 00:21:27,830 a sequential dependence among these three lines. 385 00:21:27,830 --> 00:21:31,040 I can't execute any of the lines in parallel. 386 00:21:31,040 --> 00:21:34,580 We'll learn more about instruction-level parallelism 387 00:21:34,580 --> 00:21:37,640 in next week's lectures, but I just 388 00:21:37,640 --> 00:21:40,787 wanted to point out that the performance of this 389 00:21:40,787 --> 00:21:41,870 isn't actually that great. 390 00:21:41,870 --> 00:21:44,990 But this is actually a pretty cool trick to know. 391 00:21:44,990 --> 00:21:47,642 Sometimes it shows up in job interviews. 392 00:21:52,350 --> 00:21:55,890 So the next thing we're going to look at 393 00:21:55,890 --> 00:22:01,830 is finding the minimum of two integers, x and y. 394 00:22:01,830 --> 00:22:03,720 So let's say we want to store the result 395 00:22:03,720 --> 00:22:06,810 of the minimum in a variable r. 396 00:22:06,810 --> 00:22:09,540 Here's the standard way to do this. 397 00:22:09,540 --> 00:22:11,260 We just use an if-else statement. 398 00:22:11,260 --> 00:22:14,250 So if x is less than y, then r is x. 399 00:22:14,250 --> 00:22:17,610 And, otherwise, r is set to y. 400 00:22:17,610 --> 00:22:19,410 Here's an equivalent expression. 401 00:22:19,410 --> 00:22:21,840 It just uses the ternary operator in C. 402 00:22:21,840 --> 00:22:24,750 It does exactly the same thing as the if-else statement 403 00:22:24,750 --> 00:22:25,470 on the left. 404 00:22:31,710 --> 00:22:33,377 One performance problem with this code 405 00:22:33,377 --> 00:22:34,960 is that there is a branch in the code. 
406 00:22:34,960 --> 00:22:37,390 So we have this if statement that 407 00:22:37,390 --> 00:22:39,700 checks if x is less than y. 408 00:22:39,700 --> 00:22:43,600 And modern machines will do branch prediction. 409 00:22:43,600 --> 00:22:46,570 And for whatever branch it predicts the code to take, 410 00:22:46,570 --> 00:22:48,520 it's going to do prefetching and execute some 411 00:22:48,520 --> 00:22:50,870 of the instructions in advance. 412 00:22:50,870 --> 00:22:55,660 But the problem is if it mispredicts the branch, 413 00:22:55,660 --> 00:22:57,970 it does a lot of wasted work, and the processor 414 00:22:57,970 --> 00:23:00,460 has to empty the pipeline and undo all of the work 415 00:23:00,460 --> 00:23:01,060 that it did. 416 00:23:01,060 --> 00:23:08,170 So this is a performance issue due to branch misprediction. 417 00:23:08,170 --> 00:23:10,000 Modern compilers are usually good enough 418 00:23:10,000 --> 00:23:12,880 to optimize this branch away, but sometimes the compiler 419 00:23:12,880 --> 00:23:15,820 isn't good enough to optimize the branch away. 420 00:23:15,820 --> 00:23:19,870 So is there a way to do a minimum without using a branch? 421 00:23:23,240 --> 00:23:23,740 All right. 422 00:23:23,740 --> 00:23:26,010 So here's how you do it. 423 00:23:26,010 --> 00:23:32,000 So we set r equal to y XOR the quantity x XOR y, ANDed with the negation of x 424 00:23:32,000 --> 00:23:34,190 less than y. 425 00:23:34,190 --> 00:23:35,900 So it's pretty obvious, right? 426 00:23:39,710 --> 00:23:43,280 So why does this work? 427 00:23:43,280 --> 00:23:46,120 So first we need to know that the C language represents 428 00:23:46,120 --> 00:23:47,950 the Boolean values true and false 429 00:23:47,950 --> 00:23:52,250 with the integers 1 and 0, respectively. 430 00:23:52,250 --> 00:23:55,000 So now let's look at the two possible cases. 
431 00:23:55,000 --> 00:23:57,370 First, let's look at a case where x is less than y, 432 00:23:57,370 --> 00:23:59,110 and then we'll look at the case where x 433 00:23:59,110 --> 00:24:00,880 is greater than or equal to y. 434 00:24:00,880 --> 00:24:05,800 So in the first case, when x is less than y, 435 00:24:05,800 --> 00:24:08,660 the comparison here x less than y is going to return 1. 436 00:24:08,660 --> 00:24:10,160 And then we're going to negate that, 437 00:24:10,160 --> 00:24:11,860 which gives us negative 1. 438 00:24:11,860 --> 00:24:13,810 And recall from earlier, negative 1 439 00:24:13,810 --> 00:24:19,480 is the all 1's word in two's complement representation. 440 00:24:19,480 --> 00:24:24,610 So when we AND x XOR y with all 1's word, 441 00:24:24,610 --> 00:24:27,880 that just gives us x XOR y. 442 00:24:27,880 --> 00:24:31,110 And now we're left with y XOR x XOR y. 443 00:24:31,110 --> 00:24:34,330 And we know that the-- 444 00:24:34,330 --> 00:24:37,990 we know that the inverse of XOR is itself. 445 00:24:37,990 --> 00:24:40,180 And therefore the two y's cancel out here 446 00:24:40,180 --> 00:24:42,390 and we're just left with x. 447 00:24:42,390 --> 00:24:44,620 And in this case x is indeed the minimum. 448 00:24:48,470 --> 00:24:52,490 In the other case, we have x greater than or equal to y. 449 00:24:52,490 --> 00:24:55,010 Then the expression x less than y is going to return 0. 450 00:24:55,010 --> 00:24:57,710 Negative of 0 is still 0. 451 00:24:57,710 --> 00:25:01,760 And then when we AND x XOR y with 0, we're left with 0. 452 00:25:01,760 --> 00:25:05,360 And this just gives us y XOR 0, which is y. 453 00:25:05,360 --> 00:25:07,970 And in this case y is the minimum of the two integers. 454 00:25:11,140 --> 00:25:12,493 So any questions? 455 00:25:20,710 --> 00:25:21,793 So how many of you-- 456 00:25:21,793 --> 00:25:23,210 how many of you knew this already? 457 00:25:26,230 --> 00:25:26,730 Good. 
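The two cases worked through above can be checked with a small C sketch. The function name `min_branchless` and the choice of `int64_t` are assumptions made for illustration; the expression itself is the one described in the lecture, with the grouping made explicit.

```c
#include <stdint.h>

/* Branchless minimum: r = y ^ ((x ^ y) & -(x < y)).
 * The name and the 64-bit signed type are illustrative assumptions. */
static inline int64_t min_branchless(int64_t x, int64_t y) {
    /* (x < y) evaluates to 1 or 0 in C; negating it yields the
     * all-1's word (-1) or the all-0's word. */
    int64_t mask = -(int64_t)(x < y);
    /* mask all 1's: y ^ (x ^ y) = x.   mask 0: y ^ 0 = y. */
    return y ^ ((x ^ y) & mask);
}
```

When x is less than y the mask keeps x XOR y, and the two y's cancel to leave x; otherwise the mask clears it and y is returned unchanged.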
458 00:25:26,730 --> 00:25:29,820 So we learned something new today. 459 00:25:29,820 --> 00:25:36,640 So let's see how branches work in a real function. 460 00:25:36,640 --> 00:25:39,250 So here we're trying to merge together two sorted arrays, 461 00:25:39,250 --> 00:25:41,520 and this is a subroutine that's used in merge sort 462 00:25:41,520 --> 00:25:44,370 if you've seen it before. 463 00:25:44,370 --> 00:25:47,650 So the inputs to this function are three arrays. 464 00:25:47,650 --> 00:25:50,220 So we want to merge together arrays A and B 465 00:25:50,220 --> 00:25:52,260 and store the result in C. And then 466 00:25:52,260 --> 00:25:57,060 we also pass the function the sizes of A and B in na and nb. 467 00:25:59,650 --> 00:26:01,750 So what does the restrict keyword do here? 468 00:26:01,750 --> 00:26:02,470 Does anyone know? 469 00:26:07,980 --> 00:26:10,480 So the restrict keyword tells the compiler 470 00:26:10,480 --> 00:26:13,570 that this is going to be the only pointer that can 471 00:26:13,570 --> 00:26:16,750 point to that particular data. 472 00:26:16,750 --> 00:26:20,135 And this enables the compiler to do more optimizations. 473 00:26:20,135 --> 00:26:21,760 So when you're writing programs and you 474 00:26:21,760 --> 00:26:23,830 know that there can only be one pointer pointing 475 00:26:23,830 --> 00:26:26,620 to specific pieces of data, then you 476 00:26:26,620 --> 00:26:28,100 can declare that restrict keyword, 477 00:26:28,100 --> 00:26:32,230 and this gives the compiler more freedom to do optimizations. 478 00:26:35,750 --> 00:26:39,590 So now let's look at this procedure here. 479 00:26:39,590 --> 00:26:43,610 So while the sizes of A and B are nonzero, 480 00:26:43,610 --> 00:26:45,860 we're going to go into this if-else clause 481 00:26:45,860 --> 00:26:51,170 and we're going to check if the element pointed to by A is 482 00:26:51,170 --> 00:26:53,828 less than or equal to the element pointed to by B. 
483 00:26:53,828 --> 00:26:56,120 And if so, we're going to store that element pointed to 484 00:26:56,120 --> 00:26:58,880 by A into C. And then we're going to increment 485 00:26:58,880 --> 00:27:01,710 both the C and A pointers. 486 00:27:01,710 --> 00:27:04,100 And then we're going to decrement na. 487 00:27:04,100 --> 00:27:06,320 This tells us that there's one less element in A 488 00:27:06,320 --> 00:27:09,460 that we need to merge in now. 489 00:27:09,460 --> 00:27:14,800 And, otherwise, we do the same thing but with array B and nb. 490 00:27:14,800 --> 00:27:17,910 And if one of the two arrays becomes empty, 491 00:27:17,910 --> 00:27:22,350 then we go to one of these two while loops at the bottom 492 00:27:22,350 --> 00:27:24,930 and we just copy all the remaining elements 493 00:27:24,930 --> 00:27:29,010 in the non-empty array into C. So here, 494 00:27:29,010 --> 00:27:31,650 if na is greater than 0, then A is a non-empty array, 495 00:27:31,650 --> 00:27:34,590 and then we just copy the remaining elements of A into C. 496 00:27:34,590 --> 00:27:37,730 And, otherwise, we copy the remaining elements of B into C. 497 00:27:37,730 --> 00:27:40,810 So let's do a simple example. 498 00:27:40,810 --> 00:27:43,410 Let's say we want to merge these two arrays in green 499 00:27:43,410 --> 00:27:45,570 into the blue array here. 500 00:27:45,570 --> 00:27:49,590 So let's say the top array is A, and the bottom array is B, 501 00:27:49,590 --> 00:27:53,790 and the blue array is C. So, initially, A and B 502 00:27:53,790 --> 00:27:57,420 are pointing to the beginning of these two green arrays. 503 00:27:57,420 --> 00:27:59,980 And since both arrays are non-empty, 504 00:27:59,980 --> 00:28:04,020 we're going to compare the first two elements here. 505 00:28:04,020 --> 00:28:05,700 And we see that 3 is less than 4, 506 00:28:05,700 --> 00:28:08,580 so we're going to place 3 into the array C. 
507 00:28:08,580 --> 00:28:11,310 And then we're going to increment the pointer in A 508 00:28:11,310 --> 00:28:12,810 to point to the next element. 509 00:28:12,810 --> 00:28:14,685 And we're also going to increment the pointer 510 00:28:14,685 --> 00:28:18,540 C to point to the next slot. 511 00:28:18,540 --> 00:28:20,400 Now we're going to compare 4 and 12. 512 00:28:20,400 --> 00:28:24,030 4 is less than 12, so we place 4 into the array C, 513 00:28:24,030 --> 00:28:28,440 and we increment array B. And then we just keep doing this. 514 00:28:28,440 --> 00:28:29,790 So 12 is less than 14. 515 00:28:29,790 --> 00:28:32,045 14 is less than 19. 516 00:28:32,045 --> 00:28:34,125 19 is less than 21. 517 00:28:34,125 --> 00:28:36,380 21 is less than 46. 518 00:28:36,380 --> 00:28:38,280 And here 23 is less than 46. 519 00:28:38,280 --> 00:28:40,740 And at this point, one of the arrays becomes empty. 520 00:28:40,740 --> 00:28:42,780 So B is empty now. 521 00:28:42,780 --> 00:28:45,210 So now we get to the second while loop. 522 00:28:45,210 --> 00:28:47,250 And we see that A still has elements in it, 523 00:28:47,250 --> 00:28:50,257 and we just copy the remaining elements in A into C. 524 00:28:50,257 --> 00:28:51,090 And then we're done. 525 00:28:55,260 --> 00:28:58,770 So that's how the standard code for merging two sorted arrays 526 00:28:58,770 --> 00:28:59,270 works. 527 00:29:02,710 --> 00:29:04,580 So let's look at each of these branches 528 00:29:04,580 --> 00:29:06,680 to see if it's predictable. 529 00:29:06,680 --> 00:29:10,790 So a predictable branch is a branch 530 00:29:10,790 --> 00:29:13,700 that most of the time it returns the same answer, 531 00:29:13,700 --> 00:29:16,520 and only rarely does it return a different answer.
532 00:29:16,520 --> 00:29:19,940 And an unpredictable branch is one where it sometimes returns 533 00:29:19,940 --> 00:29:22,050 one value and sometimes returns another value 534 00:29:22,050 --> 00:29:25,260 and you can't really predict it. 535 00:29:25,260 --> 00:29:27,440 So let's look at the first branch. 536 00:29:27,440 --> 00:29:30,331 Does anyone know if this branch is predictable? 537 00:29:36,700 --> 00:29:37,200 Yes. 538 00:29:37,200 --> 00:29:39,040 AUDIENCE: That would be unpredictable 539 00:29:39,040 --> 00:29:43,562 because it depends on what input you're given. 540 00:29:43,562 --> 00:29:46,260 JULIAN SHUN: So it turns out that this branch is actually 541 00:29:46,260 --> 00:29:49,300 predictable because it's going to return true most of the time 542 00:29:49,300 --> 00:29:51,210 except for the last time. 543 00:29:51,210 --> 00:29:54,750 So it's only going to return false when nb is equal to 0. 544 00:29:54,750 --> 00:29:57,120 And at that point you're just going to execute this once 545 00:29:57,120 --> 00:29:58,870 and then you're done. 546 00:29:58,870 --> 00:30:01,800 But most of the time nb is going to be greater 547 00:30:01,800 --> 00:30:04,470 than 0 when you execute this, and we 548 00:30:04,470 --> 00:30:06,450 call this a predictable branch. 549 00:30:09,780 --> 00:30:11,940 What about the second one? 550 00:30:14,802 --> 00:30:16,233 So-- 551 00:30:16,233 --> 00:30:18,322 AUDIENCE: Also predictable? 552 00:30:18,322 --> 00:30:19,030 JULIAN SHUN: Yes. 553 00:30:19,030 --> 00:30:21,190 So it's also predictable for the same reason. 554 00:30:24,738 --> 00:30:25,780 What about the third one? 555 00:30:29,950 --> 00:30:31,088 Yes. 556 00:30:31,088 --> 00:30:31,630 AUDIENCE: No. 557 00:30:31,630 --> 00:30:34,570 Because we really-- if we already knew which was bigger, 558 00:30:34,570 --> 00:30:37,412 then we already have the sorted array then. 559 00:30:37,412 --> 00:30:38,120 JULIAN SHUN: Yes. 
560 00:30:38,120 --> 00:30:39,890 So this turns out to be unpredictable 561 00:30:39,890 --> 00:30:44,180 because we don't know the values in A and B a priori. 562 00:30:44,180 --> 00:30:49,220 So this condition inside the if statement 563 00:30:49,220 --> 00:30:52,430 is going to return true about half of the time 564 00:30:52,430 --> 00:30:55,253 because we don't know what values are in A and B. 565 00:30:55,253 --> 00:30:57,170 And that's going to be an unpredictable branch 566 00:30:57,170 --> 00:31:03,520 because it's going to return true or false about 50/50. 567 00:31:03,520 --> 00:31:06,340 What about the last one? 568 00:31:06,340 --> 00:31:06,840 Yes. 569 00:31:06,840 --> 00:31:07,990 Why? 570 00:31:07,990 --> 00:31:12,612 AUDIENCE: Yes, because for similar reasons of 1 and 2. 571 00:31:12,612 --> 00:31:15,192 It's probably [INAUDIBLE]. 572 00:31:15,192 --> 00:31:15,900 JULIAN SHUN: Yes. 573 00:31:15,900 --> 00:31:17,760 So it is predictable. 574 00:31:17,760 --> 00:31:20,232 The reason why it's predictable is that most 575 00:31:20,232 --> 00:31:21,690 of the time it's going to return true. 576 00:31:21,690 --> 00:31:23,340 And that once it returns false you're 577 00:31:23,340 --> 00:31:26,700 never going to look at that again inside this function 578 00:31:26,700 --> 00:31:27,880 call. 579 00:31:27,880 --> 00:31:29,640 So it returns true most of the time, 580 00:31:29,640 --> 00:31:34,270 and we call that a predictable branch. 581 00:31:34,270 --> 00:31:37,420 So branches 1, 2, and 4 are OK because they're 582 00:31:37,420 --> 00:31:41,530 predictable branches, but branch 3 is going to cause a problem. 583 00:31:41,530 --> 00:31:45,970 It's an unpredictable branch, and the hardware doesn't really 584 00:31:45,970 --> 00:31:51,430 like this because it can't do prefetching efficiently. 585 00:31:51,430 --> 00:31:55,540 So to fix this, we can use our no-branch minimum bit trick 586 00:31:55,540 --> 00:31:58,660 that we learned a couple slides ago.
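For reference, the branching merge routine discussed above might look like the following sketch. The slide's code is not part of the transcript, so the exact layout and the `int64_t` element type are assumptions; the comments mark which conditions are predictable loop tests and which is the data-dependent comparison.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge sorted arrays A (na elements) and B (nb elements) into C.
 * restrict promises the compiler that the three arrays do not overlap.
 * Element type and formatting are illustrative assumptions. */
static void merge(int64_t *restrict C, int64_t *restrict A,
                  int64_t *restrict B, size_t na, size_t nb) {
    while (na > 0 && nb > 0) {   /* loop test: predictable, true until the end */
        if (*A <= *B) {          /* depends on the data: unpredictable */
            *C++ = *A++; na--;
        } else {
            *C++ = *B++; nb--;
        }
    }
    while (na > 0) {             /* loop test: predictable */
        *C++ = *A++; na--;
    }
    while (nb > 0) {             /* loop test: predictable */
        *C++ = *B++; nb--;
    }
}
```

Only the comparison of the two front elements depends on the input values, which is why it is the one branch the hardware cannot learn.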
587 00:31:58,660 --> 00:32:02,470 So now what we're doing is we're going to have a variable called 588 00:32:02,470 --> 00:32:05,200 cmp which stores the result of the comparison 589 00:32:05,200 --> 00:32:10,540 between the first element of A and the first element of B. 590 00:32:10,540 --> 00:32:14,740 And then now we're going to get the minimum of A and B 591 00:32:14,740 --> 00:32:15,243 as follows. 592 00:32:15,243 --> 00:32:17,035 It's the same bit trick that we saw before. 593 00:32:19,880 --> 00:32:22,420 So now the variable min is going to store 594 00:32:22,420 --> 00:32:24,850 the smaller of the first element of A 595 00:32:24,850 --> 00:32:27,550 and the first element of B. And we also 596 00:32:27,550 --> 00:32:31,210 have the result of this comparison here. 597 00:32:31,210 --> 00:32:33,310 So that's stored in cmp. 598 00:32:33,310 --> 00:32:36,640 So first we're going to place the minimum value in C. 599 00:32:36,640 --> 00:32:38,710 And then, based on the result of cmp, 600 00:32:38,710 --> 00:32:43,150 we're going to increment one of A or B. So if A was less than 601 00:32:43,150 --> 00:32:45,670 or equal to B, then cmp is going to be 1. 602 00:32:45,670 --> 00:32:52,360 And A plus equal cmp is going to increment A by 1. 603 00:32:52,360 --> 00:32:55,840 And then B plus equal to not cmp is going to not do anything, 604 00:32:55,840 --> 00:32:57,880 because not cmp is 0. 605 00:32:57,880 --> 00:33:01,180 And then for na, we're going to decrement by cmp. 606 00:33:01,180 --> 00:33:04,390 So it's going to be 1 if A is less than or equal to B, 607 00:33:04,390 --> 00:33:06,110 and 0 otherwise. 608 00:33:06,110 --> 00:33:07,840 And then for nb, we're going to decrement 609 00:33:07,840 --> 00:33:09,820 by the not of the cmp. 610 00:33:09,820 --> 00:33:13,900 So only one of these two lines is actually 611 00:33:13,900 --> 00:33:18,348 going to do something based on the result of the comparison. 
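The loop body just described can be sketched in C as follows. The function name `merge_branchless` and the `int64_t` element type are assumptions for illustration; the `cmp` and `min` variables follow the description above, and the two trailing copy loops are unchanged from the branching version.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge with the data-dependent branch replaced by arithmetic on cmp.
 * Name and element type are illustrative assumptions. */
static void merge_branchless(int64_t *restrict C, int64_t *restrict A,
                             int64_t *restrict B, size_t na, size_t nb) {
    while (na > 0 && nb > 0) {
        int64_t cmp = (*A <= *B);                 /* 1 if A's front wins, else 0 */
        int64_t min = *B ^ ((*B ^ *A) & -cmp);    /* branchless minimum */
        *C++ = min;
        A += cmp;   na -= (size_t)cmp;            /* advance A only when cmp == 1 */
        B += !cmp;  nb -= (size_t)!cmp;           /* advance B only when cmp == 0 */
    }
    while (na > 0) { *C++ = *A++; na--; }         /* copy what is left of A */
    while (nb > 0) { *C++ = *B++; nb--; }         /* copy what is left of B */
}
```

Both pointer updates execute on every iteration, but one of them always adds 0, so the same instructions run regardless of how the comparison turns out.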
612 00:33:18,348 --> 00:33:20,515 And then the rest of the code is the same as before. 613 00:33:23,250 --> 00:33:25,770 Any questions? 614 00:33:25,770 --> 00:33:28,710 So now we've gotten rid of this unpredictable branch 615 00:33:28,710 --> 00:33:29,670 that we had before. 616 00:33:33,690 --> 00:33:35,450 So one thing about this optimization 617 00:33:35,450 --> 00:33:38,180 is that it works well on certain machines. 618 00:33:38,180 --> 00:33:41,120 However, on modern machines, using a good compiler 619 00:33:41,120 --> 00:33:44,690 like Clang with the minus O3 flag, 620 00:33:44,690 --> 00:33:46,640 the branchless version is usually 621 00:33:46,640 --> 00:33:48,530 going to be slower than the branching version 622 00:33:48,530 --> 00:33:51,200 because the compiler is actually smart enough 623 00:33:51,200 --> 00:33:55,310 to get rid of the branch inside the original version 624 00:33:55,310 --> 00:33:56,150 of minimum. 625 00:33:56,150 --> 00:34:00,980 There's this instruction called cmov or a conditional move. 626 00:34:00,980 --> 00:34:04,460 It's basically a branchless instruction 627 00:34:04,460 --> 00:34:05,690 for doing a comparison. 628 00:34:05,690 --> 00:34:08,429 We'll learn more about that next week. 629 00:34:08,429 --> 00:34:11,542 So this trick actually usually doesn't really work. 630 00:34:11,542 --> 00:34:14,000 There might be some machines and some compilers where it works, 631 00:34:14,000 --> 00:34:15,417 but most of the time, the compiler 632 00:34:15,417 --> 00:34:19,860 is better at optimizing this code than you are. 633 00:34:19,860 --> 00:34:22,190 So one of the common themes so far 634 00:34:22,190 --> 00:34:25,190 is that I've told you about a really cool bit trick 635 00:34:25,190 --> 00:34:28,520 and then I told you that it doesn't really work. 636 00:34:28,520 --> 00:34:30,940 So why are we even learning about these bit tricks 637 00:34:30,940 --> 00:34:33,820 then if they don't even work?
638 00:34:33,820 --> 00:34:37,310 So first is because the compiler does some of these bit tricks, 639 00:34:37,310 --> 00:34:39,770 and it's helpful to understand what these bit tricks are 640 00:34:39,770 --> 00:34:42,409 so you can figure out what the compiler is doing when 641 00:34:42,409 --> 00:34:45,560 you look at the assembly code. 642 00:34:45,560 --> 00:34:48,530 Secondly, sometimes the compiler doesn't do these optimizations 643 00:34:48,530 --> 00:34:51,530 for you and you have to do it yourself. 644 00:34:51,530 --> 00:34:53,540 Thirdly, many bit hacks for words 645 00:34:53,540 --> 00:34:56,120 extend naturally to bit and word hacks 646 00:34:56,120 --> 00:34:59,530 for vectors, which are widely used in high-performance code. 647 00:34:59,530 --> 00:35:02,250 So it's good to know about these tricks. 648 00:35:02,250 --> 00:35:05,670 These bit tricks also arise in other domains. 649 00:35:05,670 --> 00:35:09,810 And, finally, because they're just fun to learn about. 650 00:35:09,810 --> 00:35:12,150 And for project 1, you'll be playing around 651 00:35:12,150 --> 00:35:14,520 with some of these bit tricks, so it's 652 00:35:14,520 --> 00:35:19,470 good to know about these things that I've talked about already. 653 00:35:19,470 --> 00:35:22,840 Here I'll talk about a bit trick that actually does work. 654 00:35:22,840 --> 00:35:26,970 So here we're trying to do modular addition. 655 00:35:26,970 --> 00:35:30,420 So we want to do x plus y mod n. 656 00:35:30,420 --> 00:35:35,100 And here let's assume that x is between 0 and n minus 1, 657 00:35:35,100 --> 00:35:38,800 and y is also between 0 and n minus 1. 658 00:35:38,800 --> 00:35:41,190 So the standard way to do this is just 659 00:35:41,190 --> 00:35:45,600 to use the mod operator, x plus y mod n. 
660 00:35:45,600 --> 00:35:47,640 However, this does a division, which 661 00:35:47,640 --> 00:35:50,580 is relatively expensive compared to other operations 662 00:35:50,580 --> 00:35:52,830 unless n is a power of 2. 663 00:35:52,830 --> 00:35:55,070 But most of the time, you don't know 664 00:35:55,070 --> 00:35:57,100 if n is a power of 2 at compile time, 665 00:35:57,100 --> 00:35:59,400 so the compiler can't actually translate this 666 00:35:59,400 --> 00:36:05,550 to a right shift operation, and then it has to do a division. 667 00:36:05,550 --> 00:36:11,020 So here's another way to do it without using division. 668 00:36:11,020 --> 00:36:15,330 So we're first going to set z equal to the sum of x and y. 669 00:36:15,330 --> 00:36:18,060 And then if z is less than n, then 670 00:36:18,060 --> 00:36:21,710 it's already within the range and we can just return z. 671 00:36:21,710 --> 00:36:23,790 If z is greater than or equal to n, 672 00:36:23,790 --> 00:36:25,950 well we know we can be at most 2n minus 2 673 00:36:25,950 --> 00:36:29,220 because x and y were both at most n minus 1. 674 00:36:29,220 --> 00:36:32,490 So all we have to do is to subtract n and bring it back 675 00:36:32,490 --> 00:36:35,140 into range. 676 00:36:35,140 --> 00:36:37,470 However, this code has an unpredictable branch here 677 00:36:37,470 --> 00:36:41,910 because we don't know whether z is less than n or not. 678 00:36:41,910 --> 00:36:45,420 So now we can use the same trick as minimum. 679 00:36:45,420 --> 00:36:49,180 So now we're going to set r equal to z minus n 680 00:36:49,180 --> 00:36:55,380 ANDed with the negative of z greater than or equal to n. 681 00:36:55,380 --> 00:36:58,590 So if z is less than n, then this 682 00:36:58,590 --> 00:37:00,770 is going to return 0 in here. 683 00:37:00,770 --> 00:37:04,020 And n ANDed with 0 is 0, so we're just left with z. 
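Written out with explicit grouping, the expression is z minus the quantity n ANDed with the negated comparison. A minimal sketch, assuming 32-bit unsigned operands and a hypothetical function name `add_mod`:

```c
#include <stdint.h>

/* Branchless modular addition.
 * Assumes 0 <= x < n, 0 <= y < n, and that x + y does not overflow
 * uint32_t. The name and the operand type are illustrative assumptions. */
static inline uint32_t add_mod(uint32_t x, uint32_t y, uint32_t n) {
    uint32_t z = x + y;                      /* at most 2n - 2 */
    uint32_t mask = -(uint32_t)(z >= n);     /* all 1's if z >= n, else 0 */
    return z - (n & mask);                   /* subtract n only when needed */
}
```

The comparison only produces a 0 or 1 value; the subtraction always executes, it just subtracts 0 when z is already in range.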
684 00:37:04,020 --> 00:37:07,380 And if z is greater than or equal to n, 685 00:37:07,380 --> 00:37:09,330 then this is going to be 1. 686 00:37:09,330 --> 00:37:13,200 We negate that, we get negative 1, which is the all 1's word. 687 00:37:13,200 --> 00:37:15,450 n ANDed with all 1's is just n. 688 00:37:15,450 --> 00:37:19,320 So that is z minus n, which will bring the result back 689 00:37:19,320 --> 00:37:19,980 into range. 690 00:37:24,830 --> 00:37:29,070 So any questions? 691 00:37:29,070 --> 00:37:29,570 Yes. 692 00:37:29,570 --> 00:37:31,278 AUDIENCE: It seems like there essentially 693 00:37:31,278 --> 00:37:34,090 is still a branch based on the value of z. 694 00:37:34,090 --> 00:37:37,760 So why would that be faster? 695 00:37:37,760 --> 00:37:40,190 JULIAN SHUN: So this branch here is just generating 696 00:37:40,190 --> 00:37:42,140 either a Boolean value 1 or 0. 697 00:37:42,140 --> 00:37:44,900 There's actually-- like the code that you execute after it, 698 00:37:44,900 --> 00:37:47,400 it's still the same in either case. 699 00:37:47,400 --> 00:37:49,850 So the branch misprediction only hurts you 700 00:37:49,850 --> 00:37:51,570 if there are two different code paths. 701 00:37:51,570 --> 00:37:54,170 In this version, there are two different code paths, 702 00:37:54,170 --> 00:37:58,130 because one is doing z and one is doing z minus n. 703 00:38:02,620 --> 00:38:04,810 So the next problem we will look at 704 00:38:04,810 --> 00:38:09,610 is computing or rounding a value up to the nearest power of 2. 705 00:38:09,610 --> 00:38:15,160 And this is just 2 to the ceiling of log base 2 of n. 706 00:38:15,160 --> 00:38:18,330 And recall that lg of n is the log base 2 of n. 707 00:38:18,330 --> 00:38:23,140 That's the notation we'll be using in this class. 708 00:38:23,140 --> 00:38:24,730 Here's some code to do this. 709 00:38:24,730 --> 00:38:28,070 So we have our value of n here. 710 00:38:28,070 --> 00:38:30,640 First, we're going to decrement n. 
711 00:38:30,640 --> 00:38:35,540 And then we're going to do an OR of n with n right-shifted by 1. 712 00:38:35,540 --> 00:38:38,920 Then an OR with n and n right-shifted by 2, and so on, 713 00:38:38,920 --> 00:38:40,490 all the way up to 32. 714 00:38:40,490 --> 00:38:44,500 So we do this for all powers of 2 up to 32. 715 00:38:44,500 --> 00:38:47,750 And then, finally, we increment n at the end. 716 00:38:47,750 --> 00:38:51,140 So let's look at an example to see why this works. 717 00:38:51,140 --> 00:38:53,710 So we're starting with this value of n here. 718 00:38:56,710 --> 00:38:59,030 First we're going to decrement it. 719 00:38:59,030 --> 00:39:03,010 And what that does is it flips the rightmost 1 bit to 0, 720 00:39:03,010 --> 00:39:06,500 and then it fills in all the 0's right of that with 1's. 721 00:39:09,490 --> 00:39:11,560 And then when we do this line, which 722 00:39:11,560 --> 00:39:16,630 says n is equal to n ORed with n right-shifted by 1, 723 00:39:16,630 --> 00:39:19,870 that's essentially propagating all of the 1 bits one position 724 00:39:19,870 --> 00:39:22,270 to the right and then ORing those in. 725 00:39:22,270 --> 00:39:25,390 So we can see that this 1 bit got copied 726 00:39:25,390 --> 00:39:26,560 one position to the right. 727 00:39:26,560 --> 00:39:29,380 This 1 bit got copied to one position to the right. 728 00:39:29,380 --> 00:39:32,100 These 1's also propagate, but since they were already 1's 729 00:39:32,100 --> 00:39:35,290 it doesn't do anything. 730 00:39:35,290 --> 00:39:37,600 For the next line, we're propagating the 1 bits 731 00:39:37,600 --> 00:39:40,030 two positions to the right. 732 00:39:40,030 --> 00:39:43,750 So this 1 bit here gets copied here. 733 00:39:43,750 --> 00:39:47,290 This 1 gets copied here, and so on. 734 00:39:47,290 --> 00:39:49,600 And then the next line is going to propagate bits 735 00:39:49,600 --> 00:39:51,790 four positions the right. 736 00:39:51,790 --> 00:39:53,380 Then 8, 16, and 32. 
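The sequence of steps described above can be sketched in C as follows, assuming a 64-bit unsigned word. The function name `round_up_pow2` is a hypothetical choice, and the sketch assumes n is between 1 and 2 to the 63 so the result fits in the word.

```c
#include <stdint.h>

/* Round n up to the nearest power of 2, i.e. 2^ceil(lg n).
 * Assumes 1 <= n <= 2^63; the name is an illustrative assumption. */
static inline uint64_t round_up_pow2(uint64_t n) {
    --n;            /* so that an exact power of 2 maps to itself */
    n |= n >> 1;    /* copy every 1 bit one position to the right */
    n |= n >> 2;    /* then two positions */
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;   /* all bits below the leading 1 are now 1 */
    ++n;            /* carry ripples up to the next power of 2 */
    return n;
}
```

After the six OR steps the word is one less than a power of 2, so the final increment clears the run of 1's and sets the single bit above it.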
737 00:39:53,380 --> 00:39:56,650 For this example here, when I get to this line 738 00:39:56,650 --> 00:39:57,440 I'm already done. 739 00:39:57,440 --> 00:40:00,370 But, in general, you have more bits 740 00:40:00,370 --> 00:40:04,510 in a word, which I can't fit on this slide. 741 00:40:04,510 --> 00:40:07,210 And now we have something that's exactly 742 00:40:07,210 --> 00:40:09,670 one less than a power of 2. 743 00:40:09,670 --> 00:40:12,010 And when we add 1 to that, we just get a power of 2. 744 00:40:12,010 --> 00:40:14,890 So we're going to zero out all of these 1 bits 745 00:40:14,890 --> 00:40:16,330 and then place a 1 here. 746 00:40:16,330 --> 00:40:19,030 And this is exactly the power of 2 747 00:40:19,030 --> 00:40:21,360 that's greater than the value n. 748 00:40:28,890 --> 00:40:31,980 So the first line here is essentially 749 00:40:31,980 --> 00:40:35,670 guaranteeing us that the log nth minus 1 bit is set. 750 00:40:35,670 --> 00:40:37,860 And we need that bit to be set because we 751 00:40:37,860 --> 00:40:40,890 want to propagate that bit to all the positions 752 00:40:40,890 --> 00:40:43,890 to the right of it. 753 00:40:43,890 --> 00:40:47,370 And then these six lines here are populating all the bits 754 00:40:47,370 --> 00:40:49,770 to the right with 1's. 755 00:40:49,770 --> 00:40:53,580 And then the last bit is setting the log nth bit to 1 756 00:40:53,580 --> 00:40:55,260 and then clearing all of the other bits. 757 00:40:58,420 --> 00:41:01,330 So one question is why did we have to decrement n 758 00:41:01,330 --> 00:41:02,425 at the beginning? 759 00:41:05,540 --> 00:41:06,425 Yes. 760 00:41:06,425 --> 00:41:08,600 AUDIENCE: In case n is already [INAUDIBLE].. 761 00:41:08,600 --> 00:41:09,308 JULIAN SHUN: Yes. 
762 00:41:09,308 --> 00:41:13,130 So if n is already a power of 2 and if we don't decrement n, 763 00:41:13,130 --> 00:41:16,340 this isn't going to work because the log nth minus 1 bit 764 00:41:16,340 --> 00:41:17,360 isn't set. 765 00:41:17,360 --> 00:41:19,660 But if we decrement n, then it's going 766 00:41:19,660 --> 00:41:22,430 to guarantee us that the log nth minus 1 bit 767 00:41:22,430 --> 00:41:24,650 is set so that we can propagate that to the right. 768 00:41:28,590 --> 00:41:31,430 Any questions? 769 00:41:31,430 --> 00:41:31,930 Yes. 770 00:41:31,930 --> 00:41:34,480 AUDIENCE: [INAUDIBLE]? 771 00:41:34,480 --> 00:41:36,550 JULIAN SHUN: Because, in general, you're 772 00:41:36,550 --> 00:41:39,720 using 64-bit words. 773 00:41:39,720 --> 00:41:41,710 Here I don't have that many bits here 774 00:41:41,710 --> 00:41:43,210 because I can't fit it on the slide, 775 00:41:43,210 --> 00:41:44,627 but in general you have more bits. 776 00:41:51,810 --> 00:41:53,370 Let's look at another problem. 777 00:41:53,370 --> 00:41:57,060 Here we want to compute the mask of the least significant 1 778 00:41:57,060 --> 00:41:58,980 in a word x. 779 00:41:58,980 --> 00:42:01,110 So we want a mask that has a 1 in only 780 00:42:01,110 --> 00:42:04,200 the position of the least significant 1 in x, and 0's 781 00:42:04,200 --> 00:42:06,160 everywhere else. 782 00:42:06,160 --> 00:42:08,750 So how can we do this? 783 00:42:08,750 --> 00:42:11,210 So we can set r, the result, equal to x 784 00:42:11,210 --> 00:42:12,350 ANDed with negative x. 785 00:42:15,730 --> 00:42:18,550 So let's look at why this works. 786 00:42:18,550 --> 00:42:20,260 So here is x. 787 00:42:20,260 --> 00:42:27,200 And recall negative x is the one's complement of x plus 1.
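In C the mask is a one-line expression. The sketch below assumes a 64-bit unsigned word and uses a hypothetical function name `lsb_mask`; for unsigned types, `-x` is defined to equal `~x + 1`, the bitwise complement plus one.

```c
#include <stdint.h>

/* Mask with a 1 only at the position of the least-significant 1 in x.
 * Returns 0 when x is 0. The name is an illustrative assumption. */
static inline uint64_t lsb_mask(uint64_t x) {
    /* -x == ~x + 1: the bits above the lowest 1 are flipped, the
     * lowest 1 and the 0's below it are unchanged, so the AND keeps
     * only that lowest 1. */
    return x & -x;
}
```

For example, with x equal to 0b01010000 the result is 0b00010000.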
788 00:42:27,200 --> 00:42:33,670 So what we do is we flip all of the bits up to the rightmost 1 789 00:42:33,670 --> 00:42:36,160 but not including it, and then we just copy all of the bits 790 00:42:36,160 --> 00:42:36,680 over. 791 00:42:36,680 --> 00:42:40,770 That's how we get negative x from x. 792 00:42:40,770 --> 00:42:43,570 And then now when we compare x and negative x, 793 00:42:43,570 --> 00:42:47,320 we see that all of the bits when we AND them together 794 00:42:47,320 --> 00:42:52,420 are going to be 0 except for the bit at the position 795 00:42:52,420 --> 00:42:55,420 corresponding to the least significant 1 bit in x. 796 00:42:55,420 --> 00:42:58,540 And that's going to be 1 since we're ANDing 1 and 1, 797 00:42:58,540 --> 00:43:00,730 and everything else is going to be 0. 798 00:43:00,730 --> 00:43:02,740 And this will give us the mask that we want. 799 00:43:06,860 --> 00:43:10,520 So this works because the binary representation of minus x 800 00:43:10,520 --> 00:43:13,450 is just the one's complement of x plus 1. 801 00:43:18,916 --> 00:43:23,390 So now, a question is how can we find the index of this bit? 802 00:43:23,390 --> 00:43:26,150 So here I'm just generating a mask that has a 1 803 00:43:26,150 --> 00:43:31,310 in the least significant 1 in x, but it doesn't actually 804 00:43:31,310 --> 00:43:33,140 tell me the index of this bit. 805 00:43:33,140 --> 00:43:36,950 In other words, I want to find the log base 2 of a power of 2. 806 00:43:40,110 --> 00:43:42,090 So that's the problem we want to solve, 807 00:43:42,090 --> 00:43:46,170 and here's some code that lets us do this. 808 00:43:46,170 --> 00:43:49,260 So we have this constant called the de Bruijn. 809 00:43:49,260 --> 00:43:51,770 It's written in hex here. 810 00:43:51,770 --> 00:43:56,850 And then we have this table of size 64 called convert. 
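The slide's constant and table are not captured in the transcript, so the sketch below rests on assumptions: the constant shown is a commonly cited 64-bit de Bruijn sequence, and rather than hard-coding the 64-entry `convert` table, a hypothetical helper `init_convert` derives it from the constant. The lookup itself multiplies by the constant, shifts right by 58 to keep the top 6 bits, and indexes the table.

```c
#include <stdint.h>

/* Assumed constant: a commonly cited 64-bit de Bruijn sequence.
 * Every 6-bit window of it (read left to right) is distinct. */
static const uint64_t DEBRUIJN = 0x022fdd63cc95386dULL;

static int convert[64];

/* Hypothetical helper: fill the table so that the 6-bit window
 * exposed by shifting left by i maps back to i. */
static void init_convert(void) {
    for (int i = 0; i < 64; i++) {
        convert[(DEBRUIJN << i) >> 58] = i;
    }
}

/* Log base 2 of x, where x must be a power of 2 (exactly one bit set).
 * Multiplying by 2^i is a left shift by i, so the top 6 bits of the
 * product are the i-th window of the de Bruijn sequence. */
static inline int log2_pow2(uint64_t x) {
    return convert[(x * DEBRUIJN) >> 58];
}
```

`init_convert` must be called once before the first lookup; combined with the least-significant-1 mask, `log2_pow2(x & -x)` gives the index of the lowest set bit of a nonzero x.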
811 00:43:56,850 --> 00:44:00,600 And now all we have to do is multiply x by this de Bruijn 812 00:44:00,600 --> 00:44:04,140 constant, right shift it by 58 positions, 813 00:44:04,140 --> 00:44:06,750 and then look up the result in the convert table. 814 00:44:06,750 --> 00:44:10,850 And that's going to give us the log base 2 of the power of 2. 815 00:44:10,850 --> 00:44:12,078 Any questions? 816 00:44:12,078 --> 00:44:16,380 [STUDENTS LAUGH] 817 00:44:18,800 --> 00:44:21,810 So this looks like magic to us. 818 00:44:21,810 --> 00:44:25,340 So in the spirit of magic, we're going to do a mathemagic trick. 819 00:44:25,340 --> 00:44:29,240 And to do this trick, I'm going to need five volunteers, 820 00:44:29,240 --> 00:44:30,740 and the only requirement is that you 821 00:44:30,740 --> 00:44:33,080 need to be able to follow directions. 822 00:44:33,080 --> 00:44:35,900 So who wants to volunteer for this magic trick? 823 00:44:35,900 --> 00:44:42,650 Yes, 1, 2, 3, 4-- 824 00:44:42,650 --> 00:44:45,500 one more-- 5. 825 00:44:45,500 --> 00:44:47,770 All right, come on up. 826 00:44:47,770 --> 00:44:49,103 So line up here. 827 00:44:49,103 --> 00:44:52,490 [STUDENTS APPLAUD] 828 00:44:52,490 --> 00:44:54,430 Yes, just line up right here. 829 00:45:03,843 --> 00:45:05,635 Can you move a little bit over to the left? 830 00:45:08,190 --> 00:45:08,780 OK, cool. 831 00:45:08,780 --> 00:45:13,790 So today I have the pleasure of welcoming Jess Ray, also known 832 00:45:13,790 --> 00:45:16,970 as The Golden Raytio, to join us for a lecture 833 00:45:16,970 --> 00:45:19,650 and help us perform this cool magic trick. 834 00:45:19,650 --> 00:45:21,984 So let's give her a round of applause. 835 00:45:21,984 --> 00:45:24,952 [STUDENTS APPLAUD] 836 00:45:24,952 --> 00:45:27,410 JESS RAY: I'm going to be doing a little bit of magic trick 837 00:45:27,410 --> 00:45:28,670 for you all today. 838 00:45:28,670 --> 00:45:31,580 I'm going to be reading your guys' minds. 
839 00:45:31,580 --> 00:45:33,080 And I know you're looking skeptical, 840 00:45:33,080 --> 00:45:36,397 but I'm hoping I can convince you here. 841 00:45:36,397 --> 00:45:37,980 So we'll get to that part in a second. 842 00:45:37,980 --> 00:45:41,330 But, first, the first big step in reading minds 843 00:45:41,330 --> 00:45:43,100 is you got to clear the air, like get 844 00:45:43,100 --> 00:45:45,770 rid of all the negative vibes, all the bad energy. 845 00:45:45,770 --> 00:45:46,430 Throw that out. 846 00:45:46,430 --> 00:45:48,690 So I'm going to need a little help from you guys 847 00:45:48,690 --> 00:45:50,270 in doing this. 848 00:45:50,270 --> 00:45:52,650 So, first, we have this sweet little bell here. 849 00:45:52,650 --> 00:45:53,150 Let's see. 850 00:45:53,150 --> 00:45:54,097 Who wants the bell? 851 00:45:54,097 --> 00:45:55,430 AUDIENCE: I'll take it, I guess. 852 00:45:55,430 --> 00:45:55,760 JESS RAY: All right. 853 00:45:55,760 --> 00:45:56,900 Can you hold that for a second? 854 00:45:56,900 --> 00:45:58,358 So what this bell is going to do is 855 00:45:58,358 --> 00:46:00,500 help us get rid of some of those negative ideas. 856 00:46:00,500 --> 00:46:02,420 Can you give it a ring? 857 00:46:02,420 --> 00:46:03,470 Oh yes. 858 00:46:03,470 --> 00:46:06,080 So that painful ringing you're hearing in your ears right now 859 00:46:06,080 --> 00:46:08,060 is actually just clearing up the air for us, 860 00:46:08,060 --> 00:46:10,410 making it so I can read your minds. 861 00:46:10,410 --> 00:46:10,910 Thank you. 862 00:46:10,910 --> 00:46:11,430 Stop that. 863 00:46:15,070 --> 00:46:16,210 All right. 864 00:46:16,210 --> 00:46:19,480 Next we have this magic tone here. 865 00:46:19,480 --> 00:46:21,227 Who would like to give this a spin? 866 00:46:21,227 --> 00:46:23,060 Can you shake that around a couple of times? 867 00:46:23,060 --> 00:46:23,852 Spin it. 868 00:46:23,852 --> 00:46:26,560 Spin it with your wrist there, like-- you can go like this. 
869 00:46:26,560 --> 00:46:28,900 There we go. 870 00:46:28,900 --> 00:46:29,680 All right. 871 00:46:29,680 --> 00:46:30,410 Perfect. 872 00:46:30,410 --> 00:46:30,910 All right. 873 00:46:30,910 --> 00:46:32,770 It's feeling a little clearer here. 874 00:46:32,770 --> 00:46:36,040 I can start-- you can start getting things off your mind. 875 00:46:36,040 --> 00:46:38,920 Don't worry, I won't tell anybody what you're thinking. 876 00:46:38,920 --> 00:46:41,160 Oh, let's see what else. 877 00:46:41,160 --> 00:46:42,850 Let me channel the spirits. 878 00:46:42,850 --> 00:46:45,310 Help me out here. 879 00:46:45,310 --> 00:46:47,210 All right, I'm feeling good. 880 00:46:47,210 --> 00:46:47,710 All right. 881 00:46:47,710 --> 00:46:51,600 So what we're going to be doing is, as I said, 882 00:46:51,600 --> 00:46:52,350 reading your mind. 883 00:46:52,350 --> 00:46:54,500 I'm going to be doing this by giving you cards, 884 00:46:54,500 --> 00:46:56,250 and I'm going to tell you what each of you 885 00:46:56,250 --> 00:46:57,520 are holding for the card. 886 00:46:57,520 --> 00:47:00,090 So I have some cards here. 887 00:47:00,090 --> 00:47:02,440 Well, I guess these are a little small. 888 00:47:02,440 --> 00:47:03,450 Let's see. 889 00:47:03,450 --> 00:47:05,432 Go a little bigger. 890 00:47:05,432 --> 00:47:06,680 Meh. 891 00:47:06,680 --> 00:47:07,850 Here we go. 892 00:47:07,850 --> 00:47:12,430 Let's-- this looks better. 893 00:47:12,430 --> 00:47:13,670 All right. 894 00:47:13,670 --> 00:47:17,380 These are kind of heavy. 895 00:47:17,380 --> 00:47:22,270 Get rid of these junk ones up here, all the junk. 896 00:47:22,270 --> 00:47:23,080 All right. 897 00:47:23,080 --> 00:47:25,690 So I need your help for this. 898 00:47:25,690 --> 00:47:27,898 So what I want you to do is take the cards 899 00:47:27,898 --> 00:47:29,690 and cut the deck as many times as you want. 900 00:47:29,690 --> 00:47:32,230 So, basically, just going like that however much. 
901 00:47:32,230 --> 00:47:34,810 Just don't actually shuffle them randomly. 902 00:47:37,239 --> 00:47:38,072 AUDIENCE: All right. 903 00:47:38,072 --> 00:47:38,572 Here you go. 904 00:47:38,572 --> 00:47:40,237 JESS RAY: All right, cool. 905 00:47:40,237 --> 00:47:42,070 So now I'm going to hand each of you a card. 906 00:47:42,070 --> 00:47:42,910 Don't let me see it. 907 00:47:42,910 --> 00:47:43,910 Feel free to look at it. 908 00:47:50,176 --> 00:47:52,120 There you go. 909 00:47:52,120 --> 00:47:52,660 All right. 910 00:47:52,660 --> 00:47:55,900 So the reason I'm wearing this awesome onesie 911 00:47:55,900 --> 00:47:59,050 is this helps me sweat out the bad energy. 912 00:47:59,050 --> 00:48:01,000 I'm literally sweating right now. 913 00:48:01,000 --> 00:48:03,580 But there's one more piece that we need for this mind reading 914 00:48:03,580 --> 00:48:04,080 trick. 915 00:48:07,590 --> 00:48:09,540 The magic hat. 916 00:48:09,540 --> 00:48:10,590 All right. 917 00:48:10,590 --> 00:48:12,180 See if this fits on my head. 918 00:48:12,180 --> 00:48:12,948 There we go. 919 00:48:12,948 --> 00:48:13,740 Where's the switch? 920 00:48:13,740 --> 00:48:14,240 All right. 921 00:48:14,240 --> 00:48:16,020 Turn it on. 922 00:48:16,020 --> 00:48:17,730 All right, I'm feeling good here. 923 00:48:17,730 --> 00:48:19,960 All right, you guys ready? 924 00:48:19,960 --> 00:48:20,460 All right. 925 00:48:20,460 --> 00:48:23,610 So I do need a little help getting this trick started. 926 00:48:23,610 --> 00:48:25,582 So if you are holding a red card, 927 00:48:25,582 --> 00:48:26,790 can you just raise your hand? 928 00:48:29,580 --> 00:48:31,202 So no? 929 00:48:31,202 --> 00:48:32,160 Who's got the red card? 930 00:48:32,160 --> 00:48:33,720 Red, red. 931 00:48:33,720 --> 00:48:34,980 You don't have red? 932 00:48:34,980 --> 00:48:36,070 OK. 933 00:48:36,070 --> 00:48:36,570 All right. 934 00:48:36,570 --> 00:48:39,150 So the first one and the third one. 
935 00:48:39,150 --> 00:48:40,588 All right. 936 00:48:40,588 --> 00:48:42,630 So let me handle the mind reading abilities here. 937 00:48:42,630 --> 00:48:44,550 Now what I'm going to do is I'm going to go left to right 938 00:48:44,550 --> 00:48:45,420 and tell you what you're holding. 939 00:48:45,420 --> 00:48:48,210 Obviously, I know the color, but I'll tell you what suit it is, 940 00:48:48,210 --> 00:48:52,390 and also I will tell you what the number is. 941 00:48:52,390 --> 00:48:55,115 So first card, obviously I know you have a red. 942 00:48:55,115 --> 00:48:56,910 Hmm. 943 00:48:56,910 --> 00:49:00,403 I'm feeling a diamond and also a four? 944 00:49:00,403 --> 00:49:01,320 AUDIENCE: That was it. 945 00:49:01,320 --> 00:49:02,550 JESS RAY: Yes. 946 00:49:02,550 --> 00:49:05,170 All right. 947 00:49:05,170 --> 00:49:05,670 All right. 948 00:49:05,670 --> 00:49:06,870 Good start, good start. 949 00:49:09,390 --> 00:49:09,900 All right. 950 00:49:09,900 --> 00:49:13,395 Got to-- got to think about what the next one is here. 951 00:49:15,820 --> 00:49:16,320 All right. 952 00:49:16,320 --> 00:49:19,800 So I know you had a black card. 953 00:49:19,800 --> 00:49:20,610 Let's see. 954 00:49:20,610 --> 00:49:24,840 Black of spades. 955 00:49:24,840 --> 00:49:27,300 Is it the ace of spades? 956 00:49:27,300 --> 00:49:28,120 Oh yes. 957 00:49:28,120 --> 00:49:28,930 There we go. 958 00:49:33,840 --> 00:49:34,340 All right. 959 00:49:34,340 --> 00:49:39,060 So back to red. 960 00:49:39,060 --> 00:49:39,560 All right. 961 00:49:39,560 --> 00:49:41,210 This one, let's see. 962 00:49:41,210 --> 00:49:45,710 Red, diamond, two. 963 00:49:45,710 --> 00:49:46,630 All right, all right. 964 00:49:46,630 --> 00:49:47,722 We're doing good so far. 965 00:49:47,722 --> 00:49:48,680 Can I get the last two? 966 00:49:51,652 --> 00:49:53,360 All right, let's see what we can do here. 967 00:49:53,360 --> 00:49:59,750 All right, black, club, four. 
968 00:49:59,750 --> 00:50:00,350 All right. 969 00:50:00,350 --> 00:50:04,550 Last one, last one. 970 00:50:04,550 --> 00:50:06,160 All right. 971 00:50:06,160 --> 00:50:07,730 Oh, it's going to be a tough one. 972 00:50:10,250 --> 00:50:16,167 Black, spade, eight. 973 00:50:16,167 --> 00:50:19,999 [STUDENTS APPLAUD] 974 00:50:21,440 --> 00:50:23,830 And if we had time, I could mystify you 975 00:50:23,830 --> 00:50:27,650 and go through the rest of the deck, but we won't do that. 976 00:50:27,650 --> 00:50:29,290 So thank you guys very much. 977 00:50:29,290 --> 00:50:32,390 I hope your minds were blown. 978 00:50:32,390 --> 00:50:32,890 Yes. 979 00:50:32,890 --> 00:50:35,580 So let me collect the cards back from you. 980 00:50:39,747 --> 00:50:41,430 Thank you. 981 00:50:41,430 --> 00:50:41,930 All right. 982 00:50:41,930 --> 00:50:42,430 Thank you. 983 00:50:42,430 --> 00:50:45,022 Now I can get out of this and stop sweating. 984 00:50:45,022 --> 00:50:51,688 [STUDENTS APPLAUD] 985 00:50:51,688 --> 00:50:53,230 JULIAN SHUN: It's pretty cool, right? 986 00:50:56,570 --> 00:50:58,140 So why does this actually work? 987 00:51:00,900 --> 00:51:02,790 To know why this trick actually works, 988 00:51:02,790 --> 00:51:07,830 we need to first study what a de Bruijn sequence is. 989 00:51:07,830 --> 00:51:10,680 So a de Bruijn sequence s of length 2 990 00:51:10,680 --> 00:51:15,270 to the k is a cyclic bit sequence such that each 991 00:51:15,270 --> 00:51:20,730 of the 2 to the k possible bit strings of length k 992 00:51:20,730 --> 00:51:25,470 occurs exactly once as a substring in s. 993 00:51:25,470 --> 00:51:27,330 So this is a pretty long definition, so let's 994 00:51:27,330 --> 00:51:28,930 look at an example. 995 00:51:28,930 --> 00:51:32,950 So here is a de Bruijn sequence for k equals 3. 996 00:51:32,950 --> 00:51:38,830 So the length of this sequence is 8 because 2 to the 3 is 8.
997 00:51:38,830 --> 00:51:44,740 And you can see that each of the possible three-bit substrings 998 00:51:44,740 --> 00:51:50,350 occurs exactly once in this cyclic bit string of length 8. 999 00:51:50,350 --> 00:51:52,480 So it wraps around and you can consider 1000 00:51:52,480 --> 00:51:55,810 this as a cyclic string. 1001 00:51:55,810 --> 00:52:00,100 So we see that 000 appears at position 0. 1002 00:52:00,100 --> 00:52:03,055 001 is at position 1. 1003 00:52:03,055 --> 00:52:05,740 Then 010 is at position 6. 1004 00:52:05,740 --> 00:52:10,660 011 is at position 2. 1005 00:52:10,660 --> 00:52:13,060 100 is at position 7. 1006 00:52:13,060 --> 00:52:15,250 101 is at 5. 1007 00:52:15,250 --> 00:52:17,180 110 is at 4. 1008 00:52:17,180 --> 00:52:19,190 And then 111 is at 3. 1009 00:52:19,190 --> 00:52:23,950 So all of the 8 possible substrings of length 3 1010 00:52:23,950 --> 00:52:26,320 occur exactly once in this de Bruijn sequence. 1011 00:52:29,920 --> 00:52:36,520 So now we're going to create this convert table of length 8. 1012 00:52:36,520 --> 00:52:38,410 In general, this will be 2 to the k. 1013 00:52:38,410 --> 00:52:40,010 And here, k is 3. 1014 00:52:40,010 --> 00:52:45,310 And in this convert table, what we're storing in each position 1015 00:52:45,310 --> 00:52:48,430 is the index in the de Bruijn sequence 1016 00:52:48,430 --> 00:52:51,730 where the bit string corresponding to that position 1017 00:52:51,730 --> 00:52:54,110 starts in the de Bruijn sequence. 1018 00:52:54,110 --> 00:53:00,220 So here we see that convert of 2 is 6 because the bit string 1019 00:53:00,220 --> 00:53:04,210 corresponding to 2 is 010, and that begins at position 6 1020 00:53:04,210 --> 00:53:06,370 in the de Bruijn sequence. 1021 00:53:06,370 --> 00:53:11,770 We also see that convert of 4 is 7 because 4 is 100, 1022 00:53:11,770 --> 00:53:14,530 and that begins at position 7 in the de Bruijn sequence. 1023 00:53:17,590 --> 00:53:21,310 Now we have this convert table. 
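The convert table described above can be derived mechanically from the sequence. The following is a minimal sketch, assuming the k = 3 sequence is 00011101 with position 0 as the leftmost bit (this matches the positions read out in the lecture); the names `DEBRUIJN_K3` and `build_convert` are hypothetical and not taken from the lecture slides.

```c
#include <stdint.h>

/* Assumed k = 3 de Bruijn sequence, position 0 = leftmost bit: 00011101. */
#define DEBRUIJN_K3 0x1Du

/* Fills convert[v] with the position at which the 3-bit string v starts
   in the cyclic sequence. The function name is hypothetical. */
void build_convert(int convert[8]) {
  for (int pos = 0; pos < 8; pos++) {
    /* Rotate left by pos within 8 bits so that the substring starting at
       pos (including any wraparound) lands in the top 3 bits. */
    uint8_t rotated =
        (uint8_t)((DEBRUIJN_K3 << pos) | (DEBRUIJN_K3 >> (8 - pos)));
    convert[rotated >> 5] = pos;
  }
}
```

Because every 3-bit string occurs exactly once in the cyclic sequence, each of the 8 table entries is written exactly once.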
1024 00:53:21,310 --> 00:53:23,230 And recall that we're trying to compute 1025 00:53:23,230 --> 00:53:25,490 the log base 2 of a power of 2. 1026 00:53:25,490 --> 00:53:27,670 So hopefully you guys remember that. 1027 00:53:31,250 --> 00:53:33,340 So the way to do this is we're going 1028 00:53:33,340 --> 00:53:36,880 to multiply the de Bruijn sequence 1029 00:53:36,880 --> 00:53:39,410 constant by this power of 2. 1030 00:53:39,410 --> 00:53:41,860 So let's say we're working with the integer 1031 00:53:41,860 --> 00:53:43,480 16, which is 2 to the 4. 1032 00:53:43,480 --> 00:53:46,100 So we're going to multiply this de Bruijn sequence by 2 1033 00:53:46,100 --> 00:53:47,420 to the 4. 1034 00:53:47,420 --> 00:53:49,480 And when we multiply by a power of 2, 1035 00:53:49,480 --> 00:53:52,810 that's the same as left shifting. 1036 00:53:52,810 --> 00:53:55,300 So that's going to left shift the de Bruijn sequence four 1037 00:53:55,300 --> 00:53:58,660 positions to the left. 1038 00:53:58,660 --> 00:54:01,810 And then now we want to see which of the eight 1039 00:54:01,810 --> 00:54:05,480 possible substrings appears at the beginning of this sequence. 1040 00:54:05,480 --> 00:54:08,320 And after we do the left shift, 110 1041 00:54:08,320 --> 00:54:11,945 appears at the beginning of the sequence. 1042 00:54:11,945 --> 00:54:13,570 And we want to extract this out, and we 1043 00:54:13,570 --> 00:54:17,440 can do that by right shifting five positions. 1044 00:54:17,440 --> 00:54:21,670 And 110 is just 6. 1045 00:54:21,670 --> 00:54:25,060 And we can figure out where 6 starts in this de Bruijn 1046 00:54:25,060 --> 00:54:27,820 sequence by looking it up in the convert table. 1047 00:54:27,820 --> 00:54:30,910 We see that convert of 6 is 4.
1048 00:54:30,910 --> 00:54:36,040 So the string 110 appears starting 1049 00:54:36,040 --> 00:54:38,980 at position 4 in the de Bruijn sequence, 1050 00:54:38,980 --> 00:54:41,590 and that means that we did a left shift by 4 1051 00:54:41,590 --> 00:54:44,710 in the first step, and that gives us the log base 1052 00:54:44,710 --> 00:54:47,680 2 of the power of 2, because the only reason why 1053 00:54:47,680 --> 00:54:51,910 we did a left shift by 4 is because the power of 2 1054 00:54:51,910 --> 00:54:54,820 was 2 to the 4. 1055 00:54:54,820 --> 00:54:58,300 So this returns us the log base 2 of the integer 1056 00:54:58,300 --> 00:54:59,320 that we started with. 1057 00:55:02,660 --> 00:55:04,960 And one thing to note is that it's 1058 00:55:04,960 --> 00:55:08,890 important to start with all 0's in this sequence here, 1059 00:55:08,890 --> 00:55:14,260 because we're representing this as a cyclic bit sequence. 1060 00:55:14,260 --> 00:55:16,390 So when we do a left shift, we need 1061 00:55:16,390 --> 00:55:20,350 to make sure that the values that fill in on the right side 1062 00:55:20,350 --> 00:55:21,190 are correct. 1063 00:55:21,190 --> 00:55:25,750 So notice that in the sixth and seventh positions, 1064 00:55:25,750 --> 00:55:30,100 we need 0's at the end when we overflow. 1065 00:55:30,100 --> 00:55:32,020 So because the de Bruijn sequence 1066 00:55:32,020 --> 00:55:34,540 starts with all 0's, when we do the left shift, 1067 00:55:34,540 --> 00:55:36,760 it's automatically filling with 0's, giving us 1068 00:55:36,760 --> 00:55:39,190 the correct substring. 1069 00:55:39,190 --> 00:55:44,020 So the magic trick that Jess did had 32 cards, and in that 1070 00:55:44,020 --> 00:55:46,750 case k was equal to 5. 1071 00:55:46,750 --> 00:55:50,500 And the cards were arranged according to a de Bruijn 1072 00:55:50,500 --> 00:55:53,500 sequence of length 32. 
1073 00:55:53,500 --> 00:55:55,750 And each of the cards corresponded 1074 00:55:55,750 --> 00:56:00,700 to one particular bit string of length 5. 1075 00:56:00,700 --> 00:56:04,570 And the color of the card corresponded to the bit. 1076 00:56:04,570 --> 00:56:08,950 So when she asked you what the color of your card was, 1077 00:56:08,950 --> 00:56:11,740 she could determine the bits corresponding 1078 00:56:11,740 --> 00:56:15,040 to the first card in the sequence 1079 00:56:15,040 --> 00:56:19,630 because she has the 5 bits corresponding to that card. 1080 00:56:19,630 --> 00:56:21,430 And then with that she has some clever way 1081 00:56:21,430 --> 00:56:24,340 to determine the rest of the cards. 1082 00:56:24,340 --> 00:56:26,650 So that's how the de Bruijn sequence 1083 00:56:26,650 --> 00:56:28,930 is related to the magic trick that you just saw. 1084 00:56:33,940 --> 00:56:35,530 Any questions? 1085 00:56:35,530 --> 00:56:36,030 Yes. 1086 00:56:36,030 --> 00:56:37,405 AUDIENCE: The de Bruijn sequence, 1087 00:56:37,405 --> 00:56:40,610 do you need to do cyclic translation? 1088 00:56:40,610 --> 00:56:43,110 JULIAN SHUN: So there could be multiple de Bruijn sequences. 1089 00:56:43,110 --> 00:56:45,660 We just need one particular de Bruijn sequence 1090 00:56:45,660 --> 00:56:48,444 to make this bit trick work. 1091 00:56:48,444 --> 00:56:48,944 Yes. 1092 00:56:52,210 --> 00:56:55,520 So this example is just for k equals 3. 1093 00:56:55,520 --> 00:56:59,890 And the code I showed you before, that was for k 1094 00:56:59,890 --> 00:57:03,910 equals 6, so you can do up to 64-bit words. 1095 00:57:03,910 --> 00:57:04,420 Yes. 1096 00:57:04,420 --> 00:57:07,525 AUDIENCE: How do we know that the sequence exists? 1097 00:57:07,525 --> 00:57:09,400 JULIAN SHUN: So there is a mathematical proof 1098 00:57:09,400 --> 00:57:11,200 that says that.
1099 00:57:11,200 --> 00:57:13,300 I can give you some pointers so that you 1100 00:57:13,300 --> 00:57:14,500 can look at it after class. 1101 00:57:14,500 --> 00:57:18,130 But there's a proof that says that for any length 1102 00:57:18,130 --> 00:57:19,536 there is a de Bruijn sequence. 1103 00:57:22,470 --> 00:57:23,233 Yes. 1104 00:57:23,233 --> 00:57:24,900 AUDIENCE: Sorry, I missed the procedure. 1105 00:57:24,900 --> 00:57:27,870 So how exactly do you determine the log base 2? 1106 00:57:31,650 --> 00:57:32,910 JULIAN SHUN: So we have-- 1107 00:57:32,910 --> 00:57:37,230 we're starting with some integer that is a power of 2. 1108 00:57:37,230 --> 00:57:40,340 So when we multiply by that power of 2, 1109 00:57:40,340 --> 00:57:44,460 it's left-shifting by the log base 2 of that. 1110 00:57:44,460 --> 00:57:48,810 And then we can determine how much we left-shifted because we 1111 00:57:48,810 --> 00:57:50,760 know-- 1112 00:57:50,760 --> 00:57:53,940 we can just look at the first three bits of this sequence 1113 00:57:53,940 --> 00:57:55,980 after we did the left shift, and then 1114 00:57:55,980 --> 00:57:58,950 look at where that three-bit sequence appears 1115 00:57:58,950 --> 00:58:04,570 in the original de Bruijn sequence before we shifted it. 1116 00:58:04,570 --> 00:58:07,560 And to do that, you can look it up in the convert table. 1117 00:58:07,560 --> 00:58:11,455 This is what we did when we looked up the bit string 110 1118 00:58:11,455 --> 00:58:12,330 in the convert table. 1119 00:58:12,330 --> 00:58:15,330 And that tells us that it starts in the fourth position. 1120 00:58:15,330 --> 00:58:18,240 That means that we left-shifted by 4, 1121 00:58:18,240 --> 00:58:23,610 and that means that the value of n was 2 to the 4. 1122 00:58:23,610 --> 00:58:25,570 Does that make sense? 1123 00:58:25,570 --> 00:58:26,070 Yes. 
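The procedure just recapped (multiply, take the top three bits, look them up) can be sketched for the k = 3 example as follows. This is an illustration on 8-bit words, not the 64-bit code from the slides; the names `debruijn8`, `convert8`, and `log2_pow2_u8` are hypothetical, and the table values are the positions listed in the lecture for the sequence 00011101.

```c
#include <stdint.h>

/* k = 3 de Bruijn sequence 00011101 and its convert table (names assumed). */
static const uint8_t debruijn8 = 0x1D;
static const int convert8[8] = {0, 1, 6, 2, 7, 5, 4, 3};

/* Returns log base 2 of x. Only valid when x is a power of 2 (1, 2, ..., 128). */
int log2_pow2_u8(uint8_t x) {
  /* Multiplying by a power of 2 left-shifts the sequence. The cast keeps only
     the low 8 bits, so bits shifted past the top are dropped and zeros fill in
     on the right. The top 3 bits then identify how far we shifted. */
  return convert8[(uint8_t)(x * debruijn8) >> 5];
}
```

For x = 16 the product truncated to 8 bits is 11010000, the top three bits are 110 (6), and `convert8[6]` is 4, matching the walkthrough. A 64-bit version would follow the same shape with k = 6 and a right shift by 58.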
1124 00:58:26,070 --> 00:58:27,903 AUDIENCE: So just to clarify this only works 1125 00:58:27,903 --> 00:58:30,290 if you multiply the sequence by a power of 2, 1126 00:58:30,290 --> 00:58:32,332 then it gives you back which power of 2 it was? 1127 00:58:32,332 --> 00:58:33,040 JULIAN SHUN: Yes. 1128 00:58:33,040 --> 00:58:36,130 So this only works if you're starting with a power of 2. 1129 00:58:36,130 --> 00:58:38,760 So if it's not a power of 2, this doesn't work. 1130 00:58:46,125 --> 00:58:48,089 Any other questions? 1131 00:58:51,526 --> 00:58:52,030 Yes. 1132 00:58:52,030 --> 00:58:54,030 So if it's not a power of 2, you can round it up 1133 00:58:54,030 --> 00:58:56,150 to the nearest power of 2 using another bit 1134 00:58:56,150 --> 00:58:57,620 trick that we saw earlier. 1135 00:58:57,620 --> 00:58:59,420 And then you can use this bit trick here. 1136 00:59:02,430 --> 00:59:05,250 The performance of this bit trick 1137 00:59:05,250 --> 00:59:07,890 is limited by the performance of multiplication and table 1138 00:59:07,890 --> 00:59:08,700 lookup. 1139 00:59:08,700 --> 00:59:11,850 So you have to do a multiplication 1140 00:59:11,850 --> 00:59:14,490 by some constant, and then you have 1141 00:59:14,490 --> 00:59:17,460 to do table lookup in this convert table. 1142 00:59:17,460 --> 00:59:20,190 So a table lookup does a memory reference, 1143 00:59:20,190 --> 00:59:21,990 which could be expensive. 1144 00:59:21,990 --> 00:59:24,900 And nowadays there's actually a hardware instruction 1145 00:59:24,900 --> 00:59:26,640 to compute this, so you don't actually 1146 00:59:26,640 --> 00:59:28,680 have to implement this trick. 1147 00:59:28,680 --> 00:59:30,450 But this trick is still pretty cool. 1148 00:59:30,450 --> 00:59:33,000 And in the past this is how you would do it 1149 00:59:33,000 --> 00:59:35,940 before there was a hardware instruction that came out. 1150 00:59:41,120 --> 00:59:42,890 So let's look at another problem. 
1151 00:59:42,890 --> 00:59:45,120 So this is the n queens problem. 1152 00:59:45,120 --> 00:59:46,780 How many of you have seen this before? 1153 00:59:46,780 --> 00:59:47,280 Yes. 1154 00:59:47,280 --> 00:59:49,970 So many of you have seen this before. 1155 00:59:49,970 --> 00:59:52,250 As a reminder, we're trying to place n queens 1156 00:59:52,250 --> 00:59:57,350 on an n by n chessboard so that no queen attacks another queen. 1157 00:59:57,350 --> 00:59:59,030 In other words, there are no two queens 1158 00:59:59,030 --> 01:00:03,110 in any row, any column, or any diagonal. 1159 01:00:03,110 --> 01:00:04,940 And, commonly, we want to count the number 1160 01:00:04,940 --> 01:00:08,210 of possible solutions to the n queens problem 1161 01:00:08,210 --> 01:00:10,760 for a particular value of n. 1162 01:00:10,760 --> 01:00:14,930 And in this example here, this is a valid configuration. 1163 01:00:14,930 --> 01:00:17,270 You can check, for each of the queens, 1164 01:00:17,270 --> 01:00:19,460 they can't attack any other queen on the board. 1165 01:00:23,450 --> 01:00:26,450 So one common strategy for implementing the n queens 1166 01:00:26,450 --> 01:00:29,090 algorithm is to use backtracking. 1167 01:00:29,090 --> 01:00:31,440 We're going to try placing queens row by row. 1168 01:00:31,440 --> 01:00:33,620 We know that there can only be one queen per row, 1169 01:00:33,620 --> 01:00:36,680 so we just need to determine which position in that row 1170 01:00:36,680 --> 01:00:38,150 the queen will appear in. 1171 01:00:38,150 --> 01:00:40,490 And then if we can't place a queen in any row, 1172 01:00:40,490 --> 01:00:43,820 then we backtrack. 1173 01:00:43,820 --> 01:00:46,758 So, for example, in the first row, 1174 01:00:46,758 --> 01:00:48,800 we'll just place the queen in the first position, 1175 01:00:48,800 --> 01:00:50,480 because there's no queens on the board 1176 01:00:50,480 --> 01:00:53,390 yet, so the first position is valid. 
1177 01:00:53,390 --> 01:00:55,790 For the second row, we're going to try 1178 01:00:55,790 --> 01:00:59,990 to place in the first position, but we can't place it there 1179 01:00:59,990 --> 01:01:03,410 because then it will attack the first queen. 1180 01:01:03,410 --> 01:01:05,960 And then the second position is also invalid, 1181 01:01:05,960 --> 01:01:10,970 so the third position is where we place the second queen. 1182 01:01:10,970 --> 01:01:12,610 Now, for the third row we're going 1183 01:01:12,610 --> 01:01:15,890 to check the positions until we get to one that's valid, 1184 01:01:15,890 --> 01:01:18,170 and this is going to be the fifth position. 1185 01:01:21,160 --> 01:01:22,630 Do this again. 1186 01:01:22,630 --> 01:01:25,780 Here we can do it in the second position. 1187 01:01:25,780 --> 01:01:29,840 For the fifth row, let's see where this is going to end up. 1188 01:01:29,840 --> 01:01:30,340 OK. 1189 01:01:30,340 --> 01:01:33,130 So it goes in the fourth position. 1190 01:01:33,130 --> 01:01:34,794 What about the sixth row? 1191 01:01:44,290 --> 01:01:44,790 Whoops. 1192 01:01:44,790 --> 01:01:48,430 So all of the eight positions are invalid, 1193 01:01:48,430 --> 01:01:51,010 because if we place the queen in any of those positions, 1194 01:01:51,010 --> 01:01:53,817 it's going to attack one of the queens that we already placed. 1195 01:01:53,817 --> 01:01:55,150 So now we're going to backtrack. 1196 01:01:55,150 --> 01:01:59,040 We're going to find another position for the fifth queen. 1197 01:01:59,040 --> 01:02:01,313 So let's try some more positions. 1198 01:02:04,630 --> 01:02:07,170 So we can place it at the end. 1199 01:02:07,170 --> 01:02:08,274 Now we try again. 1200 01:02:16,820 --> 01:02:17,773 All right. 1201 01:02:17,773 --> 01:02:19,690 So, unfortunately, we couldn't find a position 1202 01:02:19,690 --> 01:02:21,635 for the sixth row again. 1203 01:02:21,635 --> 01:02:22,510 We have to backtrack. 
1204 01:02:22,510 --> 01:02:24,843 But we already tried all the positions in the fifth row, 1205 01:02:24,843 --> 01:02:27,400 so we backtrack to the fourth row. 1206 01:02:27,400 --> 01:02:29,350 And you get the idea. 1207 01:02:29,350 --> 01:02:31,600 And then whenever we find a configuration where 1208 01:02:31,600 --> 01:02:35,430 all eight queens are valid, then we increment some counter by 1. 1209 01:02:35,430 --> 01:02:37,600 And at the end we just return this counter, 1210 01:02:37,600 --> 01:02:40,220 which tells us the number of solutions to the n queens 1211 01:02:40,220 --> 01:02:40,720 puzzle. 1212 01:02:48,820 --> 01:02:51,430 So you can implement this quite easily using 1213 01:02:51,430 --> 01:02:53,170 a recursive procedure. 1214 01:02:53,170 --> 01:02:56,500 You can implement this backtracking search. 1215 01:02:56,500 --> 01:02:58,780 But one question is how should we 1216 01:02:58,780 --> 01:03:01,390 represent the board to facilitate efficient queen 1217 01:03:01,390 --> 01:03:03,580 placement? 1218 01:03:03,580 --> 01:03:06,010 So one way to represent the board 1219 01:03:06,010 --> 01:03:09,130 is to use an array of n squared bytes. 1220 01:03:09,130 --> 01:03:12,750 And for each byte, we just have a 1 1221 01:03:12,750 --> 01:03:17,365 if there is a queen in that position, and 0 otherwise. 1222 01:03:17,365 --> 01:03:19,240 Is there a better way to represent the board? 1223 01:03:27,032 --> 01:03:28,980 AUDIENCE: You can track all of the bits 1224 01:03:28,980 --> 01:03:31,415 such that a 1 bit represents a queen 1225 01:03:31,415 --> 01:03:34,350 at some place on the board? 1226 01:03:34,350 --> 01:03:35,710 JULIAN SHUN: Yes. 1227 01:03:35,710 --> 01:03:36,770 So that's a good answer. 1228 01:03:36,770 --> 01:03:39,400 So instead of using bytes, we can use bits, 1229 01:03:39,400 --> 01:03:41,470 because the value can only be 0 or 1. 1230 01:03:41,470 --> 01:03:43,430 We only need one bit to represent that. 
1231 01:03:43,430 --> 01:03:48,082 So we can just have an array of n squared bits. 1232 01:03:48,082 --> 01:03:50,074 Is there a better way to do this? 1233 01:03:56,550 --> 01:03:57,531 Yes. 1234 01:03:57,531 --> 01:04:00,420 AUDIENCE: You could just say in each row 1235 01:04:00,420 --> 01:04:02,192 where a queen is with a byte? 1236 01:04:02,192 --> 01:04:02,900 JULIAN SHUN: Yes. 1237 01:04:02,900 --> 01:04:03,840 So good answer. 1238 01:04:03,840 --> 01:04:07,380 So a better way to do this is to just use an array of n bytes. 1239 01:04:07,380 --> 01:04:11,130 Because we know that on each row there can only be one queen, 1240 01:04:11,130 --> 01:04:14,820 so we just need to store the position of that queen. 1241 01:04:14,820 --> 01:04:17,232 So we have an array of n bytes, one byte for each row, 1242 01:04:17,232 --> 01:04:19,440 and then you just use the byte to store the position 1243 01:04:19,440 --> 01:04:20,630 of the queen in that row. 1244 01:04:23,740 --> 01:04:25,490 It turns out, to implement this algorithm, 1245 01:04:25,490 --> 01:04:27,740 there's an even more compact representation, 1246 01:04:27,740 --> 01:04:32,360 which is to use three bit vectors of size n, 2n minus 1, 1247 01:04:32,360 --> 01:04:35,380 and 2n minus 1. 1248 01:04:35,380 --> 01:04:37,080 So let's see how this works. 1249 01:04:37,080 --> 01:04:40,520 So the first bit vector we're going to use is of length n. 1250 01:04:40,520 --> 01:04:43,450 We're going to call this the down vector. 1251 01:04:43,450 --> 01:04:45,620 And the down vector just stores a 1 1252 01:04:45,620 --> 01:04:48,800 in the columns that have a queen in it and 0 in the columns 1253 01:04:48,800 --> 01:04:49,580 that are empty. 1254 01:04:53,300 --> 01:04:57,170 And then when we want to check whether placing a queen 1255 01:04:57,170 --> 01:05:00,080 is safe in any position, we first 1256 01:05:00,080 --> 01:05:02,210 have to check whether that column is empty.
1257 01:05:02,210 --> 01:05:05,900 And you can do this by ANDing the down bit 1258 01:05:05,900 --> 01:05:09,440 vector with 1 left-shifted by c, where c is the column where 1259 01:05:09,440 --> 01:05:11,300 you want to place the queen. 1260 01:05:11,300 --> 01:05:13,100 And if that's nonzero, that means 1261 01:05:13,100 --> 01:05:17,030 there's already a queen in that column and you can't place it. 1262 01:05:17,030 --> 01:05:19,670 Otherwise, we're going to have to do another check, 1263 01:05:19,670 --> 01:05:23,870 and we're going to create this other bit vector called left. 1264 01:05:23,870 --> 01:05:27,941 The length of this bit vector is 2n minus 1. 1265 01:05:27,941 --> 01:05:31,250 And it stores a 1 in the diagonal that 1266 01:05:31,250 --> 01:05:33,230 has a queen in it, and 0's otherwise. 1267 01:05:33,230 --> 01:05:37,210 And there are 2n minus 1 possible diagonals. 1268 01:05:37,210 --> 01:05:38,960 And then now, when we want to place 1269 01:05:38,960 --> 01:05:41,780 a queen in row r and column c, we 1270 01:05:41,780 --> 01:05:47,090 can check whether it's safe by doing left ANDed with 1 1271 01:05:47,090 --> 01:05:49,310 left-shifted by r plus c. 1272 01:05:49,310 --> 01:05:51,680 And this is going to be nonzero if there is already 1273 01:05:51,680 --> 01:05:54,950 a queen in that particular diagonal. 1274 01:05:54,950 --> 01:05:57,140 So in that case, we can't place a queen there. 1275 01:05:57,140 --> 01:06:01,220 And, otherwise, we're going to do a final check using 1276 01:06:01,220 --> 01:06:04,610 this right bit vector, which is essentially the same 1277 01:06:04,610 --> 01:06:06,170 but we're looking at the diagonals 1278 01:06:06,170 --> 01:06:08,960 going down to the right. 1279 01:06:08,960 --> 01:06:12,980 So, again, we have a 1 in the diagonals that have a queen 1280 01:06:12,980 --> 01:06:14,700 and 0's otherwise.
1281 01:06:14,700 --> 01:06:17,960 And then now the check is going to be right ANDed with 1 1282 01:06:17,960 --> 01:06:23,120 left-shifted by n minus 1 minus r plus c. 1283 01:06:23,120 --> 01:06:25,850 And if a particular candidate passes all three 1284 01:06:25,850 --> 01:06:28,310 of these checks, then we know that there's not 1285 01:06:28,310 --> 01:06:30,560 going to be a conflict and we can place the queen 1286 01:06:30,560 --> 01:06:34,020 in that particular position. 1287 01:06:34,020 --> 01:06:36,022 So this is a bit vector representation. 1288 01:06:36,022 --> 01:06:37,730 You actually still have to write the code 1289 01:06:37,730 --> 01:06:40,850 to count the number of solutions using this bit vector 1290 01:06:40,850 --> 01:06:43,010 representation, and it's actually 1291 01:06:43,010 --> 01:06:44,600 an interesting exercise. 1292 01:06:44,600 --> 01:06:48,440 So I encourage you to try to do this at home. 1293 01:06:48,440 --> 01:06:51,010 But I just told you about the bit vector representation. 1294 01:06:51,010 --> 01:06:52,284 So any questions? 1295 01:06:55,510 --> 01:06:56,010 Yes. 1296 01:06:56,010 --> 01:06:59,890 AUDIENCE: Could you just repeat what the down vector bit 1297 01:06:59,890 --> 01:07:02,320 hack was for figuring out [INAUDIBLE]? 1298 01:07:02,320 --> 01:07:03,290 JULIAN SHUN: Yes. 1299 01:07:03,290 --> 01:07:05,550 So the down vector, it stores a 1 1300 01:07:05,550 --> 01:07:08,790 in the columns that have a queen in it and 0's otherwise. 1301 01:07:08,790 --> 01:07:13,310 And what you do is, if you want to place a queen in column c, 1302 01:07:13,310 --> 01:07:15,840 you first create the mask 1 left-shifted by c. 1303 01:07:15,840 --> 01:07:17,762 And then you AND it with the down vector. 1304 01:07:17,762 --> 01:07:19,470 And that's going to be nonzero if there's 1305 01:07:19,470 --> 01:07:20,660 a queen in that column. 1306 01:07:27,030 --> 01:07:28,920 Any other questions? 1307 01:07:28,920 --> 01:07:29,430 Yes.
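One possible way to combine the three checks with the row-by-row backtracking search is sketched below. It is not the lecture's reference solution: the function names `place` and `count_queens` are hypothetical, and the sketch assumes n is at most 32 so that the 2n minus 1 diagonal bits fit in a 64-bit word.

```c
#include <stdint.h>

/* Recursive helper (hypothetical name): rows 0..r-1 already hold queens
   recorded in down/left/right; counts the completions from row r onward.
   Assumes 1 <= n <= 32. */
static uint64_t place(int n, int r, uint32_t down, uint64_t left,
                      uint64_t right) {
  if (r == n) return 1; /* all n rows filled: one valid configuration */
  uint64_t count = 0;
  for (int c = 0; c < n; c++) {
    uint32_t col = 1u << c;                    /* column mask */
    uint64_t dl = 1ull << (r + c);             /* "left" diagonal mask */
    uint64_t dr = 1ull << (n - 1 - r + c);     /* "right" diagonal mask */
    /* The three checks from the lecture; any nonzero AND is a conflict. */
    if ((down & col) || (left & dl) || (right & dr)) continue;
    /* Place the queen by setting the bits; passing the vectors by value
       means backtracking needs no explicit undo step. */
    count += place(n, r + 1, down | col, left | dl, right | dr);
  }
  return count;
}

/* Counts the solutions to the n queens problem for 1 <= n <= 32. */
uint64_t count_queens(int n) { return place(n, 0, 0, 0, 0); }
```

For the standard 8 by 8 board this search returns 92.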
1308 01:07:29,430 --> 01:07:32,008 AUDIENCE: Why isn't there a horizontal one? 1309 01:07:32,008 --> 01:07:34,050 JULIAN SHUN: So it turns out that you don't need one. 1310 01:07:34,050 --> 01:07:38,880 Just these three checks are enough to guarantee-- 1311 01:07:38,880 --> 01:07:41,080 guarantee that you can place a queen in a position 1312 01:07:41,080 --> 01:07:42,855 if it passes all three of the checks. 1313 01:07:42,855 --> 01:07:43,355 Yes. 1314 01:07:43,355 --> 01:07:46,635 So a fourth check would just be redundant. 1315 01:07:46,635 --> 01:07:48,427 AUDIENCE: So we don't need a horizontal one 1316 01:07:48,427 --> 01:07:50,540 because we're not placing two queens in the same row. 1317 01:07:50,540 --> 01:07:50,790 JULIAN SHUN: Yes. 1318 01:07:50,790 --> 01:07:51,360 That's true. 1319 01:07:51,360 --> 01:07:51,910 Good point. 1320 01:07:51,910 --> 01:07:52,410 Yes. 1321 01:07:52,410 --> 01:07:55,064 So we're only placing one queen in each particular row. 1322 01:08:01,110 --> 01:08:04,680 So let's look at another problem. 1323 01:08:04,680 --> 01:08:08,620 This is called population count, or pop count for short. 1324 01:08:08,620 --> 01:08:10,980 And the problem here is we want to count the number of 1 1325 01:08:10,980 --> 01:08:14,880 bits in some word x. 1326 01:08:14,880 --> 01:08:17,910 Here's a way to do this that repeatedly eliminates the least 1327 01:08:17,910 --> 01:08:20,609 significant 1 bit in a word. 1328 01:08:20,609 --> 01:08:24,600 So we have this for loop where r is initialized to 0. 1329 01:08:24,600 --> 01:08:28,560 And we're going to repeat this loop until x becomes 0. 1330 01:08:28,560 --> 01:08:31,160 And then each time we go through this loop, we increment r. 1331 01:08:31,160 --> 01:08:33,120 And inside the loop we're going to set 1332 01:08:33,120 --> 01:08:37,410 x equal to x ANDed with x minus 1. 1333 01:08:37,410 --> 01:08:41,910 And this is going to clear the least significant 1 bit in x.
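The loop just described can be written out as follows. This is a sketch of the technique as stated in the lecture, assuming a 64-bit word; the function name `popcount_clear_lsb` is hypothetical.

```c
#include <stdint.h>

/* Counts the 1 bits in x by repeatedly clearing the least significant 1 bit.
   The loop body runs once per 1 bit, so it is fast when few bits are set. */
int popcount_clear_lsb(uint64_t x) {
  int r;
  for (r = 0; x != 0; ++r) {
    x &= x - 1; /* clears the least significant 1 bit of x */
  }
  return r;
}
```

For example, x = 0xB (binary 1011) goes to 1010, then 1000, then 0, so the loop runs three times and returns 3.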
1334 01:08:41,910 --> 01:08:44,830 So let's look at an example. 1335 01:08:44,830 --> 01:08:47,990 So let's say we have this value here for x. 1336 01:08:47,990 --> 01:08:51,729 Well, to get x minus 1, we flip the rightmost 1 bit 1337 01:08:51,729 --> 01:08:53,348 in x from a 1 to 0. 1338 01:08:53,348 --> 01:08:55,890 And then we fill in all of the bits to the right of that with 1339 01:08:55,890 --> 01:08:57,222 1's. 1340 01:08:57,222 --> 01:09:02,130 And then now when we AND those two things together, 1341 01:09:02,130 --> 01:09:06,660 we're going to copy all of the bits up to the rightmost 1. 1342 01:09:06,660 --> 01:09:09,000 And then for the rightmost 1, we're going to zero it out 1343 01:09:09,000 --> 01:09:10,260 because we're ANDing with a 0. 1344 01:09:10,260 --> 01:09:12,135 And then all of the bits to the right of that 1345 01:09:12,135 --> 01:09:13,290 are still going to be 0. 1346 01:09:13,290 --> 01:09:16,050 So x ANDed with x minus 1 is just 1347 01:09:16,050 --> 01:09:21,750 going to get rid of the least significant 1 bit. 1348 01:09:21,750 --> 01:09:25,319 And then we repeat this process until x becomes 0. 1349 01:09:25,319 --> 01:09:28,109 In that case we've already eliminated all the 1's and we 1350 01:09:28,109 --> 01:09:30,630 know the answer, which is stored in r. 1351 01:09:34,990 --> 01:09:35,649 Questions? 1352 01:09:41,580 --> 01:09:44,590 So this code will be pretty fast if the number of 1 bits 1353 01:09:44,590 --> 01:09:47,590 is small, but the running time is 1354 01:09:47,590 --> 01:09:50,450 proportional to the number of 1 bits in a word. 1355 01:09:50,450 --> 01:09:53,600 So in the worst case, if most of the bits are set to 1, 1356 01:09:53,600 --> 01:10:00,320 then you're going to need a lot of iterations to run this code. 1357 01:10:00,320 --> 01:10:05,050 So let's look at a more efficient way to do this. 1358 01:10:05,050 --> 01:10:07,930 This is to use table lookup.
1359 01:10:07,930 --> 01:10:12,970 So we're going to create a table of size 256, which 1360 01:10:12,970 --> 01:10:16,330 stores for each 8-bit word the number of 1's 1361 01:10:16,330 --> 01:10:17,890 in that 8-bit word. 1362 01:10:17,890 --> 01:10:23,260 So we have all possible 8-bit words stored in this table. 1363 01:10:23,260 --> 01:10:27,400 And then now, to get the number of 1 bits in x, 1364 01:10:27,400 --> 01:10:30,760 for every 8-bit substring in x, we're 1365 01:10:30,760 --> 01:10:36,040 going to look it up in this count table and add it to r. 1366 01:10:36,040 --> 01:10:38,170 And then we're going to right-shift x by 8 1367 01:10:38,170 --> 01:10:39,550 so that we can get the next word. 1368 01:10:39,550 --> 01:10:43,870 And then when x becomes 0, we know we're done. 1369 01:10:43,870 --> 01:10:45,390 So that's table lookup. 1370 01:10:45,390 --> 01:10:51,060 And the performance here depends on the size of x. 1371 01:10:51,060 --> 01:10:53,910 If we have a 64-bit word, we need 1372 01:10:53,910 --> 01:10:57,400 to do this at most eight times, whereas in the initial code 1373 01:10:57,400 --> 01:11:03,180 we might have to do it 64 times if we had 64 1 bits. 1374 01:11:03,180 --> 01:11:06,300 The cost of this code is bottlenecked by the memory 1375 01:11:06,300 --> 01:11:10,540 operations, because this table here is stored in memory. 1376 01:11:10,540 --> 01:11:13,200 So every time you access it you have to go to memory 1377 01:11:13,200 --> 01:11:15,910 to fetch the value there. 1378 01:11:15,910 --> 01:11:18,630 And here are some approximate costs 1379 01:11:18,630 --> 01:11:22,600 for accessing memory in various levels of the hierarchy. 1380 01:11:22,600 --> 01:11:24,870 If something's stored in register, it's very fast. 1381 01:11:24,870 --> 01:11:27,240 It only takes you 1 cycle. 1382 01:11:27,240 --> 01:11:29,910 If it's stored in L1 cache, it's about 4 cycles, 1383 01:11:29,910 --> 01:11:34,230 L2 cache about 10 cycles, L3 cache about 50 cycles. 
1384 01:11:34,230 --> 01:11:37,260 And then, finally, if you have to go to DRAM because it's not 1385 01:11:37,260 --> 01:11:40,620 in cache, it's much more expensive, 150 cycles. 1386 01:11:40,620 --> 01:11:43,650 It's an order of magnitude slower 1387 01:11:43,650 --> 01:11:45,802 than doing something-- fetching something that's 1388 01:11:45,802 --> 01:11:47,010 already stored in a register. 1389 01:11:49,620 --> 01:11:53,830 So let's now look at a third way to do population count where 1390 01:11:53,830 --> 01:11:57,660 we don't actually have to go to cache or DRAM. 1391 01:11:57,660 --> 01:12:01,140 Essentially, we can do everything in registers. 1392 01:12:01,140 --> 01:12:03,640 So here's how you do it. 1393 01:12:03,640 --> 01:12:06,810 So we're going to create these five masks-- 1394 01:12:06,810 --> 01:12:10,860 or six masks, from M0 up to M5. 1395 01:12:10,860 --> 01:12:14,548 And these masks-- the values of these masks 1396 01:12:14,548 --> 01:12:15,840 are shown in the comments here. 1397 01:12:15,840 --> 01:12:18,360 In this notation here, x to the k 1398 01:12:18,360 --> 01:12:20,850 just means x repeated k times. 1399 01:12:20,850 --> 01:12:26,430 So the mask M5 has 32 0's, followed by 32 1's. 1400 01:12:26,430 --> 01:12:31,440 The mask M0 has the bit string 01 repeated 32 times, 1401 01:12:31,440 --> 01:12:32,030 and so on. 1402 01:12:35,003 --> 01:12:36,420 After we create these masks, we're 1403 01:12:36,420 --> 01:12:40,030 going to execute these six instructions at the bottom, 1404 01:12:40,030 --> 01:12:44,850 and this is going to give us the number of 1's in the word. 1405 01:12:44,850 --> 01:12:49,150 So let's do an example to see how this works. 1406 01:12:49,150 --> 01:12:51,270 So let's say we start with this bit string here. 1407 01:12:54,750 --> 01:12:56,640 In the first step, what we're going to do 1408 01:12:56,640 --> 01:12:59,820 is we're going to AND x with the mask M0. 
1409 01:12:59,820 --> 01:13:01,830 And then we're also going to AND x right-shifted 1410 01:13:01,830 --> 01:13:04,650 by 1 with the mask M0. 1411 01:13:04,650 --> 01:13:12,390 And recall that the mask M0 is just 01 repeated 32 times, 1412 01:13:12,390 --> 01:13:15,440 and therefore the mask is essentially extracting all 1413 01:13:15,440 --> 01:13:16,710 of the even bits. 1414 01:13:16,710 --> 01:13:21,180 So x ANDed with M0 gives us all of the even bits. 1415 01:13:21,180 --> 01:13:24,030 And then when we right-shift x by 1 and AND it with M0, 1416 01:13:24,030 --> 01:13:26,490 that's going to give us all the odd bits. 1417 01:13:26,490 --> 01:13:28,650 And then we're going to line those two things up 1418 01:13:28,650 --> 01:13:31,290 and add them together. 1419 01:13:31,290 --> 01:13:33,090 And the result of doing this is it's 1420 01:13:33,090 --> 01:13:37,410 going to tell us for every group of two bits the number 1421 01:13:37,410 --> 01:13:39,870 of 1 bits in that group. 1422 01:13:39,870 --> 01:13:42,540 So now for each of these pairs of bits, 1423 01:13:42,540 --> 01:13:44,920 it's telling us how many of them are 1. 1424 01:13:44,920 --> 01:13:48,270 So in the leftmost group here, we add two 1's. 1425 01:13:48,270 --> 01:13:52,440 So the result of adding 1 and 1 is 1 0, which is 2. 1426 01:13:52,440 --> 01:13:56,700 For the rightmost group, we have two 0's, and the count there is 1427 01:13:56,700 --> 01:13:57,810 00. 1428 01:13:57,810 --> 01:14:02,320 And this is the same for all of the other groups. 1429 01:14:02,320 --> 01:14:09,900 So this gives us the number of 1's in every pair of positions. 1430 01:14:09,900 --> 01:14:15,090 Now we're going to AND the result with M1. 1431 01:14:15,090 --> 01:14:19,050 And we're going to right-shift it by 2 and also AND it with M1 1432 01:14:19,050 --> 01:14:20,475 and add those two things together.
1433 01:14:23,430 --> 01:14:27,120 And M1 is a mask that will give us the bottom two bits 1434 01:14:27,120 --> 01:14:30,040 in every group of four bits. 1435 01:14:30,040 --> 01:14:31,860 So when we right-shift x by 2, that's 1436 01:14:31,860 --> 01:14:33,120 giving us the top two bits. 1437 01:14:33,120 --> 01:14:35,040 And then now we add those together, 1438 01:14:35,040 --> 01:14:38,520 and it will give us the count of the number of 1 1439 01:14:38,520 --> 01:14:41,340 bits in every group of size 4. 1440 01:14:41,340 --> 01:14:45,800 And these counts are stored in the result here now. 1441 01:14:45,800 --> 01:14:47,610 So you can verify that each of these groups 1442 01:14:47,610 --> 01:14:50,250 has the count of the number of 1 bits. 1443 01:14:50,250 --> 01:14:54,540 So, for example, we have 100 here. 1444 01:14:54,540 --> 01:14:57,240 And this is correct since there are four 1 bits. 1445 01:14:59,920 --> 01:15:03,640 Now we do this again with the mask M2. 1446 01:15:03,640 --> 01:15:05,530 That's going to give us the counts 1447 01:15:05,530 --> 01:15:09,180 for all groups of size 8. 1448 01:15:09,180 --> 01:15:12,490 Then we go to groups of size 16. 1449 01:15:12,490 --> 01:15:16,950 And then, finally, we add these two together, 1450 01:15:16,950 --> 01:15:21,350 giving us the number of bits in this group of size 32. 1451 01:15:21,350 --> 01:15:22,940 And this is actually the pop count. 1452 01:15:22,940 --> 01:15:25,880 So the value here is 17. 1453 01:15:25,880 --> 01:15:29,030 And you can verify that there are indeed 17 1's 1454 01:15:29,030 --> 01:15:31,340 in the input word x. 1455 01:15:34,220 --> 01:15:35,660 Any questions? 1456 01:15:41,430 --> 01:15:44,160 So the performance of this code, which 1457 01:15:44,160 --> 01:15:46,260 is based on parallel divide and conquer, 1458 01:15:46,260 --> 01:15:49,740 is going to be proportional to log base 2 of w, 1459 01:15:49,740 --> 01:15:51,510 where w is the word length. 
1460 01:15:51,510 --> 01:15:56,670 Because on every step I'm doubling the size of my groups. 1461 01:15:56,670 --> 01:16:00,680 And after I do this log base 2 w times, I have the whole group. 1462 01:16:04,680 --> 01:16:09,730 In the first two instructions that I executed here, 1463 01:16:09,730 --> 01:16:14,200 I have to actually do the AND separately 1464 01:16:14,200 --> 01:16:17,890 for x right-shifted by 1 and x, and also x right-shifted by 2 1465 01:16:17,890 --> 01:16:20,630 and x, and then add them together, 1466 01:16:20,630 --> 01:16:23,750 because there is an overflow issue. 1467 01:16:23,750 --> 01:16:26,680 The overflow issue is that the size of the groups 1468 01:16:26,680 --> 01:16:30,700 here might not be large enough to actually store 1469 01:16:30,700 --> 01:16:33,940 the count of the number of 1 bits in that group. 1470 01:16:33,940 --> 01:16:35,530 But once I get to the larger groups, 1471 01:16:35,530 --> 01:16:38,530 the count can always be stored in a group of that size 1472 01:16:38,530 --> 01:16:42,010 and I don't need to worry about overflow. 1473 01:16:42,010 --> 01:16:44,140 So for the last four lines, I can actually 1474 01:16:44,140 --> 01:16:46,240 save one instruction. 1475 01:16:46,240 --> 01:16:47,810 I don't need to do the AND twice. 1476 01:16:55,920 --> 01:16:58,730 So it turns out that most modern machines nowadays 1477 01:16:58,730 --> 01:17:01,490 have an intrinsic pop count instruction implemented 1478 01:17:01,490 --> 01:17:03,980 in hardware, which is faster than anything 1479 01:17:03,980 --> 01:17:05,840 you can code yourself. 1480 01:17:05,840 --> 01:17:08,700 And you can access this pop count instruction 1481 01:17:08,700 --> 01:17:13,280 via compiler intrinsics, for example in GCC or Clang. 1482 01:17:13,280 --> 01:17:17,240 And in GCC, it's __builtin_popcount. 
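The parallel divide-and-conquer pop count described above can be sketched in C as follows. This is a minimal sketch for a 64-bit word, not a copy of the slide: the function name popcount_parallel is hypothetical, and the masks are written as hexadecimal constants that match the descriptions given earlier (M0 is 01 repeated 32 times, M5 is 32 0's followed by 32 1's). The first two steps mask before adding, and the last four add first and mask once, which is the one-instruction saving mentioned above.

```c
#include <stdint.h>

// Hypothetical name. Counts the 1 bits in a 64-bit word in six steps,
// doubling the group size on every step.
int popcount_parallel(uint64_t x) {
  const uint64_t M0 = UINT64_C(0x5555555555555555);  // (01)^32
  const uint64_t M1 = UINT64_C(0x3333333333333333);  // (0011)^16
  const uint64_t M2 = UINT64_C(0x0F0F0F0F0F0F0F0F);  // (0^4 1^4)^8
  const uint64_t M3 = UINT64_C(0x00FF00FF00FF00FF);  // (0^8 1^8)^4
  const uint64_t M4 = UINT64_C(0x0000FFFF0000FFFF);  // (0^16 1^16)^2
  const uint64_t M5 = UINT64_C(0x00000000FFFFFFFF);  // 0^32 1^32

  // Small groups: mask each half first so a sum cannot carry into the
  // neighboring group.
  x = ((x >> 1) & M0) + (x & M0);  // counts per group of 2 bits
  x = ((x >> 2) & M1) + (x & M1);  // counts per group of 4 bits
  // Larger groups: every count fits in its field, so add first, mask once.
  x = ((x >> 4) + x) & M2;         // counts per group of 8 bits
  x = ((x >> 8) + x) & M3;         // counts per group of 16 bits
  x = ((x >> 16) + x) & M4;        // counts per group of 32 bits
  x = ((x >> 32) + x) & M5;        // count for the whole 64-bit word
  return (int)x;
}
```

Everything here stays in registers: there is no table in memory and no loop whose trip count depends on the data.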
1483 01:17:20,860 --> 01:17:24,740 One warning though is that if you write this code using 1484 01:17:24,740 --> 01:17:27,710 these intrinsics, if you try to compile 1485 01:17:27,710 --> 01:17:29,690 the code on a machine that doesn't support it, 1486 01:17:29,690 --> 01:17:31,190 your code isn't going to compile. 1487 01:17:31,190 --> 01:17:33,500 So it makes your code less portable. 1488 01:17:33,500 --> 01:17:37,250 But this intrinsic is faster than the parallel divide 1489 01:17:37,250 --> 01:17:38,150 and conquer version. 1490 01:17:40,830 --> 01:17:43,010 So one question is, how can you get the log base 1491 01:17:43,010 --> 01:17:46,190 2 of a power of 2 quickly using a pop count instruction? 1492 01:17:46,190 --> 01:17:48,796 So instead of using the de Bruijn sequence trick. 1493 01:17:52,010 --> 01:17:52,510 Yes. 1494 01:17:52,510 --> 01:17:54,772 AUDIENCE: You decrement then you pop count. 1495 01:17:54,772 --> 01:17:55,480 JULIAN SHUN: Yes. 1496 01:17:55,480 --> 01:17:59,800 So what you do is you subtract 1 from the power of 2, 1497 01:17:59,800 --> 01:18:03,010 and that's going to flood all of the lower bits with 1's. 1498 01:18:03,010 --> 01:18:04,810 And then now when you execute pop count, 1499 01:18:04,810 --> 01:18:07,660 it's going to count the number of 1's, and that gives us 1500 01:18:07,660 --> 01:18:09,460 the log base 2 of the power of 2. 1501 01:18:09,460 --> 01:18:10,450 So good answer. 1502 01:18:13,458 --> 01:18:15,000 So those are all the bit tricks I'm going 1503 01:18:15,000 --> 01:18:17,130 to be talking about today. 1504 01:18:17,130 --> 01:18:19,050 There's a lot of resources online if you're 1505 01:18:19,050 --> 01:18:21,150 interested in learning more. 1506 01:18:21,150 --> 01:18:23,670 There's this really good website maintained 1507 01:18:23,670 --> 01:18:26,760 by Sean Eron Anderson. 1508 01:18:26,760 --> 01:18:28,800 There's also Knuth's textbook, which 1509 01:18:28,800 --> 01:18:30,480 has some bit tricks in there.
1510 01:18:30,480 --> 01:18:32,670 There's a chess programming website which 1511 01:18:32,670 --> 01:18:34,650 has a lot of cool bit tricks. 1512 01:18:34,650 --> 01:18:37,200 Some of those are used in implementing chess programs. 1513 01:18:37,200 --> 01:18:39,568 And then, finally, this book called Hacker's Delight. 1514 01:18:39,568 --> 01:18:41,610 So we'll be playing around with many of these bit 1515 01:18:41,610 --> 01:18:45,710 tricks in project 1, so happy bit hacking.
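For the earlier question about getting the log base 2 of a power of 2 with pop count, the decrement-then-count answer can be sketched in C as follows. This is a minimal sketch assuming a GCC or Clang compiler, since it relies on the __builtin_popcountll builtin; the function name log2_pow2 is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical helper: returns log base 2 of x, where x must be a power of 2.
// Assumes GCC or Clang, which provide __builtin_popcountll.
int log2_pow2(uint64_t x) {
  assert(x != 0 && (x & (x - 1)) == 0);  // exactly one bit is set
  // Subtracting 1 floods the bits below the single 1 bit with 1's, so the
  // pop count of x - 1 equals the position of that bit.
  return __builtin_popcountll(x - 1);
}
```

For x = 8 (binary 1000), x - 1 is 7 (binary 0111), which has three 1 bits, so the function returns 3.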