1
00:00:01,550 --> 00:00:03,920
The following content is
provided under a Creative

2
00:00:03,920 --> 00:00:05,310
Commons license.

3
00:00:05,310 --> 00:00:07,520
Your support will help
MIT OpenCourseWare

4
00:00:07,520 --> 00:00:11,610
continue to offer high-quality
educational resources for free.

5
00:00:11,610 --> 00:00:14,180
To make a donation or to
view additional materials

6
00:00:14,180 --> 00:00:18,140
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,140 --> 00:00:19,026
at ocw.mit.edu.

8
00:00:22,250 --> 00:00:24,020
JULIAN SHUN: Good
afternoon, everybody.

9
00:00:24,020 --> 00:00:29,590
So welcome to the
third lecture of 6.172.

10
00:00:29,590 --> 00:00:31,340
Today we're going to
talk about bit hacks,

11
00:00:31,340 --> 00:00:33,380
and today's going to be
a really fun lecture.

12
00:00:36,510 --> 00:00:39,860
So, first of all, let's recall
the binary representation

13
00:00:39,860 --> 00:00:41,250
of a word.

14
00:00:41,250 --> 00:00:45,390
So a w-bit word is
represented as follows.

15
00:00:45,390 --> 00:00:49,940
So we're going to number the
bits from x0 to xw minus 1

16
00:00:49,940 --> 00:00:53,370
starting from the
rightmost side.

17
00:00:53,370 --> 00:00:55,850
And the unsigned
integer value stored

18
00:00:55,850 --> 00:00:57,650
in x with this
binary representation

19
00:00:57,650 --> 00:00:59,820
can be computed as follows.

20
00:00:59,820 --> 00:01:03,200
So it's essentially the sum of
a whole bunch of powers of 2.

21
00:01:03,200 --> 00:01:08,320
And you sum the product of the
bit with the appropriate power

22
00:01:08,320 --> 00:01:08,820
of 2.

23
00:01:08,820 --> 00:01:10,610
So if the bit is
1 in position k,

24
00:01:10,610 --> 00:01:12,950
then you multiply by 2 to the k.

25
00:01:12,950 --> 00:01:15,350
And if it's 0, then
you just add 0.

26
00:01:15,350 --> 00:01:19,760
So, for example, let's say
we have this 8-bit word here.

27
00:01:19,760 --> 00:01:23,840
And if we apply this
equation, we get--

28
00:01:23,840 --> 00:01:28,620
first we get 2 because there is
a 1 bit in position 1.

29
00:01:28,620 --> 00:01:32,090
So we multiply 1 by
2 to 1, which is 2.

30
00:01:32,090 --> 00:01:34,730
Then in the second
position, we also have a 1.

31
00:01:34,730 --> 00:01:39,330
So we multiply 1 by 2
to the 2, which is 4.

32
00:01:39,330 --> 00:01:41,630
And then we have 16 and 128.

33
00:01:41,630 --> 00:01:44,840
So we just sum up all
of these powers of 2

34
00:01:44,840 --> 00:01:49,490
and that gives us the
unsigned integer value.
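As a rough sketch of the sum just described, the C function below adds x_k times 2 to the k for each bit of an 8-bit word. The name unsigned_value and the fixed 8-bit width are assumptions made for illustration, not taken from the lecture.

```c
#include <stdint.h>

/* Sum x_k * 2^k over the bits of an 8-bit word; k = 0 is the rightmost bit.
   (Hypothetical helper name, 8-bit width assumed for the example.) */
unsigned unsigned_value(uint8_t x) {
    unsigned sum = 0;
    for (unsigned k = 0; k < 8; k++) {
        unsigned bit = (x >> k) & 1u;   /* x_k, either 0 or 1 */
        sum += bit << k;                /* x_k * 2^k */
    }
    return sum;
}
```

For a word with 1 bits in positions 1, 2, 4, and 7 (0x96), this adds 2 + 4 + 16 + 128.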

35
00:01:49,490 --> 00:01:54,480
And that 0b prefix here
represents a binary constant.

36
00:01:54,480 --> 00:01:56,690
So that means we're going
to interpret this constant

37
00:01:56,690 --> 00:01:58,280
as a binary number.

38
00:02:01,580 --> 00:02:03,490
There's also signed integers.

39
00:02:03,490 --> 00:02:05,800
So you can also represent
negative numbers, which

40
00:02:05,800 --> 00:02:07,780
is useful, and this
is called the two's

41
00:02:07,780 --> 00:02:09,370
complement representation.

42
00:02:09,370 --> 00:02:11,590
And here's the
formula for computing

43
00:02:11,590 --> 00:02:14,290
the two's complement
representation of a word.

44
00:02:14,290 --> 00:02:19,030
So for bit 0 all the
way up to bit w minus 2,

45
00:02:19,030 --> 00:02:21,460
you do the same thing as above.

46
00:02:21,460 --> 00:02:26,530
But for the leftmost
bit or bit w minus 1,

47
00:02:26,530 --> 00:02:31,984
you subtract that bit multiplied
by 2 to the w minus 1.

48
00:02:31,984 --> 00:02:38,680
So for this example here,
we saw 2 plus 4 plus 16.

49
00:02:38,680 --> 00:02:40,210
That's the same as above.

50
00:02:40,210 --> 00:02:43,000
But for the leftmost bit,
since we have a 1 here,

51
00:02:43,000 --> 00:02:47,170
we're going to subtract
2 to the 7, which is 128.

52
00:02:47,170 --> 00:02:49,870
And this gives us
the signed value

53
00:02:49,870 --> 00:02:54,100
for the integer,
which is negative 106.
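The two's complement formula can be sketched the same way: bits 0 through w minus 2 are added, and the sign bit's weight is subtracted. The name signed_value and the 8-bit width are assumptions for illustration.

```c
#include <stdint.h>

/* Two's complement value of an 8-bit word:
   sum of x_k * 2^k for k = 0..6, minus x_7 * 2^7.
   (Hypothetical helper name, 8-bit width assumed.) */
int signed_value(uint8_t x) {
    int sum = 0;
    for (int k = 0; k < 7; k++) {
        sum += ((x >> k) & 1) << k;     /* low bits count positively */
    }
    sum -= ((x >> 7) & 1) << 7;         /* sign bit subtracts 128 */
    return sum;
}
```

For the same word as before (0x96), this computes 2 + 4 + 16 - 128.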

54
00:02:54,100 --> 00:02:55,520
Does that make sense?

55
00:02:55,520 --> 00:02:57,978
Any questions about
this representation?

56
00:03:02,140 --> 00:03:05,410
So the leftmost bit
is known as a sign bit

57
00:03:05,410 --> 00:03:09,730
because it tells you
whether you need to subtract

58
00:03:09,730 --> 00:03:12,298
this large power of 2 or not.

59
00:03:12,298 --> 00:03:14,590
So if it's 0, then you don't
have to subtract anything.

60
00:03:14,590 --> 00:03:20,500
If it's 1, then you subtract
a large integer value.

61
00:03:20,500 --> 00:03:25,180
So in two's complement,
the all 0's word is just 0.

62
00:03:25,180 --> 00:03:28,150
So you just apply the above
formula and everything is 0.

63
00:03:28,150 --> 00:03:31,530
So you just get 0.

64
00:03:31,530 --> 00:03:33,360
What's the value of
the all 1's word?

65
00:03:36,760 --> 00:03:37,260
Yes.

66
00:03:37,260 --> 00:03:38,580
AUDIENCE: 1.

67
00:03:38,580 --> 00:03:40,950
JULIAN SHUN: Negative 1, right?

68
00:03:40,950 --> 00:03:45,060
So the reason why it's
negative 1, so you can just

69
00:03:45,060 --> 00:03:47,190
use the formula.

70
00:03:47,190 --> 00:03:50,340
And we're going to sum up
a bunch of powers of 2.

71
00:03:50,340 --> 00:03:53,550
All of the x sub k's
are going to be 1.

72
00:03:53,550 --> 00:03:58,170
So we're summing up 2 to the k
from k equals 0 to w minus 2,

73
00:03:58,170 --> 00:04:01,830
and that's a geometric series
which sums to 2 to the w

74
00:04:01,830 --> 00:04:03,700
minus 1 minus 1.

75
00:04:03,700 --> 00:04:05,690
And then for the
sign bit, we're going

76
00:04:05,690 --> 00:04:08,160
to subtract 2 to the w minus 1.

77
00:04:08,160 --> 00:04:10,800
So now the 2 to the w
minus 1's cancel out

78
00:04:10,800 --> 00:04:13,840
and we're just left
with negative 1.

79
00:04:13,840 --> 00:04:17,160
So this is an important
property to know about two's

80
00:04:17,160 --> 00:04:19,370
complement representation.

81
00:04:19,370 --> 00:04:24,342
The all 1's word
is just negative 1.

82
00:04:24,342 --> 00:04:28,800
And this leads to an important
identity which says that x plus

83
00:04:28,800 --> 00:04:31,990
the one's complement of x-- the
one's complement is just all

84
00:04:31,990 --> 00:04:33,900
the bits of x flipped--

85
00:04:33,900 --> 00:04:36,255
is equal to negative 1.

86
00:04:36,255 --> 00:04:40,860
This is because if you add x
with all of its bits flipped,

87
00:04:40,860 --> 00:04:43,260
then you're just going to
end up with the all 1's word.

88
00:04:43,260 --> 00:04:45,330
And we saw on the
previous slide that that's

89
00:04:45,330 --> 00:04:46,980
equal to negative 1.

90
00:04:46,980 --> 00:04:50,190
And from this identity,
we have that negative x

91
00:04:50,190 --> 00:04:52,990
is equal to the one's
complement of x plus 1.

92
00:04:52,990 --> 00:04:54,930
So this relates the
two's complement

93
00:04:54,930 --> 00:04:58,360
to the one's complement
representation.
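A small sketch of the two identities, x + ~x = -1 and -x = ~x + 1, assuming a two's complement int. The function names are hypothetical, and the second helper assumes x is not the most negative int, since negating that value overflows.

```c
/* x plus its one's complement is the all 1's word, which is -1. */
int complement_sum(int x) {
    return x + ~x;
}

/* Negation built from the identity -x = ~x + 1.
   Assumes x is not the most negative int (that case would overflow). */
int negate_via_complement(int x) {
    return ~x + 1;
}
```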

94
00:04:58,360 --> 00:05:00,640
Let's look at an example.

95
00:05:00,640 --> 00:05:02,580
So let's look at--

96
00:05:02,580 --> 00:05:05,790
let's say x is equal
to this constant here.

97
00:05:05,790 --> 00:05:09,240
The one's complement
of x or tilde of x

98
00:05:09,240 --> 00:05:12,390
is just all of the
bits of x flipped.

99
00:05:12,390 --> 00:05:16,320
And then to get
negative x, we add 1

100
00:05:16,320 --> 00:05:17,940
to the one's complement of x.

101
00:05:17,940 --> 00:05:20,520
And the effect of
adding 1 here is we're

102
00:05:20,520 --> 00:05:25,290
going to take the rightmost 0
bit in the one's complement,

103
00:05:25,290 --> 00:05:26,770
flip that to a 1.

104
00:05:26,770 --> 00:05:29,100
And then for all of the
bits to the right of that,

105
00:05:29,100 --> 00:05:30,750
we flip them to 0's.

106
00:05:33,910 --> 00:05:37,330
So another way to
see this is you

107
00:05:37,330 --> 00:05:41,110
look at the representation of
x and you flip all of the bits

108
00:05:41,110 --> 00:05:45,185
to the left of the rightmost 1, not
including that rightmost 1 bit,

109
00:05:45,185 --> 00:05:46,810
and then you just copy over
that 1 and the 0's to its right.

110
00:05:49,830 --> 00:05:51,290
So any questions about this?

111
00:05:54,490 --> 00:05:54,990
OK.

112
00:05:57,760 --> 00:06:03,370
So this is a table showing
the relationship between hex

113
00:06:03,370 --> 00:06:04,570
and binary representation.

114
00:06:04,570 --> 00:06:07,690
So hex representation
is base 16.

115
00:06:07,690 --> 00:06:11,740
And the reason why we use
hex is because sometimes we

116
00:06:11,740 --> 00:06:15,310
have these big binary constants
and we don't want to write--

117
00:06:15,310 --> 00:06:18,100
have to type all of these
symbols into our code.

118
00:06:18,100 --> 00:06:20,800
And hex gives us a
more compact format

119
00:06:20,800 --> 00:06:23,350
to write these constants.

120
00:06:23,350 --> 00:06:26,170
And this table, you
can basically just

121
00:06:26,170 --> 00:06:29,680
look up, for each
possible hex value, what

122
00:06:29,680 --> 00:06:31,360
its binary representation is.

123
00:06:31,360 --> 00:06:36,190
And for the values
from 0 to 9, we're

124
00:06:36,190 --> 00:06:39,340
just going to use the same as
decimal representation for hex.

125
00:06:39,340 --> 00:06:41,040
And then for values
10 to 15, we're

126
00:06:41,040 --> 00:06:47,460
going to use the
characters from A to F.

127
00:06:47,460 --> 00:06:52,020
To translate from hex to binary,
you just take each hex digit,

128
00:06:52,020 --> 00:06:55,620
look it up in this table, write
out the binary equivalent,

129
00:06:55,620 --> 00:06:58,000
and then you
concatenate together

130
00:06:58,000 --> 00:06:59,850
all of the binary
values you've got.

131
00:06:59,850 --> 00:07:03,960
So in this example I
have this hex constant

132
00:07:03,960 --> 00:07:07,380
which says DEC1DE2C0DE4F00D.

133
00:07:07,380 --> 00:07:11,970
So now I just look up each of
these hex values in this table.

134
00:07:11,970 --> 00:07:18,940
So D is 1101, E is 1110,
C is 1100, and so on.

135
00:07:18,940 --> 00:07:22,260
And I just concatenate all
of these values together

136
00:07:22,260 --> 00:07:26,192
and that gives me my
binary representation.
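The digit-by-digit lookup and concatenation can be sketched in C as follows. Each hex digit contributes one 4-bit group, and shifting left by 4 before ORing in the next digit is the concatenation step. The names hex_digit_value and hex_to_u64 are made up for this example, and the code assumes an ASCII-like character set.

```c
#include <stdint.h>

/* Value of one hex digit (0 to 15), or -1 if c is not a hex digit. */
int hex_digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Concatenate the 4-bit groups of up to 16 hex digits into a 64-bit word.
   Parsing stops at the first character that is not a hex digit. */
uint64_t hex_to_u64(const char *hex) {
    uint64_t value = 0;
    for (int i = 0; i < 16 && hex_digit_value(hex[i]) >= 0; i++) {
        /* Make room for 4 more bits, then append this digit's bits. */
        value = (value << 4) | (uint64_t)hex_digit_value(hex[i]);
    }
    return value;
}
```

For instance, "DE" becomes 1101 followed by 1110, which is the 8-bit pattern 11011110.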

137
00:07:26,192 --> 00:07:27,900
And you can also go
the other way around,

138
00:07:27,900 --> 00:07:30,720
converting binary to hex.

139
00:07:30,720 --> 00:07:33,120
And you do the same thing,
just look it up in this table.

140
00:07:36,020 --> 00:07:39,890
And the prefix 0x here
designates a hex constant,

141
00:07:39,890 --> 00:07:43,730
just like 0b designates
a binary constant.

142
00:07:43,730 --> 00:07:45,867
So if you're using these
constants in your code

143
00:07:45,867 --> 00:07:47,450
and you're writing
it in hex, then you

144
00:07:47,450 --> 00:07:49,370
should use the 0x prefix.

145
00:07:55,400 --> 00:07:58,640
So C has a bunch of
bitwise operators.

146
00:07:58,640 --> 00:08:00,950
And here's a table
describing what

147
00:08:00,950 --> 00:08:02,300
these bitwise operators do.

148
00:08:02,300 --> 00:08:05,510
So the ampersand is
just bitwise AND.

149
00:08:05,510 --> 00:08:09,590
The vertical bar is bitwise OR.

150
00:08:09,590 --> 00:08:13,850
This caret sign is the
XOR or exclusive OR.

151
00:08:13,850 --> 00:08:17,660
And XOR just says that if
exactly one of the two bits is 1,

152
00:08:17,660 --> 00:08:18,820
then we return 1.

153
00:08:18,820 --> 00:08:22,550
And if both of the bits are
0 or both of them are 1,

154
00:08:22,550 --> 00:08:24,760
then we return 0.

155
00:08:24,760 --> 00:08:30,800
The tilde sign is the one's
complement or the not.

156
00:08:30,800 --> 00:08:35,659
And then we have left shift
and right shift operators.

157
00:08:35,659 --> 00:08:39,030
So let's look at how these
operators work on this example

158
00:08:39,030 --> 00:08:39,530
here.

159
00:08:39,530 --> 00:08:41,210
So we have these
two 8-bit words,

160
00:08:41,210 --> 00:08:47,750
A and B. To compute A AND B,
we just look at every two bits

161
00:08:47,750 --> 00:08:51,170
in the same position
in A and B and compute

162
00:08:51,170 --> 00:08:52,610
the AND of those two bits.

163
00:08:52,610 --> 00:08:57,130
So 1 ANDed with 0 is
0, so we get 0 here.

164
00:08:57,130 --> 00:08:59,210
0 ANDed with 1 is 0.

165
00:08:59,210 --> 00:09:03,512
1 ANDed with 1 is 1, and so on.

166
00:09:03,512 --> 00:09:07,760
A OR B is similar but now
you apply the OR operator

167
00:09:07,760 --> 00:09:09,290
instead of the AND operator.

168
00:09:09,290 --> 00:09:11,720
So if either one of
the two positions is 1,

169
00:09:11,720 --> 00:09:13,100
then you return 1.

170
00:09:13,100 --> 00:09:14,900
And if both are 0,
then you return 0.

171
00:09:14,900 --> 00:09:19,880
So in A OR B, all of the bits are 1
except for this bit here,

172
00:09:19,880 --> 00:09:23,340
which is 0 because in the
original two words

173
00:09:23,340 --> 00:09:27,410
both of the corresponding
bits were 0.

174
00:09:27,410 --> 00:09:32,570
For A XOR B, we check if exactly
one of the two bits is 1.

175
00:09:32,570 --> 00:09:37,790
So for the leftmost
bit, we have 1 and 0,

176
00:09:37,790 --> 00:09:42,140
so we have exactly one bit
set to 1 and we get a 1 here.

177
00:09:42,140 --> 00:09:44,910
The second bit is 0
and 1, so that's 1.

178
00:09:44,910 --> 00:09:48,820
The third bit is 1, 1,
so that's 0, and so on.

179
00:09:48,820 --> 00:09:51,330
Tilde of A is just the one's
complement of A. We saw

180
00:09:51,330 --> 00:09:51,830
that before.

181
00:09:51,830 --> 00:09:54,080
We just flip all the bits.

182
00:09:54,080 --> 00:09:58,100
A right shifted by 3, we
just shift the bit string

183
00:09:58,100 --> 00:10:03,800
to the right by 3, and then we
fill in the digits or the bits

184
00:10:03,800 --> 00:10:05,880
on the left with 0's.

185
00:10:05,880 --> 00:10:09,140
And then A left shifted
by 2, we do the same thing

186
00:10:09,140 --> 00:10:10,400
but to the left.

187
00:10:10,400 --> 00:10:12,665
And then we fill in these
empty bits with 0's.
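The six operators can be tried on two 8-bit words with the small helpers below. The helper names are hypothetical. The values used in the note after the code, A = 0xB3 (10110011) and B = 0x69 (01101001), are an assumption chosen to be consistent with the bits read out above, not confirmed slide contents. The casts are there because C promotes 8-bit operands to int before applying each operator.

```c
#include <stdint.h>

/* Each helper casts the int result back down to 8 bits. */
uint8_t and8(uint8_t a, uint8_t b)  { return (uint8_t)(a & b); }
uint8_t or8(uint8_t a, uint8_t b)   { return (uint8_t)(a | b); }
uint8_t xor8(uint8_t a, uint8_t b)  { return (uint8_t)(a ^ b); }
uint8_t not8(uint8_t a)             { return (uint8_t)~a; }

/* Shifts, assuming n < 8. The right shift fills with 0's on the left;
   the left shift drops bits that move past bit 7. */
uint8_t shr8(uint8_t a, unsigned n) { return (uint8_t)(a >> n); }
uint8_t shl8(uint8_t a, unsigned n) { return (uint8_t)(a << n); }
```

With those assumed values, A AND B is 00100001, A OR B is 11111011, and A XOR B is 11011010.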

188
00:10:15,440 --> 00:10:19,915
So these are the bitwise
operators in C. Any questions?

189
00:10:28,160 --> 00:10:30,100
AUDIENCE: They're
not [INAUDIBLE]??

190
00:10:34,470 --> 00:10:37,330
JULIAN SHUN: For a right
shift, there is a--

191
00:10:37,330 --> 00:10:42,190
there is a shift that will
fill in the upper digits

192
00:10:42,190 --> 00:10:45,970
with whatever the
leftmost digit was.

193
00:10:45,970 --> 00:10:47,980
But if you're working
with unsigned integers,

194
00:10:47,980 --> 00:10:49,272
then it's not going to do that.

195
00:10:49,272 --> 00:10:51,430
For signed integers it typically will.

196
00:10:51,430 --> 00:10:53,980
And when we're doing
bit manipulations,

197
00:10:53,980 --> 00:10:56,350
we're usually going to
stick to unsigned integers,

198
00:10:56,350 --> 00:10:57,892
so we don't have to
worry about that.

199
00:11:04,430 --> 00:11:06,410
So now let's look at
some common idioms

200
00:11:06,410 --> 00:11:09,290
that you can do using
these bitwise operators.

201
00:11:09,290 --> 00:11:11,720
So the first one we'll
look at is setting

202
00:11:11,720 --> 00:11:15,380
the kth bit in a word x to 1.

203
00:11:15,380 --> 00:11:20,210
So the idea here is to use
a shift followed by an OR.

204
00:11:20,210 --> 00:11:25,130
So we're going to compute
1 left-shifted by k if we

205
00:11:25,130 --> 00:11:27,410
want to set the kth bit to a 1.

206
00:11:27,410 --> 00:11:31,370
And this gives us a mask with a
1 in exactly the kth position,

207
00:11:31,370 --> 00:11:34,050
and 0's everywhere else.

208
00:11:34,050 --> 00:11:36,530
And then now when
we OR that in to x,

209
00:11:36,530 --> 00:11:40,070
that's going to change the bit
from a 0 to a 1 if it was a 0.

210
00:11:40,070 --> 00:11:41,930
And if that bit was
already set to 1,

211
00:11:41,930 --> 00:11:43,263
then this doesn't do anything.

212
00:11:43,263 --> 00:11:44,930
And then for all of
the other positions,

213
00:11:44,930 --> 00:11:46,520
since we're doing
an OR with 0, we're

214
00:11:46,520 --> 00:11:51,480
just copying over
the bits from x.
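A minimal sketch of the shift-then-OR idiom, assuming a 32-bit unsigned word and k between 0 and 31. The name set_bit is chosen for this example.

```c
#include <stdint.h>

/* Set bit k of x to 1. Assumes 0 <= k < 32. */
uint32_t set_bit(uint32_t x, unsigned k) {
    uint32_t mask = (uint32_t)1 << k;   /* 1 in position k, 0's elsewhere */
    return x | mask;                    /* other bits are ORed with 0, so unchanged */
}
```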

215
00:11:51,480 --> 00:11:54,210
So that's setting the kth bit.

216
00:11:54,210 --> 00:11:55,830
We can also clear the kth bit.

217
00:11:55,830 --> 00:11:59,190
And the idea here is to use a
shift, a complement, and then

218
00:11:59,190 --> 00:12:00,440
an AND.

219
00:12:00,440 --> 00:12:05,175
So again we're going to generate
this mask, 1 left-shifted by k.

220
00:12:05,175 --> 00:12:07,300
But now we're going to take
the complement of this.

221
00:12:07,300 --> 00:12:11,100
So now we have a 0 in exactly
the kth position and 1's

222
00:12:11,100 --> 00:12:14,300
everywhere else.

223
00:12:14,300 --> 00:12:18,670
And now when we AND this mask
with x, in the kth position

224
00:12:18,670 --> 00:12:20,420
it's going to clear
that bit because we're

225
00:12:20,420 --> 00:12:21,500
ANDing it with a 0.

226
00:12:21,500 --> 00:12:24,380
So the result is going
to be 0 no matter

227
00:12:24,380 --> 00:12:25,400
what was there before.

228
00:12:25,400 --> 00:12:27,440
And then for all
the remaining bits,

229
00:12:27,440 --> 00:12:30,110
since we're ANDing with 1,
we're just copying it over

230
00:12:30,110 --> 00:12:31,220
from the original word.
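The shift, complement, and AND steps might look like this in C, again assuming a 32-bit unsigned word and k between 0 and 31. The name clear_bit is an assumption for this example.

```c
#include <stdint.h>

/* Clear bit k of x to 0. Assumes 0 <= k < 32. */
uint32_t clear_bit(uint32_t x, unsigned k) {
    uint32_t mask = ~((uint32_t)1 << k);  /* 0 in position k, 1's elsewhere */
    return x & mask;                      /* other bits are ANDed with 1, so unchanged */
}
```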

231
00:12:36,890 --> 00:12:40,250
You can toggle the kth
bit or flip the kth bit

232
00:12:40,250 --> 00:12:42,920
using a shift and then an XOR.

233
00:12:42,920 --> 00:12:46,550
So, again, we're going
to generate this mask.

234
00:12:46,550 --> 00:12:48,920
And then now, when we do
an XOR with this mask,

235
00:12:48,920 --> 00:12:53,510
it's going to change a bit from
a 0 to 1, or from a 1 to a 0,

236
00:12:53,510 --> 00:12:56,330
because that's what XOR does.

237
00:12:56,330 --> 00:12:58,400
So in this example, it's
changing from a 0 to 1.

238
00:12:58,400 --> 00:12:59,930
But if it was already
a 1, then it's

239
00:12:59,930 --> 00:13:02,180
going to toggle it back to 0.
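The toggle idiom as a short C sketch, under the same assumptions (32-bit unsigned word, k between 0 and 31, hypothetical function name):

```c
#include <stdint.h>

/* Flip bit k of x. Assumes 0 <= k < 32. */
uint32_t toggle_bit(uint32_t x, unsigned k) {
    uint32_t mask = (uint32_t)1 << k;   /* 1 in position k only */
    return x ^ mask;                    /* XOR with 1 flips, XOR with 0 keeps */
}
```

Applying it twice with the same k gives back the original word.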

240
00:13:04,840 --> 00:13:05,780
Any questions?

241
00:13:12,290 --> 00:13:16,760
So let's look at
another bit trick.

242
00:13:16,760 --> 00:13:21,500
So here we're trying to extract
a bit field from a word x.

243
00:13:21,500 --> 00:13:26,090
And this is important if you're
working with encoded data.

244
00:13:26,090 --> 00:13:29,720
And the idea here is to
do a mask and a shift.

245
00:13:29,720 --> 00:13:34,820
So we're going to generate
a mask with 1's in exactly

246
00:13:34,820 --> 00:13:38,720
the positions that we want
to extract out of this word,

247
00:13:38,720 --> 00:13:41,330
and then 0's everywhere else.

248
00:13:41,330 --> 00:13:44,220
And then we're going to
AND x with the mask,

249
00:13:44,220 --> 00:13:47,480
and that's going to give us
the bits in the four positions

250
00:13:47,480 --> 00:13:49,400
that we wanted to
extract in this example,

251
00:13:49,400 --> 00:13:51,080
and then we have
0's everywhere else.

252
00:13:53,950 --> 00:13:56,950
And then now we're going
to right-shift this value

253
00:13:56,950 --> 00:14:00,790
that we extracted so that
it appears in the least

254
00:14:00,790 --> 00:14:06,730
significant digits so that we
can use it in our computation.

255
00:14:06,730 --> 00:14:09,220
So this is a very
useful bit trick

256
00:14:09,220 --> 00:14:13,390
to know if you're working with
compressed or encoded data.

257
00:14:13,390 --> 00:14:16,240
And if you use the bit
field facilities in C,

258
00:14:16,240 --> 00:14:21,280
it's actually going to generate
assembly code that will do

259
00:14:21,280 --> 00:14:22,840
masking and shifting for you.
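The mask-and-shift extraction can be sketched as below. The function name and the choice to pass the mask and shift amount as parameters are assumptions for illustration, and shift is assumed to be less than 32.

```c
#include <stdint.h>

/* Extract the field selected by mask and move it down to the low bits.
   shift is the position of the field's lowest bit (assumed < 32). */
uint32_t extract_field(uint32_t x, uint32_t mask, unsigned shift) {
    return (x & mask) >> shift;   /* keep only the field, then right-align it */
}
```

For example, taking the 4-bit field at bits 8 through 11 of 0xABCD uses mask 0x0F00 and shift 8.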

260
00:14:27,880 --> 00:14:31,110
You can also set a
bit field in a word.

261
00:14:31,110 --> 00:14:36,300
So let's say we want to set a
bit field in x to some value y.

262
00:14:36,300 --> 00:14:40,440
The idea is to first invert
this mask to clear those bits

263
00:14:40,440 --> 00:14:41,820
we want to set in x.

264
00:14:41,820 --> 00:14:45,950
And then we OR in the
shifted value of y.

265
00:14:45,950 --> 00:14:50,180
So let's say we have these
two words, x and y here.

266
00:14:50,180 --> 00:14:52,273
We're going to generate
the mask as we did before,

267
00:14:52,273 --> 00:14:54,440
but now we're going to flip
all the bits in the mask

268
00:14:54,440 --> 00:14:57,440
by taking the one's complement.

269
00:14:57,440 --> 00:15:00,920
And then we AND the--

270
00:15:00,920 --> 00:15:03,260
we AND the one's
complement of the mask

271
00:15:03,260 --> 00:15:09,230
with x, and that's going to
clear the bits in x because we

272
00:15:09,230 --> 00:15:11,750
have 0's in exactly those
positions in that mask,

273
00:15:11,750 --> 00:15:14,840
and when you AND that into
x those bits become 0.

274
00:15:14,840 --> 00:15:16,880
And then for all
the other positions,

275
00:15:16,880 --> 00:15:18,920
we're just copying
in the bits of x.

276
00:15:21,500 --> 00:15:25,000
And then, finally, we're
going to left-shift y

277
00:15:25,000 --> 00:15:26,660
by an appropriate
amount so that we

278
00:15:26,660 --> 00:15:29,870
can line up the value with
these four bit positions here.

279
00:15:29,870 --> 00:15:33,260
And then we can now
just OR those values in.

280
00:15:33,260 --> 00:15:36,890
And this will set the
positions in x to the value y.

281
00:15:44,790 --> 00:15:46,530
In order to be safe,
you should actually

282
00:15:46,530 --> 00:15:49,800
do a mask on the shifted y
value before you OR it in,

283
00:15:49,800 --> 00:15:51,600
because you don't know
that the value of y

284
00:15:51,600 --> 00:15:53,910
is within the range of the mask.

285
00:15:53,910 --> 00:15:58,680
So if y has some garbage
values in the higher bits,

286
00:15:58,680 --> 00:16:00,300
when you OR this
in it might pollute

287
00:16:00,300 --> 00:16:01,830
the original value of x.

288
00:16:01,830 --> 00:16:03,750
So, for safety,
you should actually

289
00:16:03,750 --> 00:16:08,030
do a mask before you OR the
value, the shifted value of y

290
00:16:08,030 --> 00:16:08,530
in.
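Putting the steps together, including the safety mask on the shifted y, a sketch might look like this. The function name and parameter layout are assumptions, and shift is assumed to be less than 32.

```c
#include <stdint.h>

/* Set the field of x selected by mask to the value y.
   shift is the position of the field's lowest bit (assumed < 32). */
uint32_t set_field(uint32_t x, uint32_t mask, unsigned shift, uint32_t y) {
    uint32_t cleared = x & ~mask;           /* zero out the field in x */
    uint32_t placed  = (y << shift) & mask; /* align y, drop any stray high bits */
    return cleared | placed;
}
```

Because of the second mask, a y with garbage in its higher bits only changes the selected field.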

291
00:16:15,330 --> 00:16:17,040
So any questions on this?

292
00:16:21,990 --> 00:16:24,710
So now let's look at how
we can swap two integers.

293
00:16:24,710 --> 00:16:29,480
So we want to swap
the values of x and y.

294
00:16:29,480 --> 00:16:33,200
The standard way to do this is
to use a temporary variable t.

295
00:16:33,200 --> 00:16:37,490
So we set t equal to x, x equal
to y, and then y equal to t.

296
00:16:40,610 --> 00:16:43,570
This does involve a
temporary variable, however.

297
00:16:43,570 --> 00:16:45,170
So now the question
is whether we

298
00:16:45,170 --> 00:16:48,290
can do a swap without
using a temporary variable.

299
00:16:48,290 --> 00:16:51,940
It turns out that you
can using bit tricks.

300
00:16:51,940 --> 00:16:55,580
So here's the code for
doing a no-temp swap.

301
00:16:55,580 --> 00:16:58,210
So you first set
x equal to x XOR

302
00:16:58,210 --> 00:17:04,780
y, then y equal to x XOR y,
and then x equal to x XOR y.

303
00:17:04,780 --> 00:17:08,210
So has anyone seen this before?

304
00:17:08,210 --> 00:17:08,800
OK, good.

305
00:17:08,800 --> 00:17:11,180
So some of you have
seen this before.

306
00:17:11,180 --> 00:17:12,670
And for the rest
of you all, I'll

307
00:17:12,670 --> 00:17:14,839
tell you how it works in
the next couple slides.

308
00:17:14,839 --> 00:17:18,160
So let's first
look at an example

309
00:17:18,160 --> 00:17:20,079
of how to run this
code before we

310
00:17:20,079 --> 00:17:21,680
go into why it actually works.

311
00:17:21,680 --> 00:17:27,069
So we're going to start with
these two words in x and y.

312
00:17:27,069 --> 00:17:31,210
We're first going to
do x equal x XOR y.

313
00:17:31,210 --> 00:17:33,520
And now we store
the result in x.

314
00:17:33,520 --> 00:17:37,630
And this is the result when you
do the XOR of these two words.

315
00:17:37,630 --> 00:17:41,120
And then now we do
y equal to x XOR y.

316
00:17:41,120 --> 00:17:44,650
And notice how the value of
x here has already changed.

317
00:17:44,650 --> 00:17:47,650
So we're doing the
XOR of these two words

318
00:17:47,650 --> 00:17:49,990
and setting that to y.

319
00:17:49,990 --> 00:17:53,200
And here this value is
actually the same as x.

320
00:17:53,200 --> 00:17:57,040
So we've already placed x in y.

321
00:17:57,040 --> 00:17:59,170
And, finally, we do another XOR.

322
00:17:59,170 --> 00:18:03,460
We set x equal to x XOR y.

323
00:18:03,460 --> 00:18:06,430
And then this gives us
this value, which is y.

324
00:18:06,430 --> 00:18:08,500
So at the end,
we've just swapped

325
00:18:08,500 --> 00:18:11,185
x and y without using
any temporary variable.
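The three XOR steps can be written as a small C function. Taking pointers is a choice made for this sketch, and it assumes x and y point to different objects: if both pointed to the same variable, the first XOR would zero it out.

```c
#include <stdint.h>

/* Swap *x and *y without a temporary.
   Assumes x and y point to distinct objects. */
void xor_swap(uint32_t *x, uint32_t *y) {
    *x = *x ^ *y;   /* x now holds the mask of differing bits */
    *y = *x ^ *y;   /* y now holds the original x */
    *x = *x ^ *y;   /* x now holds the original y */
}
```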

326
00:18:14,150 --> 00:18:18,940
So the reason why this works is
because XOR is its own inverse.

327
00:18:18,940 --> 00:18:23,260
So if you do x XOR y, and then
XOR the result of that with y,

328
00:18:23,260 --> 00:18:24,950
you just get back x itself.

329
00:18:24,950 --> 00:18:27,800
So let's look at the truth
table to see why this is true.

330
00:18:27,800 --> 00:18:33,770
So in the x and y columns, I've
shown all the possibilities.

331
00:18:33,770 --> 00:18:37,450
So there are four different
possibilities of x and y.

332
00:18:37,450 --> 00:18:40,160
And then I also have
the values of x XOR y.

333
00:18:40,160 --> 00:18:45,670
So it's 1 in the rows
where I have exactly one 1,

334
00:18:45,670 --> 00:18:48,070
and then 0 in the
remaining rows.

335
00:18:48,070 --> 00:18:53,170
And then now if I do
x XOR y XORed with y,

336
00:18:53,170 --> 00:18:55,930
I'm going to XOR
these values with y.

337
00:18:55,930 --> 00:18:57,880
0 XOR 0 is 0.

338
00:18:57,880 --> 00:19:00,250
1 XOR 1 is 0.

339
00:19:00,250 --> 00:19:01,690
1 XOR 0 is 1.

340
00:19:01,690 --> 00:19:04,010
And 0 XOR 1 is 1.

341
00:19:04,010 --> 00:19:09,310
And notice that these values
are the same as the values of x.

342
00:19:09,310 --> 00:19:14,950
So when I XOR something in
twice, it just cancels out

343
00:19:14,950 --> 00:19:16,410
and I get back the
original thing.
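The truth table covers single bits, and bitwise XOR applies it to each position independently. As a quick sanity check, the sketch below tries every pair of 8-bit values; the function name is hypothetical.

```c
/* Returns 1 if (x ^ y) ^ y == x holds for every pair of 8-bit values,
   and 0 if any pair fails. */
int xor_is_self_inverse(void) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            if (((x ^ y) ^ y) != x) {
                return 0;
            }
        }
    }
    return 1;
}
```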

344
00:19:20,720 --> 00:19:26,780
So now let's go into why this
bit trick actually does a swap.

345
00:19:26,780 --> 00:19:29,330
So in the first line,
what we're doing is

346
00:19:29,330 --> 00:19:34,730
we're generating a mask with
1's where the bits in x and y

347
00:19:34,730 --> 00:19:35,240
differ.

348
00:19:35,240 --> 00:19:37,130
Because that's what XOR
is going to give you.

349
00:19:37,130 --> 00:19:39,620
It's going to return a 1
if the bits are different,

350
00:19:39,620 --> 00:19:40,670
and 0 otherwise.

351
00:19:40,670 --> 00:19:43,970
So this is a mask that
tells us in which positions

352
00:19:43,970 --> 00:19:48,340
the bits in x and y differ.

353
00:19:48,340 --> 00:19:50,900
And I'm going to
store that into x.

354
00:19:50,900 --> 00:19:54,830
And then in the second
line, when I do x XOR y,

355
00:19:54,830 --> 00:19:57,380
this is going to
flip the bits in y

356
00:19:57,380 --> 00:19:59,510
that are different
from x, because I'm

357
00:19:59,510 --> 00:20:02,150
XORing with this mask, which
tells me which of the bits

358
00:20:02,150 --> 00:20:03,590
differ from x.

359
00:20:03,590 --> 00:20:06,350
And then if I XOR
with that mask,

360
00:20:06,350 --> 00:20:10,580
I'm flipping the bits
in y that differ from x,

361
00:20:10,580 --> 00:20:13,550
and this will just
give me back x.

362
00:20:13,550 --> 00:20:15,140
And I store that in y.

363
00:20:15,140 --> 00:20:20,270
So we see that the original
value of x is in y now.

364
00:20:20,270 --> 00:20:22,610
And then in the last line,
I do the same thing but now

365
00:20:22,610 --> 00:20:24,920
I'm flipping the bits in x
that are different from y.

366
00:20:24,920 --> 00:20:28,250
So I still have the
mask that's stored in x.

367
00:20:28,250 --> 00:20:31,090
And then I can XOR
that mask with y,

368
00:20:31,090 --> 00:20:33,380
and y has the
original value of x.

369
00:20:33,380 --> 00:20:37,280
So this is flipping the bits
in x that differ from y,

370
00:20:37,280 --> 00:20:40,760
and now I have the original
value of y stored in x.

371
00:20:44,380 --> 00:20:47,480
So this is a pretty
cool trick, right?

372
00:20:47,480 --> 00:20:50,180
Any questions on why this works?

373
00:20:58,050 --> 00:21:00,190
So one thing about
this bit trick

374
00:21:00,190 --> 00:21:03,080
here is that it's actually
poor at exploiting

375
00:21:03,080 --> 00:21:06,650
instruction-level
parallelism, so it's actually

376
00:21:06,650 --> 00:21:08,750
going to be slower than
the naive code that

377
00:21:08,750 --> 00:21:10,760
uses a temporary variable.

378
00:21:10,760 --> 00:21:14,240
Because in the
original code I had,

379
00:21:14,240 --> 00:21:16,910
I could actually execute
two lines in parallel.

380
00:21:16,910 --> 00:21:19,160
I can store value
into the temporary

381
00:21:19,160 --> 00:21:21,110
and then also change
one of the values

382
00:21:21,110 --> 00:21:23,330
of x and y at the same time.

383
00:21:23,330 --> 00:21:25,040
Whereas in this
code here, there's

384
00:21:25,040 --> 00:21:27,830
a sequential dependence
among these three lines.

385
00:21:27,830 --> 00:21:31,040
I can't execute any of
the lines in parallel.

386
00:21:31,040 --> 00:21:34,580
We'll learn more about
instruction-level parallelism

387
00:21:34,580 --> 00:21:37,640
in next week's
lectures, but I just

388
00:21:37,640 --> 00:21:40,787
wanted to point out that
the performance of this

389
00:21:40,787 --> 00:21:41,870
isn't actually that great.

390
00:21:41,870 --> 00:21:44,990
But this is actually a
pretty cool trick to know.

391
00:21:44,990 --> 00:21:47,642
Sometimes it shows
up in job interviews.

392
00:21:52,350 --> 00:21:55,890
So the next thing
we're going to look at

393
00:21:55,890 --> 00:22:01,830
is finding the minimum
of two integers, x and y.

394
00:22:01,830 --> 00:22:03,720
So let's say we want
to store the result

395
00:22:03,720 --> 00:22:06,810
of the minimum in a variable r.

396
00:22:06,810 --> 00:22:09,540
Here's the standard
way to do this.

397
00:22:09,540 --> 00:22:11,260
We just use an
if-else statement.

398
00:22:11,260 --> 00:22:14,250
So if x is less
than y, then r is x.

399
00:22:14,250 --> 00:22:17,610
And, otherwise, r is set to y.

400
00:22:17,610 --> 00:22:19,410
Here's an equivalent expression.

401
00:22:19,410 --> 00:22:21,840
It just uses the
ternary operator in C.

402
00:22:21,840 --> 00:22:24,750
It does exactly the same
thing as the if-else statement

403
00:22:24,750 --> 00:22:25,470
on the left.
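Written out in C, the two equivalent forms might look like this. The function names are chosen for this example, and int is assumed as the integer type.

```c
/* Minimum using an if-else statement. */
int min_if_else(int x, int y) {
    int r;
    if (x < y) {
        r = x;
    } else {
        r = y;
    }
    return r;
}

/* The same computation using the ternary operator. */
int min_ternary(int x, int y) {
    int r = (x < y) ? x : y;
    return r;
}
```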

404
00:22:31,710 --> 00:22:33,377
One performance
problem with this code

405
00:22:33,377 --> 00:22:34,960
is that there is a
branch in the code.

406
00:22:34,960 --> 00:22:37,390
So we have this
if statement that

407
00:22:37,390 --> 00:22:39,700
checks if x is less than y.

408
00:22:39,700 --> 00:22:43,600
And modern machines will
do branch prediction.

409
00:22:43,600 --> 00:22:46,570
And for whatever branch it
predicts the code to take,

410
00:22:46,570 --> 00:22:48,520
it's going to do
prefetching and execute some

411
00:22:48,520 --> 00:22:50,870
of the instructions in advance.

412
00:22:50,870 --> 00:22:55,660
But the problem is if it
mispredicts the branch,

413
00:22:55,660 --> 00:22:57,970
it does a lot of wasted
work, and the processor

414
00:22:57,970 --> 00:23:00,460
has to empty the pipeline
and undo all of the work

415
00:23:00,460 --> 00:23:01,060
that it did.

416
00:23:01,060 --> 00:23:08,170
So this is a performance issue
due to branch misprediction.

417
00:23:08,170 --> 00:23:10,000
Modern compilers are
usually good enough

418
00:23:10,000 --> 00:23:12,880
to optimize this branch away,
but sometimes the compiler

419
00:23:12,880 --> 00:23:15,820
isn't good enough to
optimize the branch away.

420
00:23:15,820 --> 00:23:19,870
So is there a way to do a
minimum without using a branch?

421
00:23:23,240 --> 00:23:23,740
All right.

422
00:23:23,740 --> 00:23:26,010
So here's how you do it.

423
00:23:26,010 --> 00:23:32,000
So we set r equal to y XOR
x XOR y ANDed with negative x

424
00:23:32,000 --> 00:23:34,190
less than y.

425
00:23:34,190 --> 00:23:35,900
So it's pretty obvious, right?

426
00:23:39,710 --> 00:23:43,280
So why does this work?

427
00:23:43,280 --> 00:23:46,120
So first we need to know that
the C language represents

428
00:23:46,120 --> 00:23:47,950
the Boolean values
true and false

429
00:23:47,950 --> 00:23:52,250
with the integers 1
and 0, respectively.

430
00:23:52,250 --> 00:23:55,000
So now let's look at
the two possible cases.

431
00:23:55,000 --> 00:23:57,370
First, let's look at a case
where x is less than y,

432
00:23:57,370 --> 00:23:59,110
and then we'll look
at the case where x

433
00:23:59,110 --> 00:24:00,880
is greater than or equal to y.

434
00:24:00,880 --> 00:24:05,800
So in the first case,
when x is less than y,

435
00:24:05,800 --> 00:24:08,660
the comparison here x less
than y is going to return 1.

436
00:24:08,660 --> 00:24:10,160
And then we're going
to negate that,

437
00:24:10,160 --> 00:24:11,860
which gives us negative 1.

438
00:24:11,860 --> 00:24:13,810
And recall from
earlier, negative 1

439
00:24:13,810 --> 00:24:19,480
is the all 1's word in two's
complement representation.

440
00:24:19,480 --> 00:24:24,610
So when we AND x XOR
y with all 1's word,

441
00:24:24,610 --> 00:24:27,880
that just gives us x XOR y.

442
00:24:27,880 --> 00:24:31,110
And now we're left
with y XOR x XOR y.

443
00:24:31,110 --> 00:24:34,330
And we know that the--

444
00:24:34,330 --> 00:24:37,990
we know that the inverse
of XOR is itself.

445
00:24:37,990 --> 00:24:40,180
And therefore the two
y's cancel out here

446
00:24:40,180 --> 00:24:42,390
and we're just left with x.

447
00:24:42,390 --> 00:24:44,620
And in this case x is
indeed the minimum.

448
00:24:48,470 --> 00:24:52,490
In the other case, we have x
greater than or equal to y.

449
00:24:52,490 --> 00:24:55,010
Then the expression x less
than y is going to return 0.

450
00:24:55,010 --> 00:24:57,710
Negative of 0 is still 0.

451
00:24:57,710 --> 00:25:01,760
And then when we AND x XOR
y with 0, we're left with 0.

452
00:25:01,760 --> 00:25:05,360
And this just gives us
y XOR 0, which is y.

453
00:25:05,360 --> 00:25:07,970
And in this case y is the
minimum of the two integers.
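As a minimal sketch of the two versions just described, assuming 32-bit signed integers (the function names are made up for illustration and are not from the slides):

```c
#include <stdint.h>

/* Branching version: the ternary operator, same as the if-else. */
int32_t min_branching(int32_t x, int32_t y) {
  return (x < y) ? x : y;
}

/* Branchless version: r = y ^ ((x ^ y) & -(x < y)). */
int32_t min_branchless(int32_t x, int32_t y) {
  /* (x < y) is 1 or 0 in C, so negating it gives the all 1's word or 0. */
  int32_t mask = -(int32_t)(x < y);
  /* mask all 1's: y ^ (x ^ y) = x.  mask 0: y ^ 0 = y. */
  return y ^ ((x ^ y) & mask);
}
```

Both functions should agree on every pair of inputs; which one is faster depends on the compiler and the machine.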

454
00:25:11,140 --> 00:25:12,493
So any questions?

455
00:25:20,710 --> 00:25:21,793
So how many of you--

456
00:25:21,793 --> 00:25:23,210
how many of you
knew this already?

457
00:25:26,230 --> 00:25:26,730
Good.

458
00:25:26,730 --> 00:25:29,820
So we learned
something new today.

459
00:25:29,820 --> 00:25:36,640
So let's see how branches
work in a real function.

460
00:25:36,640 --> 00:25:39,250
So here we're trying to merge
together two sorted arrays,

461
00:25:39,250 --> 00:25:41,520
and this is a subroutine
that's used in merge sort

462
00:25:41,520 --> 00:25:44,370
if you've seen it before.

463
00:25:44,370 --> 00:25:47,650
So the inputs to this
function are three arrays.

464
00:25:47,650 --> 00:25:50,220
So we want to merge
together arrays A and B

465
00:25:50,220 --> 00:25:52,260
and store the result
in C. And then

466
00:25:52,260 --> 00:25:57,060
we also pass the function the
sizes of A and B in na and nb.

467
00:25:59,650 --> 00:26:01,750
So what does the
restrict keyword do here?

468
00:26:01,750 --> 00:26:02,470
Does anyone know?

469
00:26:07,980 --> 00:26:10,480
So the restrict keyword
tells the compiler

470
00:26:10,480 --> 00:26:13,570
that this is going to be
the only pointer that can

471
00:26:13,570 --> 00:26:16,750
point to that particular data.

472
00:26:16,750 --> 00:26:20,135
And this enables the compiler
to do more optimizations.

473
00:26:20,135 --> 00:26:21,760
So when you're writing
programs and you

474
00:26:21,760 --> 00:26:23,830
know that there can only
be one pointer pointing

475
00:26:23,830 --> 00:26:26,620
to specific pieces
of data, then you

476
00:26:26,620 --> 00:26:28,100
can declare that
restrict keyword,

477
00:26:28,100 --> 00:26:32,230
and this gives the compiler more
freedom to do optimizations.

478
00:26:35,750 --> 00:26:39,590
So now let's look at
this procedure here.

479
00:26:39,590 --> 00:26:43,610
So while the sizes of
A and B are nonzero,

480
00:26:43,610 --> 00:26:45,860
we're going to go into
this if-else clause

481
00:26:45,860 --> 00:26:51,170
and we're going to check if
the element pointed to by A is

482
00:26:51,170 --> 00:26:53,828
less than or equal to the
element pointed to by B.

483
00:26:53,828 --> 00:26:56,120
And if so, we're going to
store that element pointed to

484
00:26:56,120 --> 00:26:58,880
by A into C. And then
we're going to increment

485
00:26:58,880 --> 00:27:01,710
both the C and A pointers.

486
00:27:01,710 --> 00:27:04,100
And then we're going
to decrement na.

487
00:27:04,100 --> 00:27:06,320
This tells us that there's
one less element in A

488
00:27:06,320 --> 00:27:09,460
that we need to merge in now.

489
00:27:09,460 --> 00:27:14,800
And, otherwise, we do the same
thing but with array B and nb.

490
00:27:14,800 --> 00:27:17,910
And if one of the two
arrays becomes empty,

491
00:27:17,910 --> 00:27:22,350
then we go to one of these
two while loops at the bottom

492
00:27:22,350 --> 00:27:24,930
and we just copy all
the remaining elements

493
00:27:24,930 --> 00:27:29,010
in the non-empty
array into C. So here,

494
00:27:29,010 --> 00:27:31,650
if na is greater than 0,
then A is a non-empty array,

495
00:27:31,650 --> 00:27:34,590
and then we just copy the
remaining elements of A into C.

496
00:27:34,590 --> 00:27:37,730
And, otherwise, we copy the
remaining elements of B into C.
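The procedure just described can be sketched roughly as follows. This is a reconstruction from the spoken description, assuming 64-bit integer elements and the C99 restrict qualifier; the exact types and spelling on the slide may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Merge sorted arrays A (na elements) and B (nb elements) into C. */
void merge(int64_t *restrict C, const int64_t *restrict A,
           const int64_t *restrict B, size_t na, size_t nb) {
  while (na > 0 && nb > 0) {
    if (*A <= *B) {
      *C++ = *A++;  /* take the head of A, advance C and A */
      na--;
    } else {
      *C++ = *B++;  /* take the head of B, advance C and B */
      nb--;
    }
  }
  /* At most one of these two loops does any work. */
  while (na > 0) {
    *C++ = *A++;
    na--;
  }
  while (nb > 0) {
    *C++ = *B++;
    nb--;
  }
}
```

The caller is assumed to provide a C with room for na + nb elements that does not overlap A or B, which is what restrict promises the compiler.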

497
00:27:37,730 --> 00:27:40,810
So let's do a simple example.

498
00:27:40,810 --> 00:27:43,410
Let's say we want to merge
these two arrays in green

499
00:27:43,410 --> 00:27:45,570
into the blue array here.

500
00:27:45,570 --> 00:27:49,590
So let's say the top array is
A, and the bottom array is B,

501
00:27:49,590 --> 00:27:53,790
and the blue array is C.
So, initially, A and B

502
00:27:53,790 --> 00:27:57,420
are pointing to the beginning
of these two green arrays.

503
00:27:57,420 --> 00:27:59,980
And since both
arrays are non-empty,

504
00:27:59,980 --> 00:28:04,020
we're going to compare the
first two elements here.

505
00:28:04,020 --> 00:28:05,700
And we see that
3 is less than 4,

506
00:28:05,700 --> 00:28:08,580
so we're going to place
3 into the array C.

507
00:28:08,580 --> 00:28:11,310
And then we're going to
increment the pointer in A

508
00:28:11,310 --> 00:28:12,810
to point to the next element.

509
00:28:12,810 --> 00:28:14,685
And we're also going to
increment the pointer

510
00:28:14,685 --> 00:28:18,540
C to point to the next slot.

511
00:28:18,540 --> 00:28:20,400
Now we're going to
compare 4 and 12.

512
00:28:20,400 --> 00:28:24,030
4 is less than 12, so we
place 4 into the array C,

513
00:28:24,030 --> 00:28:28,440
and we increment the B pointer. And
then we just keep doing this.

514
00:28:28,440 --> 00:28:29,790
So 12 is less than 14.

515
00:28:29,790 --> 00:28:32,045
14 is less than 19.

516
00:28:32,045 --> 00:28:34,125
19 is less than 21.

517
00:28:34,125 --> 00:28:36,380
21 is less than 46.

518
00:28:36,380 --> 00:28:38,280
And here 23 is less than 46.

519
00:28:38,280 --> 00:28:40,740
And at this point, one of
the arrays becomes empty.

520
00:28:40,740 --> 00:28:42,780
So B is empty now.

521
00:28:42,780 --> 00:28:45,210
So now we get to the
second while loop.

522
00:28:45,210 --> 00:28:47,250
And we see that A still
has elements in it,

523
00:28:47,250 --> 00:28:50,257
and we just copy the remaining
elements in A into C.

524
00:28:50,257 --> 00:28:51,090
And then we're done.

525
00:28:55,260 --> 00:28:58,770
So that's how the standard code
for merging two sorted arrays

526
00:28:58,770 --> 00:28:59,270
works.

527
00:29:02,710 --> 00:29:04,580
So let's look at each
of these branches

528
00:29:04,580 --> 00:29:06,680
to see if it's predictable.

529
00:29:06,680 --> 00:29:10,790
So a predictable
branch is a branch

530
00:29:10,790 --> 00:29:13,700
that most of the time it
returns the same answer,

531
00:29:13,700 --> 00:29:16,520
and only rarely does it
return a different answer.

532
00:29:16,520 --> 00:29:19,940
And an unpredictable branch is
one where it sometimes returns

533
00:29:19,940 --> 00:29:22,050
one value and sometimes
returns another value

534
00:29:22,050 --> 00:29:25,260
and you can't really predict it.

535
00:29:25,260 --> 00:29:27,440
So let's look at
the first branch.

536
00:29:27,440 --> 00:29:30,331
Does anyone know if this
branch is predictable?

537
00:29:36,700 --> 00:29:37,200
Yes.

538
00:29:37,200 --> 00:29:39,040
AUDIENCE: That would
be unpredictable

539
00:29:39,040 --> 00:29:43,562
because it depends on
what input you're given.

540
00:29:43,562 --> 00:29:46,260
JULIAN SHUN: So it turns out
that this branch is actually

541
00:29:46,260 --> 00:29:49,300
predictable because it's going
to return true most of the time

542
00:29:49,300 --> 00:29:51,210
except for the last time.

543
00:29:51,210 --> 00:29:54,750
So it's only going to return
false when nb is equal to 0.

544
00:29:54,750 --> 00:29:57,120
And at that point you're just
going to execute this once

545
00:29:57,120 --> 00:29:58,870
and then you're done.

546
00:29:58,870 --> 00:30:01,800
But most of the time nb
is going to be greater

547
00:30:01,800 --> 00:30:04,470
than 0 when you
execute this, and we

548
00:30:04,470 --> 00:30:06,450
call this a predictable branch.

549
00:30:09,780 --> 00:30:11,940
What about the second one?

550
00:30:14,802 --> 00:30:16,233
So--

551
00:30:16,233 --> 00:30:18,322
AUDIENCE: Also predictable?

552
00:30:18,322 --> 00:30:19,030
JULIAN SHUN: Yes.

553
00:30:19,030 --> 00:30:21,190
So it's also predictable
for the same reason.

554
00:30:24,738 --> 00:30:25,780
What about the third one?

555
00:30:29,950 --> 00:30:31,088
Yes.

556
00:30:31,088 --> 00:30:31,630
AUDIENCE: No.

557
00:30:31,630 --> 00:30:34,570
Because we really-- if we
already knew which was bigger,

558
00:30:34,570 --> 00:30:37,412
then we already have
the sorted array then.

559
00:30:37,412 --> 00:30:38,120
JULIAN SHUN: Yes.

560
00:30:38,120 --> 00:30:39,890
So this turns out
to be unpredictable

561
00:30:39,890 --> 00:30:44,180
because we don't know the
values in A and B a priori.

562
00:30:44,180 --> 00:30:49,220
So this condition
inside the if statement

563
00:30:49,220 --> 00:30:52,430
is going to return true
about half of the time

564
00:30:52,430 --> 00:30:55,253
because we don't know what
values are in A and B.

565
00:30:55,253 --> 00:30:57,170
And that's going to be
an unpredictable branch

566
00:30:57,170 --> 00:31:03,520
because it's going to return
true or false about 50/50.

567
00:31:03,520 --> 00:31:06,340
What about the last one?

568
00:31:06,340 --> 00:31:06,840
Yes.

569
00:31:06,840 --> 00:31:07,990
Why?

570
00:31:07,990 --> 00:31:12,612
AUDIENCE: Yes, because for
similar reasons of 1 and 2.

571
00:31:12,612 --> 00:31:15,192
It's probably [INAUDIBLE].

572
00:31:15,192 --> 00:31:15,900
JULIAN SHUN: Yes.

573
00:31:15,900 --> 00:31:17,760
So it is predictable.

574
00:31:17,760 --> 00:31:20,232
The reason why it's
predictable is that most

575
00:31:20,232 --> 00:31:21,690
of the time it's going
to return true.

576
00:31:21,690 --> 00:31:23,340
And that once it
returns false you're

577
00:31:23,340 --> 00:31:26,700
never going to look at that
again inside this function

578
00:31:26,700 --> 00:31:27,880
call.

579
00:31:27,880 --> 00:31:29,640
So it returns true
most of the time,

580
00:31:29,640 --> 00:31:34,270
and we call that a
predictable branch.

581
00:31:34,270 --> 00:31:37,420
So branches 1, 2, and 4
are OK because they're

582
00:31:37,420 --> 00:31:41,530
predictable branches, but branch
3 is going to cause a problem.

583
00:31:41,530 --> 00:31:45,970
It's an unpredictable branch,
and the hardware doesn't really

584
00:31:45,970 --> 00:31:51,430
like this because it can't
do prefetching efficiently.

585
00:31:51,430 --> 00:31:55,540
So to fix this, we can use our
no-branch minimum bit trick

586
00:31:55,540 --> 00:31:58,660
that we learned a
couple slides ago.

587
00:31:58,660 --> 00:32:02,470
So now what we're doing is we're
going to have a variable called

588
00:32:02,470 --> 00:32:05,200
cmp which stores the
result of the comparison

589
00:32:05,200 --> 00:32:10,540
between the first element of
A and the first element of B.

590
00:32:10,540 --> 00:32:14,740
And then now we're going to
get the minimum of A and B

591
00:32:14,740 --> 00:32:15,243
as follows.

592
00:32:15,243 --> 00:32:17,035
It's the same bit trick
that we saw before.

593
00:32:19,880 --> 00:32:22,420
So now the variable
min is going to store

594
00:32:22,420 --> 00:32:24,850
the smaller of the
first element of A

595
00:32:24,850 --> 00:32:27,550
and the first element
of B. And we also

596
00:32:27,550 --> 00:32:31,210
have the result of
this comparison here.

597
00:32:31,210 --> 00:32:33,310
So that's stored in cmp.

598
00:32:33,310 --> 00:32:36,640
So first we're going to
place the minimum value in C.

599
00:32:36,640 --> 00:32:38,710
And then, based on
the result of cmp,

600
00:32:38,710 --> 00:32:43,150
we're going to increment one of
A or B. So if A was less than

601
00:32:43,150 --> 00:32:45,670
or equal to B, then
cmp is going to be 1.

602
00:32:45,670 --> 00:32:52,360
And A plus equal cmp is
going to increment A by 1.

603
00:32:52,360 --> 00:32:55,840
And then B plus equal to not
cmp is going to not do anything,

604
00:32:55,840 --> 00:32:57,880
because not cmp is 0.

605
00:32:57,880 --> 00:33:01,180
And then for na, we're
going to decrement by cmp.

606
00:33:01,180 --> 00:33:04,390
So it's going to be 1 if A
is less than or equal to B,

607
00:33:04,390 --> 00:33:06,110
and 0 otherwise.

608
00:33:06,110 --> 00:33:07,840
And then for nb, we're
going to decrement

609
00:33:07,840 --> 00:33:09,820
by the not of the cmp.

610
00:33:09,820 --> 00:33:13,900
So only one of these
two lines is actually

611
00:33:13,900 --> 00:33:18,348
going to do something based on
the result of the comparison.

612
00:33:18,348 --> 00:33:20,515
And then the rest of the
code is the same as before.
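A sketch of the branchless main loop, again reconstructed from the description and assuming 64-bit integer elements (the name merge_branchless is made up for illustration):

```c
#include <stddef.h>
#include <stdint.h>

/* Merge sorted A and B into C without the unpredictable if-else. */
void merge_branchless(int64_t *restrict C, const int64_t *restrict A,
                      const int64_t *restrict B, size_t na, size_t nb) {
  while (na > 0 && nb > 0) {
    int64_t cmp = (*A <= *B);               /* 1 if A's head is taken, else 0 */
    int64_t min = *B ^ ((*B ^ *A) & -cmp);  /* no-branch minimum of the heads */
    *C++ = min;
    A += cmp;   /* advances A only when cmp is 1 */
    na -= (size_t)cmp;
    B += !cmp;  /* advances B only when cmp is 0 */
    nb -= (size_t)!cmp;
  }
  /* The rest is the same as the branching version. */
  while (na > 0) {
    *C++ = *A++;
    na--;
  }
  while (nb > 0) {
    *C++ = *B++;
    nb--;
  }
}
```

The three remaining loop conditions are the predictable branches, so only the data-dependent comparison has been replaced by arithmetic.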

613
00:33:23,250 --> 00:33:25,770
Any questions?

614
00:33:25,770 --> 00:33:28,710
So now we've gotten rid of
this unpredictable branch

615
00:33:28,710 --> 00:33:29,670
that we had before.

616
00:33:33,690 --> 00:33:35,450
So one thing about
this optimization

617
00:33:35,450 --> 00:33:38,180
is that it works well
on certain machines.

618
00:33:38,180 --> 00:33:41,120
However, on modern machines,
using a good compiler

619
00:33:41,120 --> 00:33:44,690
like Clang with
the minus O3 flag,

620
00:33:44,690 --> 00:33:46,640
the branchless
version is usually

621
00:33:46,640 --> 00:33:48,530
going to be slower than
the branching version

622
00:33:48,530 --> 00:33:51,200
because the compiler is
actually smart enough

623
00:33:51,200 --> 00:33:55,310
to get rid of the branch
inside the original version

624
00:33:55,310 --> 00:33:56,150
of minimum.

625
00:33:56,150 --> 00:34:00,980
There's this instruction called
cmov or a conditional move.

626
00:34:00,980 --> 00:34:04,460
It's basically a
branchless instruction

627
00:34:04,460 --> 00:34:05,690
for doing a comparison.

628
00:34:05,690 --> 00:34:08,429
We'll learn more
about that next week.

629
00:34:08,429 --> 00:34:11,542
So this trick actually
usually doesn't really work.

630
00:34:11,542 --> 00:34:14,000
There might be some machines
and some compilers where it works,

631
00:34:14,000 --> 00:34:15,417
but most of the
time, the compiler

632
00:34:15,417 --> 00:34:19,860
is better at optimizing
this code than you are.

633
00:34:19,860 --> 00:34:22,190
So one of the
common themes so far

634
00:34:22,190 --> 00:34:25,190
is that I've told you about
a really cool bit trick

635
00:34:25,190 --> 00:34:28,520
and then I told you that
it doesn't really work.

636
00:34:28,520 --> 00:34:30,940
So why are we even learning
about these bit tricks

637
00:34:30,940 --> 00:34:33,820
then if they don't even work?

638
00:34:33,820 --> 00:34:37,310
So first is because the compiler
does some of these bit tricks,

639
00:34:37,310 --> 00:34:39,770
and it's helpful to understand
what these bit tricks are

640
00:34:39,770 --> 00:34:42,409
so you can figure out what
the compiler is doing when

641
00:34:42,409 --> 00:34:45,560
you look at the assembly code.

642
00:34:45,560 --> 00:34:48,530
Secondly, sometimes the compiler
doesn't do these optimizations

643
00:34:48,530 --> 00:34:51,530
for you and you have
to do it yourself.

644
00:34:51,530 --> 00:34:53,540
Thirdly, many bit
hacks for words

645
00:34:53,540 --> 00:34:56,120
extend naturally to
bit and word hacks

646
00:34:56,120 --> 00:34:59,530
for vectors, which are widely
used in high-performance code.

647
00:34:59,530 --> 00:35:02,250
So it's good to know
about these tricks.

648
00:35:02,250 --> 00:35:05,670
These bit tricks also
arise in other domains.

649
00:35:05,670 --> 00:35:09,810
And, finally, because they're
just fun to learn about.

650
00:35:09,810 --> 00:35:12,150
And for project 1,
you'll be playing around

651
00:35:12,150 --> 00:35:14,520
with some of these
bit tricks, so it's

652
00:35:14,520 --> 00:35:19,470
good to know about these things
that I've talked about already.

653
00:35:19,470 --> 00:35:22,840
Here I'll talk about a bit
trick that actually does work.

654
00:35:22,840 --> 00:35:26,970
So here we're trying
to do modular addition.

655
00:35:26,970 --> 00:35:30,420
So we want to do x plus y mod n.

656
00:35:30,420 --> 00:35:35,100
And here let's assume that x
is between 0 and n minus 1,

657
00:35:35,100 --> 00:35:38,800
and y is also between
0 and n minus 1.

658
00:35:38,800 --> 00:35:41,190
So the standard way
to do this is just

659
00:35:41,190 --> 00:35:45,600
to use the mod operator,
x plus y mod n.

660
00:35:45,600 --> 00:35:47,640
However, this does
a division, which

661
00:35:47,640 --> 00:35:50,580
is relatively expensive
compared to other operations

662
00:35:50,580 --> 00:35:52,830
unless n is a power of 2.

663
00:35:52,830 --> 00:35:55,070
But most of the
time, you don't know

664
00:35:55,070 --> 00:35:57,100
if n is a power of
2 at compile time,

665
00:35:57,100 --> 00:35:59,400
so the compiler can't
actually translate this

666
00:35:59,400 --> 00:36:05,550
to a bit-mask operation, and
then it has to do a division.

667
00:36:05,550 --> 00:36:11,020
So here's another way to do
it without using division.

668
00:36:11,020 --> 00:36:15,330
So we're first going to set z
equal to the sum of x and y.

669
00:36:15,330 --> 00:36:18,060
And then if z is
less than n, then

670
00:36:18,060 --> 00:36:21,710
it's already within the range
and we can just return z.

671
00:36:21,710 --> 00:36:23,790
If z is greater
than or equal to n,

672
00:36:23,790 --> 00:36:25,950
well, we know it can
be at most 2n minus 2

673
00:36:25,950 --> 00:36:29,220
because x and y were
both at most n minus 1.

674
00:36:29,220 --> 00:36:32,490
So all we have to do is to
subtract n and bring it back

675
00:36:32,490 --> 00:36:35,140
into range.

676
00:36:35,140 --> 00:36:37,470
However, this code has an
unpredictable branch here

677
00:36:37,470 --> 00:36:41,910
because we don't know whether
z is less than n or not.

678
00:36:41,910 --> 00:36:45,420
So now we can use the
same trick as minimum.

679
00:36:45,420 --> 00:36:49,180
So now we're going to
set r equal to z minus n

680
00:36:49,180 --> 00:36:55,380
ANDed with the negative of z
greater than or equal to n.

681
00:36:55,380 --> 00:36:58,590
So if z is less
than n, then this

682
00:36:58,590 --> 00:37:00,770
is going to return 0 in here.

683
00:37:00,770 --> 00:37:04,020
And n ANDed with 0 is 0,
so we're just left with z.

684
00:37:04,020 --> 00:37:07,380
And if z is greater
than or equal to n,

685
00:37:07,380 --> 00:37:09,330
then this is going to be 1.

686
00:37:09,330 --> 00:37:13,200
We negate that, we get negative
1, which is the all 1's word.

687
00:37:13,200 --> 00:37:15,450
n ANDed with all 1's is just n.

688
00:37:15,450 --> 00:37:19,320
So that is z minus n, which
will bring the result back

689
00:37:19,320 --> 00:37:19,980
into range.
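The three versions of modular addition can be sketched side by side. This assumes 32-bit unsigned values with n at most 2^31, so that x plus y cannot wrap around; the function names are made up for illustration.

```c
#include <stdint.h>

/* All three assume 0 <= x < n, 0 <= y < n, and n <= 2^31. */

/* Standard version: uses a division. */
uint32_t add_mod_div(uint32_t x, uint32_t y, uint32_t n) {
  return (x + y) % n;
}

/* No division, but an unpredictable branch on z < n. */
uint32_t add_mod_branch(uint32_t x, uint32_t y, uint32_t n) {
  uint32_t z = x + y;
  return (z < n) ? z : z - n;
}

/* No division and no branch: r = z - (n & -(z >= n)). */
uint32_t add_mod_branchless(uint32_t x, uint32_t y, uint32_t n) {
  uint32_t z = x + y;
  /* -(z >= n) is the all 1's word when z >= n and 0 otherwise. */
  uint32_t mask = -(uint32_t)(z >= n);
  return z - (n & mask);
}
```

Since z is at most 2n minus 2, subtracting n at most once is enough to bring it back into range.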

690
00:37:24,830 --> 00:37:29,070
So any questions?

691
00:37:29,070 --> 00:37:29,570
Yes.

692
00:37:29,570 --> 00:37:31,278
AUDIENCE: It seems
like there essentially

693
00:37:31,278 --> 00:37:34,090
is still a branch based
on the value of z.

694
00:37:34,090 --> 00:37:37,760
So why would that be faster?

695
00:37:37,760 --> 00:37:40,190
JULIAN SHUN: So this branch
here is just generating

696
00:37:40,190 --> 00:37:42,140
either a Boolean value 1 or 0.

697
00:37:42,140 --> 00:37:44,900
There's actually-- like the
code that you execute after it,

698
00:37:44,900 --> 00:37:47,400
it's still the same
in either case.

699
00:37:47,400 --> 00:37:49,850
So the branch misprediction
only hurts you

700
00:37:49,850 --> 00:37:51,570
if there are two
different code paths.

701
00:37:51,570 --> 00:37:54,170
In this version, there are
two different code paths,

702
00:37:54,170 --> 00:37:58,130
because one is doing z and
one is doing z minus n.

703
00:38:02,620 --> 00:38:04,810
So the next problem
we will look at

704
00:38:04,810 --> 00:38:09,610
is computing or rounding a value
up to the nearest power of 2.

705
00:38:09,610 --> 00:38:15,160
And this is just 2 to the
ceiling of log base 2 of n.

706
00:38:15,160 --> 00:38:18,330
And recall that lg of n
is the log base 2 of n.

707
00:38:18,330 --> 00:38:23,140
That's the notation we'll
be using in this class.

708
00:38:23,140 --> 00:38:24,730
Here's some code to do this.

709
00:38:24,730 --> 00:38:28,070
So we have our value of n here.

710
00:38:28,070 --> 00:38:30,640
First, we're going
to decrement n.

711
00:38:30,640 --> 00:38:35,540
And then we're going to do an OR
of n with n right-shifted by 1.

712
00:38:35,540 --> 00:38:38,920
Then an OR with n and n
right-shifted by 2, and so on,

713
00:38:38,920 --> 00:38:40,490
all the way up to 32.

714
00:38:40,490 --> 00:38:44,500
So we do this for all
powers of 2 up to 32.

715
00:38:44,500 --> 00:38:47,750
And then, finally, we
increment n at the end.

716
00:38:47,750 --> 00:38:51,140
So let's look at an example
to see why this works.

717
00:38:51,140 --> 00:38:53,710
So we're starting with
this value of n here.

718
00:38:56,710 --> 00:38:59,030
First we're going
to decrement it.

719
00:38:59,030 --> 00:39:03,010
And what that does is it flips
the rightmost 1 bit to 0,

720
00:39:03,010 --> 00:39:06,500
and then it fills in all the
0's right of that with 1's.

721
00:39:09,490 --> 00:39:11,560
And then when we
do this line, which

722
00:39:11,560 --> 00:39:16,630
says n is equal to n ORed
with n right-shifted by 1,

723
00:39:16,630 --> 00:39:19,870
that's essentially propagating
all of the 1 bits one position

724
00:39:19,870 --> 00:39:22,270
to the right and
then ORing those in.

725
00:39:22,270 --> 00:39:25,390
So we can see that
this 1 bit got copied

726
00:39:25,390 --> 00:39:26,560
one position to the right.

727
00:39:26,560 --> 00:39:29,380
This 1 bit got copied
one position to the right.

728
00:39:29,380 --> 00:39:32,100
These 1's also propagate, but
since they were already 1's

729
00:39:32,100 --> 00:39:35,290
it doesn't do anything.

730
00:39:35,290 --> 00:39:37,600
For the next line, we're
propagating the 1 bits

731
00:39:37,600 --> 00:39:40,030
two positions to the right.

732
00:39:40,030 --> 00:39:43,750
So this 1 bit here
gets copied here.

733
00:39:43,750 --> 00:39:47,290
This 1 gets copied
here, and so on.

734
00:39:47,290 --> 00:39:49,600
And then the next line is
going to propagate bits

735
00:39:49,600 --> 00:39:51,790
four positions to the right.

736
00:39:51,790 --> 00:39:53,380
Then 8, 16, and 32.

737
00:39:53,380 --> 00:39:56,650
For this example here,
when I get to this line

738
00:39:56,650 --> 00:39:57,440
I'm already done.

739
00:39:57,440 --> 00:40:00,370
But, in general,
you have more bits

740
00:40:00,370 --> 00:40:04,510
in a word, which I
can't fit on this slide.

741
00:40:04,510 --> 00:40:07,210
And now we have
something that's exactly

742
00:40:07,210 --> 00:40:09,670
one less than a power of 2.

743
00:40:09,670 --> 00:40:12,010
And when we add 1 to that,
we just get a power of 2.

744
00:40:12,010 --> 00:40:14,890
So we're going to zero
out all of these 1 bits

745
00:40:14,890 --> 00:40:16,330
and then place a 1 here.

746
00:40:16,330 --> 00:40:19,030
And this is exactly
the power of 2

747
00:40:19,030 --> 00:40:21,360
that's greater than the value n.

748
00:40:28,890 --> 00:40:31,980
So the first line
here is essentially

749
00:40:31,980 --> 00:40:35,670
guaranteeing us that the
(ceiling of lg n) minus 1 bit is set.

750
00:40:35,670 --> 00:40:37,860
And we need that bit
to be set because we

751
00:40:37,860 --> 00:40:40,890
want to propagate that
bit to all the positions

752
00:40:40,890 --> 00:40:43,890
to the right of it.

753
00:40:43,890 --> 00:40:47,370
And then these six lines here
are populating all the bits

754
00:40:47,370 --> 00:40:49,770
to the right with 1's.

755
00:40:49,770 --> 00:40:53,580
And then the last bit is
setting the (ceiling of lg n) bit to 1

756
00:40:53,580 --> 00:40:55,260
and then clearing all
of the other bits.
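The steps above can be sketched as follows, assuming a 64-bit unsigned word (the function name is made up for illustration):

```c
#include <stdint.h>

/* Round n up to the nearest power of 2, i.e. 2^ceil(lg n), for n >= 1. */
uint64_t round_up_pow2(uint64_t n) {
  --n;           /* so that an exact power of 2 maps to itself */
  n |= n >> 1;   /* propagate 1 bits one position to the right */
  n |= n >> 2;   /* then two positions */
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n |= n >> 32;  /* every bit below the top 1 bit is now set */
  ++n;           /* carry ripples up: a single 1 bit remains */
  return n;
}
```

Two edge cases fall outside the lecture's discussion: an input of 0 wraps around and returns 0, and so does any input above 2^63, because the result does not fit in 64 bits.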

757
00:40:58,420 --> 00:41:01,330
So one question is why
did we have to decrement n

758
00:41:01,330 --> 00:41:02,425
at the beginning?

759
00:41:05,540 --> 00:41:06,425
Yes.

760
00:41:06,425 --> 00:41:08,600
AUDIENCE: In case n is
already [INAUDIBLE]..

761
00:41:08,600 --> 00:41:09,308
JULIAN SHUN: Yes.

762
00:41:09,308 --> 00:41:13,130
So if n is already a power of
2 and if we don't decrement n,

763
00:41:13,130 --> 00:41:16,340
this isn't going to work
because the (ceiling of lg n) minus 1 bit

764
00:41:16,340 --> 00:41:17,360
isn't set.

765
00:41:17,360 --> 00:41:19,660
But if we decrement
n, then it's going

766
00:41:19,660 --> 00:41:22,430
to guarantee us that
the (ceiling of lg n) minus 1 bit

767
00:41:22,430 --> 00:41:24,650
is set so that we can
propagate that to the right.

768
00:41:28,590 --> 00:41:31,430
Any questions?

769
00:41:31,430 --> 00:41:31,930
Yes.

770
00:41:31,930 --> 00:41:34,480
AUDIENCE: [INAUDIBLE]?

771
00:41:34,480 --> 00:41:36,550
JULIAN SHUN: Because,
in general, you're

772
00:41:36,550 --> 00:41:39,720
using 64-bit words.

773
00:41:39,720 --> 00:41:41,710
Here I don't have
that many bits here

774
00:41:41,710 --> 00:41:43,210
because I can't fit it
in on the slide,

775
00:41:43,210 --> 00:41:44,627
but in general you
have more bits.

776
00:41:51,810 --> 00:41:53,370
Let's look at another problem.

777
00:41:53,370 --> 00:41:57,060
Here we want to compute the
mask of the least significant 1

778
00:41:57,060 --> 00:41:58,980
in a word x.

779
00:41:58,980 --> 00:42:01,110
So we want a mask
that has a 1 in only

780
00:42:01,110 --> 00:42:04,200
the position of the least
significant 1 in x, and 0's

781
00:42:04,200 --> 00:42:06,160
everywhere else.

782
00:42:06,160 --> 00:42:08,750
So how can we do this?

783
00:42:08,750 --> 00:42:11,210
So we can set r, the
result, equal to x

784
00:42:11,210 --> 00:42:12,350
ANDed with negative x.

785
00:42:15,730 --> 00:42:18,550
So let's look at why this works.

786
00:42:18,550 --> 00:42:20,260
So here is x.

787
00:42:20,260 --> 00:42:27,200
And recall negative x is the
one's complement of x plus 1.

788
00:42:27,200 --> 00:42:33,670
So what we do is we flip all of
the bits up to the rightmost 1

789
00:42:33,670 --> 00:42:36,160
but not including it, and then
we just copy all of the bits

790
00:42:36,160 --> 00:42:36,680
over.

791
00:42:36,680 --> 00:42:40,770
That's how we get
negative x from x.

792
00:42:40,770 --> 00:42:43,570
And then now when we
compare x and negative x,

793
00:42:43,570 --> 00:42:47,320
we see that all of the bits
when we AND them together

794
00:42:47,320 --> 00:42:52,420
are going to be 0 except
for the bit at the position

795
00:42:52,420 --> 00:42:55,420
corresponding to the least
significant 1 bit in x.

796
00:42:55,420 --> 00:42:58,540
And that's going to be 1
since we're ANDing 1 and 1,

797
00:42:58,540 --> 00:43:00,730
and everything else
is going to be 0.

798
00:43:00,730 --> 00:43:02,740
And this will give us
the mask that we want.

799
00:43:06,860 --> 00:43:10,520
So this works because the
binary representation of minus x

800
00:43:10,520 --> 00:43:13,450
is just the one's
complement of x plus 1.
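A minimal sketch of this mask computation, assuming a 64-bit unsigned word (the function name is made up for illustration):

```c
#include <stdint.h>

/* Mask with a 1 only at the position of the least significant 1 in x. */
uint64_t lsb_mask(uint64_t x) {
  /* For unsigned types, -x is computed modulo 2^64, which equals ~x + 1. */
  return x & (-x);
}
```

For example, lsb_mask(12) is 4, because 12 is 1100 in binary and its lowest 1 bit has value 4. An input of 0 has no 1 bits and gives 0.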

801
00:43:18,916 --> 00:43:23,390
So now, a question is how can
we find the index of this bit?

802
00:43:23,390 --> 00:43:26,150
So here I'm just generating
a mask that has a 1

803
00:43:26,150 --> 00:43:31,310
at the position of the least significant 1
in x, but it doesn't actually

804
00:43:31,310 --> 00:43:33,140
tell me the index of this bit.

805
00:43:33,140 --> 00:43:36,950
In other words, I want to find
the log base 2 of a power of 2.

806
00:43:40,110 --> 00:43:42,090
So that's the problem
we want to solve,

807
00:43:42,090 --> 00:43:46,170
and here's some code
that lets us do this.

808
00:43:46,170 --> 00:43:49,260
So we have this constant
called the de Bruijn.

809
00:43:49,260 --> 00:43:51,770
It's written in hex here.

810
00:43:51,770 --> 00:43:56,850
And then we have this table
of size 64 called convert.

811
00:43:56,850 --> 00:44:00,600
And now all we have to do is
multiply x by this de Bruijn

812
00:44:00,600 --> 00:44:04,140
constant, right shift
it by 58 positions,

813
00:44:04,140 --> 00:44:06,750
and then look up the result
in the convert table.

814
00:44:06,750 --> 00:44:10,850
And that's going to give us the
log base 2 of the power of 2.
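Here is a hedged sketch of that lookup. The transcript does not state the constant's value, so the one below is an assumption: a commonly used 64-bit de Bruijn sequence that starts with six 0 bits. Instead of hard-coding the 64-entry table as the slide does, this sketch fills it in at startup; build_convert and log2_pow2 are made-up names.

```c
#include <stdint.h>

/* Assumed constant: every 6-bit window of it is distinct. */
static const uint64_t deBruijn = 0x022fdd63cc95386dULL;

/* convert[w] = the shift amount i whose top 6 bits come out as w. */
static int convert[64];

/* Fill the table once before calling log2_pow2. */
void build_convert(void) {
  for (int i = 0; i < 64; i++) {
    convert[(deBruijn << i) >> 58] = i;
  }
}

/* x must be a power of 2; returns lg x. */
int log2_pow2(uint64_t x) {
  /* Multiplying by 2^i is a left shift by i, so the top 6 bits of the
     product identify i uniquely. */
  return convert[(x * deBruijn) >> 58];
}
```

Combined with the mask x ANDed with negative x, this gives the index of the least significant 1 bit of any nonzero word.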

815
00:44:10,850 --> 00:44:12,078
Any questions?

816
00:44:12,078 --> 00:44:16,380
[STUDENTS LAUGH]

817
00:44:18,800 --> 00:44:21,810
So this looks like magic to us.

818
00:44:21,810 --> 00:44:25,340
So in the spirit of magic, we're
going to do a mathemagic trick.

819
00:44:25,340 --> 00:44:29,240
And to do this trick, I'm
going to need five volunteers,

820
00:44:29,240 --> 00:44:30,740
and the only
requirement is that you

821
00:44:30,740 --> 00:44:33,080
need to be able to
follow directions.

822
00:44:33,080 --> 00:44:35,900
So who wants to volunteer
for this magic trick?

823
00:44:35,900 --> 00:44:42,650
Yes, 1, 2, 3, 4--

824
00:44:42,650 --> 00:44:45,500
one more-- 5.

825
00:44:45,500 --> 00:44:47,770
All right, come on up.

826
00:44:47,770 --> 00:44:49,103
So line up here.

827
00:44:49,103 --> 00:44:52,490
[STUDENTS APPLAUD]

828
00:44:52,490 --> 00:44:54,430
Yes, just line up right here.

829
00:45:03,843 --> 00:45:05,635
Can you move a little
bit over to the left?

830
00:45:08,190 --> 00:45:08,780
OK, cool.

831
00:45:08,780 --> 00:45:13,790
So today I have the pleasure of
welcoming Jess Ray, also known

832
00:45:13,790 --> 00:45:16,970
as The Golden Raytio,
to join us for a lecture

833
00:45:16,970 --> 00:45:19,650
and help us perform
this cool magic trick.

834
00:45:19,650 --> 00:45:21,984
So let's give her a
round of applause.

835
00:45:21,984 --> 00:45:24,952
[STUDENTS APPLAUD]

836
00:45:24,952 --> 00:45:27,410
JESS RAY: I'm going to be doing
a little bit of a magic trick

837
00:45:27,410 --> 00:45:28,670
for you all today.

838
00:45:28,670 --> 00:45:31,580
I'm going to be reading
your guys' minds.

839
00:45:31,580 --> 00:45:33,080
And I know you're
looking skeptical,

840
00:45:33,080 --> 00:45:36,397
but I'm hoping I can
convince you here.

841
00:45:36,397 --> 00:45:37,980
So we'll get to that
part in a second.

842
00:45:37,980 --> 00:45:41,330
But, first, the first
big step in reading minds

843
00:45:41,330 --> 00:45:43,100
is you got to clear
the air, like get

844
00:45:43,100 --> 00:45:45,770
rid of all the negative
vibes, all the bad energy.

845
00:45:45,770 --> 00:45:46,430
Throw that out.

846
00:45:46,430 --> 00:45:48,690
So I'm going to need a
little help from you guys

847
00:45:48,690 --> 00:45:50,270
in doing this.

848
00:45:50,270 --> 00:45:52,650
So, first, we have this
sweet little bell here.

849
00:45:52,650 --> 00:45:53,150
Let's see.

850
00:45:53,150 --> 00:45:54,097
Who wants the bell?

851
00:45:54,097 --> 00:45:55,430
AUDIENCE: I'll take it, I guess.

852
00:45:55,430 --> 00:45:55,760
JESS RAY: All right.

853
00:45:55,760 --> 00:45:56,900
Can you hold that for a second?

854
00:45:56,900 --> 00:45:58,358
So what this bell
is going to do is

855
00:45:58,358 --> 00:46:00,500
help us get rid of some
of those negative ideas.

856
00:46:00,500 --> 00:46:02,420
Can you give it a ring?

857
00:46:02,420 --> 00:46:03,470
Oh yes.

858
00:46:03,470 --> 00:46:06,080
So that painful ringing you're
hearing in your ears right now

859
00:46:06,080 --> 00:46:08,060
is actually just clearing
up the air for us,

860
00:46:08,060 --> 00:46:10,410
making it so I can
read your minds.

861
00:46:10,410 --> 00:46:10,910
Thank you.

862
00:46:10,910 --> 00:46:11,430
Stop that.

863
00:46:15,070 --> 00:46:16,210
All right.

864
00:46:16,210 --> 00:46:19,480
Next we have this
magic tone here.

865
00:46:19,480 --> 00:46:21,227
Who would like to
give this a spin?

866
00:46:21,227 --> 00:46:23,060
Can you shake that
around a couple of times?

867
00:46:23,060 --> 00:46:23,852
Spin it.

868
00:46:23,852 --> 00:46:26,560
Spin it with your wrist there,
like-- you can go like this.

869
00:46:26,560 --> 00:46:28,900
There we go.

870
00:46:28,900 --> 00:46:29,680
All right.

871
00:46:29,680 --> 00:46:30,410
Perfect.

872
00:46:30,410 --> 00:46:30,910
All right.

873
00:46:30,910 --> 00:46:32,770
It's feeling a
little clearer here.

874
00:46:32,770 --> 00:46:36,040
I can start-- you can start
getting things off your mind.

875
00:46:36,040 --> 00:46:38,920
Don't worry, I won't tell
anybody what you're thinking.

876
00:46:38,920 --> 00:46:41,160
Oh, let's see what else.

877
00:46:41,160 --> 00:46:42,850
Let me channel the spirits.

878
00:46:42,850 --> 00:46:45,310
Help me out here.

879
00:46:45,310 --> 00:46:47,210
All right, I'm feeling good.

880
00:46:47,210 --> 00:46:47,710
All right.

881
00:46:47,710 --> 00:46:51,600
So what we're going to
be doing is, as I said,

882
00:46:51,600 --> 00:46:52,350
reading your mind.

883
00:46:52,350 --> 00:46:54,500
I'm going to be doing
this by giving you cards,

884
00:46:54,500 --> 00:46:56,250
and I'm going to tell
you what each of you

885
00:46:56,250 --> 00:46:57,520
are holding for the card.

886
00:46:57,520 --> 00:47:00,090
So I have some cards here.

887
00:47:00,090 --> 00:47:02,440
Well, I guess these
are a little small.

888
00:47:02,440 --> 00:47:03,450
Let's see.

889
00:47:03,450 --> 00:47:05,432
Go a little bigger.

890
00:47:05,432 --> 00:47:06,680
Meh.

891
00:47:06,680 --> 00:47:07,850
Here we go.

892
00:47:07,850 --> 00:47:12,430
Let's-- this looks better.

893
00:47:12,430 --> 00:47:13,670
All right.

894
00:47:13,670 --> 00:47:17,380
These are kind of heavy.

895
00:47:17,380 --> 00:47:22,270
Get rid of these junk ones
up here, all the junk.

896
00:47:22,270 --> 00:47:23,080
All right.

897
00:47:23,080 --> 00:47:25,690
So I need your help for this.

898
00:47:25,690 --> 00:47:27,898
So what I want you to
do is take the cards

899
00:47:27,898 --> 00:47:29,690
and cut the deck as
many times as you want.

900
00:47:29,690 --> 00:47:32,230
So, basically, just going
like that however much.

901
00:47:32,230 --> 00:47:34,810
Just don't actually
shuffle them randomly.

902
00:47:37,239 --> 00:47:38,072
AUDIENCE: All right.

903
00:47:38,072 --> 00:47:38,572
Here you go.

904
00:47:38,572 --> 00:47:40,237
JESS RAY: All right, cool.

905
00:47:40,237 --> 00:47:42,070
So now I'm going to
hand each of you a card.

906
00:47:42,070 --> 00:47:42,910
Don't let me see it.

907
00:47:42,910 --> 00:47:43,910
Feel free to look at it.

908
00:47:50,176 --> 00:47:52,120
There you go.

909
00:47:52,120 --> 00:47:52,660
All right.

910
00:47:52,660 --> 00:47:55,900
So the reason I'm wearing
this awesome onesie

911
00:47:55,900 --> 00:47:59,050
is this helps me sweat
out the bad energy.

912
00:47:59,050 --> 00:48:01,000
I'm literally
sweating right now.

913
00:48:01,000 --> 00:48:03,580
But there's one more piece that
we need for this mind reading

914
00:48:03,580 --> 00:48:04,080
trick.

915
00:48:07,590 --> 00:48:09,540
The magic hat.

916
00:48:09,540 --> 00:48:10,590
All right.

917
00:48:10,590 --> 00:48:12,180
See if this fits on my head.

918
00:48:12,180 --> 00:48:12,948
There we go.

919
00:48:12,948 --> 00:48:13,740
Where's the switch?

920
00:48:13,740 --> 00:48:14,240
All right.

921
00:48:14,240 --> 00:48:16,020
Turn it on.

922
00:48:16,020 --> 00:48:17,730
All right, I'm
feeling good here.

923
00:48:17,730 --> 00:48:19,960
All right, you guys ready?

924
00:48:19,960 --> 00:48:20,460
All right.

925
00:48:20,460 --> 00:48:23,610
So I do need a little help
getting this trick started.

926
00:48:23,610 --> 00:48:25,582
So if you are
holding a red card,

927
00:48:25,582 --> 00:48:26,790
can you just raise your hand?

928
00:48:29,580 --> 00:48:31,202
So no?

929
00:48:31,202 --> 00:48:32,160
Who's got the red card?

930
00:48:32,160 --> 00:48:33,720
Red, red.

931
00:48:33,720 --> 00:48:34,980
You don't have red?

932
00:48:34,980 --> 00:48:36,070
OK.

933
00:48:36,070 --> 00:48:36,570
All right.

934
00:48:36,570 --> 00:48:39,150
So the first one
and the third one.

935
00:48:39,150 --> 00:48:40,588
All right.

936
00:48:40,588 --> 00:48:42,630
So let me handle the mind
reading abilities here.

937
00:48:42,630 --> 00:48:44,550
Now what I'm going to do is
I'm going to go left to right

938
00:48:44,550 --> 00:48:45,420
and tell you what
you're holding.

939
00:48:45,420 --> 00:48:48,210
Obviously, I know the color, but
I'll tell you what suit it is,

940
00:48:48,210 --> 00:48:52,390
and also I will tell
you what the number is.

941
00:48:52,390 --> 00:48:55,115
So first card, obviously
I know you have a red.

942
00:48:55,115 --> 00:48:56,910
Hmm.

943
00:48:56,910 --> 00:49:00,403
I'm feeling a diamond
and also a four?

944
00:49:00,403 --> 00:49:01,320
AUDIENCE: That was it.

945
00:49:01,320 --> 00:49:02,550
JESS RAY: Yes.

946
00:49:02,550 --> 00:49:05,170
All right.

947
00:49:05,170 --> 00:49:05,670
All right.

948
00:49:05,670 --> 00:49:06,870
Good start, good start.

949
00:49:09,390 --> 00:49:09,900
All right.

950
00:49:09,900 --> 00:49:13,395
Got to-- got to think about
what the next one is here.

951
00:49:15,820 --> 00:49:16,320
All right.

952
00:49:16,320 --> 00:49:19,800
So I know you had a black card.

953
00:49:19,800 --> 00:49:20,610
Let's see.

954
00:49:20,610 --> 00:49:24,840
Black of spades.

955
00:49:24,840 --> 00:49:27,300
Is it the ace of spades?

956
00:49:27,300 --> 00:49:28,120
Oh yes.

957
00:49:28,120 --> 00:49:28,930
There we go.

958
00:49:33,840 --> 00:49:34,340
All right.

959
00:49:34,340 --> 00:49:39,060
So back to red.

960
00:49:39,060 --> 00:49:39,560
All right.

961
00:49:39,560 --> 00:49:41,210
This one, let's see.

962
00:49:41,210 --> 00:49:45,710
Red, diamond, two.

963
00:49:45,710 --> 00:49:46,630
All right, all right.

964
00:49:46,630 --> 00:49:47,722
We're doing good so far.

965
00:49:47,722 --> 00:49:48,680
Can I get the last two?

966
00:49:51,652 --> 00:49:53,360
All right, let's see
what we can do here.

967
00:49:53,360 --> 00:49:59,750
All right, black, club, four.

968
00:49:59,750 --> 00:50:00,350
All right.

969
00:50:00,350 --> 00:50:04,550
Last one, last one.

970
00:50:04,550 --> 00:50:06,160
All right.

971
00:50:06,160 --> 00:50:07,730
Oh, it's going to
be a tough one.

972
00:50:10,250 --> 00:50:16,167
Black, spade, eight.

973
00:50:16,167 --> 00:50:19,999
[STUDENTS APPLAUD]

974
00:50:21,440 --> 00:50:23,830
And if we had time, I
could mystify you

975
00:50:23,830 --> 00:50:27,650
and go through the rest of the
deck, but we won't do that.

976
00:50:27,650 --> 00:50:29,290
So thank you guys very much.

977
00:50:29,290 --> 00:50:32,390
I hope your minds were blown.

978
00:50:32,390 --> 00:50:32,890
Yes.

979
00:50:32,890 --> 00:50:35,580
So let me collect the
cards back from you.

980
00:50:39,747 --> 00:50:41,430
Thank you.

981
00:50:41,430 --> 00:50:41,930
All right.

982
00:50:41,930 --> 00:50:42,430
Thank you.

983
00:50:42,430 --> 00:50:45,022
Now I can get out of
this and stop sweating.

984
00:50:45,022 --> 00:50:51,688
[STUDENTS APPLAUD]

985
00:50:51,688 --> 00:50:53,230
JULIAN SHUN: It's
pretty cool, right?

986
00:50:56,570 --> 00:50:58,140
So why does this actually work?

987
00:51:00,900 --> 00:51:02,790
To know why this
trick actually works,

988
00:51:02,790 --> 00:51:07,830
we need to first study what
a de Bruijn sequence is.

989
00:51:07,830 --> 00:51:10,680
So a de Bruijn
sequence s of length 2

990
00:51:10,680 --> 00:51:15,270
to the k is a cyclic bit
sequence such that each

991
00:51:15,270 --> 00:51:20,730
of the 2 to the k possible
bit strings of length k

992
00:51:20,730 --> 00:51:25,470
occurs exactly once
as a substring in s.

993
00:51:25,470 --> 00:51:27,330
So this is a pretty long
definition, so let's

994
00:51:27,330 --> 00:51:28,930
look at an example.

995
00:51:28,930 --> 00:51:32,950
So here is a de Bruijn
sequence for k equals 3.

996
00:51:32,950 --> 00:51:38,830
So the length of this sequence
is 8 because 2 to the 3 is 8.

997
00:51:38,830 --> 00:51:44,740
And you can see that each of the
possible three-bit substrings

998
00:51:44,740 --> 00:51:50,350
occurs exactly once in this
cyclic bit string of length 8.

999
00:51:50,350 --> 00:51:52,480
So it wraps around
and you can consider

1000
00:51:52,480 --> 00:51:55,810
this as a cyclic string.

1001
00:51:55,810 --> 00:52:00,100
So we see that 000
appears at position 0.

1002
00:52:00,100 --> 00:52:03,055
001 is at position 1.

1003
00:52:03,055 --> 00:52:05,740
Then 010 is at position 6.

1004
00:52:05,740 --> 00:52:10,660
011 is at position 2.

1005
00:52:10,660 --> 00:52:13,060
100 is at position 7.

1006
00:52:13,060 --> 00:52:15,250
101 is at 5.

1007
00:52:15,250 --> 00:52:17,180
110 is at 4.

1008
00:52:17,180 --> 00:52:19,190
And then 111 is at 3.

1009
00:52:19,190 --> 00:52:23,950
So all of the 8 possible
substrings of length 3

1010
00:52:23,950 --> 00:52:26,320
occur exactly once in
this de Bruijn sequence.

1011
00:52:29,920 --> 00:52:36,520
So now we're going to create
this convert table of length 8.

1012
00:52:36,520 --> 00:52:38,410
In general, this
will be 2 to the k.

1013
00:52:38,410 --> 00:52:40,010
And here, k is 3.

1014
00:52:40,010 --> 00:52:45,310
And in this convert table, what
we're storing in each position

1015
00:52:45,310 --> 00:52:48,430
is the index in the
de Bruijn sequence

1016
00:52:48,430 --> 00:52:51,730
where the bit string
corresponding to that position

1017
00:52:51,730 --> 00:52:54,110
starts in the de
Bruijn sequence.

1018
00:52:54,110 --> 00:53:00,220
So here we see that convert of
2 is 6 because the bit string

1019
00:53:00,220 --> 00:53:04,210
corresponding to 2 is 010,
and that begins at position 6

1020
00:53:04,210 --> 00:53:06,370
in the de Bruijn sequence.

1021
00:53:06,370 --> 00:53:11,770
We also see that convert
of 4 is 7 because 4 is 100,

1022
00:53:11,770 --> 00:53:14,530
and that begins at position
7 in the de Bruijn sequence.
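
As a rough illustration of the convert table just described, here is a minimal C sketch. It assumes the k equals 3 sequence is 00011101 read left to right, which matches the positions listed above; the names `window_at` and `build_convert` are made up for this example and are not from the lecture code.

```c
#include <stdint.h>

/* Assumed k = 3 de Bruijn sequence, written left to right as 00011101,
   so position 0 is the most significant bit of the byte. */
enum { K = 3, LEN = 1 << K };            /* LEN = 8 */
static const uint8_t DEBRUIJN = 0x1D;    /* 0b00011101 */

/* Return the K-bit substring that starts at position pos,
   wrapping around the end of the sequence (it is cyclic). */
unsigned window_at(unsigned pos) {
    unsigned w = 0;
    for (unsigned i = 0; i < K; i++) {
        unsigned bit_index = (pos + i) % LEN;                  /* cyclic wrap */
        unsigned bit = (DEBRUIJN >> (LEN - 1 - bit_index)) & 1u;
        w = (w << 1) | bit;
    }
    return w;
}

/* Fill convert[] so that convert[s] is the position where substring s starts.
   Returns 1 if every K-bit string occurs exactly once, 0 otherwise. */
int build_convert(int convert[LEN]) {
    int seen[LEN] = {0};
    for (unsigned pos = 0; pos < LEN; pos++) {
        unsigned w = window_at(pos);
        if (seen[w]) return 0;           /* duplicate: not a de Bruijn sequence */
        seen[w] = 1;
        convert[w] = (int)pos;
    }
    return 1;
}
```

Calling `build_convert` on an 8-entry array should reproduce the entries mentioned in the lecture, such as convert of 2 being 6 and convert of 4 being 7.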

1023
00:53:17,590 --> 00:53:21,310
Now we have this convert table.

1024
00:53:21,310 --> 00:53:23,230
And recall that we're
trying to compute

1025
00:53:23,230 --> 00:53:25,490
the log base 2 of a power of 2.

1026
00:53:25,490 --> 00:53:27,670
So hopefully you
guys remember that.

1027
00:53:31,250 --> 00:53:33,340
So the way to do
this is we're going

1028
00:53:33,340 --> 00:53:36,880
to multiply the
de Bruijn sequence

1029
00:53:36,880 --> 00:53:39,410
constant by this power of 2.

1030
00:53:39,410 --> 00:53:41,860
So let's say we're
working with the integer

1031
00:53:41,860 --> 00:53:43,480
16, which is 2 to the 4.

1032
00:53:43,480 --> 00:53:46,100
So we're going to multiply
this de Bruijn sequence by 2

1033
00:53:46,100 --> 00:53:47,420
to the 4.

1034
00:53:47,420 --> 00:53:49,480
And when we multiply
by a power of 2,

1035
00:53:49,480 --> 00:53:52,810
that's the same
as left shifting.

1036
00:53:52,810 --> 00:53:55,300
So that's going to left shift
the de Bruijn sequence four

1037
00:53:55,300 --> 00:53:58,660
positions to the left.

1038
00:53:58,660 --> 00:54:01,810
And then now we want to
see which of the eight

1039
00:54:01,810 --> 00:54:05,480
possible substrings appears at
the beginning of this sequence.

1040
00:54:05,480 --> 00:54:08,320
And after we do
the left shift, 110

1041
00:54:08,320 --> 00:54:11,945
appears at the beginning
of the sequence.

1042
00:54:11,945 --> 00:54:13,570
And we want to extract
this out, and we

1043
00:54:13,570 --> 00:54:17,440
can do that by right
shifting five positions.

1044
00:54:17,440 --> 00:54:21,670
And 110 is just 6.

1045
00:54:21,670 --> 00:54:25,060
And we can figure out where
6 starts in this de Bruijn

1046
00:54:25,060 --> 00:54:27,820
sequence by looking it
up in the convert table.

1047
00:54:27,820 --> 00:54:30,910
We see that convert of 6 is 4.

1048
00:54:30,910 --> 00:54:36,040
So the string 110
appears starting

1049
00:54:36,040 --> 00:54:38,980
at position 4 in the
de Bruijn sequence,

1050
00:54:38,980 --> 00:54:41,590
and that means that we
did a left shift by 4

1051
00:54:41,590 --> 00:54:44,710
in the first step, and
that gives us the log base

1052
00:54:44,710 --> 00:54:47,680
2 of the power of 2,
because the only reason why

1053
00:54:47,680 --> 00:54:51,910
we did a left shift by 4
is because the power of 2

1054
00:54:51,910 --> 00:54:54,820
was 2 to the 4.

1055
00:54:54,820 --> 00:54:58,300
So this returns us the
log base 2 of the integer

1056
00:54:58,300 --> 00:54:59,320
that we started with.
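
The multiply, shift, and lookup steps above can be sketched in C for the 8-bit, k equals 3 case. This is only a sketch under stated assumptions: the sequence is taken to be 00011101, the table values follow the positions read out in the lecture, and the function name `log2_pow2_u8` is hypothetical.

```c
#include <stdint.h>

/* convert[s] = position where the 3-bit string s starts in 00011101. */
static const int convert[8] = {0, 1, 6, 2, 7, 5, 4, 3};

/* Assumes x is a power of 2 that fits in 8 bits (1, 2, 4, ..., 128).
   The result is meaningless for any other input. */
int log2_pow2_u8(uint8_t x) {
    const uint8_t debruijn = 0x1D;                /* 0b00011101 */
    /* Multiplying by 2^j is a left shift by j; the cast back to 8 bits
       drops the bits that overflow, and zeros fill in on the right. */
    uint8_t shifted = (uint8_t)(debruijn * x);
    /* The top 3 bits are the substring that started at position j. */
    return convert[shifted >> 5];
}
```

For the example in the lecture, `log2_pow2_u8(16)` shifts the sequence to 11010000, extracts 110, and looks up convert of 6, which is 4.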

1057
00:55:02,660 --> 00:55:04,960
And one thing to
note is that it's

1058
00:55:04,960 --> 00:55:08,890
important to start with all
0's in this sequence here,

1059
00:55:08,890 --> 00:55:14,260
because we're representing
this as a cyclic bit sequence.

1060
00:55:14,260 --> 00:55:16,390
So when we do a
left shift, we need

1061
00:55:16,390 --> 00:55:20,350
to make sure that the values
that fill in on the right side

1062
00:55:20,350 --> 00:55:21,190
are correct.

1063
00:55:21,190 --> 00:55:25,750
So notice that in the sixth
and seventh positions,

1064
00:55:25,750 --> 00:55:30,100
we need 0's at the
end when we overflow.

1065
00:55:30,100 --> 00:55:32,020
So because the de
Bruijn sequence

1066
00:55:32,020 --> 00:55:34,540
starts with all 0's, when
we do the left shift,

1067
00:55:34,540 --> 00:55:36,760
it's automatically filling
with 0's, giving us

1068
00:55:36,760 --> 00:55:39,190
the correct substring.

1069
00:55:39,190 --> 00:55:44,020
So the magic trick that Jess
did had 32 cards, and in that

1070
00:55:44,020 --> 00:55:46,750
case k was equal to 5.

1071
00:55:46,750 --> 00:55:50,500
And the cards were arranged
according to a de Bruijn

1072
00:55:50,500 --> 00:55:53,500
sequence of length 32.

1073
00:55:53,500 --> 00:55:55,750
And each of the
cards corresponded

1074
00:55:55,750 --> 00:56:00,700
to one particular bit
string of length 5.

1075
00:56:00,700 --> 00:56:04,570
And the color of the card
corresponded to the bit.

1076
00:56:04,570 --> 00:56:08,950
So when she asked you what
the color of your card was,

1077
00:56:08,950 --> 00:56:11,740
she could determine
the bits corresponding

1078
00:56:11,740 --> 00:56:15,040
to the first card
in the sequence

1079
00:56:15,040 --> 00:56:19,630
because she has the 5 bits
corresponding to that card.

1080
00:56:19,630 --> 00:56:21,430
And then with that she
has some clever way

1081
00:56:21,430 --> 00:56:24,340
to determine the
rest of the cards.

1082
00:56:24,340 --> 00:56:26,650
So that's how the
de Bruijn sequence

1083
00:56:26,650 --> 00:56:28,930
is related to the magic
trick that you just saw.
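
To make the connection to the card trick concrete, here is a hedged C sketch of how five reported colors could pin down a position in a 32-card deck. Everything specific here is an assumption for illustration: the constant 0x077CB531 is a commonly used 32-bit de Bruijn sequence and may not be the ordering used in the lecture, red is encoded as 1 and black as 0, and the names `colors_at` and `locate` are invented.

```c
#include <stdint.h>

/* Assumed color pattern of the deck: a 32-bit de Bruijn sequence (k = 5).
   Bit 31 is deck position 0. The real deck may use a different sequence. */
static const uint32_t DECK_COLORS = 0x077CB531u;

/* 5-bit color window starting at deck position pos, wrapping cyclically
   (cutting the deck only rotates the cyclic sequence). */
unsigned colors_at(unsigned pos) {
    unsigned w = 0;
    for (unsigned i = 0; i < 5; i++) {
        unsigned idx = (pos + i) % 32;
        w = (w << 1) | ((DECK_COLORS >> (31 - idx)) & 1u);
    }
    return w;
}

/* colors packs the five answers left to right, first volunteer in the
   highest bit (1 = red, 0 = black). Returns the deck position of the first
   volunteer's card, or -1 if the pattern is not found. */
int locate(unsigned colors) {
    for (unsigned pos = 0; pos < 32; pos++) {
        if (colors_at(pos) == colors) return (int)pos;
    }
    return -1;
}
```

Because each 5-bit pattern occurs exactly once, the answers red, black, red, black, black (binary 10100) identify a single starting position, and the next four cards follow from the known deck order.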

1084
00:56:33,940 --> 00:56:35,530
Any questions?

1085
00:56:35,530 --> 00:56:36,030
Yes.

1086
00:56:36,030 --> 00:56:37,405
AUDIENCE: The de
Bruijn sequence,

1087
00:56:37,405 --> 00:56:40,610
do you need to do
cyclic translation?

1088
00:56:40,610 --> 00:56:43,110
JULIAN SHUN: So there could be
multiple de Bruijn sequences.

1089
00:56:43,110 --> 00:56:45,660
We just need one particular
de Bruijn sequence

1090
00:56:45,660 --> 00:56:48,444
to make this bit trick work.

1091
00:56:48,444 --> 00:56:48,944
Yes.

1092
00:56:52,210 --> 00:56:55,520
So this example is
just for k equals 3.

1093
00:56:55,520 --> 00:56:59,890
And the code I showed you
before, that was for k

1094
00:56:59,890 --> 00:57:03,910
equals 6, so you can
do up to 64-bit words.

1095
00:57:03,910 --> 00:57:04,420
Yes.

1096
00:57:04,420 --> 00:57:07,525
AUDIENCE: How do we know
that the sequence exists?

1097
00:57:07,525 --> 00:57:09,400
JULIAN SHUN: So there
is a mathematical proof

1098
00:57:09,400 --> 00:57:11,200
that says that.

1099
00:57:11,200 --> 00:57:13,300
I can give you some
pointers so that you

1100
00:57:13,300 --> 00:57:14,500
can look at it after class.

1101
00:57:14,500 --> 00:57:18,130
But there's a proof that
says that for any length

1102
00:57:18,130 --> 00:57:19,536
there is a de Bruijn sequence.

1103
00:57:22,470 --> 00:57:23,233
Yes.

1104
00:57:23,233 --> 00:57:24,900
AUDIENCE: Sorry, I
missed the procedure.

1105
00:57:24,900 --> 00:57:27,870
So how exactly do you
determine the log base 2?

1106
00:57:31,650 --> 00:57:32,910
JULIAN SHUN: So we have--

1107
00:57:32,910 --> 00:57:37,230
we're starting with some
integer that is a power of 2.

1108
00:57:37,230 --> 00:57:40,340
So when we multiply
by that power of 2,

1109
00:57:40,340 --> 00:57:44,460
it's left-shifting by
the log base 2 of that.

1110
00:57:44,460 --> 00:57:48,810
And then we can determine how
much we left-shifted because we

1111
00:57:48,810 --> 00:57:50,760
know--

1112
00:57:50,760 --> 00:57:53,940
we can just look at the first
three bits of this sequence

1113
00:57:53,940 --> 00:57:55,980
after we did the
left shift, and then

1114
00:57:55,980 --> 00:57:58,950
look at where that
three-bit sequence appears

1115
00:57:58,950 --> 00:58:04,570
in the original de Bruijn
sequence before we shifted it.

1116
00:58:04,570 --> 00:58:07,560
And to do that, you can look
it up in the convert table.

1117
00:58:07,560 --> 00:58:11,455
This is what we did when we
looked up the bit string 110

1118
00:58:11,455 --> 00:58:12,330
in the convert table.

1119
00:58:12,330 --> 00:58:15,330
And that tells us that it
starts in the fourth position.

1120
00:58:15,330 --> 00:58:18,240
That means that we
left-shifted by 4,

1121
00:58:18,240 --> 00:58:23,610
and that means that the
value of n was 2 to the 4.

1122
00:58:23,610 --> 00:58:25,570
Does that make sense?

1123
00:58:25,570 --> 00:58:26,070
Yes.

1124
00:58:26,070 --> 00:58:27,903
AUDIENCE: So just to
clarify this only works

1125
00:58:27,903 --> 00:58:30,290
if you multiply the
sequence by a power of 2,

1126
00:58:30,290 --> 00:58:32,332
then it gives you back
which power of 2 it was?

1127
00:58:32,332 --> 00:58:33,040
JULIAN SHUN: Yes.

1128
00:58:33,040 --> 00:58:36,130
So this only works if you're
starting with a power of 2.

1129
00:58:36,130 --> 00:58:38,760
So if it's not a power
of 2, this doesn't work.

1130
00:58:46,125 --> 00:58:48,089
Any other questions?

1131
00:58:51,526 --> 00:58:52,030
Yes.

1132
00:58:52,030 --> 00:58:54,030
So if it's not a power
of 2, you can round it up

1133
00:58:54,030 --> 00:58:56,150
to the nearest power
of 2 using another bit

1134
00:58:56,150 --> 00:58:57,620
trick that we saw earlier.

1135
00:58:57,620 --> 00:58:59,420
And then you can use
this bit trick here.
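
For reference, one common way to round up to a power of 2 is sketched below in C. This is an assumption about which earlier trick is meant; the function name `round_up_pow2` is hypothetical, and the sketch assumes n is at least 1 and at most 2 to the 63.

```c
#include <stdint.h>

/* Round n up to the nearest power of 2 (returns n itself if it already is one).
   Assumes 1 <= n <= 2^63; other inputs wrap around to 0. */
uint64_t round_up_pow2(uint64_t n) {
    --n;              /* so that exact powers of 2 map to themselves */
    n |= n >> 1;      /* smear the highest set bit into every lower bit */
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;
    return n + 1;     /* all ones below the top bit, plus 1, is a power of 2 */
}
```

For example, `round_up_pow2(17)` gives 32, which can then be passed to the de Bruijn lookup.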

1136
00:59:02,430 --> 00:59:05,250
The performance
of this bit trick

1137
00:59:05,250 --> 00:59:07,890
is limited by the performance
of multiplication and table

1138
00:59:07,890 --> 00:59:08,700
lookup.

1139
00:59:08,700 --> 00:59:11,850
So you have to do
a multiplication

1140
00:59:11,850 --> 00:59:14,490
by some constant,
and then you have

1141
00:59:14,490 --> 00:59:17,460
to do table lookup in
this convert table.

1142
00:59:17,460 --> 00:59:20,190
So a table lookup does
a memory reference,

1143
00:59:20,190 --> 00:59:21,990
which could be expensive.

1144
00:59:21,990 --> 00:59:24,900
And nowadays there's actually
a hardware instruction

1145
00:59:24,900 --> 00:59:26,640
to compute this, so
you don't actually

1146
00:59:26,640 --> 00:59:28,680
have to implement this trick.

1147
00:59:28,680 --> 00:59:30,450
But this trick is
still pretty cool.

1148
00:59:30,450 --> 00:59:33,000
And in the past this
is how you would do it

1149
00:59:33,000 --> 00:59:35,940
before there was a hardware
instruction that came out.
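
As a small sketch of the hardware route, GCC and Clang expose a count-trailing-zeros builtin that typically compiles to a single instruction on current processors. The wrapper name `log2_pow2_builtin` is made up here, and the code assumes one of those two compilers.

```c
#include <stdint.h>

/* Assumes GCC or Clang. x must be a nonzero power of 2:
   __builtin_ctzll is undefined for 0. */
int log2_pow2_builtin(uint64_t x) {
    /* For a power of 2, the number of trailing zero bits is its log base 2. */
    return __builtin_ctzll(x);
}
```

With this, `log2_pow2_builtin(16)` returns 4 without any multiplication or table lookup.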

1150
00:59:41,120 --> 00:59:42,890
So let's look at
another problem.

1151
00:59:42,890 --> 00:59:45,120
So this is the n queens problem.

1152
00:59:45,120 --> 00:59:46,780
How many of you have
seen this before?

1153
00:59:46,780 --> 00:59:47,280
Yes.

1154
00:59:47,280 --> 00:59:49,970
So many of you have
seen this before.

1155
00:59:49,970 --> 00:59:52,250
As a reminder, we're
trying to place n queens

1156
00:59:52,250 --> 00:59:57,350
on an n by n chessboard so that
no queen attacks another queen.

1157
00:59:57,350 --> 00:59:59,030
In other words, there
are no two queens

1158
00:59:59,030 --> 01:00:03,110
in any row, any column,
or any diagonal.

1159
01:00:03,110 --> 01:00:04,940
And, commonly, we want
to count the number

1160
01:00:04,940 --> 01:00:08,210
of possible solutions
to the n queens problem

1161
01:00:08,210 --> 01:00:10,760
for a particular value of n.

1162
01:00:10,760 --> 01:00:14,930
And in this example here,
this is a valid configuration.

1163
01:00:14,930 --> 01:00:17,270
You can check, for
each of the queens,

1164
01:00:17,270 --> 01:00:19,460
they can't attack any
other queen on the board.

1165
01:00:23,450 --> 01:00:26,450
So one common strategy for
implementing the n queens

1166
01:00:26,450 --> 01:00:29,090
algorithm is to
use backtracking.

1167
01:00:29,090 --> 01:00:31,440
We're going to try
placing queens row by row.

1168
01:00:31,440 --> 01:00:33,620
We know that there can
only be one queen per row,

1169
01:00:33,620 --> 01:00:36,680
so we just need to determine
which position in that row

1170
01:00:36,680 --> 01:00:38,150
the queen will appear in.

1171
01:00:38,150 --> 01:00:40,490
And then if we can't
place a queen in some row,

1172
01:00:40,490 --> 01:00:43,820
then we backtrack.

1173
01:00:43,820 --> 01:00:46,758
So, for example,
in the first row,

1174
01:00:46,758 --> 01:00:48,800
we'll just place the queen
in the first position,

1175
01:00:48,800 --> 01:00:50,480
because there's no
queens on the board

1176
01:00:50,480 --> 01:00:53,390
yet, so the first
position is valid.

1177
01:00:53,390 --> 01:00:55,790
For the second row,
we're going to try

1178
01:00:55,790 --> 01:00:59,990
to place in the first position,
but we can't place it there

1179
01:00:59,990 --> 01:01:03,410
because then it will
attack the first queen.

1180
01:01:03,410 --> 01:01:05,960
And then the second
position is also invalid,

1181
01:01:05,960 --> 01:01:10,970
so the third position is where
we place the second queen.

1182
01:01:10,970 --> 01:01:12,610
Now, for the third
row we're going

1183
01:01:12,610 --> 01:01:15,890
to check the positions until
we get to one that's valid,

1184
01:01:15,890 --> 01:01:18,170
and this is going to
be the fifth position.

1185
01:01:21,160 --> 01:01:22,630
Do this again.

1186
01:01:22,630 --> 01:01:25,780
Here we can do it in
the second position.

1187
01:01:25,780 --> 01:01:29,840
For the fifth row, let's see
where this is going to end up.

1188
01:01:29,840 --> 01:01:30,340
OK.

1189
01:01:30,340 --> 01:01:33,130
So it goes in the
fourth position.

1190
01:01:33,130 --> 01:01:34,794
What about the sixth row?

1191
01:01:44,290 --> 01:01:44,790
Whoops.

1192
01:01:44,790 --> 01:01:48,430
So all of the eight
positions are invalid,

1193
01:01:48,430 --> 01:01:51,010
because if we place the queen
in any of those positions,

1194
01:01:51,010 --> 01:01:53,817
it's going to attack one of the
queens that we already placed.

1195
01:01:53,817 --> 01:01:55,150
So now we're going to backtrack.

1196
01:01:55,150 --> 01:01:59,040
We're going to find another
position for the fifth queen.

1197
01:01:59,040 --> 01:02:01,313
So let's try some
more positions.

1198
01:02:04,630 --> 01:02:07,170
So we can place it at the end.

1199
01:02:07,170 --> 01:02:08,274
Now we try again.

1200
01:02:16,820 --> 01:02:17,773
All right.

1201
01:02:17,773 --> 01:02:19,690
So, unfortunately, we
couldn't find a position

1202
01:02:19,690 --> 01:02:21,635
for the sixth row again.

1203
01:02:21,635 --> 01:02:22,510
We have to backtrack.

1204
01:02:22,510 --> 01:02:24,843
But we already tried all the
positions in the fifth row,

1205
01:02:24,843 --> 01:02:27,400
so we backtrack
to the fourth row.

1206
01:02:27,400 --> 01:02:29,350
And you get the idea.

1207
01:02:29,350 --> 01:02:31,600
And then whenever we find
a configuration where

1208
01:02:31,600 --> 01:02:35,430
all eight queens are valid, then
we increment some counter by 1.

1209
01:02:35,430 --> 01:02:37,600
And at the end we just
return this counter,

1210
01:02:37,600 --> 01:02:40,220
which tells us the number
of solutions to the n queens

1211
01:02:40,220 --> 01:02:40,720
puzzle.

1212
01:02:48,820 --> 01:02:51,430
So you can implement
this quite easily using

1213
01:02:51,430 --> 01:02:53,170
a recursive procedure.

1214
01:02:53,170 --> 01:02:56,500
You can implement this
backtracking search.

1215
01:02:56,500 --> 01:02:58,780
But one question
is how should we

1216
01:02:58,780 --> 01:03:01,390
represent the board to
facilitate efficient queen

1217
01:03:01,390 --> 01:03:03,580
placement?

1218
01:03:03,580 --> 01:03:06,010
So one way to
represent the board

1219
01:03:06,010 --> 01:03:09,130
is to use an array
of n squared bytes.

1220
01:03:09,130 --> 01:03:12,750
And for each byte,
we just have a 1

1221
01:03:12,750 --> 01:03:17,365
if there is a queen in that
position, and 0 otherwise.

1222
01:03:17,365 --> 01:03:19,240
Is there a better way
to represent the board?

1223
01:03:27,032 --> 01:03:28,980
AUDIENCE: You can
track all of the bits

1224
01:03:28,980 --> 01:03:31,415
such that a 1 bit
represents a queen

1225
01:03:31,415 --> 01:03:34,350
at some place on the board?

1226
01:03:34,350 --> 01:03:35,710
JULIAN SHUN: Yes.

1227
01:03:35,710 --> 01:03:36,770
So that's a good answer.

1228
01:03:36,770 --> 01:03:39,400
So instead of using
bytes, we can use bits,

1229
01:03:39,400 --> 01:03:41,470
because the value
can only be 0 or 1.

1230
01:03:41,470 --> 01:03:43,430
We only need one bit
to represent that.

1231
01:03:43,430 --> 01:03:48,082
So we can just have an
array of n squared bits.

1232
01:03:48,082 --> 01:03:50,074
Is there a better
way to do this?

1233
01:03:56,550 --> 01:03:57,531
Yes.

1234
01:03:57,531 --> 01:04:00,420
AUDIENCE: You could
just say in each row

1235
01:04:00,420 --> 01:04:02,192
where a queen is with a byte?

1236
01:04:02,192 --> 01:04:02,900
JULIAN SHUN: Yes.

1237
01:04:02,900 --> 01:04:03,840
So good answer.

1238
01:04:03,840 --> 01:04:07,380
So a better way to do this is
to just use an array of n bytes.

1239
01:04:07,380 --> 01:04:11,130
Because we know that on each
row there can only be one queen,

1240
01:04:11,130 --> 01:04:14,820
so we just need to store
the position of that queen.

1241
01:04:14,820 --> 01:04:17,232
So we have an array of n
bytes, one byte for each row,

1242
01:04:17,232 --> 01:04:19,440
and then you just use the
byte to store the position

1243
01:04:19,440 --> 01:04:20,630
of the queen in that row.
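
A minimal C sketch of the backtracking search using the array of n bytes might look like the following. The names `is_safe`, `place_row`, and `count_queens_bytes`, as well as the `MAX_N` limit, are assumptions for this example rather than the course's code.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_N 16   /* assumed upper bound on n for this sketch */

/* queen_col[r] holds the column of the queen already placed in row r.
   Returns 1 if a queen at (row, col) attacks none of the rows above it. */
int is_safe(const uint8_t queen_col[], int row, int col) {
    for (int r = 0; r < row; r++) {
        int c = queen_col[r];
        if (c == col) return 0;                  /* same column */
        if (abs(c - col) == row - r) return 0;   /* same diagonal */
    }
    return 1;
}

/* Try every column in this row; recurse on success, backtrack on return. */
long place_row(uint8_t queen_col[], int n, int row) {
    if (row == n) return 1;                      /* all n queens placed */
    long count = 0;
    for (int col = 0; col < n; col++) {
        if (is_safe(queen_col, row, col)) {
            queen_col[row] = (uint8_t)col;
            count += place_row(queen_col, n, row + 1);
        }
    }
    return count;
}

/* Count solutions for 1 <= n <= MAX_N. */
long count_queens_bytes(int n) {
    uint8_t queen_col[MAX_N];
    return place_row(queen_col, n, 0);
}
```

Note that each safety check here scans all earlier rows, which is the cost the bit vector representation avoids.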

1244
01:04:23,740 --> 01:04:25,490
It turns out, to
implement this algorithm,

1245
01:04:25,490 --> 01:04:27,740
there's an even more
compact representation,

1246
01:04:27,740 --> 01:04:32,360
which is to use three bit
vectors of size n, 2n minus 1,

1247
01:04:32,360 --> 01:04:35,380
and 2n minus 1.

1248
01:04:35,380 --> 01:04:37,080
So let's see how this works.

1249
01:04:37,080 --> 01:04:40,520
So the first bit vector we're
going to use is of length n.

1250
01:04:40,520 --> 01:04:43,450
We're going to call
this the down vector.

1251
01:04:43,450 --> 01:04:45,620
And the down vector
just stores a 1

1252
01:04:45,620 --> 01:04:48,800
in the columns that have a
queen in it and 0 in the columns

1253
01:04:48,800 --> 01:04:49,580
that are empty.

1254
01:04:53,300 --> 01:04:57,170
And then when we want to
check whether placing a queen

1255
01:04:57,170 --> 01:05:00,080
is safe in any
position, we first

1256
01:05:00,080 --> 01:05:02,210
have to check whether
that column is empty.

1257
01:05:02,210 --> 01:05:05,900
And you can do this
by ANDing the down bit

1258
01:05:05,900 --> 01:05:09,440
vector with 1 left-shifted by
c, where c is a column where

1259
01:05:09,440 --> 01:05:11,300
you want to place the queen.

1260
01:05:11,300 --> 01:05:13,100
And if that's
nonzero, that means

1261
01:05:13,100 --> 01:05:17,030
there's already a queen in that
column and you can't place it.

1262
01:05:17,030 --> 01:05:19,670
Otherwise, we're going to
have to do another check,

1263
01:05:19,670 --> 01:05:23,870
and we're going to create this
other bit vector called left.

1264
01:05:23,870 --> 01:05:27,941
The length of this bit
vector is 2n minus 1.

1265
01:05:27,941 --> 01:05:31,250
And it stores a 1
in the diagonal that

1266
01:05:31,250 --> 01:05:33,230
has a queen in it,
and 0's otherwise.

1267
01:05:33,230 --> 01:05:37,210
And there are 2n minus
1 possible diagonals.

1268
01:05:37,210 --> 01:05:38,960
And then now, when
we want to place

1269
01:05:38,960 --> 01:05:41,780
a queen in row r
and column c, we

1270
01:05:41,780 --> 01:05:47,090
can check whether it's safe
by doing left ANDed with 1

1271
01:05:47,090 --> 01:05:49,310
left-shifted by r plus c.

1272
01:05:49,310 --> 01:05:51,680
And this is going to be
nonzero if there is already

1273
01:05:51,680 --> 01:05:54,950
a queen in that
particular diagonal.

1274
01:05:54,950 --> 01:05:57,140
So in that case, we can't
place a queen there.

1275
01:05:57,140 --> 01:06:01,220
And, otherwise, we're going
to do a final check using

1276
01:06:01,220 --> 01:06:04,610
this right bit vector, which
is essentially the same

1277
01:06:04,610 --> 01:06:06,170
but we're looking
at the diagonals

1278
01:06:06,170 --> 01:06:08,960
going down to the right.

1279
01:06:08,960 --> 01:06:12,980
So, again, we have a 1 in the
diagonals that have a queen

1280
01:06:12,980 --> 01:06:14,700
and 0's otherwise.

1281
01:06:14,700 --> 01:06:17,960
And then now the check is
going to be right ANDed with 1

1282
01:06:17,960 --> 01:06:23,120
left-shifted by n
minus 1 minus r plus c.

1283
01:06:23,120 --> 01:06:25,850
And if a particular
candidate passes all three

1284
01:06:25,850 --> 01:06:28,310
of these checks, then
we know that there's not

1285
01:06:28,310 --> 01:06:30,560
going to be a conflict
and we can place the queen

1286
01:06:30,560 --> 01:06:34,020
in that particular position.

1287
01:06:34,020 --> 01:06:36,022
So this is a bit
vector representation.

1288
01:06:36,022 --> 01:06:37,730
You actually still
have to write the code

1289
01:06:37,730 --> 01:06:40,850
to count the number of
solutions using this bit vector

1290
01:06:40,850 --> 01:06:43,010
representation,
and it's actually

1291
01:06:43,010 --> 01:06:44,600
an interesting exercise.

1292
01:06:44,600 --> 01:06:48,440
So I encourage you to
try to do this at home.
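
For anyone who wants to check their own attempt, here is one possible C sketch that counts solutions with the down, left, and right bit vectors. It is not the official solution; the function names are hypothetical, and it assumes n is at most 16 so that 2n minus 1 bits fit in a 32-bit word.

```c
#include <stdint.h>

/* Count completions given queens already placed in rows 0..r-1.
   down:  bit c is set if column c holds a queen.
   left:  bit (r + c) is set if that diagonal holds a queen.
   right: bit (n - 1 - r + c) is set if that diagonal holds a queen. */
long solve(int n, int r, uint32_t down, uint32_t left, uint32_t right) {
    if (r == n) return 1;                        /* placed a queen in every row */
    long count = 0;
    for (int c = 0; c < n; c++) {
        uint32_t d  = 1u << c;
        uint32_t l  = 1u << (r + c);
        uint32_t rt = 1u << (n - 1 - r + c);
        /* The three checks from the lecture: any nonzero AND is a conflict. */
        if ((down & d) || (left & l) || (right & rt)) continue;
        /* Passing updated copies means backtracking needs no explicit undo. */
        count += solve(n, r + 1, down | d, left | l, right | rt);
    }
    return count;
}

/* Assumes 1 <= n <= 16. */
long count_queens_bits(int n) {
    return solve(n, 0, 0, 0, 0);
}
```

Each safety check is now three AND operations instead of a scan over earlier rows.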

1293
01:06:48,440 --> 01:06:51,010
But I just told you about the
bit vector representation.

1294
01:06:51,010 --> 01:06:52,284
So any questions?

1295
01:06:55,510 --> 01:06:56,010
Yes.

1296
01:06:56,010 --> 01:06:59,890
AUDIENCE: Could you just
repeat what the down vector bit

1297
01:06:59,890 --> 01:07:02,320
hack was for figuring
out [INAUDIBLE]?

1298
01:07:02,320 --> 01:07:03,290
JULIAN SHUN: Yes.

1299
01:07:03,290 --> 01:07:05,550
So the down vector,
it stores a 1

1300
01:07:05,550 --> 01:07:08,790
in the columns that have a
queen in it and 0's otherwise.

1301
01:07:08,790 --> 01:07:13,310
And what you do is, if you want
to place a queen in column c,

1302
01:07:13,310 --> 01:07:15,840
you first create the
mask 1 left-shifted by c.

1303
01:07:15,840 --> 01:07:17,762
And then you AND it
with a down vector.

1304
01:07:17,762 --> 01:07:19,470
And that's going to
be nonzero if there's

1305
01:07:19,470 --> 01:07:20,660
a queen in that column.

1306
01:07:27,030 --> 01:07:28,920
Any other questions?

1307
01:07:28,920 --> 01:07:29,430
Yes.

1308
01:07:29,430 --> 01:07:32,008
AUDIENCE: Why isn't
there a horizontal one?

1309
01:07:32,008 --> 01:07:34,050
JULIAN SHUN: So it turns
out that you don't need one.

1310
01:07:34,050 --> 01:07:38,880
Just these three checks
are enough to guarantee--

1311
01:07:38,880 --> 01:07:41,080
guarantee that you can
place a queen in a position

1312
01:07:41,080 --> 01:07:42,855
if it passes all
three of the checks.

1313
01:07:42,855 --> 01:07:43,355
Yes.

1314
01:07:43,355 --> 01:07:46,635
So a fourth check would
just be redundant.

1315
01:07:46,635 --> 01:07:48,427
AUDIENCE: So we don't
need a horizontal one

1316
01:07:48,427 --> 01:07:50,540
because we're not placing
two queens in the same row.

1317
01:07:50,540 --> 01:07:50,790
JULIAN SHUN: Yes.

1318
01:07:50,790 --> 01:07:51,360
That's true.

1319
01:07:51,360 --> 01:07:51,910
Good point.

1320
01:07:51,910 --> 01:07:52,410
Yes.

1321
01:07:52,410 --> 01:07:55,064
So we're only placing one
queen in each particular row.

1322
01:08:01,110 --> 01:08:04,680
So let's look at
another problem.

1323
01:08:04,680 --> 01:08:08,620
This is called population
count, or pop count for short.

1324
01:08:08,620 --> 01:08:10,980
And the problem here is we
want to count the number of 1

1325
01:08:10,980 --> 01:08:14,880
bits in some word x.

1326
01:08:14,880 --> 01:08:17,910
Here's a way to do this that
repeatedly eliminates the least

1327
01:08:17,910 --> 01:08:20,609
significant 1 bit in a word.

1328
01:08:20,609 --> 01:08:24,600
So we have this for loop
where r is initialized to 0.

1329
01:08:24,600 --> 01:08:28,560
And we're going to repeat
this loop until x becomes 0.

1330
01:08:28,560 --> 01:08:31,160
And then each time we go through
this loop, we increment r.

1331
01:08:31,160 --> 01:08:33,120
And inside the loop
we're going to set

1332
01:08:33,120 --> 01:08:37,410
x equal to x ANDed
with x minus 1.

1333
01:08:37,410 --> 01:08:41,910
And this is going to clear the
least significant 1 bit in x.

1334
01:08:41,910 --> 01:08:44,830
So let's look at an example.

1335
01:08:44,830 --> 01:08:47,990
So let's say we have
this value here for x.

1336
01:08:47,990 --> 01:08:51,729
Well, to get x minus 1, we
flip the rightmost 1 bit

1337
01:08:51,729 --> 01:08:53,348
in x from a 1 to 0.

1338
01:08:53,348 --> 01:08:55,890
And then we fill in all of the
bits to the right of that with

1339
01:08:55,890 --> 01:08:57,222
1's.

1340
01:08:57,222 --> 01:09:02,130
And then now when we AND
those two things together,

1341
01:09:02,130 --> 01:09:06,660
we're going to copy all of the
bits up to the rightmost 1.

1342
01:09:06,660 --> 01:09:09,000
And then for the rightmost
1, we're going to zero it out

1343
01:09:09,000 --> 01:09:10,260
because we're ANDing with a 0.

1344
01:09:10,260 --> 01:09:12,135
And then all of the bits
to the right of that

1345
01:09:12,135 --> 01:09:13,290
are still going to be 0.

1346
01:09:13,290 --> 01:09:16,050
So x ANDed with
x minus 1 is just

1347
01:09:16,050 --> 01:09:21,750
going to get rid of the
least significant 1 bit.

1348
01:09:21,750 --> 01:09:25,319
And then we repeat this
process until x becomes 0.

1349
01:09:25,319 --> 01:09:28,109
In that case we've already
eliminated all the 1's and we

1350
01:09:28,109 --> 01:09:30,630
know the answer,
which is stored in r.

1351
01:09:34,990 --> 01:09:35,649
Questions?

1352
01:09:41,580 --> 01:09:44,590
So this code will be pretty
fast if the number of 1 bits

1353
01:09:44,590 --> 01:09:47,590
is small, but the
running time is

1354
01:09:47,590 --> 01:09:50,450
proportional to the number
of 1 bits in a word.

1355
01:09:50,450 --> 01:09:53,600
So in the worst case, if most
of the bits are set to 1,

1356
01:09:53,600 --> 01:10:00,320
then you're going to need a lot
of iterations to run this code.

1357
01:10:00,320 --> 01:10:05,050
So let's look at a more
efficient way to do this.

1358
01:10:05,050 --> 01:10:07,930
This is to use table lookup.

1359
01:10:07,930 --> 01:10:12,970
So we're going to create
a table of size 256, which

1360
01:10:12,970 --> 01:10:16,330
stores for each 8-bit
word the number of 1's

1361
01:10:16,330 --> 01:10:17,890
in that 8-bit word.

1362
01:10:17,890 --> 01:10:23,260
So we have all possible 8-bit
words stored in this table.

1363
01:10:23,260 --> 01:10:27,400
And then now, to get the
number of 1 bits in x,

1364
01:10:27,400 --> 01:10:30,760
for every 8-bit
substring in x, we're

1365
01:10:30,760 --> 01:10:36,040
going to look it up in this
count table and add it to r.

1366
01:10:36,040 --> 01:10:38,170
And then we're going
to right-shift x by 8

1367
01:10:38,170 --> 01:10:39,550
so that we can
get the next word.

1368
01:10:39,550 --> 01:10:43,870
And then when x becomes
0, we know we're done.
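
A sketch of the table lookup approach, assuming a 64-bit word. The names count_table, init_count_table, and popcount_table are hypothetical, and how the table gets filled is an assumption, since the lecture only says that it stores the count for each 8-bit value.

```c
#include <stdint.h>

// count_table[i] holds the number of 1 bits in the 8-bit value i.
static uint8_t count_table[256];

// Fills the table once: the count for i is its lowest bit plus the
// count for i >> 1, which is already filled in because i >> 1 < i.
static void init_count_table(void) {
  count_table[0] = 0;
  for (int i = 1; i < 256; ++i) {
    count_table[i] = (uint8_t)((i & 1) + count_table[i >> 1]);
  }
}

// Sums the table entries for each 8-bit chunk of x, lowest chunk first.
// Requires init_count_table() to have been called.
static int popcount_table(uint64_t x) {
  int r = 0;
  for (; x != 0; x >>= 8) {
    r += count_table[x & 0xFF];
  }
  return r;
}
```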

1369
01:10:43,870 --> 01:10:45,390
So that's table lookup.

1370
01:10:45,390 --> 01:10:51,060
And the performance here
depends on the size of x.

1371
01:10:51,060 --> 01:10:53,910
If we have a 64-bit
word, we need

1372
01:10:53,910 --> 01:10:57,400
to do this at most eight times,
whereas in the initial code

1373
01:10:57,400 --> 01:11:03,180
we might have to do it 64
times if we had 64 1 bits.

1374
01:11:03,180 --> 01:11:06,300
The cost of this code is
bottlenecked by the memory

1375
01:11:06,300 --> 01:11:10,540
operations, because this table
here is stored in memory.

1376
01:11:10,540 --> 01:11:13,200
So every time you access
it you have to go to memory

1377
01:11:13,200 --> 01:11:15,910
to fetch the value there.

1378
01:11:15,910 --> 01:11:18,630
And here are some
approximate costs

1379
01:11:18,630 --> 01:11:22,600
for accessing memory in various
levels of the hierarchy.

1380
01:11:22,600 --> 01:11:24,870
If something's stored in
register, it's very fast.

1381
01:11:24,870 --> 01:11:27,240
It only takes you 1 cycle.

1382
01:11:27,240 --> 01:11:29,910
If it's stored in L1
cache, it's about 4 cycles,

1383
01:11:29,910 --> 01:11:34,230
L2 cache about 10 cycles,
L3 cache about 50 cycles.

1384
01:11:34,230 --> 01:11:37,260
And then, finally, if you have
to go to DRAM because it's not

1385
01:11:37,260 --> 01:11:40,620
in cache, it's much more
expensive, 150 cycles.

1386
01:11:40,620 --> 01:11:43,650
It's about two orders of
magnitude slower

1387
01:11:43,650 --> 01:11:45,802
than doing something--
fetching something that's

1388
01:11:45,802 --> 01:11:47,010
already stored in a register.

1389
01:11:49,620 --> 01:11:53,830
So let's now look at a third
way to do population count where

1390
01:11:53,830 --> 01:11:57,660
we don't actually have
to go to cache or DRAM.

1391
01:11:57,660 --> 01:12:01,140
Essentially, we can do
everything in registers.

1392
01:12:01,140 --> 01:12:03,640
So here's how you do it.

1393
01:12:03,640 --> 01:12:06,810
So we're going to create
these five masks--

1394
01:12:06,810 --> 01:12:10,860
or six masks, from M0 up to M5.

1395
01:12:10,860 --> 01:12:14,548
And these masks-- the
values of these masks

1396
01:12:14,548 --> 01:12:15,840
are shown in the comments here.

1397
01:12:15,840 --> 01:12:18,360
In this notation
here, x to the k

1398
01:12:18,360 --> 01:12:20,850
just means x repeated k times.

1399
01:12:20,850 --> 01:12:26,430
So the mask M5 has 32
0's, followed by 32 1's.

1400
01:12:26,430 --> 01:12:31,440
The mask M0 has the bit
string 01 repeated 32 times,

1401
01:12:31,440 --> 01:12:32,030
and so on.

1402
01:12:35,003 --> 01:12:36,420
After we create
these masks, we're

1403
01:12:36,420 --> 01:12:40,030
going to execute these six
instructions at the bottom,

1404
01:12:40,030 --> 01:12:44,850
and this is going to give us
the number of 1's in the word.

1405
01:12:44,850 --> 01:12:49,150
So let's do an example
to see how this works.

1406
01:12:49,150 --> 01:12:51,270
So let's say we start
with this bit string here.

1407
01:12:54,750 --> 01:12:56,640
In the first step,
what we're going to do

1408
01:12:56,640 --> 01:12:59,820
is we're going to AND
x with the mask M0.

1409
01:12:59,820 --> 01:13:01,830
And then we're also going
to AND x right-shifted

1410
01:13:01,830 --> 01:13:04,650
by 1 with the mask M0.

1411
01:13:04,650 --> 01:13:12,390
And recall that the mask M0
is just 01 repeated 32 times,

1412
01:13:12,390 --> 01:13:15,440
and therefore the mask is
essentially extracting all

1413
01:13:15,440 --> 01:13:16,710
of the even bits.

1414
01:13:16,710 --> 01:13:21,180
So x ANDed with M0 gives
us all of the even bits.

1415
01:13:21,180 --> 01:13:24,030
And then when we right-shift
x by 1 and AND it with M0,

1416
01:13:24,030 --> 01:13:26,490
that's going to give
us all the odd bits.

1417
01:13:26,490 --> 01:13:28,650
And then we're going to
line those two things up

1418
01:13:28,650 --> 01:13:31,290
and add them together.

1419
01:13:31,290 --> 01:13:33,090
And the result of
doing this is it's

1420
01:13:33,090 --> 01:13:37,410
going to tell us for every
group of two bits the number

1421
01:13:37,410 --> 01:13:39,870
of 1 bits in that group.

1422
01:13:39,870 --> 01:13:42,540
So now for each of
these pairs of bits,

1423
01:13:42,540 --> 01:13:44,920
it's telling us how
many of them are 1.

1424
01:13:44,920 --> 01:13:48,270
So in the leftmost group
here, we add two 1's.

1425
01:13:48,270 --> 01:13:52,440
So the result of adding 1
and 1 is 1 0, which is 2.

1426
01:13:52,440 --> 01:13:56,700
For the rightmost group, we have
two 0's, and the count there is

1427
01:13:56,700 --> 01:13:57,810
00.

1428
01:13:57,810 --> 01:14:02,320
And this is the same for
all of the other groups.

1429
01:14:02,320 --> 01:14:09,900
So this gives us the number of
1's in every pair of positions.

1430
01:14:09,900 --> 01:14:15,090
Now we're going to AND
the result with M1.

1431
01:14:15,090 --> 01:14:19,050
And we're going to right-shift
it by 2 and also AND it with M1

1432
01:14:19,050 --> 01:14:20,475
and add those two
things together.

1433
01:14:23,430 --> 01:14:27,120
And M1 is a mask that will
give us the bottom two bits

1434
01:14:27,120 --> 01:14:30,040
in every group of four bits.

1435
01:14:30,040 --> 01:14:31,860
So when we right-shift
x by 2, that's

1436
01:14:31,860 --> 01:14:33,120
giving us the top two bits.

1437
01:14:33,120 --> 01:14:35,040
And then now we
add those together,

1438
01:14:35,040 --> 01:14:38,520
and it will give us the
count of the number of 1

1439
01:14:38,520 --> 01:14:41,340
bits in every group of size 4.

1440
01:14:41,340 --> 01:14:45,800
And these counts are stored
in the result here now.

1441
01:14:45,800 --> 01:14:47,610
So you can verify that
each of these groups

1442
01:14:47,610 --> 01:14:50,250
has the count of the
number of 1 bits.

1443
01:14:50,250 --> 01:14:54,540
So, for example,
we have 100 here.

1444
01:14:54,540 --> 01:14:57,240
And this is correct since
there are four 1 bits.

1445
01:14:59,920 --> 01:15:03,640
Now we do this again
with the mask M2.

1446
01:15:03,640 --> 01:15:05,530
That's going to
give us the counts

1447
01:15:05,530 --> 01:15:09,180
for all groups of size 8.

1448
01:15:09,180 --> 01:15:12,490
Then we go to groups of size 16.

1449
01:15:12,490 --> 01:15:16,950
And then, finally, we
add these two together,

1450
01:15:16,950 --> 01:15:21,350
giving us the number of 1 bits
in this group of size 32.

1451
01:15:21,350 --> 01:15:22,940
And this is actually
the pop count.

1452
01:15:22,940 --> 01:15:25,880
So the value here is 17.

1453
01:15:25,880 --> 01:15:29,030
And you can verify that
there are indeed 17 1's

1454
01:15:29,030 --> 01:15:31,340
in the input word x.

1455
01:15:34,220 --> 01:15:35,660
Any questions?

1456
01:15:41,430 --> 01:15:44,160
So the performance
of this code, which

1457
01:15:44,160 --> 01:15:46,260
is based on parallel
divide and conquer,

1458
01:15:46,260 --> 01:15:49,740
is going to be proportional
to log base 2 of w,

1459
01:15:49,740 --> 01:15:51,510
where w is the word length.

1460
01:15:51,510 --> 01:15:56,670
Because on every step I'm
doubling the size of my groups.

1461
01:15:56,670 --> 01:16:00,680
And after I do this log base 2
w times, I have the whole group.

1462
01:16:04,680 --> 01:16:09,730
In the first two instructions
that I executed here,

1463
01:16:09,730 --> 01:16:14,200
I have to actually
do the AND separately

1464
01:16:14,200 --> 01:16:17,890
for x right-shifted by 1 and x,
and also x right-shifted by 2

1465
01:16:17,890 --> 01:16:20,630
and x, and then
add them together,

1466
01:16:20,630 --> 01:16:23,750
because there is
an overflow issue.

1467
01:16:23,750 --> 01:16:26,680
The overflow issue is that
the size of the groups

1468
01:16:26,680 --> 01:16:30,700
here might not be large
enough to actually store

1469
01:16:30,700 --> 01:16:33,940
the count of the number
of 1 bits in that group.

1470
01:16:33,940 --> 01:16:35,530
But once I get to
the larger groups,

1471
01:16:35,530 --> 01:16:38,530
the count can always be
stored in a group of that size

1472
01:16:38,530 --> 01:16:42,010
and I don't need to
worry about overflow.

1473
01:16:42,010 --> 01:16:44,140
So for the last four
lines, I can actually

1474
01:16:44,140 --> 01:16:46,240
save one instruction.

1475
01:16:46,240 --> 01:16:47,810
I don't need to
do the AND twice.
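
A sketch of the parallel divide-and-conquer pop count for a 64-bit word, with the masks written out as hexadecimal constants. The function name popcount_parallel is hypothetical. The first two steps mask before adding, and the last four add first and mask once, which is the one-instruction saving just mentioned.

```c
#include <stdint.h>

// Counts the 1 bits in x in six steps, doubling the group size each step.
static int popcount_parallel(uint64_t x) {
  const uint64_t M0 = UINT64_C(0x5555555555555555);  // (01)^32
  const uint64_t M1 = UINT64_C(0x3333333333333333);  // (0011)^16
  const uint64_t M2 = UINT64_C(0x0F0F0F0F0F0F0F0F);  // (0^4 1^4)^8
  const uint64_t M3 = UINT64_C(0x00FF00FF00FF00FF);  // (0^8 1^8)^4
  const uint64_t M4 = UINT64_C(0x0000FFFF0000FFFF);  // (0^16 1^16)^2
  const uint64_t M5 = UINT64_C(0x00000000FFFFFFFF);  // 0^32 1^32

  // Groups of 2 and 4: mask each operand first so a sum cannot
  // spill into the neighboring group.
  x = ((x >> 1) & M0) + (x & M0);
  x = ((x >> 2) & M1) + (x & M1);
  // Groups of 8 and up: each count fits in half of its group,
  // so it is safe to add first and mask once.
  x = ((x >> 4) + x) & M2;
  x = ((x >> 8) + x) & M3;
  x = ((x >> 16) + x) & M4;
  x = ((x >> 32) + x) & M5;
  return (int)x;
}
```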

1476
01:16:55,920 --> 01:16:58,730
So it turns out that most
modern machines nowadays

1477
01:16:58,730 --> 01:17:01,490
have an intrinsic pop count
instruction implemented

1478
01:17:01,490 --> 01:17:03,980
in hardware, which is
faster than anything

1479
01:17:03,980 --> 01:17:05,840
you can code yourself.

1480
01:17:05,840 --> 01:17:08,700
And you can access this
pop count instruction

1481
01:17:08,700 --> 01:17:13,280
via compiler intrinsics,
for example in GCC or Clang.

1482
01:17:13,280 --> 01:17:17,240
And in GCC, it's
__builtin_popcount.

1483
01:17:20,860 --> 01:17:24,740
One warning though is that
if you write this code using

1484
01:17:24,740 --> 01:17:27,710
these intrinsics, if
you try to compile

1485
01:17:27,710 --> 01:17:29,690
the code on a machine
that doesn't support it,

1486
01:17:29,690 --> 01:17:31,190
your code isn't
going to compile.

1487
01:17:31,190 --> 01:17:33,500
So it makes your
code less portable.

1488
01:17:33,500 --> 01:17:37,250
But this intrinsic is faster
than the parallel divide

1489
01:17:37,250 --> 01:17:38,150
and conquer version.
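
A small sketch of calling the intrinsic, assuming GCC or Clang. __builtin_popcount takes an unsigned int, and the variant __builtin_popcountll takes an unsigned long long, which is the one used here for a 64-bit word. The wrapper name popcount_builtin is hypothetical.

```c
#include <stdint.h>

// Thin wrapper around the GCC/Clang intrinsic for 64-bit words.
static int popcount_builtin(uint64_t x) {
  return __builtin_popcountll(x);
}
```

Keeping the intrinsic behind one wrapper like this means only a single function has to change if the code is later built with a compiler that lacks it.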

1490
01:17:40,830 --> 01:17:43,010
So one question is, how
can you get the log base

1491
01:17:43,010 --> 01:17:46,190
2 of a power of 2 quickly
using a pop count instruction?

1492
01:17:46,190 --> 01:17:48,796
So instead of using the
de Bruijn sequence trick.

1493
01:17:52,010 --> 01:17:52,510
Yes.

1494
01:17:52,510 --> 01:17:54,772
AUDIENCE: You decrement
then you pop count.

1495
01:17:54,772 --> 01:17:55,480
JULIAN SHUN: Yes.

1496
01:17:55,480 --> 01:17:59,800
So what you do is you subtract
1 from the power of 2,

1497
01:17:59,800 --> 01:18:03,010
and that's going to flood all
of the lower bits with 1's.

1498
01:18:03,010 --> 01:18:04,810
And then now when you
execute pop count,

1499
01:18:04,810 --> 01:18:07,660
it's going to count the number
of 1's, and that gives us

1500
01:18:07,660 --> 01:18:09,460
the log base 2 of
the power of 2.

1501
01:18:09,460 --> 01:18:10,450
So good answer.
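
The answer above can be sketched as follows, assuming GCC or Clang and an input that is an exact power of 2. The function name log2_pow2 is hypothetical.

```c
#include <stdint.h>

// Returns log base 2 of x, where x must be a power of 2 (x = 2^k).
// x - 1 has exactly k 1 bits, so its pop count is k.
static int log2_pow2(uint64_t x) {
  return __builtin_popcountll(x - 1);
}
```

For example, 8 is 1000 in binary, 8 - 1 is 0111, and the pop count of 0111 is 3.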

1502
01:18:13,458 --> 01:18:15,000
So those are all the
bit tricks I'm going

1503
01:18:15,000 --> 01:18:17,130
to be talking about today.

1504
01:18:17,130 --> 01:18:19,050
There's a lot of
resources online if you're

1505
01:18:19,050 --> 01:18:21,150
interested in learning more.

1506
01:18:21,150 --> 01:18:23,670
There's this really
good website maintained

1507
01:18:23,670 --> 01:18:26,760
by Sean Eron Anderson.

1508
01:18:26,760 --> 01:18:28,800
There's also
Knuth's textbook, which

1509
01:18:28,800 --> 01:18:30,480
has some bit tricks in there.

1510
01:18:30,480 --> 01:18:32,670
There's a chess
programming website which

1511
01:18:32,670 --> 01:18:34,650
has a lot of cool bit tricks.

1512
01:18:34,650 --> 01:18:37,200
Some of those are used in
implementing chess programs.

1513
01:18:37,200 --> 01:18:39,568
And then, finally, this book
called Hacker's Delight.

1514
01:18:39,568 --> 01:18:41,610
So we'll be playing around
with many of these bit

1515
01:18:41,610 --> 01:18:45,710
tricks in project 1,
so happy bit hacking.