1
00:00:09 --> 00:00:10
Hashing.
2
00:00:10 --> 00:00:15
3
00:00:15 --> 00:00:19
Today we're going to do some
amazing stuff with hashing.
4
00:00:19 --> 00:00:21
And, really,
this is such neat stuff,
5
00:00:21 --> 00:00:24
it's amazing.
We're going to start by
6
00:00:24 --> 00:00:28
addressing a fundamental
weakness of hashing.
7
00:00:28 --> 00:00:34
8
00:00:34 --> 00:00:37
And that is that for any choice
of hash function
9
00:00:37 --> 00:00:49
10
00:00:49 --> 00:01:04
There exists a bad set of keys
that all hash to the same slot.
11
00:01:04 --> 00:01:09
12
00:01:09 --> 00:01:11
OK.
So you pick a hash function.
13
00:01:11 --> 00:01:15
We looked at some that seem to
work well in practice,
14
00:01:15 --> 00:01:18
that are easy to put into your
code.
15
00:01:18 --> 00:01:23
But whichever one you pick,
there's always some bad set of
16
00:01:23 --> 00:01:25
keys.
So you can imagine,
17
00:01:25 --> 00:01:30
just to drive this point home a
little bit.
18
00:01:30 --> 00:01:35
Imagine that you're building a
compiler for a customer and you
19
00:01:35 --> 00:01:40
have a symbol table in your
compiler and one of the things
20
00:01:40 --> 00:01:46
that the customer is demanding
is that compilations go fast.
21
00:01:46 --> 00:01:50
They don't want to sit around
waiting for compilations.
22
00:01:50 --> 00:01:56
And you have a competitor who's
also building a compiler and
23
00:01:56 --> 00:02:01
they're going to test the
compiler, both of your compilers
24
00:02:01 --> 00:02:07
and sort of have a run-off.
And one of the things in the
25
00:02:07 --> 00:02:12
test that they're going to allow
you to do is not only will the
26
00:02:12 --> 00:02:16
customer run his own benchmarks,
but he'll let you make up
27
00:02:16 --> 00:02:20
benchmarks for the other
program, for your competitor.
28
00:02:20 --> 00:02:24
And your competitor gets to
make up benchmarks for you.
29
00:02:24 --> 00:02:28
So and not only that,
but you're actually sharing
30
00:02:28 --> 00:02:32
code.
So you get to look at what the
31
00:02:32 --> 00:02:37
competitor is actually doing and
what hash function they're
32
00:02:37 --> 00:02:40
actually using.
So it's pretty clear that in
33
00:02:40 --> 00:02:44
this circumstance,
you have an adversary who is
34
00:02:44 --> 00:02:49
going to look at whatever hash
function you have and figure out
35
00:02:49 --> 00:02:53
OK, what's a set of variable
names and so forth that are
36
00:02:53 --> 00:02:58
going to all hash to the same
slot so that essentially you're
37
00:02:58 --> 00:03:03
just chasing through a linked
list whenever it comes to
38
00:03:03 --> 00:03:07
looking something up.
Slowing down your program
39
00:03:07 --> 00:03:12
enormously compared to if in
fact they got distributed nicely
40
00:03:12 --> 00:03:15
across the hash table which is,
what after all,
41
00:03:15 --> 00:03:19
you have a hash table in there
to do in the first place.
42
00:03:19 --> 00:03:22
And so the question is,
how do you defeat this
43
00:03:22 --> 00:03:26
adversary?
And the answer is one word.
44
00:03:26 --> 00:03:31
45
00:03:31 --> 00:03:33
One word.
How do you achieve?
46
00:03:33 --> 00:03:37
How do you defeat any adversary
in this class?
47
00:03:37 --> 00:03:38
Randomness.
OK.
48
00:03:38 --> 00:03:39
Randomness.
OK.
49
00:03:39 --> 00:03:42
You make it so that he can't
guess.
50
00:03:42 --> 00:03:47
And the idea is that you choose
a hash function at random.
51
00:03:47 --> 00:03:50
Independent.
So he can look at the code,
52
00:03:50 --> 00:03:55
but when it actually runs,
it's going to use a random hash
53
00:03:55 --> 00:04:00
function that he has no way of
predicting what the hash
54
00:04:00 --> 00:04:05
function is that will actually
be used.
55
00:04:05 --> 00:04:07
OK.
So that's the game and that way
56
00:04:07 --> 00:04:11
he can provide an input,
but he can't provide an input
57
00:04:11 --> 00:04:15
that's guaranteed to force you
to run slowly.
58
00:04:15 --> 00:04:19
You might get unlucky in your
choice of hash function,
59
00:04:19 --> 00:04:23
but it's not going to be
because of the adversary.
60
00:04:23 --> 00:04:28
So the idea is to choose a hash
function --
61
00:04:28 --> 00:04:34
62
00:04:34 --> 00:04:38
-- at random,
independently from the keys
63
00:04:38 --> 00:04:42
that are going to
be fed to it.
64
00:04:42 --> 00:04:47
So even if your adversary can
see your code,
65
00:04:47 --> 00:04:53
he can't tell which hash
function is going to be actually
66
00:04:53 --> 00:04:58
used at run time.
Doesn't get to see the output
67
00:04:58 --> 00:05:04
of the random numbers.
And so it turns out you can
68
00:05:04 --> 00:05:11
make this scheme work and the
name of the scheme is universal
69
00:05:11 --> 00:05:17
hashing, OK, is one way of
making this scheme work.
70
00:05:17 --> 00:05:22
71
00:05:22 --> 00:05:34
So let's do some math.
So let U be a universe of keys.
72
00:05:34 --> 00:05:41
And let H be a finite
collection --
73
00:05:41 --> 00:05:48
74
00:05:48 --> 00:05:49
-- of hash functions --
75
00:05:49 --> 00:05:56
76
00:05:56 --> 00:06:04
-- mapping U to what are going
to be the slots in our hash
77
00:06:04 --> 00:06:06
table.
OK.
78
00:06:06 --> 00:06:11
So we just have H as some
finite collection.
79
00:06:11 --> 00:06:15
We say that H is universal --
80
00:06:15 --> 00:06:22
81
00:06:22 --> 00:06:30
-- if for all pairs of the
keys, distinct keys --
82
00:06:30 --> 00:06:36
83
00:06:36 --> 00:06:41
-- so the keys are distinct,
the following is true.
84
00:06:41 --> 00:07:03
85
00:07:03 --> 00:07:08
So if the set of keys,
if for any pair of keys I pick,
86
00:07:08 --> 00:07:15
the number of hash functions
that hash those two keys to the
87
00:07:15 --> 00:07:21
same place is a one over m
fraction of the total set of
88
00:07:21 --> 00:07:23
hash functions.
So let m just,
89
00:07:23 --> 00:07:28
so to view that,
another way of viewing that is
90
00:07:28 --> 00:07:33
if h is chosen randomly --
91
00:07:33 --> 00:07:39
92
00:07:39 --> 00:07:51
-- from the set of hash functions H,
the probability of collision
93
00:07:51 --> 00:07:58
between x and y is what?
94
00:07:58 --> 00:08:12
95
00:08:12 --> 00:08:17
What's the probability if the
fraction of hash functions,
96
00:08:17 --> 00:08:22
OK, if the number of such hash
functions is the size of H over m,
97
00:08:22 --> 00:08:27
what's the probability of a
collision between x and y?
98
00:08:27 --> 00:08:32
If I pick a hash function at
random.
99
00:08:32 --> 00:08:39
So I pick a hash function at
random, what's the odds they
100
00:08:39 --> 00:08:42
collide?
One over m.
101
00:08:42 --> 00:08:49
Now let's draw a picture for
that, help people see that
102
00:08:49 --> 00:08:56
that's in fact the case.
So imagine this is our set of
103
00:08:56 --> 00:09:00
all hash functions.
OK.
104
00:09:00 --> 00:09:08
And then if I pick a particular
x and y, let's say that this is
105
00:09:08 --> 00:09:16
the set of hash functions such
that H of x is equal to H of y.
106
00:09:16 --> 00:09:23
And so what we're saying is
that the cardinality of that set
107
00:09:23 --> 00:09:30
is one over m times the
cardinality of H.
108
00:09:30 --> 00:09:33
So if I throw a dart and pick
one hash function at random,
109
00:09:33 --> 00:09:37
the odds are one in m that the
hash function falls into this
110
00:09:37 --> 00:09:39
particular set.
And of course,
111
00:09:39 --> 00:09:43
this has to be true of every x
and y that I can pick.
112
00:09:43 --> 00:09:45
Of course, it will be a
different set,
113
00:09:45 --> 00:09:49
a different x and y will
somehow map the hash functions
114
00:09:49 --> 00:09:52
differently, but the odds that
for any x and y that I pick,
115
00:09:52 --> 00:09:55
the odds that if I have a
random hash function,
116
00:09:55 --> 00:10:00
it hashes it to the same place,
is one over m.
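To make the definition concrete, here is a brute-force check of the universality condition. It uses the dot-product family that gets constructed later in this lecture, with small assumed parameters m = 3 and r = 1 (so keys have two base-3 digits): for every pair of distinct keys, exactly |H|/m of the functions collide on them.

```python
from itertools import product

m, r = 3, 1                        # m prime; keys have r + 1 = 2 base-m digits
keys = range(m ** (r + 1))         # universe U = {0, ..., m^(r+1) - 1}

def digits(k):
    # base-m digits of k, low-order digit first
    return [(k // m ** i) % m for i in range(r + 1)]

def h(a, k):
    # h_a(k) = (a dotted with the digit vector of k) mod m
    return sum(a_i * k_i for a_i, k_i in zip(a, digits(k))) % m

family = list(product(range(m), repeat=r + 1))   # |H| = m^(r+1) functions

# Universality: every distinct pair collides under exactly |H| / m functions.
for x in keys:
    for y in keys:
        if x != y:
            colliding = sum(1 for a in family if h(a, x) == h(a, y))
            assert colliding == len(family) // m   # here: 9 / 3 = 3
```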
117
00:10:00 --> 00:10:03
Now this is a little bit hard
sometimes for people to get
118
00:10:03 --> 00:10:07
their head around because we're
used to thinking of perhaps
119
00:10:07 --> 00:10:09
picking keys at random or
something.
120
00:10:09 --> 00:10:11
OK, that's not what's going on
here.
121
00:10:11 --> 00:10:14
We're picking hash functions at
random.
122
00:10:14 --> 00:10:18
So our probability space is
defined over the hash functions,
123
00:10:18 --> 00:10:21
not over the keys.
And this has to be true now for
124
00:10:21 --> 00:10:24
any particular two keys that I
pick that are distinct.
125
00:10:24 --> 00:10:28
That the places that they hash,
this set of hash functions,
126
00:10:28 --> 00:10:34
I mean this is like a marvelous
property if you think about it.
127
00:10:34 --> 00:10:39
OK, that you can actually find
ones where no matter what two
128
00:10:39 --> 00:10:43
elements I pick,
the odds are exactly one in m
129
00:10:43 --> 00:10:48
that a random hash function from
this set is going to hash them
130
00:10:48 --> 00:10:51
to the same place.
So very neat.
131
00:10:51 --> 00:10:56
Very, very neat property and
we'll see the mathematics
132
00:10:56 --> 00:11:00
associated with this is very
cool.
133
00:11:00 --> 00:11:14
So our theorem is that if we
choose h randomly from the set
134
00:11:14 --> 00:11:25
of hash functions H,
and then we suppose we're
135
00:11:25 --> 00:11:37
hashing n keys into m slots in
Table T --
136
00:11:37 --> 00:11:44
137
00:11:44 --> 00:11:46
-- then for given key x --
138
00:11:46 --> 00:11:52
139
00:11:52 --> 00:11:56
-- the expected number of
collisions with x --
140
00:11:56 --> 00:12:03
141
00:12:03 --> 00:12:12
-- is less than n over m.
And who remembers what we call
142
00:12:12 --> 00:12:16
n over m?
Alpha, which is the,
143
00:12:16 --> 00:12:22
what's the term that we use
there?
144
00:12:22 --> 00:12:30
Load factor.
The load factor of the table.
145
00:12:30 --> 00:12:36
OK, load factor alpha.
So the average number of keys
146
00:12:36 --> 00:12:42
per slot is the load factor of
the table.
147
00:12:42 --> 00:12:48
So we're saying,
so what is this theorem saying?
148
00:12:48 --> 00:12:55
It's saying that in fact,
if we have one of these
149
00:12:55 --> 00:13:02
universal sets of hash
functions, then things perform
150
00:13:02 --> 00:13:10
exactly the way we want them to.
Things get distributed evenly.
151
00:13:10 --> 00:13:15
The number of things that are
going to collide with any
152
00:13:15 --> 00:13:19
particular key that I pick is
going to be n over m.
153
00:13:19 --> 00:13:22
So that's a really good
property to have.
154
00:13:22 --> 00:13:27
Now I haven't shown you,
the construction of U is going,
155
00:13:27 --> 00:13:31
sorry, of the set of hash
functions H, that that
156
00:13:31 --> 00:13:36
construction will take us a
little bit of effort.
157
00:13:36 --> 00:13:39
But first I want to show you
why this is such a great
158
00:13:39 --> 00:13:42
property.
Basically it's this theorem.
159
00:13:42 --> 00:13:46
So let's prove this theorem.
So any questions about what the
160
00:13:46 --> 00:13:50
statement of the theorem is?
So we're going to go actually
161
00:13:50 --> 00:13:54
kind of fast today.
We've got a lot of good stuff
162
00:13:54 --> 00:13:57
today.
So I want to make sure people
163
00:13:57 --> 00:14:03
are onboard as we go through.
So if there are any questions,
164
00:14:03 --> 00:14:07
make sure, you know,
statement of theorem of
165
00:14:07 --> 00:14:13
whatever, best to get them out
early so that you're not
166
00:14:13 --> 00:14:19
confused later on when the going
gets a little more exciting.
167
00:14:19 --> 00:14:21
OK?
OK, good.
168
00:14:21 --> 00:14:26
So to prove this,
let's let C sub x be the random
169
00:14:26 --> 00:14:33
variable denoting the total
number of collisions --
170
00:14:33 --> 00:14:38
171
00:14:38 --> 00:14:44
-- of keys in T with x.
So this is a total number and
172
00:14:44 --> 00:14:51
one of the techniques that you
use a lot in probabilistic
173
00:14:51 --> 00:14:57
analysis of randomized
algorithms is recognizing that C
174
00:14:57 --> 00:15:05
of x is in fact a sum of
indicator random variables.
175
00:15:05 --> 00:15:11
If you can decompose things
into indicator random variables,
176
00:15:11 --> 00:15:17
the analysis goes much more
easily than if you're left with
177
00:15:17 --> 00:15:22
aggregate variables.
So here we're going to let our
178
00:15:22 --> 00:15:27
indicator random variable be
little c_xy,
179
00:15:27 --> 00:15:32
which is going to be one if h
of x equals h of y and 0
180
00:15:32 --> 00:15:35
otherwise.
181
00:15:35 --> 00:15:40
182
00:15:40 --> 00:15:49
And so we can note two things.
First, what is the expectation
183
00:15:49 --> 00:15:52
of c_xy?
184
00:15:52 --> 00:15:57
185
00:15:57 --> 00:16:00
OK, if I have a process which
is picking a hash function at
186
00:16:00 --> 00:16:04
random, what's the expectation
of c_xy?
187
00:16:04 --> 00:16:07
One over m.
Because that's basically this
188
00:16:07 --> 00:16:11
definition here.
Now in other words I pick a
189
00:16:11 --> 00:16:16
hash function at random,
what's the odds that the hash
190
00:16:16 --> 00:16:19
is the same?
It's one over m.
191
00:16:19 --> 00:16:24
And then the other thing is,
and the reason we pick this
192
00:16:24 --> 00:16:28
thing is that I can express
capital C sub x,
193
00:16:28 --> 00:16:33
the random variable denoting
the total number of collisions
194
00:16:33 --> 00:16:39
as being just the sum over all
the keys in the table except x
195
00:16:39 --> 00:16:46
of c_xy.
So for each one that would
196
00:16:46 --> 00:16:53
cause me a collision,
with x, I add one and if it
197
00:16:53 --> 00:17:00
wouldn't cause me a collision,
I add 0.
198
00:17:00 --> 00:17:06
And that adds up all of the
collisions that I would have in
199
00:17:06 --> 00:17:09
the table with x.
200
00:17:09 --> 00:17:17
201
00:17:17 --> 00:17:20
Is there any questions so far?
Because this is the set-up.
202
00:17:20 --> 00:17:24
The set-up in most of these
things, the set-up is where most
203
00:17:24 --> 00:17:27
students make mistakes and most
practicing researchers make
204
00:17:27 --> 00:17:30
mistakes as well,
let me tell you.
205
00:17:30 --> 00:17:32
And then once you get the
set-up right,
206
00:17:32 --> 00:17:36
then working out the math is
fine, but it's often that set-up
207
00:17:36 --> 00:17:40
of how do you actually translate
the situation into the math.
208
00:17:40 --> 00:17:43
That's the hard part.
Once you get that right,
209
00:17:43 --> 00:17:46
well, then, algebra,
we can all do algebra.
210
00:17:46 --> 00:17:49
Of course, we can also all make
mistakes doing algebra,
211
00:17:49 --> 00:17:53
but at least those mistakes are
much more easy to check than the
212
00:17:53 --> 00:17:57
one that does the translation.
So I want to make sure people
213
00:17:57 --> 00:18:00
are sort of understanding of how
that's set up.
214
00:18:00 --> 00:18:05
So now we just have to use our
math skills.
215
00:18:05 --> 00:18:12
So the expectation then of the
number of collisions is the
216
00:18:12 --> 00:18:18
expectation of C sub x and
that's just the expectation of
217
00:18:18 --> 00:18:26
just plugging in the sum over y in T
minus the element x of c_xy.
218
00:18:26 --> 00:18:33
So that's just definition.
And that's equal to the sum of
219
00:18:33 --> 00:18:39
over y in T minus x of the expectation
of c_xy.
220
00:18:39 --> 00:18:44
So why is that?
Yeah, that's linearity.
221
00:18:44 --> 00:18:52
222
00:18:52 --> 00:18:56
Linearity of expectation,
doesn't require independence.
223
00:18:56 --> 00:19:00
It's true of all random
variables.
224
00:19:00 --> 00:19:07
And that's equal to,
and now the math gets easier.
225
00:19:07 --> 00:19:10
So what is that?
One over m.
226
00:19:10 --> 00:19:16
That makes the summation easy
to evaluate.
227
00:19:16 --> 00:19:22
That's just n minus one over m.
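The derivation can be checked exactly in code by averaging C_x over every function in the dot-product family constructed later in the lecture. The parameters m = 5, r = 1 and the particular table of keys below are made-up examples; the average comes out to exactly (n - 1)/m, matching the algebra.

```python
from fractions import Fraction
from itertools import product

m, r = 5, 1                              # m prime; keys have two base-5 digits

def h(a, k):
    # dot-product hash of the base-m digits of k with the vector a
    return sum(a_i * ((k // m ** i) % m) for i, a_i in enumerate(a)) % m

family = list(product(range(m), repeat=r + 1))   # all m^(r+1) = 25 functions

table = [3, 7, 11, 19, 22, 24]           # n = 6 distinct keys (assumed example)
x = table[0]                             # the key whose collisions we count

# Average C_x over a uniformly random h drawn from the family.
expected = Fraction(
    sum(sum(1 for y in table if y != x and h(a, y) == h(a, x)) for a in family),
    len(family),
)
n = len(table)
assert expected == Fraction(n - 1, m)    # exactly (n - 1)/m, which is < n/m
```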
228
00:19:22 --> 00:19:30
229
00:19:30 --> 00:19:35
So fairly simple analysis and
shows you why we would love to
230
00:19:35 --> 00:19:41
have one of these sets of
universal hash functions because
231
00:19:41 --> 00:19:45
if you have them,
then they behave exactly the
232
00:19:45 --> 00:19:51
way you would want it to behave.
And you defeat your adversary
233
00:19:51 --> 00:19:55
by just picking up the hash
function at random.
234
00:19:55 --> 00:20:00
There's nothing he can do.
Or she.
235
00:20:00 --> 00:20:02
OK, any questions about that
proof?
236
00:20:02 --> 00:20:04
OK, now we get into the fun
math.
237
00:20:04 --> 00:20:07
Constructing one of these
babies.
238
00:20:07 --> 00:20:08
OK.
239
00:20:08 --> 00:20:20
240
00:20:20 --> 00:20:23
This is not the only
construction.
241
00:20:23 --> 00:20:31
This is a construction of a
classic universal hash function.
242
00:20:31 --> 00:20:37
And there are other
constructions in the literature
243
00:20:37 --> 00:20:42
and I think there's one on the
practice quiz.
244
00:20:42 --> 00:20:47
So let's see.
So this one works when m is
245
00:20:47 --> 00:20:51
prime.
So it works when the set of
246
00:20:51 --> 00:20:57
slots is a prime number.
Number of slots is a prime
247
00:20:57 --> 00:21:05
number.
So the idea here is we're going
248
00:21:05 --> 00:21:16
to decompose any key k in our
universe into r plus 1 digits.
249
00:21:16 --> 00:21:25
So k, we're going to look at as
being k_0, k_1, up to
250
00:21:25 --> 00:21:33
k_r, where 0 is less than or
equal to k sub i,
251
00:21:33 --> 00:21:41
is less than or equal to m
minus one.
252
00:21:41 --> 00:21:47
So the idea is in some sense
we're looking at what the
253
00:21:47 --> 00:21:52
representation would be of k
base m.
254
00:21:52 --> 00:21:58
So if it were base two,
it would be just one bit at a
255
00:21:58 --> 00:22:01
time.
These would just be the bits.
256
00:22:01 --> 00:22:05
I'm not going to do base two.
We're going to do base m in
257
00:22:05 --> 00:22:09
general and so each of these
represents one of the digits.
258
00:22:09 --> 00:22:13
And the way I've done it is
I've done low order digit first.
259
00:22:13 --> 00:22:16
It actually doesn't matter.
We're not actually going to
260
00:22:16 --> 00:22:20
care really about what the order
is, but basically we're just
261
00:22:20 --> 00:22:24
looking at busting it into a
tuple represented by each of
262
00:22:24 --> 00:22:27
those digits.
So one algorithm for computing
263
00:22:27 --> 00:22:31
this out of k is take the
remainder mod m.
264
00:22:31 --> 00:22:34
That's the low order one.
OK, take what's left.
265
00:22:34 --> 00:22:37
Take the remainder of that mod
m.
266
00:22:37 --> 00:22:39
Take whatever's left,
etc.
267
00:22:39 --> 00:22:42
So you're familiar with the
conversion to a base
268
00:22:42 --> 00:22:46
representation.
That's exactly how we're
269
00:22:46 --> 00:22:49
getting this representation.
So we treat,
270
00:22:49 --> 00:22:53
this is just a question of
taking the data that we've got
271
00:22:53 --> 00:22:57
and treating it as an r-plus-one-digit
base m number.
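The remainder-and-divide procedure just described is easy to write down directly; the specific values k = 123, m = 7 below are only an illustrative example.

```python
def base_m_digits(k, m, r):
    """Decompose key k into r + 1 base-m digits, low-order digit first:
    repeatedly take the remainder mod m, then keep whatever's left."""
    ds = []
    for _ in range(r + 1):
        ds.append(k % m)   # the low-order digit
        k //= m            # what's left
    return ds

# e.g. k = 123 in base 7: 123 = 4 + 3*7 + 2*49, so the digits are [4, 3, 2]
assert base_m_digits(123, 7, 2) == [4, 3, 2]
```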
272
00:22:57 --> 00:23:02
And now we invoke our
randomized strategy.
273
00:23:02 --> 00:23:05
The randomized strategy is
going to be able to have a class
274
00:23:05 --> 00:23:09
of hash functions that's
dependent essentially on random
275
00:23:09 --> 00:23:11
numbers.
And the random numbers we're
276
00:23:11 --> 00:23:15
going to pick is we're going to
pick an a at random --
277
00:23:15 --> 00:23:28
278
00:23:28 --> 00:23:33
-- which we're also going to
look at as a base m number.
279
00:23:33 --> 00:23:38
where each a_i is chosen randomly
--
280
00:23:38 --> 00:23:49
281
00:23:49 --> 00:23:50
-- from --
282
00:23:50 --> 00:23:55
283
00:23:55 --> 00:23:58
-- 0 to m minus one.
So one of our,
284
00:23:58 --> 00:24:03
it's a random if you will,
it's a random base m digit.
285
00:24:03 --> 00:24:06
Random base m digit.
So each one of these is picked
286
00:24:06 --> 00:24:09
at random.
And for each one we,
287
00:24:09 --> 00:24:13
possible value of A,
we're going to get a different
288
00:24:13 --> 00:24:16
hash function.
So we're going to index our
289
00:24:16 --> 00:24:19
hash functions by this random
number.
290
00:24:19 --> 00:24:23
So this is where the randomness
is going to come in.
291
00:24:23 --> 00:24:28
Everybody with me?
And here's the hash function.
292
00:24:28 --> 00:24:56
293
00:24:56 --> 00:25:06
So what we do is we dot product
this vector with this vector and
294
00:25:06 --> 00:25:11
take the result,
mod m.
295
00:25:11 --> 00:25:18
So each digit of k of our key
gets multiplied by a random
296
00:25:18 --> 00:25:25
other digit.
We add all those up and we take
297
00:25:25 --> 00:25:29
that mod m.
So that's a dot product
298
00:25:29 --> 00:25:34
operator.
And this is what we're going to
299
00:25:34 --> 00:25:37
show is universal,
that this set of h sub a,
300
00:25:37 --> 00:25:39
where I look over that whole
set.
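Here is a small sketch of that hash function: pair the base-m digits of the key with the random vector a and take the dot product mod m. The concrete parameters at the bottom are assumed for illustration only.

```python
import random

def h_a(a, k, m):
    # h_a(k) = (sum over i of a_i * k_i) mod m, where the k_i are the
    # base-m digits of k and a = (a_0, ..., a_r) is the random vector
    total = 0
    for a_i in a:
        total += a_i * (k % m)   # pair a_i with the current low-order digit
        k //= m                  # move on to the next digit
    return total % m

m, r = 7, 3                                       # assumed example parameters
a = [random.randrange(m) for _ in range(r + 1)]   # picks one h from H at random
slot = h_a(a, 1234, m)                            # some slot in 0 .. m - 1
assert 0 <= slot < m
```

Picking the list a fresh at run time is exactly the randomized strategy from earlier: the adversary can read this code but cannot predict which of the m^(r+1) functions will be used.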
301
00:25:39 --> 00:25:44
So one of the things we need to
know is how big is the set of
302
00:25:44 --> 00:25:46
hash functions here.
303
00:25:46 --> 00:25:59
304
00:25:59 --> 00:26:01
So how big is this set of hash
functions?
305
00:26:01 --> 00:26:07
How many different hash
functions do I have in this set?
306
00:26:07 --> 00:26:24
307
00:26:24 --> 00:26:31
It's basic 6.042 material.
It's basically how many vectors
308
00:26:31 --> 00:26:38
of length r plus one where each
element of the vector is a
309
00:26:38 --> 00:26:45
number of 0 to m minus one,
has m different values.
310
00:26:45 --> 00:26:50
So how many?
m minus one to the r.
311
00:26:50 --> 00:26:51
No.
Close.
312
00:26:51 --> 00:26:56
It's up there.
It's a big number.
313
00:26:56 --> 00:27:01
m to the r plus one.
Good.
314
00:27:01 --> 00:27:06
It's m, so the size of H is
equal to m to the r plus one.
315
00:27:06 --> 00:27:10
So we're going to want to
remember that.
316
00:27:10 --> 00:27:13
OK, so let's just understand
why that is.
317
00:27:13 --> 00:27:17
I have m choices for the first
value of A.
318
00:27:17 --> 00:27:19
m for the second,
etc.
319
00:27:19 --> 00:27:23
m for the r-th.
And since there are plus one
320
00:27:23 --> 00:27:28
things here, for each choice
here, I have this many same
321
00:27:28 --> 00:27:34
number of choices here,
so it's a product.
322
00:27:34 --> 00:27:39
OK, so this is the product rule
in counting.
323
00:27:39 --> 00:27:45
So if you haven't reviewed your
6.042 notes for counting,
324
00:27:45 --> 00:27:52
this is going to be a good idea
to go back and review that
325
00:27:52 --> 00:27:57
because we're doing stuff of
that nature.
326
00:27:57 --> 00:28:01
This is just the product rule.
Good.
327
00:28:01 --> 00:28:10
So then the theorem we want to
prove is that H is universal.
328
00:28:10 --> 00:28:14
And this is going to involve a
little bit of number theory,
329
00:28:14 --> 00:28:19
so it gets kind of interesting.
And it's a non-trivial proof,
330
00:28:19 --> 00:28:23
so this is where if there's any
questions as I'm going along,
331
00:28:23 --> 00:28:28
please ask because the argument
is not as simple as other
332
00:28:28 --> 00:28:33
arguments we've seen so far.
OK, not the ones we've seen so
333
00:28:33 --> 00:28:38
far have been simple,
but this is definitely a more
334
00:28:38 --> 00:28:43
involved mathematical argument.
So here's a proof.
335
00:28:43 --> 00:28:46
So let's let,
so we have two keys.
336
00:28:46 --> 00:28:50
What are we trying to show if
it's universal,
337
00:28:50 --> 00:28:55
that if I pick any two keys,
the number of hash functions
338
00:28:55 --> 00:29:01
for which they hash to the same
thing is the size of set of hash
339
00:29:01 --> 00:29:08
functions divided by m.
OK, so I'm going to look at two
340
00:29:08 --> 00:29:11
keys.
So let's pick two keys
341
00:29:11 --> 00:29:16
arbitrarily.
So x, and we'll decompose it
342
00:29:16 --> 00:29:23
into our base m representation
and y, y_0, y_1 --
343
00:29:23 --> 00:29:33
344
00:29:33 --> 00:29:39
So these are two distinct keys.
So if these are two distinct
345
00:29:39 --> 00:29:45
keys, so they're different,
then this base representation
346
00:29:45 --> 00:29:50
has the property that they've
got to differ somewhere.
347
00:29:50 --> 00:29:54
Right?
OK, they differ in at least one
348
00:29:54 --> 00:29:56
digit.
349
00:29:56 --> 00:30:08
350
00:30:08 --> 00:30:12
OK, and this is where most
people get lost because I'm
351
00:30:12 --> 00:30:16
going to make a simplification.
They could differ in any one of
352
00:30:16 --> 00:30:20
these digits.
I'm going to say they differ in
353
00:30:20 --> 00:30:24
position 0 because it doesn't
matter which one I do,
354
00:30:24 --> 00:30:28
the math is the same,
but it'll make it so that if I
355
00:30:28 --> 00:30:31
said they differ in
some position i,
356
00:30:31 --> 00:30:35
I would have to be taking
summations as you'll see over
357
00:30:35 --> 00:30:41
the elements that are not i,
and that's complicated.
358
00:30:41 --> 00:30:44
If I do it in position 0,
then I can just sum for the
359
00:30:44 --> 00:30:46
rest of them.
So the math is going to be
360
00:30:46 --> 00:30:50
identical if I were to do it for
any position because it's
361
00:30:50 --> 00:30:52
symmetric.
All the digits are symmetric.
362
00:30:52 --> 00:30:56
So let's say they differ in
position 0, but the same
363
00:30:56 --> 00:30:59
argument is going to be true if
they differed in some other
364
00:30:59 --> 00:31:02
position.
So let's say,
365
00:31:02 --> 00:31:05
so we're saying without loss of
generality.
366
00:31:05 --> 00:31:08
So that's without loss of
generality.
367
00:31:08 --> 00:31:12
Position 0.
Because all the positions are
368
00:31:12 --> 00:31:16
symmetric here.
And so, now we need to ask the
369
00:31:16 --> 00:31:19
question for how many --
370
00:31:19 --> 00:31:24
371
00:31:24 --> 00:31:30
-- hash functions in our
universal, purportedly universal
372
00:31:30 --> 00:31:34
set do x and y collide?
373
00:31:34 --> 00:31:39
374
00:31:39 --> 00:31:42
OK, we've got to count them up.
So how often do they collide?
375
00:31:42 --> 00:31:46
This is where we're going to
pull out some heavy duty number
376
00:31:46 --> 00:31:48
theory.
So we must have,
377
00:31:48 --> 00:31:50
if they collide --
378
00:31:50 --> 00:31:56
379
00:31:56 --> 00:32:03
-- that h sub a of x is equal
to h sub a of y.
380
00:32:03 --> 00:32:09
That's what it means for them
to collide.
381
00:32:09 --> 00:32:20
So that implies that the sum of
i equal 0 to r of a sub i x sub
382
00:32:20 --> 00:32:30
i is equal to the sum of i
equals 0 to r of a sub i y sub i
383
00:32:30 --> 00:32:35
mod m.
Actually this is congruent mod
384
00:32:35 --> 00:32:38
m.
So congruence for those people
385
00:32:38 --> 00:32:43
who haven't seen much number
theory, is basically the way of
386
00:32:43 --> 00:32:48
essentially, rather than having
to say mod everywhere in here
387
00:32:48 --> 00:32:52
and mod everywhere in here,
we just at the end say OK,
388
00:32:52 --> 00:32:56
do a mod at the end.
Everything is being done mod,
389
00:32:56 --> 00:32:59
modulo m.
And then typically we use a
390
00:32:59 --> 00:33:06
congruence sign.
OK, there's a more mathematical
391
00:33:06 --> 00:33:13
definition but this will work
for us engineers.
392
00:33:13 --> 00:33:18
OK, so everybody with me so
far?
393
00:33:18 --> 00:33:23
This is just applying the
definition.
394
00:33:23 --> 00:33:32
So that implies that the sum of
i equals 0 to r of a_i x_i minus
395
00:33:32 --> 00:33:41
y_i is congruent to zero mod m.
OK, just threw it on the other
396
00:33:41 --> 00:33:45
side and applied the
distributive law.
397
00:33:45 --> 00:33:49
Now what I'm going to do is
pull out the 0-th position
398
00:33:49 --> 00:33:53
because that's the one that I
care about.
399
00:33:53 --> 00:33:58
And this is where it saves me
on the math, compared to if I
400
00:33:58 --> 00:34:03
didn't say that it was 0.
I'd have to pull out x_i.
401
00:34:03 --> 00:34:05
It wouldn't matter,
but it just would make the math
402
00:34:05 --> 00:34:06
a little bit cruftier.
403
00:34:06 --> 00:34:23
404
00:34:23 --> 00:34:30
OK, so now we've just pulled
out one term.
405
00:34:30 --> 00:34:41
That implies that a_0 x_0 minus
y_0 is congruent to minus --
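In standard notation, the chain of congruences being worked out on the board, with the 0-th term pulled out, is:

```latex
h_a(x) = h_a(y)
  \;\Rightarrow\; \sum_{i=0}^{r} a_i x_i \equiv \sum_{i=0}^{r} a_i y_i \pmod{m}
  \;\Rightarrow\; \sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m}
  \;\Rightarrow\; a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m}.
```

The next move in the lecture is to divide both sides by x_0 minus y_0, which is where m being prime comes in.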
406
00:34:41 --> 00:34:54
407
00:34:54 --> 00:34:58
-- mod m.
Now remember that when I have a
408
00:34:58 --> 00:35:02
minus number mod m,
I just map it into whatever,
409
00:35:02 --> 00:35:07
into that range from 0 to m
minus one.
410
00:35:07 --> 00:35:12
So for example,
minus five mod seven is two.
411
00:35:12 --> 00:35:19
So if any of these things are
negative, we simply translate
412
00:35:19 --> 00:35:27
them into that range by adding
multiples of m, because adding multiples of m
413
00:35:27 --> 00:35:32
doesn't affect the congruence.
414
00:35:32 --> 00:35:39
415
00:35:39 --> 00:35:41
OK.
And now for the next step,
416
00:35:41 --> 00:35:44
we need to use a number theory
fact.
417
00:35:44 --> 00:35:48
So let's pull out our number
theory --
418
00:35:48 --> 00:35:57
419
00:35:57 --> 00:36:05
-- textbook and take a little
digression
420
00:36:05 --> 00:36:10
421
00:36:10 --> 00:36:14
So this comes from the theory
of finite fields.
422
00:36:14 --> 00:36:17
So for people who are
knowledgeable,
423
00:36:17 --> 00:36:21
that's where you're plugging
your knowledge in.
424
00:36:21 --> 00:36:26
If you're not knowledgeable,
this is a great area of math to
425
00:36:26 --> 00:36:30
learn about.
So here's the fact.
426
00:36:30 --> 00:36:34
So let m be prime.
Then for any z,
427
00:36:34 --> 00:36:41
little z element of z sub m,
and z sub m is the integers mod
428
00:36:41 --> 00:36:46
m.
So this is essentially numbers
429
00:36:46 --> 00:36:51
from 0 to m minus one with all
the operations,
430
00:36:51 --> 00:36:57
times, minus,
plus, etc., defined on that
431
00:36:57 --> 00:37:04
such that if you end up outside
of the range of 0 to m minus
432
00:37:04 --> 00:37:11
one, you re-normalize by
subtracting or adding multiples
433
00:37:11 --> 00:37:21
of m to get back within the
range from 0 to m minus one.
434
00:37:21 --> 00:37:30
So it's the standard thing of
just doing things module m.
435
00:37:30 --> 00:37:38
So for any z such that z is not
congruent to 0,
436
00:37:38 --> 00:37:47
there exists a unique z inverse
in z sub m, such that if I
437
00:37:47 --> 00:37:57
multiply z times the inverse,
it produces something congruent
438
00:37:57 --> 00:38:04
to one mod m.
So for any number it says,
439
00:38:04 --> 00:38:11
I can find another number that
when multiplied by it gives me
440
00:38:11 --> 00:38:15
one.
So let's just do an example for
441
00:38:15 --> 00:38:18
m equals seven.
So here we have,
442
00:38:18 --> 00:38:24
we'll make a little table.
So z is not equal to 0,
443
00:38:24 --> 00:38:29
so I just write down the other
numbers.
444
00:38:29 --> 00:38:35
And let's figure out what z
inverse is.
445
00:38:35 --> 00:38:41
So what's the inverse of one?
What number when multiplied by
446
00:38:41 --> 00:38:43
one gives me one?
One.
447
00:38:43 --> 00:38:45
Good.
How about two?
448
00:38:45 --> 00:38:51
What number when I multiply it
by two gives me one?
449
00:38:51 --> 00:38:55
Four.
Because two times four is eight
450
00:38:55 --> 00:39:01
and eight is congruent to one
mod seven.
451
00:39:01 --> 00:39:04
So I've re-normalized it.
What about three?
452
00:39:04 --> 00:39:12
453
00:39:12 --> 00:39:13
Five.
Good.
454
00:39:13 --> 00:39:16
Five.
Three times five is 15.
455
00:39:16 --> 00:39:22
That's congruent to one mod
seven because 15 divided by
456
00:39:22 --> 00:39:28
seven is two remainder of one.
So that's the key thing.
457
00:39:28 --> 00:39:32
What about four?
Two.
458
00:39:32 --> 00:39:36
Five? Three. And six.
459
00:39:36 --> 00:39:43
460
00:39:43 --> 00:39:43
Yeah.
Six.
461
00:39:43 --> 00:39:48
Yeah, six it turns out.
OK, six times six is 36.
462
00:39:48 --> 00:39:52
OK, mod seven.
Basically subtract off the 35,
463
00:39:52 --> 00:39:56
gives me one.
So people have observed some
464
00:39:56 --> 00:40:02
interesting facts that if one
number's an inverse of another,
465
00:40:02 --> 00:40:08
then that other is an inverse
of the one.
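The inverse table for m = 7 worked out above can be reproduced using Fermat's little theorem (z to the m minus 2 is congruent to z inverse mod a prime m); this is one standard way to compute the inverses, sketched below.

```python
def inverse_mod(z, m):
    # the unique z^-1 in Z_m with z * z^-1 congruent to 1 (mod m);
    # it exists exactly when gcd(z, m) = 1, which holds for every
    # nonzero z when m is prime (Fermat's little theorem)
    return pow(z, m - 2, m)

m = 7
table = {z: inverse_mod(z, m) for z in range((1), m)}
for z, z_inv in table.items():
    assert (z * z_inv) % m == 1   # each pair multiplies to 1 mod 7
```

For a composite modulus like 10, numbers sharing a factor with the modulus, such as 2 and 5, have no inverse at all, which matches the discussion above.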
466
00:40:08 --> 00:40:12
So that's actually one of these
things that you prove when you
467
00:40:12 --> 00:40:16
do group theory and field theory
and so forth.
468
00:40:16 --> 00:40:21
There are all sorts of other
great properties of this kind of
469
00:40:21 --> 00:40:23
math.
But the main thing is,
470
00:40:23 --> 00:40:27
and this turns out not to be
true if m is not a prime.
471
00:40:27 --> 00:40:31
So can somebody think of,
imagine we're doing something
472
00:40:31 --> 00:40:36
mod 10.
Can somebody think of a number
473
00:40:36 --> 00:40:39
that doesn't have an inverse mod
10?
474
00:40:39 --> 00:40:40
Yeah.
Two.
475
00:40:40 --> 00:40:45
Another one is five.
OK, it turns out the divisors
476
00:40:45 --> 00:40:49
in fact actually,
more generally,
477
00:40:49 --> 00:40:53
something that is not
relatively prime,
478
00:40:53 --> 00:40:58
meaning that it has no common
factors, the GCD is not one
479
00:40:58 --> 00:41:04
between that number and the
modulus.
480
00:41:04 --> 00:41:08
OK, those numbers do not have
an inverse mod m.
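[Editor's note: the inverse table above can be checked mechanically. A small sketch in plain Python, with helper names of my own: brute-force each inverse mod 7, then confirm that mod 10 an inverse exists exactly for the numbers relatively prime to 10.]

```python
from math import gcd

def mod_inverse(z, m):
    """Brute-force search for z^-1 mod m; returns None if no inverse exists."""
    for candidate in range(1, m):
        if (z * candidate) % m == 1:
            return candidate
    return None

# Mod 7 (prime): every nonzero z has an inverse -- the table from lecture.
inverses_mod7 = {z: mod_inverse(z, 7) for z in range(1, 7)}
print(inverses_mod7)  # {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}

# Mod 10 (composite): an inverse exists exactly when gcd(z, 10) == 1.
for z in range(1, 10):
    assert (mod_inverse(z, 10) is not None) == (gcd(z, 10) == 1)
```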
481
00:41:08 --> 00:41:13
OK, but if it's prime,
every number is relatively
482
00:41:13 --> 00:41:17
prime to the modulus.
And that's the property that
483
00:41:17 --> 00:41:22
we're taking advantage of.
So this is our fact and so,
484
00:41:22 --> 00:41:28
in this case what I'm after is
I want to divide by x_0 minus
485
00:41:28 --> 00:41:31
y_0.
That's what I want to do at
486
00:41:31 --> 00:41:34
this point.
But I can't do that if x_0,
487
00:41:34 --> 00:41:36
first of all,
if m isn't prime,
488
00:41:36 --> 00:41:40
I can't necessarily do that.
I might be able to,
489
00:41:40 --> 00:41:43
but I can't necessarily.
But if m is prime,
490
00:41:43 --> 00:41:46
I can definitely divide by x_0
minus y_0.
491
00:41:46 --> 00:41:49
I can find that inverse.
And the other thing I have to
492
00:41:49 --> 00:41:52
do is make sure x_0 minus y_0 is
not 0.
493
00:41:52 --> 00:41:57
OK, it would be 0 if these two
were equal, but our supposition
494
00:41:57 --> 00:42:01
was they weren't equal.
And once again,
495
00:42:01 --> 00:42:05
just bringing it back to the
without loss of generality,
496
00:42:05 --> 00:42:08
if it were some other position
that we were off,
497
00:42:08 --> 00:42:13
I would be doing exactly the
same thing with that position.
498
00:42:13 --> 00:42:16
So now we're going to be able
to divide.
499
00:42:16 --> 00:42:19
So we continue with our --
500
00:42:19 --> 00:42:24
501
00:42:24 --> 00:42:33
-- continue with our proof.
So since x_0 is not equal to
502
00:42:33 --> 00:42:42
y_0, there exists an inverse for
x_0 minus y_0.
503
00:42:42 --> 00:42:48
And that implies,
just continue on from over
504
00:42:48 --> 00:42:56
there, that a_0 is congruent
therefore to minus the sum of i
505
00:42:56 --> 00:43:04
equal one to r of a_i,
x_i minus y_i times x_0 minus
506
00:43:04 --> 00:43:10
y_0 inverse.
So let's just go back to the
507
00:43:10 --> 00:43:15
beginning of our proof and see
what we've derived.
508
00:43:15 --> 00:43:19
If we're saying we have two
distinct keys,
509
00:43:19 --> 00:43:24
and we've picked all of these
a_i randomly,
510
00:43:24 --> 00:43:30
and we're saying that these two
distinct keys hash to the same
511
00:43:30 --> 00:43:34
place.
If they hash to the same place,
512
00:43:34 --> 00:43:41
it says that a_0 essentially
had to have a particular value
513
00:43:41 --> 00:43:47
as a function of the other a_i.
Because in other words,
514
00:43:47 --> 00:43:51
once I've picked each of these
a_i from one to r,
515
00:43:51 --> 00:43:54
if I did them in that order,
for example,
516
00:43:54 --> 00:43:58
then I don't have a choice for
how I pick a_0 to make it
517
00:43:58 --> 00:44:00
collide.
Exactly one value allows it to
518
00:44:00 --> 00:44:05
collide, namely the value of a_0
given by this.
519
00:44:05 --> 00:44:10
If I picked a different value
of a_0, they wouldn't collide.
520
00:44:10 --> 00:44:16
So let me write that down.
Thus, while you think about it
521
00:44:16 --> 00:45:12
522
00:45:12 --> 00:45:18
So for any choice of these a_i,
there's exactly one of the
523
00:45:18 --> 00:45:24
m possible choices of a_0 that
cause a collision.
524
00:45:24 --> 00:45:29
And for all the other choices I
might make of a_0,
525
00:45:29 --> 00:45:36
there's no collision.
So essentially I don't have,
526
00:45:36 --> 00:45:42
if they're going to collide,
I've reduced essentially the
527
00:45:42 --> 00:45:49
number of degrees of freedom of
my randomness by a factor of m.
528
00:45:49 --> 00:45:55
So if I count up the number of
h_a's that cause x and y to
529
00:45:55 --> 00:46:01
collide, that's equal to,
well, there's m choices,
530
00:46:01 --> 00:46:06
just using the product rule
again.
531
00:46:06 --> 00:46:13
There's m choices for a_1 times
m choices for a_2,
532
00:46:13 --> 00:46:21
up to m choices for a_r and
then only one choice for a_0.
533
00:46:21 --> 00:46:28
So this is choices for a_1,
a_2, a_r and only one choice
534
00:46:28 --> 00:46:35
for a_0 if they're going to
collide.
535
00:46:35 --> 00:46:40
If they're not going to
collide, I've got more choices
536
00:46:40 --> 00:46:43
for a_0.
But if I want them to collide,
537
00:46:43 --> 00:46:48
there's only one value I can
pick, namely this value.
538
00:46:48 --> 00:46:53
That's the only value that
I can pick.
539
00:46:53 --> 00:46:58
And that's equal to m to the r,
which is just the size of H
540
00:46:58 --> 00:47:03
divided by m.
And that completes the proof.
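[Editor's note: the |H|/m count can be verified exhaustively for a toy prime. The parameters here are my own, not from the lecture: m = 5, r = 1, keys are digit pairs base 5, and the family is the dot-product hash h_a(x) = a_0 x_0 + a_1 x_1 mod 5.]

```python
from itertools import product

m, r = 5, 1  # tiny prime modulus; |H| = m^(r+1) = 25 functions in the family

def h(a, x):
    # dot-product hash: h_a(x) = sum of a_i * x_i, mod m
    return sum(ai * xi for ai, xi in zip(a, x)) % m

x, y = (0, 3), (1, 2)  # two distinct keys
colliding = sum(1 for a in product(range(m), repeat=r + 1) if h(a, x) == h(a, y))
print(colliding)  # exactly |H| / m = m^r = 5 of the 25 functions collide them
```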
541
00:47:03 --> 00:47:11
542
00:47:11 --> 00:47:14
So there are other universal
constructions,
543
00:47:14 --> 00:47:18
but this is a particularly
elegant one.
544
00:47:18 --> 00:47:22
So the point is that I have m
plus one, sorry,
545
00:47:22 --> 00:47:27
r plus one degrees of freedom
where each degree of freedom I
546
00:47:27 --> 00:47:33
have m choices.
But if I want them to collide,
547
00:47:33 --> 00:47:40
once I've picked any of the,
once I've picked r of those
548
00:47:40 --> 00:47:45
possible choices,
the last one is forced if I
549
00:47:45 --> 00:47:48
want it to collide.
So therefore,
550
00:47:48 --> 00:47:55
the set of functions for which
it collides is only one in m.
551
00:47:55 --> 00:48:01
A very slick construction.
Very slick.
552
00:48:01 --> 00:48:03
OK.
Everybody with me here?
553
00:48:03 --> 00:48:07
Didn't lose too many people?
Yeah, question.
554
00:48:07 --> 00:48:12
Well, part of it is,
actually this is a quite common
555
00:48:12 --> 00:48:15
type of thing to be doing
actually.
556
00:48:15 --> 00:48:19
If you take a class,
so we have follow on classes in
557
00:48:19 --> 00:48:24
cryptography and so forth,
and this kind of thing of
558
00:48:24 --> 00:48:29
taking dot products,
modulo m and also Galois fields
559
00:48:29 --> 00:48:34
which are particularly simple
finite fields and things like
560
00:48:34 --> 00:48:40
that, people play with these all
the time.
561
00:48:40 --> 00:48:43
So Galois fields are like using
XORs as your,
562
00:48:43 --> 00:48:46
same sort of thing as this
except base two.
563
00:48:46 --> 00:48:49
And so there's a lot of study
of this sort of thing.
564
00:48:49 --> 00:48:53
So people understand these kind
of properties.
565
00:48:53 --> 00:48:57
But yeah, it's like what's the
algorithm for having a brilliant
566
00:48:57 --> 00:49:01
insight into algorithms?
It's like OK.
567
00:49:01 --> 00:49:05
Wish I knew.
Then I'd just turn the crank.
568
00:49:05 --> 00:49:11
[LAUGHTER] But if it were that
easy, I wouldn't be standing up
569
00:49:11 --> 00:49:13
here today.
[LAUGHTER] Good.
570
00:49:13 --> 00:49:19
OK, so now I want to take on
another topic which is also I
571
00:49:19 --> 00:49:22
find, I think this is
astounding.
572
00:49:22 --> 00:49:27
It's just beautiful,
beautiful mathematics and a big
573
00:49:27 --> 00:49:34
impact on your ability to build
good hash functions.
574
00:49:34 --> 00:49:37
Now I want to talk about
another topic,
575
00:49:37 --> 00:49:41
which is related,
which is the topic of perfect
576
00:49:41 --> 00:49:42
hashing.
577
00:49:42 --> 00:49:54
578
00:49:54 --> 00:49:59
So everything we've done so far
does expected time performance.
579
00:49:59 --> 00:50:03
Hashing is good in the expected
sense.
580
00:50:03 --> 00:50:08
Perfect hashing addresses the
following question.
581
00:50:08 --> 00:50:14
Suppose that I gave you a set
of keys, and I said just build
582
00:50:14 --> 00:50:20
me a static table so I can look
up whether the key is in the
583
00:50:20 --> 00:50:25
table with worst case time.
Good worst case time.
584
00:50:25 --> 00:50:31
So I have a fixed set of keys.
They might be something like
585
00:50:31 --> 00:50:37
for example, the hundred most
common or thousand most common
586
00:50:37 --> 00:50:42
words in English.
And when I get a word I want to
587
00:50:42 --> 00:50:47
check quickly in this table,
is the word that I've got one
588
00:50:47 --> 00:50:49
of the most common words in
English.
589
00:50:49 --> 00:50:54
I would like to do that not
with expected performance,
590
00:50:54 --> 00:50:57
but guaranteed worst case
performance.
591
00:50:57 --> 00:51:03
Is there a way of building it
so that I can find this quickly?
592
00:51:03 --> 00:51:06
So the problem is given n keys
--
593
00:51:06 --> 00:51:12
594
00:51:12 --> 00:51:14
-- construct a static hash
table.
595
00:51:14 --> 00:51:17
In other words,
no insertion and deletion.
596
00:51:17 --> 00:51:20
We're just going to put the
elements in there.
597
00:51:20 --> 00:51:22
A size --
598
00:51:22 --> 00:51:30
599
00:51:30 --> 00:51:37
-- m equals Order n.
So I don't want it to be a huge
600
00:51:37 --> 00:51:42
table.
I want it to be a table that is
601
00:51:42 --> 00:51:50
the size of my keys.
Table of size m equals Order n,
602
00:51:50 --> 00:51:59
such that search takes O(1)
time in the worst case.
603
00:51:59 --> 00:52:06
604
00:52:06 --> 00:52:10
So there's no place in the
table where I'm going to have,
605
00:52:10 --> 00:52:14
I know in the average case,
that's not hard to do.
606
00:52:14 --> 00:52:18
But in the worst case,
I want to make sure that
607
00:52:18 --> 00:52:22
there's no particular spot where
the number of keys piles up to
608
00:52:22 --> 00:52:26
be a large number.
OK, in no spot should that
609
00:52:26 --> 00:52:29
happen.
Every single search I do should
610
00:52:29 --> 00:52:33
take Order one time.
There shouldn't be any
611
00:52:33 --> 00:52:37
statistical variation in terms
of how long it takes me to get
612
00:52:37 --> 00:52:39
something.
Does everybody understand what
613
00:52:39 --> 00:52:42
the puzzle is?
So this is a great,
614
00:52:42 --> 00:52:45
because this actually ends up
having a lot of uses.
615
00:52:45 --> 00:52:49
You know, you want to build a
table for something and you know
616
00:52:49 --> 00:52:52
what the values are that you're
going look up in it.
617
00:52:52 --> 00:52:56
But you don't want to spend a
lot of space on it and so forth.
618
00:52:56 --> 00:53:00
So the idea here is actually
going to be to use a two-level
619
00:53:00 --> 00:53:02
scheme.
620
00:53:02 --> 00:53:09
621
00:53:09 --> 00:53:22
So the idea is we're going to
use a two-level scheme with
622
00:53:22 --> 00:53:31
universal hashing at both
levels.
623
00:53:31 --> 00:53:36
So the idea is we're going to
hash, we're going to have a hash
624
00:53:36 --> 00:53:41
table, we're going to hash into
slots, but rather than using
625
00:53:41 --> 00:53:46
chaining, we're going to have
another hash table there.
626
00:53:46 --> 00:53:51
We're going to do a second hash
into the second hash table.
627
00:53:51 --> 00:53:56
And the idea is that we're
going to do it in such a way
628
00:53:56 --> 00:54:01
that we have no collisions at
level two.
629
00:54:01 --> 00:54:03
So we may have collisions at
level one.
630
00:54:03 --> 00:54:08
We'll take anything that
collides at level one and put
631
00:54:08 --> 00:54:12
them into a hash table and then
our second level hash table,
632
00:54:12 --> 00:54:15
but that hash table,
no collisions.
633
00:54:15 --> 00:54:17
Boom.
We're just going to hash right
634
00:54:17 --> 00:54:20
in there.
And it'll just go boom to its
635
00:54:20 --> 00:54:23
thing.
So let's draw a picture of this
636
00:54:23 --> 00:54:28
to illustrate the scheme.
OK, so we have --
637
00:54:28 --> 00:54:34
638
00:54:34 --> 00:54:37
-- 0, one, let's say six,
m minus one.
639
00:54:37 --> 00:54:42
So here's our hash table.
And what we're going to do is
640
00:54:42 --> 00:54:47
we're going to use universal
hashing at the first level,
641
00:54:47 --> 00:54:49
OK.
So we find a universal hash
642
00:54:49 --> 00:54:52
function.
We pick a hash function at
643
00:54:52 --> 00:54:56
random.
And what we'll do is we'll hash
644
00:54:56 --> 00:55:00
into that level.
And then what we'll do is we'll
645
00:55:00 --> 00:55:05
keep track of two things.
One is what the size of the
646
00:55:05 --> 00:55:09
hash table is at the next level.
So in this case,
647
00:55:09 --> 00:55:13
the size of the hash table will
only use the number of slots.
648
00:55:13 --> 00:55:17
There's going to be four.
And we're also going to keep a
649
00:55:17 --> 00:55:19
separate hash key for the second
level.
650
00:55:19 --> 00:55:23
So each slot will have its own
hash function for the second
651
00:55:23 --> 00:55:25
level.
So for example,
652
00:55:25 --> 00:55:30
this one might have a key of 31
that is a random number.
653
00:55:30 --> 00:55:32
The a's here.
a's up there.
654
00:55:32 --> 00:55:34
There we go,
a's up there.
655
00:55:34 --> 00:55:39
So that's going to be the basis
of my hash function,
656
00:55:39 --> 00:55:42
the key with which I'm going to
hash.
657
00:55:42 --> 00:55:46
This one say has 86.
And let's say that this,
658
00:55:46 --> 00:55:50
and then we have a pointer to
the hash table.
659
00:55:50 --> 00:55:55
This is say S_1.
And it's got four slots and we
660
00:55:55 --> 00:56:01
stored up 14 and 27.
And these two slots are empty.
661
00:56:01 --> 00:56:09
And this one for example,
had what?
662
00:56:09 --> 00:56:12
Two nines.
663
00:56:12 --> 00:56:28
664
00:56:28 --> 00:56:34
So the idea here is that in
this case if we look over all
665
00:56:34 --> 00:56:40
our top level hash function,
which I'll just call H,
666
00:56:40 --> 00:56:47
has that H of 14 is equal to H
of 27 is equal to one.
667
00:56:47 --> 00:56:53
Because we're in slot one.
OK, so these two both hash to
668
00:56:53 --> 00:56:57
the same slot in the level one
hash table.
669
00:56:57 --> 00:57:02
This is level one.
And this is level two over
670
00:57:02 --> 00:57:06
here.
So level one hashing,
671
00:57:06 --> 00:57:11
14 and 27 collided.
They went into the same slot
672
00:57:11 --> 00:57:13
here.
But at level two,
673
00:57:13 --> 00:57:20
they got hashed to different
places and the hash function I
674
00:57:20 --> 00:57:26
use is going to be indexed by
whatever the random numbers are
675
00:57:26 --> 00:57:33
that I chose and found for those
and I'll show you how we find
676
00:57:33 --> 00:57:36
those.
We have then h of 31 of 14 is
677
00:57:36 --> 00:57:43
equal to one h of 31 of 27 is
equal to two.
678
00:57:43 --> 00:57:46
For level two.
So I go, hash in here,
679
00:57:46 --> 00:57:51
find the, use this as the basis
of my hash function to hash into
680
00:57:51 --> 00:57:55
whatever table I've got here.
And so, if there are no,
681
00:57:55 --> 00:58:00
if I can guarantee that there
are no collisions at level two,
682
00:58:00 --> 00:58:05
this is going to cost me Order
one time in the worst case to
683
00:58:05 --> 00:58:09
look something up.
How do I look it up?
684
00:58:09 --> 00:58:12
Take the value.
I apply h to it.
685
00:58:12 --> 00:58:16
That takes me to some slot.
Then I look to see what the key
686
00:58:16 --> 00:58:21
is for this hash function.
I apply that hash function and
687
00:58:21 --> 00:58:24
that takes me to another slot.
Then I go there.
688
00:58:24 --> 00:58:29
And that took me basically two
applications of hash functions
689
00:58:29 --> 00:58:33
plus some look-up,
plus who knows what minor
690
00:58:33 --> 00:58:41
amount of bookkeeping.
So the reason we're going to
691
00:58:41 --> 00:58:50
have no collisions at this level
is the following.
692
00:58:50 --> 00:59:01
If there are n sub i items that
hash to a level one slot i,
693
00:59:01 --> 00:59:11
then we're going to use m sub
i, which is equal to n sub i
694
00:59:11 --> 00:59:21
squared slots in the level two
hash table.
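[Editor's note: here is a minimal executable sketch of the two-level idea. The hash family `(a*k mod p) mod size` and all the names are my simplifications, not the lecture's exact dot-product family: collide freely at level one, then retry random level-two functions until each bucket's n_i keys land in n_i squared slots with no collision.]

```python
import random

P = 2_147_483_647  # a big prime for the stand-in family (a*k mod P) mod size

def make_hash(size, rng):
    a = rng.randrange(1, P)
    return lambda k: (a * k % P) % size

def build_perfect(keys, seed=0):
    """Two-level scheme: level one hashes n keys into n slots; each slot's
    n_i keys get a collision-free level-two table of n_i**2 slots."""
    rng = random.Random(seed)
    n = len(keys)
    h1 = make_hash(n, rng)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h1(k)].append(k)
    tables = []
    for bucket in buckets:
        size = max(len(bucket) ** 2, 1)
        while True:  # each random try succeeds with probability >= 1/2
            h2 = make_hash(size, rng)
            slots = [None] * size
            for k in bucket:
                if slots[h2(k)] is not None:
                    break        # collision at level two: try another h2
                slots[h2(k)] = k
            else:
                tables.append((h2, slots))
                break
    return h1, tables

def lookup(key, h1, tables):
    # two hash evaluations, worst case O(1)
    h2, slots = tables[h1(key)]
    return slots[h2(key)] == key

keys = [14, 27, 19, 10, 35, 54, 72, 81]
h1, tables = build_perfect(keys)
assert all(lookup(k, h1, tables) for k in keys)
```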
695
00:59:21 --> 00:59:29
696
00:59:29 --> 00:59:33
OK, so I should have mentioned
here this is going to be m sub
697
00:59:33 --> 00:59:37
i, the size of the hash table
and this is going to be my a sub
698
00:59:37 --> 00:59:39
i essentially.
699
00:59:39 --> 00:59:45
700
00:59:45 --> 00:59:50
So I'm going to use,
so basically I'm going to hash
701
00:59:50 --> 00:59:55
n sub i things into n sub i
squared locations here.
702
00:59:55 --> 1:00:00
So this is going to be
incredibly sparse.
703
1:00:00 --> 1:00:02.48
OK, it's going to be quadratic
in size.
704
1:00:02.48 --> 1:00:05.612
And so what I'm going to show
is that under those
705
1:00:05.612 --> 1:00:08.418
circumstances,
it's easy for me to find hash
706
1:00:08.418 --> 1:00:11.159
functions such that there are no
collisions.
707
1:00:11.159 --> 1:00:15.01
That's the name of the game.
Figure out how can I make these
708
1:00:15.01 --> 1:00:18.012
hash functions so that there are
no collisions.
709
1:00:18.012 --> 1:00:21.341
So that's why I draw this with
so few elements here.
710
1:00:21.341 --> 1:00:24.604
So here for example,
I have two elements and I have
711
1:00:24.604 --> 1:00:27.867
a hash table size four here.
I have three elements.
712
1:00:27.867 --> 1:00:32.52
I need a hash table size nine.
OK, if there are a hundred
713
1:00:32.52 --> 1:00:34.918
elements, I need a hash table
size 10,000.
714
1:00:34.918 --> 1:00:38.485
I'm not going to pick something
so there's likely that there's
715
1:00:38.485 --> 1:00:41.35
anything of that size.
And then the fact that this
716
1:00:41.35 --> 1:00:44.801
actually works and gives us all
the properties that we want,
717
1:00:44.801 --> 1:00:48.251
that's part of the analysis.
So does everybody see that this
718
1:00:48.251 --> 1:00:51.877
takes Order one worst case time
and what the basic structure of
719
1:00:51.877 --> 1:00:52.988
it is?
These things,
720
1:00:52.988 --> 1:00:55.21
by the way, are not in this
case prime.
721
1:00:55.21 --> 1:00:58.134
I could always pick primes that
were close to this.
722
1:00:58.134 --> 1:01:03.73
I didn't do that in this case.
Or I could use a universal hash
723
1:01:03.73 --> 1:01:09.103
function that in fact would work
for things other than primes.
724
1:01:09.103 --> 1:01:12.362
But I didn't do that for this
example.
725
1:01:12.362 --> 1:01:16.943
We all ready for analysis?
OK, let's do some analysis
726
1:01:16.943 --> 1:01:18
then.
727
1:01:18 --> 1:01:29
728
1:01:29 --> 1:01:31
And this is really pretty
analysis.
729
1:01:31 --> 1:01:33.528
Partly as you'll see because
we've already done some of this
730
1:01:33.528 --> 1:01:34
analysis.
731
1:01:34 --> 1:01:50
732
1:01:50 --> 1:01:53.238
So the trick is analyzing level
two.
733
1:01:53.238 --> 1:01:57.309
That's the main thing that I
want to analyze,
734
1:01:57.309 --> 1:02:02.583
to show that I can find hash
functions here that are going
735
1:02:02.583 --> 1:02:06.192
to, when I map them into,
very sparsely,
736
1:02:06.192 --> 1:02:09.523
into these arrays here,
that in fact,
737
1:02:09.523 --> 1:02:16
such hash functions exist and I
can compute them in advance.
738
1:02:16 --> 1:02:23.344
So that I have a good way of
storing those.
739
1:02:23.344 --> 1:02:30.338
So here's the theorem we're
going to use.
740
1:02:30.338 --> 1:02:40.83
If we hash n keys into m equals
n squared slots using a random
741
1:02:40.83 --> 1:02:48
hash function in a universal set
H.
742
1:02:48 --> 1:03:00.393
Then the expected number of
collisions is less than one
743
1:03:00.393 --> 1:03:02.502
half.
OK.
744
1:03:02.502 --> 1:03:11.372
The expected number of
collisions I don't expect there
745
1:03:11.372 --> 1:03:20.577
to be even one collision.
I expect there to be less than
746
1:03:20.577 --> 1:03:29.447
half a collision on average.
And so, let's prove this,
747
1:03:29.447 --> 1:03:39.154
so that the probability that
two given keys collide under h
748
1:03:39.154 --> 1:03:45.216
is what?
What's the probability that two
749
1:03:45.216 --> 1:03:51.443
given keys collide under h when
h is chosen randomly from the
750
1:03:51.443 --> 1:03:54.037
universal set?
One over m.
751
1:03:54.037 --> 1:03:56.943
Right?
That's the definition,
752
1:03:56.943 --> 1:04:02.235
right, of, which is in this
case equal to one over n
753
1:04:02.235 --> 1:04:06.21
squared.
So now how many keys,
754
1:04:06.21 --> 1:04:11.052
how many pairs of keys do I
have in this table?
755
1:04:11.052 --> 1:04:16.526
How many keys could possibly
collide with each other?
756
1:04:16.526 --> 1:04:19.368
OK.
So that's basically just
757
1:04:19.368 --> 1:04:25.157
looking at how many different
pairs of keys do I have to
758
1:04:25.157 --> 1:04:30.315
evaluate this for.
So that's n choose two pairs of
759
1:04:30.315 --> 1:04:36.655
keys.
n choose two pairs of keys.
760
1:04:36.655 --> 1:04:42.689
So therefore,
the expected number of
761
1:04:42.689 --> 1:04:52.172
collisions is, well, for each of
these n, not n over two.
762
1:04:52.172 --> 1:05:00.793
n choose two pairs of keys.
The probability that it
763
1:05:00.793 --> 1:05:08.923
collides is one in n squared.
So that's equal to n times n
764
1:05:08.923 --> 1:05:12.221
minus one over two,
if you remember your formula,
765
1:05:12.221 --> 1:05:16
times one in n squared.
And that's less than a half.
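[Editor's note: spelling the arithmetic out, (n choose 2) times 1/n^2 equals n(n-1)/2 divided by n^2, which is (n-1)/2n, always below one half. A one-line check:]

```python
from math import comb

# E[collisions] = (n choose 2) * 1/m with m = n^2, i.e. (n-1)/(2n) < 1/2.
for n in range(2, 200):
    expected = comb(n, 2) / n**2
    assert expected == (n - 1) / (2 * n)
    assert expected < 0.5

print(comb(10, 2) / 10**2)  # e.g. n = 10: 45 pairs / 100 slots = 0.45
```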
766
1:05:16 --> 1:05:24
767
1:05:24 --> 1:05:28.183
So for every pair of keys,
so those of you who remember
768
1:05:28.183 --> 1:05:33.063
from 6.042 the birthday paradox,
this is related to the birthday
769
1:05:33.063 --> 1:05:36.8
paradox a little bit.
But here I basically have a
770
1:05:36.8 --> 1:05:40.333
large set, and I'm looking at
all pairs, but my set is
771
1:05:40.333 --> 1:05:44
sufficiently big that the odds
that I get a collision is
772
1:05:44 --> 1:05:47.199
relatively small.
If I start increasing it beyond
773
1:05:47.199 --> 1:05:50.4
the square root of m,
OK, the number of elements,
774
1:05:50.4 --> 1:05:54.466
it starts getting bigger than the
square root of m, then the odds
775
1:05:54.466 --> 1:05:57.733
of a collision go up
dramatically as you know from
776
1:05:57.733 --> 1:06:01.532
the birthday paradox.
But if I'm less than,
777
1:06:01.532 --> 1:06:05.401
if I'm really sparse in there,
I don't get collisions.
778
1:06:05.401 --> 1:06:09.197
Or at least I get a relatively
small number expected.
779
1:06:09.197 --> 1:06:13.43
Now I want to remind you of
something which actually in the
780
1:06:13.43 --> 1:06:17.08
past I have just assumed,
but I want to actually go
781
1:06:17.08 --> 1:06:20.291
through it briefly.
It's Markov's inequality.
782
1:06:20.291 --> 1:06:22.919
So who remembers Markov's
inequality?
783
1:06:22.919 --> 1:06:25.839
Don't everybody raise their
hand at once.
784
1:06:25.839 --> 1:06:30
So Markov's inequality says the
following.
785
1:06:30 --> 1:06:34.145
This is one of these great
probability facts.
786
1:06:34.145 --> 1:06:38.762
For random variable x which is
bounded below by 0,
787
1:06:38.762 --> 1:06:44.227
says the probability that x is
bigger than, greater than or
788
1:06:44.227 --> 1:06:49.316
equal to any given value T is
less than or equal to the
789
1:06:49.316 --> 1:06:53.838
expectation of x divided by T.
It's a great fact.
790
1:06:53.838 --> 1:06:57.796
Doesn't happen if x isn't bound
below by 0.
791
1:06:57.796 --> 1:07:03.23
But it's a great fact.
It allows me to relate the
792
1:07:03.23 --> 1:07:06.833
probability of an event to its
expectation.
793
1:07:06.833 --> 1:07:12.066
And the idea is in general that
if the expectation is going to
794
1:07:12.066 --> 1:07:17.213
be small, then I can't have a
high probability that the value
795
1:07:17.213 --> 1:07:21.845
of the random variable is large.
It doesn't make sense.
796
1:07:21.845 --> 1:07:26.649
How could you have a high
probability that it's a million
797
1:07:26.649 --> 1:07:31.968
when my expectation is one or in
this case we're going to apply
798
1:07:31.968 --> 1:07:36
it when the expectation is a
half?
799
1:07:36 --> 1:07:39.676
Couldn't happen.
And the proof follows just
800
1:07:39.676 --> 1:07:44.666
directly on the definition of
expectation, and so I'm doing
801
1:07:44.666 --> 1:07:47.73
this for a discrete random
variable.
802
1:07:47.73 --> 1:07:52.282
So the expectation by
definition is just the sum from
803
1:07:52.282 --> 1:07:57.622
little x goes to 0 to infinity
of x times the probability that
804
1:07:57.622 --> 1:08:02
my random variable takes on the
value x.
805
1:08:02 --> 1:08:06.56
That's the definition.
And now it's just a question of
806
1:08:06.56 --> 1:08:11.12
doing like the coarsest
approximation you can imagine.
807
1:08:11.12 --> 1:08:14.734
First of all,
let me just simply throw away
808
1:08:14.734 --> 1:08:19.725
all small terms that can be
greater than or equal to x equals
809
1:08:19.725 --> 1:08:24.716
T to infinity of x times the
probability that x is equal to
810
1:08:24.716 --> 1:08:28.072
little x.
So just throw away all the low
811
1:08:28.072 --> 1:08:31.427
order terms.
Now what I'm going to do is
812
1:08:31.427 --> 1:08:36.848
replace every one of these terms
is lower bounded by the value x
813
1:08:36.848 --> 1:08:42.875
equals T.
So that's just the summation of
814
1:08:42.875 --> 1:08:49.75
x equals T to infinity of T
times the probability that x
815
1:08:49.75 --> 1:08:51.25
equals x.
OK.
816
1:08:51.25 --> 1:08:58.25
Over x going from T and larger.
Because these are only bigger
817
1:08:58.25 --> 1:09:02.009
values.
And that's just equal then to
818
1:09:02.009 --> 1:09:06.306
T, because I can pull that out,
and the summation of x equals T
819
1:09:06.306 --> 1:09:10.257
to infinity of the probability
that x equals x is just the
820
1:09:10.257 --> 1:09:14
probability that x is greater
than or equal to T.
821
1:09:14 --> 1:09:20
822
1:09:20 --> 1:09:26
And that's done because I just
divide by T.
823
1:09:26 --> 1:09:31
824
1:09:31 --> 1:09:34.379
So that's Markov's inequality.
Really dumb.
825
1:09:34.379 --> 1:09:37.919
Really simple.
There are much stronger things
826
1:09:37.919 --> 1:09:42.264
like Chebyshev bounds and
Chernoff bounds and things of
827
1:09:42.264 --> 1:09:44.839
that nature.
But Markov's is like
828
1:09:44.839 --> 1:09:49.586
unbelievably simple and useful.
So we're going to just apply
829
1:09:49.586 --> 1:09:52
that as a corollary.
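[Editor's note: Markov's inequality can be watched in action with a simulation of mine. It even holds exactly for the empirical sample, since for nonnegative s the indicator of s >= T is at most s/T.]

```python
import random

rng = random.Random(42)
# A nonnegative random variable with E[X] = 1 (exponential, rate 1).
samples = [rng.expovariate(1.0) for _ in range(100_000)]
mean = sum(samples) / len(samples)

for T in (1, 2, 5, 10):
    tail = sum(s >= T for s in samples) / len(samples)
    # Markov: P(X >= T) <= E[X] / T. This holds for the empirical
    # distribution too, because 1[s >= T] <= s / T whenever s >= 0.
    assert tail <= mean / T
    print(T, round(tail, 4), round(mean / T, 4))
```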
830
1:09:52 --> 1:10:06
831
1:10:06 --> 1:10:13.06
So the probability now of no
collisions, when I hash n keys
832
1:10:13.06 --> 1:10:19.391
into n squared slots using a
universal hash function,
833
1:10:19.391 --> 1:10:26.817
I claim is the probability of
no collisions is greater than or
834
1:10:26.817 --> 1:10:32.173
equal to a half.
So I pick a hash function at
835
1:10:32.173 --> 1:10:36.409
random.
What are the odds that I got no
836
1:10:36.409 --> 1:10:40.917
collisions when I hashed those n
keys into n squared slots?
837
1:10:40.917 --> 1:10:43.326
Answer.
Probability is I have no
838
1:10:43.326 --> 1:10:47.834
collisions is at least a half.
Half the time I'm guaranteed
839
1:10:47.834 --> 1:10:51.409
that there won't be a collision.
And the proof,
840
1:10:51.409 --> 1:10:54.129
pretty simple.
The probability of no
841
1:10:54.129 --> 1:10:57.549
collisions is the same as the
probability as,
842
1:10:57.549 --> 1:11:01.746
sorry, is one minus the
probability that I have at least
843
1:11:01.746 --> 1:11:05.85
one collision.
So the odds that I have at
844
1:11:05.85 --> 1:11:09.337
least one collision,
the odds that I have at least
845
1:11:09.337 --> 1:11:12.254
one collision,
probability greater than or
846
1:11:12.254 --> 1:11:15.599
equal to one collision is less
than or equal to,
847
1:11:15.599 --> 1:11:18.872
now I just apply Markov's
inequality with this.
848
1:11:18.872 --> 1:11:23
So it's just the expected
number of collisions --
849
1:11:23 --> 1:11:29
850
1:11:29 --> 1:11:33.09
-- divided by one.
And that is by Markov's
851
1:11:33.09 --> 1:11:36.272
inequality less than,
by definition,
852
1:11:36.272 --> 1:11:40.181
excuse me, of expected number
of collisions,
853
1:11:40.181 --> 1:11:44.363
which we've already shown,
is less than a half.
854
1:11:44.363 --> 1:11:49.636
So the probability of at least
one collision is less than a
855
1:11:49.636 --> 1:11:52.909
half.
The probability of 0 collisions
856
1:11:52.909 --> 1:11:56.363
is at least a half.
So we're done here.
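[Editor's note: "test a few at random" really is the whole algorithm. A sketch with a stand-in family (a*k + b) mod p mod m of my own choosing: since at least half the functions are collision-free for n keys in n squared slots, the expected number of tries is at most 2.]

```python
import random

P = 2_147_483_647  # big prime; (a*k + b) mod P mod m is a stand-in universal family

def trials_until_collision_free(keys, m, rng):
    """Pick random hash functions until one maps `keys` into m slots injectively."""
    trials = 0
    while True:
        trials += 1
        a, b = rng.randrange(1, P), rng.randrange(P)
        slots = {(a * k + b) % P % m for k in keys}
        if len(slots) == len(keys):  # no two keys shared a slot
            return trials

rng = random.Random(1)
keys = range(1, 31)           # n = 30 keys
runs = [trials_until_collision_free(keys, 30**2, rng) for _ in range(200)]
print(sum(runs) / len(runs))  # typically below 2 tries on average
```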
857
1:11:56.363 --> 1:12:02
So to find a good level-two hash
function is easy.
858
1:12:02 --> 1:12:06.562
I just test a few at random.
Most of them out there,
859
1:12:06.562 --> 1:12:10.856
OK, half of them,
at least half of them are going
860
1:12:10.856 --> 1:12:13.808
to work.
So this is in some sense,
861
1:12:13.808 --> 1:12:18.102
if you think about it,
a randomized construction,
862
1:12:18.102 --> 1:12:22.664
because I can't tell you which
one it's going to be.
863
1:12:22.664 --> 1:12:27.763
It's non-constructive in that
sense, but it's a randomized
864
1:12:27.763 --> 1:12:32.485
construction.
But they have to exist because
865
1:12:32.485 --> 1:12:36.297
most of them out there have this
good property.
866
1:12:36.297 --> 1:12:40.605
So I'm going to be able to find
for each one of these,
867
1:12:40.605 --> 1:12:44.168
I just test a few at random,
and I find one.
868
1:12:44.168 --> 1:12:47.068
Test a few at random,
find one, etc.
869
1:12:47.068 --> 1:12:50.548
Fill in my table there.
Because all that is
870
1:12:50.548 --> 1:12:53.945
pre-computation.
And I'm going to find them
871
1:12:53.945 --> 1:12:57.342
because the odds are good that
one exists.
872
1:12:57.342 --> 1:12:59
So --
873
1:12:59 --> 1:13:13
874
1:13:13 --> 1:13:14
-- we just test a few at random.
875
1:13:14 --> 1:13:24
876
1:13:24 --> 1:13:25
And we'll find one quickly --
877
1:13:25 --> 1:13:32
878
1:13:32 --> 1:13:34.3
-- since at least half will
work.
879
1:13:34.3 --> 1:13:37.679
I just want to show that there
exists good ones.
880
1:13:37.679 --> 1:13:41.777
All I have to prove is that at
least one works for each of
881
1:13:41.777 --> 1:13:44.366
these cases.
In fact, I've shown that
882
1:13:44.366 --> 1:13:46.954
there's a huge number that will
work.
883
1:13:46.954 --> 1:13:50.189
Half of them will work.
But to show it exists,
884
1:13:50.189 --> 1:13:54.647
I would just have to show that
the probability was greater than
0.
885
1:13:54.647 --> 1:13:55.941
So to finish up,
887
1:13:55.941 --> 1:14:00.254
we need to still analyze the
storage because I promised in my
888
1:14:00.254 --> 1:14:05
theorem that the table would be
of size order n.
889
1:14:05 --> 1:14:12.702
And yet now I've said there's
all of these quadratic-sized
890
1:14:12.702 --> 1:14:18.378
slots here.
So I'm going to show that that's
891
1:14:18.378 --> 1:14:20
order n.
892
1:14:20 --> 1:14:31
893
1:14:31 --> 1:14:35.605
So for level one,
that's easy.
894
1:14:35.605 --> 1:14:45.45
We'll just choose the number of
slots to be equal to the number
895
1:14:45.45 --> 1:14:51.008
of keys.
And that way the storage at
896
1:14:51.008 --> 1:14:59.583
level one is just order n.
And now let's let n sub i be
897
1:14:59.583 --> 1:15:08
the random variable for the
number of keys --
898
1:15:08 --> 1:15:13
899
1:15:13 --> 1:15:21.712
-- that hash to slot i in T.
OK, so n sub i is just what
900
1:15:21.712 --> 1:15:28.683
we've called it.
Number of elements that slot
901
1:15:28.683 --> 1:15:34.386
there.
And we're going to use m sub i
902
1:15:34.386 --> 1:15:45
equals n sub i squared slots in
each level two table S sub i.
903
1:15:45 --> 1:15:47
So the expected total storage --
904
1:15:47 --> 1:15:54
905
1:15:54 --> 1:16:01.085
-- is just n for level one,
order n if you want,
906
1:16:01.085 --> 1:16:09.979
but basically n slots for level
one plus the expected value,
907
1:16:09.979 --> 1:16:19.326
whatever I expect the sum of i
equals 0 to m minus one of theta
908
1:16:19.326 --> 1:16:24
of n sub i squared to be.
909
1:16:24 --> 1:16:30
910
1:16:30 --> 1:16:36.048
Because I basically have to add
up the square for every element
911
1:16:36.048 --> 1:16:40.731
that applies here,
the square of what's in there.
912
1:16:40.731 --> 1:16:46.682
Who recognizes this summation?
Where have we seen that before?
913
1:16:46.682 --> 1:16:51.951
Who attends recitation?
Where have we seen this before?
914
1:16:51.951 --> 1:16:54
What's the --
915
1:16:54 --> 1:17:03
916
1:17:03 --> 1:17:06
We're summing the expected
value of a bunch of --
917
1:17:06 --> 1:17:11
918
1:17:11 --> 1:17:14.959
Yeah, what was that algorithm?
We did the sorting algorithm,
919
1:17:14.959 --> 1:17:17.375
right?
What was the sorting algorithm
920
1:17:17.375 --> 1:17:21
for which this was an important
thing to evaluate?
921
1:17:21 --> 1:17:26
922
1:17:26 --> 1:17:29.272
Don't everybody shout it out at
once.
923
1:17:29.272 --> 1:17:33
What was that sorting algorithm
called?
924
1:17:33 --> 1:17:35.397
Bucket sort.
Good.
925
1:17:35.397 --> 1:17:37.794
Bucket sort.
Yeah.
926
1:17:37.794 --> 1:17:46.397
We showed that the sum of the
squares of random variables when
927
1:17:46.397 --> 1:17:53.025
they're falling randomly into n
bins is order n.
928
1:17:53.025 --> 1:17:55
Right?
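The bucket sort calculation being recalled here can be written out explicitly. Assuming h is drawn from a universal family mapping n keys x_1, ..., x_n into m = n slots, so each pair collides with probability at most 1/n:

```latex
\mathbb{E}\Big[\sum_{i=0}^{m-1} n_i^2\Big]
  = \mathbb{E}\Big[\sum_{i} n_i
      + 2 \sum_{j<k} \mathbf{1}\{h(x_j)=h(x_k)\}\Big]
  \le n + 2\binom{n}{2}\cdot\frac{1}{n}
  = 2n - 1
  = \Theta(n).
```

Markov's inequality then gives the probability bound mentioned next:

```latex
\Pr\Big[\sum_{i} n_i^2 \ge 2cn\Big]
  \le \frac{\mathbb{E}\big[\sum_{i} n_i^2\big]}{2cn}
  < \frac{1}{c}.
```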
929
1:17:55 --> 1:18:16
930
1:18:16 --> 1:18:20.105
And you can also out of this,
as we did before,
931
1:18:20.105 --> 1:18:24.131
get a probability bound.
What's the probability that
932
1:18:24.131 --> 1:18:28.315
it's more than a certain amount
times n using Markov's
933
1:18:28.315 --> 1:18:31.394
inequality?
But the key thing is that
934
1:18:31.394 --> 1:18:36.109
we've seen this analysis.
OK, we used it a while back,
935
1:18:36.109 --> 1:18:39.963
so there's a little bit of a gap,
but that's one of the reasons
936
1:18:39.963 --> 1:18:43.963
we study sorting at the
beginning of the term is because
937
1:18:43.963 --> 1:18:47.89
the techniques of sorting,
they just propagate into all
938
1:18:47.89 --> 1:18:52.327
these other areas of analysis.
You see a lot of the same kinds
939
1:18:52.327 --> 1:18:55.309
of things.
And so now that you know bucket
940
1:18:55.309 --> 1:18:59.018
sort clearly so well,
now you know this without
941
1:18:59.018 --> 1:19:04.61
having to do any extra work.
So you might want to go back
942
1:19:04.61 --> 1:19:09.926
and review your bucket sort
analysis, because it's applied
943
1:19:09.926 --> 1:19:11.604
now.
Same analysis.
944
1:19:11.604 --> 1:19:12.909
Two places.
OK.
945
1:19:12.909 --> 1:19:18.411
Good recitation this Friday,
which will be a quiz review and
946
1:19:18.411 --> 1:19:22.794
we have a quiz next,
there's no class on Monday,
947
1:19:22.794 --> 1:19:26.151
but we have a quiz next
Wednesday.
948
1:19:26.151 --> 1:19:31
OK, so good luck everybody on
the quiz.
949
1:19:31 --> 1:19:34
Make sure you get plenty of
sleep.