WEBVTT

00:00:09.000 --> 00:00:10.000
Hashing.

00:00:15.000 --> 00:00:19.000
Today we're going to do some
amazing stuff with hashing.

00:00:19.000 --> 00:00:21.000
And, really,
this is such neat stuff,

00:00:21.000 --> 00:00:24.000
it's amazing.
We're going to start by

00:00:24.000 --> 00:00:28.000
addressing a fundamental
weakness of hashing.

00:00:34.000 --> 00:00:37.000
And that is that for any choice
of hash function

00:00:49.000 --> 00:01:04.000
There exists a bad set of keys
that all hash to the same slot.

00:01:09.000 --> 00:01:11.000
OK.
So you pick a hash function.

00:01:11.000 --> 00:01:15.000
We looked at some that seem to
work well in practice,

00:01:15.000 --> 00:01:18.000
that are easy to put into your
code.

00:01:18.000 --> 00:01:23.000
But whichever one you pick,
there's always some bad set of

00:01:23.000 --> 00:01:25.000
keys.
So you can imagine,

00:01:25.000 --> 00:01:30.000
just to drive this point home a
little bit.

00:01:30.000 --> 00:01:35.000
Imagine that you're building a
compiler for a customer and you

00:01:35.000 --> 00:01:40.000
have a symbol table in your
compiler and one of the things

00:01:40.000 --> 00:01:46.000
that the customer is demanding
is that compilations go fast.

00:01:46.000 --> 00:01:50.000
They don't want to sit around
waiting for compilations.

00:01:50.000 --> 00:01:56.000
And you have a competitor who's
also building a compiler and

00:01:56.000 --> 00:02:01.000
they're going to test the
compiler, both of your compilers

00:02:01.000 --> 00:02:07.000
and sort of have a run-off.
And one of the things in the

00:02:07.000 --> 00:02:12.000
test that they're going to allow
you to do is not only will the

00:02:12.000 --> 00:02:16.000
customer run his own benchmarks,
but he'll let you make up

00:02:16.000 --> 00:02:20.000
benchmarks for the other
program, for your competitor.

00:02:20.000 --> 00:02:24.000
And your competitor gets to
make up benchmarks for you.

00:02:24.000 --> 00:02:28.000
So and not only that,
but you're actually sharing

00:02:28.000 --> 00:02:32.000
code.
So you get to look at what the

00:02:32.000 --> 00:02:37.000
competitor is actually doing and
what hash function they're

00:02:37.000 --> 00:02:40.000
actually using.
So it's pretty clear that in

00:02:40.000 --> 00:02:44.000
this circumstance,
you have an adversary who is

00:02:44.000 --> 00:02:49.000
going to look at whatever hash
function you have and figure out

00:02:49.000 --> 00:02:53.000
OK, what's a set of variable
names and so forth that are

00:02:53.000 --> 00:02:58.000
going to all hash to the same
slot so that essentially you're

00:02:58.000 --> 00:03:03.000
just chasing through a linked
list whenever it comes to

00:03:03.000 --> 00:03:07.000
looking something up.
Slowing down your program

00:03:07.000 --> 00:03:12.000
enormously compared to if in
fact they got distributed nicely

00:03:12.000 --> 00:03:15.000
across the hash table which is,
what after all,

00:03:15.000 --> 00:03:19.000
you have a hash table in there
to do in the first place.

00:03:19.000 --> 00:03:22.000
And so the question is,
how do you defeat this

00:03:22.000 --> 00:03:26.000
adversary?
And the answer is one word.

00:03:31.000 --> 00:03:33.000
One word.
How do you achieve?

00:03:33.000 --> 00:03:37.000
How do you defeat any adversary
in this class?

00:03:37.000 --> 00:03:38.000
Randomness.
OK.

00:03:38.000 --> 00:03:39.000
Randomness.
OK.

00:03:39.000 --> 00:03:42.000
You make it so that he can't
guess.

00:03:42.000 --> 00:03:47.000
And the idea is that you choose
a hash function at random.

00:03:47.000 --> 00:03:50.000
Independent.
So he can look at the code,

00:03:50.000 --> 00:03:55.000
but when it actually runs,
it's going to use a random hash

00:03:55.000 --> 00:04:00.000
function that he has no way of
predicting what the hash

00:04:00.000 --> 00:04:05.000
function is that will actually
be used.

00:04:05.000 --> 00:04:07.000
OK.
So that's the game and that way

00:04:07.000 --> 00:04:11.000
he can provide an input,
but he can't provide an input

00:04:11.000 --> 00:04:15.000
that's guaranteed to force you
to run slowly.

00:04:15.000 --> 00:04:19.000
You might get unlucky in your
choice of hash function,

00:04:19.000 --> 00:04:23.000
but it's not going to be
because of the adversary.

00:04:23.000 --> 00:04:28.000
So the idea is to choose a hash
function --

00:04:34.000 --> 00:04:38.000
-- at random,
independently from the keys

00:04:38.000 --> 00:04:42.000
that you're, that are going to
be fed to it.

00:04:42.000 --> 00:04:47.000
So even if your adversary can
see your code,

00:04:47.000 --> 00:04:53.000
he can't tell which hash
function is going to be actually

00:04:53.000 --> 00:04:58.000
used at run time.
Doesn't get to see the output

00:04:58.000 --> 00:05:04.000
of the random numbers.
And so it turns out you can

00:05:04.000 --> 00:05:11.000
make this scheme work and the
name of the scheme is universal

00:05:11.000 --> 00:05:17.000
hashing, OK, is one way of
making this scheme work.

00:05:22.000 --> 00:05:34.000
So let's do some math.
So let U be a universe of keys.

00:05:34.000 --> 00:05:41.000
And let H be a finite
collection --

00:05:48.000 --> 00:05:49.000
-- of hash functions --

00:05:56.000 --> 00:06:04.000
-- mapping U to what are going
to be the slots in our hash

00:06:04.000 --> 00:06:06.000
table.
OK.

00:06:06.000 --> 00:06:11.000
So we just have H as some
finite collection.

00:06:11.000 --> 00:06:15.000
We say that H is universal --

00:06:22.000 --> 00:06:30.000
-- if for all pairs of the
keys, distinct keys --

00:06:36.000 --> 00:06:41.000
-- so the keys are distinct,
the following is true.

00:07:03.000 --> 00:07:08.000
So if the set of keys,
if for any pair of keys I pick,

00:07:08.000 --> 00:07:15.000
the number of hash functions
that hash those two keys to the

00:07:15.000 --> 00:07:21.000
same place is a one over m
fraction of the total set of

00:07:21.000 --> 00:07:23.000
hash functions.
So let m just,

00:07:23.000 --> 00:07:28.000
so to view that,
another way of viewing that is

00:07:28.000 --> 00:07:33.000
if h is chosen randomly --

00:07:39.000 --> 00:07:51.000
-- from the set of hash functions H,
the probability of collision

00:07:51.000 --> 00:07:58.000
between x and y is what?

00:08:12.000 --> 00:08:17.000
What's the probability if the
fraction of hash functions,

00:08:17.000 --> 00:08:22.000
OK, if the number of such hash
functions is the size of H over m,

00:08:22.000 --> 00:08:27.000
what's the probability of a
collision between x and y?

00:08:27.000 --> 00:08:32.000
If I pick a hash function at
random.

00:08:32.000 --> 00:08:39.000
So I pick a hash function at
random, what's the odds they

00:08:39.000 --> 00:08:42.000
collide?
One over m.

00:08:42.000 --> 00:08:49.000
Now let's draw a picture for
that, help people see that

00:08:49.000 --> 00:08:56.000
that's in fact the case.
So imagine this is our set of

00:08:56.000 --> 00:09:00.000
all hash functions.
OK.

00:09:00.000 --> 00:09:08.000
And then if I pick a particular
x and y, let's say that this is

00:09:08.000 --> 00:09:16.000
the set of hash functions such
that H of x is equal to H of y.

00:09:16.000 --> 00:09:23.000
And so what we're saying is
that the cardinality of that set

00:09:23.000 --> 00:09:30.000
is one over m times the
cardinality of H.
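
NOTE
The board definition is not captured in the captions. A reconstruction from
the spoken description, assuming m is the number of slots and writing the
family of hash functions as a calligraphic H (a notation chosen here):
```latex
\[
\mathcal{H} \text{ is universal} \iff
\forall x, y \in U,\ x \neq y:\quad
\bigl|\{\, h \in \mathcal{H} : h(x) = h(y) \,\}\bigr| = \frac{|\mathcal{H}|}{m}
\]
\[
\text{so for } h \text{ drawn uniformly at random from } \mathcal{H}:\quad
\Pr[\,h(x) = h(y)\,] = \frac{1}{m}
\]
```
The probability is taken over the choice of h, not over the keys.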

00:09:30.000 --> 00:09:33.000
So if I throw a dart and pick
one hash function at random,

00:09:33.000 --> 00:09:37.000
the odds are one in m that the
hash function falls into this

00:09:37.000 --> 00:09:39.000
particular set.
And of course,

00:09:39.000 --> 00:09:43.000
this has to be true of every x
and y that I can pick.

00:09:43.000 --> 00:09:45.000
Of course, it will be a
different set,

00:09:45.000 --> 00:09:49.000
a different x and y will
somehow map the hash functions

00:09:49.000 --> 00:09:52.000
differently, but the odds that
for any x and y that I pick,

00:09:52.000 --> 00:09:55.000
the odds that if I have a
random hash function,

00:09:55.000 --> 00:10:00.000
it hashes it to the same place,
is one over m.

00:10:00.000 --> 00:10:03.000
Now this is a little bit hard
sometimes for people to get

00:10:03.000 --> 00:10:07.000
their head around because we're
used to thinking of perhaps

00:10:07.000 --> 00:10:09.000
picking keys at random or
something.

00:10:09.000 --> 00:10:11.000
OK, that's not what's going on
here.

00:10:11.000 --> 00:10:14.000
We're picking hash functions at
random.

00:10:14.000 --> 00:10:18.000
So our probability space is
defined over the hash functions,

00:10:18.000 --> 00:10:21.000
not over the keys.
And this has to be true now for

00:10:21.000 --> 00:10:24.000
any particular two keys that I
pick that are distinct.

00:10:24.000 --> 00:10:28.000
That the places that they hash,
this set of hash functions,

00:10:28.000 --> 00:10:34.000
I mean this is like a marvelous
property if you think about it.

00:10:34.000 --> 00:10:39.000
OK, that you can actually find
ones where no matter what two

00:10:39.000 --> 00:10:43.000
elements I pick,
the odds are exactly one in m

00:10:43.000 --> 00:10:48.000
that a random hash function from
this set is going to hash them

00:10:48.000 --> 00:10:51.000
to the same place.
So very neat.

00:10:51.000 --> 00:10:56.000
Very, very neat property and
we'll see the mathematics

00:10:56.000 --> 00:11:00.000
associated with this is very
cool.

00:11:00.000 --> 00:11:14.000
So our theorem is that if we
choose h randomly from the set

00:11:14.000 --> 00:11:25.000
of hash functions H,
and then we suppose we're

00:11:25.000 --> 00:11:37.000
hashing n keys into m slots in
Table T --

00:11:44.000 --> 00:11:46.000
-- then for given key x --

00:11:52.000 --> 00:11:56.000
-- the expected number of
collisions with x --

00:12:03.000 --> 00:12:12.000
-- is less than n over m.
And who remembers what we call

00:12:12.000 --> 00:12:16.000
n over m?
Alpha, which is the,

00:12:16.000 --> 00:12:22.000
what's the term that we use
there?

00:12:22.000 --> 00:12:30.000
Load factor.
The load factor of the table.

00:12:30.000 --> 00:12:36.000
OK, load factor alpha.
So the average number of keys

00:12:36.000 --> 00:12:42.000
per slot is the load factor of
the table.

00:12:42.000 --> 00:12:48.000
So we're saying,
so what is this theorem saying?

00:12:48.000 --> 00:12:55.000
It's saying that in fact,
if we have one of these

00:12:55.000 --> 00:13:02.000
universal sets of hash
functions, then things perform

00:13:02.000 --> 00:13:10.000
exactly the way we want them to.
Things get distributed evenly.

00:13:10.000 --> 00:13:15.000
The number of things that are
going to collide with any

00:13:15.000 --> 00:13:19.000
particular key that I pick is
going to be, in expectation, less than n over m.

00:13:19.000 --> 00:13:22.000
So that's a really good
property to have.

00:13:22.000 --> 00:13:27.000
Now I haven't shown you,
the construction of U is going,

00:13:27.000 --> 00:13:31.000
sorry, of the set of hash
functions H, that that

00:13:31.000 --> 00:13:36.000
construction will take us a
little bit of effort.

00:13:36.000 --> 00:13:39.000
But first I want to show you
why this is such a great

00:13:39.000 --> 00:13:42.000
property.
Basically it's this theorem.

00:13:42.000 --> 00:13:46.000
So let's prove this theorem.
So any questions about what the

00:13:46.000 --> 00:13:50.000
statement of the theorem is?
So we're going to go actually

00:13:50.000 --> 00:13:54.000
kind of fast today.
We've got a lot of good stuff

00:13:54.000 --> 00:13:57.000
today.
So I want to make sure people

00:13:57.000 --> 00:14:03.000
are onboard as we go through.
So if there are any questions,

00:14:03.000 --> 00:14:07.000
make sure, you know,
statement of theorem of

00:14:07.000 --> 00:14:13.000
whatever, best to get them out
early so that you're not

00:14:13.000 --> 00:14:19.000
confused later on when the going
gets a little more exciting.

00:14:19.000 --> 00:14:21.000
OK?
OK, good.

00:14:21.000 --> 00:14:26.000
So to prove this,
let's let C sub x be the random

00:14:26.000 --> 00:14:33.000
variable denoting the total
number of collisions --

00:14:38.000 --> 00:14:44.000
-- of keys in T with x.
So this is a total number and

00:14:44.000 --> 00:14:51.000
one of the techniques that you
use a lot in probabilistic

00:14:51.000 --> 00:14:57.000
analysis of randomized
algorithms is recognizing that C

00:14:57.000 --> 00:15:05.000
sub x is in fact a sum of
indicator random variables.

00:15:05.000 --> 00:15:11.000
If you can decompose things
into indicator random variables,

00:15:11.000 --> 00:15:17.000
the analysis goes much more
easily than if you're left with

00:15:17.000 --> 00:15:22.000
aggregate variables.
So here we're going to let our

00:15:22.000 --> 00:15:27.000
indicator random variable be
little c sub xy,

00:15:27.000 --> 00:15:32.000
which is going to be one if h
of x equals h of y and 0

00:15:32.000 --> 00:15:35.000
otherwise.

00:15:40.000 --> 00:15:49.000
And so we can note two things.
First, what is the expectation

00:15:49.000 --> 00:15:52.000
of c sub xy?

00:15:57.000 --> 00:16:00.000
OK, if I have a process which
is picking a hash function at

00:16:00.000 --> 00:16:04.000
random, what's the expectation
of c sub xy?

00:16:04.000 --> 00:16:07.000
One over m.
Because that's basically this

00:16:07.000 --> 00:16:11.000
definition here.
Now in other words I pick a

00:16:11.000 --> 00:16:16.000
hash function at random,
what's the odds that the hash

00:16:16.000 --> 00:16:19.000
is the same?
It's one over m.

00:16:19.000 --> 00:16:24.000
And then the other thing is,
and the reason we pick this

00:16:24.000 --> 00:16:28.000
thing is that I can express
capital C sub x,

00:16:28.000 --> 00:16:33.000
the random variable denoting
the total number of collisions

00:16:33.000 --> 00:16:39.000
as being just the sum over all
the keys in the table except x

00:16:39.000 --> 00:16:46.000
of c sub xy.
So for each one that would

00:16:46.000 --> 00:16:53.000
cause me a collision,
with x, I add one and if it

00:16:53.000 --> 00:17:00.000
wouldn't cause me a collision,
I add 0.

00:17:00.000 --> 00:17:06.000
And that adds up all of the
collisions that I would have in

00:17:06.000 --> 00:17:09.000
the table with x.

00:17:17.000 --> 00:17:20.000
Is there any questions so far?
Because this is the set-up.

00:17:20.000 --> 00:17:24.000
The set-up in most of these
things, the set-up is where most

00:17:24.000 --> 00:17:27.000
students make mistakes and most
practicing researchers make

00:17:27.000 --> 00:17:30.000
mistakes as well,
let me tell you.

00:17:30.000 --> 00:17:32.000
And then once you get the
set-up right,

00:17:32.000 --> 00:17:36.000
then working out the math is
fine, but it's often that set-up

00:17:36.000 --> 00:17:40.000
of how do you actually translate
the situation into the math.

00:17:40.000 --> 00:17:43.000
That's the hard part.
Once you get that right,

00:17:43.000 --> 00:17:46.000
well, then, algebra,
we can all do algebra.

00:17:46.000 --> 00:17:49.000
Of course, we can also all make
mistakes doing algebra,

00:17:49.000 --> 00:17:53.000
but at least those mistakes are
much more easy to check than the

00:17:53.000 --> 00:17:57.000
one that does the translation.
So I want to make sure people

00:17:57.000 --> 00:18:00.000
are sort of understanding of how
that's set up.

00:18:00.000 --> 00:18:05.000
So now we just have to use our
math skills.

00:18:05.000 --> 00:18:12.000
So the expectation then of the
number of collisions is the

00:18:12.000 --> 00:18:18.000
expectation of C sub x and
that's just the expectation of

00:18:18.000 --> 00:18:26.000
just plugging in the sum over y in T
minus the element x of c_xy.

00:18:26.000 --> 00:18:33.000
So that's just definition.
And that's equal to the sum of

00:18:33.000 --> 00:18:39.000
y in T minus x of the expectation
of c_xy.

00:18:39.000 --> 00:18:44.000
So why is that?
Yeah, that's linearity.

00:18:52.000 --> 00:18:56.000
Linearity of expectation,
doesn't require independence.

00:18:56.000 --> 00:19:00.000
It's true of all random
variables.

00:19:00.000 --> 00:19:07.000
And that's equal to,
and now the math gets easier.

00:19:07.000 --> 00:19:10.000
So what is that?
One over m.

00:19:10.000 --> 00:19:16.000
That makes the summation easy
to evaluate.

00:19:16.000 --> 00:19:22.000
That's just n minus one over m.
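
NOTE
The chain of equalities written on the board can be reconstructed from the
spoken steps. This sketch assumes x is one of the n keys in T, so the sum
has n - 1 terms:
```latex
\begin{align*}
E[C_x] &= E\Bigl[\sum_{y \in T \setminus \{x\}} c_{xy}\Bigr]
  && \text{definition of } C_x \\
&= \sum_{y \in T \setminus \{x\}} E[c_{xy}]
  && \text{linearity of expectation} \\
&= \sum_{y \in T \setminus \{x\}} \frac{1}{m}
  && \text{universality: } E[c_{xy}] = \Pr[h(x) = h(y)] = \tfrac{1}{m} \\
&= \frac{n-1}{m} < \frac{n}{m}
\end{align*}
```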

00:19:30.000 --> 00:19:35.000
So fairly simple analysis and
shows you why we would love to

00:19:35.000 --> 00:19:41.000
have one of these sets of
universal hash functions because

00:19:41.000 --> 00:19:45.000
if you have them,
then they behave exactly the

00:19:45.000 --> 00:19:51.000
way you would want it to behave.
And you defeat your adversary

00:19:51.000 --> 00:19:55.000
by just picking the hash
function at random.

00:19:55.000 --> 00:20:00.000
There's nothing he can do.
Or she.

00:20:00.000 --> 00:20:02.000
OK, any questions about that
proof?

00:20:02.000 --> 00:20:04.000
OK, now we get into the fun
math.

00:20:04.000 --> 00:20:07.000
Constructing one of these
babies.

00:20:07.000 --> 00:20:08.000
OK.

00:20:20.000 --> 00:20:23.000
This is not the only
construction.

00:20:23.000 --> 00:20:31.000
This is a construction of a
classic universal hash function.

00:20:31.000 --> 00:20:37.000
And there are other
constructions in the literature

00:20:37.000 --> 00:20:42.000
and I think there's one on the
practice quiz.

00:20:42.000 --> 00:20:47.000
So let's see.
So this one works when m is

00:20:47.000 --> 00:20:51.000
prime.
So it works when the set of

00:20:51.000 --> 00:20:57.000
slots is a prime number.
Number of slots is a prime

00:20:57.000 --> 00:21:05.000
number.
So the idea here is we're going

00:21:05.000 --> 00:21:16.000
to decompose any key k in our
universe into r plus 1 digits.

00:21:16.000 --> 00:21:25.000
So k, we're going to look at as
being k_0, k_1, up to

00:21:25.000 --> 00:21:33.000
k_r where 0 is less than or
equal to k sub i,

00:21:33.000 --> 00:21:41.000
is less than or equal to m
minus one.

00:21:41.000 --> 00:21:47.000
So the idea is in some sense
we're looking at what the

00:21:47.000 --> 00:21:52.000
representation would be of k
base m.

00:21:52.000 --> 00:21:58.000
So if it were base two,
it would be just one bit at a

00:21:58.000 --> 00:22:01.000
time.
These would just be the bits.

00:22:01.000 --> 00:22:05.000
I'm not going to do base two.
We're going to do base m in

00:22:05.000 --> 00:22:09.000
general and so each of these
represents one of the digits.

00:22:09.000 --> 00:22:13.000
And the way I've done it is
I've done low order digit first.

00:22:13.000 --> 00:22:16.000
It actually doesn't matter.
We're not actually going to

00:22:16.000 --> 00:22:20.000
care really about what the order
is, but basically we're just

00:22:20.000 --> 00:22:24.000
looking at busting it into a
tuple represented by each of

00:22:24.000 --> 00:22:27.000
those digits.
So one algorithm for computing

00:22:27.000 --> 00:22:31.000
this out of k is take the
remainder mod m.

00:22:31.000 --> 00:22:34.000
That's the low order one.
OK, take what's left.

00:22:34.000 --> 00:22:37.000
Take the remainder of that mod
m.

00:22:37.000 --> 00:22:39.000
Take whatever's left,
etc.

00:22:39.000 --> 00:22:42.000
So you're familiar with the
conversion to a base

00:22:42.000 --> 00:22:46.000
representation.
That's exactly how we're

00:22:46.000 --> 00:22:49.000
getting this representation.
So we treat,

00:22:49.000 --> 00:22:53.000
this is just a question of
taking the data that we've got

00:22:53.000 --> 00:22:57.000
and treating it as an r plus one
digit base m number.
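
NOTE
The repeated take-the-remainder procedure just described can be sketched in
Python. The name to_digits and the range check are illustrative choices, not
something given in the lecture:
```python
def to_digits(k, m, r):
    """Return [k_0, ..., k_r], the r + 1 base-m digits of k, low-order first."""
    if not 0 <= k < m ** (r + 1):
        raise ValueError("key does not fit in r + 1 base-m digits")
    digits = []
    for _ in range(r + 1):
        # d is the remainder mod m (the next digit); k keeps what is left.
        k, d = divmod(k, m)
        digits.append(d)
    return digits
example = to_digits(100, 7, 2)  # [2, 0, 2], since 100 = 2 + 0*7 + 2*49
```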

00:22:57.000 --> 00:23:02.000
And now we invoke our
randomized strategy.

00:23:02.000 --> 00:23:05.000
The randomized strategy is
going to be to have a class

00:23:05.000 --> 00:23:09.000
of hash functions that's
dependent essentially on random

00:23:09.000 --> 00:23:11.000
numbers.
And the random numbers we're

00:23:11.000 --> 00:23:15.000
going to pick is we're going to
pick an a at random --

00:23:28.000 --> 00:23:33.000
-- which we're also going to
look at as a base m number.

00:23:33.000 --> 00:23:38.000
Each a_i is chosen randomly
--

00:23:49.000 --> 00:23:50.000
-- from --

00:23:55.000 --> 00:23:58.000
-- 0 to m minus one.
So one of our,

00:23:58.000 --> 00:24:03.000
it's a random if you will,
it's a random base m digit.

00:24:03.000 --> 00:24:06.000
Random base m digit.
So each one of these is picked

00:24:06.000 --> 00:24:09.000
at random.
And for each one we,

00:24:09.000 --> 00:24:13.000
possible value of a,
we're going to get a different

00:24:13.000 --> 00:24:16.000
hash function.
So we're going to index our

00:24:16.000 --> 00:24:19.000
hash functions by this random
number.

00:24:19.000 --> 00:24:23.000
So this is where the randomness
is going to come in.

00:24:23.000 --> 00:24:28.000
Everybody with me?
And here's the hash function.

00:24:56.000 --> 00:25:06.000
So what we do is we dot product
this vector with this vector and

00:25:06.000 --> 00:25:11.000
take the result,
mod m.

00:25:11.000 --> 00:25:18.000
So each digit of k of our key
gets multiplied by a random

00:25:18.000 --> 00:25:25.000
other digit.
We add all those up and we take

00:25:25.000 --> 00:25:29.000
that mod m.
So that's a dot product

00:25:29.000 --> 00:25:34.000
operator.
And this is what we're going to

00:25:34.000 --> 00:25:37.000
show is universal,
that this set of h sub a,

00:25:37.000 --> 00:25:39.000
where I look over that whole
set.
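
NOTE
The hash function itself is on the board rather than in the captions. From
the spoken description, h_a(k) is the dot product of the digit vectors of a
and k, taken mod m. A minimal Python sketch, assuming m is prime and keys fit
in r + 1 base-m digits; make_hash and to_digits are hypothetical names:
```python
import random
# Base-m digits of k, low-order first (r + 1 of them).
def to_digits(k, m, r):
    digits = []
    for _ in range(r + 1):
        k, d = divmod(k, m)
        digits.append(d)
    return digits
# Draw a = (a_0, ..., a_r) with each a_i uniform in 0..m-1; return (h_a, a).
def make_hash(m, r, rng=random):
    a = [rng.randrange(m) for _ in range(r + 1)]
    def h_a(k):
        # Multiply each key digit by its random digit, add up, reduce mod m.
        return sum(ai * ki for ai, ki in zip(a, to_digits(k, m, r))) % m
    return h_a, a
h, a = make_hash(m=7, r=2)
slot = h(100)  # a slot in 0..6; which one depends on the random a
```
The random module is used here for brevity. Against a real adversary, a less
predictable source such as random.SystemRandom may be a better fit.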

00:25:39.000 --> 00:25:44.000
So one of the things we need to
know is how big is the set of

00:25:44.000 --> 00:25:46.000
hash functions here.

00:25:59.000 --> 00:26:01.000
So how big is this set of hash
functions?

00:26:01.000 --> 00:26:07.000
How many different hash
functions do I have in this set?

00:26:24.000 --> 00:26:31.000
It's basic 6.042 material.
It's basically how many vectors

00:26:31.000 --> 00:26:38.000
of length r plus one where each
element of the vector is a

00:26:38.000 --> 00:26:45.000
number from 0 to m minus one,
has m different values.

00:26:45.000 --> 00:26:50.000
So how many?
m minus one to the r.

00:26:50.000 --> 00:26:51.000
No.
Close.

00:26:51.000 --> 00:26:56.000
It's up there.
It's a big number.

00:26:56.000 --> 00:27:01.000
m to the r plus one.
Good.

00:27:01.000 --> 00:27:06.000
It's m, so the size of H is
equal to m to the r plus one.

00:27:06.000 --> 00:27:10.000
So we're going to want to
remember that.

00:27:10.000 --> 00:27:13.000
OK, so let's just understand
why that is.

00:27:13.000 --> 00:27:17.000
I have m choices for the first
value of a.

00:27:17.000 --> 00:27:19.000
m for the second,
etc.

00:27:19.000 --> 00:27:23.000
m for the r-th.
And since there are r plus one

00:27:23.000 --> 00:27:28.000
things here, for each choice
here, I have this many same

00:27:28.000 --> 00:27:34.000
number of choices here,
so it's a product.

00:27:34.000 --> 00:27:39.000
OK, so this is the product rule
in counting.

00:27:39.000 --> 00:27:45.000
So if you haven't reviewed your
6.042 notes for counting,

00:27:45.000 --> 00:27:52.000
this is going to be a good idea
to go back and review that

00:27:52.000 --> 00:27:57.000
because we're doing stuff of
that nature.

00:27:57.000 --> 00:28:01.000
This is just the product rule.
Good.

00:28:01.000 --> 00:28:10.000
So then the theorem we want to
prove is that H is universal.
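
NOTE
Before the proof, the claim can be sanity-checked by brute force on a tiny
instance. This sketch assumes the dot-product family described above with
m = 3 and r = 2, tries every vector a, and counts collisions for every pair
of distinct keys; the function names are illustrative:
```python
from itertools import product
# Base-m digits of k, low-order first (r + 1 of them).
def to_digits(k, m, r):
    digits = []
    for _ in range(r + 1):
        k, d = divmod(k, m)
        digits.append(d)
    return digits
# Count the vectors a for which h_a(x) == h_a(y), by trying every a.
def count_colliding(x, y, m, r):
    xs, ys = to_digits(x, m, r), to_digits(y, m, r)
    count = 0
    for a in product(range(m), repeat=r + 1):
        hx = sum(ai * xi for ai, xi in zip(a, xs)) % m
        hy = sum(ai * yi for ai, yi in zip(a, ys)) % m
        if hx == hy:
            count += 1
    return count
m, r = 3, 2
family_size = m ** (r + 1)  # product rule: 3 * 3 * 3 = 27 choices of a
counts = {count_colliding(x, y, m, r)
          for x in range(family_size) for y in range(x)}
# Universality predicts one value for every pair: family_size // m == 9.
```
A check like this does not replace the proof, but it makes the statement
concrete: every pair of distinct keys collides under the same number of
functions.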

00:28:10.000 --> 00:28:14.000
And this is going to involve a
little bit of number theory,

00:28:14.000 --> 00:28:19.000
so it gets kind of interesting.
And it's a non-trivial proof,

00:28:19.000 --> 00:28:23.000
so this is where if there's any
questions as I'm going along,

00:28:23.000 --> 00:28:28.000
please ask because the argument
is not as simple as other

00:28:28.000 --> 00:28:33.000
arguments we've seen so far.
OK, not that the ones we've seen so

00:28:33.000 --> 00:28:38.000
far have been simple,
but this is definitely a more

00:28:38.000 --> 00:28:43.000
involved mathematical argument.
So here's a proof.

00:28:43.000 --> 00:28:46.000
So let's let,
so we have two keys.

00:28:46.000 --> 00:28:50.000
What are we trying to show if
it's universal,

00:28:50.000 --> 00:28:55.000
that if I pick any two keys,
the number of hash functions

00:28:55.000 --> 00:29:01.000
for which they hash to the same
thing is the size of set of hash

00:29:01.000 --> 00:29:08.000
functions divided by m.
OK, so I'm going to look at two

00:29:08.000 --> 00:29:11.000
keys.
So let's pick two keys

00:29:11.000 --> 00:29:16.000
arbitrarily.
So x, and we'll decompose it

00:29:16.000 --> 00:29:23.000
into our base m representation
and y, y_0, y_1 --

00:29:33.000 --> 00:29:39.000
So these are two distinct keys.
So if these are two distinct

00:29:39.000 --> 00:29:45.000
keys, so they're different,
then this base representation

00:29:45.000 --> 00:29:50.000
has the property that they've
got to differ somewhere.

00:29:50.000 --> 00:29:54.000
Right?
OK, they differ in at least one

00:29:54.000 --> 00:29:56.000
digit.

00:30:08.000 --> 00:30:12.000
OK, and this is where most
people get lost because I'm

00:30:12.000 --> 00:30:16.000
going to make a simplification.
They could differ in any one of

00:30:16.000 --> 00:30:20.000
these digits.
I'm going to say they differ in

00:30:20.000 --> 00:30:24.000
position 0 because it doesn't
matter which one I do,

00:30:24.000 --> 00:30:28.000
the math is the same,
but it'll make it simpler: if I

00:30:28.000 --> 00:30:31.000
said they differ in
some position i,

00:30:31.000 --> 00:30:35.000
I would have to be taking
summations as you'll see over

00:30:35.000 --> 00:30:41.000
the elements that are not i,
and that's complicated.

00:30:41.000 --> 00:30:44.000
If I do it in position 0,
then I can just sum for the

00:30:44.000 --> 00:30:46.000
rest of them.
So the math is going to be

00:30:46.000 --> 00:30:50.000
identical if I were to do it for
any position because it's

00:30:50.000 --> 00:30:52.000
symmetric.
All the digits are symmetric.

00:30:52.000 --> 00:30:56.000
So let's say they differ in
position 0, but the same

00:30:56.000 --> 00:30:59.000
argument is going to be true if
they differed in some other

00:30:59.000 --> 00:31:02.000
position.
So let's say,

00:31:02.000 --> 00:31:05.000
so we're saying without loss of
generality.

00:31:05.000 --> 00:31:08.000
So that's without loss of
generality.

00:31:08.000 --> 00:31:12.000
Position 0.
Because all the positions are

00:31:12.000 --> 00:31:16.000
symmetric here.
And so, now we need to ask the

00:31:16.000 --> 00:31:19.000
question for how many --

00:31:24.000 --> 00:31:30.000
-- hash functions in our
universal, purportedly universal

00:31:30.000 --> 00:31:34.000
set do x and y collide?

00:31:39.000 --> 00:31:42.000
OK, we've got to count them up.
So how often do they collide?

00:31:42.000 --> 00:31:46.000
This is where we're going to
pull out some heavy duty number

00:31:46.000 --> 00:31:48.000
theory.
So we must have,

00:31:48.000 --> 00:31:50.000
if they collide --

00:31:56.000 --> 00:32:03.000
-- that h sub a of x is equal
to h sub a of y.

00:32:03.000 --> 00:32:09.000
That's what it means for them
to collide.

00:32:09.000 --> 00:32:20.000
So that implies that the sum of
i equal 0 to r of a sub i x sub

00:32:20.000 --> 00:32:30.000
i is equal to the sum of i
equals 0 to r of a sub i y sub i

00:32:30.000 --> 00:32:35.000
mod m.
Actually this is congruent mod

00:32:35.000 --> 00:32:38.000
m.
So congruence for those people

00:32:38.000 --> 00:32:43.000
who haven't seen much number
theory, is basically the way of

00:32:43.000 --> 00:32:48.000
essentially, rather than having
to say mod everywhere in here

00:32:48.000 --> 00:32:52.000
and mod everywhere in here,
we just at the end say OK,

00:32:52.000 --> 00:32:56.000
do a mod at the end.
Everything is being done mod,

00:32:56.000 --> 00:32:59.000
modulo m.
And then typically we use a

00:32:59.000 --> 00:33:06.000
congruence sign.
OK, there's a more mathematical

00:33:06.000 --> 00:33:13.000
definition but this will work
for us engineers.

00:33:13.000 --> 00:33:18.000
OK, so everybody with me so
far?

00:33:18.000 --> 00:33:23.000
This is just applying the
definition.

00:33:23.000 --> 00:33:32.000
So that implies that the sum of
i equals 0 to r of a sub i times the quantity

00:33:32.000 --> 00:33:41.000
x sub i minus y sub i is congruent to zero mod m.
OK, just threw it on the other

00:33:41.000 --> 00:33:45.000
side and applied the
distributive law.

00:33:45.000 --> 00:33:49.000
Now what I'm going to do is
pull out the 0-th position

00:33:49.000 --> 00:33:53.000
because that's the one that I
care about.

00:33:53.000 --> 00:33:58.000
And this is where it saves me
on the math, compared to if I

00:33:58.000 --> 00:34:03.000
didn't say that it was 0.
I'd have to pull out x_i.

00:34:03.000 --> 00:34:05.000
It wouldn't matter,
but it just would make the math

00:34:05.000 --> 00:34:06.000
a little bit cruftier.

00:34:23.000 --> 00:34:30.000
OK, so now we've just pulled
out one term.

00:34:30.000 --> 00:34:41.000
That implies that a_0 times the quantity
x_0 minus y_0 is congruent to minus --

00:34:54.000 --> 00:34:58.000
-- mod m.
Now remember that when I have a

00:34:58.000 --> 00:35:02.000
negative number mod m,
I just map it into whatever,

00:35:02.000 --> 00:35:07.000
into that range from 0 to m
minus one.

00:35:07.000 --> 00:35:12.000
So for example,
minus five mod seven is two.

00:35:12.000 --> 00:35:19.000
So if any of these things are
negative, we simply translate

00:35:19.000 --> 00:35:27.000
them into that range by adding multiples of
m, because adding multiples of m

00:35:27.000 --> 00:35:32.000
doesn't affect the congruence.
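
NOTE
The board work for these steps is only partly captured in the captions. A
reconstruction from the spoken description, assuming as stated that x and y
differ in digit position 0:
```latex
\begin{align*}
h_a(x) = h_a(y)
&\implies \sum_{i=0}^{r} a_i x_i \equiv \sum_{i=0}^{r} a_i y_i \pmod{m} \\
&\implies \sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m} \\
&\implies a_0 (x_0 - y_0) + \sum_{i=1}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m} \\
&\implies a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m}
\end{align*}
```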

00:35:39.000 --> 00:35:41.000
OK.
And now for the next step,

00:35:41.000 --> 00:35:44.000
we need to use a number theory
fact.

00:35:44.000 --> 00:35:48.000
So let's pull out our number
theory --

00:35:57.000 --> 00:36:05.000
-- textbook and take a little
digression.

00:36:10.000 --> 00:36:14.000
So this comes from the theory
of finite fields.

00:36:14.000 --> 00:36:17.000
So for people who are
knowledgeable,

00:36:17.000 --> 00:36:21.000
that's where you're plugging
your knowledge in.

00:36:21.000 --> 00:36:26.000
If you're not knowledgeable,
this is a great area of math to

00:36:26.000 --> 00:36:30.000
learn about.
So here's the fact.

00:36:30.000 --> 00:36:34.000
So let m be prime.
Then for any z,

00:36:34.000 --> 00:36:41.000
little z element of Z sub m,
and Z sub m is the integers mod

00:36:41.000 --> 00:36:46.000
m.
So this is essentially numbers

00:36:46.000 --> 00:36:51.000
from 0 to m minus one with all
the operations,

00:36:51.000 --> 00:36:57.000
times, minus,
plus, etc., defined on that

00:36:57.000 --> 00:37:04.000
such that if you end up outside
of the range of 0 to m minus

00:37:04.000 --> 00:37:11.000
one, you re-normalize by
subtracting or adding multiples

00:37:11.000 --> 00:37:21.000
of m to get back within the
range from 0 to m minus one.

00:37:21.000 --> 00:37:30.000
So it's the standard thing of
just doing things modulo m.

00:37:30.000 --> 00:37:38.000
So for any z such that z is not
congruent to 0,

00:37:38.000 --> 00:37:47.000
there exists a unique z inverse
in Z sub m, such that if I

00:37:47.000 --> 00:37:57.000
multiply z times the inverse,
it produces something congruent

00:37:57.000 --> 00:38:04.000
to one mod m.
So for any nonzero number it says,

00:38:04.000 --> 00:38:11.000
I can find another number that
when multiplied by it gives me

00:38:11.000 --> 00:38:15.000
one.
So let's just do an example for

00:38:15.000 --> 00:38:18.000
m equals seven.
So here we have,

00:38:18.000 --> 00:38:24.000
we'll make a little table.
So z is not equal to 0,

00:38:24.000 --> 00:38:29.000
so I just write down the other
numbers.

00:38:29.000 --> 00:38:35.000
And let's figure out what z
inverse is.

00:38:35.000 --> 00:38:41.000
So what's the inverse of one?
What number when multiplied by

00:38:41.000 --> 00:38:43.000
one gives me one?
One.

00:38:43.000 --> 00:38:45.000
Good.
How about two?

00:38:45.000 --> 00:38:51.000
What number when I multiply it
by two gives me one?

00:38:51.000 --> 00:38:55.000
Four.
Because two times four is eight

00:38:55.000 --> 00:39:01.000
and eight is congruent to one
mod seven.

00:39:01.000 --> 00:39:04.000
So I've re-normalized it.
What about three?

00:39:12.000 --> 00:39:13.000
Five.
Good.

00:39:13.000 --> 00:39:16.000
Five.
Three times five is 15.

00:39:16.000 --> 00:39:22.000
That's congruent to one mod
seven because 15 divided by

00:39:22.000 --> 00:39:28.000
seven is two remainder of one.
So that's the key thing.

00:39:28.000 --> 00:39:32.000
What about four?
Two.

00:39:32.000 --> 00:39:36.000
Five? Three. And six.

00:39:43.000 --> 00:39:43.000
Yeah.
Six.
Yeah, six it turns out.
OK, six times six is 36.

00:39:48.000 --> 00:39:52.000
OK, mod seven.
Basically subtract off the 35,

00:39:52.000 --> 00:39:56.000
gives me one.
So people have observed some

00:39:56.000 --> 00:40:02.000
interesting facts that if one
number's an inverse of another,

00:40:02.000 --> 00:40:08.000
then that other is an inverse
of the one.

00:40:08.000 --> 00:40:12.000
So that's actually one of these
things that you prove when you

00:40:12.000 --> 00:40:16.000
do group theory and field theory
and so forth.

00:40:16.000 --> 00:40:21.000
There are all sorts of other
great properties of this kind of

00:40:21.000 --> 00:40:23.000
math.
But the main thing is,

00:40:23.000 --> 00:40:27.000
and this turns out not to be
true if m is not a prime.

00:40:27.000 --> 00:40:31.000
So can somebody think of,
imagine we're doing something

00:40:31.000 --> 00:40:36.000
mod 10.
Can somebody think of a number

00:40:36.000 --> 00:40:39.000
that doesn't have an inverse mod
10?

00:40:39.000 --> 00:40:40.000
Yeah.
Two.

00:40:40.000 --> 00:40:45.000
Another one is five.
OK, it turns out the divisors

00:40:45.000 --> 00:40:49.000
in fact actually,
more generally,

00:40:49.000 --> 00:40:53.000
something that is not
relatively prime,

00:40:53.000 --> 00:40:58.000
meaning that it has no common
factors, the GCD is not one

00:40:58.000 --> 00:41:04.000
between that number and the
modulus.

00:41:04.000 --> 00:41:08.000
OK, those numbers do not have
an inverse mod m.
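
NOTE
A minimal sketch of the fact above, assuming Python 3.8 or later (where the built-in pow accepts an exponent of -1 together with a modulus). The helper names inverse_table and non_invertible are hypothetical, chosen only for illustration.
```python
from math import gcd
def inverse_table(m):
    """Map each nonzero z in Z_m that has an inverse to that inverse."""
    return {z: pow(z, -1, m) for z in range(1, m) if gcd(z, m) == 1}
def non_invertible(m):
    """Nonzero elements of Z_m that share a factor with m; these have no inverse."""
    return [z for z in range(1, m) if gcd(z, m) != 1]
print(inverse_table(7))    # every nonzero z appears because 7 is prime
print(non_invertible(10))  # includes 2 and 5, the two answers given in the lecture
```
For m = 7 the table pairs 2 with 4 and 3 with 5, matching the board example; for m = 10 only the values relatively prime to 10 get an inverse.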

00:41:08.000 --> 00:41:13.000
OK, but if it's prime,
every number is relatively

00:41:13.000 --> 00:41:17.000
prime to the modulus.
And that's the property that

00:41:17.000 --> 00:41:22.000
we're taking advantage of.
So this is our fact and so,

00:41:22.000 --> 00:41:28.000
in this case what I'm after is
I want to divide by x_0 minus

00:41:28.000 --> 00:41:31.000
y_0.
That's what I want to do at

00:41:31.000 --> 00:41:34.000
this point.
But I can't do that if x_0,

00:41:34.000 --> 00:41:36.000
first of all,
if m isn't prime,

00:41:36.000 --> 00:41:40.000
I can't necessarily do that.
I might be able to,

00:41:40.000 --> 00:41:43.000
but I can't necessarily.
But if m is prime,

00:41:43.000 --> 00:41:46.000
I can definitely divide by x_0
minus y_0.

00:41:46.000 --> 00:41:49.000
I can find that inverse.
And the other thing I have to

00:41:49.000 --> 00:41:52.000
do is make sure x_0 minus y_0 is
not 0.

00:41:52.000 --> 00:41:57.000
OK, it would be 0 if these two
were equal, but our supposition

00:41:57.000 --> 00:42:01.000
was they weren't equal.
And once again,

00:42:01.000 --> 00:42:05.000
just bringing it back to the
without loss of generality,

00:42:05.000 --> 00:42:08.000
if it were some other position
that we were off,

00:42:08.000 --> 00:42:13.000
I would be doing exactly the
same thing with that position.

00:42:13.000 --> 00:42:16.000
So now we're going to be able
to divide.

00:42:16.000 --> 00:42:19.000
So we continue with our --

00:42:24.000 --> 00:42:33.000
-- continue with our proof.
So since x_0 is not equal to

00:42:33.000 --> 00:42:42.000
y_0, there exists an inverse for
x_0 minus y_0.

00:42:42.000 --> 00:42:48.000
And that implies,
just continue on from over

00:42:48.000 --> 00:42:56.000
there, that a_0 is congruent
therefore to minus the sum of i

00:42:56.000 --> 00:43:04.000
equal one to r of a_i,
x_i minus y_i times x_0 minus

00:43:04.000 --> 00:43:10.000
y_0 inverse.
So let's just go back to the
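
NOTE
For reference, the spoken derivation can be written out as below. This assumes, as in the earlier part of the proof, that the two keys decompose into digits x_0, ..., x_r and y_0, ..., y_r with x_0 not equal to y_0, and that the collision condition is the congruence on the first line.
```latex
\sum_{i=0}^{r} a_i (x_i - y_i) \equiv 0 \pmod{m}
\;\Longrightarrow\;
a_0 (x_0 - y_0) \equiv -\sum_{i=1}^{r} a_i (x_i - y_i) \pmod{m}
\;\Longrightarrow\;
a_0 \equiv \Bigl(-\sum_{i=1}^{r} a_i (x_i - y_i)\Bigr) (x_0 - y_0)^{-1} \pmod{m}
```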

00:43:10.000 --> 00:43:15.000
beginning of our proof and see
what we've derived.

00:43:15.000 --> 00:43:19.000
If we're saying we have two
distinct keys,

00:43:19.000 --> 00:43:24.000
and we've picked all of these
a_i randomly,

00:43:24.000 --> 00:43:30.000
and we're saying that these two
distinct keys hash to the same

00:43:30.000 --> 00:43:34.000
place.
If they hash to the same place,

00:43:34.000 --> 00:43:41.000
it says that a_0 essentially
had to have a particular value

00:43:41.000 --> 00:43:47.000
as a function of the other a_i.
Because in other words,

00:43:47.000 --> 00:43:51.000
once I've picked each of these
a_i from one to r,

00:43:51.000 --> 00:43:54.000
if I did them in that order,
for example,

00:43:54.000 --> 00:43:58.000
then I don't have a choice for
how I pick a_0 to make it

00:43:58.000 --> 00:44:00.000
collide.
Exactly one value allows it to

00:44:00.000 --> 00:44:05.000
collide, namely the value of a_0
given by this.

00:44:05.000 --> 00:44:10.000
If I picked a different value
of a_0, they wouldn't collide.

00:44:10.000 --> 00:44:16.000
So let me write that down.
Thus, while you think about it

00:45:12.000 --> 00:45:18.000
So for any choice of these a_i,
there's exactly one of the

00:45:18.000 --> 00:45:24.000
m possible choices of a_0 that
causes a collision.

00:45:24.000 --> 00:45:29.000
And for all the other choices I
might make of a_0,

00:45:29.000 --> 00:45:36.000
there's no collision.
So essentially I don't have,

00:45:36.000 --> 00:45:42.000
if they're going to collide,
I've reduced essentially the

00:45:42.000 --> 00:45:49.000
number of degrees of freedom of
my randomness by a factor of m.

00:45:49.000 --> 00:45:55.000
So if I count up the number of
h_a's that cause x and y to

00:45:55.000 --> 00:46:01.000
collide, that's equal to,
well, there's m choices,

00:46:01.000 --> 00:46:06.000
just using the product rule
again.

00:46:06.000 --> 00:46:13.000
There's m choices for a_1 times
m choices for a_2,

00:46:13.000 --> 00:46:21.000
up to m choices for a_r and
then only one choice for a_0.

00:46:21.000 --> 00:46:28.000
So this is choices for a_1,
a_2, a_r and only one choice

00:46:28.000 --> 00:46:35.000
for a_0 if they're going to
collide.

00:46:35.000 --> 00:46:40.000
If they're not going to
collide, I've got more choices

00:46:40.000 --> 00:46:43.000
for a_0.
But if I want them to collide,

00:46:43.000 --> 00:46:48.000
there's only one value I can
pick, namely this value.

00:46:48.000 --> 00:46:53.000
That's the only value for which
they will collide.

00:46:53.000 --> 00:46:58.000
And that's equal to m to the r,
which is just the size of H

00:46:58.000 --> 00:47:03.000
divided by m.
And that completes the proof.
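
NOTE
The counting argument can be checked by brute force on a tiny instance. This is only a sketch: the values m = 3 and r = 2 are assumptions picked to keep the enumeration small, and the function names are hypothetical.
```python
from itertools import product
def h(a, x, m):
    """Dot-product hash: sum of a_i * x_i, reduced mod m."""
    return sum(ai * xi for ai, xi in zip(a, x)) % m
def colliding_functions(x, y, m):
    """Count the vectors a in (Z_m)^(r+1) for which x and y collide."""
    return sum(1 for a in product(range(m), repeat=len(x)) if h(a, x, m) == h(a, y, m))
m, r = 3, 2  # assumed small values: prime m, keys of r + 1 = 3 digits
keys = list(product(range(m), repeat=r + 1))
counts = {colliding_functions(x, y, m) for x in keys for y in keys if x != y}
print(counts)  # a single value, m**r, for every pair of distinct keys
```
Every pair of distinct keys collides under exactly m**r of the m**(r+1) functions, which is the size of H divided by m.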

00:47:11.000 --> 00:47:14.000
So there are other universal
constructions,

00:47:14.000 --> 00:47:18.000
but this is a particularly
elegant one.

00:47:18.000 --> 00:47:22.000
So the point is that I have m
plus one, sorry,

00:47:22.000 --> 00:47:27.000
r plus one degrees of freedom
where each degree of freedom I

00:47:27.000 --> 00:47:33.000
have m choices.
But if I want them to collide,

00:47:33.000 --> 00:47:40.000
once I've picked any of the,
once I've picked r of those

00:47:40.000 --> 00:47:45.000
possible choices,
the last one is forced if I

00:47:45.000 --> 00:47:48.000
want it to collide.
So therefore,

00:47:48.000 --> 00:47:55.000
the set of functions for which
it collides is only one in m.

00:47:55.000 --> 00:48:01.000
A very slick construction.
Very slick.

00:48:01.000 --> 00:48:03.000
OK.
Everybody with me here?

00:48:03.000 --> 00:48:07.000
Didn't lose too many people?
Yeah, question.

00:48:07.000 --> 00:48:12.000
Well, part of it is,
actually this is a quite common

00:48:12.000 --> 00:48:15.000
type of thing to be doing
actually.

00:48:15.000 --> 00:48:19.000
If you take a class,
so we have follow on classes in

00:48:19.000 --> 00:48:24.000
cryptography and so forth,
and this kind of thing of

00:48:24.000 --> 00:48:29.000
taking dot products,
modulo m and also Galois fields

00:48:29.000 --> 00:48:34.000
which are particularly simple
finite fields and things like

00:48:34.000 --> 00:48:40.000
that, people play with these all
the time.

00:48:40.000 --> 00:48:43.000
So Galois fields are like using
XORs as your,

00:48:43.000 --> 00:48:46.000
same sort of thing as this
except base two.

00:48:46.000 --> 00:48:49.000
And so there's a lot of study
of this sort of thing.

00:48:49.000 --> 00:48:53.000
So people understand these kind
of properties.

00:48:53.000 --> 00:48:57.000
But yeah, it's like what's the
algorithm for having a brilliant

00:48:57.000 --> 00:49:01.000
insight into algorithms?
It's like OK.

00:49:01.000 --> 00:49:05.000
Wish I knew.
Then I'd just turn the crank.

00:49:05.000 --> 00:49:11.000
[LAUGHTER] But if it were that
easy, I wouldn't be standing up

00:49:11.000 --> 00:49:13.000
here today.
[LAUGHTER] Good.

00:49:13.000 --> 00:49:19.000
OK, so now I want to take on
another topic which is also I

00:49:19.000 --> 00:49:22.000
find, I think this is
astounding.

00:49:22.000 --> 00:49:27.000
It's just beautiful,
beautiful mathematics and a big

00:49:27.000 --> 00:49:34.000
impact on your ability to build
good hash functions.

00:49:34.000 --> 00:49:37.000
Now I want to talk about
another topic,

00:49:37.000 --> 00:49:41.000
which is related,
which is the topic of perfect

00:49:41.000 --> 00:49:42.000
hashing.

00:49:54.000 --> 00:49:59.000
So everything we've done so far
does expected time performance.

00:49:59.000 --> 00:50:03.000
Hashing is good in the expected
sense.

00:50:03.000 --> 00:50:08.000
Perfect hashing addresses the
following question.

00:50:08.000 --> 00:50:14.000
Suppose that I gave you a set
of keys, and I said just build

00:50:14.000 --> 00:50:20.000
me a static table so I can look
up whether the key is in the

00:50:20.000 --> 00:50:25.000
table with worst case time.
Good worst case time.

00:50:25.000 --> 00:50:31.000
So I have a fixed set of keys.
They might be something like

00:50:31.000 --> 00:50:37.000
for example, the hundred most
common or thousand most common

00:50:37.000 --> 00:50:42.000
words in English.
And when I get a word I want to

00:50:42.000 --> 00:50:47.000
check quickly in this table,
is the word that I've got one

00:50:47.000 --> 00:50:49.000
of the most common words in
English.

00:50:49.000 --> 00:50:54.000
I would like to do that not
with expected performance,

00:50:54.000 --> 00:50:57.000
but guaranteed worst case
performance.

00:50:57.000 --> 00:51:03.000
Is there a way of building it
so that I can find this quickly?

00:51:03.000 --> 00:51:06.000
So the problem is given n keys
--

00:51:12.000 --> 00:51:14.000
-- construct a static hash
table.

00:51:14.000 --> 00:51:17.000
In other words,
no insertion and deletion.

00:51:17.000 --> 00:51:20.000
We're just going to put the
elements in there.

00:51:20.000 --> 00:51:22.000
Of size --

00:51:30.000 --> 00:51:37.000
-- m equal Order n.
So I don't want it to be a huge

00:51:37.000 --> 00:51:42.000
table.
I want it to be a table that is

00:51:42.000 --> 00:51:50.000
the size of my keys.
Table of size m equals Order n,

00:51:50.000 --> 00:51:59.000
such that search takes O(1)
time in the worst case.

00:52:06.000 --> 00:52:10.000
So there's no place in the
table where I'm going to have,

00:52:10.000 --> 00:52:14.000
I know in the average case,
that's not hard to do.

00:52:14.000 --> 00:52:18.000
But in the worst case,
I want to make sure that

00:52:18.000 --> 00:52:22.000
there's no particular spot where
the number of keys piles up to

00:52:22.000 --> 00:52:26.000
be a large number.
OK, in no spot should that

00:52:26.000 --> 00:52:29.000
happen.
Every single search I do should

00:52:29.000 --> 00:52:33.000
take Order one time.
There shouldn't be any

00:52:33.000 --> 00:52:37.000
statistical variation in terms
of how long it takes me to get

00:52:37.000 --> 00:52:39.000
something.
Does everybody understand what

00:52:39.000 --> 00:52:42.000
the puzzle is?
So this is a great,

00:52:42.000 --> 00:52:45.000
because this actually ends up
having a lot of uses.

00:52:45.000 --> 00:52:49.000
You know, you want to build a
table for something and you know

00:52:49.000 --> 00:52:52.000
what the values are that you're
going to look up in it.

00:52:52.000 --> 00:52:56.000
But you don't want to spend a
lot of space on it and so forth.

00:52:56.000 --> 00:53:00.000
So the idea here is actually
going to be to use a two-level

00:53:00.000 --> 00:53:02.000
scheme.

00:53:09.000 --> 00:53:22.000
So the idea is we're going to
use a two-level scheme with

00:53:22.000 --> 00:53:31.000
universal hashing at both
levels.

00:53:31.000 --> 00:53:36.000
So the idea is we're going to
hash, we're going to have a hash

00:53:36.000 --> 00:53:41.000
table, we're going to hash into
slots, but rather than using

00:53:41.000 --> 00:53:46.000
chaining, we're going to have
another hash table there.

00:53:46.000 --> 00:53:51.000
We're going to do a second hash
into the second hash table.

00:53:51.000 --> 00:53:56.000
And the idea is that we're
going to do it in such a way

00:53:56.000 --> 00:54:01.000
that we have no collisions at
level two.

00:54:01.000 --> 00:54:03.000
So we may have collisions at
level one.

00:54:03.000 --> 00:54:08.000
We'll take anything that
collides at level one and put

00:54:08.000 --> 00:54:12.000
them into a hash table,
our second-level hash table,

00:54:12.000 --> 00:54:15.000
but that hash table,
no collisions.

00:54:15.000 --> 00:54:17.000
Boom.
We're just going to hash right

00:54:17.000 --> 00:54:20.000
in there.
And it'll just go boom to its

00:54:20.000 --> 00:54:23.000
thing.
So let's draw a picture of this

00:54:23.000 --> 00:54:28.000
to illustrate the scheme.
OK, so we have --

00:54:34.000 --> 00:54:37.000
-- 0 one, let's say six,
m minus one.

00:54:37.000 --> 00:54:42.000
So here's our hash table.
And what we're going to do is

00:54:42.000 --> 00:54:47.000
we're going to use universal
hashing at the first level,

00:54:47.000 --> 00:54:49.000
OK.
So we find a universal hash

00:54:49.000 --> 00:54:52.000
function.
We pick a hash function at

00:54:52.000 --> 00:54:56.000
random.
And what we'll do is we'll hash

00:54:56.000 --> 00:55:00.000
into that level.
And then what we'll do is we'll

00:55:00.000 --> 00:55:05.000
keep track of two things.
One is what the size of the

00:55:05.000 --> 00:55:09.000
hash table is at the next level.
So in this case,

00:55:09.000 --> 00:55:13.000
the size of the hash table, we'll
only use the number of slots.

00:55:13.000 --> 00:55:17.000
There's going to be four.
And we're also going to keep a

00:55:17.000 --> 00:55:19.000
separate hash key for the second
level.

00:55:19.000 --> 00:55:23.000
So each slot will have its own
hash function for the second

00:55:23.000 --> 00:55:25.000
level.
So for example,

00:55:25.000 --> 00:55:30.000
this one might have a key of 31
that is a random number.

00:55:30.000 --> 00:55:32.000
The a's here.
a's up there.

00:55:32.000 --> 00:55:34.000
There we go,
a's up there.

00:55:34.000 --> 00:55:39.000
So that's going to be the basis
of my hash function,

00:55:39.000 --> 00:55:42.000
the key with which I'm going to
hash.

00:55:42.000 --> 00:55:46.000
This one say has 86.
And let's say that this,

00:55:46.000 --> 00:55:50.000
and then we have a pointer to
the hash table.

00:55:50.000 --> 00:55:55.000
This is say S_1.
And it's got four slots and we

00:55:55.000 --> 00:56:01.000
stored up 14 and 27.
And these two slots are empty.

00:56:01.000 --> 00:56:09.000
And this one for example,
had what?

00:56:09.000 --> 00:56:12.000
Two nines.

00:56:28.000 --> 00:56:34.000
So the idea here is that in
this case if we look over all

00:56:34.000 --> 00:56:40.000
our top level hash function,
which I'll just call H,

00:56:40.000 --> 00:56:47.000
has that H of 14 is equal to H
of 27 is equal to one.

00:56:47.000 --> 00:56:53.000
Because we're in slot one.
OK, so these two both hash to

00:56:53.000 --> 00:56:57.000
the same slot in the level one
hash table.

00:56:57.000 --> 00:57:02.000
This is level one.
And this is level two over

00:57:02.000 --> 00:57:06.000
here.
So level one hashing,

00:57:06.000 --> 00:57:11.000
14 and 27 collided.
They went into the same slot

00:57:11.000 --> 00:57:13.000
here.
But at level two,

00:57:13.000 --> 00:57:20.000
they got hashed to different
places and the hash function I

00:57:20.000 --> 00:57:26.000
use is going to be indexed by
whatever the random numbers are

00:57:26.000 --> 00:57:33.000
that I chose and found for those
and I'll show you how we find

00:57:33.000 --> 00:57:36.000
those.
We have then h sub 31 of 14 is

00:57:36.000 --> 00:57:43.000
equal to one, and h sub 31 of 27 is
equal to two.

00:57:43.000 --> 00:57:46.000
For level two.
So I go, hash in here,

00:57:46.000 --> 00:57:51.000
find the, use this as the basis
of my hash function to hash into

00:57:51.000 --> 00:57:55.000
whatever table I've got here.
And so, if there are no,

00:57:55.000 --> 00:58:00.000
if I can guarantee that there
are no collisions at level two,

00:58:00.000 --> 00:58:05.000
this is going to cost me Order
one time in the worst case to

00:58:05.000 --> 00:58:09.000
look something up.
How do I look it up?

00:58:09.000 --> 00:58:12.000
Take the value.
I apply h to it.

00:58:12.000 --> 00:58:16.000
That takes me to some slot.
Then I look to see what the key

00:58:16.000 --> 00:58:21.000
is for this hash function.
I apply that hash function and

00:58:21.000 --> 00:58:24.000
that takes me to another slot.
Then I go there.

00:58:24.000 --> 00:58:29.000
And that took me basically two
applications of hash functions

00:58:29.000 --> 00:58:33.000
plus some look-up,
plus who knows what minor

00:58:33.000 --> 00:58:41.000
amount of bookkeeping.
So the reason we're going to

00:58:41.000 --> 00:58:50.000
have no collisions at this level
is the following.

00:58:50.000 --> 00:59:01.000
If there are n sub i items that
hash to a level one slot i,

00:59:01.000 --> 00:59:11.000
then we're going to use m sub
i, which is equal to n sub i

00:59:11.000 --> 00:59:21.000
squared slots in the level two
hash table.

00:59:29.000 --> 00:59:33.000
OK, so I should have mentioned
here this is going to be m sub

00:59:33.000 --> 00:59:37.000
i, the size of the hash table
and this is going to be my a sub

00:59:37.000 --> 00:59:39.000
i essentially.

00:59:45.000 --> 00:59:50.000
So I'm going to use,
so basically I'm going to hash

00:59:50.000 --> 00:59:55.000
n sub i things into n sub i
squared locations here.

00:59:55.000 --> 01:00:00.000
So this is going to be
incredibly sparse.

01:00:00.000 --> 01:00:02.480
OK, it's going to be quadratic
in size.

01:00:02.480 --> 01:00:05.612
And so what I'm going to show
is that under those

01:00:05.612 --> 01:00:08.418
circumstances,
it's easy for me to find hash

01:00:08.418 --> 01:00:11.159
functions such that there are no
collisions.

01:00:11.159 --> 01:00:15.010
That's the name of the game.
Figure out how can I make these

01:00:15.010 --> 01:00:18.012
hash functions so that there are
no collisions.

01:00:18.012 --> 01:00:21.341
So that's why I draw this with
so few elements here.

01:00:21.341 --> 01:00:24.604
So here for example,
I have two elements and I have

01:00:24.604 --> 01:00:27.867
a hash table size four here.
I have three elements.

01:00:27.867 --> 01:00:32.520
I need a hash table size nine.
OK, if there are a hundred

01:00:32.520 --> 01:00:34.918
elements, I need a hash table
size 10,000.

01:00:34.918 --> 01:00:38.485
I'm not going to pick something
so that it's likely that there's

01:00:38.485 --> 01:00:41.350
anything of that size.
And then the fact that this

01:00:41.350 --> 01:00:44.801
actually works and gives us all
the properties that we want,

01:00:44.801 --> 01:00:48.251
that's part of the analysis.
So does everybody see that this

01:00:48.251 --> 01:00:51.877
takes Order one worst case time
and what the basic structure of

01:00:51.877 --> 01:00:52.988
it is?
These things,

01:00:52.988 --> 01:00:55.210
by the way, are not in this
case prime.

01:00:55.210 --> 01:00:58.134
I could always pick primes that
were close to this.

01:00:58.134 --> 01:01:03.730
I didn't do that in this case.
Or I could use a universal hash

01:01:03.730 --> 01:01:09.103
function that in fact would work
for things other than primes.

01:01:09.103 --> 01:01:12.362
But I didn't do that for this
example.

01:01:12.362 --> 01:01:16.943
We all ready for analysis?
OK, let's do some analysis

01:01:16.943 --> 01:01:18.000
then.

01:01:29.000 --> 01:01:31.000
And this is really pretty
analysis.

01:01:31.000 --> 01:01:33.528
Partly as you'll see because
we've already done some of this

01:01:33.528 --> 01:01:34.000
analysis.

01:01:50.000 --> 01:01:53.238
So the trick is analyzing level
two.

01:01:53.238 --> 01:01:57.309
That's the main thing that I
want to analyze,

01:01:57.309 --> 01:02:02.583
to show that I can find hash
functions here that are going

01:02:02.583 --> 01:02:06.192
to, when I map them into,
very sparsely,

01:02:06.192 --> 01:02:09.523
into these arrays here,
that in fact,

01:02:09.523 --> 01:02:16.000
such hash functions exist and I
can compute them in advance.

01:02:16.000 --> 01:02:23.344
So that I have a good way of
storing those.

01:02:23.344 --> 01:02:30.338
So here's the theorem we're
going to use.

01:02:30.338 --> 01:02:40.830
If I hash n keys into m equals
n squared slots using a random

01:02:40.830 --> 01:02:48.000
hash function in a universal set
H.

01:02:48.000 --> 01:03:00.393
Then the expected number of
collisions is less than one

01:03:00.393 --> 01:03:02.502
half.
OK.

01:03:02.502 --> 01:03:11.372
The expected number of
collisions I don't expect there

01:03:11.372 --> 01:03:20.577
to be even one collision.
I expect there to be less than

01:03:20.577 --> 01:03:29.447
half a collision on average.
And so, let's prove this,

01:03:29.447 --> 01:03:39.154
so that the probability that
two given keys collide under h

01:03:39.154 --> 01:03:45.216
is what?
What's the probability that two

01:03:45.216 --> 01:03:51.443
given keys collide under h when
h is chosen randomly from the

01:03:51.443 --> 01:03:54.037
universal set?
One over m.

01:03:54.037 --> 01:03:56.943
Right?
That's the definition,

01:03:56.943 --> 01:04:02.235
right, of, which is in this
case equal to one over n

01:04:02.235 --> 01:04:06.210
squared.
So now how many keys,

01:04:06.210 --> 01:04:11.052
how many pairs of keys do I
have in this table?

01:04:11.052 --> 01:04:16.526
How many keys could possibly
collide with each other?

01:04:16.526 --> 01:04:19.368
OK.
So that's basically just

01:04:19.368 --> 01:04:25.157
looking at how many different
pairs of keys do I have to

01:04:25.157 --> 01:04:30.315
evaluate this for.
So that's n choose two pairs of

01:04:30.315 --> 01:04:36.654
keys.
n choose two pairs of keys.

01:04:36.654 --> 01:04:42.689
So therefore,
the expected number of

01:04:42.689 --> 01:04:52.172
collisions is, well, for each of
these n, not n over two.

01:04:52.172 --> 01:05:00.793
n choose two pairs of keys.
The probability that it

01:05:00.793 --> 01:05:08.923
collides is one in n squared.
So that's equal to n times n

01:05:08.923 --> 01:05:12.221
minus one over two,
if you remember your formula,

01:05:12.221 --> 01:05:16.000
times one in n squared.
And that's less than a half.
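
NOTE
The arithmetic behind the theorem can be checked exactly. A small sketch, assuming Python 3.8 or later for math.comb; the function name expected_collision_bound is hypothetical.
```python
from fractions import Fraction
from math import comb
def expected_collision_bound(n):
    """C(n, 2) pairs, each colliding with probability at most 1/n**2 when m = n**2."""
    return comb(n, 2) * Fraction(1, n * n)
for n in (2, 3, 10, 100):
    print(n, expected_collision_bound(n))  # equals (n - 1) / (2n), always below 1/2
```
The bound approaches one half as n grows but never reaches it, which is why the theorem says strictly less than a half.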

01:05:24.000 --> 01:05:28.183
So for every pair of keys,
so those of you who remember

01:05:28.183 --> 01:05:33.063
from 6.042 the birthday paradox,
this is related to the birthday

01:05:33.063 --> 01:05:36.800
paradox a little bit.
But here I basically have a

01:05:36.800 --> 01:05:40.333
large set, and I'm looking at
all pairs, but my set is

01:05:40.333 --> 01:05:44.000
sufficiently big that the odds
that I get a collision is

01:05:44.000 --> 01:05:47.199
relatively small.
If I start increasing it beyond

01:05:47.199 --> 01:05:50.400
the square root of m,
OK, the number of elements,

01:05:50.400 --> 01:05:54.466
it starts getting bigger than the
square root of m, then the odds

01:05:54.466 --> 01:05:57.733
of a collision go up
dramatically as you know from

01:05:57.733 --> 01:06:01.532
the birthday paradox.
But if I'm less than,

01:06:01.532 --> 01:06:05.401
if I'm really sparse in there,
I don't get collisions.

01:06:05.401 --> 01:06:09.197
Or at least I get a relatively
small number expected.

01:06:09.197 --> 01:06:13.430
Now I want to remind you of
something which actually in the

01:06:13.430 --> 01:06:17.080
past I have just assumed,
but I want to actually go

01:06:17.080 --> 01:06:20.291
through it briefly.
It's Markov's inequality.

01:06:20.291 --> 01:06:22.919
So who remembers Markov's
inequality?

01:06:22.919 --> 01:06:25.839
Don't everybody raise their
hand at once.

01:06:25.839 --> 01:06:30.000
So Markov's inequality says the
following.

01:06:30.000 --> 01:06:34.145
This is one of these great
probability facts.

01:06:34.145 --> 01:06:38.762
For random variable x which is
bounded below by 0,

01:06:38.762 --> 01:06:44.227
says the probability that x is
bigger than, greater than or

01:06:44.227 --> 01:06:49.316
equal to any given value T is
less than or equal to the

01:06:49.316 --> 01:06:53.838
expectation of x divided by T.
It's a great fact.

01:06:53.838 --> 01:06:57.796
Doesn't hold if x isn't bounded
below by 0.

01:06:57.796 --> 01:07:03.230
But it's a great fact.
It allows me to relate the

01:07:03.230 --> 01:07:06.833
probability of an event to its
expectation.

01:07:06.833 --> 01:07:12.066
And the idea is in general that
if the expectation is going to

01:07:12.066 --> 01:07:17.213
be small, then I can't have a
high probability that the value

01:07:17.213 --> 01:07:21.845
of the random variable is large.
It doesn't make sense.

01:07:21.845 --> 01:07:26.649
How could you have a high
probability that it's a million

01:07:26.649 --> 01:07:31.968
when my expectation is one or in
this case we're going to apply

01:07:31.968 --> 01:07:36.000
it when the expectation is a
half?

01:07:36.000 --> 01:07:39.676
Couldn't happen.
And the proof follows just

01:07:39.676 --> 01:07:44.666
directly on the definition of
expectation, and so I'm doing

01:07:44.666 --> 01:07:47.730
this for a discrete random
variable.

01:07:47.730 --> 01:07:52.282
So the expectation by
definition is just the sum from

01:07:52.282 --> 01:07:57.622
little x equals 0 to infinity
of x times the probability that

01:07:57.622 --> 01:08:02.000
my random variable takes on the
value x.

01:08:02.000 --> 01:08:06.560
That's the definition.
And now it's just a question of

01:08:06.560 --> 01:08:11.120
doing like the coarsest
approximation you can imagine.

01:08:11.120 --> 01:08:14.734
First of all,
let me just simply throw away

01:08:14.734 --> 01:08:19.725
all the small terms, so that's
greater than or equal to the sum from x equals

01:08:19.725 --> 01:08:24.716
T to infinity of x times the
probability that x is equal to

01:08:24.716 --> 01:08:28.072
little x.
So just throw away all the low

01:08:28.072 --> 01:08:31.426
order terms.
Now what I'm going to do is

01:08:31.426 --> 01:08:36.848
replace every one of these terms, each of which
is lower bounded by the value at x

01:08:36.848 --> 01:08:42.875
equals T.
So that's just the summation of

01:08:42.875 --> 01:08:49.750
x equals T to infinity of T
times the probability that x

01:08:49.750 --> 01:08:51.250
equals x.
OK.

01:08:51.250 --> 01:08:58.250
Over x going from T larger.
Because these are only bigger

01:08:58.250 --> 01:09:02.009
values.
And that's just equal then to

01:09:02.009 --> 01:09:06.306
T, because I can pull that out,
and the summation of x equals T

01:09:06.306 --> 01:09:10.256
to infinity of the probability
that x equals x is just the

01:09:10.256 --> 01:09:14.000
probability that x is greater
than or equal to T.

01:09:20.000 --> 01:09:26.000
And that's done because I just
divide by T.
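
NOTE
A quick numerical check of Markov's inequality on a discrete distribution. This is a sketch only: the fair die and the two-point distribution are hypothetical examples, and markov_holds is an illustrative name.
```python
from fractions import Fraction
def markov_holds(dist, t):
    """Check P(X >= t) <= E[X] / t for a distribution given as {value: probability}."""
    expectation = sum(x * p for x, p in dist.items())
    tail = sum(p for x, p in dist.items() if x >= t)
    return tail <= expectation / t
die = {x: Fraction(1, 6) for x in range(1, 7)}  # hypothetical example: a fair six-sided die
print(all(markov_holds(die, t) for t in range(1, 20)))
signed = {-10: Fraction(1, 2), 10: Fraction(1, 2)}  # takes a negative value, so X is not bounded below by 0
print(markov_holds(signed, 10))  # False: the inequality can fail without nonnegativity
```
The second distribution has expectation 0 yet takes the value 10 half the time, which shows why the nonnegativity condition is needed.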

01:09:31.000 --> 01:09:34.379
So that's Markov's inequality.
Really dumb.

01:09:34.379 --> 01:09:37.919
Really simple.
There are much stronger things

01:09:37.919 --> 01:09:42.264
like Chebyshev bounds and
Chernoff bounds and things of

01:09:42.264 --> 01:09:44.839
that nature.
But Markov's is like

01:09:44.839 --> 01:09:49.586
unbelievably simple and useful.
So we're going to just apply

01:09:49.586 --> 01:09:52.000
that as a corollary.

01:10:06.000 --> 01:10:13.059
So the probability now of no
collisions, when I hash n keys

01:10:13.059 --> 01:10:19.391
into n squared slots using a
universal hash function,

01:10:19.391 --> 01:10:26.817
I claim is the probability of
no collisions is greater than or

01:10:26.817 --> 01:10:32.173
equal to a half.
So I pick a hash function at

01:10:32.173 --> 01:10:36.409
random.
What are the odds that I got no

01:10:36.409 --> 01:10:40.917
collisions when I hashed those n
keys into n squared slots?

01:10:40.917 --> 01:10:43.326
Answer.
Probability is I have no

01:10:43.326 --> 01:10:47.834
collisions is at least a half.
Half the time I'm guaranteed

01:10:47.834 --> 01:10:51.409
that there won't be a collision.
And the proof,

01:10:51.409 --> 01:10:54.129
pretty simple.
The probability of no

01:10:54.129 --> 01:10:57.549
collisions is the same as the
probability as,

01:10:57.549 --> 01:11:01.746
sorry, is one minus the
probability that I have at least

01:11:01.746 --> 01:11:05.850
one collision.
So the odds that I have at

01:11:05.850 --> 01:11:09.337
least one collision,
the odds that I have at least

01:11:09.337 --> 01:11:12.254
one collision,
probability greater than or

01:11:12.254 --> 01:11:15.599
equal to one collision is less
than or equal to,

01:11:15.599 --> 01:11:18.872
now I just apply Markov's
inequality with this.

01:11:18.872 --> 01:11:23.000
So it's just the expected
number of collisions --

01:11:29.000 --> 01:11:33.090
-- divided by one.
And that is by Markov's

01:11:33.090 --> 01:11:36.272
inequality less than,
by definition,

01:11:36.272 --> 01:11:40.181
excuse me, of expected number
of collisions,

01:11:40.181 --> 01:11:44.363
which we've already shown,
is less than a half.

01:11:44.363 --> 01:11:49.636
So the probability of at least
one collision is less than a

01:11:49.636 --> 01:11:52.909
half.
The probability of 0 collisions

01:11:52.909 --> 01:11:56.363
is at least a half.
So we're done here.
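
NOTE
The corollary can be checked exhaustively on a small universal family. Note the swap: this sketch does not use the dot-product family from the lecture but the family k maps to ((a*k + b) mod p) mod m, with p prime, a nonzero, which is also universal and works for non-prime m. The keys and the prime 101 are assumptions made for illustration.
```python
from fractions import Fraction
from itertools import product
def collision_free_fraction(keys, p):
    """Fraction of functions k -> ((a*k + b) % p) % m, with m = n**2, that are injective on keys."""
    m = len(keys) ** 2
    good = total = 0
    for a, b in product(range(1, p), range(p)):
        slots = {((a * k + b) % p) % m for k in keys}
        good += len(slots) == len(keys)
        total += 1
    return Fraction(good, total)
frac = collision_free_fraction([3, 14, 27, 40], p=101)  # hypothetical keys, all below the prime 101
print(frac, frac >= Fraction(1, 2))
```
Since the family is universal, the corollary guarantees that at least half of its members are collision-free on any fixed set of n distinct keys.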

01:11:56.363 --> 01:12:02.000
So to find a good level-two hash
function is easy.

01:12:02.000 --> 01:12:06.562
I just test a few at random.
Most of them out there,

01:12:06.562 --> 01:12:10.856
OK, half of them,
at least half of them are going

01:12:10.856 --> 01:12:13.808
to work.
So this is in some sense,

01:12:13.808 --> 01:12:18.102
if you think about it,
a randomized construction,

01:12:18.102 --> 01:12:22.664
because I can't tell you which
one it's going to be.

01:12:22.664 --> 01:12:27.763
It's non-constructive in that
sense, but it's a randomized

01:12:27.763 --> 01:12:32.485
construction.
But they have to exist because

01:12:32.485 --> 01:12:36.297
most of them out there have this
good property.

01:12:36.297 --> 01:12:40.605
So I'm going to be able to find
for each one of these,

01:12:40.605 --> 01:12:44.168
I just test a few at random,
and I find one.

01:12:44.168 --> 01:12:47.068
Test a few at random,
find one, etc.

01:12:47.068 --> 01:12:50.548
Fill in my table there.
Because all that is

01:12:50.548 --> 01:12:53.945
pre-computation.
And I'm going to find them

01:12:53.945 --> 01:12:57.342
because the odds are good that
one exists.

01:12:57.342 --> 01:12:59.000
So --

01:13:13.000 --> 01:13:14.000
-- we just test a few at random.

01:13:24.000 --> 01:13:25.000
And we'll find one quickly --

01:13:32.000 --> 01:13:34.300
-- since at least half will
work.
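
NOTE
The two-level scheme and the test-a-few-at-random step can be sketched as follows. Assumptions, none of which come from the lecture: keys are distinct nonnegative integers below the prime P, the hash family is k maps to ((a*k + b) mod P) mod m rather than the dot-product family, the key set is non-empty, and the names make_hash, build and contains are hypothetical. The sketch does not retry level one, so it does not by itself enforce the Order n storage bound.
```python
import random
P = 2_147_483_647  # assumed prime (2**31 - 1) larger than every key
def make_hash(m, rng):
    """Draw one function from the family k -> ((a*k + b) % P) % m."""
    a, b = rng.randrange(1, P), rng.randrange(P)
    return lambda k: ((a * k + b) % P) % m
def build(keys, rng):
    """Level one: n slots. Level two: slot i gets n_i**2 cells and a collision-free hash."""
    n = len(keys)
    h1 = make_hash(n, rng)
    buckets = [[] for _ in range(n)]
    for k in keys:
        buckets[h1(k)].append(k)
    level2 = []
    for bucket in buckets:
        size = len(bucket) ** 2
        while True:  # retry; each attempt succeeds with probability at least 1/2
            h2 = make_hash(size, rng) if size else None
            cells = [None] * size
            ok = True
            for k in bucket:
                j = h2(k)
                if cells[j] is not None:
                    ok = False
                    break
                cells[j] = k
            if ok:
                break
        level2.append((h2, cells))
    return h1, level2
def contains(table, k):
    """Two hash evaluations and one comparison: O(1) in the worst case."""
    h1, level2 = table
    h2, cells = level2[h1(k)]
    return bool(cells) and cells[h2(k)] == k
table = build([14, 27, 40, 37, 22, 86, 31], random.Random(0))
print(contains(table, 27), contains(table, 28))
```
All the retrying happens in build, which is pre-computation; contains never loops, so every search costs the same small constant amount of work.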

01:13:34.300 --> 01:13:37.679
I just want to show that there
exists good ones.

01:13:37.679 --> 01:13:41.777
All I have to prove is that at
least one works for each of

01:13:41.777 --> 01:13:44.366
these cases.
In fact, I've shown that

01:13:44.366 --> 01:13:46.954
there's a huge number that will
work.

01:13:46.954 --> 01:13:50.189
Half of them will work.
But to show it exists,

01:13:50.189 --> 01:13:54.647
I would just have to show that
the probability was greater than

01:13:54.647 --> 01:13:55.941
zero. So to finish up,

01:13:55.941 --> 01:14:00.254
we need to still analyze the
storage because I promised in my

01:14:00.254 --> 01:14:05.000
theorem that the table would be
of size order n.

01:14:05.000 --> 01:14:12.702
And yet now I've said there's
all of these quadratic-sized

01:14:12.702 --> 01:14:18.378
slots here.
So I'm going to show that that's

01:14:18.378 --> 01:14:20.000
order n.

01:14:31.000 --> 01:14:35.605
So for level one,
that's easy.

01:14:35.605 --> 01:14:45.450
We'll just choose the number of
slots to be equal to the number

01:14:45.450 --> 01:14:51.008
of keys.
And that way the storage at

01:14:51.008 --> 01:14:59.583
level one is just order n.
And now let's let n sub i be

01:14:59.583 --> 01:15:08.000
the random variable for the
number of keys --

01:15:13.000 --> 01:15:21.712
-- that hash to slot i in T.
OK, so n sub i is just what

01:15:21.712 --> 01:15:28.683
we've called it.
Number of elements in that slot

01:15:28.683 --> 01:15:34.386
there.
And we're going to use m sub i

01:15:34.386 --> 01:15:45.000
equals n sub i squared slots in
each level two table S sub i.

01:15:45.000 --> 01:15:47.000
So the expected total storage --

01:15:54.000 --> 01:16:01.085
-- is just n for level one,
order n if you want,

01:16:01.085 --> 01:16:09.979
but basically n slots for level
one plus the expected value,

01:16:09.979 --> 01:16:19.326
whatever I expect the sum of i
equals 0 to m minus one of theta

01:16:19.326 --> 01:16:24.000
of n sub i squared to be.

01:16:30.000 --> 01:16:36.048
Because I basically have to add
up the square for every element

01:16:36.048 --> 01:16:40.731
that applies here,
the square of what's in there.

01:16:40.731 --> 01:16:46.682
Who recognizes this summation?
Where have we seen that before?

01:16:46.682 --> 01:16:51.951
Who attends recitation?
Where have we seen this before?

01:16:51.951 --> 01:16:54.000
What's the --

01:17:03.000 --> 01:17:06.000
We're summing the expected
value of a bunch of --

01:17:11.000 --> 01:17:14.959
Yeah, what was that algorithm?
We did the sorting algorithm,

01:17:14.959 --> 01:17:17.375
right?
What was the sorting algorithm

01:17:17.375 --> 01:17:21.000
for which this was an important
thing to evaluate?

01:17:26.000 --> 01:17:29.272
Don't everybody shout it out at
once.

01:17:29.272 --> 01:17:33.000
What was that sorting algorithm
called?

01:17:33.000 --> 01:17:35.397
Bucket sort.
Good.

01:17:35.397 --> 01:17:37.794
Bucket sort.
Yeah.

01:17:37.794 --> 01:17:46.397
We showed that the sum of the
squares of random variables when

01:17:46.397 --> 01:17:53.025
they're falling randomly into n
bins is order n.

01:17:53.025 --> 01:17:55.000
Right?
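
NOTE
A sketch of why this sum is order n, assuming the level-one function is drawn from a universal family with m = n slots, so that two distinct keys collide with probability at most 1/n. C, the number of colliding pairs at level one, is a label assumed for this note; this counting argument is one standard route to the bound, alongside the bucket sort analysis cited in the lecture.
```latex
\begin{aligned}
\sum_{i=0}^{m-1} n_i^2
  &= \sum_{i=0}^{m-1} n_i + 2\sum_{i=0}^{m-1}\binom{n_i}{2}
   = n + 2C, \\
E\!\left[\sum_{i=0}^{m-1} n_i^2\right]
  &= n + 2\,E[C]
   \le n + 2\binom{n}{2}\cdot\frac{1}{n}
   = 2n - 1 = O(n).
\end{aligned}
```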

01:18:16.000 --> 01:18:20.105
And you can also out of this
get a, as we did before,

01:18:20.105 --> 01:18:24.131
get a probability bound.
What's the probability that

01:18:24.131 --> 01:18:28.315
it's more than a certain amount
times n using Markov's

01:18:28.315 --> 01:18:31.394
inequality.
But this is the key thing is

01:18:31.394 --> 01:18:36.109
we've seen this analysis.
OK, we used it there in time,

01:18:36.109 --> 01:18:39.963
so there's a little bit,
but that's one of the reasons

01:18:39.963 --> 01:18:43.963
we study sorting at the
beginning of the term is because

01:18:43.963 --> 01:18:47.890
the techniques of sorting,
they just propagate into all

01:18:47.890 --> 01:18:52.327
these other areas of analysis.
You see a lot of the same kinds

01:18:52.327 --> 01:18:55.309
of things.
And so now that you know bucket

01:18:55.309 --> 01:18:59.018
sort clearly so well,
now you know this without

01:18:59.018 --> 01:19:04.610
having to do any extra work.
So you might want to go back

01:19:04.610 --> 01:19:09.925
and review your bucket sort
analysis, because it's applied

01:19:09.925 --> 01:19:11.604
now.
Same analysis.

01:19:11.604 --> 01:19:12.909
Two places.
OK.

01:19:12.909 --> 01:19:18.411
Good. Recitation this Friday,
which will be a quiz review and

01:19:18.411 --> 01:19:22.794
we have a quiz next,
there's no class on Monday,

01:19:22.794 --> 01:19:26.151
but we have a quiz on next
Wednesday.

01:19:26.151 --> 01:19:31.000
OK, so good luck everybody on
the quiz.

01:19:31.000 --> 01:19:34.000
Make sure you get plenty of
sleep.