1
00:00:11 --> 00:00:17
So, we're going to talk today
about binary search trees.
2
00:00:17 --> 00:00:23
It's something called randomly
built binary search trees.
3
00:00:23 --> 00:00:29
And, I'll abbreviate binary
search trees as BST's throughout
4
00:00:29 --> 00:00:33
the lecture.
And, you've all seen binary
5
00:00:33 --> 00:00:39
search trees in one place or
another, in particular,
6
00:00:39 --> 00:00:45
recitation on Friday.
So, we're going to build up the
7
00:00:45 --> 00:00:49
basic ideas presented there,
and talk about how to randomize
8
00:00:49 --> 00:00:54
them, and make them good.
So, you know that there are
9
00:00:54 --> 00:00:58
good binary search trees,
which are relatively balanced,
10
00:00:58 --> 00:01:02
something like this.
The height is log n.
11
00:01:02 --> 00:01:04
We call that balanced,
and that's good.
12
00:01:04 --> 00:01:06
Anything order log n will be
fine.
13
00:01:06 --> 00:01:10
In terms of searching,
it will then cost order log n.
14
00:01:10 --> 00:01:14
And, there are bad binary
search trees which have really
15
00:01:14 --> 00:01:16
large height,
possibly as big as n.
16
00:01:16 --> 00:01:19
So, this is good,
and this is bad.
17
00:01:19 --> 00:01:22
We'd sort of like to know,
we'd like to build binary
18
00:01:22 --> 00:01:26
search trees in such a way that
they are good all the time,
19
00:01:26 --> 00:01:31
or at least most of the time.
There are lots of ways to do
20
00:01:31 --> 00:01:36
this, and in the next couple of
weeks, we will see four of them,
21
00:01:36 --> 00:01:39
if you count the problem set,
I believe.
22
00:01:39 --> 00:01:42
Today, we are going to use
randomization to make them
23
00:01:42 --> 00:01:45
balanced most of the time in a
certain sense.
24
00:01:45 --> 00:01:49
And then, in your problem set,
you will make that work in a broader
25
00:01:49 --> 00:01:52
sense.
But, one way to motivate this
26
00:01:52 --> 00:01:56
topic, so I'm not going to
define randomly built binary
27
00:01:56 --> 00:02:00
search trees for a little bit.
One way to motivate the topic
28
00:02:00 --> 00:02:04
is through sorting,
our good friend.
29
00:02:04 --> 00:02:09
So, there's a natural way to
sort n numbers using binary
30
00:02:09 --> 00:02:13
search trees.
So, if I give you an array,
31
00:02:13 --> 00:02:18
A, how would you sort that
array using binary search tree
32
00:02:18 --> 00:02:23
operations as a black box?
Build the binary search tree,
33
00:02:23 --> 00:02:27
and then traverse it in order.
Exactly.
34
00:02:27 --> 00:02:30
So, let's say we have some
initial tree,
35
00:02:30 --> 00:02:35
which is empty,
and then for each element of
36
00:02:35 --> 00:02:40
the array, we insert it into the
tree.
37
00:02:40 --> 00:02:46
That's what you meant by
building the search tree.
38
00:02:46 --> 00:02:53
So, we insert A[i] into the tree.
This is the binary search tree
39
00:02:53 --> 00:03:00
insertion, standard insertion.
And then, we do an in order
40
00:03:00 --> 00:03:09
traversal, which in the book is
called in order tree walk.
41
00:03:09 --> 00:03:11
OK, you should know what these
algorithms are,
42
00:03:11 --> 00:03:14
but just for very quick
reminder, tree insert basically
43
00:03:14 --> 00:03:18
searches for that element A[i]
until it finds the place where
44
00:03:18 --> 00:03:21
it should have been if it was in
the tree already,
45
00:03:21 --> 00:03:24
and then adds a new leaf there
to insert that value.
46
00:03:24 --> 00:03:27
Tree walk recursively walks the
left subtree,
47
00:03:27 --> 00:03:30
then prints out the root,
and then recursively walks the
48
00:03:30 --> 00:03:33
right subtree.
And, by the binary search tree
49
00:03:33 --> 00:03:38
property, that will print the
elements out in sorted order.
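As a concrete sketch of the two steps just described (standard tree insertion, then an in-order walk), here is a minimal Python version; the `Node`, `tree_insert`, and `inorder` names are illustrative, not from the lecture.

```python
# Minimal sketch of BST sort: insert each A[i] with standard tree
# insertion, then recover sorted order with an in-order tree walk.

class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_insert(root, key):
    """Walk down to where key belongs and attach a new leaf there."""
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = tree_insert(root.left, key)
    else:
        root.right = tree_insert(root.right, key)
    return root

def inorder(root, out):
    """Left subtree, then root, then right subtree."""
    if root is not None:
        inorder(root.left, out)
        out.append(root.key)
        inorder(root.right, out)

def bst_sort(A):
    root = None
    for x in A:                 # insert A[i] for each i
        root = tree_insert(root, x)
    out = []
    inorder(root, out)
    return out

print(bst_sort([3, 1, 8, 2, 6, 7, 5]))  # [1, 2, 3, 5, 6, 7, 8]
```

By the BST property, the walk emits the keys in sorted order, as claimed above.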
50
00:03:38 --> 00:03:43
So, let's do a quick example
because this turns out to be
51
00:03:43 --> 00:03:48
related to another sorting
algorithm we've seen already.
52
00:03:48 --> 00:03:52
So, while the example is
probably pretty trivial,
53
00:03:52 --> 00:03:55
the connection is pretty
surprising.
54
00:03:55 --> 00:04:02
At least, it was to me the
first time I taught this class.
55
00:04:02 --> 00:04:04
So, my array is three,
one, eight, two,
56
00:04:04 --> 00:04:08
six, seven, five.
And, I'm going to visit these
57
00:04:08 --> 00:04:12
elements in order from left to
right, and just build a tree.
58
00:04:12 --> 00:04:15
So, the first element I see is
three.
59
00:04:15 --> 00:04:18
So, I insert three into an
empty tree.
60
00:04:18 --> 00:04:21
That requires no comparisons.
Then I insert one.
61
00:04:21 --> 00:04:24
I see, is one bigger or less
than three?
62
00:04:24 --> 00:04:27
It's smaller.
So, I put it over here.
63
00:04:27 --> 00:04:31
Then I insert eight.
That's bigger than three,
64
00:04:31 --> 00:04:35
so it get's a new leaf over
here.
65
00:04:35 --> 00:04:38
Then I insert two.
That sits between one and
66
00:04:38 --> 00:04:41
three.
And so, it would fall off this
67
00:04:41 --> 00:04:44
right child of one.
So, I add two there.
68
00:04:44 --> 00:04:48
Six is bigger than three,
and less than eight.
69
00:04:48 --> 00:04:51
So, it goes here.
Seven is bigger than three,
70
00:04:51 --> 00:04:54
and less than eight,
bigger than six.
71
00:04:54 --> 00:04:58
So, it goes here,
and five fits in between three
72
00:04:58 --> 00:05:03
and five, three and six rather.
And so, that's the binary
73
00:05:03 --> 00:05:06
search tree that we get.
Then I run an in order
74
00:05:06 --> 00:05:10
traversal, which will print one,
two, three, five,
75
00:05:10 --> 00:05:13
six, seven, eight.
OK, I can run it quickly in my
76
00:05:13 --> 00:05:15
head because I've got a big
stack.
77
00:05:15 --> 00:05:18
I've got to be a little bit
careful.
78
00:05:18 --> 00:05:22
Of course, you should check
that they come out in sorted
79
00:05:22 --> 00:05:24
order: one, two,
three, five,
80
00:05:24 --> 00:05:27
six, seven, eight.
And, if you don't have a big
81
00:05:27 --> 00:05:32
stack, you can go and buy one.
That's always useful.
82
00:05:32 --> 00:05:36
Memory costs are going up a bit
these days, or going down.
83
00:05:36 --> 00:05:40
They should be going down,
but because of politics, price-fixing,
84
00:05:40 --> 00:05:43
or whatever.
So, the question is,
85
00:05:43 --> 00:05:46
what's the running time of the
algorithm?
86
00:05:46 --> 00:05:50
Here, this is one of those
answers where it depends.
87
00:05:50 --> 00:05:53
The parts that are easy to
analyze are, well,
88
00:05:53 --> 00:05:56
initialization.
The in order tree walk,
89
00:05:56 --> 00:06:00
how long does that take?
n, good.
90
00:06:00 --> 00:06:05
So, it's order n for the walk,
and for the initialization,
91
00:06:05 --> 00:06:08
which is constant.
The question is,
92
00:06:08 --> 00:06:13
how long does it take me to do
n tree inserts?
93
00:06:13 --> 00:06:21
94
00:06:21 --> 00:06:26
Anyone want to guess any kind
of answer to that question,
95
00:06:26 --> 00:06:32
other than it depends?
I've already stolen the thunder
96
00:06:32 --> 00:06:34
there.
Yeah?
97
00:06:34 --> 00:06:38
Big Omega of n log n,
that's good.
98
00:06:38 --> 00:06:42
It's at least n log n.
Why?
99
00:06:42 --> 00:06:56
100
00:06:56 --> 00:06:58
Right.
So, you gave two reasons.
101
00:06:58 --> 00:07:02
The first one is because of the
decision tree lower bound.
102
00:07:02 --> 00:07:04
That doesn't actually prove
this.
103
00:07:04 --> 00:07:07
You have to be a little bit
careful.
104
00:07:07 --> 00:07:10
This is a claim that it's omega
n log n all the time.
105
00:07:10 --> 00:07:14
It's certainly omega n log n in
the worst case.
106
00:07:14 --> 00:07:18
Every comparison-based sorting
algorithm is omega n log n in
107
00:07:18 --> 00:07:21
the worst case.
It's also n log n every single
108
00:07:21 --> 00:07:25
time, omega n log n because of
the second reason you gave,
109
00:07:25 --> 00:07:29
which is the best thing that
could happen is we have a
110
00:07:29 --> 00:07:33
perfectly balanced tree.
So, this is the figure that I
111
00:07:33 --> 00:07:36
have drawn the most on a
blackboard in my life,
112
00:07:36 --> 00:07:41
the perfect tree on 15 nodes,
I guess.
113
00:07:41 --> 00:07:42
So, if we're lucky,
we have this.
114
00:07:42 --> 00:07:45
And if you add up all the
depths of the nodes here,
115
00:07:45 --> 00:07:48
which gives you the search tree
cost, in particular,
116
00:07:48 --> 00:07:52
these n over two nodes in the
bottom, each have depth log n.
117
00:07:52 --> 00:07:54
And, therefore,
you're going to have to pay it
118
00:07:54 --> 00:07:57
least n log n for those.
And, if you're less balanced,
119
00:07:57 --> 00:08:02
it's going to be even worse.
That takes some proving,
120
00:08:02 --> 00:08:08
but it's true.
So, it's actually omega n log n
121
00:08:08 --> 00:08:13
all the time.
OK, there are some cases,
122
00:08:13 --> 00:08:19
like if you know that the
elements are almost already in
123
00:08:19 --> 00:08:25
order, you can do it in a linear
number of comparisons.
124
00:08:25 --> 00:08:32
But here, you can't.
Any other guesses at an answer
125
00:08:32 --> 00:08:34
to this question?
Yeah?
126
00:08:34 --> 00:08:39
Big O n^2?
Good, why?
127
00:08:39 --> 00:08:41
Right.
We are doing n things,
128
00:08:41 --> 00:08:44
and each node has depth,
at most, n.
129
00:08:44 --> 00:08:49
So, the number of comparisons
we're making per element we
130
00:08:49 --> 00:08:51
insert, is, at most,
n.
131
00:08:51 --> 00:08:53
So that's, at most,
n^2.
132
00:08:53 --> 00:08:56
Any other answers?
Is it possible for this
133
00:08:56 --> 00:09:03
algorithm to take n^2 time?
Are there instances where it
134
00:09:03 --> 00:09:08
takes theta n^2?
If it's already sorted,
135
00:09:08 --> 00:09:14
that would be pretty bad.
So, if it's already sorted or
136
00:09:14 --> 00:09:21
if it's reverse sorted,
you are in bad shape because
137
00:09:21 --> 00:09:27
then you get a tree like this.
This is the sorted case.
138
00:09:27 --> 00:09:32
And, you compute.
So, the total cost,
139
00:09:32 --> 00:09:38
the time in general is going to
be the sum of the depths of the
140
00:09:38 --> 00:09:41
nodes for each node,
X, in the tree.
141
00:09:41 --> 00:09:45
And in this case,
it's one plus two plus three
142
00:09:45 --> 00:09:48
plus four, this arithmetic
series.
143
00:09:48 --> 00:09:52
There's n of them,
so this is theta n squared.
144
00:09:52 --> 00:09:56
It's like n^2 over two.
So, that's bad news.
145
00:09:56 --> 00:10:03
The worst-case running time of
this algorithm is n^2.
146
00:10:03 --> 00:10:08
Does that sound familiar at
all, an algorithm whose worst-case
147
00:10:08 --> 00:10:11
running time is n^2,
in particular,
148
00:10:11 --> 00:10:16
in the already-sorted case?
But if we're lucky,
149
00:10:16 --> 00:10:20
in the lucky case,
as we said, it's a balanced
150
00:10:20 --> 00:10:23
tree.
Wouldn't that be great?
151
00:10:23 --> 00:10:28
Anything with order log n
height would give us a sorting
152
00:10:28 --> 00:10:36
algorithm that runs in n log n.
So, in the lucky case,
153
00:10:36 --> 00:10:43
we are n log n.
But in the unlucky case,
154
00:10:43 --> 00:10:48
we are n^2, and unlucky means
sorted.
155
00:10:48 --> 00:10:57
Does it remind you of any
algorithm we've seen before?
156
00:10:57 --> 00:11:02
Quicksort.
It turns out the running time
157
00:11:02 --> 00:11:09
of this algorithm is the same as
the running time of quicksort in
158
00:11:09 --> 00:11:13
a very strong sense.
It turns out the comparisons
159
00:11:13 --> 00:11:19
that this algorithm makes are
exactly the same comparisons
160
00:11:19 --> 00:11:24
that quicksort makes.
It makes them in a different
161
00:11:24 --> 00:11:29
order, but it's really the same
algorithm in disguise.
162
00:11:29 --> 00:11:34
That's the surprise here.
So, in particular,
163
00:11:34 --> 00:11:36
we've already analyzed
quicksort.
164
00:11:36 --> 00:11:40
We should get something for
free out of that analysis.
165
00:11:40 --> 00:11:54
166
00:11:54 --> 00:12:05
So, the relation is,
BST sort and quicksort make the
167
00:12:05 --> 00:12:15
same comparisons but in a
different order.
168
00:12:15 --> 00:12:25
169
00:12:25 --> 00:12:29
So, let me walk through the
same example we did before:
170
00:12:29 --> 00:12:33
three, one, eight,
two, six, seven,
171
00:12:33 --> 00:12:35
five.
So, there is an array.
172
00:12:35 --> 00:12:40
We are going to run a
particular version of quicksort.
173
00:12:40 --> 00:12:43
I have to be a little bit
careful here.
174
00:12:43 --> 00:12:47
It's sort of the obvious
version of quicksort.
175
00:12:47 --> 00:12:52
Remember, our standard,
boring quicksort is you take
176
00:12:52 --> 00:12:56
the first element as the
partition element.
177
00:12:56 --> 00:13:01
So, I'll take three here.
And, I split into the elements
178
00:13:01 --> 00:13:04
less than three,
which is one and two.
179
00:13:04 --> 00:13:07
And, the elements bigger than
three, which is eight,
180
00:13:07 --> 00:13:09
six, seven, five.
And, in this version of
181
00:13:09 --> 00:13:12
quicksort, I don't change the
order of the elements,
182
00:13:12 --> 00:13:13
eight, six, seven,
five.
183
00:13:13 --> 00:13:17
So, let's say the order is
preserved because only then will
184
00:13:17 --> 00:13:20
this equivalence hold.
So, this is sort of a stable
185
00:13:20 --> 00:13:22
partition algorithm.
It's easy enough to do.
186
00:13:22 --> 00:13:25
It's a particular version of
quicksort.
187
00:13:25 --> 00:13:27
And soon, we're going to
randomize it.
188
00:13:27 --> 00:13:32
And after we randomize,
this difference doesn't matter.
189
00:13:32 --> 00:13:35
OK, then on the left recursion,
we split in the partition
190
00:13:35 --> 00:13:38
element.
There is things less than one,
191
00:13:38 --> 00:13:41
which is nothing,
things bigger than one,
192
00:13:41 --> 00:13:44
which is two.
And then, that's our partition
193
00:13:44 --> 00:13:45
element.
We are done.
194
00:13:45 --> 00:13:48
Over here, we partition on
eight.
195
00:13:48 --> 00:13:51
Everything is less than eight.
So, we get six,
196
00:13:51 --> 00:13:53
seven, five,
nothing on the right.
197
00:13:53 --> 00:13:57
Then we partition at six.
We get things less than six,
198
00:13:57 --> 00:13:59
namely five,
things bigger than six,
199
00:13:59 --> 00:14:03
namely seven.
And, those are sort of
200
00:14:03 --> 00:14:06
partition elements in a trivial
way.
201
00:14:06 --> 00:14:11
Now, this tree that we get on
the partition elements looks an
202
00:14:11 --> 00:14:16
awful lot like this tree.
OK, it should be exactly the
203
00:14:16 --> 00:14:19
same tree.
And, you can walk through,
204
00:14:19 --> 00:14:22
what comparisons does quicksort
make?
205
00:14:22 --> 00:14:25
Well, first,
it compares everything to
206
00:14:25 --> 00:14:30
three, OK, except three itself.
Now, if you look over here,
207
00:14:30 --> 00:14:32
what happens when we are
inserting elements?
208
00:14:32 --> 00:14:35
Well, each time we insert an
element, the first thing we do
209
00:14:35 --> 00:14:37
is compare with three.
If it's less than,
210
00:14:37 --> 00:14:40
we go to the left branch.
If it's greater than,
211
00:14:40 --> 00:14:43
we go to the right branch.
So, we are making all these
212
00:14:43 --> 00:14:44
comparisons with three in both
cases.
213
00:14:44 --> 00:14:47
Then, if we have an element
less than three,
214
00:14:47 --> 00:14:49
it's either one or two.
If it's one,
215
00:14:49 --> 00:14:51
we're done.
No comparison happens here (one
216
00:14:51 --> 00:14:52
with itself).
But, we compare two to one.
217
00:14:52 --> 00:14:56
And indeed, when we insert two
over there after comparing it to
218
00:14:56 --> 00:14:59
three, we compare it to one.
And then we figure out that it
219
00:14:59 --> 00:15:01
goes here.
Same thing happens in
220
00:15:01 --> 00:15:04
quicksort.
For elements greater than
221
00:15:04 --> 00:15:08
three, we compare everyone to
eight here because we are
222
00:15:08 --> 00:15:12
partitioning with respect to
eight, and here because that's
223
00:15:12 --> 00:15:16
the next node after three.
As soon as eight is inserted,
224
00:15:16 --> 00:15:20
we compare everything with
eight to see in fact that's less
225
00:15:20 --> 00:15:23
than eight, and so on:
so, all of the same
226
00:15:23 --> 00:15:25
comparisons, just in a different
order.
227
00:15:25 --> 00:15:29
So, we turn 90°.
Kind of cool.
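The version of quicksort being described (first element as pivot, stable partition that preserves order on each side) can be sketched as below. The comparison-recording list is an illustrative addition, not part of the lecture's pseudocode, so you can check the claim that it makes the same comparisons as BST sort, just in a different order.

```python
# Stable-partition quicksort: pivot is the first element, and the
# relative order of the elements on each side is preserved.

def quicksort_stable(A, compared=None):
    if len(A) <= 1:
        return list(A)
    pivot, less, greater = A[0], [], []
    for x in A[1:]:
        if compared is not None:
            # record the unordered pair {pivot, x} that gets compared
            compared.append(frozenset((pivot, x)))
        (less if x < pivot else greater).append(x)
    return (quicksort_stable(less, compared)
            + [pivot]
            + quicksort_stable(greater, compared))

cmp_pairs = []
print(quicksort_stable([3, 1, 8, 2, 6, 7, 5], cmp_pairs))
# [1, 2, 3, 5, 6, 7, 8]; on this input it records 12 comparisons,
# the same pairs BST sort compares when inserting these elements.
print(len(cmp_pairs))  # 12
```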
228
00:15:29 --> 00:15:34
So, this has various
consequences in the analysis.
229
00:15:34 --> 00:15:50
230
00:15:50 --> 00:15:54
So, in particular,
the worst-case running time is
231
00:15:54 --> 00:15:58
theta n^2, which is not so
exciting.
232
00:15:58 --> 00:16:04
What we really care about is
the randomized version because
233
00:16:04 --> 00:16:10
that's what performs well.
So, randomized BST sort is just
234
00:16:10 --> 00:16:16
like randomized quicksort.
So, the first thing you do is
235
00:16:16 --> 00:16:21
randomly permute the array
uniformly, picking all
236
00:16:21 --> 00:16:24
permutations with equal
probability.
237
00:16:24 --> 00:16:31
And then, we call BST sort.
OK, this is basically what
238
00:16:31 --> 00:16:35
randomized quicksort could be
formulated as.
239
00:16:35 --> 00:16:40
And then, randomized BST sort
is going to make exactly the
240
00:16:40 --> 00:16:43
same comparisons as randomized
quicksort.
241
00:16:43 --> 00:16:48
Here, we are picking the root
essentially randomly.
242
00:16:48 --> 00:16:52
And here in quicksort,
you are picking the partition
243
00:16:52 --> 00:16:56
elements randomly.
It's the same difference.
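A sketch of the randomized variant just described: uniformly shuffle the array first, then do plain BST insertions. The list-based node representation `[key, left, right]` is only a convenience for the sketch.

```python
import random

# Randomized BST sort: random permutation, then ordinary tree inserts.

def bst_insert(root, key):
    # a node is [key, left, right]; an empty tree is None
    if root is None:
        return [key, None, None]
    if key < root[0]:
        root[1] = bst_insert(root[1], key)
    else:
        root[2] = bst_insert(root[2], key)
    return root

def inorder(root):
    if root is None:
        return []
    return inorder(root[1]) + [root[0]] + inorder(root[2])

def randomized_bst_sort(A):
    A = list(A)
    random.shuffle(A)           # uniformly random permutation first
    root = None
    for x in A:                 # then the simple tree-insert algorithm
        root = bst_insert(root, x)
    return inorder(root)

print(randomized_bst_sort([3, 1, 8, 2, 6, 7, 5]))  # [1, 2, 3, 5, 6, 7, 8]
```

Whatever permutation the shuffle picks, the in-order walk still returns sorted order; only the shape of the intermediate tree varies.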
244
00:16:56 --> 00:17:00
OK, so the time of this
algorithm equals the time of
245
00:17:00 --> 00:17:08
randomized quicksort because we
are making the same comparisons.
246
00:17:08 --> 00:17:10
So, the number of comparisons
is equal.
247
00:17:10 --> 00:17:11
And this is true as random
variables.
248
00:17:11 --> 00:17:13
The random variable,
the running time,
249
00:17:13 --> 00:17:16
this algorithm is equal to the
random variable of this
250
00:17:16 --> 00:17:17
algorithm.
In particular,
251
00:17:17 --> 00:17:20
the expectations are the same.
252
00:17:20 --> 00:17:33
253
00:17:33 --> 00:17:37
OK, and we know that the
expected running time of
254
00:17:37 --> 00:17:40
randomized quicksort on n
elements is?
255
00:17:40 --> 00:17:42
Oh boy.
n log n.
256
00:17:42 --> 00:17:45
Good.
I was a little worried there.
257
00:17:45 --> 00:17:49
OK, so in particular,
the expected running time of
258
00:17:49 --> 00:17:53
BST sort is n log n.
Obviously, this is not too
259
00:17:53 --> 00:17:57
exciting from a sorting point of
view.
260
00:17:57 --> 00:18:03
Sorting was just sort of to see
this connection.
261
00:18:03 --> 00:18:05
What we actually care about,
and the reason I've introduced
262
00:18:05 --> 00:18:08
this BST sort is what the tree
looks like.
263
00:18:08 --> 00:18:10
What we really want is that
search tree.
264
00:18:10 --> 00:18:11
The search tree can do more
than sort.
265
00:18:11 --> 00:18:14
In-order traversals are a pretty
boring thing to do with the
266
00:18:14 --> 00:18:16
search tree.
You can search in a search
267
00:18:16 --> 00:18:18
tree.
So, OK, that's still not so
268
00:18:18 --> 00:18:20
exciting.
You could sort the elements and
269
00:18:20 --> 00:18:22
then put them in an array and do
binary search.
270
00:18:22 --> 00:18:26
But, the point of binary search
trees, instead of binary search
271
00:18:26 --> 00:18:28
arrays, is that you can update
them dynamically.
272
00:18:28 --> 00:18:31
We won't be updating them
dynamically in this lecture,
273
00:18:31 --> 00:18:35
but we will on Wednesday and on
your problem set.
274
00:18:35 --> 00:18:36
For now, it's just sort of
warm-up.
275
00:18:36 --> 00:18:39
Let's say that the elements
aren't changing.
276
00:18:39 --> 00:18:41
We are building one tree from
the beginning.
277
00:18:41 --> 00:18:43
We have all n elements ahead of
time.
278
00:18:43 --> 00:18:45
We are going to build it
randomly.
279
00:18:45 --> 00:18:49
We randomly permute that array.
Then we throw all the elements
280
00:18:49 --> 00:18:52
into a binary search tree.
That's what BST sort does.
281
00:18:52 --> 00:18:54
Then it calls in-order
traversal.
282
00:18:54 --> 00:18:56
I don't really care about
in-order traversal.
283
00:18:56 --> 00:19:00
What I want,
because we've just analyzed it.
284
00:19:00 --> 00:19:04
It would be a short lecture if
I were done.
285
00:19:04 --> 00:19:11
What we want is this randomly
built BST, which is what we get
286
00:19:11 --> 00:19:18
out of this algorithm.
So, this is the tree resulting
287
00:19:18 --> 00:19:24
from randomized BST sort,
OK, resulting from randomly
288
00:19:24 --> 00:19:30
permuting the array and just
inserting those elements using
289
00:19:30 --> 00:19:36
the simple tree insert
algorithm.
290
00:19:36 --> 00:19:40
The question is,
what does that tree look like?
291
00:19:40 --> 00:19:45
And in particular,
is there anything we can
292
00:19:45 --> 00:19:50
conclude out of this fact?
The expected running time of
293
00:19:50 --> 00:19:55
BST sort is n log n.
OK, I've mentioned cursorily
294
00:19:55 --> 00:20:02
what the running time of BST
sort is, several times.
295
00:20:02 --> 00:20:06
It was the sum.
So, this is the time of BST
296
00:20:06 --> 00:20:11
sort on n elements.
It's the sum over all nodes,
297
00:20:11 --> 00:20:17
X, of the depth of that node.
OK, depth starts at zero and
298
00:20:17 --> 00:20:21
works its way down because the
root element,
299
00:20:21 --> 00:20:27
you don't make any comparisons
beyond that, you are making
300
00:20:27 --> 00:20:32
whatever the depth is
comparisons.
301
00:20:32 --> 00:20:40
OK, so we know that this thing
is, in expectation we know that
302
00:20:40 --> 00:20:47
this is n log n.
What does that tell us about
303
00:20:47 --> 00:20:52
the tree?
This is for all nodes,
304
00:20:52 --> 00:20:58
X, in the tree.
Does it tell us anything about
305
00:20:58 --> 00:21:03
the height of the tree,
for example?
306
00:21:03 --> 00:21:07
Yeah?
Right, intuitively,
307
00:21:07 --> 00:21:11
it says that the height of the
tree is theta log n,
308
00:21:11 --> 00:21:13
and not n.
But, in fact,
309
00:21:13 --> 00:21:17
it doesn't show that.
And if you felt that,
310
00:21:17 --> 00:21:21
that's just intuition,
but it may not be quite right.
311
00:21:21 --> 00:21:24
Indeed it's not.
Let me tell you what it does
312
00:21:24 --> 00:21:27
say.
So, if we take expectation of
313
00:21:27 --> 00:21:31
both sides, here we get n log n.
So, the expected value of that
314
00:21:31 --> 00:21:35
is n log n.
So, over here,
315
00:21:35 --> 00:21:41
well, we get the expected total
depth, which is not so exciting.
316
00:21:41 --> 00:21:45
Let's look at the expected
average depth.
317
00:21:45 --> 00:21:51
So, if I look at one over n,
the sum over all n nodes in the
318
00:21:51 --> 00:21:57
tree of the depth of X,
that would be the average depth
319
00:21:57 --> 00:22:02
over all the nodes.
And what I should get is theta
320
00:22:02 --> 00:22:06
n log n over n because I divided
by n on both sides.
321
00:22:06 --> 00:22:10
And, I'm using,
here, linearity of expectation,
322
00:22:10 --> 00:22:14
which is log n.
So, what this fact about the
323
00:22:14 --> 00:22:19
expected running time tells me
is that the average depth in the
324
00:22:19 --> 00:22:23
tree is log n,
which is not quite the height
325
00:22:23 --> 00:22:26
of the tree being log n.
326
00:22:26 --> 00:22:35
327
00:22:35 --> 00:22:39
OK, remember the height of the
tree is the maximum depth of any
328
00:22:39 --> 00:22:41
node.
Here, we are just bounding the
329
00:22:41 --> 00:22:43
average depth.
330
00:22:43 --> 00:23:04
331
00:23:04 --> 00:23:08
Let's look at an example of a
tree.
332
00:23:08 --> 00:23:14
I'll draw my favorite picture.
So, here we have a nice
333
00:23:14 --> 00:23:20
balanced tree,
let's say, on half of the nodes
334
00:23:20 --> 00:23:25
or a little more.
And then, I have one really
335
00:23:25 --> 00:23:30
long path hanging off one
particular leaf.
336
00:23:30 --> 00:23:37
It doesn't matter which one.
And, I'm going to say that this
337
00:23:37 --> 00:23:41
path has length,
with a total height here,
338
00:23:41 --> 00:23:45
I want to make root n,
which is a lot bigger than log
339
00:23:45 --> 00:23:47
n.
This is roughly log n.
340
00:23:47 --> 00:23:51
It's going to be log of n minus
root n, or so,
341
00:23:51 --> 00:23:54
roughly.
So, most of the nodes have
342
00:23:54 --> 00:23:58
logarithmic height and,
sorry, logarithmic depth.
343
00:23:58 --> 00:24:03
If you compute the average
depth in this particular tree,
344
00:24:03 --> 00:24:06
for most of the nodes,
let's say it's,
345
00:24:06 --> 00:24:12
at most, n of the nodes have
height log n.
346
00:24:12 --> 00:24:15
And then, there are root n
nodes, at most,
347
00:24:15 --> 00:24:19
down here, which have depth,
at most, root n.
348
00:24:19 --> 00:24:22
So, it's, at most,
root n times root n.
349
00:24:22 --> 00:24:26
In fact, it's like half that,
but not a big deal.
350
00:24:26 --> 00:24:29
So, this is n.
So, this is n log n,
351
00:24:29 --> 00:24:34
or, sorry, average depth:
I have to divide everything by
352
00:24:34 --> 00:24:38
n.
n log n would be rather large
353
00:24:38 --> 00:24:42
for an average height,
average depth.
354
00:24:42 --> 00:24:48
So, the average depth here is
log n, but the height of the
355
00:24:48 --> 00:24:53
tree is square root of n.
So, this is not enough.
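A back-of-the-envelope check of this counterexample shape: assume a perfect balanced tree on n nodes with a path of about root n extra nodes hanging off one deepest leaf. The model below is an illustration of that picture, not the lecture's calculation verbatim.

```python
import math

# Counterexample shape: balanced tree on n nodes plus a sqrt(n)-long
# path hanging off a deepest leaf. Height is ~sqrt(n), yet the
# average depth stays O(log n).

def avg_depth_and_height(n):
    k = math.isqrt(n)                      # length of the hanging path
    # In a complete binary tree, node i (0-indexed, level order)
    # has depth floor(log2(i + 1)).
    total = sum(int(math.log2(i + 1)) for i in range(n))
    log_n = int(math.log2(n))
    # Path nodes sit at depths log n + 1, ..., log n + k.
    total += sum(log_n + j for j in range(1, k + 1))
    height = log_n + k
    return total / (n + k), height

avg, height = avg_depth_and_height(1 << 16)
print(avg, height)   # height == 16 + 256 == 272, avg stays near log2(n) = 16
```

So the average depth stays logarithmic while the height is a polynomial root of n, which is exactly why the n log n running-time bound alone cannot bound the height.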
356
00:24:53 --> 00:24:59
Just to know that the average
depth is log n doesn't mean that
357
00:24:59 --> 00:25:04
the height is log n.
OK, but the claim is this
358
00:25:04 --> 00:25:10
theorem for today is that the
expected height of a randomly
359
00:25:10 --> 00:25:16
built binary search tree is
indeed log n.
360
00:25:16 --> 00:25:21
BST is order log n.
This is what we like to know
361
00:25:21 --> 00:25:26
because that tells us,
if we just build a binary
362
00:25:26 --> 00:25:31
search tree randomly,
then we can search in it in log
363
00:25:31 --> 00:25:34
n time.
OK, for sorting,
364
00:25:34 --> 00:25:38
it's not as big a deal.
We just care about the expected
365
00:25:38 --> 00:25:41
running time of creating the
thing.
366
00:25:41 --> 00:25:44
Here, now we know that once we
prove this theorem,
367
00:25:44 --> 00:25:48
we know that we can search
quickly in expectation,
368
00:25:48 --> 00:25:53
in fact, most of the time.
So, the rest of today's lecture
369
00:25:53 --> 00:25:56
will be proving this theorem.
It's quite tricky,
370
00:25:56 --> 00:26:00
as you might imagine.
It's another big probability
371
00:26:00 --> 00:26:06
analysis along the lines of
quicksort and everything.
372
00:26:06 --> 00:26:22
373
00:26:22 --> 00:26:26
So, I'm going to start with an
outline of the proof,
374
00:26:26 --> 00:26:31
unless there are any questions
about the theorem.
375
00:26:31 --> 00:26:35
It should be pretty clear what
we want to prove.
376
00:26:35 --> 00:26:40
This is even weirder than most
of the analyses we've seen.
377
00:26:40 --> 00:26:45
It's going to use a fancy
trick, which is exponentiating a
378
00:26:45 --> 00:26:50
random variable.
And to do that we need a tool
379
00:26:50 --> 00:26:54
called Jensen's inequality.
We are going to prove that
380
00:26:54 --> 00:26:57
tool.
Usually, we don't prove
381
00:26:57 --> 00:27:01
probability tools.
But this one we are going to
382
00:27:01 --> 00:27:03
prove.
It's not too hard.
383
00:27:03 --> 00:27:09
It's also basic analysis.
So, the lemma,
384
00:27:09 --> 00:27:13
says that if we have what's
called to a convex function,
385
00:27:13 --> 00:27:17
f, and you should all know what
that means, but I'll define it
386
00:27:17 --> 00:27:21
soon in case you have forgotten.
If you have a convex function,
387
00:27:21 --> 00:27:25
f, and you have a random
variable, X, you take f of the
388
00:27:25 --> 00:27:27
expectation.
That's, at most,
389
00:27:27 --> 00:27:32
the expectation of f of that
random variable.
390
00:27:32 --> 00:27:40
Think about it enough and draw
a convex function that is fairly
391
00:27:40 --> 00:27:46
intuitive, I guess.
But we will prove it.
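Before the proof, a quick numeric sanity check (not a proof) of the lemma just stated, using the convex function f(x) = 2^x: f(E[X]) should come out at most E[f(X)]. The small two-point distribution here is a made-up example.

```python
# Numeric check of Jensen's inequality for the convex f(x) = 2**x:
# f(E[X]) <= E[f(X)] for a discrete random variable X.

def check_jensen(values, probs):
    """Return (f(E[X]), E[f(X)]) for f(x) = 2**x."""
    e_x = sum(p * v for v, p in zip(values, probs))
    e_fx = sum(p * 2 ** v for v, p in zip(values, probs))
    return 2 ** e_x, e_fx

# X is 1 or 3 with probability 1/2 each, so E[X] = 2.
lhs, rhs = check_jensen([1, 3], [0.5, 0.5])
print(lhs, rhs)   # 4.0 5.0 -- indeed f(E[X]) <= E[f(X)]
```

The gap (4 versus 5) is exactly the slack convexity buys; for a constant X the two sides would be equal.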
392
00:27:46 --> 00:27:54
What that allows us to do is
instead of analyzing the random
393
00:27:54 --> 00:28:00
variable that tells us the
height of a tree,
394
00:28:00 --> 00:28:06
so, X_n I'll call the random
variable, RV,
395
00:28:06 --> 00:28:13
of the height of a BST,
randomly constructed BST on n
396
00:28:13 --> 00:28:21
nodes we will analyze.
Well, instead of analyzing this
397
00:28:21 --> 00:28:27
desired random variable,
X_n, sorry, this should have
398
00:28:27 --> 00:28:32
been in capital X.
We can analyze any convex
399
00:28:32 --> 00:28:35
function of X_n.
And, we're going to analyze the
400
00:28:35 --> 00:28:39
exponentiation.
So, I'm going to define Y_n to
401
00:28:39 --> 00:28:43
be two to the power of X_n.
OK, the big question here is
402
00:28:43 --> 00:28:47
why bother doing this?
The answer is because it works
403
00:28:47 --> 00:28:50
and it wouldn't work if we
analyze X_n.
404
00:28:50 --> 00:28:54
We will see some intuition of
that later on,
405
00:28:54 --> 00:28:59
but it's not very intuitive.
This is our analysis where you
406
00:28:59 --> 00:29:03
need this extra trick.
So, we're going to bound the
407
00:29:03 --> 00:29:05
expectation of Y_n,
and from that,
408
00:29:05 --> 00:29:09
and using Jensen's inequality,
we're going to get a bound on
409
00:29:09 --> 00:29:12
the expectation of X_n,
a pretty tight bound,
410
00:29:12 --> 00:29:16
actually, because if we can
bound the exponent up to
411
00:29:16 --> 00:29:18
constant factors,
the exponentiation up to
412
00:29:18 --> 00:29:21
constant factors,
we can bound X_n even better
413
00:29:21 --> 00:29:23
because you take logs to get
X_n.
414
00:29:23 --> 00:29:28
So, we will even figure out
what the constant is.
415
00:29:28 --> 00:29:33
So, what we will prove,
this is the heart of the proof,
416
00:29:33 --> 00:29:37
is that the expected value of
Y_n is order n^3.
417
00:29:37 --> 00:29:42
Here, we won't really know what
the constant is.
418
00:29:42 --> 00:29:46
We don't need to.
And then, we put these pieces
419
00:29:46 --> 00:29:49
together.
So, let's do that.
420
00:29:49 --> 00:29:54
What we really care about is
the expectation of X_n,
421
00:29:54 --> 00:29:57
which is the height of our
tree.
422
00:29:57 --> 00:30:02
What we find out about is this
fact.
423
00:30:02 --> 00:30:05
So, leave some horizontal space
here.
424
00:30:05 --> 00:30:09
We get the expectation of two
to the X_n.
425
00:30:09 --> 00:30:14
That's the expectation of Y_n.
So, we learned that that's
426
00:30:14 --> 00:30:18
order n^3.
And, Jensen's inequality tells
427
00:30:18 --> 00:30:23
us that if we take this
function, two to the X,
428
00:30:23 --> 00:30:27
we plug it in here,
that on the left-hand side we
429
00:30:27 --> 00:30:33
get two to the E of X.
So, we get two to the E of X_n
430
00:30:33 --> 00:30:38
is at most E of two to the X_n.
So, that's where we use
431
00:30:38 --> 00:30:43
Jensen's inequality,
because what we care about is E
432
00:30:43 --> 00:30:46
of X_n.
So now, we have a bound.
433
00:30:46 --> 00:30:50
We say, well,
two to the E of X_n is,
434
00:30:50 --> 00:30:54
at most, n^3.
So, if we take the log of both
435
00:30:54 --> 00:31:00
sides, we get E of X_n is,
at most, the log of n^3.
436
00:31:00 --> 00:31:05
OK, I will write it in this
funny way, log of order n^3,
437
00:31:05 --> 00:31:09
which will actually tell us the
constant.
438
00:31:09 --> 00:31:12
This is three log n plus order
one.
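The chain of bounds just walked through can be written compactly; this is only a restatement of the steps above, with the constants absorbed into the O-terms.

```latex
% Jensen's inequality with the convex function f(x) = 2^x,
% applied to the height X_n and Y_n = 2^{X_n}:
2^{\mathbb{E}[X_n]} \;\le\; \mathbb{E}\!\left[2^{X_n}\right]
                    \;=\; \mathbb{E}[Y_n] \;=\; O(n^3).
% Taking logarithms (base 2) of both sides:
\mathbb{E}[X_n] \;\le\; \log\!\left(O(n^3)\right)
                \;=\; 3\log n + O(1).
```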
439
00:31:12 --> 00:31:18
So, we will prove that the
expected height of a randomly
440
00:31:18 --> 00:31:24
constructed binary search tree
on n nodes is roughly three log
441
00:31:24 --> 00:31:28
n, at most.
OK, I will say more about that
442
00:31:28 --> 00:31:31
later.
So, you've now seen the end of
443
00:31:31 --> 00:31:35
the proof.
That's the foreshadowing.
444
00:31:35 --> 00:31:38
And now, this is the top-down
approach.
445
00:31:38 --> 00:31:41
So, you sort of see what the
steps are.
446
00:31:41 --> 00:31:44
Now, we just have to do the
steps.
447
00:31:44 --> 00:31:46
OK, step one:
take a bit of work,
448
00:31:46 --> 00:31:50
but it's easy because it's
pretty basic stuff.
449
00:31:50 --> 00:31:54
Step two is just a definition
and we are done.
450
00:31:54 --> 00:31:57
Step three is probably the
hardest part.
451
00:31:57 --> 00:32:03
Step four, we've already done.
So, let's start with step one.
452
00:32:03 --> 00:32:16
453
00:32:16 --> 00:32:22
So, the first thing I need to
do is define a convex function
454
00:32:22 --> 00:32:29
because we are going to
manipulate the definition a fair
455
00:32:29 --> 00:32:33
amount.
So, this is a notion from real
456
00:32:33 --> 00:32:36
analysis.
Analysis is a fancy word for
457
00:32:36 --> 00:32:40
calculus if you haven't taken
the proper analysis class.
458
00:32:40 --> 00:32:44
You should have seen convexity
in any calculus class.
459
00:32:44 --> 00:32:47
A convex function is one that
looks like this.
460
00:32:47 --> 00:32:50
OK, good.
One way to formalize that
461
00:32:50 --> 00:32:53
notion is to consider any two
points on this curve.
462
00:32:53 --> 00:32:57
So, I'm only interested in
functions from reals to reals.
463
00:32:57 --> 00:33:01
So, it looks like this.
This is f of something.
464
00:33:01 --> 00:33:05
And, this is the something.
If I take two points on this
465
00:33:05 --> 00:33:08
curve, and I draw a line segment
connecting them,
466
00:33:08 --> 00:33:11
that line segment is always
above the curve.
467
00:33:11 --> 00:33:13
That's the meaning of
convexity.
468
00:33:13 --> 00:33:16
It has a geometric notion,
which is basically the same.
469
00:33:16 --> 00:33:19
But for functions,
this line segment should stay
470
00:33:19 --> 00:33:22
above the curve.
The line does not stay above
471
00:33:22 --> 00:33:24
the curve.
If I extended it farther,
472
00:33:24 --> 00:33:26
it goes beneath the curve,
of course.
473
00:33:26 --> 00:33:31
But, that segment should.
So, I'm going to formalize that
474
00:33:31 --> 00:33:33
a little bit.
I'll call this x,
475
00:33:33 --> 00:33:37
and then this is f of x.
And, I'll call this y,
476
00:33:37 --> 00:33:41
and this is f of y.
So, the claim is that I take
477
00:33:41 --> 00:33:44
any number between x and y,
and I look up,
478
00:33:44 --> 00:33:48
and I say, OK,
here's the point on the curve.
479
00:33:48 --> 00:33:50
Here's the point on the line
segment.
480
00:33:50 --> 00:33:54
The value of that point on the
y value, here,
481
00:33:54 --> 00:33:58
should be greater than or equal
to the y value here,
482
00:33:58 --> 00:34:01
OK?
To figure out what the point
483
00:34:01 --> 00:34:06
is, we need some,
I would call it geometry.
484
00:34:06 --> 00:34:08
I'm sure it's an analysis
concept, too.
485
00:34:08 --> 00:34:12
But, I'm a geometer,
so I get to call it geometry.
486
00:34:12 --> 00:34:16
If you have two points,
p and q, and you want to
487
00:34:16 --> 00:34:19
parameterize this line segment
between them,
488
00:34:19 --> 00:34:24
so, I want to parameterize some
points here, the way to do it is
489
00:34:24 --> 00:34:29
to take a linear combination.
And, if you've taken
490
00:34:29 --> 00:34:32
some linear algebra,
a linear combination looks
491
00:34:32 --> 00:34:35
something like this.
And, in fact,
492
00:34:35 --> 00:34:39
we're going to take something
called an affine combination
493
00:34:39 --> 00:34:41
where alpha plus beta equals
one.
494
00:34:41 --> 00:34:43
It turns out,
if you take all such points,
495
00:34:43 --> 00:34:45
some number,
alpha, times the point,
496
00:34:45 --> 00:34:48
p, plus some number,
beta times the point,
497
00:34:48 --> 00:34:50
q, where alpha plus beta equals
one.
498
00:34:50 --> 00:34:53
If you take all those points,
you get the entire line here,
499
00:34:53 --> 00:34:56
which is nifty.
But, we don't want the entire
500
00:34:56 --> 00:34:58
line.
If you also constrained alpha
501
00:34:58 --> 00:35:01
and beta to be nonnegative,
you just get this line segment.
502
00:35:01 --> 00:35:05
So, this forces alpha and beta
to be between zero and one
503
00:35:05 --> 00:35:10
because they have to sum to one,
and they are nonnegative.
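A minimal sketch of this parameterization in code, using a hypothetical segment from p to q in the plane:

```python
# A point alpha*p + beta*q with alpha + beta = 1 and alpha, beta >= 0
# lies on the segment between p and q.  A quick check in 2D:
def affine(p, q, alpha):
    beta = 1 - alpha
    return (alpha * p[0] + beta * q[0], alpha * p[1] + beta * q[1])

p, q = (0.0, 0.0), (4.0, 2.0)
for alpha in [0.0, 0.25, 0.5, 1.0]:
    x, y = affine(p, q, alpha)
    # every such point satisfies the line through p and q (here y = x/2)
    assert abs(y - x / 2) < 1e-9
    # and stays within the bounding box of the segment
    assert 0.0 <= x <= 4.0 and 0.0 <= y <= 2.0
```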
504
00:35:10 --> 00:35:14
So, what we are going to do
here is take alpha times x plus
505
00:35:14 --> 00:35:17
beta times y.
That's going to be our point
506
00:35:17 --> 00:35:22
between with these constraints:
alpha plus beta equals one.
507
00:35:22 --> 00:35:26
Alpha and beta are greater than
or equal to zero.
508
00:35:26 --> 00:35:31
Then, this point is f of that.
This is f of alpha x plus beta
509
00:35:31 --> 00:35:34
y.
And, this point is the linear
510
00:35:34 --> 00:35:38
interpolation between f of x and
f of y, the same one.
511
00:35:38 --> 00:35:42
So, it's alpha times f of x
plus beta times f of y.
512
00:35:42 --> 00:35:46
OK, that's the intuition.
If you didn't follow it,
513
00:35:46 --> 00:35:51
it's not too big a deal because
all we care about is the
514
00:35:51 --> 00:35:54
symbolic answer for proving
things.
515
00:35:54 --> 00:35:56
But, that's where this comes
from.
516
00:35:56 --> 00:36:03
So, here's the definition.
A function is convex if,
517
00:36:03 --> 00:36:09
for all x and y,
and all alpha and beta are
518
00:36:09 --> 00:36:16
greater than or equal to zero,
whose sum is one,
519
00:36:16 --> 00:36:25
we have f of alpha x plus beta
y is less than or equal to alpha
520
00:36:25 --> 00:36:32
f of x plus beta f of y.
So, that's just saying that
521
00:36:32 --> 00:36:38
this y coordinate here is less
than or equal to this y
522
00:36:38 --> 00:36:41
coordinate.
OK, but that's the symbolism
523
00:36:41 --> 00:36:46
behind that picture.
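The definition can be spot-checked numerically; here is a sketch for f(x) = 2^x, the convex function the lecture will use:

```python
import random

# Convexity: f(alpha*x + beta*y) <= alpha*f(x) + beta*f(y) whenever
# alpha + beta = 1 and alpha, beta >= 0.  Check it for f(x) = 2**x.
f = lambda x: 2 ** x
random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-10, 10), random.uniform(-10, 10)
    alpha = random.random()
    beta = 1 - alpha
    assert f(alpha * x + beta * y) <= alpha * f(x) + beta * f(y) + 1e-9
```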
OK, so now we want to prove
524
00:36:46 --> 00:36:51
Jensen's inequality.
OK, we're not quite there yet.
525
00:36:51 --> 00:36:57
We are going to prove a simple
lemma, from which it will be
526
00:36:57 --> 00:37:02
easy to derive Jensen's
inequality.
527
00:37:02 --> 00:37:07
So, this is the theorem we are
proving.
528
00:37:07 --> 00:37:13
So, here's a lemma about convex
functions.
529
00:37:13 --> 00:37:22
You may have seen it before.
It will be crucial to Jensen's
530
00:37:22 --> 00:37:25
inequality.
So, suppose,
531
00:37:25 --> 00:37:34
this is a statement about
affine combinations of n things
532
00:37:34 --> 00:37:41
instead of two things.
So, this will say that
533
00:37:41 --> 00:37:46
convexity can be generalized to
taking n things.
534
00:37:46 --> 00:37:52
So, suppose we have n real
numbers, and we have n values
535
00:37:52 --> 00:37:55
alpha i, alpha one up to alpha
n.
536
00:37:55 --> 00:38:00
They are all nonnegative.
And, their sum is one.
537
00:38:00 --> 00:38:06
So, the sum of alpha k,
I guess, k equals one to n,
538
00:38:06 --> 00:38:11
is one.
So, those are the assumptions.
539
00:38:11 --> 00:38:18
The conclusion is the same
thing, but summing over all k.
540
00:38:18 --> 00:38:22
So, k equals one to n,
alpha_k * x_k.
541
00:38:22 --> 00:38:29
Take f of that versus taking
the sum of the alphas times the
542
00:38:29 --> 00:38:32
f's.
k equals one to n.
543
00:38:32 --> 00:38:37
So, the definition of convexity
is exactly that statement,
544
00:38:37 --> 00:38:42
but where n equals two.
OK, alpha one and alpha two are
545
00:38:42 --> 00:38:46
alpha and beta.
This is just a statement for
546
00:38:46 --> 00:38:50
general n.
And, you can interpret this in
547
00:38:50 --> 00:38:53
some funnier way,
which I won't get into.
548
00:38:53 --> 00:38:56
Oh, sure, why not?
I'm a geometer.
549
00:38:56 --> 00:39:03
So, this is saying you take
several points on this curve.
550
00:39:03 --> 00:39:05
You take the polygon that they
define.
551
00:39:05 --> 00:39:07
So, these are straight-line
segments.
552
00:39:07 --> 00:39:10
You take the interior.
If you take an affine
553
00:39:10 --> 00:39:13
combination like that,
you will get a point inside
554
00:39:13 --> 00:39:16
that polygon,
or possibly on the boundary.
555
00:39:16 --> 00:39:20
The claim is that all those
points are above the curve.
556
00:39:20 --> 00:39:23
Again, intuitively:
true if you draw a nice,
557
00:39:23 --> 00:39:25
canonical convex curve,
but in fact,
558
00:39:25 --> 00:39:27
it's true algebraically,
too.
559
00:39:27 --> 00:39:33
It's always a good thing.
Any suggestions on how we might
560
00:39:33 --> 00:39:36
prove this theorem,
this lemma?
561
00:39:36 --> 00:39:40
It's pretty easy.
So, what technique might we use
562
00:39:40 --> 00:39:44
to prove it?
One word: induction.
563
00:39:44 --> 00:39:46
Always a good answer,
yeah.
564
00:39:46 --> 00:39:52
Induction should shout out at
you here because we already know
565
00:39:52 --> 00:40:00
that this is true by definition
of convexity for n equals two.
566
00:40:00 --> 00:40:04
So, the base case is clear.
In fact, there's an even
567
00:40:04 --> 00:40:08
simpler base case,
which is when n equals one.
568
00:40:08 --> 00:40:13
If n equals one,
then you have one number that
569
00:40:13 --> 00:40:16
sums to one.
So, alpha one is one.
570
00:40:16 --> 00:40:19
And so, nothing is going on
here.
571
00:40:19 --> 00:40:23
This is just saying that f of
one times x_1 is,
572
00:40:23 --> 00:40:28
at most, one times f of x_1:
so, not terribly exciting
573
00:40:28 --> 00:40:33
because that holds with
equality.
574
00:40:33 --> 00:40:37
OK, so we don't even need the n
equals two base case.
575
00:40:37 --> 00:40:42
So, the interesting part,
although still not terribly
576
00:40:42 --> 00:40:45
interesting, is the induction
step.
577
00:40:45 --> 00:40:48
This is good practice in
induction.
578
00:40:48 --> 00:40:53
So, what we care about is this
f of the affine combination,
579
00:40:53 --> 00:40:57
alpha_k times x_k summed over all
580
00:40:57 --> 00:41:01
k.
Now, what I would like to do is
581
00:41:01 --> 00:41:05
apply induction.
What I know about inductively,
582
00:41:05 --> 00:41:09
is say f of this sum,
if it's summed only up to n
583
00:41:09 --> 00:41:12
minus one instead of all the way
up to n.
584
00:41:12 --> 00:41:16
Any smaller sum I can deal with
by induction.
585
00:41:16 --> 00:41:20
So, I'm going to try and get
rid of the nth term.
586
00:41:20 --> 00:41:24
I want to separate it out.
And, this is fairly natural if
587
00:41:24 --> 00:41:28
you've played with affine
combinations before.
588
00:41:28 --> 00:41:35
But it's just some algebra.
So, I want to separate out the
589
00:41:35 --> 00:41:40
alpha_n*x_n term.
And, I'd also like to make it
590
00:41:40 --> 00:41:45
an affine combination.
This is the trick.
591
00:41:45 --> 00:41:50
Sorry, no f here.
If I just removed the last
592
00:41:50 --> 00:41:57
term, the alpha k's from one up
to n minus one wouldn't sum to
593
00:41:57 --> 00:42:02
one anymore.
They'd sum to something
594
00:42:02 --> 00:42:05
smaller.
So, I can't just take out this
595
00:42:05 --> 00:42:08
term.
I'm going to have to do some
596
00:42:08 --> 00:42:10
trickery here: multiply and
divide by one minus alpha_n.
597
00:42:10 --> 00:42:13
Good.
So, you should see why this is
598
00:42:13 --> 00:42:17
true, because the one minus
alpha n's cancel.
599
00:42:17 --> 00:42:22
And then, I'm just getting the
sum of alpha_k*x_k,
600
00:42:22 --> 00:42:28
k equals one to n minus one,
plus the alpha_n*x_n term.
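The algebra of this split can be verified directly (a sketch with arbitrary made-up weights):

```python
import random

# The trick in the induction step: pull out the nth term and rescale the
# rest so each part is again an affine combination.
#   sum_k a_k*x_k == a_n*x_n + (1 - a_n) * sum_{k<n} (a_k/(1-a_n)) * x_k
random.seed(1)
raw = [random.random() for _ in range(5)]
a = [r / sum(raw) for r in raw]          # nonnegative weights summing to 1
x = [random.uniform(-5, 5) for _ in range(5)]

lhs = sum(ak * xk for ak, xk in zip(a, x))
rescaled = [ak / (1 - a[-1]) for ak in a[:-1]]
rhs = a[-1] * x[-1] + (1 - a[-1]) * sum(rk * xk
                                        for rk, xk in zip(rescaled, x[:-1]))
assert abs(lhs - rhs) < 1e-9
assert abs(sum(rescaled) - 1) < 1e-9     # the rescaled weights sum to 1
```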
601
00:42:28 --> 00:42:30
So, I haven't done anything
here.
602
00:42:30 --> 00:42:32
These are equal.
But now, I have this nifty
603
00:42:32 --> 00:42:36
feature, that on the one hand,
these two numbers,
604
00:42:36 --> 00:42:38
alpha n and one minus alpha n
sum to one.
605
00:42:38 --> 00:42:41
And on the other hand,
if I did it right,
606
00:42:41 --> 00:42:45
these numbers should sum up to
one just going from one up to n
607
00:42:45 --> 00:42:47
minus one.
Why do they sum up to one?
608
00:42:47 --> 00:42:51
Well, these numbers summed up
to one minus alpha n.
609
00:42:51 --> 00:42:54
And so, I'm dividing everything
by one minus alpha n.
610
00:42:54 --> 00:42:57
So, they will sum to one.
So now, I have two affine
611
00:42:57 --> 00:43:02
combinations.
I just apply the two things
612
00:43:02 --> 00:43:07
that I know.
I know this affine combination
613
00:43:07 --> 00:43:10
will work because,
well, why?
614
00:43:10 --> 00:43:16
Why can I say that this is
alpha n f of x_n plus one minus
615
00:43:16 --> 00:43:20
alpha n f of this crazy sum?
616
00:43:20 --> 00:43:35
617
00:43:35 --> 00:43:41
Shout it out.
There are two possible answers.
618
00:43:41 --> 00:43:47
One is correct,
and one is incorrect.
619
00:43:47 --> 00:43:55
So, which will it be?
This should have been less than
620
00:43:55 --> 00:44:01
or equal to.
That's important.
621
00:44:01 --> 00:44:04
It's on the board.
It can't be too difficult.
622
00:44:04 --> 00:44:17
623
00:44:17 --> 00:44:21
So, I'm treating this as just
one big X value.
624
00:44:21 --> 00:44:26
So, I have some x_n,
and I have some crazy X.
625
00:44:26 --> 00:44:31
I want f of the affine
combination of those two X
626
00:44:31 --> 00:44:36
values is, at most,
the affine combinations of the
627
00:44:36 --> 00:44:40
f's of those X values.
This is?
628
00:44:40 --> 00:44:43
It is the inductive hypothesis
where n equals two.
629
00:44:43 --> 00:44:45
Unfortunately,
we didn't prove the n equals
630
00:44:45 --> 00:44:49
two case as a special base case.
So, we can't use induction here
631
00:44:49 --> 00:44:52
the way that I've stated the
base case.
632
00:44:52 --> 00:44:55
If you did n equals two base
case, you can do that.
633
00:44:55 --> 00:44:58
Here, we can't.
So, the other answer is by
634
00:44:58 --> 00:45:02
convexity, good.
That's right here.
635
00:45:02 --> 00:45:08
So, f is convex.
We know that this is true for
636
00:45:08 --> 00:45:15
any two X values,
and provided these two sum to
637
00:45:15 --> 00:45:20
one.
So, we know that this is true.
638
00:45:20 --> 00:45:28
Now is when we apply induction.
So, now we are going to
639
00:45:28 --> 00:45:35
manipulate this right term by
induction.
640
00:45:35 --> 00:45:40
See, before we didn't
necessarily know that n was
641
00:45:40 --> 00:45:44
bigger than two.
But, we know that n is bigger
642
00:45:44 --> 00:45:49
than n minus one.
That much, I can be sure of.
643
00:45:49 --> 00:45:53
So, this is one minus alpha n
times the sum,
644
00:45:53 --> 00:46:00
k equals one to n minus one of
alpha k over one minus alpha n
645
00:46:00 --> 00:46:05
times f of x_k,
if I got that right.
646
00:46:05 --> 00:46:09
This is by induction,
the induction hypothesis,
647
00:46:09 --> 00:46:16
because these alpha k's over
one minus alpha n sum to one.
648
00:46:16 --> 00:46:22
Now, these one minus alpha n's
cancel, and we just get what we
649
00:46:22 --> 00:46:26
want.
This is sum k equals one to n
650
00:46:26 --> 00:46:31
of alpha k, f of x_k.
So, we get f of the sum is,
651
00:46:31 --> 00:46:37
at most, sum of the f's.
That proves the lemma.
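The lemma itself is easy to test numerically (a sketch, again with f(x) = 2^x and random weights summing to one):

```python
import random

# The lemma: for nonnegative alpha_k summing to 1,
#   f(sum_k alpha_k * x_k) <= sum_k alpha_k * f(x_k).
f = lambda x: 2 ** x
random.seed(2)
for _ in range(200):
    n = random.randint(1, 10)
    raw = [random.random() for _ in range(n)]
    alpha = [r / sum(raw) for r in raw]
    x = [random.uniform(-8, 8) for _ in range(n)]
    lhs = f(sum(a * xi for a, xi in zip(alpha, x)))
    rhs = sum(a * f(xi) for a, xi in zip(alpha, x))
    assert lhs <= rhs + 1e-9
```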
652
00:46:37 --> 00:46:43
OK, a bit tedious,
but each step is pretty
653
00:46:43 --> 00:46:46
straightforward.
Do you agree?
654
00:46:46 --> 00:46:53
Now, it turns out to be
relatively straightforward to
655
00:46:53 --> 00:47:00
prove Jensen's inequality.
That's the magic.
656
00:47:00 --> 00:47:04
And then, we get to do the
expectation analysis.
657
00:47:04 --> 00:47:09
So, we use our good friends,
indicator random variables.
658
00:47:09 --> 00:47:13
OK, but for now,
we just want to prove this
659
00:47:13 --> 00:47:16
statement.
If we have a convex function,
660
00:47:16 --> 00:47:21
f of the expectation is,
at most, expectation of f of
661
00:47:21 --> 00:47:26
that random variable.
OK, this is a random variable,
662
00:47:26 --> 00:47:29
right?
If you want to sample from this
663
00:47:29 --> 00:47:33
random variable,
you sample from X,
664
00:47:33 --> 00:47:39
and then you apply f to it.
That's the meaning of this
665
00:47:39 --> 00:47:45
notation, f of X because X is a
random variable.
666
00:47:45 --> 00:47:51
We get to use that f is convex.
OK, it turns out this is not
667
00:47:51 --> 00:47:57
hard, if you remember the
definition of expectation,
668
00:47:57 --> 00:48:01
oh, I want to make one more
assumption here,
669
00:48:01 --> 00:48:08
which is that X is integral.
So, it's an integer random
670
00:48:08 --> 00:48:11
variable, meaning it takes
integer values.
671
00:48:11 --> 00:48:16
OK, that's all we care about
because we're looking at running
672
00:48:16 --> 00:48:19
times.
This statement is true for
673
00:48:19 --> 00:48:24
continuous random variables,
too, but I would like to do the
674
00:48:24 --> 00:48:29
discrete case because then I get
to write down what E of X is.
675
00:48:29 --> 00:48:34
So, what is the definition of E
of X?
676
00:48:34 --> 00:48:40
X only takes on integer values.
This is easy,
677
00:48:40 --> 00:48:47
but you have to remember it.
It's a good drill.
678
00:48:47 --> 00:48:55
I don't really know much about
X except that it takes on
679
00:48:55 --> 00:49:02
integer values.
Any suggestions on how I should
680
00:49:02 --> 00:49:10
expand the expectation of X?
How many people know this by
681
00:49:10 --> 00:49:14
heart?
OK, it's not too easy then.
682
00:49:14 --> 00:49:20
Well, expectation has something
to do with probability,
683
00:49:20 --> 00:49:23
right?
So, I should be looking at
684
00:49:23 --> 00:49:29
something like the probability
that X equals some value,
685
00:49:29 --> 00:49:32
x.
That would seem like a good
686
00:49:32 --> 00:49:36
thing to do.
What else goes here?
687
00:49:36 --> 00:49:39
A sum, yeah.
The sum, well,
688
00:49:39 --> 00:49:44
X could be somewhere between
minus infinity and infinity.
689
00:49:44 --> 00:49:49
That's certainly true.
And, we have some more.
690
00:49:49 --> 00:49:54
There's something missing here.
What is this sum?
691
00:49:54 --> 00:49:58
What does it come out to for
any random variable,
692
00:49:58 --> 00:50:03
X, that takes on integer
values?
693
00:50:03 --> 00:50:06
One, good.
So, I need to add in something
694
00:50:06 --> 00:50:10
here, namely little x.
OK, that's the definition of
695
00:50:10 --> 00:50:13
the expectation.
Now, f of a sum of things,
696
00:50:13 --> 00:50:18
where these coefficients sum to
one looks an awful lot like the
697
00:50:18 --> 00:50:23
lemma that we just proved.
OK, we proved it in the finite
698
00:50:23 --> 00:50:25
case.
It turns out,
699
00:50:25 --> 00:50:30
it holds just as well if you
take all integers.
700
00:50:30 --> 00:50:33
So, I'm just going to assume
that.
701
00:50:33 --> 00:50:39
So, I have these probabilities,
these alpha values sum to one.
702
00:50:39 --> 00:50:44
Therefore, I can use this
inequality, that this is,
703
00:50:44 --> 00:50:49
at most, let me get this right,
I have the alphas,
704
00:50:49 --> 00:50:53
so I have a sum,
x equals minus infinity to
705
00:50:53 --> 00:50:58
infinity of the alphas,
which are a probability;
706
00:50:58 --> 00:51:03
capital X equals little x times
f of the value,
707
00:51:03 --> 00:51:09
f of little x.
OK, so there it is.
708
00:51:09 --> 00:51:16
I've used the lemma.
So, maybe now I'll erase the
709
00:51:16 --> 00:51:21
lemma.
OK, I cheated by using the
710
00:51:21 --> 00:51:31
countable version of the lemma
while only proving the finite
711
00:51:31 --> 00:51:36
case.
It's all I can do in lecture.
712
00:51:36 --> 00:51:42
So, this is by the lemma.
Now, what I'd like to prove and
713
00:51:42 --> 00:51:47
leave some blank space here is
this is, at most,
714
00:51:47 --> 00:51:51
E of f of X,
so that this summation is,
715
00:51:51 --> 00:51:56
at most, E of f of X.
Actually, it's equal to E of f
716
00:51:56 --> 00:52:00
of X.
And, it really looks kind of
717
00:52:00 --> 00:52:05
equal, right?
You've got sum of some
718
00:52:05 --> 00:52:09
probabilities times f of X.
It almost looks like the
719
00:52:09 --> 00:52:13
definition of E of f of X,
but it isn't.
720
00:52:13 --> 00:52:18
You've got to be a little bit
careful because E of f of X
721
00:52:18 --> 00:52:23
should talk about the
probability that f of X equals a
722
00:52:23 --> 00:52:28
particular value.
We can relate these as follows.
723
00:52:28 --> 00:52:32
It's not too hard.
You can look at each value that
724
00:52:32 --> 00:52:37
f takes on, and then look at all
the values, k,
725
00:52:37 --> 00:52:41
that map to that value,
x.
726
00:52:41 --> 00:52:48
So all the k's where f of X
equals x, the probability that X
727
00:52:48 --> 00:52:54
equals k, OK,
this is another way of writing
728
00:52:54 --> 00:53:00
the probability that f of X
equals x.
729
00:53:00 --> 00:53:04
OK, so, in other words,
I'm grouping the terms in a
730
00:53:04 --> 00:53:07
particular way.
I'm saying, well,
731
00:53:07 --> 00:53:12
f of X takes on various values.
Clever me to switch.
732
00:53:12 --> 00:53:18
I used to use k's unannounced,
so I better call this something
733
00:53:18 --> 00:53:20
else.
Let's call this Y,
734
00:53:20 --> 00:53:25
sorry, switch notation here.
It makes sense.
735
00:53:25 --> 00:53:31
I should look at the
probability that X equals x.
736
00:53:31 --> 00:53:35
So, what I really care about is
what this f of X value takes on.
737
00:53:35 --> 00:53:38
Let's just call it Y,
look at all the values,
738
00:53:38 --> 00:53:41
Y, that f could take on.
That's the range of f.
739
00:53:41 --> 00:53:46
And then, I'll look at all the
different values of X where f of
740
00:53:46 --> 00:53:47
X equals Y.
If I add up those
741
00:53:47 --> 00:53:50
probabilities,
because these are different
742
00:53:50 --> 00:53:53
values of X.
Those are disjoint
743
00:53:53 --> 00:53:56
events.
So, this summation will be the
744
00:53:56 --> 00:53:58
probability that f of X equals
Y.
745
00:53:58 --> 00:54:02
This is capital X.
This is little y.
746
00:54:02 --> 00:54:09
And then, if I multiply that by
y, I'm getting the expectation
747
00:54:09 --> 00:54:12
of f of X.
So, think about this,
748
00:54:12 --> 00:54:18
these two inequalities hold.
This may be a bit bizarre here
749
00:54:18 --> 00:54:22
because these sums are
potentially infinite.
750
00:54:22 --> 00:54:26
But, it's true.
OK, this proves Jensen's
751
00:54:26 --> 00:54:30
inequality.
So, it wasn't very hard,
752
00:54:30 --> 00:54:35
just a couple of boards,
once we had this powerful
753
00:54:35 --> 00:54:41
convexity lemma.
So, we just used convexity.
754
00:54:41 --> 00:54:43
We used the definition of E of
X.
755
00:54:43 --> 00:54:47
We used convexity.
That lets us put the f's
756
00:54:47 --> 00:54:50
inside.
Then we do this regrouping of
757
00:54:50 --> 00:54:54
terms, and we figure out,
oh, that's just E of f of X.
758
00:54:54 --> 00:54:58
So, the only inequality here is
coming from convexity.
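Putting it together, Jensen's inequality on a small integer-valued distribution (a toy example, not from the lecture):

```python
# Jensen's inequality for a discrete random variable:
# f(E[X]) <= E[f(X)] when f is convex.  Check it for f(x) = 2**x.
pmf = {0: 0.2, 1: 0.3, 2: 0.3, 3: 0.2}   # a toy integer-valued distribution
f = lambda x: 2 ** x

ex = sum(x * p for x, p in pmf.items())        # E[X]
efx = sum(f(x) * p for x, p in pmf.items())    # E[f(X)]
assert f(ex) <= efx + 1e-12
```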
759
00:54:58 --> 00:55:01
All right, now comes the
algorithms.
760
00:55:01 --> 00:55:05
So, this was just some basic
probability stuff,
761
00:55:05 --> 00:55:10
which is good to practice.
OK, we could see this in the quiz,
762
00:55:10 --> 00:55:13
which is not surprising.
This is the case for me,
763
00:55:13 --> 00:55:15
too.
You have a lot of intuition
764
00:55:15 --> 00:55:17
with algorithms.
Whenever it's algorithmic,
765
00:55:17 --> 00:55:21
it makes a lot of sense because
you're sort of grounded in some
766
00:55:21 --> 00:55:24
things that you know because you
are computer scientists,
767
00:55:24 --> 00:55:27
or something of that ilk.
For the purposes of this class,
768
00:55:27 --> 00:55:32
you are computer scientists.
But, with sort of the basic
769
00:55:32 --> 00:55:36
probability, unless you happen
to be a mathematician,
770
00:55:36 --> 00:55:40
it's less intuitive,
and therefore harder to get
771
00:55:40 --> 00:55:42
fast.
And, in quiz one,
772
00:55:42 --> 00:55:45
speed is pretty important.
On the final,
773
00:55:45 --> 00:55:50
speed will also be important.
The take home certainly doesn't
774
00:55:50 --> 00:55:53
hurt.
So, the take home is more
775
00:55:53 --> 00:55:56
interesting because it requires
being clever.
776
00:55:56 --> 00:56:01
You have to actually be
creative.
777
00:56:01 --> 00:56:03
And, that really tests
algorithmic design.
778
00:56:03 --> 00:56:06
So far, we've mainly tested
analysis, and just,
779
00:56:06 --> 00:56:09
can you work through
probability?
780
00:56:09 --> 00:56:12
Can you figure out what the,
can you remember what your
781
00:56:12 --> 00:56:15
running time of randomized
quicksort is,
782
00:56:15 --> 00:56:17
and so on?
Quiz two will actually test
783
00:56:17 --> 00:56:20
creativity because you have more
time.
784
00:56:20 --> 00:56:22
It's hard to be creative in two
hours.
785
00:56:22 --> 00:56:26
OK, so we want to analyze the
expected height of a randomly
786
00:56:26 --> 00:56:32
constructed binary search tree.
So, I've defined this before,
787
00:56:32 --> 00:56:38
but let me repeat it because it
was a while ago almost at the
788
00:56:38 --> 00:56:42
beginning of lecture.
I'm going to take the random
789
00:56:42 --> 00:56:48
variable of the height of a
randomly built binary search
790
00:56:48 --> 00:56:51
tree on n nodes.
So, that was randomized,
791
00:56:51 --> 00:56:55
the n values.
Take a random permutation,
792
00:56:55 --> 00:57:02
insert them one by one from
left to right with tree insert.
793
00:57:02 --> 00:57:05
What is the height of the tree
that you get?
794
00:57:05 --> 00:57:08
What is the maximum depth of
any node?
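A quick simulation of this definition (a sketch; the values of n and trials are arbitrary): insert a random permutation with tree-insert, measure the height, and compare the average against the 3 log n bound the lecture is heading toward.

```python
import math
import random

def bst_height(perm):
    """Insert keys one by one with plain tree-insert; return the height,
    the maximum node depth (a single node has height 0)."""
    root = None                      # each node is [key, left, right]
    for key in perm:
        if root is None:
            root = [key, None, None]
            continue
        node = root
        while True:
            child = 1 if key < node[0] else 2
            if node[child] is None:
                node[child] = [key, None, None]
                break
            node = node[child]
    def height(node):
        if node is None:
            return -1
        return 1 + max(height(node[1]), height(node[2]))
    return height(root)

random.seed(3)
n, trials = 256, 200
avg = sum(bst_height(random.sample(range(n), n))
          for _ in range(trials)) / trials
assert avg <= 3 * math.log2(n)       # consistent with roughly 3 lg n
```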
795
00:57:08 --> 00:57:11
I'm not going to look so much
at X_n.
796
00:57:11 --> 00:57:14
I'm going to look at the
exponentiation of X_n.
797
00:57:14 --> 00:57:17
And, still we have no intuition
why.
798
00:57:17 --> 00:57:20
But, two to the X is a convex
function.
799
00:57:20 --> 00:57:23
OK, it looks like that.
It's very sharp.
800
00:57:23 --> 00:57:27
That's the best I can do for
drawing, two to the X.
801
00:57:27 --> 00:57:31
You saw how I drew my
histogram.
802
00:57:31 --> 00:57:34
So, we want to somehow write
this random variable as
803
00:57:34 --> 00:57:36
something, OK,
in some algebra.
804
00:57:36 --> 00:57:39
The main thing here is to split
into cases.
805
00:57:39 --> 00:57:42
That's how we usually go
because there's lots of
806
00:57:42 --> 00:57:45
different scenarios on what
happens.
807
00:57:45 --> 00:57:48
So, I mean, how do we construct
a tree from the beginning?
808
00:57:48 --> 00:57:51
First thing we do is we take
the first node.
809
00:57:51 --> 00:57:54
We throw it in,
make it the root.
810
00:57:54 --> 00:57:58
OK, so whatever the first value
happens to be in the array,
811
00:57:58 --> 00:58:02
which we don't really know how
that falls into sorted order,
812
00:58:02 --> 00:58:06
we put it at the root.
And, it stays the root.
813
00:58:06 --> 00:58:08
We never change the root from
then on.
814
00:58:08 --> 00:58:12
Now, of all the remaining
elements, some of them are less
815
00:58:12 --> 00:58:14
than this value,
and they go over here.
816
00:58:14 --> 00:58:17
So, let's call this r at the
root.
817
00:58:17 --> 00:58:19
And, some of them are greater
than r.
818
00:58:19 --> 00:58:22
So, they go over here.
Maybe there's more over here.
819
00:58:22 --> 00:58:25
Maybe there's more over here.
Who knows?
820
00:58:25 --> 00:58:28
Arbitrary partition,
in fact, uniformly random
821
00:58:28 --> 00:58:31
partition, which should sound
familiar, whether there are k
822
00:58:31 --> 00:58:34
elements over here,
and n minus k minus one
823
00:58:34 --> 00:58:36
elements over here,
for any value of k,
824
00:58:36 --> 00:58:42
that's equally likely because
this is chosen uniformly.
825
00:58:42 --> 00:58:44
The root is chosen uniformly.
It's the first element in a
826
00:58:44 --> 00:58:47
random permutation.
So, what I'm going to do is
827
00:58:47 --> 00:58:49
parameterize by that.
How many elements are over
828
00:58:49 --> 00:58:51
here, and how many elements are
over here?
829
00:58:51 --> 00:58:54
Because this thing is,
again, a randomly built binary
830
00:58:54 --> 00:58:57
search tree on however many
nodes are in there because after
831
00:58:57 --> 00:59:00
I pick r, it's determined who is
to the left and who is to the
832
00:59:00 --> 00:59:03
right.
And so, I can just partition.
833
00:59:03 --> 00:59:07
It's like running quicksort.
I partition the elements left
834
00:59:07 --> 00:59:11
of r, the elements right of r,
and I'm sort of recursively
835
00:59:11 --> 00:59:15
constructing a randomly built
binary search tree on those two
836
00:59:15 --> 00:59:18
sub-permutations because
sub-permutations of uniform
837
00:59:18 --> 00:59:22
permutations are uniform.
OK, so these are essentially
838
00:59:22 --> 00:59:25
recursive problems.
And, we know how to analyze
839
00:59:25 --> 00:59:28
recursive problems.
All we need to know is that
840
00:59:28 --> 00:59:31
there are k minus one elements
over here, and n minus k
841
00:59:31 --> 00:59:38
elements over here.
And, that would mean that r has
842
00:59:38 --> 00:59:45
rank k, remember,
rank in the sense of the index
843
00:59:45 --> 00:59:52
in sorted order.
So, where should I go?
844
00:59:52 --> 1:00:08
845
1:00:08 --> 1:00:11.034
So, if the root,
r, has rank,
846
1:00:11.034 --> 1:00:17.318
k, so this is a statement
conditioned on this event,
847
1:00:17.318 --> 1:00:23.278
which is a random event,
then what we have is X_n equals
848
1:00:23.278 --> 1:00:29.888
one plus the max of X_(k minus
one), X_(n minus k) because the
849
1:00:29.888 --> 1:00:35.848
height of this tree is the max
of the heights of the two
850
1:00:35.848 --> 1:00:43
subtrees plus one because we
have one more level up top.
851
1:00:43 --> 1:00:46.728
OK, so that's the natural thing
to do.
852
1:00:46.728 --> 1:00:51.263
What we are trying to analyze,
though, is Y_n.
853
1:00:51.263 --> 1:00:55.193
So, for Y_n,
we have to take two to this
854
1:00:55.193 --> 1:00:58.72
power.
So, it's two times the max of
855
1:00:58.72 --> 1:01:03.961
two to the X_(k minus one),
which is Y_(k minus one),
856
1:01:03.961 --> 1:01:09
and two to this,
which is Y_(n minus k).
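Both recurrences can be checked against a direct height computation on a random permutation (a sketch; the partition-by-root structure mirrors the lecture's argument):

```python
import random

def height_of(perm):
    """Height of the BST built by inserting perm left to right,
    computed by partitioning on the root."""
    if not perm:
        return -1                    # empty tree
    root = perm[0]
    left = [v for v in perm[1:] if v < root]
    right = [v for v in perm[1:] if v > root]
    return 1 + max(height_of(left), height_of(right))

random.seed(4)
perm = random.sample(range(31), 31)
root = perm[0]
left = [v for v in perm[1:] if v < root]
right = [v for v in perm[1:] if v > root]

x_n = height_of(perm)
# X_n = 1 + max(X_{k-1}, X_{n-k}) ...
assert x_n == 1 + max(height_of(left), height_of(right))
# ... and for Y = 2**X, that becomes Y_n = 2 * max(Y_{k-1}, Y_{n-k}).
assert 2.0 ** x_n == 2 * max(2.0 ** height_of(left),
                             2.0 ** height_of(right))
```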
857
1:01:09 --> 1:01:12.536
And, now you start to see,
maybe, why we are interested in
858
1:01:12.536 --> 1:01:16.26
Y's instead of X's in the sense
that it's what we know how to
859
1:01:16.26 --> 1:01:18.059
do.
When we solve a recursion,
860
1:01:18.059 --> 1:01:20.541
when we solve,
like, the expected running
861
1:01:20.541 --> 1:01:22.713
time, we haven't taken
expectations,
862
1:01:22.713 --> 1:01:24.823
yet, here.
But, when we compute the
863
1:01:24.823 --> 1:01:28.05
expected running time of
quicksort, we have something
864
1:01:28.05 --> 1:01:30.656
like two times,
I mean, we have a couple of
865
1:01:30.656 --> 1:01:35
recursive subproblems,
which are being added together.
866
1:01:35 --> 1:01:37.015
OK, here, we have a factor of
two.
867
1:01:37.015 --> 1:01:39.276
Here, we have a max.
But, intuitively,
868
1:01:39.276 --> 1:01:43.002
we know how to multiply random
variables by a constant because
869
1:01:43.002 --> 1:01:45.079
that's, like,
there's two recursive
870
1:01:45.079 --> 1:01:48.5
subproblems of the size is equal
to the max of these two,
871
1:01:48.5 --> 1:01:50.576
which we don't happen to know
here.
872
1:01:50.576 --> 1:01:52.653
But, there it is,
whereas one plus,
873
1:01:52.653 --> 1:01:54.791
we don't know how to handle so
well.
874
1:01:54.791 --> 1:01:57.357
And, indeed,
our techniques are really good
875
1:01:57.357 --> 1:02:00.289
at solving recurrences,
except up to the constant
876
1:02:00.289 --> 1:02:03.355
factors.
And, this one plus really
877
1:02:03.355 --> 1:02:05.685
doesn't affect the constant
factor too much,
878
1:02:05.685 --> 1:02:07.745
it would seem.
OK, but it's a big deal.
879
1:02:07.745 --> 1:02:09.859
In exponentiation,
it's a factor of two.
880
1:02:09.859 --> 1:02:13.112
So here, it's really hard to
see what this one plus is doing.
881
1:02:13.112 --> 1:02:14.9
And, our analysis,
if we tried it,
882
1:02:14.9 --> 1:02:18.099
and it's a good idea to try it
at home and see what happens,
883
1:02:18.099 --> 1:02:20.7
if you tried to do what I'm
about to do with X_n,
884
1:02:20.7 --> 1:02:24.007
the one plus will sort of get
lost, and you won't get a bound.
885
1:02:24.007 --> 1:02:26.771
You just can't prove anything.
With a factor of two,
886
1:02:26.771 --> 1:02:29.319
we're in good shape.
We sort of know how to deal
887
1:02:29.319 --> 1:02:33.98
with that.
We'll say more when we've
888
1:02:33.98 --> 1:02:41.015
actually done the proof about
why we use Y_n instead of X_n.
889
1:02:41.015 --> 1:02:44.353
But for now,
we're using Y_n.
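For reference, the relationship between the two random variables, as set up earlier in the lecture:

```latex
X_n = \text{height of the tree},
\qquad
Y_n = 2^{X_n}.
% If the root has rank k, the height is one plus the taller subtree:
X_n = 1 + \max(X_{k-1},\, X_{n-k})
\quad\Longrightarrow\quad
Y_n = 2^{\,1 + \max(X_{k-1},\, X_{n-k})} = 2\,\max(Y_{k-1},\, Y_{n-k}).
```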
890
1:02:44.353 --> 1:02:49.48
So, this is sort of a
recursion, except it's
891
1:02:49.48 --> 1:02:56.038
conditioned on this event.
So, how do I turn this into a
892
1:02:56.038 --> 1:02:59.973
statement that holds all the
time?
893
1:02:59.973 --> 1:03:04.896
Sorry?
Divide by the probability of
894
1:03:04.896 --> 1:03:07.275
the event?
More or less.
895
1:03:07.275 --> 1:03:11
Indeed, these events are
independent.
896
1:03:11 --> 1:03:15.551
Or, they're all equally likely,
I should say.
897
1:03:15.551 --> 1:03:21.241
They're not independent.
In fact, one determines all the
898
1:03:21.241 --> 1:03:24.241
others.
So, how do I generally
899
1:03:24.241 --> 1:03:30.137
represent an event in algebra?
Indicator random variables:
900
1:03:30.137 --> 1:03:34.995
good.
Remember your friends,
901
1:03:34.995 --> 1:03:42.076
indicator random variables.
All of these analyses use
902
1:03:42.076 --> 1:03:49.565
indicator random variables.
So, they will just represent
903
1:03:49.565 --> 1:03:54.195
this event, and we'll call it
Z_nk.
904
1:03:54.195 --> 1:03:59.778
It's going to be one if the
root has rank,
905
1:03:59.778 --> 1:04:05.415
k, and zero otherwise.
So, in particular,
906
1:04:05.415 --> 1:04:09.11
the probability of,
these things are all equally
907
1:04:09.11 --> 1:04:13.828
likely for a particular value
of n if you try all the values
908
1:04:13.828 --> 1:04:16.186
of k.
The probability that this
909
1:04:16.186 --> 1:04:20.746
equals one, which is also the
expectation of that indicator
910
1:04:20.746 --> 1:04:23.734
random variable,
which you should know,
911
1:04:23.734 --> 1:04:26.486
since it only takes values one or
zero.
912
1:04:26.486 --> 1:04:29.788
The zero doesn't matter in the
expectation.
913
1:04:29.788 --> 1:04:34.034
So, this is going to be,
hopefully, one over n if I got
914
1:04:34.034 --> 1:04:36
right.
916
1:04:36 --> 1:04:43.013
So, there are n possibilities for
what the rank of the root could
917
1:04:43.013 --> 1:04:46.922
be.
Each of them are equally likely
918
1:04:46.922 --> 1:04:51.176
because we have a uniform
permutation.
919
1:04:51.176 --> 1:04:57.04
So, now, I can rewrite this
condition statement as a
920
1:04:57.04 --> 1:05:04.168
summation where the Z_nk's will
let me choose what case I'm in.
921
1:05:04.168 --> 1:05:10.836
So, we have Y_n is the sum,
k equals one to n of Z_nk times
922
1:05:10.836 --> 1:05:16.01
two times the max of X,
sorry, Y, k minus one,
923
1:05:16.01 --> 1:05:20.478
Y_n minus k.
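So on the board we now have the indicator definition and the unconditioned recurrence:

```latex
Z_{nk} =
\begin{cases}
1 & \text{if the root has rank } k,\\
0 & \text{otherwise,}
\end{cases}
\qquad
\mathbb{E}[Z_{nk}] = \Pr\{Z_{nk} = 1\} = \frac{1}{n},
\qquad
Y_n = \sum_{k=1}^{n} Z_{nk}\cdot 2\,\max(Y_{k-1},\, Y_{n-k}).
```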
So, now we have our good
924
1:05:20.478 --> 1:05:23.126
friend, the recurrence.
We need to solve it.
925
1:05:23.126 --> 1:05:26.329
OK, we can't really solve it
because this is a random
926
1:05:26.329 --> 1:05:29.963
variable, and it's talking about
recursive random variables.
927
1:05:29.963 --> 1:05:32.858
So, we first take the
expectation of both sides.
928
1:05:32.858 --> 1:05:36
That's the only thing we can
really bound.
929
1:05:36 --> 1:05:40.074
Y_n could be n^2 in an unlucky
case, sorry, not n^2.
930
1:05:40.074 --> 1:05:43.19
It could be two to the,
931
1:05:43.19 --> 1:05:47.903
boy, two to the n if you are
unlucky because X_n could be as
932
1:05:47.903 --> 1:05:50.46
big as n, the height of the
tree.
933
1:05:50.46 --> 1:05:54.694
And, Y_n is two to that.
So, it could be two to the n.
934
1:05:54.694 --> 1:05:58.688
What we want to prove is that
it's polynomial in n.
935
1:05:58.688 --> 1:06:02.203
If it's n to some constant,
and we take logs,
936
1:06:02.203 --> 1:06:07.341
it'll be order log n.
OK, so we'll take the
937
1:06:07.341 --> 1:06:14.254
expectation, and hopefully that
will guarantee that this holds.
938
1:06:14.254 --> 1:06:20.163
OK, so we have expectation of
this summation of random
939
1:06:20.163 --> 1:06:24.846
variables times recursive random
variables.
940
1:06:24.846 --> 1:06:30.198
So, what is the first,
woops, I forgot a bracket.
941
1:06:30.198 --> 1:06:37
What is the first thing that we
do in this analysis?
942
1:06:37 --> 1:06:41.3
This should,
yeah, linearity of expectation.
943
1:06:41.3 --> 1:06:45.9
That one's easy to remember.
OK, we have a sum.
944
1:06:45.9 --> 1:06:49
So, let's put the E inside.
945
1:06:49 --> 1:07:04
946
1:07:04 --> 1:07:08.842
OK, now we have the expectation
of our product.
947
1:07:08.842 --> 1:07:12.21
What should we use?
Independence.
948
1:07:12.21 --> 1:07:15.684
Hopefully, things are
independent.
949
1:07:15.684 --> 1:07:21.052
And then, we could write this.
Then, it would be the
950
1:07:21.052 --> 1:07:26.842
expectation of the product.
And, heck, let's put the two
951
1:07:26.842 --> 1:07:34
outside, because it's not,
no sense in keeping it in here.
952
1:07:34 --> 1:07:37.956
Are those starting to look
like X's?
953
1:07:37.956 --> 1:07:42.351
I can't even read them.
Sorry about that.
954
1:07:42.351 --> 1:07:46.417
These should all be Y's.
OK, very wise,
955
1:07:46.417 --> 1:07:48.615
random variables.
So.
956
1:07:48.615 --> 1:07:54.769
Why are these independent?
So, here we are looking at the
957
1:07:54.769 --> 1:08:00.703
choice of what the root is,
what rank the root has in a
958
1:08:00.703 --> 1:08:05.608
problem of size n.
In here, we're looking at what
959
1:08:05.608 --> 1:08:08.02
the root, I mean,
there are various choices of
960
1:08:08.02 --> 1:08:11.29
what the search tree looks like
in the stuff left of the root,
961
1:08:11.29 --> 1:08:13.112
and in the stuff right of the
root.
962
1:08:13.112 --> 1:08:16.221
Those are independent choices
because everything is uniform
963
1:08:16.221 --> 1:08:18.097
here.
So, the choice of this guy was
964
1:08:18.097 --> 1:08:20.081
uniform.
And then, that determines who
965
1:08:20.081 --> 1:08:22.011
partitions in the left and the
right.
966
1:08:22.011 --> 1:08:24.798
Those are completely
independent recursive choices of
967
1:08:24.798 --> 1:08:26.621
who's the root in the left
subtree?
968
1:08:26.621 --> 1:08:29.086
Who's the root in the left of
the left subtree,
969
1:08:29.086 --> 1:08:31.177
and so on?
So, this is a little trickier
970
1:08:31.177 --> 1:08:36.385
than usual.
Before, it was random choices
971
1:08:36.385 --> 1:08:41.871
in the algorithm.
Now, it's in some construction
972
1:08:41.871 --> 1:08:47.474
where we choose the random
numbers ahead of time.
973
1:08:47.474 --> 1:08:52.961
It's a bit funny,
but this is still independent.
974
1:08:52.961 --> 1:08:58.214
So, we get this just like we
did in quicksort,
975
1:08:58.214 --> 1:08:59.731
and so on.
OK.
976
1:08:59.731 --> 1:09:05.374
Now, we continue.
And, now it's time to be a bit
977
1:09:05.374 --> 1:09:08.143
sloppy.
Well, one of these things we
978
1:09:08.143 --> 1:09:09.568
know.
OK, E of Z_nk,
979
1:09:09.568 --> 1:09:12.812
that, we wrote over here.
It's one over n.
980
1:09:12.812 --> 1:09:15.899
So, that's cool.
So, we get a two over n
981
1:09:15.899 --> 1:09:20.489
outside, and we get this sum of
the expectation of a max of
982
1:09:20.489 --> 1:09:23.812
these two things.
Normally, we would write,
983
1:09:23.812 --> 1:09:27.136
well, I think sometimes you
write T of max,
984
1:09:27.136 --> 1:09:30.143
or Y of the max of the two
things here.
985
1:09:30.143 --> 1:09:36
You've got to write it as the
max of these two variables.
986
1:09:36 --> 1:09:41.547
And, the trick,
I mean, it's not too much of a
987
1:09:41.547 --> 1:09:46.849
trick, is that the max is,
at most, the sum.
988
1:09:46.849 --> 1:09:53.506
So, we have nonnegative things.
So, we have two over n,
989
1:09:53.506 --> 1:10:00.657
sum k equals one to n of the
expectation of the sum instead
990
1:10:00.657 --> 1:10:03.944
of the max.
OK, this is,
991
1:10:03.944 --> 1:10:07.014
in some sense,
the key step where we are
992
1:10:07.014 --> 1:10:11.344
losing something in our bound.
So far, we've been exact.
993
1:10:11.344 --> 1:10:15.437
Now, we're being pretty sloppy.
It's true the max is,
994
1:10:15.437 --> 1:10:19.137
at most, the sum.
But, it's a pretty loose upper
995
1:10:19.137 --> 1:10:22.758
bound as things go.
We'll keep that in mind for
996
1:10:22.758 --> 1:10:25.434
later.
What else can we do with the
997
1:10:25.434 --> 1:10:27.166
summation?
This should,
998
1:10:27.166 --> 1:10:33.47
again, look familiar.
Now that we have a sum of a sum
999
1:10:33.47 --> 1:10:38.283
of two things,
I would like it to be a
1000
1:10:38.283 --> 1:10:40.858
sum of one thing.
Sorry?
1001
1:10:40.858 --> 1:10:45.559
You can use linearity of
expectation, good.
1002
1:10:45.559 --> 1:10:49.813
So, that's the first thing I
should do.
1003
1:10:49.813 --> 1:10:55.41
So, linearity of expectation
lets me separate that.
1004
1:10:55.41 --> 1:11:02.079
Now I have a sum of 2n things.
Right, I could break that into
1005
1:11:02.079 --> 1:11:05.405
the sum of these guys,
and the sum of these guys.
1006
1:11:05.405 --> 1:11:08.247
Do you know anything about
those two sums?
1007
1:11:08.247 --> 1:11:11.019
Do we know anything about those
two sums?
1008
1:11:11.019 --> 1:11:14.069
They're the same.
In fact, every term here is
1009
1:11:14.069 --> 1:11:17.326
appearing exactly twice.
One says a k minus one.
1010
1:11:17.326 --> 1:11:20.722
One says an n minus k,
and that even works if it's
1011
1:11:20.722 --> 1:11:22.455
odd, I think.
So, in fact,
1012
1:11:22.455 --> 1:11:26.267
we can just take one of the
sums and multiply it by two.
1013
1:11:26.267 --> 1:11:30.356
So, this is four over n times
the sum, and I'll rewrite it a
1014
1:11:30.356 --> 1:11:35
little bit from zero to n minus
one of E of Y_k.
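Chaining the steps just described (linearity, independence, E of Z_nk equals one over n, max at most sum, and folding the two identical sums into one):

```latex
\mathbb{E}[Y_n]
= \sum_{k=1}^{n} 2\,\mathbb{E}[Z_{nk}]\;\mathbb{E}\!\left[\max(Y_{k-1},\, Y_{n-k})\right]
= \frac{2}{n}\sum_{k=1}^{n}\mathbb{E}\!\left[\max(Y_{k-1},\, Y_{n-k})\right]
\le \frac{2}{n}\sum_{k=1}^{n}\Bigl(\mathbb{E}[Y_{k-1}] + \mathbb{E}[Y_{n-k}]\Bigr)
= \frac{4}{n}\sum_{k=0}^{n-1}\mathbb{E}[Y_k].
```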
1015
1:11:35 --> 1:11:40.425
Just check the number of times
each Y_k appears from zero up to
1016
1:11:40.425 --> 1:11:45.237
n minus one is exactly two.
So, now I have a recurrence.
1017
1:11:45.237 --> 1:11:48.649
I have E of Y_n is,
at most, this thing.
1018
1:11:48.649 --> 1:11:51.8
Let's just write that for our
memory.
1019
1:11:51.8 --> 1:11:53.55
So, how's that?
Cool.
1020
1:11:53.55 --> 1:11:57.05
Now, I just have to solve the
recurrence.
1021
1:11:57.05 --> 1:12:03
How should I solve an ugly,
hairy, recurrence like this?
1022
1:12:03 --> 1:12:05.125
Substitution:
yea!
1023
1:12:05.125 --> 1:12:10.75
Not the master method.
OK, it's a pretty nasty
1024
1:12:10.75 --> 1:12:15.875
recurrence.
So, I'm going to make a guess,
1025
1:12:15.875 --> 1:12:22.125
and I've already told you the
guess, that it's n^3.
1026
1:12:22.125 --> 1:12:29.375
I think n^3 is pretty much
exactly the bound this proof can
1027
1:12:29.375 --> 1:12:34.239
obtain.
So, substitution method,
1028
1:12:34.239 --> 1:12:38.72
substitution method is just a
proof by induction.
1029
1:12:38.72 --> 1:12:44.506
And, there are two things every
proof by induction should have,
1030
1:12:44.506 --> 1:12:49.826
well, almost every proof by
induction, unless you're being
1031
1:12:49.826 --> 1:12:52.906
fancy.
It should have a base case,
1032
1:12:52.906 --> 1:12:57.013
and the base case here is n
equals order one.
1033
1:12:57.013 --> 1:13:00.093
I didn't write it,
but, of course,
1034
1:13:00.093 --> 1:13:05.319
if you have a constant size
tree, it has constant height.
1035
1:13:05.319 --> 1:13:10.64
So, this thing will be true as
long as we set c
1036
1:13:10.64 --> 1:13:15.684
sufficiently large.
OK, so, don't forget that.
1037
1:13:15.684 --> 1:13:18.08
A lot of people forgot it on
the quiz.
1038
1:13:18.08 --> 1:13:20.089
We even mentioned the base
case.
1039
1:13:20.089 --> 1:13:22.939
Usually, we don't even mention
the base case.
1040
1:13:22.939 --> 1:13:25.854
And, you should assume that
there's one there.
1041
1:13:25.854 --> 1:13:30
And, you have to say this in
any proof by substitution.
1042
1:13:30 --> 1:13:33.107
OK, now, we have the induction
step.
1043
1:13:33.107 --> 1:13:37.279
So, I claim that E of Y_n is,
at most, c times n^3,
1044
1:13:37.279 --> 1:13:40.563
assuming that it's true for
smaller n.
1045
1:13:40.563 --> 1:13:44.647
You should write the induction
hypothesis here,
1046
1:13:44.647 --> 1:13:49.618
but I'm going to skip it
because I'm running out of time.
1047
1:13:49.618 --> 1:13:53.613
Now, we have this recurrence
that E of Y_n is,
1048
1:13:53.613 --> 1:13:56.809
at most, this thing.
So, E of Y_n is,
1049
1:13:56.809 --> 1:14:01.159
at most, four over n,
sum k equals zero to n minus
1050
1:14:01.159 --> 1:14:07.223
one of E of Y_k.
Now, notice that k is always
1051
1:14:07.223 --> 1:14:12.059
smaller than n.
So, we can apply induction.
1052
1:14:12.059 --> 1:14:15.858
So, this is,
at most, four over n,
1053
1:14:15.858 --> 1:14:21.269
sum k equals zero to n minus
one of c times k^3.
1054
1:14:21.269 --> 1:14:24.838
That's the induction
hypothesis.
1055
1:14:24.838 --> 1:14:28.753
Cool.
Now, I need an upper bound on
1056
1:14:28.753 --> 1:14:35.43
this sum, if you have a good
memory, then you know a closed
1057
1:14:35.43 --> 1:14:40.801
form for this sum.
But, I don't have such a good
1058
1:14:40.801 --> 1:14:43.97
memory as I used to.
I never memorized this sum when
1059
1:14:43.97 --> 1:14:47.884
I was a kid, so I only remember
things I memorized when
1060
1:14:47.884 --> 1:14:51.612
I was less than 12 years old.
I still remember all the digits
1061
1:14:51.612 --> 1:14:54.532
of pi, whatever.
But, anything I try to memorize
1062
1:14:54.532 --> 1:14:57.079
now just doesn't quite stick the
same way.
1063
1:14:57.079 --> 1:15:00
So, I don't happen to know this
sum.
1064
1:15:00 --> 1:15:03.169
What's a good way to
approximate this sum?
1065
1:15:03.169 --> 1:15:05.256
Integral: good.
So, in fact,
1066
1:15:05.256 --> 1:15:07.653
I'm going to take the c
outside.
1067
1:15:07.653 --> 1:15:10.9
So, this is 4c over n.
The sum is, at most,
1068
1:15:10.9 --> 1:15:13.992
the integral.
If you get the range right,
1069
1:15:13.992 --> 1:15:18.089
so, you have to go one larger.
Instead of n minus one,
1070
1:15:18.089 --> 1:15:21.104
you go up to n.
This is in the textbook.
1071
1:15:21.104 --> 1:15:24.274
It's intuitive,
too, as long as you have a
1072
1:15:24.274 --> 1:15:26.516
monotone function.
That's key.
1073
1:15:26.516 --> 1:15:31
So, you have something that's
like this.
1074
1:15:31 --> 1:15:34.075
And, you know,
the sum is taking each of these
1075
1:15:34.075 --> 1:15:36.671
and weighting them with a value
of one.
1076
1:15:36.671 --> 1:15:40.157
The integral is computing the
area under this curve.
1077
1:15:40.157 --> 1:15:42.685
So, in particular,
if you look at this
1078
1:15:42.685 --> 1:15:45.624
approximation of the integral,
then, I mean,
1079
1:15:45.624 --> 1:15:49.382
this thing is certainly,
this would be the sum if you go
1080
1:15:49.382 --> 1:15:52.252
one larger at the end,
and that's, at most,
1081
1:15:52.252 --> 1:15:55.054
the integral.
So, that's proof by picture.
1082
1:15:55.054 --> 1:15:57.309
But, you can see this in the
book.
1083
1:15:57.309 --> 1:16:01
You should know it from 6.042, I
guess.
1084
1:16:01 --> 1:16:04.448
Now, integrals,
hopefully, you can solve.
1085
1:16:04.448 --> 1:16:07.206
Integral of x^3 is x^4 over
four.
1086
1:16:07.206 --> 1:16:11.172
I got it right.
And then, we're evaluating that at
1087
1:16:11.172 --> 1:16:12.637
n.
And, it's zero.
1088
1:16:12.637 --> 1:16:17.293
Subtracting the zero doesn't
matter because zero to the
1089
1:16:17.293 --> 1:16:21.517
fourth power is zero.
So, it's just n^4 over four.
1090
1:16:21.517 --> 1:16:25.051
So, this is 4c over n times n^4
over four.
1091
1:16:25.051 --> 1:16:28.931
And, conveniently,
this four cancels with this
1092
1:16:28.931 --> 1:16:31.689
four.
The four turns into a three
1093
1:16:31.689 --> 1:16:36
because of this,
and we get n^3.
1094
1:16:36 --> 1:16:38.159
We get cn^3.
Damn convenient,
1095
1:16:38.159 --> 1:16:41.089
because that's what we wanted
to prove.
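The whole induction step, written on one line:

```latex
\mathbb{E}[Y_n]
\le \frac{4}{n}\sum_{k=0}^{n-1} \mathbb{E}[Y_k]
\le \frac{4}{n}\sum_{k=0}^{n-1} c\,k^3
\le \frac{4c}{n}\int_0^n x^3\,dx
= \frac{4c}{n}\cdot\frac{n^4}{4}
= c\,n^3.
```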
1096
1:16:41.089 --> 1:16:44.404
OK, so this proof is just
barely squeaking by:
1097
1:16:44.404 --> 1:16:48.028
no residual term.
We've been sloppy all over the
1098
1:16:48.028 --> 1:16:50.727
place, and yet we were really
lucky.
1099
1:16:50.727 --> 1:16:54.12
And, we were just sloppy in the
right places.
1100
1:16:54.12 --> 1:16:56.51
So, this is a very tricky
proof.
1101
1:16:56.51 --> 1:17:01.214
If you just tried to do it by
hand, it's pretty easy to be too
1102
1:17:01.214 --> 1:17:04.453
sloppy, and not get quite the
right answer.
1103
1:17:04.453 --> 1:17:09.869
But, this just barely works.
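As a quick sanity check on that arithmetic (a sketch, not part of the lecture; the function name, base value, and constant are mine), you can iterate the bound numerically and watch it stay under c times n^3:

```python
# Iterate the bound E[Y_n] <= (4/n) * sum_{k=0}^{n-1} E[Y_k] exactly,
# starting from a constant base case, and check it stays under c * n^3.
def y_bound(n_max, base=1.0):
    """b[n] is what the recurrence bound gives for E[Y_n]."""
    b = [base]  # base case: a constant-size tree has constant Y
    for n in range(1, n_max + 1):
        b.append(4.0 / n * sum(b))  # (4/n) * (b[0] + ... + b[n-1])
    return b

b = y_bound(200)
c = 4.0  # a constant large enough to absorb the small cases
assert all(b[n] <= c * n**3 for n in range(1, 201))
# With base 1, b[n] works out to (n+1)(n+2)(n+3)/6: Theta(n^3) on the nose.
```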
So, let me say a couple of
1104
1:17:09.869 --> 1:17:12.89
things about it in my remaining
one minute.
1105
1:17:12.89 --> 1:17:15.407
So, we can do the conclusion,
again.
1106
1:17:15.407 --> 1:17:18.428
I won't write it because I
don't have time,
1107
1:17:18.428 --> 1:17:21.664
but here it is.
We just proved a bound on Y_n,
1108
1:17:21.664 --> 1:17:25.907
which was two to the power X_n.
What we cared about was X_n.
1109
1:17:25.907 --> 1:17:29
So, we used Jensen's
inequality.
1110
1:17:29 --> 1:17:32.35
We get the two to the E of X_n
is, at most, E of two to the
1111
1:17:32.35 --> 1:17:34.083
X_n.
This is what we know about
1112
1:17:34.083 --> 1:17:36.74
because that's Y_n.
So, we know E of Y_n is now
1113
1:17:36.74 --> 1:17:39.108
order n^3.
OK, we had to set this constant
1114
1:17:39.108 --> 1:17:41.187
sufficiently large for the base
case.
1115
1:17:41.187 --> 1:17:44.306
We didn't really figure out
what the constant was here.
1116
1:17:44.306 --> 1:17:47.599
It didn't matter because now
we're taking the logs of both
1117
1:17:47.599 --> 1:17:49.043
sides.
We get E of X_n is,
1118
1:17:49.043 --> 1:17:51.584
at most, log of order n^3.
This constant is a
1119
1:17:51.584 --> 1:17:54.241
multiplicative constant.
So, you take the logs.
1120
1:17:54.241 --> 1:17:57.072
It becomes additive.
This constant is an exponent.
1121
1:17:57.072 --> 1:18:01
So, when you take logs,
it becomes a multiplier.
1122
1:18:01 --> 1:18:07.361
Three log n plus order one.
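Spelled out, the conclusion via Jensen's inequality:

```latex
2^{\mathbb{E}[X_n]} \le \mathbb{E}\!\left[2^{X_n}\right] = \mathbb{E}[Y_n] = O(n^3)
\quad\Longrightarrow\quad
\mathbb{E}[X_n] \le \lg\bigl(O(n^3)\bigr) = 3\lg n + O(1).
```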
This is a pretty damn tight
1123
1:18:07.361 --> 1:18:13.486
bound on the height of a
randomly built binary search
1124
1:18:13.486 --> 1:18:18.081
tree, the expected height,
I should say.
1125
1:18:18.081 --> 1:18:23.617
In fact, the expected height of
X_n is equal to,
1126
1:18:23.617 --> 1:18:28.447
well, roughly,
I'll just say it's roughly,
1127
1:18:28.447 --> 1:18:34.926
I don't want to be too precise
here, 2.9882 times log n.
1128
1:18:34.926 --> 1:18:40.934
This is the result by a friend
of mine, Luc Devroye,
1129
1:18:40.934 --> 1:18:46
if I spell it right,
in 1986.
1130
1:18:46 --> 1:18:49.572
He's a professor at McGill
University in Montreal.
1131
1:18:49.572 --> 1:18:52.27
So, we're pretty close,
three to 2.98.
1132
1:18:52.27 --> 1:18:56.572
And, I won't prove this here.
The hard part here is actually
1133
1:18:56.572 --> 1:19:00
the lower bound,
but it's only that much.
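For a rough empirical check (a sketch, not part of the lecture; the seed and trial count are arbitrary), you can build random BSTs by inserting a uniform random permutation and compare the mean height against these constants:

```python
import math
import random

def random_bst_height(n, rng):
    """Insert a uniform random permutation of n keys into a plain,
    unbalanced BST and return the height (edges on the longest path)."""
    keys = list(range(n))
    rng.shuffle(keys)
    root = [keys[0], None, None]  # node = [key, left, right]
    height = 0
    for key in keys[1:]:
        node, depth = root, 0
        while True:
            depth += 1
            i = 1 if key < node[0] else 2
            if node[i] is None:
                node[i] = [key, None, None]
                break
            node = node[i]
        height = max(height, depth)  # insertion depth = final depth
    return height

rng = random.Random(6046)
n, trials = 1000, 30
mean_h = sum(random_bst_height(n, rng) for _ in range(trials)) / trials
# Devroye's theorem gives expected height ~ 2.9882 lg n asymptotically; for
# n = 1000 the simulated mean sits comfortably between lg n and 3 lg n.
assert math.log2(n) < mean_h < 3 * math.log2(n)
```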
1134
1:19:00 --> 1:19:04.274
I should say a little bit more
about why we use Y_n instead of
1135
1:19:04.274 --> 1:19:06.166
X_n.
And, it's all about the
1136
1:19:06.166 --> 1:19:08.268
sloppiness.
And, in particular,
1137
1:19:08.268 --> 1:19:12.193
this step, where we said that
the max of these two random
1138
1:19:12.193 --> 1:19:14.295
variables is,
at most, the sum.
1139
1:19:14.295 --> 1:19:18.359
And, while that's true for X
just as well as it is true for
1140
1:19:18.359 --> 1:19:21.653
Y, it's more true for Y.
OK, this is a bit weird
1141
1:19:21.653 --> 1:19:24.876
because, remember,
what we're analyzing here is
1142
1:19:24.876 --> 1:19:28.801
all possible values of k.
This has to work no matter what
1143
1:19:28.801 --> 1:19:32.234
k is, in some sense.
I mean, we're bounding all of
1144
1:19:32.234 --> 1:19:37
those cases simultaneously,
the sum of them all.
1145
1:19:37 --> 1:19:41.576
So, here we're looking at k
minus one versus n minus k.
1146
1:19:41.576 --> 1:19:44.881
And, in fact,
here, there's a polynomial
1147
1:19:44.881 --> 1:19:48.186
version.
But, so, if you take two values
1148
1:19:48.186 --> 1:19:51.576
a and b, and you say,
well, max of a and b is,
1149
1:19:51.576 --> 1:19:55.728
at most, a plus b.
And, on the other hand you say,
1150
1:19:55.728 --> 1:19:59.542
well, max of two to the a and
two to the b is,
1151
1:19:59.542 --> 1:20:02.847
at most, two to the a plus two
to the b.
1152
1:20:02.847 --> 1:20:07
Doesn't this feel better than
that?
1153
1:20:07 --> 1:20:09.82
Well, they are,
of course, the same.
1154
1:20:09.82 --> 1:20:13.367
But, if you look at a minus b,
as that grows,
1155
1:20:13.367 --> 1:20:17.719
this becomes a tighter bound
faster than this becomes a
1156
1:20:17.719 --> 1:20:22.716
tighter bound because here we're
looking at absolute difference
1157
1:20:22.716 --> 1:20:26.504
between a and b.
So, that's why this is pretty
1158
1:20:26.504 --> 1:20:31.259
good and this is pretty bad.
We're still really bad if a and
1159
1:20:31.259 --> 1:20:35.812
b are almost the same.
But, we're trying to solve this
1160
1:20:35.812 --> 1:20:38.677
for all partitions into k minus
one and n minus k.
1161
1:20:38.677 --> 1:20:42.127
So, it's OK if we get a few of
the cases wrong in the middle
1162
1:20:42.127 --> 1:20:45.284
where it evenly partitions.
But, as soon as we get some
1163
1:20:45.284 --> 1:20:49.026
skew, this will be very close to
this, whereas this will be still
1164
1:20:49.026 --> 1:20:52.066
pretty far from this.
You have to get pretty close to
1165
1:20:52.066 --> 1:20:54.58
the edge before you're not
losing much here,
1166
1:20:54.58 --> 1:20:57.504
whereas pretty quickly you're
not losing much here.
1167
1:20:57.504 --> 1:21:00.368
That's the intuition.
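Here's a tiny numeric illustration of that intuition (my own example values, not from the lecture): measure the overshoot of the max-at-most-sum bound as the ratio sum over max, so 1.0 means tight.

```python
# Overshoot of the "max <= sum" bound: 1.0 means the bound is tight.
def overshoot(a, b):
    return (a + b) / max(a, b)

def overshoot_exp(a, b):
    return (2**a + 2**b) / max(2**a, 2**b)

for gap in range(0, 11):
    a, b = 20, 20 - gap
    # Exponentiated: overshoot is exactly 1 + 2^(-gap) -- geometric decay.
    assert overshoot_exp(a, b) == 1 + 2.0**(-gap)
    # Raw values: overshoot is 2 - gap/20 -- only a linear decay.
    assert abs(overshoot(a, b) - (2 - gap / 20)) < 1e-12
    # So the exponentiated bound is never looser than the raw one...
    assert overshoot_exp(a, b) <= overshoot(a, b)
```

With any skew at all, the exponentiated overshoot collapses toward 1 almost immediately, while the raw overshoot stays near 2, matching the point about Y versus X.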
Try it, and see what happens
1168
1:21:00.368 --> 1:21:03
with X_n, and it won't work.
See you Wednesday.