WEBVTT

00:00:00.090 --> 00:00:02.490
The following content is
provided under a Creative

00:00:02.490 --> 00:00:04.030
Commons license.

00:00:04.030 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.720
continue to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.320
To make a donation or
view additional materials

00:00:13.320 --> 00:00:17.280
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.280 --> 00:00:18.450
at ocw.mit.edu.

00:00:21.215 --> 00:00:22.090
PROFESSOR: All right.

00:00:22.090 --> 00:00:24.880
Today, we're going to look at
some kind of different data

00:00:24.880 --> 00:00:28.000
structures for static trees.

00:00:28.000 --> 00:00:30.294
So we have-- at least in
the second two problems--

00:00:30.294 --> 00:00:31.210
we have a static tree.

00:00:31.210 --> 00:00:34.780
We want to preprocess it
to answer lots of queries.

00:00:34.780 --> 00:00:37.380
And all the queries we're
going to support today

00:00:37.380 --> 00:00:40.360
we'll do in constant time per
operation, which is pretty

00:00:40.360 --> 00:00:42.130
awesome, and linear space.

00:00:42.130 --> 00:00:42.797
That's our goal.

00:00:42.797 --> 00:00:44.671
It's going to be hard
to achieve these goals.

00:00:44.671 --> 00:00:46.510
But in the end, we will
do it for all three

00:00:46.510 --> 00:00:47.260
of these problems.

00:00:47.260 --> 00:00:49.480
So let me tell you
about these problems.

00:00:49.480 --> 00:00:57.160
Range minimum queries, you're
given an array of numbers.

00:01:07.064 --> 00:01:08.855
And the kind of query
you want to support--

00:01:11.590 --> 00:01:14.671
we'll call it RMQ of i, j--

00:01:17.420 --> 00:01:21.630
is to find the
minimum in a range.

00:01:21.630 --> 00:01:34.280
So we have Ai up to Aj and we
want to compute the minimum

00:01:34.280 --> 00:01:35.430
in that range.

00:01:35.430 --> 00:01:38.840
So i and j form the query.

00:01:38.840 --> 00:01:40.670
I think it's pretty
clear what this means.

00:01:40.670 --> 00:01:44.820
I give you an interval
that I care about, i, j,

00:01:44.820 --> 00:01:46.730
and I want to know,
in this range,

00:01:46.730 --> 00:01:48.230
what's the smallest value.

00:01:48.230 --> 00:01:50.750
And a little more subtle--
this will come up later.

00:01:50.750 --> 00:01:52.880
I don't just want to
know the value that's

00:01:52.880 --> 00:01:54.380
there-- like say
this is the minimum

00:01:54.380 --> 00:01:56.310
among that shaded region.

00:01:56.310 --> 00:02:00.820
But I also want to know
the index k between i

00:02:00.820 --> 00:02:03.689
and j of that element.

00:02:03.689 --> 00:02:06.230
Of course, if I know the index,
I can also look up the value.

00:02:06.230 --> 00:02:10.199
So it's more interesting
to know that index.
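
NOTE
A minimal sketch of the query just defined, assuming a plain Python list
and 0-based inclusive endpoints. The name rmq_naive is hypothetical. It
scans the range, so it is the slow baseline, not the constant-time
structure the lecture is building toward.
```python
def rmq_naive(a, i, j):
    """Return an index k with i <= k <= j that minimizes a[k]."""
    # min over indices with a key returns the leftmost minimum on ties.
    return min(range(i, j + 1), key=lambda k: a[k])
```
Returning the index rather than the value loses nothing, since a[k]
is one more lookup.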

00:02:10.199 --> 00:02:10.699
OK.

00:02:10.699 --> 00:02:13.880
This is a non-tree problem,
but it will be closely related

00:02:13.880 --> 00:02:17.465
to a tree problem, namely LCA.

00:02:22.240 --> 00:02:28.000
So LCA problem is you
want to preprocess a tree.

00:02:28.000 --> 00:02:42.950
It's a rooted tree, and the
query is LCA of two nodes.

00:02:42.950 --> 00:02:47.210
Which I think you know, or
I guess I call them x and y.

00:02:47.210 --> 00:02:50.540
So it has two nodes
x and y in the tree.

00:02:50.540 --> 00:02:53.327
I want to find their lowest
common ancestor, which

00:02:53.327 --> 00:02:54.410
looks something like that.

00:02:57.110 --> 00:02:59.210
At some point they
have shared ancestors,

00:02:59.210 --> 00:03:01.926
and we want to find
that lowest one.

00:03:01.926 --> 00:03:03.800
And then another problem
we're going to solve

00:03:03.800 --> 00:03:06.750
is level ancestor,
which again, preprocess

00:03:06.750 --> 00:03:14.952
a rooted tree and the query
is a little different.

00:03:18.050 --> 00:03:22.130
Given a node and an integer k--

00:03:22.130 --> 00:03:25.220
positive integer--
I want to find

00:03:25.220 --> 00:03:30.740
the kth ancestor of that node.

00:03:30.740 --> 00:03:37.040
Which you might write parent to
the k, meaning I have a node x,

00:03:37.040 --> 00:03:38.831
the first ancestor
is its parent.

00:03:41.780 --> 00:03:45.720
Eventually want to get
to the kth ancestor.

00:03:45.720 --> 00:03:49.520
So I want to jump
from x to there.

00:03:49.520 --> 00:03:53.840
So it's like teleporting to
a target height above me.

00:03:53.840 --> 00:03:58.447
Obviously, k cannot be larger
than the depth of the node.
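
NOTE
A minimal sketch of the level ancestor query, assuming the tree is given
as a hypothetical parent list where parent[v] is None at the root. It
walks k parent pointers, so it costs O(k) per query; it only pins down
what the constant-time structure has to return.
```python
def level_ancestor_naive(parent, x, k):
    """Return the k-th ancestor of node x by following k parent pointers."""
    for _ in range(k):
        if parent[x] is None:
            raise ValueError("k exceeds the depth of x")
        x = parent[x]
    return x
```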

00:03:58.447 --> 00:04:00.905
So these are the three problems
we're going to solve today,

00:04:00.905 --> 00:04:06.380
RMQ, LCA, and LA.

00:04:06.380 --> 00:04:08.500
Using somewhat
similar techniques,

00:04:08.500 --> 00:04:10.250
we're going to use a
nice technique called

00:04:10.250 --> 00:04:12.141
table look-up,
which is generally

00:04:12.141 --> 00:04:13.640
useful for a lot
of data structures.

00:04:13.640 --> 00:04:18.060
We are working in the
Word RAM throughout.

00:04:18.060 --> 00:04:20.959
But that's not as essential as
it has been in our past integer

00:04:20.959 --> 00:04:24.620
data structures.

00:04:24.620 --> 00:04:26.540
Now the fun thing
about these problems

00:04:26.540 --> 00:04:30.110
is while LCA and LA
look quite similar--

00:04:30.110 --> 00:04:33.830
I mean, they even share
two letters out of three--

00:04:33.830 --> 00:04:34.880
they're quite different.

00:04:34.880 --> 00:04:37.130
As far as I know, you need
fairly different techniques

00:04:37.130 --> 00:04:39.111
to deal with-- or as
far as anyone knows--

00:04:39.111 --> 00:04:40.610
you need pretty
different techniques

00:04:40.610 --> 00:04:42.410
to deal with both of them.

00:04:42.410 --> 00:04:45.020
The original paper that
solved level ancestors kind of

00:04:45.020 --> 00:04:47.360
lamented on this.

00:04:47.360 --> 00:04:49.160
RMQ, on the other
hand, turns out

00:04:49.160 --> 00:04:52.190
to be basically
identical to LCA.

00:04:52.190 --> 00:04:54.750
So that's the more
surprising thing,

00:04:54.750 --> 00:04:57.482
and I want to start with that.

00:04:57.482 --> 00:04:59.690
Again, our goal is to get
constant time, linear space

00:04:59.690 --> 00:05:02.390
for all these problems.

00:05:02.390 --> 00:05:05.419
Constant time is easy to
get with polynomial space.

00:05:05.419 --> 00:05:06.960
You could just store
all the answers.

00:05:06.960 --> 00:05:11.210
There's only n squared different
queries for all these problems,

00:05:11.210 --> 00:05:13.090
so quadratic space is easy.

00:05:13.090 --> 00:05:15.540
Linear space is the hard part.

00:05:15.540 --> 00:05:19.130
So let me tell you
about a nice reduction

00:05:19.130 --> 00:05:20.810
from an array to a tree.

00:05:30.830 --> 00:05:32.159
Very simple idea.

00:05:32.159 --> 00:05:33.450
It's called the Cartesian tree.

00:05:33.450 --> 00:05:37.110
It goes back to Gabow,
Bentley, and Tarjan in 1984.

00:05:37.110 --> 00:05:40.019
It's an old idea, but it
comes up now and then,

00:05:40.019 --> 00:05:41.810
and in particular,
provides the equivalence

00:05:41.810 --> 00:05:45.980
between RMQ and LCA,
or one direction of it.

00:05:45.980 --> 00:05:48.555
I just take a minimum element--

00:05:51.170 --> 00:05:53.010
let's call it Ai--

00:05:53.010 --> 00:05:55.370
of the array.

00:05:55.370 --> 00:05:58.560
Let that be the root of my tree.

00:05:58.560 --> 00:06:03.560
And then the left
sub-tree of T is just

00:06:03.560 --> 00:06:10.640
going to be a
Cartesian tree on all

00:06:10.640 --> 00:06:12.610
the elements to the left of i.

00:06:12.610 --> 00:06:19.220
So A less than i, and
then the right sub-tree

00:06:19.220 --> 00:06:23.960
is going to be A greater than i.

00:06:23.960 --> 00:06:25.190
So let's do a little example.

00:06:32.560 --> 00:06:41.976
Suppose we have 8,
7, 2, 8, 6, 9, 4, 5.

00:06:45.210 --> 00:06:48.150
So the minimum in
this array is 2.

00:06:48.150 --> 00:06:50.190
So it gets promoted
to the root, which

00:06:50.190 --> 00:06:55.350
decomposes the problem into
two halves, the left half

00:06:55.350 --> 00:06:56.280
and the right half.

00:06:56.280 --> 00:07:00.060
So drawing the tree, I put 2--

00:07:00.060 --> 00:07:02.130
maybe over here is
actually nicer--

00:07:02.130 --> 00:07:03.870
2 at the root.

00:07:03.870 --> 00:07:06.440
On the left side,
7 is the smallest.

00:07:06.440 --> 00:07:08.550
And so it's going to get
promoted to be the root,

00:07:08.550 --> 00:07:12.060
and so the left side
will look like this.

00:07:12.060 --> 00:07:15.840
On the right side,
the minimum is 4,

00:07:15.840 --> 00:07:24.460
so 4 is the right root, which
decomposes into the left half

00:07:24.460 --> 00:07:25.920
there, the right half there.

00:07:25.920 --> 00:07:29.400
So the right thing is just 5.

00:07:29.400 --> 00:07:34.050
Here the minimum is 6, and
so we get a nice binary tree

00:07:34.050 --> 00:07:34.770
on the left here.
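
NOTE
The recursive definition can be sketched directly, assuming nodes are
identified by array index and children are kept in two hypothetical lists,
left and right. Ties go to the leftmost minimum, which is one of the
arbitrary choices mentioned in the lecture. Each call scans its range, so
this can take quadratic time.
```python
def cartesian_tree_recursive(a):
    """Return (root, left, right); left[i] and right[i] are child indices or None."""
    n = len(a)
    left, right = [None] * n, [None] * n
    def build(lo, hi):
        # Build the subtree on a[lo..hi] inclusive and return its root index.
        if lo > hi:
            return None
        i = min(range(lo, hi + 1), key=lambda k: a[k])
        left[i] = build(lo, i - 1)
        right[i] = build(i + 1, hi)
        return i
    return build(0, n - 1), left, right
```
On the example 8, 7, 2, 8, 6, 9, 4, 5 the root is index 2 (the value 2),
with index 1 (the 7) on its left and index 6 (the 4) on its right.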

00:07:37.751 --> 00:07:38.250
OK.

00:07:38.250 --> 00:07:40.650
This is not a
binary search tree.

00:07:40.650 --> 00:07:41.600
It's a min heap.

00:07:49.330 --> 00:07:53.450
Cartesian tree is a min heap.

00:07:53.450 --> 00:07:56.420
But Cartesian trees have a
more interesting property,

00:07:56.420 --> 00:07:59.750
which I've kind of alluded
to a couple of times already,

00:07:59.750 --> 00:08:02.140
which is that LCAs in
this tree correspond

00:08:02.140 --> 00:08:04.970
to RMQs in this array.

00:08:04.970 --> 00:08:07.400
So let's do some examples.

00:08:07.400 --> 00:08:11.140
Let's say I do LCA of 7 and 8.

00:08:11.140 --> 00:08:12.100
That's 2.

00:08:12.100 --> 00:08:16.630
Anything from the left and the
right sub-tree, the LCA is 2.

00:08:16.630 --> 00:08:20.490
And indeed, if I take anything,
any interval that spans 2,

00:08:20.490 --> 00:08:22.684
then the RMQ is 2.

00:08:22.684 --> 00:08:25.100
If I don't span 2, I'm either
in the left or in the right.

00:08:25.100 --> 00:08:28.790
Let's say I'm on the right, say
I do an LCA between 9 and 5.

00:08:28.790 --> 00:08:34.449
I get 4 because, yeah, the
RMQ between 9 and 5 is 4.

00:08:34.449 --> 00:08:35.710
Make sense?

00:08:35.710 --> 00:08:41.860
Same problem, really, because
it's all about which mins--

00:08:41.860 --> 00:08:44.320
I mean in the sequence of mins--
which mins do you contain?

00:08:44.320 --> 00:08:46.090
If you contain
the first min, you

00:08:46.090 --> 00:08:48.250
contain the highest
min you contain.

00:08:48.250 --> 00:08:55.330
That is the answer and
that's what LCA in this tree

00:08:55.330 --> 00:08:56.730
gives you.

00:08:56.730 --> 00:09:02.620
So LCA of i and j in
this tree T equals

00:09:02.620 --> 00:09:08.090
RMQ in the original array of
the corresponding elements.

00:09:08.090 --> 00:09:11.080
So there is a bijection
between these items,

00:09:11.080 --> 00:09:13.090
and so i and j here
represent nodes,

00:09:13.090 --> 00:09:18.111
and here corresponding to
the corresponding items in A.

00:09:18.111 --> 00:09:18.610
OK.

00:09:18.610 --> 00:09:21.610
So this says if you
wanted to solve RMQ,

00:09:21.610 --> 00:09:25.860
you can reduce it
to an LCA problem.
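
NOTE
A small check of the claim on the lecture's example, assuming nodes are
named by array index and the tree is stored as a hypothetical parent list.
Both lca_naive and rmq_naive are slow stand-ins; the point is only that
they agree on every pair.
```python
a = [8, 7, 2, 8, 6, 9, 4, 5]
# Parent index of each position in the Cartesian tree of a (None is the root, the 2).
parent = [1, 2, None, 4, 6, 4, 2, 6]
def lca_naive(parent, x, y):
    """Collect x and its ancestors, then walk up from y until one of them is hit."""
    seen = set()
    while x is not None:
        seen.add(x)
        x = parent[x]
    while y not in seen:
        y = parent[y]
    return y
def rmq_naive(a, i, j):
    """Index of the leftmost minimum of a[i..j] inclusive."""
    return min(range(i, j + 1), key=lambda k: a[k])
agree = all(lca_naive(parent, i, j) == rmq_naive(a, i, j)
            for i in range(len(a)) for j in range(i, len(a)))
```
For instance, the 9 and the 5 sit at indices 5 and 7, and both functions
return index 6, the 4.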

00:09:25.860 --> 00:09:29.680
Quick note here, which is--

00:09:29.680 --> 00:09:30.370
yeah.

00:09:30.370 --> 00:09:32.744
There's a couple of different
versions of Cartesian trees

00:09:32.744 --> 00:09:35.880
when you have ties, so
here I only had one 2.

00:09:35.880 --> 00:09:40.120
If there was another 2,
then you could either just

00:09:40.120 --> 00:09:42.820
break ties arbitrarily
and you get a binary tree,

00:09:42.820 --> 00:09:46.480
or you could make them all one
node, which is kind of messier,

00:09:46.480 --> 00:09:48.520
and then you get
a non-binary tree.

00:09:48.520 --> 00:09:52.210
I think I'll say we
disambiguate arbitrarily.

00:09:52.210 --> 00:09:55.020
Just pick any min, and
then you get a binary tree.

00:09:55.020 --> 00:09:56.910
It won't affect the answer.

00:09:56.910 --> 00:09:59.634
But I think the original paper
might do it a different way.

00:10:02.235 --> 00:10:02.735
OK.

00:10:06.250 --> 00:10:06.880
Let's see.

00:10:06.880 --> 00:10:10.240
So then let me just
mention a fun fact

00:10:10.240 --> 00:10:12.490
about this reduction, which
is that you can compute it

00:10:12.490 --> 00:10:14.230
in linear time.

00:10:14.230 --> 00:10:16.804
This is a fun fact we
basically saw last class,

00:10:16.804 --> 00:10:18.595
although in a completely
different setting,

00:10:18.595 --> 00:10:21.160
so it's not at all obvious.

00:10:21.160 --> 00:10:23.740
But you may recall, we
had a method last time

00:10:23.740 --> 00:10:27.520
for building a compressed
trie in linear time.

00:10:27.520 --> 00:10:29.530
Basically, same
thing works here,

00:10:29.530 --> 00:10:31.496
although it seems
quite different.

00:10:31.496 --> 00:10:33.220
The idea is if you
want to build this,

00:10:33.220 --> 00:10:34.570
if you build a
Cartesian tree according

00:10:34.570 --> 00:10:36.111
to this recursive
algorithm, you will

00:10:36.111 --> 00:10:39.320
spend n log n time or actually,
maybe even quadratic time,

00:10:39.320 --> 00:10:42.190
if you're computing
min with a linear scan.

00:10:42.190 --> 00:10:44.080
So don't use that
recursive algorithm.

00:10:44.080 --> 00:10:46.690
Just walk through the array,
left to right, one at a time.

00:10:46.690 --> 00:10:48.250
So first you insert 8.

00:10:48.250 --> 00:10:49.720
Then you insert
7, and you realize

00:10:49.720 --> 00:10:53.980
7 would have won,
so you put 7 above 8.

00:10:53.980 --> 00:10:54.750
Then you insert 2.

00:10:54.750 --> 00:10:59.590
You say that's even higher than
7, so I have to put it up here.

00:10:59.590 --> 00:11:03.640
Then you insert 8 so that
you'll just go down from there,

00:11:03.640 --> 00:11:06.250
and you put 8 as a
right child of 2.

00:11:06.250 --> 00:11:07.120
Then you insert 6.

00:11:07.120 --> 00:11:12.790
You say whoops, 6 actually would
have gone in between 2 and 8.

00:11:12.790 --> 00:11:15.211
And the way you'd see that is--

00:11:15.211 --> 00:11:17.710
I mean, at that moment, your
tree looks something like this.

00:11:17.710 --> 00:11:21.367
You've got 2, 8, and there's
other stuff to the left,

00:11:21.367 --> 00:11:22.450
but I don't actually care.

00:11:22.450 --> 00:11:23.970
I just care about
the right spine.

00:11:23.970 --> 00:11:25.690
I say I'm inserting 6.

00:11:25.690 --> 00:11:28.390
6 would have been above
8, but not above 2.

00:11:28.390 --> 00:11:31.090
Therefore, it fits
along this edge,

00:11:31.090 --> 00:11:38.110
and so I convert this
tree into this pattern,

00:11:38.110 --> 00:11:40.150
and it will always
look like this.

00:11:43.090 --> 00:11:45.240
8 becomes a child of 7--

00:11:45.240 --> 00:11:47.760
sorry, 6.

00:11:47.760 --> 00:11:49.380
6.

00:11:49.380 --> 00:11:51.006
Thanks.

00:11:51.006 --> 00:11:52.350
Not 7.

00:11:52.350 --> 00:11:53.280
7 was on the left.

00:11:53.280 --> 00:11:58.390
This is the guy I'm
inserting next because here.

00:11:58.390 --> 00:12:03.450
So I guess it's a left child
because it's the first one.

00:12:03.450 --> 00:12:05.170
So we insert 6 like this.

00:12:05.170 --> 00:12:07.332
So now the new
right spine is 2, 6,

00:12:07.332 --> 00:12:08.790
and from then on,
we will always be

00:12:08.790 --> 00:12:10.080
working to the right of that.

00:12:10.080 --> 00:12:13.030
We'll never be touching
any of this left stuff.

00:12:13.030 --> 00:12:13.530
OK.

00:12:13.530 --> 00:12:15.390
So how long did it
take me to do that?

00:12:15.390 --> 00:12:19.200
In general, I have a right
spine of the tree, which are all

00:12:19.200 --> 00:12:22.860
right edges, and I might
have to walk up several steps

00:12:22.860 --> 00:12:28.620
before I discover whoops, this
is where the next item belongs.

00:12:28.620 --> 00:12:32.970
And then I convert it
into this new entry,

00:12:32.970 --> 00:12:35.709
which has a left child,
which is that stuff.

00:12:35.709 --> 00:12:37.500
But this stuff becomes
irrelevant from then

00:12:37.500 --> 00:12:40.120
on, because now, this
is the new right spine.

00:12:40.120 --> 00:12:42.630
And so if this is a
long walk, I charge that

00:12:42.630 --> 00:12:45.150
to the decrease in the
length of the right spine,

00:12:45.150 --> 00:12:47.460
just like that
algorithm last time.

00:12:47.460 --> 00:12:50.670
Slightly different
notion of right spine.

00:12:50.670 --> 00:12:53.449
So same amortization,
you get linear time,

00:12:53.449 --> 00:12:54.990
and you can build
the Cartesian tree.

00:12:54.990 --> 00:12:57.031
This is actually where
that algorithm comes from.

00:12:57.031 --> 00:12:58.330
This one was first, I believe.
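
NOTE
The left-to-right construction can be sketched with an explicit stack
holding the right spine. The names spine, left and right are assumptions
for this sketch. Each index is pushed once and popped at most once, which
is the amortized linear bound from the lecture.
```python
def cartesian_tree_linear(a):
    """Return (root, left, right) child-index lists, built in O(n) total time."""
    n = len(a)
    left, right = [None] * n, [None] * n
    spine = []  # indices on the current right spine, root first
    for i in range(n):
        last = None
        # Walk up the right spine past every value larger than a[i].
        while spine and a[spine[-1]] > a[i]:
            last = spine.pop()
        left[i] = last  # the popped part of the spine hangs below i on the left
        if spine:
            right[spine[-1]] = i  # i becomes the new bottom of the right spine
        spine.append(i)
    return (spine[0] if spine else None), left, right
```
Inserting the 6 in the example pops the second 8, makes it the left child
of the 6, and leaves 2, 6 as the new right spine.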

00:13:01.410 --> 00:13:03.610
Questions?

00:13:03.610 --> 00:13:05.819
I'm not worrying too
much about build time,

00:13:05.819 --> 00:13:07.860
how long it takes to build
these data structures,

00:13:07.860 --> 00:13:09.930
but they can all be
built in linear time.

00:13:09.930 --> 00:13:12.540
And this is one of
the cooler algorithms,

00:13:12.540 --> 00:13:15.150
and it's a nice tie
into last lecture.

00:13:15.150 --> 00:13:17.550
So that's a reduction
from RMQ to LCA,

00:13:17.550 --> 00:13:21.122
so now all of our problems are
about trees, in some sense.

00:13:21.122 --> 00:13:22.830
I mean, there's a
reason I mentioned RMQ.

00:13:22.830 --> 00:13:25.350
Not just that it's a handy
problem to have solved,

00:13:25.350 --> 00:13:29.250
but we're actually going
to use RMQ to solve LCA.

00:13:29.250 --> 00:13:32.340
So we're going to go back and
forth between the two a lot.

00:13:34.870 --> 00:13:37.410
Actually, we'll spend most
of our time in the RMQ land.

00:13:37.410 --> 00:13:40.830
So let me tell you about
the reverse direction,

00:13:40.830 --> 00:13:45.330
if you want to
reduce LCA to RMQ.

00:13:45.330 --> 00:13:48.300
That also works.

00:13:48.300 --> 00:13:53.510
And you can kind of
see it in this picture.

00:13:53.510 --> 00:13:55.790
If I gave you this
tree, how would you

00:13:55.790 --> 00:13:57.500
reconstruct this array?

00:13:57.500 --> 00:13:59.730
Pop quiz.

00:13:59.730 --> 00:14:01.400
How do I go from here to here?

00:14:07.820 --> 00:14:09.584
In-order traversal, yep.

00:14:09.584 --> 00:14:11.750
Just doing an in-order
traversal, write those guys--

00:14:11.750 --> 00:14:13.801
I mean, yeah.

00:14:13.801 --> 00:14:14.300
Pretty easy.

00:14:14.300 --> 00:14:18.020
Now, not so easy because
in the LCA problem,

00:14:18.020 --> 00:14:19.730
I don't have numbers
in the nodes.

00:14:19.730 --> 00:14:21.950
So if I do an in-order
walk and I write stuff,

00:14:21.950 --> 00:14:24.910
it's like, what should I
write for each of the nodes.

00:14:24.910 --> 00:14:26.223
Any suggestions?

00:14:45.450 --> 00:14:47.174
AUDIENCE: [INAUDIBLE]

00:14:47.174 --> 00:14:48.090
PROFESSOR: The height?

00:14:48.090 --> 00:14:48.900
Not quite the height.

00:14:48.900 --> 00:14:49.440
The depth.

00:14:56.000 --> 00:14:58.590
That will work.

00:14:58.590 --> 00:15:04.256
So let's do it,
just so it's clear.

00:15:04.256 --> 00:15:12.246
Look at the same tree.
Is that the same tree?

00:15:12.246 --> 00:15:12.745
Yep.

00:15:12.745 --> 00:15:14.615
So I write the depths.

00:15:14.615 --> 00:15:20.405
0, 1, 1, 2, 2, 2, 3, 3.

00:15:20.405 --> 00:15:22.530
It's either height or depth,
and you try them both.

00:15:22.530 --> 00:15:24.790
This is depth.

00:15:24.790 --> 00:15:28.625
So I do an in-order
walk I get 2, 1, 0--

00:15:31.449 --> 00:15:32.490
can you read my writing--

00:15:32.490 --> 00:15:35.474
3, 2, 3, 1, 2.

00:15:35.474 --> 00:15:37.890
It's funny doing an in-order
traversal on something that's

00:15:37.890 --> 00:15:40.590
not a binary search
tree, but there it is.

00:15:40.590 --> 00:15:43.350
That's the order in which
you visit the nodes.

00:15:43.350 --> 00:15:48.630
And you stare at it long
enough, this sequence

00:15:48.630 --> 00:15:52.380
will behave exactly the
same as this sequence.

00:15:52.380 --> 00:15:55.510
Of course, not in terms of
the actual values returned.

00:15:55.510 --> 00:15:58.650
But if you do the
argument version of RMQ,

00:15:58.650 --> 00:16:01.830
you just ask for what's the
index that gives me the min.

00:16:01.830 --> 00:16:05.660
If you can solve RMQ
on this structure,

00:16:05.660 --> 00:16:10.050
then that RMQ will give
exactly the same answers

00:16:10.050 --> 00:16:12.030
as this structure.

00:16:12.030 --> 00:16:13.680
Just kind of nifty.
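
NOTE
A sketch of the reverse reduction on the example tree, assuming a binary
tree stored as hypothetical left and right child-index lists with the root
at index 2. It writes down depths in in-order.
```python
def inorder_depths(root, left, right):
    """Return the depths of the nodes of a binary tree, listed in in-order."""
    out = []
    def walk(v, depth):
        if v is None:
            return
        walk(left[v], depth + 1)
        out.append(depth)
        walk(right[v], depth + 1)
    walk(root, 0)
    return out
left = [None, 0, 1, None, 3, None, 4, None]
right = [None, None, 6, None, 5, None, 7, None]
depths = inorder_depths(2, left, right)
```
The values differ from the original array, but the index of the minimum
in any range is the same, which is all the argument version of RMQ uses.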

00:16:13.680 --> 00:16:17.260
Because here I had numbers, they
could be all over the place.

00:16:17.260 --> 00:16:19.940
Here I have very clean numbers.

00:16:19.940 --> 00:16:24.670
They will go between 0 and
the height of the tree.

00:16:24.670 --> 00:16:28.090
So in general, at
most 0 to n minus 1.

00:16:28.090 --> 00:16:31.980
So fun consequence of
this is you get a tool

00:16:31.980 --> 00:16:38.124
for universe reduction in RMQ.

00:16:38.124 --> 00:16:39.790
The tree problems
don't have this issue,

00:16:39.790 --> 00:16:41.248
because they don't
involve numbers.

00:16:41.248 --> 00:16:44.710
They involve trees, and that's
why this reduction does this.

00:16:44.710 --> 00:16:54.190
But you can start from an
arbitrary ordered universe

00:16:54.190 --> 00:16:59.140
and have an RMQ problem on that,
and you can convert it to LCA.

00:16:59.140 --> 00:17:10.599
And then you can convert it
to a nice clean universe RMQ,

00:17:10.599 --> 00:17:12.989
just by doing the Cartesian
tree and then doing

00:17:12.989 --> 00:17:14.530
the in-order traversal
of the depths.

00:17:17.430 --> 00:17:21.480
This is kind of nifty because
if you look at these algorithms,

00:17:21.480 --> 00:17:23.784
they only assume a
comparison model.

00:17:23.784 --> 00:17:25.200
So these don't
have to be numbers.

00:17:25.200 --> 00:17:27.359
They just have to be something
from a totally ordered universe

00:17:27.359 --> 00:17:29.220
that you can compare
in constant time.

00:17:29.220 --> 00:17:30.780
You do this
reduction, and now we

00:17:30.780 --> 00:17:33.870
can assume they're integers,
nice small integers, and that

00:17:33.870 --> 00:17:36.510
will let us solve things in
constant time using the Word

00:17:36.510 --> 00:17:38.040
RAM.

00:17:38.040 --> 00:17:42.250
So you don't need to assume
that about the original values.

00:17:42.250 --> 00:17:44.110
Cool.

00:17:44.110 --> 00:17:46.160
So, time to actually
solve something.

00:17:46.160 --> 00:17:47.335
We've done reductions.

00:17:47.335 --> 00:17:49.750
We now know RMQ and
LCA are equivalent.

00:17:49.750 --> 00:17:50.740
Let's solve them both.

00:18:01.500 --> 00:18:06.410
Kind of like the last
of the sorting we saw,

00:18:06.410 --> 00:18:08.062
there's going to
be a lot of steps.

00:18:08.062 --> 00:18:09.270
They're not sequential steps.

00:18:09.270 --> 00:18:11.850
These are like different
versions of a data structure

00:18:11.850 --> 00:18:14.040
for solving RMQ,
and they're going

00:18:14.040 --> 00:18:17.540
to be getting progressively
better and better.

00:18:17.540 --> 00:18:26.560
So LCA, which implies RMQ.

00:18:30.450 --> 00:18:33.660
This is originally solved
by Harel and Tarjan in 1984,

00:18:33.660 --> 00:18:35.899
but is rather complicated.

00:18:35.899 --> 00:18:37.440
And then what I'm
going to talk about

00:18:37.440 --> 00:18:41.580
is a version from 2000 by
Bender and Farach-Colton,

00:18:41.580 --> 00:18:44.830
same authors from the
cache-oblivious B-trees.

00:18:44.830 --> 00:18:48.420
That's a much
simpler presentation.

00:18:48.420 --> 00:18:54.630
So first step is I want to do
this reduction again from LCA

00:18:54.630 --> 00:18:57.799
to RMQ, but slightly
differently.

00:18:57.799 --> 00:19:00.090
And we're going to get a more
restricted problem called

00:19:00.090 --> 00:19:01.530
plus or minus 1 RMQ.

00:19:05.130 --> 00:19:07.140
What is plus or minus 1 RMQ?

00:19:07.140 --> 00:19:13.170
Just means that you get an array
where all the adjacent values

00:19:13.170 --> 00:19:16.128
differ by plus or minus 1.

00:19:19.170 --> 00:19:21.540
And if you look at the
numbers here, a lot of them

00:19:21.540 --> 00:19:23.540
differ by plus or minus 1.

00:19:23.540 --> 00:19:24.690
These all do.

00:19:24.690 --> 00:19:26.250
But then there are
some big gaps--

00:19:26.250 --> 00:19:29.440
like this has a gap of
3, this has a gap of 2.

00:19:29.440 --> 00:19:31.710
This is plus or minus 1.

00:19:31.710 --> 00:19:34.320
That's almost right,
and if you just

00:19:34.320 --> 00:19:38.550
stare at this idea
of tree walk enough,

00:19:38.550 --> 00:19:40.230
you'll realize a
little trick to make

00:19:40.230 --> 00:19:44.370
the array a little bit bigger,
but give you plus or minus

00:19:44.370 --> 00:19:45.360
ones.

00:19:45.360 --> 00:19:47.120
If you've done a lot
of tree traversal,

00:19:47.120 --> 00:19:50.040
this will come quite naturally.

00:19:50.040 --> 00:19:53.530
This is a depth first search.

00:19:53.530 --> 00:19:55.460
This is the depth
first search order

00:19:55.460 --> 00:19:58.210
of visiting a tree, in order.

00:19:58.210 --> 00:20:00.660
This is usually called
an Eulerian tour.

00:20:00.660 --> 00:20:04.450
The concept we'll come
back to in a few lectures.

00:20:04.450 --> 00:20:08.070
But Euler tour just means
you visit every edge twice,

00:20:08.070 --> 00:20:10.662
in this case.

00:20:10.662 --> 00:20:12.120
If you look at the
node visits, I'm

00:20:12.120 --> 00:20:16.590
visiting this node here,
here, and here, three times.

00:20:16.590 --> 00:20:19.110
But it's amortized constant,
because every edge is just

00:20:19.110 --> 00:20:21.270
visited twice.

00:20:21.270 --> 00:20:23.790
What I'd like to do is
follow an Euler tour

00:20:23.790 --> 00:20:29.490
and then write down all
the nodes that I visit,

00:20:29.490 --> 00:20:31.710
but with repetition.

00:20:31.710 --> 00:20:41.010
So in that picture I
will get 0, 1, 2, 1.

00:20:41.010 --> 00:20:48.450
I go 0, 1, 2, back to 1,
back to 0, then over to the 1

00:20:48.450 --> 00:20:52.740
on the right, then to
the 2, then to the 3,

00:20:52.740 --> 00:20:55.260
then back up to the 2,
then down to the other 3,

00:20:55.260 --> 00:20:57.120
then back up to the
2, back up to the 1,

00:20:57.120 --> 00:21:00.690
back down to the last
node on the right,

00:21:00.690 --> 00:21:03.040
and back up and back up.

00:21:03.040 --> 00:21:03.540
OK.

00:21:03.540 --> 00:21:06.300
This is what we call Euler tour.

00:21:06.300 --> 00:21:08.130
So multiple visits--
for example, here's

00:21:08.130 --> 00:21:11.670
all the places that
the root is visited.

00:21:11.670 --> 00:21:16.050
Here's all the places
that this node is visited,

00:21:16.050 --> 00:21:20.430
then this node is
visited 3 times.

00:21:20.430 --> 00:21:23.890
It's going to be visited
once per incident edge.

00:21:23.890 --> 00:21:26.840
I think you get the pattern.

00:21:26.840 --> 00:21:28.680
I'm just going to store this.

00:21:28.680 --> 00:21:30.911
And what else am I going to do?

00:21:30.911 --> 00:21:31.410
Let's see.

00:21:31.410 --> 00:21:41.610
Each node in the tree
stores, let's say,

00:21:41.610 --> 00:21:45.450
the first visit in the array.

00:21:45.450 --> 00:21:47.086
Pretty sure this is enough.

00:21:47.086 --> 00:21:48.960
You could maybe store
the last visit as well.

00:21:48.960 --> 00:21:51.790
We can only store a
constant number of things.

00:21:51.790 --> 00:22:04.500
And I guess each array
item stores a pointer

00:22:04.500 --> 00:22:06.120
to the corresponding
node in the tree.

00:22:12.340 --> 00:22:12.840
OK.

00:22:12.840 --> 00:22:15.960
So each instance of the 0
stores a pointer to the root,

00:22:15.960 --> 00:22:18.470
and so on.

00:22:18.470 --> 00:22:21.360
It's kind of what these
horizontal bars are indicating,

00:22:21.360 --> 00:22:23.760
but those aren't
actually stored.

00:22:23.760 --> 00:22:24.330
OK.

00:22:24.330 --> 00:22:32.390
So I claim still RMQ in here
is the same as LCA over there.

00:22:32.390 --> 00:22:35.550
It's maybe a little
more subtle, but now

00:22:35.550 --> 00:22:39.300
if I want to compute
the LCA of two nodes,

00:22:39.300 --> 00:22:41.110
I look at their
first occurrences.

00:22:41.110 --> 00:22:42.690
So let's do-- I don't know--

00:22:42.690 --> 00:22:44.910
2 and 3.

00:22:44.910 --> 00:22:46.530
Here, this 2 and this 3.

00:22:46.530 --> 00:22:49.515
I didn't label them, but I
happen to know where they are.

00:22:49.515 --> 00:22:51.360
2 is here, and it's the first 3.

00:22:54.150 --> 00:22:56.669
Now here, they happen to
only occur once in the tour,

00:22:56.669 --> 00:22:57.710
so it's a little clearer.

00:22:57.710 --> 00:23:00.590
If I compute the RMQ,
I get this 0, this 0,

00:23:00.590 --> 00:23:03.090
as opposed to the other 0s,
but this 0 points to the root,

00:23:03.090 --> 00:23:05.400
so I get the LCA.

00:23:05.400 --> 00:23:07.570
Let's do ones that do not
have unique occurrences.

00:23:07.570 --> 00:23:11.475
So like, this guy and this guy,
the first 1 and the first 2.

00:23:11.475 --> 00:23:14.811
It'd be this 1 and this 1.

00:23:14.811 --> 00:23:16.560
In fact, I think any
of the 2s would work.

00:23:16.560 --> 00:23:17.481
Doesn't really matter.

00:23:17.481 --> 00:23:18.730
Just have to pick one of them.

00:23:18.730 --> 00:23:21.150
So I picked the leftmost
one for consistency.

00:23:21.150 --> 00:23:24.400
Then I take the
RMQ, again I get 0.

00:23:24.400 --> 00:23:25.960
You can test that
for all of them.

00:23:25.960 --> 00:23:27.610
I think the slightly
more subtle case

00:23:27.610 --> 00:23:30.140
is when one node is an
ancestor of another.

00:23:30.140 --> 00:23:35.060
So let's do that,
1 here and 3 there.

00:23:35.060 --> 00:23:37.390
I think here you do need to
be leftmost or rightmost,

00:23:37.390 --> 00:23:38.920
consistently.

00:23:38.920 --> 00:23:43.330
So I take the 1 and
I take the second 3.

00:23:43.330 --> 00:23:43.830
OK.

00:23:43.830 --> 00:23:48.820
I take the RMQ of that, I get 1
which is the higher of the two.

00:23:48.820 --> 00:23:49.320
OK.

00:23:49.320 --> 00:23:50.959
So it seems to work.

00:23:50.959 --> 00:23:53.500
Actually, I think it would work
no matter which guy you pick.

00:23:53.500 --> 00:23:56.390
I just picked the first one.
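
NOTE
A sketch of the Euler tour reduction, assuming the rooted tree is given
as a hypothetical dict from node to list of children. It records the
visited nodes, their depths, and each node's first visit. The min over
the depth range is a naive stand-in for the RMQ structure.
```python
def euler_tour(children, root):
    """Return (tour, depths, first): visited nodes, their depths, first visit per node."""
    tour, depths, first = [], [], {}
    def visit(v, d):
        first.setdefault(v, len(tour))
        tour.append(v)
        depths.append(d)
        for c in children.get(v, []):
            visit(c, d + 1)
            tour.append(v)  # come back to v after finishing each child
            depths.append(d)
    visit(root, 0)
    return tour, depths, first
def lca_via_rmq(tour, depths, first, x, y):
    """LCA of x and y as the shallowest node between their first visits."""
    i, j = sorted((first[x], first[y]))
    k = min(range(i, j + 1), key=lambda t: depths[t])
    return tour[k]
```
Adjacent depths always differ by exactly 1, because each step of the tour
moves along one edge, and the tour has 2n minus 1 entries.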

00:23:56.390 --> 00:23:58.872
OK, no big deal.

00:23:58.872 --> 00:24:01.330
You're not going to see why
this is useful for a little bit

00:24:01.330 --> 00:24:05.080
until step 4 or something,
but we've slightly

00:24:05.080 --> 00:24:08.110
simplified our problem to
this plus or minus 1 RMQ.

00:24:08.110 --> 00:24:11.300
Otherwise identical to
this in-order traversal.

00:24:11.300 --> 00:24:13.940
So not a big deal, but
we'll need it later.

00:24:16.691 --> 00:24:17.190
OK.

00:24:34.476 --> 00:24:35.350
That was a reduction.

00:24:35.350 --> 00:24:38.140
Next, we're finally going
to actually solve something.

00:24:38.140 --> 00:24:47.920
I'm going to do constant
time, n log n space, RMQ.

00:24:47.920 --> 00:24:50.910
This data structure will not
require plus or minus 1 RMQ.

00:24:50.910 --> 00:24:52.660
It works for any RMQ.

00:24:52.660 --> 00:24:54.560
It's actually a
very simple idea,

00:24:54.560 --> 00:24:55.790
and it's almost what we need.

00:24:55.790 --> 00:24:58.040
But we're going to have to
get rid of this log factor.

00:24:58.040 --> 00:25:00.050
That will be step 3.

00:25:00.050 --> 00:25:01.240
OK, so here's the idea.

00:25:01.240 --> 00:25:04.000
You've got an array.

00:25:04.000 --> 00:25:06.670
And now someone gives
you an arbitrary interval

00:25:06.670 --> 00:25:08.725
from here to here.

00:25:11.260 --> 00:25:14.290
Ideally, I just store the mins
for every possible interval,

00:25:14.290 --> 00:25:15.955
but there's n squared intervals.

00:25:15.955 --> 00:25:20.830
So instead, what I'm going
to do is store the answer

00:25:20.830 --> 00:25:24.880
not for all the intervals, but
for all intervals of length

00:25:24.880 --> 00:25:27.286
a power of 2.

00:25:27.286 --> 00:25:29.470
It's a trick you've
probably seen before.

00:25:32.500 --> 00:25:34.185
This is the easy thing to do.

00:25:34.185 --> 00:25:35.560
And then the
interesting thing is

00:25:35.560 --> 00:25:37.920
how you make it actually
get down to linear space.

00:25:40.570 --> 00:25:44.260
Length, power of 2.

00:25:46.540 --> 00:25:47.040
OK.

00:25:47.040 --> 00:25:50.230
There are only log n
possible powers of 2.

00:25:50.230 --> 00:25:53.200
There's still n different start
points for those intervals,

00:25:53.200 --> 00:25:55.720
so total number of
intervals is n log n.

00:25:55.720 --> 00:25:58.857
So this is n log n space,
because I'm storing

00:25:58.857 --> 00:25:59.940
an index for each of them.

00:26:02.520 --> 00:26:04.370
OK.

00:26:04.370 --> 00:26:07.520
And then if I have an
arbitrary query, the point is--

00:26:07.520 --> 00:26:10.280
let's call it length k--

00:26:10.280 --> 00:26:14.300
then I can cover
it by two intervals

00:26:14.300 --> 00:26:16.790
of length a power of 2.

00:26:16.790 --> 00:26:18.500
They will be the same length.

00:26:18.500 --> 00:26:21.980
They will be length
2 to the floor

00:26:21.980 --> 00:26:26.176
of log k, the next smaller
power of 2 below k.

00:26:26.176 --> 00:26:27.800
Maybe k is a power
of 2, in which case,

00:26:27.800 --> 00:26:30.800
it's just one interval
or two equal intervals.

00:26:30.800 --> 00:26:33.500
But in general, you just take
the next smaller power of 2.

00:26:33.500 --> 00:26:37.700
That will cover more than half
of the thing, of the interval.

00:26:37.700 --> 00:26:39.479
And so you have one
that's left aligned,

00:26:39.479 --> 00:26:40.520
one that's right aligned.

00:26:40.520 --> 00:26:42.320
Together, those will
cover everything.

00:26:42.320 --> 00:26:45.680
And because the min operation
has this nifty feature

00:26:45.680 --> 00:26:48.200
that you can take the min of
all these, min of all these,

00:26:48.200 --> 00:26:49.490
take the min of the 2.

00:26:49.490 --> 00:26:51.530
You will get the min overall.

00:26:51.530 --> 00:26:54.090
It doesn't hurt to
have duplicate entries.

00:26:54.090 --> 00:26:58.514
That's kind of an
important property of min.

00:26:58.514 --> 00:26:59.930
It holds for other
operations too,

00:26:59.930 --> 00:27:02.840
like max, but not everything.

00:27:02.840 --> 00:27:05.019
Then boom, we've solved RMQ.

00:27:05.019 --> 00:27:05.810
I think it's clear.

00:27:05.810 --> 00:27:08.514
You do two queries, take
the min of the two--

00:27:08.514 --> 00:27:10.305
actually, you have to
store the arg mins.

00:27:10.305 --> 00:27:14.660
So it's a little more
work, but constant time.

00:27:14.660 --> 00:27:15.420
Cool.

00:27:15.420 --> 00:27:16.070
That was easy.
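
NOTE
Editor's sketch of the structure just described, in Python. The names
build_sparse_table and rmq are illustrative and do not come from the lecture.
It stores the arg min of every interval whose length is a power of 2, then
answers a query with one left-aligned and one right-aligned interval.
```python
def build_sparse_table(a):
    """table[p][i] = index of the minimum of a[i : i + 2**p]."""
    n = len(a)
    table = [list(range(n))]  # length-1 intervals: the arg min is the start itself
    p = 1
    while (1 << p) <= n:
        prev, half = table[p - 1], 1 << (p - 1)
        row = []
        for i in range(n - (1 << p) + 1):
            left, right = prev[i], prev[i + half]
            row.append(left if a[left] <= a[right] else right)
        table.append(row)
        p += 1
    return table
def rmq(a, table, i, j):
    """Index of the minimum of a[i..j] inclusive, assuming 0 <= i <= j < len(a)."""
    k = j - i + 1
    p = k.bit_length() - 1              # floor(log2 k)
    left = table[p][i]                  # interval of length 2**p starting at i
    right = table[p][j - (1 << p) + 1]  # interval of length 2**p ending at j
    return left if a[left] <= a[right] else right
```
The two intervals may overlap, which is harmless for min. The table has
about n log n entries, and a query does two lookups and one comparison.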

00:27:22.070 --> 00:27:23.190
Leave LCA up there.

00:27:30.180 --> 00:27:30.790
OK.

00:27:30.790 --> 00:27:32.020
So we're almost there, right.

00:27:32.020 --> 00:27:34.420
Just a log factor off.

00:27:34.420 --> 00:27:38.230
So what technique do we have
for shaving log factors?

00:27:40.880 --> 00:27:44.607
Indirection, yeah, our
good friend indirection.

00:27:47.470 --> 00:27:50.140
Indirection comes to
our rescue yet again,

00:27:50.140 --> 00:27:52.150
but we won't be done.

00:27:52.150 --> 00:27:54.280
The idea is, well, we want
to remove a log factor.

00:27:54.280 --> 00:27:56.849
Before we removed log
factors from time,

00:27:56.849 --> 00:27:58.390
but there's no real
time here, right.

00:27:58.390 --> 00:27:59.890
Everything's constant time.

00:27:59.890 --> 00:28:02.770
But we can use indirection to
shave a log factor in space,

00:28:02.770 --> 00:28:03.790
too.

00:28:03.790 --> 00:28:07.120
Let's just divide.

00:28:07.120 --> 00:28:11.980
So this is again for RMQ.

00:28:11.980 --> 00:28:18.010
So I have an array, I'm going
to divide the array into groups

00:28:18.010 --> 00:28:23.130
of size, I believe
1/2 log n would

00:28:23.130 --> 00:28:24.940
be the right magic number.

00:28:24.940 --> 00:28:27.580
It's going to be theta log n,
but I need a specific constant

00:28:27.580 --> 00:28:28.390
for step 4.

00:28:30.910 --> 00:28:33.610
So what does that mean?

00:28:33.610 --> 00:28:37.820
I have the first 1/2 log
n entries in the array.

00:28:37.820 --> 00:28:41.530
Then I have the next
1/2 log n entries,

00:28:41.530 --> 00:28:47.200
and then I have the
last 1/2 log n entries.

00:28:47.200 --> 00:28:50.560
OK, that's easy enough.

00:28:50.560 --> 00:28:53.210
But now I'd like to tie all
these structures together.

00:28:53.210 --> 00:28:58.420
A natural way to do that is with
a big structure on top of size,

00:28:58.420 --> 00:29:03.670
n over log n, I guess
with a factor 2 out here.

00:29:03.670 --> 00:29:07.210
n over 1/2 log n.

00:29:07.210 --> 00:29:08.240
How do I do that?

00:29:08.240 --> 00:29:10.840
Well, this is an RMQ problem,
so the natural thing to do

00:29:10.840 --> 00:29:13.400
is just take the min
of everything here.

00:29:13.400 --> 00:29:16.480
So the red here is going
to denote taking the min,

00:29:16.480 --> 00:29:18.880
and take that-- the one
item that results by taking

00:29:18.880 --> 00:29:22.520
the min in that group, and
promoting it to the next level.

00:29:22.520 --> 00:29:25.250
This is a static thing
we do ahead of time.

00:29:25.250 --> 00:29:28.120
Now if I'm given
a query, like say,

00:29:28.120 --> 00:29:31.900
this interval, what
I need to do is first

00:29:31.900 --> 00:29:37.300
compute the min in this range
within a bottom structure.

00:29:37.300 --> 00:29:39.190
Maybe also compute the
min within this range,

00:29:39.190 --> 00:29:41.590
the last bottom structure,
and then these guys

00:29:41.590 --> 00:29:42.820
are all taken in entirety.

00:29:42.820 --> 00:29:46.330
So I can just take the
corresponding interval up here

00:29:46.330 --> 00:29:48.370
and that will give me
simultaneously the mins

00:29:48.370 --> 00:29:49.660
of everything below.

00:29:49.660 --> 00:30:01.840
So now a query is going to be
the min of two bottoms and one

00:30:01.840 --> 00:30:03.550
top.

00:30:03.550 --> 00:30:06.550
In other words, I do one
top RMQ query for everything

00:30:06.550 --> 00:30:09.760
between, strictly
between the two ends.

00:30:09.760 --> 00:30:12.520
Then I do a bottom query for
the one end, a bottom query

00:30:12.520 --> 00:30:13.689
for the other end.

00:30:13.689 --> 00:30:15.730
Take the min of all those
values and really, it's

00:30:15.730 --> 00:30:18.380
the arg min, but.

00:30:18.380 --> 00:30:18.880
Clear?

00:30:18.880 --> 00:30:20.530
So it would be
constant time if I

00:30:20.530 --> 00:30:22.030
can do bottom in
constant time, if I

00:30:22.030 --> 00:30:24.040
can do top in constant time.

00:30:24.040 --> 00:30:26.890
But the big win is that
this top structure only has

00:30:26.890 --> 00:30:29.050
to store n over log n items.

00:30:29.050 --> 00:30:32.830
So I can afford an n log
n space data structure,

00:30:32.830 --> 00:30:34.930
because the logs cancel.

00:30:34.930 --> 00:30:38.800
So I'm going to use
structure 2 for the top.

00:30:38.800 --> 00:30:42.440
That will give me constant
time up here, linear space.

00:30:42.440 --> 00:30:46.240
So all that's left is to solve
the bottoms individually.

00:30:46.240 --> 00:30:48.310
Again, similar kind of
structure to [INAUDIBLE].

00:30:48.310 --> 00:30:51.160
We have a summary structure and
we have the details down below.

00:30:51.160 --> 00:30:52.900
But the parameters
are way out of whack.

00:30:52.900 --> 00:30:54.506
It's no longer root n, root n.

00:30:54.506 --> 00:30:56.380
Now these guys are super
tiny because we only

00:30:56.380 --> 00:30:59.410
needed this to be a
little bit smaller than n,

00:30:59.410 --> 00:31:03.010
and then this would work
out to linear space.
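
NOTE
Editor's sketch of the indirection step, in Python. The class name BlockRMQ
and its methods are illustrative. The bottom queries here are a plain scan
inside one block, which is only a placeholder: it costs O(log n) time, not
the constant time that step 4 achieves with lookup tables.
```python
class BlockRMQ:
    """Blocks of about (1/2) log n entries plus a sparse table over block minima."""
    def __init__(self, a):
        self.a = a
        n = len(a)
        self.b = max(1, (n.bit_length() - 1) // 2)  # block size, about (1/2) log2 n
        # Top array: for each block, the index in a of that block's minimum.
        tops = [self._scan(s, min(s + self.b, n) - 1) for s in range(0, n, self.b)]
        # Sparse table over the blocks; every entry is an index into a.
        self.table = [tops]
        p = 1
        while (1 << p) <= len(tops):
            prev, half = self.table[-1], 1 << (p - 1)
            self.table.append([self._best(prev[i], prev[i + half])
                               for i in range(len(tops) - (1 << p) + 1)])
            p += 1
    def _best(self, x, y):
        return x if self.a[x] <= self.a[y] else y
    def _scan(self, i, j):
        # Placeholder bottom query: linear scan of a[i..j] inside one block.
        return min(range(i, j + 1), key=self.a.__getitem__)
    def _top(self, bi, bj):
        # Minimum over whole blocks bi..bj via two overlapping power-of-2 intervals.
        p = (bj - bi + 1).bit_length() - 1
        return self._best(self.table[p][bi], self.table[p][bj - (1 << p) + 1])
    def query(self, i, j):
        """Index of the minimum of a[i..j] inclusive."""
        bi, bj = i // self.b, j // self.b
        if bi == bj:
            return self._scan(i, j)                      # both ends in one block
        best = self._scan(i, (bi + 1) * self.b - 1)      # tail of the first block
        if bi + 1 <= bj - 1:
            best = self._best(best, self._top(bi + 1, bj - 1))  # blocks strictly between
        return self._best(best, self._scan(bj * self.b, j))     # head of the last block
```
The top table covers about n / log n blocks with log n levels, so it takes
linear space overall, which is the point of the indirection.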

00:31:03.010 --> 00:31:03.940
OK.

00:31:03.940 --> 00:31:08.176
So step 4 is going to be how do
we solve the bottom structures.

00:31:20.326 --> 00:31:23.310
So step 4.

00:31:23.310 --> 00:31:33.830
This is where we're going to
use the technique of lookup tables

00:31:33.830 --> 00:31:35.550
for bottom groups.

00:31:37.965 --> 00:31:39.840
This is going to be
slightly weird to phrase,

00:31:39.840 --> 00:31:41.756
because on the one hand,
I want to be thinking

00:31:41.756 --> 00:31:44.490
about an individual group,
but my solution is actually

00:31:44.490 --> 00:31:46.260
going to solve all
groups simultaneously,

00:31:46.260 --> 00:31:47.385
and it's kind of important.

00:31:47.385 --> 00:31:51.990
But for now, let's just
think of one group.

00:31:51.990 --> 00:31:58.065
So it has size n prime
and n prime is 1/2 log n.

00:31:58.065 --> 00:31:59.440
I need to remember
how it relates

00:31:59.440 --> 00:32:02.920
to the original value of n so
I know how to pay for things.

00:32:02.920 --> 00:32:06.630
The idea is there's really
not many different problems

00:32:06.630 --> 00:32:08.239
of size 1/2 log n.

00:32:08.239 --> 00:32:10.530
And here's where we're going
to use the fact that we're

00:32:10.530 --> 00:32:14.110
in plus or minus 1 land.

00:32:14.110 --> 00:32:17.970
We have this giant
string of integers.

00:32:17.970 --> 00:32:19.720
Well, now we're looking
at log n of them

00:32:19.720 --> 00:32:25.590
to say OK, this here, this
is a sequence 0, 1, 2, 3.

00:32:25.590 --> 00:32:28.200
Over here a 0, 1, 2, 1.

00:32:28.200 --> 00:32:29.790
There's all these
different things.

00:32:29.790 --> 00:32:31.815
Then there's other
things like 2, 3, 2, 3.

00:32:34.350 --> 00:32:37.000
So there's a couple
annoying things.

00:32:37.000 --> 00:32:40.800
One is it matters what
value you start at,

00:32:40.800 --> 00:32:43.260
and then it matters what the
sequence of plus and minus 1s

00:32:43.260 --> 00:32:45.940
are after that.

00:32:45.940 --> 00:32:46.440
OK.

00:32:46.440 --> 00:32:49.710
I claim it doesn't really
matter what value you start at,

00:32:49.710 --> 00:33:04.110
because RMQ, this query,
is invariant under adding

00:33:04.110 --> 00:33:10.720
some value x to all entries,
all values, in the array.

00:33:10.720 --> 00:33:13.590
Or if I add 100 to every
value, then the minimums

00:33:13.590 --> 00:33:15.690
stay the same in position.

00:33:15.690 --> 00:33:18.220
So again, here I'm thinking
of RMQ as an arg min.

00:33:18.220 --> 00:33:21.600
So it's giving just the
index of where it lives.

00:33:21.600 --> 00:33:27.030
So in particular, I'm going
to add minus the first value

00:33:27.030 --> 00:33:29.370
of the array to all values.

00:33:33.210 --> 00:33:34.790
I should probably call this--

00:33:34.790 --> 00:33:36.720
well, yeah.

00:33:36.720 --> 00:33:39.580
Here I'm just thinking about
a single group for now.

00:33:39.580 --> 00:33:42.570
So in a single group, saying
well, it starts at some value.

00:33:42.570 --> 00:33:44.760
I'm just going to
decrease all these things

00:33:44.760 --> 00:33:46.345
by whatever that value is.

00:33:46.345 --> 00:33:47.970
Now some of them
might become negative,

00:33:47.970 --> 00:33:50.730
but at least now
we start with a 0.

00:33:50.730 --> 00:33:55.180
So what we start
with is irrelevant.

00:33:55.180 --> 00:33:58.020
What remains, the remaining
numbers here are completely

00:33:58.020 --> 00:34:01.770
defined by the gaps
between or the diffs

00:34:01.770 --> 00:34:04.320
between consecutive
items, and the diffs

00:34:04.320 --> 00:34:06.555
are all plus or minus 1.

00:34:06.555 --> 00:34:19.469
So now the number of possible
arrays in a group, so

00:34:19.469 --> 00:34:22.980
in a single group, is
equal to the number

00:34:22.980 --> 00:34:32.070
of plus or minus 1 strings
of length n prime, which is

00:34:32.070 --> 00:34:32.610
1/2 log n.

00:34:37.120 --> 00:34:39.600
And the number of plus or
minus 1 strings of length

00:34:39.600 --> 00:34:42.670
n prime is 2 to the n prime.

00:34:42.670 --> 00:34:48.150
So we get 2 to the 1/2 log n,
also known as square root of n.

00:34:48.150 --> 00:34:49.530
Square root of n is small.

00:34:49.530 --> 00:34:51.219
We're aiming for linear space.

00:34:51.219 --> 00:34:53.730
This means that for every--

00:34:53.730 --> 00:34:56.730
not only for every group,
there is n over log n groups--

00:34:56.730 --> 00:34:59.140
but actually many of the
groups have to be the same.

00:34:59.140 --> 00:35:02.220
There's n over log n groups,
but there's only root n

00:35:02.220 --> 00:35:04.450
different types of groups.

00:35:04.450 --> 00:35:09.090
So on average, like root n
over log n occurrences of each.

00:35:09.090 --> 00:35:12.494
So we can kind of compress
things down and say hey,

00:35:12.494 --> 00:35:14.160
I would like to just
like store a lookup

00:35:14.160 --> 00:35:16.985
table for each one of these, but
that would be quadratic space.

00:35:16.985 --> 00:35:19.360
But there's really only square
root of n different types.

00:35:19.360 --> 00:35:22.410
So if I use a layer of
indirection, I guess--

00:35:22.410 --> 00:35:24.360
different sort of
indirection-- if I just

00:35:24.360 --> 00:35:26.010
have, for each of
these groups, I just

00:35:26.010 --> 00:35:29.040
store a pointer to the
type of group, which

00:35:29.040 --> 00:35:31.980
is what the plus or
minus 1 string is,

00:35:31.980 --> 00:35:34.200
and then for that type,
I store a lookup table

00:35:34.200 --> 00:35:35.730
of all possibilities.

00:35:35.730 --> 00:35:36.900
That will be efficient.

00:35:36.900 --> 00:35:41.200
Let me show that to you.

00:35:41.200 --> 00:35:43.390
This is a very handy idea.

00:35:43.390 --> 00:35:47.960
In general, if you have a lot
of things of size roughly log n,

00:35:47.960 --> 00:35:50.960
lookup tables are a good idea.

00:35:54.520 --> 00:35:56.860
And this naturally arises
when you're using indirection,

00:35:56.860 --> 00:36:00.340
because usually you just
need to shave a log or two.

00:36:00.340 --> 00:36:03.610
So here we have these
different types.

00:36:03.610 --> 00:36:13.510
So what we're going to do
is store a lookup table that

00:36:13.510 --> 00:36:16.810
says for each group
type, I'll just

00:36:16.810 --> 00:36:24.130
say a lookup table
of all answers,

00:36:24.130 --> 00:36:26.770
do that for each group type.

00:36:31.012 --> 00:36:32.970
Group type, meaning the
plus or minus 1 string.

00:36:32.970 --> 00:36:34.630
It's really what
is in that group

00:36:34.630 --> 00:36:36.831
after you do this shifting.

00:36:36.831 --> 00:36:37.330
OK.

00:36:37.330 --> 00:36:39.038
Now there's square
root of n group types.

00:36:41.620 --> 00:36:43.390
What does it take to
store the answers?

00:36:43.390 --> 00:36:50.680
Well, there is, I guess, 1/2
log n squared different queries,

00:36:50.680 --> 00:36:53.157
because n prime is
1/2 log n, and a query

00:36:53.157 --> 00:36:54.490
is defined by the two endpoints.

00:36:54.490 --> 00:36:56.650
So there's at most
this many queries.

00:36:56.650 --> 00:37:00.820
Each query, to store the answer,
is going to take order log log

00:37:00.820 --> 00:37:03.910
n bits-- this is
if you're fancy--

00:37:03.910 --> 00:37:07.890
because the answer is an index
into that array of size 1/2 log

00:37:07.890 --> 00:37:10.840
n, so you need log log n
bits to write down that.

00:37:10.840 --> 00:37:14.110
So the total size
of this lookup table

00:37:14.110 --> 00:37:16.720
is the product of these things.

00:37:16.720 --> 00:37:19.120
We have to write root
n lookup tables.

00:37:19.120 --> 00:37:24.790
Each stores log squared
n different values,

00:37:24.790 --> 00:37:28.310
and the values require
log log n bits.

00:37:28.310 --> 00:37:31.240
So total number of
bits is this thing,

00:37:31.240 --> 00:37:34.390
and this thing is little o of n.

00:37:34.390 --> 00:37:37.480
So smaller than linear,
so it's irrelevant.

00:37:37.480 --> 00:37:39.700
Can store for free.
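
NOTE
Editor's worked version of the size bound just stated, using the lecture's
quantities (group size n' = 1/2 log n, logs base 2):
```latex
\underbrace{2^{\frac{1}{2}\log n}}_{\sqrt{n}\ \text{group types}} \cdot \underbrace{\left(\tfrac{1}{2}\log n\right)^{2}}_{\text{queries per type}} \cdot \underbrace{O(\log\log n)}_{\text{bits per answer}} = O\!\left(\sqrt{n}\,\log^{2} n\,\log\log n\right) = o(n)\ \text{bits}
```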

00:37:39.700 --> 00:37:42.700
Now if we have a bottom group,
the one thing we need to do

00:37:42.700 --> 00:37:45.010
is store a pointer
from that bottom group

00:37:45.010 --> 00:37:50.380
to the corresponding section of
the lookup table for that group

00:37:50.380 --> 00:37:51.976
type.

00:37:51.976 --> 00:38:05.140
So each group stores a
pointer into the lookup table.

00:38:08.290 --> 00:38:11.050
I'm of two minds whether I
think of this as a single lookup

00:38:11.050 --> 00:38:13.960
table that's parameterized
first by group type,

00:38:13.960 --> 00:38:15.444
and then by the query.

00:38:15.444 --> 00:38:17.860
So it's like a two-dimensional
table or three-dimensional,

00:38:17.860 --> 00:38:18.924
depending how you count.

00:38:18.924 --> 00:38:21.340
Or you can think of there being
several lookup tables, one

00:38:21.340 --> 00:38:22.360
for each group type,
and then you're

00:38:22.360 --> 00:38:23.810
pointing to a
single lookup table.

00:38:23.810 --> 00:38:26.050
However you want to think
about it, same thing.

00:38:26.050 --> 00:38:29.260
Same difference, as they say.

00:38:29.260 --> 00:38:31.276
This gives us linear space.

00:38:31.276 --> 00:38:32.650
These pointers
take linear space.

00:38:32.650 --> 00:38:34.733
The top structure takes
linear space, a linear number

00:38:34.733 --> 00:38:38.650
of words, and
constant query time,

00:38:38.650 --> 00:38:40.750
because lookup
tables are very fast.

00:38:40.750 --> 00:38:42.070
Just look into them.

00:38:42.070 --> 00:38:43.360
They give you the answer.

00:38:43.360 --> 00:38:46.820
So you can do a lookup table
here, lookup table here.

00:38:46.820 --> 00:38:50.290
And then over here,
you do the covering

00:38:50.290 --> 00:38:52.990
by two power-of-2 intervals.

00:38:52.990 --> 00:38:55.567
Again, we have a lookup
table for those intervals,

00:38:55.567 --> 00:38:57.400
so it's like we're
looking into four tables,

00:38:57.400 --> 00:39:00.610
take the min of them all, done.
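
NOTE
Editor's sketch of the bottom-group lookup tables, in Python. All function
names are illustrative. One difference from the lecture, flagged as an
assumption: tables are built only for group types that actually occur,
rather than for all square root of n types, and the type key also records
the block length so a shorter last block is handled.
```python
def block_type(block):
    """Encode a plus-or-minus-1 block as a bitmask: bit t is 1 when step t is +1."""
    sig = 0
    for t in range(len(block) - 1):
        step = block[t + 1] - block[t]
        assert step in (1, -1), "not a plus-or-minus-1 array"
        if step == 1:
            sig |= 1 << t
    return sig
def answers_for_type(sig, size):
    """ans[i][j] = offset of the minimum over positions i..j for this group type."""
    vals = [0]  # shift the block to start at 0; the arg min does not change
    for t in range(size - 1):
        vals.append(vals[-1] + (1 if (sig >> t) & 1 else -1))
    ans = [[0] * size for _ in range(size)]
    for i in range(size):
        best = i
        for j in range(i, size):
            if vals[j] < vals[best]:
                best = j
            ans[i][j] = best
    return ans
def build_bottoms(a, size):
    """Return (types, tables): one type key per block, one shared table per type."""
    tables, types = {}, []
    for s in range(0, len(a), size):
        block = a[s:s + size]
        key = (block_type(block), len(block))
        if key not in tables:
            tables[key] = answers_for_type(*key)
        types.append(key)
    return types, tables
def bottom_query(types, tables, size, i, j):
    """Index in a of the minimum of a[i..j], where i and j lie in the same block."""
    blk = i // size
    return blk * size + tables[types[blk]][i - blk * size][j - blk * size]
```
Blocks that differ only by a constant shift get the same key, so they share
one table; each block stores just the key, which plays the role of the pointer.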

00:39:00.610 --> 00:39:04.180
That is RMQ, and also LCA.

00:39:04.180 --> 00:39:07.690
Actually it was really LCA that
we solved, because we solved

00:39:07.690 --> 00:39:09.940
plus or minus 1
RMQ, which solved

00:39:09.940 --> 00:39:15.910
LCA, but by the
Cartesian tree reduction,

00:39:15.910 --> 00:39:16.820
that also solves RMQ.

00:39:19.570 --> 00:39:22.870
Now we solved 2 out
of 3 of our problems.

00:39:22.870 --> 00:39:23.620
Any questions?

00:39:26.750 --> 00:39:31.490
Level ancestors are going to
be harder, little bit harder.

00:39:31.490 --> 00:39:33.350
Similar number of steps.

00:39:33.350 --> 00:39:35.330
I'd say they're a
little more clever.

00:39:35.330 --> 00:39:36.830
This I feel is pretty easy.

00:39:36.830 --> 00:39:39.560
Very simple style of
indirection, very simple style

00:39:39.560 --> 00:39:40.979
of enumeration here.

00:39:40.979 --> 00:39:43.520
It's going to be a little more
sophisticated and a little bit

00:39:43.520 --> 00:39:48.110
more representative of
the general case for level

00:39:48.110 --> 00:39:48.760
ancestors.

00:39:52.349 --> 00:39:53.140
Definitely fancier.

00:39:56.620 --> 00:40:01.420
Level ancestors is a similar
story. It was solved a while ago,

00:40:01.420 --> 00:40:03.170
but it was kind of a
complicated solution.

00:40:03.170 --> 00:40:05.680
And then Bender
and Farach-Colton

00:40:05.680 --> 00:40:08.770
found it and said hey,
we can simplify this.

00:40:08.770 --> 00:40:12.700
And I'm going to give you
the simplified version.

00:40:12.700 --> 00:40:15.640
So this is level ancestors.

00:40:15.640 --> 00:40:17.860
Says originally solved
by Berkman and Vishkin

00:40:17.860 --> 00:40:21.010
in 1994, OK, not so long ago.

00:40:21.010 --> 00:40:25.690
And then the new
version is from 2004.

00:40:25.690 --> 00:40:26.800
Ready?

00:40:26.800 --> 00:40:27.580
Level ancestors.

00:40:27.580 --> 00:40:29.710
What was the problem again?

00:40:29.710 --> 00:40:30.580
Here it is.

00:40:30.580 --> 00:40:34.150
I give you a rooted
tree, give you a node,

00:40:34.150 --> 00:40:39.620
and a level that I want to go
up, and then I level up by k,

00:40:39.620 --> 00:40:45.470
so I go to the kth ancestor,
or parent to the k.

00:40:45.470 --> 00:40:47.650
This may seem
superficially like LCA,

00:40:47.650 --> 00:40:50.470
but it's very different,
because as you can see,

00:40:50.470 --> 00:40:52.540
RMQ was very specific to LCA.

00:40:52.540 --> 00:40:56.200
It's not going to let you solve
level ancestors in any sense.

00:40:56.200 --> 00:40:57.049
I don't think.

00:40:57.049 --> 00:40:59.340
Maybe you could try to do
the Cartesian tree reduction,

00:40:59.340 --> 00:41:03.520
but the solution we'll see
is completely different,

00:41:03.520 --> 00:41:05.860
although similar in spirit.

00:41:05.860 --> 00:41:09.564
So step 1.

00:41:09.564 --> 00:41:11.230
This one's going to
be a little bit less

00:41:11.230 --> 00:41:12.980
obvious that we will succeed.

00:41:12.980 --> 00:41:15.370
Here we started with
n log n space which

00:41:15.370 --> 00:41:17.217
is shaving a log, no big deal.

00:41:17.217 --> 00:41:19.300
Here, I'm going to give
you a couple of strategies

00:41:19.300 --> 00:41:22.690
that aren't even constant time,
they're log time or worse.

00:41:22.690 --> 00:41:25.951
And yet you combine them
and you get constant time.

00:41:25.951 --> 00:41:26.770
It's crazy.

00:41:31.490 --> 00:41:33.880
Again, each of the
pieces is going

00:41:33.880 --> 00:41:42.274
to be pretty intuitive,
not super surprising,

00:41:42.274 --> 00:41:43.690
but it's one of
these things where

00:41:43.690 --> 00:41:46.570
you take all these ingredients
that are all kind of obvious,

00:41:46.570 --> 00:41:49.209
you stare at them for a while
like, oh, I put them together

00:41:49.209 --> 00:41:49.750
and it works.

00:41:49.750 --> 00:41:51.490
It's like magic.

00:41:51.490 --> 00:41:54.820
All right, so first goal is
going to be n log n space,

00:41:54.820 --> 00:41:56.726
log n query.

00:41:56.726 --> 00:41:59.350
So here's a way to do it with a
technique called jump pointers.

00:42:07.510 --> 00:42:11.290
In this case, nodes are going to
have log n different pointers,

00:42:11.290 --> 00:42:12.790
and they're going
to point to the 2

00:42:12.790 --> 00:42:16.900
to the ith ancestor for all i.

00:42:20.281 --> 00:42:22.870
I guess maximum possible
i would be log n.

00:42:22.870 --> 00:42:26.230
You can never go up more than n.

00:42:26.230 --> 00:42:29.150
So I mean, ideally you'd have
a pointer to all your ancestors

00:42:29.150 --> 00:42:30.584
in array, boom.

00:42:30.584 --> 00:42:32.500
In the quadratic space,
you solve your problem

00:42:32.500 --> 00:42:33.970
in constant time.

00:42:33.970 --> 00:42:35.470
But it's a little
more interesting.

00:42:35.470 --> 00:42:39.430
Now every node only has pointers
to log n different places

00:42:39.430 --> 00:42:44.212
so it's looking like this.

00:42:44.212 --> 00:42:47.950
This is the ancestor path.

00:42:47.950 --> 00:42:50.380
So n log n space, and
I claim with this,

00:42:50.380 --> 00:42:54.410
you can roughly do a binary
search, if you wanted to.

00:42:54.410 --> 00:42:56.920
Now we're not actually going
to use this query algorithm

00:42:56.920 --> 00:42:59.930
for anything, but I'll
write it down just

00:42:59.930 --> 00:43:02.200
so it feels like we've
accomplished something, mainly

00:43:02.200 --> 00:43:04.030
log n query time.

00:43:04.030 --> 00:43:07.390
So what do I do?

00:43:07.390 --> 00:43:15.100
I set x to be the 2 to the
floor log kth ancestor of x.

00:43:17.810 --> 00:43:24.760
OK, remember we're given
a node x and a value

00:43:24.760 --> 00:43:26.530
k that we want to rise by.

00:43:26.530 --> 00:43:29.430
So I take the power
of 2 just below k--

00:43:29.430 --> 00:43:31.150
that's 2 to the floor log k.

00:43:31.150 --> 00:43:33.580
I go up that much,
and that's my new x,

00:43:33.580 --> 00:43:38.560
and then I set k to
be k minus that value.

00:43:38.560 --> 00:43:41.300
That's how much I
have left to go.

00:43:41.300 --> 00:43:41.800
OK.

00:43:41.800 --> 00:43:45.430
This thing will be
less than k over 2.

00:43:45.430 --> 00:43:48.550
Because the next previous
power of 2 is at least,

00:43:48.550 --> 00:43:50.710
is bigger than
half of the thing.

00:43:50.710 --> 00:43:52.690
So we got more
than halfway there,

00:43:52.690 --> 00:43:55.950
and so after log n iterations,
we'll actually get there.

00:43:55.950 --> 00:43:58.250
That's pretty easy.
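
NOTE
Editor's sketch of jump pointers, in Python. The tree is assumed to be
given as a parent array with None for the root; that representation and the
function names are illustrative, not from the lecture.
```python
def build_jump_pointers(parent):
    """jump[i][v] = the 2**i-th ancestor of v, or None if v is not that deep."""
    n = len(parent)
    jump = [list(parent)]  # level 0 is the ordinary parent pointer
    i = 1
    while (1 << i) < n:
        prev = jump[-1]
        jump.append([None if prev[v] is None else prev[prev[v]] for v in range(n)])
        i += 1
    return jump
def level_ancestor(jump, x, k):
    """k-th ancestor of x (k = 0 gives x itself), or None if it does not exist."""
    while k > 0 and x is not None:
        p = k.bit_length() - 1  # floor(log2 k): largest power of 2 not above k
        if p >= len(jump):
            return None         # k is larger than any possible depth
        x = jump[p][x]
        k -= 1 << p             # the remainder is less than half the old k
    return x
```
Each node stores about log n pointers, so space is n log n, and the loop
runs at most about log k times because k at least halves each round.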

00:43:58.250 --> 00:44:04.120
That's jump pointers. Two
logs that we need to get rid of,

00:44:04.120 --> 00:44:06.040
and yes, we will use
indirection, but not yet.

00:44:14.590 --> 00:44:16.950
First, we need some
more ingredients.

00:44:21.270 --> 00:44:22.920
This next ingredient
is kind of funny,

00:44:22.920 --> 00:44:25.180
because it will seem useless.

00:44:25.180 --> 00:44:30.000
But in fact, it is useful as
a step towards ingredient 3.

00:44:30.000 --> 00:44:33.450
So the next trick is called
long path decomposition.

00:44:40.170 --> 00:44:42.960
In general, this class covers
a lot of different tree

00:44:42.960 --> 00:44:45.030
decompositions.

00:44:45.030 --> 00:44:49.160
We did preferred path
decomposition for tango trees.

00:44:49.160 --> 00:44:50.700
We're going to do long path now.

00:44:50.700 --> 00:44:52.800
We'll do another one
called heavy path later.

00:44:52.800 --> 00:44:54.270
There's a lot of them out there.

00:44:54.270 --> 00:44:56.730
This one won't seem
very useful at first,

00:44:56.730 --> 00:45:00.420
because while it will
achieve linear space,

00:45:00.420 --> 00:45:04.650
it will achieve the amazing
square root of n query, which

00:45:04.650 --> 00:45:06.270
I guess is new.

00:45:06.270 --> 00:45:10.860
I mean, we don't know how to
do that yet with linear space.

00:45:10.860 --> 00:45:12.240
Not so obvious
how to get root n.

00:45:12.240 --> 00:45:16.680
But anyway, don't worry
about the query time.

00:45:16.680 --> 00:45:18.960
It's more the concept of
long path that's interesting.

00:45:18.960 --> 00:45:21.404
It's a step in the
right direction.

00:45:21.404 --> 00:45:23.820
So here's how we're
going to decompose a tree.

00:45:23.820 --> 00:45:30.570
First thing we do is find the
longest root-to-leaf path

00:45:30.570 --> 00:45:36.940
in the tree, because
if you look at a tree,

00:45:36.940 --> 00:45:39.690
it has some wavy bottom.

00:45:39.690 --> 00:45:41.400
Take the deepest node.

00:45:41.400 --> 00:45:44.870
Take the path, the unique path
from the root to that node.

00:45:44.870 --> 00:45:45.630
OK.

00:45:45.630 --> 00:45:49.410
When I do that, I could
imagine deleting those nodes.

00:45:49.410 --> 00:45:51.180
I mean, there's
that path, and then

00:45:51.180 --> 00:45:52.890
there's everything
else, which means

00:45:52.890 --> 00:45:56.730
there's all these triangles
hanging off of that path, some

00:45:56.730 --> 00:46:00.400
on the left, some on the right.

00:46:00.400 --> 00:46:04.440
Actually, I haven't
talked about this,

00:46:04.440 --> 00:46:10.620
but both LCA and level ancestors
work not just for binary trees.

00:46:10.620 --> 00:46:13.000
They work for arbitrary trees.

00:46:13.000 --> 00:46:15.660
And somewhere along here--

00:46:15.660 --> 00:46:17.760
yeah, here.

00:46:17.760 --> 00:46:20.250
This reduction of
using the Euler tour

00:46:20.250 --> 00:46:22.464
works for non-binary trees, too.

00:46:22.464 --> 00:46:23.880
That's actually
another reason why

00:46:23.880 --> 00:46:27.330
this reduction is better than
in-order traversal by itself.

00:46:27.330 --> 00:46:30.630
In-order traversal works
only for binary trees.

00:46:30.630 --> 00:46:32.190
This thing works for any tree.

00:46:32.190 --> 00:46:33.900
In that case, in
an arbitrary tree,

00:46:33.900 --> 00:46:36.300
you visit the node many,
many times potentially.

00:46:36.300 --> 00:46:37.980
OK, but it will
still be linear space

00:46:37.980 --> 00:46:39.930
and everything will still work.

00:46:39.930 --> 00:46:42.210
Here also, I want to
handle non-binary trees.

00:46:42.210 --> 00:46:44.250
So I'm going to draw
things hanging off,

00:46:44.250 --> 00:46:46.500
but in fact, there might be
several things hanging off

00:46:46.500 --> 00:46:49.200
here, each their
own little tree.

00:46:49.200 --> 00:46:50.910
OK, but the point is--

00:46:50.910 --> 00:46:51.750
where's my red.

00:46:54.690 --> 00:46:56.300
Here.

00:46:56.300 --> 00:46:59.850
There was this one path in the
beginning, the longest path,

00:46:59.850 --> 00:47:01.680
and then there's stuff
hanging off of it.

00:47:01.680 --> 00:47:05.760
So just recurse on all the
things hanging off of it.

00:47:05.760 --> 00:47:08.610
Recursively decompose
those sub-trees.

00:47:28.032 --> 00:47:29.062
OK.

00:47:29.062 --> 00:47:30.770
Not clear what this
is going to give you.

00:47:30.770 --> 00:47:32.478
In fact, it's not
going to be so awesome,

00:47:32.478 --> 00:47:35.140
but it will be a starting point.

00:47:35.140 --> 00:47:40.440
Now you can answer a query
with this, as follows.

00:47:40.440 --> 00:47:43.400
Query-- oh, sorry.

00:47:43.400 --> 00:47:46.340
I should say how we're
actually storing these paths.

00:47:46.340 --> 00:47:50.260
Here's the cool idea
with this path thing.

00:47:50.260 --> 00:47:52.130
I have this path.

00:47:52.130 --> 00:47:54.770
I'd like to be able to
jump around at least--

00:47:54.770 --> 00:47:56.500
suppose your tree was a path.

00:47:56.500 --> 00:47:58.160
Suppose your tree were a path.

00:47:58.160 --> 00:47:59.930
Then what would you want to do?

00:47:59.930 --> 00:48:03.482
Store the nodes in an
array ordered by depth,

00:48:03.482 --> 00:48:04.940
because then if
you're at position i

00:48:04.940 --> 00:48:07.820
and you need to go to
position i minus k, boom.

00:48:07.820 --> 00:48:09.990
That's just a lookup
into your array.

00:48:09.990 --> 00:48:17.610
So I'm going to store
each path as an array,

00:48:17.610 --> 00:48:27.770
as an array of nodes or
node pointers, I guess,

00:48:27.770 --> 00:48:30.080
ordered by depth.

00:48:30.080 --> 00:48:35.420
So if it happens, so if my query
value x is somewhere on this

00:48:35.420 --> 00:48:40.430
path, and if this path
encompasses where I need

00:48:40.430 --> 00:48:44.240
to go-- so if I need to go
k up and I end up here--

00:48:44.240 --> 00:48:45.890
then that's instantaneous.

00:48:45.890 --> 00:48:48.270
The trouble would be
if I have a query,

00:48:48.270 --> 00:48:50.530
let's say, over here.

00:48:50.530 --> 00:48:55.190
And so there's going to be
a path that guy lives on,

00:48:55.190 --> 00:48:57.540
but maybe the kth ancestor
is not on that path.

00:48:57.540 --> 00:48:59.794
It could be on a higher up path.

00:48:59.794 --> 00:49:01.460
It could be on the
red path, and I can't

00:49:01.460 --> 00:49:03.710
jump there instantaneously.

00:49:03.710 --> 00:49:07.170
Nonetheless, there is a
decent query algorithm here.

00:49:07.170 --> 00:49:07.670
All right.

00:49:11.230 --> 00:49:17.900
So here's what
we're going to do.

00:49:17.900 --> 00:49:29.970
If k is less than or equal
to the index i of node

00:49:29.970 --> 00:49:32.300
x on its path.

00:49:37.580 --> 00:49:40.130
So every node belongs
to exactly one path.

00:49:40.130 --> 00:49:41.840
This is a path decomposition.

00:49:41.840 --> 00:49:44.954
It's a partition of
the tree into paths.

00:49:44.954 --> 00:49:46.370
Not all the edges
are represented,

00:49:46.370 --> 00:49:48.980
but all the nodes are there.

00:49:48.980 --> 00:49:53.020
All the nodes
belong to some path,

00:49:53.020 --> 00:49:56.000
and we're going to store,
for every node, store

00:49:56.000 --> 00:49:59.780
what its index is and where
it lives in its array.

00:49:59.780 --> 00:50:02.330
So look at that
index in the array.

00:50:02.330 --> 00:50:05.330
If k is less than or
equal to that index,

00:50:05.330 --> 00:50:08.810
then we can solve
our problem instantly

00:50:08.810 --> 00:50:15.590
by looking at the path
array at position i minus k.

00:50:15.590 --> 00:50:17.930
That's what I said before.

00:50:17.930 --> 00:50:20.210
If our kth ancestor
is within the path,

00:50:20.210 --> 00:50:22.190
then that's where it
will be, and that's

00:50:22.190 --> 00:50:25.260
going to work as long
as that is non-negative.

00:50:25.260 --> 00:50:28.380
If I get to negative, that
means it's another path.

00:50:28.380 --> 00:50:30.590
So that's the good case.

00:50:30.590 --> 00:50:34.970
The other case is
we're just going to do

00:50:34.970 --> 00:50:37.224
some recursion, essentially.

00:50:41.100 --> 00:50:43.460
So we're going to go as high
as we can with this path.

00:50:43.460 --> 00:50:47.420
We're going to look at
path array at position 0.

00:50:47.420 --> 00:50:48.534
Go to the parent of that.

00:50:48.534 --> 00:50:50.450
Let's suppose every node
has a parent pointer.

00:50:50.450 --> 00:50:59.940
That's easy, regular tree, and
then decrease k by 1 plus i.

00:50:59.940 --> 00:51:02.570
So the array let us
jump up i steps--

00:51:02.570 --> 00:51:06.130
that's this part--
and then the parent

00:51:06.130 --> 00:51:07.460
stepped us up one more step.

00:51:07.460 --> 00:51:10.910
That's just to get to
the next path above us.
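
NOTE A minimal sketch of the long-path query just described, not the lecture's own code. It assumes the tree is given as a parent array with parent[0] == -1 for the root and parent[v] < v otherwise; the names build_long_paths, kth_ancestor and deep_child are hypothetical.
```python
def build_long_paths(parent):
    # Assumed input: parent[0] == -1 is the root and parent[v] < v otherwise.
    n = len(parent)
    height = [0] * n                  # longest downward path from v, in edges
    deep_child = [-1] * n             # child that continues v's long path
    for v in range(n - 1, 0, -1):     # children are processed before parents
        p = parent[v]
        if height[v] + 1 > height[p]:
            height[p], deep_child[p] = height[v] + 1, v
    paths, path_of, index_of = [], [0] * n, [0] * n
    for v in range(n):
        if v == 0 or deep_child[parent[v]] != v:   # v is the top of a path
            arr, node = [], v
            while node != -1:
                path_of[node], index_of[node] = len(paths), len(arr)
                arr.append(node)
                node = deep_child[node]
            paths.append(arr)         # node pointers ordered by depth, top first
    return paths, path_of, index_of
def kth_ancestor(parent, paths, path_of, index_of, x, k):
    while True:
        arr, i = paths[path_of[x]], index_of[x]
        if k <= i:                    # good case: the answer is on x's own path
            return arr[i - k]
        top = arr[0]
        if parent[top] == -1:
            return None               # k is larger than the depth of x
        x, k = parent[top], k - (i + 1)   # i steps via the array, 1 via the parent
```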

00:51:10.910 --> 00:51:13.300
OK, so how much did
this decrease k by?

00:51:13.300 --> 00:51:16.520
I'd like to say a factor of
2 and get log n, but in fact,

00:51:16.520 --> 00:51:18.842
no, it's not very good.

00:51:18.842 --> 00:51:20.300
It doesn't decrease
k by very much.

00:51:20.300 --> 00:51:23.030
It does decrease k,
guaranteed by at least 1,

00:51:23.030 --> 00:51:25.700
so it's definitely linear time.

00:51:25.700 --> 00:51:29.165
And there's a bad
tree, which is this.

00:51:35.100 --> 00:51:36.710
It's like a grid.

00:51:36.710 --> 00:51:37.230
Whoa.

00:51:37.230 --> 00:51:37.730
Sorry.

00:51:41.104 --> 00:51:42.680
OK, here's a tree.

00:51:42.680 --> 00:51:44.820
It's a binary tree.

00:51:44.820 --> 00:51:47.091
And if you set it up right,
this is the longest path.

00:51:47.091 --> 00:51:49.340
And then when you decompose,
this is the longest path,

00:51:49.340 --> 00:51:51.631
and this is the longest path,
this is the longest path.

00:51:51.631 --> 00:51:53.544
If you query here,
you'll walk up to here,

00:51:53.544 --> 00:51:55.460
and then walk up to here,
and walk up to here,

00:51:55.460 --> 00:51:56.293
and walk up to here.

00:51:56.293 --> 00:52:00.510
So this is a square root of n
lower bound for this algorithm.

00:52:00.510 --> 00:52:03.410
So not a good algorithm
yet, but the makings

00:52:03.410 --> 00:52:04.744
of a good algorithm.

00:52:19.570 --> 00:52:25.826
Makings of step 3, which is
called ladder decomposition.

00:52:32.587 --> 00:52:34.670
Ladder decomposition is
something I haven't really

00:52:34.670 --> 00:52:36.515
seen anywhere else.

00:52:36.515 --> 00:52:38.390
I think it comes from
the parallel algorithms

00:52:38.390 --> 00:52:39.620
world in general.

00:52:43.830 --> 00:52:53.760
And now we're going to achieve
linear space log n query.

00:52:53.760 --> 00:52:55.650
Now this is an improvement.

00:52:55.650 --> 00:52:58.070
So we have, at the
moment, n log n space,

00:52:58.070 --> 00:53:02.105
log n query or n
space root n query.

00:53:02.105 --> 00:53:05.120
We're basically taking
the min of the two.

00:53:05.120 --> 00:53:08.990
And so we're getting
linear space log n query.

00:53:08.990 --> 00:53:10.170
Still not perfect.

00:53:10.170 --> 00:53:11.810
We want constant query.

00:53:11.810 --> 00:53:15.140
That's when we'll use
indirection, I think.

00:53:15.140 --> 00:53:20.060
Yeah, basically, a new type
of indirection, but OK.

00:53:20.060 --> 00:53:22.950
So linear space log n query.

00:53:22.950 --> 00:53:28.040
Well, the idea is just
to fix long paths,

00:53:28.040 --> 00:53:29.690
and it's a crazy idea, OK.

00:53:29.690 --> 00:53:31.970
Let me tell you the
idea and then it's

00:53:31.970 --> 00:53:34.310
like, why would that be useful.

00:53:34.310 --> 00:53:37.220
But it's obvious that
it doesn't hurt you, OK.

00:53:37.220 --> 00:53:40.970
When we have these paths,
sometimes they're long.

00:53:40.970 --> 00:53:43.310
Sometimes they're
not long enough.

00:53:43.310 --> 00:53:44.990
Just take each of
these paths and extend

00:53:44.990 --> 00:53:48.900
them upwards by a factor of 2.

00:53:48.900 --> 00:53:51.110
That's the idea.

00:53:51.110 --> 00:54:00.430
So take number 2, extend
each path upward by 2x.

00:54:00.430 --> 00:54:03.760
So that gives us what we call a ladder.

00:54:06.270 --> 00:54:07.160
OK, what happens?

00:54:07.160 --> 00:54:10.810
Well, paths are
going to overlap.

00:54:10.810 --> 00:54:13.000
Fine.

00:54:13.000 --> 00:54:13.860
Ladders overlap.

00:54:13.860 --> 00:54:15.730
The original paths
don't overlap.

00:54:15.730 --> 00:54:16.510
Ladders overlap.

00:54:16.510 --> 00:54:18.580
I don't really care
if they overlap.

00:54:18.580 --> 00:54:20.650
How much space is there?

00:54:20.650 --> 00:54:23.350
It's still linear space, because
I'm just doubling everything.

00:54:23.350 --> 00:54:27.027
So I've at most doubled space
relative to long path

00:54:27.027 --> 00:54:27.610
decomposition.

00:54:27.610 --> 00:54:30.010
I didn't mention it explicitly,
but long path decomposition

00:54:30.010 --> 00:54:30.676
is linear space.

00:54:30.676 --> 00:54:34.630
We're just partitioning up
the tree into little pieces.

00:54:34.630 --> 00:54:36.070
Doesn't take much.

00:54:36.070 --> 00:54:39.100
We have to store those
arrays, but every node

00:54:39.100 --> 00:54:40.990
appears in exactly
one cell here.

00:54:40.990 --> 00:54:42.820
Now every node will
appear in, on average,

00:54:42.820 --> 00:54:45.100
two cells in some weird way.
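
NOTE A minimal sketch of turning long paths into ladders, to make the space bound concrete. It assumes the paths are already computed as top-first node lists and that the tree is a parent array with -1 above the root; build_ladders is a hypothetical name, not from the lecture.
```python
def build_ladders(parent, paths):
    # paths: disjoint top-first node lists covering the tree (assumed precomputed).
    ladders, ladder_of, index_of = [], {}, {}
    for path in paths:
        ext, node = [], parent[path[0]]
        while node != -1 and len(ext) < len(path):   # climb at most len(path) nodes
            ext.append(node)
            node = parent[node]
        ext.reverse()                                # keep the array top first
        for j, v in enumerate(path):                 # v is owned by this ladder's lower half
            ladder_of[v], index_of[v] = len(ladders), len(ext) + j
        ladders.append(ext + path)
    return ladders, ladder_of, index_of
```
Each ladder is at most twice as long as its path, so the total size stays within 2n cells.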

00:54:45.100 --> 00:54:46.570
Like what happens over here?

00:54:46.570 --> 00:54:48.850
I have no idea.

00:54:48.850 --> 00:54:50.590
So this guy's length 1.

00:54:50.590 --> 00:54:52.060
It's going to grow to length 2.

00:54:52.060 --> 00:54:55.750
This one's length 2, so
now it'll grow to length 4.

00:54:55.750 --> 00:54:56.625
This one's length 3--

00:54:56.625 --> 00:54:57.958
and it depends on how you count.

00:54:57.958 --> 00:54:59.170
I'm counting nodes here.

00:54:59.170 --> 00:55:03.010
That's going to go here,
all the way to the top.

00:55:03.010 --> 00:55:04.840
Interesting.

00:55:04.840 --> 00:55:06.470
All the others
will go to the top.

00:55:06.470 --> 00:55:08.945
So if I'm here, I walk here.

00:55:08.945 --> 00:55:10.570
Then I can jump all
the way to the top.

00:55:10.570 --> 00:55:13.210
Then I can jump all
the way to the root.

00:55:13.210 --> 00:55:17.000
Not totally obvious, but it
actually will be log n steps.

00:55:17.000 --> 00:55:18.215
Let's prove that.

00:55:18.215 --> 00:55:19.840
This is again something
we don't really

00:55:19.840 --> 00:55:21.490
need to know for
the final solution,

00:55:21.490 --> 00:55:25.060
but kind of nice, kind of
comforting to know that we've

00:55:25.060 --> 00:55:27.190
gotten down to a log n query.

00:55:27.190 --> 00:55:29.715
So it's at most
double the space.

00:55:29.715 --> 00:55:30.590
This is still linear.

00:55:34.030 --> 00:55:37.165
Now-- oh, there's one catch.

00:55:40.750 --> 00:55:45.660
Over in this world,
we said each--

00:55:45.660 --> 00:55:47.650
I didn't say it.

00:55:47.650 --> 00:55:49.000
I mentioned it out loud.

00:55:49.000 --> 00:55:52.330
Every node stores what
array it lives in.

00:55:52.330 --> 00:55:55.720
Now a node lives in
multiple arrays, OK.

00:55:55.720 --> 00:55:58.360
So which one do I
store a pointer to?

00:55:58.360 --> 00:56:02.890
Well, there's one obvious
one to store a pointer to.

00:56:02.890 --> 00:56:05.380
Whatever node you take
lives in one path.

00:56:05.380 --> 00:56:08.230
In that long path decomposition,
it still lives in one path.

00:56:08.230 --> 00:56:11.660
Store a pointer
into that ladder.

00:56:11.660 --> 00:56:20.560
So node stores a pointer you
could say to the ladder that

00:56:20.560 --> 00:56:22.780
contains it in the lower half.

00:56:27.550 --> 00:56:31.150
That corresponds to the one
where it was an actual path.

00:56:31.150 --> 00:56:35.260
And only one ladder will contain
a node in its lower half.

00:56:35.260 --> 00:56:37.100
The upper half
was the extension.

00:56:37.100 --> 00:56:40.560
I guess it's like those
folding ladders you extend.

00:56:40.560 --> 00:56:42.260
OK.

00:56:42.260 --> 00:56:42.760
Cool.

00:56:42.760 --> 00:56:44.718
So that's what we're
going to do and also store

00:56:44.718 --> 00:56:47.210
its index in the array.

00:56:47.210 --> 00:56:50.230
Now we can do exactly this
query algorithm again,

00:56:50.230 --> 00:56:52.570
except now instead of
path, it says ladder.

00:56:52.570 --> 00:56:55.900
So you look at the index
of the node in its ladder.

00:56:55.900 --> 00:56:58.900
If that index is
at least k, then

00:56:58.900 --> 00:57:02.200
boom, that ladder array will
tell you exactly where to go.

00:57:02.200 --> 00:57:04.480
Otherwise you go to
the top of the ladder

00:57:04.480 --> 00:57:06.160
and then you take
the parent pointer,

00:57:06.160 --> 00:57:07.510
and you decrease by this.

00:57:07.510 --> 00:57:11.116
But now I claim that
decrease will be substantial.

00:57:11.116 --> 00:57:11.615
Why?

00:57:20.470 --> 00:57:22.900
If I have a node of height h--

00:57:25.570 --> 00:57:28.100
remember, height of a node
is the length of the longest

00:57:28.100 --> 00:57:29.620
path from there downward--

00:57:32.300 --> 00:57:44.301
it will be on a ladder
of height at least 2h.

00:57:44.301 --> 00:57:44.800
Why?

00:57:44.800 --> 00:57:47.122
Because if you look at
a node of height h--

00:57:47.122 --> 00:57:48.580
like say, I don't
know, this node--

00:57:51.220 --> 00:57:53.964
the longest path from
there is substantial.

00:57:53.964 --> 00:57:56.380
I mean, if it's height h, then
the longest path from there

00:57:56.380 --> 00:57:57.850
is length at least h.

00:57:57.850 --> 00:58:00.850
So every node of height h will
be on a path of length at least

00:58:00.850 --> 00:58:04.010
h, and from there down.

00:58:04.010 --> 00:58:05.260
And so you look at the ladder.

00:58:05.260 --> 00:58:06.920
Well, that's going
to be double that.

00:58:06.920 --> 00:58:09.670
So the ladder will be
height at least 2h,

00:58:09.670 --> 00:58:13.030
which means if your
query starts at height h,

00:58:13.030 --> 00:58:16.480
after you do one step
of this ladder search,

00:58:16.480 --> 00:58:19.750
you will get to height at least
2h, and then 4h, and then 8h.

00:58:19.750 --> 00:58:23.020
You're increasing your height by
a power of 2, by a factor of 2

00:58:23.020 --> 00:58:24.140
every time.

00:58:24.140 --> 00:58:29.000
So in log n steps, you will
get to wherever you need to go.

00:58:29.000 --> 00:58:31.160
OK. You don't have to
worry about overshooting,

00:58:31.160 --> 00:58:33.530
because that's the case
when the array tells you

00:58:33.530 --> 00:58:35.800
exactly where to go.
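
NOTE The doubling argument can be written as a short derivation. This is a sketch that counts heights in nodes, as the lecture does, with h_t (a symbol introduced here) denoting the height reached after t ladder steps. It assumes no ladder was cut short by the root; a ladder that reaches the root answers the query directly from its array.
```latex
h_0 \ge 1, \qquad h_{t+1} \ge 2\,h_t \;\Longrightarrow\; h_t \ge 2^t, \qquad h_t \le n \;\Longrightarrow\; t \le \log_2 n .
```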

00:58:35.800 --> 00:58:36.300
OK.

00:58:38.930 --> 00:58:42.080
Time for the climax.

00:58:42.080 --> 00:58:43.806
It won't be the end,
but it's the climax

00:58:43.806 --> 00:58:44.930
in the middle of the story.

00:58:44.930 --> 00:58:47.900
So we have on the one
hand, jump pointers.

00:58:47.900 --> 00:58:48.890
Remember those?

00:58:48.890 --> 00:58:55.170
Jump pointers made small
steps initially and got--

00:58:55.170 --> 00:58:56.840
actually, no.

00:58:56.840 --> 00:58:59.065
This is what it looks like
for the data structure.

00:58:59.065 --> 00:59:00.440
But if you look
at the algorithm,

00:59:00.440 --> 00:59:02.460
actually it makes a big
step in the beginning.

00:59:02.460 --> 00:59:02.960
Right?

00:59:02.960 --> 00:59:04.744
It gets more than halfway there.

00:59:04.744 --> 00:59:06.410
Then it makes smaller
and smaller steps,

00:59:06.410 --> 00:59:07.890
exponentially decreasing steps.

00:59:07.890 --> 00:59:12.990
Finally, it arrives
at the intended node.

00:59:12.990 --> 00:59:16.004
Ladder decomposition
is doing the reverse.

00:59:16.004 --> 00:59:17.420
If you start at
low height, you're

00:59:17.420 --> 00:59:19.490
going to make very small
steps in the beginning.

00:59:19.490 --> 00:59:20.906
As your height
gets bigger, you're

00:59:20.906 --> 00:59:22.790
going to be making
bigger and bigger steps.

00:59:22.790 --> 00:59:25.880
And then when you jump over your
node, you found it instantly.

00:59:25.880 --> 00:59:29.340
So it's kind of the
opposite of jump pointers.

00:59:29.340 --> 00:59:32.630
So what we're going to
do is take jump pointers

00:59:32.630 --> 00:59:35.959
and add them to
ladder decomposition.

00:59:52.096 --> 00:59:53.580
Huh.

00:59:53.580 --> 00:59:56.170
This is, I guess, version 4.

00:59:56.170 --> 01:00:08.700
Combine jump pointers from
one and ladders from three.

01:00:08.700 --> 01:00:09.420
Forget about two.

01:00:09.420 --> 01:00:11.940
Two is just a warm up for three.

01:00:11.940 --> 01:00:15.210
Long paths define ladders.

01:00:15.210 --> 01:00:17.850
So we've got one way
to do log n query.

01:00:17.850 --> 01:00:20.250
We've got another way
to do log n query.

01:00:20.250 --> 01:00:25.610
I combine them, and
I get constant query.

01:00:25.610 --> 01:00:27.260
Because log n plus
log n equals 1.

01:00:27.260 --> 01:00:27.830
I don't know.

01:00:32.820 --> 01:00:34.760
OK, here's the idea.

01:00:34.760 --> 01:00:38.030
On the one hand, jump pointers
make a big step and then

01:00:38.030 --> 01:00:40.030
smaller steps, right.

01:00:40.030 --> 01:00:41.460
Yeah, like that.

01:00:41.460 --> 01:00:46.340
And on the other hand,
ladders make small steps.

01:00:46.340 --> 01:00:47.090
It's hard to draw.

01:00:51.400 --> 01:00:59.380
What I'd like to do is take
this step and this step.

01:00:59.380 --> 01:01:02.110
That would be good,
because only two of them.

01:01:02.110 --> 01:01:13.180
So query is going to do
one jump, plus 1 ladder,

01:01:13.180 --> 01:01:15.410
in that order.

01:01:15.410 --> 01:01:17.652
See, the thing about
ladders is it's

01:01:17.652 --> 01:01:20.110
really slow in the beginning,
because your height is small.

01:01:20.110 --> 01:01:23.170
I really want to
get large height.

01:01:23.170 --> 01:01:24.730
Jump pointers give
you large height.

01:01:24.730 --> 01:01:28.720
The very first step, you get
half the height you need.

01:01:28.720 --> 01:01:30.600
That's it.

01:01:30.600 --> 01:01:38.770
So when we do a jump, we do
one step of the jump algorithm.

01:01:38.770 --> 01:01:39.430
What do we do?

01:01:39.430 --> 01:01:49.040
We reach height at
least k over 2 above x.

01:01:49.040 --> 01:01:51.650
All right, we get halfway there.

01:01:51.650 --> 01:01:53.310
So our height-- it's a little--

01:01:53.310 --> 01:01:56.360
let's say x has height h.

01:01:56.360 --> 01:01:58.960
OK, so then we get to
height-- this is saying we

01:01:58.960 --> 01:02:03.370
get to height h plus k over 2.

01:02:03.370 --> 01:02:04.180
OK, that's good.

01:02:04.180 --> 01:02:06.190
This is a big height.

01:02:06.190 --> 01:02:11.410
Halfway there, I mean, halfway
of the remainder after h.

01:02:11.410 --> 01:02:15.620
Now ladders double your
height in every step.

01:02:15.620 --> 01:02:20.540
So ladder step-- so
this is the jump step.

01:02:20.540 --> 01:02:24.940
If you do one ladder step, you
will reach height double that.

01:02:24.940 --> 01:02:28.725
So it's at least
2h plus k, which

01:02:28.725 --> 01:02:29.900
is bigger than what we need.

01:02:29.900 --> 01:02:31.067
We need h plus k.

01:02:31.067 --> 01:02:32.400
That's where we're trying to go.

01:02:32.400 --> 01:02:34.964
And so we're done.

01:02:34.964 --> 01:02:35.630
Isn't that cool?

01:02:40.960 --> 01:02:44.750
So the annoying part is
there's this extra part here.

01:02:44.750 --> 01:02:48.110
This is the h part and
we start at some level.

01:02:48.110 --> 01:02:49.080
We don't know where.

01:02:49.080 --> 01:02:49.580
This is x.

01:02:49.580 --> 01:02:51.620
The worst case is maybe
when it's very small,

01:02:51.620 --> 01:02:56.260
but whatever it is, we do this
step and this is our target up

01:02:56.260 --> 01:02:57.080
here.

01:02:57.080 --> 01:03:00.110
This is height h plus k.

01:03:00.110 --> 01:03:02.210
In one step, we get
more than halfway

01:03:02.210 --> 01:03:05.150
there with the jump pointer.

01:03:05.150 --> 01:03:07.530
And then the ladder will
carry us the rest of the way.

01:03:07.530 --> 01:03:08.990
Because this is the ladder.

01:03:08.990 --> 01:03:13.310
We basically go horizontally
to fall on this ladder,

01:03:13.310 --> 01:03:15.470
and it will cover
us beyond where

01:03:15.470 --> 01:03:17.500
we need to go, beyond
our wildest imaginations.

01:03:17.500 --> 01:03:19.669
So this is k over 2.

01:03:19.669 --> 01:03:21.210
Because not only
will it double this,

01:03:21.210 --> 01:03:22.668
which is what we
need to double, it

01:03:22.668 --> 01:03:25.820
will also double whatever
is down here, this h part.

01:03:25.820 --> 01:03:28.320
So it gets us way beyond
where we need to go.

01:03:28.320 --> 01:03:29.250
I mean, h could be 0.

01:03:29.250 --> 01:03:31.208
Then it gets us to exactly
where we need to go.

01:03:33.472 --> 01:03:35.180
But then the ladder
tells us where to go.

01:03:35.180 --> 01:03:37.931
So two steps constant time.
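
NOTE A minimal sketch of the combined query: one jump pointer, then one ladder lookup. It stores jump pointers at every node, so it uses n log n space, as at this point in the lecture. It assumes a parent array with parent[0] == -1 and parent[v] < v; build, level_ancestor and deep_child are hypothetical names.
```python
def build(parent):
    # Assumed input: parent[0] == -1 is the root and parent[v] < v otherwise.
    n = len(parent)
    depth, height, deep_child = [0] * n, [0] * n, [-1] * n
    for v in range(1, n):
        depth[v] = depth[parent[v]] + 1
    for v in range(n - 1, 0, -1):
        p = parent[v]
        if height[v] + 1 > height[p]:
            height[p], deep_child[p] = height[v] + 1, v
    jump = [[] for _ in range(n)]     # jump[v][j] = ancestor 2**j levels above v
    for v in range(1, n):
        jump[v].append(parent[v])
        j = 0
        while len(jump[jump[v][j]]) > j:
            jump[v].append(jump[jump[v][j]][j])
            j += 1
    ladders, ladder_of, index_of = [], [0] * n, [0] * n
    for v in range(n):
        if v == 0 or deep_child[parent[v]] != v:         # v is the top of a long path
            path, node = [], v
            while node != -1:
                path.append(node)
                node = deep_child[node]
            ext, node = [], parent[v]
            while node != -1 and len(ext) < len(path):   # extend upward, at most doubling
                ext.append(node)
                node = parent[node]
            for j, u in enumerate(path):                 # u lives in this ladder's lower half
                ladder_of[u], index_of[u] = len(ladders), len(ext) + j
            ladders.append(ext[::-1] + path)
    return depth, jump, ladders, ladder_of, index_of
def level_ancestor(structure, x, k):
    depth, jump, ladders, ladder_of, index_of = structure
    if k > depth[x]:
        return None                   # no such ancestor
    if k == 0:
        return x
    j = k.bit_length() - 1            # 2**j is the largest power of two <= k
    y = jump[x][j]                    # one jump: more than halfway up
    rest = k - (1 << j)               # remaining distance, less than 2**j
    return ladders[ladder_of[y]][index_of[y] - rest]     # one ladder lookup
```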

01:03:41.300 --> 01:03:45.440
Now one annoying thing is
we're not done with space.

01:03:45.440 --> 01:03:47.420
So this is the anticlimax part.

01:03:47.420 --> 01:03:49.320
It's still going to
be pretty interesting.

01:03:49.320 --> 01:03:51.456
We've got to shave off
a log factor in space,

01:03:51.456 --> 01:03:52.580
but hey, we're experienced.

01:03:52.580 --> 01:03:54.121
We already did that once today.

01:03:54.121 --> 01:03:54.620
Question?

01:03:54.620 --> 01:03:55.120
Yeah.

01:03:55.120 --> 01:03:57.145
Why is it OK to go
past your target?

01:04:01.664 --> 01:04:03.830
The question was why is it
OK to go past our target?

01:04:03.830 --> 01:04:05.788
Jump pointers aren't
allowed to, because they only

01:04:05.788 --> 01:04:06.720
know how to go up.

01:04:06.720 --> 01:04:07.740
They can't overshoot.

01:04:07.740 --> 01:04:10.640
That's why they went less than
halfway, or more than halfway,

01:04:10.640 --> 01:04:12.020
but less than the full way.

01:04:12.020 --> 01:04:16.580
Ladder decomposition can go
beyond, because as soon as--

01:04:16.580 --> 01:04:20.450
the point is, as soon as--
here's you, x, and here's

01:04:20.450 --> 01:04:21.770
your kth ancestor.

01:04:21.770 --> 01:04:22.810
This is the answer.

01:04:22.810 --> 01:04:24.560
As soon as you're in
a common ladder, then

01:04:24.560 --> 01:04:26.190
the array tells you where to go.

01:04:26.190 --> 01:04:30.099
So even though the top
of the ladder overshot,

01:04:30.099 --> 01:04:31.640
there will be a
ladder connecting you

01:04:31.640 --> 01:04:32.723
to that top of the ladder.

01:04:32.723 --> 01:04:36.020
So as long as it's somewhere
in between, it's free.

01:04:36.020 --> 01:04:39.755
Yeah, so that's why it's OK
this goes potentially too high.

01:04:39.755 --> 01:04:41.630
So it's good for ladders,
not good for jumps,

01:04:41.630 --> 01:04:46.160
but that's exactly where
we have it. Other questions?

01:04:46.160 --> 01:04:46.823
Yeah.

01:04:46.823 --> 01:04:50.204
AUDIENCE: [INAUDIBLE]
jump pointers,

01:04:50.204 --> 01:04:52.458
wouldn't you be high
up enough in the tree

01:04:52.458 --> 01:04:54.950
so that just the
long path would work?

01:04:54.950 --> 01:04:56.450
PROFESSOR: Oh,
interesting question.

01:04:56.450 --> 01:04:59.450
So would it be enough to do
jump pointers plus long path?

01:04:59.450 --> 01:05:02.030
My guess is no.

01:05:02.030 --> 01:05:04.250
Jump pointers get you up to--

01:05:04.250 --> 01:05:05.750
so think of the
case where h is 0.

01:05:05.750 --> 01:05:08.060
Initially you're at height 0.

01:05:08.060 --> 01:05:09.970
I think that's going
to be a problem.

01:05:09.970 --> 01:05:15.470
You jump up to height k
over 2 with a jump pointer.

01:05:15.470 --> 01:05:17.270
Now long path
decomposition, you know

01:05:17.270 --> 01:05:21.170
that the path will have a
length at least k over 2,

01:05:21.170 --> 01:05:22.740
but you need to get up to k.

01:05:22.740 --> 01:05:25.010
And so you may get stuck
in this kind of situation

01:05:25.010 --> 01:05:27.540
where maybe you're
trying to get to the root

01:05:27.540 --> 01:05:31.220
and you jumped to here,
but then you have to walk.

01:05:31.220 --> 01:05:33.140
So I think the long
path's not enough.

01:05:33.140 --> 01:05:35.532
You need that factor of 2,
which the ladders give you.

01:05:35.532 --> 01:05:37.490
You can see where ladders
come from now, right?

01:05:37.490 --> 01:05:40.391
I mean we got up
to height k over 2.

01:05:40.391 --> 01:05:41.640
Now we just need to double it.

01:05:41.640 --> 01:05:44.000
Hey, we can afford
to double every path,

01:05:44.000 --> 01:05:45.385
but I think we need to.

01:05:45.385 --> 01:05:48.690
Are there questions?

01:05:48.690 --> 01:05:50.300
OK.

01:05:50.300 --> 01:05:56.000
So last thing to do is to shave
off this log factor of space.

01:05:56.000 --> 01:05:58.560
Now, we're going to do that
with indirection, of course,

01:05:58.560 --> 01:06:01.040
constant time and n log n space.

01:06:01.040 --> 01:06:04.450
But it's not our usual
type of indirection.

01:06:08.750 --> 01:06:10.540
Use this board.

01:06:10.540 --> 01:06:11.260
Indirections.

01:06:11.260 --> 01:06:17.080
So last time we did indirection,
it was with an array.

01:06:17.080 --> 01:06:19.330
And actually pretty much
every indirection we've done,

01:06:19.330 --> 01:06:21.160
it's been with an
array-like thing.

01:06:21.160 --> 01:06:24.040
We could decompose into
groups of size log n,

01:06:24.040 --> 01:06:26.010
the top thing was n over log n.

01:06:26.010 --> 01:06:28.180
So it was kind of clean.

01:06:28.180 --> 01:06:32.200
This structure is not so
clean, because it's a tree.

01:06:32.200 --> 01:06:34.450
How do you decompose a
tree into little things

01:06:34.450 --> 01:06:36.700
at the bottom of size
log n and a top thing

01:06:36.700 --> 01:06:38.680
of size n over log n?

01:06:38.680 --> 01:06:43.510
Suppose, for example,
your tree is a path.

01:06:46.840 --> 01:06:49.420
Bad news.

01:06:49.420 --> 01:06:55.960
If my tree were a path,
well, I could trim off

01:06:55.960 --> 01:06:57.340
bottom thing of size log n.

01:06:57.340 --> 01:07:01.750
But now the rest is of size
n minus log n, not n divided

01:07:01.750 --> 01:07:02.260
by log n.

01:07:02.260 --> 01:07:03.700
That's bad.

01:07:03.700 --> 01:07:06.630
I need to shave a factor of
log n, not an additive log n.

01:07:09.591 --> 01:07:11.340
Can you tell me a good
thing about a path?

01:07:14.300 --> 01:07:17.150
I mean, obviously, one
we can put in an array.

01:07:17.150 --> 01:07:19.070
But can you quantify
the goodness,

01:07:19.070 --> 01:07:21.320
or the path-likeness of a tree?

01:07:24.984 --> 01:07:26.250
I erase this board.

01:07:33.169 --> 01:07:34.210
Kind of a vague question.

01:07:41.730 --> 01:07:43.980
Good thing about
a path is that it

01:07:43.980 --> 01:07:46.050
doesn't have very many leaves.

01:07:46.050 --> 01:07:48.220
That's one way to
quantify pathedness.

01:07:48.220 --> 01:07:54.290
Small number of leaves, I
claim life's not so bad.

01:07:54.290 --> 01:07:59.388
I actually need to do that
before we get to indirection.

01:08:16.010 --> 01:08:20.990
Step 5 is let's tune
jump pointers a bit.

01:08:24.479 --> 01:08:25.550
I want to make them--

01:08:28.141 --> 01:08:29.390
so they're the problem, right?

01:08:29.390 --> 01:08:31.330
That's where we
get n log n space.

01:08:31.330 --> 01:08:34.370
They're the only source
of our n log n space.

01:08:34.370 --> 01:08:39.170
So what I'd like to do is
in this situation where

01:08:39.170 --> 01:08:41.000
the number of leaves is small--

01:08:41.000 --> 01:08:42.740
we'll see what small
is in a moment--

01:08:42.740 --> 01:08:46.809
I would like jump pointers
to be linear size.

01:08:49.779 --> 01:08:51.420
OK, here's the idea.

01:08:56.250 --> 01:09:00.224
First idea is let's just store
jump pointers from leaves.

01:09:09.290 --> 01:09:10.020
OK.

01:09:10.020 --> 01:09:16.529
So that would imply
l log n space,

01:09:16.529 --> 01:09:18.640
I guess, plus linear overall.

01:09:22.948 --> 01:09:25.890
Instead of n log n, now we
just pay for the leaves,

01:09:25.890 --> 01:09:27.689
except we kind of
messed up our query.

01:09:27.689 --> 01:09:31.024
First thing query did was at the
node, follow the jump pointer.

01:09:31.024 --> 01:09:33.870
But it's not so bad.

01:09:33.870 --> 01:09:35.279
Here we are at x.

01:09:35.279 --> 01:09:39.120
There's some leaves
down here, and we want

01:09:39.120 --> 01:09:41.670
to jump up from here, from x.

01:09:41.670 --> 01:09:43.470
How do I jump from x?

01:09:43.470 --> 01:09:45.990
Well, if I could somehow
go from x to really,

01:09:45.990 --> 01:09:50.250
any leaf, the ancestors
of x that I care about

01:09:50.250 --> 01:09:53.010
are also ancestors of
any leaf descendant of x.

01:09:53.010 --> 01:09:57.420
So all I need to do
is store for each node

01:09:57.420 --> 01:10:01.930
any leaf descendant,
single pointer--

01:10:01.930 --> 01:10:09.260
this'll be linear--
from every node.

01:10:12.740 --> 01:10:15.510
OK so I start at x.

01:10:15.510 --> 01:10:18.670
I jump down to an arbitrary
leaf, say this one.

01:10:18.670 --> 01:10:22.950
And now I have to do a query.

01:10:25.800 --> 01:10:33.540
Jump down, and let's
say I jumped down by d.

01:10:33.540 --> 01:10:40.770
Then my k becomes
k plus d, right.

01:10:40.770 --> 01:10:42.960
If I went down by d,
and I want to go up

01:10:42.960 --> 01:10:46.440
by k from my original point,
now I have to go up by k plus d.

01:10:46.440 --> 01:10:50.010
But hey, we know how to
go up from any node that

01:10:50.010 --> 01:10:51.010
has jump pointers.

01:10:51.010 --> 01:10:55.800
So now we have a
new node, a leaf.

01:10:55.800 --> 01:11:01.020
So it has a jump pointer,
has jump pointers, upward.

01:11:01.020 --> 01:11:04.890
So we follow that one jump
pointer to get us halfway there

01:11:04.890 --> 01:11:06.660
from our new starting point.

01:11:06.660 --> 01:11:08.340
We follow one
ladder thing, and we

01:11:08.340 --> 01:11:13.530
can get to the level ancestor
k plus d from the leaf,

01:11:13.530 --> 01:11:16.270
and that's the level
ancestor k from x.

01:11:16.270 --> 01:11:19.069
OK, this is like a reduction
to the leaf situation.

01:11:19.069 --> 01:11:21.610
We really don't have to support
queries from arbitrary nodes.

01:11:21.610 --> 01:11:24.030
Just go down to a leaf
and then solve the problem

01:11:24.030 --> 01:11:25.840
from the leaf.
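
NOTE A minimal sketch of the reduction to leaves: each node stores one arbitrary leaf descendant, and a query (x, k) becomes (leaf, k + d). It assumes a parent array with parent[0] == -1 and parent[v] < v; leaf_descendants and reduce_query are hypothetical names.
```python
def leaf_descendants(parent):
    # Assumed input: parent[0] == -1 is the root and parent[v] < v otherwise.
    n = len(parent)
    depth = [0] * n
    for v in range(1, n):
        depth[v] = depth[parent[v]] + 1
    leaf = list(range(n))             # a leaf is its own leaf descendant
    for v in range(n - 1, 0, -1):     # leaf[v] is final before it is copied upward
        leaf[parent[v]] = leaf[v]     # any child's leaf works; the last write wins
    return leaf, depth
def reduce_query(leaf, depth, x, k):
    d = depth[leaf[x]] - depth[x]     # how far we jumped down
    return leaf[x], k + d             # same answer, now asked from a leaf
```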

01:11:25.840 --> 01:11:26.340
OK.

01:11:29.350 --> 01:11:31.840
OK, so now, if the
number of leaves is small,

01:11:31.840 --> 01:11:32.880
my space will get small.

01:11:32.880 --> 01:11:34.540
How small does l have to be?

01:11:34.540 --> 01:11:36.310
n divided by log n.

01:11:36.310 --> 01:11:39.180
Interesting.

01:11:39.180 --> 01:11:43.260
Getting the top structure
down to n over log n nodes,

01:11:43.260 --> 01:11:44.250
that's not possible.

01:11:44.250 --> 01:11:47.070
I can, at best, get to
n minus log n nodes.

01:11:47.070 --> 01:11:50.700
But if I could get it down
to n over log n leaves, that

01:11:50.700 --> 01:11:52.500
would be enough to
make this linear space,

01:11:52.500 --> 01:11:54.930
and indeed, I can.

01:11:54.930 --> 01:11:59.340
This is a technique called tree
trimming, or I call it that.

01:11:59.340 --> 01:12:01.540
I don't know if
anyone else does.

01:12:01.540 --> 01:12:04.542
But I think I've called
it that in enough papers

01:12:04.542 --> 01:12:06.000
that we're allowed
to call it that.

01:12:13.420 --> 01:12:15.670
Originally invented by
Alstrup and others

01:12:15.670 --> 01:12:18.620
for a particular data structure.

01:12:18.620 --> 01:12:19.820
There's many versions of it.

01:12:19.820 --> 01:12:21.830
We will see other versions
in future lectures,

01:12:21.830 --> 01:12:29.340
but here's the version
you need for this problem.

01:12:47.920 --> 01:12:50.470
OK, here's the plan.

01:12:50.470 --> 01:12:59.820
I have a tree and
I want to identify

01:12:59.820 --> 01:13:04.890
all the maximally deep nodes
that have at least 1/4 log n

01:13:04.890 --> 01:13:06.505
nodes below them.

01:13:06.505 --> 01:13:08.130
This will seem weird,
because we really

01:13:08.130 --> 01:13:09.670
care about leaves, and so on.

01:13:09.670 --> 01:13:15.240
So there's stuff hanging
off here, whatever.

01:13:15.240 --> 01:13:18.220
I guess I'm thinking of
that as one big tree.

01:13:18.220 --> 01:13:21.180
No, actually I'm not.

01:13:21.180 --> 01:13:22.770
I do need to separate these out.

01:13:25.470 --> 01:13:28.230
But one of these nodes could
have arbitrarily many children.

01:13:28.230 --> 01:13:29.920
We have no idea.

01:13:29.920 --> 01:13:31.250
It's an arbitrary tree.

01:13:34.740 --> 01:13:38.310
OK, and what I know is that
each of these triangles

01:13:38.310 --> 01:13:42.750
has size less than 1/4 log n.

01:13:42.750 --> 01:13:46.020
Because otherwise, this
node was not maximally deep.

01:13:46.020 --> 01:13:53.919
So if this had size greater
than or equal to 1/4 log n,

01:13:53.919 --> 01:13:56.460
then that would have been the
node where I cut, not this one.

01:13:56.460 --> 01:13:58.920
So I'm circling the
nodes that I cut below,

01:13:58.920 --> 01:14:00.240
so meaning I cut these edges.

01:14:03.050 --> 01:14:05.830
OK, so these things have
size less than 1/4 log n,

01:14:05.830 --> 01:14:11.880
but these nodes have at least
1/4 log n nodes below them.

01:14:11.880 --> 01:14:15.510
So how many of these
circle nodes are there?

01:14:15.510 --> 01:14:30.120
Well, at most, 4 n over
log n such nodes, right,

01:14:30.120 --> 01:14:33.300
because I can charge
this node to at least 1/4

01:14:33.300 --> 01:14:36.750
log n nodes that disappear
in the top structure.

01:14:40.010 --> 01:14:43.650
But these things become
the leaves, right.

01:14:43.650 --> 01:14:45.930
If I cut all the edges
going down from there,

01:14:45.930 --> 01:14:47.880
that makes it a leaf.

01:14:47.880 --> 01:14:50.720
And they're the only leaves.

01:14:50.720 --> 01:14:52.090
Are they the only leaves?

01:14:52.090 --> 01:14:53.730
Yeah.

01:14:53.730 --> 01:14:57.000
If you look at a leaf, then it
has size less than 1/4 log n.

01:14:57.000 --> 01:14:59.050
So you will cut
above it somewhere.

01:14:59.050 --> 01:15:01.540
So every old leaf
will be down here,

01:15:01.540 --> 01:15:05.270
and the only new leaves
will be the cut nodes.

01:15:05.270 --> 01:15:05.820
OK.

01:15:05.820 --> 01:15:12.330
So we have order n
over log n leaves.
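
NOTE A minimal sketch of the trimming step, to check the leaf count on small inputs. It assumes "nodes below" counts the node itself, uses log base 2, and clamps the threshold to 1 for tiny trees; the function name trim and the parent-array input (parent[0] == -1, parent[v] < v) are assumptions, not from the lecture.
```python
import math
def trim(parent):
    # Assumed input: parent[0] == -1 is the root and parent[v] < v otherwise.
    n = len(parent)
    t = max(1.0, 0.25 * math.log2(n))     # threshold (1/4) lg n, clamped for tiny n
    size = [1] * n                        # subtree size, counting the node itself
    for v in range(n - 1, 0, -1):
        size[parent[v]] += size[v]
    big_child = [False] * n               # does v have a child with a big subtree?
    for v in range(1, n):
        if size[v] >= t:
            big_child[parent[v]] = True
    # Maximally deep big nodes: big themselves, with no big child.
    cut = [v for v in range(n) if size[v] >= t and not big_child[v]]
    return cut, t
```
The subtrees of the cut nodes are disjoint and each holds at least t nodes, so there are at most n / t of them.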

01:15:12.330 --> 01:15:14.057
Yes, good.

01:15:14.057 --> 01:15:14.640
So it's funny.

01:15:14.640 --> 01:15:17.070
We're cutting according
to counting nodes,

01:15:17.070 --> 01:15:18.660
descendants, not leaves.

01:15:18.660 --> 01:15:20.870
Won't work if you
cut with leaves--

01:15:20.870 --> 01:15:21.630
cut with nodes.

01:15:21.630 --> 01:15:24.171
But then the thing that we care
about is the number of leaves

01:15:24.171 --> 01:15:25.660
went down.

01:15:25.660 --> 01:15:28.530
That will be enough.

01:15:28.530 --> 01:15:29.880
Great.

01:15:29.880 --> 01:15:38.100
So up here, we can afford to
use 5, the tuned jump pointer,

01:15:38.100 --> 01:15:41.730
combined with ladder structure.

01:15:41.730 --> 01:15:45.310
Because this only costs l log n.

01:15:45.310 --> 01:15:48.930
l is now n over log n,
so the log n's cancel.

01:15:48.930 --> 01:15:51.720
So linear space to
store the jump pointers

01:15:51.720 --> 01:15:53.910
from these circled nodes.

01:15:53.910 --> 01:15:56.370
So if our query is
anywhere up here,

01:15:56.370 --> 01:15:58.840
then we go to a descendant
leaf in the top structure.

01:15:58.840 --> 01:16:00.420
And we can go wherever
we need to go.

01:16:03.300 --> 01:16:05.506
If our query is in one
of the little trees

01:16:05.506 --> 01:16:08.130
at the bottom, which are small,
they're only 1/4 log n,

01:16:08.130 --> 01:16:10.770
so we're going to
use a lookup table.

01:16:10.770 --> 01:16:12.845
Either the answer is
inside the triangle,

01:16:12.845 --> 01:16:15.510
in which case, we really
need to query that structure.

01:16:15.510 --> 01:16:18.300
Or it's up here.

01:16:18.300 --> 01:16:21.690
If it's up here, we just
need, basically,

01:16:21.690 --> 01:16:25.660
every node down here to store
a pointer to the dot above it.

01:16:25.660 --> 01:16:28.070
Then we can first go there
and see, is that too high?

01:16:28.070 --> 01:16:30.210
If it's too high, then
our answer is in here.

01:16:30.210 --> 01:16:31.680
If it's not too
high, then we just

01:16:31.680 --> 01:16:34.330
do the corresponding
query in structure 5.

01:16:34.330 --> 01:16:36.630
OK, so the last
remaining thing is

01:16:36.630 --> 01:16:40.240
to solve a query that stays
entirely within a triangle, so

01:16:40.240 --> 01:16:45.555
a bottom structure, and that's
where we use lookup tables.

01:16:56.740 --> 01:17:00.026
Again, things are going
to be similar to last time

01:17:00.026 --> 01:17:01.600
except for now, to step 7.

01:17:04.780 --> 01:17:08.410
But it's a little bit messier
because instead of arrays,

01:17:08.410 --> 01:17:09.760
we have trees.

01:17:09.760 --> 01:17:13.870
And here it's like we graduate
from baby [INAUDIBLE] which is

01:17:13.870 --> 01:17:16.090
how many plus or minus
1 strings there are--

01:17:16.090 --> 01:17:20.050
power of 2-- to how
many trees are there.

01:17:20.050 --> 01:17:23.890
Anyone know how many trees
on n nodes there are?

01:17:23.890 --> 01:17:24.700
One word answer.

01:17:28.220 --> 01:17:29.436
No.

01:17:29.436 --> 01:17:30.660
Nice.

01:17:30.660 --> 01:17:32.710
That is a correct
one word answer.

01:17:32.710 --> 01:17:34.150
Very good.

01:17:34.150 --> 01:17:38.620
Not the one I had in
mind, but anyone else?

01:17:54.630 --> 01:17:55.339
Nope.

01:17:55.339 --> 01:17:56.630
You're thinking n to the n.

01:17:56.630 --> 01:17:58.010
That would be bad.

01:17:58.010 --> 01:18:00.200
We could not afford that,
because log n to the log n

01:18:00.200 --> 01:18:02.101
is superpolynomial.

01:18:02.101 --> 01:18:03.350
Fortunately it's not that big.

01:18:03.350 --> 01:18:03.850
Hmm?

01:18:03.850 --> 01:18:04.592
AUDIENCE:

01:18:04.592 --> 01:18:06.050
PROFESSOR: It's
roughly 4 to the n.

01:18:06.050 --> 01:18:07.966
The correct answer-- I
mean the exact answer--

01:18:07.966 --> 01:18:11.637
is called the Catalan number,
which didn't tell you much.

01:18:11.637 --> 01:18:13.220
I didn't write it
down, but I'm pretty

01:18:13.220 --> 01:18:21.770
sure it is 2 n prime choose
n prime, times 1 over n prime plus 1

01:18:21.770 --> 01:18:24.380
ish?

01:18:24.380 --> 01:18:25.370
Don't quote me on that.

01:18:25.370 --> 01:18:27.380
It's roughly that.

01:18:27.380 --> 01:18:28.370
Might be exactly that.

01:18:28.370 --> 01:18:30.970
Someone with internet can check.

01:18:30.970 --> 01:18:33.460
But it is at most
4 to the n prime.

01:18:33.460 --> 01:18:35.210
The computer science
answer is 4 to the n.

01:18:35.210 --> 01:18:37.460
Indeed.
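
NOTE A quick sanity check of the formula and the bound, as a sketch rather than anything from the lecture. The function name catalan is an assumption. The n-th Catalan number, (2n choose n) / (n + 1), counts rooted ordered trees with n edges, and the loop checks the cruder 4 to the n bound for small n.
```python
from math import comb
def catalan(n):
    """n-th Catalan number, (2n choose n) / (n + 1), using exact integer arithmetic."""
    return comb(2 * n, n) // (n + 1)
# The bound used in the lecture: the Catalan number never exceeds 4^n.
for n in range(40):
    assert catalan(n) <= 4 ** n
print([catalan(n) for n in range(6)])  # [1, 1, 2, 5, 14, 42]
```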

01:18:37.460 --> 01:18:39.260
It's just some asymptotics here.

01:18:39.260 --> 01:18:40.220
Why is it 4 to the n?

01:18:40.220 --> 01:18:42.440
4 to the n you could also
write as 2 to the 2 n

01:18:42.440 --> 01:18:44.754
prime, which is--

01:18:44.754 --> 01:18:46.670
first, let's check this
is good, and then I'll

01:18:46.670 --> 01:18:50.060
explain why this is true
in a computer science way.

01:18:50.060 --> 01:18:51.930
So we got 1/4 log n up here.

01:18:51.930 --> 01:18:55.910
So the one 2 cancels
with one 2 up here.

01:18:55.910 --> 01:18:57.850
So we have 2 to the 1/2 log n.

01:18:57.850 --> 01:19:00.095
This is our good friend root n.

01:19:00.095 --> 01:19:02.750
Root n is just something
that's n to the something,

01:19:02.750 --> 01:19:05.700
but is n to the
something less than 1.

01:19:05.700 --> 01:19:07.175
So we can afford
some log factors.

01:19:10.370 --> 01:19:13.970
Why are there only 2
to the 2 n prime trees?

01:19:13.970 --> 01:19:17.630
One way to see that is you can
encode a tree using 2n bits.

01:19:17.630 --> 01:19:20.360
If I have an n node tree, I
can encode it with 2n bits.

01:19:20.360 --> 01:19:21.710
How?

01:19:21.710 --> 01:19:23.720
Do an Euler tour.

01:19:23.720 --> 01:19:26.660
And all you really need
to know from an Euler tour

01:19:26.660 --> 01:19:28.970
to reconstruct the
tree is at each step,

01:19:28.970 --> 01:19:30.179
did I go down or did I go up?

01:19:30.179 --> 01:19:31.719
Those are the only
things you can do.

01:19:31.719 --> 01:19:33.530
If you went down,
it's to a new child.

01:19:33.530 --> 01:19:35.850
If you went up,
it's to an old node.

01:19:35.850 --> 01:19:38.562
So if I told you
a sequence of bits

01:19:38.562 --> 01:19:40.520
for every step in the
Euler tour, did I go down

01:19:40.520 --> 01:19:43.950
or did I go up, you can
reconstruct the tree.

01:19:43.950 --> 01:19:45.450
Now how many bits
do I need?

01:19:45.450 --> 01:19:48.080
Well, twice the number
of edges in the tree,

01:19:48.080 --> 01:19:49.610
because the length
of an Euler tour

01:19:49.610 --> 01:19:51.318
is twice the number
of edges in the tree.

01:19:51.318 --> 01:19:53.810
So 2 n bits are enough
to encode any tree.

01:19:53.810 --> 01:19:55.790
That's the computer
science information

01:19:55.790 --> 01:19:57.560
theoretic way to prove it.
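
NOTE A minimal sketch of the Euler tour encoding just described, assuming the tree is given as hypothetical child lists indexed by node. The names encode and decode are illustrative. One bit per step of the tour: 1 for going down to a new child, 0 for going back up.
```python
def encode(children, root=0):
    """Return the down/up bits of an Euler tour (1 = down to a new child, 0 = up)."""
    bits = []
    stack = [(root, iter(children[root]))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, None)
        if child is None:
            # No children left: go back up, unless we are finishing at the root.
            stack.pop()
            if stack:
                bits.append(0)
        else:
            bits.append(1)
            stack.append((child, iter(children[child])))
    return bits
def decode(bits):
    """Rebuild child lists from the bits, numbering nodes in order of first visit."""
    children = [[]]
    path = [0]  # nodes on the current root-to-node path
    for bit in bits:
        if bit:
            new_node = len(children)
            children.append([])
            children[path[-1]].append(new_node)
            path.append(new_node)
        else:
            path.pop()
    return children
tree = [[1, 4], [2, 3], [], [], []]
bits = encode(tree)
print(bits)  # [1, 1, 0, 1, 0, 0, 1, 0]
```
The tour uses two bits per edge, so an n-node tree needs 2(n - 1) bits, which is under the 2n used in the counting argument.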

01:19:57.560 --> 01:19:59.190
You could also do it
from this formula,

01:19:59.190 --> 01:20:01.440
but then you'd have to know
why the formula's correct,

01:20:01.440 --> 01:20:03.760
and that's messier.

01:20:03.760 --> 01:20:06.150
Cool.

01:20:06.150 --> 01:20:07.890
So we're almost done.

01:20:07.890 --> 01:20:12.480
We have root n possible
different structures down here.

01:20:12.480 --> 01:20:14.610
We've got n over
log n of them or--

01:20:14.610 --> 01:20:15.132
maybe.

01:20:15.132 --> 01:20:17.340
It's a little harder to know
exactly how many of them

01:20:17.340 --> 01:20:19.440
there are, but I don't care.

01:20:19.440 --> 01:20:21.310
There's only root
n different types,

01:20:21.310 --> 01:20:24.490
and so I only need to store
a lookup table for each type.

01:20:24.490 --> 01:20:32.250
The number of queries is
order log squared n again,

01:20:32.250 --> 01:20:36.960
because our structures
are of size order log n,

01:20:36.960 --> 01:20:39.470
and the answer to
a query is again,

01:20:39.470 --> 01:20:44.620
order log log n bits,
because there's only log

01:20:44.620 --> 01:20:47.730
n different nodes to point to.

01:20:47.730 --> 01:20:58.260
And so the total space is
order root n times log squared n

01:20:58.260 --> 01:21:01.650
times log log n for the lookup table.

01:21:01.650 --> 01:21:05.800
And then each of these
triangles stores a pointer,

01:21:05.800 --> 01:21:08.010
or I guess, every node
in here stores a pointer

01:21:08.010 --> 01:21:14.490
to what tree we're in, or
what type of tree we have,

01:21:14.490 --> 01:21:17.250
and also what node in
that tree we are in.

01:21:17.250 --> 01:21:19.340
So every guy in here--

01:21:19.340 --> 01:21:20.940
because that's not
part of the query--

01:21:20.940 --> 01:21:23.970
has to store a little
bit more specific pointer

01:21:23.970 --> 01:21:25.260
into this table.

01:21:25.260 --> 01:21:27.870
It actually tells you
what the query part is,

01:21:27.870 --> 01:21:30.390
or the first part of
the query, the node x.

01:21:30.390 --> 01:21:33.780
Then the table also
is parameterized by k,

01:21:33.780 --> 01:21:37.350
so one of these logs is
which node you're querying.

01:21:37.350 --> 01:21:39.300
The other log is
now the value k,

01:21:39.300 --> 01:21:41.676
but again, you never go
up higher than log n.

01:21:41.676 --> 01:21:43.050
If you went up
higher than log n,

01:21:43.050 --> 01:21:44.592
then you'd be in
structure 5,

01:21:44.592 --> 01:21:46.050
so if you just do
a query up there,

01:21:46.050 --> 01:21:48.430
you don't need a
query in the bottom.

01:21:48.430 --> 01:21:48.990
OK.

01:21:48.990 --> 01:21:50.760
So there's only
that many queries,

01:21:50.760 --> 01:21:55.550
and so space for this lookup
table is little o of n again.
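
NOTE The lookup table count written out as one line, as a sketch. It assumes logs base 2, bottom trees of fewer than 1/4 log n nodes, and space measured in bits.
```latex
\[
\underbrace{4^{\frac{1}{4}\log n} = 2^{\frac{1}{2}\log n} = \sqrt{n}}_{\text{tree types}}
\qquad
\underbrace{O(\sqrt{n})}_{\text{types}} \cdot
\underbrace{O(\log^2 n)}_{\text{queries } (x,\,k)} \cdot
\underbrace{O(\log\log n)}_{\text{bits per answer}}
= O\!\left(\sqrt{n}\,\log^2 n\,\log\log n\right) = o(n).
\]
```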

01:21:55.550 --> 01:21:58.490
And so we're dominated by
space for these pointers

01:21:58.490 --> 01:22:00.530
and for the space up
here, which is linear.

01:22:00.530 --> 01:22:04.020
So linear space, constant query.

01:22:04.020 --> 01:22:06.398
Boom.

01:22:06.398 --> 01:22:07.362
Any questions?

01:22:13.150 --> 01:22:15.190
I have an open question, maybe.

01:22:15.190 --> 01:22:17.460
I think it's open.

01:22:17.460 --> 01:22:22.900
So what if you want to do
dynamic, 30 seconds of dynamic?

01:22:22.900 --> 01:22:26.590
For LCA, it's known
how to do dynamic LCA

01:22:26.590 --> 01:22:28.570
with constant-time operations.

01:22:28.570 --> 01:22:30.890
The operations are add a leaf--

01:22:30.890 --> 01:22:32.530
we can add another leaf--

01:22:32.530 --> 01:22:34.330
or, given an edge,

01:22:34.330 --> 01:22:38.270
subdivide that edge into
two, and also the reverse.

01:22:38.270 --> 01:22:42.850
So I can erase a guy, put
the edge back, delete a leaf,

01:22:42.850 --> 01:22:44.230
those sorts of things.

01:22:44.230 --> 01:22:47.500
Those operations can all be
done in constant time for LCA.

01:22:47.500 --> 01:22:49.510
What about level ancestor?

01:22:49.510 --> 01:22:50.570
I have no idea.

01:22:50.570 --> 01:22:53.050
Maybe we'll work on it today.

01:22:53.050 --> 01:22:54.444
That's it.