WEBVTT

00:00:00.090 --> 00:00:02.490
The following content is
provided under a Creative

00:00:02.490 --> 00:00:04.030
Commons license.

00:00:04.030 --> 00:00:06.360
Your support will help
MIT OpenCourseWare

00:00:06.360 --> 00:00:10.720
continue to offer high quality
educational resources for free.

00:00:10.720 --> 00:00:13.320
To make a donation or
view additional materials

00:00:13.320 --> 00:00:17.280
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:17.280 --> 00:00:18.450
at ocw.mit.edu.

00:00:22.100 --> 00:00:24.680
ERIK DEMAINE: All right, welcome
back to Dynamic Optimality.

00:00:24.680 --> 00:00:27.020
This is the second
of two lectures.

00:00:27.020 --> 00:00:29.770
And today we're going to
focus mainly on lower bounds.

00:00:29.770 --> 00:00:32.479
So last time we saw this
geometric connection

00:00:32.479 --> 00:00:34.020
to binary search trees.

00:00:34.020 --> 00:00:37.760
So again, this is about is there
one best binary search tree.

00:00:37.760 --> 00:00:40.070
And we represented binary
search trees, or at least

00:00:40.070 --> 00:00:42.080
the execution of
those algorithms,

00:00:42.080 --> 00:00:45.920
as point sets in time space.

00:00:45.920 --> 00:00:49.340
And of course a point
set corresponded

00:00:49.340 --> 00:00:52.410
to a valid execution of a BST,
where each of these points

00:00:52.410 --> 00:00:55.970
represented which nodes got
touched during an access.

00:00:55.970 --> 00:00:59.300
If and only if the point
set was arborally satisfied,

00:00:59.300 --> 00:01:01.910
meaning you take any two
points in the point set,

00:01:01.910 --> 00:01:03.530
if they span a
rectangle that is not

00:01:03.530 --> 00:01:05.720
just a horizontal or
vertical line segment,

00:01:05.720 --> 00:01:07.640
there must be a
third point somewhere

00:01:07.640 --> 00:01:09.410
inside that rectangle,
which in the end

00:01:09.410 --> 00:01:14.310
implies that there's a monotone
path between those two points.
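
NOTE
A minimal sketch of the arborally satisfied check just recalled, assuming points are (key, time) tuples and that a third point on the rectangle's boundary counts as inside it. The function name is_arborally_satisfied is hypothetical, not from the lecture.
```python
from itertools import combinations
def is_arborally_satisfied(points):
    """True if every rectangle spanned by two points that do not share a
    row or column contains a third point (boundary included)."""
    pts = set(points)
    for p, q in combinations(pts, 2):
        if p[0] == q[0] or p[1] == q[1]:
            continue  # horizontal or vertical segment, nothing to check
        x_lo, x_hi = sorted((p[0], q[0]))
        t_lo, t_hi = sorted((p[1], q[1]))
        if not any(x_lo <= x <= x_hi and t_lo <= t <= t_hi
                   for (x, t) in pts if (x, t) != p and (x, t) != q):
            return False  # found an empty rectangle
    return True
```
This is a brute-force check over all pairs, so it is only meant for small examples.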

00:01:14.310 --> 00:01:16.190
And then we saw, on
the upper bound side,

00:01:16.190 --> 00:01:19.100
we saw a greedy algorithm, which
was the obvious offline thing

00:01:19.100 --> 00:01:22.010
to do, which is as
these points come along,

00:01:22.010 --> 00:01:24.170
as you do the accesses,
the white dots,

00:01:24.170 --> 00:01:26.540
you add the necessary
red dots in order

00:01:26.540 --> 00:01:30.867
to make it arborally
satisfied row by row.
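
NOTE
A rough sketch of the greedy rule just described, in the point-set view only. The name greedy_points, the (key, time) tuple layout, and the 0-based time stamps are assumptions for illustration, not notation from the lecture.
```python
def greedy_points(accesses):
    """Process accesses row by row. In each row, add a point in every column
    whose most recent point would otherwise span an empty rectangle with the
    new access."""
    last = {}        # key: most recent time that key was touched
    points = set()
    for t, x in enumerate(accesses):
        touched = {x}
        left = sorted((k for k in last if k < x), reverse=True)
        right = sorted(k for k in last if k > x)
        for side in (left, right):
            newest = last.get(x, -1)  # latest touch time seen so far
            for k in side:            # walk away from x, key by key
                if last[k] > newest:  # nothing between k and x is as recent
                    touched.add(k)
                    newest = last[k]
        for k in touched:
            points.add((k, t))
            last[k] = t
    return points
```
For example, greedy_points([1, 2, 3]) returns the three access points plus two added points, (1, 1) and (2, 2).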

00:01:30.867 --> 00:01:33.200
And so that seemed like the
obvious offline thing to do.

00:01:33.200 --> 00:01:36.140
Turns out it could be done
online up to constant factors.

00:01:36.140 --> 00:01:37.790
I sketched that last time.

00:01:37.790 --> 00:01:40.100
And this is conjectured to
be within a constant factor

00:01:40.100 --> 00:01:41.060
of optimal.

00:01:41.060 --> 00:01:41.810
We can't prove it.

00:01:41.810 --> 00:01:43.851
What I'm going to show
today is our best attempts

00:01:43.851 --> 00:01:45.166
at proving this is optimal.

00:01:45.166 --> 00:01:46.790
In particular, there's
something called

00:01:46.790 --> 00:01:49.730
the signed greedy algorithm,
which is almost the same as

00:01:49.730 --> 00:01:52.100
greedy, but it's a lower bound.

00:01:52.100 --> 00:01:54.007
And greedy is an upper bound.

00:01:54.007 --> 00:01:56.090
So all you need to do is
show these two things are

00:01:56.090 --> 00:01:59.564
within a constant factor of
each other and we're done.

00:01:59.564 --> 00:02:01.230
We're not going to
get there, obviously,

00:02:01.230 --> 00:02:02.830
because we haven't
solved that yet.

00:02:02.830 --> 00:02:05.900
But along the way,
we're going to see

00:02:05.900 --> 00:02:10.639
tango trees, which achieve this
log log n competitive bound.

00:02:10.639 --> 00:02:13.120
So we think greedy is
constant competitive.

00:02:13.120 --> 00:02:14.840
The best we know right
now is log log n.

00:02:14.840 --> 00:02:17.330
This is an improvement
over red black trees,

00:02:17.330 --> 00:02:19.340
which achieve log n.

00:02:19.340 --> 00:02:21.710
Any balanced binary search
tree is within a log n

00:02:21.710 --> 00:02:23.030
factor of optimal.

00:02:23.030 --> 00:02:27.350
So it's all between constant and
log n that we're trying to do.

00:02:27.350 --> 00:02:29.450
Another fun consequence
of lower bounds

00:02:29.450 --> 00:02:31.450
is a particular
sense in which log n

00:02:31.450 --> 00:02:34.350
is necessary for some
access sequences.

00:02:34.350 --> 00:02:37.340
So I argued last time
that if you take, like,

00:02:37.340 --> 00:02:39.650
a random access sequence, or--

00:02:39.650 --> 00:02:42.487
for example, if you look
at a binary search tree

00:02:42.487 --> 00:02:44.570
and you say, oh, I'll just
access the thing that's

00:02:44.570 --> 00:02:46.160
deepest in the
tree, there's always

00:02:46.160 --> 00:02:48.950
something that's deep in the
tree, of depth at least log n.

00:02:48.950 --> 00:02:50.450
And so for any
binary search tree

00:02:50.450 --> 00:02:51.710
there is an access sequence.

00:02:51.710 --> 00:02:53.543
No matter what that
binary search tree does,

00:02:53.543 --> 00:02:56.300
I can choose the next
access to force you

00:02:56.300 --> 00:02:58.520
to take log n per operation.

00:02:58.520 --> 00:03:01.370
What we're going to see
today in these lower bounds

00:03:01.370 --> 00:03:05.600
is one access sequence that
for all binary search trees,

00:03:05.600 --> 00:03:08.780
they must spend log n time.

00:03:08.780 --> 00:03:11.240
Just changing the
quantifiers around.

00:03:11.240 --> 00:03:13.970
So instead of for every
binary search tree

00:03:13.970 --> 00:03:15.470
there is an access
sequence, there's

00:03:15.470 --> 00:03:17.480
going to be there is
an access sequence

00:03:17.480 --> 00:03:19.070
such that for every
binary search tree

00:03:19.070 --> 00:03:20.330
you need log n time.

00:03:20.330 --> 00:03:22.810
That's something we'll get
easily out of these lower

00:03:22.810 --> 00:03:24.470
bounds.
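
NOTE
The quantifier swap can be written out as follows. The notation cost_T(X), for the number of node touches that BST algorithm T spends on access sequence X over n keys, is assumed here for illustration and is not from the lecture.
```latex
\begin{align*}
\text{last time:}\quad & \forall T \;\, \exists X :\ \mathrm{cost}_T(X) = \Omega(|X| \log n) \\
\text{today:}\quad & \exists X \;\, \forall T :\ \mathrm{cost}_T(X) = \Omega(|X| \log n)
\end{align*}
```
The second statement is the stronger one, since a single sequence X has to work against every T.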

00:03:24.470 --> 00:03:26.660
So let's jump into
the lower bounds.

00:03:26.660 --> 00:03:29.620
And we're going to cover you
could say three different lower

00:03:29.620 --> 00:03:30.120
bounds.

00:03:30.120 --> 00:03:32.930
The independent rectangles
is kind of a generic class

00:03:32.930 --> 00:03:34.250
of lower bounds.

00:03:34.250 --> 00:03:36.260
Then we're going to see
two specific choices

00:03:36.260 --> 00:03:38.810
of these independent
rectangles, which are actually

00:03:38.810 --> 00:03:41.480
older than this
result. So this is

00:03:41.480 --> 00:03:44.450
sort of a modern interpretation
of two older results

00:03:44.450 --> 00:03:47.060
and a more general result.

00:03:47.060 --> 00:03:50.210
Signed greedy is going to turn
out to be the best lower bound.

00:03:50.210 --> 00:03:52.790
It's better than all the
ones that we will cover,

00:03:52.790 --> 00:03:56.630
but each of them has their
own uses for analysis.

00:03:56.630 --> 00:03:59.300
Each of them is going to
let us analyze an algorithm

00:03:59.300 --> 00:04:01.340
that we couldn't or
that we don't otherwise

00:04:01.340 --> 00:04:04.160
know how to analyze.

00:04:04.160 --> 00:04:08.600
So let's do the independent
rectangle lower bound.

00:04:08.600 --> 00:04:09.700
The sort of generic one.

00:04:25.540 --> 00:04:28.690
So these lower
bounds are all going

00:04:28.690 --> 00:04:31.780
to refer to the
original point set,

00:04:31.780 --> 00:04:34.720
the white dots, the accesses.

00:04:34.720 --> 00:04:37.330
The idea is you're given an
access sequence, a sequence x

00:04:37.330 --> 00:04:41.170
i-- x 1 up to x n, and you
want to know some lower bound

00:04:41.170 --> 00:04:43.690
that every binary
search tree requires

00:04:43.690 --> 00:04:46.930
a certain number of accesses, a
certain number of node touches

00:04:46.930 --> 00:04:47.980
for that access sequence.

00:04:47.980 --> 00:04:49.021
You know it's at least n.

00:04:49.021 --> 00:04:50.910
You want something
bigger than n.

00:04:50.910 --> 00:04:54.475
We've got to at least touch the
nodes that are being accessed.

00:04:58.983 --> 00:05:00.370
I'm going to drop this.

00:05:15.520 --> 00:05:19.090
So I want the notion of
independent rectangles.

00:05:19.090 --> 00:05:24.400
And the general idea of
dependent rectangles

00:05:24.400 --> 00:05:27.340
would be something like this.

00:05:30.328 --> 00:05:31.300
Ah, I see.

00:05:42.610 --> 00:05:44.170
So these are two rectangles.

00:05:44.170 --> 00:05:46.570
I consider them dependent
because one of the corners

00:05:46.570 --> 00:05:48.900
is inside the other rectangle.

00:05:48.900 --> 00:05:51.320
This is true no matter
where the points are.

00:05:51.320 --> 00:05:59.750
So, for example, if I take two
points, they span a rectangle.

00:05:59.750 --> 00:06:04.660
If I take these two points, for
example, they span a rectangle.

00:06:04.660 --> 00:06:06.180
This corner is inside that one.

00:06:06.180 --> 00:06:08.440
So these are considered
dependent rectangles

00:06:08.440 --> 00:06:11.127
in either case.

00:06:11.127 --> 00:06:13.210
So corner here does not
necessarily mean a point--

00:06:13.210 --> 00:06:14.782
any of the four corners.

00:06:14.782 --> 00:06:16.240
Rectangle is defined
by two points,

00:06:16.240 --> 00:06:19.960
but it has all four corners.

00:06:19.960 --> 00:06:23.380
And so, in particular,
independent rectangles--

00:06:23.380 --> 00:06:26.080
for example, they might
be completely disjoint.

00:06:26.080 --> 00:06:28.700
Those are going
to be independent.

00:06:28.700 --> 00:06:31.700
Something like that
is independent.

00:06:31.700 --> 00:06:33.140
But there are some other cases.

00:06:33.140 --> 00:06:37.180
You can have rectangles
that look like this.

00:06:37.180 --> 00:06:38.110
OK?

00:06:38.110 --> 00:06:39.901
And it doesn't matter
where the points are.

00:06:39.901 --> 00:06:42.000
Maybe here, here,
here, and here.

00:06:42.000 --> 00:06:44.210
Or the other way.

00:06:44.210 --> 00:06:46.990
These are independent.

00:06:46.990 --> 00:06:52.960
And there's one other kind of
special case, which maybe I'll

00:06:52.960 --> 00:06:56.680
use color to draw the
other one because they're

00:06:56.680 --> 00:06:57.970
right on top of each other.

00:07:03.405 --> 00:07:10.000
So I've got a point here, a
point here, and a point here.

00:07:10.000 --> 00:07:13.810
These are two rectangles
defined on three points.

00:07:13.810 --> 00:07:16.990
So they both use this point.

00:07:16.990 --> 00:07:19.930
And if you check, it does
satisfy this condition.

00:07:19.930 --> 00:07:23.260
So no corner strictly
inside the other.

00:07:23.260 --> 00:07:25.660
But we also need that the
rectangles are unsatisfied.

00:07:25.660 --> 00:07:28.660
So this is saying that
there's no other point even

00:07:28.660 --> 00:07:30.410
on the boundary
of the rectangle.

00:07:30.410 --> 00:07:34.360
So this part says, OK, there's
nothing strictly inside.

00:07:34.360 --> 00:07:36.610
But we also need
that on the boundary

00:07:36.610 --> 00:07:38.060
there's no other points.

00:07:38.060 --> 00:07:40.060
So this is the only
sort of situation other

00:07:40.060 --> 00:07:44.200
than reflections where
you get this working

00:07:44.200 --> 00:07:45.840
out as independent.

00:07:50.632 --> 00:07:52.090
AUDIENCE: Last case
is independent?

00:07:52.090 --> 00:07:53.730
ERIK DEMAINE: Last
case is independent.

00:07:56.730 --> 00:07:58.800
All right?

00:07:58.800 --> 00:08:00.470
So this is a definition.

00:08:00.470 --> 00:08:02.470
If I give you a
set of rectangles,

00:08:02.470 --> 00:08:05.070
they're independent.

00:08:05.070 --> 00:08:06.930
I mean, I was
looking at pairwise.

00:08:06.930 --> 00:08:08.640
But if they're
pairwise independent,

00:08:08.640 --> 00:08:10.710
then they will be independent.

00:08:10.710 --> 00:08:12.270
No corner of any
rectangle strictly

00:08:12.270 --> 00:08:14.590
inside any other rectangle.

00:08:14.590 --> 00:08:17.940
And there's no points
of those rectangles

00:08:17.940 --> 00:08:21.360
that are inside others.
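
NOTE
A small sketch of the two conditions in this definition, assuming a rectangle is given by the two points that span it and that points are (x, y) tuples. All function names here are hypothetical.
```python
def normalize(p, q):
    """Rectangle spanned by p and q as (x_lo, y_lo, x_hi, y_hi)."""
    (x1, y1), (x2, y2) = p, q
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))
def corners(rect):
    x_lo, y_lo, x_hi, y_hi = rect
    return [(x_lo, y_lo), (x_lo, y_hi), (x_hi, y_lo), (x_hi, y_hi)]
def strictly_inside(pt, rect):
    x, y = pt
    x_lo, y_lo, x_hi, y_hi = rect
    return x_lo < x < x_hi and y_lo < y < y_hi
def independent(a, b):
    """True if no corner of either rectangle is strictly inside the other."""
    return not (any(strictly_inside(c, b) for c in corners(a)) or
                any(strictly_inside(c, a) for c in corners(b)))
def unsatisfied(p, q, points):
    """True if no point other than p and q lies in the rectangle spanned by
    p and q, boundary included."""
    x_lo, y_lo, x_hi, y_hi = normalize(p, q)
    return not any(x_lo <= x <= x_hi and y_lo <= y <= y_hi
                   for (x, y) in points if (x, y) != p and (x, y) != q)
```
With these, two rectangles crossing like a plus sign come out independent, and so do two rectangles sharing a corner point, while two rectangles that overlap at a corner come out dependent.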

00:08:21.360 --> 00:08:22.166
OK.

00:08:22.166 --> 00:08:22.665
Cool.

00:08:25.390 --> 00:08:27.790
So what?

00:08:32.350 --> 00:08:36.640
Lower bound says the optimal
offline binary search

00:08:36.640 --> 00:08:41.799
tree, or the optimal way to add
dots to satisfy your point set,

00:08:41.799 --> 00:08:45.490
is going to be at least
the size of the input--

00:08:45.490 --> 00:08:48.340
meaning the number of
initial points you have--

00:08:48.340 --> 00:08:53.170
plus half the maximum number
of independent rectangles.

00:09:02.591 --> 00:09:03.090
OK.

00:09:03.090 --> 00:09:05.460
So this is a max
independent set problem.

00:09:05.460 --> 00:09:07.350
In general, that's NP-complete.

00:09:07.350 --> 00:09:10.260
Turns out we'll be able to at
least approximate the number

00:09:10.260 --> 00:09:12.990
of independent rectangles
within a constant factor

00:09:12.990 --> 00:09:14.560
by the end of class.

00:09:14.560 --> 00:09:16.220
That's going to
be signed greedy.

00:09:16.220 --> 00:09:18.180
So signed greedy is
going to be the best

00:09:18.180 --> 00:09:21.270
way up to constant factors to
choose independent rectangles.

00:09:21.270 --> 00:09:24.780
For now, someone
magically tells you

00:09:24.780 --> 00:09:28.770
what's the best way or you
just choose some reasonable--

00:09:28.770 --> 00:09:30.534
any choice of
independent rectangles

00:09:30.534 --> 00:09:31.450
will be a lower bound.

00:09:31.450 --> 00:09:34.600
But you get the best lower
bound by choosing the max.

00:09:34.600 --> 00:09:35.100
OK?

00:09:35.100 --> 00:09:38.400
So we're going to
prove this theorem,

00:09:38.400 --> 00:09:43.432
and then we're going to see
three different ways to choose

00:09:43.432 --> 00:09:44.640
those independent rectangles.

00:09:44.640 --> 00:09:46.770
And we'll use them
for various things.

00:09:46.770 --> 00:09:48.780
Wilber 1, Wilber 2,
and signed greedy

00:09:48.780 --> 00:09:51.090
are going to be
the three choices

00:09:51.090 --> 00:09:53.530
for independent rectangles.

00:09:53.530 --> 00:09:54.030
All right.

00:09:54.030 --> 00:09:57.660
To prove this theorem, we're
going to change it a little bit

00:09:57.660 --> 00:09:59.850
first.

00:09:59.850 --> 00:10:02.700
And this is kind of
the focus of today--

00:10:02.700 --> 00:10:04.890
is the idea of
signed rectangles.

00:10:09.090 --> 00:10:11.580
If you look at the rectangles
in the world spanned

00:10:11.580 --> 00:10:14.800
by two points, there
are two different kinds.

00:10:14.800 --> 00:10:20.460
There's the top right,
lower left kind.

00:10:20.460 --> 00:10:23.910
And then there's the top
left, lower right kind.

00:10:23.910 --> 00:10:26.610
These are positive
slope or negative slope.

00:10:29.340 --> 00:10:31.590
Those are the two
kinds of rectangles.

00:10:31.590 --> 00:10:36.630
And it's helpful to think about
just the positive rectangles

00:10:36.630 --> 00:10:39.690
or the slash rectangles and
just the backslash rectangles

00:10:39.690 --> 00:10:41.380
separately.

00:10:41.380 --> 00:10:43.920
So we're going to
call a point set--

00:10:48.090 --> 00:10:50.400
it's a little hard
to pronounce--

00:10:50.400 --> 00:10:53.820
we used to call
this plus satisfied.

00:10:53.820 --> 00:10:55.620
So maybe it's easiest
to pronounce it

00:10:55.620 --> 00:11:00.990
that way, the symbol formerly
known as plus satisfied,

00:11:00.990 --> 00:11:11.370
if all plus rectangles
that are not

00:11:11.370 --> 00:11:15.000
on a horizontal or vertical
line contain another point.

00:11:23.360 --> 00:11:27.390
So a point set is
arborally satisfied

00:11:27.390 --> 00:11:31.310
if and only if it is plus
satisfied and minus satisfied--

00:11:31.310 --> 00:11:33.787
just breaking apart that
definition into two parts.
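
NOTE
A sketch of the plus and minus split, assuming (x, y) tuples and boundary points counting as inside a rectangle. The names sign and is_sign_satisfied, and the '+' and '-' labels, are made up for illustration.
```python
from itertools import combinations
def sign(p, q):
    """'+' for a lower-left and upper-right pair, '-' for an upper-left and
    lower-right pair, None if the points share a row or column."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 or y1 == y2:
        return None
    return '+' if (x1 - x2) * (y1 - y2) > 0 else '-'
def is_sign_satisfied(points, which):
    """True if every rectangle of the given sign contains a third point."""
    pts = set(points)
    for p, q in combinations(pts, 2):
        if sign(p, q) != which:
            continue  # only rectangles of the requested sign matter
        x_lo, x_hi = sorted((p[0], q[0]))
        y_lo, y_hi = sorted((p[1], q[1]))
        if not any(x_lo <= x <= x_hi and y_lo <= y <= y_hi
                   for (x, y) in pts if (x, y) != p and (x, y) != q):
            return False
    return True
```
A point set is then arborally satisfied exactly when both is_sign_satisfied(points, '+') and is_sign_satisfied(points, '-') hold.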

00:11:33.787 --> 00:11:36.120
But now, we're going to look
at point sets that are just

00:11:36.120 --> 00:11:39.300
plus satisfied or point sets
that are just minus satisfied.

00:11:39.300 --> 00:11:42.630
And then we can look
at the optimal solution

00:11:42.630 --> 00:11:46.960
if you only care
about plus rectangles.

00:11:46.960 --> 00:12:03.840
So this is the smallest
plus satisfied point set

00:12:03.840 --> 00:12:08.776
containing all the access
points, all the given points.

00:12:08.776 --> 00:12:12.470
So we'll call that the input.

00:12:12.470 --> 00:12:15.090
OPT was the smallest
arborally satisfied.

00:12:15.090 --> 00:12:17.340
OPT plus is the--

00:12:17.340 --> 00:12:19.483
you just look at
plus rectangles.

00:12:24.351 --> 00:12:24.850
OK.

00:12:24.850 --> 00:12:26.570
Why are we doing this?

00:12:26.570 --> 00:12:29.030
Well, for now, we're going to
do it to prove this theorem.

00:12:29.030 --> 00:12:35.830
So the lemma, which is what
we're actually going to prove,

00:12:35.830 --> 00:12:38.650
is if you look at
this OPT plus thing,

00:12:38.650 --> 00:12:41.010
it's got to be at least
the size of the input--

00:12:41.010 --> 00:12:44.620
everything has to at
least contain the input--

00:12:44.620 --> 00:12:54.460
plus maximum number of
independent plus rectangles.

00:13:02.260 --> 00:13:04.680
So this is what we're
actually going to prove.

00:13:04.680 --> 00:13:06.180
If you want to
get plus satisfied

00:13:06.180 --> 00:13:09.240
and you've got k
independent plus rectangles,

00:13:09.240 --> 00:13:11.050
you need to add at
least that many points--

00:13:11.050 --> 00:13:14.850
so at least one point
per plus rectangle.

00:13:14.850 --> 00:13:16.830
If you can prove this,
you prove the theorem

00:13:16.830 --> 00:13:18.900
because this holds
for minus just

00:13:18.900 --> 00:13:21.120
as well as plus by symmetry.

00:13:21.120 --> 00:13:26.400
And so you take your maximum
independent set of rectangles.

00:13:26.400 --> 00:13:28.830
At least half of them are
plus or at least half of them

00:13:28.830 --> 00:13:30.000
are minus.

00:13:30.000 --> 00:13:34.290
You apply this bound, and that's
where you get the 1/2 here.
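
NOTE
Written out, the step from the lemma to the theorem looks roughly like this. Here R is a maximum independent set of rectangles and R_+, R_- are its plus and minus rectangles. These symbols are assumed for illustration and are not from the lecture.
```latex
\begin{align*}
\mathrm{OPT}(X) &\ge \mathrm{OPT}_{+}(X) \ge |X| + |R_{+}|, \\
\mathrm{OPT}(X) &\ge \mathrm{OPT}_{-}(X) \ge |X| + |R_{-}|, \\
\mathrm{OPT}(X) &\ge |X| + \max(|R_{+}|, |R_{-}|) \ge |X| + \tfrac{1}{2}\,|R|.
\end{align*}
```
The first inequality in each of the first two lines holds because an arborally satisfied set is in particular plus satisfied and minus satisfied. The last step uses |R_+| + |R_-| = |R|.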

00:13:34.290 --> 00:13:38.182
So this is stronger, I
guess, than the theorem,

00:13:38.182 --> 00:13:40.140
and this is what we're
actually going to prove.

00:13:40.140 --> 00:13:41.850
And so, in this
world, we just are

00:13:41.850 --> 00:13:44.790
thinking about plus rectangles,
which is a little weird.

00:13:44.790 --> 00:13:47.340
But it works.

00:13:51.120 --> 00:13:54.366
And the proof is going
to be in three steps.

00:13:54.366 --> 00:13:56.760
I'm first going to give you
an overview of the steps,

00:13:56.760 --> 00:13:58.470
and then we'll actually do them.

00:13:58.470 --> 00:14:03.030
So this is like a
two-level proof.

00:14:03.030 --> 00:14:07.470
First thing we
do, the top level,

00:14:07.470 --> 00:14:13.590
is we're going to
find a rectangle

00:14:13.590 --> 00:14:20.370
in the independent
set, and we're

00:14:20.370 --> 00:14:27.540
going to find a vertical line
that hits only that rectangle.

00:14:33.730 --> 00:14:37.160
So we're going to
have some rectangle

00:14:37.160 --> 00:14:43.380
in the independent set, and we
want a vertical line stabbing

00:14:43.380 --> 00:14:46.470
it such that no
other rectangle is

00:14:46.470 --> 00:14:48.490
stabbed by this vertical line.

00:14:48.490 --> 00:14:50.910
So all other rectangles--
that's independence,

00:14:50.910 --> 00:14:54.260
so maybe they look
something like this--

00:14:54.260 --> 00:14:55.690
but nothing like this.

00:14:59.297 --> 00:15:01.380
Not obvious that such a
thing exists, but it does.

00:15:01.380 --> 00:15:03.240
Actually, not that hard to find.

00:15:03.240 --> 00:15:05.510
We just need some
rectangle with some line.

00:15:09.540 --> 00:15:16.230
Then, using that
property, we're going

00:15:16.230 --> 00:15:31.200
to be able to find some points
in that rectangle that are also

00:15:31.200 --> 00:15:43.530
in the optimal plus
solution in the rectangle

00:15:43.530 --> 00:15:44.490
crossing the line.

00:15:49.950 --> 00:15:52.770
Let me get another color.

00:15:52.770 --> 00:15:56.490
So we're going to find
a point on the left

00:15:56.490 --> 00:15:59.964
of the line and a point
on the right of the line.

00:15:59.964 --> 00:16:01.380
And they're
horizontally adjacent,

00:16:01.380 --> 00:16:03.790
meaning there's no other
point between them.

00:16:03.790 --> 00:16:07.164
So we know there's
some point in this box.

00:16:07.164 --> 00:16:08.580
Because this is a
plus box, it has

00:16:08.580 --> 00:16:09.750
got to be satisfied somehow.

00:16:09.750 --> 00:16:11.375
And I claim there's
actually two points

00:16:11.375 --> 00:16:12.600
on either side of the line.

00:16:12.600 --> 00:16:14.992
One of them could be equal
to this or this, but not both

00:16:14.992 --> 00:16:16.950
obviously because they're
horizontally aligned.

00:16:21.270 --> 00:16:29.230
And then what we're going to
do is charge the rectangle

00:16:29.230 --> 00:16:29.970
to those points.

00:16:34.346 --> 00:16:36.720
And then, basically, we're
going to remove that rectangle

00:16:36.720 --> 00:16:38.340
and repeat.

00:16:38.340 --> 00:16:40.830
And the claim is this charging
sort of only happens once

00:16:40.830 --> 00:16:41.910
per point.

00:16:41.910 --> 00:16:44.700
And therefore, the number of
points in the optimal solution

00:16:44.700 --> 00:16:48.150
has to be at least the
number of rectangles-- number

00:16:48.150 --> 00:16:50.940
of plus rectangles in
the independent set.

00:16:50.940 --> 00:16:53.740
So, basically, this is a way
of ordering the rectangles.

00:16:53.740 --> 00:16:56.640
We're going to take one that
has one of these vertical lines,

00:16:56.640 --> 00:17:00.630
find two points that
pay for that rectangle,

00:17:00.630 --> 00:17:03.360
and therefore argue that OPT
has to be at least the number

00:17:03.360 --> 00:17:04.394
of rectangles.

00:17:04.394 --> 00:17:08.250
So we have to argue that at
least one of these points

00:17:08.250 --> 00:17:11.470
is not one of the
original points.

00:17:11.470 --> 00:17:15.644
And that's where we're
getting the input plus this.

00:17:15.644 --> 00:17:17.310
So there's lots of
things to check here.

00:17:17.310 --> 00:17:18.476
Let's do them one at a time.

00:17:21.560 --> 00:17:23.930
And throughout, I'm
going to assume--

00:17:23.930 --> 00:17:27.150
let me write that at the bottom--

00:17:27.150 --> 00:17:31.700
assume all x- and
y-coordinates are unique.

00:17:39.860 --> 00:17:43.490
This is an idea I mentioned
last time as well.

00:17:43.490 --> 00:17:46.890
If you have lots of
accesses to the same key,

00:17:46.890 --> 00:17:50.870
imagine them being accesses
to slightly different keys.

00:17:50.870 --> 00:17:53.330
Just skew them a little
bit, and it doesn't

00:17:53.330 --> 00:17:56.420
change any of the bounds much.

00:17:56.420 --> 00:18:00.290
I won't argue that here,
but at the least think of this

00:18:00.290 --> 00:18:03.530
as just a simplifying assumption
to make the proofs cleaner.

00:18:07.010 --> 00:18:09.230
So how are we going
to do step one?

00:18:09.230 --> 00:18:12.710
I need to find some rectangle
and some vertical line that

00:18:12.710 --> 00:18:16.580
only stabs that rectangle.

00:18:16.580 --> 00:18:18.560
And the way we're going
to do that is just

00:18:18.560 --> 00:18:30.190
take the widest rectangle that
just has the maximum x extent.

00:18:30.190 --> 00:18:34.810
There might be more than one,
but just take one of them.

00:18:34.810 --> 00:18:38.534
So it's very wide.

00:18:38.534 --> 00:18:39.950
What this tells
us is that there's

00:18:39.950 --> 00:18:42.710
no other rectangle like this.

00:18:42.710 --> 00:18:45.110
This would be independent,
but it would be wider.

00:18:45.110 --> 00:18:46.040
So that's not allowed.

00:18:51.365 --> 00:18:53.490
Now, we have to think about
all sorts of scenarios.

00:18:53.490 --> 00:18:55.990
So we've got a point
here and a point here.

00:18:55.990 --> 00:18:58.960
It could still be that we
have rectangles like this.

00:18:58.960 --> 00:19:00.970
They just can't go
farther to the right.

00:19:00.970 --> 00:19:03.670
It could be we have
rectangles that go like this--

00:19:03.670 --> 00:19:07.354
just can't go too
far to the left.

00:19:07.354 --> 00:19:08.770
These rectangles
that are anchored

00:19:08.770 --> 00:19:10.330
in the lower left and
these rectangles that

00:19:10.330 --> 00:19:12.610
are anchored in the upper
right can't touch each other

00:19:12.610 --> 00:19:16.369
because then one of
them would be satisfied.

00:19:16.369 --> 00:19:18.160
This one's going to
have a point down here.

00:19:18.160 --> 00:19:20.230
This one is going to
have a point here.

00:19:20.230 --> 00:19:23.200
I guess-- yeah, let's
see, how would it

00:19:23.200 --> 00:19:24.560
go if they were touching?

00:19:31.110 --> 00:19:33.465
We'd have a corner--

00:19:33.465 --> 00:19:34.910
hmm, touching is a little weird.

00:19:51.000 --> 00:19:52.000
Ah, I see.

00:19:52.000 --> 00:19:52.500
Good.

00:19:52.500 --> 00:19:53.958
This can't happen
because we assume

00:19:53.958 --> 00:19:56.790
the x-coordinates are distinct.

00:19:56.790 --> 00:19:59.360
So that's why I did this.

00:19:59.360 --> 00:20:01.860
That's the reason.

00:20:01.860 --> 00:20:03.580
So this can't happen.

00:20:03.580 --> 00:20:07.410
And I also can't have them go
like this because then there's

00:20:07.410 --> 00:20:12.120
a corner in the strict interior
of the other rectangle.

00:20:12.120 --> 00:20:12.912
Is that clear?

00:20:12.912 --> 00:20:14.370
This rectangle
can't come over here

00:20:14.370 --> 00:20:18.090
because then that would
be not independent.

00:20:18.090 --> 00:20:20.160
Rectangle can't come
right to the same spot

00:20:20.160 --> 00:20:21.460
because there is no same spot.

00:20:21.460 --> 00:20:24.190
That would be two points
on the same vertical line.

00:20:24.190 --> 00:20:26.460
And so what we must
have is a picture more

00:20:26.460 --> 00:20:30.810
like this where there's an
empty region in between.

00:20:30.810 --> 00:20:34.680
That's not hit by-- there can be
many of these rectangles, many

00:20:34.680 --> 00:20:35.911
of these rectangles.

00:20:35.911 --> 00:20:37.410
They're independent
from each other.

00:20:37.410 --> 00:20:40.620
That's like this case here.

00:20:40.620 --> 00:20:43.560
There can also be some
rectangles like this.

00:20:43.560 --> 00:20:47.100
But by the same argument, these
guys can't touch each other

00:20:47.100 --> 00:20:49.380
and they can't
overlap horizontally

00:20:49.380 --> 00:20:52.770
because then one of the corners
would be inside the other.

00:20:52.770 --> 00:20:54.080
Question?

00:20:54.080 --> 00:20:56.455
AUDIENCE: For that picture,
you drew a rectangle

00:20:56.455 --> 00:20:59.282
under the other one.

00:20:59.282 --> 00:21:00.740
ERIK DEMAINE: This
one or this one?

00:21:00.740 --> 00:21:01.531
AUDIENCE: That one.

00:21:01.531 --> 00:21:03.480
ERIK DEMAINE: Yeah,
this one cannot happen.

00:21:03.480 --> 00:21:04.575
That's what we claim--

00:21:04.575 --> 00:21:05.250
haha, right.

00:21:05.250 --> 00:21:07.140
So you're right.

00:21:07.140 --> 00:21:10.590
So we worry about--

00:21:10.590 --> 00:21:11.612
interesting.

00:21:11.612 --> 00:21:13.320
Well, we worry about
something like this.

00:21:16.194 --> 00:21:19.425
AUDIENCE: Sorry, why
can't that happen?

00:21:19.425 --> 00:21:20.800
ERIK DEMAINE:
Yeah, you're right.

00:21:20.800 --> 00:21:22.216
I actually drew
the wrong picture.

00:21:22.216 --> 00:21:23.215
Sorry.

00:21:23.215 --> 00:21:23.715
Kidding.

00:21:31.080 --> 00:21:31.580
Yeah.

00:21:31.580 --> 00:21:33.250
I really meant
line segment here.

00:21:33.250 --> 00:21:34.305
I'm sorry.

00:21:39.640 --> 00:21:42.020
Poor choice of wording.

00:21:42.020 --> 00:21:43.720
So vertical line is
actually just going

00:21:43.720 --> 00:21:47.770
to go the extent
of the rectangle--

00:21:47.770 --> 00:21:49.510
something like this.

00:21:49.510 --> 00:21:50.230
Sorry.

00:21:50.230 --> 00:21:53.910
We can't forbid
rectangles like this.

00:21:53.910 --> 00:21:57.260
What we can forbid are
rectangles like this

00:21:57.260 --> 00:22:01.632
that also try to
cross that segment.

00:22:01.632 --> 00:22:03.340
We'll see why this is
enough in a moment.

00:22:03.340 --> 00:22:04.060
Sorry about that.

00:22:08.350 --> 00:22:12.070
I really only care about the
interior of this rectangle.

00:22:12.070 --> 00:22:15.370
I'm trying to get a vertical
line that only stabs

00:22:15.370 --> 00:22:17.530
this rectangle, nothing else--

00:22:17.530 --> 00:22:19.600
inside the rectangle.

00:22:19.600 --> 00:22:20.740
Sorry, poor wording.

00:22:20.740 --> 00:22:22.674
I don't care about
these guys outside

00:22:22.674 --> 00:22:24.340
because I can't say
anything about them.

00:22:24.340 --> 00:22:29.275
They could be all over the
place in an independent set.

00:22:29.275 --> 00:22:31.150
I mean, relative to what
hits this rectangle,

00:22:31.150 --> 00:22:32.050
there's stuff on the left.

00:22:32.050 --> 00:22:33.050
There's stuff on the right.

00:22:33.050 --> 00:22:33.924
There are these guys.

00:22:33.924 --> 00:22:37.340
There can also be
things like this.

00:22:37.340 --> 00:22:40.600
But still remaining
are these regions

00:22:40.600 --> 00:22:42.959
which are not hit
by any rectangles,

00:22:42.959 --> 00:22:44.500
and that's because
what I was saying.

00:22:44.500 --> 00:22:46.050
These guys can't touch each
other because then there

00:22:46.050 --> 00:22:47.680
would be equal x-coordinates.

00:22:47.680 --> 00:22:49.660
They can't overlap
because then one of them

00:22:49.660 --> 00:22:52.180
would not be independent
from the other.

00:22:52.180 --> 00:22:58.540
So I get my vertical lines.

00:22:58.540 --> 00:23:01.510
I just need one, but it
could be any of these.

00:23:01.510 --> 00:23:06.400
In general, for example, if
you take all of these lower

00:23:06.400 --> 00:23:08.470
left anchored
rectangles and take just

00:23:08.470 --> 00:23:10.540
to the right of
the rightmost one,

00:23:10.540 --> 00:23:13.210
that will be a valid
choice for your line.

00:23:13.210 --> 00:23:17.060
Because you can argue none
of these can overlap it.

00:23:17.060 --> 00:23:18.550
So that's step one.

00:23:18.550 --> 00:23:20.686
We just take a widest rectangle.

00:23:20.686 --> 00:23:22.060
The one thing we
needed to forbid

00:23:22.060 --> 00:23:25.060
was something going like
this all the way across.

00:23:28.120 --> 00:23:28.870
Step two.

00:23:31.822 --> 00:23:33.560
Step two is actually
pretty easy.

00:23:33.560 --> 00:23:37.490
Once you've identified
this red line--

00:23:37.490 --> 00:23:40.370
inside the rectangle, you
know there are some points.

00:23:40.370 --> 00:23:43.714
And I'm going to
take the rightmost.

00:23:43.714 --> 00:23:45.380
And then among all
the rightmost points,

00:23:45.380 --> 00:23:47.560
I'm going to take the
topmost point that

00:23:47.560 --> 00:23:50.420
is to the left of the line
and inside the rectangle.

00:23:50.420 --> 00:23:59.340
So let p be the topmost,
leftmost point--

00:23:59.340 --> 00:24:14.020
sorry, rightmost-- that
is both in the rectangle

00:24:14.020 --> 00:24:16.210
and left of the line.

00:24:22.240 --> 00:24:28.420
Let me erase this one for
a little bit more room.

00:24:28.420 --> 00:24:31.150
So I'm looking at
all of this region

00:24:31.150 --> 00:24:33.370
to the left of the
line in the rectangle.

00:24:33.370 --> 00:24:39.130
I want to take the
rightmost and then topmost

00:24:39.130 --> 00:24:42.100
point-- something like this.

00:24:42.100 --> 00:24:43.960
How do I know such
a point exists?

00:24:43.960 --> 00:24:46.640
Because this point
is such a point.

00:24:46.640 --> 00:24:49.030
And this point is to
the left of the line.

00:24:49.030 --> 00:24:52.510
So if there's nothing else in
here, that is a valid choice.

00:24:52.510 --> 00:24:54.950
But in general, we go to the
right as much as possible.

00:24:54.950 --> 00:24:56.390
Then we go up as
much as possible.

00:24:56.390 --> 00:24:59.110
So that's a point, which
we will call p.

00:24:59.110 --> 00:25:03.534
AUDIENCE: Couldn't it be on
the border of the rectangle?

00:25:03.534 --> 00:25:05.200
ERIK DEMAINE: It could
be on the border.

00:25:05.200 --> 00:25:06.550
It could be interior.

00:25:06.550 --> 00:25:08.177
We don't know.

00:25:08.177 --> 00:25:11.890
AUDIENCE: When you said
topmost, what is your topmost?

00:25:11.890 --> 00:25:15.448
ERIK DEMAINE: Topmost means
of maximum y-coordinate.

00:25:15.448 --> 00:25:16.226
AUDIENCE: Oh, OK.

00:25:16.226 --> 00:25:16.850
Got it.

00:25:16.850 --> 00:25:19.490
ERIK DEMAINE: So it
could be up here.

00:25:19.490 --> 00:25:21.380
We don't know.

00:25:21.380 --> 00:25:22.760
First, we go rightmost.

00:25:22.760 --> 00:25:25.019
Then, among all the
things in that column,

00:25:25.019 --> 00:25:26.060
we go to the topmost one.

00:25:26.060 --> 00:25:27.140
So it might be on the top.

00:25:27.140 --> 00:25:27.806
It might not be.

00:25:30.760 --> 00:25:36.265
These are points-- sorry,
this is a point in OPT plus.

00:25:40.270 --> 00:25:42.966
And then q is going
to be a similar thing.

00:25:42.966 --> 00:25:52.780
It's going to be the
bottom-most, leftmost point

00:25:52.780 --> 00:26:03.970
in OPT plus that
is in the rectangle

00:26:03.970 --> 00:26:05.230
and right of the line.

00:26:08.112 --> 00:26:09.670
Not totally symmetric, though.

00:26:09.670 --> 00:26:12.420
We're also going to
say and not below p.

00:26:16.570 --> 00:26:25.450
So now we're looking at
this upper region here.

00:26:25.450 --> 00:26:28.290
Among all the things
that are not below p--

00:26:28.290 --> 00:26:30.280
should have drawn
this more horizontal--

00:26:33.430 --> 00:26:35.960
and to the right
of the red line--

00:26:35.960 --> 00:26:38.960
so that's up to here, I guess--

00:26:38.960 --> 00:26:41.662
I want to take the
leftmost column that

00:26:41.662 --> 00:26:43.370
has any points in it
and then among those

00:26:43.370 --> 00:26:47.370
take the bottom-most
point in the column.

00:26:47.370 --> 00:26:50.539
I claim that's actually
going to be on this line.

00:26:50.539 --> 00:26:52.580
First thing to check is
that such a point exists.

00:26:52.580 --> 00:26:54.980
Such a point exists
because, in particular, this

00:26:54.980 --> 00:26:56.689
is such a point.

00:26:56.689 --> 00:26:57.980
It is to the right of the line.

00:26:57.980 --> 00:27:00.800
It's above the blue line,
right of the red line,

00:27:00.800 --> 00:27:02.660
in the rectangle.

00:27:02.660 --> 00:27:05.350
But I claim that if we take the
leftmost, bottom-most one, then

00:27:05.350 --> 00:27:06.600
they must actually be aligned.

00:27:06.600 --> 00:27:08.010
Why?

00:27:08.010 --> 00:27:10.520
So if it was somewhere
else, like up here

00:27:10.520 --> 00:27:16.590
or like this point, then I claim
that is an unsatisfied box.

00:27:16.590 --> 00:27:24.180
Let me draw that picture,
make a little clearer.

00:27:24.180 --> 00:27:27.240
So something like this.

00:27:30.760 --> 00:27:39.540
So we've got our red line
and we've got this picture.

00:27:43.090 --> 00:27:54.000
This is p, and then actually
we don't know anything

00:27:54.000 --> 00:27:55.095
about down here.

00:27:59.770 --> 00:28:01.990
This is q.

00:28:01.990 --> 00:28:06.230
I claim that these black regions
cannot have any points in them

00:28:06.230 --> 00:28:09.010
because, by definition, p
was in the rightmost column.

00:28:09.010 --> 00:28:11.677
So there's nothing in this strip
between p and the red line.

00:28:11.677 --> 00:28:14.218
And it was the topmost within
the column, which means there's

00:28:14.218 --> 00:28:15.520
nothing above p in the column.

00:28:15.520 --> 00:28:17.620
So that's why all the
points are confined

00:28:17.620 --> 00:28:19.300
to this region over here.

00:28:19.300 --> 00:28:22.360
Similarly, for q, if you look
at the things that are above

00:28:22.360 --> 00:28:24.640
or on this horizontal
line, which

00:28:24.640 --> 00:28:28.630
was the blue line
over there, then we

00:28:28.630 --> 00:28:33.320
know that there's nothing
in this strip in between

00:28:33.320 --> 00:28:34.330
because q is leftmost.

00:28:34.330 --> 00:28:36.163
And then among leftmost,
it was bottom-most,

00:28:36.163 --> 00:28:37.630
so there's nothing down here.

00:28:37.630 --> 00:28:40.360
So that means, if these guys
are not horizontally aligned,

00:28:40.360 --> 00:28:42.840
there is an
unsatisfied box here.

00:28:42.840 --> 00:28:43.940
Contradiction.

00:28:43.940 --> 00:28:45.170
It's a plus box.

00:28:45.170 --> 00:28:48.227
So in OPT plus, there's got
to be another point, which

00:28:48.227 --> 00:28:49.060
was a contradiction.

00:28:49.060 --> 00:28:52.360
So in fact, p and q must
be horizontally aligned.

00:28:52.360 --> 00:28:54.360
So that was step two.
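
NOTE
A minimal sketch of the step-two selection described above, under stated assumptions: points are (x, y) tuples, the rectangle is given as (x_lo, y_lo, x_hi, y_hi) with both corners present in the point set, and the red line is the vertical line x = line_x. The function name pick_p_and_q and this argument layout are illustrative, not from the lecture.
```python
def pick_p_and_q(points, rect, line_x):
    x_lo, y_lo, x_hi, y_hi = rect
    # Keep only points inside the rectangle, border included.
    inside = [(x, y) for (x, y) in points
              if x_lo <= x <= x_hi and y_lo <= y <= y_hi]
    # p: rightmost column left of the line, then topmost in that column.
    left = [pt for pt in inside if pt[0] < line_x]
    p = max(left, key=lambda pt: (pt[0], pt[1]))
    # q: right of the line and not below p; leftmost column, then bottom-most.
    right = [pt for pt in inside if pt[0] > line_x and pt[1] >= p[1]]
    q = min(right, key=lambda pt: (pt[0], pt[1]))
    return p, q
```
If the point set is arborally satisfied, the lecture's argument says the returned p and q share a y-coordinate; the sketch itself does not check that.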

00:28:57.040 --> 00:28:58.300
Finally, we get step three.

00:29:03.250 --> 00:29:05.290
So the idea with step three--

00:29:05.290 --> 00:29:07.390
now we're going to do
a charging argument.

00:29:07.390 --> 00:29:13.060
We want to say, OK, basically,
for every independent

00:29:13.060 --> 00:29:15.370
rectangle, we want to
find a point that's

00:29:15.370 --> 00:29:19.720
in OPT that was not
in the original input.

00:29:19.720 --> 00:29:22.390
And therefore, then
OPT plus has to be

00:29:22.390 --> 00:29:25.120
at least the size of the
input plus 1 per each

00:29:25.120 --> 00:29:26.350
of these plus rectangles.

00:29:29.510 --> 00:29:32.530
So the idea is the following.

00:29:32.530 --> 00:29:33.730
Because of all this set-up--

00:29:33.730 --> 00:29:39.340
because we made pq
horizontally aligned--

00:29:39.340 --> 00:29:40.930
they're inside the rectangle.

00:29:40.930 --> 00:29:43.030
And furthermore,
they're adjacent

00:29:43.030 --> 00:29:45.040
and they cross
this vertical line.

00:29:45.040 --> 00:29:48.880
And that vertical line is not
crossed by any other rectangle.

00:29:48.880 --> 00:29:51.780
When I say line, I
mean line segment.

00:29:51.780 --> 00:29:54.880
There's no other rectangle
that hits this red thing.

00:29:54.880 --> 00:29:56.470
Therefore, these
two points are not

00:29:56.470 --> 00:30:00.640
going to get charged
as a pair ever again.

00:30:00.640 --> 00:30:05.260
If you remove this rectangle,
repeat this process,

00:30:05.260 --> 00:30:09.070
pq is never going to
get charged again.

00:30:09.070 --> 00:30:10.990
So we charge to pq.

00:30:10.990 --> 00:30:26.230
And the pair is never
charged again; it will never

00:30:26.230 --> 00:30:33.860
be charged by another
rectangle because no rectangle

00:30:33.860 --> 00:30:39.520
hits the red thing.

00:30:46.600 --> 00:30:53.380
So no rectangle
contains the segment pq,

00:30:53.380 --> 00:30:54.620
the horizontal segment pq.

00:30:57.600 --> 00:30:59.130
So this is almost what we want.

00:30:59.130 --> 00:31:01.900
We really want a single point
which is not in the input.

00:31:01.900 --> 00:31:03.360
So we have p and q.

00:31:03.360 --> 00:31:04.680
They're horizontally aligned.

00:31:04.680 --> 00:31:07.080
Now, if they're
horizontally aligned,

00:31:07.080 --> 00:31:12.090
we know that not both of them
are in the original input

00:31:12.090 --> 00:31:14.564
because all y-coordinates
are distinct.

00:31:14.564 --> 00:31:16.230
This is usually true
because you're only

00:31:16.230 --> 00:31:19.800
accessing one point
per row, per time step.

00:31:19.800 --> 00:31:23.550
So one of these might
be in the input,

00:31:23.550 --> 00:31:25.170
but the other one is not.

00:31:25.170 --> 00:31:26.970
So that's the one I
want to hold onto.

00:31:26.970 --> 00:31:32.040
And say, OK, that's a point
added to OPT plus that pays

00:31:32.040 --> 00:31:34.295
for this rectangle.

00:31:34.295 --> 00:31:35.670
It's not quite so
simple, though,

00:31:35.670 --> 00:31:37.230
because we might
have a whole bunch

00:31:37.230 --> 00:31:42.060
of horizontally-aligned things.

00:31:42.060 --> 00:31:44.950
And one rectangle
charges to this one.

00:31:44.950 --> 00:31:46.980
One rectangle
charges to this one.

00:31:46.980 --> 00:31:48.960
One rectangle
charges to this pair.

00:31:51.510 --> 00:31:54.660
That's OK, though, because
here we have four points.

00:31:54.660 --> 00:31:57.180
Again, one of them
could be in the input.

00:31:57.180 --> 00:31:59.160
The other three
have to be added.

00:31:59.160 --> 00:32:01.720
And so you've got three
rectangles, three added points,

00:32:01.720 --> 00:32:03.190
and we're happy.

00:32:03.190 --> 00:32:04.149
Question?

00:32:04.149 --> 00:32:05.940
AUDIENCE: Just to make
the argument formal,

00:32:05.940 --> 00:32:11.667
wouldn't you want to say that
only when you're saying assume

00:32:11.667 --> 00:32:13.916
that x and y are always
distinct-- but then,

00:32:13.916 --> 00:32:16.050
if you have the
same either x or y--

00:32:16.050 --> 00:32:17.490
ERIK DEMAINE: Ah, good point.

00:32:17.490 --> 00:32:21.960
So this is distinct in
the input is what I meant.

00:32:21.960 --> 00:32:23.832
Obviously, in OPT,
any satisfied set

00:32:23.832 --> 00:32:25.290
is not going to
have this property.

00:32:25.290 --> 00:32:26.159
Yeah, good.

00:32:26.159 --> 00:32:28.200
So I want to assume x-
and y-coordinates are only

00:32:28.200 --> 00:32:29.490
distinct in the input.

00:32:29.490 --> 00:32:31.020
OPT will not have that property.

00:32:31.020 --> 00:32:34.422
And that's why p and q can exist
and have the same y-coordinate.

00:32:34.422 --> 00:32:35.130
Another question?

00:32:39.397 --> 00:32:40.938
AUDIENCE: Does this
still [INAUDIBLE]

00:32:40.938 --> 00:32:43.116
the special case where
your two points are

00:32:43.116 --> 00:32:44.932
the points of the rectangle?

00:32:44.932 --> 00:32:45.640
ERIK DEMAINE: OK.

00:32:45.640 --> 00:32:50.560
So the question is can p and q
be the points of the rectangle?

00:32:50.560 --> 00:32:52.180
One of them can be.

00:32:52.180 --> 00:32:55.360
Like, p could be here, and then
another point is over here.

00:32:55.360 --> 00:32:58.480
So then that will be the
segment that you are using,

00:32:58.480 --> 00:33:00.770
between p and q.

00:33:00.770 --> 00:33:03.690
Or it could be q is
here, and p is over here.

00:33:03.690 --> 00:33:04.875
Then that's the segment.

00:33:04.875 --> 00:33:07.000
You can't have them both
equal because p and q have

00:33:07.000 --> 00:33:10.320
to be horizontally aligned
and also because there's got

00:33:10.320 --> 00:33:13.750
to be another point in there.

00:33:13.750 --> 00:33:15.220
Yeah, so that should work.

00:33:15.220 --> 00:33:17.710
You have to check that this
boundary case is still OK.

00:33:17.710 --> 00:33:23.350
But the claim is no other
rectangle touches this red line

00:33:23.350 --> 00:33:24.825
even on the endpoint.

00:33:24.825 --> 00:33:26.200
And therefore, no
other rectangle

00:33:26.200 --> 00:33:28.760
will wholly contain p and q.

00:33:28.760 --> 00:33:32.470
And so that means you're only
charging to this pair once.

00:33:32.470 --> 00:33:34.280
And then this pair
charging is OK

00:33:34.280 --> 00:33:38.142
because, luckily, there's three
edges here, four vertices.

00:33:38.142 --> 00:33:39.850
One of those vertices
we can't charge to,

00:33:39.850 --> 00:33:43.840
so there's exactly the right
number of things for the edges,

00:33:43.840 --> 00:33:46.210
and we're OK.

00:33:46.210 --> 00:33:48.580
Yeah, this can really happen.

00:33:48.580 --> 00:33:54.420
In fact our favorite
example of the pinwheel--

00:33:54.420 --> 00:33:58.903
if instead of doing the greedy
addition, we do this addition--

00:34:03.740 --> 00:34:07.120
these are supposed to
be horizontally aligned.

00:34:07.120 --> 00:34:09.770
A little hard without a grid--

00:34:09.770 --> 00:34:12.350
a graph blackboard
would be great.

00:34:12.350 --> 00:34:14.870
So this is not quite satisfied.

00:34:14.870 --> 00:34:19.969
You've got to add some more
points here or something.

00:34:19.969 --> 00:34:22.380
But it has the feature that--

00:34:22.380 --> 00:34:24.560
here's an independent
set of rectangles.

00:34:24.560 --> 00:34:33.199
I can do this one,
this one, and this one.

00:34:36.030 --> 00:34:38.330
So this is three
independent rectangles.

00:34:38.330 --> 00:34:42.852
As the white points go,
they're independent rectangles.

00:34:42.852 --> 00:34:44.810
The corners are not
strictly inside each other,

00:34:44.810 --> 00:34:46.393
and none of the white
points satisfies

00:34:46.393 --> 00:34:49.590
any of the other rectangles.

00:34:49.590 --> 00:34:53.132
And indeed, if you
applied this argument,

00:34:53.132 --> 00:34:54.590
first you take the
widest rectangle

00:34:54.590 --> 00:34:57.570
and say, OK, here is my
vertical red segment.

00:34:57.570 --> 00:35:00.597
I'm going to charge to these
two guys, this segment,

00:35:00.597 --> 00:35:02.930
and then eventually this guy
will charge to this segment

00:35:02.930 --> 00:35:05.480
and this guy will
charge to this segment.

00:35:05.480 --> 00:35:07.460
And luckily, there
are three added points

00:35:07.460 --> 00:35:11.140
for exactly the three segments
for the three rectangles.

00:35:11.140 --> 00:35:12.390
There had to be another point.

00:35:15.080 --> 00:35:16.140
So that's a lower bound.

00:35:21.453 --> 00:35:22.930
A lot of work--

00:35:22.930 --> 00:35:25.060
but in the end, it
says, look, just

00:35:25.060 --> 00:35:28.930
find an independent set of
plus boxes, plus rectangles.

00:35:28.930 --> 00:35:30.350
That's a lower bound on OPT.

00:35:30.350 --> 00:35:32.620
So now, the question
remains, how

00:35:32.620 --> 00:35:36.340
do we find a good independent
set of plus boxes?

00:35:36.340 --> 00:35:38.500
And now we'll go through
the three different ways

00:35:38.500 --> 00:35:40.230
we know how to do it.

00:35:40.230 --> 00:35:42.580
I'll start actually
with Wilber 2.

00:35:42.580 --> 00:35:45.220
It's called Wilber 2 because
it was in a paper by Wilber,

00:35:45.220 --> 00:35:49.090
and I think he called it lower
bound number 1 and lower bound

00:35:49.090 --> 00:35:50.410
number 2.

00:35:50.410 --> 00:35:53.620
But for pragmatic reasons, I'm
going to start with number 2.

00:36:00.670 --> 00:36:03.460
It's from 1989, so it's
actually an old paper.

00:36:03.460 --> 00:36:06.410
And it was sort of
lost for a long time.

00:36:06.410 --> 00:36:09.540
I don't think Wilber
wrote any other papers.

00:36:09.540 --> 00:36:11.755
It was in SICOMP, a big journal.

00:36:14.372 --> 00:36:18.400
So a few years after
splay trees and then

00:36:18.400 --> 00:36:21.610
sort of rediscovered
in the early 2000s

00:36:21.610 --> 00:36:25.060
and turns out to be really
useful for a lot of theorems.

00:36:25.060 --> 00:36:27.970
So here's the lower bound.

00:36:27.970 --> 00:36:29.590
Again, we're
looking at the input

00:36:29.590 --> 00:36:33.970
point set-- no added points,
just the original points.

00:36:33.970 --> 00:36:37.450
Look at every point, and
look at all the points

00:36:37.450 --> 00:36:40.970
that you can see from
this point downward.

00:36:40.970 --> 00:36:41.830
What does see mean?

00:36:41.830 --> 00:36:46.210
I'm interested in
points below p such that, when

00:36:46.210 --> 00:36:49.910
I draw the rectangle, it
contains no other points.

00:36:49.910 --> 00:36:53.740
So this is sort of
like a lower envelope.

00:36:53.740 --> 00:36:56.770
It's going to look
something like this--

00:36:56.770 --> 00:37:02.180
and maybe some points like this.

00:37:02.180 --> 00:37:05.950
So all of these rectangles
have to be empty.

00:37:23.540 --> 00:37:37.670
So these are the downward
visible points from p.

00:37:41.330 --> 00:37:45.680
And now, among these points, you
can sort them by y-coordinate.

00:37:45.680 --> 00:37:48.590
And I want to see
how many times do

00:37:48.590 --> 00:37:51.630
they cross this vertical line.

00:37:51.630 --> 00:37:54.560
So if I order them
by y-coordinate--

00:37:57.590 --> 00:37:59.780
so I start here, and
maybe I go to here.

00:37:59.780 --> 00:38:03.470
Then the next one is over
here, so that's across.

00:38:03.470 --> 00:38:04.850
Then I go over here.

00:38:04.850 --> 00:38:05.520
Then I cross.

00:38:05.520 --> 00:38:06.710
Then I go here.

00:38:06.710 --> 00:38:08.250
Then I cross.

00:38:08.250 --> 00:38:10.670
Go here, here, cross.

00:38:10.670 --> 00:38:12.350
So if I visit them
in order, I want

00:38:12.350 --> 00:38:15.170
to know how many times do
I cross this vertical line.

00:38:19.320 --> 00:38:23.330
So this is the past of p,
all of the accesses before p.

00:38:23.330 --> 00:38:25.880
Think of this as how
many times you alternate

00:38:25.880 --> 00:38:27.985
between accessing on
the left of the line

00:38:27.985 --> 00:38:29.610
and accessing on the
right of the line.

00:38:33.594 --> 00:38:44.270
So count number of alternations
left or right of p.

00:38:44.270 --> 00:38:46.412
And again, if we assume
that no key is ever

00:38:46.412 --> 00:38:47.870
accessed more than
once, then they

00:38:47.870 --> 00:38:49.910
will always be left or
right, never exactly on.

00:38:54.931 --> 00:38:56.555
And then I want to
sum over all points.

00:38:59.820 --> 00:39:03.950
And I claim this
is a lower bound.
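
NOTE
A rough sketch of the Wilber 2 count as just described, assuming the access sequence is a list of distinct keys (the lecture's no-repeated-key assumption) and that time is the list index. The name wilber2 is hypothetical, and this is a simple quadratic-time version, not an optimized one.
```python
def wilber2(xs):
    total = 0
    for i, x in enumerate(xs):
        # lo / hi: closest keys seen so far on the left / right of x,
        # scanning backward in time from access i.
        lo, hi = float("-inf"), float("inf")
        last_side = None
        for j in range(i - 1, -1, -1):
            k = xs[j]
            if lo < k < x:
                lo, side = k, "L"   # visible from below on the left
            elif x < k < hi:
                hi, side = k, "R"   # visible from below on the right
            else:
                continue            # hidden: its rectangle contains a point
            # Count each switch of side among consecutive visible points.
            if last_side is not None and side != last_side:
                total += 1
            last_side = side
    return total
```
For example, a sorted access sequence never crosses the vertical line, so it scores 0, while [1, 3, 2] scores 1 because the access to 2 sees 3 on its right and then 1 on its left.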

00:39:03.950 --> 00:39:07.550
Why is it a lower bound?

00:39:07.550 --> 00:39:13.580
Essentially, I take each
of these red lines that

00:39:13.580 --> 00:39:20.005
cross the p vertical line
and I turn them into a box.

00:39:20.005 --> 00:39:28.360
So there's one there, one
there, one there, and one there.

00:39:28.360 --> 00:39:33.597
I claim if I do this for
all p, all those boxes

00:39:33.597 --> 00:39:34.430
will be independent.

00:39:34.430 --> 00:39:36.805
All those rectangles will be
independent from each other.

00:39:36.805 --> 00:39:41.810
I won't prove that formally
here, but you can check it.

00:39:41.810 --> 00:39:44.240
So it's obvious for one
p because each of these

00:39:44.240 --> 00:39:46.250
has a different vertical span.

00:39:46.250 --> 00:39:49.270
If you do it for all
p-- all the points p--

00:39:49.270 --> 00:39:51.980
these won't conflict.

00:39:51.980 --> 00:39:55.560
So by the independent
rectangle lower bound,

00:39:55.560 --> 00:40:02.490
this is a lower bound on
OPT up to a factor of 2.

00:40:02.490 --> 00:40:04.950
So what?

00:40:04.950 --> 00:40:07.080
Wilber 2 is quite interesting.

00:40:07.080 --> 00:40:08.520
For a long time,
we've conjectured

00:40:08.520 --> 00:40:11.060
that it is the right answer.

00:40:11.060 --> 00:40:16.620
So conjecture-- I know
it's a weird lower

00:40:16.620 --> 00:40:18.290
bound to even think of.

00:40:18.290 --> 00:40:20.850
It's a very hard paper to read.

00:40:20.850 --> 00:40:22.950
Without the geometric
view, it's even harder

00:40:22.950 --> 00:40:26.800
to imagine the
definition of this bound.

00:40:26.800 --> 00:40:28.400
It's sort of an algorithm.

00:40:28.400 --> 00:40:29.955
It's a way to assign boxes.

00:40:29.955 --> 00:40:31.080
It gives you a lower bound.

00:40:31.080 --> 00:40:32.810
It's a little weird.

00:40:32.810 --> 00:40:34.560
We conjecture that
it's proportional

00:40:34.560 --> 00:40:37.540
to the optimal solution.

00:40:37.540 --> 00:40:38.550
We can't prove it.

00:40:38.550 --> 00:40:40.440
We've tried many times.

00:40:40.440 --> 00:40:47.630
It's a pain to work with,
but it is what it is.

00:40:47.630 --> 00:40:49.340
There's one theorem
that uses it,

00:40:49.340 --> 00:40:51.150
so I want to tell you
about that theorem.

00:40:51.150 --> 00:40:54.200
But I don't want to go
into it in too much detail.

00:40:54.200 --> 00:40:56.660
It's a neat theorem.

00:40:56.660 --> 00:41:02.900
And it's in a paper
by Iacono, 2002.

00:41:02.900 --> 00:41:05.900
And it was the first paper to
revitalize the Wilber stuff.

00:41:05.900 --> 00:41:08.390
So it's like, hey, there's
this Wilber 2 bound.

00:41:08.390 --> 00:41:10.820
We can use it to solve a
new problem, which is called

00:41:10.820 --> 00:41:12.905
key independent optimality.

00:41:23.960 --> 00:41:27.100
Briefly, the idea with
key independent optimality

00:41:27.100 --> 00:41:29.640
is, suppose you've heard
about dynamic optimality.

00:41:29.640 --> 00:41:32.510
You know, it's really cool
because splay trees and whatnot

00:41:32.510 --> 00:41:34.940
seem to really adapt to
whatever your inputs are.

00:41:34.940 --> 00:41:37.250
But suppose your inputs
really don't have keys.

00:41:37.250 --> 00:41:42.140
They're just arbitrary objects
labeled however, just randomly.

00:41:42.140 --> 00:41:44.570
In fact, let's assume that
they're labeled randomly.

00:41:44.570 --> 00:41:46.340
Suppose the keys are
generated randomly

00:41:46.340 --> 00:41:48.797
because they're meaningless
or just arbitrary things.

00:41:48.797 --> 00:41:50.630
So you figure, oh, maybe
I'll make it better

00:41:50.630 --> 00:41:54.560
and just randomize
them completely.

00:41:54.560 --> 00:42:06.380
If keys are random,
then dynamic OPT

00:42:06.380 --> 00:42:09.481
is the same thing up to constant
factors as the working set

00:42:09.481 --> 00:42:09.980
bound.

00:42:15.830 --> 00:42:17.390
That's the theorem.

00:42:17.390 --> 00:42:20.120
So this is cool because it
means splay trees are actually

00:42:20.120 --> 00:42:22.070
optimal in the setting
where keys are random.

00:42:25.580 --> 00:42:30.200
This is in expectation
over the randomized keys.

00:42:30.200 --> 00:42:34.070
And the way this theorem
is proved is basically--

00:42:34.070 --> 00:42:36.890
so what this is saying is,
if we take a point set--

00:42:36.890 --> 00:42:39.290
arbitrary point set--
but then we re-randomize

00:42:39.290 --> 00:42:43.350
the x-coordinates-- leave the
y-coordinates as they are--

00:42:43.350 --> 00:42:48.106
then you can compute
how Wilber 2 behaves.

00:42:48.106 --> 00:42:49.730
Because now you have
a bunch of points,

00:42:49.730 --> 00:42:53.940
and you're randomly
shifting their x-coordinate.

00:42:53.940 --> 00:42:58.070
So it's like if you're
randomly bouncing around an x

00:42:58.070 --> 00:43:00.380
and you're interested
in this envelope

00:43:00.380 --> 00:43:02.720
on the left and the right,
you want to know basically

00:43:02.720 --> 00:43:03.605
how many times--

00:43:08.030 --> 00:43:11.950
I guess since I last
accessed p, which is here.

00:43:11.950 --> 00:43:14.330
We didn't do that here, but
in the working set bound

00:43:14.330 --> 00:43:16.280
that's part of the deal.

00:43:20.960 --> 00:43:23.000
If you look on the
left side, it's

00:43:23.000 --> 00:43:26.060
like how many times
does the max change.

00:43:26.060 --> 00:43:28.520
And you may know, if
you have n random numbers

00:43:28.520 --> 00:43:33.530
and you want to know how many
times the max changes

00:43:33.530 --> 00:43:35.390
as I go left to right,
as I take larger

00:43:35.390 --> 00:43:37.130
and larger prefixes
of those n numbers,

00:43:37.130 --> 00:43:40.610
the answer is log
n in expectation.

00:43:40.610 --> 00:43:43.040
Because the more points you
have, the less and less likely

00:43:43.040 --> 00:43:46.240
it is for the max to change.

00:43:46.240 --> 00:43:51.880
So basically, you show
the expected Wilber

00:43:51.880 --> 00:43:57.200
2 of a point over
this randomization

00:43:57.200 --> 00:44:03.750
is theta log ti, where ti
is the working set bound.

00:44:03.750 --> 00:44:08.260
And so, that gives
you the theorem.

00:44:08.260 --> 00:44:10.664
This gives you a lower bound
of the working set bound.

00:44:10.664 --> 00:44:12.580
We have upper bounds of
the working set bound,

00:44:12.580 --> 00:44:15.190
and therefore that's OPT.

00:44:15.190 --> 00:44:17.911
So that's just a
very quick sketch.

00:44:17.911 --> 00:44:19.660
If you're interested,
check out the paper.

00:44:22.230 --> 00:44:25.865
That's unfortunately all we
know how to do with Wilber 2.

00:44:25.865 --> 00:44:27.490
But there's this
other bound, Wilber 1,

00:44:27.490 --> 00:44:33.030
which seems less good yet we
can do a lot more with it.

00:44:33.030 --> 00:44:35.140
So let me go to that.

00:44:54.470 --> 00:44:57.220
It's a lot easier to analyze
algorithms with respect

00:44:57.220 --> 00:45:00.440
to Wilber 1.

00:45:00.440 --> 00:45:01.230
What's Wilber 1?

00:45:04.080 --> 00:45:09.870
We're going to fix something
called a lower bound tree.

00:45:09.870 --> 00:45:12.660
I'm going to call it p
because it's basically

00:45:12.660 --> 00:45:16.515
going to be a perfect
binary tree on my keys.

00:45:19.200 --> 00:45:20.740
This tree never changes.

00:45:20.740 --> 00:45:22.320
That's why I say fix.

00:45:22.320 --> 00:45:25.250
It is not the binary search
tree you're looking for.

00:45:25.250 --> 00:45:29.040
It is not the binary search
tree that you're interested in.

00:45:29.040 --> 00:45:30.540
It's just a thing
to think about.

00:45:33.930 --> 00:45:41.400
Now, for each node
of that tree--

00:45:41.400 --> 00:45:44.700
let's look at this node,
I'll give the node a name, y.

00:45:47.880 --> 00:45:48.720
So here's y.

00:45:52.000 --> 00:45:53.960
There's the left subtree
of y, and there's

00:45:53.960 --> 00:45:56.360
the right subtree of y.

00:45:56.360 --> 00:45:57.766
These are a bunch of keys.

00:45:57.766 --> 00:45:59.390
There's keys that
are to the left of y.

00:45:59.390 --> 00:46:01.100
There's keys to the right of y.

00:46:01.100 --> 00:46:02.840
There's keys
outside the subtree.

00:46:02.840 --> 00:46:04.880
We're going to ignore those.

00:46:04.880 --> 00:46:08.030
I want to look at the accesses
to these keys and accesses

00:46:08.030 --> 00:46:10.220
to these keys and see
how many times do I

00:46:10.220 --> 00:46:13.040
switch between left and right.

00:46:13.040 --> 00:46:19.730
So count the number
of alternations--

00:46:19.730 --> 00:46:22.887
so very similar in
spirit to Wilber 2,

00:46:22.887 --> 00:46:24.470
it's just relative
to this weird tree,

00:46:24.470 --> 00:46:26.045
which is kind of arbitrary--

00:46:29.090 --> 00:46:34.880
in the access sequence--
which is x1 up to xn--

00:46:34.880 --> 00:46:48.020
between left and
right subtrees of y

00:46:48.020 --> 00:46:50.240
So we're going to ignore
accesses to y itself.

00:46:50.240 --> 00:46:53.240
We're going to ignore
accesses to keys outside the subtree of y.

00:46:53.240 --> 00:46:56.990
Just look at how many times
I switch between left and right.

00:46:56.990 --> 00:46:58.322
That's a lower bound.

00:46:58.322 --> 00:46:59.030
That's the claim.
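
NOTE
A minimal sketch of the Wilber 1 count just defined, assuming keys is the sorted list of all keys and the fixed lower bound tree is built by always splitting at the median (perfect when there are 2^k - 1 keys). The name wilber1 and the recursive layout are assumptions for illustration.
```python
def wilber1(accesses, keys):
    if len(keys) <= 1:
        return 0
    mid = len(keys) // 2          # keys[mid] plays the role of node y
    left, right = set(keys[:mid]), set(keys[mid + 1:])
    # Ignore accesses to y itself and to keys outside y's subtree.
    sides = ["L" if x in left else "R"
             for x in accesses if x in left or x in right]
    # Alternations between the left and right subtrees of y.
    here = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return (here
            + wilber1([x for x in accesses if x in left], keys[:mid])
            + wilber1([x for x in accesses if x in right], keys[mid + 1:]))
```
On keys 0 through 6, the order 0, 4, 2, 6, 1, 5, 3 gives 5 alternations at the root plus 1 in each child, for 7 in total, while accessing 0 through 6 in sorted order gives only 3.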

00:47:09.550 --> 00:47:11.200
It's a lower bound
for the same reason

00:47:11.200 --> 00:47:14.800
we use the independent
rectangle lower bound.

00:47:14.800 --> 00:47:17.710
And the claim is, if you
look at these alternations,

00:47:17.710 --> 00:47:20.370
draw the corresponding
rectangles--

00:47:26.170 --> 00:47:27.820
so over here, we
had a vertical line

00:47:27.820 --> 00:47:30.820
which corresponded to the key,
and we see how many times do we

00:47:30.820 --> 00:47:32.020
cross the line.

00:47:32.020 --> 00:47:38.874
Basically, the same thing
over here except now

00:47:38.874 --> 00:47:41.290
there's one big vertical line
that corresponds to the root

00:47:41.290 --> 00:47:43.330
node, then there's some
vertical lines that

00:47:43.330 --> 00:47:45.670
correspond to this
node and this node,

00:47:45.670 --> 00:47:48.350
and you're interested
in the access sequence.

00:47:48.350 --> 00:47:51.591
How many times-- let's do some
kind of access sequence like

00:47:51.591 --> 00:47:52.090
this--

00:47:55.890 --> 00:47:56.920
these are our points--

00:48:02.590 --> 00:48:06.220
and you just look at what
lines are you crossing.

00:48:08.830 --> 00:48:11.127
So like this crosses
the big line.

00:48:11.127 --> 00:48:13.210
So that's going to be one
alternation between left

00:48:13.210 --> 00:48:14.157
and right here.

00:48:14.157 --> 00:48:16.240
Here's another alternation
between left and right.

00:48:16.240 --> 00:48:18.820
Here is another alternation
between left and right.

00:48:18.820 --> 00:48:22.750
Here's another alternation
between left and right.

00:48:22.750 --> 00:48:24.880
And one more.

00:48:24.880 --> 00:48:27.280
So for the big line,
for the root node,

00:48:27.280 --> 00:48:29.530
that's how many times you
cross between left and right

00:48:29.530 --> 00:48:31.750
relative to the root.

00:48:31.750 --> 00:48:34.140
Then, for the left
subtree of the root,

00:48:34.140 --> 00:48:36.120
there's one crossing here.

00:48:36.120 --> 00:48:40.060
There is one crossing
here, one crossing here.

00:48:42.940 --> 00:48:45.330
These are touching, but
they're not satisfied.

00:48:45.330 --> 00:48:46.200
So it's OK.

00:48:46.200 --> 00:48:48.540
The claim is all these
rectangles will be independent.

00:48:48.540 --> 00:48:52.010
Again, I won't prove that
formally, but it's true.

00:48:55.010 --> 00:48:55.940
OK?

00:48:55.940 --> 00:48:58.790
Rough sketch.

00:48:58.790 --> 00:49:00.250
So that's Wilber 1.

00:49:00.250 --> 00:49:02.600
It's, again, an independent
rectangle lower bound.

00:49:02.600 --> 00:49:04.944
It's a little weird because
it depends on this tree.

00:49:04.944 --> 00:49:06.860
You could choose it to
be a nice perfect tree.

00:49:06.860 --> 00:49:07.970
You could choose it to
be a different tree.

00:49:07.970 --> 00:49:10.344
You'll get a different
lower bound each time.

00:49:10.344 --> 00:49:12.260
So of course, you take
the max over all trees.

00:49:12.260 --> 00:49:16.100
That will give you the
biggest Wilber 1 lower bound.

00:49:16.100 --> 00:49:21.140
We don't know much about that
biggest Wilber 1 lower bound.

00:49:21.140 --> 00:49:26.960
I guess you could ask the
following open question.

00:49:26.960 --> 00:49:30.380
Is it true that for
every access sequence

00:49:30.380 --> 00:49:37.700
there exists a tree p such
that Wilber 1 is theta OPT?

00:49:37.700 --> 00:49:39.920
Or is theta Wilber
2 or something?

00:49:39.920 --> 00:49:41.940
Wilber 2 is a single quantity.

00:49:41.940 --> 00:49:43.005
You compute it.

00:49:43.005 --> 00:49:44.420
It gives you a bound.

00:49:44.420 --> 00:49:46.790
Wilber 1, it depends on this p.

00:49:46.790 --> 00:49:48.950
Maybe if you choose the
best p for your sequence

00:49:48.950 --> 00:49:50.190
you get the right answer.

00:49:50.190 --> 00:49:55.550
But it's definitely the case
that Wilber 1 for a fixed p

00:49:55.550 --> 00:49:58.850
is not the right answer.

00:49:58.850 --> 00:50:00.740
I recall that's easy to prove.

00:50:06.580 --> 00:50:08.690
Well, maybe we'll
come back to that.

00:50:08.690 --> 00:50:10.017
Yeah, question?

00:50:10.017 --> 00:50:11.686
AUDIENCE: So how
do you construct

00:50:11.686 --> 00:50:12.879
this lower bound tree?

00:50:12.879 --> 00:50:13.932
Like, is it just--

00:50:13.932 --> 00:50:16.140
ERIK DEMAINE: I'll tell you
what we're going to use--

00:50:16.140 --> 00:50:17.884
the question is how
do we construct p.

00:50:17.884 --> 00:50:19.300
You can make it
whatever you want.

00:50:19.300 --> 00:50:21.870
What we're going to use
is the perfect tree,

00:50:21.870 --> 00:50:23.730
which is sort of unique.

00:50:23.730 --> 00:50:27.076
It's kind of arbitrary,
but it works.

00:50:27.076 --> 00:50:28.950
It has the property that
its height is log n.

00:50:28.950 --> 00:50:30.300
That's all we need.

00:50:30.300 --> 00:50:32.430
We're going to use that
to get tango trees.

00:50:32.430 --> 00:50:34.500
Other questions?

00:50:34.500 --> 00:50:36.600
All right.

00:50:36.600 --> 00:50:39.270
Let me briefly mention
a fun access sequence.

00:50:45.315 --> 00:50:48.990
You may recognize this sequence.

00:50:48.990 --> 00:50:52.080
This would be in-order
traversal in binary.

00:50:52.080 --> 00:50:53.760
But if I take
these bit sequences

00:50:53.760 --> 00:51:09.311
and read them backwards, then
I get 0, 4, 2, 6, 1, 5, 3, 7.

00:51:09.311 --> 00:51:11.310
These are the numbers 0
through 7 in a funny order.

00:51:11.310 --> 00:51:13.890
It's called the bit
reversal sequence.

00:51:13.890 --> 00:51:24.000
If you access 0, 4, 2, 6, 1, 5,
3, 7 in a perfect binary tree,

00:51:24.000 --> 00:51:26.410
it maximizes Wilber 1.

00:51:26.410 --> 00:51:36.960
So in-order traversal--
0, 1, 2, 3, 4, 5, 6.

00:51:36.960 --> 00:51:38.010
Ignore 7.

00:51:38.010 --> 00:51:39.647
There's no 7 in this tree.

00:51:43.135 --> 00:51:46.050
I do 0, 4--

00:51:46.050 --> 00:51:48.640
if you look at the
root, alternate left,

00:51:48.640 --> 00:51:53.820
right, left, right, left, right.

00:51:53.820 --> 00:51:56.500
Because the high-order bit
is switching every time,

00:51:56.500 --> 00:51:58.614
and so whether I go to
the left of the tree here

00:51:58.614 --> 00:52:00.780
or the right of the tree,
it's switching every time.

00:52:00.780 --> 00:52:02.321
And also, if you
look in any subtree,

00:52:02.321 --> 00:52:04.710
like when I'm accessing things
within the subtree of one,

00:52:04.710 --> 00:52:06.530
it alternates 0, 2.

00:52:06.530 --> 00:52:09.030
It's too small a tree to
really see that happening,

00:52:09.030 --> 00:52:10.600
but it's true.

00:52:10.600 --> 00:52:15.380
And so, if you do
this for k bits,

00:52:15.380 --> 00:52:18.060
n equals 2 to the k roughly.

00:52:18.060 --> 00:52:21.680
And Wilber 1, the
lower bound, is

00:52:21.680 --> 00:52:27.210
log n per access, because
every access alternates.

00:52:27.210 --> 00:52:29.622
So if you look at
a subtree, whatever

00:52:29.622 --> 00:52:31.080
the size of that
subtree is, that's

00:52:31.080 --> 00:52:33.060
how many alternations there are.

00:52:33.060 --> 00:52:38.070
And so, number of
alternations is theta n log n

00:52:38.070 --> 00:52:41.780
because it's the sum over all
nodes of their subtree sizes.

00:52:41.780 --> 00:52:48.300
And so OPT is theta n log n.

00:52:48.300 --> 00:52:50.420
We know we can achieve n log n--

00:52:50.420 --> 00:52:53.520
this is to do n accesses--

00:52:53.520 --> 00:52:55.899
we know we can do n log n with
a red-black tree or whatever,

00:52:55.899 --> 00:52:57.690
but there's actually
a lower bound of n log

00:52:57.690 --> 00:52:59.730
n, meaning all
binary search trees--

00:52:59.730 --> 00:53:01.590
if you're given this
access sequence,

00:53:01.590 --> 00:53:04.080
doesn't matter what you're
doing-- you have to pay

00:53:04.080 --> 00:53:04.770
n log n.
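
NOTE
A minimal Python sketch of the bit-reversal sequence and the Wilber 1 alternation count, not Wilber's own formulation. Assumptions: the reference tree is a balanced BST on keys lo..hi built by midpoint splitting, an access to a node itself counts as "left", and the names bit_reversal and wilber1 are hypothetical.
```python
def bit_reversal(k):
    # Write each index in k bits, reverse the bit string, read it back as an integer.
    return [int(format(i, "0{}b".format(k))[::-1], 2) for i in range(2 ** k)]
def wilber1(accesses, lo, hi):
    # Sum, over every node of the reference tree on keys lo..hi, of the number
    # of left/right alternations among the accesses that fall in its subtree.
    if lo >= hi:
        return 0  # empty range or a single leaf: nothing can alternate
    root = (lo + hi) // 2
    # Assumption: an access to the root itself is labeled "L".
    sides = ["L" if x <= root else "R" for x in accesses if lo <= x <= hi]
    here = sum(1 for a, b in zip(sides, sides[1:]) if a != b)
    return here + wilber1(accesses, lo, root - 1) + wilber1(accesses, root + 1, hi)
print(bit_reversal(3))
print(wilber1([0, 4, 2, 6, 1, 5, 3], 0, 6), wilber1(list(range(7)), 0, 6))
```
On keys 0..6 (dropping 7, as in the lecture's tree) the bit-reversal order scores 10 alternations under these assumptions, against 3 for sorted order.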

00:53:04.770 --> 00:53:06.680
It's kind of cool.

00:53:06.680 --> 00:53:07.990
A little side effect--

00:53:07.990 --> 00:53:09.840
that's how Wilber's paper ended.

00:53:09.840 --> 00:53:13.350
It's like, hey, cool, we can
find one access sequence that

00:53:13.350 --> 00:53:17.340
is bad for everybody.

00:53:17.340 --> 00:53:20.480
But now we're going
to use Wilber 1

00:53:20.480 --> 00:53:22.890
to get one binary search
tree that's pretty

00:53:22.890 --> 00:53:25.770
good for all access sequences.

00:53:25.770 --> 00:53:28.590
Pretty good meaning within a
log log n factor of optimal.

00:53:51.280 --> 00:53:59.152
And this is tango
trees, which would

00:53:59.152 --> 00:54:04.400
be log log n competitive
online binary search trees.

00:54:09.380 --> 00:54:11.270
Why are they called tango trees?

00:54:11.270 --> 00:54:14.330
People made up all sorts of
reasons, but I can tell you--

00:54:14.330 --> 00:54:16.010
because I was there--

00:54:16.010 --> 00:54:20.810
they were invented mostly
on a flight from New York

00:54:20.810 --> 00:54:24.650
to Buenos Aires, which
is the center of tango.

00:54:24.650 --> 00:54:26.900
I bought this T-shirt
I think the day after.

00:54:26.900 --> 00:54:29.150
And then that week,
we wrote the paper,

00:54:29.150 --> 00:54:30.610
and that was tango trees.

00:54:30.610 --> 00:54:35.630
So no particular reason,
but it sounds good.

00:54:35.630 --> 00:54:37.130
Always good to have a cool name.

00:54:37.130 --> 00:54:39.095
So the secret is revealed.

00:54:39.095 --> 00:54:46.100
The true meaning of tango trees
is nothing, but we'll see.

00:54:46.100 --> 00:54:48.630
So how do they work?

00:54:48.630 --> 00:54:49.640
It's very simple.

00:54:49.640 --> 00:54:55.040
Basically, we take Wilber
1 and we simulate it.

00:54:55.040 --> 00:54:59.310
So let me be more precise.

00:54:59.310 --> 00:55:05.870
There's one key idea, which
is to look at the preferred

00:55:05.870 --> 00:55:10.510
child of a node.

00:55:15.870 --> 00:55:19.242
I'm going to say the
preferred child is left.

00:55:19.242 --> 00:55:23.818
Let's see, node y in p.

00:55:23.818 --> 00:55:38.060
It's left if we accessed some
node in the left subtree of y

00:55:38.060 --> 00:55:38.675
most recently.

00:55:43.300 --> 00:55:46.760
It's the right
child if we accessed

00:55:46.760 --> 00:55:48.960
something in the right
subtree most recently.

00:55:48.960 --> 00:55:52.850
So we're just looking at left
and right subtree accesses,

00:55:52.850 --> 00:55:54.240
what was most recent?

00:55:54.240 --> 00:55:56.390
There is a special case
in the beginning, which

00:55:56.390 --> 00:55:58.681
is you don't have a preferred
child because you haven't

00:55:58.681 --> 00:56:00.710
accessed either
left or right yet.

00:56:00.710 --> 00:56:12.365
So this is if no access
to the left or right yet.

00:56:12.365 --> 00:56:14.340
So that just happens
in the beginning.

00:56:14.340 --> 00:56:17.180
Once you've touched
everything, everybody

00:56:17.180 --> 00:56:19.970
will have a left or
right preferred child.

00:56:19.970 --> 00:56:23.780
So this is just what was
your most recent child.

00:56:23.780 --> 00:56:26.580
This is like a parent
with a very short memory.

00:56:26.580 --> 00:56:31.400
Just whichever child I
most recently talked to,

00:56:31.400 --> 00:56:34.292
that is my preferred
child at the moment.

00:56:34.292 --> 00:56:35.750
It's kind of like,
I don't know, when

00:56:35.750 --> 00:56:38.575
you're going to job interviews.

00:56:38.575 --> 00:56:40.160
You know, the most
recent interview

00:56:40.160 --> 00:56:42.702
is the one you remember
most fondly and so, ah,

00:56:42.702 --> 00:56:44.660
you like that one the
best independent of which

00:56:44.660 --> 00:56:45.570
is the coolest.

00:56:45.570 --> 00:56:48.620
So let me draw a picture.

00:56:48.620 --> 00:56:52.520
And I guess I'm going
to draw a big picture--

00:56:52.520 --> 00:57:00.121
my favorite-- a perfectly
balanced binary search tree

00:57:00.121 --> 00:57:02.810
with eight leaves.

00:57:02.810 --> 00:57:10.840
And so now, suppose that every
node has a preferred child.

00:57:10.840 --> 00:57:12.710
Let's say they all do
just because it makes

00:57:12.710 --> 00:57:13.834
a more interesting picture.

00:57:17.240 --> 00:57:20.780
I'm going to draw that
with a big fat arrow.

00:57:20.780 --> 00:57:23.390
And now, what does that do?

00:57:23.390 --> 00:57:24.950
It decomposes our tree.

00:57:24.950 --> 00:57:27.860
This is the perfect tree. p is
going to be perfectly balanced,

00:57:27.860 --> 00:57:28.700
log n height.

00:57:28.700 --> 00:57:30.290
It could be any
log n height tree,

00:57:30.290 --> 00:57:32.990
but we'll make it perfect.

00:57:32.990 --> 00:57:38.840
And it decomposes
that tree into paths.

00:57:38.840 --> 00:57:40.120
And there's a path here.

00:57:40.120 --> 00:57:43.880
You just keep following parent
pointers, you get a path--

00:57:43.880 --> 00:57:45.680
not parent pointers,
preferred pointers.

00:57:45.680 --> 00:57:47.600
It's also true if you follow
parent pointers you get a path,

00:57:47.600 --> 00:57:49.160
but they'll overlap each other.

00:57:49.160 --> 00:57:50.660
You follow preferred
child pointers,

00:57:50.660 --> 00:57:52.050
you get non-overlapping paths.

00:57:54.515 --> 00:57:55.140
There they are.

00:57:55.140 --> 00:57:58.950
We also get these singleton
paths at the leaves.

00:57:58.950 --> 00:58:01.170
Some of the leaves are
in singleton paths.

00:58:01.170 --> 00:58:02.685
These are called
preferred paths.

00:58:12.820 --> 00:58:15.635
Why do I care?

00:58:15.635 --> 00:58:19.860
So this tells me the most
recently accessed element

00:58:19.860 --> 00:58:21.940
was somebody on this path.

00:58:21.940 --> 00:58:22.911
I don't quite know who.

00:58:22.911 --> 00:58:24.910
It could have been this
one, and that would say,

00:58:24.910 --> 00:58:26.410
OK, this is the most
recent direction

00:58:26.410 --> 00:58:27.110
we went through all of them.

00:58:27.110 --> 00:58:28.270
Let's say it's that one.

00:58:28.270 --> 00:58:30.280
Now suppose I access this node.

00:58:30.280 --> 00:58:32.410
What does that tell me?

00:58:32.410 --> 00:58:35.020
Well, if I most recently
accessed left here

00:58:35.020 --> 00:58:38.650
and now I'm accessing the
right, if you look at this node,

00:58:38.650 --> 00:58:40.840
the Wilber 1 bound goes up by 1.

00:58:40.840 --> 00:58:42.400
Because I just accessed left.

00:58:42.400 --> 00:58:44.170
Now I accessed right.

00:58:44.170 --> 00:58:49.120
Also, if I access this node,
this guy, his Wilber 1 bound

00:58:49.120 --> 00:58:51.430
goes up by 1 because now
he's going to the right,

00:58:51.430 --> 00:58:53.200
whereas last time
he went to the left.

00:58:53.200 --> 00:58:56.170
Also, this node previously
went to the right

00:58:56.170 --> 00:58:57.580
and went to the left.

00:58:57.580 --> 00:59:02.200
So Wilber 1 went up
because of this edge,

00:59:02.200 --> 00:59:03.920
and it went up
because of this edge.

00:59:03.920 --> 00:59:07.240
In general, following
non-preferred edges,

00:59:07.240 --> 00:59:10.240
I can pay for because
Wilber 1 goes up by 1

00:59:10.240 --> 00:59:12.190
every time I use a
non-preferred edge.

00:59:12.190 --> 00:59:15.280
This is another way to
state the Wilber 1 bound.
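
NOTE
A rough Python sketch of the preferred-child bookkeeping described above, counting how often an access walks down a non-preferred edge. Assumptions: the reference tree p is the midpoint-split balanced BST on keys lo..hi, an access to a node itself sets its preferred child to left, and count_preferred_changes is a hypothetical name. The first access through a node sets its preferred child without counting as a change.
```python
def count_preferred_changes(accesses, lo, hi):
    preferred = {}  # node key to "L" or "R"; a missing key means no preferred child yet
    changes = 0
    for x in accesses:
        a, b = lo, hi
        while a <= b:
            y = (a + b) // 2  # current node on the root-to-x path in p
            side = "L" if x <= y else "R"  # assumption: accessing y itself counts as left
            if y in preferred and preferred[y] != side:
                changes += 1  # we are leaving y along a non-preferred edge
            preferred[y] = side
            if x == y:
                break
            if x < y:
                b = y - 1
            else:
                a = y + 1
    return changes
print(count_preferred_changes([0, 4, 2, 6, 1, 5, 3], 0, 6))
```
Each counted change is one left/right alternation at some node of p, which is the quantity Wilber 1 sums.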

00:59:15.280 --> 00:59:17.320
This is the cool thing.

00:59:17.320 --> 00:59:20.350
As long as I can go
through a path quickly--

00:59:23.419 --> 00:59:25.210
ideally, if I could do
it in constant time,

00:59:25.210 --> 00:59:27.460
this would be a
dynamically-optimal binary

00:59:27.460 --> 00:59:27.990
search tree.

00:59:27.990 --> 00:59:30.573
If I could instantly transport
to where I need to go on a path

00:59:30.573 --> 00:59:32.680
and then jump off the
path to the next path,

00:59:32.680 --> 00:59:35.900
that I can pay for--

00:59:35.900 --> 00:59:38.590
I can spend constant
time to do that--

00:59:38.590 --> 00:59:40.125
then I'd be OK.

00:59:40.125 --> 00:59:42.250
I'm not going to be able
to do it in constant time,

00:59:42.250 --> 00:59:44.935
but I'm going to be able
to do it in log log n time.

00:59:44.935 --> 00:59:47.710
I'm going to be able to jump
through a path in log log n

00:59:47.710 --> 00:59:49.270
time, and then jump--

00:59:49.270 --> 00:59:51.730
figure out where I need
to diverge from the path

00:59:51.730 --> 00:59:53.364
because maybe I'm
accessing this guy.

00:59:53.364 --> 00:59:54.280
Jump to the next path.

00:59:54.280 --> 00:59:56.099
Do that in log log n time.

00:59:56.099 --> 00:59:57.640
I've got to update
the path structure

00:59:57.640 --> 00:59:59.620
because now the preferred
child is to the right.

00:59:59.620 --> 01:00:00.703
It used to be to the left.

01:00:00.703 --> 01:00:05.886
So I've got to do something that
will only cost log log n time.

01:00:05.886 --> 01:00:08.260
If I can do that, the lower
bound is the number of non-preferred edges.

01:00:08.260 --> 01:00:11.390
The upper bound is the
number of non-preferred edges

01:00:11.390 --> 01:00:13.250
times log log n.

01:00:13.250 --> 01:00:21.190
So we get a lower
bound Wilber 1,

01:00:21.190 --> 01:00:22.930
which is going to be
equal to the number

01:00:22.930 --> 01:00:25.480
of non-preferred edges.

01:00:29.099 --> 01:00:30.640
And we're going to
get an upper bound

01:00:30.640 --> 01:00:35.950
through tango trees,
which is going

01:00:35.950 --> 01:00:39.430
to be order number of
non-preferred edges times

01:00:39.430 --> 01:00:42.080
log log n.

01:00:42.080 --> 01:00:42.580
OK.

01:00:42.580 --> 01:00:44.230
Why is it log log n?

01:00:44.230 --> 01:00:47.710
Because each of these paths
has length only log n.

01:00:47.710 --> 01:00:50.500
So put them in a balanced
binary search tree,

01:00:50.500 --> 01:00:53.350
and it has height log log n.

01:00:53.350 --> 01:00:57.220
So take these paths,
squish them into a tree--

01:00:57.220 --> 01:01:01.430
it's hard, I don't know
which way you're squishing.

01:01:01.430 --> 01:01:02.830
It has log n depth.

01:01:02.830 --> 01:01:03.990
It's a path.

01:01:03.990 --> 01:01:05.680
I'm going to fold
it into a tree.

01:01:05.680 --> 01:01:07.450
So it has height only log log n.

01:01:07.450 --> 01:01:09.886
Then I can jump around
it in log log n time.

01:01:09.886 --> 01:01:11.260
That's the idea
with tango trees.

01:01:11.260 --> 01:01:12.345
You're basically done.

01:01:12.345 --> 01:01:15.550
A few details in how they work.

01:01:15.550 --> 01:01:17.510
I don't want to spend
too much time on them,

01:01:17.510 --> 01:01:18.926
but let's go through
some of them.

01:01:42.410 --> 01:01:53.930
So we're going to store
each preferred path

01:01:53.930 --> 01:01:59.840
as an auxiliary
tree, which is just--

01:01:59.840 --> 01:02:02.650
I don't know-- a
red-black tree, say.

01:02:07.800 --> 01:02:10.200
What is the red-black
tree sorted by?

01:02:10.200 --> 01:02:11.710
I don't have a choice.

01:02:11.710 --> 01:02:13.500
Whatever I do has to
be a binary search

01:02:13.500 --> 01:02:15.610
tree among the original keys.

01:02:15.610 --> 01:02:18.180
So if I take these items
and I just throw them

01:02:18.180 --> 01:02:21.854
into a red-black tree, they
will be sorted by whatever

01:02:21.854 --> 01:02:22.770
their x-coordinate is.

01:02:22.770 --> 01:02:24.910
So this is the max,
this is the min.

01:02:24.910 --> 01:02:26.480
This is somewhere in between.

01:02:26.480 --> 01:02:28.320
This is to the left of that.

01:02:28.320 --> 01:02:29.640
So the order is a little weird.

01:02:29.640 --> 01:02:31.980
I'd really like to store
them sorted by depth,

01:02:31.980 --> 01:02:33.390
but I can't do that.

01:02:33.390 --> 01:02:34.965
They are sorted by
their key values.

01:02:38.430 --> 01:02:42.740
Now, what do I need to do
with these auxiliary trees?

01:02:42.740 --> 01:02:46.110
I mean, the basic thing
I do is a search, right?

01:02:46.110 --> 01:02:47.330
I'm searching for a key.

01:02:47.330 --> 01:02:49.580
It's a binary search tree,
so I can still do a search.

01:02:49.580 --> 01:02:52.740
I can figure out this
tree gets represented

01:02:52.740 --> 01:02:56.040
as something more like this.

01:02:56.040 --> 01:02:59.390
That would be a nicely balanced
version of these four nodes.

01:02:59.390 --> 01:03:05.250
So if I called them, I
don't know, a, b, c, d.

01:03:05.250 --> 01:03:06.900
That's their sorted order.

01:03:06.900 --> 01:03:10.290
It's going to be a, b, c, d.

01:03:10.290 --> 01:03:12.502
That's also their
sorted order over here.

01:03:12.502 --> 01:03:14.460
So if I search for my
key, I'll figure out, oh,

01:03:14.460 --> 01:03:18.090
do I fall off here, here,
here, here, or here?

01:03:18.090 --> 01:03:20.400
Now, each of those
corresponds to another path

01:03:20.400 --> 01:03:21.420
I need to visit.

01:03:21.420 --> 01:03:23.730
So if I fall off
the left side of a,

01:03:23.730 --> 01:03:26.940
then I should have a
pointer to this structure.

01:03:26.940 --> 01:03:29.486
If I fall off the--

01:03:29.486 --> 01:03:32.640
I guess these two are empty.

01:03:32.640 --> 01:03:35.820
Those correspond to
these two places.

01:03:35.820 --> 01:03:40.920
If I fall off here, the right
side of c, which is now here,

01:03:40.920 --> 01:03:45.240
this is going to be a
pointer to my new structure

01:03:45.240 --> 01:03:48.570
which corresponds to this one.

01:03:48.570 --> 01:03:51.652
And then this one is going to
correspond to all this stuff--

01:03:51.652 --> 01:03:52.860
well, in particular this one.

01:03:55.960 --> 01:03:59.970
It's a little hard to draw this
picture, but you get the idea.

01:03:59.970 --> 01:04:02.220
You just rebalance
each of these things.

01:04:02.220 --> 01:04:05.160
Keep the pointers
between the preferred paths

01:04:05.160 --> 01:04:06.774
just as they were.

01:04:06.774 --> 01:04:08.940
This is uniquely defined
how to do this because it's

01:04:08.940 --> 01:04:11.340
a binary search tree.

01:04:11.340 --> 01:04:20.450
So leaves point to other--

01:04:20.450 --> 01:04:24.775
let's call them child
auxiliary trees.

01:04:24.775 --> 01:04:26.640
It uniquely defines
which ones they

01:04:26.640 --> 01:04:29.160
have to point to in
order to still navigate

01:04:29.160 --> 01:04:30.760
the whole structure.

01:04:30.760 --> 01:04:33.810
So it's a weird way of
rebalancing your tree.

01:04:33.810 --> 01:04:36.360
And the point is each of these
red-black trees has height log

01:04:36.360 --> 01:04:39.750
log n because the number of
nodes in it is only log n.

01:04:39.750 --> 01:04:41.626
And that gives us the bound.

01:04:55.620 --> 01:05:03.330
Now, key thing to think about is
what happens when you change--

01:05:03.330 --> 01:05:05.010
I said I have to
be able to achieve

01:05:05.010 --> 01:05:07.990
number of non-preferred
edges times log log n.

01:05:07.990 --> 01:05:08.490
So fine.

01:05:08.490 --> 01:05:10.920
I do a log log n search in here.

01:05:10.920 --> 01:05:12.850
Maybe I decide I
have to go off here.

01:05:12.850 --> 01:05:14.589
Then I do a log log
n search in here.

01:05:14.589 --> 01:05:16.130
And then maybe I
have to go this way.

01:05:16.130 --> 01:05:18.360
So number of
non-preferred edges was 2.

01:05:18.360 --> 01:05:20.440
I did two, maybe three searches.

01:05:20.440 --> 01:05:21.880
Fine.

01:05:21.880 --> 01:05:23.910
It's going to be number
of non-preferred edges

01:05:23.910 --> 01:05:25.410
plus 1 times log log n.

01:05:25.410 --> 01:05:25.980
No big deal.

01:05:33.600 --> 01:05:35.120
Now I have to update.

01:05:35.120 --> 01:05:37.710
Now this is the preferred
edge from the root,

01:05:37.710 --> 01:05:41.070
and this is the preferred
edge from this node.

01:05:41.070 --> 01:05:43.930
How do I update preferred edges?

01:05:43.930 --> 01:05:45.400
That's something to think about.

01:05:45.400 --> 01:05:49.480
So I've got a path represented
by a red-black tree.

01:05:49.480 --> 01:05:54.360
And now I fall off here, and
there's another path here.

01:05:54.360 --> 01:06:00.960
I need to convert this into
a path that goes like this

01:06:00.960 --> 01:06:02.490
and then does this.

01:06:02.490 --> 01:06:05.040
And separately, a
path that does this.

01:06:05.040 --> 01:06:07.110
That's the new version.

01:06:07.110 --> 01:06:08.120
How do I do that?

01:06:08.120 --> 01:06:11.170
Conceptually, it's
pretty simple.

01:06:11.170 --> 01:06:18.630
I want to cut the path here
and then rejoin along there,

01:06:18.630 --> 01:06:20.490
like that.

01:06:20.490 --> 01:06:23.700
So conceptually, if things
were stored by depth,

01:06:23.700 --> 01:06:26.060
this is what we'd call a
split and a concatenate.

01:06:26.060 --> 01:06:28.680
You should know this from
regular binary search trees.

01:06:28.680 --> 01:06:31.530
This is a standard exercise
for red-black trees.

01:06:31.530 --> 01:06:35.670
Given a query, x, you can
cut this tree into two halves

01:06:35.670 --> 01:06:39.000
and get two red-black trees,
which represent everything

01:06:39.000 --> 01:06:42.960
to the left of x and
everything to the right of x.

01:06:42.960 --> 01:06:44.499
Similarly, given
two trees that are

01:06:44.499 --> 01:06:46.290
sorted like this where
all the elements are

01:06:46.290 --> 01:06:48.112
less than all the
elements over here,

01:06:48.112 --> 01:06:50.070
I can concatenate them
into one red-black tree.

01:06:50.070 --> 01:06:51.809
And all of these
take log n time,

01:06:51.809 --> 01:06:53.100
where n is the number of nodes.

01:06:53.100 --> 01:06:56.790
Here, that would
be log log n time.

01:06:56.790 --> 01:06:58.620
In this world, it's
not quite so simple

01:06:58.620 --> 01:07:00.600
because things are
not sorted by depth.

01:07:00.600 --> 01:07:02.670
They're sorted by key value.

01:07:02.670 --> 01:07:04.860
But it's not so bad.

01:07:04.860 --> 01:07:12.730
Because, if you look at some
path and you want to say,

01:07:12.730 --> 01:07:21.150
OK, I want everything
that's below this key value

01:07:21.150 --> 01:07:24.300
or something, then that's
the same as saying,

01:07:24.300 --> 01:07:27.790
well, take everything that is
within this interval of keys.

01:07:27.790 --> 01:07:29.860
So it's strictly
between here and here.

01:07:32.900 --> 01:07:34.460
Let me redraw this slightly.

01:07:45.020 --> 01:07:54.312
So if you look at the nodes
of depth greater than d,

01:07:54.312 --> 01:07:56.350
I want to cut off
everybody that's

01:07:56.350 --> 01:07:58.120
deeper than a
particular spot in order

01:07:58.120 --> 01:08:01.760
to do this kind of change.

01:08:01.760 --> 01:08:12.970
These are equal to nodes
in subtree of that.

01:08:12.970 --> 01:08:16.850
So let me give it a name.

01:08:16.850 --> 01:08:18.830
Let's say I want to cut here.

01:08:18.830 --> 01:08:21.580
So I'm going to
look at this node y.

01:08:21.580 --> 01:08:24.580
This is nodes in
the subtree of y.

01:08:24.580 --> 01:08:26.290
All of the nodes
that are below y

01:08:26.290 --> 01:08:30.790
are obviously going to have
greater depth than d.

01:08:30.790 --> 01:08:31.960
This is nodes in a path.

01:08:35.229 --> 01:08:41.920
And nodes in a subtree
are equal to nodes

01:08:41.920 --> 01:08:50.109
with keys in the
min of that subtree

01:08:50.109 --> 01:08:51.600
to the max of that tree.

01:08:51.600 --> 01:08:53.899
It's an interval.

01:08:53.899 --> 01:08:55.090
So what do I do?

01:08:55.090 --> 01:08:56.590
I split at min of y.

01:08:56.590 --> 01:08:58.660
I split at max of y.

01:08:58.660 --> 01:09:00.010
That gives me the interval.

01:09:00.010 --> 01:09:01.210
So here's the picture.

01:09:01.210 --> 01:09:02.290
I have a tree.

01:09:02.290 --> 01:09:04.229
I want to cut out this
interval of nodes.

01:09:04.229 --> 01:09:07.300
This is like range
queries kind of in 1D.

01:09:07.300 --> 01:09:08.229
So I split here.

01:09:08.229 --> 01:09:09.040
I split here.

01:09:09.040 --> 01:09:10.779
What I will have
are the things I

01:09:10.779 --> 01:09:13.160
care about, the things
to the left of it

01:09:13.160 --> 01:09:15.040
and the things to
the right of it.

01:09:15.040 --> 01:09:17.170
What I wanted was this
and everything else.

01:09:17.170 --> 01:09:18.160
How do I do that?

01:09:18.160 --> 01:09:24.010
I concatenate-- this is y.

01:09:24.010 --> 01:09:32.380
This is in the interval
min of y to max of y.

01:09:32.380 --> 01:09:33.348
So I wanted those guys.

01:09:33.348 --> 01:09:35.139
Those are the nodes
that are deeper than d.

01:09:35.139 --> 01:09:37.240
I also want the nodes
all together that

01:09:37.240 --> 01:09:38.870
are less deep than d.

01:09:38.870 --> 01:09:40.939
That's these nodes
and these nodes.

01:09:40.939 --> 01:09:43.029
So I concatenate
these together, get

01:09:43.029 --> 01:09:45.970
one big tree that represents
things with depth less than d.

01:09:45.970 --> 01:09:49.479
These are the things of
depth greater than d.

01:09:49.479 --> 01:09:49.979
OK?

01:09:49.979 --> 01:09:52.810
So I do two splits,
one concatenate,

01:09:52.810 --> 01:09:56.500
and that simulates this
kind of cut operation.
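
NOTE
A small Python sketch of the cut just described: two splits at the ends of a key interval, then one concatenate of the outer pieces. Assumptions: a sorted Python list stands in for the red-black auxiliary tree (so this shows the logic, not the log log n running time), and cut_by_depth, lo_key and hi_key are hypothetical names for the routine and for the min and max key in the subtree of y.
```python
from bisect import bisect_left, bisect_right
def cut_by_depth(path_keys, lo_key, hi_key):
    # path_keys: keys of one preferred path, sorted by key (not by depth).
    i = bisect_left(path_keys, lo_key)   # first split, at min of y's subtree
    j = bisect_right(path_keys, hi_key)  # second split, at max of y's subtree
    deeper = path_keys[i:j]                     # path nodes inside the subtree of y
    shallower = path_keys[:i] + path_keys[j:]   # concatenate the two outer pieces
    return shallower, deeper
# Path 8, 4, 6, 5 (top to bottom) in a perfect tree on keys 1..15.
# Cutting at y = 6, whose subtree spans keys 5..7:
print(cut_by_depth([4, 5, 6, 8], 5, 7))
```
In this example the nodes deeper than 4 on the path are 6 and 5, and they come out as one contiguous key range, which is why two splits are enough.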

01:09:56.500 --> 01:09:58.780
Similarly, if I want to
do a join operation,

01:09:58.780 --> 01:10:01.447
it's a constant number of splits
and concatenates, and I'm done.

01:10:01.447 --> 01:10:03.988
Just dealing with the fact that
things are in the wrong order

01:10:03.988 --> 01:10:05.620
here, but it's not so bad.

01:10:10.510 --> 01:10:15.230
One more thing, which is--

01:10:15.230 --> 01:10:17.100
I basically described
the overall structure

01:10:17.100 --> 01:10:19.350
as a tree of auxiliary trees.

01:10:19.350 --> 01:10:22.680
In reality, we're in the
binary search tree model.

01:10:22.680 --> 01:10:25.830
We can only have one tree.

01:10:25.830 --> 01:10:26.910
Not so hard, though.

01:10:26.910 --> 01:10:30.060
I mean, basically,
you want one tree that

01:10:30.060 --> 01:10:34.030
represents lots of trees that
are kind of pasted together.

01:10:34.030 --> 01:10:36.900
So to do that, you
just mark each node

01:10:36.900 --> 01:10:39.780
that transitions from
one tree to the next.

01:10:39.780 --> 01:10:42.810
So each node will say, I am the
root of a new auxiliary tree

01:10:42.810 --> 01:10:45.720
or just say, no, I'm part
of the same auxiliary tree

01:10:45.720 --> 01:10:46.470
as my parent.

01:10:49.530 --> 01:10:51.790
And then you have to
define these kinds of split

01:10:51.790 --> 01:10:55.080
and concatenate operations in
this setting where you have

01:10:55.080 --> 01:10:56.676
a tree embedded inside a tree.

01:10:56.676 --> 01:10:58.050
But you just ignore
all the nodes

01:10:58.050 --> 01:10:59.883
that are claimed to be
part of another tree.

01:10:59.883 --> 01:11:02.290
Just pretend they weren't
there, and it works.

01:11:02.290 --> 01:11:06.819
So a little hand-wavy there, but
it's kind of a tedious detail.

01:11:06.819 --> 01:11:08.610
You can stick all these
trees into one tree

01:11:08.610 --> 01:11:13.100
just by marking these roots.

01:11:13.100 --> 01:11:14.976
And that's tango trees.

01:11:14.976 --> 01:11:18.970
I already spoiled the climax,
which is this log log n thing,

01:11:18.970 --> 01:11:22.220
but it's pretty obvious
how to get there.

01:11:22.220 --> 01:11:25.150
It's just a lot of
details to actually do it.

01:11:25.150 --> 01:11:27.100
We're just taking
the Wilber 1 bound,

01:11:27.100 --> 01:11:30.070
recasting it in terms of
this preferred path thing

01:11:30.070 --> 01:11:33.370
where it's just the
non-preferred edges.

01:11:33.370 --> 01:11:35.600
Or the non-preferred edges
are what Wilber 1 counts,

01:11:35.600 --> 01:11:37.270
and so we can afford
to spend log log n

01:11:37.270 --> 01:11:38.860
time for each of them.

01:11:38.860 --> 01:11:41.100
And the paths themselves
only have log n nodes,

01:11:41.100 --> 01:11:45.470
so you can search through them
in log log n time pretty easily.

01:11:45.470 --> 01:11:47.110
This also shows you
why Wilber 1 is not

01:11:47.110 --> 01:11:50.890
a good bound with a fixed tree.

01:11:50.890 --> 01:11:53.730
Because here are log n nodes.

01:11:53.730 --> 01:11:58.180
I can just sit there all day
bouncing around all of them

01:11:58.180 --> 01:11:59.322
in random order.

01:11:59.322 --> 01:12:01.780
I'm definitely going to need
log log n time to access them,

01:12:01.780 --> 01:12:04.460
but Wilber 1 is not
changing at all.

01:12:04.460 --> 01:12:08.480
So Wilber 1 stays
constant, like 0.

01:12:08.480 --> 01:12:11.140
I had to warm it up, but
after I touch everything,

01:12:11.140 --> 01:12:14.710
I can just sit there and bounce
around these guys randomly.

01:12:14.710 --> 01:12:16.730
I've got to spend log
log n time to do that,

01:12:16.730 --> 01:12:19.090
but Wilber 1 doesn't
justify it for me.

01:12:19.090 --> 01:12:22.720
Wilber 2 will go up, but
Wilber 1 with this tree?

01:12:22.720 --> 01:12:24.389
It's kind of lame.

01:12:24.389 --> 01:12:25.930
So this is the best
tango trees could

01:12:25.930 --> 01:12:28.790
hope to do using Wilber 1.

01:12:31.420 --> 01:12:33.790
I would guess that tango
trees are a log log

01:12:33.790 --> 01:12:37.400
factor away from optimal, though
we don't know that for sure.

01:12:37.400 --> 01:12:40.110
But greedy we're still
pretty sure is good.

01:12:40.110 --> 01:12:43.031
It should be a constant
factor away from optimal.

01:12:43.031 --> 01:12:44.780
So I want to talk a
little bit about that.

01:12:48.170 --> 01:12:50.260
There's one thing
on this outline

01:12:50.260 --> 01:12:51.260
we haven't talked about.

01:12:51.260 --> 01:12:52.450
We did independent rectangles.

01:12:52.450 --> 01:12:53.390
We did Wilber 1 and 2.

01:12:53.390 --> 01:12:55.919
We saw applications of them,
in particular tango trees.

01:12:55.919 --> 01:12:57.710
One thing we haven't
done is Signed Greedy.

01:13:01.617 --> 01:13:02.700
So let's do Signed Greedy.

01:13:06.550 --> 01:13:10.260
Still left here is we
have two ways to choose

01:13:10.260 --> 01:13:12.330
rectangles,
independent rectangles.

01:13:12.330 --> 01:13:13.290
They're different.

01:13:13.290 --> 01:13:15.690
It would be kind of nice
to know what the best

01:13:15.690 --> 01:13:17.520
way to choose rectangles is.

01:13:17.520 --> 01:13:19.140
And we actually know that--

01:13:25.340 --> 01:13:26.880
Signed Greedy.

01:13:26.880 --> 01:13:28.500
So there's two kinds
of Signed Greedy.

01:13:28.500 --> 01:13:31.260
There's the plus sign
greedy, and there's

01:13:31.260 --> 01:13:32.420
the minus sign greedy.

01:13:36.640 --> 01:13:38.136
How does plus greedy work?

01:13:38.136 --> 01:13:39.510
It's the same as
greedy, you just

01:13:39.510 --> 01:13:41.970
only look at plus rectangles.

01:13:41.970 --> 01:13:44.140
Remember plus rectangles
and minus rectangles.

01:13:44.140 --> 01:13:49.320
So let's look at our
favorite example here.

01:13:49.320 --> 01:13:52.020
With greedy, I would sweep
up, and every rectangle that

01:13:52.020 --> 01:13:54.390
was unsatisfied, I
would satisfy it.

01:13:54.390 --> 01:13:57.570
Now I'm going to ignore
minus rectangles,

01:13:57.570 --> 01:14:00.220
only look at plus rectangles.

01:14:00.220 --> 01:14:02.160
So I see this rectangle,
and I say, oh, I

01:14:02.160 --> 01:14:05.220
don't care because
that's a minus rectangle.

01:14:05.220 --> 01:14:10.880
Then I see this
one and this one.

01:14:10.880 --> 01:14:13.385
I say, oh, those
are plus rectangles.

01:14:13.385 --> 01:14:14.760
So I'm going to
add a point here.

01:14:14.760 --> 01:14:17.190
I'm going to add a point here.

01:14:17.190 --> 01:14:20.760
Then I go up to here.

01:14:20.760 --> 01:14:23.130
I see this rectangle,
which is a plus rectangle.

01:14:23.130 --> 01:14:24.070
That's bad.

01:14:24.070 --> 01:14:25.290
So I've got to add a point.

01:14:25.290 --> 01:14:29.670
I see this minus rectangle
I don't care about.

01:14:29.670 --> 01:14:31.590
This is plus greedy.

01:14:31.590 --> 01:14:33.150
It does not satisfy the set.

01:14:33.150 --> 01:14:35.790
This rectangle
never got satisfied.

01:14:35.790 --> 01:14:37.930
But it plus satisfies the set.

01:14:37.930 --> 01:14:42.780
If I do plus greedy, it
will be plus satisfied.
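
NOTE
A hedged Python sketch of one-sided greedy, sweeping the accesses row by row and adding only the points needed on one side of each access. Assumptions: "plus" is taken to mean rectangles whose earlier corner lies to the left of the accessed key (side="left"), which this part of the lecture does not spell out, and signed_greedy is a hypothetical name. It returns the number of added points.
```python
def signed_greedy(accesses, side):
    last = {}   # key to the latest row that has a point in that column
    added = 0
    for t, x in enumerate(accesses):
        # Columns strictly on the chosen side of x, walked outward from x.
        keys = [k for k in last if (k < x if side == "left" else k > x)]
        keys.sort(key=lambda k: abs(k - x))
        newest = last.get(x, -1)
        touched = []
        for k in keys:
            # The rectangle from (k, last[k]) to (x, t) is empty exactly when
            # column k is newer than every column between it and x, x included.
            if last[k] > newest:
                touched.append(k)
                newest = last[k]
        for k in touched + [x]:
            last[k] = t
        added += len(touched)
    return added
print(signed_greedy([0, 1, 2, 3], "left"), signed_greedy([0, 1, 2, 3], "right"))
```
On the sorted sequence all the work falls on one side, which illustrates why the lecture takes the max of the two signed runs.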

01:14:42.780 --> 01:14:45.030
Every rectangle you draw
here, if it's a plus rectangle,

01:14:45.030 --> 01:14:47.490
it's got another point in it.

01:14:47.490 --> 01:14:50.430
What's kind of nice,
also, is if you actually

01:14:50.430 --> 01:14:54.600
draw the rectangles
you are satisfying--

01:14:54.600 --> 01:14:55.830
maybe I'll use another color.

01:14:58.530 --> 01:15:00.715
There was one rectangle here.

01:15:00.715 --> 01:15:04.650
There was one rectangle here.

01:15:04.650 --> 01:15:08.379
And there was one
rectangle here.

01:15:08.379 --> 01:15:10.170
That's a little awkward
because they're not

01:15:10.170 --> 01:15:12.690
on the original points.

01:15:12.690 --> 01:15:14.490
So I can change
them a little bit,

01:15:14.490 --> 01:15:21.052
maybe move this one down to here
and move this one over to here.

01:15:21.052 --> 01:15:22.510
You could say that
those rectangles

01:15:22.510 --> 01:15:24.700
came from those points.

01:15:24.700 --> 01:15:27.670
Then this is a set of
independent rectangles

01:15:27.670 --> 01:15:30.210
on the original points.

01:15:30.210 --> 01:15:34.900
Maybe not totally
obvious, but plus greedy

01:15:34.900 --> 01:15:49.220
always gives an independent
set of plus rectangles.

01:15:49.220 --> 01:15:50.629
So it's a lower bound.

01:15:50.629 --> 01:15:53.170
It's not an upper bound because
it's not satisfying the point

01:15:53.170 --> 01:15:55.550
set, but it's a lower bound.

01:15:55.550 --> 01:15:58.418
I claim it's a very
good lower bound.

01:16:07.420 --> 01:16:09.280
It by itself might
not be great, but you

01:16:09.280 --> 01:16:11.686
have to consider both of them.

01:16:11.686 --> 01:16:26.340
So the theorem is if I take the
max of plus greedy and minus

01:16:26.340 --> 01:16:26.840
greedy--

01:16:30.290 --> 01:16:32.210
each of them is a lower
bound, so the max

01:16:32.210 --> 01:16:35.090
is a lower bound on optimal--

01:16:35.090 --> 01:16:37.760
then this is within
a constant factor

01:16:37.760 --> 01:16:41.650
of the biggest possible
independent rectangle lower

01:16:41.650 --> 01:16:42.150
bound.

01:16:51.860 --> 01:16:53.360
And so this is
the way you should

01:16:53.360 --> 01:16:54.680
choose independent rectangles.

01:16:54.680 --> 01:16:55.430
Run plus greedy.

01:16:55.430 --> 01:16:56.150
Run minus greedy.

01:16:56.150 --> 01:16:57.587
Take the best of the two.

01:16:57.587 --> 01:16:59.420
That will always be
within a constant factor

01:16:59.420 --> 01:17:04.015
of the best independent set
of rectangles, a factor like 4

01:17:04.015 --> 01:17:06.540
or something in the worst case.

01:17:06.540 --> 01:17:08.480
So let me prove this to you.

01:17:12.110 --> 01:17:13.610
It's kind of a weird argument.

01:17:16.307 --> 01:17:17.765
I'm going to define
a new quantity.

01:17:20.670 --> 01:17:23.210
Let's call this OPT x, I guess.

01:17:26.300 --> 01:17:31.940
It's sort of like if you
consider plus rectangles

01:17:31.940 --> 01:17:33.837
separately from minus
rectangles, which

01:17:33.837 --> 01:17:34.670
is what we're doing.

01:17:52.440 --> 01:17:55.220
So I would like a point set--

01:17:55.220 --> 01:17:57.620
first, I'd like a plus
satisfying point set,

01:17:57.620 --> 01:18:01.440
and then I'd also like a
minus satisfying point set.

01:18:01.440 --> 01:18:03.020
And then I take their union.

01:18:03.020 --> 01:18:08.340
And I say the cost of that pair
of plus satisfying and minus

01:18:08.340 --> 01:18:10.350
satisfying is the
size of the union.

01:18:10.350 --> 01:18:13.640
So I get a bonus point if
they happen to overlap.

01:18:13.640 --> 01:18:16.040
Not a big deal,
just a factor of 2.

01:18:16.040 --> 01:18:18.200
So this is not a core
concept, but it turns out

01:18:18.200 --> 01:18:20.510
to be basically what we
were doing over here.

01:18:23.300 --> 01:18:27.200
Let me give you a sequence
of crazy inequalities.

01:18:27.200 --> 01:18:29.900
First one is that this
OPT thing is greater than

01:18:29.900 --> 01:18:33.980
or equal to size of the input.

01:18:33.980 --> 01:18:36.060
Each of these inequalities
is totally obvious,

01:18:36.060 --> 01:18:38.056
but the conclusion
is kind of crazy.

01:18:43.860 --> 01:18:46.520
The independent rectangle
lower bound, which we proved,

01:18:46.520 --> 01:18:49.040
says that if you look at
plus satisfying things that's

01:18:49.040 --> 01:18:51.050
going to be at least
size of the input

01:18:51.050 --> 01:18:52.967
plus the max number of plus
independent rectangles.

01:18:52.967 --> 01:18:54.758
If you look at the
minus satisfying things,

01:18:54.758 --> 01:18:56.990
that's also going to be
at least size of the input

01:18:56.990 --> 01:19:00.320
plus maximum number of minus
independent rectangles.

01:19:00.320 --> 01:19:01.520
So we already proved this.

01:19:01.520 --> 01:19:03.344
Then, if you look
at this union, it's

01:19:03.344 --> 01:19:05.510
going to be at least the
size of the input plus half

01:19:05.510 --> 01:19:08.060
the overall max.

01:19:08.060 --> 01:19:12.320
So that's what we proved
at the beginning of lecture.

01:19:12.320 --> 01:19:17.840
Now, this is the best way to
use independent rectangles.

01:19:17.840 --> 01:19:19.970
This kind of Signed
Greedy, which

01:19:19.970 --> 01:19:22.749
is the max of the
two signs, is a way

01:19:22.749 --> 01:19:24.040
to find independent rectangles.

01:19:24.040 --> 01:19:25.331
So it's only going to be worse.

01:19:25.331 --> 01:19:27.480
It's going to be smaller.

01:19:27.480 --> 01:19:32.000
So we can say this is greater
than or equal to half

01:19:32.000 --> 01:19:37.630
the max of plus greedy
and minus greedy.

01:19:42.650 --> 01:19:44.160
This was the max.

01:19:44.160 --> 01:19:46.700
So this is another way to do
it, so it must be smaller.

01:19:49.920 --> 01:19:58.710
Now, greedy computes a
plus satisfying assignment.

01:19:58.710 --> 01:20:02.834
So I could say, well, if you
looked at the optimal plus

01:20:02.834 --> 01:20:05.000
satisfying assignment--
this is something we defined

01:20:05.000 --> 01:20:06.440
at the beginning of lecture--

01:20:06.440 --> 01:20:09.590
and the optimal minus
satisfying assignment, that's

01:20:09.590 --> 01:20:11.870
only going to be smaller
than greedy because greedy

01:20:11.870 --> 01:20:15.290
is an algorithm for
solving OPT plus.

01:20:15.290 --> 01:20:19.302
It can't be better
than the optimum.

01:20:19.302 --> 01:20:21.260
Greedy again has to be
bigger than the optimum.

01:20:24.890 --> 01:20:29.110
Now I just want to turn
this max into a plus

01:20:29.110 --> 01:20:32.550
because the max is always
at least the average.

01:20:32.550 --> 01:20:37.670
So if I take the average, which
turns it into 1/4 OPT plus

01:20:37.670 --> 01:20:39.995
plus OPT minus.

01:20:42.990 --> 01:20:44.090
Then that holds.

01:20:44.090 --> 01:20:46.120
You turn the max into a plus.

01:20:46.120 --> 01:20:48.440
If I look at the
optimal plus satisfying

01:20:48.440 --> 01:20:50.690
plus the optimal
minus satisfying,

01:20:50.690 --> 01:20:54.770
that's only going to be
bigger than this thing

01:20:54.770 --> 01:20:56.900
because this can only
save like a factor of 2

01:20:56.900 --> 01:21:00.436
or whatever over
just adding them up.

01:21:00.436 --> 01:21:02.060
I don't even need the
factor of 2 thing.

01:21:02.060 --> 01:21:05.810
I just need that
if you add them up,

01:21:05.810 --> 01:21:08.270
that's only going to be
worse than just counting them

01:21:08.270 --> 01:21:10.350
as the union.

01:21:10.350 --> 01:21:13.100
So we get what I
call a sandwich.

01:21:13.100 --> 01:21:15.860
On the one side, we have OPT x.

01:21:15.860 --> 01:21:17.489
On the other side,
we have 1/4 OPT x.
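
NOTE
Editorial sketch, not from the lecture: the spoken chain of inequalities written out in one place.
The symbols are assumptions, since the board notation is not in the transcript: |X| is the size of the input,
MaxIR(X) is the largest number of independent rectangles, Greedy+ and Greedy- are the plus and minus greedy costs,
and OPT+ and OPT- are the smallest plus satisfying and minus satisfying point sets.
```latex
\begin{align*}
\mathrm{OPT}_{x}
  &\ge |X| + \tfrac{1}{2}\,\mathrm{MaxIR}(X) \\
  &\ge \tfrac{1}{2}\max(\mathrm{Greedy}_{+},\ \mathrm{Greedy}_{-}) \\
  &\ge \tfrac{1}{2}\max(\mathrm{OPT}_{+},\ \mathrm{OPT}_{-}) \\
  &\ge \tfrac{1}{4}\,(\mathrm{OPT}_{+} + \mathrm{OPT}_{-}) \\
  &\ge \tfrac{1}{4}\,\mathrm{OPT}_{x}
\end{align*}
```
Both ends are OPT x up to a factor of 4, so every quantity in between is within a constant factor of the others.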

01:21:17.489 --> 01:21:19.280
I really don't care
about OPT x personally.

01:21:19.280 --> 01:21:21.690
I mean, it's kind of interesting
to see that it's here.

01:21:21.690 --> 01:21:23.910
But the point is these are
within a constant factor.

01:21:23.910 --> 01:21:25.618
Therefore, all of
these things in between

01:21:25.618 --> 01:21:27.690
are within a constant
factor of each other.

01:21:27.690 --> 01:21:31.970
So in particular, this thing,
max of the two greedys,

01:21:31.970 --> 01:21:34.070
is within a constant
factor of this thing.

01:21:34.070 --> 01:21:37.070
This is the independent
rectangle lower bound,

01:21:37.070 --> 01:21:38.540
the best one.

01:21:38.540 --> 01:21:40.460
It also tells you that
OPT x is basically

01:21:40.460 --> 01:21:43.230
what we're computing here.

01:21:43.230 --> 01:21:44.640
So this is weird.

01:21:44.640 --> 01:21:49.190
I'm going to draw one
more picture, which

01:21:49.190 --> 01:21:56.630
is greedy versus Signed Greedy.

01:21:56.630 --> 01:21:58.760
Remember greedy
from last lecture.

01:21:58.760 --> 01:22:02.930
Greedy says, look, I'm going
to fix plus rectangles,

01:22:02.930 --> 01:22:04.640
and I'm going to fix
minus rectangles.

01:22:04.640 --> 01:22:06.740
It does them both
at the same time.

01:22:06.740 --> 01:22:08.600
Signed Greedy says,
look, I'm going

01:22:08.600 --> 01:22:11.270
to do the plus
rectangles separately,

01:22:11.270 --> 01:22:14.510
and then I'm going to the
minus rectangles separately,

01:22:14.510 --> 01:22:17.605
and then add them up or take
the union or take the max.

01:22:17.605 --> 01:22:18.370
It doesn't matter.

01:22:18.370 --> 01:22:19.760
It's a constant factor.

01:22:19.760 --> 01:22:22.670
Just add them separately.

01:22:22.670 --> 01:22:24.920
This one is an upper bound.

01:22:24.920 --> 01:22:27.680
It is a binary search tree.

01:22:27.680 --> 01:22:29.030
This thing is a lower bound.

01:22:29.030 --> 01:22:32.780
All binary search trees
must take at least this.
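
NOTE
Editorial sketch, not from the lecture: a small side-by-side of Greedy and Signed Greedy point counts.
It assumes plus rectangles are the positive-slope ones and uses a hypothetical helper named sweep.
The raw counts are only illustrative, since the bounds in the lecture hold up to constant factors.
```python
def sweep(accesses, signs):
    # signs=(+1, -1) sketches Greedy; (+1,) or (-1,) is one half of Signed Greedy.
    last, count = {}, 0  # last: key -> most recent row with a point in that column
    for t, x in enumerate(accesses):
        row = {x}
        for sign in signs:
            # Older columns on this side of x, nearest first.
            side = sorted((k for k in last if (x - k) * sign > 0),
                          key=lambda k: abs(x - k))
            newest = last.get(x, -1)
            for k in side:
                # Add a point when column k spans an empty rectangle with (x, t).
                if last[k] > newest:
                    row.add(k)
                    newest = last[k]
        for k in row:
            last[k] = t
        count += len(row)
    return count
accesses = [1, 3, 2]
greedy = sweep(accesses, (+1, -1))
signed = max(sweep(accesses, (+1,)), sweep(accesses, (-1,)))
print(greedy, signed)  # prints "6 5"
```
Greedy fixes both signs in the same row, while each signed run ignores rectangles of the other sign.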

01:22:32.780 --> 01:22:35.720
Are they equal up
to constant factors?

01:22:35.720 --> 01:22:36.500
We don't know.

01:22:36.500 --> 01:22:37.970
That's the big question.

01:22:37.970 --> 01:22:40.400
They look almost identical.

01:22:40.400 --> 01:22:43.280
But what greedy has to deal with
is sort of the interrelations.

01:22:43.280 --> 01:22:45.980
When I fix some
plus rectangles, I

01:22:45.980 --> 01:22:49.305
might get new minus rectangles
that I have to fix with greedy.

01:22:49.305 --> 01:22:51.180
Signed Greedy doesn't
have to deal with that.

01:22:51.180 --> 01:22:52.989
It's just the plus rectangles.

01:22:52.989 --> 01:22:54.530
They might make more
plus rectangles,

01:22:54.530 --> 01:22:56.120
but that's all I
have to deal with.

01:22:56.120 --> 01:22:57.620
It doesn't deal
with the interaction

01:22:57.620 --> 01:22:59.420
between plus and
minus rectangles.

01:22:59.420 --> 01:23:03.290
Seems like the interaction
kind of fades away

01:23:03.290 --> 01:23:04.207
as a geometric series.

01:23:04.207 --> 01:23:05.873
And therefore, these
things are the same

01:23:05.873 --> 01:23:07.040
up to constant factors.

01:23:07.040 --> 01:23:08.900
But we have no
way to prove that.

01:23:08.900 --> 01:23:12.380
It could be the interaction
blows you out of the water

01:23:12.380 --> 01:23:14.750
somehow.

01:23:14.750 --> 01:23:19.271
That's the best we know
for dynamic optimality.

01:23:19.271 --> 01:23:21.770
Maybe next time I teach this
class we'll have a final answer

01:23:21.770 --> 01:23:25.730
and it'll be constant, but
that's where we are today.