WEBVTT
00:00:17.970 --> 00:00:20.400
PROFESSOR: I sent out
a survey this morning
00:00:20.400 --> 00:00:22.400
about how the class
is going, what
00:00:22.400 --> 00:00:23.970
you thought of the problem set.
00:00:23.970 --> 00:00:27.860
And I would appreciate if you
provide me some feedback--
00:00:27.860 --> 00:00:31.740
so things you like or don't
like about the class or about
00:00:31.740 --> 00:00:34.364
the problem set that
was just due last night.
00:00:34.364 --> 00:00:38.115
So I can try to adjust to make
it more interesting and useful
00:00:38.115 --> 00:00:38.740
for all of you.
00:00:42.600 --> 00:00:45.200
Last time we talked about
Szemerédi's graph regularity
00:00:45.200 --> 00:00:46.100
lemma.
00:00:46.100 --> 00:00:48.020
So the regularity
lemma, as I mentioned,
00:00:48.020 --> 00:00:52.230
is an extremely powerful
tool in modern combinatorics.
00:00:52.230 --> 00:00:55.380
And last time we saw the
statement and the proof
00:00:55.380 --> 00:00:57.690
of this regularity lemma.
00:00:57.690 --> 00:01:01.290
Today, I want to show you
how to apply the lemma
00:01:01.290 --> 00:01:04.709
for extremal applications.
00:01:04.709 --> 00:01:08.550
In particular, we'll see how
to prove Roth's theorem that I
00:01:08.550 --> 00:01:10.530
mentioned in the
very first lecture,
00:01:10.530 --> 00:01:14.580
about subsets of integers
lacking three-term arithmetic
00:01:14.580 --> 00:01:17.340
progressions.
00:01:17.340 --> 00:01:21.360
First, let me remind you
the regularity lemma.
00:01:21.360 --> 00:01:25.930
We're always working
inside some graph, G.
00:01:25.930 --> 00:01:32.870
We say that a pair of
subsets of vertices
00:01:32.870 --> 00:01:42.130
is epsilon regular if
the following holds--
00:01:42.130 --> 00:01:57.660
for all subsets A of X, B
of Y, neither too small,
00:01:57.660 --> 00:02:03.320
we have that the edge
density between A and B
00:02:03.320 --> 00:02:08.340
is very similar to the edge
density between the ambient
00:02:08.340 --> 00:02:11.437
sets X and Y.
00:02:11.437 --> 00:02:13.020
So we had this picture
from last time.
00:02:19.040 --> 00:02:20.690
You have two sets.
00:02:20.690 --> 00:02:22.190
Now, they don't
have to be disjoint.
00:02:22.190 --> 00:02:23.523
They could even be the same set.
00:02:23.523 --> 00:02:25.430
But for illustration
purposes, it's
00:02:25.430 --> 00:02:27.650
easier to visualize what's
going on if I draw them
00:02:27.650 --> 00:02:31.100
as disjoint subsets.
00:02:31.100 --> 00:02:33.320
So there is some edge density.
00:02:33.320 --> 00:02:36.020
And I say they're epsilon
regular if they behave
00:02:36.020 --> 00:02:38.360
random-like in the
following sense--
00:02:38.360 --> 00:02:42.200
that the edges are somehow
distributed in a fairly uniform
00:02:42.200 --> 00:02:46.790
way so that if I look
at some smaller subsets
00:02:46.790 --> 00:02:50.810
A and B, but not too small,
then the edge densities
00:02:50.810 --> 00:02:56.170
between A and B is very similar
to the ambient edge densities.
00:02:56.170 --> 00:02:58.360
So by most, epsilon difference.
00:02:58.360 --> 00:03:01.040
Now I need that A and
B are not too small
00:03:01.040 --> 00:03:05.270
because if you allow to take,
for example, single vertices,
00:03:05.270 --> 00:03:09.620
you can easily get densities
that are either 0 or 1.
00:03:09.620 --> 00:03:12.530
So then it's very hard to
make any useful statement.
00:03:12.530 --> 00:03:16.480
So that's why these two
conditions are needed.
00:03:16.480 --> 00:03:20.590
And here, the edge
density is defined
00:03:20.590 --> 00:03:25.210
to be the number of edges
with one endpoint in A,
00:03:25.210 --> 00:03:31.160
one endpoint in B, divided by
the product of the sizes of A
00:03:31.160 --> 00:03:40.220
and B. And we say that a
partition of the vertex set
00:03:40.220 --> 00:03:57.300
of the graph is epsilon regular
if, by summing over all pairs
00:03:57.300 --> 00:04:10.390
i, j, such that vi, vj is not
epsilon regular if we sum up
00:04:10.390 --> 00:04:13.360
the product of these
part sizes, then
00:04:13.360 --> 00:04:19.750
this sum is at most an epsilon
fraction of the total number
00:04:19.750 --> 00:04:22.480
of pairs of vertices.
00:04:22.480 --> 00:04:25.210
And the way to think of this
is that there are not too
00:04:25.210 --> 00:04:27.880
many irregular parts.
00:04:27.880 --> 00:04:30.710
At least in the case when
all the parts are equitable.
00:04:30.710 --> 00:04:32.770
So we should really
think about all
00:04:32.770 --> 00:04:35.260
of them having more or less
the same size than saying
00:04:35.260 --> 00:04:38.276
that at most an epsilon
fraction of them are irregular.
00:04:42.940 --> 00:04:46.940
And the main theorem from last
time was Szemerédi's regularity
00:04:46.940 --> 00:04:47.440
lemma.
00:05:01.560 --> 00:05:05.130
And the statement is
that for every epsilon,
00:05:05.130 --> 00:05:07.850
there exists some M--
00:05:07.850 --> 00:05:10.730
so M depends only
on epsilon and not
00:05:10.730 --> 00:05:13.460
on the graph that
we're about to see--
00:05:13.460 --> 00:05:28.210
such that every graph has
an epsilon regular partition
00:05:28.210 --> 00:05:32.780
into at most M parts.
00:05:36.230 --> 00:05:37.670
In particular, the
number of parts
00:05:37.670 --> 00:05:40.220
does not depend on the graph.
00:05:40.220 --> 00:05:42.050
For every epsilon,
there is some M.
00:05:42.050 --> 00:05:43.760
And no matter how
large the graph,
00:05:43.760 --> 00:05:45.950
there exists a
bounded size partition
00:05:45.950 --> 00:05:49.130
that is epsilon regular.
00:05:49.130 --> 00:05:51.620
So the proof last
time gave us a bound M
00:05:51.620 --> 00:05:54.550
that is quite large as
a function of epsilon.
00:05:54.550 --> 00:05:56.960
So the last time
we saw that this M
00:05:56.960 --> 00:06:06.230
was a tower of twos of height
essentially polynomial in 1
00:06:06.230 --> 00:06:08.770
over epsilon.
00:06:08.770 --> 00:06:13.160
And I mentioned that you
basically cannot improve this
00:06:13.160 --> 00:06:14.850
bound.
00:06:14.850 --> 00:06:17.410
So this bound is more or
less the best possible
00:06:17.410 --> 00:06:20.050
up to maybe changing the 5.
00:06:20.050 --> 00:06:22.940
And so in some sense, the
proof that we gave last time
00:06:22.940 --> 00:06:26.750
for Szemerédi's graph regularity
lemma was the right proof.
00:06:26.750 --> 00:06:28.690
So that was the
sequence of steps that
00:06:28.690 --> 00:06:30.315
were the right things to do.
00:06:30.315 --> 00:06:31.940
Even though they give
a terrible bound,
00:06:31.940 --> 00:06:35.060
it's somehow the bound
that should come out.
00:06:38.690 --> 00:06:40.600
What I want to talk
about today is,
00:06:40.600 --> 00:06:44.270
what's a regularity
partition good for?
00:06:44.270 --> 00:06:48.230
So we did all this work to
get a regularity partition,
00:06:48.230 --> 00:06:50.210
and it has all of
these nice definitions.
00:06:50.210 --> 00:06:51.850
But they are useful
for something.
00:06:51.850 --> 00:06:55.130
So what is it useful for?
00:06:55.130 --> 00:06:57.050
And here is the intuition.
00:06:59.820 --> 00:07:02.530
Remember at the
beginning of last lecture
00:07:02.530 --> 00:07:06.070
I mentioned this informal
statement of regularity lemma--
00:07:06.070 --> 00:07:10.360
namely that there exists
a partition of the graph
00:07:10.360 --> 00:07:13.960
so that most pairs
look random-like.
00:07:21.290 --> 00:07:23.810
So what does random-like mean?
00:07:23.810 --> 00:07:26.710
So random-like, there is
a specific definition.
00:07:26.710 --> 00:07:30.620
But the intuition is that in
many aspects, especially when
00:07:30.620 --> 00:07:34.460
it comes to counting
small patterns,
00:07:34.460 --> 00:07:37.220
the graph in the
random-like setting
00:07:37.220 --> 00:07:41.090
looks very similar to what
happens in a random graph--
00:07:41.090 --> 00:07:42.920
in a genuine random graph.
00:07:42.920 --> 00:07:47.150
In particular, if you
have three subsets--
00:07:47.150 --> 00:07:49.670
x, y, and z--
00:07:49.670 --> 00:07:59.000
and suppose that the three
pairs are all epsilon regular,
00:07:59.000 --> 00:08:07.220
then you might be interested
in the number of triangles
00:08:07.220 --> 00:08:09.830
with one vertex in each set.
00:08:13.720 --> 00:08:17.690
Now, if this were a
genuine random tripartite
00:08:17.690 --> 00:08:20.900
graph with specified
edge densities,
00:08:20.900 --> 00:08:23.240
then the number of triangles
in such a random graph
00:08:23.240 --> 00:08:24.960
is pretty easy to calculate.
00:08:24.960 --> 00:08:28.380
You would expect
that it is around
00:08:28.380 --> 00:08:34.520
the product of the sizes of
these vertex sets multiplied
00:08:34.520 --> 00:08:37.070
by their edge densities.
00:08:42.120 --> 00:08:44.790
And what we will see
is that in this case,
00:08:44.790 --> 00:08:49.200
in that of an epsilon
regular setting,
00:08:49.200 --> 00:08:50.750
this is also a true statement.
00:08:50.750 --> 00:08:53.220
It's a true,
deterministic statement.
00:08:53.220 --> 00:08:56.410
That's one of the consequences
of epsilon regularity.
00:08:56.410 --> 00:08:57.250
Yes, question?
00:08:57.250 --> 00:08:58.792
AUDIENCE: Why are
we only multiplying
00:08:58.792 --> 00:09:02.293
the sizes [INAUDIBLE]?
00:09:02.293 --> 00:09:04.210
PROFESSOR: Asking, why
are we only multiplying
00:09:04.210 --> 00:09:07.400
the sizes of x, y, and z?
00:09:07.400 --> 00:09:08.940
So you're asking--
00:09:08.940 --> 00:09:09.440
OK.
00:09:09.440 --> 00:09:12.230
So we're trying to find out
how many triangles are there
00:09:12.230 --> 00:09:16.790
with one vertex in x, one vertex
in y, and one vertex in z.
00:09:16.790 --> 00:09:20.030
So if I put these vertices
in there, one by one,
00:09:20.030 --> 00:09:22.400
then if this were
a random graph,
00:09:22.400 --> 00:09:30.730
I expect that pair to be an edge
with probability dxy and so on.
00:09:30.730 --> 00:09:33.900
So if all the edge densities
were one half, then
00:09:33.900 --> 00:09:38.620
I expect one eighth of these
triples to be actual triangles.
00:09:38.620 --> 00:09:41.710
And what we're saying is that
in an epsilon regular setting,
00:09:41.710 --> 00:09:47.490
that is approximately
a true statement.
00:09:47.490 --> 00:09:49.380
So let me formalize
this intuition
00:09:49.380 --> 00:09:51.666
into an actual statement.
00:09:58.970 --> 00:10:01.980
And this type of statements
are known as counting lemmas
00:10:01.980 --> 00:10:02.860
in literature.
00:10:02.860 --> 00:10:06.000
And in particular, let's look
at the triangle counting lemma.
00:10:15.860 --> 00:10:17.340
In the triangle counting lemma--
00:10:17.340 --> 00:10:19.500
so we're using the same
picture over there--
00:10:19.500 --> 00:10:27.150
I have three vertex subsets
of some given graph.
00:10:27.150 --> 00:10:28.740
Again, they don't
have to be disjoint.
00:10:28.740 --> 00:10:30.570
They could overlap,
but it's fine to think
00:10:30.570 --> 00:10:31.820
about that picture over there.
00:10:34.310 --> 00:10:41.655
And suppose that these
three pairs of subsets--
00:10:44.220 --> 00:10:47.890
so these three subsets-- they
are mutually epsilon regular.
00:10:57.270 --> 00:11:01.740
Then, for abbreviation,
let me write
00:11:01.740 --> 00:11:07.580
the sub xy to be the edge
density between x and y,
00:11:07.580 --> 00:11:09.500
and so on for the
other two pairs.
00:11:12.410 --> 00:11:15.800
The conclusion is that
the number of triangles--
00:11:19.810 --> 00:11:25.420
where I'm looking at triangles
and only counting triangles
00:11:25.420 --> 00:11:29.940
with one specified vertex in
x, one in y, and one in z--
00:11:33.740 --> 00:11:37.010
is at least some quantity.
00:11:37.010 --> 00:11:39.950
So there is a small
potential error
00:11:39.950 --> 00:11:44.930
loss but otherwise the product,
as I mentioned earlier.
00:11:57.410 --> 00:12:00.890
So it is at least this
quantity I mentioned earlier up
00:12:00.890 --> 00:12:04.220
to a potential small
error because we're
00:12:04.220 --> 00:12:06.062
looking at epsilon regularity.
00:12:06.062 --> 00:12:07.520
So there could be
some fluctuations
00:12:07.520 --> 00:12:10.110
in both directions.
00:12:10.110 --> 00:12:13.493
A similar statement is also
true as an upper bound.
00:12:13.493 --> 00:12:15.160
But the lower bound
will be more useful,
00:12:15.160 --> 00:12:17.290
so I will show you the
proof of the lower bound.
00:12:17.290 --> 00:12:19.590
But you can figure out
how to do the upper bound.
00:12:19.590 --> 00:12:23.490
And later on we'll see a
general proof what happens,
00:12:23.490 --> 00:12:26.910
instead of triangles, if you
have other subgraphs that you
00:12:26.910 --> 00:12:29.920
wish to count.
00:12:29.920 --> 00:12:30.940
So here's the intuition.
00:12:30.940 --> 00:12:32.840
So you have a
random-like setting,
00:12:32.840 --> 00:12:34.570
and we'll formalize
it in the setting
00:12:34.570 --> 00:12:36.655
of epsilon regular pairs.
00:12:36.655 --> 00:12:37.155
Yeah?
00:12:37.155 --> 00:12:40.800
AUDIENCE: Where does the 1
minus 2 epsilon come from?
00:12:40.800 --> 00:12:41.537
PROFESSOR: OK.
00:12:41.537 --> 00:12:43.870
The question is, where does
1 minus 2 epsilon come from?
00:12:43.870 --> 00:12:45.680
You'll see in the proof.
00:12:45.680 --> 00:12:47.980
But you should think
of this as essentially
00:12:47.980 --> 00:12:49.020
a negligible factor.
00:12:51.580 --> 00:12:54.850
Any more questions?
00:12:54.850 --> 00:12:55.350
All right.
00:12:55.350 --> 00:12:57.410
So here's how this
proof is going to go.
00:13:03.850 --> 00:13:10.880
Let's look at x and think
about its relationship to y.
00:13:10.880 --> 00:13:12.820
It's epsilon regular.
00:13:12.820 --> 00:13:14.450
And they claim, as
a result of them
00:13:14.450 --> 00:13:21.590
being epsilon regular, fewer
than epsilon fraction of x.
00:13:21.590 --> 00:13:30.900
So fewer than this many vertices
in x have very small number
00:13:30.900 --> 00:13:32.720
of neighbors in y.
00:13:42.270 --> 00:13:45.330
Because if this
were not the case,
00:13:45.330 --> 00:13:49.740
then you can violate
the condition
00:13:49.740 --> 00:13:51.480
of absolute regularity.
00:13:51.480 --> 00:13:56.890
So if not, then let's
look at this subset, which
00:13:56.890 --> 00:14:01.690
has size at least epsilon x.
00:14:01.690 --> 00:14:12.250
And all of them have fewer than
that number of neighbors to y.
00:14:12.250 --> 00:14:16.750
So these two sets--
00:14:16.750 --> 00:14:21.480
so this set, x
prime and y, would
00:14:21.480 --> 00:14:28.420
witness non-epsilon regularity.
00:14:28.420 --> 00:14:31.300
So you cannot have too many
vertices with small degrees
00:14:31.300 --> 00:14:32.260
going to x--
00:14:32.260 --> 00:14:34.400
going to y.
00:14:34.400 --> 00:14:34.900
OK.
00:14:34.900 --> 00:14:35.400
Great.
00:14:37.890 --> 00:14:45.260
And likewise, fewer
than epsilon x vertices
00:14:45.260 --> 00:14:49.560
have a small number
of neighbors to z.
00:15:01.705 --> 00:15:03.330
So what does the
picture now look like?
00:15:08.570 --> 00:15:13.210
So you have this x and
then these two other sets,
00:15:13.210 --> 00:15:24.680
y and z where I'm going to throw
out a small proportion of x
00:15:24.680 --> 00:15:29.480
less than 2 epsilon
fraction of x that have
00:15:29.480 --> 00:15:31.910
the wrong kinds of degrees.
00:15:31.910 --> 00:15:39.290
And everything else in here
have lots of neighbors in both y
00:15:39.290 --> 00:15:40.115
and in z.
00:15:40.115 --> 00:15:50.120
And in particular, for all x up
here it has lots of neighbors
00:15:50.120 --> 00:15:56.100
to y, lots of neighbors to z.
00:15:56.100 --> 00:15:57.080
How many?
00:15:57.080 --> 00:16:06.110
Well, we have at least d sub xy
minus epsilon y neighbors to y
00:16:06.110 --> 00:16:13.080
and at least d sub xz minus
epsilon times z neighbors to z.
00:16:22.390 --> 00:16:23.980
OK.
00:16:23.980 --> 00:16:29.314
So now I realize I'm missing
a hypothesis in the counting
00:16:29.314 --> 00:16:29.814
lemma.
00:16:34.080 --> 00:16:41.580
Let me assume that none of these
edge densities are too small.
00:16:47.980 --> 00:16:49.630
They're all at least 2 epsilon.
00:16:56.410 --> 00:17:02.750
So now these guys are at least
epsilon fractions of y and z.
00:17:06.260 --> 00:17:17.619
So I can apply the definition
of epsilon regularity
00:17:17.619 --> 00:17:25.650
to the pair yz to deduce
that there are lots of edges
00:17:25.650 --> 00:17:26.960
between these two sets.
00:17:29.650 --> 00:17:37.530
So over here, the
number of edges
00:17:37.530 --> 00:17:57.970
is at least the products of the
sizes multiplied by the edge
00:17:57.970 --> 00:17:59.680
density between them.
00:17:59.680 --> 00:18:02.260
And by the definition
of epsilon regularity,
00:18:02.260 --> 00:18:05.470
the edge density between these
two small or these two red sets
00:18:05.470 --> 00:18:10.090
is at least d of
yz minus epsilon.
00:18:12.950 --> 00:18:14.890
So putting everything
now together,
00:18:14.890 --> 00:18:22.980
we find that the total
number of triangles,
00:18:22.980 --> 00:18:27.336
looking at all the possible
places where x can go--
00:18:27.336 --> 00:18:33.550
so at least 1 minus 2
epsilon times the size of x.
00:18:33.550 --> 00:18:38.710
And then multiply by
this factor over here.
00:18:43.720 --> 00:18:47.910
And so we find the
statement up there.
00:18:47.910 --> 00:18:50.940
So this calculation
formalizes the intuition
00:18:50.940 --> 00:18:53.010
that if you have
epsilon regular pairs,
00:18:53.010 --> 00:18:55.350
then they behave
like random settings
00:18:55.350 --> 00:18:57.540
when it comes to counting
small patterns-- namely
00:18:57.540 --> 00:18:58.332
that of a triangle.
00:19:03.280 --> 00:19:06.510
So what can we use this for?
00:19:06.510 --> 00:19:08.520
The next statement
I want to show you
00:19:08.520 --> 00:19:11.470
is called a triangle
removal lemma.
00:19:22.490 --> 00:19:24.830
So this is a somewhat
innocuous looking
00:19:24.830 --> 00:19:28.760
statement that is
surprisingly tricky to prove.
00:19:28.760 --> 00:19:32.960
And part of the development
of this regularity lemma was
00:19:32.960 --> 00:19:36.830
to prove Szemerédi's-- to
the triangle removal lemma.
00:19:36.830 --> 00:19:40.360
This was one of the early
applications of the regularity
00:19:40.360 --> 00:19:41.650
lemma.
00:19:41.650 --> 00:19:50.220
So it's due to Ruzsa and
Szemerédi back in the '70s.
00:19:53.170 --> 00:19:55.150
Here's the statement.
00:19:55.150 --> 00:19:59.800
For every epsilon
there exists a delta,
00:19:59.800 --> 00:20:13.400
such that every graph of n
vertices with a small number
00:20:13.400 --> 00:20:15.540
of triangles--
00:20:15.540 --> 00:20:17.510
so a small number
of triangles means
00:20:17.510 --> 00:20:21.170
a negligible fraction of all
the possible triples of vertices
00:20:21.170 --> 00:20:22.580
are actual triangles.
00:20:22.580 --> 00:20:26.120
So fewer than delta
n cubed triangles.
00:20:31.097 --> 00:20:33.430
So if you have a graph with
a small number of triangles,
00:20:33.430 --> 00:20:37.690
the question is, can you make
it triangle free by getting rid
00:20:37.690 --> 00:20:40.231
of a small number of edges?
00:20:40.231 --> 00:20:43.430
So actually, there was already
a problem on the first homework
00:20:43.430 --> 00:20:47.300
set that is in that spirit.
00:20:47.300 --> 00:20:50.450
So if you compare what I'm
doing here to the homework set,
00:20:50.450 --> 00:20:52.400
you'll see that there
are different scales.
00:20:52.400 --> 00:20:57.110
So fewer than delta
n cubed triangles
00:20:57.110 --> 00:21:12.740
can be made triangle free
by removing epsilon n
00:21:12.740 --> 00:21:13.730
squared edges.
00:21:22.028 --> 00:21:23.820
So if you have a small
number of triangles,
00:21:23.820 --> 00:21:25.320
you can get rid of
all the triangles
00:21:25.320 --> 00:21:27.935
by removing a small
number of edges.
00:21:27.935 --> 00:21:29.310
If I put it that
way, it actually
00:21:29.310 --> 00:21:31.230
sounds kind of trivial.
00:21:31.230 --> 00:21:33.150
You just get rid of
all the triangles.
00:21:33.150 --> 00:21:36.550
But if you look at the scales
it's not trivial at all,
00:21:36.550 --> 00:21:41.740
because there are only a
subcubic number of triangles.
00:21:41.740 --> 00:21:45.150
So if you take out one
edge from each triangle,
00:21:45.150 --> 00:21:48.425
maybe you got rid
of a lot of edges.
00:21:48.425 --> 00:21:50.300
So this is a very innocent
looking statement,
00:21:50.300 --> 00:21:53.720
but it's actually
incredibly deep and tricky.
00:21:53.720 --> 00:21:56.060
Before jumping to the
proof, let me first
00:21:56.060 --> 00:22:02.120
show you an equivalent
reformulation of the statement
00:22:02.120 --> 00:22:04.640
that also helps you to think
about what this statement is
00:22:04.640 --> 00:22:05.350
trying to say.
00:22:16.480 --> 00:22:19.930
So the triangle removal lemma
can be equivalently stated
00:22:19.930 --> 00:22:28.720
as saying that
every n vertex graph
00:22:28.720 --> 00:22:35.180
with a subcubic
number of triangles--
00:22:35.180 --> 00:22:38.320
so little o of n
cubed triangles--
00:22:42.780 --> 00:22:56.240
can be made triangle free
by removing a subquadratic--
00:22:56.240 --> 00:22:58.700
namely, little o of n squared--
00:22:58.700 --> 00:22:59.460
number of edges.
00:23:05.710 --> 00:23:07.140
So this is an
equivalent statement
00:23:07.140 --> 00:23:09.460
to what I wrote above,
although it actually
00:23:09.460 --> 00:23:12.550
takes some thought to figure
out what this is even saying
00:23:12.550 --> 00:23:16.440
because everybody loves
using asymptotic notation,
00:23:16.440 --> 00:23:18.920
but there is also
ambiguity with,
00:23:18.920 --> 00:23:21.250
what do you mean by
asymptotic notation,
00:23:21.250 --> 00:23:26.200
especially if it appears in
the hypothesis of a claim?
00:23:26.200 --> 00:23:27.950
So what do you think
this statement means?
00:23:27.950 --> 00:23:33.310
Can you write out
more of a full form?
00:23:33.310 --> 00:23:35.770
I think of this
as a lazy version
00:23:35.770 --> 00:23:38.090
of trying to say something.
00:23:38.090 --> 00:23:42.010
So what do you mean by having
little o of n cubed triangles?
00:23:46.990 --> 00:23:47.986
Yes.
00:23:47.986 --> 00:23:49.978
AUDIENCE: The
sequence of the graph.
00:23:49.978 --> 00:23:50.974
[INAUDIBLE]
00:23:54.545 --> 00:23:58.262
AUDIENCE: [INAUDIBLE]
function has n and only n.
00:23:58.262 --> 00:23:59.236
[INAUDIBLE]
00:24:04.277 --> 00:24:04.860
PROFESSOR: OK.
00:24:04.860 --> 00:24:05.360
Great.
00:24:05.360 --> 00:24:07.710
So I have a sequence of graphs.
00:24:07.710 --> 00:24:10.207
And also, we can put
some functions in.
00:24:10.207 --> 00:24:11.790
So I'll write down
the statement here,
00:24:11.790 --> 00:24:12.957
but that's kind of the idea.
00:24:12.957 --> 00:24:14.820
We're looking at not
just a single graph,
00:24:14.820 --> 00:24:17.340
but we're looking at a sequence.
00:24:17.340 --> 00:24:23.490
Another way to say this
is that for every function
00:24:23.490 --> 00:24:28.390
fn, that is subcubic.
00:24:28.390 --> 00:24:32.730
So for example, if f of n is
n cubed divided about log n,
00:24:32.730 --> 00:24:39.060
there exists some function
g, which is subquadratic,
00:24:39.060 --> 00:24:42.810
such that if you replace
the first one by f of n
00:24:42.810 --> 00:24:46.800
and the second thing by g of n,
then this is a true statement.
00:24:46.800 --> 00:24:48.940
And I'll leave it to
you as an exercise
00:24:48.940 --> 00:24:52.230
in quantified
elimination, let's say,
00:24:52.230 --> 00:24:54.948
to explain why these
two statements are
00:24:54.948 --> 00:24:55.990
equivalent to each other.
00:25:04.380 --> 00:25:09.750
I want to explain a recipe for
applying Szemerédi's regularity
00:25:09.750 --> 00:25:10.250
lemma.
00:25:10.250 --> 00:25:12.680
How does one use
the regularity lemma
00:25:12.680 --> 00:25:17.990
to prove, well, statements
in graph theory?
00:25:17.990 --> 00:25:22.100
The most standard applications
of regularity lemma
00:25:22.100 --> 00:25:23.810
generally have the
following steps.
00:25:26.990 --> 00:25:29.050
Let me call this a recipe.
00:25:29.050 --> 00:25:30.330
And we'll see it a few times.
00:25:46.510 --> 00:25:54.010
The first step is we apply
Szemerédi's regularity lemma
00:25:54.010 --> 00:25:55.828
to obtain a partition.
00:26:08.290 --> 00:26:11.653
So let me call the
first step partition.
00:26:16.860 --> 00:26:20.520
In the second step, we look at
the partition that we obtained,
00:26:20.520 --> 00:26:22.410
and we clean it up.
00:26:22.410 --> 00:26:26.610
So in the partition, you have
some irregular pairs that
00:26:26.610 --> 00:26:28.690
are undesirable to work with.
00:26:28.690 --> 00:26:31.160
And there are some other
pairs that we'll see.
00:26:31.160 --> 00:26:35.820
So in particular, if
your pair involves
00:26:35.820 --> 00:26:39.300
edges that are fairly
sparse or subsets
00:26:39.300 --> 00:26:41.825
of vertices that are
fairly small, then
00:26:41.825 --> 00:26:43.200
maybe we don't
want to touch them
00:26:43.200 --> 00:26:47.320
because they're kind of
not so good to deal with.
00:26:47.320 --> 00:26:52.480
So we're going to
clean the graph
00:26:52.480 --> 00:27:10.140
by removing edges in irregular
pairs and low density pairs.
00:27:17.100 --> 00:27:19.910
And unless you're using the
version of regularity lemma
00:27:19.910 --> 00:27:22.920
that allows you to
have equitable parts,
00:27:22.920 --> 00:27:31.320
you also want to get rid of
edges where one of the parts
00:27:31.320 --> 00:27:32.250
is too small.
00:27:43.210 --> 00:27:49.630
And the third step,
I'll call this count.
00:27:49.630 --> 00:27:52.690
Once you've cleaned up
the regularity partition,
00:27:52.690 --> 00:27:55.690
say, well, let's try
to find some patterns.
00:27:55.690 --> 00:28:05.830
If you find one pattern
in the cleaned graph--
00:28:11.780 --> 00:28:23.450
and we can use the counting
lemma to find lots of patterns.
00:28:23.450 --> 00:28:27.170
Here, for the purpose of
triangle removal lemma
00:28:27.170 --> 00:28:29.270
and what we've
been doing so far,
00:28:29.270 --> 00:28:32.510
pattern just means a triangle.
00:28:32.510 --> 00:28:35.110
So we're going to use the
triangle counting lemma to find
00:28:35.110 --> 00:28:37.630
us lots of triangles.
00:28:37.630 --> 00:28:39.250
So we'll see the
details in a bit.
00:28:39.250 --> 00:28:42.020
But if we run through
the strategy--
00:28:42.020 --> 00:28:43.540
you give me a graph.
00:28:43.540 --> 00:28:46.980
Let's say, starting from
the triangle removal lemma,
00:28:46.980 --> 00:28:49.530
it has a small
number of triangles.
00:28:49.530 --> 00:28:53.000
You apply the
partition, clean it up,
00:28:53.000 --> 00:28:55.140
and I claim this
cleaning removes
00:28:55.140 --> 00:28:56.970
a small number of edges.
00:28:56.970 --> 00:29:00.690
And it should result in
a triangle free graph
00:29:00.690 --> 00:29:04.770
because if it did not result
in a triangle free graph, then
00:29:04.770 --> 00:29:06.506
there's some triangle.
00:29:06.506 --> 00:29:09.760
And from that triangle I can
apply the triangle counting
00:29:09.760 --> 00:29:13.600
lemma to get lots of triangles.
00:29:13.600 --> 00:29:16.000
And that would
violate the hypothesis
00:29:16.000 --> 00:29:18.820
of the triangle removal lemma.
00:29:18.820 --> 00:29:22.350
So that's how the
proof is going to go.
00:29:22.350 --> 00:29:25.300
So I want to take
a very quick break.
00:29:25.300 --> 00:29:26.970
And then when we
come back, we'll
00:29:26.970 --> 00:29:31.174
see the details of how to
apply the irregularity lemma.
00:29:31.174 --> 00:29:32.800
Are there any questions so far?
00:29:37.730 --> 00:29:38.716
Yeah?
00:29:38.716 --> 00:29:41.190
AUDIENCE: So when
we're removing edges
00:29:41.190 --> 00:29:44.830
in one of the [INAUDIBLE],,
is that too small?
00:29:44.830 --> 00:29:48.477
Can we do that for every
vertex, or is it too small?
00:29:48.477 --> 00:29:50.060
PROFESSOR: So you're
asking about what
00:29:50.060 --> 00:29:54.672
happens when we remove
vertexes that are too small.
00:29:54.672 --> 00:29:56.380
You will see in the
details of the proof.
00:29:56.380 --> 00:29:58.000
So hold on to that
question for a bit.
00:30:00.560 --> 00:30:03.860
More questions.
00:30:03.860 --> 00:30:04.360
OK.
00:30:04.360 --> 00:30:07.517
So let's see the proof of
the triangle removal lemma.
00:30:18.470 --> 00:30:25.230
So the first step is to apply
Szemerédi's regularity lemma
00:30:25.230 --> 00:30:27.820
and find a partition.
00:30:27.820 --> 00:30:34.380
So we'll find a partition
that's epsilon over 4 regular.
00:30:45.000 --> 00:30:46.800
So here, epsilon
is the same epsilon
00:30:46.800 --> 00:30:49.437
in the statement-- in the top
statement-- of the triangle
00:30:49.437 --> 00:30:50.020
removal lemma.
00:30:52.610 --> 00:31:00.450
In the second step,
let's clean the graph
00:31:00.450 --> 00:31:11.460
by removing all edges in--
00:31:11.460 --> 00:31:14.445
so we are going to get
rid of edges between--
00:31:19.760 --> 00:31:20.460
OK.
00:31:20.460 --> 00:31:21.502
So let me do it this way.
00:31:21.502 --> 00:31:28.340
So all edges between
the vi and the vj
00:31:28.340 --> 00:31:42.500
whenever vi and vj is
not epsilon regular.
00:31:42.500 --> 00:31:45.876
Get rid of the edges
between irregular parts.
00:31:45.876 --> 00:31:47.370
AUDIENCE: Epsilon
over 4 regular.
00:31:47.370 --> 00:31:48.078
PROFESSOR: Sorry?
00:31:48.078 --> 00:31:49.633
AUDIENCE: Epsilon
over 4 regular.
00:31:49.633 --> 00:31:51.050
PROFESSOR: Epsilon
over 4 regular.
00:31:51.050 --> 00:31:51.550
Thank you.
00:31:57.240 --> 00:32:05.586
Also, between parts where the
edge density is too small--
00:32:08.778 --> 00:32:11.910
if the edge density is
less than epsilon over 2,
00:32:11.910 --> 00:32:12.960
get rid of those edges.
00:32:15.790 --> 00:32:22.380
And if one of the
two vertex sets
00:32:22.380 --> 00:32:25.500
has size too small--
and here, too small
00:32:25.500 --> 00:32:32.550
means epsilon over 4M
times the size of n.
00:32:32.550 --> 00:32:37.630
So here-- OK.
00:32:37.630 --> 00:32:42.370
So let me use big M for
the number of parts.
00:32:42.370 --> 00:32:45.130
So that's the M that comes
out of Szemerédi's regularity
00:32:45.130 --> 00:32:45.790
lemma.
00:32:45.790 --> 00:32:48.310
If you like, some of the
vertex sets can be empty.
00:32:48.310 --> 00:32:49.855
It doesn't change the proof.
00:32:49.855 --> 00:32:55.280
And n is the number of
vertices in the graph.
00:32:59.045 --> 00:33:00.920
And this step, you don't
really need the step
00:33:00.920 --> 00:33:04.790
if your regular
partition is equitable.
00:33:04.790 --> 00:33:08.030
So let's see how many
vertices-- how many edges
00:33:08.030 --> 00:33:11.300
have we gotten rid of.
00:33:11.300 --> 00:33:16.300
We want to show that we're
not deleting too many edges.
00:33:16.300 --> 00:33:18.040
In the first step--
00:33:18.040 --> 00:33:24.440
so the number of deleted edges.
00:33:24.440 --> 00:33:29.240
In the first step, you see that
the number of edges deleted
00:33:29.240 --> 00:33:39.980
is at most the sum of product of
vi vj when you sum over ij such
00:33:39.980 --> 00:33:46.980
that this pair is not epsilon
regular or epsilon 4 regular.
00:33:46.980 --> 00:33:50.540
Epsilon over 4 regular.
00:33:50.540 --> 00:33:53.780
By the definition of an
epsilon regular partition,
00:33:53.780 --> 00:33:59.300
the sum here is at most
epsilon over 4 times n squared.
00:34:04.480 --> 00:34:10.620
In the second step, I'm getting
rid of low density pairs.
00:34:10.620 --> 00:34:12.900
By the virtue of them
being low density,
00:34:12.900 --> 00:34:15.630
I'm not removing so many edges.
00:34:15.630 --> 00:34:20.560
So at most epsilon over
2 times n squared edges
00:34:20.560 --> 00:34:21.360
I'm getting rid of.
00:34:24.620 --> 00:34:33.260
In the third part, you see every
time I take a very small piece,
00:34:33.260 --> 00:34:38.060
every vertex here is adjacent
to at most n vertices.
00:34:38.060 --> 00:34:42.909
So the number of such
things, such edges
00:34:42.909 --> 00:34:45.429
I'm getting rid of
in the last step
00:34:45.429 --> 00:34:51.360
is at most this number times a
number of parts M then times n.
00:34:51.360 --> 00:34:56.739
So it's at most epsilon
over 4 times n squared.
00:35:00.540 --> 00:35:03.600
So here I'm telling
you how many edges
00:35:03.600 --> 00:35:06.510
I've deleted in each step.
00:35:06.510 --> 00:35:10.710
And in total, putting
them together,
00:35:10.710 --> 00:35:17.153
we see that we get rid of at
most epsilon n squared edges
00:35:17.153 --> 00:35:17.820
from this graph.
00:35:24.350 --> 00:35:26.600
So that's the cleaning step.
00:35:26.600 --> 00:35:29.670
So we cleaned up the graph
by getting rid of low density
00:35:29.670 --> 00:35:33.750
pairs, getting rid of irregular
pairs, and small vertex s.
00:35:38.930 --> 00:35:41.330
Now suppose, after
this cleaning,
00:35:41.330 --> 00:35:44.810
some triangles still remains.
00:35:44.810 --> 00:35:46.530
So we're now onto
the third step.
00:35:46.530 --> 00:35:54.320
So suppose some
triangle remains.
00:36:00.430 --> 00:36:03.050
So where could
this triangle sit?
00:36:06.130 --> 00:36:08.020
Has to be between three parts--
00:36:08.020 --> 00:36:10.570
vi, vj, and vk.
00:36:14.130 --> 00:36:16.537
I, j, and k, they don't
have to be distinct.
00:36:16.537 --> 00:36:18.870
So the argument will be OK
if some of them are the same,
00:36:18.870 --> 00:36:22.660
but it's easier to draw
if they're all different.
00:36:22.660 --> 00:36:24.560
So I have some
triangle, like that.
00:36:28.040 --> 00:36:32.270
Because these edges have not
yet been deleted in the cleaning
00:36:32.270 --> 00:36:38.898
step, I know that the vertex
sets are not too small,
00:36:38.898 --> 00:36:40.440
the edge densities
are not too small,
00:36:40.440 --> 00:36:43.600
and they are all
regular with each other.
00:36:43.600 --> 00:37:00.440
So here, each pair in vi, vj,
vk is epsilon over 4 regular
00:37:00.440 --> 00:37:09.660
and have edge density
at least epsilon over 2.
00:37:09.660 --> 00:37:15.490
And now we apply the
triangle counting lemma,
00:37:15.490 --> 00:37:21.770
and we find that the
number of triangles
00:37:21.770 --> 00:37:24.410
with one vertex in
vi, one vertex in vj,
00:37:24.410 --> 00:37:31.030
one vertex in the vk is at
least this quantity here.
00:37:36.390 --> 00:37:39.070
So that's a correction factor.
00:37:39.070 --> 00:37:41.960
So 1 over this 2 epsilon.
00:37:41.960 --> 00:37:44.980
And then a bunch of densities--
so densities are not too small.
00:37:44.980 --> 00:37:48.240
So I have at least
epsilon over 4 n
00:37:48.240 --> 00:37:52.590
cubed multiplied by the
sizes of the vertex sets.
00:37:58.360 --> 00:37:59.725
Now I know that--
00:37:59.725 --> 00:38:05.080
use the fact that these part
sizes are not too small.
00:38:05.080 --> 00:38:15.460
So I have that.
00:38:21.120 --> 00:38:26.830
Just in case, if i, j, and
k happen to be the same,
00:38:26.830 --> 00:38:28.660
or two of them happen
to be the same,
00:38:28.660 --> 00:38:32.980
I might overcount the number
of triangles a little bit.
00:38:32.980 --> 00:38:37.410
But at most, you overcount
by a factor of 6.
00:38:37.410 --> 00:38:38.140
So that's OK.
00:38:38.140 --> 00:38:40.060
So if you're worried
about that, put
00:38:40.060 --> 00:38:48.330
the 1 over 6 factor in, just
in case i, j, k not distinct.
00:38:51.120 --> 00:38:55.790
Or if you like, in the
cleaning step, you can--
00:38:55.790 --> 00:38:58.010
if you apply the equitable
version of the regularity
00:38:58.010 --> 00:39:00.682
lemma, you can also get rid
of edges inside the parts.
00:39:00.682 --> 00:39:02.140
But there are many
ways to do this.
00:39:02.140 --> 00:39:03.620
It's not an important step.
00:39:06.720 --> 00:39:12.560
Now, this quantity, let
me set it to be delta.
00:39:12.560 --> 00:39:15.530
You see, delta is a
function of epsilon
00:39:15.530 --> 00:39:17.390
because M is a
function of epsilon.
00:39:21.330 --> 00:39:23.510
So now, looking back
at the statement,
00:39:23.510 --> 00:39:27.530
you see for every epsilon
there exists a delta, such
00:39:27.530 --> 00:39:38.160
that if your graph has fewer
than delta n cubed triangles,
00:39:38.160 --> 00:39:43.050
then let me get rid
of all those edges.
00:39:43.050 --> 00:39:47.060
I've gotten rid of fewer
than epsilon n squared edges,
00:39:47.060 --> 00:39:50.310
and the remaining graph
should be triangle free.
00:39:50.310 --> 00:39:52.340
Because if it were
not triangle free,
00:39:52.340 --> 00:39:53.960
then I can find some triangle.
00:39:53.960 --> 00:39:57.790
And that will lead to
a lot more triangles.
00:39:57.790 --> 00:40:00.020
So for example, if you
set this as delta over 2,
00:40:00.020 --> 00:40:03.980
then this will give you 2
delta n cubed triangles.
00:40:03.980 --> 00:40:11.870
Therefore, it would
contradict the hypothesis.
00:40:17.840 --> 00:40:22.380
And that finishes the proof
of the triangle removal lemma,
00:40:22.380 --> 00:40:32.410
saying that thus the resulting
graph is triangle free.
00:40:38.990 --> 00:40:41.930
So that's the proof of the
triangle removal lemma.
00:40:41.930 --> 00:40:43.970
So let me recap.
00:40:43.970 --> 00:40:48.250
We start with a graph, apply
Szemerédi's regularity lemma,
00:40:48.250 --> 00:40:51.220
and clean up the regularity
partition by getting rid of low
00:40:51.220 --> 00:40:54.550
density pairs, getting
rid of irregular pairs,
00:40:54.550 --> 00:40:58.030
and getting rid of edges
touching a very small vertex
00:40:58.030 --> 00:40:58.650
set.
00:40:58.650 --> 00:41:01.720
And I claim that the resulting
graph, after cleaning up,
00:41:01.720 --> 00:41:03.680
should be triangle free.
00:41:03.680 --> 00:41:06.730
Because if it were not triangle
free and I find some triangle,
00:41:06.730 --> 00:41:11.290
then I should be able to use
that triple of vertex sets,
00:41:11.290 --> 00:41:13.360
combined with a
triangle counting lemma,
00:41:13.360 --> 00:41:16.300
to produce a lot
more triangles and.
00:41:16.300 --> 00:41:19.600
That would violate the
hypothesis of the theorem.
00:41:23.177 --> 00:41:23.760
Any questions?
00:41:23.760 --> 00:41:24.480
Yeah.
00:41:24.480 --> 00:41:28.435
AUDIENCE: Where are you using
that there exists a triangle?
00:41:28.435 --> 00:41:29.310
PROFESSOR: Ah, great.
00:41:29.310 --> 00:41:32.350
So question is, where am I
using there exists a triangle?
00:41:32.350 --> 00:41:34.760
If there were no
triangles, then we're done.
00:41:34.760 --> 00:41:37.630
So the purpose of the triangle--
the claim in the triangle
00:41:37.630 --> 00:41:40.450
removal lemma is that you
can get rid of all triangles
00:41:40.450 --> 00:41:43.540
by removing at most
epsilon n squared edges.
00:41:43.540 --> 00:41:47.270
AUDIENCE: So say we
did that, and now--
00:41:47.270 --> 00:41:51.198
why does this not prove that
we still have triangles?
00:41:51.198 --> 00:41:52.990
PROFESSOR: Can you say
your question again?
00:41:52.990 --> 00:41:56.590
AUDIENCE: So say we've removed
everything by our cleaning
00:41:56.590 --> 00:41:59.200
step, and we've removed
epsilon n squared edges,
00:41:59.200 --> 00:42:01.690
why does this logic
not prove that we still
00:42:01.690 --> 00:42:04.415
have delta n cubed triangles.
00:42:07.317 --> 00:42:07.900
PROFESSOR: OK.
00:42:07.900 --> 00:42:10.390
So let me try to
answer your question.
00:42:10.390 --> 00:42:13.900
So why does this proof
show that you still
00:42:13.900 --> 00:42:16.450
have delta n cubed triangles?
00:42:16.450 --> 00:42:18.940
So I only set delta at the end.
00:42:18.940 --> 00:42:20.680
But of course, you
can also set delta
00:42:20.680 --> 00:42:23.210
in the beginning of this proof.
00:42:23.210 --> 00:42:25.760
So I'm saying that
you do the step.
00:42:25.760 --> 00:42:28.910
You get rid of epsilon
n squared edges.
00:42:28.910 --> 00:42:31.030
And now I claim,
after the step--
00:42:31.030 --> 00:42:39.940
so I claim the remaining
graph is triangle free.
00:42:43.430 --> 00:42:47.805
If it were not triangle
free, then, well,
00:42:47.805 --> 00:42:49.698
it has some triangle.
00:42:49.698 --> 00:42:51.490
Then the triangle
counting lemma would tell
00:42:51.490 --> 00:42:54.740
me there are lots of triangles.
00:42:54.740 --> 00:42:57.820
And that would
contradict the hypothesis
00:42:57.820 --> 00:43:01.360
where we assume that
this graph G has
00:43:01.360 --> 00:43:03.848
a small number of triangles.
00:43:03.848 --> 00:43:06.243
AUDIENCE: So if
there is no triangle,
00:43:06.243 --> 00:43:11.991
then we've removed edges between
vi, vj, or vi, vk, or vj,
00:43:11.991 --> 00:43:14.187
vk for any three i, j, k.
00:43:14.187 --> 00:43:15.270
PROFESSOR: That's correct.
00:43:15.270 --> 00:43:19.710
So we're saying, if you do
not have any triangles-- well,
00:43:19.710 --> 00:43:22.990
after the cleaning
step, we have gotten rid
00:43:22.990 --> 00:43:27.152
of all the edges
between the bad pairs.
00:43:27.152 --> 00:43:29.110
And I'm claiming that
there is no configuration
00:43:29.110 --> 00:43:32.210
like this left.
00:43:32.210 --> 00:43:35.330
And this is the proof because
if you have some configuration
00:43:35.330 --> 00:43:38.570
where you did not delete the
edges between these three
00:43:38.570 --> 00:43:42.270
parts, then you
should be able to get
00:43:42.270 --> 00:43:44.985
a lot more triangles from
the triangle counting lemma.
00:43:48.560 --> 00:43:49.060
Yeah.
00:43:49.060 --> 00:43:50.435
AUDIENCE: What if
there were lots
00:43:50.435 --> 00:43:53.968
of triangles inside each
individual vi, vj, vk?
00:43:53.968 --> 00:43:55.510
PROFESSOR: You asked
me, what happens
00:43:55.510 --> 00:43:58.583
if there were a lot of triangles
inside each vi, vj, vk?
00:43:58.583 --> 00:43:59.250
So that is fine.
00:43:59.250 --> 00:44:01.230
If you find some triangle--
00:44:01.230 --> 00:44:03.925
so this picture,
i, j, or k, they
00:44:03.925 --> 00:44:05.050
do not have to be distinct.
00:44:07.830 --> 00:44:10.970
So the same proof works if
i, j, and k, some of them
00:44:10.970 --> 00:44:13.440
are equal to each other.
00:44:13.440 --> 00:44:14.095
Yep.
00:44:14.095 --> 00:44:16.945
AUDIENCE: [INAUDIBLE] but,
I don't really understand
00:44:16.945 --> 00:44:18.845
why-- isn't delta over 2 there?
00:44:21.420 --> 00:44:23.920
PROFESSOR: So you're asking,
why did I put the delta over 2?
00:44:23.920 --> 00:44:26.800
Just because I put less
than or equal to delta.
00:44:26.800 --> 00:44:28.420
If I put strictly
less than delta,
00:44:28.420 --> 00:44:30.166
then I don't need
a delta over 2.
00:44:30.166 --> 00:44:34.087
AUDIENCE: [INAUDIBLE]
delta over 2 or 2 delta.
00:44:34.087 --> 00:44:34.670
PROFESSOR: OK.
00:44:37.205 --> 00:44:38.080
Don't worry about it.
00:44:47.470 --> 00:44:47.970
Yes.
00:44:47.970 --> 00:44:49.830
AUDIENCE: Is there
a way to generalize
00:44:49.830 --> 00:44:52.897
the triangle counting
lemma to a general graph?
00:44:52.897 --> 00:44:53.480
PROFESSOR: OK.
00:44:53.480 --> 00:44:55.272
You're asking, is there
a way to generalize
00:44:55.272 --> 00:44:57.240
the triangle counting
lemma to a general graph?
00:44:57.240 --> 00:44:57.740
So yes.
00:44:57.740 --> 00:45:00.770
We will see that not today
but I think next time.
00:45:04.560 --> 00:45:07.860
Any more questions?
00:45:07.860 --> 00:45:09.780
Great.
00:45:09.780 --> 00:45:14.600
So why do people care about
the triangle removal lemma?
00:45:14.600 --> 00:45:19.880
So it's a nice, maybe somewhat
unintuitive statement.
00:45:19.880 --> 00:45:22.170
But there was a very good
reason why the statement was
00:45:22.170 --> 00:45:24.630
formulated, and
it's because you can
00:45:24.630 --> 00:45:26.520
use it to prove Roth's theorem.
00:45:26.520 --> 00:45:27.960
So that's what I
want to explain,
00:45:27.960 --> 00:45:30.990
how to connect this
graph theoretic statement
00:45:30.990 --> 00:45:35.130
to a statement about
three-term AP--
00:45:35.130 --> 00:45:38.670
three AP-free subsets
of the integers.
00:45:38.670 --> 00:45:41.940
This goes back to the very
connection between graph theory
00:45:41.940 --> 00:45:44.790
and additive combinatorics
that I highlighted
00:45:44.790 --> 00:45:47.570
in the first lecture.
00:45:47.570 --> 00:45:53.900
First, let me state a corollary
of the triangle removal lemma--
00:45:53.900 --> 00:45:59.655
namely, that if you have an
n vertex graph G, where--
00:46:05.030 --> 00:46:15.280
so if G is n vertex, and
every edge is in exactly one
00:46:15.280 --> 00:46:29.870
triangle, then the
number of edges of G
00:46:29.870 --> 00:46:32.490
is little o of n squared.
00:46:36.500 --> 00:46:38.710
These are actually
kind of strange graphs.
00:46:38.710 --> 00:46:40.630
Every edge is in
exactly one triangle.
00:46:43.080 --> 00:46:43.580
OK.
00:46:47.340 --> 00:46:51.360
Well, the number
of triangles in G--
00:46:57.040 --> 00:46:59.560
ever edge is in
exactly one triangle.
00:46:59.560 --> 00:47:02.920
So the number of triangles
in G is the number
00:47:02.920 --> 00:47:04.890
of edges divided by 3.
00:47:09.170 --> 00:47:16.050
The number of edges
is at most n squared.
00:47:16.050 --> 00:47:20.780
So this quantity is at
most quadratic order,
00:47:20.780 --> 00:47:24.140
which in particular is
little o of n cubed.
00:47:27.240 --> 00:47:29.970
And thus the triangle
removal lemma
00:47:29.970 --> 00:47:35.030
tells us that G can
be made triangle
00:47:35.030 --> 00:47:47.550
free by removing little
o of n squared edges.
00:47:55.130 --> 00:48:03.430
On the other hand,
since every edge
00:48:03.430 --> 00:48:08.940
is in exactly one
triangle, well,
00:48:08.940 --> 00:48:11.730
how many edges do you
need to remove to get rid
00:48:11.730 --> 00:48:13.530
of all the triangles?
00:48:13.530 --> 00:48:17.025
Well, I need to remove at
least a third of the edges.
00:48:17.025 --> 00:48:25.030
I need to remove at
least a third of edges
00:48:25.030 --> 00:48:28.660
to make G triangle free.
00:48:34.110 --> 00:48:37.970
Putting these two
claims together,
00:48:37.970 --> 00:48:40.240
we see that the
number of edges of G
00:48:40.240 --> 00:48:42.630
must be little o of n squared.
00:48:48.410 --> 00:48:50.342
Any questions?
00:48:50.342 --> 00:48:53.272
AUDIENCE: Are there not more
elementary ways to prove this?
00:48:53.272 --> 00:48:53.980
PROFESSOR: Great.
00:48:53.980 --> 00:48:58.170
Question is, are there not more
elementary ways to prove this?
00:48:58.170 --> 00:49:00.140
Let me make some
comments about that.
00:49:00.140 --> 00:49:12.000
So the short answer
is, yes but not really.
00:49:12.000 --> 00:49:13.525
And really, the answer is no.
00:49:13.525 --> 00:49:16.200
[LAUGHTER]
00:49:16.200 --> 00:49:21.000
So you can ask, what
about quantitative bounds?
00:49:21.000 --> 00:49:24.030
Because what is more elementary,
what is less elementary
00:49:24.030 --> 00:49:25.320
is kind of subjective.
00:49:25.320 --> 00:49:27.223
But quantitative
bounds, something
00:49:27.223 --> 00:49:28.140
that is very concrete.
00:49:28.140 --> 00:49:29.370
It's hard to argue.
00:49:29.370 --> 00:49:38.560
So if you look at the triangle
removal lemma, you can ask,
00:49:38.560 --> 00:49:43.990
how is the dependence
of delta on epsilon?
00:49:43.990 --> 00:49:46.940
So what does the proof give you?
00:49:46.940 --> 00:49:49.145
Where's the bottleneck?
00:49:49.145 --> 00:49:53.920
The bottleneck is always
in the application
00:49:53.920 --> 00:49:58.450
of Szemerédi's regularity
lemma-- namely in this M.
00:49:58.450 --> 00:50:01.060
So none of the other
epsilons really matter.
00:50:01.060 --> 00:50:04.750
It's this M that kills you in
terms of quantitative bounds.
00:50:04.750 --> 00:50:13.850
So in triangle removal lemma,
this proof gives 1 over delta.
00:50:13.850 --> 00:50:26.210
So you can take 1 over delta
being a tower of twos of height
00:50:26.210 --> 00:50:34.180
at most polynomial
in 1 over epsilon.
00:50:34.180 --> 00:50:36.550
So that is your different proof.
00:50:36.550 --> 00:50:44.100
Well, the best known
bound due to Fox
00:50:44.100 --> 00:50:48.590
is that you can
replace this height
00:50:48.590 --> 00:50:55.170
by a different height that
is at most essentially
00:50:55.170 --> 00:50:59.040
logarithmic in 1 over epsilon.
00:50:59.040 --> 00:51:01.290
Still a tower of twos.
00:51:01.290 --> 00:51:03.150
So we've changed some
really big number
00:51:03.150 --> 00:51:08.360
to another, but slightly
smaller, really big number.
00:51:08.360 --> 00:51:10.270
So this is still an
astronomical number
00:51:10.270 --> 00:51:13.850
for any reasonable epsilon.
00:51:13.850 --> 00:51:18.490
And in terms of that corollary,
basically the only known proof
00:51:18.490 --> 00:51:21.270
goes through the
triangle removal lemma.
00:51:21.270 --> 00:51:25.780
Currently, we do not know any
other approach to this problem.
00:51:25.780 --> 00:51:29.530
And you'll see later on
that, well, what's the best
00:51:29.530 --> 00:51:30.790
thing that we can hope for?
00:51:30.790 --> 00:51:35.200
So it is quite possible that
there are other proofs that
00:51:35.200 --> 00:51:37.270
are yet to be found.
00:51:37.270 --> 00:51:39.590
So that's actually--
people believe this,
00:51:39.590 --> 00:51:41.710
that this is not
the right proof,
00:51:41.710 --> 00:51:44.200
that maybe there's some
other way to do this.
00:51:44.200 --> 00:51:47.980
And the best lower bound, which
we'll see either later today
00:51:47.980 --> 00:52:02.000
or next time, shows that we
cannot do better than 1 over
00:52:02.000 --> 00:52:11.080
epsilon being essentially just a
little bit more than polynomial
00:52:11.080 --> 00:52:12.240
in epsilon.
00:52:12.240 --> 00:52:17.590
So epsilon raised
to something that is
00:52:17.590 --> 00:52:20.300
logarithmic in 1 over epsilon.
00:52:23.210 --> 00:52:25.410
So you can think
of this as very--
00:52:25.410 --> 00:52:28.010
it's a little bit bigger than
polynomial in 1 over epsilon
00:52:28.010 --> 00:52:31.590
but not that much bigger than
polynomial in 1 over epsilon.
00:52:31.590 --> 00:52:33.590
So there is a very big
gap in our knowledge
00:52:33.590 --> 00:52:38.030
on what is the right dependence
between epsilon and delta
00:52:38.030 --> 00:52:40.640
in the triangle removal lemma.
00:52:40.640 --> 00:52:42.272
And that's one of the--
00:52:42.272 --> 00:52:45.380
it's a major open problem
in extremal combinatorics
00:52:45.380 --> 00:52:46.460
to close this gap.
00:52:50.570 --> 00:52:51.640
Other questions?
00:52:55.400 --> 00:52:57.000
All right.
00:52:57.000 --> 00:52:58.250
So let's prove Roth's theorem.
00:53:00.980 --> 00:53:06.710
So let me remind you that
Roth's theorem, which
00:53:06.710 --> 00:53:09.260
we saw in the very
first lecture,
00:53:09.260 --> 00:53:13.340
says that if you
have a subset of 1
00:53:13.340 --> 00:53:19.620
through n that is free
of three-term arithmetic
00:53:19.620 --> 00:53:27.320
progressions, then the size
of the set must be sublinear.
00:53:33.060 --> 00:53:36.610
So what does this have to do
with a triangle removal lemma?
00:53:36.610 --> 00:53:38.540
So if you remember
the first lecture,
00:53:38.540 --> 00:53:40.540
maybe the connection
shouldn't be so surprising.
00:53:40.540 --> 00:53:45.640
What we will do is we
will set up a graph,
00:53:45.640 --> 00:53:48.640
starting from some
arithmetic sets such
00:53:48.640 --> 00:53:51.700
that the graph encodes some
arithmetic information--
00:53:51.700 --> 00:53:56.950
in particular, the three-term
APs in your graph, in the set,
00:53:56.950 --> 00:54:02.760
correspond to the
triangles in the graph.
00:54:02.760 --> 00:54:05.550
So let's set up this graph.
00:54:05.550 --> 00:54:14.070
It will be helpful to view A
not as a subset of the integers.
00:54:14.070 --> 00:54:15.750
It'll just be more
convenient to view it
00:54:15.750 --> 00:54:17.250
as a subset of a cyclic group.
00:54:19.950 --> 00:54:23.530
Because I don't have to worry
about edge cases so much when
00:54:23.530 --> 00:54:24.780
you're working a cyclic group.
00:54:28.060 --> 00:54:30.800
Here I take M to be 2N plus 1.
00:54:30.800 --> 00:54:34.320
So having it odd makes
my life a bit simpler.
00:54:34.320 --> 00:54:40.050
Then if A is three AP free
subset of 1 through n,
00:54:40.050 --> 00:54:43.680
then I claim that A now sitting
inside this cyclic group
00:54:43.680 --> 00:54:45.650
is also three AP free.
00:54:49.580 --> 00:54:53.141
So it's a subset of Z mod n.
00:54:57.070 --> 00:55:01.080
And what we will do is that we
will set up a certain graph.
00:55:04.620 --> 00:55:14.570
So we will set up a
tripartite graph, x, y, z.
00:55:14.570 --> 00:55:18.130
And here, x, y, and
z are going to be
00:55:18.130 --> 00:55:23.130
M elements whose
vertices are represented
00:55:23.130 --> 00:55:25.230
by elements of Z mod n.
00:55:28.090 --> 00:55:32.530
And I need to tell you what
are the edges of this graph.
00:55:32.530 --> 00:55:34.250
So here are the edges.
00:55:34.250 --> 00:55:42.210
I'm putting an edge between
vertex x and y if and only
00:55:42.210 --> 00:55:51.740
if y minus x is an element of A.
00:55:51.740 --> 00:55:54.650
So it's a rule for how
to put in the edges.
00:55:54.650 --> 00:55:58.550
And this is basically a Cayley
graph, a bipartite variant
00:55:58.550 --> 00:56:01.590
of a Cayley graph.
00:56:01.590 --> 00:56:07.090
Likewise, I put an
edge between x and z.
00:56:07.090 --> 00:56:10.820
So let me put x down
here and y up there.
00:56:10.820 --> 00:56:18.750
So let me put in the edge
between y and z if and only
00:56:18.750 --> 00:56:25.365
if z minus y is an element of A.
00:56:25.365 --> 00:56:31.450
And for the very last pair, it's
similar but slightly different.
00:56:31.450 --> 00:56:37.900
I'm putting that edge if and
only if z minus x divided by 2
00:56:37.900 --> 00:56:41.080
is an element of A. Because
we're in an odd cyclic group
00:56:41.080 --> 00:56:42.070
I can divide by 2.
00:56:46.040 --> 00:56:47.100
So this is a graph.
00:56:47.100 --> 00:56:49.470
So starting with
a set A I give you
00:56:49.470 --> 00:56:53.130
this rule for constructing
this tripartite graph.
00:56:53.130 --> 00:56:55.410
And the question
now is, what are
00:56:55.410 --> 00:56:57.150
the triangles in this graph?
00:57:00.090 --> 00:57:10.190
If the vertices x, y, z is
a triangle, then these three
00:57:10.190 --> 00:57:14.720
numbers by definition,
because of the edges--
00:57:14.720 --> 00:57:17.750
because they're all edges
in this graph, these three
00:57:17.750 --> 00:57:25.380
numbers, they all lie in A.
00:57:25.380 --> 00:57:27.760
But now notice that
these three numbers,
00:57:27.760 --> 00:57:35.480
they form a three-term
arithmetic progression
00:57:35.480 --> 00:57:41.350
because the middle element is
the average of the two others.
00:57:41.350 --> 00:57:44.470
But we said that A is a
set that is three AP free.
00:57:52.300 --> 00:57:55.100
Has no three-term
arithmetic progression.
00:57:55.100 --> 00:57:59.000
So what must be the case?
00:57:59.000 --> 00:58:02.670
So A is 3 AP free.
00:58:02.670 --> 00:58:06.420
But you can still have three
APs using the same element
00:58:06.420 --> 00:58:07.010
three times.
00:58:09.760 --> 00:58:11.760
So all the three-term
arithmetic progressions
00:58:11.760 --> 00:58:13.270
must be of that form.
00:58:13.270 --> 00:58:20.220
So these three numbers must
then equal to each other.
00:58:23.932 --> 00:58:27.540
And in particular, you see
that if you select x and y,
00:58:27.540 --> 00:58:28.540
it determines z.
00:58:31.340 --> 00:58:35.650
This equality here is the
same as saying that x, y,
00:58:35.650 --> 00:58:43.000
and z they themselves form
a three AP in Z mod nz.
00:58:48.280 --> 00:58:51.230
So this is precisely
the description
00:58:51.230 --> 00:58:53.360
of all the triangles
in the graph.
00:59:06.740 --> 00:59:09.340
So all the triangles
in the graph G
00:59:09.340 --> 00:59:12.590
are precisely x, y,
z, where x, y, and z
00:59:12.590 --> 00:59:15.820
form a three-term
arithmetic progression.
00:59:15.820 --> 00:59:31.340
And in particular, every edge of
G lies in exactly one triangle.
00:59:35.400 --> 00:59:36.570
You give me an edge--
00:59:36.570 --> 00:59:37.890
for example, xy--
00:59:37.890 --> 00:59:41.000
I complete it two a
three AP, x, y, z.
00:59:41.000 --> 00:59:42.000
And that's the triangle.
00:59:42.000 --> 00:59:45.220
And that's the unique triangle
that the edge sits in.
00:59:45.220 --> 00:59:48.720
And likewise, if you
give me xz or yz,
00:59:48.720 --> 00:59:50.430
I can produce for you
a unique triangle.
00:59:53.240 --> 00:59:55.140
So we have this graph.
00:59:55.140 --> 00:59:57.620
It has this property that
every edge lies in exactly one
00:59:57.620 --> 00:59:59.690
triangle, so we can
apply the corollary
00:59:59.690 --> 01:00:07.440
up there to deduce a bound
on the total number of edges.
01:00:07.440 --> 01:00:08.780
Well, how many edges are there?
01:00:13.820 --> 01:00:19.850
On one hand, we see that because
it's a Cayley graph, each
01:00:19.850 --> 01:00:21.070
of the three parts--
01:00:21.070 --> 01:00:22.730
there are three parts here.
01:00:22.730 --> 01:00:28.040
Each of the three parts,
if I start with any vertex,
01:00:28.040 --> 01:00:35.240
I have A edges coming out of
that vertex to the next part
01:00:35.240 --> 01:00:37.160
by the construction.
01:00:37.160 --> 01:00:40.280
On the other hand,
by the corollary
01:00:40.280 --> 01:00:43.010
up there, the
number of edges has
01:00:43.010 --> 01:00:45.910
to be little o of M squared.
01:00:50.830 --> 01:00:55.260
And because M is
essentially twice n,
01:00:55.260 --> 01:01:04.950
we obtain that the size
of A is little o of M.
01:01:04.950 --> 01:01:07.490
And that proves Roth's theorem.
01:01:07.490 --> 01:01:07.990
Yeah?
01:01:07.990 --> 01:01:09.698
AUDIENCE: Could you
explain one more time
01:01:09.698 --> 01:01:12.257
why every edge is in
exactly one triangle?
01:01:12.257 --> 01:01:12.840
PROFESSOR: OK.
01:01:12.840 --> 01:01:17.190
So the question is, why is every
edge in exactly one triangle?
01:01:17.190 --> 01:01:18.960
So you know what
all the edges are.
01:01:18.960 --> 01:01:23.010
So this is a description
of what all the edges are.
01:01:23.010 --> 01:01:25.740
And what are all the triangles.
01:01:25.740 --> 01:01:30.240
Well, x, y, z is a triangle
precisely when these three
01:01:30.240 --> 01:01:34.080
expressions all lie in A.
But note that these three
01:01:34.080 --> 01:01:36.120
expressions, they
form a three AP
01:01:36.120 --> 01:01:40.730
because the middle term is
the average of the two others.
01:01:40.730 --> 01:01:43.640
So x, y, z is the
triangle if and only
01:01:43.640 --> 01:01:45.050
if this equation is true.
01:01:47.660 --> 01:01:50.460
And this equation is
true if and only if x,
01:01:50.460 --> 01:01:53.620
y, z form a three AP in Z mod n.
01:01:53.620 --> 01:01:56.560
So if you just read out this
equation, I give you x and y.
01:01:56.560 --> 01:01:59.650
So what is z?
01:01:59.650 --> 01:02:07.300
So all the triangles in x, y,
z are precisely given by three
01:02:07.300 --> 01:02:15.480
APs, where one of the
differences y minus x is in A.
01:02:15.480 --> 01:02:15.980
OK.
01:02:15.980 --> 01:02:17.570
So I give you an edge.
01:02:17.570 --> 01:02:26.710
For example, xy, such
that y minus z is in A.
01:02:26.710 --> 01:02:30.040
And I claim there's a
unique z that completes
01:02:30.040 --> 01:02:31.060
this edge to a triangle.
01:02:35.100 --> 01:02:37.960
Well, it tells you
what that z is.
01:02:37.960 --> 01:02:43.370
z has to be the
element in Z mod m
01:02:43.370 --> 01:02:46.990
that completes x
and y to a three AP.
01:02:46.990 --> 01:02:50.620
Namely, z is the solution
to this equation.
01:02:50.620 --> 01:02:52.310
No other z can work.
01:02:52.310 --> 01:02:54.340
And you can check
that z indeed works
01:02:54.340 --> 01:02:57.100
and that all the
remaining pairs are edges.
01:03:00.060 --> 01:03:02.460
So it's something you can check.
01:03:05.547 --> 01:03:09.610
Any more questions?
01:03:09.610 --> 01:03:14.260
So starting with the set
A that is three AP free,
01:03:14.260 --> 01:03:17.890
we set up this graph
with a property
01:03:17.890 --> 01:03:22.140
that every edge lies in
exactly one triangle.
01:03:22.140 --> 01:03:24.650
And the one triangle basically
corresponds to the fact
01:03:24.650 --> 01:03:29.060
that you always have these
trivial three APs repeating
01:03:29.060 --> 01:03:31.870
the same element three times.
01:03:31.870 --> 01:03:34.840
And then, by applying this
corollary of the triangle
01:03:34.840 --> 01:03:38.200
removal lemma, we
deduce that the number
01:03:38.200 --> 01:03:40.690
of edges in the graph
must be subquadratic.
01:03:40.690 --> 01:03:43.960
So then the size of
A must be sublinear.
01:03:43.960 --> 01:03:45.930
And that proves Roth's theorem.
01:03:50.030 --> 01:03:53.420
So we did quite a bit of work
in proving this theorem--
01:03:53.420 --> 01:03:57.470
Szemerédi's regularity lemma,
counting lemma, removal lemma,
01:03:57.470 --> 01:03:59.030
and then we set up this graph.
01:03:59.030 --> 01:04:02.710
So it's not an easy theorem.
01:04:02.710 --> 01:04:04.730
Later in the course, we'll
see a different proof
01:04:04.730 --> 01:04:08.060
of Roth's theorem that goes
through Fourier analysis.
01:04:08.060 --> 01:04:10.190
That will look
somewhat different,
01:04:10.190 --> 01:04:12.330
but it will have similar themes.
01:04:12.330 --> 01:04:14.480
So we'll also have
this theme comparing
01:04:14.480 --> 01:04:18.770
structure and pseudorandomness,
which comes up in the proof--
01:04:18.770 --> 01:04:22.370
in the statement and proof of
Szemerédi's graph regularity
01:04:22.370 --> 01:04:22.940
lemma.
01:04:22.940 --> 01:04:24.830
So there, it's really
about understanding
01:04:24.830 --> 01:04:28.100
what is the structure of the
graph in terms of decomposition
01:04:28.100 --> 01:04:30.920
into parts that
look pseudorandom.
01:04:30.920 --> 01:04:32.006
Yeah.
01:04:32.006 --> 01:04:34.471
AUDIENCE: You called the
graph the Cayley graph.
01:04:34.471 --> 01:04:35.622
Why?
01:04:35.622 --> 01:04:36.205
PROFESSOR: OK.
01:04:36.205 --> 01:04:38.700
So question is, why do I call
this graph the Cayley graph?
01:04:38.700 --> 01:04:41.130
So usually the Cayley
graph refers to a graph
01:04:41.130 --> 01:04:45.120
where I give you a group, and I
give you a subset of the group,
01:04:45.120 --> 01:04:49.830
and I connect two elements if,
let's say, their difference
01:04:49.830 --> 01:04:52.260
lies in my subset.
01:04:52.260 --> 01:04:54.220
This basically has that form.
01:04:54.220 --> 01:04:56.910
So it's not exactly what
people mean by a Cayley graph,
01:04:56.910 --> 01:04:58.140
but it has that spirit.
01:05:01.890 --> 01:05:03.128
Any more questions?
01:05:06.120 --> 01:05:06.940
OK.
01:05:06.940 --> 01:05:10.390
So earlier I talked about bounds
for triangle removal lemma.
01:05:10.390 --> 01:05:13.630
So what about bounds
for Roth's theorem?
01:05:13.630 --> 01:05:16.540
We do know somewhat better
bounds for Roth's theorem
01:05:16.540 --> 01:05:19.270
compared to this proof.
01:05:19.270 --> 01:05:22.300
Somehow it's a nice proof, it's
a nice graph, theoretic proof,
01:05:22.300 --> 01:05:24.530
but it doesn't give
you very good bounds.
01:05:24.530 --> 01:05:28.730
It gives you bounds that decay
very poorly as a function of n.
01:05:28.730 --> 01:05:31.490
Actually, what does it give
you as a function of n?
01:05:31.490 --> 01:05:35.120
If you were to replace this
little o by a function of n
01:05:35.120 --> 01:05:39.060
according to this proof,
what would you get?
01:05:39.060 --> 01:05:41.690
I'm basically asking,
what is the inverse
01:05:41.690 --> 01:05:44.480
of the function where
you input some number
01:05:44.480 --> 01:05:47.770
and it gives you a tower
of exponentials of height
01:05:47.770 --> 01:05:48.590
with that input?
01:05:54.070 --> 01:05:56.040
It's called a log star.
01:05:56.040 --> 01:05:57.930
So the log star--
01:05:57.930 --> 01:06:04.230
so this is essentially N
over the log star of N.
01:06:04.230 --> 01:06:06.410
So the log star
basically is the number
01:06:06.410 --> 01:06:14.640
of times you have to take the
logarithm to get you below 1.
01:06:14.640 --> 01:06:16.120
So that's the log star.
01:06:16.120 --> 01:06:20.000
And there's a saying
that the log star, we
01:06:20.000 --> 01:06:22.000
know that it grows to
infinity, but it has never
01:06:22.000 --> 01:06:23.350
been observed to do so.
01:06:23.350 --> 01:06:26.350
It's extremely slowly
growing function.
01:06:30.260 --> 01:06:33.164
Any more questions?
01:06:33.164 --> 01:06:39.260
So I want to--
01:06:39.260 --> 01:06:42.470
so next time I want to show
you a construction that
01:06:42.470 --> 01:06:47.300
gives you a--
01:06:47.300 --> 01:06:49.190
so next time I will
show you a construction
01:06:49.190 --> 01:06:56.300
that gives you a subset A
of n that is fairly large.
01:06:56.300 --> 01:06:58.670
So you might ask, OK, so
you have this upper bound,
01:06:58.670 --> 01:07:00.250
but what should the truth be?
01:07:00.250 --> 01:07:03.472
And here's more or less
the state of knowledge.
01:07:18.724 --> 01:07:26.640
So best bounds of
Roth's theorem.
01:07:31.950 --> 01:07:41.040
Basically, the best bounds have
the form N divided by basically
01:07:41.040 --> 01:07:44.820
log N raised to power
1 plus little o1.
01:07:48.010 --> 01:07:50.810
The precise bounds are
of the form N over log N,
01:07:50.810 --> 01:07:52.748
and then there's some
extra log-log factors.
01:07:52.748 --> 01:07:54.040
But let's not worry about that.
01:07:57.000 --> 01:07:58.850
The best lower bounds--
01:07:58.850 --> 01:08:00.210
so we'll see this next time.
01:08:00.210 --> 01:08:05.060
So there exists subsets
of 1 through N such
01:08:05.060 --> 01:08:13.420
that the size of A is
at least e to the--
01:08:18.109 --> 01:08:24.710
so N times-- so first, let
me say it's pretty close to--
01:08:24.710 --> 01:08:29.270
the exponent is as
close to 1 as you wish.
01:08:29.270 --> 01:08:32.649
So there exists as A
such that the size of A
01:08:32.649 --> 01:08:34.210
is N to the 1 minus little o1.
01:08:34.210 --> 01:08:38.729
And already, this
fact is an indication
01:08:38.729 --> 01:08:41.550
of the difficulty of the
problem because if you
01:08:41.550 --> 01:08:44.340
could prove Roth's theorem
through some fairly
01:08:44.340 --> 01:08:46.420
elementary
techniques, like using
01:08:46.420 --> 01:08:49.229
a Cauchy-Schwarz a bunch
of times for instance,
01:08:49.229 --> 01:08:52.740
then experience tells
us that you probably
01:08:52.740 --> 01:08:58.050
expect some bound that's
power saving, replacing
01:08:58.050 --> 01:09:00.840
the 1 by some smaller number.
01:09:00.840 --> 01:09:01.840
But that's not the case.
01:09:01.840 --> 01:09:03.382
And the fact that
that's not the case
01:09:03.382 --> 01:09:05.080
is already indication
of the difficulty
01:09:05.080 --> 01:09:07.510
of this upper bound
of Roth's theorem,
01:09:07.510 --> 01:09:09.279
even getting a little o.
01:09:09.279 --> 01:09:12.350
So you don't expect there
to be simple proofs getting
01:09:12.350 --> 01:09:13.276
the little o.
01:09:16.020 --> 01:09:18.588
The bound that we'll
see next time--
01:09:18.588 --> 01:09:20.380
so we'll see a construction
which gives you
01:09:20.380 --> 01:09:22.270
a bound that is of this form.
01:09:30.040 --> 01:09:33.189
So it's maybe a little
bit hard to think
01:09:33.189 --> 01:09:36.380
about how quickly
this function grows,
01:09:36.380 --> 01:09:38.200
but I'll let you think about it.
01:09:41.220 --> 01:09:44.250
Now, how does this--
01:09:44.250 --> 01:09:46.490
so let's look at
this corollary here.
01:09:50.740 --> 01:09:54.580
Can you see a way to
construct a graph which
01:09:54.580 --> 01:09:59.135
has lots of edges, such that
every edge lies in exactly one
01:09:59.135 --> 01:09:59.635
triangle?
01:10:13.490 --> 01:10:16.820
So we did this
connection showing
01:10:16.820 --> 01:10:22.790
how to use this corollary
to prove Roth's theorem.
01:10:22.790 --> 01:10:27.190
But you can run the
same connection.
01:10:27.190 --> 01:10:41.180
So starting from
this three AP free A,
01:10:41.180 --> 01:10:58.910
we can use that construction
to build a graph
01:10:58.910 --> 01:11:11.150
n, such that a graph of n
vertices with essentially
01:11:11.150 --> 01:11:19.320
order of n times the size
of A number of edges, such
01:11:19.320 --> 01:11:33.880
that every edge lies in
exactly one triangle.
01:11:39.230 --> 01:11:42.850
So you run the
same construction.
01:11:42.850 --> 01:11:44.770
And this is actually
more or less
01:11:44.770 --> 01:11:49.030
the only way that we know how
to construct such graphs that
01:11:49.030 --> 01:11:50.080
are fairly dense.
01:11:53.260 --> 01:11:55.480
So on one hand--
01:11:55.480 --> 01:11:58.260
basically what I said earlier.
01:11:58.260 --> 01:12:00.080
On one hand, you have
this upper bound,
01:12:00.080 --> 01:12:03.430
which is given by the proof of
using Szemerédi's regularity
01:12:03.430 --> 01:12:07.690
lemma that gives you a tower
in the upper bound of 1 over
01:12:07.690 --> 01:12:09.640
delta.
01:12:09.640 --> 01:12:13.230
And if you use this construction
here of three AP free
01:12:13.230 --> 01:12:17.330
set to construct the graph, you
get this lower bound on delta,
01:12:17.330 --> 01:12:19.510
which is quasipolynomial.
01:12:22.370 --> 01:12:24.945
And that's more or
less that we know.
01:12:24.945 --> 01:12:27.320
And there's a major open
problem to close these two gaps.
01:12:32.150 --> 01:12:35.890
Any more questions?
01:12:35.890 --> 01:12:39.420
So I want to give you a plan
on what's coming up ahead.
01:12:39.420 --> 01:12:43.650
So today we saw one application
of Szemerédi's regularity
01:12:43.650 --> 01:12:44.370
lemma--
01:12:44.370 --> 01:12:47.220
namely, the triangle
removal lemma,
01:12:47.220 --> 01:12:49.620
which has this application
to Roth's theorem.
01:12:49.620 --> 01:12:52.410
So we've seen our first
proof of Roth's theorem.
01:12:52.410 --> 01:12:55.260
And next lecture, and
the next couple lectures,
01:12:55.260 --> 01:12:59.610
I want to show you a few
extensions and applications
01:12:59.610 --> 01:13:01.790
of Szemerédi's regularity lemma.
01:13:01.790 --> 01:13:03.810
So one of the
questions today was,
01:13:03.810 --> 01:13:05.700
we knew how to count
the triangles, but what
01:13:05.700 --> 01:13:06.615
about other graphs?
01:13:06.615 --> 01:13:08.740
And as you can imagine, if
you can count triangles,
01:13:08.740 --> 01:13:11.430
then the other
graphs should also
01:13:11.430 --> 01:13:13.210
be doable using the same ideas.
01:13:13.210 --> 01:13:14.460
And we'll do that.
01:13:14.460 --> 01:13:17.640
So we'll see how to
count other graphs.
01:13:17.640 --> 01:13:19.290
And we'll give you a--
01:13:19.290 --> 01:13:24.090
well, I'll give you a proof
of the Erdos-Stone-Simonovits
01:13:24.090 --> 01:13:28.700
theorem that we did not prove in
the first part of this course.
01:13:28.700 --> 01:13:33.450
So it gives you an upper
bound on the extremal number
01:13:33.450 --> 01:13:39.990
of a graph H that depends only
on the chromatic number of H.
01:13:39.990 --> 01:13:41.480
So we'll do that.
01:13:41.480 --> 01:13:44.090
And then I'll also mention,
although not prove,
01:13:44.090 --> 01:13:46.640
some extensions
of the regularity
01:13:46.640 --> 01:13:52.430
lemma to other settings,
such as to hypergraphs.
01:13:52.430 --> 01:13:54.680
And what that's
useful for is that it
01:13:54.680 --> 01:13:58.580
will allow us to
deduce generalizations
01:13:58.580 --> 01:14:04.830
of Roth's theorem to longer
arithmetic progressions.
01:14:04.830 --> 01:14:07.280
Proving Szemerédi's theorem.
01:14:07.280 --> 01:14:12.060
So one way to deduce Szemerédi's
theorem is to use a hypergraph
01:14:12.060 --> 01:14:12.985
removal lemma--
01:14:12.985 --> 01:14:16.080
the hypergraph extension
of the graph removal lemma,
01:14:16.080 --> 01:14:19.100
the triangle removal
lemma that we saw today.
01:14:19.100 --> 01:14:22.830
It would also let us
derive higher dimensional
01:14:22.830 --> 01:14:27.790
generalizations
of these theorems.
01:14:27.790 --> 01:14:29.188
So it's a very powerful tool.
01:14:29.188 --> 01:14:30.980
And actually, the
hypergraph removal lemma,
01:14:30.980 --> 01:14:32.920
as mentioned in the
very first lecture,
01:14:32.920 --> 01:14:37.340
it's a very difficult extension
of the graph removal lemma.
01:14:37.340 --> 01:14:39.630
And the hypergraph
regularity lemma,
01:14:39.630 --> 01:14:42.330
which can be used to prove
the hypergraph removal lemma,
01:14:42.330 --> 01:14:45.090
is a difficult extension of
the graph regularity lemma.
01:14:48.520 --> 01:14:51.180
So we'll see that in
the next few lectures.