WEBVTT
00:00:16.602 --> 00:00:18.810
YUFEI ZHAO: We've been
spending the past few lectures
00:00:18.810 --> 00:00:21.750
discussing Szemeredi's
regularity lemma.
00:00:21.750 --> 00:00:24.060
And one of the
first applications
00:00:24.060 --> 00:00:26.730
that we discussed of
the regularity lemma
00:00:26.730 --> 00:00:29.580
is the triangle removal lemma.
00:00:29.580 --> 00:00:31.740
So today, I want to
revisit this topic
00:00:31.740 --> 00:00:34.860
and show you a strengthening
of the removal lemma
00:00:34.860 --> 00:00:37.860
for which new regularity
techniques are needed.
00:00:42.030 --> 00:00:45.090
But first, recall the
graph removal lemma.
00:00:57.690 --> 00:01:03.590
In the graph removal lemma,
we have that for every graph H
00:01:03.590 --> 00:01:10.040
and epsilon bigger than zero,
there exists some delta such
00:01:10.040 --> 00:01:24.730
that if an n-vertex graph
has fewer than delta
00:01:24.730 --> 00:01:35.270
n to the number of vertices
of H many copies of H,
00:01:35.270 --> 00:01:53.620
then it can be made H-free by
removing fewer than epsilon N
00:01:53.620 --> 00:01:54.520
squared edges.
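NOTE
In symbols, writing v(H) for the number of vertices of H, the statement is roughly:
\[ \forall H \;\forall \varepsilon > 0 \;\exists \delta > 0: \text{every } n\text{-vertex graph with fewer than } \delta n^{v(H)} \text{ copies of } H \text{ can be made } H\text{-free by removing fewer than } \varepsilon n^2 \text{ edges.} \]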
00:01:58.230 --> 00:02:01.065
Even in the case when
H is a triangle, when
00:02:01.065 --> 00:02:02.940
where this is called the triangle
removal lemma, even
00:02:02.940 --> 00:02:06.330
in that case, basically
the regularity method
00:02:06.330 --> 00:02:08.280
is more or less the only
way that we currently
00:02:08.280 --> 00:02:10.380
know how to prove this theorem.
00:02:10.380 --> 00:02:14.310
So we saw this a
few lectures ago.
00:02:14.310 --> 00:02:16.320
What I would like
to discuss today
00:02:16.320 --> 00:02:19.650
is a variant of
this result where
00:02:19.650 --> 00:02:24.030
instead of considering
copies of H,
00:02:24.030 --> 00:02:29.460
we're now considering
induced copies of H. OK?
00:02:29.460 --> 00:02:35.010
So this is the induced
graph removal lemma
00:02:35.010 --> 00:02:39.780
where the only difference is
that the hypothesis is now
00:02:39.780 --> 00:02:44.910
going to be changed to
induced copies of H.
00:02:44.910 --> 00:02:46.590
And the conclusion
is that you can
00:02:46.590 --> 00:02:51.330
make the graph induced H-free.
00:02:51.330 --> 00:02:52.920
So let me remind
you, the difference
00:02:52.920 --> 00:02:55.860
between the induced subgraph
00:02:55.860 --> 00:02:59.040
and the usual subgraph.
00:02:59.040 --> 00:03:12.220
So we say that H is an
induced subgraph
00:03:12.220 --> 00:03:42.842
of G if one can obtain H from
G by deleting vertices of G.
00:03:42.842 --> 00:03:44.300
You're not allowed
to delete edges,
00:03:44.300 --> 00:03:46.910
but only allowed
to delete vertices.
00:03:46.910 --> 00:03:55.520
So in other words,
the four cycle
00:03:55.520 --> 00:04:05.210
is not an induced
subgraph because, well,
00:04:05.210 --> 00:04:08.270
if you select four vertices, you
don't generate this four cycle.
00:04:08.270 --> 00:04:09.290
You get extra edges.
00:04:09.290 --> 00:04:12.060
So it is a subgraph, but
not an induced subgraph.
00:04:20.050 --> 00:04:23.290
So it is a theorem, the
induced graph removal lemma.
00:04:23.290 --> 00:04:24.790
So it's a theorem,
and let's discuss
00:04:24.790 --> 00:04:26.220
how we may prove that theorem.
00:04:26.220 --> 00:04:26.720
Question.
00:04:30.550 --> 00:04:33.000
OK, question is, why is it
stronger than the graph removal
00:04:33.000 --> 00:04:34.810
lemma?
00:04:34.810 --> 00:04:40.060
So it's not stronger, but
we'll see the relationship
00:04:40.060 --> 00:04:41.270
between the two.
00:04:41.270 --> 00:04:46.330
So I claim that it is more
difficult to prove this theorem.
00:04:46.330 --> 00:04:48.940
Any more questions?
00:04:48.940 --> 00:04:57.320
So let's pretend for a second
that whatever's in here
00:04:57.320 --> 00:04:58.605
is not quite true.
00:04:58.605 --> 00:04:59.480
So here's an example.
00:05:06.110 --> 00:05:14.360
For example, if your H is
three isolated vertices.
00:05:14.360 --> 00:05:15.890
So what is that saying?
00:05:15.890 --> 00:05:18.230
We're looking at
copies of H which
00:05:18.230 --> 00:05:19.700
are three isolated vertices.
00:05:19.700 --> 00:05:26.330
So really you are looking at
triangles in the complement of G.
00:05:26.330 --> 00:05:30.040
So this is exactly the
triangle removal lemma
00:05:30.040 --> 00:05:35.300
in the complement of G, but
you can't get rid of these guys
00:05:35.300 --> 00:05:36.680
by removing edges.
00:05:36.680 --> 00:05:38.360
So we need to make
the modification
00:05:38.360 --> 00:05:40.580
where instead of
removing these edges,
00:05:40.580 --> 00:05:48.080
we need to allow both
adding and deleting edges.
00:05:51.490 --> 00:05:52.758
So maybe at the same time.
00:05:52.758 --> 00:05:55.050
So you're allowed to add some
edges, delete some edges.
00:05:55.050 --> 00:05:58.788
But in total, you change no more
than epsilon n squared edges.
00:05:58.788 --> 00:06:01.080
So this is sometimes also
known as the edit distance.
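NOTE
So the corrected induced statement, with edits (additions or deletions) replacing removals, reads roughly:
\[ \forall H \;\forall \varepsilon > 0 \;\exists \delta > 0: \text{every } n\text{-vertex graph with fewer than } \delta n^{v(H)} \text{ induced copies of } H \text{ can be made induced-}H\text{-free by adding or deleting fewer than } \varepsilon n^2 \text{ edges.} \]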
00:06:05.690 --> 00:06:08.530
You're allowed to change edges.
00:06:08.530 --> 00:06:10.810
So you can add edges
and delete edges.
00:06:16.090 --> 00:06:17.940
Any questions about
the statement?
00:06:20.880 --> 00:06:24.550
All right, so let's
think about how you would
00:06:24.550 --> 00:06:27.760
prove this result following
the proof that we did
00:06:27.760 --> 00:06:30.660
for the triangle removal lemma.
00:06:30.660 --> 00:06:33.340
So let's pretend that
we go through this proof
00:06:33.340 --> 00:06:35.230
and think about
what could go wrong.
00:06:35.230 --> 00:06:37.840
So remember, in the proof
of the removal lemma,
00:06:37.840 --> 00:06:40.250
the recipe has three steps.
00:06:40.250 --> 00:06:42.454
In the first step we
do a partition.
00:06:45.420 --> 00:06:49.220
So we partition, applying
Szemeredi's regularity lemma
00:06:49.220 --> 00:06:50.890
to get this partition.
00:06:50.890 --> 00:06:56.400
And the second step
is to do a cleaning,
00:06:56.400 --> 00:06:59.820
and the two key things
that happen in the cleaning
00:06:59.820 --> 00:07:11.130
is we remove low density pairs
of parts and irregular pairs.
00:07:13.940 --> 00:07:16.380
And in the third step
we claim that once we
00:07:16.380 --> 00:07:19.890
do the cleaning, once
we remove those edges,
00:07:19.890 --> 00:07:22.230
the resulting
graph should be H-free.
00:07:22.230 --> 00:07:26.970
Because if it's not H-free, then
by considering the vertex parts
00:07:26.970 --> 00:07:29.460
where H lies and applying
the counting lemma,
00:07:29.460 --> 00:07:33.430
you can generate many
more copies of H.
00:07:33.430 --> 00:07:35.550
So these were the three
main steps in the proof
00:07:35.550 --> 00:07:37.530
of the triangle removal lemma.
00:07:37.530 --> 00:07:40.050
So let's see what
happens when we
00:07:40.050 --> 00:07:43.530
try to apply this strategy
to the induced version.
00:07:43.530 --> 00:07:47.760
I mean, the partition you still
do the regularity partition.
00:07:47.760 --> 00:07:50.370
Nothing really changes there.
00:07:50.370 --> 00:07:54.420
So let's see in the
cleaning step what happens.
00:07:54.420 --> 00:07:56.460
For low density pairs--
00:07:56.460 --> 00:07:59.250
well, so now we need to think
about not just low density
00:07:59.250 --> 00:08:01.700
pairs, but also
high density pairs.
00:08:01.700 --> 00:08:04.770
Because in the induced, we
think about edges and non-edges
00:08:04.770 --> 00:08:05.950
at the same time.
00:08:05.950 --> 00:08:09.030
So you might think of a
strategy like this: if the edge
00:08:09.030 --> 00:08:11.460
density is less than epsilon,
00:08:11.460 --> 00:08:15.480
then
you remove all those edges.
00:08:15.480 --> 00:08:18.450
And if the edge density is
bigger than 1 minus epsilon,
00:08:18.450 --> 00:08:22.100
then you add all
of those edges in.
00:08:22.100 --> 00:08:23.860
So this is the
natural generalization
00:08:23.860 --> 00:08:25.820
of our strategy for
triangle removal
00:08:25.820 --> 00:08:27.400
lemma for the induced setting.
00:08:27.400 --> 00:08:30.570
So so far,
everything's still OK.
00:08:30.570 --> 00:08:33.490
But now what would you do
for the irregular pairs?
00:08:37.970 --> 00:08:41.030
That's problematic.
00:08:41.030 --> 00:08:43.820
Previously for
triangle removal lemma,
00:08:43.820 --> 00:08:47.270
we just said if a pair is
irregular, get rid of that pair
00:08:47.270 --> 00:08:51.440
and it will never show
up in the counting stage.
00:08:51.440 --> 00:08:54.540
But that strategy
no longer works.
00:08:54.540 --> 00:08:59.870
Because for example, if
your graph H being counted
00:08:59.870 --> 00:09:08.630
is this here, you do the
regularity partition,
00:09:08.630 --> 00:09:12.550
and one of your
pairs is irregular.
00:09:12.550 --> 00:09:16.240
So you, let's say, get rid of
all those edges in between.
00:09:16.240 --> 00:09:20.680
Then maybe you have
some embedding of H
00:09:20.680 --> 00:09:25.905
where you are going to
use the removed edges.
00:09:30.010 --> 00:09:34.040
And now you don't
have a counting lemma.
00:09:34.040 --> 00:09:41.070
You cannot say, I found this
copy of H in my changed graph.
00:09:41.070 --> 00:09:43.650
And by the counting lemma I
could get many copies of H
00:09:43.650 --> 00:09:47.620
because you have no control over
this irregular pair anymore.
00:09:47.620 --> 00:09:50.880
So the fact that you
have to add and remove
00:09:50.880 --> 00:09:52.660
makes it unclear
what to do here,
00:09:52.660 --> 00:09:54.810
and this is a big obstacle
in the application
00:09:54.810 --> 00:09:59.370
of the regularity lemma to
the induced removal
00:09:59.370 --> 00:10:02.310
lemma.
00:10:02.310 --> 00:10:04.350
Any questions about
this obstacle?
00:10:08.550 --> 00:10:10.940
So make sure you understand
why this is an issue.
00:10:10.940 --> 00:10:13.840
Otherwise you won't
really appreciate
00:10:13.840 --> 00:10:16.550
what will happen next.
00:10:16.550 --> 00:10:21.630
So somehow we need to find some
kind of regularity partition
00:10:21.630 --> 00:10:25.170
to get no irregular pairs.
00:10:25.170 --> 00:10:29.780
So the question
is, is there a way
00:10:29.780 --> 00:10:36.580
to partition so that there
are no irregular pairs?
00:10:42.130 --> 00:10:44.530
For those of you who have
started your homework
00:10:44.530 --> 00:10:49.150
problems on time, you realize
that the answer is no.
00:10:49.150 --> 00:10:50.770
So one of the
homework problems is
00:10:50.770 --> 00:10:53.700
for you to show that for
the specific graph known
00:10:53.700 --> 00:10:54.700
as the half graph.
00:10:58.110 --> 00:11:00.588
So there was an
example in homework
00:11:00.588 --> 00:11:01.630
that for the half graph--
00:11:08.665 --> 00:11:11.800
so you'll see in the
homework what this graph is--
00:11:11.800 --> 00:11:15.050
you cannot partition it so that
you get rid of all irregular
00:11:15.050 --> 00:11:15.550
pairs.
00:11:15.550 --> 00:11:18.340
Irregular pairs are
necessary in the statement
00:11:18.340 --> 00:11:19.650
of the regularity lemma.
00:11:22.240 --> 00:11:24.340
So what I want to show
you today is a way
00:11:24.340 --> 00:11:29.400
to do what's called a strong
regularity lemma in which you
00:11:29.400 --> 00:11:33.000
obtain a somewhat different
consequence that will allow
00:11:33.000 --> 00:11:35.438
you to get rid of
irregular pairs
00:11:35.438 --> 00:11:36.730
in a more restricted setting.
00:11:39.730 --> 00:11:42.100
So this is the issue,
the irregular pairs.
00:11:48.420 --> 00:11:50.790
Before telling you what
this regularity lemma is,
00:11:50.790 --> 00:11:55.050
I want to give you a
small generalization
00:11:55.050 --> 00:11:58.350
of the induced graph removal
lemma, or just a different way
00:11:58.350 --> 00:12:00.540
to think about the statement.
00:12:00.540 --> 00:12:03.720
And you can think of it as
a colorful version instead
00:12:03.720 --> 00:12:08.970
of induced, where you
have edges and non-edges.
00:12:08.970 --> 00:12:11.070
You can also have colored edges.
00:12:11.070 --> 00:12:14.190
So colorful removal
lemma, although this name
00:12:14.190 --> 00:12:15.300
is not standard.
00:12:21.580 --> 00:12:25.610
So colorful-- so when
we talk about graphs,
00:12:25.610 --> 00:12:29.040
it's colorful graph
removal lemma.
00:12:29.040 --> 00:12:37.490
So for every k, r, and epsilon,
there exists delta such
00:12:37.490 --> 00:12:58.020
that if curly H is a set of
r-edge-colorings of the complete graph
00:12:58.020 --> 00:13:00.590
on little k vertices.
00:13:00.590 --> 00:13:03.800
So edge coloring just
means using r colors
00:13:03.800 --> 00:13:04.700
to color the edges.
00:13:04.700 --> 00:13:07.530
So there are no restrictions
about what are allowed,
00:13:07.530 --> 00:13:08.450
what are not allowed.
00:13:08.450 --> 00:13:12.293
So it's just a set of
possible r-edge-colorings.
00:13:12.293 --> 00:13:22.000
Then if the complete graph--
00:13:27.823 --> 00:13:28.990
let me say it slightly differently.
00:13:28.990 --> 00:13:44.650
So then every r-edge-coloring
of the complete graph
00:13:44.650 --> 00:14:02.850
on n vertices with fewer than
a delta fraction of its k-vertex
00:14:02.850 --> 00:14:20.310
subsets-- k-vertex sub-colorings, say--
belonging to the script H.
00:14:20.310 --> 00:14:34.670
So every such coloring can be
made curly-H-free by recoloring,
00:14:34.670 --> 00:14:49.710
so using the same r colors,
fewer than an epsilon fraction--
00:14:49.710 --> 00:15:00.710
so less than an epsilon fraction
of the edges of this K_n.
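NOTE
Assembled into one statement, this says roughly: for all k, r, and epsilon there exists delta such that for every set \(\mathcal{H}\) of r-edge-colorings of \(K_k\), every r-edge-coloring of \(K_n\) in which fewer than \(\delta \binom{n}{k}\) of the k-vertex subsets induce a coloring in \(\mathcal{H}\) can be made \(\mathcal{H}\)-free by recoloring fewer than \(\varepsilon \binom{n}{2}\) edges.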
00:15:03.360 --> 00:15:06.540
So in particular, the
version that we just stated,
00:15:06.540 --> 00:15:16.260
the induced version, so the
induced graph removal lemma,
00:15:16.260 --> 00:15:31.450
is the same as having two
colors and curly H having exactly one
00:15:31.450 --> 00:15:44.620
red-blue coloring
of the complete graph
00:15:44.620 --> 00:15:48.780
on the same number
of vertices as H.
00:15:48.780 --> 00:15:51.930
So you color red the edges
and blue the non-edges,
00:15:51.930 --> 00:15:53.650
for instance.
00:15:53.650 --> 00:15:58.680
And you're saying, I want to
color the big complete graph
00:15:58.680 --> 00:16:02.760
with red and blue in such a way
that there are very few copies
00:16:02.760 --> 00:16:03.900
of that pattern.
00:16:03.900 --> 00:16:05.730
So then I can recolor
the red and blue
00:16:05.730 --> 00:16:09.750
in a small number of places to
get rid of all such patterns.
00:16:09.750 --> 00:16:11.940
So having a colored
pattern somewhere
00:16:11.940 --> 00:16:14.610
in your graph in this
complete graph coloring
00:16:14.610 --> 00:16:19.170
is the same as having
an induced subgraph.
00:16:19.170 --> 00:16:19.750
Yeah?
00:16:19.750 --> 00:16:21.130
AUDIENCE: So after "then"--
00:16:21.130 --> 00:16:23.976
like the statement after "then"
is a really long sentence.
00:16:23.976 --> 00:16:24.560
Can I--
00:16:24.560 --> 00:16:28.020
YUFEI ZHAO: Yeah, OK.
00:16:28.020 --> 00:16:37.380
So every r-edge-coloring of K_n
with a small number of patterns
00:16:37.380 --> 00:16:43.710
can be made curly-H-free by recoloring
a small fraction of the edges.
00:16:43.710 --> 00:16:45.630
So like in a triangle
removal lemma,
00:16:45.630 --> 00:16:49.170
every graph with a small
number of triangles
00:16:49.170 --> 00:16:51.540
can be made
triangle-free by removing
00:16:51.540 --> 00:16:52.710
a small number of edges.
00:16:58.020 --> 00:17:01.710
Any other questions?
00:17:01.710 --> 00:17:07.079
So this is a restatement of
the induced removal lemma
00:17:07.079 --> 00:17:10.349
with a bit more generality.
00:17:10.349 --> 00:17:12.960
It's OK if you like
this one more or less,
00:17:12.960 --> 00:17:15.810
but let's talk about the
induced version from now on.
00:17:15.810 --> 00:17:18.450
But the same proofs that
I will talk about also
00:17:18.450 --> 00:17:22.710
apply to this version where
you have somewhat more colors.
00:17:26.349 --> 00:17:30.940
So the variant of the
regularity lemma that we'll need
00:17:30.940 --> 00:17:33.370
is known as a strong
regularity lemma.
00:17:47.320 --> 00:17:49.210
To state the strong
regularity lemma,
00:17:49.210 --> 00:17:52.420
let me recall a notion that came
up in the proof of Szemeredi's
00:17:52.420 --> 00:17:54.820
regularity lemma.
00:17:54.820 --> 00:17:57.740
And this was the
notion of an energy.
00:17:57.740 --> 00:18:04.930
So recall that if you have
a partition, denoted P. So
00:18:04.930 --> 00:18:11.980
if this is a partition of
the vertex set of a graph, G,
00:18:11.980 --> 00:18:17.770
and here n is the
number of vertices,
00:18:17.770 --> 00:18:29.390
we defined this notion of energy
to be this quantity denoted
00:18:29.390 --> 00:18:36.790
q, which is basically a
squared mean of the densities
00:18:36.790 --> 00:18:42.010
between vertex parts
appropriately normalized
00:18:42.010 --> 00:18:46.930
if the vertex parts do not
all have the same size.
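NOTE
Written out, for a partition \(P = \{V_1, \dots, V_k\}\) of the vertex set of an n-vertex graph G, the energy is
\[ q(P) = \sum_{i=1}^{k} \sum_{j=1}^{k} \frac{|V_i|\,|V_j|}{n^2}\, d(V_i, V_j)^2, \]
where \(d(V_i, V_j)\) denotes the edge density between \(V_i\) and \(V_j\).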
00:18:52.860 --> 00:18:58.690
In the proof of Szemeredi's
regularity lemma,
00:18:58.690 --> 00:19:00.760
there was an important
energy increment
00:19:00.760 --> 00:19:12.250
step which says that if you have
some partition P that is not
00:19:12.250 --> 00:19:24.318
epsilon regular, then there
exists a refinement, Q.
00:19:24.318 --> 00:19:29.920
And this refinement has
the property that Q has
00:19:29.920 --> 00:19:37.440
a small number of pieces,
not too large as a function of P.
00:19:37.440 --> 00:19:41.170
So it's bounded at
least in terms of P.
00:19:41.170 --> 00:19:48.370
But also if P is not epsilon
regular, then the energy of Q
00:19:48.370 --> 00:19:55.160
is significantly larger than
the energy of P. So remember,
00:19:55.160 --> 00:19:58.580
this was an important step in
the proof of the regularity lemma.
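NOTE
For reference, the increment step (with the constants as in the proof we saw) reads: if P is not \(\varepsilon\)-regular, then there is a refinement Q, each part of P being split into a bounded number of parts, with
\[ q(Q) \ge q(P) + \varepsilon^5. \]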
00:20:03.130 --> 00:20:07.380
So to state the strong
regularity lemma,
00:20:07.380 --> 00:20:09.298
we need that notion of energy.
00:20:09.298 --> 00:20:11.340
And the statement of the
strong regularity lemma,
00:20:11.340 --> 00:20:13.298
if you've never seen this
kind of thing before,
00:20:13.298 --> 00:20:14.880
will seem a bit
intimidating at first
00:20:14.880 --> 00:20:19.230
because it involves a whole
sequence of parameters.
00:20:19.230 --> 00:20:20.640
But we'll get used to it.
00:20:23.880 --> 00:20:26.790
So instead of one
epsilon parameter,
00:20:26.790 --> 00:20:35.945
now you have a sequence
of positive epsilons.
00:20:35.945 --> 00:20:38.030
And part of the strength
of this regularity lemma
00:20:38.030 --> 00:20:41.120
is that depending on the
application you have in mind,
00:20:41.120 --> 00:20:44.570
you can make the sequence
go to zero pretty quickly.
00:20:44.570 --> 00:20:48.810
Thereby increasing the strength
of the regularity lemma.
00:20:48.810 --> 00:20:53.750
So there exists some m
bound, which depends only
00:20:53.750 --> 00:21:07.960
on your epsilons such that
every graph has not just one,
00:21:07.960 --> 00:21:16.170
but now we're going to get a
pair of vertex partitions P
00:21:16.170 --> 00:21:22.930
and Q with the
following properties.
00:21:22.930 --> 00:21:27.180
So first, P refines--
00:21:27.180 --> 00:21:36.990
so Q refines P. So it's
a pair of partitions,
00:21:36.990 --> 00:21:38.650
one refining the other.
00:21:42.400 --> 00:21:45.250
The number of parts
of Q is bounded
00:21:45.250 --> 00:21:47.260
just like in the usual
regularity lemma.
00:21:50.070 --> 00:21:54.960
The partition P
is epsilon 0 regular.
00:21:57.720 --> 00:22:02.350
And here is the new part
that's the most important one.
00:22:02.350 --> 00:22:07.220
Q is very epsilon regular.
00:22:07.220 --> 00:22:09.250
So it's not just
epsilon 0 regular,
00:22:09.250 --> 00:22:13.410
it's epsilon sub the number
of parts of P regular.
00:22:16.330 --> 00:22:23.380
So you should think of
this as extremely regular
00:22:23.380 --> 00:22:28.280
because you get to choose what
the sequence of epsilons is.
00:22:28.280 --> 00:22:32.720
And finally, the energy
difference between P and Q
00:22:32.720 --> 00:22:33.980
is not too big.
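NOTE
Putting the four properties together: for every sequence \(\varepsilon_0 \ge \varepsilon_1 \ge \varepsilon_2 \ge \cdots > 0\) there exists M such that every graph has a pair of vertex partitions P and Q with
\[ Q \text{ refines } P, \quad |Q| \le M, \quad P \text{ is } \varepsilon_0\text{-regular}, \quad Q \text{ is } \varepsilon_{|P|}\text{-regular}, \quad q(Q) \le q(P) + \varepsilon_0, \]
where |P| and |Q| denote the numbers of parts.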
00:22:43.690 --> 00:22:46.830
This is the statement of
the strong regularity lemma.
00:22:46.830 --> 00:22:48.750
It produces for you
not just one partition,
00:22:48.750 --> 00:22:50.780
but a pair of partitions.
00:22:50.780 --> 00:22:52.800
And in this pair
of partitions, you
00:22:52.800 --> 00:22:57.480
have one partition, P,
which is similar to the one
00:22:57.480 --> 00:22:59.660
that we obtained from
Szemeredi's regularity lemma
00:22:59.660 --> 00:23:06.150
in being epsilon 0 regular, but
we also get a refinement Q.
00:23:06.150 --> 00:23:11.250
And this Q is extremely regular.
00:23:11.250 --> 00:23:19.510
So you can think
that is P, then Q
00:23:19.510 --> 00:23:22.040
is an extremely
regular refinement
00:23:22.040 --> 00:23:28.210
of P. Any questions
about the statement
00:23:28.210 --> 00:23:29.960
of the strong regularity lemma?
00:23:33.860 --> 00:23:35.610
So the sequence of
epsilons gives you
00:23:35.610 --> 00:23:39.810
flexibility on how to apply it,
but let's see how to prove it.
00:23:39.810 --> 00:23:43.800
And the proof is once you
understand how this works,
00:23:43.800 --> 00:23:47.070
conceptually it's pretty short.
00:23:47.070 --> 00:23:50.370
But let me do it slowly
so that we can appreciate
00:23:50.370 --> 00:23:54.970
this sequence of epsilons.
00:23:54.970 --> 00:23:58.270
And the idea is that we
will repeatedly apply
00:23:58.270 --> 00:23:59.570
Szemeredi's regularity lemma.
00:24:06.030 --> 00:24:08.510
So start with the
regularity lemma.
00:24:08.510 --> 00:24:12.980
We'll apply it
repeatedly to generate
00:24:12.980 --> 00:24:16.540
a sequence of partitions.
00:24:16.540 --> 00:24:21.620
So first, let me remind you
a statement of Szemeredi's
00:24:21.620 --> 00:24:22.610
regularity lemma.
00:24:22.610 --> 00:24:24.320
This is slightly
different from the one
00:24:24.320 --> 00:24:28.330
that we stated, but comes
out of the same proof.
00:24:28.330 --> 00:24:33.440
So for every epsilon,
there exists some m0
00:24:33.440 --> 00:24:41.500
which depends on epsilon such
that for every partition P0,
00:24:41.500 --> 00:24:46.530
so starting with
some partition--
00:24:46.530 --> 00:24:48.510
so actually, let me
start with just P.
00:24:48.510 --> 00:24:54.920
So if you start with some
partition of the vertex set
00:24:54.920 --> 00:25:10.590
of G, there exists a refinement
P prime of P into at most--
00:25:10.590 --> 00:25:27.520
OK, so the refinement is
such that each part of P is
00:25:27.520 --> 00:25:38.960
refined into at
most m0 parts such
00:25:38.960 --> 00:25:44.650
that P prime, the new
partition, is epsilon regular.
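NOTE
So this variant reads: for every \(\varepsilon > 0\) there exists \(m_0 = m_0(\varepsilon)\) such that every partition P of the vertex set of G has a refinement P', with each part of P split into at most \(m_0\) parts, such that P' is \(\varepsilon\)-regular.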
00:25:52.470 --> 00:25:54.900
So this is a statement of
Szemeredi's regularity lemma
00:25:54.900 --> 00:25:56.850
that we will apply repeatedly.
00:25:56.850 --> 00:25:58.870
So in the version that
we've seen before,
00:25:58.870 --> 00:26:02.500
we would start with
a trivial partition.
00:26:02.500 --> 00:26:05.740
And we applied
refinements repeatedly
00:26:05.740 --> 00:26:09.880
in the proof to get a partition
into a bounded number of parts
00:26:09.880 --> 00:26:14.080
such that the final
partition is epsilon regular.
00:26:14.080 --> 00:26:17.020
But instead, in the proof
of the regularity lemma
00:26:17.020 --> 00:26:19.180
if you start not with
a trivial partition
00:26:19.180 --> 00:26:24.100
but start with a given partition
and run this exact same proof,
00:26:24.100 --> 00:26:26.380
you find this consequence.
00:26:26.380 --> 00:26:29.320
Except now you can guarantee
that the final partition
00:26:29.320 --> 00:26:31.570
is a refinement of the
one that you are given.
00:26:35.880 --> 00:26:42.940
So let's apply the
statement, and we obtain
00:26:42.940 --> 00:26:50.640
a sequence of partitions of g--
00:26:50.640 --> 00:26:52.460
the vertex set of g--
00:26:52.460 --> 00:27:07.250
starting with P0 being the
trivial partition, and so on.
00:27:07.250 --> 00:27:17.330
Such that each partition,
each P sub i plus 1
00:27:17.330 --> 00:27:29.940
refines the previous
one, and such
00:27:29.940 --> 00:27:39.370
that each P sub i plus 1 is
epsilon sub the number of parts of P sub i regular.
00:27:42.360 --> 00:27:45.460
So you apply the regularity
lemma with parameter
00:27:45.460 --> 00:27:49.150
based on the number of
parts you currently have.
00:27:49.150 --> 00:27:51.010
Applied to the
current partition,
00:27:51.010 --> 00:27:56.820
you get a finer partition
that's extremely regular.
00:27:56.820 --> 00:27:58.920
And you also know
that the number
00:27:58.920 --> 00:28:04.350
of parts of the new
partition is bounded in terms
00:28:04.350 --> 00:28:06.737
of the previous partition.
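NOTE
So the iteration, in symbols: \(P_0\) is the trivial partition, and
\[ P_{i+1} = \text{an } \varepsilon_{|P_i|}\text{-regular refinement of } P_i, \qquad |P_{i+1}| \le |P_i| \cdot m_0\!\left(\varepsilon_{|P_i|}\right). \]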
00:28:23.138 --> 00:28:26.115
All right.
00:28:26.115 --> 00:28:26.990
Any questions so far?
00:28:31.510 --> 00:28:35.790
So now we get this
sequence of partitions.
00:28:35.790 --> 00:28:37.320
We can keep on doing this.
00:28:37.320 --> 00:28:43.150
So g could be arbitrarily
large, but eventually we
00:28:43.150 --> 00:28:45.980
will be able to obtain
the last condition here,
00:28:45.980 --> 00:28:49.170
which is the only thing
that is missing so far.
00:28:49.170 --> 00:28:59.510
So since the energy is
bounded between 0 and 1,
00:28:59.510 --> 00:29:06.650
there exists some i at
most 1 over epsilon 0
00:29:06.650 --> 00:29:15.860
such that the energy goes
up by less than epsilon 0.
00:29:26.390 --> 00:29:28.700
Because otherwise your
energy would exceed 1.
00:29:33.370 --> 00:29:38.590
So now let's set
P to be this Pi,
00:29:38.590 --> 00:29:43.920
and Q to be this,
the refinement--
00:29:43.920 --> 00:29:45.670
the next term in the sequence.
00:29:50.470 --> 00:29:54.680
And what we find is that the--
00:29:54.680 --> 00:29:57.110
so then you have basically
all the conditions.
00:29:57.110 --> 00:30:03.080
So P is epsilon 0 regular,
because it is epsilon sub--
00:30:03.080 --> 00:30:07.870
the size of the previous partition regular,
and that is at most epsilon 0.
00:30:07.870 --> 00:30:11.970
And you have this one as
well, and this one as well.
00:30:11.970 --> 00:30:14.700
And we want to show that
the number of parts of Q
00:30:14.700 --> 00:30:16.960
is bounded.
00:30:16.960 --> 00:30:19.310
And that's basically
because each time there
00:30:19.310 --> 00:30:22.790
was a bound on the number
of parts which depends only
00:30:22.790 --> 00:30:24.890
on the regularity
parameters, and you're
00:30:24.890 --> 00:30:27.980
repeating that bound a
bounded number of times.
00:30:30.520 --> 00:30:44.610
So the size of Q is bounded as
a function of the sequence
00:30:44.610 --> 00:30:47.010
of epsilons-- this infinite
vector of epsilons,
00:30:47.010 --> 00:30:49.180
but it is a bounded number.
00:30:49.180 --> 00:30:54.323
You're only iterating this
bound a bounded number of times.
00:30:54.323 --> 00:30:55.490
And that finishes the proof.
00:31:00.946 --> 00:31:02.363
Any questions?
00:31:09.770 --> 00:31:13.180
It may be somewhat mysterious
to you right now why we do this,
00:31:13.180 --> 00:31:14.840
so we'll get to that
application in a second.
00:31:14.840 --> 00:31:18.010
But for now, I just want to
comment a bit on the bounds.
00:31:26.090 --> 00:31:32.130
Of course, the bounds depend
on what epsilon i's you use.
00:31:32.130 --> 00:31:33.900
And typically, you
want the epsilon i's
00:31:33.900 --> 00:31:37.350
to decrease with the number of
parts that you have.
00:31:37.350 --> 00:31:40.020
And with almost all
reasonable applications
00:31:40.020 --> 00:31:44.450
of this regularity lemma,
the strong regularity lemma--
00:31:44.450 --> 00:31:52.960
so for example, with epsilon
i being some epsilon divided
00:31:52.960 --> 00:31:58.140
by, let's say, i plus 1, or
any polynomial in i--
00:31:58.140 --> 00:32:02.230
or you can even let it decay
quicker than that, as well.
00:32:02.230 --> 00:32:06.310
You see, basically what
happens is that you
00:32:06.310 --> 00:32:09.310
are applying this m0 bound.
00:32:15.300 --> 00:32:25.360
m0 applied in succession
1 over epsilon 0 times.
00:32:29.250 --> 00:32:33.800
In the regularity lemma, we
saw that the m0 that comes out
00:32:33.800 --> 00:32:37.130
of Szemeredi's graph regularity
lemma is the tower function.
00:32:39.872 --> 00:32:43.970
So the tower function,
tower of i,
00:32:43.970 --> 00:32:52.326
is defined to be the exponential
function iterated i times.
00:32:52.326 --> 00:32:53.920
So of course, I'm
being somewhat loose
00:32:53.920 --> 00:32:57.250
here with the exact dependence,
but you get the idea
00:32:57.250 --> 00:33:02.050
that now we want to apply
the tower function i times.
00:33:15.220 --> 00:33:17.350
Instead of iterating
the exponential i times,
00:33:17.350 --> 00:33:20.380
now you iterate the
tower function i times.
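NOTE
Up to unimportant constants, the two functions can be pinned down as: the tower iterates the exponential, and the wowzer iterates the tower,
\[ T(1) = 2, \quad T(i) = 2^{T(i-1)}; \qquad W(1) = 2, \quad W(i) = T(W(i-1)). \]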
00:33:20.380 --> 00:33:23.292
And as some of you are laughing,
this is an incredibly large number.
00:33:23.292 --> 00:33:25.000
It's even larger than
the tower function.
00:33:29.490 --> 00:33:33.530
So in the literature, especially
around the regularity lemma,
00:33:33.530 --> 00:33:37.020
this function where you
iterate the tower function i
00:33:37.020 --> 00:33:39.873
times is given the name wowzer.
00:33:39.873 --> 00:33:47.280
[LAUGHTER] As in, wow,
this is a huge number.
00:33:47.280 --> 00:33:50.400
So it's a step up in
the Ackermann hierarchy.
00:33:50.400 --> 00:33:53.480
So if you repeat the
wowzer function i times,
00:33:53.480 --> 00:33:56.640
you move up one rung
in the Ackermann hierarchy,
00:33:56.640 --> 00:34:01.060
this hierarchy of
rapidly growing functions.
00:34:01.060 --> 00:34:03.560
But in any case, it's bounded
and that's good enough for us.
00:34:10.695 --> 00:34:11.570
Any questions so far?
00:34:16.380 --> 00:34:16.950
Yeah?
00:34:16.950 --> 00:34:19.857
AUDIENCE: What do you
call like [INAUDIBLE]
00:34:19.857 --> 00:34:21.690
YUFEI ZHAO: Yes, so
question is, what do you
00:34:21.690 --> 00:34:23.920
call wowzer iterated?
00:34:23.920 --> 00:34:27.170
I'm not aware of a
standard name for that.
00:34:27.170 --> 00:34:28.980
Actually, even the
name wowzer somehow
00:34:28.980 --> 00:34:32.139
is very common in the
combinatorics community,
00:34:32.139 --> 00:34:34.199
but I think most people
outside this community
00:34:34.199 --> 00:34:35.510
will not recognize this word.
00:34:40.139 --> 00:34:42.389
Any more questions?
00:34:42.389 --> 00:34:46.145
So, said another way, it's a step
up in the Ackermann hierarchy.
00:34:46.145 --> 00:34:48.270
So it's enumerated one,
two, three, four, you know,
00:34:48.270 --> 00:34:49.170
if you keep going up.
00:34:52.994 --> 00:34:56.340
All right.
00:34:56.340 --> 00:35:01.290
Another remark about this
strong regularity lemma
00:35:01.290 --> 00:35:04.910
is that it will be
convenient for us-- actually,
00:35:04.910 --> 00:35:07.950
somewhat more essential compared
to our previous applications--
00:35:07.950 --> 00:35:09.912
to make the parts equitable.
00:35:13.150 --> 00:35:18.710
So P and Q equitable.
00:35:18.710 --> 00:35:22.175
And basically, the parts
are such that all the--
00:35:22.175 --> 00:35:25.068
the partitions are such that
all the parts have basically
00:35:25.068 --> 00:35:26.235
the same number of vertices.
00:35:26.235 --> 00:35:30.485
So I won't make it
precise, but you can do it.
00:35:30.485 --> 00:35:31.610
It's not too hard to do it.
00:35:31.610 --> 00:35:35.180
And you can prove
it similar to how
00:35:35.180 --> 00:35:39.030
I described modifying the
proof of the regularity lemma.
00:35:39.030 --> 00:35:41.240
So I won't belabor
that point, but we'll
00:35:41.240 --> 00:35:42.470
use the equitable version.
00:35:45.790 --> 00:35:50.760
All right, so how does one
use this regularity lemma?
00:35:50.760 --> 00:35:53.460
Let me state a
corollary, and let
00:35:53.460 --> 00:35:55.850
me call this a corollary
star because you actually
00:35:55.850 --> 00:35:58.430
need to do some work
to get it to follow
00:35:58.430 --> 00:35:59.880
from the strong
regularity lemma.
00:35:59.880 --> 00:36:01.547
But the corollary is
the version that we
00:36:01.547 --> 00:36:06.420
will apply that if you
start with a decreasing
00:36:06.420 --> 00:36:17.215
sequence of epsilons,
then there exists a delta such
00:36:17.215 --> 00:36:18.340
that the following is true.
00:36:22.650 --> 00:36:41.360
Every n vertex graph has an
equitable vertex partition,
00:36:41.360 --> 00:36:51.530
call it V1 through
Vk, and a subset Wi
00:36:51.530 --> 00:36:59.580
of each Vi such that the
following properties hold.
00:36:59.580 --> 00:37:04.960
First, all the W's
are fairly large.
00:37:04.960 --> 00:37:07.070
They're at least
constant proportion
00:37:07.070 --> 00:37:08.540
of the total vertex set.
00:37:13.040 --> 00:37:22.520
Between every pair of Wi Wj,
it is epsilon sub k regular.
00:37:30.890 --> 00:37:32.640
And this is the point
I want to emphasize.
00:37:32.640 --> 00:37:35.360
So here there are no
irregular pairs anymore.
00:37:35.360 --> 00:37:37.170
So it is every pair.
00:37:41.274 --> 00:37:44.870
So no irregular pairs
between the Wi's,
00:37:44.870 --> 00:37:48.740
and also we need
to include the case
00:37:48.740 --> 00:37:50.900
when i equals the j, as well.
00:37:50.900 --> 00:37:53.495
So each Wi is
regular with itself.
00:37:57.060 --> 00:38:05.850
And furthermore, the edge
densities between the V's are
00:38:05.850 --> 00:38:11.820
similar to the edge densities
between the corresponding W's.
00:38:11.820 --> 00:38:18.060
And here it is for
most pairs-- for all
00:38:18.060 --> 00:38:23.550
but at most epsilon 0
k squared pairs.
00:38:30.320 --> 00:38:31.745
Epsilon 0, yeah.
00:38:31.745 --> 00:38:32.860
At most epsilon 0.
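NOTE
So, assembled, the corollary reads: for every decreasing sequence \(\varepsilon_0 \ge \varepsilon_1 \ge \cdots > 0\) there exists \(\delta > 0\) such that every n-vertex graph has an equitable vertex partition \(V_1, \dots, V_k\) and subsets \(W_i \subseteq V_i\) with
\[ |W_i| \ge \delta n \text{ for all } i; \quad (W_i, W_j) \text{ is } \varepsilon_k\text{-regular for all } i, j \text{ (including } i = j\text{)}; \quad |d(V_i, V_j) - d(W_i, W_j)| \le \varepsilon_0 \text{ for all but at most } \varepsilon_0 k^2 \text{ pairs } (i, j). \]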
00:38:40.660 --> 00:38:42.405
Any questions about
the statement?
00:38:57.440 --> 00:39:00.950
So let me show you
how you could deduce
00:39:00.950 --> 00:39:04.520
the corollary from the
strong regularity lemma.
00:39:16.950 --> 00:39:18.720
So first, let me
draw you a picture.
00:39:23.070 --> 00:39:26.346
So here you have a
regularity partition.
00:39:31.010 --> 00:39:38.155
And so these are your
V's, and inside each V
00:39:38.155 --> 00:39:50.340
I find a W such that
if I look at the edge
00:39:50.340 --> 00:39:53.350
sets between pairwise
blue sets, including
00:39:53.350 --> 00:39:58.950
the blue sets with themselves,
it is always very regular.
00:39:58.950 --> 00:40:04.620
And also, the edge densities
between the blue sets
00:40:04.620 --> 00:40:07.590
is mostly very
similar to the edge
00:40:07.590 --> 00:40:10.110
density between their
ambient white sets.
00:40:20.040 --> 00:40:21.910
OK, so let me say a few words--
00:40:21.910 --> 00:40:23.880
I won't go into
too many details--
00:40:23.880 --> 00:40:26.280
about how you might
deduce this corollary
00:40:26.280 --> 00:40:28.470
from the strong
regularity lemma.
00:40:31.280 --> 00:40:33.320
So first let me
do something which
00:40:33.320 --> 00:40:39.830
is slightly simpler, which
is to not yet require
00:40:39.830 --> 00:40:44.326
that the blue sets, Wi's,
are regular with themselves.
00:40:51.480 --> 00:40:59.630
So without requiring this
extra regularity, we can obtain
00:40:59.630 --> 00:41:13.660
the Wi's by picking
a uniform random part
00:41:13.660 --> 00:41:27.870
of the final partition,
Q, inside each part of P
00:41:27.870 --> 00:41:29.640
in the strong regularity lemma.
00:41:35.920 --> 00:41:37.840
So you have the strong
regularity lemma,
00:41:37.840 --> 00:41:44.080
which produces for you a
pair of partitions like that.
00:41:44.080 --> 00:41:47.020
So it produces for you
a pair of partitions.
00:41:47.020 --> 00:41:53.560
And what we will do is to pick
one of these guys as my W,
00:41:53.560 --> 00:41:55.048
pick one of these
guys at random,
00:41:55.048 --> 00:41:56.590
and pick one of
those guys at random.
00:42:00.730 --> 00:42:06.620
Because Q is so extremely
regular, most of these pairs
00:42:06.620 --> 00:42:09.740
will be regular.
00:42:09.740 --> 00:42:13.610
So with high
probability, you will not
00:42:13.610 --> 00:42:18.800
encounter any
irregular pairs if you
00:42:18.800 --> 00:42:25.570
pick the W's randomly as parts
of Q. So that's the key point.
00:42:25.570 --> 00:42:28.840
Here we're using that
Q is extremely regular.
00:42:40.930 --> 00:42:47.420
So all the pairs Wi, Wj are
regular for all i not equal
00:42:47.420 --> 00:42:49.718
to j with high probability.
00:42:53.550 --> 00:42:56.490
But the other thing that we
would like is that the edge
00:42:56.490 --> 00:43:00.810
densities between the W's
are similar to those between
00:43:00.810 --> 00:43:02.740
the V's.
00:43:02.740 --> 00:43:06.020
And for that, we will use this
condition about their energies
00:43:06.020 --> 00:43:07.550
being very similar
to each other.
00:43:10.650 --> 00:43:16.800
So the third
consequence, C, is--
00:43:16.800 --> 00:43:22.200
it's a consequence
of the energy bound.
00:43:30.820 --> 00:43:34.510
Because recall that in our proof
of the Szemeredi regularity
00:43:34.510 --> 00:43:36.730
lemma there was
an interpretation
00:43:36.730 --> 00:43:43.860
of the energy as
the second moment
00:43:43.860 --> 00:43:47.340
of a certain random
variable which we called z.
00:43:51.640 --> 00:43:54.830
And using that interpretation,
I can write down
00:43:54.830 --> 00:44:00.320
this expression like that.
00:44:00.320 --> 00:44:03.860
We are here assuming
for simplicity
00:44:03.860 --> 00:44:08.030
that Q is completely
equitable, so all the parts
00:44:08.030 --> 00:44:09.650
have exactly the same size.
00:44:09.650 --> 00:44:15.740
Z of Q is defined to be the
edge density between Vi and Vj
00:44:15.740 --> 00:44:21.010
for random ij.
00:44:21.010 --> 00:44:23.770
So this is a random variable z.
00:44:23.770 --> 00:44:28.410
So you pick pair
of parts uniformly,
00:44:28.410 --> 00:44:31.110
or maybe with some weights
if they're not exactly equal.
00:44:31.110 --> 00:44:34.480
And you evaluate
the edge density.
00:44:34.480 --> 00:44:37.650
So this energy difference
is the difference
00:44:37.650 --> 00:44:39.330
between the second moments.
00:44:39.330 --> 00:44:46.710
And because Q is
a refinement of P,
00:44:46.710 --> 00:44:55.870
it is the case that this
difference of L2 norms
00:44:55.870 --> 00:45:00.860
is equal to the second
moment of the difference
00:45:00.860 --> 00:45:02.480
of the random variables.
00:45:02.480 --> 00:45:04.760
So we saw a version
of this earlier
00:45:04.760 --> 00:45:07.910
when we were discussing
variance in the context
00:45:07.910 --> 00:45:10.520
of the proof of the
Szemeredi regularity lemma.
00:45:10.520 --> 00:45:11.810
Here it's basically the same.
00:45:11.810 --> 00:45:16.430
You can either look at this
identity part by part,
00:45:16.430 --> 00:45:21.050
or if you like to be
a bit more abstract
00:45:21.050 --> 00:45:24.170
then this is actually a
case of the Pythagorean theorem.
00:45:29.910 --> 00:45:34.350
If you view these as vectors
in a certain vector space,
00:45:34.350 --> 00:45:36.100
then you have some
orthogonality.
00:45:36.100 --> 00:45:40.378
So you have this sum
of squares identity.
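NOTE
Concretely: since Q refines P, the random variable \(Z_P\) is the average of \(Z_Q\) over each part of P, so the cross term vanishes and
\[ q(Q) - q(P) = \mathbb{E}\left[Z_Q^2\right] - \mathbb{E}\left[Z_P^2\right] = \mathbb{E}\left[(Z_Q - Z_P)^2\right]. \]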
00:45:45.860 --> 00:45:47.792
Where does part A come from?
00:45:47.792 --> 00:45:52.340
So part A, we want the parts,
the Wi's, to be not too small,
00:45:52.340 --> 00:46:15.561
but that comes from a bound
on the number of parts of Q.
00:46:15.561 --> 00:46:18.810
So so far this more or
less proves the corollary
00:46:18.810 --> 00:46:23.050
except that we
simplified our lives
00:46:23.050 --> 00:46:29.680
by requiring just that for i
not equal to j, the Wi, Wj's are
00:46:29.680 --> 00:46:31.100
regular.
00:46:31.100 --> 00:46:33.800
But in the statement
up there, we also want
00:46:33.800 --> 00:46:37.650
the Wi's to
be regular with themselves,
00:46:37.650 --> 00:46:41.440
which will be important
for the application.
00:46:41.440 --> 00:46:45.670
So I won't explain how to do
that, and part of the reason
00:46:45.670 --> 00:46:49.170
is that this is also one
of your homework problems.
00:46:49.170 --> 00:46:51.920
So in one of the homework
problems problem set 3,
00:46:51.920 --> 00:46:55.430
you were asked to prove
that every graph has
00:46:55.430 --> 00:47:01.100
a subset of vertices of at
least constant proportion such
00:47:01.100 --> 00:47:04.880
that it is regular with itself.
00:47:04.880 --> 00:47:06.770
And the methods
you use there will
00:47:06.770 --> 00:47:12.210
be applicable to handle the
situation over here, as well.
00:47:12.210 --> 00:47:15.030
So putting all of these
ingredients together,
00:47:15.030 --> 00:47:20.020
we get the corollary whereby
you have this picture,
00:47:20.020 --> 00:47:21.610
you have this partition.
00:47:21.610 --> 00:47:24.190
I don't even require
the Vi's to be regular.
00:47:24.190 --> 00:47:25.640
That doesn't matter anymore.
00:47:25.640 --> 00:47:29.230
All that matters is that between
the Wi's they are very regular,
00:47:29.230 --> 00:47:34.450
and that there are no irregular
pairs between these Wi's.
00:47:34.450 --> 00:47:41.610
And now we'll be able to go back
to the induced graph removal
00:47:41.610 --> 00:47:46.920
lemma where previously we had
an issue with the existence
00:47:46.920 --> 00:47:51.880
of irregular pairs in the use of
the Szemeredi regularity partition,
00:47:51.880 --> 00:47:55.250
and now we have a tool
to get around that.
00:47:55.250 --> 00:48:00.300
So next we will see how
to execute this proof,
00:48:00.300 --> 00:48:03.840
but at this point hopefully
you already see an outline.
00:48:03.840 --> 00:48:10.678
Because you no longer need to
worry about this thing here.
00:48:10.678 --> 00:48:11.720
Let's take a quick break.
00:48:14.760 --> 00:48:15.830
Any questions so far?
00:48:20.600 --> 00:48:21.480
Yes?
00:48:21.480 --> 00:48:31.540
AUDIENCE: Why are we
able to [INAUDIBLE]
00:48:31.540 --> 00:48:33.040
YUFEI ZHAO: OK, so
the question was,
00:48:33.040 --> 00:48:37.170
there was a step where we were
looking at some expectations
00:48:37.170 --> 00:48:39.300
of squares.
00:48:39.300 --> 00:48:43.670
And so why was
that identity true?
00:48:43.670 --> 00:48:46.250
So if you look back to the
proof of Szemeredi's regularity
00:48:46.250 --> 00:48:48.932
lemma, we already saw an
instance of that inequality
00:48:48.932 --> 00:48:50.390
in the computation
of the variance.
00:48:57.860 --> 00:49:01.270
So you know that the
variance of X, on one
00:49:01.270 --> 00:49:10.470
hand, is equal to the expectation
of (X minus mu) squared, where mu is the mean of X.
00:49:10.470 --> 00:49:15.670
And on the other hand, it is equal
to the expectation of X squared, minus mu squared.
00:49:20.410 --> 00:49:23.360
So you agree with this formula?
00:49:23.360 --> 00:49:28.360
And you can expand it to
prove it, and the thing that--
00:49:28.360 --> 00:49:30.990
the question that you
raised basically you
00:49:30.990 --> 00:49:34.680
can prove by looking at
this formula part by part.
00:49:39.250 --> 00:49:40.602
Any more questions?
00:49:49.760 --> 00:49:54.350
So let's now prove the
induced graph removal lemma.
00:49:54.350 --> 00:49:57.110
And we'll follow the
regularity partition,
00:49:57.110 --> 00:49:59.510
but with a small
twist: instead
00:49:59.510 --> 00:50:01.850
of using Szemeredi's
regularity lemma,
00:50:01.850 --> 00:50:03.740
we will use that
corollary up there.
00:50:11.560 --> 00:50:14.880
So let's prove the induced
graph removal lemma.
00:50:20.820 --> 00:50:21.800
So the three steps.
00:50:21.800 --> 00:50:23.451
First, we do partition.
00:50:30.050 --> 00:50:33.370
So let's suppose you have a--
00:50:36.850 --> 00:50:39.100
so we suppose G is as above.
00:50:39.100 --> 00:50:45.200
You have very few
induced copies of H.
00:50:45.200 --> 00:50:51.170
Let's apply the corollary to get
a partition of the vertex set
00:50:51.170 --> 00:50:57.660
of g into k parts.
00:50:57.660 --> 00:51:04.690
And inside each part
I have a W, satisfying
00:51:04.690 --> 00:51:11.950
the following properties
that each Wi Wj
00:51:11.950 --> 00:51:18.517
is regular with the
following parameter which
00:51:18.517 --> 00:51:21.100
will come out later when we
need to use the counting lemma.
00:51:21.100 --> 00:51:23.740
But it's some number, but
don't worry too much about it.
00:51:27.050 --> 00:51:30.310
So here I'm going to--
00:51:30.310 --> 00:51:35.685
so let's say H has
little h vertices.
00:51:45.405 --> 00:51:48.060
So between Wi Wj
it is this regular.
00:51:48.060 --> 00:51:50.730
So we actually have not
yet used the full strength
00:51:50.730 --> 00:51:58.740
of the corollary where I can
make the regularity even depend
00:51:58.740 --> 00:51:59.520
on k.
00:51:59.520 --> 00:52:01.650
So we will not need
that here, but we'll
00:52:01.650 --> 00:52:04.090
need it in a later application.
00:52:04.090 --> 00:52:11.120
So the exponent is little h.
00:52:11.120 --> 00:52:18.890
OK, so other properties are that
the densities between the Vi's
00:52:18.890 --> 00:52:28.910
and the Wi's do not differ
by more than epsilon over 2
00:52:28.910 --> 00:52:33.700
for all but a small fraction--
00:52:33.700 --> 00:52:36.480
so epsilon k squared over 2--
00:52:36.480 --> 00:52:36.980
pairs.
00:52:45.150 --> 00:52:53.010
And finally, the sizes of the
Wi's are at least delta 0 times
00:52:53.010 --> 00:52:57.010
n where delta 0 depends
only on epsilon.
00:53:04.310 --> 00:53:07.600
Epsilon and H.
00:53:21.210 --> 00:53:25.320
This is the partition step,
so now let's do the cleaning.
00:53:28.210 --> 00:53:34.830
In the cleaning step,
basically we're not going to--
00:53:34.830 --> 00:53:37.620
I mean, there is no longer an
issue of irregular pairs if we
00:53:37.620 --> 00:53:40.100
only look at the Wi's.
00:53:40.100 --> 00:53:43.560
So we just need to think
about the low density pairs
00:53:43.560 --> 00:53:46.710
or whatever the
corresponding analog is.
00:53:46.710 --> 00:53:52.200
And what happens here is
that for every i less than j,
00:53:52.200 --> 00:53:57.790
and crucially including
when i equals to j,
00:53:57.790 --> 00:54:06.330
if the edge densities
between the W's is too small
00:54:06.330 --> 00:54:18.790
then we remove all
edges between Vi and Vj.
00:54:23.070 --> 00:54:31.810
And if the edge density
between the Wi's is too big,
00:54:31.810 --> 00:54:35.395
then we do the opposite.
00:54:38.280 --> 00:54:41.560
So we add all edges
between Vi and Vj.
00:55:00.900 --> 00:55:02.970
How many edges do we end
up adding or removing?
00:55:06.560 --> 00:55:21.460
So the total number of edges
added or removed from g is--
00:55:21.460 --> 00:55:26.680
in this case, so if
the edge density
00:55:26.680 --> 00:55:32.160
in G between the Vi's and
Vj's is also very small,
00:55:32.160 --> 00:55:36.110
then you do not remove
very many edges.
00:55:36.110 --> 00:55:41.190
But most pairs of Vi and
Vj have that property.
00:55:41.190 --> 00:55:43.650
So you tidy up
what kind of errors
00:55:43.650 --> 00:55:48.140
you can get from here
and there, and you
00:55:48.140 --> 00:55:51.890
find that the total number of
edges that are added or removed
00:55:51.890 --> 00:55:59.150
from g is less than, let's
say, epsilon n squared.
00:55:59.150 --> 00:56:00.830
Maybe even get an
extra factor of 2,
00:56:00.830 --> 00:56:04.310
but you know, upon changing
some constant factors,
00:56:04.310 --> 00:56:08.180
it's less than
epsilon n squared.
00:56:08.180 --> 00:56:13.470
So these are some small
details you can work out.
00:56:13.470 --> 00:56:15.780
Here we're using--
asking, how is
00:56:15.780 --> 00:56:19.410
the density between Vi and
Vj related to Wi and Wj?
00:56:19.410 --> 00:56:23.540
Well, for most pairs of i
and j they're very similar.
00:56:23.540 --> 00:56:26.180
And there's a small fraction
of them that are not similar,
00:56:26.180 --> 00:56:32.310
but then you lump everything
in to this bound over here.
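NOTE
A sketch of that bookkeeping, assuming equitable parts of size about n/k and writing \(\tau\) for the unspecified "too small"/"too big" threshold: a cleaned non-exceptional pair has \(d(V_i, V_j) \le d(W_i, W_j) + \varepsilon/2 \le \tau + \varepsilon/2\), so all such pairs together change at most \((\tau + \varepsilon/2) n^2\) edges, while the at most \(\varepsilon k^2/2\) exceptional pairs change at most
\[ \frac{\varepsilon k^2}{2} \cdot \left(\frac{n}{k}\right)^2 = \frac{\varepsilon}{2} n^2 \]
edges; for \(\tau\) a small multiple of \(\varepsilon\), the total comes out below \(2 \varepsilon n^2\), whence the factor of 2.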
00:56:40.290 --> 00:56:43.212
So maybe I need to--
00:56:43.212 --> 00:56:44.920
let me just put a 2
here just to be safe.
00:56:48.690 --> 00:56:50.780
All right.
00:56:50.780 --> 00:56:55.590
So we changed a very
small number of edges,
00:56:55.590 --> 00:56:57.750
and now we want to show
that the graph that
00:56:57.750 --> 00:57:01.740
has resulted from
this modification
00:57:01.740 --> 00:57:05.250
does not have any
induced copies of H.
00:57:11.480 --> 00:57:15.110
And the final step
is the counting step.
00:57:15.110 --> 00:57:20.960
So suppose there
were any induced
00:57:20.960 --> 00:57:26.860
H left after the modification.
00:57:26.860 --> 00:57:30.020
So I want to show that, in fact,
there must be a lot of H's--
00:57:30.020 --> 00:57:32.160
induced H's originally
in the graph,
00:57:32.160 --> 00:57:34.336
thereby contradicting
the hypothesis.
00:57:41.690 --> 00:57:45.210
So where does this
induced H sit?
00:57:45.210 --> 00:57:55.070
Well, you have the V's,
and inside the V's you have
00:57:55.070 --> 00:57:55.570
the W's.
00:58:04.170 --> 00:58:13.200
So suppose my H is that
graph for illustration.
00:58:13.200 --> 00:58:16.770
And in particular,
I have a non-edge.
00:58:16.770 --> 00:58:20.370
So I have an edge, and
I also have a non-edge.
00:58:20.370 --> 00:58:22.895
So between these two,
that's the non-edge.
00:58:27.950 --> 00:58:34.590
So suppose you find a copy
of H in the cleaned-up graph.
00:58:34.590 --> 00:58:36.110
Where can this copy of H sit?
this copy of H sit?
00:58:37.940 --> 00:58:39.790
Suppose you find it here.
00:58:43.440 --> 00:58:52.130
The claim now is that if this
copy of H existed here, then
00:58:52.130 --> 00:58:57.050
I must be able to find
many such copies of H
00:58:57.050 --> 00:58:59.090
in the corresponding
yellow parts.
00:59:01.940 --> 00:59:10.050
Because between the yellow
parts you have regularity,
00:59:10.050 --> 00:59:15.450
and you also have the
right kinds of densities.
00:59:15.450 --> 00:59:17.710
Because if they didn't have
the right kind of density,
00:59:17.710 --> 00:59:19.210
we would have cleaned
it up already.
00:59:22.900 --> 00:59:26.000
So that's the idea.
00:59:26.000 --> 00:59:29.720
If you had a copy
of this H somewhere,
00:59:29.720 --> 00:59:31.820
then I zoom into
the yellow parts,
00:59:31.820 --> 00:59:37.160
zoom into these W's, and I find
lots of copies of H in between
00:59:37.160 --> 00:59:39.430
the W's.
00:59:39.430 --> 00:59:40.930
So suppose-- let
me write this down.
00:59:40.930 --> 00:59:57.690
So suppose the little v's, so
the vertices of H, lie in the--
00:59:57.690 --> 01:00:00.540
so I'm just indexing
where a little v lies.
01:00:00.540 --> 01:00:03.225
The little v lies
in big V sub phi
01:00:03.225 --> 01:00:14.600
of v, for some phi which sends the
vertices of H to 1 through k.
01:00:14.600 --> 01:00:34.650
So now we apply the counting lemma
to embed induced copies of H
01:00:34.650 --> 01:00:45.710
in G, where the vertex
v of H is mapped
01:00:45.710 --> 01:00:54.455
to a vertex in the
corresponding W.
01:01:00.630 --> 01:01:05.440
And we would like to know that
there are lots of such copies.
01:01:05.440 --> 01:01:06.660
And the counting lemma--
01:01:06.660 --> 01:01:11.730
or rather, some variant-- but you
should revisit the counting lemma
01:01:11.730 --> 01:01:16.940
that we did last time and view
it as a multi-partite version.
01:01:16.940 --> 01:01:21.440
We apply it here, part to part.
01:01:21.440 --> 01:01:30.340
So we find that the number
of such induced copies
01:01:30.340 --> 01:01:32.180
is within a small error.
01:01:36.630 --> 01:01:47.700
So that regularity parameter
multiplied by the number
01:01:47.700 --> 01:01:51.690
of edges of H, which we
already canceled out,
01:01:51.690 --> 01:02:01.015
multiplied by the product
of the sizes of these Wi's.
01:02:04.490 --> 01:02:08.660
So it's within
this error of what
01:02:08.660 --> 01:02:13.250
you would suspect if you
naively multiply the edge
01:02:13.250 --> 01:02:16.953
densities together along
with the vertex densities.
01:02:29.280 --> 01:02:35.790
So these factors are for the
edges that you want to embed,
01:02:35.790 --> 01:02:39.460
and then I also need to
multiply the densities
01:02:39.460 --> 01:02:40.510
for the non-edges.
01:02:51.378 --> 01:02:54.542
So 1 minus these edge densities.
01:02:54.542 --> 01:02:56.500
So one way you can think
of it is just consider
01:02:56.500 --> 01:02:58.510
the complement in G.
01:02:58.510 --> 01:03:02.500
So consider the complement of
G to get this version here.
01:03:02.500 --> 01:03:06.240
And then finally, the product
of the vertex set sizes.
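NOTE
Loosely, with \(h = v(H)\) and \(\phi\) the location map, the multipartite induced counting bound being invoked says the number of induced copies of H with each vertex v landing in \(W_{\phi(v)}\) is at least
\[ \left( \prod_{uv \in E(H)} d\left(W_{\phi(u)}, W_{\phi(v)}\right) \prod_{uv \notin E(H)} \left(1 - d\left(W_{\phi(u)}, W_{\phi(v)}\right)\right) - \text{(small error from the regularity parameter)} \right) \prod_{v \in V(H)} \left|W_{\phi(v)}\right|. \]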
01:03:18.000 --> 01:03:22.500
And the point is that this
is not a small number.
01:03:22.500 --> 01:03:35.870
So hence the number of
induced copies of H in g
01:03:35.870 --> 01:03:41.450
is at least on the order of--
01:03:41.450 --> 01:03:42.370
well, OK?
01:03:44.910 --> 01:03:50.590
So it's at least some
number, which is basically
01:03:50.590 --> 01:03:51.880
this guy over here.
01:03:51.880 --> 01:03:55.820
So epsilon over 4 raised to--
01:03:55.820 --> 01:03:57.820
all of these are constants,
so that's the point.
01:03:57.820 --> 01:03:59.740
All of these guys are
constants, minus--
01:04:05.050 --> 01:04:08.103
so here is the main term,
and then the error term.
01:04:13.790 --> 01:04:17.210
And then the product of
these vertex set sizes,
01:04:17.210 --> 01:04:19.680
and we saw that each vertex
set is not too small.
01:04:25.840 --> 01:04:30.035
So you have lots of
induced copies of H in g.
01:04:30.035 --> 01:04:31.430
Yep?
01:04:31.430 --> 01:04:34.600
AUDIENCE: How do you
do in the case where
01:04:34.600 --> 01:04:44.740
the density between [INAUDIBLE]
01:04:44.740 --> 01:04:47.360
YUFEI ZHAO: OK, so can
you repeat your question?
01:04:47.360 --> 01:04:53.045
AUDIENCE: How are you
dealing with the [INAUDIBLE]
01:04:53.045 --> 01:04:53.670
YUFEI ZHAO: OK.
01:04:53.670 --> 01:04:55.860
So question, how do we deal
with the all but epsilon
01:04:55.860 --> 01:04:57.080
k squared over 2 pairs?
01:04:57.080 --> 01:04:59.375
So that comes up in
the cleaning step
01:04:59.375 --> 01:05:01.760
in what I wrote
in red in dealing
01:05:01.760 --> 01:05:07.780
with the number of total edges
that are added or removed.
01:05:07.780 --> 01:05:10.330
So think about how many
edges are added or removed.
01:05:10.330 --> 01:05:15.440
In these non-exceptional pairs,
the number of edges that are
01:05:15.440 --> 01:05:16.310
added or removed--
01:05:26.470 --> 01:05:28.670
let's just think
about added edges.
01:05:28.670 --> 01:05:45.930
So if the density between the V's is
controlled by that of the W's,
01:05:45.930 --> 01:05:48.910
then the number of edges added--
01:05:48.910 --> 01:05:50.730
or removed, in that case--
01:05:50.730 --> 01:05:56.230
from all such pairs along with--
01:05:56.230 --> 01:05:56.730
yeah.
01:05:56.730 --> 01:06:01.410
So you have at most epsilon n
squared edges changed.
01:06:06.540 --> 01:06:17.780
On the other hand, if this is
not true then you only have
01:06:17.780 --> 01:06:23.000
epsilon k squared over 2 such pairs ij
for which this fails.
01:06:23.000 --> 01:06:26.380
So you also only have at
most epsilon n squared edges
01:06:26.380 --> 01:06:29.850
added or removed in such cases.
01:06:29.850 --> 01:06:32.642
That answers your question?
01:06:32.642 --> 01:06:33.907
Yes?
01:06:33.907 --> 01:06:35.032
AUDIENCE: Is that number 0?
01:06:37.755 --> 01:06:39.624
YUFEI ZHAO: Is which number 0?
01:06:39.624 --> 01:06:45.320
AUDIENCE: The number of induced
edges for the [INAUDIBLE]
01:06:45.320 --> 01:06:46.520
YUFEI ZHAO: The--
01:06:46.520 --> 01:06:47.951
AUDIENCE: Yeah, the top board.
01:06:47.951 --> 01:06:48.868
YUFEI ZHAO: Top board?
01:06:58.840 --> 01:06:59.400
Good.
01:06:59.400 --> 01:07:01.695
So you're asking about this number.
01:07:01.695 --> 01:07:02.820
So that should have been 2.
01:07:08.580 --> 01:07:10.020
Yes?
01:07:10.020 --> 01:07:12.173
AUDIENCE: I don't
see k anywhere.
01:07:12.173 --> 01:07:14.840
YUFEI ZHAO: OK, so question, you
don't see k appearing anywhere.
01:07:14.840 --> 01:07:16.893
So the k in the
corollary, do you mean?
01:07:16.893 --> 01:07:17.820
AUDIENCE: Yeah.
01:07:17.820 --> 01:07:19.530
YUFEI ZHAO: So that
hasn't come up yet.
01:07:19.530 --> 01:07:24.180
So it comes up implicitly
because we need to lower bound
01:07:24.180 --> 01:07:26.715
the sizes of these W's.
01:07:31.820 --> 01:07:34.880
So this is partly why we need
a bound on the number of parts,
01:07:34.880 --> 01:07:38.150
but it is true that we do
not need epsilon k to depend
01:07:38.150 --> 01:07:39.860
on k in this application yet.
01:07:39.860 --> 01:07:42.140
I will mention a different
application in a second
01:07:42.140 --> 01:07:43.230
where you do need that k.
01:07:48.810 --> 01:07:52.920
OK, so the number of induced H
in g is at least this amount.
01:07:52.920 --> 01:07:54.080
And that's a small lie.
01:07:54.080 --> 01:07:59.780
You maybe need to consider
the number of homomorphic copies instead.
01:07:59.780 --> 01:08:01.730
Well, actually, no, we're OK.
01:08:01.730 --> 01:08:02.514
Never mind.
01:08:11.120 --> 01:08:19.609
So you can set delta to
be this quantity here,
01:08:19.609 --> 01:08:21.210
and then that
finishes the proof.
01:08:21.210 --> 01:08:23.500
So you have lots of
induced copies of H
01:08:23.500 --> 01:08:27.765
in your graph which
contradicts the hypothesis.
01:08:27.765 --> 01:08:30.600
So that finishes the proof
of the induced removal lemma,
01:08:30.600 --> 01:08:34.109
and basically the proof is
the same as the usual graph
01:08:34.109 --> 01:08:36.510
removal lemma except
that now we need
01:08:36.510 --> 01:08:40.260
some strengthened
regularity lemma which
01:08:40.260 --> 01:08:43.290
allows us to get rid
of irregular pairs
01:08:43.290 --> 01:08:44.890
but in a more
restricted setting.
01:08:44.890 --> 01:08:47.729
Because we saw you cannot
completely get rid of irregular
01:08:47.729 --> 01:08:48.229
pairs.
01:08:51.720 --> 01:08:52.500
Any questions?
01:08:56.109 --> 01:08:56.849
Yes?
01:08:56.849 --> 01:09:01.473
AUDIENCE: [INAUDIBLE]
01:09:01.473 --> 01:09:03.640
YUFEI ZHAO: So I want to
address the question of why
01:09:03.640 --> 01:09:05.950
I stated this
corollary in this more
01:09:05.950 --> 01:09:09.520
general form of a decreasing
sequence of epsilons?
01:09:09.520 --> 01:09:11.770
So first of all, with
strong regularity lemmas,
01:09:11.770 --> 01:09:15.189
the point is that
01:09:15.189 --> 01:09:17.439
it's always nice to state
it with this extra strength.
01:09:17.439 --> 01:09:20.080
Because it's the
right way to think
01:09:20.080 --> 01:09:22.770
about these types of theorems.
01:09:22.770 --> 01:09:25.330
The regularity
on the parts--
01:09:25.330 --> 01:09:28.779
you can make it depend
on the number of parts
01:09:28.779 --> 01:09:32.170
so that you get much stronger
control on the regularity.
01:09:32.170 --> 01:09:33.740
But there are also
some applications.
01:09:33.740 --> 01:09:36.970
For example, I
will state next
01:09:36.970 --> 01:09:40.210
an application where you do
need that kind of strength.
01:09:40.210 --> 01:09:43.899
So here's what's known as
the infinite removal lemma.
01:09:47.260 --> 01:09:49.689
Here we have not
just a single pattern
01:09:49.689 --> 01:09:52.660
or a finite number of patterns
we want to get rid of.
01:09:52.660 --> 01:09:55.330
Now we have
infinitely many patterns.
01:09:55.330 --> 01:10:08.880
So for every curly H, which
is a possibly infinite set
01:10:08.880 --> 01:10:11.040
of graphs.
01:10:11.040 --> 01:10:13.090
The graphs themselves
are always finite,
01:10:13.090 --> 01:10:15.480
but this may be
an infinite list.
01:10:15.480 --> 01:10:19.830
And an epsilon parameter.
01:10:19.830 --> 01:10:26.350
There exists an H0 and a
delta positive parameter
01:10:26.350 --> 01:10:38.230
such that every n vertex
graph with at most delta--
01:10:38.230 --> 01:10:41.180
so less than delta--
01:10:41.180 --> 01:10:52.580
n to the v(H) induced
copies of H for every H
01:10:52.580 --> 01:11:00.205
in this family with
fewer than H0 vertices.
01:11:03.670 --> 01:11:07.670
So every graph
with this property
01:11:07.670 --> 01:11:14.910
can be made curly H free.
01:11:14.910 --> 01:11:17.580
So it means free of--
01:11:17.580 --> 01:11:32.390
induced curly H free by
adding or removing fewer
01:11:32.390 --> 01:11:34.380
than epsilon n squared edges.
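For reference, here is one way to typeset the statement just read out (the notation is my reconstruction of the board): for every (possibly infinite) family of graphs $\mathcal{H}$ and every $\epsilon > 0$, there exist $h_0$ and $\delta > 0$ such that every $n$-vertex graph with fewer than $\delta n^{v(H)}$ induced copies of $H$, for every $H \in \mathcal{H}$ with $v(H) < h_0$, can be made induced-$\mathcal{H}$-free by adding or removing fewer than $\epsilon n^2$ edges.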
01:11:38.230 --> 01:11:39.660
So now instead of
a single pattern
01:11:39.660 --> 01:11:43.370
you have a possibly infinite
set of induced patterns and you
01:11:43.370 --> 01:11:49.700
want to make your
graph curly H free--
01:11:49.700 --> 01:11:51.930
induced curly H free.
01:11:51.930 --> 01:11:55.910
And the theorem is
that there exists
01:11:55.910 --> 01:12:03.320
some finite bound, H0, such
that if you have few copies--
01:12:03.320 --> 01:12:06.460
so for all the patterns
up to that point--
01:12:06.460 --> 01:12:08.050
then you can do
what you need to do.
01:12:11.050 --> 01:12:13.390
So take some time to even
digest this statement,
01:12:13.390 --> 01:12:16.000
but it's somehow
the correct infinite
01:12:16.000 --> 01:12:18.220
version of the
removal lemma if you
01:12:18.220 --> 01:12:21.347
have infinitely many patterns
that you need to remove.
01:12:21.347 --> 01:12:22.930
And I claim that the
proof is actually
01:12:22.930 --> 01:12:25.640
more or less the same proof
as the one that we did here,
01:12:25.640 --> 01:12:27.760
except now you need
to take your epsilon
01:12:27.760 --> 01:12:31.900
sub k, as in this
corollary, to depend on k.
01:12:31.900 --> 01:12:36.670
You need to in some way look
ahead in this infinite family of patterns.
01:12:36.670 --> 01:12:50.160
So here, in the proof, this epsilon
sub k from the corollary depends on k.
01:12:50.160 --> 01:13:02.790
And also it depends on
your family of patterns curly H.
01:13:02.790 --> 01:13:05.340
Finally, I want to
mention a perspective--
01:13:05.340 --> 01:13:09.022
a computer science
perspective on these removal
01:13:09.022 --> 01:13:10.730
lemmas that we've been
discussing so far.
01:13:14.750 --> 01:13:16.510
And that's in the
context of something
01:13:16.510 --> 01:13:17.590
called property testing.
01:13:37.830 --> 01:13:43.350
And basically, we would
like an efficient--
01:13:43.350 --> 01:13:51.900
efficient meaning fast--
randomized algorithm
01:13:51.900 --> 01:14:11.820
to distinguish graphs that
are triangle-free from those
01:14:11.820 --> 01:14:15.747
that are epsilon far
from triangle-free.
01:14:19.330 --> 01:14:22.370
Where being epsilon far
from triangle-free means
01:14:22.370 --> 01:14:32.560
that you need to change more
than epsilon n squared edges--
01:14:32.560 --> 01:14:45.310
here n is, as usual, the number
of vertices-- to make the graph
01:14:45.310 --> 01:14:45.940
triangle-free.
01:14:45.940 --> 01:14:48.040
So the distance, the
edit distance,
01:14:48.040 --> 01:14:52.750
is more than epsilon away
from being triangle-free.
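In symbols (one standard way to write the definition): $G$ is $\epsilon$-far from triangle-free if

\[
|E(G) \,\triangle\, E(G')| \;>\; \epsilon n^2 \quad \text{for every triangle-free graph } G' \text{ on } V(G).
\]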
01:14:52.750 --> 01:14:55.450
So somebody gives you
a very large graph.
01:14:55.450 --> 01:14:56.830
n is very large.
01:14:56.830 --> 01:15:00.400
You cannot search through
every triple of vertices.
01:15:00.400 --> 01:15:01.600
That's too expensive.
01:15:01.600 --> 01:15:06.280
But you want some way to test
if a graph is triangle-free
01:15:06.280 --> 01:15:09.551
versus very far away
from being triangle-free.
01:15:14.270 --> 01:15:16.670
So there's a very simple
randomized algorithm
01:15:16.670 --> 01:15:24.000
to do this, which is
to just repeatedly
01:15:24.000 --> 01:15:30.850
sample a random
triple of vertices
01:15:30.850 --> 01:15:33.090
and check if it's a triangle.
01:15:41.860 --> 01:15:43.710
So you do this.
01:15:43.710 --> 01:15:49.330
And just to make our
life a bit more secure,
01:15:49.330 --> 01:15:53.320
let's try it some
larger number of times.
01:15:53.320 --> 01:15:59.160
So some c of epsilon-- some
constant number of times.
01:15:59.160 --> 01:16:02.890
And then--
01:16:02.890 --> 01:16:08.770
so if you don't find
a triangle, then we
01:16:08.770 --> 01:16:14.084
return that it's triangle-free.
01:16:18.350 --> 01:16:23.444
Otherwise we return that it is
epsilon far from triangle-free.
01:16:31.980 --> 01:16:33.296
So that's the algorithm.
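As a concrete illustration, here is a minimal sketch of this sampling tester in Python. The function name, the edge-set input format, and the choice c = ceil(1/delta) are my own illustrative assumptions, not from the lecture.

```python
import random
from math import ceil

def triangle_free_tester(edges, n, delta):
    """Sketch of the naive sampling tester (hypothetical interface).

    edges: set of frozensets {u, v} over vertices 0..n-1.
    delta: the constant from the triangle removal lemma for this epsilon.
    """
    c = ceil(1 / delta)  # constant number of samples, independent of n
    for _ in range(c):
        u, v, w = random.sample(range(n), 3)  # uniform random triple of vertices
        if (frozenset((u, v)) in edges
                and frozenset((v, w)) in edges
                and frozenset((u, w)) in edges):
            # Found a triangle: certainly not triangle-free.
            return "epsilon-far from triangle-free"
    return "triangle-free"
```

Note that c depends only on epsilon (through delta), never on n.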
01:16:36.560 --> 01:16:39.470
So it's a very
intuitive algorithm,
01:16:39.470 --> 01:16:42.180
but why does it work?
01:16:42.180 --> 01:16:45.050
So we want to know that,
indeed, if somebody gives you
01:16:45.050 --> 01:16:46.370
one of these two possibilities.
01:16:46.370 --> 01:16:50.340
and you run that algorithm, you can
succeed with high probability.
01:16:50.340 --> 01:16:51.544
Question?
01:16:51.544 --> 01:16:54.380
AUDIENCE: [INAUDIBLE]
01:16:54.380 --> 01:16:58.500
YUFEI ZHAO: So let's talk
about why this works.
01:16:58.500 --> 01:17:02.340
So theorem, for
every epsilon, there
01:17:02.340 --> 01:17:08.324
exists a c such that
the algorithm succeeds
01:17:08.324 --> 01:17:17.760
with probability bigger than
2/3, and 2/3 can be any number.
01:17:17.760 --> 01:17:20.000
Any number that you
like, because you can always
01:17:20.000 --> 01:17:22.901
repeat it to boost that
constant probability.
01:17:26.200 --> 01:17:28.320
So there are two cases.
01:17:28.320 --> 01:17:35.070
If G is triangle-free,
then it always succeeds.
01:17:35.070 --> 01:17:36.720
You'll never find
a triangle, and it
01:17:36.720 --> 01:17:38.435
would return triangle-free.
01:17:44.260 --> 01:17:55.340
On the other hand, if G is
epsilon far from triangle-free,
01:17:55.340 --> 01:18:00.230
then the triangle removal
lemma tells us
01:18:00.230 --> 01:18:03.913
that G has lots of triangles.
01:18:07.230 --> 01:18:08.850
At least delta n cubed triangles.
01:18:12.100 --> 01:18:23.710
So if we sample c being, let's
say, 1 over delta--
01:18:23.710 --> 01:18:26.770
delta here is a function of
epsilon from the triangle
01:18:26.770 --> 01:18:28.640
removal lemma.
01:18:28.640 --> 01:18:34.960
So we find that the probability
that the algorithm fails
01:18:34.960 --> 01:18:36.790
is at most--
01:18:51.710 --> 01:18:53.820
so you have a lot of triangles.
01:18:53.820 --> 01:18:56.580
So very likely you will
hit one of these triangles.
01:18:56.580 --> 01:19:00.020
So the probability that the
algorithm fails is at most 1
01:19:00.020 --> 01:19:05.780
minus delta n cubed divided
by the total number of triples,
01:19:05.780 --> 01:19:08.240
all raised to 1 over delta.
01:19:08.240 --> 01:19:13.090
And this is at most
1 minus 6 delta raised
01:19:13.090 --> 01:19:17.900
to 1 over delta, and it's
at most e to the minus 6.
01:19:17.900 --> 01:19:21.890
So less than 1/3 in particular.
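Written out (my reconstruction of the board computation; the constant 6 comes from $\binom{n}{3} \le n^3/6$):

\[
\Pr[\text{fail}] \;\le\; \left(1 - \frac{\delta n^3}{\binom{n}{3}}\right)^{1/\delta} \;\le\; \left(1 - 6\delta\right)^{1/\delta} \;\le\; e^{-6} \;<\; \frac{1}{3},
\]

using $1 - x \le e^{-x}$ in the last inequality.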
01:19:21.890 --> 01:19:25.040
So this algorithm succeeds
with high probability.
01:19:25.040 --> 01:19:26.850
Now, how big of a c do you need?
01:19:26.850 --> 01:19:30.090
Well, that depends on the
triangle removal lemma.
01:19:30.090 --> 01:19:32.290
So it's a constant--
01:19:32.290 --> 01:19:34.410
a constant that
does not depend
01:19:34.410 --> 01:19:37.520
on the size of the graph.
01:19:37.520 --> 01:19:39.360
But it's a large
constant, because we
01:19:39.360 --> 01:19:41.560
saw in the proof
of the regularity lemma
01:19:41.560 --> 01:19:42.690
that it can be very large.
01:19:46.800 --> 01:19:49.500
But you know, this
theorem here is basically
01:19:49.500 --> 01:19:53.230
the same as the triangle
removal lemma.
01:19:53.230 --> 01:19:55.630
So it's highly
non-trivial that it's true,
01:19:55.630 --> 01:19:59.470
even though the algorithm is
extremely naive and simple.
01:19:59.470 --> 01:20:01.420
I just want to finish
off with one more thing.
01:20:01.420 --> 01:20:03.220
Instead of testing
for triangle-freeness,
01:20:03.220 --> 01:20:06.100
you can ask what other
properties can you test?
01:20:06.100 --> 01:20:12.590
So which graph
properties are testable
01:20:12.590 --> 01:20:13.770
in that sense?
01:20:16.950 --> 01:20:19.040
So distinguishing
something which
01:20:19.040 --> 01:20:27.070
has the property P versus
being epsilon far from this property
01:20:27.070 --> 01:20:32.120
P.
01:20:32.120 --> 01:20:34.060
And you have this
tester where you
01:20:34.060 --> 01:20:37.020
sample some number of vertices.
01:20:37.020 --> 01:20:38.800
So this is called
the oblivious tester.
01:20:42.120 --> 01:20:48.760
So you sample k
vertices, and you try
01:20:48.760 --> 01:20:52.710
to see if it has that property.
01:20:52.710 --> 01:20:56.637
So there's a class of
properties called hereditary.
01:21:00.050 --> 01:21:02.560
So hereditary properties
are properties
01:21:02.560 --> 01:21:05.710
that are closed under
vertex deletion.
01:21:11.320 --> 01:21:13.730
And these properties are--
01:21:13.730 --> 01:21:16.660
lots of properties that you're
seeing are of this form.
01:21:16.660 --> 01:21:24.780
So for example, being H-free is of this
form, being planar-- so this one--
01:21:24.780 --> 01:21:31.706
being induced H-free-- so this
one-- being three-colorable,
01:21:31.706 --> 01:21:33.950
being perfect,
they're all examples
01:21:33.950 --> 01:21:35.600
of hereditary properties.
01:21:35.600 --> 01:21:38.210
Properties that if your
graph is three-colorable,
01:21:38.210 --> 01:21:41.690
you take out some vertices,
it's still three-colorable.
01:21:41.690 --> 01:21:44.120
And all the
discussion that we've
01:21:44.120 --> 01:21:47.960
done so far-- in particular
the infinite removal lemma,
01:21:47.960 --> 01:21:52.010
if you phrase it in the form
of property testing given
01:21:52.010 --> 01:22:03.060
the above discussion-- implies
that every hereditary property
01:22:03.060 --> 01:22:03.630
is testable.
01:22:06.860 --> 01:22:10.970
In fact, it's testable
in the above sense
01:22:10.970 --> 01:22:16.110
with a one-sided error
using an oblivious tester.
01:22:16.110 --> 01:22:19.830
One-sided error means that, up
there, if it's triangle-free,
01:22:19.830 --> 01:22:21.260
then it always succeeds.
01:22:21.260 --> 01:22:26.110
So here, one of the cases
always succeeds.
01:22:26.110 --> 01:22:28.150
And the reason is that
you can characterize
01:22:28.150 --> 01:22:37.350
a hereditary property
as being induced curly H free
01:22:37.350 --> 01:22:41.790
for some curly H. Namely,
you're putting every graph
01:22:41.790 --> 01:22:45.800
into curly H that does not
have this property.
01:22:53.970 --> 01:22:58.580
This is a possibly
infinite set of graphs,
01:22:58.580 --> 01:23:00.500
and that completely
characterizes
01:23:00.500 --> 01:23:02.700
this hereditary property.
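In symbols (my reconstruction; one could equally take only the minimal graphs outside $\mathcal{P}$):

\[
\mathcal{P} \;=\; \{\, G : G \text{ is induced } H\text{-free for every } H \in \mathcal{H} \,\},
\qquad \mathcal{H} := \{\, H : H \notin \mathcal{P} \,\}.
\]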
01:23:02.700 --> 01:23:05.460
And if you read out the
infinite removal lemma,
01:23:05.460 --> 01:23:09.950
it says precisely, using
the above interpretation,
01:23:09.950 --> 01:23:14.350
that you have a property
testing algorithm.