WEBVTT
00:00:18.962 --> 00:00:20.670
YUFEI ZHAO: For the
past couple lectures,
00:00:20.670 --> 00:00:22.980
we've been talking
about Roth's theorem.
00:00:22.980 --> 00:00:28.190
And we showed-- so we saw
a proof of Roth's theorem
00:00:28.190 --> 00:00:30.730
using Fourier analytic methods.
00:00:30.730 --> 00:00:32.220
And we saw basically
the same proof
00:00:32.220 --> 00:00:33.930
but in two different settings.
00:00:33.930 --> 00:00:36.250
So two lectures ago,
we saw a proof in F3
00:00:36.250 --> 00:00:39.395
to the n. And basically
the same strategy,
00:00:39.395 --> 00:00:40.770
but with a bit
more work, we were
00:00:40.770 --> 00:00:46.560
able to show Roth's theorem
with roughly comparable bounds
00:00:46.560 --> 00:00:48.750
over the integers.
00:00:48.750 --> 00:00:51.900
Today, I want to show you a
very different kind of proof
00:00:51.900 --> 00:00:54.840
of Roth's theorem in the
finite field setting.
00:00:54.840 --> 00:00:59.220
So first let me remind
you, the bound that we
00:00:59.220 --> 00:01:04.019
saw last time for
Roth's theorem in F3 to the n
00:01:04.019 --> 00:01:07.920
gave an upper bound
on the maximum number
00:01:07.920 --> 00:01:14.730
of elements in a 3-AP-free
set that was of the form
00:01:14.730 --> 00:01:17.520
3 to the n over n.
00:01:21.010 --> 00:01:22.510
And so this proof
wasn't too bad.
00:01:22.510 --> 00:01:24.190
So we did it in one lecture.
00:01:24.190 --> 00:01:26.670
And then with a lot more
work-- and people tried very,
00:01:26.670 --> 00:01:28.060
very hard to improve this--
00:01:28.060 --> 00:01:33.170
and there was a paper that got
it to just a little bit more.
00:01:33.170 --> 00:01:34.400
And this was a lot of work.
00:01:34.400 --> 00:01:36.760
And this was something
that people thought
00:01:36.760 --> 00:01:39.130
was very exciting at the time.
00:01:39.130 --> 00:01:41.820
And then just a few years ago,
there was a major breakthrough,
00:01:41.820 --> 00:01:44.155
a very surprising
breakthrough, where--
00:01:44.155 --> 00:01:46.030
you know, at this point,
it wasn't even clear
00:01:46.030 --> 00:01:49.020
whether 3 should be the
right base for this exponent.
00:01:49.020 --> 00:01:50.950
That was a big open problem.
00:01:50.950 --> 00:01:52.870
And then there was
a big breakthrough
00:01:52.870 --> 00:01:56.170
where the following
bound was proved,
00:01:56.170 --> 00:02:03.700
that it was exponentially
less than the previous bound.
00:02:03.700 --> 00:02:06.520
So this is one that I want to
talk about in the first part
00:02:06.520 --> 00:02:08.090
of today's lecture.
00:02:08.090 --> 00:02:11.020
So this development came first--
00:02:11.020 --> 00:02:13.060
the history is a
bit interesting.
00:02:13.060 --> 00:02:16.510
So Croot, Lev, and
Pach uploaded a paper
00:02:16.510 --> 00:02:22.060
to the arXiv on May 5 of
2016, where they showed not
00:02:22.060 --> 00:02:26.200
exactly this theorem but in
a slightly different setting
00:02:26.200 --> 00:02:32.170
in this group, so in Z
mod 4 instead of Z mod 3.
00:02:32.170 --> 00:02:33.670
And this was already
quite exciting,
00:02:33.670 --> 00:02:36.640
getting exponential
improvement in this setting.
00:02:36.640 --> 00:02:38.650
But it wasn't
exactly obvious how
00:02:38.650 --> 00:02:41.440
to use their method to get F3.
00:02:41.440 --> 00:02:44.350
But that was done
about a week later.
00:02:44.350 --> 00:02:57.400
So Ellenberg and Gijswijt,
they managed to improve the--
00:02:57.400 --> 00:03:00.320
use this technique to modify
the Croot-Lev-Pach technique
00:03:00.320 --> 00:03:04.570
to the F3 to the n setting,
which is the one that we've
00:03:04.570 --> 00:03:05.380
been interested in.
00:03:05.380 --> 00:03:07.380
So there's a small
difference between these two,
00:03:07.380 --> 00:03:10.855
namely this group has
elements of order 2, which
00:03:10.855 --> 00:03:12.730
makes things a bit easier
to work with here.
00:03:16.540 --> 00:03:18.400
So this is the
Croot-Lev-Pach method,
00:03:18.400 --> 00:03:20.500
as it's often called
in literature.
00:03:20.500 --> 00:03:23.440
And we'll see that--
it's a very ingenious use
00:03:23.440 --> 00:03:25.240
of the so-called
linear algebraic method
00:03:25.240 --> 00:03:28.570
in combinatorics, in this
case the polynomial method.
00:03:28.570 --> 00:03:32.600
And it works specifically in
the finite field vector space.
00:03:32.600 --> 00:03:35.650
So what we're talking about
in this part of the lecture
00:03:35.650 --> 00:03:37.260
does not translate whatsoever.
00:03:37.260 --> 00:03:40.812
At least, nobody knows how
to translate this technique
00:03:40.812 --> 00:03:41.770
to the integer setting.
00:03:45.010 --> 00:03:47.500
So how does it work?
00:03:47.500 --> 00:03:49.420
The presentation
I'm going to give
00:03:49.420 --> 00:03:52.910
follows not the original paper,
which is quite nice to read,
00:03:52.910 --> 00:03:53.470
by the way.
00:03:53.470 --> 00:03:54.803
It's only about four pages long.
00:03:54.803 --> 00:03:57.000
It's pleasant to read.
00:03:57.000 --> 00:04:00.640
But there is a slightly
even nicer formulation
00:04:00.640 --> 00:04:01.980
on Terry Tao's blog.
00:04:01.980 --> 00:04:03.795
And that's the one
that I'm presenting.
00:04:07.890 --> 00:04:11.670
So the idea is that if
you have a subset of F3
00:04:11.670 --> 00:04:18.630
to the n that is
3-AP-free, such a set also
00:04:18.630 --> 00:04:24.265
has a name, cap set, which
is also used in literature
00:04:24.265 --> 00:04:25.890
in this specific
setting where you have
00:04:25.890 --> 00:04:27.720
no three points on the line.
00:04:27.720 --> 00:04:31.659
In this case, then we have
the following identity.
00:04:38.020 --> 00:04:41.710
So here delta is
the Dirac delta.
00:04:41.710 --> 00:04:43.300
Let me write that
down in a second.
00:04:47.470 --> 00:04:51.130
So the delta of a
is the Dirac delta.
00:04:51.130 --> 00:04:56.450
It's 1 if x equals a,
and 0 if x does not equal a.
00:04:59.140 --> 00:05:00.910
So this is simply
rewriting the fact
00:05:00.910 --> 00:05:03.340
that x, y, z form
a 3-AP if and only
00:05:03.340 --> 00:05:08.125
if their sum is equal to 0.
00:05:08.125 --> 00:05:10.420
And because you're
3-AP-free, the only 3-AP's
00:05:10.420 --> 00:05:13.670
are the trivial ones recorded
on the right-hand side.
00:05:13.670 --> 00:05:15.800
So this is simply a
recording of the statement
00:05:15.800 --> 00:05:17.388
that A is 3-AP-free.
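NOTE The identity written on the board is not captured in this transcript. A reconstruction consistent with the spoken description (A a 3-AP-free subset of F3 to the n, a sum equal to 0 on the left, Dirac deltas on the right) would read roughly as follows; the exact notation is an assumption.

```latex
\[
  \delta_0(x + y + z) \;=\; \sum_{a \in A} \delta_a(x)\,\delta_a(y)\,\delta_a(z)
  \qquad \text{for all } x, y, z \in A,
\]
\[
  \text{where}\quad
  \delta_a(x) =
  \begin{cases}
    1 & \text{if } x = a, \\
    0 & \text{if } x \neq a.
  \end{cases}
\]
```

Read this way, the left-hand side is 1 exactly when x, y, z form a (possibly trivial) 3-AP, and the right-hand side is 1 exactly when x = y = z, so equality on all of A is the statement that A is 3-AP-free.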
00:05:20.140 --> 00:05:23.890
And the idea now is that you
have this expression up there,
00:05:23.890 --> 00:05:28.570
and I want to show that
if A is very, very large,
00:05:28.570 --> 00:05:31.780
then I could get a
contradiction by considering
00:05:31.780 --> 00:05:33.820
some notion of rank.
00:05:33.820 --> 00:05:37.150
So we will show that
the left-hand side
00:05:37.150 --> 00:05:41.870
is, in some sense, low rank.
00:05:41.870 --> 00:05:44.330
Well, I haven't told
you what rank means yet.
00:05:44.330 --> 00:05:46.340
But the left-hand side
is somewhat low rank,
00:05:46.340 --> 00:05:52.750
and the right-hand side
is a high-rank object.
00:05:57.690 --> 00:06:00.440
So what does rank mean?
00:06:00.440 --> 00:06:02.615
So recall from linear algebra--
00:06:07.900 --> 00:06:10.390
so the classical notion
of rank corresponds
00:06:10.390 --> 00:06:12.710
to two variable functions.
00:06:12.710 --> 00:06:17.890
So you should think of F as a
matrix over an arbitrary field
00:06:17.890 --> 00:06:23.360
F. So such a function or
a corresponding matrix
00:06:23.360 --> 00:06:33.600
is called rank 1
if it is nonzero
00:06:33.600 --> 00:06:37.810
and it can be written
in the following form--
00:06:37.810 --> 00:06:48.190
F of x, y is f of x g
of y for some functions
00:06:48.190 --> 00:06:51.130
that are one variable each.
00:06:51.130 --> 00:06:54.990
So, in matrix language,
this is a column vector
00:06:54.990 --> 00:06:57.160
times a row vector.
00:06:57.160 --> 00:06:58.810
So that's the meaning of rank 1.
00:06:58.810 --> 00:07:03.910
And to say that something is of
high rank of a specific rank--
00:07:03.910 --> 00:07:08.980
rather, the rank of F is
defined to be the minimum number
00:07:08.980 --> 00:07:19.050
of rank 1 functions
needed to write F
00:07:19.050 --> 00:07:21.479
as a sum or a
linear combination.
00:07:27.840 --> 00:07:29.310
So this is rank 1.
00:07:29.310 --> 00:07:31.380
And if you add up
r rank 1 functions,
00:07:31.380 --> 00:07:33.930
then you get something
that's, at most, rank r.
00:07:33.930 --> 00:07:38.740
So that's the basic definition
of rank from linear algebra.
00:07:38.740 --> 00:07:40.160
For three-variable
functions, you
00:07:40.160 --> 00:07:42.410
can come up with
other notions of rank.
00:07:42.410 --> 00:07:47.248
So what about
three-variable functions?
00:07:50.460 --> 00:07:53.430
So how do we define a
rank of such a function?
00:07:53.430 --> 00:07:56.100
So you might have seen such
objects as generalizations
00:07:56.100 --> 00:07:57.960
of matrices called tensors.
00:07:57.960 --> 00:08:01.880
And tensors have, already,
a natural notion of rank,
00:08:01.880 --> 00:08:04.680
and this is called tensor rank.
00:08:04.680 --> 00:08:08.130
Just like how, here, F is--
00:08:08.130 --> 00:08:12.030
we say rank 1 if it's
decomposable like that,
00:08:12.030 --> 00:08:18.660
we say F has tensor rank 1 if
this three-variable function is
00:08:18.660 --> 00:08:22.732
decomposable as a product
of one-variable functions.
00:08:25.660 --> 00:08:28.150
The tensor rank,
it turns out, this
00:08:28.150 --> 00:08:31.760
is an important notion, which
is actually quite mysterious.
00:08:31.760 --> 00:08:33.549
There's a lot of
important problems
00:08:33.549 --> 00:08:37.900
that boil down to us not really
understanding tensor rank,
00:08:37.900 --> 00:08:39.703
how it behaves.
00:08:39.703 --> 00:08:41.620
And it turns out, this
is not the right notion
00:08:41.620 --> 00:08:43.299
to use for our problem.
00:08:43.299 --> 00:08:46.180
So we're going to use a
different notion of rank.
00:08:46.180 --> 00:08:49.150
Here, rank 1 is decomposing
this three-variable function
00:08:49.150 --> 00:08:51.910
into a product of three
one-variable functions.
00:08:51.910 --> 00:08:54.550
But, instead, I can
define a different notion.
00:08:54.550 --> 00:09:00.280
We say that F has slice rank 1--
00:09:00.280 --> 00:09:03.327
so this is a definition
that's introduced
00:09:03.327 --> 00:09:05.410
in the context of this
problem, although it's also
00:09:05.410 --> 00:09:07.210
quite a natural definition--
00:09:07.210 --> 00:09:11.890
if it has one of
the following forms.
00:09:16.340 --> 00:09:20.120
So I can write it as a product
of a one-variable function
00:09:20.120 --> 00:09:21.740
and a two-variable function.
00:09:21.740 --> 00:09:25.340
So one variable and the
remaining two variables.
00:09:25.340 --> 00:09:30.350
But this definition should also
be symmetric in the variables,
00:09:30.350 --> 00:09:33.010
so the other combinations
are OK as well.
00:09:40.200 --> 00:09:42.500
So this is the definition
of a rank one function,
00:09:42.500 --> 00:09:43.930
a slice rank 1.
00:09:43.930 --> 00:09:46.750
And, also, if nonzero.
00:09:46.750 --> 00:09:49.340
If it's nonzero and can be
written in one of these forms.
00:09:51.970 --> 00:09:54.340
And, just like
earlier, we define
00:09:54.340 --> 00:10:03.440
the slice rank of F to be
the minimum number of slice
00:10:03.440 --> 00:10:04.490
rank 1 functions.
00:10:08.390 --> 00:10:13.260
Same as before, that you
need to write F as a sum.
00:10:13.260 --> 00:10:18.140
So I can decompose this F into
a sum of slice rank 1 functions.
00:10:18.140 --> 00:10:21.730
What's the most
efficient way to do so?
00:10:21.730 --> 00:10:24.977
So that's the definition
of slice rank.
00:10:24.977 --> 00:10:27.060
And, you see, you can come
up with this definition
00:10:27.060 --> 00:10:29.640
for any number of
variables, where slice rank
00:10:29.640 --> 00:10:32.670
1 means decompose into
two functions, where
00:10:32.670 --> 00:10:35.340
one function takes one variable,
and the other function takes
00:10:35.340 --> 00:10:37.860
all the remaining variables.
00:10:37.860 --> 00:10:40.320
And, for two
variables, slice rank and rank
00:10:40.320 --> 00:10:41.640
correspond to the same notion.
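NOTE The board formulas for the two notions of rank are not in the transcript. A sketch of what the spoken definitions amount to, for a function F on A x A x A with values in a field; the letters f, g, f_i, g_i are placeholder names.

```latex
% Tensor rank 1: a product of three one-variable functions.
\[
  F(x, y, z) = f(x)\, g(y)\, h(z).
\]
% Slice rank 1: F is nonzero and has one of the three forms
\[
  F(x, y, z) = f(x)\, g(y, z), \qquad
  F(x, y, z) = f(y)\, g(x, z), \qquad
  F(x, y, z) = f(z)\, g(x, y).
\]
% Slice rank of F: the least r such that F = F_1 + \dots + F_r
% with every F_i of slice rank 1.
```

For a two-variable function the only available split is f(x) g(y), which is why slice rank and ordinary matrix rank agree in that case.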
00:10:45.185 --> 00:10:46.060
Any questions so far?
00:10:49.640 --> 00:10:50.140
All right.
00:10:50.140 --> 00:10:53.020
So let's look at the
function on the right.
00:10:53.020 --> 00:10:55.600
So think of it as
a matrix, a tensor.
00:10:55.600 --> 00:10:56.900
So what is it?
00:10:56.900 --> 00:10:59.735
Well, it's kind of
like a diagonal matrix.
00:10:59.735 --> 00:11:00.610
So that's what it is.
00:11:00.610 --> 00:11:03.110
It's a diagonal matrix.
00:11:03.110 --> 00:11:10.660
So what is the rank of a
diagonal matrix, in this case
00:11:10.660 --> 00:11:11.620
a diagonal function?
00:11:14.570 --> 00:11:16.620
Well, you know
from linear algebra
00:11:16.620 --> 00:11:20.070
that if you have a matrix, then
the rank of a diagonal matrix
00:11:20.070 --> 00:11:23.090
is the number of
nonzero entries.
00:11:23.090 --> 00:11:25.700
So something similar
is true for slice rank,
00:11:25.700 --> 00:11:27.320
although it's less obvious.
00:11:27.320 --> 00:11:30.230
It will require a proof.
00:11:30.230 --> 00:11:36.310
So if I have this
three-variable function
00:11:36.310 --> 00:11:43.790
defined by the
following formula.
00:11:52.090 --> 00:11:54.810
So, in other words, it's
a diagonal function where
00:11:54.810 --> 00:11:59.280
the entries on the
diagonals are the Ca's.
00:11:59.280 --> 00:12:01.130
So what is the rank
of this function?
00:12:01.130 --> 00:12:09.260
So the slice rank of
F. In the matrix case,
00:12:09.260 --> 00:12:11.305
it will be the number
of nonzero entries,
00:12:11.305 --> 00:12:12.760
and it's exactly the same here.
00:12:17.710 --> 00:12:21.060
So number of nonzero
diagonal entries.
00:12:21.060 --> 00:12:22.560
That turns out to
be the slice rank.
00:12:25.670 --> 00:12:26.610
Let's see a proof.
00:12:26.610 --> 00:12:30.500
So we go back to the
definition of slice rank.
00:12:30.500 --> 00:12:36.380
And we see that one of
the directions is easy.
00:12:36.380 --> 00:12:39.280
So this less than or equal to,
greater than or equal to-- so
00:12:39.280 --> 00:12:40.030
which one is easy?
00:12:45.860 --> 00:12:50.080
So, you see, the right-hand
side is a sum of r--
00:12:50.080 --> 00:12:53.690
of a-- well, this
many rank 1 functions.
00:12:53.690 --> 00:12:58.150
So this direction is--
00:12:58.150 --> 00:13:02.140
so this direction is clear,
just looking at the definition.
00:13:02.140 --> 00:13:06.610
I can write F explicitly
as that many rank 1,
00:13:06.610 --> 00:13:10.130
slice rank 1 functions.
00:13:10.130 --> 00:13:16.110
So the tricky part is
greater than or equal to.
00:13:16.110 --> 00:13:19.920
And for the greater
than or equal to,
00:13:19.920 --> 00:13:26.700
let's assume that all the
diagonal entries are nonzero.
00:13:30.750 --> 00:13:33.270
So why can we do this?
00:13:33.270 --> 00:13:35.730
If it's not nonzero,
I claim that we
00:13:35.730 --> 00:13:47.440
can remove this element
from A. If the Ca is 0,
00:13:47.440 --> 00:13:50.350
then I remove a from the set.
00:13:50.350 --> 00:13:55.690
And doing so cannot
increase the rank.
00:14:03.040 --> 00:14:12.310
A priori, the rank might go
down if you get rid of an entry.
00:14:12.310 --> 00:14:14.800
Because if you add an entry,
even though the function
00:14:14.800 --> 00:14:18.850
doesn't change on the original
set, if you increase your set,
00:14:18.850 --> 00:14:21.850
maybe you have more space,
maybe you have more flexibility
00:14:21.850 --> 00:14:22.480
to work with.
00:14:27.070 --> 00:14:34.530
But, certainly, if you remove an
element, the rank cannot go up.
00:14:38.310 --> 00:14:45.620
Now, so suppose
the slice rank of F
00:14:45.620 --> 00:14:47.980
is strictly less
than the size of A.
00:14:47.980 --> 00:14:52.190
So all these Ca's are nonzero.
00:14:52.190 --> 00:14:55.610
So suppose, for
contradiction, that there
00:14:55.610 --> 00:15:01.590
is some different way
to write function F that
00:15:01.590 --> 00:15:02.730
uses fewer terms.
00:15:05.240 --> 00:15:08.000
So what would such
a sum look like?
00:15:11.250 --> 00:15:16.410
So I would be able to write this
function F in a different way.
00:15:28.190 --> 00:15:28.810
Like that.
00:15:28.810 --> 00:15:31.556
And then, now, I look at these--
00:15:31.556 --> 00:15:35.700
the other types of functions
using different combination
00:15:35.700 --> 00:15:36.948
of the variables.
00:16:01.358 --> 00:16:02.900
So suppose there
were a different way
00:16:02.900 --> 00:16:08.210
to write this function
F that uses fewer terms.
00:16:08.210 --> 00:16:11.063
So I assume it uses exactly
the size of A minus 1 terms,
00:16:11.063 --> 00:16:12.980
padding with zero
functions if you like.
00:16:15.720 --> 00:16:23.720
So now I claim that
there exists a function
00:16:23.720 --> 00:16:30.910
h on the set A whose support--
00:16:30.910 --> 00:16:34.520
so the support is
the number of entries
00:16:34.520 --> 00:16:35.930
that give nonzero values.
00:16:35.930 --> 00:16:41.740
The support of h is
bigger than m, such
00:16:41.740 --> 00:16:45.420
that the following sum is 0.
00:17:05.520 --> 00:17:12.660
So I claim that we can
find a function F--
00:17:12.660 --> 00:17:19.829
h such that I think of it as
in the kernel of some of these
00:17:19.829 --> 00:17:20.329
f's.
00:17:27.869 --> 00:17:29.730
So this is a linear
algebraic statement.
00:17:29.730 --> 00:17:30.624
Yes.
00:17:30.624 --> 00:17:32.763
AUDIENCE: What is
h sub [INAUDIBLE]?
00:17:32.763 --> 00:17:33.680
YUFEI ZHAO: Ah, sorry.
00:17:33.680 --> 00:17:35.020
It's just h.
00:17:35.020 --> 00:17:36.181
Thank you.
00:17:36.181 --> 00:17:39.480
It's a single function h
such that this equation
00:17:39.480 --> 00:17:42.486
is true for all x.
00:17:46.887 --> 00:17:49.332
AUDIENCE: [INAUDIBLE]
h of x minus the sum
00:17:49.332 --> 00:17:52.760
of all [INAUDIBLE].
00:17:52.760 --> 00:17:55.770
YUFEI ZHAO: You are right.
00:17:55.770 --> 00:17:57.020
So what do I want to say here?
00:18:12.355 --> 00:18:25.140
So we want to find a function
h such that the support of h
00:18:25.140 --> 00:18:27.220
is bigger than m.
00:18:45.527 --> 00:18:46.610
So what do we want to say?
00:18:51.050 --> 00:18:53.070
I want to say that--
00:19:00.915 --> 00:19:02.270
yes, so you're right.
00:19:02.270 --> 00:19:03.980
This is not what I want to say.
00:19:03.980 --> 00:19:09.090
And, instead, it's something--
00:19:09.090 --> 00:19:09.590
mm-hmm.
00:19:22.982 --> 00:19:26.670
Yes, good.
00:19:26.670 --> 00:19:28.570
So, let's see.
00:19:31.350 --> 00:19:34.345
So here we have some
number of functions.
00:19:34.345 --> 00:19:35.970
Here, we have some
number of functions.
00:19:35.970 --> 00:19:42.730
And for each a, I have--
00:19:42.730 --> 00:19:47.670
or for each-- let's see.
00:20:00.488 --> 00:20:01.967
Umm, hmm.
00:20:06.897 --> 00:20:08.445
AUDIENCE: [INAUDIBLE].
00:20:08.445 --> 00:20:09.362
YUFEI ZHAO: I'm sorry?
00:20:09.362 --> 00:20:10.380
AUDIENCE: [INAUDIBLE].
00:20:10.380 --> 00:20:10.570
YUFEI ZHAO: No.
00:20:10.570 --> 00:20:12.622
So I do want to show--
no, there's no induction,
00:20:12.622 --> 00:20:14.830
because I'm in three variables,
and I want to get rid
00:20:14.830 --> 00:20:16.390
of-- so the point is--
00:20:16.390 --> 00:20:19.450
so let's see where
we're going eventually,
00:20:19.450 --> 00:20:23.140
and then we'll figure out
what happened up there.
00:20:23.140 --> 00:20:26.050
So we want to consider--
00:20:33.180 --> 00:20:36.650
so I would like to eventually
consider the following sum.
00:20:52.200 --> 00:20:57.010
So I want to consider this
sum, which comes from--
00:20:57.010 --> 00:20:58.720
so you look at--
00:21:01.810 --> 00:21:02.550
wait, no.
00:21:02.550 --> 00:21:06.050
That's not the sum
I want to consider.
00:21:06.050 --> 00:21:17.352
So let's look at this F of
x, y, z, so F being that sum.
00:21:17.352 --> 00:21:17.852
No.
00:21:31.010 --> 00:21:33.080
So take that F up there.
00:21:33.080 --> 00:21:36.980
And let me consider,
basically, taking
00:21:36.980 --> 00:21:42.000
the inner product of
this function viewed
00:21:42.000 --> 00:21:44.900
as a function in z.
00:21:44.900 --> 00:21:48.690
So consider this inner product.
00:21:48.690 --> 00:21:54.558
And if I-- ah.
00:21:54.558 --> 00:22:00.340
I think-- so what I
want to say is not this.
00:22:03.730 --> 00:22:17.530
So what I want to say is, if I
look at an inner product of h
00:22:17.530 --> 00:22:19.880
with the--
00:22:24.580 --> 00:22:26.920
so take one of these f's--
00:22:26.920 --> 00:22:29.230
take one of these f's
and look at the bilinear
00:22:29.230 --> 00:22:31.190
form relating h and f.
00:22:31.190 --> 00:22:34.450
So I want to show
that this sum vanishes
00:22:34.450 --> 00:22:41.565
for all i between m plus 1
and the size of A minus 1.
00:22:41.565 --> 00:22:46.120
So this row, I want it
to vanish when being
00:22:46.120 --> 00:22:50.180
taken bilinear form with h.
00:22:50.180 --> 00:22:52.246
So that makes sense now.
00:22:52.246 --> 00:22:53.190
OK, good.
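NOTE The decomposition and the claim are worked out on the board and corrected live, so the transcript alone is hard to follow here. A reconstruction consistent with the index ranges mentioned (m plus 1 up to the size of A minus 1) is sketched below; the first cut point l and the names f_i, g_i are assumptions about the board notation.

```latex
% Hypothetical decomposition with fewer than |A| slice-rank-1 terms:
\[
  F(x, y, z)
  = \sum_{i=1}^{l} f_i(x)\, g_i(y, z)
  + \sum_{i=l+1}^{m} f_i(y)\, g_i(x, z)
  + \sum_{i=m+1}^{|A|-1} f_i(z)\, g_i(x, y).
\]
% Claim: there is a function h on A with
\[
  |\operatorname{supp} h| > m
  \quad\text{and}\quad
  \sum_{z \in A} f_i(z)\, h(z) = 0
  \quad\text{for all } m < i \le |A| - 1 .
\]
```

The conditions on h are |A| - 1 - m linear constraints on a vector with |A| coordinates, so the solution space has dimension at least m + 1.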
00:22:57.910 --> 00:23:02.020
So the fact that such a
nonzero h exists simply
00:23:02.020 --> 00:23:04.390
is a matter of
counting parameters.
00:23:04.390 --> 00:23:06.490
It's a linear
algebraic statement.
00:23:06.490 --> 00:23:08.420
You have some
number of freedoms.
00:23:08.420 --> 00:23:12.400
You have some number
of constraints.
00:23:12.400 --> 00:23:20.110
So the set of such h satisfy
all of these constraints.
00:23:20.110 --> 00:23:23.200
So there are this
many constraints.
00:23:23.200 --> 00:23:25.150
Well, each one of
them could cut you down
00:23:25.150 --> 00:23:27.970
to one dimension less,
but the set of such h
00:23:27.970 --> 00:23:39.710
is a linear subspace of
dimension bigger than m,
00:23:39.710 --> 00:23:48.990
because I have size of A dimensions, and
I have these many constraints.
00:23:48.990 --> 00:23:52.640
So the set of such h is-- there
are a lot of possibilities.
00:23:52.640 --> 00:23:59.100
And, furthermore, it
is also true that--
00:23:59.100 --> 00:24:01.480
and this is a linear
algebraic statement--
00:24:01.480 --> 00:24:09.580
that every subspace of
dimension m plus one
00:24:09.580 --> 00:24:19.510
has a vector whose support
has size at least m plus 1.
00:24:26.170 --> 00:24:28.640
I'll leave this as a
linear algebraic exercise.
00:24:28.640 --> 00:24:34.074
It's not entirely
obvious, but it is true.
00:24:34.074 --> 00:24:35.830
When you put these
two things together,
00:24:35.830 --> 00:24:37.970
you find that there
is some vector--
00:24:37.970 --> 00:24:40.150
so I think of the
coordinates of the vectors
00:24:40.150 --> 00:24:42.040
as indexed by the set A--
00:24:42.040 --> 00:24:45.270
there is some vector whose
support is large enough.
00:24:53.510 --> 00:24:55.350
So we prove the claim.
00:24:55.350 --> 00:24:59.420
Let's go back to this lemma
about this diagonal function
00:24:59.420 --> 00:25:01.970
having high rank.
00:25:01.970 --> 00:25:03.380
Take h from the claim.
00:25:08.460 --> 00:25:10.570
So let's take h from the claim.
00:25:10.570 --> 00:25:15.260
Then let's consider
this sum over here.
00:25:15.260 --> 00:25:19.490
On one hand, what this sum is--
00:25:19.490 --> 00:25:24.350
you can do the sum on
the right-hand side.
00:25:24.350 --> 00:25:29.910
We see that it's like
multiplying a diagonal matrix
00:25:29.910 --> 00:25:31.350
by a vector.
00:25:31.350 --> 00:25:36.320
So what you get, following the
formula on the right-hand side,
00:25:36.320 --> 00:25:39.876
is the following.
00:25:39.876 --> 00:25:41.550
Let me rewrite this part.
00:25:47.250 --> 00:25:54.842
Sum over a of C sub a h of
a delta sub a of x delta sub
00:25:54.842 --> 00:25:55.650
a of y.
00:25:58.460 --> 00:26:02.580
Just looking at the formula
from the right-hand side.
00:26:02.580 --> 00:26:08.110
On the other hand, if you
had a decomposition up there,
00:26:08.110 --> 00:26:14.730
doing this sum and
noting the claim,
00:26:14.730 --> 00:26:19.080
we see that the
third row is gone.
00:26:19.080 --> 00:26:38.152
So what you would have is
a sum over these z's of--
00:26:38.152 --> 00:26:42.950
so let me write that like this.
00:26:42.950 --> 00:26:51.200
So you would have a sum that is
of the form f1 of x and g tilde
00:26:51.200 --> 00:26:54.830
1 of y, where g
tilde is basically
00:26:54.830 --> 00:26:58.460
the inner product of g1
as a function of z with h.
00:27:05.180 --> 00:27:09.470
So f sub l of x g sub l of y.
00:27:09.470 --> 00:27:20.070
And then, also,
functions like that.
00:27:30.860 --> 00:27:33.120
So there exists some
functions g, which
00:27:33.120 --> 00:27:35.580
come from g tilde,
which come from the g's
00:27:35.580 --> 00:27:37.070
up there, such
that this is true.
00:27:41.830 --> 00:27:45.840
But now we're in the world
of two-variable functions.
00:27:45.840 --> 00:27:49.710
So left and right-hand side
are two-variable functions.
00:27:49.710 --> 00:27:51.900
And for two-variable
functions, you
00:27:51.900 --> 00:27:56.440
understand what is the rank
of a diagonal function.
00:27:56.440 --> 00:28:12.950
So the left-hand side has
more than m diagonal entries,
00:28:12.950 --> 00:28:15.260
because h has support bigger than m.
00:28:15.260 --> 00:28:17.450
So the number of
diagonal entries
00:28:17.450 --> 00:28:19.310
is just the support of h.
00:28:23.320 --> 00:28:27.760
Whereas the right-hand
side has rank--
00:28:27.760 --> 00:28:30.850
so now a linear
algebraic matrix rank--
00:28:30.850 --> 00:28:32.230
at most, m.
00:28:35.080 --> 00:28:36.540
And that's a contradiction.
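NOTE The two computations being compared are on the board rather than in the transcript. Assuming a decomposition of F whose first l terms split off x, whose next m - l terms split off y, and whose remaining terms split off z (index names are assumptions), summing against h over z gives something like the following.

```latex
% From the diagonal formula for F:
\[
  \sum_{z \in A} F(x, y, z)\, h(z)
  = \sum_{a \in A} c_a\, h(a)\, \delta_a(x)\, \delta_a(y).
\]
% From the hypothetical decomposition (the z-type terms vanish by the claim):
\[
  \sum_{z \in A} F(x, y, z)\, h(z)
  = \sum_{i=1}^{l} f_i(x)\, \tilde g_i(y)
  + \sum_{i=l+1}^{m} f_i(y)\, \tilde g_i(x),
  \qquad
  \tilde g_i(\cdot) = \sum_{z \in A} g_i(\cdot, z)\, h(z).
\]
```

The first expression is a diagonal matrix with more than m nonzero diagonal entries (all c_a are nonzero and h has support bigger than m), while the second is a sum of m rank 1 matrices, which is the contradiction.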
00:28:36.540 --> 00:28:37.092
Yes.
00:28:37.092 --> 00:28:39.060
AUDIENCE: So you can
show a similar statement
00:28:39.060 --> 00:28:41.560
where [INAUDIBLE].
00:28:41.560 --> 00:28:42.310
YUFEI ZHAO: Great.
00:28:42.310 --> 00:28:44.170
So we can show a
similar statement
00:28:44.170 --> 00:28:51.510
for arbitrary
number of variables
00:28:51.510 --> 00:28:53.940
by generalizing this
proof and using induction
00:28:53.940 --> 00:28:56.790
on the number of variables.
00:28:56.790 --> 00:28:58.540
But we only need three
variables for now.
00:29:01.390 --> 00:29:04.130
Any questions?
00:29:04.130 --> 00:29:07.550
Just to recap, what we
proved is the generalization
00:29:07.550 --> 00:29:10.760
of the statement that
a diagonal matrix has
00:29:10.760 --> 00:29:14.240
rank equal to the number of
nonzero diagonal entries.
00:29:14.240 --> 00:29:18.380
But the same fact is true for
these three-variable functions
00:29:18.380 --> 00:29:19.700
with respect to slice rank.
00:29:25.240 --> 00:29:27.618
So this is intuitively
obvious, but the execution
00:29:27.618 --> 00:29:28.410
is slightly tricky.
00:29:31.160 --> 00:29:31.890
All right.
00:29:31.890 --> 00:29:33.870
So now we have the
statement here.
00:29:33.870 --> 00:29:41.480
Let's proceed to analyze this
function which comes from--
00:29:41.480 --> 00:29:46.547
so this relationship here coming
from set A that is 3-AP-free.
00:29:53.020 --> 00:29:56.770
So suppose now I'm in--
00:29:56.770 --> 00:30:00.560
so let me-- so everything so
far was generally with any A.
00:30:00.560 --> 00:30:05.230
But now let me think
about, specifically,
00:30:05.230 --> 00:30:13.166
functions on the finite field
vector space, F3 to the n.
00:30:13.166 --> 00:30:16.600
So it's a function
taking values in F3.
00:30:16.600 --> 00:30:22.180
And this function is defined
to be the left-hand side
00:30:22.180 --> 00:30:23.940
of that equation over there.
00:30:29.240 --> 00:30:31.850
So the claim is that--
00:30:31.850 --> 00:30:35.630
so for the left-hand side, we claim
that this function has low rank.
00:30:35.630 --> 00:30:43.160
So we claim that the slice rank of
this function is, at most, 3M,
00:30:43.160 --> 00:31:01.120
where M is the sum
of, essentially,
00:31:01.120 --> 00:31:03.236
this multinomial coefficient.
00:31:09.160 --> 00:31:11.500
So we'll analyze this
number in a second,
00:31:11.500 --> 00:31:13.590
but this number is
supposed to be small.
00:31:19.540 --> 00:31:23.830
So we want to show that this
function here has small rank.
00:31:23.830 --> 00:31:29.170
So let's rewrite this
function in a form
00:31:29.170 --> 00:31:37.040
explicitly as a sum of
products by expanding
00:31:37.040 --> 00:31:40.410
this function after writing it
in a slightly different form.
00:31:40.410 --> 00:31:46.600
So in F3, in a three-variable--
00:31:46.600 --> 00:31:53.630
in characteristic-- so in
F3, you have this equation.
00:31:53.630 --> 00:31:56.560
You can check that it's true
for x equal to 0, 1, or 2.
00:31:59.920 --> 00:32:04.930
So take that, and
plug it in over here.
00:32:04.930 --> 00:32:17.840
So we find-- so now x,
y, z are in F3 to the n.
00:32:17.840 --> 00:32:31.350
So we find that, applying
this guy here coordinate-wise,
00:32:31.350 --> 00:32:32.910
you have this product.
00:32:36.850 --> 00:32:37.390
Great.
00:32:37.390 --> 00:32:41.890
Now let's pretend we're
expanding everything.
00:32:41.890 --> 00:32:51.370
This is a polynomial in 3n
variables, 3n variables.
00:32:51.370 --> 00:32:52.810
Its degree is 2n.
00:32:55.880 --> 00:33:04.440
So if we expand, we get
a bunch of monomials.
00:33:04.440 --> 00:33:06.890
And the monomials will
have the following form.
00:33:13.710 --> 00:33:18.760
So the x's, which--
whose exponents I call i,
00:33:18.760 --> 00:33:25.680
the y's, whose
exponents I call j,
00:33:25.680 --> 00:33:34.900
and the z's, whose
exponents I call k, where--
00:33:34.900 --> 00:33:41.140
so I get a sum of
monomials like that,
00:33:41.140 --> 00:33:51.630
where all of these i, j's,
and k's are either 0, 1, or 2.
00:33:55.950 --> 00:33:59.283
So I get this big
sum of monomials,
00:33:59.283 --> 00:34:01.200
and I want to show that
it's possible to write
00:34:01.200 --> 00:34:07.940
this sum as a small number of
functions that can be written
00:34:07.940 --> 00:34:12.320
as a product, where
one of the factors only
00:34:12.320 --> 00:34:16.949
involves one of x, y, z.
00:34:16.949 --> 00:34:19.409
So what we can do
is to group them.
00:34:22.277 --> 00:34:33.120
So group these
monomials by the--
00:34:36.860 --> 00:34:39.770
so, for example, I'm going
to group these monomials
00:34:39.770 --> 00:34:41.780
by using the
following observation.
00:34:41.780 --> 00:34:58.030
So by pigeonhole, at least
one of the exponents of x,
00:34:58.030 --> 00:35:05.080
or the exponents of y, or the
exponents of z, at least one
00:35:05.080 --> 00:35:10.930
of these guys is,
at most, 2n over 3.
00:35:13.920 --> 00:35:20.328
So I group these
monomials by the--
00:35:20.328 --> 00:35:23.010
one of x, y, z that has
the smallest exponent.
00:35:26.850 --> 00:35:36.240
So the contributions to
the rank or the slice
00:35:36.240 --> 00:35:51.340
rank from monomials with the
degree of x being, at most,
00:35:51.340 --> 00:35:57.990
2n over 3, well, I can
write such contributions
00:35:57.990 --> 00:36:09.770
in the form like that, where
this f of x is a monomial,
00:36:09.770 --> 00:36:14.030
and the g is a sum of
whatever that could come up.
00:36:14.030 --> 00:36:17.680
This is a sum, but
this is a monomial.
00:36:17.680 --> 00:36:21.950
So the number of such terms--
00:36:21.950 --> 00:36:28.870
so the number of such
terms is the number
00:36:28.870 --> 00:36:35.030
of monomials corresponding
to choices of i's that sum
00:36:35.030 --> 00:36:41.760
to at most 2n over 3, and individual
i's coming from 0, 1, or 2.
00:36:41.760 --> 00:36:46.270
And that number is precisely M.
00:36:46.270 --> 00:36:51.420
So M counts the number
of choices of 0, 1, 2's.
00:36:51.420 --> 00:36:52.800
There are n of them.
00:36:52.800 --> 00:36:55.500
And the sums of the i's
is, at most, 2n over 3.
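NOTE The formula for M is on the board, not in the transcript. Going by the spoken description, M counts exponent vectors with n entries from 0, 1, 2 whose sum is at most 2n over 3, which can also be written as a sum of multinomial coefficients. A small Python sketch, with hypothetical function names, that computes M both ways:

```python
from math import factorial


def count_low_degree_monomials(n):
    """M(n): number of vectors in {0, 1, 2}^n with coordinate sum at most 2n/3."""
    # counts[s] = number of exponent vectors built so far with total degree s
    counts = [1]
    for _ in range(n):
        new_counts = [0] * (len(counts) + 2)
        for s, c in enumerate(counts):
            for e in (0, 1, 2):
                new_counts[s + e] += c
        counts = new_counts
    # 3 * s <= 2 * n is the condition s <= 2n/3 without floating point
    return sum(c for s, c in enumerate(counts) if 3 * s <= 2 * n)


def count_via_multinomials(n):
    """The same count as a sum of multinomial coefficients n! / (a! b! c!)."""
    total = 0
    for b in range(n + 1):
        for c in range(n + 1 - b):
            a = n - b - c  # a zeros, b ones, c twos among the n exponents
            if 3 * (b + 2 * c) <= 2 * n:
                total += factorial(n) // (factorial(a) * factorial(b) * factorial(c))
    return total
```

For small n the quantity 3M exceeds 3 to the n, so the bound only says something once n is moderately large; comparing 3 * count_low_degree_monomials(n) with 3 ** n for a range of n shows where it starts to win.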
00:37:02.125 --> 00:37:03.500
So these are
contributions coming
00:37:03.500 --> 00:37:07.310
from monomials where the degree
of x is, at most, 2n over 3.
00:37:07.310 --> 00:37:13.220
And, similarly, with
degree of y being
00:37:13.220 --> 00:37:21.910
at most, 2n over 3, and also degree of
z being, at most, 2n over 3.
00:37:21.910 --> 00:37:24.020
So all the monomials
can be grouped
00:37:24.020 --> 00:37:27.665
in one of these three groups,
and I count the contribution
00:37:27.665 --> 00:37:29.774
to the slice rank.
00:37:29.774 --> 00:37:32.685
AUDIENCE: Do we have a good idea
as to how sharp this bound is?
00:37:32.685 --> 00:37:34.227
YUFEI ZHAO: So the
question is, do we
00:37:34.227 --> 00:37:37.730
have a good idea as to
how sharp this bound is?
00:37:37.730 --> 00:37:38.980
That's a really good question.
00:37:38.980 --> 00:37:40.326
I don't know.
00:37:40.326 --> 00:37:40.826
Yes.
00:37:46.600 --> 00:37:47.100
Great.
00:37:47.100 --> 00:37:49.490
So that finishes the
proof of this lemma.
00:37:58.580 --> 00:38:00.260
So now we have this lemma.
00:38:00.260 --> 00:38:04.190
I can compare-- so we
have these two lemmas.
00:38:04.190 --> 00:38:08.180
One of them tells me the rank
of the right-hand side, which
00:38:08.180 --> 00:38:16.860
is the size of A. Let's compare
ranks, the slice rank.
00:38:21.160 --> 00:38:25.200
So the left-hand side, we know
it is, at most, this quantity.
00:38:25.200 --> 00:38:28.600
And the right-hand
side is equal to the size of A.
00:38:28.600 --> 00:38:34.620
So we automatically
find this bound.
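NOTE The comparison just described can be written out as follows. This is a sketch: "LHS" and "RHS" stand for the two sides of the tensor identity set up earlier in the lecture (not shown in this part of the transcript), and the notation sr(.) for slice rank is an assumption.

```latex
% Slice rank comparison; M counts exponent tuples in {0,1,2}^n with sum at most 2n/3.
\[
  |A| \;=\; \operatorname{sr}(\mathrm{RHS}) \;=\; \operatorname{sr}(\mathrm{LHS}) \;\le\; 3M,
  \qquad
  M \;=\; \#\Bigl\{ (i_1,\dots,i_n) \in \{0,1,2\}^n : i_1 + \dots + i_n \le \tfrac{2n}{3} \Bigr\}.
\]
```

The factor 3 comes from the three groups of monomials, one for each of x, y, z having degree at most 2n/3.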
00:38:34.620 --> 00:38:38.590
So now we want to know
how big this number M is.
00:38:38.590 --> 00:38:42.250
So there's actually-- this
is a fairly standard problem
00:38:42.250 --> 00:38:45.800
to solve to estimate the
growth of this function M. So
00:38:45.800 --> 00:38:48.500
let me show you how to do
it, and this is basically
00:38:48.500 --> 00:38:51.590
the universal method.
00:38:51.590 --> 00:38:55.940
Notice that I can--
00:38:55.940 --> 00:38:59.280
if I look at this
number here, where if--
00:38:59.280 --> 00:39:04.930
so now x is some real
number between 0 and 1.
00:39:04.930 --> 00:39:09.700
Then I claim the
following is true.
00:39:13.800 --> 00:39:16.940
And this is because if you
expand the right-hand side
00:39:16.940 --> 00:39:20.290
and count your monomials--
00:39:20.290 --> 00:39:24.330
so you can just keep track
of which monomials occur,
00:39:24.330 --> 00:39:27.540
and there are M of them,
where you can lower
00:39:27.540 --> 00:39:28.988
bound by this quantity here.
00:39:32.580 --> 00:39:37.190
So this is kind of related to
things in probability theory
00:39:37.190 --> 00:39:40.290
on large deviations, to
the CramÃ©r's theorem.
00:39:40.290 --> 00:39:42.860
But that's what you can do.
00:39:42.860 --> 00:39:47.720
So this is true for every
value of x, so you pick one
00:39:47.720 --> 00:39:51.070
that gives you the best bound.
00:39:51.070 --> 00:40:04.980
So M is, at most, the inf
of this quantity here.
00:40:04.980 --> 00:40:07.760
And to show you
any bound, I just
00:40:07.760 --> 00:40:10.410
have to plug in some value.
00:40:10.410 --> 00:40:15.560
So if I plug in, for
example, x being 0.6,
00:40:15.560 --> 00:40:18.200
I already get a bound which
is the one that I claimed.
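NOTE A small numerical sketch of this step, assuming the inequality in question is M <= (1 + x + x^2)^n / x^(2n/3) for 0 < x <= 1. The names `ratio`, `count_M` and `X_STAR` are illustrative. Plugging in x = 0.6 gives a base of about 2.755, and the best choice of x improves on it only slightly.

```python
import math


def ratio(x: float) -> float:
    """(1 + x + x^2) / x^(2/3); the inequality gives M <= ratio(x)**n for 0 < x <= 1."""
    return (1 + x + x * x) / x ** (2 / 3)


def count_M(n: int) -> int:
    """Number of tuples in {0,1,2}^n with entry sum at most 2n/3."""
    ways = [1]
    for _ in range(n):
        nxt = [0] * (len(ways) + 2)
        for s, w in enumerate(ways):
            for step in (0, 1, 2):
                nxt[s + step] += w
        ways = nxt
    return sum(w for s, w in enumerate(ways) if 3 * s <= 2 * n)


# Setting the derivative of log(ratio(x)) to zero gives 4x^2 + x - 2 = 0.
X_STAR = (math.sqrt(33) - 1) / 8

if __name__ == "__main__":
    print("ratio at 0.6:", ratio(0.6))
    print("ratio at minimizer:", ratio(X_STAR))
    for n in (3, 30, 150):
        # n-th root of M, to compare against the base of the exponential bound
        print(n, count_M(n) ** (1 / n))
```

The n-th roots of M printed at the end stay below the base from the bound, in line with the remark that the step is not lossy in the exponent.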
00:40:24.900 --> 00:40:28.970
And it turns out this
step here is not lossy.
00:40:28.970 --> 00:40:33.380
As in, basically, up to 1 plus
little o of 1 in the exponent,
00:40:33.380 --> 00:40:36.480
this is the correct bound.
00:40:36.480 --> 00:40:39.730
And that follows from general
results in large deviation
00:40:39.730 --> 00:40:41.980
theory.
00:40:41.980 --> 00:40:45.850
And that finishes the proof.
00:40:45.850 --> 00:40:47.380
Alternatively, you
can also estimate
00:40:47.380 --> 00:40:50.680
M using Stirling's formula.
00:40:50.680 --> 00:40:51.930
But this, I think, is cleaner.
00:40:55.060 --> 00:40:55.910
Great.
00:40:55.910 --> 00:40:58.890
Any questions?
00:40:58.890 --> 00:40:59.390
Yes.
00:40:59.390 --> 00:41:00.800
AUDIENCE: [INAUDIBLE].
00:41:06.748 --> 00:41:07.540
YUFEI ZHAO: Ah, OK.
00:41:07.540 --> 00:41:10.010
So why is this step true?
00:41:10.010 --> 00:41:14.210
So if you expand
the right-hand side,
00:41:14.210 --> 00:41:21.040
you see that the right-hand side
is lower bounded by the sum over all these
00:41:21.040 --> 00:41:23.820
a, b, c, as in--
00:41:23.820 --> 00:41:33.300
same as over here,
x to the b plus 2c.
00:41:37.300 --> 00:41:41.140
And because how many terms--
00:41:41.140 --> 00:41:45.635
and, also, there's a
binomial coefficient term.
00:41:45.635 --> 00:41:47.760
So, basically, I'm doing
the multinomial expansion,
00:41:47.760 --> 00:41:50.265
except I toss out everything
which is not part of the index.
00:41:52.795 --> 00:42:00.045
And because b plus 2c
is, at most, 2n over 3,
00:42:00.045 --> 00:42:04.030
I get M times x
to the 2n over 3.
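NOTE The answer to the audience question, written out as a sketch. Here a, b, c are assumed to denote the number of coordinates equal to 0, 1, 2 respectively, and x is a real number with 0 < x <= 1.

```latex
\[
  (1 + x + x^2)^n
  \;=\; \sum_{a+b+c=n} \frac{n!}{a!\,b!\,c!}\, x^{b+2c}
  \;\ge\; \sum_{\substack{a+b+c=n \\ b+2c \le 2n/3}} \frac{n!}{a!\,b!\,c!}\, x^{b+2c}
  \;\ge\; x^{2n/3} \sum_{\substack{a+b+c=n \\ b+2c \le 2n/3}} \frac{n!}{a!\,b!\,c!}
  \;=\; M\, x^{2n/3}.
\]
% First inequality: drop the terms with b + 2c > 2n/3 (all terms are nonnegative).
% Second inequality: x^{b+2c} >= x^{2n/3} because 0 < x <= 1 and b + 2c <= 2n/3.
```

Rearranging gives M <= (1 + x + x^2)^n / x^(2n/3) for every such x.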
00:42:07.495 --> 00:42:08.397
OK?
00:42:08.397 --> 00:42:08.980
AUDIENCE: Yes.
00:42:13.435 --> 00:42:17.970
YUFEI ZHAO: Now I want to
convey a sense of mystique
00:42:17.970 --> 00:42:19.280
about this proof.
00:42:19.280 --> 00:42:21.960
This is a really cool proof.
00:42:21.960 --> 00:42:24.070
So because you're
seeing a lecture,
00:42:24.070 --> 00:42:25.800
maybe it went by very quickly.
00:42:25.800 --> 00:42:29.130
But when this proof came out,
people were very shocked.
00:42:29.130 --> 00:42:31.560
They didn't expect that this
problem would be tackled,
00:42:31.560 --> 00:42:37.950
would be solved using a
method that is so unexpected.
00:42:37.950 --> 00:42:41.130
And this is part of this
power of the algebraic method
00:42:41.130 --> 00:42:45.120
in combinatorics,
where we often end up
00:42:45.120 --> 00:42:47.100
with these short,
surprising proofs that
00:42:47.100 --> 00:42:48.842
take a very long time to find.
00:42:48.842 --> 00:42:50.300
But they turn out
to be very short.
00:42:50.300 --> 00:42:51.450
So this is very short.
00:42:51.450 --> 00:42:55.080
This was basically
a four-page paper.
00:42:55.080 --> 00:42:56.850
But when they work,
they work beautifully.
00:42:56.850 --> 00:42:58.100
They work like magic.
00:42:58.100 --> 00:43:00.480
But it's hard to
predict when they work.
00:43:00.480 --> 00:43:03.800
And, also, these methods
are somewhat fragile.
00:43:03.800 --> 00:43:05.960
So, unlike the Fourier
analytic methods
00:43:05.960 --> 00:43:10.120
that we saw last time, with
that method, it's very analytic.
00:43:10.120 --> 00:43:13.160
It works in one situation,
you can play with it,
00:43:13.160 --> 00:43:15.780
massage it, make it work
in a different situation.
00:43:15.780 --> 00:43:20.390
Here, we're using something
very implicit, very special
00:43:20.390 --> 00:43:24.390
about these many variables.
00:43:24.390 --> 00:43:28.080
And if you try to tweak the
problem just a little bit,
00:43:28.080 --> 00:43:29.940
the method seems to break down.
00:43:29.940 --> 00:43:31.980
So, in particular,
it is open how
00:43:31.980 --> 00:43:38.570
to extend this method
to other settings.
00:43:38.570 --> 00:43:40.590
It's not even clear what
the results should be.
00:43:40.590 --> 00:43:43.560
So it's open to extend
it to, for example, 4-AP.
00:43:48.910 --> 00:43:58.130
So we do not know if the maximum
size of 4-AP-free subset of F5
00:43:58.130 --> 00:44:06.080
to the n is less than some
constant, 4.99 to the n.
00:44:06.080 --> 00:44:10.511
So that's very much open.
00:44:10.511 --> 00:44:12.420
By the way, all of
this 3-AP stuff,
00:44:12.420 --> 00:44:14.390
right now I've
only done it in F3,
00:44:14.390 --> 00:44:17.690
but it works for 3-AP
in any finite field.
00:44:20.450 --> 00:44:23.855
It also is open to
extend it to corners.
00:44:27.110 --> 00:44:30.950
So you can define a
notion of corners.
00:44:30.950 --> 00:44:34.010
So, previously, we saw
corners in integer grid.
00:44:34.010 --> 00:44:35.760
If I replace integer
by some other group,
00:44:35.760 --> 00:44:38.570
you can define a notion
of corners there.
00:44:38.570 --> 00:44:43.400
So not clear how to extend
this method to corners.
00:44:43.400 --> 00:44:46.520
And, also, is there some
way to extend some ideas
00:44:46.520 --> 00:44:49.010
from this method
to the integers?
00:44:49.010 --> 00:44:52.680
It completely fails, so it is
not clear at all
00:44:52.680 --> 00:44:56.190
how you might have
this method work in a setting
00:44:56.190 --> 00:44:58.500
where you don't have
this high dimensionality.
00:44:58.500 --> 00:44:59.610
I mean, the result
will be different,
00:44:59.610 --> 00:45:01.950
because, in the integers, we know
that there's no power saving,
00:45:01.950 --> 00:45:03.617
but maybe you can get
some other bounds.
00:45:06.960 --> 00:45:07.990
Any questions?
00:45:12.690 --> 00:45:13.460
OK.
00:45:13.460 --> 00:45:14.708
Great.
00:45:14.708 --> 00:45:15.500
Let's take a break.
00:45:18.570 --> 00:45:20.270
So in the first part
of today's lecture,
00:45:20.270 --> 00:45:23.468
I showed you a proof
of Roth's theorem
00:45:23.468 --> 00:45:25.510
in F3 to the n that gave
you a much better bound
00:45:25.510 --> 00:45:28.480
than what we did with Fourier.
00:45:28.480 --> 00:45:30.490
Second part, I want to
show you another proof.
00:45:30.490 --> 00:45:33.720
So yet another proof
of Roth in F3 to the n,
00:45:33.720 --> 00:45:36.700
and this time giving
you a much worse bound.
00:45:36.700 --> 00:45:38.850
But, of course, I do
this for a reason.
00:45:38.850 --> 00:45:42.830
So it will give
you a new result.
00:45:42.830 --> 00:45:45.970
So it will give you some more
information about 3-AP's in F3
00:45:45.970 --> 00:45:46.630
to the n.
00:45:46.630 --> 00:45:50.530
But the more important
reason is that in this course
00:45:50.530 --> 00:45:52.840
I try to make some connections
between graph theory
00:45:52.840 --> 00:45:54.430
on one hand and
additive combinatorics
00:45:54.430 --> 00:45:55.510
on the other hand.
00:45:55.510 --> 00:45:57.390
And, so far, we've
seen some analogies.
00:45:57.390 --> 00:45:59.860
Well, in the proof of
Szemeredi's graph regularity
00:45:59.860 --> 00:46:02.680
lemma versus the proof--
the Fourier analytic proof
00:46:02.680 --> 00:46:07.810
of Roth's theorem, there was
this common theme of structure
00:46:07.810 --> 00:46:09.920
versus pseudorandomness.
00:46:09.920 --> 00:46:13.360
But the actual execution of the
proofs are somewhat different.
00:46:13.360 --> 00:46:16.480
Because, on one hand,
in regularity lemma,
00:46:16.480 --> 00:46:18.565
you have energy increment.
00:46:18.565 --> 00:46:21.650
You have partitioning
and energy increment.
00:46:21.650 --> 00:46:23.710
And, on the other
hand, with Roth,
00:46:23.710 --> 00:46:26.065
you have density increment.
00:46:26.065 --> 00:46:27.190
Or you're not partitioning.
00:46:27.190 --> 00:46:28.870
You're zooming in.
00:46:28.870 --> 00:46:31.880
Take a set, find some structure,
zoom in, find some structure,
00:46:31.880 --> 00:46:32.730
zoom in.
00:46:32.730 --> 00:46:35.210
You'll get density increment.
00:46:35.210 --> 00:46:38.690
So it's similar, but
differently executed.
00:46:38.690 --> 00:46:41.030
So, today-- I mean,
this second half,
00:46:41.030 --> 00:46:43.520
I want to show you how to do
a different proof of Roth's
00:46:43.520 --> 00:46:46.640
theorem that is much
more closely related
00:46:46.640 --> 00:46:50.600
to the regularity proof, so
that has this energy increment
00:46:50.600 --> 00:46:53.120
element to it.
00:46:53.120 --> 00:46:56.920
And I show you this proof
because it also gives you
00:46:56.920 --> 00:47:00.890
a stronger consequence.
00:47:00.890 --> 00:47:08.300
And, namely, we'll
get that there is also
00:47:08.300 --> 00:47:14.840
not just 3-AP's but 3-AP's
with popular difference.
00:47:14.840 --> 00:47:17.790
So here's the result
that we'll see today.
00:47:17.790 --> 00:47:20.256
So it's proved by Ben Green.
00:47:20.256 --> 00:47:32.370
That for every epsilon, there
exists some n0 such that, for n at least n0 and every
00:47:32.370 --> 00:47:43.790
subset A of F3 to the
n with density alpha,
00:47:43.790 --> 00:47:58.990
there exists some nonzero y such
that the number of 3-AP's with
00:47:58.990 --> 00:48:00.220
common difference y--
00:48:04.340 --> 00:48:06.200
so let's think about
what's going on here.
00:48:06.200 --> 00:48:09.070
So if I just give you
a set A and ask you
00:48:09.070 --> 00:48:12.640
how many 3-AP's are
there, and compare it
00:48:12.640 --> 00:48:16.420
to what you get from
random, random meaning
00:48:16.420 --> 00:48:20.320
if A were a random set
of the same density.
00:48:20.320 --> 00:48:23.430
So question is, can the
number of 3-AP's be less
00:48:23.430 --> 00:48:27.370
than the random count?
00:48:27.370 --> 00:48:28.570
And the answer is yes.
00:48:28.570 --> 00:48:32.595
So, for example,
you could have--
00:48:32.595 --> 00:48:36.020
in the integers, you can have
a Behrend-type construction that
00:48:36.020 --> 00:48:37.560
has no 3-AP's.
00:48:37.560 --> 00:48:40.080
So, certainly, that's
fewer 3-AP's than random.
00:48:40.080 --> 00:48:43.800
And you can do
similar things here.
00:48:43.800 --> 00:48:47.880
But what Green's theorem
says is that there
00:48:47.880 --> 00:48:51.090
exists some popular
common difference--
00:48:51.090 --> 00:48:55.273
so this is a popular
common difference--
00:48:58.520 --> 00:49:03.170
such that the number
of 3-AP's in A
00:49:03.170 --> 00:49:09.240
with this common difference
is at least as much as
00:49:09.240 --> 00:49:12.770
what you should expect
in a random setting,
00:49:12.770 --> 00:49:13.970
up to a minus epsilon.
00:49:18.830 --> 00:49:20.510
So this is the theorem.
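NOTE The statement just given, written out as a sketch. The normalization of the count (number of starting points x, compared with 3^n) is an assumption consistent with "what you should expect in a random setting".

```latex
% Popular common difference theorem (Green), as stated in the lecture.
\textbf{Theorem.} For every $\varepsilon > 0$ there exists $n_0 = n_0(\varepsilon)$ such that
for all $n \ge n_0$ and every $A \subseteq \mathbb{F}_3^n$ with $|A| = \alpha\, 3^n$,
there exists $y \ne 0$ with
\[
  \#\bigl\{ x \in \mathbb{F}_3^n : x,\; x+y,\; x+2y \in A \bigr\}
  \;\ge\; (\alpha^3 - \varepsilon)\, 3^n .
\]
```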
00:49:20.510 --> 00:49:23.006
So let me say the
intuition again.
00:49:23.006 --> 00:49:26.330
It says that, given
an arbitrary set A,
00:49:26.330 --> 00:49:30.180
provided the space
dimension is large enough,
00:49:30.180 --> 00:49:32.830
there exists some popular
common difference,
00:49:32.830 --> 00:49:35.330
where popular means that
the number of 3-AP's
00:49:35.330 --> 00:49:38.990
with that common difference
is at least roughly as many
00:49:38.990 --> 00:49:39.530
as random.
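NOTE A brute-force sketch of the quantity in the theorem, for small n only. It counts, for each nonzero y, the number of x with x, x+y, x+2y all in A, and lists the differences that are popular in the sense above. Function names are illustrative, and vectors in F3 to the n are assumed to be tuples of 0, 1, 2.

```python
from itertools import product


def add(x, y):
    """Coordinate-wise addition in F_3^n."""
    return tuple((a + b) % 3 for a, b in zip(x, y))


def ap_counts_by_difference(A, n):
    """Map each nonzero y in F_3^n to #{x : x, x+y, x+2y all in A}."""
    A = set(A)
    counts = {}
    for y in product(range(3), repeat=n):
        if any(y):
            counts[y] = sum(
                1 for x in A if add(x, y) in A and add(add(x, y), y) in A
            )
    return counts


def popular_differences(A, n, eps):
    """Nonzero y whose 3-AP count is at least (alpha^3 - eps) * 3^n."""
    alpha = len(set(A)) / 3 ** n
    threshold = (alpha ** 3 - eps) * 3 ** n
    return [y for y, c in ap_counts_by_difference(A, n).items() if c >= threshold]


if __name__ == "__main__":
    n = 2
    # {0,1}^2 contains no 3-AP with nonzero difference in F_3^2.
    cap = list(product(range(2), repeat=n))
    print(ap_counts_by_difference(cap, n))
    print(popular_differences(cap, n, eps=0.01))
```

For tiny n the list of popular differences can be empty, which is consistent with the theorem only applying once the dimension is large enough.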
00:49:42.410 --> 00:49:46.420
In particular, this
proves Roth's theorem,
00:49:46.420 --> 00:49:49.373
because you have at
least some 3-AP's.
00:49:49.373 --> 00:49:50.290
But it tells you more.
00:49:50.290 --> 00:49:52.660
It tells you there's some
common difference that
00:49:52.660 --> 00:49:57.500
has a lot of 3-AP's,
even though, on average,
00:49:57.500 --> 00:49:59.690
if you just take an average,
if you take a random y,
00:49:59.690 --> 00:50:00.300
this is false.
00:50:04.160 --> 00:50:05.897
Any questions about
the statement?
00:50:10.630 --> 00:50:15.200
So Green developed
an arithmetic analog
00:50:15.200 --> 00:50:18.030
of Szemeredi's graph
regularity lemma
00:50:18.030 --> 00:50:19.500
in order to prove this theorem.
00:50:26.120 --> 00:50:33.050
So starting with Szemeredi's
graph regularity lemma,
00:50:33.050 --> 00:50:37.340
he found a way to import that
technique into the arithmetic
00:50:37.340 --> 00:50:40.560
setting, in F3 to the n.
00:50:40.560 --> 00:50:43.850
So I want to show you how,
roughly, how this is done.
00:50:43.850 --> 00:50:47.030
And just like in Szemeredi's
graph regularity lemma,
00:50:47.030 --> 00:50:50.540
there were unavoidable bounds
which are of tower type,
00:50:50.540 --> 00:50:53.240
the same thing is true in
the arithmetic setting.
00:50:53.240 --> 00:50:59.980
So Green's proof shows
that the theorem is true,
00:50:59.980 --> 00:51:07.110
with n0 being something
like tower in--
00:51:07.110 --> 00:51:09.900
a tower of twos.
00:51:09.900 --> 00:51:14.230
The height of the tower is a
polynomial in 1 over epsilon.
00:51:17.974 --> 00:51:21.390
So just like in regularity
lemma for graphs.
00:51:21.390 --> 00:51:24.690
So this was recently
improved in a paper
00:51:24.690 --> 00:51:30.712
by Fox and Pham just a
couple of years ago, where--
00:51:30.712 --> 00:51:33.750
and this is the proof that
I will show you today--
00:51:33.750 --> 00:51:38.400
where you can take n0 to be
slightly better but still
00:51:38.400 --> 00:51:43.290
a tower, but a tower of now
height logarithmic in 1 over epsilon.
00:51:43.290 --> 00:51:45.620
So it's from a really,
really big tower
00:51:45.620 --> 00:51:47.310
to slightly less big tower.
00:51:50.060 --> 00:51:52.010
But, more importantly,
it turns out--
00:51:52.010 --> 00:51:55.820
so they also showed
that this is tight.
00:51:58.610 --> 00:52:01.270
You cannot do better.
00:52:01.270 --> 00:52:05.915
There exist constructions,
there exist sets A for which
00:52:05.915 --> 00:52:06.415
you--
00:52:06.415 --> 00:52:08.910
I mean, this theorem
is false if you
00:52:08.910 --> 00:52:12.390
replace the big O by less
than some very small constant.
00:52:15.660 --> 00:52:19.100
So in many applications of
the regularity lemma,
00:52:19.100 --> 00:52:22.170
the first proof, maybe using
regularity, is difficult. Well,
00:52:22.170 --> 00:52:23.700
it gives you a very poor bound.
00:52:23.700 --> 00:52:27.270
But, subsequently, there were
other proofs, better proofs,
00:52:27.270 --> 00:52:30.810
that give you
non-tower type bounds.
00:52:30.810 --> 00:52:33.000
But this is the
first application
00:52:33.000 --> 00:52:35.610
that we've seen
where, it turns out,
00:52:35.610 --> 00:52:41.520
the regularity lemma gives
you the correct bound.
00:52:41.520 --> 00:52:44.000
So it's really-- you
need a tower-type bound.
00:52:44.000 --> 00:52:45.500
I mean, we know the
regularity lemma
00:52:45.500 --> 00:52:47.240
itself needs tower-type bounds.
00:52:47.240 --> 00:52:48.890
But it turns out
this application also
00:52:48.890 --> 00:52:51.208
needs tower-type bounds.
00:52:51.208 --> 00:52:52.250
That's quite interesting.
00:52:52.250 --> 00:52:54.080
So, here, the use
of regularity is
00:52:54.080 --> 00:52:57.180
really necessary in
this quantitative sense.
00:53:02.342 --> 00:53:03.300
So let's see the proof.
00:53:05.970 --> 00:53:11.940
So let me first prove a
slightly technical lemma
00:53:11.940 --> 00:53:13.748
about bounded increments.
00:53:16.440 --> 00:53:19.055
So this is-- corresponds
to the statement
00:53:19.055 --> 00:53:20.550
that if you have
energy increments,
00:53:20.550 --> 00:53:22.800
you cannot increase
too many times,
00:53:22.800 --> 00:53:25.260
but in a slightly
different form.
00:53:25.260 --> 00:53:28.380
So suppose you have
numbers alpha and epsilon
00:53:28.380 --> 00:53:30.060
bigger than 0.
00:53:30.060 --> 00:53:39.450
And if you have this sequence
of a's between 0 and 1,
00:53:39.450 --> 00:53:44.430
and such that a0
is at least alpha cubed,
00:53:44.430 --> 00:53:50.760
then there exists
some k, at most log
00:53:50.760 --> 00:53:56.670
base 2 of 1 over
epsilon, such that 2
00:53:56.670 --> 00:54:01.950
a sub k minus a sub k
plus 1 is at least alpha
00:54:01.950 --> 00:54:04.248
cubed minus epsilon.
00:54:04.248 --> 00:54:05.540
So don't worry about this form.
00:54:05.540 --> 00:54:08.400
We'll see shortly why we
want something like that.
00:54:08.400 --> 00:54:10.860
But the proof itself is
very straightforward.
00:54:10.860 --> 00:54:16.461
Because, otherwise--
so you start with a0.
00:54:16.461 --> 00:54:21.780
Now, then, if this is not
true for k equals to 0,
00:54:21.780 --> 00:54:30.560
then a1 is at least 2 a0 minus
alpha cubed plus epsilon.
00:54:30.560 --> 00:54:35.210
So a0 is at least alpha cubed.
00:54:35.210 --> 00:54:39.265
So if-- otherwise, you have
some lower bound on alpha
00:54:39.265 --> 00:54:49.080
1, which is at least
alpha cubed plus epsilon.
00:54:49.080 --> 00:54:52.410
And, likewise, you have
some lower bound on alpha 2.
00:54:59.880 --> 00:55:01.866
You have some lower bound on--
00:55:01.866 --> 00:55:09.680
sorry-- alpha 2, and this
lower bound is plus 2 epsilon.
00:55:09.680 --> 00:55:10.920
So you keep iterating.
00:55:10.920 --> 00:55:16.450
You see the next thing
is 4 epsilon, and so on.
00:55:16.450 --> 00:55:20.640
So if you get to more
than this many iterations,
00:55:20.640 --> 00:55:23.430
you go more than 1.
00:55:23.430 --> 00:55:32.790
So alpha k is bigger than 1
if k is ceiling of log base 2
00:55:32.790 --> 00:55:34.140
of 1 over epsilon.
00:55:34.140 --> 00:55:38.710
And that will be a
contradiction to the hypothesis.
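NOTE A small sketch that searches for the index k promised by the bounded increments lemma, assuming the hypotheses are: the sequence lies in [0, 1], its first term is at least alpha cubed, and it has at least ceil(log2(1/eps)) + 2 terms. The function names are illustrative.

```python
import math


def first_good_index(a, alpha, eps):
    """Smallest k with 2*a[k] - a[k+1] >= alpha**3 - eps, or None if there is none."""
    for k in range(len(a) - 1):
        if 2 * a[k] - a[k + 1] >= alpha ** 3 - eps:
            return k
    return None


def max_steps(eps):
    """ceil(log2(1/eps)): the number of doublings the proof allows."""
    return math.ceil(math.log2(1 / eps))


def greedy_bad_sequence(alpha, eps, length, slack=1e-9):
    """Sequence that violates the inequality for as long as it can stay within [0, 1]."""
    a = [alpha ** 3]
    while len(a) < length:
        # Just above 2*a_k - alpha^3 + eps, capped at 1 to respect the hypothesis.
        a.append(min(1.0, 2 * a[-1] - alpha ** 3 + eps + slack))
    return a


if __name__ == "__main__":
    alpha, eps = 0.5, 0.01
    seq = greedy_bad_sequence(alpha, eps, max_steps(eps) + 2)
    print(first_good_index(seq, alpha, eps), "<=", max_steps(eps))
```

The greedy sequence mirrors the proof: the excess over alpha cubed roughly doubles at each step, so it hits the cap at 1 after about log2(1/eps) steps, and the inequality is forced to hold there.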
00:55:38.710 --> 00:55:42.990
So this is a small variation
on this fact that you cannot
00:55:42.990 --> 00:55:45.220
increment too many times.
00:55:45.220 --> 00:55:47.130
Each time, you go up by a bit.
00:55:47.130 --> 00:55:51.990
Whereas, we save a little
bit because the number
00:55:51.990 --> 00:55:53.780
of iterations is
now logarithmic.
00:55:53.780 --> 00:55:55.290
So you double in
epsilon each time.
00:56:01.370 --> 00:56:11.490
If I give you a function f on F3
to the n, and U is a subspace--
00:56:11.490 --> 00:56:13.010
so this notation means subspace.
00:56:17.360 --> 00:56:24.270
Let me write f sub U to
be the function obtained
00:56:24.270 --> 00:56:31.131
by averaging f on each U coset.
00:56:31.131 --> 00:56:32.730
So you have some subspace.
00:56:32.730 --> 00:56:36.420
You partition your space into
translates of that subspace,
00:56:36.420 --> 00:56:39.030
and you replace the
value of f on each coset
00:56:39.030 --> 00:56:41.520
by its average on that coset.
00:56:41.520 --> 00:56:43.710
So this is similar to
what we did with graphons.
00:56:43.710 --> 00:56:44.720
You're stepping.
00:56:44.720 --> 00:56:46.230
So you're averaging
on each block.
00:56:51.730 --> 00:56:53.670
So now let me prove
something which
00:56:53.670 --> 00:56:57.690
is kind of like an
arithmetic regularity lemma.
00:57:01.610 --> 00:57:05.330
And I mean, this statement
will be new to you,
00:57:05.330 --> 00:57:08.210
but it should look similar to
some of the statements we've
00:57:08.210 --> 00:57:09.430
seen before in the course.
00:57:12.150 --> 00:57:15.840
And the statement is
that, for every epsilon,
00:57:15.840 --> 00:57:21.770
there exists some m which
is a function of epsilon.
00:57:21.770 --> 00:57:25.070
And, in fact, it
will be bounded,
00:57:25.070 --> 00:57:28.850
in terms of tower of
height, at most order
00:57:28.850 --> 00:57:31.250
logarithmic in 1 over epsilon.
00:57:31.250 --> 00:57:36.860
Such that for every
function f on F3
00:57:36.860 --> 00:57:45.600
to the n that takes values bounded
between 0 and 1, there exists
00:57:45.600 --> 00:57:57.750
subspaces W and U, where
the codimension of W
00:57:57.750 --> 00:57:59.020
is, at most, m.
00:57:59.020 --> 00:58:01.810
So you should think of this
as the coarse partition
00:58:01.810 --> 00:58:05.480
and the fine partition in the
partition regularity lemma.
00:58:05.480 --> 00:58:07.120
And the codimension is--
00:58:07.120 --> 00:58:09.610
corresponds to the
number of pieces.
00:58:09.610 --> 00:58:13.150
So 3 raised to the codimension
is the number of cosets.
00:58:13.150 --> 00:58:18.010
So you have bounded many
parts, and have two partitions.
00:58:18.010 --> 00:58:22.540
And what I would like
is that the number--
00:58:22.540 --> 00:58:23.920
so if I--
00:58:23.920 --> 00:58:28.730
I want f to be pseudorandom
after doing this partitioning,
00:58:28.730 --> 00:58:29.960
so to speak.
00:58:29.960 --> 00:58:32.050
And this corresponds
to the statement
00:58:32.050 --> 00:58:39.520
that if I look at f minus f sub W, then
the maximum Fourier coefficient
00:58:39.520 --> 00:58:44.020
is quite small, where
quite small means, at most,
00:58:44.020 --> 00:58:51.940
epsilon over the
size of U complement.
00:58:51.940 --> 00:58:55.920
So size of U perp.
00:58:55.920 --> 00:59:02.820
And, also, there is
this other condition
00:59:02.820 --> 00:59:10.310
which tells you that the
L3 norms between f sub U
00:59:10.310 --> 00:59:16.400
and f sub W are
related in this way.
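NOTE A sketch of the statement in symbols. The board formulas are not captured in the transcript, so the second inequality below is an assumption, reconstructed from the proof that follows (it is the bounded increments inequality applied with U and W as consecutive subspaces). The nesting of W inside U is likewise how the construction in the proof produces them, not something spoken aloud.

```latex
% Arithmetic regularity lemma, as described in the lecture (reconstruction).
\textbf{Lemma.} For every $\varepsilon > 0$ there exists $m = m(\varepsilon)$, bounded by a tower
of height $O(\log(1/\varepsilon))$, such that for every $f \colon \mathbb{F}_3^n \to [0,1]$
there exist subspaces $W \le U \le \mathbb{F}_3^n$ with $\operatorname{codim} W \le m$ and
\[
  \max_{r} \bigl| \widehat{(f - f_W)}(r) \bigr| \;\le\; \frac{\varepsilon}{|U^{\perp}|},
  \qquad
  2\,\|f_U\|_3^3 - \|f_W\|_3^3 \;\ge\; (\mathbb{E} f)^3 - \varepsilon .
\]
```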
00:59:23.798 --> 00:59:25.090
So we haven't seen this before.
00:59:25.090 --> 00:59:29.070
In fact, specifically, this
inequality is very ad hoc
00:59:29.070 --> 00:59:31.740
to the application of
popular difference in 3-AP's.
00:59:31.740 --> 00:59:33.570
But we have seen
something similar,
00:59:33.570 --> 00:59:36.030
where this relationship
is replaced
00:59:36.030 --> 00:59:38.460
by something that accounts
for the difference between L2
00:59:38.460 --> 00:59:39.828
norms.
00:59:39.828 --> 00:59:41.370
So if you go back
to your notes, when
00:59:41.370 --> 00:59:44.730
we discussed regularity lemma
in a more analytic fashion,
00:59:44.730 --> 00:59:45.402
we have that.
00:59:45.402 --> 00:59:46.860
And you should
think of this-- when
00:59:46.860 --> 00:59:50.497
we discussed strong regularity
lemma, this definition here,
00:59:50.497 --> 00:59:52.080
this roughly corresponds
to definition
00:59:52.080 --> 00:59:54.570
that in the fine partition
versus the coarse partition
00:59:54.570 --> 00:59:57.510
the edge densities are
roughly similar, that when you
00:59:57.510 --> 01:00:00.290
do the further partitioning,
you're not changing densities
01:00:00.290 --> 01:00:01.230
up by very much.
01:00:04.020 --> 01:00:08.315
So that's the arithmetic
regularity lemma.
01:00:08.315 --> 01:00:09.690
And once you have
the statement--
01:00:09.690 --> 01:00:11.760
I mean, I think the hardest part
is writing down the statement.
01:00:11.760 --> 01:00:14.760
Once you have the statement,
the proof itself is kind of this
01:00:14.760 --> 01:00:20.230
follow your nose approach,
where you first define
01:00:20.230 --> 01:00:21.840
the sequence of epsilons.
01:00:21.840 --> 01:00:26.120
Epsilon 0 is 1, and
epsilon sub k plus 1--
01:00:26.120 --> 01:00:28.060
and don't worry
about this for now.
01:00:28.060 --> 01:00:32.140
You will see in a second why
these numbers are chosen.
01:00:41.500 --> 01:00:46.660
Let me write R sub k to be the
set of r's-- so there will be
01:00:46.660 --> 01:00:48.820
characters--
01:00:48.820 --> 01:00:52.285
such that the Fourier
coefficient fr
01:00:52.285 --> 01:00:54.400
is at least epsilon sub k.
01:00:56.960 --> 01:00:59.570
So the r's are supposed
to identify how we're
01:00:59.570 --> 01:01:01.175
going to do the partitioning.
01:01:05.830 --> 01:01:11.540
Now, the size of
this R is bounded.
01:01:11.540 --> 01:01:17.140
So I claim that the
size of R is, at most, 1
01:01:17.140 --> 01:01:22.000
over epsilon sub k squared.
01:01:22.000 --> 01:01:29.470
And that's because there is
this Parseval identity, which
01:01:29.470 --> 01:01:34.360
tells you that the L2 sum
of the Fourier coefficients
01:01:34.360 --> 01:01:40.480
is equal to the L2 of the
function, which is at most 1.
01:01:40.480 --> 01:01:42.820
So the number of Fourier
coefficients that exceed
01:01:42.820 --> 01:01:44.800
a certain quantity
cannot be too many.
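NOTE The counting argument just described, written out as a sketch. It assumes the Fourier transform is normalized as an average over F3 to the n, so that Parseval reads as below, and that f takes values in [0, 1].

```latex
\[
  |R_k|\, \varepsilon_k^2
  \;\le\; \sum_{r \in R_k} |\widehat{f}(r)|^2
  \;\le\; \sum_{r \in \mathbb{F}_3^n} |\widehat{f}(r)|^2
  \;=\; \mathbb{E}_x\, f(x)^2
  \;\le\; 1,
  \qquad\text{so}\qquad
  |R_k| \;\le\; \frac{1}{\varepsilon_k^2}.
\]
```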
01:01:52.950 --> 01:02:00.930
So let U now be the
subspace defined by taking
01:02:00.930 --> 01:02:03.780
the orthogonal
complement of these r's.
01:02:07.420 --> 01:02:15.550
And let's note that if we
take alpha sub k to be the--
01:02:15.550 --> 01:02:22.270
if we take alpha sub k to be the
L3 norm cubed of the function
01:02:22.270 --> 01:02:29.160
derived from averaging f along
the U's, and then looking
01:02:29.160 --> 01:02:32.720
at the third moment
of these densities.
01:02:32.720 --> 01:02:38.610
So these alphas, we can apply
the increment lemma initially
01:02:38.610 --> 01:02:41.910
to deduce that there exists--
01:02:41.910 --> 01:02:44.520
so, in particular, this
number here is at least alpha
01:02:44.520 --> 01:02:46.290
cubed by convexity.
01:02:49.890 --> 01:02:57.620
So by the previous lemma, there
exists some k, no more than
01:02:57.620 --> 01:02:59.550
on the order of 1 over--
01:02:59.550 --> 01:03:05.337
of log 1 over epsilon,
such that 2 alpha sub
01:03:05.337 --> 01:03:15.580
k minus alpha sub k plus
1 is at least the density
01:03:15.580 --> 01:03:18.880
of f cubed minus epsilon.
01:03:18.880 --> 01:03:22.000
So this alpha is supposed
to be the density of f.
01:03:31.320 --> 01:03:34.310
So we find this k.
01:03:34.310 --> 01:03:40.610
And we have this bound
over here from satisfying
01:03:40.610 --> 01:03:43.590
that inequality.
01:03:43.590 --> 01:03:46.820
So this is the density increment
argument, the energy increment
01:03:46.820 --> 01:03:47.800
argument.
01:03:47.800 --> 01:03:50.600
So we're doing the energy
increment argument, basically
01:03:50.600 --> 01:03:52.100
the same argument
as the one that we
01:03:52.100 --> 01:03:55.100
did when we discussed
graph regularity lemma,
01:03:55.100 --> 01:03:57.088
but now presented in a
slightly different form
01:03:57.088 --> 01:03:58.380
and a different order of logic.
01:03:58.380 --> 01:04:01.330
But it's the same argument.
01:04:01.330 --> 01:04:03.960
And what we would like
to show is that you also
01:04:03.960 --> 01:04:06.600
have this pseudorandomness
condition about having
01:04:06.600 --> 01:04:08.343
small Fourier coefficients.
01:04:16.560 --> 01:04:19.460
So what's happening here with
the Fourier coefficients?
01:04:19.460 --> 01:04:23.200
Now, how is the Fourier
coefficient of an average
01:04:23.200 --> 01:04:26.050
f related to the original f?
01:04:26.050 --> 01:04:31.680
So that's something you
want to understand up there.
01:04:31.680 --> 01:04:34.200
And that's something
that's not hard to analyze.
01:04:34.200 --> 01:04:40.395
Because if you have
a subspace U or W--
01:04:40.395 --> 01:04:44.790
so either one-- then
the Fourier coefficients
01:04:44.790 --> 01:04:48.000
of this averaged version
is very much related
01:04:48.000 --> 01:04:49.050
to the original function.
01:04:51.810 --> 01:04:58.510
It turns out that if
you take an r which
01:04:58.510 --> 01:05:07.780
is in the orthogonal complement,
then the Fourier coefficient
01:05:07.780 --> 01:05:10.010
doesn't change.
01:05:10.010 --> 01:05:16.800
And if you are not in the
orthogonal complement,
01:05:16.800 --> 01:05:19.950
then the Fourier
coefficient gets zeroed out.
01:05:27.170 --> 01:05:29.990
So that's something that's
not too hard to check,
01:05:29.990 --> 01:05:33.550
and I urge you to
think about it.
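NOTE A numerical sketch of the fact left as an exercise: averaging over cosets of U keeps the Fourier coefficients at r in the orthogonal complement of U and zeroes out the rest. It assumes the normalization in which the Fourier coefficient at r is the average over x of f(x) times omega to the power minus r dot x, with omega a primitive cube root of unity. All function names are illustrative.

```python
import cmath
from itertools import product

OMEGA = cmath.exp(2j * cmath.pi / 3)  # primitive cube root of unity


def dot(r, x):
    """Dot product in F_3^n."""
    return sum(a * b for a, b in zip(r, x)) % 3


def fourier(f, n):
    """Fourier coefficients: hat f(r) = average over x of f(x) * OMEGA^(-r.x)."""
    points = list(product(range(3), repeat=n))
    return {
        r: sum(f[x] * OMEGA ** (-dot(r, x)) for x in points) / len(points)
        for r in points
    }


def span(generators, n):
    """All F_3-linear combinations of the generators."""
    vectors = {tuple([0] * n)}
    for g in generators:
        vectors = {
            tuple((v[i] + c * g[i]) % 3 for i in range(n))
            for v in vectors
            for c in range(3)
        }
    return vectors


def average_on_cosets(f, U, n):
    """f_U: each value of f replaced by its mean over the coset x + U."""
    return {
        x: sum(f[tuple((x[i] + u[i]) % 3 for i in range(n))] for u in U) / len(U)
        for x in product(range(3), repeat=n)
    }


def check_averaging_rule(f, U, n, tol=1e-9):
    """True if hat(f_U)(r) equals hat f(r) on U-perp and 0 elsewhere, up to tol."""
    f_hat = fourier(f, n)
    fU_hat = fourier(average_on_cosets(f, U, n), n)
    for r in f_hat:
        in_perp = all(dot(r, u) == 0 for u in U)
        expected = f_hat[r] if in_perp else 0
        if abs(fU_hat[r] - expected) > tol:
            return False
    return True


if __name__ == "__main__":
    import random

    n = 3
    f = {x: random.random() for x in product(range(3), repeat=n)}
    print(check_averaging_rule(f, span([(1, 0, 2)], n), n))
```

The underlying reason is that f_U is the convolution of f with the uniform measure on U, whose Fourier transform is the indicator of the orthogonal complement of U.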
01:05:33.550 --> 01:05:37.590
So, with that in mind, let's go
back to verify this over here.
01:05:40.680 --> 01:05:43.600
So what we have
now is that the--
01:05:49.072 --> 01:05:55.570
so this quantity, which
measures the largest Fourier
01:05:55.570 --> 01:06:01.430
coefficient of the difference
between f and f sub U sub k plus 1,
01:06:01.430 --> 01:06:05.220
is, at most--
01:06:05.220 --> 01:06:08.030
and what U sub k
plus 1 is doing is
01:06:08.030 --> 01:06:11.090
we're looking at possible
large Fourier coefficients,
01:06:11.090 --> 01:06:14.060
and we are getting rid of them.
01:06:14.060 --> 01:06:18.680
So we're zeroing out these
large Fourier coefficients,
01:06:18.680 --> 01:06:21.350
so that the remaining
Fourier coefficients are all
01:06:21.350 --> 01:06:22.180
quite small.
01:06:31.430 --> 01:06:35.060
But we chose our R so that if--
01:06:35.060 --> 01:06:36.260
so this big R--
01:06:36.260 --> 01:06:38.450
so that if your little
r is not in big R,
01:06:38.450 --> 01:06:40.400
then the Fourier
coefficient must be small.
01:06:40.400 --> 01:06:44.960
That's how we chose
the big R. So we
01:06:44.960 --> 01:06:49.940
have this bound over here.
01:06:49.940 --> 01:07:01.790
And by the definition of the
epsilon, we have that bound.
01:07:01.790 --> 01:07:04.040
And, also, we're combining
with this estimate,
01:07:04.040 --> 01:07:08.850
upper bound estimate
on the size of R sub k.
01:07:08.850 --> 01:07:15.010
So point being we have that.
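NOTE The chain of inequalities being pointed at, as a sketch. The transcript does not capture the board definition of the epsilon sequence, so the condition on epsilon sub k plus 1 below is an assumption: it is what this argument needs, not a quotation of the definition.

```latex
% Assumption: the sequence satisfies \varepsilon_{k+1} \le \varepsilon \cdot 3^{-1/\varepsilon_k^2}.
% For r in U_{k+1}^\perp the coefficient of f - f_{U_{k+1}} vanishes; otherwise it equals \hat f(r),
% and r is not in R_{k+1} because R_{k+1} \subseteq U_{k+1}^\perp.
\[
  \max_{r} \bigl| \widehat{(f - f_{U_{k+1}})}(r) \bigr|
  \;\le\; \max_{r \notin R_{k+1}} |\widehat{f}(r)|
  \;\le\; \varepsilon_{k+1}
  \;\le\; \frac{\varepsilon}{3^{1/\varepsilon_k^2}}
  \;\le\; \frac{\varepsilon}{3^{|R_k|}}
  \;\le\; \frac{\varepsilon}{|U_k^{\perp}|}.
\]
```

The last two steps use the bound on the size of R sub k from Parseval and the fact that the orthogonal complement of U sub k is spanned by R sub k.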
01:07:15.010 --> 01:07:23.190
So now take W to be U sub k
plus 1, and U to be U sub k,
01:07:23.190 --> 01:07:26.670
and then we have
everything that we want.
01:07:26.670 --> 01:07:27.360
Question, yes.
01:07:27.360 --> 01:07:30.540
AUDIENCE: Why is the
codimension of W small?
01:07:30.540 --> 01:07:33.550
YUFEI ZHAO: Question is, why
is the codimension of W small?
01:07:33.550 --> 01:07:35.240
So what is the codimension of W?
01:07:38.590 --> 01:07:42.860
So we want to know that the
codimension of W is bounded.
01:07:42.860 --> 01:07:45.620
So the codimension of W is--
01:07:48.742 --> 01:07:52.720
I mean, the codimension of
any of these U sub k's is,
01:07:52.720 --> 01:08:01.090
at most, the
number of r's that produce it.
01:08:01.090 --> 01:08:04.540
And the size of R is bounded.
01:08:04.540 --> 01:08:13.660
So if we pick m so that it
uniformly bounds the size of R,
01:08:13.660 --> 01:08:16.660
then we have a bound
on the codimension.
01:08:16.660 --> 01:08:17.588
So that's important.
01:08:17.588 --> 01:08:19.630
So we need to know that
the codimension is small.
01:08:19.630 --> 01:08:21.838
Otherwise, if you don't have
the bound on codimension
01:08:21.838 --> 01:08:26.062
you can just take
the zero subspace,
01:08:26.062 --> 01:08:27.479
and, trivially,
everything's true.
01:08:31.490 --> 01:08:34.149
We have a regularity lemma, and
what comes with a regularity
01:08:34.149 --> 01:08:35.652
lemma is a counting lemma.
01:08:35.652 --> 01:08:37.319
So let me write down
the counting lemma,
01:08:37.319 --> 01:08:38.319
and I'll skip the proof.
01:08:44.560 --> 01:08:50.439
So the counting lemma tells
you that if you have f and g
01:08:50.439 --> 01:09:01.510
both functions on F3 to the n,
and U is a subspace of F3 to the n, then--
01:09:01.510 --> 01:09:02.859
so let me define--
01:09:02.859 --> 01:09:05.800
so the quantity that
I'm interested in is--
01:09:08.939 --> 01:09:19.069
so I'm interested in
understanding 3-AP's where
01:09:19.069 --> 01:09:21.630
the common difference is
in a particular subspace.
01:09:32.700 --> 01:09:41.920
So we claim that the 3-AP count
of f with common difference
01:09:41.920 --> 01:09:43.990
restricted to the subspace U--
01:09:47.850 --> 01:09:51.370
so it's similar between
f and g if f and g are
01:09:51.370 --> 01:09:53.979
close to each other in Fourier.
01:09:57.490 --> 01:09:59.200
Well, not quite, because--
01:09:59.200 --> 01:10:01.890
so something like
this, we saw earlier
01:10:01.890 --> 01:10:03.990
in the proof of Roth's
theorem if we don't
01:10:03.990 --> 01:10:05.518
restrict the common difference.
01:10:05.518 --> 01:10:07.560
Turns out, if you restrict
the common difference,
01:10:07.560 --> 01:10:10.020
you lose a little bit.
01:10:10.020 --> 01:10:12.510
So you lose a factor
which is basically
01:10:12.510 --> 01:10:19.280
the size of the orthogonal complement
of U. So I won't prove that.
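To make the quantity in the counting lemma concrete, here is a minimal brute-force sketch of the 3-AP density with common difference restricted to a subspace. The function name `ap3_density`, the dictionary representation of f, and the choice of U as a coordinate subspace (first k coordinates zero) are illustrative assumptions, not notation from the lecture.

```python
from itertools import product

def ap3_density(f, n, k):
    """Average of f(x) f(x+y) f(x+2y) over x in F_3^n and y in U.

    f maps each point of F_3^n (a tuple of n entries in {0, 1, 2}) to a number.
    U is assumed to be the coordinate subspace {y : y[0] = ... = y[k-1] = 0},
    which has codimension k. This is a hypothetical choice for illustration.
    """
    points = list(product(range(3), repeat=n))
    # Enumerate U: the first k coordinates are zero, the rest are free.
    diffs = [(0,) * k + tail for tail in product(range(3), repeat=n - k)]
    total = 0.0
    for x in points:
        for y in diffs:
            x1 = tuple((a + b) % 3 for a, b in zip(x, y))      # x + y
            x2 = tuple((a + 2 * b) % 3 for a, b in zip(x, y))  # x + 2y
            total += f[x] * f[x1] * f[x2]
    return total / (len(points) * len(diffs))
```

With k = 0 this is the unrestricted 3-AP density, and with k = n the subspace is just the zero vector, so the function returns the average of f cubed.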
01:10:21.850 --> 01:10:25.500
But now let me go on
to the punch line.
01:10:29.920 --> 01:10:35.830
So if we start with, again,
f function in your space,
01:10:35.830 --> 01:10:45.340
taking values between 0 and 1,
and I have subspaces U and W,
01:10:45.340 --> 01:10:49.000
I claim that the--
01:10:49.000 --> 01:10:53.142
if I look at f
averaged through W,
01:10:53.142 --> 01:10:57.370
and I consider 3-AP counts with
common difference restricted
01:10:57.370 --> 01:11:01.480
to U, then this
quantity here is lower
01:11:01.480 --> 01:11:08.490
bounded by this difference
between L3 norms.
01:11:16.850 --> 01:11:17.890
So I claim this is true.
01:11:20.700 --> 01:11:22.480
So this is just some inequality.
01:11:22.480 --> 01:11:23.780
This is some inequality.
01:11:26.830 --> 01:11:30.130
So of all the things that
I did back in high school
01:11:30.130 --> 01:11:32.440
doing math competitions, I
think the one skill which,
01:11:32.440 --> 01:11:34.690
I think, I find
most helpful now is
01:11:34.690 --> 01:11:36.830
being able to do inequalities.
01:11:36.830 --> 01:11:40.000
And I thought I would never
see these three-variable
01:11:40.000 --> 01:11:42.430
inequalities again, but
when I saw this one--
01:11:42.430 --> 01:11:44.560
so Fox and Pham, when
they first showed me
01:11:44.560 --> 01:11:47.320
a somewhat different
proof of an approach that
01:11:47.320 --> 01:11:49.763
didn't go through this specific
inequality, I told them,
01:11:49.763 --> 01:11:51.930
hey, there's this thing I
remember from high school.
01:11:51.930 --> 01:11:54.042
It's called Schur's inequality.
01:11:54.042 --> 01:11:56.500
And I thought I would never
see it again after high school,
01:11:56.500 --> 01:11:57.875
but apparently
it's still useful.
01:12:01.360 --> 01:12:03.130
So what Schur's
inequality says--
01:12:06.160 --> 01:12:08.560
this is one of those
three-variable inequalities
01:12:08.560 --> 01:12:13.420
that you would know if
you did math olympiads--
01:12:13.420 --> 01:12:22.270
that you have-- so
it's an inequality
01:12:22.270 --> 01:12:26.860
between non-negative-- actually,
it's true for real numbers
01:12:26.860 --> 01:12:30.390
as well, but let's say it's
non-negative real numbers.
01:12:36.040 --> 01:12:37.983
So that's Schur's inequality.
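The transcript does not capture what was written on the board. Assuming it was the standard degree-one form of Schur's inequality, it reads as follows for non-negative reals a, b, c; the second line is the rearrangement that isolates the product abc.

```latex
\[
a^3 + b^3 + c^3 + 3abc \;\ge\; a^2(b+c) + b^2(a+c) + c^2(a+b),
\qquad a, b, c \ge 0,
\]
\[
abc \;\ge\; \tfrac{1}{3}\Bigl( a^2 b + a^2 c + b^2 a + b^2 c + c^2 a + c^2 b \Bigr)
        \;-\; \tfrac{1}{3}\bigl( a^3 + b^3 + c^3 \bigr).
\]
```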
01:12:42.420 --> 01:12:56.760
So if you look at the left-hand
side, the left-hand side is--
01:12:56.760 --> 01:12:59.310
it can be written as a
sum in the following way.
01:12:59.310 --> 01:13:01.720
I mean, it can be written
in the following way.
01:13:01.720 --> 01:13:06.450
So its expectation over
x, y, z that are 3-AP's
01:13:06.450 --> 01:13:08.625
in the same U coset.
01:13:12.880 --> 01:13:15.480
So I'm counting 3-AP's with
common difference restricted
01:13:15.480 --> 01:13:20.310
to U. So common 3-AP's
in the same U coset.
01:13:20.310 --> 01:13:26.260
And I am looking at
the product of f sub
01:13:26.260 --> 01:13:31.610
W evaluated on this 3-AP.
01:13:31.610 --> 01:13:37.380
So what I would like to do now
is apply Schur's inequality
01:13:37.380 --> 01:13:45.170
to a, b, and c, being
these three numbers.
01:13:45.170 --> 01:13:48.030
The point is you have
this a, b, c on the left.
01:13:48.030 --> 01:13:50.640
And then everything on the
right involves only a subset
01:13:50.640 --> 01:13:54.220
of a, b, c, and they simplify.
01:13:54.220 --> 01:13:58.720
So if I do this, then I
lower bound this quantity
01:13:58.720 --> 01:14:08.990
by twice the expectation of
x and y in the same coset,
01:14:08.990 --> 01:14:22.690
same U coset of f sub W
of x squared f sub W of y.
01:14:25.420 --> 01:14:27.372
Maybe I took two other
things, but they're
01:14:27.372 --> 01:14:29.080
all symmetric with
respect to each other.
01:14:31.720 --> 01:14:36.180
And minus the term
that corresponds
01:14:36.180 --> 01:14:39.915
to this sum of cubes.
01:14:39.915 --> 01:14:43.180
So like that.
01:14:43.180 --> 01:14:47.130
So this is a consequence
of Schur's inequality applied
01:14:47.130 --> 01:14:48.720
with a, b, c like this.
01:14:51.870 --> 01:14:58.700
But now you see, over here,
I can analyze this expression
01:14:58.700 --> 01:14:59.300
even further.
01:14:59.300 --> 01:15:03.640
Because if I let y vary
within the same U coset,
01:15:03.640 --> 01:15:08.180
then, over here, it
averages out to U cosets.
01:15:08.180 --> 01:15:14.860
So U is bigger than W.
So what we have is--
01:15:18.030 --> 01:15:22.340
so what we have over here
is that it is at least twice
01:15:22.340 --> 01:15:26.381
the expectation of--
01:15:26.381 --> 01:15:38.180
fW squared fU, minus
the expectation of--
01:15:38.180 --> 01:15:41.290
fW cubed.
01:15:41.290 --> 01:15:46.130
And I can use
convexity on f sub W
01:15:46.130 --> 01:16:04.150
to get that, which is
what we're looking for.
01:16:04.150 --> 01:16:07.092
So the last step is convexity.
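The chain of steps just described can be written out as below. The symbol \Lambda_U for the 3-AP density with common difference in U is assumed notation, and the argument uses W contained in U, so that averaging f_W over a U coset gives f_U.

```latex
\begin{align*}
\Lambda_U(f_W)
  &= \mathbb{E}_{x \in \mathbb{F}_3^n,\; d \in U}\, f_W(x)\, f_W(x+d)\, f_W(x+2d) \\
  &\ge 2\, \mathbb{E}_{x,\; y \in x+U}\bigl[ f_W(x)^2 f_W(y) \bigr] - \mathbb{E}\bigl[ f_W^3 \bigr]
     && \text{(Schur, then symmetry)} \\
  &= 2\, \mathbb{E}\bigl[ f_W^2\, f_U \bigr] - \mathbb{E}\bigl[ f_W^3 \bigr]
     && \text{(average over } y \in x+U\text{)} \\
  &\ge 2\, \mathbb{E}\bigl[ f_U^3 \bigr] - \mathbb{E}\bigl[ f_W^3 \bigr]
     && \text{(convexity on each } U \text{ coset)} \\
  &= 2\, \lVert f_U \rVert_3^3 - \lVert f_W \rVert_3^3 .
\end{align*}
```

The convexity step uses that f_U is constant on each U coset, and that the average of f_W squared over the coset is at least the square of the average of f_W, which is f_U squared.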
01:16:07.092 --> 01:16:08.550
So I'm running
through a little bit
01:16:08.550 --> 01:16:10.590
quick here because we're
running out of time,
01:16:10.590 --> 01:16:13.200
but all of these
steps are fairly
01:16:13.200 --> 01:16:15.420
simple once you observe
the first thing you can do
01:16:15.420 --> 01:16:18.830
is Schur's inequality.
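As a sanity check on the inequality (not a proof), one can test it numerically on a small example. The sketch below assumes coordinate subspaces, with U having its first `k_u` coordinates zero and W its first `k_w` coordinates zero, and `k_w >= k_u` so that W sits inside U. All function names are hypothetical.

```python
import random
from itertools import product

def coset_average(f, n, k):
    """Replace f by its average on each coset of {y : first k coordinates are 0}."""
    sums = {}
    for x in product(range(3), repeat=n):
        # The coset of x is determined by its first k coordinates.
        sums[x[:k]] = sums.get(x[:k], 0.0) + f[x]
    size = 3 ** (n - k)  # number of points in each coset
    return {x: sums[x[:k]] / size for x in product(range(3), repeat=n)}

def ap3_density(f, n, k):
    """Average of f(x) f(x+y) f(x+2y) over x in F_3^n, y with first k coords 0."""
    points = list(product(range(3), repeat=n))
    diffs = [(0,) * k + tail for tail in product(range(3), repeat=n - k)]
    total = 0.0
    for x in points:
        for y in diffs:
            x1 = tuple((a + b) % 3 for a, b in zip(x, y))
            x2 = tuple((a + 2 * b) % 3 for a, b in zip(x, y))
            total += f[x] * f[x1] * f[x2]
    return total / (len(points) * len(diffs))

def cube_mean(f):
    """Expectation of f cubed, i.e. the cube of the L3 norm for f >= 0."""
    return sum(v ** 3 for v in f.values()) / len(f)

def check_inequality(n=3, k_u=1, k_w=2, seed=0):
    """Return (lhs, rhs) of  AP3_U(f_W) >= 2 E[f_U^3] - E[f_W^3]  for random f."""
    rng = random.Random(seed)
    f = {x: rng.random() for x in product(range(3), repeat=n)}
    f_u = coset_average(f, n, k_u)
    f_w = coset_average(f, n, k_w)
    lhs = ap3_density(f_w, n, k_u)
    rhs = 2 * cube_mean(f_u) - cube_mean(f_w)
    return lhs, rhs
```

Calling `check_inequality` with different seeds should always return a pair with the first entry at least the second, up to floating-point error.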
01:16:18.830 --> 01:16:20.460
And we're almost there.
01:16:20.460 --> 01:16:21.360
We're almost done.
01:16:21.360 --> 01:16:23.270
We're almost done.
01:16:23.270 --> 01:16:29.230
So from that lemma
up there, I claim now
01:16:29.230 --> 01:16:32.770
that, for every
epsilon, there exists
01:16:32.770 --> 01:16:41.680
some m which is tower log
in 1 over epsilon, such
01:16:41.680 --> 01:16:48.670
that if f is a function
on F3 to the n,
01:16:48.670 --> 01:16:53.380
taking values
between 0 and 1, then
01:16:53.380 --> 01:17:00.410
there exists a subspace
U of codimension,
01:17:00.410 --> 01:17:08.590
at most, m such that
the 3-AP count, 3-AP
01:17:08.590 --> 01:17:11.680
density with common
difference restricted to U,
01:17:11.680 --> 01:17:17.710
is at least the random
bound minus epsilon.
01:17:24.370 --> 01:17:25.580
Why is this true?
01:17:25.580 --> 01:17:32.660
Well, we put everything
together, and choose U and W
01:17:32.660 --> 01:17:34.610
as in regularity lemma.
01:17:37.630 --> 01:17:48.720
And, by counting lemma, we have
that the 3-AP density of f,
01:17:48.720 --> 01:17:52.560
so it is at least--
01:17:52.560 --> 01:17:54.830
so we're using counting
lemma over here--
01:17:54.830 --> 01:18:00.480
it is at least the 3-AP
density of f sub W of U
01:18:00.480 --> 01:18:04.418
minus a small error
which we can control.
01:18:04.418 --> 01:18:05.460
So this step is counting.
01:18:09.710 --> 01:18:11.520
And now we apply that
inequality up there.
01:18:23.720 --> 01:18:27.120
And finally, we chose our U
and W in the regularity lemma
01:18:27.120 --> 01:18:31.830
so that this difference
here is controlled.
01:18:31.830 --> 01:18:37.370
So it is controlled by the
random bound minus epsilon.
01:18:42.100 --> 01:18:43.530
And that's it.
01:18:43.530 --> 01:18:45.420
So you change
epsilon to 4 epsilon,
01:18:45.420 --> 01:18:47.830
but we can change it back.
01:18:47.830 --> 01:18:48.600
And that's it.
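Putting the three steps together gives a chain like the one below. The regularity lemma's exact statement lies outside this passage, so the split of the error into 3\varepsilon from counting and \varepsilon from the L3 increment is an assumption, chosen to agree with the remark about 4 epsilon. The symbol \Lambda_U is assumed notation for the 3-AP density with common difference in U.

```latex
\begin{align*}
\Lambda_U(f)
  &\ge \Lambda_U(f_W) - 3\varepsilon
     && \text{(counting lemma)} \\
  &\ge 2\,\lVert f_U \rVert_3^3 - \lVert f_W \rVert_3^3 - 3\varepsilon
     && \text{(Schur and convexity)} \\
  &= \lVert f_U \rVert_3^3 - \bigl( \lVert f_W \rVert_3^3 - \lVert f_U \rVert_3^3 \bigr) - 3\varepsilon \\
  &\ge \lVert f_U \rVert_3^3 - 4\varepsilon
     && \text{(choice of } U, W \text{ in the regularity lemma)} \\
  &\ge \bigl( \mathbb{E} f \bigr)^3 - 4\varepsilon
     && \text{(convexity, since } \mathbb{E} f_U = \mathbb{E} f \text{)} .
\end{align*}
```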
01:18:51.600 --> 01:18:53.640
So we have the
statement that you
01:18:53.640 --> 01:18:57.660
have this subspace of
bounded codimension where
01:18:57.660 --> 01:18:59.880
you have this popular
difference result.
01:18:59.880 --> 01:19:02.790
It doesn't quite guarantee you
a single common difference,
01:19:02.790 --> 01:19:04.760
because, well, you
don't really want
01:19:04.760 --> 01:19:10.200
it to be the case where U is
just a single point because I
01:19:10.200 --> 01:19:13.090
want a nonzero
common difference.
01:19:13.090 --> 01:19:15.580
But if U is large enough--
01:19:15.580 --> 01:19:21.060
if n is large enough
at bounded codimension,
01:19:21.060 --> 01:19:22.930
so, then, the size
of U is large enough.
01:19:25.780 --> 01:19:31.260
So, then, there exists some
nonzero common difference.
01:19:31.260 --> 01:19:35.160
You pick some nonzero
element of U. On average,
01:19:35.160 --> 01:19:38.170
this should work out just fine.
01:19:38.170 --> 01:19:41.820
So I'll leave that
detail to you.
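One way to fill in that detail is an averaging argument, sketched here. T(y) is assumed notation for the 3-AP density with fixed common difference y, and c stands for the lower bound obtained on the average over U.

```latex
\[
T(y) := \mathbb{E}_{x}\, f(x)\, f(x+y)\, f(x+2y), \qquad
\frac{1}{|U|} \sum_{y \in U} T(y) \;\ge\; c, \qquad 0 \le T(0) \le 1 .
\]
\[
\frac{1}{|U| - 1} \sum_{y \in U \setminus \{0\}} T(y)
  \;\ge\; \frac{c\,|U| - 1}{|U| - 1}
  \;=\; c - \frac{1 - c}{|U| - 1}
  \;\ge\; c - \frac{1}{|U| - 1} \qquad (c \ge 0).
\]
```

So some nonzero y in U has T(y) at least c minus 1/(|U| - 1), and that loss goes to zero as n grows, because U has bounded codimension.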
01:19:41.820 --> 01:19:43.530
One more thing I
want to mention is
01:19:43.530 --> 01:19:46.650
that all of this machinery
involving regularity
01:19:46.650 --> 01:19:49.450
and Fourier, as with
things we've done before,
01:19:49.450 --> 01:19:52.720
carries over to other settings--
general Abelian groups,
01:19:52.720 --> 01:19:55.490
and also the integers.
01:19:55.490 --> 01:19:57.710
And you may ask, well,
we have this for 3-AP's.
01:19:57.710 --> 01:20:01.570
What about longer
arithmetic progressions?
01:20:01.570 --> 01:20:03.750
In the integers, it
turns out it is also
01:20:03.750 --> 01:20:07.170
true, that Green's
statement, in the integers
01:20:07.170 --> 01:20:10.845
if you replace 3-AP by 4-AP.
01:20:10.845 --> 01:20:12.220
That's a theorem
of Green and Tao
01:20:12.220 --> 01:20:15.470
involving higher-order Fourier
analysis-- quadratic Fourier
01:20:15.470 --> 01:20:17.140
analysis.
01:20:17.140 --> 01:20:25.130
However, and rather
surprisingly, 4-AP, it's OK.
01:20:25.130 --> 01:20:27.790
But 5-AP and
longer, it is false.
01:20:30.880 --> 01:20:33.250
The corresponding statement
about popular differences
01:20:33.250 --> 01:20:35.960
for 5-AP in the
integers is false.
01:20:35.960 --> 01:20:38.285
There are counterexamples.
01:20:38.285 --> 01:20:40.910
So it's really a statement about
3-AP's and 4-AP's, and there's
01:20:40.910 --> 01:20:42.470
some magic cancellations
that happen
01:20:42.470 --> 01:20:43.637
in 4-AP's that make it true.
01:20:48.450 --> 01:20:48.950
OK, great.
01:20:48.950 --> 01:20:50.980
So that's all for today.