WEBVTT

00:00:17.367 --> 00:00:18.950
YUFEI ZHAO: For the
past few lectures,

00:00:18.950 --> 00:00:21.310
we've been discussing the
structure of set addition,

00:00:21.310 --> 00:00:25.260
which culminated in the
proof of Freiman's theorem.

00:00:25.260 --> 00:00:28.270
So this was a pretty
big and central result

00:00:28.270 --> 00:00:30.640
in additive combinatorics,
which gives you

00:00:30.640 --> 00:00:35.500
a complete characterization
of sets with small doubling.

00:00:35.500 --> 00:00:38.650
Today, I want to look at a
somewhat different issue also

00:00:38.650 --> 00:00:40.780
related to sets
of small doubling,

00:00:40.780 --> 00:00:44.860
but this time we want to
have a somewhat different

00:00:44.860 --> 00:00:48.760
characterization of what does
it mean for a set to have

00:00:48.760 --> 00:00:51.652
lots of additive structure.

00:00:51.652 --> 00:00:53.110
So in today's
lecture, we're always

00:00:53.110 --> 00:00:55.968
going to be working
in an Abelian group.

00:01:00.450 --> 00:01:02.990
Let me define the
following quantity.

00:01:02.990 --> 00:01:09.450
Given sets A and B, we
define the additive energy

00:01:09.450 --> 00:01:16.450
between A and B to be
denoted by E of A and B.

00:01:16.450 --> 00:01:17.820
So A and B are sets.

00:01:17.820 --> 00:01:21.840
They're subsets of this
arbitrary Abelian group.

00:01:21.840 --> 00:01:26.250
So E of A and B is defined to
be the number of quadruples, a1,

00:01:26.250 --> 00:01:32.820
a2, b1, b2, where a1, a2
are elements of A, and b1,

00:01:32.820 --> 00:01:41.310
b2 are elements of B,
such that a1 plus b1

00:01:41.310 --> 00:01:43.680
equals to a2 plus b2.
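
NOTE
A minimal sketch, not part of the lecture, of the definition just given:
it counts quadruples (a1, a2, b1, b2) with a1 + b1 = a2 + b2 by brute force.
It assumes the group is the integers under addition, and the function name
additive_energy_bruteforce is hypothetical.
```python
from itertools import product
def additive_energy_bruteforce(A, B):
    """Count quadruples (a1, a2, b1, b2) with a1 + b1 == a2 + b2."""
    A, B = list(set(A)), list(set(B))
    # Enumerate every quadruple: fine for tiny sets, O(|A|^2 |B|^2) in general.
    return sum(1 for a1, a2, b1, b2 in product(A, A, B, B) if a1 + b1 == a2 + b2)
print(additive_energy_bruteforce({0, 1, 2}, {0, 1, 2}))
```
The brute-force form mirrors the spoken definition directly. It is only
meant for checking small examples by hand.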

00:01:48.650 --> 00:01:52.610
So the additive
energy is the number

00:01:52.610 --> 00:01:56.510
of quadruples of these
elements where you

00:01:56.510 --> 00:01:59.510
have this additive relation.

00:01:59.510 --> 00:02:02.000
And we would like
to understand sets

00:02:02.000 --> 00:02:04.790
with large additive energy.

00:02:04.790 --> 00:02:07.460
So, intuitively, if you
have lots of solutions

00:02:07.460 --> 00:02:09.680
to this equation in
your sets, then the

00:02:09.680 --> 00:02:14.590
sets themselves should have lots
of internal additive structure.

00:02:14.590 --> 00:02:17.900
So it's a different way of
describing additive structure,

00:02:17.900 --> 00:02:19.400
and we'd like to
understand how does

00:02:19.400 --> 00:02:21.800
this way of describing
additive structure

00:02:21.800 --> 00:02:26.510
relate to things we've seen
before, namely small doubling.

00:02:30.480 --> 00:02:33.840
When you have not two
sets but just one set--

00:02:33.840 --> 00:02:37.170
slightly easier to think about--

00:02:37.170 --> 00:02:42.690
we just write E of A.
I mean E of A comma A.

00:02:42.690 --> 00:02:55.150
And these objects are analogous
to 4 cycles in graph theory.

00:02:55.150 --> 00:02:59.020
Because if you think about this
expression here in a Cayley

00:02:59.020 --> 00:03:02.650
graph, let's say
over F2, then this

00:03:02.650 --> 00:03:05.110
is the description of a 4 cycle.

00:03:05.110 --> 00:03:07.240
You go around 4
steps, and you come

00:03:07.240 --> 00:03:09.460
back to where you started from.

00:03:09.460 --> 00:03:13.360
So these objects are
the analogs of 4 cycles.

00:03:13.360 --> 00:03:16.720
And we already saw in our
discussion of quasi-randomness,

00:03:16.720 --> 00:03:18.970
and also elsewhere,
that 4 cycles

00:03:18.970 --> 00:03:22.030
play an important
role in graph theory.

00:03:22.030 --> 00:03:24.160
And, likewise, these
additive energies

00:03:24.160 --> 00:03:27.190
are going to play an important
role in describing sets

00:03:27.190 --> 00:03:28.847
with additive structure.

00:03:33.320 --> 00:03:35.570
Consider the following quantity.

00:03:35.570 --> 00:03:42.530
We're going to let r sub A comma
B of x to be the number of ways

00:03:42.530 --> 00:03:45.410
to write x as a plus b.

00:03:49.710 --> 00:03:53.330
So x equals to a plus b.

00:03:56.080 --> 00:03:59.750
So r sub A comma B of
x is the number of ways

00:03:59.750 --> 00:04:04.040
I can write x as a plus b, where
a comes from big A, little b

00:04:04.040 --> 00:04:09.110
comes from big B. Then,
reinterpreting the formula

00:04:09.110 --> 00:04:12.590
up there, we see that the
additive energy between two

00:04:12.590 --> 00:04:18.758
sets A and B is simply the sum
of the squares of r sub A

00:04:18.758 --> 00:04:26.630
comma B, as x ranges over
all elements of the group. In fact,

00:04:26.630 --> 00:04:34.580
we only need to take x
in the sumset A plus B.
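
NOTE
A small sketch, assuming integer sets, of the reformulation above: tabulate
r_{A,B}(x) once, then sum its squares over the sumset. The function names
are hypothetical and not from the lecture.
```python
from collections import Counter
def representation_counts(A, B):
    """r_{A,B}(x): number of pairs (a, b) in A x B with a + b == x."""
    return Counter(a + b for a in set(A) for b in set(B))
def additive_energy(A, B):
    """E(A, B) as the sum of r_{A,B}(x)^2 over x in the sumset A + B."""
    return sum(count * count for count in representation_counts(A, B).values())
r = representation_counts(range(3), range(3))
print(dict(r), additive_energy(range(3), range(3)))
```
Only sums that actually occur appear as keys, which matches the remark that
x can be restricted to the sumset A + B.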

00:04:34.580 --> 00:04:38.780
So the basic question,
like when we discussed

00:04:38.780 --> 00:04:41.480
additive combinatorics, in the
sense of when we discussed sets

00:04:41.480 --> 00:04:44.750
of small doubling,
there we asked,

00:04:44.750 --> 00:04:49.910
if you have a set A of a certain
size, how big can A plus A be?

00:04:49.910 --> 00:04:51.350
Here, let's ask the same.

00:04:51.350 --> 00:04:55.490
If I give you set A of a certain
size, how big or how small

00:04:55.490 --> 00:05:00.070
can the additive
energy of the set be?

00:05:00.070 --> 00:05:02.440
What's the largest
possible number

00:05:02.440 --> 00:05:03.520
of additive quadruples?

00:05:03.520 --> 00:05:07.760
What's the least possible
number of additive quadruples?

00:05:07.760 --> 00:05:09.950
There's some
trivial bounds, just

00:05:09.950 --> 00:05:12.550
like in the case of sumsets.

00:05:17.950 --> 00:05:19.290
So what are some trivial bounds?

00:05:23.660 --> 00:05:29.330
On one hand, by taking a1 equal
to a2, and b1 equal to b2,

00:05:29.330 --> 00:05:33.620
we see that the energy is
always at least the square

00:05:33.620 --> 00:05:37.970
of the size of A.
On the other hand,

00:05:37.970 --> 00:05:40.300
if I fix three of
the four elements,

00:05:40.300 --> 00:05:42.750
then the fourth
element is determined.

00:05:42.750 --> 00:05:49.030
So the upper bound is
cube of the size of A.
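
NOTE
A quick numerical check of the trivial bounds |A|^2 <= E(A) <= |A|^3, as a
sketch on two integer examples chosen here for illustration (they are not
from the lecture): an interval, and powers of 2, whose pairwise sums never
coincide except trivially.
```python
from collections import Counter
def energy(A):
    """E(A) = E(A, A), computed from representation counts."""
    A = set(A)
    r = Counter(a + b for a in A for b in A)
    return sum(c * c for c in r.values())
n = 16
interval = set(range(n))                 # highly structured: energy of order n^3
powers = {2 ** i for i in range(n)}      # no nontrivial coincidences among sums
for name, A in [("interval", interval), ("powers of 2", powers)]:
    e = energy(A)
    assert n ** 2 <= e <= n ** 3         # the trivial bounds
    print(name, e)
```
The interval sits near the cubic upper bound and the powers of 2 near the
quadratic lower bound, so both bounds are tight up to constant factors.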

00:05:49.030 --> 00:05:51.430
And you can convince
yourself that, up

00:05:51.430 --> 00:05:54.340
to maybe constant
factors, this

00:05:54.340 --> 00:05:58.300
is the best possible general
upper and lower bound.

00:05:58.300 --> 00:06:01.510
Similar situation with sumsets,
where you have lower bound

00:06:01.510 --> 00:06:04.550
linear, upper bound quadratic.

00:06:04.550 --> 00:06:07.948
Which is the side with
additive structure?

00:06:07.948 --> 00:06:10.250
So if you have lots
of additive structure,

00:06:10.250 --> 00:06:12.650
you have high energy.

00:06:12.650 --> 00:06:16.930
So this range is when you have
lots of additive structure.

00:06:16.930 --> 00:06:19.360
And we would like to
understand, what can you

00:06:19.360 --> 00:06:23.290
say about a set with
high additive energy?

00:06:27.460 --> 00:06:32.030
Well, what are some examples of
sets with high additive energy?

00:06:32.030 --> 00:06:34.450
It turns out that if
you have a set that

00:06:34.450 --> 00:06:39.640
has small doubling,
then, automatically,

00:06:39.640 --> 00:06:42.010
it implies large
additive energy.

00:06:49.030 --> 00:06:54.740
So, in particular, intervals,
or GAPs, or a large subset

00:06:54.740 --> 00:06:57.480
of GAPs, or all these examples
that we saw-- in fact,

00:06:57.480 --> 00:07:00.590
these are all the examples
coming from Freiman's theorem.

00:07:00.590 --> 00:07:01.790
Also, arbitrary groups.

00:07:01.790 --> 00:07:02.840
You can have subgroups.

00:07:02.840 --> 00:07:05.630
And so all of these examples
have large additive energy.

00:07:08.490 --> 00:07:10.490
So let me-- I'll show you the
proof just in a second.

00:07:10.490 --> 00:07:11.740
It's not hard.

00:07:11.740 --> 00:07:14.330
But the real question is,
what about the converse?

00:07:14.330 --> 00:07:17.500
So can you say much in
the reverse direction?

00:07:17.500 --> 00:07:20.670
But, first, let me show you
this claim that small doubling

00:07:20.670 --> 00:07:23.690
implies large additive energy.

00:07:23.690 --> 00:07:28.310
Well, if you have small
doubling, if A plus A has size,

00:07:28.310 --> 00:07:33.940
at most, k times
the size of A, then

00:07:33.940 --> 00:07:37.030
it turns out the
additive energy of A

00:07:37.030 --> 00:07:42.080
is at least the
maximum possible,

00:07:42.080 --> 00:07:46.370
which is A cubed, divided by k.

00:07:46.370 --> 00:07:49.480
So that's within a constant
factor of the maximum.

00:07:49.480 --> 00:07:50.450
It's pretty large.

00:07:50.450 --> 00:07:54.900
If you have small doubling,
then large additive energy.

00:07:54.900 --> 00:07:57.580
So let's see the proof.

00:07:57.580 --> 00:07:59.440
So you can often
tell how hard a proof

00:07:59.440 --> 00:08:02.320
is by how simple the statement
is, although that's not always

00:08:02.320 --> 00:08:05.080
the case, as we've seen
with some of our theorems,

00:08:05.080 --> 00:08:07.660
like Plunnecke's inequality.

00:08:07.660 --> 00:08:09.990
But in this case, it turns
out to be fairly simple.

00:08:09.990 --> 00:08:20.620
So we see that r sub A comma
A is supported on A plus A.

00:08:20.620 --> 00:08:26.250
So we use Cauchy-Schwarz
to write--

00:08:26.250 --> 00:08:33.419
so, first, we write additive
energy in terms of the sum

00:08:33.419 --> 00:08:35.460
of the squares of these r's.

00:08:38.260 --> 00:08:46.670
And now, by Cauchy-Schwarz,
we find that you can replace

00:08:46.670 --> 00:08:50.510
the sum of the squared r's
by the square of the sum of the r's.

00:08:50.510 --> 00:08:55.370
But now the key point
here is that we take out

00:08:55.370 --> 00:08:59.090
this factor coming from
Cauchy-Schwarz, which is only

00:08:59.090 --> 00:09:03.560
A plus A. So if the support size
is small, we gain in this step.

00:09:06.310 --> 00:09:11.110
But what is the sum of r's?

00:09:11.110 --> 00:09:13.210
I mean, r of x is
just number of ways

00:09:13.210 --> 00:09:16.320
to write x as little
a1 plus little ab--

00:09:16.320 --> 00:09:17.800
little a2.

00:09:17.800 --> 00:09:23.500
So if I sum over all x, I'm just
looking at different two ways--

00:09:23.500 --> 00:09:28.810
we're just looking at ways of
picking an ordered pair from A.

00:09:28.810 --> 00:09:34.210
So this last expression
is equal to the size of A

00:09:34.210 --> 00:09:39.980
to the power 4 divided by the size of
A plus A. And now we

00:09:39.980 --> 00:09:43.430
use that A has small
doubling to conclude

00:09:43.430 --> 00:09:47.700
that the final quantity is at
least A cubed divided by k.
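
NOTE
The chain of inequalities described in words above, written out as a sketch.
Here r_{A,A} is the representation function and K stands for the constant
spoken as "k". The middle step is Cauchy-Schwarz over the support A + A.
```latex
\[
E(A) \;=\; \sum_{x \in A+A} r_{A,A}(x)^2
\;\ge\; \frac{1}{|A+A|}\Bigl(\sum_{x \in A+A} r_{A,A}(x)\Bigr)^{2}
\;=\; \frac{|A|^{4}}{|A+A|}
\;\ge\; \frac{|A|^{4}}{K|A|}
\;=\; \frac{|A|^{3}}{K}.
\]
```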

00:09:53.820 --> 00:09:58.512
So we see small doubling
implies large additive energy.

00:09:58.512 --> 00:09:59.720
And this kind of makes sense.

00:09:59.720 --> 00:10:03.830
If your set doesn't
expand, then there

00:10:03.830 --> 00:10:08.180
are many collisions of sums.

00:10:08.180 --> 00:10:11.110
And so you must have lots of
solutions to that equation

00:10:11.110 --> 00:10:13.220
up there.

00:10:13.220 --> 00:10:15.230
But what about the converse?

00:10:15.230 --> 00:10:18.440
If I give you a set with
large additive energy,

00:10:18.440 --> 00:10:21.170
must it necessarily
have small doubling?

00:10:24.922 --> 00:10:27.015
Oh.

00:10:27.015 --> 00:10:28.140
Let me show you an example.

00:10:30.760 --> 00:10:38.320
So, well-- so a large
additive energy,

00:10:38.320 --> 00:10:45.070
does it imply small doubling?

00:10:47.680 --> 00:10:50.730
So consider the
following example, where

00:10:50.730 --> 00:10:53.610
you take a set A which
is a combination,

00:10:53.610 --> 00:10:56.970
is a union of a set
with small doubling

00:10:56.970 --> 00:11:03.874
plus a bunch of elements
without additive structure.

00:11:10.170 --> 00:11:12.230
So I take a set
with small doubling

00:11:12.230 --> 00:11:16.940
plus a bunch of elements
without additive structure.

00:11:16.940 --> 00:11:21.190
Then it has large additive
energy, just coming

00:11:21.190 --> 00:11:25.120
from this interval itself.

00:11:25.120 --> 00:11:31.990
So the energy of A
is order N cubed.

00:11:31.990 --> 00:11:34.630
N is the number of elements.

00:11:34.630 --> 00:11:38.320
What about A plus A?

00:11:38.320 --> 00:11:41.650
Well, for A plus A,
this part doesn't--

00:11:41.650 --> 00:11:43.120
that's the part
that contributes,

00:11:43.120 --> 00:11:48.270
or the part of this A
without additive structure.

00:11:48.270 --> 00:11:52.250
And we see that the
size of A plus A

00:11:52.250 --> 00:11:56.830
is quadratic in the size of A.

00:11:56.830 --> 00:12:00.470
So, unfortunately,
the converse fails.

00:12:00.470 --> 00:12:05.480
So you can have sets that have
large additive energy and also

00:12:05.480 --> 00:12:07.590
large doubling.
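
NOTE
A numerical sketch of this counterexample, with concrete choices made here
for illustration only: an interval of n integers as the small-doubling
piece, and n scaled powers of 2 as the piece without additive structure.
```python
from collections import Counter
def energy(A):
    """E(A) as the sum of squared representation counts."""
    r = Counter(a + b for a in A for b in A)
    return sum(c * c for c in r.values())
def sumset(A):
    """The sumset A + A."""
    return {a + b for a in A for b in A}
n = 20
interval = set(range(n))                          # small doubling piece
scattered = {1000 * 2 ** i for i in range(n)}     # hypothetical unstructured piece
A = interval | scattered
print(len(A), energy(A), len(A) ** 3, len(sumset(A)))
```
The energy stays within a constant factor of |A|^3 because of the interval
alone, while the scattered half forces |A + A| to grow quadratically.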

00:12:07.590 --> 00:12:11.100
But, you see, the reason why
this has large additive energy

00:12:11.100 --> 00:12:14.110
is because there is a very
highly additively

00:12:14.110 --> 00:12:15.990
structured piece of it.

00:12:15.990 --> 00:12:20.290
And, somehow, we want to forget
about this extra garbage.

00:12:20.290 --> 00:12:24.790
And that's part of the reason
why the converse is not true.

00:12:24.790 --> 00:12:26.820
So we would like
a statement that

00:12:26.820 --> 00:12:30.150
says that if you have
large additive energy, then

00:12:30.150 --> 00:12:33.750
it must come from some
highly structured piece that

00:12:33.750 --> 00:12:36.380
has small doubling.

00:12:36.380 --> 00:12:38.260
And that is true, and
that's the content

00:12:38.260 --> 00:12:40.540
of the Balog-Szemeredi-Gowers
theorem, which

00:12:40.540 --> 00:12:43.660
is the main topic today.

00:12:43.660 --> 00:12:51.890
So the Balog-Szemeredi-Gowers
theorem says that if you have

00:12:51.890 --> 00:12:52.760
a set--

00:12:52.760 --> 00:12:56.060
so we're working always in
some arbitrary Abelian group.

00:12:56.060 --> 00:13:00.650
If you have a set
with large energy,

00:13:00.650 --> 00:13:07.280
then there exists some
subset A prime of A such

00:13:07.280 --> 00:13:13.460
that A prime is a fairly
large proportion of A.

00:13:13.460 --> 00:13:18.260
And here, by large I mean up to
polynomial changes in the error

00:13:18.260 --> 00:13:18.920
parameters.

00:13:22.010 --> 00:13:28.310
So this A prime is such that
A prime has small doubling.

00:13:34.340 --> 00:13:36.760
If you have large
additive energy,

00:13:36.760 --> 00:13:40.960
then I can pick out a large
piece with small doubling

00:13:40.960 --> 00:13:44.200
constant, and I only
lose a polynomial

00:13:44.200 --> 00:13:45.775
in the error factors.
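
NOTE
A sketch of the statement in symbols. The transcript does not speak the
exact thresholds, so the forms below (energy at least |A|^3 / K, and
polynomial losses written as K^{O(1)}) are the standard way to phrase it
and should be read as assumptions. K stands for the spoken "k".
```latex
\[
E(A) \ge \frac{|A|^{3}}{K}
\quad\Longrightarrow\quad
\exists\, A' \subseteq A:\quad
|A'| \ge K^{-O(1)}\,|A|
\quad\text{and}\quad
|A' + A'| \le K^{O(1)}\,|A'|.
\]
```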

00:13:48.200 --> 00:13:50.075
So that's the
Balog-Szemeredi-Gowers theorem,

00:13:50.075 --> 00:13:56.420
and it describes
this example up here.

00:13:56.420 --> 00:13:58.058
Any questions about
the statement?

00:14:01.200 --> 00:14:04.970
So what I will actually show you
is a slight variant, actually

00:14:04.970 --> 00:14:08.782
a more general statement, where,
instead of having one set,

00:14:08.782 --> 00:14:09.990
we're going to have two sets.

00:14:12.720 --> 00:14:16.880
So here's Balog-Szemeredi-Gowers
theorem version

00:14:16.880 --> 00:14:25.460
2, where now we have two sets.

00:14:25.460 --> 00:14:27.050
Again, A and B are--

00:14:27.050 --> 00:14:28.640
I'm not going to write any--

00:14:28.640 --> 00:14:29.690
I'm not going to write
it in this lecture,

00:14:29.690 --> 00:14:32.210
but A and B are always subsets
of some arbitrary Abelian

00:14:32.210 --> 00:14:32.710
group.

00:14:32.710 --> 00:14:34.790
So A and B both have
size of, at most,

00:14:34.790 --> 00:14:40.325
n, and the energy
between A and B is large.

00:14:44.670 --> 00:14:53.580
Then there exists a subset A
prime of A, B prime of B such

00:14:53.580 --> 00:14:59.820
that both A prime
and B prime are

00:14:59.820 --> 00:15:05.250
large fractions of
their parent set,

00:15:05.250 --> 00:15:14.070
and such that A prime
plus B prime is not

00:15:14.070 --> 00:15:15.790
too much bigger than n.
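
NOTE
The two-set version in symbols, as a sketch. The hypothesis
E(A,B) >= n^3 / K is inferred from the later part of the argument rather
than spoken here, and each K^{O(1)} denotes some fixed power of K (possibly
a different one at each occurrence).
```latex
\[
|A|,\,|B| \le n,\quad E(A,B) \ge \frac{n^{3}}{K}
\quad\Longrightarrow\quad
\exists\, A' \subseteq A,\ B' \subseteq B:\quad
|A'|,\,|B'| \ge K^{-O(1)}\,n
\quad\text{and}\quad
|A' + B'| \le K^{O(1)}\,n.
\]
```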

00:15:21.170 --> 00:15:24.150
It's not so obvious
why the second version

00:15:24.150 --> 00:15:26.020
implies the first version.

00:15:26.020 --> 00:15:29.030
So you can say, well, take
A and B to be the same.

00:15:29.030 --> 00:15:31.580
But then the
conclusion gives you

00:15:31.580 --> 00:15:36.200
possibly two different
subsets, A prime and B prime.

00:15:36.200 --> 00:15:39.980
But the first version,
I only want one subset

00:15:39.980 --> 00:15:43.320
that has small doubling.

00:15:43.320 --> 00:15:45.270
So, fortunately,
the second version

00:15:45.270 --> 00:15:47.782
does imply the first version.

00:15:47.782 --> 00:15:48.490
So let's see why.

00:15:54.020 --> 00:15:58.610
The second version implies the
first version because, if we--

00:16:03.090 --> 00:16:06.350
so there's a tool
that we introduced

00:16:06.350 --> 00:16:08.630
early on when we discussed
Freiman's theorem,

00:16:08.630 --> 00:16:14.390
and this is the Ruzsa
triangle inequality.

00:16:14.390 --> 00:16:16.410
So the spirit of Ruzsa
triangle inequality

00:16:16.410 --> 00:16:19.680
is it allows you to
relate, to sort of go

00:16:19.680 --> 00:16:23.010
back and forth between different
sumsets in different sets.

00:16:23.010 --> 00:16:31.250
So by Ruzsa triangle inequality,
if we apply the second version

00:16:31.250 --> 00:16:34.535
with A equals to B, then--

00:16:37.750 --> 00:16:40.290
and we pick out this
A prime and B prime,

00:16:40.290 --> 00:16:43.050
then we see that A
prime plus A prime

00:16:43.050 --> 00:16:54.370
is, at most, A prime plus B
prime squared over B prime.

00:16:54.370 --> 00:16:56.290
Well, actually, this uses the--

00:16:56.290 --> 00:16:58.420
I mean, this uses a
slightly stronger version

00:16:58.420 --> 00:17:01.910
that we had to use the
Plunnecke-Ruzsa key lemma

00:17:01.910 --> 00:17:02.650
to prove.

00:17:02.650 --> 00:17:04.089
But you can come up--

00:17:04.089 --> 00:17:06.730
I mean, if you don't care
about the precise loss

00:17:06.730 --> 00:17:09.040
in the polynomial
factors, you can also

00:17:09.040 --> 00:17:10.780
use the basic Ruzsa
triangle inequality

00:17:10.780 --> 00:17:13.270
to deduce a similar statement.

00:17:13.270 --> 00:17:14.920
This is easier to deduce.

00:17:14.920 --> 00:17:16.300
So you have that.

00:17:16.300 --> 00:17:19.990
And now, the second
version tells you

00:17:19.990 --> 00:17:24.150
that the numerator
is, at most, poly k times n,

00:17:24.150 --> 00:17:30.340
and the denominator
is, at most-- at least,

00:17:30.340 --> 00:17:33.170
n divided by poly k.

00:17:33.170 --> 00:17:40.440
Remember, over here, to get this
hypothesis, we automatically

00:17:40.440 --> 00:17:45.830
have that the size
of A and B are not

00:17:45.830 --> 00:17:47.580
too much smaller than n.

00:17:47.580 --> 00:17:50.700
Or else this cannot be true.

00:17:50.700 --> 00:17:58.200
So putting all these estimates
together, we get that.
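
NOTE
Putting the estimates together, as a sketch. The first inequality is the
sum version of the Ruzsa triangle inequality mentioned above, and the
K^{O(1)} factors are the assumed polynomial bounds from the two-set version
(each occurrence may be a different power of K).
```latex
\[
|A' + A'|
\;\le\; \frac{|A' + B'|^{2}}{|B'|}
\;\le\; \frac{\bigl(K^{O(1)}\,n\bigr)^{2}}{K^{-O(1)}\,n}
\;=\; K^{O(1)}\,n
\;\le\; K^{O(1)}\,|A'| ,
\]
```
The last step uses |A'| >= K^{-O(1)} n, so A' alone has small doubling.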

00:17:58.200 --> 00:18:02.705
So these two versions, they
are closely related to each other.

00:18:02.705 --> 00:18:04.080
Second version
implies the first.

00:18:04.080 --> 00:18:06.240
The second one is stronger.

00:18:06.240 --> 00:18:08.292
The first one is
slightly more useful.

00:18:08.292 --> 00:18:09.750
They're not
necessarily equivalent,

00:18:09.750 --> 00:18:13.190
but the second one is stronger.

00:18:13.190 --> 00:18:16.370
Any questions?

00:18:16.370 --> 00:18:17.847
All right.

00:18:17.847 --> 00:18:19.680
So this is the
Balog-Szemeredi-Gowers theorem.

00:18:19.680 --> 00:18:21.570
So the content of
today's lecture

00:18:21.570 --> 00:18:24.930
is to show you how to
prove this theorem.

00:18:24.930 --> 00:18:26.940
A remark about the
naming of this theorem.

00:18:26.940 --> 00:18:29.100
So you might notice that
these three letters do not

00:18:29.100 --> 00:18:31.738
come in alphabetical order.

00:18:31.738 --> 00:18:33.780
And the reason is that
this theorem was initially

00:18:33.780 --> 00:18:37.380
proved by Balog and
Szemeredi, but using

00:18:37.380 --> 00:18:40.890
a more involved
method that didn't

00:18:40.890 --> 00:18:43.470
give polynomial-type bounds.

00:18:43.470 --> 00:18:47.190
And Gowers, in his proof
of Szemeredi's theorem,

00:18:47.190 --> 00:18:49.530
his new proof of Szemeredi's
theorem with good bounds,

00:18:49.530 --> 00:18:51.030
he required--

00:18:51.030 --> 00:18:52.470
well, he looked
into this theorem

00:18:52.470 --> 00:18:54.930
and gave a new
proof that resulted

00:18:54.930 --> 00:18:56.990
in these polynomial-type bounds.

00:18:56.990 --> 00:18:59.700
And it is that idea that
we're going to see today.

00:19:09.567 --> 00:19:11.650
So this course is called
graph theory and additive

00:19:11.650 --> 00:19:12.850
combinatorics.

00:19:12.850 --> 00:19:15.490
And the last two
topics of this course--

00:19:15.490 --> 00:19:17.680
today being
Balog-Szemeredi-Gowers,

00:19:17.680 --> 00:19:20.740
and tomorrow we're going to
see the sum-product problem--

00:19:20.740 --> 00:19:23.500
are both great
examples of problems

00:19:23.500 --> 00:19:26.740
in additive combinatorics
where tools from graph theory

00:19:26.740 --> 00:19:29.960
play an important role
in their solutions.

00:19:29.960 --> 00:19:33.910
So it's a nice combination
of the subject where we see

00:19:33.910 --> 00:19:36.350
both topics at the same time.

00:19:36.350 --> 00:19:39.190
So I want to show you the proof
of Balog-Szemeredi-Gowers,

00:19:39.190 --> 00:19:41.890
and the proof goes
via a graph analog.

00:19:41.890 --> 00:19:44.860
So I'm going to state for
you a graphical version

00:19:44.860 --> 00:19:48.960
of the Balog-Szemeredi-Gowers
theorem.

00:19:48.960 --> 00:19:50.520
And it goes like this.

00:19:50.520 --> 00:20:02.860
If G is a bipartite graph
between vertex sets A and B--

00:20:02.860 --> 00:20:06.140
and here A and B are still
subsets of the Abelian group--

00:20:17.740 --> 00:20:24.910
we define this restricted
sumset, A plus sub G of B,

00:20:24.910 --> 00:20:33.820
to be the set of
sums where I'm only

00:20:33.820 --> 00:20:40.256
taking sums across edges in G.

00:20:44.540 --> 00:20:47.500
So, in particular, if G is
the complete bipartite graph,

00:20:47.500 --> 00:20:50.100
then this is the usual sumset.

00:20:50.100 --> 00:20:54.390
But now I may allow
G to be a subgraph

00:20:54.390 --> 00:20:56.220
of the complete bipartite graph.

00:20:56.220 --> 00:20:59.430
So only taking some
but not all of the--

00:20:59.430 --> 00:21:01.920
only taking-- yes, some of
these sums but not all of them.
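
NOTE
A minimal sketch of the restricted sumset, assuming integer sets and a graph
given as a set of (a, b) edge pairs. The name restricted_sumset and the
example graphs are hypothetical.
```python
def restricted_sumset(edges):
    """A +_G B: sums a + b taken only over edges (a, b) of the bipartite graph G."""
    return {a + b for a, b in edges}
A, B = {0, 1, 2}, {0, 10}
complete = {(a, b) for a in A for b in B}        # complete bipartite graph
some_edges = {(0, 0), (1, 0), (2, 10)}           # a hypothetical sparser graph
print(sorted(restricted_sumset(complete)))       # the usual sumset A + B
print(sorted(restricted_sumset(some_edges)))
```
With the complete bipartite graph this recovers A + B, and any subgraph
gives a subset of it.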

00:21:04.600 --> 00:21:12.940
The graphical version of
Balog-Szemeredi-Gowers

00:21:12.940 --> 00:21:23.290
says that if you have A and B
as subsets of an Abelian group,

00:21:23.290 --> 00:21:27.610
both having size,
at most, n, and G

00:21:27.610 --> 00:21:35.770
is a bipartite graph
between A and B,

00:21:35.770 --> 00:21:42.460
such that G has lots of
edges, has at least n squared

00:21:42.460 --> 00:21:43.570
over k edges.

00:21:47.290 --> 00:21:56.090
If the restricted sumset
between A and B is small--

00:21:56.090 --> 00:22:02.470
So here we're not looking at all
the sums but a large fraction

00:22:02.470 --> 00:22:04.540
of the possible pairwise sums.

00:22:04.540 --> 00:22:07.040
If that sumset has
small size, this

00:22:07.040 --> 00:22:10.210
is kind of like a restricted
doubling constant.

00:22:10.210 --> 00:22:16.090
Then there exists
A prime, subset

00:22:16.090 --> 00:22:26.530
of A, B prime, subset of
B, with A prime and B prime

00:22:26.530 --> 00:22:32.830
both fairly large fractions
of their parent set,

00:22:32.830 --> 00:22:36.970
and such that the unrestricted
sumset between A prime and B

00:22:36.970 --> 00:22:40.270
prime is not too large.

00:22:48.020 --> 00:22:50.390
So let me say it again.

00:22:50.390 --> 00:22:52.760
So we have a fairly dense--

00:22:52.760 --> 00:22:55.180
so a constant
fraction edge density,

00:22:55.180 --> 00:22:59.480
a fairly dense bipartite
graph between A and B. A and B

00:22:59.480 --> 00:23:02.660
are subsets of
the Abelian group.

00:23:02.660 --> 00:23:08.700
Then-- and such that the
restricted sumset is small.

00:23:08.700 --> 00:23:14.840
Then I can restrict A and B to
subsets, fairly large subsets,

00:23:14.840 --> 00:23:19.070
so that the complete sumset
between the subsets A prime

00:23:19.070 --> 00:23:21.170
and B prime is small.
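
NOTE
The graphical version in symbols, as a sketch. The edge bound n^2 / K is
spoken above. The bound K n on the restricted sumset and the K^{O(1)}
losses are the standard forms and are assumptions here, since the
transcript only says "small" and "not too large".
```latex
\[
|A|,\,|B| \le n,\quad e(G) \ge \frac{n^{2}}{K},\quad |A +_{G} B| \le K n
\quad\Longrightarrow\quad
\exists\, A' \subseteq A,\ B' \subseteq B:\quad
|A'|,\,|B'| \ge K^{-O(1)}\,n
\quad\text{and}\quad
|A' + B'| \le K^{O(1)}\,n.
\]
```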

00:23:26.180 --> 00:23:29.720
Let me show you why the
graphical version of BSG

00:23:29.720 --> 00:23:33.270
implies the version of
BSG I stated up there.

00:23:50.630 --> 00:23:54.511
But, so why do we care about
this graphical version?

00:23:54.511 --> 00:23:59.530
Well, suppose we-- so we
have all of these hypotheses.

00:23:59.530 --> 00:24:08.030
Let's write-- so we have all
of those hypotheses up there.

00:24:08.030 --> 00:24:11.938
So let's write r
to be r sub A comma

00:24:11.938 --> 00:24:16.930
B, so I don't have to carry
the subscripts all around.

00:24:16.930 --> 00:24:17.920
What do you think--

00:24:17.920 --> 00:24:20.760
so I start with
A and B up there,

00:24:20.760 --> 00:24:24.660
and I need to
construct that graph G.

00:24:24.660 --> 00:24:26.340
So what should we
choose as our graph?

00:24:30.940 --> 00:24:34.460
Let's consider the popular sums.

00:24:40.370 --> 00:24:44.900
So the popular sums are
going to be elements

00:24:44.900 --> 00:24:50.390
in the complete
sumset such that it

00:24:50.390 --> 00:24:54.530
is represented as a sum
in many different ways.

00:25:02.760 --> 00:25:07.840
And we're going to take
edges that correspond

00:25:07.840 --> 00:25:12.760
to these popular sums.

00:25:12.760 --> 00:25:26.670
So let's consider
bipartite graph G such

00:25:26.670 --> 00:25:39.770
that little a comma little b is an edge if and
only if a plus b is a popular sum.
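
NOTE
A sketch of this construction on integer sets. It takes "popular" to mean
at least n / (2K) representations, the threshold used in the estimates that
follow. The function name and the example sets are hypothetical.
```python
from collections import Counter
def popular_sum_graph(A, B, n, K):
    """Edges (a, b) whose sum has at least n / (2K) representations."""
    r = Counter(a + b for a in A for b in B)
    popular = {x for x, count in r.items() if 2 * K * count >= n}
    edges = {(a, b) for a in A for b in B if a + b in popular}
    return popular, edges
A = B = set(range(10))
popular, edges = popular_sum_graph(A, B, n=10, K=2)
print(sorted(popular), len(edges))
```
In this example the energy is at least n^3 / K, and the output is
consistent with the two bounds derived next: at most 2Kn popular sums and
at least n^2 / (2K) edges.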

00:25:46.900 --> 00:25:50.390
So let's verify some
of the hypotheses.

00:25:50.390 --> 00:25:53.110
So we're going to
assume graph BSG,

00:25:53.110 --> 00:25:57.340
and let's verify the
hypothesis in graph BSG.

00:25:57.340 --> 00:25:59.500
On one hand, because
each element of S

00:25:59.500 --> 00:26:05.170
is a popular sum, if we
consider its multiplicity,

00:26:05.170 --> 00:26:13.750
we find that the size of S
multiplied by n over 2k is a lower

00:26:13.750 --> 00:26:19.245
bound for the size of A
times the size of B.

00:26:19.245 --> 00:26:26.750
So if you think about all the
different pairs in A and B,

00:26:26.750 --> 00:26:31.780
each sum here, each popular
sum, contributes this many times

00:26:31.780 --> 00:26:36.330
to this A cross B.

00:26:36.330 --> 00:26:41.370
So, as a result, because
size of A and size of B

00:26:41.370 --> 00:26:44.880
are both, at most, n, we
find that the size of S

00:26:44.880 --> 00:26:46.180
is, at most, 2kn.

00:26:49.382 --> 00:26:51.780
And if you think
about what G is,

00:26:51.780 --> 00:27:00.840
then this implies also that the
restricted sumset of A and B

00:27:00.840 --> 00:27:02.310
across this graph G--

00:27:02.310 --> 00:27:04.080
which only uses
the popular sums.

00:27:04.080 --> 00:27:10.718
So the restricted sumset is
precisely the popular sums.

00:27:10.718 --> 00:27:13.660
So restricted sumset
is not too large.
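
NOTE
The counting step in symbols, as a sketch. S is the set of popular sums,
r = r_{A,B}, and K stands for the spoken "k". Each x in S has
r(x) >= n / (2K), and the restricted sumset equals S by construction.
```latex
\[
|S| \cdot \frac{n}{2K}
\;\le\; \sum_{x \in S} r(x)
\;\le\; |A|\,|B|
\;\le\; n^{2}
\quad\Longrightarrow\quad
|A +_{G} B| \;=\; |S| \;\le\; 2Kn.
\]
```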

00:27:18.930 --> 00:27:19.900
OK, good.

00:27:19.900 --> 00:27:24.020
So we got one of the conditions,
that the restricted sumset

00:27:24.020 --> 00:27:25.910
is not too large.

00:27:25.910 --> 00:27:30.150
And now we want to show that
this graph has lots of edges.

00:27:30.150 --> 00:27:31.360
It has lots of edges.

00:27:36.210 --> 00:27:39.120
And here's where we would need
to use the hypothesis that,

00:27:39.120 --> 00:27:44.166
between A and B, originally
there is large additive energy.

00:27:44.166 --> 00:27:49.980
And the point here is that
these unpopular sums cannot

00:27:49.980 --> 00:27:55.140
contribute very much to the
additive energy in total,

00:27:55.140 --> 00:27:58.240
because each one of
them is unpopular.

00:27:58.240 --> 00:28:01.960
So the dominant contributions
to the additive energy

00:28:01.960 --> 00:28:05.280
are going to come
from the popular sums,

00:28:05.280 --> 00:28:08.910
and we're going to use that to
show that G has lots of edges.

00:28:12.660 --> 00:28:16.980
So let's lower bound the number
of edges of G by first showing

00:28:16.980 --> 00:28:18.820
that--

00:28:18.820 --> 00:28:35.030
so we'll show that the unpopular
sums contribute very little

00:28:35.030 --> 00:28:42.210
to the additive energy
between A and B. Indeed,

00:28:42.210 --> 00:28:49.860
the sums of the squares
of the r's, for x

00:28:49.860 --> 00:28:58.130
not in popular sums,
is upper bounded by--

00:28:58.130 --> 00:29:01.170
well, claim that
it is upper bounded

00:29:01.170 --> 00:29:09.910
by the following quantity,
that n over 2k times n squared.

00:29:14.520 --> 00:29:19.470
Because I can take
out one factor r,

00:29:19.470 --> 00:29:24.180
upper bound by this
number, just by definition,

00:29:24.180 --> 00:29:27.820
and the sums of the
r's is n squared.

00:29:32.540 --> 00:29:39.150
So you have this additive
energy between A and B.

00:29:39.150 --> 00:29:41.190
I know that it is
large by hypothesis.

00:29:45.940 --> 00:29:48.310
Whereas, I also know
that I can write it

00:29:48.310 --> 00:29:52.570
as a sum of the squares
of the r's, which

00:29:52.570 --> 00:30:00.550
I can break into the
popular contributions

00:30:00.550 --> 00:30:02.530
and the unpopular contributions.

00:30:05.533 --> 00:30:06.950
And, hopefully,
this should all be

00:30:06.950 --> 00:30:09.470
somewhat reminiscent of
basically all these proofs

00:30:09.470 --> 00:30:11.200
that we did so far
in this course,

00:30:11.200 --> 00:30:14.750
where we separate a sum
into the dominant terms

00:30:14.750 --> 00:30:16.510
and the minor terms.

00:30:16.510 --> 00:30:20.010
This came up in Fourier
analysis in particular.

00:30:20.010 --> 00:30:24.320
So we do this
splitting, and we upper

00:30:24.320 --> 00:30:28.820
bound the unpopular
contributions by the estimate

00:30:28.820 --> 00:30:29.890
from just now.

00:30:36.810 --> 00:30:40.800
So, as a result, subtracting
this small error term,

00:30:40.800 --> 00:30:44.610
it doesn't cancel
much of the energy.

00:30:44.610 --> 00:30:52.350
So we still have a lower bound
on the sum of the squares

00:30:52.350 --> 00:30:56.590
of the r's in the popular sums.

00:31:00.010 --> 00:31:04.240
But I can also give a fairly
trivial upper bound to a single

00:31:04.240 --> 00:31:08.050
r, namely it cannot
be bigger than n.

00:31:16.220 --> 00:31:23.860
And so the number
of edges of G--

00:31:23.860 --> 00:31:27.480
so what's the number
of edges of G?

00:31:27.480 --> 00:31:28.260
Look at that.

00:31:28.260 --> 00:31:33.470
Each x here contributes
r of x many edges.

00:31:33.470 --> 00:31:36.750
So the number of edges of G is
simply the sum of r of x over popular x.

00:31:41.310 --> 00:31:42.690
Which is quite large.
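
NOTE
The edge count written out, as a sketch. It assumes the energy hypothesis
has the form E(A,B) >= n^3 / K, with S the set of popular sums (those with
r(x) >= n / (2K)) and r = r_{A,B}.
```latex
\begin{align*}
\sum_{x \notin S} r(x)^{2}
  &\le \frac{n}{2K} \sum_{x} r(x)
   = \frac{n}{2K}\,|A|\,|B|
   \le \frac{n^{3}}{2K}, \\
\sum_{x \in S} r(x)^{2}
  &\ge E(A,B) - \frac{n^{3}}{2K}
   \ge \frac{n^{3}}{K} - \frac{n^{3}}{2K}
   = \frac{n^{3}}{2K}, \\
e(G) = \sum_{x \in S} r(x)
  &\ge \frac{1}{n} \sum_{x \in S} r(x)^{2}
   \ge \frac{n^{2}}{2K}.
\end{align*}
```
The last line uses the trivial bound r(x) <= n. So graph BSG applies with
2K in place of K.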

00:31:49.740 --> 00:31:56.070
So the hypotheses of
graph BSG are satisfied.

00:31:56.070 --> 00:31:59.850
And so we can use the
conclusion of graph BSG, which

00:31:59.850 --> 00:32:02.730
is the conclusion that
we're looking for in BSG.

00:32:11.520 --> 00:32:12.532
Any questions?

00:32:17.095 --> 00:32:17.595
Good.

00:32:17.595 --> 00:32:19.860
So the remaining
task is to prove

00:32:19.860 --> 00:32:23.160
the graphical version of BSG.

00:32:23.160 --> 00:32:26.040
So let's take a
quick break, and when

00:32:26.040 --> 00:32:30.030
we come back we'll
focus on this theorem,

00:32:30.030 --> 00:32:35.140
and it has some nice
graph theoretic arguments.

00:32:35.140 --> 00:32:37.430
OK, let's continue.

00:32:37.430 --> 00:32:42.230
We've reduced the proof of the
Balog-Szemeredi-Gowers theorem

00:32:42.230 --> 00:32:44.540
to the following
graphical result.

00:32:44.540 --> 00:32:46.170
Well, it's not just
graphical, right?

00:32:46.170 --> 00:32:49.370
Still-- we're still inside
some Abelian group,

00:32:49.370 --> 00:32:52.570
still looking at some set
in some Abelian group,

00:32:52.570 --> 00:32:57.140
but, certainly, now it has
a graph attached to it.

00:32:57.140 --> 00:33:01.410
Let me show this theorem
through several steps.

00:33:01.410 --> 00:33:04.700
First, something called
a path of length 2 lemma.

00:33:15.502 --> 00:33:17.860
So the path of length
2 lemma, the statement

00:33:17.860 --> 00:33:21.340
is that you start
with a graph G which

00:33:21.340 --> 00:33:27.130
is a bipartite graph
between vertex sets A and B.

00:33:27.130 --> 00:33:29.050
And now A and B no longer need--

00:33:29.050 --> 00:33:30.100
they're just sets.

00:33:30.100 --> 00:33:31.150
They're just vertex sets.

00:33:31.150 --> 00:33:34.550
We're not going to have sums.

00:33:34.550 --> 00:33:38.175
And the number of edges is
at least a constant fraction

00:33:38.175 --> 00:33:39.175
of the maximum possible.

00:33:45.570 --> 00:33:48.620
Then the conclusion
is that there

00:33:48.620 --> 00:33:55.385
exists some U, a subset of A,
such that U is fairly large.

00:33:59.650 --> 00:34:10.199
And between most pairs
of elements of U--

00:34:10.199 --> 00:34:24.880
so between 1 minus epsilon
fraction of pairs of U--

00:34:24.880 --> 00:34:30.840
there are lots of
common neighbors.

00:34:30.840 --> 00:34:36.650
So at least epsilon
delta squared

00:34:36.650 --> 00:34:46.230
B over 2 common neighbors.

00:34:46.230 --> 00:34:58.550
So you start with this bipartite
graph A and B. Lots of edges.

00:34:58.550 --> 00:35:01.520
And we would like
to show that there

00:35:01.520 --> 00:35:08.840
exists a pretty large subset U
such that between most pairs--

00:35:08.840 --> 00:35:11.150
all but an epsilon fraction--

00:35:11.150 --> 00:35:12.980
of ordered pairs--
they could be the same,

00:35:12.980 --> 00:35:15.350
but it doesn't really matter--

00:35:15.350 --> 00:35:22.600
the number of paths of length
2 between these two vertices

00:35:22.600 --> 00:35:24.920
is quite large.

00:35:24.920 --> 00:35:28.430
So they have lots
of common neighbors.

00:35:28.430 --> 00:35:30.440
Where have we seen
something like this before?

00:35:30.440 --> 00:35:30.890
There's a question?

00:35:30.890 --> 00:35:32.694
AUDIENCE: Is there a
[INAUDIBLE] epsilon?

00:35:32.694 --> 00:35:33.600
YUFEI ZHAO: Ah, yes.

00:35:33.600 --> 00:35:37.156
So for every epsilon
and every delta.

00:35:37.156 --> 00:35:43.634
So let epsilon,
delta be parameters.
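
NOTE
For reference, the path of length 2 lemma as spoken above can be written out as follows. This is a reconstruction, not the board statement: the size bound on U is the one reached at the end of the proof, and e(G) for the number of edges is notation assumed here.
```latex
\textbf{Lemma (paths of length 2).} Let $\delta, \varepsilon > 0$, and let $G$ be a
bipartite graph between vertex sets $A$ and $B$ with $e(G) \ge \delta |A||B|$.
Then there exists $U \subseteq A$ with $|U| \ge \delta |A| / 2$ such that at least a
$(1 - \varepsilon)$-fraction of the ordered pairs $(a, a') \in U \times U$ have at
least $\varepsilon \delta^2 |B| / 2$ common neighbors in $B$.
```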

00:35:48.080 --> 00:35:51.860
Where have we seen
something like this before?

00:35:51.860 --> 00:35:54.620
So in a bipartite graph
with lots of edges,

00:35:54.620 --> 00:35:59.180
I want to find a large
subset of one of the parts

00:35:59.180 --> 00:36:02.270
so that every pair of
elements, or almost

00:36:02.270 --> 00:36:05.842
every pair of elements, have
lots of common neighbors.

00:36:11.254 --> 00:36:12.238
Yes.

00:36:12.238 --> 00:36:13.570
AUDIENCE: [INAUDIBLE].

00:36:13.570 --> 00:36:15.070
YUFEI ZHAO: Dependent
random choice.

00:36:15.070 --> 00:36:17.150
So in the very first
chapter of this course,

00:36:17.150 --> 00:36:19.340
when we did extremal
graph theory

00:36:19.340 --> 00:36:21.530
forbidding bipartite
subgraphs, there

00:36:21.530 --> 00:36:26.900
was a technique for proving the
extremal number, upper bounds,

00:36:26.900 --> 00:36:29.540
for bipartite graphs
of bounded degree.

00:36:29.540 --> 00:36:32.810
And there we used something
called dependent random choice

00:36:32.810 --> 00:36:35.860
that had a conclusion of
a very similar flavor.

00:36:35.860 --> 00:36:39.848
So there, we had every pair--
so a fairly large, but not as

00:36:39.848 --> 00:36:41.390
large as this-- a
fairly large subset

00:36:41.390 --> 00:36:45.640
where every pair of elements
had lots of common neighbors.

00:36:45.640 --> 00:36:48.230
For every tuple, every
k-tuple of vertices,

00:36:48.230 --> 00:36:50.330
have lots of common neighbors.

00:36:50.330 --> 00:36:51.470
So it's very similar.

00:36:51.470 --> 00:36:53.960
In fact, it's the
same type of technique

00:36:53.960 --> 00:36:56.480
that we'll use to prove
this lemma over here.

00:37:00.390 --> 00:37:05.030
So who remembers how
dependent random choice goes?

00:37:05.030 --> 00:37:09.120
So the idea is that we
are going to choose U

00:37:09.120 --> 00:37:11.200
not uniformly at random.

00:37:11.200 --> 00:37:12.872
So that's not going to work.

00:37:12.872 --> 00:37:15.860
Going to choose it in
a dependent random way.

00:37:15.860 --> 00:37:19.630
So I want elements of U to
have lots of common neighbors,

00:37:19.630 --> 00:37:20.720
typically.

00:37:20.720 --> 00:37:24.950
So one way to guarantee
this is to choose U to be

00:37:24.950 --> 00:37:28.550
a neighborhood from the right.

00:37:28.550 --> 00:37:33.640
So pick a random
element in B and choose

00:37:33.640 --> 00:37:37.300
U to be its neighborhood.

00:37:37.300 --> 00:37:39.230
So let's do that.

00:37:39.230 --> 00:37:41.210
So we're going to use
dependent random choice.

00:37:47.505 --> 00:37:49.380
See, everything in the
course comes together.

00:37:56.000 --> 00:38:04.480
So let's pick v an element
of B uniformly at random.

00:38:09.580 --> 00:38:15.440
And let U be the neighborhood
of v. So, first of all,

00:38:15.440 --> 00:38:19.100
by linearity of
expectations, the expected size of U

00:38:19.100 --> 00:38:27.600
is at least delta times the size of A. So
because the average degree

00:38:27.600 --> 00:38:32.472
from the right from B is
at least delta A just based

00:38:32.472 --> 00:38:33.430
on the number of edges.

00:38:36.550 --> 00:38:43.560
If you have two vertices
a and a prime in A

00:38:43.560 --> 00:39:04.220
with a small number of common
neighbors, then the size of--

00:39:04.220 --> 00:39:07.020
so sorry.

00:39:07.020 --> 00:39:10.520
Let me-- I skipped ahead a bit.

00:39:10.520 --> 00:39:15.170
So if a and a prime have a small
number of common neighbors,

00:39:15.170 --> 00:39:22.730
then the probability that
a and a prime both lie in U

00:39:22.730 --> 00:39:25.580
should be quite small.

00:39:25.580 --> 00:39:28.670
Because if they both had--

00:39:28.670 --> 00:39:33.040
if a and a prime have a small
number of common neighbors,

00:39:33.040 --> 00:39:36.863
in order for a and a prime
to be included in this U,

00:39:36.863 --> 00:39:37.780
you must have chosen--

00:39:41.550 --> 00:39:44.920
so suppose this were
their common neighbor.

00:39:44.920 --> 00:39:49.550
Then in order that a and
a prime be contained in U,

00:39:49.550 --> 00:39:54.470
you must have chosen this v to be
inside the common neighborhood

00:39:54.470 --> 00:39:55.300
of a and a prime.

00:39:57.840 --> 00:39:59.940
Which is unlikely
if a and a prime

00:39:59.940 --> 00:40:02.940
had a small number
of common neighbors.

00:40:02.940 --> 00:40:08.600
So this probability is, at most,
epsilon delta squared over 2.

00:40:12.460 --> 00:40:15.520
Just think about how
U is constructed.

00:40:15.520 --> 00:40:26.740
So if we let x be the number
of pairs a, a prime in U cross U

00:40:26.740 --> 00:40:34.800
with, at most, epsilon
delta squared over 2 times B

00:40:34.800 --> 00:40:42.900
common neighbors, then, by
linearity of expectations,

00:40:42.900 --> 00:40:46.300
the expectation of x is--

00:40:46.300 --> 00:40:54.420
well, by summing up all of these
probabilities of a and a prime,

00:40:54.420 --> 00:40:56.890
both being in U--

00:40:56.890 --> 00:41:02.010
so this is, at most,
epsilon delta squared

00:41:02.010 --> 00:41:04.740
over 2 times size of A squared.

00:41:08.250 --> 00:41:12.030
So, typically, at
least in expectation,

00:41:12.030 --> 00:41:16.260
you do not expect very
many pairs of elements in U

00:41:16.260 --> 00:41:20.510
with few common neighbors.

00:41:20.510 --> 00:41:22.940
But we can also turn
such an estimate

00:41:22.940 --> 00:41:24.830
into a specific instance.

00:41:28.015 --> 00:41:33.840
And the way to do this is to
consider the quantity size of U

00:41:33.840 --> 00:41:39.280
squared minus x over epsilon.

00:41:39.280 --> 00:41:43.070
Well, first of all, we can
lower bound this quantity,

00:41:43.070 --> 00:41:47.630
because the second
moment of the size of U

00:41:47.630 --> 00:41:53.030
is at least the square
of its first moment.

00:41:53.030 --> 00:42:02.450
And we also know that the
size of x in expectation

00:42:02.450 --> 00:42:04.830
is not very large.

00:42:04.830 --> 00:42:07.700
So the whole expression
can be lower bounded

00:42:07.700 --> 00:42:15.910
by delta squared over 2
times the size of A squared.

00:42:25.630 --> 00:42:26.935
So this is epsilon, sorry.

00:42:30.120 --> 00:42:34.880
Therefore, there is
some concrete instance

00:42:34.880 --> 00:42:39.110
of this randomness resulting
in some specific U such

00:42:39.110 --> 00:42:41.180
that this inequality holds.

00:42:41.180 --> 00:42:54.330
So there exists some U such
that this inequality holds.

00:42:54.330 --> 00:43:03.310
And, in particular, we find
that the size of U is at least--

00:43:03.310 --> 00:43:05.110
just forget about
this minus term--

00:43:05.110 --> 00:43:08.950
is at least that right-hand
side, square root.

00:43:08.950 --> 00:43:11.500
So, in particular, the
size of U is at least

00:43:11.500 --> 00:43:14.415
delta over 2 times
the size of A.

00:43:14.415 --> 00:43:18.307
And, just looking at the
left-hand side, which

00:43:18.307 --> 00:43:20.890
must be a non-negative quantity
because the right-hand side is

00:43:20.890 --> 00:43:26.800
non-negative, we find that
x is, at most, an epsilon

00:43:26.800 --> 00:43:29.786
fraction of U squared.

00:43:34.480 --> 00:43:38.730
So putting these
together, we arrive

00:43:38.730 --> 00:43:42.280
at the path of length 2 lemma.
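
NOTE
The expectation computations in this proof, collected in one place as a sketch. N(a) for the neighborhood of a and e(G) for the number of edges are notation assumed here; X is the number of ordered pairs in U cross U with at most epsilon delta squared |B| / 2 common neighbors.
```latex
\begin{align*}
  \mathbb{E}|U| &= \frac{e(G)}{|B|} \ge \delta |A|, \\
  \Pr[a, a' \in U] &= \frac{|N(a) \cap N(a')|}{|B|} \le \frac{\varepsilon \delta^2}{2}
     \quad \text{for a pair with few common neighbors}, \\
  \mathbb{E} X &\le \frac{\varepsilon \delta^2}{2} |A|^2, \\
  \mathbb{E}\Bigl[ |U|^2 - \frac{X}{\varepsilon} \Bigr]
     &\ge (\mathbb{E}|U|)^2 - \frac{\mathbb{E} X}{\varepsilon}
      \ge \delta^2 |A|^2 - \frac{\delta^2}{2} |A|^2 = \frac{\delta^2}{2} |A|^2 .
\end{align*}
```
For a specific U with $|U|^2 - X/\varepsilon \ge \delta^2 |A|^2 / 2$, dropping the X term gives $|U| \ge \delta |A| / \sqrt{2} \ge \delta |A| / 2$, and non-negativity of the left-hand side gives $X \le \varepsilon |U|^2$.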

00:43:42.280 --> 00:43:43.830
So let me go through it again.

00:43:43.830 --> 00:43:46.100
So this is the dependent
random choice method,

00:43:46.100 --> 00:43:50.480
where we're going to--
we want to find this U,

00:43:50.480 --> 00:43:52.430
where most pairs
of vertices in U

00:43:52.430 --> 00:43:55.920
have lots of common neighbors.

00:43:55.920 --> 00:43:58.640
So we start from the right side.

00:43:58.640 --> 00:44:02.480
We start from B, pick a
uniform random vertex, which

00:44:02.480 --> 00:44:08.598
you call v, and let U be
the neighborhood of v.

00:44:08.598 --> 00:44:11.140
And I claim that this
U, typically, should

00:44:11.140 --> 00:44:13.170
have the desired property.

00:44:13.170 --> 00:44:18.160
And the reason is that, if
you have a pair of vertices

00:44:18.160 --> 00:44:24.030
on the left that do not
have many common neighbors,

00:44:24.030 --> 00:44:27.360
then I claim it is highly
unlikely that these two

00:44:27.360 --> 00:44:33.360
vertices both appear in U.
Because for them to both appear

00:44:33.360 --> 00:44:38.310
in U, your v must have been selected
inside the common neighborhood

00:44:38.310 --> 00:44:42.390
of a and a prime, which is
unlikely if a and a prime

00:44:42.390 --> 00:44:46.550
have few common neighbors.

00:44:46.550 --> 00:44:50.050
So, as a result,
the expected number

00:44:50.050 --> 00:44:58.338
of pairs in U with small number
of common neighbors is small.

00:44:58.338 --> 00:45:00.130
And, already, that's
a very good indication

00:45:00.130 --> 00:45:01.020
that we're on the right track.

00:45:01.020 --> 00:45:02.670
And, to finish
things off, we look

00:45:02.670 --> 00:45:07.830
at this expression, which we
can lower bound by convexity.

00:45:07.830 --> 00:45:10.890
And we know the size of U
in expectation is large.

00:45:10.890 --> 00:45:13.470
And, also, the size of
x, that we just saw,

00:45:13.470 --> 00:45:17.260
is small in expectation.

00:45:17.260 --> 00:45:19.890
So you have this
inequality over here.

00:45:19.890 --> 00:45:21.690
And because there's
an expectation,

00:45:21.690 --> 00:45:25.560
it implies that there's some
specific instance such that,

00:45:25.560 --> 00:45:28.800
without the expectation,
the inequality holds.

00:45:28.800 --> 00:45:30.570
So take that specific instance.

00:45:30.570 --> 00:45:34.740
We obtain some U such that
this inequality is true,

00:45:34.740 --> 00:45:37.800
which simultaneously
implies that U is large

00:45:37.800 --> 00:45:40.910
and x, the number of
bad pairs, is small.

00:45:43.796 --> 00:45:47.850
So that was dependent
random choice.
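
NOTE
The recap above can be turned into a small search, sketched here under stated assumptions. Instead of sampling v at random, this tries every v in B and keeps the neighborhood with the largest value of |U|^2 - X/eps; the maximum is at least the expectation, so the bound from the proof still applies. The function name, the input layout (adj_b as a list of neighbor sets, A taken to be range(size_a)), and the return tuple are all hypothetical choices made for illustration, not something from the lecture.
```python
from itertools import product
def best_neighborhood(adj_b, size_a, eps):
    """Try every v in B and keep U = N(v) maximising |U|^2 - X/eps.
    adj_b[v] is the set of neighbours (inside A = range(size_a)) of vertex v of B.
    X counts ordered pairs in U x U with at most eps*delta^2*|B|/2 common neighbours.
    """
    size_b = len(adj_b)
    edges = sum(len(nbrs) for nbrs in adj_b)
    delta = edges / (size_a * size_b)  # exact edge density of the graph
    threshold = eps * delta ** 2 * size_b / 2
    # codeg[a][a2] = number of common neighbours of a and a2 (codeg[a][a] = deg a).
    codeg = [[0] * size_a for _ in range(size_a)]
    for nbrs in adj_b:
        for a, a2 in product(nbrs, repeat=2):
            codeg[a][a2] += 1
    best = None
    for nbrs in adj_b:
        # Ordered pairs inside N(v) with few common neighbours are the "bad" pairs.
        bad = sum(1 for a, a2 in product(nbrs, repeat=2) if codeg[a][a2] <= threshold)
        score = len(nbrs) ** 2 - bad / eps
        if best is None or score > best[0]:
            best = (score, set(nbrs), bad)
    score, u, bad = best
    return delta, u, bad, score
```
On any input with at least one edge, the returned U should satisfy score >= delta^2 * size_a^2 / 2, which in turn forces |U| >= delta * size_a / 2 and bad <= eps * |U|^2, matching the two conclusions of the lemma.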

00:45:47.850 --> 00:45:48.953
Any questions?

00:45:51.731 --> 00:45:54.046
All right.

00:45:54.046 --> 00:45:56.250
So that was the path
of length 2 lemma.

00:45:56.250 --> 00:45:57.900
So it tells us I
can take a large set

00:45:57.900 --> 00:46:02.590
with lots of paths of length 2
between most pairs of vertices.

00:46:02.590 --> 00:46:07.888
Let's upgrade this lemma to
a path of length 3 lemma.

00:46:18.850 --> 00:46:20.640
So, in the path
of length 3 lemma,

00:46:20.640 --> 00:46:27.378
we start with a bipartite graph,
as before, between A and B.

00:46:27.378 --> 00:46:33.970
So G is a bipartite graph
between A and B.

00:46:33.970 --> 00:46:39.230
And, as before, we have a
lot of edges between A and B.

00:46:39.230 --> 00:46:42.690
It's the delta fraction
of all possible edges.

00:46:42.690 --> 00:46:50.840
Then the conclusion is that
there exists A prime in A and B

00:46:50.840 --> 00:46:57.270
prime subset of B such
that A prime and B

00:46:57.270 --> 00:47:01.560
prime are both large
fractions of their parent set.

00:47:08.070 --> 00:47:15.820
And now, the-- and,
furthermore, every pair

00:47:15.820 --> 00:47:24.390
between A prime and
B prime is joined

00:47:24.390 --> 00:47:28.895
by many paths of length 3.

00:47:36.820 --> 00:47:39.300
So a path of length 3
means there's 3 edges.

00:47:42.610 --> 00:47:50.130
And, here, this eta is basically
the original error term

00:47:50.130 --> 00:47:51.690
up to a polynomial change.
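
NOTE
A written form of the path of length 3 lemma, reconstructed from the spoken statement. The exact shape of eta is an assumption here: the lecture only says it agrees with delta up to a polynomial change, so the constants c and C below are placeholders rather than values from the board.
```latex
\textbf{Lemma (paths of length 3).} Let $G$ be a bipartite graph between $A$ and $B$
with $e(G) \ge \delta |A||B|$. Then there exist $A' \subseteq A$ and $B' \subseteq B$
with $|A'| \ge \eta |A|$ and $|B'| \ge \eta |B|$ such that every pair
$(a, b) \in A' \times B'$ is joined by at least $\eta |A||B|$ paths of length 3,
where $\eta = c\,\delta^{C}$ for absolute constants $c, C > 0$.
```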

00:48:00.270 --> 00:48:05.020
So starting with this bipartite
graph that's fairly dense,

00:48:05.020 --> 00:48:08.500
the lemma tells us
that we can find

00:48:08.500 --> 00:48:13.870
some large A prime
and large B prime so

00:48:13.870 --> 00:48:17.440
that between every vertex in
A prime and every vertex in B

00:48:17.440 --> 00:48:21.960
prime, there are lots of paths
of length 3 between them.

00:48:28.530 --> 00:48:29.263
Every time.

00:48:33.500 --> 00:48:37.215
So we should think about
all of these constants as--

00:48:37.215 --> 00:48:39.830
as long as you only make polynomial
changes in the constants,

00:48:39.830 --> 00:48:42.290
we're happy.

00:48:42.290 --> 00:48:46.560
Here, eta is a polynomial
change in the delta.

00:48:46.560 --> 00:48:49.130
There's a convention which I
like which is not universal,

00:48:49.130 --> 00:48:51.860
but it's often followed,
and I like this convention.

00:48:51.860 --> 00:48:53.930
It's the difference
between the little c

00:48:53.930 --> 00:48:56.630
and the big C is that
a little c is better

00:48:56.630 --> 00:48:59.440
if you make it smaller,
and a big C is better--

00:48:59.440 --> 00:49:02.420
I mean, it's better in
the sense that if this

00:49:02.420 --> 00:49:04.970
is true for little
c and big C, and you

00:49:04.970 --> 00:49:10.050
make little c smaller and big C
bigger, then it is still true.

00:49:10.050 --> 00:49:12.570
So big C is a sufficiently
large constant,

00:49:12.570 --> 00:49:15.436
and little c is a
sufficiently small constant.

00:49:15.436 --> 00:49:16.422
Just a--

00:49:30.740 --> 00:49:36.650
So let's see the path of
length 3 lemma, see its proof.

00:49:36.650 --> 00:49:39.460
We're going to use the
path of length 2 lemma,

00:49:39.460 --> 00:49:42.070
but we need a bit of
preparation first.

00:49:42.070 --> 00:49:46.930
So the proof has some nice
ideas, but it's also--

00:49:46.930 --> 00:49:50.740
some parts of it are slightly
tedious, so bear with me.

00:49:50.740 --> 00:49:58.690
So we're going to construct
a chain of subsets A--

00:49:58.690 --> 00:50:02.717
inside A. So A1, A2, A3.

00:50:02.717 --> 00:50:04.800
And this is just because
there's a few cleaning up

00:50:04.800 --> 00:50:08.100
steps that need to be done.

00:50:08.100 --> 00:50:19.880
Let's call two
vertices in A friendly

00:50:19.880 --> 00:50:23.980
if they have lots
of common neighbors.

00:50:23.980 --> 00:50:25.430
And, precisely,
we're going to say

00:50:25.430 --> 00:50:28.750
they're friendly if they
have more than delta

00:50:28.750 --> 00:50:34.770
squared over 80 times the
size of B common neighbors.

00:50:41.770 --> 00:50:46.590
Let me construct this sequence
of subsets as follows.

00:50:46.590 --> 00:50:53.870
First, let A1 be all
the vertices in A

00:50:53.870 --> 00:50:58.200
with degree not too small.

00:50:58.200 --> 00:51:02.500
So this is in preparation.

00:51:02.500 --> 00:51:05.950
So it will make our life
quite a bit easier later on.

00:51:05.950 --> 00:51:09.100
Let's just trim all the
really small degree vertices

00:51:09.100 --> 00:51:11.850
so that we don't have
to think about them.

00:51:11.850 --> 00:51:15.870
So you trim all the
small degree vertices.

00:51:15.870 --> 00:51:20.420
And think about how
many edges you trim.

00:51:20.420 --> 00:51:25.700
You cannot trim so many edges,
because each time you trim such

00:51:25.700 --> 00:51:30.100
a vertex, you only get rid
of a small number of edges.

00:51:30.100 --> 00:51:34.300
So, in the end, at least half
of the original set of edges

00:51:34.300 --> 00:51:36.690
must remain.

00:51:36.690 --> 00:51:43.320
And, as a result, the
size of A1 is at least

00:51:43.320 --> 00:51:50.590
a delta over 2 fraction of
the original vertex set.

00:51:50.590 --> 00:51:53.050
Otherwise, you could
not have contained half

00:51:53.050 --> 00:51:57.460
of the original set of edges.

00:51:57.460 --> 00:51:59.380
So this is the
first trimming step.

00:52:02.180 --> 00:52:07.390
So we got rid of some edges, but
we got rid of fewer than half

00:52:07.390 --> 00:52:10.940
of the original edges.

00:52:10.940 --> 00:52:15.720
And because now you have
a minimum degree on A1,

00:52:15.720 --> 00:52:18.670
the number of edges
between A1 and B

00:52:18.670 --> 00:52:22.660
is quite large,
still quite large.

00:52:22.660 --> 00:52:27.040
So think about passing
down to A1 now.

00:52:27.040 --> 00:52:31.530
In the second step, we are going
to apply the path of length 2

00:52:31.530 --> 00:52:34.480
lemma to this A1.

00:52:34.480 --> 00:52:41.880
So A2 is going to be
constructed from--

00:52:41.880 --> 00:52:50.170
so using the path
of length 2 lemma,

00:52:50.170 --> 00:52:56.620
specifically with parameter
epsilon being delta over 10.

00:52:56.620 --> 00:52:59.440
Although, remember, now
the density of the graph

00:52:59.440 --> 00:53:01.760
went from delta to delta over 2.

00:53:01.760 --> 00:53:04.245
Again, if you don't care
about the specific numbers,

00:53:04.245 --> 00:53:05.620
they're all
polynomials in delta.

00:53:05.620 --> 00:53:06.703
So don't worry about them.

00:53:06.703 --> 00:53:08.590
Everything's poly delta.

00:53:08.590 --> 00:53:11.860
So we're going to apply
the path of length 2 lemma

00:53:11.860 --> 00:53:16.240
to find this subset A2.

00:53:16.240 --> 00:53:25.660
And it has the property
that A2 is quite large,

00:53:25.660 --> 00:53:45.320
and all but a small fraction
of pairs in A2 are friendly.

00:53:54.580 --> 00:53:59.540
So we passed down to, first,
trimming small degree vertices,

00:53:59.540 --> 00:54:02.120
and then passed
down further to A2,

00:54:02.120 --> 00:54:06.020
where all but a small
fraction of elements in A2,

00:54:06.020 --> 00:54:08.563
or all but a small
fraction of the pairs

00:54:08.563 --> 00:54:10.230
are friendly to each
other, meaning they

00:54:10.230 --> 00:54:11.770
have lots of common neighbors.

00:54:15.020 --> 00:54:16.870
And now let's look
at the other side.

00:54:16.870 --> 00:54:21.610
Let's look at B. So
we're in this situation

00:54:21.610 --> 00:54:24.520
now where you have--

00:54:27.760 --> 00:54:31.390
so we're now in a situation
where you've passed down

00:54:31.390 --> 00:54:42.630
to A2 and in B, where, because
of what we did initially,

00:54:42.630 --> 00:54:47.410
every vertex in here
has large degree.

00:54:47.410 --> 00:54:53.830
So there's this minimum
degree condition

00:54:53.830 --> 00:54:57.190
from every vertex on the left.

00:54:57.190 --> 00:55:00.250
So the average degree
is still very high.

00:55:02.960 --> 00:55:07.020
As a result, the
average degree from B

00:55:07.020 --> 00:55:09.280
is going to be quite high.

00:55:09.280 --> 00:55:13.720
So let's focus on the B side
and pick out vertices in B

00:55:13.720 --> 00:55:16.570
that have high degree.

00:55:16.570 --> 00:55:23.390
So let B1 denote
vertices b in B such

00:55:23.390 --> 00:55:30.530
that the degree from b
to A2 is at least half

00:55:30.530 --> 00:55:33.390
of what you expect
based on average degree.

00:55:37.850 --> 00:55:41.390
And, as before, the same
logic as the A1 step.

00:55:41.390 --> 00:55:52.760
We see that B1 has large size,
is a large fraction of B.

00:55:52.760 --> 00:55:57.410
And now we pass
down to this B1 set.

00:56:04.214 --> 00:56:17.970
Now, finally, let's consider
A3 to be vertices in A2

00:56:17.970 --> 00:56:21.490
where a is friendly.

00:56:21.490 --> 00:56:28.650
So vertices a in A2 such that
a is friendly to at least 1

00:56:28.650 --> 00:56:31.210
over delta over--

00:56:31.210 --> 00:56:38.740
so 1 minus delta over
5 fraction of A2.

00:56:42.590 --> 00:56:50.430
So we saw that, in A2, most
pairs of vertices are friendly.

00:56:50.430 --> 00:57:00.000
So most, meaning all but
a delta over 10 fraction.

00:57:00.000 --> 00:57:05.770
So if we consider
vertices which are

00:57:05.770 --> 00:57:10.740
unfriendly to many
other vertices in A2,

00:57:10.740 --> 00:57:13.560
there aren't so many of them.

00:57:13.560 --> 00:57:16.440
If there were many of them,
you couldn't have had that.

00:57:16.440 --> 00:57:18.850
So that's why I
constructed this set

00:57:18.850 --> 00:57:23.540
A3 consisting of
elements in A2 that

00:57:23.540 --> 00:57:27.110
are friendly to many elements.

00:57:27.110 --> 00:57:32.990
And the size of A3 is at
least half of that of A2.

00:57:40.974 --> 00:57:50.510
So we have this A3 inside.

00:57:50.510 --> 00:57:51.696
All right.

00:57:51.696 --> 00:57:57.235
And now I claim that we can
take A3 and B1 as our final sets,

00:57:57.235 --> 00:58:01.510
and that between every vertex
in A3 and every vertex in B1,

00:58:01.510 --> 00:58:04.990
I claim there must be
lots of paths of length 3.

00:58:04.990 --> 00:58:07.420
But, first, let's
check their sizes.

00:58:07.420 --> 00:58:09.820
I mean, the sizes all should
be OK, because we never

00:58:09.820 --> 00:58:11.960
lost too much at each step.

00:58:11.960 --> 00:58:13.740
If you only care about
polynomial factors,

00:58:13.740 --> 00:58:15.990
well, you already see that
we never lost anything more

00:58:15.990 --> 00:58:17.590
than a polynomial factor.

00:58:17.590 --> 00:58:20.510
But just to be precise, the
size of A3 is at least--

00:58:20.510 --> 00:58:24.320
so if you count up the
factor lost at each step,

00:58:24.320 --> 00:58:29.230
so it's 1/2 times delta
over 4 times delta over 2.

00:58:29.230 --> 00:58:34.480
So it's at least delta
squared over 16 fraction

00:58:34.480 --> 00:58:36.870
of the original set A.

00:58:36.870 --> 00:58:44.320
And now, if we
consider a comma b

00:58:44.320 --> 00:58:49.650
to be an arbitrary
pair in A3 cross B1,

00:58:49.650 --> 00:58:53.140
I claim that there
must be many paths.

00:58:53.140 --> 00:58:59.090
Because by using-- so what
properties do we know?

00:58:59.090 --> 00:59:05.920
We know that b is adjacent
to a large fraction.

00:59:05.920 --> 00:59:10.300
So here large means at least
delta over 4-- so bounded

00:59:10.300 --> 00:59:16.080
below-- a large fraction of A2.

00:59:16.080 --> 00:59:16.580
Yes.

00:59:16.580 --> 00:59:17.302
So I apologize.

00:59:17.302 --> 00:59:19.260
When I say the word large,
depending on context

00:59:19.260 --> 00:59:22.640
it can mean bigger than delta,
or it could mean at least 1

00:59:22.640 --> 00:59:23.370
minus delta.

00:59:23.370 --> 00:59:25.380
So you look at
what I write down.

00:59:25.380 --> 00:59:31.070
So b is adjacent to at least
delta over 4 fraction of A2.

00:59:31.070 --> 00:59:39.300
At the same time, we know that
a is friendly to at least 1

00:59:39.300 --> 00:59:43.940
minus delta over
5 fraction of A2.

00:59:49.000 --> 00:59:54.070
So these two sets, they must
overlap by at least a delta

00:59:54.070 --> 00:59:55.060
over 20 fraction.

01:00:00.351 --> 01:00:05.260
So let's take a vertex b.

01:00:05.260 --> 01:00:13.100
So you-- so it's adjacent
to many vertices here.

01:00:13.100 --> 01:00:17.526
And if you look
at a vertex in A,

01:00:17.526 --> 01:00:21.240
it's friendly to
a large fraction.

01:00:21.240 --> 01:00:25.570
So, in particular, it's
friendly to all these elements

01:00:25.570 --> 01:00:26.070
over here.

01:00:28.760 --> 01:00:34.840
So, to finish off, what
does it mean for a--

01:00:34.840 --> 01:00:37.600
this is-- this vertex is a.

01:00:37.600 --> 01:00:38.810
This vertex is b.

01:00:38.810 --> 01:00:40.970
What does it mean for
a to be friendly to all

01:00:40.970 --> 01:00:42.830
of these shaded elements?

01:00:42.830 --> 01:00:47.510
It means that there are
lots of paths from a

01:00:47.510 --> 01:00:51.930
to each of these elements.

01:00:51.930 --> 01:00:55.974
And then you can finish off
the paths going back to b.

01:00:55.974 --> 01:00:56.750
Yes.

01:00:56.750 --> 01:01:00.092
AUDIENCE: The shaded stuff is
allowed to be outside of A3?

01:01:00.092 --> 01:01:01.800
YUFEI ZHAO: No. the
shaded-- the question

01:01:01.800 --> 01:01:04.040
is, is the shaded stuff
allowed to be outside of A3?

01:01:04.040 --> 01:01:04.540
No.

01:01:04.540 --> 01:01:06.870
The shaded things are inside A3.

01:01:06.870 --> 01:01:10.900
So we're looking at
intersections within A3.

01:01:14.550 --> 01:01:15.090
No, sorry.

01:01:17.055 --> 01:01:18.180
Actually, no, you're right.

01:01:18.180 --> 01:01:20.530
So the shaded things
can be outside A3.

01:01:20.530 --> 01:01:22.450
So shaded things
can be outside A3.

01:01:22.450 --> 01:01:23.130
I apologize.

01:01:23.130 --> 01:01:25.650
So everything now is in A2.

01:01:28.800 --> 01:01:35.100
So b is adjacent to a
large fraction of A2.

01:01:35.100 --> 01:01:43.800
And a here is friendly to some
part of the neighbors of b.

01:01:43.800 --> 01:01:48.920
So you can complete
paths like that.

01:01:52.750 --> 01:01:53.250
Yes.

01:01:53.250 --> 01:01:54.880
So only the starting
and ending points

01:01:54.880 --> 01:01:56.410
have to be in A
prime and B prime.

01:01:56.410 --> 01:01:58.970
Everything else, they can go
outside of the A prime and B

01:01:58.970 --> 01:02:00.666
prime.

01:02:00.666 --> 01:02:03.970
Yes, thank you.

01:02:03.970 --> 01:02:22.516
So the number of paths from
a to B to A2 back to b is--

01:02:22.516 --> 01:02:24.960
let's see if I can
stay within B1--

01:02:24.960 --> 01:02:26.010
so is at least--

01:02:34.110 --> 01:02:34.610
yes.

01:02:34.610 --> 01:02:36.530
So it's-- sorry.

01:02:36.530 --> 01:02:44.070
This is B. So it's at least
delta over 20 times A2 times

01:02:44.070 --> 01:02:49.680
delta squared
over 80 times B. So

01:02:49.680 --> 01:02:52.470
if you don't care about
polynomial factors in delta,

01:02:52.470 --> 01:02:55.000
then you see that--

01:02:58.080 --> 01:03:00.140
the point is there's
a large fraction of--

01:03:02.418 --> 01:03:03.460
there are a lot of paths.

01:03:03.460 --> 01:03:07.000
So there are a lot of paths
between each little a and each

01:03:07.000 --> 01:03:09.670
little b by the
construction we've done.

01:03:15.190 --> 01:03:16.630
So let me just do a recap.

01:03:16.630 --> 01:03:19.990
So there were quite a few
details in this proof,

01:03:19.990 --> 01:03:22.300
and some of them have
to do with cleaning up.

01:03:22.300 --> 01:03:24.640
Because it's not so
nice to work with graphs

01:03:24.640 --> 01:03:26.850
that just have large
average degree.

01:03:26.850 --> 01:03:28.480
It's much nicer to
work with graphs

01:03:28.480 --> 01:03:29.920
with large minimum degree.

01:03:29.920 --> 01:03:33.580
So there are a couple of steps
here to take care of vertices

01:03:33.580 --> 01:03:34.990
with small degrees.

01:03:34.990 --> 01:03:39.290
So we started with, between
A and B, lots of edges.

01:03:39.290 --> 01:03:42.410
And we trim vertices
from A with small degree.

01:03:42.410 --> 01:03:45.250
So we get A1.

01:03:45.250 --> 01:03:48.970
And then we apply the path
of length 2 lemma to get A2.

01:03:48.970 --> 01:03:52.838
So inside A2, most
pairs of vertices

01:03:52.838 --> 01:03:54.630
have lots of common
neighbors, but not all.

01:03:57.510 --> 01:04:01.860
We then go back to
B to get B1, which

01:04:01.860 --> 01:04:04.940
has large minimum degree to A2.

01:04:07.650 --> 01:04:12.030
And then A3 looks
at vertices in A2

01:04:12.030 --> 01:04:16.490
with many friendly
companions in A2.

01:04:20.200 --> 01:04:24.100
And A3 is large, and I claim
that between every vertex in A3

01:04:24.100 --> 01:04:28.120
and every vertex in B1, you
have many paths of length 3.

01:04:28.120 --> 01:04:32.340
Because if you start
with a vertex in A3,

01:04:32.340 --> 01:04:35.430
it has many friendly companions.

01:04:35.430 --> 01:04:41.640
So many here means at least 1
minus delta over 5 fraction.

01:04:41.640 --> 01:04:49.470
Whereas every vertex in B1
has lots of neighbors in A2,

01:04:49.470 --> 01:04:53.430
where lots means at
least delta over 4.

01:04:53.430 --> 01:04:56.610
So there's
necessarily an overlap

01:04:56.610 --> 01:04:59.230
of at least delta over 20.

01:04:59.230 --> 01:05:01.750
And for that overlap,
we can create

01:05:01.750 --> 01:05:07.510
lots of paths going
through this overlap from A

01:05:07.510 --> 01:05:12.947
to B. Any questions?

01:05:16.220 --> 01:05:16.780
OK, great.

01:05:16.780 --> 01:05:21.940
So let's put everything together
to prove the graphical version

01:05:21.940 --> 01:05:23.704
of Balog-Szemeredi-Gowers.

01:05:31.040 --> 01:05:32.730
So we'll prove the
graphical version

01:05:32.730 --> 01:05:34.452
of Balog-Szemeredi-Gowers.

01:05:42.660 --> 01:05:46.920
So by-- so, first, note
that the hypothesis

01:05:46.920 --> 01:05:49.450
of Balog-Szemeredi-Gowers
already

01:05:49.450 --> 01:05:53.730
implies that the size
of A and the size of B

01:05:53.730 --> 01:05:55.650
are not too small.

01:05:58.910 --> 01:06:03.287
Because, otherwise, you couldn't
have had n squared over k edges

01:06:03.287 --> 01:06:03.870
to begin with.

01:06:08.060 --> 01:06:16.610
So by the path of
length 3 lemma,

01:06:16.610 --> 01:06:20.375
there exists A prime
in A and B prime

01:06:20.375 --> 01:06:24.980
in B with the
following properties.

01:06:24.980 --> 01:06:29.240
That A prime has a
large fraction of--

01:06:32.306 --> 01:06:36.110
so A prime and B prime
are both large in size.

01:06:40.460 --> 01:06:46.665
And for all vertices
a in A prime

01:06:46.665 --> 01:06:54.090
and vertices b in
B prime, there are

01:06:54.090 --> 01:06:59.040
lots of paths of length
3 between these vertices.

01:06:59.040 --> 01:07:05.010
So there are at least k
to the minus little o1--

01:07:05.010 --> 01:07:10.950
to the minus big
O1 times n squared

01:07:10.950 --> 01:07:18.050
pairs of intermediate
vertices a1,

01:07:18.050 --> 01:07:37.760
b1 in A cross B, such that
a b1 a1 b is a path in G.

01:07:37.760 --> 01:07:41.690
So let me draw the
situation for you.

01:07:48.920 --> 01:08:00.210
So we have A and B.
And so inside A and B,

01:08:00.210 --> 01:08:08.100
we have this fairly
large A prime

01:08:08.100 --> 01:08:14.026
and B prime, such that
for every little a

01:08:14.026 --> 01:08:23.470
and little b, there are
many paths like that

01:08:23.470 --> 01:08:26.890
going to b1 and a2.

01:08:30.439 --> 01:08:44.270
Let me set-- so let me set
x to be a plus b1, that sum,

01:08:44.270 --> 01:08:52.240
y to be a1 plus b1,
and z to be a1 plus b.

01:09:04.460 --> 01:09:21.380
So now notice that we can write
this a plus b in at least k

01:09:21.380 --> 01:09:27.350
to the minus big O of 1
times n squared ways

01:09:27.350 --> 01:09:39.410
as x minus y plus z by
following this path, where x, y,

01:09:39.410 --> 01:09:45.500
and z all lie in the
restricted sumset,

01:09:45.500 --> 01:09:49.350
because that's how the
restricted sumset is defined.

01:09:49.350 --> 01:09:54.229
So if you have an edge,
then the sum of the elements

01:09:54.229 --> 01:09:57.400
at its two
ends, by definition,

01:09:57.400 --> 01:10:00.970
lies in the restricted sumset.

01:10:00.970 --> 01:10:02.900
So the path of length
3 lemma tells us

01:10:02.900 --> 01:10:06.320
that every pair a
and b, their sum

01:10:06.320 --> 01:10:12.290
can be written in many different
ways as this combination.

01:10:12.290 --> 01:10:21.360
As a result, we see that
A prime plus B prime--

01:10:21.360 --> 01:10:26.477
so this sumset, if we consider each sum
along with its multiplicity--

01:10:31.450 --> 01:10:36.760
so now we're really looking at
all the different sums as well

01:10:36.760 --> 01:10:43.890
as ways of writing the
sum as this combination--

01:10:43.890 --> 01:10:58.060
we see that this count is bounded above
by the size of the restricted sumset raised

01:10:58.060 --> 01:10:58.920
to the third power.

01:11:11.295 --> 01:11:13.740
Because each of these
choices, x, y, and z,

01:11:13.740 --> 01:11:16.330
they come from the
restricted sumset.

01:11:16.330 --> 01:11:19.150
But the hypothesis of
Balog-Szemeredi-Gowers,

01:11:19.150 --> 01:11:21.880
the graphical version, is
that the restricted sumset

01:11:21.880 --> 01:11:24.630
is small in size.

01:11:24.630 --> 01:11:32.430
So we can now upper bound
the restricted sumset

01:11:32.430 --> 01:11:36.580
to be, basically,

01:11:36.580 --> 01:11:41.840
within a constant
factor of the maximum possible.

01:11:41.840 --> 01:11:47.640
And now we are done,
because we have deduced

01:11:47.640 --> 01:11:52.770
that the complete sumset
between A prime and B prime

01:11:52.770 --> 01:12:02.320
is at most a constant
factor, where the constant changes

01:12:02.320 --> 01:12:04.310
only by a polynomial.

01:12:04.310 --> 01:12:06.795
So a constant factor more
than the maximum possible.

01:12:11.000 --> 01:12:14.490
So it's at most k to
the big O of 1, so poly k, times n.
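
NOTE
The counting step in this proof can be written out as a short chain of inequalities. This is only a sketch under assumptions not fixed by the transcript: the restricted sumset is written $A +_G B$, the hypothesis is taken to be $|A +_G B| \le k n$ with $|A|, |B| \le n$, and $C$ is a hypothetical absolute constant standing in for the big O of 1.
```latex
% For a in A', b in B', and a path a, b_1, a_1, b in G, set
%   x = a + b_1,  y = a_1 + b_1,  z = a_1 + b,  all in A +_G B.
\begin{align*}
  x - y + z &= (a + b_1) - (a_1 + b_1) + (a_1 + b) = a + b, \\
% Each sum in A' + B' has at least k^{-C} n^2 such triples (x, y, z),
% and a triple determines its sum x - y + z, so no triple is counted twice.
  |A' + B'| \cdot k^{-C} n^2 &\le |A +_G B|^3 \le k^3 n^3, \\
  |A' + B'| &\le k^{C + 3} n = k^{O(1)} n.
\end{align*}
```
The triples for a fixed pair a, b are distinct because x determines b_1 and z determines a_1.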

01:12:17.070 --> 01:12:20.310
So that proves the
graphical version

01:12:20.310 --> 01:12:22.890
of Balog-Szemeredi-Gowers.

01:12:22.890 --> 01:12:26.070
And because we showed earlier
that the graphical version

01:12:26.070 --> 01:12:28.560
of Balog-Szemeredi-Gowers
implies Balog-Szemeredi-Gowers,

01:12:28.560 --> 01:12:32.170
this shows the
Balog-Szemeredi-Gowers theorem.

01:12:32.170 --> 01:12:35.880
So let me recap some of
the ideas we saw today.

01:12:35.880 --> 01:12:38.260
And so the whole point
of Balog-Szemeredi-Gowers

01:12:38.260 --> 01:12:42.070
and all of these related lemmas
and theorems and variations

01:12:42.070 --> 01:12:47.050
is that you start with
something that has

01:12:47.050 --> 01:12:49.600
a lot of additive structure.

01:12:49.600 --> 01:12:54.440
Well, after we passed down to
graphs, just a lot of edges.

01:12:54.440 --> 01:12:57.700
So you start with
a situation where

01:12:57.700 --> 01:13:02.325
you have kind of 1% goodness.

01:13:02.325 --> 01:13:03.700
And you want to
show that you can

01:13:03.700 --> 01:13:07.960
restrict to fairly
large subsets,

01:13:07.960 --> 01:13:10.610
so that you have perfection.

01:13:10.610 --> 01:13:14.510
So you have complete goodness
between these two sets.

01:13:14.510 --> 01:13:17.240
And this is what's going on
in both the graphical version

01:13:17.240 --> 01:13:18.920
and the additive version.

01:13:18.920 --> 01:13:21.560
So back to the graph
path of length 3 lemma.

01:13:21.560 --> 01:13:25.820
So we were able to boost the
path of length 2 lemma, which

01:13:25.820 --> 01:13:31.400
tells us something about
99% of the pairs having lots

01:13:31.400 --> 01:13:37.160
of common neighbors, to
100% of the pairs having

01:13:37.160 --> 01:13:41.010
lots of paths of length 3.

01:13:41.010 --> 01:13:43.380
And in the additive
setting, we saw that

01:13:43.380 --> 01:13:47.880
by starting with a situation
where the hypothesis is

01:13:47.880 --> 01:13:51.210
somewhat patchy, so like
a 1% type hypothesis,

01:13:51.210 --> 01:13:54.750
we can pass down to
fairly large sets, where

01:13:54.750 --> 01:13:58.590
the complete sumset-- so starting
with just the restricted sumset

01:13:58.590 --> 01:14:01.110
being small, we can pass
down to large sets

01:14:01.110 --> 01:14:03.540
where the complete
sumset is small.

01:14:03.540 --> 01:14:05.730
And this is an important
principle, that, often,

01:14:05.730 --> 01:14:11.210
when we have some typicality
by an appropriate argument--

01:14:11.210 --> 01:14:14.300
and, here, it's not at
all a trivial argument.

01:14:14.300 --> 01:14:15.890
So there's some
cleverness involved,

01:14:15.890 --> 01:14:18.120
that by doing some
kind of argument,

01:14:18.120 --> 01:14:21.950
we may be able to pass down
to some fairly large set

01:14:21.950 --> 01:14:25.130
where it's not just typically
good, but everything's

01:14:25.130 --> 01:14:27.020
perfectly good.

01:14:27.020 --> 01:14:31.820
That's the spirit here of the
Balog-Szemeredi-Gowers theorem.

01:14:31.820 --> 01:14:35.010
So, next time, for the last
lecture of this course,

01:14:35.010 --> 01:14:38.600
I will tell you about
the sum-product problem,

01:14:38.600 --> 01:14:40.802
where

01:14:40.802 --> 01:14:44.500
there are also some very
nice graph theoretic inputs.