WEBVTT
00:00:00.120 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.880
Commons license.
00:00:03.880 --> 00:00:06.090
Your support will help
MIT OpenCourseWare
00:00:06.090 --> 00:00:10.180
continue to offer high quality
educational resources for free.
00:00:10.180 --> 00:00:12.720
To make a donation or to
view additional materials
00:00:12.720 --> 00:00:16.200
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:16.200 --> 00:00:17.625
at ocw.mit.edu.
00:00:20.507 --> 00:00:22.590
PHILIPPE RIGOLLET: --of
our limiting distribution,
00:00:22.590 --> 00:00:24.259
which happen to be Gaussian.
00:00:24.259 --> 00:00:25.800
But if the central
limit theorem told
00:00:25.800 --> 00:00:28.560
us that the limiting
distribution of some average
00:00:28.560 --> 00:00:30.549
was something that
looked like a Poisson
00:00:30.549 --> 00:00:32.340
or an [? exponential, ?]
then we would just
00:00:32.340 --> 00:00:34.770
have in the same way
taken the quantiles
00:00:34.770 --> 00:00:36.700
of the exponential distribution.
00:00:36.700 --> 00:00:39.440
So let's go back to what we had.
00:00:39.440 --> 00:00:46.990
So generically if you have a
set of observations X1 to Xn.
00:00:46.990 --> 00:00:52.180
So remember for the kiss example
they were denoted by R1 to Rn,
00:00:52.180 --> 00:00:55.240
because they were turning
the head to the right,
00:00:55.240 --> 00:00:56.850
but let's just go back.
00:00:56.850 --> 00:00:59.800
We say X1 to Xn,
and in this case
00:00:59.800 --> 00:01:02.710
I'm going to assume
they're IID, and I'm
00:01:02.710 --> 00:01:05.700
going to make them Bernoulli
with parameter p,
00:01:05.700 --> 00:01:06.710
and p is unknown, right?
00:01:10.150 --> 00:01:11.600
So what did we do from here?
00:01:11.600 --> 00:01:15.824
Well, we said p is
the expectation of Xi,
00:01:15.824 --> 00:01:17.990
and actually we didn't even
think about it too much.
00:01:17.990 --> 00:01:19.090
We said, well, if
I need to estimate
00:01:19.090 --> 00:01:21.460
the proportion of people who
turn their head to the right
00:01:21.460 --> 00:01:22.960
when they kiss, I
just basically I'm
00:01:22.960 --> 00:01:24.400
going to compute the average.
00:01:24.400 --> 00:01:28.660
So our p hat was just
Xn bar, which was just 1
00:01:28.660 --> 00:01:32.170
over n sum from i
equal 1 to n of the Xi.
00:01:32.170 --> 00:01:34.990
The average of the observations
was our estimate.
00:01:34.990 --> 00:01:37.690
And then we wanted to build
some confidence intervals
00:01:37.690 --> 00:01:38.220
around this.
00:01:38.220 --> 00:01:41.360
So what we wanted to understand
is how much this p hat
00:01:41.360 --> 00:01:42.970
fluctuates.
00:01:42.970 --> 00:01:44.060
This is a random variable.
00:01:44.060 --> 00:01:45.100
It's an average of
random variables.
00:01:45.100 --> 00:01:46.570
It's a random
variable, so we want
00:01:46.570 --> 00:01:47.740
to know what the
distribution is.
00:01:47.740 --> 00:01:49.406
And if we know what
the distribution is,
00:01:49.406 --> 00:01:51.670
then we actually know,
well, where it fluctuates.
00:01:51.670 --> 00:01:52.810
What the expectation is.
00:01:52.810 --> 00:01:55.649
Around which value it tends
to fluctuate et cetera.
00:01:55.649 --> 00:01:57.190
And so what the
central limit theorem
00:01:57.190 --> 00:02:03.310
told us was if I take square
root of n times Xn bar minus p,
00:02:03.310 --> 00:02:04.990
which is its average.
00:02:04.990 --> 00:02:07.445
And then I divide it by
the standard deviation.
00:02:10.840 --> 00:02:15.670
Then this thing here converges
as n goes to infinity,
00:02:15.670 --> 00:02:17.380
and we will say
a little bit more
00:02:17.380 --> 00:02:19.360
about what it means
in distribution
00:02:19.360 --> 00:02:23.157
to some standard
normal random variable.
00:02:23.157 --> 00:02:24.740
So that was the
central limit theorem.
00:02:27.069 --> 00:02:28.610
So what it means is
that when I think
00:02:28.610 --> 00:02:35.410
of this as a random variable,
when n is large enough
00:02:35.410 --> 00:02:37.630
it's going to look like this.
00:02:37.630 --> 00:02:40.030
And so I understand
perfectly its fluctuations.
00:02:40.030 --> 00:02:43.450
I know that this
thing here has--
00:02:43.450 --> 00:02:45.520
I know the probability
of being in this zone.
00:02:45.520 --> 00:02:47.890
I know that this
number here is 0.
00:02:47.890 --> 00:02:49.600
I know a bunch of things.
00:02:49.600 --> 00:02:51.910
And then, in
particular, what I was
00:02:51.910 --> 00:02:55.990
interested in was that
the probability that
00:02:55.990 --> 00:02:59.110
the absolute value of a
Gaussian random variable,
00:02:59.110 --> 00:03:05.111
exceeds q alpha over
2, q alpha over 2.
00:03:05.111 --> 00:03:06.610
We said that this
was equal to what?
00:03:13.610 --> 00:03:15.527
Anybody?
00:03:15.527 --> 00:03:16.110
What was that?
00:03:16.110 --> 00:03:18.137
AUDIENCE: [INAUDIBLE]
00:03:18.137 --> 00:03:19.470
PHILIPPE RIGOLLET: Alpha, right?
00:03:19.470 --> 00:03:21.210
So that's the probability.
00:03:21.210 --> 00:03:23.060
That's my random variable.
00:03:23.060 --> 00:03:27.050
So this is by definition q
alpha over 2 is the number.
00:03:27.050 --> 00:03:29.960
So that to the right
of it is alpha over 2.
00:03:29.960 --> 00:03:34.100
And this is a negative q
alpha over 2 by symmetry.
00:03:34.100 --> 00:03:36.120
And so the probability
that it exceeds-- well,
00:03:36.120 --> 00:03:38.150
it's not very symmetric,
but the probability
00:03:38.150 --> 00:03:41.020
that it exceeds this
value, q alpha over 2,
00:03:41.020 --> 00:03:46.250
is just the sum of
the two gray areas.
00:03:46.250 --> 00:03:47.360
All right?
00:03:47.360 --> 00:03:50.605
So now I said that this thing
was approximately equal,
00:03:50.605 --> 00:03:51.980
due to the central
limit theorem,
00:03:51.980 --> 00:03:55.320
to the probability,
that square root of n times
00:03:55.320 --> 00:03:59.063
Xn bar minus p divided by
square root p 1 minus p.
00:04:04.970 --> 00:04:10.180
Well, absolute value was
larger than q alpha over 2.
00:04:10.180 --> 00:04:12.870
Well, then this thing by default
is actually approximately equal
00:04:12.870 --> 00:04:16.870
to alpha, just because of virtue
of the central limit theorem.
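A quick simulation can make this approximation concrete. This is only a sketch: the function name `exceed_frequency`, the values of p and n, and the number of repetitions are illustrative choices, not taken from the lecture. It draws many Bernoulli samples, forms the standardized average, and counts how often its absolute value exceeds q alpha over 2.

```python
import random
from math import sqrt
from statistics import NormalDist

def exceed_frequency(p, n, alpha, n_rep=10_000, seed=0):
    """Fraction of simulated samples where |sqrt(n)(Xn bar - p)/sqrt(p(1-p))| > q_{alpha/2}."""
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(1 - alpha / 2)  # Gaussian quantile q_{alpha/2}
    hits = 0
    for _ in range(n_rep):
        # Average of n i.i.d. Bernoulli(p) draws.
        x_bar = sum(rng.random() < p for _ in range(n)) / n
        z = sqrt(n) * (x_bar - p) / sqrt(p * (1 - p))
        hits += abs(z) > q
    return hits / n_rep

# Hypothetical setting: p = 0.5, n = 100, alpha = 5%.
print(exceed_frequency(0.5, 100, 0.05))
```

The printed frequency should land near alpha but not exactly on it, since the average of Bernoulli variables is discrete and n is finite.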
00:04:16.870 --> 00:04:23.770
And then we just said,
well, I'll solve for p.
00:04:23.770 --> 00:04:28.420
Has anyone attempted to solve
the degree two equation for p
00:04:28.420 --> 00:04:29.412
in the homework?
00:04:29.412 --> 00:04:30.370
Everybody has tried it?
00:04:35.400 --> 00:04:37.740
So essentially, this is
going to be an equation in p.
00:04:37.740 --> 00:04:39.240
Sometimes we don't
want to solve it.
00:04:39.240 --> 00:04:41.823
Some of the p's we will replace
by their worst possible value.
00:04:41.823 --> 00:04:44.430
For example, we said one
of the tricks we had was
00:04:44.430 --> 00:04:48.830
that this value here,
square root of p 1 minus p,
00:04:48.830 --> 00:04:51.217
was always less than one half.
00:04:51.217 --> 00:04:53.550
And so we could actually get
the confidence interval that
00:04:53.550 --> 00:04:55.174
was larger than all
possible confidence
00:04:55.174 --> 00:04:57.170
intervals for all
possible values of p,
00:04:57.170 --> 00:04:59.390
but we could solve for p.
00:04:59.390 --> 00:05:01.570
Do we all agree on the
principle of what we did?
00:05:01.570 --> 00:05:03.840
So that's how you build
confidence intervals.
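The three options mentioned here (bound square root of p 1 minus p by one half, plug in p hat, or solve the degree two equation) can be sketched side by side. This is an illustrative sketch under the CLT approximation; the function name `clt_intervals` and the sample numbers at the bottom are assumptions, not values from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def clt_intervals(x_bar, n, alpha=0.05):
    """Three asymptotic intervals for a Bernoulli parameter p, all based on the CLT."""
    q = NormalDist().inv_cdf(1 - alpha / 2)  # q_{alpha/2}

    # 1. Conservative: replace sqrt(p(1-p)) by its worst value, 1/2.
    half = q / (2 * sqrt(n))
    conservative = (x_bar - half, x_bar + half)

    # 2. Plug-in: replace p by p hat inside the standard deviation.
    half = q * sqrt(x_bar * (1 - x_bar) / n)
    plug_in = (x_bar - half, x_bar + half)

    # 3. Solve: n (x_bar - p)^2 <= q^2 p (1 - p) is a degree two inequality in p,
    #    (n + q^2) p^2 - (2 n x_bar + q^2) p + n x_bar^2 <= 0.
    a = n + q**2
    b = -(2 * n * x_bar + q**2)
    c = n * x_bar**2
    disc = sqrt(b * b - 4 * a * c)
    solved = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))
    return conservative, plug_in, solved

# Hypothetical data: 60% of 100 observations equal to 1.
for name, ci in zip(("conservative", "plug-in", "solved"), clt_intervals(0.6, 100)):
    print(name, ci)
```

The conservative interval is never narrower than the plug-in one, because square root of p hat times 1 minus p hat is at most one half.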
00:05:03.840 --> 00:05:05.360
Now let's step
back for a second,
00:05:05.360 --> 00:05:08.070
and see what was important in
the building of this confidence
00:05:08.070 --> 00:05:09.470
interval.
00:05:09.470 --> 00:05:11.870
The really key thing is
that I didn't tell you
00:05:11.870 --> 00:05:15.350
why I formed this thing, right?
00:05:15.350 --> 00:05:17.120
We started from
x bar, and then I
00:05:17.120 --> 00:05:21.000
took some weird function of x
bar that depended on p and n.
00:05:21.000 --> 00:05:23.824
And the reason is, because
when I take this function,
00:05:23.824 --> 00:05:25.240
the central limit
theorem tells me
00:05:25.240 --> 00:05:28.009
that it converges to
something that I know.
00:05:28.009 --> 00:05:30.550
But this very important thing
about the something that I know
00:05:30.550 --> 00:05:35.030
is that it does not depend on
anything that I don't know.
00:05:35.030 --> 00:05:36.800
For example, if I
forgot to divide
00:05:36.800 --> 00:05:40.220
by square root of p 1 minus
p, then this thing would have
00:05:40.220 --> 00:05:43.980
had a variance, which
is the p 1 minus p.
00:05:43.980 --> 00:05:47.620
If I didn't remove this
p here, the mean here
00:05:47.620 --> 00:05:49.860
would have been affected by p.
00:05:49.860 --> 00:05:53.041
And there's no table
for normal p, 1.
00:05:53.041 --> 00:05:53.540
Yes?
00:05:53.540 --> 00:05:55.834
AUDIENCE: [INAUDIBLE]
00:05:55.834 --> 00:05:58.000
PHILIPPE RIGOLLET: Oh, where
the square root of n term
00:05:58.000 --> 00:05:58.500
comes from.
00:05:58.500 --> 00:06:00.830
So really you should view this.
00:06:00.830 --> 00:06:04.780
So there's a rule and sort
of a quiet rule in math
00:06:04.780 --> 00:06:08.990
that you don't write a
divided by b over c, right?
00:06:08.990 --> 00:06:12.714
You write c times a divided
by b, because it looks nicer.
00:06:12.714 --> 00:06:14.380
But the way you want
to think about this
00:06:14.380 --> 00:06:20.600
is that this is x bar minus p
divided by the square root of p
00:06:20.600 --> 00:06:23.839
1 minus p divided by n.
00:06:23.839 --> 00:06:25.630
And the reason is,
because this is actually
00:06:25.630 --> 00:06:27.000
the standard deviation of this--
00:06:27.000 --> 00:06:28.720
oh sorry, x bar n.
00:06:28.720 --> 00:06:31.510
This is actually the standard
deviation of this guy,
00:06:31.510 --> 00:06:36.540
and the square root of n comes
from the [INAUDIBLE] average.
00:06:36.540 --> 00:06:39.922
So the key thing
was that this thing,
00:06:39.922 --> 00:06:42.130
this limiting distribution
did not depend on anything
00:06:42.130 --> 00:06:43.340
I don't know.
00:06:43.340 --> 00:06:45.635
And this is actually called
a pivotal distribution.
00:06:45.635 --> 00:06:47.690
It's pivotal.
00:06:47.690 --> 00:06:49.020
I don't need anything.
00:06:49.020 --> 00:06:51.750
I don't need to know anything,
and I can read it in a table.
00:06:51.750 --> 00:06:54.320
Sometimes there's going
to be complicated things,
00:06:54.320 --> 00:06:55.430
but now we have computers.
00:06:55.430 --> 00:06:57.846
The beauty about Gaussian is
that people have studied them
00:06:57.846 --> 00:07:00.049
to death, and you can
open any stats textbook,
00:07:00.049 --> 00:07:02.090
and you will see a table
again that will tell you
00:07:02.090 --> 00:07:04.140
for each value of alpha
you're interested in,
00:07:04.140 --> 00:07:07.220
it will tell you what
q alpha over 2 is.
00:07:07.220 --> 00:07:10.566
But there might be some
crazy distributions,
00:07:10.566 --> 00:07:12.440
but as long as they
don't depend on anything,
00:07:12.440 --> 00:07:13.981
we might actually
be able to simulate
00:07:13.981 --> 00:07:16.540
from them, and in particular
compute what q alpha over 2
00:07:16.540 --> 00:07:19.157
is for any possible
value of alpha.
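As a sketch of what simulating from a pivotal distribution could look like: draw many samples from it, sort their absolute values, and read off the empirical quantile. The helper `simulated_quantile` and the number of draws are illustrative assumptions. The standard Gaussian is used here only because the simulated value can be checked against the table value.

```python
import random
from statistics import NormalDist

def simulated_quantile(draw, alpha, n_draws=200_000, seed=0):
    """Estimate q such that P(|Z| > q) is about alpha, using draws from a known distribution."""
    rng = random.Random(seed)
    samples = sorted(abs(draw(rng)) for _ in range(n_draws))
    # Empirical (1 - alpha) quantile of |Z|.
    return samples[int((1 - alpha) * n_draws)]

alpha = 0.05
q_sim = simulated_quantile(lambda rng: rng.gauss(0.0, 1.0), alpha)
q_table = NormalDist().inv_cdf(1 - alpha / 2)  # what a Gaussian table would give
print(q_sim, q_table)
```

Any other distribution can be passed as `draw`, as long as it does not depend on an unknown parameter.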
00:07:19.157 --> 00:07:21.240
And so that's what we're
going to be trying to do.
00:07:21.240 --> 00:07:22.800
Finding pivotal distributions.
00:07:22.800 --> 00:07:26.060
How do we take this Xn bar,
which is a good estimate,
00:07:26.060 --> 00:07:28.940
and turn it into something
which may be exactly
00:07:28.940 --> 00:07:31.100
or asymptotically
does not depend
00:07:31.100 --> 00:07:33.410
on any unknown parameter.
00:07:33.410 --> 00:07:35.600
So here is one way
we can actually--
00:07:35.600 --> 00:07:38.084
so that's what we did for
the kiss example, right?
00:07:38.084 --> 00:07:39.500
And here I mentioned,
for example,
00:07:39.500 --> 00:07:41.780
in the extreme case,
when n was equal to 3
00:07:41.780 --> 00:07:44.240
we would get a different
thing, but here the CLT
00:07:44.240 --> 00:07:45.860
would not be valid.
00:07:45.860 --> 00:07:49.520
And what that means is that
my pivotal distribution
00:07:49.520 --> 00:07:52.870
is actually not the
normal distribution,
00:07:52.870 --> 00:07:54.620
but it might be something else.
00:07:54.620 --> 00:07:56.920
And I said we can make
exact computations.
00:07:56.920 --> 00:07:58.510
Well, let's see
what it is, right?
00:07:58.510 --> 00:08:06.610
If I have three observations,
so I'm going to have X1, X2, X3.
00:08:06.610 --> 00:08:08.810
So now I take the
average of those guys.
00:08:13.260 --> 00:08:15.404
OK, so that's my estimate.
00:08:15.404 --> 00:08:16.820
How many values
can this guy take?
00:08:23.125 --> 00:08:25.065
It's a little bit of counting.
00:08:27.980 --> 00:08:28.529
Four values.
00:08:28.529 --> 00:08:29.820
How did you get to that number?
00:08:37.590 --> 00:08:41.919
OK, so each of these guys
can take value 0, 1, right?
00:08:41.919 --> 00:08:43.669
So the number of values
that it can take,
00:08:43.669 --> 00:08:45.730
I mean, it's a little
annoying, because then I
00:08:45.730 --> 00:08:47.110
have to sum them, right?
00:08:47.110 --> 00:08:51.620
So basically, I have to
count the number of 1's.
00:08:51.620 --> 00:08:54.789
So how many 1's
can I get, right?
00:08:54.789 --> 00:08:57.330
Sorry I have to-- yeah, so this
is the number of 1's that I--
00:08:57.330 --> 00:08:58.500
OK, so let's look at that.
00:08:58.500 --> 00:09:00.182
So we get 0, 0, 0.
00:09:00.182 --> 00:09:01.690
0, 0, 1.
00:09:01.690 --> 00:09:03.350
And then I get
basically three of them
00:09:03.350 --> 00:09:04.975
that have just the
one in there, right?
00:09:07.660 --> 00:09:08.920
So there's three of them.
00:09:08.920 --> 00:09:12.710
How many of them
have exactly two 1's?
00:09:12.710 --> 00:09:13.350
2.
00:09:13.350 --> 00:09:15.270
Sorry, 3, right?
00:09:15.270 --> 00:09:18.230
So it's just this guy where
I replaced the 0's and the 1.
00:09:18.230 --> 00:09:21.390
OK, so now I get--
00:09:21.390 --> 00:09:23.750
so here I get three
that take the value 1,
00:09:23.750 --> 00:09:25.680
and one that gets the value 0.
00:09:25.680 --> 00:09:28.110
And then I get three
that take the value 2,
00:09:28.110 --> 00:09:30.970
and then one that
takes the value 3.
00:09:30.970 --> 00:09:33.100
The value [? 0 ?] 1's, right?
00:09:33.100 --> 00:09:35.870
OK, so everybody knows what I'm
missing here is just the ones
00:09:35.870 --> 00:09:38.440
here where I replaced
the 0's by 1's.
00:09:38.440 --> 00:09:40.480
So the number of values
that this thing can take
00:09:40.480 --> 00:09:43.080
is 1, 2, 3, 4.
00:09:43.080 --> 00:09:45.749
So someone is counting
much faster than me.
00:09:45.749 --> 00:09:48.040
And so those numbers, you've
probably seen them before,
00:09:48.040 --> 00:09:48.539
right?
00:09:48.539 --> 00:09:50.530
1, 3, 3, 1, remember?
00:09:50.530 --> 00:09:52.930
And so essentially
those guys, it
00:09:52.930 --> 00:09:58.760
takes only three values,
which are either 1/3, 1.
00:09:58.760 --> 00:10:02.332
Sorry, 1/3.
00:10:02.332 --> 00:10:06.400
Oh OK, so it's 0, sorry.
00:10:06.400 --> 00:10:10.100
1/3, 2/3, and 1.
00:10:10.100 --> 00:10:12.790
Those are the four possible
values you can take.
00:10:12.790 --> 00:10:14.870
And so now-- which is
probably much easier
00:10:14.870 --> 00:10:16.714
to count like that--
and so now all
00:10:16.714 --> 00:10:18.380
I have to tell you
if I want to describe
00:10:18.380 --> 00:10:20.240
the distribution
of this probability
00:10:20.240 --> 00:10:23.090
of this random variable,
is just the probability
00:10:23.090 --> 00:10:24.990
that it takes each
of these values.
00:10:24.990 --> 00:10:30.170
So the probability that X bar 3 takes the
value 0, the probability
00:10:30.170 --> 00:10:34.160
that X bar 3 takes the
value 1/3, et cetera.
00:10:34.160 --> 00:10:36.262
If I give you each of
these possible values,
00:10:36.262 --> 00:10:38.720
then you will be able to know
exactly what the distribution
00:10:38.720 --> 00:10:41.720
is, and hopefully maybe
to turn it into something
00:10:41.720 --> 00:10:42.691
you can compute.
00:10:42.691 --> 00:10:44.690
Now the thing is that
those values will actually
00:10:44.690 --> 00:10:47.290
depend on the unknown p.
00:10:47.290 --> 00:10:48.440
What is the unknown p here?
00:10:48.440 --> 00:10:49.856
What is the
probability that X bar
00:10:49.856 --> 00:10:52.214
3 is equal to 0 for example?
00:10:52.214 --> 00:10:53.166
I'm sorry?
00:10:53.166 --> 00:10:54.594
AUDIENCE: [INAUDIBLE]
00:10:54.594 --> 00:10:55.760
PHILIPPE RIGOLLET: Yeah, OK.
00:10:55.760 --> 00:10:59.930
So let's write it without
making the computation. So 1/8 is
00:10:59.930 --> 00:11:03.866
probably not the
right answer, right?
00:11:03.866 --> 00:11:09.267
For example, if p is equal to
0, what is this probability?
00:11:09.267 --> 00:11:10.740
1.
00:11:10.740 --> 00:11:13.980
If p is 1, what is
this probability?
00:11:13.980 --> 00:11:14.480
0.
00:11:14.480 --> 00:11:16.250
So it will depend on p.
00:11:16.250 --> 00:11:18.314
So the probability that
this thing is equal to 0,
00:11:18.314 --> 00:11:20.480
is just the probability
that all three of those guys
00:11:20.480 --> 00:11:21.606
are equal to 0.
00:11:21.606 --> 00:11:24.105
The probability that X1 is equal
to 0, and X2 is equal to 0,
00:11:24.105 --> 00:11:25.532
and X3 is equal to 0.
00:11:25.532 --> 00:11:26.990
Now my things are
independent, so I
00:11:26.990 --> 00:11:28.340
do what I actually
want to do, which
00:11:28.340 --> 00:11:29.964
is to say the probability
of the intersection
00:11:29.964 --> 00:11:32.330
is the product of the
probabilities, right?
00:11:32.330 --> 00:11:34.940
So it's just the probability
that each of them is equal to 0
00:11:34.940 --> 00:11:36.315
to the power of 3.
00:11:36.315 --> 00:11:38.690
And the probability that each
of them, or say one of them
00:11:38.690 --> 00:11:41.585
is equal to 0, is
just 1 minus p.
00:11:45.960 --> 00:11:48.900
And then for this guy I just
get the probability-- well,
00:11:48.900 --> 00:11:51.690
it's more complicated, because I
have to decide which one it is.
00:11:51.690 --> 00:11:53.580
But those things are
just the probability
00:11:53.580 --> 00:11:56.290
of some binomial random
variables, right?
00:11:56.290 --> 00:12:00.320
This is just a
binomial, X bar 3.
00:12:00.320 --> 00:12:03.986
So if I look at X bar 3,
and then I multiply it by 3,
00:12:03.986 --> 00:12:05.860
it's just this sum of
independent Bernoulli's
00:12:05.860 --> 00:12:07.120
with parameter p.
00:12:07.120 --> 00:12:11.696
So this is actually a binomial
with parameter 3 and p.
00:12:11.696 --> 00:12:13.070
And there's tables
for binomials,
00:12:13.070 --> 00:12:16.567
and they tell you all this.
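The counting above can be checked by brute force. This sketch enumerates all sequences of 0's and 1's, groups them by their number of 1's, and weights each group by p to the k times 1 minus p to the n minus k. The function name `exact_distribution` is an illustrative choice.

```python
from itertools import product

def exact_distribution(n, p):
    """Return (counts, pmf) for the number of 1's among n i.i.d. Bernoulli(p) variables.

    pmf[k] is the probability that the average equals k/n.
    """
    counts = [0] * (n + 1)
    for outcome in product((0, 1), repeat=n):  # all 2^n sequences
        counts[sum(outcome)] += 1              # group by number of 1's
    # Every sequence with k ones has probability p^k (1-p)^(n-k).
    pmf = [counts[k] * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return counts, pmf

counts, pmf = exact_distribution(3, 0.5)
print(counts)  # number of sequences giving averages 0, 1/3, 2/3, 1
print(pmf)     # these probabilities change with p
```

With p equal to one half the probability of an average of 0 is 1/8, but with p equal to 0 it is 1 and with p equal to 1 it is 0, which is the dependence on p discussed here.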
00:12:16.567 --> 00:12:18.650
Now the thing is I want
to invert this guy, right?
00:12:18.650 --> 00:12:19.870
Somehow.
00:12:19.870 --> 00:12:21.055
This thing depends on p.
00:12:21.055 --> 00:12:22.990
I don't like it, so
I'm going to have
00:12:22.990 --> 00:12:25.874
to find ways to get this
thing to not depend on p,
00:12:25.874 --> 00:12:27.790
and I could make all
these nasty computations,
00:12:27.790 --> 00:12:29.710
and spend hours doing this.
00:12:29.710 --> 00:12:31.330
But there's tricks
to go around this.
00:12:31.330 --> 00:12:32.410
There's upper bounds.
00:12:32.410 --> 00:12:34.390
Just like we just
said, well, maybe I
00:12:34.390 --> 00:12:36.840
don't want to solve the
second degree equation in p,
00:12:36.840 --> 00:12:40.360
because it's just going to
capture maybe smaller order
00:12:40.360 --> 00:12:41.030
terms, right?
00:12:41.030 --> 00:12:43.930
Things that maybe won't make
a huge difference numerically.
00:12:43.930 --> 00:12:46.900
You can check that in
your problem set one.
00:12:46.900 --> 00:12:48.910
Does it make a huge
difference numerically
00:12:48.910 --> 00:12:50.590
to solve the second
degree equation,
00:12:50.590 --> 00:12:52.960
or to just use the
[INAUDIBLE] p 1
00:12:52.960 --> 00:12:56.050
minus p or even to plug
in p hat instead of p.
00:12:56.050 --> 00:12:57.720
Those are going to
be the-- problem
00:12:57.720 --> 00:13:01.540
set one is to make sure that you
see what magnitude of changes
00:13:01.540 --> 00:13:05.350
you get by changing from
one method to the other.
00:13:05.350 --> 00:13:13.420
So what I wanted to
go to is something
00:13:13.420 --> 00:13:16.150
where we can use
something, which is just
00:13:16.150 --> 00:13:17.900
a little more brute force.
00:13:17.900 --> 00:13:19.600
So the probability
that-- so here
00:13:19.600 --> 00:13:20.931
is this Hoeffding's inequality.
00:13:20.931 --> 00:13:21.430
We saw that.
00:13:21.430 --> 00:13:23.320
That's what we've
finished on last time.
00:13:23.320 --> 00:13:25.120
So Hoeffding's
inequality is actually
00:13:25.120 --> 00:13:27.560
one of the most
useful inequalities.
00:13:27.560 --> 00:13:30.130
If any one of you is doing
anything related to algorithms,
00:13:30.130 --> 00:13:32.089
you've seen that
inequality before.
00:13:32.089 --> 00:13:33.880
It's extremely convenient
that it tells you
00:13:33.880 --> 00:13:35.650
something about bounded
random variables,
00:13:35.650 --> 00:13:37.984
and if you do algorithms
typically with things bounded.
00:13:37.984 --> 00:13:40.150
And that's the case of
Bernoulli's random variables,
00:13:40.150 --> 00:13:40.649
right?
00:13:40.649 --> 00:13:42.765
They're bounded between 0 and 1.
00:13:42.765 --> 00:13:44.140
And so when I do
this thing, when
00:13:44.140 --> 00:13:46.810
I do Hoeffding's inequality,
what this thing is telling
00:13:46.810 --> 00:13:53.120
me is for any given epsilon
here, for any given epsilon,
00:13:53.120 --> 00:13:55.790
what is the probability
that Xn bar goes away
00:13:55.790 --> 00:13:58.370
from its expectation?
00:13:58.370 --> 00:14:02.030
All right, then we saw that it
decreases somewhat similarly
00:14:02.030 --> 00:14:04.560
to the way a Gaussian
would look like.
00:14:04.560 --> 00:14:08.120
So essentially what Hoeffding's
inequality is telling me, is
00:14:08.120 --> 00:14:18.120
that I have this picture, when
I have a Gaussian with mean mu,
00:14:18.120 --> 00:14:20.747
I know it looks
like this, right?
00:14:20.747 --> 00:14:22.330
What Hoeffding's
inequality is telling
00:14:22.330 --> 00:14:24.780
me is that if I actually
take the average
00:14:24.780 --> 00:14:27.740
of some bounded
random variables,
00:14:27.740 --> 00:14:30.494
then their probability
distribution function or maybe
00:14:30.494 --> 00:14:32.910
mass function-- this thing
might not even have [INAUDIBLE]
00:14:32.910 --> 00:14:35.540
the density, but let's think
of it as being a density just
00:14:35.540 --> 00:14:38.630
for simplicity-- it's
going to be something
00:14:38.630 --> 00:14:40.895
that's going to look like this.
00:14:40.895 --> 00:14:42.270
It's going to be
somewhat-- well,
00:14:42.270 --> 00:14:44.061
sometimes it's going
to have to escape just
00:14:44.061 --> 00:14:46.540
for the sake of
having integral 1.
00:14:46.540 --> 00:14:49.362
But it's essentially
telling me that those guys
00:14:49.362 --> 00:14:52.680
stay below those guys.
00:14:52.680 --> 00:14:56.610
The probability that
Xn bar exceeds mu
00:14:56.610 --> 00:14:58.940
is bounded by
something that decays
00:14:58.940 --> 00:15:00.802
like the tail of a Gaussian.
00:15:00.802 --> 00:15:03.010
So really that's the picture
you should have in mind.
00:15:03.010 --> 00:15:05.740
When I average bounded
random variables,
00:15:05.740 --> 00:15:08.240
I actually have something
that might be really rugged.
00:15:08.240 --> 00:15:10.510
It might not be smooth
like a Gaussian,
00:15:10.510 --> 00:15:12.620
but I know that it's always
bounded by a Gaussian.
00:15:12.620 --> 00:15:14.620
And what's nice about it
is that when I actually
00:15:14.620 --> 00:15:17.800
start computing probability
that exceeds some number,
00:15:17.800 --> 00:15:24.340
say alpha over 2, then I
know that this I can actually
00:15:24.340 --> 00:15:29.460
get a number, which is just--
00:15:29.460 --> 00:15:31.830
sorry, the probability
that it exceeds, yeah.
00:15:31.830 --> 00:15:33.580
So this number that I
get here is actually
00:15:33.580 --> 00:15:35.424
going to be somewhat
smaller, right?
00:15:35.424 --> 00:15:37.840
So that's going to be the q
alpha over 2 for the Gaussian,
00:15:37.840 --> 00:15:39.390
and that's going to be the--
00:15:39.390 --> 00:15:41.598
I don't know, r alpha over
2 for this [? Bernoulli ?]
00:15:41.598 --> 00:15:43.550
random variable.
00:15:43.550 --> 00:15:46.478
Like q prime or different q.
00:15:46.478 --> 00:15:50.149
So I can actually do
this without actually
00:15:50.149 --> 00:15:51.190
taking any limits, right?
00:15:51.190 --> 00:15:53.200
This is valid for any n.
00:15:53.200 --> 00:15:54.910
I don't need to
actually go to infinity.
00:15:54.910 --> 00:15:57.370
Now this seems a
bit magical, right?
00:15:57.370 --> 00:15:59.821
I mean, I just said
we need n to be,
00:15:59.821 --> 00:16:01.570
we discussed that we
wanted n to be larger
00:16:01.570 --> 00:16:03.660
than 30 last time for
the central limit theorem
00:16:03.660 --> 00:16:05.950
to kick in, and this
one seems to tell me
00:16:05.950 --> 00:16:07.940
I can do it for any n.
00:16:07.940 --> 00:16:12.970
Now there will be a price to pay
is that I pick up this 2 over b
00:16:12.970 --> 00:16:13.930
minus a squared.
00:16:13.930 --> 00:16:20.421
So that's the variance of the
Gaussian that I have, right?
00:16:20.421 --> 00:16:20.920
Sort of.
00:16:20.920 --> 00:16:23.450
That's telling me what
the variance should be,
00:16:23.450 --> 00:16:24.950
and this is actually
not as nice.
00:16:24.950 --> 00:16:27.530
I pick up a factor 4
compared to the Gaussian
00:16:27.530 --> 00:16:29.290
that I would get for that.
00:16:29.290 --> 00:16:32.230
So let's try to solve
it for our case.
00:16:32.230 --> 00:16:33.800
So I just told you try it.
00:16:33.800 --> 00:16:35.038
Did anybody try to do it?
00:16:37.362 --> 00:16:39.070
So we started from
this last time, right?
00:16:41.980 --> 00:16:43.730
And the reason was
that we could say
00:16:43.730 --> 00:16:46.490
that the probability that this
thing exceeds q alpha over 2
00:16:46.490 --> 00:16:47.690
is alpha.
00:16:47.690 --> 00:16:52.000
So that was using CLT, so let's
just keep it here, and see
00:16:52.000 --> 00:16:53.484
what we would do differently.
00:16:56.230 --> 00:16:58.530
What Hoeffding tells me is
that the probability that Xn
00:16:58.530 --> 00:17:00.070
bar minus--
00:17:00.070 --> 00:17:04.265
well, what is mu in this case?
00:17:04.265 --> 00:17:06.220
It's p, right?
00:17:06.220 --> 00:17:07.660
It's just notation here.
00:17:07.660 --> 00:17:09.280
Mu was the average,
but we call it
00:17:09.280 --> 00:17:12.970
p in the case of
Bernoulli's, exceeds--
00:17:12.970 --> 00:17:17.220
let's just call it
epsilon for a second.
00:17:17.220 --> 00:17:19.271
So we said that this
was bounded by what?
00:17:19.271 --> 00:17:21.020
So Hoeffding tells me
that this is bounded
00:17:21.020 --> 00:17:26.059
by 2 times exponential minus 2.
00:17:26.059 --> 00:17:29.150
Now the nice thing is that
I pick up a factor n here,
00:17:29.150 --> 00:17:30.150
epsilon squared.
00:17:30.150 --> 00:17:33.130
And what is b minus a
squared for the Bernoulli's?
00:17:33.130 --> 00:17:33.840
1.
00:17:33.840 --> 00:17:36.517
So I don't have a
denominator here.
00:17:36.517 --> 00:17:38.350
And I'm going to do
exactly what I did here.
00:17:38.350 --> 00:17:40.720
I'm going to set this
guy to be equal to alpha.
00:17:43.240 --> 00:17:46.640
So that if I get
alpha here, then that
00:17:46.640 --> 00:17:50.100
means that just
solving for epsilon,
00:17:50.100 --> 00:17:52.600
I'm going to have some number,
which will play the role of q
00:17:52.600 --> 00:17:54.200
alpha over 2, and
then I'm going to be
00:17:54.200 --> 00:17:58.400
able to just say that p
is between X bar and minus
00:17:58.400 --> 00:18:00.834
epsilon, and X bar
n plus epsilon.
00:18:00.834 --> 00:18:02.268
OK, so let's do it.
00:18:05.140 --> 00:18:06.780
So we have to
solve the equation.
00:18:14.572 --> 00:18:20.770
2 exponential minus 2n
epsilon squared equals alpha,
00:18:20.770 --> 00:18:22.846
which means that--
00:18:22.846 --> 00:18:26.542
so here I'm going to get,
there's a 2 right here.
00:18:26.542 --> 00:18:29.200
So that means that I
get alpha over 2 here.
00:18:29.200 --> 00:18:30.850
Then I take the
logs on both sides,
00:18:30.850 --> 00:18:32.058
and now let me just write it.
00:18:36.650 --> 00:18:39.430
And then I want to
solve for epsilon.
00:18:39.430 --> 00:18:43.350
So that means that epsilon
is equal to square root log
00:18:43.350 --> 00:18:45.860
2 over alpha divided by 2n.
00:18:50.618 --> 00:18:51.118
Yes?
00:18:51.118 --> 00:18:53.030
AUDIENCE: [INAUDIBLE]
00:18:53.030 --> 00:18:55.410
PHILIPPE RIGOLLET:
Why is b minus a 1?
00:18:55.410 --> 00:18:57.840
Well, let's just look, right?
00:18:57.840 --> 00:19:00.860
X lives in the
interval from a to b.
00:19:00.860 --> 00:19:06.110
So I can take b to be 25,
and a to be my negative 42.
00:19:06.110 --> 00:19:09.134
But I'm going to try to
be as sharp as I can.
00:19:09.134 --> 00:19:10.800
All right, so what
is the smallest value
00:19:10.800 --> 00:19:13.340
you can think of such that
a Bernoulli random variable
00:19:13.340 --> 00:19:15.290
is larger than or
equal to this value?
00:19:19.510 --> 00:19:23.740
What values does a Bernoulli
random variable take?
00:19:23.740 --> 00:19:24.530
0 and 1.
00:19:24.530 --> 00:19:29.240
So it takes values
between 0 and 1.
00:19:29.240 --> 00:19:31.280
It just maxes the value.
00:19:31.280 --> 00:19:33.860
Actually, this is the
worst possible case
00:19:33.860 --> 00:19:38.130
for the Hoeffding inequality.
00:19:38.130 --> 00:19:40.250
So now I just get this
one, and so now you
00:19:40.250 --> 00:19:41.750
tell me that I have this thing.
00:19:41.750 --> 00:19:43.260
So when I solve
this guy over there.
00:19:43.260 --> 00:19:46.070
So combining this
thing and this thing
00:19:46.070 --> 00:19:53.300
implies that the probability
that p lives between Xn
00:19:53.300 --> 00:20:01.660
bar minus square root log 2
over alpha divided by 2n and Xn
00:20:01.660 --> 00:20:10.970
bar plus the square root log
2 over alpha divided by 2n
00:20:10.970 --> 00:20:12.020
is equal to?
00:20:15.170 --> 00:20:16.882
I mean, is at least.
00:20:16.882 --> 00:20:18.090
What is it at least equal to?
00:20:22.930 --> 00:20:25.870
Here this controls the
probability of being outside
00:20:25.870 --> 00:20:27.180
of this interval, right?
00:20:27.180 --> 00:20:31.730
It tells me the probability
that Xn bar is far from p
00:20:31.730 --> 00:20:32.605
by more than epsilon.
00:20:32.605 --> 00:20:34.521
So there's a probability
that they're actually
00:20:34.521 --> 00:20:36.640
outside of the interval
that I just wrote.
00:20:36.640 --> 00:20:39.650
So it's 1 minus the probability
of being in the interval.
00:20:39.650 --> 00:20:43.882
So this is at least
1 minus alpha.
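The interval just derived can be written as a short function. This is a sketch: the name `hoeffding_interval` and the default arguments are illustrative, and the general bounds a and b are kept only to show where b minus a squared would enter. It solves 2 exponential minus 2n epsilon squared over b minus a squared equals alpha for epsilon.

```python
from math import exp, log, sqrt

def hoeffding_interval(x_bar, n, alpha=0.05, a=0.0, b=1.0):
    """Non-asymptotic interval for the mean of n i.i.d. variables taking values in [a, b]."""
    # Solving 2 exp(-2 n eps^2 / (b - a)^2) = alpha for eps.
    eps = (b - a) * sqrt(log(2 / alpha) / (2 * n))
    return x_bar - eps, x_bar + eps

# Valid for any n, even n = 3, although the interval is then very wide.
low, high = hoeffding_interval(2 / 3, 3)
eps = (high - low) / 2
print(low, high)
print(2 * exp(-2 * 3 * eps**2))  # plugging eps back in recovers alpha = 0.05
```

For n equal to 3 the interval is wider than the whole range 0 to 1, which is the price of a statement that holds without any limit.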
00:20:43.882 --> 00:20:46.340
So I just use the fact that a
probability of the complement
00:20:46.340 --> 00:20:50.100
is 1 minus the
probability of the set.
00:20:50.100 --> 00:20:53.460
And since I have an upper bound
on the probability of the set,
00:20:53.460 --> 00:20:59.100
I have a lower bound on the
probability of the complement.
00:20:59.100 --> 00:21:03.170
So now it's a bit different.
00:21:03.170 --> 00:21:06.640
Before, we actually wrote
something that was--
00:21:06.640 --> 00:21:08.010
so let me get it back.
00:21:08.010 --> 00:21:11.990
So if we go back to the example
where we took the [INAUDIBLE]
00:21:11.990 --> 00:21:16.840
over p, we got this guy.
00:21:16.840 --> 00:21:19.990
q alpha over square root of--
00:21:19.990 --> 00:21:21.700
over 2 square root n.
00:21:21.700 --> 00:21:24.966
So we had Xn bar plus minus
q alpha over 2 square root n.
00:21:24.966 --> 00:21:27.340
Actually, that was q alpha
over 2, I'm sorry about that.
00:21:30.730 --> 00:21:34.540
And so now we have something
that replaces this q alpha,
00:21:34.540 --> 00:21:40.880
and it's essentially square
root of 2 log 2 over alpha.
00:21:40.880 --> 00:21:43.580
Because if I replace
q alpha by square root
00:21:43.580 --> 00:21:47.240
of 2 log 2 over
alpha, I actually
00:21:47.240 --> 00:21:49.338
get exactly this thing here.
00:21:52.030 --> 00:21:55.970
And so the question is,
what would you guess?
00:21:55.970 --> 00:22:01.790
Is this number, this margin,
square root of log 2 over alpha
00:22:01.790 --> 00:22:05.930
divided by 2n, is it smaller
or larger than this guy?
00:22:05.930 --> 00:22:08.915
q alpha over 2, all over 2 root n.
00:22:08.915 --> 00:22:09.810
Yes?
00:22:09.810 --> 00:22:10.640
Larger.
00:22:10.640 --> 00:22:12.180
Everybody agrees with this?
00:22:12.180 --> 00:22:14.690
Just qualitatively?
00:22:14.690 --> 00:22:17.430
Right, because we just made a
very conservative statement.
00:22:17.430 --> 00:22:18.510
We do not use anything.
00:22:18.510 --> 00:22:20.100
This is true always.
00:22:20.100 --> 00:22:22.080
So it can only be better.
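NOTE
To see the comparison numerically, here is a small sketch. Both margins have the form (multiplier) / (2 sqrt(n)), so it is enough to compare the multipliers: the Gaussian quantile q_{alpha/2} against sqrt(2 log(2/alpha)). The function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist


def clt_multiplier(alpha: float) -> float:
    """Quantile q_{alpha/2} of the standard Gaussian: P(Z > q) = alpha / 2."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


def hoeffding_multiplier(alpha: float) -> float:
    """sqrt(2 log(2/alpha)), the quantity that replaces q_{alpha/2}."""
    return math.sqrt(2.0 * math.log(2.0 / alpha))


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(clt_multiplier(alpha), 3), round(hoeffding_multiplier(alpha), 3))
```

The Hoeffding multiplier is the larger of the two for every alpha in (0, 1), which matches the qualitative answer given in the lecture.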
00:22:22.080 --> 00:22:24.840
The reason in statistics where
you use those assumptions
00:22:24.840 --> 00:22:27.590
that n is large enough, that you
have this independence that you
00:22:27.590 --> 00:22:30.090
like so much, and so you can
actually have the central limit
00:22:30.090 --> 00:22:32.290
theorem kick in,
all these things
00:22:32.290 --> 00:22:35.500
are for you to have
enough assumptions
00:22:35.500 --> 00:22:38.190
so that you can actually make
sharper and sharper decisions.
00:22:38.190 --> 00:22:40.249
More and more
confident statements.
00:22:40.249 --> 00:22:42.540
And that's why there's all
this junk science out there,
00:22:42.540 --> 00:22:45.540
because people make too many
assumptions for their own good.
00:22:45.540 --> 00:22:46.956
They're saying,
well, let's assume
00:22:46.956 --> 00:22:50.720
that everything is the way I
love it, so that I can for sure
00:22:50.720 --> 00:22:53.539
any minor change, I
will be able to say
00:22:53.539 --> 00:22:55.830
that's because I made an
important scientific discovery
00:22:55.830 --> 00:23:02.050
rather than, well, that
was just [INAUDIBLE] OK?
00:23:02.050 --> 00:23:04.350
So now here's the fun moment.
00:23:04.350 --> 00:23:09.110
And actually let me tell you
why we look at this thing.
00:23:09.110 --> 00:23:11.600
So there's actually--
who has seen
00:23:11.600 --> 00:23:14.328
different types of convergence
in a probability or statistics
00:23:14.328 --> 00:23:14.828
class?
00:23:17.900 --> 00:23:20.430
[INAUDIBLE] students.
00:23:20.430 --> 00:23:22.340
And so there's
different types of--
00:23:22.340 --> 00:23:25.610
in the real numbers
it's very simple.
00:23:25.610 --> 00:23:27.160
There's one
convergence, Xn tends
00:23:27.160 --> 00:23:29.680
to X. If you start thinking
about functions,
00:23:29.680 --> 00:23:32.230
well, maybe you have
uniform convergence,
00:23:32.230 --> 00:23:33.610
you have pointwise convergence.
00:23:33.610 --> 00:23:34.990
So if you've done
some real analysis,
00:23:34.990 --> 00:23:36.948
you know there's different
types of convergence
00:23:36.948 --> 00:23:37.790
you can think of.
00:23:37.790 --> 00:23:40.437
And in the convergence
of random variables,
00:23:40.437 --> 00:23:42.770
there's also different types,
but for different reasons.
00:23:42.770 --> 00:23:44.802
It's just because the
question is, what do you
00:23:44.802 --> 00:23:45.760
do with the randomness?
00:23:45.760 --> 00:23:47.885
When you say that something
converges to something,
00:23:47.885 --> 00:23:50.620
it probably means that
you're willing to tolerate
00:23:50.620 --> 00:23:54.190
low probability things happening
or where it doesn't happen,
00:23:54.190 --> 00:23:56.350
and how you
handle those creates
00:23:56.350 --> 00:23:58.670
different types of convergence.
00:23:58.670 --> 00:24:03.340
So to be fair, in statistics the
only convergence we care about
00:24:03.340 --> 00:24:05.600
is the convergence
in distribution.
00:24:05.600 --> 00:24:07.857
That's this one.
00:24:07.857 --> 00:24:09.940
The one that comes from
the central limit theorem.
00:24:09.940 --> 00:24:12.617
That's actually the weakest
possible you could make.
00:24:12.617 --> 00:24:14.200
Which is good, because
that means it's
00:24:14.200 --> 00:24:16.150
going to happen more often.
00:24:16.150 --> 00:24:17.840
And so why do we
need this thing?
00:24:17.840 --> 00:24:19.400
Because the only
thing we really need
00:24:19.400 --> 00:24:21.580
to do is to say that
when I start computing
00:24:21.580 --> 00:24:23.854
probabilities on
this random variable,
00:24:23.854 --> 00:24:25.520
they're going to look
like probabilities
00:24:25.520 --> 00:24:27.840
on that random variable.
00:24:27.840 --> 00:24:30.000
All right, so for example,
think of the following
00:24:30.000 --> 00:24:41.070
two random variables,
x and minus x.
00:24:41.070 --> 00:24:42.570
So this is the same
random variable,
00:24:42.570 --> 00:24:44.970
and this one is negative.
00:24:44.970 --> 00:24:48.050
When I look at those
two random variables,
00:24:48.050 --> 00:24:51.310
think of them as a sequence,
a constant sequence.
00:24:51.310 --> 00:24:53.970
These two constant sequences
do not go to the same number,
00:24:53.970 --> 00:24:54.470
right?
00:24:54.470 --> 00:24:57.910
One is plus-- one is x,
the other one is minus x.
00:24:57.910 --> 00:25:01.240
So unless x is the random
variable always equal to 0,
00:25:01.240 --> 00:25:03.290
those two things are different.
00:25:03.290 --> 00:25:05.920
However, when I compute
probabilities on this guy,
00:25:05.920 --> 00:25:09.010
and when I compute probabilities
on that guy, they're the same.
00:25:09.010 --> 00:25:12.250
Because x and minus x
have the same distribution
00:25:12.250 --> 00:25:15.430
just by symmetry of the
Gaussian random variables.
00:25:15.430 --> 00:25:17.040
And so you can see
this is very weak.
00:25:17.040 --> 00:25:19.150
I'm not saying anything about
the two random variables being
00:25:19.150 --> 00:25:20.566
close to each other
every time I'm
00:25:20.566 --> 00:25:22.100
going to flip my coin, right?
00:25:22.100 --> 00:25:25.685
Maybe I'm going to press my
computer and say, what is x?
00:25:25.685 --> 00:25:26.560
Well, it's 1.2.
00:25:26.560 --> 00:25:29.110
Negative x is going
to be negative 1.2.
00:25:29.110 --> 00:25:30.670
Those things are
far apart, and it
00:25:30.670 --> 00:25:32.230
doesn't matter, because
on average those things
00:25:32.230 --> 00:25:34.330
are going to have the same
probabilities of happening.
00:25:34.330 --> 00:25:36.040
And that's all we care
about in statistics.
00:25:36.040 --> 00:25:37.810
You need to realize that
this is what's important,
00:25:37.810 --> 00:25:39.130
and why you need to know.
00:25:39.130 --> 00:25:40.600
Because you have it really good.
00:25:40.600 --> 00:25:43.120
If your problem is you really
care more about convergence
00:25:43.120 --> 00:25:45.956
almost surely, which is probably
the strongest you can think of.
00:25:45.956 --> 00:25:48.590
So what we're going to do is
talk about different types
00:25:48.590 --> 00:25:51.200
of convergence not to
just reflect on the fact
00:25:51.200 --> 00:25:53.120
of how good our life is.
00:25:53.120 --> 00:25:56.420
It's just that the problem
is that the convergence
00:25:56.420 --> 00:26:00.110
in distribution is so weak that
I cannot do anything I want
00:26:00.110 --> 00:26:00.740
with it.
00:26:00.740 --> 00:26:04.400
In particular, I cannot
say that if X converges,
00:26:04.400 --> 00:26:07.230
Xn converges in distribution,
and Yn converges
00:26:07.230 --> 00:26:10.790
in distribution, then Xn plus
Yn converge in distribution
00:26:10.790 --> 00:26:12.080
to the sum of their limits.
00:26:12.080 --> 00:26:12.890
I cannot do that.
00:26:12.890 --> 00:26:14.855
It's just too weak.
00:26:14.855 --> 00:26:16.730
Think of this example
and it's preventing you
00:26:16.730 --> 00:26:17.896
to do quite a lot of things.
00:26:20.820 --> 00:26:26.030
So this converges in
distribution to some N(0, 1).
00:26:26.030 --> 00:26:28.940
This converges in
distribution to some N(0, 1).
00:26:28.940 --> 00:26:31.490
But their sum is 0, and
it's certainly not--
00:26:31.490 --> 00:26:33.830
it doesn't look
like the sum of two
00:26:33.830 --> 00:26:36.440
independent Gaussian
random variables, right?
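NOTE
The X and minus X example can be checked with a short simulation. This is a sketch, assuming X is a standard Gaussian; the function name, seed and sample size are arbitrary choices for illustration. It estimates P(X <= 1) and P(-X <= 1), which should agree, and then forms the sums X + (-X), which are all exactly 0 rather than looking like a draw from N(0, 2).

```python
import random


def simulate(num_draws: int = 10_000, seed: int = 0):
    """Draw X ~ N(0, 1) and compare X, -X, and their sum."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(num_draws)]
    neg_xs = [-x for x in xs]
    # Same distribution: empirical probabilities of the event {. <= 1} agree.
    frac_x = sum(x <= 1.0 for x in xs) / num_draws
    frac_neg = sum(y <= 1.0 for y in neg_xs) / num_draws
    # But the sum is identically 0, unlike a sum of two independent Gaussians.
    sums = [x + y for x, y in zip(xs, neg_xs)]
    return frac_x, frac_neg, sums


frac_x, frac_neg, sums = simulate()
print(frac_x, frac_neg, max(abs(s) for s in sums))
```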
00:26:36.440 --> 00:26:40.220
And so what we need is to
have stronger conditions here
00:26:40.220 --> 00:26:42.950
and there, so that we can
actually put things together.
00:26:42.950 --> 00:26:45.176
And we're going to have
more complicated formulas.
00:26:45.176 --> 00:26:46.550
One of the formulas,
for example,
00:26:46.550 --> 00:26:50.430
is if I replace p by p
hat in this denominator.
00:26:50.430 --> 00:26:53.470
We mentioned doing
this at some point.
00:26:53.470 --> 00:26:57.550
So I would need that
p hat goes to p,
00:26:57.550 --> 00:26:59.320
but I need stronger
than in distribution
00:26:59.320 --> 00:27:00.420
so that this happens.
00:27:00.420 --> 00:27:04.270
I actually need this to
happen in a stronger sense.
00:27:04.270 --> 00:27:07.690
So here are the first two
strongest senses in which
00:27:07.690 --> 00:27:09.670
random variables can converge.
00:27:09.670 --> 00:27:13.140
The first one is almost surely.
00:27:13.140 --> 00:27:16.570
And who has already seen
this notation little omega
00:27:16.570 --> 00:27:19.490
when they're talking
about random variables?
00:27:19.490 --> 00:27:20.510
All right, so very few.
00:27:20.510 --> 00:27:24.012
So this little omega is-- so
what is a random variable?
00:27:24.012 --> 00:27:25.970
A random variable is
something that you measure
00:27:25.970 --> 00:27:27.625
on something that's random.
00:27:27.625 --> 00:27:29.000
So the example I
like to think of
00:27:29.000 --> 00:27:34.910
is, if you take a ball
of snow, and put it
00:27:34.910 --> 00:27:37.070
in the sun for some time.
00:27:37.070 --> 00:27:38.212
You come back.
00:27:38.212 --> 00:27:39.920
It's going to have a
random shape, right?
00:27:39.920 --> 00:27:42.604
It's going to be a random
blob of something.
00:27:42.604 --> 00:27:45.020
But there's still a bunch of
things you can measure on it.
00:27:45.020 --> 00:27:46.410
You can measure its volume.
00:27:46.410 --> 00:27:48.200
You can measure its
inner temperature.
00:27:48.200 --> 00:27:50.210
You can measure
its surface area.
00:27:50.210 --> 00:27:52.250
All these things are
random variables,
00:27:52.250 --> 00:27:54.590
but the ball itself is omega.
00:27:54.590 --> 00:27:56.900
That's the thing on which
you make your measurement.
00:27:56.900 --> 00:28:00.870
And so a random variable is
just a function of those omegas.
00:28:00.870 --> 00:28:03.210
Now why do we make all
these things fancy?
00:28:03.210 --> 00:28:04.800
Because you cannot
take any function.
00:28:04.800 --> 00:28:06.841
This function has to be
what's called measurable,
00:28:06.841 --> 00:28:09.070
and there's entire
courses on measure theory,
00:28:09.070 --> 00:28:11.030
and not everything
is measurable.
00:28:11.030 --> 00:28:13.175
And so that's why you have
to be a little careful
00:28:13.175 --> 00:28:14.550
why not everything
is measurable,
00:28:14.550 --> 00:28:17.590
because you need some
sort of nice property.
00:28:17.590 --> 00:28:19.797
So that the measure
of something,
00:28:19.797 --> 00:28:22.380
the union of two things, is less
than the sum of the measures,
00:28:22.380 --> 00:28:23.830
things like that.
00:28:23.830 --> 00:28:30.940
And so almost surely is telling
you that for most of the balls,
00:28:30.940 --> 00:28:34.540
for most of the omegas,
that's the right-hand side.
00:28:34.540 --> 00:28:37.150
The probability of the omegas
such that those things converge
00:28:37.150 --> 00:28:41.400
to each other is
actually equal to 1.
00:28:41.400 --> 00:28:45.620
So it tells me that for almost
all omegas, all the omegas,
00:28:45.620 --> 00:28:47.246
if I put them together,
I get something
00:28:47.246 --> 00:28:48.328
that has probability of 1.
00:28:48.328 --> 00:28:50.970
It might be that there are other
ones that have probability 0,
00:28:50.970 --> 00:28:52.680
but what it's telling
is that this thing
00:28:52.680 --> 00:28:55.841
happens for essentially all possible
realizations of the underlying
00:28:55.841 --> 00:28:56.340
thing.
00:28:56.340 --> 00:28:57.720
That's very strong.
00:28:57.720 --> 00:29:00.141
It essentially says
randomness does not matter,
00:29:00.141 --> 00:29:01.390
because it's happening always.
00:29:04.310 --> 00:29:06.340
Now convergence in
probability allows
00:29:06.340 --> 00:29:09.180
you to squeeze a little bit
of probability under the rock.
00:29:09.180 --> 00:29:12.130
It tells you I want the
convergence to hold,
00:29:12.130 --> 00:29:17.120
but I'm willing to let go
of some little epsilon.
00:29:17.120 --> 00:29:23.500
So I'm willing to allow Tn
to be less than epsilon.
00:29:23.500 --> 00:29:27.380
Tn minus T to be-- sorry,
to be larger than epsilon.
00:29:27.380 --> 00:29:29.360
But I want the probability
of this to go to 0
00:29:29.360 --> 00:29:31.430
as well as n goes to
infinity, but for each
00:29:31.430 --> 00:29:34.091
n this thing does not
have to be 0, which
00:29:34.091 --> 00:29:36.250
is different from here, right?
00:29:36.250 --> 00:29:40.140
So this probability
here is fine.
00:29:40.140 --> 00:29:44.460
So it's a little weaker, but
it's a slightly different one.
00:29:44.460 --> 00:29:46.860
I'm not going to ask you
to learn and show that one
00:29:46.860 --> 00:29:48.510
is weaker than the other one.
00:29:48.510 --> 00:29:51.010
But just know that these
are two different types.
00:29:51.010 --> 00:29:53.805
This one is actually much
easier to check than this one.
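NOTE
Written out, the two definitions being pointed at on the slide are presumably the standard ones below. The notation (T_n, T, omega, epsilon) follows the lecture; the exact typesetting is an assumption.

```latex
% T_n and T are random variables on the same probability space; omega is an outcome.
\[
T_n \xrightarrow[n \to \infty]{\text{a.s.}} T
\quad\Longleftrightarrow\quad
\mathbb{P}\Big( \big\{ \omega : T_n(\omega) \xrightarrow[n \to \infty]{} T(\omega) \big\} \Big) = 1
\]
\[
T_n \xrightarrow[n \to \infty]{\mathbb{P}} T
\quad\Longleftrightarrow\quad
\mathbb{P}\big( |T_n - T| \ge \varepsilon \big) \xrightarrow[n \to \infty]{} 0
\quad \text{for every } \varepsilon > 0
\]
```

In the first definition the limit sits inside the probability; in the second, the probability is computed for each n and only then sent to 0.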
00:30:02.550 --> 00:30:06.550
Then there's something
called convergence in Lp.
00:30:06.550 --> 00:30:09.200
So this one is the fact that
it embodies the following fact.
00:30:09.200 --> 00:30:11.740
If I give you a random
variable with mean 0,
00:30:11.740 --> 00:30:14.110
and I tell you that its
variance is going to 0, right?
00:30:14.110 --> 00:30:16.795
You have a sequence of random
variables, their mean is 0,
00:30:16.795 --> 00:30:20.390
their expectation is 0, but
their variance is going to 0.
00:30:20.390 --> 00:30:23.500
So think of Gaussian random
variables with mean 0,
00:30:23.500 --> 00:30:26.300
and a variance
that shrinks to 0.
00:30:26.300 --> 00:30:29.260
And this random variable
converges to a spike at 0,
00:30:29.260 --> 00:30:31.570
so it converges to 0, right?
00:30:31.570 --> 00:30:35.660
And so what I mean by that is
that to have this convergence,
00:30:35.660 --> 00:30:38.800
all I had to tell you was that
the variance was going to 0.
00:30:38.800 --> 00:30:41.700
And so in L2 this is really
what it's telling you.
00:30:41.700 --> 00:30:44.720
It's telling you, well, if
the variance is going to 0--
00:30:44.720 --> 00:30:46.870
well, it's for any
random variable T,
00:30:46.870 --> 00:30:50.240
so here what I describe
was for a deterministic.
00:30:50.240 --> 00:30:55.340
So Tn goes to a random variable
T. If you look at the square--
00:30:55.340 --> 00:30:58.415
the expectation of the square
distance, and it goes to 0.
00:30:58.415 --> 00:31:00.540
But you don't have to limit
yourself to the square.
00:31:00.540 --> 00:31:01.910
You can take power of three.
00:31:01.910 --> 00:31:06.780
You can take power
67.6, power of 9 pi.
00:31:06.780 --> 00:31:09.780
You take whatever power you
want, it can be fractional.
00:31:09.780 --> 00:31:13.920
It has to be at least 1, and
that's the convergence in Lp.
00:31:13.920 --> 00:31:17.520
But we mostly care
about integer p.
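NOTE
The shrinking-variance Gaussian example can be made concrete. This sketch assumes T_n ~ N(0, 1/n) and the limit T = 0; the function names are hypothetical. It computes the L2 quantity E|T_n - 0|^2 = 1/n and the exact tail probability P(|T_n| > eps), and the two are linked by Chebyshev's inequality, P(|T_n| > eps) <= E|T_n|^2 / eps^2.

```python
import math
from statistics import NormalDist


def second_moment(n: int) -> float:
    """E|T_n - 0|^2 for T_n ~ N(0, 1/n); this is just the variance 1/n."""
    return 1.0 / n


def tail_probability(n: int, eps: float) -> float:
    """P(|T_n| > eps) for T_n ~ N(0, 1/n), i.e. 2 * (1 - Phi(eps * sqrt(n)))."""
    return 2.0 * (1.0 - NormalDist().cdf(eps * math.sqrt(n)))


for n in (1, 10, 100, 1000):
    print(n, second_moment(n), tail_probability(n, eps=0.1))
```

Both columns go to 0 as n grows, which is the sense in which the sequence "converges to a spike at 0".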
00:31:17.520 --> 00:31:20.107
And then here's our star, the
convergence in distribution,
00:31:20.107 --> 00:31:21.690
and that's just the
one that tells you
00:31:21.690 --> 00:31:27.290
that when I start computing
probabilities on the Tn,
00:31:27.290 --> 00:31:31.620
they're going to look very close
to the probabilities on the T.
00:31:31.620 --> 00:31:34.410
So that was our Tn with
this guy, for example,
00:31:34.410 --> 00:31:37.110
and T was this standard
Gaussian distribution.
00:31:37.110 --> 00:31:38.960
Now here, this is
not any probability.
00:31:38.960 --> 00:31:42.440
This is just the probability
that Tn is less than or equal to x.
00:31:42.440 --> 00:31:44.390
But if you remember
your probability class,
00:31:44.390 --> 00:31:45.830
if you can compute
those probabilities,
00:31:45.830 --> 00:31:47.204
you can compute
any probabilities
00:31:47.204 --> 00:31:50.034
just by subtracting and just
building things together.
00:31:55.230 --> 00:31:58.650
Well, I need this for all x's,
so I want this for each x,
00:31:58.650 --> 00:32:01.520
So you fix x, and then you
make the limit go to infinity.
00:32:01.520 --> 00:32:03.180
You make n go to
infinity, and I want
00:32:03.180 --> 00:32:06.480
this for the point x's at which
the cumulative distribution
00:32:06.480 --> 00:32:08.230
function of T is continuous.
00:32:08.230 --> 00:32:15.350
There might be jumps, and
I don't actually care about those.
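NOTE
A tiny example of why the jump points are excluded, as a sketch that is not from the lecture: take the deterministic sequence T_n = 1/n, which clearly should converge to T = 0. At x = 0, where the CDF of T jumps, the CDFs of T_n never converge to the CDF of T, yet they do converge at every other x.

```python
def cdf_tn(x: float, n: int) -> float:
    """CDF of the constant random variable T_n = 1/n."""
    return 1.0 if x >= 1.0 / n else 0.0


def cdf_t(x: float) -> float:
    """CDF of the constant random variable T = 0 (jumps at x = 0)."""
    return 1.0 if x >= 0.0 else 0.0


# At the jump x = 0: cdf_tn stays at 0 for every n, but cdf_t(0) = 1.
print([cdf_tn(0.0, n) for n in (1, 10, 100)], cdf_t(0.0))
# At a continuity point such as x = 0.05: cdf_tn reaches cdf_t once 1/n <= 0.05.
print([cdf_tn(0.05, n) for n in (1, 10, 100)], cdf_t(0.05))
```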
00:32:15.350 --> 00:32:17.777
All right, so here I mentioned
it for random variables.
00:32:17.777 --> 00:32:19.860
If you're interested,
there's also random vectors.
00:32:19.860 --> 00:32:23.430
A random vector is just a
table of random variables.
00:32:23.430 --> 00:32:25.351
You can talk about
random matrices.
00:32:25.351 --> 00:32:27.350
And you can talk about
random whatever you want.
00:32:27.350 --> 00:32:28.920
Every time you have
an object that's
00:32:28.920 --> 00:32:31.140
just collecting real
numbers, you can just
00:32:31.140 --> 00:32:33.370
plug random variables in there.
00:32:33.370 --> 00:32:37.050
And so there's all these
definitions that [? extend. ?]
00:32:37.050 --> 00:32:39.080
So wherever you
see an absolute value,
00:32:39.080 --> 00:32:40.166
you'll see a norm.
00:32:40.166 --> 00:32:43.040
Things like this.
00:32:43.040 --> 00:32:46.070
So I'm sure this might
look scary a little bit,
00:32:46.070 --> 00:32:49.010
but really what we are going to
use is only the last one, which
00:32:49.010 --> 00:32:50.426
as you can see is
just telling you
00:32:50.426 --> 00:32:52.890
that the probabilities
converge to the probabilities.
00:32:52.890 --> 00:32:55.830
But I'm going to need the other
ones every once in a while.
00:32:55.830 --> 00:32:59.670
And the reason is,
well, OK, so here I'm
00:32:59.670 --> 00:33:02.970
actually going to the
important characterizations
00:33:02.970 --> 00:33:05.340
of the convergence
in distribution,
00:33:05.340 --> 00:33:08.110
which is our convergence style.
00:33:08.110 --> 00:33:10.200
So Tn converges in
distribution if and only
00:33:10.200 --> 00:33:14.070
if for any function that's
continuous and bounded,
00:33:14.070 --> 00:33:16.170
when I look at the
expectation of f of Tn,
00:33:16.170 --> 00:33:19.870
this converges to the
expectation of f of T. OK,
00:33:19.870 --> 00:33:25.127
so this is just those two
things are actually equivalent.
00:33:25.127 --> 00:33:27.710
Sometimes it's easier to check
one, easier to check the other,
00:33:27.710 --> 00:33:30.043
but in this class you won't
have to prove that something
00:33:30.043 --> 00:33:33.380
converges in distribution
other than just combining
00:33:33.380 --> 00:33:37.240
our existing
convergence results.
00:33:37.240 --> 00:33:40.020
And then the last one which
is equivalent to the above two
00:33:40.020 --> 00:33:42.990
is, anybody knows what the
name of this quantity is?
00:33:42.990 --> 00:33:45.120
This expectation here?
00:33:45.120 --> 00:33:47.160
What is it called?
00:33:47.160 --> 00:33:49.080
The characteristic
function, right?
00:33:49.080 --> 00:33:52.680
And so this i is the complex
i, and is the complex number.
00:33:52.680 --> 00:33:54.120
And so it's
essentially telling me
00:33:54.120 --> 00:33:56.070
that, well, rather
than actually looking
00:33:56.070 --> 00:33:58.620
at all bounded and continuous
but real functions,
00:33:58.620 --> 00:34:03.630
I can actually look
at one specific family
00:34:03.630 --> 00:34:08.290
of complex functions, which
are the functions that map
00:34:08.290 --> 00:34:12.980
T to e to the ixT
for x in R. That's
00:34:12.980 --> 00:34:14.880
a much smaller
family of functions.
00:34:14.880 --> 00:34:17.280
All possible continuous
and bounded functions
00:34:17.280 --> 00:34:21.590
has many more elements
than just the real element.
00:34:21.590 --> 00:34:24.310
And so now I can show that
if I limit myself to do it,
00:34:24.310 --> 00:34:25.492
it's actually sufficient.
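NOTE
For reference, the three equivalent characterizations of convergence in distribution discussed here can be written side by side. This is a summary sketch in standard notation; the numbering (i) to (iii) is an assumption about the slide's layout.

```latex
\begin{align*}
\text{(i)}\quad & \mathbb{P}(T_n \le x) \xrightarrow[n\to\infty]{} \mathbb{P}(T \le x)
  && \text{for every } x \text{ at which the CDF of } T \text{ is continuous} \\
\text{(ii)}\quad & \mathbb{E}[f(T_n)] \xrightarrow[n\to\infty]{} \mathbb{E}[f(T)]
  && \text{for every continuous and bounded } f \\
\text{(iii)}\quad & \mathbb{E}\big[e^{ixT_n}\big] \xrightarrow[n\to\infty]{} \mathbb{E}\big[e^{ixT}\big]
  && \text{for every } x \in \mathbb{R}
\end{align*}
```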
00:34:28.139 --> 00:34:32.520
So those three things are used
all over the literature just
00:34:32.520 --> 00:34:33.360
to show things.
00:34:33.360 --> 00:34:37.219
In particular, if you're
interested in digging
00:34:37.219 --> 00:34:39.510
a little more mathematically,
the central limit theorem
00:34:39.510 --> 00:34:40.510
is going to be so important.
00:34:40.510 --> 00:34:42.120
Maybe you want to read
about how to prove it.
00:34:42.120 --> 00:34:43.949
We're not going to
prove it in this class.
00:34:43.949 --> 00:34:49.800
There's probably at least five
different ways of proving it,
00:34:49.800 --> 00:34:52.440
but the most canonical one, the
one that you find in textbooks,
00:34:52.440 --> 00:34:55.980
is the one that actually
uses the third element.
00:34:55.980 --> 00:34:59.100
So you just look at the
characteristic function
00:34:59.100 --> 00:35:04.400
of the square root of
n Xn bar minus say mu,
00:35:04.400 --> 00:35:07.850
and you just expand the thing,
and this is what you get.
00:35:07.850 --> 00:35:09.230
And you will see
that in the end,
00:35:09.230 --> 00:35:13.820
you will get the characteristic
function of a Gaussian.
00:35:13.820 --> 00:35:14.570
Why a Gaussian?
00:35:14.570 --> 00:35:15.800
Why does it kick in?
00:35:15.800 --> 00:35:17.420
Well, because what is the
characteristic function
00:35:17.420 --> 00:35:17.900
of a Gaussian?
00:35:17.900 --> 00:35:19.760
Does anybody remember the
characteristic function
00:35:19.760 --> 00:35:20.718
of a standard Gaussian?
00:35:20.718 --> 00:35:21.929
AUDIENCE: [INAUDIBLE]
00:35:21.929 --> 00:35:23.470
PHILIPPE RIGOLLET:
Yeah, well, I mean
00:35:23.470 --> 00:35:27.760
there's two pi's and stuff
that goes away, right?
00:35:27.760 --> 00:35:29.330
A Gaussian is a random variable.
00:35:29.330 --> 00:35:31.140
A characteristic
function is a function,
00:35:31.140 --> 00:35:33.040
and so it's not really itself.
00:35:33.040 --> 00:35:34.800
It looks like itself.
00:35:34.800 --> 00:35:37.262
Anybody knows what
the actual formula is?
00:35:37.262 --> 00:35:37.761
Yeah.
00:35:37.761 --> 00:35:39.584
AUDIENCE: [INAUDIBLE]
00:35:39.584 --> 00:35:41.000
PHILIPPE RIGOLLET:
E to the minus?
00:35:41.000 --> 00:35:42.230
AUDIENCE: E to the
minus x squared over 2.
00:35:42.230 --> 00:35:43.355
PHILIPPE RIGOLLET: Exactly.
00:35:43.355 --> 00:35:44.960
E to the minus x squared over 2.
00:35:44.960 --> 00:35:46.670
But this x squared
over 2 is actually
00:35:46.670 --> 00:35:49.701
just the second order expansion
in the Taylor expansion.
00:35:49.701 --> 00:35:51.534
And that's why the
Gaussian is so important.
00:35:51.534 --> 00:35:54.820
It's just the second
order Taylor expansion.
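NOTE
One case where the characteristic-function argument can be checked by hand, offered as a sketch rather than the general proof: if the X_i are IID signs (+1 or -1 with probability 1/2 each, an assumption made here for simplicity), then E[exp(i t X_i)] = cos(t), so the characteristic function of (X_1 + ... + X_n) / sqrt(n) at x is cos(x / sqrt(n)) ** n. Since cos(u) is about 1 - u^2 / 2 for small u, this should approach exp(-x^2 / 2).

```python
import math


def cf_normalized_sum(x: float, n: int) -> float:
    """Characteristic function at x of (X_1 + ... + X_n) / sqrt(n) for IID +/-1 signs."""
    return math.cos(x / math.sqrt(n)) ** n


def cf_standard_gaussian(x: float) -> float:
    """Characteristic function of N(0, 1) at x."""
    return math.exp(-x * x / 2.0)


for n in (1, 10, 100, 10_000):
    print(n, cf_normalized_sum(1.0, n), cf_standard_gaussian(1.0))
```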
00:35:54.820 --> 00:35:56.190
And so you can check it out.
00:35:56.190 --> 00:35:58.350
I think Terry Tao has
some stuff on his blog,
00:35:58.350 --> 00:36:00.360
and there's a bunch
of different proofs.
00:36:00.360 --> 00:36:02.790
But if you want to prove
convergence in distribution,
00:36:02.790 --> 00:36:07.900
you very likely are going to
use one of these three right here.
00:36:07.900 --> 00:36:09.010
So let's move on.
00:36:13.050 --> 00:36:15.510
This is when I said
that this convergence is
00:36:15.510 --> 00:36:17.190
weaker than that convergence.
00:36:17.190 --> 00:36:18.870
This is what I meant.
00:36:18.870 --> 00:36:20.700
If you have convergence
in one style,
00:36:20.700 --> 00:36:23.200
it implies convergence
in the other style.
00:36:23.200 --> 00:36:26.505
So the first [INAUDIBLE] is that
if Tn converges almost surely,
00:36:26.505 --> 00:36:28.950
this a.s.
means almost surely,
00:36:28.950 --> 00:36:31.200
then it also converges
in probability
00:36:31.200 --> 00:36:32.900
and actually the
two limits, which
00:36:32.900 --> 00:36:37.410
are this random variable
T, are equal almost surely.
00:36:37.410 --> 00:36:39.750
Basically what it means is
that whatever you measure on one
00:36:39.750 --> 00:36:42.166
is going to be the same that
you measure on the other one.
00:36:42.166 --> 00:36:44.300
So that's very strong.
00:36:44.300 --> 00:36:47.960
So that means that
convergence almost surely
00:36:47.960 --> 00:36:50.990
is stronger than
convergence in probability.
00:36:50.990 --> 00:36:53.570
If you converge in Lp
then you also converge
00:36:53.570 --> 00:36:56.390
in Lq for any q less than p.
00:36:56.390 --> 00:36:59.480
So if you converge in L2,
you'll also converge in L1.
00:36:59.480 --> 00:37:03.050
If you converge in L67,
you converge in L2.
00:37:03.050 --> 00:37:04.920
If you converge
in L infinity,
00:37:04.920 --> 00:37:09.390
you converge in Lp for anything.
00:37:09.390 --> 00:37:12.390
And so, again, limits are equal.
00:37:12.390 --> 00:37:14.396
And then when you
converge in distribution,
00:37:14.396 --> 00:37:15.770
when you converge
in probability,
00:37:15.770 --> 00:37:18.860
you also converge
in distribution.
00:37:18.860 --> 00:37:22.780
OK, so almost surely
implies probability.
00:37:22.780 --> 00:37:24.400
Lp implies probability.
00:37:24.400 --> 00:37:26.520
Probability implies
distribution.
00:37:26.520 --> 00:37:28.740
And here note that
I did not write,
00:37:28.740 --> 00:37:30.890
and the limits are
equal almost surely.
00:37:30.890 --> 00:37:31.390
Why?
00:37:35.446 --> 00:37:37.070
Because the convergence
in distribution
00:37:37.070 --> 00:37:38.930
is actually not telling you
that your random variable
00:37:38.930 --> 00:37:40.850
is converging to
another random variable.
00:37:40.850 --> 00:37:42.433
It's telling you
that the distribution
00:37:42.433 --> 00:37:45.190
of your random variable is
converging to a distribution.
00:37:45.190 --> 00:37:47.180
And think of these guys,
00:37:47.180 --> 00:37:49.064
x and minus x.
00:37:49.064 --> 00:37:50.480
The central limit
theorem tells me
00:37:50.480 --> 00:37:53.460
that I'm converging to some
standard Gaussian distribution,
00:37:53.460 --> 00:37:57.334
but am I converging to x or
am I converging to minus x?
00:37:57.334 --> 00:37:58.375
It's not well identified.
00:37:58.375 --> 00:38:01.470
It's any random variable
that has this distribution.
00:38:01.470 --> 00:38:04.110
So there's no way
the limits are equal.
00:38:04.110 --> 00:38:06.070
Their distributions are
going to be the same,
00:38:06.070 --> 00:38:07.910
but they're not the same limit.
00:38:07.910 --> 00:38:09.970
Is that clear for everyone?
00:38:09.970 --> 00:38:12.700
So in a way, convergence
in distribution
00:38:12.700 --> 00:38:15.177
is really not a convergence
of a random variable
00:38:15.177 --> 00:38:16.510
towards another random variable.
00:38:16.510 --> 00:38:18.520
It's just telling you
the limiting distribution
00:38:18.520 --> 00:38:20.390
of your random
variable [INAUDIBLE]
00:38:20.390 --> 00:38:22.822
which is enough for us.
00:38:22.822 --> 00:38:24.530
And one thing that's
actually really nice
00:38:24.530 --> 00:38:28.790
is this continuous
mapping theorem, which
00:38:28.790 --> 00:38:30.347
essentially tells you that--
00:38:30.347 --> 00:38:32.180
so this is one of the
theorems that we like,
00:38:32.180 --> 00:38:33.950
because they tell
us you can do what
00:38:33.950 --> 00:38:35.660
you feel like you want to do.
00:38:35.660 --> 00:38:39.830
So if I have Tn that goes to
T, f of Tn goes to f of T,
00:38:39.830 --> 00:38:42.800
and this is true for
any of those convergences
00:38:42.800 --> 00:38:45.650
except for Lp.
00:38:48.170 --> 00:38:51.490
But you have to have f,
which is continuous, otherwise
00:38:51.490 --> 00:38:54.950
weird stuff can happen.
00:38:54.950 --> 00:38:58.150
So this is going to be
convenient, because here I
00:38:58.150 --> 00:39:00.012
don't have Xn bar minus p.
00:39:00.012 --> 00:39:01.220
I have a continuous function.
00:39:01.220 --> 00:39:03.094
It's basically a linear
function of Xn minus p,
00:39:03.094 --> 00:39:05.800
but I could think of like
even crazier stuff to do,
00:39:05.800 --> 00:39:07.876
and it would still be true.
00:39:07.876 --> 00:39:10.250
If I took the square, it would
converge to something that
00:39:10.250 --> 00:39:11.600
looks like its distribution.
00:39:11.600 --> 00:39:12.975
It's the same as
the distribution
00:39:12.975 --> 00:39:16.100
of a square Gaussian.
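NOTE
The squared-Gaussian claim can be checked by simulation. This is a sketch under assumptions that are not in the lecture: the observations are Uniform(0, 1) rather than Bernoulli (to avoid lattice effects), and n, the number of repetitions and the seed are arbitrary. If T_n is the standardized sample mean, the continuous mapping theorem with f(t) = t^2 says T_n^2 should behave like Z^2 for a standard Gaussian Z, for which P(Z^2 <= x) = 2 Phi(sqrt(x)) - 1.

```python
import math
import random
from statistics import NormalDist


def squared_statistic(n: int, rng: random.Random) -> float:
    """T_n^2 with T_n = sqrt(n) * (mean - 1/2) / sqrt(1/12) for Uniform(0, 1) draws."""
    mean = sum(rng.random() for _ in range(n)) / n
    t_n = math.sqrt(n) * (mean - 0.5) / math.sqrt(1.0 / 12.0)
    return t_n ** 2


def empirical_cdf_at(x: float, n: int = 50, reps: int = 4000, seed: int = 0) -> float:
    """Fraction of simulated T_n^2 values that are <= x."""
    rng = random.Random(seed)
    return sum(squared_statistic(n, rng) <= x for _ in range(reps)) / reps


def squared_gaussian_cdf(x: float) -> float:
    """P(Z^2 <= x) = 2 * Phi(sqrt(x)) - 1 for a standard Gaussian Z and x >= 0."""
    return 2.0 * NormalDist().cdf(math.sqrt(x)) - 1.0


print(empirical_cdf_at(1.0), squared_gaussian_cdf(1.0))
```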
00:39:16.100 --> 00:39:18.435
So this is a mouthful,
these two slides--
00:39:18.435 --> 00:39:20.310
actually this particular
slide is a mouthful.
00:39:20.310 --> 00:39:24.620
What I have in my head since
I was pretty much where you're
00:39:24.620 --> 00:39:27.890
sitting, is this diagram.
00:39:27.890 --> 00:39:32.100
So what it tells me-- so it's
actually voluntarily cropped,
00:39:32.100 --> 00:39:35.430
so you can start from
any Lq you want large.
00:39:35.430 --> 00:39:38.030
And then as you
decrease the index,
00:39:38.030 --> 00:39:39.750
you are actually
implying, implying,
00:39:39.750 --> 00:39:42.690
implying until you imply
convergence in probability.
00:39:42.690 --> 00:39:44.850
Convergence almost surely
implies convergence
00:39:44.850 --> 00:39:49.650
in probability, and everything
goes to the sink,
00:39:49.650 --> 00:39:52.590
that is convergence
in distribution.
00:39:52.590 --> 00:39:55.230
So everything implies
convergence in distribution.
00:39:55.230 --> 00:39:57.800
So that's basically rather than
remembering those formulas,
00:39:57.800 --> 00:39:59.840
this is really the diagram
you want to remember.
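NOTE
The diagram being described can be sketched as follows. The layout and the arrow labels are assumptions (in particular, (d) is used here for convergence in distribution); the implications themselves are the ones stated in the lecture.

```latex
\[
\begin{array}{ccccc}
 & & T_n \xrightarrow{\ \text{a.s.}\ } T & & \\
 & & \Downarrow & & \\
T_n \xrightarrow{\ L^p\ } T \;\Longrightarrow\; T_n \xrightarrow{\ L^q\ } T
 & \Longrightarrow & T_n \xrightarrow{\ \mathbb{P}\ } T
 & \Longrightarrow & T_n \xrightarrow{\ (d)\ } T
\end{array}
\qquad (1 \le q \le p)
\]
```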
00:40:02.690 --> 00:40:06.590
All right, so why do we bother
learning about those things.
00:40:06.590 --> 00:40:09.380
That's because of this
limits and operations.
00:40:09.380 --> 00:40:10.580
Operations and limits.
00:40:10.580 --> 00:40:13.710
If I have a sequence
of real numbers,
00:40:13.710 --> 00:40:17.770
and I know that Xn converges
to X and Yn converges to Y,
00:40:17.770 --> 00:40:20.051
then I can start doing all
my manipulations and things
00:40:20.051 --> 00:40:20.550
are happy.
00:40:20.550 --> 00:40:21.560
I can add stuff.
00:40:21.560 --> 00:40:23.240
I can multiply stuff.
00:40:23.240 --> 00:40:28.049
But it's not true always for
convergence in distribution.
00:40:28.049 --> 00:40:29.590
But it is, what's
nice, it's actually
00:40:29.590 --> 00:40:32.490
true for convergence
almost surely.
00:40:32.490 --> 00:40:35.250
Convergence almost surely
everything is true.
00:40:35.250 --> 00:40:38.110
It's just impossible
to make it fail.
00:40:38.110 --> 00:40:41.080
But convergence in probability
is not always everything,
00:40:41.080 --> 00:40:43.870
but at least you can actually
add stuff and multiply stuff.
00:40:43.870 --> 00:40:46.600
And it will still give
you the sum of the limits,
00:40:46.600 --> 00:40:49.080
and the product of the limits.
00:40:49.080 --> 00:40:55.590
You can even take the ratio
if V is not 0 of course.
00:40:55.590 --> 00:40:57.090
If the limit is not
0, then actually
00:40:57.090 --> 00:40:58.520
you need Vn to be not 0 as well.
00:41:01.440 --> 00:41:05.530
You can actually prove
this last statement, right?
00:41:05.530 --> 00:41:08.620
Because it's a combination
of the first statement
00:41:08.620 --> 00:41:11.740
of the second one, and the
continuous mapping theorem.
00:41:11.740 --> 00:41:14.770
Because the function
that maps x to 1
00:41:14.770 --> 00:41:19.180
over x on everything
but 0, is continuous.
00:41:19.180 --> 00:41:24.560
And so 1 over Vn
converges to 1 over V,
00:41:24.560 --> 00:41:26.820
and then I can multiply
those two things.
00:41:26.820 --> 00:41:28.870
So you actually knew that one.
00:41:28.870 --> 00:41:30.760
But really this is
not what matters,
00:41:30.760 --> 00:41:35.110
because this is something that
you will do whatever happens.
00:41:35.110 --> 00:41:37.786
If I don't tell you you cannot
do it, well, you will do it.
00:41:37.786 --> 00:41:39.160
But in general
those things don't
00:41:39.160 --> 00:41:40.660
apply to convergence
in distribution
00:41:40.660 --> 00:41:44.390
unless the pair itself is known
to converge in distribution.
00:41:44.390 --> 00:41:48.220
Remember when I said that
these things apply to vectors,
00:41:48.220 --> 00:41:51.150
then you need to actually
say that the vector converges
00:41:51.150 --> 00:41:53.520
in distribution to
the limiting vector.
00:41:53.520 --> 00:41:55.299
Now this tells
you in particular,
00:41:55.299 --> 00:41:57.340
since the cumulative
distribution function is not
00:41:57.340 --> 00:41:59.820
defined for vectors,
I would have
00:41:59.820 --> 00:42:02.610
to actually use one of the
other distributions, one
00:42:02.610 --> 00:42:04.410
of the other criteria,
which is convergence
00:42:04.410 --> 00:42:07.410
of characteristic
functions or convergence
00:42:07.410 --> 00:42:11.100
of expectations of bounded
continuous functions
00:42:11.100 --> 00:42:12.470
of the random variable.
00:42:12.470 --> 00:42:17.154
Point 2 or point 3, but point 1 is not
going to get you anywhere.
00:42:17.154 --> 00:42:18.570
But this is something
that's going
00:42:18.570 --> 00:42:20.850
to be too hard for us to
deal with, so we're actually
00:42:20.850 --> 00:42:23.742
going to rely on the
fact that we have
00:42:23.742 --> 00:42:24.950
something that's even better.
00:42:24.950 --> 00:42:26.580
There's something
that is waiting for us
00:42:26.580 --> 00:42:29.163
at the end of this lecture, which
is called Slutsky's theorem, that says
00:42:29.163 --> 00:42:33.490
that if V, in this case,
converges in probability
00:42:33.490 --> 00:42:36.040
but U converges in distribution,
I can actually still do that.
00:42:36.040 --> 00:42:37.456
I actually don't
need both of them
00:42:37.456 --> 00:42:38.746
to converge in probability.
00:42:38.746 --> 00:42:41.204
I actually need only one of
them to converge in probability
00:42:41.204 --> 00:42:42.162
to make this statement.
00:42:42.162 --> 00:42:45.070
But two sum.
00:42:45.070 --> 00:42:47.060
So let's go to another example.
00:42:47.060 --> 00:42:49.750
So I just want to make sure that
we keep on doing statistics.
00:42:49.750 --> 00:42:51.953
And every time we're going
to just do a little bit
00:42:51.953 --> 00:42:54.202
too much probability, I'm
going to reset the pressure,
00:42:54.202 --> 00:42:56.090
and start doing
statistics again.
00:42:56.090 --> 00:42:59.460
All right, so assume
you observe the times
00:42:59.460 --> 00:43:04.590
the inter-arrival time
of the T at Kendall.
00:43:04.590 --> 00:43:06.030
So this is not the arrival time.
00:43:06.030 --> 00:43:09.980
It's not like 7:56, 8:15.
00:43:09.980 --> 00:43:12.920
No, it's really the
inter-arrival time, right?
00:43:12.920 --> 00:43:17.300
So say the next T is
arriving in six minutes.
00:43:17.300 --> 00:43:20.950
So let's say [INAUDIBLE] bound.
00:43:20.950 --> 00:43:23.250
And so you have this
inter-arrival time.
00:43:23.250 --> 00:43:27.260
So those are numbers say,
3, 4, 5, 4, 3, et cetera.
00:43:27.260 --> 00:43:29.490
So I have this
sequence of numbers.
00:43:29.490 --> 00:43:31.100
So I'm going to
observe this, and I'm
00:43:31.100 --> 00:43:36.050
going to try to infer what
is the rate of T's going out
00:43:36.050 --> 00:43:38.957
of the station from this.
00:43:38.957 --> 00:43:40.790
So I'm going to assume
that these things are
00:43:40.790 --> 00:43:43.160
mutually independent.
00:43:43.160 --> 00:43:44.890
That's probably not
completely true.
00:43:44.890 --> 00:43:46.850
Again, it just means
that what it would mean
00:43:46.850 --> 00:43:49.100
is that two consecutive
inter-arrival times are
00:43:49.100 --> 00:43:50.021
independent.
00:43:50.021 --> 00:43:52.020
I mean, you can make it
independent if you want,
00:43:52.020 --> 00:43:53.603
but again, this
independence assumption
00:43:53.603 --> 00:43:56.180
is for us to be happy and safe.
00:43:56.180 --> 00:43:58.160
Unless someone comes
with overwhelming proof
00:43:58.160 --> 00:44:01.466
that it's not independent and
far from being independent,
00:44:01.466 --> 00:44:03.950
then yes, you have a problem.
00:44:03.950 --> 00:44:06.200
But it might be the fact
that it's actually-- if you
00:44:06.200 --> 00:44:09.240
have a T that's one hour late.
00:44:09.240 --> 00:44:12.330
If an inter-arrival time is
one hour, then the other T,
00:44:12.330 --> 00:44:14.300
either they fixed
it, and it's going
00:44:14.300 --> 00:44:17.150
to be just 30 seconds behind,
or they haven't fixed it,
00:44:17.150 --> 00:44:18.990
then it's going to be
another hour behind.
00:44:18.990 --> 00:44:20.780
So they're not
exactly independent,
00:44:20.780 --> 00:44:24.430
but they are, approximately,
when things work well.
00:44:24.430 --> 00:44:27.580
And so now I need to model
a random variable that's
00:44:27.580 --> 00:44:29.564
positive, maybe
not upper bounded.
00:44:29.564 --> 00:44:31.480
I mean, people complain
enough that this thing
00:44:31.480 --> 00:44:32.435
can be really large.
00:44:32.435 --> 00:44:34.810
And so one thing that people
like for inter-arrival times
00:44:34.810 --> 00:44:36.839
is exponential distribution.
00:44:36.839 --> 00:44:38.380
So that's a positive
random variable.
00:44:38.380 --> 00:44:40.463
Looks like an exponential
on the right-hand side,
00:44:40.463 --> 00:44:41.650
on the positive line.
00:44:41.650 --> 00:44:43.600
And so it decays
very fast towards 0.
00:44:43.600 --> 00:44:45.400
The probability that
you have very large
00:44:45.400 --> 00:44:49.080
values is exponentially small, and
there's a [INAUDIBLE] lambda
00:44:49.080 --> 00:44:50.900
that controls how
exponential is defined.
00:44:50.900 --> 00:44:53.600
It's exponential minus
lambda times something.
00:44:53.600 --> 00:44:56.270
And so we're going
to assume that they
00:44:56.270 --> 00:44:58.610
have the same distribution,
the same random variable.
00:44:58.610 --> 00:45:00.530
So they're IID, because
they are independent,
00:45:00.530 --> 00:45:01.810
and they're identically
distributed.
00:45:01.810 --> 00:45:04.018
They all have this exponential
with parameter lambda,
00:45:04.018 --> 00:45:06.330
and I'm going to try to
learn something about lambda.
00:45:06.330 --> 00:45:08.790
What is the estimated
value of lambda,
00:45:08.790 --> 00:45:12.210
and can I build a confidence
interval for lambda?
00:45:12.210 --> 00:45:16.470
So we observe n arrival times.
00:45:16.470 --> 00:45:20.420
So as I said, the
mutual independence
00:45:20.420 --> 00:45:24.055
is plausible, but not
completely justified.
00:45:24.055 --> 00:45:25.430
The fact that
they're exponential
00:45:25.430 --> 00:45:27.804
is actually something that
people like in all this what's
00:45:27.804 --> 00:45:29.030
called queuing theory.
00:45:29.030 --> 00:45:31.040
So exponentials
arise a lot when you
00:45:31.040 --> 00:45:32.450
talk about inter-arrival times.
00:45:32.450 --> 00:45:34.010
It's not about
the bus, but where
00:45:34.010 --> 00:45:41.780
it's very important is call
centers, service, servers where
00:45:41.780 --> 00:45:45.260
tasks come, and people
want to know how long it's
00:45:45.260 --> 00:45:47.450
going to take to serve a task.
00:45:47.450 --> 00:45:50.060
So when I call at
a center, nobody
00:45:50.060 --> 00:45:52.710
knows how long I'm going to stay
on the phone with this person.
00:45:52.710 --> 00:45:54.590
But it turns out that
empirically exponential
00:45:54.590 --> 00:45:56.797
distributions have been
very good at modeling this.
00:45:56.797 --> 00:45:58.630
And what it means is
that they're actually--
00:45:58.630 --> 00:46:01.860
you have this
memoryless property.
00:46:01.860 --> 00:46:03.570
It's kind of crazy if
you think about it.
00:46:03.570 --> 00:46:04.611
What does that thing say?
00:46:04.611 --> 00:46:06.560
Let's parse it.
00:46:06.560 --> 00:46:08.910
That's the probability.
00:46:08.910 --> 00:46:12.620
So this is conditioned on the
fact that T1 is larger than T.
00:46:12.620 --> 00:46:14.780
So T1 is just say the
first arrival time.
00:46:14.780 --> 00:46:16.820
That means that
conditionally on the fact
00:46:16.820 --> 00:46:19.700
that I've been waiting
for the first T, well,
00:46:19.700 --> 00:46:23.500
the first [INAUDIBLE].
00:46:23.500 --> 00:46:27.440
Well, I should probably-- the
first subway for more than T
00:46:27.440 --> 00:46:30.340
conditionally-- so I've been
there T minutes already.
00:46:30.340 --> 00:46:33.126
Then the probability that
I wait for s more minutes.
00:46:33.126 --> 00:46:35.000
So that's the probability
that T1 is larger than
00:46:35.000 --> 00:46:38.439
the time that we've
already waited plus s.
00:46:38.439 --> 00:46:40.230
Given that I've been
waiting for T minutes,
00:46:40.230 --> 00:46:42.340
the probability that I wait for
s more minutes,
00:46:42.340 --> 00:46:46.416
is actually the probability
that I wait for s minutes total.
00:46:46.416 --> 00:46:47.540
It's completely memoryless.
00:46:47.540 --> 00:46:49.670
It doesn't remember how
long you have been waiting.
00:46:49.670 --> 00:46:51.020
The probability does not change.
00:46:51.020 --> 00:46:53.450
You can have waited for
two hours, the probability
00:46:53.450 --> 00:46:55.429
that it takes
another 10 minutes is
00:46:55.429 --> 00:46:56.845
going to be the
same as if you had
00:46:56.845 --> 00:46:59.250
been waiting for zero minutes.
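The memoryless property described above can be checked numerically. The following is a minimal sketch, assuming the exponential survival function P(T > t) = exp(-lambda t); the function names and the rate 0.25 per minute are hypothetical choices for illustration, not values from the lecture.

```python
import math

def exp_survival(t, lam):
    """P(T > t) for T ~ Exponential(lam), whose density is lam * exp(-lam * t)."""
    return math.exp(-lam * t)

def conditional_survival(t, s, lam):
    """P(T > t + s | T > t), straight from the definition of conditional probability."""
    return exp_survival(t + s, lam) / exp_survival(t, lam)

lam = 0.25  # hypothetical rate: one train every 4 minutes on average
for waited in (0.0, 5.0, 120.0):
    # The conditional probability of 10 more minutes does not depend on `waited`.
    print(waited, conditional_survival(waited, 10.0, lam), exp_survival(10.0, lam))
```

Whatever the time already waited, the two printed probabilities agree up to floating-point rounding, which is exactly the statement P(T1 > t + s | T1 > t) = P(T1 > s).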
00:46:59.250 --> 00:47:00.750
And that's something
that's actually
00:47:00.750 --> 00:47:02.470
part of your problem set.
00:47:02.470 --> 00:47:03.420
Very easy to compute.
00:47:03.420 --> 00:47:05.730
This is just an
analytical property.
00:47:05.730 --> 00:47:07.226
And you just
manipulate functions,
00:47:07.226 --> 00:47:09.351
and you see that this thing
just happens to be true,
00:47:09.351 --> 00:47:11.840
and that's something
that people like.
00:47:11.840 --> 00:47:15.140
Because that's also
something that benefit.
00:47:15.140 --> 00:47:17.540
And also what we like is
that this thing is positive
00:47:17.540 --> 00:47:21.080
almost surely, which is good
when you model arrival times.
00:47:21.080 --> 00:47:23.132
To be fair, we're not
going to be that careful.
00:47:23.132 --> 00:47:24.590
Because sometimes
we are just going
00:47:24.590 --> 00:47:29.010
to assume that something
follows a normal distribution.
00:47:29.010 --> 00:47:30.627
And in particular,
I mean, I don't
00:47:30.627 --> 00:47:32.460
know if we're going to
go into those details,
00:47:32.460 --> 00:47:34.830
but a good thing that you
can model with a Gaussian
00:47:34.830 --> 00:47:38.430
distribution are
heights of students.
00:47:38.430 --> 00:47:40.720
But technically with
positive probability,
00:47:40.720 --> 00:47:44.290
you can have a negative
Gaussian random variable, right?
00:47:44.290 --> 00:47:48.550
And the probability being it's
probably 10 to the minus 25,
00:47:48.550 --> 00:47:49.716
but it's positive.
00:47:49.716 --> 00:47:51.590
But it's good enough
for us for our modeling.
00:47:51.590 --> 00:47:54.242
So this thing is nice, but this
is not going to be required.
00:47:54.242 --> 00:47:56.200
When you're modeling
positive random variables,
00:47:56.200 --> 00:47:59.050
you don't always have to use
positive distributions that are
00:47:59.050 --> 00:48:01.465
supported on positive numbers.
00:48:01.465 --> 00:48:03.397
You can use distributions
like Gaussian.
00:48:06.300 --> 00:48:09.817
So now this exponential
distribution of T1 to Tn,
00:48:09.817 --> 00:48:11.400
they have the same
parameter, and that
00:48:11.400 --> 00:48:14.430
means that on average they have
the same inter-arrival time.
00:48:14.430 --> 00:48:16.890
So this lambda is
actually the expectation.
00:48:16.890 --> 00:48:19.390
And what I'm just saying
is that they're identically
00:48:19.390 --> 00:48:21.600
distributed means
that I'm in some sort
00:48:21.600 --> 00:48:24.279
of a stationary regime,
and it's not always true.
00:48:24.279 --> 00:48:26.070
I have to look at a
shorter period of time,
00:48:26.070 --> 00:48:28.810
because at rush
hour and 11:00 PM
00:48:28.810 --> 00:48:31.200
clearly those average
inter-arrival times
00:48:31.200 --> 00:48:33.540
are going to be different.
So it means that I am really
00:48:33.540 --> 00:48:35.310
focusing maybe on rush hour.
00:48:38.567 --> 00:48:39.650
Sorry, I said it's lambda.
00:48:39.650 --> 00:48:40.816
It's actually 1 over lambda.
00:48:40.816 --> 00:48:42.460
I always mix the two.
00:48:42.460 --> 00:48:44.300
All right, so you have
the density of T1.
00:48:44.300 --> 00:48:46.970
So f of T is this.
00:48:46.970 --> 00:48:49.400
So it's on the
positive real line.
00:48:49.400 --> 00:48:52.390
The fact that I have strictly
positive or larger [INAUDIBLE]
00:48:52.390 --> 00:48:54.542
to 0 doesn't make
any difference.
00:48:54.542 --> 00:48:55.500
So this is the density.
00:48:55.500 --> 00:48:58.220
So it's lambda E to the minus
lambda T. The lambda in front
00:48:58.220 --> 00:48:59.690
just ensures that
when I integrate
00:48:59.690 --> 00:49:03.500
this function between 0
and infinity, I get 1.
00:49:03.500 --> 00:49:06.160
And you can see, it decays like
exponential minus lambda T.
00:49:06.160 --> 00:49:09.688
So if I were to draw it, it
would just look like this.
00:49:13.630 --> 00:49:17.862
So at 0, what
value does it take?
00:49:17.862 --> 00:49:19.750
Lambda.
00:49:19.750 --> 00:49:23.160
And then I decay like
exponential minus lambda T.
00:49:23.160 --> 00:49:30.840
So this is 0, and
this is f of T.
00:49:30.840 --> 00:49:33.730
So very small probability
of being very large.
00:49:33.730 --> 00:49:35.730
Of course, it depends on lambda.
00:49:35.730 --> 00:49:37.916
Now the expectation, you
can compute the expectation
00:49:37.916 --> 00:49:38.790
of this thing, right?
00:49:38.790 --> 00:49:41.881
So you integrate T
times f of T. This
00:49:41.881 --> 00:49:44.130
is part of the little sheet
that I gave you last time.
00:49:44.130 --> 00:49:45.629
This is one of the
things you should
00:49:45.629 --> 00:49:47.160
be able to do blindfolded.
00:49:47.160 --> 00:49:51.276
And then you get the expectation
of T1 is 1 over lambda.
00:49:51.276 --> 00:49:53.010
That's what comes out.
00:49:53.010 --> 00:49:57.630
So as I actually tell many of
my students, 99% of statistics
00:49:57.630 --> 00:50:00.274
is replacing
expectations by averages.
00:50:00.274 --> 00:50:02.940
And so what you're tempted to do
is say, well, if on average I'm
00:50:02.940 --> 00:50:05.910
supposed to see 1 over lambda,
I have 15 observations.
00:50:05.910 --> 00:50:07.810
I'm just going to average
those observations,
00:50:07.810 --> 00:50:10.143
and I'm going to see something
that should be close to 1
00:50:10.143 --> 00:50:11.710
over lambda.
00:50:11.710 --> 00:50:14.890
So statistics is about
replacing averages,
00:50:14.890 --> 00:50:17.950
expectations with
averages, and that's what we do.
00:50:17.950 --> 00:50:21.530
So Tn bar here, which is
the average of the Ti's, is
00:50:21.530 --> 00:50:25.060
a pretty good estimator
for 1 over lambda.
00:50:25.060 --> 00:50:27.140
So if I want an
estimate for lambda,
00:50:27.140 --> 00:50:30.190
then I need to
take 1 over Tn bar.
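As a rough illustration of this estimator, here is a minimal sketch that simulates exponential inter-arrival times and computes 1 over their average. The name `lambda_hat`, the seed, and the true rate 0.25 are assumptions made for the example; only the formula comes from the lecture.

```python
import random

def lambda_hat(times):
    """Estimate the rate lambda as 1 over the sample mean of the inter-arrival times."""
    return len(times) / sum(times)

random.seed(0)   # fixed seed so that reruns give the same numbers
true_lam = 0.25  # hypothetical true rate used to generate the data
for n in (15, 1000, 100000):
    sample = [random.expovariate(true_lam) for _ in range(n)]
    print(n, lambda_hat(sample))
```

With only 15 observations the estimate can be visibly off; it should settle near the true rate as n grows.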
00:50:30.190 --> 00:50:32.510
So here is one estimator.
00:50:32.510 --> 00:50:36.340
I did it without much
principle except that I just
00:50:36.340 --> 00:50:38.740
want to replace
expectations by averages,
00:50:38.740 --> 00:50:41.290
and then I fixed the problem
that I was actually estimating
00:50:41.290 --> 00:50:43.030
1 over lambda, not lambda.
00:50:43.030 --> 00:50:45.490
But you could come up with
other estimators, right?
00:50:45.490 --> 00:50:49.730
But let's say this is my way
of getting to that estimator.
00:50:49.730 --> 00:50:52.480
Just like I didn't give you
any principled way of getting p
00:50:52.480 --> 00:50:54.770
hat, which is Xn bar
in the kiss example.
00:50:54.770 --> 00:50:57.850
But that's the
natural way to do it.
00:50:57.850 --> 00:51:01.380
Everybody is completely
shocked by this approach?
00:51:01.380 --> 00:51:03.300
All right, so let's do this.
00:51:03.300 --> 00:51:06.260
So what can I say about the
properties of this estimator
00:51:06.260 --> 00:51:08.130
lambda hat?
00:51:08.130 --> 00:51:12.750
Well, I know that Tn bar
is going to 1 over lambda
00:51:12.750 --> 00:51:14.214
by the law of large numbers.
00:51:14.214 --> 00:51:14.880
It's an average.
00:51:14.880 --> 00:51:18.120
It converges to the
expectation both almost surely,
00:51:18.120 --> 00:51:19.185
and in probability.
00:51:19.185 --> 00:51:21.310
So the first one is the
strong law of large numbers,
00:51:21.310 --> 00:51:23.526
the second one is the
weak law of large numbers.
00:51:23.526 --> 00:51:24.650
I can apply the strong one.
00:51:24.650 --> 00:51:26.800
I have enough conditions.
00:51:26.800 --> 00:51:31.610
And hence, what do I apply
so that 1 over Tn bar
00:51:31.610 --> 00:51:35.110
actually goes to lambda?
00:51:35.110 --> 00:51:36.250
So I said hence.
00:51:36.250 --> 00:51:37.041
What is hence?
00:51:37.041 --> 00:51:37.874
What is it based on?
00:51:37.874 --> 00:51:43.455
AUDIENCE: [INAUDIBLE]
00:51:43.455 --> 00:51:45.580
PHILIPPE RIGOLLET: Yeah,
continuous mapping theorem,
00:51:45.580 --> 00:51:45.720
right?
00:51:45.720 --> 00:51:47.370
So I have this
function 1 over x.
00:51:47.370 --> 00:51:49.180
I just apply this function.
00:51:49.180 --> 00:51:51.397
So if it was 1 over
lambda squared,
00:51:51.397 --> 00:51:52.980
I would have the
same thing that would
00:51:52.980 --> 00:51:54.688
happen just because
the function 1 over x
00:51:54.688 --> 00:51:58.130
is continuous away from 0.
00:51:58.130 --> 00:52:00.300
And now the central
limit theorem
00:52:00.300 --> 00:52:02.370
is also telling me
something about lambda.
00:52:02.370 --> 00:52:03.256
About Tn bar, right?
00:52:03.256 --> 00:52:05.130
It's telling me that if
I look at my average,
00:52:05.130 --> 00:52:08.520
I remove the expectation here.
00:52:08.520 --> 00:52:11.520
So if I do Tn bar
minus my expectation,
00:52:11.520 --> 00:52:15.820
rescale by this guy here,
then this thing is going
00:52:15.820 --> 00:52:18.280
to converge to some
Gaussian random variable,
00:52:18.280 --> 00:52:21.260
but here I have this
lambda to the negative 1--
00:52:21.260 --> 00:52:23.530
to the negative 2
here, and that's
00:52:23.530 --> 00:52:25.720
because I did not
tell you that if you
00:52:25.720 --> 00:52:28.730
compute the variance--
00:52:28.730 --> 00:52:30.532
so from this, you
can probably extract.
00:52:34.308 --> 00:52:39.280
So if I have X that follows
some exponential distribution
00:52:39.280 --> 00:52:40.350
with parameter lambda.
00:52:40.350 --> 00:52:42.580
Well, let's call it T.
00:52:42.580 --> 00:52:46.540
So we know that T in
expectation, the expectation
00:52:46.540 --> 00:52:48.340
of T is 1 over lambda.
00:52:48.340 --> 00:52:49.560
What is the variance of T?
00:52:56.690 --> 00:53:00.360
You should be able to read
it from the thing here.
00:53:09.984 --> 00:53:10.900
1 over lambda squared.
00:53:10.900 --> 00:53:12.816
That's what you actually
read in the variance,
00:53:12.816 --> 00:53:16.530
because the central limit
theorem is really telling you
00:53:16.530 --> 00:53:19.590
the distribution
goes to this N.
00:53:19.590 --> 00:53:23.739
But this number and this
number you can read, right?
00:53:23.739 --> 00:53:26.280
If you look at the expectation
of this guy it's-- of this guy
00:53:26.280 --> 00:53:26.830
comes out.
00:53:26.830 --> 00:53:28.660
This is 1 over lambda
minus 1 over lambda.
00:53:28.660 --> 00:53:30.360
That's why you read the 0.
00:53:30.360 --> 00:53:32.550
And if you look at the
variance of the dot,
00:53:32.550 --> 00:53:36.330
you get n times the
variance of this average.
00:53:36.330 --> 00:53:39.510
Variance of the average is
picking up a factor 1 over n.
00:53:39.510 --> 00:53:40.590
So the n cancels.
00:53:40.590 --> 00:53:42.881
And then I'm left with only
one of the variances, which
00:53:42.881 --> 00:53:45.250
is 1 over lambda squared.
00:53:45.250 --> 00:53:48.130
OK, so we're not going
to do that in detail,
00:53:48.130 --> 00:53:50.430
because, again, this is just
a pure calculus exercise.
00:53:50.430 --> 00:53:54.700
But this is if you compute
integral of lambda e
00:53:54.700 --> 00:53:58.430
to the minus t lambda
times t squared.
00:53:58.430 --> 00:54:01.754
Actually t minus 1
over lambda squared
00:54:01.754 --> 00:54:05.180
dt between 0 and infinity.
00:54:05.180 --> 00:54:07.774
You will see that this thing
is 1 over lambda squared.
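For reference, the variance computation mentioned here can be written out as follows. This is a sketch of the standard calculation for a density of the form lambda e^{-lambda t} on the positive line; it goes through the second moment rather than the centered integral, which gives the same value.

```latex
\begin{aligned}
\mathbb{E}[T]   &= \int_0^\infty t \,\lambda e^{-\lambda t}\,dt = \frac{1}{\lambda}, \\
\mathbb{E}[T^2] &= \int_0^\infty t^2 \,\lambda e^{-\lambda t}\,dt = \frac{2}{\lambda^2}, \\
\operatorname{Var}(T) &= \mathbb{E}[T^2] - \bigl(\mathbb{E}[T]\bigr)^2
                       = \frac{2}{\lambda^2} - \frac{1}{\lambda^2}
                       = \frac{1}{\lambda^2}.
\end{aligned}
```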
00:54:14.157 --> 00:54:15.139
How would I do this?
00:54:20.540 --> 00:54:24.490
Integration by
[INAUDIBLE] or you know it.
00:54:24.490 --> 00:54:26.100
All right.
00:54:26.100 --> 00:54:29.200
So this is what the central
limit theorem tells me.
00:54:29.200 --> 00:54:31.620
So this gives me
if I solve this,
00:54:31.620 --> 00:54:35.550
and I plug in so I can
multiply by lambda and solve,
00:54:35.550 --> 00:54:40.100
it would give me somewhat
a confidence interval for 1
00:54:40.100 --> 00:54:42.940
over lambda.
00:54:42.940 --> 00:54:44.370
If we just think
of 1 over lambda
00:54:44.370 --> 00:54:46.590
as being the p
that I had before,
00:54:46.590 --> 00:54:48.826
this would give me a
central limit theorem for--
00:54:51.664 --> 00:54:54.460
sorry, a confidence
interval for 1 over lambda.
00:54:54.460 --> 00:54:56.250
So I'm hiding a little
bit under the rug
00:54:56.250 --> 00:54:58.540
the fact that I have
to still define it.
00:54:58.540 --> 00:55:00.955
Let's just actually
go through this.
00:55:00.955 --> 00:55:02.890
I see some of you are
uncomfortable with this,
00:55:02.890 --> 00:55:04.884
so let's just do it.
00:55:04.884 --> 00:55:06.800
So what we've just proved
by the central limit
00:55:06.800 --> 00:55:09.330
theorem is that the
probability that
00:55:09.330 --> 00:55:21.180
square root of n times the absolute value of Tn bar minus 1 over
lambda exceeds q alpha over 2
00:55:21.180 --> 00:55:24.690
is approximately
equal to alpha, right?
00:55:24.690 --> 00:55:27.180
That's just the statement of
the central limit theorem,
00:55:27.180 --> 00:55:30.654
and by approximately equal I
mean as n goes to infinity.
00:55:34.230 --> 00:55:36.750
Sorry I did not
write it correctly.
00:55:36.750 --> 00:55:39.440
I still have to divide
by square root of 1
00:55:39.440 --> 00:55:43.050
over lambda squared, which is
the standard deviation, right?
00:55:43.050 --> 00:55:44.620
And we said that
this is a bit ugly.
00:55:44.620 --> 00:55:46.820
So let's just do it
the way it should be.
00:55:46.820 --> 00:55:50.290
So multiply all these
things by lambda.
00:55:50.290 --> 00:55:56.020
So that means now that
the absolute value, so
00:55:56.020 --> 00:55:59.530
with probability 1 minus
alpha asymptotically,
00:55:59.530 --> 00:56:07.870
I have that square root of
n times lambda Tn bar minus 1
00:56:07.870 --> 00:56:11.080
is less than or equal
to q alpha over 2.
00:56:14.930 --> 00:56:20.020
So what it means is that, oh,
I have negative q alpha over 2
00:56:20.020 --> 00:56:22.640
less than square root of n.
00:56:22.640 --> 00:56:25.224
Let me divide by
square root of n here.
00:56:25.224 --> 00:56:34.620
lambda Tn bar minus
1, less than q alpha over 2.
00:56:34.620 --> 00:56:41.891
And so now what I have is that
I get that lambda is between--
00:56:41.891 --> 00:56:50.410
that's Tn bar-- is between
1 plus q alpha over 2
00:56:50.410 --> 00:56:53.510
divided by root n.
00:56:53.510 --> 00:56:57.470
And the whole thing
is divided by Tn bar,
00:56:57.470 --> 00:57:04.010
and same thing on the other side
except I have 1 minus q alpha
00:57:04.010 --> 00:57:08.678
over 2 divided by root
n divided by Tn bar.
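The interval just derived can be sketched in code. This is a minimal illustration, assuming q alpha over 2 is the 1 minus alpha over 2 quantile of the standard Gaussian; the function name and the sample values are hypothetical.

```python
from statistics import NormalDist

def lambda_confidence_interval(times, alpha=0.05):
    """Asymptotic interval [(1 - q/sqrt(n)) / Tn_bar, (1 + q/sqrt(n)) / Tn_bar] for lambda."""
    n = len(times)
    t_bar = sum(times) / n
    q = NormalDist().inv_cdf(1 - alpha / 2)  # Gaussian quantile q_{alpha/2}
    half = q / n ** 0.5
    return (1 - half) / t_bar, (1 + half) / t_bar

# Hypothetical data: 100 inter-arrival times (in minutes) with average 3.8.
times = [3, 4, 5, 4, 3] * 20
print(lambda_confidence_interval(times))
```

Note that for very small n the lower endpoint can be negative, since q over root n may exceed 1; the interval is only justified asymptotically.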
00:57:12.980 --> 00:57:16.270
So it's kind of a weird
shape, but it's still
00:57:16.270 --> 00:57:20.238
of the form 1 over Tn bar
plus or minus something.
00:57:20.238 --> 00:57:23.830
But this something
depends on Tn bar itself.
00:57:23.830 --> 00:57:26.230
And that's actually normal,
because Tn bar is not only
00:57:26.230 --> 00:57:29.020
giving me information
about the mean,
00:57:29.020 --> 00:57:31.360
but it's also giving me
information about the variance.
00:57:31.360 --> 00:57:37.570
So it should definitely come
in the size of my error bars.
00:57:37.570 --> 00:57:41.710
And that's the way it comes
in this fairly natural way.
00:57:41.710 --> 00:57:43.810
Everybody agrees?
00:57:43.810 --> 00:57:46.880
So now I have actually
built a confidence interval.
00:57:46.880 --> 00:57:50.770
But what I want to show
you with this example is,
00:57:50.770 --> 00:57:52.870
can I translate this
in a central limit
00:57:52.870 --> 00:57:57.520
theorem for something that
converges to lambda, right?
00:57:57.520 --> 00:58:00.760
I know that Tn bar
converges to 1 over lambda,
00:58:00.760 --> 00:58:05.260
but I also know that 1 over
Tn bar converges to lambda.
00:58:05.260 --> 00:58:09.450
So do I have a central limit
theorem for 1 over Tn bar?
00:58:09.450 --> 00:58:11.490
Technically no, right?
00:58:11.490 --> 00:58:14.520
Central limit theorems are about
averages, and 1 over an average
00:58:14.520 --> 00:58:16.474
is not an average.
00:58:16.474 --> 00:58:20.520
But there's something that
statisticians like a lot,
00:58:20.520 --> 00:58:23.060
and it's called
the Delta method.
00:58:23.060 --> 00:58:24.800
The Delta method
is really something
00:58:24.800 --> 00:58:27.200
that's telling you
that you can actually
00:58:27.200 --> 00:58:30.440
take a function of
an average, and let
00:58:30.440 --> 00:58:32.570
it go to the function
of the limit,
00:58:32.570 --> 00:58:34.700
and you still have a
central limit theorem.
00:58:34.700 --> 00:58:37.280
And the factor or the
price to pay for this
00:58:37.280 --> 00:58:44.040
is something which depends on
the derivative of the function.
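To see what this looks like on the exponential example, here is a small simulation sketch. It relies on the standard Delta method statement, which the lecture is about to derive: with g(x) = 1/x, the limiting standard deviation of root n times (1 over Tn bar minus lambda) should be the absolute value of g prime at 1 over lambda, which is lambda squared, times the standard deviation 1 over lambda, that is, lambda. The function name, seed, and sample sizes are assumptions for illustration.

```python
import random
import statistics

def scaled_errors(lam, n, reps, seed=0):
    """Simulate sqrt(n) * (1 / Tn_bar - lam) over `reps` independent samples of size n."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        t_bar = sum(rng.expovariate(lam) for _ in range(n)) / n
        out.append(n ** 0.5 * (1 / t_bar - lam))
    return out

lam = 0.25  # hypothetical true rate
errs = scaled_errors(lam, n=400, reps=2000)
# Empirical spread versus the Delta method prediction |g'(1/lam)| * (1/lam) = lam.
print(statistics.pstdev(errs), lam ** 2 * (1 / lam))
```

The two printed numbers should be close, though not identical, since n and the number of repetitions are finite.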
00:58:44.040 --> 00:58:46.276
And so let's just
go through this,
00:58:46.276 --> 00:58:48.650
and it's, again, just like
the proof of the central limit
00:58:48.650 --> 00:58:49.640
theorem.
00:58:49.640 --> 00:58:53.550
And actually in many of those
asymptotic statistics results,
00:58:53.550 --> 00:58:55.834
this is actually just
a Taylor expansion,
00:58:55.834 --> 00:58:57.500
and here it's not
even the second order,
00:58:57.500 --> 00:58:59.600
it's actually the
first order, all right?
00:58:59.600 --> 00:59:02.183
So I'm just going to do linear
approximation of this function.
00:59:04.360 --> 00:59:05.320
So let's do it.
00:59:05.320 --> 00:59:12.950
So I have that g of Tn bar--
00:59:12.950 --> 00:59:15.420
actually let's use the
notation of this slide,
00:59:15.420 --> 00:59:17.590
which is Zn and theta.
00:59:17.590 --> 00:59:24.250
So what I know is that Zn
minus theta square root of n
00:59:24.250 --> 00:59:29.454
goes to some Gaussian,
this standard Gaussian.
00:59:29.454 --> 00:59:32.810
No, not standard.
00:59:32.810 --> 00:59:34.080
OK, so that's the assumptions.
00:59:34.080 --> 00:59:40.502
And what I want to show is
some convergence of g of Zn
00:59:40.502 --> 00:59:43.590
to g of theta.
00:59:43.590 --> 00:59:46.350
So I'm not going to
multiply by root n just yet.
00:59:46.350 --> 00:59:49.125
So I'm going to do a first
order Taylor expansion.
00:59:49.125 --> 00:59:57.040
So what it is telling me is that
this is equal to Zn minus theta
00:59:57.040 --> 01:00:01.570
times g prime of,
let's call it theta bar
01:00:01.570 --> 01:00:06.200
where theta bar is
somewhere between say
01:00:06.200 --> 01:00:11.148
Zn and theta, for some.
01:00:13.980 --> 01:00:17.700
OK, so if theta is less than
Zn you just permute those two.
01:00:17.700 --> 01:00:21.169
So that's what the
first order Taylor
01:00:21.169 --> 01:00:21.960
expansion tells me.
01:00:21.960 --> 01:00:23.918
There exists a theta bar
that's between the two
01:00:23.918 --> 01:00:26.912
values at which I'm expanding
so that those two things are
01:00:26.912 --> 01:00:29.292
equal.
01:00:29.292 --> 01:00:31.172
Is everybody shocked?
01:00:31.172 --> 01:00:31.672
No?
01:00:31.672 --> 01:00:36.350
So that's standard
Taylor expansion.
01:00:36.350 --> 01:00:38.054
Now I'm going to
multiply by root n.
01:00:44.519 --> 01:00:45.810
And so that's going to be what?
01:00:45.810 --> 01:00:50.200
That's going to be
root n Zn minus theta.
01:00:50.200 --> 01:00:51.970
Ah-ha, that's something I like.
01:00:51.970 --> 01:00:57.130
Times g prime of theta bar.
01:00:59.887 --> 01:01:01.470
Now the central limit
theorem tells me
01:01:01.470 --> 01:01:02.904
that this goes to what?
01:01:06.250 --> 01:01:12.370
Well, this goes to some N(0,
sigma squared), right?
01:01:12.370 --> 01:01:15.400
That was the first
line over there.
01:01:15.400 --> 01:01:20.520
This guy here, well,
it's not clear, right?
01:01:20.520 --> 01:01:21.540
Actually it is.
01:01:21.540 --> 01:01:24.840
Let's start with this guy.
01:01:24.840 --> 01:01:28.450
What does theta bar go to?
01:01:28.450 --> 01:01:30.752
Well, I know that Zn
is going to theta.
01:01:33.660 --> 01:01:37.760
Just because, well, that's
my law of large numbers.
01:01:37.760 --> 01:01:41.010
Zn is going to
theta, which means
01:01:41.010 --> 01:01:44.520
that theta bar is sandwiched
between two values that
01:01:44.520 --> 01:01:46.910
converge to theta.
01:01:46.910 --> 01:01:49.580
So that means that theta bar
converges to theta itself
01:01:49.580 --> 01:01:51.300
as n goes to infinity.
01:01:51.300 --> 01:01:54.940
That's just the law
of large numbers.
01:01:54.940 --> 01:01:57.450
Everybody agrees?
01:01:57.450 --> 01:01:58.950
Just because it's
sandwiched, right?
01:01:58.950 --> 01:02:01.180
So I have Zn.
01:02:01.180 --> 01:02:05.651
I have theta, and theta
bar is somewhere here.
01:02:05.651 --> 01:02:06.900
The picture might be reversed.
01:02:06.900 --> 01:02:08.980
It might be that Zn
is larger than theta.
01:02:08.980 --> 01:02:10.480
But the law of large
numbers tells me
01:02:10.480 --> 01:02:14.050
that this guy is not moving,
but this guy is moving that way.
01:02:14.050 --> 01:02:16.444
So you know when
n is [INAUDIBLE],
01:02:16.444 --> 01:02:18.360
there's very little
wiggle room for theta bar,
01:02:18.360 --> 01:02:19.975
and it can only get to theta.
01:02:23.370 --> 01:02:25.310
And I call it the
sandwich theorem,
01:02:25.310 --> 01:02:29.230
or just find your
favorite food in there.
01:02:29.230 --> 01:02:31.716
So this guy goes
to theta, and now I
01:02:31.716 --> 01:02:33.340
need to make an extra
assumption, which
01:02:33.340 --> 01:02:38.601
is that g prime is continuous.
01:02:38.601 --> 01:02:42.300
And if g prime is continuous,
then g prime of theta bar
01:02:42.300 --> 01:02:44.630
goes to g prime of theta.
01:02:44.630 --> 01:02:49.132
So this thing goes
to g prime of theta.
01:02:52.580 --> 01:02:54.776
But I have an issue here.
01:02:54.776 --> 01:02:56.150
Is that now I have
something that
01:02:56.150 --> 01:02:57.860
converges in distribution
and something
01:02:57.860 --> 01:03:01.540
that converges in say--
01:03:01.540 --> 01:03:04.200
I mean, this converges almost
surely or say in probability
01:03:04.200 --> 01:03:06.370
just to be safe.
01:03:06.370 --> 01:03:09.820
And this one converges
in distribution.
01:03:09.820 --> 01:03:11.050
And I want to combine them.
01:03:11.050 --> 01:03:12.633
But I don't have a
slide that tells me
01:03:12.633 --> 01:03:15.660
I'm allowed to take the product
of something that converges
01:03:15.660 --> 01:03:18.460
in distribution, and something
that converges in probability.
01:03:18.460 --> 01:03:19.500
This does not exist.
01:03:19.500 --> 01:03:21.450
Actually, if
anything it told me,
01:03:21.450 --> 01:03:25.970
do not do anything with things
that converge in distribution.
01:03:25.970 --> 01:03:32.770
And so that gets us to our--
01:03:32.770 --> 01:03:36.000
OK, so I'll come back
to this in a second.
01:03:36.000 --> 01:03:39.560
And that gets us to something
called Slutsky's theorem.
01:03:39.560 --> 01:03:42.940
And Slutsky's theorem tells us
that in very specific cases,
01:03:42.940 --> 01:03:44.740
you can do just that.
01:03:44.740 --> 01:03:49.000
So you have two sequences
of random variables, Xn bar,
01:03:49.000 --> 01:03:53.370
that's Xn that converges to
X. And Yn that converges to Y,
01:03:53.370 --> 01:03:55.370
but Y is not anything.
01:03:55.370 --> 01:03:57.410
Y is not any random variable.
01:03:57.410 --> 01:03:59.090
So Xn converges in
distribution.
01:03:59.090 --> 01:04:01.215
Sorry, I forgot to mention,
this is very important.
01:04:01.215 --> 01:04:04.920
Xn converges in distribution,
Yn converges in probability.
01:04:04.920 --> 01:04:07.570
And we know that in general
we cannot combine those two
01:04:07.570 --> 01:04:11.272
things, but Slutsky tells
us that if the limit of Y is
01:04:11.272 --> 01:04:13.230
a constant, meaning it's
not a random variable,
01:04:13.230 --> 01:04:16.080
but it's a
deterministic number 2,
01:04:16.080 --> 01:04:18.940
just a fixed number that's
not a random variable,
01:04:18.940 --> 01:04:21.390
then you can combine them.
01:04:21.390 --> 01:04:24.869
Then you can sum them, and
then you can multiply them.
01:04:28.874 --> 01:04:31.290
I mean, actually you can do
whatever combination you want,
01:04:31.290 --> 01:04:34.800
because it actually implies
that the vector (Xn, Yn)
01:04:34.800 --> 01:04:39.250
converges to the vector (X, c).
01:04:39.250 --> 01:04:41.420
OK, so here I just
took two combinations.
01:04:41.420 --> 01:04:44.070
They are very convenient for
us, the sum and the product
01:04:44.070 --> 01:04:45.850
so I could do other
stuff like the ratio
01:04:45.850 --> 01:04:47.563
if c is not 0, things like that.
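For reference, the statement just described can be written compactly. This is a sketch in the lecture's notation (Xn, Yn, X, and the constant c); the arrow labels (d) and P for convergence in distribution and in probability are a notational assumption.

```latex
X_n \xrightarrow[n\to\infty]{(d)} X,
\qquad
Y_n \xrightarrow[n\to\infty]{\mathbb{P}} c \quad (c \text{ a deterministic constant})
\;\Longrightarrow\;
\begin{cases}
X_n + Y_n \xrightarrow{(d)} X + c, \\[2pt]
X_n Y_n \xrightarrow{(d)} cX, \\[2pt]
X_n / Y_n \xrightarrow{(d)} X / c & \text{if } c \neq 0 .
\end{cases}
```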
01:04:51.190 --> 01:04:53.010
So that's what
Slutsky does for us.
01:04:53.010 --> 01:04:56.120
So what you're going to have to
write a lot in your homework,
01:04:56.120 --> 01:04:58.880
in your mid-terms, by Slutsky.
01:04:58.880 --> 01:05:03.230
I know some people are very
generous with their by Slutsky.
01:05:03.230 --> 01:05:05.940
They just do numerical
applications,
01:05:05.940 --> 01:05:08.250
mu is equal to 6, and
therefore by Slutsky
01:05:08.250 --> 01:05:10.260
mu square is equal to 36.
01:05:10.260 --> 01:05:11.690
All right, so don't do that.
01:05:11.690 --> 01:05:15.415
Just use, write Slutsky when
you're actually using Slutsky.
01:05:15.415 --> 01:05:17.540
But this is something that's
very important for us,
01:05:17.540 --> 01:05:18.860
and it turns out
that you're going
01:05:18.860 --> 01:05:20.985
to feel like you can write
by Slutsky all the time,
01:05:20.985 --> 01:05:23.362
because that's going to
work for us all the time.
01:05:23.362 --> 01:05:25.070
Everything we're going
to see is actually
01:05:25.070 --> 01:05:27.590
going to be where we're going
to have to combine stuff.
01:05:27.590 --> 01:05:30.260
Since we only rely on
convergence in distribution
01:05:30.260 --> 01:05:32.090
arising from the
central limit theorem,
01:05:32.090 --> 01:05:34.340
we're actually going to have
to rely on something that
01:05:34.340 --> 01:05:36.920
allows us to combine them,
and the only thing we know
01:05:36.920 --> 01:05:37.590
is Slutsky.
01:05:37.590 --> 01:05:40.290
So we better hope
that this thing works.
01:05:40.290 --> 01:05:41.780
So why Slutsky works for us.
01:05:41.780 --> 01:05:43.640
Can somebody tell
me why Slutsky works
01:05:43.640 --> 01:05:46.960
to combine those two guys?
01:05:46.960 --> 01:05:48.820
So this one is converging
in distribution.
01:05:48.820 --> 01:05:51.740
This one is converging
in probability,
01:05:51.740 --> 01:05:54.710
but to a deterministic number.
01:05:54.710 --> 01:05:57.440
g prime of theta is a
deterministic number.
01:05:57.440 --> 01:06:02.200
I don't know what theta is, but
it's certainly deterministic.
01:06:02.200 --> 01:06:04.380
All right, so I can combine
them, multiply them.
01:06:04.380 --> 01:06:08.740
So that's just the second
line of that in particular.
01:06:08.740 --> 01:06:12.090
All right, everybody is with me?
01:06:12.090 --> 01:06:13.340
So now I'm allowed to do this.
01:06:13.340 --> 01:06:15.048
You can actually--
you will see something
01:06:15.048 --> 01:06:16.950
like counterexample
questions in your problem
01:06:16.950 --> 01:06:18.741
set just so that you
can convince yourself.
01:06:18.741 --> 01:06:19.960
It's always a good thing.
01:06:19.960 --> 01:06:21.150
I don't like to
give them, because I
01:06:21.150 --> 01:06:23.108
think it's much better
for you to actually come
01:06:23.108 --> 01:06:24.860
to the counterexample yourself.
01:06:24.860 --> 01:06:35.670
Like what can go wrong
if Y is not a random--
01:06:35.670 --> 01:06:38.450
sorry, if Y is not a--
01:06:38.450 --> 01:06:42.572
sorry, if c is not the constant,
but it's a random variable.
01:06:42.572 --> 01:06:45.534
You can figure that out.
01:06:45.534 --> 01:06:46.700
All right, so let's go back.
01:06:46.700 --> 01:06:49.040
So we have now this Delta
method that tells us
01:06:49.040 --> 01:06:51.080
that now I have a
central limit theorem
01:06:51.080 --> 01:06:55.500
for functions of averages,
and not just for averages.
01:06:55.500 --> 01:06:57.922
So the only price to pay
is this derivative there.
01:07:00.600 --> 01:07:05.490
So, for example, if g is
just a linear function,
01:07:05.490 --> 01:07:07.860
then I'm going to have a
constant multiplication.
01:07:07.860 --> 01:07:10.680
If g is a quadratic
function, then I'm
01:07:10.680 --> 01:07:13.710
going to have theta squared
that shows up there.
01:07:13.710 --> 01:07:14.550
Things like that.
01:07:14.550 --> 01:07:16.300
So just think of what
kind of applications
01:07:16.300 --> 01:07:17.770
you could have for this.
01:07:17.770 --> 01:07:19.769
Here are the functions
that we're interested in,
01:07:19.769 --> 01:07:21.270
is x maps to 1 over x.
01:07:21.270 --> 01:07:23.049
What is the derivative
of this guy?
01:07:25.947 --> 01:07:29.746
What is the derivative
of 1 over x?
01:07:29.746 --> 01:07:31.120
Negative 1 over
x squared, right?
01:07:31.120 --> 01:07:33.470
That's the thing we're going
to have to put in there.
01:07:33.470 --> 01:07:37.630
And so this is what we get.
01:07:37.630 --> 01:07:44.260
So now when I'm actually
going to write this,
01:07:44.260 --> 01:07:51.272
so if I want to show square root
of n lambda hat minus lambda.
01:07:51.272 --> 01:07:52.480
That's my application, right?
01:07:52.480 --> 01:07:59.150
This is actually 1 over Tn, and
this is 1 over 1 over lambda.
01:07:59.150 --> 01:08:05.510
So the function g of x
is 1 over x in this case.
01:08:05.510 --> 01:08:06.590
So now I have this thing.
01:08:06.590 --> 01:08:08.960
So I know that by
the Delta method--
01:08:08.960 --> 01:08:11.240
oh, and I knew
that Tn, remember,
01:08:11.240 --> 01:08:16.790
square root of n times Tn
minus 1 over lambda
01:08:16.790 --> 01:08:19.310
was going to some
normal with mean 0
01:08:19.310 --> 01:08:21.932
and variance 1 over
lambda squared, right?
01:08:21.932 --> 01:08:26.079
So the sigma square over there
is 1 over lambda squared.
01:08:26.079 --> 01:08:27.370
So now this thing goes to what?
01:08:27.370 --> 01:08:28.938
Some normal.
01:08:28.938 --> 01:08:32.190
What is going to be the mean?
01:08:32.190 --> 01:08:32.690
0.
01:08:35.510 --> 01:08:37.187
And what is the variance?
01:08:37.187 --> 01:08:38.270
So the variance is going--
01:08:38.270 --> 01:08:40.250
I'm going to pick up
this guy, 1 over lambda
01:08:40.250 --> 01:08:46.930
squared, and then I'm going to
have to take g prime of what?
01:08:46.930 --> 01:08:48.709
Of 1 over lambda, right?
01:08:48.709 --> 01:08:49.627
That's my theta.
01:08:52.840 --> 01:08:55.069
So I have g of theta,
which is 1 over theta.
01:08:55.069 --> 01:08:58.406
So I'm going to have g
prime of 1 over lambda.
01:08:58.406 --> 01:09:00.294
And what is g prime
of 1 over lambda?
01:09:05.029 --> 01:09:09.260
So we said that g prime is
negative 1 over x squared.
01:09:09.260 --> 01:09:13.885
So it's negative 1 over
1 over lambda squared--
01:09:17.877 --> 01:09:18.875
sorry, squared.
01:09:21.870 --> 01:09:24.340
Which is nice, because
g can be decreasing.
01:09:24.340 --> 01:09:26.850
So that would be annoying
to have a negative variance.
01:09:26.850 --> 01:09:29.250
And so g prime is
negative 1 over, and so
01:09:29.250 --> 01:09:33.569
what I get eventually is
lambda squared up here,
01:09:33.569 --> 01:09:36.370
but then I square it again.
01:09:36.370 --> 01:09:39.764
So this whole thing
here becomes what?
01:09:39.764 --> 01:09:41.688
Can somebody tell me
what the final result is?
01:09:44.274 --> 01:09:45.149
Lambda squared right?
01:09:45.149 --> 01:09:47.323
So it's lambda to the 4
divided by lambda squared.
01:09:55.179 --> 01:09:59.620
So that's what's written there.
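The computation on the board can be summarized as follows. This is a sketch using the lecture's symbols, where \bar{T}_n denotes the sample average written Tn above, \hat\lambda = 1/\bar{T}_n, and g(x) = 1/x.

```latex
\sqrt{n}\,\Bigl(\bar{T}_n - \tfrac{1}{\lambda}\Bigr)
  \xrightarrow{(d)} \mathcal{N}\!\Bigl(0, \tfrac{1}{\lambda^2}\Bigr),
\qquad
g'(x) = -\frac{1}{x^2}
\;\Rightarrow\;
g'\!\Bigl(\tfrac{1}{\lambda}\Bigr) = -\lambda^2 ,
\\[6pt]
\sqrt{n}\,\bigl(\hat\lambda - \lambda\bigr)
  = \sqrt{n}\,\Bigl(g(\bar{T}_n) - g\bigl(\tfrac{1}{\lambda}\bigr)\Bigr)
  \xrightarrow{(d)}
  \mathcal{N}\!\Bigl(0,\; g'\!\bigl(\tfrac{1}{\lambda}\bigr)^{2}\cdot\tfrac{1}{\lambda^2}\Bigr)
  = \mathcal{N}\!\Bigl(0,\; \tfrac{\lambda^4}{\lambda^2}\Bigr)
  = \mathcal{N}\bigl(0,\; \lambda^2\bigr).
```

The sign of g' disappears because it enters squared, which is why the variance stays positive even though g is decreasing.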
01:09:59.620 --> 01:10:04.460
And now I can just do my
good old computation for a--
01:10:10.610 --> 01:10:14.570
I can do a good computation
for a confidence interval.
01:10:14.570 --> 01:10:17.880
All right, so let's just
go from the second line.
01:10:17.880 --> 01:10:21.200
So we know that lambda
hat minus lambda
01:10:21.200 --> 01:10:23.980
is less than, we've done
that several times already.
01:10:23.980 --> 01:10:25.520
So it's q alpha over 2--
01:10:25.520 --> 01:10:28.190
sorry, I should put alpha
over 2 over this thing, right?
01:10:28.190 --> 01:10:31.025
So that's really the quantile
of order alpha over 2 times
01:10:31.025 --> 01:10:34.870
lambda divided by
square root of n.
01:10:34.870 --> 01:10:39.610
All right, and so that means
that my confidence interval
01:10:39.610 --> 01:10:42.610
should be this, lambda hat.
01:10:42.610 --> 01:10:47.670
Lambda belongs to lambda
plus or minus q alpha
01:10:47.670 --> 01:10:51.325
over 2 lambda divided
by root n, right?
01:10:51.325 --> 01:10:53.640
So that's my
confidence interval.
01:10:53.640 --> 01:10:56.957
But again, it's not
very suitable, because--
01:10:56.957 --> 01:10:59.292
sorry, that's lambda hat.
01:10:59.292 --> 01:11:02.561
Because I don't
know how to compute it.
01:11:02.561 --> 01:11:04.510
So now I'm going to
request from the audience
01:11:04.510 --> 01:11:06.464
some remedies for this.
01:11:06.464 --> 01:11:07.940
What do you suggest we do?
01:11:12.860 --> 01:11:14.828
What is the laziest
thing I can do?
01:11:18.272 --> 01:11:19.248
Anybody?
01:11:19.248 --> 01:11:19.748
Yeah.
01:11:19.748 --> 01:11:21.332
AUDIENCE: [INAUDIBLE]
01:11:21.332 --> 01:11:23.290
PHILIPPE RIGOLLET: Replace
lambda by lambda hat.
01:11:23.290 --> 01:11:25.152
What justifies
for me to do this?
01:11:25.152 --> 01:11:27.602
AUDIENCE: [INAUDIBLE]
01:11:27.602 --> 01:11:29.060
PHILIPPE RIGOLLET:
Yeah, and Slutsky
01:11:29.060 --> 01:11:32.850
tells me I can actually do
it, because Slutsky tells me,
01:11:32.850 --> 01:11:35.210
where does this lambda
come from, right?
01:11:35.210 --> 01:11:37.280
This lambda comes from here.
01:11:37.280 --> 01:11:39.530
That's the one that's here.
01:11:39.530 --> 01:11:41.810
So actually I could
rewrite this entire thing
01:11:41.810 --> 01:11:47.000
as square root of n lambda hat
minus lambda divided by lambda
01:11:47.000 --> 01:11:51.420
converges to some N(0, 1).
01:11:51.420 --> 01:11:55.560
Now if I replace this by
lambda hat, what I have is
01:11:55.560 --> 01:12:01.600
that this is actually really
the original one times
01:12:01.600 --> 01:12:04.830
lambda divided by lambda hat.
01:12:04.830 --> 01:12:07.510
And this converges
to N(0, 1), right?
01:12:07.510 --> 01:12:10.500
And now what you're telling
me is, well, this guy
01:12:10.500 --> 01:12:15.360
I know it converges to N(0, 1),
and this guy is converging to 1
01:12:15.360 --> 01:12:16.650
by the law of large numbers.
01:12:16.650 --> 01:12:19.880
But this one is converging to 1,
which happens to be a constant.
01:12:19.880 --> 01:12:22.860
It converges in probability,
so by Slutsky I can actually
01:12:22.860 --> 01:12:25.590
take the product and still
maintain my convergence
01:12:25.590 --> 01:12:29.070
in distribution to
a standard Gaussian.
a standard Gaussian.
01:12:29.070 --> 01:12:30.360
So you can always do this.
01:12:30.360 --> 01:12:34.080
Every time you replace
some p by p hat,
01:12:34.080 --> 01:12:35.752
as long as their
ratio goes to 1,
01:12:35.752 --> 01:12:38.210
which is going to be guaranteed
by the law of large numbers,
01:12:38.210 --> 01:12:40.381
you're actually
going to be fine.
01:12:40.381 --> 01:12:42.464
And that's where we're
going to use Slutsky a lot.
01:12:42.464 --> 01:12:46.640
When we do plug in, Slutsky
is going to be our friend.
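The plug-in interval discussed here can be checked numerically. The following is a minimal sketch, not code from the course: it assumes exponential data with rate lambda, uses the standard normal quantile of order 1 - alpha/2 for q alpha over 2, and the function names `plugin_ci` and `coverage` are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def plugin_ci(samples, alpha=0.05):
    """Asymptotic interval lam_hat +/- q * lam_hat / sqrt(n) for an Exp(lam) rate."""
    n = len(samples)
    lam_hat = 1.0 / np.mean(samples)           # lambda hat = 1 / average
    q = NormalDist().inv_cdf(1 - alpha / 2)    # standard normal quantile
    half_width = q * lam_hat / np.sqrt(n)      # lambda replaced by lambda hat
    return lam_hat - half_width, lam_hat + half_width


def coverage(lam=2.0, n=500, reps=2000, alpha=0.05, seed=1):
    """Fraction of simulated intervals that contain the true lambda."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(reps):
        # numpy parametrizes the exponential by its scale, which is 1 / lam
        lo, hi = plugin_ci(rng.exponential(scale=1.0 / lam, size=n), alpha)
        hits += int(lo <= lam <= hi)
    return hits / reps


cov = coverage()
print("empirical coverage:", cov)
```

If the Slutsky argument is doing its job, the empirical coverage should sit close to the nominal 1 - alpha for large n.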
01:12:46.640 --> 01:12:47.890
OK, so we can do this.
01:12:51.180 --> 01:12:52.110
And that's one way.
01:12:52.110 --> 01:12:53.650
And the other
way is to just solve
01:12:53.650 --> 01:12:56.160
for lambda like we did before.
01:12:56.160 --> 01:12:58.200
So the first one we
got is actually--
01:12:58.200 --> 01:13:00.840
I don't know if I still
have it somewhere.
01:13:00.840 --> 01:13:03.680
Yeah, that was the one, right?
01:13:03.680 --> 01:13:08.240
So we had 1 over Tn q, and
that's exactly the same
01:13:08.240 --> 01:13:09.180
that we have here.
01:13:09.180 --> 01:13:12.712
So your solution is actually
giving us exactly this guy when
01:13:12.712 --> 01:13:14.368
we actually solve for lambda.
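The "solve for lambda" route can be sketched as the following algebra. It assumes q denotes q alpha over 2 and that n is large enough for q / sqrt(n) < 1; the exact form shown on the slide is not reproduced here.

```latex
\sqrt{n}\,\frac{|\hat\lambda - \lambda|}{\lambda} \le q
\;\Longleftrightarrow\;
1 - \frac{q}{\sqrt{n}} \;\le\; \frac{\hat\lambda}{\lambda} \;\le\; 1 + \frac{q}{\sqrt{n}}
\;\Longleftrightarrow\;
\frac{\hat\lambda}{1 + q/\sqrt{n}} \;\le\; \lambda \;\le\; \frac{\hat\lambda}{1 - q/\sqrt{n}} .
```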
01:13:17.420 --> 01:13:20.690
So this is what we get.
01:13:20.690 --> 01:13:21.620
Lambda hat.
01:13:21.620 --> 01:13:24.140
We replace lambda by
lambda hat, and we
01:13:24.140 --> 01:13:27.750
have our asymptotic
convergence theorem.
01:13:27.750 --> 01:13:30.400
And that's exactly what we
did in Slutsky's theorem.
01:13:30.400 --> 01:13:32.817
Now we're getting to it at
this point is just telling us
01:13:32.817 --> 01:13:36.640
that we can actually do this.
01:13:36.640 --> 01:13:39.680
Are there any questions
about what we did here?
01:13:39.680 --> 01:13:42.520
So this derivation right
here is exactly what I
01:13:42.520 --> 01:13:44.190
did on the board I showed you.
01:13:44.190 --> 01:13:46.690
So let me just show you
with a little more space
01:13:46.690 --> 01:13:49.094
just so that we all
understand, right?
01:13:49.094 --> 01:13:58.570
So we know that square root of n
lambda hat minus lambda divided
01:13:58.570 --> 01:14:00.760
by lambda, the
true lambda defined
01:14:00.760 --> 01:14:04.100
converges to some N(0, 1).
01:14:04.100 --> 01:14:07.215
So that was CLT
plus Delta method.
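A quick simulation can make this limit concrete. This is a minimal sketch under the assumption that the data are exponential with rate lambda; the helper name `standardized_stat` and the parameter values are illustrative only.

```python
import numpy as np


def standardized_stat(lam, n, reps, seed=0):
    """Simulate sqrt(n) * (lam_hat - lam) / lam over many Exp(lam) samples."""
    rng = np.random.default_rng(seed)
    # numpy parametrizes the exponential by its scale, which is 1 / lam
    samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
    t_bar = samples.mean(axis=1)   # sample averages, each estimating 1 / lam
    lam_hat = 1.0 / t_bar          # g(t_bar) with g(x) = 1 / x
    return np.sqrt(n) * (lam_hat - lam) / lam


z = standardized_stat(lam=2.0, n=1000, reps=4000)
print("mean:", z.mean(), "variance:", z.var())
```

For large n the printed mean should be near 0 and the variance near 1, as the CLT combined with the Delta method predicts.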
01:14:11.700 --> 01:14:13.710
Applying those two,
we got to here.
01:14:13.710 --> 01:14:17.400
And we know that
lambda hat converges
01:14:17.400 --> 01:14:21.600
to lambda in probability and
almost surely, and that's what?
01:14:21.600 --> 01:14:24.980
That was law of large numbers
plus continuous mapping theorem,
01:14:24.980 --> 01:14:25.729
right?
01:14:25.729 --> 01:14:27.687
Because we only knew that
1 over lambda hat
01:14:27.687 --> 01:14:29.148
converges to 1 over lambda.
01:14:29.148 --> 01:14:31.590
So we had to flip
those things around.
01:14:31.590 --> 01:14:33.920
And now what I said is
that I apply Slutsky,
01:14:33.920 --> 01:14:38.210
so I write square root of n
lambda hat minus lambda divided
01:14:38.210 --> 01:14:42.260
by lambda hat, which is the
suggestion that was made to me.
01:14:42.260 --> 01:14:44.160
They said, I want
this, but I would
01:14:44.160 --> 01:14:45.910
want to show that it
converges to some N(0, 1)
01:14:45.910 --> 01:14:49.970
so I can legitimately use
q alpha over 2 in this one
01:14:49.970 --> 01:14:50.745
though.
01:14:50.745 --> 01:14:53.120
And the way we said is like,
well, this thing is actually
01:14:53.120 --> 01:15:00.737
really this divided by lambda times
lambda divided by lambda hat.
01:15:00.737 --> 01:15:02.320
So this thing that
was proposed to me,
01:15:02.320 --> 01:15:03.730
I can decompose
it in the product
01:15:03.730 --> 01:15:05.980
of those two random variables.
01:15:05.980 --> 01:15:09.060
The first one here converges
to the Gaussian
01:15:09.060 --> 01:15:10.600
from the central limit theorem.
01:15:10.600 --> 01:15:14.718
And the second one converges
to 1 from this guy,
01:15:14.718 --> 01:15:17.038
but in probability this time.
01:15:20.620 --> 01:15:23.260
That was the ratio of two
things in probability,
01:15:23.260 --> 01:15:25.030
we can actually get it.
01:15:25.030 --> 01:15:26.753
And so now I apply Slutsky.
01:15:31.180 --> 01:15:34.537
And Slutsky tells me that
I can actually do that.
01:15:34.537 --> 01:15:36.870
That when I take the product
of this thing that converges
01:15:36.870 --> 01:15:40.010
to some standard Gaussian,
and this thing that converges
01:15:40.010 --> 01:15:43.380
in probability to 1, then
their product actually
01:15:43.380 --> 01:15:48.618
converges to still this
standard Gaussian [INAUDIBLE]
01:15:55.370 --> 01:15:58.880
Well, that's exactly
what's done here,
01:15:58.880 --> 01:16:02.340
and I think I'm getting there.
01:16:02.340 --> 01:16:07.570
So in our case, OK, so just a
remark for Slutsky's theorem.
01:16:07.570 --> 01:16:09.070
So that's the last line.
01:16:09.070 --> 01:16:11.850
So in the first example we used
the problem dependent trick,
01:16:11.850 --> 01:16:13.980
which was to say,
well, turns out
01:16:13.980 --> 01:16:16.380
that we knew that p
is between 0 and 1.
01:16:16.380 --> 01:16:18.960
So we have this p 1 minus
p that was annoying to us.
01:16:18.960 --> 01:16:21.240
We just said, let's
just bound it by 1/4,
01:16:21.240 --> 01:16:23.870
because that's going to be
true for any value of p.
01:16:23.870 --> 01:16:26.310
But here, lambda takes any
value between 0 and infinity,
01:16:26.310 --> 01:16:27.612
so we didn't have such a trick.
01:16:27.612 --> 01:16:29.820
It's something like we could
see that lambda was less
01:16:29.820 --> 01:16:30.970
than something.
01:16:30.970 --> 01:16:34.070
Maybe we know it, in which
case we could use that.
01:16:34.070 --> 01:16:36.844
But then in this case,
we could actually also
01:16:36.844 --> 01:16:39.010
have used Slutsky's theorem
by doing plug in, right?
01:16:39.010 --> 01:16:41.890
So here this is my p 1 minus
p that's replaced by p hat 1
01:16:41.890 --> 01:16:43.060
minus p hat.
01:16:43.060 --> 01:16:45.084
And Slutsky justify,
so we did that
01:16:45.084 --> 01:16:46.500
without really
thinking last time.
01:16:46.500 --> 01:16:48.700
But Slutsky actually
justifies the fact
01:16:48.700 --> 01:16:51.225
that this is valid, and
still allows me to use
01:16:51.225 --> 01:16:52.940
this q alpha over 2 here.
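The two options mentioned for the Bernoulli case, bounding p(1 - p) by 1/4 or plugging in p hat, can be compared side by side. This is a minimal sketch with a hypothetical helper `bernoulli_cis`; the sample size and the true p = 0.2 are arbitrary illustrative choices.

```python
import numpy as np
from statistics import NormalDist


def bernoulli_cis(x, alpha=0.05):
    """Return (conservative, plug-in) asymptotic intervals for a Bernoulli p."""
    x = np.asarray(x, dtype=float)
    n = x.size
    p_hat = x.mean()
    q = NormalDist().inv_cdf(1 - alpha / 2)   # standard normal quantile
    # Conservative: sqrt(p (1 - p)) <= 1/2 for every p in [0, 1]
    w_cons = q * 0.5 / np.sqrt(n)
    # Plug-in (justified by Slutsky): replace p by p_hat in the variance
    w_plug = q * np.sqrt(p_hat * (1 - p_hat)) / np.sqrt(n)
    return (p_hat - w_cons, p_hat + w_cons), (p_hat - w_plug, p_hat + w_plug)


rng = np.random.default_rng(0)
cons, plug = bernoulli_cis(rng.binomial(1, 0.2, size=1000))
print("conservative:", cons)
print("plug-in:     ", plug)
```

The plug-in interval is never wider than the conservative one, and the gap grows as p moves away from 1/2.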
01:16:56.230 --> 01:16:58.180
All right, so that's
the end of this lecture.
01:16:58.180 --> 01:17:01.300
Tonight I will post the next
set of slides, chapter two.
01:17:01.300 --> 01:17:04.060
And, well, hopefully the video.
01:17:04.060 --> 01:17:06.810
I'm not sure when it's
going to come out.