WEBVTT
00:00:00.120 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.880
Commons license.
00:00:03.880 --> 00:00:06.090
Your support will help
MIT OpenCourseWare
00:00:06.090 --> 00:00:10.180
continue to offer high-quality
educational resources for free.
00:00:10.180 --> 00:00:12.720
To make a donation or to
view additional materials
00:00:12.720 --> 00:00:16.650
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:16.650 --> 00:00:17.880
at ocw.mit.edu.
00:00:20.524 --> 00:00:21.940
PHILIPPE RIGOLLET:
So today, we're
00:00:21.940 --> 00:00:24.820
going to close this
chapter, this short chapter,
00:00:24.820 --> 00:00:26.200
on Bayesian inference.
00:00:26.200 --> 00:00:28.990
Again, this was just
an overview of what you
00:00:28.990 --> 00:00:32.259
can do in Bayesian inference.
00:00:32.259 --> 00:00:34.630
And last time, we
started defining
00:00:34.630 --> 00:00:36.260
what's called Jeffreys priors.
00:00:36.260 --> 00:00:36.760
Right?
00:00:36.760 --> 00:00:38.560
So when you do
Bayesian inference,
00:00:38.560 --> 00:00:41.620
you have to introduce a
prior on your parameter.
00:00:41.620 --> 00:00:43.660
And we said that
usually, it's something
00:00:43.660 --> 00:00:45.820
that encodes your domain
knowledge about where
00:00:45.820 --> 00:00:47.130
the parameter could be.
00:00:47.130 --> 00:00:49.030
But there's also some
principled way to do it,
00:00:49.030 --> 00:00:51.155
if you want to do Bayesian
inference without really
00:00:51.155 --> 00:00:53.420
having to think about it.
00:00:53.420 --> 00:00:56.260
And for example, one
of the natural priors
00:00:56.260 --> 00:00:58.080
were those non-informative
priors, right?
00:00:58.080 --> 00:00:59.740
If you were on a
compact set, it's
00:00:59.740 --> 00:01:01.570
a uniform prior of this set.
00:01:01.570 --> 00:01:04.239
If you're on an infinite set,
you can still think of taking
00:01:04.239 --> 00:01:06.520
the [? 01s ?] prior.
00:01:06.520 --> 00:01:09.280
And that's called an [INAUDIBLE]
That's always equal to 1.
00:01:09.280 --> 00:01:13.300
And that's an improper prior
if you are on an infinite set
00:01:13.300 --> 00:01:14.830
or proportional to one.
00:01:14.830 --> 00:01:17.860
And so another prior
that you can think of,
00:01:17.860 --> 00:01:20.230
in the case where you have
a Fisher information, which
00:01:20.230 --> 00:01:23.200
is well-defined, is something
called Jeffreys prior.
00:01:23.200 --> 00:01:25.600
And this prior is
a prior which is
00:01:25.600 --> 00:01:28.150
proportional to square root of
the determinant of the Fisher
00:01:28.150 --> 00:01:29.780
information matrix.
00:01:29.780 --> 00:01:31.750
And if you're in
one dimension, it's
00:01:31.750 --> 00:01:37.750
basically proportional to
a square root of the Fisher
00:01:37.750 --> 00:01:40.750
information coefficient,
which we know, for example,
00:01:40.750 --> 00:01:44.170
is the inverse of the asymptotic variance
of the maximum likelihood
00:01:44.170 --> 00:01:45.370
estimator.
00:01:45.370 --> 00:01:48.010
And it turns out
that it's basically.
00:01:48.010 --> 00:01:50.330
So square root of this
thing is basically
00:01:50.330 --> 00:01:54.160
one over the standard deviation
of the maximum likelihood
00:01:54.160 --> 00:01:55.150
estimator.
00:01:55.150 --> 00:01:56.690
And so you can
compute this, right?
00:01:56.690 --> 00:01:59.944
So you can compute for the
maximum likelihood estimator.
00:01:59.944 --> 00:02:01.360
We know that the
variance is going
00:02:01.360 --> 00:02:09.910
to be p times 1 minus p
in the Bernoulli
00:02:09.910 --> 00:02:11.200
statistical experiment.
00:02:11.200 --> 00:02:13.510
So you get this one over the
square root of this thing.
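As a small illustration of the Bernoulli case just described, here is a minimal Python sketch. It assumes the Fisher information of a single Bernoulli(p) observation is 1 / (p (1 - p)), so that the Jeffreys prior is proportional to 1 / sqrt(p (1 - p)); the function names are hypothetical and chosen only for this example. Normalizing by pi gives the Beta(1/2, 1/2) density.

```python
import math


def bernoulli_fisher_information(p: float) -> float:
    """Fisher information of one Bernoulli(p) observation: 1 / (p (1 - p))."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return 1.0 / (p * (1.0 - p))


def jeffreys_prior_unnormalized(p: float) -> float:
    """Jeffreys prior up to a constant: sqrt(I(p)) = 1 / sqrt(p (1 - p))."""
    return math.sqrt(bernoulli_fisher_information(p))


def jeffreys_prior_density(p: float) -> float:
    """Normalized Jeffreys prior, i.e. the Beta(1/2, 1/2) density.

    The normalizing constant is the integral of 1 / sqrt(p (1 - p))
    over (0, 1), which equals pi.
    """
    return jeffreys_prior_unnormalized(p) / math.pi


if __name__ == "__main__":
    # The prior puts more weight near 0 and 1 than near 1/2.
    for p in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(p, jeffreys_prior_density(p))
```

The density is smallest at p = 1/2 and blows up at the endpoints, which matches the one-over-standard-deviation description above.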
00:02:13.510 --> 00:02:16.720
And for example, in
the Gaussian setting,
00:02:16.720 --> 00:02:19.880
you actually have the
Fisher information,
00:02:19.880 --> 00:02:22.000
even in the multivariate
one, is actually
00:02:22.000 --> 00:02:24.752
going to be something
like the identity matrix.
00:02:24.752 --> 00:02:25.960
So this is proportional to 1.
00:02:25.960 --> 00:02:29.530
It's the improper prior that
you get, in this case, OK?
00:02:29.530 --> 00:02:31.690
Meaning that, for
the Gaussian setting,
00:02:31.690 --> 00:02:33.880
no place where you
center your Gaussian
00:02:33.880 --> 00:02:36.020
is actually better
than any other.
00:02:36.020 --> 00:02:36.520
All right.
00:02:36.520 --> 00:02:40.130
So we basically
left on this slide,
00:02:40.130 --> 00:02:43.570
where we saw that
Jeffreys priors satisfy
00:02:43.570 --> 00:02:46.170
a reparametrization
[INAUDIBLE] invariant
00:02:46.170 --> 00:02:49.180
by transformation of
your parameter, which
00:02:49.180 --> 00:02:51.920
is a desirable property.
00:02:51.920 --> 00:02:57.217
And the way, it says that, well,
if I have my prior on theta,
00:02:57.217 --> 00:02:59.050
and then I suddenly
decide that theta is not
00:02:59.050 --> 00:03:01.720
the parameter I want to use
to parameterize my problem,
00:03:01.720 --> 00:03:04.640
actually what I want
is phi of theta.
00:03:04.640 --> 00:03:07.840
So think, for example, of theta
being the mean of a Gaussian,
00:03:07.840 --> 00:03:11.140
and phi of theta as
being the mean cubed.
00:03:11.140 --> 00:03:11.920
OK?
00:03:11.920 --> 00:03:15.520
This is a one-to-one
map phi, right?
00:03:15.520 --> 00:03:20.185
So for example, if I want to
go from theta to theta cubed,
00:03:20.185 --> 00:03:22.840
and now I decide that this is
the actual parameter that I
00:03:22.840 --> 00:03:26.200
want, well, then it means
that, on this parameter,
00:03:26.200 --> 00:03:29.110
my original prior is going
to induce another prior.
00:03:29.110 --> 00:03:30.970
And here, it says,
well, this prior
00:03:30.970 --> 00:03:33.200
is actually also Jeffreys prior.
00:03:33.200 --> 00:03:33.700
OK?
00:03:33.700 --> 00:03:35.450
So it's essentially
telling you that,
00:03:35.450 --> 00:03:38.410
for this new parametrization,
if you take Jeffreys prior, then
00:03:38.410 --> 00:03:41.201
you actually go back to having
exactly something that's
00:03:41.201 --> 00:03:43.450
of the form square root
of determinant of the Fisher
00:03:43.450 --> 00:03:45.116
information, but this
thing with respect
00:03:45.116 --> 00:03:47.810
to your new
parametrization. All right.
00:03:47.810 --> 00:03:50.360
And so why is this true?
00:03:50.360 --> 00:03:53.440
Well, it's just this
change of variable theorem.
00:03:53.440 --> 00:03:58.330
So it's essentially telling
you that, if you call--
00:03:58.330 --> 00:04:08.850
let's call p-- well, let's call
pi tilde of eta the prior over eta.
00:04:08.850 --> 00:04:11.130
And you have pi of
theta as the prior
00:04:11.130 --> 00:04:18.040
over theta, then since eta
is of the form phi of theta,
00:04:18.040 --> 00:04:26.620
just by change of variable,
so that's essentially
00:04:26.620 --> 00:04:33.070
a probability result. It
says that pi tilde of eta
00:04:33.070 --> 00:04:42.790
is equal to pi of eta
times d pi of theta times d
00:04:42.790 --> 00:04:48.860
theta over d eta and--
00:04:55.706 --> 00:04:57.189
sorry, is that the one?
00:04:57.189 --> 00:04:58.730
Sorry, I'm going to
have to write it,
00:04:58.730 --> 00:04:59.938
because I always forget this.
00:05:05.209 --> 00:05:07.380
So if I take a function--
00:05:14.380 --> 00:05:14.960
OK.
00:05:14.960 --> 00:05:16.400
So what I want is to check.
00:05:38.340 --> 00:05:41.870
OK, so I want the function
of eta that I can here.
00:05:41.870 --> 00:05:48.480
And what I know is that
this is h of phi of theta.
00:05:48.480 --> 00:05:48.980
All right?
00:05:48.980 --> 00:05:51.810
So sorry, eta is
phi of theta, right?
00:05:51.810 --> 00:05:53.471
Yeah.
00:05:53.471 --> 00:05:54.970
So what I'm going
to do is I'm going
00:05:54.970 --> 00:06:09.130
to do the change of variable,
theta is phi inverse of eta.
00:06:09.130 --> 00:06:14.120
So eta is phi of
theta, which means
00:06:14.120 --> 00:06:20.540
that d eta is equal to d--
00:06:20.540 --> 00:06:26.020
well, to phi prime
of theta d theta.
00:06:26.020 --> 00:06:31.464
So when I'm going to write this,
I'm going to get integral of h.
00:06:31.464 --> 00:06:33.470
Actually, let me
write this, as I
00:06:33.470 --> 00:06:36.980
am more comfortable
writing this as e
00:06:36.980 --> 00:06:40.031
with respect to eta of h of eta.
00:06:40.031 --> 00:06:40.530
OK?
00:06:40.530 --> 00:06:44.580
So that's just eta being
drawn according to the prior.
00:06:44.580 --> 00:06:47.670
And I want to write this as
the integral of h of eta times
00:06:47.670 --> 00:06:49.080
some function, right?
00:06:49.080 --> 00:06:58.580
So this is the
integral of h of phi
00:06:58.580 --> 00:07:03.556
of theta pi of theta d theta.
00:07:03.556 --> 00:07:06.150
Now, I'm going to do
my change of variable.
00:07:06.150 --> 00:07:09.290
So this is going to be
the integral of h of eta.
00:07:09.290 --> 00:07:16.420
And then pi of phi of--
00:07:16.420 --> 00:07:20.290
so theta is phi inverse of eta.
00:07:20.290 --> 00:07:27.390
And then d eta is phi
prime of theta d theta, OK?
00:07:27.390 --> 00:07:30.210
And so what is pi of phi inverse of eta?
00:07:30.210 --> 00:07:32.120
So this thing is proportional.
00:07:32.120 --> 00:07:33.750
So we're in, say,
dimension 1, so it's
00:07:33.750 --> 00:07:38.420
proportional to the square root
of the Fisher information.
00:07:38.420 --> 00:07:39.920
And the Fisher
information, we know,
00:07:39.920 --> 00:07:44.630
is the expectation of the square
of the derivative of the log
00:07:44.630 --> 00:07:45.770
likelihood, right?
00:07:45.770 --> 00:07:48.740
So this is square root
of the expectation
00:07:48.740 --> 00:08:03.650
of d over d theta of log of--
00:08:03.650 --> 00:08:06.010
well, now, I need the density.
00:08:06.010 --> 00:08:10.050
Well, let's just
call it l of theta.
00:08:10.050 --> 00:08:17.030
And I want this to be taken
at phi inverse of eta squared.
00:08:19.980 --> 00:08:21.480
And then what I pick up is the--
00:08:23.771 --> 00:08:25.770
so I'm going to put
everything under the square.
00:08:25.770 --> 00:08:31.460
So I get phi prime of
theta squared d theta.
00:08:31.460 --> 00:08:33.260
OK?
00:08:33.260 --> 00:08:35.090
So now, I have the
expectation of a square.
00:08:35.090 --> 00:08:38.539
This does not depend, so this
is-- sorry, this is l of theta.
00:08:38.539 --> 00:08:42.307
This is the expectation of
l of theta of an x, right?
00:08:42.307 --> 00:08:44.390
That's for some variable,
and the expectation here
00:08:44.390 --> 00:08:45.710
is with respect to x.
00:08:45.710 --> 00:08:49.824
That's just the definition
of the Fisher information.
00:08:49.824 --> 00:08:52.240
So now I'm going to squeeze
this guy into the expectation.
00:08:52.240 --> 00:08:53.260
It does not depend on x.
00:08:53.260 --> 00:08:55.412
It just acts as a constant.
00:08:55.412 --> 00:08:57.370
And so what I have now
is that this is actually
00:08:57.370 --> 00:08:59.760
proportional to
the integral of h of
00:08:59.760 --> 00:09:05.320
eta times the square root of
the expectation with respect
00:09:05.320 --> 00:09:06.600
to x of what?
00:09:06.600 --> 00:09:10.540
Well, here, I have d over
d theta of log of l of theta.
00:09:10.540 --> 00:09:15.620
And here, this guy is really
d eta over d theta, right?
00:09:19.524 --> 00:09:21.480
Agree?
00:09:21.480 --> 00:09:24.720
So now, what I'm really left
with-- so I get d over d theta
00:09:24.720 --> 00:09:25.520
times d--
00:09:25.520 --> 00:09:28.047
sorry, times d theta over d eta.
00:09:42.980 --> 00:09:51.396
So that's just d over
d eta of log of l of eta of x.
00:10:00.198 --> 00:10:04.370
And then this guy is now
becoming d eta, right?
00:10:04.370 --> 00:10:06.590
OK, so this was a mess.
00:10:09.710 --> 00:10:12.320
This is a complete mess, because
I actually want to use phi.
00:10:12.320 --> 00:10:14.150
I should not actually
introduce phi at all.
00:10:14.150 --> 00:10:21.930
I should just talk about d eta
over d theta type of things.
00:10:21.930 --> 00:10:24.370
And then that would actually
make my life so much easier.
00:10:24.370 --> 00:10:25.002
OK.
00:10:25.002 --> 00:10:26.710
I'm not going to spend
more time on this.
00:10:26.710 --> 00:10:28.210
This is really just
the idea, right?
00:10:28.210 --> 00:10:30.170
You have square root
of a square in there.
00:10:30.170 --> 00:10:31.480
And then, when you do
your change of variable,
00:10:31.480 --> 00:10:32.710
you just pick up a square.
00:10:32.710 --> 00:10:35.750
You just pick up
something in here.
00:10:35.750 --> 00:10:38.110
And so you just move
this thing in there.
00:10:38.110 --> 00:10:38.920
You get a square.
00:10:38.920 --> 00:10:40.400
It goes inside the square.
00:10:40.400 --> 00:10:42.280
And so your derivative
of the log likelihood
00:10:42.280 --> 00:10:44.488
with respect to theta becomes
a derivative of the log
00:10:44.488 --> 00:10:46.240
likelihood with respect to eta.
00:10:46.240 --> 00:10:48.850
And that's the only thing
that's happening here.
00:10:48.850 --> 00:10:52.478
I'm just being super
sloppy, for some reason.
00:10:52.478 --> 00:10:54.612
OK.
00:10:54.612 --> 00:10:56.570
And then, of course, now,
what you're left with
00:10:56.570 --> 00:10:59.442
is that this is really
just proportional.
00:10:59.442 --> 00:11:00.650
Well, this is actually equal.
00:11:00.650 --> 00:11:02.150
Everything is
proportional, but this
00:11:02.150 --> 00:11:05.090
is equal to the Fisher
information tilde with respect
00:11:05.090 --> 00:11:07.050
to eta now.
00:11:07.050 --> 00:11:07.550
Right?
00:11:07.550 --> 00:11:09.630
You're doing this
with respect to eta.
00:11:09.630 --> 00:11:17.010
And so that's your new
prior with respect to eta.
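Written out in one dimension, the change-of-variable argument sketched above could look as follows. This is a reconstruction under stated assumptions, not the board work itself: phi is taken to be one-to-one and differentiable with nonzero derivative, L(theta; X) stands for what the lecture calls l of theta, L tilde is the same likelihood expressed in terms of eta, and I tilde is notation chosen here for the Fisher information in the eta parametrization.

```latex
\begin{align*}
\tilde\pi(\eta)
  &= \pi(\theta)\,\left|\frac{d\theta}{d\eta}\right|,
     \qquad \theta = \phi^{-1}(\eta), \\
  &\propto \sqrt{\mathbb{E}_X\!\left[\left(\frac{\partial}{\partial\theta}
     \log L(\theta; X)\right)^{2}\right]}\;
     \left|\frac{d\theta}{d\eta}\right| \\
  &= \sqrt{\mathbb{E}_X\!\left[\left(\frac{\partial}{\partial\theta}
     \log L(\theta; X)\,\frac{d\theta}{d\eta}\right)^{2}\right]} \\
  &= \sqrt{\mathbb{E}_X\!\left[\left(\frac{\partial}{\partial\eta}
     \log \tilde L(\eta; X)\right)^{2}\right]}
   = \sqrt{\tilde I(\eta)}.
\end{align*}
```

The third line uses the fact that d theta over d eta does not depend on X, so it can be moved inside the expectation and under the square; the fourth line is the chain rule.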
00:11:17.010 --> 00:11:17.510
OK.
00:11:17.510 --> 00:11:21.800
So one thing that
you want to do,
00:11:21.800 --> 00:11:23.870
once you have-- so
remember, when you actually
00:11:23.870 --> 00:11:26.600
compute your
posterior, right, rather
00:11:26.600 --> 00:11:29.330
than having-- so you
start with a prior,
00:11:29.330 --> 00:11:32.090
and you have some observations,
let's say, x1 to xn.
00:11:36.190 --> 00:11:41.540
When you do Bayesian
inference, rather than spitting
00:11:41.540 --> 00:11:45.450
out just some theta hat, which
is an estimator for theta,
00:11:45.450 --> 00:11:48.565
you actually spit out an
entire posterior distribution--
00:11:53.220 --> 00:11:57.040
pi of theta, given x1 xn.
00:11:57.040 --> 00:11:57.540
OK?
00:11:57.540 --> 00:11:59.460
So there's an
entire distribution
00:11:59.460 --> 00:12:01.110
on the [INAUDIBLE] theta.
00:12:01.110 --> 00:12:04.290
And you can actually use this
to perform inference, rather
00:12:04.290 --> 00:12:06.150
than just having one number.
00:12:06.150 --> 00:12:06.950
OK?
00:12:06.950 --> 00:12:09.300
And so you could actually
build confidence regions
00:12:09.300 --> 00:12:10.540
from this thing.
00:12:10.540 --> 00:12:11.040
OK.
00:12:11.040 --> 00:12:16.600
And so a Bayesian
confidence interval--
00:12:16.600 --> 00:12:21.480
so if your set of parameters
is included in the real line,
00:12:21.480 --> 00:12:23.880
then you can actually--
it's not even guaranteed
00:12:23.880 --> 00:12:25.740
to be an interval.
00:12:25.740 --> 00:12:33.350
So let me call it a confidence
region, so a Bayesian
00:12:33.350 --> 00:12:40.090
confidence region, OK?
00:12:40.090 --> 00:12:43.360
So it's just a random subset.
00:12:43.360 --> 00:12:47.810
So let's call it R, which
is included in Theta.
00:12:47.810 --> 00:12:49.750
And when you have the
deterministic one,
00:12:49.750 --> 00:12:53.650
we had a definition, which was
with respect to the randomness
00:12:53.650 --> 00:12:54.880
of the data, right?
00:12:54.880 --> 00:12:57.850
That's how you actually
had a random subset.
00:12:57.850 --> 00:12:59.740
So you had a random
confidence interval.
00:12:59.740 --> 00:13:02.200
Here, it's actually
conditioned on the data,
00:13:02.200 --> 00:13:03.640
but with respect
to the randomness
00:13:03.640 --> 00:13:06.531
that you actually get from
your posterior distribution.
00:13:06.531 --> 00:13:07.030
OK?
00:13:07.030 --> 00:13:16.760
So such that the
probability that your theta
00:13:16.760 --> 00:13:18.350
belongs to this
confidence region,
00:13:18.350 --> 00:13:24.500
given x1 xn is, say,
at least 1 minus alpha.
00:13:24.500 --> 00:13:27.040
Let's just take it
equal to 1 minus alpha.
00:13:27.040 --> 00:13:34.530
OK so that's a confidence
region at level 1 minus alpha.
00:13:34.530 --> 00:13:36.240
OK, so that's one way.
00:13:36.240 --> 00:13:38.770
So why would you actually--
00:13:38.770 --> 00:13:41.390
when I actually implement
Bayesian inference,
00:13:41.390 --> 00:13:44.480
I'm actually spitting out
that entire distribution.
00:13:44.480 --> 00:13:47.540
I need to summarize this thing
to communicate it, right?
00:13:47.540 --> 00:13:49.730
I cannot just say this
is this entire function.
00:13:49.730 --> 00:13:51.230
I want to know where
are the regions
00:13:51.230 --> 00:13:54.344
of high probability, where my
parameter is supposed to be?
00:13:54.344 --> 00:13:56.510
And so here, when I have
this thing, what I actually
00:13:56.510 --> 00:13:58.010
want to have is
something that says,
00:13:58.010 --> 00:14:00.200
well, I want to
summarize this thing
00:14:00.200 --> 00:14:03.680
into some subset of the
real line, in which I'm
00:14:03.680 --> 00:14:08.120
sure that the area under the
curve, here, of my posterior
00:14:08.120 --> 00:14:11.734
is actually 1 minus alpha.
00:14:11.734 --> 00:14:13.400
And there's many ways
to do this, right?
00:14:16.790 --> 00:14:22.450
So one way to do this is
to look at level sets.
00:14:27.870 --> 00:14:29.550
And so rather than
actually-- so let's
00:14:29.550 --> 00:14:32.220
say my posterior
looks like this.
00:14:32.220 --> 00:14:35.760
I know, for example, if I
have a Gaussian distribution,
00:14:35.760 --> 00:14:38.230
I can actually take my posterior
to be-- my posterior is
00:14:38.230 --> 00:14:39.480
actually going to be Gaussian.
00:14:43.060 --> 00:14:50.760
And what I can do is to try
to cut it here on the y-axis
00:14:50.760 --> 00:14:54.910
so that now, the area under
the curve, when I cut here,
00:14:54.910 --> 00:14:59.430
is actually 1 minus alpha.
00:14:59.430 --> 00:15:02.080
OK, so I have some
threshold tau.
00:15:02.080 --> 00:15:05.490
If tau goes to plus
infinity, then I'm
00:15:05.490 --> 00:15:07.380
going to have that this
area under the curve
00:15:07.380 --> 00:15:10.380
here is going to--
00:15:18.012 --> 00:15:19.920
AUDIENCE: [INAUDIBLE]
00:15:19.920 --> 00:15:21.786
PHILIPPE RIGOLLET: Well, no.
00:15:21.786 --> 00:15:23.160
So the area under
the curve, when
00:15:23.160 --> 00:15:24.810
tau is going to
plus infinity, think
00:15:24.810 --> 00:15:27.892
of the case when
tau is just right here.
00:15:27.892 --> 00:15:29.280
AUDIENCE: [INAUDIBLE]
00:15:29.280 --> 00:15:32.150
PHILIPPE RIGOLLET: So this is
actually going to 0, right?
00:15:32.150 --> 00:15:33.530
And so I start here.
00:15:33.530 --> 00:15:36.290
And then I start going down
and down and down and down,
00:15:36.290 --> 00:15:39.440
until I actually get something
whose area is equal to 1 minus
00:15:39.440 --> 00:15:40.160
alpha.
00:15:40.160 --> 00:15:44.000
And if tau is going down to 0,
then my area under the curve
00:15:44.000 --> 00:15:44.750
is going to--
00:15:48.240 --> 00:15:51.604
if tau is here, I'm
cutting nowhere.
00:15:51.604 --> 00:15:52.770
And so I'm getting 1, right?
00:15:56.160 --> 00:15:56.980
Agree?
00:15:56.980 --> 00:16:00.540
Think of, when tau
is very close to 0,
00:16:00.540 --> 00:16:02.876
I'm cutting
very far here.
00:16:02.876 --> 00:16:04.750
And so I'm getting some
area under the curve,
00:16:04.750 --> 00:16:06.000
which is almost everything.
00:16:06.000 --> 00:16:08.100
And so it's going to 1--
as tau goes down to 0.
00:16:08.100 --> 00:16:09.960
Yeah?
00:16:09.960 --> 00:16:12.882
AUDIENCE: Does this only
work for [INAUDIBLE]
00:16:12.882 --> 00:16:14.340
PHILIPPE RIGOLLET:
No, it does not.
00:16:14.340 --> 00:16:17.160
I mean-- so this is a picture.
00:16:17.160 --> 00:16:20.277
So those two things work
for all of them, right?
00:16:20.277 --> 00:16:22.110
But when you have a
bimodal, actually,
00:16:22.110 --> 00:16:23.526
this is actually
when things start
00:16:23.526 --> 00:16:24.990
to become interesting, right?
00:16:24.990 --> 00:16:30.600
So when we built a frequentist
confidence interval,
00:16:30.600 --> 00:16:34.590
it was always of the form x
bar plus or minus something.
00:16:34.590 --> 00:16:36.510
But now, if I start to
have a posterior that
00:16:36.510 --> 00:16:40.230
looks like this, when I'm
going to start cutting off,
00:16:40.230 --> 00:16:41.370
I'm going to have two--
00:16:41.370 --> 00:16:44.550
I mean, my confidence
region is going
00:16:44.550 --> 00:16:47.740
to be the union of
those two things, right?
00:16:47.740 --> 00:16:50.700
And it really reflects
the fact that there
00:16:50.700 --> 00:16:51.820
is this bimodal thing.
00:16:51.820 --> 00:16:53.486
It's going to say,
well, with high probability,
00:16:53.486 --> 00:16:56.840
I'm actually going to
be either here or here.
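The level-set construction described above can be sketched numerically. The following is a minimal Python illustration, not a definitive implementation: the bimodal posterior is a hypothetical equal mixture of two Gaussians, the region is approximated on a finite grid, and all function names are assumptions made for this example. The threshold tau is lowered from the top of the density until the area above it reaches 1 minus alpha.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def bimodal_posterior(theta):
    """Hypothetical posterior: equal mixture of N(-2, 0.5^2) and N(2, 0.5^2)."""
    return 0.5 * normal_pdf(theta, -2.0, 0.5) + 0.5 * normal_pdf(theta, 2.0, 0.5)


def level_set_region(density, lo, hi, alpha=0.05, n_grid=20001):
    """Approximate {theta : density(theta) >= tau} with mass about 1 - alpha.

    Returns (tau, intervals), where intervals is a list of (left, right) pairs.
    """
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = [density(t) for t in grid]
    total = sum(dens) * step  # Riemann-sum normalizing constant on the grid

    # Lower tau from the top: add grid cells in decreasing order of density
    # until the accumulated posterior mass reaches 1 - alpha.
    order = sorted(range(n_grid), key=lambda i: dens[i], reverse=True)
    keep = [False] * n_grid
    mass = 0.0
    tau = dens[order[0]]
    for i in order:
        if mass >= 1.0 - alpha:
            break
        keep[i] = True
        mass += dens[i] * step / total
        tau = dens[i]

    # Merge consecutive kept grid points into intervals.
    intervals = []
    start = None
    for i, kept in enumerate(keep):
        if kept and start is None:
            start = grid[i]
        if not kept and start is not None:
            intervals.append((start, grid[i - 1]))
            start = None
    if start is not None:
        intervals.append((start, grid[-1]))
    return tau, intervals


if __name__ == "__main__":
    tau, region = level_set_region(bimodal_posterior, -6.0, 6.0, alpha=0.05)
    print("threshold tau:", tau)
    print("region (union of intervals):", region)
```

For this bimodal example the region comes out as a union of two disjoint intervals, one around each mode, rather than a single interval.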
00:16:56.840 --> 00:16:59.840
Now, the meaning here of a
Bayesian confidence region
00:16:59.840 --> 00:17:02.570
and the confidence interval are
completely distinct notions,
00:17:02.570 --> 00:17:03.260
right?
00:17:03.260 --> 00:17:06.140
And I'm going to work
out an example with you
00:17:06.140 --> 00:17:08.673
so that we can actually
see that sometimes--
00:17:08.673 --> 00:17:10.089
I mean, both of
them, actually you
00:17:10.089 --> 00:17:11.839
can come up with
some crazy paradoxes.
00:17:11.839 --> 00:17:13.609
So since we don't
have that much time,
00:17:13.609 --> 00:17:17.339
I will actually talk to you
about why, in some instances,
00:17:17.339 --> 00:17:19.819
it's actually a good idea to
think of Bayesian confidence
00:17:19.819 --> 00:17:22.369
intervals rather than
frequentist ones.
00:17:22.369 --> 00:17:25.609
So before we go into
more details about what
00:17:25.609 --> 00:17:27.440
those Bayesian
confidence intervals are,
00:17:27.440 --> 00:17:29.570
let's remind
ourselves what does it
00:17:29.570 --> 00:17:33.110
mean to have a frequentist
confidence interval?
00:17:33.110 --> 00:17:33.610
Right?
00:17:46.460 --> 00:17:46.960
OK.
00:17:46.960 --> 00:17:49.690
So when I have a frequentist
confidence interval,
00:17:49.690 --> 00:17:59.290
let's say something like x bar n
minus 1.96 sigma over root n
00:17:59.290 --> 00:18:06.136
and x bar n plus 1.96
sigma over root n,
00:18:06.136 --> 00:18:07.510
so that's the
confidence interval
00:18:07.510 --> 00:18:10.720
that you get for the
mean of some Gaussian
00:18:10.720 --> 00:18:16.390
with known variance
equal to sigma squared, OK.
00:18:16.390 --> 00:18:18.460
So what we know is that
the meaning of this
00:18:18.460 --> 00:18:20.410
is the probability
that theta belongs
00:18:20.410 --> 00:18:25.870
to this is equal to 95%, right?
00:18:25.870 --> 00:18:27.340
And this, more
generally, you can
00:18:27.340 --> 00:18:29.620
think of being q alpha over 2.
00:18:29.620 --> 00:18:33.040
And what you're going to get
is 1 minus alpha here, OK?
00:18:33.040 --> 00:18:34.280
So what does it mean here?
00:18:34.280 --> 00:18:37.480
Well, it looks very much
like what we have here,
00:18:37.480 --> 00:18:39.970
except that we're not
conditioning on x1 xn.
00:18:39.970 --> 00:18:40.720
And we should not.
00:18:40.720 --> 00:18:43.830
Because there was a question
like that in the midterm--
00:18:43.830 --> 00:18:47.590
if I condition on x1 xn, this
probability is either 0 or 1.
00:18:47.590 --> 00:18:48.610
OK?
00:18:48.610 --> 00:18:50.170
Because once I
condition-- so here,
00:18:50.170 --> 00:18:52.170
this probability, actually,
here is with respect
00:18:52.170 --> 00:18:55.010
to the randomness in x1 xn.
00:18:55.010 --> 00:18:56.040
So if I condition--
00:18:58.860 --> 00:19:04.890
so let's build this thing,
r freq, for frequentist.
00:19:07.830 --> 00:19:11.930
Well, given x1 xn--
00:19:11.930 --> 00:19:13.940
and actually, I don't
need to know x1 xn really.
00:19:13.940 --> 00:19:16.420
What I need to know
is what xn bar is.
00:19:16.420 --> 00:19:18.140
Well, this thing now is what?
00:19:18.140 --> 00:19:22.200
It's 1, if theta is
in r, and it's 0,
00:19:22.200 --> 00:19:27.110
if theta is not in r, right?
00:19:27.110 --> 00:19:28.010
That's all there is.
00:19:28.010 --> 00:19:29.900
This is a deterministic
confidence interval,
00:19:29.900 --> 00:19:32.360
once I condition on x1 xn.
00:19:32.360 --> 00:19:33.270
So I have a number.
00:19:33.270 --> 00:19:35.720
The average is maybe 3.
00:19:35.720 --> 00:19:36.790
And so I get 3.
00:19:36.790 --> 00:19:41.900
Either theta is between 3
minus 0.5 and 3 plus 0.5,
00:19:41.900 --> 00:19:42.840
or it's not.
00:19:42.840 --> 00:19:44.000
And so there's basically--
00:19:44.000 --> 00:19:45.470
I mean, I write
it as probability,
00:19:45.470 --> 00:19:47.303
but it's really not a
probabilistic statement.
00:19:47.303 --> 00:19:49.160
Either it's true or it's not.
00:19:49.160 --> 00:19:50.240
Agreed?
00:19:50.240 --> 00:19:52.580
So what does it mean to have
a frequentist confidence
00:19:52.580 --> 00:19:53.550
interval?
00:19:53.550 --> 00:19:55.270
It means that if I were--
00:19:55.270 --> 00:19:58.660
and here's where the word
frequentist comes from,
00:19:58.660 --> 00:20:02.840
it says that if I repeat this
experiment over and over,
00:20:02.840 --> 00:20:06.700
meaning that on Monday, I
collect a sample of size n,
00:20:06.700 --> 00:20:09.260
and I build a
confidence interval,
00:20:09.260 --> 00:20:12.260
and then on Tuesday, I collect
another sample of size n,
00:20:12.260 --> 00:20:13.890
and I build a
confidence interval,
00:20:13.890 --> 00:20:17.000
and on Wednesday, I do this
again and again, what's going
00:20:17.000 --> 00:20:18.510
to happen is the following.
00:20:18.510 --> 00:20:21.530
I'm going to have my true
theta that lives here.
00:20:21.530 --> 00:20:23.900
And then on Monday, this
is the confidence interval
00:20:23.900 --> 00:20:25.470
that I build.
00:20:25.470 --> 00:20:28.802
OK, so this is the real line.
00:20:28.802 --> 00:20:31.260
The true theta is here, and
this is the confidence interval
00:20:31.260 --> 00:20:32.300
I build on Monday.
00:20:32.300 --> 00:20:32.800
All right?
00:20:32.800 --> 00:20:37.530
So x bar was here, and this
is my confidence interval.
00:20:37.530 --> 00:20:41.540
On Tuesday, I build this
confidence interval maybe.
00:20:41.540 --> 00:20:44.640
x bar was closer to
theta, but smaller.
00:20:44.640 --> 00:20:49.820
But then on Wednesday, I build
this confidence interval.
00:20:49.820 --> 00:20:50.880
I'm not here.
00:20:50.880 --> 00:20:51.920
It's not in there.
00:20:51.920 --> 00:20:53.681
And that's this case.
00:20:53.681 --> 00:20:54.180
Right?
00:20:54.180 --> 00:20:56.100
It happens that it's
just not in there.
00:20:56.100 --> 00:20:57.930
And then on Thursday,
I build another one.
00:20:57.930 --> 00:21:01.300
I almost miss it, but
I'm in there, et cetera.
00:21:01.300 --> 00:21:04.430
Maybe here. Here, I miss again.
00:21:04.430 --> 00:21:07.490
And so what it means to have a
confidence interval-- so what
00:21:07.490 --> 00:21:12.131
does it mean to have a
confidence interval at 95%?
00:21:12.131 --> 00:21:15.610
AUDIENCE: [INAUDIBLE]
00:21:15.610 --> 00:21:18.150
PHILIPPE RIGOLLET: Yeah, so
it means that if I repeat this,
00:21:18.150 --> 00:21:19.800
the frequency of times--
00:21:19.800 --> 00:21:21.720
hence, the word
frequentist-- at which
00:21:21.720 --> 00:21:24.150
I'm actually going
to overlap that,
00:21:24.150 --> 00:21:26.910
I'm actually going to
contain theta, should be 95%.
00:21:26.910 --> 00:21:28.890
That's what frequentist means.
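The repeated-experiment reading of a 95% confidence interval can be checked with a short simulation. This is a minimal Python sketch under stated assumptions: the data are Gaussian with known sigma, the interval is x bar plus or minus 1.96 sigma over root n as above, and the true theta, sample size, number of repeats, and seed are arbitrary values chosen for this example.

```python
import math
import random


def coverage_frequency(theta=3.0, sigma=1.0, n=25, n_repeats=10000, seed=0):
    """Fraction of repeated experiments whose interval contains the true theta.

    Each repeat draws a fresh sample of size n from N(theta, sigma^2) and
    builds the interval x_bar +/- 1.96 * sigma / sqrt(n).
    """
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_repeats):
        x_bar = sum(rng.gauss(theta, sigma) for _ in range(n)) / n
        # For one realized sample this is simply true or false.
        if x_bar - half_width <= theta <= x_bar + half_width:
            hits += 1
    return hits / n_repeats


if __name__ == "__main__":
    print("empirical coverage:", coverage_frequency())
```

Each individual interval either contains theta or does not; only the long-run frequency over many repeats is close to 95%.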
00:21:28.890 --> 00:21:31.740
So it's just a matter
of trusting that.
00:21:31.740 --> 00:21:35.690
So on one given thing, one
given realization of your data,
00:21:35.690 --> 00:21:36.970
it's not telling you anything.
00:21:36.970 --> 00:21:38.460
Either it's there or not.
00:21:38.460 --> 00:21:42.530
So it's not really
something that's actually
00:21:42.530 --> 00:21:46.430
something that assesses the
confidence of your decision,
00:21:46.430 --> 00:21:48.230
such as theta is in there or not.
00:21:48.230 --> 00:21:50.360
It's something that
assesses the confidence
00:21:50.360 --> 00:21:52.410
you have in the method
that you're using.
00:21:52.410 --> 00:21:54.170
If you were to repeat
it over and over again,
00:21:54.170 --> 00:21:56.470
it'd be the same thing.
00:21:56.470 --> 00:21:58.850
It would be 95% of the
time correct, right?
00:21:58.850 --> 00:22:02.570
So for example, we know
that we could build a test.
00:22:02.570 --> 00:22:04.940
So it's pretty clear
that you can actually
00:22:04.940 --> 00:22:09.020
build a test for whether
theta is equal to theta naught
00:22:09.020 --> 00:22:10.705
or not equal to
theta naught, by just
00:22:10.705 --> 00:22:13.080
checking whether theta naught
is in a confidence interval
00:22:13.080 --> 00:22:13.780
or not.
00:22:13.780 --> 00:22:15.530
And what it means is
that, if you actually
00:22:15.530 --> 00:22:21.170
are doing those tests at 5%,
that means that 5% of the time,
00:22:21.170 --> 00:22:23.440
if you do this over and
again, 5% of the time
00:22:23.440 --> 00:22:24.610
you're going to be wrong.
00:22:24.610 --> 00:22:27.640
I mentioned my wife
does market research.
00:22:27.640 --> 00:22:31.930
And she does maybe, I don't
know, 100,000 tests a year.
00:22:31.930 --> 00:22:34.210
And if they do
all of them at 1%,
00:22:34.210 --> 00:22:37.550
then it means that 1% of the
time, which is a lot of time,
00:22:37.550 --> 00:22:38.050
right?
00:22:38.050 --> 00:22:40.840
When you do 100,000 a
year, it's 1,000 of them
00:22:40.840 --> 00:22:41.755
are actually wrong.
00:22:41.755 --> 00:22:44.611
OK, I mean, she's
actually hedging
00:22:44.611 --> 00:22:47.110
against the fact that 1% of
them are going to be wrong.
00:22:47.110 --> 00:22:49.109
That's 1,000 of them that
are going to be wrong.
00:22:49.109 --> 00:22:52.890
Just like, if you do this
100,000 times at 95%,
00:22:52.890 --> 00:22:54.910
5,000 of those guys
are actually not going
00:22:54.910 --> 00:22:56.360
to be the correct ones.
00:22:56.360 --> 00:22:56.860
OK?
00:22:56.860 --> 00:22:58.600
So I mean, it's kind of scary.
00:22:58.600 --> 00:23:01.300
But that's the way it is.
00:23:01.300 --> 00:23:03.730
So that's what the frequentist
interpretation of this is.
00:23:03.730 --> 00:23:07.720
Now, as I mentioned, when we
started this Bayesian chapter,
00:23:07.720 --> 00:23:10.930
I said, Bayesian
statistics converge to--
00:23:10.930 --> 00:23:14.800
I mean, Bayesian decisions
and Bayesian methods converge
00:23:14.800 --> 00:23:16.510
to frequentist methods.
00:23:16.510 --> 00:23:18.590
When the sample size
is large enough,
00:23:18.590 --> 00:23:20.610
they lead to the same decisions.
00:23:20.610 --> 00:23:22.930
And in general, they
need not be the same,
00:23:22.930 --> 00:23:24.970
but they tend to
actually, when the sample
00:23:24.970 --> 00:23:27.830
size is large enough, to
have the same behavior.
00:23:27.830 --> 00:23:30.850
Think about, for
example, the posterior
00:23:30.850 --> 00:23:34.450
that you have
in the Gaussian case, right?
00:23:34.450 --> 00:23:36.420
We said that, in
the Gaussian case,
00:23:36.420 --> 00:23:38.020
what you're going
to see is that it's
00:23:38.020 --> 00:23:40.240
as if you had an extra
observation which
00:23:40.240 --> 00:23:43.230
was essentially
given by your prior.
00:23:43.230 --> 00:23:44.570
OK?
00:23:44.570 --> 00:23:50.830
And now, what's going to happen
is that, when this is just one
00:23:50.830 --> 00:23:53.470
observation among n
plus 1, it's really
00:23:53.470 --> 00:23:55.720
going to be totally
drowned, and you
00:23:55.720 --> 00:23:58.390
won't see it when the
sample size grows larger.
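The way the prior washes out in the Gaussian case can be sketched with the standard conjugate update. This is a minimal Python illustration, assuming observations from N(theta, sigma^2) with known sigma and a Gaussian prior on theta; the function names and the numbers in the usage example are hypothetical. When the prior standard deviation equals sigma, the prior acts exactly like one extra observation located at the prior mean.

```python
import math


def gaussian_posterior(x_bar, n, sigma, prior_mean, prior_sd):
    """Posterior mean and sd of theta for X_i ~ N(theta, sigma^2), i = 1..n,
    with prior theta ~ N(prior_mean, prior_sd^2)."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = n / sigma ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * x_bar)
    return post_mean, math.sqrt(post_var)


def credible_interval(post_mean, post_sd, z=1.96):
    """Central interval holding about 95% of a Gaussian posterior (z = 1.96)."""
    return post_mean - z * post_sd, post_mean + z * post_sd


if __name__ == "__main__":
    # Prior centered at 0, sample mean 3: the prior's pull fades as n grows.
    for n in (1, 10, 100, 10000):
        mean, sd = gaussian_posterior(x_bar=3.0, n=n, sigma=1.0,
                                      prior_mean=0.0, prior_sd=1.0)
        print(n, mean, credible_interval(mean, sd))
```

With prior_sd equal to sigma, the posterior mean is the average of n + 1 points (the n observations plus the prior mean) and the posterior variance is sigma squared over n + 1, so for large n the interval is nearly the frequentist one.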
00:23:58.390 --> 00:24:00.400
So Bayesian methods are
particularly useful when
00:24:00.400 --> 00:24:02.190
you have a small sample size.
00:24:02.190 --> 00:24:05.680
And when you have a small sample
size, the effect of the prior
00:24:05.680 --> 00:24:06.980
is going to be bigger.
00:24:06.980 --> 00:24:08.950
But most importantly,
you're not going
00:24:08.950 --> 00:24:10.810
to have to repeat this
thing over and again.
00:24:10.810 --> 00:24:11.830
And you're going
to have a meaning.
00:24:11.830 --> 00:24:13.180
You're going to have
to have something
00:24:13.180 --> 00:24:15.138
that has a meaning for
this particular data set
00:24:15.138 --> 00:24:16.150
that you have.
00:24:16.150 --> 00:24:19.900
When I said that the probability
that theta belongs to r--
00:24:19.900 --> 00:24:22.810
and here, I'm going to specify
the fact that it's a Bayesian
00:24:22.810 --> 00:24:24.740
confidence region,
like this one--
00:24:24.740 --> 00:24:27.490
this is actually
conditionally on the data
00:24:27.490 --> 00:24:29.490
that you've collected.
00:24:29.490 --> 00:24:32.110
It says, given this data, given
the points that you have--
00:24:32.110 --> 00:24:34.540
just put in some numbers,
if you want, in there--
00:24:34.540 --> 00:24:36.460
it's actually telling
you the probability
00:24:36.460 --> 00:24:39.430
that theta belongs to
this Bayesian thing,
00:24:39.430 --> 00:24:41.750
to this Bayesian
confidence region.
00:24:41.750 --> 00:24:44.230
Here, since I have
conditioned on x1 xn,
00:24:44.230 --> 00:24:46.840
this probability is really
just with respect to theta
00:24:46.840 --> 00:24:51.660
drawn from the prior, right?
00:24:51.660 --> 00:24:54.150
And so now, it has a
slightly different meaning.
00:24:54.150 --> 00:24:57.170
It's just telling
you that when--
00:24:57.170 --> 00:24:59.570
it's really making a
statement about where
00:24:59.570 --> 00:25:03.870
the regions of high probability
of your posterior are.
00:25:03.870 --> 00:25:05.050
Now, why is that useful?
00:25:05.050 --> 00:25:11.600
Well, there's actually
an interesting story that
00:25:11.600 --> 00:25:13.980
goes behind Bayesian methods.
00:25:13.980 --> 00:25:17.240
Anybody knows the story of
the USS I think it's Scorpion?
00:25:17.240 --> 00:25:18.610
Do you know the story?
00:25:18.610 --> 00:25:22.770
So that was an American
vessel that disappeared.
00:25:22.770 --> 00:25:25.490
I think it was close to
Bermuda or something.
00:25:25.490 --> 00:25:28.790
But you can tell the story
of the Malaysian Airlines,
00:25:28.790 --> 00:25:31.640
except that I don't think
it's such a successful story.
00:25:31.640 --> 00:25:33.770
But the idea was
essentially, we're
00:25:33.770 --> 00:25:36.050
trying to find where
this thing happened.
00:25:36.050 --> 00:25:39.800
And of course, this
is a one-time thing.
00:25:39.800 --> 00:25:41.686
You need something
that works once.
00:25:41.686 --> 00:25:44.060
You need something that works
for this particular vessel.
00:25:44.060 --> 00:25:46.601
And you don't care, if you go
to the Navy, and you tell them,
00:25:46.601 --> 00:25:48.320
well, here's a method.
00:25:48.320 --> 00:25:51.730
And for 95 out of 100 vessels
that you're going to lose,
00:25:51.730 --> 00:25:53.350
we're going to be
able to find it.
00:25:53.350 --> 00:25:57.230
And they want this to work
for this particular one.
00:25:57.230 --> 00:25:59.750
And so they were
looking, and they were
00:25:59.750 --> 00:26:02.200
diving in different places.
00:26:02.200 --> 00:26:04.710
And suddenly, they
brought in this guy.
00:26:04.710 --> 00:26:05.460
I forget his name.
00:26:05.460 --> 00:26:08.960
I mean, there's a whole story
about this on Wikipedia.
00:26:08.960 --> 00:26:10.612
And he started
collecting the data
00:26:10.612 --> 00:26:13.070
that they had from different
dives and maybe from currents.
00:26:13.070 --> 00:26:14.569
And he started to
put everything in.
00:26:14.569 --> 00:26:17.540
And he said, OK, what is
the posterior distribution
00:26:17.540 --> 00:26:21.140
of the location of the
vessel, given all the things
00:26:21.140 --> 00:26:22.340
that I've seen?
00:26:22.340 --> 00:26:23.390
And what have you seen?
00:26:23.390 --> 00:26:25.280
Well, you've seen that it's
not here, it's not there,
00:26:25.280 --> 00:26:26.071
and it's not there.
00:26:26.071 --> 00:26:29.360
And you've also seen that the
currents were going that way,
00:26:29.360 --> 00:26:30.786
and the winds were
going that way.
00:26:30.786 --> 00:26:32.660
And you can actually
put some modeling traits
00:26:32.660 --> 00:26:33.890
to understand this.
00:26:33.890 --> 00:26:37.940
Now, given this, for this
particular data that you have,
00:26:37.940 --> 00:26:41.420
you can actually think of having
a two-dimensional density that
00:26:41.420 --> 00:26:44.650
tells you where it's more
likely that the vessel is.
00:26:44.650 --> 00:26:46.400
And where are you going
to be looking for?
00:26:46.400 --> 00:26:48.097
Well, if it's a
multimodal distribution,
00:26:48.097 --> 00:26:50.180
you're just going to go
to the highest mode first,
00:26:50.180 --> 00:26:52.190
because that's where it's
the most likely to be.
00:26:52.190 --> 00:26:53.600
And maybe it's not
there, so you're just
00:26:53.600 --> 00:26:55.250
going to update your
posterior, based on the fact
00:26:55.250 --> 00:26:56.791
that it's not there,
and do it again.
00:26:56.791 --> 00:26:59.270
And actually, after
two dives, I think,
00:26:59.270 --> 00:27:01.010
he actually found the thing.
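The "search the highest mode, then update because it's not there" loop can be sketched as follows. This is only an illustration under stated assumptions: the sea floor is cut into a few named cells, and `detect_prob` (the chance a dive spots the vessel when it really is in the searched cell) is a hypothetical parameter, not something given in the lecture.

```python
def update_after_miss(probs, searched, detect_prob=0.8):
    """Bayes update of location probabilities after an unsuccessful dive.

    probs: dict mapping cell -> current probability the vessel is there.
    searched: the cell that was just searched without finding anything.
    detect_prob: assumed chance of finding the vessel if it is in that cell
                 (must be < 1 if the searched cell holds all the mass).
    """
    # Likelihood of "not found" given the vessel is in each cell.
    miss = {c: (1.0 - detect_prob) if c == searched else 1.0 for c in probs}
    unnorm = {c: probs[c] * miss[c] for c in probs}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


def most_likely(probs):
    """Cell with the highest current probability: the next place to dive."""
    return max(probs, key=probs.get)


if __name__ == "__main__":
    probs = {"A": 0.5, "B": 0.3, "C": 0.2}
    first = most_likely(probs)               # dive at the highest mode first
    probs = update_after_miss(probs, first)  # nothing found there
    print(first, probs, most_likely(probs))
```

After a miss, mass moves away from the searched cell and the remaining cells are renormalized, which is what decides where the next dive goes.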
00:27:01.010 --> 00:27:03.122
And that's exactly where
Bayesian statistics
00:27:03.122 --> 00:27:03.830
start to kick in.
00:27:03.830 --> 00:27:08.570
Because you put a lot of
knowledge into your model,
00:27:08.570 --> 00:27:11.340
but you also can actually factor
in a bunch of information,
00:27:11.340 --> 00:27:11.840
right?
00:27:11.840 --> 00:27:13.460
The model, he had
to build a model
00:27:13.460 --> 00:27:17.360
that was actually taking into
account wind and currents.
00:27:17.360 --> 00:27:20.780
And what you can have
as a guarantee is that,
00:27:20.780 --> 00:27:22.610
when you talk about
the probability
00:27:22.610 --> 00:27:27.346
that this vessel is
in this location,
00:27:27.346 --> 00:27:28.970
given what you've
observed in the past,
00:27:28.970 --> 00:27:30.140
it actually has some sense.
00:27:30.140 --> 00:27:34.610
Whereas, if you were to
use a frequentist approach,
00:27:34.610 --> 00:27:35.810
then there's no probability.
00:27:35.810 --> 00:27:38.660
Either it's underneath this
position or it's not, right?
00:27:38.660 --> 00:27:41.520
So that's actually where
it starts to make sense.
00:27:41.520 --> 00:27:43.370
And so you can
actually build this.
00:27:43.370 --> 00:27:44.930
And there's actually
a lot of methods
00:27:44.930 --> 00:27:47.300
that are based on,
for search, that
00:27:47.300 --> 00:27:48.979
are based on Bayesian methods.
00:27:48.979 --> 00:27:50.520
I think, for example,
the Higgs boson
00:27:50.520 --> 00:27:51.920
was based on a lot
of Bayesian methods,
00:27:51.920 --> 00:27:54.050
because this is something
you need to find [INAUDIBLE],
00:27:54.050 --> 00:27:54.549
right?
00:27:54.549 --> 00:27:57.330
I mean, there was a lot of
prior that has to be built in.
00:27:57.330 --> 00:27:57.830
OK.
00:27:57.830 --> 00:27:59.621
So now, you build this
confidence interval.
00:27:59.621 --> 00:28:02.300
And the nicest way to do
it is to use level sets.
00:28:02.300 --> 00:28:05.210
But again, just like for
Gaussians, I mean, if I had,
00:28:05.210 --> 00:28:12.290
even in the Gaussian
case, I decided
00:28:12.290 --> 00:28:16.110
to go at x bar plus
or minus something,
00:28:16.110 --> 00:28:19.500
but I could go at something
that's completely asymmetric.
00:28:19.500 --> 00:28:21.467
So what's happening is
that here, this method
00:28:21.467 --> 00:28:23.550
guarantees that you're
going to have the narrowest
00:28:23.550 --> 00:28:24.800
possible confidence intervals.
00:28:24.800 --> 00:28:27.480
That's essentially what
it's telling you, OK?
00:28:27.480 --> 00:28:31.890
Because every time I'm choosing
a point, starting from here,
00:28:31.890 --> 00:28:36.170
I'm actually putting as much
area under the curve as I can.
00:28:36.170 --> 00:28:38.660
All right.
00:28:38.660 --> 00:28:41.737
So those are called Bayesian
confidence [? interval. ?]
00:28:41.737 --> 00:28:43.320
Oh yeah, and I
promised you that we're
00:28:43.320 --> 00:28:46.500
going to work on some
example that actually
00:28:46.500 --> 00:28:50.940
gives a meaning to what I just
told you, with actual numbers.
00:28:50.940 --> 00:28:56.790
So this is something that's
taken from Wasserman's book.
00:28:56.790 --> 00:29:01.140
And also, it's
coming from a paper,
00:29:01.140 --> 00:29:03.780
from a stats paper,
from [? Wolpert ?] and I
00:29:03.780 --> 00:29:05.760
don't know who, from the '80s.
00:29:05.760 --> 00:29:07.760
And essentially,
this is how it works.
00:29:07.760 --> 00:29:10.680
So assume that you have
n equals 2 observations.
00:29:14.320 --> 00:29:18.780
And you have y1, so those
observations are y1--
00:29:18.780 --> 00:29:20.680
no, sorry, let's
call them x1, which
00:29:20.680 --> 00:29:26.000
is theta, plus epsilon 1 and x2,
which is theta plus epsilon 2,
00:29:26.000 --> 00:29:31.060
where epsilon 1 and
epsilon 2 are iid.
00:29:31.060 --> 00:29:33.280
And the probability
that epsilon i is equal
00:29:33.280 --> 00:29:35.110
to plus 1 is equal
to the probability
00:29:35.110 --> 00:29:38.440
that epsilon i is equal to
minus 1 is equal to 1/2.
00:29:38.440 --> 00:29:44.550
OK, so it's just the uniform
sign plus minus 1, OK?
00:29:44.550 --> 00:29:46.590
Now, let's think
about so you're trying
00:29:46.590 --> 00:29:47.970
to do some inference on theta.
00:29:47.970 --> 00:29:50.261
Maybe you actually want to
find some inference on theta
00:29:50.261 --> 00:29:51.825
that's actually based on--
00:29:51.825 --> 00:29:55.660
and that's based only
on the x1 and x2.
00:29:55.660 --> 00:29:56.430
OK?
00:29:56.430 --> 00:29:58.750
So I'm going to actually
build a confidence interval.
00:29:58.750 --> 00:30:01.110
But what I really
want to build is a--
00:30:03.594 --> 00:30:05.010
but let's start
thinking about how
00:30:05.010 --> 00:30:07.780
I would find an estimator
for those two things.
00:30:07.780 --> 00:30:09.970
Well, what values am I
going to be getting, right?
00:30:09.970 --> 00:30:13.750
So I'm going to get either
theta plus 1 or theta minus 1.
00:30:13.750 --> 00:30:15.610
And actually, I can
get basically four
00:30:15.610 --> 00:30:19.260
different observations, right?
00:30:19.260 --> 00:30:21.516
Sorry, four different
pairs of observations--
00:30:30.760 --> 00:30:32.410
each one is theta plus 1 or theta minus 1.
00:30:32.410 --> 00:30:33.170
Agreed?
00:30:33.170 --> 00:30:37.340
Those are the four possible
observations that I can get.
00:30:37.340 --> 00:30:38.970
Agreed?
00:30:38.970 --> 00:30:42.924
Either they're both equal to
plus 1, both equal to minus 1,
00:30:42.924 --> 00:30:44.340
or one of the two
is equal to plus
00:30:44.340 --> 00:30:46.950
1, the other one to
minus 1, for the epsilons.
00:30:46.950 --> 00:30:47.580
OK.
00:30:47.580 --> 00:30:49.730
So those are the four
observations I can get.
00:30:49.730 --> 00:30:56.010
So in particular, if
they take the same value,
00:30:56.010 --> 00:30:59.390
then you know it's either
theta plus 1 or theta minus 1,
00:30:59.390 --> 00:31:02.100
and if they take a different
value, I know one of them
00:31:02.100 --> 00:31:04.555
is theta plus 1, and one
is actually theta minus 1.
00:31:04.555 --> 00:31:07.180
So in particular, if I take the
average of those two guys, when
00:31:07.180 --> 00:31:09.138
they take different
values, I know I'm actually
00:31:09.138 --> 00:31:10.850
getting theta right.
00:31:10.850 --> 00:31:14.441
So let's build a
confidence region.
00:31:14.441 --> 00:31:16.940
OK, so I'm actually going to
take a confidence region, which
00:31:16.940 --> 00:31:18.810
is just a singleton.
00:31:21.662 --> 00:31:23.120
And I'm going to
say the following.
00:31:23.120 --> 00:31:32.460
Well, if x1 is equal to x2, I'm
just going to take x1 minus 1,
00:31:32.460 --> 00:31:33.320
OK?
00:31:33.320 --> 00:31:34.790
So I'm just saying,
well, I'm never
00:31:34.790 --> 00:31:37.310
going to be able to resolve
whether it's plus 1 or minus 1
00:31:37.310 --> 00:31:38.864
that actually gives
me the best one,
00:31:38.864 --> 00:31:41.030
so I'm just going to take
a dive and say, well, it's
00:31:41.030 --> 00:31:42.594
just plus 1.
00:31:42.594 --> 00:31:44.860
OK?
00:31:44.860 --> 00:31:47.710
And then, if they're
different, then here,
00:31:47.710 --> 00:31:50.830
I can do much better.
00:31:50.830 --> 00:31:52.929
I'm going to actually
just take the average.
00:31:56.282 --> 00:31:58.200
OK?
00:31:58.200 --> 00:32:08.360
Now, what I claim is that
this is a confidence region--
00:32:08.360 --> 00:32:10.370
and by default, when
I don't mention it,
00:32:10.370 --> 00:32:16.190
this is a frequentist
confidence region--
00:32:16.190 --> 00:32:18.740
at level 75%.
00:32:21.050 --> 00:32:21.550
OK?
00:32:21.550 --> 00:32:23.100
So let's just check that.
00:32:23.100 --> 00:32:24.685
To check that this
is correct, I need
00:32:24.685 --> 00:32:27.460
to check that the probability
under the realization of x1
00:32:27.460 --> 00:32:30.940
and x2, that theta belongs,
is one of those two guys,
00:32:30.940 --> 00:32:33.291
is actually equal to 0.75.
00:32:33.291 --> 00:32:33.790
Yes?
00:32:33.790 --> 00:32:36.529
AUDIENCE: What are
the [INAUDIBLE]
00:32:36.529 --> 00:32:39.070
PHILIPPE RIGOLLET: Well, it's
just the frequentist confidence
00:32:39.070 --> 00:32:41.842
interval that does not
need to be an interval.
00:32:41.842 --> 00:32:44.050
Actually, in this case, it's
going to be an interval.
00:32:44.050 --> 00:32:46.602
But that's just what it means.
00:32:46.602 --> 00:32:50.055
Yeah, region for Bayesian
was just because--
00:32:50.055 --> 00:32:51.430
I mean, the
confidence intervals,
00:32:51.430 --> 00:32:53.320
when we're frequentist,
we tend to make them
00:32:53.320 --> 00:32:54.606
intervals, because we want--
00:32:54.606 --> 00:32:56.980
but when you're Bayesian, and
you're doing this level set
00:32:56.980 --> 00:32:58.180
thing, you cannot
really guarantee,
00:32:58.180 --> 00:33:00.460
unless its [INAUDIBLE] is
going to be an interval.
00:33:00.460 --> 00:33:02.720
So region is just a way to
not have to say interval,
00:33:02.720 --> 00:33:03.430
in case it's not.
00:33:06.080 --> 00:33:06.640
OK.
00:33:06.640 --> 00:33:08.490
So I have this thing.
00:33:08.490 --> 00:33:11.440
So what I need to check is
the probability that theta
00:33:11.440 --> 00:33:13.000
is in one of those
two things, right?
00:33:13.000 --> 00:33:16.060
So what I need to find is
the probability that theta
00:33:16.060 --> 00:33:24.220
is in [INAUDIBLE] Well, x1 minus
1 and x1 is not equal to x2.
00:33:24.220 --> 00:33:26.840
And those are disjoint events,
so it's plus the probability
00:33:26.840 --> 00:33:35.980
that theta is in x1
plus x2 over 2 and x1--
00:33:35.980 --> 00:33:37.580
sorry, that's equal.
00:33:37.580 --> 00:33:39.700
That's different.
00:33:39.700 --> 00:33:40.200
OK.
00:33:40.200 --> 00:33:42.780
And OK, just before we actually
finish the computation,
00:33:42.780 --> 00:33:44.730
why do I have 75%?
00:33:44.730 --> 00:33:46.920
75% is 3/4.
00:33:46.920 --> 00:33:48.930
So it means that
we have four cases.
00:33:48.930 --> 00:33:52.020
And essentially, I did
not account for one case.
00:33:52.020 --> 00:33:52.650
And it's true.
00:33:52.650 --> 00:33:56.040
I did not account
for this case, when
00:33:56.040 --> 00:34:01.060
both of the epsilon
i's are equal to minus 1.
00:34:01.060 --> 00:34:01.560
Right?
00:34:01.560 --> 00:34:03.393
So this is essentially
the one I'm not going
00:34:03.393 --> 00:34:04.620
to be able to account for.
00:34:04.620 --> 00:34:06.040
And so we'll see
that in a second.
00:34:06.040 --> 00:34:09.310
So in this case, we know
that everything goes great.
00:34:09.310 --> 00:34:09.810
Right?
00:34:09.810 --> 00:34:11.080
So in this case, this is--
00:34:11.080 --> 00:34:11.580
OK.
00:34:11.580 --> 00:34:13.831
Well, let's just start
from the first line.
00:34:13.831 --> 00:34:15.330
So the first line
is the probability
00:34:15.330 --> 00:34:20.290
that theta is equal to x1 minus
1 and those two are equal.
00:34:20.290 --> 00:34:28.440
So this is the probability
that theta is equal to--
00:34:28.440 --> 00:34:36.260
well, this is theta
plus epsilon 1 minus 1.
00:34:36.260 --> 00:34:43.409
And epsilon 1 is equal
to epsilon 2, right?
00:34:43.409 --> 00:34:45.290
Because I can remove
the theta from here,
00:34:45.290 --> 00:34:47.780
and I can actually remove
the theta from here,
00:34:47.780 --> 00:34:50.765
so that this guy here is
just epsilon 1 is equal to 1.
00:34:50.765 --> 00:34:52.407
So when I intersect
with this guy,
00:34:52.407 --> 00:34:54.740
it's actually the same thing
as epsilon 1 is equal to 1,
00:34:54.740 --> 00:34:56.530
as well--
00:34:56.530 --> 00:34:59.780
epsilon 2 is equal
to 1, as well, OK?
00:34:59.780 --> 00:35:05.240
So this first thing is actually
equal to the probability
00:35:05.240 --> 00:35:10.780
that epsilon 1 is equal to 1
and epsilon 2 is equal to 1,
00:35:10.780 --> 00:35:14.180
which is equal to what?
00:35:14.180 --> 00:35:15.570
AUDIENCE: [INAUDIBLE]
00:35:15.570 --> 00:35:17.070
PHILIPPE RIGOLLET:
Yeah, 1/4, right?
00:35:17.070 --> 00:35:19.870
So that's just the
first case over there.
00:35:19.870 --> 00:35:21.020
They're independent.
00:35:21.020 --> 00:35:23.420
Now, I still need to
do the second one.
00:35:23.420 --> 00:35:24.650
So this case is what?
00:35:24.650 --> 00:35:28.890
Well, when those things are
equal, x1 plus x2 over 2
00:35:28.890 --> 00:35:29.390
is what?
00:35:29.390 --> 00:35:31.920
Well, I get theta
plus theta over 2.
00:35:31.920 --> 00:35:33.800
So that's just equal
to the probability
00:35:33.800 --> 00:35:39.620
that epsilon 1 plus epsilon
2 over 2 is equal to 0
00:35:39.620 --> 00:35:43.600
and epsilon 1 is
different from epsilon 2.
00:35:43.600 --> 00:35:44.100
Agreed?
00:35:46.860 --> 00:35:49.797
I just removed the thetas from
these equations, because I can.
00:35:49.797 --> 00:35:51.380
They're just on both
sides every time.
00:35:54.810 --> 00:35:55.310
OK.
00:35:55.310 --> 00:35:56.482
And so that means what?
00:35:56.482 --> 00:35:58.440
That means that the second
part-- so this thing
00:35:58.440 --> 00:36:02.120
is actually equal to
1/4 plus the probability
00:36:02.120 --> 00:36:05.350
that epsilon 1 plus epsilon
2 over 2 is equal to 0.
00:36:05.350 --> 00:36:06.544
I can remove the 2.
00:36:06.544 --> 00:36:08.460
So this is just the
probability that one is 1,
00:36:08.460 --> 00:36:10.560
and the other one
is minus 1, right?
00:36:10.560 --> 00:36:12.510
So that's equal
to the probability
00:36:12.510 --> 00:36:17.820
that epsilon 1 is equal to 1 and
epsilon 2 is equal to minus 1
00:36:17.820 --> 00:36:21.360
plus the probability that
epsilon 1 is equal to minus 1
00:36:21.360 --> 00:36:24.447
and epsilon 2 is
equal to plus 1, OK?
00:36:24.447 --> 00:36:25.780
Because they're disjoint events.
00:36:25.780 --> 00:36:28.080
So I can break them
into the sum of the two.
00:36:28.080 --> 00:36:32.310
And each of those guys is also
one of the atomic parts of it.
00:36:32.310 --> 00:36:33.960
It's one of the basic things.
00:36:33.960 --> 00:36:36.011
And so each of those
guys has probability 1/4.
00:36:36.011 --> 00:36:38.010
And so here, we can really
see that we accounted
00:36:38.010 --> 00:36:41.910
for everything, except for the
case when epsilon 1 was equal
00:36:41.910 --> 00:36:44.730
to minus 1, and epsilon
2 was equal to minus 1.
00:36:44.730 --> 00:36:45.570
So this is 1/4.
00:36:45.570 --> 00:36:46.380
This is 1/4.
00:36:46.380 --> 00:36:49.850
So the whole thing
is equal to 3/4.
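The 3/4 can also be checked by brute force over the four equally likely sign pairs. A minimal sketch, assuming an integer theta so that exact fractions can be used; `region` and `coverage` are illustrative names.

```python
from fractions import Fraction
from itertools import product


def region(x1, x2):
    """The singleton confidence region: x1 - 1 if equal, else the average."""
    if x1 == x2:
        return x1 - 1
    return Fraction(x1 + x2, 2)


def coverage(theta):
    """Probability, over the four sign pairs, that the region hits theta."""
    hits = 0
    for e1, e2 in product((+1, -1), repeat=2):
        x1, x2 = theta + e1, theta + e2
        if region(x1, x2) == theta:
            hits += 1
    return Fraction(hits, 4)  # each sign pair has probability 1/4


if __name__ == "__main__":
    print(coverage(6))
```

The only pair that misses is the one where both epsilons are minus 1, whatever value of theta is plugged in.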
00:36:49.850 --> 00:36:56.060
So now, what we have is that
the probability that epsilon 1
00:36:56.060 --> 00:36:57.350
is in--
00:36:57.350 --> 00:37:03.230
so the probability that theta
belongs to this confidence
00:37:03.230 --> 00:37:06.280
region is equal to 3/4.
00:37:06.280 --> 00:37:07.990
And that's very nice.
00:37:07.990 --> 00:37:09.740
But the thing is some
people are sort of--
00:37:09.740 --> 00:37:12.650
I mean, it's not super nice
to be able to see this,
00:37:12.650 --> 00:37:17.510
because, in a way, I know that,
if I observe x1 and x2 that
00:37:17.510 --> 00:37:24.050
are different, I know
for sure that theta,
00:37:24.050 --> 00:37:25.882
that I actually got
the right theta, right?
00:37:25.882 --> 00:37:27.590
That this confidence
interval is actually
00:37:27.590 --> 00:37:31.370
happening with probability 1.
00:37:31.370 --> 00:37:34.700
And the problem is
that I do not know--
00:37:34.700 --> 00:37:37.640
I cannot make this precise
with the notion of frequentist
00:37:37.640 --> 00:37:39.230
confidence intervals.
00:37:39.230 --> 00:37:39.730
OK?
00:37:39.730 --> 00:37:41.396
Because frequentist
confidence intervals
00:37:41.396 --> 00:37:43.810
have to account for the
fact that, in the future,
00:37:43.810 --> 00:37:47.810
it might not be the case
that x1 and x2 are different.
00:37:47.810 --> 00:37:53.360
So Bayesian confidence
regions, by definition--
00:37:53.360 --> 00:37:54.530
well, they're all gone--
00:37:54.530 --> 00:37:57.387
but they are conditioned
on the data that I have.
00:37:57.387 --> 00:37:58.470
And so that's what I want.
00:37:58.470 --> 00:38:00.800
I want to be able to make
this statement conditionally
00:38:00.800 --> 00:38:02.640
on the data that I have.
00:38:02.640 --> 00:38:03.140
OK.
00:38:03.140 --> 00:38:06.450
So if I want to be able
to make this statement,
00:38:06.450 --> 00:38:08.450
if I want to build a
Bayesian confidence region,
00:38:08.450 --> 00:38:10.520
I'm going to have to
put a prior on theta.
00:38:10.520 --> 00:38:12.050
So without loss of generality--
00:38:12.050 --> 00:38:16.520
I mean, maybe with--
but let's assume
00:38:16.520 --> 00:38:25.980
that pi is a prior on theta.
00:38:25.980 --> 00:38:31.540
And let's assume that pi
of j is strictly positive
00:38:31.540 --> 00:38:35.920
for all integers
j equal, say, 0--
00:38:35.920 --> 00:38:42.770
well, actually, for all j in the
integers, positive or negative.
00:38:42.770 --> 00:38:43.270
OK.
00:38:43.270 --> 00:38:46.870
So that's a pretty weak
assumption on my prior.
00:38:46.870 --> 00:38:52.901
I'm just assuming that
theta is some integer.
00:38:52.901 --> 00:38:57.290
And now, let's build our
Bayesian confidence region.
00:38:57.290 --> 00:38:59.540
Well, if I want to build a
Bayesian confidence region,
00:38:59.540 --> 00:39:01.520
I need to understand what
my posterior is going to be.
00:39:01.520 --> 00:39:02.089
OK?
00:39:02.089 --> 00:39:04.630
And if I want to understand what
my posterior is going to be,
00:39:04.630 --> 00:39:11.530
I actually need to build
a likelihood, right?
00:39:11.530 --> 00:39:16.370
So we know that it's the
product of the likelihood
00:39:16.370 --> 00:39:20.740
and of the prior divided by--
00:39:20.740 --> 00:39:21.240
OK.
00:39:31.140 --> 00:39:32.850
So what is my likelihood?
00:39:32.850 --> 00:39:35.540
So my likelihood
is the probability
00:39:35.540 --> 00:39:40.580
of x1 x2, given theta.
00:39:40.580 --> 00:39:41.240
Right?
00:39:41.240 --> 00:39:45.010
That's what the
likelihood should be.
00:39:45.010 --> 00:39:49.840
And now let's say
that actually, just
00:39:49.840 --> 00:39:51.910
to make things a
little simpler, let
00:39:51.910 --> 00:40:07.230
us assume that x1 is
equal to, I don't know, 5,
00:40:07.230 --> 00:40:11.180
and x2 is equal to 7.
00:40:11.180 --> 00:40:12.540
OK?
00:40:12.540 --> 00:40:16.350
So I'm not going to take the
case where they're actually
00:40:16.350 --> 00:40:19.180
equal to each other, because
I know that, in this case,
00:40:19.180 --> 00:40:20.550
x1 and x2 are different.
00:40:20.550 --> 00:40:23.970
I know I'm going to actually
nail exactly what theta is,
00:40:23.970 --> 00:40:26.540
by looking at the average
of those guys, right?
00:40:26.540 --> 00:40:30.630
Here, it must be that
theta is equal to 6.
00:40:30.630 --> 00:40:34.491
So what I want is to compute
the likelihood at 5 and 7, OK?
00:40:38.419 --> 00:40:42.350
And what is this likelihood?
00:40:42.350 --> 00:40:53.950
Well, if theta is
equal to 6, that's
00:40:53.950 --> 00:41:00.010
just the probability that I
will observe 5 and 7, right?
00:41:00.010 --> 00:41:01.910
So what is the probability
I observe 5 and 7?
00:41:04.610 --> 00:41:05.510
Yeah?
00:41:05.510 --> 00:41:06.672
1?
00:41:06.672 --> 00:41:08.499
AUDIENCE: 1/4.
00:41:08.499 --> 00:41:10.040
PHILIPPE RIGOLLET:
That's 1/4, right?
00:41:10.040 --> 00:41:15.260
As the probability, I have
minus 1 for the first epsilon 1,
00:41:15.260 --> 00:41:15.760
right?
00:41:15.760 --> 00:41:17.260
So this is if theta is 6.
00:41:17.260 --> 00:41:23.080
This is the probability that
epsilon 1 is equal to minus 1,
00:41:23.080 --> 00:41:28.790
and epsilon 2 is equal to
plus 1, which is equal to 1/4.
00:41:28.790 --> 00:41:31.520
So this probability is 1/4.
00:41:31.520 --> 00:41:35.560
If theta is different from
6, what is this probability?
00:41:35.560 --> 00:41:37.630
So if theta is different
from 6, since we
00:41:37.630 --> 00:41:41.210
know that we've only
loaded the integers--
00:41:41.210 --> 00:41:46.770
so if theta has to
be another integer,
00:41:46.770 --> 00:41:49.214
what is the probability
that I see 5 and 7?
00:41:49.214 --> 00:41:49.731
AUDIENCE: 0.
00:41:49.731 --> 00:41:50.606
PHILIPPE RIGOLLET: 0.
00:41:53.860 --> 00:41:55.190
So that's my likelihood.
00:41:55.190 --> 00:42:00.210
And if I want to know
what my posterior is,
00:42:00.210 --> 00:42:03.340
well, it's just
pi of theta times
00:42:03.340 --> 00:42:10.240
p of 5, 7, given theta, divided
by the sum over all T's, say,
00:42:10.240 --> 00:42:11.890
in Z. Right?
00:42:11.890 --> 00:42:14.590
So now, I just need to
normalize this thing.
00:42:14.590 --> 00:42:21.950
So of pi of T, p of
5, 7, given T. Agreed?
00:42:24.730 --> 00:42:27.350
That's just the definition
of the posterior.
00:42:27.350 --> 00:42:30.330
But when I sum
these guys, there's
00:42:30.330 --> 00:42:34.780
only one that counts,
because, for those things,
00:42:34.780 --> 00:42:38.140
we know that this is actually
equal to 0 for every T,
00:42:38.140 --> 00:42:41.470
except for when T is equal to 6.
00:42:41.470 --> 00:42:45.380
So this entire sum
here is actually
00:42:45.380 --> 00:42:54.310
equal to pi of 6
times p of 5, 6--
00:42:54.310 --> 00:43:03.360
sorry, 5, 7, of 5, 7,
given that theta
00:43:03.360 --> 00:43:08.370
is equal to 6, which we
know is equal to 1/4.
00:43:08.370 --> 00:43:10.630
And I did not tell
you what pi of 6 was.
00:43:16.840 --> 00:43:18.070
But it's the same thing here.
00:43:18.070 --> 00:43:21.020
The posterior for any
theta that's not 6
00:43:21.020 --> 00:43:23.520
is actually going to be-- this
guy's going to be equal to 0.
00:43:23.520 --> 00:43:26.130
So I really don't
care what this guy is.
00:43:26.130 --> 00:43:29.270
So what it means is that
my posterior becomes what?
00:43:33.870 --> 00:43:40.290
It becomes the
posterior pi of theta,
00:43:40.290 --> 00:43:46.970
given 5 and 7 is equal to--
well, when theta is not
00:43:46.970 --> 00:43:49.090
equal to 6, this is actually 0.
00:43:49.090 --> 00:43:52.450
So regardless of what I do here,
I get something which is 0.
00:43:55.120 --> 00:43:58.000
And if theta is equal
to 6, what I get
00:43:58.000 --> 00:44:02.500
is pi of 6 times
p of 5, 7, given 6,
00:44:02.500 --> 00:44:05.560
which I've just computed
here, which is 1/4 divided
00:44:05.560 --> 00:44:08.140
by pi of 6 times 1/4.
00:44:08.140 --> 00:44:10.640
So it's the ratio of two
things that are identical.
00:44:10.640 --> 00:44:13.360
So I get 1.
00:44:13.360 --> 00:44:16.570
So now, my posterior
tells me that, given
00:44:16.570 --> 00:44:22.440
that I observe 5
and 7, theta has
00:44:22.440 --> 00:44:27.690
to be 1 with probability-- has
to be 6 with probability 1.
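The same posterior computation can be written out as a short sketch. The lecture's prior puts positive mass on every integer; here it is truncated to a finite range with arbitrary positive weights, which is an assumption made only so the sum can be computed.

```python
from fractions import Fraction


def likelihood(x1, x2, theta):
    """P(X1 = x1, X2 = x2 | theta) with independent +/-1 signs."""
    ok = abs(x1 - theta) == 1 and abs(x2 - theta) == 1
    return Fraction(1, 4) if ok else Fraction(0)


def posterior(x1, x2, prior):
    """prior: dict theta -> positive weight (need not sum to 1)."""
    unnorm = {t: w * likelihood(x1, x2, t) for t, w in prior.items()}
    total = sum(unnorm.values())  # only thetas consistent with the data count
    return {t: v / total for t, v in unnorm.items()}


if __name__ == "__main__":
    # Hypothetical prior: any positive weights on the integers -20..20.
    prior = {t: Fraction(1, 1 + abs(t)) for t in range(-20, 21)}
    post = posterior(5, 7, prior)
    print(post[6])                            # all the mass sits on theta = 6
    print(posterior(5, 5, prior)[4], posterior(5, 5, prior)[6])
```

When the two observations differ, the prior cancels in the ratio. When they are equal, the mass is split between the two candidate values of theta in proportion to the prior, so the prior matters only in that case.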
00:44:27.690 --> 00:44:32.850
So now, I say that this
thing here-- so now, this
00:44:32.850 --> 00:44:34.590
is not something
that actually makes
00:44:34.590 --> 00:44:37.440
sense when I talk about
frequentist confidence
00:44:37.440 --> 00:44:38.310
intervals.
00:44:38.310 --> 00:44:40.560
They don't really make sense,
to talk about confidence
00:44:40.560 --> 00:44:42.330
intervals, given something.
00:44:42.330 --> 00:44:44.100
And so now, given that
I observe 5 and 7,
00:44:44.100 --> 00:44:46.224
I know that the probability
that theta is 6 is equal to 1.
00:44:46.224 --> 00:44:50.310
And in this sense, the
Bayesian confidence interval
00:44:50.310 --> 00:44:54.699
is actually more meaningful.
00:44:54.699 --> 00:44:56.990
So one thing I want to actually
say about this Bayesian
00:44:56.990 --> 00:44:58.466
confidence interval
is that it's--
00:45:01.100 --> 00:45:03.181
I mean, here, it's equal
to the value 1, right?
00:45:03.181 --> 00:45:05.180
So it really encompasses
the thing that we want.
00:45:05.180 --> 00:45:06.763
But the fact that
we actually computed
00:45:06.763 --> 00:45:09.140
it using the Bayesian
posterior and the Bayesian rule
00:45:09.140 --> 00:45:10.806
did not really matter
for this argument.
00:45:10.806 --> 00:45:12.980
All I just said was
that it had a prior.
00:45:12.980 --> 00:45:15.080
But just what I
want to illustrate
00:45:15.080 --> 00:45:17.930
is the fact that we can
actually give a meaning
00:45:17.930 --> 00:45:21.740
to the probability that
theta is equal to 6,
00:45:21.740 --> 00:45:23.390
given that I see 5 and 7.
00:45:23.390 --> 00:45:26.780
Whereas, we cannot really
in the other cases.
00:45:26.780 --> 00:45:28.490
And we don't have
to be particularly
00:45:28.490 --> 00:45:31.740
precise in the prior on theta
to be able to give theta this--
00:45:31.740 --> 00:45:32.930
to give this meaning.
00:45:32.930 --> 00:45:35.062
OK?
00:45:35.062 --> 00:45:36.038
All right.
00:45:38.966 --> 00:45:43.130
So now, as I said, I think
the main power of Bayesian
00:45:43.130 --> 00:45:45.980
inference is that it spits out
the posterior distribution,
00:45:45.980 --> 00:45:48.830
and not just the single
number, like frequentists
00:45:48.830 --> 00:45:50.030
would give you.
00:45:50.030 --> 00:45:55.070
Then we can, say, decorate our
theta hat, our point estimate,
00:45:55.070 --> 00:45:56.570
with maybe some
confidence interval.
00:45:56.570 --> 00:45:58.400
Maybe we can do
a bunch of tests.
00:45:58.400 --> 00:46:01.070
But at the end of the
day, we just have,
00:46:01.070 --> 00:46:02.624
essentially, one number, right?
00:46:02.624 --> 00:46:04.040
Then maybe we can
understand where
00:46:04.040 --> 00:46:07.310
the fluctuations of this number
are in a frequentist setup.
00:46:07.310 --> 00:46:11.760
But the Bayesian
framework is essentially
00:46:11.760 --> 00:46:13.059
giving you a natural method.
00:46:13.059 --> 00:46:15.517
And you can interpret it in
terms of the probabilities that
00:46:15.517 --> 00:46:17.400
are associated to the prior.
00:46:17.400 --> 00:46:21.180
But you can actually
also try to make some--
00:46:21.180 --> 00:46:25.840
so a Bayesian, if you
give me any prior,
00:46:25.840 --> 00:46:29.040
you're going to actually build
an estimator from this prior,
00:46:29.040 --> 00:46:30.515
maybe from the posterior.
00:46:30.515 --> 00:46:32.890
And maybe it's going to have
some frequentist properties.
00:46:32.890 --> 00:46:35.181
And that's what's really nice
about [? Bayesians, ?] is
00:46:35.181 --> 00:46:36.700
that you can
actually try to give
00:46:36.700 --> 00:46:39.340
some frequentist properties
of Bayesian methods, that
00:46:39.340 --> 00:46:42.224
are built using
Bayesian methodology.
00:46:42.224 --> 00:46:44.140
But you cannot really
go the other way around.
00:46:44.140 --> 00:46:46.449
If I give you a
frequentist methodology,
00:46:46.449 --> 00:46:48.490
how are you going to say
something about the fact
00:46:48.490 --> 00:46:51.620
that there's a prior
going on, et cetera?
00:46:51.620 --> 00:46:53.457
And so this is actually
one of the things--
00:46:53.457 --> 00:46:55.790
there's actually some research
that's going on for this.
00:46:55.790 --> 00:46:58.147
They call it Bayesian
posterior concentration.
00:46:58.147 --> 00:46:59.980
And one of the things--
so there's something
00:46:59.980 --> 00:47:01.990
called the Bernstein-von
Mises theorem.
00:47:01.990 --> 00:47:03.910
And those are a
class of theorems,
00:47:03.910 --> 00:47:06.790
and those are essentially
methods that tell you, well,
00:47:06.790 --> 00:47:10.690
if I actually run
a Bayesian method,
00:47:10.690 --> 00:47:12.647
and I look at the
posterior that I get--
00:47:12.647 --> 00:47:14.230
it's going to be
something like this--
00:47:14.230 --> 00:47:16.540
but now, I try to study this
in a frequentist point of view,
00:47:16.540 --> 00:47:18.289
there's actually a
true parameter of theta
00:47:18.289 --> 00:47:20.390
somewhere, the true one.
00:47:20.390 --> 00:47:21.640
There's no prior for this guy.
00:47:21.640 --> 00:47:23.410
This is just one fixed number.
00:47:23.410 --> 00:47:25.120
Is it true that as
my sample size is
00:47:25.120 --> 00:47:27.610
going to go to infinity,
then this thing is going
00:47:27.610 --> 00:47:29.860
to concentrate around theta?
00:47:29.860 --> 00:47:31.990
And the rate of
concentration of this thing,
00:47:31.990 --> 00:47:35.440
the size of this width,
the standard deviation
00:47:35.440 --> 00:47:38.290
of this thing, is something
that should decay maybe
00:47:38.290 --> 00:47:40.850
like 1 over square root of
n, or something like this.
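The 1 over square root of n rate can be checked numerically in a simple case. The sketch below is an illustration under assumptions, not the general theorem: it takes the Bernoulli model with a Beta(a, a) prior, and the values of `a` and `p_true` as well as the idealized counts are made up for the example.

```python
import math

def beta_std(alpha, beta):
    """Standard deviation of a Beta(alpha, beta) distribution."""
    s = alpha + beta
    return math.sqrt(alpha * beta / (s * s * (s + 1)))

a, p_true = 1.0, 0.3          # hypothetical prior parameter and true parameter
for n in (100, 10_000, 1_000_000):
    ones = round(p_true * n)  # idealized data: exactly the expected number of 1s
    sd = beta_std(a + ones, a + n - ones)
    # sd shrinks with n, while sd * sqrt(n) stays roughly constant
    print(n, sd, sd * math.sqrt(n))
```

If the posterior width really decays like 1 over square root of n, the last printed column should settle near the square root of p_true times (1 minus p_true).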
00:47:40.850 --> 00:47:43.349
And the rate of
posterior concentration,
00:47:43.349 --> 00:47:45.890
when you characterize it, it's
called the Bernstein-von Mises
00:47:45.890 --> 00:47:46.390
theorem.
00:47:46.390 --> 00:47:47.830
And so people are
looking at this
00:47:47.830 --> 00:47:49.566
in some non-parametric cases.
00:47:49.566 --> 00:47:51.190
You can do it in
pretty much everything
00:47:51.190 --> 00:47:52.190
we've been doing before.
00:47:52.190 --> 00:47:55.690
You can do it for non-parametric
regression estimation
00:47:55.690 --> 00:47:56.794
or density estimation.
00:47:56.794 --> 00:47:58.210
You can do it for,
of course-- you
00:47:58.210 --> 00:48:01.340
can do it for sparse
estimation, if you want.
00:48:01.340 --> 00:48:01.840
OK.
00:48:01.840 --> 00:48:04.967
So you can actually
compute the procedure and--
00:48:08.620 --> 00:48:09.290
yeah.
00:48:09.290 --> 00:48:12.660
And so you can think of it as
being just a method somehow.
00:48:12.660 --> 00:48:14.970
Now, the estimator
I'm talking about-- so
00:48:14.970 --> 00:48:18.210
that's just a general Bayesian
posterior concentration.
00:48:18.210 --> 00:48:20.430
But you can also
try to understand
00:48:20.430 --> 00:48:22.710
what is the property
of something that's
00:48:22.710 --> 00:48:24.210
extracted from this posterior.
00:48:24.210 --> 00:48:26.130
And one thing that
we actually described
00:48:26.130 --> 00:48:28.310
was, for example,
well, given this guy,
00:48:28.310 --> 00:48:30.060
maybe it's a good idea
to think about what
00:48:30.060 --> 00:48:32.370
the mean of this
thing is, right?
00:48:32.370 --> 00:48:35.040
So there's going to
be some theta hat,
00:48:35.040 --> 00:48:41.460
which is just the integral of
theta times pi of theta, given x1 through xn--
00:48:41.460 --> 00:48:43.860
so that's my posterior--
00:48:43.860 --> 00:48:44.380
d theta.
00:48:44.380 --> 00:48:44.880
Right?
00:48:44.880 --> 00:48:46.500
So that's the posterior mean.
00:48:46.500 --> 00:48:48.750
That's the expected
value with respect
00:48:48.750 --> 00:48:50.880
to the posterior distribution.
00:48:50.880 --> 00:48:53.640
And I want to know how
does this thing behave,
00:48:53.640 --> 00:48:56.670
how close it is to a
true theta if I actually
00:48:56.670 --> 00:48:58.370
am in a frequentist setup.
00:48:58.370 --> 00:48:59.784
So that's the posterior mean.
00:49:04.260 --> 00:49:08.450
But this is not the only thing
I can actually spit out, right?
00:49:08.450 --> 00:49:09.980
This is definitely
uniquely defined.
00:49:09.980 --> 00:49:13.490
If you give me a
distribution, I can actually
00:49:13.490 --> 00:49:15.170
spit out its posterior mean.
00:49:15.170 --> 00:49:17.480
But I can also think of
the posterior median.
00:49:21.450 --> 00:49:23.237
But now, if this
is not continuous,
00:49:23.237 --> 00:49:24.570
you might have some uncertainty.
00:49:24.570 --> 00:49:26.570
Maybe the median is
not uniquely defined,
00:49:26.570 --> 00:49:29.180
and so maybe that's not
something you use as much.
00:49:29.180 --> 00:49:31.690
Maybe you can actually talk
about the posterior mode.
00:49:35.160 --> 00:49:38.040
All right, so for example, if
your posterior density looks
00:49:38.040 --> 00:49:40.020
like this, then
maybe you just want
00:49:40.020 --> 00:49:43.600
to summarize your
posterior with this number.
00:49:43.600 --> 00:49:46.080
So clearly, in this case,
it's not such a good idea,
00:49:46.080 --> 00:49:48.270
because you completely
forget about this mode.
00:49:48.270 --> 00:49:49.811
But maybe that's
what you want to do.
00:49:49.811 --> 00:49:53.400
Maybe you want to focus
on the most peaked mode.
00:49:53.400 --> 00:49:58.524
And this is actually called
maximum a posteriori.
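To see how the three summaries can disagree, here is a small sketch on a posterior with two bumps, like the picture described above. The density is a hypothetical two-component Gaussian mixture chosen only for illustration, and everything is computed on a grid rather than in closed form.

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# Hypothetical bimodal posterior density on a grid (a two-component mixture).
grid = [i / 1000 for i in range(-4000, 8001)]
dens = [0.7 * normal_pdf(t, 0.0, 0.5) + 0.3 * normal_pdf(t, 4.0, 1.0) for t in grid]
total = sum(dens)
weights = [d / total for d in dens]              # normalize to sum to 1 on the grid

# Posterior mean: weighted average of the grid points.
post_mean = sum(t * w for t, w in zip(grid, weights))

# Posterior median: first grid point where the cumulative weight reaches 1/2.
cum = 0.0
for t, w in zip(grid, weights):
    cum += w
    if cum >= 0.5:
        post_median = t
        break

# Posterior mode (maximum a posteriori): grid point with the highest density.
post_mode = grid[max(range(len(grid)), key=dens.__getitem__)]
print(post_mean, post_median, post_mode)
```

The mode sits on the taller bump and ignores the second one entirely, while the mean is pulled toward it.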
00:49:58.524 --> 00:49:59.940
As I said, maybe
you want to sample
00:49:59.940 --> 00:50:03.240
from this posterior
distribution.
00:50:03.240 --> 00:50:06.420
OK, and so in all these cases,
these Bayesian estimators
00:50:06.420 --> 00:50:09.000
will depend on the
prior distribution.
00:50:09.000 --> 00:50:11.610
And the hope is that, as
the sample size grows,
00:50:11.610 --> 00:50:14.130
you won't see that again.
00:50:14.130 --> 00:50:14.630
OK.
00:50:14.630 --> 00:50:20.840
So to conclude, let's just
do a couple of experiments.
00:50:20.840 --> 00:50:22.340
So if I look at--
00:50:25.200 --> 00:50:26.011
did we do this?
00:50:26.011 --> 00:50:26.510
Yes.
00:50:26.510 --> 00:50:30.398
So for example, so let's
focus on the posterior mean.
00:50:34.366 --> 00:50:45.394
And we know-- so remember
in experiment one--
00:50:45.394 --> 00:50:48.100
[INAUDIBLE] example
one, what we had
00:50:48.100 --> 00:50:56.000
was x1 xn that were
[? iid, ?] Bernoulli p,
00:50:56.000 --> 00:51:06.410
and the prior I put on p was
a beta with parameters a, a.
00:51:06.410 --> 00:51:07.160
OK?
00:51:07.160 --> 00:51:09.830
And if I go back to
what we computed,
00:51:09.830 --> 00:51:12.740
you can actually compute
the posterior of this thing.
00:51:12.740 --> 00:51:15.000
And we know that it's
actually going to be--
00:51:15.000 --> 00:51:17.390
sorry, that was uniform?
00:51:17.390 --> 00:51:18.620
Where is-- yeah.
00:51:18.620 --> 00:51:31.170
So what we get is that
the posterior, this thing
00:51:31.170 --> 00:51:36.630
is actually going to be
a beta with parameter
00:51:36.630 --> 00:51:42.640
a plus the sum, so a
plus the number of 1s
00:51:42.640 --> 00:51:44.770
and a plus the number of 0s.
00:51:48.590 --> 00:51:49.870
OK?
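The update just described can be sketched in a few lines. The function name and the sample below are made up for illustration; the only thing taken from the lecture is the rule that a Beta(a, a) prior becomes a Beta with parameters a plus the number of 1s and a plus the number of 0s.

```python
def beta_bernoulli_posterior(xs, a):
    """Return (alpha, beta) of the Beta posterior for i.i.d. Bernoulli data
    under a symmetric Beta(a, a) prior."""
    n = len(xs)
    ones = sum(xs)          # number of 1s
    zeros = n - ones        # number of 0s
    return a + ones, a + zeros

xs = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # made-up sample: 7 ones, 3 zeros
alpha, beta = beta_bernoulli_posterior(xs, a=1.0)
print(alpha, beta)
```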
00:51:49.870 --> 00:51:53.840
And the beta was just
something that looked like--
00:51:56.480 --> 00:52:00.500
the density was p to the
a minus 1, 1 minus p.
00:52:05.440 --> 00:52:05.940
OK?
00:52:05.940 --> 00:52:11.130
So if I want to understand
the posterior mean,
00:52:11.130 --> 00:52:13.950
I need to be able to compute
the expectation of a beta,
00:52:13.950 --> 00:52:16.620
and then maybe plug
in a for a plus
00:52:16.620 --> 00:52:17.980
this guy and minus this guy.
00:52:17.980 --> 00:52:18.480
OK.
00:52:18.480 --> 00:52:21.770
So actually, let me do this.
00:52:21.770 --> 00:52:22.270
OK.
00:52:22.270 --> 00:52:23.930
So what is the expectation?
00:52:26.337 --> 00:52:27.920
So what I want is
something that looks
00:52:27.920 --> 00:52:34.820
like the integral between 0
and 1 of p times a minus 1--
00:52:34.820 --> 00:52:42.320
sorry, p times p to the a minus
1, 1 minus p to the b minus 1.
00:52:42.320 --> 00:52:43.590
Do we agree that this--
00:52:43.590 --> 00:52:46.290
and then there's a
normalizing constant.
00:52:46.290 --> 00:52:49.270
Let's call it c.
00:52:49.270 --> 00:52:49.770
OK?
00:52:53.200 --> 00:52:56.330
So this is what I
need to compute.
00:52:56.330 --> 00:52:57.640
So that's c of a and b.
00:53:00.257 --> 00:53:01.840
Do we agree that
this is the posterior
00:53:01.840 --> 00:53:08.651
mean with respect to a beta
with parameters a and b?
00:53:08.651 --> 00:53:09.150
Right?
00:53:09.150 --> 00:53:13.334
I just integrate p
against the density.
00:53:13.334 --> 00:53:14.750
So what does this
thing look like?
00:53:14.750 --> 00:53:18.550
Well, I can actually
move this guy in here.
00:53:18.550 --> 00:53:23.402
And here, I'm going to
have a plus 1 minus 1.
00:53:23.402 --> 00:53:26.366
OK?
00:53:26.366 --> 00:53:29.360
So the problem is that
this thing is actually--
00:53:29.360 --> 00:53:31.360
the constant is going to
play a big role, right?
00:53:31.360 --> 00:53:33.100
Because this is
essentially equal
00:53:33.100 --> 00:53:40.270
to c of a plus 1, b
divided by c of a, b, where
00:53:40.270 --> 00:53:42.220
c of a plus 1, b is just
the normalizing
00:53:42.220 --> 00:53:46.340
constant of a beta a plus 1, b.
00:53:46.340 --> 00:53:48.729
So I need to know the ratio
of those two constants.
00:53:58.320 --> 00:53:59.660
And this is not something--
00:53:59.660 --> 00:54:01.680
I mean, this is just
a calculus exercise.
00:54:01.680 --> 00:54:06.820
So in this case,
what you get is--
00:54:06.820 --> 00:54:08.640
sorry.
00:54:08.640 --> 00:54:09.750
In this case, you get--
00:54:12.560 --> 00:54:34.940
well, OK, so we get
essentially a divided by,
00:54:34.940 --> 00:54:37.990
I think, it's a plus b.
00:54:37.990 --> 00:54:38.940
Yeah, it's a plus b.
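For reference, the calculus exercise can be written out as follows. This is a sketch assuming that c(a, b) denotes the integral of the unnormalized beta density (so the density is the integrand divided by c(a, b)); the Gamma-function identities are standard facts, not something stated on the board.

```latex
\[
c(a,b) \;=\; \int_0^1 p^{a-1}(1-p)^{b-1}\,dp \;=\; \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)},
\]
\[
\mathbb{E}[p]
\;=\; \frac{1}{c(a,b)} \int_0^1 p \cdot p^{a-1}(1-p)^{b-1}\,dp
\;=\; \frac{c(a+1,b)}{c(a,b)}
\;=\; \frac{\Gamma(a+1)}{\Gamma(a)} \cdot \frac{\Gamma(a+b)}{\Gamma(a+b+1)}
\;=\; \frac{a}{a+b},
\]
```

The last step uses the recursion Gamma(z + 1) = z Gamma(z) in both the numerator and the denominator.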
00:54:41.856 --> 00:54:43.314
So that's this quantity.
00:54:47.188 --> 00:54:47.688
OK?
00:54:51.100 --> 00:54:56.520
And when I plug in a to be this
guy and b to be this guy, what
00:54:56.520 --> 00:55:02.520
I get is a plus sum of the xi.
00:55:02.520 --> 00:55:06.240
And then I get a plus this
guy, a plus n minus this guy.
00:55:06.240 --> 00:55:07.720
So those two guys
go away, and I'm
00:55:07.720 --> 00:55:14.050
left with 2a plus n,
which does not work.
00:55:14.050 --> 00:55:15.240
No, that actually works.
00:55:15.240 --> 00:55:18.520
And so now what I do, I
can actually divide and get
00:55:18.520 --> 00:55:19.850
this thing, over there.
00:55:19.850 --> 00:55:20.350
OK.
00:55:20.350 --> 00:55:23.380
So what you can see, the reason
why this thing has been divided
00:55:23.380 --> 00:55:27.730
is that you can really see
that, as n goes to infinity,
00:55:27.730 --> 00:55:30.120
then this thing behaves
like xn bar, which
00:55:30.120 --> 00:55:31.650
is our frequentist estimator.
00:55:31.650 --> 00:55:34.200
The effect of a is
actually going away.
00:55:34.200 --> 00:55:37.530
The effect of the prior, which
is completely captured by a,
00:55:37.530 --> 00:55:40.440
is going away as n
goes to infinity.
00:55:40.440 --> 00:55:42.440
Is there any question?
00:55:47.440 --> 00:55:48.850
You guys have a question.
00:55:48.850 --> 00:55:50.202
What is it?
00:55:50.202 --> 00:55:51.551
Do you have a question?
00:55:51.551 --> 00:55:53.426
AUDIENCE: Yeah, on the
board, is that divided
00:55:53.426 --> 00:55:56.259
by some [INAUDIBLE] stuff?
00:55:56.259 --> 00:55:58.050
PHILIPPE RIGOLLET: Is
that divided by what?
00:55:58.050 --> 00:56:00.555
AUDIENCE: That a over a plus
b, and then you just expanded--
00:56:00.555 --> 00:56:01.930
PHILIPPE RIGOLLET:
Oh yeah, yeah,
00:56:01.930 --> 00:56:05.220
then I said that this
is equal to this, right.
00:56:05.220 --> 00:56:15.690
So that's for a becomes a plus
sum of the xi's, and b becomes
00:56:15.690 --> 00:56:20.391
a plus n minus sum of the xi's.
00:56:20.391 --> 00:56:20.890
OK.
00:56:20.890 --> 00:56:22.508
So that's just for
the posterior one.
00:56:22.508 --> 00:56:26.264
AUDIENCE: What's [INAUDIBLE]
00:56:26.264 --> 00:56:27.430
PHILIPPE RIGOLLET: This guy?
00:56:27.430 --> 00:56:28.070
AUDIENCE: Yeah.
00:56:28.070 --> 00:56:28.740
PHILIPPE RIGOLLET: 2a.
00:56:28.740 --> 00:56:29.281
AUDIENCE: 2a.
00:56:29.281 --> 00:56:30.150
Oh, OK.
00:56:30.150 --> 00:56:31.191
PHILIPPE RIGOLLET: Right.
00:56:31.191 --> 00:56:34.885
So I get a plus a plus n.
00:56:34.885 --> 00:56:37.960
And then those two guys cancel.
00:56:37.960 --> 00:56:38.460
OK?
00:56:38.460 --> 00:56:41.380
And that's what you have here.
00:56:41.380 --> 00:56:44.920
So for a is equal to 1/2--
00:56:44.920 --> 00:56:47.020
and I claim that this
is Jeffreys prior.
00:56:47.020 --> 00:56:53.950
Because remember, Jeffreys was
[INAUDIBLE] was square root
00:56:53.950 --> 00:56:56.100
and was proportional to 1 over
the square root of p 1 minus
00:56:56.100 --> 00:57:01.050
p, which I can write as p to the
minus 1/2, 1 minus p to the minus 1/2.
00:57:01.050 --> 00:57:03.501
So it's just the case
a is equal to 1/2.
00:57:03.501 --> 00:57:04.000
OK.
00:57:04.000 --> 00:57:07.660
So if I use Jeffreys prior, I
just plug in a equals to 1/2,
00:57:07.660 --> 00:57:10.530
and this is what I get.
00:57:10.530 --> 00:57:12.630
OK?
00:57:12.630 --> 00:57:14.880
So those things are going
to have an impact again when
00:57:14.880 --> 00:57:16.150
n is moderately large.
00:57:16.150 --> 00:57:19.090
For large n, those things,
whether you take Jeffreys prior
00:57:19.090 --> 00:57:20.710
or you take whatever
a you prefer,
00:57:20.710 --> 00:57:23.130
it's going to have
no impact whatsoever.
00:57:23.130 --> 00:57:26.894
But if n is of the
order of 10, maybe,
00:57:26.894 --> 00:57:28.810
then you're going to
start to see some impact,
00:57:28.810 --> 00:57:30.351
depending on what
a you want to pick.
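A quick numerical check of this point, as a sketch: the formula (a plus the number of 1s) over (2a plus n) is the posterior mean derived above, while the counts and the particular values of a are made up for illustration.

```python
def posterior_mean(ones, n, a):
    """Posterior mean (a + ones) / (2a + n) under a Beta(a, a) prior."""
    return (a + ones) / (2 * a + n)

for n, ones in [(10, 7), (10_000, 7_000)]:   # made-up counts, 70% ones
    for a in (0.5, 1.0, 5.0):                 # 0.5 is the Jeffreys choice
        print(f"n={n:>6}  a={a:<4} posterior mean={posterior_mean(ones, n, a):.4f}")
```

With n of the order of 10 the three choices of a give visibly different answers; with n in the thousands they all sit next to the sample proportion.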
00:57:33.540 --> 00:57:34.040
OK.
00:57:34.040 --> 00:57:38.390
And then in the second
example, well, here we actually
00:57:38.390 --> 00:57:42.560
computed the posterior
to be this guy.
00:57:42.560 --> 00:57:45.544
Well, here, I can just read off
what the expectation is, right?
00:57:45.544 --> 00:57:47.210
I mean, I don't have
to actually compute
00:57:47.210 --> 00:57:48.970
the expectation of a Gaussian.
00:57:48.970 --> 00:57:50.650
It's just that xn bar.
00:57:50.650 --> 00:57:52.660
And so in this case,
there's actually no--
00:57:52.660 --> 00:57:57.190
I mean, when I have a
non-informative prior
00:57:57.190 --> 00:58:01.750
for a Gaussian, then I
have basically xn bar.
00:58:01.750 --> 00:58:04.390
As you can see, actually, this
is an interesting example.
00:58:04.390 --> 00:58:06.490
When I actually look
at the posterior,
00:58:06.490 --> 00:58:09.190
it's not something that cost
me a lot to communicate to you,
00:58:09.190 --> 00:58:10.037
right?
00:58:10.037 --> 00:58:12.370
There's one symbol here, one
symbol here, and one symbol
00:58:12.370 --> 00:58:13.330
here.
00:58:13.330 --> 00:58:17.950
I tell you the posterior is
a Gaussian with mean xn bar
00:58:17.950 --> 00:58:19.660
and variance 1/n.
00:58:19.660 --> 00:58:23.530
When I actually turn
that into a posterior mean,
00:58:23.530 --> 00:58:26.264
I'm dropping all
this information.
00:58:26.264 --> 00:58:27.930
I'm just giving you
the first parameter.
00:58:27.930 --> 00:58:30.150
So you can see there's
actually much more information
00:58:30.150 --> 00:58:35.100
in the posterior than there
is in the posterior mean.
00:58:35.100 --> 00:58:37.210
The posterior mean
is just a point.
00:58:37.210 --> 00:58:39.930
It's not telling me how
confident I am in this point.
00:58:39.930 --> 00:58:41.950
And this thing is
actually very interesting.
00:58:41.950 --> 00:58:42.450
OK.
00:58:42.450 --> 00:58:44.283
So you can talk about
the posterior variance
00:58:44.283 --> 00:58:45.880
that's associated to it, right?
00:58:45.880 --> 00:58:47.516
You can talk about,
as an output,
00:58:47.516 --> 00:58:49.890
you could give the posterior
mean and posterior variance.
00:58:49.890 --> 00:58:53.311
And those things are
actually interesting.
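As a sketch of what reporting more than the point looks like, the snippet below builds the Gaussian posterior with mean xn bar and variance 1/n and reads off its mean, its variance, and a central 95% interval. It assumes unit-variance Gaussian data with the flat prior, as in the second example; the observations and the function name are made up.

```python
from statistics import NormalDist, fmean

def gaussian_posterior(xs):
    """Posterior N(x_bar, 1/n) for the mean of unit-variance Gaussian data
    under a flat (improper) prior."""
    n = len(xs)
    return NormalDist(mu=fmean(xs), sigma=(1.0 / n) ** 0.5)

xs = [0.8, 1.3, 0.9, 1.1, 1.4, 0.5, 1.0, 1.2]   # made-up observations
post = gaussian_posterior(xs)
lo, hi = post.inv_cdf(0.025), post.inv_cdf(0.975)  # central 95% posterior interval
print(post.mean, post.variance, (lo, hi))
```

The posterior mean alone would be the first printed number; the variance and the interval are the part that says how confident you are in it.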
00:58:53.311 --> 00:58:53.810
All right.
00:58:53.810 --> 00:58:56.370
So I think this is it.
00:58:56.370 --> 00:59:05.360
So as I said, in general,
just like in this case,
00:59:05.360 --> 00:59:07.980
the impact of the prior
is being washed away
00:59:07.980 --> 00:59:10.310
as the sample size
goes to infinity.
00:59:10.310 --> 00:59:12.860
Just well, like here, there's
no impact of the prior.
00:59:12.860 --> 00:59:14.500
It was a non-informative one.
00:59:14.500 --> 00:59:17.780
But if you actually had an
informative one, cf.
00:59:17.780 --> 00:59:18.683
homework-- yeah?
00:59:18.683 --> 00:59:19.650
AUDIENCE: [INAUDIBLE]
00:59:19.650 --> 00:59:21.150
PHILIPPE RIGOLLET: Yeah,
so cf. homework,
00:59:21.150 --> 00:59:23.358
you would actually see an
impact of the prior, which,
00:59:23.358 --> 00:59:25.890
again, would be washed away
as your sample size increases.
00:59:25.890 --> 00:59:26.820
Here, it goes away.
00:59:26.820 --> 00:59:29.610
You just get xn bar over 1.
00:59:29.610 --> 00:59:31.830
And actually, in
these cases, you
00:59:31.830 --> 00:59:35.580
see that the posterior
distribution converges
00:59:35.580 --> 00:59:37.560
to-- sorry, the
Bayesian estimator
00:59:37.560 --> 00:59:39.510
is asymptotically normal.
00:59:39.510 --> 00:59:43.471
This is different from the
distribution of the posterior,
00:59:43.471 --> 00:59:43.970
right?
00:59:43.970 --> 00:59:45.886
This is just the posterior
mean, which happens
00:59:45.886 --> 00:59:47.480
to be asymptotically normal.
00:59:47.480 --> 00:59:49.595
But the posterior
may not have a--
00:59:49.595 --> 00:59:53.000
I mean, here, the
posterior is a beta, right?
00:59:53.000 --> 00:59:55.020
I mean, it's not normal.
00:59:55.020 --> 00:59:57.210
OK, so there's
different-- those things
00:59:57.210 --> 00:59:59.556
are two different things.
00:59:59.556 --> 01:00:01.548
Your question?
01:00:01.548 --> 01:00:04.487
AUDIENCE: What was
the prior [INAUDIBLE]
01:00:04.487 --> 01:00:05.820
PHILIPPE RIGOLLET: All 1, right?
01:00:05.820 --> 01:00:06.986
That was the improper prior.
01:00:06.986 --> 01:00:08.896
AUDIENCE: OK.
01:00:08.896 --> 01:00:12.563
And so that would give you the
same thing as [INAUDIBLE], not
01:00:12.563 --> 01:00:13.790
just the proportion.
01:00:13.790 --> 01:00:15.373
PHILIPPE RIGOLLET:
Well, I mean, yeah.
01:00:15.373 --> 01:00:17.600
So it's essentially
telling you that--
01:00:17.600 --> 01:00:23.390
so we said that, when you
have a non-informative prior,
01:00:23.390 --> 01:00:25.760
essentially, the maximum
likelihood is the maximum
01:00:25.760 --> 01:00:26.879
a posteriori, right?
01:00:26.879 --> 01:00:28.670
But in this case,
there's so much symmetry,
01:00:28.670 --> 01:00:30.560
that it just so happens that
the maximum in this thing
01:00:30.560 --> 01:00:32.370
is completely symmetric
around its maximum.
01:00:32.370 --> 01:00:34.809
So it means that the expectation
is equal to the maximum,
01:00:34.809 --> 01:00:35.600
to [INAUDIBLE] max.
01:00:40.957 --> 01:00:41.931
Yeah?
01:00:41.931 --> 01:00:43.392
AUDIENCE: I read
somewhere that one
01:00:43.392 --> 01:00:45.340
of the issues with
Bayesian methods
01:00:45.340 --> 01:00:46.801
is that we choose
the wrong prior,
01:00:46.801 --> 01:00:49.723
and it could mess
up your results.
01:00:49.723 --> 01:00:51.370
PHILIPPE RIGOLLET:
Yeah, but hence,
01:00:51.370 --> 01:00:53.980
do not pick the wrong prior.
01:00:53.980 --> 01:00:55.244
I mean, of course, it would.
01:00:55.244 --> 01:00:57.160
I mean, it would mess
up your res-- of course.
01:00:57.160 --> 01:00:58.810
I mean, you're putting
extra information.
01:00:58.810 --> 01:01:00.601
But you could say the
same thing by saying,
01:01:00.601 --> 01:01:03.670
well, the issue with
frequentist methods
01:01:03.670 --> 01:01:06.730
is that, if you mess up the
choice of your likelihood,
01:01:06.730 --> 01:01:09.424
then it's going to
mess up your output.
01:01:09.424 --> 01:01:11.590
So here, you just have two
chances of messing it up,
01:01:11.590 --> 01:01:12.250
right?
01:01:12.250 --> 01:01:14.440
You have the-- well, it's gone.
01:01:14.440 --> 01:01:17.920
So you have the product of
the likelihood and the prior,
01:01:17.920 --> 01:01:20.350
and you have one
more chance to--
01:01:20.350 --> 01:01:22.420
but it's true, if you
assume that the model is
01:01:22.420 --> 01:01:25.960
right, then, of course,
finding the wrong prior could
01:01:25.960 --> 01:01:28.520
completely mess up things
if your prior, for example,
01:01:28.520 --> 01:01:30.780
has no support on
the true parameter.
01:01:30.780 --> 01:01:34.715
But if your prior has a positive
weight on the true parameter
01:01:34.715 --> 01:01:38.140
as n goes to infinity--
01:01:38.140 --> 01:01:40.640
I mean, OK, I cannot speak
for all counterexamples
01:01:40.640 --> 01:01:41.480
in the world.
01:01:41.480 --> 01:01:44.450
But I'm sure, under minor
technical conditions,
01:01:44.450 --> 01:01:46.550
you can guarantee
that your posterior
01:01:46.550 --> 01:01:48.530
mean is going to
converge to what
01:01:48.530 --> 01:01:49.742
you need it to converge to.
01:01:53.678 --> 01:01:54.662
Any other question?
01:01:57.881 --> 01:01:58.380
All right.
01:01:58.380 --> 01:02:07.650
So I think this closes the more
traditional mathematical-- not
01:02:07.650 --> 01:02:11.490
mathematical, but traditional
statistics part of this class.
01:02:11.490 --> 01:02:14.310
And from here on, we'll
talk about more multivariate
01:02:14.310 --> 01:02:17.740
statistics, starting with
principal component analysis.
01:02:17.740 --> 01:02:19.800
So that's more like when
you have multiple data.
01:02:19.800 --> 01:02:22.650
We started, in a way, to talk
about multivariate statistics
01:02:22.650 --> 01:02:25.320
when we talked about
multivariate regression.
01:02:25.320 --> 01:02:28.180
But we'll move on to
principal component analysis.
01:02:28.180 --> 01:02:30.690
I'll talk a bit about
multiple testing.
01:02:30.690 --> 01:02:32.400
I haven't made up my
mind yet about what
01:02:32.400 --> 01:02:34.350
we'll talk really in December.
01:02:34.350 --> 01:02:36.480
But I want to make
sure that you have
01:02:36.480 --> 01:02:41.310
a taste and a flavor of what is
interesting in statistics
01:02:41.310 --> 01:02:44.341
these days, especially as you
go towards more [INAUDIBLE]
01:02:44.341 --> 01:02:46.590
learning type of questions,
where really, the focus is
01:02:46.590 --> 01:02:48.619
on prediction rather
than the modeling itself.
01:02:48.619 --> 01:02:50.160
We'll talk about
logistic regression,
01:02:50.160 --> 01:02:52.800
as well, for example,
which is generalized
01:02:52.800 --> 01:02:55.470
linear models, which is just
the generalization in the case
01:02:55.470 --> 01:03:00.480
that y does not take value in
the whole real line, maybe 0, 1,
01:03:00.480 --> 01:03:03.360
for example, for regression.
01:03:03.360 --> 01:03:03.960
All right.
01:03:03.960 --> 01:03:05.510
Thanks.