WEBVTT

00:00:00.500 --> 00:00:02.650
We will now go through
an example that

00:00:02.650 --> 00:00:05.820
involves a continuous
unknown parameter,

00:00:05.820 --> 00:00:09.830
the unknown bias of a coin
and discrete observations,

00:00:09.830 --> 00:00:12.100
namely, the number
of heads that are

00:00:12.100 --> 00:00:14.360
observed in a sequence
of coin flips.

00:00:14.360 --> 00:00:17.880
This is an example that we
will start in some detail now,

00:00:17.880 --> 00:00:20.630
and we will also
revisit later on.

00:00:20.630 --> 00:00:23.620
And in the process, we will
also have the opportunity

00:00:23.620 --> 00:00:27.770
to introduce a new class of
probability distributions.

00:00:27.770 --> 00:00:30.860
This example is an
extension of an example

00:00:30.860 --> 00:00:33.320
that we have already
seen, when we first

00:00:33.320 --> 00:00:36.970
introduced the relevant
version of the Bayes rule.

00:00:36.970 --> 00:00:38.800
We have a coin.

00:00:38.800 --> 00:00:44.110
It has a certain bias between 0
and 1, but the bias is unknown.

00:00:44.110 --> 00:00:47.190
And consistent with the
Bayesian philosophy,

00:00:47.190 --> 00:00:50.470
we treat this unknown
bias as a random variable,

00:00:50.470 --> 00:00:54.240
and we assign a prior
probability distribution to it.

00:00:54.240 --> 00:00:57.130
We flip this coin n
times independently,

00:00:57.130 --> 00:00:59.360
where n is some
positive integer,

00:00:59.360 --> 00:01:02.440
and we record the number
of heads that are obtained.

00:01:02.440 --> 00:01:05.030
On the basis of the value
of this random variable,

00:01:05.030 --> 00:01:08.740
we would like to make
inferences about Theta.

00:01:08.740 --> 00:01:11.510
Now to make some more
concrete progress,

00:01:11.510 --> 00:01:13.280
let us make a
specific assumption.

00:01:13.280 --> 00:01:16.820
Let us assume that
the prior on Theta

00:01:16.820 --> 00:01:20.840
is uniform on the unit interval,
in some sense reflecting

00:01:20.840 --> 00:01:25.260
complete ignorance about
the true value of Theta.

00:01:25.260 --> 00:01:30.789
We observe the value of this
random variable, some little k,

00:01:30.789 --> 00:01:34.400
we fix that value, and we're
interested in the functional

00:01:34.400 --> 00:01:38.490
dependence on theta of
this particular quantity,

00:01:38.490 --> 00:01:41.140
when k is given to us.

00:01:41.140 --> 00:01:42.650
How do we do this?

00:01:42.650 --> 00:01:46.610
We use the appropriate form
of the Bayes rule, which

00:01:46.610 --> 00:01:49.740
in this setting is as follows.

00:01:49.740 --> 00:01:54.289
It is the usual
form, but we have

00:01:54.289 --> 00:01:57.620
f's indicating
densities whenever we're

00:01:57.620 --> 00:01:59.509
talking about the
distribution of Theta,

00:01:59.509 --> 00:02:01.440
because Theta is continuous.

00:02:01.440 --> 00:02:04.760
And whenever we talk about
the distribution of K, which

00:02:04.760 --> 00:02:07.020
is discrete, we
use the symbol p,

00:02:07.020 --> 00:02:10.600
because we're dealing with
probability mass functions.

00:02:10.600 --> 00:02:14.770
As always, the
denominator term is such

00:02:14.770 --> 00:02:19.490
that the integral of the
whole expression over theta

00:02:19.490 --> 00:02:20.670
is equal to 1.

00:02:20.670 --> 00:02:23.329
This is the necessary
normalization property,

00:02:23.329 --> 00:02:26.180
and because of this,
this denominator term

00:02:26.180 --> 00:02:29.650
has to be equal to the
integral of the numerator

00:02:29.650 --> 00:02:33.250
over all theta, which
is what we have here.

00:02:33.250 --> 00:02:36.990
So now let us move on, and
let us apply this formula.

00:02:36.990 --> 00:02:41.320
We first have the prior,
which is equal to 1.

00:02:41.320 --> 00:02:45.530
Then we have the probability
that K is equal to little k.

00:02:45.530 --> 00:02:49.030
This is the probability of
obtaining exactly k heads,

00:02:49.030 --> 00:02:51.740
if I tell you the
bias of the coin.

00:02:51.740 --> 00:02:53.860
But if I tell you
the bias of the coin,

00:02:53.860 --> 00:02:57.410
we're dealing with the usual
model of independent coin

00:02:57.410 --> 00:03:00.270
flips, and the
probability of k heads

00:03:00.270 --> 00:03:04.610
is given by the binomial
probabilities, which

00:03:04.610 --> 00:03:05.890
take this form.

00:03:08.900 --> 00:03:14.520
And finally, we have
the denominator term,

00:03:14.520 --> 00:03:18.260
which we do not need to
evaluate at this point.

00:03:18.260 --> 00:03:21.760
Now, I said earlier that we're
interested in the dependence

00:03:21.760 --> 00:03:26.250
on theta, which comes
through these terms.

00:03:26.250 --> 00:03:29.550
On the other hand,
the remaining terms

00:03:29.550 --> 00:03:34.090
do not involve any
thetas, and so they

00:03:34.090 --> 00:03:38.420
can be lumped together
in just a constant.

00:03:38.420 --> 00:03:41.140
And so we can write
the answer that we

00:03:41.140 --> 00:03:44.980
have found in this
more suggestive form.

00:03:44.980 --> 00:03:47.160
We have some
normalizing constant,

00:03:47.160 --> 00:03:50.670
and here we keep separately
the dependence on theta.

00:03:50.670 --> 00:03:52.960
Of course, this
answer that we derived

00:03:52.960 --> 00:03:57.570
is valid for little theta
belonging to the unit interval.

00:03:57.570 --> 00:04:01.660
Outside the unit interval,
either the prior density

00:04:01.660 --> 00:04:07.370
or the posterior density of
Theta would be equal to 0.
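
NOTE
The formulas referred to as "this form" are on the lecture slide and not in the captions.
A hedged reconstruction of the uniform-prior calculation, using the f and p notation
described above; the name c(n,k) for the normalizing constant is an assumption.
```latex
\[
f_{\Theta \mid K}(\theta \mid k)
  = \frac{f_\Theta(\theta)\, p_{K \mid \Theta}(k \mid \theta)}{p_K(k)}
  = \frac{1 \cdot \binom{n}{k}\, \theta^{k} (1-\theta)^{n-k}}{p_K(k)}
  = \frac{1}{c(n,k)}\, \theta^{k} (1-\theta)^{n-k},
  \qquad 0 \le \theta \le 1,
\]
\[
p_K(k) = \int_0^1 f_\Theta(\theta')\, p_{K \mid \Theta}(k \mid \theta')\, d\theta' .
\]
```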

00:04:07.370 --> 00:04:12.130
This particular form of
the posterior distribution

00:04:12.130 --> 00:04:15.500
for Theta is a certain
type of density,

00:04:15.500 --> 00:04:18.110
and it shows up in
various contexts.

00:04:18.110 --> 00:04:20.890
And for this reason,
it has a name.

00:04:20.890 --> 00:04:25.320
It is called a Beta distribution
with certain parameters,

00:04:25.320 --> 00:04:28.040
and the parameters
reflect the exponents

00:04:28.040 --> 00:04:32.390
that we have up here
in the two terms.

00:04:32.390 --> 00:04:36.150
Note that these parameters are
the exponents augmented by 1.

00:04:36.150 --> 00:04:39.730
This is for historical reasons
that do not concern us here.

00:04:39.730 --> 00:04:41.720
It is just a convention.

00:04:41.720 --> 00:04:45.840
The important thing is to be
able to recognize what it takes

00:04:45.840 --> 00:04:48.760
for a distribution to
be a Beta distribution.

00:04:48.760 --> 00:04:52.790
That is, the dependence
on theta is of the form theta

00:04:52.790 --> 00:04:57.100
to some power times 1 minus
theta to some other power.

00:04:57.100 --> 00:05:01.060
Any distribution of this form
is called a Beta distribution.

00:05:01.060 --> 00:05:03.020
So now, let's
continue this example

00:05:03.020 --> 00:05:05.270
by considering a
different prior.

00:05:05.270 --> 00:05:10.530
Suppose that the prior is
itself a Beta distribution

00:05:10.530 --> 00:05:13.610
of this form where
alpha and beta are

00:05:13.610 --> 00:05:17.130
some non-negative numbers.

00:05:17.130 --> 00:05:20.250
What is the posterior
in this case?

00:05:20.250 --> 00:05:23.160
We just go through the
same calculation as before,

00:05:23.160 --> 00:05:27.150
but instead of using one
in the place of the prior,

00:05:27.150 --> 00:05:30.850
we now use the prior
that's given to us.

00:05:35.950 --> 00:05:39.909
The probability of k
heads in the n tosses,

00:05:39.909 --> 00:05:43.350
when we know the bias,
is exactly as before.

00:05:43.350 --> 00:05:47.840
It is given by the
binomial probabilities.

00:05:47.840 --> 00:05:53.540
And finally, we need to divide
by the denominator term, which

00:05:53.540 --> 00:05:56.480
is the normalizing constant.

00:05:56.480 --> 00:05:58.670
What do we observe here?

00:05:58.670 --> 00:06:03.750
The dependence on theta
comes through these terms.

00:06:03.750 --> 00:06:07.610
The remaining terms
do not involve theta,

00:06:07.610 --> 00:06:11.710
and they can all be
absorbed in a constant.

00:06:11.710 --> 00:06:16.430
Let's call that constant d, and
collect the remaining terms.

00:06:16.430 --> 00:06:22.260
We have theta to the
power of alpha plus k,

00:06:22.260 --> 00:06:28.550
and then, 1 minus theta to the
power of beta plus n minus k.

00:06:33.530 --> 00:06:36.900
And once more, this is
the form of the posterior

00:06:36.900 --> 00:06:40.170
for thetas belonging
to this range.

00:06:40.170 --> 00:06:43.680
The posterior is 0
outside this range.

00:06:43.680 --> 00:06:45.180
So what do we see?

00:06:45.180 --> 00:06:47.390
We started with
a prior that came

00:06:47.390 --> 00:06:49.920
from the Beta
family of this form,

00:06:49.920 --> 00:06:54.830
and we came up with a
posterior that is still

00:06:54.830 --> 00:06:57.490
a function of
theta of this form,

00:06:57.490 --> 00:07:01.550
but with different values of
the parameters alpha and beta.

00:07:01.550 --> 00:07:03.970
Namely, alpha gets
replaced by alpha plus k,

00:07:03.970 --> 00:07:08.080
beta gets replaced by
beta plus n minus k.

00:07:08.080 --> 00:07:10.340
So we see that if we
start with a prior

00:07:10.340 --> 00:07:12.890
from the family of
Beta distributions,

00:07:12.890 --> 00:07:17.720
the posterior will also
be in that same family.
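
NOTE
A minimal numerical sketch of the property just stated, not part of the lecture.
It follows the lecture's convention that alpha and beta are the exponents of theta
and 1 - theta. The function names, the grid size, and the values alpha=2, beta=3,
n=10, k=7 are all hypothetical choices made for illustration.
```python
import math
def posterior_exponents(alpha, beta, n, k):
    """Exponents of theta and (1 - theta) in the posterior, per the lecture's rule."""
    return alpha + k, beta + (n - k)
def normalized_on_grid(unnormalized, m=2000):
    """Normalize a nonnegative function on [0, 1] using a midpoint grid of m points."""
    grid = [(i + 0.5) / m for i in range(m)]
    values = [unnormalized(t) for t in grid]
    total = sum(values) / m  # midpoint-rule approximation of the integral
    return [v / total for v in values]
alpha, beta, n, k = 2.0, 3.0, 10, 7
# Route 1: Bayes rule. Prior times binomial likelihood, then normalize.
bayes = normalized_on_grid(
    lambda t: t**alpha * (1 - t)**beta * math.comb(n, k) * t**k * (1 - t)**(n - k)
)
# Route 2: the shortcut. Add k and n - k to the exponents, then normalize.
a_post, b_post = posterior_exponents(alpha, beta, n, k)
shortcut = normalized_on_grid(lambda t: t**a_post * (1 - t)**b_post)
# Largest pointwise difference between the two posteriors on the grid.
max_gap = max(abs(x - y) for x, y in zip(bayes, shortcut))
```
The two routes should agree up to floating-point rounding, since the binomial
coefficient is absorbed by the normalization.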

00:07:17.720 --> 00:07:21.120
This is a beautiful property
of Beta distributions

00:07:21.120 --> 00:07:24.410
that can be exploited
in various ways.

00:07:24.410 --> 00:07:26.890
One of these is that
it actually allows

00:07:26.890 --> 00:07:31.170
for recursive ways of updating
the posterior of Theta

00:07:31.170 --> 00:07:34.159
as we get more and
more observations.
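
NOTE
A short sketch of what recursive updating could look like, under the lecture's
convention that alpha and beta are the exponents of theta and 1 - theta (so the
uniform prior corresponds to alpha = beta = 0). The function name and the flip
sequence are hypothetical; each flip is treated as an n = 1 experiment.
```python
def update(alpha, beta, flips):
    """Fold flips (1 = heads, 0 = tails) into the exponents one at a time."""
    for flip in flips:
        if flip == 1:
            alpha += 1  # one more head: n = 1, k = 1
        else:
            beta += 1   # one more tail: n = 1, k = 0
    return alpha, beta
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # hypothetical data: 7 heads in 10 flips
# Process everything in one pass, starting from the uniform prior.
one_pass = update(0, 0, flips)
# Process the first half, then reuse that posterior as the prior for the rest.
first_half = update(0, 0, flips[:5])
two_batches = update(*first_half, flips[5:])
```
Both routes end at the same exponents, k and n - k, so only the running pair
(alpha, beta) needs to be stored between observations.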