WEBVTT

00:00:00.930 --> 00:00:03.580
In the context of the
problem of estimating

00:00:03.580 --> 00:00:09.000
the unknown bias of a coin, we
encountered this distribution,

00:00:09.000 --> 00:00:12.500
which is known as a
Beta distribution.

00:00:12.500 --> 00:00:16.750
It's a probability density
for a random variable, Theta,

00:00:16.750 --> 00:00:21.650
that takes values in the
interval from 0 to 1.

00:00:21.650 --> 00:00:24.950
So this formula is valid
for thetas in this range.

00:00:24.950 --> 00:00:28.700
And k here is a
non-negative integer.

00:00:31.980 --> 00:00:35.605
Now, this coefficient
here, d(n,k),

00:00:35.605 --> 00:00:39.740
is a normalizing constant,
which is needed so that this is

00:00:39.740 --> 00:00:43.560
a legitimate PDF, that
it integrates to 1.

00:00:43.560 --> 00:00:47.060
And so in particular,
d needs to be

00:00:47.060 --> 00:00:50.440
equal to the integral of what
we have in the numerator.

00:00:50.440 --> 00:00:53.540
This is the choice that makes
this whole expression integrate

00:00:53.540 --> 00:00:54.550
to 1.

00:00:54.550 --> 00:00:57.790
And this integral can be
calculated and turns

00:00:57.790 --> 00:01:02.110
out to be equal to
this particular expression.

00:01:02.110 --> 00:01:05.349
How do we derive
this expression?

00:01:05.349 --> 00:01:09.210
One way is to carry out a
long exercise in calculus.

00:01:09.210 --> 00:01:10.830
We have this integral here.

00:01:10.830 --> 00:01:14.500
You might either expand it
and then integrate and collect

00:01:14.500 --> 00:01:18.800
terms, or you could try to
demonstrate this equality

00:01:18.800 --> 00:01:22.310
by applying
integration by parts.

00:01:22.310 --> 00:01:23.940
But this is complicated.
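To get a feel for the expand-and-integrate route, here is a minimal sketch in Python. It assumes the expression on the slide is k! (n - k)! / (n + 1)!, which is what the alpha and beta version later in the lecture reduces to. The function names are hypothetical and chosen only for illustration. The sketch expands (1 - theta)^(n - k) with the binomial theorem, integrates term by term in exact rational arithmetic, and compares the result with the closed form for small n.

```python
from fractions import Fraction
from math import comb, factorial


def beta_integral_by_expansion(n: int, k: int) -> Fraction:
    """Exact integral over [0, 1] of theta**k * (1 - theta)**(n - k).

    Expands (1 - theta)**(n - k) with the binomial theorem and integrates
    each monomial: the integral of theta**(k + j) over [0, 1] is 1 / (k + j + 1).
    """
    m = n - k
    return sum(
        Fraction((-1) ** j * comb(m, j), k + j + 1)
        for j in range(m + 1)
    )


def closed_form(n: int, k: int) -> Fraction:
    """The claimed value k! (n - k)! / (n + 1)!."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))


# Check the equality for every 0 <= k <= n <= 6.
for n in range(7):
    for k in range(n + 1):
        assert beta_integral_by_expansion(n, k) == closed_form(n, k)
```

This only confirms the equality for the small cases it checks. It does not explain why the alternating sum collapses to a ratio of factorials.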

00:01:23.940 --> 00:01:27.020
Is there some simple way
of arguing and deriving

00:01:27.020 --> 00:01:28.200
this expression?

00:01:28.200 --> 00:01:31.370
We will see that there is a very
simple probabilistic argument

00:01:31.370 --> 00:01:33.670
for deriving this equality.

00:01:33.670 --> 00:01:36.680
What we will actually derive
is this same equality,

00:01:36.680 --> 00:01:39.190
but in slightly
different notation.

00:01:39.190 --> 00:01:41.330
Instead of k, we will use alpha.

00:01:41.330 --> 00:01:45.200
Instead of n minus
k, we will use beta.

00:01:45.200 --> 00:01:49.070
So here we have alpha
factorial times beta factorial.

00:01:49.070 --> 00:01:51.130
In the denominator,
we have the sum

00:01:51.130 --> 00:01:54.360
of these two
coefficients plus 1,

00:01:54.360 --> 00:01:59.240
so this corresponds to alpha
plus beta plus 1 factorial.

00:01:59.240 --> 00:02:02.030
This is what we
want to demonstrate.
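Written out, and assuming the integral runs over the unit interval as in the definition of the density, the equality we want to demonstrate reads as follows. The substitution on the right is the change of notation just described.

```latex
\int_0^1 \theta^{\alpha} (1-\theta)^{\beta} \, d\theta
  = \frac{\alpha! \, \beta!}{(\alpha + \beta + 1)!},
\qquad \alpha = k, \quad \beta = n - k .
```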

00:02:02.030 --> 00:02:08.350
What we will do is
consider the following setting.

00:02:08.350 --> 00:02:11.900
We start with alpha
plus beta plus 1,

00:02:11.900 --> 00:02:15.150
that many independent
random variables that

00:02:15.150 --> 00:02:17.940
are uniform on
the unit interval,

00:02:17.940 --> 00:02:21.490
and we will consider
the following event

00:02:21.490 --> 00:02:23.510
and its probability.

00:02:23.510 --> 00:02:26.630
This is the probability
that these random variables

00:02:26.630 --> 00:02:32.680
happen to be ordered in
some particular order [X1 < ... < X_alpha < Z < Y1 < ... < Y_beta].

00:02:32.680 --> 00:02:38.750
Let us call this event A, so
this is the probability of A.

00:02:38.750 --> 00:02:41.410
Now, this probability is
not hard to calculate.

00:02:41.410 --> 00:02:45.900
We have alpha plus beta plus 1
random variables-- independent,

00:02:45.900 --> 00:02:47.620
identically distributed.

00:02:47.620 --> 00:02:51.250
By symmetry, any
particular way of ordering

00:02:51.250 --> 00:02:54.590
these random variables
is equally likely.

00:02:54.590 --> 00:02:58.030
How many ways are there
to order alpha plus beta

00:02:58.030 --> 00:03:00.390
plus 1 random variables?

00:03:00.390 --> 00:03:03.440
It's the factorial of
the number of items

00:03:03.440 --> 00:03:05.090
that we're trying to order.

00:03:05.090 --> 00:03:06.570
We're talking about
the probability

00:03:06.570 --> 00:03:09.860
of a particular permutation,
so this probability

00:03:09.860 --> 00:03:15.190
is equal to 1 over the
number of permutations

00:03:15.190 --> 00:03:20.320
of alpha plus beta
plus 1 objects.

00:03:20.320 --> 00:03:23.280
So this is one expression for
the probability of this event

00:03:23.280 --> 00:03:24.280
A.
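The symmetry argument can be checked by simulation. The sketch below assumes that event A is the increasing ordering X1 < ... < X_alpha < Z < Y1 < ... < Y_beta, which is consistent with how the X's, Z, and the Y's are used in the rest of the lecture. The function name, seed, and number of trials are arbitrary choices made for illustration.

```python
import random
from math import factorial


def estimate_prob_ordered(alpha: int, beta: int, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of P(X1 < ... < X_alpha < Z < Y1 < ... < Y_beta)."""
    rng = random.Random(seed)
    n = alpha + beta + 1
    hits = 0
    for _ in range(trials):
        draws = [rng.random() for _ in range(n)]
        # draws[:alpha] are the X's, draws[alpha] is Z, the rest are the Y's.
        # Event A occurs exactly when the whole list is already increasing.
        if all(draws[i] < draws[i + 1] for i in range(n - 1)):
            hits += 1
    return hits / trials


alpha, beta = 2, 1
estimate = estimate_prob_ordered(alpha, beta, trials=200_000)
exact = 1 / factorial(alpha + beta + 1)
print(f"estimate={estimate:.4f}  exact={exact:.4f}")
```

With alpha = 2 and beta = 1 there are four random variables, so the estimate should land close to 1/24.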

00:03:24.280 --> 00:03:27.850
Now, we're going to
calculate this probability

00:03:27.850 --> 00:03:29.670
in a different way.

00:03:29.670 --> 00:03:34.829
What we will do is
apply the total probability

00:03:34.829 --> 00:03:36.010
theorem.

00:03:36.010 --> 00:03:39.250
We're going to
condition on Z. We're

00:03:39.250 --> 00:03:43.770
going to calculate the
conditional probability of A

00:03:43.770 --> 00:03:48.020
given that Z takes
a specific value,

00:03:48.020 --> 00:03:52.570
and then weight those
probabilities according

00:03:52.570 --> 00:03:59.380
to the probability density
of the random variable Z.

00:03:59.380 --> 00:04:02.300
So this is just the
total probability theorem

00:04:02.300 --> 00:04:05.300
applied to this
particular context.

00:04:05.300 --> 00:04:09.150
And now to make progress,
what we will need to do

00:04:09.150 --> 00:04:11.610
is to find this
conditional probability.

00:04:16.040 --> 00:04:22.810
We fix a constant little theta,
and we want the probability

00:04:22.810 --> 00:04:25.540
that this event happens.

00:04:25.540 --> 00:04:26.465
What is this event?

00:04:32.390 --> 00:04:42.620
It is the event that all of the
X's fall inside this interval,

00:04:42.620 --> 00:04:46.770
all the Y's fall
inside this interval,

00:04:46.770 --> 00:04:52.650
and furthermore, the X's are
sorted and the Y's are sorted.

00:04:52.650 --> 00:04:55.020
So let us write this out.

00:04:55.020 --> 00:05:01.180
It's the probability that
all of the X's happen

00:05:01.180 --> 00:05:06.990
to be less than
theta, all the Y's

00:05:06.990 --> 00:05:14.880
happen to be bigger than
theta, and also, not just that,

00:05:14.880 --> 00:05:25.610
but the X's are sorted,
and furthermore, the Y's

00:05:25.610 --> 00:05:26.990
are sorted as well.

00:05:34.790 --> 00:05:37.700
Clearly, if I give
you the value of theta

00:05:37.700 --> 00:05:41.870
so that Z is equal to theta,
for this event to happen,

00:05:41.870 --> 00:05:46.370
we must have all these
events here happen as well.

00:05:46.370 --> 00:05:51.180
So now, let us try to calculate
the probability of this event.

00:05:51.180 --> 00:05:53.090
We're going to use the
multiplication rule.

00:05:53.090 --> 00:05:55.890
First, take the
probability of this event

00:05:55.890 --> 00:05:58.485
and then the conditional
probability of that event.

00:06:01.460 --> 00:06:04.150
The X's and the Y's
are independent,

00:06:04.150 --> 00:06:08.215
so we can take the probability
of this event and then multiply

00:06:08.215 --> 00:06:11.480
with the probability of this
event involving the Y's.

00:06:11.480 --> 00:06:13.440
How about the probability
of this event,

00:06:13.440 --> 00:06:16.025
that all of the X's
are less than theta?

00:06:16.025 --> 00:06:18.990
Since the X's are
independent, this

00:06:18.990 --> 00:06:21.400
is going to be equal
to the probability

00:06:21.400 --> 00:06:24.870
that X1 is less than theta.

00:06:24.870 --> 00:06:26.440
What is this probability?

00:06:26.440 --> 00:06:30.040
Since X1 is uniform on the unit
interval and this is theta,

00:06:30.040 --> 00:06:32.040
the probability of
falling in this interval

00:06:32.040 --> 00:06:34.430
is equal to theta.

00:06:34.430 --> 00:06:39.070
Times the probability that
X2 is less than theta.

00:06:39.070 --> 00:06:42.670
This probability is,
again, theta and so on.

00:06:42.670 --> 00:06:47.310
We have alpha many
terms of that kind,

00:06:47.310 --> 00:06:50.510
so this probability that all
of these random variables

00:06:50.510 --> 00:06:55.100
are less than theta is equal to
theta to the power of alpha.

00:06:55.100 --> 00:06:57.380
Similarly for the Y's.

00:06:57.380 --> 00:07:00.300
For any particular
Y, the probability

00:07:00.300 --> 00:07:02.970
that it falls in
this interval is

00:07:02.970 --> 00:07:05.540
equal to the length of
this interval, which

00:07:05.540 --> 00:07:08.220
is 1 minus theta.

00:07:08.220 --> 00:07:11.830
This is the probability
for each one of the Y's.

00:07:11.830 --> 00:07:15.750
There are beta many Y's.

00:07:15.750 --> 00:07:17.370
The Y's are independent.

00:07:17.370 --> 00:07:20.080
So the probability that all
of them fall in this interval

00:07:20.080 --> 00:07:24.770
is going to be this number
to the power of beta.

00:07:28.620 --> 00:07:32.690
So suppose that I told
you that all the X's are

00:07:32.690 --> 00:07:37.330
less than theta,
and then I ask you,

00:07:37.330 --> 00:07:41.650
given this information,
what is the probability

00:07:41.650 --> 00:07:45.060
that the X's that
you got are arranged

00:07:45.060 --> 00:07:46.360
in this particular order?

00:07:51.280 --> 00:07:55.630
Now, because of the complete
symmetry of the problem,

00:07:55.630 --> 00:07:59.965
even if I told you that all the
X's fall inside this interval,

00:07:59.965 --> 00:08:04.690
any order of the X's
is equally likely.

00:08:04.690 --> 00:08:07.860
So the probability of
this particular order

00:08:07.860 --> 00:08:12.880
is going to be 1 over the
number of possible orderings.

00:08:12.880 --> 00:08:17.140
How many ways are there that
alpha items can be ordered?

00:08:17.140 --> 00:08:21.240
There are alpha factorial
possible orderings,

00:08:21.240 --> 00:08:25.410
so the probability that I
obtain one particular ordering

00:08:25.410 --> 00:08:29.010
is 1 over alpha factorial.

00:08:29.010 --> 00:08:31.850
And similarly, if I tell
you that the Y's all

00:08:31.850 --> 00:08:34.570
fell in this
interval, by symmetry,

00:08:34.570 --> 00:08:36.620
the probability of
a particular order

00:08:36.620 --> 00:08:40.830
is going to be 1 over
the [number of possible]

00:08:40.830 --> 00:08:43.465
orders, which is beta factorial.
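Collecting the four factors, the conditional probability can be written as below. The labels under the braces are our own annotations, added for illustration, and the notation assumes alpha many X's and beta many Y's as described above.

```latex
\mathbf{P}(A \mid Z = \theta)
  = \underbrace{\theta^{\alpha}}_{\text{all } X_i < \theta}
    \cdot \underbrace{(1-\theta)^{\beta}}_{\text{all } Y_j > \theta}
    \cdot \underbrace{\frac{1}{\alpha!}}_{X\text{'s sorted}}
    \cdot \underbrace{\frac{1}{\beta!}}_{Y\text{'s sorted}}
```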

00:08:46.240 --> 00:08:46.930
All right.

00:08:46.930 --> 00:08:49.990
So we have this
conditional probability,

00:08:49.990 --> 00:08:54.830
and now we can go back to
this formula and substitute,

00:08:54.830 --> 00:09:00.290
and what we obtain is the
integral of this expression,

00:09:00.290 --> 00:09:05.660
theta to the alpha, 1 minus
theta [to the] beta, 1

00:09:05.660 --> 00:09:11.840
over alpha factorial times
1 over beta factorial.

00:09:11.840 --> 00:09:15.520
Then we have the density of
Z, but since Z is uniform,

00:09:15.520 --> 00:09:18.010
the density is equal to 1.

00:09:18.010 --> 00:09:21.720
And then we have a
factor of d theta.

00:09:21.720 --> 00:09:23.730
So what have we achieved?

00:09:23.730 --> 00:09:26.855
We calculated the
probability of the event A

00:09:26.855 --> 00:09:31.520
in two different ways, and we
got two seemingly different

00:09:31.520 --> 00:09:32.490
answers.

00:09:32.490 --> 00:09:35.420
But these two answers
have to agree.

00:09:35.420 --> 00:09:39.450
Therefore, this expression
is equal to that expression.

00:09:39.450 --> 00:09:43.930
And now if you take this factor,
1 over alpha factorial times 1

00:09:43.930 --> 00:09:47.170
over beta factorial, and
send it to the other side

00:09:47.170 --> 00:09:50.920
of the equation, what we
obtain is exactly the formula

00:09:50.920 --> 00:09:53.420
that we wished to derive.
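As a numerical sanity check, the two expressions for the probability of A can be compared directly. This is a minimal sketch, assuming a simple midpoint rule is accurate enough for these smooth polynomial integrands. The function names and the step count are hypothetical choices, not part of the lecture.

```python
from math import factorial


def prob_a_by_conditioning(alpha: int, beta: int, steps: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral over [0, 1] of
    theta**alpha * (1 - theta)**beta / (alpha! * beta!), using f_Z = 1."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        theta = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += theta**alpha * (1.0 - theta) ** beta
    return total * h / (factorial(alpha) * factorial(beta))


def prob_a_by_symmetry(alpha: int, beta: int) -> float:
    """1 over the number of orderings of alpha + beta + 1 items."""
    return 1.0 / factorial(alpha + beta + 1)


for alpha, beta in [(0, 0), (1, 2), (2, 3), (4, 1)]:
    lhs = prob_a_by_conditioning(alpha, beta)
    rhs = prob_a_by_symmetry(alpha, beta)
    print(f"alpha={alpha} beta={beta} conditioning={lhs:.8f} symmetry={rhs:.8f}")
```

The two columns should agree to many decimal places, which is the numerical counterpart of saying that the two answers have to agree.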

00:09:53.420 --> 00:09:56.530
This example is an
instance of the following.

00:09:56.530 --> 00:09:58.840
There are certain
formulas in mathematics

00:09:58.840 --> 00:10:01.870
that are somewhat
complicated to derive,

00:10:01.870 --> 00:10:04.270
and their derivations
using, for example,

00:10:04.270 --> 00:10:06.570
calculus are quite unintuitive.

00:10:06.570 --> 00:10:10.090
But once you interpret
the various terms that

00:10:10.090 --> 00:10:14.280
appear in such a relation
in a probabilistic way,

00:10:14.280 --> 00:10:19.660
you can sometimes find very easy
derivations and explanations

00:10:19.660 --> 00:10:22.978
of why such a formula
has to be true.