WEBVTT

00:00:00.550 --> 00:00:04.200
We will now study a problem
which is quite difficult to

00:00:04.200 --> 00:00:08.800
approach in a direct brute
force manner but becomes

00:00:08.800 --> 00:00:14.520
tractable once we break it down
into simpler pieces using

00:00:14.520 --> 00:00:17.410
several of the tricks that
we have learned so far.

00:00:17.410 --> 00:00:21.030
And this problem will also
be a good opportunity for

00:00:21.030 --> 00:00:23.840
reviewing some of the tricks
and techniques

00:00:23.840 --> 00:00:25.690
that we have developed.

00:00:25.690 --> 00:00:27.420
The problem is the following.

00:00:27.420 --> 00:00:28.790
There are n people.

00:00:28.790 --> 00:00:33.280
And let's say for the purpose of
illustration that we have 3

00:00:33.280 --> 00:00:36.825
people, persons 1, 2, and 3.

00:00:40.790 --> 00:00:43.280
And each person has a hat.

00:00:43.280 --> 00:00:46.780
They throw their hats
inside a box.

00:00:46.780 --> 00:00:51.770
And then each person picks a hat
at random out of that box.

00:00:51.770 --> 00:00:53.595
So here are the three parts.

00:01:00.610 --> 00:01:04.849
And one possible outcome of this
experiment is that person

00:01:04.849 --> 00:01:09.020
1 ends up with hat number 2,
person 2 ends up with hat

00:01:09.020 --> 00:01:14.120
number 1, person 3 ends
up with hat number 3.

00:01:14.120 --> 00:01:19.090
We could indicate the hats that
each person got by noting

00:01:19.090 --> 00:01:21.940
here the numbers associated
with each

00:01:21.940 --> 00:01:24.760
person, the hat numbers.

00:01:24.760 --> 00:01:29.470
And notice that this sequence
of numbers, which is a

00:01:29.470 --> 00:01:32.570
description of the outcome of
the experiment, is just a

00:01:32.570 --> 00:01:36.660
permutation of the numbers
1, 2, 3 of the hats.

00:01:36.660 --> 00:01:41.020
So we permute the hat numbers so
that we can place them next

00:01:41.020 --> 00:01:44.640
to the person that got
each one of the hats.

00:01:44.640 --> 00:01:49.020
In particular, we have n
factorial possible outcomes.

00:01:49.020 --> 00:01:52.710
This is the number of possible
permutations.

00:01:52.710 --> 00:01:56.340
What does it mean to pick
hats at random?

00:01:56.340 --> 00:01:58.680
One interpretation
is that every

00:01:58.680 --> 00:02:01.370
permutation is equally likely.

00:02:01.370 --> 00:02:05.660
And since we have n factorial
permutations, each permutation

00:02:05.660 --> 00:02:10.169
would have a probability
of 1 over n factorial.

00:02:10.169 --> 00:02:15.440
But there's another way of
describing our model, which is

00:02:15.440 --> 00:02:17.620
the following.

00:02:17.620 --> 00:02:22.650
Person 1 gets a hat at random
out of the three available.

00:02:22.650 --> 00:02:26.420
Then person 2 gets a hat
at random out of

00:02:26.420 --> 00:02:28.750
the remaining hats.

00:02:28.750 --> 00:02:31.860
Then person 3 gets the
remaining hat.

00:02:31.860 --> 00:02:34.600
Each time that there is a
choice, each one of the

00:02:34.600 --> 00:02:37.600
available hats is equally
likely to be picked

00:02:37.600 --> 00:02:39.700
as any other hat.

00:02:39.700 --> 00:02:42.310
Let us calculate the
probability, let's say, that

00:02:42.310 --> 00:02:45.710
this particular permutation
gets materialized.

00:02:45.710 --> 00:02:52.350
The probability that person 1
gets hat number 2 is 1/3.

00:02:52.350 --> 00:02:53.950
Then we're left with two hats.

00:02:57.350 --> 00:02:59.730
Person 2 has 2 hats
to choose from.

00:02:59.730 --> 00:03:02.680
The probability that it picks
this particular hat

00:03:02.680 --> 00:03:05.440
is going to be 1/2.

00:03:05.440 --> 00:03:10.330
And finally, person 3 has only 1
hat available, so it will be

00:03:10.330 --> 00:03:12.500
picked with probability 1.

00:03:12.500 --> 00:03:16.243
So the probability of this
particular permutation is one

00:03:16.243 --> 00:03:18.880
over 3 factorial.

00:03:18.880 --> 00:03:21.900
But you can repeat this argument
and consider any

00:03:21.900 --> 00:03:24.960
other permutation, and you will
always be getting the

00:03:24.960 --> 00:03:26.260
same answer.

00:03:26.260 --> 00:03:29.346
Any particular permutation has
the same probability, one over

00:03:29.346 --> 00:03:31.790
3 factorial.

00:03:31.790 --> 00:03:35.570
The same argument goes through
for the case of general n, n

00:03:35.570 --> 00:03:37.430
people and n hats.

00:03:37.430 --> 00:03:40.740
And we will find that any
permutation will have the same

00:03:40.740 --> 00:03:43.990
probability, 1/n factorial.

00:03:43.990 --> 00:03:48.350
Therefore, the process of
picking one hat at a time is

00:03:48.350 --> 00:03:52.660
probabilistically identical to
a model in which we simply

00:03:52.660 --> 00:03:55.435
state that all permutations
are equally likely.

00:03:58.040 --> 00:04:02.370
Now that we have described our
model and our process and the

00:04:02.370 --> 00:04:05.940
associated probabilities, let
us consider the question we

00:04:05.940 --> 00:04:07.680
want to answer.

00:04:07.680 --> 00:04:13.590
Let X be the number of people
who get their own hat back.

00:04:13.590 --> 00:04:18.380
For example, for the outcome
that we have drawn here, the

00:04:18.380 --> 00:04:22.110
only person who gets their
own hat back is person 3.

00:04:22.110 --> 00:04:26.720
And so in this case X happens
to take the value of 1.

00:04:26.720 --> 00:04:29.950
What we want to do is to
calculate the expected value

00:04:29.950 --> 00:04:35.370
of the random variable X. The
problem is difficult because

00:04:35.370 --> 00:04:41.040
if you try to calculate the PMF
of the random variable X

00:04:41.040 --> 00:04:45.520
and then use the definition of
the expectation to calculate

00:04:45.520 --> 00:04:51.260
this sum, you will run into
big difficulties.

00:04:51.260 --> 00:04:56.100
Calculating this quantity, the
PMF of X, is difficult.

00:04:56.100 --> 00:05:00.300
And it is difficult because
there is no simple expression

00:05:00.300 --> 00:05:02.230
that describes it.

00:05:02.230 --> 00:05:05.460
So we need to do something more
intelligent, find some

00:05:05.460 --> 00:05:08.180
other way of approaching
the problem.

00:05:08.180 --> 00:05:13.480
The trick that we will use is to
employ indicator variables.

00:05:13.480 --> 00:05:18.590
Let Xi be equal to one 1 if
person i selects their own hat

00:05:18.590 --> 00:05:20.680
and 0 otherwise.

00:05:20.680 --> 00:05:25.610
So then, each one of the Xi's
is 1 whenever a person has

00:05:25.610 --> 00:05:27.320
selected their own hat.

00:05:27.320 --> 00:05:32.659
And by adding all the 1's that
we may get, we obtain the

00:05:32.659 --> 00:05:37.580
total number of people who have
selected their own hats.

00:05:37.580 --> 00:05:40.390
This makes things easier,
because now to calculate the

00:05:40.390 --> 00:05:43.820
expected value of X it's
sufficient to calculate the

00:05:43.820 --> 00:05:47.440
expected value of each one of
those terms and add the

00:05:47.440 --> 00:05:50.450
expected values, which
we're allowed to

00:05:50.450 --> 00:05:53.260
do because of linearity.

00:05:53.260 --> 00:05:57.370
So let's look at the
typical term here.

00:05:57.370 --> 00:06:00.410
What is the expected
value of Xi?

00:06:00.410 --> 00:06:04.000
If you consider the first
description or our model, all

00:06:04.000 --> 00:06:07.300
permutations are equally likely,
this description is

00:06:07.300 --> 00:06:10.410
symmetric with respect to
all of the persons.

00:06:10.410 --> 00:06:14.160
So the expected value of Xi
should be the same as the

00:06:14.160 --> 00:06:16.040
expected value of X1.

00:06:18.910 --> 00:06:24.300
Now, to calculate the expected
value of X1, we will consider

00:06:24.300 --> 00:06:27.740
the sequential description of
the process in which 1 is the

00:06:27.740 --> 00:06:30.780
first person to pick a hat.

00:06:30.780 --> 00:06:34.190
Now, since X1 is a Bernoulli
random variable that takes

00:06:34.190 --> 00:06:38.490
values 0 or 1, the expected
value of X1 is just the

00:06:38.490 --> 00:06:42.680
probability that X1
is equal to 1.

00:06:42.680 --> 00:06:46.130
And if person 1 is the first
one to choose a hat, that

00:06:46.130 --> 00:06:53.540
person has probability 1/n of
obtaining the correct hat.

00:06:53.540 --> 00:06:56.560
So each one of these random
variables has an expected

00:06:56.560 --> 00:06:58.065
value of 1/n.

00:07:01.050 --> 00:07:05.400
The expected value of X by
linearity is going to be the

00:07:05.400 --> 00:07:07.465
sum of the expected values.

00:07:11.660 --> 00:07:14.950
There is n of them.

00:07:14.950 --> 00:07:17.010
Each expected value is 1/n.

00:07:20.160 --> 00:07:24.000
And so the final answer is 1.

00:07:24.000 --> 00:07:32.060
This is the expected value
of the random variable X.

00:07:32.060 --> 00:07:36.140
Let us now move and try to
calculate a more difficult

00:07:36.140 --> 00:07:44.060
quantity, namely, the variance
of X. How shall we proceed?

00:07:44.060 --> 00:07:50.300
Things would be easiest if the
random variables Xi were

00:07:50.300 --> 00:07:51.370
independent.

00:07:51.370 --> 00:07:55.630
Because in that case, the
variance of X would be the sum

00:07:55.630 --> 00:07:58.450
of the variances of the Xi's.

00:07:58.450 --> 00:08:01.960
But are the Xi's independent?

00:08:01.960 --> 00:08:04.220
Let us consider a
special case.

00:08:04.220 --> 00:08:09.300
Suppose that we only have two
persons and that I tell you

00:08:09.300 --> 00:08:13.660
that the first person got
their own hat back.

00:08:13.660 --> 00:08:17.980
In that case, the second person
must have also gotten

00:08:17.980 --> 00:08:20.580
their own hat back.

00:08:20.580 --> 00:08:23.840
If, on the other hand, person 1
did not to get their own hat

00:08:23.840 --> 00:08:30.160
back, then person 2 will not get
their own hat back either.

00:08:30.160 --> 00:08:33.808
Because in this scenario, person
1 gets hat 2, and that

00:08:33.808 --> 00:08:36.370
means that person
2 gets hat 1.

00:08:36.370 --> 00:08:39.419
So we see that knowing the value
of the random variable

00:08:39.419 --> 00:08:43.240
X1 tells us a lot about
the value of the

00:08:43.240 --> 00:08:45.030
random variable X2.

00:08:45.030 --> 00:08:46.650
And that means that the
random variables

00:08:46.650 --> 00:08:49.740
X1 and X2 are dependent.

00:08:49.740 --> 00:08:53.140
More generally, if I were to
tell you that the first n

00:08:53.140 --> 00:08:58.680
minus 1 people got their own
hats back, then the last

00:08:58.680 --> 00:09:02.860
remaining person will have
his or her own hat

00:09:02.860 --> 00:09:04.140
available to be picked.

00:09:04.140 --> 00:09:06.770
That's going to be the
only available hat.

00:09:06.770 --> 00:09:10.020
And then person n we also
get their hat back.

00:09:10.020 --> 00:09:13.730
So we see that the information
about some of the Xi's gives

00:09:13.730 --> 00:09:18.180
us information about
the remaining Xn.

00:09:18.180 --> 00:09:19.920
And again, this means
that the random

00:09:19.920 --> 00:09:23.140
variables are dependent.

00:09:23.140 --> 00:09:25.870
Since we do not have
independence, we cannot find

00:09:25.870 --> 00:09:28.760
the variance by just adding the
variances of the different

00:09:28.760 --> 00:09:30.120
random variables.

00:09:30.120 --> 00:09:35.700
But we need to do a lot more
work in that direction.

00:09:35.700 --> 00:09:39.130
In general, whenever we need to
calculate variances, it is

00:09:39.130 --> 00:09:43.220
usually simpler to carry out
the calculation using this

00:09:43.220 --> 00:09:46.710
alternative form for
the variance.

00:09:46.710 --> 00:09:51.440
So let us start towards a
calculation of the expected

00:09:51.440 --> 00:09:54.720
value of X squared.

00:09:54.720 --> 00:09:59.990
Now the random variable X
squared, by simple algebra, is

00:09:59.990 --> 00:10:02.390
this expression times itself.

00:10:02.390 --> 00:10:05.380
And by expanding the
product we get all

00:10:05.380 --> 00:10:07.550
sorts of cross terms.

00:10:07.550 --> 00:10:11.900
Some of these cross terms will
be of the type X1 times Xi or

00:10:11.900 --> 00:10:14.050
X2 times X2.

00:10:14.050 --> 00:10:19.300
These will be terms of this
form, and there is n of them.

00:10:19.300 --> 00:10:23.770
And then we get cross terms,
such as X1 times X2, X1 times

00:10:23.770 --> 00:10:27.670
X3, X2 times X1, and so on.

00:10:27.670 --> 00:10:30.880
How many terms do
we have here?

00:10:30.880 --> 00:10:34.100
Well, if we have n terms
multiplying n other terms we

00:10:34.100 --> 00:10:37.330
have a total of n
squared terms.

00:10:37.330 --> 00:10:41.450
n are already here, so the
remaining terms, which are the

00:10:41.450 --> 00:10:46.050
cross terms, will be
n squared minus n.

00:10:46.050 --> 00:10:53.370
Or, in a simpler form, it's
n times n minus 1.

00:10:53.370 --> 00:10:57.280
So now how are we going to
calculate the expected value

00:10:57.280 --> 00:10:58.490
of X squared?

00:10:58.490 --> 00:11:01.310
Well, we will use linearity
of expectations.

00:11:01.310 --> 00:11:04.890
So we need to calculate the
expected value of Xi squared,

00:11:04.890 --> 00:11:09.080
and we also need to calculate
the expected value of Xi Xj

00:11:09.080 --> 00:11:12.846
when i is different from j.

00:11:12.846 --> 00:11:14.820
Let us start with Xi squared.

00:11:17.800 --> 00:11:22.600
First, if we use the symmetric
description of our model, all

00:11:22.600 --> 00:11:26.920
permutations are equally likely,
then all persons play

00:11:26.920 --> 00:11:27.440
the same role.

00:11:27.440 --> 00:11:29.130
There's symmetry
in the problem.

00:11:29.130 --> 00:11:34.450
So Xi squared has the same
distribution as X1 squared.

00:11:38.270 --> 00:11:42.340
Then, X1 is a 0-1 random
variable, a

00:11:42.340 --> 00:11:43.980
Bernoulli random variable.

00:11:43.980 --> 00:11:48.310
So X1 squared will always take
the same numerical value as

00:11:48.310 --> 00:11:49.730
the random variable X1.

00:11:52.400 --> 00:11:56.300
This is a very special case
which happens only because a

00:11:56.300 --> 00:11:59.140
random variable takes
values in {0, 1}.

00:11:59.140 --> 00:12:01.550
And 0 squared is
the same as 0.

00:12:01.550 --> 00:12:04.575
1 squared is the same as 1.

00:12:04.575 --> 00:12:07.660
This expected value is something
that we have already

00:12:07.660 --> 00:12:09.610
calculated, and it is 1/n.

00:12:13.410 --> 00:12:16.270
Let us now move to the
calculation of the expectation

00:12:16.270 --> 00:12:20.060
of a typical term
inside the sum.

00:12:20.060 --> 00:12:22.320
So let i be different than
j, and look at the

00:12:22.320 --> 00:12:25.280
expected value of Xi Xj.

00:12:25.280 --> 00:12:28.550
Once more, because of the
symmetry of the probabilistic

00:12:28.550 --> 00:12:33.160
model, it doesn't matter which
i and j we are considering.

00:12:33.160 --> 00:12:37.585
So we might as well consider
the product of X1 with X2.

00:12:40.770 --> 00:12:44.760
Now, X1 and X2 take
values 0 and 1.

00:12:44.760 --> 00:12:48.620
And the product of the two also
takes values 0 and 1.

00:12:48.620 --> 00:12:51.440
So this is a Bernoulli random
variable, and so the

00:12:51.440 --> 00:12:54.340
expectation of that random
variable is just the

00:12:54.340 --> 00:12:58.535
probability that this random
variable is equal to 1.

00:13:01.060 --> 00:13:05.040
But for the product to be equal
to 1, the only way that

00:13:05.040 --> 00:13:09.940
this can happen is if both of
these random variables happen

00:13:09.940 --> 00:13:11.590
to be equal to 1.

00:13:14.890 --> 00:13:18.430
Let us now turn to
the sequential

00:13:18.430 --> 00:13:20.370
description of the model.

00:13:20.370 --> 00:13:24.350
The probability that the first
person gets their own hat back

00:13:24.350 --> 00:13:27.400
and the second person gets
their own hat back is the

00:13:27.400 --> 00:13:32.270
probability that the first one
gets their own hat back, and

00:13:32.270 --> 00:13:36.020
then multiplied by the
conditional probability that

00:13:36.020 --> 00:13:40.380
the second person gets their own
hat back, given that the

00:13:40.380 --> 00:13:43.540
first person got their
own hat back.

00:13:43.540 --> 00:13:44.650
What are these probabilities?

00:13:44.650 --> 00:13:46.150
The probability that
a person gets their

00:13:46.150 --> 00:13:47.630
own hat back is 1/n.

00:13:50.720 --> 00:13:55.010
Given that person 1 got their
own hat back, person 2 is

00:13:55.010 --> 00:13:57.450
faced with a situation
where there are n

00:13:57.450 --> 00:13:59.800
minus 1 available hats.

00:13:59.800 --> 00:14:02.540
And one of those is
that person's hat.

00:14:02.540 --> 00:14:07.400
So the probability that person
2 will also pick his or her

00:14:07.400 --> 00:14:11.350
own hat is 1 over n minus 1.

00:14:14.920 --> 00:14:19.610
Now we are in a position to
calculate the expected value

00:14:19.610 --> 00:14:23.360
of X squared.

00:14:23.360 --> 00:14:31.180
The expected value of X squared
consists of the sum of

00:14:31.180 --> 00:14:42.110
n expected values, each one of
which is equal to 1/n plus so

00:14:42.110 --> 00:14:48.400
many expected values, because
we have so many terms, each

00:14:48.400 --> 00:14:54.490
one of which, by this
calculation, is 1/n times 1

00:14:54.490 --> 00:14:57.970
over n minus 1.

00:14:57.970 --> 00:15:00.780
And we see that we get
cancellations here.

00:15:00.780 --> 00:15:04.390
And we obtain 1 plus 1,
which is equal to 2.

00:15:09.450 --> 00:15:13.840
On the other hand we have this
term that we need to subtract.

00:15:13.840 --> 00:15:16.370
We found previously that
the expected value of

00:15:16.370 --> 00:15:17.860
X is equal to 1.

00:15:17.860 --> 00:15:19.950
So we need to subtract 1.

00:15:19.950 --> 00:15:23.800
And the final answer to our
problem is that the variance

00:15:23.800 --> 00:15:26.520
of X is also equal to 1.

00:15:32.200 --> 00:15:36.780
So what we saw in this problem
is that we can deal with quite

00:15:36.780 --> 00:15:41.810
complicated models, but by
breaking them down into more

00:15:41.810 --> 00:15:47.600
manageable pieces, first break
down the random variable X as

00:15:47.600 --> 00:15:51.270
a sum of different random
variables, then taking the

00:15:51.270 --> 00:15:54.680
square of this and break it
down into a number of

00:15:54.680 --> 00:15:58.170
different terms, and then by
considering one term at a

00:15:58.170 --> 00:16:04.550
time, we can often end up with
the solutions or the answers

00:16:04.550 --> 00:16:06.110
to problems that
would have been

00:16:06.110 --> 00:16:07.530
otherwise quite difficult.