WEBVTT
00:00:00.040 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.870
Commons license.
00:00:03.870 --> 00:00:06.910
Your support will help MIT
OpenCourseWare continue to
00:00:06.910 --> 00:00:10.560
offer high-quality educational
resources for free.
00:00:10.560 --> 00:00:13.460
To make a donation or view
additional materials from
00:00:13.460 --> 00:00:19.290
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.290 --> 00:00:20.540
ocw.mit.edu.
00:00:22.420 --> 00:00:25.310
PROFESSOR: OK, good morning.
00:00:25.310 --> 00:00:30.930
So today, we're going to have
a fairly packed lecture.
00:00:30.930 --> 00:00:34.060
We are going to conclude
with chapter two,
00:00:34.060 --> 00:00:35.560
discrete random variables.
00:00:35.560 --> 00:00:37.140
And we will be talking
mostly about
00:00:37.140 --> 00:00:39.322
multiple random variables.
00:00:39.322 --> 00:00:43.060
And this is also the
last lecture as far
00:00:43.060 --> 00:00:44.720
as quiz one is concerned.
00:00:44.720 --> 00:00:48.350
So it's going to cover the
material until today, and of
00:00:48.350 --> 00:00:52.550
course the next recitation
and tutorial as well.
00:00:52.550 --> 00:00:57.170
OK, so we're going to review
quickly what we introduced at
00:00:57.170 --> 00:01:01.040
the end of last lecture, where
we talked about the joint PMF
00:01:01.040 --> 00:01:02.300
of two random variables.
00:01:02.300 --> 00:01:05.040
We're going to talk about the
case of more than two random
00:01:05.040 --> 00:01:07.440
variables as well.
00:01:07.440 --> 00:01:09.910
We're going to talk about
the familiar concepts of
00:01:09.910 --> 00:01:14.300
conditioning and independence,
but applied to random
00:01:14.300 --> 00:01:16.460
variables instead of events.
00:01:16.460 --> 00:01:19.320
We're going to look at the
expectations once more, talk
00:01:19.320 --> 00:01:22.720
about a few properties that they
have, and then solve a
00:01:22.720 --> 00:01:25.900
couple of problems and calculate
a few things in
00:01:25.900 --> 00:01:28.180
somewhat clever ways.
00:01:28.180 --> 00:01:31.790
So the first point I want to
make is that, to a large
00:01:31.790 --> 00:01:34.870
extent, whatever is happening
in our chapter on discrete
00:01:34.870 --> 00:01:39.160
random variables is just an
exercise in notation.
00:01:39.160 --> 00:01:42.850
There is stuff and concepts that
you are already familiar
00:01:42.850 --> 00:01:45.230
with-- probabilities,
probabilities of two things
00:01:45.230 --> 00:01:47.490
happening, conditional
probabilities.
00:01:47.490 --> 00:01:51.760
And all that we're doing, to
some extent, is rewriting
00:01:51.760 --> 00:01:54.840
those familiar concepts
in new notation.
00:01:54.840 --> 00:01:57.810
So for example, this
is the joint PMF
00:01:57.810 --> 00:01:59.020
of two random variables.
00:01:59.020 --> 00:02:02.080
It gives us, for any pair of
possible values of those
00:02:02.080 --> 00:02:05.510
random variables, the
probability that that pair
00:02:05.510 --> 00:02:07.270
occurs simultaneously.
00:02:07.270 --> 00:02:10.020
So it's the probability that
simultaneously x takes that
00:02:10.020 --> 00:02:13.580
value, and y takes
that other value.
00:02:13.580 --> 00:02:17.210
And similarly, we have the
notion of the conditional PMF,
00:02:17.210 --> 00:02:21.060
which is just a list
of the various
00:02:21.060 --> 00:02:23.750
conditional probabilities
of interest, conditional
00:02:23.750 --> 00:02:26.450
probability that one random
variable takes this value
00:02:26.450 --> 00:02:30.320
given that the other random
variable takes that value.
00:02:30.320 --> 00:02:33.640
Now, a remark about conditional
probabilities.
00:02:33.640 --> 00:02:36.640
Conditional probabilities
generally are like ordinary
00:02:36.640 --> 00:02:37.370
probabilities.
00:02:37.370 --> 00:02:40.170
You condition on something
particular.
00:02:40.170 --> 00:02:43.230
So here we condition
on a particular y.
00:02:43.230 --> 00:02:46.580
So think of little y as
a fixed quantity.
00:02:46.580 --> 00:02:49.800
And then look at this
as a function of x.
00:02:49.800 --> 00:02:54.430
So given that y, which we
condition on, given our new
00:02:54.430 --> 00:02:58.990
universe, we're considering the
various possibilities for
00:02:58.990 --> 00:03:01.290
x and the probabilities
that they have.
00:03:01.290 --> 00:03:04.000
Now, the probabilities over
all x's, of course,
00:03:04.000 --> 00:03:05.830
need to add to 1.
00:03:05.830 --> 00:03:11.530
So we should have a relation
of this kind.
00:03:11.530 --> 00:03:14.420
So they're just like ordinary
probabilities over the
00:03:14.420 --> 00:03:18.230
different x's in a universe
where we are told the value of
00:03:18.230 --> 00:03:20.940
the random variable y.
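The relation pointed to on the slide is not captured in the transcript. In the lecture's PMF notation it would presumably read as follows, for any fixed y with p_Y(y) > 0 (the subscript notation is an assumption based on the standard convention):

```latex
\sum_{x} p_{X \mid Y}(x \mid y) = 1
```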
00:03:20.940 --> 00:03:22.335
Now, how are these related?
00:03:25.200 --> 00:03:28.190
So we call these the marginal,
these the joint, these the
00:03:28.190 --> 00:03:29.150
conditional.
00:03:29.150 --> 00:03:31.510
And there are some relations
between these.
00:03:31.510 --> 00:03:35.430
For example, to find the
marginal from the joint, it's
00:03:35.430 --> 00:03:37.730
pretty straightforward.
00:03:37.730 --> 00:03:41.680
The probability that x takes a
particular value is the sum of
00:03:41.680 --> 00:03:45.030
the probabilities of all of the
different ways that this
00:03:45.030 --> 00:03:47.190
particular value may occur.
00:03:47.190 --> 00:03:48.380
What are the different ways?
00:03:48.380 --> 00:03:51.910
Well, it may occur together with
a certain y, or together
00:03:51.910 --> 00:03:55.110
with some other y, or together
with some other y.
00:03:55.110 --> 00:03:58.030
So you look at all the possible
y's that can go
00:03:58.030 --> 00:04:01.750
together with this x, and add
the probabilities of all of
00:04:01.750 --> 00:04:07.220
those pairs for which we get
this particular value of x.
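The summation just described can be sketched in a few lines of Python. The joint PMF below is a made-up illustration, not the table from the lecture, and the name `marginal_x` is hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF p_{X,Y}(x, y); the numbers are illustrative only
# and are not the table shown in the lecture.
joint = {
    (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
    (2, 1): Fraction(3, 10), (2, 2): Fraction(4, 10),
}

def marginal_x(joint_pmf):
    """Return p_X(x), the sum over all y of p_{X,Y}(x, y)."""
    p_x = defaultdict(Fraction)  # missing keys start at Fraction(0)
    for (x, _y), p in joint_pmf.items():
        p_x[x] += p
    return dict(p_x)

p_x = marginal_x(joint)
print(p_x)  # p_X(1) = 3/10, p_X(2) = 7/10
```

Exact fractions are used here so that the marginal sums to exactly 1, with no floating-point rounding.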
00:04:07.220 --> 00:04:13.120
And then there's a relation
that connects these
00:04:13.120 --> 00:04:16.230
two probabilities with the
conditional probability.
00:04:16.230 --> 00:04:18.630
And it's this relation.
00:04:18.630 --> 00:04:20.279
It's nothing new.
00:04:20.279 --> 00:04:25.160
It's just new notation for
writing what we already know,
00:04:25.160 --> 00:04:28.130
that the probability of two
things happening is the
00:04:28.130 --> 00:04:31.460
probability that the first thing
happens, and then given
00:04:31.460 --> 00:04:34.210
that the first thing happens,
the probability that the
00:04:34.210 --> 00:04:36.140
second one happened.
00:04:36.140 --> 00:04:39.050
So how do we go from
one to the other?
00:04:39.050 --> 00:04:42.960
Think of A as being the event
that X takes the value, little
00:04:42.960 --> 00:04:49.120
x, and B being the event that
Y takes the value, little y.
00:04:49.120 --> 00:04:52.230
So the joint probability is the
probability that these two
00:04:52.230 --> 00:04:54.220
things happen simultaneously.
00:04:54.220 --> 00:04:58.140
It's the probability that X
takes this value times the
00:04:58.140 --> 00:05:03.280
conditional probability that Y
takes this value, given that X
00:05:03.280 --> 00:05:04.670
took that first value.
00:05:04.670 --> 00:05:08.470
So it's the familiar
multiplication rule, but just
00:05:08.470 --> 00:05:11.030
transcribed in our
new notation.
00:05:11.030 --> 00:05:13.690
So nothing new so far.
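Written out, the correspondence between the event form and the PMF form of the multiplication rule would look roughly like this, taking A = {X = x} and B = {Y = y} as in the lecture (the subscript notation is assumed, since the slide itself is not in the transcript):

```latex
\mathbf{P}(A \cap B) = \mathbf{P}(A)\,\mathbf{P}(B \mid A)
\quad\Longleftrightarrow\quad
p_{X,Y}(x, y) = p_X(x)\, p_{Y \mid X}(y \mid x)
```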
00:05:13.690 --> 00:05:17.480
OK, why did we go through this
exercise and this notation?
00:05:17.480 --> 00:05:19.980
It's because in the experiments
that we're
00:05:19.980 --> 00:05:23.160
interested in, in the real world,
typically there's going to be
00:05:23.160 --> 00:05:24.630
lots of uncertain quantities.
00:05:24.630 --> 00:05:27.150
There's going to be multiple
random variables.
00:05:27.150 --> 00:05:31.520
And we want to be able to talk
about them simultaneously.
00:05:31.520 --> 00:05:31.665
Okay.
00:05:31.665 --> 00:05:35.110
Why two and not more than two?
00:05:35.110 --> 00:05:37.620
How about three random
variables?
00:05:37.620 --> 00:05:41.290
Well, if you understand what's
going on in this slide, you
00:05:41.290 --> 00:05:45.720
should be able to kind of
automatically generalize this
00:05:45.720 --> 00:05:48.260
to the case of multiple
random variables.
00:05:48.260 --> 00:05:51.590
So for example, if we have three
random variables, X, Y,
00:05:51.590 --> 00:05:56.720
and Z, and you see an expression
like this, it
00:05:56.720 --> 00:05:58.670
should be clear what it means.
00:05:58.670 --> 00:06:02.070
It's the probability that
X takes this value and
00:06:02.070 --> 00:06:06.240
simultaneously Y takes that
value and simultaneously Z
00:06:06.240 --> 00:06:07.765
takes that value.
00:06:07.765 --> 00:06:13.280
I guess that's an uppercase Z
here, that's a lowercase z.
00:06:13.280 --> 00:06:20.500
And if I ask you to find the
marginal of X, if I tell you
00:06:20.500 --> 00:06:24.340
the joint PMF of the three
random variables and I ask you
00:06:24.340 --> 00:06:27.320
for this value, how
would you find it?
00:06:27.320 --> 00:06:31.350
Well, you will try to generalize
this relation here.
00:06:31.350 --> 00:06:35.250
The probability that x occurs
is the sum of the
00:06:35.250 --> 00:06:44.450
probabilities of all events
that make X take that
00:06:44.450 --> 00:06:45.870
particular value.
00:06:45.870 --> 00:06:47.400
So what are all the events?
00:06:47.400 --> 00:06:51.530
Well, this particular x can
happen together with some y
00:06:51.530 --> 00:06:52.790
and some z.
00:06:52.790 --> 00:06:55.150
We don't care which y and z.
00:06:55.150 --> 00:06:57.890
Any y and z will do.
00:06:57.890 --> 00:07:01.220
So when we consider all
possibilities, we need to add
00:07:01.220 --> 00:07:04.760
here over all possible values
of y's and z's.
00:07:04.760 --> 00:07:08.020
So consider all triples,
x, y, z.
00:07:08.020 --> 00:07:12.380
Fix x and consider all the
possibilities for the
00:07:12.380 --> 00:07:16.600
remaining variables, y and z,
add these up, and that gives
00:07:16.600 --> 00:07:24.740
you the marginal PMF of X. And
then there's other things that
00:07:24.740 --> 00:07:26.140
you can do.
00:07:26.140 --> 00:07:29.340
This is the multiplication
rule for two events.
00:07:29.340 --> 00:07:32.510
We saw back in chapter one that
there's a multiplication
00:07:32.510 --> 00:07:35.130
rule when you talk about
more than two events.
00:07:35.130 --> 00:07:38.860
And you can write a chain of
conditional probabilities.
00:07:38.860 --> 00:07:43.860
We can certainly do the same
in our new notation.
00:07:43.860 --> 00:07:45.810
So let's look at this
rule up here.
00:07:48.700 --> 00:07:51.220
Multiplication rule for three
random variables,
00:07:51.220 --> 00:07:53.000
what does it say?
00:07:53.000 --> 00:07:55.280
The probability of three
things happening
00:07:55.280 --> 00:07:59.770
simultaneously, X, Y, Z taking
specific values, little x,
00:07:59.770 --> 00:08:03.110
little y, little z, that
probability is the probability
00:08:03.110 --> 00:08:07.210
that the first thing happens,
that X takes that value.
00:08:07.210 --> 00:08:09.880
Given that X takes that value,
we multiply it with the
00:08:09.880 --> 00:08:14.650
probability that Y takes
also a certain value.
00:08:14.650 --> 00:08:18.560
And now, given that X and Y have
taken those particular
00:08:18.560 --> 00:08:21.730
values, we multiply with a
conditional probability that
00:08:21.730 --> 00:08:24.380
the third thing happens,
given that the
00:08:24.380 --> 00:08:26.960
first two things happen.
00:08:26.960 --> 00:08:30.080
So this is just the
multiplication rule for three
00:08:30.080 --> 00:08:33.530
events, which would be
probability of A intersection
00:08:33.530 --> 00:08:35.669
B intersection C equals--
00:08:35.669 --> 00:08:37.909
you know the rest
of the formula.
00:08:37.909 --> 00:08:42.330
You just rewrite this formula
in PMF notation.
00:08:42.330 --> 00:08:45.310
Probability of A intersection
B intersection C is the
00:08:45.310 --> 00:08:49.450
probability of A, which
corresponds to this term,
00:08:49.450 --> 00:08:54.010
times the probability of B given
A, times the probability
00:08:54.010 --> 00:09:00.700
of C given A and B.
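As a sketch in the same PMF notation (assumed here, since the slide is not part of the transcript), the marginal of X and the multiplication rule for three random variables would read:

```latex
p_X(x) = \sum_{y} \sum_{z} p_{X,Y,Z}(x, y, z)

p_{X,Y,Z}(x, y, z) = p_X(x)\, p_{Y \mid X}(y \mid x)\, p_{Z \mid X,Y}(z \mid x, y)
```

The second line matches, term by term, P(A ∩ B ∩ C) = P(A) P(B | A) P(C | A ∩ B) with A = {X = x}, B = {Y = y}, C = {Z = z}.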
00:09:00.700 --> 00:09:04.920
So what else is there that's
left from chapter one that we
00:09:04.920 --> 00:09:10.190
can or should generalize
to random variables?
00:09:10.190 --> 00:09:12.560
Well, there's the notion
of independence.
00:09:12.560 --> 00:09:16.720
So let's define what
independence means.
00:09:16.720 --> 00:09:19.970
Instead of talking about just
two random variables, let's go
00:09:19.970 --> 00:09:22.470
directly to the case of multiple
random variables.
00:09:22.470 --> 00:09:24.400
When we talked about events,
things were a little
00:09:24.400 --> 00:09:25.100
complicated.
00:09:25.100 --> 00:09:28.480
We had a simple definition for
independence of two events.
00:09:28.480 --> 00:09:31.950
Two events are independent if
the probability of both is
00:09:31.950 --> 00:09:33.740
equal to the product of
the probabilities.
00:09:33.740 --> 00:09:35.830
But for three events, it
was kind of messy.
00:09:35.830 --> 00:09:38.460
We needed to write down
lots of conditions.
00:09:38.460 --> 00:09:41.140
For random variables,
things in some sense
00:09:41.140 --> 00:09:42.060
are a little simpler.
00:09:42.060 --> 00:09:46.360
We only need to write down one
formula and take this as the
00:09:46.360 --> 00:09:49.020
definition of independence.
00:09:49.020 --> 00:09:53.630
Three random variables are
independent if and only if, by
00:09:53.630 --> 00:09:58.390
definition, their joint
probability mass function
00:09:58.390 --> 00:10:02.560
factors out into individual
probability mass functions.
00:10:02.560 --> 00:10:08.190
So the probability that all
three things happen is the
00:10:08.190 --> 00:10:11.840
product of the individual
probabilities that each one of
00:10:11.840 --> 00:10:14.170
these three things
is happening.
00:10:14.170 --> 00:10:17.580
So independence means
mathematically that you can
00:10:17.580 --> 00:10:21.030
just multiply probabilities to
get to the probability of
00:10:21.030 --> 00:10:22.706
several things happening
simultaneously.
00:10:25.680 --> 00:10:31.040
So with three events, we have
to write a huge number of
00:10:31.040 --> 00:10:34.500
equations, of equalities
that have to hold.
00:10:34.500 --> 00:10:37.500
How can it be that with random
variables we can manage
00:10:37.500 --> 00:10:39.370
with only one equality?
00:10:39.370 --> 00:10:41.230
Well, the catch is
that this is not
00:10:41.230 --> 00:10:43.260
really just one equality.
00:10:43.260 --> 00:10:48.390
We require this to be true for
every little x, y, and z.
00:10:48.390 --> 00:10:52.600
So in some sense, this is a
bunch of conditions that are
00:10:52.600 --> 00:10:56.300
being put on the joint PMF, a
bunch of conditions that we
00:10:56.300 --> 00:10:58.130
need to check.
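To make the "bunch of conditions" concrete, here is a minimal Python sketch that checks the factorization for every triple of values. The function names and both example PMFs are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal PMF of each coordinate of a joint PMF given as {(x, y, z): prob}."""
    n = len(next(iter(joint)))
    margs = [dict() for _ in range(n)]
    for outcome, p in joint.items():
        for i, value in enumerate(outcome):
            margs[i][value] = margs[i].get(value, Fraction(0)) + p
    return margs

def is_independent(joint):
    """Check p(x, y, z) == p_X(x) * p_Y(y) * p_Z(z) for EVERY triple of values."""
    margs = marginals(joint)
    for outcome in product(*(m.keys() for m in margs)):
        expected = Fraction(1)
        for m, value in zip(margs, outcome):
            expected *= m[value]
        # Triples absent from the table have joint probability zero.
        if joint.get(outcome, Fraction(0)) != expected:
            return False
    return True

# Hypothetical example 1: three independent fair bits.
fair_bits = {t: Fraction(1, 8) for t in product((0, 1), repeat=3)}

# Hypothetical example 2: Z is forced to equal X XOR Y, so the triple is dependent.
xor_bits = {(x, y, x ^ y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

print(is_independent(fair_bits))  # True
print(is_independent(xor_bits))   # False
```

Note that the loop runs over all combinations of values, including those with zero joint probability, which is exactly why one formula amounts to many conditions.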
00:10:58.130 --> 00:11:01.040
So this is the mathematical
definition.
00:11:01.040 --> 00:11:05.400
What is the intuitive content
of this definition?
00:11:05.400 --> 00:11:11.130
The intuitive content is
the same as for events.
00:11:11.130 --> 00:11:15.020
Random variables are independent
if knowing
00:11:15.020 --> 00:11:19.490
something about the realized
values of some of these random
00:11:19.490 --> 00:11:25.510
variables does not change our
beliefs about the likelihood
00:11:25.510 --> 00:11:29.510
of various values for the
remaining random variables.
00:11:29.510 --> 00:11:34.250
So independence would translate,
for example, to a
00:11:34.250 --> 00:11:39.690
condition such as the
conditional PMF of X, given
00:11:39.690 --> 00:11:46.420
y, should be equal to the
marginal PMF of X. What is
00:11:46.420 --> 00:11:47.490
this saying?
00:11:47.490 --> 00:11:53.070
That you have some original
beliefs about how likely it is
00:11:53.070 --> 00:11:55.210
for X to take this value.
00:11:55.210 --> 00:11:58.350
Now, someone comes and
tells you that Y took
00:11:58.350 --> 00:12:00.140
on a certain value.
00:12:00.140 --> 00:12:03.470
This causes you, in principle,
to revise your beliefs.
00:12:03.470 --> 00:12:06.430
And your new beliefs will be
captured by the conditional
00:12:06.430 --> 00:12:08.750
PMF, or the conditional
probabilities.
00:12:08.750 --> 00:12:12.820
Independence means that your
revised beliefs actually will
00:12:12.820 --> 00:12:15.420
be the same as your
original beliefs.
00:12:15.420 --> 00:12:19.960
Telling you information about
the value of Y doesn't change
00:12:19.960 --> 00:12:24.400
what you expect for the
random variable X.
00:12:24.400 --> 00:12:28.750
Why didn't we use this
definition for independence?
00:12:28.750 --> 00:12:31.900
Well, because this definition
only makes sense when this
00:12:31.900 --> 00:12:34.330
conditional is well-defined.
00:12:34.330 --> 00:12:43.290
And this conditional is only
well-defined if the event
00:12:43.290 --> 00:12:46.130
that Y takes on that particular
value has positive
00:12:46.130 --> 00:12:47.220
probability.
00:12:47.220 --> 00:12:51.730
We cannot condition on events
that have zero probability, so
00:12:51.730 --> 00:12:55.460
conditional probabilities are
only defined for y's that are
00:12:55.460 --> 00:12:59.500
likely to occur, that have
a positive probability.
00:12:59.500 --> 00:13:03.640
Now, similarly, with multiple
random variables, if they're
00:13:03.640 --> 00:13:07.970
independent, you would have
relations such as the
00:13:07.970 --> 00:13:14.290
conditional of X, given y and
z, should be the same as the
00:13:14.290 --> 00:13:17.340
marginal of X. What
is this saying?
00:13:17.340 --> 00:13:21.220
Again, that if I tell you the
values, the realized values of
00:13:21.220 --> 00:13:25.900
random variables Y and Z, this
is not going to change your
00:13:25.900 --> 00:13:28.900
beliefs about how likely
x is to occur.
00:13:28.900 --> 00:13:30.900
Whatever you believed in the
beginning, you're going to
00:13:30.900 --> 00:13:33.000
believe the same thing
afterwards.
00:13:33.000 --> 00:13:36.130
So it's important to keep that
intuition in mind, because
00:13:36.130 --> 00:13:39.200
sometimes this way you can tell
whether random variables
00:13:39.200 --> 00:13:42.820
are independent without having
to do calculations and to
00:13:42.820 --> 00:13:44.930
check this formula.
00:13:44.930 --> 00:13:47.300
OK, so let's check our concepts
00:13:47.300 --> 00:13:49.250
with a simple example.
00:13:49.250 --> 00:13:52.220
Let's look at two random
variables that are discrete,
00:13:52.220 --> 00:13:55.100
take values between
one and four each.
00:13:55.100 --> 00:13:57.890
And this is a table that
gives us the joint PMF.
00:13:57.890 --> 00:14:05.720
So it tells us the probability
that X equals to 2 and Y
00:14:05.720 --> 00:14:08.040
equals to 1 happening
simultaneously.
00:14:08.040 --> 00:14:10.810
It's an event that has
probability 1/20.
00:14:10.810 --> 00:14:14.510
Are these two random variables
independent?
00:14:14.510 --> 00:14:17.610
You can try to check a
condition like this.
00:14:17.610 --> 00:14:21.940
But can we tell directly
from the table?
00:14:21.940 --> 00:14:28.470
If I tell you a value of Y,
could that give you useful
00:14:28.470 --> 00:14:29.720
information about X?
00:14:32.180 --> 00:14:32.860
Certainly.
00:14:32.860 --> 00:14:38.680
If I tell you that Y is equal
to 1, this tells you that X
00:14:38.680 --> 00:14:40.990
must be equal to 2.
00:14:40.990 --> 00:14:44.870
But if I tell you that Y was
equal to 3, this tells you
00:14:44.870 --> 00:14:47.540
that, still, X could
be anything.
00:14:47.540 --> 00:14:52.220
So telling you the value of
Y kind of changes what you
00:14:52.220 --> 00:14:57.240
expect or what you consider
possible for the values of the
00:14:57.240 --> 00:14:59.020
other random variable.
00:14:59.020 --> 00:15:03.070
So by just inspecting here, we
can tell that the random
00:15:03.070 --> 00:15:04.860
variables are not independent.
00:15:08.290 --> 00:15:08.470
Okay.
00:15:08.470 --> 00:15:10.990
What's the other concept we
introduced in chapter one?
00:15:10.990 --> 00:15:14.060
We introduced the concept of
conditional independence.
00:15:14.060 --> 00:15:17.120
And conditional independence is
like ordinary independence
00:15:17.120 --> 00:15:20.420
but applied to a conditional
universe where we're given
00:15:20.420 --> 00:15:21.780
some information.
00:15:21.780 --> 00:15:24.610
So suppose someone tells you
that the outcome of the
00:15:24.610 --> 00:15:30.420
experiment is such that X is
less than or equal to 2 and Y
00:15:30.420 --> 00:15:33.920
is larger than or equal to 3.
00:15:33.920 --> 00:15:37.670
So we are given the information
that we now live
00:15:37.670 --> 00:15:40.010
inside this universe.
00:15:40.010 --> 00:15:42.080
So what happens inside
this universe?
00:15:42.080 --> 00:15:47.200
Inside this universe, our random
variables are going to
00:15:47.200 --> 00:15:55.140
have a new joint PMF which is
conditioned on the event that
00:15:55.140 --> 00:15:58.650
we were told that
it has occurred.
00:15:58.650 --> 00:16:04.780
So let A correspond to this
sort of event here.
00:16:04.780 --> 00:16:06.900
And now we're dealing with
conditional probabilities.
00:16:06.900 --> 00:16:09.490
What are those conditional
probabilities?
00:16:09.490 --> 00:16:11.490
We can put them in a table.
00:16:11.490 --> 00:16:14.220
So it's a two by two table,
since we only have two
00:16:14.220 --> 00:16:15.540
possible values.
00:16:15.540 --> 00:16:18.080
What are they going to be?
00:16:18.080 --> 00:16:20.740
Well, these probabilities
show up in the ratios
00:16:20.740 --> 00:16:22.910
1, 2, 2, and 4.
00:16:22.910 --> 00:16:25.480
Those ratios have to
stay the same.
00:16:25.480 --> 00:16:29.700
The probabilities need
to add up to one.
00:16:29.700 --> 00:16:34.030
So what should the denominators
be since these
00:16:34.030 --> 00:16:35.380
numbers add up to nine?
00:16:35.380 --> 00:16:37.820
These are the conditional
probabilities.
00:16:37.820 --> 00:16:40.575
So this is the conditional
PMF in this example.
00:16:43.870 --> 00:16:46.990
Now, in this conditional
universe, is x
00:16:46.990 --> 00:16:48.255
independent from y?
00:16:51.230 --> 00:17:01.450
If I tell you that y takes this
value, so we live in this
00:17:01.450 --> 00:17:04.980
universe, what do you
know about x?
00:17:04.980 --> 00:17:08.109
What you know about x is that this value is twice as likely
value is twice as likely
00:17:08.109 --> 00:17:09.930
as that value.
00:17:09.930 --> 00:17:13.859
If I condition on y taking this
value, so we're living
00:17:13.859 --> 00:17:16.450
here, what do you
know about x?
00:17:16.450 --> 00:17:21.660
What you know about x is that
this value is twice as likely
00:17:21.660 --> 00:17:23.240
as that value.
00:17:23.240 --> 00:17:24.500
So it's the same.
00:17:24.500 --> 00:17:30.250
Whether we live here or we live
there, this x is twice as
00:17:30.250 --> 00:17:33.670
likely as that x.
00:17:33.670 --> 00:17:41.560
So the conditional PMF in this
new universe, the conditional
00:17:41.560 --> 00:17:55.970
PMF of X given y, in the new
universe is the same as the
00:17:55.970 --> 00:18:01.250
marginal PMF of X, but of course
in the new universe.
00:18:01.250 --> 00:18:04.370
So no matter what y is,
the conditional
00:18:04.370 --> 00:18:06.860
PMF of X is the same.
00:18:06.860 --> 00:18:12.150
And that conditional
PMF is 1/3 and 2/3.
00:18:12.150 --> 00:18:15.150
This is the conditional PMF of
X in the new universe no
00:18:15.150 --> 00:18:17.000
matter what y occurs.
00:18:17.000 --> 00:18:20.330
So Y does not give us any
information about X, doesn't
00:18:20.330 --> 00:18:25.620
cause us to change our beliefs
inside this little universe.
00:18:25.620 --> 00:18:28.440
And therefore the two random
variables are independent.
00:18:28.440 --> 00:18:31.180
Now, the other way that you
can verify that we have
00:18:31.180 --> 00:18:34.960
independence is to find the
marginal PMFs of the two
00:18:34.960 --> 00:18:36.250
random variables.
00:18:36.250 --> 00:18:39.650
The marginal PMF of
X, you find it by
00:18:39.650 --> 00:18:41.100
adding those two terms.
00:18:41.100 --> 00:18:42.720
You get 1/3.
00:18:42.720 --> 00:18:44.620
Adding those two terms,
you get 2/3.
00:18:44.620 --> 00:18:48.530
Marginal PMF of Y, you find it,
you add these two terms,
00:18:48.530 --> 00:18:51.410
and you get 1/3.
00:18:51.410 --> 00:18:56.470
And the marginal PMF of Y
here is going to be 2/3.
00:18:56.470 --> 00:18:59.700
And then you ask the question,
is the joint the product of
00:18:59.700 --> 00:19:00.860
the marginals?
00:19:00.860 --> 00:19:02.630
And indeed it is.
00:19:02.630 --> 00:19:05.330
This times this gives you 1/9.
00:19:05.330 --> 00:19:08.050
This times this gives you 2/9.
00:19:08.050 --> 00:19:12.180
So the values in the table with
the joint PMF are the
00:19:12.180 --> 00:19:17.220
product of the marginal PMFs of
X and Y in this universe,
00:19:17.220 --> 00:19:19.090
so the two random variables are
00:19:19.090 --> 00:19:21.850
independent inside this universe.
00:19:21.850 --> 00:19:26.704
So we say that they're
conditionally independent.
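The check just carried out by hand can be sketched in Python. The ratios 1 : 2 : 2 : 4 and the conditioning event come from the lecture; which cell of the table carries which weight is not recoverable from the transcript, so the placement below is an assumption made for illustration.

```python
from fractions import Fraction

# Relative weights of the four cells inside the conditioning event
# A = {X <= 2, Y >= 3}. The ratios 1 : 2 : 2 : 4 are from the lecture;
# the assignment of weights to particular (x, y) cells is assumed.
weights = {
    (1, 3): 1, (2, 3): 2,
    (1, 4): 2, (2, 4): 4,
}

total = sum(weights.values())  # 9
cond_joint = {xy: Fraction(w, total) for xy, w in weights.items()}

# Marginals inside the conditional universe.
p_x = {x: sum(p for (a, _), p in cond_joint.items() if a == x) for x in (1, 2)}
p_y = {y: sum(p for (_, b), p in cond_joint.items() if b == y) for y in (3, 4)}

# Conditional independence: the conditional joint is the product of the marginals.
factorizes = all(
    cond_joint[(x, y)] == p_x[x] * p_y[y] for x in (1, 2) for y in (3, 4)
)
print(p_x, p_y, factorizes)
```

Under this assumed placement, both marginals come out as 1/3 and 2/3, and every cell equals the product of its marginals, matching the conclusion in the lecture.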
00:19:26.704 --> 00:19:28.500
All right.
00:19:28.500 --> 00:19:32.720
Now let's move to the new topic,
to the new concept that
00:19:32.720 --> 00:19:35.170
we introduce in this chapter,
which is the concept of
00:19:35.170 --> 00:19:36.440
expectations.
00:19:36.440 --> 00:19:38.200
So what are the things
to know here?
00:19:38.200 --> 00:19:40.150
One is the general idea.
00:19:40.150 --> 00:19:43.140
The way to think about
expectations is that it's
00:19:43.140 --> 00:19:46.080
something like the average value
for a random variable if
00:19:46.080 --> 00:19:49.590
you do an experiment over and
over, and if you interpret
00:19:49.590 --> 00:19:51.550
probabilities as frequencies.
00:19:51.550 --> 00:19:57.030
So you get x's over and over
with a certain frequency --
00:19:57.030 --> 00:19:58.670
P(x) --
00:19:58.670 --> 00:20:01.160
a particular value, little
x, gets realized.
00:20:01.160 --> 00:20:03.960
And each time that this happens,
you get x dollars.
00:20:03.960 --> 00:20:06.040
How many dollars do you
get on the average?
00:20:06.040 --> 00:20:09.330
Well, this formula gives you
that particular average.
00:20:09.330 --> 00:20:13.190
So first thing we do is to write
down a definition for
00:20:13.190 --> 00:20:15.420
this sort of concept.
00:20:15.420 --> 00:20:19.810
But then the other things you
need to know are how to
00:20:19.810 --> 00:20:23.990
calculate expectations using
shortcuts sometimes, and what
00:20:23.990 --> 00:20:25.440
properties they have.
00:20:25.440 --> 00:20:28.500
The most important shortcut
there is is that, if you want
00:20:28.500 --> 00:20:31.250
to calculate the expected value,
the average value for a
00:20:31.250 --> 00:20:36.380
random variable, you do not need
to find the PMF of that
00:20:36.380 --> 00:20:37.530
random variable.
00:20:37.530 --> 00:20:41.180
But you can work directly with
the x's and the y's.
00:20:41.180 --> 00:20:44.210
So you do the experiment
over and over.
00:20:44.210 --> 00:20:46.670
The outcome of the experiment
is a pair (x,y).
00:20:46.670 --> 00:20:49.400
And each time that a certain
(x,y) happens,
00:20:49.400 --> 00:20:51.280
you get so many dollars.
00:20:51.280 --> 00:20:54.990
So this fraction of the time,
a certain (x,y) happens.
00:20:54.990 --> 00:20:58.050
And that fraction of the time,
you get so many dollars, so
00:20:58.050 --> 00:21:00.860
this is the average number
of dollars that you get.
00:21:00.860 --> 00:21:05.230
So what you end up with is
the average, and that
00:21:05.230 --> 00:21:07.830
means that it corresponds
to the expected value.
00:21:07.830 --> 00:21:09.820
Now, this is something that, of
course, needs a little bit
00:21:09.820 --> 00:21:10.850
of mathematical proof.
00:21:10.850 --> 00:21:13.880
But this is just a different
way of accounting.
00:21:13.880 --> 00:21:16.510
And it turns out to give
you the right answer.
00:21:16.510 --> 00:21:19.420
And it's a very useful
shortcut.
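A small Python sketch of this shortcut (the expected value rule), compared against the long way of first building the PMF of g(X, Y). The joint PMF and the function g are hypothetical, chosen only to illustrate the bookkeeping.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical joint PMF of (X, Y); not taken from the lecture.
joint = {
    (1, 1): Fraction(1, 10), (1, 2): Fraction(2, 10),
    (2, 1): Fraction(3, 10), (2, 2): Fraction(4, 10),
}

def g(x, y):
    """Hypothetical payoff: the dollars received when the pair (x, y) occurs."""
    return x * y

# Shortcut: weight g(x, y) by the probability of each pair (x, y).
direct = sum(g(x, y) * p for (x, y), p in joint.items())

# Long way: first find the PMF of Z = g(X, Y), then average over values of Z.
p_z = defaultdict(Fraction)
for (x, y), p in joint.items():
    p_z[g(x, y)] += p
via_pmf = sum(z * p for z, p in p_z.items())

print(direct, via_pmf)  # 27/10 27/10
```

Both routes give the same number; the shortcut simply skips the step of collecting pairs that lead to the same value of g.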
00:21:19.420 --> 00:21:22.070
Now, when we're talking about
functions of random variables,
00:21:22.070 --> 00:21:26.620
in general, we cannot speak
just about averages.
00:21:26.620 --> 00:21:29.690
That is, the expected value
of a function of a random
00:21:29.690 --> 00:21:31.860
variable is not the same
as the function of
00:21:31.860 --> 00:21:33.320
the expected values.
00:21:33.320 --> 00:21:36.120
A function of averages is
not the same as the
00:21:36.120 --> 00:21:38.380
average of a function.
00:21:38.380 --> 00:21:40.510
So in general, this
is not true.
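A minimal counterexample, sketched in Python with a hypothetical random variable and a hypothetical function g, shows the average of a function differing from the function of the average:

```python
from fractions import Fraction

# Hypothetical random variable: X is -1 or +1, each with probability 1/2.
pmf = {-1: Fraction(1, 2), 1: Fraction(1, 2)}

def g(x):
    """A nonlinear function, chosen only for illustration."""
    return x * x

mean_x = sum(x * p for x, p in pmf.items())        # E[X]
mean_of_g = sum(g(x) * p for x, p in pmf.items())  # E[g(X)]
g_of_mean = g(mean_x)                              # g(E[X])

print(mean_of_g, g_of_mean)  # 1 0
```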
00:21:40.510 --> 00:21:43.960
But what's important is to
know the exceptions
00:21:43.960 --> 00:21:45.370
to this rule.
00:21:45.370 --> 00:21:48.620
And the important exceptions
are mainly two.
00:21:48.620 --> 00:21:51.560
One is the case of linear
00:21:51.560 --> 00:21:53.040
functions of a random variable.
00:21:53.040 --> 00:21:54.800
We discussed this last time.
00:21:54.800 --> 00:21:59.810
So the expected value of
temperature in Celsius is, you
00:21:59.810 --> 00:22:03.340
first find the expected value of
temperature in Fahrenheit,
00:22:03.340 --> 00:22:05.810
and then you do the conversion
to Celsius.
00:22:05.810 --> 00:22:08.600
So whether you first average and
then do the conversion to
00:22:08.600 --> 00:22:11.730
the new units or not, it
shouldn't matter when you get
00:22:11.730 --> 00:22:13.740
the result.
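In symbols, the linear case can be sketched as follows. The constants a and b are generic, and F and C are labels introduced here for the temperature in Fahrenheit and in Celsius:

```latex
\mathbf{E}[aX + b] = a\,\mathbf{E}[X] + b,
\qquad\text{so with } C = \tfrac{5}{9}(F - 32):\quad
\mathbf{E}[C] = \tfrac{5}{9}\bigl(\mathbf{E}[F] - 32\bigr)
```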
00:22:13.740 --> 00:22:16.740
The other property that turns
out to be true when you talk
00:22:16.740 --> 00:22:19.280
about multiple random variables
is that expectation
00:22:19.280 --> 00:22:21.070
still behaves linearly.
00:22:21.070 --> 00:22:26.600
So let X, Y, and Z be the score
of a random student at
00:22:26.600 --> 00:22:29.940
each one of the three
sections of the SAT.
00:22:29.940 --> 00:22:36.310
So the overall SAT score is X
plus Y plus Z. This is the
00:22:36.310 --> 00:22:40.940
average score, the average
total SAT score.
00:22:40.940 --> 00:22:43.790
Another way to calculate that
average is to look at the
00:22:43.790 --> 00:22:47.480
first section of the SAT and
see what was the average.
00:22:47.480 --> 00:22:50.710
Look at the second section, look
at what was the average,
00:22:50.710 --> 00:22:53.470
and so for the third, and
add the averages.
00:22:53.470 --> 00:22:56.910
So you can do the averages for
each section separately, add
00:22:56.910 --> 00:23:00.500
the averages, or you can find
total scores for each student
00:23:00.500 --> 00:23:01.710
and average them.
00:23:01.710 --> 00:23:05.690
So I guess you probably believe
that this is correct
00:23:05.690 --> 00:23:09.030
if you talk just about
averaging scores.
00:23:09.030 --> 00:23:12.580
Since expectations are just a
variation of averages, it
00:23:12.580 --> 00:23:16.010
turns out that this is
also true in general.
00:23:16.010 --> 00:23:19.760
And the derivation of this is
very simple, based on the
00:23:19.760 --> 00:23:21.320
expected value rule.
00:23:21.320 --> 00:23:24.450
And you can look at
it in the notes.
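Linearity for sums can be checked numerically with a short Python sketch. The joint PMF below is hypothetical and deliberately dependent, to illustrate that no independence is needed for this property.

```python
from fractions import Fraction

# Hypothetical joint PMF of three dependent scores (X, Y, Z); illustrative only.
joint = {
    (1, 1, 1): Fraction(1, 4),
    (1, 2, 2): Fraction(1, 4),
    (2, 2, 3): Fraction(1, 2),
}

def expect(f):
    """Expected value rule: sum of f(x, y, z) * p(x, y, z) over all triples."""
    return sum(f(x, y, z) * p for (x, y, z), p in joint.items())

mean_total = expect(lambda x, y, z: x + y + z)  # average of the total score
sum_of_means = (
    expect(lambda x, y, z: x)
    + expect(lambda x, y, z: y)
    + expect(lambda x, y, z: z)
)  # add the per-section averages

print(mean_total, sum_of_means)  # 11/2 11/2
```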
00:23:24.450 --> 00:23:27.740
So this is one exception,
which is linearity.
00:23:27.740 --> 00:23:31.540
The second important exception
is the case of independent
00:23:31.540 --> 00:23:34.520
random variables, that the
product of two random
00:23:34.520 --> 00:23:37.830
variables has an expectation
which is the product of the
00:23:37.830 --> 00:23:38.980
expectations.
00:23:38.980 --> 00:23:41.400
In general, this is not true.
00:23:41.400 --> 00:23:47.010
But for the case where we have
independence, the expectation
00:23:47.010 --> 00:23:48.080
works out as follows.
00:23:48.080 --> 00:23:55.130
Using the expected value rule,
this is how you calculate the
00:23:55.130 --> 00:23:59.170
expected value of a function
of a random variable.
00:23:59.170 --> 00:24:04.810
So think of this as being your
g(X, Y) and this being your
00:24:04.810 --> 00:24:06.160
g(little x, y).
00:24:06.160 --> 00:24:08.760
So this is something that's
generally true.
00:24:08.760 --> 00:24:20.350
Now, if we have independence,
then the PMFs factor out, and
00:24:20.350 --> 00:24:25.660
then you can separate this sum
by bringing together the x
00:24:25.660 --> 00:24:30.130
terms, bring them outside
the y summation.
00:24:30.130 --> 00:24:34.370
And you find that this is the
same as expected value of X
00:24:34.370 --> 00:24:38.890
times the expected value of Y.
So independence is used in
00:24:38.890 --> 00:24:40.140
this step here.
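Written out, the derivation described above looks roughly as follows. The notation (p_{X,Y} for the joint PMF, p_X and p_Y for the marginals) is assumed to match the course's usual conventions.

```latex
\begin{align*}
E[XY] &= \sum_x \sum_y x\,y\; p_{X,Y}(x,y)
        && \text{(expected value rule with } g(x,y) = xy\text{)} \\
      &= \sum_x \sum_y x\,y\; p_X(x)\,p_Y(y)
        && \text{(independence: the joint PMF factors)} \\
      &= \Big(\sum_x x\,p_X(x)\Big)\Big(\sum_y y\,p_Y(y)\Big)
        && \text{(pull the } x \text{ terms outside the } y \text{ sum)} \\
      &= E[X]\,E[Y].
\end{align*}
```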
00:24:44.020 --> 00:24:48.640
OK, now what if X and Y are
independent, but instead of
00:24:48.640 --> 00:24:51.020
taking the expectation of
X times Y, we take the
00:24:51.020 --> 00:24:56.600
expectation of the product of
two functions of X and Y?
00:24:56.600 --> 00:24:59.560
I claim that the expected value
of the product is still
00:24:59.560 --> 00:25:02.630
going to be the product of
the expected values.
00:25:02.630 --> 00:25:04.180
How do we show that?
00:25:04.180 --> 00:25:09.230
We could show it by just redoing
this derivation here.
00:25:09.230 --> 00:25:13.500
Instead of X and Y, we would
have g(X) and h(Y), so the
00:25:13.500 --> 00:25:14.850
algebra goes through.
00:25:14.850 --> 00:25:17.720
But there's a better way to
think about it which is more
00:25:17.720 --> 00:25:18.960
conceptual.
00:25:18.960 --> 00:25:20.886
And here's the idea.
00:25:20.886 --> 00:25:25.750
If X and Y are independent,
what does it mean?
00:25:25.750 --> 00:25:31.180
X does not convey any
information about Y. If X
00:25:31.180 --> 00:25:36.350
conveys no information about Y,
does X convey information
00:25:36.350 --> 00:25:40.500
about h(Y)?
00:25:40.500 --> 00:25:41.940
No.
00:25:41.940 --> 00:25:46.160
If X tells me nothing about Y,
nothing new, it shouldn't tell
00:25:46.160 --> 00:25:50.580
me anything about h(Y).
00:25:50.580 --> 00:25:59.270
Now, if X tells me nothing about
h(Y), could g(X)
00:25:59.270 --> 00:26:01.470
tell me something about h(Y)?
00:26:01.470 --> 00:26:02.250
No.
00:26:02.250 --> 00:26:06.780
So the idea is that, if X is
unrelated to Y, doesn't have
00:26:06.780 --> 00:26:11.080
any useful information, then
g(X) could not have any useful
00:26:11.080 --> 00:26:13.250
information for h(Y).
00:26:13.250 --> 00:26:21.030
So if X and Y are independent,
then g(X) and h(Y) are also
00:26:21.030 --> 00:26:22.280
independent.
00:26:27.150 --> 00:26:29.430
So this is something that
one can try to prove
00:26:29.430 --> 00:26:31.500
mathematically, but it's more
important to understand
00:26:31.500 --> 00:26:34.530
conceptually why this is so.
00:26:34.530 --> 00:26:38.220
It's in terms of conveying
information.
00:26:38.220 --> 00:26:44.950
So if X tells me nothing about
Y, X cannot tell me anything
00:26:44.950 --> 00:26:48.490
about Y cubed, or X cannot
tell me anything about Y
00:26:48.490 --> 00:26:51.030
squared, and so on.
00:26:51.030 --> 00:26:52.260
That's the idea.
00:26:52.260 --> 00:26:57.180
And once we are convinced that
g(X) and h(Y) are independent,
00:26:57.180 --> 00:27:00.550
then we can apply our previous
rule, that for independent
00:27:00.550 --> 00:27:04.390
random variables, expectations
multiply the right way.
00:27:04.390 --> 00:27:08.660
Apply the previous rule, but
apply it now to these two
00:27:08.660 --> 00:27:10.490
independent random variables.
00:27:10.490 --> 00:27:12.785
And we get the conclusion
that we wanted.
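The claim can be checked exactly on a small example. This is only a sketch: the two PMFs and the functions g and h below are arbitrary choices for illustration, not taken from the lecture.

```python
from fractions import Fraction as F

# Hypothetical marginal PMFs, chosen only for illustration.
p_x = {-1: F(1, 4), 0: F(1, 4), 2: F(1, 2)}
p_y = {1: F(1, 3), 3: F(2, 3)}

def g(x):
    return x ** 2

def h(y):
    return y ** 3

# Independence: the joint PMF factors as p_X(x) * p_Y(y).
joint = {(x, y): px * py for x, px in p_x.items() for y, py in p_y.items()}

# E[g(X) h(Y)] from the joint PMF (expected value rule).
e_product = sum(g(x) * h(y) * p for (x, y), p in joint.items())

# E[g(X)] and E[h(Y)] from the marginals.
e_g = sum(g(x) * px for x, px in p_x.items())
e_h = sum(h(y) * py for y, py in p_y.items())

print(e_product, e_g * e_h)
```

Using exact fractions avoids any rounding questions, so the two printed values are identical rather than merely close.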
00:27:15.500 --> 00:27:19.050
Now, besides expectations, we
also introduced the concept of
00:27:19.050 --> 00:27:20.300
the variance.
00:27:23.560 --> 00:27:27.450
And if you remember the
definition of the variance,
00:27:27.450 --> 00:27:31.100
let me write down the formula
for the variance of aX.
00:27:31.100 --> 00:27:34.920
It's the expected value of the
random variable that we're
00:27:34.920 --> 00:27:39.630
looking at minus the expected
value of the random variable
00:27:39.630 --> 00:27:42.050
that we're looking at.
00:27:42.050 --> 00:27:44.780
So this is the difference
of the random
00:27:44.780 --> 00:27:47.850
variable from its mean.
00:27:47.850 --> 00:27:50.880
And we take that difference
and square it, so it's the
00:27:50.880 --> 00:27:53.070
squared distance from the
mean, and then take
00:27:53.070 --> 00:27:55.250
expectations of the
whole thing.
00:27:55.250 --> 00:27:59.570
So when you look at that
expression, you realize that a
00:27:59.570 --> 00:28:01.780
can be pulled out of
those expressions.
00:28:04.540 --> 00:28:10.340
And because there is a squared,
when you pull out the
00:28:10.340 --> 00:28:12.980
a, it's going to come
out as an a-squared.
00:28:12.980 --> 00:28:16.050
So that gives us the rule for
finding the variance of a
00:28:16.050 --> 00:28:18.990
scalar multiple of
a random variable.
00:28:18.990 --> 00:28:22.370
The variance captures the idea
of how wide, how spread out a
00:28:22.370 --> 00:28:24.210
certain distribution is.
00:28:24.210 --> 00:28:26.600
Bigger variance means it's
more spread out.
00:28:26.600 --> 00:28:29.360
Now, if you take a random
variable and add a constant to
00:28:29.360 --> 00:28:31.960
it, what does it do to
its distribution?
00:28:31.960 --> 00:28:35.480
It just shifts it, but it
doesn't change its width.
00:28:35.480 --> 00:28:37.140
So intuitively it
means that the
00:28:37.140 --> 00:28:39.030
variance should not change.
00:28:39.030 --> 00:28:42.360
You can check that
mathematically, but it should
00:28:42.360 --> 00:28:44.290
also make sense intuitively.
00:28:44.290 --> 00:28:47.710
So the variance, when you add
the constant, does not change.
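For reference, both scaling and shifting rules follow in one line each from the definition of the variance. The symbol b for the added constant is an assumed name; the lecture only says "a constant".

```latex
\begin{align*}
\mathrm{var}(aX) &= E\big[(aX - E[aX])^2\big]
                  = E\big[a^2 (X - E[X])^2\big]
                  = a^2\,\mathrm{var}(X), \\
\mathrm{var}(X + b) &= E\big[(X + b - E[X] - b)^2\big]
                     = E\big[(X - E[X])^2\big]
                     = \mathrm{var}(X).
\end{align*}
```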
00:28:47.710 --> 00:28:51.680
Now, can you add variances
the way we added expectations?
00:28:51.680 --> 00:28:54.760
Does variance behave linearly?
00:28:54.760 --> 00:28:57.810
It turns out that not always.
00:28:57.810 --> 00:28:59.270
Here, we need a condition.
00:28:59.270 --> 00:29:03.880
It's only in special cases--
00:29:03.880 --> 00:29:06.210
for example, when the two
random variables are
00:29:06.210 --> 00:29:07.190
independent--
00:29:07.190 --> 00:29:09.300
that you can add variances.
00:29:09.300 --> 00:29:13.300
The variance of the sum is the
sum of the variances if X and
00:29:13.300 --> 00:29:15.370
Y are independent.
00:29:15.370 --> 00:29:18.880
The derivation of this is,
again, very short and simple.
00:29:18.880 --> 00:29:22.590
We'll skip it, but it's an
important fact to remember.
00:29:22.590 --> 00:29:26.140
Now, to appreciate why this
equality is not true always,
00:29:26.140 --> 00:29:28.980
we can think of some
extreme examples.
00:29:28.980 --> 00:29:32.250
Suppose that X is the same as
Y. What's going to be the
00:29:32.250 --> 00:29:34.520
variance of X plus Y?
00:29:34.520 --> 00:29:39.810
Well, X plus Y, in this case,
is the same as 2X, so we're
00:29:39.810 --> 00:29:44.620
going to get 4 times the
variance of X, which is
00:29:44.620 --> 00:29:49.770
different than the variance of
X plus the variance of X.
00:29:49.770 --> 00:29:52.920
So that expression would give
us twice the variance of X.
00:29:52.920 --> 00:29:56.460
But actually now it's 4 times
the variance of X. The other
00:29:56.460 --> 00:30:01.990
extreme would be if X is equal
to -Y. Then the variance of X plus Y is
00:30:01.990 --> 00:30:05.390
the variance of a random
variable, which is always
00:30:05.390 --> 00:30:07.020
equal to 0.
00:30:07.020 --> 00:30:09.980
Now, a random variable which
is always equal to 0 has no
00:30:09.980 --> 00:30:10.700
uncertainty.
00:30:10.700 --> 00:30:14.570
It is always equal to its mean
value, so the variance, in
00:30:14.570 --> 00:30:17.090
this case, turns out to be 0.
00:30:17.090 --> 00:30:19.940
So in both of these cases,
of course we have random
00:30:19.940 --> 00:30:23.020
variables that are extremely
dependent.
00:30:23.020 --> 00:30:24.740
Why are they dependent?
00:30:24.740 --> 00:30:27.940
Because if I tell you something
about Y, it tells
00:30:27.940 --> 00:30:32.020
you an awful lot about the value
of X. There's a lot of
00:30:32.020 --> 00:30:34.910
information about X if
I tell you Y, in this
00:30:34.910 --> 00:30:37.050
case or in that case.
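The two extreme cases can be verified exactly. The sketch below assumes an arbitrary PMF for X, made up for illustration, and computes the variance of X + Y when Y = X and when Y = -X.

```python
from fractions import Fraction as F

def mean(pmf):
    return sum(v * p for v, p in pmf.items())

def var(pmf):
    m = mean(pmf)
    return sum((v - m) ** 2 * p for v, p in pmf.items())

def pmf_of(fn, pmf):
    """PMF of fn(X), given the PMF of X."""
    out = {}
    for v, p in pmf.items():
        out[fn(v)] = out.get(fn(v), 0) + p
    return out

# Hypothetical PMF for X, for illustration only.
p_x = {0: F(1, 2), 1: F(1, 4), 3: F(1, 4)}

var_x = var(p_x)
var_sum_same = var(pmf_of(lambda x: x + x, p_x))  # Y = X, so X + Y = 2X
var_sum_opp = var(pmf_of(lambda x: x - x, p_x))   # Y = -X, so X + Y = 0

print(var_x, var_sum_same, var_sum_opp)
```

In the first case the result is 4 var(X) rather than 2 var(X), and in the second it is 0, so neither matches var(X) + var(Y).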
00:30:37.050 --> 00:30:39.940
And finally, a short drill.
00:30:39.940 --> 00:30:42.570
If I tell you that the random
variables are independent and
00:30:42.570 --> 00:30:44.840
you want to calculate the
variance of a linear
00:30:44.840 --> 00:30:48.330
combination of this kind,
then how do you argue?
00:30:48.330 --> 00:30:51.940
You argue that, since X and Y
are independent, this means
00:30:51.940 --> 00:30:55.660
that X and -3Y are also
independent.
00:30:55.660 --> 00:30:59.610
X has no information about Y, so
X has no information about
00:30:59.610 --> 00:31:05.000
-Y. X has no information about
-Y, so X should not have any
00:31:05.000 --> 00:31:10.270
information about -3Y.
00:31:10.270 --> 00:31:14.400
So X and -3Y are independent.
00:31:14.400 --> 00:31:18.480
So the variance of Z should be
the variance of X plus the
00:31:18.480 --> 00:31:26.910
variance of -3Y, which is the
variance of X plus 9 times the
00:31:26.910 --> 00:31:31.760
variance of Y. The important
thing to note here is that no
00:31:31.760 --> 00:31:34.080
matter what happens, you
end up getting a
00:31:34.080 --> 00:31:37.000
plus here, not a minus.
00:31:37.000 --> 00:31:41.160
So that's the sort of important
thing to remember in
00:31:41.160 --> 00:31:42.410
this type of calculation.
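Here is a small exact check of the drill, assuming Z = X - 3Y as the argument above suggests. The two PMFs are hypothetical; the point is only that the coefficient -3 contributes +9 var(Y).

```python
from fractions import Fraction as F

def var(pmf):
    m = sum(v * p for v, p in pmf.items())
    return sum((v - m) ** 2 * p for v, p in pmf.items())

# Hypothetical independent random variables X and Y.
p_x = {0: F(1, 2), 2: F(1, 2)}
p_y = {1: F(1, 3), 4: F(2, 3)}

# PMF of Z = X - 3Y, built from the factored joint PMF.
p_z = {}
for x, px in p_x.items():
    for y, py in p_y.items():
        z = x - 3 * y
        p_z[z] = p_z.get(z, 0) + px * py

var_z = var(p_z)
var_formula = var(p_x) + 9 * var(p_y)

print(var_z, var_formula)
```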
00:31:44.820 --> 00:31:48.890
So this has been all concepts,
reviews, new
00:31:48.890 --> 00:31:50.390
concepts and all that.
00:31:50.390 --> 00:31:52.720
It's the usual fire hose.
00:31:52.720 --> 00:31:56.680
Now let's use them to do
something useful finally.
00:31:56.680 --> 00:31:59.220
So let's revisit our old
example, the binomial
00:31:59.220 --> 00:32:03.350
distribution, which counts the
number of successes in
00:32:03.350 --> 00:32:06.230
independent trials of a coin.
00:32:06.230 --> 00:32:09.030
It's a biased coin that has
a probability of heads, or
00:32:09.030 --> 00:32:13.000
probability of success, equal
to p at each trial.
00:32:13.000 --> 00:32:16.160
Finally, we can go through the
exercise of calculating the
00:32:16.160 --> 00:32:18.820
expected value of this
random variable.
00:32:18.820 --> 00:32:21.790
And there's the way of
calculating that expectation
00:32:21.790 --> 00:32:24.260
that would be the favorite
of those people who enjoy
00:32:24.260 --> 00:32:27.500
algebra, which is to write down
the definition of the
00:32:27.500 --> 00:32:28.740
expected value.
00:32:28.740 --> 00:32:31.980
We add over all possible values
of the random variable,
00:32:31.980 --> 00:32:35.580
over all the possible k's, and
weigh them according to the
00:32:35.580 --> 00:32:38.440
probabilities that this
particular k occurs.
00:32:38.440 --> 00:32:42.250
The probability that X takes on
a particular value k is, of
00:32:42.250 --> 00:32:44.820
course, the binomial
PMF, which is
00:32:44.820 --> 00:32:47.560
this familiar formula.
00:32:47.560 --> 00:32:50.480
Clearly, that would be a messy
and challenging calculation.
00:32:50.480 --> 00:32:52.490
Can we find a shortcut?
00:32:52.490 --> 00:32:54.010
There's a very clever trick.
00:32:54.010 --> 00:32:56.690
There's lots of problems in
probability that you can
00:32:56.690 --> 00:33:00.000
approach really nicely by
breaking up the random
00:33:00.000 --> 00:33:03.830
variable of interest into a
sum of simpler and more
00:33:03.830 --> 00:33:06.010
manageable random variables.
00:33:06.010 --> 00:33:09.700
And if you can make it to be a
sum of random variables that
00:33:09.700 --> 00:33:12.590
are just 0's or 1's,
so much the better.
00:33:12.590 --> 00:33:13.990
Life is easier.
00:33:13.990 --> 00:33:16.850
Random variables that take
values 0 or 1, we call them
00:33:16.850 --> 00:33:18.380
indicator variables.
00:33:18.380 --> 00:33:21.700
They indicate whether an event
has occurred or not.
00:33:21.700 --> 00:33:25.600
In this case, we look at each
coin flip one at a time.
00:33:25.600 --> 00:33:29.710
For the i-th flip, if it
resulted in heads or a
00:33:29.710 --> 00:33:32.110
success, we record it 1.
00:33:32.110 --> 00:33:34.220
If not, we record it 0.
00:33:34.220 --> 00:33:37.540
And then we look at the
random variable.
00:33:37.540 --> 00:33:42.580
If we take the sum of the Xi's,
what is it going to be?
00:33:42.580 --> 00:33:48.030
We add one each time that we get
a success, so the sum is
00:33:48.030 --> 00:33:50.820
going to be the total
number of successes.
00:33:50.820 --> 00:33:53.900
So we break up the random
variable of interest as a sum
00:33:53.900 --> 00:33:57.610
of really nice and simple
random variables.
00:33:57.610 --> 00:34:00.380
And now we can use the linearity
of expectations.
00:34:00.380 --> 00:34:02.800
We're going to find the
expectation of X by finding
00:34:02.800 --> 00:34:05.700
the expectation of the Xi's
and then adding the
00:34:05.700 --> 00:34:06.770
expectations.
00:34:06.770 --> 00:34:09.520
What's the expected
value of Xi?
00:34:09.520 --> 00:34:13.050
Well, Xi takes the value 1 with
probability p, and takes
00:34:13.050 --> 00:34:15.610
the value 0 with probability
1-p.
00:34:15.610 --> 00:34:19.070
So the expected value
of Xi is just p.
00:34:19.070 --> 00:34:24.889
So the expected value of X is
going to be just n times p.
00:34:24.889 --> 00:34:29.560
Because X is the sum of n terms,
each one of which has
00:34:29.560 --> 00:34:33.050
expectation p, the expected
value of the sum is the sum of
00:34:33.050 --> 00:34:34.600
the expected values.
00:34:34.600 --> 00:34:38.440
So I guess that's a pretty good
shortcut for doing this
00:34:38.440 --> 00:34:40.790
horrendous calculation
up there.
00:34:40.790 --> 00:34:47.210
So in case you didn't realize
it, that's what we just
00:34:47.210 --> 00:34:51.940
established without
doing any algebra.
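To see that the shortcut agrees with the messy sum, one can evaluate the definition directly for specific numbers. The values n = 10 and p = 3/10 below are arbitrary choices for this sketch.

```python
from fractions import Fraction as F
from math import comb

def binomial_pmf(n, p):
    """Binomial PMF: P(X = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

n, p = 10, F(3, 10)  # hypothetical parameters
pmf = binomial_pmf(n, p)

# The "algebra lover's" way: sum of k * P(X = k) over all k.
mean_by_definition = sum(k * pk for k, pk in pmf.items())

# The shortcut: n indicator variables, each with expectation p.
mean_by_linearity = n * p

print(mean_by_definition, mean_by_linearity)
```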
00:34:51.940 --> 00:34:52.219
Good.
00:34:52.219 --> 00:34:56.150
How about the variance
of X, of Xi?
00:34:56.150 --> 00:34:57.570
Two ways to calculate it.
00:34:57.570 --> 00:35:01.160
One is by using directly the
formula for the variance,
00:35:01.160 --> 00:35:02.370
which would be --
00:35:02.370 --> 00:35:03.900
let's see what it would be.
00:35:03.900 --> 00:35:06.800
With probability
p, you get a 1.
00:35:06.800 --> 00:35:11.270
And in this case, you are
so far from the mean.
00:35:11.270 --> 00:35:13.950
That's your squared distance
from the mean.
00:35:13.950 --> 00:35:18.750
With probability 1-p, you
get a 0, which is so far
00:35:18.750 --> 00:35:20.380
away from the mean.
00:35:20.380 --> 00:35:24.380
And then you can simplify that
formula and get an answer.
00:35:24.380 --> 00:35:28.660
How about a slightly easier
way of doing it.
00:35:28.660 --> 00:35:31.360
Instead of doing the algebra
here, let me indicate the
00:35:31.360 --> 00:35:33.420
slightly easier way.
00:35:33.420 --> 00:35:36.070
We have a formula for the
variance that tells us that we
00:35:36.070 --> 00:35:42.290
can find the variance by
proceeding this way.
00:35:42.290 --> 00:35:45.980
That's a formula that's
generally true for variances.
00:35:45.980 --> 00:35:47.380
Why is this easier?
00:35:47.380 --> 00:35:49.560
What's the expected value
of Xi squared?
00:35:52.240 --> 00:35:53.290
Backtrack.
00:35:53.290 --> 00:35:57.140
What is Xi squared, after all?
00:35:57.140 --> 00:35:59.510
It's the same thing as Xi.
00:35:59.510 --> 00:36:04.200
Since Xi takes value 0 and 1, Xi
squared also takes the same
00:36:04.200 --> 00:36:05.780
values, 0 and 1.
00:36:05.780 --> 00:36:09.050
So the expected value of Xi
squared is the same as the
00:36:09.050 --> 00:36:11.990
expected value of Xi,
which is equal to p.
00:36:15.120 --> 00:36:20.530
And the expected value of Xi,
all squared, is p squared, so we
00:36:20.530 --> 00:36:24.680
get the final answer,
p times (1-p).
00:36:24.680 --> 00:36:28.630
If you were to work through and
do the cancellations in
00:36:28.630 --> 00:36:32.400
this messy expression here,
after one line you would also
00:36:32.400 --> 00:36:34.050
get to the same formula.
00:36:34.050 --> 00:36:38.240
But this sort of illustrates
that working with this formula
00:36:38.240 --> 00:36:40.550
for the variance, sometimes
things work
00:36:40.550 --> 00:36:43.090
out a little faster.
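For comparison, here are the two calculations of the variance of an indicator variable side by side, using E[X_i] = p from above.

```latex
\begin{align*}
\text{Direct:}\quad
\mathrm{var}(X_i) &= p\,(1-p)^2 + (1-p)\,(0-p)^2
                   = p(1-p)\big[(1-p) + p\big]
                   = p(1-p), \\
\text{Shortcut:}\quad
\mathrm{var}(X_i) &= E[X_i^2] - \big(E[X_i]\big)^2
                   = p - p^2
                   = p(1-p).
\end{align*}
```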
00:36:43.090 --> 00:36:45.420
Finally, are we in business?
00:36:45.420 --> 00:36:47.820
Can we calculate the variance
of the random
00:36:47.820 --> 00:36:50.100
variable X as well?
00:36:50.100 --> 00:36:52.650
Well, we have the rule that
for independent random
00:36:52.650 --> 00:36:55.680
variables, the variance
of the sum is
00:36:55.680 --> 00:36:57.870
the sum of the variances.
00:36:57.870 --> 00:37:00.930
So to find the variance of X,
we just need to add the
00:37:00.930 --> 00:37:02.960
variances of the Xi's.
00:37:02.960 --> 00:37:07.140
We have n Xi's, and each
one of them has
00:37:07.140 --> 00:37:10.110
variance p times (1-p), for a total of n times p times (1-p).
00:37:10.110 --> 00:37:12.290
And we are done.
00:37:12.290 --> 00:37:17.780
So this way, we have calculated
both the mean and
00:37:17.780 --> 00:37:21.550
the variance of the binomial
random variable.
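The variance formula can be checked the same way as the mean, by computing the variance from the binomial PMF and comparing it with n p (1-p). The parameter values are arbitrary choices for this sketch.

```python
from fractions import Fraction as F
from math import comb

n, p = 8, F(1, 4)  # hypothetical parameters

# Binomial PMF: P(X = k) = C(n, k) p^k (1 - p)^(n - k).
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

mean = sum(k * pk for k, pk in pmf.items())

# Variance straight from the definition E[(X - E[X])^2].
var_by_definition = sum((k - mean) ** 2 * pk for k, pk in pmf.items())

# Variance as the sum of n independent indicator variances.
var_by_sum = n * p * (1 - p)

print(var_by_definition, var_by_sum)
```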
00:37:21.550 --> 00:37:27.280
It's interesting to look at this
particular formula and
00:37:27.280 --> 00:37:29.180
see what it tells us.
00:37:29.180 --> 00:37:33.470
If you are to plot the variance
of X as a function of
00:37:33.470 --> 00:37:36.050
p, it has this shape.
00:37:45.900 --> 00:37:51.310
And the maximum is
here at 1/2.
00:37:51.310 --> 00:37:55.150
p times (1-p) is 0 when
p is equal to 0
00:37:55.150 --> 00:37:58.570
and when p is equal to 1. It's a
quadratic, so it must have
00:37:58.570 --> 00:38:00.250
this particular shape.
00:38:00.250 --> 00:38:02.080
So what does it tell us?
00:38:02.080 --> 00:38:05.880
If you think about variance as
a measure of uncertainty, it
00:38:05.880 --> 00:38:10.290
tells you that coin flips
are most uncertain when
00:38:10.290 --> 00:38:12.620
your coin is fair.
00:38:12.620 --> 00:38:16.190
When p is equal to 1/2, that's
when you have the most
00:38:16.190 --> 00:38:17.050
randomness.
00:38:17.050 --> 00:38:18.790
And this is kind of intuitive.
00:38:18.790 --> 00:38:21.460
If on the other hand I tell you
that the coin is extremely
00:38:21.460 --> 00:38:26.490
biased, p very close to 1, which
means it almost always
00:38:26.490 --> 00:38:29.460
gives you heads, then
that would be
00:38:29.460 --> 00:38:30.630
a case of low variance.
00:38:30.630 --> 00:38:32.870
There's low variability
in the results.
00:38:32.870 --> 00:38:35.270
There's little uncertainty about
what's going to happen.
00:38:35.270 --> 00:38:39.570
It's going to be mostly heads
with some occasional tails.
00:38:39.570 --> 00:38:42.010
So p equals 1/2.
00:38:42.010 --> 00:38:45.350
Fair coin, that's the coin which
is the most uncertain of
00:38:45.350 --> 00:38:47.240
all coins, in some sense.
00:38:47.240 --> 00:38:49.240
And it corresponds to the
biggest variance.
00:38:49.240 --> 00:38:53.760
It corresponds to an X that has
the widest distribution.
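A rough way to see the shape of p(1-p) without plotting is to tabulate it on a grid. The grid spacing of 1/10 is an arbitrary choice for this sketch.

```python
from fractions import Fraction as F

# Evaluate the single-flip variance p(1 - p) for p = 0, 0.1, ..., 1.
grid = [F(k, 10) for k in range(11)]
variances = {p: p * (1 - p) for p in grid}

# The value of p on the grid with the largest variance.
best_p = max(variances, key=variances.get)

for p in grid:
    print(p, variances[p])
print("largest at p =", best_p)
```

The values rise from 0 at p = 0 to a peak at p = 1/2 and fall back to 0 at p = 1, and they are symmetric about 1/2.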
00:38:53.760 --> 00:38:57.680
Now that we're on a roll and we
can calculate such hugely
00:38:57.680 --> 00:39:01.400
complicated sums in simple ways,
let us try to push our
00:39:01.400 --> 00:39:05.100
luck and do a problem with
this flavor, but a little
00:39:05.100 --> 00:39:06.590
harder than that.
00:39:06.590 --> 00:39:07.960
So you go to one of those
00:39:07.960 --> 00:39:09.910
old-fashioned cocktail parties.
00:39:09.910 --> 00:39:16.010
All males at least will have
those standard big hats which
00:39:16.010 --> 00:39:16.990
look identical.
00:39:16.990 --> 00:39:19.700
They check them in when
they walk in.
00:39:19.700 --> 00:39:23.390
And when they walk out, since
they look pretty identical,
00:39:23.390 --> 00:39:26.830
they just pick a random
hat and go home.
00:39:26.830 --> 00:39:31.080
So n people, they pick their
hats completely at random,
00:39:31.080 --> 00:39:33.950
quote, unquote, and
then leave.
00:39:33.950 --> 00:39:36.970
And the question is, to say
something about the number of
00:39:36.970 --> 00:39:42.070
people who end up, by accident
or by luck, to get back their
00:39:42.070 --> 00:39:45.170
own hat, the exact same hat
that they checked in.
00:39:45.170 --> 00:39:48.490
OK, first what do we mean
completely at random?
00:39:48.490 --> 00:39:51.060
Completely at random, we
basically mean that any
00:39:51.060 --> 00:39:54.180
permutation of the hats
is equally likely.
00:39:54.180 --> 00:39:58.520
Any way of distributing those
n hats to the n people, any
00:39:58.520 --> 00:40:01.350
particular way is as likely
as any other way.
00:40:01.350 --> 00:40:05.230
So there's complete symmetry
between hats and people.
00:40:05.230 --> 00:40:08.490
So what we want to do is to
calculate the expected value
00:40:08.490 --> 00:40:11.460
and the variance of this random
variable X. Let's start
00:40:11.460 --> 00:40:13.240
with the expected value.
00:40:13.240 --> 00:40:17.840
Let's reuse the trick from
the binomial case.
00:40:17.840 --> 00:40:21.110
So total number of hats picked,
we're going to think
00:40:21.110 --> 00:40:24.140
of total number of hats
picked as a sum of
00:40:24.140 --> 00:40:26.900
0-1 random variables.
00:40:26.900 --> 00:40:30.470
X1 tells us whether person
1 got their own hat back.
00:40:30.470 --> 00:40:32.920
If they did, we record a 1.
00:40:32.920 --> 00:40:34.960
X2, the same thing.
00:40:34.960 --> 00:40:40.910
Adding all the X's gives how many
1's we got, which counts
00:40:40.910 --> 00:40:45.510
how many people selected
their own hats.
00:40:45.510 --> 00:40:48.100
So we broke down the random
variable of interest, the
00:40:48.100 --> 00:40:51.500
number of people who get their
own hats back, as a sum of
00:40:51.500 --> 00:40:53.570
random variables.
00:40:53.570 --> 00:40:56.200
And these random variables,
again, are easy to handle,
00:40:56.200 --> 00:40:58.010
because they're binary.
00:40:58.010 --> 00:40:59.250
They only take two values.
00:40:59.250 --> 00:41:03.500
What's the probability that Xi
is equal to 1, the i-th person
00:41:03.500 --> 00:41:06.730
has a probability that they
get their own hat?
00:41:06.730 --> 00:41:09.430
There's n hats. By symmetry,
00:41:09.430 --> 00:41:11.890
the chance that they end up
getting their own hat, as
00:41:11.890 --> 00:41:14.930
opposed to any one of the
other n - 1 hats,
00:41:14.930 --> 00:41:18.020
is going to be 1/n.
00:41:18.020 --> 00:41:20.710
So what's the expected
value of Xi?
00:41:20.710 --> 00:41:23.130
It's one times 1/n.
00:41:23.130 --> 00:41:26.510
With probability 1/n, you get
your own hat, or you get a
00:41:26.510 --> 00:41:30.960
value of 0 with probability
1-1/n, so the expected value is 1/n.
00:41:34.660 --> 00:41:38.360
All right, so we got the
expected value of the Xi's.
00:41:38.360 --> 00:41:41.510
And remember, what we want to do is
to calculate the expected
00:41:41.510 --> 00:41:46.900
value of X by using this
decomposition.
00:41:46.900 --> 00:41:52.230
Are the random variables Xi
independent of each other?
00:41:52.230 --> 00:41:55.470
You can try to answer that
question by writing down a
00:41:55.470 --> 00:41:58.510
joint PMF for the X's,
but I'm sure that
00:41:58.510 --> 00:42:00.000
you will not succeed.
00:42:00.000 --> 00:42:02.740
But can you think intuitively?
00:42:02.740 --> 00:42:05.940
If I tell you information about
some of the Xi's, does
00:42:05.940 --> 00:42:08.920
it give you information about
the remaining ones?
00:42:08.920 --> 00:42:09.300
Yeah.
00:42:09.300 --> 00:42:13.950
If I tell you that out of 10
people, 9 of them got their
00:42:13.950 --> 00:42:16.710
own hat back, does that
tell you something
00:42:16.710 --> 00:42:18.330
about the 10th person?
00:42:18.330 --> 00:42:18.690
Yes.
00:42:18.690 --> 00:42:22.510
If 9 got their own hat, then the
10th must also have gotten
00:42:22.510 --> 00:42:24.170
their own hat back.
00:42:24.170 --> 00:42:27.170
So the first 9 random variables
tell you something
00:42:27.170 --> 00:42:28.790
about the 10th one.
00:42:28.790 --> 00:42:33.000
And conveying information of
this sort, that's the case of
00:42:33.000 --> 00:42:34.410
dependence.
00:42:34.410 --> 00:42:38.100
All right, so the random
variables are not independent.
00:42:38.100 --> 00:42:39.030
Are we stuck?
00:42:39.030 --> 00:42:43.240
Can we still calculate the
expected value of X?
00:42:43.240 --> 00:42:45.210
Yes, we can.
00:42:45.210 --> 00:42:50.710
And the reason we can is that
expectations are linear.
00:42:50.710 --> 00:42:53.940
Expectation of a sum of random
variables is the sum of the
00:42:53.940 --> 00:42:55.140
expectations.
00:42:55.140 --> 00:42:57.490
And that's always true.
00:42:57.490 --> 00:43:00.710
There's no independence
assumption that's being used
00:43:00.710 --> 00:43:02.540
to apply that rule.
00:43:02.540 --> 00:43:06.980
So we have that the expected
value of X is the sum of the
00:43:06.980 --> 00:43:09.580
expected value of the Xi's.
00:43:09.580 --> 00:43:12.970
And this is a property
that's always true.
00:43:12.970 --> 00:43:14.350
You don't need independence.
00:43:14.350 --> 00:43:15.590
You don't care.
00:43:15.590 --> 00:43:18.660
So we're adding n terms,
each one of which has
00:43:18.660 --> 00:43:20.430
expected value 1/n.
00:43:20.430 --> 00:43:22.670
And the final answer is 1.
00:43:22.670 --> 00:43:27.430
So out of the 100 people who
selected hats at random, on
00:43:27.430 --> 00:43:32.590
the average, you expect only one
of them to end up getting
00:43:32.590 --> 00:43:35.830
their own hat back.
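For small n, the answer can be confirmed by brute force: list every permutation of the hats, count how many people get their own hat in each, and average. This sketch assumes people and hats are both labeled 0 to n-1, with perm[i] being the hat that person i takes.

```python
from fractions import Fraction as F
from itertools import permutations

def expected_matches(n):
    """Exact E[X] when all n! hat assignments are equally likely."""
    total_matches = 0
    count = 0
    for perm in permutations(range(n)):
        # Person i got their own hat exactly when perm[i] == i.
        total_matches += sum(1 for person, hat in enumerate(perm) if person == hat)
        count += 1
    return F(total_matches, count)

results = {n: expected_matches(n) for n in range(1, 7)}
print(results)
```

The expected number of matches comes out the same for every n tried, in line with the n times 1/n argument.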
00:43:35.830 --> 00:43:36.640
Very good.
00:43:36.640 --> 00:43:41.620
So since we are succeeding so
far, let's try to see if we
00:43:41.620 --> 00:43:44.620
can succeed in calculating
the variance as well.
00:43:44.620 --> 00:43:46.580
And of course, we will.
00:43:46.580 --> 00:43:50.160
But it's going to be a little
more complicated.
00:43:50.160 --> 00:43:52.760
The reason it's going to be a
little more complicated is
00:43:52.760 --> 00:43:56.500
because the Xi's are not
independent, so the variance
00:43:56.500 --> 00:44:00.280
of the sum is not the same as
the sum of the variances.
00:44:00.280 --> 00:44:04.320
So it's not enough to find the
variances of the Xi's.
00:44:04.320 --> 00:44:06.930
We'll have to do more work.
00:44:06.930 --> 00:44:08.550
And here's what's involved.
00:44:08.550 --> 00:44:12.320
Let's start with the general
formula for the variance,
00:44:12.320 --> 00:44:15.950
which, as I mentioned before,
it's usually the simpler way
00:44:15.950 --> 00:44:18.430
to go about calculating
variances.
00:44:18.430 --> 00:44:21.800
So we need to calculate the
expected value for X-squared,
00:44:21.800 --> 00:44:27.110
and subtract from it the
expectation squared.
00:44:27.110 --> 00:44:31.010
Well, we already found the
expected value of X. It's
00:44:31.010 --> 00:44:31.870
equal to 1.
00:44:31.870 --> 00:44:34.580
So 1-squared gives us just 1.
00:44:34.580 --> 00:44:37.980
So we're left with the task of
calculating the expected value
00:44:37.980 --> 00:44:43.440
of X-squared, the random
variable X-squared.
00:44:43.440 --> 00:44:45.610
Let's try to follow
the same idea.
00:44:45.610 --> 00:44:49.770
Write this messy random
variable, X-squared, as a sum
00:44:49.770 --> 00:44:54.440
of hopefully simpler
random variables.
00:44:54.440 --> 00:44:59.350
So X is the sum of the
Xi's, so you square
00:44:59.350 --> 00:45:01.560
both sides of this.
00:45:01.560 --> 00:45:05.150
And then you expand the
right-hand side.
00:45:05.150 --> 00:45:09.390
When you expand the right-hand
side, you get the squares of
00:45:09.390 --> 00:45:11.420
the terms that appear here.
00:45:11.420 --> 00:45:14.230
And then you get all
the cross-terms.
00:45:14.230 --> 00:45:19.100
For every pair of (i,j) that
are different, i different
00:45:19.100 --> 00:45:24.030
than j, you're going to have
a cross-term in the sum.
00:45:24.030 --> 00:45:29.230
So now, in order to calculate
the expected value of
00:45:29.230 --> 00:45:32.480
X-squared, what does
our task reduce to?
00:45:32.480 --> 00:45:36.230
It reduces to calculating the
expected value of this term
00:45:36.230 --> 00:45:38.690
and calculating the expected
value of that term.
00:45:38.690 --> 00:45:41.060
So let's do them
one at a time.
00:45:41.060 --> 00:45:47.040
Expected value of Xi squared,
what is it going to be?
00:45:47.040 --> 00:45:48.660
Same trick as before.
00:45:48.660 --> 00:45:53.350
Xi takes value 0 or 1, so Xi
squared takes just the same
00:45:53.350 --> 00:45:55.290
values, 0 or 1.
00:45:55.290 --> 00:45:57.010
So that's the easy one.
00:45:57.010 --> 00:46:00.680
That's the same as expected
value of Xi, which we already
00:46:00.680 --> 00:46:04.410
know to be 1/n.
00:46:04.410 --> 00:46:07.830
So this gives us a first
contribution down here.
00:46:10.840 --> 00:46:14.220
The expected value of this
term is going to be what?
00:46:14.220 --> 00:46:17.210
We have n terms in
the summation.
00:46:17.210 --> 00:46:21.800
And each one of these terms
has an expectation of 1/n.
00:46:21.800 --> 00:46:24.710
So we did a piece
of the puzzle.
00:46:24.710 --> 00:46:28.480
So now let's deal with the
second piece of the puzzle.
00:46:28.480 --> 00:46:32.020
Let's find the expected
value of Xi times Xj.
00:46:32.020 --> 00:46:35.540
Now by symmetry, the expected
value of Xi times Xj is going
00:46:35.540 --> 00:46:39.900
to be the same no matter
what i and j you see.
00:46:39.900 --> 00:46:44.930
So let's just think about X1
and X2 and try to find the
00:46:44.930 --> 00:46:48.260
expected value of X1 times X2.
00:46:48.260 --> 00:46:51.710
X1 times X2 is a random
variable.
00:46:51.710 --> 00:46:53.960
What values does it take?
00:46:53.960 --> 00:46:56.570
Only 0 or 1?
00:46:56.570 --> 00:47:00.000
Since X1 and X2 are 0 or 1,
their product can only take
00:47:00.000 --> 00:47:02.010
the values of 0 or 1.
00:47:02.010 --> 00:47:04.990
So to find the probability
distribution of this random
00:47:04.990 --> 00:47:07.320
variable, it's just sufficient
to find the probability that
00:47:07.320 --> 00:47:09.530
it takes the value of 1.
00:47:09.530 --> 00:47:14.500
Now, what does X1 times
X2 equal to 1 mean?
00:47:14.500 --> 00:47:19.500
It means that X1 was
1 and X2 was 1.
00:47:19.500 --> 00:47:22.390
The only way that you can get
a product of 1 is if both of
00:47:22.390 --> 00:47:24.350
them turned out to be 1's.
00:47:24.350 --> 00:47:29.570
So that's the same as saying,
persons 1 and 2 both picked
00:47:29.570 --> 00:47:31.980
their own hats.
00:47:31.980 --> 00:47:35.510
The probability that person 1
and person 2 both pick their
00:47:35.510 --> 00:47:39.600
own hats is the probability of
two things happening, which is
00:47:39.600 --> 00:47:42.320
the product of the first thing
happening times the
00:47:42.320 --> 00:47:44.310
conditional probability
of the second, given
00:47:44.310 --> 00:47:46.160
that the first happened.
00:47:46.160 --> 00:47:48.690
And in words, this is the
probability that the first
00:47:48.690 --> 00:47:51.840
person picked their own hat
times the probability that the
00:47:51.840 --> 00:47:54.920
second person picks their own
hat, given that the first
00:47:54.920 --> 00:47:56.990
person already picked
their own.
00:47:56.990 --> 00:47:58.820
So what's the probability
that the first person
00:47:58.820 --> 00:48:00.760
picks their own hat?
00:48:00.760 --> 00:48:03.040
We know that it's 1/n.
00:48:03.040 --> 00:48:05.030
Now, how about the
second person?
00:48:05.030 --> 00:48:09.540
If I tell you that one person
has their own hat, and that
00:48:09.540 --> 00:48:13.240
person takes their hat and goes
away, from the point of
00:48:13.240 --> 00:48:17.250
view of the second person,
there's n - 1 people left
00:48:17.250 --> 00:48:19.770
looking at n - 1 hats.
00:48:19.770 --> 00:48:22.330
And they're getting just
hats at random.
00:48:22.330 --> 00:48:24.930
What's the chance that
I will get my own?
00:48:24.930 --> 00:48:26.180
It's 1/(n - 1).
00:48:29.210 --> 00:48:33.700
So think of it as person 1
goes, picks a hat at random,
00:48:33.700 --> 00:48:36.850
it happens to be their
own, and leaves.
00:48:36.850 --> 00:48:40.120
You're left with n - 1 people,
and there are n
00:48:40.120 --> 00:48:41.250
- 1 hats out there.
00:48:41.250 --> 00:48:44.490
Person 2 goes and picks a hat
at random, with probability
00:48:44.490 --> 00:48:48.820
1/(n - 1), is going to
pick his own hat.
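As a quick sanity check on this product rule, here is a minimal Python sketch that enumerates every way of handing back the hats for a small n and counts how often persons 1 and 2 both get their own. The function name `prob_first_two_match` and the choice n = 5 are illustrative assumptions, not part of the lecture.

```python
from fractions import Fraction
from itertools import permutations


def prob_first_two_match(n):
    """Exact P(X1 = 1 and X2 = 1) when n hats are handed back uniformly at random.

    Assumes n >= 2. Persons and hats are indexed from 0, so "person 1" in the
    lecture is index 0 here.
    """
    total = 0
    both = 0
    # perm[i] is the hat that person i receives.
    for perm in permutations(range(n)):
        total += 1
        if perm[0] == 0 and perm[1] == 1:
            both += 1
    return Fraction(both, total)


n = 5
print(prob_first_two_match(n))               # probability found by enumeration
print(Fraction(1, n) * Fraction(1, n - 1))   # the product 1/n * 1/(n - 1)
```

The enumeration grows like n factorial, so this is only meant for small n.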
00:48:48.820 --> 00:48:52.400
So the expected value now of
this random variable is,
00:48:52.400 --> 00:48:54.520
again, that same number,
because this is
00:48:54.520 --> 00:48:57.500
a 0, 1 random variable.
00:48:57.500 --> 00:49:02.370
So this is the same as expected
value of Xi times Xj
00:49:02.370 --> 00:49:04.810
when i is different from j.
00:49:04.810 --> 00:49:09.830
So here, all that's left to do
is to add the expectations of
00:49:09.830 --> 00:49:10.540
these terms.
00:49:10.540 --> 00:49:14.480
Each one of these terms has an
expected value that's 1/n
00:49:14.480 --> 00:49:16.910
times 1/(n - 1).
00:49:16.910 --> 00:49:19.170
And how many terms do we have?
00:49:19.170 --> 00:49:21.410
How many of these are
we adding up?
00:49:24.840 --> 00:49:28.950
It's n-squared - n.
00:49:28.950 --> 00:49:31.830
When you expand the quadratic,
there's a total
00:49:31.830 --> 00:49:33.890
of n-squared terms.
00:49:33.890 --> 00:49:37.860
Some are self-terms,
n of them.
00:49:37.860 --> 00:49:42.170
And the remaining number of
terms is n-squared - n.
00:49:42.170 --> 00:49:48.310
So here we've got n-squared
- n terms.
00:49:48.310 --> 00:49:51.200
And so we need to multiply
here with n-squared - n.
00:49:53.810 --> 00:49:59.980
And after you realize that this
number here is 1, and you
00:49:59.980 --> 00:50:03.490
realize that this is the same
as the denominator, you get
00:50:03.490 --> 00:50:06.750
the answer that the expected
value of X squared equals 2.
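Written out, the calculation being described is presumably the following, where X = X_1 + ... + X_n and each X_i is the 0-1 indicator that person i gets their own hat. The layout is a reconstruction, not a copy of the board.

```latex
\begin{align*}
E[X^2] &= \sum_{i=1}^{n} E[X_i^2] + \sum_{i \neq j} E[X_i X_j] \\
       &= n \cdot \frac{1}{n} + (n^2 - n) \cdot \frac{1}{n} \cdot \frac{1}{n-1} \\
       &= 1 + 1 = 2 .
\end{align*}
```

The first sum uses the fact that X_i squared equals X_i for a 0-1 variable, and the second uses n-squared - n = n(n - 1), which cancels the denominator.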
00:50:06.750 --> 00:50:10.120
And then, finally going up to
the top formula, we get the
00:50:10.120 --> 00:50:14.720
expected value of X squared, which
is 2, minus the mean squared, 1, and the
00:50:14.720 --> 00:50:17.610
variance is just equal to 1.
00:50:17.610 --> 00:50:21.680
So the variance of this random
variable, number of people who
00:50:21.680 --> 00:50:25.130
get their own hats back,
is also equal to 1,
00:50:25.130 --> 00:50:26.540
equal to the mean.
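To see that the mean and the variance both come out to 1 regardless of n, one option is to compute them exactly by brute force for a few small values of n. This is a rough sketch under that assumption; the helper name `hat_mean_and_variance` is made up for illustration.

```python
from fractions import Fraction
from itertools import permutations


def hat_mean_and_variance(n):
    """Exact mean and variance of the number of people who get their own hat."""
    count = 0
    sum_x = 0
    sum_x2 = 0
    for perm in permutations(range(n)):
        # X counts the people i who receive hat i.
        x = sum(1 for i, hat in enumerate(perm) if i == hat)
        count += 1
        sum_x += x
        sum_x2 += x * x
    mean = Fraction(sum_x, count)
    second_moment = Fraction(sum_x2, count)
    # Variance = E[X^2] - (E[X])^2, the formula used in the lecture.
    return mean, second_moment - mean ** 2


for n in range(2, 7):
    print(n, hat_mean_and_variance(n))
```

For every n from 2 upward the pair should be (1, 1). With n = 1 the single person always gets their hat, so the variance is 0 there.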
00:50:26.540 --> 00:50:27.690
Looks like magic.
00:50:27.690 --> 00:50:29.220
Why is this the case?
00:50:29.220 --> 00:50:31.550
Well, there's a deeper
explanation why these two
00:50:31.550 --> 00:50:33.630
numbers should come out
to be the same.
00:50:33.630 --> 00:50:35.980
But this is something that would
probably have to wait a
00:50:35.980 --> 00:50:39.420
couple of chapters before we
could actually explain it.
00:50:39.420 --> 00:50:40.730
And so I'll stop here.