WEBVTT
00:00:00.040 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.870
Commons license.
00:00:03.870 --> 00:00:06.910
Your support will help MIT
OpenCourseWare continue to
00:00:06.910 --> 00:00:08.700
offer high-quality, educational
00:00:08.700 --> 00:00:10.560
resources for free.
00:00:10.560 --> 00:00:13.460
To make a donation or view
additional materials from
00:00:13.460 --> 00:00:19.290
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.290 --> 00:00:20.540
ocw.mit.edu.
00:00:22.245 --> 00:00:23.550
PROFESSOR: Good morning.
00:00:23.550 --> 00:00:25.900
So today we're going
to continue the
00:00:25.900 --> 00:00:27.930
subject from last time.
00:00:27.930 --> 00:00:31.010
So we're going to talk about
derived distributions a little
00:00:31.010 --> 00:00:34.685
more, how to derive the
distribution of a function of
00:00:34.685 --> 00:00:36.510
a random variable.
00:00:36.510 --> 00:00:40.440
So last time we discussed a
couple of examples in which we
00:00:40.440 --> 00:00:43.570
had a function of a
single variable.
00:00:43.570 --> 00:00:46.330
And we found the distribution
of Y, if we're told the
00:00:46.330 --> 00:00:47.970
distribution of X.
00:00:47.970 --> 00:00:51.030
So today we're going to do an
example where we deal with the
00:00:51.030 --> 00:00:53.600
function of two random
variables.
00:00:53.600 --> 00:00:56.460
And then we're going to consider
the most interesting
00:00:56.460 --> 00:01:00.470
example of this kind, in which
we have a random variable of
00:01:00.470 --> 00:01:03.800
the form W, which is
the sum of two
00:01:03.800 --> 00:01:05.830
independent, random variables.
00:01:05.830 --> 00:01:08.210
That's a case that shows
up quite often.
00:01:08.210 --> 00:01:10.850
And so we want to see what
exactly happens in this
00:01:10.850 --> 00:01:12.780
particular case.
00:01:12.780 --> 00:01:14.620
Just one comment that
I should make.
00:01:14.620 --> 00:01:18.010
The material that we're covering
now, chapter four, is
00:01:18.010 --> 00:01:21.940
sort of conceptually a little
more difficult than what we
00:01:21.940 --> 00:01:23.600
have been doing before.
00:01:23.600 --> 00:01:26.230
So I would definitely encourage
you to read the text
00:01:26.230 --> 00:01:29.690
before you jump and try
to do the problems in
00:01:29.690 --> 00:01:32.300
your problem sets.
00:01:32.300 --> 00:01:40.270
OK, so let's start with our
example, in which we're given
00:01:40.270 --> 00:01:41.940
two random variables.
00:01:41.940 --> 00:01:43.450
They're jointly continuous.
00:01:43.450 --> 00:01:45.870
And their distribution
is pretty simple.
00:01:45.870 --> 00:01:48.920
They're uniform on
the unit square.
00:01:48.920 --> 00:01:52.440
In particular, each one of the
random variables is uniform on
00:01:52.440 --> 00:01:54.050
the unit interval.
00:01:54.050 --> 00:01:57.160
And the two random variables
are independent.
00:01:57.160 --> 00:02:00.820
What we're going to find is the
distribution of the ratio
00:02:00.820 --> 00:02:03.120
of the two random variables.
00:02:03.120 --> 00:02:07.170
How do we go about it? Well,
the same cookbook procedure
00:02:07.170 --> 00:02:10.020
that we used last time
for the case of a
00:02:10.020 --> 00:02:13.750
single random variable.
00:02:13.750 --> 00:02:17.100
The cookbook procedure that
we used for this case also
00:02:17.100 --> 00:02:20.240
applies to the case where you
have a function of multiple
00:02:20.240 --> 00:02:21.420
random variables.
00:02:21.420 --> 00:02:23.740
So what was the cookbook
procedure?
00:02:23.740 --> 00:02:27.030
The first step is to find the
cumulative distribution
00:02:27.030 --> 00:02:30.830
function of the random variable
of interest and then
00:02:30.830 --> 00:02:36.260
take the derivative in order
to find the density.
00:02:36.260 --> 00:02:39.770
So let's find the cumulative.
00:02:39.770 --> 00:02:43.800
So, by definition, the
cumulative is the probability
00:02:43.800 --> 00:02:47.940
that the random variable is
less than or equal to the
00:02:47.940 --> 00:02:49.880
argument of the cumulative.
00:02:49.880 --> 00:02:53.010
So if we write this event in
terms of the random variable
00:02:53.010 --> 00:02:58.470
of interest, this is the
probability that our random
00:02:58.470 --> 00:03:01.650
variable is less than
or equal to z.
00:03:01.650 --> 00:03:04.920
So what is that?
00:03:04.920 --> 00:03:09.453
OK, so the ratio is going to be
less than or equal to z, if
00:03:09.453 --> 00:03:14.920
and only if the pair, (x,y),
happens to fall below the line
00:03:14.920 --> 00:03:17.280
that has a slope z.
00:03:17.280 --> 00:03:20.800
OK, so we draw a line
that has a slope z.
00:03:20.800 --> 00:03:23.880
The ratio is less than this
number, if and only if we get
00:03:23.880 --> 00:03:27.700
the pair of x and y that falls
inside this triangle.
00:03:27.700 --> 00:03:29.430
So we're talking about
the probability of
00:03:29.430 --> 00:03:30.880
this particular event.
00:03:30.880 --> 00:03:37.170
Since this line has a slope of
z, the height at this point is
00:03:37.170 --> 00:03:38.650
equal to z.
00:03:38.650 --> 00:03:40.980
And so we can find the
probability of this event.
00:03:40.980 --> 00:03:43.260
It's just the area
of this triangle.
00:03:43.260 --> 00:03:47.100
And so the area is 1
times z times 1/2.
00:03:47.100 --> 00:03:48.760
And we get the answer, z/2.
00:03:52.190 --> 00:03:56.220
Now, is this answer
always correct?
00:03:56.220 --> 00:04:00.300
Now, this answer is going to be
correct only if the slope
00:04:00.300 --> 00:04:05.080
happens to be such that we get
a picture of this kind.
00:04:05.080 --> 00:04:07.380
So when do we get a picture
of this kind?
00:04:07.380 --> 00:04:09.460
When the slope is less than 1.
00:04:09.460 --> 00:04:13.110
If I consider a different slope,
a number, little z --
00:04:13.110 --> 00:04:15.730
that happens to be a slope
of that kind --
00:04:15.730 --> 00:04:17.670
then the picture changes.
00:04:17.670 --> 00:04:20.579
And in that case, we
get a picture of
00:04:20.579 --> 00:04:24.330
this kind, let's say.
00:04:24.330 --> 00:04:28.495
So this is a line here
of slope z, again.
00:04:31.030 --> 00:04:35.790
And this is the second case in
which our number, little z, is
00:04:35.790 --> 00:04:37.960
bigger than 1.
00:04:37.960 --> 00:04:39.740
So how do we proceed?
00:04:39.740 --> 00:04:43.690
Once more, the cumulative is the
probability that the ratio
00:04:43.690 --> 00:04:46.060
is less than or equal
to that number.
00:04:46.060 --> 00:04:50.650
So it's the probability that
we fall below the red line.
00:04:50.650 --> 00:04:56.590
So we're talking about the
event, about this event.
00:04:56.590 --> 00:04:59.450
So to find the probability of
this event, we need to find
00:04:59.450 --> 00:05:02.300
the area of this red shape.
00:05:02.300 --> 00:05:06.310
And one way of finding this area
is to consider the whole
00:05:06.310 --> 00:05:09.560
area and subtract the area
of this triangle.
00:05:09.560 --> 00:05:11.360
So let's do it this way.
00:05:11.360 --> 00:05:15.000
It's going to be 1 minus the
area of the triangle.
00:05:15.000 --> 00:05:16.750
Now, what's the area
of the triangle?
00:05:16.750 --> 00:05:24.420
It's 1/2 times this side, which
is 1 times this side.
00:05:24.420 --> 00:05:28.090
How big is that side?
00:05:28.090 --> 00:05:37.130
Well, the slope of the line is z,
and z is the ratio y over x.
00:05:37.130 --> 00:05:39.050
So if y over x--
00:05:39.050 --> 00:05:46.560
at this point we have
y/x = z and y =1.
00:05:46.560 --> 00:05:49.370
This means that z is 1/x.
00:05:49.370 --> 00:05:55.080
So the x-coordinate of
this point is 1/z.
00:05:55.080 --> 00:05:56.970
And this means that
we're going to--
00:05:56.970 --> 00:06:04.390
1/z. So here we get the
factor of 1/z.
00:06:07.300 --> 00:06:09.440
And we're basically done.
00:06:09.440 --> 00:06:12.630
I guess if you want to have a
complete answer, you should
00:06:12.630 --> 00:06:16.770
also give the formula
for z less than 0.
00:06:16.770 --> 00:06:19.510
What is the cumulative when
z is less than 0, the
00:06:19.510 --> 00:06:22.870
probability that you get the
ratio that's negative?
00:06:22.870 --> 00:06:25.140
Well, since our random variables
are positive,
00:06:25.140 --> 00:06:27.890
there's no way that you can
get a negative ratio.
00:06:27.890 --> 00:06:31.900
So the cumulative down
there is equal to 0.
00:06:31.900 --> 00:06:34.870
So we can plot the cumulative.
00:06:34.870 --> 00:06:37.965
And we can take its derivative
in order to find the density.
00:06:45.980 --> 00:06:49.720
So the cumulative that
we got starts at 0,
00:06:49.720 --> 00:06:52.000
when z's are negative.
00:06:52.000 --> 00:06:59.750
Then it starts going up
in proportion to z, at
00:06:59.750 --> 00:07:01.000
the slope of 1/2.
00:07:03.520 --> 00:07:05.770
So this takes us up to z equal to 1.
00:07:08.980 --> 00:07:14.480
And then it starts increasing
towards 1,
00:07:14.480 --> 00:07:15.790
according to this function.
00:07:15.790 --> 00:07:18.780
When you let z go to infinity,
the cumulative is
00:07:18.780 --> 00:07:20.330
going to go to 1.
00:07:20.330 --> 00:07:25.210
And it has a shape of, more
or less, this kind.
00:07:25.210 --> 00:07:29.035
So now to get the density, we
just take the derivative.
00:07:36.790 --> 00:07:40.560
And the density is, of
course, 0 down here.
00:07:40.560 --> 00:07:43.950
Up here the derivative
is just 1/2.
00:07:48.750 --> 00:07:52.470
And beyond that point we need to
take the derivative of this
00:07:52.470 --> 00:07:53.480
expression.
00:07:53.480 --> 00:07:58.700
And the derivative is going to
be 1/2 times 1 over z-squared.
00:07:58.700 --> 00:08:00.990
So it's going to be a
shape of this kind.
00:08:09.440 --> 00:08:11.730
And we're done.
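As a sanity check on this example, here is a minimal Python sketch (not part of the lecture) that encodes the cumulative and the density just derived and compares the cumulative against a simulation. The function names, the sample size, and the seed are illustrative choices.

```python
import random


def cdf_ratio(z):
    """CDF of Z = Y/X for X, Y independent and uniform on (0, 1)."""
    if z <= 0:
        return 0.0                 # positive variables cannot give a negative ratio
    if z <= 1:
        return z / 2               # area of the triangle below the line of slope z
    return 1 - 1 / (2 * z)         # unit square minus the triangle above the line


def pdf_ratio(z):
    """Density of Z, obtained by differentiating cdf_ratio piece by piece."""
    if z <= 0:
        return 0.0
    if z <= 1:
        return 0.5
    return 1 / (2 * z * z)


def empirical_cdf(z, n=200_000, seed=0):
    """Estimate P(Y/X <= z) by drawing n independent uniform pairs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x = 1.0 - rng.random()     # lies in (0, 1], so the division is safe
        y = rng.random()
        if y / x <= z:
            hits += 1
    return hits / n
```

For instance, empirical_cdf(2.0) should land near cdf_ratio(2.0), which is 1 - 1/4 = 0.75.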
00:08:11.730 --> 00:08:14.820
So you see that problems
involving functions of
00:08:14.820 --> 00:08:19.300
multiple random variables are
no harder than problems that
00:08:19.300 --> 00:08:22.320
deal with a function of
a single random variable.
00:08:22.320 --> 00:08:25.070
The general procedure is,
again, exactly the same.
00:08:25.070 --> 00:08:28.470
You first find the cumulative,
and then you differentiate.
00:08:28.470 --> 00:08:31.540
The only extra difficulty will
be that when you calculate the
00:08:31.540 --> 00:08:34.570
cumulative, you need to find
the probability of an event
00:08:34.570 --> 00:08:37.020
that involves multiple
random variables.
00:08:37.020 --> 00:08:40.730
And sometimes this could be
a little harder to do.
00:08:40.730 --> 00:08:44.910
By the way, since we dealt
with this example, just a
00:08:44.910 --> 00:08:45.920
couple of questions.
00:08:45.920 --> 00:08:49.280
What do you think is going to
be the expected value of the
00:08:49.280 --> 00:08:52.720
random variable Z?
00:08:52.720 --> 00:08:55.960
Let's see, the expected value
of the random variable Z is
00:08:55.960 --> 00:09:01.120
going to be the integral
of z times the density.
00:09:01.120 --> 00:09:10.950
And the density is equal to 1/2
for z going from 0 to 1.
00:09:10.950 --> 00:09:12.050
And then there's another
00:09:12.050 --> 00:09:14.770
contribution from 1 to infinity.
00:09:14.770 --> 00:09:17.260
There the density is
1/(2z-squared).
00:09:19.880 --> 00:09:24.630
And we get the z, since we're
dealing with expectations, dz.
00:09:24.630 --> 00:09:25.880
So what is this integral?
00:09:29.602 --> 00:09:35.070
Well, if you look here, you're
integrating 1/z, all the way
00:09:35.070 --> 00:09:36.420
to infinity.
00:09:36.420 --> 00:09:41.550
1/z has an integral, which
is the logarithm of z.
00:09:41.550 --> 00:09:44.660
And since the logarithm goes to
infinity, this means that
00:09:44.660 --> 00:09:47.770
this integral is
also infinite.
00:09:47.770 --> 00:09:53.640
So the expectation of the random
variable Z is actually
00:09:53.640 --> 00:09:55.310
infinite in this example.
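Written out, the calculation described here is the following, using the density found earlier for Z = Y/X:

```latex
\begin{align*}
E[Z] &= \int_0^1 z \cdot \frac{1}{2}\, dz + \int_1^\infty z \cdot \frac{1}{2z^2}\, dz \\
     &= \frac{1}{4} + \frac{1}{2} \int_1^\infty \frac{dz}{z} \\
     &= \frac{1}{4} + \frac{1}{2} \lim_{b \to \infty} \ln b = \infty .
\end{align*}
```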
00:09:55.310 --> 00:09:57.130
There's nothing wrong
with this.
00:09:57.130 --> 00:10:00.500
Lots of random variables have
infinite expectations.
00:10:00.500 --> 00:10:06.980
If the tail of the density falls
kind of slowly, as the
00:10:06.980 --> 00:10:10.650
argument goes to infinity, then
it may well turn out that
00:10:10.650 --> 00:10:12.430
you get an infinite integral.
00:10:12.430 --> 00:10:15.770
So that's just how
things often are.
00:10:15.770 --> 00:10:19.060
Nothing strange about it.
00:10:19.060 --> 00:10:22.700
And now, since we are still in
this example, let me ask
00:10:22.700 --> 00:10:25.680
another question.
00:10:25.680 --> 00:10:30.110
If we reason on the average,
would it be true that
00:10:30.110 --> 00:10:31.960
the expected value of Z --
00:10:31.960 --> 00:10:36.710
remember that Z is the ratio
Y/X -- could it be that the
00:10:36.710 --> 00:10:39.850
expected value of Z
is this number?
00:10:43.380 --> 00:10:48.460
Or could it be that it's
equal to this number?
00:10:53.670 --> 00:10:57.345
Or could it be that it's
none of the above?
00:11:01.140 --> 00:11:06.295
OK, so how many people think
this is correct?
00:11:12.500 --> 00:11:14.130
Small number.
00:11:14.130 --> 00:11:15.625
How many people think
this is correct?
00:11:18.290 --> 00:11:21.480
Slightly bigger, but still
a small number.
00:11:21.480 --> 00:11:24.660
And how many people think
this is correct?
00:11:24.660 --> 00:11:26.090
OK, that's--
00:11:26.090 --> 00:11:28.890
this one wins the vote.
00:11:28.890 --> 00:11:32.570
OK, let's see.
00:11:32.570 --> 00:11:37.360
This one is not correct, just
because there's no reason it
00:11:37.360 --> 00:11:39.100
should be correct.
00:11:39.100 --> 00:11:44.420
So, in general, you cannot
reason on the average.
00:11:44.420 --> 00:11:48.460
The expected value of a function
is not the same as
00:11:48.460 --> 00:11:50.950
the same function of the
expected values.
00:11:50.950 --> 00:11:53.740
This is only true if you're
dealing with linear functions
00:11:53.740 --> 00:11:54.950
of random variables.
00:11:54.950 --> 00:11:56.340
So this is not--
00:11:56.340 --> 00:11:59.400
this turns out to
not be correct.
00:11:59.400 --> 00:12:00.790
How about this one?
00:12:00.790 --> 00:12:05.820
Well, X and Y are independent,
by assumption.
00:12:05.820 --> 00:12:10.910
So 1/X and Y are also
independent.
00:12:14.000 --> 00:12:14.710
Why is this?
00:12:14.710 --> 00:12:17.150
Independence means that one
random variable does not
00:12:17.150 --> 00:12:19.670
convey any information
about the other.
00:12:19.670 --> 00:12:24.100
So Y doesn't give you any
information about X. So Y
00:12:24.100 --> 00:12:27.970
doesn't give you any information
about 1/X. Or to
00:12:27.970 --> 00:12:30.140
put it differently, if two
random variables are
00:12:30.140 --> 00:12:36.170
independent, functions of each
one of those random variables
00:12:36.170 --> 00:12:37.990
are also independent.
00:12:37.990 --> 00:12:41.360
If X is independent from
Y, then g(X) is
00:12:41.360 --> 00:12:43.700
independent of h(Y).
00:12:43.700 --> 00:12:45.280
So this applies to this case.
00:12:45.280 --> 00:12:47.780
These two random variables
are independent.
00:12:47.780 --> 00:12:50.350
And since they are independent,
this means that
00:12:50.350 --> 00:12:55.070
the expected value of their
product is equal to the
00:12:55.070 --> 00:12:57.670
product of the expected
values.
00:12:57.670 --> 00:13:02.950
So this relation actually
is true.
00:13:02.950 --> 00:13:05.632
And therefore, this
is not true.
00:13:05.632 --> 00:13:06.882
OK.
00:13:14.690 --> 00:13:17.630
Now, let's move on.
00:13:17.630 --> 00:13:22.420
We have this general procedure
of finding the derived
00:13:22.420 --> 00:13:26.770
distribution by going through
the cumulative.
00:13:26.770 --> 00:13:30.000
Are there some cases where
we can have a shortcut?
00:13:30.000 --> 00:13:34.040
Turns out that there is a
special case or a special
00:13:34.040 --> 00:13:38.340
structure in which we can get
directly from densities to
00:13:38.340 --> 00:13:42.240
densities using directly
just a formula.
00:13:42.240 --> 00:13:44.590
And in that case, we don't
have to go through the
00:13:44.590 --> 00:13:45.750
cumulative.
00:13:45.750 --> 00:13:48.580
And this case is also
interesting, because it gives
00:13:48.580 --> 00:13:52.810
us some insight about how one
density changes to a different
00:13:52.810 --> 00:13:56.430
density and what affects the
shape of those densities.
00:13:56.430 --> 00:14:00.430
So the case where things are easy
is when the transformation
00:14:00.430 --> 00:14:03.300
from one random variable to
the other is a strictly
00:14:03.300 --> 00:14:04.660
monotonic one.
00:14:04.660 --> 00:14:10.630
So there's a one-to-one relation
between x's and y's.
00:14:10.630 --> 00:14:14.790
Here we can reason directly in
terms of densities by thinking
00:14:14.790 --> 00:14:17.980
in terms of probabilities
of small intervals.
00:14:17.980 --> 00:14:23.370
So let's look at the small
interval on the x-axis, like
00:14:23.370 --> 00:14:26.540
this one, when X ranges from--
00:14:26.540 --> 00:14:30.390
where capital X ranges
from a small x to a
00:14:30.390 --> 00:14:31.760
small x plus delta.
00:14:31.760 --> 00:14:36.480
So this is a small interval
of length delta.
00:14:36.480 --> 00:14:40.060
Whenever X happens to fall in
this interval, the random
00:14:40.060 --> 00:14:42.720
variable Y is going
to fall in a
00:14:42.720 --> 00:14:45.800
corresponding interval up there.
00:14:45.800 --> 00:14:50.840
So up there we have a
corresponding interval.
00:14:50.840 --> 00:14:55.890
And these two intervals, the
red and the blue interval--
00:14:55.890 --> 00:14:57.670
this is the blue interval.
00:14:57.670 --> 00:15:01.120
And that's the red interval.
00:15:01.120 --> 00:15:05.180
These two intervals should have
the same probability.
00:15:05.180 --> 00:15:08.120
They're exactly the
same event.
00:15:08.120 --> 00:15:13.530
When X falls here, g(X) happens
to fall in there.
00:15:13.530 --> 00:15:16.610
So we can sort of say that the
probability of this little
00:15:16.610 --> 00:15:18.270
interval is the same
as the probability
00:15:18.270 --> 00:15:20.050
of that little interval.
00:15:20.050 --> 00:15:22.870
And we know that probabilities
of little intervals have
00:15:22.870 --> 00:15:25.560
something to do with
densities.
00:15:25.560 --> 00:15:28.260
So what is the probability
of this little interval?
00:15:28.260 --> 00:15:32.490
It's the density of the random
variable X, at this point,
00:15:32.490 --> 00:15:35.750
times the length of
the interval.
00:15:35.750 --> 00:15:38.990
How about the probability
of that interval?
00:15:38.990 --> 00:15:45.070
It's going to be the density of
the random variable Y times
00:15:45.070 --> 00:15:48.180
the length of that
little interval.
00:15:48.180 --> 00:15:50.310
Now, this interval
has length delta.
00:15:50.310 --> 00:15:51.760
Does that mean that
this interval
00:15:51.760 --> 00:15:53.710
also has length delta?
00:15:53.710 --> 00:15:55.440
Well, not necessarily.
00:15:55.440 --> 00:15:58.040
The length of this interval has
something to do with the
00:15:58.040 --> 00:16:01.830
slope of your function g.
00:16:01.830 --> 00:16:05.650
So slope is dy by dx.
00:16:05.650 --> 00:16:09.700
Is how much-- the slope tells
you how big is the y interval
00:16:09.700 --> 00:16:13.820
when you take an interval
x of a certain length.
00:16:13.820 --> 00:16:17.180
So the slope is what multiplies
the length of this
00:16:17.180 --> 00:16:20.430
interval to give you the length
of that interval.
00:16:20.430 --> 00:16:25.150
So the length of this interval
is delta times the slope of
00:16:25.150 --> 00:16:26.400
your function.
00:16:28.870 --> 00:16:35.400
So the length of the interval
is delta times the slope of
00:16:35.400 --> 00:16:37.450
the function, approximately.
00:16:37.450 --> 00:16:41.320
So the probability of this
interval is going to be the
00:16:41.320 --> 00:16:46.790
density of Y times the length
of the interval that we are
00:16:46.790 --> 00:16:47.940
considering.
00:16:47.940 --> 00:16:52.280
So this gives us a relation
between the density of X,
00:16:52.280 --> 00:16:57.080
evaluated at this point, to the
density of Y, evaluated at
00:16:57.080 --> 00:16:58.140
that point.
00:16:58.140 --> 00:17:00.350
The two densities are
closely related.
00:17:00.350 --> 00:17:05.130
If these x's are very likely
to occur, then this is big,
00:17:05.130 --> 00:17:08.359
which means that that density
will also be big.
00:17:08.359 --> 00:17:11.550
If these x's are very likely to
occur, then those y's are
00:17:11.550 --> 00:17:13.530
also very likely to occur.
00:17:13.530 --> 00:17:16.109
But there's also another
factor that comes in.
00:17:16.109 --> 00:17:18.660
And that's the slope
of the function at
00:17:18.660 --> 00:17:21.109
this particular point.
00:17:21.109 --> 00:17:24.500
So we have this relation between
the two densities.
00:17:24.500 --> 00:17:28.130
Now, in interpreting this
equation, you need to make
00:17:28.130 --> 00:17:30.900
sure what's the relation between
the two variables.
00:17:30.900 --> 00:17:34.670
I have both little x's
and little y's.
00:17:34.670 --> 00:17:39.330
Well, this formula is true for
an (x,y) pair in which x and y are
00:17:39.330 --> 00:17:42.300
related according to this
particular function.
00:17:42.300 --> 00:17:48.000
So if I fix an x and consider
the corresponding y, then the
00:17:48.000 --> 00:17:52.480
densities at those x's and
corresponding y's will be
00:17:52.480 --> 00:17:54.420
related by that formula.
00:17:54.420 --> 00:17:57.650
Now, in the end, you want to
come up with a formula that
00:17:57.650 --> 00:18:01.520
just gives you the density
of Y as a function of y.
00:18:01.520 --> 00:18:03.110
And that means that you need to
00:18:03.110 --> 00:18:06.040
eliminate x from the picture.
00:18:06.040 --> 00:18:11.140
So let's see how that would
go in an example.
00:18:11.140 --> 00:18:17.640
So suppose that we're dealing
with the function y equal to x
00:18:17.640 --> 00:18:21.930
cubed, in which case our
function, g(x), is the
00:18:21.930 --> 00:18:23.180
function x cubed.
00:18:26.090 --> 00:18:31.980
And if x cubed is equal to a
little y, if we have a pair of
00:18:31.980 --> 00:18:38.350
x's and y's that are related
this way, then this means that
00:18:38.350 --> 00:18:41.600
x is going to be the
cube root of y.
00:18:41.600 --> 00:18:46.550
So this is the formula that
takes us back from y's to x's.
00:18:46.550 --> 00:18:52.940
This is the direct function from
x, how to construct y.
00:18:52.940 --> 00:18:55.470
This is essentially the inverse
function that tells
00:18:55.470 --> 00:18:59.460
us, from a given y what is
the corresponding x.
00:18:59.460 --> 00:19:04.650
Now, if we write this formula,
it tells us that the density
00:19:04.650 --> 00:19:08.270
at the particular x is going
to be the density at the
00:19:08.270 --> 00:19:12.390
corresponding y times the slope
of the function at the
00:19:12.390 --> 00:19:14.770
particular x that we
are considering.
00:19:14.770 --> 00:19:17.150
The slope of the function
is 3x squared.
00:19:20.870 --> 00:19:26.590
Now, we want to end up with a
formula for the density of Y.
00:19:26.590 --> 00:19:29.510
So I'm going to take this
factor, send it
00:19:29.510 --> 00:19:31.410
to the other side.
00:19:31.410 --> 00:19:35.300
But since I want it to be a
function of y, I want to
00:19:35.300 --> 00:19:36.920
eliminate the x's.
00:19:36.920 --> 00:19:39.590
And I'm going to eliminate
the x's using this
00:19:39.590 --> 00:19:41.290
correspondence here.
00:19:41.290 --> 00:19:44.440
So I'm going to get
the density of X
00:19:44.440 --> 00:19:47.830
evaluated at y to the 1/3.
00:19:47.830 --> 00:19:50.404
And then this factor in the
denominator, it's 1/(3y to the
00:19:50.404 --> 00:19:51.654
power 2/3).
00:19:55.710 --> 00:19:59.540
So we end up finally with the
formula for the density of the
00:19:59.540 --> 00:20:02.900
random variable Y.
00:20:02.900 --> 00:20:06.900
And this is the same answer that
you would get if you go
00:20:06.900 --> 00:20:10.030
through this exercise using the
cumulative distribution
00:20:10.030 --> 00:20:11.540
function method.
00:20:11.540 --> 00:20:13.160
You end up getting
the same answer.
00:20:13.160 --> 00:20:15.205
But here we sort of
get it directly.
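To see that the direct formula and the cumulative method agree, here is a small Python sketch. It is not from the lecture: it assumes, purely for illustration, that X is uniform on (0, 1), so that the cumulative of Y = X cubed is y to the 1/3 on that range. All function names are hypothetical.

```python
def density_x(x):
    """Assumed density for illustration: X uniform on (0, 1)."""
    return 1.0 if 0.0 < x < 1.0 else 0.0


def density_y(y):
    """Direct formula for Y = X**3: f_Y(y) = f_X(y**(1/3)) / (3 * y**(2/3))."""
    if y <= 0.0:
        return 0.0                       # the assumed X is positive, so Y is too
    x = y ** (1.0 / 3.0)                 # inverse function: the x that maps to this y
    return density_x(x) / (3.0 * x * x)  # divide by the slope 3x^2 at that x


def cdf_y(y):
    """Cumulative method: P(X**3 <= y) = P(X <= y**(1/3)) = y**(1/3) for 0 < y < 1."""
    return y ** (1.0 / 3.0)


def numerical_derivative(f, y, h=1e-6):
    """Central-difference approximation to f'(y)."""
    return (f(y + h) - f(y - h)) / (2 * h)
```

Differentiating cdf_y numerically at a point such as y = 0.5 should give approximately the same value as density_y(0.5).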
00:20:19.700 --> 00:20:24.570
Just to get a little more
insight as to why
00:20:24.570 --> 00:20:25.830
the slope comes in--
00:20:29.960 --> 00:20:35.070
suppose that we have a function
like this one.
00:20:38.020 --> 00:20:45.110
So the function is sort of flat,
then moves quickly, and
00:20:45.110 --> 00:20:49.160
then becomes flat again.
00:20:49.160 --> 00:20:50.720
What should be --
00:20:50.720 --> 00:20:55.140
and suppose that X has some kind
of reasonable density,
00:20:55.140 --> 00:20:57.180
some kind of flat density.
00:20:57.180 --> 00:21:01.640
Suppose that X is a pretty
uniform random variable.
00:21:01.640 --> 00:21:04.770
What's going to happen to
the random variable Y?
00:21:04.770 --> 00:21:06.920
What kind of distribution
should it have?
00:21:14.670 --> 00:21:19.220
What are the typical values
of the random variable Y?
00:21:19.220 --> 00:21:26.960
Either x falls here, and y is
a very small number, or--
00:21:26.960 --> 00:21:30.100
let's take that number here
to be -- let's say 2 --
00:21:30.100 --> 00:21:37.290
or x falls in this range, and
y takes a value close to 2.
00:21:37.290 --> 00:21:40.210
And there's a small chance that
x's will be somewhere in
00:21:40.210 --> 00:21:44.350
the middle, in which case y
takes intermediate values.
00:21:44.350 --> 00:21:46.390
So what kind of shape do
you expect for the
00:21:46.390 --> 00:21:48.060
distribution of Y?
00:21:48.060 --> 00:21:51.900
There's going to be a fair
amount of probability that Y
00:21:51.900 --> 00:21:55.510
takes values close to 0.
00:21:55.510 --> 00:21:58.480
There's a small probability
that Y takes
00:21:58.480 --> 00:22:00.130
intermediate values.
00:22:00.130 --> 00:22:03.870
That corresponds to the case
where x falls in here.
00:22:03.870 --> 00:22:05.480
That's not a lot
of probability.
00:22:05.480 --> 00:22:11.280
So the probability that Y takes
values between 0 and 2,
00:22:11.280 --> 00:22:12.760
that's kind of small.
00:22:12.760 --> 00:22:16.860
But then there's a lot of x's
that produce y's that are
00:22:16.860 --> 00:22:18.410
close to 2.
00:22:18.410 --> 00:22:22.110
So there's a significant
probability that Y would take
00:22:22.110 --> 00:22:25.470
values that are close to 2.
00:22:25.470 --> 00:22:26.370
So you--
00:22:26.370 --> 00:22:31.300
the density of Y would have
a shape of this kind.
00:22:31.300 --> 00:22:35.280
By looking at this picture, you
can tell that it's most
00:22:35.280 --> 00:22:39.630
likely that either x will fall
here or x will fall there.
00:22:39.630 --> 00:22:44.110
So the g(x) is most likely
to be close to 0 or
00:22:44.110 --> 00:22:46.290
to be close to 2.
00:22:46.290 --> 00:22:51.420
So since y is most likely to be
close to 0 or close to 2, most
00:22:51.420 --> 00:22:53.850
of the probability
of y is here.
00:22:53.850 --> 00:22:54.570
And there's a small
00:22:54.570 --> 00:22:56.810
probability of being in between.
00:22:56.810 --> 00:23:02.330
Notice that the y's that get a
lot of probability are those
00:23:02.330 --> 00:23:07.490
y's associated with flat
regions of your g function.
00:23:07.490 --> 00:23:11.510
When the g function is flat,
that gives you big densities
00:23:11.510 --> 00:23:12.500
for Y.
00:23:12.500 --> 00:23:16.480
So the density of Y is inversely
proportional to the
00:23:16.480 --> 00:23:18.350
slope of the function.
00:23:18.350 --> 00:23:20.140
And that's what you
get from here.
00:23:20.140 --> 00:23:22.780
The density of Y is--
00:23:22.780 --> 00:23:25.430
send that term to the other
side-- is inversely
00:23:25.430 --> 00:23:28.550
proportional to the slope of
the function that you're
00:23:28.550 --> 00:23:29.800
dealing with.
00:23:32.755 --> 00:23:36.730
OK, so this formula works nicely
for the case where the
00:23:36.730 --> 00:23:38.470
function is one-to-one.
00:23:38.470 --> 00:23:42.610
So we can have a unique
association between x's and
00:23:42.610 --> 00:23:47.500
y's and through an inverse
function, from y's to x's.
00:23:47.500 --> 00:23:50.030
It works for the monotonically
increasing case.
00:23:50.030 --> 00:23:53.660
It also works for the
monotonically decreasing case.
00:23:53.660 --> 00:23:56.120
In the monotonically decreasing
case, the only
00:23:56.120 --> 00:23:59.050
change that you need to make is to
take the absolute value of
00:23:59.050 --> 00:24:01.275
the slope, instead of
the slope itself.
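Collecting the small-interval argument into one formula gives the following summary sketch. The notation g^{-1} for the inverse function is introduced here for convenience and is not used in the lecture; the formula assumes the slope is nonzero at the point in question.

```latex
\begin{align*}
P(x \le X \le x + \delta) &\approx f_X(x)\, \delta, \\
P\big(Y \text{ between } g(x) \text{ and } g(x + \delta)\big)
  &\approx f_Y(y)\, \delta \left| \frac{dg}{dx}(x) \right|, \qquad y = g(x), \\
f_Y(y) &= \frac{f_X\big(g^{-1}(y)\big)}{\left| \dfrac{dg}{dx}\big(g^{-1}(y)\big) \right|} .
\end{align*}
```

Setting the first two probabilities equal and solving for the density of Y gives the third line, which shows the inverse dependence on the slope.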
00:24:16.340 --> 00:24:22.480
OK, now, here's another example
or a special case.
00:24:22.480 --> 00:24:27.520
Let's talk about the most
interesting case that involves
00:24:27.520 --> 00:24:29.740
a function of two random
variables.
00:24:29.740 --> 00:24:34.460
And this is the case where we
have two independent, random
00:24:34.460 --> 00:24:38.190
variables, and we want to
find the distribution of
00:24:38.190 --> 00:24:40.150
the sum of the two.
00:24:40.150 --> 00:24:42.300
We're really interested in
the continuous case.
00:24:42.300 --> 00:24:45.540
But as a warm-up, it's useful
to look at the discrete case
00:24:45.540 --> 00:24:48.510
first of discrete random
variables.
00:24:48.510 --> 00:24:52.740
Let's say we want to find the
probability that the sum of X
00:24:52.740 --> 00:24:55.890
and Y is equal to a
particular number.
00:24:55.890 --> 00:24:58.570
And to illustrate this,
let's take that number
00:24:58.570 --> 00:25:00.010
to be equal to 3.
00:25:00.010 --> 00:25:02.380
What's the probability that
the sum of the two random
00:25:02.380 --> 00:25:04.700
variables is equal to 3?
00:25:04.700 --> 00:25:07.640
To find the probability that
the sum is equal to 3, you
00:25:07.640 --> 00:25:11.570
consider all possible ways that
you can get the sum of 3.
00:25:11.570 --> 00:25:14.760
And the different ways are the
points in this picture.
00:25:14.760 --> 00:25:18.100
And they correspond to a line
that goes this way.
00:25:18.100 --> 00:25:21.620
So the probability that the
sum is equal to a certain
00:25:21.620 --> 00:25:24.550
number is the probability
that --
00:25:24.550 --> 00:25:26.340
is the sum of the
probabilities of
00:25:26.340 --> 00:25:27.950
all of those points.
00:25:27.950 --> 00:25:31.190
What is a typical point
in this picture?
00:25:31.190 --> 00:25:34.470
In a typical point, the
random variable X
00:25:34.470 --> 00:25:36.490
takes a certain value.
00:25:36.490 --> 00:25:41.480
And Y takes the value that's
needed so that the sum is
00:25:41.480 --> 00:25:47.650
equal to w. Any combination of
an x with a w minus x, any
00:25:47.650 --> 00:25:51.110
such combination gives
you a sum of w.
00:25:51.110 --> 00:25:54.950
So the probability that the sum
is w is the sum over all
00:25:54.950 --> 00:25:56.120
possible x's.
00:25:56.120 --> 00:25:59.420
That's over all these points of
the probability that we get
00:25:59.420 --> 00:26:01.050
a certain x.
00:26:01.050 --> 00:26:05.630
Let's say x equals 2, times the
corresponding probability that
00:26:05.630 --> 00:26:08.570
random variable Y takes
the value 1.
00:26:08.570 --> 00:26:11.710
And why am I multiplying
probabilities here?
00:26:11.710 --> 00:26:14.070
That's where we use the
assumption that the two random
00:26:14.070 --> 00:26:16.170
variables are independent.
00:26:16.170 --> 00:26:19.610
So the probability that X takes
a certain value and Y
00:26:19.610 --> 00:26:22.870
takes the complementary value,
that probability is the
00:26:22.870 --> 00:26:26.120
product of two probabilities
because of independence.
00:26:26.120 --> 00:26:29.890
And when we write that into our
usual PMF notation, it's a
00:26:29.890 --> 00:26:31.510
formula of this kind.
00:26:31.510 --> 00:26:35.500
So this formula is called
the convolution formula.
00:26:35.500 --> 00:26:42.030
It's an operation that takes
one PMF and another PMF--
00:26:42.030 --> 00:26:44.580
we're given the PMF's
of X and Y --
00:26:44.580 --> 00:26:47.640
and produces a new PMF.
00:26:47.640 --> 00:26:50.350
So think of this formula as
giving you a transformation.
00:26:50.350 --> 00:26:53.570
You take two PMF's, you do
something with them, and you
00:26:53.570 --> 00:26:56.190
obtain a new PMF.
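A minimal sketch of this transformation, assuming each PMF is stored as a Python dictionary from values to probabilities. The function name `convolve_pmfs` and the four-sided dice are illustrative choices, not something from the lecture.

```python
from collections import defaultdict

def convolve_pmfs(p_x, p_y):
    """PMF of W = X + Y for independent X and Y, each given as {value: probability}."""
    p_w = defaultdict(float)
    for x, px in p_x.items():
        for y, py in p_y.items():
            # Independence lets us multiply: P(X = x, Y = y) = p_X(x) * p_Y(y).
            p_w[x + y] += px * py
    return dict(p_w)

# Hypothetical example: two fair four-sided dice.
die = {1: 0.25, 2: 0.25, 3: 0.25, 4: 0.25}
p_w = convolve_pmfs(die, die)
print(p_w[5])  # 0.25, from the four pairs 1+4, 2+3, 3+2, 4+1
```

Looping over all (x, y) pairs and grouping by x + y is the same sum as the convolution formula, just organized by pairs instead of by w.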
00:26:56.190 --> 00:26:59.710
This procedure, what this
formula does, is
00:26:59.710 --> 00:27:04.490
nicely illustrated sort
of mechanically.
00:27:04.490 --> 00:27:08.640
So let me show you a picture
here and illustrate how the
00:27:08.640 --> 00:27:13.310
mechanics go, in general.
00:27:13.310 --> 00:27:16.790
So you don't have these slides,
but let's just reason
00:27:16.790 --> 00:27:18.040
through it.
00:27:18.040 --> 00:27:22.220
So suppose that you are
given the PMF of X,
00:27:22.220 --> 00:27:23.110
and it has this shape.
00:27:23.110 --> 00:27:26.000
You're given the PMF of
Y. It has this shape.
00:27:26.000 --> 00:27:28.790
And somehow we are going
to do this calculation.
00:27:28.790 --> 00:27:31.940
Now, we need to do this
calculation for every value of
00:27:31.940 --> 00:27:37.190
W, in order to get the PMF of
W. Let's start by doing the
00:27:37.190 --> 00:27:40.200
calculation just for one case.
00:27:40.200 --> 00:27:43.870
Suppose that w is equal to 0, in
which case we need to find
00:27:43.870 --> 00:27:46.835
the sum over x of Px(x) times Py(-x).
00:27:50.790 --> 00:27:53.780
How do you do this calculation
graphically?
00:27:53.780 --> 00:27:59.550
It involves the PMF of X. But it
involves the PMF of Y, with
00:27:59.550 --> 00:28:02.120
the argument reversed.
00:28:02.120 --> 00:28:04.770
So how do we plot this?
00:28:04.770 --> 00:28:07.940
Well, in order to reverse the
argument, what you need is to
00:28:07.940 --> 00:28:11.230
take this PMF and flip it.
00:28:11.230 --> 00:28:13.850
So that's where it's handy
to have a pair of
00:28:13.850 --> 00:28:16.110
scissors with you.
00:28:16.110 --> 00:28:20.800
So you cut this down.
00:28:20.800 --> 00:28:26.360
And so now you take the PMF
of the random variable Y
00:28:26.360 --> 00:28:28.620
and just flip it.
00:28:28.620 --> 00:28:33.830
So what you see here is this
function where the argument is
00:28:33.830 --> 00:28:35.020
being reversed.
00:28:35.020 --> 00:28:36.260
And then what do we do?
00:28:36.260 --> 00:28:39.080
We cross-multiply
the two plots.
00:28:39.080 --> 00:28:41.070
Any entry here gets multiplied
with the
00:28:41.070 --> 00:28:43.110
corresponding entry there.
00:28:43.110 --> 00:28:46.550
And we consider all those
products and add them up.
00:28:46.550 --> 00:28:50.000
In this particular case, the
flipped PMF doesn't have any
00:28:50.000 --> 00:28:53.850
overlap with the PMF of X. So
we're going to get an answer
00:28:53.850 --> 00:28:56.190
that's equal to 0.
00:28:56.190 --> 00:29:03.320
So for w equal to 0, Pw is
going to be equal to 0, in
00:29:03.320 --> 00:29:05.210
this particular plot.
00:29:05.210 --> 00:29:08.760
Now if we have a different
value of w --
00:29:08.760 --> 00:29:09.930
oops.
00:29:09.930 --> 00:29:14.670
If we have a different value
of the argument w, then we
00:29:14.670 --> 00:29:20.530
have here the PMF of Y that's
flipped and shifted by an
00:29:20.530 --> 00:29:22.350
amount of w.
00:29:22.350 --> 00:29:25.930
So the correct picture of what
you do is to take this and
00:29:25.930 --> 00:29:30.250
displace it by a certain
amount of w.
00:29:30.250 --> 00:29:33.430
So here, how much
did I shift it?
00:29:33.430 --> 00:29:40.640
I shifted it until one
falls just below 4.
00:29:40.640 --> 00:29:44.832
So I have shifted by a
total amount of 5.
00:29:44.832 --> 00:29:50.680
So 0 falls under 5, whereas
0 initially was under 0.
00:29:50.680 --> 00:29:53.170
So I'm shifting it by 5 units.
00:29:53.170 --> 00:29:56.320
And I'm now going to
cross-multiply and add.
00:29:56.320 --> 00:29:58.220
Does this give us
the correct--
00:29:58.220 --> 00:30:00.180
does it do the correct thing?
00:30:00.180 --> 00:30:03.700
Yes, because a typical term will
be the probability that
00:30:03.700 --> 00:30:06.920
this random variable is 3 times
the probability that
00:30:06.920 --> 00:30:09.090
this random variable is 2.
00:30:09.090 --> 00:30:12.500
That's a particular way that
you can get a sum of 5.
00:30:12.500 --> 00:30:16.100
If you see here, the way that
things are aligned, it gives
00:30:16.100 --> 00:30:19.500
you all the different ways that
you can get the sum of 5.
00:30:19.500 --> 00:30:23.756
You can get the sum of 5 by
having 1 + 4, or 2 + 3, or 3 +
00:30:23.756 --> 00:30:26.140
2, or 4 + 1.
00:30:26.140 --> 00:30:28.280
You need to add the
probabilities of all those
00:30:28.280 --> 00:30:29.340
combinations.
00:30:29.340 --> 00:30:32.180
So you take this times that.
00:30:32.180 --> 00:30:34.230
That's one product term.
00:30:34.230 --> 00:30:38.220
Then this times 0,
this times that.
00:30:38.220 --> 00:30:39.480
And so on--
00:30:39.480 --> 00:30:40.560
you cross--
00:30:40.560 --> 00:30:44.710
you find all the products of the
corresponding terms, and
00:30:44.710 --> 00:30:46.190
you add them together.
00:30:46.190 --> 00:30:50.140
So it's a kind of handy
mechanical procedure for doing
00:30:50.140 --> 00:30:53.520
this calculation, especially
when the PMF's are given to
00:30:53.520 --> 00:30:55.850
you in terms of a picture.
00:30:55.850 --> 00:31:00.000
So the summary of these
mechanics, which is just what we
00:31:00.000 --> 00:31:03.530
did, is that you put the PMF's
on top of each other.
00:31:03.530 --> 00:31:06.260
You take the PMF of
Y. You flip it.
00:31:06.260 --> 00:31:10.160
And for any particular w that
you're interested in, you take
00:31:10.160 --> 00:31:14.070
this flipped PMF and shift
it by an amount of w.
00:31:14.070 --> 00:31:17.120
Given this particular shift for
a particular value of w,
00:31:17.120 --> 00:31:21.020
you cross-multiply terms and
then accumulate them or add
00:31:21.020 --> 00:31:23.280
them together.
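The flip, shift, cross-multiply, and add steps can be sketched as follows, assuming the PMFs are stored as lists where entry k is the probability of the value k. The function name `pmf_of_sum_at` and the numbers are hypothetical, chosen only so that both random variables take values 1 through 4.

```python
def pmf_of_sum_at(w, p_x, p_y):
    """P(W = w) by flip-and-shift; p_x[k] = P(X = k) and p_y[k] = P(Y = k) for k = 0, 1, 2, ..."""
    total = 0.0
    for x in range(len(p_x)):
        # After flipping p_y and shifting it by w, entry w - x of p_y sits under entry x of p_x.
        y = w - x
        if 0 <= y < len(p_y):  # positions without overlap contribute nothing
            total += p_x[x] * p_y[y]
    return total

# Hypothetical PMFs on the values 1..4 (the value 0 has probability 0).
p_x = [0.0, 0.1, 0.2, 0.3, 0.4]
p_y = [0.0, 0.4, 0.3, 0.2, 0.1]
print(pmf_of_sum_at(0, p_x, p_y))  # 0.0: the only aligned entries are the two zeros at value 0
print(pmf_of_sum_at(5, p_x, p_y))  # adds the products for 1+4, 2+3, 3+2, 4+1
```

Calling the function once per value of w gives the whole PMF of W.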
00:31:23.280 --> 00:31:26.620
What would you expect to happen
in the continuous case?
00:31:26.620 --> 00:31:28.600
Well, the story is familiar.
00:31:28.600 --> 00:31:32.520
In the continuous case, pretty
much, almost always things
00:31:32.520 --> 00:31:34.730
work out the same way,
except that we
00:31:34.730 --> 00:31:37.260
replace PMF's by PDF's.
00:31:37.260 --> 00:31:42.930
And we replace sums
by integrals.
00:31:42.930 --> 00:31:47.430
So there shouldn't be any
surprise here that you get a
00:31:47.430 --> 00:31:49.680
formula of this kind.
00:31:49.680 --> 00:31:54.030
The density of W can be obtained
from the density of X
00:31:54.030 --> 00:31:58.740
and the density of Y by
calculating this integral.
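For reference, the formula being described on the slide is presumably the standard continuous convolution formula, written here next to its discrete counterpart. The notation f for densities and p for PMFs is an assumption about the slide's notation.

```latex
\[
p_W(w) = \sum_{x} p_X(x)\, p_Y(w - x)
\qquad\longrightarrow\qquad
f_W(w) = \int_{-\infty}^{\infty} f_X(x)\, f_Y(w - x)\, dx
\]
```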
00:31:58.740 --> 00:32:03.440
Essentially, what this integral
does is it fixes a
00:32:03.440 --> 00:32:05.130
particular w of interest.
00:32:05.130 --> 00:32:07.870
We're interested in the
probability that the random
00:32:07.870 --> 00:32:13.160
variable, capital W, takes a
value equal to little w or
00:32:13.160 --> 00:32:14.820
values close to it.
00:32:14.820 --> 00:32:17.240
So this corresponds to the
event, which is this
00:32:17.240 --> 00:32:21.120
particular line on the
two-dimensional space.
00:32:21.120 --> 00:32:24.140
So we need to
sort of add
00:32:24.140 --> 00:32:25.990
probabilities along that line.
00:32:25.990 --> 00:32:28.620
But since the setting is
continuous, we will not add
00:32:28.620 --> 00:32:29.220
probabilities.
00:32:29.220 --> 00:32:31.120
We're going to integrate.
00:32:31.120 --> 00:32:35.430
And for any typical point in
this picture, the probability
00:32:35.430 --> 00:32:39.330
of obtaining an outcome in this
neighborhood is the--
00:32:39.330 --> 00:32:43.460
has something to do with the
density of that particular x
00:32:43.460 --> 00:32:47.190
and the density of the
particular y that would
00:32:47.190 --> 00:32:50.750
complement x, in order
to form a sum of w.
00:32:50.750 --> 00:32:55.640
So this integral that we have
here is really an integral
00:32:55.640 --> 00:32:59.382
over this particular line.
00:32:59.382 --> 00:33:02.440
OK, so I'm going to
skip the formal
00:33:02.440 --> 00:33:04.010
derivation of this result.
00:33:04.010 --> 00:33:06.830
There's a couple of derivations
in the text.
00:33:06.830 --> 00:33:10.330
And the one which is outlined
here is yet a third
00:33:10.330 --> 00:33:11.500
derivation.
00:33:11.500 --> 00:33:14.300
But the easiest way to make
sense of this formula is to
00:33:14.300 --> 00:33:18.270
consider what happens in
the discrete case.
00:33:18.270 --> 00:33:22.010
So for the rest of the lecture
we're going to consider a few
00:33:22.010 --> 00:33:27.280
extra, more miscellaneous
topics, a few remarks, and a
00:33:27.280 --> 00:33:29.100
few more definitions.
00:33:29.100 --> 00:33:31.740
So let's change--
00:33:31.740 --> 00:33:35.325
flip a page and consider
the next mini topic.
00:33:38.670 --> 00:33:41.370
There's not going to be anything
deep here, but just
00:33:41.370 --> 00:33:44.550
something that's worth
being familiar with.
00:33:44.550 --> 00:33:47.570
If you have two independent,
normal random variables with
00:33:47.570 --> 00:33:50.920
certain parameters, the question
is, what does the
00:33:50.920 --> 00:33:55.160
joint PDF look like?
00:33:55.160 --> 00:33:58.970
So if they're independent, by
definition the joint PDF is
00:33:58.970 --> 00:34:01.760
the product of the
individual PDF's.
00:34:01.760 --> 00:34:04.840
And the PDF's each one
of them involves an
00:34:04.840 --> 00:34:07.030
exponential of something.
00:34:07.030 --> 00:34:11.290
The product of two exponentials
is the
00:34:11.290 --> 00:34:13.389
exponential of the sum.
00:34:13.389 --> 00:34:15.400
So you just add the exponents.
00:34:15.400 --> 00:34:18.320
So this is the formula
for the joint PDF.
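Written out, assuming X is normal with mean mu x and standard deviation sigma x, and Y is normal with mean mu y and standard deviation sigma y, the product of the two densities is:

```latex
\[
f_{X,Y}(x, y) = f_X(x)\, f_Y(y)
= \frac{1}{2\pi \sigma_x \sigma_y}
\exp\left\{ -\frac{(x - \mu_x)^2}{2\sigma_x^2} - \frac{(y - \mu_y)^2}{2\sigma_y^2} \right\}
\]
```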
00:34:18.320 --> 00:34:20.790
Now, you look at that formula
and you ask, what
00:34:20.790 --> 00:34:23.969
does it look like?
00:34:23.969 --> 00:34:27.780
OK, you can understand it, a
function of two variables by
00:34:27.780 --> 00:34:30.530
thinking about the contours
of this function.
00:34:30.530 --> 00:34:32.850
Look at the points at
which the function
00:34:32.850 --> 00:34:34.389
takes a constant value.
00:34:34.389 --> 00:34:34.920
Where is it?
00:34:34.920 --> 00:34:37.139
When is it constant?
00:34:37.139 --> 00:34:40.150
What's the shape of
the set of points
00:34:40.150 --> 00:34:42.239
where this is a constant?
00:34:42.239 --> 00:34:46.610
So consider all x's and y's for
which this expression here
00:34:46.610 --> 00:34:51.179
is a constant, that this
expression here is a constant.
00:34:51.179 --> 00:34:53.250
What kind of shape is this?
00:34:53.250 --> 00:34:56.170
This is an ellipse.
00:34:56.170 --> 00:35:01.880
And it's an ellipse that's
centered at--
00:35:01.880 --> 00:35:06.530
it's centered at mu x, mu y.
00:35:06.530 --> 00:35:09.760
These are the means of the
two random variables.
00:35:09.760 --> 00:35:13.760
If those sigmas were equal,
that ellipse would
00:35:13.760 --> 00:35:16.970
be actually a circle.
00:35:16.970 --> 00:35:20.210
And you would get contours
of this kind.
00:35:20.210 --> 00:35:23.870
But if, on the other hand, the
sigmas are different, you're
00:35:23.870 --> 00:35:29.900
going to get an ellipse that
has contours of this kind.
00:35:29.900 --> 00:35:32.930
So if my contours are
of this kind, that
00:35:32.930 --> 00:35:35.820
corresponds to what?
00:35:35.820 --> 00:35:39.395
Sigma x being bigger than
sigma y or vice versa.
00:35:42.760 --> 00:35:47.970
OK, contours of this kind
basically tell you that X is
00:35:47.970 --> 00:35:53.610
more likely to be spread out
than Y. So the range of
00:35:53.610 --> 00:35:55.600
possible x's is bigger.
00:35:55.600 --> 00:36:04.610
And X out here is as likely
as a Y up there.
00:36:04.610 --> 00:36:08.920
So big X's have roughly the same
probability as certain
00:36:08.920 --> 00:36:10.260
smaller y's.
00:36:10.260 --> 00:36:14.520
So in a picture of this kind,
the variance of X is going to
00:36:14.520 --> 00:36:17.710
be bigger than the
variance of Y.
00:36:17.710 --> 00:36:20.890
So depending on how these
variances compare with each
00:36:20.890 --> 00:36:22.520
other, that's going
to determine the
00:36:22.520 --> 00:36:24.180
shape of the ellipse.
00:36:24.180 --> 00:36:27.470
If the variance of Y were
bigger, then your ellipse
00:36:27.470 --> 00:36:28.720
would be the other way.
00:36:28.720 --> 00:36:32.400
It would be elongated in
the other dimension.
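In symbols, the joint PDF of two independent normals is constant exactly where its exponent is constant, so a contour is the set of points satisfying the equation below for some constant c > 0. The symbol c is introduced here only for illustration.

```latex
\[
\frac{(x - \mu_x)^2}{\sigma_x^2} + \frac{(y - \mu_y)^2}{\sigma_y^2} = c
\]
```

This is an ellipse centered at (mu x, mu y) whose half-widths along the x and y axes are sigma x times the square root of c and sigma y times the square root of c, so the larger sigma gives the longer axis, and equal sigmas give a circle.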
00:36:32.400 --> 00:36:34.150
Just visualize it
a little more.
00:36:34.150 --> 00:36:37.120
Let me throw at you a
particular picture.
00:36:37.120 --> 00:36:39.820
This is one--
00:36:39.820 --> 00:36:43.830
this is a picture of
one special case.
00:36:43.830 --> 00:36:46.600
Here, I think, the variances
are equal.
00:36:46.600 --> 00:36:48.340
That's the kind of shape
that you get.
00:36:48.340 --> 00:36:51.330
It looks like a two-dimensional
bell.
00:36:51.330 --> 00:36:54.700
So remember, for normal random
variables, for a single
00:36:54.700 --> 00:36:57.960
random variable you get a
PDF that's bell shaped.
00:36:57.960 --> 00:37:00.360
That's just a bell-shaped
curve.
00:37:00.360 --> 00:37:04.740
In the two-dimensional case, we
get the joint PDF, which is
00:37:04.740 --> 00:37:05.960
bell shaped again.
00:37:05.960 --> 00:37:09.750
And now it looks more like a
real bell, the way it would be
00:37:09.750 --> 00:37:12.550
laid out in ordinary space.
00:37:12.550 --> 00:37:15.060
And if you look at the contours
of this function, the
00:37:15.060 --> 00:37:18.950
places where the function is
equal, the typical contour
00:37:18.950 --> 00:37:21.270
would have this shape here.
00:37:21.270 --> 00:37:23.090
And it would be an ellipse.
00:37:23.090 --> 00:37:28.320
And in this case, actually, it
will be more like a circle.
00:37:28.320 --> 00:37:32.650
So these would be the different
contours for
00:37:32.650 --> 00:37:33.900
different--
00:37:36.820 --> 00:37:38.520
so the contours are
places where the
00:37:38.520 --> 00:37:40.550
joint PDF is a constant.
00:37:40.550 --> 00:37:43.200
When you change the value of
that constant, you get the
00:37:43.200 --> 00:37:44.620
different contours.
00:37:44.620 --> 00:37:50.790
And the PDF is, of course,
centered around the mean of
00:37:50.790 --> 00:37:52.350
the two random variables.
00:37:52.350 --> 00:37:55.970
So in this particular case,
since the bell is centered
00:37:55.970 --> 00:38:00.270
around the (0, 0) vector, this
is a plot of a bivariate
00:38:00.270 --> 00:38:02.245
normal with 0 means.
00:38:05.370 --> 00:38:08.680
OK, there's--
00:38:08.680 --> 00:38:14.990
bivariate normals are also
interesting when your bell is
00:38:14.990 --> 00:38:17.280
oriented differently in space.
00:38:17.280 --> 00:38:21.090
We talked about ellipses that
are this way, ellipses that
00:38:21.090 --> 00:38:22.170
are this way.
00:38:22.170 --> 00:38:26.800
You could imagine also bells
that you take them, you squash
00:38:26.800 --> 00:38:30.200
them somehow, so that they
become narrow in one dimension
00:38:30.200 --> 00:38:32.700
and then maybe rotate them.
00:38:32.700 --> 00:38:33.720
So if you had--
00:38:33.720 --> 00:38:37.120
we're not going to go into this
subject, but if you had a
00:38:37.120 --> 00:38:46.580
joint pdf whose contours were
like this, what would that
00:38:46.580 --> 00:38:47.450
correspond to?
00:38:47.450 --> 00:38:51.220
Would your x's and y's
be independent?
00:38:51.220 --> 00:38:51.720
No.
00:38:51.720 --> 00:38:54.840
This would indicate that there's
a relation between the
00:38:54.840 --> 00:38:55.870
x's and the y's.
00:38:55.870 --> 00:38:59.280
That is, when you have bigger
x's, you would expect to also
00:38:59.280 --> 00:39:01.370
get bigger y's.
00:39:01.370 --> 00:39:04.530
So it would be a case of
dependent normals.
00:39:04.530 --> 00:39:09.100
And we're coming back to
this point in a second.
00:39:09.100 --> 00:39:13.710
Before we get to that point in
a second that has to do with
00:39:13.710 --> 00:39:16.840
the dependencies between the
random variables, let's just
00:39:16.840 --> 00:39:18.480
do another digression.
00:39:18.480 --> 00:39:23.700
If we have our two normals that
are independent, as we
00:39:23.700 --> 00:39:28.570
discussed here, we can go and
apply the formula, the
00:39:28.570 --> 00:39:31.770
convolution formula that we
were just discussing.
00:39:31.770 --> 00:39:35.250
Suppose you want to find the
distribution of the sum of
00:39:35.250 --> 00:39:37.160
these two independent normals.
00:39:37.160 --> 00:39:39.100
How do you do this?
00:39:39.100 --> 00:39:42.730
There is a closed-form formula
for the density of the sum,
00:39:42.730 --> 00:39:44.120
which is this one.
00:39:44.120 --> 00:39:47.530
We do have formulas for the
density of X and the density
00:39:47.530 --> 00:39:50.840
of Y, because both of them are
normal, random variables.
00:39:50.840 --> 00:39:54.820
So you need to calculate this
particular integral here.
00:39:54.820 --> 00:39:57.300
It's an integral with
respect to x.
00:39:57.300 --> 00:39:59.770
And you have to calculate
this integral for any
00:39:59.770 --> 00:40:03.190
given value of w.
00:40:03.190 --> 00:40:05.660
So this is an exercise
in integration,
00:40:05.660 --> 00:40:07.230
which is not very difficult.
00:40:07.230 --> 00:40:10.460
And it turns out that after you
do everything, you end up
00:40:10.460 --> 00:40:12.680
with an answer that
has this form.
00:40:12.680 --> 00:40:14.340
And you look at that,
and you suddenly
00:40:14.340 --> 00:40:16.930
recognize, oh, this is normal.
00:40:16.930 --> 00:40:20.760
And conclusion from this
exercise, once it's done, is
00:40:20.760 --> 00:40:23.150
that the sum of two independent
normal random
00:40:23.150 --> 00:40:26.370
variables is also normal.
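A rough numerical check of this conclusion, not a derivation: the sketch below approximates the convolution integral with a midpoint Riemann sum for two zero-mean normals and compares it with the normal density whose variance is the sum of the two variances. The sigma values, the integration range, and the function names are assumptions made for the illustration.

```python
import math

def normal_pdf(x, sigma):
    """Density of a zero-mean normal with standard deviation sigma."""
    return math.exp(-x * x / (2 * sigma * sigma)) / (sigma * math.sqrt(2 * math.pi))

def density_of_sum(w, sigma_x, sigma_y, lo=-40.0, hi=40.0, steps=8000):
    """Midpoint Riemann sum for the integral of f_X(x) * f_Y(w - x) over x."""
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += normal_pdf(x, sigma_x) * normal_pdf(w - x, sigma_y)
    return total * dx

sigma_x, sigma_y = 1.0, 2.0  # hypothetical standard deviations
sigma_w = math.sqrt(sigma_x ** 2 + sigma_y ** 2)  # variances add under independence
for w in (0.0, 1.5, -3.0):
    # The two printed densities should agree to many decimal places.
    print(w, density_of_sum(w, sigma_x, sigma_y), normal_pdf(w, sigma_w))
```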
00:40:26.370 --> 00:40:31.900
Now, the mean of W is, of
course, going to be equal to
00:40:31.900 --> 00:40:35.710
the sum of the means of X and
Y. In this case, in this
00:40:35.710 --> 00:40:37.660
formula I took the
means to be 0.
00:40:37.660 --> 00:40:40.850
So the mean of W is also
going to be 0.
00:40:40.850 --> 00:40:43.650
In the more general case, the
mean of W is going to be just
00:40:43.650 --> 00:40:45.560
the sum of the two means.
00:40:45.560 --> 00:40:49.680
The variance of W is always the
sum of the variances of X
00:40:49.680 --> 00:40:53.350
and Y, since we have independent
random variables.
00:40:53.350 --> 00:40:55.700
So there's no surprise here.
00:40:55.700 --> 00:40:59.990
The main surprise in this
calculation is this fact here,
00:40:59.990 --> 00:41:02.720
that the sum of independent
normal random
00:41:02.720 --> 00:41:04.170
variables is normal.
00:41:04.170 --> 00:41:07.210
I had mentioned this fact
in a previous lecture.
00:41:07.210 --> 00:41:12.070
Here what we're doing is to
basically outline the argument
00:41:12.070 --> 00:41:14.640
that justifies this
particular fact.
00:41:14.640 --> 00:41:17.540
It's an exercise in integration,
where you realize
00:41:17.540 --> 00:41:22.680
that when you convolve two
normal curves, you also get
00:41:22.680 --> 00:41:26.850
back a normal one once more.
00:41:26.850 --> 00:41:30.230
So now, let's return to the
comment I was making here,
00:41:30.230 --> 00:41:33.160
that if you have a contour plot
that has, let's say, a
00:41:33.160 --> 00:41:36.640
shape of this kind, this
indicates some kind of
00:41:36.640 --> 00:41:39.620
dependence between your
two random variables.
00:41:39.620 --> 00:41:43.470
So instead of a contour plot,
let me throw in here a
00:41:43.470 --> 00:41:44.990
scatter diagram.
00:41:44.990 --> 00:41:47.580
What does this scatter
diagram correspond to?
00:41:47.580 --> 00:41:50.650
Suppose you have a discrete
distribution, and each one of
00:41:50.650 --> 00:41:54.760
the points in this diagram
has positive probability.
00:41:54.760 --> 00:41:58.600
When you look at this diagram,
what would you say?
00:41:58.600 --> 00:42:06.890
I would say that when
y is big then x
00:42:06.890 --> 00:42:09.400
also tends to be larger.
00:42:09.400 --> 00:42:15.580
So bigger x's are sort of
associated with bigger y's in
00:42:15.580 --> 00:42:18.160
some average, statistical
sense.
00:42:18.160 --> 00:42:21.410
Whereas, if you have a picture
of this kind, it tells you of an
00:42:21.410 --> 00:42:26.980
association that the positive
y's tend to be associated with
00:42:26.980 --> 00:42:30.090
negative x's most of the time.
00:42:30.090 --> 00:42:34.410
Negative y's tend to be
associated mostly with
00:42:34.410 --> 00:42:35.660
positive x's.
00:42:38.510 --> 00:42:42.210
So here there's a relation
that when one variable is
00:42:42.210 --> 00:42:45.790
large, the other one is also
expected to be large.
00:42:45.790 --> 00:42:48.800
Here there's a relation
of the opposite kind.
00:42:48.800 --> 00:42:50.910
How can we capture
this relation
00:42:50.910 --> 00:42:52.310
between two random variables?
00:42:52.310 --> 00:42:56.090
The way we capture it is by
defining this concept called
00:42:56.090 --> 00:43:03.090
the covariance, that looks at
the relation of was X bigger
00:43:03.090 --> 00:43:04.160
than usual?
00:43:04.160 --> 00:43:06.520
That's the question, whether
this is positive.
00:43:06.520 --> 00:43:10.110
And how does this relate to the
answer-- to the question,
00:43:10.110 --> 00:43:13.160
was Y bigger than usual?
00:43:13.160 --> 00:43:16.290
We're asking-- by calculating
this quantity, we're sort of
00:43:16.290 --> 00:43:19.820
asking the question, is there a
systematic relation between
00:43:19.820 --> 00:43:25.790
having a big X with
having a big Y?
00:43:25.790 --> 00:43:28.590
OK, to understand more
precisely what this does,
00:43:28.590 --> 00:43:32.290
let's suppose that the random
variables have 0 means, so that
00:43:32.290 --> 00:43:33.610
we get rid of this--
00:43:33.610 --> 00:43:35.290
get rid of some clutter.
00:43:35.290 --> 00:43:38.940
So the covariance is defined
just as this product.
00:43:38.940 --> 00:43:40.760
What does this do?
00:43:40.760 --> 00:43:45.120
If positive x's tend to go
together with positive y's,
00:43:45.120 --> 00:43:49.080
and negative x's tend to go
together with negative y's,
00:43:49.080 --> 00:43:51.860
this product will always
be positive.
00:43:51.860 --> 00:43:54.880
And the covariance will
end up being positive.
00:43:54.880 --> 00:43:59.090
In particular, if you sit down
with a scatter diagram and
00:43:59.090 --> 00:44:01.220
you do the calculations,
you'll find that the
00:44:01.220 --> 00:44:05.480
covariance of X and Y in this
diagram would be positive,
00:44:05.480 --> 00:44:09.680
because here, most of the time,
X times Y is positive.
00:44:09.680 --> 00:44:12.130
There's going to be a few
negative terms, but there are
00:44:12.130 --> 00:44:14.300
fewer than the positive ones.
00:44:14.300 --> 00:44:17.000
So this is a case of a
positive covariance.
00:44:17.000 --> 00:44:19.570
It indicates a positive relation
between the two
00:44:19.570 --> 00:44:20.450
random variables.
00:44:20.450 --> 00:44:24.700
When one is big, the other
also tends to be big.
00:44:24.700 --> 00:44:26.320
This is the opposite
situation.
00:44:26.320 --> 00:44:28.070
Here, when one variable--
00:44:28.070 --> 00:44:31.000
here, most of the action happens
in this quadrant and
00:44:31.000 --> 00:44:35.530
that quadrant, which means that
X times Y, most of the
00:44:35.530 --> 00:44:37.150
time, is negative.
00:44:37.150 --> 00:44:39.130
You get a few positive
contributions,
00:44:39.130 --> 00:44:40.430
but there are few.
00:44:40.430 --> 00:44:44.430
When you add things up, the
negative terms dominate.
00:44:44.430 --> 00:44:46.510
And in this case we
have covariance of
00:44:46.510 --> 00:44:49.560
X and Y being negative.
00:44:49.560 --> 00:44:53.080
So a positive covariance
indicates a sort of systematic
00:44:53.080 --> 00:44:56.280
relation, that there's a
positive association between
00:44:56.280 --> 00:44:57.370
the two random variables.
00:44:57.370 --> 00:45:00.280
When one is large, the other
also tends to be large.
00:45:00.280 --> 00:45:03.060
Negative covariance is
sort of the opposite.
00:45:03.060 --> 00:45:05.690
When one tends to be
large, the other
00:45:05.690 --> 00:45:09.920
variable tends to be small.
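A small sketch of the covariance calculation for a discrete distribution, assuming the joint PMF is stored as a dictionary from (x, y) pairs to probabilities. The two four-point diagrams are made up to mimic an upward-sloping and a downward-sloping scatter diagram.

```python
def covariance(joint):
    """cov(X, Y) = E[(X - E[X]) * (Y - E[Y])] for a joint PMF {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in joint.items())
    mean_y = sum(p * y for (x, y), p in joint.items())
    return sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())

# Hypothetical scatter diagrams with four equally likely points each.
upward = {(-2, -1): 0.25, (-1, -2): 0.25, (1, 2): 0.25, (2, 1): 0.25}
downward = {(-2, 1): 0.25, (-1, 2): 0.25, (1, -2): 0.25, (2, -1): 0.25}
print(covariance(upward))    # positive: x and y have the same sign at every point
print(covariance(downward))  # negative: x and y have opposite signs at every point
```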
00:45:09.920 --> 00:45:15.050
OK, so what else is there to
say about the covariance?
00:45:15.050 --> 00:45:18.280
One observation to make
is the following.
00:45:18.280 --> 00:45:21.105
What's the covariance
of X with X itself?
00:45:23.940 --> 00:45:28.220
If you plug in X here, you see
that what we have is the expected
00:45:28.220 --> 00:45:32.130
value of (X minus the expected
value of X) squared.
00:45:32.130 --> 00:45:33.790
And that's just the
definition of the
00:45:33.790 --> 00:45:36.170
variance of a random variable.
00:45:36.170 --> 00:45:41.180
So that's one fact
to keep in mind.
00:45:41.180 --> 00:45:44.620
We had a shortcut formula for
calculating variances.
00:45:44.620 --> 00:45:46.900
There's a similar shortcut
formula for calculating
00:45:46.900 --> 00:45:48.380
covariances.
00:45:48.380 --> 00:45:51.440
In particular, we can calculate
covariances in this
00:45:51.440 --> 00:45:52.720
particular way.
00:45:52.720 --> 00:45:56.500
That's just the convenient way
of doing it whenever you need
00:45:56.500 --> 00:45:57.940
to calculate it.
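The shortcut on the slide is presumably the standard one. It follows by expanding the product in the definition and using linearity of expectation:

```latex
\[
\operatorname{cov}(X, Y)
= E\big[(X - E[X])(Y - E[Y])\big]
= E[XY] - E[X]E[Y] - E[X]E[Y] + E[X]E[Y]
= E[XY] - E[X]\,E[Y]
\]
```

Setting Y = X recovers the shortcut formula for the variance, E[X squared] minus (E[X]) squared.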
00:45:57.940 --> 00:46:02.690
And finally, covariances are
very useful when you want to
00:46:02.690 --> 00:46:06.420
calculate the variance of a
sum of random variables.
00:46:08.940 --> 00:46:12.610
We know that if two random
variables are independent, the
00:46:12.610 --> 00:46:16.270
variance of the sum is the
sum of the variances.
00:46:16.270 --> 00:46:20.310
When the random variables are
dependent, this is no longer
00:46:20.310 --> 00:46:23.100
true, and we need to supplement
the formula a
00:46:23.100 --> 00:46:24.200
little bit.
00:46:24.200 --> 00:46:26.240
And there's a typo on
the slides that you
00:46:26.240 --> 00:46:27.680
have in your hands.
00:46:27.680 --> 00:46:32.870
That term of 2 shouldn't
be there.
00:46:32.870 --> 00:46:36.870
And let's see where that
formula comes from.
00:46:41.550 --> 00:46:44.290
Let's suppose that our
random variables are
00:46:44.290 --> 00:46:46.530
independent of --
00:46:46.530 --> 00:46:47.530
not independent --
00:46:47.530 --> 00:46:49.395
our random variables
have 0 means.
00:46:55.680 --> 00:46:57.990
And we want to calculate
the variance.
00:46:57.990 --> 00:47:00.900
So the variance is going
to be expected value of
00:47:00.900 --> 00:47:04.150
(X1 plus ... plus Xn) squared.
00:47:04.150 --> 00:47:07.140
What you do is you expand
the square.
00:47:07.140 --> 00:47:12.670
And you get the expected value
of the sum of the Xi squared.
00:47:12.670 --> 00:47:14.780
And then you get all
the cross terms.
00:47:23.070 --> 00:47:24.510
OK.
00:47:24.510 --> 00:47:29.420
And so now, here, let's
assume for simplicity
00:47:29.420 --> 00:47:30.880
that we have 0 means.
00:47:30.880 --> 00:47:34.200
The expected value of this is
the sum of the expected values
00:47:34.200 --> 00:47:36.300
of the X squared terms.
00:47:36.300 --> 00:47:38.430
And that gives us
the variance.
00:47:38.430 --> 00:47:41.560
And then we have all the
possible cross terms.
00:47:41.560 --> 00:47:44.220
And each one of the possible
cross terms is the expected
00:47:44.220 --> 00:47:46.620
value of Xi times Xj.
00:47:46.620 --> 00:47:49.250
This is just the covariance.
00:47:49.250 --> 00:47:52.730
So if you can calculate all
the variances and the
00:47:52.730 --> 00:47:56.210
covariances, then you're able to
calculate also the variance
00:47:56.210 --> 00:47:58.540
of a sum of random variables.
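Collected in one line for the zero-mean case, the argument reads as follows. The index notation is an assumption about how the slide writes it.

```latex
\[
\operatorname{var}\Big(\sum_{i=1}^{n} X_i\Big)
= E\Big[\Big(\sum_{i=1}^{n} X_i\Big)^{2}\Big]
= \sum_{i=1}^{n} E[X_i^2] + \sum_{(i,j):\, i \neq j} E[X_i X_j]
= \sum_{i=1}^{n} \operatorname{var}(X_i) + \sum_{(i,j):\, i \neq j} \operatorname{cov}(X_i, X_j)
\]
```

The second sum runs over ordered pairs, so each pair of distinct variables already appears twice, once as (i, j) and once as (j, i), and no extra factor of 2 is needed in front of it.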
00:47:58.540 --> 00:48:03.260
Now, if two random variables are
independent, then you look
00:48:03.260 --> 00:48:04.800
at this expression.
00:48:04.800 --> 00:48:07.700
Because of independence,
expected value of the product
00:48:07.700 --> 00:48:10.990
is going to be the product
of the expected values.
00:48:10.990 --> 00:48:14.080
And the expected value
of just this term is
00:48:14.080 --> 00:48:15.910
always equal to 0.
00:48:15.910 --> 00:48:19.790
Your expected deviation
from the mean is just 0.
00:48:19.790 --> 00:48:22.650
So the covariance will
turn out to be 0.
00:48:22.650 --> 00:48:25.110
So independent random
variables lead to 0
00:48:25.110 --> 00:48:28.320
covariances, although the
opposite fact is not
00:48:28.320 --> 00:48:30.160
necessarily true.
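A standard counterexample for the opposite direction, sketched with exact fractions: take X uniform on -1, 0, 1 and let Y be X squared. This example is not from the lecture and is chosen only for illustration.

```python
from fractions import Fraction

# Hypothetical example: X is uniform on {-1, 0, 1} and Y = X**2.
third = Fraction(1, 3)
joint = {(-1, 1): third, (0, 0): third, (1, 1): third}

mean_x = sum(p * x for (x, y), p in joint.items())
mean_y = sum(p * y for (x, y), p in joint.items())
cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())

# Independence would require P(X = 0, Y = 0) = P(X = 0) * P(Y = 0).
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)

print(cov)                         # 0
print(joint[(0, 0)], p_x0 * p_y0)  # 1/3 1/9
```

The covariance is 0, yet Y is completely determined by X, so the two random variables are certainly not independent.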
00:48:30.160 --> 00:48:33.250
So covariances give you some
indication of the relation
00:48:33.250 --> 00:48:35.430
between two random variables.
00:48:35.430 --> 00:48:38.370
Something that's not so
convenient conceptually about
00:48:38.370 --> 00:48:41.440
covariances is that it
has the wrong units.
00:48:41.440 --> 00:48:43.290
That's the same comment
that we had
00:48:43.290 --> 00:48:45.520
made regarding variances.
00:48:45.520 --> 00:48:48.730
And with variances we got out
of that issue by considering
00:48:48.730 --> 00:48:52.540
the standard deviation, which
has the correct units.
00:48:52.540 --> 00:48:58.090
So with the same reasoning, we
want to have a concept that
00:48:58.090 --> 00:49:02.150
captures the relation between
two random variables and, in
00:49:02.150 --> 00:49:05.790
some sense, that doesn't have
to do with the units that
00:49:05.790 --> 00:49:07.050
we're dealing with.
00:49:07.050 --> 00:49:10.630
We want to have a dimensionless
quantity
00:49:10.630 --> 00:49:14.040
that tells us how strongly two
random variables are related
00:49:14.040 --> 00:49:16.010
to each other.
00:49:16.010 --> 00:49:21.180
So instead of considering the
covariance of just X with Y,
00:49:21.180 --> 00:49:24.860
we take our random variables
and standardize them by
00:49:24.860 --> 00:49:28.430
dividing them by their
individual standard deviations
00:49:28.430 --> 00:49:30.460
and take the expectation
of this.
00:49:30.460 --> 00:49:34.780
So what we end up doing is taking the
covariance of X and Y, which
00:49:34.780 --> 00:49:39.160
has units that are the units of
X times the units of Y, and
00:49:39.160 --> 00:49:41.710
dividing by the standard
deviations, so that we get a
00:49:41.710 --> 00:49:44.090
quantity that doesn't
have units.
00:49:44.090 --> 00:49:47.890
This quantity, we call it the
correlation coefficient.
00:49:47.890 --> 00:49:51.060
And it's a very useful quantity,
a very useful
00:49:51.060 --> 00:49:53.610
measure of the strength
of association
00:49:53.610 --> 00:49:55.580
between two random variables.
00:49:55.580 --> 00:49:59.750
It's very informative, because
it falls always
00:49:59.750 --> 00:50:02.330
between -1 and +1.
00:50:02.330 --> 00:50:06.240
This is an algebraic exercise
that you're going to see in
00:50:06.240 --> 00:50:07.780
recitation.
00:50:07.780 --> 00:50:10.600
And the way that you interpret
it is as follows.
00:50:10.600 --> 00:50:13.360
If the two random variables
are independent, the
00:50:13.360 --> 00:50:15.390
covariance is going to be 0.
00:50:15.390 --> 00:50:18.170
The correlation coefficient
is going to be 0.
00:50:18.170 --> 00:50:23.340
So 0 correlation coefficient
basically indicates a lack of
00:50:23.340 --> 00:50:26.570
a systematic relation between
the two random variables.
00:50:26.570 --> 00:50:31.710
On the other hand, when rho is
large, either close to 1 or
00:50:31.710 --> 00:50:34.850
close to -1, this is an
indication of a strong
00:50:34.850 --> 00:50:37.660
association between the
two random variables.
00:50:37.660 --> 00:50:42.770
And the extreme case is when
rho takes an extreme value.
00:50:42.770 --> 00:50:46.300
When rho has a magnitude
equal to 1, it's as
00:50:46.300 --> 00:50:47.790
big as it can be.
00:50:47.790 --> 00:50:50.210
In that case, the two
random variables are
00:50:50.210 --> 00:50:53.630
very strongly related.
00:50:53.630 --> 00:50:54.650
How strongly?
00:50:54.650 --> 00:50:58.030
Well, if you know one random
variable, if you know the
00:50:58.030 --> 00:51:03.530
value of y, you can recover the
value of x and conversely.
00:51:03.530 --> 00:51:07.210
So the case of a complete
correlation is the case where
00:51:07.210 --> 00:51:11.300
one random variable is a linear
function of the other
00:51:11.300 --> 00:51:12.560
random variable.
00:51:12.560 --> 00:51:16.940
In terms of a scatter plot, this
would mean that there's a
00:51:16.940 --> 00:51:22.060
certain line and that the only
possible (x,y) pairs that can
00:51:22.060 --> 00:51:24.940
happen would lie on that line.
00:51:24.940 --> 00:51:28.920
So if all the possible (x,y)
pairs lie on this line, then
00:51:28.920 --> 00:51:32.340
you have this relation, and the
correlation coefficient is
00:51:32.340 --> 00:51:33.440
equal to 1.
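A small sketch of the correlation coefficient for discrete distributions, assuming the joint PMF is a dictionary from (x, y) pairs to probabilities. The three diagrams and the function name `correlation` are hypothetical: two have all their points on a line, and one has points only roughly aligned.

```python
import math

def correlation(joint):
    """rho = cov(X, Y) / (sigma_X * sigma_Y) for a joint PMF {(x, y): probability}."""
    mean_x = sum(p * x for (x, y), p in joint.items())
    mean_y = sum(p * y for (x, y), p in joint.items())
    cov = sum(p * (x - mean_x) * (y - mean_y) for (x, y), p in joint.items())
    var_x = sum(p * (x - mean_x) ** 2 for (x, y), p in joint.items())
    var_y = sum(p * (y - mean_y) ** 2 for (x, y), p in joint.items())
    return cov / math.sqrt(var_x * var_y)

xs = [0, 1, 2, 3]
on_rising_line = {(x, 2 * x + 1): 0.25 for x in xs}    # Y = 2X + 1
on_falling_line = {(x, -3 * x + 5): 0.25 for x in xs}  # Y = -3X + 5
near_line = {(0, 0): 0.25, (1, 2): 0.25, (2, 1): 0.25, (3, 3): 0.25}

print(correlation(on_rising_line))   # approximately 1
print(correlation(on_falling_line))  # approximately -1
print(correlation(near_line))        # approximately 0.8
```

The slope of the line affects only the sign of rho, not its magnitude: any exact linear relation with nonzero slope gives magnitude 1.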
00:51:33.440 --> 00:51:36.580
A case where the correlation
coefficient is close to 1
00:51:36.580 --> 00:51:40.480
would be a scatter plot like
this, where the x's and y's
00:51:40.480 --> 00:51:44.820
are quite strongly aligned with
each other, maybe not
00:51:44.820 --> 00:51:47.920
exactly, but fairly strongly.
00:51:47.920 --> 00:51:50.760
All right, so you're going to
hear a little more about
00:51:50.760 --> 00:51:52.710
correlation coefficients
and covariances
00:51:52.710 --> 00:51:53.960
in recitation tomorrow.