WEBVTT
00:00:00.570 --> 00:00:03.350
In this segment, we justify some
of the properties of
00:00:03.350 --> 00:00:05.230
the correlation coefficient
that we
00:00:05.230 --> 00:00:07.780
claimed a little earlier.
00:00:07.780 --> 00:00:10.560
The most important property is that
the correlation coefficient
00:00:10.560 --> 00:00:13.780
lies between minus
1 and plus 1.
00:00:13.780 --> 00:00:16.950
We will prove this property for
the special case where we
00:00:16.950 --> 00:00:20.950
have random variables with zero
means and unit variances.
00:00:20.950 --> 00:00:24.670
So standard deviations are also
1, so most of the terms
00:00:24.670 --> 00:00:27.560
here disappear and the
correlation coefficient is
00:00:27.560 --> 00:00:30.690
simply the expected value
of X times Y.
00:00:30.690 --> 00:00:33.280
We will show that in this
special case the expected
00:00:33.280 --> 00:00:37.370
value of X times Y lies
between minus 1 and 1.
00:00:37.370 --> 00:00:42.630
But the proof of this fact
remains valid with a little
00:00:42.630 --> 00:00:45.790
bit more algebra along
similar lines
00:00:45.790 --> 00:00:48.501
for the general case.
00:00:48.501 --> 00:00:53.350
What we will do is we will
consider the expected value of (X minus rho Y) squared
00:00:53.350 --> 00:00:57.200
and expand this quadratic
and write it as
00:00:57.200 --> 00:00:59.790
expected value of X squared.
00:00:59.790 --> 00:01:03.160
Then there's a cross term,
which is minus 2 rho, the
00:01:03.160 --> 00:01:10.070
expected value of X times Y,
plus rho squared, expected
00:01:10.070 --> 00:01:13.560
value of Y squared.
00:01:13.560 --> 00:01:17.840
Now since we assume that the
random variables have 0 mean,
00:01:17.840 --> 00:01:20.370
this is the same as the variance
and we assume that
00:01:20.370 --> 00:01:25.100
the variance is 1, so this
term here is equal to 1.
00:01:25.100 --> 00:01:29.050
Now, the expected value of X
times Y is the same as the
00:01:29.050 --> 00:01:31.170
correlation coefficient
in this case.
00:01:31.170 --> 00:01:35.190
So we have minus 2 rho
squared and from
00:01:35.190 --> 00:01:36.870
here we have rho squared.
00:01:36.870 --> 00:01:40.030
And by the previous argument,
again this quantity, according
00:01:40.030 --> 00:01:43.740
to our assumptions, is equal to
1 so we're left with this
00:01:43.740 --> 00:01:49.830
expression, which is 1
minus rho squared.
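
NOTE
The expansion described in the preceding cues can be written out as follows. This is a sketch for the zero-mean, unit-variance case only, and it assumes the quantity shown on the slide is E[(X - rho Y)^2], which is consistent with the terms read out above.
```latex
\begin{align*}
\mathbf{E}\big[(X-\rho Y)^2\big]
&= \mathbf{E}[X^2] - 2\rho\,\mathbf{E}[XY] + \rho^2\,\mathbf{E}[Y^2] \\
&= 1 - 2\rho^2 + \rho^2 \\
&= 1 - \rho^2 .
\end{align*}
```
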
00:01:49.830 --> 00:01:52.979
Now, notice that this is the
expectation of a non-negative
00:01:52.979 --> 00:01:57.210
random variable so this
quantity here must be
00:01:57.210 --> 00:01:58.560
non-negative.
00:01:58.560 --> 00:02:06.470
Therefore, 1 minus rho squared
is non-negative, which means
00:02:06.470 --> 00:02:11.850
that rho squared is less
than or equal to 1.
00:02:11.850 --> 00:02:15.230
And that's the same as requiring
that rho lie between
00:02:15.230 --> 00:02:17.820
minus 1 and plus 1.
00:02:17.820 --> 00:02:21.310
And so we have established this
important property, at
00:02:21.310 --> 00:02:24.150
least for the special case of
0 means and unit variances.
00:02:24.150 --> 00:02:28.250
But as I mentioned, it remains
valid more generally.
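
NOTE
A small numerical check of the argument, offered as an illustration rather than as part of the lecture. The helper names (standardize, quadratic_check) and the synthetic data are assumptions made for this sketch. The samples are standardized to zero mean and unit population variance, so the empirical distribution satisfies the assumptions of the proof, and the average of (x - rho*y)^2 should agree with 1 - rho^2 up to rounding.
```python
import math
import random
def standardize(values):
    """Shift and scale values to zero mean and unit (population) variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / std for v in values]
def quadratic_check(xs, ys):
    """Return (rho, mean of (x - rho*y)^2) for the standardized samples."""
    x = standardize(xs)
    y = standardize(ys)
    n = len(x)
    # For standardized data, the correlation coefficient is the mean of x*y.
    rho = sum(a * b for a, b in zip(x, y)) / n
    # Empirical counterpart of E[(X - rho*Y)^2]; a mean of squares, so never negative.
    quad = sum((a - rho * b) ** 2 for a, b in zip(x, y)) / n
    return rho, quad
# Synthetic, partially dependent data (hypothetical example values).
rng = random.Random(0)
ys = [rng.gauss(0.0, 1.0) for _ in range(1000)]
xs = [0.6 * y + rng.gauss(0.0, 1.0) for y in ys]
rho, quad = quadratic_check(xs, ys)
print("rho:", rho)
print("mean of (x - rho*y)^2:", quad)
print("1 - rho^2:", 1.0 - rho ** 2)
```
Because the last two printed values agree and the first of them is a mean of squares, rho^2 cannot exceed 1, which mirrors the step in the proof.
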
00:02:28.250 --> 00:02:32.920
Now let us look at an extreme
case, when the absolute value
00:02:32.920 --> 00:02:35.410
of rho is equal to 1.
00:02:35.410 --> 00:02:36.986
What happens in this case?
00:02:36.986 --> 00:02:43.410
In that case, this term is 0
and this implies that the
00:02:43.410 --> 00:02:46.870
expected value of the square
of this random variable is
00:02:46.870 --> 00:02:48.250
equal to 0.
00:02:48.250 --> 00:02:51.770
Now here we have a non-negative
random variable,
00:02:51.770 --> 00:02:55.390
and its expected value is 0,
which means that when we
00:02:55.390 --> 00:02:58.470
calculate the expected value
of this there will be no
00:02:58.470 --> 00:03:02.710
positive contributions and so
the only contributions must be
00:03:02.710 --> 00:03:04.000
equal to 0.
00:03:04.000 --> 00:03:09.100
This means that X minus rho Y
has to be equal to 0 with
00:03:09.100 --> 00:03:11.860
probability 1.
00:03:11.860 --> 00:03:17.260
So X is going to be equal to
rho times Y and this will
00:03:17.260 --> 00:03:19.700
happen with essential
certainty.
00:03:19.700 --> 00:03:23.250
Now also because the absolute
value of rho is equal to 1,
00:03:23.250 --> 00:03:30.490
this means that we have either
X equal to Y or X equal to
00:03:30.490 --> 00:03:35.210
minus Y, in case rho is
equal to minus 1.
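
NOTE
The extreme case discussed in the preceding cues can be summarized as below. This is a sketch under the same zero-mean, unit-variance assumptions, using the identity E[(X - rho Y)^2] = 1 - rho^2 from earlier in the segment.
```latex
\begin{align*}
|\rho| = 1
&\implies \mathbf{E}\big[(X-\rho Y)^2\big] = 1 - \rho^2 = 0 \\
&\implies \mathbf{P}(X = \rho Y) = 1 \\
&\implies X = Y \ \text{(if } \rho = 1\text{)} \quad \text{or} \quad X = -Y \ \text{(if } \rho = -1\text{)}, \ \text{with probability 1.}
\end{align*}
```
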
00:03:35.210 --> 00:03:38.100
So we see that if the
correlation coefficient has an
00:03:38.100 --> 00:03:42.280
absolute value of 1, then X
and Y are related to each
00:03:42.280 --> 00:03:47.620
other according to a simple
linear relation, and it's an
00:03:47.620 --> 00:03:49.579
extreme form of dependence
between
00:03:49.579 --> 00:03:50.829
the two random variables.