WEBVTT
00:00:00.880 --> 00:00:03.730
One situation where
covariances show up
00:00:03.730 --> 00:00:06.110
is when we try to
calculate the variance
00:00:06.110 --> 00:00:08.360
of a sum of random variables.
00:00:08.360 --> 00:00:10.210
So let us look at the
variance of the sum
00:00:10.210 --> 00:00:13.120
of two random
variables, X1 and X2.
00:00:13.120 --> 00:00:15.520
If the two random
variables are independent,
00:00:15.520 --> 00:00:17.650
then we know that the
variance of the sum
00:00:17.650 --> 00:00:20.040
is the sum of the variances.
00:00:20.040 --> 00:00:23.290
Let us now look at what
happens in the case where
00:00:23.290 --> 00:00:26.120
we may have dependence.
00:00:26.120 --> 00:00:30.350
By definition, the variance
is the expected value
00:00:30.350 --> 00:00:32.900
of the difference of the
random variable we're
00:00:32.900 --> 00:00:40.210
interested in from its
expected value, squared.
00:00:43.480 --> 00:00:47.130
And now we rearrange
terms here and write
00:00:47.130 --> 00:00:51.180
what is inside the
expectation as follows.
00:00:51.180 --> 00:01:00.640
We put together X1 with the term
minus the expected value of X1
00:01:00.640 --> 00:01:07.170
and then X2 together with
minus the expected value
00:01:07.170 --> 00:01:07.700
of X2.
00:01:15.590 --> 00:01:21.270
So now we have the square
of the sum of two terms.
00:01:21.270 --> 00:01:25.000
We expand the
quadratic to obtain
00:01:25.000 --> 00:01:32.250
expected value of the
square of the first term
00:01:32.250 --> 00:01:45.530
plus the square of the second
term plus 2 times a cross term.
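Here are the rearrangement and the expansion, written out in symbols. The notation is an assumption: E for expectation, var and cov for variance and covariance. It is a transcription of the spoken steps, not notation taken from the slides:

```latex
\begin{aligned}
\operatorname{var}(X_1+X_2)
  &= \mathbf{E}\big[(X_1+X_2-\mathbf{E}[X_1+X_2])^2\big] \\
  &= \mathbf{E}\Big[\big((X_1-\mathbf{E}[X_1]) + (X_2-\mathbf{E}[X_2])\big)^2\Big] \\
  &= \mathbf{E}\big[(X_1-\mathbf{E}[X_1])^2\big] + \mathbf{E}\big[(X_2-\mathbf{E}[X_2])^2\big] \\
  &\quad + 2\,\mathbf{E}\big[(X_1-\mathbf{E}[X_1])(X_2-\mathbf{E}[X_2])\big] \\
  &= \operatorname{var}(X_1) + \operatorname{var}(X_2) + 2\operatorname{cov}(X_1,X_2).
\end{aligned}
```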
00:01:56.770 --> 00:01:58.890
And what do we have here?
00:01:58.890 --> 00:02:01.210
The expected value
of the first term
00:02:01.210 --> 00:02:04.970
is just the variance of X1.
00:02:04.970 --> 00:02:08.419
The expected value
of this second term
00:02:08.419 --> 00:02:12.740
is just the variance of X2.
00:02:12.740 --> 00:02:17.079
And finally, for the cross term,
we recognize its expected value
00:02:17.079 --> 00:02:21.370
as the covariance
of X1
00:02:21.370 --> 00:02:22.590
with X2.
00:02:22.590 --> 00:02:26.520
And we also have this
factor of 2 up here.
00:02:26.520 --> 00:02:29.260
So this is the general
form for the variance
00:02:29.260 --> 00:02:32.040
of the sum of two
random variables.
00:02:32.040 --> 00:02:35.730
In the case of independence,
the covariance is 0,
00:02:35.730 --> 00:02:38.520
and we just have the sum
of the two variances.
00:02:38.520 --> 00:02:41.300
But when the random
variables are dependent,
00:02:41.300 --> 00:02:44.240
it is possible that the
covariance will be non-zero,
00:02:44.240 --> 00:02:46.940
and we have one additional term.
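The following is a sketch of a numerical sanity check for the two-variable formula. The construction is a hypothetical example chosen only to produce dependence: X1 = Z1 and X2 = 0.5 Z1 + Z2, with Z1 and Z2 independent standard normals. For this construction var(X1) = 1, var(X2) = 1.25 and cov(X1, X2) = 0.5, so var(X1 + X2) = 3.25 rather than the 2.25 you would get by simply adding the variances. The empirical variance and covariance below divide by N, so the identity also holds exactly for the sample, up to floating-point rounding.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Empirical covariance with 1/N normalization: subtract the sample means,
    # then average the products of the deviations.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    # The variance is the covariance of a variable with itself.
    return cov(xs, xs)

rng = random.Random(0)
N = 100_000
# Hypothetical dependent pair: X2 shares the component Z1 with X1.
z1 = [rng.gauss(0.0, 1.0) for _ in range(N)]
z2 = [rng.gauss(0.0, 1.0) for _ in range(N)]
x1 = z1
x2 = [0.5 * a + b for a, b in zip(z1, z2)]
s = [a + b for a, b in zip(x1, x2)]

lhs = var(s)
rhs = var(x1) + var(x2) + 2 * cov(x1, x2)
naive = var(x1) + var(x2)  # ignores the covariance term
print(lhs, rhs, naive)
```

Leaving out the covariance term (`naive`) visibly underestimates the variance when the two variables are positively correlated.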
00:02:46.940 --> 00:02:49.860
Let us now generalize
this calculation.
00:02:49.860 --> 00:02:51.970
For reference and comparison,
here is
00:02:51.970 --> 00:02:55.430
the formula for the case where
we add two random variables.
00:02:55.430 --> 00:02:57.140
But now let us look
at the variance
00:02:57.140 --> 00:02:59.220
of the sum of many of them.
00:02:59.220 --> 00:03:01.380
To keep the calculation
simple, we're
00:03:01.380 --> 00:03:05.230
going to assume that
the means are zero.
00:03:05.230 --> 00:03:07.390
But the final
conclusion will also
00:03:07.390 --> 00:03:10.460
be valid for the case
of non-zero means.
00:03:10.460 --> 00:03:12.810
Since we have
assumed zero means,
00:03:12.810 --> 00:03:15.740
the variance is the same
as the expected value
00:03:15.740 --> 00:03:18.890
of the square of the
random variable involved,
00:03:18.890 --> 00:03:21.130
which here is X1 plus X2 plus ... plus Xn.
00:03:21.130 --> 00:03:27.190
And now we expand this quadratic
to obtain the expected value
00:03:27.190 --> 00:03:30.620
of a sum that contains
a bunch of terms
00:03:30.620 --> 00:03:36.800
of the form Xi squared, where i
ranges from 1 up to n.
00:03:36.800 --> 00:03:43.180
And then we will have a bunch of
cross terms of the form Xi times Xj.
00:03:43.180 --> 00:03:48.380
And we obtain one cross
term for each choice of i
00:03:48.380 --> 00:03:55.020
from 1 to n and for each
choice of j from 1 to n,
00:03:55.020 --> 00:03:58.566
as long as i is
different from j.
00:03:58.566 --> 00:04:04.740
So overall here, this sum will
have n squared minus n terms.
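The term count can be checked by brute force. In the sketch below, the helper name `expansion_terms` is made up for illustration. It lists the ordered pairs (i, j) produced by expanding (X1 + ... + Xn)^2 and confirms that there are n square terms and n^2 - n cross terms. Both (1, 2) and (2, 1) appear, and this is where the factor of 2 in the two-variable formula comes from.

```python
from itertools import product

def expansion_terms(n):
    # Expanding (X1 + ... + Xn)^2 yields one product Xi*Xj for every ordered
    # pair (i, j), with i and j each ranging from 1 to n.
    pairs = list(product(range(1, n + 1), repeat=2))
    squares = [(i, j) for i, j in pairs if i == j]
    cross = [(i, j) for i, j in pairs if i != j]
    return squares, cross

for n in range(1, 6):
    squares, cross = expansion_terms(n)
    print(n, len(squares), len(cross), n * n - n)

# Numerical check: the square terms plus the cross terms rebuild the square of the sum.
xs = [1.5, -2.0, 0.25, 3.0]
squares, cross = expansion_terms(len(xs))
lhs = sum(xs) ** 2
rhs = sum(xs[i - 1] * xs[j - 1] for i, j in squares + cross)
print(lhs, rhs)
```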
00:04:08.810 --> 00:04:13.840
Now, we use linearity
to move the expectation
00:04:13.840 --> 00:04:15.800
inside the summation.
00:04:15.800 --> 00:04:21.240
And so from here,
we obtain the sum
00:04:21.240 --> 00:04:25.940
of the expected value
of Xi squared, which
00:04:25.940 --> 00:04:28.820
is the same as the
variance of Xi,
00:04:28.820 --> 00:04:31.630
since we assumed zero means.
00:04:31.630 --> 00:04:36.700
And similarly here, we're going
to get this double sum over pairs i, j
00:04:36.700 --> 00:04:41.460
with i different from j, of
the expected value of Xi times Xj.
00:04:41.460 --> 00:04:44.270
And in the case of
zero means again, this
00:04:44.270 --> 00:04:48.830
is the same as the
covariance of Xi with Xj.
00:04:48.830 --> 00:04:52.490
And so we have obtained
this general formula
00:04:52.490 --> 00:04:56.470
that gives us the variance
of a sum of random variables.
00:04:56.470 --> 00:04:59.220
If the random variables
have zero covariances,
00:04:59.220 --> 00:05:02.740
then the variance of the sum
is the sum of the variances.
00:05:02.740 --> 00:05:05.460
And this happens in particular
when the random variables
00:05:05.460 --> 00:05:06.850
are independent.
00:05:06.850 --> 00:05:09.940
For the general case, where
we may have dependencies
00:05:09.940 --> 00:05:13.310
and non-zero covariances,
then the variance of the sum
00:05:13.310 --> 00:05:17.320
involves also all the
possible covariances
00:05:17.320 --> 00:05:20.220
between the different
random variables.
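Here is a sketch of the general formula for n variables. It assumes a hypothetical dependence structure: each Xi = C + Ei, where C and the Ei are independent zero-mean standard normals. Under this structure var(Xi) = 2 and cov(Xi, Xj) = 1 for i different from j. For n = 4 the formula therefore gives 4 * 2 + 12 * 1 = 20, compared with 8 from the variances alone. The 12 cross covariances match the n^2 - n count. The empirical covariance subtracts sample means, so the identity again holds exactly for the sample, up to rounding.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    # Empirical covariance with 1/N normalization: subtract the sample means,
    # then average the products of the deviations.
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
N, n = 20_000, 4
# Hypothetical dependence structure: a shared zero-mean factor C plus
# independent zero-mean noise for each Xi.
c = [rng.gauss(0.0, 1.0) for _ in range(N)]
xs = [[ck + rng.gauss(0.0, 1.0) for ck in c] for _ in range(n)]
total = [sum(sample) for sample in zip(*xs)]

var_of_sum = cov(total, total)
sum_of_variances = sum(cov(xs[i], xs[i]) for i in range(n))
cross = sum(cov(xs[i], xs[j]) for i in range(n) for j in range(n) if i != j)
print(var_of_sum, sum_of_variances + cross, sum_of_variances)
```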
00:05:20.220 --> 00:05:23.540
And let me finally add
that this formula is also
00:05:23.540 --> 00:05:26.380
valid for the general
case where we do not
00:05:26.380 --> 00:05:28.690
assume that the means are zero.
00:05:28.690 --> 00:05:31.280
And the derivation
is very similar,
00:05:31.280 --> 00:05:33.300
except that there are
a few more symbols
00:05:33.300 --> 00:05:35.300
that are floating around.
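Finally, here is a sketch supporting the claim that the formula also holds with nonzero means. Instead of sampling, the code uses a small, made-up joint PMF for (X1, X2, X3). Its means are 9/4, 15/4 and 23/4, and the variables are dependent. Every expectation is computed exactly with `fractions.Fraction`, so the two sides can be compared with `==`.

```python
from fractions import Fraction

# Hypothetical joint PMF of (X1, X2, X3): each outcome maps to its probability.
pmf = {
    (1, 2, 3): Fraction(1, 4),
    (1, 5, 4): Fraction(1, 4),
    (3, 2, 7): Fraction(1, 4),
    (4, 6, 9): Fraction(1, 4),
}
n = 3

def expect(f):
    # Exact expected value of f(outcome) under the PMF.
    return sum(p * f(w) for w, p in pmf.items())

def cov(i, j):
    # cov(Xi, Xj) = E[(Xi - E[Xi]) (Xj - E[Xj])], with no zero-mean assumption.
    mi = expect(lambda w: w[i])
    mj = expect(lambda w: w[j])
    return expect(lambda w: (w[i] - mi) * (w[j] - mj))

means = [expect(lambda w, i=i: w[i]) for i in range(n)]
mean_of_sum = expect(sum)
var_of_sum = expect(lambda w: (sum(w) - mean_of_sum) ** 2)
formula = (sum(cov(i, i) for i in range(n))
           + sum(cov(i, j) for i in range(n) for j in range(n) if i != j))
print(means, var_of_sum, formula)
```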