WEBVTT
00:00:00.610 --> 00:00:03.930
Let us now revisit the variance
and see what happens
00:00:03.930 --> 00:00:06.000
in the case of independence.
00:00:06.000 --> 00:00:08.640
Variances have some general
properties that we have
00:00:08.640 --> 00:00:11.010
already seen.
00:00:11.010 --> 00:00:14.910
However, since we often add
random variables, we would
00:00:14.910 --> 00:00:18.020
like to be able to say something
about the variance
00:00:18.020 --> 00:00:20.800
of the sum of two random
variables.
00:00:20.800 --> 00:00:24.240
Unfortunately, the situation
is not so simple, and in
00:00:24.240 --> 00:00:28.150
general, the variance of the sum
is not the same as the sum
00:00:28.150 --> 00:00:29.670
of the variances.
00:00:29.670 --> 00:00:32.150
We will see an example
shortly.
00:00:32.150 --> 00:00:36.430
On the other hand, when X and
Y are independent, the
00:00:36.430 --> 00:00:41.610
variance of the sum is equal to
the sum of the variances,
00:00:41.610 --> 00:00:44.030
and this is a very
useful fact.
00:00:44.030 --> 00:00:47.040
Let us go through the derivation
of this property.
00:00:47.040 --> 00:00:52.410
But to keep things simple, let
us assume, just for the sake of
00:00:52.410 --> 00:00:56.820
the derivation, that the two
random variables have 0 mean.
00:01:02.650 --> 00:01:08.700
So in that case, the variance
of the sum is just the
00:01:08.700 --> 00:01:11.675
expected value of the
square of the sum.
00:01:17.210 --> 00:01:21.025
And we can expand the quadratic
and write this as
00:01:21.025 --> 00:01:29.170
the expectation of X squared
plus 2 X Y plus Y squared.
00:01:29.170 --> 00:01:33.160
Then we use linearity of
expectations to write this as
00:01:33.160 --> 00:01:38.450
the expected value of X
squared plus twice the
00:01:38.450 --> 00:01:46.910
expected value of X times Y
and then plus the expected
00:01:46.910 --> 00:01:50.570
value of Y squared.
00:01:50.570 --> 00:01:55.370
Now, the first term is just the
variance of X because we
00:01:55.370 --> 00:01:56.920
have assumed that
X has zero mean.
00:01:59.680 --> 00:02:06.320
The last term is similarly the
variance of Y. How about the
00:02:06.320 --> 00:02:07.870
middle term?
00:02:07.870 --> 00:02:12.900
Because of independence, the
expected value of the product
00:02:12.900 --> 00:02:18.850
is the same as the product of
the expected values, and the
00:02:18.850 --> 00:02:21.690
expected values are
0 in our case.
00:02:21.690 --> 00:02:26.720
So this term, because of
independence, is going to be
00:02:26.720 --> 00:02:28.270
equal to 0.
00:02:28.270 --> 00:02:33.060
In particular, what we have is
that the expected value of XY
00:02:33.060 --> 00:02:38.750
equals the expected value of X
times the expected value of Y,
00:02:38.750 --> 00:02:41.920
equal to 0.
00:02:41.920 --> 00:02:45.180
And so we have verified that
indeed the variance of the sum
00:02:45.180 --> 00:02:47.910
is equal to the sum
of the variances.
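The derivation just carried out can be summarized in one chain of equalities (assuming, as above, that X and Y have zero mean, and using independence in the third step):

```latex
\begin{align*}
\operatorname{var}(X+Y)
  &= \mathbf{E}\big[(X+Y)^2\big] \\
  &= \mathbf{E}[X^2] + 2\,\mathbf{E}[XY] + \mathbf{E}[Y^2] \\
  &= \operatorname{var}(X) + 2\,\mathbf{E}[X]\,\mathbf{E}[Y] + \operatorname{var}(Y) \\
  &= \operatorname{var}(X) + \operatorname{var}(Y).
\end{align*}
```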
00:02:47.910 --> 00:02:49.685
Let us now look at
some examples.
00:02:52.560 --> 00:02:56.730
Suppose that X is the same
random variable as Y. Clearly,
00:02:56.730 --> 00:02:59.610
this is a case where
independence fails to hold.
00:02:59.610 --> 00:03:03.410
If I tell you the value of X,
then you know the value of Y.
00:03:03.410 --> 00:03:07.400
So in this case, the variance of
the sum is the same as the
00:03:07.400 --> 00:03:12.910
variance of twice X. Since X is
the same as Y, X plus Y is
00:03:12.910 --> 00:03:18.760
2 times X. And then using this
property for the variance,
00:03:18.760 --> 00:03:21.500
what happens when we multiply
by a constant?
00:03:21.500 --> 00:03:27.110
This is going to be 4 times
the variance of X.
00:03:27.110 --> 00:03:31.230
In another example, suppose that
X is the negative of Y.
00:03:31.230 --> 00:03:35.790
In that case, X plus Y is
identically equal to 0.
00:03:35.790 --> 00:03:38.640
So we're dealing with a
random variable that
00:03:38.640 --> 00:03:40.030
takes a constant value.
00:03:40.030 --> 00:03:43.510
In particular, it is always
equal to its mean, and so the
00:03:43.510 --> 00:03:46.790
difference from the mean is
always equal to 0, and so the
00:03:46.790 --> 00:03:50.370
variance will also
evaluate to 0.
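These two dependent cases can be checked exactly with a short Python sketch. The choice of a fair six-sided die as the common distribution is purely illustrative, not part of the lecture:

```python
from fractions import Fraction

def variance(dist):
    """Exact variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())

# A fair six-sided die, with exact rational probabilities.
die = {v: Fraction(1, 6) for v in range(1, 7)}

# Case X = Y: the sum is 2X, so its variance is 4 * var(X), not 2 * var(X).
sum_same = {2 * v: Fraction(1, 6) for v in range(1, 7)}
assert variance(sum_same) == 4 * variance(die)

# Case X = -Y: the sum is identically 0, so its variance is 0.
sum_opposite = {0: Fraction(1)}
assert variance(sum_opposite) == 0
```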
00:03:50.370 --> 00:03:53.720
So we see that the variance
of the sum can take quite
00:03:53.720 --> 00:03:58.410
different values depending on
the sort of interrelation that
00:03:58.410 --> 00:04:01.450
we have between the two
random variables.
00:04:01.450 --> 00:04:04.740
So these two examples indicate
that knowing the variance of
00:04:04.740 --> 00:04:08.780
each one of the random variables
is not enough to say
00:04:08.780 --> 00:04:11.440
much about the variance
of the sum.
00:04:11.440 --> 00:04:14.530
The answer will generally depend
on how the two random
00:04:14.530 --> 00:04:17.260
variables are related to each
other and what kind of
00:04:17.260 --> 00:04:19.410
dependencies they have.
00:04:19.410 --> 00:04:25.260
As a last example, suppose now
that X and Y are independent.
00:04:25.260 --> 00:04:29.080
X is independent from Y,
and therefore X is also
00:04:29.080 --> 00:04:32.740
independent from minus 3Y.
00:04:32.740 --> 00:04:36.620
Therefore, this variance is
equal to the sum of the
00:04:36.620 --> 00:04:44.080
variances of X and
of minus 3Y.
00:04:44.080 --> 00:04:48.190
And using the facts that we
already know, this is going to
00:04:48.190 --> 00:04:55.440
be equal to the variance of X
plus 9 times the variance of
00:04:55.440 --> 00:04:59.490
Y.
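This last identity can also be verified by exact enumeration. In the sketch below, the two independent fair dice are an illustrative assumption; the joint distribution of X minus 3Y is built as the product of the marginals, which is exactly what independence means:

```python
from fractions import Fraction
from itertools import product

def variance(dist):
    """Exact variance of a finite distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    return sum((v - mean) ** 2 * p for v, p in dist.items())

# Two independent fair dice (illustrative choice).
die = {v: Fraction(1, 6) for v in range(1, 7)}

# Distribution of X - 3Y, built from the product of the marginals.
diff = {}
for (x, px), (y, py) in product(die.items(), die.items()):
    v = x - 3 * y
    diff[v] = diff.get(v, Fraction(0)) + px * py

# Under independence, var(X - 3Y) = var(X) + 9 * var(Y).
assert variance(diff) == variance(die) + 9 * variance(die)
```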
00:04:59.490 --> 00:05:02.970
As an illustration of the
usefulness of the property of
00:05:02.970 --> 00:05:06.300
the variance that we have just
established, we will now use
00:05:06.300 --> 00:05:10.240
it to calculate the variance of
a binomial random variable.
00:05:10.240 --> 00:05:14.330
Remember that a binomial with
parameters n and p corresponds
00:05:14.330 --> 00:05:17.780
to the number of successes
in n independent trials.
00:05:20.400 --> 00:05:22.700
We use indicator variables.
00:05:22.700 --> 00:05:25.870
This is the same trick that we
used to calculate the expected
00:05:25.870 --> 00:05:27.670
value of the binomial.
00:05:27.670 --> 00:05:31.795
So the random variable X sub
i is equal to 1 if the i-th
00:05:31.795 --> 00:05:36.070
trial is a success and
is 0 otherwise.
00:05:36.070 --> 00:05:42.030
And as we did before, we note
that X, the total number of
00:05:42.030 --> 00:05:45.909
successes, is the sum of those
indicator variables.
00:05:45.909 --> 00:05:49.930
Each success makes one of those
variables equal to 1, so
00:05:49.930 --> 00:05:53.630
by adding those indicator
variables, we're just counting
00:05:53.630 --> 00:05:56.120
the number of successes.
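The indicator decomposition can be sketched in a few lines of Python. The parameter values below are illustrative; the point is only that summing the 0/1 indicators counts the successes, so the empirical mean comes out near n times p:

```python
import random

random.seed(1)
n, p, runs = 20, 0.25, 100_000

def one_binomial_sample():
    # n independent Bernoulli(p) trials, recorded as 0/1 indicators.
    indicators = [1 if random.random() < p else 0 for _ in range(n)]
    # Summing the indicators counts the number of successes.
    return sum(indicators)

samples = [one_binomial_sample() for _ in range(runs)]
mean = sum(samples) / runs  # should be close to n * p
```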
00:05:56.120 --> 00:06:01.100
The key point to note is that
the assumption of independence
00:06:01.100 --> 00:06:05.400
that we're making is essentially
the assumption
00:06:05.400 --> 00:06:11.670
that these random variables Xi
are independent of each other.
00:06:11.670 --> 00:06:16.130
So we're dealing with a
situation where we have a sum
00:06:16.130 --> 00:06:20.790
of independent random variables,
and according to
00:06:20.790 --> 00:06:25.490
what we have shown, the variance
of X is going to be
00:06:25.490 --> 00:06:28.103
the sum of the variances
of the Xi's.
00:06:34.620 --> 00:06:39.500
Now, the Xi's all have the same
distribution, so all these
00:06:39.500 --> 00:06:40.950
variances will be the same.
00:06:40.950 --> 00:06:45.520
It suffices to consider
one of them.
00:06:45.520 --> 00:06:49.409
Now, X1 is a Bernoulli random
variable with parameter p.
00:06:49.409 --> 00:06:52.210
We know what its variance is--
00:06:52.210 --> 00:06:57.580
it is p times 1 minus p.
00:06:57.580 --> 00:07:02.890
And therefore, this is the
formula for the variance of a
00:07:02.890 --> 00:07:04.380
binomial random variable.
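The resulting formula, n times p times 1 minus p, can be confirmed by computing the variance of a binomial directly from its pmf. The parameter values in this sketch are illustrative; exact rational arithmetic makes the comparison an equality rather than an approximation:

```python
from fractions import Fraction
from math import comb

def binomial_variance(n, p):
    """Exact variance of Binomial(n, p), computed directly from its pmf."""
    pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
    mean = sum(k * q for k, q in pmf.items())
    return sum((k - mean) ** 2 * q for k, q in pmf.items())

n, p = 12, Fraction(1, 3)
# Direct computation agrees with the formula n * p * (1 - p).
assert binomial_variance(n, p) == n * p * (1 - p)
```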