WEBVTT
00:00:00.460 --> 00:00:04.310
By now, we have defined the
notion of independence of
00:00:04.310 --> 00:00:06.480
events and also the notion of
00:00:06.480 --> 00:00:09.080
independence of random variables.
00:00:09.080 --> 00:00:12.130
The two definitions look
fairly similar, but the
00:00:12.130 --> 00:00:14.820
details are not exactly the
same, because the two
00:00:14.820 --> 00:00:18.190
definitions refer to different
situations.
00:00:18.190 --> 00:00:21.370
For two events, we know what
it means for them to be
00:00:21.370 --> 00:00:22.420
independent.
00:00:22.420 --> 00:00:25.250
The probability of their
intersection is the product of
00:00:25.250 --> 00:00:27.490
their individual
probabilities.
00:00:27.490 --> 00:00:32.240
Now, to make a relation with
random variables, we introduce
00:00:32.240 --> 00:00:35.430
the so-called indicator
random variables.
00:00:35.430 --> 00:00:38.910
So for example, the random
variable X is defined to be
00:00:38.910 --> 00:00:43.660
equal to 1 if event A occurs
and to be equal
00:00:43.660 --> 00:00:45.870
to 0 if event [A]
00:00:45.870 --> 00:00:47.640
does not occur.
00:00:47.640 --> 00:00:50.660
And there is a similar
definition for random variable
00:00:50.660 --> 00:00:51.720
Y.
00:00:51.720 --> 00:00:55.130
In particular, the probability
that random variable X takes
00:00:55.130 --> 00:00:57.260
the value of 1, this
is the probability
00:00:57.260 --> 00:00:58.870
that event A occurs.
00:01:01.960 --> 00:01:05.610
It turns out that the
independence of the two
00:01:05.610 --> 00:01:10.670
events, A and B, is equivalent
to the independence of the two
00:01:10.670 --> 00:01:13.789
indicator random variables.
00:01:13.789 --> 00:01:15.780
And there is a similar
statement, which
00:01:15.780 --> 00:01:17.730
is true more generally.
00:01:17.730 --> 00:01:22.450
That is, n events are
independent if and only if the
00:01:22.450 --> 00:01:28.520
associated n indicator random
variables are independent.
00:01:28.520 --> 00:01:33.670
This is a useful statement,
because it allows us to
00:01:33.670 --> 00:01:36.950
sometimes, instead of
manipulating events, to
00:01:36.950 --> 00:01:39.810
manipulate random variables,
and vice versa.
00:01:39.810 --> 00:01:42.100
And depending on the
context, one maybe
00:01:42.100 --> 00:01:43.950
easier than the other.
00:01:43.950 --> 00:01:47.289
Now, the intuitive content is
that events A and B are
00:01:47.289 --> 00:01:50.320
independent if the occurrence
of event A does not change
00:01:50.320 --> 00:01:54.720
your beliefs about B. And in
terms of random variables, one
00:01:54.720 --> 00:01:58.110
random variable taking a certain
value, which indicates
00:01:58.110 --> 00:02:01.580
whether event A has occurred or
not does not give you any
00:02:01.580 --> 00:02:04.480
information about the other
random variable, which would
00:02:04.480 --> 00:02:09.720
tell you whether event B
has occurred or not.
00:02:09.720 --> 00:02:13.430
It is instructive now to go
through the derivation of this
00:02:13.430 --> 00:02:18.030
fact, at least for the case of
two events, because it gives
00:02:18.030 --> 00:02:21.050
us perhaps some additional
understanding about the
00:02:21.050 --> 00:02:23.710
precise content of the
definitions we have
00:02:23.710 --> 00:02:25.440
introduced.
00:02:25.440 --> 00:02:28.850
So let us suppose that random
variables X and Y are
00:02:28.850 --> 00:02:30.070
independent.
00:02:30.070 --> 00:02:31.320
What does that mean?
00:02:31.320 --> 00:02:34.680
Independence means that the
joint PMF of the two random
00:02:34.680 --> 00:02:40.160
variables, X and Y, factors as a
product of the corresponding
00:02:40.160 --> 00:02:42.480
marginal PMFs.
00:02:42.480 --> 00:02:48.670
And this factorization must be
true no matter what arguments
00:02:48.670 --> 00:02:52.840
we use inside the joint PMF.
00:02:52.840 --> 00:02:57.030
And the combination of X and Y
in this instance have a total
00:02:57.030 --> 00:03:00.240
of four possible values.
00:03:00.240 --> 00:03:02.130
These are the combinations
of zeroes and
00:03:02.130 --> 00:03:03.740
ones that we can form.
00:03:03.740 --> 00:03:07.420
And for this reason, we have
a total of four equations.
00:03:07.420 --> 00:03:11.510
These four equalities are what
is required for X and Y to be
00:03:11.510 --> 00:03:13.130
independent.
00:03:13.130 --> 00:03:16.780
So suppose that this is true,
that the random variables are
00:03:16.780 --> 00:03:18.320
independent.
00:03:18.320 --> 00:03:21.329
Let us take this first relation
and write it in
00:03:21.329 --> 00:03:23.860
probability notation.
00:03:23.860 --> 00:03:26.710
The random variable X taking
the value of 1, that's the
00:03:26.710 --> 00:03:29.740
same as event A occurring.
00:03:29.740 --> 00:03:33.329
And random variable Y taking
the value of 1, that's the
00:03:33.329 --> 00:03:36.400
same as event B occurring.
00:03:36.400 --> 00:03:40.960
So the joint PMF evaluated at
1, 1 is the probability that
00:03:40.960 --> 00:03:44.170
events A and B both occur.
00:03:44.170 --> 00:03:46.040
On the other side of the
equation, we have the
00:03:46.040 --> 00:03:48.860
probability that X is equal to
1, which is the probability
00:03:48.860 --> 00:03:53.980
that A occurs, and similarly,
the probability that B occurs.
00:03:53.980 --> 00:03:59.690
But if this is true, then by
definition, A and B are
00:03:59.690 --> 00:04:02.910
independent events.
00:04:02.910 --> 00:04:05.840
So we have verified one
direction of this statement.
00:04:05.840 --> 00:04:09.310
If the random variables are
independent, then events A and
00:04:09.310 --> 00:04:11.430
B are independent.
00:04:11.430 --> 00:04:15.180
Now, we would like to verify
the reverse statement.
00:04:15.180 --> 00:04:19.170
So suppose that events A
and B are independent.
00:04:19.170 --> 00:04:23.560
In that case, this
relation is true.
00:04:23.560 --> 00:04:28.060
And as we just argued, this
relation is the same as this
00:04:28.060 --> 00:04:31.650
relation but just written
in different notation.
00:04:31.650 --> 00:04:35.500
So we have shown that if A and
B are independent, this
00:04:35.500 --> 00:04:37.890
relation will be true.
00:04:37.890 --> 00:04:40.530
But how about the remaining
three relations?
00:04:40.530 --> 00:04:41.850
We have more work to do.
00:04:44.400 --> 00:04:46.870
Here's how we can proceed.
00:04:46.870 --> 00:04:52.260
If A and B are independent, we
have shown some time ago that
00:04:52.260 --> 00:04:58.670
events A and B complement will
also be independent.
00:04:58.670 --> 00:05:01.560
Intuitively, A doesn't tell
you anything about
00:05:01.560 --> 00:05:03.330
B occuring or not.
00:05:03.330 --> 00:05:06.270
So A does not tell you anything
about whether B
00:05:06.270 --> 00:05:09.060
complement will occur or not.
00:05:09.060 --> 00:05:12.200
Now, these two events being
independent, by the definition
00:05:12.200 --> 00:05:15.640
of independence, we have that
the probability of A
00:05:15.640 --> 00:05:19.190
intersection with B complement
is the product of the
00:05:19.190 --> 00:05:22.968
probabilities of A and
of B complement.
00:05:26.250 --> 00:05:31.620
And then we realize that this
equality, if written in PMF
00:05:31.620 --> 00:05:37.409
notation, corresponds exactly
to this equation here.
00:05:37.409 --> 00:05:41.760
Event A corresponds to X taking
the value of 1, event B
00:05:41.760 --> 00:05:44.770
complement corresponds to
the event that Y takes
00:05:44.770 --> 00:05:47.830
the value of 0.
00:05:47.830 --> 00:05:52.980
By a similar argument, B and
A complement will be
00:05:52.980 --> 00:05:55.780
independent.
00:05:55.780 --> 00:05:59.415
And we translate that into
probability notation.
00:06:09.850 --> 00:06:14.250
And then we translate this
equality into PMF notation.
00:06:14.250 --> 00:06:16.380
And we get this relation.
00:06:16.380 --> 00:06:21.880
Finally, using the same property
that we used to do
00:06:21.880 --> 00:06:26.270
the first step here, we have
that A complement and B
00:06:26.270 --> 00:06:28.820
complement are also
independent.
00:06:28.820 --> 00:06:32.170
And by following the same line
of reasoning, this implies the
00:06:32.170 --> 00:06:35.300
fourth relation as well.
00:06:35.300 --> 00:06:38.310
So we have verified that
if events A and B are
00:06:38.310 --> 00:06:41.590
independent, then we can argue
that all of these four
00:06:41.590 --> 00:06:43.970
equations will be true.
00:06:43.970 --> 00:06:47.830
And therefore, random variables
X and Y will also be
00:06:47.830 --> 00:06:49.080
independent.