WEBVTT
00:00:02.550 --> 00:00:05.000
The probability axioms
are the basic rules
00:00:05.000 --> 00:00:06.710
of probability theory.
00:00:06.710 --> 00:00:09.230
And they are surprisingly few.
00:00:09.230 --> 00:00:11.590
But they imply many interesting
properties that we
00:00:11.590 --> 00:00:13.080
will now explore.
00:00:13.080 --> 00:00:16.910
First we will see that what you
might think of as missing
00:00:16.910 --> 00:00:21.920
axioms are actually implied by
the axioms already in place.
00:00:21.920 --> 00:00:26.850
For example, we have an axiom
that probabilities are
00:00:26.850 --> 00:00:29.130
non-negative.
00:00:29.130 --> 00:00:34.010
We will show that probabilities
are also less
00:00:34.010 --> 00:00:36.600
than or equal to 1.
00:00:36.600 --> 00:00:40.260
We have another axiom that says
that the probability of
00:00:40.260 --> 00:00:42.840
the entire sample space is 1.
00:00:42.840 --> 00:00:45.740
We will show a counterpart that
the probability of the
00:00:45.740 --> 00:00:48.540
empty set is equal to 0.
00:00:48.540 --> 00:00:49.920
This makes perfect sense.
00:00:49.920 --> 00:00:53.720
The empty set has no elements,
so it is impossible.
00:00:53.720 --> 00:00:57.720
There is 0 probability that the
outcome of the experiment
00:00:57.720 --> 00:01:00.790
would lie in the empty set.
00:01:00.790 --> 00:01:03.430
We also have another
intuitive property.
00:01:03.430 --> 00:01:07.460
The probability that an event
happens plus the probability
00:01:07.460 --> 00:01:10.550
that the event does not
happen exhaust all
00:01:10.550 --> 00:01:11.720
possibilities.
00:01:11.720 --> 00:01:14.630
And these two probabilities
together should add to 1.
00:01:14.630 --> 00:01:17.920
For instance, if the probability
of heads is 0.6,
00:01:17.920 --> 00:01:22.430
then the probability of
tails should be 0.4.
00:01:22.430 --> 00:01:26.100
Finally, we can generalize the
additivity axiom, which was
00:01:26.100 --> 00:01:32.320
originally given for the case of
two disjoint events, to the
00:01:32.320 --> 00:01:35.270
case where we're dealing
with the union of
00:01:35.270 --> 00:01:38.690
several disjoint events.
00:01:38.690 --> 00:01:43.140
By disjoint here we mean that
the intersection of any two of
00:01:43.140 --> 00:01:45.840
these events is the empty set.
00:01:45.840 --> 00:01:48.970
We will prove this for the case
of three events and then
00:01:48.970 --> 00:01:52.310
the argument generalizes for
the case where we're taking
00:01:52.310 --> 00:01:55.490
the union of k disjoint
events, where k
00:01:55.490 --> 00:01:57.750
is any finite number.
00:01:57.750 --> 00:02:00.210
So the intuition of this result
is the same as for the
00:02:00.210 --> 00:02:01.730
case of two events.
00:02:01.730 --> 00:02:05.480
But we will derive it formally
and we will also use it to
00:02:05.480 --> 00:02:08.259
come up with a way of
calculating the probability of
00:02:08.259 --> 00:02:12.740
a finite set by simply adding
the probabilities of its
00:02:12.740 --> 00:02:14.800
individual elements.
00:02:14.800 --> 00:02:18.110
All of these statements
that we just
00:02:18.110 --> 00:02:20.650
presented are intuitive.
00:02:20.650 --> 00:02:22.430
And you do not really
need to be
00:02:22.430 --> 00:02:24.760
convinced about their validity.
00:02:24.760 --> 00:02:27.940
Nevertheless, it is instructive
to see how these
00:02:27.940 --> 00:02:31.260
statements follow from
the axioms that
00:02:31.260 --> 00:02:32.510
we have put in place.
00:02:35.210 --> 00:02:39.550
So we will now present the
arguments based only on the
00:02:39.550 --> 00:02:41.690
three axioms that we
have available.
00:02:41.690 --> 00:02:45.440
And in order to be able to refer
to these axioms, let us
00:02:45.440 --> 00:02:51.310
give them some names, call
them axioms A, B, and C.
00:02:51.310 --> 00:02:52.930
We start as follows.
00:02:52.930 --> 00:02:57.390
Let us look at the sample space
and a subset of that
00:02:57.390 --> 00:02:58.640
sample space.
00:02:58.640 --> 00:03:03.550
Call it A. And consider the
complement of that subset.
00:03:03.550 --> 00:03:06.810
The complement is the set of
all elements that do not
00:03:06.810 --> 00:03:12.750
belong to the set A. So a set
together with its complement
00:03:12.750 --> 00:03:16.960
make up everything, which is
the entire sample space.
00:03:16.960 --> 00:03:19.680
On the other hand, if an element
belongs to a set A, it
00:03:19.680 --> 00:03:21.740
does not belong to
its complement.
00:03:21.740 --> 00:03:24.290
So the intersection of a
set with its complement
00:03:24.290 --> 00:03:27.020
is the empty set.
00:03:27.020 --> 00:03:31.210
Now we argue as follows.
00:03:31.210 --> 00:03:35.720
We have that the probability of
the entire sample space is
00:03:35.720 --> 00:03:38.510
equal to 1.
00:03:38.510 --> 00:03:42.300
This is true by our
second axiom.
00:03:42.300 --> 00:03:45.640
Now the sample space, as we just
discussed, can be written
00:03:45.640 --> 00:03:50.290
as the union of an event and the
complement of that event.
00:03:50.290 --> 00:03:54.610
This is just a set theoretic
relation.
00:03:54.610 --> 00:04:00.950
And next since a set and its
complement are disjoint, this
00:04:00.950 --> 00:04:05.020
means that we can apply the
additivity axiom and write
00:04:05.020 --> 00:04:09.560
this probability as the sum of
the probability of event A
00:04:09.560 --> 00:04:14.250
with the probability of the
complement of A. This is one
00:04:14.250 --> 00:04:18.190
of the relations that we had
claimed and which we have now
00:04:18.190 --> 00:04:19.730
established.
00:04:19.730 --> 00:04:23.310
Based on this relation, we
can also write that the
00:04:23.310 --> 00:04:27.610
probability of an event A
is equal to 1 minus the
00:04:27.610 --> 00:04:31.450
probability of the complement
of that event.
00:04:31.450 --> 00:04:35.280
And because, by the
non-negativity axiom, the probability
00:04:35.280 --> 00:04:39.570
of the complement is non-negative, 1
minus something non-negative
00:04:39.570 --> 00:04:41.980
is less than or equal to 1.
00:04:41.980 --> 00:04:44.530
We're using here the
non-negativity axiom.
00:04:44.530 --> 00:04:47.480
And we have established another
property, namely that
00:04:47.480 --> 00:04:53.440
probabilities are always less
than or equal to 1.
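Written out in symbols, the argument up to this point might look as follows. The notation is an assumption of this sketch, not something fixed by the lecture: \Omega stands for the sample space and A^c for the complement of A.

```latex
\begin{align*}
1 &= P(\Omega) && \text{(normalization axiom)} \\
  &= P(A \cup A^c) && \text{(set-theoretic identity)} \\
  &= P(A) + P(A^c) && \text{(additivity, since } A \cap A^c = \emptyset\text{)} \\
\Rightarrow\quad P(A) &= 1 - P(A^c) \le 1 && \text{(non-negativity: } P(A^c) \ge 0\text{)}
\end{align*}
```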
00:04:53.440 --> 00:05:04.920
Finally, let us note that 1 is
the probability, always, of a
00:05:04.920 --> 00:05:10.680
set plus the probability of
the complement of that set.
00:05:10.680 --> 00:05:13.810
And let us use this property for
the case where the set of
00:05:13.810 --> 00:05:18.020
interest is the entire
sample space.
00:05:18.020 --> 00:05:21.780
Now, the probability of the
entire sample space is itself
00:05:21.780 --> 00:05:24.740
equal to 1.
00:05:24.740 --> 00:05:28.560
And what is the complement of
the entire sample space?
00:05:28.560 --> 00:05:31.510
The complement of the entire
sample space consists of all
00:05:31.510 --> 00:05:34.050
elements that do not belong
to the sample space.
00:05:34.050 --> 00:05:38.290
But since the sample space is
supposed to contain all
00:05:38.290 --> 00:05:41.030
possible elements,
its complement is
00:05:41.030 --> 00:05:43.420
just the empty set.
00:05:43.420 --> 00:05:46.130
And from this relation we get
the implication that the
00:05:46.130 --> 00:05:50.820
probability of the empty
set is equal to 0.
00:05:50.820 --> 00:05:54.380
This establishes yet one more of
the properties that we had
00:05:54.380 --> 00:05:56.090
just claimed a little earlier.
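As a short sketch in symbols, again assuming the notation \Omega for the sample space and a superscript c for the complement, the step for the empty set can be written like this:

```latex
\begin{align*}
1 &= P(\Omega) + P(\Omega^c) && \text{(a set plus its complement)} \\
  &= 1 + P(\emptyset) && \text{(} P(\Omega) = 1 \text{ and } \Omega^c = \emptyset\text{)} \\
\Rightarrow\quad P(\emptyset) &= 0
\end{align*}
```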
00:06:00.060 --> 00:06:03.390
We finally come to the proof of
the generalization of our
00:06:03.390 --> 00:06:07.110
additivity axiom from the case
of two disjoint events to the
00:06:07.110 --> 00:06:09.540
case of three disjoint events.
00:06:09.540 --> 00:06:12.420
So we have our sample space.
00:06:12.420 --> 00:06:15.650
And within that sample
space we have three
00:06:15.650 --> 00:06:17.910
events, three subsets.
00:06:17.910 --> 00:06:21.170
And these subsets are disjoint
in the sense that any two of
00:06:21.170 --> 00:06:24.920
those subsets have no
elements in common.
00:06:24.920 --> 00:06:29.190
And we're interested in the
probability of the union of A,
00:06:29.190 --> 00:06:30.650
B, and C.
00:06:30.650 --> 00:06:32.470
How do we make progress?
00:06:32.470 --> 00:06:35.750
We have an additivity axiom in
our hands, which applies to
00:06:35.750 --> 00:06:38.909
the case of the union of
two disjoint sets.
00:06:38.909 --> 00:06:40.650
Here we have three of them.
00:06:40.650 --> 00:06:42.870
But we can do the
following trick.
00:06:42.870 --> 00:06:46.970
We can think of the union of A,
B, and C as consisting of
00:06:46.970 --> 00:06:54.950
the union of this blue set
with that green set.
00:06:54.950 --> 00:06:58.720
Formally, what we're doing is
that we're expressing the
00:06:58.720 --> 00:07:01.790
union of these three
sets as follows.
00:07:01.790 --> 00:07:07.330
We form one set by taking the
union of A with B. And we have
00:07:07.330 --> 00:07:11.220
the other set C. And the overall
union can be thought
00:07:11.220 --> 00:07:14.080
of as the union of
these two sets.
00:07:14.080 --> 00:07:17.860
Now since the three sets are
disjoint, this implies that
00:07:17.860 --> 00:07:21.830
the blue set, A union B, is disjoint from
the green set, C, and so we can
00:07:21.830 --> 00:07:25.990
use the additivity axiom here
to write this probability as
00:07:25.990 --> 00:07:33.310
the probability of A union B
plus the probability of C. And
00:07:33.310 --> 00:07:36.570
now we can use the additivity
axiom once more since the sets
00:07:36.570 --> 00:07:40.659
A and B are disjoint to write
the first term as probability
00:07:40.659 --> 00:07:45.020
of A plus probability of B. We
carry over the last term and
00:07:45.020 --> 00:07:48.200
we have the relation that
we wanted to prove.
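The two uses of the additivity axiom in this argument can be laid out line by line. This is only a compact restatement of the spoken proof, with A, B, and C assumed pairwise disjoint:

```latex
\begin{align*}
P(A \cup B \cup C) &= P\big((A \cup B) \cup C\big) \\
 &= P(A \cup B) + P(C) && \text{(additivity: } (A \cup B) \cap C = \emptyset\text{)} \\
 &= P(A) + P(B) + P(C) && \text{(additivity: } A \cap B = \emptyset\text{)}
\end{align*}
```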
00:07:48.200 --> 00:07:50.409
This is the proof for the
case of three events.
00:07:50.409 --> 00:07:54.060
You should be able to follow
this line of proof to write an
00:07:54.060 --> 00:07:56.850
argument for the case of
four events and so on.
00:07:56.850 --> 00:07:59.750
And you might want to continue
by induction.
00:07:59.750 --> 00:08:04.340
And eventually you should be
able to prove that if the sets
00:08:04.340 --> 00:08:16.300
A1 up to Ak are disjoint then
the probability of the union
00:08:16.300 --> 00:08:25.290
of those sets is going to be
equal to the sum of their
00:08:25.290 --> 00:08:28.150
individual probabilities.
00:08:28.150 --> 00:08:31.180
So this is the generalization
to the case where we're
00:08:31.180 --> 00:08:37.570
dealing with the union of
finitely many disjoint events.
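One possible way to write the induction step is sketched below. It assumes the result already holds for k - 1 disjoint events and peels off the last event A_k, exactly as we peeled off C in the case of three events; the indexing is a choice made for this sketch.

```latex
\begin{align*}
P\Big(\bigcup_{i=1}^{k} A_i\Big)
 &= P\Big(\Big(\bigcup_{i=1}^{k-1} A_i\Big) \cup A_k\Big) \\
 &= P\Big(\bigcup_{i=1}^{k-1} A_i\Big) + P(A_k)
   && \text{(additivity for two disjoint sets)} \\
 &= \sum_{i=1}^{k-1} P(A_i) + P(A_k)
   && \text{(induction hypothesis)} \\
 &= \sum_{i=1}^{k} P(A_i)
\end{align*}
```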
00:08:37.570 --> 00:08:43.140
A very useful application of
this comes in the case where
00:08:43.140 --> 00:08:48.800
we want to calculate the
probability of a finite set.
00:08:48.800 --> 00:08:52.960
So here we have a
sample space.
00:08:52.960 --> 00:08:57.910
And within that sample space
we have some particular
00:08:57.910 --> 00:09:02.370
elements S1, S2, up
to Sk, k of them.
00:09:02.370 --> 00:09:07.570
And these elements together
form a finite set.
00:09:07.570 --> 00:09:09.470
What can we say about
the probability
00:09:09.470 --> 00:09:11.850
of this finite set?
00:09:11.850 --> 00:09:17.270
The idea is to take this finite
set that consists of k
00:09:17.270 --> 00:09:22.800
elements and think of it as the
union of several little
00:09:22.800 --> 00:09:27.810
sets that contain one
element each.
00:09:27.810 --> 00:09:31.010
So set theoretically what we're
doing is that we're
00:09:31.010 --> 00:09:34.980
taking this set with k elements
and we write it as
00:09:34.980 --> 00:09:39.210
the union of a set that contains
just S1, a set that
00:09:39.210 --> 00:09:43.820
contains just the second element
S2, and so on, up to
00:09:43.820 --> 00:09:45.070
the k-th element.
00:09:47.710 --> 00:09:50.800
We're assuming, of course, that
these elements are all
00:09:50.800 --> 00:09:53.010
different from each other.
00:09:53.010 --> 00:09:56.630
So in that case, these sets,
these single element sets, are
00:09:56.630 --> 00:09:58.010
all disjoint.
00:09:58.010 --> 00:10:01.990
So using the additivity property
for a union of k
00:10:01.990 --> 00:10:07.300
disjoint sets, we can write
this as the sum of the
00:10:07.300 --> 00:10:11.210
probabilities of the different
single element sets.
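To make this concrete, here is a minimal Python sketch of the same calculation. The fair six-sided die, the dictionary `die`, and the helper `prob` are all hypothetical names chosen for illustration; they are not part of the lecture. Exact fractions are used so that the sums come out without rounding error.

```python
from fractions import Fraction

# Hypothetical discrete probability law: a fair six-sided die.
# Each key is an outcome s_i; each value is the probability of the set {s_i}.
die = {face: Fraction(1, 6) for face in range(1, 7)}


def prob(event, law):
    """Probability of a finite event as the sum of its elements' probabilities.

    Converting to a set enforces the assumption that the elements are
    distinct, so the single-element sets are disjoint and finite
    additivity applies.
    """
    return sum((law[s] for s in set(event)), Fraction(0))


even = {2, 4, 6}
print(prob(even, die))        # 1/2
print(prob(set(), die))       # 0, the probability of the empty set
print(prob(die.keys(), die))  # 1, the probability of the whole sample space
print(1 - prob(even, die))    # 1/2, the complement rule applied to "even"
```

The same helper also reproduces the earlier properties on this small example: the empty set gets probability 0, the whole sample space gets 1, and an event and its complement add to 1.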
00:10:16.770 --> 00:10:21.180
At this point, it is usual to
start abusing, or rather,
00:10:21.180 --> 00:10:23.570
simplifying notation
a little bit.
00:10:23.570 --> 00:10:26.020
Probabilities are assigned
to sets.
00:10:26.020 --> 00:10:29.080
So here we're talking about the
probability of a set that
00:10:29.080 --> 00:10:30.910
contains a single element.
00:10:30.910 --> 00:10:34.870
But intuitively, we can also
talk about just the probability
00:10:34.870 --> 00:10:39.880
of that particular element and
use this simpler notation.
00:10:39.880 --> 00:10:43.450
So when using the simpler
notation, we will be talking
00:10:43.450 --> 00:10:46.930
about the probabilities of
individual elements.
00:10:46.930 --> 00:10:49.960
Although in terms of formal
mathematics, what we really
00:10:49.960 --> 00:10:56.490
mean is the probability of this
event that's comprised
00:10:56.490 --> 00:11:00.880
only of a particular element
S1 and so on.