WEBVTT

00:00:02.550 --> 00:00:05.000
The probability axioms
are the basic rules

00:00:05.000 --> 00:00:06.710
of probability theory.

00:00:06.710 --> 00:00:09.230
And they are surprisingly few.

00:00:09.230 --> 00:00:11.590
But they imply many interesting
properties that we

00:00:11.590 --> 00:00:13.080
will now explore.

00:00:13.080 --> 00:00:16.910
First we will see that what you
might think of as missing

00:00:16.910 --> 00:00:21.920
axioms are actually implied by
the axioms already in place.

00:00:21.920 --> 00:00:26.850
For example, we have an axiom
that probabilities are

00:00:26.850 --> 00:00:29.130
non-negative.

00:00:29.130 --> 00:00:34.010
We will show that probabilities
are also less

00:00:34.010 --> 00:00:36.600
than or equal to 1.

00:00:36.600 --> 00:00:40.260
We have another axiom that says
that the probability of

00:00:40.260 --> 00:00:42.840
the entire sample space is 1.

00:00:42.840 --> 00:00:45.740
We will show a counterpart that
the probability of the

00:00:45.740 --> 00:00:48.540
empty set is equal to 0.

00:00:48.540 --> 00:00:49.920
This makes perfect sense.

00:00:49.920 --> 00:00:53.720
The empty set has no elements,
so it is impossible.

00:00:53.720 --> 00:00:57.720
There is 0 probability that the
outcome of the experiment

00:00:57.720 --> 00:01:00.790
would lie in the empty set.

00:01:00.790 --> 00:01:03.430
We also have another
intuitive property.

00:01:03.430 --> 00:01:07.460
The probability that an event
happens plus the probability

00:01:07.460 --> 00:01:10.550
that the event does not
happen exhaust all

00:01:10.550 --> 00:01:11.720
possibilities.

00:01:11.720 --> 00:01:14.630
And these two probabilities
together should add to 1.

00:01:14.630 --> 00:01:17.920
For instance, if the probability
of heads is 0.6,

00:01:17.920 --> 00:01:22.430
then the probability of
tails should be 0.4.

00:01:22.430 --> 00:01:26.100
Finally, we can generalize the
additivity axiom, which was

00:01:26.100 --> 00:01:32.320
originally given for the case of
two disjoint events, to the

00:01:32.320 --> 00:01:35.270
case where we're dealing
with the union of

00:01:35.270 --> 00:01:38.690
several disjoint events.

00:01:38.690 --> 00:01:43.140
By disjoint here we mean that
the intersection of any two of

00:01:43.140 --> 00:01:45.840
these events is the empty set.

00:01:45.840 --> 00:01:48.970
We will prove this for the case
of three events and then

00:01:48.970 --> 00:01:52.310
the argument generalizes for
the case where we're taking

00:01:52.310 --> 00:01:55.490
the union of k disjoint
events, where k

00:01:55.490 --> 00:01:57.750
is any finite number.

00:01:57.750 --> 00:02:00.210
So the intuition of this result
is the same as for the

00:02:00.210 --> 00:02:01.730
case of two events.

00:02:01.730 --> 00:02:05.480
But we will derive it formally
and we will also use it to

00:02:05.480 --> 00:02:08.259
come up with a way of
calculating the probability of

00:02:08.259 --> 00:02:12.740
a finite set by simply adding
the probabilities of its

00:02:12.740 --> 00:02:14.800
individual elements.

00:02:14.800 --> 00:02:18.110
All of these statements
that we just

00:02:18.110 --> 00:02:20.650
presented are intuitive.

00:02:20.650 --> 00:02:22.430
And you do not really
need to be

00:02:22.430 --> 00:02:24.760
convinced about their validity.

00:02:24.760 --> 00:02:27.940
Nevertheless, it is instructive
to see how these

00:02:27.940 --> 00:02:31.260
statements follow from
the axioms that

00:02:31.260 --> 00:02:32.510
we have put in place.

00:02:35.210 --> 00:02:39.550
So we will now present the
arguments based only on the

00:02:39.550 --> 00:02:41.690
three axioms that we
have available.

00:02:41.690 --> 00:02:45.440
And in order to be able to refer
to these axioms, let us

00:02:45.440 --> 00:02:51.310
give them some names, call
them axioms A, B, and C.

00:02:51.310 --> 00:02:52.930
We start as follows.

00:02:52.930 --> 00:02:57.390
Let us look at the sample space
and a subset of that

00:02:57.390 --> 00:02:58.640
sample space.

00:02:58.640 --> 00:03:03.550
Call it A. And consider the
complement of that subset.

00:03:03.550 --> 00:03:06.810
The complement is the set of
all elements that do not

00:03:06.810 --> 00:03:12.750
belong to the set A. So a set
together with its complement

00:03:12.750 --> 00:03:16.960
make up everything, which is
the entire sample space.

00:03:16.960 --> 00:03:19.680
On the other hand, if an element
belongs to a set A, it

00:03:19.680 --> 00:03:21.740
does not belong to
its complement.

00:03:21.740 --> 00:03:24.290
So the intersection of a
set with its complement

00:03:24.290 --> 00:03:27.020
is the empty set.

00:03:27.020 --> 00:03:31.210
Now we argue as follows.

00:03:31.210 --> 00:03:35.720
We have that the probability of
the entire sample space is

00:03:35.720 --> 00:03:38.510
equal to 1.

00:03:38.510 --> 00:03:42.300
This is true by our
second axiom.

00:03:42.300 --> 00:03:45.640
Now the sample space, as we just
discussed, can be written

00:03:45.640 --> 00:03:50.290
as the union of an event and the
complement of that event.

00:03:50.290 --> 00:03:54.610
This is just a set-theoretic
relation.

00:03:54.610 --> 00:04:00.950
And next since a set and its
complement are disjoint, this

00:04:00.950 --> 00:04:05.020
means that we can apply the
additivity axiom and write

00:04:05.020 --> 00:04:09.560
this probability as the sum of
the probability of event A

00:04:09.560 --> 00:04:14.250
with the probability of the
complement of A. This is one

00:04:14.250 --> 00:04:18.190
of the relations that we had
claimed and which we have now

00:04:18.190 --> 00:04:19.730
established.

00:04:19.730 --> 00:04:23.310
Based on this relation, we
can also write that the

00:04:23.310 --> 00:04:27.610
probability of an event A
is equal to 1 minus the

00:04:27.610 --> 00:04:31.450
probability of the complement
of that event.

00:04:31.450 --> 00:04:35.280
And because, by the
non-negativity axiom, this

00:04:35.280 --> 00:04:39.570
quantity here is non-negative, 1
minus something non-negative

00:04:39.570 --> 00:04:41.980
is less than or equal to 1.

00:04:41.980 --> 00:04:44.530
We're using here the
non-negativity axiom.

00:04:44.530 --> 00:04:47.480
And we have established another
property, namely that

00:04:47.480 --> 00:04:53.440
probabilities are always less
than or equal to 1.

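NOTE
Editorial aside, not part of the spoken lecture: a compact LaTeX restatement of the two steps just argued. The symbols \Omega (sample space) and A^c (complement of A) are notation assumed here; the transcript itself does not fix symbols.
```latex
\begin{align*}
1 &= P(\Omega) && \text{(probability of the sample space is 1)} \\
  &= P(A \cup A^c) && \text{(set-theoretic identity)} \\
  &= P(A) + P(A^c) && \text{(additivity, since } A \cap A^c = \emptyset\text{)} \\
P(A) &= 1 - P(A^c) \le 1 && \text{(non-negativity: } P(A^c) \ge 0\text{)}
\end{align*}
```
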
00:04:53.440 --> 00:05:04.920
Finally, let us note that 1 is
the probability, always, of a

00:05:04.920 --> 00:05:10.680
set plus the probability of
the complement of that set.

00:05:10.680 --> 00:05:13.810
And let us use this property for
the case where the set of

00:05:13.810 --> 00:05:18.020
interest is the entire
sample space.

00:05:18.020 --> 00:05:21.780
Now, the probability of the
entire sample space is itself

00:05:21.780 --> 00:05:24.740
equal to 1.

00:05:24.740 --> 00:05:28.560
And what is the complement of
the entire sample space?

00:05:28.560 --> 00:05:31.510
The complement of the entire
sample space consists of all

00:05:31.510 --> 00:05:34.050
elements that do not belong
to the sample space.

00:05:34.050 --> 00:05:38.290
But since the sample space is
supposed to contain all

00:05:38.290 --> 00:05:41.030
possible elements,
its complement is

00:05:41.030 --> 00:05:43.420
just the empty set.

00:05:43.420 --> 00:05:46.130
And from this relation we get
the implication that the

00:05:46.130 --> 00:05:50.820
probability of the empty
set is equal to 0.

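NOTE
Editorial aside, not part of the spoken lecture: the empty-set argument written as one line of LaTeX. It applies the complement relation to the whole sample space, written \Omega here (an assumed symbol), whose complement is the empty set.
```latex
\begin{align*}
1 &= P(\Omega) + P(\Omega^c) = 1 + P(\emptyset)
\quad\Longrightarrow\quad P(\emptyset) = 0
\end{align*}
```
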
00:05:50.820 --> 00:05:54.380
This establishes yet one more of
the properties that we had

00:05:54.380 --> 00:05:56.090
just claimed a little earlier.

00:06:00.060 --> 00:06:03.390
We finally come to the proof of
the generalization of our

00:06:03.390 --> 00:06:07.110
additivity axiom from the case
of two disjoint events to the

00:06:07.110 --> 00:06:09.540
case of three disjoint events.

00:06:09.540 --> 00:06:12.420
So we have our sample space.

00:06:12.420 --> 00:06:15.650
And within that sample
space we have three

00:06:15.650 --> 00:06:17.910
events, three subsets.

00:06:17.910 --> 00:06:21.170
And these subsets are disjoint
in the sense that any two of

00:06:21.170 --> 00:06:24.920
those subsets have no
elements in common.

00:06:24.920 --> 00:06:29.190
And we're interested in the
probability of the union of A,

00:06:29.190 --> 00:06:30.650
B, and C.

00:06:30.650 --> 00:06:32.470
How do we make progress?

00:06:32.470 --> 00:06:35.750
We have an additivity axiom in
our hands, which applies to

00:06:35.750 --> 00:06:38.909
the case of the union of
two disjoint sets.

00:06:38.909 --> 00:06:40.650
Here we have three of them.

00:06:40.650 --> 00:06:42.870
But we can do the
following trick.

00:06:42.870 --> 00:06:46.970
We can think of the union of A,
B, and C as consisting of

00:06:46.970 --> 00:06:54.950
the union of this blue set
with that green set.

00:06:54.950 --> 00:06:58.720
Formally, what we're doing is
that we're expressing the

00:06:58.720 --> 00:07:01.790
union of these three
sets as follows.

00:07:01.790 --> 00:07:07.330
We form one set by taking the
union of A with B. And we have

00:07:07.330 --> 00:07:11.220
the other set C. And the overall
union can be thought

00:07:11.220 --> 00:07:14.080
of as the union of
these two sets.

00:07:14.080 --> 00:07:17.860
Now since the three sets are
disjoint, this implies that

00:07:17.860 --> 00:07:21.830
the blue set is disjoint from
the green set and so we can

00:07:21.830 --> 00:07:25.990
use the additivity axiom here
to write this probability as

00:07:25.990 --> 00:07:33.310
the probability of A union B
plus the probability of C. And

00:07:33.310 --> 00:07:36.570
now we can use the additivity
axiom once more since the sets

00:07:36.570 --> 00:07:40.659
A and B are disjoint to write
the first term as probability

00:07:40.659 --> 00:07:45.020
of A plus probability of B. We
carry over the last term and

00:07:45.020 --> 00:07:48.200
we have the relation that
we wanted to prove.

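NOTE
Editorial aside, not part of the spoken lecture: the three-event argument as a LaTeX sketch. The "blue set" is taken to be A \cup B and the "green set" to be C, as the spoken description suggests; the first line records why those two sets are disjoint.
```latex
\begin{align*}
(A \cup B) \cap C &= (A \cap C) \cup (B \cap C) = \emptyset \\
P(A \cup B \cup C) &= P\big((A \cup B) \cup C\big) \\
  &= P(A \cup B) + P(C) && \text{(additivity for } A \cup B \text{ and } C\text{)} \\
  &= P(A) + P(B) + P(C) && \text{(additivity for } A \text{ and } B\text{)}
\end{align*}
```
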
00:07:48.200 --> 00:07:50.409
This is the proof for the
case of three events.

00:07:50.409 --> 00:07:54.060
You should be able to follow
this line of proof to write an

00:07:54.060 --> 00:07:56.850
argument for the case of
four events and so on.

00:07:56.850 --> 00:07:59.750
And you might want to continue
by induction.

00:07:59.750 --> 00:08:04.340
And eventually you should be
able to prove that if the sets

00:08:04.340 --> 00:08:16.300
A1 up to Ak are disjoint then
the probability of the union

00:08:16.300 --> 00:08:25.290
of those sets is going to be
equal to the sum of their

00:08:25.290 --> 00:08:28.150
individual probabilities.

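NOTE
Editorial aside, not part of the spoken lecture: one possible way to write the induction step that the lecture leaves to the viewer. It assumes the result already holds for k - 1 disjoint events and reuses the same grouping trick as in the three-event case.
```latex
\begin{align*}
P\Big(\bigcup_{i=1}^{k} A_i\Big)
  &= P\Big(\Big(\bigcup_{i=1}^{k-1} A_i\Big) \cup A_k\Big) \\
  &= P\Big(\bigcup_{i=1}^{k-1} A_i\Big) + P(A_k)
     && \text{(additivity; } A_k \text{ is disjoint from the union)} \\
  &= \sum_{i=1}^{k-1} P(A_i) + P(A_k) = \sum_{i=1}^{k} P(A_i)
     && \text{(induction hypothesis)}
\end{align*}
```
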
00:08:28.150 --> 00:08:31.180
So this is the generalization
to the case where we're

00:08:31.180 --> 00:08:37.570
dealing with the union of
finitely many disjoint events.

00:08:37.570 --> 00:08:43.140
A very useful application of
this comes in the case where

00:08:43.140 --> 00:08:48.800
we want to calculate the
probability of a finite set.

00:08:48.800 --> 00:08:52.960
So here we have a
sample space.

00:08:52.960 --> 00:08:57.910
And within that sample space
we have some particular

00:08:57.910 --> 00:09:02.370
elements S1, S2, up
to Sk, k of them.

00:09:02.370 --> 00:09:07.570
And these elements together
form a finite set.

00:09:07.570 --> 00:09:09.470
What can we say about
the probability

00:09:09.470 --> 00:09:11.850
of this finite set?

00:09:11.850 --> 00:09:17.270
The idea is to take this finite
set that consists of k

00:09:17.270 --> 00:09:22.800
elements and think of it as the
union of several little

00:09:22.800 --> 00:09:27.810
sets that contain one
element each.

00:09:27.810 --> 00:09:31.010
So set theoretically what we're
doing is that we're

00:09:31.010 --> 00:09:34.980
taking this set with k elements
and we write it as

00:09:34.980 --> 00:09:39.210
the union of a set that contains
just S1, a set that

00:09:39.210 --> 00:09:43.820
contains just the second element
S2, and so on, up to

00:09:43.820 --> 00:09:45.070
the k-th element.

00:09:47.710 --> 00:09:50.800
We're assuming, of course, that
these elements are all

00:09:50.800 --> 00:09:53.010
different from each other.

00:09:53.010 --> 00:09:56.630
So in that case, these sets,
these single element sets, are

00:09:56.630 --> 00:09:58.010
all disjoint.

00:09:58.010 --> 00:10:01.990
So using the additivity property
for a union of k

00:10:01.990 --> 00:10:07.300
disjoint sets, we can write
this as the sum of the

00:10:07.300 --> 00:10:11.210
probabilities of the different
single element sets.

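NOTE
Editorial aside, not part of the spoken lecture: a minimal Python sketch of the finite-set formula, P({s1, ..., sk}) = P({s1}) + ... + P({sk}). The fair six-sided die and the names element_probs and prob_of_finite_set are hypothetical, chosen only for illustration.
```python
from fractions import Fraction
# Hypothetical probability law (an assumption, not from the lecture):
# a fair six-sided die, each single-element set has probability 1/6.
element_probs = {s: Fraction(1, 6) for s in range(1, 7)}
def prob_of_finite_set(elements, probs):
    """Add up the probabilities of the single-element sets {s}."""
    distinct = set(elements)  # the elements must all differ from each other
    return sum((probs[s] for s in distinct), Fraction(0))
p_even = prob_of_finite_set([2, 4, 6], element_probs)
print(p_even)  # 1/2
```
Exact fractions are used instead of floats so that the sums match the hand calculation without rounding error.
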
00:10:16.770 --> 00:10:21.180
At this point, it is usual to
start abusing, or rather,

00:10:21.180 --> 00:10:23.570
simplifying notation
a little bit.

00:10:23.570 --> 00:10:26.020
Probabilities are assigned
to sets.

00:10:26.020 --> 00:10:29.080
So here we're talking about the
probability of a set that

00:10:29.080 --> 00:10:30.910
contains a single element.

00:10:30.910 --> 00:10:34.870
But intuitively, we can also
talk about just the probability

00:10:34.870 --> 00:10:39.880
of that particular element and
use this simpler notation.

00:10:39.880 --> 00:10:43.450
So when using the simpler
notation, we will be talking

00:10:43.450 --> 00:10:46.930
about the probabilities of
individual elements.

00:10:46.930 --> 00:10:49.960
Although in terms of formal
mathematics, what we really

00:10:49.960 --> 00:10:56.490
mean is the probability of this
event that consists

00:10:56.490 --> 00:11:00.880
only of a particular element
S1 and so on.