WEBVTT

00:00:01.500 --> 00:00:04.940
We have so far discussed the
first step involved in the

00:00:04.940 --> 00:00:07.350
construction of a probabilistic
model.

00:00:07.350 --> 00:00:10.930
Namely, the construction of
a sample space, which is a

00:00:10.930 --> 00:00:13.730
description of the possible
outcomes of a probabilistic

00:00:13.730 --> 00:00:15.100
experiment.

00:00:15.100 --> 00:00:19.760
We now come to the second and
much more interesting part.

00:00:19.760 --> 00:00:24.210
We need to specify which
outcomes are more likely to

00:00:24.210 --> 00:00:28.540
occur and which ones are less
likely to occur and so on.

00:00:28.540 --> 00:00:33.280
And we will do that by assigning
probabilities to the

00:00:33.280 --> 00:00:35.220
different outcomes.

00:00:35.220 --> 00:00:40.550
However, as we try to do this
assignment, we run into some

00:00:40.550 --> 00:00:43.340
kind of difficulty, which
is the following.

00:00:43.340 --> 00:00:46.310
Remember the previous experiment
involving a

00:00:46.310 --> 00:00:50.880
continuous sample space, which
was the unit square and in

00:00:50.880 --> 00:00:55.280
which we throw a dart at random
and record the point

00:00:55.280 --> 00:00:57.160
that occurred.

00:00:57.160 --> 00:01:00.760
In this experiment, what do you
think is the probability

00:01:00.760 --> 00:01:02.860
of a particular point?

00:01:02.860 --> 00:01:07.540
Let's say what is the
probability that my dart hits

00:01:07.540 --> 00:01:11.890
exactly the center
of this square?

00:01:11.890 --> 00:01:14.330
Well, this probability would
be essentially 0.

00:01:14.330 --> 00:01:16.990
The probability of hitting the
center exactly, with infinite

00:01:16.990 --> 00:01:18.870
precision, should be 0.

00:01:18.870 --> 00:01:22.460
And so it's natural that in such
a continuous model any

00:01:22.460 --> 00:01:27.020
individual point should
have a 0 probability.

00:01:27.020 --> 00:01:31.280
For this reason, rather than
assigning probabilities to

00:01:31.280 --> 00:01:36.640
individual points, we will
instead assign probabilities

00:01:36.640 --> 00:01:42.870
to whole sets, that is, to
subsets of the sample space.

00:01:42.870 --> 00:01:49.090
So here we have our sample
space, which is some

00:01:49.090 --> 00:01:53.620
abstract set omega.

00:01:53.620 --> 00:01:56.570
Here is a subset of
the sample space.

00:01:56.570 --> 00:02:01.540
Call it capital A. We're going
to assign a probability to

00:02:01.540 --> 00:02:05.620
that subset A, which we're
going to denote with this

00:02:05.620 --> 00:02:12.380
notation, which we read as the
probability of set A. So

00:02:12.380 --> 00:02:15.580
probabilities will be
assigned to subsets.

00:02:15.580 --> 00:02:18.420
And this will not cause us
difficulties in the continuous

00:02:18.420 --> 00:02:22.280
case because even though
individual points would have 0

00:02:22.280 --> 00:02:26.720
probability, if you ask me what
are the odds that my dart

00:02:26.720 --> 00:02:31.760
falls in the upper half, let's
say, of this diagram, then

00:02:31.760 --> 00:02:34.840
that should be a reasonable
positive number.

00:02:34.840 --> 00:02:37.220
So even though individual
outcomes may have 0

00:02:37.220 --> 00:02:41.110
probabilities, sets of outcomes
in general would be

00:02:41.110 --> 00:02:44.230
expected to have positive
probabilities.

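NOTE
A minimal sketch of the dart experiment, assuming the dart lands uniformly
on the unit square (the lecture only says "at random"). The names
estimate_probability, upper_half and exact_center are hypothetical. It
illustrates the point above: a single point is essentially never hit, while
a set such as the upper half receives a clearly positive estimate.
```python
import random
#
def estimate_probability(event, trials=100_000, seed=0):
    """Estimate P(event) as the fraction of simulated darts that land in the event."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        # One dart: a point (x, y) drawn uniformly from the unit square (assumed law).
        point = (rng.random(), rng.random())
        if event(point):
            hits += 1
    return hits / trials
#
def upper_half(point):
    """Event: the dart falls in the upper half of the square."""
    return point[1] > 0.5
#
def exact_center(point):
    """Event: the dart hits exactly the center of the square."""
    return point == (0.5, 0.5)
#
p_upper = estimate_probability(upper_half)
p_center = estimate_probability(exact_center)
print("upper half:", p_upper)
print("exact center:", p_center)
```
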
00:02:44.230 --> 00:02:48.800
So coming back, we're going to
assign probabilities to the

00:02:48.800 --> 00:02:51.810
various subsets of
the sample space.

00:02:51.810 --> 00:02:55.090
And here comes a piece of
terminology, that a subset of

00:02:55.090 --> 00:02:57.890
the sample space is
called an event.

00:02:57.890 --> 00:02:59.880
Why is it called an event?

00:02:59.880 --> 00:03:04.110
Because once we carry out the
experiment and we observe the

00:03:04.110 --> 00:03:08.150
outcome of the experiment,
either this outcome is inside

00:03:08.150 --> 00:03:13.820
the set A and in that case we
say that event A has occurred,

00:03:13.820 --> 00:03:18.550
or the outcome falls outside the
set A in which case we say

00:03:18.550 --> 00:03:22.640
that event A did not occur.

00:03:22.640 --> 00:03:26.630
Now we want to move on and
describe certain rules,

00:03:26.630 --> 00:03:29.340
the rules of the game in
probabilistic models, which

00:03:29.340 --> 00:03:31.570
are basically the
rules that these

00:03:31.570 --> 00:03:34.030
probabilities should satisfy.

00:03:34.030 --> 00:03:36.510
They shouldn't be completely
arbitrary.

00:03:36.510 --> 00:03:42.240
First, by convention,
probabilities are always given

00:03:42.240 --> 00:03:45.270
in the range between 0 and 1.

00:03:45.270 --> 00:03:48.390
Intuitively, 0 probability means
that we believe that

00:03:48.390 --> 00:03:51.250
something practically
cannot happen.

00:03:51.250 --> 00:03:56.680
And probability of 1 means that
we're practically certain

00:03:56.680 --> 00:04:00.080
that an event of interest
is going to happen.

00:04:00.080 --> 00:04:04.410
So we want to specify rules of
this kind for probabilities.

00:04:04.410 --> 00:04:07.240
These rules that any
probabilistic model should

00:04:07.240 --> 00:04:11.100
satisfy are called the axioms
of probability theory.

00:04:11.100 --> 00:04:14.560
And our first axiom is a
non-negativity axiom.

00:04:14.560 --> 00:04:16.810
Namely, probabilities
will always be

00:04:16.810 --> 00:04:19.130
non-negative numbers.

00:04:19.130 --> 00:04:20.940
It's a reasonable rule.

00:04:20.940 --> 00:04:26.050
The second rule is that if the
subset that we're looking at

00:04:26.050 --> 00:04:29.760
is not a proper subset but
is the entire sample space

00:04:29.760 --> 00:04:35.810
omega, the probability of it
should always be equal to 1.

00:04:35.810 --> 00:04:37.570
What does that mean?

00:04:37.570 --> 00:04:40.850
We know that the outcome is
going to be an element of the

00:04:40.850 --> 00:04:41.850
sample space.

00:04:41.850 --> 00:04:44.430
This is the definition
of the sample space.

00:04:44.430 --> 00:04:48.000
So we have absolute certainty
that our outcome is going to

00:04:48.000 --> 00:04:49.490
be in omega.

00:04:49.490 --> 00:04:52.659
Or in different language we have
absolute certainty that

00:04:52.659 --> 00:04:55.630
event omega is going to occur.

00:04:55.630 --> 00:04:59.170
And we capture this certainty by
saying that the probability

00:04:59.170 --> 00:05:02.760
of event omega is equal to 1.

00:05:02.760 --> 00:05:07.030
These two axioms are pretty
simple and very intuitive.

00:05:07.030 --> 00:05:11.150
The more interesting axiom
is the next one that says

00:05:11.150 --> 00:05:13.880
something a little
more complicated.

00:05:13.880 --> 00:05:18.800
Before we discuss that
particular axiom, a quick

00:05:18.800 --> 00:05:22.770
reminder about set theoretic
notation.

00:05:22.770 --> 00:05:29.250
If we have two sets, let's say
a set A, and

00:05:29.250 --> 00:05:38.260
another set B, we use this
particular notation, which we

00:05:38.260 --> 00:05:44.360
read as "A intersection B" to
refer to the collection of

00:05:44.360 --> 00:05:49.960
elements that belong to both A
and B. So in this picture, the

00:05:49.960 --> 00:05:56.390
intersection of A and B
is this shaded set.

00:05:56.390 --> 00:06:03.030
We use this notation, which we
read as "A union B", to refer

00:06:03.030 --> 00:06:06.960
to the set of elements
that belong to A

00:06:06.960 --> 00:06:09.840
or to B or to both.

00:06:09.840 --> 00:06:13.990
So in terms of this picture,
the union of the two sets

00:06:13.990 --> 00:06:17.250
would be this blue set.

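NOTE
A small sketch of the set notation just reviewed, using Python's built-in
sets. The particular elements of A and B are made up for illustration and
do not come from the lecture's picture.
```python
# Two hypothetical sets that share some elements.
A = {1, 2, 3, 4}
B = {3, 4, 5}
# "A intersection B": elements that belong to both A and B.
intersection = A & B
# "A union B": elements that belong to A, or to B, or to both.
union = A | B
print("A intersection B:", sorted(intersection))
print("A union B:", sorted(union))
```
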
00:06:17.250 --> 00:06:20.830
After this reminder about set
theoretic notation, now let us

00:06:20.830 --> 00:06:23.370
look at the form of
the third axiom.

00:06:23.370 --> 00:06:24.880
What does it say?

00:06:24.880 --> 00:06:28.860
If we have two sets, two events,
two subsets of the

00:06:28.860 --> 00:06:33.480
sample space, which
are disjoint.

00:06:33.480 --> 00:06:36.460
So here's our sample space.

00:06:36.460 --> 00:06:40.220
And here are the two sets
that are disjoint.

00:06:43.340 --> 00:06:47.110
In mathematical terms, two sets
being disjoint means that

00:06:47.110 --> 00:06:50.640
their intersection
has no elements.

00:06:50.640 --> 00:06:53.620
So their intersection
is the empty set.

00:06:53.620 --> 00:06:59.159
And we use this symbol here
to denote the empty set.

00:06:59.159 --> 00:07:03.030
So if the intersection of two
sets is empty, then the

00:07:03.030 --> 00:07:07.685
probability that the outcome
of the experiment falls in

00:07:07.685 --> 00:07:11.740
the union of A and B, that is,
the probability that the

00:07:11.740 --> 00:07:17.260
outcome is here or there, is
equal to the sum of the

00:07:17.260 --> 00:07:21.700
probabilities of
these two sets.

00:07:21.700 --> 00:07:24.070
This is called the
additivity axiom.

00:07:24.070 --> 00:07:29.040
So it says that we can add
probabilities of different

00:07:29.040 --> 00:07:32.760
sets when those two
sets are disjoint.

00:07:32.760 --> 00:07:38.260
In some sense we can think of
probability as being one pound

00:07:38.260 --> 00:07:43.010
of some substance which is
spread over our sample space

00:07:43.010 --> 00:07:46.780
and the probability of A is how
much of that substance is

00:07:46.780 --> 00:07:51.659
sitting on top of a set A. So
what this axiom is saying is

00:07:51.659 --> 00:07:56.740
that the total amount of that
substance sitting on top of A

00:07:56.740 --> 00:08:01.510
and B together is how much is sitting on
top of A plus how much is

00:08:01.510 --> 00:08:05.590
sitting on top of B. And that is
the case whenever the sets

00:08:05.590 --> 00:08:10.510
A and B are disjoint
from each other.

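NOTE
A minimal sketch that checks the three axioms on a small finite model. The
fair six-sided die, the per-outcome "mass" table and the helper prob are
assumptions made for illustration; the lecture assigns probabilities to
subsets and does not prescribe this construction. Exact fractions are used
so the equalities are not blurred by floating-point rounding.
```python
from fractions import Fraction
from itertools import combinations
#
# Hypothetical sample space: the six faces of a fair die.
omega = frozenset(range(1, 7))
# The "one pound of substance", spread evenly over the outcomes.
mass = {outcome: Fraction(1, 6) for outcome in omega}
#
def prob(event):
    """P(event): total mass sitting on top of the subset `event` of omega."""
    return sum((mass[outcome] for outcome in event), Fraction(0))
#
# Every subset of omega is an event: 2**6 = 64 of them, including the empty set.
all_events = [set(c) for r in range(len(omega) + 1)
              for c in combinations(sorted(omega), r)]
#
A, B, C = {1, 2}, {5}, {2, 3}
# Axiom 1: probabilities are never negative.
nonnegative = all(prob(event) >= 0 for event in all_events)
# Axiom 2: the whole sample space has probability 1.
normalized = prob(omega) == 1
# Axiom 3: for disjoint events, the probability of the union is the sum.
additive_when_disjoint = A.isdisjoint(B) and prob(A | B) == prob(A) + prob(B)
# Without disjointness the sum counts the shared outcome twice.
overlap_breaks_sum = (not A.isdisjoint(C)) and prob(A | C) != prob(A) + prob(C)
print(nonnegative, normalized, additive_when_disjoint, overlap_breaks_sum)
```
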
00:08:10.510 --> 00:08:13.960
The additivity axiom needs
to be refined a bit.

00:08:13.960 --> 00:08:16.656
We will talk about that
a little later.

00:08:16.656 --> 00:08:20.150
Other than this refinement,
these three axioms are the

00:08:20.150 --> 00:08:22.490
only requirements in
order to have a

00:08:22.490 --> 00:08:25.100
legitimate probability model.

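NOTE
For reference, the three axioms discussed so far can be written compactly as
follows. The symbol \mathbf{P} for "probability of" and the label
"Normalization" are assumptions; the spoken lecture refers to the notation
on the slide without naming it.
```latex
\begin{align*}
&\text{Non-negativity:} && \mathbf{P}(A) \ge 0 \\
&\text{Normalization:}  && \mathbf{P}(\Omega) = 1 \\
&\text{Additivity:}     && \text{if } A \cap B = \varnothing,
  \text{ then } \mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B)
\end{align*}
```
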
00:08:25.100 --> 00:08:27.950
At this point you may ask,
shouldn't there be more

00:08:27.950 --> 00:08:29.270
requirements?

00:08:29.270 --> 00:08:32.909
Shouldn't we, for example, say
that probabilities cannot be

00:08:32.909 --> 00:08:34.950
greater than 1?

00:08:34.950 --> 00:08:36.760
Yes and no.

00:08:36.760 --> 00:08:40.570
We do not want probabilities to
be larger than 1, but we do

00:08:40.570 --> 00:08:42.320
not need to say it.

00:08:42.320 --> 00:08:45.460
As we will see in the next
segment, such a requirement

00:08:45.460 --> 00:08:48.340
follows from what we
have already said.

00:08:48.340 --> 00:08:51.770
And the same is true for
several other natural

00:08:51.770 --> 00:08:53.110
properties of probabilities.