WEBVTT
00:00:01.500 --> 00:00:04.940
We have so far discussed the
first step involved in the
00:00:04.940 --> 00:00:07.350
construction of a probabilistic
model.
00:00:07.350 --> 00:00:10.930
Namely, the construction of
a sample space, which is a
00:00:10.930 --> 00:00:13.730
description of the possible
outcomes of a probabilistic
00:00:13.730 --> 00:00:15.100
experiment.
00:00:15.100 --> 00:00:19.760
We now come to the second and
much more interesting part.
00:00:19.760 --> 00:00:24.210
We need to specify which
outcomes are more likely to
00:00:24.210 --> 00:00:28.540
occur and which ones are less
likely to occur and so on.
00:00:28.540 --> 00:00:33.280
And we will do that by assigning
probabilities to the
00:00:33.280 --> 00:00:35.220
different outcomes.
00:00:35.220 --> 00:00:40.550
However, as we try to do this
assignment, we run into some
00:00:40.550 --> 00:00:43.340
kind of difficulty, which
is the following.
00:00:43.340 --> 00:00:46.310
Remember the previous experiment
involving a
00:00:46.310 --> 00:00:50.880
continuous sample space, which
was the unit square and in
00:00:50.880 --> 00:00:55.280
which we throw a dart at random
and record the point
00:00:55.280 --> 00:00:57.160
that occurred.
00:00:57.160 --> 00:01:00.760
In this experiment, what do you
think is the probability
00:01:00.760 --> 00:01:02.860
of a particular point?
00:01:02.860 --> 00:01:07.540
Let's say what is the
probability that my dart hits
00:01:07.540 --> 00:01:11.890
exactly the center
of this square.
00:01:11.890 --> 00:01:14.330
Well, this probability would
be essentially 0.
00:01:14.330 --> 00:01:16.990
Hitting the center exactly
with infinite
00:01:16.990 --> 00:01:18.870
precision should have probability 0.
00:01:18.870 --> 00:01:22.460
And so it's natural that in such
a continuous model any
00:01:22.460 --> 00:01:27.020
individual point should
have a 0 probability.
00:01:27.020 --> 00:01:31.280
For this reason, rather than
assigning probabilities to
00:01:31.280 --> 00:01:36.640
individual points, we will
instead assign probabilities
00:01:36.640 --> 00:01:42.870
to whole sets, that is, to
subsets of the sample space.
00:01:42.870 --> 00:01:49.090
So here we have our sample
space, which is some
00:01:49.090 --> 00:01:53.620
abstract set omega.
00:01:53.620 --> 00:01:56.570
Here is a subset of
the sample space.
00:01:56.570 --> 00:02:01.540
Call it capital A. We're going
to assign a probability to
00:02:01.540 --> 00:02:05.620
that subset A, which we're
going to denote with this
00:02:05.620 --> 00:02:12.380
notation, which we read as the
probability of set A. So
00:02:12.380 --> 00:02:15.580
probabilities will be
assigned to subsets.
00:02:15.580 --> 00:02:18.420
And this will not cause us
difficulties in the continuous
00:02:18.420 --> 00:02:22.280
case because even though
individual points would have 0
00:02:22.280 --> 00:02:26.720
probability, if you ask me what
are the odds that my dart
00:02:26.720 --> 00:02:31.760
falls in the upper half, let's
say, of this diagram, then
00:02:31.760 --> 00:02:34.840
that should be a reasonable
positive number.
00:02:34.840 --> 00:02:37.220
So even though individual
outcomes may have 0
00:02:37.220 --> 00:02:41.110
probabilities, sets of outcomes
in general would be
00:02:41.110 --> 00:02:44.230
expected to have positive
probabilities.
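
NOTE
A rough simulation sketch of the dart experiment, assuming the dart lands uniformly on the unit square.
The function name throw_darts, the seed, and the number of throws are illustrative choices, not part of the lecture.
It estimates the probability of the upper half and counts how often the exact center is hit.
```python
import random
def throw_darts(n, seed=0):
    """Throw n darts uniformly at the unit square; return (fraction in upper half, exact center hits)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    upper = 0
    center = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if y > 0.5:  # event: the dart lands in the upper half
            upper += 1
        if (x, y) == (0.5, 0.5):  # event: the dart hits the center exactly
            center += 1
    return upper / n, center
upper_fraction, center_hits = throw_darts(100_000)
print(upper_fraction, center_hits)
```
The first number should come out close to one half, while exact center hits are not expected to show up at all.
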
00:02:44.230 --> 00:02:48.800
So coming back, we're going to
assign probabilities to the
00:02:48.800 --> 00:02:51.810
various subsets of
the sample space.
00:02:51.810 --> 00:02:55.090
And here comes a piece of
terminology, that a subset of
00:02:55.090 --> 00:02:57.890
the sample space is
called an event.
00:02:57.890 --> 00:02:59.880
Why is it called an event?
00:02:59.880 --> 00:03:04.110
Because once we carry out the
experiment and we observe the
00:03:04.110 --> 00:03:08.150
outcome of the experiment,
either this outcome is inside
00:03:08.150 --> 00:03:13.820
the set A and in that case we
say that event A has occurred,
00:03:13.820 --> 00:03:18.550
or the outcome falls outside the
set A in which case we say
00:03:18.550 --> 00:03:22.640
that event A did not occur.
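
NOTE
A minimal sketch of the terminology, using a hypothetical die roll as the experiment.
The names sample_space, event_a, and occurred are made up for illustration.
```python
# Hypothetical experiment: one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
event_a = {2, 4, 6}  # the event "the roll is even", a subset of the sample space
def occurred(event, outcome):
    """An event occurs exactly when the observed outcome is one of its elements."""
    return outcome in event
print(event_a <= sample_space)  # True: event_a is a subset of the sample space
print(occurred(event_a, 4))     # True: the outcome 4 is inside A
print(occurred(event_a, 3))     # False: the outcome 3 is outside A
```
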
00:03:22.640 --> 00:03:26.630
Now we want to move on and
describe certain rules.
00:03:26.630 --> 00:03:29.340
The rules of the game in
probabilistic models, which
00:03:29.340 --> 00:03:31.570
are basically the
rules that these
00:03:31.570 --> 00:03:34.030
probabilities should satisfy.
00:03:34.030 --> 00:03:36.510
They shouldn't be completely
arbitrary.
00:03:36.510 --> 00:03:42.240
First, by convention,
probabilities are always given
00:03:42.240 --> 00:03:45.270
in the range between 0 and 1.
00:03:45.270 --> 00:03:48.390
Intuitively, 0 probability means
that we believe that
00:03:48.390 --> 00:03:51.250
something practically
cannot happen.
00:03:51.250 --> 00:03:56.680
And probability of 1 means that
we're practically certain
00:03:56.680 --> 00:04:00.080
that an event of interest
is going to happen.
00:04:00.080 --> 00:04:04.410
So we want to specify rules of
this kind for probabilities.
00:04:04.410 --> 00:04:07.240
These rules that any
probabilistic model should
00:04:07.240 --> 00:04:11.100
satisfy are called the axioms
of probability theory.
00:04:11.100 --> 00:04:14.560
And our first axiom is a
nonnegativity axiom.
00:04:14.560 --> 00:04:16.810
Namely, probabilities
will always be
00:04:16.810 --> 00:04:19.130
non-negative numbers.
00:04:19.130 --> 00:04:20.940
It's a reasonable rule.
00:04:20.940 --> 00:04:26.050
The second rule is that if the
subset that we're looking at
00:04:26.050 --> 00:04:29.760
is actually not a proper subset but
is the entire sample space
00:04:29.760 --> 00:04:35.810
omega, the probability of it
should always be equal to 1.
00:04:35.810 --> 00:04:37.570
What does that mean?
00:04:37.570 --> 00:04:40.850
We know that the outcome is
going to be an element of the
00:04:40.850 --> 00:04:41.850
sample space.
00:04:41.850 --> 00:04:44.430
This is the definition
of the sample space.
00:04:44.430 --> 00:04:48.000
So we have absolute certainty
that our outcome is going to
00:04:48.000 --> 00:04:49.490
be in omega.
00:04:49.490 --> 00:04:52.659
Or in different language we have
absolute certainty that
00:04:52.659 --> 00:04:55.630
event omega is going to occur.
00:04:55.630 --> 00:04:59.170
And we capture this certainty by
saying that the probability
00:04:59.170 --> 00:05:02.760
of event omega is equal to 1.
00:05:02.760 --> 00:05:07.030
These two axioms are pretty
simple and very intuitive.
00:05:07.030 --> 00:05:11.150
The more interesting axiom
is the next one that says
00:05:11.150 --> 00:05:13.880
something a little
more complicated.
00:05:13.880 --> 00:05:18.800
Before we discuss that
particular axiom, a quick
00:05:18.800 --> 00:05:22.770
reminder about set theoretic
notation.
00:05:22.770 --> 00:05:29.250
If we have two sets, let's say
a set A, and
00:05:29.250 --> 00:05:38.260
another set B, we use this
particular notation, which we
00:05:38.260 --> 00:05:44.360
read as "A intersection B" to
refer to the collection of
00:05:44.360 --> 00:05:49.960
elements that belong to both A
and B. So in this picture, the
00:05:49.960 --> 00:05:56.390
intersection of A and B
is this shaded set.
00:05:56.390 --> 00:06:03.030
We use this notation, which we
read as "A union B", to refer
00:06:03.030 --> 00:06:06.960
to the set of elements
that belong to A
00:06:06.960 --> 00:06:09.840
or to B or to both.
00:06:09.840 --> 00:06:13.990
So in terms of this picture,
the union of the two sets
00:06:13.990 --> 00:06:17.250
would be this blue set.
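
NOTE
A small sketch of the two operations using Python's built-in sets.
The particular elements of a and b are arbitrary examples.
```python
a = {1, 2, 3, 4}
b = {3, 4, 5}
intersection = a & b  # elements that belong to both A and B
union = a | b         # elements that belong to A or to B or to both
print(intersection == {3, 4})        # True
print(union == {1, 2, 3, 4, 5})      # True
```
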
00:06:17.250 --> 00:06:20.830
After this reminder about set
theoretic notation, now let us
00:06:20.830 --> 00:06:23.370
look at the form of
the third axiom.
00:06:23.370 --> 00:06:24.880
What does it say?
00:06:24.880 --> 00:06:28.860
If we have two sets, two events,
two subsets of the
00:06:28.860 --> 00:06:33.480
sample space, which
are disjoint.
00:06:33.480 --> 00:06:36.460
So here's our sample space.
00:06:36.460 --> 00:06:40.220
And here are the two sets
that are disjoint.
00:06:43.340 --> 00:06:47.110
In mathematical terms, two sets
being disjoint means that
00:06:47.110 --> 00:06:50.640
their intersection
has no elements.
00:06:50.640 --> 00:06:53.620
So their intersection
is the empty set.
00:06:53.620 --> 00:06:59.159
And we use this symbol here
to denote the empty set.
00:06:59.159 --> 00:07:03.030
So if the intersection of two
sets is empty, then the
00:07:03.030 --> 00:07:07.685
probability that the outcome
of the experiments falls in
00:07:07.685 --> 00:07:11.740
the union of A and B, that is,
the probability that the
00:07:11.740 --> 00:07:17.260
outcome is here or there, is
equal to the sum of the
00:07:17.260 --> 00:07:21.700
probabilities of
these two sets.
00:07:21.700 --> 00:07:24.070
This is called the
additivity axiom.
00:07:24.070 --> 00:07:29.040
So it says that we can add
probabilities of different
00:07:29.040 --> 00:07:32.760
sets when those two
sets are disjoint.
00:07:32.760 --> 00:07:38.260
In some sense we can think of
probability as being one pound
00:07:38.260 --> 00:07:43.010
of some substance which is
spread over our sample space
00:07:43.010 --> 00:07:46.780
and the probability of A is how
much of that substance is
00:07:46.780 --> 00:07:51.659
sitting on top of a set A. So
what this axiom is saying is
00:07:51.659 --> 00:07:56.740
that the total amount of that
substance sitting on top of A
00:07:56.740 --> 00:08:01.510
and B is how much is sitting on
top of A plus how much is
00:08:01.510 --> 00:08:05.590
sitting on top of B. And that is
the case whenever the sets
00:08:05.590 --> 00:08:10.510
A and B are disjoint
from each other.
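
NOTE
One way to make the "pound of substance" picture concrete, assuming a finite sample space.
The fair die, the weights dictionary, and the function prob are hypothetical choices for illustration.
Exact fractions are used so that the sums can be compared without rounding.
```python
from fractions import Fraction
# Hypothetical model: a fair six-sided die, each outcome carrying 1/6 of the substance.
weights = {outcome: Fraction(1, 6) for outcome in range(1, 7)}
sample_space = set(weights)
def prob(event):
    """Probability of an event: the total weight sitting on top of its outcomes."""
    return sum((weights[outcome] for outcome in event), Fraction(0))
a = {1, 2}
b = {5}
overlapping = {2, 3}  # shares the outcome 2 with a, so it is not disjoint from a
print(prob(sample_space))                                   # 1
print(prob(a | b) == prob(a) + prob(b))                     # True: a and b are disjoint
print(prob(a | overlapping) == prob(a) + prob(overlapping)) # False: outcome 2 is counted twice
```
The last line shows why disjointness matters: when the sets overlap, adding their probabilities counts the shared part twice.
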
00:08:10.510 --> 00:08:13.960
The additivity axiom needs
to be refined a bit.
00:08:13.960 --> 00:08:16.656
We will talk about that
a little later.
00:08:16.656 --> 00:08:20.150
Other than this refinement,
these three axioms are the
00:08:20.150 --> 00:08:22.490
only requirements in
order to have a
00:08:22.490 --> 00:08:25.100
legitimate probability model.
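
NOTE
For reference, the three axioms in symbols, as a sketch of what is written on the slides.
The bold P and the label "Normalization" for the second axiom are notational assumptions, not taken from the spoken text.
```latex
\begin{align*}
&\text{Nonnegativity:} && \mathbf{P}(A) \ge 0 \\
&\text{Normalization:} && \mathbf{P}(\Omega) = 1 \\
&\text{Additivity:} && \text{if } A \cap B = \emptyset, \text{ then } \mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B)
\end{align*}
```
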
00:08:25.100 --> 00:08:27.950
At this point you may ask,
shouldn't there be more
00:08:27.950 --> 00:08:29.270
requirements?
00:08:29.270 --> 00:08:32.909
Shouldn't we, for example, say
that probabilities cannot be
00:08:32.909 --> 00:08:34.950
greater than 1?
00:08:34.950 --> 00:08:36.760
Yes and no.
00:08:36.760 --> 00:08:40.570
We do not want probabilities to
be larger than 1, but we do
00:08:40.570 --> 00:08:42.320
not need to say it.
00:08:42.320 --> 00:08:45.460
As we will see in the next
segment, such a requirement
00:08:45.460 --> 00:08:48.340
follows from what we
have already said.
00:08:48.340 --> 00:08:51.770
And the same is true for
several other natural
00:08:51.770 --> 00:08:53.110
properties of probabilities.