WEBVTT

00:00:03.300 --> 00:00:05.900
Putting together a probabilistic
model--

00:00:05.900 --> 00:00:09.060
that is, a model of a random
phenomenon or a random

00:00:09.060 --> 00:00:10.300
experiment--

00:00:10.300 --> 00:00:12.360
involves two steps.

00:00:12.360 --> 00:00:16.020
First step, we describe the
possible outcomes of the

00:00:16.020 --> 00:00:18.930
phenomenon or experiment
of interest.

00:00:18.930 --> 00:00:22.940
Second step, we describe our
beliefs about the likelihood

00:00:22.940 --> 00:00:25.660
of the different possible
outcomes by specifying a

00:00:25.660 --> 00:00:27.570
probability law.

00:00:27.570 --> 00:00:31.650
Here, we start by just talking
about the first step, namely,

00:00:31.650 --> 00:00:33.830
the description of the possible
outcomes of the

00:00:33.830 --> 00:00:34.950
experiment.

00:00:34.950 --> 00:00:36.760
So we carry out an experiment.

00:00:36.760 --> 00:00:38.540
For example, we flip a coin.

00:00:38.540 --> 00:00:41.400
Or maybe we flip five coins
simultaneously.

00:00:41.400 --> 00:00:43.740
Or maybe we roll a die.

00:00:43.740 --> 00:00:48.390
Whatever that experiment is,
it has a number of possible

00:00:48.390 --> 00:00:51.970
outcomes, and we start
by making a list of

00:00:51.970 --> 00:00:53.830
the possible outcomes--

00:00:53.830 --> 00:00:57.490
or, better, instead of the
word "list", we use the

00:00:57.490 --> 00:01:01.520
word "set", which has a more
formal mathematical meaning.

00:01:01.520 --> 00:01:05.400
So we create a set
that we usually

00:01:05.400 --> 00:01:09.850
denote by capital omega.

00:01:09.850 --> 00:01:14.570
That set is called the sample
space and is the set of all

00:01:14.570 --> 00:01:17.565
possible outcomes of
our experiment.
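
NOTE
A minimal sketch of the three experiments mentioned so far, written as Python sets.
It assumes the five coins are distinguishable, so each outcome is an ordered tuple.
The names coin, five_coins and die are illustrative and do not come from the lecture.
```python
from itertools import product
# One coin flip: two possible outcomes.
coin = {"H", "T"}
# Five coins flipped together: one tuple of five results per outcome.
five_coins = set(product("HT", repeat=5))
# One roll of a six-sided die.
die = set(range(1, 7))
# Each set plays the role of the sample space for its experiment.
print(len(coin), len(five_coins), len(die))  # prints: 2 32 6
```
If the coins were treated as indistinguishable, a smaller sample space (the number of heads, 0 to 5) would be another possible choice.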

00:01:21.289 --> 00:01:24.750
The elements of that set
should have certain

00:01:24.750 --> 00:01:25.980
properties.

00:01:25.980 --> 00:01:29.590
Namely, the elements should
be mutually exclusive and

00:01:29.590 --> 00:01:31.620
collectively exhaustive.

00:01:31.620 --> 00:01:32.860
What does that mean?

00:01:32.860 --> 00:01:36.289
Mutually exclusive means that,
if at the end of the

00:01:36.289 --> 00:01:41.160
experiment, I tell you that one
outcome happened, then it

00:01:41.160 --> 00:01:44.970
should not be possible that a
different outcome also happened.

00:01:44.970 --> 00:01:47.759
At the end of the experiment,
only one of the

00:01:47.759 --> 00:01:50.200
outcomes can have happened.

00:01:50.200 --> 00:01:53.420
Being collectively exhaustive
means something else-- that,

00:01:53.420 --> 00:01:57.330
together, all of these elements
of the set exhaust

00:01:57.330 --> 00:01:59.400
all the possibilities.

00:01:59.400 --> 00:02:03.400
So no matter what, at the end,
you will be able to point to

00:02:03.400 --> 00:02:07.860
one of the outcomes and say,
that's the one that occurred.

00:02:07.860 --> 00:02:09.009
To summarize--

00:02:09.009 --> 00:02:12.340
this set should be such that, at
the end of the experiment,

00:02:12.340 --> 00:02:17.270
you should always be able to
point to one, and exactly one,

00:02:17.270 --> 00:02:20.660
of the possible outcomes and say
that this is the outcome

00:02:20.660 --> 00:02:21.910
that occurred.
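
NOTE
A small sketch of the "one, and exactly one" requirement, using a die roll as an assumed example.
Each candidate outcome is modeled as the set of die faces it describes.
The helper describes_exactly_one is hypothetical and is only meant to illustrate the idea.
```python
def describes_exactly_one(results, descriptions):
    """Return True if every physical result matches exactly one description."""
    return all(sum(r in d for d in descriptions) == 1 for r in results)
faces = range(1, 7)
# "odd" and "even": mutually exclusive and collectively exhaustive.
odd_even = [{1, 3, 5}, {2, 4, 6}]
# "even" and "at most 3": face 2 matches both, and face 5 matches neither.
overlapping = [{2, 4, 6}, {1, 2, 3}]
# "one" and "two": mutually exclusive, but faces 3 to 6 are not covered.
incomplete = [{1}, {2}]
print(describes_exactly_one(faces, odd_even))     # prints: True
print(describes_exactly_one(faces, overlapping))  # prints: False
print(describes_exactly_one(faces, incomplete))   # prints: False
```
Only the first list could serve as a sample space for the die roll.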

00:02:23.700 --> 00:02:28.870
Physically different outcomes
should be distinguished in the

00:02:28.870 --> 00:02:33.530
sample space and correspond
to distinct points.

00:02:33.530 --> 00:02:35.760
But when we say physically
different

00:02:35.760 --> 00:02:37.840
outcomes, what do we mean?

00:02:37.840 --> 00:02:41.910
We really mean different in
all relevant aspects but

00:02:41.910 --> 00:02:45.880
perhaps not different in
irrelevant aspects.

00:02:45.880 --> 00:02:50.180
Let me make that more precise
by looking at a

00:02:50.180 --> 00:02:53.600
very simple, and maybe
silly, example,

00:02:53.600 --> 00:02:54.625
which is the following.

00:02:54.625 --> 00:02:58.360
Suppose that you flip a coin
and you see whether it

00:02:58.360 --> 00:03:01.980
resulted in heads or tails.

00:03:01.980 --> 00:03:05.890
So you have a perfectly
legitimate sample space for

00:03:05.890 --> 00:03:09.470
this experiment which consists
of just two points--

00:03:09.470 --> 00:03:11.200
heads and tails.

00:03:11.200 --> 00:03:16.750
Together these two outcomes
exhaust all possibilities.

00:03:16.750 --> 00:03:19.380
And the two outcomes are
mutually exclusive.

00:03:19.380 --> 00:03:22.200
So this is a very legitimate
sample space for this

00:03:22.200 --> 00:03:23.760
experiment.

00:03:23.760 --> 00:03:26.620
Now suppose that while you were
flipping the coin, you

00:03:26.620 --> 00:03:30.110
also looked outside the window
to check the weather.

00:03:30.110 --> 00:03:37.140
Then you could say that one possible
outcome in my sample space is heads,

00:03:37.140 --> 00:03:38.410
and it's raining.

00:03:40.970 --> 00:03:44.965
Another possible outcome
is heads and no rain.

00:03:48.780 --> 00:03:55.640
Another possible outcome is
tails, and it's raining, and,

00:03:55.640 --> 00:03:59.975
finally, another possible
outcome is tails and no rain.

00:04:05.490 --> 00:04:11.560
This set, consisting of four
elements, is also a perfectly

00:04:11.560 --> 00:04:14.315
legitimate sample space
for the experiment

00:04:14.315 --> 00:04:16.140
of flipping a coin.

00:04:16.140 --> 00:04:19.060
The elements of this sample
space are mutually exclusive

00:04:19.060 --> 00:04:20.390
and collectively exhaustive.

00:04:20.390 --> 00:04:24.830
Exactly one of these outcomes
is going to be true, or will

00:04:24.830 --> 00:04:27.640
have materialized, at the
end of the experiment.

00:04:27.640 --> 00:04:30.440
So which sample space
is the correct one?

00:04:30.440 --> 00:04:32.950
This sample space, the
second one, involves

00:04:32.950 --> 00:04:34.970
some irrelevant details.

00:04:34.970 --> 00:04:40.090
So the preferred sample space
for describing the flipping of

00:04:40.090 --> 00:04:44.260
a coin is the simpler one,
the first one,

00:04:44.260 --> 00:04:48.340
which is at the right
granularity, given

00:04:48.340 --> 00:04:50.640
what we're interested in.
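
NOTE
A rough sketch of the two sample spaces for the coin flip, at two levels of detail.
The names coarse, fine and forget_weather are illustrative assumptions, not notation from the lecture.
Dropping the weather from each detailed outcome recovers the simpler sample space.
```python
from itertools import product
# First sample space: just the result of the coin.
coarse = {"heads", "tails"}
# Second sample space: the coin result paired with the weather.
fine = set(product(coarse, {"rain", "no rain"}))
def forget_weather(outcome):
    """Map a detailed outcome to the coin result alone."""
    coin_result, _weather = outcome
    return coin_result
# Every detailed outcome maps to exactly one simple outcome.
print(len(fine))                                   # prints: 4
print({forget_weather(o) for o in fine} == coarse)  # prints: True
```
Both sets are legitimate sample spaces. The second one only carries extra detail, which matters if the weather is relevant to the question being asked.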

00:04:50.640 --> 00:04:54.010
But ultimately, the question
of which one is the right

00:04:54.010 --> 00:04:56.760
sample space depends
on what kind of

00:04:56.760 --> 00:04:58.840
questions you want to answer.

00:04:58.840 --> 00:05:02.280
For example, if you have a
theory that the weather

00:05:02.280 --> 00:05:07.020
affects the behavior of coins,
then, in order to explore

00:05:07.020 --> 00:05:11.960
that theory, or maybe
test it,

00:05:11.960 --> 00:05:15.810
you might want to work
with the second sample space.

00:05:15.810 --> 00:05:19.070
This is a common feature
in all of science.

00:05:19.070 --> 00:05:22.670
Whenever you put together a
model, you need to decide how

00:05:22.670 --> 00:05:25.080
detailed you want your
model to be.

00:05:25.080 --> 00:05:28.870
And the right level of detail is
the one that captures those

00:05:28.870 --> 00:05:32.500
aspects that are relevant
and of interest to you.