WEBVTT
00:00:00.040 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.870
Commons license.
00:00:03.870 --> 00:00:06.910
Your support will help MIT
OpenCourseWare continue to
00:00:06.910 --> 00:00:10.560
offer high quality educational
resources for free.
00:00:10.560 --> 00:00:13.460
To make a donation or view
additional materials from
00:00:13.460 --> 00:00:17.390
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:17.390 --> 00:00:18.640
ocw.mit.edu.
00:00:22.440 --> 00:00:29.250
PROFESSOR: OK, so welcome to
6.041/6.431, the class on
00:00:29.250 --> 00:00:31.750
probability models
and the like.
00:00:31.750 --> 00:00:32.740
I'm John Tsitsiklis.
00:00:32.740 --> 00:00:36.340
I will be teaching this class,
and I'm looking forward to
00:00:36.340 --> 00:00:41.060
this being an enjoyable and
also useful experience.
00:00:41.060 --> 00:00:44.500
We have a fair amount of staff
involved in this course, your
00:00:44.500 --> 00:00:48.040
recitation instructors and also
a bunch of TAs, but I
00:00:48.040 --> 00:00:52.860
want to single out our head
TA, Uzoma, who is the key
00:00:52.860 --> 00:00:54.450
person in this class.
00:00:54.450 --> 00:00:56.550
Everything has to
go through him.
00:00:56.550 --> 00:00:59.640
If he doesn't know in which
recitation section you are,
00:00:59.640 --> 00:01:03.700
then simply you do not exist,
so keep that in mind.
00:01:03.700 --> 00:01:04.099
All right.
00:01:04.099 --> 00:01:08.360
So we want to jump right into
the subject, but I'm going to
00:01:08.360 --> 00:01:11.210
take just a few minutes
to talk about a few
00:01:11.210 --> 00:01:14.580
administrative details and
how the course is run.
00:01:14.580 --> 00:01:17.990
So we're going to have lectures
twice a week and I'm
00:01:17.990 --> 00:01:20.300
going to use old fashioned
transparencies.
00:01:20.300 --> 00:01:23.270
Now, you get copies of these
slides with plenty of space
00:01:23.270 --> 00:01:25.760
for you to keep notes on them.
00:01:25.760 --> 00:01:31.190
A useful way of making good use
of the slides is to use
00:01:31.190 --> 00:01:33.670
them as a sort of mnemonic
summary of
00:01:33.670 --> 00:01:35.720
what happens in lecture.
00:01:35.720 --> 00:01:38.460
Not everything that I'm going
to say is, of course, on the
00:01:38.460 --> 00:01:41.700
slides, but by looking at them you
get the sense of what's
00:01:41.700 --> 00:01:42.760
happening right now.
00:01:42.760 --> 00:01:45.940
And it may be a good idea to
review them before you go to
00:01:45.940 --> 00:01:47.240
recitation.
00:01:47.240 --> 00:01:48.310
So what happens in recitation?
00:01:48.310 --> 00:01:52.040
In recitation, your recitation
instructor is going to maybe
00:01:52.040 --> 00:01:55.140
review some of the theory
and then solve some
00:01:55.140 --> 00:01:57.150
problems for you.
00:01:57.150 --> 00:02:00.520
And then you have tutorials
where you meet in very small
00:02:00.520 --> 00:02:02.750
groups together with your TA.
00:02:02.750 --> 00:02:05.740
And what happens in tutorials
is that you actually do the
00:02:05.740 --> 00:02:09.020
problem solving with the help
of your TA and the help of
00:02:09.020 --> 00:02:12.290
your classmates in your
tutorial section.
00:02:12.290 --> 00:02:14.340
Now probability is
a tricky subject.
00:02:14.340 --> 00:02:16.750
You may be reading the text,
listening to lectures,
00:02:16.750 --> 00:02:20.660
everything makes perfect sense,
and so on, but until
00:02:20.660 --> 00:02:23.510
you actually sit down and try
to solve problems, you don't
00:02:23.510 --> 00:02:25.600
quite appreciate the
subtleties and the
00:02:25.600 --> 00:02:27.310
difficulties that
are involved.
00:02:27.310 --> 00:02:30.550
So problem solving is a key
part of this class.
00:02:30.550 --> 00:02:34.010
And tutorials are extremely
useful just for this reason
00:02:34.010 --> 00:02:36.710
because that's where you
actually get the practice of
00:02:36.710 --> 00:02:39.620
solving problems on your own,
as opposed to seeing someone
00:02:39.620 --> 00:02:43.510
else who's solving
them for you.
00:02:43.510 --> 00:02:46.840
OK but, mechanics, a key part
of what's going to happen
00:02:46.840 --> 00:02:51.890
today is that you will turn in
your schedule forms that are
00:02:51.890 --> 00:02:55.350
at the end of the handout that
you have in your hands.
00:02:55.350 --> 00:02:59.820
Then, the TAs will be working
frantically through the night,
00:02:59.820 --> 00:03:04.000
and they're going to be
producing a list of who goes
00:03:04.000 --> 00:03:05.700
into what section.
00:03:05.700 --> 00:03:09.640
And when that happens, any
person in this class, with
00:03:09.640 --> 00:03:13.350
probability 90%, is going to be
happy with their assignment
00:03:13.350 --> 00:03:17.670
and, with probability 10%,
they're going to be unhappy.
00:03:17.670 --> 00:03:20.860
Now, unhappy people have
an option, though.
00:03:20.860 --> 00:03:23.820
You can resubmit your form
together with your full
00:03:23.820 --> 00:03:27.470
schedule and constraints, give
it back to the head TA, who
00:03:27.470 --> 00:03:32.160
will then do some further
juggling and reassign people,
00:03:32.160 --> 00:03:36.270
and after that happens, 90% of
those unhappy people will
00:03:36.270 --> 00:03:37.570
become happy.
00:03:37.570 --> 00:03:42.270
And 10% of them will
be less unhappy.
00:03:42.270 --> 00:03:42.840
OK.
00:03:42.840 --> 00:03:46.930
So what's the probability that a
random person is going to be
00:03:46.930 --> 00:03:49.800
unhappy at the end
of this process?
00:03:49.800 --> 00:03:50.780
It's 1%.
00:03:50.780 --> 00:03:51.330
Excellent.
00:03:51.330 --> 00:03:51.490
Good.
00:03:51.490 --> 00:03:53.200
Maybe you don't need
this class.
00:03:53.200 --> 00:03:54.340
OK, so 1%.
00:03:54.340 --> 00:03:57.370
We have about 100 people in this
class, so there's going
00:03:57.370 --> 00:03:59.590
to be about one unhappy
person.
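NOTE A minimal sketch of the arithmetic just described, using the numbers quoted in the lecture (10% unhappy after the first assignment, 10% of those still not happy after resubmitting, about 100 students). The variable names are illustrative only.

```python
from fractions import Fraction

p_unhappy_first = Fraction(10, 100)  # unhappy after the initial assignment
p_still_unhappy = Fraction(10, 100)  # of those, still not happy after resubmitting

# A person ends up unhappy only if both things happen.
p_unhappy_end = p_unhappy_first * p_still_unhappy

class_size = 100  # "about 100 people" in the lecture
expected_unhappy = class_size * p_unhappy_end

print(p_unhappy_end, expected_unhappy)
```

Exact fractions are used here so that the 1% figure comes out without floating-point rounding.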
00:03:59.590 --> 00:04:03.020
I mean, anywhere you look in
life, in any group you look
00:04:03.020 --> 00:04:05.370
at, there's always one unhappy
person, right?
00:04:05.370 --> 00:04:09.060
So, what can we do about it?
00:04:09.060 --> 00:04:09.660
All right.
00:04:09.660 --> 00:04:12.710
Another important part about
mechanics is to read carefully
00:04:12.710 --> 00:04:15.540
the statement that we have about
collaboration, academic
00:04:15.540 --> 00:04:17.019
honesty, and all that.
00:04:17.019 --> 00:04:19.149
You're encouraged, it's
a very good idea to
00:04:19.149 --> 00:04:21.140
work with other students.
00:04:21.140 --> 00:04:24.690
You can consult sources that
are out there, but when you
00:04:24.690 --> 00:04:28.140
sit down and write your
solutions you have to do that
00:04:28.140 --> 00:04:32.050
by setting things aside and just
writing them on your own.
00:04:32.050 --> 00:04:34.360
You cannot copy something
that somebody else
00:04:34.360 --> 00:04:37.040
has given to you.
00:04:37.040 --> 00:04:41.390
One reason is that we're not
going to like it when it
00:04:41.390 --> 00:04:44.280
happens, and then another reason
is that you're not
00:04:44.280 --> 00:04:46.270
going to do yourself
any favor.
00:04:46.270 --> 00:04:48.830
Really the only way to do well
in this class is to get a lot
00:04:48.830 --> 00:04:51.620
of practice by solving
problems yourselves.
00:04:51.620 --> 00:04:55.160
So if you don't do that on your
own, then when quiz and
00:04:55.160 --> 00:04:59.070
exam time comes, things are
going to be difficult.
00:04:59.070 --> 00:05:02.590
So, as I mentioned here, we're
going to have recitation
00:05:02.590 --> 00:05:06.540
sections. Some of them are
for 6.041 students, some
00:05:06.540 --> 00:05:10.270
are for 6.431 students, the
graduate section of the class.
00:05:10.270 --> 00:05:12.950
Now undergraduates
can sit in the
00:05:12.950 --> 00:05:14.690
graduate recitation sections.
00:05:14.690 --> 00:05:17.650
What's going to happen there is
that things may be just a
00:05:17.650 --> 00:05:21.260
little faster and you may be
covering a problem that's a
00:05:21.260 --> 00:05:23.300
little more advanced and
is not covered in
00:05:23.300 --> 00:05:24.670
the undergrad sections.
00:05:24.670 --> 00:05:28.190
But if you sit in the graduate
section, and you're an
00:05:28.190 --> 00:05:31.140
undergraduate, you're still
just responsible for the
00:05:31.140 --> 00:05:33.130
undergraduate material.
00:05:33.130 --> 00:05:35.760
That is, you can just do the
undergraduate work in the
00:05:35.760 --> 00:05:38.470
class, but maybe be exposed
to a different section.
00:05:41.070 --> 00:05:43.036
OK.
00:05:43.036 --> 00:05:46.220
A few words about the
style of this class.
00:05:46.220 --> 00:05:50.760
We want to focus on basic
ideas and concepts.
00:05:50.760 --> 00:05:53.860
There's going to be lots of
formulas, but what we try to
00:05:53.860 --> 00:05:56.530
do in this class is to actually
have you understand
00:05:56.530 --> 00:05:58.190
what those formulas mean.
00:05:58.190 --> 00:06:01.260
And, in a year from now when
almost all of the formulas
00:06:01.260 --> 00:06:04.660
have been wiped out from your
memory, you still have the
00:06:04.660 --> 00:06:05.610
basic concepts.
00:06:05.610 --> 00:06:08.690
You can understand them, so when
you look things up again,
00:06:08.690 --> 00:06:12.820
they will still make sense.
00:06:12.820 --> 00:06:16.880
It's not the plug and chug kind
of class where you're
00:06:16.880 --> 00:06:19.430
given a list of formulas, you're
given numbers, and you
00:06:19.430 --> 00:06:21.470
plug in and you get answers.
00:06:21.470 --> 00:06:24.950
The really hard part is usually
to choose which
00:06:24.950 --> 00:06:26.280
formulas you're going to use.
00:06:26.280 --> 00:06:28.900
You need judgment, you
need intuition.
00:06:28.900 --> 00:06:32.400
Lots of probability problems, at
least the interesting ones,
00:06:32.400 --> 00:06:34.450
often have lots of different
solutions.
00:06:34.450 --> 00:06:37.440
Some are extremely long, some
are extremely short.
00:06:37.440 --> 00:06:40.550
The extremely short ones usually
involve some kind of
00:06:40.550 --> 00:06:44.320
deeper understanding of what's
going on so that you can pick
00:06:44.320 --> 00:06:46.350
a shortcut and use it.
00:06:46.350 --> 00:06:48.300
And hopefully you are going
to develop this
00:06:48.300 --> 00:06:51.630
skill during this class.
00:06:51.630 --> 00:06:56.360
Now, I could spend a lot of time
in this lecture talking
00:06:56.360 --> 00:06:58.570
about why the subject
is important.
00:06:58.570 --> 00:07:02.270
I'll keep it short because I
think it's almost obvious.
00:07:02.270 --> 00:07:05.650
Anything that happens in
life is uncertain.
00:07:05.650 --> 00:07:09.080
There's uncertainty anywhere, so
whatever you try to do, you
00:07:09.080 --> 00:07:12.550
need to have some way of dealing with
or thinking about this
00:07:12.550 --> 00:07:13.930
uncertainty.
00:07:13.930 --> 00:07:17.110
And the way to do that in a
systematic way is by using the
00:07:17.110 --> 00:07:20.110
models that are given to us
by probability theory.
00:07:20.110 --> 00:07:22.330
So if you're an engineer and
you're dealing with a
00:07:22.330 --> 00:07:25.470
communication system or signal
processing, basically you're
00:07:25.470 --> 00:07:28.440
facing a fight against noise.
00:07:28.440 --> 00:07:30.380
Noise is random, is uncertain.
00:07:30.380 --> 00:07:31.450
How do you model it?
00:07:31.450 --> 00:07:33.120
How do you deal with it?
00:07:33.120 --> 00:07:36.400
If you're a manager, I guess
you're dealing with customer
00:07:36.400 --> 00:07:38.410
demand, which is, of
course, random.
00:07:38.410 --> 00:07:41.590
Or you're dealing with the
stock market, which is
00:07:41.590 --> 00:07:42.820
definitely random.
00:07:42.820 --> 00:07:48.190
Or you play the casino, which
is, again, random, and so on.
00:07:48.190 --> 00:07:51.100
And the same goes for pretty
much any other field that you
00:07:51.100 --> 00:07:52.880
can think of.
00:07:52.880 --> 00:07:57.320
But, independent of which field
you're coming from, the
00:07:57.320 --> 00:08:00.630
basic concepts and tools are
really all the same.
00:08:00.630 --> 00:08:04.080
So you may see in bookstores
that there are books,
00:08:04.080 --> 00:08:07.010
probability for scientists,
probability for engineers,
00:08:07.010 --> 00:08:09.900
probability for social
scientists, probability for
00:08:09.900 --> 00:08:11.440
astrologists.
00:08:11.440 --> 00:08:14.880
Well, what all those books have
inside them is exactly
00:08:14.880 --> 00:08:18.040
the same models, the same
equations, the same problems.
00:08:18.040 --> 00:08:21.510
They just make them somewhat
different word problems.
00:08:21.510 --> 00:08:26.000
The basic concepts are just one
and the same, and we'll
00:08:26.000 --> 00:08:30.420
take this as an excuse for not
going too much into specific
00:08:30.420 --> 00:08:31.960
domain applications.
00:08:31.960 --> 00:08:35.260
We will have problems and
examples that are motivated,
00:08:35.260 --> 00:08:38.140
in some loose sense, from
real world situations.
00:08:38.140 --> 00:08:42.030
But we're not really trying in
this class to develop the
00:08:42.030 --> 00:08:46.220
skills for domain-specific
problems.
00:08:46.220 --> 00:08:49.660
Rather, we're going to try to
stick to general understanding
00:08:49.660 --> 00:08:52.390
of the subject.
00:08:52.390 --> 00:08:52.760
OK.
00:08:52.760 --> 00:08:57.280
So the next slide, which you
do have in your handout,
00:08:57.280 --> 00:09:01.080
gives you a few more details
about the class.
00:09:01.080 --> 00:09:04.540
Maybe one thing to comment here
is that you do need to
00:09:04.540 --> 00:09:06.370
read the text.
00:09:06.370 --> 00:09:09.420
And with calculus books, perhaps
you can live with
00:09:09.420 --> 00:09:12.640
just a two page summary of all
of the interesting formulas in
00:09:12.640 --> 00:09:18.050
calculus, and you can get by
just with those formulas.
00:09:18.050 --> 00:09:20.430
But here, because we want
to develop concepts and
00:09:20.430 --> 00:09:24.260
intuition, actually reading
words, as opposed to just
00:09:24.260 --> 00:09:27.430
browsing through equations,
does make a difference.
00:09:27.430 --> 00:09:30.250
In the beginning, the class
is kind of easy.
00:09:30.250 --> 00:09:32.820
When we deal with discrete
probability, that's the
00:09:32.820 --> 00:09:37.320
material until our first quiz,
and some of you may get by
00:09:37.320 --> 00:09:40.710
without being too systematic
about following the material.
00:09:40.710 --> 00:09:43.970
But it does get substantially
harder afterwards.
00:09:43.970 --> 00:09:48.110
And I would keep restating that
you do have to read the
00:09:48.110 --> 00:09:52.460
text to really understand
the material.
00:09:52.460 --> 00:09:52.980
OK.
00:09:52.980 --> 00:09:57.850
So now we can start with the
real part of the lecture.
00:09:57.850 --> 00:10:01.670
Let us set the goals
for today.
00:10:01.670 --> 00:10:05.890
So probability, or probability
theory, is a framework for
00:10:05.890 --> 00:10:09.870
dealing with uncertainty, for
dealing with situations in
00:10:09.870 --> 00:10:12.200
which we have some kind
of randomness.
00:10:12.200 --> 00:10:16.300
So what we want to do is, by the
end of today's lecture, to
00:10:16.300 --> 00:10:21.910
give you everything that you need
to know about what it takes
00:10:21.910 --> 00:10:23.970
to set up
a probabilistic model.
00:10:23.970 --> 00:10:28.390
And what are the basic rules of
the game for dealing with
00:10:28.390 --> 00:10:30.520
probabilistic models?
00:10:30.520 --> 00:10:32.780
So, by the end of this lecture,
you will have
00:10:32.780 --> 00:10:34.750
essentially recovered
half of this
00:10:34.750 --> 00:10:36.860
semester's tuition, right?
00:10:36.860 --> 00:10:39.040
So we're going to talk
about probabilistic
00:10:39.040 --> 00:10:40.820
models in more detail--
00:10:40.820 --> 00:10:43.920
the sample space, which is
basically a description of all
00:10:43.920 --> 00:10:47.410
the things that may happen
during a random experiment,
00:10:47.410 --> 00:10:50.940
and the probability law, which
describes our beliefs about
00:10:50.940 --> 00:10:53.710
which outcomes are more
likely to occur
00:10:53.710 --> 00:10:56.080
compared to other outcomes.
00:10:56.080 --> 00:10:59.130
Probability laws have to obey
certain properties that we
00:10:59.130 --> 00:11:00.640
call the axioms of
probability.
00:11:00.640 --> 00:11:04.640
So the main part of today's
lecture is to describe those
00:11:04.640 --> 00:11:09.350
axioms, which are the rules of
the game, and consider a few
00:11:09.350 --> 00:11:12.770
really trivial examples.
00:11:12.770 --> 00:11:15.370
OK, so let's start
with our agenda.
00:11:15.370 --> 00:11:18.080
The first piece in a
probabilistic model is a
00:11:18.080 --> 00:11:21.850
description of the sample
space of an experiment.
00:11:21.850 --> 00:11:27.470
So we do an experiment, and by
experiment we just mean that
00:11:27.470 --> 00:11:30.270
something happens
out there.
00:11:30.270 --> 00:11:33.300
And that something that happens,
it could be flipping
00:11:33.300 --> 00:11:39.320
a coin, or it could be rolling
a die, or it could be doing
00:11:39.320 --> 00:11:41.550
something in a card game.
00:11:41.550 --> 00:11:44.190
So we fix a particular
experiment.
00:11:44.190 --> 00:11:48.780
And we come up with a list of
all the possible things that
00:11:48.780 --> 00:11:51.090
may happen during
this experiment.
00:11:51.090 --> 00:11:54.880
So we write down a list of all
the possible outcomes.
00:11:54.880 --> 00:11:57.830
So here's a list of all the
possible outcomes of the
00:11:57.830 --> 00:11:59.050
experiment.
00:11:59.050 --> 00:12:02.730
I use the word "list," but, if
you want to be a little more
00:12:02.730 --> 00:12:06.730
formal, it's better to think
of that list as a set.
00:12:06.730 --> 00:12:08.630
So we have a set.
00:12:08.630 --> 00:12:11.000
That set is our sample space.
00:12:11.000 --> 00:12:14.840
And it's a set whose elements
are the possible outcomes of
00:12:14.840 --> 00:12:15.920
the experiment.
00:12:15.920 --> 00:12:18.530
So, for example, if you're
dealing with flipping a coin,
00:12:18.530 --> 00:12:22.380
your sample space would be
heads, this is one outcome,
00:12:22.380 --> 00:12:24.450
tails is one outcome.
00:12:24.450 --> 00:12:27.540
And this set, which has two
elements, is the sample space
00:12:27.540 --> 00:12:29.260
of the experiment.
00:12:29.260 --> 00:12:29.670
OK.
00:12:29.670 --> 00:12:33.260
What do we need to think about
when we're setting up the
00:12:33.260 --> 00:12:34.430
sample space?
00:12:34.430 --> 00:12:36.690
First, the list should be
mutually exclusive,
00:12:36.690 --> 00:12:37.830
collectively exhaustive.
00:12:37.830 --> 00:12:39.150
What does that mean?
00:12:39.150 --> 00:12:42.490
Collectively exhaustive means
that, no matter what happens
00:12:42.490 --> 00:12:45.730
in the experiment, you're
going to get one of the
00:12:45.730 --> 00:12:47.700
outcomes inside here.
00:12:47.700 --> 00:12:51.010
So you have not forgotten any
of the possibilities of what
00:12:51.010 --> 00:12:53.020
may happen in the experiment.
00:12:53.020 --> 00:12:57.720
Mutually exclusive means that
if this happens, then that
00:12:57.720 --> 00:12:58.870
cannot happen.
00:12:58.870 --> 00:13:01.580
So at the end of the experiment,
you should be able
00:13:01.580 --> 00:13:06.570
to point out to me just one,
exactly one, of these outcomes
00:13:06.570 --> 00:13:10.660
and say, this is the outcome
that happened.
00:13:10.660 --> 00:13:11.040
OK.
00:13:11.040 --> 00:13:13.690
So these are sort of
basic requirements.
00:13:13.690 --> 00:13:16.540
There's another requirement
which is a little more loose.
00:13:16.540 --> 00:13:19.150
When you set up your sample
space, sometimes you do have
00:13:19.150 --> 00:13:23.530
some freedom about the details
of how you're going to
00:13:23.530 --> 00:13:24.900
describe it.
00:13:24.900 --> 00:13:27.160
And the question is,
how much detail are
00:13:27.160 --> 00:13:28.730
you going to include?
00:13:28.730 --> 00:13:31.880
So let's take this coin flipping
experiment and think
00:13:31.880 --> 00:13:34.070
of the following sample space.
00:13:34.070 --> 00:13:37.825
One possible outcome is heads,
a second possible outcome is
00:13:37.825 --> 00:13:44.000
tails and it's raining, and the
third possible outcome is
00:13:44.000 --> 00:13:45.500
tails and it's not raining.
00:13:49.180 --> 00:13:52.760
So this is another possible
sample space for the
00:13:52.760 --> 00:13:56.910
experiment where I flip
a coin just once.
00:13:56.910 --> 00:13:58.330
It's a legitimate one.
00:13:58.330 --> 00:14:01.600
These three possibilities are
mutually exclusive and
00:14:01.600 --> 00:14:03.470
collectively exhaustive.
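NOTE A minimal sketch of the two requirements, assuming we describe each candidate outcome as a set of hypothetical fine-grained situations (coin face, weather). The helper name is_valid_sample_space and the situation labels are made up for illustration; they are not from the lecture.

```python
from itertools import product

# Hypothetical fine-grained situations: (coin face, weather outside).
situations = set(product(["heads", "tails"], ["rain", "no rain"]))

def is_valid_sample_space(outcomes, universe):
    """True if the outcomes are mutually exclusive and collectively exhaustive."""
    covered = set().union(*outcomes)
    collectively_exhaustive = covered == universe
    # No situation is counted twice exactly when the outcomes are pairwise disjoint.
    mutually_exclusive = sum(len(o) for o in outcomes) == len(covered)
    return mutually_exclusive and collectively_exhaustive

heads = {s for s in situations if s[0] == "heads"}
tails = {s for s in situations if s[0] == "tails"}
tails_rain = {("tails", "rain")}
tails_no_rain = {("tails", "no rain")}

two_outcomes = [heads, tails]
three_outcomes = [heads, tails_rain, tails_no_rain]
forgot_one = [heads, tails_rain]          # not collectively exhaustive
overlapping = [heads, tails, tails_rain]  # not mutually exclusive

for name, candidate in [("two", two_outcomes), ("three", three_outcomes),
                        ("forgot_one", forgot_one), ("overlapping", overlapping)]:
    print(name, is_valid_sample_space(candidate, situations))
```

Both the two-outcome and the three-outcome descriptions pass the check, which matches the point that either one is a legitimate sample space; which one to use is a modeling choice.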
00:14:03.470 --> 00:14:05.410
Which one is the right
sample space?
00:14:05.410 --> 00:14:08.440
Is it this one or that one?
00:14:08.440 --> 00:14:12.020
Well, if you think that my coin
flipping inside this room
00:14:12.020 --> 00:14:15.690
is completely unrelated to the
weather outside, then you're
00:14:15.690 --> 00:14:18.470
going to stick with
this sample space.
00:14:18.470 --> 00:14:22.080
If, on the other hand, you have
some superstitious belief
00:14:22.080 --> 00:14:27.180
that maybe rain has an effect
on my coins, you might work
00:14:27.180 --> 00:14:29.520
with the sample space
of this kind.
00:14:29.520 --> 00:14:33.190
So you probably wouldn't do
that, but it's a legitimate
00:14:33.190 --> 00:14:35.370
option, strictly speaking.
00:14:35.370 --> 00:14:38.900
Now this example is a little bit
on the frivolous side, but
00:14:38.900 --> 00:14:42.600
the issue that comes up here is
a basic one that shows up
00:14:42.600 --> 00:14:44.700
anywhere in science
and engineering.
00:14:44.700 --> 00:14:48.150
Whenever you're dealing with a
model or with a situation,
00:14:48.150 --> 00:14:50.645
there are zillions of details
in that situation.
00:14:50.645 --> 00:14:54.350
And when you come up with a
model, you choose some of
00:14:54.350 --> 00:14:58.220
those details that you keep in
your model, and some that you
00:14:58.220 --> 00:15:00.060
say, well, these
are irrelevant.
00:15:00.060 --> 00:15:03.780
Or maybe there are small
effects, I can neglect them,
00:15:03.780 --> 00:15:05.970
and you keep them outside
your model.
00:15:05.970 --> 00:15:09.420
So when you go to the real
world, there's definitely an
00:15:09.420 --> 00:15:12.950
element of art and some judgment
that you need to do
00:15:12.950 --> 00:15:15.930
in order to set up an
appropriate sample space.
00:15:20.270 --> 00:15:23.310
So, an easy example now.
00:15:23.310 --> 00:15:26.000
So of course, the elementary
examples are
00:15:26.000 --> 00:15:29.420
coins, cards, and dice.
00:15:29.420 --> 00:15:30.840
So let's deal with dice.
00:15:30.840 --> 00:15:34.550
But to keep the diagram small,
instead of a six-sided die,
00:15:34.550 --> 00:15:38.270
we're going to think about a
die that only has four faces.
00:15:38.270 --> 00:15:40.220
So you can do that with
a tetrahedron,
00:15:40.220 --> 00:15:41.150
doesn't really matter.
00:15:41.150 --> 00:15:44.110
Basically, it's a die that when
you roll it, you get a
00:15:44.110 --> 00:15:47.360
result which is one,
two, three or four.
00:15:47.360 --> 00:15:50.860
However, the experiment that I'm
going to think about will
00:15:50.860 --> 00:15:55.770
consist of two rolls
of a die.
00:15:55.770 --> 00:15:57.600
A crucial point here--
00:15:57.600 --> 00:16:01.580
I'm rolling the die twice, but
I'm thinking of this as just
00:16:01.580 --> 00:16:06.370
one experiment, not two
different experiments, not a
00:16:06.370 --> 00:16:10.110
repetition twice of the
same experiment.
00:16:10.110 --> 00:16:12.040
So it's one big experiment.
00:16:12.040 --> 00:16:15.190
During that big experiment
various things could happen,
00:16:15.190 --> 00:16:17.910
such as I'm rolling the
die once, and then I'm
00:16:17.910 --> 00:16:20.384
rolling the die a second time.
00:16:20.384 --> 00:16:22.450
OK.
00:16:22.450 --> 00:16:25.280
So what's the sample space
for that experiment?
00:16:25.280 --> 00:16:27.020
Well, the sample space
consists of
00:16:27.020 --> 00:16:28.700
the possible outcomes.
00:16:28.700 --> 00:16:33.220
One possible outcome is that
your first roll resulted in
00:16:33.220 --> 00:16:36.670
two and the second roll
resulted in three.
00:16:36.670 --> 00:16:40.950
In which case, the outcome that
you get is this one, a
00:16:40.950 --> 00:16:42.840
two followed by three.
00:16:42.840 --> 00:16:45.840
This is one possible outcome.
00:16:45.840 --> 00:16:49.750
The way I'm describing things,
this outcome is to be
00:16:49.750 --> 00:16:54.130
distinguished from this outcome
here, where a three is
00:16:54.130 --> 00:16:56.656
followed by two.
00:16:56.656 --> 00:17:00.500
If you're playing backgammon, it
doesn't matter which one of
00:17:00.500 --> 00:17:02.250
the two happened.
00:17:02.250 --> 00:17:05.819
But if you're dealing with a
probabilistic model that you
00:17:05.819 --> 00:17:08.530
want to keep track of everything
that happens in
00:17:08.530 --> 00:17:12.829
this composite experiment, there
are good reasons for
00:17:12.829 --> 00:17:15.859
distinguishing between
these two outcomes.
00:17:15.859 --> 00:17:18.609
I mean, when this happens,
it's definitely something
00:17:18.609 --> 00:17:20.220
different from that happening.
00:17:20.220 --> 00:17:22.900
A two followed by a three is
different from a three
00:17:22.900 --> 00:17:24.349
followed by a two.
00:17:24.349 --> 00:17:27.700
So this is the correct sample
space for this experiment
00:17:27.700 --> 00:17:29.890
where we roll the die twice.
00:17:29.890 --> 00:17:32.980
It has a total of 16 elements
and it's, of
00:17:32.980 --> 00:17:35.840
course, a finite set.
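NOTE A short sketch of this sample space, assuming each outcome is written as an ordered pair (first roll, second roll). The names faces and sample_space are illustrative.

```python
from itertools import product

faces = (1, 2, 3, 4)

# Ordered pairs: (result of first roll, result of second roll).
sample_space = set(product(faces, repeat=2))

print(len(sample_space))
# A two followed by a three and a three followed by a two are distinct outcomes.
print((2, 3) in sample_space, (3, 2) in sample_space, (2, 3) == (3, 2))
```

Using ordered tuples rather than unordered sets is what keeps the two orderings apart.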
00:17:35.840 --> 00:17:39.960
Sometimes, instead of describing
sample spaces in
00:17:39.960 --> 00:17:44.250
terms of lists, or sets, or
diagrams of this kind, it's
00:17:44.250 --> 00:17:46.930
useful to describe
the experiment in
00:17:46.930 --> 00:17:48.660
some sequential way.
00:17:48.660 --> 00:17:50.950
Whenever you have an experiment
that consists of
00:17:50.950 --> 00:17:55.790
multiple stages, it might be
useful, at least visually, to
00:17:55.790 --> 00:17:59.940
give a diagram that shows you
how those stages evolve.
00:17:59.940 --> 00:18:04.080
And that's what we do by using
a sequential description or a
00:18:04.080 --> 00:18:08.390
tree-based description by
drawing a tree of the possible
00:18:08.390 --> 00:18:11.250
evolutions during
our experiment.
00:18:11.250 --> 00:18:14.890
So in this tree, I'm thinking
of a first stage in which I
00:18:14.890 --> 00:18:18.600
roll the first die, and there
are four possible results,
00:18:18.600 --> 00:18:20.520
one, two, three and
four.
00:18:20.520 --> 00:18:24.310
And, given what happened, let's
say in the first roll,
00:18:24.310 --> 00:18:26.050
suppose I got a one.
00:18:26.050 --> 00:18:28.980
Then I'm rolling the second
die, and there are four
00:18:28.980 --> 00:18:32.060
possibilities for what may
happen to the second die.
00:18:32.060 --> 00:18:33.570
And the possible results
are one, two,
00:18:33.570 --> 00:18:36.010
three and four again.
00:18:36.010 --> 00:18:38.860
So what's the relation between
the two diagrams?
00:18:38.860 --> 00:18:42.910
Well, for example, the outcome
two followed by three
00:18:42.910 --> 00:18:46.940
corresponds to this
path on the tree.
00:18:46.940 --> 00:18:50.550
So this path corresponds to
two followed by a three.
00:18:50.550 --> 00:18:54.200
Any path is associated to a
particular outcome, any
00:18:54.200 --> 00:18:57.360
outcome is associated to
a particular path.
00:18:57.360 --> 00:19:00.370
And, instead of paths, you may
want to think in terms of the
00:19:00.370 --> 00:19:01.990
leaves of this diagram.
00:19:01.990 --> 00:19:05.740
Same thing, think of each one
of the leaves as being one
00:19:05.740 --> 00:19:07.980
possible outcome.
00:19:07.980 --> 00:19:11.160
And of course we have 16
outcomes here, we have 16
00:19:11.160 --> 00:19:12.790
outcomes here.
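NOTE A minimal sketch of the tree-based description, assuming a hypothetical generator named leaves that walks the tree one stage at a time. Each root-to-leaf path it yields is one outcome, and the set of paths matches the grid of ordered pairs.

```python
from itertools import product

def leaves(faces, stages, path=()):
    """Yield every root-to-leaf path in a tree with one level per stage."""
    if len(path) == stages:
        yield path  # a complete path is one outcome of the whole experiment
        return
    for face in faces:
        # Branch on the result of the next stage.
        yield from leaves(faces, stages, path + (face,))

faces = (1, 2, 3, 4)
paths = list(leaves(faces, 2))
grid = set(product(faces, repeat=2))

print(len(paths), set(paths) == grid, (2, 3) in paths)
```

The same function would describe an experiment with more stages by changing the stages argument.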
00:19:12.790 --> 00:19:15.920
Maybe you noticed the subtlety
that I used in my language.
00:19:15.920 --> 00:19:18.810
I said I rolled the first
die and the result
00:19:18.810 --> 00:19:20.580
that I get is a two.
00:19:20.580 --> 00:19:23.700
I didn't use the word "outcome."
I want to reserve
00:19:23.700 --> 00:19:28.960
the word "outcome" to mean the
overall outcome at the end of
00:19:28.960 --> 00:19:30.570
the overall experiment.
00:19:30.570 --> 00:19:36.300
So "2, 3" is the outcome
of the experiment.
00:19:36.300 --> 00:19:38.910
The experiment consisted
of stages.
00:19:38.910 --> 00:19:41.620
Two was the result in the first
stage, three was the
00:19:41.620 --> 00:19:43.370
result in the second stage.
00:19:43.370 --> 00:19:45.720
You put all those results
together, and
00:19:45.720 --> 00:19:47.520
you get your outcome.
00:19:47.520 --> 00:19:53.550
OK, perhaps we are splitting
hairs here, but it's useful to
00:19:53.550 --> 00:19:56.470
keep the concepts right.
00:19:56.470 --> 00:19:59.780
What's special about this
example is that, besides being
00:19:59.780 --> 00:20:03.230
trivial, it has a sample
space which is finite.
00:20:03.230 --> 00:20:06.000
There's 16 possible
total outcomes.
00:20:06.000 --> 00:20:09.210
Not every experiment has
a finite sample space.
00:20:09.210 --> 00:20:12.840
Here's an experiment in which
the sample space is infinite.
00:20:12.840 --> 00:20:17.690
So you are playing darts and
the target is this square.
00:20:17.690 --> 00:20:21.740
And you're perfect at that game,
so you're sure that your
00:20:21.740 --> 00:20:26.010
darts will always fall
inside the square.
00:20:26.010 --> 00:20:29.130
But exactly where your dart
would fall inside that
00:20:29.130 --> 00:20:31.180
square, that itself is random.
00:20:31.180 --> 00:20:32.880
We don't know what
it's going to be.
00:20:32.880 --> 00:20:34.300
It's uncertain.
00:20:34.300 --> 00:20:38.090
So all the possible points
inside the square are possible
00:20:38.090 --> 00:20:39.710
outcomes of the experiment.
00:20:39.710 --> 00:20:43.060
So a typical outcome of the
experiment is going to be a pair
00:20:43.060 --> 00:20:46.490
of numbers, x,y, where x
and y are real numbers
00:20:46.490 --> 00:20:48.280
between zero and one.
00:20:48.280 --> 00:20:51.390
Now there's infinitely many
real numbers, there's
00:20:51.390 --> 00:20:55.270
infinitely many points in the
square, so this is an example
00:20:55.270 --> 00:20:58.740
in which our sample space
is an infinite set.
00:21:01.670 --> 00:21:06.910
OK, so we're going to revisit
this example a little later.
00:21:06.910 --> 00:21:11.790
So these are two examples of
what the sample space might be
00:21:11.790 --> 00:21:13.730
in simple experiments.
00:21:13.730 --> 00:21:18.240
Now, the more important order of
business is to look at
00:21:18.240 --> 00:21:21.800
those possible outcomes and to
make some statements about
00:21:21.800 --> 00:21:23.910
their relative likelihoods.
00:21:23.910 --> 00:21:26.780
Which outcome is more
likely to occur
00:21:26.780 --> 00:21:29.060
compared to the others?
00:21:29.060 --> 00:21:32.510
And the way we do this
is by assigning
00:21:32.510 --> 00:21:36.210
probabilities to the outcomes.
00:21:36.210 --> 00:21:38.590
Well, not exactly.
00:21:38.590 --> 00:21:42.440
Suppose that all you were to do
was to assign probabilities
00:21:42.440 --> 00:21:44.320
to individual outcomes.
00:21:44.320 --> 00:21:49.200
If you go back to this example,
and you consider one
00:21:49.200 --> 00:21:52.250
particular outcome-- let's
say this point--
00:21:52.250 --> 00:21:55.620
what would be the probability
that you hit exactly this
00:21:55.620 --> 00:21:58.640
point to infinite precision?
00:21:58.640 --> 00:22:01.070
Intuitively, that probability
would be zero.
00:22:01.070 --> 00:22:05.630
So any individual point in this
diagram in any reasonable
00:22:05.630 --> 00:22:08.520
model should have zero
probability.
00:22:08.520 --> 00:22:11.870
So if you just tell me that
any individual outcome has
00:22:11.870 --> 00:22:14.440
zero probability, you're
not really telling me
00:22:14.440 --> 00:22:17.030
much to work with.
00:22:17.030 --> 00:22:20.910
For that reason, what instead
we're going to do is to assign
00:22:20.910 --> 00:22:25.150
probabilities to subsets of the
sample space, as opposed
00:22:25.150 --> 00:22:29.170
to assigning probabilities
to individual outcomes.
00:22:29.170 --> 00:22:32.410
So here's the picture.
00:22:32.410 --> 00:22:36.890
We have our sample space,
which is omega, and we
00:22:36.890 --> 00:22:39.690
consider some subset of
the sample space.
00:22:39.690 --> 00:22:45.820
Call it A. And I want to assign
a number, a numerical
00:22:45.820 --> 00:22:50.720
probability, to this particular
subset which
00:22:50.720 --> 00:22:56.950
represents my belief about how
likely this set is to occur.
00:22:56.950 --> 00:22:57.340
OK.
00:22:57.340 --> 00:23:01.250
What do we mean "to occur?"
And I'm introducing here a
00:23:01.250 --> 00:23:03.770
language that's being used
in probability theory.
00:23:03.770 --> 00:23:07.410
When we talk about subsets of
the sample space, we usually
00:23:07.410 --> 00:23:10.470
call them events, as
opposed to subsets.
00:23:10.470 --> 00:23:14.480
And the reason is because it
works nicely with the language
00:23:14.480 --> 00:23:16.710
that describes what's
going on.
00:23:16.710 --> 00:23:19.010
So the outcome is a point.
00:23:19.010 --> 00:23:20.540
The outcome is random.
00:23:20.540 --> 00:23:26.800
The outcome may be inside this
set, in which case we say that
00:23:26.800 --> 00:23:31.270
event A occurred, if we get
an outcome inside here.
00:23:31.270 --> 00:23:35.120
Or the outcome may fall outside
the set, in which case
00:23:35.120 --> 00:23:38.530
we say that event
A did not occur.
00:23:38.530 --> 00:23:42.310
So we're going to assign
probabilities to events.
00:23:42.310 --> 00:23:45.630
And now, how should we
do this assignment?
00:23:45.630 --> 00:23:49.180
Well, probabilities are meant to
describe your beliefs about
00:23:49.180 --> 00:23:52.880
which sets are more likely to
occur versus other sets.
00:23:52.880 --> 00:23:55.050
So there's many ways that
you can assign those
00:23:55.050 --> 00:23:56.080
probabilities.
00:23:56.080 --> 00:23:59.290
But there are some ground
rules for this game.
00:23:59.290 --> 00:24:02.990
First, we want probabilities to
be numbers between zero and
00:24:02.990 --> 00:24:06.740
one because that's the
usual convention.
00:24:06.740 --> 00:24:09.840
So a probability of zero means
we're certain that something
00:24:09.840 --> 00:24:10.820
is not going to happen.
00:24:10.820 --> 00:24:13.570
Probability of one means that
we're essentially certain that
00:24:13.570 --> 00:24:14.870
something's going to happen.
00:24:14.870 --> 00:24:17.450
So we want numbers between
zero and one.
00:24:17.450 --> 00:24:19.740
We also want a few
other things.
00:24:19.740 --> 00:24:23.200
And those few other things are
going to be encapsulated in a
00:24:23.200 --> 00:24:25.060
set of axioms.
00:24:25.060 --> 00:24:29.030
What "axioms" means in this
context, it's the ground rules
00:24:29.030 --> 00:24:31.300
that any legitimate
probabilistic
00:24:31.300 --> 00:24:33.410
model should obey.
00:24:33.410 --> 00:24:37.080
You have a choice of what kind
of probabilities you use.
00:24:37.080 --> 00:24:40.900
But, no matter what you use,
they should obey certain
00:24:40.900 --> 00:24:44.740
consistency properties because
if they obey those properties,
00:24:44.740 --> 00:24:47.640
then you can go ahead and do
useful calculations and do
00:24:47.640 --> 00:24:49.360
some useful reasoning.
00:24:49.360 --> 00:24:51.010
So what are these properties?
00:24:51.010 --> 00:24:55.060
First, probabilities should
be non-negative.
00:24:55.060 --> 00:24:56.590
OK?
00:24:56.590 --> 00:24:57.530
That's our convention.
00:24:57.530 --> 00:25:00.350
We want probabilities to be
numbers between zero and one.
00:25:00.350 --> 00:25:02.130
So they should certainly
be non-negative.
00:25:02.130 --> 00:25:04.600
The probability that event
A occurs should be a
00:25:04.600 --> 00:25:06.135
non-negative number.
00:25:06.135 --> 00:25:08.110
What's the second axiom?
00:25:08.110 --> 00:25:13.760
The probability of the entire
sample space is equal to one.
00:25:13.760 --> 00:25:15.590
Why does this make sense?
00:25:15.590 --> 00:25:20.120
Well, the outcome is certain to
be an element of the sample
00:25:20.120 --> 00:25:23.140
space because we set up a
sample space, which is
00:25:23.140 --> 00:25:24.660
collectively exhaustive.
00:25:24.660 --> 00:25:28.590
No matter what the outcome is,
it's going to be an element of
00:25:28.590 --> 00:25:29.350
the sample space.
00:25:29.350 --> 00:25:33.710
We're certain that event omega
is going to occur.
00:25:33.710 --> 00:25:37.470
Therefore, we represent this
certainty by saying that the
00:25:37.470 --> 00:25:41.520
probability of omega
is equal to one.
00:25:41.520 --> 00:25:47.180
Pretty straightforward so far.
00:25:47.180 --> 00:25:52.240
The more interesting axiom
is the third rule.
00:25:52.240 --> 00:25:55.580
Before getting into it,
just a quick reminder.
00:25:55.580 --> 00:26:01.950
If you have two sets, A and B,
the intersection of A and B
00:26:01.950 --> 00:26:07.220
consists of those elements that
belong both to A and B.
00:26:07.220 --> 00:26:09.580
And we denote it this way.
00:26:09.580 --> 00:26:11.510
When you think
probabilistically, the way to
00:26:11.510 --> 00:26:15.530
think of intersection is by
using the word "and." This
00:26:15.530 --> 00:26:21.040
event, this intersection, is the
event that A occurred and
00:26:21.040 --> 00:26:22.450
B occurred.
00:26:22.450 --> 00:26:26.060
If I get an outcome inside here,
A has occurred and B has
00:26:26.060 --> 00:26:27.950
occurred at the same time.
00:26:27.950 --> 00:26:31.150
So you may find the word "and"
to be a little more convenient
00:26:31.150 --> 00:26:33.680
than the word "intersection."
00:26:33.680 --> 00:26:37.360
And similarly, we have some
notation for the union of two
00:26:37.360 --> 00:26:42.280
events, which we
write this way.
00:26:42.280 --> 00:26:46.250
The union of two sets, or two
events, is the collection of
00:26:46.250 --> 00:26:49.370
all the elements that belong
either to the first set, or to
00:26:49.370 --> 00:26:51.400
the second, or to both.
00:26:51.400 --> 00:26:55.220
When you talk about events, you
can use the word "or." So
00:26:55.220 --> 00:26:59.990
this is the event that A
occurred or B occurred.
00:26:59.990 --> 00:27:03.350
And this "or" means that it
could also be that both of
00:27:03.350 --> 00:27:04.600
them occurred.
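A minimal Python sketch of the "and"/"or" reading of intersection and union. The sample space and the events A and B below are made-up illustrations, not taken from the lecture slides.

```python
# Hypothetical sample space: the outcomes of one roll of a four-sided die.
omega = {1, 2, 3, 4}

A = {1, 2}  # event "the roll is at most 2"
B = {2, 3}  # event "the roll is 2 or 3"

a_and_b = A & B  # intersection: outcomes where A occurred and B occurred
a_or_b = A | B   # union: outcomes where A occurred or B occurred (or both)

def occurred(event, outcome):
    """An event occurs when the observed outcome lies inside it."""
    return outcome in event

print(a_and_b, a_or_b)
```

Here an outcome of 2 makes both A and B occur, so it belongs to the intersection as well as to the union.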
00:27:08.890 --> 00:27:09.150
OK.
00:27:09.150 --> 00:27:11.280
So now that we have this
notation, what does
00:27:11.280 --> 00:27:13.835
the third axiom say?
00:27:13.835 --> 00:27:19.830
The third axiom says that if we
have two events, A and B,
00:27:19.830 --> 00:27:23.140
that have no common elements--
00:27:23.140 --> 00:27:29.330
so here's A, here's B,
and perhaps this is
00:27:29.330 --> 00:27:31.140
our big sample space.
00:27:31.140 --> 00:27:33.470
The two events have no
common elements.
00:27:33.470 --> 00:27:36.510
So the intersection of the two
events is the empty set.
00:27:36.510 --> 00:27:38.930
There's nothing in their
intersection.
00:27:38.930 --> 00:27:43.190
Then, the total probability of
A together with B has to be
00:27:43.190 --> 00:27:46.600
equal to the sum of the
individual probabilities.
00:27:46.600 --> 00:27:50.510
So the probability that A occurs
or B occurs is equal to
00:27:50.510 --> 00:27:52.390
the probability that
A occurs plus the
00:27:52.390 --> 00:27:55.040
probability that B occurs.
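For reference, the three axioms described so far can be written compactly as follows. The labels and the notation (bold P for the probability law, Omega for the sample space) are conventional choices assumed here, since the transcript does not show the slide.

```latex
\begin{align*}
&\text{(Nonnegativity)} && \mathbf{P}(A) \ge 0 \\
&\text{(Normalization)} && \mathbf{P}(\Omega) = 1 \\
&\text{(Additivity)}    && A \cap B = \emptyset \;\Longrightarrow\;
   \mathbf{P}(A \cup B) = \mathbf{P}(A) + \mathbf{P}(B)
\end{align*}
```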
00:27:55.040 --> 00:27:58.860
So think of probability
as being cream cheese.
00:27:58.860 --> 00:28:03.020
You have one pound of cream
cheese, the total probability
00:28:03.020 --> 00:28:05.340
assigned to the entire
sample space.
00:28:05.340 --> 00:28:12.780
And that cream cheese is spread
out over this set.
00:28:12.780 --> 00:28:16.380
The probability of A is how much
cream cheese sits on top
00:28:16.380 --> 00:28:20.320
of A. Probability of B is how
much sits on top of B. The
00:28:20.320 --> 00:28:25.370
probability of A union B is
the total amount of cream
00:28:25.370 --> 00:28:29.650
cheese sitting on top of this
and that, which is obviously
00:28:29.650 --> 00:28:31.880
the sum of how much is
sitting here and how
00:28:31.880 --> 00:28:33.220
much is sitting there.
00:28:33.220 --> 00:28:36.110
So probabilities behave
like cream cheese, or
00:28:36.110 --> 00:28:38.450
they behave like mass.
00:28:38.450 --> 00:28:48.280
For example, if you think of
some material object, the mass
00:28:48.280 --> 00:28:51.800
of this set consisting of two
pieces is obviously the sum of
00:28:51.800 --> 00:28:53.120
the two masses.
00:28:53.120 --> 00:28:55.680
So this property is a
very intuitive one.
00:28:55.680 --> 00:28:58.282
It's a pretty natural
one to have.
00:28:58.282 --> 00:29:00.640
OK.
00:29:00.640 --> 00:29:03.880
Are these axioms enough for
what we want to do?
00:29:03.880 --> 00:29:07.670
I mentioned a while ago that
we want probabilities to be
00:29:07.670 --> 00:29:10.110
numbers between zero and one.
00:29:10.110 --> 00:29:12.400
Here's an axiom that tells you
that probabilities are
00:29:12.400 --> 00:29:13.710
non-negative.
00:29:13.710 --> 00:29:17.280
Should we have another axiom
that tells us that
00:29:17.280 --> 00:29:21.670
probabilities are less
than or equal to one?
00:29:21.670 --> 00:29:23.150
It's a desirable property.
00:29:23.150 --> 00:29:26.090
We would like to have
it in our hands.
00:29:26.090 --> 00:29:29.030
OK, why is it not
in that list?
00:29:29.030 --> 00:29:32.850
Well, the people who are in the
axiom making business are
00:29:32.850 --> 00:29:35.060
mathematicians and
mathematicians tend to be
00:29:35.060 --> 00:29:36.390
pretty laconic.
00:29:36.390 --> 00:29:40.020
You don't say something if
you don't have to say it.
00:29:40.020 --> 00:29:42.580
And this is the case here.
00:29:42.580 --> 00:29:46.660
We don't need that extra axiom
because we can derive it from
00:29:46.660 --> 00:29:48.440
the existing axioms.
00:29:48.440 --> 00:29:50.590
Here's how it goes.
00:29:50.590 --> 00:29:55.180
One is the probability of
the entire sample space.
00:29:55.180 --> 00:29:57.450
Here we're using the
second axiom.
00:30:00.310 --> 00:30:06.070
Now the sample space consists
of A together with the
00:30:06.070 --> 00:30:07.680
complement of A. OK?
00:30:11.200 --> 00:30:14.470
When I write the complement of
A, I mean the complement of A
00:30:14.470 --> 00:30:16.800
inside of the set omega.
00:30:16.800 --> 00:30:21.700
So we have omega, here's A,
here's the complement of A,
00:30:21.700 --> 00:30:24.660
and the overall set is omega.
00:30:24.660 --> 00:30:25.350
OK.
00:30:25.350 --> 00:30:27.520
Now, what's the next step?
00:30:27.520 --> 00:30:28.650
What should I do next?
00:30:28.650 --> 00:30:31.320
Which axiom should I use?
00:30:31.320 --> 00:30:35.350
We use axiom three because a set
and the complement of that
00:30:35.350 --> 00:30:36.730
set are disjoint.
00:30:36.730 --> 00:30:38.770
They don't have any
common elements.
00:30:38.770 --> 00:30:44.050
So axiom three applies and
tells me that this is the
00:30:44.050 --> 00:30:48.150
probability of A plus the
probability of A complement.
00:30:48.150 --> 00:30:53.970
In particular, the probability
of A is equal to one minus the
00:30:53.970 --> 00:30:58.370
probability of A complement,
and this is less
00:30:58.370 --> 00:31:00.540
than or equal to one.
00:31:00.540 --> 00:31:01.790
Why?
00:31:03.430 --> 00:31:06.670
Because probabilities
are non-negative,
00:31:06.670 --> 00:31:10.020
by the first axiom.
00:31:10.020 --> 00:31:10.310
OK.
00:31:10.310 --> 00:31:12.440
So we got the conclusion
that we wanted.
00:31:12.440 --> 00:31:16.130
Probabilities are always less
than or equal to one, and this
00:31:16.130 --> 00:31:20.230
is a simple consequence of the
three axioms that we have.
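Written out as a chain of equalities, the argument just given looks like this. The superscript c for the complement is notation assumed here.

```latex
\begin{align*}
1 &= \mathbf{P}(\Omega)                       && \text{(second axiom)} \\
  &= \mathbf{P}(A \cup A^{c})                 && \text{(}\Omega = A \cup A^{c}\text{)} \\
  &= \mathbf{P}(A) + \mathbf{P}(A^{c})        && \text{(third axiom; } A \cap A^{c} = \emptyset\text{)} \\
\Longrightarrow\quad
\mathbf{P}(A) &= 1 - \mathbf{P}(A^{c}) \le 1  && \text{(first axiom: } \mathbf{P}(A^{c}) \ge 0\text{)}
\end{align*}
```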
00:31:20.230 --> 00:31:24.780
This is a really nice argument
because it actually uses each
00:31:24.780 --> 00:31:26.560
one of those axioms.
00:31:26.560 --> 00:31:29.060
The argument is simple, but you
have to use all of these
00:31:29.060 --> 00:31:33.050
three properties to get the
conclusion that you want.
00:31:33.050 --> 00:31:33.720
OK.
00:31:33.720 --> 00:31:37.140
So we can get interesting things
out of our axioms.
00:31:37.140 --> 00:31:40.050
Can we get some more
interesting ones?
00:31:40.050 --> 00:31:44.540
How about the union
of three sets?
00:31:44.540 --> 00:31:47.000
What kind of probability
should it have?
00:31:47.000 --> 00:31:52.870
So here's an event consisting
of three pieces.
00:31:52.870 --> 00:31:56.230
And I want to say something
about the probability of A
00:31:56.230 --> 00:32:01.780
union B union C. What I would
like to say is that this
00:32:01.780 --> 00:32:05.680
probability is equal to the sum
of the three individual
00:32:05.680 --> 00:32:07.140
probabilities.
00:32:07.140 --> 00:32:08.860
How can I do it?
00:32:08.860 --> 00:32:11.080
I have an axiom that
tells me that I can
00:32:11.080 --> 00:32:12.760
do it for two events.
00:32:12.760 --> 00:32:15.370
I don't have an axiom
for three events.
00:32:15.370 --> 00:32:19.210
Well, maybe I can manage things
and still be able to
00:32:19.210 --> 00:32:20.620
use that axiom.
00:32:20.620 --> 00:32:22.700
And here's the trick.
00:32:22.700 --> 00:32:28.000
The union of three sets, you can
think of it as forming the
00:32:28.000 --> 00:32:32.560
union of the first two sets and
then taking the union with
00:32:32.560 --> 00:32:35.670
the third set.
00:32:35.670 --> 00:32:36.530
OK?
00:32:36.530 --> 00:32:39.150
So taking unions, you can
take the unions in any
00:32:39.150 --> 00:32:40.440
order that you want.
00:32:40.440 --> 00:32:44.580
So here we have the
union of two sets.
00:32:44.580 --> 00:32:49.630
Now, A, B, and C are disjoint,
by assumption or
00:32:49.630 --> 00:32:51.780
that's how I drew it.
00:32:51.780 --> 00:32:55.950
So if A, B, and C are disjoint,
then A union B is
00:32:55.950 --> 00:32:59.790
disjoint from C. So here
we have the union of
00:32:59.790 --> 00:33:01.400
two disjoint sets.
00:33:01.400 --> 00:33:05.380
So by the additivity axiom, the
probability of that
00:33:05.380 --> 00:33:08.960
union is going to be the
probability of the first set
00:33:08.960 --> 00:33:12.000
plus the probability
of the second set.
00:33:12.000 --> 00:33:15.950
And now I can use the additivity
axiom once more to
00:33:15.950 --> 00:33:20.330
write that this is probability
of A plus probability of B
00:33:20.330 --> 00:33:25.220
plus probability of C. So by
using this axiom which was
00:33:25.220 --> 00:33:28.940
stated for two sets, we can
actually derive a similar
00:33:28.940 --> 00:33:32.450
property for the union of
three disjoint sets.
00:33:32.450 --> 00:33:34.640
And then you can repeat
this argument as many
00:33:34.640 --> 00:33:35.940
times as you want.
00:33:35.940 --> 00:33:39.050
It's valid for the union of
ten disjoint sets, for the
00:33:39.050 --> 00:33:42.830
union of a hundred disjoint
sets, for the union of any
00:33:42.830 --> 00:33:44.910
finite number of sets.
00:33:44.910 --> 00:33:53.210
So if A1 up to An are disjoint,
then the probability
00:33:53.210 --> 00:33:59.490
of A1 union ... union An is equal to the
sum of the probabilities of
00:33:59.490 --> 00:34:01.500
the individual sets.
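The two applications of the additivity axiom, and the finite generalization, can be summarized as below. This is a sketch of the spoken argument, assuming the events are pairwise disjoint.

```latex
\begin{align*}
\mathbf{P}(A \cup B \cup C)
  &= \mathbf{P}\big((A \cup B) \cup C\big) \\
  &= \mathbf{P}(A \cup B) + \mathbf{P}(C)
     && \text{(} A \cup B \text{ and } C \text{ are disjoint)} \\
  &= \mathbf{P}(A) + \mathbf{P}(B) + \mathbf{P}(C)
     && \text{(} A \text{ and } B \text{ are disjoint)} \\[4pt]
\mathbf{P}(A_1 \cup \cdots \cup A_n)
  &= \sum_{i=1}^{n} \mathbf{P}(A_i)
     && \text{(} A_1, \dots, A_n \text{ pairwise disjoint)}
\end{align*}
```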
00:34:04.180 --> 00:34:05.740
OK.
00:34:05.740 --> 00:34:08.710
Special case of this
is when we're
00:34:08.710 --> 00:34:10.790
dealing with finite sets.
00:34:10.790 --> 00:34:14.300
Suppose I have just a finite
set of outcomes.
00:34:14.300 --> 00:34:17.880
I put them together in a set
and I'm interested in the
00:34:17.880 --> 00:34:19.630
probability of that set.
00:34:19.630 --> 00:34:22.050
So here's our sample space.
00:34:22.050 --> 00:34:26.840
There's lots of outcomes, but
I'm taking a few of these and
00:34:26.840 --> 00:34:30.120
I form a set out of them.
00:34:30.120 --> 00:34:32.920
This is a set consisting
of, in this
00:34:32.920 --> 00:34:34.760
picture, three elements.
00:34:34.760 --> 00:34:38.260
In general, it consists
of k elements.
00:34:38.260 --> 00:34:43.650
Now, a finite set, I can write
it as a union of single
00:34:43.650 --> 00:34:44.889
element sets.
00:34:44.889 --> 00:34:49.080
So this set here is the union
of this one element set,
00:34:49.080 --> 00:34:52.800
together with this one element
set together with that one
00:34:52.800 --> 00:34:53.980
element set.
00:34:53.980 --> 00:34:56.770
So the total probability of this
set is going to be the
00:34:56.770 --> 00:35:02.510
sum of the probabilities of
the one element sets.
00:35:02.510 --> 00:35:08.030
Now, probability of a one
element set, you need to use
00:35:08.030 --> 00:35:10.010
the brackets here because
probabilities
00:35:10.010 --> 00:35:12.260
are assigned to sets.
00:35:12.260 --> 00:35:16.190
But this gets kind of tedious,
so here one abuses notation a
00:35:16.190 --> 00:35:19.920
little bit and we get rid of
those brackets and just write
00:35:19.920 --> 00:35:24.030
probability of this single,
individual outcome.
00:35:24.030 --> 00:35:28.510
In any case, conclusion from
this exercise is that the
00:35:28.510 --> 00:35:33.410
total probability of a finite
collection of possible
00:35:33.410 --> 00:35:37.070
outcomes, the total probability
is equal to the
00:35:37.070 --> 00:35:42.190
sum of the probabilities
of individual elements.
00:35:42.190 --> 00:35:46.460
So these are basically the
axioms of probability theory.
00:35:46.460 --> 00:35:49.970
Or, well, they're almost
the axioms.
00:35:49.970 --> 00:35:53.060
There are some subtleties
that are involved here.
00:35:53.060 --> 00:35:58.650
One subtlety is that this axiom
here doesn't quite do
00:35:58.650 --> 00:36:01.340
the job for everything
we would like to do.
00:36:01.340 --> 00:36:03.030
And we're going to come
back to this at
00:36:03.030 --> 00:36:05.080
the end of the lecture.
00:36:05.080 --> 00:36:10.380
A second subtlety has to
do with weird sets.
00:36:10.380 --> 00:36:13.570
We said that an event is a
subset of the sample space and
00:36:13.570 --> 00:36:16.712
we assign probabilities
to events.
00:36:16.712 --> 00:36:19.990
Does this mean that we are going
to assign probability to
00:36:19.990 --> 00:36:23.500
every possible subset
of the sample space?
00:36:23.500 --> 00:36:26.660
Ideally, we would
wish to do that.
00:36:26.660 --> 00:36:29.580
Unfortunately, this is
not always possible.
00:36:29.580 --> 00:36:35.010
If you take a sample space, such
as the square, the square
00:36:35.010 --> 00:36:38.560
has nice subsets, those that you
can describe by cutting it
00:36:38.560 --> 00:36:40.220
with lines and so on.
00:36:40.220 --> 00:36:45.540
But it does have some very ugly
subsets, as well, that
00:36:45.540 --> 00:36:48.870
are impossible to visualize,
impossible to imagine, but
00:36:48.870 --> 00:36:50.030
they do exist.
00:36:50.030 --> 00:36:53.710
And those very weird sets are
such that there's no way to
00:36:53.710 --> 00:36:56.750
assign probabilities to them
in a way that's consistent
00:36:56.750 --> 00:36:58.630
with the axioms of
probability.
00:36:58.630 --> 00:36:59.000
OK.
00:36:59.000 --> 00:37:02.960
So this is a very, very fine
point that you can immediately
00:37:02.960 --> 00:37:05.940
forget for the rest
of this class.
00:37:05.940 --> 00:37:09.350
You will only encounter these
sets if you end up doing
00:37:09.350 --> 00:37:12.450
doctoral work on the theoretical
aspects of
00:37:12.450 --> 00:37:15.910
probability theory.
00:37:15.910 --> 00:37:19.570
So it's just a mathematical
subtlety that some very weird
00:37:19.570 --> 00:37:22.560
sets do not have probabilities
assigned to them.
00:37:22.560 --> 00:37:25.110
But we're not going to encounter
these sets and they
00:37:25.110 --> 00:37:26.885
do not show up in any
applications.
00:37:29.520 --> 00:37:29.840
OK.
00:37:29.840 --> 00:37:32.410
So now let's revisit
our examples.
00:37:32.410 --> 00:37:34.800
Let's go back to the
die example.
00:37:34.800 --> 00:37:36.950
We have our sample space.
00:37:36.950 --> 00:37:40.830
Now we need to assign
a probability law.
00:37:40.830 --> 00:37:43.260
There's lots of possible
probability laws
00:37:43.260 --> 00:37:44.690
that you can assign.
00:37:44.690 --> 00:37:49.060
I'm picking one here,
arbitrarily, in which I say
00:37:49.060 --> 00:37:51.320
that every possible outcome
has the same
00:37:51.320 --> 00:37:55.440
probability of 1/16.
00:37:55.440 --> 00:37:56.040
OK.
00:37:56.040 --> 00:37:58.010
Why do I make this model?
00:37:58.010 --> 00:38:02.340
Well, empirically, if you have
well-manufactured dice, they
00:38:02.340 --> 00:38:04.540
tend to behave that way.
00:38:04.540 --> 00:38:06.870
We will be coming back
to this kind of story
00:38:06.870 --> 00:38:08.500
later in this class.
00:38:08.500 --> 00:38:13.040
But I'm not saying that this
is the only probability law
00:38:13.040 --> 00:38:13.720
that there can be.
00:38:13.720 --> 00:38:17.460
You might have weird dice in
which certain outcomes are
00:38:17.460 --> 00:38:19.280
more likely than others.
00:38:19.280 --> 00:38:21.850
But to keep things simple, let's
take every outcome to
00:38:21.850 --> 00:38:24.870
have the same probability
of 1/16.
00:38:24.870 --> 00:38:26.790
OK.
00:38:26.790 --> 00:38:29.340
Now that we have in our hands
a sample space and the
00:38:29.340 --> 00:38:31.990
probability law, we can
actually solve any
00:38:31.990 --> 00:38:33.250
problem there is.
00:38:33.250 --> 00:38:36.070
We can answer any question that
could be posed to us.
00:38:36.070 --> 00:38:39.320
For example, what's the
probability that the outcome,
00:38:39.320 --> 00:38:43.590
which is this pair, is
either (1,1) or (1,2)?
00:38:43.590 --> 00:38:50.160
We're talking here about this
particular event, (1,1) or (1,2).
00:38:50.160 --> 00:38:53.300
So it's an event consisting
of these two items.
00:38:53.300 --> 00:38:56.640
According to what we were just
discussing, the probability of
00:38:56.640 --> 00:38:59.540
a finite collection of outcomes
is the sum of their
00:38:59.540 --> 00:39:01.170
individual probabilities.
00:39:01.170 --> 00:39:04.190
Each one of them has probability
of 1/16, so the
00:39:04.190 --> 00:39:07.720
probability of this is 2/16.
00:39:07.720 --> 00:39:11.910
How about the probability of the
event that x is equal to
00:39:11.910 --> 00:39:14.960
one? x is the first roll, so
that's the probability that
00:39:14.960 --> 00:39:18.120
the first roll is
equal to one.
00:39:18.120 --> 00:39:22.340
Notice the syntax that's
being used here.
00:39:22.340 --> 00:39:26.880
Probabilities are assigned to
subsets, to sets, so we think
00:39:26.880 --> 00:39:32.500
of this as meaning the set of
all outcomes such that x is
00:39:32.500 --> 00:39:33.660
equal to one.
00:39:33.660 --> 00:39:35.210
How do you answer
this question?
00:39:35.210 --> 00:39:38.370
You go back to the picture and
you try to visualize or
00:39:38.370 --> 00:39:40.810
identify this event
of interest.
00:39:40.810 --> 00:39:45.570
x is equal to one corresponds
to this event here.
00:39:45.570 --> 00:39:48.950
These are all the outcomes at
which x is equal to one.
00:39:48.950 --> 00:39:50.100
There's four outcomes.
00:39:50.100 --> 00:39:54.180
Each one has probability 1/16,
so the answer is 4/16.
00:39:56.760 --> 00:39:57.820
OK.
00:39:57.820 --> 00:40:06.482
How about the probability
that x plus y is odd?
00:40:06.482 --> 00:40:07.100
OK.
00:40:07.100 --> 00:40:09.840
That will take a little
bit more work.
00:40:09.840 --> 00:40:12.910
But you go to the sample space
and you identify all the
00:40:12.910 --> 00:40:16.010
outcomes at which the sum
is an odd number.
00:40:16.010 --> 00:40:20.930
So that's a place where the sum
is odd, these are other
00:40:20.930 --> 00:40:27.570
places, and I guess that
exhausts all the possible
00:40:27.570 --> 00:40:31.780
outcomes at which we
have an odd sum.
00:40:31.780 --> 00:40:32.890
We count them.
00:40:32.890 --> 00:40:34.030
How many are there?
00:40:34.030 --> 00:40:35.540
There's a total of
eight of them.
00:40:35.540 --> 00:40:40.490
Each one has probability 1/16,
total probability is 8/16.
00:40:40.490 --> 00:40:41.620
And a harder question.
00:40:41.620 --> 00:40:44.310
What is the probability that the
minimum of the two rolls
00:40:44.310 --> 00:40:45.820
is equal to 2?
00:40:45.820 --> 00:40:48.710
This is something that you
probably couldn't do in your
00:40:48.710 --> 00:40:51.640
head without the help
of a diagram.
00:40:51.640 --> 00:40:54.780
But once you have a diagram,
things are simple.
00:40:54.780 --> 00:40:55.760
You ask the question.
00:40:55.760 --> 00:40:59.710
OK, this is an event, that the
minimum of the two rolls is
00:40:59.710 --> 00:41:01.140
equal to two.
00:41:01.140 --> 00:41:03.150
This can happen in
several ways.
00:41:03.150 --> 00:41:05.250
What are the several ways
that it can happen?
00:41:05.250 --> 00:41:07.980
Go to the diagram and try
to identify them.
00:41:07.980 --> 00:41:11.620
So the minimum is equal to two
if both of them are twos.
00:41:14.230 --> 00:41:18.780
Or it could be that x is two and
y is bigger, or y is two
00:41:18.780 --> 00:41:21.900
and x is bigger.
00:41:21.900 --> 00:41:23.150
OK.
00:41:23.150 --> 00:41:29.210
I guess we rediscover that
yellow and blue make green, so
00:41:29.210 --> 00:41:31.910
we see here that there's
a total of
00:41:31.910 --> 00:41:34.630
five possible outcomes.
00:41:34.630 --> 00:41:37.645
The probability of this
event is 5/16.
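The four answers above can be checked by brute-force enumeration. This is a small Python sketch, assuming each roll takes the values 1 through 4 (which is what the 1/16 per outcome implies); the helper names `prob` and `where` are made up for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs (x, y) of two rolls, each from 1 to 4.
omega = set(product(range(1, 5), repeat=2))

def prob(event):
    """Uniform law: each of the 16 outcomes has probability 1/16."""
    return Fraction(len(event & omega), len(omega))

def where(condition):
    """The event (subset of omega) of outcomes satisfying a condition."""
    return {(x, y) for (x, y) in omega if condition(x, y)}

p_pair = prob({(1, 1), (1, 2)})
p_first_is_one = prob(where(lambda x, y: x == 1))
p_sum_odd = prob(where(lambda x, y: (x + y) % 2 == 1))
p_min_is_two = prob(where(lambda x, y: min(x, y) == 2))

print(p_pair, p_first_is_one, p_sum_odd, p_min_is_two)
```

Note that `Fraction` reduces to lowest terms, so 2/16, 4/16, and 8/16 print as 1/8, 1/4, and 1/2.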
00:41:41.250 --> 00:41:47.460
Simple example, but the
procedure that we followed in
00:41:47.460 --> 00:41:52.490
this example actually applies
to any probability model you
00:41:52.490 --> 00:41:54.240
might ever encounter.
00:41:54.240 --> 00:41:57.720
You set up your sample space,
you make a statement that
00:41:57.720 --> 00:42:00.710
describes the probability law
over that sample space, then
00:42:00.710 --> 00:42:03.640
somebody asks you questions
about various events.
00:42:03.640 --> 00:42:07.300
You go to your pictures,
identify those events, pin
00:42:07.300 --> 00:42:11.410
them down, and then start kind
of counting and calculating
00:42:11.410 --> 00:42:14.370
the total probability for those
outcomes that you're
00:42:14.370 --> 00:42:16.560
considering.
00:42:16.560 --> 00:42:20.180
This example is a special case
of what is called the discrete
00:42:20.180 --> 00:42:22.780
uniform law.
00:42:22.780 --> 00:42:26.500
The model obeys the discrete
uniform law if all outcomes
00:42:26.500 --> 00:42:28.340
are equally likely.
00:42:28.340 --> 00:42:30.040
It doesn't have to
be that way.
00:42:30.040 --> 00:42:33.290
That's just one example
of a probability law.
00:42:33.290 --> 00:42:36.760
But when things are that way,
if all outcomes are equally
00:42:36.760 --> 00:42:45.960
likely and we have N of them,
and you have a set A that has
00:42:45.960 --> 00:42:51.150
little n elements, then each
one of those elements has
00:42:51.150 --> 00:42:54.460
probability one over
capital N since all
00:42:54.460 --> 00:42:56.450
outcomes are equally likely.
00:42:56.450 --> 00:42:58.980
And for our probabilities to add
up to one, each one must
00:42:58.980 --> 00:43:02.620
have this much probability, and
there's little n elements.
00:43:02.620 --> 00:43:06.120
That gives you the probability of the
event of interest: n over capital N.
00:43:06.120 --> 00:43:09.020
So problems like the one in the
previous slide and more
00:43:09.020 --> 00:43:11.560
generally of the type described
here under discrete
00:43:11.560 --> 00:43:15.270
uniform law, these problems
reduce to just counting.
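In symbols, the discrete uniform law combines finite additivity with equal outcome probabilities. Here N is the number of outcomes in the sample space and n the number of outcomes in A, as in the lecture; the summation notation is an assumption about how one might write it.

```latex
\[
\mathbf{P}(A) \;=\; \sum_{\omega \in A} \mathbf{P}(\{\omega\})
\;=\; n \cdot \frac{1}{N} \;=\; \frac{n}{N}
\]
```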
00:43:15.270 --> 00:43:17.500
How many elements are there
in my sample space?
00:43:17.500 --> 00:43:21.160
How many elements are there
inside the event of interest?
00:43:21.160 --> 00:43:24.520
Counting is generally simple,
but for some problems it gets
00:43:24.520 --> 00:43:25.950
pretty complicated.
00:43:25.950 --> 00:43:28.980
And in a couple of weeks, we're
going to have to spend
00:43:28.980 --> 00:43:31.820
a whole lecture just on the
subject of how to count
00:43:31.820 --> 00:43:33.280
systematically.
00:43:33.280 --> 00:43:37.070
Now the procedure we followed in
the previous example is the
00:43:37.070 --> 00:43:39.950
same as the procedure you would
follow in continuous
00:43:39.950 --> 00:43:41.330
probability problems.
00:43:41.330 --> 00:43:44.200
So, going back to our dart
problem, we get the random
00:43:44.200 --> 00:43:46.550
point inside the square.
00:43:46.550 --> 00:43:48.030
That's our sample space.
00:43:48.030 --> 00:43:50.360
We need to assign a
probability law.
00:43:50.360 --> 00:43:53.550
For lack of imagination, I'm
taking the probability law to
00:43:53.550 --> 00:43:56.280
be the area of a subset.
00:43:56.280 --> 00:44:00.990
So if we have two subsets of
the sample space that have
00:44:00.990 --> 00:44:05.000
equal areas, then I'm
postulating that they are
00:44:05.000 --> 00:44:06.560
equally likely to occur.
00:44:06.560 --> 00:44:08.490
The probability that they fall
here is the same as the
00:44:08.490 --> 00:44:11.430
probability that they
fall there.
00:44:11.430 --> 00:44:13.670
The model doesn't have
to be that way.
00:44:13.670 --> 00:44:16.720
But if I have sort of complete
ignorance of which points are
00:44:16.720 --> 00:44:19.310
more likely than others,
that might be the
00:44:19.310 --> 00:44:21.430
reasonable model to use.
00:44:21.430 --> 00:44:24.680
So equal areas mean equal
probabilities.
00:44:24.680 --> 00:44:27.470
If the area is twice as large,
the probability is going to be
00:44:27.470 --> 00:44:28.830
twice as big.
00:44:28.830 --> 00:44:32.130
So this is our model.
00:44:32.130 --> 00:44:34.580
We can now answer questions.
00:44:34.580 --> 00:44:35.730
Let's answer the easy one.
00:44:35.730 --> 00:44:38.070
What's the probability
that the outcome is
00:44:38.070 --> 00:44:40.660
exactly this point?
00:44:40.660 --> 00:44:47.500
That of course is zero because
a single point has zero area.
00:44:47.500 --> 00:44:50.190
And since this probability is
equal to area, that's zero
00:44:50.190 --> 00:44:51.510
probability.
00:44:51.510 --> 00:44:55.940
How about the probability that
the sum of the coordinates of
00:44:55.940 --> 00:45:00.090
the point that we got is less
than or equal to 1/2?
00:45:00.090 --> 00:45:01.570
How do you deal with it?
00:45:01.570 --> 00:45:04.770
Well, you look at the picture
again, at your sample space,
00:45:04.770 --> 00:45:08.130
and try to describe the event
that you're talking about.
00:45:08.130 --> 00:45:12.210
The sum being less than 1/2
corresponds to getting an
00:45:12.210 --> 00:45:16.060
outcome that's below this line,
where this line is the
00:45:16.060 --> 00:45:19.600
line where x plus
y equals 1/2.
00:45:19.600 --> 00:45:25.860
So the intercepts of that line
with the axes are 1/2 and 1/2.
00:45:25.860 --> 00:45:29.730
So you describe the event
visually and then you use your
00:45:29.730 --> 00:45:30.780
probability law.
00:45:30.780 --> 00:45:33.260
The probability law that we have
is that the probability
00:45:33.260 --> 00:45:36.620
of a set is equal to the
area of that set.
00:45:36.620 --> 00:45:39.900
So all we need to find is the
area of this triangle, which
00:45:39.900 --> 00:45:48.915
is 1/2 times 1/2 times 1/2,
and that equals 1/8.
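One way to sanity-check the area answer is a quick simulation. The Python sketch below assumes the uniform (area) law on the unit square and throws many random points; the function name, sample count, and seed are arbitrary choices for illustration, and the result is only an estimate.

```python
import random

def estimate_prob_sum_at_most_half(num_samples=200_000, seed=0):
    """Estimate P(x + y <= 1/2) for a point drawn uniformly from the unit square."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(num_samples):
        x, y = rng.random(), rng.random()
        if x + y <= 0.5:
            hits += 1
    return hits / num_samples

exact = 0.5 * 0.5 * 0.5  # area of the triangle with legs of length 1/2
estimate = estimate_prob_sum_at_most_half()
print(f"estimate = {estimate:.4f}, exact = {exact}")
```

The fraction of points landing below the line x + y = 1/2 should come out close to the triangle's area.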
00:45:48.915 --> 00:45:49.380
OK.
00:45:49.380 --> 00:45:52.620
Moral from these two examples is
that it's always useful to
00:45:52.620 --> 00:45:56.750
have a picture and work with
a picture to visualize the
00:45:56.750 --> 00:45:58.750
events that you're
talking about.
00:45:58.750 --> 00:46:01.340
And once you have a probability
law in your hands,
00:46:01.340 --> 00:46:04.470
then it's a matter of
calculation to find the
00:46:04.470 --> 00:46:06.540
probability of an
event of interest.
00:46:06.540 --> 00:46:09.080
The calculations we did in these
two examples, of course,
00:46:09.080 --> 00:46:10.130
were very simple.
00:46:10.130 --> 00:46:14.510
Sometimes calculations may be
a lot harder, but it's a
00:46:14.510 --> 00:46:15.480
different business.
00:46:15.480 --> 00:46:19.250
It's a business of calculus, for
example, or being good at
00:46:19.250 --> 00:46:20.250
algebra and so on.
00:46:20.250 --> 00:46:24.240
As far as probability is
concerned, it's clear what you
00:46:24.240 --> 00:46:27.110
will be doing, and then maybe
you're faced with a harder
00:46:27.110 --> 00:46:30.540
algebraic part to actually carry
out the calculations.
00:46:30.540 --> 00:46:32.870
The area of a triangle
is easy to compute.
00:46:32.870 --> 00:46:36.030
If I had put down a very
complicated shape, then you
00:46:36.030 --> 00:46:39.300
might need to solve a hard
integration problem to find
00:46:39.300 --> 00:46:42.190
the area of that shape, but
that's stuff that belongs to
00:46:42.190 --> 00:46:46.306
another class that you have
presumably mastered by now.
00:46:46.306 --> 00:46:47.000
Good, OK.
00:46:47.000 --> 00:46:49.730
So now let me spend just a
couple of minutes to return to
00:46:49.730 --> 00:46:52.170
a point that I raised before.
00:46:52.170 --> 00:46:56.270
I was saying that the axiom that
we had about additivity
00:46:56.270 --> 00:46:58.730
might not quite be enough.
00:46:58.730 --> 00:47:01.730
Let's illustrate what I mean
by the following example.
00:47:01.730 --> 00:47:04.960
Think of the experiment where
you keep flipping a coin and
00:47:04.960 --> 00:47:08.120
you wait until you obtain heads
for the first time.
00:47:08.120 --> 00:47:11.390
What's the sample space
of this experiment?
00:47:11.390 --> 00:47:13.730
It might happen in the first flip,
it might happen in the
00:47:13.730 --> 00:47:14.700
tenth flip.
00:47:14.700 --> 00:47:18.490
Heads for the first time might
occur in the millionth flip.
00:47:18.490 --> 00:47:21.070
So the outcome of this
experiment is going to be an
00:47:21.070 --> 00:47:23.820
integer and there's no bound
to that integer.
00:47:23.820 --> 00:47:26.780
You might have to wait a very
long time until that happens.
00:47:26.780 --> 00:47:29.020
So the natural sample
space is the set of
00:47:29.020 --> 00:47:30.950
all positive integers.
00:47:30.950 --> 00:47:35.030
Somebody tells you some
information about the
00:47:35.030 --> 00:47:36.250
probability law.
00:47:36.250 --> 00:47:39.900
The probability that you have
to wait for n flips is equal
00:47:39.900 --> 00:47:41.130
to two to the minus n.
00:47:41.130 --> 00:47:42.850
Where did this come from?
00:47:42.850 --> 00:47:44.220
That's a separate story.
00:47:44.220 --> 00:47:45.730
Where did it come from?
00:47:45.730 --> 00:47:49.840
Somebody tells this to us, and
those probabilities are
00:47:49.840 --> 00:47:52.150
plotted here as a
function of n.
00:47:52.150 --> 00:47:54.580
And you're asked to find the
probability that the outcome
00:47:54.580 --> 00:47:56.660
is an even number.
00:47:56.660 --> 00:47:59.920
How do you go about calculating
that probability?
00:47:59.920 --> 00:48:02.960
So the probability of being an
even number is the probability
00:48:02.960 --> 00:48:08.380
of the subset that consists
of just the even numbers.
00:48:08.380 --> 00:48:11.810
So it would be a subset of this
kind, that includes two,
00:48:11.810 --> 00:48:13.760
four, and so on.
00:48:13.760 --> 00:48:18.270
So any reasonable person would
say, well the probability of
00:48:18.270 --> 00:48:22.170
obtaining an outcome that's
either two or four or six and
00:48:22.170 --> 00:48:25.360
so on is equal to the
probability of obtaining a
00:48:25.360 --> 00:48:28.370
two, plus the probability of
obtaining a four, plus the
00:48:28.370 --> 00:48:31.130
probability of obtaining
a six, and so on.
00:48:31.130 --> 00:48:33.640
These probabilities
are given to us.
00:48:33.640 --> 00:48:35.990
So here I have to
do my algebra.
00:48:35.990 --> 00:48:40.840
I add this geometric series and
I get an answer of 1/3.
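The geometric series here can be checked with exact arithmetic. The sketch below assumes only the law stated in the lecture, that the first heads occurs on flip n with probability 2 to the minus n; the function names and the cutoff of 40 flips are illustrative.

```python
from fractions import Fraction


def prob_first_heads_at(n):
    """Probability that the first heads occurs on flip n, taken as 2**(-n)."""
    return Fraction(1, 2 ** n)


def prob_even_partial(max_n):
    """Partial sum of the probabilities of the even outcomes 2, 4, ..., up to max_n."""
    return sum(
        (prob_first_heads_at(n) for n in range(2, max_n + 1, 2)),
        Fraction(0),
    )


# Closed form of the geometric series 1/4 + 1/16 + 1/64 + ...:
# first term 1/4, ratio 1/4, so the sum is (1/4) / (1 - 1/4).
limit = Fraction(1, 4) / (1 - Fraction(1, 4))
partial = prob_even_partial(40)
print(limit, float(partial))
```

The partial sums stay below the closed-form value and approach it as more terms are included, which is the sense in which the infinite sum equals 1/3.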
00:48:40.840 --> 00:48:43.430
That's what any reasonable
person would do.
00:48:43.430 --> 00:48:48.290
But the person who only knows
the axioms that I posted
00:48:48.290 --> 00:48:51.880
just a little earlier
may get stuck.
00:48:51.880 --> 00:48:53.610
They would get stuck
at this point.
00:48:53.610 --> 00:48:55.700
How do we justify this?
00:48:59.000 --> 00:49:04.010
We had this property for the
union of disjoint sets and the
00:49:04.010 --> 00:49:07.210
corresponding property that
tells us that the total
00:49:07.210 --> 00:49:11.620
probability of finitely many
things, outcomes, is the sum
00:49:11.620 --> 00:49:13.740
of their individual
probabilities.
00:49:13.740 --> 00:49:17.940
But here we're using it on
an infinite collection.
00:49:17.940 --> 00:49:23.180
The probability of infinitely
many points is equal to the
00:49:23.180 --> 00:49:26.070
sum of the probabilities
of each one of these.
00:49:26.070 --> 00:49:30.190
To justify this step we need
to introduce one additional
00:49:30.190 --> 00:49:34.180
rule, an additional axiom, that
tells us that this step
00:49:34.180 --> 00:49:36.160
is actually legitimate.
00:49:36.160 --> 00:49:39.540
And this is the countable
additivity axiom, which is a
00:49:39.540 --> 00:49:42.780
little stronger, or quite
a bit stronger, than the
00:49:42.780 --> 00:49:45.140
additivity axiom
we had before.
00:49:45.140 --> 00:49:49.210
It tells us that if we have a
sequence of sets that are
00:49:49.210 --> 00:49:54.190
disjoint and we want to find
their total probability, then
00:49:54.190 --> 00:49:58.230
we are allowed to add their
individual probabilities.
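Written out in symbols, the axiom and its use in the coin-flipping example look roughly as follows. The notation (bold P for the probability law, A_1, A_2, ... for the events) is an assumed convention for illustration, not taken from the slides.

```latex
% Countable additivity: for a sequence of disjoint events A_1, A_2, A_3, ...
\[
\mathbf{P}(A_1 \cup A_2 \cup A_3 \cup \cdots)
  = \mathbf{P}(A_1) + \mathbf{P}(A_2) + \mathbf{P}(A_3) + \cdots
\]

% Applied to the event "the outcome is even", with P({n}) = 2^{-n}:
\[
\mathbf{P}(\{2, 4, 6, \ldots\})
  = \sum_{k=1}^{\infty} \mathbf{P}(\{2k\})
  = \sum_{k=1}^{\infty} 2^{-2k}
  = \frac{1/4}{1 - 1/4}
  = \frac{1}{3}
\]
```

The first equality in the second display is exactly the step that finite additivity alone does not justify.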
00:49:58.230 --> 00:50:01.000
So the picture might
be as follows.
00:50:01.000 --> 00:50:07.420
We have a sequence of sets,
A1, A2, A3, and so on.
00:50:07.420 --> 00:50:10.110
I guess in order to fit them
inside the sample space, the
00:50:10.110 --> 00:50:13.920
sets need to get smaller
and smaller perhaps.
00:50:13.920 --> 00:50:15.340
They are disjoint.
00:50:15.340 --> 00:50:17.330
We have a sequence
of such sets.
00:50:17.330 --> 00:50:21.340
The total probability of falling
anywhere inside one of
00:50:21.340 --> 00:50:25.740
those sets is the sum of their
individual probabilities.
00:50:25.740 --> 00:50:30.150
A key subtlety that's involved
here is that we're talking
00:50:30.150 --> 00:50:33.710
about a sequence of events.
00:50:33.710 --> 00:50:36.560
By "sequence" we mean that
these events can
00:50:36.560 --> 00:50:38.450
be arranged in order.
00:50:38.450 --> 00:50:41.780
I can tell you the first event,
the second event, the
00:50:41.780 --> 00:50:43.530
third event, and so on.
00:50:43.530 --> 00:50:46.320
So if you have such a collection
of events that can
00:50:46.320 --> 00:50:50.690
be ordered as first, second,
third, and so on, then you can
00:50:50.690 --> 00:50:54.040
add their probabilities
to find the
00:50:54.040 --> 00:50:55.790
probability of their union.
00:50:55.790 --> 00:50:58.230
So this point is actually a
little more subtle than you
00:50:58.230 --> 00:51:00.730
might appreciate at this point,
and I'm going to return
00:51:00.730 --> 00:51:04.010
to it at the beginning
of the next lecture.
00:51:04.010 --> 00:51:07.160
For now, enjoy the first
week of classes
00:51:07.160 --> 00:51:09.380
and have a good weekend.
00:51:09.380 --> 00:51:10.630
Thank you.