WEBVTT
00:00:00.680 --> 00:00:03.920
We have seen so far an example
of a probability law on a
00:00:03.920 --> 00:00:07.590
discrete and finite sample space
as well as an example
00:00:07.590 --> 00:00:10.550
with an infinite and continuous
sample space.
00:00:10.550 --> 00:00:14.830
Let us now look at an example
involving a discrete but
00:00:14.830 --> 00:00:17.350
infinite sample space.
00:00:17.350 --> 00:00:20.810
We carry out an experiment whose
outcome is an arbitrary
00:00:20.810 --> 00:00:22.890
positive integer.
00:00:22.890 --> 00:00:25.300
As an example of such an
experiment, suppose that we
00:00:25.300 --> 00:00:28.610
keep tossing a coin and the
outcome is the number of
00:00:28.610 --> 00:00:32.850
tosses until we observe heads
for the first time.
00:00:32.850 --> 00:00:36.140
The first heads might appear
in the first toss or the
00:00:36.140 --> 00:00:39.150
second or the third,
and so on.
00:00:39.150 --> 00:00:42.890
So in this example, any positive
integer is possible.
00:00:42.890 --> 00:00:46.320
And so our sample space
is infinite.
00:00:46.320 --> 00:00:49.480
Let us now specify a
probability law.
00:00:49.480 --> 00:00:52.730
A probability law should
determine the probability of
00:00:52.730 --> 00:00:56.950
every event, of every subset
of the sample space.
00:00:56.950 --> 00:00:59.080
That is, the probability
of every
00:00:59.080 --> 00:01:01.970
set of positive integers.
00:01:01.970 --> 00:01:06.140
But instead I will just tell you
the probability of events
00:01:06.140 --> 00:01:08.860
that contain a single element.
00:01:08.860 --> 00:01:13.050
I'm going to tell you that there
is probability 1 over 2
00:01:13.050 --> 00:01:18.010
to the n that the outcome
is equal to n.
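A short simulation can make this probability law concrete. This is a sketch only: it assumes a fair coin and independent tosses (under those assumptions the first heads appears on toss n with probability 1 over 2 to the n), and the function names are hypothetical. It prints empirical frequencies next to 1 over 2 to the n.

```python
import random
from collections import Counter


def tosses_until_first_heads(rng: random.Random) -> int:
    """Toss a coin until heads appears and return the number of tosses.

    Assumption: the coin is fair and tosses are independent, so each toss
    is heads with probability 1/2 (modelled as rng.random() < 0.5).
    """
    n = 1
    while rng.random() >= 0.5:  # tails: keep tossing
        n += 1
    return n


def empirical_frequencies(trials: int, seed: int = 0) -> Counter:
    """Run the experiment `trials` times and count each outcome n."""
    rng = random.Random(seed)
    return Counter(tosses_until_first_heads(rng) for _ in range(trials))


trials = 100_000
counts = empirical_frequencies(trials)
for n in range(1, 7):
    # empirical frequency of outcome n next to the claimed 1 / 2**n
    print(n, counts[n] / trials, 1 / 2**n)
```

Every outcome is a positive integer, and large outcomes become rare quickly, but no finite bound on n is ever imposed.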
00:01:18.010 --> 00:01:19.860
Is this good enough?
00:01:19.860 --> 00:01:23.800
Is this information enough to
determine the probability of
00:01:23.800 --> 00:01:26.420
any subset?
00:01:26.420 --> 00:01:28.950
Before we look into that
question, let us first do a
00:01:28.950 --> 00:01:32.425
quick sanity check to see
whether these numbers that we
00:01:32.425 --> 00:01:35.420
are given look like legitimate
probabilities.
00:01:35.420 --> 00:01:37.410
Do they add to 1?
00:01:37.410 --> 00:01:39.410
Let's do a quick check.
00:01:39.410 --> 00:01:45.840
So the sum over all the possible
values of n of the
00:01:45.840 --> 00:01:49.610
probabilities that we're given,
which is an infinite
00:01:49.610 --> 00:01:55.520
sum starting from 1, all the way
up to infinity, of 1 over
00:01:55.520 --> 00:01:58.700
2 to the n, is equal
to the following.
00:01:58.700 --> 00:02:04.250
First we take out a factor of
1/2 from all of these terms,
00:02:04.250 --> 00:02:08.080
which reduces the exponent
from n to n minus 1.
00:02:08.080 --> 00:02:13.700
This is the same as 1/2 times
the sum from n equals 0 to
00:02:13.700 --> 00:02:19.310
infinity of 1/2 to the n.
00:02:19.310 --> 00:02:24.980
And now we have a usual infinite
geometric series and
00:02:24.980 --> 00:02:27.730
we have a formula for this.
00:02:27.730 --> 00:02:33.320
The geometric series has a value
of 1 over 1 minus the
00:02:33.320 --> 00:02:36.665
number whose power we're
taking, which is 1/2.
00:02:39.280 --> 00:02:42.520
So the series equals 2, and after
multiplying by the factor of 1/2, the total turns out to
00:02:42.520 --> 00:02:44.240
be equal to 1.
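To check this sum numerically, one can compute exact partial sums with Python's fractions module. This is a minimal sketch; partial_sum is a hypothetical helper. The first N terms add up to 1 minus 1 over 2 to the N, which approaches 1, and the closed form from the geometric series gives exactly 1.

```python
from fractions import Fraction


def partial_sum(N: int) -> Fraction:
    """Exact value of 1/2 + 1/4 + ... + 1/2**N."""
    return sum((Fraction(1, 2**n) for n in range(1, N + 1)), Fraction(0))


for N in (1, 2, 5, 10, 20):
    s = partial_sum(N)
    print(N, s, float(s))  # s equals 1 - 1/2**N

# Closed form: factor 1/2 times the geometric series value 1 / (1 - 1/2).
half = Fraction(1, 2)
closed_form = half * (1 / (1 - half))
print(closed_form)  # prints 1
```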
00:02:44.240 --> 00:02:50.860
So indeed, it appears that we
have the basic elements of
00:02:50.860 --> 00:02:54.360
what it would take to have a
legitimate probability law.
00:02:54.360 --> 00:02:57.870
But now let us look into how
we might calculate the
00:02:57.870 --> 00:03:00.510
probability of some
general event.
00:03:00.510 --> 00:03:05.370
For example, the probability
that the outcome is even.
00:03:05.370 --> 00:03:08.300
We proceed as follows.
00:03:08.300 --> 00:03:11.200
The probability that the outcome
is even, this is the
00:03:11.200 --> 00:03:15.840
probability of an infinite
set that consists of
00:03:15.840 --> 00:03:18.610
all the even integers.
00:03:22.280 --> 00:03:29.760
We can write this set as the
union of lots of little sets
00:03:29.760 --> 00:03:33.090
that contain a single
element each.
00:03:33.090 --> 00:03:36.530
So it's the set containing the
number 2, the set containing
00:03:36.530 --> 00:03:38.750
the number 4, the set
containing the
00:03:38.750 --> 00:03:41.120
number 6, and so on.
00:03:44.010 --> 00:03:47.170
At this point we notice that
we're talking about the
00:03:47.170 --> 00:03:51.430
probability of a union of sets
and these sets are disjoint
00:03:51.430 --> 00:03:54.760
because they contain
different elements.
00:03:54.760 --> 00:04:00.900
So we can use an additivity
property and say that this is
00:04:00.900 --> 00:04:05.280
the probability of obtaining a
2, plus the probability of
00:04:05.280 --> 00:04:08.190
obtaining a 4, plus
the probability of
00:04:08.190 --> 00:04:12.390
obtaining a 6 and so on.
00:04:12.390 --> 00:04:15.570
If you're curious about doing
this calculation and actually
00:04:15.570 --> 00:04:19.339
obtaining a numerical answer,
you would proceed as follows.
00:04:19.339 --> 00:04:26.030
You notice that this is 1 over
2 to the second power plus 1
00:04:26.030 --> 00:04:31.370
over 2 to the fourth power plus
1 over 2 to the sixth
00:04:31.370 --> 00:04:34.170
power and so on.
00:04:34.170 --> 00:04:43.260
Now you factor out a factor of
1/4 and what you're left with is 1
00:04:43.260 --> 00:04:48.400
plus 1 over 2 to the second
power, which is 1/4, plus 1
00:04:48.400 --> 00:04:56.000
over 2 to the fourth power,
which is the same as 1/4 to
00:04:56.000 --> 00:04:59.760
the second power and so on.
00:04:59.760 --> 00:05:05.440
And now we have 1/4 times the
infinite sum of a geometric
00:05:05.440 --> 00:05:12.620
series, and that series sums to
1 over 1 minus 1/4, which is 4/3.
00:05:12.620 --> 00:05:16.240
Multiplying by 1/4, you
obtain a numerical answer,
00:05:16.240 --> 00:05:17.750
which is equal to 1/3.
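The even-outcome calculation can be sketched the same way. The hypothetical helper prob below adds the single-element probabilities of a finite set of positive integers, which is finite additivity. Truncating the even numbers at 2K gives 1/3 times (1 minus 1 over 4 to the K), and the closed form gives 1/3. Note that the code only ever adds finitely many terms; passing to the limit is exactly the step whose justification is questioned next.

```python
from fractions import Fraction


def prob(event) -> Fraction:
    """Probability of a FINITE set of positive integers when P({n}) = 1/2**n.

    Adds the probabilities of the single-element subsets, i.e. uses
    (finite) additivity only; duplicates are removed so the pieces are disjoint.
    """
    return sum((Fraction(1, 2**n) for n in set(event)), Fraction(0))


# Even outcomes up to 2K: 1/4 + 1/16 + ... + 1/4**K
for K in (1, 2, 5, 10):
    evens_up_to_2K = range(2, 2 * K + 1, 2)
    p = prob(evens_up_to_2K)
    print(K, p, float(p))  # equals (1/3) * (1 - 1/4**K)

# Closed form: factor 1/4 times the geometric series value 1 / (1 - 1/4).
quarter = Fraction(1, 4)
print(quarter * (1 / (1 - quarter)))  # prints 1/3
```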
00:05:20.260 --> 00:05:23.550
But leaving the details of the
calculation aside, the more
00:05:23.550 --> 00:05:26.810
important question I want to
address is the following.
00:05:26.810 --> 00:05:29.430
Is this calculation correct?
00:05:29.430 --> 00:05:32.370
We seem to have used
an additivity
00:05:32.370 --> 00:05:34.370
property at this point.
00:05:37.720 --> 00:05:41.500
But the additivity properties
that we have in our hands at
00:05:41.500 --> 00:05:46.800
this point only talk about
disjoint unions of finitely
00:05:46.800 --> 00:05:48.290
many subsets.
00:05:48.290 --> 00:05:51.460
Our initial axiom talked about
a disjoint union of two
00:05:51.460 --> 00:05:54.990
subsets and then later on we
established a similar property
00:05:54.990 --> 00:05:58.820
for a disjoint union of
finitely many subsets.
00:05:58.820 --> 00:06:02.620
But here we're talking
about the union of
00:06:02.620 --> 00:06:05.770
infinitely many subsets.
00:06:05.770 --> 00:06:11.940
So this step here is not really
allowed by what we have
00:06:11.940 --> 00:06:13.140
in our hands.
00:06:13.140 --> 00:06:16.500
On the other hand, we would like
our theory to allow this
00:06:16.500 --> 00:06:18.540
kind of calculation.
00:06:18.540 --> 00:06:23.070
The way out of this dilemma is
to introduce an additional
00:06:23.070 --> 00:06:27.015
axiom that will indeed allow
this kind of calculation.
00:06:29.660 --> 00:06:32.836
The axiom that we introduce
is the following.
00:06:32.836 --> 00:06:39.700
If we have an infinite sequence
of disjoint events,
00:06:39.700 --> 00:06:42.430
as for example in
this picture.
00:06:42.430 --> 00:06:44.560
We have our sample space.
00:06:44.560 --> 00:06:46.909
We have a first event, A1.
00:06:46.909 --> 00:06:49.440
We have a second event, A2.
00:06:49.440 --> 00:06:51.690
The third event, A3.
00:06:51.690 --> 00:06:55.730
And so we keep continuing and
we have an infinite sequence
00:06:55.730 --> 00:06:57.400
of such events.
00:06:57.400 --> 00:07:02.770
Then the probability of the
union of these events, of
00:07:02.770 --> 00:07:07.600
these infinitely many events, is
the sum of their individual
00:07:07.600 --> 00:07:09.390
probabilities.
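In symbols, with P denoting the probability law, the axiom reads as follows. Applied to the single-element events {2}, {4}, {6}, and so on, it justifies the even-outcome calculation carried out earlier.

```latex
\text{If } A_1, A_2, A_3, \ldots \text{ are disjoint, then }
\mathbf{P}\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} \mathbf{P}(A_i).

\mathbf{P}(\{2, 4, 6, \ldots\}) = \sum_{k=1}^{\infty} \mathbf{P}(\{2k\})
  = \sum_{k=1}^{\infty} \frac{1}{4^k} = \frac{1}{3}.
```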
00:07:09.390 --> 00:07:15.630
The key word here is
the word sequence.
00:07:15.630 --> 00:07:20.430
Namely, these events, these sets
that we're dealing with,
00:07:20.430 --> 00:07:25.120
can be arranged so that we can
talk about the first event,
00:07:25.120 --> 00:07:31.490
A1, the second event, A2, the
third one, A3, and so on.
00:07:31.490 --> 00:07:35.510
To appreciate the issue that
arises here and to see why the
00:07:35.510 --> 00:07:41.360
word sequence is so important,
let us consider the following
00:07:41.360 --> 00:07:43.110
calculation.
00:07:43.110 --> 00:07:45.680
Our sample space is
the unit square.
00:07:48.750 --> 00:07:52.290
And we consider a model where
the probability of a set is
00:07:52.290 --> 00:07:57.030
its area, as in the examples
that we considered earlier.
00:07:57.030 --> 00:08:00.550
Let us now look at the
probability of the overall
00:08:00.550 --> 00:08:02.180
sample space.
00:08:02.180 --> 00:08:07.890
Our sample space is the unit
square and the unit square can
00:08:07.890 --> 00:08:13.870
be thought of as the union of
various sets that consist of
00:08:13.870 --> 00:08:15.330
single points.
00:08:15.330 --> 00:08:22.780
So it's the union of subsets
with one element each.
00:08:22.780 --> 00:08:25.100
And it's a union taken
over all the
00:08:25.100 --> 00:08:28.770
points in the unit square.
00:08:28.770 --> 00:08:31.590
Then we think about
additivity.
00:08:31.590 --> 00:08:35.490
We observe that these subsets
are disjoint.
00:08:35.490 --> 00:08:39.080
If we're considering different
points, then we get disjoint
00:08:39.080 --> 00:08:40.890
single element sets.
00:08:40.890 --> 00:08:44.190
And then an additivity property
would tell us that
00:08:44.190 --> 00:08:47.450
the probability of this
union is the sum of the
00:08:47.450 --> 00:08:53.750
probabilities of the different
single element subsets.
00:08:53.750 --> 00:08:57.910
Now, as we discussed before,
single element subsets have 0
00:08:57.910 --> 00:08:58.770
probability.
00:08:58.770 --> 00:09:04.320
So we have a sum of lots of 0s
and the sum of 0s should be
00:09:04.320 --> 00:09:06.310
equal to 0.
00:09:06.310 --> 00:09:09.310
On the other hand, by the
probability axioms, the
00:09:09.310 --> 00:09:11.860
probability of the entire
sample space
00:09:11.860 --> 00:09:13.750
should be equal to 1.
00:09:13.750 --> 00:09:18.140
And so we have established
that 1 is equal to 0.
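Written out, the argument that led to this conclusion is the chain below, writing Ω for the unit square (notation introduced here for brevity). The question mark flags the step that applies additivity to a union over all points of the square.

```latex
1 = \mathbf{P}(\Omega)
  = \mathbf{P}\Big(\bigcup_{\omega \in \Omega} \{\omega\}\Big)
  \overset{?}{=} \sum_{\omega \in \Omega} \mathbf{P}(\{\omega\})
  = \sum_{\omega \in \Omega} 0 = 0
```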
00:09:18.140 --> 00:09:20.120
This looks like a paradox.
00:09:20.120 --> 00:09:21.840
Is it?
00:09:21.840 --> 00:09:26.110
The catch is that there is
nothing in the axioms we have
00:09:26.110 --> 00:09:29.770
introduced so far or the
properties we have established
00:09:29.770 --> 00:09:32.600
that would justify this step.
00:09:32.600 --> 00:09:36.940
So this step here
is questionable.
00:09:36.940 --> 00:09:40.440
You might argue that the unit
square is the union of
00:09:40.440 --> 00:09:45.490
disjoint single element sets,
which looks like the case covered
00:09:45.490 --> 00:09:47.340
by the additivity axioms.
00:09:47.340 --> 00:09:50.950
But the additivity axiom only
applies when we have a
00:09:50.950 --> 00:09:53.770
sequence of events.
00:09:53.770 --> 00:09:56.580
And this is not what
we have here.
00:09:56.580 --> 00:09:59.470
This is not a union
of a sequence of
00:09:59.470 --> 00:10:01.090
single element sets.
00:10:01.090 --> 00:10:04.160
In fact, there is no way that
the elements of the unit
00:10:04.160 --> 00:10:06.930
square can be arranged
in a sequence.
00:10:06.930 --> 00:10:13.440
The unit square is said to
be an uncountable set.
00:10:13.440 --> 00:10:16.950
This is a deep and fundamental
mathematical fact.
00:10:16.950 --> 00:10:19.980
What it essentially says is that
there are two kinds of
00:10:19.980 --> 00:10:21.510
infinite sets.
00:10:21.510 --> 00:10:26.450
Discrete ones, or, in formal
terminology, countable ones.
00:10:26.450 --> 00:10:29.980
These are sets whose elements
can be arranged in a sequence,
00:10:29.980 --> 00:10:31.680
like the integers.
00:10:31.680 --> 00:10:36.910
And also uncountable sets, such
as the unit square or the
00:10:36.910 --> 00:10:40.030
real line, whose elements
cannot be
00:10:40.030 --> 00:10:42.140
arranged in a sequence.
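To make "arranged in a sequence" concrete: the integers, including the negative ones, can be listed one after another so that every integer appears at some finite position. The generator below is a minimal sketch of one such listing (all_integers is an illustrative name). By the fact just stated, no analogous listing exists for the points of the unit square or the real line.

```python
from itertools import count, islice


def all_integers():
    """Yield every integer exactly once: 0, 1, -1, 2, -2, 3, -3, ..."""
    yield 0
    for k in count(1):
        yield k
        yield -k


print(list(islice(all_integers(), 9)))  # [0, 1, -1, 2, -2, 3, -3, 4, -4]
```

Counting positions from 0, a positive integer m sits at position 2m - 1 and -m sits at position 2m, so each integer is reached after finitely many steps.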
00:10:42.140 --> 00:10:45.680
If you're curious, you can
find the proof of this
00:10:45.680 --> 00:10:48.400
important fact in the
supplementary materials that
00:10:48.400 --> 00:10:51.020
we are providing.
00:10:51.020 --> 00:10:53.680
After all this discussion, you
may now have legitimate
00:10:53.680 --> 00:10:57.340
suspicions about the models
we have been looking at.
00:10:57.340 --> 00:11:00.860
Is area a legitimate
probability law?
00:11:00.860 --> 00:11:05.600
Does it even satisfy countable
additivity?
00:11:05.600 --> 00:11:09.000
This question takes us into deep
waters and has to do with
00:11:09.000 --> 00:11:12.250
a subfield of mathematics
called measure theory.
00:11:12.250 --> 00:11:15.970
Fortunately, it turns out
that all is well.
00:11:15.970 --> 00:11:19.030
Area is a legitimate
probability law.
00:11:19.030 --> 00:11:23.600
It does indeed satisfy the
countable additivity axiom as
00:11:23.600 --> 00:11:29.270
long as we only deal with nice
subsets of the unit square.
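As an illustration of countable additivity for area (the sets are chosen here for illustration, not taken from the lecture), cut the unit square into vertical strips A_k of width 1 over 2 to the k. The strips are disjoint, their union is the square minus the left edge x = 0, which has zero area, and the areas add up to 1, as the axiom requires.

```latex
A_k = \{(x, y) \in [0,1]^2 : 2^{-k} < x \le 2^{-(k-1)}\}, \qquad \text{area}(A_k) = 2^{-k},

\text{area}\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \text{area}\big((0,1] \times [0,1]\big) = 1 = \sum_{k=1}^{\infty} 2^{-k}.
```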
00:11:29.270 --> 00:11:32.640
Moreover, the subsets that
arise in whatever we do in
00:11:32.640 --> 00:11:35.220
this course will be "nice".
00:11:35.220 --> 00:11:39.890
Subsets that are not nice are
quite pathological and we will
00:11:39.890 --> 00:11:42.260
not encounter them.
00:11:42.260 --> 00:11:47.170
At this stage we are not in a
position to say anything more
00:11:47.170 --> 00:11:50.200
that would be meaningful about
these issues because they're
00:11:50.200 --> 00:11:53.230
quite complicated and
mathematically deep.
00:11:53.230 --> 00:11:57.550
We can only say that there are
some serious mathematical
00:11:57.550 --> 00:11:58.660
subtleties.
00:11:58.660 --> 00:12:01.620
But fortunately, they
can all be overcome
00:12:01.620 --> 00:12:03.190
in a rigorous manner.
00:12:03.190 --> 00:12:06.190
And for the rest of this class,
you can just forget
00:12:06.190 --> 00:12:07.710
about these subtle issues.