1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:23,750
10
00:00:23,750 --> 00:00:26,970
PROFESSOR: Let us start.
11
00:00:26,970 --> 00:00:30,080
So as always, we're going to have
a quick review of what we
12
00:00:30,080 --> 00:00:31,240
discussed last time.
13
00:00:31,240 --> 00:00:34,240
And then today we're going
to introduce just one new
14
00:00:34,240 --> 00:00:38,120
concept, the notion of
independence of two events.
15
00:00:38,120 --> 00:00:41,030
And we will play with
that concept.
16
00:00:41,030 --> 00:00:43,110
So what did we talk
about last time?
17
00:00:43,110 --> 00:00:46,410
The idea is that we have an
experiment, and the experiment
18
00:00:46,410 --> 00:00:48,800
has a sample space omega.
19
00:00:48,800 --> 00:00:52,300
And then somebody comes and
tells us you know the outcome
20
00:00:52,300 --> 00:00:56,840
of the experiment happens to
lie inside this particular
21
00:00:56,840 --> 00:01:00,470
event B. Given this information,
it kind of
22
00:01:00,470 --> 00:01:03,070
changes what we know about
the situation.
23
00:01:03,070 --> 00:01:05,510
It tells us that the outcome
is going to be somewhere
24
00:01:05,510 --> 00:01:06,630
inside here.
25
00:01:06,630 --> 00:01:09,800
So this is essentially
our new sample space.
26
00:01:09,800 --> 00:01:13,130
And now we need to reassign
probabilities to the various
27
00:01:13,130 --> 00:01:16,550
possible outcomes, because, for
example, these outcomes,
28
00:01:16,550 --> 00:01:20,340
even if they had positive
probability beforehand, now
29
00:01:20,340 --> 00:01:22,890
that we're told that B occurred,
those outcomes out
30
00:01:22,890 --> 00:01:25,220
there are going to have
zero probability.
31
00:01:25,220 --> 00:01:27,670
So we need to revise
our probabilities.
32
00:01:27,670 --> 00:01:29,740
The new probabilities are
called conditional
33
00:01:29,740 --> 00:01:33,390
probabilities, and they're
defined this way.
34
00:01:33,390 --> 00:01:37,000
The conditional probability that
A occurs given that we're
35
00:01:37,000 --> 00:01:40,670
told that B occurred is
calculated by this formula,
36
00:01:40,670 --> 00:01:42,880
which tells us the following--
37
00:01:42,880 --> 00:01:45,750
out of the total probability
that was initially assigned to
38
00:01:45,750 --> 00:01:49,760
the event B, what fraction of
that probability is assigned
39
00:01:49,760 --> 00:01:54,310
to outcomes that also
make A happen?
40
00:01:54,310 --> 00:01:58,650
So out of the total probability
assigned to B, we
41
00:01:58,650 --> 00:02:03,080
see what fraction of that total
probability is assigned
42
00:02:03,080 --> 00:02:06,650
to those elements here that
will also make A happen.
43
00:02:06,650 --> 00:02:09,360
Conditional probabilities are
left undefined if the
44
00:02:09,360 --> 00:02:12,860
denominator here is zero.
45
00:02:12,860 --> 00:02:16,030
An easy consequence of the
definition is if we bring that
46
00:02:16,030 --> 00:02:18,670
term to the other side, then we
can find the probability of
47
00:02:18,670 --> 00:02:22,040
two things happening by taking
the probability that the first
48
00:02:22,040 --> 00:02:25,170
thing happens, and then, given
that the first thing happened,
49
00:02:25,170 --> 00:02:28,230
the conditional probability that
the second one happens.
50
00:02:28,230 --> 00:02:31,680
Then we saw last time that we
can divide and conquer in
51
00:02:31,680 --> 00:02:36,010
calculating probabilities of
mildly complicated events by
52
00:02:36,010 --> 00:02:39,070
breaking it down into
different scenarios.
53
00:02:39,070 --> 00:02:41,350
So event B can happen
in two ways.
54
00:02:41,350 --> 00:02:44,810
It can happen either together
with A, which is this
55
00:02:44,810 --> 00:02:48,700
probability, or it can happen
together with A complement,
56
00:02:48,700 --> 00:02:49,880
which is this probability.
57
00:02:49,880 --> 00:02:53,100
So basically what we're saying is
that the total probability of
58
00:02:53,100 --> 00:02:57,960
B is the probability of this,
which is A intersection B,
59
00:02:57,960 --> 00:03:02,620
plus the probability of that,
which is A complement
60
00:03:02,620 --> 00:03:07,530
intersection B.
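The relations reviewed so far, written out. The lecture points at formulas on the board; the forms below are the standard statements for two events A and B, with the conditioning event assumed to have positive probability.

```latex
\begin{align*}
P(A \mid B) &= \frac{P(A \cap B)}{P(B)}, \qquad P(B) > 0 \\
P(A \cap B) &= P(A)\, P(B \mid A) \\
P(B) &= P(A \cap B) + P(A^{c} \cap B)
\end{align*}
```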
61
00:03:07,530 --> 00:03:11,355
So these two facts here,
multiplication rule and the
62
00:03:11,355 --> 00:03:14,760
total probability theorem, are
basic tools that one uses to
63
00:03:14,760 --> 00:03:16,990
break down probability
calculations
64
00:03:16,990 --> 00:03:18,930
into simpler parts.
65
00:03:18,930 --> 00:03:21,600
So we find probabilities of
two things happening by
66
00:03:21,600 --> 00:03:24,300
looking at each one at a time.
67
00:03:24,300 --> 00:03:27,830
And this is what we do to break
up a situation with two
68
00:03:27,830 --> 00:03:29,760
different possible scenarios.
69
00:03:29,760 --> 00:03:32,110
Then we also have
the Bayes rule,
70
00:03:32,110 --> 00:03:33,600
which does the following.
71
00:03:33,600 --> 00:03:36,640
Given a model that has
conditional probabilities of
72
00:03:36,640 --> 00:03:38,970
this kind, the Bayes rule
allows us to calculate
73
00:03:38,970 --> 00:03:41,570
conditional probabilities in
which the events appear in
74
00:03:41,570 --> 00:03:43,020
different order.
75
00:03:43,020 --> 00:03:45,740
You can think of these
probabilities as describing a
76
00:03:45,740 --> 00:03:49,270
causal model of a certain
situation, whereas these are
77
00:03:49,270 --> 00:03:52,670
the probabilities that you get
after you do some inference
78
00:03:52,670 --> 00:03:55,480
based on the information that
you have available.
79
00:03:55,480 --> 00:03:59,200
Now the Bayes rule, we derived
it, and it's a trivial
80
00:03:59,200 --> 00:04:01,040
half-line calculation.
81
00:04:01,040 --> 00:04:03,670
But it underlies lots
and lots of useful
82
00:04:03,670 --> 00:04:05,290
things in the real world.
83
00:04:05,290 --> 00:04:07,650
We had the radar example
last time.
84
00:04:07,650 --> 00:04:10,410
You can think of more
complicated situations in
85
00:04:10,410 --> 00:04:14,570
which there are lots of
different hypotheses about
86
00:04:14,570 --> 00:04:15,920
the environment.
87
00:04:15,920 --> 00:04:18,899
Given any particular setting in
the environment, you have a
88
00:04:18,899 --> 00:04:21,140
measuring device that
can produce
89
00:04:21,140 --> 00:04:23,670
many different outcomes.
90
00:04:23,670 --> 00:04:29,210
And you observe the final
outcome out of your measuring
91
00:04:29,210 --> 00:04:31,820
device, and you're trying
to guess which
92
00:04:31,820 --> 00:04:34,210
particular branch occurred.
93
00:04:34,210 --> 00:04:36,640
That is, you're trying to guess
the state of the world
94
00:04:36,640 --> 00:04:38,500
based on a particular
measurement.
95
00:04:38,500 --> 00:04:40,770
That's what inference
is all about.
96
00:04:40,770 --> 00:04:44,450
So real world problems only
differ from the simple example
97
00:04:44,450 --> 00:04:48,610
that we saw last time in that
this kind of tree is a little
98
00:04:48,610 --> 00:04:50,040
more complicated.
99
00:04:50,040 --> 00:04:52,150
You might have infinitely
many possible
100
00:04:52,150 --> 00:04:54,450
outcomes here and so on.
101
00:04:54,450 --> 00:04:57,960
So setting up the model may be
more elaborate, but the basic
102
00:04:57,960 --> 00:05:01,170
calculation that's done based
on the Bayes rule is
103
00:05:01,170 --> 00:05:04,430
essentially the same as
the one that we saw.
104
00:05:04,430 --> 00:05:07,190
Now something that we discussed
is that sometimes we use
105
00:05:07,190 --> 00:05:11,050
conditional probabilities to
describe models, and let's do
106
00:05:11,050 --> 00:05:14,030
this by looking at a
model where we toss
107
00:05:14,030 --> 00:05:16,150
a coin three times.
108
00:05:16,150 --> 00:05:19,090
And how do we use conditional
probabilities to
109
00:05:19,090 --> 00:05:20,630
describe the situation?
110
00:05:20,630 --> 00:05:22,950
So we have one experiment.
111
00:05:22,950 --> 00:05:26,590
But that one experiment consists
of three consecutive
112
00:05:26,590 --> 00:05:27,880
coin tosses.
113
00:05:27,880 --> 00:05:32,380
So the possible outcomes, our
sample space, consists of
114
00:05:32,380 --> 00:05:37,070
strings of length 3 that tell
us whether we had heads,
115
00:05:37,070 --> 00:05:39,200
tails, and in what sequence.
116
00:05:39,200 --> 00:05:43,110
So three heads in a row is
one particular outcome.
117
00:05:43,110 --> 00:05:46,460
So what is the meaning
of those labels in
118
00:05:46,460 --> 00:05:48,030
front of the branches?
119
00:05:48,030 --> 00:05:51,850
So this P here, of course,
stands for the probability
120
00:05:51,850 --> 00:05:55,640
that the first toss
resulted in heads.
121
00:05:55,640 --> 00:05:59,270
And let me use this notation
to denote that
122
00:05:59,270 --> 00:06:01,170
the first was heads.
123
00:06:01,170 --> 00:06:04,570
I put an H in toss one.
124
00:06:04,570 --> 00:06:08,350
How about the meaning of
this probability here?
125
00:06:08,350 --> 00:06:10,570
Well the meaning of this
probability is
126
00:06:10,570 --> 00:06:11,670
a conditional one.
127
00:06:11,670 --> 00:06:14,170
It's the conditional probability
that the second
128
00:06:14,170 --> 00:06:18,340
toss resulted in heads,
given that the first
129
00:06:18,340 --> 00:06:21,440
one resulted in heads.
130
00:06:21,440 --> 00:06:26,830
And similarly this label here
corresponds to the probability
131
00:06:26,830 --> 00:06:31,550
that the third toss resulted in
heads, given that the first
132
00:06:31,550 --> 00:06:35,010
one and the second one
resulted in heads.
133
00:06:35,010 --> 00:06:39,610
So in this particular model that
I wrote down here, those
134
00:06:39,610 --> 00:06:44,740
probabilities, P, of obtaining
heads remain the same no
135
00:06:44,740 --> 00:06:47,570
matter what happened in
the previous toss.
136
00:06:47,570 --> 00:06:52,020
For example, even if the first
toss was tails, we still have
137
00:06:52,020 --> 00:06:56,920
the same probability, P, that
the second one is heads, given
138
00:06:56,920 --> 00:06:59,100
that the first one was tails.
139
00:06:59,100 --> 00:07:01,820
So we're assuming that no matter
what happened in the
140
00:07:01,820 --> 00:07:05,550
first toss, the second toss will
still have a conditional
141
00:07:05,550 --> 00:07:08,960
probability equal to P. So that
conditional probability
142
00:07:08,960 --> 00:07:12,800
does not depend on what happened
in the first toss.
143
00:07:12,800 --> 00:07:16,040
And we will see that this is a
very special situation, and
144
00:07:16,040 --> 00:07:19,240
that's really the concept of
independence that we are going
145
00:07:19,240 --> 00:07:20,850
to introduce shortly.
146
00:07:20,850 --> 00:07:25,540
But before we get to
independence, let's practice
147
00:07:25,540 --> 00:07:29,060
once more the three skills that
we covered last time in
148
00:07:29,060 --> 00:07:30,490
this example.
149
00:07:30,490 --> 00:07:33,470
So first skill was
multiplication rule.
150
00:07:33,470 --> 00:07:35,660
How do you find the
probability of
151
00:07:35,660 --> 00:07:38,000
several things happening?
152
00:07:38,000 --> 00:07:41,390
That is the probability that
we have tails followed by
153
00:07:41,390 --> 00:07:44,140
heads followed by tails.
154
00:07:44,140 --> 00:07:50,350
So here we're talking about this
particular outcome here,
155
00:07:50,350 --> 00:07:53,130
tails followed by heads
followed by tails.
156
00:07:53,130 --> 00:07:57,070
And the way we calculate such
a probability is by
157
00:07:57,070 --> 00:08:01,480
multiplying conditional
probabilities along the path
158
00:08:01,480 --> 00:08:03,560
that takes us to this outcome.
159
00:08:03,560 --> 00:08:05,160
And so these conditional
probabilities
160
00:08:05,160 --> 00:08:06,220
are recorded here.
161
00:08:06,220 --> 00:08:11,840
So it's going to be (1 minus P)
times P times (1 minus P).
162
00:08:11,840 --> 00:08:14,480
So this is the multiplication
rule.
163
00:08:14,480 --> 00:08:17,990
Second question is how do we
find the probability of a
164
00:08:17,990 --> 00:08:20,510
mildly complicated event?
165
00:08:20,510 --> 00:08:23,520
So the event of interest here
that I wrote down is the
166
00:08:23,520 --> 00:08:25,850
probability that in the
three tosses, we had a
167
00:08:25,850 --> 00:08:28,650
total of one head.
168
00:08:28,650 --> 00:08:30,470
Exactly one head.
169
00:08:30,470 --> 00:08:33,450
This is an event that can
happen in multiple ways.
170
00:08:33,450 --> 00:08:35,940
It happens here.
171
00:08:35,940 --> 00:08:38,200
It happens here.
172
00:08:38,200 --> 00:08:41,380
And it also happens here.
173
00:08:41,380 --> 00:08:44,480
So we want to find the total
probability of the event
174
00:08:44,480 --> 00:08:46,290
consisting of these
three outcomes.
175
00:08:46,290 --> 00:08:47,370
What do we do?
176
00:08:47,370 --> 00:08:51,100
We just add the probabilities
of each individual outcome.
177
00:08:51,100 --> 00:08:53,850
How do we find the probability
of an individual outcome?
178
00:08:53,850 --> 00:08:56,250
Well, that's what we just did.
179
00:08:56,250 --> 00:09:00,300
Now notice that this outcome
has probability P times (1
180
00:09:00,300 --> 00:09:01,550
minus P) squared.
181
00:09:01,550 --> 00:09:04,260
182
00:09:04,260 --> 00:09:07,000
That one should not be there.
183
00:09:07,000 --> 00:09:08,984
So where is it?
184
00:09:08,984 --> 00:09:09,750
Ah.
185
00:09:09,750 --> 00:09:11,000
It's this one.
186
00:09:11,000 --> 00:09:13,830
187
00:09:13,830 --> 00:09:18,610
OK, so the probability of this
outcome is (1 minus P) times P
188
00:09:18,610 --> 00:09:20,970
times (1 minus P), the
same probability.
189
00:09:20,970 --> 00:09:25,470
And finally, this one is again
(1 minus P) squared times P.
190
00:09:25,470 --> 00:09:29,240
So this event of one head can
happen in three ways.
191
00:09:29,240 --> 00:09:32,270
And each one of those three ways
has the same probability
192
00:09:32,270 --> 00:09:33,380
of occurring.
193
00:09:33,380 --> 00:09:36,440
And this is the answer.
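A minimal sketch of the two calculations above, assuming a bias p = 1/3 chosen only for illustration. The function names are hypothetical and not from the lecture. The first function multiplies branch probabilities along one path of the tree; the second adds up the paths that make up the event.

```python
from fractions import Fraction
from itertools import product


def outcome_probability(outcome, p):
    """Multiplication rule: multiply p for each H and (1 - p) for each T."""
    prob = Fraction(1)
    for toss in outcome:
        prob *= p if toss == "H" else 1 - p
    return prob


def probability_of_heads_count(k, p, n=3):
    """Add the probabilities of all length-n outcomes with exactly k heads."""
    return sum(
        outcome_probability(outcome, p)
        for outcome in product("HT", repeat=n)
        if outcome.count("H") == k
    )


p = Fraction(1, 3)  # illustrative bias, an assumption for this sketch
p_tht = outcome_probability("THT", p)          # (1 - p) * p * (1 - p)
p_one_head = probability_of_heads_count(1, p)  # 3 * p * (1 - p)**2
print(p_tht, p_one_head)
```

Using exact fractions rather than floats makes it easy to compare the enumeration against the closed form 3 p (1 - p)^2.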
194
00:09:36,440 --> 00:09:40,110
And finally, the last thing that
we learned how to do is
195
00:09:40,110 --> 00:09:41,980
to use the Bayes rule to
196
00:09:41,980 --> 00:09:44,230
calculate and make an inference.
197
00:09:44,230 --> 00:09:47,045
So somebody tells you that there
was exactly one head in
198
00:09:47,045 --> 00:09:49,350
your three tosses.
199
00:09:49,350 --> 00:09:52,610
What is the probability
that the first
200
00:09:52,610 --> 00:09:55,110
toss resulted in heads?
201
00:09:55,110 --> 00:09:59,980
OK, I guess you can guess the
answer here if I tell you that
202
00:09:59,980 --> 00:10:01,710
there were three tosses.
203
00:10:01,710 --> 00:10:03,590
One of them was heads.
204
00:10:03,590 --> 00:10:05,670
Where was that head? In the first, the
in the first, the
205
00:10:05,670 --> 00:10:07,300
second, or the third?
206
00:10:07,300 --> 00:10:10,410
Well, by symmetry, they should
all be equally likely.
207
00:10:10,410 --> 00:10:13,770
So there should be probability
just 1/3 that that head
208
00:10:13,770 --> 00:10:16,070
occurred in the first toss.
209
00:10:16,070 --> 00:10:19,230
Let's check our intuition
using the definitions.
210
00:10:19,230 --> 00:10:21,280
So the definition of conditional
probability tells
211
00:10:21,280 --> 00:10:26,030
us the conditional probability
is the probability of both
212
00:10:26,030 --> 00:10:27,310
things happening.
213
00:10:27,310 --> 00:10:33,890
First toss is heads, and we have
exactly one head divided
214
00:10:33,890 --> 00:10:36,430
by the probability
of one head.
215
00:10:36,430 --> 00:10:40,720
216
00:10:40,720 --> 00:10:44,860
What is the probability that the
first toss is heads, and
217
00:10:44,860 --> 00:10:47,340
we have exactly one head?
218
00:10:47,340 --> 00:10:51,810
This is the same as the event
heads, tails, tails.
219
00:10:51,810 --> 00:10:54,280
If I tell you that the first is
heads, and there's only one
220
00:10:54,280 --> 00:10:57,030
head, it means that the
others are tails.
221
00:10:57,030 --> 00:11:03,080
So this is the probability of
heads, tails, tails divided by
222
00:11:03,080 --> 00:11:06,080
the probability of one head.
223
00:11:06,080 --> 00:11:08,660
And we know all of these
quantities. Probability of
224
00:11:08,660 --> 00:11:12,080
heads, tails, tails is P times
(1 minus P) squared.
225
00:11:12,080 --> 00:11:14,806
Probability of one
head is 3 times P
226
00:11:14,806 --> 00:11:17,680
times (1 minus P) squared.
227
00:11:17,680 --> 00:11:22,820
So the final answer is 1/3,
which is what you should have
228
00:11:22,820 --> 00:11:27,280
guessed on intuitive
grounds.
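The inference above can be checked by brute-force enumeration. This is only a sketch: the function name is hypothetical, and it assumes 0 < p < 1 so that the event "exactly one head" has positive probability.

```python
from fractions import Fraction
from itertools import product


def posterior_first_toss_heads(p, n=3):
    """P(first toss is H | exactly one head in n tosses), by enumeration."""

    def prob(outcome):
        # Multiply branch probabilities along the path to this outcome.
        result = Fraction(1)
        for toss in outcome:
            result *= p if toss == "H" else 1 - p
        return result

    one_head = [o for o in product("HT", repeat=n) if o.count("H") == 1]
    numerator = sum(prob(o) for o in one_head if o[0] == "H")
    denominator = sum(prob(o) for o in one_head)
    return numerator / denominator


# The biases below are arbitrary illustrative values.
for p in (Fraction(1, 2), Fraction(1, 3), Fraction(9, 10)):
    print(p, posterior_first_toss_heads(p))
```

The factor p (1 - p)^2 cancels between numerator and denominator, which is why the result does not depend on the bias of the coin.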
229
00:11:27,280 --> 00:11:27,740
Very good.
230
00:11:27,740 --> 00:11:31,110
So we got our practice on
the material that we
231
00:11:31,110 --> 00:11:33,040
did cover last time.
232
00:11:33,040 --> 00:11:33,700
Again, think.
233
00:11:33,700 --> 00:11:38,050
There's basically three basic
skills that we are practicing
234
00:11:38,050 --> 00:11:40,210
and exercising here.
235
00:11:40,210 --> 00:11:43,870
In the problems, quizzes, and in
the real life, you may have
236
00:11:43,870 --> 00:11:47,560
to apply those three skills in
somewhat more complicated
237
00:11:47,560 --> 00:11:49,590
settings, but in the
end that's what it
238
00:11:49,590 --> 00:11:51,860
boils down to usually.
239
00:11:51,860 --> 00:11:55,240
Now let's focus on this special
feature of this
240
00:11:55,240 --> 00:11:59,610
particular model that I
discussed a little earlier.
241
00:11:59,610 --> 00:12:03,010
Think of the event heads
in the second toss.
242
00:12:03,010 --> 00:12:05,690
243
00:12:05,690 --> 00:12:09,750
Initially, the probability of
heads in the second toss, you
244
00:12:09,750 --> 00:12:12,460
know, that it's P, the
probability of
245
00:12:12,460 --> 00:12:14,290
success of your coin.
246
00:12:14,290 --> 00:12:19,100
If I tell you that the first
toss resulted in heads, what's
247
00:12:19,100 --> 00:12:21,240
the probability that the
second toss is heads?
248
00:12:21,240 --> 00:12:24,870
It's again P. If I tell you that
the first toss was tails,
249
00:12:24,870 --> 00:12:27,510
what's the probability that
the second toss is heads?
250
00:12:27,510 --> 00:12:33,290
It's again P. So whether I tell
you the result of the
251
00:12:33,290 --> 00:12:37,280
first toss, or I don't tell
you, it doesn't make any
252
00:12:37,280 --> 00:12:38,490
difference to you.
253
00:12:38,490 --> 00:12:40,690
You would always say the
probability of heads in the
254
00:12:40,690 --> 00:12:44,970
second toss is going to be P, no
matter what happened in the
255
00:12:44,970 --> 00:12:46,400
first toss.
256
00:12:46,400 --> 00:12:49,550
This is a special situation to
which we're going to give a
257
00:12:49,550 --> 00:12:53,540
name, and we're going to call
that property independence.
258
00:12:53,540 --> 00:12:58,520
Basically independence between
two things stands for the fact
259
00:12:58,520 --> 00:13:02,690
that the first thing, whether
it occurred or not, doesn't
260
00:13:02,690 --> 00:13:05,630
give you any information, does
not cause you to change your
261
00:13:05,630 --> 00:13:08,980
beliefs about the
second event.
262
00:13:08,980 --> 00:13:11,600
This is the intuition.
263
00:13:11,600 --> 00:13:16,130
Let's try to translate this
into mathematics.
264
00:13:16,130 --> 00:13:19,510
We have two events, and we're
going to say that they're
265
00:13:19,510 --> 00:13:26,010
independent if your initial
beliefs about B are not going
266
00:13:26,010 --> 00:13:30,140
to change if I tell you
that A occurred.
267
00:13:30,140 --> 00:13:34,700
So you believe something about
how likely B is.
268
00:13:34,700 --> 00:13:37,640
Then somebody comes and tells
you, you know, A has happened.
269
00:13:37,640 --> 00:13:39,790
Are you going to change
your beliefs?
270
00:13:39,790 --> 00:13:42,200
No, I'm not going
to change them.
271
00:13:42,200 --> 00:13:45,020
Whenever you are in such a
situation, then you say that
272
00:13:45,020 --> 00:13:47,040
the two events are
independent.
273
00:13:47,040 --> 00:13:51,470
Intuitively, the fact that A
occurred does not convey any
274
00:13:51,470 --> 00:13:55,720
information to you about the
likelihood of event B. The
275
00:13:55,720 --> 00:13:58,480
information that A provides
is not so
276
00:13:58,480 --> 00:14:00,780
useful, is not relevant.
277
00:14:00,780 --> 00:14:03,010
A has to do with
something else.
278
00:14:03,010 --> 00:14:06,250
It's not useful for your
guessing whether B is going to
279
00:14:06,250 --> 00:14:07,780
occur or not.
280
00:14:07,780 --> 00:14:13,650
So we can take this as a first
attempt at a definition of
281
00:14:13,650 --> 00:14:15,870
independence.
282
00:14:15,870 --> 00:14:23,130
Now remember that we have this
property, the probability of
283
00:14:23,130 --> 00:14:25,690
two things happening is the
probability of the first times
284
00:14:25,690 --> 00:14:27,920
the conditional probability
of the second.
285
00:14:27,920 --> 00:14:31,390
If we have independence, this
conditional probability is the
286
00:14:31,390 --> 00:14:33,840
same as the unconditional
probability.
287
00:14:33,840 --> 00:14:38,040
So if we have independence
according to that definition,
288
00:14:38,040 --> 00:14:41,190
we get this property that you
can find the probability of
289
00:14:41,190 --> 00:14:44,440
two things happening by just
multiplying their individual
290
00:14:44,440 --> 00:14:45,640
probabilities.
291
00:14:45,640 --> 00:14:48,070
Probability of heads in
the first toss is 1/2.
292
00:14:48,070 --> 00:14:50,900
Probability of heads in the
second toss is 1/2.
293
00:14:50,900 --> 00:14:54,200
Probability of heads
heads is 1/4.
294
00:14:54,200 --> 00:14:57,590
That's what happens if your two
tosses are independent of
295
00:14:57,590 --> 00:14:58,730
each other.
296
00:14:58,730 --> 00:15:03,110
So this property here is
a consequence of this
297
00:15:03,110 --> 00:15:08,470
definition, but it's actually
nicer, better, simpler,
298
00:15:08,470 --> 00:15:12,880
cleaner, more beautiful to take
this as our definition
299
00:15:12,880 --> 00:15:14,380
instead of that one.
300
00:15:14,380 --> 00:15:17,180
Are the two definitions
equivalent?
301
00:15:17,180 --> 00:15:21,040
Well, they are almost the
same, except for one thing.
302
00:15:21,040 --> 00:15:24,250
Conditional probabilities are
only defined if you condition
303
00:15:24,250 --> 00:15:26,900
on an event that has positive
probability.
304
00:15:26,900 --> 00:15:31,090
So this definition would be
limited to cases where event A
305
00:15:31,090 --> 00:15:34,080
has positive probability,
whereas this definition is
306
00:15:34,080 --> 00:15:38,140
something that you can
write down always.
307
00:15:38,140 --> 00:15:43,280
We will say that two events are
independent if and only if
308
00:15:43,280 --> 00:15:46,940
their probability of happening
simultaneously is equal to the
309
00:15:46,940 --> 00:15:50,470
product of their two individual
probabilities.
310
00:15:50,470 --> 00:15:54,690
And in particular, we can have
events of zero probability.
311
00:15:54,690 --> 00:15:56,220
There's nothing wrong
with that.
312
00:15:56,220 --> 00:16:01,450
If A has 0 probability, then A
intersection B will also have
313
00:16:01,450 --> 00:16:04,990
zero probability, because it's
an even smaller event.
314
00:16:04,990 --> 00:16:09,200
And so we're going to get
zero is equal to zero.
315
00:16:09,200 --> 00:16:13,920
A corollary of what I just said,
if an event A has zero
316
00:16:13,920 --> 00:16:17,700
probability, it's actually
independent of any other event
317
00:16:17,700 --> 00:16:20,220
in our model, because
we're going to get
318
00:16:20,220 --> 00:16:21,810
zero is equal to zero.
319
00:16:21,810 --> 00:16:24,140
And the definition is going
to be satisfied.
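Written out, the definition and the zero-probability case look as follows. The only fact used beyond the definition is monotonicity: a subset cannot have larger probability than the set that contains it.

```latex
\begin{align*}
&\text{Definition:} && P(A \cap B) = P(A)\, P(B) \\
&\text{If } P(A) = 0: && A \cap B \subseteq A
  \;\Longrightarrow\; 0 \le P(A \cap B) \le P(A) = 0 \\
& && \text{so } P(A \cap B) = 0 = 0 \cdot P(B) = P(A)\, P(B)
\end{align*}
```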
320
00:16:24,140 --> 00:16:27,560
This is a little bit harder to
reconcile with the intuition
321
00:16:27,560 --> 00:16:32,800
we have about independence, but
then again, it's part of
322
00:16:32,800 --> 00:16:35,610
the mathematical definition.
323
00:16:35,610 --> 00:16:40,450
So what I want you to retain
is this notion that the
324
00:16:40,450 --> 00:16:46,300
independence is something that
you can check formally using
325
00:16:46,300 --> 00:16:50,420
this definition, but also you
can check intuitively if,
326
00:16:50,420 --> 00:16:54,280
in some cases, you can reason
that whatever happens and
327
00:16:54,280 --> 00:16:58,310
determines whether A is going
to occur or not, has absolutely
328
00:16:58,310 --> 00:17:01,850
nothing to do with whatever
happens and determines whether
329
00:17:01,850 --> 00:17:04,369
B is going to occur or not.
330
00:17:04,369 --> 00:17:08,440
So if I'm doing a science
experiment in this room, and
331
00:17:08,440 --> 00:17:12,569
it gets hit by some noise that
causes randomness.
332
00:17:12,569 --> 00:17:16,040
And then five years later,
somebody somewhere else does
333
00:17:16,040 --> 00:17:19,069
the same science experiment
somewhere else, it gets hit by
334
00:17:19,069 --> 00:17:23,230
other noise, you would usually
say that these experiments are
335
00:17:23,230 --> 00:17:23,940
independent.
336
00:17:23,940 --> 00:17:30,230
So what events happen in one
experiment are not going to
337
00:17:30,230 --> 00:17:33,290
change your beliefs about what
might be happening in the
338
00:17:33,290 --> 00:17:36,610
other, because the sources of
noise in these two experiments
339
00:17:36,610 --> 00:17:38,350
are completely unrelated.
340
00:17:38,350 --> 00:17:40,110
They have nothing to
do with each other.
341
00:17:40,110 --> 00:17:43,470
So if I flip a coin here today,
and I flip a coin in my
342
00:17:43,470 --> 00:17:47,890
office tomorrow, one shouldn't
affect the other.
343
00:17:47,890 --> 00:17:52,690
So the events that I get from
these should be independent.
344
00:17:52,690 --> 00:17:55,700
So that's usually how
independence arises.
345
00:17:55,700 --> 00:17:57,580
By having distinct physical
346
00:17:57,580 --> 00:17:59,940
phenomena that do not interact.
347
00:17:59,940 --> 00:18:03,690
Sometimes you also get
independence even though there
348
00:18:03,690 --> 00:18:06,590
is a physical interaction, but
you just happen to have a
349
00:18:06,590 --> 00:18:08,930
numerical accident.
350
00:18:08,930 --> 00:18:13,340
A and B might be physically
related very tightly, but a
351
00:18:13,340 --> 00:18:16,820
numerical accident happens and
you get equality here, that's
352
00:18:16,820 --> 00:18:20,070
another case where we
do get independence.
353
00:18:20,070 --> 00:18:24,350
Now suppose that we have
two events that are
354
00:18:24,350 --> 00:18:27,380
laid out like this.
355
00:18:27,380 --> 00:18:30,240
Are these two events
independent or not?
356
00:18:30,240 --> 00:18:34,570
357
00:18:34,570 --> 00:18:36,620
The picture kind of tells
you that one is
358
00:18:36,620 --> 00:18:38,140
separate from the other.
359
00:18:38,140 --> 00:18:41,170
But separate has nothing
to do with independent.
360
00:18:41,170 --> 00:18:45,340
In fact, these two events are as
dependent as Siamese twins.
361
00:18:45,340 --> 00:18:46,480
Why is that?
362
00:18:46,480 --> 00:18:51,560
If I tell you that A occurred,
then you are certain that B
363
00:18:51,560 --> 00:18:53,060
did not occur.
364
00:18:53,060 --> 00:18:57,780
So information about the
occurrence of A definitely
365
00:18:57,780 --> 00:19:01,090
affects your beliefs about the
possible occurrence or
366
00:19:01,090 --> 00:19:05,490
non-occurrence of B. When the
picture is like that, knowing
367
00:19:05,490 --> 00:19:09,480
that A occurred will change
drastically my beliefs about
368
00:19:09,480 --> 00:19:13,030
B, because now I suddenly
become certain
369
00:19:13,030 --> 00:19:14,870
that B did not occur.
370
00:19:14,870 --> 00:19:18,260
So a picture like this is a
case actually of extreme
371
00:19:18,260 --> 00:19:19,360
dependence.
372
00:19:19,360 --> 00:19:23,440
So don't confuse independence
with disjointness.
373
00:19:23,440 --> 00:19:26,406
They're very different
types of properties.
374
00:19:26,406 --> 00:19:27,080
AUDIENCE: Question.
375
00:19:27,080 --> 00:19:27,520
PROFESSOR: Yes?
376
00:19:27,520 --> 00:19:29,400
AUDIENCE: So I understand
the explanation, but the
377
00:19:29,400 --> 00:19:31,954
probability of A intersect B
[INAUDIBLE] to zero, because
378
00:19:31,954 --> 00:19:32,910
they're disjoint.
379
00:19:32,910 --> 00:19:33,388
PROFESSOR: Yes.
380
00:19:33,388 --> 00:19:35,539
AUDIENCE: But then the product
of probability A and
381
00:19:35,539 --> 00:19:37,690
probability B, one of them
is going to be 1.
382
00:19:37,690 --> 00:19:39,602
[INAUDIBLE]
383
00:19:39,602 --> 00:19:42,690
PROFESSOR: No, suppose that
the probabilities are 1/3,
384
00:19:42,690 --> 00:19:46,610
1/4, and the rest
is out there.
385
00:19:46,610 --> 00:19:48,560
You check the definition
of independence.
386
00:19:48,560 --> 00:19:52,440
Probability of A intersection
B is zero.
387
00:19:52,440 --> 00:19:58,520
Probability of A times the
probability of B is 1/12.
388
00:19:58,520 --> 00:20:00,630
The two are not equal.
389
00:20:00,630 --> 00:20:02,710
Therefore we do not
have independence.
390
00:20:02,710 --> 00:20:03,199
AUDIENCE: Right.
391
00:20:03,199 --> 00:20:05,644
So what's wrong with the
intuition of the probability
392
00:20:05,644 --> 00:20:09,556
of A being 1, and the
other one being 0?
393
00:20:09,556 --> 00:20:12,490
[INAUDIBLE].
394
00:20:12,490 --> 00:20:12,610
PROFESSOR: No.
395
00:20:12,610 --> 00:20:19,340
The probability of A given
B is equal to 0.
396
00:20:19,340 --> 00:20:23,870
Probability of A is
equal to 1/3.
397
00:20:23,870 --> 00:20:26,650
So again, these two
are different.
398
00:20:26,650 --> 00:20:30,210
So we had some initial beliefs
about A, but as soon as we are
399
00:20:30,210 --> 00:20:34,440
told that B occurred, our
beliefs about A changed.
400
00:20:34,440 --> 00:20:37,770
And so since our beliefs
changed, that means that B
401
00:20:37,770 --> 00:20:40,666
conveys information about A.
402
00:20:40,666 --> 00:20:42,931
AUDIENCE: So can you not draw
independent [INAUDIBLE] on a
403
00:20:42,931 --> 00:20:43,390
Venn diagram?
404
00:20:43,390 --> 00:20:44,430
PROFESSOR: I can't hear you.
405
00:20:44,430 --> 00:20:45,352
AUDIENCE: Can you draw
406
00:20:45,352 --> 00:20:46,735
independence on a Venn diagram?
407
00:20:46,735 --> 00:20:51,320
PROFESSOR: No, the Venn diagram
is never enough to
408
00:20:51,320 --> 00:20:53,400
decide independence.
409
00:20:53,400 --> 00:20:56,350
So the typical picture in which
you're going to have
410
00:20:56,350 --> 00:21:00,120
independence would be one event
this way, and another
411
00:21:00,120 --> 00:21:01,760
event this way.
412
00:21:01,760 --> 00:21:03,800
You need to take the probability
of this times the
413
00:21:03,800 --> 00:21:07,795
probability of that, and check
that, numerically, it's equal
414
00:21:07,795 --> 00:21:11,350
to the probability of
this intersection.
415
00:21:11,350 --> 00:21:14,330
So it's more than
a Venn diagram.
416
00:21:14,330 --> 00:21:16,138
Numbers need to come
out right.
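A small numerical check of both points, as a sketch under stated assumptions: the sample space of 12 equally likely outcomes and the particular sets A, B, and C are made up for illustration. They are chosen so that P(A) = 1/3 and P(B) = 1/4 as in the answer above, with C overlapping A.

```python
from fractions import Fraction

# Hypothetical sample space: 12 equally likely outcomes labeled 0..11.
omega = set(range(12))
A = {0, 1, 2, 3}         # P(A) = 4/12 = 1/3
B = {4, 5, 6}            # P(B) = 3/12 = 1/4, disjoint from A
C = {0, 1, 4, 5, 8, 9}   # P(C) = 6/12 = 1/2, overlaps A in {0, 1}


def prob(event):
    """Probability of an event under the uniform law on omega."""
    return Fraction(len(event), len(omega))


def independent(x, y):
    """Check the definition: P(x and y) == P(x) * P(y)."""
    return prob(x & y) == prob(x) * prob(y)


# Disjoint events with positive probabilities: 0 on one side, 1/12 on the other.
print(prob(A & B), prob(A) * prob(B), independent(A, B))
# Overlapping events where the numbers happen to match: 1/6 on both sides.
print(prob(A & C), prob(A) * prob(C), independent(A, C))
```

Whether two overlapping regions are independent depends on the numbers assigned to them, not on the shape of the picture.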
417
00:21:16,138 --> 00:21:19,730
418
00:21:19,730 --> 00:21:23,570
Now we did say some time ago
that conditional probabilities
419
00:21:23,570 --> 00:21:27,680
are just like ordinary
probabilities, and whatever we
420
00:21:27,680 --> 00:21:31,870
do in probability theory
can also be done
421
00:21:31,870 --> 00:21:34,350
in conditional universes.
422
00:21:34,350 --> 00:21:37,680
Talking about conditional
probabilities.
423
00:21:37,680 --> 00:21:42,870
So since we have a notion of
independence, then there
424
00:21:42,870 --> 00:21:47,470
should be also a notion of
conditional independence.
425
00:21:47,470 --> 00:21:55,070
So independence was defined
by the probability that A
426
00:21:55,070 --> 00:21:59,070
intersection B is equal to the
probability of A times the
427
00:21:59,070 --> 00:22:01,920
probability of B.
428
00:22:01,920 --> 00:22:05,670
What would be a reasonable
definition of conditional
429
00:22:05,670 --> 00:22:06,840
independence?
430
00:22:06,840 --> 00:22:09,355
Conditional independence would
mean that this same property
431
00:22:09,355 --> 00:22:13,210
could be true, but in a
conditional universe where we
432
00:22:13,210 --> 00:22:15,660
are told that a certain
event has happened.
433
00:22:15,660 --> 00:22:19,060
So if we're told that the event
C has happened, then
434
00:22:19,060 --> 00:22:22,240
we're transported to a
conditional universe where the
435
00:22:22,240 --> 00:22:26,460
only things that matter are
conditional probabilities.
436
00:22:26,460 --> 00:22:31,320
And this is just the same plain,
previous definition of
437
00:22:31,320 --> 00:22:35,190
independence, but applied in
a conditional universe.
438
00:22:35,190 --> 00:22:40,020
So this is the definition of
conditional independence.
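The formula being pointed to is not captured in the transcript. In standard notation, and assuming P(C) > 0, the definition described here would read:

```latex
% Conditional independence of A and B given C (assumes P(C) > 0):
% the ordinary product rule, with every probability conditioned on C.
\[
  P(A \cap B \mid C) \;=\; P(A \mid C)\, P(B \mid C)
\]
```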
439
00:22:40,020 --> 00:22:43,390
440
00:22:43,390 --> 00:22:46,940
So it's independence, but with
reference to the conditional
441
00:22:46,940 --> 00:22:48,830
probabilities.
442
00:22:48,830 --> 00:22:51,830
And intuitively it has, again,
the same meaning, that in the
443
00:22:51,830 --> 00:22:56,410
conditional world, if I tell you
that A occurred, then that
444
00:22:56,410 --> 00:22:58,940
doesn't change your
beliefs about B.
445
00:22:58,940 --> 00:23:01,100
So suppose you had a
picture like this.
446
00:23:01,100 --> 00:23:06,880
And somebody told you that
events A and B are independent
447
00:23:06,880 --> 00:23:09,630
unconditionally.
448
00:23:09,630 --> 00:23:14,320
Then somebody comes and tells
you that event C actually has
449
00:23:14,320 --> 00:23:18,150
occurred, so we now live
in this new universe.
450
00:23:18,150 --> 00:23:22,450
In this new universe, is the
independence of A and B going
451
00:23:22,450 --> 00:23:25,180
to be preserved or not?
452
00:23:25,180 --> 00:23:29,300
Are A and B independent
in this new universe?
453
00:23:29,300 --> 00:23:34,780
The answer is no, because in the
new universe, whatever is
454
00:23:34,780 --> 00:23:36,790
left of event A is this piece.
455
00:23:36,790 --> 00:23:39,630
Whatever is left of event
B is this piece.
456
00:23:39,630 --> 00:23:42,310
And these two pieces
are disjoint.
457
00:23:42,310 --> 00:23:45,490
So we are back in a situation
of this kind.
458
00:23:45,490 --> 00:23:46,450
So in the conditional
459
00:23:46,450 --> 00:23:49,620
universe, A and B are disjoint.
460
00:23:49,620 --> 00:23:53,380
And therefore, generically,
they're not going to be
461
00:23:53,380 --> 00:23:54,730
independent.
462
00:23:54,730 --> 00:23:58,030
What's the moral of
this example?
463
00:23:58,030 --> 00:24:01,870
Having independence in the
original model does not imply
464
00:24:01,870 --> 00:24:05,930
independence in a conditional
model.
465
00:24:05,930 --> 00:24:08,490
The opposite is also possible.
466
00:24:08,490 --> 00:24:12,160
And let's illustrate
by another example.
467
00:24:12,160 --> 00:24:17,960
So I have two coins, and both
of them are badly biased.
468
00:24:17,960 --> 00:24:21,680
One coin is much biased
in favor of heads.
469
00:24:21,680 --> 00:24:25,320
The other coin is much biased
in favor of tails.
470
00:24:25,320 --> 00:24:28,050
So the probabilities
are 90% in each case.
471
00:24:28,050 --> 00:24:33,050
Let's consider independent flips
of coin A. This is the
472
00:24:33,050 --> 00:24:34,980
relevant model.
473
00:24:34,980 --> 00:24:39,600
This is a model of two
independent flips
474
00:24:39,600 --> 00:24:41,240
of the first coin.
475
00:24:41,240 --> 00:24:43,850
There's going to be two flips,
and each one has probability
476
00:24:43,850 --> 00:24:46,080
0.9 of being heads.
477
00:24:46,080 --> 00:24:49,330
So that's a model that describes
coin A. You can
478
00:24:49,330 --> 00:24:52,540
think of this as a conditional
model which is a model of the
479
00:24:52,540 --> 00:24:55,940
coin flips conditioned on the
fact that we have chosen
480
00:24:55,940 --> 00:24:57,460
coin A.
481
00:24:57,460 --> 00:25:01,460
Alternatively, we could be
dealing with coin B. In a
482
00:25:01,460 --> 00:25:05,260
conditional world where we
chose coin B and flip it
483
00:25:05,260 --> 00:25:08,130
twice, this is the
relevant model.
484
00:25:08,130 --> 00:25:10,660
The probability of two heads,
for example, is the
485
00:25:10,660 --> 00:25:13,280
probability of heads the first
time, heads the second time,
486
00:25:13,280 --> 00:25:16,070
and each one is 0.1.
487
00:25:16,070 --> 00:25:19,960
Now I'm building this into a
bigger experiment in which I
488
00:25:19,960 --> 00:25:25,160
first start by choosing one of
the two coins at random.
489
00:25:25,160 --> 00:25:26,620
So I have these two coins.
490
00:25:26,620 --> 00:25:28,610
I blindly pick one of them.
491
00:25:28,610 --> 00:25:32,610
And then I start
flipping it.
492
00:25:32,610 --> 00:25:36,620
So the question now is, are the
coin flips, or the coin
493
00:25:36,620 --> 00:25:39,730
tosses, are they independent
of each other?
494
00:25:39,730 --> 00:25:46,370
If we just stay inside this
sub-model here, are the coin
495
00:25:46,370 --> 00:25:47,620
flips independent?
496
00:25:47,620 --> 00:25:52,240
497
00:25:52,240 --> 00:25:56,540
They are independent, because
the probability of heads in
498
00:25:56,540 --> 00:26:01,780
the second toss is the same,
0.9, no matter what happened
499
00:26:01,780 --> 00:26:03,550
in the first toss.
500
00:26:03,550 --> 00:26:06,550
So the conditional probabilities
of what happens
501
00:26:06,550 --> 00:26:10,050
in the second toss are not
affected by the outcome of the
502
00:26:10,050 --> 00:26:11,180
first toss.
503
00:26:11,180 --> 00:26:14,620
So the second toss and the first
toss are independent.
504
00:26:14,620 --> 00:26:17,800
So here we're just dealing
with plain,
505
00:26:17,800 --> 00:26:19,990
independent coin flips.
506
00:26:19,990 --> 00:26:24,940
Similarly, the coin flips within
this sub-model are also
507
00:26:24,940 --> 00:26:26,190
independent.
508
00:26:26,190 --> 00:26:28,840
509
00:26:28,840 --> 00:26:33,410
Now the question is, if we look
at the big model as just
510
00:26:33,410 --> 00:26:38,955
one probability model, instead
of looking at the conditional
511
00:26:38,955 --> 00:26:44,530
sub-models, are the coin flips
independent of each other?
512
00:26:44,530 --> 00:26:49,590
Does the outcome of a few coin
flips give you information
513
00:26:49,590 --> 00:26:53,610
about subsequent coin flips?
514
00:26:53,610 --> 00:27:02,570
Well if I observe ten
heads in a row--
515
00:27:02,570 --> 00:27:05,960
So instead of two coin flips,
now let's think of doing more
516
00:27:05,960 --> 00:27:10,070
of them so that the tree
gets expanded.
517
00:27:10,070 --> 00:27:13,800
So let's start with this.
518
00:27:13,800 --> 00:27:16,020
I don't know which coin it is.
519
00:27:16,020 --> 00:27:18,970
What's the probability that
the 11th coin toss
520
00:27:18,970 --> 00:27:20,220
is going to be heads?
521
00:27:20,220 --> 00:27:25,570
522
00:27:25,570 --> 00:27:29,370
There's complete symmetry here,
so the answer could not
523
00:27:29,370 --> 00:27:32,330
be anything other than 1/2.
524
00:27:32,330 --> 00:27:36,950
So let's justify it,
why is it 1/2?
525
00:27:36,950 --> 00:27:40,560
Well, the probability that the
11th toss is heads, how can
526
00:27:40,560 --> 00:27:42,380
that outcome happen?
527
00:27:42,380 --> 00:27:43,840
It can happen in two ways.
528
00:27:43,840 --> 00:27:50,480
You can choose coin A, which
happens with probability 1/2.
529
00:27:50,480 --> 00:27:54,370
And having chosen coin A,
there's probability 0.9 that
530
00:27:54,370 --> 00:27:58,500
you get
heads in the 11th toss.
531
00:27:58,500 --> 00:28:03,540
Or you can choose coin B. And
if it's coin B when you flip
532
00:28:03,540 --> 00:28:06,710
it, there's probability 0.1
that you have heads.
533
00:28:06,710 --> 00:28:08,860
So the final answer is 1/2.
534
00:28:08,860 --> 00:28:11,370
535
00:28:11,370 --> 00:28:14,820
So each one of the coins is
biased, but they're biased in
536
00:28:14,820 --> 00:28:16,190
different ways.
537
00:28:16,190 --> 00:28:20,340
If I don't know which coin it
is, their two biases kind of
538
00:28:20,340 --> 00:28:23,740
cancel out, and the probability
of obtaining heads
539
00:28:23,740 --> 00:28:27,880
is just in the middle,
so it's 1/2.
540
00:28:27,880 --> 00:28:31,720
Now if someone tells you that
the first ten tosses were
541
00:28:31,720 --> 00:28:34,940
heads, is that going to
change your beliefs
542
00:28:34,940 --> 00:28:37,300
about the 11th toss?
543
00:28:37,300 --> 00:28:41,820
Here's how a reasonable person
would think about it.
544
00:28:41,820 --> 00:28:49,480
If it's coin B, the probability
of obtaining 10 heads in a row
545
00:28:49,480 --> 00:28:51,510
is negligible.
546
00:28:51,510 --> 00:28:55,270
It's going to be 0.1
to the 10th.
547
00:28:55,270 --> 00:28:59,110
If it's coin A, the probability
of 10 heads in a
548
00:28:59,110 --> 00:29:01,380
row is a more reasonable
number.
549
00:29:01,380 --> 00:29:03,850
It's 0.9 to the 10th.
550
00:29:03,850 --> 00:29:10,320
So this event is a lot more
likely to occur with coin A,
551
00:29:10,320 --> 00:29:13,910
rather than coin B.
552
00:29:13,910 --> 00:29:18,820
The plausible explanation of
having seen ten heads in a row
553
00:29:18,820 --> 00:29:25,730
is that I actually chose coin A.
When you see ten heads in a
554
00:29:25,730 --> 00:29:29,690
row, you are pretty certain that
it's coin A that we're
555
00:29:29,690 --> 00:29:30,940
dealing with.
556
00:29:30,940 --> 00:29:33,800
And once you're pretty certain
that it's coin A that we're
557
00:29:33,800 --> 00:29:36,350
dealing with, what's the
probability that the
558
00:29:36,350 --> 00:29:38,246
next toss is heads?
559
00:29:38,246 --> 00:29:40,960
It's going to be 0.9.
560
00:29:40,960 --> 00:29:45,270
So essentially here I'm doing
an inference calculation.
561
00:29:45,270 --> 00:29:48,990
Given this information, I'm
making an inference about
562
00:29:48,990 --> 00:29:50,700
which coin I'm dealing with.
563
00:29:50,700 --> 00:29:53,540
564
00:29:53,540 --> 00:29:57,240
I become pretty certain that
it's coin A, and given that
565
00:29:57,240 --> 00:30:00,640
it's coin A, this probability
is going to be 0.9.
566
00:30:00,640 --> 00:30:04,070
And I'm putting an approximate
sign here, because the
567
00:30:04,070 --> 00:30:06,220
inference that I did
is approximate.
568
00:30:06,220 --> 00:30:09,850
I'm pretty certain it's coin A.
I'm not 100% certain that
569
00:30:09,850 --> 00:30:11,200
it's coin A.
570
00:30:11,200 --> 00:30:15,430
But in any case what happens
here is that the unconditional
571
00:30:15,430 --> 00:30:19,440
probability is different from
the conditional probability.
572
00:30:19,440 --> 00:30:23,710
This information here makes
me change my beliefs
573
00:30:23,710 --> 00:30:25,590
about the 11th toss.
574
00:30:25,590 --> 00:30:30,560
And this means that the 11th
toss is dependent on the
575
00:30:30,560 --> 00:30:31,530
previous tosses.
576
00:30:31,530 --> 00:30:35,590
So the coin tosses have
now become dependent.
577
00:30:35,590 --> 00:30:38,790
What is the physical link that
causes this dependence?
578
00:30:38,790 --> 00:30:42,710
Well, the physical link is
the choice of the coin.
579
00:30:42,710 --> 00:30:46,580
By choosing a particular coin,
I'm introducing a pattern in
580
00:30:46,580 --> 00:30:48,200
the future coin tosses.
581
00:30:48,200 --> 00:30:52,740
And that pattern is what
causes dependence.
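A minimal sketch of the inference above, done exactly rather than approximately. It assumes the setup as stated in the lecture (each coin picked with probability 1/2, heads probabilities 0.9 and 0.1, flips independent once the coin is fixed); the function and variable names are illustrative, not from the lecture.

```python
from fractions import Fraction

# Assumed model from the lecture: pick coin A or B with probability 1/2 each,
# then flip the chosen coin repeatedly; flips are independent given the coin.
P_COIN = {"A": Fraction(1, 2), "B": Fraction(1, 2)}
P_HEADS = {"A": Fraction(9, 10), "B": Fraction(1, 10)}


def prob_all_heads(n):
    """P(first n tosses are all heads), by the total probability theorem."""
    return sum(P_COIN[c] * P_HEADS[c] ** n for c in P_COIN)


def prob_next_heads_given_heads(n):
    """P(toss n+1 is heads | first n tosses were all heads)."""
    return prob_all_heads(n + 1) / prob_all_heads(n)


# Unconditional probability of heads on any single toss (the two biases cancel).
unconditional = prob_all_heads(1)

# Probability of heads on the 11th toss after seeing ten heads in a row.
conditional = prob_next_heads_given_heads(10)

# Posterior probability that we are holding coin A after ten heads (Bayes' rule).
posterior_a = P_COIN["A"] * P_HEADS["A"] ** 10 / prob_all_heads(10)

print(unconditional, float(conditional), float(posterior_a))
```

The conditional value comes out just below 0.9, which is why the lecture writes an approximate sign: the posterior on coin A is very close to, but not exactly, 1.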
582
00:30:52,740 --> 00:30:55,670
OK, so I've been playing a
little bit too loose with the
583
00:30:55,670 --> 00:30:59,810
language here, because we
defined the concept of
584
00:30:59,810 --> 00:31:01,810
independence of two events.
585
00:31:01,810 --> 00:31:06,180
But here I have been referring
to independent coin tosses,
586
00:31:06,180 --> 00:31:08,380
where I'm thinking about
many coin tosses,
587
00:31:08,380 --> 00:31:11,200
like 10 or 11 of them.
588
00:31:11,200 --> 00:31:15,160
So to be proper, I should have
defined for you also the
589
00:31:15,160 --> 00:31:18,710
notion of independence of
multiple events, not just two.
590
00:31:18,710 --> 00:31:21,970
We don't want to just say coin
toss one is independent from
591
00:31:21,970 --> 00:31:23,170
coin toss two.
592
00:31:23,170 --> 00:31:26,250
We want to be able to say
something like, these 10
593
00:31:26,250 --> 00:31:29,690
coin tosses are all independent
of each other.
594
00:31:29,690 --> 00:31:33,800
Intuitively what that means
should be the same thing--
595
00:31:33,800 --> 00:31:37,450
that information about some of
the coin tosses doesn't change
596
00:31:37,450 --> 00:31:40,220
your beliefs about the remaining
coin tosses.
597
00:31:40,220 --> 00:31:43,580
How do we translate that into
a mathematical definition?
598
00:31:43,580 --> 00:31:48,600
Well, an ugly attempt
would be to impose
599
00:31:48,600 --> 00:31:51,800
requirements such as this.
600
00:31:51,800 --> 00:31:56,780
Think of A1 being the event that
the first flip was heads.
601
00:31:56,780 --> 00:32:00,980
A2 is the event that the
second flip was heads.
602
00:32:00,980 --> 00:32:04,320
A3, the third flip, was
heads, and so on.
603
00:32:04,320 --> 00:32:08,310
Here is an event whose
occurrence or not is determined
604
00:32:08,310 --> 00:32:10,860
by the first three coin flips.
605
00:32:10,860 --> 00:32:13,400
And here's an event whose
occurrence or not is
606
00:32:13,400 --> 00:32:16,680
determined by the fifth
and sixth coin flip.
607
00:32:16,680 --> 00:32:19,080
If we think physically that
all those coin flips have
608
00:32:19,080 --> 00:32:22,220
nothing to do with each other,
information about the fifth
609
00:32:22,220 --> 00:32:26,420
and sixth coin flip are not
going to change what we expect
610
00:32:26,420 --> 00:32:27,960
from the first three.
611
00:32:27,960 --> 00:32:30,780
So the probability of this
event, the conditional
612
00:32:30,780 --> 00:32:33,050
probability, should be the
same as the unconditional
613
00:32:33,050 --> 00:32:34,430
probability.
614
00:32:34,430 --> 00:32:38,850
And we would like a relation
of this kind to be true, no
615
00:32:38,850 --> 00:32:43,480
matter what kind of formula you
write down, as long as the
616
00:32:43,480 --> 00:32:47,230
events that show up here are
different from the events that
617
00:32:47,230 --> 00:32:49,250
show up there.
618
00:32:49,250 --> 00:32:49,770
OK.
619
00:32:49,770 --> 00:32:52,150
That's sort of an
ugly definition.
620
00:32:52,150 --> 00:32:55,350
The mathematical definition that
actually does the job,
621
00:32:55,350 --> 00:32:59,530
and leads to all the
formulas of this
622
00:32:59,530 --> 00:33:01,130
kind, is the following.
623
00:33:01,130 --> 00:33:03,610
We're going to say that the
collection of events are
624
00:33:03,610 --> 00:33:07,090
independent if we can find the
probability of their joint
625
00:33:07,090 --> 00:33:11,780
occurrence by just multiplying
probabilities.
626
00:33:11,780 --> 00:33:17,380
And that will be true even if
you look at sub-collections of
627
00:33:17,380 --> 00:33:18,640
these events.
628
00:33:18,640 --> 00:33:20,670
Let's make that more precise.
629
00:33:20,670 --> 00:33:24,310
If we have three events, the
definition tells us that the
630
00:33:24,310 --> 00:33:27,560
three events are independent
if the following are true.
631
00:33:27,560 --> 00:33:31,830
Probability A1 and A2 and A3,
you can calculate this
632
00:33:31,830 --> 00:33:34,840
probability by multiplying
individual probabilities.
633
00:33:34,840 --> 00:33:38,370
634
00:33:38,370 --> 00:33:44,320
But the same is true even if
you take fewer events.
635
00:33:44,320 --> 00:33:46,740
Just a few indices out
of the indices
636
00:33:46,740 --> 00:33:48,340
that we have available.
637
00:33:48,340 --> 00:33:54,970
So we also require P(A1
intersection A2) is P(A1)
638
00:33:54,970 --> 00:33:57,600
times P(A2).
639
00:33:57,600 --> 00:34:01,250
And similarly for the other
possibilities of
640
00:34:01,250 --> 00:34:02,500
choosing the indices.
641
00:34:02,500 --> 00:34:10,900
642
00:34:10,900 --> 00:34:14,659
OK, so independence,
mathematical definition,
643
00:34:14,659 --> 00:34:18,860
requires that calculating
probabilities of any
644
00:34:18,860 --> 00:34:22,370
intersection of the events we
have in our hands, that
645
00:34:22,370 --> 00:34:25,590
calculation can be done by just
multiplying individual
646
00:34:25,590 --> 00:34:27,000
probabilities.
647
00:34:27,000 --> 00:34:30,230
And this has to apply to the
case where we consider all of
648
00:34:30,230 --> 00:34:33,300
the events in our hands or just
649
00:34:33,300 --> 00:34:36,900
sub-collections of those events.
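Written out in symbols, as a sketch of the definition just stated (the index-set notation S is an assumption of this note, not something shown in the transcript):

```latex
% Independence of events A_1, ..., A_n: the product rule must hold
% for every sub-collection of the events, indexed by S.
\[
  P\Bigl(\,\bigcap_{i \in S} A_i\Bigr) \;=\; \prod_{i \in S} P(A_i)
  \qquad \text{for every } S \subseteq \{1, \dots, n\},\ |S| \ge 2 .
\]
% For three events this amounts to four equalities:
\begin{align*}
  P(A_1 \cap A_2) &= P(A_1)\,P(A_2), \\
  P(A_1 \cap A_3) &= P(A_1)\,P(A_3), \\
  P(A_2 \cap A_3) &= P(A_2)\,P(A_3), \\
  P(A_1 \cap A_2 \cap A_3) &= P(A_1)\,P(A_2)\,P(A_3).
\end{align*}
```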
650
00:34:36,900 --> 00:34:42,130
Now these relations just by
themselves are called pairwise
651
00:34:42,130 --> 00:34:44,389
independence.
652
00:34:44,389 --> 00:34:47,179
So this relation, for example,
tells us that A1 is
653
00:34:47,179 --> 00:34:48,710
independent from A2.
654
00:34:48,710 --> 00:34:51,130
This tells us that A2 is
independent from A3.
655
00:34:51,130 --> 00:34:54,670
This will tell us that A1
is independent from A3.
656
00:34:54,670 --> 00:34:58,990
But independence of all the
events together actually
657
00:34:58,990 --> 00:35:01,020
requires a little more.
658
00:35:01,020 --> 00:35:05,080
One more equality that has to do
with all three events being
659
00:35:05,080 --> 00:35:07,000
considered at the same time.
660
00:35:07,000 --> 00:35:10,562
And this extra equality
is not redundant.
661
00:35:10,562 --> 00:35:13,020
It actually does make
a difference.
662
00:35:13,020 --> 00:35:15,390
Independence and pairwise
independence
663
00:35:15,390 --> 00:35:17,310
are different things.
664
00:35:17,310 --> 00:35:20,320
So let's illustrate the
situation with an example.
665
00:35:20,320 --> 00:35:22,790
Suppose we have two
coin flips.
666
00:35:22,790 --> 00:35:28,390
The coin tosses are independent,
and the bias is
667
00:35:28,390 --> 00:35:32,910
1/2, so all possible outcomes
have a probability of 1/2
668
00:35:32,910 --> 00:35:36,100
times 1/2, which is 1/4.
669
00:35:36,100 --> 00:35:40,520
And let's consider now a bunch
of different events.
670
00:35:40,520 --> 00:35:46,290
One event is that the
first toss is heads.
671
00:35:46,290 --> 00:35:48,950
This is this blue set here.
672
00:35:48,950 --> 00:35:54,990
Another event is the second
toss is heads.
673
00:35:54,990 --> 00:35:57,970
And this is this black
event here.
674
00:35:57,970 --> 00:36:00,770
675
00:36:00,770 --> 00:36:01,850
OK.
676
00:36:01,850 --> 00:36:04,500
Are these two events
independent?
677
00:36:04,500 --> 00:36:06,660
If you check it mathematically,
yes.
678
00:36:06,660 --> 00:36:09,270
Probability of A and probability
of B are both 1/2.
679
00:36:09,270 --> 00:36:13,170
Probability of A times
probability of B is 1/4, which
680
00:36:13,170 --> 00:36:16,700
is the same as the probability
of A intersection B,
681
00:36:16,700 --> 00:36:18,070
which is this set.
682
00:36:18,070 --> 00:36:20,680
So we have just checked
mathematically that A and B
683
00:36:20,680 --> 00:36:22,180
are independent.
684
00:36:22,180 --> 00:36:26,210
Now let's consider a third event
which is that the first
685
00:36:26,210 --> 00:36:30,080
and second toss give
the same result.
686
00:36:30,080 --> 00:36:32,270
I'll use a different color.
687
00:36:32,270 --> 00:36:35,400
First and second toss to
give the same result.
688
00:36:35,400 --> 00:36:38,350
This is the event that
we obtain heads,
689
00:36:38,350 --> 00:36:40,700
heads or tails, tails.
690
00:36:40,700 --> 00:36:43,030
So this is the event
C. What's the
691
00:36:43,030 --> 00:36:44,280
probability of C?
692
00:36:44,280 --> 00:36:47,790
693
00:36:47,790 --> 00:36:51,520
Well, C is made up of two
outcomes, each one of which
694
00:36:51,520 --> 00:36:55,500
has probability 1/4, so the
probability of C is 1/2.
695
00:36:55,500 --> 00:36:58,600
What is the probability
of C intersection A?
696
00:36:58,600 --> 00:37:02,760
C intersection A is just this
one outcome, and has
697
00:37:02,760 --> 00:37:06,030
probability 1/4.
698
00:37:06,030 --> 00:37:10,040
What's the probability of A
intersection B intersection C?
699
00:37:10,040 --> 00:37:13,650
The three events intersect in just
this outcome, so this
700
00:37:13,650 --> 00:37:15,620
probability is also 1/4.
701
00:37:15,620 --> 00:37:18,610
702
00:37:18,610 --> 00:37:19,860
OK.
703
00:37:19,860 --> 00:37:24,130
704
00:37:24,130 --> 00:37:27,060
What's the probability
of C given A and B?
705
00:37:27,060 --> 00:37:29,800
706
00:37:29,800 --> 00:37:34,840
If A has occurred, and B has
occurred, you are certain that
707
00:37:34,840 --> 00:37:36,980
this outcome here happened.
708
00:37:36,980 --> 00:37:40,160
If the first toss is H and the
second toss is H, then you're
709
00:37:40,160 --> 00:37:41,970
certain that the first
and second toss
710
00:37:41,970 --> 00:37:43,760
gave the same result.
711
00:37:43,760 --> 00:37:46,500
So the conditional probability
of C given A and
712
00:37:46,500 --> 00:37:49,050
B is equal to 1.
713
00:37:49,050 --> 00:37:51,640
So do we have independence
in this example?
714
00:37:51,640 --> 00:37:54,310
715
00:37:54,310 --> 00:37:55,970
We don't.
716
00:37:55,970 --> 00:38:00,210
C, that we obtain the same
result in the first and the
717
00:38:00,210 --> 00:38:04,020
second toss, has probability
1/2.
718
00:38:04,020 --> 00:38:08,480
Half of the possible outcomes
give us two coin flips with
719
00:38:08,480 --> 00:38:10,700
the same result-- heads,
heads or tails, tails.
720
00:38:10,700 --> 00:38:12,970
So the probability
of C is 1/2.
721
00:38:12,970 --> 00:38:17,590
But if I tell you that the
events A and B both occurred,
722
00:38:17,590 --> 00:38:20,900
then you're certain
that C occurred.
723
00:38:20,900 --> 00:38:23,190
If I tell you that we had heads
and heads, then you're
724
00:38:23,190 --> 00:38:25,460
certain the outcomes
were the same.
725
00:38:25,460 --> 00:38:28,830
So the conditional probability
is different from the
726
00:38:28,830 --> 00:38:31,400
unconditional probability.
727
00:38:31,400 --> 00:38:37,050
So by combining these two
relations together, we get
728
00:38:37,050 --> 00:38:39,235
that the three events
are not independent.
729
00:38:39,235 --> 00:38:42,260
730
00:38:42,260 --> 00:38:45,520
But are they pairwise
independent?
731
00:38:45,520 --> 00:38:49,020
Is A independent from B?
732
00:38:49,020 --> 00:38:53,400
Yes, because probability of A
times probability of B is 1/4,
733
00:38:53,400 --> 00:38:58,670
which is probability of
A intersection B. Is C
734
00:38:58,670 --> 00:39:02,350
independent from A?
735
00:39:02,350 --> 00:39:05,780
Well, the probability
of C and A is 1/4.
736
00:39:05,780 --> 00:39:07,620
The probability of C is 1/2.
737
00:39:07,620 --> 00:39:09,830
The probability of A is 1/2.
738
00:39:09,830 --> 00:39:11,150
So it checks.
739
00:39:11,150 --> 00:39:17,960
1/4 is equal to 1/2 times 1/2,
so event C and event A are
740
00:39:17,960 --> 00:39:19,410
independent.
741
00:39:19,410 --> 00:39:24,490
Knowing that the first toss was
heads does not change your
742
00:39:24,490 --> 00:39:28,600
beliefs about whether the two
tosses are going to have the
743
00:39:28,600 --> 00:39:31,380
same outcome or not.
744
00:39:31,380 --> 00:39:34,200
Knowing that the first was
heads, well, the second is
745
00:39:34,200 --> 00:39:36,520
equally likely to be
heads or tails.
746
00:39:36,520 --> 00:39:39,710
So event C has just the
same probability,
747
00:39:39,710 --> 00:39:42,140
again, 1/2, to occur.
748
00:39:42,140 --> 00:39:46,110
To put it the opposite way,
if I tell you that the two
749
00:39:46,110 --> 00:39:47,860
results were the same--
750
00:39:47,860 --> 00:39:51,130
so it's either heads, heads
or tails, tails--
751
00:39:51,130 --> 00:39:53,070
what does that tell you
about the first toss?
752
00:39:53,070 --> 00:39:54,800
Is it heads, or is it tails?
753
00:39:54,800 --> 00:39:56,570
Well, it doesn't tell
you anything.
754
00:39:56,570 --> 00:39:59,700
It could be either of the
two, so the probability of
755
00:39:59,700 --> 00:40:04,490
heads in the first toss is equal
to 1/2, and telling you
756
00:40:04,490 --> 00:40:07,460
C occurred does not
change anything.
757
00:40:07,460 --> 00:40:10,830
So this is an example that
illustrates the case where we
758
00:40:10,830 --> 00:40:14,650
have three events in which
we check that pairwise
759
00:40:14,650 --> 00:40:18,140
independence holds for
any combination of
760
00:40:18,140 --> 00:40:19,250
two of these events.
761
00:40:19,250 --> 00:40:21,900
We have the probability of their
intersection is equal to
762
00:40:21,900 --> 00:40:23,760
the product of their
probabilities.
763
00:40:23,760 --> 00:40:27,930
On the other hand, the three
events taken all together are
764
00:40:27,930 --> 00:40:29,500
not independent.
765
00:40:29,500 --> 00:40:32,780
A doesn't tell me anything
useful about whether C is going to
766
00:40:32,780 --> 00:40:34,710
occur or not.
767
00:40:34,710 --> 00:40:36,730
B doesn't tell me
anything useful.
768
00:40:36,730 --> 00:40:40,840
But if I tell you that both A
and B occurred, the two of
769
00:40:40,840 --> 00:40:44,150
them together tell me something
useful about C.
770
00:40:44,150 --> 00:40:47,165
Namely, they tell me that C
certainly has occurred.
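The example above can be checked by brute-force enumeration. This is a small sketch under the lecture's assumptions (two independent fair tosses, four outcomes of probability 1/4 each); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# Sample space: two fair coin tosses, each outcome with probability 1/4.
OUTCOMES = list(product("HT", repeat=2))
P_OUTCOME = Fraction(1, 4)


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return P_OUTCOME * sum(1 for w in OUTCOMES if event(w))


def joint(events):
    """Probability that all the given events occur together."""
    return prob(lambda w: all(e(w) for e in events))


def product_of_probs(events):
    """Product of the individual probabilities of the given events."""
    result = Fraction(1)
    for e in events:
        result *= prob(e)
    return result


A = lambda w: w[0] == "H"   # first toss is heads
B = lambda w: w[1] == "H"   # second toss is heads
C = lambda w: w[0] == w[1]  # both tosses give the same result

# Pairwise independence: the product rule holds for every pair.
pairwise = all(
    joint(pair) == product_of_probs(pair)
    for pair in combinations([A, B, C], 2)
)

# Full independence additionally needs the product rule for all three at once.
triple = joint([A, B, C]) == product_of_probs([A, B, C])

# Conditional probability of C given that both A and B occurred.
p_c_given_ab = joint([A, B, C]) / joint([A, B])

print(pairwise, triple, prob(C), p_c_given_ab)
```

The pairwise check passes while the three-way check fails, matching the conclusion that the events are pairwise independent but not independent.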
771
00:40:47,165 --> 00:40:49,750
772
00:40:49,750 --> 00:40:51,000
Very good.
773
00:40:51,000 --> 00:40:53,900
774
00:40:53,900 --> 00:40:56,890
So independence is this somewhat
subtle concept.
775
00:40:56,890 --> 00:40:59,710
Once you grasp the intuition of
what it really means, then
776
00:40:59,710 --> 00:41:02,910
things perhaps fall in place.
777
00:41:02,910 --> 00:41:06,630
But it's a concept where
it's easy to get some
778
00:41:06,630 --> 00:41:07,430
misunderstanding.
779
00:41:07,430 --> 00:41:11,370
So just take some
time to digest.
780
00:41:11,370 --> 00:41:14,810
So to lighten things up, I'm
going to spend the remaining
781
00:41:14,810 --> 00:41:18,810
four minutes talking about a
very nice, simple problem that
782
00:41:18,810 --> 00:41:23,240
involves conditional
probabilities and the like.
783
00:41:23,240 --> 00:41:28,140
So here's the problem,
formulated exactly as it shows
784
00:41:28,140 --> 00:41:30,250
up in various textbooks.
785
00:41:30,250 --> 00:41:31,780
And the formulation says
the following.
786
00:41:31,780 --> 00:41:35,310
Well, consider one of those
anachronistic places where
787
00:41:35,310 --> 00:41:40,050
they still have kings or queens,
and where actually
788
00:41:40,050 --> 00:41:43,090
boys take precedence
over girls.
789
00:41:43,090 --> 00:41:44,600
So if there is a boy--
790
00:41:44,600 --> 00:41:47,280
791
00:41:47,280 --> 00:41:52,400
if the royal family has a boy,
then he will become the king
792
00:41:52,400 --> 00:41:58,080
even if he has an older sister
who might be the queen.
793
00:41:58,080 --> 00:42:02,930
So we have one of those
royal families.
794
00:42:02,930 --> 00:42:06,810
That royal family had two
children, and we know that
795
00:42:06,810 --> 00:42:08,060
there is a king.
796
00:42:08,060 --> 00:42:11,370
797
00:42:11,370 --> 00:42:14,250
There is a king, which means
that at least one of the two
798
00:42:14,250 --> 00:42:16,030
children was a boy.
799
00:42:16,030 --> 00:42:18,970
Otherwise we wouldn't
have a king.
800
00:42:18,970 --> 00:42:21,885
What is the probability that the
king's sibling is female?
801
00:42:21,885 --> 00:42:24,920
802
00:42:24,920 --> 00:42:26,170
OK.
803
00:42:26,170 --> 00:42:28,260
804
00:42:28,260 --> 00:42:30,830
I guess we need to make some
assumptions about genetics.
805
00:42:30,830 --> 00:42:33,910
Let's assume that every child
is a boy or a girl with
806
00:42:33,910 --> 00:42:39,440
probability 1/2, and that
different children, what they
807
00:42:39,440 --> 00:42:42,900
are is independent from what
the other children were.
808
00:42:42,900 --> 00:42:47,660
So every childbirth is basically
a coin flip.
809
00:42:47,660 --> 00:42:50,740
OK, so if you take that,
you say, well,
810
00:42:50,740 --> 00:42:52,980
the king is a child.
811
00:42:52,980 --> 00:42:55,890
His sibling is another child.
812
00:42:55,890 --> 00:42:58,450
Children are independent
of each other.
813
00:42:58,450 --> 00:43:05,860
So the probability that the
sibling is a girl is 1/2.
814
00:43:05,860 --> 00:43:07,620
That's the naive answer.
815
00:43:07,620 --> 00:43:09,270
Now let's try to
do it formally.
816
00:43:09,270 --> 00:43:12,410
Let's set up a model
of the experiment.
817
00:43:12,410 --> 00:43:15,650
The royal family had two
children, as we were told, so
818
00:43:15,650 --> 00:43:17,020
there's four outcomes--
819
00:43:17,020 --> 00:43:22,040
boy boy, boy girl, girl
boy, and girl girl.
820
00:43:22,040 --> 00:43:26,520
Now, we are told that there is
a king, which means what?
821
00:43:26,520 --> 00:43:29,530
This outcome here
did not happen.
822
00:43:29,530 --> 00:43:30,760
It is not possible.
823
00:43:30,760 --> 00:43:33,810
There are three outcomes
that remain possible.
824
00:43:33,810 --> 00:43:37,940
So this is our conditional
sample space given
825
00:43:37,940 --> 00:43:40,500
that there is a king.
826
00:43:40,500 --> 00:43:43,170
What are the probabilities
for the original model?
827
00:43:43,170 --> 00:43:46,420
Well, with the model in which we
assume that every child is a
828
00:43:46,420 --> 00:43:50,885
boy or a girl independently with
probability 1/2, then the
829
00:43:50,885 --> 00:43:54,950
four outcomes would be equally
likely, and they're like this.
830
00:43:54,950 --> 00:43:57,110
These are the original
probabilities.
831
00:43:57,110 --> 00:44:00,810
But once we are told that this
outcome did not happen,
832
00:44:00,810 --> 00:44:03,900
because we have a king, then
we are transported to the
833
00:44:03,900 --> 00:44:05,830
smaller sample space.
834
00:44:05,830 --> 00:44:08,380
In this sample space, what's
the probability that the
835
00:44:08,380 --> 00:44:10,360
sibling is a girl?
836
00:44:10,360 --> 00:44:15,160
Well the sibling is a girl in
two out of the three outcomes.
837
00:44:15,160 --> 00:44:17,290
So the probability that
the sibling is a
838
00:44:17,290 --> 00:44:21,880
girl is actually 2/3.
839
00:44:21,880 --> 00:44:25,780
So that's supposed to
be the right answer.
840
00:44:25,780 --> 00:44:29,620
Maybe a little
counter-intuitive.
841
00:44:29,620 --> 00:44:32,960
So you can play smart and say,
oh I understand such problems
842
00:44:32,960 --> 00:44:35,990
better than you, here is a trick
problem and here's why
843
00:44:35,990 --> 00:44:37,800
the answer is 2/3.
844
00:44:37,800 --> 00:44:41,300
But actually I'm not fully
justified in saying that the
845
00:44:41,300 --> 00:44:42,930
answer is 2/3.
846
00:44:42,930 --> 00:44:46,520
I made lots of hidden
assumptions when I put this
847
00:44:46,520 --> 00:44:50,040
model down, which I
didn't yet state.
848
00:44:50,040 --> 00:44:54,960
So to reverse engineer this
answer, let's actually think
849
00:44:54,960 --> 00:44:57,960
what's the probability model for
which this would have been
850
00:44:57,960 --> 00:44:59,320
the right answer.
851
00:44:59,320 --> 00:45:01,300
And here's the probability
model.
852
00:45:01,300 --> 00:45:02,800
The royal family--
853
00:45:02,800 --> 00:45:07,050
the royal parents decided to
have exactly two children.
854
00:45:07,050 --> 00:45:08,960
They went and had them.
855
00:45:08,960 --> 00:45:11,670
It turned out that at
least one was a boy
856
00:45:11,670 --> 00:45:13,390
and became a king.
857
00:45:13,390 --> 00:45:15,580
Under this scenario--
858
00:45:15,580 --> 00:45:18,070
that they decide to have
exactly two children--
859
00:45:18,070 --> 00:45:20,840
then this is the big
sample space.
860
00:45:20,840 --> 00:45:23,350
It turned out that
one was a boy.
861
00:45:23,350 --> 00:45:25,560
That eliminates this outcome.
862
00:45:25,560 --> 00:45:27,410
And then this picture
is correct and this
863
00:45:27,410 --> 00:45:28,750
is the right answer.
864
00:45:28,750 --> 00:45:31,680
But there are hidden assumptions
in there.
865
00:45:31,680 --> 00:45:35,170
How about if the royal
family had followed
866
00:45:35,170 --> 00:45:37,230
the following strategy?
867
00:45:37,230 --> 00:45:41,760
We're going to have children
until we get a boy, so that we
868
00:45:41,760 --> 00:45:45,700
get a king, and then
we'll stop.
869
00:45:45,700 --> 00:45:47,660
OK, given they have two
children, what's the
870
00:45:47,660 --> 00:45:50,660
probability that the
sibling is a girl?
871
00:45:50,660 --> 00:45:51,880
It's 1.
872
00:45:51,880 --> 00:45:55,260
The reason that they had two
children was because the first
873
00:45:55,260 --> 00:45:57,800
was a girl, so they had
to have a second.
874
00:45:57,800 --> 00:46:00,820
So assumptions about
reproductive practices
875
00:46:00,820 --> 00:46:03,130
actually need to come in,
and they're going
876
00:46:03,130 --> 00:46:04,630
to affect the decisions.
877
00:46:04,630 --> 00:46:08,010
Or, if it's one of those ancient
kingdoms where a king
878
00:46:08,010 --> 00:46:11,790
would always make sure to
strangle any of his brothers,
879
00:46:11,790 --> 00:46:15,560
then the probability that the
sibling is a girl is actually
880
00:46:15,560 --> 00:46:17,570
1 again, and so on.
881
00:46:17,570 --> 00:46:20,590
So it means that one needs to be
careful when you start with
882
00:46:20,590 --> 00:46:24,330
loosely worded problems, to
make sure you know exactly what it
883
00:46:24,330 --> 00:46:26,950
means and what assumptions
you're making.
884
00:46:26,950 --> 00:46:28,880
All right, see you next week.
885
00:46:28,880 --> 00:46:30,130