1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:22,004
ocw.mit.edu
9
00:00:22,004 --> 00:00:24,966
JOHN TSITSIKLIS: So here's
the agenda for today.
10
00:00:24,966 --> 00:00:26,848
We're going to do a
very quick review.
11
00:00:26,848 --> 00:00:28,936
And then we're going
to introduce some
12
00:00:28,936 --> 00:00:30,560
very important concepts.
13
00:00:30,560 --> 00:00:34,060
The idea is that all
information is--
14
00:00:34,060 --> 00:00:36,450
Information is always partial.
15
00:00:36,450 --> 00:00:40,260
And the question is what do we
do to probabilities if we have
16
00:00:40,260 --> 00:00:43,340
some partial information about
the random experiments.
17
00:00:43,340 --> 00:00:45,770
We're going to introduce the
important concept of
18
00:00:45,770 --> 00:00:47,530
conditional probability.
19
00:00:47,530 --> 00:00:50,860
And then we will see three
very useful ways
20
00:00:50,860 --> 00:00:52,670
in which it is used.
21
00:00:52,670 --> 00:00:55,410
And these ways basically
correspond to divide and
22
00:00:55,410 --> 00:00:58,070
conquer methods for breaking
up problems
23
00:00:58,070 --> 00:01:00,120
into simpler pieces.
24
00:01:00,120 --> 00:01:04,010
And also one more fundamental
tool which allows us to use
25
00:01:04,010 --> 00:01:07,420
conditional probabilities to do
inference, that is, if we
26
00:01:07,420 --> 00:01:09,440
get a little bit of information
about some
27
00:01:09,440 --> 00:01:12,620
phenomenon, what can we
infer about the things
28
00:01:12,620 --> 00:01:14,640
that we have not seen?
29
00:01:14,640 --> 00:01:17,050
So our quick review.
30
00:01:17,050 --> 00:01:22,100
In setting up a model of a
random experiment, the first
31
00:01:22,100 --> 00:01:25,930
thing to do is to come up with
a list of all the possible
32
00:01:25,930 --> 00:01:27,870
outcomes of the experiment.
33
00:01:27,870 --> 00:01:31,120
So that list is what we
call the sample space.
34
00:01:31,120 --> 00:01:32,480
It's a set.
35
00:01:32,480 --> 00:01:34,580
And the elements of the
sample space are all
36
00:01:34,580 --> 00:01:35,720
the possible outcomes.
37
00:01:35,720 --> 00:01:37,560
Those possible outcomes must be
38
00:01:37,560 --> 00:01:39,690
distinguishable from each other.
39
00:01:39,690 --> 00:01:41,020
They're mutually exclusive.
40
00:01:41,020 --> 00:01:44,900
Either one happens or the other
happens, but not both.
41
00:01:44,900 --> 00:01:47,440
And they are collectively
exhaustive, that is no matter
42
00:01:47,440 --> 00:01:50,480
what, the outcome of the
experiment is going to be an
43
00:01:50,480 --> 00:01:52,130
element of the sample space.
44
00:01:52,130 --> 00:01:54,200
And then we discussed last
time that there's also an
45
00:01:54,200 --> 00:01:57,510
element of art in how to choose
your sample space,
46
00:01:57,510 --> 00:02:01,440
depending on how much detail
you want to capture.
47
00:02:01,440 --> 00:02:03,130
This is usually the easy part.
48
00:02:03,130 --> 00:02:06,980
Then the more interesting part
is to assign probabilities to
49
00:02:06,980 --> 00:02:10,660
our model, that is to make some
statements about what we
50
00:02:10,660 --> 00:02:14,610
believe to be likely and what
we believe to be unlikely.
51
00:02:14,610 --> 00:02:17,720
The way we do that is by
assigning probabilities to
52
00:02:17,720 --> 00:02:20,510
subsets of the sample space.
53
00:02:20,510 --> 00:02:26,120
So as we have our sample space
here, we may have a subset A.
54
00:02:26,120 --> 00:02:31,090
And we assign a number to that
subset P(A), which is the
55
00:02:31,090 --> 00:02:33,910
probability that this
event happens.
56
00:02:33,910 --> 00:02:37,080
Or this is the probability that
when we do the experiment
57
00:02:37,080 --> 00:02:39,860
and we get an outcome it's the
probability that the outcome
58
00:02:39,860 --> 00:02:41,850
happens to fall inside
that event.
59
00:02:41,850 --> 00:02:44,500
We have certain rules that
probabilities should satisfy.
60
00:02:44,500 --> 00:02:46,210
They're non-negative.
61
00:02:46,210 --> 00:02:49,780
The probability of the overall
sample space is equal to one,
62
00:02:49,780 --> 00:02:52,900
which expresses the fact that
we are certain, no matter
63
00:02:52,900 --> 00:02:55,480
what, the outcome is going
to be an element
64
00:02:55,480 --> 00:02:56,830
of the sample space.
65
00:02:56,830 --> 00:02:59,760
Well, if we set it up right
so that it exhausts all
66
00:02:59,760 --> 00:03:03,190
possibilities, this should
be the case.
67
00:03:03,190 --> 00:03:05,480
And then there's another
interesting property of
68
00:03:05,480 --> 00:03:09,240
probabilities that says that,
if we have two events or two
69
00:03:09,240 --> 00:03:11,910
subsets that are disjoint, and
we're interested in the
70
00:03:11,910 --> 00:03:17,670
probability, that one or the
other happens, that is the
71
00:03:17,670 --> 00:03:21,870
outcome belongs to A or belongs
to B. For disjoint
72
00:03:21,870 --> 00:03:25,320
events the total probability of
these two, taken together,
73
00:03:25,320 --> 00:03:28,030
is just the sum of their
individual probabilities.
74
00:03:28,030 --> 00:03:30,270
So probabilities behave
like masses.
75
00:03:30,270 --> 00:03:34,760
The mass of the object
consisting of A and B is the
76
00:03:34,760 --> 00:03:37,230
sum of the masses of
these two objects.
77
00:03:37,230 --> 00:03:39,720
Or you can think of
probabilities as areas.
78
00:03:39,720 --> 00:03:41,240
They have, again, the
same property.
79
00:03:41,240 --> 00:03:45,490
The area of A together with B is
the area of A plus the area
80
00:03:45,490 --> 00:03:46,410
of B.
81
00:03:46,410 --> 00:03:50,290
But as we discussed at the end
of last lecture, it's useful
82
00:03:50,290 --> 00:03:53,970
to have in our hands a more
general version of this
83
00:03:53,970 --> 00:03:58,990
additivity property, which says
the following, if we take
84
00:03:58,990 --> 00:04:00,982
a sequence of sets--
85
00:04:00,982 --> 00:04:07,480
A1, A2, A3, A4, and so on.
86
00:04:07,480 --> 00:04:09,630
And we put all of those
sets together.
87
00:04:09,630 --> 00:04:11,410
It's an infinite sequence.
88
00:04:11,410 --> 00:04:14,950
And we ask for the probability
that the outcome falls
89
00:04:14,950 --> 00:04:19,170
somewhere in this infinite
union, that is we are asking
90
00:04:19,170 --> 00:04:22,640
for the probability that the
outcome belongs to one of
91
00:04:22,640 --> 00:04:27,950
these sets, and assuming that
the sets are disjoint, we can
92
00:04:27,950 --> 00:04:32,820
again find the probability for
the overall set by adding up
93
00:04:32,820 --> 00:04:36,000
the probabilities of the
individual sets.
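Written out as a formula, the countable additivity property just described might look as follows. This is a sketch in standard notation; the index i and the symbols used here are notational assumptions, not something shown in the transcript.

```latex
\[
P\Bigl(\bigcup_{i=1}^{\infty} A_i\Bigr) \;=\; \sum_{i=1}^{\infty} P(A_i),
\qquad \text{whenever } A_i \cap A_j = \emptyset \text{ for all } i \neq j .
\]
```

The key word is "sequence": the sets must be listable as A1, A2, A3, and so on.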
94
00:04:36,000 --> 00:04:38,910
So this is a nice and
simple property.
95
00:04:38,910 --> 00:04:43,130
But it's a little more subtle
than you might think.
96
00:04:43,130 --> 00:04:45,820
And let's see what's going
on by considering
97
00:04:45,820 --> 00:04:47,770
the following example.
98
00:04:47,770 --> 00:04:51,850
We had an example last time
where we take our sample space
99
00:04:51,850 --> 00:04:53,800
to be the unit square.
100
00:04:53,800 --> 00:04:58,110
And we said let's consider a
probability law that says that
101
00:04:58,110 --> 00:05:04,190
the probability of a subset is
just the area of that subset.
102
00:05:04,190 --> 00:05:07,630
So let's consider this
probability law.
103
00:05:07,630 --> 00:05:08,530
OK.
104
00:05:08,530 --> 00:05:13,990
Now the unit square is
the set --let me just
105
00:05:13,990 --> 00:05:15,210
draw it this way--
106
00:05:15,210 --> 00:05:20,520
the unit square is the union of
one-element sets, one for
107
00:05:20,520 --> 00:05:21,680
each of its points.
108
00:05:21,680 --> 00:05:28,280
So the unit square is made up
by the union of the various
109
00:05:28,280 --> 00:05:30,740
points inside the square.
110
00:05:30,740 --> 00:05:33,830
So union over all x's and y's.
111
00:05:33,830 --> 00:05:34,770
OK?
112
00:05:34,770 --> 00:05:36,690
So the square is made
up out of all the
113
00:05:36,690 --> 00:05:38,400
points that it contains.
114
00:05:38,400 --> 00:05:41,140
And now let's do
a calculation.
115
00:05:41,140 --> 00:05:45,060
One is the probability of our
overall sample space, which is
116
00:05:45,060 --> 00:05:47,260
the unit square.
117
00:05:47,260 --> 00:06:02,000
Now the unit square is the union
of these things, which,
118
00:06:02,000 --> 00:06:06,810
according to our additivity
axiom, is the sum of the
119
00:06:06,810 --> 00:06:10,595
probabilities of all of these
one element sets.
120
00:06:10,595 --> 00:06:16,830
121
00:06:16,830 --> 00:06:20,580
Now what is the probability
of a one element set?
122
00:06:20,580 --> 00:06:23,520
What is the probability of
this one element set?
123
00:06:23,520 --> 00:06:26,100
What's the probability that our
outcome is exactly that
124
00:06:26,100 --> 00:06:27,490
particular point?
125
00:06:27,490 --> 00:06:31,460
Well, it's the area of that
set, which is zero.
126
00:06:31,460 --> 00:06:33,990
So it's just the sum of zeros.
127
00:06:33,990 --> 00:06:35,950
And by any reasonable
definition the
128
00:06:35,950 --> 00:06:38,370
sum of zeros is zero.
129
00:06:38,370 --> 00:06:42,220
So we just proved that
one is equal to zero.
130
00:06:42,220 --> 00:06:42,680
OK.
131
00:06:42,680 --> 00:06:48,340
Either probability theory is
dead or there is some mistake
132
00:06:48,340 --> 00:06:51,030
in the derivation that I did.
133
00:06:51,030 --> 00:06:54,580
OK, the mistake is quite
subtle and it
134
00:06:54,580 --> 00:06:57,300
comes at this step.
135
00:06:57,300 --> 00:07:00,640
We sort of applied the
additivity axiom by saying
136
00:07:00,640 --> 00:07:04,040
that the unit square is the
union of all those sets.
137
00:07:04,040 --> 00:07:06,500
Can we really apply our
additivity axiom?
138
00:07:06,500 --> 00:07:07,260
Here's the catch.
139
00:07:07,260 --> 00:07:11,470
The additivity axiom applies
to the case where we have a
140
00:07:11,470 --> 00:07:17,180
sequence of disjoint events
and we take their union.
141
00:07:17,180 --> 00:07:21,740
Is this a sequence of sets?
142
00:07:21,740 --> 00:07:27,780
Can you make up the whole unit
square by taking a sequence of
143
00:07:27,780 --> 00:07:31,310
elements inside it and cover
the whole unit square?
144
00:07:31,310 --> 00:07:34,900
Well if you try, if you start
looking at the sequence of one
145
00:07:34,900 --> 00:07:40,910
element points, that sequence
will never be able to exhaust
146
00:07:40,910 --> 00:07:43,100
the whole unit square.
147
00:07:43,100 --> 00:07:45,680
So there's a deeper reason
behind that.
148
00:07:45,680 --> 00:07:48,790
And the reason is that infinite
sets are not all of
149
00:07:48,790 --> 00:07:50,130
the same size.
150
00:07:50,130 --> 00:07:52,620
The integers are an
infinite set.
151
00:07:52,620 --> 00:07:55,510
And you can arrange the integers
in a sequence.
152
00:07:55,510 --> 00:07:57,630
But a continuous set
like the unit
153
00:07:57,630 --> 00:08:00,205
square is a bigger set.
154
00:08:00,205 --> 00:08:02,050
It's so-called uncountable.
155
00:08:02,050 --> 00:08:06,160
It has more elements than
any sequence could have.
156
00:08:06,160 --> 00:08:13,610
So this union here is not of
this kind, where we would have
157
00:08:13,610 --> 00:08:16,930
a sequence of events.
158
00:08:16,930 --> 00:08:18,370
It's a different
kind of union.
159
00:08:18,370 --> 00:08:23,070
It's a union that involves a
union of many, many more sets.
160
00:08:23,070 --> 00:08:25,420
So the countable additivity
axiom does not
161
00:08:25,420 --> 00:08:27,360
apply in this case.
162
00:08:27,360 --> 00:08:30,230
Because we're not dealing
with a sequence of sets.
163
00:08:30,230 --> 00:08:33,780
And so this is the
incorrect step.
164
00:08:33,780 --> 00:08:37,240
So at some level you might think
that this is puzzling
165
00:08:37,240 --> 00:08:38,580
and awfully confusing.
166
00:08:38,580 --> 00:08:41,070
On the other hand, if you think
about areas the way
167
00:08:41,070 --> 00:08:43,520
you're used to them from
calculus, there's nothing
168
00:08:43,520 --> 00:08:44,940
mysterious about it.
169
00:08:44,940 --> 00:08:47,460
Every point on the unit
square has zero area.
170
00:08:47,460 --> 00:08:50,140
When you put all the points
together, they make up
171
00:08:50,140 --> 00:08:52,330
something that has
finite area.
172
00:08:52,330 --> 00:08:55,470
So there shouldn't be any
mystery behind it.
173
00:08:55,470 --> 00:09:00,230
Now, one interesting thing that
this discussion tells us,
174
00:09:00,230 --> 00:09:03,670
especially the fact that the
single-element set has zero
175
00:09:03,670 --> 00:09:05,790
area, is the following--
176
00:09:05,790 --> 00:09:08,960
Individual points have
zero probability.
177
00:09:08,960 --> 00:09:12,390
After you do the experiment and
you observe the outcome,
178
00:09:12,390 --> 00:09:14,660
it's going to be an
individual point.
179
00:09:14,660 --> 00:09:18,160
So what happened in that
experiment is something that
180
00:09:18,160 --> 00:09:21,820
initially you thought had zero
probability of occurring.
181
00:09:21,820 --> 00:09:25,420
So if you happen to get some
particular numbers and you
182
00:09:25,420 --> 00:09:28,290
say, "Well, in the beginning,
what did I think about those
183
00:09:28,290 --> 00:09:29,280
specific numbers?
184
00:09:29,280 --> 00:09:31,290
I thought they had
zero probability.
185
00:09:31,290 --> 00:09:36,250
But yet those particular
numbers did occur."
186
00:09:36,250 --> 00:09:41,640
So one moral from this is that
zero probability does not mean
187
00:09:41,640 --> 00:09:42,890
impossible.
188
00:09:42,890 --> 00:09:46,920
It just means extremely,
extremely unlikely by itself.
189
00:09:46,920 --> 00:09:49,420
So zero probability
things do happen.
190
00:09:49,420 --> 00:09:53,340
In such continuous models,
actually zero probability
191
00:09:53,340 --> 00:09:56,930
outcomes are everything
that happens.
192
00:09:56,930 --> 00:10:00,790
And the bumper sticker version
of this is to always expect
193
00:10:00,790 --> 00:10:02,220
the unexpected.
194
00:10:02,220 --> 00:10:05,095
Yes?
195
00:10:05,095 --> 00:10:06,345
AUDIENCE: [INAUDIBLE].
196
00:10:06,345 --> 00:10:08,532
197
00:10:08,532 --> 00:10:11,800
JOHN TSITSIKLIS: Well,
probability is supposed to be
198
00:10:11,800 --> 00:10:12,530
a real number.
199
00:10:12,530 --> 00:10:16,220
So it's either zero or it's
a positive number.
200
00:10:16,220 --> 00:10:21,350
So you can think of the
probability of things just
201
00:10:21,350 --> 00:10:25,040
close to that point and those
probabilities are tiny and
202
00:10:25,040 --> 00:10:26,390
close to zero.
203
00:10:26,390 --> 00:10:28,780
So that's how we're going to
interpret probabilities in
204
00:10:28,780 --> 00:10:29,810
continuous models.
205
00:10:29,810 --> 00:10:31,340
But this is two chapters
ahead.
206
00:10:31,340 --> 00:10:33,950
207
00:10:33,950 --> 00:10:34,230
Yeah?
208
00:10:34,230 --> 00:10:36,198
AUDIENCE: How do we interpret
probability of zero?
209
00:10:36,198 --> 00:10:37,674
If we can use models that
way, then how about
210
00:10:37,674 --> 00:10:38,658
probability of one?
211
00:10:38,658 --> 00:10:40,462
That it's extremely
likely but not
212
00:10:40,462 --> 00:10:42,110
necessarily for certain?
213
00:10:42,110 --> 00:10:43,320
JOHN TSITSIKLIS: That's
also the case.
214
00:10:43,320 --> 00:10:47,450
For example, if you ask in this
continuous model, if you
215
00:10:47,450 --> 00:10:52,190
ask me for the probability that
(x, y) is different from
216
00:10:52,190 --> 00:10:55,840
(0, 0), this is
the whole square,
217
00:10:55,840 --> 00:10:57,220
except for one point.
218
00:10:57,220 --> 00:11:01,150
So the area of this is
going to be one.
219
00:11:01,150 --> 00:11:06,330
But this event is not entirely
certain because the zero, zero
220
00:11:06,330 --> 00:11:08,210
outcome is also possible.
221
00:11:08,210 --> 00:11:12,330
So again, probability of one
means essential certainty.
222
00:11:12,330 --> 00:11:16,450
But it still allows the
possibility that the outcome
223
00:11:16,450 --> 00:11:18,320
might be outside that set.
224
00:11:18,320 --> 00:11:20,910
So these are some of the weird
things that are happening when
225
00:11:20,910 --> 00:11:22,680
you have continuous models.
226
00:11:22,680 --> 00:11:25,240
And that's why we start
this class with discrete
227
00:11:25,240 --> 00:11:27,050
models, on which we will
be spending the
228
00:11:27,050 --> 00:11:30,400
next couple of weeks.
229
00:11:30,400 --> 00:11:30,820
OK.
230
00:11:30,820 --> 00:11:35,650
So now once we have set up our
probability model and we have
231
00:11:35,650 --> 00:11:39,160
a legitimate probability law
that has these properties,
232
00:11:39,160 --> 00:11:43,070
then the rest is
usually simple.
233
00:11:43,070 --> 00:11:45,950
Somebody asks you a question of
calculating the probability
234
00:11:45,950 --> 00:11:47,520
of some event.
235
00:11:47,520 --> 00:11:50,270
Well, you were told something
about the probability law,
236
00:11:50,270 --> 00:11:52,520
such as for example the
probabilities are equal to
237
00:11:52,520 --> 00:11:55,460
areas, and then you just
need to calculate.
238
00:11:55,460 --> 00:11:58,730
In these types of examples
somebody would give you a set
239
00:11:58,730 --> 00:12:00,230
and you would have
to calculate the
240
00:12:00,230 --> 00:12:01,500
area of that set.
241
00:12:01,500 --> 00:12:06,060
So the rest is just calculation
and simple.
242
00:12:06,060 --> 00:12:09,390
Alright, so now it's time
to start with our main
243
00:12:09,390 --> 00:12:12,600
business for today.
244
00:12:12,600 --> 00:12:16,880
And the starting point
is the following--
245
00:12:16,880 --> 00:12:18,920
You know something
about the world.
246
00:12:18,920 --> 00:12:21,690
And based on what you know,
you set up a probability
247
00:12:21,690 --> 00:12:23,820
model and you write down
probabilities for the
248
00:12:23,820 --> 00:12:26,000
different outcomes.
249
00:12:26,000 --> 00:12:28,950
Then something happens, and
somebody tells you a little
250
00:12:28,950 --> 00:12:33,620
more about the world, gives
you some new information.
251
00:12:33,620 --> 00:12:37,430
This new information, in
general, should change your
252
00:12:37,430 --> 00:12:41,240
beliefs about what happened
or what may happen.
253
00:12:41,240 --> 00:12:44,550
So whenever we're given new
information, some partial
254
00:12:44,550 --> 00:12:47,400
information about the outcome
of the experiment, we should
255
00:12:47,400 --> 00:12:49,750
revise our beliefs.
256
00:12:49,750 --> 00:12:54,470
And conditional probabilities
are just the probabilities
257
00:12:54,470 --> 00:12:58,820
that apply after the revision
of our beliefs, when we're
258
00:12:58,820 --> 00:13:00,580
given some information.
259
00:13:00,580 --> 00:13:04,510
So let's make this into
a numerical example.
260
00:13:04,510 --> 00:13:07,870
So inside the sample space, this
part of the sample space,
261
00:13:07,870 --> 00:13:12,580
let's say has probability 3/6,
this part has 2/6, and that
262
00:13:12,580 --> 00:13:14,550
part has 1/6.
263
00:13:14,550 --> 00:13:17,940
I guess that means that out here
we have zero probability.
264
00:13:17,940 --> 00:13:21,900
So these were our initial
beliefs about the outcome of
265
00:13:21,900 --> 00:13:23,270
the experiment.
266
00:13:23,270 --> 00:13:27,160
Suppose now that someone
comes and tells you
267
00:13:27,160 --> 00:13:30,960
that event B occurred.
268
00:13:30,960 --> 00:13:33,560
So they don't tell you the
full outcome of the
269
00:13:33,560 --> 00:13:34,440
experiment.
270
00:13:34,440 --> 00:13:38,960
But they just tell you that the
outcome is known to lie
271
00:13:38,960 --> 00:13:41,060
inside this set B.
272
00:13:41,060 --> 00:13:44,320
Well then, you should certainly
change your beliefs
273
00:13:44,320 --> 00:13:45,560
in some way.
274
00:13:45,560 --> 00:13:48,420
And your new beliefs about what
is likely to occur and
275
00:13:48,420 --> 00:13:51,770
what is not is going to be
denoted by this notation.
276
00:13:51,770 --> 00:13:55,330
This is the conditional
probability that the event A
277
00:13:55,330 --> 00:13:57,970
is going to occur, the
probability that the outcome
278
00:13:57,970 --> 00:14:01,580
is going to fall inside the set
A given that we are told
279
00:14:01,580 --> 00:14:05,890
and we're sure that the outcome
lies inside the event B. Now
280
00:14:05,890 --> 00:14:09,000
once you're told that the
outcome lies inside the event
281
00:14:09,000 --> 00:14:13,740
B, then our old sample space
in some ways is irrelevant.
282
00:14:13,740 --> 00:14:16,975
We then have a new sample space,
which is just the set B. We
283
00:14:16,975 --> 00:14:21,020
are certain that the outcome
is going to be inside B.
284
00:14:21,020 --> 00:14:25,465
For example, what is this
conditional probability?
285
00:14:25,465 --> 00:14:29,120
286
00:14:29,120 --> 00:14:30,160
It should be one.
287
00:14:30,160 --> 00:14:33,250
Given that I told you that B
occurred, you're certain that
288
00:14:33,250 --> 00:14:36,380
B occurred, so this has
unit probability.
289
00:14:36,380 --> 00:14:40,340
So here we see an instance of
revision of our beliefs.
290
00:14:40,340 --> 00:14:44,880
Initially, event B had the
probability of (2+1)/6 --
291
00:14:44,880 --> 00:14:46,300
that's 1/2.
292
00:14:46,300 --> 00:14:49,500
Initially, we thought B
had probability 1/2.
293
00:14:49,500 --> 00:14:52,370
Once we're told that B occurred,
the new probability
294
00:14:52,370 --> 00:14:54,250
of B is equal to one.
295
00:14:54,250 --> 00:14:55,160
OK.
296
00:14:55,160 --> 00:15:00,860
How do we revise the probability
that A occurs?
297
00:15:00,860 --> 00:15:03,950
So we are going to have the
outcome of the experiment.
298
00:15:03,950 --> 00:15:07,330
We know that it's inside B. So
we will either get something
299
00:15:07,330 --> 00:15:09,200
here, and A does not occur.
300
00:15:09,200 --> 00:15:12,570
Or something inside here,
and A does occur.
301
00:15:12,570 --> 00:15:16,280
What's the likelihood that,
given that we're inside B, the
302
00:15:16,280 --> 00:15:18,160
outcome is inside here?
303
00:15:18,160 --> 00:15:21,380
Here's how we're going
to think about it.
304
00:15:21,380 --> 00:15:26,110
This part of this set B, in
which A also occurs, in our
305
00:15:26,110 --> 00:15:31,280
initial model was twice as
likely as that part of B. So
306
00:15:31,280 --> 00:15:36,220
outcomes inside here
collectively were twice as
307
00:15:36,220 --> 00:15:38,950
likely as outcomes out there.
308
00:15:38,950 --> 00:15:43,240
So we're going to keep the same
proportions and say, that
309
00:15:43,240 --> 00:15:47,280
given that we are inside the set
B, we still want outcomes
310
00:15:47,280 --> 00:15:51,120
inside here to be twice as
likely as outcomes there.
311
00:15:51,120 --> 00:15:55,800
So the proportion of the
probabilities should be two
312
00:15:55,800 --> 00:15:57,570
versus one.
313
00:15:57,570 --> 00:16:01,210
And these probabilities should
add up to one because together
314
00:16:01,210 --> 00:16:04,340
they make the conditional
probability of B. So the
315
00:16:04,340 --> 00:16:09,260
conditional probabilities should
be 2/3 probability of
316
00:16:09,260 --> 00:16:13,080
being here and 1/3 probability
of being there.
317
00:16:13,080 --> 00:16:16,860
That's how we revise
our probabilities.
318
00:16:16,860 --> 00:16:20,740
That's a reasonable, intuitively
reasonable, way of
319
00:16:20,740 --> 00:16:22,230
doing this revision.
320
00:16:22,230 --> 00:16:26,650
Let's translate what we
did into a definition.
321
00:16:26,650 --> 00:16:29,490
The definition says the
following, that the
322
00:16:29,490 --> 00:16:33,410
conditional probability of A
given that B occurred is
323
00:16:33,410 --> 00:16:35,270
calculated as follows.
324
00:16:35,270 --> 00:16:39,430
We look at the total probability
of B. And out of
325
00:16:39,430 --> 00:16:43,190
that probability that was inside
here, what fraction of
326
00:16:43,190 --> 00:16:48,310
that probability is assigned to
points for which the event
327
00:16:48,310 --> 00:16:49,780
A also occurs?
328
00:16:49,780 --> 00:16:54,480
329
00:16:54,480 --> 00:16:56,860
Does it give us the same numbers
as we got with this
330
00:16:56,860 --> 00:16:58,420
heuristic argument?
331
00:16:58,420 --> 00:17:01,530
Well in this example,
probability of A intersection
332
00:17:01,530 --> 00:17:06,359
B is 2/6, divided by total
probability of B, which is
333
00:17:06,359 --> 00:17:12,369
3/6, and so it's 2/3, which
agrees with the answer that
334
00:17:12,369 --> 00:17:13,589
we got before.
335
00:17:13,589 --> 00:17:18,280
So the formula indeed matches
what we were trying to do.
336
00:17:18,280 --> 00:17:21,040
One little technical detail.
337
00:17:21,040 --> 00:17:24,970
If the event B has zero
probability, then here we
338
00:17:24,970 --> 00:17:27,770
have a ratio that doesn't
make sense.
339
00:17:27,770 --> 00:17:30,470
So in this case, we say that
conditional probabilities are
340
00:17:30,470 --> 00:17:31,720
not defined.
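A minimal Python sketch of the definition, using the numbers from the example above. The function name `conditional_probability` is an illustrative assumption, not something from the lecture; it returns `None` when P(B) is zero, to mirror the "not defined" case.

```python
from fractions import Fraction

def conditional_probability(p_a_and_b, p_b):
    """Return P(A | B) = P(A and B) / P(B), or None when P(B) = 0."""
    if p_b == 0:
        return None  # the ratio makes no sense, so leave it undefined
    return Fraction(p_a_and_b) / Fraction(p_b)

# Numbers from the example: P(A and B) = 2/6 and P(B) = 3/6.
p_a_given_b = conditional_probability(Fraction(2, 6), Fraction(3, 6))
print(p_a_given_b)  # 2/3
```

Exact fractions are used instead of floats so the result can be compared directly with the 2/3 obtained by the heuristic argument.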
341
00:17:31,720 --> 00:17:34,780
342
00:17:34,780 --> 00:17:38,980
Now you can take this definition
and unravel it and
343
00:17:38,980 --> 00:17:40,260
write it in this form.
344
00:17:40,260 --> 00:17:43,510
The probability of A
intersection B is the
345
00:17:43,510 --> 00:17:46,780
probability of B times the
conditional probability.
346
00:17:46,780 --> 00:17:50,350
347
00:17:50,350 --> 00:17:53,820
So this is just a consequence of
the definition but it has a
348
00:17:53,820 --> 00:17:55,370
nice interpretation.
349
00:17:55,370 --> 00:17:57,930
Think of probabilities
as frequencies.
350
00:17:57,930 --> 00:18:01,480
If I do the experiment over and
over, what fraction of the
351
00:18:01,480 --> 00:18:05,300
time is it going to be the case
that both A and B occur?
352
00:18:05,300 --> 00:18:08,490
Well, there's going to be a
certain fraction of the time
353
00:18:08,490 --> 00:18:10,820
at which B occurs.
354
00:18:10,820 --> 00:18:14,760
And out of those times when B
occurs, there's going to be a
355
00:18:14,760 --> 00:18:17,270
further fraction of
the experiments in
356
00:18:17,270 --> 00:18:19,410
which A also occurs.
357
00:18:19,410 --> 00:18:21,930
So interpret the conditional
probability as follows.
358
00:18:21,930 --> 00:18:24,320
You only look at those
experiments at which
359
00:18:24,320 --> 00:18:26,050
B happens to occur.
360
00:18:26,050 --> 00:18:29,820
And look at what fraction of
those experiments where B
361
00:18:29,820 --> 00:18:33,670
already occurred, event
A also occurs.
362
00:18:33,670 --> 00:18:39,610
And there's a symmetrical
version of this equality.
363
00:18:39,610 --> 00:18:44,660
There's symmetry between the
events B and A. So you also
364
00:18:44,660 --> 00:18:48,890
have this relation that
goes the other way.
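Both directions of this relation can be checked on the earlier numerical example. The sketch below assumes that the region with probability 3/6 is the part of A outside B; the transcript does not say so explicitly, so treat that as an assumption, along with all variable names.

```python
from fractions import Fraction

# Region probabilities from the numerical example.
# Assumption: the 3/6 region is "A but not B".
p_a_only = Fraction(3, 6)
p_a_and_b = Fraction(2, 6)
p_b_only = Fraction(1, 6)

p_a = p_a_only + p_a_and_b   # total probability of A
p_b = p_b_only + p_a_and_b   # total probability of B

p_a_given_b = p_a_and_b / p_b
p_b_given_a = p_a_and_b / p_a

# P(A and B) = P(B) P(A | B) = P(A) P(B | A)
print(p_b * p_a_given_b)  # 1/3
print(p_a * p_b_given_a)  # 1/3
```

Both products reduce to 1/3, which is the same as the 2/6 assigned to the intersection.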
365
00:18:48,890 --> 00:18:53,950
OK, so what do we use these
conditional probabilities for?
366
00:18:53,950 --> 00:18:55,120
First, one comment.
367
00:18:55,120 --> 00:18:58,100
Conditional probabilities
are just like ordinary
368
00:18:58,100 --> 00:18:59,170
probabilities.
369
00:18:59,170 --> 00:19:02,820
They're the new probabilities
that apply in a new universe
370
00:19:02,820 --> 00:19:07,300
where event B is known
to have occurred.
371
00:19:07,300 --> 00:19:10,620
So we had an original
probability model.
372
00:19:10,620 --> 00:19:12,210
We are told that B occurs.
373
00:19:12,210 --> 00:19:13,840
We revise our model.
374
00:19:13,840 --> 00:19:16,690
Our new model should still be
a legitimate probability model.
375
00:19:16,690 --> 00:19:20,770
So it should satisfy all sorts
of properties that ordinary
376
00:19:20,770 --> 00:19:23,210
probabilities do satisfy.
377
00:19:23,210 --> 00:19:29,230
So for example, if A and B are
disjoint events, then we know
378
00:19:29,230 --> 00:19:33,830
that the probability of A
union B is equal to the
379
00:19:33,830 --> 00:19:39,230
probability of A plus
probability of B. And now if I
380
00:19:39,230 --> 00:19:42,770
tell you that a certain event C
occurred, we're placed in a
381
00:19:42,770 --> 00:19:45,220
new universe where
event C occurred.
382
00:19:45,220 --> 00:19:47,515
We have new probabilities
for that universe.
383
00:19:47,515 --> 00:19:49,880
These are the conditional
probabilities.
384
00:19:49,880 --> 00:19:52,960
And conditional probabilities
also satisfy
385
00:19:52,960 --> 00:19:54,820
this kind of property.
386
00:19:54,820 --> 00:19:58,380
So this is just our usual
additivity axiom, but
387
00:19:58,380 --> 00:20:02,290
applied in a new model, in which
we were told that event
388
00:20:02,290 --> 00:20:03,250
C occurred.
389
00:20:03,250 --> 00:20:06,580
So conditional probabilities
do not taste or smell any
390
00:20:06,580 --> 00:20:09,970
different than ordinary
probabilities do.
391
00:20:09,970 --> 00:20:14,350
Conditional probabilities, given
a specific event B, just
392
00:20:14,350 --> 00:20:19,480
form a probability law
on our sample space.
393
00:20:19,480 --> 00:20:22,460
It's a different probability
law but it's still a
394
00:20:22,460 --> 00:20:26,430
probability law that has all
of the desired properties.
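As a small check that conditional probabilities obey additivity, here is a Python sketch on an assumed model: two rolls of a four-sided die with all 16 pairs equally likely. The events `a`, `b`, `c` and the helper names are hypothetical choices made for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all pairs of rolls of a four-sided die.
outcomes = set(product(range(1, 5), repeat=2))

def prob(event):
    """Uniform law: probability is the fraction of outcomes in the event."""
    return Fraction(len(event), len(outcomes))

def cond_prob(event, given):
    """P(event | given) = P(event and given) / P(given); needs P(given) > 0."""
    return prob(event & given) / prob(given)

a = {w for w in outcomes if max(w) == 2}  # maximum is 2
b = {w for w in outcomes if max(w) == 3}  # maximum is 3, disjoint from a
c = {w for w in outcomes if min(w) == 2}  # conditioning event

assert a.isdisjoint(b)
lhs = cond_prob(a | b, c)
rhs = cond_prob(a, c) + cond_prob(b, c)
print(lhs, rhs)  # 3/5 3/5
```

The conditioning event also gets probability one in the new universe: `cond_prob(c, c)` evaluates to 1.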
395
00:20:26,430 --> 00:20:30,360
OK, so where do conditional
probabilities come up?
396
00:20:30,360 --> 00:20:32,450
They do come up in quizzes
and they do
397
00:20:32,450 --> 00:20:34,070
come up in silly problems.
398
00:20:34,070 --> 00:20:35,680
So let's start with this.
399
00:20:35,680 --> 00:20:37,790
We have this example
from last time.
400
00:20:37,790 --> 00:20:42,220
Two rolls of a die, all possible
pairs of roles are
401
00:20:42,220 --> 00:20:46,410
equally likely, so every element
in this square has
402
00:20:46,410 --> 00:20:47,660
probability of 1/16.
403
00:20:47,660 --> 00:20:50,300
404
00:20:50,300 --> 00:20:52,330
So all elements are
equally likely.
405
00:20:52,330 --> 00:20:54,280
That's our original model.
406
00:20:54,280 --> 00:20:57,210
Then somebody comes and tells us
that the minimum of the two
407
00:20:57,210 --> 00:20:59,530
rolls is equal to two.
408
00:20:59,530 --> 00:21:02,060
What's that event?
409
00:21:02,060 --> 00:21:05,990
The minimum equal to two can
happen in many ways, if we get
410
00:21:05,990 --> 00:21:08,990
two zeros or if we
get a zero and--
411
00:21:08,990 --> 00:21:13,140
sorry, if we get two
two's, or get a two
412
00:21:13,140 --> 00:21:14,830
and something larger.
413
00:21:14,830 --> 00:21:21,400
And so this is our new event B.
The red event is the event B.
414
00:21:21,400 --> 00:21:23,500
And now we want to calculate
probabilities
415
00:21:23,500 --> 00:21:25,310
inside this new universe.
416
00:21:25,310 --> 00:21:28,770
For example, you may be
interested in the question,
417
00:21:28,770 --> 00:21:31,960
questions about the maximum
of the two rolls.
418
00:21:31,960 --> 00:21:34,310
In the new universe, what's
the probability that the
419
00:21:34,310 --> 00:21:37,550
maximum is equal to one?
420
00:21:37,550 --> 00:21:44,320
The maximum being equal to
one is this black event.
421
00:21:44,320 --> 00:21:49,240
And given that we're told that
B occurred, this black event
422
00:21:49,240 --> 00:21:50,300
cannot happen.
423
00:21:50,300 --> 00:21:53,240
So this probability
is equal to zero.
424
00:21:53,240 --> 00:21:56,500
How about the maximum
being equal to two,
425
00:21:56,500 --> 00:21:59,110
given event B?
426
00:21:59,110 --> 00:22:01,760
OK, we can use the
definition here.
427
00:22:01,760 --> 00:22:05,730
It's going to be the probability
that the maximum
428
00:22:05,730 --> 00:22:10,590
is equal to two and B occurs
divided by the probability of
429
00:22:10,590 --> 00:22:16,020
B. The probability that the
maximum is equal to two.
430
00:22:16,020 --> 00:22:19,470
OK, what's the event that the
maximum is equal to two?
431
00:22:19,470 --> 00:22:20,340
Let's draw it.
432
00:22:20,340 --> 00:22:22,300
This is going to be
the blue event.
433
00:22:22,300 --> 00:22:25,950
The maximum is equal to
two if we get any
434
00:22:25,950 --> 00:22:28,520
of those blue points.
435
00:22:28,520 --> 00:22:32,310
So the intersection of the two
events is the intersection of
436
00:22:32,310 --> 00:22:35,170
the red event and
the blue event.
437
00:22:35,170 --> 00:22:37,770
There's only one point in
their intersection.
438
00:22:37,770 --> 00:22:39,640
So the probability of
that intersection
439
00:22:39,640 --> 00:22:41,080
happening is 1/16.
440
00:22:41,080 --> 00:22:43,740
441
00:22:43,740 --> 00:22:45,160
That's the numerator.
442
00:22:45,160 --> 00:22:47,110
How about the denominator?
443
00:22:47,110 --> 00:22:50,610
The event B consists of five
elements, each one of which
444
00:22:50,610 --> 00:22:52,270
had probability of 1/16.
445
00:22:52,270 --> 00:22:54,570
So that's 5/16.
446
00:22:54,570 --> 00:22:58,340
And so the answer is 1/5.
447
00:22:58,340 --> 00:23:02,830
Could we have gotten this
answer in a faster way?
448
00:23:02,830 --> 00:23:04,190
Yes.
449
00:23:04,190 --> 00:23:05,560
Here's how it goes.
450
00:23:05,560 --> 00:23:09,060
We're trying to find the
conditional probability that
451
00:23:09,060 --> 00:23:13,210
we get this point, given
that B occurred.
452
00:23:13,210 --> 00:23:15,570
B consists of five elements.
453
00:23:15,570 --> 00:23:18,250
All of those five elements were
equally likely when we
454
00:23:18,250 --> 00:23:22,720
started, so they remain equally
likely afterwards.
455
00:23:22,720 --> 00:23:25,180
Because when we define
conditional probabilities, we
456
00:23:25,180 --> 00:23:28,110
keep the same proportions
inside the set.
457
00:23:28,110 --> 00:23:31,940
So the five red elements
were equally likely.
458
00:23:31,940 --> 00:23:35,050
They remain equally likely
in the conditional world.
459
00:23:35,050 --> 00:23:39,080
So conditional on event B having
happened, each one of these
460
00:23:39,080 --> 00:23:41,580
five elements has the
same probability.
461
00:23:41,580 --> 00:23:44,300
So the probability that we
actually get this point is
462
00:23:44,300 --> 00:23:46,210
going to be 1/5.
463
00:23:46,210 --> 00:23:48,280
And so that's the shortcut.
464
00:23:48,280 --> 00:23:53,070
More generally, whenever you
have a uniform distribution on
465
00:23:53,070 --> 00:23:56,470
your initial sample space,
when you condition on an
466
00:23:56,470 --> 00:24:01,000
event, your new distribution is
still going to be uniform,
467
00:24:01,000 --> 00:24:05,010
but on the smaller event
that we considered.
468
00:24:05,010 --> 00:24:09,780
So we started with a uniform
distribution on the big square
469
00:24:09,780 --> 00:24:13,730
and we ended up with a
uniform distribution
470
00:24:13,730 --> 00:24:17,230
just on the red points.
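As a quick check of the shortcut, here is a minimal Python sketch. It assumes the four-sided die with faces 1 to 4 implied by the 1/16 probabilities; the names `omega`, `B`, and `cond_prob` are illustrative and do not come from the lecture.

```python
from fractions import Fraction
from itertools import product

# Sample space: two rolls of a four-sided die (faces 1..4 assumed),
# all 16 ordered pairs equally likely.
omega = list(product(range(1, 5), repeat=2))

# Event B: the minimum of the two rolls equals 2.
B = [w for w in omega if min(w) == 2]

def cond_prob(event, given):
    """P(event | given) under a uniform law: count outcomes inside `given`."""
    return Fraction(sum(1 for w in given if event(w)), len(given))

p_max1 = cond_prob(lambda w: max(w) == 1, B)
p_max2 = cond_prob(lambda w: max(w) == 2, B)
print(len(B), p_max1, p_max2)  # 5 0 1/5
```

Counting inside B works only because the original law is uniform; for a non-uniform law one would go back to the ratio P(A and B) / P(B).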
471
00:24:17,230 --> 00:24:19,850
Now besides silly problems,
however, conditional
472
00:24:19,850 --> 00:24:25,070
probabilities show up in real
and interesting situations.
473
00:24:25,070 --> 00:24:27,390
And this example is going
to give you some
474
00:24:27,390 --> 00:24:30,430
idea of how that happens.
475
00:24:30,430 --> 00:24:32,250
OK.
476
00:24:32,250 --> 00:24:35,450
Actually, in this example,
instead of starting with a
477
00:24:35,450 --> 00:24:39,480
probability model in terms of
regular probabilities, I'm
478
00:24:39,480 --> 00:24:43,070
actually going to define the
model in terms of conditional
479
00:24:43,070 --> 00:24:43,890
probabilities.
480
00:24:43,890 --> 00:24:45,880
And we'll see how
this is done.
481
00:24:45,880 --> 00:24:48,330
So here's the story.
482
00:24:48,330 --> 00:24:52,210
There may be an airplane flying
up in the sky, in a
483
00:24:52,210 --> 00:24:55,400
particular sector of the sky
that you're watching.
484
00:24:55,400 --> 00:24:57,950
Sometimes there is one, sometimes
there isn't.
485
00:24:57,950 --> 00:25:01,760
And from experience you know
that when you look up, there's
486
00:25:01,760 --> 00:25:04,400
five percent probability that
the plane is flying above
487
00:25:04,400 --> 00:25:09,670
there and 95% probability that
there's no plane up there.
488
00:25:09,670 --> 00:25:14,930
So event A is the event that the
plane is flying out there.
489
00:25:14,930 --> 00:25:19,140
Now you bought this wonderful
radar that looks up.
490
00:25:19,140 --> 00:25:23,300
And you're told in the
manufacturer's specs that, if
491
00:25:23,300 --> 00:25:27,310
there is a plane out there,
your radar is going to
492
00:25:27,310 --> 00:25:30,090
register something, a
blip on the screen
493
00:25:30,090 --> 00:25:32,940
with probability 99%.
494
00:25:32,940 --> 00:25:35,540
And it will not register
anything with
495
00:25:35,540 --> 00:25:37,500
probability one percent.
496
00:25:37,500 --> 00:25:43,890
So this particular part of the
picture is a self-contained
497
00:25:43,890 --> 00:25:50,280
probability model of what your
radar does in a world where a
498
00:25:50,280 --> 00:25:52,530
plane is out there.
499
00:25:52,530 --> 00:25:55,380
So I'm telling you that the
plane is out there.
500
00:25:55,380 --> 00:25:58,240
So we're now dealing with
conditional probabilities
501
00:25:58,240 --> 00:26:00,920
because I gave you some
particular information.
502
00:26:00,920 --> 00:26:04,120
Given this information that the
plane is out there, that's
503
00:26:04,120 --> 00:26:07,770
how your radar is going to
behave: with probability 99% it's
504
00:26:07,770 --> 00:26:10,320
going to detect it, with
probability one percent it's
505
00:26:10,320 --> 00:26:11,620
going to miss it.
506
00:26:11,620 --> 00:26:14,100
So this piece of the picture
is a self-contained
507
00:26:14,100 --> 00:26:15,060
probability model.
508
00:26:15,060 --> 00:26:17,130
The probabilities
add up to one.
509
00:26:17,130 --> 00:26:20,300
But it's a piece of
a larger model.
510
00:26:20,300 --> 00:26:22,820
Similarly, there's the
other possibility.
511
00:26:22,820 --> 00:26:27,980
Maybe a plane is not up there
and the manufacturer's specs
512
00:26:27,980 --> 00:26:32,630
tell you something about
false alarms.
513
00:26:32,630 --> 00:26:37,490
A false alarm is the situation
where the plane is not there,
514
00:26:37,490 --> 00:26:41,190
but for some reason your radar
picked up some noise or
515
00:26:41,190 --> 00:26:43,700
whatever and shows a
blip on the screen.
516
00:26:43,700 --> 00:26:46,790
And suppose that this happens
with probability ten percent.
517
00:26:46,790 --> 00:26:49,170
Whereas with probability
90% your radar
518
00:26:49,170 --> 00:26:51,220
gives the correct answer.
519
00:26:51,220 --> 00:26:55,430
So this is sort of a model of
what's going to happen with
520
00:26:55,430 --> 00:26:59,430
respect to both the plane --
we're given probabilities
521
00:26:59,430 --> 00:27:02,000
about this -- and we're given
probabilities about how the
522
00:27:02,000 --> 00:27:04,120
radar behaves.
523
00:27:04,120 --> 00:27:07,740
So here I have indirectly
specified the probability law
524
00:27:07,740 --> 00:27:10,810
in our model by starting with
conditional probabilities as
525
00:27:10,810 --> 00:27:13,670
opposed to starting with
ordinary probabilities.
526
00:27:13,670 --> 00:27:17,160
Can we derive ordinary
probabilities starting from
527
00:27:17,160 --> 00:27:18,740
the conditional ones?
528
00:27:18,740 --> 00:27:20,340
Yeah, we certainly can.
529
00:27:20,340 --> 00:27:25,810
Let's look at this event, A
intersection B, which is the
530
00:27:25,810 --> 00:27:31,160
event up here, that there
is a plane and our
531
00:27:31,160 --> 00:27:33,750
radar picks it up.
532
00:27:33,750 --> 00:27:35,760
How can we calculate
this probability?
533
00:27:35,760 --> 00:27:38,600
Well we use the definition of
conditional probabilities and
534
00:27:38,600 --> 00:27:41,430
this is the probability of
A times the conditional
535
00:27:41,430 --> 00:27:50,260
probability of B given A.
So it's 0.05 times 0.99.
536
00:27:50,260 --> 00:27:53,290
And the answer, in
case you care--
537
00:27:53,290 --> 00:27:56,730
It's 0.0495.
538
00:27:56,730 --> 00:27:57,650
OK.
539
00:27:57,650 --> 00:28:01,370
So we can calculate the
probabilities of final
540
00:28:01,370 --> 00:28:05,120
outcomes, which are the leaves
of the tree, by using the
541
00:28:05,120 --> 00:28:07,250
probabilities that
we have along the
542
00:28:07,250 --> 00:28:09,000
branches of the tree.
543
00:28:09,000 --> 00:28:11,950
So essentially, what we ended
up doing was to multiply the
544
00:28:11,950 --> 00:28:13,700
probability of this
branch times the
545
00:28:13,700 --> 00:28:17,220
probability of that branch.
546
00:28:17,220 --> 00:28:20,690
Now, how about the answer
to this question.
547
00:28:20,690 --> 00:28:25,350
What is the probability
that our radar is
548
00:28:25,350 --> 00:28:28,660
going to register something?
549
00:28:28,660 --> 00:28:32,800
OK, this is an event that can
happen in multiple ways.
550
00:28:32,800 --> 00:28:38,020
It's the event that consists
of this outcome.
551
00:28:38,020 --> 00:28:41,640
There is a plane and the radar
registers something together
552
00:28:41,640 --> 00:28:46,440
with this outcome, there is no
plane but the radar still
553
00:28:46,440 --> 00:28:48,470
registers something.
554
00:28:48,470 --> 00:28:52,650
So to find the probability of
this event, we need the
555
00:28:52,650 --> 00:28:56,940
individual probabilities
of the two outcomes.
556
00:28:56,940 --> 00:29:00,780
For the first outcome, we
already calculated it.
557
00:29:00,780 --> 00:29:03,870
For the second outcome, the
probability that this happens
558
00:29:03,870 --> 00:29:08,480
is going to be this probability
95% times 0.10,
559
00:29:08,480 --> 00:29:11,280
which is the conditional
probability for taking this
560
00:29:11,280 --> 00:29:15,070
branch, given that there
was no plane out there.
561
00:29:15,070 --> 00:29:18,080
So we just add the numbers.
562
00:29:18,080 --> 00:29:26,950
0.05 times 0.99 plus 0.95
times 0.1 and the
563
00:29:26,950 --> 00:29:31,720
final answer is 0.1445.
564
00:29:31,720 --> 00:29:32,410
OK.
565
00:29:32,410 --> 00:29:35,730
And now here's the interesting
question.
566
00:29:35,730 --> 00:29:41,480
Given that your radar recorded
something, how likely is it
567
00:29:41,480 --> 00:29:45,070
that there is an airplane
up there?
568
00:29:45,070 --> 00:29:46,810
Your radar registering
something --
569
00:29:46,810 --> 00:29:48,730
that can be caused
by two things.
570
00:29:48,730 --> 00:29:52,390
Either there's a plane there,
and your radar did its job.
571
00:29:52,390 --> 00:29:57,400
Or there was nothing, but your
radar fired a false alarm.
572
00:29:57,400 --> 00:30:01,690
What's the probability that this
is the case as opposed to
573
00:30:01,690 --> 00:30:05,370
that being the case?
574
00:30:05,370 --> 00:30:06,460
OK.
575
00:30:06,460 --> 00:30:10,510
The intuitive shortcut would
be that it should be the
576
00:30:10,510 --> 00:30:12,930
probability--
577
00:30:12,930 --> 00:30:15,820
you look at the relative odds
of these two elements and
578
00:30:15,820 --> 00:30:19,570
you use them to find out how
much more likely it is to be
579
00:30:19,570 --> 00:30:21,730
there as opposed
to being there.
580
00:30:21,730 --> 00:30:24,240
But instead of doing this,
let's just write down the
581
00:30:24,240 --> 00:30:26,570
definition and just use it.
582
00:30:26,570 --> 00:30:30,480
It's the probability of A and
B happening, divided by the
583
00:30:30,480 --> 00:30:34,250
probability of B. This is just
our definition of conditional
584
00:30:34,250 --> 00:30:35,540
probabilities.
585
00:30:35,540 --> 00:30:39,300
Now we have already found
the numerator.
586
00:30:39,300 --> 00:30:42,450
We have already calculated
the denominator.
587
00:30:42,450 --> 00:30:46,440
So we take the ratio of these
two numbers and we find the
588
00:30:46,440 --> 00:30:47,650
final answer --
589
00:30:47,650 --> 00:30:54,490
which is 0.34.
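The three calculations in the radar example can be replayed in a few lines of Python. This is only a sketch using the numbers quoted above; the variable names are made up for illustration.

```python
# Numbers from the radar example.
p_plane = 0.05             # P(A): a plane is present
p_blip_given_plane = 0.99  # P(B | A): radar registers when a plane is present
p_blip_given_none = 0.10   # P(B | not A): false alarm

# Multiplication rule: P(A and B) = P(A) * P(B | A)
p_plane_and_blip = p_plane * p_blip_given_plane

# Total probability: add the two ways a blip can occur.
p_blip = p_plane_and_blip + (1 - p_plane) * p_blip_given_none

# Definition of conditional probability: P(A | B) = P(A and B) / P(B)
p_plane_given_blip = p_plane_and_blip / p_blip

print(round(p_plane_and_blip, 4), round(p_blip, 4), round(p_plane_given_blip, 4))
# 0.0495 0.1445 0.3426
```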
590
00:30:54,490 --> 00:30:55,980
OK.
591
00:30:55,980 --> 00:30:59,040
There's this slightly
curious thing that's
592
00:30:59,040 --> 00:31:02,270
happened in this example.
593
00:31:02,270 --> 00:31:08,380
Doesn't this number feel
a little too low?
594
00:31:08,380 --> 00:31:10,700
My radar --
595
00:31:10,700 --> 00:31:13,820
So this is a conditional
probability, given that my
596
00:31:13,820 --> 00:31:17,110
radar said there is something
out there, that there is
597
00:31:17,110 --> 00:31:19,200
indeed something there.
598
00:31:19,200 --> 00:31:21,960
So it's sort of the probability
that our radar
599
00:31:21,960 --> 00:31:24,560
gave the correct answer.
600
00:31:24,560 --> 00:31:28,580
Now, the specs of our radar
were pretty good.
601
00:31:28,580 --> 00:31:31,460
In this situation, it gives
you the correct
602
00:31:31,460 --> 00:31:34,160
answer 99% of the time.
603
00:31:34,160 --> 00:31:36,020
In this situation, it gives
you the correct
604
00:31:36,020 --> 00:31:38,400
answer 90% of the time.
605
00:31:38,400 --> 00:31:39,730
So you would think
that your radar
606
00:31:39,730 --> 00:31:41,870
there is really reliable.
607
00:31:41,870 --> 00:31:47,730
But yet here the radar recorded
something, but the
608
00:31:47,730 --> 00:31:51,900
chance that the answer that
you get out of this is the
609
00:31:51,900 --> 00:31:55,180
right one, given that it
recorded something, the chance
610
00:31:55,180 --> 00:31:58,970
that there is an airplane
out there is only about 34%.
611
00:31:58,970 --> 00:32:01,980
So you cannot really rely on
the measurements from your
612
00:32:01,980 --> 00:32:06,650
radar, even though the specs of
the radar were really good.
613
00:32:06,650 --> 00:32:08,620
What's the reason for this?
614
00:32:08,620 --> 00:32:17,730
Well, the reason is that false
alarms are pretty common.
615
00:32:17,730 --> 00:32:20,110
Most of the time there's
nothing.
616
00:32:20,110 --> 00:32:23,750
And there's a ten percent
probability of false alarms.
617
00:32:23,750 --> 00:32:26,640
So there's roughly a ten percent
probability that in
618
00:32:26,640 --> 00:32:29,730
any given experiment, you
have a false alarm.
619
00:32:29,730 --> 00:32:33,450
And there is about a five
percent probability that
620
00:32:33,450 --> 00:32:37,090
something is out there and
your radar gets it.
621
00:32:37,090 --> 00:32:41,350
So when your radar records
something, it's actually more
622
00:32:41,350 --> 00:32:44,980
likely to be a false
alarm rather than
623
00:32:44,980 --> 00:32:46,860
being an actual airplane.
624
00:32:46,860 --> 00:32:49,100
This has probability ten
percent roughly.
625
00:32:49,100 --> 00:32:52,000
This has probability roughly
five percent.
626
00:32:52,000 --> 00:32:55,130
So conditional probabilities
are sometimes
627
00:32:55,130 --> 00:32:58,250
counter-intuitive in terms of
the answers that they get.
628
00:32:58,250 --> 00:33:01,210
And you can make similar
stories about doctors
629
00:33:01,210 --> 00:33:04,370
interpreting the results
of tests.
630
00:33:04,370 --> 00:33:07,560
So you tested positive for
a certain disease.
631
00:33:07,560 --> 00:33:11,260
Does it mean that you have
the disease necessarily?
632
00:33:11,260 --> 00:33:14,590
Well if that disease has been
eradicated from the face of
633
00:33:14,590 --> 00:33:17,900
the earth, testing positive
doesn't mean that you have the
634
00:33:17,900 --> 00:33:21,740
disease, even if the test
was designed to be
635
00:33:21,740 --> 00:33:23,320
a pretty good one.
636
00:33:23,320 --> 00:33:28,190
So unfortunately, doctors do get
it wrong also sometimes.
637
00:33:28,190 --> 00:33:29,990
And the reasoning that
comes in such
638
00:33:29,990 --> 00:33:32,290
situations is pretty subtle.
639
00:33:32,290 --> 00:33:34,890
Now for the rest of the lecture,
what we're going to
640
00:33:34,890 --> 00:33:40,710
do is to take this example where
we did three things and
641
00:33:40,710 --> 00:33:41,880
abstract them.
642
00:33:41,880 --> 00:33:44,540
These three trivial calculations
that's we just
643
00:33:44,540 --> 00:33:50,190
did are three very important,
very basic tools that you use
644
00:33:50,190 --> 00:33:53,350
to solve more general
probability problems.
645
00:33:53,350 --> 00:33:55,040
So what's the first one?
646
00:33:55,040 --> 00:33:58,040
We find the probability of a
composite event, two things
647
00:33:58,040 --> 00:34:01,300
happening, by multiplying
probabilities and conditional
648
00:34:01,300 --> 00:34:03,130
probabilities.
649
00:34:03,130 --> 00:34:08,639
More general version of this,
look at any situation, maybe
650
00:34:08,639 --> 00:34:10,860
involving lots and
lots of events.
651
00:34:10,860 --> 00:34:15,510
So here's a story that event A
may happen or may not happen.
652
00:34:15,510 --> 00:34:19,440
Given that A occurred, it's
possible that B happens or
653
00:34:19,440 --> 00:34:21,360
that B does not happen.
654
00:34:21,360 --> 00:34:25,280
Given that B also happens, it's
possible that the event C
655
00:34:25,280 --> 00:34:29,770
also happens or that event
C does not happen.
656
00:34:29,770 --> 00:34:33,400
And somebody specifies for you
a model by giving you all
657
00:34:33,400 --> 00:34:36,230
these conditional probabilities
along the way.
658
00:34:36,230 --> 00:34:39,570
Notice that we move along
the branches as the tree
659
00:34:39,570 --> 00:34:40,690
progresses.
660
00:34:40,690 --> 00:34:45,110
Any point in the tree
corresponds to certain events
661
00:34:45,110 --> 00:34:47,050
having happened.
662
00:34:47,050 --> 00:34:50,980
And then, given that this
has happened, we specify
663
00:34:50,980 --> 00:34:52,360
conditional probabilities.
664
00:34:52,360 --> 00:34:55,989
Given that this has happened,
how likely is it that C
665
00:34:55,989 --> 00:34:57,900
also occurs?
666
00:34:57,900 --> 00:35:00,890
Given a model of this kind, how
do we find the probability
667
00:35:00,890 --> 00:35:02,660
of this event?
668
00:35:02,660 --> 00:35:05,310
The answer is extremely
simple.
669
00:35:05,310 --> 00:35:09,930
All that you do is move along
with the tree and multiply
670
00:35:09,930 --> 00:35:12,950
conditional probabilities
along the way.
671
00:35:12,950 --> 00:35:16,900
So in terms of frequencies, how
often do all three things
672
00:35:16,900 --> 00:35:19,310
happen, A, B, and C?
673
00:35:19,310 --> 00:35:22,450
You first see how often
does A occur.
674
00:35:22,450 --> 00:35:24,860
Out of the times that
A occurs, how
675
00:35:24,860 --> 00:35:26,710
often does B occur?
676
00:35:26,710 --> 00:35:29,630
And out of the times where both
A and B have occurred,
677
00:35:29,630 --> 00:35:31,660
how often does C occur?
678
00:35:31,660 --> 00:35:34,390
And you can just multiply those
three frequencies with
679
00:35:34,390 --> 00:35:36,440
each other.
680
00:35:36,440 --> 00:35:39,740
What is the formal
proof of this?
681
00:35:39,740 --> 00:35:43,000
Well, the only thing we have in
our hands is the definition
682
00:35:43,000 --> 00:35:44,890
of conditional probabilities.
683
00:35:44,890 --> 00:35:49,660
So let's just use this.
684
00:35:49,660 --> 00:35:50,910
And--
685
00:35:50,910 --> 00:35:54,370
686
00:35:54,370 --> 00:35:55,000
OK.
687
00:35:55,000 --> 00:35:58,210
Now, the definition of
conditional probabilities
688
00:35:58,210 --> 00:36:00,770
tells us that the probability
of two things is the
689
00:36:00,770 --> 00:36:03,660
probability of one of them
times a conditional
690
00:36:03,660 --> 00:36:04,620
probability.
691
00:36:04,620 --> 00:36:05,850
Unfortunately, here we have the
692
00:36:05,850 --> 00:36:07,310
probability of three things.
693
00:36:07,310 --> 00:36:09,000
What can I do?
694
00:36:09,000 --> 00:36:13,570
I can put a parenthesis in here
and think of this as the
695
00:36:13,570 --> 00:36:18,640
probability of this and that
and apply our definition of
696
00:36:18,640 --> 00:36:20,300
conditional probabilities
here.
697
00:36:20,300 --> 00:36:23,920
The probability of two things
happening is the probability
698
00:36:23,920 --> 00:36:28,430
that the first happens times
the conditional probability
699
00:36:28,430 --> 00:36:34,070
that the second happens, given
A and B, given that the first
700
00:36:34,070 --> 00:36:35,330
one happened.
701
00:36:35,330 --> 00:36:38,850
So this is just the definition
of the conditional probability
702
00:36:38,850 --> 00:36:41,980
of an event, given
another event.
703
00:36:41,980 --> 00:36:44,270
That other event is a
composite one, but
704
00:36:44,270 --> 00:36:45,330
that's not an issue.
705
00:36:45,330 --> 00:36:47,300
It's just an event.
706
00:36:47,300 --> 00:36:50,040
And then we use the definition
of conditional probabilities
707
00:36:50,040 --> 00:36:56,290
once more to break this apart
and make it P(A), P(B given A)
708
00:36:56,290 --> 00:36:58,260
and then finally,
the last term.
709
00:36:58,260 --> 00:37:00,930
710
00:37:00,930 --> 00:37:01,270
OK.
711
00:37:01,270 --> 00:37:03,680
So this proves the formula
that I have up
712
00:37:03,680 --> 00:37:05,290
there on the slides.
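Written out in symbols, the spoken argument is the following chain. This is a LaTeX rendering of the derivation as described; the slide itself is not reproduced in the transcript.

```latex
\begin{align*}
P(A \cap B \cap C) &= P\big((A \cap B) \cap C\big) \\
&= P(A \cap B)\, P(C \mid A \cap B) \\
&= P(A)\, P(B \mid A)\, P(C \mid A \cap B).
\end{align*}
```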
713
00:37:05,290 --> 00:37:07,470
And if you wish to calculate
any other
714
00:37:07,470 --> 00:37:09,330
probability in this diagram.
715
00:37:09,330 --> 00:37:12,590
For example, if you want to
calculate this probability,
716
00:37:12,590 --> 00:37:15,580
you would still multiply the
conditional probabilities
717
00:37:15,580 --> 00:37:18,560
along the different branches
of the tree.
718
00:37:18,560 --> 00:37:22,360
In particular, here in this
branch, you would have the
719
00:37:22,360 --> 00:37:26,670
conditional probability of
C complement, given A
720
00:37:26,670 --> 00:37:29,790
intersection B complement,
and so on.
721
00:37:29,790 --> 00:37:32,070
So you write down probabilities
along all those
722
00:37:32,070 --> 00:37:35,940
tree branches and just multiply
them as you go.
723
00:37:35,940 --> 00:37:38,510
724
00:37:38,510 --> 00:37:44,450
So this was the first skill
that we are covering.
725
00:37:44,450 --> 00:37:46,690
What was the second one?
726
00:37:46,690 --> 00:37:53,240
What we did was to calculate
the total probability of a
727
00:37:53,240 --> 00:37:58,520
certain event B that
consisted of--
728
00:37:58,520 --> 00:38:02,820
was made up from different
possibilities, which
729
00:38:02,820 --> 00:38:05,580
corresponded to different
scenarios.
730
00:38:05,580 --> 00:38:08,870
So we wanted to calculate the
probability of this event B
731
00:38:08,870 --> 00:38:12,030
that consisted of those
two elements.
732
00:38:12,030 --> 00:38:13,280
Let's generalize.
733
00:38:13,280 --> 00:38:18,600
734
00:38:18,600 --> 00:38:23,080
So we have our big model.
735
00:38:23,080 --> 00:38:26,110
And this sample space
is partitioned
736
00:38:26,110 --> 00:38:27,410
in a number of sets.
737
00:38:27,410 --> 00:38:30,620
In our radar example, we had
a partition in two sets.
738
00:38:30,620 --> 00:38:33,600
Either a plane is there, or
a plane is not there.
739
00:38:33,600 --> 00:38:35,850
Since we're trying to
generalize, now I'm going to
740
00:38:35,850 --> 00:38:39,410
give you a picture for the case
of three possibilities or
741
00:38:39,410 --> 00:38:41,360
three possible scenarios.
742
00:38:41,360 --> 00:38:45,160
So whatever happens in the
world, there are three
743
00:38:45,160 --> 00:38:49,660
possible scenarios,
A1, A2, A3.
744
00:38:49,660 --> 00:38:54,695
So think of these as there's
nothing in the air, there's an
745
00:38:54,695 --> 00:38:58,190
airplane in the air, or there's
a flock of geese
746
00:38:58,190 --> 00:38:59,490
flying in the air.
747
00:38:59,490 --> 00:39:03,050
So there's three possible
scenarios.
748
00:39:03,050 --> 00:39:08,972
And then there's a certain event
B of interest, such as a
749
00:39:08,972 --> 00:39:12,800
radar records something or
doesn't record something.
750
00:39:12,800 --> 00:39:15,870
We specify this model by giving
751
00:39:15,870 --> 00:39:18,040
probabilities for the Ai's--
752
00:39:18,040 --> 00:39:20,690
753
00:39:20,690 --> 00:39:23,420
That's the probability of
the different scenarios.
754
00:39:23,420 --> 00:39:27,180
And somebody also gives us the
probabilities that this event
755
00:39:27,180 --> 00:39:31,010
B is going to occur, given
that the Ai-th
756
00:39:31,010 --> 00:39:33,480
scenario has occurred.
757
00:39:33,480 --> 00:39:36,230
Think of the Ai's
as scenarios.
758
00:39:36,230 --> 00:39:39,130
759
00:39:39,130 --> 00:39:43,110
And we want to calculate the
overall probability of the
760
00:39:43,110 --> 00:39:47,210
event B. What's happening
in this example?
761
00:39:47,210 --> 00:39:49,640
Perhaps, instead of this
picture, it's easier to
762
00:39:49,640 --> 00:39:54,970
visualize if I go back to the
picture I was using before.
763
00:39:54,970 --> 00:39:59,990
We have three possible
scenarios, A1, A2, A3.
764
00:39:59,990 --> 00:40:05,150
And under each scenario, B may
happen or B may not happen.
765
00:40:05,150 --> 00:40:11,360
766
00:40:11,360 --> 00:40:12,250
And so on.
767
00:40:12,250 --> 00:40:16,060
So here we have A2 intersection
B. And here we
768
00:40:16,060 --> 00:40:22,110
have A3 intersection B. In the
previous slide, we found how
769
00:40:22,110 --> 00:40:25,350
to calculate the probability
of any event of this kind,
770
00:40:25,350 --> 00:40:28,870
which is done by multiplying
probabilities here and
771
00:40:28,870 --> 00:40:31,100
conditional probabilities
there.
772
00:40:31,100 --> 00:40:34,320
Now we are asked to calculate
the total probability of the
773
00:40:34,320 --> 00:40:38,410
event B. The event B can happen
in three possible ways.
774
00:40:38,410 --> 00:40:39,900
It can happen here.
775
00:40:39,900 --> 00:40:41,700
It can happen there.
776
00:40:41,700 --> 00:40:43,780
And it can happen here.
777
00:40:43,780 --> 00:40:50,020
So this is our event B. It
consists of three elements.
778
00:40:50,020 --> 00:40:53,370
To calculate the total
probability of our event B,
779
00:40:53,370 --> 00:40:56,730
all we need to do is to add
these three probabilities.
780
00:40:56,730 --> 00:40:59,440
781
00:40:59,440 --> 00:41:03,510
So B is an event that consists
of these three elements.
782
00:41:03,510 --> 00:41:06,450
There are three ways
that B can happen.
783
00:41:06,450 --> 00:41:10,390
Either B happens together with
A1, or B happens together with
784
00:41:10,390 --> 00:41:13,030
A2, or B happens together
with A3.
785
00:41:13,030 --> 00:41:15,340
So we need to add the
probabilities of these three
786
00:41:15,340 --> 00:41:16,630
contingencies.
787
00:41:16,630 --> 00:41:18,980
For each one of those
contingencies, we can
788
00:41:18,980 --> 00:41:23,020
calculate its probability by
using the multiplication rule.
789
00:41:23,020 --> 00:41:27,580
So the probability of A1 and
B happening is this--
790
00:41:27,580 --> 00:41:30,030
It's the probability of A1
and then B happening
791
00:41:30,030 --> 00:41:32,020
given that A1 happens.
792
00:41:32,020 --> 00:41:36,140
The probability of this
contingency is found by taking
793
00:41:36,140 --> 00:41:39,470
the probability that A2 happens
times the conditional
794
00:41:39,470 --> 00:41:42,350
probability of B, given
that A2 happened.
795
00:41:42,350 --> 00:41:44,640
And similarly for
the third one.
796
00:41:44,640 --> 00:41:48,030
So this is the general rule
that we have here.
797
00:41:48,030 --> 00:41:50,830
The rule is written for the
case of three scenarios.
798
00:41:50,830 --> 00:41:54,020
But obviously, it has a
generalization for the case of
799
00:41:54,020 --> 00:41:57,440
four or five or more
scenarios.
800
00:41:57,440 --> 00:42:02,050
It gives you a way of breaking
up the calculation of an event
801
00:42:02,050 --> 00:42:06,740
that can happen in multiple ways
by considering individual
802
00:42:06,740 --> 00:42:09,720
probabilities for the different
ways that the event
803
00:42:09,720 --> 00:42:10,970
can happen.
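The rule can be sketched as a small Python function. The name `total_probability` and its argument layout are assumptions made for illustration; the check reuses the radar numbers as a two-scenario partition rather than inventing values for three scenarios.

```python
from fractions import Fraction

def total_probability(priors, likelihoods):
    """P(B) = sum_i P(A_i) * P(B | A_i) for a partition A_1, ..., A_n."""
    if sum(priors) != 1:
        raise ValueError("scenario probabilities must add to one")
    return sum(p * q for p, q in zip(priors, likelihoods))

# Radar example as a two-scenario partition: plane, no plane.
priors = [Fraction(5, 100), Fraction(95, 100)]
likelihoods = [Fraction(99, 100), Fraction(10, 100)]  # P(blip | scenario)
p_blip = total_probability(priors, likelihoods)
print(p_blip, float(p_blip))  # 289/2000 0.1445
```

Exact fractions are used so that the check that the priors add to one is not upset by floating-point rounding.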
804
00:42:10,970 --> 00:42:12,950
805
00:42:12,950 --> 00:42:14,640
OK.
806
00:42:14,640 --> 00:42:16,300
So--
807
00:42:16,300 --> 00:42:16,656
Yes?
808
00:42:16,656 --> 00:42:18,180
AUDIENCE: Does this
have to change for
809
00:42:18,180 --> 00:42:19,800
infinite sample space?
810
00:42:19,800 --> 00:42:20,760
JOHN TSISIKLIS: No.
811
00:42:20,760 --> 00:42:23,050
This is true whether
your sample space
812
00:42:23,050 --> 00:42:25,450
is infinite or finite.
813
00:42:25,450 --> 00:42:28,410
What I'm using in this argument
is that we have a
814
00:42:28,410 --> 00:42:33,670
partition into just three
scenarios, three events.
815
00:42:33,670 --> 00:42:36,720
So it's a partition to a finite
number of events.
816
00:42:36,720 --> 00:42:41,100
It's also true if it's a
partition into an infinite
817
00:42:41,100 --> 00:42:43,670
sequence of events.
818
00:42:43,670 --> 00:42:47,550
But that's, I think, one of the
theoretical problems at
819
00:42:47,550 --> 00:42:49,430
the end of the chapter.
820
00:42:49,430 --> 00:42:54,350
You probably may not
need it for now.
821
00:42:54,350 --> 00:42:57,550
OK, going back to
the story here.
822
00:42:57,550 --> 00:43:00,410
There are three possible
scenarios about what could
823
00:43:00,410 --> 00:43:03,390
happen in the world that
are captured here.
824
00:43:03,390 --> 00:43:08,660
Under each scenario,
event B may or may not happen.
825
00:43:08,660 --> 00:43:11,850
And so these probabilities tell
us the likelihoods of the
826
00:43:11,850 --> 00:43:13,270
different scenarios.
827
00:43:13,270 --> 00:43:17,640
These conditional probabilities
tell us how
828
00:43:17,640 --> 00:43:21,030
likely is it for B to happen
under one scenario, or the
829
00:43:21,030 --> 00:43:23,760
other scenario, or the
other scenario.
830
00:43:23,760 --> 00:43:28,510
The overall probability of
B is found by taking some
831
00:43:28,510 --> 00:43:32,380
combination of the probabilities
of B in the
832
00:43:32,380 --> 00:43:34,250
different possible
worlds, in the
833
00:43:34,250 --> 00:43:36,230
different possible scenarios.
834
00:43:36,230 --> 00:43:38,690
Under some scenario, B
may be very likely.
835
00:43:38,690 --> 00:43:42,280
Under another scenario, it
may be very unlikely.
836
00:43:42,280 --> 00:43:45,740
We take all of these into
account and weigh them
837
00:43:45,740 --> 00:43:48,590
according to the likelihood
of the scenarios.
838
00:43:48,590 --> 00:43:53,040
Now notice that since A1, A2,
and A3 form a partition,
839
00:43:53,040 --> 00:43:58,530
these three probabilities
have what property?
840
00:43:58,530 --> 00:44:00,810
Add to what?
841
00:44:00,810 --> 00:44:03,640
They add to one.
842
00:44:03,640 --> 00:44:06,020
So it's the probability of this
branch, plus this branch,
843
00:44:06,020 --> 00:44:07,240
plus this branch.
844
00:44:07,240 --> 00:44:11,660
So what we have here is a
weighted average of the
845
00:44:11,660 --> 00:44:15,120
probabilities of B in
the different worlds, or in
846
00:44:15,120 --> 00:44:16,690
the different scenarios.
847
00:44:16,690 --> 00:44:17,860
Special case.
848
00:44:17,860 --> 00:44:20,370
Suppose the three scenarios
are equally likely.
849
00:44:20,370 --> 00:44:25,300
So P of A1 equals 1/3, equals
to P of A2, P of A3.
850
00:44:25,300 --> 00:44:27,320
What are we saying here?
851
00:44:27,320 --> 00:44:31,750
In that case of equally likely
scenarios, the probability of
852
00:44:31,750 --> 00:44:35,920
B is the average of the
probabilities of B in the
853
00:44:35,920 --> 00:44:38,835
three different worlds, or in the
three different scenarios.
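In symbols, the weighted-average reading and its equally likely special case look like this. This is a LaTeX sketch of what is being described, using the lecture's notation A1, A2, A3 and B.

```latex
\begin{align*}
P(B) &= P(A_1)\,P(B \mid A_1) + P(A_2)\,P(B \mid A_2) + P(A_3)\,P(B \mid A_3),
\qquad P(A_1) + P(A_2) + P(A_3) = 1, \\
P(B) &= \tfrac{1}{3}\big[\,P(B \mid A_1) + P(B \mid A_2) + P(B \mid A_3)\,\big]
\quad \text{when } P(A_1) = P(A_2) = P(A_3) = \tfrac{1}{3}.
\end{align*}
```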
854
00:44:38,835 --> 00:44:42,950
855
00:44:42,950 --> 00:44:43,450
OK.
856
00:44:43,450 --> 00:44:46,630
So finally, the last step.
857
00:44:46,630 --> 00:44:53,800
If we go back again two slides,
the last thing that we
858
00:44:53,800 --> 00:44:57,510
did was to calculate a
conditional probability of
859
00:44:57,510 --> 00:45:01,760
this kind, probability of
A given B, which is a
860
00:45:01,760 --> 00:45:04,080
probability associated
essentially with
861
00:45:04,080 --> 00:45:05,630
an inference problem.
862
00:45:05,630 --> 00:45:09,840
Given that our radar recorded
something, how likely is it
863
00:45:09,840 --> 00:45:12,060
that the plane was up there?
864
00:45:12,060 --> 00:45:15,240
So we're trying to infer whether
a plane was up there
865
00:45:15,240 --> 00:45:18,610
or not, based on the information
that we've got.
866
00:45:18,610 --> 00:45:20,770
So let's generalize once more.
867
00:45:20,770 --> 00:45:24,560
868
00:45:24,560 --> 00:45:28,250
And we're just going to rewrite
what we did in that
869
00:45:28,250 --> 00:45:32,190
example, but in terms of general
symbols instead of the
870
00:45:32,190 --> 00:45:33,650
specific numbers.
871
00:45:33,650 --> 00:45:38,180
So once more, the model that we
have involves probabilities
872
00:45:38,180 --> 00:45:40,480
of the different scenarios.
873
00:45:40,480 --> 00:45:42,830
These we call them prior
probabilities.
874
00:45:42,830 --> 00:45:46,690
They are our initial beliefs
about how likely each
875
00:45:46,690 --> 00:45:49,360
scenario is to occur.
876
00:45:49,360 --> 00:45:54,500
We also have a model of our
measuring device that tells us
877
00:45:54,500 --> 00:45:58,110
under each scenario how likely
is it that our radar will
878
00:45:58,110 --> 00:46:00,140
register something or not.
879
00:46:00,140 --> 00:46:03,220
So we're given again these
conditional probabilities.
880
00:46:03,220 --> 00:46:04,330
We're given the conditional
881
00:46:04,330 --> 00:46:06,950
probabilities for these branches.
882
00:46:06,950 --> 00:46:11,050
Then we are told that
event B occurred.
883
00:46:11,050 --> 00:46:15,330
And on the basis of this new
information, we want to form
884
00:46:15,330 --> 00:46:18,510
some new beliefs about the
relative likelihood of the
885
00:46:18,510 --> 00:46:20,110
different scenarios.
886
00:46:20,110 --> 00:46:23,790
Going back again to our radar
example, an airplane was
887
00:46:23,790 --> 00:46:26,340
present with probability 5%.
888
00:46:26,340 --> 00:46:29,180
Given that the radar recorded
something, we're going to
889
00:46:29,180 --> 00:46:30,540
change our beliefs.
890
00:46:30,540 --> 00:46:34,870
Now, a plane is present
with probability 34%.
891
00:46:34,870 --> 00:46:38,270
Since the radar saw
something, we are going to
892
00:46:38,270 --> 00:46:41,880
revise our beliefs as to whether
the plane is out there
893
00:46:41,880 --> 00:46:43,130
or is not there.
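As a rough check of the revision from 5% to 34%, the calculation can be sketched in Python. The 5% prior is the figure quoted above; the detection rate of 0.99 and the false-alarm rate of 0.10 are assumptions made here because they are consistent with the quoted 34%, and the variable names are hypothetical.

```python
# Prior belief: a plane is present with probability 5% (quoted above).
p_plane = 0.05

# Assumed model of the measuring device (not stated in this passage).
p_radar_given_plane = 0.99     # assumed: radar registers when a plane is present
p_radar_given_no_plane = 0.10  # assumed: false alarm when no plane is present

# Total probability that the radar registers something.
p_radar = (p_plane * p_radar_given_plane
           + (1 - p_plane) * p_radar_given_no_plane)

# Revised belief: P(plane | radar registered something).
p_plane_given_radar = p_plane * p_radar_given_plane / p_radar

print(f"{p_plane_given_radar:.2f}")  # prints 0.34
```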
894
00:46:43,130 --> 00:46:46,040
895
00:46:46,040 --> 00:46:52,660
And so what we need to do is to
calculate the conditional
896
00:46:52,660 --> 00:46:57,290
probabilities of the different
scenarios, given the
897
00:46:57,290 --> 00:46:59,340
information that we got.
898
00:46:59,340 --> 00:47:02,330
So initially, we have these
probabilities for the
899
00:47:02,330 --> 00:47:04,000
different scenarios.
900
00:47:04,000 --> 00:47:06,870
Once we get the information,
we update them and we
901
00:47:06,870 --> 00:47:09,760
calculate our revised
probabilities or conditional
902
00:47:09,760 --> 00:47:14,130
probabilities given the
observation that we made.
903
00:47:14,130 --> 00:47:14,730
OK.
904
00:47:14,730 --> 00:47:15,760
So what do we do?
905
00:47:15,760 --> 00:47:17,620
We just use the definition
of conditional
906
00:47:17,620 --> 00:47:19,360
probabilities twice.
907
00:47:19,360 --> 00:47:22,490
By definition the conditional
probability is the probability
908
00:47:22,490 --> 00:47:25,740
of two things happening divided
by the probability of
909
00:47:25,740 --> 00:47:27,960
the conditioning event.
910
00:47:27,960 --> 00:47:30,480
Now, I'm using the definition
of conditional probabilities
911
00:47:30,480 --> 00:47:33,550
once more, or rather I use
the multiplication rule.
912
00:47:33,550 --> 00:47:35,970
The probability of two things
happening is the probability
913
00:47:35,970 --> 00:47:38,740
of the first times the probability
of the second given the first.
914
00:47:38,740 --> 00:47:41,190
So these are things that
are given to us.
915
00:47:41,190 --> 00:47:43,430
They're the probabilities of
the different scenarios.
916
00:47:43,430 --> 00:47:47,750
And it's the model of our
measuring device, which we
917
00:47:47,750 --> 00:47:51,810
assume to be available.
918
00:47:51,810 --> 00:47:53,450
And how about the denominator?
919
00:47:53,450 --> 00:47:57,780
This is the total probability of the
event B. But we just found
920
00:47:57,780 --> 00:48:01,140
that it's easy to calculate
using the formula in the
921
00:48:01,140 --> 00:48:02,400
previous slide.
922
00:48:02,400 --> 00:48:04,750
To find the overall probability
of event B
923
00:48:04,750 --> 00:48:08,260
occurring, we look at the
probabilities of B occurring
924
00:48:08,260 --> 00:48:11,560
under the different scenarios
and weigh them according to
925
00:48:11,560 --> 00:48:13,710
the probabilities of
all the scenarios.
926
00:48:13,710 --> 00:48:17,370
So in the end, we have a formula
for the conditional
927
00:48:17,370 --> 00:48:22,730
probability of the A's given B,
based on the data of the
928
00:48:22,730 --> 00:48:25,090
problem, which were
probabilities of the different
929
00:48:25,090 --> 00:48:27,360
scenarios and conditional
probabilities of
930
00:48:27,360 --> 00:48:29,490
B, given the A's.
931
00:48:29,490 --> 00:48:33,320
So what this calculation does
is, basically, it reverses the
932
00:48:33,320 --> 00:48:35,310
order of conditioning.
933
00:48:35,310 --> 00:48:39,000
We are given conditional
probabilities of this kind,
934
00:48:39,000 --> 00:48:42,950
where it's B given A and we
produce new conditional
935
00:48:42,950 --> 00:48:46,630
probabilities, where things
go the other way.
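The reversal of conditioning described above can be sketched as a small Python function. It takes the probabilities of the scenarios, P(Ai), and the conditional probabilities P(B given Ai), and returns P(Ai given B) for every scenario. The name `bayes_posterior` and the example numbers are hypothetical, chosen only for illustration.

```python
def bayes_posterior(priors, likelihoods):
    """Return [P(A_i | B)] given priors [P(A_i)] and likelihoods [P(B | A_i)]."""
    # Numerators: multiplication rule, P(A_i and B) = P(A_i) * P(B | A_i).
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: total probability of B, the sum over all scenarios.
    p_b = sum(joint)
    if p_b == 0:
        raise ValueError("P(B) is zero, so conditioning on B is undefined")
    return [j / p_b for j in joint]

# Hypothetical three-scenario model.
posteriors = bayes_posterior([0.5, 0.3, 0.2], [0.9, 0.5, 0.1])
print(posteriors)
```

The returned values are the revised beliefs about the scenarios, and like the original probabilities they add to one.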
936
00:48:46,630 --> 00:48:53,530
So schematically, what's
happening here is that we have a
937
00:48:53,530 --> 00:48:59,995
model of cause and
effect and--
938
00:48:59,995 --> 00:49:02,550
939
00:49:02,550 --> 00:49:09,840
So a scenario occurs and that
may cause B to happen or may
940
00:49:09,840 --> 00:49:11,880
not cause it to happen.
941
00:49:11,880 --> 00:49:14,495
So this is a cause/effect
model.
942
00:49:14,495 --> 00:49:17,300
943
00:49:17,300 --> 00:49:20,090
And it's modeled using
probabilities, such as
944
00:49:20,090 --> 00:49:23,350
probability of B given Ai.
945
00:49:23,350 --> 00:49:28,710
And what we want to do is
inference where we are told
946
00:49:28,710 --> 00:49:35,910
that B occurs, and we want
to infer whether Ai
947
00:49:35,910 --> 00:49:38,580
also occurred or not.
948
00:49:38,580 --> 00:49:42,050
And the appropriate
probabilities for that are the
949
00:49:42,050 --> 00:49:45,010
conditional probabilities
that Ai occurred,
950
00:49:45,010 --> 00:49:48,110
given that B occurred.
951
00:49:48,110 --> 00:49:52,250
So we're starting with a causal
model of our situation.
952
00:49:52,250 --> 00:49:57,220
It models from a given cause how
likely is a certain effect
953
00:49:57,220 --> 00:49:58,830
to be observed.
954
00:49:58,830 --> 00:50:02,920
And then we do inference, which
answers the question,
955
00:50:02,920 --> 00:50:06,730
given that the effect was
observed, how likely is it
956
00:50:06,730 --> 00:50:10,870
that the world was in this
particular situation or state
957
00:50:10,870 --> 00:50:12,940
or scenario.
958
00:50:12,940 --> 00:50:17,260
So the name of the Bayes rule
comes from Thomas Bayes, a
959
00:50:17,260 --> 00:50:20,750
British theologian back
in the 1700s.
960
00:50:20,750 --> 00:50:21,530
It actually--
961
00:50:21,530 --> 00:50:25,000
This calculation addresses
a basic problem, a basic
962
00:50:25,000 --> 00:50:30,230
philosophical problem, how one
can learn from experience or
963
00:50:30,230 --> 00:50:33,300
from experimental data and
in some systematic way.
964
00:50:33,300 --> 00:50:35,840
So the British at that time
were preoccupied with this
965
00:50:35,840 --> 00:50:36,710
type of question.
966
00:50:36,710 --> 00:50:41,200
Is there a basic theory
about how we can incorporate
967
00:50:41,200 --> 00:50:44,280
new knowledge into previous
knowledge?
968
00:50:44,280 --> 00:50:47,600
And this calculation made an
argument that, yes, it is
969
00:50:47,600 --> 00:50:50,100
possible to do that in
a systematic way.
970
00:50:50,100 --> 00:50:53,040
So the philosophical
underpinnings of this have a
971
00:50:53,040 --> 00:50:57,050
very long history and a lot
of discussion around them.
972
00:50:57,050 --> 00:51:00,560
But for our purposes, it's just
an extremely useful tool.
973
00:51:00,560 --> 00:51:03,550
And it's the foundation of
almost everything that gets
974
00:51:03,550 --> 00:51:07,190
done when you try to do
inference based on partial
975
00:51:07,190 --> 00:51:08,860
observations.
976
00:51:08,860 --> 00:51:09,690
Very well.
977
00:51:09,690 --> 00:51:10,940
Till next time.
978
00:51:10,940 --> 00:51:11,760