WEBVTT
00:00:09.082 --> 00:00:13.380
PATRICK WINSTON: Here we are,
down to the final sprint.
00:00:13.380 --> 00:00:15.480
Three to go.
00:00:15.480 --> 00:00:17.955
And we're going to take some of
the last three, maybe two
00:00:17.955 --> 00:00:22.640
of the last three, to talk a
little bit about stuff having
00:00:22.640 --> 00:00:24.920
to do with probabilistic
approaches--
00:00:24.920 --> 00:00:28.400
use of probability in artificial
intelligence.
00:00:28.400 --> 00:00:31.060
Now, for many of you, this
will be kind of a review,
00:00:31.060 --> 00:00:33.370
because I know many of you
learned about probability over
00:00:33.370 --> 00:00:37.230
the [? sand ?] table and
every year since then.
00:00:37.230 --> 00:00:39.780
But maybe we'll put another
little twist into it,
00:00:39.780 --> 00:00:42.340
especially toward the end of
the hour when we get into a
00:00:42.340 --> 00:00:52.160
discussion of that which has
come to be called belief nets.
00:00:52.160 --> 00:00:58.530
But first, I was driving in this
morning, and I was quite
00:00:58.530 --> 00:01:05.670
astonished to see, as I drove
in, this thing here.
00:01:05.670 --> 00:01:09.720
And my first reaction was, oh
my god, it's the world's
00:01:09.720 --> 00:01:12.710
greatest hack.
00:01:12.710 --> 00:01:16.525
And then I decided, well, maybe
it's a piece of art.
00:01:19.170 --> 00:01:22.420
So I'd like to address the
question of how I could come
00:01:22.420 --> 00:01:23.890
to grips with that issue.
00:01:23.890 --> 00:01:28.530
There's a distinct possibility
that this thing is a
00:01:28.530 --> 00:01:32.070
consequence of a hack, possibly
the result of
00:01:32.070 --> 00:01:33.320
some kind of art show.
00:01:36.740 --> 00:01:43.960
And in any event, some sort of
statue appeared, and statues
00:01:43.960 --> 00:01:46.190
don't usually appear
like that.
00:01:46.190 --> 00:01:48.710
So I got the possibility of
thinking about how all these
00:01:48.710 --> 00:01:54.150
things might occur together
or not occur together.
00:01:54.150 --> 00:02:00.670
So the natural thing is to
build myself some sort of
00:02:00.670 --> 00:02:05.640
table to keep track of
my observations.
00:02:05.640 --> 00:02:08.478
So I have three columns
in my table.
00:02:08.478 --> 00:02:13.680
I've got the possibility of
a statue appearing, a hack
00:02:13.680 --> 00:02:17.960
having occurred, and some
sort of art show.
00:02:17.960 --> 00:02:22.090
And so I can make a table of all
the combinations of those
00:02:22.090 --> 00:02:23.340
things that might appear.
00:02:28.290 --> 00:02:32.520
And I happen to have already
guessed that there are going
00:02:32.520 --> 00:02:36.010
to be eight rows in my table.
00:02:36.010 --> 00:02:37.310
So it's going to
look like this.
00:02:44.430 --> 00:02:47.460
And this is the set of
combinations in this row where
00:02:47.460 --> 00:02:50.110
none of that occurs at all.
00:02:50.110 --> 00:02:53.030
And down here is the situation
where all of
00:02:53.030 --> 00:02:54.150
those things occur.
00:02:54.150 --> 00:02:57.620
After all, it's possible that
we can have an art show and
00:02:57.620 --> 00:03:00.320
have a hack be a legitimate
participant in the art show.
00:03:00.320 --> 00:03:03.440
That's why we have
that final row.
00:03:03.440 --> 00:03:05.665
So we have all manner of
combinations in between.
00:03:08.810 --> 00:03:11.063
So those are those
combinations.
00:03:11.063 --> 00:03:21.880
Then the next column is F, F, T, T, F, F,
T, T, and the last is F, T, F, T, F, T, F, T.
00:03:21.880 --> 00:03:26.150
So it's plain that the number of
rows in the table, or these
00:03:26.150 --> 00:03:31.820
binary possibilities, is 2 to
the number of variables.
00:03:31.820 --> 00:03:33.610
And that could be
a big number.
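To make the row count concrete, here is a small Python sketch (an illustration, not the lecture's demo) that enumerates every True/False combination for the three variables in the statue example. The variable names are assumptions chosen for readability.

```python
from itertools import product

# Variable names follow the lecture's example; the column order is a choice.
variables = ["statue", "hack", "art_show"]

# Every combination of True/False values: one per row of the joint table.
rows = list(product([False, True], repeat=len(variables)))

for row in rows:
    print(" ".join("T" if value else "F" for value in row))

# The number of rows is 2 to the number of variables.
assert len(rows) == 2 ** len(variables)
print(len(rows), "rows")  # prints "8 rows"
```

Adding a fourth binary variable would double the list to 16 rows, which is the growth problem the lecture returns to later.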
00:03:33.610 --> 00:03:38.329
In fact, I'd love to do a bigger
example, but I don't
00:03:38.329 --> 00:03:40.795
have the patience to do it.
00:03:40.795 --> 00:03:44.940
But anyhow, what we might do,
in order to figure out how
00:03:44.940 --> 00:03:47.970
likely any of these combinations
are, is
00:03:47.970 --> 00:03:50.590
observe the area outside
the student center and the rest of
00:03:50.590 --> 00:03:54.370
campus over a long period of
time and keep track of what
00:03:54.370 --> 00:03:58.020
happens on 1,000 days.
00:03:58.020 --> 00:04:01.230
Or maybe 1,000 months
or 1,000 years.
00:04:01.230 --> 00:04:02.730
I don't know.
00:04:02.730 --> 00:04:04.980
The trouble is, these events
don't happen very often.
00:04:04.980 --> 00:04:08.690
So the period of time that I use
for measurement needs to
00:04:08.690 --> 00:04:09.440
be fairly long.
00:04:09.440 --> 00:04:10.950
Probably a day is not
long enough.
00:04:10.950 --> 00:04:16.720
But in any case, I can keep a
tally of how often I see these
00:04:16.720 --> 00:04:18.420
various combinations.
00:04:18.420 --> 00:04:22.330
So this one might be, for
example, 405, this one might
00:04:22.330 --> 00:04:26.820
be 45, this one might be
225, this one might
00:04:26.820 --> 00:04:30.000
be 40, and so on.
00:04:30.000 --> 00:04:32.720
And so having done all those
measurements, kept track of
00:04:32.720 --> 00:04:37.340
all that data, then I could
say, well, the probability
00:04:37.340 --> 00:04:45.350
that at any given time period
one of these things occurs
00:04:45.350 --> 00:04:48.280
will just be the frequency--
00:04:48.280 --> 00:04:49.670
the number of tallies
divided by the
00:04:49.670 --> 00:04:51.760
total number of tallies.
00:04:51.760 --> 00:04:53.400
So that would be a number
between 0 and 1.
00:04:56.330 --> 00:05:00.990
So that's the probability for
each of these events.
00:05:00.990 --> 00:05:03.690
And it's readily calculated
from my data.
00:05:03.690 --> 00:05:09.020
And once I do that, then I can
say that I got myself a joint
00:05:09.020 --> 00:05:14.160
probability table, and I could
perform all manner of miracles
00:05:14.160 --> 00:05:17.220
using that joint probability
table.
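A minimal Python sketch of turning tallies into a joint probability table and answering queries by summing or restricting rows. The tallies are made up for illustration (they are not the lecture's numbers), and `prob` is a hypothetical helper, not part of any library.

```python
# Hypothetical tallies, not the lecture's numbers: one count per
# (statue, hack, art_show) combination observed over some period.
tallies = {
    (False, False, False): 400,
    (False, False, True): 20,
    (False, True, False): 150,
    (False, True, True): 5,
    (True, False, False): 10,
    (True, False, True): 40,
    (True, True, False): 340,
    (True, True, True): 35,
}
STATUE, HACK, ART = 0, 1, 2  # column positions within each row key

total = sum(tallies.values())
# Joint probability table: each row's frequency is its tally over all tallies.
joint = {row: count / total for row, count in tallies.items()}


def prob(event, given=lambda row: True):
    """P(event | given): restrict the table to rows where `given` holds,
    then take the share of that probability where `event` also holds."""
    numerator = sum(p for row, p in joint.items() if given(row) and event(row))
    denominator = sum(p for row, p in joint.items() if given(row))
    return numerator / denominator


p_statue = prob(lambda r: r[STATUE])
p_statue_given_art = prob(lambda r: r[STATUE], given=lambda r: r[ART])
p_statue_given_art_and_hack = prob(
    lambda r: r[STATUE], given=lambda r: r[ART] and r[HACK]
)
print(round(p_statue, 3), round(p_statue_given_art, 3),
      round(p_statue_given_art_and_hack, 3))  # prints "0.425 0.75 0.875"
```

With these made-up counts the probability of a statue rises as more evidence is added, the same pattern described for the demo.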
00:05:17.220 --> 00:05:18.950
So let me perform a few
of those miracles,
00:05:18.950 --> 00:05:20.200
while we're at it.
00:05:28.240 --> 00:05:29.730
There's the table.
00:05:29.730 --> 00:05:36.010
And now, what I want to do
is I want to count up the
00:05:36.010 --> 00:05:39.280
probability in all the rows
where the statue appears.
00:05:39.280 --> 00:05:41.740
So that's going to be
the probability
00:05:41.740 --> 00:05:43.770
of the statue appearing.
00:05:43.770 --> 00:05:47.990
So I'll just check off those
four boxes there.
00:05:47.990 --> 00:05:49.800
And it looks like the
probability of the statue
00:05:49.800 --> 00:05:54.210
appearing is about 0.355
in my model.
00:05:54.210 --> 00:05:56.320
I don't think it's quite that
frequent, but this is a
00:05:56.320 --> 00:05:57.590
classroom exercise, right?
00:05:57.590 --> 00:06:01.670
So I can make up whatever
numbers I want.
00:06:04.560 --> 00:06:09.170
Now, I could say, well, what's
the probability of a statue
00:06:09.170 --> 00:06:15.220
occurring given that there's
an art show?
00:06:15.220 --> 00:06:18.180
Well, I can limit my tallies to
those in which art show is
00:06:18.180 --> 00:06:22.030
true, like so.
00:06:22.030 --> 00:06:23.210
And in that case,
the probability
00:06:23.210 --> 00:06:24.410
has just zoomed up.
00:06:24.410 --> 00:06:26.490
So if I know there's an art
show, there's a much higher
00:06:26.490 --> 00:06:30.885
probability that a statue
will appear.
00:06:33.540 --> 00:06:37.400
And if I know there's a hack as
well as an art show going
00:06:37.400 --> 00:06:41.750
on, it goes up higher
still to 0.9.
00:06:41.750 --> 00:06:43.180
We can also do other
kinds of things.
00:06:43.180 --> 00:06:46.060
For example, we can go back
to the original table.
00:06:46.060 --> 00:06:53.450
And instead of counting up the
probability we've got a
00:06:53.450 --> 00:06:56.310
statue, as we just did, we're
going to calculate the
00:06:56.310 --> 00:06:59.680
probability that there
is an art show.
00:06:59.680 --> 00:07:02.800
I guess that would be that one
and that one, not that one,
00:07:02.800 --> 00:07:03.890
but that one.
00:07:03.890 --> 00:07:09.310
So the probability there's an
art show is one chance in 10.
00:07:09.310 --> 00:07:11.030
Or we can do the same
thing with a hack.
00:07:11.030 --> 00:07:16.085
In that case, we get that one
off, that one on, that one
00:07:16.085 --> 00:07:17.565
off, that one on, that
one off, that one
00:07:17.565 --> 00:07:18.930
on, that one off.
00:07:18.930 --> 00:07:21.420
So the probability of a hack
on any given time period is
00:07:21.420 --> 00:07:25.230
about 50-50.
00:07:25.230 --> 00:07:27.750
So I've cooked up this little
demo so it does the "ands" of
00:07:27.750 --> 00:07:28.240
all these things.
00:07:28.240 --> 00:07:29.980
It could do "ors," too, with
a little more work.
00:07:29.980 --> 00:07:33.130
But these are just the "ands" of
these various combinations.
00:07:33.130 --> 00:07:35.270
Then you can ask more
complicated questions, like
00:07:35.270 --> 00:07:38.130
for example, you could say, what
is the probability of a
00:07:38.130 --> 00:07:43.540
hack given that there's
a statue?
00:07:43.540 --> 00:07:48.620
And that would be limiting the
calculations to those rows in
00:07:48.620 --> 00:07:50.075
which the statue
thing is true.
00:07:52.840 --> 00:07:57.430
And then what I get is 0.781.
00:07:57.430 --> 00:08:04.010
Now, what would happen to the
probability that it's a hack
00:08:04.010 --> 00:08:05.375
if I know that there's
an art show?
00:08:07.930 --> 00:08:09.280
Will that number
go up or down?
00:08:12.580 --> 00:08:15.010
Well, let's try it.
00:08:15.010 --> 00:08:17.600
Ah, it went down.
00:08:17.600 --> 00:08:21.530
So that's sort of because the
existence of the art show sort
00:08:21.530 --> 00:08:27.810
of explains why the statue
might be there.
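Here is a minimal sketch of this "explaining away" effect, using only the rows of a joint table in which the statue appeared. The tallies are hypothetical, chosen so the effect shows up; they are not the numbers from the lecture demo, and `p_hack` is an illustrative helper.

```python
# Hypothetical tallies (not from the lecture) for the rows of a joint
# table in which the statue appeared, keyed by (hack, art_show).
statue_rows = {
    (False, False): 10,
    (False, True): 40,
    (True, False): 340,
    (True, True): 35,
}


def p_hack(rows):
    """Fraction of the given tallies in which hack is True."""
    hacked = sum(n for (hack, _art), n in rows.items() if hack)
    return hacked / sum(rows.values())


# P(hack | statue): use every statue row.
p_hack_given_statue = p_hack(statue_rows)

# P(hack | statue, art_show): restrict further to rows with an art show.
with_art_show = {key: n for key, n in statue_rows.items() if key[1]}
p_hack_given_statue_and_art = p_hack(with_art_show)

print(round(p_hack_given_statue, 3), round(p_hack_given_statue_and_art, 3))
# prints "0.882 0.467": learning about the art show lowers the hack estimate
```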
00:08:27.810 --> 00:08:29.660
Now, just for fun, I'm going
to switch to another
00:08:29.660 --> 00:08:31.610
situation, very similar.
00:08:31.610 --> 00:08:39.100
And the situation here is that
a neighbor's dog often barks.
00:08:39.100 --> 00:08:40.770
It might be because
of a burglar.
00:08:40.770 --> 00:08:42.480
It might be because
of a raccoon.
00:08:42.480 --> 00:08:45.660
Sometimes, there's a burglar
and a raccoon.
00:08:45.660 --> 00:08:48.900
Sometimes, the damn
dog just barks.
00:08:48.900 --> 00:08:55.370
So let's do some calculations
there and calculate the
00:08:55.370 --> 00:08:58.310
probability that a raccoon
is true, similar to
00:08:58.310 --> 00:09:00.660
what we did last time.
00:09:00.660 --> 00:09:03.240
Looks like on any
given night--
00:09:03.240 --> 00:09:05.550
it's kind of a wooded area--
there's a high probability of
00:09:05.550 --> 00:09:08.600
a raccoon showing up.
00:09:08.600 --> 00:09:16.130
And then we can ask, well, what
is the probability of the
00:09:16.130 --> 00:09:19.670
dog barking given that
a raccoon shows up?
00:09:19.670 --> 00:09:21.960
Well, in that case, we want to
just limit the number of rows
00:09:21.960 --> 00:09:23.190
to those where a raccoon--
00:09:23.190 --> 00:09:26.430
or where the dog is barking.
00:09:26.430 --> 00:09:30.320
Looks like the probability of
the dog barking, knowing
00:09:30.320 --> 00:09:32.410
nothing else, is about
[? 3/7. ?]
00:09:36.790 --> 00:09:40.290
But now we want to know the
probability of the raccoon--
00:09:40.290 --> 00:09:43.030
that's these guys here
need to get checked.
00:09:43.030 --> 00:09:44.570
These are off.
00:09:44.570 --> 00:09:46.115
So that's the probability
of a raccoon.
00:09:49.400 --> 00:09:52.490
Did I get that right?
00:09:52.490 --> 00:09:54.340
Oh, that's probability
of a burglar.
00:09:54.340 --> 00:09:55.590
Sorry, that was too hard.
00:09:57.540 --> 00:09:59.550
So let me go back
and calculate--
00:09:59.550 --> 00:10:02.050
I want to get the probability
of a raccoon.
00:10:02.050 --> 00:10:10.930
That's true, false, true, false,
true, false, true.
00:10:10.930 --> 00:10:12.560
So the probability of
a raccoon, as I
00:10:12.560 --> 00:10:14.570
said before, is 0.5.
00:10:14.570 --> 00:10:18.220
Now, what happens to that
probability if I know the dog
00:10:18.220 --> 00:10:20.050
is barking?
00:10:20.050 --> 00:10:23.690
Well, all I need to do is limit
my rows to those where
00:10:23.690 --> 00:10:26.510
the dog is barking,
those bottom four.
00:10:26.510 --> 00:10:28.800
And I'll click that there, and
you'll notice all these
00:10:28.800 --> 00:10:33.690
tallies up above the midpoint
have gone to zero, because
00:10:33.690 --> 00:10:35.380
we're only considering
those cases
00:10:35.380 --> 00:10:37.590
where the dog is barking.
00:10:37.590 --> 00:10:40.140
In that case, the probability
that there's a raccoon--
00:10:40.140 --> 00:10:41.500
just the number of
tallies over the
00:10:41.500 --> 00:10:43.740
total number of tallies--
00:10:43.740 --> 00:10:48.150
gee, I guess it's 225 plus
50 divided by 370.
00:10:48.150 --> 00:10:51.050
That turns out to be 0.743.
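The arithmetic for this step, using only the tallies quoted here. This simply reproduces the 0.743 figure and compares it with the 0.5 prior mentioned earlier.

```python
# Tallies quoted in the lecture for the dog example: with the table
# restricted to the rows where the dog barks, the raccoon rows hold
# 225 and 50 tallies out of 370 in total.
raccoon_and_barking = 225 + 50
barking = 370
p_raccoon_given_barking = raccoon_and_barking / barking

# Prior quoted in the lecture: P(raccoon) = 0.5 before hearing the dog.
p_raccoon = 0.5

print(round(p_raccoon_given_barking, 3))  # prints 0.743
assert p_raccoon_given_barking > p_raccoon  # barking raises the probability
```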
00:10:51.050 --> 00:10:56.400
So about 75% of the time, the
dog barking is accounted for--
00:10:56.400 --> 00:10:59.680
well, the probability of a
raccoon under those conditions
00:10:59.680 --> 00:11:01.560
is pretty high.
00:11:01.560 --> 00:11:04.560
And now, once again, I'm going
to ask, well, what is the
00:11:04.560 --> 00:11:08.810
probability of a raccoon, given
that the dog is barking
00:11:08.810 --> 00:11:12.170
and there's a burglar?
00:11:12.170 --> 00:11:14.040
Any guess what will
happen there?
00:11:14.040 --> 00:11:18.120
We did this once before
with the statue.
00:11:18.120 --> 00:11:20.680
Probability first went up when
we saw the statue and then
00:11:20.680 --> 00:11:23.340
went down when we saw
another explanation.
00:11:23.340 --> 00:11:24.860
Here's this one here.
00:11:24.860 --> 00:11:25.930
Wow, look at that.
00:11:25.930 --> 00:11:29.830
It went back to its original
condition, its a priori
00:11:29.830 --> 00:11:32.120
probability.
00:11:32.120 --> 00:11:35.850
So somehow, the existence of
the burglar and the dog
00:11:35.850 --> 00:11:39.740
barking means that the
probability of a raccoon is
00:11:39.740 --> 00:11:42.402
just what it was before
we started this game.
00:11:42.402 --> 00:11:44.350
So those are kind of interesting
questions, and
00:11:44.350 --> 00:11:47.090
there's a lot we can do when we
have this table by way of
00:11:47.090 --> 00:11:50.220
those kinds of calculations.
00:11:50.220 --> 00:11:54.760
And in fact, the whole miracle
of probabilistic inference is
00:11:54.760 --> 00:11:55.480
right in front of us.
00:11:55.480 --> 00:11:58.130
It's the table.
00:11:58.130 --> 00:12:00.060
So why don't we go home?
00:12:00.060 --> 00:12:03.980
Well, because there's a little
problem with this table--
00:12:03.980 --> 00:12:06.370
with these two tables that
I've shown you by way of
00:12:06.370 --> 00:12:08.210
illustration.
00:12:08.210 --> 00:12:17.250
And the problem is that there
are a lot of rows.
00:12:17.250 --> 00:12:19.160
And I had a hard time making
up those numbers.
00:12:19.160 --> 00:12:21.910
I didn't have the patience to
wait and make observations.
00:12:21.910 --> 00:12:23.580
That would take too long.
00:12:23.580 --> 00:12:25.910
So I had to kind of
make some guesses.
00:12:25.910 --> 00:12:29.730
And I could kind of manage
it with eight rows--
00:12:29.730 --> 00:12:31.290
those up there.
00:12:31.290 --> 00:12:33.740
I could put in some tallies.
00:12:33.740 --> 00:12:35.670
It wasn't that big of a deal.
00:12:35.670 --> 00:12:39.330
So I got myself all those
eight numbers
00:12:39.330 --> 00:12:42.760
up there like that.
00:12:42.760 --> 00:12:48.130
And similarly, for the art show
calculations, produced
00:12:48.130 --> 00:12:50.000
eight numbers.
00:12:50.000 --> 00:12:53.250
But what if I added something
else to the mix?
00:12:53.250 --> 00:12:57.310
What if I added the
day of the week or
00:12:57.310 --> 00:12:59.530
what I had for breakfast?
00:12:59.530 --> 00:13:03.000
Each of those things would
double the number of rows,
00:13:03.000 --> 00:13:06.350
if they're binary variables.
00:13:06.350 --> 00:13:12.860
So if I have to consider 10
influences all working
00:13:12.860 --> 00:13:14.790
together, then I'd have
2 to the 10th.
00:13:14.790 --> 00:13:18.810
I'd have about 1,000 numbers
to deal with.
00:13:18.810 --> 00:13:21.020
And that would be hard.
00:13:21.020 --> 00:13:23.110
But if I had a joint probability
table, then I can
00:13:23.110 --> 00:13:24.850
do these kinds of miracles.
00:13:24.850 --> 00:13:27.640
But Dave, if I could have this
little projector now, please.
00:13:31.570 --> 00:13:34.780
I just want to emphasize that
although we're talking about
00:13:34.780 --> 00:13:38.430
probabilistic inference, and
it's a very powerful tool,
00:13:38.430 --> 00:13:41.500
it's not the only tool
we need in our bag.
00:13:41.500 --> 00:13:44.080
Trouble with most ideas in
artificial intelligence is
00:13:44.080 --> 00:13:46.630
that their hardcore proponents
think that they're the only
00:13:46.630 --> 00:13:48.420
thing to do.
00:13:48.420 --> 00:13:52.720
And probabilistic inference
has a role to play in
00:13:52.720 --> 00:13:54.550
developing a theory of
human intelligence.
00:13:54.550 --> 00:13:56.880
And it certainly has a practical
value, but it's not
00:13:56.880 --> 00:13:58.070
the only thing.
00:13:58.070 --> 00:14:01.300
And to illustrate that point,
I'd like to imagine for a few
00:14:01.300 --> 00:14:10.920
moments that MIT were founded in
1861 BC instead of 1861 AD.
00:14:10.920 --> 00:14:15.660
And if that were so, then it
might be the case that there
00:14:15.660 --> 00:14:19.220
would be a research program
on what floats.
00:14:19.220 --> 00:14:21.980
And this, of course, would be
a problem in experimental
00:14:21.980 --> 00:14:25.880
physics, and we could imagine
that those people back there
00:14:25.880 --> 00:14:29.800
in that early MIT would, being
experimentally minded, try
00:14:29.800 --> 00:14:30.900
some things.
00:14:30.900 --> 00:14:33.210
Oh, I didn't know that's
what happened.
00:14:33.210 --> 00:14:35.660
It looks like chalk floats.
00:14:35.660 --> 00:14:38.384
Here's a rock.
00:14:38.384 --> 00:14:40.710
No, it didn't float.
00:14:40.710 --> 00:14:43.300
Here's some money.
00:14:43.300 --> 00:14:44.910
Doesn't float.
00:14:44.910 --> 00:14:46.160
Here's a pencil.
00:14:48.630 --> 00:14:49.600
No, it doesn't float.
00:14:49.600 --> 00:14:51.690
Here's a pen.
00:14:51.690 --> 00:14:54.075
Here's a piece of tin foil
I got from Kendra.
00:14:54.075 --> 00:14:55.490
That floats.
00:14:55.490 --> 00:14:56.130
That's a metal.
00:14:56.130 --> 00:14:57.180
The other stuff's metal, too.
00:14:57.180 --> 00:14:58.830
Now I'm really getting
confused.
00:14:58.830 --> 00:15:01.530
Here's a little wad of paper.
00:15:01.530 --> 00:15:04.670
Here's a cell ph--
no, actually,
00:15:04.670 --> 00:15:05.660
I've tried that before.
00:15:05.660 --> 00:15:06.910
They don't float.
00:15:06.910 --> 00:15:08.240
And they also don't work
afterward, either.
00:15:10.950 --> 00:15:16.840
I don't need to do any of that
in the MIT of 1861 AD and
00:15:16.840 --> 00:15:19.330
beyond, because I know
that Archimedes
00:15:19.330 --> 00:15:20.410
worked this all out.
00:15:20.410 --> 00:15:22.300
And all I have to do is measure
the volume of the
00:15:22.300 --> 00:15:27.380
stuff, divide that by the
weight, and if that ratio is
00:15:27.380 --> 00:15:29.970
big enough, then the
thing will float.
00:15:29.970 --> 00:15:32.220
But back in the old days, I
would have to try a lot of
00:15:32.220 --> 00:15:35.470
stuff and make a big table,
taking into account such
00:15:35.470 --> 00:15:40.400
factors as how hard it is, how
big it is, how heavy it is,
00:15:40.400 --> 00:15:42.740
whether it's alive or not.
00:15:42.740 --> 00:15:44.540
Most things that are
alive float.
00:15:44.540 --> 00:15:46.290
Some don't.
00:15:46.290 --> 00:15:49.030
Fish don't, for instance.
00:15:49.030 --> 00:15:52.790
So it would be foolhardy
to do that.
00:15:52.790 --> 00:15:56.580
That's sort of a probabilistic
inference.
00:15:56.580 --> 00:15:58.430
On the other hand, there are
lots of things where I don't
00:15:58.430 --> 00:16:00.480
know all the stuff I need to
know in order to make the
00:16:00.480 --> 00:16:01.600
calculation.
00:16:01.600 --> 00:16:03.490
I know all the stuff I need to
know in order to decide if
00:16:03.490 --> 00:16:06.530
something floats, but not all
the stuff I need to know in
00:16:06.530 --> 00:16:14.210
order, for example, to decide
if the child of a Republican
00:16:14.210 --> 00:16:17.860
is likely to be a Republican.
00:16:17.860 --> 00:16:20.390
There are a lot of subtle
influences there, and it is
00:16:20.390 --> 00:16:23.365
the case that the children of
Republicans and the children
00:16:23.365 --> 00:16:26.010
of Democrats are more likely to
share the political party
00:16:26.010 --> 00:16:28.360
of their parents.
00:16:28.360 --> 00:16:30.280
But I don't have any direct
way of calculating whether
00:16:30.280 --> 00:16:32.310
that will be true or not.
00:16:32.310 --> 00:16:35.590
All I can do in that case is
what I've done over here, is
00:16:35.590 --> 00:16:38.950
do some measurements, get some
frequencies, take some
00:16:38.950 --> 00:16:42.440
snapshots of the way the world
is and incorporate that into a
00:16:42.440 --> 00:16:45.630
set of probabilities that can
help me determine if any given
00:16:45.630 --> 00:16:50.100
parent is a Republican, given
that I've observed the voting
00:16:50.100 --> 00:16:52.930
behavior of their children.
00:16:52.930 --> 00:16:56.010
So probability has a place,
but it's not the
00:16:56.010 --> 00:16:57.760
only tool we need.
00:16:57.760 --> 00:17:00.250
And that is an important
preamble to all the stuff
00:17:00.250 --> 00:17:02.200
we're going to do today.
00:17:02.200 --> 00:17:04.770
Now, we'd really be through,
because this joint probability
00:17:04.770 --> 00:17:08.240
table is all that there is to
it, except for the fact that we
00:17:08.240 --> 00:17:13.290
can't record all those
numbers, and it quickly becomes
00:17:13.290 --> 00:17:16.579
a pain to
guess at them.
00:17:16.579 --> 00:17:19.348
There are two ways to think
about all this.
00:17:19.348 --> 00:17:22.880
We can think about these
probabilities as probabilities
00:17:22.880 --> 00:17:25.230
that come out of looking
at some data.
00:17:25.230 --> 00:17:28.180
That's a frequentist view
of the probabilities.
00:17:28.180 --> 00:17:30.310
Or we could say, well, we can't
do those measurements.
00:17:30.310 --> 00:17:32.500
So I can just make them up.
00:17:32.500 --> 00:17:34.820
That's sort of the subjective
view of where these
00:17:34.820 --> 00:17:37.530
probabilities come from.
00:17:37.530 --> 00:17:41.480
And in some cases, some people
like to talk about natural
00:17:41.480 --> 00:17:44.790
propensities, like in
quantum mechanics.
00:17:44.790 --> 00:17:47.330
But for our purposes, we either
make them up, or we do
00:17:47.330 --> 00:17:49.140
some tallying.
00:17:49.140 --> 00:17:52.440
Trouble is, we can't deal
with this kind of table.
00:17:52.440 --> 00:17:54.920
So as a consequence of not being
able to deal with this
00:17:54.920 --> 00:18:00.020
kind of table, a gigantic
industry has emerged for
00:18:00.020 --> 00:18:04.370
dealing with probabilities
without the need to work up
00:18:04.370 --> 00:18:06.340
this full table.
00:18:06.340 --> 00:18:07.570
And that's where we're
going to go for
00:18:07.570 --> 00:18:08.820
the rest of the hour.
00:18:12.620 --> 00:18:15.408
And here's the path we're
going to take.
00:18:15.408 --> 00:18:18.050
We're going to start with a
basic overview of
00:18:18.050 --> 00:18:19.380
probability.
00:18:19.380 --> 00:18:23.080
Then we're going to move
ourselves step by step toward
00:18:23.080 --> 00:18:26.500
the so-called belief networks,
which make it possible to make
00:18:26.500 --> 00:18:29.550
this a practical tool.
00:18:29.550 --> 00:18:31.120
So let us begin.
00:18:31.120 --> 00:18:33.950
The first thing is basic
probability.
00:18:33.950 --> 00:18:36.660
Let us say basic.
00:18:39.270 --> 00:18:41.060
And basic probability--
00:18:41.060 --> 00:18:44.830
all probability flows from
a small number of axioms.
00:18:44.830 --> 00:18:50.730
The probability of some
event a has got to be
00:18:50.730 --> 00:18:54.400
greater than or equal to 0 and
less than or equal to 1.
00:18:54.400 --> 00:18:55.890
That's axiom number one.
00:18:59.460 --> 00:19:02.745
In a binary world, things have
a probability of being true.
00:19:02.745 --> 00:19:05.190
Some have a probability
of being false.
00:19:05.190 --> 00:19:07.530
But the true event doesn't have
any possibility of being
00:19:07.530 --> 00:19:12.470
anything other than true, so
the probability of true is
00:19:12.470 --> 00:19:16.850
equal to 1, and the probability
of false--
00:19:16.850 --> 00:19:20.480
the false event, the
false condition--
00:19:20.480 --> 00:19:24.430
has no possibility of being
true, so that's 0.
00:19:24.430 --> 00:19:31.190
Then the third of the axioms
of probability is that the
00:19:31.190 --> 00:19:39.760
probability of a plus the
probability of b minus the
00:19:39.760 --> 00:19:47.510
probability of a and
b is equal to the
00:19:47.510 --> 00:19:51.510
probability of a or b.
00:19:54.040 --> 00:19:56.380
Yeah, that makes sense, right?
00:19:56.380 --> 00:19:58.900
I guess it would make more sense
if I didn't switch my
00:19:58.900 --> 00:20:00.730
notation in midstream--
00:20:00.730 --> 00:20:03.970
a and b.
00:20:03.970 --> 00:20:06.290
So those are the axioms.
Mathematicians love to start
00:20:06.290 --> 00:20:08.180
that way, and they can derive
everything there is to
00:20:08.180 --> 00:20:09.040
derive from that.
00:20:09.040 --> 00:20:12.210
But I never can deal with
stuff that way.
00:20:12.210 --> 00:20:14.270
I have to draw a picture and
think of this stuff in a more
00:20:14.270 --> 00:20:16.530
intuitionist type of way.
00:20:16.530 --> 00:20:20.530
So that's the formal approach
to dealing with probability,
00:20:20.530 --> 00:20:28.810
and it's mirrored by intuitions
that have to do
00:20:28.810 --> 00:20:34.410
with discussions of spaces,
like so, in which we have
00:20:34.410 --> 00:20:42.120
circles, or areas, representing
a and b.
00:20:42.120 --> 00:20:45.260
And to keep my notation
consistent,
00:20:45.260 --> 00:20:46.510
I'll make those lowercase.
00:20:49.330 --> 00:20:53.580
So you can think of those as
spaces of all possible worlds
00:20:53.580 --> 00:20:54.990
in which these things
might occur.
00:20:54.990 --> 00:20:58.100
Or you can think of them
as sample spaces.
00:20:58.100 --> 00:21:01.290
But in any event, you associate
with the probability
00:21:01.290 --> 00:21:06.330
of a the size of this area here
relative to the total
00:21:06.330 --> 00:21:08.860
area in the rectangle--
00:21:08.860 --> 00:21:10.850
the universe.
00:21:10.850 --> 00:21:15.570
So the probability of a is the
size of this circle divided by
00:21:15.570 --> 00:21:18.690
the size of this rectangle
in this picture.
00:21:18.690 --> 00:21:22.210
So now all these axioms
make sense.
00:21:22.210 --> 00:21:25.250
The probability that a is
certain is just when that
00:21:25.250 --> 00:21:29.010
fills up the whole thing, and
there's no other place for a
00:21:29.010 --> 00:21:31.590
sample to be, that means
it has to be a.
00:21:31.590 --> 00:21:35.570
So that probability goes
all the way up to 1.
00:21:35.570 --> 00:21:39.450
On the other hand, if the size
of a is just an infinitesimal
00:21:39.450 --> 00:21:44.230
dot, then the chance of landing
in that world is 0.
00:21:44.230 --> 00:21:46.900
That's the bound on
the other end.
00:21:46.900 --> 00:21:48.860
So this--
00:21:48.860 --> 00:21:50.900
axiom number one-- makes
sense in terms of that
00:21:50.900 --> 00:21:52.250
picture over there.
00:21:52.250 --> 00:21:54.290
Likewise, axiom number two.
00:21:54.290 --> 00:21:57.500
What about axiom number three?
00:21:57.500 --> 00:22:03.150
Does that make sense in terms
of all this stuff?
00:22:03.150 --> 00:22:08.430
And the answer is, sure, because
we can just look at
00:22:08.430 --> 00:22:12.850
those areas with a little
bit of colored chalk.
00:22:12.850 --> 00:22:16.920
And so the probability of a
is just this area here.
00:22:16.920 --> 00:22:21.300
The probability of b
is this area here.
00:22:21.300 --> 00:22:23.330
And if we want to know the
probability that we're in
00:22:23.330 --> 00:22:27.700
either a or b, then we just have
to add up those areas.
00:22:27.700 --> 00:22:30.040
But when we add up those areas,
this intersection part
00:22:30.040 --> 00:22:32.260
is added in twice.
00:22:32.260 --> 00:22:35.675
So we've got to subtract that
off in order to make this
00:22:35.675 --> 00:22:38.300
thing make a rational equation,
so that makes sense.
00:22:38.300 --> 00:22:40.230
And axiom three makes
sense, just as
00:22:40.230 --> 00:22:43.110
axioms one and two did.
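A small Python check of the three axioms on a finite sample space, assuming equally likely outcomes so that "area" is just a count of outcomes. The universe and the events `a` and `b` are made-up examples.

```python
from fractions import Fraction

# A hypothetical finite universe of equally likely outcomes (the rectangle),
# with events a and b as subsets of it (the circles).
universe = set(range(12))
a = {0, 1, 2, 3, 4, 5}
b = {4, 5, 6, 7}


def P(event):
    """Probability as area: the event's size relative to the universe."""
    return Fraction(len(event & universe), len(universe))


# Axiom 1: every probability lies between 0 and 1 inclusive.
assert 0 <= P(a) <= 1 and 0 <= P(b) <= 1
# Axiom 2: the certain event has probability 1, the impossible event 0.
assert P(universe) == 1 and P(set()) == 0
# Axiom 3: a & b is the overlap (counted twice), a | b is set union (a or b).
assert P(a) + P(b) - P(a & b) == P(a | b)
print(P(a), P(b), P(a & b), P(a | b))  # prints "1/2 1/3 1/6 2/3"
```

Exact fractions are used so the equality in axiom 3 holds exactly rather than up to floating-point error.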
00:22:43.110 --> 00:22:45.370
So that's all there is
to basic probability.
00:22:45.370 --> 00:22:48.060
And now you could do all sorts
of algebra on that, and it's
00:22:48.060 --> 00:22:51.420
elegant, because it's like
circuit theory or
00:22:51.420 --> 00:22:54.240
electromagnetism, because
from a very
00:22:54.240 --> 00:22:55.970
small number of axioms--
00:22:55.970 --> 00:22:57.730
in this case three--
00:22:57.730 --> 00:23:02.180
you can build an elegant
mathematical system.
00:23:02.180 --> 00:23:03.910
And that's what probability
subjects do.
00:23:03.910 --> 00:23:06.760
But we're not going to go there,
because we're sort of
00:23:06.760 --> 00:23:10.740
focused on getting down to a
point where we can deal with
00:23:10.740 --> 00:23:12.570
that joint probability
table that we
00:23:12.570 --> 00:23:14.260
currently can't deal with.
00:23:14.260 --> 00:23:17.050
So we're not going to go into
a whole lot of algebra with
00:23:17.050 --> 00:23:17.810
these things.
00:23:17.810 --> 00:23:22.620
Just what we need in order to
go through that network.
00:23:22.620 --> 00:23:25.440
So the next thing we need to
deal with is conditional
00:23:25.440 --> 00:23:27.220
probability.
00:23:27.220 --> 00:23:30.360
And whereas those are axioms,
this is a definition.
00:23:35.100 --> 00:23:41.620
We say that the probability of
a given b is equal to, by
00:23:41.620 --> 00:23:46.760
definition, the probability
of a and b.
00:23:46.760 --> 00:23:48.880
I'm using that comma notation
to mean "and,"
00:23:48.880 --> 00:23:51.760
as is conventional
in the field.
00:23:51.760 --> 00:23:57.190
And then we're going to divide
that by the probability of b.
00:23:57.190 --> 00:23:59.390
You can take that as a
definition, and then it's just
00:23:59.390 --> 00:24:01.970
a little bit of mysterious
algebra.
00:24:01.970 --> 00:24:05.600
Or you could do like we did up
there and take an intuitionist
00:24:05.600 --> 00:24:13.100
approach and ask what that
stuff means in terms of a
00:24:13.100 --> 00:24:17.560
circle diagram and some
sort of space.
00:24:17.560 --> 00:24:18.960
And let's see, what
does that mean?
00:24:18.960 --> 00:24:23.320
It means that we're trying to
restrict the probability of a
00:24:23.320 --> 00:24:29.370
to those circumstances where
b is known to be so.
00:24:29.370 --> 00:24:30.620
And we're going to say that--
00:24:33.080 --> 00:24:37.810
we've got this part here,
and then we've got the
00:24:37.810 --> 00:24:41.370
intersection of a with b.
00:24:41.370 --> 00:24:44.680
And so it does make sense as a
definition, because it says
00:24:44.680 --> 00:24:47.210
that if you've got b, then the
probability that you're going
00:24:47.210 --> 00:24:50.190
to get a is the size of
that intersection--
00:24:50.190 --> 00:24:52.140
the pink and orange stuff--
00:24:52.140 --> 00:24:55.240
divided by the whole of b.
00:24:55.240 --> 00:24:58.450
So it's as if we restricted the
universe of consideration
00:24:58.450 --> 00:25:00.950
to just that part of the
original universe
00:25:00.950 --> 00:25:03.220
as covered by b.
00:25:03.220 --> 00:25:07.370
So that makes sense
as a definition.
00:25:07.370 --> 00:25:14.430
And we can rewrite that, of
course, as P of a and b is
00:25:14.430 --> 00:25:19.190
equal to the probability
of a given b times the
00:25:19.190 --> 00:25:21.000
probability of b.
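A minimal Python sketch of the definition and the product rule. It assumes a made-up joint table over two binary events a and b; the numbers are illustrative, not from the lecture. Exact fractions avoid rounding noise:

```python
from fractions import Fraction

# Hypothetical joint distribution over two binary events a and b.
# Keys are (a, b) truth values; the four probabilities sum to 1.
joint = {
    (True, True): Fraction(2, 10),
    (True, False): Fraction(1, 10),
    (False, True): Fraction(3, 10),
    (False, False): Fraction(4, 10),
}

def p_b(joint):
    """P(b): sum the rows where b is true."""
    return sum(p for (a, b), p in joint.items() if b)

def p_a_and_b(joint):
    """P(a, b): the single row where both are true."""
    return joint[(True, True)]

def p_a_given_b(joint):
    """Definition: P(a | b) = P(a, b) / P(b)."""
    return p_a_and_b(joint) / p_b(joint)

cond = p_a_given_b(joint)
print(cond)  # 2/5
# Product rule: P(a, b) = P(a | b) * P(b)
assert cond * p_b(joint) == p_a_and_b(joint)
```

Restricting to b shrinks the universe to the rows where b is true (total 1/2), and a occupies 1/5 of the original universe inside that region, giving 2/5.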
00:25:23.740 --> 00:25:27.370
That's all basic stuff.
00:25:27.370 --> 00:25:31.370
Now, we do want to do a little
bit of algebra here, because I
00:25:31.370 --> 00:25:34.570
want to consider not just two
cases, but what if we divide
00:25:34.570 --> 00:25:37.960
this space up into
three parts?
00:25:37.960 --> 00:25:44.310
Then we'll say that the
probability of a, b, and c is
00:25:44.310 --> 00:25:45.560
equal to what?
00:25:48.900 --> 00:25:51.000
Well, there are lots of ways
to think about that.
00:25:51.000 --> 00:25:54.360
But one way to think about it is
that we are restricting the
00:25:54.360 --> 00:25:56.410
universe to that part
of the world where b
00:25:56.410 --> 00:25:59.530
and c are both true.
00:25:59.530 --> 00:26:03.380
So let's say that y is
equal to b and c--
00:26:08.330 --> 00:26:12.980
the intersection of b and c,
where b and c are both true.
00:26:12.980 --> 00:26:18.570
Then we can use this formula
over here to say that
00:26:18.570 --> 00:26:24.270
probability of a, b, and c is
equal to the probability of a
00:26:24.270 --> 00:26:33.670
and y, which is equal to the
probability of a given y times
00:26:33.670 --> 00:26:36.090
the probability of y.
00:26:36.090 --> 00:26:42.260
And then we can expand that back
out and say that it's P of a
00:26:42.260 --> 00:26:48.020
given b and c
00:26:48.020 --> 00:26:52.500
times the probability
of y, and since y is b and c,
00:26:52.500 --> 00:26:57.250
that's the probability of b
and c, like so.
00:27:00.990 --> 00:27:02.720
Ah, but wait--
00:27:02.720 --> 00:27:06.365
we can run this idea over that
one, too, and we can say that
00:27:06.365 --> 00:27:09.760
this whole works is equal to the
probability of a given b
00:27:09.760 --> 00:27:16.130
and c times the probability
of b given c times the
00:27:16.130 --> 00:27:19.480
probability of c.
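The same three-variable derivation, written symbolically with y standing for the intersection of b and c, as in the spoken steps:

```latex
\begin{aligned}
P(a, b, c) &= P(a, y), \qquad y \equiv b \wedge c \\
           &= P(a \mid y)\, P(y) \\
           &= P(a \mid b, c)\, P(b, c) \\
           &= P(a \mid b, c)\, P(b \mid c)\, P(c)
\end{aligned}
```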
00:27:19.480 --> 00:27:22.086
And now, when we stand back and
let that sing to us, we
00:27:22.086 --> 00:27:25.010
can see that some magic is
beginning to happen here,
00:27:25.010 --> 00:27:29.660
because we've taken this
probability of all things
00:27:29.660 --> 00:27:34.850
being so, and we've broken it up
into a product of three
00:27:34.850 --> 00:27:37.020
probabilities.
00:27:37.020 --> 00:27:39.150
The first two are conditional
probabilities, so they're
00:27:39.150 --> 00:27:40.690
really all conditional
probabilities.
00:27:40.690 --> 00:27:43.530
The last one's conditional
on nothing.
00:27:43.530 --> 00:27:46.405
But look what happens as we
go from left to right.
00:27:46.405 --> 00:27:49.220
a is dependent on two things.
00:27:49.220 --> 00:27:52.910
b is only dependent on one thing
and nothing to the left.
00:27:52.910 --> 00:27:57.040
c is dependent on nothing
and nothing to the left.
00:27:57.040 --> 00:28:00.890
So you can sense a
generalization coming.
00:28:00.890 --> 00:28:02.220
So let's write it down.
00:28:11.820 --> 00:28:17.530
So let's go from here over
to here and say that the
00:28:17.530 --> 00:28:19.970
probability of a whole
bunch of things--
00:28:19.970 --> 00:28:25.250
x1 through xn--
00:28:25.250 --> 00:28:28.675
is equal to some product
of probabilities.
00:28:28.675 --> 00:28:32.760
We'll let the index
i run from n to 1.
00:28:32.760 --> 00:28:37.680
Probability of x i, the last one
in the series, conditioned
00:28:37.680 --> 00:28:39.060
on all the other ones--
00:28:39.060 --> 00:28:44.235
that is, the probability
of x i given x i minus 1
00:28:44.235 --> 00:28:46.200
down to x1, like so.
00:28:49.040 --> 00:28:52.950
And for the first one in this
product, i will be equal to n.
00:28:52.950 --> 00:28:56.160
For the second one, i will
be equal to n minus 1.
00:28:56.160 --> 00:29:00.740
But you'll notice that as I
go from n toward 1, these
00:29:00.740 --> 00:29:02.340
conditionals get smaller--
00:29:02.340 --> 00:29:06.740
the number of things we
condition on gets smaller, and
00:29:06.740 --> 00:29:11.930
none of these things
are on the left.
00:29:11.930 --> 00:29:15.190
They're only stuff that
I have on the right.
00:29:15.190 --> 00:29:18.690
So what I mean to say is all of
these things have an index
00:29:18.690 --> 00:29:21.240
that's smaller than
this index.
00:29:21.240 --> 00:29:23.930
None of the ones that have a
higher index are appearing in
00:29:23.930 --> 00:29:25.870
that conditional.
00:29:25.870 --> 00:29:28.900
So it's a way of taking a
probability of the "and" of a
00:29:28.900 --> 00:29:32.180
whole bunch of things and
writing it as a product of
00:29:32.180 --> 00:29:34.690
conditional probabilities.
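One way to sanity-check the general chain rule is to compute every conditional directly from a joint table and multiply them back together. This sketch assumes a randomly generated joint distribution over four binary variables (hypothetical data); `random_joint`, `prefix_prob`, and `chain_rule` are illustrative helper names. As in the formula, x_i is conditioned only on the variables with smaller indices:

```python
import itertools
import math
import random

def random_joint(n, seed=0):
    """Hypothetical joint table over n binary variables: random weights, normalized."""
    rng = random.Random(seed)
    rows = list(itertools.product([False, True], repeat=n))
    weights = [rng.random() + 0.01 for _ in rows]  # keep every row strictly positive
    total = sum(weights)
    return {row: w / total for row, w in zip(rows, weights)}

def prefix_prob(joint, prefix):
    """P(x_1 = prefix[0], ..., x_k = prefix[k-1]), summing out the remaining variables."""
    k = len(prefix)
    return sum(p for row, p in joint.items() if row[:k] == prefix)

def chain_rule(joint, row):
    """Product over i = n down to 1 of P(x_i | x_{i-1}, ..., x_1)."""
    product = 1.0
    for i in range(len(row), 0, -1):
        numerator = prefix_prob(joint, row[:i])        # P(x_1, ..., x_i)
        denominator = prefix_prob(joint, row[:i - 1])  # P(x_1, ..., x_{i-1}); 1 when i = 1
        product *= numerator / denominator
    return product

joint = random_joint(4)
for row, p in joint.items():
    assert math.isclose(chain_rule(joint, row), p)
print("chain rule matches all", len(joint), "rows")
```

The product telescopes: each denominator cancels the previous numerator, which is why the identity holds for any joint table.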
00:29:34.690 --> 00:29:36.000
So we're making good progress.
00:29:36.000 --> 00:29:38.010
We've done one.
00:29:38.010 --> 00:29:39.420
We've done two.
00:29:39.420 --> 00:29:41.220
And now we've done three,
because this
00:29:41.220 --> 00:29:42.470
is the chain rule.
00:29:47.850 --> 00:29:51.340
And we're about halfway through
our diagram, halfway
00:29:51.340 --> 00:29:54.710
to the point where we can
do something fun.
00:29:54.710 --> 00:29:56.900
But we still have a couple more
concepts to deal with,
00:29:56.900 --> 00:29:59.960
and the next concept is the
concept of
00:29:59.960 --> 00:30:02.730
independence.
00:30:02.730 --> 00:30:06.400
So that's all this
stuff up here--
00:30:06.400 --> 00:30:07.650
oops.
00:30:10.860 --> 00:30:13.740
All this stuff here is the
definition of conditional
00:30:13.740 --> 00:30:14.990
probability.
00:30:19.800 --> 00:30:25.800
And now I want to go to the
definition of independence.
00:30:44.800 --> 00:30:46.870
So that's another definitional
deal.
00:30:46.870 --> 00:30:49.560
But it's another definitional
deal that makes some sense
00:30:49.560 --> 00:30:51.940
with a diagram as well.
00:30:51.940 --> 00:30:59.080
So the definition
goes like this.
00:30:59.080 --> 00:31:10.640
We say that P of a given b
is equal to P of a if a
00:31:10.640 --> 00:31:20.480
is independent of b.
00:31:20.480 --> 00:31:23.690
So that says that the
probability of a doesn't
00:31:23.690 --> 00:31:26.980
depend on what's going
on with b.
00:31:26.980 --> 00:31:29.520
It's the same either way.
00:31:29.520 --> 00:31:30.630
So it's independent.
00:31:30.630 --> 00:31:33.310
b doesn't matter.
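A small check of the definition, assuming two hypothetical joint tables: one built as P(a) times P(b), so a and b are independent by construction, and one that is not. In the area picture, the test compares the intersection divided by b with a divided by the whole universe:

```python
from fractions import Fraction

def p_a(joint):
    """P(a): area of a relative to the whole universe."""
    return sum(p for (a, b), p in joint.items() if a)

def p_a_given_b(joint):
    """P(a | b): area of the intersection relative to b."""
    p_b = sum(p for (a, b), p in joint.items() if b)
    return joint[(True, True)] / p_b

def independent(joint):
    """a is independent of b when P(a | b) == P(a)."""
    return p_a_given_b(joint) == p_a(joint)

# Hypothetical tables. The first is built as P(a) * P(b) with P(a) = 3/10, P(b) = 1/2.
indep = {(a, b): (Fraction(3, 10) if a else Fraction(7, 10)) * Fraction(1, 2)
         for a in (True, False) for b in (True, False)}
dep = {(True, True): Fraction(2, 10), (True, False): Fraction(1, 10),
       (False, True): Fraction(3, 10), (False, False): Fraction(4, 10)}

print(independent(indep), independent(dep))  # True False
```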
00:31:33.310 --> 00:31:35.550
So what does that look like
if we try to do an
00:31:35.550 --> 00:31:38.490
intuitionist diagram?
00:31:38.490 --> 00:31:39.740
Well, let's see.
00:31:42.809 --> 00:31:44.300
Here's a.
00:31:44.300 --> 00:31:46.440
Here's b.
00:31:46.440 --> 00:31:50.890
Now, the probability
of a given b--
00:31:50.890 --> 00:31:51.890
well, let's see.
00:31:51.890 --> 00:31:59.780
That must be this part here
divided by this part here.
00:32:04.060 --> 00:32:08.760
So the ratio of those areas is
the probability of a given b.
00:32:08.760 --> 00:32:16.300
So that's the probability of
this way divided by the
00:32:16.300 --> 00:32:20.140
probability of both ways.
00:32:24.090 --> 00:32:28.380
So what's the probability of
a in terms of these areas?
00:32:28.380 --> 00:32:32.000
Well, probability of a in terms
of these areas is the
00:32:32.000 --> 00:32:34.240
probability--
00:32:34.240 --> 00:32:35.680
let's see, have I
got this right?
00:32:35.680 --> 00:32:37.610
I've got this upside down.
00:32:41.610 --> 00:32:44.790
The probability of a given b
is the probability of the
00:32:44.790 --> 00:32:46.000
stuff in the intersection--
00:32:46.000 --> 00:32:47.290
so that's both ways--
00:32:49.810 --> 00:32:53.460
divided by the probability
of the stuff in b, which
00:32:53.460 --> 00:32:54.710
is going this way.
00:32:58.620 --> 00:33:03.040
And let's see, the probability
of a not conditioned on
00:33:03.040 --> 00:33:08.510
anything except being in this
universe is all these hash
00:33:08.510 --> 00:33:18.100
marks, like so, divided
by the universe.
00:33:21.170 --> 00:33:23.530
So when we say that something's
independent, it
00:33:23.530 --> 00:33:25.170
means that those two ratios
are the same.
00:33:28.170 --> 00:33:30.610
That's all it means in the
intuitionist's point of view.
00:33:30.610 --> 00:33:33.710
So it says that this little
area here divided by this
00:33:33.710 --> 00:33:36.970
whole area is the same as this
whole area for a divided by
00:33:36.970 --> 00:33:39.000
the size of the universe.
00:33:39.000 --> 00:33:40.250
So that's what independence
means.
00:33:43.050 --> 00:33:45.270
Now, that's quite
a lot of work.
00:33:45.270 --> 00:33:46.980
But we're not done with
independence, because we've
00:33:46.980 --> 00:33:49.730
got conditional independence
to deal with.
00:34:01.360 --> 00:34:03.170
And that, too, can be viewed
as a definition.
00:34:08.340 --> 00:34:11.810
And what we're going to say is
that the probability of a
00:34:11.810 --> 00:34:19.020
given b and z is equal to the
probability of a given z.
00:34:23.350 --> 00:34:24.210
What's that mean?
00:34:24.210 --> 00:34:28.010
That means that if you know
that we're dealing with z,
00:34:28.010 --> 00:34:33.100
then the probability of
a doesn't depend on b.
00:34:33.100 --> 00:34:35.239
b doesn't matter anymore
once you're
00:34:35.239 --> 00:34:38.900
restricted to being in z.
00:34:38.900 --> 00:34:42.350
So you can look at
that this way.
00:34:47.070 --> 00:34:52.060
Here's a, and here's
b, and here is z.
00:34:55.600 --> 00:34:58.360
So what we're saying is that
we're restricting the world to
00:34:58.360 --> 00:35:01.860
being in this part of the
universe where z is.
00:35:01.860 --> 00:35:09.145
So the probability of a given b
and z is this piece in here.
00:35:12.340 --> 00:35:16.050
a given b and z is
that part there.
00:35:16.050 --> 00:35:23.340
And the probability of a given
z is this part here
00:35:23.340 --> 00:35:27.280
divided by all of z.
00:35:27.280 --> 00:35:32.580
So we're saying that the ratio
of this little piece here to
00:35:32.580 --> 00:35:39.010
this part, which I'll mark that
way, ratio of this to
00:35:39.010 --> 00:35:42.080
this is the same as the
ratio of that to that.
00:35:42.080 --> 00:35:45.410
So that's conditional
independence.
00:35:45.410 --> 00:35:49.810
So you can infer from these
things, with a little bit of
00:35:49.810 --> 00:36:01.352
algebra, that P of a and b given
z is equal to P of a
00:36:01.352 --> 00:36:05.490
given z times P of b given z.
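A sketch that checks both statements numerically. It assumes a hypothetical three-variable model in which z is a common cause and a and b each depend only on z; the numbers are made up. It verifies P(a | b, z) = P(a | z) and the consequence P(a, b | z) = P(a | z) P(b | z), all computed from the joint table:

```python
from fractions import Fraction
from itertools import product

# Hypothetical numbers: z is a common cause; a and b each depend only on z.
p_z = Fraction(1, 4)
p_a_given = {True: Fraction(9, 10), False: Fraction(1, 5)}  # P(a = true | z)
p_b_given = {True: Fraction(7, 10), False: Fraction(1, 10)}  # P(b = true | z)

def bern(p_true, value):
    """Probability of a binary value given the probability that it is true."""
    return p_true if value else 1 - p_true

joint = {(a, b, z): bern(p_z, z) * bern(p_a_given[z], a) * bern(p_b_given[z], b)
         for a, b, z in product([True, False], repeat=3)}

def prob(pred):
    """Total probability of the rows (a, b, z) satisfying pred."""
    return sum(p for row, p in joint.items() if pred(*row))

# P(a | b, z) versus P(a | z)
lhs = prob(lambda a, b, z: a and b and z) / prob(lambda a, b, z: b and z)
rhs = prob(lambda a, b, z: a and z) / prob(lambda a, b, z: z)
assert lhs == rhs

# Consequence: P(a, b | z) = P(a | z) * P(b | z)
p_ab_z = prob(lambda a, b, z: a and b and z) / prob(lambda a, b, z: z)
p_b_z = prob(lambda a, b, z: b and z) / prob(lambda a, b, z: z)
assert p_ab_z == rhs * p_b_z
print(lhs, p_ab_z)  # 9/10 63/100
```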
00:36:09.260 --> 00:36:12.400
Boy, that's been quite a
journey, but we got all the
00:36:12.400 --> 00:36:16.200
way through one, two, three,
four, and five.
00:36:16.200 --> 00:36:18.070
And now the next thing is belief
nets, and I'm going to
00:36:18.070 --> 00:36:22.400
ask you to forget everything
I've said for a minute or two.
00:36:22.400 --> 00:36:24.420
And we'll come back to it.
00:36:24.420 --> 00:36:29.360
I want to talk about the dog
and the burglar and the
00:36:29.360 --> 00:36:32.300
raccoon again.
00:36:32.300 --> 00:36:36.070
And now, forgetting about
probability, I can say, look,
00:36:36.070 --> 00:36:40.700
the dog barks if a
raccoon shows up.
00:36:40.700 --> 00:36:44.790
The dog barks if a
burglar shows up.
00:36:44.790 --> 00:36:48.110
A burglar doesn't show up
because the dog is barking.
00:36:48.110 --> 00:36:51.470
A raccoon doesn't show up
because the dog is barking.
00:36:51.470 --> 00:36:54.580
So the causality flows from the
burglar and the raccoon to
00:36:54.580 --> 00:36:56.020
the barking.
00:36:56.020 --> 00:36:58.570
So we can make a diagram
of that.
00:36:58.570 --> 00:37:01.310
And our diagram will
look like this.
00:37:01.310 --> 00:37:07.540
Here is the burglar, and
here is the raccoon.
00:37:07.540 --> 00:37:12.090
And these have causal relations
to the dog barking.
00:37:15.390 --> 00:37:22.080
So that's an interesting idea,
because now I can say that--
00:37:22.080 --> 00:37:24.550
well, I can't say anything yet,
because I want to add a
00:37:24.550 --> 00:37:26.190
little more complexity to it.
00:37:26.190 --> 00:37:28.920
I'm going to add two
more variables.
00:37:28.920 --> 00:37:34.640
You might call the police,
depending on how vigorously the
00:37:34.640 --> 00:37:36.430
dog is barking, I guess.
00:37:36.430 --> 00:37:40.300
And the raccoon has a propensity
for knocking over
00:37:40.300 --> 00:37:42.660
the trash can.
00:37:42.660 --> 00:37:44.600
So now, I've got
five variables.
00:37:44.600 --> 00:37:47.900
How big a joint probability
table am I going to need to
00:37:47.900 --> 00:37:50.020
keep my tallies straight?
00:37:50.020 --> 00:37:50.980
Well, it'll be 2 to the 5th.
00:37:50.980 --> 00:37:53.900
That's 32.
00:37:53.900 --> 00:37:59.780
But what I'm going to say is
that this diagram is a
00:37:59.780 --> 00:38:07.820
statement, that every node in it
depends on its parents and
00:38:07.820 --> 00:38:10.630
nothing else that's
not a descendant.
00:38:10.630 --> 00:38:13.380
Now, I need to say that about
50 times, because you've got
00:38:13.380 --> 00:38:15.020
to say it right.
00:38:15.020 --> 00:38:18.070
Every node there is independent
of every
00:38:18.070 --> 00:38:20.620
non-descendant other
than its parents.
00:38:20.620 --> 00:38:22.310
No, that's not quite right.
00:38:22.310 --> 00:38:26.380
Given its parents, every node
is independent of all other
00:38:26.380 --> 00:38:28.670
non-descendants.
00:38:28.670 --> 00:38:32.070
Well, what does that mean?
00:38:32.070 --> 00:38:34.730
Here's the deal with
calling the police.
00:38:34.730 --> 00:38:37.180
Here's its one and
only parent.
00:38:37.180 --> 00:38:40.400
So given this parent, the
probability that we're
00:38:40.400 --> 00:38:44.120
going to call the police doesn't
depend on anything
00:38:44.120 --> 00:38:48.520
like B, R, or T. It's because
all of the causality is
00:38:48.520 --> 00:38:51.600
flowing through this
dog barking.
00:38:51.600 --> 00:38:55.150
I'm not going to call the
police in a way that's
00:38:55.150 --> 00:38:57.240
dependent on anything else other
than whether the dog is
00:38:57.240 --> 00:38:58.860
barking or not.
00:38:58.860 --> 00:39:04.430
This guy has this as
a parent, and these are not
00:39:04.430 --> 00:39:10.245
descendants of calling the
police, so this is independent
00:39:10.245 --> 00:39:13.730
of B, R, and T.
00:39:13.730 --> 00:39:16.220
So let's go walk through
the others.
00:39:16.220 --> 00:39:17.470
Here's the dog.
00:39:17.470 --> 00:39:19.360
The dog's parents are burglar
00:39:19.360 --> 00:39:21.950
appearing and raccoon appearing.
00:39:21.950 --> 00:39:27.590
So the probability that the dog
barks is independent of
00:39:27.590 --> 00:39:29.580
that trash can over there,
because that's not a
00:39:29.580 --> 00:39:30.850
descendant.
00:39:30.850 --> 00:39:33.660
It is dependent on
these parents.
00:39:33.660 --> 00:39:35.790
How about the trash can?
00:39:35.790 --> 00:39:37.340
It depends only on
the raccoon.
00:39:40.070 --> 00:39:43.810
It doesn't depend on any other
non-descendant, so therefore,
00:39:43.810 --> 00:39:50.190
it doesn't depend on D,
B, or P. How about B?
00:39:50.190 --> 00:39:52.900
It has no parents.
00:39:52.900 --> 00:39:58.210
So it doesn't depend on anything
that isn't a descendant:
00:39:58.210 --> 00:40:09.070
B does not depend
00:40:09.070 --> 00:40:12.895
on R and T, because they're
not descendants.
00:40:16.400 --> 00:40:19.160
It's interesting that B might
depend on D and P, because
00:40:19.160 --> 00:40:20.410
those are descendants.
00:40:22.950 --> 00:40:26.120
So it's important to understand
that there's this
00:40:26.120 --> 00:40:33.020
business of independence given
the parents of all other
00:40:33.020 --> 00:40:35.200
non-descendants.
00:40:35.200 --> 00:40:37.620
And you'll see why that funny,
strange language is important
00:40:37.620 --> 00:40:40.060
in a minute.
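The "independent of all other non-descendants, given the parents" rule can be made mechanical. This sketch encodes the network as a parent map; the single-letter names follow the B, R, D, P, T shorthand used above, and the helper functions are illustrative:

```python
# The network as a parent map:
# B = burglar, R = raccoon, D = dog barks, P = call police, T = trash can knocked over.
parents = {"B": [], "R": [], "D": ["B", "R"], "P": ["D"], "T": ["R"]}

def children(node):
    """Nodes that list node as a parent."""
    return [n for n, ps in parents.items() if node in ps]

def descendants(node):
    """All nodes reachable by following arrows downstream from node."""
    found, stack = set(), children(node)
    while stack:
        n = stack.pop()
        if n not in found:
            found.add(n)
            stack.extend(children(n))
    return found

def other_non_descendants(node):
    """Non-descendants of node, excluding node itself and its parents."""
    return set(parents) - descendants(node) - {node} - set(parents[node])

for node in parents:
    print(node, "given", parents[node], "is independent of",
          sorted(other_non_descendants(node)))
```

The printed sets agree with the walk-through: P is independent of B, R, and T given D; D is independent of T given B and R; T is independent of B, D, and P given R; and B, having no parents, is independent of R and T but not of its descendants D and P.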
00:40:40.060 --> 00:40:40.710
But now, let's see--
00:40:40.710 --> 00:40:43.920
I want to make a model of what's
going to happen here.
00:40:43.920 --> 00:40:47.540
So let me see what kind of
probabilities I'm going to
00:40:47.540 --> 00:40:50.300
have to figure out.
00:40:50.300 --> 00:40:54.790
This guy doesn't depend
on anything upstream.
00:40:54.790 --> 00:40:56.460
So we could just say that
all we need there is the
00:40:56.460 --> 00:40:58.990
probability that a burglar
is going to appear.
00:40:58.990 --> 00:41:01.880
Let's say it's a fairly
high-crime neighborhood--
00:41:01.880 --> 00:41:03.020
1 chance in 10--
00:41:03.020 --> 00:41:06.330
1 day in 10, a burglar
appears.
00:41:06.330 --> 00:41:11.760
The raccoon doesn't depend on
anything other than its own
00:41:11.760 --> 00:41:14.130
propensity, so its probability,
00:41:14.130 --> 00:41:16.970
we'll say, is 0.5.
00:41:16.970 --> 00:41:19.780
Raccoons love the place, so it
shows up about 1 day in 2.
00:41:22.340 --> 00:41:24.300
So what about the dog barking?
00:41:24.300 --> 00:41:28.690
That depends on whether there's
a burglar, and the
00:41:28.690 --> 00:41:31.110
other parent is whether
there's a raccoon.
00:41:31.110 --> 00:41:34.270
So we need to keep track of the
probability that the dog
00:41:34.270 --> 00:41:37.350
will bark for all four
combinations.
00:41:42.060 --> 00:41:46.980
So this will be the burglar, and
this will be the raccoon.
00:41:46.980 --> 00:41:51.980
This will be false, false,
true, true--
00:41:51.980 --> 00:41:55.400
oops-- false, false,
true, false,
00:41:55.400 --> 00:41:59.360
false, true, true, true.
00:41:59.360 --> 00:42:03.450
So let's say it's a wonderful
dog, and it always barks if
00:42:03.450 --> 00:42:05.700
there's a burglar.
00:42:05.700 --> 00:42:10.500
So that would say that the
probability here is 1.0, and
00:42:10.500 --> 00:42:13.170
the probability here is 1.0.
00:42:13.170 --> 00:42:17.875
And if there's neither a burglar
nor a raccoon, the dog
00:42:17.875 --> 00:42:19.420
still likes to bark
just for fun.
00:42:19.420 --> 00:42:22.130
So we'll say that's a
chance of 1 in 10.
00:42:22.130 --> 00:42:26.370
And then for the remaining case,
let's say this.
00:42:26.370 --> 00:42:28.290
There's no burglar, but
there is a raccoon--
00:42:28.290 --> 00:42:31.710
he's tired of the raccoons, so
he only barks half the time.
00:42:31.710 --> 00:42:34.280
Do these numbers, by the way,
have to add up to 1?
00:42:34.280 --> 00:42:36.290
They clearly don't.
00:42:36.290 --> 00:42:37.370
These numbers don't
add up to one.
00:42:37.370 --> 00:42:40.690
What adds up to 1 is this:
here is the probability
00:42:40.690 --> 00:42:43.210
that the dog barks.
00:42:43.210 --> 00:42:47.310
And then the other phantom
probability, that it doesn't bark, is out here.
00:42:47.310 --> 00:42:48.935
And these have to add up to 1.
00:42:48.935 --> 00:42:52.850
So that would be 0.9, that would
be 0.0, that would be
00:42:52.850 --> 00:42:57.050
0.5, and this would be 0.0.
00:42:57.050 --> 00:43:01.820
So because those are just 1
minus the numbers in these
00:43:01.820 --> 00:43:06.830
columns, I don't bother
to write them down.
00:43:06.830 --> 00:43:08.540
Well, we still have a couple
more things to do.
00:43:08.540 --> 00:43:11.280
The probability that we'll call
the police depends only
00:43:11.280 --> 00:43:12.405
on the dog.
00:43:12.405 --> 00:43:14.770
So we'll have a column for the
dog, and then we'll have a
00:43:14.770 --> 00:43:16.425
probability of calling
the police.
00:43:19.070 --> 00:43:22.770
There's a probability for that
being false and a probability
00:43:22.770 --> 00:43:24.760
for that being true.
00:43:24.760 --> 00:43:28.790
So if the dog doesn't bark,
there's really hardly any
00:43:28.790 --> 00:43:30.730
chance we'll call the police.
00:43:30.730 --> 00:43:32.820
So make that 0.001.
00:43:32.820 --> 00:43:36.420
If the dog is barking, if he
barks vigorously enough, maybe
00:43:36.420 --> 00:43:40.430
1 chance in 10.
00:43:40.430 --> 00:43:43.640
Here, we have the trash can--
the final thing we have to
00:43:43.640 --> 00:43:44.830
think about.
00:43:44.830 --> 00:43:48.240
There's the trash can;
rather, the raccoon.
00:43:48.240 --> 00:43:51.890
And here's the trash
can probability.
00:43:51.890 --> 00:43:57.460
Depends on the raccoon being
either present or not present.
00:43:57.460 --> 00:44:00.270
If the raccoon is not present,
the probability the trash can
00:44:00.270 --> 00:44:04.650
is knocked over by, say,
the wind is 1 in 1,000.
00:44:04.650 --> 00:44:08.510
If the raccoon is there, oh man,
that guy always likes to
00:44:08.510 --> 00:44:11.340
go in there, so that's 0.8.
00:44:11.340 --> 00:44:14.580
So now I'm done specifying
this model.
00:44:14.580 --> 00:44:18.570
And the question is, how many
numbers did I have to specify?
00:44:18.570 --> 00:44:21.140
Well, let's see.
00:44:21.140 --> 00:44:25.150
I have to specify that one, that
one, that one, that one,
00:44:25.150 --> 00:44:29.060
that one, that one-- that's
6, 7, 8, 9, 10.
00:44:29.060 --> 00:44:32.540
So I had to specify
10 numbers.
00:44:32.540 --> 00:44:35.480
If I just try to build myself
a joint probability table
00:44:35.480 --> 00:44:39.586
straightaway, how many numbers
would I have to supply?
00:44:39.586 --> 00:44:41.970
Well, it's 2 to the n.
00:44:41.970 --> 00:44:48.970
So it's 2 to the
5th, that's 32.
00:44:48.970 --> 00:44:51.560
Considerable saving.
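The tally of specified numbers can be reproduced by writing the tables down as data. This sketch stores only P(node = true | parents) for each row, since the false case is 1 minus that; the dictionary layout and the reading of "hardly any chance" of calling the police as 0.001 are assumptions about how to encode the spoken tables:

```python
# Each entry is P(node = true | parent values); P(node = false | ...) is 1 minus it.
cpts = {
    "B": {(): 0.1},
    "R": {(): 0.5},
    "D": {(False, False): 0.1, (True, False): 1.0,
          (False, True): 0.5, (True, True): 1.0},   # keys are (burglar, raccoon)
    "P": {(False,): 0.001, (True,): 0.1},           # key is (dog barks,)
    "T": {(False,): 0.001, (True,): 0.8},           # key is (raccoon,)
}

specified = sum(len(table) for table in cpts.values())
full_table = 2 ** len(cpts)
print(specified, "numbers versus", full_table, "rows")  # 10 numbers versus 32 rows
```

A node with k binary parents needs 2 to the k numbers, so the count stays small as long as no node has many parents.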
00:44:51.560 --> 00:44:54.910
By the way, how do you suppose
I made that table?
00:44:54.910 --> 00:44:57.220
Not by doing all
those numbers.
00:44:57.220 --> 00:45:01.460
By making this belief network
and then using the belief
00:45:01.460 --> 00:45:04.470
network to calculate
those numbers.
00:45:04.470 --> 00:45:07.900
And that's why this is a
miracle, because with these
00:45:07.900 --> 00:45:11.400
numbers, I can calculate those
numbers instead of making them
00:45:11.400 --> 00:45:15.420
up or making a whole lot of
tally-type measurements.
00:45:15.420 --> 00:45:18.540
So I'd like to make sure
that that's true.
00:45:18.540 --> 00:45:24.150
And I can use this stuff here
to calculate the full joint
00:45:24.150 --> 00:45:27.440
probability table.
00:45:27.440 --> 00:45:30.890
So here's how this works.
00:45:30.890 --> 00:45:33.265
I have the probability
of some combination--
00:45:36.020 --> 00:45:44.150
let's say the police, the dog,
the burglar, the trash can,
00:45:44.150 --> 00:45:45.400
and the raccoon.
00:45:50.220 --> 00:45:52.400
All the combinations that are
possible there will give me an
00:45:52.400 --> 00:45:54.582
entry in the table-- one row.
00:45:54.582 --> 00:45:56.280
But let's see--
00:45:56.280 --> 00:45:57.150
there's some miracle here.
00:45:57.150 --> 00:45:59.670
Oh, this chain rule.
00:45:59.670 --> 00:46:01.920
Let's use the chain rule.
00:46:01.920 --> 00:46:05.820
We can write that as a
probability that we call the
00:46:05.820 --> 00:46:10.950
police given d, b, t, and r.
00:46:10.950 --> 00:46:14.480
And then the next one in my
chain is probability of d
00:46:14.480 --> 00:46:17.950
given b, t, and r.
00:46:17.950 --> 00:46:20.090
Then the next one in the chain
is the probability of
00:46:20.090 --> 00:46:23.920
b given t and r.
00:46:23.920 --> 00:46:28.470
And the next one in my chain
is P of t given r.
00:46:28.470 --> 00:46:31.335
And the final one in
my chain is p of r.
00:46:33.860 --> 00:46:36.200
Now, we have some conditional
independence
00:46:36.200 --> 00:46:38.170
knowledge, too, don't we?
00:46:38.170 --> 00:46:45.740
We know that this probability
here depends only on d, its one parent, because
00:46:45.740 --> 00:46:47.150
b, t, and r are not its descendants.
00:46:47.150 --> 00:46:49.880
So therefore, we don't have to
think about that, and all the
00:46:49.880 --> 00:46:54.100
numbers we need here are
produced by this table.
00:46:54.100 --> 00:46:55.190
How about this one here?
00:46:55.190 --> 00:46:58.850
Probability that the dog barks
depends only on its parents, b
00:46:58.850 --> 00:47:01.550
and r, so it doesn't
depend on t.
00:47:05.390 --> 00:47:09.080
So b, in turn, depends on--
00:47:09.080 --> 00:47:09.960
what does it depend on?
00:47:09.960 --> 00:47:12.030
It doesn't depend on anything.
00:47:12.030 --> 00:47:14.330
So we can scratch those.
00:47:14.330 --> 00:47:17.890
Probability of t given r, yeah,
there's a probability
00:47:17.890 --> 00:47:20.030
there, but we can get
that from the table.
00:47:20.030 --> 00:47:22.680
And finally, P of r.
00:47:22.680 --> 00:47:25.550
So that's why I went through
all that probability junk,
00:47:25.550 --> 00:47:30.680
because if we arrange things in
the expansion of this, from
00:47:30.680 --> 00:47:35.100
bottom to top, then we arrange
things so that none of these
00:47:35.100 --> 00:47:39.860
guys depends on a descendant
in this formula.
00:47:39.860 --> 00:47:41.510
And we have a limited number
of things that it
00:47:41.510 --> 00:47:44.720
depends on above it.
00:47:44.720 --> 00:47:46.750
So that's the way we can
calculate back the full joint
00:47:46.750 --> 00:47:48.000
probability table.
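Putting the pieces together, this sketch rebuilds all 32 rows of the joint table from the ten numbers. It follows the chain-rule ordering used above (police, dog, burglar, trash can, raccoon) and keeps only each node's parents in its conditional. The data layout and the 0.001 value for calling the police when the dog is quiet are assumptions about how to encode the spoken tables:

```python
from itertools import product
import math

parents = {"B": (), "R": (), "D": ("B", "R"), "P": ("D",), "T": ("R",)}
# P(node = true | parent values), numbers as given in the lecture
p_true = {
    "B": {(): 0.1},
    "R": {(): 0.5},
    "D": {(False, False): 0.1, (True, False): 1.0, (False, True): 0.5, (True, True): 1.0},
    "P": {(False,): 0.001, (True,): 0.1},
    "T": {(False,): 0.001, (True,): 0.8},
}
order = ["P", "D", "B", "T", "R"]  # chain-rule order: each node's parents come later

def joint_prob(assign):
    """Chain rule with each factor reduced to P(node | its parents)."""
    result = 1.0
    for node in order:
        key = tuple(assign[p] for p in parents[node])
        pt = p_true[node][key]
        result *= pt if assign[node] else 1 - pt
    return result

table = {values: joint_prob(dict(zip(order, values)))
         for values in product([True, False], repeat=len(order))}

assert len(table) == 32 and math.isclose(sum(table.values()), 1.0)
# Marginal probability that the dog barks, obtained by summing rows of the table
p_dog = sum(p for values, p in table.items() if values[order.index("D")])
print(round(p_dog, 3))  # 0.37
```

Once the table exists, any other probability (for example, the chance of a burglar given that the police were called) can be read off by summing the appropriate rows, just as with a tallied table.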
00:47:51.845 --> 00:47:54.380
And that brings us to the end
of the discussion today.
00:47:54.380 --> 00:47:56.940
But the thing we're going to
think about is, how much
00:47:56.940 --> 00:47:59.850
saving do we really
get out of this?
00:47:59.850 --> 00:48:03.940
In this particular case, we
only had to devise 10
00:48:03.940 --> 00:48:05.290
numbers out of 32.
00:48:05.290 --> 00:48:09.400
What if we had 10 properties
or 100 properties?
00:48:09.400 --> 00:48:11.270
How much saving would
we get then?
00:48:11.270 --> 00:48:13.070
That's what we'll take
up next time,
00:48:13.070 --> 00:48:14.430
after the quiz on Wednesday.