WEBVTT
00:00:00.040 --> 00:00:02.460
The following content is
provided under a Creative
00:00:02.460 --> 00:00:03.870
Commons license.
00:00:03.870 --> 00:00:06.910
Your support will help MIT
OpenCourseWare continue to
00:00:06.910 --> 00:00:10.560
offer high quality educational
resources for free.
00:00:10.560 --> 00:00:13.460
To make a donation, or view
additional materials from
00:00:13.460 --> 00:00:19.290
hundreds of MIT courses, visit
MIT OpenCourseWare at
00:00:19.290 --> 00:00:21.996
ocw.mit.edu.
00:00:21.996 --> 00:00:22.490
PROFESSOR: OK.
00:00:22.490 --> 00:00:26.380
So today's lecture will be on
the subject of counting.
00:00:26.380 --> 00:00:29.620
So counting, I guess, is
a pretty simple affair
00:00:29.620 --> 00:00:33.060
conceptually, but it's a
topic that can also get
00:00:33.060 --> 00:00:34.410
to be pretty tricky.
00:00:34.410 --> 00:00:37.840
The reason we're going to talk
about counting is that there's
00:00:37.840 --> 00:00:41.000
a lot of probability problems
whose solution actually
00:00:41.000 --> 00:00:45.050
reduces to successfully counting
the cardinalities of
00:00:45.050 --> 00:00:46.110
various sets.
00:00:46.110 --> 00:00:49.270
So we're going to see the basic,
simplest methods that
00:00:49.270 --> 00:00:52.350
one can use to count
systematically in various
00:00:52.350 --> 00:00:53.660
situations.
00:00:53.660 --> 00:00:56.600
So in contrast to previous
lectures, we're not going to
00:00:56.600 --> 00:00:59.700
introduce any significant
new concepts of a
00:00:59.700 --> 00:01:01.460
probabilistic nature.
00:01:01.460 --> 00:01:04.340
We're just going to use the
probability tools that we
00:01:04.340 --> 00:01:05.680
already know.
00:01:05.680 --> 00:01:08.460
And we're going to apply them
in situations where there's
00:01:08.460 --> 00:01:10.840
also some counting involved.
00:01:10.840 --> 00:01:13.000
Now, today we're going
to just touch the
00:01:13.000 --> 00:01:14.490
surface of this subject.
00:01:14.490 --> 00:01:16.650
There's a whole field of
mathematics called
00:01:16.650 --> 00:01:19.920
combinatorics, and there are people who
actually spend their whole
00:01:19.920 --> 00:01:24.220
lives counting more and
more complicated sets.
00:01:24.220 --> 00:01:27.580
We're not going to get
anywhere close to the full
00:01:27.580 --> 00:01:31.360
complexity of the field, but
we'll get just enough tools
00:01:31.360 --> 00:01:36.260
that allow us to address
problems of the type that one
00:01:36.260 --> 00:01:39.730
encounters in most common
situations.
00:01:39.730 --> 00:01:43.250
So the basic idea, the basic
principle is something that
00:01:43.250 --> 00:01:45.400
we've already discussed.
00:01:45.400 --> 00:01:49.820
So counting methods apply in
situations where we have
00:01:49.820 --> 00:01:53.660
probabilistic experiments with
a finite number of outcomes
00:01:53.660 --> 00:01:56.070
and where every outcome--
00:01:56.070 --> 00:01:57.770
every possible outcome--
00:01:57.770 --> 00:02:00.260
has the same probability
of occurring.
00:02:00.260 --> 00:02:04.440
So we have our sample space,
omega, and it's got a bunch of
00:02:04.440 --> 00:02:06.520
discrete points in there.
00:02:06.520 --> 00:02:10.539
And the cardinality of the set
omega is some capital N. So,
00:02:10.539 --> 00:02:14.960
in particular, we assume that
the sample points are equally
00:02:14.960 --> 00:02:18.490
likely, which means that every
element of the sample space
00:02:18.490 --> 00:02:22.420
has the same probability
equal to 1 over N.
00:02:22.420 --> 00:02:26.650
And then we are interested in a
subset of the sample space,
00:02:26.650 --> 00:02:29.860
call it A. And that
subset consists
00:02:29.860 --> 00:02:31.350
of a number of elements.
00:02:31.350 --> 00:02:36.080
Let the cardinality of that
subset be equal to little n.
00:02:36.080 --> 00:02:39.370
And then to find the probability
of that set, all
00:02:39.370 --> 00:02:42.060
we need to do is to add the
probabilities of the
00:02:42.060 --> 00:02:43.610
individual elements.
00:02:43.610 --> 00:02:47.030
There are little n elements, and
each one has probability one
00:02:47.030 --> 00:02:50.340
over capital N. And
that's the answer.
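Written as a formula (using |A| for the cardinality of a set, a notation the lecture does not introduce here), the argument is: add up little n outcomes, each with probability 1 over capital N.

```latex
P(A) = \sum_{\omega \in A} P(\{\omega\}) = n \cdot \frac{1}{N} = \frac{|A|}{|\Omega|}
```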
00:02:50.340 --> 00:02:53.250
So this means that to solve
problems in this context, all
00:02:53.250 --> 00:02:56.580
that we need to be able to do
is to figure out the number
00:02:56.580 --> 00:03:00.800
capital N and to figure out
the number little n.
00:03:00.800 --> 00:03:04.330
Now, if somebody gives you a set
by just giving you a list
00:03:04.330 --> 00:03:07.660
and gives you another set,
again, giving you a list, it's
00:03:07.660 --> 00:03:09.120
easy to count their elements.
00:03:09.120 --> 00:03:11.510
You just count how many
there are on the list.
00:03:11.510 --> 00:03:15.030
But sometimes the sets are
described in some more
00:03:15.030 --> 00:03:20.590
implicit way, and we may have to
do a little bit more work.
00:03:20.590 --> 00:03:22.360
There are various tricks that are
00:03:22.360 --> 00:03:24.410
involved in counting properly.
00:03:24.410 --> 00:03:27.080
And the most common
one is to--
00:03:27.080 --> 00:03:31.350
when you consider a set of
possible outcomes, to describe
00:03:31.350 --> 00:03:33.800
the construction of those
possible outcomes through a
00:03:33.800 --> 00:03:35.440
sequential process.
00:03:35.440 --> 00:03:38.130
So think of a probabilistic
experiment that involves a
00:03:38.130 --> 00:03:42.560
number of stages, and in each
one of the stages there's a
00:03:42.560 --> 00:03:45.890
number of possible choices
that there may be.
00:03:45.890 --> 00:03:48.630
The overall experiment consists
of carrying out all
00:03:48.630 --> 00:03:49.930
the stages to the end.
00:03:52.855 --> 00:03:55.830
And the number of points in the
sample space is how many
00:03:55.830 --> 00:03:59.920
final outcomes there can be in
this multi-stage experiment.
00:03:59.920 --> 00:04:02.910
So in this picture we have an
experiment in which at the
00:04:02.910 --> 00:04:07.230
first stage we have
four choices.
00:04:07.230 --> 00:04:11.640
In the second stage, no matter
what happened in the first
00:04:11.640 --> 00:04:16.010
stage, the way this is drawn
we have three choices.
00:04:16.010 --> 00:04:19.720
No matter whether we ended up
here, there, or there, we have
00:04:19.720 --> 00:04:22.860
three choices in the
second stage.
00:04:22.860 --> 00:04:27.830
And then there's a third stage
and at least in this picture,
00:04:27.830 --> 00:04:31.520
no matter what happened in the
first two stages, in the third
00:04:31.520 --> 00:04:35.930
stage we're going to have
two possible choices.
00:04:35.930 --> 00:04:41.070
So how many leaves are there
at the end of this tree?
00:04:41.070 --> 00:04:42.170
That's simple.
00:04:42.170 --> 00:04:45.460
It's just the product of
these three numbers.
00:04:45.460 --> 00:04:48.520
The number of possible leaves
that we have out there is 4
00:04:48.520 --> 00:04:50.430
times 3 times 2.
00:04:50.430 --> 00:04:54.030
The numbers of choices at each stage
get multiplied, and
00:04:54.030 --> 00:04:57.680
that gives us the number
of overall choices.
00:04:57.680 --> 00:05:01.230
So this is the general rule, the
general trick that we are
00:05:01.230 --> 00:05:03.520
going to use over and over.
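One way to sanity-check the counting principle is to enumerate the leaves of the tree directly. The sketch below uses Python, which is an assumption since the lecture has no code. It relies on `itertools.product` to build every path through a tree with 4, 3, and 2 choices at its three stages. The choices are labeled by integers purely for illustration.

```python
from itertools import product

# Hypothetical 3-stage experiment matching the tree described above:
# 4 choices at stage 1, 3 at stage 2, 2 at stage 3.
stage_choices = [range(4), range(3), range(2)]

# Each leaf of the tree is one tuple (choice1, choice2, choice3).
leaves = list(product(*stage_choices))

# Counting principle: number of leaves = product of choices per stage.
print(len(leaves))  # 24, i.e. 4 * 3 * 2
```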
00:05:03.520 --> 00:05:07.660
So let's apply it to some very
simple problems as a warm up.
00:05:07.660 --> 00:05:10.530
How many license plates can you
make if you're allowed to
00:05:10.530 --> 00:05:17.140
use three letters and then
followed by four digits?
00:05:17.140 --> 00:05:20.020
At least if you're dealing with
the English alphabet, you
00:05:20.020 --> 00:05:23.460
have 26 choices for
the first letter.
00:05:23.460 --> 00:05:27.100
Then you have 26 choices
for the second letter.
00:05:27.100 --> 00:05:30.300
And then 26 choices for
the third letter.
00:05:30.300 --> 00:05:31.750
And then we start the digits.
00:05:31.750 --> 00:05:34.970
We have 10 choices for the first
digit, 10 choices for
00:05:34.970 --> 00:05:37.780
the second digit, 10 choices for
the third, 10 choices for
00:05:37.780 --> 00:05:40.030
the last one.
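Multiplying the stage counts for the first license plate question gives a concrete number. This is a minimal sketch that assumes a 26-letter alphabet, the format of 3 letters followed by 4 digits, and repetition allowed. The variable names are illustrative.

```python
# License plates of the form LLLDDDD with repetition allowed:
# 26 choices for each letter, 10 choices for each digit.
LETTERS = 26
DIGITS = 10

plates_with_repetition = LETTERS**3 * DIGITS**4
print(plates_with_repetition)  # 175760000
```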
00:05:40.030 --> 00:05:43.010
Let's make it a little more
complicated. Suppose that
00:05:43.010 --> 00:05:47.100
we're interested in license
plates where no letter can be
00:05:47.100 --> 00:05:50.270
repeated and no digit
can be repeated.
00:05:50.270 --> 00:05:53.040
So you have to use different
letters, different digits.
00:05:53.040 --> 00:05:55.110
How many license plates
can you make?
00:05:55.110 --> 00:05:56.780
OK, let's choose the
first letter,
00:05:56.780 --> 00:05:59.130
and we have 26 choices.
00:05:59.130 --> 00:06:02.350
Now, I'm ready to choose my
second letter. How many
00:06:02.350 --> 00:06:04.610
choices do I have?
00:06:04.610 --> 00:06:08.090
I have 25, because I already
used one letter.
00:06:08.090 --> 00:06:11.900
I have the 25 remaining letters
to choose from.
00:06:11.900 --> 00:06:14.330
For the next letter,
how many choices?
00:06:14.330 --> 00:06:17.030
Well, I used up two of
my letters, so I
00:06:17.030 --> 00:06:19.840
only have 24 available.
00:06:19.840 --> 00:06:22.950
And then we start with the
digits, 10 choices for the
00:06:22.950 --> 00:06:26.910
first digit, 9 choices for the
second, 8 for the third, 7 for
00:06:26.910 --> 00:06:28.160
the last one.
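The version without repeats multiplies shrinking stage counts. Here is a short sketch under the same assumptions as before (26 letters, 10 digits). The check uses Python's `math.perm`, available since Python 3.8, which computes exactly these falling products.

```python
from math import perm

# LLLDDDD plates where no letter and no digit may repeat:
# 26 * 25 * 24 for the letters, then 10 * 9 * 8 * 7 for the digits.
plates_no_repetition = 26 * 25 * 24 * 10 * 9 * 8 * 7

# math.perm(n, k) = n * (n - 1) * ... * (n - k + 1)
assert plates_no_repetition == perm(26, 3) * perm(10, 4)
print(plates_no_repetition)  # 78624000
```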
00:06:30.480 --> 00:06:31.730
All right.
00:06:31.730 --> 00:06:38.710
So, now, let's bring some
symbols into a related problem.
00:06:38.710 --> 00:06:44.190
You are given a set that
consists of n elements and
00:06:44.190 --> 00:06:47.040
you're supposed to take
those n elements and
00:06:47.040 --> 00:06:50.050
put them in a sequence.
00:06:50.050 --> 00:06:52.290
That is, to order them.
00:06:52.290 --> 00:06:56.240
Any possible ordering of those
elements is called a
00:06:56.240 --> 00:06:57.560
permutation.
00:06:57.560 --> 00:07:02.630
So for example, if we have the
set 1, 2, 3, 4, a possible
00:07:02.630 --> 00:07:05.105
permutation is the
list 2, 3, 4, 1.
00:07:08.560 --> 00:07:10.680
That's one possible
permutation.
00:07:10.680 --> 00:07:13.180
And there are lots of possible
permutations, of course; the
00:07:13.180 --> 00:07:15.560
question is how many
there are.
00:07:15.560 --> 00:07:20.390
OK, let's think about building
this permutation by choosing
00:07:20.390 --> 00:07:21.460
one at a time.
00:07:21.460 --> 00:07:26.330
Which of these elements goes
into each one of these slots?
00:07:26.330 --> 00:07:28.740
How many choices are there for the
element that goes into the
00:07:28.740 --> 00:07:31.160
first slot?
00:07:31.160 --> 00:07:34.430
Well, we can choose any one of
the available elements, so we
00:07:34.430 --> 00:07:35.680
have n choices.
00:07:38.650 --> 00:07:42.220
Let's say this element goes
here. Having used up that
00:07:42.220 --> 00:07:45.730
element, we're left with n minus
1 elements and we can
00:07:45.730 --> 00:07:49.530
pick any one of these and bring
it into the second slot.
00:07:49.530 --> 00:07:52.340
So here we have n choices, here
we're going to have n
00:07:52.340 --> 00:07:55.940
minus 1 choices, and the
next slot will
00:07:55.940 --> 00:07:58.060
have n minus 2 choices.
00:07:58.060 --> 00:08:00.220
And you go down until the end.
00:08:00.220 --> 00:08:02.160
What happens at this point
when you are to
00:08:02.160 --> 00:08:03.880
pick the last element?
00:08:03.880 --> 00:08:06.650
Well, you've used n minus 1 of
them, there's only one
00:08:06.650 --> 00:08:08.070
left in your bag.
00:08:08.070 --> 00:08:09.520
You're forced to use that one.
00:08:09.520 --> 00:08:13.920
So at the last stage, you're going
to have only one choice.
00:08:13.920 --> 00:08:18.050
So, basically, the number of
possible permutations is the
00:08:18.050 --> 00:08:21.860
product of all integers
from n down to one, or
00:08:21.860 --> 00:08:24.020
from one up to n.
00:08:24.020 --> 00:08:26.550
And there's a symbol that we
use for this number, it's
00:08:26.550 --> 00:08:29.210
called n factorial.
00:08:29.210 --> 00:08:32.990
So n factorial is the number of
permutations of n objects.
00:08:32.990 --> 00:08:37.320
The number of ways that you can
order n objects that are
00:08:37.320 --> 00:08:39.100
given to you.
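Brute-force enumeration confirms that the number of orderings of n objects is n factorial for small n. This sketch assumes Python and uses `itertools.permutations`. The helper `count_permutations` is a hypothetical name chosen for illustration.

```python
from itertools import permutations
from math import factorial

def count_permutations(n: int) -> int:
    """Count orderings of {1, ..., n} by explicit enumeration."""
    return sum(1 for _ in permutations(range(1, n + 1)))

for n in range(1, 7):
    # n choices, then n - 1, ..., down to 1: the product is n!.
    assert count_permutations(n) == factorial(n)

# The lecture's example ordering of {1, 2, 3, 4} is one of them.
assert (2, 3, 4, 1) in set(permutations([1, 2, 3, 4]))
print(count_permutations(4))  # 24
```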
00:08:39.100 --> 00:08:42.100
Now, a different question.
00:08:42.100 --> 00:08:44.310
We have n elements.
00:08:44.310 --> 00:08:48.680
Let's say the elements
are 1, 2, up to n.
00:08:48.680 --> 00:08:51.310
And it's a set.
00:08:51.310 --> 00:08:54.490
And we want to create
a subset.
00:08:54.490 --> 00:08:58.460
How many possible subsets
are there?
00:08:58.460 --> 00:09:02.950
So choosing a subset means
looking at each one of the
00:09:02.950 --> 00:09:06.880
elements and deciding whether
you're going to put it into
00:09:06.880 --> 00:09:08.440
the subset or not.
00:09:08.440 --> 00:09:13.240
For example, I could choose
to put 1 in, but 2 I'm not
00:09:13.240 --> 00:09:17.100
putting it in, 3 I'm not putting
it in, 4 I'm putting
00:09:17.100 --> 00:09:18.630
it in, and so on.
00:09:18.630 --> 00:09:21.200
So that's how you
create a subset.
00:09:21.200 --> 00:09:23.660
You look at each one of the
elements and you say, OK, I'm
00:09:23.660 --> 00:09:27.310
going to put it in the subset,
or I'm not going to put it in.
00:09:27.310 --> 00:09:30.900
So think of this as consisting
of stages.
00:09:30.900 --> 00:09:33.240
At each stage you look at
one element, and you
00:09:33.240 --> 00:09:35.090
make a binary decision.
00:09:35.090 --> 00:09:38.410
Do I put it in the
subset, or not?
00:09:38.410 --> 00:09:41.940
So therefore, how many
subsets are there?
00:09:41.940 --> 00:09:45.060
Well, I have two choices
for the first element.
00:09:45.060 --> 00:09:47.740
Am I going to put it in
the subset, or not?
00:09:47.740 --> 00:09:50.630
I have two choices for the
next element, and so on.
00:09:53.450 --> 00:09:57.390
For each one of the elements,
we have two choices.
00:09:57.390 --> 00:10:02.090
So the overall number of choices
is 2 to the power n.
00:10:02.090 --> 00:10:03.710
So, conclusion--
00:10:03.710 --> 00:10:10.150
the number of subsets of an
n-element set is 2 to the n.
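The stage-by-stage argument translates almost directly into code. There is one binary in/out decision per element, and `itertools.product([False, True], repeat=n)` runs through all combinations of decisions. This is a minimal Python sketch, and `all_subsets` is a hypothetical helper name.

```python
from itertools import product

def all_subsets(elements):
    """Build every subset by making one in/out decision per element."""
    subsets = []
    for decisions in product([False, True], repeat=len(elements)):
        # decisions[i] is True when elements[i] goes into the subset.
        subsets.append({e for e, keep in zip(elements, decisions) if keep})
    return subsets

for n in range(0, 6):
    assert len(all_subsets(list(range(1, n + 1)))) == 2**n

# n = 1: the empty set and the set itself.
print(all_subsets([1]))  # [set(), {1}]
```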
00:10:15.050 --> 00:10:20.430
So in particular, if we take n
equal to 1, let's check that
00:10:20.430 --> 00:10:22.190
our answer makes sense.
00:10:22.190 --> 00:10:26.420
If we have n equal to one, how
many subsets does it have?
00:10:26.420 --> 00:10:29.675
So we're dealing with
a set with just one element.
00:10:29.675 --> 00:10:30.925
What are the subsets?
00:10:33.830 --> 00:10:37.290
One subset is this one.
00:10:37.290 --> 00:10:41.530
Do we have other subsets
of the one element set?
00:10:41.530 --> 00:10:43.920
Yes, we have the empty set.
00:10:43.920 --> 00:10:44.870
That's the second one.
00:10:44.870 --> 00:10:48.860
These are the two possible
subsets of
00:10:48.860 --> 00:10:50.850
this particular set.
00:10:50.850 --> 00:10:56.790
So 2 subsets when n is equal to
1, that checks the answer.
00:10:56.790 --> 00:10:58.040
All right.
00:11:00.290 --> 00:11:07.590
OK, so having gone so far, we
can do our first example now.
00:11:07.590 --> 00:11:12.990
So we are given a die
and we're going
00:11:12.990 --> 00:11:16.620
to roll it 6 times.
00:11:16.620 --> 00:11:20.030
OK, let's make some assumptions
about the rolls.
00:11:20.030 --> 00:11:29.560
Let's assume that the rolls are
independent, and that the
00:11:29.560 --> 00:11:30.985
die is also fair.
00:11:34.180 --> 00:11:38.110
So this means that the
probability of any particular
00:11:38.110 --> 00:11:40.350
outcome of the die rolls--
00:11:40.350 --> 00:11:43.960
for example, so we have 6 rolls,
one particular outcome
00:11:43.960 --> 00:11:48.760
could be 2, 3, 3, 1, 6, 5.
00:11:48.760 --> 00:11:51.220
So that's one possible
outcome.
00:11:51.220 --> 00:11:54.060
What's the probability
of this outcome?
00:11:54.060 --> 00:11:57.470
There's probability 1/6 that
this happens, 1/6 that this
00:11:57.470 --> 00:12:00.530
happens, 1/6 that this
happens, and so on.
00:12:00.530 --> 00:12:04.250
So the probability that
the outcome is this
00:12:04.250 --> 00:12:07.540
is 1/6 to the sixth.
00:12:10.970 --> 00:12:13.690
What did I use to come
up with this answer?
00:12:13.690 --> 00:12:17.550
I used independence, so I
multiplied the probability of
00:12:17.550 --> 00:12:20.360
the first roll gives me a 2,
times the probability that the
00:12:20.360 --> 00:12:22.990
second roll gives me
a 3, and so on.
00:12:22.990 --> 00:12:26.790
And then I used the assumption
that the die is fair, so that
00:12:26.790 --> 00:12:30.300
the probability of 2 is
1/6, the probability of 3
00:12:30.300 --> 00:12:32.240
is 1/6, and so on.
00:12:32.240 --> 00:12:34.810
So if I were to spell it out,
it's the probability that we
00:12:34.810 --> 00:12:37.740
get the 2 in the first roll,
times the probability of 3 in
00:12:37.740 --> 00:12:40.850
the second roll, times the
probability of the
00:12:40.850 --> 00:12:42.800
5 in the last roll.
00:12:42.800 --> 00:12:46.455
So by independence, I can
multiply probabilities.
00:12:46.455 --> 00:12:49.530
And because the die is fair,
each one of these numbers is
00:12:49.530 --> 00:12:53.910
1/6, so the product is 1/6 to the sixth.
00:12:53.910 --> 00:12:58.170
And so the same calculation
would apply no matter what
00:12:58.170 --> 00:13:00.610
numbers I would put in here.
00:13:00.610 --> 00:13:03.920
So all possible outcomes
are equally likely.
00:13:06.630 --> 00:13:08.400
Let's start with this.
00:13:08.400 --> 00:13:12.650
So since all possible outcomes
are equally likely, to find an
00:12:12.650 --> 00:12:15.870
answer to a probability
question we just need to count. We're dealing
00:12:15.870 --> 00:12:21.430
with some particular event:
the event that all rolls
00:12:21.430 --> 00:12:22.900
give different numbers.
00:13:22.900 --> 00:13:31.160
That's our event A. And our
sample space is some set
00:13:31.160 --> 00:13:32.790
capital omega.
00:13:32.790 --> 00:13:35.890
We know that the answer is going
to be the cardinality of
00:13:35.890 --> 00:13:40.200
the set A, divided by the
cardinality of the set omega.
00:13:40.200 --> 00:13:42.830
So let's deal with the
easy one first.
00:13:42.830 --> 00:13:45.960
How many elements are there
in the sample space?
00:13:45.960 --> 00:13:48.880
How many possible outcomes
are there when you
00:13:48.880 --> 00:13:51.570
roll a die 6 times?
00:13:51.570 --> 00:13:54.600
You have 6 choices for
the first roll.
00:13:54.600 --> 00:13:57.840
You have 6 choices for the
second roll and so on.
00:13:57.840 --> 00:14:00.330
So the overall number
of outcomes is going
00:14:00.330 --> 00:14:03.950
to be 6 to the sixth.
00:14:03.950 --> 00:14:08.200
So number of elements in
the sample space is 6
00:14:08.200 --> 00:14:10.470
to the sixth power.
00:14:10.470 --> 00:14:14.820
And I guess this is consistent
with the probability we found.
00:14:14.820 --> 00:14:18.480
We have 6 to the sixth outcomes,
each one has this
00:14:18.480 --> 00:14:20.570
much probability,
so the overall
00:14:20.570 --> 00:14:23.230
probability is equal to one.
00:14:23.230 --> 00:14:24.460
Right?
00:14:24.460 --> 00:14:28.690
So the probability of an
individual outcome is one over
00:14:28.690 --> 00:14:32.400
how many possible outcomes
we have, which is this.
00:14:32.400 --> 00:14:33.810
All right.
00:14:33.810 --> 00:14:36.620
So how about the numerator?
00:14:36.620 --> 00:14:42.430
We are interested in outcomes
in which the numbers that we
00:14:42.430 --> 00:14:44.585
get are all different.
00:14:48.770 --> 00:14:54.080
So what is an outcome in which
the numbers are all different?
00:14:54.080 --> 00:14:56.310
So the die has 6 faces.
00:14:56.310 --> 00:14:58.200
We roll it 6 times.
00:14:58.200 --> 00:15:00.340
We're going to get 6
different numbers.
00:15:00.340 --> 00:15:03.780
This means that we're going to
exhaust all the possible
00:15:03.780 --> 00:15:07.640
numbers, but they can appear
in any possible sequence.
00:15:07.640 --> 00:15:13.190
So an outcome that makes this
event happen is a list of the
00:15:13.190 --> 00:15:16.250
numbers from 1 to 6,
but arranged in
00:15:16.250 --> 00:15:18.160
some arbitrary order.
00:15:18.160 --> 00:15:23.200
So the possible outcomes that
make event A happen are just
00:15:23.200 --> 00:15:25.990
the permutations of the
numbers from 1 to 6.
00:15:31.070 --> 00:15:33.900
One possible outcome that makes
our event happen
00:15:33.900 --> 00:15:35.440
would be this.
00:15:39.000 --> 00:15:42.050
Here we have 6 possible numbers,
but any other list of
00:15:42.050 --> 00:15:44.070
this kind in which none
of the numbers is
00:15:44.070 --> 00:15:46.650
repeated would also do.
00:15:46.650 --> 00:15:51.660
So number of outcomes that make
the event happen is the
00:15:51.660 --> 00:15:53.980
number of permutations
of 6 elements.
00:15:53.980 --> 00:15:56.060
So it's 6 factorial.
00:15:56.060 --> 00:15:59.340
And so the final answer is
going to be 6 factorial
00:15:59.340 --> 00:16:02.830
divided by 6 to the sixth.
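With only 6^6 = 46656 outcomes, you can check the counting answer by listing the whole sample space. The sketch below assumes Python, a fair die, and independent rolls, as the lecture does. It uses exact `Fraction` arithmetic so the comparison with 6 factorial over 6 to the sixth is exact.

```python
from fractions import Fraction
from itertools import product
from math import factorial

# Sample space: all sequences of 6 rolls (6**6 equally likely outcomes).
omega = list(product(range(1, 7), repeat=6))

# Event A: all six rolls show different numbers (a permutation of 1..6).
A = [outcome for outcome in omega if len(set(outcome)) == 6]

p_brute_force = Fraction(len(A), len(omega))
p_formula = Fraction(factorial(6), 6**6)

assert p_brute_force == p_formula
print(p_formula)                     # 5/324
print(round(float(p_formula), 4))    # 0.0154
```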
00:16:02.830 --> 00:16:06.580
All right, so that's a typical
way that one solves problems
00:16:06.580 --> 00:16:07.800
of this kind.
00:16:07.800 --> 00:16:10.660
We know how to count
certain things.
00:16:10.660 --> 00:16:14.260
For example, here we knew how to
count permutations, and we
00:16:14.260 --> 00:16:16.830
used our knowledge to count the
elements of the set that
00:16:16.830 --> 00:16:18.080
we need to deal with.
00:16:24.380 --> 00:16:30.970
So now let's get to a slightly
more difficult problem.
00:16:30.970 --> 00:16:37.390
We're given once more a
set with n elements.
00:16:40.620 --> 00:16:46.000
We already know how many subsets
that set has, but now
00:16:46.000 --> 00:16:50.940
we are interested in
subsets that have exactly k
00:16:50.940 --> 00:16:54.480
elements in them.
00:16:54.480 --> 00:17:05.819
So we start with our big set
that has n elements, and we
00:17:05.819 --> 00:17:11.890
want to construct a subset
that has k elements.
00:17:11.890 --> 00:17:14.200
Out of those n I'm
going to choose k
00:17:14.200 --> 00:17:16.079
and put them in there.
00:17:16.079 --> 00:17:18.180
In how many ways
can I do this?
00:17:18.180 --> 00:17:20.710
A more concrete way of thinking
about this problem--
00:17:20.710 --> 00:17:24.960
you have n people in some group
and you want to form a
00:17:24.960 --> 00:17:28.270
committee by picking people from
that group, and you want
00:17:28.270 --> 00:17:31.020
to form a committee
with k people.
00:17:31.020 --> 00:17:32.460
Where k is a given number.
00:17:32.460 --> 00:17:34.670
For example, a 5 person
committee.
00:17:34.670 --> 00:17:37.510
How many 5 person committees
are possible if you're
00:17:37.510 --> 00:17:39.450
starting with 100 people?
00:17:39.450 --> 00:17:40.960
So that's what we
want to count.
00:17:40.960 --> 00:17:44.210
How many k element subsets
are there?
00:17:44.210 --> 00:17:48.030
We don't yet know the answer,
but let's give a name to it.
00:17:48.030 --> 00:17:52.210
And the name is going to be this
particular symbol, which
00:17:52.210 --> 00:17:55.220
we read as n choose k.
00:17:55.220 --> 00:18:00.130
Out of n elements, we want
to choose k of them.
00:18:00.130 --> 00:18:02.000
OK.
00:18:02.000 --> 00:18:04.810
That may be a little tricky.
00:18:04.810 --> 00:18:10.170
So what we're going to do is
to instead figure out a
00:18:10.170 --> 00:18:15.590
somewhat easier problem,
which is going to be--
00:18:15.590 --> 00:18:20.840
in how many ways can I pick k
out of these people and put
00:18:20.840 --> 00:18:25.450
them in a particular order?
00:18:25.450 --> 00:18:30.430
So how many possible ordered
lists can I make that consist
00:18:30.430 --> 00:18:31.840
of k people?
00:18:31.840 --> 00:18:35.240
By ordered, I mean that we take
those k people and we say
00:18:35.240 --> 00:18:38.010
this is the first person
in the committee.
00:18:38.010 --> 00:18:39.600
That's the second person
in the committee.
00:18:39.600 --> 00:18:42.070
That's the third person in
the committee and so on.
00:18:42.070 --> 00:18:46.100
So in how many ways
can we do this?
00:18:46.100 --> 00:18:50.840
Out of these n, we want to
choose just k of them and put
00:18:50.840 --> 00:18:52.010
them in slots.
00:18:52.010 --> 00:18:53.970
One after the other.
00:18:53.970 --> 00:18:57.480
So this is pretty much like the
license plate problem we
00:18:57.480 --> 00:19:00.680
solved just a little earlier.
00:19:00.680 --> 00:19:06.390
So we have n choices for who
we put as the top person in
00:19:06.390 --> 00:19:07.350
the committee.
00:19:07.350 --> 00:19:11.490
We can pick anyone and have
them be the first person.
00:19:11.490 --> 00:19:13.000
Then I'm going to choose
the second
00:19:13.000 --> 00:19:14.640
person in the committee.
00:19:14.640 --> 00:19:16.700
I've used up 1 person.
00:19:16.700 --> 00:19:21.530
So I'm going to have n
minus 1 choices here.
00:19:21.530 --> 00:19:25.840
And now, at this stage I've used
up 2 people, so I have n
00:19:25.840 --> 00:19:28.640
minus 2 choices here.
00:19:28.640 --> 00:19:31.240
And this keeps going on.
00:19:31.240 --> 00:19:34.110
Well, what is going to
be the last number?
00:19:34.110 --> 00:19:36.310
Is it n minus k?
00:19:36.310 --> 00:19:39.980
Well, not really.
00:19:39.980 --> 00:19:44.090
I only start subtracting
from the second slot on,
00:19:44.090 --> 00:19:48.250
so by the end I will have
subtracted k minus 1.
00:19:48.250 --> 00:19:54.800
So n minus k plus 1 is how many choices I
will have for the last person.
00:19:54.800 --> 00:19:58.270
So this is the number
of ways--
00:19:58.270 --> 00:20:02.390
the product of these numbers
there gives me the number of
00:20:02.390 --> 00:20:08.420
ways that I can create ordered
lists consisting of k people
00:20:08.420 --> 00:20:11.700
out of the n that
we started with.
00:20:11.700 --> 00:20:15.120
Now, you can do a little bit of
algebra and check that this
00:20:15.120 --> 00:20:17.910
expression here is the same
as that expression.
00:20:17.910 --> 00:20:19.200
Why is this?
00:20:19.200 --> 00:20:22.520
This factorial is the product of all
the integers from 1 up to n.
00:20:22.520 --> 00:20:25.140
This factorial is the product of all
the integers from 1
00:20:25.140 --> 00:20:26.710
up to n minus k.
00:20:26.710 --> 00:20:28.240
So you get cancellations.
00:20:28.240 --> 00:20:31.860
And what's left is the product of all
the integers starting from the
00:20:31.860 --> 00:20:37.610
next number after n minus k, which
is this particular product.
00:20:37.610 --> 00:20:42.350
So the number of possible ways
of creating such ordered lists
00:20:42.350 --> 00:20:46.330
is n factorial divided by
n minus k factorial.
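Three counts agree here: the slot-by-slot product n(n-1)...(n-k+1), the closed form n!/(n-k)!, and a direct enumeration of ordered k-lists. This is a Python sketch, and `ordered_lists` is a hypothetical helper name. `math.prod` needs Python 3.8 or later and returns 1 for an empty range, which covers k = 0.

```python
from itertools import permutations
from math import factorial, prod

def ordered_lists(n: int, k: int) -> int:
    """n * (n - 1) * ... * (n - k + 1): one factor per slot."""
    return prod(range(n - k + 1, n + 1))

for n in range(0, 7):
    for k in range(0, n + 1):
        # Agrees with the closed form n! / (n - k)! ...
        assert ordered_lists(n, k) == factorial(n) // factorial(n - k)
        # ... and with a brute-force count of ordered k-lists from n people.
        assert ordered_lists(n, k) == sum(1 for _ in permutations(range(n), k))

# Ordered 5-person lists drawn from 100 people.
print(ordered_lists(100, 5))  # 9034502400
```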
00:20:49.480 --> 00:20:53.180
Now, a different way that I
could make an ordered list--
00:20:53.180 --> 00:20:57.950
instead of picking the people
one at a time, I could first
00:20:57.950 --> 00:21:01.700
choose my k people who are going
to be in the committee,
00:21:01.700 --> 00:21:04.080
and then put them in order.
00:21:04.080 --> 00:21:07.680
And tell them out of these k,
you are the first, you are the
00:21:07.680 --> 00:21:10.010
second, you are the third.
00:21:10.010 --> 00:21:12.590
Starting with these k
people, in how many
00:21:12.590 --> 00:21:15.580
ways can I order them?
00:21:15.580 --> 00:21:18.150
That's the number
of permutations.
00:21:20.820 --> 00:21:25.180
Starting with a set with k
objects, in how many ways can
00:21:25.180 --> 00:21:28.340
I put them in a specific
order?
00:21:28.340 --> 00:21:31.140
How many specific orders
are there?
00:21:31.140 --> 00:21:32.390
That's basically the question.
00:21:32.390 --> 00:21:34.600
In how many ways can
I permute these k
00:21:34.600 --> 00:21:36.290
people and arrange them.
00:21:36.290 --> 00:21:38.450
So the number of ways
that you can do
00:21:38.450 --> 00:21:42.660
this step is k factorial.
00:21:42.660 --> 00:21:48.330
So in how many ways can I
start with a set with n
00:21:48.330 --> 00:21:52.020
elements, go through this
process, and end up with a
00:21:52.020 --> 00:21:55.560
ordered list with k elements?
00:21:55.560 --> 00:21:57.620
By the rule that--
00:21:57.620 --> 00:22:02.160
when we have stages, the total
number of outcomes is how many
00:22:02.160 --> 00:22:05.370
choices we had in the first
stage, times how many choices
00:22:05.370 --> 00:22:08.510
we had in the second stage.
00:22:08.510 --> 00:22:12.670
The number of ways that this
process can happen is this
00:22:12.670 --> 00:22:14.890
times that.
00:22:14.890 --> 00:22:18.340
This is a different way that
that process could happen.
00:22:18.340 --> 00:22:22.640
And the number of possible
ways is this number.
00:22:22.640 --> 00:22:27.600
No matter which way we carry out
that process, in the end
00:22:27.600 --> 00:22:34.610
we get all the possible ways of
arranging k people out of the
00:22:34.610 --> 00:22:36.770
n that we started with.
00:22:36.770 --> 00:22:40.730
So the final answer that we get
when we count should be
00:22:40.730 --> 00:22:44.220
either this, or this
times that.
00:22:44.220 --> 00:22:47.620
Both are equally valid ways of
counting, so both should give
00:22:47.620 --> 00:22:49.050
us the same answer.
00:22:49.050 --> 00:22:52.950
So we get this equality here.
00:22:52.950 --> 00:22:56.370
So these two expressions
correspond to two different
00:22:56.370 --> 00:23:01.660
ways of constructing ordered
lists of k people starting
00:23:01.660 --> 00:23:05.580
with n people initially.
00:23:05.580 --> 00:23:09.120
And now that we have this
relation, we can send the k
00:23:09.120 --> 00:23:11.540
factorial to the denominator.
00:23:11.540 --> 00:23:13.940
And that tells us what
that number, n choose
00:23:13.940 --> 00:23:16.250
k, is going to be.
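In symbols, and using the lecture's n and k, the double-counting argument sets the two ways of building an ordered k-list equal. Dividing by k factorial then gives the formula for n choose k:

```latex
k!\binom{n}{k} = \frac{n!}{(n-k)!}
\quad\Longrightarrow\quad
\binom{n}{k} = \frac{n!}{k!\,(n-k)!}
```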
00:23:16.250 --> 00:23:20.060
So this formula-- it's written
here in red, because you're
00:23:20.060 --> 00:23:22.150
going to see it a zillion
times until
00:23:22.150 --> 00:23:23.740
the end of the semester--
00:23:23.740 --> 00:23:25.950
these numbers are called the binomial
coefficients.
00:23:31.170 --> 00:23:34.600
And they tell us the number of
possible ways that we can
00:23:34.600 --> 00:23:38.380
create a k element subset,
starting with a
00:23:38.380 --> 00:23:41.270
set that has n elements.
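The formula can be checked against a direct count of k-element subsets. This Python sketch compares it with `itertools.combinations` and with the standard library's `math.comb`, which requires Python 3.8 or later. `n_choose_k` is a hypothetical helper name. It also works out the lecture's committee example of 5 people chosen from 100.

```python
from itertools import combinations
from math import comb, factorial

def n_choose_k(n: int, k: int) -> int:
    """Binomial coefficient from the formula n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))

for n in range(0, 8):
    for k in range(0, n + 1):
        # Brute force: count k-element subsets of an n-element set.
        brute = sum(1 for _ in combinations(range(n), k))
        assert n_choose_k(n, k) == brute == comb(n, k)

# Number of 5-person committees that can be formed from 100 people.
print(n_choose_k(100, 5))  # 75287520
```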
00:23:41.270 --> 00:23:44.430
It's always good to do a sanity
check on formulas by
00:23:44.430 --> 00:23:46.710
considering extreme cases.
00:23:46.710 --> 00:23:52.810
So let's take the case where
k is equal to n.
00:23:56.820 --> 00:23:59.420
What's the right answer
in this case?
00:23:59.420 --> 00:24:02.905
How many n-element subsets
are there out
00:24:02.905 --> 00:24:04.950
of an n-element set?
00:24:04.950 --> 00:24:07.580
Well, your subset needs
to include every one.
00:24:07.580 --> 00:24:09.400
You don't have any choices.
00:24:09.400 --> 00:24:10.750
There's only one choice.
00:24:10.750 --> 00:24:12.600
It's the set itself.
00:24:12.600 --> 00:24:15.700
So the answer should
be equal to 1.
00:24:15.700 --> 00:24:19.980
That's the number of n element
subsets, starting with a set
00:24:19.980 --> 00:24:21.340
with n elements.
00:24:21.340 --> 00:24:25.250
Let's see if the formula gives
us the right answer.
00:24:25.250 --> 00:24:31.750
We have n factorial divided by
k factorial, and k is n in our case, so
00:24:31.750 --> 00:24:32.630
n factorial.
00:24:32.630 --> 00:24:36.700
And then n minus k
factorial is 0 factorial.
00:24:36.700 --> 00:24:42.070
So if our formula is correct, we
should have this equality.
00:24:42.070 --> 00:24:45.620
And what's the way to
make that correct?
00:24:45.620 --> 00:24:47.880
Well, it depends what kind
of meaning do we
00:24:47.880 --> 00:24:49.420
give to this symbol?
00:24:49.420 --> 00:24:53.510
How do we define
zero factorial?
00:24:53.510 --> 00:24:55.750
I guess in some ways
it's arbitrary.
00:24:55.750 --> 00:24:58.110
We're going to define it
in a way that makes
00:24:58.110 --> 00:24:59.640
this formula right.
00:24:59.640 --> 00:25:03.870
So the definition that we will
be using is that whenever you
00:25:03.870 --> 00:25:08.700
have 0 factorial, it's going
to stand for the number 1.
00:25:08.700 --> 00:25:12.030
So let's check that this is
also correct, at the other
00:25:12.030 --> 00:25:13.380
extreme case.
00:25:13.380 --> 00:25:17.670
If we let k equal 0, what
does the formula give us?
00:25:17.670 --> 00:25:20.710
It gives us, again, n factorial
divided by 0
00:25:20.710 --> 00:25:23.090
factorial times n factorial.
00:25:23.090 --> 00:25:27.560
According to our convention,
this again is equal to 1.
00:25:27.560 --> 00:25:33.680
So there is one subset of our
set that we started with that
00:25:33.680 --> 00:25:35.260
has zero elements.
00:25:35.260 --> 00:25:37.450
Which subset is it?
00:25:37.450 --> 00:25:39.980
It's the empty set.
00:25:39.980 --> 00:25:45.190
So the empty set is the single
subset of the set that we
00:25:45.190 --> 00:25:49.360
started with that happens to
have exactly zero elements.
00:25:49.360 --> 00:25:52.510
So the formula checks in this
extreme case as well.
00:25:52.510 --> 00:25:55.820
So we're comfortable using it.
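A quick numerical check of both extreme cases, offered as a sketch in Python (assumes Python 3.8+ for `math.comb`). The helper `n_choose_k` is a hypothetical name that spells out the factorial formula; Python's `math.factorial` already follows the convention 0! = 1.

```python
from math import comb, factorial


def n_choose_k(n: int, k: int) -> int:
    """Number of k-element subsets of an n-element set: n! / (k! (n - k)!)."""
    return factorial(n) // (factorial(k) * factorial(n - k))


# Python uses the same convention as the lecture: 0! is defined to be 1.
print(factorial(0))  # 1

for n in range(6):
    # Extreme case k = n: the only n-element subset is the set itself.
    assert n_choose_k(n, n) == 1
    # Extreme case k = 0: the only 0-element subset is the empty set.
    assert n_choose_k(n, 0) == 1
    # The formula agrees with the standard-library binomial coefficient.
    assert all(n_choose_k(n, k) == comb(n, k) for k in range(n + 1))
```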
00:25:55.820 --> 00:26:01.180
Now these factorials and these
coefficients are really messy
00:26:01.180 --> 00:26:03.050
algebraic objects.
00:26:03.050 --> 00:26:07.740
There are lots of beautiful
identities that they satisfy,
00:26:07.740 --> 00:26:10.350
which you can prove
algebraically sometimes by
00:26:10.350 --> 00:26:13.930
using induction and having
cancellations happen
00:26:13.930 --> 00:26:15.310
all over the place.
00:26:15.310 --> 00:26:17.780
But it's really messy.
00:26:17.780 --> 00:26:22.540
Sometimes you can bypass those
calculations by being clever
00:26:22.540 --> 00:26:24.630
and using your understanding
of what these
00:26:24.630 --> 00:26:26.620
coefficients stand for.
00:26:26.620 --> 00:26:31.490
So here's a typical example.
00:26:31.490 --> 00:26:35.450
What is the sum of those
binomial coefficients?
00:26:35.450 --> 00:26:40.130
I fix n, and sum over
all possible values of k.
00:26:40.130 --> 00:26:44.110
So if you're an algebra genius,
you're going to take
00:26:44.110 --> 00:26:49.830
this expression here, plug it in
here, and then start doing
00:26:49.830 --> 00:26:51.460
algebra furiously.
00:26:51.460 --> 00:26:54.970
And half an hour later, you
may get the right answer.
00:26:54.970 --> 00:26:56.425
But now let's try
to be clever.
00:26:59.470 --> 00:27:01.380
What does this really do?
00:27:01.380 --> 00:27:04.200
What does that formula count?
00:27:04.200 --> 00:27:07.280
We're considering k-element
subsets.
00:27:07.280 --> 00:27:09.040
That's this number.
00:27:09.040 --> 00:27:12.360
And we're considering the number
of k-element subsets
00:27:12.360 --> 00:27:14.840
for different choices of k.
00:27:14.840 --> 00:27:18.890
The first term in this sum
counts how many 0-element
00:27:18.890 --> 00:27:20.450
subsets we have.
00:27:20.450 --> 00:27:23.680
The next term in this sum counts
how many 1-element
00:27:23.680 --> 00:27:24.660
subsets we have.
00:27:24.660 --> 00:27:30.010
The next term counts how many
2-element subsets we have.
00:27:30.010 --> 00:27:33.130
So in the end, what
have we counted?
00:27:33.130 --> 00:27:36.660
We've counted the total
number of subsets.
00:27:36.660 --> 00:27:38.430
We've considered all possible
cardinalities.
00:27:43.420 --> 00:27:46.850
We've counted the number
of subsets of size k.
00:27:46.850 --> 00:27:49.740
We've considered all
possible sizes k.
00:27:49.740 --> 00:27:52.230
The overall count is
going to be the
00:27:52.230 --> 00:27:54.356
total number of subsets.
00:27:57.880 --> 00:28:00.800
And we know what this is.
00:28:00.800 --> 00:28:03.740
A couple of slides ago, we
discussed that this number is
00:28:03.740 --> 00:28:05.480
equal to 2 to the n.
00:28:05.480 --> 00:28:11.550
So, a nice, clean, and simple
answer, which is easy to guess
00:28:11.550 --> 00:28:15.110
once you give an interpretation
to the
00:28:15.110 --> 00:28:17.580
algebraic expression that you
have in front of you.
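The counting argument can be checked by brute force: enumerate the subsets of a small set by size, confirm each size class has n choose k members, and confirm the sizes together account for all 2 to the n subsets. A sketch in Python; `subsets_by_size` is a hypothetical helper name.

```python
from itertools import combinations
from math import comb


def subsets_by_size(n: int) -> list:
    """Count the subsets of {1, ..., n} of each size k = 0..n by enumeration."""
    elements = range(1, n + 1)
    return [sum(1 for _ in combinations(elements, k)) for k in range(n + 1)]


for n in range(8):
    counts = subsets_by_size(n)
    # Each size class contains n choose k subsets ...
    assert counts == [comb(n, k) for k in range(n + 1)]
    # ... and summing over every size k counts each subset exactly once.
    assert sum(counts) == 2 ** n

print(subsets_by_size(4))  # [1, 4, 6, 4, 1], which sums to 16 = 2**4
```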
00:28:21.610 --> 00:28:22.280
All right.
00:28:22.280 --> 00:28:27.410
So let's move on to
an example in which those
00:28:27.410 --> 00:28:31.960
binomial coefficients are
going to show up.
00:28:31.960 --> 00:28:34.700
So here's the setting--
00:28:34.700 --> 00:28:40.900
n independent coin tosses,
and each coin toss has a
00:28:40.900 --> 00:28:45.770
probability, P, of resulting
in heads.
00:28:45.770 --> 00:28:48.320
So this is our probabilistic
experiment.
00:28:48.320 --> 00:28:51.200
Suppose we do 6 tosses.
00:28:51.200 --> 00:28:53.980
What's the probability that we
get this particular sequence
00:28:53.980 --> 00:28:56.320
of outcomes?
00:28:56.320 --> 00:28:59.800
Because of independence, we
can multiply probabilities.
00:28:59.800 --> 00:29:02.400
So it's going to be the
probability that the first
00:29:02.400 --> 00:29:05.570
toss results in heads, times
the probability that the
00:29:05.570 --> 00:29:08.770
second toss results in tails,
times the probability that the
00:29:08.770 --> 00:29:12.050
third one results in tails,
times probability of heads,
00:29:12.050 --> 00:29:14.610
times probability of heads,
times probability of heads,
00:29:14.610 --> 00:29:20.930
which is just P to the fourth
times (1 minus P) squared.
00:29:20.930 --> 00:29:24.360
So that's the probability of
this particular sequence.
00:29:24.360 --> 00:29:26.980
How about a different
sequence?
00:29:26.980 --> 00:29:32.830
If I had 4 heads and 2 tails,
but in a different order--
00:29:39.130 --> 00:29:42.870
let's say if we considered
this particular outcome--
00:29:42.870 --> 00:29:45.480
would the answer be different?
00:29:45.480 --> 00:29:49.020
We would still have P, times P,
times P, times P, times (1
00:29:49.020 --> 00:29:51.070
minus P), times (1 minus P).
00:29:51.070 --> 00:29:54.670
We would again get
the same answer.
00:29:54.670 --> 00:29:59.510
So what you observe from just
this example is that, more
00:29:59.510 --> 00:30:03.240
generally, the probability
of obtaining a particular
00:30:03.240 --> 00:30:08.930
sequence of heads and tails is
P raised to a power equal to the
00:30:08.930 --> 00:30:10.300
number of heads.
00:30:10.300 --> 00:30:12.240
So here we had 4 heads.
00:30:12.240 --> 00:30:15.100
So there's P to the
fourth showing up.
00:30:15.100 --> 00:30:21.116
And then (1 minus P) raised to the
number of tails.
00:30:21.116 --> 00:30:26.970
So every k-head sequence--
00:30:26.970 --> 00:30:32.310
every outcome in which we have
exactly k heads, has the same
00:30:32.310 --> 00:30:37.180
probability, which is going to
be P to the k times (1 minus P) to
00:30:37.180 --> 00:30:38.930
the (n minus k).
00:30:38.930 --> 00:30:43.310
This is the probability of any
particular sequence that has
00:30:43.310 --> 00:30:44.980
exactly k heads.
00:30:44.980 --> 00:30:46.980
So that's the probability
of a particular
00:30:46.980 --> 00:30:48.920
sequence with k heads.
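A sketch of that multiplication in Python, assuming independent tosses with heads probability `p`. "HTTHHH" is the six-toss sequence walked through above (heads, tails, tails, heads, heads, heads); "HHHHTT" is an arbitrary reordering chosen for illustration. Both match the closed form P to the k times (1 minus P) to the (n minus k).

```python
def sequence_probability(seq: str, p: float) -> float:
    """Probability of one specific toss sequence such as "HTTHHH",
    assuming independent tosses with P(heads) = p."""
    prob = 1.0
    for toss in seq:
        # Independence lets us multiply the per-toss probabilities.
        prob *= p if toss == "H" else (1 - p)
    return prob


p = 0.3
a = sequence_probability("HTTHHH", p)  # the six-toss example: 4 heads, 2 tails
b = sequence_probability("HHHHTT", p)  # same counts in a different order
closed_form = p**4 * (1 - p) ** 2      # P^k (1 - P)^(n - k) with k = 4, n = 6
print(abs(a - closed_form) < 1e-12, abs(b - closed_form) < 1e-12)  # True True
```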
00:30:48.920 --> 00:30:53.160
So now let's ask the question,
what is the probability that
00:30:53.160 --> 00:30:57.980
my experiment results in exactly
k heads, but in some
00:30:57.980 --> 00:30:59.930
arbitrary order?
00:30:59.930 --> 00:31:02.500
So the heads could
show up anywhere.
00:31:02.500 --> 00:31:04.080
So there are a number
of different ways
00:31:04.080 --> 00:31:05.370
that this can happen.
00:31:05.370 --> 00:31:11.220
What's the overall probability
that this event takes place?
00:31:11.220 --> 00:31:15.560
So the probability of an event
taking place is the sum of the
00:31:15.560 --> 00:31:19.390
probabilities of all the
individual ways that
00:31:19.390 --> 00:31:22.020
the event can occur.
00:31:22.020 --> 00:31:24.410
So it's the sum of the
probabilities of all the
00:31:24.410 --> 00:31:27.650
outcomes that make
the event happen.
00:31:27.650 --> 00:31:31.420
The number of ways that we can
obtain k heads is the number
00:31:31.420 --> 00:31:37.940
of different sequences that
contain exactly k heads.
00:31:37.940 --> 00:31:44.430
We just figured out that any
sequence with exactly k heads
00:31:44.430 --> 00:31:47.110
has this probability.
00:31:47.110 --> 00:31:51.110
So to do this summation, we just
need to take the common
00:31:51.110 --> 00:31:56.030
probability of each individual
k-head sequence, times how
00:31:56.030 --> 00:31:58.140
many terms we have
in this sum.
00:32:01.320 --> 00:32:07.020
So what's left to do now
is to figure out how many
00:32:07.020 --> 00:32:09.990
k-head sequences there are.
00:32:09.990 --> 00:32:15.448
How many outcomes are there in
which we have exactly k heads?
00:32:18.940 --> 00:32:21.270
OK.
00:32:21.270 --> 00:32:27.590
So what are the ways that I can
describe to you a sequence
00:32:27.590 --> 00:32:30.600
with k heads?
00:32:30.600 --> 00:32:34.970
I can take my n slots
that correspond to
00:32:34.970 --> 00:32:36.220
the different tosses.
00:32:42.920 --> 00:32:45.420
I'm interested in particular
sequences that
00:32:45.420 --> 00:32:47.750
have exactly k heads.
00:32:47.750 --> 00:32:53.590
So what I need to do is to
choose k slots and assign
00:32:53.590 --> 00:32:54.850
heads to them.
00:33:05.530 --> 00:33:11.580
So specifying a sequence that
has exactly k heads is the
00:33:11.580 --> 00:33:17.380
same thing as drawing this
picture and telling you which
00:33:17.380 --> 00:33:23.640
are the k slots that happened
to have heads.
00:33:23.640 --> 00:33:30.110
So I need to choose out of those
n slots, k of them, and
00:33:30.110 --> 00:33:31.635
assign them heads.
00:33:31.635 --> 00:33:35.290
In how many ways can I
choose these k slots?
00:33:35.290 --> 00:33:41.640
Well, it's the question of
starting with a set of n slots
00:33:41.640 --> 00:33:47.080
and choosing k slots out
of the n available.
00:33:47.080 --> 00:33:55.540
So the number of k-head
sequences is the same as the
00:33:55.540 --> 00:34:04.300
number of k-element subsets of
the set of slots that we
00:34:04.300 --> 00:34:10.520
started with, which are
the n slots 1 up to n.
00:34:10.520 --> 00:34:12.800
We know what that number is.
00:34:12.800 --> 00:34:18.770
We counted earlier the number
of k-element subsets, starting
00:34:18.770 --> 00:34:20.290
with a set with n elements.
00:34:20.290 --> 00:34:23.030
And we gave a symbol to that
number, which is that
00:34:23.030 --> 00:34:24.850
thing, n choose k.
00:34:24.850 --> 00:34:28.110
So this is the final answer
that we obtain.
00:34:28.110 --> 00:34:32.449
So these are the so-called
binomial probabilities.
00:34:32.449 --> 00:34:35.190
And they give us the
probabilities for different
00:34:35.190 --> 00:34:39.580
numbers of heads when a coin with
heads probability P is being
00:34:39.580 --> 00:34:42.050
tossed n times.
00:34:42.050 --> 00:34:46.170
This formula is correct, of
course, for reasonable values
00:34:46.170 --> 00:34:52.370
of k, meaning it's correct for
k equals 0, 1, up to n.
00:34:52.370 --> 00:34:57.650
If k is bigger than n, what's
the probability of k heads?
00:34:57.650 --> 00:35:01.340
If k is bigger than n, there's
no way to obtain k heads, so
00:35:01.340 --> 00:35:03.480
that probability is,
of course, zero.
00:35:03.480 --> 00:35:07.610
So these probabilities only
make sense for the numbers k
00:35:07.610 --> 00:35:10.405
that are possible, given
that we have n tosses.
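The slot argument can be checked by brute force for small n: count the H/T sequences with exactly k heads and compare with n choose k, then combine that count with the common per-sequence probability. A sketch in Python (assumes Python 3.8+ for `math.comb`); the function names are hypothetical. The range guard mirrors the remark that the probability is zero when k exceeds n.

```python
from itertools import product
from math import comb


def count_k_head_sequences(n: int, k: int) -> int:
    """Brute-force count of length-n H/T sequences with exactly k heads."""
    return sum(1 for seq in product("HT", repeat=n) if seq.count("H") == k)


def binomial_pmf(n: int, k: int, p: float) -> float:
    """P(exactly k heads in n independent tosses with P(heads) = p):
    (number of k-head sequences) times (probability of each one).
    Returns 0 outside 0 <= k <= n, where no sequence has k heads."""
    if not 0 <= k <= n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n = 6
# Choosing which k of the n slots hold heads gives exactly n choose k sequences.
assert all(count_k_head_sequences(n, k) == comb(n, k) for k in range(n + 1))
print(binomial_pmf(6, 4, 0.3))
print(binomial_pmf(6, 7, 0.3))  # 0.0: more heads than tosses
```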
00:35:13.200 --> 00:35:16.850
And now a question similar
to the one we had in
00:35:16.850 --> 00:35:18.480
the previous slide.
00:35:18.480 --> 00:35:22.910
If I write down this
summation--
00:35:22.910 --> 00:35:28.240
even worse algebra than the one
in the previous slide--
00:35:28.240 --> 00:35:35.840
what do you think this number
will turn out to be?
00:35:35.840 --> 00:35:39.930
It should be 1 because this
is the probability
00:35:39.930 --> 00:35:42.930
of obtaining k heads.
00:35:42.930 --> 00:35:45.470
When we do the summation, what
we're doing is we're
00:35:45.470 --> 00:35:48.550
considering the probability of
0 heads, plus the probability
00:35:48.550 --> 00:35:50.780
of 1 head, plus the probability
of 2 heads, and so on, up to
00:35:50.780 --> 00:35:52.420
the probability of n heads.
00:35:52.420 --> 00:35:54.780
We've exhausted all the
possibilities in our
00:35:54.780 --> 00:35:55.720
experiment.
00:35:55.720 --> 00:35:58.730
So the overall probability,
when you exhaust all
00:35:58.730 --> 00:36:01.160
possibilities, must
be equal to 1.
00:36:01.160 --> 00:36:04.180
So that's yet another beautiful
formula that
00:36:04.180 --> 00:36:06.960
evaluates into something
really simple.
00:36:06.960 --> 00:36:11.460
And if you tried to prove this
identity algebraically, of
00:36:11.460 --> 00:36:16.030
course, you would have to
suffer quite a bit.
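The "exhaust all possibilities" argument can be spot-checked numerically without doing any algebra. A sketch in Python; `total_binomial_probability` is a hypothetical helper, and the comparisons use a tolerance because the sum is computed in floating point.

```python
from math import comb, isclose


def total_binomial_probability(n: int, p: float) -> float:
    """Sum of C(n, k) p^k (1 - p)^(n - k) over k = 0, ..., n."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


# The events "exactly k heads", for k = 0..n, cover every outcome exactly once,
# so the total should be 1 for every n and every p.
for n in (1, 5, 10, 50):
    for p in (0.1, 0.5, 0.93):
        assert isclose(total_binomial_probability(n, p), 1.0)
```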
00:36:16.030 --> 00:36:20.130
So now armed with the binomial
probabilities, we can do the
00:36:20.130 --> 00:36:21.380
harder problems.
00:36:23.480 --> 00:36:27.340
So let's take the same
experiment again.
00:36:27.340 --> 00:36:32.610
We flip a coin independently
10 times.
00:36:32.610 --> 00:36:37.450
So these 10 tosses
are independent.
00:36:37.450 --> 00:36:40.000
We flip it 10 times.
00:36:40.000 --> 00:36:43.985
We don't see the result, but
somebody comes and tells us,
00:36:43.985 --> 00:36:47.890
you know, there were exactly
3 heads in the 10
00:36:47.890 --> 00:36:49.688
tosses that you had.
00:36:49.688 --> 00:36:50.930
OK?
00:36:50.930 --> 00:36:53.280
So a certain event, call it B, happened.
00:36:53.280 --> 00:36:57.450
And now you're asked to find
the probability of another
00:36:57.450 --> 00:37:01.400
event, which is that the first
2 tosses were heads.
00:37:01.400 --> 00:37:08.990
Let's call that event A. OK.
00:37:08.990 --> 00:37:14.320
So are we in the setting
of discrete
00:37:14.320 --> 00:37:16.850
uniform probability laws?
00:37:16.850 --> 00:37:21.760
When we toss a coin multiple
times, is it the case that all
00:37:21.760 --> 00:37:24.130
outcomes are equally likely?
00:37:24.130 --> 00:37:27.850
All sequences are
equally likely?
00:37:27.850 --> 00:37:30.515
That's the case if you
have a fair coin--
00:37:30.515 --> 00:37:32.630
that all sequences are
equally likely.
00:37:32.630 --> 00:37:37.170
But if your coin is not fair,
of course, heads/heads is
00:37:37.170 --> 00:37:39.630
going to have a different
probability than tails/tails.
00:37:39.630 --> 00:37:43.720
If your coin is biased towards
heads, then heads/heads is
00:37:43.720 --> 00:37:45.330
going to be more likely.
00:37:45.330 --> 00:37:49.440
So we're not quite in
the uniform setting.
00:37:49.440 --> 00:37:53.450
Our overall sample space, omega,
does not have equally
00:37:53.450 --> 00:37:55.680
likely elements.
00:37:55.680 --> 00:37:57.880
Do we care about that?
00:37:57.880 --> 00:37:59.700
Not necessarily.
00:37:59.700 --> 00:38:04.570
All the action now happens
inside the event B that we are
00:38:04.570 --> 00:38:06.510
told has occurred.
00:38:06.510 --> 00:38:10.000
So we have our big sample
space, omega.
00:38:10.000 --> 00:38:13.860
Elements of that sample space
are not equally likely.
00:38:13.860 --> 00:38:17.390
We are told that a certain
event B occurred.
00:38:17.390 --> 00:38:21.830
And inside that event B, we're
asked to find the conditional
00:38:21.830 --> 00:38:26.100
probability that A has
also occurred.
00:38:26.100 --> 00:38:30.850
Now here's the lucky thing:
inside the event B, all
00:38:30.850 --> 00:38:33.270
outcomes are equally likely.
00:38:35.920 --> 00:38:40.710
The outcomes inside B are the
sequences of 10 tosses that
00:38:40.710 --> 00:38:42.970
have exactly 3 heads.
00:38:42.970 --> 00:38:47.370
Every 3-head sequence has the same
probability, P cubed times (1 minus P) to the seventh.
00:38:47.370 --> 00:38:50.790
So the elements of
B are equally
00:38:50.790 --> 00:38:52.760
likely with each other.
00:38:55.800 --> 00:39:01.030
Once we condition on the event
B having occurred, what
00:39:01.030 --> 00:39:03.740
happens to the probabilities
of the different outcomes
00:39:03.740 --> 00:39:05.430
inside here?
00:39:05.430 --> 00:39:09.790
Well, conditional probability
laws keep the same proportions
00:39:09.790 --> 00:39:11.710
as the unconditional ones.
00:39:11.710 --> 00:39:15.930
The elements of B were equally
likely when we started, so
00:39:15.930 --> 00:39:21.590
they're equally likely once we
are told that B has occurred.
00:39:21.590 --> 00:39:26.440
So to deal with this problem, we
just need to transport ourselves to
00:39:26.440 --> 00:39:30.680
this smaller universe and think
about what's happening
00:39:30.680 --> 00:39:32.920
in that little universe.
00:39:32.920 --> 00:39:36.150
In that little universe,
all elements of
00:39:36.150 --> 00:39:39.930
B are equally likely.
00:39:39.930 --> 00:39:43.860
So to find the probability of
some subset of that set, we
00:39:43.860 --> 00:39:47.250
only need to count the
cardinality of B, and count
00:39:47.250 --> 00:39:51.090
the cardinality of A intersection B.
So let's do that.
00:39:51.090 --> 00:39:53.780
Number of outcomes in B--
00:39:53.780 --> 00:40:00.290
in how many ways can we get
3 heads out of 10 tosses?
00:40:00.290 --> 00:40:03.190
That's the number we considered
before, and
00:40:03.190 --> 00:40:06.250
it's 10 choose 3.
00:40:06.250 --> 00:40:11.020
This is the number of
3-head sequences
00:40:11.020 --> 00:40:13.840
when you have 10 tosses.
00:40:13.840 --> 00:40:20.580
Now let's look at the event A.
The event A is that the first
00:40:20.580 --> 00:40:26.150
2 tosses where heads, but we're
living now inside this
00:40:26.150 --> 00:40:31.220
universe B. Given that B
occurred, how many elements
00:40:31.220 --> 00:40:34.760
does A have in there?
00:40:34.760 --> 00:40:41.536
In how many ways can A happen
inside the B universe?
00:40:41.536 --> 00:40:46.860
If you're told that the
first 2 were heads--
00:40:46.860 --> 00:40:49.470
sorry.
00:40:49.470 --> 00:40:54.540
So out of the outcomes in B that
have 3 heads, how many
00:40:54.540 --> 00:40:56.630
start with heads/heads?
00:40:56.630 --> 00:41:00.370
Well, if it starts with
heads/heads, then the only
00:41:00.370 --> 00:41:04.830
uncertainty is the location
of the third head.
00:41:04.830 --> 00:41:07.940
So we started with heads/heads,
we're going to
00:41:07.940 --> 00:41:13.020
have three heads; the question
is, where is that third head
00:41:13.020 --> 00:41:14.090
going to be?
00:41:14.090 --> 00:41:16.540
It has eight possibilities.
00:41:16.540 --> 00:41:20.940
So slot 1 is heads, slot 2 is
heads, and the third head can be
00:41:20.940 --> 00:41:22.140
anywhere else.
00:41:22.140 --> 00:41:25.020
So there are 8 possibilities
for where the third
00:41:25.020 --> 00:41:26.270
head is going to be.
00:41:29.630 --> 00:41:31.660
OK.
00:41:31.660 --> 00:41:36.720
So what we have counted here is
really the cardinality of A
00:41:36.720 --> 00:41:43.450
intersection B, that is, out of
the elements in B, how many
00:41:43.450 --> 00:41:49.410
of them make A happen, divided
by the cardinality of B. And
00:41:49.410 --> 00:41:53.860
that gives us the answer, which
is going to be 10 choose
00:41:53.860 --> 00:41:57.530
3, divided by 8.
00:41:57.530 --> 00:42:01.330
And I should probably redraw a
little bit of the picture that
00:42:01.330 --> 00:42:02.510
I have here.
00:42:02.510 --> 00:42:06.690
The set A is not necessarily
contained in B. It could also
00:42:06.690 --> 00:42:14.750
have stuff outside B. So the
event that the first 2 tosses
00:42:14.750 --> 00:42:18.690
are heads can happen with a
total of 3 heads, but it can
00:42:18.690 --> 00:42:22.650
also happen with a different
total number of heads.
00:42:22.650 --> 00:42:27.340
But once we are transported
inside the set B, what we need
00:42:27.340 --> 00:42:32.460
to count is just this part of
A, which is A intersection B, and
00:42:32.460 --> 00:42:35.180
to compare it with the total number
of elements in the set
00:42:35.180 --> 00:42:40.310
B. Did I write it the
opposite way?
00:42:40.310 --> 00:42:41.700
Yes.
00:42:41.700 --> 00:42:46.330
So this is 8 over 10 choose 3.
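The "smaller universe" argument can be checked by enumeration. A sketch in Python: each outcome in B is represented by the (0-based) positions of its three heads, which is exactly the "choose 3 of the 10 slots" picture. The function `conditional_prob` is a hypothetical helper that recomputes P(A | B) from the biased-coin probabilities, to show the bias p cancels.

```python
from fractions import Fraction
from itertools import combinations, product

N_TOSSES, N_HEADS = 10, 3

# Universe B: all 10-toss sequences with exactly 3 heads, each identified by
# the (0-based) positions of its heads.
B = list(combinations(range(N_TOSSES), N_HEADS))

# A intersection B: sequences in B whose first two tosses (positions 0, 1) are heads.
A_and_B = [heads for heads in B if 0 in heads and 1 in heads]

print(len(B), len(A_and_B))             # 120 8
print(Fraction(len(A_and_B), len(B)))  # 1/15, i.e. 8 / (10 choose 3)


def conditional_prob(p: float) -> float:
    """P(A | B) from the biased-coin law: sum P(sequence) over A and B,
    divided by the sum over B (1 = heads, 0 = tails)."""
    num = den = 0.0
    for seq in product((0, 1), repeat=N_TOSSES):
        if sum(seq) == N_HEADS:
            weight = p**N_HEADS * (1 - p) ** (N_TOSSES - N_HEADS)
            den += weight
            if seq[0] == 1 and seq[1] == 1:
                num += weight
    return num / den


# Every outcome in B has the same probability, so p cancels out.
for p in (0.2, 0.5, 0.9):
    assert abs(conditional_prob(p) - 1 / 15) < 1e-12
```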
00:42:49.260 --> 00:42:49.640
OK.
00:42:49.640 --> 00:42:52.965
So we're going to close with a
more difficult problem now.
00:42:57.920 --> 00:43:00.580
OK.
00:43:00.580 --> 00:43:05.650
This business of n choose k has
to do with starting with a
00:43:05.650 --> 00:43:11.350
set and picking a subset
of k elements.
00:43:11.350 --> 00:43:15.080
Another way of thinking of that
is that we start with a
00:43:15.080 --> 00:43:20.770
set with n elements and we
choose a subset that has k,
00:43:20.770 --> 00:43:24.350
which means that there are n
minus k that are left.
00:43:24.350 --> 00:43:29.980
Picking a subset is the same as
partitioning our set into
00:43:29.980 --> 00:43:32.510
two pieces.
00:43:32.510 --> 00:43:36.010
Now let's generalize this
question and start counting
00:43:36.010 --> 00:43:38.150
partitions in general.
00:43:38.150 --> 00:43:42.570
Somebody gives you a set
that has n elements.
00:43:42.570 --> 00:43:44.670
Somebody gives you also
certain numbers--
00:43:44.670 --> 00:43:49.400
n1, n2, n3, let's say,
n4, where these
00:43:49.400 --> 00:43:53.740
numbers add up to n.
00:43:53.740 --> 00:43:58.740
And you're asked to partition
this set into four subsets
00:43:58.740 --> 00:44:01.450
where each one of the subsets
has this particular
00:44:01.450 --> 00:44:02.580
cardinality.
00:44:02.580 --> 00:44:08.250
So you're asked to cut it into
four pieces, each one
00:44:08.250 --> 00:44:11.100
having the prescribed
cardinality.
00:44:11.100 --> 00:44:15.090
In how many ways can we
do this partitioning?
00:44:15.090 --> 00:44:19.370
n choose k was the answer when
we partitioned into two pieces;
00:44:19.370 --> 00:44:21.910
what's the answer
more generally?
00:44:21.910 --> 00:44:26.230
For a concrete example of a
partition, you have your 52
00:44:26.230 --> 00:44:32.120
card deck and you deal, as in
bridge, by giving 13 cards to
00:44:32.120 --> 00:44:34.000
each one of the players.
00:44:34.000 --> 00:44:38.080
Assuming that the dealing is
done fairly and with a well
00:44:38.080 --> 00:44:43.790
shuffled deck of cards, every
particular partition of the 52
00:44:43.790 --> 00:44:50.590
cards into four hands, that is
four subsets of 13 each,
00:44:50.590 --> 00:44:52.380
should be equally likely.
00:44:52.380 --> 00:44:56.140
So we take the 52 cards and we
partition them into subsets of
00:44:56.140 --> 00:44:58.550
13, 13, 13, and 13.
00:44:58.550 --> 00:45:01.020
And we assume that all possible
partitions, all
00:45:01.020 --> 00:45:04.240
possible ways of dealing the
cards are equally likely.
00:45:04.240 --> 00:45:07.560
So we are again in a setting
where we can use counting,
00:45:07.560 --> 00:45:10.410
because all the possible
outcomes are equally likely.
00:45:10.410 --> 00:45:14.050
So an outcome of the experiment
is the hands that
00:45:14.050 --> 00:45:17.070
each player ends up getting.
00:45:17.070 --> 00:45:20.170
And when you get the cards in
your hands, it doesn't matter
00:45:20.170 --> 00:45:22.000
in which order
you got them.
00:45:22.000 --> 00:45:25.460
It only matters what cards
you have in your hand.
00:45:25.460 --> 00:45:31.160
So it only matters which subset
of the cards you got.
00:45:31.160 --> 00:45:31.590
All right.
00:45:31.590 --> 00:45:35.820
So what's the cardinality of
the sample space in this
00:45:35.820 --> 00:45:37.160
experiment?
00:45:37.160 --> 00:45:42.010
So let's do it for the concrete
numbers that we have
00:45:42.010 --> 00:45:49.390
for the problem of partitioning
52 cards.
00:45:49.390 --> 00:45:52.540
So think of dealing as follows--
you shuffle the deck
00:45:52.540 --> 00:45:56.250
perfectly, and then you take the
top 13 cards and give them
00:45:56.250 --> 00:45:57.740
to one person.
00:45:57.740 --> 00:46:03.230
How many possible hands are
there for that person?
00:46:03.230 --> 00:46:08.970
Out of the 52 cards, I choose 13
at random and give them to
00:46:08.970 --> 00:46:10.680
the first person.
00:46:10.680 --> 00:46:13.260
Having done that, what
happens next?
00:46:13.260 --> 00:46:16.260
I'm left with 39 cards.
00:46:16.260 --> 00:46:20.600
And out of those 39 cards, I
pick 13 of them and give them
00:46:20.600 --> 00:46:22.250
to the second person.
00:46:22.250 --> 00:46:25.790
Now I'm left with 26 cards.
00:46:25.790 --> 00:46:30.920
Out of those 26, I choose 13,
give them to the third person.
00:46:30.920 --> 00:46:34.040
And for the last person there
isn't really any choice.
00:46:34.040 --> 00:46:37.890
Out of the 13, I have to give
that person all 13.
00:46:37.890 --> 00:46:40.230
And that number is
just equal to 1.
00:46:40.230 --> 00:46:43.530
So we don't care about it.
00:46:43.530 --> 00:46:43.910
All right.
00:46:43.910 --> 00:46:48.270
So the next thing you do is to write
down the formulas for
00:46:48.270 --> 00:46:49.450
these numbers.
00:46:49.450 --> 00:46:52.450
So, for example, here you
would have 52 factorial,
00:46:52.450 --> 00:46:55.880
divided by 13 factorial,
times 39
00:46:55.880 --> 00:46:59.040
factorial, and you continue.
00:46:59.040 --> 00:47:01.310
And then there are nice
cancellations that happen.
00:47:01.310 --> 00:47:05.120
This 39 factorial is going to
cancel the 39 factorial that
00:47:05.120 --> 00:47:07.020
comes from there, and so on.
00:47:07.020 --> 00:47:10.200
After you do the cancellations
and all the algebra, you're
00:47:10.200 --> 00:47:13.380
left with this particular
answer, which is the number of
00:47:13.380 --> 00:47:18.120
possible partitions of 52 cards
among four players, where
00:47:18.120 --> 00:47:21.710
each player gets exactly
13 cards.
00:47:21.710 --> 00:47:25.140
If you were to generalize this
formula to the setting that we
00:47:25.140 --> 00:47:29.200
have here, the more general
formula is--
00:47:29.200 --> 00:47:33.840
you have n factorial, where n is
the number of objects that
00:47:33.840 --> 00:47:39.770
you are distributing, divided
by the product of the
00:47:39.770 --> 00:47:41.840
factorials of the subset sizes--
00:47:41.840 --> 00:47:46.000
OK, here I'm doing it for
the case where we split
00:47:46.000 --> 00:47:49.310
it into four sets.
00:47:49.310 --> 00:47:53.740
So that would be the answer when
we partition a set into
00:47:53.740 --> 00:47:57.780
four subsets of prescribed
cardinalities.
00:47:57.780 --> 00:48:00.120
And you can guess how that
formula would generalize if
00:48:00.120 --> 00:48:03.590
you want to split it into
five sets or six sets.
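A sketch in Python of the general partition count n! / (n1! n2! ... nr!) for any number of pieces (assumes Python 3.8+ for `math.comb` and `math.prod`; `multinomial` is a hypothetical helper name). It checks that the sequential dealing product for bridge equals the closed form, and that the two-piece case reduces to n choose k.

```python
from math import comb, factorial, prod


def multinomial(*sizes: int) -> int:
    """Ways to partition an n-element set (n = sum(sizes)) into labeled
    subsets with the prescribed sizes: n! / (n1! n2! ... nr!)."""
    n = sum(sizes)
    return factorial(n) // prod(factorial(s) for s in sizes)


# Sequential dealing: 13 of 52, then 13 of the remaining 39, then 13 of 26;
# the last 13 cards are forced (1 way).
sequential = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)
assert sequential == multinomial(13, 13, 13, 13)

# With two pieces, the partition count is just n choose k.
assert multinomial(3, 7) == comb(10, 3)
print(multinomial(13, 13, 13, 13))
```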
00:48:03.590 --> 00:48:03.950
OK.
00:48:03.950 --> 00:48:10.190
So far we just figured out the
size of the sample space.
00:48:10.190 --> 00:48:14.660
Now we need to look at our
event, which is the event that
00:48:14.660 --> 00:48:20.640
each player gets an ace, let's
call that event A. In how many
00:48:20.640 --> 00:48:22.800
ways can that event happen?
00:48:22.800 --> 00:48:26.970
How many possible deals are
there in which every player
00:48:26.970 --> 00:48:29.350
has exactly one ace?
00:48:29.350 --> 00:48:33.100
So I need to think about the
sequential process by which I
00:48:33.100 --> 00:48:36.860
distribute the cards so that
everybody gets exactly one
00:48:36.860 --> 00:48:40.440
ace, and then try to think
about in how many ways that
00:48:40.440 --> 00:48:42.210
sequential process can happen.
00:48:42.210 --> 00:48:45.660
So one way of making sure that
everybody gets exactly one ace
00:48:45.660 --> 00:48:46.730
is the following--
00:48:46.730 --> 00:48:51.210
I take the four aces and I
distribute them randomly to
00:48:51.210 --> 00:48:53.970
the four players, but making
sure that each one gets
00:48:53.970 --> 00:48:55.580
exactly one ace.
00:48:55.580 --> 00:48:57.510
In how many ways can
that happen?
00:48:57.510 --> 00:49:02.210
I take the ace of spades and I
send it to a random person out
00:49:02.210 --> 00:49:03.210
of the four.
00:49:03.210 --> 00:49:07.430
So there's 4 choices for this.
00:49:07.430 --> 00:49:10.280
Then I'm left with 3
aces to distribute.
00:49:10.280 --> 00:49:14.050
That person has already
gotten an ace.
00:49:14.050 --> 00:49:17.270
I take the next ace, and
I give it to one of
00:49:17.270 --> 00:49:19.590
the 3 people remaining.
00:49:19.590 --> 00:49:22.480
So there's 3 choices
for how to do that.
00:49:22.480 --> 00:49:26.970
And then for the next ace,
there are 2 people who have not
00:49:26.970 --> 00:49:28.640
yet gotten an ace,
and I give it
00:49:28.640 --> 00:49:30.770
randomly to one of them. The last ace goes to the one remaining player.
00:49:30.770 --> 00:49:36.760
So these are the possible ways
of distributing the 4
00:49:36.760 --> 00:49:42.040
aces, so that each person
gets exactly one.
00:49:42.040 --> 00:49:44.230
It's actually the same
as this problem.
00:49:44.230 --> 00:49:48.930
Starting with a set of four
things, in how many ways can I
00:49:48.930 --> 00:49:53.430
partition them into four subsets
where the first set
00:49:53.430 --> 00:49:56.220
has one element, the second has
one element, the third one
00:49:56.220 --> 00:49:58.270
has another element,
and so on.
00:49:58.270 --> 00:50:05.710
So it agrees with that formula
by giving us 4 factorial.
00:50:05.710 --> 00:50:06.040
OK.
00:50:06.040 --> 00:50:09.400
So there are different ways
of distributing the aces.
00:50:09.400 --> 00:50:11.760
And then there's different
ways of distributing the
00:50:11.760 --> 00:50:13.460
remaining 48 cards.
00:50:13.460 --> 00:50:15.110
How many ways are there?
00:50:15.110 --> 00:50:18.920
Well, I have 48 cards that I'm
going to distribute to four
00:50:18.920 --> 00:50:22.760
players by giving 12
cards to each one.
00:50:22.760 --> 00:50:26.430
It's exactly the same question
as the one we had here, except
00:50:26.430 --> 00:50:30.230
that now it's 48 cards,
12 to each person.
00:50:30.230 --> 00:50:33.400
And that gives us this
particular count.
00:50:33.400 --> 00:50:39.216
So putting all that together
gives us the different ways
00:50:39.216 --> 00:50:43.350
that we can distribute the cards
to the four players so
00:50:43.350 --> 00:50:45.910
that each one gets
exactly one ace.
00:50:45.910 --> 00:50:48.600
The number of possible ways
is going to be this four
00:50:48.600 --> 00:50:54.610
factorial, coming from here,
times this number--
00:50:54.610 --> 00:50:56.890
this gives us the number of
ways that the event of
00:50:56.890 --> 00:50:58.760
interest can happen--
00:50:58.760 --> 00:51:02.760
and then the denominator is the
cardinality of our sample
00:51:02.760 --> 00:51:04.930
space, which is this number.
00:51:04.930 --> 00:51:07.560
So this looks like
a horrible mess.
00:51:07.560 --> 00:51:10.590
It turns out that this
expression does simplify to
00:51:10.590 --> 00:51:13.180
something really,
really simple.
00:51:13.180 --> 00:51:16.420
And if you look at the textbook
for this problem, you
00:51:16.420 --> 00:51:18.750
will see an alternative
derivation that gives you a
00:51:18.750 --> 00:51:22.720
shortcut to the same
numerical answer.
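The ratio can be evaluated exactly with Python's `fractions.Fraction`, as a sketch (the `multinomial` helper is a hypothetical name, redefined here so the block runs on its own). Cancelling the factorials by hand gives 4! times 13 to the fourth over (52 times 51 times 50 times 49); the assertion confirms that simplification, and the value reduces to 2197/20825, roughly 0.105. This is only a numerical check of the counting formula, not the textbook's alternative derivation.

```python
from fractions import Fraction
from math import factorial, prod


def multinomial(*sizes: int) -> int:
    """n! / (n1! n2! ... nr!) with n = sum(sizes)."""
    return factorial(sum(sizes)) // prod(factorial(s) for s in sizes)


# Favorable deals: 4! ways to hand out the aces, one per player,
# times the ways to split the other 48 cards 12 per player.
favorable = factorial(4) * multinomial(12, 12, 12, 12)
# Sample space: all ways to deal 52 cards, 13 per player.
total = multinomial(13, 13, 13, 13)
p_each_gets_ace = Fraction(favorable, total)

# After the factorials cancel, the ratio is 4! * 13^4 / (52 * 51 * 50 * 49).
assert p_each_gets_ace == Fraction(24 * 13**4, 52 * 51 * 50 * 49)
print(p_each_gets_ace, float(p_each_gets_ace))
```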
00:51:22.720 --> 00:51:23.160
All right.
00:51:23.160 --> 00:51:25.240
So that basically concludes
chapter one.
00:51:25.240 --> 00:51:29.940
Next time we're going to
introduce random
00:51:29.940 --> 00:51:32.950
variables and make the subject
even more interesting.