WEBVTT
00:00:00.820 --> 00:00:03.830
The next random variable that
we will discuss is the
00:00:03.830 --> 00:00:05.560
binomial random variable.
00:00:05.560 --> 00:00:07.570
It is one that is
already familiar
00:00:07.570 --> 00:00:09.380
to us in most respects.
00:00:09.380 --> 00:00:13.190
It is associated with the
experiment of taking a coin
00:00:13.190 --> 00:00:16.510
and tossing it n times
independently.
00:00:16.510 --> 00:00:20.100
And at each toss, there
is a probability, p,
00:00:20.100 --> 00:00:21.590
of obtaining heads.
00:00:21.590 --> 00:00:25.120
So the experiment is completely
specified in terms
00:00:25.120 --> 00:00:26.270
of two parameters--
00:00:26.270 --> 00:00:30.470
n, the number of tosses, and p,
the probability of heads at
00:00:30.470 --> 00:00:32.549
each one of the tosses.
00:00:32.549 --> 00:00:35.830
We can represent this experiment
by the usual
00:00:35.830 --> 00:00:38.020
sequential tree diagram.
00:00:38.020 --> 00:00:41.490
And the leaves of the tree are
the possible outcomes of the
00:00:41.490 --> 00:00:42.440
experiment.
00:00:42.440 --> 00:00:44.640
So these are the elements
of the sample space.
00:00:44.640 --> 00:00:49.585
And a typical outcome is a
particular sequence of heads
00:00:49.585 --> 00:00:52.190
and tails that has length n.
00:00:52.190 --> 00:00:56.890
In this diagram here, we took
n to be equal to 3.
00:00:56.890 --> 00:01:00.120
We can now define a random
variable associated with this
00:01:00.120 --> 00:01:00.720
experiment.
00:01:00.720 --> 00:01:03.660
Our random variable that we
denote by capital X is the
00:01:03.660 --> 00:01:06.370
number of heads that
are observed.
00:01:06.370 --> 00:01:10.950
So for example, if the outcome
happens to be this one--
00:01:10.950 --> 00:01:15.900
tails, heads, heads-- we have
2 heads that are observed.
00:01:15.900 --> 00:01:21.200
And the numerical value of our
random variable is equal to 2.
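As a small illustration of this definition, the random variable can be written as a function that maps an outcome to a number. This is only a sketch: the function name `number_of_heads` and the encoding of an outcome as a string of "H" and "T" are assumptions made here, not part of the lecture.

```python
def number_of_heads(outcome):
    """Value of X for one outcome, given as a string of 'H' and 'T'."""
    return outcome.count("H")


# The outcome tails, heads, heads from the example above.
x = number_of_heads("THH")
print(x)  # 2
```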
00:01:21.200 --> 00:01:24.580
In general, this random
variable, a binomial random
00:01:24.580 --> 00:01:29.500
variable, can be used to model
any kind of situation in which
00:01:29.500 --> 00:01:34.200
we have a fixed number of
independent and
00:01:34.200 --> 00:01:38.160
identical trials, and each trial
can result in success or
00:01:38.160 --> 00:01:42.539
failure, and we have a
probability of success equal
00:01:42.539 --> 00:01:44.880
to some given number, p.
00:01:44.880 --> 00:01:48.440
The number of successes obtained
in these trials is,
00:01:48.440 --> 00:01:51.000
of course, random and
it is modeled by a
00:01:51.000 --> 00:01:53.800
binomial random variable.
00:01:53.800 --> 00:01:57.979
We can now proceed and calculate
the PMF of this
00:01:57.979 --> 00:01:59.220
random variable.
00:01:59.220 --> 00:02:03.140
Instead of calculating the whole
PMF, let us look at just
00:02:03.140 --> 00:02:06.000
one typical entry of the PMF.
00:02:06.000 --> 00:02:09.259
Let's look at this entry, which,
by definition, is the
00:02:09.259 --> 00:02:14.990
probability that our random
variable takes the value of 2.
00:02:14.990 --> 00:02:17.840
Now, the random variable taking
the numerical value of
00:02:17.840 --> 00:02:23.030
2, this is an event that can
happen in three possible ways
00:02:23.030 --> 00:02:25.840
that we can identify in
the sample space.
00:02:25.840 --> 00:02:29.150
We can have 2 heads followed
by a tail.
00:02:29.150 --> 00:02:32.990
We can have heads,
tails, heads.
00:02:32.990 --> 00:02:39.460
Or we can have tails,
heads, heads.
00:02:39.460 --> 00:02:43.150
The probability of this
outcome is p times p
00:02:43.150 --> 00:02:44.690
times (1 minus p).
00:02:44.690 --> 00:02:47.230
So it's p squared times
(1 minus p).
00:02:47.230 --> 00:02:49.660
And the other two outcomes
also have the same
00:02:49.660 --> 00:02:54.050
probability, so the overall
probability is 3 times this.
00:02:54.050 --> 00:02:59.250
This can also be written this
way, since 3 is the same as
00:02:59.250 --> 00:03:00.060
3-choose-2.
00:03:00.060 --> 00:03:03.050
It's the number of ways that you
can choose where the 2 heads
00:03:03.050 --> 00:03:04.930
will be placed
in a sequence of
00:03:04.930 --> 00:03:09.940
3 slots or 3 trials.
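The counting argument for n equal to 3 can be checked by brute force. The sketch below lists every sequence of heads and tails, keeps those with exactly 2 heads, and adds up their probabilities. The helper name `prob_k_heads_by_enumeration` and the value p = 0.3 are illustrative assumptions, not taken from the lecture.

```python
from itertools import product
from math import comb, isclose


def prob_k_heads_by_enumeration(n, k, p):
    """Add up the probabilities of all length-n sequences with exactly k heads."""
    total = 0.0
    for seq in product("HT", repeat=n):
        heads = seq.count("H")
        if heads == k:
            # Independent tosses: multiply p for each head, (1 - p) for each tail.
            total += p ** heads * (1 - p) ** (n - heads)
    return total


# The outcomes in the sample space for n = 3 that contain exactly 2 heads.
matching = ["".join(s) for s in product("HT", repeat=3) if s.count("H") == 2]
print(matching)  # ['HHT', 'HTH', 'THH']

p = 0.3  # illustrative value only
by_enumeration = prob_k_heads_by_enumeration(3, 2, p)
by_formula = comb(3, 2) * p ** 2 * (1 - p)
print(isclose(by_enumeration, by_formula))  # True
```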
00:03:09.940 --> 00:03:15.660
More generally, we have the
familiar binomial formula.
00:03:15.660 --> 00:03:18.530
So this is a formula that
you have already seen.
00:03:18.530 --> 00:03:22.180
It's the probability of
obtaining k successes in a
00:03:22.180 --> 00:03:25.180
sequence of n independent
trials.
00:03:25.180 --> 00:03:29.160
The only thing that is new is
that instead of using the
00:03:29.160 --> 00:03:32.130
traditional probability
notation, now
00:03:32.130 --> 00:03:35.020
we're using PMF notation.
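The formula is shown on the slide and is not read out in the transcript. Written in PMF notation, with the symbol p_X for the PMF of X assumed here, the standard binomial formula is:

```latex
p_X(k) = \mathbf{P}(X = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k},
\qquad k = 0, 1, \ldots, n.
```

For n = 3 and k = 2 this reduces to 3 p^2 (1 - p), the entry calculated above.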
00:03:35.020 --> 00:03:38.750
To get a feel for the binomial
PMF, it's instructive to look
00:03:38.750 --> 00:03:39.890
at some plots.
00:03:39.890 --> 00:03:43.670
So suppose that we toss the coin
three times and that the
00:03:43.670 --> 00:03:46.510
coin tosses are fair, so that
the probability of heads is
00:03:46.510 --> 00:03:48.180
equal to 1/2.
00:03:48.180 --> 00:03:52.930
Then we see that 1 head or 2
heads are equally likely, and
00:03:52.930 --> 00:03:59.100
they are more likely than the
outcome of 0 or 3 heads.
00:03:59.100 --> 00:04:02.860
Now, if we change the number of
tosses and toss the coin 10
00:04:02.860 --> 00:04:07.260
times, then we see that
the most likely result
00:04:07.260 --> 00:04:10.050
is to have 5 heads.
00:04:10.050 --> 00:04:13.660
And then as we move away from
5 in either direction, the
00:04:13.660 --> 00:04:15.920
probability of that
particular result
00:04:15.920 --> 00:04:18.250
becomes smaller and smaller.
00:04:18.250 --> 00:04:22.840
Now, if we toss the coin many
times, let's say 100 times,
00:04:22.840 --> 00:04:28.250
and the coin is still fair, then
we see that the number of
00:04:28.250 --> 00:04:32.350
heads that we're going to get is
most likely to be somewhere
00:04:32.350 --> 00:04:36.810
in this range between,
let's say, 35 and 65.
00:04:36.810 --> 00:04:40.750
These are values of the random
variable that have some
00:04:40.750 --> 00:04:43.820
noticeable or high
probabilities.
00:04:43.820 --> 00:04:48.710
But anything below 30 or
anything above 70 is extremely
00:04:48.710 --> 00:04:51.240
unlikely to occur.
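The plots themselves are not part of the transcript, but the numbers behind them can be recomputed from the binomial formula. The sketch below does this for the fair coin with 3, 10 and 100 tosses. The function name `binomial_pmf` is an assumption made for this example.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list [P(X = 0), ..., P(X = n)] for a binomial(n, p) variable."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


# n = 3: 1 head and 2 heads are equally likely, and more likely than 0 or 3.
pmf3 = binomial_pmf(3, 0.5)
print(pmf3)  # [0.125, 0.375, 0.375, 0.125]

# n = 10: the single most likely number of heads.
pmf10 = binomial_pmf(10, 0.5)
mode10 = max(range(11), key=lambda k: pmf10[k])
print(mode10)  # 5

# n = 100: probability of landing between 35 and 65, and of the far tails.
pmf100 = binomial_pmf(100, 0.5)
mass_35_to_65 = sum(pmf100[35:66])
far_tails = sum(pmf100[:30]) + sum(pmf100[71:])
print(mass_35_to_65, far_tails)
```

The last line prints a value close to 1 for the range from 35 to 65 and a very small value for fewer than 30 or more than 70 heads, which matches what the plot shows.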
00:04:51.240 --> 00:04:55.360
We can generate similar plots
for unfair coins.
00:04:55.360 --> 00:04:58.360
So suppose now that our coin is
biased and the probability
00:04:58.360 --> 00:05:01.610
of heads is quite low,
equal to 0.2.
00:05:01.610 --> 00:05:05.680
In that case, with 3 tosses, the most likely
result is that we're going to
00:05:05.680 --> 00:05:07.640
see 0 heads.
00:05:07.640 --> 00:05:10.740
And then, there's smaller and
smaller probability of
00:05:10.740 --> 00:05:12.440
obtaining more heads.
00:05:12.440 --> 00:05:16.740
On the other hand, if we toss
the coin 10 times, we expect
00:05:16.740 --> 00:05:21.320
to see a few heads, not a very
large number, but some number
00:05:21.320 --> 00:05:25.210
of heads between, let's
say, 0 and 4.
00:05:25.210 --> 00:05:30.210
Finally, if we toss the coin 100
times and we take the coin
00:05:30.210 --> 00:05:35.690
to be an extremely unfair one, with p equal to 0.1,
what do we expect to see?
00:05:35.690 --> 00:05:39.220
If we think of probabilities as
frequencies, we expect to
00:05:39.220 --> 00:05:43.010
see heads roughly
10% of the time.
00:05:43.010 --> 00:05:48.190
So, given that n is 100, we
expect to see about 10 heads.
00:05:48.190 --> 00:05:51.460
But when we say about
10 heads, we do not
00:05:51.460 --> 00:05:53.659
mean exactly 10 heads.
00:05:53.659 --> 00:05:57.480
About 10 heads, in this
instance, as this plot tells
00:05:57.480 --> 00:06:02.440
us, is any number more or less
in the range from 0 to 20.
00:06:02.440 --> 00:06:05.650
But anything above 20 is
extremely unlikely.
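The claims about the biased coins can be checked in the same way. This is a sketch under two assumptions that the transcript does not state outright: the first biased plot uses 3 tosses, and the extremely unfair coin has probability of heads equal to 0.1, read off from "10% of the time". The function name `binomial_pmf` is also an assumption.

```python
from math import comb


def binomial_pmf(n, p):
    """Return the list [P(X = 0), ..., P(X = n)] for a binomial(n, p) variable."""
    return [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


# Assumed n = 3, p = 0.2: the most likely number of heads.
pmf_a = binomial_pmf(3, 0.2)
mode_a = max(range(4), key=lambda k: pmf_a[k])
print(mode_a)  # 0

# n = 10, p = 0.2: probability of seeing between 0 and 4 heads.
pmf_b = binomial_pmf(10, 0.2)
mass_0_to_4 = sum(pmf_b[:5])

# n = 100, assumed p = 0.1: probability of more than 20 heads.
pmf_c = binomial_pmf(100, 0.1)
above_20 = sum(pmf_c[21:])
print(mass_0_to_4, above_20)
```

The probability of 0 to 4 heads in 10 tosses comes out well above 0.9, and the probability of more than 20 heads in 100 tosses comes out well below 0.01.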