1
00:00:00,480 --> 00:00:03,600
We will now work with a
geometric random variable and
2
00:00:03,600 --> 00:00:06,830
put to use our understanding
of conditional PMFs and
3
00:00:06,830 --> 00:00:08,850
conditional expectations.
4
00:00:08,850 --> 00:00:11,810
Remember that a geometric random
variable corresponds to
5
00:00:11,810 --> 00:00:14,580
the number of independent
coin tosses until
6
00:00:14,580 --> 00:00:16,550
the first head occurs.
7
00:00:16,550 --> 00:00:20,445
And here p is a parameter
that describes the coin.
8
00:00:20,445 --> 00:00:24,600
It is the probability of heads
at each coin toss.
9
00:00:24,600 --> 00:00:28,430
We have already seen the formula
for the geometric PMF
10
00:00:28,430 --> 00:00:30,370
and the corresponding plot.
11
00:00:30,370 --> 00:00:34,380
We will now add one very
important property which is
12
00:00:34,380 --> 00:00:37,140
usually called Memorylessness.
13
00:00:37,140 --> 00:00:40,080
Ultimately, this property has
to do with the fact that
14
00:00:40,080 --> 00:00:43,850
independent coin tosses do
not have any memory.
15
00:00:43,850 --> 00:00:48,490
Past coin tosses do not affect
future coin tosses.
16
00:00:48,490 --> 00:00:52,290
So consider a coin-tossing
experiment with independent
17
00:00:52,290 --> 00:00:57,050
tosses and let X be the
number of tosses
18
00:00:57,050 --> 00:01:00,140
until the first heads.
19
00:01:00,140 --> 00:01:04,209
And X is a geometric
with parameter p.
20
00:01:04,209 --> 00:01:08,270
Suppose that you show up a
little after the experiment
21
00:01:08,270 --> 00:01:10,030
has started.
22
00:01:10,030 --> 00:01:15,000
And you're told that there was
so far just one coin toss.
23
00:01:15,000 --> 00:01:18,460
And that this coin toss
resulted in tails.
24
00:01:18,460 --> 00:01:23,940
Now you have to take over and
carry out the remaining tosses
25
00:01:23,940 --> 00:01:26,130
until heads are observed.
26
00:01:26,130 --> 00:01:29,980
What should your model
be about the future?
27
00:01:29,980 --> 00:01:34,440
Well, you will be making
independent coin tosses until
28
00:01:34,440 --> 00:01:36,400
the first heads.
29
00:01:36,400 --> 00:01:41,560
So the number of such tosses
will be a random variable,
30
00:01:41,560 --> 00:01:45,530
which is geometric
with parameter p.
31
00:01:45,530 --> 00:01:47,180
So this duration--
32
00:01:47,180 --> 00:01:49,479
as far as you are concerned--
33
00:01:49,479 --> 00:01:54,430
is geometric with parameter p.
34
00:01:54,430 --> 00:01:59,610
Therefore, the number of
remaining coin tosses starting
35
00:01:59,610 --> 00:02:00,920
from here--
36
00:02:00,920 --> 00:02:04,200
given that the first
toss was tails--
37
00:02:04,200 --> 00:02:07,990
has the same geometric
distribution as the original
38
00:02:07,990 --> 00:02:10,370
random variable X.
39
00:02:10,370 --> 00:02:13,290
This is the Memorylessness
property.
40
00:02:13,290 --> 00:02:18,100
Now, since X is the total number
of coin tosses and
41
00:02:18,100 --> 00:02:22,890
since your coin tosses were
all of them except for the
42
00:02:22,890 --> 00:02:27,190
first one, the random variable
that you are concerned
43
00:02:27,190 --> 00:02:30,430
with is X minus 1.
44
00:02:30,430 --> 00:02:33,030
And so the geometric
distribution that you are
45
00:02:33,030 --> 00:02:39,180
seeing here is the conditional
distribution of X minus 1
46
00:02:39,180 --> 00:02:43,440
given that the first toss
resulted in tails, which is
47
00:02:43,440 --> 00:02:49,520
the same as the event that X
is strictly larger than 1.
48
00:02:49,520 --> 00:02:53,210
So the statement that we have
been making is the following
49
00:02:53,210 --> 00:02:55,460
in more mathematical
language--
50
00:02:55,460 --> 00:03:00,930
that conditioned on X being
larger than 1, the random
51
00:03:00,930 --> 00:03:04,370
variable X minus 1, which is the
remaining number of coin
52
00:03:04,370 --> 00:03:10,100
tosses, has a geometric
distribution with parameter p.
53
00:03:10,100 --> 00:03:11,830
Let us now give a more precise,
54
00:03:11,830 --> 00:03:13,150
mathematical argument.
55
00:03:13,150 --> 00:03:15,620
But first, for a special case.
56
00:03:15,620 --> 00:03:19,210
Let's us look at the conditional
probabilities for
57
00:03:19,210 --> 00:03:22,430
the random variable X minus 1.
58
00:03:22,430 --> 00:03:25,560
And calculate, for example, the
conditional probability
59
00:03:25,560 --> 00:03:32,610
that X minus 1 is equal to 3,
given that X is larger than 1.
60
00:03:32,610 --> 00:03:34,780
Which is the same as saying
that the first
61
00:03:34,780 --> 00:03:36,730
toss resulted in tails.
62
00:03:39,460 --> 00:03:41,770
Now, the first toss
resulted in tails.
63
00:03:41,770 --> 00:03:45,480
This is the probability that
you will need three more
64
00:03:45,480 --> 00:03:48,970
tosses until you
observe heads.
65
00:03:48,970 --> 00:03:52,410
Needing three more tosses until
you observe heads is the
66
00:03:52,410 --> 00:03:58,079
event that you had tails in the
second toss, tails in the
67
00:03:58,079 --> 00:04:03,660
third toss, and heads
in the fourth toss.
68
00:04:03,660 --> 00:04:07,170
And all that is conditioned
on the first toss
69
00:04:07,170 --> 00:04:10,750
having resulted in tails.
70
00:04:10,750 --> 00:04:14,900
However, the different coin
tosses are independent.
71
00:04:14,900 --> 00:04:18,810
So the conditional
probabilities, given the event
72
00:04:18,810 --> 00:04:22,670
that the first toss was tails
should be the same as the
73
00:04:22,670 --> 00:04:25,050
unconditional probabilities.
74
00:04:25,050 --> 00:04:28,580
The first toss does not change
our beliefs about the
75
00:04:28,580 --> 00:04:34,470
probabilities associated with
the remaining tosses.
76
00:04:34,470 --> 00:04:36,760
Now, this unconditional
77
00:04:36,760 --> 00:04:40,190
probability is easy to calculate.
78
00:04:40,190 --> 00:04:43,550
It is 1 minus p squared--
79
00:04:43,550 --> 00:04:45,900
because we have two
tails in a row--
80
00:04:45,900 --> 00:04:48,920
times p.
81
00:04:48,920 --> 00:04:53,170
Now, we observe that this
quantity here is the
82
00:04:53,170 --> 00:04:57,670
probability that a geometric
random variable takes the
83
00:04:57,670 --> 00:05:00,580
value of three.
84
00:05:00,580 --> 00:05:03,240
Here what have we calculated?
85
00:05:03,240 --> 00:05:09,340
We have calculated the PMF of
the random variable X minus 1
86
00:05:09,340 --> 00:05:14,610
in a conditional universe where
X is larger than 1.
87
00:05:14,610 --> 00:05:18,510
And we evaluated it
for a value of 3.
88
00:05:18,510 --> 00:05:21,210
The probability that our random
variable X minus 1
89
00:05:21,210 --> 00:05:23,200
takes the value of 3.
90
00:05:23,200 --> 00:05:28,410
So what we have shown is that
this conditional PMF is the
91
00:05:28,410 --> 00:05:32,900
same as the unconditional PMF.
92
00:05:32,900 --> 00:05:36,110
Now, there is nothing special
about the number 3.
93
00:05:36,110 --> 00:05:40,600
You can generalize this argument
and establish that
94
00:05:40,600 --> 00:05:45,900
the conditional probability of
X minus 1 given that X is
95
00:05:45,900 --> 00:05:51,090
strictly larger than one, for
any particular k, is the same
96
00:05:51,090 --> 00:05:54,300
as the corresponding probability
for the random
97
00:05:54,300 --> 00:06:01,600
variable X, which is given
by the geometric PMF.
98
00:06:01,600 --> 00:06:07,715
Finally, there is nothing
special about the value of 1
99
00:06:07,715 --> 00:06:10,980
that we're using here.
100
00:06:10,980 --> 00:06:16,500
In fact, we can generalize
and argue as follows--
101
00:06:16,500 --> 00:06:20,410
suppose that I tell you that X
is strictly larger than n.
102
00:06:20,410 --> 00:06:24,880
That is, the first n tosses
resulted in tails.
103
00:06:24,880 --> 00:06:29,160
Once more, these past tosses
were wasted but have no effect
104
00:06:29,160 --> 00:06:30,200
on the future.
105
00:06:30,200 --> 00:06:34,909
So the conditional PMF of the
remaining number of tosses
106
00:06:34,909 --> 00:06:38,090
should be, again, the same.
107
00:06:38,090 --> 00:06:42,200
Therefore, the statement we're
making is that this geometric
108
00:06:42,200 --> 00:06:49,330
PMF will also be the PMF of
X minus n, given that X is
109
00:06:49,330 --> 00:06:53,659
strictly larger than n, and this
will be true no matter
110
00:06:53,659 --> 00:06:58,570
what argument we plug-in
into the PMF.
111
00:06:58,570 --> 00:07:01,640
We will now exploit the
Memorylessness property of the
112
00:07:01,640 --> 00:07:05,490
geometric PMF and use it
together with the total
113
00:07:05,490 --> 00:07:08,690
expectation theorem to
calculate the mean or
114
00:07:08,690 --> 00:07:11,620
expectation of the
geometric PMF.
115
00:07:11,620 --> 00:07:14,050
If we wanted to calculate the
expected value of the
116
00:07:14,050 --> 00:07:17,590
geometric using the definition
of the expectation, we would
117
00:07:17,590 --> 00:07:21,280
have to calculate this infinite
sum here, which is
118
00:07:21,280 --> 00:07:23,030
quite difficult.
119
00:07:23,030 --> 00:07:26,710
Instead, we're going to
use a certain trick.
120
00:07:26,710 --> 00:07:28,420
The trick is the following--
121
00:07:28,420 --> 00:07:32,120
to break down the expected value
calculation into two
122
00:07:32,120 --> 00:07:33,820
different scenarios.
123
00:07:33,820 --> 00:07:38,210
Under one scenario we obtain
heads in the first toss.
124
00:07:38,210 --> 00:07:42,430
And in that case the
random variable X--
125
00:07:42,430 --> 00:07:44,460
the number of tosses until
the first heads--
126
00:07:44,460 --> 00:07:45,730
is equal to 1.
127
00:07:45,730 --> 00:07:48,480
And this scenario occurs
with probability p.
128
00:07:48,480 --> 00:07:51,470
And we have another scenario
with probability 1 minus p
129
00:07:51,470 --> 00:07:54,580
where we obtain tails
in the first toss.
130
00:07:54,580 --> 00:07:56,830
And in that case, our
random variable is
131
00:07:56,830 --> 00:07:59,400
strictly larger than 1.
132
00:07:59,400 --> 00:08:05,340
Now, the expected value of
X consists of two pieces.
133
00:08:05,340 --> 00:08:08,930
We have a first toss
no matter what.
134
00:08:08,930 --> 00:08:12,990
And then we have the number
of remaining tosses,
135
00:08:12,990 --> 00:08:15,060
which is X minus 1.
136
00:08:15,060 --> 00:08:18,640
So this is true by linearity
of expectations.
137
00:08:18,640 --> 00:08:23,480
Now, the expected value of X
minus 1 consists of two pieces
138
00:08:23,480 --> 00:08:25,770
using the total expectation
theorem.
139
00:08:25,770 --> 00:08:29,300
The probability of the first
scenario times the expected
140
00:08:29,300 --> 00:08:34,450
value of X minus 1 given
that X is equal to 1,
141
00:08:34,450 --> 00:08:36,558
plus 1 minus p--
142
00:08:36,558 --> 00:08:38,970
the probability of the
second scenario--
143
00:08:38,970 --> 00:08:43,549
times the expected value of
X minus 1 given that X
144
00:08:43,549 --> 00:08:45,000
is bigger than 1.
145
00:08:47,660 --> 00:08:51,200
Now, this term here is 0.
146
00:08:51,200 --> 00:08:52,000
Why?
147
00:08:52,000 --> 00:08:55,730
If I tell you that X is equal
to 1, then you're certain
148
00:08:55,730 --> 00:08:58,350
that's X minus 1
is equal to 0.
149
00:08:58,350 --> 00:09:01,770
So this term gives
a 0 contribution.
150
00:09:01,770 --> 00:09:03,760
How about the next term?
151
00:09:03,760 --> 00:09:08,850
We have a 1 minus p here times
this expected value.
152
00:09:08,850 --> 00:09:15,560
Now this random variable,
conditioned on this event, has
153
00:09:15,560 --> 00:09:20,890
the same distribution as an
ordinary, unconditioned
154
00:09:20,890 --> 00:09:22,890
geometric random variable.
155
00:09:22,890 --> 00:09:27,450
So this expectation here must be
the same as the expectation
156
00:09:27,450 --> 00:09:32,050
of an ordinary, unconditioned,
geometric random variable.
157
00:09:32,050 --> 00:09:34,730
And this gives us an equality.
158
00:09:34,730 --> 00:09:38,900
Both sides involve the expected
value of X. But we
159
00:09:38,900 --> 00:09:42,780
can solve this equation for
the expected value.
160
00:09:42,780 --> 00:09:45,610
And we obtain the end result
that the expected
161
00:09:45,610 --> 00:09:49,540
value is 1 over p.
162
00:09:49,540 --> 00:09:53,550
By the way, this answer
makes intuitive sense.
163
00:09:53,550 --> 00:09:58,030
If p is small, this means
that the odds of
164
00:09:58,030 --> 00:10:00,860
seeing heads is small.
165
00:10:00,860 --> 00:10:05,040
Then in that case, we need to
wait longer and longer until
166
00:10:05,040 --> 00:10:07,720
we see heads for
the first time.
167
00:10:07,720 --> 00:10:10,740
Setting aside the specific form
of the answer that we
168
00:10:10,740 --> 00:10:15,510
found, what we have just done
actually illustrates that
169
00:10:15,510 --> 00:10:19,950
fairly difficult calculations
can become very simple if one
170
00:10:19,950 --> 00:10:23,740
breaks down a model or a problem
in a clever way.
171
00:10:23,740 --> 00:10:26,450
This is going to be a recurring
theme throughout
172
00:10:26,450 --> 00:10:27,700
this class.