1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:23,130
10
00:00:23,130 --> 00:00:25,940
PROFESSOR: So today's agenda
is to say a few more things
11
00:00:25,940 --> 00:00:28,050
about continuous random
variables.
12
00:00:28,050 --> 00:00:32,049
Mainly we're going to talk a
little bit about inference.
13
00:00:32,049 --> 00:00:35,080
This is a topic that we're going
to revisit at the end of
14
00:00:35,080 --> 00:00:36,390
the semester.
15
00:00:36,390 --> 00:00:38,070
But there's a few things
that we can
16
00:00:38,070 --> 00:00:40,180
already say at this point.
17
00:00:40,180 --> 00:00:44,060
And then the new topic for
today is the subject of
18
00:00:44,060 --> 00:00:45,880
derived distributions.
19
00:00:45,880 --> 00:00:48,140
Basically if you know the
distribution of one random
20
00:00:48,140 --> 00:00:50,230
variable, and you have a
function of that random
21
00:00:50,230 --> 00:00:52,010
variable, how to find a
22
00:00:52,010 --> 00:00:54,840
distribution for that function.
23
00:00:54,840 --> 00:00:58,180
And it's a fairly mechanical
skill, but that's an important
24
00:00:58,180 --> 00:01:00,740
one, so we're going
to go through it.
25
00:01:00,740 --> 00:01:02,200
So let's see where we stand.
26
00:01:02,200 --> 00:01:03,540
Here is the big picture.
27
00:01:03,540 --> 00:01:06,720
That's all we have
done so far.
28
00:01:06,720 --> 00:01:09,460
We have talked about discrete
random variables, which we
29
00:01:09,460 --> 00:01:11,970
described by probability
mass function.
30
00:01:11,970 --> 00:01:14,900
So if we have multiple random
variables, we describe them
31
00:01:14,900 --> 00:01:16,760
with a joint
mass function.
32
00:01:16,760 --> 00:01:19,810
And then we define conditional
probabilities, or conditional
33
00:01:19,810 --> 00:01:24,310
PMFs, and the three are related
according to this
34
00:01:24,310 --> 00:01:27,040
formula, which is, you can
think of it either as the
35
00:01:27,040 --> 00:01:29,300
definition of conditional
probability.
36
00:01:29,300 --> 00:01:32,170
Or as the multiplication rule,
the probability of two things
37
00:01:32,170 --> 00:01:35,870
happening is the product of the
probabilities of the first
38
00:01:35,870 --> 00:01:38,200
thing happening, and then the
second happening, given that
39
00:01:38,200 --> 00:01:39,860
the first has happened.
40
00:01:39,860 --> 00:01:42,830
There's another relation between
this, which is the
41
00:01:42,830 --> 00:01:46,360
probability of x occurring, is
the sum of the different
42
00:01:46,360 --> 00:01:50,560
probabilities of the different
ways that x may occur, which
43
00:01:50,560 --> 00:01:53,700
is in conjunction with different
values of y.
44
00:01:53,700 --> 00:01:57,730
And there's an analog of all
that in the continuous world,
45
00:01:57,730 --> 00:02:02,430
where all you do is to replace
p's by f's, and replace sums
46
00:02:02,430 --> 00:02:03,340
by integrals.
47
00:02:03,340 --> 00:02:05,620
So the formulas all
look the same.
48
00:02:05,620 --> 00:02:09,120
The interpretations are a little
more subtle, so the f's
49
00:02:09,120 --> 00:02:11,720
are not probabilities, they're
probability densities.
50
00:02:11,720 --> 00:02:16,010
So they're probabilities per
unit length, or in the case of
51
00:02:16,010 --> 00:02:20,290
joint PDFs, these are
probabilities per unit area.
52
00:02:20,290 --> 00:02:22,690
So they're densities
of some sort.
53
00:02:22,690 --> 00:02:26,020
Probably the more subtle concept
to understand is what the
54
00:02:26,020 --> 00:02:29,250
conditional density
really is.
55
00:02:29,250 --> 00:02:30,590
In some sense, it's simple.
56
00:02:30,590 --> 00:02:34,900
It's just the density of X in
a world where you have been
57
00:02:34,900 --> 00:02:40,290
told the value of the random
variable Y. It's a function
58
00:02:40,290 --> 00:02:44,510
that has two arguments, but the
best way to think about it
59
00:02:44,510 --> 00:02:47,050
is to say that we fixed y.
60
00:02:47,050 --> 00:02:50,980
We're told the value of the
random variable Y, and we look
61
00:02:50,980 --> 00:02:52,930
at it as a function of x.
62
00:02:52,930 --> 00:02:56,150
So as a function of x, the
denominator is a constant, and
63
00:02:56,150 --> 00:02:59,650
it just looks like the
joint density
64
00:02:59,650 --> 00:03:01,620
when we keep y fixed.
65
00:03:01,620 --> 00:03:05,570
So it's really a function of
one argument, just the
66
00:03:05,570 --> 00:03:06,870
argument x.
67
00:03:06,870 --> 00:03:10,080
And it has the same shape as the
joint density when you
68
00:03:10,080 --> 00:03:11,720
take that slice of it.
69
00:03:11,720 --> 00:03:17,570
So conditional PDFs are just
slices of joint PDFs.
70
00:03:17,570 --> 00:03:20,810
There's a bunch of concepts,
expectations, variances,
71
00:03:20,810 --> 00:03:23,790
cumulative distribution
functions that apply equally
72
00:03:23,790 --> 00:03:26,260
well to both universes
of discrete and
73
00:03:26,260 --> 00:03:28,800
continuous random variables.
74
00:03:28,800 --> 00:03:31,330
So why is probability useful?
75
00:03:31,330 --> 00:03:36,170
Probability is useful because,
among other things, we use it
76
00:03:36,170 --> 00:03:38,420
to make sense of the
world around us.
77
00:03:38,420 --> 00:03:41,870
We use it to make inferences
about things that we do not
78
00:03:41,870 --> 00:03:43,280
see directly.
79
00:03:43,280 --> 00:03:45,570
And this is done in a
very simple manner
80
00:03:45,570 --> 00:03:46,840
using the Bayes rule.
81
00:03:46,840 --> 00:03:49,730
We've already seen some of that,
and now we're going to
82
00:03:49,730 --> 00:03:55,070
revisit it with a bunch of
different variations.
83
00:03:55,070 --> 00:03:58,240
And the variations come because
sometimes our random
84
00:03:58,240 --> 00:04:01,040
variables are discrete, sometimes
they're continuous,
85
00:04:01,040 --> 00:04:04,390
or we can have a combination
of the two.
86
00:04:04,390 --> 00:04:08,170
So the big picture is that
there's some unknown random
87
00:04:08,170 --> 00:04:11,660
variable out there, and we
know the distribution of that
88
00:04:11,660 --> 00:04:12,550
random variable.
89
00:04:12,550 --> 00:04:16,360
And in the discrete case, it's
going to be given by a PMF.
90
00:04:16,360 --> 00:04:20,269
In the continuous case,
it's given by a PDF.
91
00:04:20,269 --> 00:04:24,060
Then we have some phenomenon,
some noisy phenomenon or some
92
00:04:24,060 --> 00:04:28,380
measuring device, and that
measuring device produces
93
00:04:28,380 --> 00:04:31,260
an observable random variable Y.
94
00:04:31,260 --> 00:04:34,930
We don't know what x is, but we
have some beliefs about how
95
00:04:34,930 --> 00:04:36,310
X is distributed.
96
00:04:36,310 --> 00:04:39,450
We observe the random variable
Y. We need a
97
00:04:39,450 --> 00:04:41,300
model of this box.
98
00:04:41,300 --> 00:04:46,170
And the model of that box is
going to be either a PMF, for
99
00:04:46,170 --> 00:04:52,565
the random variable Y. And that
model tells us, if the
100
00:04:52,565 --> 00:04:57,080
true state of the world is X,
how do we expect Y to be
101
00:04:57,080 --> 00:04:58,520
distributed?
102
00:04:58,520 --> 00:05:01,610
That's for the case where
Y is discrete.
103
00:05:01,610 --> 00:05:06,350
If Y is continuous, you might
instead have a density
104
00:05:06,350 --> 00:05:10,820
for Y, or something
of that form.
105
00:05:10,820 --> 00:05:13,980
So in either case, this
should be a function
106
00:05:13,980 --> 00:05:15,520
that's known to us.
107
00:05:15,520 --> 00:05:18,370
This is our model of the
measuring device.
108
00:05:18,370 --> 00:05:20,950
And now having observed
y, we want to make
109
00:05:20,950 --> 00:05:22,680
inferences about x.
110
00:05:22,680 --> 00:05:25,140
What does it mean to
make inferences?
111
00:05:25,140 --> 00:05:29,880
Well the most complete answer to
the inference problem is to
112
00:05:29,880 --> 00:05:32,380
tell me the probability
distribution
113
00:05:32,380 --> 00:05:34,830
of the unknown quantity.
114
00:05:34,830 --> 00:05:36,900
But when I say the probability
distribution, I
115
00:05:36,900 --> 00:05:38,540
don't mean this one.
116
00:05:38,540 --> 00:05:41,280
I mean the probability
distribution that takes into
117
00:05:41,280 --> 00:05:43,760
account the measurements
that you got.
118
00:05:43,760 --> 00:05:48,270
So the output of an inference
problem is to come up with the
119
00:05:48,270 --> 00:05:59,830
distribution of X, the unknown
quantity, given what we have
120
00:05:59,830 --> 00:06:00,980
already observed.
121
00:06:00,980 --> 00:06:04,110
And in the discrete case, it
would be an object like that.
122
00:06:04,110 --> 00:06:08,920
If X is continuous, it would
be an object of this kind.
123
00:06:08,920 --> 00:06:13,340
124
00:06:13,340 --> 00:06:18,080
OK, so we're given conditional
probabilities of this type,
125
00:06:18,080 --> 00:06:21,240
and we want to get conditional
distributions of the opposite
126
00:06:21,240 --> 00:06:23,280
type where the order of the
127
00:06:23,280 --> 00:06:25,580
conditioning is being reversed.
128
00:06:25,580 --> 00:06:28,980
So the starting point
is always a formula
129
00:06:28,980 --> 00:06:30,810
such as this one.
130
00:06:30,810 --> 00:06:33,670
The probability of x happening,
and then y
131
00:06:33,670 --> 00:06:36,280
happening given that
x happens.
132
00:06:36,280 --> 00:06:40,910
This is the probability that
a particular x and y happen
133
00:06:40,910 --> 00:06:42,370
simultaneously.
134
00:06:42,370 --> 00:06:47,240
But this is also equal to the
probability that y happens,
135
00:06:47,240 --> 00:06:50,377
and then that x happens, given
that y has happened.
136
00:06:50,377 --> 00:06:53,060
137
00:06:53,060 --> 00:06:57,140
And you take this expression
and send one term to the
138
00:06:57,140 --> 00:07:00,950
denominator of the other side,
and this gives us the Bayes
139
00:07:00,950 --> 00:07:03,180
rule for the discrete case.
140
00:07:03,180 --> 00:07:05,550
Which is this one that you have
already seen, and you
141
00:07:05,550 --> 00:07:07,200
have played with it.
142
00:07:07,200 --> 00:07:10,720
So this is what the formula
looks like in
143
00:07:10,720 --> 00:07:12,030
the discrete case.
144
00:07:12,030 --> 00:07:14,570
And the typical example where
both random variables are
145
00:07:14,570 --> 00:07:18,000
discrete is the one we discussed
some time ago.
146
00:07:18,000 --> 00:07:20,720
X is, let's say, a binary
variable: whether an
147
00:07:20,720 --> 00:07:22,960
airplane is present
up there or not.
148
00:07:22,960 --> 00:07:27,790
Y is a discrete measurement, for
example, whether our radar
149
00:07:27,790 --> 00:07:30,040
beeped or it didn't beep.
150
00:07:30,040 --> 00:07:33,860
And we make inferences and
calculate the probability that
151
00:07:33,860 --> 00:07:37,860
the plane is there, or the
probability that the plane is
152
00:07:37,860 --> 00:07:41,000
not there, given the measurement
that we have made.
153
00:07:41,000 --> 00:07:43,940
And of course X and Y do not
need to be just binary.
154
00:07:43,940 --> 00:07:47,480
They could be more general
discrete random variables.
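The radar calculation just described can be sketched numerically. This is only an illustration of the discrete Bayes rule; the prior and detection probabilities below are made-up numbers, not values from the lecture.

```python
# Discrete Bayes rule: p_{X|Y}(x|y) = p_X(x) p_{Y|X}(y|x) / p_Y(y),
# with X = 1 meaning a plane is present and Y = beep / no beep.
# All numerical values are hypothetical, chosen only for illustration.

p_X = {1: 0.05, 0: 0.95}            # prior belief that a plane is present
p_Y_given_X = {1: 0.99, 0: 0.10}    # probability the radar beeps, given X = x

def posterior(beeped=True):
    # Likelihood of the observation under each hypothesis x
    lik = {x: p_Y_given_X[x] if beeped else 1 - p_Y_given_X[x] for x in p_X}
    # p_Y: total probability of the observation, summed over the ways it can occur
    p_y = sum(p_X[x] * lik[x] for x in p_X)
    return {x: p_X[x] * lik[x] / p_y for x in p_X}

post = posterior(beeped=True)       # posterior PMF of X after hearing a beep
```

A beep raises the posterior probability of a plane well above the 5 percent prior, which is exactly the reversal of conditioning that the formula carries out.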
155
00:07:47,480 --> 00:07:50,900
So how does the story change
in the continuous case?
156
00:07:50,900 --> 00:07:53,290
First, what's a possible
application of
157
00:07:53,290 --> 00:07:54,570
the continuous case?
158
00:07:54,570 --> 00:07:59,620
Well, think of X as being some
signal that takes values over
159
00:07:59,620 --> 00:08:00,630
a continuous range.
160
00:08:00,630 --> 00:08:04,730
Let's say X is the current
through a resistor.
161
00:08:04,730 --> 00:08:07,530
And then you have some measuring
device that measures
162
00:08:07,530 --> 00:08:11,530
currents, but that device is
noisy, it gets hit, let's say
163
00:08:11,530 --> 00:08:13,640
for example, by Gaussian
noise.
164
00:08:13,640 --> 00:08:18,340
And the Y that you observe is a
noisy version of X. But your
165
00:08:18,340 --> 00:08:22,410
instruments are analog, so
you measure things on
166
00:08:22,410 --> 00:08:24,750
a continuous scale.
167
00:08:24,750 --> 00:08:26,250
What are you going to
do in that case?
168
00:08:26,250 --> 00:08:29,920
Well the inference problem, the
output of the inference
169
00:08:29,920 --> 00:08:33,360
problem, is going to be the
conditional distribution of X.
170
00:08:33,360 --> 00:08:38,950
What do you think your current
is based on a particular value
171
00:08:38,950 --> 00:08:40,870
of Y that you have observed?
172
00:08:40,870 --> 00:08:44,480
So the output of our inference
problem is, given the specific
173
00:08:44,480 --> 00:08:48,560
value of Y, to calculate this
entire function as a function
174
00:08:48,560 --> 00:08:51,050
of x, and then go and plot it.
175
00:08:51,050 --> 00:08:53,570
How do we calculate it?
176
00:08:53,570 --> 00:08:57,410
You go through the same
calculation as in the discrete
177
00:08:57,410 --> 00:09:01,590
case, except that all of the
p's get replaced by f's.
178
00:09:01,590 --> 00:09:04,630
In the continuous case, it's
equally true that the joint
179
00:09:04,630 --> 00:09:07,790
density is the product of the
marginal density with the
180
00:09:07,790 --> 00:09:09,220
conditional density.
181
00:09:09,220 --> 00:09:11,400
So the formula is still
valid with just a
182
00:09:11,400 --> 00:09:13,160
little change of notation.
183
00:09:13,160 --> 00:09:16,480
So we end up with the same
formula here, except that we
184
00:09:16,480 --> 00:09:18,990
replace p's with f's.
185
00:09:18,990 --> 00:09:23,240
So all of these functions
are known to us.
186
00:09:23,240 --> 00:09:25,500
We have formulas for them.
187
00:09:25,500 --> 00:09:29,400
We fix a specific value of y,
we plug it in, so we're left
188
00:09:29,400 --> 00:09:30,640
with a function of x.
189
00:09:30,640 --> 00:09:33,420
And that gives us the posterior
distribution.
190
00:09:33,420 --> 00:09:38,130
Actually there's also a
denominator term that's not
191
00:09:38,130 --> 00:09:42,340
necessarily given to us, but we
can always calculate it if
192
00:09:42,340 --> 00:09:45,650
we have the marginal of X,
and we have the model for the
193
00:09:45,650 --> 00:09:47,250
measuring device.
194
00:09:47,250 --> 00:09:50,960
Then we can always find the
marginal distribution of Y. So
195
00:09:50,960 --> 00:09:54,630
this quantity, that number, is
in general a known one, as
196
00:09:54,630 --> 00:09:58,490
well, and doesn't give
us any problems.
197
00:09:58,490 --> 00:10:03,140
So to complicate things a little
bit, we can also look
198
00:10:03,140 --> 00:10:07,610
into situations where our two
random variables are of
199
00:10:07,610 --> 00:10:09,080
different kinds.
200
00:10:09,080 --> 00:10:12,290
For example, one random variable
could be discrete,
201
00:10:12,290 --> 00:10:15,280
and the other might
be continuous.
202
00:10:15,280 --> 00:10:17,340
And there's two versions.
203
00:10:17,340 --> 00:10:22,320
Here one version is when X is
discrete, but Y is continuous.
204
00:10:22,320 --> 00:10:25,130
What's an example of this?
205
00:10:25,130 --> 00:10:30,690
Well suppose that I send a
single bit of information so
206
00:10:30,690 --> 00:10:34,620
my X is 0 or 1.
207
00:10:34,620 --> 00:10:39,710
And what I measure is Y,
which is X plus, let's
208
00:10:39,710 --> 00:10:42,360
say, Gaussian noise.
209
00:10:42,360 --> 00:10:48,960
210
00:10:48,960 --> 00:10:52,550
This is the standard example
that shows up in any textbook
211
00:10:52,550 --> 00:10:55,220
on communication, or
signal processing.
212
00:10:55,220 --> 00:10:58,530
You send a single bit, but what
you observe is a noisy
213
00:10:58,530 --> 00:11:02,120
version of that bit.
214
00:11:02,120 --> 00:11:05,150
You start with a model
of your x's.
215
00:11:05,150 --> 00:11:07,610
These would be your prior
probabilities.
216
00:11:07,610 --> 00:11:11,670
For example, you might
believe that 0 and 1 are
217
00:11:11,670 --> 00:11:16,250
equally likely, in which case
your PMF gives equal weight to
218
00:11:16,250 --> 00:11:18,320
the two possible values.
219
00:11:18,320 --> 00:11:21,840
And then we need a model of
our measuring device.
220
00:11:21,840 --> 00:11:23,990
This is one specific model.
221
00:11:23,990 --> 00:11:28,090
The general model would have
a shape such as follows.
222
00:11:28,090 --> 00:11:37,560
Y has a distribution,
that is, a density.
223
00:11:37,560 --> 00:11:41,590
And that density, however,
depends on the value of X.
224
00:11:41,590 --> 00:11:46,170
So when x is 0, we might get
a density of this kind.
225
00:11:46,170 --> 00:11:50,250
And when x is 1, we might
get the density
226
00:11:50,250 --> 00:11:52,210
of a different kind.
227
00:11:52,210 --> 00:11:57,010
So these are the conditional
densities of y in a universe
228
00:11:57,010 --> 00:11:59,730
that's specified by a particular
value of x.
229
00:11:59,730 --> 00:12:04,660
230
00:12:04,660 --> 00:12:09,040
And then we go ahead and
do our inference.
231
00:12:09,040 --> 00:12:13,520
OK, what's the right formula
for doing this inference?
232
00:12:13,520 --> 00:12:18,270
We need a formula that's sort of
an analog of this one, but
233
00:12:18,270 --> 00:12:22,210
applies to the case where we
have two random variables of
234
00:12:22,210 --> 00:12:23,670
different kinds.
235
00:12:23,670 --> 00:12:29,370
So let me just redo this
calculation here.
236
00:12:29,370 --> 00:12:33,250
Except that I'm not going to
have a probability of taking
237
00:12:33,250 --> 00:12:34,340
specific values.
238
00:12:34,340 --> 00:12:36,800
It will have to be something
a little different.
239
00:12:36,800 --> 00:12:39,250
So here's how it goes.
240
00:12:39,250 --> 00:12:44,340
Let's look at the probability
that X takes a specific value
241
00:12:44,340 --> 00:12:47,510
that makes sense in the discrete
case, but for the
242
00:12:47,510 --> 00:12:50,040
continuous random variable,
let's look at the probability
243
00:12:50,040 --> 00:12:53,480
that it takes values in
some little interval.
244
00:12:53,480 --> 00:12:55,940
And now this probability of
two things happening, I'm
245
00:12:55,940 --> 00:12:57,520
going to write it
as a product.
246
00:12:57,520 --> 00:12:59,450
And I'm going to write
this as a product in
247
00:12:59,450 --> 00:13:01,350
two different ways.
248
00:13:01,350 --> 00:13:09,360
So one way is to say that this
is the probability that X
249
00:13:09,360 --> 00:13:13,670
takes that value and then given
that X takes that value,
250
00:13:13,670 --> 00:13:19,310
the probability that Y falls
inside that interval.
251
00:13:19,310 --> 00:13:21,810
So this is our usual
multiplication rule for
252
00:13:21,810 --> 00:13:25,330
multiplying probabilities, but
I can use the multiplication
253
00:13:25,330 --> 00:13:27,610
rule also in a different way.
254
00:13:27,610 --> 00:13:30,210
It's the probability
that Y falls in
255
00:13:30,210 --> 00:13:33,460
the range of interest.
256
00:13:33,460 --> 00:13:36,990
And then the probability that X
takes the value of interest
257
00:13:36,990 --> 00:13:41,145
given that Y satisfies
the first condition.
258
00:13:41,145 --> 00:13:45,960
259
00:13:45,960 --> 00:13:53,760
So this is something that's
definitely true.
260
00:13:53,760 --> 00:13:57,410
We're just using the
multiplication rule.
261
00:13:57,410 --> 00:14:02,240
And now let's translate it
into PMF and PDF notation.
262
00:14:02,240 --> 00:14:07,130
So the entry up there is the
PMF of X evaluated at x.
263
00:14:07,130 --> 00:14:10,030
The second entry, what is it?
264
00:14:10,030 --> 00:14:12,230
Well probabilities of
little intervals are
265
00:14:12,230 --> 00:14:13,480
given to us by densities.
266
00:14:13,480 --> 00:14:16,010
267
00:14:16,010 --> 00:14:19,160
But we are in the conditional
universe where X takes on a
268
00:14:19,160 --> 00:14:20,430
particular value.
269
00:14:20,430 --> 00:14:27,450
So it's going to be the density
of Y given the value
270
00:14:27,450 --> 00:14:30,210
of X times delta.
271
00:14:30,210 --> 00:14:32,790
So probabilities of little
intervals are given by the
272
00:14:32,790 --> 00:14:36,430
density times the length of
the little interval, but
273
00:14:36,430 --> 00:14:39,390
because we're working in the
conditional universe, it has
274
00:14:39,390 --> 00:14:41,230
to be the conditional density.
275
00:14:41,230 --> 00:14:43,860
Now let's try the second
expression.
276
00:14:43,860 --> 00:14:46,690
This is the probability
that Y falls
277
00:14:46,690 --> 00:14:48,040
into the little interval.
278
00:14:48,040 --> 00:14:51,160
So that's the density
of Y times delta.
279
00:14:51,160 --> 00:14:53,950
And then here we have an
object which is the
280
00:14:53,950 --> 00:14:59,690
conditional probability of X in a
universe where the value of Y
281
00:14:59,690 --> 00:15:00,940
is given to us.
282
00:15:00,940 --> 00:15:04,900
283
00:15:04,900 --> 00:15:08,830
Now this relation is sort
of approximate.
284
00:15:08,830 --> 00:15:13,630
This is true for very small
delta in the limit.
285
00:15:13,630 --> 00:15:17,880
But we can cancel the deltas
from both sides, and we're
286
00:15:17,880 --> 00:15:21,800
left with a formula that links
together PMFs and PDFs.
287
00:15:21,800 --> 00:15:25,340
Now this may look terribly
confusing because there's both
288
00:15:25,340 --> 00:15:27,730
p's and f's involved.
289
00:15:27,730 --> 00:15:29,850
But the logic should be clear.
290
00:15:29,850 --> 00:15:32,590
If a random variable
is discrete, it's
291
00:15:32,590 --> 00:15:34,480
described by a PMF.
292
00:15:34,480 --> 00:15:38,120
So here we're talking about
the PMF of X in some
293
00:15:38,120 --> 00:15:39,130
particular universe.
294
00:15:39,130 --> 00:15:41,210
X is discrete, so
it has a PMF.
295
00:15:41,210 --> 00:15:42,320
Similarly here.
296
00:15:42,320 --> 00:15:45,380
Y is continuous so it's
described by a PDF.
297
00:15:45,380 --> 00:15:47,840
And even in the conditional
universe where I tell you the
298
00:15:47,840 --> 00:15:50,900
value of X, Y is still a
continuous random variable, so
299
00:15:50,900 --> 00:15:53,280
it's still described by a PDF.
300
00:15:53,280 --> 00:15:55,430
So this is the basic
relation that links
301
00:15:55,430 --> 00:15:57,360
together PMFs and PDFs.
302
00:15:57,360 --> 00:15:59,080
In this mixed world.
303
00:15:59,080 --> 00:16:04,270
And now in this equality,
you can take this term and
304
00:16:04,270 --> 00:16:07,830
send it to the denominator
on the other side.
305
00:16:07,830 --> 00:16:10,070
And what you end up with
is the formula
306
00:16:10,070 --> 00:16:11,830
that we have up here.
307
00:16:11,830 --> 00:16:15,640
And this is a formula that we
can use to make inferences
308
00:16:15,640 --> 00:16:18,780
about the discrete random
variable X when we're told the
309
00:16:18,780 --> 00:16:26,540
value of the continuous random
variable Y. The probability
310
00:16:26,540 --> 00:16:29,690
that X takes on a particular
value has something
311
00:16:29,690 --> 00:16:31,330
to do with the prior.
312
00:16:31,330 --> 00:16:36,520
And other than that, it's
proportional to this quantity,
313
00:16:36,520 --> 00:16:41,720
the conditional density of Y given X.
So these are the quantities
314
00:16:41,720 --> 00:16:43,190
that we plotted here.
315
00:16:43,190 --> 00:16:47,550
Suppose that the x's are equally
likely in your prior,
316
00:16:47,550 --> 00:16:50,210
so we don't really care
about that term.
317
00:16:50,210 --> 00:16:55,530
It tells us that the posterior
of X is proportional to that
318
00:16:55,530 --> 00:16:58,520
particular density under
the given x's.
319
00:16:58,520 --> 00:17:03,350
So in this picture, if I were to
get a particular y here, I
320
00:17:03,350 --> 00:17:07,200
would say that x equals 1
has a probability that's
321
00:17:07,200 --> 00:17:09,220
proportional to this quantity.
322
00:17:09,220 --> 00:17:11,470
x equals 0 has a probability
that's
323
00:17:11,470 --> 00:17:13,599
proportional to this quantity.
324
00:17:13,599 --> 00:17:16,910
So the ratio of these two
quantities gives us the
325
00:17:16,910 --> 00:17:21,200
relative odds of the different
x's given the y
326
00:17:21,200 --> 00:17:24,010
that we have observed.
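The single-bit example can be sketched the same way: X is the transmitted bit and Y = X + N, with N Gaussian. The equal prior and the unit noise variance are assumptions made for this sketch, not values fixed in the lecture.

```python
# Mixed Bayes rule: X discrete (a bit, 0 or 1), Y continuous (Y = X + noise).
# p_{X|Y}(x|y) is proportional to p_X(x) * f_{Y|X}(y|x), where f_{Y|X} is a
# Gaussian density centered at x.  Prior and noise variance are assumed here.
import math

def gauss_pdf(y, mean, sigma=1.0):
    # Gaussian density with the given mean and standard deviation
    return math.exp(-(y - mean) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def posterior_bit(y, p1=0.5):
    prior = {0: 1 - p1, 1: p1}
    # Numerators p_X(x) * f_{Y|X}(y|x) for each hypothesis
    unnorm = {x: prior[x] * gauss_pdf(y, mean=x) for x in (0, 1)}
    f_y = sum(unnorm.values())       # the marginal density f_Y(y)
    return {x: unnorm[x] / f_y for x in (0, 1)}

post = posterior_bit(y=0.9)          # observed value closer to 1 than to 0
```

With equal priors, the posterior odds of x = 1 versus x = 0 are just the ratio of the two conditional densities at the observed y, as in the plotted slices.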
327
00:17:24,010 --> 00:17:28,099
So we're going to come back to
this topic and redo plenty of
328
00:17:28,099 --> 00:17:31,350
examples of these kinds towards
the end of the class,
329
00:17:31,350 --> 00:17:34,480
when we spend some
time dedicated
330
00:17:34,480 --> 00:17:36,130
to inference problems.
331
00:17:36,130 --> 00:17:39,890
But already at this stage, we
sort of have the basic skills
332
00:17:39,890 --> 00:17:42,000
to deal with a lot of that.
333
00:17:42,000 --> 00:17:43,840
And it's useful at this
point to pull all
334
00:17:43,840 --> 00:17:45,610
the formulas together.
335
00:17:45,610 --> 00:17:49,990
So finally let's look at the
last case that's remaining.
336
00:17:49,990 --> 00:17:54,440
Here we have a continuous
phenomenon that we're trying
337
00:17:54,440 --> 00:17:57,770
to measure, but our measurements
are discrete.
338
00:17:57,770 --> 00:18:00,780
What's an example where
this might happen?
339
00:18:00,780 --> 00:18:05,270
So you have some device that
emits light, and you drive it
340
00:18:05,270 --> 00:18:07,500
with a current that has
a certain intensity.
341
00:18:07,500 --> 00:18:09,910
You don't know what that
current is, and it's a
342
00:18:09,910 --> 00:18:12,120
continuous random variable.
343
00:18:12,120 --> 00:18:14,600
But the device emits
light by sending
344
00:18:14,600 --> 00:18:16,580
out individual photons.
345
00:18:16,580 --> 00:18:20,480
And your measurement is some
other device that counts how
346
00:18:20,480 --> 00:18:23,250
many photons did you get
in a single second.
347
00:18:23,250 --> 00:18:28,020
So if we have devices that emit
a very low intensity you
348
00:18:28,020 --> 00:18:31,720
can actually start counting
individual photons as they're
349
00:18:31,720 --> 00:18:32,980
being observed.
350
00:18:32,980 --> 00:18:35,390
So we have a discrete
measurement, which is the
351
00:18:35,390 --> 00:18:38,920
number of photons, and we
have a continuous hidden
352
00:18:38,920 --> 00:18:43,060
random variable that we're
trying to estimate.
353
00:18:43,060 --> 00:18:45,790
What do we do in this case?
354
00:18:45,790 --> 00:18:52,600
Well we start again with a
formula of this kind, and send
355
00:18:52,600 --> 00:18:55,560
the p term to the denominator.
356
00:18:55,560 --> 00:18:58,180
And that's the formula that we
use there, except that the
357
00:18:58,180 --> 00:19:01,100
roles of x's and y's
are interchanged.
358
00:19:01,100 --> 00:19:06,810
So since here we have Y being
discrete, we should change all
359
00:19:06,810 --> 00:19:07,590
the subscripts.
360
00:19:07,590 --> 00:19:15,490
It would be p_Y, f_X given
Y, f_X, and p_Y given X.
361
00:19:15,490 --> 00:19:19,230
So just change all
those subscripts.
362
00:19:19,230 --> 00:19:22,740
Because now what used to
be continuous became discrete,
363
00:19:22,740 --> 00:19:25,310
and vice versa.
364
00:19:25,310 --> 00:19:27,360
Take that formula, send
the other terms to the
365
00:19:27,360 --> 00:19:32,140
denominator, and we have a
formula for the density of X,
366
00:19:32,140 --> 00:19:34,370
given the particular
measurements for Y that we
367
00:19:34,370 --> 00:19:36,350
have obtained.
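The photon-counting setup can be sketched on a grid. Two modeling choices here are assumptions for illustration only: the intensity X is taken uniform on (0, 5], and the count Y given X = x is taken Poisson with mean x (a common model for photon counts, not one stated in the lecture).

```python
# Continuous X (light intensity), discrete Y (photon count):
# f_{X|Y}(x|y) = f_X(x) p_{Y|X}(y|x) / p_Y(y), evaluated numerically.
# Assumptions for the sketch: X uniform on (0, 5], Y | X=x Poisson with mean x.
import math

dx = 0.01
xs = [i * dx for i in range(1, 501)]     # grid over (0, 5]
f_X = 1.0 / 5.0                          # uniform prior density

def poisson_pmf(y, x):
    return math.exp(-x) * x ** y / math.factorial(y)

def posterior_density(y):
    unnorm = [f_X * poisson_pmf(y, x) for x in xs]
    p_y = sum(u * dx for u in unnorm)    # marginal p_Y(y), a Riemann sum
    return [u / p_y for u in unnorm]

dens = posterior_density(y=3)            # posterior density of X after seeing 3 photons
```

Under these assumptions the posterior density peaks near x = 3, matching the intuition that the observed count is our best single guess of the intensity.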
368
00:19:36,350 --> 00:19:41,420
In some sense that's all there
is in Bayesian inference.
369
00:19:41,420 --> 00:19:46,540
It's using these very simple
one line formulas.
370
00:19:46,540 --> 00:19:51,210
But why are there people then
who make their living solving
371
00:19:51,210 --> 00:19:52,550
inference problems?
372
00:19:52,550 --> 00:19:54,990
Well, the devil is
in the details.
373
00:19:54,990 --> 00:19:57,460
As we're going to discuss,
there are some real world
374
00:19:57,460 --> 00:20:01,150
issues of how exactly do you
design your f's, how do you
375
00:20:01,150 --> 00:20:04,680
model your system, then how do
you do your calculations.
376
00:20:04,680 --> 00:20:06,940
This might not always be easy.
377
00:20:06,940 --> 00:20:09,710
For example, there's certain
integrals or sums that have to
378
00:20:09,710 --> 00:20:12,900
be evaluated, which may be
hard to do and so on.
379
00:20:12,900 --> 00:20:14,900
So this object is
a lot richer
380
00:20:14,900 --> 00:20:16,730
than just these formulas.
381
00:20:16,730 --> 00:20:21,270
On the other hand, at the
conceptual level, that's the
382
00:20:21,270 --> 00:20:23,910
basis for Bayesian inference,
that these
383
00:20:23,910 --> 00:20:25,160
are the basic concepts.
384
00:20:25,160 --> 00:20:27,570
385
00:20:27,570 --> 00:20:30,850
All right, so now let's change
gear and move to the new
386
00:20:30,850 --> 00:20:36,180
subject, which is the topic of
finding the distribution of a
387
00:20:36,180 --> 00:20:38,360
function of a random
variable.
388
00:20:38,360 --> 00:20:42,820
We call those distributions
derived distributions, because
389
00:20:42,820 --> 00:20:45,480
we're given the distribution
of X. We're interested in a
390
00:20:45,480 --> 00:20:48,980
function of X. We want to derive
the distribution of
391
00:20:48,980 --> 00:20:51,020
that function based on
the distribution
392
00:20:51,020 --> 00:20:53,060
that we already know.
393
00:20:53,060 --> 00:20:56,610
So it could be a function of
just one random variable.
394
00:20:56,610 --> 00:20:59,170
It could be a function of
several random variables.
395
00:20:59,170 --> 00:21:02,880
So one example that we are going
to solve at some point,
396
00:21:02,880 --> 00:21:05,830
let's say you have two random
variables X and Y. Somebody
397
00:21:05,830 --> 00:21:09,055
tells you their distribution,
for example, is uniform on
398
00:21:09,055 --> 00:21:10,000
the square.
399
00:21:10,000 --> 00:21:12,120
For some reason, you're
interested in the ratio of
400
00:21:12,120 --> 00:21:14,660
these two random variables,
and you want to find the
401
00:21:14,660 --> 00:21:16,910
distribution of that ratio.
402
00:21:16,910 --> 00:21:21,810
You can think of lots of cases
where your random variable of
403
00:21:21,810 --> 00:21:25,950
interest is created by taking
some other random variables
404
00:21:25,950 --> 00:21:27,570
and taking a function of them.
405
00:21:27,570 --> 00:21:31,170
And so it's legitimate to care
about the distribution of that
406
00:21:31,170 --> 00:21:33,310
random variable.
407
00:21:33,310 --> 00:21:35,560
A caveat, however.
408
00:21:35,560 --> 00:21:39,480
There's an important case where
you don't need to find
409
00:21:39,480 --> 00:21:41,840
the distribution of that
random variable.
410
00:21:41,840 --> 00:21:44,600
And this is when you want to
calculate the expectations.
411
00:21:44,600 --> 00:21:47,750
If all you care about is the
expected value of this
412
00:21:47,750 --> 00:21:50,580
function of the random
variables, you can work
413
00:21:50,580 --> 00:21:53,800
directly with the distribution
of the original random
414
00:21:53,800 --> 00:21:58,490
variables without ever having
to find the PDF of g.
415
00:21:58,490 --> 00:22:03,790
So you don't do unnecessary work
if it's not needed, but
416
00:22:03,790 --> 00:22:06,290
if it's needed, or if you're
asked to do it,
417
00:22:06,290 --> 00:22:08,470
then you just do it.
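The caveat above can be made concrete: to get E[g(X)] you can sum g(x) weighted by the PMF of X directly, with no need for the PMF of Y = g(X). The PMF and the function g below are a toy example, not from the lecture.

```python
# Expected value of g(X) computed directly from the distribution of X,
# without first deriving the distribution of Y = g(X).
# The PMF and g below are toy choices for illustration.

p_X = {-1: 0.25, 0: 0.5, 2: 0.25}

def g(x):
    return x ** 2

# E[g(X)] = sum over x of g(x) * p_X(x) -- the PMF of Y is never needed
expected = sum(g(x) * p for x, p in p_X.items())
```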
418
00:22:08,470 --> 00:22:13,040
So how do we find the
distribution of the function?
419
00:22:13,040 --> 00:22:17,690
As a warm-up, let's look
at the discrete case.
420
00:22:17,690 --> 00:22:21,120
Suppose that X is a discrete
random variable and takes
421
00:22:21,120 --> 00:22:22,550
certain values.
422
00:22:22,550 --> 00:22:27,070
We have a function g that
maps x's into y's.
423
00:22:27,070 --> 00:22:30,430
And we want to find the
probability mass function for
424
00:22:30,430 --> 00:22:31,930
Y.
425
00:22:31,930 --> 00:22:36,780
So for example, if I'm
interested in finding the
426
00:22:36,780 --> 00:22:41,020
probability that Y takes on
this particular value, how
427
00:22:41,020 --> 00:22:42,910
would I find it?
428
00:22:42,910 --> 00:22:46,890
Well I ask, what are the
different ways that this
429
00:22:46,890 --> 00:22:49,390
particular y value can happen?
430
00:22:49,390 --> 00:22:53,390
And the different ways that it
can happen is either if x
431
00:22:53,390 --> 00:22:56,800
takes this value, or if
X takes that value.
432
00:22:56,800 --> 00:23:02,650
So we identify this event in the
y space with that event in
433
00:23:02,650 --> 00:23:04,220
the x space.
434
00:23:04,220 --> 00:23:06,790
These two events
are identical.
435
00:23:06,790 --> 00:23:12,350
X falls in this set if and only
if Y falls in that set.
436
00:23:12,350 --> 00:23:15,060
Therefore, the probability of
Y falling in that set is the
437
00:23:15,060 --> 00:23:17,540
probability of X falling
in that set.
438
00:23:17,540 --> 00:23:20,890
The probability of X falling in
that set is just the sum of
439
00:23:20,890 --> 00:23:24,650
the individual probabilities
of the x's in this set.
440
00:23:24,650 --> 00:23:27,360
So we just add the probabilities
of the different
441
00:23:27,360 --> 00:23:31,300
x's where the summation is taken
over all x's that lead
442
00:23:31,300 --> 00:23:35,070
to that particular value of y.
443
00:23:35,070 --> 00:23:35,860
Very good.
444
00:23:35,860 --> 00:23:39,090
So that's all there is
in the discrete case.
445
00:23:39,090 --> 00:23:41,070
It's very nice and simple.
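This discrete recipe can be sketched in a few lines of Python; the PMF and the many-to-one function g below are invented just to show the summation over all x's that map to a given y.

```python
# A made-up PMF for X; values and probabilities are illustrative only.
pmf_x = {-2: 0.1, -1: 0.2, 0: 0.3, 1: 0.2, 2: 0.2}

def g(x):
    # several x's can map to the same y, e.g. g(-1) = g(1) = 1
    return abs(x)

# p_Y(y) = sum of p_X(x) over all x with g(x) = y
pmf_y = {}
for x, p in pmf_x.items():
    y = g(x)
    pmf_y[y] = pmf_y.get(y, 0.0) + p

print(pmf_y)
```

For instance, p_Y(1) collects the probabilities of x = -1 and x = 1, giving 0.2 + 0.2 = 0.4.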
446
00:23:41,070 --> 00:23:43,460
So let's transfer
these methods to
447
00:23:43,460 --> 00:23:45,810
the continuous case.
448
00:23:45,810 --> 00:23:47,890
Suppose we are in the
continuous case.
449
00:23:47,890 --> 00:23:52,140
Suppose that X and Y now can
take values anywhere.
450
00:23:52,140 --> 00:23:55,440
And I try to use the same method,
and I ask, what is the
451
00:23:55,440 --> 00:24:00,340
probability that Y is going
to take this value?
452
00:24:00,340 --> 00:24:03,100
At least if the diagram is this
way, you would say this
453
00:24:03,100 --> 00:24:06,990
is the same as the probability
that X takes this value.
454
00:24:06,990 --> 00:24:10,220
So I can find the probability
of Y being this in terms of
455
00:24:10,220 --> 00:24:12,600
the probability of
X being that.
456
00:24:12,600 --> 00:24:14,610
Is this useful?
457
00:24:14,610 --> 00:24:16,480
In the continuous
case, it's not.
458
00:24:16,480 --> 00:24:19,830
Because in the continuous case,
any single value has 0
459
00:24:19,830 --> 00:24:21,020
probability.
460
00:24:21,020 --> 00:24:25,450
So what you're going to get out
of this argument is that
461
00:24:25,450 --> 00:24:29,530
the probability Y takes this
value is 0, is equal to the
462
00:24:29,530 --> 00:24:32,800
probability that X takes that
value, which is also 0.
463
00:24:32,800 --> 00:24:34,650
That doesn't help us.
464
00:24:34,650 --> 00:24:36,060
We want to do something more.
465
00:24:36,060 --> 00:24:40,650
We want to actually find,
perhaps, the density of Y, as
466
00:24:40,650 --> 00:24:43,550
opposed to the probabilities
of individual y's.
467
00:24:43,550 --> 00:24:47,620
So to find the density of Y,
you might argue as follows.
468
00:24:47,620 --> 00:24:51,100
I'm looking at an interval for
y, and I ask what's the
469
00:24:51,100 --> 00:24:53,510
probability of falling
in this interval.
470
00:24:53,510 --> 00:24:57,890
And you go back and find the
corresponding set of x's that
471
00:24:57,890 --> 00:25:02,090
leads to those y's, and equate
those two probabilities.
472
00:25:02,090 --> 00:25:04,960
The probability of all of those
y's collectively should
473
00:25:04,960 --> 00:25:09,710
be equal to the probability of
all of the x's that map into
474
00:25:09,710 --> 00:25:11,930
that interval collectively.
475
00:25:11,930 --> 00:25:16,010
And this way you can
relate the two.
476
00:25:16,010 --> 00:25:22,870
As far as the mechanics go, in
many cases it's easier to not
477
00:25:22,870 --> 00:25:26,670
to work with little intervals,
but instead to work with
478
00:25:26,670 --> 00:25:30,110
cumulative distribution
functions, that is, to work
479
00:25:30,110 --> 00:25:32,600
with sort of big intervals.
480
00:25:32,600 --> 00:25:35,460
So you can instead do
a different picture.
481
00:25:35,460 --> 00:25:38,250
Look at this set of y's.
482
00:25:38,250 --> 00:25:41,690
This is the set of y's
that are smaller
483
00:25:41,690 --> 00:25:43,200
than a certain value.
484
00:25:43,200 --> 00:25:46,990
The probability of this set
is given by the cumulative
485
00:25:46,990 --> 00:25:49,740
distribution of the
random variable Y.
486
00:25:49,740 --> 00:25:54,450
Now this set of y's gets
produced by some corresponding
487
00:25:54,450 --> 00:25:56,850
set of x's.
488
00:25:56,850 --> 00:26:04,120
Maybe these are the x's that
map into y's in that set.
489
00:26:04,120 --> 00:26:06,040
And then we argue as follows.
490
00:26:06,040 --> 00:26:08,870
The probability that Y falls
in this interval is the
491
00:26:08,870 --> 00:26:12,600
same as the probability that
X falls in that interval.
492
00:26:12,600 --> 00:26:15,810
So the event of Y falling here
and the event of X falling
493
00:26:15,810 --> 00:26:19,330
there are the same, so their
probabilities must be equal.
494
00:26:19,330 --> 00:26:22,010
And then I do the calculations
here.
495
00:26:22,010 --> 00:26:25,050
And I end up getting the
cumulative distribution
496
00:26:25,050 --> 00:26:28,760
function of Y. Once I have the
cumulative, I can get the
497
00:26:28,760 --> 00:26:31,670
density by just differentiating.
498
00:26:31,670 --> 00:26:34,900
So this is the general cookbook
procedure that we
499
00:26:34,900 --> 00:26:37,886
will be using to calculate
derived distributions.
500
00:26:37,886 --> 00:26:40,450
501
00:26:40,450 --> 00:26:43,500
We're interested in a random
variable Y, which is a
502
00:26:43,500 --> 00:26:45,320
function of the x's.
503
00:26:45,320 --> 00:26:50,070
We will aim at obtaining the
cumulative distribution of Y.
504
00:26:50,070 --> 00:26:54,040
Somehow, manage to calculate the
probability of this event.
505
00:26:54,040 --> 00:26:58,120
Once we get it, and what I mean
by get it, I don't mean
506
00:26:58,120 --> 00:27:00,980
getting it for a single
value of little y.
507
00:27:00,980 --> 00:27:04,640
You need to get this
for all little y's.
508
00:27:04,640 --> 00:27:07,930
So you need to get the
function itself, the
509
00:27:07,930 --> 00:27:09,480
cumulative distribution.
510
00:27:09,480 --> 00:27:12,750
Once you get it in that form,
then you can calculate the
511
00:27:12,750 --> 00:27:15,260
derivative at any particular
point.
512
00:27:15,260 --> 00:27:18,000
And this is going to give
you the density of Y.
513
00:27:18,000 --> 00:27:19,690
So a simple two-step
procedure.
514
00:27:19,690 --> 00:27:24,050
The devil is in the details of
how you carry the mechanics.
515
00:27:24,050 --> 00:27:27,580
So let's do one first example.
516
00:27:27,580 --> 00:27:31,020
Suppose that X is a uniform
random variable, takes values
517
00:27:31,020 --> 00:27:32,660
between 0 and 2.
518
00:27:32,660 --> 00:27:35,605
We're interested in the random
variable Y, which is the cube
519
00:27:35,605 --> 00:27:37,500
of X. What kind of distribution
520
00:27:37,500 --> 00:27:38,840
is it going to have?
521
00:27:38,840 --> 00:27:44,960
Now first notice that Y takes
values between 0 and 8.
522
00:27:44,960 --> 00:27:48,810
So X is uniform, so all the
x's are equally likely.
523
00:27:48,810 --> 00:27:51,680
524
00:27:51,680 --> 00:27:55,340
You might then say, well, in
that case, all the y's should
525
00:27:55,340 --> 00:27:56,740
be equally likely.
526
00:27:56,740 --> 00:28:00,630
So Y might also have a
uniform distribution.
527
00:28:00,630 --> 00:28:02,210
Is this true?
528
00:28:02,210 --> 00:28:04,040
We'll find out.
529
00:28:04,040 --> 00:28:06,990
So let's start applying the
cookbook procedure.
530
00:28:06,990 --> 00:28:10,410
We want to find first the
cumulative distribution of the
531
00:28:10,410 --> 00:28:14,890
random variable Y, which by
definition is the probability
532
00:28:14,890 --> 00:28:17,370
that the random variable is
less than or equal to a
533
00:28:17,370 --> 00:28:18,850
certain number.
534
00:28:18,850 --> 00:28:20,680
That's what we want to find.
535
00:28:20,680 --> 00:28:24,440
What we have in our hands is the
distribution of X. That's
536
00:28:24,440 --> 00:28:26,320
what we need to work with.
537
00:28:26,320 --> 00:28:30,090
So the first step that you need
to do is to look at this
538
00:28:30,090 --> 00:28:33,680
event and translate it, and
write it in terms of the
539
00:28:33,680 --> 00:28:39,040
random variable about which you
know you have information.
540
00:28:39,040 --> 00:28:44,320
So Y is X cubed, so this event
is the same as that event.
541
00:28:44,320 --> 00:28:46,760
So now we can forget
about the y's.
542
00:28:46,760 --> 00:28:49,860
It's just an exercise involving
a single random
543
00:28:49,860 --> 00:28:52,750
variable with a known
distribution and we want to
544
00:28:52,750 --> 00:28:56,610
calculate the probability
of some event.
545
00:28:56,610 --> 00:28:58,780
So we're looking
at this event.
546
00:28:58,780 --> 00:29:02,230
X cubed being less than or equal
to y. We massage that
547
00:29:02,230 --> 00:29:06,130
expression so that it involves
X directly, so let's
548
00:29:06,130 --> 00:29:08,960
take cubic roots of both sides
of this inequality.
549
00:29:08,960 --> 00:29:12,130
This event is the same as the
event that X is less than or
550
00:29:12,130 --> 00:29:14,820
equal to y to the 1/3.
551
00:29:14,820 --> 00:29:19,300
Now with a uniform distribution
on [0,2], what is
552
00:29:19,300 --> 00:29:22,070
that probability going to be?
553
00:29:22,070 --> 00:29:27,710
It's the probability of being in
the interval from 0 to y to
554
00:29:27,710 --> 00:29:34,680
the 1/3, so it's going to be in
the area under the uniform
555
00:29:34,680 --> 00:29:37,010
going up to that point.
556
00:29:37,010 --> 00:29:39,315
And what's the area under
that uniform?
557
00:29:39,315 --> 00:29:42,650
558
00:29:42,650 --> 00:29:44,290
So here's x.
559
00:29:44,290 --> 00:29:50,810
Here is the distribution
of X. It goes up to 2.
560
00:29:50,810 --> 00:29:53,330
The distribution of
X is this one.
561
00:29:53,330 --> 00:29:56,860
We want to go up to
y to the 1/3.
562
00:29:56,860 --> 00:30:02,390
So the probability for this
event happening is this area.
563
00:30:02,390 --> 00:30:06,590
And the area is equal to the
base, which is y to the 1/3
564
00:30:06,590 --> 00:30:08,250
times the height.
565
00:30:08,250 --> 00:30:09,720
What is the height?
566
00:30:09,720 --> 00:30:13,480
Well since the density must
integrate to 1, the total area
567
00:30:13,480 --> 00:30:15,340
under the curve has to be 1.
568
00:30:15,340 --> 00:30:19,660
So the height here is 1/2, and
that explains why we get the
569
00:30:19,660 --> 00:30:22,530
1/2 factor down there.
570
00:30:22,530 --> 00:30:24,900
So that's the formula for the
cumulative distribution.
571
00:30:24,900 --> 00:30:26,070
And then the rest is easy.
572
00:30:26,070 --> 00:30:28,340
You just take derivatives.
573
00:30:28,340 --> 00:30:32,650
You differentiate this
expression with respect to y
574
00:30:32,650 --> 00:30:36,240
1/2 times 1/3, and y
drops by one power.
575
00:30:36,240 --> 00:30:39,670
So you get y to 2/3 in
the denominator.
576
00:30:39,670 --> 00:30:55,490
So if you wish to plot this,
it's 1/y to the 2/3.
577
00:30:55,490 --> 00:31:00,480
So when y goes to 0, it sort
of blows up and it
578
00:31:00,480 --> 00:31:03,090
goes on this way.
579
00:31:03,090 --> 00:31:06,090
Is this picture correct
the way I've drawn it?
580
00:31:06,090 --> 00:31:08,900
581
00:31:08,900 --> 00:31:11,256
What's wrong with it?
582
00:31:11,256 --> 00:31:12,630
[? AUDIENCE: Something. ?]
583
00:31:12,630 --> 00:31:13,420
PROFESSOR: Yes.
584
00:31:13,420 --> 00:31:17,610
y only takes values
from 0 to 8.
585
00:31:17,610 --> 00:31:21,890
This formula that I wrote here
is only correct when the
586
00:31:21,890 --> 00:31:25,000
previous picture applies.
587
00:31:25,000 --> 00:31:31,070
I took my y to the 1/3 to
be between 0 and 2.
588
00:31:31,070 --> 00:31:40,650
So this formula here is only
correct for y between 0 and 8.
589
00:31:40,650 --> 00:31:43,770
590
00:31:43,770 --> 00:31:46,610
And for that reason, the formula
for the derivative is
591
00:31:46,610 --> 00:31:50,700
also true only for a
y between 0 and 8.
592
00:31:50,700 --> 00:31:55,630
And any other values of y are
impossible, so they get
593
00:31:55,630 --> 00:31:57,880
zero density.
594
00:31:57,880 --> 00:32:04,070
So to complete the picture
here, the PDF of y has a
595
00:32:04,070 --> 00:32:09,290
cutoff at 8, and it's also
0 everywhere else.
596
00:32:09,290 --> 00:32:13,330
597
00:32:13,330 --> 00:32:16,640
And one thing that we see is
that the distribution of Y is
598
00:32:16,640 --> 00:32:17,980
not uniform.
599
00:32:17,980 --> 00:32:24,240
Certain y's are more likely than
others, even though we
600
00:32:24,240 --> 00:32:26,130
started with a uniform random
601
00:32:26,130 --> 00:32:32,240
variable X. All right.
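The whole two-step calculation for Y = X cubed can be sanity-checked by simulation. The sketch below assumes only the lecture's setup, X uniform on [0, 2], and compares the derived CDF, F_Y(y) = y to the 1/3 over 2, against an empirical CDF from random samples.

```python
import random

random.seed(0)

# Y = X**3 with X uniform on [0, 2].
# Two-step method from the lecture: F_Y(y) = P(X <= y**(1/3)) = y**(1/3) / 2
# for 0 <= y <= 8; differentiating gives f_Y(y) = 1 / (6 * y**(2/3)) on (0, 8].

def cdf_y(y):
    if y <= 0:
        return 0.0
    if y >= 8:
        return 1.0
    return y ** (1 / 3) / 2

# Monte Carlo check of the derived CDF at a few points.
n = 200_000
samples = [random.uniform(0.0, 2.0) ** 3 for _ in range(n)]
for y in (0.5, 1.0, 4.0, 7.0):
    empirical = sum(s <= y for s in samples) / n
    assert abs(empirical - cdf_y(y)) < 0.01  # agreement up to sampling noise
print("derived CDF matches simulation")
```

A histogram of `samples` would also show the non-uniform shape: lots of mass near 0, thinning out toward 8.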
602
00:32:32,240 --> 00:32:36,530
So we will keep doing examples
of this kind, a sequence of
603
00:32:36,530 --> 00:32:40,350
progressively more interesting
or more complicated ones.
604
00:32:40,350 --> 00:32:42,530
So that's going to continue
in the next lecture.
605
00:32:42,530 --> 00:32:45,930
You're going to see plenty of
examples in your recitations
606
00:32:45,930 --> 00:32:48,060
and tutorials and so on.
607
00:32:48,060 --> 00:32:52,420
So let's do one that's pretty
similar to the one that we
608
00:32:52,420 --> 00:32:57,730
did, but it's going to add
just a small twist in how we
609
00:32:57,730 --> 00:33:00,470
do the mechanics.
610
00:33:00,470 --> 00:33:02,780
OK so you set your
cruise control
611
00:33:02,780 --> 00:33:04,010
when you start driving.
612
00:33:04,010 --> 00:33:06,310
And you keep driving at that
613
00:33:06,310 --> 00:33:07,870
constant speed.
614
00:33:07,870 --> 00:33:09,980
Where you set your cruise
control is somewhere
615
00:33:09,980 --> 00:33:11,660
between 30 and 60.
616
00:33:11,660 --> 00:33:14,520
You're going to drive
a distance of 200.
617
00:33:14,520 --> 00:33:18,660
And so the time it's going to
take for your trip is 200 over
618
00:33:18,660 --> 00:33:20,530
the setting of your
cruise control.
619
00:33:20,530 --> 00:33:22,610
So it's 200/V.
620
00:33:22,610 --> 00:33:26,210
Somebody gives you the
distribution of V, and they
621
00:33:26,210 --> 00:33:29,490
tell you not only is it between
30 and 60, it's roughly
622
00:33:29,490 --> 00:33:33,530
equally likely to be anything
between 30 and 60, so we have
623
00:33:33,530 --> 00:33:36,280
a uniform distribution
over that range.
624
00:33:36,280 --> 00:33:40,060
So we have a distribution of
V. We want to find the
625
00:33:40,060 --> 00:33:43,460
distribution of the random
variable T, which is the time
626
00:33:43,460 --> 00:33:46,540
it takes till your trip ends.
627
00:33:46,540 --> 00:33:49,200
628
00:33:49,200 --> 00:33:51,790
So how are we going
to proceed?
629
00:33:51,790 --> 00:33:55,170
We'll use the exact same
cookbook procedure.
630
00:33:55,170 --> 00:33:57,360
We're going to start by
finding the cumulative
631
00:33:57,360 --> 00:34:02,920
distribution of T.
What is this?
632
00:34:02,920 --> 00:34:05,730
By definition, the cumulative
distribution is the
633
00:34:05,730 --> 00:34:10,230
probability that T is less
than a certain number.
634
00:34:10,230 --> 00:34:12,070
OK.
635
00:34:12,070 --> 00:34:15,340
Now we don't know the
distribution of T, so we
636
00:34:15,340 --> 00:34:17,989
cannot work with this
event directly.
637
00:34:17,989 --> 00:34:21,960
But we take that event and
translate it into V-space.
638
00:34:21,960 --> 00:34:28,205
So we replace the t's by what we
know T to be in terms of V
639
00:34:28,205 --> 00:34:28,271
or
640
00:34:28,271 --> 00:34:33,565
the v's. All right.
641
00:34:33,565 --> 00:34:36,230
642
00:34:36,230 --> 00:34:39,659
So we have the distribution
of V. So now let's
643
00:34:39,659 --> 00:34:41,739
calculate this quantity.
644
00:34:41,739 --> 00:34:42,179
OK.
645
00:34:42,179 --> 00:34:46,210
Let's massage this event and
rewrite it as the probability
646
00:34:46,210 --> 00:35:06,880
that V is larger than or
equal to 200/t.
647
00:35:06,880 --> 00:35:10,870
So what is this going to be?
648
00:35:10,870 --> 00:35:14,400
So let's say that 200/t
is some number that
649
00:35:14,400 --> 00:35:16,015
falls inside the range.
650
00:35:16,015 --> 00:35:19,150
651
00:35:19,150 --> 00:35:24,630
So that's going to be true if
200/t is bigger than 30, and
652
00:35:24,630 --> 00:35:26,610
less than 60.
653
00:35:26,610 --> 00:35:37,110
Which means that t is
less than 30/200.
654
00:35:37,110 --> 00:35:38,360
No, 200/30.
655
00:35:38,360 --> 00:35:41,300
656
00:35:41,300 --> 00:35:44,570
And bigger than 200/60.
657
00:35:44,570 --> 00:35:51,360
So for t's inside that range,
this number 200/t falls inside
658
00:35:51,360 --> 00:35:52,230
that range.
659
00:35:52,230 --> 00:35:55,960
This is the range of t's that
are possible, given the
660
00:35:55,960 --> 00:35:59,240
description of the problem
that we have set up.
661
00:35:59,240 --> 00:36:04,940
So for t's in that range, what
is the probability that V is
662
00:36:04,940 --> 00:36:07,900
bigger than this number?
663
00:36:07,900 --> 00:36:11,550
So V being bigger than that
number is the probability of
664
00:36:11,550 --> 00:36:17,000
this event, so it's going to be
the area under this curve.
665
00:36:17,000 --> 00:36:22,880
So the area under that curve
is the height of the curve,
666
00:36:22,880 --> 00:36:27,300
which is 1 over 30,
times the base.
667
00:36:27,300 --> 00:36:28,910
How big is the base?
668
00:36:28,910 --> 00:36:33,060
Well it's from that point to 60,
so the base has a length
669
00:36:33,060 --> 00:36:36,500
of 60 minus 200/t.
670
00:36:36,500 --> 00:36:45,470
671
00:36:45,470 --> 00:36:50,580
And this is a formula which is
valid for those t's for which
672
00:36:50,580 --> 00:36:52,420
this picture is correct.
673
00:36:52,420 --> 00:36:57,410
And this picture is correct if
200/t happens to fall in this
674
00:36:57,410 --> 00:37:01,540
interval, which is the same as
T falling in that interval,
675
00:37:01,540 --> 00:37:03,980
which are the t's that
are possible.
676
00:37:03,980 --> 00:37:07,390
So finally let's find the
density of T, which is what
677
00:37:07,390 --> 00:37:09,430
we're looking for.
678
00:37:09,430 --> 00:37:12,450
We find this by taking the
derivative in this expression
679
00:37:12,450 --> 00:37:14,370
with respect to t.
680
00:37:14,370 --> 00:37:18,150
We only get one term
from here.
681
00:37:18,150 --> 00:37:26,045
And this is going to be 200/30,
1 over t squared.
682
00:37:26,045 --> 00:37:30,820
683
00:37:30,820 --> 00:37:34,020
And this is the formula for
the density for t's in the
684
00:37:34,020 --> 00:37:35,270
allowed range.
685
00:37:35,270 --> 00:37:46,890
686
00:37:46,890 --> 00:37:51,130
OK, so that's the end of the
solution to this particular
687
00:37:51,130 --> 00:37:52,880
problem as well.
688
00:37:52,880 --> 00:37:55,640
I said that there was a little
twist compared to
689
00:37:55,640 --> 00:37:57,130
the previous one.
690
00:37:57,130 --> 00:37:58,410
What was the twist?
691
00:37:58,410 --> 00:38:01,380
Well the twist was that in the
previous problem we dealt with
692
00:38:01,380 --> 00:38:05,580
the X cubed function, which was
monotonically increasing.
693
00:38:05,580 --> 00:38:07,760
Here we dealt with the
function that was
694
00:38:07,760 --> 00:38:09,850
monotonically decreasing.
695
00:38:09,850 --> 00:38:13,850
So when we had to find the
probability that T is less
696
00:38:13,850 --> 00:38:17,220
than something, that translated
into an event that
697
00:38:17,220 --> 00:38:19,640
V was bigger than something.
698
00:38:19,640 --> 00:38:22,410
Your time is less than something
if and only if your
699
00:38:22,410 --> 00:38:25,090
velocity is bigger
than something.
700
00:38:25,090 --> 00:38:27,510
So for when you're dealing
with the monotonically
701
00:38:27,510 --> 00:38:31,950
decreasing function, at some
point some inequalities will
702
00:38:31,950 --> 00:38:33,200
have to get reversed.
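The cruise-control answer, inequality flip and all, can be sketched numerically the same way. This is just a check of the lecture's formulas: F_T(t) = (60 - 200/t)/30 on the allowed range, nothing beyond that is assumed.

```python
import random

random.seed(1)

# T = 200 / V with V uniform on [30, 60], so T ranges over [200/60, 200/30].
# T is decreasing in V, so {T <= t} = {V >= 200/t}, and on that range
# F_T(t) = (60 - 200/t) / 30, hence f_T(t) = 200 / (30 * t**2).

def cdf_t(t):
    if t <= 200 / 60:
        return 0.0
    if t >= 200 / 30:
        return 1.0
    return (60 - 200 / t) / 30

# Monte Carlo check: simulate V, transform, compare empirical CDF to formula.
n = 200_000
samples = [200 / random.uniform(30.0, 60.0) for _ in range(n)]
for t in (3.5, 4.0, 5.0, 6.0):
    empirical = sum(s <= t for s in samples) / n
    assert abs(empirical - cdf_t(t)) < 0.01  # agreement up to sampling noise
print("flipped-inequality CDF matches simulation")
```

Note the flip in action inside the comment: the event {T <= t} turns into {V >= 200/t} because the function 200/v is decreasing.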
703
00:38:33,200 --> 00:38:38,540
704
00:38:38,540 --> 00:38:43,700
Finally let's look at
a very useful one.
705
00:38:43,700 --> 00:38:47,990
Which is the case where we take
a linear function of a
706
00:38:47,990 --> 00:38:49,700
random variable.
707
00:38:49,700 --> 00:38:55,810
So X is a random variable with
given distribution, and we
708
00:38:55,810 --> 00:38:57,110
consider a linear
function of it.
709
00:38:57,110 --> 00:38:59,920
So in this particular instance,
we take a to be
710
00:38:59,920 --> 00:39:03,590
equal to 2 and b equal to 5.
711
00:39:03,590 --> 00:39:08,680
And let us first argue
just by picture.
712
00:39:08,680 --> 00:39:13,920
So X is a random variable that
has a given distribution.
713
00:39:13,920 --> 00:39:16,150
Let's say it's this
weird shape here.
714
00:39:16,150 --> 00:39:20,170
And x ranges from -1 to +2.
715
00:39:20,170 --> 00:39:22,140
Let's do things one
step at a time.
716
00:39:22,140 --> 00:39:26,190
Let's first find the
distribution of 2X.
717
00:39:26,190 --> 00:39:28,960
Why do you think you
know about 2X?
718
00:39:28,960 --> 00:39:35,330
Well if x ranges from -1 to 2,
then the random variable 2X is
719
00:39:35,330 --> 00:39:36,580
going to range from -2 to +4.
720
00:39:36,580 --> 00:39:39,560
721
00:39:39,560 --> 00:39:42,360
So that's what the range
is going to be.
722
00:39:42,360 --> 00:39:48,840
Now dealing with the random
variable 2X, as opposed to the
723
00:39:48,840 --> 00:39:52,520
random variable X, in some sense
it's just changing the
724
00:39:52,520 --> 00:39:55,270
units in which we measure
that random variable.
725
00:39:55,270 --> 00:39:58,130
It's just changing the
scale on which we
726
00:39:58,130 --> 00:39:59,730
draw and plot things.
727
00:39:59,730 --> 00:40:03,180
So if it's just a scale change,
then intuition should
728
00:40:03,180 --> 00:40:08,120
tell you that the random
variable 2X should have a PDF
729
00:40:08,120 --> 00:40:12,850
of the same shape, except that
it's scaled out by a factor of
730
00:40:12,850 --> 00:40:16,540
2, because our random variable
of 2X now has a range that's
731
00:40:16,540 --> 00:40:18,570
twice as large.
732
00:40:18,570 --> 00:40:23,720
So we take the same PDF and
scale it up by stretching the
733
00:40:23,720 --> 00:40:26,790
x-axis by a factor of 2.
734
00:40:26,790 --> 00:40:30,330
So what does scaling
correspond to
735
00:40:30,330 --> 00:40:33,870
in terms of a formula?
736
00:40:33,870 --> 00:40:39,500
So the distribution of 2X as a
function of, let's say, a generic
737
00:40:39,500 --> 00:40:45,760
argument z, is going to be the
distribution of X, but scaled
738
00:40:45,760 --> 00:40:47,010
by a factor of 2.
739
00:40:47,010 --> 00:40:50,060
740
00:40:50,060 --> 00:40:54,100
So taking a function and
replacing its arguments by the
741
00:40:54,100 --> 00:40:58,740
argument over 2, what it
does is it stretches it
742
00:40:58,740 --> 00:41:00,430
by a factor of 2.
743
00:41:00,430 --> 00:41:04,410
You have probably been tortured
ever since middle
744
00:41:04,410 --> 00:41:08,150
school to figure out, when you need
to stretch a function, whether
745
00:41:08,150 --> 00:41:12,470
you need to put 2z or z/2.
746
00:41:12,470 --> 00:41:15,450
And the one that actually does
the stretching is to put the
747
00:41:15,450 --> 00:41:18,000
z/2 in that place.
748
00:41:18,000 --> 00:41:21,180
So that's what the
stretching does.
749
00:41:21,180 --> 00:41:23,670
Could that be the
full answer?
750
00:41:23,670 --> 00:41:24,930
Well there's a catch.
751
00:41:24,930 --> 00:41:29,730
If you stretch this function by
a factor of 2, what happens
752
00:41:29,730 --> 00:41:32,100
to the area under
the function?
753
00:41:32,100 --> 00:41:34,120
It's going to get doubled.
754
00:41:34,120 --> 00:41:38,670
But the total probability must
add up to 1, so we need to do
755
00:41:38,670 --> 00:41:41,840
something else to make sure that
the area under the curve
756
00:41:41,840 --> 00:41:44,300
stays at 1.
757
00:41:44,300 --> 00:41:47,980
So we need to take that function
and scale it down by
758
00:41:47,980 --> 00:41:51,720
this factor of 2.
759
00:41:51,720 --> 00:41:55,580
So when you're dealing with a
multiple of a random variable,
760
00:41:55,580 --> 00:42:00,580
what happens to the PDF is you
stretch it according to the
761
00:42:00,580 --> 00:42:04,320
multiple, and then scale it
down by the same number so
762
00:42:04,320 --> 00:42:07,460
that you preserve the area
under that curve.
763
00:42:07,460 --> 00:42:10,800
So now we found the distribution
of 2X.
764
00:42:10,800 --> 00:42:14,910
How about the distribution
of 2X + 5?
765
00:42:14,910 --> 00:42:18,560
Well what does adding 5
to a random variable do?
766
00:42:18,560 --> 00:42:20,940
You're going to get essentially
the same values
767
00:42:20,940 --> 00:42:23,720
with the same probability,
except that those values all
768
00:42:23,720 --> 00:42:26,260
get shifted by 5.
769
00:42:26,260 --> 00:42:30,650
So all that you need to do is
to take this PDF here, and
770
00:42:30,650 --> 00:42:32,690
shift it by 5 units.
771
00:42:32,690 --> 00:42:35,530
So the range used to
be from -2 to 4.
772
00:42:35,530 --> 00:42:38,750
The new range is going
to be from 3 to 9.
773
00:42:38,750 --> 00:42:40,390
And that's the final answer.
774
00:42:40,390 --> 00:42:44,900
This is the distribution of
2X + 5, starting with this
775
00:42:44,900 --> 00:42:48,240
particular distribution of X.
776
00:42:48,240 --> 00:42:53,600
Now shifting to the
right by b, what
777
00:42:53,600 --> 00:42:55,700
does it do to a function?
778
00:42:55,700 --> 00:42:58,620
Shifting to the right
by a certain amount,
779
00:42:58,620 --> 00:43:04,960
mathematically, it corresponds
to putting -b in the argument
780
00:43:04,960 --> 00:43:06,000
of the function.
781
00:43:06,000 --> 00:43:09,750
So I'm taking the formula that
I had here, which is the
782
00:43:09,750 --> 00:43:12,220
scaling by a factor of a.
783
00:43:12,220 --> 00:43:17,200
The scaling down to keep the
total area equal to 1.
784
00:43:17,200 --> 00:43:19,740
And then I need to introduce
this extra
785
00:43:19,740 --> 00:43:20,990
term to do the shifting.
786
00:43:20,990 --> 00:43:23,300
787
00:43:23,300 --> 00:43:26,200
So this is a plausible
argument.
788
00:43:26,200 --> 00:43:31,080
The proof by picture that this
should be the right answer.
789
00:43:31,080 --> 00:43:38,295
But just in order to keep our
skills tuned and refined, let
790
00:43:38,295 --> 00:43:42,950
us do this derivation in a
more formal way using our
791
00:43:42,950 --> 00:43:45,135
two-step cookbook procedure.
792
00:43:45,135 --> 00:43:48,000
793
00:43:48,000 --> 00:43:51,010
And I'm going to do it under
the assumption that a is
794
00:43:51,010 --> 00:43:54,910
positive, as in the example
that we just did.
795
00:43:54,910 --> 00:43:59,090
So what's the two-step
procedure?
796
00:43:59,090 --> 00:44:03,700
We want to find the cumulative
of Y, and after that we're
797
00:44:03,700 --> 00:44:05,720
going to differentiate.
798
00:44:05,720 --> 00:44:09,220
By definition the cumulative
is the probability that the
799
00:44:09,220 --> 00:44:13,280
random variable takes values
less than a certain number.
800
00:44:13,280 --> 00:44:17,190
And now we need to take this
event and translate it, and
801
00:44:17,190 --> 00:44:21,110
express it in terms of the
original random variables.
802
00:44:21,110 --> 00:44:24,970
So Y is, by definition,
aX + b, so we're
803
00:44:24,970 --> 00:44:28,970
looking at this event.
804
00:44:28,970 --> 00:44:33,580
And now we want to express this
event in a clean form
805
00:44:33,580 --> 00:44:39,730
where X shows up in
a straightforward way.
806
00:44:39,730 --> 00:44:42,740
Let's say I'm going to massage
this event and
807
00:44:42,740 --> 00:44:44,640
write it in this form.
808
00:44:44,640 --> 00:44:48,070
For this inequality to be true,
x should be less than or
809
00:44:48,070 --> 00:44:53,820
equal to (y minus
b) divided by a.
810
00:44:53,820 --> 00:44:56,820
OK, now what is this?
811
00:44:56,820 --> 00:45:01,330
This is the cumulative
distribution of X evaluated at
812
00:45:01,330 --> 00:45:02,580
the particular point.
813
00:45:02,580 --> 00:45:07,850
814
00:45:07,850 --> 00:45:14,760
So we got a formula for the
cumulative Y based on the
815
00:45:14,760 --> 00:45:17,880
cumulative of X. What's
the next step?
816
00:45:17,880 --> 00:45:21,550
Next step is to take derivatives
of both sides.
817
00:45:21,550 --> 00:45:28,810
So the density of Y is going to
be the derivative of this
818
00:45:28,810 --> 00:45:31,270
expression with respect to y.
819
00:45:31,270 --> 00:45:36,830
OK, so now here we need
to use the chain rule.
820
00:45:36,830 --> 00:45:40,670
It's going to be the derivative
of the F function
821
00:45:40,670 --> 00:45:43,080
with respect to its argument.
822
00:45:43,080 --> 00:45:46,930
And then we need to take the
derivative of the argument
823
00:45:46,930 --> 00:45:48,780
with respect to y.
824
00:45:48,780 --> 00:45:51,530
What is the derivative
of the cumulative?
825
00:45:51,530 --> 00:45:53,190
The derivative of
the cumulative
826
00:45:53,190 --> 00:45:56,290
is the density itself.
827
00:45:56,290 --> 00:45:59,578
And we evaluate it at the
point of interest.
828
00:45:59,578 --> 00:46:02,180
829
00:46:02,180 --> 00:46:05,340
And then the chain rule tells
us that we need to take the
830
00:46:05,340 --> 00:46:08,800
derivative of this with
respect to y, and the
831
00:46:08,800 --> 00:46:11,370
derivative of this with
respect to y is 1/a.
832
00:46:11,370 --> 00:46:14,290
833
00:46:14,290 --> 00:46:18,330
And this gives us the formula
which is consistent with what
834
00:46:18,330 --> 00:46:21,810
I had written down here,
for the case where a
835
00:46:21,810 --> 00:46:25,030
is a positive number.
836
00:46:25,030 --> 00:46:27,915
What if a was a negative
number?
837
00:46:27,915 --> 00:46:30,570
838
00:46:30,570 --> 00:46:31,910
Could this formula be true?
839
00:46:31,910 --> 00:46:35,120
840
00:46:35,120 --> 00:46:36,140
Of course not.
841
00:46:36,140 --> 00:46:39,000
Densities cannot be
negative, right?
842
00:46:39,000 --> 00:46:41,180
So that formula cannot
be true.
843
00:46:41,180 --> 00:46:43,750
Something needs to change.
844
00:46:43,750 --> 00:46:45,140
What should change?
845
00:46:45,140 --> 00:46:50,970
Where does this argument break
down when a is negative?
846
00:46:50,970 --> 00:46:56,470
847
00:46:56,470 --> 00:47:01,570
So when I write this inequality
in this form, I
848
00:47:01,570 --> 00:47:03,940
divide by a.
849
00:47:03,940 --> 00:47:07,730
But when you divide by a
negative number, the direction
850
00:47:07,730 --> 00:47:10,390
of an inequality is
going to change.
851
00:47:10,390 --> 00:47:14,520
So when a is negative, this
inequality becomes larger than
852
00:47:14,520 --> 00:47:16,190
or equal to.
853
00:47:16,190 --> 00:47:18,770
And in that case, the expression
that I have up
854
00:47:18,770 --> 00:47:24,360
there would change, with this
being larger than that.
855
00:47:24,360 --> 00:47:27,900
Instead of getting the
cumulative, I would get 1
856
00:47:27,900 --> 00:47:32,350
minus the cumulative of (y
minus b) divided by a.
857
00:47:32,350 --> 00:47:35,240
858
00:47:35,240 --> 00:47:39,890
So this is the probability that
X is bigger than this
859
00:47:39,890 --> 00:47:41,170
particular number.
860
00:47:41,170 --> 00:47:44,000
And now when you take the
derivatives, there's going to
861
00:47:44,000 --> 00:47:46,570
be a minus sign that shows up.
862
00:47:46,570 --> 00:47:49,810
And that minus sign will
end up being here.
863
00:47:49,810 --> 00:47:53,730
And so we're taking the negative
of a negative number,
864
00:47:53,730 --> 00:47:56,420
and that basically is equivalent
to taking the
865
00:47:56,420 --> 00:47:58,660
absolute value of that number.
866
00:47:58,660 --> 00:48:03,830
So all that happens when we have
a negative a is that we
867
00:48:03,830 --> 00:48:07,010
have to take the absolute value
of the scaling factor
868
00:48:07,010 --> 00:48:10,250
instead of the factor itself.
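The general linear-function formula, f_Y(y) = (1/|a|) f_X((y - b)/a), absolute value included, can be sketched as follows. The choice of X standard normal and a = -2, b = 5 is illustrative (a is negative on purpose, to exercise the absolute value), and it doubles as a numerical check of the claim that a linear function of a normal is again normal.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # density of N(mu, sigma^2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

# Y = a*X + b with X standard normal; a negative, to exercise the absolute value.
a, b = -2.0, 5.0

def pdf_y(y):
    # the linear-function formula: f_Y(y) = (1/|a|) * f_X((y - b) / a)
    return normal_pdf((y - b) / a) / abs(a)

# For X ~ N(0, 1), Y = aX + b should be N(b, a^2) = N(5, 4); compare densities.
for y in (1.0, 3.0, 5.0, 8.0):
    direct = normal_pdf(y, mu=b, sigma=abs(a))
    assert abs(pdf_y(y) - direct) < 1e-12
print("linear-function formula matches the N(5, 4) density")
```

Forgetting the absolute value here would produce a negative "density", which is exactly the sign error the lecture warns about.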
869
00:48:10,250 --> 00:48:14,020
All right, so this general
formula is quite useful for
870
00:48:14,020 --> 00:48:16,690
dealing with linear functions
of random variables.
871
00:48:16,690 --> 00:48:21,330
And one nice application of it
is to take the formula for a
872
00:48:21,330 --> 00:48:25,460
normal random variable, consider
a linear function of
873
00:48:25,460 --> 00:48:29,600
a normal random variable, plug
into this formula, and what
874
00:48:29,600 --> 00:48:34,000
you will find is that Y also
has a normal distribution.
875
00:48:34,000 --> 00:48:37,310
So using this formula, now we
can prove a statement that I
876
00:48:37,310 --> 00:48:40,565
had made a couple of lectures
ago, that a linear function of
877
00:48:40,565 --> 00:48:43,900
a normal random variable
is also normal.
878
00:48:43,900 --> 00:48:47,600
That's how you would prove it.
879
00:48:47,600 --> 00:48:51,190
I think this is it
for today.
880
00:48:51,190 --> 00:48:52,440