1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high-quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:23,050
10
00:00:23,050 --> 00:00:25,080
JOHN TSITSIKLIS:
OK let's start.
11
00:00:25,080 --> 00:00:26,560
So we've had the quiz.
12
00:00:26,560 --> 00:00:29,760
And I guess there's both good
and bad news in it.
13
00:00:29,760 --> 00:00:31,590
Yesterday, as you know,
the bad news.
14
00:00:31,590 --> 00:00:33,910
The average was a little
lower than what
15
00:00:33,910 --> 00:00:36,260
we would have wanted.
16
00:00:36,260 --> 00:00:39,580
On the other hand, the good news
is that the distribution
17
00:00:39,580 --> 00:00:41,770
was nicely spread.
18
00:00:41,770 --> 00:00:44,890
And the main purpose of
this quiz is basically for you
19
00:00:44,890 --> 00:00:48,260
to calibrate and see roughly
where you are standing.
20
00:00:48,260 --> 00:00:50,650
The other piece of the good
news is that, as you know,
21
00:00:50,650 --> 00:00:53,590
this quiz doesn't count for very
much in your final grade.
22
00:00:53,590 --> 00:00:58,230
So it's really a matter of
calibration and to get your
23
00:00:58,230 --> 00:01:02,810
mind set appropriately to
prepare for the second quiz,
24
00:01:02,810 --> 00:01:04,470
which counts a lot more.
25
00:01:04,470 --> 00:01:06,370
And it's more substantial.
26
00:01:06,370 --> 00:01:08,810
And we'll make sure that
the second quiz will
27
00:01:08,810 --> 00:01:12,110
have a higher average.
28
00:01:12,110 --> 00:01:12,520
All right.
29
00:01:12,520 --> 00:01:15,410
So let's go to our material.
30
00:01:15,410 --> 00:01:18,190
We're talking now
these days about
31
00:01:18,190 --> 00:01:20,440
continuous random variables.
32
00:01:20,440 --> 00:01:23,240
And I'll remind you what
we discussed last time.
33
00:01:23,240 --> 00:01:25,970
I'll remind you of the concept
of the probability density
34
00:01:25,970 --> 00:01:28,230
function of a single
random variable.
35
00:01:28,230 --> 00:01:31,090
And then we're going to rush
through all the concepts that
36
00:01:31,090 --> 00:01:34,230
we covered for the case of
discrete random variables and
37
00:01:34,230 --> 00:01:37,770
discuss their analogs for
the continuous case.
38
00:01:37,770 --> 00:01:40,410
And talk about notions
such as conditioning,
39
00:01:40,410 --> 00:01:42,170
independence, and so on.
40
00:01:42,170 --> 00:01:46,420
So the big picture is here.
41
00:01:46,420 --> 00:01:49,590
We have all those concepts that
we developed for the case
42
00:01:49,590 --> 00:01:52,350
of discrete random variables.
43
00:01:52,350 --> 00:01:55,560
And now we will just talk about
their analogs in the
44
00:01:55,560 --> 00:01:56,840
continuous case.
45
00:01:56,840 --> 00:02:00,800
We already discussed this analog
last week, the density
46
00:02:00,800 --> 00:02:04,520
of a single random variable.
47
00:02:04,520 --> 00:02:08,570
Then there are certain concepts
that show up both in
48
00:02:08,570 --> 00:02:10,780
the discrete and the
continuous case.
49
00:02:10,780 --> 00:02:14,560
So we have the cumulative
distribution function, which
50
00:02:14,560 --> 00:02:18,070
is a description of the
probability distribution of a
51
00:02:18,070 --> 00:02:21,470
random variable and which
applies whether you have a
52
00:02:21,470 --> 00:02:23,780
discrete or continuous
random variable.
53
00:02:23,780 --> 00:02:26,500
Then there's the notion
of the expected value.
54
00:02:26,500 --> 00:02:29,990
And in the two cases, the
expected value is calculated
55
00:02:29,990 --> 00:02:32,990
in a slightly different way,
but not very different.
56
00:02:32,990 --> 00:02:36,080
We have sums in one case,
integrals in the other.
57
00:02:36,080 --> 00:02:37,720
And this is the general
pattern that
58
00:02:37,720 --> 00:02:39,030
we're going to have.
59
00:02:39,030 --> 00:02:42,120
Formulas for the discrete case
translate to corresponding
60
00:02:42,120 --> 00:02:44,920
formulas or expressions in
the continuous case.
61
00:02:44,920 --> 00:02:50,010
We generically replace sums by
integrals, and we replace mass
62
00:02:50,010 --> 00:02:54,230
functions with density
functions.
63
00:02:54,230 --> 00:02:58,330
Then the new pieces for today
are going to be mostly the
64
00:02:58,330 --> 00:03:01,570
notion of a joint density
function, which is how we
65
00:03:01,570 --> 00:03:04,330
describe the probability
distribution of two random
66
00:03:04,330 --> 00:03:08,370
variables that are somehow
related, in general, and then
67
00:03:08,370 --> 00:03:11,780
the notion of a conditional
density function that tells us
68
00:03:11,780 --> 00:03:15,160
the distribution of one random
variable X when you're told
69
00:03:15,160 --> 00:03:19,200
the value of another random
variable Y. There's another
70
00:03:19,200 --> 00:03:22,680
concept, which is the
conditional PDF given that a
71
00:03:22,680 --> 00:03:24,860
certain event has happened.
72
00:03:24,860 --> 00:03:27,420
This is a concept that's
in some ways simpler.
73
00:03:27,420 --> 00:03:31,360
You've already seen a little
bit of that in last week's
74
00:03:31,360 --> 00:03:33,140
recitation and tutorial.
75
00:03:33,140 --> 00:03:35,640
The idea is that we have a
single random variable.
76
00:03:35,640 --> 00:03:37,710
It's described by a density.
77
00:03:37,710 --> 00:03:41,110
Then you're told that a
certain event has occurred.
78
00:03:41,110 --> 00:03:42,880
Your model changes
the universe that
79
00:03:42,880 --> 00:03:43,910
you are dealing with.
80
00:03:43,910 --> 00:03:46,640
In the new universe, you are
dealing with a new density
81
00:03:46,640 --> 00:03:51,310
function, the one that applies
given the knowledge that we
82
00:03:51,310 --> 00:03:55,700
have that this
event has occurred.
83
00:03:55,700 --> 00:03:56,160
All right.
84
00:03:56,160 --> 00:03:59,870
So what exactly did
we say about
85
00:03:59,870 --> 00:04:02,140
continuous random variables?
86
00:04:02,140 --> 00:04:05,020
The first thing is the
definition, that a random
87
00:04:05,020 --> 00:04:09,370
variable is said to be
continuous if we are given a
88
00:04:09,370 --> 00:04:12,220
certain object that we call
the probability density
89
00:04:12,220 --> 00:04:17,050
function and we can calculate
interval probabilities given
90
00:04:17,050 --> 00:04:18,709
this density function.
91
00:04:18,709 --> 00:04:21,589
So the definition is that the
random variable is continuous
92
00:04:21,589 --> 00:04:24,490
if you can calculate
probabilities associated with
93
00:04:24,490 --> 00:04:27,380
that random variable
given that formula.
94
00:04:27,380 --> 00:04:29,770
So this formula tells you that
the probability that your
95
00:04:29,770 --> 00:04:33,340
random variable falls inside
this interval is the area
96
00:04:33,340 --> 00:04:34,880
under the density curve.
97
00:04:34,880 --> 00:04:37,390
98
00:04:37,390 --> 00:04:37,700
OK.
99
00:04:37,700 --> 00:04:39,720
There are a few properties
that a density
100
00:04:39,720 --> 00:04:41,020
function must satisfy.
101
00:04:41,020 --> 00:04:42,900
Since we're talking about
probabilities, and
102
00:04:42,900 --> 00:04:45,890
probabilities are non-negative,
we have that the
103
00:04:45,890 --> 00:04:49,530
density function is always
a non-negative function.
104
00:04:49,530 --> 00:04:52,790
The total probability over
the entire real line
105
00:04:52,790 --> 00:04:54,690
must be equal to 1.
106
00:04:54,690 --> 00:04:58,070
So the integral when you
integrate over the entire real
107
00:04:58,070 --> 00:04:59,590
line has to be equal to 1.
108
00:04:59,590 --> 00:05:01,800
That's the second property.
109
00:05:01,800 --> 00:05:05,200
Another property that you get is
that if you let a be equal to
110
00:05:05,200 --> 00:05:07,720
b, this integral becomes 0.
111
00:05:07,720 --> 00:05:11,390
And that tells you that the
probability of a single point
112
00:05:11,390 --> 00:05:15,990
in the continuous case
is always equal to 0.
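The definition and the three properties above can be checked numerically. The following is a minimal Python sketch, not part of the lecture: it assumes, purely for illustration, an exponential density f(x) = e^{-x} (rate λ = 1, a hypothetical choice) and approximates each integral with a midpoint Riemann sum. The helper names `density` and `prob_interval` are made up for this example.

```python
import math


def density(x: float, lam: float = 1.0) -> float:
    """Illustrative density (an assumption, not from the lecture):
    exponential with rate lam, and 0 for negative x."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def prob_interval(f, a: float, b: float, n: int = 100_000) -> float:
    """Approximate P(a <= X <= b), the area under f between a and b,
    with a midpoint Riemann sum over n equal slices."""
    width = (b - a) / n
    return sum(f(a + (i + 0.5) * width) for i in range(n)) * width


# Interval probability versus the closed form e^{-0.5} - e^{-2}
p = prob_interval(density, 0.5, 2.0)
exact = math.exp(-0.5) - math.exp(-2.0)

# Normalization: [0, 50] holds all but e^{-50} of the total area
total = prob_interval(density, 0.0, 50.0)

# A single point has probability 0: the interval [1, 1] has zero width
point = prob_interval(density, 1.0, 1.0)

print(f"P(0.5 <= X <= 2) ~ {p:.6f} (closed form {exact:.6f})")
print(f"total area ~ {total:.6f}, P(X = 1) = {point}")
```

The closed form e^{-a} - e^{-b} is specific to the exponential example; for other densities only the numerical integral applies.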
113
00:05:15,990 --> 00:05:17,780
So these are formal
properties.
114
00:05:17,780 --> 00:05:21,290
When you want to think
intuitively, the best way to
115
00:05:21,290 --> 00:05:25,540
think about what the density
function is, is to think in terms
116
00:05:25,540 --> 00:05:28,320
of little intervals, the
probability that my random
117
00:05:28,320 --> 00:05:31,540
variable falls inside
the little interval.
118
00:05:31,540 --> 00:05:35,170
Well, inside that little
interval, the density function
119
00:05:35,170 --> 00:05:36,940
here is roughly constant.
120
00:05:36,940 --> 00:05:42,430
So that integral becomes the
value of the density times the
121
00:05:42,430 --> 00:05:45,340
length of the interval over
which you are integrating,
122
00:05:45,340 --> 00:05:47,070
which is delta.
123
00:05:47,070 --> 00:05:50,240
And so the density function
basically gives us
124
00:05:50,240 --> 00:05:54,990
probabilities of little events,
of small events.
125
00:05:54,990 --> 00:05:59,200
And the density is to be
interpreted as probability per
126
00:05:59,200 --> 00:06:02,290
unit length at a certain
place in the diagram.
127
00:06:02,290 --> 00:06:04,800
So in that place in the diagram,
the probability per
128
00:06:04,800 --> 00:06:07,870
unit length around this
neighborhood would be the
129
00:06:07,870 --> 00:06:12,320
height of the density function
at that point.
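The "probability per unit length" reading can be seen by comparing the exact probability of a short interval with f(x) times delta. This is a hedged sketch that again assumes a rate-1 exponential density (an illustrative choice) and a hypothetical point x = 1. The ratio of the two quantities approaches 1 as delta shrinks.

```python
import math


def density(x: float) -> float:
    """Illustrative density (an assumption): exponential with rate 1."""
    return math.exp(-x) if x >= 0 else 0.0


def exact_interval_prob(x: float, delta: float) -> float:
    """Exact P(x <= X <= x + delta) for this density, valid for x >= 0."""
    return math.exp(-x) - math.exp(-(x + delta))


x = 1.0
for delta in (0.1, 0.01, 0.001):
    exact = exact_interval_prob(x, delta)
    approx = density(x) * delta  # height of the density times interval length
    print(f"delta={delta}: exact={exact:.8f}, f(x)*delta={approx:.8f}, "
          f"ratio={exact / approx:.5f}")
```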
130
00:06:12,320 --> 00:06:13,270
What else?
131
00:06:13,270 --> 00:06:16,440
We have a formula for
calculating expected values of
132
00:06:16,440 --> 00:06:17,980
functions of random variables.
133
00:06:17,980 --> 00:06:21,310
In the discrete case, we had the
formula where here we had
134
00:06:21,310 --> 00:06:25,430
the sum, and instead of the
density, we had the PMF.
135
00:06:25,430 --> 00:06:29,188
The same formula is also valid
in the continuous case.
136
00:06:29,188 --> 00:06:35,120
And it's not too hard to derive,
but we will not do it.
137
00:06:35,120 --> 00:06:36,910
But let's think of the
intuition of what
138
00:06:36,910 --> 00:06:38,420
this formula says.
139
00:06:38,420 --> 00:06:41,670
You're trying to figure out on
the average how much g(X) is
140
00:06:41,670 --> 00:06:42,780
going to be.
141
00:06:42,780 --> 00:06:47,130
And then you reason, and you
say, well, X may turn out to
142
00:06:47,130 --> 00:06:52,560
take a particular value or a
small interval of values.
143
00:06:52,560 --> 00:06:54,780
This is the probability
that X falls
144
00:06:54,780 --> 00:06:56,640
inside the small interval.
145
00:06:56,640 --> 00:07:00,310
And when that happens, g(X)
takes the value g(x).
146
00:07:00,310 --> 00:07:03,930
So this fraction of the time,
you fall in the little
147
00:07:03,930 --> 00:07:07,350
neighborhood of x, and
you get so much.
148
00:07:07,350 --> 00:07:10,860
Then you average over all the
possible x's that can happen.
149
00:07:10,860 --> 00:07:13,930
And that gives you the average
value of the function g(X).
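The expected-value formula E[g(X)] = ∫ g(x) f_X(x) dx turns directly into a weighted sum over small intervals. This Python sketch assumes, for illustration only, that X is uniform on [0, 1], so the results should be near E[X] = 1/2 and E[X²] = 1/3. The names `expected_value` and `uniform_density` are hypothetical.

```python
def expected_value(g, f, lo: float, hi: float, n: int = 100_000) -> float:
    """Approximate E[g(X)] = integral of g(x) f(x) dx over [lo, hi]
    (the range where f is nonzero) with a midpoint Riemann sum."""
    width = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * width
        total += g(x) * f(x) * width  # value g(x), weighted by P(X near x)
    return total


def uniform_density(x: float) -> float:
    """Illustrative density (an assumption): uniform on [0, 1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0


mean = expected_value(lambda x: x, uniform_density, 0.0, 1.0)
second_moment = expected_value(lambda x: x * x, uniform_density, 0.0, 1.0)
print(mean, second_moment)  # close to 1/2 and 1/3
```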
150
00:07:13,930 --> 00:07:17,730
151
00:07:17,730 --> 00:07:18,045
OK.
152
00:07:18,045 --> 00:07:20,650
So this is the easy stuff.
153
00:07:20,650 --> 00:07:23,690
Now let's get to the
new material.
154
00:07:23,690 --> 00:07:26,330
We want to talk about multiple
random variables
155
00:07:26,330 --> 00:07:27,320
simultaneously.
156
00:07:27,320 --> 00:07:31,530
So we want to talk now about two
random variables that are
157
00:07:31,530 --> 00:07:35,020
continuous, and in some sense
are jointly
158
00:07:35,020 --> 00:07:35,840
continuous.
159
00:07:35,840 --> 00:07:38,080
And let's see what this means.
160
00:07:38,080 --> 00:07:40,840
The definition is similar to
the definition we had for a
161
00:07:40,840 --> 00:07:44,850
single random variable, where
I take this formula here as
162
00:07:44,850 --> 00:07:49,510
the definition of continuous
random variables.
163
00:07:49,510 --> 00:07:53,830
Two random variables are said to
be jointly continuous if we
164
00:07:53,830 --> 00:07:58,190
can calculate probabilities by
integrating a certain function
165
00:07:58,190 --> 00:08:01,070
that we call the joint
density function
166
00:08:01,070 --> 00:08:03,310
over the set of interest.
167
00:08:03,310 --> 00:08:08,690
So we have our two-dimensional
plane.
168
00:08:08,690 --> 00:08:10,900
This is the x-y plane.
169
00:08:10,900 --> 00:08:13,810
There's a certain event S that
we're interested in.
170
00:08:13,810 --> 00:08:15,860
We want to calculate
the probability.
171
00:08:15,860 --> 00:08:17,370
How do we do that?
172
00:08:17,370 --> 00:08:22,660
We are given this function
f_(X,Y), the joint density.
173
00:08:22,660 --> 00:08:25,910
It's a function of the two
arguments x and y.
174
00:08:25,910 --> 00:08:29,530
So think of that function as
being some kind of surface
175
00:08:29,530 --> 00:08:34,809
that sits on top of the
two-dimensional plane.
176
00:08:34,809 --> 00:08:39,140
The probability of falling
inside the set S, we calculate
177
00:08:39,140 --> 00:08:45,350
it by looking at the volume
under the surface, that volume
178
00:08:45,350 --> 00:08:50,470
that sits on top of S. So the
surface has underneath it a
179
00:08:50,470 --> 00:08:52,010
certain total volume.
180
00:08:52,010 --> 00:08:54,650
What should that total
volume be?
181
00:08:54,650 --> 00:08:57,050
Well, we think of these volumes
as probabilities.
182
00:08:57,050 --> 00:09:00,180
So the total probability
should be equal to 1.
183
00:09:00,180 --> 00:09:05,430
The total volume under this
surface should be equal to 1.
184
00:09:05,430 --> 00:09:08,220
So that's one property
that we want our
185
00:09:08,220 --> 00:09:10,138
density function to have.
186
00:09:10,138 --> 00:09:16,080
187
00:09:16,080 --> 00:09:20,500
So when you integrate over the
entire space, this is the
188
00:09:20,500 --> 00:09:22,400
volume under your surface.
189
00:09:22,400 --> 00:09:24,090
That should be equal to 1.
190
00:09:24,090 --> 00:09:27,280
Of course, since we're talking
about probabilities, the joint
191
00:09:27,280 --> 00:09:29,560
density should be a non-negative
function.
192
00:09:29,560 --> 00:09:34,140
So think of the situation
as having one pound of
193
00:09:34,140 --> 00:09:38,230
probability that's spread
all over your space.
194
00:09:38,230 --> 00:09:41,430
And the height of this joint
density function basically
195
00:09:41,430 --> 00:09:45,470
tells you how much probability
tends to be accumulated in
196
00:09:45,470 --> 00:09:48,400
certain regions of space
as opposed to other
197
00:09:48,400 --> 00:09:49,870
parts of the space.
198
00:09:49,870 --> 00:09:53,130
So wherever the density is big,
that means that this is
199
00:09:53,130 --> 00:09:54,920
an area of the two-dimensional
plane that's
200
00:09:54,920 --> 00:09:56,340
more likely to occur.
201
00:09:56,340 --> 00:09:59,160
Where the density is small, that
means that those (x,y) pairs
202
00:09:59,160 --> 00:10:01,100
are less likely to occur.
203
00:10:01,100 --> 00:10:03,070
You have already seen
one example
204
00:10:03,070 --> 00:10:06,050
of continuous densities.
205
00:10:06,050 --> 00:10:08,730
That was the example we had in
the very beginning of the
206
00:10:08,730 --> 00:10:10,700
class with a uniform
207
00:10:10,700 --> 00:10:13,380
distribution on the unit square.
208
00:10:13,380 --> 00:10:15,510
That was a special
case of a density
209
00:10:15,510 --> 00:10:17,250
function that was constant.
210
00:10:17,250 --> 00:10:20,090
So every place in the unit square
was as likely
211
00:10:20,090 --> 00:10:22,010
as any other place.
212
00:10:22,010 --> 00:10:25,580
But in other models, some parts
of the space may be more
213
00:10:25,580 --> 00:10:27,000
likely than others.
214
00:10:27,000 --> 00:10:29,470
And we describe those relative
likelihoods using
215
00:10:29,470 --> 00:10:31,120
this density function.
216
00:10:31,120 --> 00:10:33,420
So if somebody gives us the
density function, this
217
00:10:33,420 --> 00:10:38,480
determines for us probabilities
of all the
218
00:10:38,480 --> 00:10:41,520
subsets of the two-dimensional
plane.
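To make "probability equals volume over S" concrete, here is a rough Python sketch. It assumes a hypothetical joint density f(x, y) = x + y on the unit square (non-negative, with total volume 1) and the event S = {x + y ≤ 1}, whose exact probability works out to 1/3. The grid is crude: cells cut by the boundary of S introduce an error of order 1/n.

```python
def joint_density(x: float, y: float) -> float:
    """Illustrative joint PDF (an assumption): f(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def prob_of_set(f, in_set, n: int = 500) -> float:
    """Approximate P((X, Y) in S) as the volume under f above S, using an
    n-by-n midpoint grid over the unit square (where this f is nonzero)."""
    h = 1.0 / n
    volume = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            if in_set(x, y):
                volume += f(x, y) * h * h  # height times base area of the cell
    return volume


total = prob_of_set(joint_density, lambda x, y: True)       # the whole plane
p = prob_of_set(joint_density, lambda x, y: x + y <= 1.0)   # S = {x + y <= 1}
print(f"total volume ~ {total:.4f}, P(X + Y <= 1) ~ {p:.4f} (exact 1/3)")
```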
219
00:10:41,520 --> 00:10:45,710
Now for an intuitive
interpretation, it's good to
220
00:10:45,710 --> 00:10:47,460
think about small events.
221
00:10:47,460 --> 00:10:51,220
So let's take a particular x
here and then x plus delta.
222
00:10:51,220 --> 00:10:53,020
So this is a small interval.
223
00:10:53,020 --> 00:10:56,190
Take another small interval
here that goes from y to y
224
00:10:56,190 --> 00:10:57,560
plus delta.
225
00:10:57,560 --> 00:11:03,270
And let's look at the event that
X falls here and Y falls
226
00:11:03,270 --> 00:11:04,780
right there.
227
00:11:04,780 --> 00:11:05,780
What is this event?
228
00:11:05,780 --> 00:11:07,760
Well, this is the event
that (X,Y) falls
229
00:11:07,760 --> 00:11:11,030
inside this little rectangle.
230
00:11:11,030 --> 00:11:15,820
Using this rule for calculating
probabilities,
231
00:11:15,820 --> 00:11:19,040
what is the probability of that
rectangle?
232
00:11:19,040 --> 00:11:23,130
Well, it should be the integral
of the density over
233
00:11:23,130 --> 00:11:24,300
this rectangle.
234
00:11:24,300 --> 00:11:29,720
Or it's the volume under the
surface that sits on top of
235
00:11:29,720 --> 00:11:31,010
that rectangle.
236
00:11:31,010 --> 00:11:34,300
Now, if the rectangle is very
small, the joint density is
237
00:11:34,300 --> 00:11:36,760
not going to change very much
in that neighborhood.
238
00:11:36,760 --> 00:11:38,770
So we can treat it
as a constant.
239
00:11:38,770 --> 00:11:42,350
So the volume is going to
be the height times
240
00:11:42,350 --> 00:11:44,030
the area of the base.
241
00:11:44,030 --> 00:11:47,150
The height at that point is
whatever the function happens
242
00:11:47,150 --> 00:11:49,460
to be around that point.
243
00:11:49,460 --> 00:11:52,590
And the area of the base
is delta squared.
244
00:11:52,590 --> 00:11:58,750
So this is the intuitive way
to understand what a joint
245
00:11:58,750 --> 00:12:01,070
density function really
tells you.
246
00:12:01,070 --> 00:12:04,200
It specifies for you
probabilities of little
247
00:12:04,200 --> 00:12:08,500
squares, of little rectangles.
248
00:12:08,500 --> 00:12:11,880
And it allows you to think of
the joint density function as
249
00:12:11,880 --> 00:12:15,310
probability per unit area.
250
00:12:15,310 --> 00:12:18,790
So these are the units of the
density: probability per
251
00:12:18,790 --> 00:12:23,800
unit area in the neighborhood
of a certain point.
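The little-rectangle approximation P ≈ f_{X,Y}(x, y) times delta squared can be compared with an exact calculation. For the illustrative density f(x, y) = x + y on the unit square (an assumption, not from the lecture), integrating over [x, x+δ] × [y, y+δ] gives exactly δ²(x + y) + δ³, so the ratio to the approximation is 1 + δ/(x + y), which tends to 1. The point (0.3, 0.6) is an arbitrary choice.

```python
def joint_density(x: float, y: float) -> float:
    """Illustrative joint PDF (an assumption): f(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def exact_rectangle_prob(x: float, y: float, delta: float) -> float:
    """Exact P(x <= X <= x+delta, y <= Y <= y+delta) for f(x, y) = x + y,
    valid while the rectangle stays inside the unit square."""
    return delta**2 * (x + y) + delta**3


x, y = 0.3, 0.6
for delta in (0.1, 0.01, 0.001):
    approx = joint_density(x, y) * delta**2  # height times base area
    exact = exact_rectangle_prob(x, y, delta)
    print(f"delta={delta}: exact={exact:.3e}, f*delta^2={approx:.3e}, "
          f"ratio={exact / approx:.4f}")
```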
252
00:12:23,800 --> 00:12:26,970
So what do we do with this
density function once we have
253
00:12:26,970 --> 00:12:28,410
it in our hands?
254
00:12:28,410 --> 00:12:32,640
Well, we can use it to calculate
expected values.
255
00:12:32,640 --> 00:12:34,880
Suppose that you have a
function of two random
256
00:12:34,880 --> 00:12:38,040
variables described by
a joint density.
257
00:12:38,040 --> 00:12:41,580
You can find, perhaps, the
distribution of this random
258
00:12:41,580 --> 00:12:45,330
variable and then use the
basic definition of the
259
00:12:45,330 --> 00:12:46,150
expectation.
260
00:12:46,150 --> 00:12:49,260
Or you can calculate
expectations directly, using
261
00:12:49,260 --> 00:12:52,010
the distribution of the original
random variables.
262
00:12:52,010 --> 00:12:55,280
This is a formula that's again
identical to the formula that
263
00:12:55,280 --> 00:12:57,290
we had for the discrete case.
264
00:12:57,290 --> 00:12:59,500
In the discrete case,
we had a double sum
265
00:12:59,500 --> 00:13:02,590
here, and we had PMFs.
266
00:13:02,590 --> 00:13:06,290
So the intuition behind this
formula is the same as the one we
267
00:13:06,290 --> 00:13:08,220
had for the discrete case.
268
00:13:08,220 --> 00:13:12,550
It's just that the mechanics
are different.
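As a sketch of the two-variable formula E[g(X,Y)] = ∬ g(x,y) f_{X,Y}(x,y) dx dy, the following computes E[XY] directly from the joint density, without first finding the distribution of XY. It assumes the hypothetical density f(x, y) = x + y on the unit square, for which the exact answer is 1/3.

```python
def joint_density(x: float, y: float) -> float:
    """Illustrative joint PDF (an assumption): f(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def expected_value_2d(g, f, n: int = 300) -> float:
    """Approximate E[g(X, Y)] = double integral of g(x, y) f(x, y) over the
    unit square with an n-by-n midpoint grid."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += g(x, y) * f(x, y) * h * h  # g weighted by P((X,Y) near (x,y))
    return total


e_xy = expected_value_2d(lambda x, y: x * y, joint_density)
print(f"E[XY] ~ {e_xy:.6f} (exact 1/3)")
```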
269
00:13:12,550 --> 00:13:16,220
Then something that we did in
the discrete case was to find
270
00:13:16,220 --> 00:13:21,510
a way to go from the joint
PMF of the two random
271
00:13:21,510 --> 00:13:25,750
variables taken together to the
PMF of just one of the
272
00:13:25,750 --> 00:13:28,190
random variables.
273
00:13:28,190 --> 00:13:30,570
So we had a formula for
the discrete case.
274
00:13:30,570 --> 00:13:33,450
Let's see how things are
going to work out in
275
00:13:33,450 --> 00:13:35,800
the continuous case.
276
00:13:35,800 --> 00:13:40,560
So in the continuous
case, we have here
277
00:13:40,560 --> 00:13:42,330
our two random variables.
278
00:13:42,330 --> 00:13:45,030
And we have a density
for them.
279
00:13:45,030 --> 00:13:48,340
And let's say that we want to
calculate the probability that
280
00:13:48,340 --> 00:13:51,570
X falls inside this interval.
281
00:13:51,570 --> 00:13:53,510
So we're looking at the
probability that our random
282
00:13:53,510 --> 00:13:58,380
variable X falls in the interval
from little x to x
283
00:13:58,380 --> 00:13:59,630
plus delta.
284
00:13:59,630 --> 00:14:02,130
285
00:14:02,130 --> 00:14:08,260
Now, by the properties that we
already have for interpreting
286
00:14:08,260 --> 00:14:11,460
the density function of a single
random variable, the
287
00:14:11,460 --> 00:14:14,100
probability of a little interval
is approximately the
288
00:14:14,100 --> 00:14:18,750
density of that single random
variable times delta.
289
00:14:18,750 --> 00:14:22,120
And now we want to find a
formula for this marginal
290
00:14:22,120 --> 00:14:26,540
density in terms of
the joint density.
291
00:14:26,540 --> 00:14:26,890
OK.
292
00:14:26,890 --> 00:14:28,930
So this is the probability
that X
293
00:14:28,930 --> 00:14:30,970
falls inside this interval.
294
00:14:30,970 --> 00:14:34,070
In terms of the two-dimensional
plane, this is
295
00:14:34,070 --> 00:14:40,030
the probability that (X,Y)
falls inside this strip.
296
00:14:40,030 --> 00:14:44,520
So to find that probability,
we need to calculate the
297
00:14:44,520 --> 00:14:48,530
probability that (X,Y) falls in
here, which is going to be
298
00:14:48,530 --> 00:14:55,780
the double integral over this
strip, of
299
00:14:55,780 --> 00:14:57,030
the joint density.
300
00:14:57,030 --> 00:15:05,080
301
00:15:05,080 --> 00:15:07,920
And what are we integrating
over?
302
00:15:07,920 --> 00:15:11,185
y goes from minus infinity
to plus infinity.
303
00:15:11,185 --> 00:15:15,680
304
00:15:15,680 --> 00:15:22,755
And the dummy variable x goes
from little x to x plus delta.
305
00:15:22,755 --> 00:15:27,240
306
00:15:27,240 --> 00:15:31,580
So to integrate over this strip,
what we do is for any
307
00:15:31,580 --> 00:15:34,810
given y, we integrate
in this dimension.
308
00:15:34,810 --> 00:15:36,770
This is the x integral.
309
00:15:36,770 --> 00:15:40,220
And then we integrate over
the y dimension.
310
00:15:40,220 --> 00:15:42,920
Now what is this
inner integral?
311
00:15:42,920 --> 00:15:50,250
Because x only varies very
little, the joint density is approximately
312
00:15:50,250 --> 00:15:53,040
constant in that range.
313
00:15:53,040 --> 00:15:56,210
So the integral with
respect to x just
314
00:15:56,210 --> 00:15:58,840
becomes delta times f(x,y).
315
00:15:58,840 --> 00:16:02,010
316
00:16:02,010 --> 00:16:03,490
And then we've got our dy.
317
00:16:03,490 --> 00:16:06,930
318
00:16:06,930 --> 00:16:11,760
So this is what the inner
integral will evaluate to.
319
00:16:11,760 --> 00:16:15,280
We are integrating over
the little interval.
320
00:16:15,280 --> 00:16:17,450
So we're keeping y fixed.
321
00:16:17,450 --> 00:16:22,020
Integrating over here, we take
the value of the density times
322
00:16:22,020 --> 00:16:24,940
how much we're integrating
over.
323
00:16:24,940 --> 00:16:27,890
And we get this formula.
324
00:16:27,890 --> 00:16:28,410
OK.
325
00:16:28,410 --> 00:16:33,170
Now, this expression must be
equal to that expression.
326
00:16:33,170 --> 00:16:40,060
So if we cancel the deltas, we
see that the marginal density
327
00:16:40,060 --> 00:16:44,000
must be equal to the integral of
the joint density, where we
328
00:16:44,000 --> 00:16:48,200
have integrated out
the value of y.
329
00:16:48,200 --> 00:16:54,060
330
00:16:54,060 --> 00:16:59,000
So this formula should come as
no surprise at this point.
331
00:16:59,000 --> 00:17:01,380
It's exactly the same as the
formula that we had for
332
00:17:01,380 --> 00:17:03,270
discrete random variables.
333
00:17:03,270 --> 00:17:06,800
But now we are replacing the
sum with an integral.
334
00:17:06,800 --> 00:17:14,690
And instead of using the
joint PMF, we are
335
00:17:14,690 --> 00:17:18,480
using the joint PDF.
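The marginal formula f_X(x) = ∫ f_{X,Y}(x, y) dy can be sketched numerically by integrating out y. Assuming again the illustrative density f(x, y) = x + y on the unit square (not from the lecture), the exact marginal is f_X(x) = x + 1/2 for 0 ≤ x ≤ 1. The helper name `marginal_x` is hypothetical.

```python
def joint_density(x: float, y: float) -> float:
    """Illustrative joint PDF (an assumption): f(x, y) = x + y on the unit square."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0


def marginal_x(f, x: float, n: int = 10_000) -> float:
    """Approximate f_X(x) = integral of f(x, y) dy, integrating y out over
    [0, 1] (where this f is nonzero) with a midpoint sum."""
    h = 1.0 / n
    return sum(f(x, (j + 0.5) * h) for j in range(n)) * h


for x in (0.0, 0.25, 0.5, 1.0):
    print(f"f_X({x}) ~ {marginal_x(joint_density, x):.6f} (exact {x + 0.5})")
```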
336
00:17:18,480 --> 00:17:21,810
Then, continuing going down the
list of things we did for
337
00:17:21,810 --> 00:17:24,839
discrete random variables, we
can now introduce a definition
338
00:17:24,839 --> 00:17:28,310
of the notion of independence
of two random variables.
339
00:17:28,310 --> 00:17:31,050
And by analogy with the discrete
case, we define
340
00:17:31,050 --> 00:17:33,940
independence to be the
following condition.
341
00:17:33,940 --> 00:17:37,210
Two random variables are
independent if and only if
342
00:17:37,210 --> 00:17:42,220
their joint density function
factors out as a product of
343
00:17:42,220 --> 00:17:44,390
their marginal densities.
344
00:17:44,390 --> 00:17:48,000
And this property needs to
be true for all x and y.
345
00:17:48,000 --> 00:17:49,890
So this is the formal
definition.
346
00:17:49,890 --> 00:17:53,020
Operationally and intuitively,
what does it mean?
347
00:17:53,020 --> 00:17:55,110
Well, intuitively it means
the same thing as in
348
00:17:55,110 --> 00:17:56,600
the discrete case.
349
00:17:56,600 --> 00:18:00,610
Knowing anything about X
shouldn't tell you anything
350
00:18:00,610 --> 00:18:05,320
about Y. That is, information
about X is not going to change
351
00:18:05,320 --> 00:18:10,120
your beliefs about Y. We are
going to come back to this
352
00:18:10,120 --> 00:18:11,370
statement in a second.
353
00:18:11,370 --> 00:18:14,320
354
00:18:14,320 --> 00:18:16,920
The other thing that it
allows you to do--
355
00:18:16,920 --> 00:18:20,750
I'm not going to derive this--
is to calculate
356
00:18:20,750 --> 00:18:25,650
probabilities by multiplying
individual probabilities.
357
00:18:25,650 --> 00:18:28,110
So if you ask for the
probability that X falls in a
358
00:18:28,110 --> 00:18:34,220
certain set A and Y falls in a
certain set B, then you can
359
00:18:34,220 --> 00:18:37,670
calculate that probability
by multiplying individual
360
00:18:37,670 --> 00:18:38,920
probabilities.
361
00:18:38,920 --> 00:18:41,860
362
00:18:41,860 --> 00:18:46,090
This takes just two lines of
derivation, which I'm not
363
00:18:46,090 --> 00:18:47,710
going to do.
364
00:18:47,710 --> 00:18:51,240
But it comes back to
the usual notion of
365
00:18:51,240 --> 00:18:53,370
independence of events.
366
00:18:53,370 --> 00:18:56,340
Basically, operationally
independence means that you
367
00:18:56,340 --> 00:18:57,660
can multiply probabilities.
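The factorization definition suggests a simple numerical spot check: compute both marginals and compare their product with the joint density at a grid of points. This hedged sketch uses two hypothetical densities on the unit square: x + y, whose marginals x + 1/2 and y + 1/2 do not multiply back to x + y, and 4xy = (2x)(2y), which does factor. Agreement at finitely many points is evidence, not a proof of independence.

```python
def marginals(f, n: int = 2_000):
    """Return approximations of f_X and f_Y, each obtained by integrating the
    other variable out over [0, 1] with a midpoint sum of n slices."""
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]

    def f_x(x: float) -> float:
        return sum(f(x, y) for y in mids) * h

    def f_y(y: float) -> float:
        return sum(f(x, y) for x in mids) * h

    return f_x, f_y


def factorizes(f, grid: int = 11, tol: float = 1e-6) -> bool:
    """Spot-check f(x, y) == f_X(x) * f_Y(y) on a grid of points in [0, 1]^2."""
    f_x, f_y = marginals(f)
    pts = [k / (grid - 1) for k in range(grid)]
    return all(abs(f(x, y) - f_x(x) * f_y(y)) <= tol for x in pts for y in pts)


def sum_density(x: float, y: float) -> float:
    """Hypothetical joint PDF x + y on the unit square."""
    return x + y


def product_density(x: float, y: float) -> float:
    """Hypothetical joint PDF 4xy = (2x)(2y) on the unit square."""
    return 4.0 * x * y


print(factorizes(sum_density), factorizes(product_density))  # False True
```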
368
00:18:57,660 --> 00:19:00,190
369
00:19:00,190 --> 00:19:04,380
So now let's look
at an example.
370
00:19:04,380 --> 00:19:08,150
It's a pretty famous
and classical one.
371
00:19:08,150 --> 00:19:12,540
It goes back a lot more
than 100 years.
372
00:19:12,540 --> 00:19:16,290
And it's the famous
Needle of Buffon.
373
00:19:16,290 --> 00:19:19,860
Buffon was a French naturalist
who, for some reason, also
374
00:19:19,860 --> 00:19:22,150
decided to play with
probability.
375
00:19:22,150 --> 00:19:24,590
And he looked at the following
problem.
376
00:19:24,590 --> 00:19:28,400
So you have the two-dimensional
plane.
377
00:19:28,400 --> 00:19:33,870
And on the plane we draw a
bunch of parallel lines.
378
00:19:33,870 --> 00:19:37,575
And those parallel lines are
separated by a fixed distance.
379
00:19:37,575 --> 00:19:46,830
380
00:19:46,830 --> 00:19:52,270
The lines are a distance
d apart.
381
00:19:52,270 --> 00:19:58,780
And we throw a needle at random,
completely at random.
382
00:19:58,780 --> 00:20:01,510
And we'll have to give a meaning
to what "completely at
383
00:20:01,510 --> 00:20:03,180
random" means.
384
00:20:03,180 --> 00:20:06,490
And when we throw a needle,
there are two possibilities.
385
00:20:06,490 --> 00:20:09,640
Either the needle is going to
fall in a way that does not
386
00:20:09,640 --> 00:20:13,120
intersect any of the lines, or
it's going to fall in a way
387
00:20:13,120 --> 00:20:15,700
that it intersects
one of the lines.
388
00:20:15,700 --> 00:20:19,470
We're taking the needle to be
shorter than this distance, so
389
00:20:19,470 --> 00:20:22,185
the needle cannot intersect
two lines simultaneously.
390
00:20:22,185 --> 00:20:26,230
It either intersects no line, or it
intersects exactly one of the lines.
391
00:20:26,230 --> 00:20:29,610
The question is to find the
probability that the needle is
392
00:20:29,610 --> 00:20:32,100
going to intersect a line.
393
00:20:32,100 --> 00:20:34,650
What's the probability
of this?
394
00:20:34,650 --> 00:20:35,010
OK.
395
00:20:35,010 --> 00:20:40,020
We are going to approach this
problem by using our standard
396
00:20:40,020 --> 00:20:42,110
four-step procedure.
397
00:20:42,110 --> 00:20:46,560
Set up your sample space,
describe a probability law on
398
00:20:46,560 --> 00:20:51,460
that sample space, identify
the event of interest, and
399
00:20:51,460 --> 00:20:53,370
then calculate.
400
00:20:53,370 --> 00:20:58,470
These four steps basically
correspond to these three
401
00:20:58,470 --> 00:21:04,110
bullets and then the last
equation down here.
402
00:21:04,110 --> 00:21:06,510
So first thing is to set
up a sample space.
403
00:21:06,510 --> 00:21:09,470
We need some variables to
describe what happened in the
404
00:21:09,470 --> 00:21:10,780
experiment.
405
00:21:10,780 --> 00:21:14,300
So what happens in the
experiment is that the needle
406
00:21:14,300 --> 00:21:16,500
lands somewhere.
407
00:21:16,500 --> 00:21:20,450
And where it lands, we can
describe this by specifying
408
00:21:20,450 --> 00:21:24,160
the location of the center
of the needle.
409
00:21:24,160 --> 00:21:27,020
And what do we mean by the
location of the center?
410
00:21:27,020 --> 00:21:30,310
Well, we can take as our
variable the distance
411
00:21:30,310 --> 00:21:33,035
from the center of the needle
to the nearest line.
412
00:21:33,035 --> 00:21:36,280
413
00:21:36,280 --> 00:21:42,520
So it tells us the vertical
distance of the center of the
414
00:21:42,520 --> 00:21:45,930
needle from the nearest line.
415
00:21:45,930 --> 00:21:47,500
The other thing that
matters is the
416
00:21:47,500 --> 00:21:49,400
orientation of the needle.
417
00:21:49,400 --> 00:21:53,820
So we need one more variable,
which we take to be the angle
418
00:21:53,820 --> 00:21:56,940
that the needle is forming
with the lines.
419
00:21:56,940 --> 00:22:00,260
We can put the angle here,
or we can put it there.
421
00:22:00,260 --> 00:22:02,620
Either way, it's the
same angle.
421
00:22:02,620 --> 00:22:06,850
So we have these two variables
that describe what happened
422
00:22:06,850 --> 00:22:08,190
in the experiment.
423
00:22:08,190 --> 00:22:11,280
And we can take our sample space
to be the set of all
424
00:22:11,280 --> 00:22:14,390
possible x's and theta's.
425
00:22:14,390 --> 00:22:16,770
What are the possible x's?
426
00:22:16,770 --> 00:22:20,800
The lines are d apart, so the
nearest line is going to be
427
00:22:20,800 --> 00:22:24,400
anywhere between
0 and d/2 away.
428
00:22:24,400 --> 00:22:28,630
So that tells us what the
possible x's will be.
429
00:22:28,630 --> 00:22:31,420
As for theta, it really
depends on how
430
00:22:31,420 --> 00:22:33,230
you define your angle.
431
00:22:33,230 --> 00:22:37,510
We are going to define our theta
to be the acute angle
432
00:22:37,510 --> 00:22:44,020
that's formed between the needle
and a line, if you were
433
00:22:44,020 --> 00:22:45,130
to extend it.
434
00:22:45,130 --> 00:22:50,180
So theta is going to be
something between 0 and pi/2.
435
00:22:50,180 --> 00:22:54,140
So I guess these red pieces
really correspond to the part
436
00:22:54,140 --> 00:22:58,490
of setting up the
sample space.
437
00:22:58,490 --> 00:22:58,810
OK.
438
00:22:58,810 --> 00:23:00,270
So that's part one.
439
00:23:00,270 --> 00:23:03,390
Second part is we
need a model.
440
00:23:03,390 --> 00:23:03,690
OK.
441
00:23:03,690 --> 00:23:08,140
Let's take our model to be that
we basically know nothing
442
00:23:08,140 --> 00:23:10,600
about how the needle falls.
443
00:23:10,600 --> 00:23:13,890
It can fall in any possible way,
and all possible ways are
444
00:23:13,890 --> 00:23:15,230
equally likely.
445
00:23:15,230 --> 00:23:18,910
Now, if you have those parallel
lines, and you close
446
00:23:18,910 --> 00:23:22,330
your eyes completely and throw a
needle completely at random,
447
00:23:22,330 --> 00:23:25,260
any x should be equally
likely.
448
00:23:25,260 --> 00:23:29,490
So we describe that situation by
saying that X should have a
449
00:23:29,490 --> 00:23:31,360
uniform distribution.
450
00:23:31,360 --> 00:23:33,880
That is, it should have a
constant density over the
451
00:23:33,880 --> 00:23:35,410
range of interest.
452
00:23:35,410 --> 00:23:39,160
Similarly, if you kind of spin
your needle completely at
453
00:23:39,160 --> 00:23:43,580
random, any angle should be as
likely as any other angle.
454
00:23:43,580 --> 00:23:47,160
And we decide to model this
situation by saying that theta
455
00:23:47,160 --> 00:23:49,680
also has a uniform
distribution over
456
00:23:49,680 --> 00:23:50,995
the range of interest.
457
00:23:50,995 --> 00:23:54,220
458
00:23:54,220 --> 00:23:58,500
And finally, where we put it
should have nothing to do with
459
00:23:58,500 --> 00:24:00,370
how much we rotate it.
460
00:24:00,370 --> 00:24:04,320
And we capture this
mathematically by saying that
461
00:24:04,320 --> 00:24:07,480
X is going to be independent
of theta.
462
00:24:07,480 --> 00:24:09,220
Now, this is going
to be our model.
463
00:24:09,220 --> 00:24:11,920
I'm not deriving the model
from anything.
464
00:24:11,920 --> 00:24:15,480
I'm only saying that this sounds
like a model that does
465
00:24:15,480 --> 00:24:19,800
not assume any knowledge or
preference for certain values
466
00:24:19,800 --> 00:24:22,360
of x or theta over
other values.
467
00:24:22,360 --> 00:24:25,660
In the absence of any other
particular information you
468
00:24:25,660 --> 00:24:28,420
might have in your hands, that's
the most reasonable
469
00:24:28,420 --> 00:24:30,520
model to come up with.
470
00:24:30,520 --> 00:24:32,150
So you model the problem
that way.
471
00:24:32,150 --> 00:24:35,490
So what's the formula for
the joint density?
472
00:24:35,490 --> 00:24:37,590
It's going to be the
product of the
473
00:24:37,590 --> 00:24:41,200
densities of X and Theta.
474
00:24:41,200 --> 00:24:42,410
Why is it the product?
475
00:24:42,410 --> 00:24:45,530
This is because we assumed
independence.
476
00:24:45,530 --> 00:24:48,910
And the density of X, since
it's uniform, and since it
477
00:24:48,910 --> 00:24:54,630
needs to integrate to 1, that
density needs to be 2/d.
478
00:24:54,630 --> 00:24:57,580
That's the density of X.
And the density of
479
00:24:57,580 --> 00:25:00,740
Theta needs to be 2/pi.
480
00:25:00,740 --> 00:25:03,660
That's the value for the density
of Theta so that the
481
00:25:03,660 --> 00:25:07,920
overall probability over this
interval ends up being 1.
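Written out in standard notation (a sketch, not a copy of the slide), the model is below. Each marginal density is the constant that makes it integrate to 1 over its range, and independence makes the joint density their product.

```latex
f_X(x) = \frac{2}{d},\ \ 0 \le x \le \frac{d}{2};
\qquad
f_\Theta(\theta) = \frac{2}{\pi},\ \ 0 \le \theta \le \frac{\pi}{2};
\qquad
f_{X,\Theta}(x,\theta) = f_X(x)\,f_\Theta(\theta) = \frac{4}{\pi d}.
```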
482
00:25:07,920 --> 00:25:12,390
So now we do have our joint
density in our hands.
483
00:25:12,390 --> 00:25:14,690
The next thing to do
is to identify
484
00:25:14,690 --> 00:25:17,920
the event of interest.
485
00:25:17,920 --> 00:25:20,720
And this is best done
in a picture.
486
00:25:20,720 --> 00:25:23,380
And there are two possible
situations
487
00:25:23,380 --> 00:25:25,450
that one could have.
488
00:25:25,450 --> 00:25:33,450
Either the needle falls this
way, or it falls this way.
489
00:25:33,450 --> 00:25:38,300
So how can we tell if one or the
other is going to happen?
490
00:25:38,300 --> 00:25:45,470
It has to do with whether this
interval here is smaller than
491
00:25:45,470 --> 00:25:50,130
that or bigger than that.
492
00:25:50,130 --> 00:25:52,260
So we are comparing
the height of this
493
00:25:52,260 --> 00:25:55,460
interval to that interval.
494
00:25:55,460 --> 00:25:58,220
This interval here
is capital X.
495
00:25:58,220 --> 00:26:02,350
This interval here,
what is it?
496
00:26:02,350 --> 00:26:07,040
This is half of the length of
the needle, which is l/2.
497
00:26:07,040 --> 00:26:10,590
To find this height, we take l/2
and multiply it with the
498
00:26:10,590 --> 00:26:13,700
sine of the angle
that we have.
499
00:26:13,700 --> 00:26:18,330
So the length of this
interval up here is
500
00:26:18,330 --> 00:26:23,500
l/2 times sine theta.
501
00:26:23,500 --> 00:26:28,520
If this is smaller than
x, the needle does not
502
00:26:28,520 --> 00:26:30,010
intersect the line.
503
00:26:30,010 --> 00:26:33,130
If this is bigger than
x, then the needle
504
00:26:33,130 --> 00:26:34,920
intersects the line.
505
00:26:34,920 --> 00:26:37,870
So the event of interest, that
the needle intersects the
506
00:26:37,870 --> 00:26:42,740
line, is described this way
in terms of x and theta.
507
00:26:42,740 --> 00:26:46,170
And now that we have the event
of interest described
508
00:26:46,170 --> 00:26:50,100
mathematically, all that we
need to do to find the
509
00:26:50,100 --> 00:26:54,800
probability of this event is to
integrate the joint density
510
00:26:54,800 --> 00:26:59,560
over the part of (x, theta)
space in which this
511
00:26:59,560 --> 00:27:01,320
inequality is true.
512
00:27:01,320 --> 00:27:04,670
So it's a double integral over
the set of all x's and theta's
513
00:27:04,670 --> 00:27:06,450
where this is true.
514
00:27:06,450 --> 00:27:11,430
The way to do this integral is
we fix theta, and we integrate
515
00:27:11,430 --> 00:27:15,150
for x's that go from 0
up to that number.
516
00:27:15,150 --> 00:27:19,030
And theta can be anything
between 0 and pi/2.
517
00:27:19,030 --> 00:27:23,620
So the integral over this set
is basically this double
518
00:27:23,620 --> 00:27:24,980
integral here.
519
00:27:24,980 --> 00:27:27,475
We already have a formula
for the joint density.
520
00:27:27,475 --> 00:27:30,930
It's 4 over pi d, so
we put it here.
521
00:27:30,930 --> 00:27:32,640
And now, fortunately,
this is a pretty
522
00:27:32,640 --> 00:27:34,645
easy integral to evaluate.
523
00:27:34,645 --> 00:27:37,650
The integral with respect to x
-- there's nothing in here.
524
00:27:37,650 --> 00:27:40,950
So the integral is just the
length of the interval over
525
00:27:40,950 --> 00:27:42,370
which we're integrating.
526
00:27:42,370 --> 00:27:44,950
It's l/2 sine theta.
527
00:27:44,950 --> 00:27:47,870
And then we need to integrate
this with respect to theta.
528
00:27:47,870 --> 00:27:53,990
We know that the integral of a
sine is a negative cosine.
529
00:27:53,990 --> 00:27:56,990
You plug in the values for
the negative cosine
530
00:27:56,990 --> 00:27:58,390
at the two end points.
531
00:27:58,390 --> 00:28:00,260
I'm sure you can do
this integral.
532
00:28:00,260 --> 00:28:04,540
And we finally obtain the
answer, which is amazingly
533
00:28:04,540 --> 00:28:08,210
simple for such a
complicated-looking problem.
534
00:28:08,210 --> 00:28:09,910
It's 2l over pi d.
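The whole calculation can be written as one chain. This sketch uses standard notation and assumes the needle is no longer than the line spacing (l ≤ d), so the upper limit (l/2) sin θ never exceeds d/2:

```latex
P\left(X \le \tfrac{l}{2}\sin\Theta\right)
= \int_0^{\pi/2}\!\int_0^{(l/2)\sin\theta} \frac{4}{\pi d}\,dx\,d\theta
= \frac{4}{\pi d}\int_0^{\pi/2} \frac{l}{2}\sin\theta\,d\theta
= \frac{2l}{\pi d}\Big[-\cos\theta\Big]_0^{\pi/2}
= \frac{2l}{\pi d}.
```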
535
00:28:09,910 --> 00:28:12,420
536
00:28:12,420 --> 00:28:15,360
So some people a long, long time
ago, after they looked at
537
00:28:15,360 --> 00:28:19,290
this answer, they said that
maybe that gives us an
538
00:28:19,290 --> 00:28:22,910
interesting way in which one could
estimate the value of
539
00:28:22,910 --> 00:28:26,130
pi, for example,
experimentally.
540
00:28:26,130 --> 00:28:27,690
How do you do that?
541
00:28:27,690 --> 00:28:32,360
Fix l and d, the dimensions
of the problem.
542
00:28:32,360 --> 00:28:36,680
Throw a million needles on
your piece of paper.
543
00:28:36,680 --> 00:28:40,690
See how often your needles
do intersect the line.
544
00:28:40,690 --> 00:28:43,540
That gives you a number
for this quantity.
545
00:28:43,540 --> 00:28:48,540
You know l and d, so you can
use that to infer pi.
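Here is a minimal Python sketch that simulates the experiment instead of throwing needles by hand. It samples X and Theta exactly as in the model (uniform and independent) and counts how often X ≤ (l/2) sin Theta. The function names are hypothetical. There is one circularity to be aware of: drawing Theta uniformly on [0, π/2] already uses math.pi. So this checks the formula 2l/(πd) rather than discovering π from scratch.

```python
import math
import random


def buffon_hit_fraction(n_throws, needle_len, spacing, seed=0):
    """Simulate needle throws and return the fraction that cross a line.

    Model: X, the distance from the needle's center to the nearest line,
    is uniform on [0, d/2]; Theta, the acute angle with the lines, is
    uniform on [0, pi/2]; X and Theta are independent.
    A throw crosses a line exactly when X <= (l/2) * sin(Theta).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        x = rng.uniform(0.0, spacing / 2)
        # Sampling Theta needs math.pi, so this only checks the formula.
        theta = rng.uniform(0.0, math.pi / 2)
        if x <= (needle_len / 2) * math.sin(theta):
            hits += 1
    return hits / n_throws


def estimate_pi(hit_fraction, needle_len, spacing):
    """Invert p = 2l / (pi d) to get pi = 2l / (p d)."""
    return 2 * needle_len / (hit_fraction * spacing)


p_hat = buffon_hit_fraction(200_000, needle_len=1.0, spacing=2.0)
print(p_hat, estimate_pi(p_hat, needle_len=1.0, spacing=2.0))
```

With d = 2l, the printed hit fraction should be near 1/π ≈ 0.318.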
546
00:28:48,540 --> 00:28:52,330
And there's an apocryphal story
about a wounded soldier
547
00:28:52,330 --> 00:28:55,300
in a hospital after the
American Civil War who
548
00:28:55,300 --> 00:28:58,490
actually had heard about this
and was spending his time in
549
00:28:58,490 --> 00:29:02,680
the hospital throwing needles
on pieces of paper.
550
00:29:02,680 --> 00:29:04,350
I don't know if it's
true or not.
551
00:29:04,350 --> 00:29:07,330
But let's do something
similar here.
552
00:29:07,330 --> 00:29:11,720
So let's look at this diagram.
553
00:29:11,720 --> 00:29:14,110
We fix the dimensions.
554
00:29:14,110 --> 00:29:15,920
This is supposed to
be our little d.
555
00:29:15,920 --> 00:29:18,330
That's supposed to
be our little l.
556
00:29:18,330 --> 00:29:22,430
We have the formula from the
previous slide that p
557
00:29:22,430 --> 00:29:25,230
is 2l over pi d.
558
00:29:25,230 --> 00:29:29,230
In this instance, we choose
d to be twice l.
559
00:29:29,230 --> 00:29:32,170
So this number is 1/pi.
560
00:29:32,170 --> 00:29:37,770
So the probability that the
needle hits the line is 1/pi.
561
00:29:37,770 --> 00:29:41,150
So I need needles that are
3.1 centimeters long.
562
00:29:41,150 --> 00:29:42,730
I couldn't find such needles.
563
00:29:42,730 --> 00:29:47,360
But I could find paper clips
that are 3.1 centimeters long.
564
00:29:47,360 --> 00:29:51,510
So let's start throwing paper
clips at random and see how
565
00:29:51,510 --> 00:29:55,285
many of them will end up
intersecting the lines.
566
00:29:55,285 --> 00:30:00,501
567
00:30:00,501 --> 00:30:01,920
Good.
568
00:30:01,920 --> 00:30:02,400
OK.
569
00:30:02,400 --> 00:30:09,350
So out of eight paper clips,
we have exactly four that
570
00:30:09,350 --> 00:30:11,510
intersected the line.
571
00:30:11,510 --> 00:30:13,620
So our estimate for the
probability of intersecting
572
00:30:13,620 --> 00:30:18,970
the line is 1/2, which gives us
an estimate for the value
573
00:30:18,970 --> 00:30:22,010
of pi, which is two.
574
00:30:22,010 --> 00:30:24,960
Well, I mean, within an
engineering approximation,
575
00:30:24,960 --> 00:30:29,090
we're in the right
ballpark, right?
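The arithmetic behind this estimate fits in a small Python sketch (the function name is hypothetical). It replaces p in p = 2l/(πd) with the observed hit fraction and solves for π. With l = 3.1 cm and d = 2l = 6.2 cm, four hits in eight throws give exactly 2.

```python
def pi_from_counts(hits, throws, needle_len, spacing):
    """Estimate pi from a needle-throwing experiment.

    Uses p = 2l / (pi d), with the hit probability p replaced by the
    observed fraction hits / throws.
    """
    if hits == 0:
        raise ValueError("need at least one hit to invert the formula")
    p_hat = hits / throws
    return 2 * needle_len / (p_hat * spacing)


# Lecture demo: 4 of 8 paper clips hit a line, with d = 2l and l = 3.1 cm.
print(pi_from_counts(4, 8, needle_len=3.1, spacing=6.2))  # prints 2.0
```

With only n = 8 throws, the standard error of the hit fraction is about √(p(1 − p)/n) ≈ 0.16 when p = 1/π ≈ 0.32. An observed 1/2 is therefore within the expected scatter, and many more throws are needed for a useful estimate.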
576
00:30:29,090 --> 00:30:32,890
So this might look like a
silly way of trying to
577
00:30:32,890 --> 00:30:33,920
estimate pi.
578
00:30:33,920 --> 00:30:36,420
And it probably is.
579
00:30:36,420 --> 00:30:41,200
On the other hand, this kind of
methodology is being used
580
00:30:41,200 --> 00:30:44,930
especially by physicists and
also by statisticians.
581
00:30:44,930 --> 00:30:46,550
It's used a lot.
582
00:30:46,550 --> 00:30:48,260
When is it used?
583
00:30:48,260 --> 00:30:52,300
If you have an integral to
calculate, such as this
584
00:30:52,300 --> 00:30:55,980
integral, but you're not lucky,
and your functions are
585
00:30:55,980 --> 00:30:59,980
not so simple where you can do
your calculations by hand, and
586
00:30:59,980 --> 00:31:02,590
maybe the dimensions are
larger-- instead of two random
587
00:31:02,590 --> 00:31:04,590
variables you have 100
random variables, so
588
00:31:04,590 --> 00:31:08,210
it's a 100-fold integral--
589
00:31:08,210 --> 00:31:10,830
then there's no practical way to do
that on a computer with standard numerical integration.
590
00:31:10,830 --> 00:31:14,230
But the way that you can
actually do it is by
591
00:31:14,230 --> 00:31:18,290
generating random samples of
your random variables, doing
592
00:31:18,290 --> 00:31:21,220
that simulation over and
over many times.
593
00:31:21,220 --> 00:31:25,010
That is, by interpreting an
integral as a probability, you
594
00:31:25,010 --> 00:31:29,060
can use simulation to estimate
that probability.
595
00:31:29,060 --> 00:31:32,470
And that gives you a way of
calculating integrals.
596
00:31:32,470 --> 00:31:36,850
And physicists do actually use
that a lot, as well as
597
00:31:36,850 --> 00:31:39,630
statisticians, computer
scientists, and so on.
598
00:31:39,630 --> 00:31:41,760
It's a so-called Monte
Carlo method
599
00:31:41,760 --> 00:31:43,990
for evaluating integrals.
600
00:31:43,990 --> 00:31:50,250
And it's a basic piece of the
toolbox in science these days.
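Below is a minimal Python sketch of the Monte Carlo idea. It assumes the integration region is the unit cube [0, 1]^dim, a hypothetical setup chosen so the exact answer is known. The integral equals an expectation over uniform random points, so averaging the integrand over random samples estimates it. The error shrinks like 1/√n, a rate that does not depend on the dimension.

```python
import random


def mc_integrate_unit_cube(f, dim, n_samples, seed=0):
    """Monte Carlo estimate of the integral of f over the unit cube [0, 1]^dim.

    The integral equals E[f(U)] for U uniform on the cube, so the sample mean
    of f over independent uniform points estimates it. Also returns the
    estimated standard error of that mean.
    """
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n_samples):
        u = [rng.random() for _ in range(dim)]
        v = f(u)
        total += v
        total_sq += v * v
    mean = total / n_samples
    var = max(total_sq / n_samples - mean * mean, 0.0)
    return mean, (var / n_samples) ** 0.5


# A 100-fold integral with a known answer: the integral of sum(u_i^2) is 100/3.
est, se = mc_integrate_unit_cube(lambda u: sum(x * x for x in u), dim=100, n_samples=10_000)
print(est, se, 100 / 3)
```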
601
00:31:50,250 --> 00:31:54,610
Finally, the harder concept
of the day is the idea of
602
00:31:54,610 --> 00:31:55,770
conditioning.
603
00:31:55,770 --> 00:31:58,740
And here things become a little
subtle when you deal
604
00:31:58,740 --> 00:32:00,970
with continuous random
variables.
605
00:32:00,970 --> 00:32:02,290
OK.
606
00:32:02,290 --> 00:32:05,810
First, remember again our basic
interpretation of what a
607
00:32:05,810 --> 00:32:06,860
density is.
608
00:32:06,860 --> 00:32:08,200
A density gives us
609
00:32:08,200 --> 00:32:10,500
probabilities of little intervals.
610
00:32:10,500 --> 00:32:13,560
So how should we define
conditional densities?
611
00:32:13,560 --> 00:32:16,600
Conditional densities should
again give us probabilities of
612
00:32:16,600 --> 00:32:21,290
little intervals, but inside a
conditional world where we
613
00:32:21,290 --> 00:32:24,530
have been told something about
the other random variable.
614
00:32:24,530 --> 00:32:28,090
So what we would like to be
true is the following.
615
00:32:28,090 --> 00:32:31,340
We would like to define a
concept of a conditional
616
00:32:31,340 --> 00:32:34,530
density of a random variable X
given the value of another
617
00:32:34,530 --> 00:32:37,860
random variable Y. And it should
behave the following
618
00:32:37,860 --> 00:32:40,570
way, that the conditional
density gives us the
619
00:32:40,570 --> 00:32:42,690
probability of little
intervals--
620
00:32:42,690 --> 00:32:44,260
same as here--
621
00:32:44,260 --> 00:32:48,440
given that we are told
the value of y.
622
00:32:48,440 --> 00:32:50,930
And here's where the
subtleties come.
623
00:32:50,930 --> 00:32:54,420
The main thing to notice is
that here I didn't write
624
00:32:54,420 --> 00:32:59,000
"equal," I wrote "approximately
equal." Why do
625
00:32:59,000 --> 00:33:01,250
we need that?
626
00:33:01,250 --> 00:33:04,460
Well, the thing is that
conditional probabilities are
627
00:33:04,460 --> 00:33:08,840
not defined when you condition
on an event that has 0
628
00:33:08,840 --> 00:33:10,180
probability.
629
00:33:10,180 --> 00:33:13,400
So we need the conditioning
event here to have positive
630
00:33:13,400 --> 00:33:14,430
probability.
631
00:33:14,430 --> 00:33:18,840
So instead of saying that Y is
exactly equal to little y, we
632
00:33:18,840 --> 00:33:22,900
want to instead say we're in a
new universe where capital Y
633
00:33:22,900 --> 00:33:27,070
is very close to little y.
634
00:33:27,070 --> 00:33:31,410
And then we let this notion of
"very close" shrink, taking the limit
635
00:33:31,410 --> 00:33:34,910
so that it becomes
infinitesimally close.
636
00:33:34,910 --> 00:33:38,610
So this is the way to interpret
conditional
637
00:33:38,610 --> 00:33:40,120
probabilities.
638
00:33:40,120 --> 00:33:42,550
That's what they should mean.
639
00:33:42,550 --> 00:33:45,330
Now, in practice, when you
actually use probability, you
640
00:33:45,330 --> 00:33:46,780
forget about that subtlety.
641
00:33:46,780 --> 00:33:50,940
And you say, well, I've been
told that Y is equal to 1.3.
642
00:33:50,940 --> 00:33:53,780
Give me the conditional
distribution of X. But
643
00:33:53,780 --> 00:33:58,080
formally or rigorously, you
should say I'm being told that
644
00:33:58,080 --> 00:34:01,400
Y is infinitesimally
close to 1.3.
645
00:34:01,400 --> 00:34:03,620
Tell me the distribution of X.
646
00:34:03,620 --> 00:34:08,580
Now, if this is what we want,
what should this quantity be?
647
00:34:08,580 --> 00:34:10,489
It's a conditional probability,
so it should be
648
00:34:10,489 --> 00:34:12,800
the probability of two
things happening--
649
00:34:12,800 --> 00:34:16,550
X being close to little x, Y
being close to little y.
650
00:34:16,550 --> 00:34:20,010
And that's basically given to
us by the joint density
651
00:34:20,010 --> 00:34:23,920
divided by the probability of
the conditioning event, which
652
00:34:23,920 --> 00:34:27,449
has something to do with the
density of Y itself.
653
00:34:27,449 --> 00:34:30,840
And if you do things carefully,
you see that the
654
00:34:30,840 --> 00:34:34,350
only way to satisfy this
relation is to define the
655
00:34:34,350 --> 00:34:38,065
conditional density by this
particular formula.
656
00:34:38,065 --> 00:34:38,590
OK.
657
00:34:38,590 --> 00:34:44,159
A big discussion that comes down, in
the end, to what you should
658
00:34:44,159 --> 00:34:46,120
have probably guessed by now.
659
00:34:46,120 --> 00:34:49,170
We just take any formulas and
expressions from the discrete
660
00:34:49,170 --> 00:34:53,570
case and replace PMFs by PDFs.
661
00:34:53,570 --> 00:34:58,030
So the conditional PDF is
defined by this formula where
662
00:34:58,030 --> 00:35:02,450
here we have joint PDF and
marginal PDF, as opposed to
663
00:35:02,450 --> 00:35:05,450
the discrete case where we
had the joint PMF and
664
00:35:05,450 --> 00:35:07,540
the marginal PMF.
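The transcript does not reproduce the slide's formula, so here is the standard textbook form. It shows the definition and the little-interval property it is designed to satisfy, next to the discrete counterpart:

```latex
f_{X\mid Y}(x \mid y) = \frac{f_{X,Y}(x,y)}{f_Y(y)} \quad (f_Y(y) > 0),
\qquad
p_{X\mid Y}(x \mid y) = \frac{p_{X,Y}(x,y)}{p_Y(y)},
\\[4pt]
P(x \le X \le x+\delta \mid y \le Y \le y+\epsilon)
\approx \frac{f_{X,Y}(x,y)\,\delta\,\epsilon}{f_Y(y)\,\epsilon}
= f_{X\mid Y}(x \mid y)\,\delta .
```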
665
00:35:07,540 --> 00:35:11,850
So in some sense, it's just
a syntactic change.
666
00:35:11,850 --> 00:35:14,510
In another sense, it's a little
subtler in how you
667
00:35:14,510 --> 00:35:17,130
actually interpret it.
668
00:35:17,130 --> 00:35:20,230
Speaking about interpretation,
what are some ways of thinking
669
00:35:20,230 --> 00:35:22,170
about the conditional density?
670
00:35:22,170 --> 00:35:24,740
Well, the best way to think
about it is that somebody has
671
00:35:24,740 --> 00:35:27,720
fixed little y for you.
672
00:35:27,720 --> 00:35:31,980
So little y is being
fixed here.
673
00:35:31,980 --> 00:35:35,350
And we look at this density
as a function of x.
674
00:35:35,350 --> 00:35:37,020
I've told you what Y is.
675
00:35:37,020 --> 00:35:39,870
Tell me what you know about X.
And you tell me that X has a
676
00:35:39,870 --> 00:35:42,070
certain distribution.
677
00:35:42,070 --> 00:35:44,840
What does that distribution
look like?
678
00:35:44,840 --> 00:35:50,070
It has exactly the same shape
as the joint density.
679
00:35:50,070 --> 00:35:53,390
Remember, we fixed Y. So
this is a constant.
680
00:35:53,390 --> 00:35:57,200
So the only thing that varies
is X. So we get the function
681
00:35:57,200 --> 00:36:01,320
that behaves like the joint
density when you fix y, which
682
00:36:01,320 --> 00:36:04,100
is really you take the joint
density, and you
683
00:36:04,100 --> 00:36:05,650
take a slice of it.
684
00:36:05,650 --> 00:36:09,200
You fix a y, and you see
how it varies with x.
685
00:36:09,200 --> 00:36:11,810
So in that sense, the
conditional PDF is just a
686
00:36:11,810 --> 00:36:14,150
slice of the joint PDF.
687
00:36:14,150 --> 00:36:17,230
But we need to divide by a
certain number, which just
688
00:36:17,230 --> 00:36:19,480
scales it but does not change
its shape.
689
00:36:19,480 --> 00:36:21,950
We're coming back to a
picture in a second.
690
00:36:21,950 --> 00:36:25,410
But before going to the picture,
let's go back to the
691
00:36:25,410 --> 00:36:27,840
interpretation of
independence.
692
00:36:27,840 --> 00:36:30,230
If the two random variables
are independent,
693
00:36:30,230 --> 00:36:33,550
according to our definition in
the previous slide, the joint
694
00:36:33,550 --> 00:36:36,130
density is going to factor
as the product of
695
00:36:36,130 --> 00:36:37,820
the marginal densities.
696
00:36:37,820 --> 00:36:40,850
The density of Y in the
numerator cancels the density
697
00:36:40,850 --> 00:36:42,010
in the denominator.
698
00:36:42,010 --> 00:36:44,410
And we're just left with
the density of X.
699
00:36:44,410 --> 00:36:46,940
So in the case of independence,
what we get is
700
00:36:46,940 --> 00:36:49,870
that the conditional is the
same as the marginal.
701
00:36:49,870 --> 00:36:52,980
And that solidifies our
intuition that in the case of
702
00:36:52,980 --> 00:36:58,080
independence, being told
something about the value of Y
703
00:36:58,080 --> 00:37:02,540
does not change our beliefs
about how X is distributed.
704
00:37:02,540 --> 00:37:06,110
So whatever we expected about X
is going to remain true even
705
00:37:06,110 --> 00:37:09,180
after we are told something
about Y.
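This cancellation can be checked numerically in Python with a hypothetical pair of independent random variables: X with density 2x on [0, 1], and Y uniform on [0, 1]. The conditional density of X, computed as joint over marginal, comes out equal to f_X for every conditioning value y.

```python
def f_X(x):
    # Hypothetical marginal of X: density 2x on [0, 1]
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def f_Y(y):
    # Hypothetical marginal of Y: uniform on [0, 1]
    return 1.0 if 0.0 <= y <= 1.0 else 0.0


def f_joint(x, y):
    # Independence: the joint density is the product of the marginals
    return f_X(x) * f_Y(y)


def marginal_y(y, n=1000):
    # Recover f_Y(y) from the joint by integrating out x (midpoint rule on [0, 1])
    h = 1.0 / n
    return sum(f_joint((i + 0.5) * h, y) for i in range(n)) * h


def conditional_x_given_y(x, y):
    # Conditional PDF = joint PDF / marginal PDF of the conditioning variable
    return f_joint(x, y) / marginal_y(y)


for y in (0.1, 0.5, 0.9):
    print(y, conditional_x_given_y(0.3, y), f_X(0.3))
```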
706
00:37:09,180 --> 00:37:12,680
So let's look at
some pictures.
707
00:37:12,680 --> 00:37:16,110
Here is what the joint
PDF might look like.
708
00:37:16,110 --> 00:37:19,480
Here we've got our
x and y-axis.
709
00:37:19,480 --> 00:37:23,100
And if you want to calculate the
probability of a certain
710
00:37:23,100 --> 00:37:27,240
event, what you do is you look
at that event and you see how
711
00:37:27,240 --> 00:37:31,740
much of that mass is sitting
on top of that event.
712
00:37:31,740 --> 00:37:35,180
Now let's start slicing.
713
00:37:35,180 --> 00:37:43,360
Let's fix a value of x and look
along that slice where we
714
00:37:43,360 --> 00:37:48,610
obtain this function.
715
00:37:48,610 --> 00:37:52,280
Now what does that slice do?
716
00:37:52,280 --> 00:37:56,100
That slice tells us for that
particular x what the possible
717
00:37:56,100 --> 00:38:00,330
values of y are going to be
and how likely they are.
718
00:38:00,330 --> 00:38:05,440
If we integrate over all
y's, what do we get?
719
00:38:05,440 --> 00:38:10,400
Integrating over all y's just
gives us the marginal density
720
00:38:10,400 --> 00:38:15,270
of X. It's the calculation
that we did here.
721
00:38:15,270 --> 00:38:19,820
By integrating over all y's, we
find the marginal density
722
00:38:19,820 --> 00:38:27,850
of X. So the total area under
that slice gives us the
723
00:38:27,850 --> 00:38:31,340
marginal density of X. And by
looking at the different
724
00:38:31,340 --> 00:38:35,430
slices, we find how likely the
different values of x are
725
00:38:35,430 --> 00:38:36,660
going to be.
726
00:38:36,660 --> 00:38:39,410
How about the conditional?
727
00:38:39,410 --> 00:38:48,790
If we're interested in the
conditional of Y given X, how
728
00:38:48,790 --> 00:38:51,200
would you think about it?
729
00:38:51,200 --> 00:38:54,620
This refers to a universe where
we are told that capital
730
00:38:54,620 --> 00:38:57,550
X takes on a specific value.
731
00:38:57,550 --> 00:39:00,010
So we put ourselves in
the universe where
732
00:39:00,010 --> 00:39:01,810
this line has happened.
733
00:39:01,810 --> 00:39:05,940
There's still possible values
of y that can happen.
734
00:39:05,940 --> 00:39:09,270
And this shape kind of tells us
the relative likelihoods of
735
00:39:09,270 --> 00:39:10,760
the different y's.
736
00:39:10,760 --> 00:39:14,060
And this is indeed going to be
the shape of the conditional
737
00:39:14,060 --> 00:39:17,850
distribution of Y given
that X takes that value.
738
00:39:17,850 --> 00:39:21,090
On the other hand, the
conditional distribution must
739
00:39:21,090 --> 00:39:22,630
add up to 1.
740
00:39:22,630 --> 00:39:25,920
So the total probability over
all of the different y's in
741
00:39:25,920 --> 00:39:27,730
this universe, that
total probability
742
00:39:27,730 --> 00:39:29,540
should be equal to 1.
743
00:39:29,540 --> 00:39:31,450
Here it's not equal to 1.
744
00:39:31,450 --> 00:39:34,290
The total area is the
marginal density.
745
00:39:34,290 --> 00:39:38,590
To make it equal to 1, we need
to divide by the marginal
746
00:39:38,590 --> 00:39:44,160
density, which is basically to
renormalize this shape so that
747
00:39:44,160 --> 00:39:48,500
the total area under that slice,
under that shape, is
748
00:39:48,500 --> 00:39:50,400
equal to 1.
749
00:39:50,400 --> 00:39:53,430
So we start with the joint.
750
00:39:53,430 --> 00:39:55,730
We take the slices.
751
00:39:55,730 --> 00:40:00,280
And then we adjust the slices
so that every slice has an
752
00:40:00,280 --> 00:40:03,610
area underneath equal to 1.
753
00:40:03,610 --> 00:40:05,650
And this gives us
the conditional.
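Here is a numerical Python sketch of the slice-and-renormalize recipe. The joint density f(x, y) = x + y on the unit square is a hypothetical example, not the one in the lecture's plot. It integrates to 1, and its slices have different shapes, so the conditional changes from slice to slice.

```python
def joint(x, y):
    # Hypothetical joint PDF on the unit square: f(x, y) = x + y
    if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0:
        return x + y
    return 0.0


def conditional_slice(x, n=1000):
    """Slice the joint PDF at a fixed x and renormalize it.

    Returns the grid of y values, the conditional density f_{Y|X}(y | x)
    on that grid, and the area under the raw slice (the marginal f_X(x)).
    """
    h = 1.0 / n
    ys = [(j + 0.5) * h for j in range(n)]   # midpoints of the y-grid
    raw = [joint(x, y) for y in ys]          # the slice of the joint PDF
    area = sum(raw) * h                      # area under the slice = f_X(x)
    return ys, [v / area for v in raw], area


ys, cond, area = conditional_slice(0.2)
print(area)                  # about 0.7, i.e. f_X(0.2) = 0.2 + 1/2
print(sum(cond) / len(ys))   # about 1.0: the renormalized slice has area 1
```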
754
00:40:05,650 --> 00:40:09,160
So for example, down here--
755
00:40:09,160 --> 00:40:11,840
you cannot even see it
in this diagram--
756
00:40:11,840 --> 00:40:15,410
but after you renormalize it
so that its total area is
757
00:40:15,410 --> 00:40:20,160
equal to 1, you get this sort of
narrow spike that goes up.
758
00:40:20,160 --> 00:40:22,980
And so this is a plot of the
conditional distributions that
759
00:40:22,980 --> 00:40:26,060
you get for the different
values of x.
760
00:40:26,060 --> 00:40:29,050
Given a particular value of x,
you're going to get this
761
00:40:29,050 --> 00:40:31,460
certain conditional
distribution.
762
00:40:31,460 --> 00:40:36,460
So this picture is worth about
as much as anything else in
763
00:40:36,460 --> 00:40:38,840
this particular chapter.
764
00:40:38,840 --> 00:40:42,990
Make sure you kind of understand
exactly all these
765
00:40:42,990 --> 00:40:44,240
pieces of the picture.
766
00:40:44,240 --> 00:40:47,130
767
00:40:47,130 --> 00:40:49,870
And finally, let's go, in the
remaining time, through an
768
00:40:49,870 --> 00:40:55,240
example where we're going to
throw in the bucket all the
769
00:40:55,240 --> 00:40:58,320
concepts and notations that
we have introduced so far.
770
00:40:58,320 --> 00:40:59,960
So the example is as follows.
771
00:40:59,960 --> 00:41:04,210
We start with a stick that
has a certain length.
772
00:41:04,210 --> 00:41:07,790
And we break it at a completely
random location.
773
00:41:07,790 --> 00:41:09,390
And--
774
00:41:09,390 --> 00:41:13,686
yes, this 1 should be l.
775
00:41:13,686 --> 00:41:14,130
OK.
776
00:41:14,130 --> 00:41:15,770
So it has length l.
777
00:41:15,770 --> 00:41:19,210
And we're going to break
it at a random place.
778
00:41:19,210 --> 00:41:21,970
And we call that random place
where we break it
779
00:41:21,970 --> 00:41:24,210
X.
780
00:41:24,210 --> 00:41:26,670
X can be anywhere, uniform
distribution.
781
00:41:26,670 --> 00:41:31,800
So this means that X has a
density that goes from 0 to l.
782
00:41:31,800 --> 00:41:34,760
I guess this capital L is
supposed to be the same as the
783
00:41:34,760 --> 00:41:36,190
lower-case l.
784
00:41:36,190 --> 00:41:39,430
So that's the density of X. And
since the density needs to
785
00:41:39,430 --> 00:41:43,160
integrate to 1, the height of
that density has to be 1/l.
786
00:41:43,160 --> 00:41:46,330
787
00:41:46,330 --> 00:41:49,660
Now, having broken the stick
and given that we are left
788
00:41:49,660 --> 00:41:53,080
with this piece of the stick,
I'm now going to break it
789
00:41:53,080 --> 00:41:56,900
again at a completely random
place, meaning I'm going to
790
00:41:56,900 --> 00:41:59,940
choose a point where I break it
uniformly over the length
791
00:41:59,940 --> 00:42:00,940
of the stick.
792
00:42:00,940 --> 00:42:02,750
What does this mean?
793
00:42:02,750 --> 00:42:05,720
And let's call Y the location
where I break it.
794
00:42:05,720 --> 00:42:10,290
So Y is going to range
between 0 and x.
795
00:42:10,290 --> 00:42:11,850
x is the stick that
I'm left with.
796
00:42:11,850 --> 00:42:14,190
So I'm going to break it
somewhere in between.
797
00:42:14,190 --> 00:42:21,140
So I pick a y between 0 and x.
798
00:42:21,140 --> 00:42:24,480
And of course, x
is less than l.
799
00:42:24,480 --> 00:42:26,150
And I'm going to
break it there.
800
00:42:26,150 --> 00:42:30,640
So y is uniform between
0 and x.
801
00:42:30,640 --> 00:42:36,460
What does that mean, that the
density of y, given that you
802
00:42:36,460 --> 00:42:42,940
have already told me x, ranges
from 0 to little x?
803
00:42:42,940 --> 00:42:46,170
If I told you that the first
break happened at a particular
804
00:42:46,170 --> 00:42:50,850
x, then y can only range
over this interval.
805
00:42:50,850 --> 00:42:52,830
And I'm assuming a uniform
806
00:42:52,830 --> 00:42:54,330
distribution over that interval.
807
00:42:54,330 --> 00:42:56,420
So we have this kind of shape.
808
00:42:56,420 --> 00:43:00,700
And that fixes for
us the height of
809
00:43:00,700 --> 00:43:01,950
the conditional density.
810
00:43:01,950 --> 00:43:05,380
811
00:43:05,380 --> 00:43:11,690
So what's the joint density of
those two random variables?
812
00:43:11,690 --> 00:43:14,440
By the definition of conditional
densities, the
813
00:43:14,440 --> 00:43:18,290
conditional was defined as the
ratio of this divided by that.
814
00:43:18,290 --> 00:43:21,500
So we can find the joint density
by taking the marginal
815
00:43:21,500 --> 00:43:23,630
and then multiplying
by the conditional.
816
00:43:23,630 --> 00:43:26,120
This is the same formula as
in the discrete case.
817
00:43:26,120 --> 00:43:29,770
This is our very familiar
multiplication rule, but
818
00:43:29,770 --> 00:43:32,150
adjusted to the case of
continuous random variables.
819
00:43:32,150 --> 00:43:34,871
So p's become f's.
820
00:43:34,871 --> 00:43:35,290
OK.
821
00:43:35,290 --> 00:43:37,560
So we do have a formula
for this.
822
00:43:37,560 --> 00:43:38,540
What is it?
823
00:43:38,540 --> 00:43:40,190
It's 1/l--
824
00:43:40,190 --> 00:43:42,140
that's the density of X --
825
00:43:42,140 --> 00:43:46,460
times 1/x, which is the
conditional density of Y. This
826
00:43:46,460 --> 00:43:48,630
is the formula for the
joint density.
827
00:43:48,630 --> 00:43:50,140
But we must be careful.
828
00:43:50,140 --> 00:43:53,230
This is a formula that's
not valid everywhere.
829
00:43:53,230 --> 00:43:57,150
It's only valid for the x's
and y's that are possible.
830
00:43:57,150 --> 00:44:00,840
And the x's and y's that are
possible are given by these
831
00:44:00,840 --> 00:44:01,900
inequalities.
832
00:44:01,900 --> 00:44:05,940
So x can range from 0 to
l, and y can only be
833
00:44:05,940 --> 00:44:07,270
smaller than x.
834
00:44:07,270 --> 00:44:09,780
So this is the formula
for the density on
835
00:44:09,780 --> 00:44:12,310
this part of our space.
836
00:44:12,310 --> 00:44:16,270
The density is 0
anywhere else.
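The density translates directly into Python, with the support condition made explicit. The helper that integrates out y is a hypothetical check that is not part of the lecture. For any fixed 0 < x < l, it should give back the marginal f_X(x) = 1/l.

```python
def stick_joint_pdf(x, y, length):
    """Joint PDF of (X, Y) in the two-break stick example.

    f_{X,Y}(x, y) = f_X(x) * f_{Y|X}(y | x) = (1/l) * (1/x)
    on 0 < y < x < l, and 0 everywhere else.
    """
    if 0.0 < y < x < length:
        return 1.0 / (length * x)
    return 0.0


def integrate_out_y(x, length, n=1000):
    # Midpoint rule over y in [0, l]; approximates f_X(x) = 1/l for 0 < x < l
    h = length / n
    return sum(stick_joint_pdf(x, (j + 0.5) * h, length) for j in range(n)) * h


print(integrate_out_y(0.5, 2.0))  # about 0.5 = 1/l for l = 2
```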
837
00:44:16,270 --> 00:44:18,430
So what does it look like?
838
00:44:18,430 --> 00:44:20,950
It's basically a 1/x function.
839
00:44:20,950 --> 00:44:23,460
So it's sort of constant
along that dimension.
840
00:44:23,460 --> 00:44:27,600
But as x goes to 0, your
density goes up and
841
00:44:27,600 --> 00:44:29,280
can even blow up.
842
00:44:29,280 --> 00:44:33,400
It sort of looks like a sail
that's raised and somewhat
843
00:44:33,400 --> 00:44:37,640
curved and has a point up
there going to infinity.
844
00:44:37,640 --> 00:44:39,680
So this is the joint density.
845
00:44:39,680 --> 00:44:43,480
Now once you have in your hands
a joint density, then
846
00:44:43,480 --> 00:44:46,010
you can answer in principle
any problem.
847
00:44:46,010 --> 00:44:50,550
It's just a matter of plugging
in and doing computations.
848
00:44:50,550 --> 00:44:53,650
How about calculating something
like a conditional
849
00:44:53,650 --> 00:44:59,040
expectation of Y given
a value of x?
850
00:44:59,040 --> 00:44:59,430
OK.
851
00:44:59,430 --> 00:45:02,530
That's a concept we have
not defined so far.
852
00:45:02,530 --> 00:45:04,860
But how should we define it?
853
00:45:04,860 --> 00:45:06,080
We do the reasonable thing.
854
00:45:06,080 --> 00:45:09,930
We'll define it the same way
as ordinary expectations
855
00:45:09,930 --> 00:45:14,160
except that since we're given
some conditioning information,
856
00:45:14,160 --> 00:45:17,130
we should use the probability
distribution that applies to
857
00:45:17,130 --> 00:45:18,840
that particular situation.
858
00:45:18,840 --> 00:45:22,570
So in a situation where we are
told the value of x, the
859
00:45:22,570 --> 00:45:25,760
distribution that applies is the
conditional distribution
860
00:45:25,760 --> 00:45:29,950
of Y. So it's going to be the
conditional density of Y given
861
00:45:29,950 --> 00:45:31,470
the value of x.
862
00:45:31,470 --> 00:45:34,120
Now, we know what this is.
863
00:45:34,120 --> 00:45:37,860
It's given by 1/x.
864
00:45:37,860 --> 00:45:46,160
So we need to integrate
y times 1/x dy.
865
00:45:46,160 --> 00:45:48,920
And what should we
integrate over?
866
00:45:48,920 --> 00:45:53,930
Well, given the value of x, y
can only range from 0 to x.
867
00:45:53,930 --> 00:45:56,150
So this is what we get.
868
00:45:56,150 --> 00:46:01,690
And you do your integral, and
you get that this is x/2.
869
00:46:01,690 --> 00:46:03,060
Is it a surprise?
870
00:46:03,060 --> 00:46:04,450
It shouldn't be.
871
00:46:04,450 --> 00:46:10,890
This is just the expected value
of Y in a universe where
872
00:46:10,890 --> 00:46:14,560
X has been realized and Y is
given by this distribution.
873
00:46:14,560 --> 00:46:17,390
Y is uniform between 0 and x.
874
00:46:17,390 --> 00:46:20,820
The expected value of Y should
be the midpoint of this
875
00:46:20,820 --> 00:46:22,100
interval, which is x/2.
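A quick numerical check of E[Y | X = x] = x/2, as a minimal Python sketch: it integrates y times the conditional density 1/x over the range 0 to x with the midpoint rule. The function name and grid size are illustrative choices, not part of the lecture.

```python
def cond_expectation_y(x: float, m: int = 1000) -> float:
    """Midpoint-rule approximation of E[Y | X = x] = integral_0^x y * (1/x) dy."""
    h = x / m
    total = 0.0
    for j in range(m):
        y = (j + 0.5) * h           # midpoint of the j-th subinterval of [0, x]
        total += y * (1.0 / x) * h  # y times the conditional density 1/x
    return total


for x in (0.4, 1.0, 3.0):
    print(x, cond_expectation_y(x), x / 2)
```

The midpoint rule is exact for a linear integrand, so the last two columns agree up to floating-point rounding, matching the midpoint of the interval [0, x].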
876
00:46:22,100 --> 00:46:25,090
877
00:46:25,090 --> 00:46:28,580
Now let's do fancier stuff.
878
00:46:28,580 --> 00:46:31,850
Since we have the joint
distribution, we should be
879
00:46:31,850 --> 00:46:34,250
able to calculate
the marginal.
880
00:46:34,250 --> 00:46:36,500
What is the distribution of Y?
881
00:46:36,500 --> 00:46:40,510
After breaking the stick twice,
how big is the little
882
00:46:40,510 --> 00:46:42,890
piece that I'm left with?
883
00:46:42,890 --> 00:46:44,630
How do we find this?
884
00:46:44,630 --> 00:46:48,850
To find the marginal, we just
take the joint and integrate
885
00:46:48,850 --> 00:46:52,670
out the variable that
we don't want.
886
00:46:52,670 --> 00:46:55,220
A particular y can happen
in many ways.
887
00:46:55,220 --> 00:46:57,800
It can happen together
with any x.
888
00:46:57,800 --> 00:47:00,700
So we consider all the possible
x's that can go
889
00:47:00,700 --> 00:47:05,940
together with this y and average
over all those x's.
890
00:47:05,940 --> 00:47:09,330
So we plug in the formula for
the joint density from the
891
00:47:09,330 --> 00:47:10,140
previous slide.
892
00:47:10,140 --> 00:47:13,070
We know that it's 1/(lx).
893
00:47:13,070 --> 00:47:16,880
And what's the range
of the x's?
894
00:47:16,880 --> 00:47:22,880
So to find the density of Y for
a particular y up here,
895
00:47:22,880 --> 00:47:26,480
I'm going to integrate
over x's.
896
00:47:26,480 --> 00:47:29,040
The density is 0
here and there.
897
00:47:29,040 --> 00:47:32,160
The density is nonzero
only in this part.
898
00:47:32,160 --> 00:47:37,260
So I need to integrate over x's
going from here to there.
899
00:47:37,260 --> 00:47:39,120
So what's the "here"?
900
00:47:39,120 --> 00:47:42,200
This line goes up with
a slope of 1.
901
00:47:42,200 --> 00:47:45,420
So this is the line
x equals y.
902
00:47:45,420 --> 00:47:49,835
So if I fix y, it means that
my integral starts from a
903
00:47:49,835 --> 00:47:53,670
value of x that is
also equal to y.
904
00:47:53,670 --> 00:47:58,330
So the integral starts
at x equals y.
905
00:47:58,330 --> 00:48:01,770
And it goes all the way up to
the full length of our
906
00:48:01,770 --> 00:48:03,660
stick, which is l.
907
00:48:03,660 --> 00:48:08,760
So we need to integrate
from little y up to l.
908
00:48:08,760 --> 00:48:12,520
So that's something that
almost always comes up.
909
00:48:12,520 --> 00:48:15,690
It's not enough to have just
the formula for the
910
00:48:15,690 --> 00:48:16,640
the joint density.
911
00:48:16,640 --> 00:48:19,160
You need to keep track
of different regions.
912
00:48:19,160 --> 00:48:23,920
And if the joint density is 0
in some regions, then you
913
00:48:23,920 --> 00:48:28,250
exclude those regions from
the range of integration.
914
00:48:28,250 --> 00:48:32,380
So the range of integration is
only over those values where
915
00:48:32,380 --> 00:48:35,600
the particular formula is valid,
the places where the
916
00:48:35,600 --> 00:48:37,990
joint density is nonzero.
917
00:48:37,990 --> 00:48:38,360
All right.
918
00:48:38,360 --> 00:48:41,760
The integral of 1/x dx
gives you a logarithm.
919
00:48:41,760 --> 00:48:45,460
So we evaluate this integral,
and we get an
920
00:48:45,460 --> 00:48:47,410
expression of the form (1/l) log(l/y).
921
00:48:47,410 --> 00:48:53,660
So the density of Y has a
somewhat unexpected shape.
922
00:48:53,660 --> 00:48:55,470
So it's a logarithmic
function.
923
00:48:55,470 --> 00:48:59,860
And it goes this way.
924
00:48:59,860 --> 00:49:02,980
It's defined for y going all
the way up to l.
925
00:49:02,980 --> 00:49:07,860
When y is equal to l, the
logarithm of 1 is equal to 0.
926
00:49:07,860 --> 00:49:12,660
But when y approaches 0,
logarithm of something big
927
00:49:12,660 --> 00:49:15,740
blows up, and we get a
shape of this form.
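Integrating out x gives the marginal density f_Y(y) = (1/l) log(l/y) for 0 < y ≤ l (natural logarithm). The Python sketch below, with hypothetical function names, compares that closed form against a direct midpoint-rule integration of the joint density 1/(lx) over x from y to l, and checks that the marginal integrates to 1 even though it blows up near y = 0.

```python
import math


def f_y_closed_form(y: float, l: float) -> float:
    """Marginal density of Y: (1/l) * log(l / y) for 0 < y <= l, else 0."""
    return math.log(l / y) / l if 0 < y <= l else 0.0


def f_y_numeric(y: float, l: float, n: int = 2000) -> float:
    """Integrate the joint density 1/(l*x) over x from y to l (midpoint rule)."""
    h = (l - y) / n
    return sum(1.0 / (l * (y + (i + 0.5) * h)) * h for i in range(n))


def integrates_to(l: float, n: int = 1000) -> float:
    """Midpoint-rule integral of the marginal density over (0, l)."""
    h = l / n
    return sum(f_y_closed_form((j + 0.5) * h, l) * h for j in range(n))


print(f_y_closed_form(0.25, 1.0), f_y_numeric(0.25, 1.0))  # both about log(4)
print(f_y_closed_form(1.0, 1.0))  # log(1) = 0 at y = l
print(integrates_to(1.0))         # close to 1; the log singularity at 0 is integrable
```

The lower limit of the x integral is y, not 0: the joint density is zero for x < y, which is exactly the bookkeeping of regions described above.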
928
00:49:15,740 --> 00:49:21,900
929
00:49:21,900 --> 00:49:22,330
OK.
930
00:49:22,330 --> 00:49:25,960
Finally, we can calculate the
expected value of Y. And we
931
00:49:25,960 --> 00:49:29,430
can do this by using the
definition of the expectation.
932
00:49:29,430 --> 00:49:33,300
So it's the integral of y times
the density of Y.
933
00:49:33,300 --> 00:49:36,290
We already found what that
density is, so we
934
00:49:36,290 --> 00:49:38,030
can plug it in here.
935
00:49:38,030 --> 00:49:40,470
And we're integrating over
the range of possible
936
00:49:40,470 --> 00:49:42,470
y's, from 0 to l.
937
00:49:42,470 --> 00:49:46,930
Now this involves the integral
of y log y, which I'm sure
938
00:49:46,930 --> 00:49:49,500
you have encountered in your
calculus classes but may
939
00:49:49,500 --> 00:49:51,350
not remember how to do.
940
00:49:51,350 --> 00:49:53,650
In any case, you look it
up in some integral
941
00:49:53,650 --> 00:49:55,300
tables or do it by parts.
942
00:49:55,300 --> 00:49:59,360
And you get the final
answer of l/4.
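For reference, here is one way to do the integral by parts (natural logarithm), taking u = log(l/y) and dv = y dy in the marginal density (1/l) log(l/y):

```latex
\begin{aligned}
\mathbf{E}[Y] &= \int_0^l y \cdot \frac{1}{l}\log\frac{l}{y}\,dy
  = \frac{1}{l}\left( \left[\frac{y^2}{2}\log\frac{l}{y}\right]_0^l
  + \int_0^l \frac{y^2}{2}\cdot\frac{1}{y}\,dy \right) \\
&= \frac{1}{l}\left( 0 + \frac{l^2}{4} \right) = \frac{l}{4}.
\end{aligned}
```

The boundary term vanishes at both ends: at y = l because log 1 = 0, and as y goes to 0 because y^2 log(l/y) goes to 0.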
943
00:49:59,360 --> 00:50:02,400
And at this point, you say,
that's a really simple answer.
944
00:50:02,400 --> 00:50:06,200
Shouldn't I have expected
it to be l/4?
945
00:50:06,200 --> 00:50:07,680
I guess, yes.
946
00:50:07,680 --> 00:50:11,070
I mean, when you break it once,
the expected value of
947
00:50:11,070 --> 00:50:14,220
what you are left with is going
to be 1/2 of what you
948
00:50:14,220 --> 00:50:15,860
started with.
949
00:50:15,860 --> 00:50:19,320
When you break it the next time,
the expected length of
950
00:50:19,320 --> 00:50:23,380
what you're left with should be
1/2 of the piece that you
951
00:50:23,380 --> 00:50:24,550
are now breaking.
952
00:50:24,550 --> 00:50:27,350
So each time that you break it
at random, you expect it to
953
00:50:27,350 --> 00:50:29,840
become smaller by
a factor of 1/2.
954
00:50:29,840 --> 00:50:31,960
So if you break it twice, you
are left with something that's
955
00:50:31,960 --> 00:50:33,940
expected to be l/4.
956
00:50:33,940 --> 00:50:37,350
This is reasoning on the
average, which happens to give
957
00:50:37,350 --> 00:50:39,010
you the right answer
in this case.
958
00:50:39,010 --> 00:50:41,800
But again, there's the warning
that reasoning on the average
959
00:50:41,800 --> 00:50:44,230
doesn't always give you
the right answer.
960
00:50:44,230 --> 00:50:48,100
So be careful about doing
arguments of this type.
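One way to sanity-check the answer l/4 is a Monte Carlo simulation of the two breaks. The Python sketch below is an illustration, not part of the lecture; the trial count and seed are arbitrary choices. It draws X uniform on [0, l], then Y uniform on [0, X], and averages Y. Agreement here confirms the answer for this particular problem only; it does not make reasoning on the average safe in general.

```python
import random


def break_twice(l: float, rng: random.Random) -> float:
    """Break a stick of length l at a uniform point, keep the piece of length X,
    then break that piece at a uniform point and return the piece of length Y."""
    x = rng.uniform(0.0, l)     # X ~ Uniform[0, l]
    return rng.uniform(0.0, x)  # Y | X = x ~ Uniform[0, x]


def mean_remaining(l: float, trials: int = 200_000, seed: int = 0) -> float:
    """Sample average of Y over many independent double breaks."""
    rng = random.Random(seed)
    return sum(break_twice(l, rng) for _ in range(trials)) / trials


stick_length = 2.0
print(mean_remaining(stick_length), stick_length / 4)  # estimate should be near 0.5
```

Using a seeded `random.Random` instance keeps the run reproducible; with 200,000 trials the sampling error in the average is on the order of 0.001 for a stick of length 2.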
961
00:50:48,100 --> 00:50:48,620
Very good.
962
00:50:48,620 --> 00:50:49,870
See you on Wednesday.
963
00:50:49,870 --> 00:50:50,870