1
00:00:00,000 --> 00:00:00,040
2
00:00:00,040 --> 00:00:02,460
The following content is
provided under a Creative
3
00:00:02,460 --> 00:00:03,870
Commons license.
4
00:00:03,870 --> 00:00:06,910
Your support will help MIT
OpenCourseWare continue to
5
00:00:06,910 --> 00:00:10,560
offer high-quality educational
resources for free.
6
00:00:10,560 --> 00:00:13,460
To make a donation or view
additional materials from
7
00:00:13,460 --> 00:00:19,290
hundreds of MIT courses, visit
MIT OpenCourseWare at
8
00:00:19,290 --> 00:00:20,540
ocw.mit.edu.
9
00:00:20,540 --> 00:00:22,640
10
00:00:22,640 --> 00:00:22,990
JOHN TSITSIKLIS: OK.
11
00:00:22,990 --> 00:00:24,020
We can start.
12
00:00:24,020 --> 00:00:26,540
Good morning.
13
00:00:26,540 --> 00:00:29,600
So we're going to start
now a new unit.
14
00:00:29,600 --> 00:00:32,200
For the next couple of lectures,
we will be talking
15
00:00:32,200 --> 00:00:34,560
about continuous random
variables.
16
00:00:34,560 --> 00:00:36,520
So this is new material
which is not going
17
00:00:36,520 --> 00:00:37,400
to be in the quiz.
18
00:00:37,400 --> 00:00:41,170
You are going to have a long
break next week without any
19
00:00:41,170 --> 00:00:45,230
lecture, just a quiz and
recitation and tutorial.
20
00:00:45,230 --> 00:00:48,500
So what's going to happen
in this new unit?
21
00:00:48,500 --> 00:00:52,760
Basically, we want to do
everything that we did for
22
00:00:52,760 --> 00:00:56,610
discrete random variables,
reintroduce the same sort of
23
00:00:56,610 --> 00:00:59,510
concepts but see how they apply
and how they need to be
24
00:00:59,510 --> 00:01:02,840
modified in order to talk about
random variables that
25
00:01:02,840 --> 00:01:04,700
take continuous values.
26
00:01:04,700 --> 00:01:06,610
At some level, it's
all the same.
27
00:01:06,610 --> 00:01:10,340
At some level, it's quite a bit
harder because when things
28
00:01:10,340 --> 00:01:12,490
are continuous, calculus
comes in.
29
00:01:12,490 --> 00:01:14,770
So the calculations that you
have to do on the side
30
00:01:14,770 --> 00:01:17,760
sometimes need a little
bit more thinking.
31
00:01:17,760 --> 00:01:20,300
In terms of new concepts,
there's not going to be a
32
00:01:20,300 --> 00:01:24,200
whole lot today, some analogs
of things we have done.
33
00:01:24,200 --> 00:01:27,110
We're going to introduce the
concept of cumulative
34
00:01:27,110 --> 00:01:29,950
distribution functions, which
allows us to deal with
35
00:01:29,950 --> 00:01:32,750
discrete and continuous
random variables, all
36
00:01:32,750 --> 00:01:34,560
of them in one shot.
37
00:01:34,560 --> 00:01:37,890
And finally, introduce a famous
kind of continuous
38
00:01:37,890 --> 00:01:41,900
random variable, the normal
random variable.
39
00:01:41,900 --> 00:01:43,970
OK, so what's the story?
40
00:01:43,970 --> 00:01:46,970
Continuous random variables are
random variables that take
41
00:01:46,970 --> 00:01:50,350
values over the continuum.
42
00:01:50,350 --> 00:01:53,470
So the numerical value of the
random variable can be any
43
00:01:53,470 --> 00:01:55,240
real number.
44
00:01:55,240 --> 00:01:58,600
They don't take values just
in a discrete set.
45
00:01:58,600 --> 00:02:00,660
So we have our sample space.
46
00:02:00,660 --> 00:02:02,020
The experiment happens.
47
00:02:02,020 --> 00:02:05,730
We get some omega, a sample
point in the sample space.
48
00:02:05,730 --> 00:02:10,070
And once that point is
determined, it determines the
49
00:02:10,070 --> 00:02:12,500
numerical value of the
random variable.
50
00:02:12,500 --> 00:02:15,370
Remember, random variables are
functions on the sample space.
51
00:02:15,370 --> 00:02:17,020
You pick a sample point.
52
00:02:17,020 --> 00:02:19,690
This determines the numerical
value of the random variable.
53
00:02:19,690 --> 00:02:23,500
So that numerical value is going
to be some real number
54
00:02:23,500 --> 00:02:26,010
on that line.
55
00:02:26,010 --> 00:02:28,550
Now we want to say something
about the distribution of the
56
00:02:28,550 --> 00:02:29,290
random variable.
57
00:02:29,290 --> 00:02:31,970
We want to say which values are
more likely than others to
58
00:02:31,970 --> 00:02:34,060
occur in a certain sense.
59
00:02:34,060 --> 00:02:36,910
For example, you may be
interested in a particular
60
00:02:36,910 --> 00:02:40,090
event, the event that the random
variable takes values
61
00:02:40,090 --> 00:02:42,360
in the interval from a to b.
62
00:02:42,360 --> 00:02:43,950
And we want to say something
about the
63
00:02:43,950 --> 00:02:45,820
probability of that event.
64
00:02:45,820 --> 00:02:48,510
In principle, how
is this done?
65
00:02:48,510 --> 00:02:52,010
You go back to the sample space,
and you find all those
66
00:02:52,010 --> 00:02:56,790
outcomes for which the value of
the random variable happens
67
00:02:56,790 --> 00:02:58,500
to be in that interval.
68
00:02:58,500 --> 00:03:01,870
The probability that the random
variable falls here is
69
00:03:01,870 --> 00:03:06,070
the same as the probability of
all outcomes that make the
70
00:03:06,070 --> 00:03:08,290
random variable
fall in there.
71
00:03:08,290 --> 00:03:11,190
So in principle, you can work on
the original sample space,
72
00:03:11,190 --> 00:03:14,750
find the probability of this
event, and you would be done.
73
00:03:14,750 --> 00:03:18,810
But similar to what happened in
chapter 2, we want to kind
74
00:03:18,810 --> 00:03:22,910
of push the sample space in the
background and just work
75
00:03:22,910 --> 00:03:26,890
directly on the real
axis and talk about
76
00:03:26,890 --> 00:03:28,640
probabilities up here.
77
00:03:28,640 --> 00:03:32,430
So we want now a way to specify
probabilities, how
78
00:03:32,430 --> 00:03:38,340
they are bunched together, or
arranged, along the real line.
79
00:03:38,340 --> 00:03:40,980
So what did we do for discrete
random variables?
80
00:03:40,980 --> 00:03:44,100
We introduced PMFs, probability
mass functions.
81
00:03:44,100 --> 00:03:47,100
And the way that we described
the random variable was by
82
00:03:47,100 --> 00:03:50,300
saying this point has so much
mass on top of it, that point
83
00:03:50,300 --> 00:03:52,790
has so much mass on top
of it, and so on.
84
00:03:52,790 --> 00:03:57,610
And so we assigned a total
amount of 1 unit of
85
00:03:57,610 --> 00:03:58,670
probability.
86
00:03:58,670 --> 00:04:01,810
We assigned it to different
masses, which we put at
87
00:04:01,810 --> 00:04:04,870
different points on
the real axis.
88
00:04:04,870 --> 00:04:08,070
So that's what you do if
somebody gives you a pound of
89
00:04:08,070 --> 00:04:11,910
discrete stuff, a pound of
mass in little chunks.
90
00:04:11,910 --> 00:04:15,300
And you place those chunks
at a few points.
91
00:04:15,300 --> 00:04:20,890
Now, in the continuous case,
this total unit of probability
92
00:04:20,890 --> 00:04:25,440
mass does not sit just on
discrete points but is spread
93
00:04:25,440 --> 00:04:28,140
all over the real axis.
94
00:04:28,140 --> 00:04:31,280
So now we're going to have a
unit of mass that spreads on
95
00:04:31,280 --> 00:04:32,510
top of the real axis.
96
00:04:32,510 --> 00:04:36,020
How do we describe masses that
are continuously spread?
97
00:04:36,020 --> 00:04:39,680
The way we describe them is
by specifying densities.
98
00:04:39,680 --> 00:04:43,800
That is, how thick is the mass
that's sitting here?
99
00:04:43,800 --> 00:04:46,210
How dense is the mass that's
sitting there?
100
00:04:46,210 --> 00:04:48,260
So that's exactly what
we're going to do.
101
00:04:48,260 --> 00:04:50,930
We're going to introduce the
concept of a probability
102
00:04:50,930 --> 00:04:55,340
density function that tells us
how probabilities accumulate
103
00:04:55,340 --> 00:04:59,270
at different parts
of the real axis.
104
00:04:59,270 --> 00:05:03,780
105
00:05:03,780 --> 00:05:07,870
So here's an example or a
picture of a possible
106
00:05:07,870 --> 00:05:10,210
probability density function.
107
00:05:10,210 --> 00:05:13,210
What does that density function
kind of convey
108
00:05:13,210 --> 00:05:14,290
intuitively?
109
00:05:14,290 --> 00:05:17,510
Well, that these x's
are relatively
110
00:05:17,510 --> 00:05:19,160
less likely to occur.
111
00:05:19,160 --> 00:05:22,120
Those x's are somewhat more
likely to occur because the
112
00:05:22,120 --> 00:05:24,930
density is higher.
113
00:05:24,930 --> 00:05:27,950
Now, for a more formal
definition, we're going to say
114
00:05:27,950 --> 00:05:35,620
that a random variable X is said
to be continuous if it
115
00:05:35,620 --> 00:05:38,560
can be described by a
density function in
116
00:05:38,560 --> 00:05:40,780
the following sense.
117
00:05:40,780 --> 00:05:42,910
We have a density function.
118
00:05:42,910 --> 00:05:47,830
And we calculate probabilities
of falling inside an interval
119
00:05:47,830 --> 00:05:52,580
by finding the area under
the curve that sits
120
00:05:52,580 --> 00:05:54,940
on top of that interval.
121
00:05:54,940 --> 00:05:57,800
So that's sort of the defining
relation for
122
00:05:57,800 --> 00:05:59,190
continuous random variables.
123
00:05:59,190 --> 00:06:00,860
It's an implicit definition.
124
00:06:00,860 --> 00:06:03,870
And it tells us a random
variable is continuous if we
125
00:06:03,870 --> 00:06:06,560
can calculate probabilities
this way.
126
00:06:06,560 --> 00:06:09,520
So the probability of falling
in this interval is the area
127
00:06:09,520 --> 00:06:10,500
under this curve.
128
00:06:10,500 --> 00:06:14,950
Mathematically, it's the
integral of the density over
129
00:06:14,950 --> 00:06:17,020
this particular interval.
130
00:06:17,020 --> 00:06:20,410
If the density happens to be
constant over that interval,
131
00:06:20,410 --> 00:06:23,610
the area under the curve would
be the length of the interval
132
00:06:23,610 --> 00:06:26,440
times the height of
the density, which
133
00:06:26,440 --> 00:06:28,170
sort of makes sense.
134
00:06:28,170 --> 00:06:32,020
Now, because the density is not
constant but it kind of
135
00:06:32,020 --> 00:06:35,720
moves around, what you need is
to write down an integral.
136
00:06:35,720 --> 00:06:39,100
Now, this formula is very much
analogous to what you would do
137
00:06:39,100 --> 00:06:41,030
for discrete random variables.
138
00:06:41,030 --> 00:06:44,140
For a discrete random variable,
how do you calculate
139
00:06:44,140 --> 00:06:45,610
this probability?
140
00:06:45,610 --> 00:06:48,800
You look at all x's
in this interval.
141
00:06:48,800 --> 00:06:54,060
And you add the probability mass
function over that range.
142
00:06:54,060 --> 00:06:59,660
So just for comparison, this
would be the formula for the
143
00:06:59,660 --> 00:07:01,590
discrete case--
144
00:07:01,590 --> 00:07:05,620
the sum over all x's in the
interval from a to b of the
145
00:07:05,620 --> 00:07:09,420
probability mass function.
146
00:07:09,420 --> 00:07:12,650
And there is a syntactic analogy
that's happening here
147
00:07:12,650 --> 00:07:16,160
and which will be a persistent
theme when we deal with
148
00:07:16,160 --> 00:07:18,920
continuous random variables.
149
00:07:18,920 --> 00:07:22,620
Sums get replaced
by integrals.
150
00:07:22,620 --> 00:07:24,110
In the discrete case, you add.
151
00:07:24,110 --> 00:07:26,920
In the continuous case,
you integrate.
152
00:07:26,920 --> 00:07:31,600
Mass functions get replaced
by density functions.
153
00:07:31,600 --> 00:07:35,500
So you can take pretty much any
formula from the discrete
154
00:07:35,500 --> 00:07:40,020
case and translate it to a
continuous analog of that
155
00:07:40,020 --> 00:07:41,480
formula, as we're
going to see.
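Written side by side, the two formulas just described look as follows. This is a sketch only: the symbols f_X for the density and p_X for the mass function are assumed notation, since the transcript never spells them out.

```latex
\[
P(a \le X \le b) = \int_a^b f_X(x)\,dx \quad \text{(continuous case)},
\qquad
P(a \le X \le b) = \sum_{x:\, a \le x \le b} p_X(x) \quad \text{(discrete case)}.
\]
```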
156
00:07:41,480 --> 00:07:43,990
157
00:07:43,990 --> 00:07:45,240
OK.
158
00:07:45,240 --> 00:07:47,250
159
00:07:47,250 --> 00:07:50,040
So let's take this
now as our model.
160
00:07:50,040 --> 00:07:53,220
What is the probability that
the random variable takes a
161
00:07:53,220 --> 00:07:58,440
specific value if we have a
continuous random variable?
162
00:07:58,440 --> 00:08:00,200
Well, this would be the case.
163
00:08:00,200 --> 00:08:02,880
It's a case of a trivial
interval, where the two end
164
00:08:02,880 --> 00:08:04,660
points coincide.
165
00:08:04,660 --> 00:08:07,670
So it would be the integral
from a to itself.
166
00:08:07,670 --> 00:08:10,520
So you're integrating just
over a single point.
167
00:08:10,520 --> 00:08:12,790
Now, when you integrate over
a single point, the
168
00:08:12,790 --> 00:08:14,600
integral is just 0.
169
00:08:14,600 --> 00:08:17,980
The area under the curve, if
you're only looking at a
170
00:08:17,980 --> 00:08:19,560
single point, it's 0.
171
00:08:19,560 --> 00:08:22,670
So a big property of continuous
random variables is that any
172
00:08:22,670 --> 00:08:26,940
individual point has
0 probability.
173
00:08:26,940 --> 00:08:30,740
In particular, when you look at
the value of the density,
174
00:08:30,740 --> 00:08:35,299
the density does not tell you
the probability of that point.
175
00:08:35,299 --> 00:08:37,860
The point itself has
0 probability.
176
00:08:37,860 --> 00:08:42,409
So the density tells you
something a little different.
177
00:08:42,409 --> 00:08:44,645
We are going to see shortly
what that is.
178
00:08:44,645 --> 00:08:47,390
179
00:08:47,390 --> 00:08:52,070
Before we get there,
can the density be
180
00:08:52,070 --> 00:08:54,410
an arbitrary function?
181
00:08:54,410 --> 00:08:56,160
Almost, but not quite.
182
00:08:56,160 --> 00:08:57,650
There are two things
that we want.
183
00:08:57,650 --> 00:09:00,310
First, since densities
are used to calculate
184
00:09:00,310 --> 00:09:02,690
probabilities, and since
probabilities must be
185
00:09:02,690 --> 00:09:06,840
non-negative, the density should
also be non-negative.
186
00:09:06,840 --> 00:09:10,960
Otherwise you would be getting
negative probabilities, which
187
00:09:10,960 --> 00:09:13,360
is not a good thing.
188
00:09:13,360 --> 00:09:16,930
So that's a basic property
that any density function
189
00:09:16,930 --> 00:09:18,640
should obey.
190
00:09:18,640 --> 00:09:21,970
The second property that we
need is that the overall
191
00:09:21,970 --> 00:09:25,210
probability of the entire real
line should be equal to 1.
192
00:09:25,210 --> 00:09:27,980
So if you ask me, what is the
probability that X falls
193
00:09:27,980 --> 00:09:30,760
between minus infinity and plus
infinity, well, we are
194
00:09:30,760 --> 00:09:33,590
sure that X is going to
fall in that range.
195
00:09:33,590 --> 00:09:37,400
So the probability of that
event should be 1.
196
00:09:37,400 --> 00:09:40,480
So the probability of being
between minus infinity and
197
00:09:40,480 --> 00:09:43,600
plus infinity should be 1, which
means that the integral
198
00:09:43,600 --> 00:09:46,410
from minus infinity to plus
infinity should be 1.
199
00:09:46,410 --> 00:09:50,460
So that just tells us that
there's 1 unit of total
200
00:09:50,460 --> 00:09:54,690
probability that's being
spread over our space.
201
00:09:54,690 --> 00:09:59,000
Now, what's the best way to
think intuitively about what
202
00:09:59,000 --> 00:10:01,480
the density function does?
203
00:10:01,480 --> 00:10:06,470
The interpretation that I find
most natural and easy to
204
00:10:06,470 --> 00:10:10,300
convey the meaning of a
density is to look at
205
00:10:10,300 --> 00:10:13,220
probabilities of small
intervals.
206
00:10:13,220 --> 00:10:18,850
So let us take an x somewhere
here and then x plus delta
207
00:10:18,850 --> 00:10:20,230
just next to it.
208
00:10:20,230 --> 00:10:23,050
So delta is a small number.
209
00:10:23,050 --> 00:10:26,460
And let's look at the
probability of the event that
210
00:10:26,460 --> 00:10:29,750
we get a value in that range.
211
00:10:29,750 --> 00:10:32,220
For continuous random variables,
the way we find the
212
00:10:32,220 --> 00:10:35,270
probability of falling in that
range is by integrating the
213
00:10:35,270 --> 00:10:37,550
density over that range.
214
00:10:37,550 --> 00:10:41,610
So we're drawing this picture.
215
00:10:41,610 --> 00:10:46,060
And we want to take the
area under this curve.
216
00:10:46,060 --> 00:10:50,760
Now, what happens if delta
is a fairly small number?
217
00:10:50,760 --> 00:10:55,030
If delta is pretty small, our
density is not going to change
218
00:10:55,030 --> 00:10:57,040
much over that range.
219
00:10:57,040 --> 00:10:59,330
So you can pretend that
the density is
220
00:10:59,330 --> 00:11:01,230
approximately constant.
221
00:11:01,230 --> 00:11:04,550
And so to find the area under
the curve, you just take the
222
00:11:04,550 --> 00:11:07,760
base times the height.
223
00:11:07,760 --> 00:11:10,630
And it doesn't matter where
exactly you take the height in
224
00:11:10,630 --> 00:11:13,140
that interval, because the
density doesn't change very
225
00:11:13,140 --> 00:11:15,370
much over that interval.
226
00:11:15,370 --> 00:11:19,760
And so the integral becomes just
base times the height.
227
00:11:19,760 --> 00:11:24,020
So for small intervals, the
probability of a small
228
00:11:24,020 --> 00:11:30,170
interval is approximately
the density times delta.
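A quick numerical check of this approximation, as a minimal sketch. The exponential density used here is an assumed example, not one from the lecture, and the function names are hypothetical. The relative gap between the exact interval probability and density times delta should shrink as delta gets smaller.

```python
import math

def pdf(x):
    # Assumed example density (not from the lecture): exponential with rate 1.
    return math.exp(-x) if x >= 0 else 0.0

def interval_probability(x, delta):
    # Exact P(x <= X <= x + delta) for this density, valid for x >= 0.
    return math.exp(-x) - math.exp(-(x + delta))

def relative_error(x, delta):
    # Relative gap between the exact probability and the approximation f(x) * delta.
    exact = interval_probability(x, delta)
    approx = pdf(x) * delta
    return abs(exact - approx) / exact

x = 1.0
for delta in (0.1, 0.01, 0.001):
    print(delta, interval_probability(x, delta), pdf(x) * delta, relative_error(x, delta))
```

For this particular density the relative error is roughly delta divided by 2, so it falls by about a factor of 10 each time delta does.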
229
00:11:30,170 --> 00:11:32,340
So densities essentially
give us
230
00:11:32,340 --> 00:11:34,670
probabilities of small intervals.
231
00:11:34,670 --> 00:11:38,100
And if you want to think about
it a little differently, you
232
00:11:38,100 --> 00:11:41,020
can take that delta from
here and send it to
233
00:11:41,020 --> 00:11:43,960
the denominator there.
234
00:11:43,960 --> 00:11:48,880
And what this tells you
is that the density is
235
00:11:48,880 --> 00:11:55,270
probability per unit length for
intervals of small length.
236
00:11:55,270 --> 00:11:59,860
So the units of density are
probability per unit length.
237
00:11:59,860 --> 00:12:01,420
Densities are not
probabilities.
238
00:12:01,420 --> 00:12:04,430
They are rates at which
probabilities accumulate,
239
00:12:04,430 --> 00:12:06,780
probabilities per unit length.
240
00:12:06,780 --> 00:12:09,780
And since densities are not
probabilities, they don't have
241
00:12:09,780 --> 00:12:11,960
to be less than 1.
242
00:12:11,960 --> 00:12:14,730
Ordinary probabilities can
never exceed 1.
243
00:12:14,730 --> 00:12:18,000
But density is a different
kind of thing.
244
00:12:18,000 --> 00:12:20,530
It can get pretty big
in some places.
245
00:12:20,530 --> 00:12:23,680
It can even sort of blow
up in some places.
246
00:12:23,680 --> 00:12:27,620
As long as the total area under
the curve is 1, other
247
00:12:27,620 --> 00:12:32,830
than that, the curve can do
anything that it wants.
248
00:12:32,830 --> 00:12:35,930
Now, the density prescribes
for us the
249
00:12:35,930 --> 00:12:41,620
probability of intervals.
250
00:12:41,620 --> 00:12:44,710
Sometimes we may want to find
the probability of more
251
00:12:44,710 --> 00:12:46,540
general sets.
252
00:12:46,540 --> 00:12:47,780
How would we do that?
253
00:12:47,780 --> 00:12:51,580
Well, for nice sets, you will
just integrate the density
254
00:12:51,580 --> 00:12:54,260
over that nice set.
255
00:12:54,260 --> 00:12:56,640
I'm not quite defining
what "nice" means.
256
00:12:56,640 --> 00:12:59,140
That's a pretty technical
topic in the theory of
257
00:12:59,140 --> 00:13:00,160
probability.
258
00:13:00,160 --> 00:13:04,530
But for our purposes, usually we
will take B to be something
259
00:13:04,530 --> 00:13:06,500
like a union of intervals.
260
00:13:06,500 --> 00:13:10,200
So how do you find the
probability of falling in the
261
00:13:10,200 --> 00:13:11,690
union of two intervals?
262
00:13:11,690 --> 00:13:14,180
Well, you find the probability
of falling in that interval
263
00:13:14,180 --> 00:13:16,240
plus the probability of falling
in that interval.
264
00:13:16,240 --> 00:13:19,150
So it's the integral over this
interval plus the integral
265
00:13:19,150 --> 00:13:20,500
over that interval.
266
00:13:20,500 --> 00:13:24,370
And you think of this as just
integrating over the union of
267
00:13:24,370 --> 00:13:25,730
the two intervals.
268
00:13:25,730 --> 00:13:28,580
So once you can calculate
probabilities of intervals,
269
00:13:28,580 --> 00:13:30,590
then usually you are in
business, and you can
270
00:13:30,590 --> 00:13:34,000
calculate anything else
you might want.
271
00:13:34,000 --> 00:13:36,330
So the probability density
function is a complete
272
00:13:36,330 --> 00:13:39,530
description of any statistical
information we might be
273
00:13:39,530 --> 00:13:44,425
interested in for a continuous
random variable.
274
00:13:44,425 --> 00:13:44,880
OK.
275
00:13:44,880 --> 00:13:47,330
So now we can start walking
through the concepts and the
276
00:13:47,330 --> 00:13:51,730
definitions that we have for
discrete random variables and
277
00:13:51,730 --> 00:13:54,230
translate them to the
continuous case.
278
00:13:54,230 --> 00:13:58,960
The first big concept is the
concept of the expectation.
279
00:13:58,960 --> 00:14:01,680
One can start with a
mathematical definition.
280
00:14:01,680 --> 00:14:04,810
And here we put down
a definition by
281
00:14:04,810 --> 00:14:07,730
just translating notation.
282
00:14:07,730 --> 00:14:11,160
Wherever we have a sum in
the discrete case, we
283
00:14:11,160 --> 00:14:13,060
now write an integral.
284
00:14:13,060 --> 00:14:16,310
And wherever we had the
probability mass function, we
285
00:14:16,310 --> 00:14:20,570
now throw in the probability
density function.
286
00:14:20,570 --> 00:14:22,010
This formula--
287
00:14:22,010 --> 00:14:24,200
you may have seen it in
freshman physics--
288
00:14:24,200 --> 00:14:28,190
basically, it again gives you
the center of gravity of the
289
00:14:28,190 --> 00:14:31,150
picture that you have when
you have the density.
290
00:14:31,150 --> 00:14:36,460
It's the center of gravity of
the object sitting underneath
291
00:14:36,460 --> 00:14:38,220
the probability density
function.
292
00:14:38,220 --> 00:14:40,900
So that the interpretation
still applies.
293
00:14:40,900 --> 00:14:44,120
It's also true that our
conceptual interpretation of
294
00:14:44,120 --> 00:14:47,820
what an expectation means is
also valid in this case.
295
00:14:47,820 --> 00:14:51,770
That is, if you repeat an
experiment a zillion times,
296
00:14:51,770 --> 00:14:54,100
each time drawing an independent
sample of your
297
00:14:54,100 --> 00:14:58,500
random variable x, in the long
run, the average that you are
298
00:14:58,500 --> 00:15:01,860
going to get should be
the expectation.
299
00:15:01,860 --> 00:15:04,740
One can reason in a hand-waving
way, sort of
300
00:15:04,740 --> 00:15:07,440
intuitively, the way we did it
for the case of discrete
301
00:15:07,440 --> 00:15:08,770
random variables.
302
00:15:08,770 --> 00:15:11,940
But this is also a theorem
of some sort.
303
00:15:11,940 --> 00:15:15,300
It's a limit theorem that we're
going to visit later on
304
00:15:15,300 --> 00:15:17,530
in this class.
305
00:15:17,530 --> 00:15:20,700
Having defined the expectation
and having claimed that the
306
00:15:20,700 --> 00:15:23,100
interpretation of the
expectation is the same as
307
00:15:23,100 --> 00:15:26,810
before, then we can start taking
just any formula you've
308
00:15:26,810 --> 00:15:28,580
seen before and just
translate it.
309
00:15:28,580 --> 00:15:31,200
So for example, to find the
expected value of a function
310
00:15:31,200 --> 00:15:35,430
of a continuous random variable,
you do not have to
311
00:15:35,430 --> 00:15:39,130
find the PDF or PMF of g(X).
312
00:15:39,130 --> 00:15:43,040
You can just work directly with
the original distribution
313
00:15:43,040 --> 00:15:44,990
of the random variable
capital X.
314
00:15:44,990 --> 00:15:48,570
And this formula is the same
as for the discrete case.
315
00:15:48,570 --> 00:15:50,880
Sums get replaced
by integrals.
316
00:15:50,880 --> 00:15:54,340
And PMFs get replaced by PDFs.
317
00:15:54,340 --> 00:15:57,050
And in particular, the variance
of a random variable
318
00:15:57,050 --> 00:15:59,080
is defined again the same way.
319
00:15:59,080 --> 00:16:03,390
The variance is the expected
value, the average of the
320
00:16:03,390 --> 00:16:07,920
distance of X from the mean
and then squared.
321
00:16:07,920 --> 00:16:10,690
So it's the expected value for
a random variable that takes
322
00:16:10,690 --> 00:16:12,500
these numerical values.
323
00:16:12,500 --> 00:16:17,250
And same formula as before,
integral and f instead of
324
00:16:17,250 --> 00:16:19,420
summation and p.
325
00:16:19,420 --> 00:16:23,090
And the formulas that we have
derived or formulas that you
326
00:16:23,090 --> 00:16:26,260
have seen for the discrete case,
they all go through in the
327
00:16:26,260 --> 00:16:27,090
continuous case.
328
00:16:27,090 --> 00:16:31,990
So for example, the useful
relation for variances, which
329
00:16:31,990 --> 00:16:37,410
is this one, remains true.
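For reference, the translated definitions can be written out as below. This is a sketch in assumed notation (f_X for the density). The transcript does not say which variance relation is being pointed to; the last equality is the standard shortcut formula and is probably the one meant.

```latex
\[
E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx,
\qquad
E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx,
\]
\[
\mathrm{var}(X) = E\big[(X - E[X])^2\big]
= \int_{-\infty}^{\infty} (x - E[X])^2\, f_X(x)\,dx
= E[X^2] - (E[X])^2 .
\]
```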
330
00:16:37,410 --> 00:16:37,850
All right.
331
00:16:37,850 --> 00:16:39,790
So time for an example.
332
00:16:39,790 --> 00:16:43,500
The most simple example of a
continuous random variable
333
00:16:43,500 --> 00:16:45,170
that there is, is the so-called
334
00:16:45,170 --> 00:16:48,670
uniform random variable.
335
00:16:48,670 --> 00:16:51,940
So the uniform random variable
is described by a density
336
00:16:51,940 --> 00:16:55,540
which is 0 except over
an interval.
337
00:16:55,540 --> 00:16:58,360
And over that interval,
it is constant.
338
00:16:58,360 --> 00:17:00,190
What is it meant to convey?
339
00:17:00,190 --> 00:17:04,829
It's trying to convey the idea
that all x's in this range are
340
00:17:04,829 --> 00:17:06,540
equally likely.
341
00:17:06,540 --> 00:17:08,390
Well, that doesn't
say very much.
342
00:17:08,390 --> 00:17:11,170
Any individual x has
0 probability.
343
00:17:11,170 --> 00:17:13,460
So it's conveying a little
more than that.
344
00:17:13,460 --> 00:17:18,000
What it is saying is that if I
take an interval of a given
345
00:17:18,000 --> 00:17:22,089
length delta, and I take another
interval of the same
346
00:17:22,089 --> 00:17:26,290
length, delta, under the uniform
distribution, these
347
00:17:26,290 --> 00:17:29,290
two intervals are going to have
the same probability.
348
00:17:29,290 --> 00:17:34,670
So being uniform means that
intervals of the same length have
349
00:17:34,670 --> 00:17:35,720
the same probability.
350
00:17:35,720 --> 00:17:40,390
So no interval is more likely
than any other to occur.
351
00:17:40,390 --> 00:17:44,200
And in that sense, it conveys
the idea of sort of complete
352
00:17:44,200 --> 00:17:45,100
randomness.
353
00:17:45,100 --> 00:17:48,430
Any little interval in our range
is equally likely as any
354
00:17:48,430 --> 00:17:49,830
other little interval.
355
00:17:49,830 --> 00:17:50,260
All right.
356
00:17:50,260 --> 00:17:53,870
So what's the formula
for this density?
357
00:17:53,870 --> 00:17:55,280
I only told you the range.
358
00:17:55,280 --> 00:17:57,490
What's the height?
359
00:17:57,490 --> 00:18:00,340
Well, the area under the density
must be equal to 1.
360
00:18:00,340 --> 00:18:02,700
Total probability
is equal to 1.
361
00:18:02,700 --> 00:18:07,100
And so the height, inescapably,
is going to be 1
362
00:18:07,100 --> 00:18:09,480
over (b minus a).
363
00:18:09,480 --> 00:18:14,880
That's the height that makes
the density integrate to 1.
364
00:18:14,880 --> 00:18:16,610
So that's the formula.
365
00:18:16,610 --> 00:18:21,240
And if you don't want to lose
one point in your exam, you
366
00:18:21,240 --> 00:18:25,946
have to say that it's
also 0, otherwise.
367
00:18:25,946 --> 00:18:27,794
OK.
368
00:18:27,794 --> 00:18:28,260
All right?
369
00:18:28,260 --> 00:18:31,760
That's sort of the
complete answer.
370
00:18:31,760 --> 00:18:35,590
How about the expected value
of this random variable?
371
00:18:35,590 --> 00:18:36,060
OK.
372
00:18:36,060 --> 00:18:39,730
You can find the expected value
in two different ways.
373
00:18:39,730 --> 00:18:42,400
One is to start with
the definition.
374
00:18:42,400 --> 00:18:45,220
And so you integrate
over the range of
375
00:18:45,220 --> 00:18:47,185
interest times the density.
376
00:18:47,185 --> 00:18:50,350
377
00:18:50,350 --> 00:18:55,460
And you figure out what that
integral is going to be.
378
00:18:55,460 --> 00:18:57,800
Or you can be a little
more clever.
379
00:18:57,800 --> 00:19:01,290
Since the center-of-gravity
interpretation is still true,
380
00:19:01,290 --> 00:19:03,890
it must be the center of gravity
of this picture.
381
00:19:03,890 --> 00:19:06,680
And the center of gravity is,
of course, the midpoint.
382
00:19:06,680 --> 00:19:11,740
Whenever you have symmetry,
the mean is always the
383
00:19:11,740 --> 00:19:20,630
midpoint of the diagram that
gives you the PDF.
384
00:19:20,630 --> 00:19:22,180
OK.
385
00:19:22,180 --> 00:19:24,870
So that's the expected
value of X.
386
00:19:24,870 --> 00:19:27,990
Finally, regarding the variance,
well, there you will
387
00:19:27,990 --> 00:19:30,240
have to do a little
bit of calculus.
388
00:19:30,240 --> 00:19:33,460
We can write down
the definition.
389
00:19:33,460 --> 00:19:35,930
So it's an integral
instead of a sum.
390
00:19:35,930 --> 00:19:40,590
A typical value of the random
variable minus the expected
391
00:19:40,590 --> 00:19:44,280
value, squared, times
the density.
392
00:19:44,280 --> 00:19:45,650
And we integrate.
393
00:19:45,650 --> 00:19:48,820
You do this integral, and you
find it's (b minus a) squared
394
00:19:48,820 --> 00:19:52,660
over that number, which
happens to be 12.
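The uniform results can be checked numerically. The sketch below uses a simple midpoint-rule integrator and the assumed endpoints a = 2, b = 5 (illustrative values, not from the lecture). It confirms that the density integrates to 1, that the mean is the midpoint, and that the variance matches (b minus a) squared over 12.

```python
def uniform_pdf(x, a, b):
    # Height 1 / (b - a) on [a, b], and 0 otherwise.
    return 1.0 / (b - a) if a <= x <= b else 0.0

def integrate(g, lo, hi, n=100_000):
    # Midpoint-rule approximation of the integral of g over [lo, hi].
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 5.0  # assumed endpoints, chosen only for illustration

total = integrate(lambda x: uniform_pdf(x, a, b), a, b)
mean = integrate(lambda x: x * uniform_pdf(x, a, b), a, b)
variance = integrate(lambda x: (x - mean) ** 2 * uniform_pdf(x, a, b), a, b)

print(total, mean, variance, (b - a) ** 2 / 12)
```

The standard deviation is the square root of the variance, (b minus a) over the square root of 12, which is why it scales with the width of the interval.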
395
00:19:52,660 --> 00:19:56,140
Maybe more interesting is the
standard deviation itself.
396
00:19:56,140 --> 00:19:59,140
397
00:19:59,140 --> 00:20:02,760
And you see that the standard
deviation is proportional to
398
00:20:02,760 --> 00:20:05,280
the width of that interval.
399
00:20:05,280 --> 00:20:07,850
This agrees with our intuition,
that the standard
400
00:20:07,850 --> 00:20:12,730
deviation is meant to capture a
sense of how spread out our
401
00:20:12,730 --> 00:20:14,000
distribution is.
402
00:20:14,000 --> 00:20:17,370
And the standard deviation has
the same units as the random
403
00:20:17,370 --> 00:20:19,040
variable itself.
404
00:20:19,040 --> 00:20:22,860
So it's sort of good to-- you
can interpret it in a
405
00:20:22,860 --> 00:20:27,180
reasonable way based
on that picture.
406
00:20:27,180 --> 00:20:30,890
OK, yes.
407
00:20:30,890 --> 00:20:38,280
Now, let's go up one level and
think about the following.
408
00:20:38,280 --> 00:20:41,740
So we have formulas for the
discrete case, formulas for
409
00:20:41,740 --> 00:20:42,690
the continuous case.
410
00:20:42,690 --> 00:20:44,420
So you can write them
side by side.
411
00:20:44,420 --> 00:20:47,100
One has sums, the other
has integrals.
412
00:20:47,100 --> 00:20:49,450
Suppose you want to make an
argument and say that
413
00:20:49,450 --> 00:20:52,160
something is true for every
random variable.
414
00:20:52,160 --> 00:20:55,770
You would essentially need to
do two separate proofs, for
415
00:20:55,770 --> 00:20:57,510
discrete and for continuous.
416
00:20:57,510 --> 00:21:00,400
Is there some way of dealing
with random variables just one
417
00:21:00,400 --> 00:21:05,130
at a time, in one shot, using
a sort of uniform notation?
418
00:21:05,130 --> 00:21:07,990
Is there a unifying concept?
419
00:21:07,990 --> 00:21:10,170
Luckily, there is one.
420
00:21:10,170 --> 00:21:12,400
It's the notion of the
cumulative distribution
421
00:21:12,400 --> 00:21:13,850
function of a random variable.
422
00:21:13,850 --> 00:21:16,400
423
00:21:16,400 --> 00:21:20,730
And it's a concept that applies
equally well to
424
00:21:20,730 --> 00:21:22,890
discrete and continuous
random variables.
425
00:21:22,890 --> 00:21:26,210
So it's an object that we can
use to describe distributions
426
00:21:26,210 --> 00:21:29,340
in both cases, using just
one piece of notation.
427
00:21:29,340 --> 00:21:32,070
428
00:21:32,070 --> 00:21:33,600
So what's the definition?
429
00:21:33,600 --> 00:21:36,290
It's the probability that the
random variable takes values
430
00:21:36,290 --> 00:21:39,030
less than or equal to a certain
number little x.
431
00:21:39,030 --> 00:21:41,440
So you go to the diagram, and
you see what's the probability
432
00:21:41,440 --> 00:21:44,060
that I'm falling to
the left of this.
433
00:21:44,060 --> 00:21:47,680
And you specify those
probabilities for all x's.
434
00:21:47,680 --> 00:21:51,400
In the continuous case, you
calculate those probabilities
435
00:21:51,400 --> 00:21:53,090
using the integral formula.
436
00:21:53,090 --> 00:21:55,730
So you integrate from
here up to x.
437
00:21:55,730 --> 00:21:58,850
In the discrete case, to find
the probability to the left of
438
00:21:58,850 --> 00:22:02,790
some point, you go here, and
you add probabilities again
439
00:22:02,790 --> 00:22:03,980
from the left.
440
00:22:03,980 --> 00:22:06,770
So the way that the cumulative
distribution function is
441
00:22:06,770 --> 00:22:10,010
calculated is a little different
in the continuous
442
00:22:10,010 --> 00:22:10,850
and discrete case.
443
00:22:10,850 --> 00:22:11,990
In one case you integrate.
444
00:22:11,990 --> 00:22:13,440
In the other, you sum.
445
00:22:13,440 --> 00:22:18,340
But leaving aside how it's being
calculated, what the
446
00:22:18,340 --> 00:22:22,530
concept is, it's the same
concept in both cases.
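Written out, the definition just described can be sketched as follows. The symbols F_X, f_X, and p_X (for the CDF, the density, and the PMF) are assumed notation here, since the slide itself is not reproduced in the transcript.

```latex
F_X(x) = \mathbf{P}(X \le x) =
\begin{cases}
\displaystyle \int_{-\infty}^{x} f_X(t)\,dt, & X \text{ continuous},\\[1.5ex]
\displaystyle \sum_{k \le x} p_X(k), & X \text{ discrete}.
\end{cases}
```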
447
00:22:22,530 --> 00:22:25,810
So let's see what the shape of
the cumulative distribution
448
00:22:25,810 --> 00:22:28,360
function would be in
the two cases.
449
00:22:28,360 --> 00:22:34,100
So here what we want is to
record for every little x the
450
00:22:34,100 --> 00:22:36,760
probability of falling
to the left of x.
451
00:22:36,760 --> 00:22:38,240
So let's start here.
452
00:22:38,240 --> 00:22:41,580
Probability of falling to
the left of here is 0--
453
00:22:41,580 --> 00:22:43,550
0, 0, 0.
454
00:22:43,550 --> 00:22:47,280
Once we get here and we start
moving to the right, the
455
00:22:47,280 --> 00:22:51,750
probability of falling to the
left of here is the area of
456
00:22:51,750 --> 00:22:53,610
this little rectangle.
457
00:22:53,610 --> 00:22:57,590
And the area of that little
rectangle increases linearly
458
00:22:57,590 --> 00:22:59,290
as I keep moving.
459
00:22:59,290 --> 00:23:03,780
So accordingly, the CDF
increases linearly until I get
460
00:23:03,780 --> 00:23:04,870
to that point.
461
00:23:04,870 --> 00:23:08,670
At that point, what's
the value of my CDF?
462
00:23:08,670 --> 00:23:09,020
1.
463
00:23:09,020 --> 00:23:11,400
I have accumulated all the
probability there is.
464
00:23:11,400 --> 00:23:13,180
I have integrated it.
465
00:23:13,180 --> 00:23:15,890
This total area has
to be equal to 1.
466
00:23:15,890 --> 00:23:18,780
So it reaches 1, and then
there's no more probability to
467
00:23:18,780 --> 00:23:20,040
be accumulated.
468
00:23:20,040 --> 00:23:23,170
It just stays at 1.
469
00:23:23,170 --> 00:23:28,050
So the value here
is equal to 1.
470
00:23:28,050 --> 00:23:30,270
OK.
471
00:23:30,270 --> 00:23:36,716
How would you find the density
if somebody gave you the CDF?
472
00:23:36,716 --> 00:23:39,570
The CDF is the integral
of the density.
473
00:23:39,570 --> 00:23:43,820
Therefore, the density is the
derivative of the CDF.
474
00:23:43,820 --> 00:23:46,190
So you look at this picture
and take the derivative.
475
00:23:46,190 --> 00:23:48,580
Derivative is 0 here, 0 here.
476
00:23:48,580 --> 00:23:51,330
And it's a constant
up there, which
477
00:23:51,330 --> 00:23:53,120
corresponds to that constant.
478
00:23:53,120 --> 00:23:56,900
So more generally, and an
important thing to know, is
479
00:23:56,900 --> 00:24:04,250
that the derivative of the CDF
is equal to the density--
480
00:24:04,250 --> 00:24:10,210
481
00:24:10,210 --> 00:24:14,170
almost, with a little
bit of an exception.
482
00:24:14,170 --> 00:24:15,800
What's the exception?
483
00:24:15,800 --> 00:24:19,200
At those places where the CDF
does not have a derivative--
484
00:24:19,200 --> 00:24:21,520
here where it has a corner--
485
00:24:21,520 --> 00:24:23,720
the derivative is undefined.
486
00:24:23,720 --> 00:24:26,030
And in some sense, the
density is also
487
00:24:26,030 --> 00:24:27,460
ambiguous at that point.
488
00:24:27,460 --> 00:24:31,860
Is my density at the endpoint,
is it 0 or is it 1?
489
00:24:31,860 --> 00:24:33,330
It doesn't really matter.
490
00:24:33,330 --> 00:24:36,670
If you change the density at
just a single point, it's not
491
00:24:36,670 --> 00:24:39,000
going to affect the
value of any
492
00:24:39,000 --> 00:24:41,530
integral you ever calculate.
493
00:24:41,530 --> 00:24:44,900
So the value of the density at
the endpoint, you can leave it
494
00:24:44,900 --> 00:24:47,390
as being ambiguous, or
you can specify it.
495
00:24:47,390 --> 00:24:49,130
It doesn't matter.
496
00:24:49,130 --> 00:24:53,590
So at all places where the
CDF has a derivative,
497
00:24:53,590 --> 00:24:54,970
this will be true.
498
00:24:54,970 --> 00:24:58,470
At those places where you have
corners, which do show up
499
00:24:58,470 --> 00:25:01,740
sometimes, well, you
don't really care.
500
00:25:01,740 --> 00:25:03,640
How about the discrete case?
501
00:25:03,640 --> 00:25:07,450
In the discrete case, the CDF
has a more peculiar shape.
502
00:25:07,450 --> 00:25:08,870
So let's do the calculation.
503
00:25:08,870 --> 00:25:10,440
We want to find the
probability of being
504
00:25:10,440 --> 00:25:11,920
to the left of here.
505
00:25:11,920 --> 00:25:13,970
That probability is 0, 0, 0.
506
00:25:13,970 --> 00:25:16,170
Once we cross that point, the
probability of being to the
507
00:25:16,170 --> 00:25:19,140
left of here is 1/6.
508
00:25:19,140 --> 00:25:22,030
So as soon as we cross the
point 1, we get the
509
00:25:22,030 --> 00:25:25,740
probability of 1/6, which means
that the size of the
510
00:25:25,740 --> 00:25:29,230
jump that we have here is 1/6.
511
00:25:29,230 --> 00:25:31,020
Now, question.
512
00:25:31,020 --> 00:25:35,175
At this point 1, which is the
correct value of the CDF?
513
00:25:35,175 --> 00:25:39,090
Is it 0, or is it 1/6?
514
00:25:39,090 --> 00:25:40,560
It's 1/6 because--
515
00:25:40,560 --> 00:25:42,540
you need to look carefully
at the definitions, the
516
00:25:42,540 --> 00:25:46,180
probability of X being less
than or equal to little x.
517
00:25:46,180 --> 00:25:49,230
If I take little x to be 1,
it's the probability that
518
00:25:49,230 --> 00:25:51,900
capital X is less than
or equal to 1.
519
00:25:51,900 --> 00:25:55,730
So it includes the event
that X is equal to 1.
520
00:25:55,730 --> 00:25:58,130
So it includes this
probability here.
521
00:25:58,130 --> 00:26:02,710
So at jump points, the correct
value of the CDF is going to
522
00:26:02,710 --> 00:26:04,650
be this one.
523
00:26:04,650 --> 00:26:08,130
And now as I trace, x is
going to the right.
524
00:26:08,130 --> 00:26:12,750
As soon as I cross this point,
I have added another 3/6
525
00:26:12,750 --> 00:26:14,180
probability.
526
00:26:14,180 --> 00:26:20,350
So that 3/6 causes a
jump to the CDF.
527
00:26:20,350 --> 00:26:23,280
And that determines
the new value.
528
00:26:23,280 --> 00:26:27,860
And finally, once I cross
the last point, I get
529
00:26:27,860 --> 00:26:31,631
another jump of 2/6.
530
00:26:31,631 --> 00:26:35,900
A general moral from these two
examples and these pictures.
531
00:26:35,900 --> 00:26:39,270
CDFs are well defined
in both cases.
532
00:26:39,270 --> 00:26:42,490
For the case of continuous
random variables, the CDF will
533
00:26:42,490 --> 00:26:45,000
be a continuous function.
534
00:26:45,000 --> 00:26:46,330
It starts from 0.
535
00:26:46,330 --> 00:26:49,760
It eventually goes to 1
and goes smoothly--
536
00:26:49,760 --> 00:26:54,100
well, continuously from smaller
to higher values.
537
00:26:54,100 --> 00:26:55,200
It can only go up.
538
00:26:55,200 --> 00:26:58,300
It cannot go down since we're
accumulating more and more
539
00:26:58,300 --> 00:27:00,230
probability as we are
going to the right.
540
00:27:00,230 --> 00:27:03,160
In the discrete case, again
it starts from 0,
541
00:27:03,160 --> 00:27:04,610
and it goes to 1.
542
00:27:04,610 --> 00:27:07,740
But it does it in a
staircase manner.
543
00:27:07,740 --> 00:27:13,050
And you get a jump at each place
where the PMF assigns a
544
00:27:13,050 --> 00:27:14,660
positive mass.
545
00:27:14,660 --> 00:27:19,560
So jumps in the CDF are
associated with point masses
546
00:27:19,560 --> 00:27:20,330
in our distribution.
547
00:27:20,330 --> 00:27:23,570
In the continuous case, we don't
have any point masses,
548
00:27:23,570 --> 00:27:25,470
so we do not have any
jumps either.
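The staircase behavior can be sketched in a few lines of Python. The masses 1/6, 3/6, and 2/6 are the ones from the example; only the point 1 is named in the lecture, so the locations 2 and 3 below are assumptions made purely for illustration, as is the function name.

```python
from fractions import Fraction

# Masses 1/6, 3/6, 2/6 are from the lecture example.
# The locations 2 and 3 are assumed; only the point 1 is named.
pmf = {1: Fraction(1, 6), 2: Fraction(3, 6), 3: Fraction(2, 6)}

def discrete_cdf(x):
    """Return P(X <= x) by summing every mass at or to the left of x."""
    return sum((p for k, p in pmf.items() if k <= x), Fraction(0))

# The CDF is flat between the mass points and jumps at each one.
for x in (0.5, 1, 1.5, 2, 2.5, 3, 4):
    print(x, discrete_cdf(x))
```

Because the comparison is `k <= x`, the value at a jump point already includes the mass sitting there, which matches the "less than or equal to" definition discussed above.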
549
00:27:25,470 --> 00:27:30,390
550
00:27:30,390 --> 00:27:33,300
Now, besides saving
us notation--
551
00:27:33,300 --> 00:27:36,020
we don't have to deal
with discrete
552
00:27:36,020 --> 00:27:39,000
and continuous twice--
553
00:27:39,000 --> 00:27:43,240
CDFs give us actually a little
more flexibility.
554
00:27:43,240 --> 00:27:46,840
Not all random variables are
continuous or discrete.
555
00:27:46,840 --> 00:27:49,790
You can cook up random variables
that are kind of
556
00:27:49,790 --> 00:27:53,410
neither or a mixture
of the two.
557
00:27:53,410 --> 00:27:59,540
An example would be, let's
say you play a game.
558
00:27:59,540 --> 00:28:03,620
And with a certain probability,
you get a certain
559
00:28:03,620 --> 00:28:05,690
number of dollars
in your hands.
560
00:28:05,690 --> 00:28:07,000
So you flip a coin.
561
00:28:07,000 --> 00:28:14,120
And with probability 1/2, you
get a reward of 1/2 dollars.
562
00:28:14,120 --> 00:28:18,430
And with probability 1/2, you
are led to a dark room where
563
00:28:18,430 --> 00:28:20,580
you spin a wheel of fortune.
564
00:28:20,580 --> 00:28:23,410
And that wheel of fortune gives
you a random reward
565
00:28:23,410 --> 00:28:25,610
between 0 and 1.
566
00:28:25,610 --> 00:28:28,600
So any of these outcomes
is possible.
567
00:28:28,600 --> 00:28:31,100
And the amount that you're
going to get,
568
00:28:31,100 --> 00:28:33,930
let's say, is uniform.
569
00:28:33,930 --> 00:28:35,640
So you flip a coin.
570
00:28:35,640 --> 00:28:38,360
And depending on the outcome of
the coin, either you get a
571
00:28:38,360 --> 00:28:43,530
certain value or you get a
value that ranges over a
572
00:28:43,530 --> 00:28:45,360
continuous interval.
573
00:28:45,360 --> 00:28:48,380
So what kind of random
variable is it?
574
00:28:48,380 --> 00:28:50,280
Is it continuous?
575
00:28:50,280 --> 00:28:54,100
Well, continuous random
variables assign 0 probability
576
00:28:54,100 --> 00:28:56,180
to individual points.
577
00:28:56,180 --> 00:28:58,020
Is it the case here?
578
00:28:58,020 --> 00:29:00,680
No, because you have positive
probability of
579
00:29:00,680 --> 00:29:04,740
obtaining 1/2 dollar.
580
00:29:04,740 --> 00:29:07,040
So our random variable
is not continuous.
581
00:29:07,040 --> 00:29:08,220
Is it discrete?
582
00:29:08,220 --> 00:29:11,600
It's not discrete, because our
random variable can take
583
00:29:11,600 --> 00:29:14,260
values also over a
continuous range.
584
00:29:14,260 --> 00:29:16,780
So we call such a random
variable a
585
00:29:16,780 --> 00:29:19,380
mixed random variable.
586
00:29:19,380 --> 00:29:27,200
If you were to draw its
distribution very loosely,
587
00:29:27,200 --> 00:29:33,740
probably you would want to draw
a picture like this one,
588
00:29:33,740 --> 00:29:36,710
which kind of conveys the
idea of what's going on.
589
00:29:36,710 --> 00:29:39,690
So just think of this as a
drawing of masses that are
590
00:29:39,690 --> 00:29:41,840
sitting over a table.
591
00:29:41,840 --> 00:29:47,940
We place an object that weighs
half a pound, but it's an
592
00:29:47,940 --> 00:29:50,230
object that takes zero space.
593
00:29:50,230 --> 00:29:53,720
So half a pound is just sitting
on top of that point.
594
00:29:53,720 --> 00:29:57,980
And we take another half-pound
of probability and spread it
595
00:29:57,980 --> 00:30:00,740
uniformly over that interval.
596
00:30:00,740 --> 00:30:04,820
So this is like a piece that
comes from mass functions.
597
00:30:04,820 --> 00:30:08,060
And that's a piece that looks
more like a density function.
598
00:30:08,060 --> 00:30:10,920
And we just throw them together
in the picture.
599
00:30:10,920 --> 00:30:13,150
I'm not trying to associate
any formal
600
00:30:13,150 --> 00:30:14,310
meaning with this picture.
601
00:30:14,310 --> 00:30:18,410
It's just a schematic of how
probabilities are distributed,
602
00:30:18,410 --> 00:30:20,860
help us visualize
what's going on.
603
00:30:20,860 --> 00:30:26,080
Now, if you have taken classes
on systems and all of that,
604
00:30:26,080 --> 00:30:29,890
you may have seen the concept
of an impulse function.
605
00:30:29,890 --> 00:30:33,630
And you may start saying that,
oh, I should treat this
606
00:30:33,630 --> 00:30:36,190
mathematically as a so-called
impulse function.
607
00:30:36,190 --> 00:30:39,400
But we do not need this for our
purposes in this class.
608
00:30:39,400 --> 00:30:43,860
Just think of this as a nice
picture that conveys what's
609
00:30:43,860 --> 00:30:46,200
going on in this particular
case.
610
00:30:46,200 --> 00:30:51,740
So now, what would the CDF
look like in this case?
611
00:30:51,740 --> 00:30:55,550
The CDF is always well defined,
no matter what kind
612
00:30:55,550 --> 00:30:57,220
of random variable you have.
613
00:30:57,220 --> 00:30:59,540
So the fact that it's not
continuous, it's not discrete
614
00:30:59,540 --> 00:31:01,870
shouldn't be a problem as
long as we can calculate
615
00:31:01,870 --> 00:31:04,120
probabilities of this kind.
616
00:31:04,120 --> 00:31:07,600
So the probability of falling
to the left here is 0.
617
00:31:07,600 --> 00:31:10,850
Once I start crossing there, the
probability of falling to
618
00:31:10,850 --> 00:31:13,890
the left of a point increases
linearly with
619
00:31:13,890 --> 00:31:15,610
how far I have gone.
620
00:31:15,610 --> 00:31:17,900
So we get this linear
increase.
621
00:31:17,900 --> 00:31:21,250
But as soon as I cross that
point, I accumulate another
622
00:31:21,250 --> 00:31:24,220
1/2 unit of probability
instantly.
623
00:31:24,220 --> 00:31:27,860
And once I accumulate that 1/2
unit, it means that my CDF is
624
00:31:27,860 --> 00:31:30,320
going to have a jump of 1/2.
625
00:31:30,320 --> 00:31:33,780
And then afterwards, I still
keep accumulating probability
626
00:31:33,780 --> 00:31:36,760
at a fixed rate, the rate
being the density.
627
00:31:36,760 --> 00:31:39,640
And I keep accumulating, again,
at a linear rate until
628
00:31:39,640 --> 00:31:42,160
I settle to 1.
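As a rough sketch, the CDF of this mixed random variable can be computed by weighting the two halves of the game. The function name and the way the pieces are split are illustrative choices, not something taken from the lecture.

```python
def mixed_cdf(x):
    """P(reward <= x) for the coin-flip game, built from its two halves."""
    # Continuous half: uniform on [0, 1], reached with probability 1/2.
    if x < 0:
        uniform_part = 0.0
    elif x < 1:
        uniform_part = x
    else:
        uniform_part = 1.0
    # Discrete half: a point mass at 1/2, counted once x reaches it.
    point_part = 1.0 if x >= 0.5 else 0.0
    return 0.5 * uniform_part + 0.5 * point_part

# Linear growth, a jump of 1/2 at x = 0.5, then linear growth up to 1.
for x in (-1.0, 0.25, 0.49, 0.5, 0.75, 1.0, 2.0):
    print(x, mixed_cdf(x))
```

Each half contributes its own CDF, and the overall CDF is their average because the coin is fair.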
629
00:31:42,160 --> 00:31:46,240
So this is a CDF that has
certain pieces where it
630
00:31:46,240 --> 00:31:48,060
increases continuously.
631
00:31:48,060 --> 00:31:50,280
And that corresponds to the
continuous part of our
632
00:31:50,280 --> 00:31:51,390
random variable.
633
00:31:51,390 --> 00:31:55,090
And it also has some places
where it has discrete jumps.
634
00:31:55,090 --> 00:31:57,500
And those discrete jumps
correspond to places in which
635
00:31:57,500 --> 00:32:00,990
we have placed a
positive mass.
636
00:32:00,990 --> 00:32:01,780
And by the--
637
00:32:01,780 --> 00:32:03,750
OK, yeah.
638
00:32:03,750 --> 00:32:06,580
So this little 0 shouldn't
be there.
639
00:32:06,580 --> 00:32:08,040
So let's cross it out.
640
00:32:08,040 --> 00:32:10,980
641
00:32:10,980 --> 00:32:11,780
All right.
642
00:32:11,780 --> 00:32:15,830
So finally, we're going to take
the remaining time and
643
00:32:15,830 --> 00:32:17,610
introduce our new friend.
644
00:32:17,610 --> 00:32:23,080
It's going to be the Gaussian
or normal distribution.
645
00:32:23,080 --> 00:32:27,690
So it's the most important
distribution there is in all
646
00:32:27,690 --> 00:32:28,940
of probability theory.
647
00:32:28,940 --> 00:32:31,230
It plays a very
central role.
648
00:32:31,230 --> 00:32:34,340
It shows up all over
the place.
649
00:32:34,340 --> 00:32:37,870
We'll see later in the
class in more detail
650
00:32:37,870 --> 00:32:39,450
why it shows up.
651
00:32:39,450 --> 00:32:42,115
But the quick preview
is the following.
652
00:32:42,115 --> 00:32:46,220
If you have a phenomenon in
which you measure a certain
653
00:32:46,220 --> 00:32:50,970
quantity, but that quantity is
made up of lots and lots of
654
00:32:50,970 --> 00:32:52,820
random contributions--
655
00:32:52,820 --> 00:32:55,870
so your random variable is
actually the sum of lots and
656
00:32:55,870 --> 00:32:59,570
lots of independent little
random variables--
657
00:32:59,570 --> 00:33:04,290
then invariably, no matter
what kind of distribution the
658
00:33:04,290 --> 00:33:08,260
little random variables have,
their sum will turn out to
659
00:33:08,260 --> 00:33:11,500
have approximately a normal
distribution.
660
00:33:11,500 --> 00:33:14,490
So this makes the normal
distribution to arise very
661
00:33:14,490 --> 00:33:16,680
naturally in lots and
lots of contexts.
662
00:33:16,680 --> 00:33:21,210
Whenever you have noise that's
comprised of lots of different
663
00:33:21,210 --> 00:33:26,310
independent pieces of noise,
then the end result will be a
664
00:33:26,310 --> 00:33:28,650
random variable that's normal.
665
00:33:28,650 --> 00:33:31,250
So we are going to come back
to that topic later.
666
00:33:31,250 --> 00:33:34,620
But that's the preview comment,
basically to argue
667
00:33:34,620 --> 00:33:37,430
that it's an important one.
668
00:33:37,430 --> 00:33:37,690
OK.
669
00:33:37,690 --> 00:33:38,810
And there's a special case.
670
00:33:38,810 --> 00:33:41,030
If you are dealing with a
binomial distribution, which
671
00:33:41,030 --> 00:33:44,610
is the sum of lots of Bernoulli
random variables,
672
00:33:44,610 --> 00:33:47,200
again you would expect that
the binomial would start
673
00:33:47,200 --> 00:33:51,170
looking like a normal if you
have many, many-- a large
674
00:33:51,170 --> 00:33:53,150
number of coin flips.
675
00:33:53,150 --> 00:33:53,530
All right.
676
00:33:53,530 --> 00:33:56,560
So what's the math
involved here?
677
00:33:56,560 --> 00:34:02,370
Let's parse the formula for
the density of the normal.
678
00:34:02,370 --> 00:34:07,110
What we start with is the
function X squared over 2.
679
00:34:07,110 --> 00:34:09,750
And if you are to plot X
squared over 2, it's a
680
00:34:09,750 --> 00:34:12,840
parabola, and it has
this shape --
681
00:34:12,840 --> 00:34:14,860
X squared over 2.
682
00:34:14,860 --> 00:34:16,790
Then what do we do?
683
00:34:16,790 --> 00:34:20,210
We take the negative exponential
of this.
684
00:34:20,210 --> 00:34:24,600
So when X squared over
2 is 0, then negative
685
00:34:24,600 --> 00:34:28,980
exponential is 1.
686
00:34:28,980 --> 00:34:32,739
When X squared over 2 increases,
the negative
687
00:34:32,739 --> 00:34:37,130
exponential of that falls off,
and it falls off pretty fast.
688
00:34:37,130 --> 00:34:39,630
So as this goes up, the
formula for the
689
00:34:39,630 --> 00:34:41,150
density goes down.
690
00:34:41,150 --> 00:34:45,060
And because exponentials are
pretty strong in how quickly
691
00:34:45,060 --> 00:34:49,530
they fall off, this means that
the tails of this distribution
692
00:34:49,530 --> 00:34:53,370
actually do go down
pretty fast.
693
00:34:53,370 --> 00:34:53,659
OK.
694
00:34:53,659 --> 00:34:57,800
So that explains the shape
of the normal PDF.
695
00:34:57,800 --> 00:35:02,340
How about this factor 1
over square root 2 pi?
696
00:35:02,340 --> 00:35:05,540
Where does this come from?
697
00:35:05,540 --> 00:35:08,760
Well, the integral has
to be equal to 1.
698
00:35:08,760 --> 00:35:14,620
So you have to go and do your
calculus exercise and find the
699
00:35:14,620 --> 00:35:18,350
integral of this e to the minus X
squared over 2 function and
700
00:35:18,350 --> 00:35:22,240
then figure out, what constant
do I need to put in front so
701
00:35:22,240 --> 00:35:24,250
that the integral
is equal to 1?
702
00:35:24,250 --> 00:35:26,820
How do you evaluate
that integral?
703
00:35:26,820 --> 00:35:30,760
Either you go to Mathematica
or Wolfram Alpha or
704
00:35:30,760 --> 00:35:33,340
whatever, and it tells
you what it is.
705
00:35:33,340 --> 00:35:37,260
Or it's a very beautiful
calculus exercise that you may
706
00:35:37,260 --> 00:35:39,050
have seen at some point.
707
00:35:39,050 --> 00:35:42,190
You throw in another exponential
of this kind, you
708
00:35:42,190 --> 00:35:46,520
bring in polar coordinates, and
somehow the answer comes
709
00:35:46,520 --> 00:35:48,010
beautifully out there.
710
00:35:48,010 --> 00:35:51,910
But in any case, this is the
constant that you need to make
711
00:35:51,910 --> 00:35:56,070
it integrate to 1 and to be
a legitimate density.
712
00:35:56,070 --> 00:35:58,550
We call this the standard
normal.
713
00:35:58,550 --> 00:36:02,280
And for the standard normal,
what is the expected value?
714
00:36:02,280 --> 00:36:05,780
Well, there's symmetry, so
it's equal to 0.
715
00:36:05,780 --> 00:36:07,490
What is the variance?
716
00:36:07,490 --> 00:36:09,740
Well, here there's
no shortcut.
717
00:36:09,740 --> 00:36:12,490
You have to do another
calculus exercise.
718
00:36:12,490 --> 00:36:17,080
And you find that the variance
is equal to 1.
719
00:36:17,080 --> 00:36:17,750
OK.
720
00:36:17,750 --> 00:36:21,720
So this is a normal that's
centered around 0.
721
00:36:21,720 --> 00:36:24,990
How about other types of normals
that are centered at
722
00:36:24,990 --> 00:36:26,760
different places?
723
00:36:26,760 --> 00:36:29,730
So we can do the same
kind of thing.
724
00:36:29,730 --> 00:36:34,080
Instead of centering it at 0,
we can take some place where
725
00:36:34,080 --> 00:36:39,640
we want to center it, write down
a quadratic such as (X
726
00:36:39,640 --> 00:36:44,050
minus mu) squared, and then
take the negative
727
00:36:44,050 --> 00:36:45,940
exponential of that.
728
00:36:45,940 --> 00:36:53,790
And that gives us a normal
density that's centered at mu.
729
00:36:53,790 --> 00:37:01,190
Now, I may wish to control
the width of my density.
730
00:37:01,190 --> 00:37:04,820
To control the width of my
density, equivalently I can
731
00:37:04,820 --> 00:37:07,720
control the width
of my parabola.
732
00:37:07,720 --> 00:37:15,430
If my parabola is narrower, if
my parabola looks like this,
733
00:37:15,430 --> 00:37:17,990
what's going to happen
to the density?
734
00:37:17,990 --> 00:37:20,550
It's going to fall
off much faster.
735
00:37:20,550 --> 00:37:26,620
736
00:37:26,620 --> 00:37:26,920
OK.
737
00:37:26,920 --> 00:37:31,150
How do I make my parabola
narrower or wider?
738
00:37:31,150 --> 00:37:35,300
I do it by putting in a
constant down here.
739
00:37:35,300 --> 00:37:39,890
So by putting a sigma here, this
stretches or widens my
740
00:37:39,890 --> 00:37:42,840
parabola by a factor of sigma.
741
00:37:42,840 --> 00:37:43,540
Let's see.
742
00:37:43,540 --> 00:37:44,780
Which way does it go?
743
00:37:44,780 --> 00:37:49,330
If sigma is very small,
this is a big number.
744
00:37:49,330 --> 00:37:55,080
My parabola goes up quickly,
which means my normal falls
745
00:37:55,080 --> 00:37:56,730
off very fast.
746
00:37:56,730 --> 00:38:02,630
So small sigma corresponds
to a narrower density.
747
00:38:02,630 --> 00:38:08,870
And so it, therefore, should be
intuitive that the standard
748
00:38:08,870 --> 00:38:11,520
deviation is proportional
to sigma.
749
00:38:11,520 --> 00:38:13,380
Because that's the amount
by which you
750
00:38:13,380 --> 00:38:15,080
are scaling the picture.
751
00:38:15,080 --> 00:38:17,320
And indeed, the standard
deviation is sigma.
752
00:38:17,320 --> 00:38:21,470
And so the variance
is sigma squared.
753
00:38:21,470 --> 00:38:26,590
So all that we have done here
to create a general normal
754
00:38:26,590 --> 00:38:31,180
with a given mean and variance
is to take this picture, shift
755
00:38:31,180 --> 00:38:35,600
it in space so that the mean
sits at mu instead of 0, and
756
00:38:35,600 --> 00:38:38,880
then scale it by a
factor of sigma.
757
00:38:38,880 --> 00:38:41,130
This gives us a normal
with a given
758
00:38:41,130 --> 00:38:42,560
mean and a given variance.
759
00:38:42,560 --> 00:38:47,670
And the formula for
it is this one.
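The slide with the formula is not reproduced in the transcript. The densities being described are the standard ones, written here with f_X as assumed notation for the density:

```latex
% Standard normal, N(0, 1)
f_X(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}

% General normal, N(\mu, \sigma^2)
f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}
```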
760
00:38:47,670 --> 00:38:48,810
All right.
761
00:38:48,810 --> 00:38:52,230
Now, normal random variables
have some wonderful
762
00:38:52,230 --> 00:38:54,160
properties.
763
00:38:54,160 --> 00:39:00,190
And one of them is that they
behave nicely when you take
764
00:39:00,190 --> 00:39:02,740
linear functions of them.
765
00:39:02,740 --> 00:39:07,190
So let's fix some constants
a and b, suppose that X is
766
00:39:07,190 --> 00:39:13,840
normal, and look at this
linear function Y.
767
00:39:13,840 --> 00:39:17,340
What is the expected
value of Y?
768
00:39:17,340 --> 00:39:19,220
Here we don't need
anything special.
769
00:39:19,220 --> 00:39:22,920
We know that the expected value
of a linear function is
770
00:39:22,920 --> 00:39:26,690
the linear function of
the expectation.
771
00:39:26,690 --> 00:39:30,570
So the expected value is this.
772
00:39:30,570 --> 00:39:33,230
How about the variance?
773
00:39:33,230 --> 00:39:36,430
We know that the variance of a
linear function doesn't care
774
00:39:36,430 --> 00:39:37,910
about the constant term.
775
00:39:37,910 --> 00:39:40,880
But the variance gets multiplied
by a squared.
776
00:39:40,880 --> 00:39:46,880
So we get this variance, where
sigma squared is the
777
00:39:46,880 --> 00:39:49,070
variance of the original
normal.
778
00:39:49,070 --> 00:39:53,730
So have we used so far the
property that X is normal?
779
00:39:53,730 --> 00:39:55,170
No, we haven't.
780
00:39:55,170 --> 00:39:59,650
This calculation here is true
in general when you take a
781
00:39:59,650 --> 00:40:02,650
linear function of a
random variable.
782
00:40:02,650 --> 00:40:08,730
But if X is normal, we get the
other additional fact that Y
783
00:40:08,730 --> 00:40:10,930
is also going to be normal.
784
00:40:10,930 --> 00:40:14,300
So that's the nontrivial
part of the fact that
785
00:40:14,300 --> 00:40:16,070
I'm claiming here.
786
00:40:16,070 --> 00:40:19,700
So linear functions of normal
random variables are
787
00:40:19,700 --> 00:40:23,020
themselves normal.
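In symbols, and assuming the linear function on the slide is Y = aX + b with X normal with mean mu and variance sigma squared, the claims above can be summarized as:

```latex
\begin{align*}
Y &= aX + b, \qquad X \sim N(\mu, \sigma^2),\\
\mathbf{E}[Y] &= a\mu + b, \qquad \operatorname{var}(Y) = a^2\sigma^2,\\
Y &\sim N(a\mu + b,\; a^2\sigma^2).
\end{align*}
```

The second line holds for any random variable X with that mean and variance. Only the third line uses normality.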
788
00:40:23,020 --> 00:40:26,680
How do we convince ourselves
about it?
789
00:40:26,680 --> 00:40:27,080
OK.
790
00:40:27,080 --> 00:40:31,390
It's something that we will do
formally in about two or three
791
00:40:31,390 --> 00:40:33,390
lectures from today.
792
00:40:33,390 --> 00:40:35,310
So we're going to prove it.
793
00:40:35,310 --> 00:40:39,770
But if you think about it
intuitively, normal means this
794
00:40:39,770 --> 00:40:42,070
particular bell-shaped curve.
795
00:40:42,070 --> 00:40:45,550
And that bell-shaped curve could
be sitting anywhere and
796
00:40:45,550 --> 00:40:47,910
could be scaled in any way.
797
00:40:47,910 --> 00:40:51,190
So you start with a
bell-shaped curve.
798
00:40:51,190 --> 00:40:55,370
If you take X, which is bell
shaped, and you multiply it by
799
00:40:55,370 --> 00:40:57,500
a constant, what does that do?
800
00:40:57,500 --> 00:41:01,260
Multiplying by a constant is
just like scaling the axis or
801
00:41:01,260 --> 00:41:03,750
changing the units with which
you're measuring it.
802
00:41:03,750 --> 00:41:08,880
So it will take a bell shape
and spread it or narrow it.
803
00:41:08,880 --> 00:41:10,850
But it will still
be a bell shape.
804
00:41:10,850 --> 00:41:13,440
And then when you add the
constant, you just take that
805
00:41:13,440 --> 00:41:16,260
bell and move it elsewhere.
806
00:41:16,260 --> 00:41:19,970
So under linear transformations,
bell shapes
807
00:41:19,970 --> 00:41:23,360
will remain bell shapes, just
sitting at a different place
808
00:41:23,360 --> 00:41:25,090
and with a different width.
809
00:41:25,090 --> 00:41:30,490
And that's sort of the intuition
of why normals remain normals
810
00:41:30,490 --> 00:41:32,096
under this kind of
transformation.
811
00:41:32,096 --> 00:41:35,100
812
00:41:35,100 --> 00:41:36,770
So why is this useful?
813
00:41:36,770 --> 00:41:37,960
Well, OK.
814
00:41:37,960 --> 00:41:39,890
We have a formula
for the density.
815
00:41:39,890 --> 00:41:43,750
But usually we want to calculate
probabilities.
816
00:41:43,750 --> 00:41:45,760
How will you calculate
probabilities?
817
00:41:45,760 --> 00:41:48,670
If I ask you, what's the
probability that the normal is
818
00:41:48,670 --> 00:41:51,380
less than 3, how
do you find it?
819
00:41:51,380 --> 00:41:54,830
You need to integrate the
density from minus
820
00:41:54,830 --> 00:41:57,300
infinity up to 3.
821
00:41:57,300 --> 00:42:03,230
Unfortunately, the integral of
the expression that shows up
822
00:42:03,230 --> 00:42:06,720
that you would have to
calculate, an integral of this
823
00:42:06,720 --> 00:42:12,690
kind from, let's say, minus
infinity to some number, is
824
00:42:12,690 --> 00:42:16,270
something that's not known
in closed form.
825
00:42:16,270 --> 00:42:23,490
So if you're looking for a
closed-form formula for this--
826
00:42:23,490 --> 00:42:25,040
X bar--
827
00:42:25,040 --> 00:42:27,890
if you're looking for a
closed-form formula that gives
828
00:42:27,890 --> 00:42:32,010
you the value of this integral
as a function of X bar, you're
829
00:42:32,010 --> 00:42:34,460
not going to find it.
830
00:42:34,460 --> 00:42:36,150
So what can we do?
831
00:42:36,150 --> 00:42:38,790
Well, since it's a useful
integral, we can
832
00:42:38,790 --> 00:42:40,880
just tabulate it.
833
00:42:40,880 --> 00:42:46,070
Calculate it once and for all,
for all values of X bar up to
834
00:42:46,070 --> 00:42:50,440
some precision, and have
that table, and use it.
835
00:42:50,440 --> 00:42:53,010
That's what one does.
836
00:42:53,010 --> 00:42:54,885
OK, but now there is a catch.
837
00:42:54,885 --> 00:42:59,600
Are we going to write down a
table for every conceivable
838
00:42:59,600 --> 00:43:01,870
type of normal distribution--
839
00:43:01,870 --> 00:43:05,115
that is, for every possible
mean and every variance?
840
00:43:05,115 --> 00:43:07,400
I guess that would be
a pretty long table.
841
00:43:07,400 --> 00:43:09,540
You don't want to do that.
842
00:43:09,540 --> 00:43:12,820
Fortunately, it's enough to
have a table with the
843
00:43:12,820 --> 00:43:17,590
numerical values only for
the standard normal.
844
00:43:17,590 --> 00:43:20,880
And once you have those, you can
use them in a clever way
845
00:43:20,880 --> 00:43:24,000
to calculate probabilities for
the more general case.
846
00:43:24,000 --> 00:43:26,090
So let's see how this is done.
847
00:43:26,090 --> 00:43:30,610
So our starting point is that
someone has graciously
848
00:43:30,610 --> 00:43:36,520
calculated for us the values
of the CDF, the cumulative
849
00:43:36,520 --> 00:43:40,350
distribution function, that is
the probability of falling
850
00:43:40,350 --> 00:43:44,120
below a certain point for
the standard normal
851
00:43:44,120 --> 00:43:46,610
and at various places.
852
00:43:46,610 --> 00:43:48,770
How do we read this table?
853
00:43:48,770 --> 00:43:55,840
The probability that X is
less than, let's say,
854
00:43:55,840 --> 00:43:59,170
0.63 is this number.
855
00:43:59,170 --> 00:44:04,610
This number, 0.7357, is the
probability that the standard
856
00:44:04,610 --> 00:44:08,070
normal is below 0.63.
857
00:44:08,070 --> 00:44:11,377
So the table refers to
the standard normal.
858
00:44:11,377 --> 00:44:15,990
859
00:44:15,990 --> 00:44:19,600
But someone, let's say, gives
us some other numbers and
860
00:44:19,600 --> 00:44:22,140
tells us we're dealing with a
normal with a certain mean and
861
00:44:22,140 --> 00:44:23,530
a certain variance.
862
00:44:23,530 --> 00:44:26,555
And we want to calculate the
probability that the value of
863
00:44:26,555 --> 00:44:28,740
that random variable is less
than or equal to 3.
864
00:44:28,740 --> 00:44:30,470
How are we going to do it?
865
00:44:30,470 --> 00:44:36,210
Well, there's a standard trick,
which is so-called
866
00:44:36,210 --> 00:44:39,480
standardizing a random
variable.
867
00:44:39,480 --> 00:44:41,350
Standardizing a random variable
868
00:44:41,350 --> 00:44:43,080
stands for the following.
869
00:44:43,080 --> 00:44:44,490
You look at the random
variable, and you
870
00:44:44,490 --> 00:44:46,280
subtract the mean.
871
00:44:46,280 --> 00:44:50,690
This makes it a random
variable with 0 mean.
872
00:44:50,690 --> 00:44:54,270
And then if I divide by the
standard deviation, what
873
00:44:54,270 --> 00:44:58,220
happens to the variance of
this random variable?
874
00:44:58,220 --> 00:45:03,860
Dividing by sigma divides the
variance by sigma squared.
875
00:45:03,860 --> 00:45:07,300
The original variance of
X was sigma squared.
876
00:45:07,300 --> 00:45:11,740
So when I divide by sigma, I
end up with unit variance.
877
00:45:11,740 --> 00:45:14,920
So after I do this
transformation, I get a random
878
00:45:14,920 --> 00:45:19,190
variable that has 0 mean
and unit variance.
879
00:45:19,190 --> 00:45:20,650
It is also normal.
880
00:45:20,650 --> 00:45:23,580
Why is it normal?
881
00:45:23,580 --> 00:45:28,890
Because this expression is a
linear function of the X that
882
00:45:28,890 --> 00:45:30,120
I started with.
883
00:45:30,120 --> 00:45:32,700
It's a linear function of a
normal random variable.
884
00:45:32,700 --> 00:45:34,620
Therefore, it is normal.
885
00:45:34,620 --> 00:45:37,090
And it is a standard normal.
886
00:45:37,090 --> 00:45:41,460
So by taking a general normal
random variable and doing this
887
00:45:41,460 --> 00:45:47,200
standardization, you end up
with a standard normal to
888
00:45:47,200 --> 00:45:49,580
which you can then
apply the table.
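The zero-mean and unit-variance claims can be written out in one line. The symbols are assumptions for illustration: mu and sigma stand for the mean and standard deviation of X, and Z is our name for the standardized variable.

```latex
Z = \frac{X - \mu}{\sigma}, \qquad
\mathbf{E}[Z] = \frac{\mathbf{E}[X] - \mu}{\sigma} = 0, \qquad
\operatorname{var}(Z) = \frac{\operatorname{var}(X)}{\sigma^{2}} = \frac{\sigma^{2}}{\sigma^{2}} = 1 .
```

Since Z is a linear function of the normal X, it is itself normal, so Z is normal (0, 1).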
889
00:45:49,580 --> 00:45:52,100
890
00:45:52,100 --> 00:45:56,180
Sometimes one calls this
the normalized score.
891
00:45:56,180 --> 00:45:59,100
If you're thinking about test
results, how would you
892
00:45:59,100 --> 00:46:00,780
interpret this number?
893
00:46:00,780 --> 00:46:05,440
It tells you how many standard
deviations you are
894
00:46:05,440 --> 00:46:07,900
away from the mean.
895
00:46:07,900 --> 00:46:10,470
This is how much you are
away from the mean.
896
00:46:10,470 --> 00:46:13,080
And you count it in terms
of how many standard
897
00:46:13,080 --> 00:46:14,390
deviations it is.
898
00:46:14,390 --> 00:46:19,680
So this number being equal to 3
tells you that X happens to
899
00:46:19,680 --> 00:46:23,160
be 3 standard deviations
above the mean.
900
00:46:23,160 --> 00:46:26,030
And I guess if you're looking
at your quiz scores, very
901
00:46:26,030 --> 00:46:30,690
often that's the kind of number
that you think about.
902
00:46:30,690 --> 00:46:32,130
So it's a useful quantity.
903
00:46:32,130 --> 00:46:35,120
But it's also useful for doing
the calculation we're now
904
00:46:35,120 --> 00:46:36,050
going to do.
905
00:46:36,050 --> 00:46:40,910
So suppose that X has a mean of
2 and a variance of 16, so
906
00:46:40,910 --> 00:46:43,600
a standard deviation of 4.
907
00:46:43,600 --> 00:46:46,030
And we're going to calculate the
probability of this event.
908
00:46:46,030 --> 00:46:49,900
This event is described in terms
of this X that has ugly
909
00:46:49,900 --> 00:46:51,530
means and variances.
910
00:46:51,530 --> 00:46:55,390
But we can take this event
and rewrite it as
911
00:46:55,390 --> 00:46:57,070
an equivalent event.
912
00:46:57,070 --> 00:47:01,470
X less than 3 is the same as
X minus 2 being less than 3
913
00:47:01,470 --> 00:47:06,410
minus 2, which is the same as
this ratio being less than
914
00:47:06,410 --> 00:47:08,440
that ratio.
915
00:47:08,440 --> 00:47:11,460
So I'm subtracting from both
sides of the inequality the
916
00:47:11,460 --> 00:47:14,170
mean and then dividing by
the standard deviation.
917
00:47:14,170 --> 00:47:16,190
This event is the same
as that event.
918
00:47:16,190 --> 00:47:19,430
Why do we like this
better than that?
919
00:47:19,430 --> 00:47:23,670
We like it because this is the
standardized, or normalized,
920
00:47:23,670 --> 00:47:28,660
version of X. We know that
this is standard normal.
921
00:47:28,660 --> 00:47:30,650
And so we're asking the
question, what's the
922
00:47:30,650 --> 00:47:34,130
probability that the standard
normal is less than this
923
00:47:34,130 --> 00:47:37,300
number, which is 1/4?
924
00:47:37,300 --> 00:47:45,380
So that's the key property, that
this is normal (0, 1).
925
00:47:45,380 --> 00:47:48,470
And so we can look up now with
the table and ask for the
926
00:47:48,470 --> 00:47:51,010
probability that the standard
normal random variable
927
00:47:51,010 --> 00:47:53,170
is less than 0.25.
928
00:47:53,170 --> 00:47:55,130
Where is that going to be?
929
00:47:55,130 --> 00:48:01,390
0.2, 0.25, it's here.
930
00:48:01,390 --> 00:48:09,600
So the answer is 0.5987.
931
00:48:09,600 --> 00:48:15,570
So I guess this is just a drill
that you could learn in
932
00:48:15,570 --> 00:48:16,190
high school.
933
00:48:16,190 --> 00:48:18,990
You didn't have to come here
to learn about it.
934
00:48:18,990 --> 00:48:22,030
But it's a drill that's very
useful because we will be
935
00:48:22,030 --> 00:48:24,060
calculating normal probabilities
all the time.
936
00:48:24,060 --> 00:48:27,300
So make sure you know how to
use the table and how to
937
00:48:27,300 --> 00:48:30,350
massage a general normal
random variable into a
938
00:48:30,350 --> 00:48:33,380
standard normal random
variable.
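The whole drill fits in a few lines. This is only a sketch: it uses the error function in place of the printed table, and the helper names `std_normal_cdf` and `normal_cdf` are ours.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Return P(Z <= z) for a standard normal Z (stands in for the table)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mean: float, variance: float) -> float:
    """Return P(X <= x) for a normal X by standardizing first."""
    sigma = math.sqrt(variance)   # standard deviation
    z = (x - mean) / sigma        # subtract the mean, divide by sigma
    return std_normal_cdf(z)      # now look up the standard normal


# The example from the lecture: mean 2, variance 16, event X <= 3.
z = (3.0 - 2.0) / math.sqrt(16.0)
p = normal_cdf(3.0, mean=2.0, variance=16.0)
print(f"standardized value: {z}, probability: {p:.4f}")
```

The standardized value is 1/4, and the probability should match the table entry for 0.25, about 0.5987.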
939
00:48:33,380 --> 00:48:33,790
OK.
940
00:48:33,790 --> 00:48:37,450
So just one more minute to look
at the big picture and
941
00:48:37,450 --> 00:48:40,940
take stock of what we
have done so far
942
00:48:40,940 --> 00:48:42,970
and where we're going.
943
00:48:42,970 --> 00:48:47,840
Chapter 2 was this part of the
picture, where we dealt with
944
00:48:47,840 --> 00:48:50,460
discrete random variables.
945
00:48:50,460 --> 00:48:54,590
And this time, today, we
started talking about
946
00:48:54,590 --> 00:48:56,410
continuous random variables.
947
00:48:56,410 --> 00:49:00,305
And we introduced the density
function, which is the analog
948
00:49:00,305 --> 00:49:03,160
of the probability
mass function.
949
00:49:03,160 --> 00:49:05,790
We have the concepts
of expectation and
950
00:49:05,790 --> 00:49:07,090
variance and CDF.
951
00:49:07,090 --> 00:49:10,290
And this kind of notation
applies to both discrete and
952
00:49:10,290 --> 00:49:11,720
continuous cases.
953
00:49:11,720 --> 00:49:17,310
They are calculated the same way
in both cases except that
954
00:49:17,310 --> 00:49:19,770
in the discrete case,
you use sums.
955
00:49:19,770 --> 00:49:22,740
In the continuous case,
you use integrals.
956
00:49:22,740 --> 00:49:25,320
So on that side, you
have integrals.
957
00:49:25,320 --> 00:49:27,500
In this case, you have sums.
958
00:49:27,500 --> 00:49:30,200
In this case, you always have
f's in your formulas.
959
00:49:30,200 --> 00:49:33,500
In this case, you always have
p's in your formulas.
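The parallel between sums and integrals can be seen by computing an expectation both ways. In this sketch the two examples (a fair die and a uniform on [0, 1]) are our own choices, and the integral is approximated by a midpoint Riemann sum rather than done in closed form.

```python
def discrete_mean(pmf: dict) -> float:
    """E[X] as a sum of x * p(x) over the values of a PMF."""
    return sum(x * p for x, p in pmf.items())


def continuous_mean(pdf, a: float, b: float, n: int = 100_000) -> float:
    """E[X] as the integral of x * f(x) over [a, b], via a midpoint sum."""
    dx = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * dx    # midpoint of the i-th slice
        total += x * pdf(x) * dx  # x * f(x) * dx
    return total


die_pmf = {k: 1.0 / 6.0 for k in range(1, 7)}  # fair six-sided die


def uniform_pdf(x: float) -> float:
    """Density of a uniform random variable on [0, 1]."""
    return 1.0


print(discrete_mean(die_pmf))                  # close to 3.5
print(continuous_mean(uniform_pdf, 0.0, 1.0))  # close to 0.5
```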
960
00:49:33,500 --> 00:49:37,890
So what's left
for us to do is to look at
961
00:49:37,890 --> 00:49:42,460
these two concepts, joint
probability mass functions and
962
00:49:42,460 --> 00:49:47,410
conditional mass functions, and
figure out what would be
963
00:49:47,410 --> 00:49:51,080
the equivalent concepts on
the continuous side.
964
00:49:51,080 --> 00:49:55,240
So we will need some notion of
a joint density when we're
965
00:49:55,240 --> 00:49:57,510
dealing with multiple
random variables.
966
00:49:57,510 --> 00:50:00,310
And we will also need the
concept of conditional
967
00:50:00,310 --> 00:50:03,430
density, again for the case of
continuous random variables.
968
00:50:03,430 --> 00:50:07,840
The intuition and the meaning
of these objects is going to
969
00:50:07,840 --> 00:50:14,120
be exactly the same as here,
only a little subtler because
970
00:50:14,120 --> 00:50:16,000
densities are not
probabilities.
971
00:50:16,000 --> 00:50:18,630
They're rates at which
probabilities accumulate.
972
00:50:18,630 --> 00:50:22,030
So that adds a little bit of
potential confusion here,
973
00:50:22,030 --> 00:50:24,680
which, hopefully, we will fully
resolve in the next
974
00:50:24,680 --> 00:50:26,490
couple of sections.
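One way to see that a density is a rate: the probability of a small interval of length delta near x is roughly the density at x times delta. The sketch below checks this for the standard normal; the point 0.25 and the width 0.01 are arbitrary choices of ours.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Density of the standard normal at x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    """Return P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def interval_probability(x: float, delta: float) -> float:
    """Exact P(x <= Z <= x + delta), as a difference of CDF values."""
    return std_normal_cdf(x + delta) - std_normal_cdf(x)


x, delta = 0.25, 0.01
exact = interval_probability(x, delta)
approx = std_normal_pdf(x) * delta  # density times interval length
print(f"exact: {exact:.6f}, density * delta: {approx:.6f}")
```

The density value itself (about 0.39 here) is not a probability; only after multiplying by the interval length does it approximate one.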
975
00:50:26,490 --> 00:50:27,310
All right.
976
00:50:27,310 --> 00:50:28,560
Thank you.
977
00:50:28,560 --> 00:50:29,070