1
00:00:01 --> 00:00:03
The following content is
provided under a Creative
2
00:00:03 --> 00:00:05
Commons license.
Your support will help MIT
3
00:00:05 --> 00:00:08
OpenCourseWare continue to offer
high quality educational
4
00:00:08 --> 00:00:13
resources for free.
To make a donation or to view
5
00:00:13 --> 00:00:18
additional materials from
hundreds of MIT courses,
6
00:00:18 --> 00:00:23
visit MIT OpenCourseWare at
ocw.mit.edu.
7
00:00:23 --> 00:00:28
Today we are going to see how
to use what we saw last time
8
00:00:28 --> 00:00:33
about partial derivatives to
handle minimization or
9
00:00:33 --> 00:00:41
maximization problems involving
functions of several variables.
10
00:00:41 --> 00:00:44
Remember last time we said that
when we have a function,
11
00:00:44 --> 00:00:49
say, of two variables, x and y,
then we have actually two
12
00:00:49 --> 00:00:53
different derivatives,
partial f, partial x,
13
00:00:53 --> 00:01:02
also called f sub x,
the derivative with respect to
14
00:01:02 --> 00:01:11
x keeping y constant.
And we have partial f,
15
00:01:11 --> 00:01:21
partial y, also called f sub y,
where we vary y and we keep x
16
00:01:21 --> 00:01:26
as a constant.
And now, one thing I didn't
17
00:01:26 --> 00:01:30
have time to tell you about but
hopefully you thought about in
18
00:01:30 --> 00:01:37
recitation yesterday,
is the approximation formula
19
00:01:37 --> 00:01:47
that tells you what happens if
you vary both x and y.
20
00:01:47 --> 00:01:50
f sub x tells us what happens
if we change x a little bit,
21
00:01:50 --> 00:01:53
by some small amount delta x.
f sub y tells us how f changes,
22
00:01:53 --> 00:01:56
if you change y by a small
amount delta y.
23
00:01:56 --> 00:02:00
If we do both at the same time
then the two effects will add up
24
00:02:00 --> 00:02:02
with each other,
because you can imagine that
25
00:02:02 --> 00:02:05
first you will change x and then
you will change y.
26
00:02:05 --> 00:02:12
Or the other way around.
It doesn't really matter.
27
00:02:12 --> 00:02:18
If we change x by a certain
amount delta x,
28
00:02:18 --> 00:02:23
and if we change y by the
amount delta y,
29
00:02:23 --> 00:02:32
and let's say that we have z=
f(x, y) then that changes by an
30
00:02:32 --> 00:02:40
amount which is approximately f
sub x times delta x plus f sub y
31
00:02:40 --> 00:02:45
times delta y.
And that is one of the most
32
00:02:45 --> 00:02:49
important formulas about partial
derivatives.
33
00:02:49 --> 00:02:54
The intuition for this,
again, is just the two effects
34
00:02:54 --> 00:02:58
of if I change x by a small
amount and then I change y.
35
00:02:58 --> 00:03:02
Well, first changing x will
modify f, how much does it
36
00:03:02 --> 00:03:06
modify f?
The answer is the rate of change
37
00:03:06 --> 00:03:09
is f sub x.
And if I change y then the rate
38
00:03:09 --> 00:03:13
of change of f when I change y
is f sub y.
39
00:03:13 --> 00:03:17
So all together I get this
change in the value of f.
40
00:03:17 --> 00:03:19
And, of course,
that is only an approximation
41
00:03:19 --> 00:03:22
formula.
Actually, there would be higher
42
00:03:22 --> 00:03:28
order terms involving second and
third derivatives and so on.
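A minimal numerical sketch of the approximation formula. The function f(x, y) = x^2 y, the base point, and the step sizes are made up for illustration and are not from the lecture. It compares the actual change in f with the estimate f_x delta x + f_y delta y:

```python
def f(x, y):
    # Hypothetical example function, chosen only for illustration.
    return x**2 * y

def f_x(x, y):
    # Partial derivative with respect to x (y held constant).
    return 2 * x * y

def f_y(x, y):
    # Partial derivative with respect to y (x held constant).
    return x**2

def linear_change(x, y, dx, dy):
    # The approximation formula: delta z ~ f_x * delta x + f_y * delta y.
    return f_x(x, y) * dx + f_y(x, y) * dy

x0, y0 = 1.0, 2.0          # assumed base point
dx, dy = 0.01, -0.02       # assumed small changes
actual = f(x0 + dx, y0 + dy) - f(x0, y0)
estimate = linear_change(x0, y0, dx, dy)
error = abs(actual - estimate)
print(actual, estimate, error)
```

The leftover error comes from the higher order terms, so it should shrink much faster than the steps themselves when delta x and delta y are made smaller.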
43
00:03:28 --> 00:03:43
One way to justify this --
Sorry.
44
00:03:43 --> 00:03:47
I was distracted by the
microphone.
45
00:03:47 --> 00:03:55
OK.
How do we justify this formula?
46
00:03:55 --> 00:04:05
Well, one way to think about it
is in terms of tangent plane
47
00:04:05 --> 00:04:10
approximation.
Let's think about the tangent
48
00:04:10 --> 00:04:13
plane with regard to a function
f.
49
00:04:13 --> 00:04:15
We have some pictures to show
you.
50
00:04:15 --> 00:04:20
It will be easier if I show you
pictures.
51
00:04:20 --> 00:04:24
Remember, partial f,
partial x was obtained by
52
00:04:24 --> 00:04:29
looking at the situation where y
is held constant.
53
00:04:29 --> 00:04:33
That means I am slicing the
graph of f by a plane that is
54
00:04:33 --> 00:04:35
parallel to the x,
z plane.
55
00:04:35 --> 00:04:39
And when I change x,
z changes, and the slope of
56
00:04:39 --> 00:04:44
that is going to be the
derivative with respect to x.
57
00:04:44 --> 00:04:49
Now, if I do the same in the
other direction then I will have
58
00:04:49 --> 00:04:53
similarly the slope in a slice
now parallel to the y,
59
00:04:53 --> 00:04:57
z plane that will be partial f,
partial y.
60
00:04:57 --> 00:05:00
In fact, in each case,
I have a line.
61
00:05:00 --> 00:05:02
And that line is tangent to the
surface.
62
00:05:02 --> 00:05:06
Now, if I have two lines
tangent to the surface,
63
00:05:06 --> 00:05:09
well, then together they
determine for me the tangent
64
00:05:09 --> 00:05:13
plane to the surface.
Let's try to see how that works.
65
00:05:13 --> 00:05:18
66
00:05:18 --> 00:05:28
We know that f sub x and f sub
y are the slopes of two tangent
67
00:05:28 --> 00:05:37
lines to this plane,
two tangent lines to the graph.
68
00:05:37 --> 00:05:39
And let's write down the
equations of these lines.
69
00:05:39 --> 00:05:41
I am not going to write
parametric equations.
70
00:05:41 --> 00:05:45
I am going to write them in
terms of x, y,
71
00:05:45 --> 00:05:49
z coordinates.
Let's say that partial f,
72
00:05:49 --> 00:05:53
partial x at the given point is
equal to a.
73
00:05:53 --> 00:06:00
That means that we have a line
given by the following
74
00:06:00 --> 00:06:05
conditions.
I am going to keep y constant
75
00:06:05 --> 00:06:07
equal to y0.
And I am going to change x.
76
00:06:07 --> 00:06:12
And, as I change x,
z will change at the rate that
77
00:06:12 --> 00:06:22
is equal to a.
That would be z = z0 + a(x - x0).
78
00:06:22 --> 00:06:26
That is how you would describe
a line that, I guess,
79
00:06:26 --> 00:06:30
the one that is plotted in
green here, being the intersection with
80
00:06:30 --> 00:06:33
the slice parallel to the x,
z plane.
81
00:06:33 --> 00:06:40
I hold y constant equal to y0.
And z is a function of x that
82
00:06:40 --> 00:06:50
varies with a rate of a.
And now if I look similarly at
83
00:06:50 --> 00:06:55
the other slice,
let's say that the partial with
84
00:06:55 --> 00:07:00
respect to y is equal to b,
then I get another line which
85
00:07:00 --> 00:07:06
is obtained by the fact that z
now will depend on y.
86
00:07:06 --> 00:07:10
And the rate of change with
respect to y will be b.
87
00:07:10 --> 00:07:15
While x is held constant equal
to x0.
88
00:07:15 --> 00:07:19
These two lines are both going
to be in the tangent plane to
89
00:07:19 --> 00:07:20
the surface.
90
00:07:20 --> 00:07:40
91
00:07:40 --> 00:07:45
They are both tangent to the
graph of f and together they
92
00:07:45 --> 00:07:47
determine the plane.
93
00:07:47 --> 00:07:56
94
00:07:56 --> 00:08:08
And that plane is just given by
the formula z = z0 + a(x - x0) + b
95
00:08:08 --> 00:08:13
(y - y0).
If you look at what happens --
96
00:08:13 --> 00:08:19
This is the equation of a plane.
z equals constant times x plus
97
00:08:19 --> 00:08:24
constant times y plus constant.
And if you look at what happens
98
00:08:24 --> 00:08:28
if I hold y constant and vary x,
I will get the first line.
99
00:08:28 --> 00:08:33
If I hold x constant and vary
y, I get the second line.
100
00:08:33 --> 00:08:34
Another way to do it,
of course,
101
00:08:34 --> 00:08:37
would be to actually write
parametric equations of these
102
00:08:37 --> 00:08:40
lines,
get vectors along them and then
103
00:08:40 --> 00:08:43
take the cross-product to get
the normal vector to the plane.
104
00:08:43 --> 00:08:47
And then get this equation for
the plane using the normal
105
00:08:47 --> 00:08:49
vector.
That also works and it gives
106
00:08:49 --> 00:08:53
you the same formula.
If you are curious, as an
107
00:08:53 --> 00:08:57
exercise, do it again using
parametrics and using
108
00:08:57 --> 00:09:01
cross-product to get the plane
equation.
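The cross-product route can be sketched as follows, as a rough check rather than a full derivation. The slopes a, b and the point (x0, y0, z0) are made-up values. The two tangent lines have direction vectors (1, 0, a) and (0, 1, b), and their cross-product should be a normal vector to the plane z = z0 + a(x - x0) + b(y - y0):

```python
def cross(u, v):
    # Cross-product of two 3D vectors given as tuples.
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def tangent_plane_z(x, y, x0, y0, z0, a, b):
    # Height of the tangent plane above the point (x, y).
    return z0 + a * (x - x0) + b * (y - y0)

a, b = 3.0, -2.0                # hypothetical values of f_x and f_y
x0, y0, z0 = 1.0, 1.0, 5.0      # hypothetical point on the graph

u = (1.0, 0.0, a)   # along the line with y = y0 (slope a in the x direction)
v = (0.0, 1.0, b)   # along the line with x = x0 (slope b in the y direction)
n = cross(u, v)     # works out to (-a, -b, 1)

def normal_residual(x, y, z):
    # Left-hand side of the plane equation n . (P - P0) = 0.
    return n[0] * (x - x0) + n[1] * (y - y0) + n[2] * (z - z0)

# A point of the tangent plane should satisfy the normal-vector equation.
x, y = 2.5, -0.5
residual = normal_residual(x, y, tangent_plane_z(x, y, x0, y0, z0, a, b))
print(n, residual)
```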
109
00:09:01 --> 00:09:03
That is how we get the tangent
plane.
110
00:09:03 --> 00:09:06
And now what this approximation
formula here says is that,
111
00:09:06 --> 00:09:10
in fact, the graph of a
function is close to the tangent
112
00:09:10 --> 00:09:12
plane.
If we were moving on the
113
00:09:12 --> 00:09:15
tangent plane,
this would be an actual
114
00:09:15 --> 00:09:17
equality.
Delta z would be a linear
115
00:09:17 --> 00:09:23
function of delta x and delta y.
And the graph of a function is
116
00:09:23 --> 00:09:27
near the tangent plane,
but is not quite the same,
117
00:09:27 --> 00:09:33
so it is only an approximation
for small delta x and small
118
00:09:33 --> 00:09:43
delta y.
The approximation formula says
119
00:09:43 --> 00:09:57
the graph of f is close to its
tangent plane.
120
00:09:57 --> 00:10:02
And we can use that formula
over here now to estimate how
121
00:10:02 --> 00:10:08
the value of f changes if I
change x and y at the same time.
122
00:10:08 --> 00:10:18
Questions about that?
Now that we have caught up with
123
00:10:18 --> 00:10:23
what we were supposed to see on
Tuesday, I can tell you now
124
00:10:23 --> 00:10:26
about max and min problems.
125
00:10:26 --> 00:10:38
126
00:10:38 --> 00:10:48
That is going to be an
application of partial
127
00:10:48 --> 00:11:00
derivatives to look at
optimization problems.
128
00:11:00 --> 00:11:03
Maybe ten years from now,
when you have a real job,
129
00:11:03 --> 00:11:07
your job might be to actually
minimize the cost of something
130
00:11:07 --> 00:11:11
or maximize the profit of
something or whatever.
131
00:11:11 --> 00:11:14
But typically the function that
you will have to strive to
132
00:11:14 --> 00:11:18
minimize or maximize will depend
on several variables.
133
00:11:18 --> 00:11:22
If you have a function of one
variable, you know that to find
134
00:11:22 --> 00:11:26
its minimum or its maximum you
look at the derivative and set
135
00:11:26 --> 00:11:29
that equal to zero.
And you try to then look at
136
00:11:29 --> 00:11:38
what happens to the function.
Here it is going to be kind of
137
00:11:38 --> 00:11:47
similar, except,
of course, we have several
138
00:11:47 --> 00:11:51
derivatives.
For today we will think about a
139
00:11:51 --> 00:11:56
function of two variables,
but it works exactly the same
140
00:11:56 --> 00:12:00
if you have three variables,
ten variables,
141
00:12:00 --> 00:12:07
a million variables.
The first observation is that
142
00:12:07 --> 00:12:17
if we have a local minimum or a
local maximum then both partial
143
00:12:17 --> 00:12:21
derivatives,
so partial f partial x and
144
00:12:21 --> 00:12:26
partial f partial y,
are both zero at the same time.
145
00:12:26 --> 00:12:30
Why is that?
Well, let's say that f sub x is
146
00:12:30 --> 00:12:32
zero.
That means when I vary x to
147
00:12:32 --> 00:12:35
first order the function doesn't
change.
148
00:12:35 --> 00:12:37
Maybe that is because it is
going through...
149
00:12:37 --> 00:12:42
If I look only at the slice
parallel to the x-axis then
150
00:12:42 --> 00:12:45
maybe I am going through the
minimum.
151
00:12:45 --> 00:12:48
But if partial f,
partial y is not 0 then
152
00:12:48 --> 00:12:51
actually, by changing y,
I could still make a value
153
00:12:51 --> 00:12:54
larger or smaller.
That wouldn't be an actual
154
00:12:54 --> 00:12:57
maximum or minimum.
It would only be a maximum or
155
00:12:57 --> 00:13:01
minimum if I stay in the slice.
But if I allow myself to change
156
00:13:01 --> 00:13:04
y that doesn't work.
I need actually to know that if
157
00:13:04 --> 00:13:07
I change y the value will not
change either to first order.
158
00:13:07 --> 00:13:11
That is why you also need
partial f, partial y to be zero.
159
00:13:11 --> 00:13:13
Now, let's say that they are
both zero.
160
00:13:13 --> 00:13:16
Well, why is that enough?
It is essentially enough
161
00:13:16 --> 00:13:20
because of this formula telling
me that if both of these guys
162
00:13:20 --> 00:13:24
are zero then to first order the
function doesn't change.
163
00:13:24 --> 00:13:26
Then, of course,
there will be maybe quadratic
164
00:13:26 --> 00:13:28
terms that will actually turn
that, you know,
165
00:13:28 --> 00:13:31
this won't really say that your
function is actually constant.
166
00:13:31 --> 00:13:35
It will just tell you that
maybe it will actually be
167
00:13:35 --> 00:13:40
quadratic or higher order in
delta x and delta y.
168
00:13:40 --> 00:13:52
That is what you expect to have
at a maximum or a minimum.
169
00:13:52 --> 00:14:05
The condition is the same thing
as saying that the tangent plane
170
00:14:05 --> 00:14:15
to the graph is actually going
to be horizontal.
171
00:14:15 --> 00:14:18
And that is what you want to
have.
172
00:14:18 --> 00:14:23
Say you have a minimum,
well, the tangent plane at this
173
00:14:23 --> 00:14:30
point, at the bottom of the
graph is going to be horizontal.
174
00:14:30 --> 00:14:35
And you can see that on this
equation of a tangent plane,
175
00:14:35 --> 00:14:40
when both these coefficients
are 0 that is when the equation
176
00:14:40 --> 00:14:44
becomes z equals constant:
the horizontal plane.
177
00:14:44 --> 00:14:50
Does that make sense?
We will have a name for this
178
00:14:50 --> 00:14:52
kind of point because,
actually,
179
00:14:52 --> 00:14:55
what we will see very soon is
that these conditions are
180
00:14:55 --> 00:14:57
necessary but are not
sufficient.
181
00:14:57 --> 00:15:02
There are actually other kinds
of points where the partial
182
00:15:02 --> 00:15:08
derivatives are zero.
Let's give a name to this.
183
00:15:08 --> 00:15:24
We say the definition is (x0,
y0) is a critical point of f --
184
00:15:24 --> 00:15:36
-- if the partial derivative,
with respect to x,
185
00:15:36 --> 00:15:44
and partial derivative with
respect to y are both zero.
186
00:15:44 --> 00:15:50
Generally, you would want all
the partial derivatives,
187
00:15:50 --> 00:15:56
no matter how many variables
you have, to be zero at the same
188
00:15:56 --> 00:16:06
time.
Let's see an example.
189
00:16:06 --> 00:16:23
Let's say I give you the
function f(x, y) = x^2 - 2xy + 3y^2
190
00:16:23 --> 00:16:28
+ 2x - 2y.
And let's try to figure out
191
00:16:28 --> 00:16:32
whether we can minimize or
maximize this.
192
00:16:32 --> 00:16:37
What we would start doing
immediately is taking the
193
00:16:37 --> 00:16:43
partial derivatives.
What is f sub x?
194
00:16:43 --> 00:16:56
It starts with 2x - 2y + 0 + 2.
Remember that y is a constant
195
00:16:56 --> 00:17:04
so this differentiates to zero.
Now, if we do f sub y,
196
00:17:04 --> 00:17:14
that is going to be 0 - 2x + 6y - 2.
And what we want to do is set
197
00:17:14 --> 00:17:17
these things to zero.
And we want to solve these two
198
00:17:17 --> 00:17:21
equations at the same time.
An important thing to remember,
199
00:17:21 --> 00:17:23
and maybe I should have told
you a couple of weeks ago
200
00:17:23 --> 00:17:25
already,
if you have two equations to
201
00:17:25 --> 00:17:28
solve, well,
it is very good to try to
202
00:17:28 --> 00:17:30
simplify them by adding them
together or whatever,
203
00:17:30 --> 00:17:33
but you must keep two equations.
If you have two equations,
204
00:17:33 --> 00:17:37
you shouldn't end up with just
one equation out of nowhere.
205
00:17:37 --> 00:17:40
For example here,
we can certainly simplify
206
00:17:40 --> 00:17:46
things by summing them together.
If we add them together,
207
00:17:46 --> 00:17:52
well, the x's cancel and the
constants cancel.
208
00:17:52 --> 00:17:56
In fact, we are just left with
4y = 0.
209
00:17:56 --> 00:18:00
That is pretty good.
That tells us y should be zero.
210
00:18:00 --> 00:18:02
But then we should,
of course, go back to these and
211
00:18:02 --> 00:18:07
see what else we know.
Well, now it tells us,
212
00:18:07 --> 00:18:14
if you put y = 0 it tells you
2x + 2 = 0.
213
00:18:14 --> 00:18:26
That tells you x = - 1.
We have one critical point that
214
00:18:26 --> 00:18:33
is (x, y) = (-1,
0).
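As a cross-check of this critical point, here is a small sketch that solves the two equations f_x = 2x - 2y + 2 = 0 and f_y = -2x + 6y - 2 = 0 together. The helper name solve_2x2 is made up for this example, and Cramer's rule is used in place of the add-the-equations trick from the lecture:

```python
from fractions import Fraction

def solve_2x2(a11, a12, b1, a21, a22, b2):
    # Solve a11*x + a12*y = b1 and a21*x + a22*y = b2 by Cramer's rule.
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("no unique solution")
    x = Fraction(b1 * a22 - a12 * b2, det)
    y = Fraction(a11 * b2 - b1 * a21, det)
    return x, y

# f_x = 0  is  2x - 2y = -2
# f_y = 0  is  -2x + 6y = 2
x, y = solve_2x2(2, -2, -2, -2, 6, 2)
print(x, y)

def f_x(x, y):
    return 2 * x - 2 * y + 2

def f_y(x, y):
    return -2 * x + 6 * y - 2
```

Both equations are kept throughout, which matches the warning above about never ending up with a single equation.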
215
00:18:33 --> 00:18:39
Any questions so far?
No.
216
00:18:39 --> 00:18:40
Well, you should have a
question.
217
00:18:40 --> 00:18:49
The question should be how do
we know if it is a maximum or a
218
00:18:49 --> 00:18:53
minimum?
Yeah.
219
00:18:53 --> 00:18:55
If we had a function of one
variable, we would decide things
220
00:18:55 --> 00:18:58
based on the second derivative.
And, in fact,
221
00:18:58 --> 00:19:00
we will see tomorrow how to do
things based on the second
222
00:19:00 --> 00:19:03
derivative.
But that is kind of tricky
223
00:19:03 --> 00:19:06
because there are a lot of
second derivatives.
224
00:19:06 --> 00:19:09
I mean we already have two
first derivatives.
225
00:19:09 --> 00:19:14
You can imagine that if you
keep taking partials you may end
226
00:19:14 --> 00:19:17
up with more and more,
so we will have to figure out
227
00:19:17 --> 00:19:19
carefully what the condition
should be.
228
00:19:19 --> 00:19:27
We will do that tomorrow.
For now, let's just try to look
229
00:19:27 --> 00:19:38
a bit at how do we understand
these things by hand?
230
00:19:38 --> 00:19:42
In fact, let me point out to
you immediately that there is
231
00:19:42 --> 00:19:49
more than maxima and minima.
Remember, we saw the example of
232
00:19:49 --> 00:19:52
x^2 + y^2.
That has a critical point.
233
00:19:52 --> 00:19:56
That critical point is
obviously a minimum.
234
00:19:56 --> 00:19:58
And, of course,
it could be a local minimum
235
00:19:58 --> 00:20:01
because it could be that if you
have a more complicated function
236
00:20:01 --> 00:20:04
there is indeed a minimum here,
but then elsewhere the function
237
00:20:04 --> 00:20:08
drops to a lower value.
We call that just a local
238
00:20:08 --> 00:20:12
minimum to say that it is a
minimum if you stick to values
239
00:20:12 --> 00:20:15
that are close enough to that
point.
240
00:20:15 --> 00:20:19
Of course, you also have local
maximum, which I didn't plot,
241
00:20:19 --> 00:20:23
but it is easy to plot.
That is a local maximum.
242
00:20:23 --> 00:20:27
But there is a third example of
critical point,
243
00:20:27 --> 00:20:31
and that is a saddle point.
The saddle point,
244
00:20:31 --> 00:20:35
it is a new phenomenon that you
don't really see in single
245
00:20:35 --> 00:20:38
variable calculus.
It is a critical point that is
246
00:20:38 --> 00:20:42
neither a minimum nor a maximum
because, depending on which
247
00:20:42 --> 00:20:46
direction you look in,
it's either one or the other.
248
00:20:46 --> 00:20:50
See the point in the middle,
at the origin,
249
00:20:50 --> 00:20:55
is a saddle point.
If you look at the tangent
250
00:20:55 --> 00:20:58
plane to this graph,
you will see that it is
251
00:20:58 --> 00:21:01
actually horizontal at the
origin.
252
00:21:01 --> 00:21:05
You have this mountain pass
where the ground is horizontal.
253
00:21:05 --> 00:21:08
But, depending on which
direction you go,
254
00:21:08 --> 00:21:12
you go up or down.
So, we say that a point is a
255
00:21:12 --> 00:21:16
saddle point if it is neither a
minimum nor a maximum.
256
00:21:16 --> 00:21:30
257
00:21:30 --> 00:21:38
Possibilities could be a local
min, a local max or a saddle.
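A quick sketch of the difference between a minimum and a saddle, using the two example surfaces x^2 + y^2 and y^2 - x^2. The step size h is an arbitrary small number chosen for illustration. Both have a critical point at the origin, but only the second one goes down in one direction and up in another:

```python
def bowl(x, y):
    # x^2 + y^2: local (in fact global) minimum at the origin.
    return x**2 + y**2

def saddle(x, y):
    # y^2 - x^2: saddle point at the origin.
    return y**2 - x**2

h = 0.1  # arbitrary small step away from the origin

# Change in the saddle function when moving along each axis.
saddle_along_x = saddle(h, 0.0) - saddle(0.0, 0.0)   # goes down
saddle_along_y = saddle(0.0, h) - saddle(0.0, 0.0)   # goes up

# Change in the bowl function when moving along each axis.
bowl_along_x = bowl(h, 0.0) - bowl(0.0, 0.0)         # goes up
bowl_along_y = bowl(0.0, h) - bowl(0.0, 0.0)         # goes up

print(saddle_along_x, saddle_along_y, bowl_along_x, bowl_along_y)
```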
258
00:21:38 --> 00:21:42
Tomorrow we will see how to
decide which one it is,
259
00:21:42 --> 00:21:46
in general, using second
derivatives.
260
00:21:46 --> 00:21:50
For this time,
let's just try to do it by
261
00:21:50 --> 00:21:53
hand.
I just want to observe,
262
00:21:53 --> 00:21:57
in fact, I can try to,
you know,
263
00:21:57 --> 00:21:58
these examples that I have
here,
264
00:21:58 --> 00:22:02
they are x^2 + y^2, y^2 - x^2,
they are sums or differences of
265
00:22:02 --> 00:22:05
squares.
And, if we know that we can put
266
00:22:05 --> 00:22:08
things as sum of squares for
example, we will be done.
267
00:22:08 --> 00:22:16
Let's try to express this maybe
as the square of something.
268
00:22:16 --> 00:22:21
The main problem is this 2xy.
Observe we know something that
269
00:22:21 --> 00:22:26
starts with x^2 - 2xy but is
actually a square of something
270
00:22:26 --> 00:22:32
else.
It would be x^2 - 2xy + y^2,
271
00:22:32 --> 00:22:37
not plus 3y^2.
Let's try that.
272
00:22:37 --> 00:22:48
So, we are going to complete
the square.
273
00:22:48 --> 00:22:53
I am going to say it is x minus
y squared, so it gives me the
274
00:22:53 --> 00:23:01
first two terms and also the y^2.
Well, I still need to add two
275
00:23:01 --> 00:23:09
more y^2, and I also need to
add, of course,
276
00:23:09 --> 00:23:15
the 2x and - 2y.
It is still not simple enough
277
00:23:15 --> 00:23:19
for my taste.
I can actually do better.
278
00:23:19 --> 00:23:24
These guys look like a sum of
squares, but here I have this
279
00:23:24 --> 00:23:28
extra stuff, 2x - 2y.
Well, that is 2 (x - y).
280
00:23:28 --> 00:23:32
It looks like maybe we can
modify this and make this into
281
00:23:32 --> 00:23:36
another square.
So, in fact,
282
00:23:36 --> 00:23:45
I can simplify this further to
(x - y + 1)^2.
283
00:23:45 --> 00:23:51
That would be (x - y)^2 + 2(x -
y), and then there is a plus
284
00:23:51 --> 00:23:55
one.
Well, we don't have a plus one
285
00:23:55 --> 00:24:00
so let's remove it by
subtracting one.
286
00:24:00 --> 00:24:07
And I still have my 2y^2.
Do you see why this is the same
287
00:24:07 --> 00:24:13
function?
Yeah.
288
00:24:13 --> 00:24:19
Again, if I expand x minus y
plus one squared,
289
00:24:19 --> 00:24:28
I get (x - y)^2 + 2(x - y) + 1.
But I will have minus one that
290
00:24:28 --> 00:24:34
will cancel out and then I have
a plus 2y^2.
291
00:24:34 --> 00:24:41
Now, what I have is a sum of
two squares minus one.
292
00:24:41 --> 00:24:44
And this critical point,
(x, y) = (-1, 0),
293
00:24:44 --> 00:24:49
that is actually when this is
zero and that is zero,
294
00:24:49 --> 00:24:55
so that is the smallest value.
This is always greater or equal
295
00:24:55 --> 00:25:00
to zero, the same with that one,
so that is always at least
296
00:25:00 --> 00:25:03
minus one.
And minus one happens to be the
297
00:25:03 --> 00:25:13
value at the critical point.
So, it is a minimum.
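The completed square can be sanity-checked numerically. This sketch compares f(x, y) = x^2 - 2xy + 3y^2 + 2x - 2y with (x - y + 1)^2 + 2y^2 - 1 on a small grid of integer points. The grid is an arbitrary choice, so this is a spot check and not a proof:

```python
def f(x, y):
    # The function from the example.
    return x**2 - 2*x*y + 3*y**2 + 2*x - 2*y

def g(x, y):
    # The same function after completing the square.
    return (x - y + 1)**2 + 2*y**2 - 1

grid = range(-5, 6)  # arbitrary integer sample points

# The two expressions should agree at every sample point.
same = all(f(x, y) == g(x, y) for x in grid for y in grid)

# Smallest sampled value, together with the point where it occurs.
smallest = min((f(x, y), x, y) for x in grid for y in grid)
print(same, smallest)
```

On this grid the smallest value is found at the critical point, which agrees with the argument that both squares vanish only there.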
298
00:25:13 --> 00:25:16
Now, of course here I was very
lucky.
299
00:25:16 --> 00:25:19
I mean, generally,
I couldn't expect things to
300
00:25:19 --> 00:25:21
simplify that much.
In fact, I cheated.
301
00:25:21 --> 00:25:26
I started from that,
I expanded, and then that is
302
00:25:26 --> 00:25:30
how I got my example.
The general method will be a
303
00:25:30 --> 00:25:32
bit different,
but you will see it will
304
00:25:32 --> 00:25:34
actually also involve completing
squares.
305
00:25:34 --> 00:25:42
Just there is more to it than
what we have seen.
306
00:25:42 --> 00:25:48
We will come back to this
tomorrow.
307
00:25:48 --> 00:25:56
Sorry?
How do I know that this equals
308
00:25:56 --> 00:26:09
-- How do I know that the whole
function is greater or equal to
309
00:26:09 --> 00:26:15
negative one?
Well, I wrote f of x,
310
00:26:15 --> 00:26:20
y as something squared plus
2y^2 - 1.
311
00:26:20 --> 00:26:25
This squared is always a
positive number and not a
312
00:26:25 --> 00:26:27
negative.
It is a square.
313
00:26:27 --> 00:26:30
The square of something is
always non-negative.
314
00:26:30 --> 00:26:34
Similarly, y^2 is also always
non-negative.
315
00:26:34 --> 00:26:38
So if you add something that is
at least zero plus something
316
00:26:38 --> 00:26:40
that is at least zero and you
subtract one,
317
00:26:40 --> 00:26:43
you get always at least minus
one.
318
00:26:43 --> 00:26:48
And, in fact,
the only way you can get minus
319
00:26:48 --> 00:26:54
one is if both of these guys are
zero at the same time.
320
00:26:54 --> 00:27:17
That is how I get my minimum.
More about this tomorrow.
321
00:27:17 --> 00:27:20
In fact,
what I would like to tell you
322
00:27:20 --> 00:27:23
about now instead is a nice
application of min,
323
00:27:23 --> 00:27:27
max problems that maybe you
don't think of as a min,
324
00:27:27 --> 00:27:31
max problem that you will see.
I mean you will think of it
325
00:27:31 --> 00:27:35
that way because probably your
calculator can do it for you or,
326
00:27:35 --> 00:27:37
if not, your computer can do it
for you.
327
00:27:37 --> 00:27:42
But it is actually something
where the theory is based on
328
00:27:42 --> 00:27:47
minimization in two variables.
Very often in experimental
329
00:27:47 --> 00:27:52
sciences you have to do
something called least-squares
330
00:27:52 --> 00:28:01
interpolation.
And what is that about?
331
00:28:01 --> 00:28:07
Well, it is the idea that maybe
you do some experiments and you
332
00:28:07 --> 00:28:11
record some data.
You have some data x and some
333
00:28:11 --> 00:28:13
data y.
And, I don't know,
334
00:28:13 --> 00:28:17
maybe, for example,
x is -- Maybe you're measuring
335
00:28:17 --> 00:28:21
frogs and you're trying to
measure how big the frog leg is
336
00:28:21 --> 00:28:23
compared to the eyes of the
frog,
337
00:28:23 --> 00:28:26
or you're trying to measure
something.
338
00:28:26 --> 00:28:30
And if you are doing chemistry
then it could be how much you
339
00:28:30 --> 00:28:35
put of some reactant and how
much of the output product that
340
00:28:35 --> 00:28:37
you wanted to synthesize
generated.
341
00:28:37 --> 00:28:43
All sorts of things.
Make up your own example.
342
00:28:43 --> 00:28:46
You measure basically,
for various values of x,
343
00:28:46 --> 00:28:48
what the value of y ends up
being.
344
00:28:48 --> 00:28:52
And then you like to claim
these points are kind of
345
00:28:52 --> 00:28:53
aligned.
And, of course,
346
00:28:53 --> 00:28:55
to a mathematician they are not
aligned.
347
00:28:55 --> 00:28:57
But, to an experimental
scientist, that is evidence that
348
00:28:57 --> 00:29:00
there is a relation between the
two.
349
00:29:00 --> 00:29:03
And so you want to claim -- And
in your paper you will actually
350
00:29:03 --> 00:29:05
draw a nice little line like
that.
351
00:29:05 --> 00:29:10
The two quantities depend linearly
on each other.
352
00:29:10 --> 00:29:15
The question is how do we come
up with that nice line that
353
00:29:15 --> 00:29:19
passes smack in the middle of
the points?
354
00:29:19 --> 00:29:27
The question is,
given experimental data xi,
355
00:29:27 --> 00:29:36
yi -- Maybe I should actually
be more precise.
356
00:29:36 --> 00:29:37
You are given some experimental
data.
357
00:29:37 --> 00:29:45
You have data points x1,
y1, x2, y2 and so on,
358
00:29:45 --> 00:29:52
xn, yn,
the question would be find the
359
00:29:52 --> 00:30:00
"best fit"
line of the form y = ax + b
360
00:30:00 --> 00:30:08
that somehow approximates very
well this data.
361
00:30:08 --> 00:30:11
You can also use that right
away to predict various things.
362
00:30:11 --> 00:30:13
For example,
if you look at your new
363
00:30:13 --> 00:30:17
homework,
actually the first problem asks
364
00:30:17 --> 00:30:22
you to predict how many iPods
will be on this planet in ten
365
00:30:22 --> 00:30:28
years looking at past sales and
how they behave.
366
00:30:28 --> 00:30:31
One thing, right away,
before you lose all the money
367
00:30:31 --> 00:30:35
that you don't have yet,
you cannot use that to predict
368
00:30:35 --> 00:30:39
the stock market.
So, don't try to use that to
369
00:30:39 --> 00:30:52
make money.
It doesn't work.
370
00:30:52 --> 00:30:58
One tricky thing here that I
want to draw your attention to
371
00:30:58 --> 00:31:02
is what are the unknowns here?
The natural answer would be to
372
00:31:02 --> 00:31:03
say that the unknowns are x and
y.
373
00:31:03 --> 00:31:07
That is not actually the case.
We are not going to solve for
374
00:31:07 --> 00:31:09
some x and y.
I mean we have some values
375
00:31:09 --> 00:31:12
given to us.
And, when we are looking for
376
00:31:12 --> 00:31:16
that line, we don't really care
about any particular value of x.
377
00:31:16 --> 00:31:21
What we care about is actually
these coefficients a and b that
378
00:31:21 --> 00:31:26
will tell us what the relation
is between x and y.
379
00:31:26 --> 00:31:30
In fact, we are trying to solve
for a and b that will give us
380
00:31:30 --> 00:31:34
the nicest possible line for
these points.
381
00:31:34 --> 00:31:36
The unknowns,
in our equations,
382
00:31:36 --> 00:31:39
will have to be a and b,
not x and y.
383
00:31:39 --> 00:32:11
384
00:32:11 --> 00:32:20
The question really is find the
"best"
385
00:32:20 --> 00:32:23
a and b.
And, of course,
386
00:32:23 --> 00:32:26
we have to decide what we mean
by best.
387
00:32:26 --> 00:32:30
Best will mean that we minimize
some function of a and b that
388
00:32:30 --> 00:32:34
measures the total errors that
we are making when we are
389
00:32:34 --> 00:32:38
choosing this line compared to
the experimental data.
390
00:32:38 --> 00:32:43
Maybe, roughly speaking,
it should measure how far these
391
00:32:43 --> 00:32:49
points are from the line.
But now there are various ways
392
00:32:49 --> 00:32:52
to do it.
And a lot of them are valid, but
393
00:32:52 --> 00:32:57
they give you different answers.
You have to decide what it is
394
00:32:57 --> 00:32:59
that you prefer.
For example,
395
00:32:59 --> 00:33:04
you could measure the distance
to the line by projecting
396
00:33:04 --> 00:33:08
perpendicularly.
Or you could measure instead,
397
00:33:08 --> 00:33:13
for a given value of x,
the difference between the
398
00:33:13 --> 00:33:17
experimental value of y and the
predicted one.
399
00:33:17 --> 00:33:21
And that is often more relevant
because these guys actually may
400
00:33:21 --> 00:33:25
be expressed in different units.
They are not the same type of
401
00:33:25 --> 00:33:29
quantity.
You cannot actually combine
402
00:33:29 --> 00:33:32
them arbitrarily.
Anyway, the convention is
403
00:33:32 --> 00:33:34
usually we measure distance in
this way.
404
00:33:34 --> 00:33:38
Next, you could try to minimize
the largest distance.
405
00:33:38 --> 00:33:42
Say we look at who has the
largest error and we make that
406
00:33:42 --> 00:33:44
the smallest possible.
The drawback of doing that is
407
00:33:44 --> 00:33:47
experimentally very often you
have one data point that is not
408
00:33:47 --> 00:33:50
good because maybe you fell
asleep in front of the
409
00:33:50 --> 00:33:53
experiment.
And so you didn't measure the
410
00:33:53 --> 00:33:55
right thing.
You tend to want to not give
411
00:33:55 --> 00:33:59
too much importance to some data
point that is far away from the
412
00:33:59 --> 00:34:02
others.
Maybe instead you want to
413
00:34:02 --> 00:34:06
measure the average distance or
maybe you want to actually give
414
00:34:06 --> 00:34:09
more weight to things that are
further away.
415
00:34:09 --> 00:34:12
And then you don't want to use
the distance but the square of
416
00:34:12 --> 00:34:14
the distance.
There are various possible
417
00:34:14 --> 00:34:18
answers, but one of them gives
us actually a particularly nice
418
00:34:18 --> 00:34:22
formula for a and b.
And so that is why it is the
419
00:34:22 --> 00:34:27
universally used one.
Here it says least squares.
420
00:34:27 --> 00:34:31
That's because we will measure,
actually, the sum of the
421
00:34:31 --> 00:34:35
squares of the errors.
And why do we do that?
422
00:34:35 --> 00:34:37
Well, part of it is because it
looks good.
423
00:34:37 --> 00:34:42
When you see this plot in
scientific papers they really
424
00:34:42 --> 00:34:46
look like the line is indeed the
ideal line.
425
00:34:46 --> 00:34:49
And the second reason is
because actually the
426
00:34:49 --> 00:34:52
minimization problem that we
will get is particularly simple,
427
00:34:52 --> 00:34:57
well-posed and easy to solve.
So we will have a nice formula
428
00:34:57 --> 00:35:03
for the best a and the best b.
If you have a method that is
429
00:35:03 --> 00:35:07
simple and gives you a good
answer then that is probably
430
00:35:07 --> 00:35:09
good.
We have to define best.
431
00:35:09 --> 00:35:22
Here it is in the sense of
minimizing the total square
432
00:35:22 --> 00:35:29
error.
Or maybe I should say total
433
00:35:29 --> 00:35:35
square deviation instead.
What do I mean by this?
434
00:35:35 --> 00:35:44
The deviation for each data
point is the difference between
435
00:35:44 --> 00:35:52
what you have measured and what
you are predicting by your
436
00:35:52 --> 00:36:00
model.
That is the difference between
437
00:36:00 --> 00:36:11
yi and axi plus b.
Now, what we will do is try to
438
00:36:11 --> 00:36:25
minimize the function capital D,
which is just the sum for all
439
00:36:25 --> 00:36:36
the data points of the square of
the deviation.
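Written out, and assuming the n data points are labeled (x_i, y_i) for i = 1, ..., n (the index range is an assumption about the notation on the board), the function being minimized is:

```latex
D(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2
```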
440
00:36:36 --> 00:36:40
Let me go over this again.
This is a function of a and b.
441
00:36:40 --> 00:36:43
Of course there are a lot of
letters in here,
442
00:36:43 --> 00:36:46
but xi and yi, in real life,
will be numbers given to
443
00:36:46 --> 00:36:48
you.
There will be numbers that you
444
00:36:48 --> 00:36:51
have measured.
You have measured all of this
445
00:36:51 --> 00:36:53
data.
They are just going to be
446
00:36:53 --> 00:36:58
numbers.
You put them in there and you
447
00:36:58 --> 00:37:04
get a function of a and b.
Any questions?
448
00:37:04 --> 00:37:16
449
00:37:16 --> 00:37:20
How do we minimize this
function of a and b?
450
00:37:20 --> 00:37:27
Well, let's use your knowledge.
Let's actually look for a
451
00:37:27 --> 00:37:34
critical point.
We want to solve for partial D
452
00:37:34 --> 00:37:42
over partial a = 0,
partial D over partial b = 0.
453
00:37:42 --> 00:37:48
That is how we look for
critical points.
454
00:37:48 --> 00:37:52
Let's take the derivative of
this with respect to a.
455
00:37:52 --> 00:37:59
Well, the derivative of a sum
is the sum of the derivatives.
456
00:37:59 --> 00:38:04
And now we have to take the
derivative of this quantity
457
00:38:04 --> 00:38:07
squared.
Remember, we take the
458
00:38:07 --> 00:38:11
derivative of the square.
We take twice this quantity
459
00:38:11 --> 00:38:15
times the derivative of what we
are squaring.
460
00:38:15 --> 00:38:26
We will get 2(yi - (axi + b)) times
the derivative of this with
461
00:38:26 --> 00:38:30
respect to a.
What is the derivative of this
462
00:38:30 --> 00:38:35
with respect to a?
Negative xi, exactly.
463
00:38:35 --> 00:38:38
And so we will want this to be
0.
464
00:38:38 --> 00:38:41
And partial D over partial b,
we do the same thing,
465
00:38:41 --> 00:38:45
but differentiating with
respect to b instead of with
466
00:38:45 --> 00:38:50
respect to a.
Again, the sum of twice
467
00:38:50 --> 00:38:58
yi minus (axi plus b) times the
derivative of this with respect
468
00:38:58 --> 00:39:02
to b is, I think,
negative one.
469
00:39:02 --> 00:39:07
Those are the equations we have
to solve.
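In symbols, a sketch of the two critical-point equations just described, with the sums running over all n data points (x_i, y_i):

```latex
\frac{\partial D}{\partial a} = \sum_{i=1}^{n} 2 \bigl( y_i - (a x_i + b) \bigr)(-x_i) = 0,
\qquad
\frac{\partial D}{\partial b} = \sum_{i=1}^{n} 2 \bigl( y_i - (a x_i + b) \bigr)(-1) = 0
```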
470
00:39:07 --> 00:39:10
Well, let's reorganize this a
little bit.
471
00:39:10 --> 00:39:24
472
00:39:24 --> 00:39:32
The first equation.
See, there are a's and there
473
00:39:32 --> 00:39:36
are b's in these equations.
I am going to just look at the
474
00:39:36 --> 00:39:39
coefficients of a and b.
If you have good eyes,
475
00:39:39 --> 00:39:42
you can see probably that these
are actually linear equations in
476
00:39:42 --> 00:39:45
a and b.
There is a lot of clutter with
477
00:39:45 --> 00:39:47
all these x's and y's all over
the place.
478
00:39:47 --> 00:39:55
Let's actually try to expand
things and make that more
479
00:39:55 --> 00:39:59
apparent.
The first thing I will do is
480
00:39:59 --> 00:40:02
actually get rid of these
factors of two.
481
00:40:02 --> 00:40:05
They are just not very
important.
482
00:40:05 --> 00:40:10
I can simplify things.
Next, I am going to look at the
483
00:40:10 --> 00:40:15
coefficient of a.
I will get basically a times xi
484
00:40:15 --> 00:40:24
squared.
Let me just do it and should be
485
00:40:24 --> 00:40:33
clear.
I claim when we simplify this
486
00:40:33 --> 00:40:46
we get xi squared times a plus
xi times b minus xiyi.
487
00:40:46 --> 00:40:53
And we set this equal to zero.
Do you agree that this is what
488
00:40:53 --> 00:40:57
we get when we expand that
product?
489
00:40:57 --> 00:41:03
Yeah. Kind of?
OK. Let's do the other one.
490
00:41:03 --> 00:41:08
We just multiply by minus one,
so we take the opposite of that
491
00:41:08 --> 00:41:19
which would be axi plus b.
I will write that as xia plus b
492
00:41:19 --> 00:41:25
minus yi.
Sorry. I forgot the n here.
493
00:41:25 --> 00:41:30
And let me just reorganize that
by actually putting all the a's
494
00:41:30 --> 00:41:34
together.
That means I will have sum of
495
00:41:34 --> 00:41:40
all the xi^2 times a plus sum of
xi times b minus sum of xiyi equal to
496
00:41:40 --> 00:41:41
zero.
497
00:41:41 --> 00:42:08
498
00:42:08 --> 00:42:15
If I rewrite this,
it becomes sum of xi^2 times a
499
00:42:15 --> 00:42:24
plus sum of the xi's times b,
and let me move the other guys
500
00:42:24 --> 00:42:30
to the other side,
equals sum of xiyi.
501
00:42:30 --> 00:42:37
And that one becomes sum of xi
times a.
502
00:42:37 --> 00:42:41
Plus how many b's do I get on
this one?
503
00:42:41 --> 00:42:45
I get one for each data point.
When I sum them together,
504
00:42:45 --> 00:42:48
I will get n.
Very good.
505
00:42:48 --> 00:42:56
N times b equals sum of yi.
Now, these quantities look
506
00:42:56 --> 00:42:58
scary, but they are actually
just numbers.
507
00:42:58 --> 00:43:01
For example,
this one, you look at all your
508
00:43:01 --> 00:43:05
data points.
For each of them you take the
509
00:43:05 --> 00:43:10
value of x and you just sum all
these numbers together.
510
00:43:10 --> 00:43:19
What you get,
actually, is a linear system in
511
00:43:19 --> 00:43:26
a and b, a two by two linear
system.
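Collected in one place, the two by two linear system reads as follows (a sketch in the same notation, with n the number of data points):

```latex
\begin{aligned}
\Bigl( \sum_{i=1}^{n} x_i^2 \Bigr) a + \Bigl( \sum_{i=1}^{n} x_i \Bigr) b &= \sum_{i=1}^{n} x_i y_i \\
\Bigl( \sum_{i=1}^{n} x_i \Bigr) a + n\, b &= \sum_{i=1}^{n} y_i
\end{aligned}
```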
512
00:43:26 --> 00:43:32
And so now we can solve this
for a and b.
513
00:43:32 --> 00:43:35
In practice,
of course, first you plug in
514
00:43:35 --> 00:43:40
the numbers for xi and yi and
then you solve the system that
515
00:43:40 --> 00:43:44
you get.
And we know how to solve two by
516
00:43:44 --> 00:43:46
two linear systems,
I hope.
517
00:43:46 --> 00:43:50
That's how we find the best fit
line.
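As a minimal sketch of that recipe, here is one way it could look in Python. The function name `best_fit_line` and the sample data are made up for illustration and are not from the lecture. The code computes the four sums, then solves the two by two system by Cramer's rule.

```python
def best_fit_line(xs, ys):
    """Return (a, b) minimizing the sum of (y_i - (a*x_i + b))**2."""
    n = len(xs)
    sx = sum(xs)                                 # sum of x_i
    sy = sum(ys)                                 # sum of y_i
    sxx = sum(x * x for x in xs)                 # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))     # sum of x_i * y_i

    # System to solve:
    #   sxx * a + sx * b = sxy
    #   sx  * a + n  * b = sy
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    a = (sxy * n - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b


# Made-up data lying exactly on the line y = 2x + 1.
a, b = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints 2.0 1.0
```

With real measurements the points will not lie exactly on a line, and the same function returns the a and b with the smallest total square deviation.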
518
00:43:50 --> 00:43:54
Now, why is that going to be
the best one instead of the
519
00:43:54 --> 00:43:56
worst one?
We just solved for a critical
520
00:43:56 --> 00:43:58
point.
That could actually be a
521
00:43:58 --> 00:44:01
maximum of this error function
D.
522
00:44:01 --> 00:44:05
We will have the answer to that
next time, but trust me.
523
00:44:05 --> 00:44:08
If you really want to go over
the second derivative test that
524
00:44:08 --> 00:44:11
we will see tomorrow and apply
it in this case,
525
00:44:11 --> 00:44:14
it is quite hard to check,
but you can see it is actually
526
00:44:14 --> 00:44:28
a minimum.
I will just say -- -- we can
527
00:44:28 --> 00:44:42
show that it is a minimum.
Now, even though the linear
528
00:44:42 --> 00:44:47
case is the one that we are the
most familiar with,
529
00:44:47 --> 00:44:56
least-squares interpolation
actually works in much more
530
00:44:56 --> 00:45:03
general settings.
Because instead of fitting for
531
00:45:03 --> 00:45:06
the best line,
if you think it has a different
532
00:45:06 --> 00:45:10
kind of relation then maybe you
can fit it using a different
533
00:45:10 --> 00:45:14
kind of formula.
Let me actually illustrate that
534
00:45:14 --> 00:45:17
with an example.
I don't know if you are
535
00:45:17 --> 00:45:21
familiar with Moore's law.
It is something that is
536
00:45:21 --> 00:45:24
supposed to tell you how quickly
basically computer chips become
537
00:45:24 --> 00:45:27
smarter faster and faster all
the time.
538
00:45:27 --> 00:45:31
It's a law that says things
about the number of transistors
539
00:45:31 --> 00:45:33
that you can fit onto a computer
chip.
540
00:45:33 --> 00:45:45
Here I have some data about --
Here is data about the number of
541
00:45:45 --> 00:45:58
transistors on a standard PC
processor as a function of time.
542
00:45:58 --> 00:46:01
And if you try to do a
best-line fit,
543
00:46:01 --> 00:46:07
well, it doesn't seem to follow
a linear trend.
544
00:46:07 --> 00:46:11
On the other hand,
if you plot the diagram in the
545
00:46:11 --> 00:46:13
log scale,
the log of the number of
546
00:46:13 --> 00:46:15
transistors as a function of
time,
547
00:46:15 --> 00:46:21
then you get a much better line.
And so, in fact,
548
00:46:21 --> 00:46:26
that means that you have an
exponential relation between the
549
00:46:26 --> 00:46:30
number of transistors and time.
And so, actually that's what
550
00:46:30 --> 00:46:32
Moore's law says.
It says that the number of
551
00:46:32 --> 00:46:36
transistors in the chip doubles
every 18 months or every two
552
00:46:36 --> 00:46:40
years.
They keep changing the
553
00:46:40 --> 00:46:49
statement.
How do we find the best
554
00:46:49 --> 00:46:58
exponential fit?
Well, an exponential fit would
555
00:46:58 --> 00:47:05
be something of the form y equals
a constant c times exponential of
556
00:47:05 --> 00:47:09
a times x.
That is what we want to look at.
557
00:47:09 --> 00:47:13
Well, we could try to minimize
a square error like we did
558
00:47:13 --> 00:47:16
before.
That doesn't work well at all.
559
00:47:16 --> 00:47:18
The equations that you get are
very complicated.
560
00:47:18 --> 00:47:24
You cannot solve them.
But remember what I showed you
561
00:47:24 --> 00:47:28
on this log plot.
If you plot the log of y as a
562
00:47:28 --> 00:47:33
function of x then suddenly it
becomes a linear relation.
563
00:47:33 --> 00:47:43
Observe, this is the same as ln
of y equals ln of c plus ax.
564
00:47:43 --> 00:47:55
And that is the linear best fit.
What you do is you just look
565
00:47:55 --> 00:48:08
for the best straight line fit
for the log of y.
566
00:48:08 --> 00:48:10
That is something we already
know.
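A minimal Python sketch of this trick follows. The names `best_fit_line` and `best_exponential_fit` are hypothetical, and the data are invented rather than real transistor counts. Note that this minimizes the square deviation of ln y, not of y itself, and it assumes every y is positive.

```python
import math


def best_fit_line(xs, ys):
    """Least-squares line y = a*x + b, via the 2x2 linear system."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("all x values are equal; the line is not determined")
    return (sxy * n - sx * sy) / det, (sxx * sy - sx * sxy) / det


def best_exponential_fit(xs, ys):
    """Fit y = c * exp(a*x) by fitting a line to the points (x, ln y)."""
    log_ys = [math.log(y) for y in ys]      # requires y > 0
    a, ln_c = best_fit_line(xs, log_ys)     # ln y = ln c + a*x
    return math.exp(ln_c), a


# Invented data that doubles at each step: y = 3 * 2**x.
c, a = best_exponential_fit([0, 1, 2, 3], [3, 6, 12, 24])
print(c, a)  # c is close to 3, a is close to ln 2
```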
567
00:48:10 --> 00:48:12
But you can also do,
for example,
568
00:48:12 --> 00:48:16
let's say that we have
something more complicated.
569
00:48:16 --> 00:48:21
Let's say that we have actually
a quadratic law.
570
00:48:21 --> 00:48:27
For example,
y is of the form ax^2 + bx + c.
571
00:48:27 --> 00:48:31
And, of course,
you are trying to find somehow
572
00:48:31 --> 00:48:34
the best.
That would mean here fitting
573
00:48:34 --> 00:48:37
the best parabola for your data
points.
574
00:48:37 --> 00:48:40
Well, to do that,
you would need to find a,
575
00:48:40 --> 00:48:45
b and c.
And now you will have actually
576
00:48:45 --> 00:48:51
a function of a,
b and c, which would be the sum
577
00:48:51 --> 00:48:57
over all the data points of the
square deviation.
578
00:48:57 --> 00:49:01
And, if you try to solve for
critical points,
579
00:49:01 --> 00:49:03
now you will have three
equations involving a,
580
00:49:03 --> 00:49:05
b and c.
In fact, you will find a three
581
00:49:05 --> 00:49:09
by three linear system.
And it works the same way.
582
00:49:09 --> 00:49:14
Just you have a little bit more
data.
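For the parabola, a sketch of the same procedure in Python, under the assumption that setting the three partial derivatives of the total square deviation to zero gives the three by three system written in the comments. The names `det3` and `best_fit_parabola` and the sample data are made up for illustration. The system is solved here by Cramer's rule.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def best_fit_parabola(xs, ys):
    """Return (a, b, c) minimizing the sum of (y_i - (a*x_i^2 + b*x_i + c))**2."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))

    # Critical-point equations (linear in a, b, c):
    #   s4*a + s3*b + s2*c = t2
    #   s3*a + s2*b + s1*c = t1
    #   s2*a + s1*b + n*c  = t0
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(m)
    if d == 0:
        raise ValueError("not enough distinct x values to determine a parabola")

    def with_column_replaced(col):
        # Copy of m with column `col` replaced by the right-hand side.
        return [[rhs[r] if k == col else m[r][k] for k in range(3)]
                for r in range(3)]

    return tuple(det3(with_column_replaced(col)) / d for col in range(3))


# Made-up data lying exactly on y = x^2 - 2x + 3.
a, b, c = best_fit_parabola([-1, 0, 1, 2], [6, 3, 2, 3])
print(a, b, c)
```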
583
00:49:14 --> 00:49:19
Basically, you see that these
best fit problems are an example
584
00:49:19 --> 00:49:24
of a situation where
maybe you didn't expect to see
585
00:49:24 --> 00:49:30
minimization problems come in.
But that is really the way to
586
00:49:30 --> 00:49:34
handle these questions.
Tomorrow we will go back to the
587
00:49:34 --> 00:49:38
question of how do we decide
whether it is a minimum or a
588
00:49:38 --> 00:49:40
maximum.
And we will continue exploring
589
00:49:40 --> 00:49:43
functions of several variables.
590
00:49:43 --> 00:49:48