1
00:00:01 --> 00:00:03
The following content is
provided under a Creative
2
00:00:03 --> 00:00:05
Commons license.
Your support will help MIT
3
00:00:05 --> 00:00:08
OpenCourseWare continue to offer
high quality educational
4
00:00:08 --> 00:00:13
resources for free.
To make a donation or to view
5
00:00:13 --> 00:00:18
additional materials from
hundreds of MIT courses,
6
00:00:18 --> 00:00:23
visit MIT OpenCourseWare at
ocw.mit.edu.
7
00:00:25 --> 00:00:29
Last time we saw things about
gradients and directional
8
00:00:29 --> 00:00:32
derivatives.
Before that we studied how to
9
00:00:32 --> 00:00:37
look for minima and maxima of
functions of several variables.
10
00:00:37 --> 00:00:41
And today we are going to look
again at min/max problems but in
11
00:00:41 --> 00:00:45
a different setting,
namely, one for variables that
12
00:00:45 --> 00:00:49
are not independent.
And so what we will see is what you
13
00:00:49 --> 00:00:52
may have heard of as Lagrange
multipliers.
14
00:00:52 --> 00:00:59
And this is the one point in
the term when I can shine with
15
00:00:59 --> 00:01:05
my French accent and say
Lagrange's name properly.
16
00:01:05 --> 00:01:08
OK.
What are Lagrange multipliers
17
00:01:08 --> 00:01:13
about?
Well, the goal is to minimize
18
00:01:13 --> 00:01:19
or maximize a function of
several variables.
19
00:01:19 --> 00:01:22
Let's say, for example,
f of x, y, z,
20
00:01:22 --> 00:01:27
but where these variables are
no longer independent.
21
00:01:27 --> 00:01:41
22
00:01:41 --> 00:01:43
They are not independent.
That means that there is a
23
00:01:43 --> 00:01:47
relation between them.
The relation is maybe some
24
00:01:47 --> 00:01:52
equation of the form g of x,
y, z equals some constant.
25
00:01:52 --> 00:01:57
You take the relation between
x, y, z, you call that g and
26
00:01:57 --> 00:02:02
that gives you the constraint.
And your goal is to minimize f
27
00:02:02 --> 00:02:05
only over those values of x,
y, z that satisfy the
28
00:02:05 --> 00:02:07
constraint.
What is one way to do that?
29
00:02:07 --> 00:02:10
Well, one way to do that,
if the constraint is very
30
00:02:10 --> 00:02:14
simple, we can maybe solve for
one of the variables.
31
00:02:14 --> 00:02:17
Maybe we can solve this
equation for one of the
32
00:02:17 --> 00:02:21
variables, plug it back into f,
and then we have a usual
33
00:02:21 --> 00:02:25
min/max problem that we have
seen how to do.
34
00:02:25 --> 00:02:28
The problem is sometimes you
cannot actually solve for x,
35
00:02:28 --> 00:02:31
y, z in here because this
condition is too complicated and
36
00:02:31 --> 00:02:38
then we need a new method.
That is what we are going to do.
37
00:02:38 --> 00:02:41
Why would we care about that?
Well, one example is actually
38
00:02:41 --> 00:02:43
in physics.
Maybe you have seen in
39
00:02:43 --> 00:02:47
thermodynamics that you study
quantities about gases,
40
00:02:47 --> 00:02:50
and those quantities that
involve pressure,
41
00:02:50 --> 00:02:53
volume and temperature.
And pressure,
42
00:02:53 --> 00:02:56
volume and temperature are not
independent of each other.
43
00:02:56 --> 00:02:59
I mean you know probably the
equation PV = nRT.
44
00:02:59 --> 00:03:01
And, of course,
there you could actually solve
45
00:03:01 --> 00:03:03
to express things in terms of
one or the other.
46
00:03:03 --> 00:03:07
But sometimes it is more
convenient to keep all three
47
00:03:07 --> 00:03:09
variables but treat them as
constrained.
48
00:03:09 --> 00:03:19
It is just an example of a
situation where you might want
49
00:03:19 --> 00:03:24
to do this.
Anyway, we will look mostly at
50
00:03:24 --> 00:03:28
particular examples,
but just to point out that this
51
00:03:28 --> 00:03:32
is useful when you study gases
in physics.
52
00:03:32 --> 00:03:35
The first observation is we
cannot use our usual method of
53
00:03:35 --> 00:03:36
looking for critical points of
f.
54
00:03:36 --> 00:03:40
Because critical points of f
typically will not satisfy this
55
00:03:40 --> 00:03:43
condition and so won't be good
solutions.
56
00:03:43 --> 00:03:49
We need something else.
Let's look at an example,
57
00:03:49 --> 00:03:53
and we will see how that leads
us to the method.
58
00:03:53 --> 00:04:03
For example,
let's say that I want to find
59
00:04:03 --> 00:04:17
the point closest to the origin
on the hyperbola xy equals
60
00:04:17 --> 00:04:23
3 in the plane.
That means I have this
61
00:04:23 --> 00:04:26
hyperbola, and I am asking
myself what is the point on it
62
00:04:26 --> 00:04:29
that is the closest to the
origin?
63
00:04:29 --> 00:04:31
I mean we can solve this by
elementary geometry,
64
00:04:31 --> 00:04:34
we don't need actually Lagrange
multipliers,
65
00:04:34 --> 00:04:38
but we are going to do it with
Lagrange multipliers because it
66
00:04:38 --> 00:04:41
is a pretty good example.
What does it mean?
67
00:04:41 --> 00:04:47
Well, it means that we want to
minimize distance to the origin.
68
00:04:47 --> 00:04:49
What is the distance to the
origin?
69
00:04:49 --> 00:04:53
If I have a point,
at coordinates (x,
70
00:04:53 --> 00:04:58
y), then the distance to the
origin is square root of x
71
00:04:58 --> 00:05:02
squared plus y squared.
Well, do we really want to
72
00:05:02 --> 00:05:05
minimize that or can we minimize
something easier?
73
00:05:05 --> 00:05:06
Yeah.
Maybe we can minimize the
74
00:05:06 --> 00:05:14
square of a distance.
Let's forget this guy and
75
00:05:14 --> 00:05:23
instead -- Actually,
we will minimize f of x,
76
00:05:23 --> 00:05:27
y equals x squared plus y
squared,
77
00:05:27 --> 00:05:39
that looks better,
subject to the constraint xy =
78
00:05:39 --> 00:05:44
3.
And so we will call this thing
79
00:05:44 --> 00:05:50
g of x, y to illustrate the
general method.
80
00:05:50 --> 00:05:58
Let's look at a picture.
Here you can see in yellow the
81
00:05:58 --> 00:06:02
hyperbola xy equals three.
And we are going to look for
82
00:06:02 --> 00:06:05
the points that are the closest
to the origin.
83
00:06:05 --> 00:06:08
What can we do?
Well, for example,
84
00:06:08 --> 00:06:13
we can plot the function x
squared plus y squared,
85
00:06:13 --> 00:06:17
function f.
That is the contour plot of f
86
00:06:17 --> 00:06:21
with a hyperbola on top of it.
Now let's see what we can do
87
00:06:21 --> 00:06:25
with that.
Well, let's ask ourselves,
88
00:06:25 --> 00:06:30
for example,
if I look at points where f
89
00:06:30 --> 00:06:34
equals 20 now.
I think I am at 20 but you
90
00:06:34 --> 00:06:37
cannot really see it.
That is the circle of points
91
00:06:37 --> 00:06:41
whose distance squared is 20.
Well, can I find a solution if
92
00:06:41 --> 00:06:44
I am on the hyperbola?
Yes, there are four points at
93
00:06:44 --> 00:06:46
this distance.
Can I do better?
94
00:06:46 --> 00:06:49
Well, let's decrease the
distance.
95
00:06:49 --> 00:06:52
Yes, we can still find points
on the hyperbola and so on.
96
00:06:52 --> 00:06:56
Except if we go too low then
there are no points on this
97
00:06:56 --> 00:07:00
circle anymore on the hyperbola.
If we decrease the value of f
98
00:07:00 --> 00:07:03
that we want to look at, there
will be some limiting value beyond
99
00:07:03 --> 00:07:07
which we cannot go,
and that is the minimum of f.
100
00:07:07 --> 00:07:13
We are trying to look for the
smallest value of f that will
101
00:07:13 --> 00:07:17
actually be realized on the
hyperbola.
102
00:07:17 --> 00:07:20
When does that happen?
Well, I have to backtrack a
103
00:07:20 --> 00:07:23
little bit.
It seems like the limiting case
104
00:07:23 --> 00:07:26
is basically here.
It is when the circle is
105
00:07:26 --> 00:07:31
tangent to the hyperbola.
That is the smallest circle
106
00:07:31 --> 00:07:37
that will hit the hyperbola.
If I take a larger value of f,
107
00:07:37 --> 00:07:39
I will have solutions.
If I take a smaller value of f,
108
00:07:39 --> 00:07:41
I will not have any solutions
anymore.
109
00:07:41 --> 00:07:49
So, that is the situation that
we want to solve for.
110
00:07:49 --> 00:07:54
How do we find that minimum?
Well, a key observation that is
111
00:07:54 --> 00:07:58
valid on this picture,
and that actually remains true
112
00:07:58 --> 00:08:03
in the completely general case,
is that when we have a minimum
113
00:08:03 --> 00:08:09
the level curve of f is actually
tangent to our hyperbola.
114
00:08:09 --> 00:08:15
It is tangent to the set of
points where xy
115
00:08:15 --> 00:08:20
equals three,
to the hyperbola.
116
00:08:20 --> 00:08:32
Let's write that down.
We observe that at the minimum
117
00:08:32 --> 00:08:49
the level curve of f is tangent
to the hyperbola.
118
00:08:49 --> 00:08:53
Remember, the hyperbola is
given by the equation g equals
119
00:08:53 --> 00:08:56
three, so it is a level curve of
g.
120
00:08:56 --> 00:08:59
We have a level curve of f and
a level curve of g that are
121
00:08:59 --> 00:09:03
tangent to each other.
And I claim that is going to be
122
00:09:03 --> 00:09:07
the general situation that we
are interested in.
123
00:09:07 --> 00:09:12
How do we try to solve for
points where this happens?
124
00:09:12 --> 00:09:28
125
00:09:28 --> 00:09:36
How do we find x,
y where the level curves of f
126
00:09:36 --> 00:09:47
and g are tangent to each other?
Let's think for a second.
127
00:09:47 --> 00:09:51
If the two level curves are
tangent to each other that means
128
00:09:51 --> 00:09:57
they have the same tangent line.
That means that the normal
129
00:09:57 --> 00:10:03
vectors should be parallel.
Let me maybe draw a picture
130
00:10:03 --> 00:10:06
here.
This is the level curve maybe f
131
00:10:06 --> 00:10:11
equals something.
And this is the level curve g
132
00:10:11 --> 00:10:16
equals constant.
Here my constant is three.
133
00:10:16 --> 00:10:20
Well, if I look for gradient
vectors, the gradient of f will
134
00:10:20 --> 00:10:23
be perpendicular to the level
curve of f.
135
00:10:23 --> 00:10:27
The gradient of g will be
perpendicular to the level curve
136
00:10:27 --> 00:10:29
of g.
They don't have any reason to
137
00:10:29 --> 00:10:32
be of the same size,
but they have to be parallel to
138
00:10:32 --> 00:10:35
each other.
Of course, they could also be
139
00:10:35 --> 00:10:38
parallel pointing in opposite
directions.
140
00:10:38 --> 00:10:48
But the key point is that when
this happens the gradient of f
141
00:10:48 --> 00:10:54
is parallel to the gradient of
g.
142
00:10:54 --> 00:11:03
Well, let's check that.
Here is a point.
143
00:11:03 --> 00:11:05
And I can plot the gradient of
f in blue.
144
00:11:05 --> 00:11:08
The gradient of g in yellow.
And you see,
145
00:11:08 --> 00:11:12
in most of these places,
somehow the two gradients are
146
00:11:12 --> 00:11:14
not really parallel.
Actually, I should not be
147
00:11:14 --> 00:11:17
looking at random points.
I should be looking only on the
148
00:11:17 --> 00:11:19
hyperbola.
I want points on the hyperbola
149
00:11:19 --> 00:11:22
where the two gradients are
parallel.
150
00:11:22 --> 00:11:28
Well, when does that happen?
Well, it looks like it will
151
00:11:28 --> 00:11:31
happen here.
When I am at a minimum,
152
00:11:31 --> 00:11:34
the two gradient vectors are
parallel.
153
00:11:34 --> 00:11:37
It is not really a proof.
It is an example that seems to
154
00:11:37 --> 00:11:43
be convincing.
So far things work pretty well.
155
00:11:43 --> 00:11:46
How do we decide if two vectors
are parallel?
156
00:11:46 --> 00:11:50
Well, they are parallel when
they are proportional to each
157
00:11:50 --> 00:11:54
other.
You can write one of them as a
158
00:11:54 --> 00:12:02
constant times the other one,
and for that constant one usually
159
00:12:02 --> 00:12:07
uses the Greek letter lambda.
I don't know if you have seen
160
00:12:07 --> 00:12:10
it before.
It is the Greek letter for L.
161
00:12:10 --> 00:12:15
And probably,
I am sure, it is somebody's
162
00:12:15 --> 00:12:22
idea of paying tribute to
Lagrange by putting an L in
163
00:12:22 --> 00:12:25
there.
Lambda is just a constant.
164
00:12:25 --> 00:12:31
And we are looking for a scalar
lambda and points x and y where
165
00:12:31 --> 00:12:33
this holds.
In fact,
166
00:12:33 --> 00:12:37
what we are doing is replacing
min/max problems in two
167
00:12:37 --> 00:12:41
variables with a constraint
between them by a set of
168
00:12:41 --> 00:12:47
equations involving,
you will see, three variables.
169
00:12:47 --> 00:12:54
We had min/max with two
variables x, y,
170
00:12:54 --> 00:13:00
but not independent.
We had a constraint g of x,
171
00:13:00 --> 00:13:06
y equals constant.
And that becomes something new.
172
00:13:06 --> 00:13:12
That becomes a system of
equations where we have to
173
00:13:12 --> 00:13:19
solve, well, let's write down
what it means for gradient f to
174
00:13:19 --> 00:13:26
be proportional to gradient g.
That means that f sub x should
175
00:13:26 --> 00:13:32
be lambda times g sub x,
and f sub y should be lambda
176
00:13:32 --> 00:13:36
times g sub y.
Because the gradient vectors
177
00:13:36 --> 00:13:39
here are f sub x,
f sub y and g sub x,
178
00:13:39 --> 00:13:43
g sub y.
If you have a third variable z
179
00:13:43 --> 00:13:49
then you have also an equation f
sub z equals lambda g sub z.
180
00:13:49 --> 00:13:53
Now, let's see.
How many unknowns do we have in
181
00:13:53 --> 00:13:55
these equations?
Well, there is x,
182
00:13:55 --> 00:14:01
there is y and there is lambda.
We have three unknowns and have
183
00:14:01 --> 00:14:06
only two equations.
Something is missing.
184
00:14:06 --> 00:14:10
Well, I mean x and y are not
actually independent.
185
00:14:10 --> 00:14:14
They are related by the
equation g of x,
186
00:14:14 --> 00:14:21
y equals c, so we need to add
the constraint g equals c.
187
00:14:21 --> 00:14:26
And now we have three equations
involving three variables.
188
00:14:26 --> 00:14:39
Let's see how that works.
Here remember we have f equals
189
00:14:39 --> 00:14:45
x squared plus y squared and g = xy.
What is f sub x?
190
00:14:45 --> 00:14:52
It is going to be 2x equals
lambda times,
191
00:14:52 --> 00:14:55
what is g sub x,
y.
192
00:14:55 --> 00:14:59
Maybe I should write here f sub
x equals lambda g sub x just to
193
00:14:59 --> 00:15:03
remind you.
Then we have f sub y equals
194
00:15:03 --> 00:15:10
lambda g sub y.
F sub y is 2y equals lambda
195
00:15:10 --> 00:15:18
times g sub y is x.
And then our third equation g
196
00:15:18 --> 00:15:22
equals c becomes xy equals
three.
197
00:15:22 --> 00:15:26
So, that is what you would have
to solve.
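A minimal sketch, not part of the lecture, of the three equations just written down for f = x squared plus y squared and g = xy with c = 3. The function name `lagrange_residuals` is an assumption made for illustration; it returns how far each equation is from holding, so a solution is a triple (x, y, lambda) where all three residuals vanish.

```python
import math

def lagrange_residuals(x, y, lam, c=3.0):
    """Residuals of f_x = lam*g_x, f_y = lam*g_y, g = c for f = x^2 + y^2, g = xy."""
    fx, fy = 2 * x, 2 * y   # partial derivatives of f
    gx, gy = y, x           # partial derivatives of g
    return (fx - lam * gx, fy - lam * gy, x * y - c)

r = math.sqrt(3.0)
# All three residuals are zero up to rounding: this triple solves the system.
print(lagrange_residuals(r, r, 2.0))
# A point on the hyperbola where the gradients are not parallel for lam = 2.
print(lagrange_residuals(1.0, 3.0, 2.0))
```

The second call shows that satisfying the constraint alone is not enough: the first two residuals stay nonzero.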
198
00:15:26 --> 00:15:33
Any questions at this point?
No.
199
00:15:33 --> 00:15:44
Yes?
How do I know the direction of
200
00:15:44 --> 00:15:47
a gradient?
Do you mean how do I know that
201
00:15:47 --> 00:15:50
it is perpendicular to a level
curve?
202
00:15:50 --> 00:15:54
Oh, how do I know if it points
in that direction or the
203
00:15:54 --> 00:15:56
opposite one?
Well, that depends.
204
00:15:56 --> 00:15:59
I mean we have seen it last time,
but the gradient is
205
00:15:59 --> 00:16:02
perpendicular to the level curve and
points towards higher values of
206
00:16:02 --> 00:16:05
a function.
So it could be -- Wait.
207
00:16:05 --> 00:16:08
What did I have?
It could be that my gradient
208
00:16:08 --> 00:16:11
vectors up there actually point
in opposite directions.
209
00:16:11 --> 00:16:15
It doesn't matter to me because
it will still look the same in
210
00:16:15 --> 00:16:18
terms of the equation,
just lambda will be positive or
211
00:16:18 --> 00:16:22
negative, depending on the case.
I can handle both situations.
212
00:16:22 --> 00:16:30
It's not a problem.
I can allow lambda to be
213
00:16:30 --> 00:16:34
positive or negative.
Well, in this example,
214
00:16:34 --> 00:16:35
it looks like lambda will be
positive.
215
00:16:35 --> 00:16:38
If you look at the picture on
the plot.
216
00:16:38 --> 00:16:48
Yes?
Well, because actually they are
217
00:16:48 --> 00:16:51
not equal to each other.
If you look at this point where
218
00:16:51 --> 00:16:55
the hyperbola and the circle
touch each other,
219
00:16:55 --> 00:16:58
first of all,
I don't know which circle I am
220
00:16:58 --> 00:17:01
going to look at.
I am trying to solve,
221
00:17:01 --> 00:17:04
actually, for the radius of the
circle.
222
00:17:04 --> 00:17:07
I am trying to find what the
minimum value of f is.
223
00:17:07 --> 00:17:10
And, second,
at that point,
224
00:17:10 --> 00:17:14
the value of f and the value of
g are not equal.
225
00:17:14 --> 00:17:17
g is equal to three because I
want the hyperbola xy equals
226
00:17:17 --> 00:17:19
three.
The value of f will be the
227
00:17:19 --> 00:17:22
square of a distance,
whatever that is.
228
00:17:22 --> 00:17:27
I think it will end up being 6,
but we will see.
229
00:17:27 --> 00:17:29
So, you cannot really set them
equal because you don't know
230
00:17:29 --> 00:17:45
what f is equal to in advance.
Yes?
231
00:17:45 --> 00:17:49
Not quite.
Actually, here I am just using
232
00:17:49 --> 00:17:52
this idea of finding a point
closest to the origin to
233
00:17:52 --> 00:17:55
illustrate an example of a
min/max problem.
234
00:17:55 --> 00:17:59
The general problem we are
trying to solve is minimize f
235
00:17:59 --> 00:18:03
subject to g equals constant.
And what we are going to do for
236
00:18:03 --> 00:18:07
that is we are really going to
say instead let's look at places
237
00:18:07 --> 00:18:10
where gradient f and gradient g
are parallel to each other and
238
00:18:10 --> 00:18:14
solve the equations for that.
I think we completely lose the
239
00:18:14 --> 00:18:19
notion of closest point if we
just look at these equations.
240
00:18:19 --> 00:18:21
We don't really say anything
about closest points anymore.
241
00:18:21 --> 00:18:24
Of course, that is what they
mean in the end.
242
00:18:24 --> 00:18:28
But, in the general setting,
there is no closest point
243
00:18:28 --> 00:18:31
involved anymore.
OK.
244
00:18:31 --> 00:18:40
Yes?
Yes.
245
00:18:40 --> 00:18:43
It is always going to be the
case that,
246
00:18:43 --> 00:18:46
at the minimum,
or at the maximum of a function
247
00:18:46 --> 00:18:49
subject to a constraint,
the level curves of f and the
248
00:18:49 --> 00:18:52
level curves of g will be
tangent to each other.
249
00:18:52 --> 00:18:54
That is the basis for this
method.
250
00:18:54 --> 00:19:00
I am going to justify that soon.
It could be minimum or maximum.
251
00:19:00 --> 00:19:02
In three dimensions it could
even be a saddle point.
252
00:19:02 --> 00:19:03
And, in fact,
I should say in advance,
253
00:19:03 --> 00:19:06
this method will not tell us
whether it is a minimum or a
254
00:19:06 --> 00:19:08
maximum.
We do not have any way of
255
00:19:08 --> 00:19:10
knowing, except for testing
values.
256
00:19:10 --> 00:19:13
We cannot use second derivative
tests or anything like that.
257
00:19:13 --> 00:19:21
I will get back to that.
Yes?
258
00:19:21 --> 00:19:23
Yes.
Here you can set y equal to
259
00:19:23 --> 00:19:26
three over x.
Then you can minimize x squared
260
00:19:26 --> 00:19:30
plus nine over x squared.
In general, if I am trying to
261
00:19:30 --> 00:19:33
solve a more complicated
problem, I might not be able to
262
00:19:33 --> 00:19:35
solve.
I am doing an example where,
263
00:19:35 --> 00:19:38
indeed, here you could solve
and remove one variable,
264
00:19:38 --> 00:19:41
but you cannot always do that.
And this method will still work.
265
00:19:41 --> 00:19:47
The other one won't.
OK.
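A minimal sketch, not part of the lecture, of the substitution approach mentioned in this answer: with y = 3/x the problem becomes minimizing x squared plus nine over x squared in one variable. The names `h`, `h_prime` and `x_star` are assumptions for illustration. Setting the derivative 2x - 18/x^3 to zero gives x^4 = 9 for positive x.

```python
import math

def h(x):
    """f restricted to the hyperbola: x^2 + (3/x)^2."""
    return x * x + 9.0 / (x * x)

def h_prime(x):
    """Derivative of h: 2x - 18/x^3."""
    return 2 * x - 18.0 / x ** 3

x_star = 9.0 ** 0.25   # positive root of x^4 = 9, which is the square root of 3
print(x_star, math.sqrt(3.0), h_prime(x_star), h(x_star))
```

This only works because the constraint xy = 3 is easy to solve for y; the multiplier equations do not rely on that.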
266
00:19:47 --> 00:19:53
I don't see any other questions.
Are there any other questions?
267
00:19:53 --> 00:19:56
No.
OK.
268
00:19:56 --> 00:20:02
I see a lot of students
stretching and so on,
269
00:20:02 --> 00:20:08
so it is very confusing for me.
How do we solve these equations?
270
00:20:08 --> 00:20:14
Well, the answer is in general
we might be in deep trouble.
271
00:20:14 --> 00:20:18
There is no general method for
solving the equations that you
272
00:20:18 --> 00:20:21
get from this method.
You just have to think about
273
00:20:21 --> 00:20:25
them.
Sometimes it will be very easy.
274
00:20:25 --> 00:20:28
Sometimes it will be so hard
that you cannot actually do it
275
00:20:28 --> 00:20:31
without the computer.
Sometimes it will be just hard
276
00:20:31 --> 00:20:33
enough to be on Part B of this
week's problem set.
277
00:20:33 --> 00:20:50
278
00:20:50 --> 00:20:56
I claim in this case we can
actually do it without so much
279
00:20:56 --> 00:21:03
trouble, because actually we can
think of this as a two by two
280
00:21:03 --> 00:21:10
linear system in x and y.
Well, let me do something.
281
00:21:10 --> 00:21:18
Let me rewrite the first two
equations as 2x - lambda y = 0.
282
00:21:18 --> 00:21:30
And lambda x - 2y = 0.
And xy = 3.
283
00:21:30 --> 00:21:36
That is what we want to solve.
Well, I can put this into
284
00:21:36 --> 00:21:41
matrix form.
The matrix with rows (2, -lambda)
285
00:21:41 --> 00:21:48
and (lambda, -2), times (x, y),
equals (0, 0).
286
00:21:48 --> 00:21:52
Now, how do I solve a linear
system matrix times x,
287
00:21:52 --> 00:21:54
y equals zero?
Well, I always have an obvious
288
00:21:54 --> 00:21:56
solution.
X and y both equal to zero.
289
00:21:56 --> 00:22:02
Is that a good solution?
No, because zero times zero is
290
00:22:02 --> 00:22:07
not three.
We want a solution other than
291
00:22:07 --> 00:22:14
the trivial solution.
0,0 does not solve the
292
00:22:14 --> 00:22:20
constraint equation xy equals
three, so we want another
293
00:22:20 --> 00:22:24
solution.
When do we have another
294
00:22:24 --> 00:22:29
solution?
Well, when the determinant of the
295
00:22:29 --> 00:22:37
matrix is zero.
We have other solutions that
296
00:22:37 --> 00:22:46
exist only if the determinant of the
matrix M is zero.
297
00:22:46 --> 00:23:01
M is this guy.
Let's compute the determinant.
298
00:23:01 --> 00:23:08
Well, that seems to be negative
four plus lambda squared.
299
00:23:08 --> 00:23:15
That is zero exactly when
lambda squared equals four,
300
00:23:15 --> 00:23:20
which is lambda is plus or
minus two.
301
00:23:20 --> 00:23:25
Already you see here it is
a level of difficulty that is
302
00:23:25 --> 00:23:30
a little bit much for an exam
but perfectly fine for a problem
303
00:23:30 --> 00:23:33
set or for a beautiful lecture
like this one.
304
00:23:33 --> 00:23:37
How do we deal with -- Well,
we have two cases to look at.
305
00:23:37 --> 00:23:40
Lambda equals two or lambda
equals minus two.
306
00:23:40 --> 00:23:43
Let's start with lambda equals
two.
307
00:23:43 --> 00:23:47
If I set lambda equals two,
what does this equation become?
308
00:23:47 --> 00:23:53
Well, it becomes x equals y.
This one becomes y equals x.
309
00:23:53 --> 00:23:57
Well, they seem to be the same.
x equals y.
310
00:23:57 --> 00:24:01
And then the equation xy equals
three becomes,
311
00:24:01 --> 00:24:06
well, x squared equals three.
I have two solutions.
312
00:24:06 --> 00:24:15
One is x equals root three and,
therefore, y equals root three
313
00:24:15 --> 00:24:23
as well, or negative root three
and negative root three.
314
00:24:23 --> 00:24:26
Let's look at the other case.
If I set lambda equal to
315
00:24:26 --> 00:24:30
negative two then I get 2x
equals negative 2y.
316
00:24:30 --> 00:24:37
That means x equals negative y.
The second one,
317
00:24:37 --> 00:24:40
2y equals negative 2x.
That is y equals negative x.
318
00:24:40 --> 00:24:45
Well, that is the same thing.
And xy equals three becomes
319
00:24:45 --> 00:24:51
negative x squared equals three.
Can we solve that?
320
00:24:51 --> 00:24:58
No.
There are no solutions here.
321
00:24:58 --> 00:25:03
Now we have two candidate
points which are these two
322
00:25:03 --> 00:25:07
points, root three,
root three or negative root
323
00:25:07 --> 00:25:13
three, negative root three.
OK.
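A minimal sketch, not part of the lecture, that retraces the case analysis just carried out. The function name `candidates` is an assumption for illustration. It takes the two roots lambda = 2 and lambda = -2 of the determinant, uses y = x or y = -x accordingly, and keeps only the cases where xy = 3 has a real solution.

```python
import math

def candidates(c=3.0):
    """Solve 2x = lam*y, 2y = lam*x, xy = c by the lecture's two cases."""
    points = []
    for lam in (2.0, -2.0):                 # roots of lam^2 - 4 = 0
        sign = 1.0 if lam > 0 else -1.0     # lam = 2 gives y = x, lam = -2 gives y = -x
        x_squared = sign * c                # xy = c becomes sign * x^2 = c
        if x_squared > 0:                   # lam = -2 gives -x^2 = c: no real solution
            x = math.sqrt(x_squared)
            points.append((x, sign * x, lam))
            points.append((-x, -sign * x, lam))
    return points

for x, y, lam in candidates():
    print(x, y, lam)
```

Only the lambda = 2 case contributes, giving the two candidate points found above.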
324
00:25:13 --> 00:25:16
Let's actually look at what we
have here.
325
00:25:16 --> 00:25:20
Maybe you cannot read the
coordinates, but the point that
326
00:25:20 --> 00:25:23
I have here is indeed root
three, root three.
327
00:25:23 --> 00:25:26
How do we see that lambda
equals two?
328
00:25:26 --> 00:25:29
Well, if you look at this
picture, the gradient of f,
329
00:25:29 --> 00:25:32
that is the blue vector,
is indeed twice the yellow
330
00:25:32 --> 00:25:36
vector, gradient g.
That is where you read the
331
00:25:36 --> 00:25:41
value of lambda.
And we have the other solution
332
00:25:41 --> 00:25:45
which is somewhere here.
Negative root three,
333
00:25:45 --> 00:25:48
negative root three.
And there, again,
334
00:25:48 --> 00:25:51
lambda equals two.
The two vectors are
335
00:25:51 --> 00:25:59
proportional by a factor of two.
Yes?
336
00:25:59 --> 00:26:01
No, solutions are not quite
guaranteed to be absolute minima
337
00:26:01 --> 00:26:03
or maxima.
They are guaranteed to be
338
00:26:03 --> 00:26:06
somehow critical points under the
constraint.
339
00:26:06 --> 00:26:09
That means if you were able to
solve and eliminate the variable
340
00:26:09 --> 00:26:12
that would be a critical point.
Then you have the same problem
341
00:26:12 --> 00:26:14
as with critical points:
are they maxima or minima?
342
00:26:14 --> 00:26:22
And the answer is,
well, we won't know until we
343
00:26:22 --> 00:26:28
check.
More questions?
344
00:26:28 --> 00:26:32
No.
Yes?
345
00:26:32 --> 00:26:36
What is a Lagrange multiplier?
Well, it is this number lambda
346
00:26:36 --> 00:26:39
that is called the multiplier
here.
347
00:26:39 --> 00:26:44
It is a multiplier because it
is what you have to multiply
348
00:26:44 --> 00:26:48
gradient of g by to get gradient
of f.
349
00:26:48 --> 00:26:49
It multiplies.
350
00:26:49 --> 00:27:04
351
00:27:04 --> 00:27:11
Let's try to see why this
method is valid.
352
00:27:11 --> 00:27:18
Because so far I have shown you
pictures and have said see they
353
00:27:18 --> 00:27:23
are tangent.
But why is it that they have to
354
00:27:23 --> 00:27:28
be tangent in general?
Let's think about it.
355
00:27:28 --> 00:27:37
Let's say that we are at a
constrained min or max.
356
00:27:37 --> 00:27:42
What that means is that if I
move on the level g equals
357
00:27:42 --> 00:27:46
constant then the value of f
should only increase or only
358
00:27:46 --> 00:27:49
decrease.
But it means,
359
00:27:49 --> 00:27:53
in particular,
to first order it will not
360
00:27:53 --> 00:27:56
change.
At an unconstrained min or max,
361
00:27:56 --> 00:27:59
partial derivatives are zero.
In this case,
362
00:27:59 --> 00:28:02
derivatives are zero only in
the allowed directions.
363
00:28:02 --> 00:28:09
And the allowed directions are
those that stay on the level set
364
00:28:09 --> 00:28:21
g equals constant.
In any direction along the
365
00:28:21 --> 00:28:40
level set g = c the rate of
change of f must be zero.
366
00:28:40 --> 00:28:44
That is what happens at minima
or maxima.
367
00:28:44 --> 00:28:49
Except here,
of course, we look only at the
368
00:28:49 --> 00:28:54
allowed directions.
Let's say the same thing in
369
00:28:54 --> 00:28:57
terms of directional
derivatives.
370
00:28:57 --> 00:29:23
371
00:29:23 --> 00:29:35
That means for any direction
that is tangent to the
372
00:29:35 --> 00:29:49
constraint level set g equals c,
we must have df over ds in the
373
00:29:49 --> 00:30:00
direction of u equals zero.
I will draw a picture.
374
00:30:00 --> 00:30:05
Let's say now I am in three
variables just to give you
375
00:30:05 --> 00:30:09
different examples.
Here I have a level surface g
376
00:30:09 --> 00:30:11
equals c.
I am at my point.
377
00:30:11 --> 00:30:18
And if I move in any direction
that is on the level surface,
378
00:30:18 --> 00:30:24
so I move in the direction u
tangent to the level surface,
379
00:30:24 --> 00:30:32
then the rate of change of f in
that direction should be zero.
380
00:30:32 --> 00:30:34
Now, remember what the formula
is for this guy.
381
00:30:34 --> 00:30:44
Well, we have seen that this
guy is actually gradient f dot u.
382
00:30:44 --> 00:30:58
That means any such vector u
must be perpendicular to the
383
00:30:58 --> 00:31:05
gradient of f.
That means that the gradient of
384
00:31:05 --> 00:31:10
f should be perpendicular to
anything that is tangent to this
385
00:31:10 --> 00:31:12
level.
That means the gradient of f
386
00:31:12 --> 00:31:16
should be perpendicular to the
level set.
387
00:31:16 --> 00:31:17
That is what we have shown.
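A minimal numerical sketch, not part of the lecture, of this argument on the earlier example f = x squared plus y squared, g = xy. The function name is an assumption for illustration. It builds a unit vector u tangent to the level curve of g by rotating gradient g a quarter turn, then evaluates gradient f dot u.

```python
import math

def directional_derivative_along_constraint(x, y):
    """grad f . u, where u is a unit tangent to the level curve of g = xy at (x, y)."""
    fx, fy = 2 * x, 2 * y              # gradient of f = x^2 + y^2
    gx, gy = y, x                      # gradient of g = xy
    norm = math.hypot(gx, gy)
    ux, uy = -gy / norm, gx / norm     # grad g rotated by 90 degrees, so u is tangent
    return fx * ux + fy * uy

r = math.sqrt(3.0)
print(directional_derivative_along_constraint(r, r))      # at the constrained minimum
print(directional_derivative_along_constraint(1.0, 3.0))  # elsewhere on xy = 3
```

At the constrained minimum the rate of change along the hyperbola vanishes; at the other point it does not, so f can still be decreased by sliding along the constraint.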
388
00:31:17 --> 00:31:37
389
00:31:37 --> 00:31:40
But we know another vector that
is also perpendicular to the
390
00:31:40 --> 00:31:57
level set of g.
That is the gradient of g.
391
00:31:57 --> 00:32:02
We conclude that the gradient
of f must be parallel to the
392
00:32:02 --> 00:32:07
gradient of g because both are
perpendicular to the level set
393
00:32:07 --> 00:32:09
of g.
I see confused faces,
394
00:32:09 --> 00:32:13
so let me try to tell you again
where that comes from.
395
00:32:13 --> 00:32:16
We said if we had a constrained
minimum or maximum,
396
00:32:16 --> 00:32:19
if we move in the level set of
g, f doesn't change.
397
00:32:19 --> 00:32:20
Well, it doesn't change to
first order.
398
00:32:20 --> 00:32:24
It is the same idea as when you
are looking for a minimum you
399
00:32:24 --> 00:32:26
set the derivative equal to
zero.
400
00:32:26 --> 00:32:31
So the derivative in any
direction, tangent to g equals
401
00:32:31 --> 00:32:34
c, should be the directional
derivative of f,
402
00:32:34 --> 00:32:38
in any such direction,
should be zero.
403
00:32:38 --> 00:32:43
That is what we mean by
critical point of f.
404
00:32:43 --> 00:32:48
And so that means that any
vector u, any unit vector
405
00:32:48 --> 00:32:55
tangent to the level set of g is
going to be perpendicular to the
406
00:32:55 --> 00:33:00
gradient of f.
That means that the gradient of
407
00:33:00 --> 00:33:04
f is perpendicular to the level
set of g.
408
00:33:04 --> 00:33:06
If you want,
that means the level sets of f
409
00:33:06 --> 00:33:10
and g are tangent to each other.
That is justifying what we have
410
00:33:10 --> 00:33:15
observed in the picture that the
two level sets have to be
411
00:33:15 --> 00:33:20
tangent to each other at the
constrained minimum or maximum.
412
00:33:20 --> 00:33:23
Does that make a little bit of
sense?
413
00:33:23 --> 00:33:28
Kind of.
I see at least a few faces
414
00:33:28 --> 00:33:35
nodding so I take that to be a
positive answer.
415
00:33:35 --> 00:33:39
Since I have been asked by
several of you,
416
00:33:39 --> 00:33:43
how do I know if it is a
maximum or a minimum?
417
00:33:43 --> 00:33:57
Well, warning,
the method doesn't tell whether
418
00:33:57 --> 00:34:09
a solution is a minimum or a
maximum.
419
00:34:09 --> 00:34:13
How do we do it?
Well, more bad news.
420
00:34:13 --> 00:34:26
We cannot use the second
derivative test.
421
00:34:26 --> 00:34:30
And the reason for that is that
we care actually only about
422
00:34:30 --> 00:34:34
these specific directions that
are tangent to the level set of g.
423
00:34:34 --> 00:34:39
And we don't want to bother to
try to define directional second
424
00:34:39 --> 00:34:42
derivatives.
Not to mention that actually it
425
00:34:42 --> 00:34:45
wouldn't work.
There is a criterion but it is
426
00:34:45 --> 00:34:49
much more complicated than that.
Basically, the answer for us is
427
00:34:49 --> 00:34:52
that we don't have a second
derivative test in this
428
00:34:52 --> 00:34:54
situation.
What are we left with?
429
00:34:54 --> 00:34:57
Well, we are just left with
comparing values.
430
00:34:57 --> 00:35:00
Say that in this problem you
found a point where f equals
431
00:35:00 --> 00:35:04
three, a point where f equals
nine, a point where f equals 15.
432
00:35:04 --> 00:35:08
Well, then probably the minimum
is the point where f equals
433
00:35:08 --> 00:35:12
three and the maximum is 15.
Actually, in this case,
434
00:35:12 --> 00:35:17
where we found minima,
these two points are tied for
435
00:35:17 --> 00:35:19
minimum.
What about the maximum?
436
00:35:19 --> 00:35:22
What is the maximum of f on the
hyperbola?
437
00:35:22 --> 00:35:25
Well, it is infinity because
the point can go as far as you
438
00:35:25 --> 00:35:29
want from the origin.
But the general idea is if we
439
00:35:29 --> 00:35:35
have a good reason to believe
that there should be a minimum,
440
00:35:35 --> 00:35:38
and it's not like at infinity
or something weird like that,
441
00:35:38 --> 00:35:42
then the minimum will be a
solution of the Lagrange
442
00:35:42 --> 00:35:46
multiplier equations.
We just look for all the
443
00:35:46 --> 00:35:51
solutions and then we choose the
one that gives us the lowest
444
00:35:51 --> 00:35:55
value.
Is that good enough?
445
00:35:55 --> 00:35:57
Let me actually write that down.
446
00:35:57 --> 00:36:23
447
00:36:23 --> 00:36:35
To find the minimum or the
maximum, we compare values of f
448
00:36:35 --> 00:36:46
at the various solutions
to the Lagrange multiplier
449
00:36:46 --> 00:36:49
equations.
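As a small sketch of this comparison step, here is the hyperbola example in Python. It assumes the earlier example was minimizing f = x^2 + y^2 on the hyperbola xy = 3; the constant 3 and the names f, g, C and candidates are assumptions made for illustration. The candidate points are the solutions of the Lagrange equations worked out by hand, not something the code discovers.

```python
import math

# Assumed setup (not restated in this part of the lecture):
# minimize f(x, y) = x^2 + y^2 subject to g(x, y) = x*y = C, with C = 3.
C = 3.0

def f(x, y):
    return x * x + y * y

def g(x, y):
    return x * y

# The Lagrange equations 2x = lam*y, 2y = lam*x, xy = C give x^2 = y^2.
# With xy = C > 0, x and y have the same sign, so x = y = +/- sqrt(C).
r = math.sqrt(C)
candidates = [(r, r), (-r, -r)]

# Compare the values of f at every solution; the smallest one is the minimum.
values = [(f(x, y), (x, y)) for (x, y) in candidates]
min_value = min(v for v, _ in values)
minimizers = [p for v, p in values if math.isclose(v, min_value)]

# There is no maximum: far out along the hyperbola, f keeps growing.
far_values = [f(t, C / t) for t in (10.0, 100.0, 1000.0)]
```

Both candidates give the same value of f, so they are tied for the minimum, and the growing far_values illustrate why comparing candidates says nothing about a maximum here.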
450
00:36:49 --> 00:37:08
451
00:37:08 --> 00:37:11
I should say also that
sometimes you can just conclude
452
00:37:11 --> 00:37:14
by thinking geometrically.
In this case,
453
00:37:14 --> 00:37:18
when it is asking you which
point is closest to the origin
454
00:37:18 --> 00:37:23
you can just see that your
answer is the correct one.
455
00:37:23 --> 00:37:32
Let's do an advanced example.
Advanced means that -- Well,
456
00:37:32 --> 00:37:37
this one I didn't actually dare
to put on top of the other
457
00:37:37 --> 00:37:48
problem sets.
Instead, I am going to do it.
458
00:37:48 --> 00:37:51
What is this going to be about?
We are going to look for a
459
00:37:51 --> 00:38:03
surface-minimizing pyramid.
Let's say that we want to build
460
00:38:03 --> 00:38:19
a pyramid with a given
triangular base and a
461
00:38:19 --> 00:38:28
given volume.
Say that I have maybe in the x,
462
00:38:28 --> 00:38:33
y plane I am giving you some
triangle.
463
00:38:33 --> 00:38:40
And I am going to try to build
a pyramid.
464
00:38:40 --> 00:38:48
Of course, I can choose where
to put the top of the pyramid.
465
00:38:48 --> 00:38:53
This guy will end up being
behind now.
466
00:38:53 --> 00:39:09
And, given that constraint, the goal
is to minimize the total surface
467
00:39:09 --> 00:39:13
area.
The first time I taught this
468
00:39:13 --> 00:39:15
class, it was a few years ago,
was just before they built the
469
00:39:15 --> 00:39:17
Stata Center.
And then I used to motivate
470
00:39:17 --> 00:39:20
this problem by saying Frank
Gehry has gone crazy and has
471
00:39:20 --> 00:39:23
been given a triangular plot of
land where he wants to put a pyramid.
472
00:39:23 --> 00:39:26
There needs to be the right
amount of volume so that you can
473
00:39:26 --> 00:39:28
put all the offices in there.
And he wants it to be,
474
00:39:28 --> 00:39:31
actually, covered in solid
gold.
475
00:39:31 --> 00:39:34
And because that is expensive,
the administration wants him to
476
00:39:34 --> 00:39:38
cut the costs a bit.
And so you have to minimize the
477
00:39:38 --> 00:39:42
total size so that it doesn't
cost too much.
478
00:39:42 --> 00:39:45
We will see if MIT comes up
with a triangular pyramid
479
00:39:45 --> 00:39:48
building.
Hopefully not.
480
00:39:48 --> 00:39:58
It could be our next dorm,
you never know.
481
00:39:58 --> 00:40:01
Anyway, it is a fine geometry
problem.
482
00:40:01 --> 00:40:07
Let's try to think about how we
can do this.
483
00:40:07 --> 00:40:10
The natural way to think about
it would be -- Well,
484
00:40:10 --> 00:40:11
what do we have to look for
first?
485
00:40:11 --> 00:40:18
We have to look for the
position of that top point.
486
00:40:18 --> 00:40:29
Remember we know that the
volume of a pyramid is one-third
487
00:40:29 --> 00:40:37
the area of base times height.
In fact, fixing the volume,
488
00:40:37 --> 00:40:39
knowing that we have fixed the
area of the base,
489
00:40:39 --> 00:40:43
means that we are fixing the
height of the pyramid.
490
00:40:43 --> 00:40:47
The height is completely fixed.
What we have to choose just is
491
00:40:47 --> 00:40:52
where do we put that top point?
Do we put it smack in the
492
00:40:52 --> 00:40:58
middle of the triangle or to a
side or even anywhere we want?
493
00:40:58 --> 00:41:15
Its z coordinate is fixed.
Let's call h the height.
494
00:41:15 --> 00:41:20
What we could do is something
like this.
495
00:41:20 --> 00:41:24
We say we have three points of
a base.
496
00:41:24 --> 00:41:32
Let's call them p1 at (x1,
y1,0); p2 at (x2,
497
00:41:32 --> 00:41:36
y2,0); p3 at (x3,
y3,0).
498
00:41:36 --> 00:41:40
This point p is the unknown
point at (x, y,
499
00:41:40 --> 00:41:42
h).
We know the height.
500
00:41:42 --> 00:41:46
And then we want to minimize
the sum of the areas of these
501
00:41:46 --> 00:41:50
three triangles.
One here, one here and one at
502
00:41:50 --> 00:41:53
the back.
And areas of triangles we know
503
00:41:53 --> 00:41:57
how to express by using the length
of a cross product.
504
00:41:57 --> 00:42:00
It becomes a function of x and
y.
505
00:42:00 --> 00:42:04
And you can try to minimize it.
Actually, it doesn't quite work.
506
00:42:04 --> 00:42:05
The formulas are just too
complicated.
507
00:42:05 --> 00:42:14
You will never get there.
What happens is actually maybe
508
00:42:14 --> 00:42:18
we need better coordinates.
Why do we need better
509
00:42:18 --> 00:42:21
coordinates?
That is because the geometry is
510
00:42:21 --> 00:42:24
kind of difficult to do if you
use x, y coordinates.
511
00:42:24 --> 00:42:28
I mean, the formula for the
cross product is fine,
512
00:42:28 --> 00:42:33
but then the length of the
vector will be annoying and just
513
00:42:33 --> 00:42:37
doesn't look good.
Instead, let's think about it
514
00:42:37 --> 00:42:38
differently.
515
00:42:38 --> 00:42:54
516
00:42:54 --> 00:43:01
I claim if we do it this way
and we express the area as a
517
00:43:01 --> 00:43:06
function of x,
y, well, actually we can't
518
00:43:06 --> 00:43:13
solve for a minimum.
Here is another way to do it.
519
00:43:13 --> 00:43:17
Well, what has worked pretty
well for us so far is this
520
00:43:17 --> 00:43:19
geometric idea of base times
height.
521
00:43:19 --> 00:43:29
So let's think in terms of the
heights of side triangles.
522
00:43:29 --> 00:43:37
I am going to use the height of
these things.
523
00:43:37 --> 00:43:43
And I am going to say that the
area will be the sum of three
524
00:43:43 --> 00:43:48
terms, each of which is one-half
of a base times a height.
525
00:43:48 --> 00:43:53
Let's give names to these
quantities.
526
00:43:53 --> 00:43:58
Actually, for that it is going
to be good to have the point in
527
00:43:58 --> 00:44:01
the xy plane that lives directly
below p.
528
00:44:01 --> 00:44:08
Let's call it q.
P is the point with coordinates
529
00:44:08 --> 00:44:13
x, y, h.
And let's call q the point that
530
00:44:13 --> 00:44:19
is just below it, so its
coordinates are x,
531
00:44:19 --> 00:44:22
y, 0.
Let's see.
532
00:44:22 --> 00:44:34
Let me draw a map of this thing.
p1, p2, p3 and I have my point
533
00:44:34 --> 00:44:37
q in the middle.
Let's see.
534
00:44:37 --> 00:44:40
To know these areas,
I need to know the base.
535
00:44:40 --> 00:44:44
Well, the base I can decide
that I know it because it is
536
00:44:44 --> 00:44:48
part of my given data.
I know the sides of this
537
00:44:48 --> 00:44:53
triangle.
Let me call the lengths a1,
538
00:44:53 --> 00:44:56
a2, a3.
I also need to know the height,
539
00:44:56 --> 00:44:58
so I need to know these
lengths.
540
00:44:58 --> 00:45:01
How do I know these lengths?
Well, it is a distance in space,
541
00:45:01 --> 00:45:03
but it is a little bit
annoying.
542
00:45:03 --> 00:45:10
But maybe I can reduce it to a
distance in the plane by looking
543
00:45:10 --> 00:45:17
instead at this distance here.
Let me give names to the
544
00:45:17 --> 00:45:24
distances from q to the sides.
Let's call u1,
545
00:45:24 --> 00:45:35
u2, u3 the distances from q to
the sides.
546
00:45:35 --> 00:45:47
547
00:45:47 --> 00:45:49
Well, now I can claim I can
find, actually,
548
00:45:49 --> 00:45:53
sorry.
I need to draw one more thing.
549
00:45:53 --> 00:45:57
I claim I have a nice formula
for the area,
550
00:45:57 --> 00:46:01
because this is vertical and
this is horizontal so this
551
00:46:01 --> 00:46:05
length here is u3,
this length here is h.
552
00:46:05 --> 00:46:13
So what is this length here?
It is the square root of u3
553
00:46:13 --> 00:46:17
squared plus h squared.
And similarly for these other
554
00:46:17 --> 00:46:23
guys.
They are square roots of a u
555
00:46:23 --> 00:46:31
squared plus h squared.
The heights of the faces are
556
00:46:31 --> 00:46:36
square root of u1 squared plus
h squared.
557
00:46:36 --> 00:46:43
And similarly with u2 and u3.
So the total side area is going
558
00:46:43 --> 00:46:47
to be the area of the first
face,
559
00:46:47 --> 00:46:58
one-half of base times height,
plus one-half of a base times a
560
00:46:58 --> 00:47:06
height plus one-half of the
third one.
561
00:47:06 --> 00:47:09
It doesn't look so much better.
But, trust me,
562
00:47:09 --> 00:47:15
it will get better.
Now, that is a function of
563
00:47:15 --> 00:47:19
three variables,
u1, u2, u3.
564
00:47:19 --> 00:47:22
And how do we relate u1,
u2, u3 to each other?
565
00:47:22 --> 00:47:25
They are probably not
independent.
566
00:47:25 --> 00:47:32
Well, let's cut this triangle
here into three pieces like
567
00:47:32 --> 00:47:35
that.
Then each piece has side --
568
00:47:35 --> 00:47:40
Well, let's look at the piece
at the bottom.
569
00:47:40 --> 00:47:50
It has base a3, height u3.
Cutting the base into three tells
570
00:47:50 --> 00:47:57
you that the area of the base is
one-half of a1 times
571
00:47:57 --> 00:48:04
u1 plus one-half of a2 times
u2 plus one-half of a3 times
572
00:48:04 --> 00:48:09
u3.
And that is our constraint.
573
00:48:09 --> 00:48:12
My three variables,
u1, u2, u3, are constrained in
574
00:48:12 --> 00:48:14
this way.
The sum of these terms must be
575
00:48:14 --> 00:48:17
the area of the base.
And I want to minimize that guy.
576
00:48:17 --> 00:48:23
So that is my g and that guy
here is my f.
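Written out, with a_i the side lengths of the base, u_i the distances from q to the sides, and h the fixed height, the two functions just described look roughly like this. The symbol A_base for the given area of the base is a name introduced here, and the constraint in this form assumes q lies inside the triangle.

```latex
\[
f(u_1,u_2,u_3) = \tfrac12 a_1\sqrt{u_1^2+h^2}
               + \tfrac12 a_2\sqrt{u_2^2+h^2}
               + \tfrac12 a_3\sqrt{u_3^2+h^2},
\]
\[
g(u_1,u_2,u_3) = \tfrac12 a_1 u_1 + \tfrac12 a_2 u_2 + \tfrac12 a_3 u_3
               = A_{\text{base}} .
\]
```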
577
00:48:23 --> 00:48:28
Now we try to apply our
Lagrange multiplier equations.
578
00:48:28 --> 00:48:33
Well, partial f over partial u1
is -- Well,
579
00:48:33 --> 00:48:36
if you do the calculation,
you will see it is one-half a1 times
580
00:48:36 --> 00:48:43
u1 over the square root of u1^2
plus h^2, which equals lambda times,
581
00:48:43 --> 00:48:46
well, what is partial g
over partial u1?
582
00:48:46 --> 00:48:50
That one you can do, I am sure.
It is one-half a1.
583
00:48:50 --> 00:49:00
Oh, these guys simplify.
If you do the same with the
584
00:49:00 --> 00:49:09
second one, things simplify
again.
585
00:49:09 --> 00:49:17
And the same with the third one.
Well, you will get,
586
00:49:17 --> 00:49:21
after simplifying,
u3 over square root of u3
587
00:49:21 --> 00:49:24
squared plus h squared equals
lambda.
588
00:49:24 --> 00:49:27
Now, that means this guy equals
this guy equals this guy.
589
00:49:27 --> 00:49:33
They are all equal to lambda.
And, if you think about it,
590
00:49:33 --> 00:49:39
that means that u1 = u2 = u3.
See, it looked like scary
591
00:49:39 --> 00:49:42
equations but the solution is
very simple.
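For reference, here is one way to write out the three equations and the "if you think about it" step. This is a sketch using the symbols from the lecture (a_i, u_i, h, lambda); the index i is shorthand for doing the same thing three times.

```latex
\[
\frac{\partial f}{\partial u_i}
  = \frac{a_i\,u_i}{2\sqrt{u_i^2+h^2}}
  = \lambda\,\frac{a_i}{2}
  = \lambda\,\frac{\partial g}{\partial u_i}
\quad\Longrightarrow\quad
\frac{u_i}{\sqrt{u_i^2+h^2}} = \lambda ,
\qquad i = 1, 2, 3 .
\]
\[
\frac{d}{du}\left(\frac{u}{\sqrt{u^2+h^2}}\right)
  = \frac{h^2}{\left(u^2+h^2\right)^{3/2}} > 0 .
\]
```

Since u over the square root of u^2 + h^2 is strictly increasing in u, it can take the same value lambda only at a single u, which is why the three equations force u1 = u2 = u3.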
592
00:49:42 --> 00:49:45
What does it mean?
It means that our point q
593
00:49:45 --> 00:49:47
should be equidistant from all
three sides.
594
00:49:47 --> 00:49:52
That is called the incenter.
Q should be in the incenter.
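If you want to check this numerically, here is a minimal sketch in Python. The triangle, the height h = 2, and the function names side_area and incenter are all made-up examples, not part of the lecture. It computes the total side area for a given position of q and compares the incenter against the centroid and a grid of other positions.

```python
import math

def side_area(q, tri, h):
    """Sum of the areas of the three side faces when the top is at (q[0], q[1], h)."""
    total = 0.0
    for i in range(3):
        (x1, y1), (x2, y2) = tri[i], tri[(i + 1) % 3]
        a = math.hypot(x2 - x1, y2 - y1)  # length of this base edge
        # u = distance in the plane from q to the line through the edge
        u = abs((x2 - x1) * (q[1] - y1) - (y2 - y1) * (q[0] - x1)) / a
        total += 0.5 * a * math.sqrt(u * u + h * h)  # one-half base times height
    return total

def incenter(tri):
    """Point equidistant from the three sides: vertices weighted by opposite side lengths."""
    (ax, ay), (bx, by), (cx, cy) = tri
    a = math.hypot(bx - cx, by - cy)  # side opposite the first vertex
    b = math.hypot(cx - ax, cy - ay)  # side opposite the second vertex
    c = math.hypot(ax - bx, ay - by)  # side opposite the third vertex
    s = a + b + c
    return ((a * ax + b * bx + c * cx) / s, (a * ay + b * by + c * cy) / s)

tri = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]  # example base (a 3-4-5 triangle)
h = 2.0                                     # example height
q_in = incenter(tri)
centroid = (sum(p[0] for p in tri) / 3, sum(p[1] for p in tri) / 3)

area_incenter = side_area(q_in, tri, h)
area_centroid = side_area(centroid, tri, h)

# Sample a grid of other positions for q and record the smallest area seen.
grid = [(0.1 * i, 0.1 * j) for i in range(-10, 51) for j in range(-10, 41)]
best_on_grid = min(side_area(q, tri, h) for q in grid)
```

For this example the incenter gives a smaller side area than the centroid, and no grid point does better than the incenter.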
595
00:49:52 --> 00:49:56
The next time you have to build
a golden pyramid and don't want
596
00:49:56 --> 00:49:59
to go broke, well,
you know where to put the top.
597
00:49:59 --> 00:50:03
If that was a bit fast, sorry.
Anyway, it is not completely
598
00:50:03 --> 00:50:06
crucial.
But go over it and you will see
599
00:50:06 --> 00:50:08
it works.
Have a nice weekend.
600
00:50:08 --> 00:50:10