1
00:00:01,540 --> 00:00:03,910
The following content is
provided under a Creative
2
00:00:03,910 --> 00:00:05,300
Commons license.
3
00:00:05,300 --> 00:00:07,510
Your support will help
MIT OpenCourseWare
4
00:00:07,510 --> 00:00:11,600
continue to offer high-quality
educational resources for free.
5
00:00:11,600 --> 00:00:14,140
To make a donation or to
view additional materials
6
00:00:14,140 --> 00:00:18,100
from hundreds of MIT courses,
visit MIT OpenCourseWare
7
00:00:18,100 --> 00:00:19,310
at ocw.mit.edu.
8
00:00:24,145 --> 00:00:24,770
JAMES SWAN: OK.
9
00:00:24,770 --> 00:00:26,790
Well, everyone's
quieted down, so that
10
00:00:26,790 --> 00:00:28,680
means we have to get started.
11
00:00:28,680 --> 00:00:31,350
So let me say something here.
12
00:00:31,350 --> 00:00:34,260
This will be our
last conversation
13
00:00:34,260 --> 00:00:36,360
about optimization.
14
00:00:36,360 --> 00:00:39,510
So we've discussed
unconstrained optimization.
15
00:00:39,510 --> 00:00:42,150
And now we're going to discuss
a slightly more complicated
16
00:00:42,150 --> 00:00:43,525
problem-- but
you're going to see
17
00:00:43,525 --> 00:00:45,420
it's really not that
much more complicated--
18
00:00:45,420 --> 00:00:47,170
constrained optimization.
19
00:00:50,042 --> 00:00:51,750
These are the things
we discussed before.
20
00:00:51,750 --> 00:00:53,520
I don't want to spend
much time recapping
21
00:00:53,520 --> 00:00:55,936
because I want to take a minute
and talk about the midterm
22
00:00:55,936 --> 00:00:57,780
exam.
23
00:00:57,780 --> 00:00:59,250
So we have a quiz.
24
00:00:59,250 --> 00:01:01,020
It's next Wednesday.
25
00:01:01,020 --> 00:01:02,680
Here's where it's
going to be located.
26
00:01:02,680 --> 00:01:03,630
Here's 66.
27
00:01:03,630 --> 00:01:04,950
Head down Ames.
28
00:01:04,950 --> 00:01:09,030
You're looking for Walker
Memorial on the third floor.
29
00:01:09,030 --> 00:01:12,030
Unfortunately, the time for
the quiz is 7:00 to 9:00 PM.
30
00:01:12,030 --> 00:01:14,640
We really did try hard
to get the scheduling
31
00:01:14,640 --> 00:01:16,450
office to give us
something better,
32
00:01:16,450 --> 00:01:20,100
but the only way to get a room
that would fit everybody in
33
00:01:20,100 --> 00:01:22,035
was to do it at
this time in Walker.
34
00:01:22,035 --> 00:01:23,910
I really don't understand,
because I actually
35
00:01:23,910 --> 00:01:27,480
requested locations for
the quizzes back in April.
36
00:01:27,480 --> 00:01:31,800
And somehow I was
too early, maybe,
37
00:01:31,800 --> 00:01:33,992
and got buried under a pile.
38
00:01:33,992 --> 00:01:35,700
Maybe not important
enough, I don't know.
39
00:01:35,700 --> 00:01:40,146
But it's got to be from
seven to nine next Wednesday.
40
00:01:40,146 --> 00:01:40,645
Third floor.
41
00:01:40,645 --> 00:01:42,270
There's not going
to be any class
42
00:01:42,270 --> 00:01:44,350
next Wednesday because
you have a quiz instead.
43
00:01:44,350 --> 00:01:49,500
So you get a little extra time
to relax or study, prepare,
44
00:01:49,500 --> 00:01:51,690
calm yourself before
you go into the exam.
45
00:01:51,690 --> 00:01:54,150
There's no homework this week.
46
00:01:54,150 --> 00:01:58,950
So you can just use this time
to focus on the material we've
47
00:01:58,950 --> 00:02:00,090
discussed.
48
00:02:00,090 --> 00:02:02,310
There's a practice
exam from last year
49
00:02:02,310 --> 00:02:07,950
posted on the Stellar site,
which you can utilize and study
50
00:02:07,950 --> 00:02:08,850
from.
51
00:02:08,850 --> 00:02:10,250
I'll tell you this.
52
00:02:13,506 --> 00:02:15,990
That practice exam is
skewed a little more
53
00:02:15,990 --> 00:02:19,410
towards some chemical
engineering problems
54
00:02:19,410 --> 00:02:22,060
that motivate the numerics.
55
00:02:22,060 --> 00:02:25,050
I've found in the past that
when problems like that
56
00:02:25,050 --> 00:02:27,060
are given on the exam,
sometimes there's
57
00:02:27,060 --> 00:02:30,580
a lot of reading that goes into
understanding the engineering
58
00:02:30,580 --> 00:02:31,080
problem.
59
00:02:31,080 --> 00:02:33,760
And that tends to set
back the problem-solving.
60
00:02:33,760 --> 00:02:37,110
So I'll tell you that the quiz
that you'll take on Wednesday
61
00:02:37,110 --> 00:02:40,590
will have less of the
engineering associated with it,
62
00:02:40,590 --> 00:02:46,080
and focus more on the numerical
or computational science.
63
00:02:46,080 --> 00:02:48,400
The underlying
sorts of questions,
64
00:02:48,400 --> 00:02:51,450
the way the questions are
asked, the kinds of responses
65
00:02:51,450 --> 00:02:53,730
you're expected to give
I'd say are very similar.
66
00:02:53,730 --> 00:02:57,450
But we've tried to tune
the exam so that it'll
67
00:02:57,450 --> 00:02:59,430
be less of a burden
to understand
68
00:02:59,430 --> 00:03:01,429
the structure of the
problem before describing
69
00:03:01,429 --> 00:03:02,220
how you'd solve it.
70
00:03:02,220 --> 00:03:03,920
So I think that's good.
71
00:03:03,920 --> 00:03:07,990
It's comprehensive up to today.
72
00:03:07,990 --> 00:03:10,860
So linear algebra, systems
of nonlinear equations
73
00:03:10,860 --> 00:03:14,160
and optimization
are the quiz topics.
74
00:03:14,160 --> 00:03:18,660
We're going to switch on
Friday to ordinary differential
75
00:03:18,660 --> 00:03:20,735
equations and initial
value problems.
76
00:03:20,735 --> 00:03:22,110
So you have two
lectures on that,
77
00:03:22,110 --> 00:03:24,540
but you won't have
done any homework.
78
00:03:24,540 --> 00:03:26,940
You probably don't know enough
or aren't practiced enough
79
00:03:26,940 --> 00:03:30,796
to answer any questions
intelligently on the quiz.
80
00:03:30,796 --> 00:03:32,670
So don't expect that
material to be on there.
81
00:03:32,670 --> 00:03:33,380
It's not.
82
00:03:33,380 --> 00:03:36,672
It's going to be
these three topics.
83
00:03:36,672 --> 00:03:38,880
Are there any questions
about this that I can answer?
84
00:03:42,864 --> 00:03:44,358
Kristin has a question.
85
00:03:44,358 --> 00:03:45,354
AUDIENCE: [INAUDIBLE].
86
00:03:56,535 --> 00:03:57,160
JAMES SWAN: OK.
87
00:03:57,160 --> 00:03:59,380
So yeah, come prepared.
88
00:03:59,380 --> 00:04:00,190
It might be cold.
89
00:04:00,190 --> 00:04:02,440
It might be hot.
90
00:04:02,440 --> 00:04:04,130
It leaks when it
rains a little bit.
91
00:04:04,130 --> 00:04:07,480
Yeah, it's not
the greatest spot.
92
00:04:07,480 --> 00:04:09,590
So come prepared.
93
00:04:09,590 --> 00:04:10,300
That's true.
94
00:04:10,300 --> 00:04:12,000
Other questions?
95
00:04:12,000 --> 00:04:13,292
Things you want to know?
96
00:04:13,292 --> 00:04:15,180
AUDIENCE: What can
we take to the exam?
97
00:04:15,180 --> 00:04:16,471
JAMES SWAN: Ooh, good question.
98
00:04:16,471 --> 00:04:20,059
So you can bring the book
recommended for the course.
99
00:04:20,059 --> 00:04:21,100
You can bring your notes.
100
00:04:21,100 --> 00:04:22,960
You can bring a calculator.
101
00:04:22,960 --> 00:04:24,300
You need to bring some pencils.
102
00:04:24,300 --> 00:04:28,210
We'll provide blue books for
you to write your solutions
103
00:04:28,210 --> 00:04:29,860
to the exam in.
104
00:04:29,860 --> 00:04:31,150
So those are the materials.
105
00:04:31,150 --> 00:04:32,320
Good.
106
00:04:32,320 --> 00:04:33,370
What else?
107
00:04:33,370 --> 00:04:34,710
Same question.
108
00:04:34,710 --> 00:04:36,419
OK.
109
00:04:36,419 --> 00:04:37,898
Other questions?
110
00:04:40,856 --> 00:04:42,171
No?
111
00:04:42,171 --> 00:04:42,670
OK.
112
00:04:45,755 --> 00:04:47,880
So then let's jump into
the topic of the day, which
113
00:04:47,880 --> 00:04:50,070
is constrained optimization.
114
00:04:50,070 --> 00:04:51,840
So these are
problems of the sort.
115
00:04:51,840 --> 00:04:57,870
Minimize an objective function
f of x subject to the constraint
116
00:04:57,870 --> 00:05:02,190
that x belongs to some set
D, or find the argument
117
00:05:02,190 --> 00:05:04,007
x that minimizes this function.
118
00:05:04,007 --> 00:05:05,590
These are equivalent
sorts of problem.
119
00:05:05,590 --> 00:05:08,410
Sometimes, we want to know
one or the other or both.
120
00:05:08,410 --> 00:05:10,790
That's not a problem.
121
00:05:10,790 --> 00:05:12,290
And graphically,
it looks like this.
122
00:05:12,290 --> 00:05:13,770
Here's f, our
objective function.
123
00:05:13,770 --> 00:05:16,470
It's a nice convex
bowl-shaped function here.
124
00:05:16,470 --> 00:05:19,080
And we want to know the
values of x1 and x2,
125
00:05:19,080 --> 00:05:21,870
let's say, that
minimize this function
126
00:05:21,870 --> 00:05:23,220
subject to some constraint.
127
00:05:23,220 --> 00:05:27,900
That constraint could
be that x1 and x2 live
128
00:05:27,900 --> 00:05:29,820
inside this little blue circle.
129
00:05:29,820 --> 00:05:32,820
It could be D. It could
be that x1 and x2 live
130
00:05:32,820 --> 00:05:34,680
on the surface of
this circle, right,
131
00:05:34,680 --> 00:05:36,780
on the circumference
of this circle.
132
00:05:36,780 --> 00:05:40,670
That could be the constraint.
133
00:05:40,670 --> 00:05:43,160
So these are the sorts of
problems we want to solve.
134
00:05:43,160 --> 00:05:45,980
D is called the
feasible set, and can
135
00:05:45,980 --> 00:05:48,710
be described in terms of really
two types of constraints.
136
00:05:48,710 --> 00:05:51,200
One is what we call
equality constraints.
137
00:05:51,200 --> 00:05:54,650
So D can be the set
of values x such
138
00:05:54,650 --> 00:06:00,440
that some nonlinear function
c of x is equal to zero.
139
00:06:00,440 --> 00:06:03,080
So it's the set of points
that satisfy this nonlinear
140
00:06:03,080 --> 00:06:04,400
equation.
141
00:06:04,400 --> 00:06:06,080
And among those
points, we want to know
142
00:06:06,080 --> 00:06:09,274
which one produces the minimum
in the objective function.
143
00:06:09,274 --> 00:06:10,940
Or it could be an
inequality constraint.
144
00:06:10,940 --> 00:06:14,420
So D could be the set of
points such that some nonlinear
145
00:06:14,420 --> 00:06:20,240
function h of x is, by
convention, positive.
146
00:06:20,240 --> 00:06:22,170
So h of x could
represent, for example,
147
00:06:22,170 --> 00:06:23,900
the interior of a
circle, and c of x
148
00:06:23,900 --> 00:06:27,189
could represent the
circumference of a circle.
149
00:06:27,189 --> 00:06:28,730
And we would have
nonlinear equations
150
00:06:28,730 --> 00:06:33,560
that reflect those
values of x that satisfy
151
00:06:33,560 --> 00:06:35,360
those sorts of geometries.
152
00:06:38,770 --> 00:06:43,930
So equality constrained,
points that lie on this circle,
153
00:06:43,930 --> 00:06:48,010
inequality constrained, points
that lie within this circle.
154
00:06:48,010 --> 00:06:50,560
The shape of the feasible set
is constrained by the problem
155
00:06:50,560 --> 00:06:52,460
that you're actually
interested in.
156
00:06:52,460 --> 00:06:54,460
So it's easy for me to
draw circles in the plane
157
00:06:54,460 --> 00:06:56,293
because that's a shape
you're familiar with.
158
00:06:56,293 --> 00:06:57,970
But actually, it'll
come from some sort
159
00:06:57,970 --> 00:07:01,060
of physical constraint on the
engineering problem you're
160
00:07:01,060 --> 00:07:03,940
looking at, like mole fractions
need to be bigger than zero
161
00:07:03,940 --> 00:07:06,430
and smaller than one, and
temperatures in absolute value
162
00:07:06,430 --> 00:07:09,220
have to be bigger than zero
and smaller than some value
163
00:07:09,220 --> 00:07:12,790
because that's a safety
factor on the process.
164
00:07:12,790 --> 00:07:16,960
So these set up the
constraints on various sorts
165
00:07:16,960 --> 00:07:19,864
of optimization problems
that we're interested in.
166
00:07:19,864 --> 00:07:22,030
It could also be true that
we're interested in, say,
167
00:07:22,030 --> 00:07:24,790
optimization in the domain
outside of this circle, too.
168
00:07:24,790 --> 00:07:28,030
It could be on the inside,
could be on the outside.
169
00:07:28,030 --> 00:07:31,816
That's also an inequality
constrained sort of problem.
170
00:07:31,816 --> 00:07:34,730
You know some of these already.
171
00:07:34,730 --> 00:07:36,350
They're familiar to you.
172
00:07:36,350 --> 00:07:38,840
So here's a classic
one from mechanics.
173
00:07:38,840 --> 00:07:43,120
Here's the total energy in a
system for, say, a pendulum.
174
00:07:43,120 --> 00:07:47,080
So x is like the position of
the tip of this pendulum and v
175
00:07:47,080 --> 00:07:48,910
is the velocity
that it moves with.
176
00:07:48,910 --> 00:07:50,690
This is the kinetic energy.
177
00:07:50,690 --> 00:07:51,920
This is the potential energy.
178
00:07:51,920 --> 00:07:54,128
And we know the pendulum
will come to rest in a place
179
00:07:54,128 --> 00:07:56,680
where the energy is minimized.
180
00:07:56,680 --> 00:07:59,050
Well, the energy can
only be minimized
181
00:07:59,050 --> 00:08:02,590
when the velocity here is zero,
because any non-zero velocity
182
00:08:02,590 --> 00:08:04,450
will always push the
energy content up.
183
00:08:04,450 --> 00:08:06,040
So it comes to rest.
184
00:08:06,040 --> 00:08:07,690
It doesn't move.
185
00:08:07,690 --> 00:08:10,120
And then there's some
value of x at which
186
00:08:10,120 --> 00:08:11,710
the energy is minimized.
187
00:08:11,710 --> 00:08:14,680
If there is no constraint
that says that the pendulum is
188
00:08:14,680 --> 00:08:17,680
attached to some
central axis, then I
189
00:08:17,680 --> 00:08:19,270
can always make
the energy smaller
190
00:08:19,270 --> 00:08:21,430
by making x more
and more negative.
191
00:08:21,430 --> 00:08:22,570
It just keeps falling.
192
00:08:22,570 --> 00:08:23,740
There is no stopping point.
193
00:08:23,740 --> 00:08:25,507
But there's a constraint.
194
00:08:25,507 --> 00:08:27,340
The distance between
the tip of the pendulum
195
00:08:27,340 --> 00:08:31,965
and this central point is
some fixed distance out.
196
00:08:31,965 --> 00:08:34,090
So this is an equality
constrained sort of problem,
197
00:08:34,090 --> 00:08:35,714
and we have to choose
from the set of v
198
00:08:35,714 --> 00:08:38,679
and x the values subject
to this constraint that
199
00:08:38,679 --> 00:08:39,970
minimize the total energy.
200
00:08:39,970 --> 00:08:43,650
And that's this configuration
of the pendulum here.
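As a quick numeric illustration of the pendulum example, here is a minimal Python sketch; the mass, length, and gravity values are assumed for illustration, and the length constraint is enforced by parameterizing the tip position with an angle.

```python
import math

# Assumed unit pendulum: pivot at the origin, tip at
# (L*sin(theta), -L*cos(theta)), which builds the equality constraint
# x1^2 + x2^2 = L^2 into the parameterization. We set v = 0, since any
# nonzero velocity only adds kinetic energy.
m, g, L = 1.0, 9.81, 1.0

def energy(theta):
    # Potential energy m*g*x2 of the tip; kinetic energy is zero at rest.
    return m * g * (-L * math.cos(theta))

# Scan angles around the circle and keep the minimizer: the energy is
# lowest when the pendulum hangs straight down at theta = 0.
thetas = [2.0 * math.pi * k / 1000 for k in range(1000)]
best = min(thetas, key=energy)
print(best)  # 0.0: the tip rests at (0, -L)
```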
201
00:08:43,650 --> 00:08:46,420
So you know these sorts
of problems already.
202
00:08:46,420 --> 00:08:51,380
We talked about this one,
linear sorts of programs.
203
00:08:51,380 --> 00:08:55,480
These are optimization problems
where the objective function is
204
00:08:55,480 --> 00:08:57,940
linear in the design variables.
205
00:08:57,940 --> 00:09:01,300
So it's just the dot product
between x and some vector
206
00:09:01,300 --> 00:09:03,070
c that weights the
different design
207
00:09:03,070 --> 00:09:05,300
options against each other.
208
00:09:05,300 --> 00:09:07,060
So we talked about ice cream.
209
00:09:07,060 --> 00:09:08,480
Yes, this is all
premium ice cream
210
00:09:08,480 --> 00:09:11,435
because it comes in
the small containers,
211
00:09:11,435 --> 00:09:12,810
subject to different
constraints.
212
00:09:12,810 --> 00:09:14,226
So those constraints
can be things
213
00:09:14,226 --> 00:09:16,480
like, oh, x has to be
positive because we can't make
214
00:09:16,480 --> 00:09:18,220
negative amounts of ice cream.
215
00:09:18,220 --> 00:09:20,440
And maybe we've
done market research
216
00:09:20,440 --> 00:09:22,300
that tells us that
the market can only
217
00:09:22,300 --> 00:09:26,110
tolerate certain ratios of
different types of ice cream.
218
00:09:26,110 --> 00:09:28,600
And that may be some
set of linear equations
219
00:09:28,600 --> 00:09:31,570
that describe that market
research that sort of bound
220
00:09:31,570 --> 00:09:33,812
the upper values of
how much ice cream
221
00:09:33,812 --> 00:09:35,020
we can put out on the market.
222
00:09:35,020 --> 00:09:38,680
And then we try to choose the
optimal blend of pina colada
223
00:09:38,680 --> 00:09:40,230
and strawberry to sell.
224
00:09:43,385 --> 00:09:46,040
So those are linear programs.
225
00:09:46,040 --> 00:09:50,270
This is an inequality
constrained optimization.
226
00:09:53,660 --> 00:09:56,330
In general, we might write
these problems like this.
227
00:09:56,330 --> 00:09:59,900
We might say minimize f of
x subject to the constraint
228
00:09:59,900 --> 00:10:04,400
that c of x is 0 and
h of x is positive.
229
00:10:04,400 --> 00:10:06,200
So minimize it over
the values of x that
230
00:10:06,200 --> 00:10:08,490
satisfy these two constraints.
231
00:10:08,490 --> 00:10:12,060
There's an old approach that's
discussed in the literature.
232
00:10:12,060 --> 00:10:12,920
And it's not used.
233
00:10:12,920 --> 00:10:14,000
I'm going to describe
it to you, and then I
234
00:10:14,000 --> 00:10:16,000
want you to try to figure
out why it's not used.
235
00:10:16,000 --> 00:10:19,210
And it's called
the penalty method.
236
00:10:19,210 --> 00:10:21,250
And the penalty
method works this way.
237
00:10:21,250 --> 00:10:23,990
It says define a new
objective function,
238
00:10:23,990 --> 00:10:28,690
which is our old objective
function plus some penalty
239
00:10:28,690 --> 00:10:30,835
for violating the constraints.
240
00:10:30,835 --> 00:10:31,960
How does that penalty work?
241
00:10:31,960 --> 00:10:35,590
So we know that we want
values of x for which c of x
242
00:10:35,590 --> 00:10:38,200
is equal to 0.
243
00:10:38,200 --> 00:10:41,200
So if we add to our objective
function the norm of c of x--
244
00:10:41,200 --> 00:10:44,720
this is a positive quantity--
245
00:10:44,720 --> 00:10:46,700
this is a positive
quantity-- whenever
246
00:10:46,700 --> 00:10:48,740
x doesn't satisfy
the constraint,
247
00:10:48,740 --> 00:10:51,080
this positive
quantity will give us
248
00:10:51,080 --> 00:10:54,420
a bigger value for this
objective function f
249
00:10:54,420 --> 00:10:56,180
than if c of x was equal to 0.
250
00:10:56,180 --> 00:11:01,820
So we penalize points which
don't satisfy the constraint.
251
00:11:01,820 --> 00:11:04,940
And in the limit that this
penalty factor mu here
252
00:11:04,940 --> 00:11:10,130
goes to zero, the penalties
get large, so large
253
00:11:10,130 --> 00:11:13,280
that our solution
will have to prefer
254
00:11:13,280 --> 00:11:14,962
satisfying the constraints.
255
00:11:14,962 --> 00:11:16,670
There's another penalty
factor over here,
256
00:11:16,670 --> 00:11:19,280
which is identical to this
one but for the inequality
257
00:11:19,280 --> 00:11:20,360
constraint.
258
00:11:20,360 --> 00:11:26,770
It says take a
Heaviside step function
259
00:11:26,770 --> 00:11:31,400
which is equal to 1 when
the value of its argument
260
00:11:31,400 --> 00:11:32,960
is positive, and
it's equal to zero
261
00:11:32,960 --> 00:11:35,650
when the value of its
argument is negative.
262
00:11:35,650 --> 00:11:40,760
So whenever I violate each
of my inequality constraints,
263
00:11:40,760 --> 00:11:44,930
Hi of x, turn on this
Heaviside step function,
264
00:11:44,930 --> 00:11:46,610
make it equal to 1,
and then multiply it
265
00:11:46,610 --> 00:11:50,270
by the value of the constraint
squared, a positive number.
266
00:11:50,270 --> 00:11:52,430
So this is the inequality
constraint penalty,
267
00:11:52,430 --> 00:11:54,470
and this is the equality
constraint penalty.
268
00:11:54,470 --> 00:11:57,480
People don't use this, though.
269
00:11:57,480 --> 00:11:58,470
It makes sense.
270
00:11:58,470 --> 00:12:00,540
I take the limit
that mu goes to zero.
271
00:12:00,540 --> 00:12:03,810
I'm going to have
to prefer solutions
272
00:12:03,810 --> 00:12:06,580
that satisfy these constraints.
273
00:12:06,580 --> 00:12:08,745
Otherwise, if I don't
satisfy these constraints,
274
00:12:08,745 --> 00:12:10,620
I could always move
closer to a solution that
275
00:12:10,620 --> 00:12:12,036
satisfies the
constraint, and I'll
276
00:12:12,036 --> 00:12:14,970
bring down the value of
the objective function.
277
00:12:14,970 --> 00:12:15,857
I'll make it lower.
278
00:12:15,857 --> 00:12:17,940
So I'll always prefer these
lower value solutions.
279
00:12:17,940 --> 00:12:21,180
But can you guys take a second
and sort of talk to each other?
280
00:12:21,180 --> 00:12:25,147
See if you can figure out why
one doesn't use this method.
281
00:12:25,147 --> 00:12:26,355
Why is this method a problem?
282
00:15:08,980 --> 00:15:11,780
OK, I heard the volume go
up at some point, which
283
00:15:11,780 --> 00:15:13,952
means either you
switched topics and felt
284
00:15:13,952 --> 00:15:15,410
more comfortable
talking about that
285
00:15:15,410 --> 00:15:17,360
than this, or maybe
you guys were coming
286
00:15:17,360 --> 00:15:19,970
to some conclusions, or
had some ideas about why
287
00:15:19,970 --> 00:15:21,359
this might be a bad idea.
288
00:15:21,359 --> 00:15:23,900
Do you want to volunteer some
of what you were talking about?
289
00:15:23,900 --> 00:15:24,878
Yeah, Hersh.
290
00:15:24,878 --> 00:15:28,301
AUDIENCE: Could it
be that [INAUDIBLE]??
291
00:15:41,040 --> 00:15:43,340
JAMES SWAN: Well, that's
an interesting idea.
292
00:15:43,340 --> 00:15:45,710
So yeah, if we have a
non-convex optimization problem,
293
00:15:45,710 --> 00:15:48,710
there could be some issues
with f of x, and maybe f
294
00:15:48,710 --> 00:15:50,810
of x runs away so
fast that I can never
295
00:15:50,810 --> 00:15:53,080
make the penalty big enough
to enforce the constraint.
296
00:15:53,080 --> 00:15:54,830
That's actually a
really interesting idea.
297
00:15:54,830 --> 00:15:57,630
And I like the idea of comparing
the magnitude of these two
298
00:15:57,630 --> 00:15:58,130
terms.
299
00:15:58,130 --> 00:15:59,600
I think that's on
the right track.
300
00:15:59,600 --> 00:16:01,560
Were there some
other ideas about why
301
00:16:01,560 --> 00:16:02,624
you might not do this?
302
00:16:02,624 --> 00:16:03,290
Different ideas?
303
00:16:03,290 --> 00:16:04,408
Yeah.
304
00:16:04,408 --> 00:16:05,830
AUDIENCE: [INAUDIBLE].
305
00:16:09,980 --> 00:16:12,480
JAMES SWAN: Well, you know,
that that's an interesting idea,
306
00:16:12,480 --> 00:16:14,630
but actually the two
terms in the parentheses
307
00:16:14,630 --> 00:16:17,150
here are both positive.
308
00:16:17,150 --> 00:16:19,100
So they're only
going to be minimized
309
00:16:19,100 --> 00:16:21,760
when I satisfy the constraints.
310
00:16:21,760 --> 00:16:24,500
So the local minima of
the terms in parentheses
311
00:16:24,500 --> 00:16:30,260
sit on or within the
boundaries of the feasible set
312
00:16:30,260 --> 00:16:31,410
that we're looking at.
313
00:16:31,410 --> 00:16:32,868
So by construction,
actually, we're
314
00:16:32,868 --> 00:16:35,660
going to be able to satisfy
them because the local minima
315
00:16:35,660 --> 00:16:38,420
of these terms sit
on these boundaries.
316
00:16:38,420 --> 00:16:43,230
These terms are minimized by
satisfying the constraints.
317
00:16:43,230 --> 00:16:43,731
Other ideas?
318
00:16:43,731 --> 00:16:44,229
Yeah.
319
00:16:44,229 --> 00:16:46,190
AUDIENCE: Do your iterates
have to be feasible?
320
00:16:46,190 --> 00:16:46,610
JAMES SWAN: What's that?
321
00:16:46,610 --> 00:16:48,380
AUDIENCE: Your iterates
don't have to be feasible?
322
00:16:48,380 --> 00:16:49,963
JAMES SWAN: Ooh,
this is a good point.
323
00:16:49,963 --> 00:16:52,590
The iterates-- this is an
unconstrained optimization
324
00:16:52,590 --> 00:16:53,090
problem.
325
00:16:53,090 --> 00:16:55,600
I'm just going to minimize
this objective function.
326
00:16:55,600 --> 00:16:57,140
It's like what
Hersh said, I can go
327
00:16:57,140 --> 00:16:58,764
anywhere I want in the domain.
328
00:16:58,764 --> 00:17:00,680
I'm going to minimize
this objective function,
329
00:17:00,680 --> 00:17:01,280
and then I'm going
to try to take
330
00:17:01,280 --> 00:17:02,570
the limit as mu goes to zero.
331
00:17:02,570 --> 00:17:04,194
The iterates don't
have to be feasible.
332
00:17:04,194 --> 00:17:06,609
Maybe I can't even evaluate
f of x if the iterates aren't
333
00:17:06,609 --> 00:17:07,310
feasible.
334
00:17:07,310 --> 00:17:08,660
That's an excellent point.
335
00:17:08,660 --> 00:17:10,640
That could be an issue.
336
00:17:10,640 --> 00:17:13,829
Anything else?
337
00:17:13,829 --> 00:17:17,030
Are there some other ideas?
338
00:17:17,030 --> 00:17:17,617
Sure.
339
00:17:17,617 --> 00:17:19,078
AUDIENCE: [INAUDIBLE].
340
00:17:28,050 --> 00:17:30,007
JAMES SWAN: I think
that's a good point.
341
00:17:30,007 --> 00:17:31,468
AUDIENCE: --boundary
from outside
342
00:17:31,468 --> 00:17:33,210
without knowing what's inside.
343
00:17:33,210 --> 00:17:33,960
JAMES SWAN: Sure.
344
00:17:33,960 --> 00:17:36,090
So you'll see, actually,
the right way to do this
345
00:17:36,090 --> 00:17:38,370
is to use what's called
interior point methods, which
346
00:17:38,370 --> 00:17:39,900
live inside the domain.
347
00:17:39,900 --> 00:17:41,100
This is an excellent point.
348
00:17:41,100 --> 00:17:43,890
There's another issue
with this that I think
349
00:17:43,890 --> 00:17:46,391
is actually less subtle
than some of these ideas, which
350
00:17:46,391 --> 00:17:47,640
are all correct, actually.
351
00:17:47,640 --> 00:17:50,220
These can be problems with
this sort of penalty method.
352
00:17:50,220 --> 00:17:53,370
As I take the limit
that mu goes to zero,
353
00:17:53,370 --> 00:17:57,390
the penalty function
becomes large for all points
354
00:17:57,390 --> 00:17:58,320
outside the domain.
355
00:17:58,320 --> 00:18:02,347
They can become larger
than f for those points.
356
00:18:02,347 --> 00:18:03,930
And so there are
some practical issues
357
00:18:03,930 --> 00:18:06,210
about comparing these two
terms against each other.
358
00:18:06,210 --> 00:18:11,010
I may not have sufficient
accuracy, sufficient number
359
00:18:11,010 --> 00:18:14,950
of digits to accurately add
these two terms together.
360
00:18:14,950 --> 00:18:17,661
So I may prefer
to find some point
361
00:18:17,661 --> 00:18:20,160
that lives on the boundary of
the domain as mu goes to zero.
362
00:18:20,160 --> 00:18:21,534
But I can't
guarantee that it was
363
00:18:21,534 --> 00:18:27,600
a minimum of f on that domain,
or within that feasible set.
364
00:18:27,600 --> 00:18:30,360
So a lot of practical
issues that suggest this
365
00:18:30,360 --> 00:18:32,397
is a bad idea.
366
00:18:32,397 --> 00:18:33,230
This is an old idea.
367
00:18:33,230 --> 00:18:34,650
People knew this was
bad for a long time.
368
00:18:34,650 --> 00:18:35,691
It seems natural, though.
369
00:18:35,691 --> 00:18:37,530
It seems like a good
way to transform
370
00:18:37,530 --> 00:18:41,154
from these constrained
optimization problems
371
00:18:41,154 --> 00:18:42,570
to something we
know how to solve,
372
00:18:42,570 --> 00:18:43,992
an unconstrained optimization.
373
00:18:43,992 --> 00:18:46,200
But actually, it turns out
not to be such a great way
374
00:18:46,200 --> 00:18:46,700
to do it.
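The loss of floating-point accuracy mentioned above is easy to demonstrate; the values in this small sketch are assumed for illustration.

```python
# A small sketch of the precision problem with the penalty method: as mu
# goes to zero the penalty term dwarfs f, and in double precision the
# objective's contribution is lost entirely when the two terms are added.
f_at_a = 1.0              # objective value at one infeasible point (assumed)
f_at_b = 2.0              # objective value at another infeasible point
penalty = 1.0 / 1e-17     # shared penalty term, with mu = 1e-17

# Adding f changes nothing: f is below half an ulp of the penalty.
print(f_at_a + penalty == penalty)  # True

# Worse, two points with different f values become indistinguishable,
# so the minimizer has nothing left to compare.
print(f_at_a + penalty == f_at_b + penalty)  # True
```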
375
00:18:50,340 --> 00:18:52,080
So let's talk about
separating out
376
00:18:52,080 --> 00:18:54,160
these two different
methods from each other,
377
00:18:54,160 --> 00:18:55,740
or these two different problems.
378
00:18:55,740 --> 00:18:57,840
Let's talk first about
equality constraints,
379
00:18:57,840 --> 00:19:01,550
and then we'll talk about
inequality constraints.
380
00:19:01,550 --> 00:19:04,040
So equality constrained
optimization problems
381
00:19:04,040 --> 00:19:04,850
look like this.
382
00:19:04,850 --> 00:19:08,280
Minimize f of x subject
to c of x equals zero.
383
00:19:08,280 --> 00:19:09,710
And let's make it even easier.
384
00:19:09,710 --> 00:19:13,910
Rather than having some vector
of equality constraints,
385
00:19:13,910 --> 00:19:15,722
let's just have
a single equation
386
00:19:15,722 --> 00:19:17,930
that we have to satisfy for
that equality constraint,
387
00:19:17,930 --> 00:19:19,550
like the equation for a circle.
388
00:19:19,550 --> 00:19:22,970
Solutions have to sit on the
circumference of a circle.
389
00:19:22,970 --> 00:19:26,170
So one equation that
we have to satisfy.
390
00:19:26,170 --> 00:19:28,640
You might ask again, what
are the necessary conditions
391
00:19:28,640 --> 00:19:31,480
for defining a minimum?
392
00:19:31,480 --> 00:19:33,230
That's what we used
when we had equality--
393
00:19:33,230 --> 00:19:35,270
or when we had
unconstrained optimization.
394
00:19:35,270 --> 00:19:38,420
First we had to define
what a minimum was,
395
00:19:38,420 --> 00:19:40,940
and we found that minima
were critical points, places
396
00:19:40,940 --> 00:19:44,720
where the gradient of the
objective function was zero.
397
00:19:44,720 --> 00:19:46,880
That doesn't have
to be true anymore.
398
00:19:46,880 --> 00:19:52,670
Now, the minimum has to live on
the boundary of some domain.
399
00:19:52,670 --> 00:19:56,240
It has to live in this set
of points c of x equals zero.
400
00:19:56,240 --> 00:19:58,100
And the gradient of
f is not necessarily
401
00:19:58,100 --> 00:20:00,980
zero at that minimal point.
402
00:20:00,980 --> 00:20:04,580
But you might guess that
Taylor expansions are the way
403
00:20:04,580 --> 00:20:14,180
to figure out what the
appropriate conditions
404
00:20:14,180 --> 00:20:15,180
for a minimum are.
405
00:20:15,180 --> 00:20:18,050
So let's take f of x, and
let's expand it, do a Taylor
406
00:20:18,050 --> 00:20:20,630
expansion in some direction, d.
407
00:20:20,630 --> 00:20:23,810
So we'll take a step away
from x, which is small,
408
00:20:23,810 --> 00:20:24,840
in some direction, d.
409
00:20:24,840 --> 00:20:28,550
So f of x plus d is
f of x plus g dot
410
00:20:28,550 --> 00:20:34,580
d, the dot product between
the gradient of f and d.
411
00:20:34,580 --> 00:20:38,570
And at a minimum,
either the gradient
412
00:20:38,570 --> 00:20:43,460
is zero or the gradient is
perpendicular to this direction
413
00:20:43,460 --> 00:20:45,860
we moved in, d.
414
00:20:45,860 --> 00:20:53,380
We know that because this
term is going to increase--
415
00:20:53,380 --> 00:20:55,370
well, will change
the value of f of x.
416
00:20:55,370 --> 00:20:57,060
It will either make
it bigger or smaller
417
00:20:57,060 --> 00:20:59,460
depending on whether it's
positive or negative.
418
00:20:59,460 --> 00:21:01,170
In either case, it
will say that this
419
00:21:01,170 --> 00:21:04,770
point x can't be a minimum
unless this term is exactly
420
00:21:04,770 --> 00:21:07,740
equal to zero in the limit
that d becomes small.
421
00:21:07,740 --> 00:21:09,870
So either the gradient
is zero or the gradient
422
00:21:09,870 --> 00:21:13,310
is orthogonal to this
direction d we stepped in.
423
00:21:13,310 --> 00:21:16,290
And d was arbitrary.
424
00:21:16,290 --> 00:21:18,710
We just said take a
step in a direction, d.
425
00:21:22,350 --> 00:21:24,150
Let's take our
equality constraint
426
00:21:24,150 --> 00:21:28,140
and do the same sort of
Taylor expansion, because we
427
00:21:28,140 --> 00:21:33,050
know if we're searching for
a minimum along this curve
428
00:21:33,050 --> 00:21:35,310
c of x better be equal to zero.
429
00:21:35,310 --> 00:21:36,820
It better satisfy
the constraint.
430
00:21:36,820 --> 00:21:40,440
And also, c of x plus d, that
little step in the direction d,
431
00:21:40,440 --> 00:21:43,140
should also satisfy
the constraint.
432
00:21:43,140 --> 00:21:46,980
We want to study only the
feasible set of values.
433
00:21:46,980 --> 00:21:48,602
So actually, d
wasn't arbitrary. d
434
00:21:48,602 --> 00:21:50,310
had to satisfy this
constraint that, when
435
00:21:50,310 --> 00:21:53,700
I took this little step, c of x
plus d had to be equal to zero.
436
00:21:53,700 --> 00:21:55,770
So again, we'll take
now a Taylor expansion
437
00:21:55,770 --> 00:21:59,310
of c of x plus d, which
is c of x plus grad
438
00:21:59,310 --> 00:22:03,260
of c of x dotted with d.
439
00:22:03,260 --> 00:22:07,730
And that implies that d must be
perpendicular to the gradient
440
00:22:07,730 --> 00:22:10,790
of c of x, because c of
x plus d has to be zero
441
00:22:10,790 --> 00:22:12,200
and c of x has to be zero.
442
00:22:12,200 --> 00:22:16,496
So the gradient of c of x
dotted with d, at leading order,
443
00:22:16,496 --> 00:22:17,870
has also got to
be equal to zero.
444
00:22:17,870 --> 00:22:21,290
So d and the gradient
in c are perpendicular,
445
00:22:21,290 --> 00:22:23,780
and d and the gradient
in g have to be
446
00:22:23,780 --> 00:22:27,230
perpendicular at a minimum.
447
00:22:27,230 --> 00:22:29,900
That's going to define the
minimum on this equality
448
00:22:29,900 --> 00:22:31,960
constrained set.
449
00:22:31,960 --> 00:22:34,830
Does that make sense?
450
00:22:34,830 --> 00:22:37,260
c satisfies the
constraint, c plus d
451
00:22:37,260 --> 00:22:38,820
satisfies the constraint.
452
00:22:38,820 --> 00:22:42,060
If this is true, d has to be
perpendicular to the gradient
453
00:22:42,060 --> 00:22:46,430
of c, and g has to be
perpendicular to d.
454
00:22:46,430 --> 00:22:50,050
d is, in some sense,
arbitrary still.
455
00:22:50,050 --> 00:22:52,080
d has to satisfy
the condition that it's
456
00:22:52,080 --> 00:22:53,800
perpendicular to
the gradient of c,
457
00:22:53,800 --> 00:22:55,949
but who knows,
there could be lots
458
00:22:55,949 --> 00:22:57,990
of vectors that are
perpendicular to the gradient
459
00:22:57,990 --> 00:22:59,880
of c.
460
00:22:59,880 --> 00:23:02,100
So the only generic
relationship between these two
461
00:23:02,100 --> 00:23:06,720
we can formulate is g must be
parallel to the gradient of c.
462
00:23:06,720 --> 00:23:08,970
g is perpendicular
to d, gradient
463
00:23:08,970 --> 00:23:10,550
of c is perpendicular to d.
464
00:23:10,550 --> 00:23:12,985
In the most generic
way, g and gradient of c
465
00:23:12,985 --> 00:23:14,360
should be parallel
to each other,
466
00:23:14,360 --> 00:23:16,290
because d I can
select arbitrarily
467
00:23:16,290 --> 00:23:21,400
from all the vectors of
the same dimension as x.
468
00:23:24,060 --> 00:23:25,710
If g is parallel to
the gradient of c,
469
00:23:25,710 --> 00:23:31,080
then I can write that g
minus some scalar multiplied
470
00:23:31,080 --> 00:23:33,257
by the gradient of
c is equal to zero.
471
00:23:33,257 --> 00:23:35,340
That's an equivalent
statement, that g is parallel
472
00:23:35,340 --> 00:23:37,230
to the gradient of c.
473
00:23:37,230 --> 00:23:41,490
So that's a condition
associated with points
474
00:23:41,490 --> 00:23:45,660
x that solve this equality
constrained problem.
475
00:23:45,660 --> 00:23:47,790
The other condition
is that point x still
476
00:23:47,790 --> 00:23:52,170
has to satisfy the
equality constraint.
477
00:23:52,170 --> 00:23:55,560
But I introduced a new
unknown, this lambda,
478
00:23:55,560 --> 00:23:58,170
which is called the
Lagrange multiplier.
479
00:23:58,170 --> 00:24:02,730
So now I have one extra unknown,
but I have one extra equation.
480
00:24:06,094 --> 00:24:08,010
Let me give you a graphical
depiction of this,
481
00:24:08,010 --> 00:24:11,510
and then I'll write down
the formal equations again.
482
00:24:11,510 --> 00:24:14,660
So let's suppose
we want to minimize
483
00:24:14,660 --> 00:24:17,750
this parabolic function
subject to the constraint
484
00:24:17,750 --> 00:24:20,540
that the solution
lives on the line.
485
00:24:20,540 --> 00:24:22,520
So here's the contours
of the function,
486
00:24:22,520 --> 00:24:24,590
and the solution has
to live on this line.
487
00:24:28,380 --> 00:24:30,050
So I get to stand
on this line, and I
488
00:24:30,050 --> 00:24:34,010
get to walk and walk and walk
until I can't walk downhill
489
00:24:34,010 --> 00:24:36,650
anymore, and I've got to
turn and walk uphill again.
490
00:24:36,650 --> 00:24:40,870
And you can see the point where
I can't walk downhill anymore
491
00:24:40,870 --> 00:24:45,170
is the place where this
constraint is parallel
492
00:24:45,170 --> 00:24:49,700
to the contour, or
where the gradient
493
00:24:49,700 --> 00:24:52,520
of the objective
function is parallel
494
00:24:52,520 --> 00:24:55,712
to the gradient
of the constraint.
495
00:24:55,712 --> 00:24:57,170
So you can actually
find this point
496
00:24:57,170 --> 00:25:00,140
by imagining yourself
moving along this landscape.
497
00:25:00,140 --> 00:25:02,960
After I get to this point,
I start going uphill again.
498
00:25:05,530 --> 00:25:09,700
So that's the method of
Lagrange multipliers.
499
00:25:09,700 --> 00:25:11,800
Minimize f of x subject
to this constraint.
500
00:25:11,800 --> 00:25:15,380
The solution is
given by the point x
501
00:25:15,380 --> 00:25:20,510
at which the gradient is
parallel to the gradient of c,
502
00:25:20,510 --> 00:25:22,910
and at which c is equal to zero.
503
00:25:22,910 --> 00:25:25,970
And you solve this system
of nonlinear equations
504
00:25:25,970 --> 00:25:27,900
for two unknowns.
505
00:25:27,900 --> 00:25:31,790
One is x, and the other
is this unknown lambda.
506
00:25:31,790 --> 00:25:35,540
How far stretched
is the gradient
507
00:25:35,540 --> 00:25:39,192
in f relative to
the gradient in c?
508
00:25:39,192 --> 00:25:41,150
So again, we've turned
the minimization problem
509
00:25:41,150 --> 00:25:43,760
into a system of
nonlinear equations.
510
00:25:43,760 --> 00:25:46,100
In order to satisfy the
equality constraint,
511
00:25:46,100 --> 00:25:47,980
we've had to introduce
another unknown,
512
00:25:47,980 --> 00:25:50,150
the Lagrange multiplier.
513
00:25:50,150 --> 00:25:54,980
It turns out this solution
set, x and lambda,
514
00:25:54,980 --> 00:25:58,940
is a critical point of
something called the Lagrangian.
515
00:25:58,940 --> 00:26:03,950
It's a function f of x
minus lambda times c.
516
00:26:03,950 --> 00:26:08,840
It's a critical point in x
and lambda of this nonlinear
517
00:26:08,840 --> 00:26:10,820
function called the Lagrangian.
518
00:26:10,820 --> 00:26:13,010
It's not a minimum of this
function, unfortunately.
519
00:26:13,010 --> 00:26:17,240
It's a saddle point of the
Lagrangian, it turns out.
520
00:26:17,240 --> 00:26:21,671
So we're trying to find a
saddle point of the Lagrangian.
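To make the saddle-point claim concrete, here is a small pure-Python check on a hypothetical one-dimensional problem (not one from the lecture slides): minimize f(x) = x^2 subject to c(x) = x - 1 = 0, with Lagrangian L(x, lambda) = x^2 - lambda(x - 1).

```python
# Critical point of L(x, lam) = x^2 - lam*(x - 1):
#   dL/dx   = 2x - lam = 0
#   dL/dlam = -(x - 1) = 0
# so x = 1, lam = 2.
def L(x, lam):
    return x * x - lam * (x - 1)

x0, lam0, h = 1.0, 2.0, 1e-3

# Perturbing x alone increases L -- looks like a minimum in that direction...
assert L(x0 + h, lam0) > L(x0, lam0)
assert L(x0 - h, lam0) > L(x0, lam0)

# ...but along the direction (dx, dlam) = (1, 3), L = 1 - 2t^2 decreases on
# both sides, so (1, 2) is a saddle point of the Lagrangian, not a minimum.
assert L(x0 + h, lam0 + 3 * h) < L(x0, lam0)
assert L(x0 - h, lam0 - 3 * h) < L(x0, lam0)
```

The constrained minimum of f still sits at x = 1; it is only as a critical point of the Lagrangian in the joint (x, lambda) space that it becomes a saddle.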
521
00:26:21,671 --> 00:26:23,990
Does this make sense?
522
00:26:23,990 --> 00:26:24,747
Yes?
523
00:26:24,747 --> 00:26:25,902
OK.
524
00:26:25,902 --> 00:26:27,360
We've got to be
careful, of course.
525
00:26:27,360 --> 00:26:29,630
Just like with
unconstrained optimization,
526
00:26:29,630 --> 00:26:33,530
we actually have to check that
our solution is a minimum.
527
00:26:33,530 --> 00:26:36,200
We can't take for
granted, we can't
528
00:26:36,200 --> 00:26:39,920
suppose that our nonlinear
solver found a minimum
529
00:26:39,920 --> 00:26:41,322
when it solved this equation.
530
00:26:41,322 --> 00:26:43,530
Other critical points can
satisfy this equation, too.
531
00:26:43,530 --> 00:26:47,169
So we've got to go back and try
to check robustly whether it's
532
00:26:47,169 --> 00:26:47,960
actually a minimum.
533
00:26:47,960 --> 00:26:48,918
But this is the method.
534
00:26:48,918 --> 00:26:53,060
Introduce an additional unknown,
the Lagrange multiplier,
535
00:26:53,060 --> 00:26:54,809
because you can
show geometrically
536
00:26:54,809 --> 00:26:56,600
that the gradient of
the objective function
537
00:26:56,600 --> 00:27:00,140
should be parallel to the
gradient of the constraint
538
00:27:00,140 --> 00:27:01,670
at the minimum.
539
00:27:01,670 --> 00:27:03,541
Does that make sense?
540
00:27:03,541 --> 00:27:05,060
Does this picture make sense?
541
00:27:05,060 --> 00:27:06,020
OK.
542
00:27:06,020 --> 00:27:08,353
So you know how to solve
systems of nonlinear equations,
543
00:27:08,353 --> 00:27:10,460
you know how to solve
constrained optimization
544
00:27:10,460 --> 00:27:10,959
problems.
545
00:27:15,300 --> 00:27:17,060
So here's f.
546
00:27:17,060 --> 00:27:19,870
Here's c.
547
00:27:19,870 --> 00:27:23,170
We can actually write out
what these equations are.
548
00:27:23,170 --> 00:27:25,750
So you can show that the
gradient of f minus lambda
549
00:27:25,750 --> 00:27:31,660
gradient of c, that's a vector,
2x1 minus lambda and 20x2
550
00:27:31,660 --> 00:27:33,070
plus lambda.
551
00:27:33,070 --> 00:27:34,960
And c is the equation
for this line
552
00:27:34,960 --> 00:27:37,480
down here, so x1
minus x2 minus 3.
553
00:27:37,480 --> 00:27:39,850
And that's all got
to be equal to zero.
554
00:27:39,850 --> 00:27:42,410
In this case, this is just a
system of linear equations.
555
00:27:42,410 --> 00:27:44,080
So you can actually
solve directly
556
00:27:44,080 --> 00:27:47,629
for x1, x2, and lambda.
557
00:27:47,629 --> 00:27:50,170
And it's not too difficult to
find the solution for all three
558
00:27:50,170 --> 00:27:52,480
of these things by hand.
559
00:27:52,480 --> 00:27:55,750
But in general, these
constraints can be nonlinear.
560
00:27:55,750 --> 00:27:58,570
The objective function
doesn't have to be quadratic.
561
00:27:58,570 --> 00:28:00,340
Those are the easiest
cases to look at.
562
00:28:00,340 --> 00:28:02,810
And the same
methodology applies.
563
00:28:02,810 --> 00:28:06,265
And so you should check
that you're able to do this.
564
00:28:06,265 --> 00:28:08,700
This is the simplest possible
equality constraint problem.
565
00:28:08,700 --> 00:28:09,720
You could do it by hand.
566
00:28:09,720 --> 00:28:11,230
You should check that you're
actually able to do it,
567
00:28:11,230 --> 00:28:13,840
that you understand the steps
that go into writing out
568
00:28:13,840 --> 00:28:14,911
these equations.
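Assuming the objective on this slide is f(x) = x1^2 + 10*x2^2 (consistent with the stated gradient components 2*x1 and 20*x2) and the line is c(x) = x1 - x2 - 3 = 0, the three linear Lagrange equations can be solved by direct substitution:

```python
# Lagrange conditions for f(x) = x1^2 + 10*x2^2 subject to x1 - x2 - 3 = 0
# (objective assumed from the gradient stated in the lecture):
#   2*x1  - lam = 0
#   20*x2 + lam = 0
#   x1 - x2 - 3 = 0
# Substituting lam = 2*x1 and x2 = -lam/20 = -x1/10 into the constraint
# gives x1 + x1/10 = 3, so x1 = 30/11.
x1 = 30 / 11
x2 = -x1 / 10
lam = 2 * x1

# Verify all three equations hold.
assert abs(2 * x1 - lam) < 1e-12
assert abs(20 * x2 + lam) < 1e-12
assert abs(x1 - x2 - 3) < 1e-12
print(x1, x2, lam)  # approximately 2.727, -0.273, 5.455
```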
569
00:28:17,500 --> 00:28:19,290
Let's just take one
step forward and look
570
00:28:19,290 --> 00:28:21,810
at a more general
case, one in which
571
00:28:21,810 --> 00:28:28,220
we have a vector-valued
function that gives the equality
572
00:28:28,220 --> 00:28:29,240
constraints instead.
573
00:28:29,240 --> 00:28:31,220
So rather than one equation
we have to satisfy,
574
00:28:31,220 --> 00:28:31,985
there may be many.
575
00:28:34,610 --> 00:28:39,110
It's possible that the
feasible set doesn't
576
00:28:39,110 --> 00:28:41,240
have any solutions in it.
577
00:28:41,240 --> 00:28:42,830
It's possible that
there is no x that
578
00:28:42,830 --> 00:28:46,560
satisfies all of these
constraints simultaneously.
579
00:28:46,560 --> 00:28:49,490
That's a bad problem to have.
580
00:28:49,490 --> 00:28:52,100
You wouldn't like to have
that problem very much.
581
00:28:52,100 --> 00:28:53,850
But it's possible
that that's the case.
582
00:28:53,850 --> 00:28:57,600
But let's assume that there are
solutions for the time being.
583
00:28:57,600 --> 00:29:00,081
So there are x's that satisfy
the equality constraint.
584
00:29:00,081 --> 00:29:01,580
Let's see if we can
figure out again
585
00:29:01,580 --> 00:29:04,640
what the necessary conditions
for defining a minimum are.
586
00:29:04,640 --> 00:29:07,100
So same as before,
let's Taylor expand
587
00:29:07,100 --> 00:29:10,340
f of x going in
some direction, d.
588
00:29:10,340 --> 00:29:12,060
And let's make d
a nice small step
589
00:29:12,060 --> 00:29:14,840
so we can just treat
the f of x plus d
590
00:29:14,840 --> 00:29:16,820
as a linearized function.
591
00:29:16,820 --> 00:29:19,220
So we can see again
that g has to be
592
00:29:19,220 --> 00:29:21,770
perpendicular to this
direction, d, if we're
593
00:29:21,770 --> 00:29:22,790
going to have a minimum.
594
00:29:22,790 --> 00:29:24,665
Otherwise, I could step
in some direction, d,
595
00:29:24,665 --> 00:29:27,920
and I'll find either a
smaller value of f of x plus d
596
00:29:27,920 --> 00:29:30,560
or a bigger value
of f of x plus d.
597
00:29:30,560 --> 00:29:33,626
So g has to be
perpendicular to d.
598
00:29:33,626 --> 00:29:35,000
And for the equality
constraints,
599
00:29:35,000 --> 00:29:39,901
again, they all have to satisfy
this equality constraint
600
00:29:39,901 --> 00:29:40,400
up there.
601
00:29:40,400 --> 00:29:43,970
So c of x has to be equal
to zero, and c of x plus d
602
00:29:43,970 --> 00:29:45,310
also has to be equal to zero.
603
00:29:47,950 --> 00:29:51,710
And so if we take
a Taylor expansion
604
00:29:51,710 --> 00:29:55,670
of c of x plus d,
about x, you'll
605
00:29:55,670 --> 00:29:58,730
get c of x plus d
plus the Jacobian
606
00:29:58,730 --> 00:30:02,680
of c, all the partial
derivatives of c with respect
607
00:30:02,680 --> 00:30:04,970
to x, multiplied by d.
608
00:30:07,960 --> 00:30:11,710
We know that c of x plus d
is zero, and c of x is zero,
609
00:30:11,710 --> 00:30:16,830
so the directions, d, belong
to what set of vectors?
610
00:30:16,830 --> 00:30:18,240
The null space.
611
00:30:18,240 --> 00:30:21,300
So these directions have
to live in the null space
612
00:30:21,300 --> 00:30:25,620
of the Jacobian of c.
613
00:30:25,620 --> 00:30:27,090
So I can't step
in any direction,
614
00:30:27,090 --> 00:30:29,100
I have to step in
directions that
615
00:30:29,100 --> 00:30:33,464
are in the null space of c.
616
00:30:33,464 --> 00:30:36,520
g is perpendicular
to d, as well.
617
00:30:36,520 --> 00:30:40,180
And d belongs to
the null space of c.
618
00:30:40,180 --> 00:30:45,940
In fact, you know that d
is perpendicular to each
619
00:30:45,940 --> 00:30:47,860
of the rows of the Jacobian.
620
00:30:47,860 --> 00:30:49,290
Right?
621
00:30:49,290 --> 00:30:51,120
You know that?
622
00:30:51,120 --> 00:30:53,800
I just do the matrix
vector product, right?
623
00:30:53,800 --> 00:30:56,190
And so each element of
this matrix vector product
624
00:30:56,190 --> 00:31:01,920
is the dot product of d with a
different row of the Jacobian.
625
00:31:01,920 --> 00:31:05,880
So those rows are
a set of vectors.
626
00:31:05,880 --> 00:31:12,660
Those rows describe the
range of J transpose,
627
00:31:12,660 --> 00:31:15,960
or the row space of J. Remember
we talked about the four
628
00:31:15,960 --> 00:31:17,460
fundamental
subspaces, and I said
629
00:31:17,460 --> 00:31:19,001
we almost never use
those other ones,
630
00:31:19,001 --> 00:31:20,910
but this is one
time when we will.
631
00:31:20,910 --> 00:31:25,670
So those rows belong to
the range of J transpose,
632
00:31:25,670 --> 00:31:29,760
or equivalently, to the
row space of J.
633
00:31:29,760 --> 00:31:33,420
I need to find a g,
a gradient, which
634
00:31:33,420 --> 00:31:34,980
is always perpendicular to d.
635
00:31:34,980 --> 00:31:39,690
And I know d is always
perpendicular to the rows of J.
636
00:31:39,690 --> 00:31:44,315
So I can write g as a linear
superposition of the rows of J.
637
00:31:44,315 --> 00:31:46,440
As long as g is a linear
superposition of the rows,
638
00:31:46,440 --> 00:31:50,320
it'll always be
perpendicular to d.
639
00:31:50,320 --> 00:31:53,080
Vectors from the null
space of a matrix
640
00:31:53,080 --> 00:31:57,640
are orthogonal to vectors from
the row space of that matrix,
641
00:31:57,640 --> 00:31:59,326
it turns out.
642
00:31:59,326 --> 00:32:00,950
And they're orthogonal
for this reason.
643
00:32:06,350 --> 00:32:11,750
So it tells us, if
Jd is zero, then
644
00:32:11,750 --> 00:32:13,300
d belongs to the null space.
645
00:32:13,300 --> 00:32:15,280
g is perpendicular to d.
646
00:32:15,280 --> 00:32:18,350
That means I could write g
as a linear superposition
647
00:32:18,350 --> 00:32:24,530
of the rows of J. So g belongs
to the range of J transpose,
648
00:32:24,530 --> 00:32:27,290
or it belongs to
the row space of J.
649
00:32:27,290 --> 00:32:29,230
Those are equivalent statements.
650
00:32:29,230 --> 00:32:30,980
And therefore, I should
be able to write g
651
00:32:30,980 --> 00:32:33,782
as a linear superposition
of the rows of J.
652
00:32:33,782 --> 00:32:35,240
And one way to say
that is I should
653
00:32:35,240 --> 00:32:37,160
be able to write
g as J transpose
654
00:32:37,160 --> 00:32:41,990
times some other vector lambda.
655
00:32:41,990 --> 00:32:43,790
That's an equivalent
way of saying
656
00:32:43,790 --> 00:32:48,004
that g is a linear
superposition of the rows of J.
657
00:32:48,004 --> 00:32:49,420
I don't know the
values of lambda.
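A quick pure-Python sanity check of this orthogonality argument, using a made-up Jacobian (not one from the lecture): any d in the null space of J is perpendicular to every row of J, hence to any g of the form J transpose times lambda.

```python
# Assumed 2x3 Jacobian: two equality constraints on a three-dimensional x.
J = [[1, 1, 0],
     [0, 1, 1]]
d = [1, -1, 1]  # a null-space direction: J d = 0

# d is perpendicular to each row of J.
for row in J:
    assert sum(r * di for r, di in zip(row, d)) == 0

# Any g = J^T lam is a superposition of the rows of J (the row space),
# so it is automatically perpendicular to d.
lam = [3.0, -2.0]  # arbitrary Lagrange multipliers
g = [sum(J[i][j] * lam[i] for i in range(2)) for j in range(3)]
assert sum(gi * di for gi, di in zip(g, d)) == 0
print(g)  # [3.0, 1.0, -2.0]
```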
658
00:32:53,700 --> 00:32:55,590
So I introduced a
new set of unknowns,
659
00:32:55,590 --> 00:32:59,142
a set of Lagrange multipliers.
660
00:32:59,142 --> 00:33:00,600
My minimum is going
to be found when
661
00:33:00,600 --> 00:33:04,530
I satisfy this equation,
just like before,
662
00:33:04,530 --> 00:33:08,280
and when I'm able to satisfy
all of the equality constraints.
663
00:33:14,160 --> 00:33:19,960
How many Lagrange
multipliers do I have here?
664
00:33:19,960 --> 00:33:20,960
Can you figure that out?
665
00:33:20,960 --> 00:33:23,324
You can talk with your
neighbors if you want.
666
00:33:23,324 --> 00:33:24,240
Take a couple minutes.
667
00:33:24,240 --> 00:33:26,600
Tell me how many Lagrange
multipliers, how many elements
668
00:33:26,600 --> 00:33:27,710
are in this vector lambda.
669
00:34:19,380 --> 00:34:22,934
How many elements are in lambda?
670
00:34:22,934 --> 00:34:23,600
Can you tell me?
671
00:34:26,383 --> 00:34:26,883
Sam.
672
00:34:26,883 --> 00:34:29,239
AUDIENCE: Same as the number
of equality constraints.
673
00:34:29,239 --> 00:34:30,530
JAMES SWAN: Yes.
674
00:34:30,530 --> 00:34:33,469
It's the same as the number
of equality constraints.
675
00:34:33,469 --> 00:34:38,219
J came from the gradient of c.
676
00:34:38,219 --> 00:34:41,370
It's the Jacobian of c.
677
00:34:41,370 --> 00:34:46,290
So it has a number of columns
equal to the number of elements
678
00:34:46,290 --> 00:34:48,889
in x, because I'm taking
partial derivatives with respect
679
00:34:48,889 --> 00:34:51,300
to each element of
x, and has a number
680
00:34:51,300 --> 00:34:54,960
of rows equal to the
number of elements of c.
681
00:34:54,960 --> 00:34:58,770
So J transpose, I just
transpose those dimensions.
682
00:34:58,770 --> 00:35:02,700
And lambda must have the
same number of elements
683
00:35:02,700 --> 00:35:06,000
as c does in order to make
this product make sense.
684
00:35:06,000 --> 00:35:08,250
So I introduce a new
number of unknowns.
685
00:35:08,250 --> 00:35:12,240
It's equal to exactly the
number of equality constraints
686
00:35:12,240 --> 00:35:14,610
that I had, which
is good, because I'm
687
00:35:14,610 --> 00:35:17,220
going to make a system
of equations that
688
00:35:17,220 --> 00:35:20,790
says g of x minus J
transpose lambda equals 0
689
00:35:20,790 --> 00:35:23,460
and c of x equals 0.
690
00:35:23,460 --> 00:35:26,970
And the number of equations
here is the number
691
00:35:26,970 --> 00:35:29,290
of elements in x
for this gradient,
692
00:35:29,290 --> 00:35:31,874
and the number of
elements in c for c.
693
00:35:31,874 --> 00:35:33,540
And the number of
unknowns is the number
694
00:35:33,540 --> 00:35:36,902
of elements in x, and the number
of elements in c associated
695
00:35:36,902 --> 00:35:38,110
with the Lagrange multiplier.
696
00:35:38,110 --> 00:35:40,800
So I have enough equations
and unknowns to determine
697
00:35:40,800 --> 00:35:43,700
all of these things.
698
00:35:43,700 --> 00:35:47,440
So whether I have one equality
constraint or a million
699
00:35:47,440 --> 00:35:50,530
equality constraints,
the problem is identical.
700
00:35:50,530 --> 00:35:52,660
We use the method of
Lagrange multipliers.
701
00:35:52,660 --> 00:35:55,780
We have to solve an
augmented system of equations
702
00:35:55,780 --> 00:36:02,800
for x and this projection on the
row space of J, which tells us
703
00:36:02,800 --> 00:36:05,710
how the gradient is
stretched or made up,
704
00:36:05,710 --> 00:36:08,800
composed of elements
of the row space of J.
705
00:36:08,800 --> 00:36:10,300
These are the
conditions associated
706
00:36:10,300 --> 00:36:13,690
with a minimum in our
objective function
707
00:36:13,690 --> 00:36:17,566
on this boundary dictated
by the equality constraint.
708
00:36:17,566 --> 00:36:19,690
And of course, the solution
set is a critical point
709
00:36:19,690 --> 00:36:25,800
of a Lagrangian, which is
f of x minus c dot lambda.
710
00:36:25,800 --> 00:36:28,840
And it's not a minimum of
it, it's a critical point.
711
00:36:28,840 --> 00:36:33,309
It's a saddle point, it turns
out, of this Lagrangian.
712
00:36:33,309 --> 00:36:35,350
So we've got to check,
did we find a saddle point
713
00:36:35,350 --> 00:36:39,974
or not when we find a solution
to this equation here.
714
00:36:39,974 --> 00:36:41,890
But it's just a system
of nonlinear equations.
715
00:36:41,890 --> 00:36:44,860
If we have some good initial
guess, what do we apply?
716
00:36:44,860 --> 00:36:48,670
Newton-Raphson, and converge
right towards the solution.
717
00:36:48,670 --> 00:36:51,790
If we don't have a
good initial guess,
718
00:36:51,790 --> 00:36:55,270
we've discussed lots of methods
we could employ, like homotopy
719
00:36:55,270 --> 00:36:57,340
or continuation
to try to develop
720
00:36:57,340 --> 00:37:00,709
good initial guesses for
what the solution should be.
721
00:37:00,709 --> 00:37:02,601
Are there any
questions about this?
722
00:37:05,450 --> 00:37:07,930
Good.
723
00:37:07,930 --> 00:37:08,430
OK.
724
00:37:08,430 --> 00:37:14,370
So you go to Matlab
and you call fmincon,
725
00:37:14,370 --> 00:37:17,120
do a minimization problem, and
you give it some constraints.
726
00:37:17,120 --> 00:37:18,870
Linear constraints,
nonlinear constraints,
727
00:37:18,870 --> 00:37:20,502
it doesn't matter actually.
728
00:37:20,502 --> 00:37:22,210
The problem is the
same for both of them.
729
00:37:22,210 --> 00:37:25,440
It's just a little bit easier
if I have linear constraints.
730
00:37:25,440 --> 00:37:28,650
If this constraining function
is a linear function, then
731
00:37:28,650 --> 00:37:30,360
the Jacobian I know.
732
00:37:30,360 --> 00:37:34,594
It's the coefficient matrix
of this linear problem.
733
00:37:34,594 --> 00:37:36,760
Now I only have to solve
linear equations down here.
734
00:37:36,760 --> 00:37:39,730
So the problem is a little
bit simpler to solve.
735
00:37:39,730 --> 00:37:42,360
So Matlab sort of
breaks these apart
736
00:37:42,360 --> 00:37:45,420
so it can use different
techniques depending on which
737
00:37:45,420 --> 00:37:46,530
sort of problem is posed.
738
00:37:46,530 --> 00:37:48,030
But the solution
method is the same.
739
00:37:48,030 --> 00:37:49,950
It does the method of
Lagrange multipliers
740
00:37:49,950 --> 00:37:51,394
to find the solution.
741
00:37:51,394 --> 00:37:51,894
OK?
742
00:37:54,850 --> 00:37:58,270
Inequality constraints.
743
00:37:58,270 --> 00:38:02,620
So interior point
methods were mentioned.
744
00:38:02,620 --> 00:38:04,750
And it turns out this
is really the best
745
00:38:04,750 --> 00:38:08,740
way to go about solving
generic inequality constrained
746
00:38:08,740 --> 00:38:09,790
problems.
747
00:38:09,790 --> 00:38:11,470
So the problems of
the sort minimize
748
00:38:11,470 --> 00:38:15,340
f of x subject to
h of x is positive,
749
00:38:15,340 --> 00:38:17,830
or at least not negative.
750
00:38:17,830 --> 00:38:20,110
This is some
nonlinear inequality
751
00:38:20,110 --> 00:38:23,350
that describes some domain
and its boundary in which
752
00:38:23,350 --> 00:38:25,120
the solution has to live.
753
00:38:25,120 --> 00:38:29,530
And what's done is to rewrite it
as an unconstrained optimization
754
00:38:29,530 --> 00:38:34,480
problem with a barrier
that's incorporated.
755
00:38:34,480 --> 00:38:36,310
This looks a lot like
the penalty method,
756
00:38:36,310 --> 00:38:37,750
but it's very different.
757
00:38:37,750 --> 00:38:39,610
And I'll explain how.
758
00:38:39,610 --> 00:38:43,630
So instead, we want to
minimize this f of x minus mu
759
00:38:43,630 --> 00:38:49,090
times the sum of log of h,
each of these constraints.
760
00:38:51,840 --> 00:38:58,110
If h is negative, we'd be taking
the log of a negative argument.
761
00:38:58,110 --> 00:39:00,430
That's a problem
computationally.
762
00:39:00,430 --> 00:39:02,670
So the best we could do
is approach the boundary
763
00:39:02,670 --> 00:39:05,400
where h is equal to zero.
764
00:39:05,400 --> 00:39:08,320
And as h goes to zero, the
log goes to minus infinity.
765
00:39:08,320 --> 00:39:11,490
So this term tends to blow
up because I've got a minus
766
00:39:11,490 --> 00:39:12,780
sign in front of it.
767
00:39:12,780 --> 00:39:16,920
So this is sort
of like a penalty,
768
00:39:16,920 --> 00:39:19,320
but it's a little different
because the factor in front
769
00:39:19,320 --> 00:39:23,670
I'm actually going to take
the limit as mu goes to zero.
770
00:39:23,670 --> 00:39:26,340
I'm going to take the limit
as this factor gets small,
771
00:39:26,340 --> 00:39:28,650
rather than gets big.
772
00:39:28,650 --> 00:39:31,170
The log will always
get big as I approach
773
00:39:31,170 --> 00:39:33,070
the boundary of the domain.
774
00:39:33,070 --> 00:39:35,061
It'll blow up.
775
00:39:35,061 --> 00:39:36,060
So that's not a problem.
776
00:39:36,060 --> 00:39:39,180
But I can take the limit that
mu gets smaller and smaller.
777
00:39:39,180 --> 00:39:42,690
And this quantity here
will have less and less
778
00:39:42,690 --> 00:39:46,920
of an impact on the shape of
this new objective function
779
00:39:46,920 --> 00:39:48,600
as mu gets smaller and smaller.
780
00:39:48,600 --> 00:39:51,250
The impact will only be
nearest the boundary.
781
00:39:51,250 --> 00:39:53,100
Does that make sense?
782
00:39:53,100 --> 00:39:55,356
So you take the limit
that mu approaches zero.
783
00:39:55,356 --> 00:39:57,480
It's got to approach it
from the positive side, not
784
00:39:57,480 --> 00:40:01,386
the negative side, so
everything behaves well.
785
00:40:01,386 --> 00:40:05,575
And this is called an
interior point method.
786
00:40:05,575 --> 00:40:07,950
So we have to determine the
minimum of this new objective
787
00:40:07,950 --> 00:40:10,740
function for progressively
weaker barriers.
788
00:40:10,740 --> 00:40:12,570
So we might start
with some value of mu,
789
00:40:12,570 --> 00:40:15,780
and we might reduce
mu progressively
790
00:40:15,780 --> 00:40:17,490
until we get mu
down small enough
791
00:40:17,490 --> 00:40:19,509
that we think we've
converged to a solution.
792
00:40:19,509 --> 00:40:20,800
So how do you do that reliably?
793
00:40:24,340 --> 00:40:29,140
What's the procedure one uses
to solve a problem successively
794
00:40:29,140 --> 00:40:30,445
for different parameter values?
795
00:40:32,914 --> 00:40:33,830
AUDIENCE: [INAUDIBLE].
796
00:40:33,830 --> 00:40:35,538
JAMES SWAN: Yeah, it's
a homotopy, right?
797
00:40:35,538 --> 00:40:38,120
You're just going to change
the value of this barrier
798
00:40:38,120 --> 00:40:39,364
parameter.
799
00:40:39,364 --> 00:40:40,780
And you're going
to find a minimum.
800
00:40:40,780 --> 00:40:42,650
And if you make a small change
in the barrier parameter,
801
00:40:42,650 --> 00:40:44,774
that's going to serve as
an excellent initial guess
802
00:40:44,774 --> 00:40:46,220
for the next value.
803
00:40:46,220 --> 00:40:48,770
And so you're just going
to take these small steps.
804
00:40:48,770 --> 00:40:51,200
And the optimization
routine is going
805
00:40:51,200 --> 00:40:53,180
to carry you towards
the minimum in the limit
806
00:40:53,180 --> 00:40:54,370
that mu goes to zero.
807
00:40:54,370 --> 00:40:55,610
So you do this with homotopy.
808
00:40:58,340 --> 00:41:01,700
Here's an example of this
sort of interior point
809
00:41:01,700 --> 00:41:03,200
method, a trivial example.
810
00:41:03,200 --> 00:41:06,180
Minimize x subject
to x being positive.
811
00:41:06,180 --> 00:41:09,695
So we know the solution
lives where x equals zero.
812
00:41:09,695 --> 00:41:13,040
But let's write this as
unconstrained optimization
813
00:41:13,040 --> 00:41:13,760
using a barrier.
814
00:41:13,760 --> 00:41:18,850
So minimize x minus
mu times log x.
815
00:41:18,850 --> 00:41:21,140
Here's x minus mu times log x.
816
00:41:21,140 --> 00:41:26,070
So out here, where x is
big, x wins over log x,
817
00:41:26,070 --> 00:41:28,120
so everything starts
to look linear.
818
00:41:28,120 --> 00:41:30,470
But as x becomes
smaller and smaller,
819
00:41:30,470 --> 00:41:34,010
log x gets very negative, so
minus log x gets very positive.
820
00:41:34,010 --> 00:41:36,210
And here's the log
creeping back up.
821
00:41:36,210 --> 00:41:38,206
And as I decrease mu
smaller and smaller,
822
00:41:38,206 --> 00:41:39,830
you can see the minimum
of this function
823
00:41:39,830 --> 00:41:44,840
is moving closer and
closer and closer to zero.
824
00:41:44,840 --> 00:41:47,240
So if I take the limit
that mu decreases
825
00:41:47,240 --> 00:41:49,520
from some positive
number towards zero,
826
00:41:49,520 --> 00:41:52,410
eventually this minimum
is going to converge
827
00:41:52,410 --> 00:41:55,340
to the minimum of the
inequality-
828
00:41:55,340 --> 00:41:57,580
constrained
optimization problem.
829
00:41:57,580 --> 00:41:58,900
Make sense?
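The lecture's toy problem can be sketched in a few lines of pure Python. For minimizing x subject to x >= 0, the barrier objective is phi(x) = x - mu*log(x), and phi'(x) = 1 - mu/x changes sign at x = mu, so simple bisection (chosen here for robustness; any 1-D minimizer would do) finds the barrier minimizer:

```python
def barrier_min(mu, lo=1e-12, hi=10.0, iters=200):
    # Bisection on phi'(x) = 1 - mu/x over the interior (lo, hi).
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if 1 - mu / mid < 0:   # phi still decreasing: minimizer lies to the right
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for mu in [1.0, 0.1, 0.01, 0.001]:
    x = barrier_min(mu)
    print(mu, x)  # the minimizer tracks x* = mu, marching toward 0
```

The printed minimizers sit at x = mu for each barrier strength, which is exactly the "minimum moving closer and closer to zero" visible in the plotted curves.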
830
00:41:58,900 --> 00:42:01,350
OK.
831
00:42:01,350 --> 00:42:01,850
OK.
832
00:42:01,850 --> 00:42:04,740
So we want to do this.
833
00:42:04,740 --> 00:42:07,130
You can use any barrier
function you want.
834
00:42:07,130 --> 00:42:09,650
Any thoughts on why a
logarithmic barrier is used?
835
00:42:18,710 --> 00:42:19,210
No.
836
00:42:19,210 --> 00:42:21,100
OK, that's OK.
837
00:42:21,100 --> 00:42:23,910
So minus log is
going to be convex.
838
00:42:23,910 --> 00:42:26,720
Log isn't convex, but minus
log is going to be convex.
839
00:42:26,720 --> 00:42:27,795
So that's good.
840
00:42:27,795 --> 00:42:29,920
If this function's convex,
then their combination's
841
00:42:29,920 --> 00:42:32,020
going to be convex,
and we'll be OK.
842
00:42:32,020 --> 00:42:34,300
But the gradient of the
log is easy to compute.
843
00:42:34,300 --> 00:42:37,690
Grad log h is 1 over h grad h.
844
00:42:37,690 --> 00:42:40,180
So if I know h, I know
grad h, it's easy for me
845
00:42:40,180 --> 00:42:42,130
to compute the
gradient of log h.
846
00:42:42,130 --> 00:42:46,000
We know we're going to solve
this unconstrained optimization
847
00:42:46,000 --> 00:42:48,690
problem where we need to take
grad of this objective function
848
00:42:48,690 --> 00:42:49,210
equal zero.
849
00:42:49,210 --> 00:42:50,987
So the calculations are easy.
850
00:42:50,987 --> 00:42:52,320
The log makes it easy like that.
851
00:42:52,320 --> 00:42:55,870
The log is also like
the most weakly singular
852
00:42:55,870 --> 00:42:57,520
function available to us.
853
00:42:57,520 --> 00:43:00,070
Out of the whole toolbox of
functions we can reach for,
854
00:43:00,070 --> 00:43:02,835
the log has the mildest
sort of singularities.
855
00:43:02,835 --> 00:43:04,960
Singularities at both ends,
which is sort of funny,
856
00:43:04,960 --> 00:43:06,520
but the mildest sort
of singularities
857
00:43:06,520 --> 00:43:08,782
you have to cope with.
858
00:43:08,782 --> 00:43:10,990
So we want to find the
minimum of these unconstrained
859
00:43:10,990 --> 00:43:15,310
optimization problems where the
gradient of f minus mu sum 1
860
00:43:15,310 --> 00:43:18,400
over h, grad h,
is equal to zero.
861
00:43:18,400 --> 00:43:20,700
And we just do that for
progressively smaller values
862
00:43:20,700 --> 00:43:23,410
of mu, and we'll
converge to a solution.
863
00:43:23,410 --> 00:43:25,720
That's the interior
point method.
864
00:43:25,720 --> 00:43:31,360
You use homotopy to study a
sequence of barrier parameters,
865
00:43:31,360 --> 00:43:36,760
or continuation to study a
sequence of barrier parameters.
866
00:43:36,760 --> 00:43:41,580
You stop the homotopy or
continuation when what?
867
00:43:41,580 --> 00:43:42,806
How are you going to stop?
868
00:43:47,000 --> 00:43:49,234
I've got to make
mu small, right?
869
00:43:49,234 --> 00:43:51,150
I want to go towards the
limit mu equals zero.
870
00:43:51,150 --> 00:43:52,860
I can't actually get
to mu equals zero,
871
00:43:52,860 --> 00:43:54,450
I've just got to approach it.
872
00:43:54,450 --> 00:43:57,930
So how small do I need
to make mu before I quit?
873
00:43:57,930 --> 00:43:59,160
It's an interesting question.
874
00:43:59,160 --> 00:43:59,910
What do you think?
875
00:44:02,274 --> 00:44:03,440
I'll take this answer first.
876
00:44:03,440 --> 00:44:06,512
AUDIENCE: So it doesn't
affect the limitation.
877
00:44:06,512 --> 00:44:07,220
JAMES SWAN: Good.
878
00:44:07,220 --> 00:44:09,027
So we might look at
the solution and see
879
00:44:09,027 --> 00:44:10,610
is the solution
becoming less and less
880
00:44:10,610 --> 00:44:12,430
sensitive to the choice of mu.
881
00:44:12,430 --> 00:44:14,252
Did you have another suggestion?
882
00:44:14,252 --> 00:44:15,668
AUDIENCE: [INAUDIBLE].
883
00:44:17,930 --> 00:44:19,180
JAMES SWAN: Set the tolerance.
884
00:44:19,180 --> 00:44:20,766
Right, OK.
885
00:44:20,766 --> 00:44:23,484
AUDIENCE: [INAUDIBLE].
886
00:44:23,484 --> 00:44:24,150
JAMES SWAN: Mhm.
887
00:44:24,150 --> 00:44:25,050
Right, right, right, right.
888
00:44:25,050 --> 00:44:25,550
So you--
889
00:44:25,550 --> 00:44:28,842
AUDIENCE: [INAUDIBLE].
890
00:44:28,842 --> 00:44:29,550
JAMES SWAN: Good.
891
00:44:29,550 --> 00:44:31,210
So there were two
suggestions here.
892
00:44:31,210 --> 00:44:34,100
One is along the lines
of a step-norm criterion,
893
00:44:34,100 --> 00:44:36,520
like I check my
solution as I change mu,
894
00:44:36,520 --> 00:44:39,700
and I ask when does
my solution seem
895
00:44:39,700 --> 00:44:42,700
relatively insensitive to mu.
896
00:44:42,700 --> 00:44:45,370
When the changes in these
steps relative to mu
897
00:44:45,370 --> 00:44:48,010
get sufficiently
small, I might be
898
00:44:48,010 --> 00:44:49,810
willing to accept
these solutions
899
00:44:49,810 --> 00:44:53,230
as reasonable solutions for
the constrained optimization.
900
00:44:53,230 --> 00:44:55,180
I can also go back
and I can check
901
00:44:55,180 --> 00:44:58,240
sort of function-norm criterion.
902
00:44:58,240 --> 00:45:00,650
I can take the value of
x I found as the minimum,
903
00:45:00,650 --> 00:45:02,920
and I can ask how
good a job does
904
00:45:02,920 --> 00:45:08,140
it do satisfying the
original equations.
905
00:45:08,140 --> 00:45:11,180
How far away am I from
satisfying the inequality
906
00:45:11,180 --> 00:45:11,680
constraint?
907
00:45:11,680 --> 00:45:14,740
How close am I to actually
minimizing the function
908
00:45:14,740 --> 00:45:15,625
within that domain?
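The two stopping tests just described can be sketched in code. This is an illustration, not the lecture's own code: the function names and tolerances are invented for the sketch, and the second test uses the standard log-barrier suboptimality bound (the gap between the barrier solution and the true constrained optimum is at most mu times the number of inequality constraints).

```python
import numpy as np

def step_norm_converged(x_prev, x_new, tol=1e-6):
    # Step-norm test: stop the continuation once the solution
    # barely moves as mu is reduced further.
    return np.linalg.norm(np.asarray(x_new) - np.asarray(x_prev)) < tol

def barrier_gap_small(mu, m, tol=1e-6):
    # Function-value test: for a log barrier with m inequality
    # constraints, mu * m bounds the distance of the current
    # objective value from the true constrained minimum.
    return mu * m < tol
```

Either test (or both) can be checked after each Newton solve in the continuation loop.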
909
00:45:20,245 --> 00:45:21,344
OK.
910
00:45:21,344 --> 00:45:22,760
So we're running
out of time here.
911
00:45:22,760 --> 00:45:26,300
Let me provide you
with an example.
912
00:45:26,300 --> 00:45:27,350
So let's minimize again--
913
00:45:27,350 --> 00:45:29,808
I always pick this function
because it's easy to visualize,
914
00:45:29,808 --> 00:45:32,770
a nice parabolic function
that opens upwards.
915
00:45:32,770 --> 00:45:36,340
And let's minimize it
subject to the constraint
916
00:45:36,340 --> 00:45:42,620
that h of x1 and x2
is equal to 1 minus--
917
00:45:42,620 --> 00:45:45,730
well, the equation for a circle
of radius 1, essentially.
918
00:45:45,730 --> 00:45:49,240
The interior of that circle.
919
00:45:49,240 --> 00:45:51,450
So here's the contours
of the function,
920
00:45:51,450 --> 00:45:53,055
and this red domain
is the constraint.
921
00:45:53,055 --> 00:45:54,840
And we want to know
the smallest value
922
00:45:54,840 --> 00:45:58,440
of f that lives in this domain.
923
00:45:58,440 --> 00:45:59,440
So here's a Matlab code.
924
00:45:59,440 --> 00:46:00,800
You can try it out.
925
00:46:00,800 --> 00:46:03,960
And make a function, the
objective function, f,
926
00:46:03,960 --> 00:46:06,650
it's x squared plus 10x--
927
00:46:06,650 --> 00:46:08,900
x1 squared plus 10x2 squared.
928
00:46:08,900 --> 00:46:10,240
Here's the gradient.
929
00:46:10,240 --> 00:46:13,210
Here's the Hessian.
930
00:46:13,210 --> 00:46:15,700
Here, I calculate h.
931
00:46:15,700 --> 00:46:17,080
Here's the gradient of h.
932
00:46:17,080 --> 00:46:19,810
Here's the Hessian of h.
933
00:46:19,810 --> 00:46:22,810
I've got to define a new
objective function, phi,
934
00:46:22,810 --> 00:46:26,470
which is f minus mu log h.
935
00:46:26,470 --> 00:46:29,320
This is the gradient of phi
and this is the Hessian of phi.
936
00:46:29,320 --> 00:46:30,760
Oh, man, what a mess.
937
00:46:30,760 --> 00:46:33,310
But actually, not such
a mess, because the log
938
00:46:33,310 --> 00:46:35,785
makes it really easy to
take these derivatives.
939
00:46:35,785 --> 00:46:40,810
So it's just a lot of
differential sort of calculus
940
00:46:40,810 --> 00:46:43,960
involved in working this out,
but this is the Hessian of phi.
941
00:46:43,960 --> 00:46:46,270
And then I need
some initial guess.
942
00:46:46,270 --> 00:46:48,940
So I pick the
center of my circle
943
00:46:48,940 --> 00:46:50,737
as an initial guess
for the solution.
944
00:46:50,737 --> 00:46:52,570
And I'm going to loop
over values of mu that
945
00:46:52,570 --> 00:46:53,707
get progressively smaller.
946
00:46:53,707 --> 00:46:55,290
I'll just go down
to 10 to the minus 2
947
00:46:55,290 --> 00:46:57,719
and stop for illustration
purposes here.
948
00:46:57,719 --> 00:47:00,010
But really, we should be
checking the solution as we go
949
00:47:00,010 --> 00:47:04,360
and deciding what values
we want to stop with.
950
00:47:04,360 --> 00:47:06,800
And then this loop
here, what's this do?
951
00:47:09,880 --> 00:47:12,222
What's it do?
952
00:47:12,222 --> 00:47:15,030
Can you tell?
953
00:47:15,030 --> 00:47:16,440
AUDIENCE: Is it Newton?
954
00:47:16,440 --> 00:47:16,770
JAMES SWAN: What's that?
955
00:47:16,770 --> 00:47:17,596
AUDIENCE: Newton?
956
00:47:17,596 --> 00:47:19,470
JAMES SWAN: Yeah, it's
Newton-Raphson, right?
957
00:47:19,470 --> 00:47:25,050
x is x minus Hessian inverse
times grad phi, right?
958
00:47:25,050 --> 00:47:26,462
So I just do Newton-Raphson.
959
00:47:26,462 --> 00:47:28,170
I take my initial
guess and I loop around
960
00:47:28,170 --> 00:47:30,630
with Newton-Raphson, and
when this loop finishes,
961
00:47:30,630 --> 00:47:32,910
I reduce mu, and it'll
just use my previous guess
962
00:47:32,910 --> 00:47:35,580
as the initial guess for
the next value of the loop,
963
00:47:35,580 --> 00:47:38,187
until mu is sufficiently small.
964
00:47:38,187 --> 00:47:39,680
OK?
965
00:47:39,680 --> 00:47:41,310
Interior point method.
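The whole loop just described can be reproduced in a short script. This is a sketch, not the lecture's MATLAB file: Python stands in for MATLAB, and since the slide isn't visible here, the circle's center (1.5, 1.5) is an assumed value, chosen so that, as in the lecture's figure, the unconstrained minimum of f falls outside the circle and the constrained minimum lands on its boundary.

```python
import numpy as np

# Interior point (log-barrier) method for
#   minimize   f(x) = x1^2 + 10*x2^2
#   subject to h(x) = 1 - (x1-c1)^2 - (x2-c2)^2 >= 0
# The center c = (1.5, 1.5) is an assumed value (see note above).
c = np.array([1.5, 1.5])

def f_grad(x):
    return np.array([2.0*x[0], 20.0*x[1]])

def f_hess(x):
    return np.diag([2.0, 20.0])

def h(x):
    d = x - c
    return 1.0 - d @ d

def h_grad(x):
    return -2.0*(x - c)

def h_hess(x):
    return -2.0*np.eye(2)

x = c.copy()                              # initial guess: center of the circle
for mu in 10.0**np.arange(0, -3, -1):     # continuation: mu = 1, 0.1, 0.01
    for _ in range(50):                   # Newton-Raphson on grad(phi) = 0
        hv, gh = h(x), h_grad(x)
        # phi = f - mu*log(h); its gradient and Hessian by the chain rule
        grad = f_grad(x) - mu*gh/hv
        hess = f_hess(x) - mu*h_hess(x)/hv + mu*np.outer(gh, gh)/hv**2
        step = np.linalg.solve(hess, -grad)
        # damp the step if it would leave the feasible region
        while h(x + step) <= 0.0:
            step *= 0.5
        x = x + step
        if np.linalg.norm(step) < 1e-10:
            break

print(x)      # the constrained minimizer, just inside the circle
print(h(x))   # small and positive: the iterate hugs the boundary
```

As mu shrinks from 1 down to 10^-2, each converged x serves as the warm start for the next, and the iterates work their way toward the boundary of the circle, tracing the solution path shown on the slide.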
966
00:47:41,310 --> 00:47:43,370
Here's what that
solution path looks like.
967
00:47:43,370 --> 00:47:46,520
So mu started at 1, and
the barrier was here.
968
00:47:46,520 --> 00:47:49,370
It was close to the edge of the
circle, but not quite on it.
969
00:47:49,370 --> 00:47:51,200
But as I reduced mu
further and further
970
00:47:51,200 --> 00:47:53,030
and further, you
can see the path,
971
00:47:53,030 --> 00:47:54,530
the solution path,
that was followed
972
00:47:54,530 --> 00:47:56,930
works its way closer to
the boundary of the circle.
973
00:47:56,930 --> 00:47:59,027
And the minimum is
found right here.
974
00:47:59,027 --> 00:48:00,860
So it turns out the
minimum of this function
975
00:48:00,860 --> 00:48:02,720
doesn't live in the
domain, it lives
976
00:48:02,720 --> 00:48:04,960
on the boundary of the domain.
977
00:48:04,960 --> 00:48:08,830
Recall that this point
should be a point where
978
00:48:08,830 --> 00:48:11,290
the boundary of the
domain is parallel
979
00:48:11,290 --> 00:48:15,699
to the contours of the function,
since actually we didn't need
980
00:48:15,699 --> 00:48:16,990
the inequality constraint here.
981
00:48:16,990 --> 00:48:18,630
We could have used the
equality constraint.
982
00:48:18,630 --> 00:48:20,840
The equality constrained
problem has the same solution
983
00:48:20,840 --> 00:48:22,423
as the inequality
constrained problem.
984
00:48:22,423 --> 00:48:23,800
And look, that
actually happened.
985
00:48:23,800 --> 00:48:25,677
Here's the contours
of the function.
986
00:48:25,677 --> 00:48:27,760
The contour of the function
runs right along here,
987
00:48:27,760 --> 00:48:29,200
and you can see
it looks like it's
988
00:48:29,200 --> 00:48:31,360
going to be tangent to
the circle at this point.
989
00:48:31,360 --> 00:48:34,450
So the interior point
method actually solved
990
00:48:34,450 --> 00:48:37,450
an equality constrained problem
in addition to an inequality
991
00:48:37,450 --> 00:48:40,589
constrained problem, which is--
that's sort of cool that you
992
00:48:40,589 --> 00:48:41,380
can do it that way.
993
00:48:44,010 --> 00:48:46,620
How about if I want to do
a combination of equality
994
00:48:46,620 --> 00:48:48,135
and inequality constraints?
995
00:48:48,135 --> 00:48:50,010
Then what do I do?
996
00:48:57,285 --> 00:48:58,255
Yeah.
997
00:48:58,255 --> 00:49:02,150
AUDIENCE: [INAUDIBLE].
998
00:49:02,150 --> 00:49:03,020
JAMES SWAN: Perfect.
999
00:49:03,020 --> 00:49:07,280
Convert the equality
constraint into unknowns,
1000
00:49:07,280 --> 00:49:09,909
Lagrange multipliers, instead.
1001
00:49:09,909 --> 00:49:11,450
And then do the
interior point method
1002
00:49:11,450 --> 00:49:13,449
on the Lagrange
multiplier problem.
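In notation, the combined problem just described, minimizing f subject to g(x) = 0 and h_i(x) >= 0, becomes a barrier-plus-multiplier system:

```latex
\mathcal{L}_{\mu}(x,\lambda)
  = f(x) - \lambda^{T} g(x) - \mu \sum_{i} \ln h_{i}(x),
\qquad
\nabla_{x}\mathcal{L}_{\mu} = 0, \quad g(x) = 0,
```

solved for a decreasing sequence of mu, exactly as in the purely inequality-constrained case.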
1003
00:49:13,449 --> 00:49:15,740
Now you've got a combination
of equality and inequality
1004
00:49:15,740 --> 00:49:16,460
constraints.
1005
00:49:16,460 --> 00:49:18,885
This is exactly
what Matlab does.
1006
00:49:18,885 --> 00:49:20,840
So it converts
equality constraints
1007
00:49:20,840 --> 00:49:22,610
into Lagrange multipliers.
1008
00:49:22,610 --> 00:49:24,500
Inequality constraints
it actually solves
1009
00:49:24,500 --> 00:49:26,630
using interior point methods.
1010
00:49:26,630 --> 00:49:28,790
Buried in that
interior point method
1011
00:49:28,790 --> 00:49:32,450
is some form of Newton-Raphson
and steepest descent combined
1012
00:49:32,450 --> 00:49:34,580
together, like the dogleg
method we talked about
1013
00:49:34,580 --> 00:49:36,500
for unconstrained problems.
1014
00:49:36,500 --> 00:49:38,120
And it's going to
do a continuation.
1015
00:49:38,120 --> 00:49:40,470
As it reduces the
values of mu, it'll
1016
00:49:40,470 --> 00:49:43,000
have some heuristic
for how it does that.
1017
00:49:43,000 --> 00:49:46,010
It's going to use its previous
solutions as initial guesses
1018
00:49:46,010 --> 00:49:47,910
for the next iteration.
1019
00:49:47,910 --> 00:49:49,700
So these are very
complicated problems,
1020
00:49:49,700 --> 00:49:52,310
but if you understand how to
solve systems of nonlinear
1021
00:49:52,310 --> 00:49:55,010
equations, and you
think carefully
1022
00:49:55,010 --> 00:49:57,350
about how to control numerical
error in your algorithm,
1023
00:49:57,350 --> 00:49:58,808
you come to a
conclusion like this,
1024
00:49:58,808 --> 00:50:04,100
that you can do these sorts of
Lagrange multiplier interior
1025
00:50:04,100 --> 00:50:06,710
point methods to solve a
wide variety of problems
1026
00:50:06,710 --> 00:50:09,820
with reasonable reliability.
1027
00:50:09,820 --> 00:50:10,625
OK?
1028
00:50:10,625 --> 00:50:12,156
Any more questions?
1029
00:50:15,030 --> 00:50:16,140
No?
1030
00:50:16,140 --> 00:50:17,200
Good.
1031
00:50:17,200 --> 00:50:20,830
Well, thank you, and
we'll see you on Friday.