The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JAMES SWAN: OK. Well, everyone's quieted down, so that means we have to get started. So let me say something here. This will be our last conversation about optimization. So we've discussed unconstrained optimization. And now we're going to discuss a slightly more complicated problem -- but you're going to see it's really not that much more complicated -- constrained optimization.

These are the things we discussed before. I don't want to spend much time recapping because I want to take a minute and talk about the midterm exam. So we have a quiz. It's next Wednesday. Here's where it's going to be located. Here's 66. Head down Ames. You're looking for Walker Memorial on the third floor. Unfortunately, the time for the quiz is 7:00 to 9:00 PM. We really did try hard to get the scheduling office to give us something better, but the only way to get a room that would fit everybody in was to do it at this time in Walker. I really don't understand, because I actually requested locations for the quizzes back in April. And somehow I was too early, maybe, and got buried under a pile. Maybe not important enough, I don't know. But it's got to be from seven to nine next Wednesday. Third floor.

There's not going to be any class next Wednesday because you have a quiz instead. So you get a little extra time to relax or study, prepare, calm yourself before you go into the exam. There's no homework this week. So you can just use this time to focus on the material we've discussed. There's a practice exam from last year posted on the Stellar site, which you can utilize and study from.
I'll tell you this. That practice exam is skewed a little more towards some chemical engineering problems that motivate the numerics. I've found in the past that when problems like that are given on the exam, sometimes there's a lot of reading that goes into understanding the engineering problem. And that tends to set back the problem-solving. So I'll tell you that the quiz that you'll take on Wednesday will have less of the engineering associated with it, and focus more on the numerical or computational science. The underlying sorts of questions, the way the questions are asked, the kinds of responses you're expected to give I'd say are very similar. But we've tried to tune the exam so that it'll be less of a burden to understand the structure of the problem before describing how you'd solve it. So I think that's good.

It's comprehensive up to today. So linear algebra, systems of nonlinear equations, and optimization are the quiz topics. We're going to switch on Friday to ordinary differential equations and initial value problems. So you have two lectures on that, but you won't have done any homework. You probably don't know enough or aren't practiced enough to answer any questions intelligently on the quiz. So don't expect that material to be on there. It's not. It's going to be these three topics.

Are there any questions about this that I can answer? Kristin has a question.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: OK. So yeah, come prepared. It might be cold. It might be hot. It leaks when it rains a little bit. Yeah, it's not the greatest spot. So come prepared. That's true. Other questions? Things you want to know?

AUDIENCE: What can we take to the exam?

JAMES SWAN: Ooh, good question. So you can bring the book recommended for the course. You can bring your notes. You can bring a calculator. You need to bring some pencils.
We'll provide blue books for you to write your solutions to the exam in. So those are the materials. Good. What else? Same question. OK. Other questions? No? OK.

So then let's jump into the topic of the day, which is constrained optimization. So these are problems of the sort: minimize an objective function f of x subject to the constraint that x belongs to some set D, or find the argument x that minimizes this function. These are equivalent sorts of problems. Sometimes we want to know one or the other or both. That's not a problem.

And graphically, it looks like this. Here's f, our objective function. It's a nice convex bowl-shaped function here. And we want to know the values of x1 and x2, let's say, that minimize this function subject to some constraint. That constraint could be that x1 and x2 live inside this little blue circle. That could be D. It could be that x1 and x2 live on the surface of this circle, right, on the circumference of this circle. That could be the constraint.

So these are the sorts of problems we want to solve. D is called the feasible set, and can be described in terms of really two types of constraints. One is what we call equality constraints. So D can be the set of values x such that some nonlinear function c of x is equal to zero. So it's the set of points that satisfy this nonlinear equation. And among those points, we want to know which one produces the minimum in the objective function. Or it could be an inequality constraint. So D could be the set of points such that some nonlinear function h of x is, by convention, positive. So h of x could represent, for example, the interior of a circle, and c of x could represent the circumference of a circle. And we would have nonlinear equations that reflect those values of x that satisfy those sorts of geometries.
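To make the circle example concrete (the radius r here is just an illustrative choice, not something specified in the lecture), the two kinds of feasible sets can be written as

\[
D_{\text{eq}} = \{\, x : c(x) = x_1^2 + x_2^2 - r^2 = 0 \,\} \ \ \text{(the circumference)}, \qquad
D_{\text{ineq}} = \{\, x : h(x) = r^2 - x_1^2 - x_2^2 \ge 0 \,\} \ \ \text{(the disk it encloses)}.
\]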
So equality constrained: points that lie on this circle. Inequality constrained: points that lie within this circle. The shape of the feasible set is constrained by the problem that you're actually interested in. So it's easy for me to draw circles in the plane because that's a shape you're familiar with. But actually, it'll come from some sort of physical constraint on the engineering problem you're looking at, like mole fractions need to be bigger than zero and smaller than one, and temperatures in absolute value have to be bigger than zero and smaller than some value because that's a safety factor on the process. So these set up the constraints on various sorts of optimization problems that we're interested in. It could also be true that we're interested in, say, optimization in the domain outside of this circle, too. It could be on the inside, could be on the outside. That's also an inequality constrained sort of problem.

You know some of these already. They're familiar to you. So here's a classic one from mechanics. Here's the total energy in a system for, say, a pendulum. So x is like the position of the tip of this pendulum and v is the velocity that it moves with. This is the kinetic energy. This is the potential energy. And we know the pendulum will come to rest in a place where the energy is minimized. Well, the energy can only be minimized when the velocity here is zero, because any non-zero velocity will always push the energy content up. So it comes to rest. It doesn't move. And then there's some value of x at which the energy is minimized. If there is no constraint that says that the pendulum is attached to some central axis, then I can always make the energy smaller by making x more and more negative. It just keeps falling. There is no stopping point. But there's a constraint. The distance between the tip of the pendulum and this central point is some fixed distance out.
So this is an equality constrained sort of problem, and we have to choose from the set of v and x the values, subject to this constraint, that minimize the total energy. And that's this configuration of the pendulum here. So you know these sorts of problems already.

We talked about this one, linear sorts of programs. These are optimization problems where the objective function is linear in the design variables. So it's just the dot product between x and some vector c that weights the different design options against each other. So we talked about ice cream. Yes, this is all premium ice cream because it comes in the small containers, subject to different constraints. So those constraints can be things like, oh, x has to be positive because we can't make negative amounts of ice cream. And maybe we've done market research that tells us that the market can only tolerate certain ratios of different types of ice cream. And that may be some set of linear equations that describe that market research, that sort of bound the upper values of how much ice cream we can put out on the market. And then we try to choose the optimal blend of pina colada and strawberry to sell. So those are linear programs. This is an inequality constrained optimization.

In general, we might write these problems like this. We might say minimize f of x subject to the constraint that c of x is 0 and h of x is positive. So minimize it over the values of x that satisfy these two constraints.

There's an old approach that's discussed in the literature. And it's not used. I'm going to describe it to you, and then I want you to try to figure out why it's not used. And it's called the penalty method. And the penalty method works this way. It says define a new objective function, which is our old objective function plus some penalty for violating the constraints. How does that penalty work? So we know that we want values of x for which c of x is equal to 0.
So if we add to our objective function the norm of c of x -- this is a positive quantity whenever x doesn't satisfy the constraint -- this positive quantity will give us a bigger value for this new objective function than if c of x were equal to 0. So we penalize points which don't satisfy the constraint. And in the limit that this penalty factor mu here goes to zero, the penalties get large, so large that our solution will have to prefer satisfying the constraints.

There's another penalty factor over here, which is identical to this one but for the inequality constraint. It says take a Heaviside step function, which is equal to 1 when the value of its argument is positive, and is equal to zero when the value of its argument is negative. So whenever I violate each of my inequality constraints, h_i of x, turn on this Heaviside step function, make it equal to 1, and then multiply it by the value of the constraint squared, a positive number. So this is the inequality constraint penalty, and this is the equality constraint penalty.

People don't use this, though. It makes sense: I take the limit that mu goes to zero. I'm going to have to prefer solutions that satisfy these constraints. Otherwise, if I don't satisfy these constraints, I could always move closer to a solution that satisfies the constraint, and I'll bring down the value of the objective function. I'll make it lower. So I'll always prefer these lower value solutions. But can you guys take a second and sort of talk to each other? See if you can figure out why one doesn't use this method. Why is this method a problem?

OK, I heard the volume go up at some point, which means either you switched topics and felt more comfortable talking about that than this, or maybe you guys were coming to some conclusions, or had some ideas about why this might be a bad idea. Do you want to volunteer some of what you were talking about? Yeah, Hersh.
AUDIENCE: Could it be that [INAUDIBLE]?

JAMES SWAN: Well, that's an interesting idea. So yeah, if we have a non-convex optimization problem, there could be some issues with f of x, and maybe f of x runs away so fast that I can never make the penalty big enough to enforce the constraint. That's actually a really interesting idea. And I like the idea of comparing the magnitude of these two terms. I think that's on the right track. Were there some other ideas about why you might not do this? Different ideas? Yeah.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Well, you know, that's an interesting idea, but actually the two terms in the parentheses here are both positive. So they're only going to be minimized when I satisfy the constraints. So the local minima of the terms in parentheses sit on or within the boundaries of the feasible set that we're looking at. So by construction, actually, we're going to be able to satisfy them, because the local minima of these terms sit on those boundaries. These terms are minimized by satisfying the constraints. Other ideas? Yeah.

AUDIENCE: Do your iterates have to be feasible?

JAMES SWAN: What's that?

AUDIENCE: Your iterates don't have to be feasible?

JAMES SWAN: Ooh, this is a good point. The iterates -- this is an unconstrained optimization problem. I'm just going to minimize this objective function. It's like what Hersh said, I can go anywhere I want in the domain. I'm going to minimize this objective function, and then I'm going to try to take the limit as mu goes to zero. The iterates don't have to be feasible. Maybe I can't even evaluate f of x if the iterates aren't feasible. That's an excellent point. That could be an issue. Anything else? Are there some other ideas? Sure.

AUDIENCE: [INAUDIBLE]

JAMES SWAN: I think that's a good point.
AUDIENCE: --boundary from outside without knowing what's inside.

JAMES SWAN: Sure. So you'll see, actually, the right way to do this is to use what are called interior point methods, which live inside the domain. This is an excellent point. There's another issue with this that I think is actually less subtle than some of these ideas, which are all correct, actually. These can be problems with this sort of penalty method. As I take the limit that mu goes to zero, the penalty function becomes large for all points outside the domain. They can become larger than f for those points. And so there are some practical issues about comparing these two terms against each other. I may not have sufficient accuracy, a sufficient number of digits, to accurately add these two terms together. So I may prefer to find some point that lives on the boundary of the domain as mu goes to zero. But I can't guarantee that it was a minimum of f on that domain, or within that feasible set.

So a lot of practical issues suggest this is a bad idea. This is an old idea. People knew this was bad for a long time. It seems natural, though. It seems like a good way to transform from these constrained optimization problems to something we know how to solve, an unconstrained optimization. But actually, it turns out not to be such a great way to do it.

So let's talk about separating out these two different methods from each other, or these two different problems. Let's talk first about equality constraints, and then we'll talk about inequality constraints. So equality constrained optimization problems look like this: minimize f of x subject to c of x equals zero. And let's make it even easier. Rather than having some vector of equality constraints, let's just have a single equation that we have to satisfy for that equality constraint, like the equation for a circle. Solutions have to sit on the circumference of a circle.
So one equation that we have to satisfy. You might ask again, what are the necessary conditions for defining a minimum? That's what we used when we had equality -- or when we had unconstrained optimization. First we had to define what a minimum was, and we found that minima were critical points, places where the gradient of the objective function was zero. That doesn't have to be true anymore. Now the minimum has to live on this boundary of some domain. It has to live in this set of points c of x equals zero. And the gradient of f is not necessarily zero at that minimal point.

But you might guess that Taylor expansions are the way to figure out what the appropriate conditions for a minimum are. So let's take f of x, and let's expand it, do a Taylor expansion in some direction, d. So we'll take a step away from x, which is small, in some direction, d. So f of x plus d is f of x plus g dot d, the dot product between the gradient of f and d. And at a minimum, either the gradient is zero or the gradient is perpendicular to this direction we moved in, d. We know that because this term is going to increase -- well, will change the value of f of x. It will either make it bigger or smaller depending on whether it's positive or negative. In either case, it will say that this point x can't be a minimum unless this term is exactly equal to zero in the limit that d becomes small. So either the gradient is zero or the gradient is orthogonal to this direction d we stepped in. And d was arbitrary. We just said take a step in a direction, d.

Let's take our equality constraint and do the same sort of Taylor expansion, because we know if we're searching for a minimum along this curve, c of x better be equal to zero. It better satisfy the constraint. And also, c of x plus d, that little step in the direction d, should also satisfy the constraint. We want to study only the feasible set of values. So actually, d wasn't arbitrary.
d had to satisfy this constraint that, when I took this little step, c of x plus d had to be equal to zero. So again, we'll take now a Taylor expansion of c of x plus d, which is c of x plus grad of c of x dotted with d. And that implies that d must be perpendicular to the gradient of c of x, because c of x plus d has to be zero and c of x has to be zero. So the gradient of c of x dot d -- at leading order -- has also got to be equal to zero. So d and the gradient of c are perpendicular, and d and the gradient g have to be perpendicular at a minimum. That's going to define the minimum on this equality constrained set.

Does that make sense? c of x satisfies the constraint, c of x plus d satisfies the constraint. If this is true, d has to be perpendicular to the gradient of c, and g has to be perpendicular to d. d is, in some sense, arbitrary still. d has to satisfy the condition that it's perpendicular to the gradient of c, but who knows, there could be lots of vectors that are perpendicular to the gradient of c.

So the only generic relationship between these two we can formulate is that g must be parallel to the gradient of c. g is perpendicular to d, the gradient of c is perpendicular to d. In the most generic way, g and the gradient of c should be parallel to each other, because d I can select arbitrarily from all the vectors of the same dimension as x.

If g is parallel to the gradient of c, then I can write that g minus some scalar multiplied by the gradient of c is equal to zero. That's an equivalent statement, that g is parallel to the gradient of c. So that's a condition associated with points x that solve this equality constrained problem. The other condition is that the point x still has to satisfy the equality constraint. But I introduced a new unknown, this lambda, which is called the Lagrange multiplier. So now I have one extra unknown, but I have one extra equation.
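Written out, the two conditions just derived for a single equality constraint are (using g for the gradient of f, as in the lecture)

\[
\nabla f(x) - \lambda\,\nabla c(x) = 0, \qquad c(x) = 0,
\]

a system of n + 1 equations in the n components of x plus the one Lagrange multiplier lambda.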
Let me give you a graphical depiction of this, and then I'll write down the formal equations again. So let's suppose we want to minimize this parabolic function subject to the constraint that the solution lives on the line. So here are the contours of the function, and the solution has to live on this line. So I get to stand on this line, and I get to walk and walk and walk until I can't walk downhill anymore, and I've got to turn and walk uphill again. And you can see the point where I can't walk downhill anymore is the place where this constraint is parallel to the contour, or where the gradient of the objective function is parallel to the gradient of the constraint. So you can actually find this point by imagining yourself moving along this landscape. After I get to this point, I start going uphill again.

So that's the method of Lagrange multipliers. Minimize f of x subject to this constraint. The solution is given by the point x at which the gradient is parallel to the gradient of c, and at which c is equal to zero. And you solve this system of nonlinear equations for two unknowns. One is x, and the other is this unknown lambda: how far stretched is the gradient of f relative to the gradient of c?

So again, we've turned the minimization problem into a system of nonlinear equations. In order to satisfy the equality constraint, we've had to introduce another unknown, the Lagrange multiplier. It turns out this solution set, x and lambda, is a critical point of something called the Lagrangian. It's a function: f of x minus lambda times c. It's a critical point in x and lambda of this nonlinear function called the Lagrangian. It's not a minimum of this function, unfortunately. It's a saddle point of the Lagrangian, it turns out. So we're trying to find a saddle point of the Lagrangian. Does this make sense? Yes? OK. We've got to be careful, of course.
Just like with unconstrained optimization, we actually have to check that our solution is a minimum. We can't take for granted, we can't suppose, that our nonlinear solver found a minimum when it solved this equation. Other critical points can satisfy this equation, too. So we've got to go back and try to check robustly whether it's actually a minimum. But this is the method: introduce an additional unknown, the Lagrange multiplier, because you can show geometrically that the gradient of the objective function should be parallel to the gradient of the constraint at the minimum. Does that make sense? Does this picture make sense? OK. So you know how to solve systems of nonlinear equations; you know how to solve constrained optimization problems.

So here's f. Here's c. We can actually write out what these equations are. So you can show that the gradient of f minus lambda gradient of c, that's a vector, 2x1 minus lambda and 20x2 plus lambda. And c is the equation for this line down here, so x1 minus x2 minus 3. And that's all got to be equal to zero. In this case, this is just a system of linear equations. So you can actually solve directly for x1, x2, and lambda. And it's not too difficult to find the solution for all three of these things by hand. But in general, these constraints can be nonlinear. The objective function doesn't have to be quadratic. Those are the easiest cases to look at. And the same methodology applies. And so you should check that you're able to do this. This is the simplest possible equality constraint problem. You could do it by hand. You should check that you're actually able to do it, that you understand the steps that go into writing out these equations.

Let's just take one step forward and look at a more general case, one in which we have a vector-valued function that gives the equality constraints instead.
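Before moving on to multiple constraints, here is a quick check of the example above. The objective isn't written out in the transcript; its gradient (2x1, 20x2) is consistent with f(x) = x1^2 + 10*x2^2, so take that as an assumption. A minimal NumPy sketch of the resulting 3-by-3 linear system:

    import numpy as np

    # Stationarity: 2*x1 - lam = 0 and 20*x2 + lam = 0, plus the constraint
    # x1 - x2 - 3 = 0, assembled as A z = b with unknowns z = (x1, x2, lam).
    A = np.array([[2.0,  0.0, -1.0],
                  [0.0, 20.0,  1.0],
                  [1.0, -1.0,  0.0]])
    b = np.array([0.0, 0.0, 3.0])
    x1, x2, lam = np.linalg.solve(A, b)
    print(x1, x2, lam)  # about 2.727, -0.273, 5.455

Substituting back, both gradient conditions and the line constraint are satisfied, which is the by-hand check the lecture recommends.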
So rather than one equation we have to satisfy, there may be many. It's possible that the feasible set doesn't have any solutions in it. It's possible that there is no x that satisfies all of these constraints simultaneously. That's a bad problem to have. You wouldn't like to have that problem very much. But it's possible that that's the case. But let's assume that there are solutions for the time being. So there are x's that satisfy the equality constraint.

Let's see if we can figure out again what the necessary conditions for defining a minimum are. So same as before, let's Taylor expand f of x, going in some direction, d. And let's make d a nice small step so we can just treat f of x plus d as a linearized function. So we can see again that g has to be perpendicular to this direction, d, if we're going to have a minimum. Otherwise, I could step in some direction, d, and I'll find either a smaller value of f of x plus d or a bigger value of f of x plus d. So g has to be perpendicular to d.

And for the equality constraints, again, they all have to satisfy this equality constraint up there. So c of x has to be equal to zero, and c of x plus d also has to be equal to zero. And so if we take a Taylor expansion of c of x plus d about x, you'll get c of x plus d equals c of x plus the Jacobian of c, all the partial derivatives of c with respect to x, multiplied by d. We know that c of x plus d is zero, and c of x is zero, so the directions, d, belong to what set of vectors? The null space. So these directions have to live in the null space of the Jacobian of c. So I can't step in any direction; I have to step in directions that are in the null space of this Jacobian.

g is perpendicular to d, as well. And d belongs to the null space of the Jacobian. In fact, you know that d is perpendicular to each of the rows of the Jacobian. Right? You know that?
I just do the matrix-vector product, right? And so each element of this matrix-vector product is the dot product of d with a different row of the Jacobian. So those rows are a set of vectors. Those rows describe the range of J transpose, or the row space of J. Remember we talked about the four fundamental subspaces, and I said we almost never use those other ones, but this is one time when we will. So those rows belong to the range of J transpose, which is the row space of J.

I need to find a g, a gradient, which is always perpendicular to d. And I know d is always perpendicular to the rows of J. So I can write g as a linear superposition of the rows of J. As long as g is a linear superposition of the rows, it'll always be perpendicular to d. Vectors from the null space of a matrix are orthogonal to vectors from the row space of that matrix, it turns out. And they're orthogonal for this reason.

So it tells us, if J d is zero, then d belongs to the null space. g is perpendicular to d. That means I could write g as a linear superposition of the rows of J. So g belongs to the range of J transpose, or it belongs to the row space of J. Those are equivalent statements. And therefore, I should be able to write g as a linear superposition of the rows of J. And one way to say that is I should be able to write g as J transpose times some other vector, lambda. That's an equivalent way of saying that g is a linear superposition of the rows of J. I don't know the values of lambda.

So I introduced a new set of unknowns, a set of Lagrange multipliers. My minimum is going to be found when I satisfy this equation, just like before, and when I'm able to satisfy all of the equality constraints. How many Lagrange multipliers do I have here? Can you figure that out? You can talk with your neighbors if you want. Take a couple minutes. Tell me how many Lagrange multipliers, how many elements are in this vector lambda.
How many elements are in lambda? Can you tell me? Sam.

AUDIENCE: Same as the number of equality constraints.

JAMES SWAN: Yes. It's the same as the number of equality constraints. J came from the gradient of c. It's the Jacobian of c. So it has a number of columns equal to the number of elements in x, because I'm taking partial derivatives with respect to each element of x, and it has a number of rows equal to the number of elements of c. So for J transpose, I just transpose those dimensions. And lambda must have the same number of elements as c does in order to make this product make sense.

So I introduce a number of new unknowns equal to exactly the number of equality constraints that I had, which is good, because I'm going to make a system of equations that says g of x minus J transpose lambda equals 0 and c of x equals 0. And the number of equations here is the number of elements in x for this gradient, and the number of elements in c for c. And the number of unknowns is the number of elements in x, and the number of elements in c associated with the Lagrange multiplier. So I have enough equations and unknowns to determine all of these things.

So whether I have one equality constraint or a million equality constraints, the problem is identical. We use the method of Lagrange multipliers. We have to solve an augmented system of equations for x and this projection on the row space of J, which tells us how the gradient is stretched or made up, composed of elements of the row space of J. These are the conditions associated with a minimum in our objective function on this boundary dictated by the equality constraint. And of course, the solution set is a critical point of a Lagrangian, which is f of x minus c dot lambda. And it's not a minimum of it, it's a critical point. It's a saddle point, it turns out, of this Lagrangian. So we've got to check, did we find a saddle point or not, when we find a solution to this equation here.
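In symbols, with m equality constraints and J the m-by-n Jacobian of c, the conditions just described are

\[
\nabla f(x) - J(x)^{\mathsf{T}}\lambda = 0, \qquad c(x) = 0,
\]

which is n + m equations for the n components of x and the m Lagrange multipliers, and whose solutions are critical (saddle) points of the Lagrangian \( \mathcal{L}(x,\lambda) = f(x) - \lambda^{\mathsf{T}} c(x) \).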
But it's just a system of nonlinear equations. If we have some good initial guess, what do we apply? Newton-Raphson, and we converge right towards the solution. If we don't have a good initial guess, we've discussed lots of methods we could employ, like homotopy or continuation, to try to develop good initial guesses for what the solution should be. Are there any questions about this? Good. OK.

So you go to Matlab and you call fmincon, do a minimization problem, and you give it some constraints. Linear constraints, nonlinear constraints, it doesn't matter, actually. The problem is the same for both of them. It's just a little bit easier if I have linear constraints. If this constraining function is a linear function, then the Jacobian I know. It's the coefficient matrix of this linear problem. Now I only have to solve linear equations down here. So the problem is a little bit simpler to solve. So Matlab sort of breaks these apart so it can use different techniques depending on which sort of problem is posed. But the solution method is the same. It does the method of Lagrange multipliers to find the solution. OK?

Inequality constraints. So interior point methods were mentioned. And it turns out this is really the best way to go about solving generic inequality constrained problems. So these are problems of the sort: minimize f of x subject to h of x is positive, or at least not negative. This is some nonlinear inequality that describes some domain and its boundary in which the solution has to live. And what's done is to rewrite it as an unconstrained optimization problem with a barrier that's incorporated. This looks a lot like the penalty method, but it's very different. And I'll explain how. So instead, we want to minimize f of x minus mu times the sum of the log of h, each of these constraints. If h were negative, we'd be taking the log of a negative argument. That's a problem computationally.
So the best we could do is approach the boundary where h is equal to zero. And as h goes to zero, the log goes to minus infinity. So this term tends to blow up, because I've got a minus sign in front of it. So this is sort of like a penalty, but it's a little different, because for the factor in front, I'm actually going to take the limit as mu goes to zero. I'm going to take the limit as this factor gets small, rather than gets big. The log will always get big as I approach the boundary of the domain. It'll blow up. So that's not a problem. But I can take the limit that mu gets smaller and smaller. And this quantity here will have less and less of an impact on the shape of this new objective function as mu gets smaller and smaller. The impact will only be nearest the boundary. Does that make sense? So you take the limit that mu approaches zero. It's got to approach it from the positive side, not the negative side, so everything behaves well. And this is called an interior point method.

So we have to determine the minimum of this new objective function for progressively weaker barriers. So we might start with some value of mu, and we might reduce mu progressively until we get mu down small enough that we think we've converged to a solution. So how do you do that reliably? What's the procedure one uses to solve a problem successively for different parameter values?

AUDIENCE: [INAUDIBLE]

JAMES SWAN: Yeah, it's a homotopy, right? You're just going to change the value of this barrier parameter. And you're going to find a minimum. And if you make a small change in the barrier parameter, that minimum is going to serve as an excellent initial guess for the next value. And so you're just going to take these small steps. And the optimization routine is going to carry you towards the minimum in the limit that mu goes to zero. So you do this with homotopy.
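Writing the reduction factor as theta (notation introduced here, not in the lecture), the sequence of barrier subproblems looks like

\[
\min_x \; \phi_\mu(x) = f(x) - \mu \sum_i \ln h_i(x), \qquad \mu_{k+1} = \theta\,\mu_k, \quad 0 < \theta < 1, \quad \mu_k \to 0^+,
\]

with each subproblem warm-started from the minimizer found at the previous, slightly larger mu.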
808 00:40:58,340 --> 00:41:01,700 Here's an example of this sort of interior point 809 00:41:01,700 --> 00:41:03,200 method, a trivial example. 810 00:41:03,200 --> 00:41:06,180 Minimize x subject to x being positive. 811 00:41:06,180 --> 00:41:09,695 So we know the solution lives where x equals zero. 812 00:41:09,695 --> 00:41:13,040 But let's write this as an unconstrained optimization 813 00:41:13,040 --> 00:41:13,760 using a barrier. 814 00:41:13,760 --> 00:41:18,850 So minimize x minus mu times log x. 815 00:41:18,850 --> 00:41:21,140 Here's x minus mu times log x. 816 00:41:21,140 --> 00:41:26,070 So out here, where x is big, x wins over log x, 817 00:41:26,070 --> 00:41:28,120 so everything starts to look linear. 818 00:41:28,120 --> 00:41:30,470 But as x becomes smaller and smaller, 819 00:41:30,470 --> 00:41:34,010 log x gets very negative, so minus log x gets very positive. 820 00:41:34,010 --> 00:41:36,210 And here's the log creeping back up. 821 00:41:36,210 --> 00:41:38,206 And as I make mu smaller and smaller, 822 00:41:38,206 --> 00:41:39,830 you can see the minimum of this function 823 00:41:39,830 --> 00:41:44,840 is moving closer and closer and closer to zero. 824 00:41:44,840 --> 00:41:47,240 So if I take the limit that mu decreases 825 00:41:47,240 --> 00:41:49,520 from some positive number towards zero, 826 00:41:49,520 --> 00:41:52,410 eventually this minimum is going to converge 827 00:41:52,410 --> 00:41:55,340 to the minimum of the inequality 828 00:41:55,340 --> 00:41:57,580 constrained optimization problem. 829 00:41:57,580 --> 00:41:58,900 Make sense? 830 00:41:58,900 --> 00:42:01,350 OK. 831 00:42:01,350 --> 00:42:01,850 OK. 832 00:42:01,850 --> 00:42:04,740 So we want to do this. 833 00:42:04,740 --> 00:42:07,130 You can use any barrier function you want. 834 00:42:07,130 --> 00:42:09,650 Any thoughts on why a logarithmic barrier is used? 835 00:42:18,710 --> 00:42:19,210 No. 836 00:42:19,210 --> 00:42:21,100 OK, that's OK. 837 00:42:21,100 --> 00:42:23,910 So minus log is going to be convex. 838 00:42:23,910 --> 00:42:26,720 Log isn't convex, but minus log is going to be convex. 839 00:42:26,720 --> 00:42:27,795 So that's good. 840 00:42:27,795 --> 00:42:29,920 If this function's convex, then their combination's 841 00:42:29,920 --> 00:42:32,020 going to be convex, and we'll be OK. 842 00:42:32,020 --> 00:42:34,300 And the gradient of the log is easy to compute. 843 00:42:34,300 --> 00:42:37,690 Grad log h is 1 over h times grad h. 844 00:42:37,690 --> 00:42:40,180 So if I know h and I know grad h, it's easy for me 845 00:42:40,180 --> 00:42:42,130 to compute the gradient of log h. 846 00:42:42,130 --> 00:42:46,000 We know we're going to solve this unconstrained optimization 847 00:42:46,000 --> 00:42:48,690 problem where we need to set the grad of this objective function 848 00:42:48,690 --> 00:42:49,210 equal to zero. 849 00:42:49,210 --> 00:42:50,987 So the calculations are easy. 850 00:42:50,987 --> 00:42:52,320 The log makes it easy like that. 851 00:42:52,320 --> 00:42:55,870 The log is also like the most weakly singular 852 00:42:55,870 --> 00:42:57,520 function available to us. 853 00:42:57,520 --> 00:43:00,070 Out of the whole toolbox of functions we could reach for, 854 00:43:00,070 --> 00:43:02,835 the log has the mildest sort of singularities.
855 00:43:02,835 --> 00:43:04,960 It has singularities at both ends, which is sort of funny, 856 00:43:04,960 --> 00:43:06,520 but they're the mildest sort of singularities 857 00:43:06,520 --> 00:43:08,782 you have to cope with. 858 00:43:08,782 --> 00:43:10,990 So we want to find the minimum of these unconstrained 859 00:43:10,990 --> 00:43:15,310 optimization problems where the gradient of f minus mu times the sum of 1 860 00:43:15,310 --> 00:43:18,400 over h times grad h is equal to zero. 861 00:43:18,400 --> 00:43:20,700 And we just do that for progressively smaller values 862 00:43:20,700 --> 00:43:23,410 of mu, and we'll converge to a solution. 863 00:43:23,410 --> 00:43:25,720 That's the interior point method. 864 00:43:25,720 --> 00:43:31,360 You use homotopy to study a sequence of barrier parameters, 865 00:43:31,360 --> 00:43:36,760 or continuation to study a sequence of barrier parameters. 866 00:43:36,760 --> 00:43:41,580 You stop the homotopy or continuation when what? 867 00:43:41,580 --> 00:43:42,806 How are you going to stop? 868 00:43:47,000 --> 00:43:49,234 I've got to make mu small, right? 869 00:43:49,234 --> 00:43:51,150 I want to go towards the limit mu equals zero. 870 00:43:51,150 --> 00:43:52,860 I can't actually get to mu equals zero, 871 00:43:52,860 --> 00:43:54,450 I've just got to approach it. 872 00:43:54,450 --> 00:43:57,930 So how small do I need to make mu before I quit? 873 00:43:57,930 --> 00:43:59,160 It's an interesting question. 874 00:43:59,160 --> 00:43:59,910 What do you think? 875 00:44:02,274 --> 00:44:03,440 I'll take this answer first. 876 00:44:03,440 --> 00:44:06,512 AUDIENCE: So it doesn't affect the limitation. 877 00:44:06,512 --> 00:44:07,220 JAMES SWAN: Good. 878 00:44:07,220 --> 00:44:09,027 So we might look at the solution and see 879 00:44:09,027 --> 00:44:10,610 whether the solution is becoming less and less 880 00:44:10,610 --> 00:44:12,430 sensitive to the choice of mu. 881 00:44:12,430 --> 00:44:14,252 Did you have another suggestion? 882 00:44:14,252 --> 00:44:15,668 AUDIENCE: [INAUDIBLE]. 883 00:44:17,930 --> 00:44:19,180 JAMES SWAN: Set the tolerance. 884 00:44:19,180 --> 00:44:20,766 Right, OK. 885 00:44:20,766 --> 00:44:23,484 AUDIENCE: [INAUDIBLE]. 886 00:44:23,484 --> 00:44:24,150 JAMES SWAN: Mhm. 887 00:44:24,150 --> 00:44:25,050 Right, right, right, right. 888 00:44:25,050 --> 00:44:25,550 So you-- 889 00:44:25,550 --> 00:44:28,842 AUDIENCE: [INAUDIBLE]. 890 00:44:28,842 --> 00:44:29,550 JAMES SWAN: Good. 891 00:44:29,550 --> 00:44:31,210 So there were two suggestions here. 892 00:44:31,210 --> 00:44:34,100 One is along the lines of a step-norm criterion, 893 00:44:34,100 --> 00:44:36,520 like I check my solution as I change mu, 894 00:44:36,520 --> 00:44:39,700 and I ask when does my solution seem 895 00:44:39,700 --> 00:44:42,700 relatively insensitive to mu. 896 00:44:42,700 --> 00:44:45,370 When the changes in these steps relative to mu 897 00:44:45,370 --> 00:44:48,010 get sufficiently small, I might be 898 00:44:48,010 --> 00:44:49,810 willing to accept these solutions 899 00:44:49,810 --> 00:44:53,230 as reasonable solutions for the constrained optimization. 900 00:44:53,230 --> 00:44:55,180 I can also go back and I can check 901 00:44:55,180 --> 00:44:58,240 a sort of function-norm criterion. 902 00:44:58,240 --> 00:45:00,650 I can take the value of x I found as the minimum, 903 00:45:00,650 --> 00:45:02,920 and I can ask how good a job does 904 00:45:02,920 --> 00:45:08,140 it do satisfying the original equations.
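For reference, the stationarity condition being solved at each value of mu is

\[ \nabla f(x) \;-\; \mu \sum_i \frac{\nabla h_i(x)}{h_i(x)} \;=\; 0, \]

and on the trivial example from before (minimize x subject to x not negative, so h(x) = x) this reads 1 - mu/x = 0, giving the barrier minimum x*(mu) = mu. In that example mu is literally the distance between the barrier solution and the true constrained minimum at x = 0, which is one concrete way to think about how small mu has to be before you quit.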
905 00:45:08,140 --> 00:45:11,180 How far away am I from satisfying the inequality 906 00:45:11,180 --> 00:45:11,680 constraint? 907 00:45:11,680 --> 00:45:14,740 How close am I to actually minimizing the function 908 00:45:14,740 --> 00:45:15,625 within that domain? 909 00:45:20,245 --> 00:45:21,344 OK. 910 00:45:21,344 --> 00:45:22,760 So we're running out of time here. 911 00:45:22,760 --> 00:45:26,300 Let me provide you with an example. 912 00:45:26,300 --> 00:45:27,350 So let's minimize again-- 913 00:45:27,350 --> 00:45:29,808 I always pick this function because it's easy to visualize, 914 00:45:29,808 --> 00:45:32,770 a nice parabolic function that opens upwards. 915 00:45:32,770 --> 00:45:36,340 And let's minimize it subject to the constraint 916 00:45:36,340 --> 00:45:42,620 that h of x1 and x2 is equal to 1 minus-- 917 00:45:42,620 --> 00:45:45,730 well, the equation for a circle of radius 1, essentially. 918 00:45:45,730 --> 00:45:49,240 The interior of that circle. 919 00:45:49,240 --> 00:45:51,450 So here are the contours of the function, 920 00:45:51,450 --> 00:45:53,055 and this red domain is the constraint. 921 00:45:53,055 --> 00:45:54,840 And we want to know the smallest value 922 00:45:54,840 --> 00:45:58,440 of f that lives in this domain. 923 00:45:58,440 --> 00:45:59,440 So here's a Matlab code. 924 00:45:59,440 --> 00:46:00,800 You can try it out. 925 00:46:00,800 --> 00:46:03,960 And I make a function, the objective function, f, 926 00:46:03,960 --> 00:46:06,650 it's x squared plus 10x-- 927 00:46:06,650 --> 00:46:08,900 x1 squared plus 10x2 squared. 928 00:46:08,900 --> 00:46:10,240 Here's the gradient. 929 00:46:10,240 --> 00:46:13,210 Here's the Hessian. 930 00:46:13,210 --> 00:46:15,700 Here, I calculate h. 931 00:46:15,700 --> 00:46:17,080 Here's the gradient of h. 932 00:46:17,080 --> 00:46:19,810 Here's the Hessian of h. 933 00:46:19,810 --> 00:46:22,810 I've got to define a new objective function, phi, 934 00:46:22,810 --> 00:46:26,470 which is f minus mu log h. 935 00:46:26,470 --> 00:46:29,320 This is the gradient of phi and this is the Hessian of phi. 936 00:46:29,320 --> 00:46:30,760 Oh, man, what a mess. 937 00:46:30,760 --> 00:46:33,310 But actually, not such a mess, because the log 938 00:46:33,310 --> 00:46:35,785 makes it really easy to take these derivatives. 939 00:46:35,785 --> 00:46:40,810 So there's just a lot of differential calculus 940 00:46:40,810 --> 00:46:43,960 involved in working this out, but this is the Hessian of phi. 941 00:46:43,960 --> 00:46:46,270 And then I need some initial guess. 942 00:46:46,270 --> 00:46:48,940 So I pick the center of my circle 943 00:46:48,940 --> 00:46:50,737 as an initial guess for the solution. 944 00:46:50,737 --> 00:46:52,570 And I'm going to loop over values of mu that 945 00:46:52,570 --> 00:46:53,707 get progressively smaller. 946 00:46:53,707 --> 00:46:55,290 I'll just go down to 10 to the minus 2 947 00:46:55,290 --> 00:46:57,719 and stop for illustration purposes here. 948 00:46:57,719 --> 00:47:00,010 But really, we should be checking the solution as we go 949 00:47:00,010 --> 00:47:04,360 and deciding what values we want to stop with. 950 00:47:04,360 --> 00:47:06,800 And then this loop here, what's this do? 951 00:47:09,880 --> 00:47:12,222 What's it do? 952 00:47:12,222 --> 00:47:15,030 Can you tell? 953 00:47:15,030 --> 00:47:16,440 AUDIENCE: Is it Newton? 954 00:47:16,440 --> 00:47:16,770 JAMES SWAN: What's that? 955 00:47:16,770 --> 00:47:17,596 AUDIENCE: Newton?
956 00:47:17,596 --> 00:47:19,470 JAMES SWAN: Yeah, it's Newton-Raphson, right? 957 00:47:19,470 --> 00:47:25,050 x is x minus Hessian inverse times grad phi, right? 958 00:47:25,050 --> 00:47:26,462 So I just do Newton-Raphson. 959 00:47:26,462 --> 00:47:28,170 I take my initial guess and I loop around 960 00:47:28,170 --> 00:47:30,630 with Newton-Raphson, and when this loop finishes, 961 00:47:30,630 --> 00:47:32,910 I reduce mu, and it'll just use my previous guess 962 00:47:32,910 --> 00:47:35,580 as the initial guess for the next value of the loop, 963 00:47:35,580 --> 00:47:38,187 until mu is sufficiently small. 964 00:47:38,187 --> 00:47:39,680 OK? 965 00:47:39,680 --> 00:47:41,310 Interior point method. 966 00:47:41,310 --> 00:47:43,370 Here's what that solution path looks like. 967 00:47:43,370 --> 00:47:46,520 So mu started at 1, and the barrier was here. 968 00:47:46,520 --> 00:47:49,370 It was close to the edge of the circle, but not quite on it. 969 00:47:49,370 --> 00:47:51,200 But as I reduced mu further and further 970 00:47:51,200 --> 00:47:53,030 and further, you can see the path, 971 00:47:53,030 --> 00:47:54,530 the solution path, that was followed 972 00:47:54,530 --> 00:47:56,930 works its way closer to the boundary of the circle. 973 00:47:56,930 --> 00:47:59,027 And the minimum is found right here. 974 00:47:59,027 --> 00:48:00,860 So it turns out the minimum of this function 975 00:48:00,860 --> 00:48:02,720 doesn't live in the interior of the domain, it lives 976 00:48:02,720 --> 00:48:04,960 on the boundary of the domain. 977 00:48:04,960 --> 00:48:08,830 Recall that this point should be a point where 978 00:48:08,830 --> 00:48:11,290 the boundary of the domain is parallel 979 00:48:11,290 --> 00:48:15,699 to the contours of the function. So actually we didn't need 980 00:48:15,699 --> 00:48:16,990 the inequality constraint here. 981 00:48:16,990 --> 00:48:18,630 We could have used the equality constraint. 982 00:48:18,630 --> 00:48:20,840 The equality constrained problem has the same solution 983 00:48:20,840 --> 00:48:22,423 as the inequality constrained problem. 984 00:48:22,423 --> 00:48:23,800 And look, that actually happened. 985 00:48:23,800 --> 00:48:25,677 Here are the contours of the function. 986 00:48:25,677 --> 00:48:27,760 The contour of the function runs right along here, 987 00:48:27,760 --> 00:48:29,200 and you can see it looks like it's 988 00:48:29,200 --> 00:48:31,360 going to be tangent to the circle at this point. 989 00:48:31,360 --> 00:48:34,450 So the interior point method actually solved 990 00:48:34,450 --> 00:48:37,450 an equality constrained problem in addition to an inequality 991 00:48:37,450 --> 00:48:40,589 constrained problem, which is-- that's sort of cool that you 992 00:48:40,589 --> 00:48:41,380 can do it that way. 993 00:48:44,010 --> 00:48:46,620 How about if I want to do a combination of equality 994 00:48:46,620 --> 00:48:48,135 and inequality constraints? 995 00:48:48,135 --> 00:48:50,010 Then what do I do? 996 00:48:57,285 --> 00:48:58,255 Yeah. 997 00:48:58,255 --> 00:49:02,150 AUDIENCE: [INAUDIBLE]. 998 00:49:02,150 --> 00:49:03,020 JAMES SWAN: Perfect. 999 00:49:03,020 --> 00:49:07,280 Convert the equality constraint into unknowns, 1000 00:49:07,280 --> 00:49:09,909 Lagrange multipliers, instead. 1001 00:49:09,909 --> 00:49:11,450 And then do the interior point method 1002 00:49:11,450 --> 00:49:13,449 on the Lagrange multiplier problem.
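Concretely, if we write g(x) = 0 for the equality constraints, lambda for their Lagrange multipliers, and J_g for the Jacobian of g (notation introduced here, not the lecture's), the system being driven to zero at each value of mu is

\[ \nabla f(x) \;-\; J_g(x)^{T}\lambda \;-\; \mu \sum_i \frac{\nabla h_i(x)}{h_i(x)} \;=\; 0, \qquad g(x) \;=\; 0, \]

solved for the combined unknowns (x, lambda) with Newton-Raphson, while mu is reduced by continuation exactly as before.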
1003 00:49:13,449 --> 00:49:15,740 Now you've got a combination of equality and inequality 1004 00:49:15,740 --> 00:49:16,460 constraints. 1005 00:49:16,460 --> 00:49:18,885 This is exactly what Matlab does. 1006 00:49:18,885 --> 00:49:20,840 So it converts equality constraints 1007 00:49:20,840 --> 00:49:22,610 into Lagrange multipliers. 1008 00:49:22,610 --> 00:49:24,500 Inequality constraints it actually solves 1009 00:49:24,500 --> 00:49:26,630 using interior point methods. 1010 00:49:26,630 --> 00:49:28,790 Buried in that interior point method 1011 00:49:28,790 --> 00:49:32,450 is some form of Newton-Raphson and steepest descent combined 1012 00:49:32,450 --> 00:49:34,580 together, like the dogleg method we talked about 1013 00:49:34,580 --> 00:49:36,500 for unconstrained problems. 1014 00:49:36,500 --> 00:49:38,120 And it's going to do a continuation. 1015 00:49:38,120 --> 00:49:40,470 As it reduces the values of mu, it'll 1016 00:49:40,470 --> 00:49:43,000 have some heuristic for how it does that. 1017 00:49:43,000 --> 00:49:46,010 It's going to use its previous solutions as initial guesses 1018 00:49:46,010 --> 00:49:47,910 for the next iteration. 1019 00:49:47,910 --> 00:49:49,700 So these are very complicated problems, 1020 00:49:49,700 --> 00:49:52,310 but if you understand how to solve systems of nonlinear 1021 00:49:52,310 --> 00:49:55,010 equations, and you think carefully 1022 00:49:55,010 --> 00:49:57,350 about how to control numerical error in your algorithm, 1023 00:49:57,350 --> 00:49:58,808 you come to a conclusion like this, 1024 00:49:58,808 --> 00:50:04,100 that you can do these sorts of Lagrange multiplier interior 1025 00:50:04,100 --> 00:50:06,710 point methods to solve a wide variety of problems 1026 00:50:06,710 --> 00:50:09,820 with reasonable reliability. 1027 00:50:09,820 --> 00:50:10,625 OK? 1028 00:50:10,625 --> 00:50:12,156 Any more questions? 1029 00:50:15,030 --> 00:50:16,140 No? 1030 00:50:16,140 --> 00:50:17,200 Good. 1031 00:50:17,200 --> 00:50:20,830 Well, thank you, and we'll see you on Friday.
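To make the circle example concrete, here is a minimal MATLAB sketch along the lines of the code described in the lecture. It is a reconstruction, not the original file: the circle's center xc is a hypothetical choice (the actual center wasn't stated, and it has to sit more than a unit from the origin for the minimum to land on the boundary), the mu schedule and iteration caps are illustrative, and a simple step-halving safeguard has been added to keep the Newton iterates strictly inside the circle, which the lecture's plain loop may not have needed for its particular geometry.

% Log-barrier interior point sketch for:
%   minimize f(x) = x1^2 + 10*x2^2
%   subject to h(x) = 1 - (x1-xc1)^2 - (x2-xc2)^2 >= 0
% Reconstruction of the approach described in lecture; xc and the schedules are assumptions.

xc = [1.5; 0];                          % hypothetical circle center (not given in the lecture)

f     = @(x) x(1)^2 + 10*x(2)^2;        % objective
gradf = @(x) [2*x(1); 20*x(2)];
Hf    = [2 0; 0 20];                    % constant Hessian of f

h     = @(x) 1 - (x(1)-xc(1))^2 - (x(2)-xc(2))^2;   % h(x) >= 0 inside the circle
gradh = @(x) [-2*(x(1)-xc(1)); -2*(x(2)-xc(2))];
Hh    = [-2 0; 0 -2];                   % constant Hessian of h

% barrier objective phi = f - mu*log(h), its gradient, and its Hessian
phi     = @(x,mu) f(x) - mu*log(h(x));
gradphi = @(x,mu) gradf(x) - mu*gradh(x)/h(x);
Hphi    = @(x,mu) Hf - mu*Hh/h(x) + mu*(gradh(x)*gradh(x)')/h(x)^2;

x = xc;                                 % start at the circle's center (strictly feasible)
for mu = 10.^(0:-0.25:-2)               % continuation: reduce mu from 1 down to 1e-2
    for k = 1:25                        % Newton-Raphson on gradphi = 0, warm-started
        dx = -Hphi(x,mu)\gradphi(x,mu); % Newton step
        s = 1;                          % halve the step until it stays inside and descends
        while h(x + s*dx) <= 0 || phi(x + s*dx, mu) > phi(x, mu)
            s = s/2;
        end
        x = x + s*dx;
        if norm(gradphi(x,mu)) < 1e-10, break; end
    end
    fprintf('mu = %7.4f   x = (%7.4f, %7.4f)   h(x) = %7.4f\n', mu, x(1), x(2), h(x));
end

% Cross-check against MATLAB's own interior point solver (requires the Optimization Toolbox).
% fmincon expects nonlinear inequalities as c(x) <= 0, so pass c = -h.
nonlcon = @(x) deal(-h(x), []);
xstar = fmincon(f, xc, [], [], [], [], [], [], nonlcon, ...
                optimoptions('fmincon','Algorithm','interior-point','Display','off'));
fprintf('fmincon finds x = (%7.4f, %7.4f)\n', xstar(1), xstar(2));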