WEBVTT
00:00:01.540 --> 00:00:03.910
The following content is
provided under a Creative
00:00:03.910 --> 00:00:05.300
Commons license.
00:00:05.300 --> 00:00:07.510
Your support will help
MIT OpenCourseWare
00:00:07.510 --> 00:00:11.600
continue to offer high-quality
educational resources for free.
00:00:11.600 --> 00:00:14.140
To make a donation or to
view additional materials
00:00:14.140 --> 00:00:18.100
from hundreds of MIT courses,
visit MIT OpenCourseWare
00:00:18.100 --> 00:00:19.310
at ocw.mit.edu.
00:00:24.145 --> 00:00:24.770
JAMES SWAN: OK.
00:00:24.770 --> 00:00:26.790
Well, everyone's
quieted down, so that
00:00:26.790 --> 00:00:28.680
means we have to get started.
00:00:28.680 --> 00:00:31.350
So let me say something here.
00:00:31.350 --> 00:00:34.260
This will be our
last conversation
00:00:34.260 --> 00:00:36.360
about optimization.
00:00:36.360 --> 00:00:39.510
So we've discussed
unconstrained optimization.
00:00:39.510 --> 00:00:42.150
And now we're going to discuss
a slightly more complicated
00:00:42.150 --> 00:00:43.525
problem-- but
you're going to see
00:00:43.525 --> 00:00:45.420
it's really not that
much more complicated--
00:00:45.420 --> 00:00:47.170
constrained optimization.
00:00:50.042 --> 00:00:51.750
These are the things
we discussed before.
00:00:51.750 --> 00:00:53.520
I don't want to spend
much time recapping
00:00:53.520 --> 00:00:55.936
because I want to take a minute
and talk about the midterm
00:00:55.936 --> 00:00:57.780
exam.
00:00:57.780 --> 00:00:59.250
So we have a quiz.
00:00:59.250 --> 00:01:01.020
It's next Wednesday.
00:01:01.020 --> 00:01:02.680
Here's where it's
going to be located.
00:01:02.680 --> 00:01:03.630
Here's 66.
00:01:03.630 --> 00:01:04.950
Head down Ames.
00:01:04.950 --> 00:01:09.030
You're looking for Walker
Memorial on the third floor.
00:01:09.030 --> 00:01:12.030
Unfortunately, the time for
the quiz is 7:00 to 9:00 PM.
00:01:12.030 --> 00:01:14.640
We really did try hard
to get the scheduling
00:01:14.640 --> 00:01:16.450
office to give us
something better,
00:01:16.450 --> 00:01:20.100
but the only way to get a room
that would fit everybody in
00:01:20.100 --> 00:01:22.035
was to do it at
this time in Walker.
00:01:22.035 --> 00:01:23.910
I really don't understand,
because I actually
00:01:23.910 --> 00:01:27.480
requested locations for
the quizzes back in April.
00:01:27.480 --> 00:01:31.800
And somehow I was
too early, maybe,
00:01:31.800 --> 00:01:33.992
and got buried under a pile.
00:01:33.992 --> 00:01:35.700
Maybe not important
enough, I don't know.
00:01:35.700 --> 00:01:40.146
But it's got to be from
seven to nine next Wednesday.
00:01:40.146 --> 00:01:40.645
Third floor.
00:01:40.645 --> 00:01:42.270
There's not going
to be any class
00:01:42.270 --> 00:01:44.350
next Wednesday because
you have a quiz instead.
00:01:44.350 --> 00:01:49.500
So you get a little extra time
to relax or study, prepare,
00:01:49.500 --> 00:01:51.690
calm yourself before
you go into the exam.
00:01:51.690 --> 00:01:54.150
There's no homework this week.
00:01:54.150 --> 00:01:58.950
So you can just use this time
to focus on the material we've
00:01:58.950 --> 00:02:00.090
discussed.
00:02:00.090 --> 00:02:02.310
There's a practice
exam from last year
00:02:02.310 --> 00:02:07.950
posted on the Stellar site,
which you can utilize and study
00:02:07.950 --> 00:02:08.850
from.
00:02:08.850 --> 00:02:10.250
I'll tell you this.
00:02:13.506 --> 00:02:15.990
That practice exam is
skewed a little more
00:02:15.990 --> 00:02:19.410
towards some chemical
engineering problems
00:02:19.410 --> 00:02:22.060
that motivate the numerics.
00:02:22.060 --> 00:02:25.050
I've found in the past that
when problems like that
00:02:25.050 --> 00:02:27.060
are given on the exam,
sometimes there's
00:02:27.060 --> 00:02:30.580
a lot of reading that goes into
understanding the engineering
00:02:30.580 --> 00:02:31.080
problem.
00:02:31.080 --> 00:02:33.760
And that tends to set
back the problem-solving.
00:02:33.760 --> 00:02:37.110
So I'll tell you that the quiz
that you'll take on Wednesday
00:02:37.110 --> 00:02:40.590
will have less of the
engineering associated with it,
00:02:40.590 --> 00:02:46.080
and focus more on the numerical
or computational science.
00:02:46.080 --> 00:02:48.400
The underlying
sorts of questions,
00:02:48.400 --> 00:02:51.450
the way the questions are
asked, the kinds of responses
00:02:51.450 --> 00:02:53.730
you're expected to give
I'd say are very similar.
00:02:53.730 --> 00:02:57.450
But we've tried to tune
the exam so that it'll
00:02:57.450 --> 00:02:59.430
be less of a burden
to understand
00:02:59.430 --> 00:03:01.429
the structure of the
problem before describing
00:03:01.429 --> 00:03:02.220
how you'd solve it.
00:03:02.220 --> 00:03:03.920
So I think that's good.
00:03:03.920 --> 00:03:07.990
It's comprehensive up to today.
00:03:07.990 --> 00:03:10.860
So linear algebra, systems
of nonlinear equations
00:03:10.860 --> 00:03:14.160
and optimization
are the quiz topics.
00:03:14.160 --> 00:03:18.660
We're going to switch on
Friday to ordinary differential
00:03:18.660 --> 00:03:20.735
equations and initial
value problems.
00:03:20.735 --> 00:03:22.110
So you have two
lectures on that,
00:03:22.110 --> 00:03:24.540
but you won't have
done any homework.
00:03:24.540 --> 00:03:26.940
You probably don't know enough
or aren't practiced enough
00:03:26.940 --> 00:03:30.796
to answer any questions
intelligently on the quiz.
00:03:30.796 --> 00:03:32.670
So don't expect that
material to be on there.
00:03:32.670 --> 00:03:33.380
It's not.
00:03:33.380 --> 00:03:36.672
It's going to be
these three topics.
00:03:36.672 --> 00:03:38.880
Are there any questions
about this that I can answer?
00:03:42.864 --> 00:03:44.358
Kristin has a question.
00:03:44.358 --> 00:03:45.354
AUDIENCE: [INAUDIBLE].
00:03:56.535 --> 00:03:57.160
JAMES SWAN: OK.
00:03:57.160 --> 00:03:59.380
So yeah, come prepared.
00:03:59.380 --> 00:04:00.190
It might be cold.
00:04:00.190 --> 00:04:02.440
It might be hot.
00:04:02.440 --> 00:04:04.130
It leaks when it
rains a little bit.
00:04:04.130 --> 00:04:07.480
Yeah, it's not
the greatest spot.
00:04:07.480 --> 00:04:09.590
So come prepared.
00:04:09.590 --> 00:04:10.300
That's true.
00:04:10.300 --> 00:04:12.000
Other questions?
00:04:12.000 --> 00:04:13.292
Things you want to know?
00:04:13.292 --> 00:04:15.180
AUDIENCE: What can
we take to the exam?
00:04:15.180 --> 00:04:16.471
JAMES SWAN: Ooh, good question.
00:04:16.471 --> 00:04:20.059
So you can bring the book
recommended for the course.
00:04:20.059 --> 00:04:21.100
You can bring your notes.
00:04:21.100 --> 00:04:22.960
You can bring a calculator.
00:04:22.960 --> 00:04:24.300
You need to bring some pencils.
00:04:24.300 --> 00:04:28.210
We'll provide blue books for
you to write your solutions
00:04:28.210 --> 00:04:29.860
to the exam in.
00:04:29.860 --> 00:04:31.150
So those are the materials.
00:04:31.150 --> 00:04:32.320
Good.
00:04:32.320 --> 00:04:33.370
What else?
00:04:33.370 --> 00:04:34.710
Same question.
00:04:34.710 --> 00:04:36.419
OK.
00:04:36.419 --> 00:04:37.898
Other questions?
00:04:40.856 --> 00:04:42.171
No?
00:04:42.171 --> 00:04:42.670
OK.
00:04:45.755 --> 00:04:47.880
So then let's jump into
the topic of the day, which
00:04:47.880 --> 00:04:50.070
is constrained optimization.
00:04:50.070 --> 00:04:51.840
So these are
problems of the sort.
00:04:51.840 --> 00:04:57.870
Minimize an objective function
f of x subject to the constraint
00:04:57.870 --> 00:05:02.190
that x belongs to some set
D, or find the argument
00:05:02.190 --> 00:05:04.007
x that minimizes this function.
00:05:04.007 --> 00:05:05.590
These are equivalent
sorts of problems.
00:05:05.590 --> 00:05:08.410
Sometimes, we want to know
one or the other or both.
00:05:08.410 --> 00:05:10.790
That's not a problem.
00:05:10.790 --> 00:05:12.290
And graphically,
it looks like this.
00:05:12.290 --> 00:05:13.770
Here's f, our
objective function.
00:05:13.770 --> 00:05:16.470
It's a nice convex
bowl-shaped function here.
00:05:16.470 --> 00:05:19.080
And we want to know the
values of x1 and x2,
00:05:19.080 --> 00:05:21.870
let's say, that
minimize this function
00:05:21.870 --> 00:05:23.220
subject to some constraint.
00:05:23.220 --> 00:05:27.900
That constraint could
be that x1 and x2 live
00:05:27.900 --> 00:05:29.820
inside this little blue circle.
00:05:29.820 --> 00:05:32.820
It could be D. It could
be that x1 and x2 live
00:05:32.820 --> 00:05:34.680
on the surface of
this circle, right,
00:05:34.680 --> 00:05:36.780
on the circumference
of this circle.
00:05:36.780 --> 00:05:40.670
That could be the constraint.
00:05:40.670 --> 00:05:43.160
So these are the sorts of
problems we want to solve.
00:05:43.160 --> 00:05:45.980
D is called the
feasible set, and can
00:05:45.980 --> 00:05:48.710
be described in terms of really
two types of constraints.
00:05:48.710 --> 00:05:51.200
One is what we call
equality constraints.
00:05:51.200 --> 00:05:54.650
So D can be the set
of values x such
00:05:54.650 --> 00:06:00.440
that some nonlinear function
c of x is equal to zero.
00:06:00.440 --> 00:06:03.080
So it's the set of points
that satisfy this nonlinear
00:06:03.080 --> 00:06:04.400
equation.
00:06:04.400 --> 00:06:06.080
And among those
points, we want to know
00:06:06.080 --> 00:06:09.274
which one produces the minimum
in the objective function.
00:06:09.274 --> 00:06:10.940
Or it could be an
inequality constraint.
00:06:10.940 --> 00:06:14.420
So D could be the set of
points such that some nonlinear
00:06:14.420 --> 00:06:20.240
function h of x is, by
convention, positive.
00:06:20.240 --> 00:06:22.170
So h of x could
represent, for example,
00:06:22.170 --> 00:06:23.900
the interior of a
circle, and c of x
00:06:23.900 --> 00:06:27.189
could represent the
circumference of a circle.
00:06:27.189 --> 00:06:28.730
And we would have
nonlinear equations
00:06:28.730 --> 00:06:33.560
that reflect those
values of x that satisfy
00:06:33.560 --> 00:06:35.360
those sorts of geometries.
00:06:38.770 --> 00:06:43.930
So equality constrained,
points that lie on this circle,
00:06:43.930 --> 00:06:48.010
inequality constrained, points
that lie within this circle.
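NOTE
A minimal sketch of the two constraint types just described, assuming a
hypothetical unit circle centered at the origin (the lecture does not give
the circle's center or radius). The names c, h, on_circle, inside, and
outside are illustrative only.
```python
import numpy as np
#
def c(x):
    """Equality constraint: c(x) = 0 exactly on the circumference."""
    return x[0]**2 + x[1]**2 - 1.0
#
def h(x):
    """Inequality constraint: h(x) >= 0 on and inside the circle."""
    return 1.0 - x[0]**2 - x[1]**2
#
on_circle = np.array([1.0, 0.0])   # satisfies c(x) = 0
inside = np.array([0.5, 0.0])      # satisfies h(x) > 0
outside = np.array([2.0, 0.0])     # violates h(x) >= 0
print(c(on_circle), h(inside), h(outside))
```
Flipping the sign of h would describe the region outside the circle instead.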
00:06:48.010 --> 00:06:50.560
The shape of the feasible set
is constrained by the problem
00:06:50.560 --> 00:06:52.460
that you're actually
interested in.
00:06:52.460 --> 00:06:54.460
So it's easy for me to
draw circles in the plane
00:06:54.460 --> 00:06:56.293
because that's a shape
you're familiar with.
00:06:56.293 --> 00:06:57.970
But actually, it'll
come from some sort
00:06:57.970 --> 00:07:01.060
of physical constraint on the
engineering problem you're
00:07:01.060 --> 00:07:03.940
looking at, like mole fractions
need to be bigger than zero
00:07:03.940 --> 00:07:06.430
and smaller than one, and
temperatures in absolute value
00:07:06.430 --> 00:07:09.220
have to be bigger than zero
and smaller than some value
00:07:09.220 --> 00:07:12.790
because that's a safety
factor on the process.
00:07:12.790 --> 00:07:16.960
So these set up the
constraints on various sorts
00:07:16.960 --> 00:07:19.864
of optimization problems
that we're interested in.
00:07:19.864 --> 00:07:22.030
It could also be true that
we're interested in, say,
00:07:22.030 --> 00:07:24.790
optimization in the domain
outside of this circle, too.
00:07:24.790 --> 00:07:28.030
It could be on the inside,
could be on the outside.
00:07:28.030 --> 00:07:31.816
That's also an inequality
constrained sort of problem.
00:07:31.816 --> 00:07:34.730
You know some of these already.
00:07:34.730 --> 00:07:36.350
They're familiar to you.
00:07:36.350 --> 00:07:38.840
So here's a classic
one from mechanics.
00:07:38.840 --> 00:07:43.120
Here's the total energy in a
system for, say, a pendulum.
00:07:43.120 --> 00:07:47.080
So x is like the position of
the tip of this pendulum and v
00:07:47.080 --> 00:07:48.910
is the velocity
that it moves with.
00:07:48.910 --> 00:07:50.690
This is the kinetic energy.
00:07:50.690 --> 00:07:51.920
This is the potential energy.
00:07:51.920 --> 00:07:54.128
And we know the pendulum
will come to rest in a place
00:07:54.128 --> 00:07:56.680
where the energy is minimized.
00:07:56.680 --> 00:07:59.050
Well, the energy can
only be minimized
00:07:59.050 --> 00:08:02.590
when the velocity here is zero,
because any non-zero velocity
00:08:02.590 --> 00:08:04.450
will always push the
energy content up.
00:08:04.450 --> 00:08:06.040
So it comes to rest.
00:08:06.040 --> 00:08:07.690
It doesn't move.
00:08:07.690 --> 00:08:10.120
And then there's some
value of x at which
00:08:10.120 --> 00:08:11.710
the energy is minimized.
00:08:11.710 --> 00:08:14.680
If there is no constraint
that says that the pendulum is
00:08:14.680 --> 00:08:17.680
attached to some
central axis, then I
00:08:17.680 --> 00:08:19.270
can always make
the energy smaller
00:08:19.270 --> 00:08:21.430
by making x more
and more negative.
00:08:21.430 --> 00:08:22.570
It just keeps falling.
00:08:22.570 --> 00:08:23.740
There is no stopping point.
00:08:23.740 --> 00:08:25.507
But there's a constraint.
00:08:25.507 --> 00:08:27.340
The distance between
the tip of the pendulum
00:08:27.340 --> 00:08:31.965
and this central point is
some fixed distance out.
00:08:31.965 --> 00:08:34.090
So this is an equality
constrained sort of problem,
00:08:34.090 --> 00:08:35.714
and we have to choose
from the set of v
00:08:35.714 --> 00:08:38.679
and x the values subject
to this constraint that
00:08:38.679 --> 00:08:39.970
minimize the total energy.
00:08:39.970 --> 00:08:43.650
And that's this configuration
of the pendulum here.
00:08:43.650 --> 00:08:46.420
So you know these sorts
of problems already.
00:08:46.420 --> 00:08:51.380
We talked about this one,
linear sorts of programs.
00:08:51.380 --> 00:08:55.480
These are optimization problems
where the objective function is
00:08:55.480 --> 00:08:57.940
linear in the design variables.
00:08:57.940 --> 00:09:01.300
So it's just the dot product
between x and some vector
00:09:01.300 --> 00:09:03.070
c that weights the
different design
00:09:03.070 --> 00:09:05.300
options against each other.
00:09:05.300 --> 00:09:07.060
So we talked about ice cream.
00:09:07.060 --> 00:09:08.480
Yes, this is all
premium ice cream
00:09:08.480 --> 00:09:11.435
because it comes in
the small containers,
00:09:11.435 --> 00:09:12.810
subject to different
constraints.
00:09:12.810 --> 00:09:14.226
So those constraints
can be things
00:09:14.226 --> 00:09:16.480
like, oh, x has to be
positive because we can't make
00:09:16.480 --> 00:09:18.220
negative amounts of ice cream.
00:09:18.220 --> 00:09:20.440
And maybe we've
done market research
00:09:20.440 --> 00:09:22.300
that tells us that
the market can only
00:09:22.300 --> 00:09:26.110
tolerate certain ratios of
different types of ice cream.
00:09:26.110 --> 00:09:28.600
And that may be some
set of linear equations
00:09:28.600 --> 00:09:31.570
that describe that market
research that sort of bound
00:09:31.570 --> 00:09:33.812
the upper values of
how much ice cream
00:09:33.812 --> 00:09:35.020
we can put out on the market.
00:09:35.020 --> 00:09:38.680
And then we try to choose the
optimal blend of pina colada
00:09:38.680 --> 00:09:40.230
and strawberry to sell.
00:09:43.385 --> 00:09:46.040
So those are linear programs.
00:09:46.040 --> 00:09:50.270
This is an inequality
constrained optimization.
00:09:53.660 --> 00:09:56.330
In general, we might write
these problems like this.
00:09:56.330 --> 00:09:59.900
We might say minimize f of
x subject to the constraint
00:09:59.900 --> 00:10:04.400
that c of x is 0 and
h of x is positive.
00:10:04.400 --> 00:10:06.200
So minimize it over
the values of x that
00:10:06.200 --> 00:10:08.490
satisfy these two constraints.
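NOTE
The general problem stated above can be written compactly as follows. This
sketch reads "h of x is positive" as including the boundary (greater than or
equal to zero), which is an assumption about the convention on the slide.
```latex
\min_{x} \; f(x)
\quad \text{subject to} \quad
c(x) = 0, \qquad h(x) \ge 0
```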
00:10:08.490 --> 00:10:12.060
There's an old approach that's
discussed in the literature.
00:10:12.060 --> 00:10:12.920
And it's not used.
00:10:12.920 --> 00:10:14.000
I'm going to describe
it to you, and then I
00:10:14.000 --> 00:10:16.000
want you to try to figure
out why it's not used.
00:10:16.000 --> 00:10:19.210
And it's called
the penalty method.
00:10:19.210 --> 00:10:21.250
And the penalty
method works this way.
00:10:21.250 --> 00:10:23.990
It says define a new
objective function,
00:10:23.990 --> 00:10:28.690
which is our old objective
function plus some penalty
00:10:28.690 --> 00:10:30.835
for violating the constraints.
00:10:30.835 --> 00:10:31.960
How does that penalty work?
00:10:31.960 --> 00:10:35.590
So we know that we want
values of x for which c of x
00:10:35.590 --> 00:10:38.200
is equal to 0.
00:10:38.200 --> 00:10:41.200
So if we add to our objective
function the norm of c of x--
00:10:41.200 --> 00:10:44.720
this is a positive quantity--
00:10:44.720 --> 00:10:46.700
this is a positive
quantity-- whenever
00:10:46.700 --> 00:10:48.740
x doesn't satisfy
the constraint,
00:10:48.740 --> 00:10:51.080
this positive
quantity will give us
00:10:51.080 --> 00:10:54.420
a bigger value for this
objective function f
00:10:54.420 --> 00:10:56.180
than if c of x was equal to 0.
00:10:56.180 --> 00:11:01.820
So we penalize points which
don't satisfy the constraint.
00:11:01.820 --> 00:11:04.940
And in the limit that this
penalty factor mu here
00:11:04.940 --> 00:11:10.130
goes to zero, the penalties
get large, so large
00:11:10.130 --> 00:11:13.280
that our solution
will have to prefer
00:11:13.280 --> 00:11:14.962
satisfying the constraints.
00:11:14.962 --> 00:11:16.670
There's another penalty
factor over here,
00:11:16.670 --> 00:11:19.280
which is identical to this
one but for the inequality
00:11:19.280 --> 00:11:20.360
constraint.
00:11:20.360 --> 00:11:26.770
It says take a
Heaviside step function
00:11:26.770 --> 00:11:31.400
which is equal to 1 when
the value of its argument
00:11:31.400 --> 00:11:32.960
is positive, and
it's equal to zero
00:11:32.960 --> 00:11:35.650
when the value of its
argument is negative.
00:11:35.650 --> 00:11:40.760
So whenever I violate each
of my inequality constraints,
00:11:40.760 --> 00:11:44.930
h_i of x, turn on this
Heaviside step function,
00:11:44.930 --> 00:11:46.610
make it equal to 1,
and then multiply it
00:11:46.610 --> 00:11:50.270
by the value of the constraint
squared, a positive number.
00:11:50.270 --> 00:11:52.430
So this is the inequality
constraint penalty,
00:11:52.430 --> 00:11:54.470
and this is the equality
constraint penalty.
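NOTE
A rough sketch of the penalized objective described above. The slide itself
is not in the transcript, so the 1/(2*mu) prefactor and the use of -h_i(x) as
the Heaviside argument are assumptions; penalized_objective and the test
problem below are hypothetical names chosen for illustration.
```python
import numpy as np
#
def penalized_objective(f, c, h, mu):
    """Build F(x) = f(x) + (||c(x)||^2 + sum_i H(-h_i(x)) h_i(x)^2) / (2 mu)."""
    def F(x):
        cx = np.atleast_1d(c(x))
        hx = np.atleast_1d(h(x))
        violated = hx < 0.0  # Heaviside switch: on only where h_i(x) < 0
        penalty = np.sum(cx**2) + np.sum(hx[violated]**2)
        return f(x) + penalty / (2.0 * mu)
    return F
#
# Hypothetical test problem: a bowl, one linear equality, and x >= 0.
f = lambda x: x[0]**2 + x[1]**2
c = lambda x: x[0] + x[1] - 1.0
h = lambda x: np.array([x[0], x[1]])
F = penalized_objective(f, c, h, mu=1e-3)
feasible = np.array([0.5, 0.5])
infeasible = np.array([0.0, 0.0])
print(F(feasible), F(infeasible))
```
With mu small, the infeasible point is charged a large penalty while the
feasible point is charged nothing, which is the behavior the method relies on.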
00:11:54.470 --> 00:11:57.480
People don't use this, though.
00:11:57.480 --> 00:11:58.470
It makes sense.
00:11:58.470 --> 00:12:00.540
I take the limit
that mu goes to zero.
00:12:00.540 --> 00:12:03.810
I'm going to have
to prefer solutions
00:12:03.810 --> 00:12:06.580
that satisfy these constraints.
00:12:06.580 --> 00:12:08.745
Otherwise, if I don't
satisfy these constraints,
00:12:08.745 --> 00:12:10.620
I could always move
closer to a solution that
00:12:10.620 --> 00:12:12.036
satisfies the
constraint, and I'll
00:12:12.036 --> 00:12:14.970
bring down the value of
the objective function.
00:12:14.970 --> 00:12:15.857
I'll make it lower.
00:12:15.857 --> 00:12:17.940
So I'll always prefer these
lower value solutions.
00:12:17.940 --> 00:12:21.180
But can you guys take a second
and sort of talk to each other?
00:12:21.180 --> 00:12:25.147
See if you can figure out why
one doesn't use this method.
00:12:25.147 --> 00:12:26.355
Why is this method a problem?
00:15:08.980 --> 00:15:11.780
OK, I heard the volume go
up at some point, which
00:15:11.780 --> 00:15:13.952
means either you
switched topics and felt
00:15:13.952 --> 00:15:15.410
more comfortable
talking about that
00:15:15.410 --> 00:15:17.360
than this, or maybe
you guys were coming
00:15:17.360 --> 00:15:19.970
to some conclusions, or
had some ideas about why
00:15:19.970 --> 00:15:21.359
this might be a bad idea.
00:15:21.359 --> 00:15:23.900
Do you want to volunteer some
of what you were talking about?
00:15:23.900 --> 00:15:24.878
Yeah, Hersh.
00:15:24.878 --> 00:15:28.301
AUDIENCE: Could it
be that [INAUDIBLE]?
00:15:41.040 --> 00:15:43.340
JAMES SWAN: Well, that's
an interesting idea.
00:15:43.340 --> 00:15:45.710
So yeah, if we have a
non-convex optimization problem,
00:15:45.710 --> 00:15:48.710
there could be some issues
with f of x, and maybe f
00:15:48.710 --> 00:15:50.810
of x runs away so
fast that I can never
00:15:50.810 --> 00:15:53.080
make the penalty big enough
to enforce the constraint.
00:15:53.080 --> 00:15:54.830
That's actually a
really interesting idea.
00:15:54.830 --> 00:15:57.630
And I like the idea of comparing
the magnitude of these two
00:15:57.630 --> 00:15:58.130
terms.
00:15:58.130 --> 00:15:59.600
I think that's on
the right track.
00:15:59.600 --> 00:16:01.560
Were there some
other ideas about why
00:16:01.560 --> 00:16:02.624
you might not do this?
00:16:02.624 --> 00:16:03.290
Different ideas?
00:16:03.290 --> 00:16:04.408
Yeah.
00:16:04.408 --> 00:16:05.830
AUDIENCE: [INAUDIBLE].
00:16:09.980 --> 00:16:12.480
JAMES SWAN: Well, you know,
that's an interesting idea,
00:16:12.480 --> 00:16:14.630
but actually the two
terms in the parentheses
00:16:14.630 --> 00:16:17.150
here are both positive.
00:16:17.150 --> 00:16:19.100
So they're only
going to be minimized
00:16:19.100 --> 00:16:21.760
when I satisfy the constraints.
00:16:21.760 --> 00:16:24.500
So the local minima of
the terms in parentheses
00:16:24.500 --> 00:16:30.260
sit on or within the
boundaries of the feasible set
00:16:30.260 --> 00:16:31.410
that we're looking at.
00:16:31.410 --> 00:16:32.868
So by construction,
actually, we're
00:16:32.868 --> 00:16:35.660
going to be able to satisfy
them because the local minima
00:16:35.660 --> 00:16:38.420
of these points sits
on these boundaries.
00:16:38.420 --> 00:16:43.230
These terms are minimized by
satisfying the constraints.
00:16:43.230 --> 00:16:43.731
Other ideas?
00:16:43.731 --> 00:16:44.229
Yeah.
00:16:44.229 --> 00:16:46.190
AUDIENCE: Do your iterates
have to be feasible?
00:16:46.190 --> 00:16:46.610
JAMES SWAN: What's that?
00:16:46.610 --> 00:16:48.380
AUDIENCE: Your iterates
don't have to be feasible?
00:16:48.380 --> 00:16:49.963
JAMES SWAN: Ooh,
this is a good point.
00:16:49.963 --> 00:16:52.590
The iterates-- this is an
unconstrained optimization
00:16:52.590 --> 00:16:53.090
problem.
00:16:53.090 --> 00:16:55.600
I'm just going to minimize
this objective function.
00:16:55.600 --> 00:16:57.140
It's like what
Hersh said, I can go
00:16:57.140 --> 00:16:58.764
anywhere I want in the domain.
00:16:58.764 --> 00:17:00.680
I'm going to minimize
this objective function,
00:17:00.680 --> 00:17:01.280
and then I'm going
to try to take
00:17:01.280 --> 00:17:02.570
the limit as mu goes to zero.
00:17:02.570 --> 00:17:04.194
The iterates don't
have to be feasible.
00:17:04.194 --> 00:17:06.609
Maybe I can't even evaluate
f of x if the iterates aren't
00:17:06.609 --> 00:17:07.310
feasible.
00:17:07.310 --> 00:17:08.660
That's an excellent point.
00:17:08.660 --> 00:17:10.640
That could be an issue.
00:17:10.640 --> 00:17:13.829
Anything else?
00:17:13.829 --> 00:17:17.030
Are there some other ideas?
00:17:17.030 --> 00:17:17.617
Sure.
00:17:17.617 --> 00:17:19.078
AUDIENCE: [INAUDIBLE].
00:17:28.050 --> 00:17:30.007
JAMES SWAN: I think
that's a good point.
00:17:30.007 --> 00:17:31.468
AUDIENCE: --boundary
from outside
00:17:31.468 --> 00:17:33.210
without knowing what's inside.
00:17:33.210 --> 00:17:33.960
JAMES SWAN: Sure.
00:17:33.960 --> 00:17:36.090
So you'll see, actually,
the right way to do this
00:17:36.090 --> 00:17:38.370
is to use what's called
interior point methods, which
00:17:38.370 --> 00:17:39.900
live inside the domain.
00:17:39.900 --> 00:17:41.100
This is an excellent point.
00:17:41.100 --> 00:17:43.890
There's another issue
with this that's
00:17:43.890 --> 00:17:46.391
I think actually less subtle
than some of these ideas, which
00:17:46.391 --> 00:17:47.640
they're all correct, actually.
00:17:47.640 --> 00:17:50.220
These can be problems with
this sort of penalty method.
00:17:50.220 --> 00:17:53.370
As I take the limit
that mu goes to zero,
00:17:53.370 --> 00:17:57.390
the penalty function
becomes large for all points
00:17:57.390 --> 00:17:58.320
outside the domain.
00:17:58.320 --> 00:18:02.347
They can become larger
than f for those points.
00:18:02.347 --> 00:18:03.930
And so there are
some practical issues
00:18:03.930 --> 00:18:06.210
about comparing these two
terms against each other.
00:18:06.210 --> 00:18:11.010
I may not have sufficient
accuracy, sufficient number
00:18:11.010 --> 00:18:14.950
of digits to accurately add
these two terms together.
00:18:14.950 --> 00:18:17.661
So I may prefer
to find some point
00:18:17.661 --> 00:18:20.160
that lives on the boundary of
the domain as mu goes to zero.
00:18:20.160 --> 00:18:21.534
But I can't
guarantee that it was
00:18:21.534 --> 00:18:27.600
a minimum of f on that domain,
or within that feasible set.
00:18:27.600 --> 00:18:30.360
So a lot of practical
issues that suggest this
00:18:30.360 --> 00:18:32.397
is a bad idea.
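NOTE
A small illustration of the digits problem mentioned above, assuming
standard double-precision floats. The magnitudes are hypothetical: an
objective value of order one and a penalty term that has grown to 1e17.
```python
f_value = 1.0      # hypothetical objective value, order one
penalty = 1.0e17   # hypothetical penalty term once mu is very small
total = penalty + f_value
# Doubles carry roughly 16 significant digits, so f_value is lost in the sum.
print(total == penalty)
```
Once the sum cannot see f at all, a minimizer can only locate points that
reduce the penalty and has no information about which of them minimizes f.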
00:18:32.397 --> 00:18:33.230
This is an old idea.
00:18:33.230 --> 00:18:34.650
People knew this was
bad for a long time.
00:18:34.650 --> 00:18:35.691
It seems natural, though.
00:18:35.691 --> 00:18:37.530
It seems like a good
way to transform
00:18:37.530 --> 00:18:41.154
from these constrained
optimization problems
00:18:41.154 --> 00:18:42.570
to something we
know how to solve,
00:18:42.570 --> 00:18:43.992
an unconstrained optimization.
00:18:43.992 --> 00:18:46.200
But actually, it turns out
not to be such a great way
00:18:46.200 --> 00:18:46.700
to do it.
00:18:50.340 --> 00:18:52.080
So let's talk about
separating out
00:18:52.080 --> 00:18:54.160
these two different
methods from each other,
00:18:54.160 --> 00:18:55.740
or these two different problems.
00:18:55.740 --> 00:18:57.840
Let's talk first about
equality constraints,
00:18:57.840 --> 00:19:01.550
and then we'll talk about
inequality constraints.
00:19:01.550 --> 00:19:04.040
So equality constrained
optimization problems
00:19:04.040 --> 00:19:04.850
look like this.
00:19:04.850 --> 00:19:08.280
Minimize f of x subject
to c of x equals zero.
00:19:08.280 --> 00:19:09.710
And let's make it even easier.
00:19:09.710 --> 00:19:13.910
Rather than having some vector
of equality constraints,
00:19:13.910 --> 00:19:15.722
let's just have
a single equation
00:19:15.722 --> 00:19:17.930
that we have to satisfy for
that equality constraint,
00:19:17.930 --> 00:19:19.550
like the equation for a circle.
00:19:19.550 --> 00:19:22.970
Solutions have to sit on the
circumference of a circle.
00:19:22.970 --> 00:19:26.170
So one equation that
we have to satisfy.
00:19:26.170 --> 00:19:28.640
You might ask again, what
are the necessary conditions
00:19:28.640 --> 00:19:31.480
for defining a minimum?
00:19:31.480 --> 00:19:33.230
That's what we used
when we had equality--
00:19:33.230 --> 00:19:35.270
or when we had
unconstrained optimization.
00:19:35.270 --> 00:19:38.420
First we had to define
what a minimum was,
00:19:38.420 --> 00:19:40.940
and we found that minima
were critical points, places
00:19:40.940 --> 00:19:44.720
where the gradient of the
objective function was zero.
00:19:44.720 --> 00:19:46.880
That doesn't have
to be true anymore.
00:19:46.880 --> 00:19:52.670
Now, the minimum has to live on
this boundary of some domain.
00:19:52.670 --> 00:19:56.240
It has to live in this set
of points c of x equals zero.
00:19:56.240 --> 00:19:58.100
And the gradient of
f is not necessarily
00:19:58.100 --> 00:20:00.980
zero at that minimal point.
00:20:00.980 --> 00:20:04.580
But you might guess that
Taylor expansions are the way
00:20:04.580 --> 00:20:14.180
to figure out what the
appropriate conditions
00:20:14.180 --> 00:20:15.180
for a minimum are.
00:20:15.180 --> 00:20:18.050
So let's take f of x, and
let's expand it, do a Taylor
00:20:18.050 --> 00:20:20.630
expansion in some direction, d.
00:20:20.630 --> 00:20:23.810
So we'll take a step away
from x, which is small,
00:20:23.810 --> 00:20:24.840
in some direction, d.
00:20:24.840 --> 00:20:28.550
So f of x plus d is
f of x plus g dot
00:20:28.550 --> 00:20:34.580
d, the dot product between
the gradient of f and d.
00:20:34.580 --> 00:20:38.570
And at a minimum,
either the gradient
00:20:38.570 --> 00:20:43.460
is zero or the gradient is
perpendicular to this direction
00:20:43.460 --> 00:20:45.860
we moved in, d.
00:20:45.860 --> 00:20:53.380
We know that because this
term is going to increase--
00:20:53.380 --> 00:20:55.370
well, will change
the value of f of x.
00:20:55.370 --> 00:20:57.060
It will either make
it bigger or smaller
00:20:57.060 --> 00:20:59.460
depending on whether it's
positive or negative.
00:20:59.460 --> 00:21:01.170
In either case, it
will say that this
00:21:01.170 --> 00:21:04.770
point x can't be a minimum
unless this term is exactly
00:21:04.770 --> 00:21:07.740
equal to zero in the limit
that d becomes small.
00:21:07.740 --> 00:21:09.870
So either the gradient
is zero or the gradient
00:21:09.870 --> 00:21:13.310
is orthogonal to this
direction d we stepped in.
00:21:13.310 --> 00:21:16.290
And d was arbitrary.
00:21:16.290 --> 00:21:18.710
We just said take a
step in a direction, d.
00:21:22.350 --> 00:21:24.150
Let's take our
equality constraint
00:21:24.150 --> 00:21:28.140
and do the same sort of
Taylor expansion, because we
00:21:28.140 --> 00:21:33.050
know if we're searching for
a minimum along this curve,
00:21:33.050 --> 00:21:35.310
c of x better be equal to zero.
00:21:35.310 --> 00:21:36.820
It better satisfy
the constraint.
00:21:36.820 --> 00:21:40.440
And also, c of x plus d, that
little step in the direction d,
00:21:40.440 --> 00:21:43.140
should also satisfy
the constraint.
00:21:43.140 --> 00:21:46.980
We want to study only the
feasible set of values.
00:21:46.980 --> 00:21:48.602
So actually, d
wasn't arbitrary. d
00:21:48.602 --> 00:21:50.310
had to satisfy this
constraint that, when
00:21:50.310 --> 00:21:53.700
I took this little step, c of x
plus d had to be equal to zero.
00:21:53.700 --> 00:21:55.770
So again, we'll take
now a Taylor expansion
00:21:55.770 --> 00:21:59.310
of c of x plus d, which
is c of x plus grad
00:21:59.310 --> 00:22:03.260
of c of x dotted with d.
00:22:03.260 --> 00:22:07.730
And that implies that d must be
perpendicular to the gradient
00:22:07.730 --> 00:22:10.790
of c of x, because c of
x plus d has to be zero
00:22:10.790 --> 00:22:12.200
and c of x has to be zero.
00:22:12.200 --> 00:22:16.496
So the gradient of c of x
dot d-- at leading order--
00:22:16.496 --> 00:22:17.870
has also got to
be equal to zero.
00:22:17.870 --> 00:22:21.290
So d and the gradient
of c are perpendicular,
00:22:21.290 --> 00:22:23.780
and d and g, the gradient
of f, have to be
00:22:23.780 --> 00:22:27.230
perpendicular at a minimum.
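NOTE
The two first-order expansions used in this argument can be summarized as
below, as a sketch for a single equality constraint with g denoting the
gradient of f. Higher-order terms in d are dropped.
```latex
\begin{aligned}
f(x + d) &\approx f(x) + g(x) \cdot d, \qquad g(x) = \nabla f(x), \\
c(x + d) &\approx c(x) + \nabla c(x) \cdot d.
\end{aligned}
\qquad
\text{With } c(x) = 0 \text{ and } c(x + d) = 0:\quad
\nabla c(x) \cdot d = 0,
\qquad
\text{and at a constrained minimum}\quad g(x) \cdot d = 0 .
```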
00:22:27.230 --> 00:22:29.900
That's going to define the
minimum on this equality
00:22:29.900 --> 00:22:31.960
constrained set.
00:22:31.960 --> 00:22:34.830
Does that make sense?
00:22:34.830 --> 00:22:37.260
x satisfies the
constraint, x plus d
00:22:37.260 --> 00:22:38.820
satisfies the constraint.
00:22:38.820 --> 00:22:42.060
If this is true, d has to be
perpendicular to the gradient
00:22:42.060 --> 00:22:46.430
of c, g has to be perpendicular
to d.
00:22:46.430 --> 00:22:50.050
d is, in some sense,
arbitrary still.
00:22:50.050 --> 00:22:52.080
d has to satisfy
the condition that it's
00:22:52.080 --> 00:22:53.800
perpendicular to
the gradient of c,
00:22:53.800 --> 00:22:55.949
but who knows,
there could be lots
00:22:55.949 --> 00:22:57.990
of vectors that are
perpendicular to the gradient
00:22:57.990 --> 00:22:59.880
of c.
00:22:59.880 --> 00:23:02.100
So the only generic
relationship between these two
00:23:02.100 --> 00:23:06.720
we can formulate is g must be
parallel to the gradient of c.
00:23:06.720 --> 00:23:08.970
g is perpendicular
to d, gradient
00:23:08.970 --> 00:23:10.550
of c is perpendicular to d.
00:23:10.550 --> 00:23:12.985
In the most generic
way, g and gradient of c
00:23:12.985 --> 00:23:14.360
should be parallel
to each other,
00:23:14.360 --> 00:23:16.290
because d I can
select arbitrarily
00:23:16.290 --> 00:23:21.400
from all the vectors of
the same dimension as x.
00:23:24.060 --> 00:23:25.710
If g is parallel to
the gradient of c,
00:23:25.710 --> 00:23:31.080
then I can write that g
minus some scalar multiplied
00:23:31.080 --> 00:23:33.257
by the gradient of
c is equal to zero.
00:23:33.257 --> 00:23:35.340
That's an equivalent
statement, that g is parallel
00:23:35.340 --> 00:23:37.230
to the gradient of c.
00:23:37.230 --> 00:23:41.490
So that's a condition
associated with points
00:23:41.490 --> 00:23:45.660
x that solve this equality
constrained problem.
00:23:45.660 --> 00:23:47.790
The other condition
is that point x still
00:23:47.790 --> 00:23:52.170
has to satisfy the
equality constraint.
00:23:52.170 --> 00:23:55.560
But I introduced a new
unknown, this lambda,
00:23:55.560 --> 00:23:58.170
which is called the
Lagrange multiplier.
00:23:58.170 --> 00:24:02.730
So now I have one extra unknown,
but I have one extra equation.
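Written out, the argument so far amounts to the following sketch, with g the gradient of f and d a small step that stays on the constraint:

```latex
\begin{align*}
f(x + d) &\approx f(x) + g(x) \cdot d, & g(x) \cdot d &= 0 \ \text{at a minimum},\\
c(x + d) &\approx c(x) + \nabla c(x) \cdot d, & \nabla c(x) \cdot d &= 0 \ \text{since } c(x) = c(x + d) = 0,\\
g(x) - \lambda \nabla c(x) &= 0, & c(x) &= 0.
\end{align*}
```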
00:24:06.094 --> 00:24:08.010
Let me give you a graphical
depiction of this,
00:24:08.010 --> 00:24:11.510
and then I'll write down
the formal equations again.
00:24:11.510 --> 00:24:14.660
So let's suppose
we want to minimize
00:24:14.660 --> 00:24:17.750
this parabolic function
subject to the constraint
00:24:17.750 --> 00:24:20.540
that the solution
lives on the line.
00:24:20.540 --> 00:24:22.520
So here's the contours
of the function,
00:24:22.520 --> 00:24:24.590
and the solution has
to live on this line.
00:24:28.380 --> 00:24:30.050
So I get to stand
on this line, and I
00:24:30.050 --> 00:24:34.010
get to walk and walk and walk
until I can't walk downhill
00:24:34.010 --> 00:24:36.650
anymore, and I've got to
turn and walk uphill again.
00:24:36.650 --> 00:24:40.870
And you can see the point where
I can't walk downhill anymore
00:24:40.870 --> 00:24:45.170
is the place where this
constraint is parallel
00:24:45.170 --> 00:24:49.700
to the contour, or
where the gradient
00:24:49.700 --> 00:24:52.520
of the objective
function is parallel
00:24:52.520 --> 00:24:55.712
to the gradient
of the constraint.
00:24:55.712 --> 00:24:57.170
So you can actually
find this point
00:24:57.170 --> 00:25:00.140
by imagining yourself
moving along this landscape.
00:25:00.140 --> 00:25:02.960
After I get to this point,
I start going uphill again.
00:25:05.530 --> 00:25:09.700
So that's the method of
Lagrange multipliers.
00:25:09.700 --> 00:25:11.800
Minimize f of x subject
to this constraint.
00:25:11.800 --> 00:25:15.380
The solution is
given by the point x
00:25:15.380 --> 00:25:20.510
at which the gradient of f is
parallel to the gradient of c,
00:25:20.510 --> 00:25:22.910
and at which c is equal to zero.
00:25:22.910 --> 00:25:25.970
And you solve this system
of nonlinear equations
00:25:25.970 --> 00:25:27.900
for two unknowns.
00:25:27.900 --> 00:25:31.790
One is x, and the other
is this unknown lambda.
00:25:31.790 --> 00:25:35.540
How far stretched
is the gradient
00:25:35.540 --> 00:25:39.192
in f relative to
the gradient in c?
00:25:39.192 --> 00:25:41.150
So again, we've turned
the minimization problem
00:25:41.150 --> 00:25:43.760
into a system of
nonlinear equations.
00:25:43.760 --> 00:25:46.100
In order to satisfy the
equality constraint,
00:25:46.100 --> 00:25:47.980
we've had to introduce
another unknown,
00:25:47.980 --> 00:25:50.150
the Lagrange multiplier.
00:25:50.150 --> 00:25:54.980
It turns out this solution
set, x and lambda,
00:25:54.980 --> 00:25:58.940
is a critical point of
something called the Lagrangian.
00:25:58.940 --> 00:26:03.950
It's a function f of x
minus lambda times c.
00:26:03.950 --> 00:26:08.840
It's a critical point in x
and lambda of this nonlinear
00:26:08.840 --> 00:26:10.820
function called the Lagrangian.
00:26:10.820 --> 00:26:13.010
It's not a minimum of this
function, unfortunately.
00:26:13.010 --> 00:26:17.240
It's a saddle point of the
Lagrangian, it turns out.
00:26:17.240 --> 00:26:21.671
So we're trying to find a
saddle point of the Lagrangian.
00:26:21.671 --> 00:26:23.990
Does this make sense?
00:26:23.990 --> 00:26:24.747
Yes?
00:26:24.747 --> 00:26:25.902
OK.
00:26:25.902 --> 00:26:27.360
We've got to be
careful, of course.
00:26:27.360 --> 00:26:29.630
Just like with
unconstrained optimization,
00:26:29.630 --> 00:26:33.530
we actually have to check that
our solution is a minimum.
00:26:33.530 --> 00:26:36.200
We can't take for
granted, we can't
00:26:36.200 --> 00:26:39.920
suppose that our nonlinear
solver found a minimum
00:26:39.920 --> 00:26:41.322
when it solved this equation.
00:26:41.322 --> 00:26:43.530
Other critical points can
satisfy this equation, too.
00:26:43.530 --> 00:26:47.169
So we've got to go back and try
to check robustly whether it's
00:26:47.169 --> 00:26:47.960
actually a minimum.
00:26:47.960 --> 00:26:48.918
But this is the method.
00:26:48.918 --> 00:26:53.060
Introduce an additional unknown,
the Lagrange multiplier,
00:26:53.060 --> 00:26:54.809
because you can
show geometrically
00:26:54.809 --> 00:26:56.600
that the gradient of
the objective function
00:26:56.600 --> 00:27:00.140
should be parallel to the
gradient of the constraint
00:27:00.140 --> 00:27:01.670
at the minimum.
00:27:01.670 --> 00:27:03.541
Does that make sense?
00:27:03.541 --> 00:27:05.060
Does this picture make sense?
00:27:05.060 --> 00:27:06.020
OK.
00:27:06.020 --> 00:27:08.353
So you know how to solve
systems of nonlinear equations,
00:27:08.353 --> 00:27:10.460
you know how to solve
constrained optimization
00:27:10.460 --> 00:27:10.959
problems.
00:27:15.300 --> 00:27:17.060
So here's f.
00:27:17.060 --> 00:27:19.870
Here's c.
00:27:19.870 --> 00:27:23.170
We can actually write out
what these equations are.
00:27:23.170 --> 00:27:25.750
So you can show that the
gradient of f minus lambda
00:27:25.750 --> 00:27:31.660
gradient of c, that's a vector,
2x1 minus lambda and 20x2
00:27:31.660 --> 00:27:33.070
plus lambda.
00:27:33.070 --> 00:27:34.960
And c is the equation
for this line
00:27:34.960 --> 00:27:37.480
down here, so x1
minus x2 minus 3.
00:27:37.480 --> 00:27:39.850
And that's all got
to be equal to zero.
00:27:39.850 --> 00:27:42.410
In this case, this is just a
system of linear equations.
00:27:42.410 --> 00:27:44.080
So you can actually
solve directly
00:27:44.080 --> 00:27:47.629
for x1, x2, and lambda.
00:27:47.629 --> 00:27:50.170
And it's not too difficult to
find the solution for all three
00:27:50.170 --> 00:27:52.480
of these things by hand.
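A minimal sketch of this hand calculation, written in Python rather than the MATLAB used in the course. It assembles the three linear equations quoted above (2x1 minus lambda equals zero, 20x2 plus lambda equals zero, x1 minus x2 minus 3 equals zero) and solves them with exact fractions. The helper name `solve_linear` is made up for illustration.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting (hypothetical helper)."""
    n = len(b)
    # Build the augmented matrix [A | b] with exact rational entries.
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        # Swap in the row with the largest pivot in this column.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    # Back substitution.
    z = [Fraction(0)] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - s) / M[i][i]
    return z

# Unknowns ordered as (x1, x2, lambda).
A = [[2, 0, -1],    # 2*x1 - lambda = 0
     [0, 20, 1],    # 20*x2 + lambda = 0
     [1, -1, 0]]    # x1 - x2 = 3
b = [0, 0, 3]
x1, x2, lam = solve_linear(A, b)
print(x1, x2, lam)  # 30/11 -3/11 60/11
```

The point (30/11, -3/11) sits on the line x1 minus x2 equals 3, and lambda of 60/11 is the factor by which the gradient of c is stretched to match the gradient of f there.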
00:27:52.480 --> 00:27:55.750
But in general, these
constraints can be nonlinear.
00:27:55.750 --> 00:27:58.570
The objective function
doesn't have to be quadratic.
00:27:58.570 --> 00:28:00.340
Those are the easiest
cases to look at.
00:28:00.340 --> 00:28:02.810
And the same
methodology applies.
00:28:02.810 --> 00:28:06.265
And so you should check
that you're able to do this.
00:28:06.265 --> 00:28:08.700
This is the simplest possible
equality constraint problem.
00:28:08.700 --> 00:28:09.720
You could do it by hand.
00:28:09.720 --> 00:28:11.230
You should check that you're
actually able to do it,
00:28:11.230 --> 00:28:13.840
that you understand the steps
that go into writing out
00:28:13.840 --> 00:28:14.911
these equations.
00:28:17.500 --> 00:28:19.290
Let's just take one
step forward and look
00:28:19.290 --> 00:28:21.810
at a more generic
case, one in which
00:28:21.810 --> 00:28:28.220
we have a vector valued
function that gives the equality
00:28:28.220 --> 00:28:29.240
constraints instead.
00:28:29.240 --> 00:28:31.220
So rather than one equation
we have to satisfy,
00:28:31.220 --> 00:28:31.985
there may be many.
00:28:34.610 --> 00:28:39.110
It's possible that the
feasible set doesn't
00:28:39.110 --> 00:28:41.240
have any solutions in it.
00:28:41.240 --> 00:28:42.830
It's possible that
there is no x that
00:28:42.830 --> 00:28:46.560
satisfies all of these
constraints simultaneously.
00:28:46.560 --> 00:28:49.490
That's a bad problem to have.
00:28:49.490 --> 00:28:52.100
You wouldn't like to have
that problem very much.
00:28:52.100 --> 00:28:53.850
But it's possible
that that's the case.
00:28:53.850 --> 00:28:57.600
But let's assume that there are
solutions for the time being.
00:28:57.600 --> 00:29:00.081
So there are x's that satisfy
the equality constraint.
00:29:00.081 --> 00:29:01.580
Let's see if we can
figure out again
00:29:01.580 --> 00:29:04.640
what the necessary conditions
for defining a minimum are.
00:29:04.640 --> 00:29:07.100
So same as before,
let's Taylor expand
00:29:07.100 --> 00:29:10.340
f of x going in
some direction, d.
00:29:10.340 --> 00:29:12.060
And let's make d
a nice small step
00:29:12.060 --> 00:29:14.840
so we can just treat
the f of x plus d
00:29:14.840 --> 00:29:16.820
as a linearized function.
00:29:16.820 --> 00:29:19.220
So we can see again
that g has to be
00:29:19.220 --> 00:29:21.770
perpendicular to this
direction, d, if we're
00:29:21.770 --> 00:29:22.790
going to have a minimum.
00:29:22.790 --> 00:29:24.665
Otherwise, I could step
in some direction, d,
00:29:24.665 --> 00:29:27.920
and I'll find either a
smaller value of f of x plus d
00:29:27.920 --> 00:29:30.560
or a bigger value
of f of x plus d.
00:29:30.560 --> 00:29:33.626
So g has to be
perpendicular to d.
00:29:33.626 --> 00:29:35.000
And for the equality
constraints,
00:29:35.000 --> 00:29:39.901
again, they all have to satisfy
this equality constraint
00:29:39.901 --> 00:29:40.400
up there.
00:29:40.400 --> 00:29:43.970
So c of x has to be equal
to zero, and c of x plus d
00:29:43.970 --> 00:29:45.310
also has to be equal to zero.
00:29:47.950 --> 00:29:51.710
And so if we take
a Taylor expansion
00:29:51.710 --> 00:29:55.670
of c of x plus d,
about x, you'll
00:29:55.670 --> 00:29:58.730
get c of x
plus the Jacobian
00:29:58.730 --> 00:30:02.680
of c, all the partial
derivatives of c with respect
00:30:02.680 --> 00:30:04.970
to x, multiplied by d.
00:30:07.960 --> 00:30:11.710
We know that c of x plus d
is zero, and c of x is zero,
00:30:11.710 --> 00:30:16.830
so the directions, d, belong
to what set of vectors?
00:30:16.830 --> 00:30:18.240
The null space.
00:30:18.240 --> 00:30:21.300
So these directions have
to live in the null space
00:30:21.300 --> 00:30:25.620
of the Jacobian of c.
00:30:25.620 --> 00:30:27.090
So I can't step
in any direction,
00:30:27.090 --> 00:30:29.100
I have to step in
directions that
00:30:29.100 --> 00:30:33.464
are in the null space of the Jacobian of c.
00:30:33.464 --> 00:30:36.520
g is perpendicular
to d, as well.
00:30:36.520 --> 00:30:40.180
And d belongs to
the null space of the Jacobian of c.
00:30:40.180 --> 00:30:45.940
In fact, you know that d
is perpendicular to each
00:30:45.940 --> 00:30:47.860
of the rows of the Jacobian.
00:30:47.860 --> 00:30:49.290
Right?
00:30:49.290 --> 00:30:51.120
You know that?
00:30:51.120 --> 00:30:53.800
I just do the matrix
vector product, right?
00:30:53.800 --> 00:30:56.190
And so each element of
this matrix vector product
00:30:56.190 --> 00:31:01.920
is the dot product of d with a
different row of the Jacobian.
00:31:01.920 --> 00:31:05.880
So those rows are
a set of vectors.
00:31:05.880 --> 00:31:12.660
Those rows describe the
range of J transpose,
00:31:12.660 --> 00:31:15.960
or the row space of J. Remember
we talked about the four
00:31:15.960 --> 00:31:17.460
fundamental
subspaces, and I said
00:31:17.460 --> 00:31:19.001
we almost never use
those other ones,
00:31:19.001 --> 00:31:20.910
but this is one
time when we will.
00:31:20.910 --> 00:31:25.670
So those rows belong to
the range of J transpose,
00:31:25.670 --> 00:31:29.760
or they belong to the
row space of J.
00:31:29.760 --> 00:31:33.420
I need to find a g,
a gradient, which
00:31:33.420 --> 00:31:34.980
is always perpendicular to d.
00:31:34.980 --> 00:31:39.690
And I know d is always
perpendicular to the rows of J.
00:31:39.690 --> 00:31:44.315
So I can write g as a linear
superposition of the rows of J.
00:31:44.315 --> 00:31:46.440
As long as g is a linear
superposition of the rows,
00:31:46.440 --> 00:31:50.320
it'll always be
perpendicular to d.
00:31:50.320 --> 00:31:53.080
Vectors from the null
space of a matrix
00:31:53.080 --> 00:31:57.640
are orthogonal to vectors from
the row space of that matrix,
00:31:57.640 --> 00:31:59.326
it turns out.
00:31:59.326 --> 00:32:00.950
And they're orthogonal
for this reason.
00:32:06.350 --> 00:32:11.750
So it tells us, if
Jd is zero, then
00:32:11.750 --> 00:32:13.300
d belongs to the null space.
00:32:13.300 --> 00:32:15.280
g is perpendicular to d.
00:32:15.280 --> 00:32:18.350
That means I could write g
as a linear superposition
00:32:18.350 --> 00:32:24.530
of the rows of J. So g belongs
to the range of J transpose,
00:32:24.530 --> 00:32:27.290
or it belongs to
the row space of J.
00:32:27.290 --> 00:32:29.230
Those are equivalent statements.
00:32:29.230 --> 00:32:30.980
And therefore, I should
be able to write g
00:32:30.980 --> 00:32:33.782
as a linear superposition
of the rows of J.
00:32:33.782 --> 00:32:35.240
And one way to say
that is I should
00:32:35.240 --> 00:32:37.160
be able to write
g as J transpose
00:32:37.160 --> 00:32:41.990
times some other vector lambda.
00:32:41.990 --> 00:32:43.790
That's an equivalent
way of saying
00:32:43.790 --> 00:32:48.004
that g is a linear
superposition of the rows of J.
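A small numerical sketch of this orthogonality, in Python. The Jacobian `J`, the direction `d`, and the multipliers `lam` are made-up values chosen only for illustration; they do not come from the lecture.

```python
# Hypothetical Jacobian of two constraints in three unknowns (illustration only).
J = [[1, 1, 1],
     [1, -1, 0]]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# A direction in the null space of J: every row of J dotted with d gives zero.
d = [1, 1, -2]
J_times_d = [dot(row, d) for row in J]

# Build g = J^T lambda, a linear superposition of the rows of J.
lam = [2, 3]  # arbitrary multipliers
g = [sum(lam[i] * J[i][k] for i in range(len(J))) for k in range(len(d))]

print(J_times_d)   # [0, 0]
print(g)           # [5, -1, 2]
print(dot(g, d))   # 0
```

Any other choice of `lam` gives a different g, but it stays perpendicular to d, because g dot d is lambda dotted with J d, and J d is zero.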
00:32:48.004 --> 00:32:49.420
I don't know the
values of lambda.
00:32:53.700 --> 00:32:55.590
So I introduced a
new set of unknowns,
00:32:55.590 --> 00:32:59.142
a set of Lagrange multipliers.
00:32:59.142 --> 00:33:00.600
My minimum is going
to be found when
00:33:00.600 --> 00:33:04.530
I satisfy this equation,
just like before,
00:33:04.530 --> 00:33:08.280
and when I'm able to satisfy
all of the equality constraints.
00:33:14.160 --> 00:33:19.960
How many Lagrange
multipliers do I have here?
00:33:19.960 --> 00:33:20.960
Can you figure that out?
00:33:20.960 --> 00:33:23.324
You can talk with your
neighbors if you want.
00:33:23.324 --> 00:33:24.240
Take a couple minutes.
00:33:24.240 --> 00:33:26.600
Tell me how many Lagrange
multipliers, how many elements
00:33:26.600 --> 00:33:27.710
are in this vector lambda.
00:34:19.380 --> 00:34:22.934
How many elements are in lambda?
00:34:22.934 --> 00:34:23.600
Can you tell me?
00:34:26.383 --> 00:34:26.883
Sam.
00:34:26.883 --> 00:34:29.239
AUDIENCE: Same as the number
of equality constraints.
00:34:29.239 --> 00:34:30.530
JAMES SWAN: Yes.
00:34:30.530 --> 00:34:33.469
It's the same as the number
of equality constraints.
00:34:33.469 --> 00:34:38.219
J came from the gradient of c.
00:34:38.219 --> 00:34:41.370
It's the Jacobian of c.
00:34:41.370 --> 00:34:46.290
So it has a number of columns
equal to the number of elements
00:34:46.290 --> 00:34:48.889
in x, because I'm taking
partial derivatives with respect
00:34:48.889 --> 00:34:51.300
to each element of
x, and has a number
00:34:51.300 --> 00:34:54.960
of rows equal to the
number of elements of c.
00:34:54.960 --> 00:34:58.770
So J transpose, I just
transpose those dimensions.
00:34:58.770 --> 00:35:02.700
And lambda must have the
same number of elements
00:35:02.700 --> 00:35:06.000
as c does in order to make
this product make sense.
00:35:06.000 --> 00:35:08.250
So I introduce a new
number of unknowns.
00:35:08.250 --> 00:35:12.240
It's equal to exactly the
number of equality constraints
00:35:12.240 --> 00:35:14.610
that I had, which
is good, because I'm
00:35:14.610 --> 00:35:17.220
going to make a system
of equations that
00:35:17.220 --> 00:35:20.790
says g of x minus J
transpose lambda equals 0
00:35:20.790 --> 00:35:23.460
and c of x equals 0.
00:35:23.460 --> 00:35:26.970
And the number of equations
here is the number
00:35:26.970 --> 00:35:29.290
of elements in x
for this gradient,
00:35:29.290 --> 00:35:31.874
and the number of
elements in c for c.
00:35:31.874 --> 00:35:33.540
And the number of
unknowns is the number
00:35:33.540 --> 00:35:36.902
of elements in x, and the number
of elements in c associated
00:35:36.902 --> 00:35:38.110
with the Lagrange multiplier.
00:35:38.110 --> 00:35:40.800
So I have enough equations
and unknowns to determine
00:35:40.800 --> 00:35:43.700
all of these things.
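As a compact summary of this counting argument, here is the augmented system with its dimensions. The symbols n (number of elements in x) and m (number of equality constraints) are introduced here for bookkeeping only and are not from the lecture.

```latex
\[
J_{ij}(x) = \frac{\partial c_i}{\partial x_j}, \qquad
J \in \mathbb{R}^{m \times n}, \quad x \in \mathbb{R}^{n}, \quad \lambda \in \mathbb{R}^{m}
\]
\begin{align*}
g(x) - J(x)^{T} \lambda &= 0 && (n \text{ equations}),\\
c(x) &= 0 && (m \text{ equations}).
\end{align*}
```

That is n plus m equations for the n plus m unknowns in x and lambda.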
00:35:43.700 --> 00:35:47.440
So whether I have one equality
constraint or a million
00:35:47.440 --> 00:35:50.530
equality constraints,
the problem is identical.
00:35:50.530 --> 00:35:52.660
We use the method of
Lagrange multipliers.
00:35:52.660 --> 00:35:55.780
We have to solve an
augmented system of equations
00:35:55.780 --> 00:36:02.800
for x and this projection on the
row space of J, which tells us
00:36:02.800 --> 00:36:05.710
how the gradient is
stretched or made up,
00:36:05.710 --> 00:36:08.800
composed of elements
of the row space of J.
00:36:08.800 --> 00:36:10.300
These are the
conditions associated
00:36:10.300 --> 00:36:13.690
with a minimum in our
objective function
00:36:13.690 --> 00:36:17.566
on this boundary dictated
by the equality constraint.
00:36:17.566 --> 00:36:19.690
And of course, the solution
set is a critical point
00:36:19.690 --> 00:36:25.800
of a Lagrangian, which is
f of x minus c dot lambda.
00:36:25.800 --> 00:36:28.840
And it's not a minimum of
it, it's a critical point.
00:36:28.840 --> 00:36:33.309
It's a saddle point, it turns
out, of this Lagrangian.
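A quick way to see the saddle, sketched in Python for the earlier example with f equal to x1 squared plus 10 x2 squared and c equal to x1 minus x2 minus 3. Solving the three linear equations for that example gives x1 = 30/11, x2 = -3/11, lambda = 60/11. The two test directions below are arbitrary choices made for illustration.

```python
from fractions import Fraction as F

def lagrangian(x1, x2, lam):
    # L = f - lambda * c with f = x1^2 + 10 x2^2 and c = x1 - x2 - 3.
    return x1**2 + 10 * x2**2 - lam * (x1 - x2 - 3)

# Critical point (x1, x2, lambda) of the earlier example.
z = (F(30, 11), F(-3, 11), F(60, 11))
L0 = lagrangian(*z)

def change(direction):
    """Change in the Lagrangian after moving from z along the given direction."""
    moved = tuple(a + b for a, b in zip(z, direction))
    return lagrangian(*moved) - L0

up = change((1, 0, 0))     # move in x1 only
down = change((1, 0, 2))   # move in x1 and lambda together
print(up, down)  # 1 -1
```

The Lagrangian rises in one direction and falls in another, so the critical point is neither a minimum nor a maximum of the Lagrangian, even though x is the constrained minimum of f.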
00:36:33.309 --> 00:36:35.350
So we've got to check,
did we find a saddle point
00:36:35.350 --> 00:36:39.974
or not when we find a solution
to this equation here.
00:36:39.974 --> 00:36:41.890
But it's just a system
of nonlinear equations.
00:36:41.890 --> 00:36:44.860
If we have some good initial
guess, what do we apply?
00:36:44.860 --> 00:36:48.670
Newton-Raphson, converge
right towards the solution.
00:36:48.670 --> 00:36:51.790
If we don't have a
good initial guess,
00:36:51.790 --> 00:36:55.270
we've discussed lots of methods
we could employ, like homotopy
00:36:55.270 --> 00:36:57.340
or continuation
to try to develop
00:36:57.340 --> 00:37:00.709
good initial guesses for
what the solution should be.
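A minimal Newton-Raphson sketch for the augmented system, in Python. It keeps the lecture's objective, x1 squared plus 10 x2 squared, but the nonlinear constraint (the unit circle, c = x1^2 + x2^2 - 1) and the initial guess are assumptions made here for illustration. The helper `solve3` is likewise hypothetical.

```python
def solve3(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def residual(z):
    x1, x2, lam = z
    # g - lambda * grad c = 0 and c = 0, with c = x1^2 + x2^2 - 1 (assumed).
    return [2 * x1 - lam * 2 * x1,
            20 * x2 - lam * 2 * x2,
            x1**2 + x2**2 - 1]

def jacobian(z):
    x1, x2, lam = z
    # Derivatives of the residual with respect to (x1, x2, lambda).
    return [[2 - 2 * lam, 0.0, -2 * x1],
            [0.0, 20 - 2 * lam, -2 * x2],
            [2 * x1, 2 * x2, 0.0]]

z = [0.9, 0.1, 0.8]  # assumed initial guess for (x1, x2, lambda)
for iteration in range(20):
    r = residual(z)
    if max(abs(v) for v in r) < 1e-12:
        break
    step = solve3(jacobian(z), [-v for v in r])
    z = [a + s for a, s in zip(z, step)]

print(z)
```

From this starting guess the iteration heads to x = (1, 0) with lambda = 1. A guess near (0, 1) would instead find a critical point where f is largest on the circle, which is why the solution still has to be checked afterwards.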
00:37:00.709 --> 00:37:02.601
Are there any
questions about this?
00:37:05.450 --> 00:37:07.930
Good.
00:37:07.930 --> 00:37:08.430
OK.
00:37:08.430 --> 00:37:14.370
So you go to Matlab
and you call fmincon,
00:37:14.370 --> 00:37:17.120
do a minimization problem, and
you give it some constraints.
00:37:17.120 --> 00:37:18.870
Linear constraints,
nonlinear constraints,
00:37:18.870 --> 00:37:20.502
it doesn't matter actually.
00:37:20.502 --> 00:37:22.210
The problem is the
same for both of them.
00:37:22.210 --> 00:37:25.440
It's just a little bit easier
if I have linear constraints.
00:37:25.440 --> 00:37:28.650
If this constraining function
is a linear function, then
00:37:28.650 --> 00:37:30.360
the Jacobian I know.
00:37:30.360 --> 00:37:34.594
It's the coefficient matrix
of this linear problem.
00:37:34.594 --> 00:37:36.760
Now I only have to solve
linear equations down here.
00:37:36.760 --> 00:37:39.730
So the problem is a little
bit simpler to solve.
00:37:39.730 --> 00:37:42.360
So Matlab sort of
breaks these apart
00:37:42.360 --> 00:37:45.420
so it can use different
techniques depending on which
00:37:45.420 --> 00:37:46.530
sort of problem is posed.
00:37:46.530 --> 00:37:48.030
But the solution
method is the same.
00:37:48.030 --> 00:37:49.950
It does the method of
Lagrange multipliers
00:37:49.950 --> 00:37:51.394
to find the solution.
00:37:51.394 --> 00:37:51.894
OK?
00:37:54.850 --> 00:37:58.270
Inequality constraints.
00:37:58.270 --> 00:38:02.620
So interior point
methods were mentioned.
00:38:02.620 --> 00:38:04.750
And it turns out this
is really the best
00:38:04.750 --> 00:38:08.740
way to go about solving
generic inequality constrained
00:38:08.740 --> 00:38:09.790
problems.
00:38:09.790 --> 00:38:11.470
So the problems of
the sort minimize
00:38:11.470 --> 00:38:15.340
f of x subject to
h of x is positive,
00:38:15.340 --> 00:38:17.830
or at least not negative.
00:38:17.830 --> 00:38:20.110
This is some
nonlinear inequality
00:38:20.110 --> 00:38:23.350
that describes some domain
and its boundary in which
00:38:23.350 --> 00:38:25.120
the solution has to live.
00:38:25.120 --> 00:38:29.530
And what's done is to rewrite
as an unconstrained optimization
00:38:29.530 --> 00:38:34.480
problem with a barrier
that's incorporated.
00:38:34.480 --> 00:38:36.310
This looks a lot like
the penalty method,
00:38:36.310 --> 00:38:37.750
but it's very different.
00:38:37.750 --> 00:38:39.610
And I'll explain how.
00:38:39.610 --> 00:38:43.630
So instead, we want to
minimize this f of x minus mu
00:38:43.630 --> 00:38:49.090
times the sum of log of h,
each of these constraints.
00:38:51.840 --> 00:38:58.110
If h is negative, we'll take the
log of the negative argument.
00:38:58.110 --> 00:39:00.430
That's a problem
computationally.
00:39:00.430 --> 00:39:02.670
So the best we could do
is approach the boundary
00:39:02.670 --> 00:39:05.400
where h is equal to zero.
00:39:05.400 --> 00:39:08.320
And as h goes to zero, the
log goes to minus infinity.
00:39:08.320 --> 00:39:11.490
So this term tends to blow
up because I've got a minus
00:39:11.490 --> 00:39:12.780
sign in front of it.
00:39:12.780 --> 00:39:16.920
So this is sort
of like a penalty,
00:39:16.920 --> 00:39:19.320
but it's a little different
because the factor in front
00:39:19.320 --> 00:39:23.670
I'm actually going to take
the limit as mu goes to zero.
00:39:23.670 --> 00:39:26.340
I'm going to take the limit
as this factor gets small,
00:39:26.340 --> 00:39:28.650
rather than gets big.
00:39:28.650 --> 00:39:31.170
The log will always
get big as I approach
00:39:31.170 --> 00:39:33.070
the boundary of the domain.
00:39:33.070 --> 00:39:35.061
It'll blow up.
00:39:35.061 --> 00:39:36.060
So that's not a problem.
00:39:36.060 --> 00:39:39.180
But I can take the limit that
mu gets smaller and smaller.
00:39:39.180 --> 00:39:42.690
And this quantity here
will have less and less
00:39:42.690 --> 00:39:46.920
of an impact on the shape of
this new objective function
00:39:46.920 --> 00:39:48.600
as mu gets smaller and smaller.
00:39:48.600 --> 00:39:51.250
The impact will only be
nearest the boundary.
00:39:51.250 --> 00:39:53.100
Does that make sense?
00:39:53.100 --> 00:39:55.356
So you take the limit
that mu approaches zero.
00:39:55.356 --> 00:39:57.480
It's got to approach it
from the positive side, not
00:39:57.480 --> 00:40:01.386
the negative side, so
everything behaves well.
00:40:01.386 --> 00:40:05.575
And this is called an
interior point method.
00:40:05.575 --> 00:40:07.950
So we have to determine the
minimum of this new objective
00:40:07.950 --> 00:40:10.740
function for progressively
weaker barriers.
00:40:10.740 --> 00:40:12.570
So we might start
with some value of mu,
00:40:12.570 --> 00:40:15.780
and we might reduce
mu progressively
00:40:15.780 --> 00:40:17.490
until we get mu
down small enough
00:40:17.490 --> 00:40:19.509
that we think we've
converged to a solution.
00:40:19.509 --> 00:40:20.800
So how do you do that reliably?
00:40:24.340 --> 00:40:29.140
What's the procedure one uses
to solve a problem successively
00:40:29.140 --> 00:40:30.445
for different parameter values?
00:40:32.914 --> 00:40:33.830
AUDIENCE: [INAUDIBLE].
00:40:33.830 --> 00:40:35.538
JAMES SWAN: Yeah, it's
a homotopy, right?
00:40:35.538 --> 00:40:38.120
You're just going to change
the value of this barrier
00:40:38.120 --> 00:40:39.364
parameter.
00:40:39.364 --> 00:40:40.780
And you're going
to find a minimum.
00:40:40.780 --> 00:40:42.650
And if you make a small change
in the barrier parameter,
00:40:42.650 --> 00:40:44.774
that's going to serve as
an excellent initial guess
00:40:44.774 --> 00:40:46.220
for the next value.
00:40:46.220 --> 00:40:48.770
And so you're just going
to take these small steps.
00:40:48.770 --> 00:40:51.200
And the optimization
routine is going
00:40:51.200 --> 00:40:53.180
to carry you towards
the minimum in the limit
00:40:53.180 --> 00:40:54.370
that mu goes to zero.
00:40:54.370 --> 00:40:55.610
So you do this with homotopy.
00:40:58.340 --> 00:41:01.700
Here's an example of this
sort of interior point
00:41:01.700 --> 00:41:03.200
method, a trivial example.
00:41:03.200 --> 00:41:06.180
Minimize x subject
to x being positive.
00:41:06.180 --> 00:41:09.695
So we know the solution
lives where x equals zero.
00:41:09.695 --> 00:41:13.040
But let's write this as
unconstrained optimization
00:41:13.040 --> 00:41:13.760
using a barrier.
00:41:13.760 --> 00:41:18.850
So minimize x minus
mu times log x.
00:41:18.850 --> 00:41:21.140
Here's x minus mu times log x.
00:41:21.140 --> 00:41:26.070
So out here, where x is
big, x wins over log x,
00:41:26.070 --> 00:41:28.120
so everything starts
to look linear.
00:41:28.120 --> 00:41:30.470
But as x becomes
smaller and smaller,
00:41:30.470 --> 00:41:34.010
log x gets very negative, so
minus log x gets very positive.
00:41:34.010 --> 00:41:36.210
And here's the log
creeping back up.
00:41:36.210 --> 00:41:38.206
And as I decrease mu
smaller and smaller,
00:41:38.206 --> 00:41:39.830
you can see the minimum
of this function
00:41:39.830 --> 00:41:44.840
is moving closer and
closer and closer to zero.
00:41:44.840 --> 00:41:47.240
So if I take the limit
that mu decreases
00:41:47.240 --> 00:41:49.520
from some positive
number towards zero,
00:41:49.520 --> 00:41:52.410
eventually this minimum
is going to converge
00:41:52.410 --> 00:41:55.340
to the minimum of the
inequality
00:41:55.340 --> 00:41:57.580
constrained
optimization problem.
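A sketch of this trivial example in Python. Setting the derivative of x minus mu log x to zero, 1 minus mu over x, puts the barrier minimum at x equal to mu, so it slides to zero with mu. The damped Newton iteration, the starting guess, and the list of mu values below are assumptions chosen for illustration, not the course's code.

```python
import math

def phi(x, mu):
    # Barrier objective for: minimize x subject to x >= 0.
    return x - mu * math.log(x)

def minimize_barrier(mu, x, tol=1e-10, max_iter=100):
    """Damped Newton iteration on phi(., mu), starting from a feasible x > 0."""
    for _ in range(max_iter):
        grad = 1.0 - mu / x
        if abs(grad) < tol:
            break
        step = -grad / (mu / x**2)   # Newton step: -phi' / phi''
        t = 1.0
        # Halve the step until the trial point is feasible and phi does not increase.
        while x + t * step <= 0.0 or phi(x + t * step, mu) > phi(x, mu):
            t *= 0.5
        x += t * step
    return x

x = 5.0  # feasible starting guess (assumed)
path = []
for mu in [1.0, 0.1, 0.01, 0.001]:
    x = minimize_barrier(mu, x)   # warm start from the previous solution
    path.append((mu, x))
    print(mu, x)
```

Each solve starts from the previous minimum, and the step halving keeps the iterate inside x greater than zero, where the log is defined.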
00:41:57.580 --> 00:41:58.900
Make sense?
00:41:58.900 --> 00:42:01.350
OK.
00:42:01.350 --> 00:42:01.850
OK.
00:42:01.850 --> 00:42:04.740
So we want to do this.
00:42:04.740 --> 00:42:07.130
You can use any barrier
function you want.
00:42:07.130 --> 00:42:09.650
Any thoughts on why a
logarithmic barrier is used?
00:42:18.710 --> 00:42:19.210
No.
00:42:19.210 --> 00:42:21.100
OK, that's OK.
00:42:21.100 --> 00:42:23.910
So minus log is
going to be convex.
00:42:23.910 --> 00:42:26.720
Log isn't convex, but minus
log is going to be convex.
00:42:26.720 --> 00:42:27.795
So that's good.
00:42:27.795 --> 00:42:29.920
If this function's convex,
then their combination's
00:42:29.920 --> 00:42:32.020
going to be convex,
and we'll be OK.
00:42:32.020 --> 00:42:34.300
But the gradient of the
log is easy to compute.
00:42:34.300 --> 00:42:37.690
Grad log h is 1 over h grad h.
00:42:37.690 --> 00:42:40.180
So if I know h, I know
grad h, it's easy for me
00:42:40.180 --> 00:42:42.130
to compute the
gradient of log h.
00:42:42.130 --> 00:42:46.000
We know we're going to solve
this unconstrained optimization
00:42:46.000 --> 00:42:48.690
problem where we need to take
grad of this objective function
00:42:48.690 --> 00:42:49.210
equal zero.
00:42:49.210 --> 00:42:50.987
So the calculations are easy.
00:42:50.987 --> 00:42:52.320
The log makes it easy like that.
00:42:52.320 --> 00:42:55.870
The log is also like
the most weakly singular
00:42:55.870 --> 00:42:57.520
function available to us.
00:42:57.520 --> 00:43:00.070
Out of the whole tool box of
functions we can reach for,
00:43:00.070 --> 00:43:02.835
the log has the mildest
sort of singularities.
00:43:02.835 --> 00:43:04.960
Singularities at both ends,
which is sort of funny,
00:43:04.960 --> 00:43:06.520
but the mildest sort
of singularities
00:43:06.520 --> 00:43:08.782
you have to cope with.
00:43:08.782 --> 00:43:10.990
So we want to find the
minimum of these unconstrained
00:43:10.990 --> 00:43:15.310
optimization problems where the
gradient of f minus mu sum 1
00:43:15.310 --> 00:43:18.400
over h, grad h,
is equal to zero.
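In symbols, with the sum running over the inequality constraints, the barrier problem and its stationarity condition read as follows (the name phi for the barrier objective matches the one used for the code later in the lecture):

```latex
\begin{align*}
\phi(x; \mu) &= f(x) - \mu \sum_i \log h_i(x),\\
\nabla \phi(x; \mu) &= \nabla f(x) - \mu \sum_i \frac{1}{h_i(x)} \nabla h_i(x) = 0, \qquad \mu \to 0^{+}.
\end{align*}
```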
00:43:18.400 --> 00:43:20.700
And we just do that for
progressively smaller values
00:43:20.700 --> 00:43:23.410
of mu, and we'll
converge to a solution.
00:43:23.410 --> 00:43:25.720
That's the interior
point method.
00:43:25.720 --> 00:43:31.360
You use homotopy to study a
sequence of barrier parameters,
00:43:31.360 --> 00:43:36.760
or continuation to study a
sequence of barrier parameters.
00:43:36.760 --> 00:43:41.580
You stop the homotopy or
continuation when what?
00:43:41.580 --> 00:43:42.806
How are you going to stop?
00:43:47.000 --> 00:43:49.234
I've got to make
mu small, right?
00:43:49.234 --> 00:43:51.150
I want to go towards the
limit mu equals zero.
00:43:51.150 --> 00:43:52.860
I can't actually get
to mu equals zero,
00:43:52.860 --> 00:43:54.450
I've just got to approach it.
00:43:54.450 --> 00:43:57.930
So how small do I need
to make mu before I quit?
00:43:57.930 --> 00:43:59.160
It's an interesting question.
00:43:59.160 --> 00:43:59.910
What do you think?
00:44:02.274 --> 00:44:03.440
I'll take this answer first.
00:44:03.440 --> 00:44:06.512
AUDIENCE: So it doesn't
affect the limitation.
00:44:06.512 --> 00:44:07.220
JAMES SWAN: Good.
00:44:07.220 --> 00:44:09.027
So we might look at
the solution and see
00:44:09.027 --> 00:44:10.610
is the solution
becoming less and less
00:44:10.610 --> 00:44:12.430
sensitive to the choice of mu.
00:44:12.430 --> 00:44:14.252
Did you have another suggestion?
00:44:14.252 --> 00:44:15.668
AUDIENCE: [INAUDIBLE].
00:44:17.930 --> 00:44:19.180
JAMES SWAN: Set the tolerance.
00:44:19.180 --> 00:44:20.766
Right, OK.
00:44:20.766 --> 00:44:23.484
AUDIENCE: [INAUDIBLE].
00:44:23.484 --> 00:44:24.150
JAMES SWAN: Mhm.
00:44:24.150 --> 00:44:25.050
Right, right, right, right.
00:44:25.050 --> 00:44:25.550
So you--
00:44:25.550 --> 00:44:28.842
AUDIENCE: [INAUDIBLE].
00:44:28.842 --> 00:44:29.550
JAMES SWAN: Good.
00:44:29.550 --> 00:44:31.210
So there were two
suggestions here.
00:44:31.210 --> 00:44:34.100
One is along the lines
of a step-norm criteria,
00:44:34.100 --> 00:44:36.520
like I check my
solution as I change mu,
00:44:36.520 --> 00:44:39.700
and I ask when does
my solution seem
00:44:39.700 --> 00:44:42.700
relatively insensitive to mu.
00:44:42.700 --> 00:44:45.370
When the changes in these
steps relative to mu
00:44:45.370 --> 00:44:48.010
get sufficiently
small, I might be
00:44:48.010 --> 00:44:49.810
willing to accept
these solutions
00:44:49.810 --> 00:44:53.230
as reasonable solutions for
the constrained optimization.
00:44:53.230 --> 00:44:55.180
I can also go back
and I can check
00:44:55.180 --> 00:44:58.240
sort of function norm criteria.
00:44:58.240 --> 00:45:00.650
I can take the value of
x I found as the minimum,
00:45:00.650 --> 00:45:02.920
and I can ask how
good a job does
00:45:02.920 --> 00:45:08.140
it do satisfying the
original equations.
00:45:08.140 --> 00:45:11.180
How far away am I from
satisfying the inequality
00:45:11.180 --> 00:45:11.680
constraint?
00:45:11.680 --> 00:45:14.740
How close am I to actually
minimizing the function
00:45:14.740 --> 00:45:15.625
within that domain?
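A rough sketch of both stopping checks in Python, using the trivial problem from earlier (minimize x subject to x being positive), whose barrier minimum is x equal to mu. The reduction factor of 0.1 and the tolerance are assumptions for illustration.

```python
def barrier_minimizer(mu):
    # Closed-form minimizer of x - mu*log(x): setting 1 - mu/x = 0 gives x = mu.
    return mu

mu = 1.0
x_old = barrier_minimizer(mu)
step_tol = 1e-6   # assumed tolerance on the change in the solution

while True:
    mu *= 0.1                       # weaken the barrier
    x_new = barrier_minimizer(mu)
    change = abs(x_new - x_old)     # step-norm style check across mu values
    x_old = x_new
    if change < step_tol:
        break

# Function-norm style check: distance from the known constrained minimum x = 0.
print(mu, x_new, abs(x_new - 0.0))
```

In a real problem the constrained minimum is not known in advance, so the second check would instead look at how well the inequality constraint and the optimality conditions are satisfied.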
00:45:20.245 --> 00:45:21.344
OK.
00:45:21.344 --> 00:45:22.760
So we're running
out of time here.
00:45:22.760 --> 00:45:26.300
Let me provide you
with an example.
00:45:26.300 --> 00:45:27.350
So let's minimize again--
00:45:27.350 --> 00:45:29.808
I always pick this function
because it's easy to visualize,
00:45:29.808 --> 00:45:32.770
a nice parabolic function
that opens upwards.
00:45:32.770 --> 00:45:36.340
And let's minimize it
subject to the constraint
00:45:36.340 --> 00:45:42.620
that h of x1 and x2
is equal to 1 minus--
00:45:42.620 --> 00:45:45.730
well, the equation for a circle
of radius 1, essentially.
00:45:45.730 --> 00:45:49.240
The interior of that circle.
00:45:49.240 --> 00:45:51.450
So here's the contours
of the function,
00:45:51.450 --> 00:45:53.055
and this red domain
is the constraint.
00:45:53.055 --> 00:45:54.840
And we want to know
the smallest value
00:45:54.840 --> 00:45:58.440
of f that lives in this domain.
00:45:58.440 --> 00:45:59.440
So here's a Matlab code.
00:45:59.440 --> 00:46:00.800
You can try it out.
00:46:00.800 --> 00:46:03.960
And make a function, the
objective function, f,
00:46:03.960 --> 00:46:06.650
it's x squared plus 10x--
00:46:06.650 --> 00:46:08.900
x1 squared plus 10x2 squared.
00:46:08.900 --> 00:46:10.240
Here's the gradient.
00:46:10.240 --> 00:46:13.210
Here's the Hessian.
00:46:13.210 --> 00:46:15.700
Here, I calculate h.
00:46:15.700 --> 00:46:17.080
Here's the gradient in h.
00:46:17.080 --> 00:46:19.810
Here's the Hessian in h.
00:46:19.810 --> 00:46:22.810
I've got to define a new
objective function, phi,
00:46:22.810 --> 00:46:26.470
which is f minus mu log h.
00:46:26.470 --> 00:46:29.320
This is the gradient in phi
and this is the Hessian of phi.
00:46:29.320 --> 00:46:30.760
Oh, man, what a mess.
00:46:30.760 --> 00:46:33.310
But actually, not such
a mess, because the log
00:46:33.310 --> 00:46:35.785
makes it really easy to
take these derivatives.
00:46:35.785 --> 00:46:40.810
So it's just a lot of
differential sort of calculus
00:46:40.810 --> 00:46:43.960
involved in working this out,
but this is the Hessian of phi.
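The derivatives mentioned here can be written out as follows. This is a sketch under one assumption: the transcript does not give the circle's center, so a hypothetical center $(c_1, c_2)$ is used for $h$.

```latex
\begin{align*}
f(x) &= x_1^2 + 10\,x_2^2, &
h(x) &= 1 - (x_1 - c_1)^2 - (x_2 - c_2)^2 \ge 0, \\
\nabla f &= \begin{pmatrix} 2x_1 \\ 20x_2 \end{pmatrix}, &
\nabla h &= -2\begin{pmatrix} x_1 - c_1 \\ x_2 - c_2 \end{pmatrix}, \\
\nabla\nabla f &= \begin{pmatrix} 2 & 0 \\ 0 & 20 \end{pmatrix}, &
\nabla\nabla h &= -2\,I, \\[4pt]
\phi(x) &= f(x) - \mu \ln h(x), \\
\nabla \phi &= \nabla f - \frac{\mu}{h}\,\nabla h, \\
\nabla\nabla \phi &= \nabla\nabla f - \frac{\mu}{h}\,\nabla\nabla h
  + \frac{\mu}{h^2}\,\nabla h\,\nabla h^{T}.
\end{align*}
```

The last term comes from differentiating $1/h$, which is why the log barrier keeps the calculus manageable.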
00:46:43.960 --> 00:46:46.270
And then I need
some initial guess.
00:46:46.270 --> 00:46:48.940
So I pick the
center of my circle
00:46:48.940 --> 00:46:50.737
as an initial guess
for the solution.
00:46:50.737 --> 00:46:52.570
And I'm going to loop
over values of mu that
00:46:52.570 --> 00:46:53.707
get progressively smaller.
00:46:53.707 --> 00:46:55.290
I'll just go down
to 10 to the minus 2
00:46:55.290 --> 00:46:57.719
and stop for illustration
purposes here.
00:46:57.719 --> 00:47:00.010
But really, we should be
checking the solution as we go
00:47:00.010 --> 00:47:04.360
and deciding what values
we want to stop with.
00:47:04.360 --> 00:47:06.800
And then this loop
here, what's this do?
00:47:09.880 --> 00:47:12.222
What's it do?
00:47:12.222 --> 00:47:15.030
Can you tell?
00:47:15.030 --> 00:47:16.440
AUDIENCE: Is it Newton?
00:47:16.440 --> 00:47:16.770
JAMES SWAN: What's that?
00:47:16.770 --> 00:47:17.596
AUDIENCE: Newton?
00:47:17.596 --> 00:47:19.470
JAMES SWAN: Yeah, it's
Newton-Raphson, right?
00:47:19.470 --> 00:47:25.050
x is x minus Hessian inverse
times grad phi, right?
00:47:25.050 --> 00:47:26.462
So I just do Newton-Raphson.
00:47:26.462 --> 00:47:28.170
I take my initial
guess and I loop around
00:47:28.170 --> 00:47:30.630
with Newton-Raphson, and
when this loop finishes,
00:47:30.630 --> 00:47:32.910
I reduce mu, and it'll
just use my previous guess
00:47:32.910 --> 00:47:35.580
as the initial guess for
the next value of the loop,
00:47:35.580 --> 00:47:38.187
until mu is sufficiently small.
00:47:38.187 --> 00:47:39.680
OK?
00:47:39.680 --> 00:47:41.310
Interior point method.
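A rough Python sketch of the loop just described, not the lecture's MATLAB code. Assumptions: the circle center (2, 1) is hypothetical, since the transcript does not state it; the step-halving safeguard that keeps iterates inside the circle is an addition, not something the lecture mentions; all function names are illustrative.

```python
import math

# Hypothetical circle center: the lecture does not state it, so (2, 1) is an
# assumption chosen to keep the origin outside the feasible disk.
CENTER = (2.0, 1.0)

def f(x):
    return x[0] ** 2 + 10.0 * x[1] ** 2

def h(x):
    # h(x) >= 0 describes the interior of the unit circle around CENTER.
    return 1.0 - (x[0] - CENTER[0]) ** 2 - (x[1] - CENTER[1]) ** 2

def grad_f(x):
    return [2.0 * x[0], 20.0 * x[1]]

def grad_h(x):
    return [-2.0 * (x[0] - CENTER[0]), -2.0 * (x[1] - CENTER[1])]

HESS_F = [[2.0, 0.0], [0.0, 20.0]]
HESS_H = [[-2.0, 0.0], [0.0, -2.0]]

def phi(x, mu):
    # Barrier objective: f - mu * log(h).
    return f(x) - mu * math.log(h(x))

def grad_phi(x, mu):
    gf, gh, hx = grad_f(x), grad_h(x), h(x)
    return [gf[i] - mu * gh[i] / hx for i in range(2)]

def hess_phi(x, mu):
    gh, hx = grad_h(x), h(x)
    return [[HESS_F[i][j] - mu * HESS_H[i][j] / hx + mu * gh[i] * gh[j] / hx ** 2
             for j in range(2)] for i in range(2)]

def solve_2x2(a, b):
    # Cramer's rule for the 2x2 linear system a @ s = b.
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]

def newton(x, mu, tol=1e-8, max_iter=100):
    # Newton-Raphson on grad(phi) = 0: x <- x - t * H^{-1} grad(phi).
    for _ in range(max_iter):
        g = grad_phi(x, mu)
        if math.hypot(g[0], g[1]) < tol:
            break
        step = solve_2x2(hess_phi(x, mu), g)
        phi_x = phi(x, mu)
        t, accepted = 1.0, None
        # Safeguard (an addition): halve the step until the trial point is
        # strictly feasible and phi has not increased beyond rounding noise.
        for _k in range(60):
            trial = [x[0] - t * step[0], x[1] - t * step[1]]
            if h(trial) > 0.0 and phi(trial, mu) <= phi_x + 1e-12 * abs(phi_x):
                accepted = trial
                break
            t *= 0.5
        if accepted is None:
            break
        x = accepted
    return x

def interior_point(x0, mus=(1.0, 0.1, 0.01)):
    # Continuation in mu: each solve starts from the previous solution.
    x, path = list(x0), []
    for mu in mus:
        x = newton(x, mu)
        path.append((mu, x))
    return x, path

x_star, path = interior_point(CENTER)
for mu, x in path:
    print(f"mu={mu:g}  x=({x[0]:.4f}, {x[1]:.4f})  f={f(x):.4f}  h={h(x):.2e}")
```

With this assumed center, the first full Newton step from the center would leave the circle, which is why the sketch damps the step; the lecture's own example may not need that.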
00:47:41.310 --> 00:47:43.370
Here's what that
solution path looks like.
00:47:43.370 --> 00:47:46.520
So mu started at 1, and
the barrier was here.
00:47:46.520 --> 00:47:49.370
It was close to the edge of the
circle, but not quite on it.
00:47:49.370 --> 00:47:51.200
But as I reduced mu
further and further
00:47:51.200 --> 00:47:53.030
and further, you
can see the path,
00:47:53.030 --> 00:47:54.530
the solution path,
that was followed
00:47:54.530 --> 00:47:56.930
works its way closer to
the boundary of the circle.
00:47:56.930 --> 00:47:59.027
And the minimum is
found right here.
00:47:59.027 --> 00:48:00.860
So it turns out the
minimum of this function
00:48:00.860 --> 00:48:02.720
doesn't live in the
domain, it lives
00:48:02.720 --> 00:48:04.960
on the boundary of the domain.
00:48:04.960 --> 00:48:08.830
Recall that this point
should be a point where
00:48:08.830 --> 00:48:11.290
the boundary of the
domain is parallel
00:48:11.290 --> 00:48:15.699
to the contours of the function,
since actually we didn't need
00:48:15.699 --> 00:48:16.990
the inequality constraint here.
00:48:16.990 --> 00:48:18.630
We could have used the
equality constraint.
00:48:18.630 --> 00:48:20.840
The equality constrained
problem has the same solution
00:48:20.840 --> 00:48:22.423
as the inequality
constrained problem.
00:48:22.423 --> 00:48:23.800
And look, that
actually happened.
00:48:23.800 --> 00:48:25.677
Here's the contours
of the function.
00:48:25.677 --> 00:48:27.760
The contour of the function
runs right along here,
00:48:27.760 --> 00:48:29.200
and you can see
it looks like it's
00:48:29.200 --> 00:48:31.360
going to be tangent to
the circle at this point.
00:48:31.360 --> 00:48:34.450
So the interior point
method actually solved
00:48:34.450 --> 00:48:37.450
an equality constrained problem
in addition to an inequality
00:48:37.450 --> 00:48:40.589
constrained problem, which is--
that's sort of cool that you
00:48:40.589 --> 00:48:41.380
can do it that way.
00:48:44.010 --> 00:48:46.620
How about if I want to do
a combination of equality
00:48:46.620 --> 00:48:48.135
and inequality constraints?
00:48:48.135 --> 00:48:50.010
Then what do I do?
00:48:57.285 --> 00:48:58.255
Yeah.
00:48:58.255 --> 00:49:02.150
AUDIENCE: [INAUDIBLE].
00:49:02.150 --> 00:49:03.020
JAMES SWAN: Perfect.
00:49:03.020 --> 00:49:07.280
Convert the equality
constraint into unknowns,
00:49:07.280 --> 00:49:09.909
Lagrange multipliers, instead.
00:49:09.909 --> 00:49:11.450
And then do the
interior point method
00:49:11.450 --> 00:49:13.449
on the Lagrange
multiplier problem.
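One way to write the combined problem, as a sketch only. The symbol $g(x) = 0$ for the equality constraints and the sign convention on the multipliers $\lambda$ are assumptions, not taken from the lecture.

```latex
\begin{align*}
&\min_{x}\; f(x) - \mu \ln h(x) \quad \text{subject to} \quad g(x) = 0, \\[4pt]
&\text{stationarity in the unknowns } (x, \lambda): \\
&\qquad \nabla f(x) - \frac{\mu}{h(x)}\,\nabla h(x) - \lambda\,\nabla g(x) = 0, \\
&\qquad g(x) = 0.
\end{align*}
```

For each value of $\mu$ this is a system of nonlinear equations in $(x, \lambda)$, which can be handed to Newton-Raphson and then continued to smaller $\mu$ as before.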
00:49:13.449 --> 00:49:15.740
Now you've got a combination
of equality and inequality
00:49:15.740 --> 00:49:16.460
constraints.
00:49:16.460 --> 00:49:18.885
This is exactly
what Matlab does.
00:49:18.885 --> 00:49:20.840
So it converts
equality constraints
00:49:20.840 --> 00:49:22.610
into Lagrange multipliers.
00:49:22.610 --> 00:49:24.500
Inequality constraints
it actually solves
00:49:24.500 --> 00:49:26.630
using interior point methods.
00:49:26.630 --> 00:49:28.790
Buried in that
interior point method
00:49:28.790 --> 00:49:32.450
is some form of Newton-Raphson
and steepest descent combined
00:49:32.450 --> 00:49:34.580
together, like the dogleg
method we talked about
00:49:34.580 --> 00:49:36.500
for unconstrained problems.
00:49:36.500 --> 00:49:38.120
And it's going to
do a continuation.
00:49:38.120 --> 00:49:40.470
As it reduces the
values of mu, it'll
00:49:40.470 --> 00:49:43.000
have some heuristic
for how it does that.
00:49:43.000 --> 00:49:46.010
It's going to use its previous
solutions as initial guesses
00:49:46.010 --> 00:49:47.910
for the next iteration.
00:49:47.910 --> 00:49:49.700
So these are very
complicated problems,
00:49:49.700 --> 00:49:52.310
but if you understand how to
solve systems of nonlinear
00:49:52.310 --> 00:49:55.010
equations, and you
think carefully
00:49:55.010 --> 00:49:57.350
about how to control numerical
error in your algorithm,
00:49:57.350 --> 00:49:58.808
you come to a
conclusion like this,
00:49:58.808 --> 00:50:04.100
that you can do these sorts of
Lagrange multiplier interior
00:50:04.100 --> 00:50:06.710
point methods to solve a
wide variety of problems
00:50:06.710 --> 00:50:09.820
with reasonable reliability.
00:50:09.820 --> 00:50:10.625
OK?
00:50:10.625 --> 00:50:12.156
Any more questions?
00:50:15.030 --> 00:50:16.140
No?
00:50:16.140 --> 00:50:17.200
Good.
00:50:17.200 --> 00:50:20.830
Well, thank you, and
we'll see you on Friday.