WEBVTT
00:00:00.000 --> 00:00:01.470
INTRODUCTION: The
following content
00:00:01.470 --> 00:00:04.420
is provided by MIT
OpenCourseWare under a Creative
00:00:04.420 --> 00:00:05.844
Commons license.
00:00:05.844 --> 00:00:07.510
Additional information
about our license
00:00:07.510 --> 00:00:10.550
and MIT OpenCourseWare
in general
00:00:10.550 --> 00:00:11.930
is available at ocw.mit.edu.
00:00:15.620 --> 00:00:17.230
PROFESSOR: Specific
problem, and it's
00:00:17.230 --> 00:00:22.310
a pure linear least
squares problem,
00:00:22.310 --> 00:00:24.950
but it's got two terms.
00:00:24.950 --> 00:00:30.190
So we're used to minimizing
A*u minus b squared.
00:00:30.190 --> 00:00:33.180
That gives us the least
squares solution u
00:00:33.180 --> 00:00:36.430
hat to a linear system.
00:00:36.430 --> 00:00:42.360
And usually the reason we have
to go to the least squares thing
00:00:42.360 --> 00:00:44.900
is that there's
no exact solution.
00:00:44.900 --> 00:00:50.390
Probably A has more
equations than unknowns.
00:00:50.390 --> 00:00:55.230
A is long and thin, and
there's no exact solution,
00:00:55.230 --> 00:00:58.600
so we look for the best
solution, and we call it u hat.
00:00:58.600 --> 00:00:59.280
OK.
00:00:59.280 --> 00:01:01.960
But there are a lot
of problems in which
00:01:01.960 --> 00:01:05.540
a second square appears.
00:01:05.540 --> 00:01:10.240
There's also a B*u equal d
hiding in the background.
00:01:10.240 --> 00:01:13.430
And so we really have like
two sets of equations.
00:01:17.590 --> 00:01:22.500
And we multiply that second
square by some factor
00:01:22.500 --> 00:01:25.230
alpha, and the wise
choice of alpha
00:01:25.230 --> 00:01:30.000
is usually a big
part of the problem.
00:01:30.000 --> 00:01:32.750
And I want to speak about
some of the applications
00:01:32.750 --> 00:01:34.570
of this area.
00:01:34.570 --> 00:01:41.500
So from the point of view
of the normal equations,
00:01:41.500 --> 00:01:43.790
the system that
you actually solve,
00:01:43.790 --> 00:01:46.470
you could say no problem.
00:01:46.470 --> 00:01:49.060
If we knew how to do
this, then we certainly
00:01:49.060 --> 00:01:54.280
can do both of them together,
because instead of A transpose
00:01:54.280 --> 00:01:58.950
A showing up, we'll now have
A transpose A plus alpha B
00:01:58.950 --> 00:02:05.230
transpose B. That'll be the
positive definite coefficient
00:02:05.230 --> 00:02:07.670
matrix on the left side.
00:02:07.670 --> 00:02:09.650
And then on the
right side, instead
00:02:09.650 --> 00:02:12.500
of just the usual
A transpose b, this
00:02:12.500 --> 00:02:17.010
term is also going to give
us an alpha B transpose d.
00:02:17.010 --> 00:02:24.410
All I'm saying is we don't need
any new mathematics to reach
00:02:24.410 --> 00:02:29.480
this normal equation with the
-- sort of the two-term normal
00:02:29.480 --> 00:02:30.030
equation.
00:02:30.030 --> 00:02:34.410
And another way to think of
exactly the same thing is we're
00:02:34.410 --> 00:02:42.410
looking at the least squares
problem, where the two matrices
00:02:42.410 --> 00:02:46.710
A and B both are multiplying u.
00:02:46.710 --> 00:02:51.550
And we have two bits
of data, b and d,
00:02:51.550 --> 00:02:53.570
and all we're doing
is the usual thing
00:02:53.570 --> 00:02:55.730
but with a weight in here.
00:02:55.730 --> 00:02:58.980
And the weight is the identity
matrix for the A part,
00:02:58.980 --> 00:03:01.830
and it's alpha times the
identity matrix for the B part.
00:03:01.830 --> 00:03:04.510
So this is our C, right.
00:03:04.510 --> 00:03:11.150
This is our C, just
to say that, really,
00:03:11.150 --> 00:03:17.070
the notation that we created,
the formulation we have,
00:03:17.070 --> 00:03:21.910
allows us to take this step, so
C appears here and A transpose
00:03:21.910 --> 00:03:26.600
C*b, C appears over here
too, just as always.
00:03:26.600 --> 00:03:29.580
OK.
00:03:29.580 --> 00:03:31.920
But there are
important questions.
00:03:31.920 --> 00:03:34.440
And of course, always, the
first important question
00:03:34.440 --> 00:03:38.280
in applied math is what
problem are you solving?
00:03:38.280 --> 00:03:42.810
Why have we produced
this class of problems?
00:03:42.810 --> 00:03:45.840
And I have two answers.
00:03:45.840 --> 00:03:50.320
Let me just mention
first, so we are sure what
00:03:50.320 --> 00:03:53.080
the shape of these matrices is.
00:03:53.080 --> 00:03:58.750
A, as always, has more
rows than columns.
00:03:58.750 --> 00:04:00.620
Of course, u is n by 1.
00:04:04.590 --> 00:04:09.490
It's just a column vector.
00:04:09.490 --> 00:04:14.530
But A has too many
equations, too many rows,
00:04:14.530 --> 00:04:19.250
for us to get an exact
solution; B, on the other hand,
00:04:19.250 --> 00:04:20.950
has few rows.
00:04:20.950 --> 00:04:23.230
It might even only have one row.
00:04:23.230 --> 00:04:30.820
It's very common to add on
one constraint or one term
00:04:30.820 --> 00:04:33.220
in regularizing the situation.
00:04:33.220 --> 00:04:37.300
Anyway, p is relatively small.
00:04:37.300 --> 00:04:47.960
So the total matrix A, B has
m plus p rows, and the same n
00:04:47.960 --> 00:04:51.380
columns, and we're ready to go.
00:04:51.380 --> 00:04:56.710
But the two parts are
different somehow.
00:04:56.710 --> 00:04:58.910
They come for different reasons.
00:04:58.910 --> 00:05:04.200
And now, I wrote down here
two places they come from.
00:05:04.200 --> 00:05:08.990
And these are big
applications of applied math.
00:05:08.990 --> 00:05:14.600
And one of them produces
small coefficients alpha.
00:05:14.600 --> 00:05:20.460
And what's the purpose of the
B*u minus d term in that case,
00:05:20.460 --> 00:05:22.310
with just a small alpha?
00:05:24.970 --> 00:05:26.770
The problem is that
the A transpose
00:05:26.770 --> 00:05:31.620
A part is nearly
singular or is singular.
00:05:31.620 --> 00:05:35.840
So that the usual normal
equation, without the B,
00:05:35.840 --> 00:05:42.320
would be in trouble, and this
of course happens pretty often.
00:05:42.320 --> 00:05:49.980
So the idea of regularization
is get some control
00:05:49.980 --> 00:05:56.330
of the solution by putting
in another term that
00:05:56.330 --> 00:06:01.290
keeps some control over u, and
stops it from just taking off,
00:06:01.290 --> 00:06:06.550
as happens when the
original normal equations would
00:06:06.550 --> 00:06:08.580
have a very large u hat.
00:06:08.580 --> 00:06:16.820
So we're just, like, adding
a little steady part that
00:06:16.820 --> 00:06:18.800
keeps it a bit under control.
00:06:18.800 --> 00:06:23.790
And so the A transpose A is
nearly singular in ill-posed
00:06:23.790 --> 00:06:27.800
problems, so we make them
-- it's like giving aspirin
00:06:27.800 --> 00:06:29.740
to an ill-posed problem, right?
00:06:29.740 --> 00:06:34.880
You don't fix it,
but it can operate.
00:06:34.880 --> 00:06:35.550
OK.
00:06:35.550 --> 00:06:39.880
And where do ill-posed
problems come from?
00:06:39.880 --> 00:06:46.100
And I just wanted to say that I
think the fundamental ill-posed
00:06:46.100 --> 00:06:53.470
problem in science
is: given positions --
00:06:53.470 --> 00:06:59.650
suppose we know that
a mass, let's say,
00:06:59.650 --> 00:07:03.480
is in certain positions
at certain times --
00:07:03.480 --> 00:07:06.150
find the velocity.
00:07:06.150 --> 00:07:09.820
So we often, in
applications have some way
00:07:09.820 --> 00:07:13.280
to know position, and
want to know velocity.
00:07:13.280 --> 00:07:17.090
And maybe you realize
that that problem is not
00:07:17.090 --> 00:07:22.420
well posed, because velocity
takes the derivative.
00:07:22.420 --> 00:07:24.450
And if you take the
derivative, that's
00:07:24.450 --> 00:07:28.140
not a good operator to invert.
00:07:28.140 --> 00:07:32.990
Taking the derivative
makes things very rough.
00:07:41.490 --> 00:07:47.300
All sorts of cases, we're
looking for the velocity,
00:07:47.300 --> 00:07:48.570
and we only have positions.
00:07:48.570 --> 00:07:54.440
One that I think
about is from GPS.
00:07:54.440 --> 00:08:00.220
So GPS uses space-based
satellites, as you all know,
00:08:00.220 --> 00:08:04.470
to give you very
accurate positions.
00:08:04.470 --> 00:08:06.710
And somehow, out
of those positions,
00:08:06.710 --> 00:08:09.920
you get pretty accurate
but not, of course,
00:08:09.920 --> 00:08:15.640
as accurate as the positions,
but you get decent velocities.
00:08:15.640 --> 00:08:16.530
And how?
00:08:16.530 --> 00:08:21.360
And there is an example where
you want to know the motion --
00:08:21.360 --> 00:08:24.720
of course, to ask for the
acceleration would be asking
00:08:24.720 --> 00:08:27.380
for yet another derivative.
00:08:27.380 --> 00:08:35.120
You see why the derivative
is an ill-posed thing?
00:08:35.120 --> 00:08:37.590
Let me just say
ahead of time, I'm
00:08:37.590 --> 00:08:46.660
going to make today's lecture
about direction number two,
00:08:46.660 --> 00:08:48.340
not the ill-posed problems.
00:08:48.340 --> 00:08:51.300
So I'm just, like,
throwing in some comments
00:08:51.300 --> 00:08:53.755
about the ill-posed
problem, and then
00:08:53.755 --> 00:08:55.730
I'll have a weekend
to think about those,
00:08:55.730 --> 00:08:58.330
and then next week,
I'll come back
00:08:58.330 --> 00:09:00.760
to these ill-posed problems.
00:09:00.760 --> 00:09:06.320
And specifically, they often
come from inverse problems,
00:09:06.320 --> 00:09:12.580
which are a big source of
ill-posed problems
00:09:12.580 --> 00:09:14.470
that need regularization.
00:09:14.470 --> 00:09:20.490
It's just a very large
class of equations.
00:09:20.490 --> 00:09:27.800
I mean, I was just going to say
about the derivative example.
00:09:27.800 --> 00:09:30.700
Why is that so unstable?
00:09:30.700 --> 00:09:37.340
Well, from the point of view
of finite differences,
00:09:37.340 --> 00:09:40.480
if we have positions, how
do you estimate velocities?
00:09:40.480 --> 00:09:42.990
You take a difference
quotient, right?
00:09:42.990 --> 00:09:47.900
You take the position at this
time, the position at a close
00:09:47.900 --> 00:09:51.630
by time, and you
divide by delta t.
00:09:51.630 --> 00:09:53.780
That's a reasonable start.
00:09:53.780 --> 00:09:56.640
But dividing by delta
t, that small number,
00:09:56.640 --> 00:10:00.120
is producing big numbers.
00:10:00.120 --> 00:10:05.790
Any errors in the position
are multiplied by that 1
00:10:05.790 --> 00:10:09.550
over delta t and blown up.
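A minimal sketch of that blow-up, under assumptions that are not in the lecture: the true position is taken to be sin(t), and the measurement noise is modeled as an alternating error of size eps, a stand-in for jumpy data. The function names are illustrative only.

```python
import math

def forward_difference(positions, dt):
    """Estimate velocities from equally spaced positions by difference quotients."""
    return [(positions[i + 1] - positions[i]) / dt
            for i in range(len(positions) - 1)]

def max_velocity_error(dt, eps, t_end=1.0):
    """Largest velocity error when positions sin(t) carry noise +eps, -eps, +eps, ..."""
    n = int(round(t_end / dt))
    times = [i * dt for i in range(n + 1)]
    # Assumed noise model: alternating sign, fixed amplitude eps.
    noisy = [math.sin(t) + eps * (-1) ** i for i, t in enumerate(times)]
    estimated = forward_difference(noisy, dt)
    exact = [math.cos(t) for t in times[:-1]]  # true velocity of sin(t)
    return max(abs(v - c) for v, c in zip(estimated, exact))

eps = 1e-6
errors = {dt: max_velocity_error(dt, eps) for dt in (1e-1, 1e-2, 1e-3, 1e-4)}
for dt, err in errors.items():
    print(f"dt = {dt:g}: max velocity error = {err:.2e}")
```

With this noise model the noise contributes exactly 2*eps/dt to each difference quotient, so shrinking dt first helps (the truncation error falls) and then hurts (the noise term takes over).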
00:10:09.550 --> 00:10:13.330
And similarly, in
frequency space,
00:10:13.330 --> 00:10:17.330
where the functions that we
think about are the functions
00:10:17.330 --> 00:10:22.520
like e to the i*k*t, the
derivative brings down
00:10:22.520 --> 00:10:24.650
the factor k.
00:10:24.650 --> 00:10:28.050
So high oscillations,
that's the point.
00:10:28.050 --> 00:10:31.160
Oscillatory functions
can be pretty small,
00:10:31.160 --> 00:10:34.150
but their derivative
can be enormous.
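In symbols, as a small worked line (the noise amplitude epsilon is an assumed symbol, not from the lecture):

```latex
\frac{d}{dt}\, e^{ikt} = ik\, e^{ikt},
\qquad
\left| \frac{d}{dt}\bigl( \varepsilon\, e^{ikt} \bigr) \right| = \varepsilon\,|k|
```

A component of tiny amplitude epsilon but high frequency k therefore shows up in the derivative with size epsilon times k.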
00:10:34.150 --> 00:10:38.710
So it's that oscillation
which is often associated
00:10:38.710 --> 00:10:40.940
with noise in the measurements.
00:10:40.940 --> 00:10:45.440
You know, noisy
measurements are jumpy,
00:10:45.440 --> 00:10:51.000
and when we go to take their
derivative or their finite
00:10:51.000 --> 00:10:52.980
difference, we get big answers.
00:10:52.980 --> 00:11:01.830
Anyway, for me that's the
model ill-posed problem,
00:11:01.830 --> 00:11:03.890
to find velocities.
00:11:03.890 --> 00:11:05.600
And how to do it?
00:11:05.600 --> 00:11:09.090
I mean a lot of thought
has gone into that.
00:11:09.090 --> 00:11:14.060
Let me leave it there,
and come back to it.
00:11:14.060 --> 00:11:19.340
But I say all this just to
emphasize its importance.
00:11:19.340 --> 00:11:22.950
Not that we'll completely
solve it, actually,
00:11:22.950 --> 00:11:25.730
for GPS or for any
other thing, it's
00:11:25.730 --> 00:11:29.090
just all we can do is medicate.
00:11:29.090 --> 00:11:29.810
OK.
00:11:29.810 --> 00:11:34.490
Now this is the one that
we can really solve.
00:11:34.490 --> 00:11:37.820
So this is a different
application entirely.
00:11:37.820 --> 00:11:42.180
In this application, this
second term, B*u equal d,
00:11:42.180 --> 00:11:46.060
is something important,
something that we want
00:11:46.060 --> 00:11:48.230
to enforce.
00:11:48.230 --> 00:11:51.740
It's a constraint,
you could say.
00:11:51.740 --> 00:11:55.550
And one way to enforce it
which fits this pattern
00:11:55.550 --> 00:11:58.380
is to take alpha
very large, right.
00:11:58.380 --> 00:12:02.350
When we take alpha large, we're
putting a really heavy weight
00:12:02.350 --> 00:12:07.020
on that B*u minus d squared,
and when we minimize,
00:12:07.020 --> 00:12:12.520
that weight will force B*u
to be pretty close to d.
00:12:12.520 --> 00:12:16.100
But of course, B*u equal
d doesn't determine u.
00:12:16.100 --> 00:12:18.450
Everybody's got
that picture clear?
00:12:18.450 --> 00:12:23.410
B*u equal d
has many solutions.
00:12:28.520 --> 00:12:33.160
And so the real problem that
we're trying to solve is
00:12:33.160 --> 00:12:38.010
enforce B*u equal d, but
among those solutions,
00:12:38.010 --> 00:12:42.780
pick the one that
minimizes the first square,
00:12:42.780 --> 00:12:45.010
A*u minus b squared.
00:12:45.010 --> 00:12:46.810
So you see the difference?
00:12:46.810 --> 00:12:48.360
You're trying to
enforce something
00:12:48.360 --> 00:12:52.830
that the physics or the
geometry, or whatever
00:12:52.830 --> 00:12:58.580
source says has to be true.
00:12:58.580 --> 00:13:00.180
And you can do it.
00:13:00.180 --> 00:13:04.220
And you're left with
lots of options.
00:13:04.220 --> 00:13:12.220
And then the combined problem
attempts to pick the right u.
00:13:12.220 --> 00:13:12.760
OK.
00:13:12.760 --> 00:13:16.630
So that's the
application number two
00:13:16.630 --> 00:13:18.270
that I want to
speak about today.
00:13:18.270 --> 00:13:24.100
And actually, I want to
give several ways to do it.
00:13:24.100 --> 00:13:26.830
It's a very important problem.
00:13:26.830 --> 00:13:34.580
And one way will be to
actually solve B*u equal d.
00:13:34.580 --> 00:13:38.070
Find those solutions.
00:13:38.070 --> 00:13:40.360
And you may say,
well, that's what
00:13:40.360 --> 00:13:42.000
we learned in linear
algebra, that's
00:13:42.000 --> 00:13:44.950
the very foundation
of linear algebra,
00:13:44.950 --> 00:13:48.570
is there a particular
solution, right?
00:13:51.310 --> 00:13:56.750
Every solution is of this form
particular plus null space.
00:13:56.750 --> 00:14:04.400
Maybe I'll just point to
the start of that approach.
00:14:04.400 --> 00:14:06.670
So I want to solve B*u equal d.
00:14:06.670 --> 00:14:09.470
And I'll come back
to this method
00:14:09.470 --> 00:14:12.830
after dealing with the
least squares approach.
00:14:12.830 --> 00:14:16.200
But here's really
the direct approach.
00:14:16.200 --> 00:14:19.440
That if I solve B*u equal
d, then there's a particular
00:14:19.440 --> 00:14:21.750
solution that solves it.
00:14:21.750 --> 00:14:28.250
And then you can always add on
the general solution which is,
00:14:28.250 --> 00:14:30.380
sorry add on the null
space solution --
00:14:30.380 --> 00:14:32.990
the solution of B*u equals 0.
00:14:32.990 --> 00:14:36.230
And B*u equals 0 has
lots of solutions.
00:14:36.230 --> 00:14:38.850
So we would have to find them.
00:14:38.850 --> 00:14:40.920
OK.
00:14:40.920 --> 00:14:43.550
I mean that's what 18.06
would naturally do,
00:14:43.550 --> 00:14:47.090
but actually never,
I'm ashamed to say,
00:14:47.090 --> 00:14:49.270
but I didn't do it in 18.06.
00:14:49.270 --> 00:14:56.810
I never actually said how I
would scientifically compute,
00:14:56.810 --> 00:15:00.710
in a stable way, the solutions.
00:15:00.710 --> 00:15:01.250
OK.
00:15:01.250 --> 00:15:04.400
So I think that
will be important.
00:15:04.400 --> 00:15:06.650
But that's not the
only way to do it.
00:15:06.650 --> 00:15:11.030
That's called the
null space method.
00:15:11.030 --> 00:15:14.140
And sometimes it's the
right choice, sometimes not.
00:15:14.140 --> 00:15:20.930
This would be called the
heavy weight method, right.
00:15:20.930 --> 00:15:25.590
Put on a very heavy weight
and solve a standard problem.
00:15:25.590 --> 00:15:26.500
OK.
00:15:26.500 --> 00:15:28.900
So let me follow that one up.
00:15:28.900 --> 00:15:32.070
And then there'll
be a third method.
00:15:32.070 --> 00:15:34.570
And maybe there's going to be
space on the middle blackboard
00:15:34.570 --> 00:15:35.630
for it.
00:15:35.630 --> 00:15:37.930
And what would the
third method be?
00:15:37.930 --> 00:15:43.770
That will be to use a
Lagrange multiplier.
00:15:43.770 --> 00:15:45.820
This thing is a constraint.
00:15:45.820 --> 00:15:48.510
I'll enforce it by a
Lagrange multiplier.
00:15:48.510 --> 00:15:49.050
OK.
00:15:49.050 --> 00:15:51.060
That's coming next.
00:15:51.060 --> 00:15:54.920
The way I'm enforcing it right
now is by a heavy weight.
00:15:54.920 --> 00:15:57.200
OK.
00:15:57.200 --> 00:16:01.650
One reason for the
popularity of this method
00:16:01.650 --> 00:16:05.000
is you don't have to
do any new thinking.
00:16:05.000 --> 00:16:11.070
You just create these
equations and solve them.
00:16:11.070 --> 00:16:16.380
Where the other methods maybe
ask us to think separately
00:16:16.380 --> 00:16:19.694
about the constraint.
00:16:19.694 --> 00:16:21.360
Here we don't have
to think separately,
00:16:21.360 --> 00:16:24.910
we just create this normal
equation, we solve it,
00:16:24.910 --> 00:16:29.030
and we get an answer u hat.
00:16:29.030 --> 00:16:31.420
Maybe I should call
it u hat alpha,
00:16:31.420 --> 00:16:34.430
because it depends
on the weight alpha,
00:16:34.430 --> 00:16:41.480
certainly, which we hope
is near the exact solution.
00:16:41.480 --> 00:16:46.250
The exact solution being the one
that exactly solves B*u equal
00:16:46.250 --> 00:16:47.120
d.
00:16:47.120 --> 00:16:52.070
Because u hat alpha will not
exactly solve B*u equal d.
00:16:52.070 --> 00:16:56.530
But we can find solutions
that do, and then among those,
00:16:56.530 --> 00:17:00.530
we can minimize
A*u minus b squared.
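A minimal sketch of the heavy weight method on a toy problem. The data A, b, B, d below are made up for illustration (3 equations, 2 unknowns, 1 constraint), and the helper names are hypothetical; the 2-by-2 normal equations are solved by Cramer's rule only because the example is so small.

```python
def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def solve_2x2(M, r):
    """Cramer's rule for a 2-by-2 system M u = r."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]

def heavy_weight_solution(A, b, B, d, alpha):
    """Solve (A^T A + alpha B^T B) u = A^T b + alpha B^T d for two unknowns."""
    At, Bt = transpose(A), transpose(B)
    AtA, BtB = matmul(At, A), matmul(Bt, B)
    Atb, Btd = matvec(At, b), matvec(Bt, d)
    M = [[AtA[i][j] + alpha * BtB[i][j] for j in range(2)] for i in range(2)]
    r = [Atb[i] + alpha * Btd[i] for i in range(2)]
    return solve_2x2(M, r)

# Toy data (assumed for illustration): constraint is u1 + u2 = 1.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
B = [[1.0, 1.0]]
d = [1.0]

for alpha in (1.0, 1e2, 1e4, 1e6):
    u = heavy_weight_solution(A, b, B, d, alpha)
    violation = matvec(B, u)[0] - d[0]
    print(f"alpha = {alpha:g}: u = ({u[0]:.6f}, {u[1]:.6f}), B*u - d = {violation:.2e}")
```

For this toy data the constrained minimizer, worked out by hand, is u = (0, 1). The printed u hat alpha drifts toward it as alpha grows, and B*u - d shrinks but never reaches zero exactly.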
00:17:00.530 --> 00:17:01.030
OK.
00:17:01.030 --> 00:17:06.101
So just a word about
this heavy weight method.
00:17:06.101 --> 00:17:06.600
OK.
00:17:10.260 --> 00:17:14.700
Well, first an
interesting point.
00:17:14.700 --> 00:17:19.240
A point that I think
is sort of interesting.
00:17:19.240 --> 00:17:22.650
I want to let alpha go to
infinity and see what happens,
00:17:22.650 --> 00:17:24.480
right.
00:17:24.480 --> 00:17:26.970
Everybody figures that as
alpha goes to infinity,
00:17:26.970 --> 00:17:29.490
I'm going to get
the right answer.
00:17:29.490 --> 00:17:31.150
Because as alpha
goes to infinity,
00:17:31.150 --> 00:17:34.370
it's going to more and more
enforce the constraint B*u
00:17:34.370 --> 00:17:35.550
equal d.
00:17:35.550 --> 00:17:39.510
And then, with that constraint
enforced, the other part of it
00:17:39.510 --> 00:17:43.050
will find the best
u and that's great.
00:17:43.050 --> 00:17:46.620
But let alpha go to
infinity in this equation,
00:17:46.620 --> 00:17:47.400
and what happens?
00:17:50.050 --> 00:17:53.760
So this is just like a side
comment just to say alpha,
00:17:53.760 --> 00:17:58.050
you know, taking a limit you got
to think about doing it right.
00:17:58.050 --> 00:18:01.694
Well, let's see, if I let
alpha go to infinity as it is,
00:18:01.694 --> 00:18:03.360
that'll be infinite,
that'll be infinite,
00:18:03.360 --> 00:18:06.140
and I won't know
what's going on.
00:18:06.140 --> 00:18:12.960
Let me divide by alpha before
I let alpha go to infinity.
00:18:12.960 --> 00:18:15.850
So if I just divide
everything by alpha --
00:18:15.850 --> 00:18:19.790
can I do that with
an eraser here?
00:18:19.790 --> 00:18:21.810
I'll divide by alpha.
00:18:21.810 --> 00:18:24.580
So there's a 1 over alpha here.
00:18:24.580 --> 00:18:26.590
I divide this by alpha.
00:18:26.590 --> 00:18:30.960
And this has a 1
over alpha there.
00:18:30.960 --> 00:18:33.480
And now, if I let
alpha go to infinity,
00:18:33.480 --> 00:18:36.800
I get something sensible.
00:18:36.800 --> 00:18:41.150
This goes to 0, right,
alpha going to infinity,
00:18:41.150 --> 00:18:42.420
getting bigger and bigger.
00:18:42.420 --> 00:18:43.850
This goes to 0.
00:18:43.850 --> 00:18:45.370
So what do I get in the limit?
00:18:45.370 --> 00:18:51.640
I get that this equals
this in the limit.
00:18:51.640 --> 00:18:53.260
So shall I put that up here?
00:18:53.260 --> 00:18:56.220
Well, I'll put it here, because
I don't like it, frankly.
00:18:58.930 --> 00:19:03.390
So I'll just squeeze
it in this little spot.
00:19:03.390 --> 00:19:07.350
That if I let alpha go to
infinity, so 1 over alpha
00:19:07.350 --> 00:19:08.030
goes to 0.
00:19:08.030 --> 00:19:14.810
I get B transpose B, u hat
infinity, shall I call it?
00:19:14.810 --> 00:19:16.450
Equals B transpose d.
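The division by alpha and the limit can be written out as below. This is a reconstruction from the spoken description; the subscripts on u hat are assumed notation.

```latex
\Bigl( \tfrac{1}{\alpha}\, A^{\mathsf T}A + B^{\mathsf T}B \Bigr)\, \hat u_\alpha
  = \tfrac{1}{\alpha}\, A^{\mathsf T}b + B^{\mathsf T}d
\qquad\xrightarrow{\;\alpha \to \infty\;}\qquad
B^{\mathsf T}B\, \hat u_\infty = B^{\mathsf T}d
```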
00:19:19.440 --> 00:19:21.750
And I guess what I
want to say is, from
00:19:21.750 --> 00:19:25.580
that I don't learn a whole lot.
00:19:25.580 --> 00:19:28.100
Because B transpose B
is a singular matrix,
00:19:28.100 --> 00:19:33.710
B transpose B is a
matrix of only rank p,
00:19:33.710 --> 00:19:40.350
it's very singular, right?
00:19:40.350 --> 00:19:44.760
B had this crazy
shape, short and wide.
00:19:44.760 --> 00:19:47.660
B transpose B will
be tall, B transpose
00:19:47.660 --> 00:19:52.980
B will be a large matrix,
but its rank will only be p.
00:19:52.980 --> 00:19:55.230
It's an n by n matrix
of rank p, and it's
00:19:55.230 --> 00:19:58.690
singular and who knows
what's going on there.
00:20:01.350 --> 00:20:04.370
That little side
issue was simply
00:20:04.370 --> 00:20:08.740
to say that you
can't just let alpha
00:20:08.740 --> 00:20:11.530
go to infinity in the
central equation there,
00:20:11.530 --> 00:20:13.570
and expect to see
what's happening.
00:20:13.570 --> 00:20:14.070
OK.
00:20:14.070 --> 00:20:17.870
So somehow there's
more to it than that.
00:20:17.870 --> 00:20:25.420
So let me put alpha back where
it belongs, and think again.
00:20:25.420 --> 00:20:27.970
OK.
00:20:27.970 --> 00:20:35.340
And I guess by thinking
again, I might as well
00:20:35.340 --> 00:20:43.190
think in terms of this
way of writing it.
00:20:43.190 --> 00:20:45.760
Because I recognize this, right?
00:20:45.760 --> 00:20:49.100
This is exactly the framework
that we've developed.
00:20:55.280 --> 00:20:58.760
So this is the least
squares problem.
00:20:58.760 --> 00:21:01.050
I just want to write
down the saddle point
00:21:01.050 --> 00:21:08.420
matrix that goes with this
least squares problem.
00:21:08.420 --> 00:21:10.230
What is the saddle point matrix?
00:21:10.230 --> 00:21:11.610
Do you remember?
00:21:11.610 --> 00:21:21.650
The saddle point matrix S
is -- it has, up here --
00:21:21.650 --> 00:21:26.980
so now I've got
an A and a B here.
00:21:26.980 --> 00:21:31.070
So it's going to be
a little bit larger.
00:21:31.070 --> 00:21:34.320
Then I have my usual zero block.
00:21:34.320 --> 00:21:38.800
And I have my usual A
transpose B transpose block.
00:21:38.800 --> 00:21:41.520
And what block goes there?
00:21:41.520 --> 00:21:43.280
That's the C inverse, right.
00:21:47.690 --> 00:21:53.426
It's our usual C inverse,
A, A transpose, 0
00:21:53.426 --> 00:21:54.800
that we're totally
accustomed to.
00:21:54.800 --> 00:22:03.050
But now A has grown
into A, B; 0 is still 0;
00:22:03.050 --> 00:22:05.890
the transpose is
still the transpose;
00:22:05.890 --> 00:22:09.240
and up here is C inverse,
and since C was this,
00:22:09.240 --> 00:22:14.050
C inverse will be the identity,
and the identity over alpha.
00:22:14.050 --> 00:22:15.590
OK.
00:22:15.590 --> 00:22:18.330
So that's my S_alpha,
you could say.
00:22:21.040 --> 00:22:32.020
And now I'm prepared to let --
so my equation is S_alpha --
00:22:32.020 --> 00:22:40.110
written as a block equation,
what are the pieces of it?
00:22:40.110 --> 00:22:47.670
u is the guy that I'm
looking for, the u hat alpha.
00:22:47.670 --> 00:22:50.880
And there was a Lagrange
multiplier that came in.
00:22:50.880 --> 00:22:54.440
You remember, that's how
we got to a block form
00:22:54.440 --> 00:22:56.080
from a scalar form.
00:22:56.080 --> 00:23:00.540
And I guess I usually call it
w, so I'll stay with w for here.
00:23:00.540 --> 00:23:02.170
OK.
00:23:02.170 --> 00:23:06.920
So that's what multiplies w, u.
00:23:06.920 --> 00:23:11.410
And it's what
gives -- let's see.
00:23:11.410 --> 00:23:19.590
I think it gives a b, and
it gives a d from this A*u
00:23:19.590 --> 00:23:23.004
and B*u, and I think
here if gives a 0,
00:23:23.004 --> 00:23:24.560
because we didn't have any.
00:23:24.560 --> 00:23:25.060
OK.
00:23:27.980 --> 00:23:29.450
What am I doing here?
00:23:29.450 --> 00:23:31.690
I'm just writing
the problem in a way
00:23:31.690 --> 00:23:34.740
where I can let alpha go
to infinity, and see the limit.
00:23:34.740 --> 00:23:43.260
So let alpha go to infinity,
this is the heavy weight part.
00:23:43.260 --> 00:23:44.520
So this will go to 0.
00:23:44.520 --> 00:23:52.320
So this approaches the
S_infinity, w_infinity,
00:23:52.320 --> 00:23:59.070
we could call it, u
hat infinity is now --
00:23:59.070 --> 00:24:03.400
well you see what the limit
is, that's 0 in that block.
00:24:03.400 --> 00:24:08.070
This is A, this is B, this is A
transpose, this is B transpose,
00:24:08.070 --> 00:24:11.220
this is our usual zero
block, multiplying
00:24:11.220 --> 00:24:18.120
our same w_infinity,
u hat infinity,
00:24:18.120 --> 00:24:23.730
equaling our same b, d, and 0.
00:24:23.730 --> 00:24:25.250
This is the limiting equation.
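As a sketch of the two block systems just described: the lecture uses a single multiplier w, and splitting it into w_1 (for the A rows) and w_2 (for the B rows) is an assumption made here to show the block sizes.

```latex
S_\alpha
\begin{bmatrix} w_1 \\ w_2 \\ \hat u_\alpha \end{bmatrix}
=
\begin{bmatrix}
  I & 0 & A \\
  0 & \tfrac{1}{\alpha} I & B \\
  A^{\mathsf T} & B^{\mathsf T} & 0
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \hat u_\alpha \end{bmatrix}
=
\begin{bmatrix} b \\ d \\ 0 \end{bmatrix}

% Limit alpha -> infinity: the (1/alpha) I block becomes 0.
S_\infty
\begin{bmatrix} w_1 \\ w_2 \\ \hat u_\infty \end{bmatrix}
=
\begin{bmatrix}
  I & 0 & A \\
  0 & 0 & B \\
  A^{\mathsf T} & B^{\mathsf T} & 0
\end{bmatrix}
\begin{bmatrix} w_1 \\ w_2 \\ \hat u_\infty \end{bmatrix}
=
\begin{bmatrix} b \\ d \\ 0 \end{bmatrix}
```

Eliminating w_1 and w_2 from the first system recovers the two-term normal equations with A transpose A plus alpha B transpose B. In the second system the middle block row is exactly the constraint B*u equal d.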
00:24:30.590 --> 00:24:31.970
And it's great.
00:24:35.230 --> 00:24:40.230
This is the equation that
determines the limit as alpha
00:24:40.230 --> 00:24:44.220
goes to infinity, that
determines the best u.
00:24:44.220 --> 00:24:48.660
This is the problem that
we really want to solve.
00:24:48.660 --> 00:24:51.830
Maybe that's what I should say.
00:24:51.830 --> 00:24:56.380
Do you see the constraint
B*u equal d in here from this
00:24:56.380 --> 00:24:58.380
middle block row?
00:24:58.380 --> 00:25:04.140
That says 0, 0, B u hat is d.
00:25:04.140 --> 00:25:08.290
So we've introduced
the constraint.
00:25:08.290 --> 00:25:13.330
The first part is w_infinity
with an A*u_infinity,
00:25:13.330 --> 00:25:18.790
that's the usual error term,
the thing that we probably
00:25:18.790 --> 00:25:20.000
can't make 0.
00:25:20.000 --> 00:25:23.670
And then this is the
usual Lagrange multiplier
00:25:23.670 --> 00:25:25.450
term from there.
00:25:29.120 --> 00:25:34.360
So I've spoken pretty quickly
here, and let me just conclude.
00:25:34.360 --> 00:25:39.150
This is the limit equation,
is the correct limit equation.
00:25:39.150 --> 00:25:40.880
This is the limit
equation that we want
00:25:40.880 --> 00:25:44.780
to solve one way or another.
00:25:44.780 --> 00:25:51.120
And taking alpha large is one
way to get near the answer,
00:25:51.120 --> 00:25:53.250
but we'll look at
other ways now.
00:25:53.250 --> 00:26:00.840
So these are really the
correct equations to solve.
00:26:05.050 --> 00:26:12.740
Going the saddle point
Lagrange multiplier route.
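A minimal sketch of solving the limiting saddle point system directly, on assumed toy data (3 equations, 2 unknowns, 1 constraint). The helper names are hypothetical, and the small Gaussian elimination routine assumes the assembled matrix is nonsingular; a production code would use a library solver.

```python
def solve(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for k in range(n):
        pivot = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[pivot] = aug[pivot], aug[k]
        for i in range(k + 1, n):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= factor * aug[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def saddle_point_system(A, b, B, d):
    """Assemble [[I, 0, A], [0, 0, B], [A^T, B^T, 0]] and right side (b, d, 0)."""
    m, p, n = len(A), len(B), len(A[0])
    size = m + p + n
    S = [[0.0] * size for _ in range(size)]
    for i in range(m):
        S[i][i] = 1.0                      # identity block
        for j in range(n):
            S[i][m + p + j] = A[i][j]      # A block
            S[m + p + j][i] = A[i][j]      # A^T block
    for i in range(p):
        for j in range(n):
            S[m + i][m + p + j] = B[i][j]  # B block
            S[m + p + j][m + i] = B[i][j]  # B^T block
    rhs = list(b) + list(d) + [0.0] * n
    return S, rhs

# Toy data (assumed for illustration): constraint is u1 + u2 = 1.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
B = [[1.0, 1.0]]
d = [1.0]

S, rhs = saddle_point_system(A, b, B, d)
x = solve(S, rhs)
w, u = x[:4], x[4:]
print("multipliers w =", w)
print("u =", u, " B*u =", sum(B[0][j] * u[j] for j in range(2)))
```

Here the constraint holds to rounding error, because it is one of the equations being solved rather than a heavily weighted penalty.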
00:26:12.740 --> 00:26:14.910
OK.
00:26:14.910 --> 00:26:19.360
So let me summarize
what I've done so far.
00:26:22.340 --> 00:26:30.060
My problem is when B*u equal
d is a constraint that I would
00:26:30.060 --> 00:26:35.040
like to satisfy, and one way
to do it is to take alpha,
00:26:35.040 --> 00:26:38.770
you know, pretty near the
largest number that the machine
00:26:38.770 --> 00:26:43.750
will hold, say 10 to the 15.
00:26:43.750 --> 00:26:46.500
Put a really heavy
weight on this.
00:26:46.500 --> 00:26:49.920
But of course, when you
let alpha be 10 to the 15,
00:26:49.920 --> 00:26:54.470
you can see that there's like
some possible problems here.
00:26:54.470 --> 00:26:56.860
When you let alpha have
an enormous weight,
00:26:56.860 --> 00:26:59.900
you're really
tilting this matrix
00:26:59.900 --> 00:27:03.230
so strongly, you know, you
couldn't let it be 10 to the 20
00:27:03.230 --> 00:27:07.610
in single precision or you'd
wipe out A transpose A.
00:27:07.610 --> 00:27:09.570
So it's a balance here.
00:27:09.570 --> 00:27:14.440
So I guess probably a lot
of numerical analysts
00:27:14.440 --> 00:27:18.440
would say wrong way to
do it, the right way
00:27:18.440 --> 00:27:24.590
is solve this equation, or
else do it this other way.
00:27:24.590 --> 00:27:30.590
But a lot of people with
codes say OK, you know,
00:27:30.590 --> 00:27:32.090
you're going to be
a nervous Nellie,
00:27:32.090 --> 00:27:34.220
I'm just going to use my code.
00:27:34.220 --> 00:27:37.920
And that's quite normal,
quite human response.
00:27:37.920 --> 00:27:40.640
OK.
00:27:40.640 --> 00:27:46.400
And this will
frequently succeed.
00:27:46.400 --> 00:27:46.900
OK.
00:27:46.900 --> 00:27:50.640
So that's one method to
do it, not the method that
00:27:50.640 --> 00:27:54.690
the professionals in numerical
analysis -- maybe I'm thinking,
00:27:54.690 --> 00:27:58.060
for example, the book
by Golub and Van Loan,
00:27:58.060 --> 00:28:03.760
if you know that book, that
would discuss this problem.
00:28:03.760 --> 00:28:09.170
And it would actually
discuss this third method,
00:28:09.170 --> 00:28:12.280
this null space
method of solving it.
00:28:12.280 --> 00:28:13.510
OK.
00:28:13.510 --> 00:28:16.570
Maybe I'll go to that
null space method.
00:28:16.570 --> 00:28:20.450
So this was one way.
00:28:20.450 --> 00:28:25.030
Another way is
solve B*u equal d.
00:28:29.470 --> 00:28:32.670
And remember again, we
only have p equations,
00:28:32.670 --> 00:28:35.275
we have n unknowns,
so there's going
00:28:35.275 --> 00:28:38.740
to be freedom in the solution.
00:28:38.740 --> 00:28:43.210
So we have to identify
a particular solution,
00:28:43.210 --> 00:28:45.910
there's a lot of freedom in
that particular solution,
00:28:45.910 --> 00:28:51.350
and then we can add to it --
this null space is going to be
00:28:51.350 --> 00:29:02.120
n minus p dimensions, n minus p
degrees of freedom in the null
00:29:02.120 --> 00:29:02.960
space.
00:29:02.960 --> 00:29:06.190
That's the dimension of
the null space, n minus p.
00:29:06.190 --> 00:29:08.590
I'm assuming that
B has full rank p,
00:29:08.590 --> 00:29:12.100
but p is a small
number compared to n.
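One way to write the null space approach in symbols is sketched below. The names u_p, N, and z are assumptions introduced here: u_p is any particular solution, and the columns of N are taken to span the null space of B.

```latex
u = u_p + N z,
\qquad B u_p = d,
\qquad B N = 0,
\qquad N \in \mathbb{R}^{\,n \times (n-p)}

% Substituting into the first square leaves an unconstrained problem in z:
\min_{z \in \mathbb{R}^{\,n-p}} \; \bigl\| A N z - (b - A u_p) \bigr\|^2
```

Every such u satisfies B*u equal d exactly, so only the n minus p free parameters in z remain to be chosen by ordinary least squares.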
00:29:12.100 --> 00:29:12.680
OK.
00:29:12.680 --> 00:29:14.860
So how do you find a
particular solution?
00:29:14.860 --> 00:29:16.680
How do you find the
null space solution?
00:29:16.680 --> 00:29:23.930
As I said, that's what I
should be explaining in 18.06.
00:29:23.930 --> 00:29:27.560
And of course, we do it in
18.06, but we do it with a 3
00:29:27.560 --> 00:29:33.290
by 3 matrix, and we practically,
you know, we do it by hand,
00:29:33.290 --> 00:29:36.190
where here we're talking about
matrices of order thousands
00:29:36.190 --> 00:29:39.860
or millions, we don't
do those by hand.
00:29:39.860 --> 00:29:44.450
And we better not do
it in an unstable way.
00:29:44.450 --> 00:29:48.940
So the question is what's
a good way to do it?
00:29:48.940 --> 00:29:53.720
And really, the heart of
modern numerical analysis
00:29:53.720 --> 00:30:00.110
is orthogonalize stuff,
get orthogonal vectors.
00:30:00.110 --> 00:30:03.190
Because if you have
orthogonal vectors,
00:30:03.190 --> 00:30:05.500
they don't get out of scale.
00:30:05.500 --> 00:30:10.950
The numbers involved
don't become unstable.
00:30:10.950 --> 00:30:14.585
And the standard
orthogonalization process
00:30:14.585 --> 00:30:17.330
is Gram-Schmidt,
that's right, those
00:30:17.330 --> 00:30:18.760
are the words we all think of.
00:30:18.760 --> 00:30:22.470
If I have a bunch of vectors,
I have to make them orthogonal,
00:30:22.470 --> 00:30:28.140
I want to make them
orthogonal, then I use -- well,
00:30:28.140 --> 00:30:30.900
Gram-Schmidt is
what we think of.
00:30:30.900 --> 00:30:36.830
But actually, MATLAB
doesn't use Gram-Schmidt,
00:30:36.830 --> 00:30:40.490
doesn't use the
usual Gram-Schmidt,
00:30:40.490 --> 00:30:44.480
as Gram and Schmidt
thought of it.
00:30:44.480 --> 00:30:49.520
MATLAB goes a different
route to the same conclusion.
00:30:49.520 --> 00:30:54.560
So let me just remind you
what Gram-Schmidt produced.
00:30:58.420 --> 00:31:05.890
And let me put in the name
of the numerical analyst
00:31:05.890 --> 00:31:14.390
long after Gram and
Schmidt, it's Householder,
00:31:14.390 --> 00:31:20.790
you know, the guy from
Tennessee with good ideas.
00:31:20.790 --> 00:31:24.590
So he had another way
to the same answer,
00:31:24.590 --> 00:31:26.970
which is this factorization.
00:31:26.970 --> 00:31:34.200
So we take our matrix,
often it's A in 18.06,
00:31:34.200 --> 00:31:40.150
and we factor it, we want to
orthogonalize its columns.
00:31:40.150 --> 00:31:43.060
So the columns of A
get orthogonalized
00:31:43.060 --> 00:31:52.740
into the columns of Q. So this
has the orthogonal columns.
00:31:52.740 --> 00:31:54.580
And then, of course,
there's some connection
00:31:54.580 --> 00:31:57.730
between the original columns
and the orthogonal columns,
00:31:57.730 --> 00:32:01.320
and that connection is
by a triangular matrix
00:32:01.320 --> 00:32:03.020
R, upper triangular.
00:32:08.050 --> 00:32:12.720
I don't know if you
remember that from 18.06.
00:32:12.720 --> 00:32:19.140
What I typically do is I explain
Gram-Schmidt as they knew it,
00:32:19.140 --> 00:32:25.820
and then at the last
minute I pull Q and R out
00:32:25.820 --> 00:32:29.620
as a way to express
the result. OK.
00:32:29.620 --> 00:32:32.240
So it's the result
we want, and not
00:32:32.240 --> 00:32:35.780
the particular Gram-Schmidt
way to get there,
00:32:35.780 --> 00:32:38.470
and Householder produces
a better way to get there.
00:32:38.470 --> 00:32:40.160
OK.
00:32:40.160 --> 00:32:46.290
But the main point is that if a
matrix has independent columns,
00:32:46.290 --> 00:32:52.170
or even if it hasn't, but if
it has independent columns
00:32:52.170 --> 00:32:58.270
we know everything about it,
that we can orthogonalize
00:32:58.270 --> 00:32:59.550
those columns.
00:32:59.550 --> 00:33:04.840
In fact, we can -- here's
what I'm leading to, this B
00:33:04.840 --> 00:33:11.440
transpose I'm remembering has
this shape because B had that
00:33:11.440 --> 00:33:18.470
shape, so it's B transpose that
I'm going to do Gram-Schmidt,
00:33:18.470 --> 00:33:30.510
Householder, use -- The command
in MATLAB is [Q, R] equals,
00:33:30.510 --> 00:33:36.040
with Gram-Schmidt we could
have used the letters G and S,
00:33:36.040 --> 00:33:40.140
but since we don't use
their actual method anymore,
00:33:40.140 --> 00:33:44.670
we could use the letter HH
for Householder or something.
00:33:44.670 --> 00:33:50.680
But it's qr of, in this case,
B transpose is what we want.
00:33:50.680 --> 00:33:51.800
OK.
00:33:51.800 --> 00:33:59.440
So what that very frequently
used command in MATLAB
00:33:59.440 --> 00:34:09.800
produces is Q and R. And it
produces a square matrix Q,
00:34:09.800 --> 00:34:14.680
where these columns, the
columns of the first part,
00:34:14.680 --> 00:34:20.490
Q_1 transpose,
are orthogonalized
00:34:20.490 --> 00:34:22.890
versions of these columns.
00:34:22.890 --> 00:34:28.060
And the R just tells us the
connection between them.
00:34:28.060 --> 00:34:32.950
Then it also produces, and
this is handy as you'll see,
00:34:32.950 --> 00:34:39.390
the algorithm also produces
n minus p more columns, that
00:34:39.390 --> 00:34:42.060
are orthogonal to these guys.
00:34:42.060 --> 00:34:45.100
So it produces a complete
orthonormal basis,
00:34:45.100 --> 00:34:48.230
a complete set of n
columns altogether.
00:34:48.230 --> 00:34:53.540
Q_1 transpose has the
columns that really are
00:34:53.540 --> 00:34:54.920
associated with these problems.
00:34:54.920 --> 00:34:58.530
And these are going to be
associated with the null space.
00:35:02.610 --> 00:35:06.420
So out of this, I see
that actually B transpose
00:35:06.420 --> 00:35:13.480
is Q_1 transpose
R. So you can say
00:35:13.480 --> 00:35:18.330
this is the reduced
factorization with only p
00:35:18.330 --> 00:35:23.620
columns, and this
is the full picture
00:35:23.620 --> 00:35:26.810
with the other n minus p
columns that are orthogonal.
00:35:26.810 --> 00:35:29.660
And the reason that's
handy is they tell us
00:35:29.660 --> 00:35:31.950
about the null space.
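A minimal NumPy sketch of this factorization, in place of the MATLAB qr command. The sizes n = 6 and p = 2, the random B, and the variable names Q1t and Q2t are illustrative assumptions, not from the lecture. It checks that B transpose equals Q_1 transpose times R, and that the extra n minus p columns are orthogonal to the rows of B:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 2                       # illustrative sizes: n unknowns, p constraints
B = rng.standard_normal((p, n))   # short, wide matrix, assumed to have full rank p

# Full ("complete") QR of B transpose: Q is n by n, R_full is n by p.
Q, R_full = np.linalg.qr(B.T, mode="complete")

Q1t = Q[:, :p]      # first p columns: the lecture's Q_1 transpose
Q2t = Q[:, p:]      # remaining n - p columns: the lecture's Q_2 transpose
R = R_full[:p, :]   # p by p upper triangular block; the rows below it are zero

# Reduced factorization: B transpose = Q_1 transpose * R.
reduced_ok = np.allclose(B.T, Q1t @ R)
# The extra columns are orthogonal to the rows of B, so B * Q_2 transpose = 0.
nullspace_ok = np.allclose(B @ Q2t, 0.0)
print(reduced_ok, nullspace_ok)
```

Asking for mode="complete" is what returns all n columns; the default reduced mode would return only the first p.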
00:35:31.950 --> 00:35:38.020
So now I want to identify out
of this a particular solution
00:35:38.020 --> 00:35:39.960
and the general
null space solution.
00:35:39.960 --> 00:35:41.780
OK.
00:35:41.780 --> 00:35:43.470
So what are those?
00:35:43.470 --> 00:35:48.450
So particular solution is
going to use this part.
00:35:50.980 --> 00:35:57.160
So, let's see, I want
a particular solution.
00:35:57.160 --> 00:36:05.280
So B, transposing that
is R transpose Q_1.
00:36:05.280 --> 00:36:07.430
OK.
00:36:07.430 --> 00:36:15.490
So now I'm prepared to solve
-- step one is the particular
00:36:15.490 --> 00:36:16.230
solution.
00:36:16.230 --> 00:36:21.500
I want to get
B*u_particular equal d.
00:36:21.500 --> 00:36:23.350
OK.
00:36:23.350 --> 00:36:27.700
But now I have B
is nicely factored.
00:36:27.700 --> 00:36:37.010
So this is R transpose
Q_1 u particular equal d.
00:36:40.370 --> 00:36:46.570
So now comes the computation
the code has to do.
00:36:46.570 --> 00:36:54.370
It has to invert that to
get Q_1 times u particular
00:36:54.370 --> 00:36:58.590
equals R inverse transpose d.
00:36:58.590 --> 00:37:00.680
So it had to solve
a triangular system,
00:37:00.680 --> 00:37:02.930
but of course, a triangular
system is quick to solve.
00:37:02.930 --> 00:37:05.040
That's the good part here.
00:37:05.040 --> 00:37:09.740
And then this final step
to get u particular,
00:37:09.740 --> 00:37:14.000
I have to put the inverse
of that guy over there,
00:37:14.000 --> 00:37:17.010
but because this has
orthogonal columns, that's
00:37:17.010 --> 00:37:19.040
just Q_1 transpose.
00:37:19.040 --> 00:37:22.720
So there we go.
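The two steps for the particular solution can be sketched in NumPy as follows. This is a sketch under assumptions: the random B and d, the sizes, and the variable names are hypothetical, and a general solver stands in for a dedicated triangular solve.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 6, 2                       # illustrative sizes
B = rng.standard_normal((p, n))   # assumed full rank p
d = rng.standard_normal(p)

Q, R_full = np.linalg.qr(B.T, mode="complete")
Q1t = Q[:, :p]      # the lecture's Q_1 transpose (n by p, orthonormal columns)
R = R_full[:p, :]   # p by p upper triangular

# B = R^T Q_1, so B u = d reads R^T (Q_1 u) = d.
# Step 1: solve the p by p lower triangular system R^T y = d.
# np.linalg.solve is a general solver; a triangular solve would be cheaper.
y = np.linalg.solve(R.T, d)

# Step 2: u_particular = Q_1 transpose * y, since Q_1 Q_1^T = I.
u_particular = Q1t @ y

residual = np.linalg.norm(B @ u_particular - d)
print(residual)
```

Because this u_particular is a combination of the columns of Q_1 transpose, it lies in the row space of B, so it is the minimum-norm solution of B*u equal d.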
00:37:22.720 --> 00:37:28.900
That's the inverse
of R. So that's
00:37:28.900 --> 00:37:32.660
what I should've done
in 18.06 and never did,
00:37:32.660 --> 00:37:36.430
and you get to see it.
00:37:36.430 --> 00:37:39.330
What's a convenient
particular solution?
00:37:39.330 --> 00:37:46.540
Everybody knows, we
got a whole collection
00:37:46.540 --> 00:37:47.780
of particular solutions.
00:37:47.780 --> 00:37:51.950
We want to choose one
that's nice and stable.
00:37:51.950 --> 00:37:54.360
And the reason it's
stable is that it
00:37:54.360 --> 00:37:59.670
works with orthogonal
columns, orthonormal even,
00:37:59.670 --> 00:38:07.630
and a triangular matrix for
which linear systems are
00:38:07.630 --> 00:38:08.760
highly accurate.
00:38:08.760 --> 00:38:09.500
OK.
00:38:09.500 --> 00:38:11.350
So that's the
particular solution.
00:38:11.350 --> 00:38:13.320
Now what's the null
space solution?
00:38:13.320 --> 00:38:21.240
What are the general
solutions to null space part?
00:38:21.240 --> 00:38:25.120
What are the solutions to those?
00:38:25.120 --> 00:38:29.190
Well, I can just go
down the same steps.
00:38:29.190 --> 00:38:35.310
This is R transpose Q_1
u null space equals 0.
00:38:38.480 --> 00:38:41.620
I multiply both sides --
this is a nice square,
00:38:41.620 --> 00:38:47.500
invertible matrix -- I multiply
by its inverse, kills that.
00:38:47.500 --> 00:39:00.620
So now I have Q_1 times the --
Q_1 is really the heart of B.
00:39:00.620 --> 00:39:07.830
So what vectors are
perpendicular to Q_1?
00:39:07.830 --> 00:39:09.670
I hope I've got this right.
00:39:09.670 --> 00:39:13.440
It's easy to mix up a
transpose in the process.
00:39:13.440 --> 00:39:16.560
So let me just pause to be
sure I'm doing it correctly.
00:39:22.120 --> 00:39:23.540
OK.
00:39:23.540 --> 00:39:24.610
I hope.
00:39:24.610 --> 00:39:26.460
Did I check that I get it right?
00:39:29.220 --> 00:39:30.020
Yes.
00:39:30.020 --> 00:39:30.520
OK.
00:39:38.060 --> 00:39:41.850
I could have written
what B is here, since I
00:39:41.850 --> 00:39:46.130
have B transpose as a product.
00:39:46.130 --> 00:39:56.570
B is [R transpose 0] times [Q_1; Q_2].
00:39:56.570 --> 00:39:58.770
OK.
00:39:58.770 --> 00:40:05.360
And I want to multiply
by u_null and get 0.
00:40:05.360 --> 00:40:06.760
OK.
00:40:06.760 --> 00:40:08.680
So what should u_null be?
00:40:12.840 --> 00:40:22.980
u_null should be -- this
part is giving us a 0,
00:40:22.980 --> 00:40:25.340
so this is like gone.
00:40:25.340 --> 00:40:31.070
So you see the two are the same.
00:40:31.070 --> 00:40:38.580
So what vectors are
perpendicular to those?
00:40:38.580 --> 00:40:50.360
The answer is the u_null is
a combination of the columns
00:40:50.360 --> 00:40:57.170
of Q_2 transpose.
00:40:57.170 --> 00:41:01.750
It's the Q_2 part that's
telling us about the null space.
00:41:01.750 --> 00:41:05.670
It was the Q_1 part that
gave the particular solution,
00:41:05.670 --> 00:41:08.750
it's the Q_2 part that
gives the general solution.
00:41:08.750 --> 00:41:15.000
In other words, u_null
is Q_2 transpose times
00:41:15.000 --> 00:41:21.280
any vector, let me call
it z, this is any z.
00:41:21.280 --> 00:41:23.530
OK.
00:41:23.530 --> 00:41:28.050
Now this has my n minus
p degrees of freedom.
00:41:35.430 --> 00:41:38.390
Sorry, I'm trying to
do quite a bit here.
00:41:38.390 --> 00:41:39.990
I'm trying to say
how you actually
00:41:39.990 --> 00:41:49.810
solve rectangular systems
when they're not determinate.
00:41:52.410 --> 00:41:54.730
There are many solutions.
00:41:54.730 --> 00:41:57.810
This is a good particular
solution to find,
00:41:57.810 --> 00:42:00.670
and this is a good way to
find the general solution,
00:42:00.670 --> 00:42:01.670
the null space solution.
00:42:01.670 --> 00:42:10.580
This is a combination
of the other columns.
00:42:10.580 --> 00:42:11.890
OK.
00:42:11.890 --> 00:42:14.030
All right, now
we're done really,
00:42:14.030 --> 00:42:19.830
because I now know
what u looks like; u
00:42:19.830 --> 00:42:23.690
looks like this part,
which I've computed,
00:42:23.690 --> 00:42:27.940
and this part, which
has the freedom.
00:42:27.940 --> 00:42:30.370
Let me put those
two parts together.
00:42:30.370 --> 00:42:37.200
So now I want to minimize --
so I'm near the end here --
00:42:37.200 --> 00:42:41.440
A*u minus b, but
u is u_particular,
00:42:41.440 --> 00:42:44.660
and I have u_particular here.
00:42:44.660 --> 00:42:50.210
Q_1 transpose R
inverse transpose d,
00:42:50.210 --> 00:42:55.900
that's u_particular, plus
u_null, and that's this.
00:42:55.900 --> 00:43:06.430
This u_null was also here;
u_null was any Q_2 transpose z,
00:43:06.430 --> 00:43:07.640
right.
00:43:07.640 --> 00:43:10.480
All that is u, A*u minus b.
00:43:15.880 --> 00:43:16.420
OK.
00:43:19.730 --> 00:43:23.050
Up to possibly screwing
up on some transposes,
00:43:23.050 --> 00:43:26.790
this is the right method.
00:43:26.790 --> 00:43:29.400
So this is a fixed solution.
00:43:29.400 --> 00:43:31.930
I just want to write
that as a different way.
00:43:31.930 --> 00:43:37.350
Minimize A Q_2 transpose z.
00:43:37.350 --> 00:43:41.380
Now, we're minimizing
over the z's.
00:43:41.380 --> 00:43:50.300
So u had n components, but
somehow p degrees of freedom
00:43:50.300 --> 00:43:53.950
were used up by the
constraint B*u equal d.
00:43:53.950 --> 00:43:57.320
And we have the n minus
p true degrees of freedom
00:43:57.320 --> 00:43:58.610
are in the z.
00:43:58.610 --> 00:44:03.430
So there's this minus the b.
00:44:03.430 --> 00:44:05.410
This is all known stuff.
00:44:05.410 --> 00:44:13.000
A Q_1 transpose R inverse
transpose d, squared.
00:44:13.000 --> 00:44:14.040
OK.
00:44:14.040 --> 00:44:17.020
I'm there.
00:44:17.020 --> 00:44:20.830
So this is a standard
minimization problem.
00:44:20.830 --> 00:44:26.850
Minimize, shall I
call this A tilde z?
00:44:26.850 --> 00:44:30.850
And I'll call all
this stuff b tilde.
00:44:30.850 --> 00:44:34.410
And the solution is found
from the normal equations
00:44:34.410 --> 00:44:42.250
A tilde transpose, A
tilde times the best
00:44:42.250 --> 00:44:46.760
z, I'll put a hat
on it to emphasize
00:44:46.760 --> 00:44:53.131
that it's the great one, is
A tilde transpose b tilde.
00:44:53.131 --> 00:44:53.630
OK.
00:44:57.500 --> 00:45:02.050
Finished that process
without leaving myself
00:45:02.050 --> 00:45:05.210
a lot of time for
the other method.
00:45:05.210 --> 00:45:13.540
Conclusion here, that after
you've done the QR step, the qr
00:45:13.540 --> 00:45:20.570
command, and then after
you solved a linear system
00:45:20.570 --> 00:45:27.930
with the R transpose, and you've
multiplied by Q's, and you've
00:45:27.930 --> 00:45:32.810
ended up with this problem
with a new matrix A tilde and B
00:45:32.810 --> 00:45:36.230
tilde, then you just
do the normal equation.
00:45:36.230 --> 00:45:44.580
The web will have the code
that takes those steps,
00:45:44.580 --> 00:45:47.150
reaches this conclusion,
and solves it.
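A hedged sketch of those steps end to end, written in NumPy rather than MATLAB. This is not the course code: the function name nullspace_method, the random test data, and the sizes are all assumptions made for illustration.

```python
import numpy as np

def nullspace_method(A, b, B, d):
    """Minimize ||A u - b||^2 subject to B u = d.

    Assumes B (p by n) has full rank p and A times the null space basis
    has independent columns, so the normal equations are solvable.
    """
    p, n = B.shape
    # QR step on B transpose, keeping all n orthonormal columns.
    Q, R_full = np.linalg.qr(B.T, mode="complete")
    Q1t, Q2t, R = Q[:, :p], Q[:, p:], R_full[:p, :]

    # Particular solution: solve R^T y = d, then multiply by Q_1 transpose.
    u_particular = Q1t @ np.linalg.solve(R.T, d)

    # Reduced unconstrained problem in z: minimize ||A_tilde z - b_tilde||^2.
    A_tilde = A @ Q2t
    b_tilde = b - A @ u_particular

    # Normal equations for the n - p unknowns in z.
    z_hat = np.linalg.solve(A_tilde.T @ A_tilde, A_tilde.T @ b_tilde)
    return u_particular + Q2t @ z_hat

rng = np.random.default_rng(2)
m, n, p = 8, 5, 2                 # illustrative sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
B = rng.standard_normal((p, n))
d = rng.standard_normal(p)

u_hat = nullspace_method(A, b, B, d)
print(np.linalg.norm(B @ u_hat - d))   # constraint residual
```

The normal equations are used here to match the lecture; np.linalg.lstsq(A_tilde, b_tilde, rcond=None) is an alternative for the reduced problem that avoids forming A tilde transpose A tilde.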
00:45:47.150 --> 00:45:47.840
OK.
00:45:47.840 --> 00:45:50.760
So that's the null space method.
00:45:50.760 --> 00:45:56.150
And it would be our
method of choice when --
00:45:56.150 --> 00:46:01.290
so z has n minus p components.
00:46:01.290 --> 00:46:05.600
If p is near n, then there are
not many z's and this
00:46:05.600 --> 00:46:07.340
is highly efficient.
00:46:07.340 --> 00:46:07.910
OK.
00:46:07.910 --> 00:46:10.940
So the null space
method is one way to go.
00:46:10.940 --> 00:46:13.470
Can I just in the
remaining minutes
00:46:13.470 --> 00:46:22.040
go back to the Lagrange
multiplier idea?
00:46:22.040 --> 00:46:23.900
So what's the Lagrange
multiplier idea?
00:46:26.990 --> 00:46:28.980
So let me write
the problem again
00:46:28.980 --> 00:46:30.550
as Lagrange would like it.
00:46:30.550 --> 00:46:38.060
Minimize A*u minus b squared
subject to B*u equal d.
00:46:41.140 --> 00:46:44.290
That's the problem
we're solving.
00:46:44.290 --> 00:46:46.940
I should have
written it earlier.
00:46:46.940 --> 00:46:50.320
Let me put a star here,
because this is our problem.
00:46:50.320 --> 00:46:51.380
OK.
00:46:51.380 --> 00:46:55.600
So one way to tackle it
was take that constraint,
00:46:55.600 --> 00:46:57.200
give it a heavy weight.
00:46:57.200 --> 00:46:59.460
That was method one.
00:46:59.460 --> 00:47:06.050
Method two was solve this
constraint in full detail,
00:47:06.050 --> 00:47:12.350
get the z's that remain
as degrees of freedom,
00:47:12.350 --> 00:47:16.180
plug in u_particular plus
u null space into here,
00:47:16.180 --> 00:47:18.800
and then you have
a problem in the z.
00:47:18.800 --> 00:47:20.360
That's method two.
00:47:20.360 --> 00:47:24.400
Now, so method
three is Lagrange.
00:47:24.400 --> 00:47:28.440
So method three would say
OK, what does Lagrange do?
00:47:28.440 --> 00:47:37.250
L, we call it the Lagrangian, he
takes this A*u minus b square,
00:47:37.250 --> 00:47:40.590
and adds to it some
Lagrange multiplier,
00:47:40.590 --> 00:47:48.630
and I'll use maybe the standard
lambda, times B*u minus d,
00:47:48.630 --> 00:47:49.130
right?
00:47:49.130 --> 00:47:50.250
That's Lagrange's idea.
00:47:56.030 --> 00:47:58.230
You recognize Lagrange's idea.
00:47:58.230 --> 00:48:02.390
Takes the constraint,
multiplies it by a multiplier.
00:48:02.390 --> 00:48:12.110
In fact this is p
constraints, so p lambdas.
00:48:12.110 --> 00:48:16.390
Lambda's a vector
of p multipliers.
00:48:16.390 --> 00:48:19.200
Not just a single one, because
we've got the p constraints.
00:48:19.200 --> 00:48:20.740
And now what does Lagrange do?
00:48:23.810 --> 00:48:30.770
He sets the derivative
dL / d lambda -- well,
00:48:30.770 --> 00:48:34.070
so let me do the dL/du first.
00:48:34.070 --> 00:48:40.570
He sets dL/du to 0,
and dL / d lambda to 0.
00:48:44.464 --> 00:48:46.130
I could've started
out with this method,
00:48:46.130 --> 00:48:50.310
because it's going to lead
us to the equations faster.
00:48:50.310 --> 00:48:53.230
What equations do we
get from dL/du equals 0?
00:48:53.230 --> 00:48:56.550
What's the gradient
with respect to u?
00:48:56.550 --> 00:48:59.770
That gives us A transpose A*u.
00:48:59.770 --> 00:49:04.200
Oh, probably we want a 1/2
here, so that the numbers
00:49:04.200 --> 00:49:05.220
come out right.
00:49:05.220 --> 00:49:13.730
We get A transpose A*u, and
another u part will be the B
00:49:13.730 --> 00:49:15.750
lambda.
00:49:15.750 --> 00:49:20.760
Taking the derivative with respect to u will
produce a B transpose lambda
00:49:20.760 --> 00:49:22.430
out of that.
00:49:22.430 --> 00:49:26.050
Yeah, a B transpose
lambda out of that.
00:49:26.050 --> 00:49:29.800
And then, in here will
be a linear term in u
00:49:29.800 --> 00:49:31.990
that we might as well put
on the right-hand side
00:49:31.990 --> 00:49:35.670
as A transpose b.
00:49:35.670 --> 00:49:37.400
Familiar.
00:49:37.400 --> 00:49:39.010
OK.
00:49:39.010 --> 00:49:41.580
And what about dL / d lambda?
00:49:41.580 --> 00:49:48.050
Well that's just our
constraint, B*u equals d right.
00:49:51.230 --> 00:49:53.110
Having built in the
constraint, when
00:49:53.110 --> 00:49:55.520
I take the derivative
with respect to lambda,
00:49:55.520 --> 00:49:57.520
the constraint just
comes back again.
00:49:57.520 --> 00:50:01.700
So this is now method three.
00:50:01.700 --> 00:50:04.440
Solve that system.
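A small NumPy sketch of that system, assuming the 1/2 is included in front of the square so the two equations read A transpose A*u plus B transpose lambda equals A transpose b, and B*u equals d. The random data, the sizes, and the names K and lam are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n, p = 8, 5, 2                 # illustrative sizes
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
B = rng.standard_normal((p, n))   # assumed full rank p
d = rng.standard_normal(p)

# Block system with n + p unknowns (u and lambda):
#   [ A^T A   B^T ] [ u      ]   [ A^T b ]
#   [ B       0   ] [ lambda ] = [ d     ]
K = np.block([[A.T @ A, B.T],
              [B, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])

solution = np.linalg.solve(K, rhs)
u, lam = solution[:n], solution[n:]

# dL/du = 0 and dL/dlambda = 0 should both hold at the solution.
gradient_u = A.T @ (A @ u - b) + B.T @ lam
constraint = B @ u - d
print(np.linalg.norm(gradient_u), np.linalg.norm(constraint))
```

The matrix K is symmetric but indefinite, so a general solver is used here rather than a Cholesky factorization.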
00:50:04.440 --> 00:50:09.110
And I guess what I want to say
in the remaining 30 seconds
00:50:09.110 --> 00:50:15.640
is that solving this system
is the same as this one.
00:50:15.640 --> 00:50:19.660
Those two are exactly the same.
00:50:19.660 --> 00:50:24.080
So that's a system with three
parts, but I can, as always --
00:50:24.080 --> 00:50:27.420
maybe I can even get there.
00:50:27.420 --> 00:50:30.720
Can you see that
if I take this part
00:50:30.720 --> 00:50:36.210
and I subtract A transpose
times the top row
00:50:36.210 --> 00:50:41.690
from the bottom row,
what will that give me?
00:50:41.690 --> 00:50:46.020
Let me just hope that it
works, well I won't actually.
00:50:46.020 --> 00:50:49.780
Time is up, it's asking
too much to do even
00:50:49.780 --> 00:50:53.890
this one piece of linear algebra
that can be in the notes.
00:50:53.890 --> 00:50:58.740
So this system that we got
as the correct limit equation
00:50:58.740 --> 00:51:01.980
is exactly the same
one that Lagrange gets.
00:51:01.980 --> 00:51:03.160
So that's one way.
00:51:03.160 --> 00:51:07.660
This is a system with
n plus p unknowns.
00:51:11.500 --> 00:51:15.020
That's the price you pay
for going Lagrange's route.
00:51:15.020 --> 00:51:17.060
You add p unknowns.
00:51:17.060 --> 00:51:23.330
This was a system with
n minus p unknowns.
00:51:23.330 --> 00:51:26.090
That's because you're
using the constraints
00:51:26.090 --> 00:51:28.300
to reduce the problem.
00:51:28.300 --> 00:51:34.780
And the original method one
was a method with n unknowns,
00:51:34.780 --> 00:51:35.850
the unknowns in u.
00:51:35.850 --> 00:51:41.600
So you have the choice n, n
plus p that Lagrange would like,
00:51:41.600 --> 00:51:46.910
and n minus p that
Golub/van Loan would prefer.
00:51:46.910 --> 00:51:53.810
And usually method two or
method three is recommended,
00:51:53.810 --> 00:51:56.120
but method one is often used.
00:51:56.120 --> 00:51:56.690
OK.
00:51:56.690 --> 00:52:06.160
So that's the lecture
on this point.
00:52:06.160 --> 00:52:07.110
That's today.
00:52:07.110 --> 00:52:13.410
And then next week comes
the whole class of problems
00:52:13.410 --> 00:52:17.120
like finding velocities
from displacements,
00:52:17.120 --> 00:52:20.110
where alpha is a
small parameter.
00:52:20.110 --> 00:52:26.650
And then after that come
discussions of the completed
00:52:26.650 --> 00:52:32.020
project ones, and the upcoming
extensions into project two.
00:52:32.020 --> 00:52:32.620
OK.
00:52:32.620 --> 00:52:34.560
See you next week, thanks.
00:52:34.560 --> 00:52:35.810
Good.