WEBVTT

00:00:01.550 --> 00:00:03.920
The following content is
provided under a Creative

00:00:03.920 --> 00:00:05.310
Commons license.

00:00:05.310 --> 00:00:07.520
Your support will help
MIT OpenCourseWare

00:00:07.520 --> 00:00:11.610
continue to offer high-quality
educational resources for free.

00:00:11.610 --> 00:00:14.180
To make a donation or to
view additional materials

00:00:14.180 --> 00:00:18.140
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:18.140 --> 00:00:19.026
at ocw.mit.edu.

00:00:22.830 --> 00:00:25.220
GILBERT STRANG:
So I've mentioned

00:00:25.220 --> 00:00:27.890
randomized linear
algebra a few times,

00:00:27.890 --> 00:00:33.720
and I thought, OK, I'm going
to jump in and describe

00:00:33.720 --> 00:00:36.390
randomized matrix
multiplication.

00:00:36.390 --> 00:00:40.710
It's a pretty cool
idea, it seems to me.

00:00:40.710 --> 00:00:45.640
So this is a topic within
randomized linear algebra.

00:00:45.640 --> 00:00:48.180
And when would we be
doing any of this?

00:00:48.180 --> 00:00:53.320
It would be for matrices that
are just really, really large.

00:00:53.320 --> 00:01:00.090
So we plan to sample the
columns of A and sample

00:01:00.090 --> 00:01:03.210
the corresponding
rows of B, so that

00:01:03.210 --> 00:01:07.530
when we decide on a column,
we've also decided on a row.

00:01:07.530 --> 00:01:11.460
So we're taking
those pieces, which

00:01:11.460 --> 00:01:15.210
do correctly add up
to AB, but we're not

00:01:15.210 --> 00:01:17.040
going to take them all.

00:01:17.040 --> 00:01:20.490
We're going to take
different ones,

00:01:20.490 --> 00:01:23.940
randomly sampled
with probabilities--

00:01:23.940 --> 00:01:26.700
we have to decide
probabilities--

00:01:26.700 --> 00:01:33.840
and then we'll add
up our samples,

00:01:33.840 --> 00:01:38.520
and we hope that the
result is close to AB.

00:01:38.520 --> 00:01:40.570
That's the idea.
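
NOTE
A minimal sketch in plain Python (the small matrices A and B are made up for illustration) of the fact this lecture relies on: AB equals the sum of the rank-1 pieces a_j b_j^T, column j of A times row j of B. Randomized multiplication samples only some of these pieces.
```python
def matmul(A, B):
    # Ordinary product: entry (i, k) is row i of A dotted with column k of B.
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]
def outer(col, row):
    # Rank-1 matrix: a column vector times a row vector.
    return [[c * x for x in row] for c in col]
def sum_of_outer_products(A, B):
    # Add up a_j b_j^T over all r column-row pairs (r = columns of A = rows of B).
    m, n, r = len(A), len(B[0]), len(B)
    total = [[0] * n for _ in range(m)]
    for j in range(r):
        piece = outer([A[i][j] for i in range(m)], B[j])
        total = [[total[i][k] + piece[i][k] for k in range(n)] for i in range(m)]
    return total
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 1], [0, 5], [-3, 2]]
print(sum_of_outer_products(A, B) == matmul(A, B))  # True
```
Each piece alone is a poor approximation to AB; only the full sum is exact.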

00:01:40.570 --> 00:01:48.510
OK, so this lecture then,
so I wrote these pages

00:01:48.510 --> 00:01:51.900
about six months ago.

00:01:51.900 --> 00:01:54.600
So I've been desperately
trying to remember

00:01:54.600 --> 00:01:58.650
what I wrote, because it's not
a subject I have ever spoken

00:01:58.650 --> 00:02:01.590
about before, but it's so neat.

00:02:01.590 --> 00:02:04.770
So here are some of the
things that come into it.

00:02:04.770 --> 00:02:08.100
We have to decide
on probabilities.

00:02:08.100 --> 00:02:11.540
Then we want to
compute the mean.

00:02:11.540 --> 00:02:15.150
So it's our first day with
some of these key ideas

00:02:15.150 --> 00:02:18.300
from statistics and probability.

00:02:18.300 --> 00:02:21.730
So we're going to take
probabilities that add to 1.

00:02:21.730 --> 00:02:24.390
We're going to figure
out what's the mean value

00:02:24.390 --> 00:02:28.230
of our random AB.

00:02:28.230 --> 00:02:34.170
We hope, and we will see that
the mean value of the random AB

00:02:34.170 --> 00:02:37.720
is the correct AB.

00:02:37.720 --> 00:02:41.380
But there will be a variance.

00:02:41.380 --> 00:02:43.090
Every sample won't be correct.

00:02:43.090 --> 00:02:45.760
In fact, no samples
will be correct.

00:02:45.760 --> 00:02:49.840
Only when we add them up and
average do they come out correct,

00:02:49.840 --> 00:02:53.500
and we get the correct AB.

00:02:53.500 --> 00:02:55.620
So the mean will come out right.

00:02:55.620 --> 00:02:57.455
It will come out as AB.

00:02:57.455 --> 00:02:58.330
You'll see it happen.

00:03:02.065 --> 00:03:02.565
Correct.

00:03:06.380 --> 00:03:10.160
But there's a big
variance, not zero.

00:03:13.430 --> 00:03:17.060
We'll be all over the
place with our samples.

00:03:17.060 --> 00:03:20.390
They'll just average out
right, but they'll be all over.

00:03:20.390 --> 00:03:24.020
No particular sample
will be right at all.

00:03:24.020 --> 00:03:27.860
So then we want to pick
the best probabilities.

00:03:31.050 --> 00:03:35.990
So our job will be, once we
see how the system works,

00:03:35.990 --> 00:03:38.930
we're going to
assign probabilities.

00:03:38.930 --> 00:03:41.890
And we're going to choose
the probabilities that

00:03:41.890 --> 00:03:44.360
minimize the variance.

00:03:44.360 --> 00:03:47.870
So this is a typical
situation where

00:03:47.870 --> 00:03:50.780
the mean is pretty
straightforward

00:03:50.780 --> 00:03:54.200
and does what you
want, but having

00:03:54.200 --> 00:03:58.295
the correct mean does not mean
you've got good answers at all.

00:03:58.295 --> 00:04:01.400
And the average
of, you know, like

00:04:01.400 --> 00:04:05.630
minus 100 and 100 is zero,
which might be the correct answer,

00:04:05.630 --> 00:04:07.750
but each sample is way off.

00:04:07.750 --> 00:04:11.893
And this measures
how far you are.

00:04:11.893 --> 00:04:13.560
So I don't know if
you know these words.

00:04:13.560 --> 00:04:19.070
It's unfortunate, but I guess
18.065 can't change it now,

00:04:19.070 --> 00:04:22.310
that the variance is
written sigma squared.

00:04:22.310 --> 00:04:26.430
And we already have a good use
for the Greek letter sigma,

00:04:26.430 --> 00:04:32.020
but today it has a
different use for variance.

00:04:32.020 --> 00:04:36.460
And this-- Lagrange multipliers
will come in near the end.

00:04:36.460 --> 00:04:39.930
So basically, let
me do a practice

00:04:39.930 --> 00:04:43.770
example to recall
what mean and variance

00:04:43.770 --> 00:04:45.360
and what those are about.

00:04:45.360 --> 00:04:49.680
So let me take a
matrix that's 1 by 2.

00:04:49.680 --> 00:04:53.670
So my matrix is just
going to be a b.

00:04:57.080 --> 00:05:05.240
OK, I'm going to
sample that twice,

00:05:05.240 --> 00:05:08.160
and my rule for the two
samples will be the same.

00:05:08.160 --> 00:05:14.570
They will be identically
distributed, totally identical.

00:05:14.570 --> 00:05:15.890
And what's my rule going to be?

00:05:15.890 --> 00:05:18.020
So this is like practice.

00:05:18.020 --> 00:05:20.810
My rule is going to be I take
that column or that column

00:05:20.810 --> 00:05:22.370
with the probabilities I've chosen.

00:05:26.320 --> 00:05:27.235
And I do it twice.

00:05:30.200 --> 00:05:31.630
And I take the average.

00:05:31.630 --> 00:05:35.630
So my
probabilities are going to be

00:05:35.630 --> 00:05:39.800
1/2, 1/2 for the two columns.

00:05:39.800 --> 00:05:43.175
And I'm going to take s
equals 2 samples.

00:05:47.840 --> 00:05:50.150
And I'm going to add--

00:05:50.150 --> 00:05:52.530
I'll weight them with--

00:05:52.530 --> 00:06:02.400
and I'll take the average
of the two samples.

00:06:02.400 --> 00:06:02.900
OK.

00:06:05.560 --> 00:06:11.690
And that will be my
randomized matrix.

00:06:11.690 --> 00:06:14.730
OK, so could we compute
the mean for the--

00:06:14.730 --> 00:06:18.900
so I've described a
randomized sampling process.

00:06:18.900 --> 00:06:21.870
I've given you the
probabilities, 1/2 and 1/2,

00:06:21.870 --> 00:06:24.210
the number of times
I'm going to do it,

00:06:24.210 --> 00:06:26.610
and then I divide by
that number of times.

00:06:26.610 --> 00:06:28.230
So this is really--

00:06:28.230 --> 00:06:32.850
I have a 1 over s here,
because I've got s of these.

00:06:32.850 --> 00:06:38.060
And now-- so what are
the possibilities here?

00:06:38.060 --> 00:06:39.310
I want to find the mean.

00:06:39.310 --> 00:06:43.530
First of all, let's
practice with the mean.

00:06:43.530 --> 00:06:46.740
OK, so here are two--

00:06:46.740 --> 00:06:49.770
I could think of two different
ways to compute the mean.

00:06:49.770 --> 00:06:52.080
Let me start with this one.

00:06:52.080 --> 00:06:54.380
What is the mean,
the average value--

00:06:54.380 --> 00:06:56.550
mean means average value--

00:06:56.550 --> 00:06:59.000
of the first sample?

00:06:59.000 --> 00:07:02.870
So for the average value of
the first sample,

00:07:02.870 --> 00:07:09.050
the general formula for the mean
is: you add up all

00:07:09.050 --> 00:07:15.960
the samples times their--

00:07:15.960 --> 00:07:18.710
the possible samples
times their probabilities.

00:07:21.670 --> 00:07:24.270
And in this case,
the probabilities

00:07:24.270 --> 00:07:30.690
are 1/2 that the
sample is a, 0 and 1/2

00:07:30.690 --> 00:07:32.430
that the sample is 0, b.

00:07:35.920 --> 00:07:38.680
So those are my two samples.

00:07:38.680 --> 00:07:45.850
And computing the mean
of the total, I get--

00:07:45.850 --> 00:07:50.710
first the mean for each sample,
and then I have to multiply.

00:07:50.710 --> 00:07:54.310
So let's put what I
got here: 1/2 of a, b.

00:07:59.090 --> 00:08:02.140
That was the mean
of each sample,

00:08:02.140 --> 00:08:05.650
because my probabilities
were equal, 1/2 and 1/2.

00:08:05.650 --> 00:08:09.040
And now, I've got
s equal 2 samples.

00:08:13.870 --> 00:08:24.480
So I multiply by 2, and
I get a, b as the mean.

00:08:27.510 --> 00:08:28.675
Mean is correct.
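
NOTE
A sketch that checks this mean exactly with Python's fractions module. Assumptions: illustrative values a = 3, b = 5, and each sample is divided by s times its probability (the rule stated later in this lecture), which here is 2 times 1/2 = 1, so the samples are just a, 0 and 0, b.
```python
from fractions import Fraction as F
def one_sample_outcomes(x, p, s):
    # Outcome j keeps entry j of the 1-by-n matrix x, zeroes the rest,
    # and divides by s * p_j; it occurs with probability p_j.
    out = []
    for j in range(len(x)):
        v = [F(0)] * len(x)
        v[j] = F(x[j]) / (s * p[j])
        out.append((p[j], v))
    return out
def mean(outcomes):
    # Mean = sum of (probability times outcome), entry by entry.
    n = len(outcomes[0][1])
    return [sum(prob * v[k] for prob, v in outcomes) for k in range(n)]
a, b = 3, 5
p = [F(1, 2), F(1, 2)]
s = 2
m1 = mean(one_sample_outcomes([a, b], p, s))   # a/2, b/2 for one sample
total_mean = [s * m for m in m1]               # s identically distributed samples added
print(m1, total_mean)
```
Here m1 equals a/2, b/2 and total_mean equals a, b, matching the board.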

00:08:36.844 --> 00:08:38.340
Good.

00:08:38.340 --> 00:08:40.530
We did the easy one, the mean.

00:08:40.530 --> 00:08:44.710
Now, practice with a
variance, or else quit here.

00:08:44.710 --> 00:08:46.450
Maybe I should quit
while I'm ahead.

00:08:46.450 --> 00:08:53.080
I've got the mean exactly right,
but of course, the samples

00:08:53.080 --> 00:08:54.010
might not be right.

00:08:56.530 --> 00:08:58.690
So now for the variance.

00:08:58.690 --> 00:09:00.820
So what is variance?

00:09:00.820 --> 00:09:01.780
Do you remember that?

00:09:01.780 --> 00:09:04.960
There are actually two
ways to compute variance.

00:09:04.960 --> 00:09:09.070
Let me just remember those over
here and push that board up.

00:09:11.940 --> 00:09:13.740
So the variance sigma squared.

00:09:18.600 --> 00:09:20.280
Forgive me, if you're
a statistician,

00:09:20.280 --> 00:09:25.170
this is like you were
born knowing this.

00:09:25.170 --> 00:09:27.670
But the rest of us weren't.

00:09:27.670 --> 00:09:33.850
So the variance is the sum--

00:09:33.850 --> 00:09:38.080
one way to do it is add up
the probabilities

00:09:38.080 --> 00:09:45.190
of the different things that
could happen times (output

00:09:45.190 --> 00:09:49.890
minus the mean) squared.

00:09:49.890 --> 00:09:52.630
So it's the average--

00:09:52.630 --> 00:10:02.030
it's the average distance
squared from the mean.

00:10:04.860 --> 00:10:08.780
So it takes whatever output
came out, output number

00:10:08.780 --> 00:10:13.430
i, minus the mean, which
is the average output.

00:10:13.430 --> 00:10:16.510
I square those,
and I get a number.

00:10:16.510 --> 00:10:19.350
And that sort of tells me how--

00:10:19.350 --> 00:10:30.310
it tells me like in
the famous Gaussian,

00:10:30.310 --> 00:10:32.410
if I had a Gaussian
distribution here,

00:10:32.410 --> 00:10:34.840
but here I have a distribution
of 1/2 and 1/2.

00:10:34.840 --> 00:10:38.170
And that
even has a name,

00:10:38.170 --> 00:10:42.860
a Bernoulli distribution.

00:10:42.860 --> 00:10:45.890
But here on this Gaussian
that we all remember,

00:10:45.890 --> 00:10:53.180
can I mark where in that
figure the mean is?

00:10:53.180 --> 00:10:54.930
Right in the center.

00:10:54.930 --> 00:10:56.210
OK.

00:10:56.210 --> 00:10:58.600
Mean.

00:10:58.600 --> 00:11:00.910
And what is the variance?

00:11:00.910 --> 00:11:04.960
Just to recall, for anyone
hearing it for the first time,

00:11:04.960 --> 00:11:08.290
this word
variance,

00:11:08.290 --> 00:11:11.950
what is the variance
kind of measuring?

00:11:11.950 --> 00:11:15.040
You're summing squares,
so whether you're

00:11:15.040 --> 00:11:16.930
on the right of the
mean or the left

00:11:16.930 --> 00:11:20.080
of the mean, no difference,
because you're squaring it.

00:11:20.080 --> 00:11:22.620
And it's the distance.

00:11:22.620 --> 00:11:28.090
The square root of the variance
would be sort of like a typical width.

00:11:28.090 --> 00:11:29.860
Maybe I overdid it.

00:11:29.860 --> 00:11:33.250
But that would be a
sort of typical sigma.

00:11:35.860 --> 00:11:40.270
I'm really just-- since
the words statistics, mean,

00:11:40.270 --> 00:11:42.970
and variance haven't
been mentioned in 18.065

00:11:42.970 --> 00:11:46.000
until today, I'm just
kind of recalling.

00:11:46.000 --> 00:11:52.840
OK, so now I'm prepared
to compute this example.

00:11:52.840 --> 00:11:57.640
OK, maybe I'll-- maybe
I'll compute it over here.

00:11:57.640 --> 00:12:03.710
OK, so shall I compute the
variance for each sample,

00:12:03.710 --> 00:12:07.090
and then I'll multiply by 2,
because I have two samples.

00:12:07.090 --> 00:12:09.610
So what are they--

00:12:09.610 --> 00:12:12.390
so this is the sigma
squared sample.

00:12:16.070 --> 00:12:19.250
Obviously, I could write
down all the possibilities.

00:12:19.250 --> 00:12:21.530
Yeah, let me just do the sigma squared.

00:12:21.530 --> 00:12:28.820
So the sample could either
have picked out a, 0 or 0, b.

00:12:28.820 --> 00:12:30.900
And the probabilities
were each 1/2.

00:12:30.900 --> 00:12:39.080
So I have 1/2, the
probability, times the output.

00:12:39.080 --> 00:12:47.810
Let's say the output
is a, 0 minus the mean,

00:12:47.810 --> 00:12:51.460
which was a over 2, b over 2.

00:12:54.330 --> 00:12:56.800
And I want to square that.

00:12:56.800 --> 00:13:00.420
So that was one possibility
when I picked a, 0,

00:13:00.420 --> 00:13:02.730
and the other one,
which I'm also

00:13:02.730 --> 00:13:05.400
doing with probability
1/2, is in case

00:13:05.400 --> 00:13:08.280
I picked 0, b, what was--

00:13:15.730 --> 00:13:19.980
you see, I'm not getting
0 for the variance,

00:13:19.980 --> 00:13:22.470
because I'm making
an error every time.

00:13:22.470 --> 00:13:26.550
Each sample is a, 0 or 0,
b, never the mean

00:13:26.550 --> 00:13:31.110
a over 2, b over 2
in the middle.

00:13:31.110 --> 00:13:36.600
Now, if I compute all
that, I get a quantity,

00:13:36.600 --> 00:13:40.950
and maybe I'll just,
to be on the safe side,

00:13:40.950 --> 00:13:42.280
ask your forgiveness

00:13:42.280 --> 00:13:44.400
if I just write the answer.

00:13:44.400 --> 00:13:47.280
And we could even try
to get the answer, but--

00:13:53.760 --> 00:13:57.550
so this is from two samples.

00:13:57.550 --> 00:13:58.810
So this is double that one.

00:14:06.530 --> 00:14:10.090
I guess I'm bold
enough to try it.

00:14:10.090 --> 00:14:16.030
So for a, 0, I'm off by
a over 2 and minus b over 2.

00:14:16.030 --> 00:14:20.800
I think we got here 1/2 of--

00:14:20.800 --> 00:14:33.380
I think-- looks to me like a
over 2 squared plus b over--

00:14:33.380 --> 00:14:39.862
I'm missing my plus or minus,
but when I'm squaring them,

00:14:39.862 --> 00:14:41.320
that's the whole
point of variance.

00:14:41.320 --> 00:14:42.590
Doesn't matter.

00:14:42.590 --> 00:14:45.340
And the b over 2.

00:14:45.340 --> 00:14:51.790
And here, I think I'm
wrong by minus a over 2,

00:14:51.790 --> 00:14:57.160
and I'm wrong by
b over 2.

00:14:57.160 --> 00:15:00.080
But when I square them
again, doesn't matter.

00:15:00.080 --> 00:15:05.770
So I think I get another
1/2 of a over 2 squared

00:15:05.770 --> 00:15:07.180
plus b over 2 squared.

00:15:13.700 --> 00:15:17.360
Forgive me for this
simple computation,

00:15:17.360 --> 00:15:19.670
but just to practice.

00:15:19.670 --> 00:15:20.900
So what have I got?

00:15:20.900 --> 00:15:22.550
I've got a 1/2 of
that and 1/2 of that.

00:15:22.550 --> 00:15:26.260
So that adds up to this
thing a squared over 4

00:15:26.260 --> 00:15:33.790
plus b squared over 4, but
then I'm doing two samples.

00:15:33.790 --> 00:15:36.230
I have to multiply by
the number of samples.

00:15:36.230 --> 00:15:39.880
So I think so times
2 for two samples.

00:15:39.880 --> 00:15:41.950
I think I'm getting--

00:15:41.950 --> 00:15:49.220
it was 1/4, but now it will
be 1/2 of a squared plus b squared.
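
NOTE
The same computation, done exactly in Python as a sketch (illustrative values a = 3, b = 5; samples a, 0 and 0, b with probability 1/2 each). It uses the definition of variance, probability times squared distance from the mean, and assumes the two samples are independent, so their variances add.
```python
from fractions import Fraction as F
a, b = 3, 5
outcomes = [(F(1, 2), [F(a), F(0)]),   # picked the first column: a, 0
            (F(1, 2), [F(0), F(b)])]   # picked the second column: 0, b
mean = [sum(p * v[k] for p, v in outcomes) for k in range(2)]   # a/2, b/2
# Variance of one sample: sum of probability times squared distance from the mean.
var_one = sum(p * sum((v[k] - mean[k]) ** 2 for k in range(2)) for p, v in outcomes)
s = 2
var_total = s * var_one          # two independent samples: variances add
print(var_one, var_total)        # 17/2 17
```
With a = 3 and b = 5 this gives (a^2 + b^2)/4 = 17/2 for one sample and (a^2 + b^2)/2 = 17 for two.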

00:15:49.220 --> 00:15:51.360
Yeah, I didn't-- yeah, yeah.

00:15:53.870 --> 00:15:56.450
I think that's
right, but forgive me

00:15:56.450 --> 00:15:58.790
while I just ask myself.

00:15:58.790 --> 00:15:59.290
Yeah.

00:16:03.990 --> 00:16:07.010
This will be-- actually, you
already have these notes.

00:16:07.010 --> 00:16:08.870
This is section 2.4.

00:16:08.870 --> 00:16:12.550
So I think it's
there on Stellar.

00:16:12.550 --> 00:16:16.210
So what's the point of this?

00:16:16.210 --> 00:16:20.680
First point was to like
remember some of the steps that

00:16:20.680 --> 00:16:23.660
go into the variance.

00:16:23.660 --> 00:16:25.640
Oh, there's another
formula for variance

00:16:25.640 --> 00:16:27.310
that I want to tell you about.

00:16:27.310 --> 00:16:31.650
And the second point is
to bring in a new idea.

00:16:34.290 --> 00:16:36.720
Suppose we want to make this--

00:16:36.720 --> 00:16:39.890
suppose this variance
is bigger than we want.

00:16:39.890 --> 00:16:44.540
Suppose, for example, that
b is a lot bigger than a.

00:16:44.540 --> 00:16:46.970
Suppose b is a
lot bigger than a.

00:16:46.970 --> 00:16:49.130
Then what should we
have done differently

00:16:49.130 --> 00:16:53.030
in this randomized
linear algebra?

00:16:53.030 --> 00:17:01.940
If I'm trying to get this thing
close, get close to that thing,

00:17:01.940 --> 00:17:04.849
and if b is a lot bigger
than a, then what should

00:17:04.849 --> 00:17:06.460
I do differently?

00:17:09.280 --> 00:17:14.099
I don't know what b is exactly,
but I have the information

00:17:14.099 --> 00:17:16.710
that it's bigger than a.

00:17:16.710 --> 00:17:18.810
Then I should increase
the probability--

00:17:18.810 --> 00:17:22.310
I shouldn't do half and half.

00:17:22.310 --> 00:17:28.440
So here was randomized
sampling taking the average.

00:17:28.440 --> 00:17:32.070
My probabilities
were a 1/2 and a 1/2.

00:17:36.560 --> 00:17:41.770
I believe that I could
keep the mean correct.

00:17:41.770 --> 00:17:46.050
Of course, that's fundamental
to get the mean right.

00:17:46.050 --> 00:17:49.860
And I could get a better answer,
a smaller variance

00:17:49.860 --> 00:17:55.680
than that a squared plus b squared over 2
over there, by picking b

00:17:55.680 --> 00:17:57.210
with higher probability.

00:17:57.210 --> 00:18:02.270
So that's where the randomized--

00:18:02.270 --> 00:18:07.390
it turns out to be called
norm squared probability.

00:18:07.390 --> 00:18:16.310
The decision on what the
probability should be--

00:18:16.310 --> 00:18:19.010
the one that turns out to be
the optimal one--

00:18:19.010 --> 00:18:22.470
goes with the
square of the size.

00:18:22.470 --> 00:18:25.400
So if b is twice
as big as a, and I

00:18:25.400 --> 00:18:29.540
want to get the variance
down, then the probability--

00:18:29.540 --> 00:18:32.630
I should use probabilities
that are four times--

00:18:35.750 --> 00:18:38.800
I will choose b four times
as often as a.
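
NOTE
A small sketch of norm-squared weighting in Python (the helper name is ours, not from the lecture): probabilities proportional to the squared sizes, rescaled to add to 1. With b twice as big as a, b is chosen four times as often.
```python
from fractions import Fraction as F
def norm_squared_probabilities(sizes):
    # p_j proportional to size_j squared, divided by the total so they add to 1.
    weights = [F(x) ** 2 for x in sizes]
    total = sum(weights)
    return [w / total for w in weights]
a = 1
b = 2 * a
p = norm_squared_probabilities([a, b])
print(p)              # [Fraction(1, 5), Fraction(4, 5)]
print(p[1] / p[0])    # 4
```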

00:18:38.800 --> 00:18:42.594
That's going to be the
conclusion at 2 o'clock,

00:18:42.594 --> 00:18:44.810
hopefully.

00:18:44.810 --> 00:18:48.090
OK, so that's one point.

00:18:48.090 --> 00:18:51.260
And just another
little point while we

00:18:51.260 --> 00:18:57.080
are reviewing variance,
this is the standard formula

00:18:57.080 --> 00:19:04.710
for the variance, sum of
all the possible outcomes

00:19:04.710 --> 00:19:10.070
with their probabilities, the
distance from the mean squared.

00:19:10.070 --> 00:19:14.090
Do you know a second formula,
which is very close to this

00:19:14.090 --> 00:19:18.470
and very similar, and it
comes from substituting

00:19:18.470 --> 00:19:22.790
the meaning of the mean,
substituting what the mean is?

00:19:22.790 --> 00:19:28.150
So yeah, I just want to
mention a second formula.

00:19:30.800 --> 00:19:34.270
And I don't know which
one we'll actually use.

00:19:34.270 --> 00:19:41.110
But the second formula for the
same quantity, sigma squared,

00:19:41.110 --> 00:19:48.870
is the sum of probabilities
times output squared.

00:19:55.510 --> 00:19:58.660
So I haven't subtracted off the
mean in this second formula.

00:19:58.660 --> 00:20:00.350
I have to do it now.

00:20:00.350 --> 00:20:02.290
And I'll do the mean--

00:20:02.290 --> 00:20:05.440
I'll subtract the mean all
at once, the mean squared.

00:20:11.860 --> 00:20:16.150
Of course, the mean involves--

00:20:16.150 --> 00:20:19.890
remember that the mean is
the sum of the probability

00:20:19.890 --> 00:20:20.650
times the outcome.

00:20:25.710 --> 00:20:29.910
And it's just playing
with a little algebra

00:20:29.910 --> 00:20:32.760
to show that you can either--
you have a choice of whatever

00:20:32.760 --> 00:20:37.650
is more convenient, subtract
the mean from each output

00:20:37.650 --> 00:20:42.120
or do all the
outputs, but then you

00:20:42.120 --> 00:20:45.570
haven't accounted for
the fact that you really

00:20:45.570 --> 00:20:48.390
want the distances
from the mean,

00:20:48.390 --> 00:20:50.670
and then you subtract
off the mean squared.

00:20:50.670 --> 00:20:56.150
Two ways to do it, two
equivalent ways to do it.
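
NOTE
A sketch in Python comparing the two variance formulas on made-up discrete distributions: the centered one (probability times (output minus mean) squared) and the one that sums probability times output squared and then subtracts the mean squared. Exact fractions avoid rounding.
```python
from fractions import Fraction as F
def mean(probs, outputs):
    return sum(p * x for p, x in zip(probs, outputs))
def variance_centered(probs, outputs):
    # sigma^2 = sum of p_i * (x_i - mean)^2
    mu = mean(probs, outputs)
    return sum(p * (x - mu) ** 2 for p, x in zip(probs, outputs))
def variance_raw(probs, outputs):
    # sigma^2 = (sum of p_i * x_i^2) - mean^2
    mu = mean(probs, outputs)
    return sum(p * x * x for p, x in zip(probs, outputs)) - mu ** 2
# The "minus 100 and 100" case: the mean 0 is right, every output is far off.
print(variance_centered([F(1, 2), F(1, 2)], [-100, 100]),
      variance_raw([F(1, 2), F(1, 2)], [-100, 100]))          # 10000 10000
# An unequal case: outputs 1 and 5 with probabilities 1/4 and 3/4 (mean 4).
print(variance_centered([F(1, 4), F(3, 4)], [1, 5]),
      variance_raw([F(1, 4), F(3, 4)], [1, 5]))               # 3 3
```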

00:20:56.150 --> 00:21:04.810
Yeah, we will review the basic
ideas of mean and variance

00:21:04.810 --> 00:21:09.750
in the section on probability.

00:21:09.750 --> 00:21:12.000
Here, yes, question?

00:21:12.000 --> 00:21:12.500
Yeah.

00:21:12.500 --> 00:21:15.690
AUDIENCE: Is the mean
a part of [INAUDIBLE]?

00:21:15.690 --> 00:21:16.880
GILBERT STRANG: The mean?

00:21:16.880 --> 00:21:18.540
Oh, in here?

00:21:18.540 --> 00:21:19.260
That's separate.

00:21:19.260 --> 00:21:20.880
Yeah, that's the whole point.

00:21:20.880 --> 00:21:24.960
Yeah, so this was like, do
this, and then subtract off

00:21:24.960 --> 00:21:26.490
the mean squared.

00:21:26.490 --> 00:21:32.620
Or keep the mean in every
term, and do it that way.

00:21:32.620 --> 00:21:38.910
Yeah, you could verify
that the two are the same.

00:21:38.910 --> 00:21:44.760
OK, so when we go now
to the bigger question,

00:21:44.760 --> 00:21:48.110
I've forgotten which way I do
it, but I'm free to choose.

00:21:48.110 --> 00:21:52.680
OK, is that small
example reasonable?

00:21:52.680 --> 00:21:58.630
And you get the
idea that if--

00:21:58.630 --> 00:22:04.390
if we look
at our matrix, first of all,

00:22:04.390 --> 00:22:09.820
and find out which columns are
large, large norm, and which

00:22:09.820 --> 00:22:14.620
columns are smaller, then that
might be useful information

00:22:14.620 --> 00:22:19.240
to weight our probabilities to
pick the larger one more often.

00:22:19.240 --> 00:22:21.030
OK.

00:22:21.030 --> 00:22:22.350
OK.

00:22:22.350 --> 00:22:26.010
In fact, let me just tell you
what are the two possibilities

00:22:26.010 --> 00:22:27.660
there.

00:22:27.660 --> 00:22:32.490
One is what I just said,
weight your probabilities

00:22:32.490 --> 00:22:36.330
by the square of the norm,
this norm squared weighting

00:22:36.330 --> 00:22:42.730
that we'll see and take
the columns as they come,

00:22:42.730 --> 00:22:45.540
but with higher probability
on the big columns,

00:22:45.540 --> 00:22:53.140
or you could say another way
would be mix the columns,

00:22:53.140 --> 00:22:57.060
so that they more or
less have similar sizes,

00:22:57.060 --> 00:23:03.220
and then, keep track
of what you've done,

00:23:03.220 --> 00:23:07.330
and then just the
probabilities can all be equal.

00:23:07.330 --> 00:23:08.980
So that would be the other way.

00:23:08.980 --> 00:23:11.710
Take your matrix,
mix it up, take

00:23:11.710 --> 00:23:15.930
combinations of the columns
with random numbers.

00:23:15.930 --> 00:23:17.290
It's a random world here.

00:23:20.500 --> 00:23:25.565
Do a mixing, and then
operate on the mixed matrix.

00:23:28.210 --> 00:23:31.420
OK, I'm going to do
it the first way.

00:23:31.420 --> 00:23:35.560
I'm going to pick these
probabilities to--

00:23:35.560 --> 00:23:39.220
they'll turn out to be
proportional to norm squared.

00:23:39.220 --> 00:23:41.080
OK, ready for that?

00:23:41.080 --> 00:23:41.680
Here it comes.

00:23:45.990 --> 00:23:52.180
So let me bring that down.

00:23:54.690 --> 00:23:55.990
Yeah.

00:23:55.990 --> 00:23:57.150
OK.

00:23:57.150 --> 00:23:59.150
Actually, I could
leave it up for now,

00:23:59.150 --> 00:24:03.990
because it told us
what we're up to.

00:24:03.990 --> 00:24:04.490
OK.

00:24:07.550 --> 00:24:09.230
So what have I got?

00:24:09.230 --> 00:24:10.580
Let me just see if I can--

00:24:13.290 --> 00:24:16.070
so we're multiplying
A times B, and we're

00:24:16.070 --> 00:24:17.880
going to use these
probabilities.

00:24:17.880 --> 00:24:26.540
Pj is going to be the length
of that column times the length

00:24:26.540 --> 00:24:32.100
of that row, norm squared.

00:24:32.100 --> 00:24:36.750
Well, norm squared, if I was
multiplying A by A transpose,

00:24:36.750 --> 00:24:38.400
then I really would be squaring.

00:24:38.400 --> 00:24:41.340
That would be the same as that.

00:24:41.340 --> 00:24:46.680
So I'm going to use the word
norm squared or length squared.

00:24:46.680 --> 00:24:51.050
Also, here, where the two--

00:24:51.050 --> 00:24:54.620
I'm not assuming that
B is A transpose.

00:24:54.620 --> 00:24:57.510
OK, so
the probabilities

00:24:57.510 --> 00:24:59.310
will be proportional to that.

00:24:59.310 --> 00:25:03.730
But now, those numbers
don't add up to 1, so how

00:25:03.730 --> 00:25:06.640
do I make the
probabilities add up to 1?

00:25:06.640 --> 00:25:16.810
This is the probability
of choosing column

00:25:16.810 --> 00:25:26.230
j of A times row j of B.

00:25:26.230 --> 00:25:28.870
That's what Pj refers to.

00:25:32.720 --> 00:25:37.070
OK, so what is my plan?

00:25:37.070 --> 00:25:39.620
Oh, I have to make the
probabilities add to 1,

00:25:39.620 --> 00:25:46.610
or I'm really breaking
the fundamental law here.

00:25:46.610 --> 00:25:49.490
So if I have a bunch
of probabilities,

00:25:49.490 --> 00:25:52.180
and I kind of know what I want,
but they don't add up to 1,

00:25:52.180 --> 00:25:55.010
what do I do?

00:25:55.010 --> 00:25:56.930
Divide by their sum.

00:25:56.930 --> 00:25:58.790
Let me call c their sum.

00:25:58.790 --> 00:26:02.570
So the probability is
going to be that over c,

00:26:02.570 --> 00:26:09.470
and c is going to be the sum of
those products over all the columns and rows.

00:26:09.470 --> 00:26:18.790
I guess I had r of them in my
picture of the sum of aj bj transpose.

00:26:18.790 --> 00:26:24.270
OK, so all I did was
scale the probabilities so

00:26:24.270 --> 00:26:26.640
that they now add to 1.
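
NOTE
A sketch of these probabilities in plain Python (helper names and the 2-by-2 example are ours): weight pair j by the length of column j of A times the length of row j of B, then divide by their sum c.
```python
import math
def column(A, j):
    return [row[j] for row in A]
def length(v):
    return math.sqrt(sum(x * x for x in v))
def pair_probabilities(A, B):
    # Weight of pair j: ||column j of A|| * ||row j of B||.
    r = len(B)
    weights = [length(column(A, j)) * length(B[j]) for j in range(r)]
    c = sum(weights)                  # the normalizing sum c
    return [w / c for w in weights]   # probabilities that add to 1
A = [[3, 0], [4, 1]]   # columns (3, 4) and (0, 1): lengths 5 and 1
B = [[2, 0], [0, 3]]   # rows (2, 0) and (0, 3): lengths 2 and 3
p = pair_probabilities(A, B)
print(p)               # weights 10 and 3, so p is 10/13 and 3/13
```
When B is A transpose, row j of B is column j of A, so each weight becomes the length squared of column j: the norm-squared case.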

00:26:26.640 --> 00:26:28.100
Good.

00:26:28.100 --> 00:26:31.820
OK, so now I'm
ready to go to work.

00:26:31.820 --> 00:26:34.460
I'm ready to choose--

00:26:34.460 --> 00:26:37.620
oh, yes, so here's my rule.

00:26:37.620 --> 00:26:45.860
I will choose column-row
pair j with this probability,

00:26:45.860 --> 00:26:47.930
but then I'm going
to multiply it,

00:26:47.930 --> 00:26:50.990
and I'm free to do
that if I want to.

00:26:50.990 --> 00:26:58.340
So my approximation, my
approximate AB will be--

00:26:58.340 --> 00:27:04.540
I'll take this,
whichever comes out,

00:27:04.540 --> 00:27:09.970
I'll take the aj bj
transpose that comes out.

00:27:09.970 --> 00:27:13.180
It comes out with
probability Pj.

00:27:13.180 --> 00:27:16.000
But I'm going to
divide this by--

00:27:16.000 --> 00:27:18.384
and I think this
is the right one--

00:27:18.384 --> 00:27:22.390
s, the number of
samples, times Pj.

00:27:22.390 --> 00:27:27.670
So I thought, at
first, that's weird.

00:27:27.670 --> 00:27:33.480
Went to all the trouble to
pick these Pj's, claiming

00:27:33.480 --> 00:27:34.950
that these are the good ones.

00:27:34.950 --> 00:27:39.060
So my claim to eventually
prove at the end--

00:27:39.060 --> 00:27:42.420
first, I'll have to understand
how the sampling is done.

00:27:42.420 --> 00:27:44.340
That's like the most important.

00:27:44.340 --> 00:27:47.100
But then when I go
to compute the mean,

00:27:47.100 --> 00:27:49.930
I'll get the correct
mean, and when

00:27:49.930 --> 00:27:52.240
I go to compute
the variance, I'll

00:27:52.240 --> 00:27:55.210
get some expression
for the variance,

00:27:55.210 --> 00:28:02.640
and then the plan will be
choose these Pj's to minimize

00:28:02.640 --> 00:28:06.030
that total variance.

00:28:06.030 --> 00:28:07.860
So this is what--

00:28:07.860 --> 00:28:09.870
that's a typical sample.

00:28:09.870 --> 00:28:16.770
With probability Pj, pick
that matrix, that

00:28:16.770 --> 00:28:19.600
rank 1 matrix.

00:28:19.600 --> 00:28:24.220
So then my approximate AB
is the sum of all these

00:28:24.220 --> 00:28:26.050
over s samples.
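
NOTE
A minimal sketch of this sampling rule in plain Python (function name and test matrices are ours): draw s column-row pairs independently with probabilities p_j, divide each chosen a_j b_j^T by s p_j, and add them. random.choices does the weighted draw with replacement.
```python
import random
def sampled_product(A, B, p, s, rng=random):
    # Approximate AB by s sampled column-row pairs.
    m, n, r = len(A), len(B[0]), len(B)
    approx = [[0.0] * n for _ in range(m)]
    picks = rng.choices(range(r), weights=p, k=s)   # s independent draws of an index j
    for j in picks:
        scale = 1.0 / (s * p[j])                     # divide the sample by s * p_j
        for i in range(m):
            for k in range(n):
                approx[i][k] += scale * A[i][j] * B[j][k]
    return approx
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 1], [0, 5], [-3, 2]]          # exact AB is [[2, 11], [-6, 6]]
rng = random.Random(0)                 # seeded so runs are repeatable
print(sampled_product(A, B, [1/3, 1/3, 1/3], s=2, rng=rng))
```
Any single run can be far from AB; the claim is about the average over many runs.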

00:28:30.800 --> 00:28:32.150
Are you with me?

00:28:32.150 --> 00:28:34.660
Let me just repeat.

00:28:34.660 --> 00:28:37.970
I'm trying to multiply AB.

00:28:37.970 --> 00:28:41.150
Each sample is just a
single column times row.

00:28:41.150 --> 00:28:43.850
So it's way wrong, way wrong.

00:28:43.850 --> 00:28:46.960
It's just a tiny piece of AB.

00:28:46.960 --> 00:28:50.890
But I take that sample
with probability Pj,

00:28:50.890 --> 00:28:55.180
and I divide it by s Pj, so
that the Pj's cancel here.

00:29:00.740 --> 00:29:02.320
Oh, yes.

00:29:02.320 --> 00:29:04.640
OK, right.

00:29:04.640 --> 00:29:10.420
So I would like to see
that the mean is correct.

00:29:10.420 --> 00:29:14.480
I would like to see that
the mean is correct.

00:29:14.480 --> 00:29:17.640
I'm going to compute
the mean of my process.

00:29:17.640 --> 00:29:20.640
So like it's falling
into my lap here.

00:29:20.640 --> 00:29:22.260
I made it that way.

00:29:22.260 --> 00:29:24.540
These Pj's cancel.

00:29:24.540 --> 00:29:26.460
I divided by s.

00:29:26.460 --> 00:29:30.690
So the mean of a
typical sample will be--

00:29:30.690 --> 00:29:41.910
so the mean of one
sample is the probability

00:29:41.910 --> 00:29:43.800
of getting it times what I take.

00:29:43.800 --> 00:29:51.080
So it's just the sum of
aj bj transpose over s.

00:29:51.080 --> 00:29:53.580
You're going to say, OK,
you're wasting our time.

00:29:53.580 --> 00:29:55.440
But we got--

00:29:55.440 --> 00:29:57.000
I would just want
to show that I'm

00:29:57.000 --> 00:30:02.010
getting the correct
mean out of this plan.

00:30:02.010 --> 00:30:05.370
So do you see that if
that's a mean of one sample,

00:30:05.370 --> 00:30:08.730
so what's the mean of the
sum of all the samples?

00:30:11.310 --> 00:30:14.910
Well, multiply by s, because
it was the same mean.

00:30:14.910 --> 00:30:17.490
Every sample had
the same mean, just

00:30:17.490 --> 00:30:21.960
as it did in our Little
League practice example.

00:30:21.960 --> 00:30:24.090
So that's the mean
of one sample.

00:30:24.090 --> 00:30:33.970
So the mean of all samples added
together, multiplies this by s.

00:30:33.970 --> 00:30:36.630
The s's cancel, and I get AB.
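
NOTE
A sketch that checks the cancellation exactly with fractions (example matrices and probabilities are made up): for any positive p_j adding to 1, the mean of one sample is the sum of p_j times a_j b_j^T over s p_j, which is AB over s, and adding s samples gives AB.
```python
from fractions import Fraction as F
def expected_estimate(A, B, p, s):
    m, n, r = len(A), len(B[0]), len(B)
    # Mean of one sample: sum over j of p_j * (a_j b_j^T) / (s * p_j).
    one = [[sum(p[j] * F(A[i][j] * B[j][k]) / (s * p[j]) for j in range(r))
            for k in range(n)] for i in range(m)]
    # s identically distributed samples are added, so the mean is multiplied by s.
    return [[s * x for x in row] for row in one]
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 1], [0, 5], [-3, 2]]
p = [F(1, 2), F(1, 3), F(1, 6)]
print(expected_estimate(A, B, p, s=5) == [[2, 11], [-6, 6]])   # True
```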

00:30:43.590 --> 00:30:49.010
Remembering however
I defined AB there, yeah.

00:30:49.010 --> 00:30:49.510
Yeah.

00:30:53.880 --> 00:30:58.990
All I'm saying here is that
I did something reasonable

00:30:58.990 --> 00:31:05.440
in the sampling process, so
that the mean came out right.

00:31:05.440 --> 00:31:08.575
And now is the hard
part, the variance.

00:31:11.600 --> 00:31:13.130
What's the variance?

00:31:13.130 --> 00:31:15.650
OK, so what do I
have to compute--

00:31:15.650 --> 00:31:38.560
and I may-- it will depend on
the p's, p1 to pr, I guess.

00:31:38.560 --> 00:31:42.970
We had r different rows,
different column row

00:31:42.970 --> 00:31:49.390
pairs to choose, and we chose
probabilities, these guys,

00:31:49.390 --> 00:31:52.030
that depended on this size.

00:31:52.030 --> 00:31:56.750
And now I'm going to
compute the variance,

00:31:56.750 --> 00:32:03.900
and it won't be 0, because
every sample is wrong.

00:32:03.900 --> 00:32:07.080
I'm never getting AB from a sample.

00:32:07.080 --> 00:32:11.130
A sample is just giving me a
column times a row, a rank 1

00:32:11.130 --> 00:32:17.410
guy, and they average out
to give the correct product.

00:32:17.410 --> 00:32:22.660
But each one is certainly wrong,
because it's just rank 1.

00:32:22.660 --> 00:32:26.940
So when I compute variance, I'm
going to definitely not get 0,

00:32:26.940 --> 00:32:28.380
right?

00:32:28.380 --> 00:32:31.910
In other words, of course,
when would the variance be 0?

00:32:34.530 --> 00:32:37.580
Yeah, if AB were rank 1, I guess
I'd get it right every time.

00:32:37.580 --> 00:32:38.250
Thanks.

00:32:38.250 --> 00:32:40.560
That was a better answer
than I had in mind.

00:32:40.560 --> 00:32:42.120
Yeah, yeah.

00:32:42.120 --> 00:32:46.780
The variance would only be
0 if every sample was right.

00:32:46.780 --> 00:32:49.930
And that would be true
if the rank was 1,

00:32:49.930 --> 00:32:51.760
and there was only
one thing to choose.

00:32:51.760 --> 00:32:55.780
But that's not the
problem we want.

00:32:55.780 --> 00:32:57.600
OK, so the variance is there.

00:33:01.660 --> 00:33:07.650
My instinct is to tell you
what this calculation produces,

00:33:07.650 --> 00:33:09.240
since you and I can read.

00:33:14.060 --> 00:33:17.690
Would you allow me to do that?

00:33:17.690 --> 00:33:26.210
So here, the variance
for a sample turned out

00:33:26.210 --> 00:33:29.670
to equal, so we
will figure it out,

00:33:29.670 --> 00:33:33.290
turns out to equal
the sum over--

00:33:33.290 --> 00:33:42.980
as it was up there of the aj
bj transpose, probably squared.

00:33:42.980 --> 00:33:46.450
Let me just check.

00:33:46.450 --> 00:33:48.470
Yes, squared.

00:33:48.470 --> 00:33:51.990
Yeah, why don't I
help myself here?

00:33:51.990 --> 00:33:57.140
So these are squared because
variances are squared.

00:33:57.140 --> 00:34:01.310
And then when I
look to see what--

00:34:01.310 --> 00:34:05.940
I think there is an s
there, and there's a Pj,

00:34:05.940 --> 00:34:10.380
so why is there a Pj there,
when it canceled here?

00:34:10.380 --> 00:34:15.000
So here, the Pj, when I
multiply by that, canceled.

00:34:15.000 --> 00:34:19.030
Why doesn't it
cancel over there?

00:34:19.030 --> 00:34:21.460
Because it's squared over there.

00:34:21.460 --> 00:34:23.500
Over there, this
thing is squared.

00:34:23.500 --> 00:34:25.389
So it was Pj twice.

00:34:25.389 --> 00:34:28.170
Here, I have Pj, the
probability, once.

00:34:28.170 --> 00:34:33.250
So I've still got the Pj in the
denominator, one factor of Pj

00:34:33.250 --> 00:34:35.350
in the denominator.

00:34:35.350 --> 00:34:37.600
And then-- so that is--

00:34:37.600 --> 00:34:41.230
I guess what I'm
doing is I'm computing

00:34:41.230 --> 00:34:44.610
the variance this way.

00:34:44.610 --> 00:34:48.830
So what I've computed
now is this first bit,

00:34:48.830 --> 00:34:54.350
and then I said I should
subtract the mean squared.

00:34:54.350 --> 00:34:57.410
And this is for one sample.

00:34:57.410 --> 00:35:01.275
So the mean squared is--

00:35:04.260 --> 00:35:10.440
I think it turns out to be
1 over s times AB squared

00:35:10.440 --> 00:35:12.830
in this Frobenius norm.

00:35:12.830 --> 00:35:17.700
It's a squared plus b squared
stuff that I saw before.

00:35:21.080 --> 00:35:25.460
OK, so this--

00:35:25.460 --> 00:35:33.142
I've jumped a serious
step to get from the sum--

00:35:33.142 --> 00:35:35.110
the formula for the variance.

00:35:35.110 --> 00:35:40.730
I've plugged in this
problem and got that.
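
Here is one way to fill in the skipped step. It rests on two assumptions the lecture does not state: the s draws are independent, and the "variance" of a matrix-valued sample X means its expected squared Frobenius distance from its mean. One draw X takes the value a_j b_j^T/(s p_j) with probability p_j. In this reading, the 1/s in front belongs to the sum of all s samples, and a single sample carries 1/s^2.

```latex
X = \frac{a_j b_j^{T}}{s\,p_j} \ \text{with probability } p_j,
\qquad
E[X] = \sum_{j=1}^{r} p_j \,\frac{a_j b_j^{T}}{s\,p_j} = \frac{AB}{s}

\|a_j b_j^{T}\|_F = \|a_j\|\,\|b_j\|
\quad\Longrightarrow\quad
E\,\|X\|_F^2 = \sum_{j=1}^{r} p_j \,\frac{\|a_j\|^2\|b_j\|^2}{s^2 p_j^2}
             = \frac{1}{s^2}\sum_{j=1}^{r} \frac{\|a_j\|^2\|b_j\|^2}{p_j}

\operatorname{Var}(X) = E\,\|X\|_F^2 - \|E[X]\|_F^2
  = \frac{1}{s^2}\Big(\sum_{j=1}^{r} \frac{\|a_j\|^2\|b_j\|^2}{p_j} - \|AB\|_F^2\Big)

\operatorname{Var}\Big(\sum_{k=1}^{s} X_k\Big) = s\,\operatorname{Var}(X)
  = \frac{1}{s}\Big(\sum_{j=1}^{r} \frac{\|a_j\|^2\|b_j\|^2}{p_j} - \|AB\|_F^2\Big)
```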

00:35:40.730 --> 00:35:43.350
OK, and now I'm going to sample.

00:35:46.670 --> 00:35:51.760
Let's see where-- yeah.

00:35:51.760 --> 00:35:53.410
I would like to simplify this.

00:35:56.570 --> 00:35:58.010
I would like to simplify that.

00:36:01.710 --> 00:36:05.140
So I have to plug in the Pj's.

00:36:05.140 --> 00:36:08.380
OK, so after I plug
in for that Pj,

00:36:08.380 --> 00:36:13.930
and we decided what Pj
was going to be here.

00:36:18.270 --> 00:36:23.380
OK, so when I plug that
in in the denominator,

00:36:23.380 --> 00:36:27.670
it will cancel one of these.

00:36:27.670 --> 00:36:33.870
And I'll just have a sum
of the norms of aj times the norms of bj.

00:36:33.870 --> 00:36:38.730
And that is C.

00:36:38.730 --> 00:36:42.920
So let me just say this again.

00:36:42.920 --> 00:36:46.620
It's something you can just
check when you have a minute.

00:36:46.620 --> 00:36:53.600
When I plug in that value for
Pj here, it cancels the squares

00:36:53.600 --> 00:36:55.800
and just leaves the first power.

00:36:55.800 --> 00:37:03.530
So then I'm adding up the
first power, and I get C.

00:37:03.530 --> 00:37:08.800
But the Pj had a factor
C in the denominator,

00:37:08.800 --> 00:37:12.110
and it's in the denominator over
there, so that C is up there.

00:37:12.110 --> 00:37:19.190
So it's C squared coming
here, a constant squared,

00:37:19.190 --> 00:37:20.640
minus the other term.

00:37:20.640 --> 00:37:22.580
There's a 1 over s.

00:37:22.580 --> 00:37:24.200
That will eventually go away.

00:37:24.200 --> 00:37:28.070
And this other term
is 1 over s norm

00:37:28.070 --> 00:37:32.360
AB in the Frobenius norm, squared.

00:37:34.940 --> 00:37:37.550
Or maybe 1 over s's are--

00:37:42.370 --> 00:37:50.800
so you're seeing-- and I
apologize, a little bit messy

00:37:50.800 --> 00:37:52.720
bit of algebra.

00:37:52.720 --> 00:37:54.370
A little bit messy
bit of algebra.

00:37:54.370 --> 00:37:57.910
But that's what
we ended up with.

00:37:57.910 --> 00:38:00.880
And when we take s
samples and combine them,

00:38:00.880 --> 00:38:05.530
that will cancel
the s, and I think

00:38:05.530 --> 00:38:09.350
it'll knock that out when
we combine the s samples.
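
A small example can check both the variance formula and the C squared simplification. This is a sketch: the helper names are hypothetical, matrices are lists of rows, and exact Fraction arithmetic is used for an arbitrary positive p. It compares a direct enumeration of the r outcomes of one sample against the closed form (1/s)(sum_j ||a_j||^2 ||b_j||^2 / p_j − ||AB||_F^2). It then plugs in p_j = ||a_j|| ||b_j|| / C. With that choice the sum collapses to C times sum_j ||a_j|| ||b_j||, which is C^2. The enumeration assumes independent samples, so the variances of the samples add.

```python
import math
from fractions import Fraction


def matmul(A, B):
    r = len(B)
    return [[sum(A[i][t] * B[t][k] for t in range(r)) for k in range(len(B[0]))]
            for i in range(len(A))]


def frob_sq(M):
    """Squared Frobenius norm: sum of squares of all entries."""
    return sum(x * x for row in M for x in row)


def col_row_sq_norms(A, B):
    """Q_j = |a_j|^2 |b_j|^2, which equals |a_j b_j^T|_F^2."""
    m = len(A)
    return [sum(A[i][j] ** 2 for i in range(m)) * sum(x * x for x in B[j])
            for j in range(len(B))]


def variance_by_enumeration(A, B, p, s):
    """E|S - AB|_F^2 for S = sum of s independent samples, by listing the r
    outcomes of one sample and multiplying its variance by s."""
    AB = matmul(A, B)
    m, n = len(A), len(B[0])
    one = 0
    for j in range(len(B)):
        # deviation of the sampled matrix a_j b_j^T / (s p_j) from its mean AB / s
        dev = [[A[i][j] * B[j][k] / (s * p[j]) - Fraction(AB[i][k], s)
                for k in range(n)] for i in range(m)]
        one += p[j] * frob_sq(dev)
    return s * one


def variance_formula(A, B, p, s):
    """(1/s) * (sum_j Q_j / p_j - |AB|_F^2)."""
    Q = col_row_sq_norms(A, B)
    return (sum(q / pj for q, pj in zip(Q, p)) - frob_sq(matmul(A, B))) / s


def optimal_probs(A, B):
    """p_j = |a_j| |b_j| / C with C = sum_j |a_j| |b_j| (floating point)."""
    w = [math.sqrt(q) for q in col_row_sq_norms(A, B)]
    C = sum(w)
    return [x / C for x in w], C


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
p = [Fraction(1, 3), Fraction(2, 3)]
print(variance_by_enumeration(A, B, p, 4) == variance_formula(A, B, p, 4))  # True
p_opt, C = optimal_probs(A, B)
print(variance_formula(A, B, p_opt, 4), (C**2 - frob_sq(matmul(A, B))) / 4)
```

For a rank-1 product (r = 1), both functions return 0. That matches the remark that the variance vanishes only when every sample is already right.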

00:38:09.350 --> 00:38:09.850
OK.

00:38:19.340 --> 00:38:21.660
OK.

00:38:21.660 --> 00:38:23.700
Now what?

00:38:28.480 --> 00:38:34.080
Now, we get to choose
those probabilities.

00:38:34.080 --> 00:38:35.580
And how are we going
to choose them?

00:38:38.720 --> 00:38:40.130
What will be the best choice?

00:38:40.130 --> 00:38:42.350
Here is the expression
for the variance.

00:38:42.350 --> 00:38:43.500
Yeah, this is good.

00:38:43.500 --> 00:38:45.380
This is good.

00:38:45.380 --> 00:38:51.530
Stay with me for now, and you
will be saying to yourself,

00:38:51.530 --> 00:38:55.160
there's some steps there
that I didn't see fully,

00:38:55.160 --> 00:38:56.970
and I want to check.

00:38:56.970 --> 00:38:58.490
And I agree.

00:38:58.490 --> 00:39:03.870
But let me say that
we get to that point,

00:39:03.870 --> 00:39:07.240
and this is a fixed number.

00:39:07.240 --> 00:39:11.340
So it's C that we would
like to make small,

00:39:11.340 --> 00:39:14.340
and that's our final job.

00:39:14.340 --> 00:39:20.550
This was true for any choice of
the probabilities P. Well, oh,

00:39:20.550 --> 00:39:21.790
yeah, sorry.

00:39:21.790 --> 00:39:24.610
Yeah, yeah.

00:39:24.610 --> 00:39:27.490
So I want to--

00:39:27.490 --> 00:39:30.392
this still had in
it a probability.

00:39:34.410 --> 00:39:34.910
Yeah.

00:39:34.910 --> 00:39:36.420
What do I want to do?

00:39:36.420 --> 00:39:40.890
I want to show that that was
the best choice, that this

00:39:40.890 --> 00:39:42.640
was the best choice.

00:39:42.640 --> 00:39:44.220
Yeah, yeah.

00:39:44.220 --> 00:39:45.840
I want to show that
that's the best

00:39:45.840 --> 00:39:54.850
choice, that the choice of
weights, of probabilities, based

00:39:54.850 --> 00:39:59.440
on the length of a times the length
of b-- of course, it sounds

00:39:59.440 --> 00:40:00.880
reasonable, doesn't it?

00:40:00.880 --> 00:40:05.290
We want to-- for big
columns and big rows,

00:40:05.290 --> 00:40:08.590
we want to have a higher
probability to choose those.

00:40:08.590 --> 00:40:11.020
But is the probability
proportional

00:40:11.020 --> 00:40:14.380
to the length of
both, or should it

00:40:14.380 --> 00:40:17.800
be proportional to the 10th
power or the square root

00:40:17.800 --> 00:40:18.760
or what?

00:40:18.760 --> 00:40:25.810
That's our final
step: optimizing the P's.

00:40:25.810 --> 00:40:27.190
So this is the final step.

00:40:30.290 --> 00:40:40.900
Optimize the
probabilities, P1 to P2,

00:40:40.900 --> 00:40:47.450
I guess, no, P1 to Pr, for the r
rows, r columns of A and r rows

00:40:47.450 --> 00:40:50.630
of B, subject to--

00:40:50.630 --> 00:40:53.030
they have to add up to 1.

00:40:53.030 --> 00:40:55.070
And what do I mean by optimize?

00:40:55.070 --> 00:40:56.200
I mean minimize.

00:40:59.050 --> 00:41:04.350
This optimize means
minimizing this expression,

00:41:04.350 --> 00:41:11.603
C. So aj bj transpose.

00:41:18.710 --> 00:41:23.420
Where is-- over Pj.

00:41:23.420 --> 00:41:26.540
Oh yeah, wait a minute.

00:41:26.540 --> 00:41:28.848
Help.

00:41:28.848 --> 00:41:30.780
Help.

00:41:30.780 --> 00:41:32.190
So let me just see.

00:41:35.494 --> 00:41:39.720
Yeah, my variance
has got a Pj in it.

00:41:39.720 --> 00:41:41.170
Yeah, my variance-- sorry--

00:41:41.170 --> 00:41:43.990
my variance-- oh, OK.

00:41:43.990 --> 00:41:44.995
This is my variance.

00:41:48.300 --> 00:41:54.300
This is the result if I make
the right choice for the--

00:41:54.300 --> 00:41:57.390
if I make this choice
for the probabilities.

00:41:57.390 --> 00:42:00.820
But I'm backing up a minute.

00:42:00.820 --> 00:42:11.050
This is if-- this is
with the optimal Pj's, then

00:42:11.050 --> 00:42:12.490
we got that answer.

00:42:12.490 --> 00:42:13.150
Great.

00:42:13.150 --> 00:42:14.560
That was our answer.

00:42:14.560 --> 00:42:18.790
But I'm backing up
to this and saying,

00:42:18.790 --> 00:42:24.420
what are the optimal Pj's
to make this variance small?

00:42:24.420 --> 00:42:29.900
So really, I'm just doing this.

00:42:29.900 --> 00:42:33.180
Let me write the
problem simpler.

00:42:33.180 --> 00:42:44.370
Minimize, with the sum
of the P's equal to 1,

00:44:44.370 --> 00:44:51.240
the sum of some quantity Qj
over Pj.

00:42:51.240 --> 00:42:52.030
Yeah, that's it.

00:42:56.860 --> 00:43:01.240
How do you-- so these Qj's that
I just introduced that letter

00:43:01.240 --> 00:43:03.310
for are the aj bj's.

00:43:06.000 --> 00:43:07.000
They're given.

00:43:07.000 --> 00:43:10.390
Maybe I'll just put back aj bj.

00:43:10.390 --> 00:43:16.330
So to repeat, this is the
calculation of the variance

00:43:16.330 --> 00:43:18.940
for any choice of Pj's.

00:43:18.940 --> 00:43:22.480
This is what I get if
I make the best choice,

00:43:22.480 --> 00:43:24.970
but over here, I'm going
to show that it is the best

00:43:24.970 --> 00:43:29.230
choice, that it's the choice
that makes this result as

00:43:29.230 --> 00:43:30.850
small as possible.

00:43:30.850 --> 00:43:34.930
So that's the Lagrange
multiplier aspect.

00:43:34.930 --> 00:43:39.370
So the statistics has been done.

00:43:39.370 --> 00:43:41.630
I'm getting this answer.

00:43:41.630 --> 00:43:46.330
And instead of putting
in some weird Q,

00:43:46.330 --> 00:43:49.546
let me put in what these are.

00:43:49.546 --> 00:43:50.370
They're whatever.

00:43:50.370 --> 00:43:51.495
They're a bunch of numbers.

00:43:54.490 --> 00:44:02.630
But I'm dividing by the Pj, and
how do you find the best Pj?

00:44:02.630 --> 00:44:09.580
Do you know about that
optimization question?

00:44:09.580 --> 00:44:12.200
They have to add to 1.

00:44:12.200 --> 00:44:15.310
And Lagrange
had the great idea.

00:44:15.310 --> 00:44:19.650
So this is maybe the first
time we've used his idea.

00:44:19.650 --> 00:44:23.390
So do you remember
what his idea is?

00:44:23.390 --> 00:44:28.970
He takes this constraint, and
he builds it into the function.

00:44:28.970 --> 00:44:34.460
He multiplies it by some unknown
mysterious number, often called

00:44:34.460 --> 00:44:37.280
lambda, but nothing to
do with eigenvalues,

00:44:37.280 --> 00:44:42.050
times the constraint that
the Pj's should add to 1.

00:44:42.050 --> 00:44:44.660
So he added 0.

00:44:44.660 --> 00:44:49.680
He added 0, but with
a variable lambda.

00:44:49.680 --> 00:44:51.830
This is Lagrange's idea.

00:44:51.830 --> 00:44:55.160
So it's pretty neat
that this problem--

00:44:55.160 --> 00:44:56.960
I've left randomized sampling.

00:44:56.960 --> 00:45:00.380
I've arrived at this
final subproblem,

00:45:00.380 --> 00:45:03.530
optimizing the probabilities
under the condition

00:45:03.530 --> 00:45:06.320
that they add to 1,
and Lagrange's idea

00:45:06.320 --> 00:45:11.640
was to build that equation
into the function.

00:45:11.640 --> 00:45:15.200
Then you can take
derivatives, but you also

00:45:15.200 --> 00:45:17.570
take derivatives with
respect to lambda,

00:45:17.570 --> 00:45:20.570
because that's now an unknown.

00:45:20.570 --> 00:45:23.510
And you solve-- you set
the derivatives to 0,

00:45:23.510 --> 00:45:24.470
and you get the answer.

00:45:24.470 --> 00:45:26.060
It's like a miracle.
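
Written out for this subproblem, Lagrange's recipe looks roughly as follows. This sketch puts the squared lengths ||a_j||^2 ||b_j||^2 in the numerators, because those are the squares that appear in the variance formula. It also uses the plus sign for lambda. The symbol L(p, λ) is introduced here only for illustration. At the resulting p_j, the objective equals C^2, the same C^2 that shows up in the variance.

```latex
L(p,\lambda) = \sum_{j=1}^{r} \frac{\|a_j\|^2\,\|b_j\|^2}{p_j}
             + \lambda\Big(\sum_{j=1}^{r} p_j - 1\Big)

\frac{\partial L}{\partial p_j} = -\frac{\|a_j\|^2\,\|b_j\|^2}{p_j^2} + \lambda = 0
\quad\Longrightarrow\quad
p_j = \frac{\|a_j\|\,\|b_j\|}{\sqrt{\lambda}}

\frac{\partial L}{\partial \lambda} = \sum_{j=1}^{r} p_j - 1 = 0
\quad\Longrightarrow\quad
\sqrt{\lambda} = \sum_{j=1}^{r} \|a_j\|\,\|b_j\| = C,
\qquad
p_j = \frac{\|a_j\|\,\|b_j\|}{C}
```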

00:45:28.720 --> 00:45:32.970
But if you've seen Lagrange,
it's a confusing miracle.

00:45:32.970 --> 00:45:34.030
That's what it is.

00:45:34.030 --> 00:45:34.710
Yeah.

00:45:34.710 --> 00:45:35.680
OK.

00:45:35.680 --> 00:45:39.580
So if I take the derivatives
with respect to the P's, set

00:45:39.580 --> 00:45:45.870
them to 0, I think I'm going
to get the recommended P's.

00:45:48.540 --> 00:45:52.980
So I've computed the final
answer with the recommended P's,

00:45:52.980 --> 00:45:55.860
but now I'm going to show that
they really are recommended.

00:45:55.860 --> 00:45:59.910
So can you take the derivative
of that with respect to P?

00:45:59.910 --> 00:46:05.280
Can I-- I'll just raise this a
little, raise it a little more.

00:46:05.280 --> 00:46:07.170
OK, take the
derivative with respect

00:46:07.170 --> 00:46:11.460
to P, each P, because
I've got n unknowns there,

00:46:11.460 --> 00:46:14.670
or however many,
maybe r unknowns.

00:46:14.670 --> 00:46:17.890
And I've got lambda, so
I've got r plus 1 things.

00:46:17.890 --> 00:46:22.260
So what's the derivative with
respect to P? OK, calculus.

00:46:22.260 --> 00:46:23.760
Take the derivative of that.

00:46:23.760 --> 00:46:30.240
It's aj bj transpose over--

00:46:30.240 --> 00:46:32.150
Pj squared, with a minus sign, right?

00:46:34.690 --> 00:46:37.645
And the derivative of that
with respect to Pj is?

00:46:40.740 --> 00:46:42.940
Minus lambda.

00:46:42.940 --> 00:46:46.450
So that derivative with
respect to Pj is 0,

00:46:46.450 --> 00:46:52.360
and the derivative-- so this was
a derivative with respect to Pj

00:46:52.360 --> 00:46:54.440
has to be 0.

00:46:54.440 --> 00:46:57.735
And then the derivative
with respect to lambda--

00:46:57.735 --> 00:47:03.020
the derivative with
respect to lambda is that,

00:47:03.020 --> 00:47:04.850
the sum of the Pj's--

00:47:04.850 --> 00:47:07.772
the sum of the Pj's minus 1 equals 0.

00:47:10.880 --> 00:47:13.780
Lagrange confused
the whole world,

00:47:13.780 --> 00:47:18.360
but he gave us a break that
in the derivative with respect

00:47:18.360 --> 00:47:21.190
to lambda, it just brings
back that constraint,

00:47:21.190 --> 00:47:23.440
because he just built it in
with the factor of lambda,

00:47:23.440 --> 00:47:25.970
then he took the derivative,
and it brought back

00:47:25.970 --> 00:47:27.160
that constraint.

00:47:27.160 --> 00:47:29.170
But this part is
the beautiful part.

00:47:32.970 --> 00:47:35.392
Now, what do I learn from that?

00:47:39.170 --> 00:47:42.350
And sometimes this
would be a plus.

00:47:42.350 --> 00:47:47.640
Why don't I make it a plus
just to make my life easier?

00:47:47.640 --> 00:47:50.330
Lagrange is dead now,
and he don't care anyway,

00:47:50.330 --> 00:47:52.450
whether it's plus or a minus.

00:47:52.450 --> 00:47:52.950
OK.

00:47:56.810 --> 00:47:58.850
So this is telling me this.

00:47:58.850 --> 00:48:02.030
So this is telling me
what the multiplier is.

00:48:02.030 --> 00:48:03.110
He's telling me that--

00:48:03.110 --> 00:48:11.750
this equation is telling me
that the multiplier is aj bj

00:48:11.750 --> 00:48:15.090
transpose over Pj squared.

00:48:17.770 --> 00:48:22.840
Or put it another way, he's
telling me that Pj squared is--

00:48:26.070 --> 00:48:30.870
I guess, I'm hoping that after
the pretty confusing steps

00:48:30.870 --> 00:48:35.400
that we took, this is a
separate little bit of math,

00:48:35.400 --> 00:48:37.710
using the Lagrange
multiplier idea,

00:48:37.710 --> 00:48:41.490
and I hope that your
thought will be,

00:48:41.490 --> 00:48:43.450
boy, that was pretty simple.

00:48:43.450 --> 00:48:45.300
So I'm going to put
the Pj squareds here

00:48:45.300 --> 00:48:46.256
and the lambda there.

00:48:49.470 --> 00:48:50.490
What does this tell me?

00:48:54.650 --> 00:48:57.800
I've taken the derivative with
respect to the Pj's, and I got

00:48:57.800 --> 00:49:02.750
this equation for each j
because I took the derivative,

00:49:02.750 --> 00:49:06.290
the partial derivative with
respect to each of the Pj's.

00:49:06.290 --> 00:49:09.720
And it tells me
that Pj squared--

00:49:13.260 --> 00:49:14.040
wait a minute.

00:49:14.040 --> 00:49:15.630
What's the square in there for?

00:49:15.630 --> 00:49:17.070
Help.

00:49:17.070 --> 00:49:18.420
I've only got two minutes.

00:49:18.420 --> 00:49:24.580
And oh, they have to add to 1.

00:49:24.580 --> 00:49:27.280
Oh yeah, lambda is
going to save us.

00:49:27.280 --> 00:49:30.310
Right, lambda is
going to save us,

00:49:30.310 --> 00:49:34.360
because the total
probabilities-- so Pj

00:49:34.360 --> 00:49:37.480
will be the square
root of this stuff.

00:49:40.820 --> 00:49:44.230
And then I-- the number
lambda, I haven't decided.

00:49:44.230 --> 00:49:46.970
Lagrange's multiplier,
I haven't decided.

00:49:46.970 --> 00:49:48.300
So what is it?

00:49:48.300 --> 00:49:51.660
It's the correct number
to make this equal to 1.

00:49:51.660 --> 00:49:56.950
So that is the C. Oh god.

00:49:59.740 --> 00:50:02.450
Why have I got
square root there?

00:50:02.450 --> 00:50:03.780
Shoot.

00:50:03.780 --> 00:50:04.194
AUDIENCE: I think
you're supposed

00:50:04.194 --> 00:50:05.330
to start off with squares.

00:50:05.330 --> 00:50:07.455
GILBERT STRANG: I should
have started with squares?

00:50:07.455 --> 00:50:08.870
AUDIENCE: [INAUDIBLE]

00:50:08.870 --> 00:50:11.200
GILBERT STRANG: So
these should be squares?

00:50:11.200 --> 00:50:13.398
Ah, thank you.

00:50:13.398 --> 00:50:14.690
You could have told me earlier.

00:50:17.620 --> 00:50:19.210
When you see a
professor in trouble,

00:50:19.210 --> 00:50:22.490
don't just let him hang there.

00:50:22.490 --> 00:50:24.260
OK, all right.

00:50:24.260 --> 00:50:29.400
OK, and this is aj bj transpose.

00:50:29.400 --> 00:50:31.910
So apart from the
kerfuffle here,

00:50:31.910 --> 00:50:34.850
and the notes get
it right, because I

00:50:34.850 --> 00:50:40.310
had time to think
there, it turns out

00:50:40.310 --> 00:50:45.960
that this optimum gave
the formula for the Pj's

00:50:45.960 --> 00:50:49.320
that I used earlier.

00:50:49.320 --> 00:50:51.350
So when I introduced
this formula,

00:50:51.350 --> 00:50:55.130
I said, let's choose
those probabilities,

00:50:55.130 --> 00:50:57.560
but then I came
back at the very end

00:50:57.560 --> 00:51:00.620
and showed that they are
the probabilities that

00:51:00.620 --> 00:51:02.720
minimize the variance.
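
The optimality claim can also be checked numerically in Python. In this sketch, the values Q_j are made-up stand-ins for ||a_j||^2 ||b_j||^2, and the helper names are hypothetical. The "competitors" are simply random positive weights normalized to sum to 1. At the Lagrange solution, every ratio Q_j / p_j^2 equals the same multiplier, lambda = C^2. No other probability vector should beat C^2, which is also what the Cauchy-Schwarz inequality predicts.

```python
import math
import random


def objective(Q, p):
    """The p-dependent part of the variance: sum_j Q_j / p_j."""
    return sum(q / pj for q, pj in zip(Q, p))


def lagrange_solution(Q):
    """Stationary point of L(p, lam) = sum Q_j/p_j + lam * (sum p_j - 1).

    dL/dp_j = -Q_j / p_j**2 + lam = 0 gives p_j = sqrt(Q_j / lam);
    dL/dlam = 0 restores sum p_j = 1, which forces sqrt(lam) = sum_k sqrt(Q_k).
    """
    roots = [math.sqrt(q) for q in Q]
    C = sum(roots)  # equals sum |a_j| |b_j| when Q_j = |a_j|^2 |b_j|^2
    return [x / C for x in roots], C


def random_probabilities(r, rng):
    """A random competitor: positive weights normalized to sum to 1."""
    w = [rng.random() + 1e-3 for _ in range(r)]
    total = sum(w)
    return [x / total for x in w]


Q = [610.0, 2260.0, 50.0]  # hypothetical values of |a_j|^2 |b_j|^2
p_star, C = lagrange_solution(Q)
multipliers = [q / pj**2 for q, pj in zip(Q, p_star)]  # each should equal C**2
rng = random.Random(0)
best_random = min(objective(Q, random_probabilities(len(Q), rng))
                  for _ in range(10_000))
print(objective(Q, p_star), C**2, best_random)
```

Another way to see that the square root matters is to take p_j proportional to Q_j itself. That choice gives r times the sum of the Q_j, which is never smaller than C^2.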

00:51:02.720 --> 00:51:05.690
So that's like today's lecture.

00:51:05.690 --> 00:51:08.060
Can you just think
a minute, but please

00:51:08.060 --> 00:51:12.870
do go back through the
notes, because there

00:51:12.870 --> 00:51:15.800
are some messy steps
in the variance

00:51:15.800 --> 00:51:20.120
there that I had
to go by quickly.

00:51:20.120 --> 00:51:23.240
But you understand the
principle, that we set up

00:51:23.240 --> 00:51:26.300
a randomized system.

00:51:26.300 --> 00:51:35.570
We choose probabilities, aiming
to get the smallest variance.

00:51:35.570 --> 00:51:38.750
And it turns out that the
good probabilities are bigger

00:51:38.750 --> 00:51:43.770
when the column is a larger
column, so that to use this,

00:51:43.770 --> 00:51:45.890
you have to go
through the matrix

00:51:45.890 --> 00:51:48.710
and find the length
of the columns,

00:51:48.710 --> 00:51:52.290
because that's what's telling
you the probabilities.

00:51:52.290 --> 00:51:55.070
So that's like a
first pass through.

00:51:55.070 --> 00:51:57.380
Before you do the
randomized sampling,

00:51:57.380 --> 00:51:59.720
you must decide on
the probabilities,

00:51:59.720 --> 00:52:04.280
and they depend on the sizes
of the different columns.
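
The two passes just described fit in a minimal pure-Python sketch. The function name sampled_product, the list-of-rows storage, and sampling with replacement via random.choices are illustrative assumptions; the lecture does not specify them. The first pass computes the column lengths of A and row lengths of B, which fix the probabilities. The second pass draws s indices and adds the rescaled rank-1 pieces.

```python
import math
import random


def sampled_product(A, B, s, rng=random):
    """Approximate AB from s sampled column-row pairs (with replacement).

    A is m x r and B is r x n, both lists of rows. Pair j is drawn with
    probability p_j = |a_j| |b_j| / C, C = sum_j |a_j| |b_j|, and each draw
    adds a_j b_j^T / (s * p_j). Assumes C > 0.
    """
    m, r, n = len(A), len(B), len(B[0])
    # First pass: column lengths of A and row lengths of B fix the probabilities.
    weights = [math.sqrt(sum(A[i][j] ** 2 for i in range(m)))
               * math.sqrt(sum(x * x for x in B[j])) for j in range(r)]
    C = sum(weights)
    p = [w / C for w in weights]
    # Second pass: draw s indices and add the rescaled rank-1 pieces.
    estimate = [[0.0] * n for _ in range(m)]
    for j in rng.choices(range(r), weights=p, k=s):
        scale = 1.0 / (s * p[j])
        for i in range(m):
            for k in range(n):
                estimate[i][k] += scale * A[i][j] * B[j][k]
    return estimate, p


A = [[1.0, 2.0, 0.5], [3.0, 4.0, -1.0]]
B = [[5.0, 6.0], [7.0, 8.0], [0.1, 0.2]]
approx, p = sampled_product(A, B, s=1000, rng=random.Random(42))
print(p)
print(approx)
```

A column of A or row of B with zero length gets probability 0 and is never drawn. That is harmless, since such a pair contributes nothing to AB.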

00:52:04.280 --> 00:52:08.630
Thank you for getting
me through that.

00:52:08.630 --> 00:52:11.630
I'll come back to a little
more about randomized things

00:52:11.630 --> 00:52:16.700
next time, and then later, not
much later, but a little bit

00:52:16.700 --> 00:52:19.940
later, we'll be seeing
probability much more

00:52:19.940 --> 00:52:23.086
seriously. OK, thank you.