WEBVTT
00:00:17.470 --> 00:00:20.000
MICHALE FEE: OK, let's
go ahead and get started.
00:00:20.000 --> 00:00:22.630
All right, so today, we're
going to continue talking about
00:00:22.630 --> 00:00:26.350
feed-forward neural networks,
and we're going to keep working
00:00:26.350 --> 00:00:31.180
on some interesting
aspects of linear algebra--
00:00:31.180 --> 00:00:32.950
matrix transformations.
00:00:32.950 --> 00:00:37.990
We're going to introduce a
new idea from linear algebra,
00:00:37.990 --> 00:00:40.240
the idea of basis sets.
00:00:40.240 --> 00:00:42.700
We're going to describe some
interesting and important
00:00:42.700 --> 00:00:46.390
properties of basis sets,
such as linear independence.
00:00:46.390 --> 00:00:49.480
And then we're going to end with
just a very simple formulation
00:00:49.480 --> 00:00:55.090
of how to change between
different basis sets.
00:00:55.090 --> 00:00:58.210
So let me explain
a little bit more,
00:00:58.210 --> 00:01:03.850
motivate a little bit more
why we're doing these things.
00:01:03.850 --> 00:01:08.770
So as people, as animals,
looking out at the world,
00:01:08.770 --> 00:01:11.740
we are looking at
high-dimensional data.
00:01:11.740 --> 00:01:16.330
We have hundreds of millions of
photoreceptors in our retina.
00:01:16.330 --> 00:01:22.660
Those data get compressed
down into about a million
00:01:22.660 --> 00:01:26.050
nerve fibers that go through
our optic nerve up to our brain.
00:01:26.050 --> 00:01:28.120
So it's a very
high-dimensional data set.
00:01:28.120 --> 00:01:31.030
And then our brain unpacks
that data and tries
00:01:31.030 --> 00:01:32.020
to make sense of it.
00:01:32.020 --> 00:01:34.720
And it does that by
passing that data
00:01:34.720 --> 00:01:36.970
through layers of
neural circuits
00:01:36.970 --> 00:01:38.500
that make transformations.
00:01:38.500 --> 00:01:42.790
And we've talked about how in
going from one layer of neurons
00:01:42.790 --> 00:01:44.440
to another layer
of neurons, there's
00:01:44.440 --> 00:01:47.080
a feed-forward projection
that essentially
00:01:47.080 --> 00:01:50.507
does what looks like a
matrix multiplication, OK?
00:01:50.507 --> 00:01:52.090
So that's one of the
reasons why we're
00:01:52.090 --> 00:01:55.940
trying to understand what
matrix multiplications do.
00:01:55.940 --> 00:01:58.930
Now, we talked about some of
the matrix transformations
00:01:58.930 --> 00:02:01.240
that you can see when you
do a matrix multiplication.
00:02:01.240 --> 00:02:04.630
And one of those was a rotation.
00:02:04.630 --> 00:02:07.690
Matrix multiplications
can implement rotations.
00:02:07.690 --> 00:02:11.980
And rotations are very
important for visualizing
00:02:11.980 --> 00:02:13.160
high-dimensional data.
00:02:13.160 --> 00:02:17.740
So this is from a website
at Google research,
00:02:17.740 --> 00:02:21.640
where they've implemented
different viewers
00:02:21.640 --> 00:02:24.790
for high-dimensional data, ways
of taking high-dimensional data
00:02:24.790 --> 00:02:29.590
and reducing the dimensionality
and then visualizing
00:02:29.590 --> 00:02:31.000
what that data looks like.
00:02:31.000 --> 00:02:33.100
And one of the
most important ways
00:02:33.100 --> 00:02:36.310
that you visualize
high-dimensional data
00:02:36.310 --> 00:02:39.945
is by rotating it and looking
at it from different angles.
00:02:39.945 --> 00:02:41.320
And what you're
doing when you do
00:02:41.320 --> 00:02:43.440
that is you take this
high-dimensional data,
00:02:43.440 --> 00:02:45.580
you rotate it,
and you project it
00:02:45.580 --> 00:02:49.190
into a plane, which is what
you're seeing on the screen.
00:02:49.190 --> 00:02:52.630
And you can see that
you get a lot out
00:02:52.630 --> 00:02:56.590
of looking at
different projections
00:02:56.590 --> 00:02:58.780
and different
rotations of data sets.
00:02:58.780 --> 00:03:01.750
Also, when you're
zooming in on the data,
00:03:01.750 --> 00:03:04.270
that's another matrix
transformation.
00:03:04.270 --> 00:03:07.570
You can stretch
and compress and do
00:03:07.570 --> 00:03:09.910
all sorts of different
things to data.
00:03:09.910 --> 00:03:13.810
Now, one of the cool
things is that when
00:03:13.810 --> 00:03:17.800
we study the brain
to try to figure out
00:03:17.800 --> 00:03:24.730
how it does this really cool
process of rotating data
00:03:24.730 --> 00:03:26.980
through its
transformations that are
00:03:26.980 --> 00:03:30.370
produced by neural networks,
we record from lots of neurons.
00:03:30.370 --> 00:03:32.230
There's technology
now where you can
00:03:32.230 --> 00:03:35.800
image from thousands, or even
tens of thousands, of neurons
00:03:35.800 --> 00:03:37.120
simultaneously.
00:03:37.120 --> 00:03:40.000
And again, it's this really
high-dimensional data
00:03:40.000 --> 00:03:42.040
set that we're looking
at to try to figure out
00:03:42.040 --> 00:03:43.570
how the brain works.
00:03:43.570 --> 00:03:46.330
And so in order to
analyze those data,
00:03:46.330 --> 00:03:50.140
we try to build programs
or machines that
00:03:50.140 --> 00:03:54.280
act like the brain in order
to understand the data that we
00:03:54.280 --> 00:03:56.260
collect from the brain.
00:03:56.260 --> 00:03:59.350
It's really cool.
00:03:59.350 --> 00:04:00.560
So it's kind of fun.
00:04:00.560 --> 00:04:04.030
As neuroscientists, we're
trying to build a brain
00:04:04.030 --> 00:04:08.160
to analyze the data that
we collect from the brain.
00:04:08.160 --> 00:04:12.100
All right, so the cool thing
is that the math that we're
00:04:12.100 --> 00:04:16.060
looking at right now and
the kinds of neural networks
00:04:16.060 --> 00:04:18.190
that we're looking at
right now are exactly
00:04:18.190 --> 00:04:19.959
the kinds of math
and neural networks
00:04:19.959 --> 00:04:24.790
that you use to
explain the brain
00:04:24.790 --> 00:04:30.220
and to look at data in very
powerful ways, all right?
00:04:30.220 --> 00:04:32.230
So that's what
we're trying to do.
00:04:32.230 --> 00:04:35.800
So let's start by coming back
to our two-layer feed-forward
00:04:35.800 --> 00:04:38.410
network and looking
in a little bit more
00:04:38.410 --> 00:04:39.790
detail about what it does.
00:04:39.790 --> 00:04:42.850
OK, so I introduced the idea,
this two-layer feed-forward
00:04:42.850 --> 00:04:43.630
network.
00:04:43.630 --> 00:04:46.540
We have an input layer that
has a vector of firing rates,
00:04:46.540 --> 00:04:49.600
a firing rate that describes
each of those input neurons,
00:04:49.600 --> 00:04:51.040
a vector of firing rates.
00:04:51.040 --> 00:04:52.690
And, again, the output layer
has a vector, a list of numbers
00:04:52.690 --> 00:04:55.840
that describes the firing rate
of each neuron in the output
00:04:55.840 --> 00:04:57.930
layer.
00:04:57.930 --> 00:05:00.030
And the connections
between these two layers
00:05:00.030 --> 00:05:03.870
are a bunch of synapses,
synaptic weights,
00:05:03.870 --> 00:05:08.310
that we can use to transform
the firing
00:05:08.310 --> 00:05:11.010
rates at the input layer into
the firing rates at the output
00:05:11.010 --> 00:05:11.970
layer.
00:05:11.970 --> 00:05:15.330
So let's look in a little
bit more detail now
00:05:15.330 --> 00:05:20.460
at what that collection
of weights looks like.
00:05:20.460 --> 00:05:22.820
So we describe it as a matrix.
00:05:22.820 --> 00:05:24.850
That's called the weight matrix.
00:05:24.850 --> 00:05:27.750
The matrix has in it a
number for the weight
00:05:27.750 --> 00:05:31.860
from each of the input neurons
to each of the output neurons.
00:05:31.860 --> 00:05:35.790
The rows are a vector
of weights onto each
00:05:35.790 --> 00:05:36.870
of the output neurons.
00:05:36.870 --> 00:05:39.660
And we'll see in
a couple of slides
00:05:39.660 --> 00:05:45.150
that the columns are the set of
weights from each input neuron
00:05:45.150 --> 00:05:47.790
to all the output neurons.
00:05:47.790 --> 00:05:51.990
A row of this weight matrix
is a vector of weights
00:05:51.990 --> 00:05:54.480
onto one of the output neurons.
00:05:57.460 --> 00:06:01.630
All right, so we can
compute the firing rates
00:06:01.630 --> 00:06:04.630
of the neurons in
our output layer
00:06:04.630 --> 00:06:08.590
for the case of linear
neurons in the output layer
00:06:08.590 --> 00:06:11.890
simply as a matrix
product of this weight
00:06:11.890 --> 00:06:15.550
matrix times the vector
of input firing rates.
00:06:15.550 --> 00:06:18.640
And that matrix
multiplication gives us
00:06:18.640 --> 00:06:21.400
a vector that describes the
firing rates of the output
00:06:21.400 --> 00:06:22.460
layer.
00:06:22.460 --> 00:06:24.650
So let me just go through
what that looks like.
00:06:24.650 --> 00:06:28.780
If we define a column vector
of firing rates of each
00:06:28.780 --> 00:06:31.510
of the output neurons,
we can write that
00:06:31.510 --> 00:06:36.400
as the weight matrix times the
column vector of the firing
00:06:36.400 --> 00:06:38.740
rates of the input layer.
00:06:38.740 --> 00:06:42.550
We can calculate the firing
rate of the first neuron
00:06:42.550 --> 00:06:45.070
in the output layer
as the dot product
00:06:45.070 --> 00:06:48.790
of that row of the weight
matrix with that vector
00:06:48.790 --> 00:06:50.410
of firing rates, OK?
00:06:50.410 --> 00:06:54.460
And that gives us the
firing rate. v1 is then
00:06:54.460 --> 00:06:58.570
the row vector w(a=1) dotted with u.
00:06:58.570 --> 00:07:02.380
That is one particular
way of thinking
00:07:02.380 --> 00:07:07.220
about how you're calculating
the firing rates in the output
00:07:07.220 --> 00:07:07.720
layer.
00:07:07.720 --> 00:07:10.600
And it's called the dot
product interpretation
00:07:10.600 --> 00:07:12.880
of matrix multiplication,
all right?
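As a sketch of that dot product interpretation in NumPy (the weights and firing rates below are made-up numbers for illustration, not values from the lecture):

```python
import numpy as np

# Hypothetical weight matrix: W[a, b] is the weight from input
# neuron b onto output neuron a (values chosen for illustration)
W = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0],
              [0.0, 3.0, 1.0]])
u = np.array([1.0, 2.0, 3.0])  # input firing-rate vector

v = W @ u  # output firing rates for linear neurons

# Dot product interpretation: v1 is the first row of W dotted with u
assert np.isclose(v[0], W[0, :] @ u)
```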
00:07:12.880 --> 00:07:17.140
Now, there's a different
sort of complementary way
00:07:17.140 --> 00:07:18.850
of thinking about
what happens when
00:07:18.850 --> 00:07:21.970
you do this matrix
product that's also
00:07:21.970 --> 00:07:23.650
important to
understand, because it's
00:07:23.650 --> 00:07:27.910
a different way of thinking
about what's going on.
00:07:27.910 --> 00:07:32.600
We can also think about the
columns of this weight matrix.
00:07:32.600 --> 00:07:35.200
And we can think about
the weight matrix
00:07:35.200 --> 00:07:39.130
as a collection
of column vectors
00:07:39.130 --> 00:07:43.270
that we put together
into matrix form.
00:07:43.270 --> 00:07:45.930
So in this particular
network here,
00:07:45.930 --> 00:07:48.600
we can write down this
weight matrix, all right?
00:07:48.600 --> 00:07:51.510
And you can see that
this first input
00:07:51.510 --> 00:07:57.180
neuron connects to output neuron
one, so there's a one there.
00:07:57.180 --> 00:08:00.090
The first input neuron
connects to output neuron two,
00:08:00.090 --> 00:08:01.500
so there's a one there.
00:08:01.500 --> 00:08:05.070
The first input neuron does not
connect to output neuron three,
00:08:05.070 --> 00:08:08.870
so there's a zero there, OK?
00:08:08.870 --> 00:08:09.480
All right.
00:08:09.480 --> 00:08:12.360
So the columns of
the weight matrix
00:08:12.360 --> 00:08:16.800
represent the pattern
of projections from one
00:08:16.800 --> 00:08:20.280
of the input neurons to
all of the output neurons.
00:08:23.570 --> 00:08:27.200
All right, so let's
just take a look at what
00:08:27.200 --> 00:08:30.800
would happen if only one of
our input neurons was active
00:08:30.800 --> 00:08:33.650
and all the others were silent.
00:08:33.650 --> 00:08:35.110
So this neuron is active.
00:08:35.110 --> 00:08:39.716
What would the output
vector look like?
00:08:39.716 --> 00:08:41.299
What would the pattern
of firing rates
00:08:41.299 --> 00:08:46.630
look like for the output
neurons in this case?
00:08:46.630 --> 00:08:47.130
Anybody?
00:08:47.130 --> 00:08:48.290
It's straightforward.
00:08:48.290 --> 00:08:49.373
It's not a trick question.
00:08:53.110 --> 00:08:53.885
[INAUDIBLE]?
00:09:00.535 --> 00:09:01.428
AUDIENCE: So--
00:09:01.428 --> 00:09:03.720
MICHALE FEE: If this neuron
is firing and these weights
00:09:03.720 --> 00:09:04.645
are all one or zero.
00:09:07.293 --> 00:09:09.482
AUDIENCE: The one neuron, a--
00:09:09.482 --> 00:09:10.190
MICHALE FEE: Yes?
00:09:10.190 --> 00:09:10.730
This--
00:09:10.730 --> 00:09:11.240
AUDIENCE: Yeah, [INAUDIBLE].
00:09:11.240 --> 00:09:13.032
MICHALE FEE: --would
fire, this would fire,
00:09:13.032 --> 00:09:14.440
and that would not fire, right?
00:09:14.440 --> 00:09:17.060
Good.
00:09:17.060 --> 00:09:20.460
So you can write that out
as a matrix multiplication.
00:09:20.460 --> 00:09:23.300
So the firing rate
vector, in this case,
00:09:23.300 --> 00:09:28.310
would be the dot product of
this with this, this with this,
00:09:28.310 --> 00:09:29.570
and that with that.
00:09:29.570 --> 00:09:33.200
And what you would see is
that the output firing rate
00:09:33.200 --> 00:09:42.390
vector would look like this
first column of the weight
00:09:42.390 --> 00:09:43.430
matrix.
00:09:43.430 --> 00:09:45.860
So the output vector
would look like 1,
00:09:45.860 --> 00:09:50.040
1, 0 if only the first
neuron were active.
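You can check that claim numerically. In this sketch, only the first column of the weight matrix, (1, 1, 0), is taken from the example; the other entries are invented for illustration:

```python
import numpy as np

# First column (1, 1, 0) matches the example network;
# the remaining entries are illustrative guesses
W = np.array([[1, 0, 1],
              [1, 1, 0],
              [0, 1, 1]])

u = np.array([1, 0, 0])  # only the first input neuron is active

v = W @ u
# The output pattern is exactly the first column of the weight matrix
assert np.array_equal(v, W[:, 0])
```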
00:09:50.040 --> 00:09:55.320
So you can think of the
output firing rate vector
00:09:55.320 --> 00:09:59.940
as being a contribution
from neuron one--
00:09:59.940 --> 00:10:01.980
and that contribution
from neuron one
00:10:01.980 --> 00:10:05.910
is simply the first column
of the weight matrix--
00:10:05.910 --> 00:10:09.330
plus a contribution
from neuron two,
00:10:09.330 --> 00:10:12.720
which is given by the
second column of the weight
00:10:12.720 --> 00:10:16.510
matrix, and a contribution
from input neuron three,
00:10:16.510 --> 00:10:20.490
which is given by the third
column of the weight matrix,
00:10:20.490 --> 00:10:21.120
OK?
00:10:21.120 --> 00:10:26.010
So you can think of the
output firing rate vector
00:10:26.010 --> 00:10:29.670
as being a linear
combination of a contribution
00:10:29.670 --> 00:10:32.910
from the first
neuron, a contribution
00:10:32.910 --> 00:10:34.980
from the second neuron,
and a contribution
00:10:34.980 --> 00:10:36.060
from the third neuron.
00:10:36.060 --> 00:10:38.760
Does that make sense?
00:10:38.760 --> 00:10:41.970
It's a different way
of thinking about it.
00:10:41.970 --> 00:10:45.060
In the dot product
interpretation,
00:10:45.060 --> 00:10:51.210
we're asking, what is the--
00:10:51.210 --> 00:10:53.190
we're summing up
all of the weights
00:10:53.190 --> 00:10:57.150
onto neuron one
from those synapses.
00:10:57.150 --> 00:10:59.940
We're summing up all the
weights onto neuron two
00:10:59.940 --> 00:11:02.610
from those synapses and summing
up all the weights onto neuron
00:11:02.610 --> 00:11:04.240
three from those synapses.
00:11:04.240 --> 00:11:08.900
So we're doing it one
output neuron at a time.
00:11:08.900 --> 00:11:12.230
In this other interpretation
of this matrix multiplication,
00:11:12.230 --> 00:11:13.850
we're doing something different.
00:11:13.850 --> 00:11:17.780
We're asking, what is the
contribution to the output
00:11:17.780 --> 00:11:20.240
from one of the input neurons?
00:11:20.240 --> 00:11:24.620
What is the contribution to
the output from another input
00:11:24.620 --> 00:11:25.260
neuron?
00:11:25.260 --> 00:11:27.110
And what is the
contribution to the output
00:11:27.110 --> 00:11:29.270
from yet another input neuron?
00:11:29.270 --> 00:11:31.110
Does that make sense?
00:11:31.110 --> 00:11:32.900
OK.
00:11:32.900 --> 00:11:35.510
All right, so we have
a linear combination
00:11:35.510 --> 00:11:38.650
of contributions from each
of those input neurons.
00:11:41.910 --> 00:11:45.390
And that's called the outer
product interpretation.
00:11:45.390 --> 00:11:47.880
I'm not going to explain right
now why it's called that,
00:11:47.880 --> 00:11:50.470
but that's how
that's referred to.
00:11:50.470 --> 00:11:53.700
So the output pattern
is a linear combination
00:11:53.700 --> 00:11:55.360
of contributions.
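That linear-combination (column) view can be written out directly; the numbers here are arbitrary examples:

```python
import numpy as np

W = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0],
              [0.0, 3.0, 1.0]])  # illustrative weights
u = np.array([2.0, -1.0, 0.5])   # illustrative input firing rates

# Column (outer product) interpretation: the output is a sum of the
# columns of W, each scaled by the firing rate of one input neuron
v_cols = u[0] * W[:, 0] + u[1] * W[:, 1] + u[2] * W[:, 2]

# Same answer as the row-by-row dot product computation
assert np.allclose(v_cols, W @ u)
```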
00:11:55.360 --> 00:12:03.660
OK, so let's take a look at
the effect of some very simple
00:12:03.660 --> 00:12:04.830
feed-forward networks, OK?
00:12:04.830 --> 00:12:08.210
So let's just look
at a few examples.
00:12:08.210 --> 00:12:10.270
So if we have a
feed forward-- this
00:12:10.270 --> 00:12:12.940
is sort of the simplest
feed-forward network.
00:12:12.940 --> 00:12:16.710
Each neuron in the
input layer connects
00:12:16.710 --> 00:12:20.230
to one neuron in the output
layer with a weight of one.
00:12:20.230 --> 00:12:24.330
So what is the weight
matrix of this network?
00:12:24.330 --> 00:12:25.190
AUDIENCE: Identity.
00:12:25.190 --> 00:12:26.920
MICHALE FEE: It's
the identity matrix.
00:12:26.920 --> 00:12:29.820
And so the firing rate
of the output layer
00:12:29.820 --> 00:12:33.030
will be exactly the same as
the firing rates in the input
00:12:33.030 --> 00:12:33.720
layer, OK?
00:12:33.720 --> 00:12:37.920
So there's the weight matrix,
which is just the identity
00:12:37.920 --> 00:12:40.380
matrix, the firing rate.
00:12:40.380 --> 00:12:43.710
And the output layer is
just the identity matrix
00:12:43.710 --> 00:12:45.690
times the firing rate
of the input layer.
00:12:45.690 --> 00:12:49.650
And so that's equal to
the input firing rate, OK?
00:12:49.650 --> 00:12:55.100
All right, let's take a
slightly more complex network,
00:12:55.100 --> 00:12:58.280
and let's make each one of
those weights independent.
00:12:58.280 --> 00:12:59.810
They're not all
just equal to one,
00:12:59.810 --> 00:13:02.480
but they're scaled
by some constant--
00:13:02.480 --> 00:13:05.780
lambda 1, lambda
2, and lambda 3.
00:13:05.780 --> 00:13:08.040
The weight matrix
looks like this.
00:13:08.040 --> 00:13:10.400
It's a diagonal matrix,
where each of those weights
00:13:10.400 --> 00:13:13.070
is on the diagonal.
00:13:13.070 --> 00:13:16.250
And in that case, you can
see that the output firing
00:13:16.250 --> 00:13:19.160
rate is just this diagonal
matrix times the input firing
00:13:19.160 --> 00:13:20.190
rate.
00:13:20.190 --> 00:13:24.260
And you can see that the
output firing rate is just
00:13:24.260 --> 00:13:27.680
the input firing rate where each
component of the input firing
00:13:27.680 --> 00:13:29.855
rate is scaled by some constant.
00:13:32.360 --> 00:13:35.960
Pretty straightforward.
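A quick sketch of the diagonal case (the lambda values and input rates are invented):

```python
import numpy as np

lam = np.array([2.0, 0.5, 3.0])  # lambda_1, lambda_2, lambda_3 (made up)
W = np.diag(lam)                  # diagonal weight matrix
u = np.array([1.0, 4.0, 2.0])    # input firing rates

v = W @ u
# Each output rate is the matching input rate scaled by its lambda
assert np.allclose(v, lam * u)
```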
00:13:35.960 --> 00:13:42.230
Let's take a look at a case
where the weight matrix now
00:13:42.230 --> 00:13:44.660
corresponds to a
rotation matrix, OK?
00:13:44.660 --> 00:13:49.430
So we're going to let the weight
matrix look like this rotation
00:13:49.430 --> 00:13:52.100
matrix that we talked
about on Tuesday, where
00:13:52.100 --> 00:13:56.930
the diagonal elements are
the cosine of some rotation angle,
00:13:56.930 --> 00:13:59.780
and the off-diagonal elements
are plus and minus sine
00:13:59.780 --> 00:14:01.850
of the rotation angle.
00:14:01.850 --> 00:14:05.150
So you can see that this
weight matrix corresponds
00:14:05.150 --> 00:14:10.280
to this network, where the
projection from input neuron
00:14:10.280 --> 00:14:13.820
one to output neuron
one is cosine phi.
00:14:13.820 --> 00:14:16.490
Input neuron two to output
neuron two is cosine phi.
00:14:16.490 --> 00:14:21.110
And then these cross-connections
are plus and minus sine phi.
00:14:21.110 --> 00:14:23.540
OK, so what does that do?
00:14:23.540 --> 00:14:28.950
So we can see that the output
firing rate vector is just
00:14:28.950 --> 00:14:32.100
a product of this rotation
matrix times the input firing
00:14:32.100 --> 00:14:32.970
rate vector.
00:14:32.970 --> 00:14:36.860
And you can write down
each component like that.
00:14:36.860 --> 00:14:39.340
All right, so what does that do?
00:14:39.340 --> 00:14:41.310
So let's take a
particular rotation angle.
00:14:41.310 --> 00:14:44.280
We're going to take a
rotation angle of pi
00:14:44.280 --> 00:14:46.930
over 4, which is 45 degrees.
00:14:46.930 --> 00:14:49.930
That's what the weight
matrix looks like.
00:14:49.930 --> 00:14:53.010
And we can do that
multiplication
00:14:53.010 --> 00:14:58.880
to find that the output firing
rate vector looks like--
00:14:58.880 --> 00:15:01.880
one of the neurons
has a firing rate
00:15:01.880 --> 00:15:06.500
that looks like the sum of
the two input firing rates,
00:15:06.500 --> 00:15:10.310
and the other output
neuron has a firing rate
00:15:10.310 --> 00:15:14.220
that looks like the difference
between the two input firing
00:15:14.220 --> 00:15:15.463
rates.
00:15:15.463 --> 00:15:16.880
And if you look
at what this looks
00:15:16.880 --> 00:15:22.310
like in the space of
firing rates of the input
00:15:22.310 --> 00:15:26.620
layer and the output layer,
we can see what happens, OK?
00:15:26.620 --> 00:15:29.720
So what we'll often
do when we look
00:15:29.720 --> 00:15:31.970
at the behavior
of neural networks
00:15:31.970 --> 00:15:35.930
is we'll make a
plot of the firing
00:15:35.930 --> 00:15:38.600
rates of the different
neurons in the network.
00:15:38.600 --> 00:15:42.620
And what we'll often do for
simple feed-forward networks,
00:15:42.620 --> 00:15:45.230
and we'll also do this
for recurrent networks,
00:15:45.230 --> 00:15:53.890
is we'll plot the input firing
rates in the plane of u1
00:15:53.890 --> 00:15:56.010
and u2.
00:15:56.010 --> 00:16:00.670
And then we can plot the output
firing rates in the same plane.
00:16:00.670 --> 00:16:05.370
So, for example, if we
have an input state that
00:16:05.370 --> 00:16:10.080
looks like u1 equals u2,
it will be some point
00:16:10.080 --> 00:16:11.520
on this diagonal line.
00:16:14.100 --> 00:16:17.850
We can then plot the
output firing rate
00:16:17.850 --> 00:16:21.270
on this plane, v1 versus v2.
00:16:21.270 --> 00:16:26.540
And what will the output
firing rate look like?
00:16:26.540 --> 00:16:30.190
What will the firing rate of
v1 look like in this case?
00:16:34.910 --> 00:16:36.810
AUDIENCE: [INAUDIBLE]
00:16:36.810 --> 00:16:41.410
MICHALE FEE: Yeah, let's
say this is one and one.
00:16:41.410 --> 00:16:44.140
So what will the firing rate
of this neuron look like?
00:16:44.140 --> 00:16:44.852
[INAUDIBLE]?
00:16:44.852 --> 00:16:46.398
AUDIENCE: [INAUDIBLE]
00:16:46.398 --> 00:16:47.440
MICHALE FEE: What's that?
00:16:47.440 --> 00:16:48.340
AUDIENCE: [INAUDIBLE]
00:16:48.340 --> 00:16:50.410
MICHALE FEE: So the
firing rate of v1
00:16:50.410 --> 00:16:52.720
is just this quantity
right here, right?
00:16:52.720 --> 00:16:56.690
So it's u1 plus u2, right?
00:16:56.690 --> 00:17:00.920
So it's like 1
plus 1 over root 2.
00:17:00.920 --> 00:17:02.450
So it will be big.
00:17:02.450 --> 00:17:06.109
What will the firing rate
of neuron v2 look like?
00:17:06.109 --> 00:17:09.079
It'll be u2 minus u1, which is?
00:17:09.079 --> 00:17:09.810
AUDIENCE: Zero.
00:17:09.810 --> 00:17:10.700
MICHALE FEE: Zero.
00:17:10.700 --> 00:17:15.300
So it will be over here, right?
00:17:15.300 --> 00:17:20.579
So it will be that input
rotated by 45 degrees.
00:17:24.290 --> 00:17:26.089
And input down here--
00:17:26.089 --> 00:17:30.110
so the firing rate of v1
will be the sum of those two.
00:17:30.110 --> 00:17:32.220
Those two inputs
are both negative.
00:17:32.220 --> 00:17:37.940
So v1 for this input
will be big and negative.
00:17:37.940 --> 00:17:41.960
And v2 will be the
difference of u1 and u2,
00:17:41.960 --> 00:17:44.785
which for anything
on this line is?
00:17:44.785 --> 00:17:45.410
AUDIENCE: Zero.
00:17:45.410 --> 00:17:46.160
MICHALE FEE: Zero.
00:17:48.430 --> 00:17:49.840
OK.
00:17:49.840 --> 00:17:54.680
And so that input will
be rotated over to here.
00:17:54.680 --> 00:17:58.160
So you can think
of it this way--
00:17:58.160 --> 00:18:05.110
any input in this space of
u1 and u2, in the output
00:18:05.110 --> 00:18:10.880
will just be rotated by, in this
case, minus 45 degrees.
00:18:10.880 --> 00:18:15.220
So that's clockwise;
minus rotations are the clockwise ones.
00:18:15.220 --> 00:18:19.240
So you can just predict the
output firing rates simply
00:18:19.240 --> 00:18:23.170
by taking the input
firing rates in this plane
00:18:23.170 --> 00:18:26.740
and rotating them
by minus 45 degrees.
00:18:26.740 --> 00:18:28.520
All right, any
questions about that?
00:18:28.520 --> 00:18:31.610
It's very simple.
00:18:31.610 --> 00:18:34.990
So this little neural
network implements
00:18:34.990 --> 00:18:41.280
rotations of this input space.
00:18:41.280 --> 00:18:42.170
That's pretty cool.
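A sketch of that 45-degree example, using the standard 2D rotation matrix with phi = -pi/4 (sign conventions for the off-diagonal sines vary; this choice reproduces the sum-and-difference behavior described above):

```python
import numpy as np

phi = -np.pi / 4  # minus 45 degrees, i.e. a clockwise rotation
W = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])

u = np.array([1.0, 1.0])  # input on the u1 = u2 diagonal
v = W @ u

# v1 is (u1 + u2)/sqrt(2): big.  v2 is (u2 - u1)/sqrt(2): zero.
assert np.isclose(v[0], (u[0] + u[1]) / np.sqrt(2))
assert np.isclose(v[1], 0.0)
```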
00:18:51.370 --> 00:18:54.190
Why would you want a
network to do rotations?
00:18:54.190 --> 00:18:57.040
Well, this solves
exactly the problem
00:18:57.040 --> 00:18:59.890
that we were
working on last time
00:18:59.890 --> 00:19:02.440
when we were talking about
our perceptron, where we were
00:19:02.440 --> 00:19:07.450
trying to classify stimuli
that could not be separated
00:19:07.450 --> 00:19:10.330
in one dimension,
but rather, can
00:19:10.330 --> 00:19:12.050
be separated in two dimensions.
00:19:12.050 --> 00:19:14.890
So if we have
different categories--
00:19:14.890 --> 00:19:17.050
dogs and non-dogs--
00:19:17.050 --> 00:19:22.190
that can be viewed along
different dimensions--
00:19:22.190 --> 00:19:24.820
how furry they are--
00:19:24.820 --> 00:19:26.810
but can't be separated--
00:19:26.810 --> 00:19:30.640
the two categories can't be
separated from each other
00:19:30.640 --> 00:19:35.260
on the basis of just one
dimension of observation.
00:19:35.260 --> 00:19:42.580
So in this case, what we want to
do is take this space of inputs
00:19:42.580 --> 00:19:49.120
and rotate it into what
we'll call a new basis set
00:19:49.120 --> 00:19:53.620
so that now we can take the
firing rates of these output
00:19:53.620 --> 00:19:59.090
neurons and use
those to separate
00:19:59.090 --> 00:20:01.935
these different categories
from each other.
00:20:01.935 --> 00:20:02.810
Does that make sense?
00:20:07.660 --> 00:20:09.980
OK, so let me show you a
few more examples of that.
00:20:12.670 --> 00:20:15.490
So this is one way to
think about what we
00:20:15.490 --> 00:20:18.120
do when we do color vision, OK?
00:20:18.120 --> 00:20:23.320
So you know that we have
different cones in our retina
00:20:23.320 --> 00:20:26.590
that are sensitive to
different wavelengths.
00:20:26.590 --> 00:20:31.020
Most colors are combinations
of those wavelengths.
00:20:31.020 --> 00:20:35.040
So if we look at the
activity of, let's say,
00:20:35.040 --> 00:20:39.060
a cone that's sensitive to
wavelength one and the activity
00:20:39.060 --> 00:20:45.485
in a cone that's sensitive to
wavelength two, we might see--
00:20:45.485 --> 00:20:47.180
and then we look
around the world.
00:20:47.180 --> 00:20:49.370
We'll see a bunch
of different objects
00:20:49.370 --> 00:20:51.590
or a bunch of
different stimuli that
00:20:51.590 --> 00:20:54.815
activate those two different
cones in different ratios.
00:20:57.750 --> 00:21:03.120
And you might imagine that this
axis corresponds to, let's say,
00:21:03.120 --> 00:21:05.290
how much red there
is in a stimulus.
00:21:05.290 --> 00:21:07.530
This axis corresponds
to how much green
00:21:07.530 --> 00:21:08.970
there is in a stimulus.
00:21:08.970 --> 00:21:11.190
But let's say that you're
in an environment where
00:21:11.190 --> 00:21:15.460
there's some cloud of
contribution of red and green.
00:21:15.460 --> 00:21:19.530
So what would this direction
correspond to in this cloud?
00:21:24.870 --> 00:21:28.860
This direction corresponds
to more red and more green.
00:21:28.860 --> 00:21:32.465
What would that correspond to?
00:21:32.465 --> 00:21:34.693
AUDIENCE: Brown.
00:21:34.693 --> 00:21:36.610
MICHALE FEE: So what I'm
trying to get at here
00:21:36.610 --> 00:21:39.520
is that the sum of
those two is sort
00:21:39.520 --> 00:21:42.040
of the brightness of
the object, right?
00:21:42.040 --> 00:21:45.540
Something that has little
red and little green
00:21:45.540 --> 00:21:47.890
will look the same color
as something that has
00:21:47.890 --> 00:21:50.620
more red and more green, right?
00:21:50.620 --> 00:21:52.510
But what's different
about those two stimuli
00:21:52.510 --> 00:21:55.120
is that the one's
brighter than the other.
00:21:55.120 --> 00:21:57.860
The second one is brighter
than the first one.
00:21:57.860 --> 00:22:02.810
But this dimension
corresponds to what?
00:22:02.810 --> 00:22:06.560
Differences in the ratio
of those two colors, right?
00:22:06.560 --> 00:22:11.870
Sort of changes in the different
[AUDIO OUT] wavelengths,
00:22:11.870 --> 00:22:14.460
and that corresponds to color.
00:22:14.460 --> 00:22:17.780
So if we can take
this space of stimuli
00:22:17.780 --> 00:22:21.230
and rotate it such that
one axis corresponds
00:22:21.230 --> 00:22:25.490
to the sum of the two colors
and the other axis corresponds
00:22:25.490 --> 00:22:27.920
to the difference
of the two colors,
00:22:27.920 --> 00:22:31.310
then this axis will tell
you how bright it is,
00:22:31.310 --> 00:22:35.560
and this axis will tell you what
the hue is, what the color is.
00:22:35.560 --> 00:22:36.660
Does that make sense?
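A minimal sketch of that brightness/hue rotation; the cone responses here are invented numbers, and the sum/difference matrix is the same 45-degree rotation as before:

```python
import numpy as np

rg = np.array([0.6, 0.2])  # hypothetical red-cone and green-cone activity

# Rotate into a (sum, difference) basis: sum ~ brightness, diff ~ hue
R = np.array([[1.0,  1.0],
              [1.0, -1.0]]) / np.sqrt(2)
brightness, hue = R @ rg

assert np.isclose(brightness, (0.6 + 0.2) / np.sqrt(2))
assert np.isclose(hue, (0.6 - 0.2) / np.sqrt(2))
```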
00:22:36.660 --> 00:22:38.750
So there's a simple
case of where
00:22:38.750 --> 00:22:46.220
taking a rotation of an input
space, of a set of sensors,
00:22:46.220 --> 00:22:48.470
will give you
different information
00:22:48.470 --> 00:22:53.060
than you would get if you
just had one of those stimuli.
00:22:53.060 --> 00:22:58.070
If you were to just look at
the activity of the cone that's
00:22:58.070 --> 00:23:01.670
giving you a red
signal, if one object
00:23:01.670 --> 00:23:03.830
has more activity
in that cone, you
00:23:03.830 --> 00:23:07.220
don't know whether that
other object is just brighter
00:23:07.220 --> 00:23:11.220
or if it's actually more red.
00:23:11.220 --> 00:23:12.740
Does that make sense?
00:23:12.740 --> 00:23:18.770
So doing a rotation gives
us signals in single neurons
00:23:18.770 --> 00:23:20.480
that carry useful information.
00:23:20.480 --> 00:23:24.520
It can disambiguate different
kinds of information.
00:23:24.520 --> 00:23:27.740
All right, so we can use
that simple rotation matrix
00:23:27.740 --> 00:23:32.360
to perform that
kind of separation.
00:23:32.360 --> 00:23:36.100
So brightness and color.
00:23:36.100 --> 00:23:38.860
Here's another example.
00:23:38.860 --> 00:23:45.070
I didn't get to talk about this
in this class, but there are--
00:23:45.070 --> 00:23:53.400
so barn owls, they can very
exquisitely localize objects
00:23:53.400 --> 00:23:56.100
by sound.
00:23:56.100 --> 00:23:59.100
So they hunt, essentially,
at night in the dark.
00:23:59.100 --> 00:24:04.800
They can hear a mouse
scurrying around in the grass.
00:24:04.800 --> 00:24:07.710
They just listen to
that sound, and they
00:24:07.710 --> 00:24:10.950
can tell exactly where it
is, and then they dive down
00:24:10.950 --> 00:24:13.500
and catch the mouse.
00:24:13.500 --> 00:24:15.160
So how did they do that?
00:24:15.160 --> 00:24:17.130
Well, they used
timing differences
00:24:17.130 --> 00:24:21.750
to tell which way the sound
is coming from side to side,
00:24:21.750 --> 00:24:23.730
and they use
intensity differences
00:24:23.730 --> 00:24:26.200
to tell which way the sound
is coming from up and down.
00:24:26.200 --> 00:24:27.950
Now, how do you use
intensity differences?
00:24:27.950 --> 00:24:30.570
Well, one of their
ears, their right ear
00:24:30.570 --> 00:24:32.950
is pointed slightly upwards.
00:24:32.950 --> 00:24:35.710
And their left ear is
pointed slightly downwards.
00:24:35.710 --> 00:24:38.770
So when they hear a sound
that's slightly louder
00:24:38.770 --> 00:24:42.040
in the right ear and slightly
softer in the left ear,
00:24:42.040 --> 00:24:45.550
they know that it's coming
from up above, right?
00:24:45.550 --> 00:24:47.195
And if it's the
other way around,
00:24:47.195 --> 00:24:49.480
if it's slightly louder
in the left ear and softer
00:24:49.480 --> 00:24:55.550
in the right ear, they know it's
coming from below horizontal.
00:24:55.550 --> 00:24:59.060
And it's an extremely
precise system, OK?
00:24:59.060 --> 00:25:00.450
So here's an example.
00:25:00.450 --> 00:25:02.390
So if they're sitting
there listening
00:25:02.390 --> 00:25:06.030
to the intensity, the amplitude
of the sound in the left ear
00:25:06.030 --> 00:25:10.040
and the amplitude of the
sound in the right ear,
00:25:10.040 --> 00:25:14.390
some sounds will be up here with
high amplitude in both ears.
00:25:14.390 --> 00:25:16.970
Some sounds will be
over here, with more
00:25:16.970 --> 00:25:22.070
amplitude in the right ear and
less amplitude in the left ear.
00:25:22.070 --> 00:25:24.800
What does this
dimension correspond to?
00:25:24.800 --> 00:25:27.764
That dimension corresponds to?
00:25:27.764 --> 00:25:28.620
AUDIENCE: Proximity.
00:25:28.620 --> 00:25:30.390
MICHALE FEE:
Proximity or, overall,
00:25:30.390 --> 00:25:32.610
the loudness of
the sound, right?
00:25:32.610 --> 00:25:36.060
And what does this
dimension correspond to?
00:25:36.060 --> 00:25:37.650
AUDIENCE: Direction.
00:25:37.650 --> 00:25:40.060
MICHALE FEE: The difference
in intensity corresponds
00:25:40.060 --> 00:25:47.830
to the elevation of the sound
relative to the horizontal.
00:25:47.830 --> 00:25:48.330
All right?
00:25:48.330 --> 00:25:52.200
So, in fact, what happens
in the owl's brain
00:25:52.200 --> 00:25:57.540
is that these two signals
undergo a rotation to produce
00:25:57.540 --> 00:26:01.080
activity in some neurons
that's sensitive to the overall
00:26:01.080 --> 00:26:04.830
loudness and activity
in other neurons that's
00:26:04.830 --> 00:26:09.190
sensitive to the difference
between the intensity
00:26:09.190 --> 00:26:10.570
of the two sounds.
00:26:10.570 --> 00:26:14.560
It's a measure of the
elevation of the sounds.
00:26:14.560 --> 00:26:17.590
All right, so this
kind of rotation matrix
00:26:17.590 --> 00:26:22.060
is very useful for
projecting stimuli
00:26:22.060 --> 00:26:26.530
into the right dimension so
that they give useful signals.
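The rotation described here can be sketched numerically. This is a hypothetical NumPy illustration, not from the lecture, and the ear amplitude numbers are made up: a 45-degree rotation of the (left ear, right ear) amplitudes yields one axis for overall loudness and one for the intensity difference, the owl's cue for elevation.

```python
import numpy as np

# Hypothetical sketch: rotate the (left ear, right ear) amplitude axes
# by 45 degrees to separate overall loudness from the left-right
# intensity difference (the owl's proxy for elevation).
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

# A sound that is louder in the right ear than in the left ear.
ears = np.array([0.4, 1.0])            # (left amplitude, right amplitude)

loudness_axis, elevation_axis = R.T @ ears
print(loudness_axis)    # (left + right) / sqrt(2): overall loudness
print(elevation_axis)   # (right - left) / sqrt(2): positive -> from above
```

With the right ear louder, the difference coordinate comes out positive, i.e. the sound is above horizontal.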
00:26:33.320 --> 00:26:38.600
All right, so let's come back
to our matrix transformations
00:26:38.600 --> 00:26:41.060
and look in a little
bit more detail
00:26:41.060 --> 00:26:43.640
about what kinds
of transformations
00:26:43.640 --> 00:26:45.270
you can do with matrices.
00:26:45.270 --> 00:26:50.720
So we talked about
how matrices can do
00:26:50.720 --> 00:26:53.630
stretch, compression, rotation.
00:26:53.630 --> 00:26:56.660
And we're going to talk about
a new kind of transformation
00:26:56.660 --> 00:26:59.880
that they can do.
00:26:59.880 --> 00:27:05.070
So you remember we talked about
how a matrix multiplication
00:27:05.070 --> 00:27:08.580
implements a transformation
from one set of vectors
00:27:08.580 --> 00:27:10.500
into another set of vectors?
00:27:10.500 --> 00:27:14.250
And the inverse of that
matrix transforms back
00:27:14.250 --> 00:27:17.820
to the original
set of vectors, OK?
00:27:17.820 --> 00:27:19.870
So you can make
a transformation,
00:27:19.870 --> 00:27:21.900
and then you can undo
that transformation
00:27:21.900 --> 00:27:25.680
by multiplying by the
inverse of the matrix.
00:27:25.680 --> 00:27:29.760
OK, so we talked about different
kinds of transformations
00:27:29.760 --> 00:27:31.380
that you can do.
00:27:31.380 --> 00:27:33.790
So if you take the
identity matrix
00:27:33.790 --> 00:27:35.580
and you make a
small perturbation
00:27:35.580 --> 00:27:38.730
to both of the diagonal
elements, the same perturbation
00:27:38.730 --> 00:27:40.860
to both diagonal
elements, you're basically
00:27:40.860 --> 00:27:43.620
taking a set of vectors
and you're stretching them
00:27:43.620 --> 00:27:45.810
uniformly in all directions.
00:27:45.810 --> 00:27:48.900
If you make a perturbation
to just one of the components
00:27:48.900 --> 00:27:51.540
of the identity matrix,
you can take the data
00:27:51.540 --> 00:27:55.200
and stretch it in one
direction or stretch it
00:27:55.200 --> 00:27:57.060
in the other direction.
00:27:57.060 --> 00:28:01.740
If you add something
to the first component
00:28:01.740 --> 00:28:04.020
and subtract something
from the second component,
00:28:04.020 --> 00:28:06.630
you can stretch in one
direction and compress
00:28:06.630 --> 00:28:08.710
in another direction.
00:28:08.710 --> 00:28:13.447
We talked about reflections and
inversions through the origin.
00:28:13.447 --> 00:28:15.030
These are all
transformations that are
00:28:15.030 --> 00:28:18.840
produced by diagonal matrices.
00:28:18.840 --> 00:28:22.260
And the inverse of
those diagonal matrices
00:28:22.260 --> 00:28:25.800
is just one over the
diagonal elements.
00:28:25.800 --> 00:28:28.170
OK, we also talked
about rotations
00:28:28.170 --> 00:28:30.960
that you can do with
this rotation matrix.
00:28:30.960 --> 00:28:34.380
And then the inverse
of the rotation matrix
00:28:34.380 --> 00:28:39.420
is, basically, you compute the
inverse of a rotation matrix
00:28:39.420 --> 00:28:41.610
simply by computing
the rotation matrix
00:28:41.610 --> 00:28:46.260
with a minus sign for
this, using the negative
00:28:46.260 --> 00:28:47.550
of the rotation angle.
00:28:50.750 --> 00:28:54.440
And we also talked about
how a rotation matrix--
00:28:54.440 --> 00:28:56.480
for a rotation
matrix, the inverse
00:28:56.480 --> 00:28:58.280
is also equal to the transpose.
00:28:58.280 --> 00:29:01.220
And the reason is
that rotation matrices
00:29:01.220 --> 00:29:04.460
have this antisymmetry, where
the off-diagonal elements have
00:29:04.460 --> 00:29:06.720
the opposite sign.
00:29:06.720 --> 00:29:09.680
One of the things we
haven't talked about is--
00:29:09.680 --> 00:29:15.950
so we talked about how
this kind of matrix
00:29:15.950 --> 00:29:20.960
can produce a stretch along
one dimension or a stretch
00:29:20.960 --> 00:29:24.560
along the other
dimension of the vectors.
00:29:24.560 --> 00:29:30.380
But one really important
kind of transformation
00:29:30.380 --> 00:29:35.180
that we need to understand is
how you can produce stretches
00:29:35.180 --> 00:29:37.460
in an arbitrary direction, OK?
00:29:37.460 --> 00:29:42.380
So not just along the x-axis or
along the y-axis, but along any
00:29:42.380 --> 00:29:44.990
arbitrary direction.
00:29:44.990 --> 00:29:48.410
And the reason we need
to know how that works
00:29:48.410 --> 00:29:53.240
is because that formulation
of how you write down a matrix
00:29:53.240 --> 00:29:56.780
to stretch data in any
arbitrary direction
00:29:56.780 --> 00:30:01.520
is the basis of a lot of
really important data analysis
00:30:01.520 --> 00:30:04.490
methods, including
principal component
00:30:04.490 --> 00:30:07.910
analysis and other methods.
00:30:07.910 --> 00:30:09.980
So I'm going to
walk you through how
00:30:09.980 --> 00:30:12.630
to think about making
stretches in data
00:30:12.630 --> 00:30:14.180
in arbitrary dimensions.
00:30:14.180 --> 00:30:18.380
OK, so here's what we're
going to walk through.
00:30:18.380 --> 00:30:20.000
Let's say we have
a set of vectors.
00:30:20.000 --> 00:30:21.093
I just picked--
00:30:21.093 --> 00:30:22.260
I don't know, what is that--
00:30:22.260 --> 00:30:25.940
20 or so random vectors.
00:30:25.940 --> 00:30:29.840
So I just called a random
number generator 20 times
00:30:29.840 --> 00:30:33.720
and just picked
20 random vectors.
00:30:33.720 --> 00:30:40.280
And we're going to figure out
how to write down a matrix that
00:30:40.280 --> 00:30:43.640
will transform
that set of vectors
00:30:43.640 --> 00:30:48.560
into another set of vectors that
stretched along some arbitrary
00:30:48.560 --> 00:30:50.870
axis.
00:30:50.870 --> 00:30:52.700
Does that make sense?
00:30:52.700 --> 00:30:55.450
So how do we do that?
00:30:55.450 --> 00:30:59.210
And remember, we know
how to do two things.
00:30:59.210 --> 00:31:03.070
We know how to stretch a set
of vectors along the x-axis.
00:31:03.070 --> 00:31:06.680
We know how to stretch
vectors along the y-axis,
00:31:06.680 --> 00:31:09.020
and we know how to
rotate a set of vectors.
00:31:09.020 --> 00:31:11.140
So we're just going
to combine those two
00:31:11.140 --> 00:31:14.500
ingredients to produce this
stretch in an arbitrary
00:31:14.500 --> 00:31:15.560
direction.
00:31:15.560 --> 00:31:18.290
So now I've given
you the recipe--
00:31:18.290 --> 00:31:20.200
or I've given you
the ingredients.
00:31:20.200 --> 00:31:21.900
The recipe's pretty
obvious, right?
00:31:21.900 --> 00:31:25.450
We're going to take this
set of initial vectors.
00:31:25.450 --> 00:31:26.150
Good.
00:31:26.150 --> 00:31:26.650
Lina?
00:31:26.650 --> 00:31:28.960
AUDIENCE: You [INAUDIBLE].
00:31:28.960 --> 00:31:29.890
That's it.
00:31:29.890 --> 00:31:30.880
MICHALE FEE: Bingo.
00:31:30.880 --> 00:31:32.170
That's it.
00:31:32.170 --> 00:31:33.630
OK, so we're going to take--
00:31:33.630 --> 00:31:36.400
all right, so we're going to
rotate this thing 45 degrees.
00:31:36.400 --> 00:31:38.500
We take this original
set of vectors.
00:31:38.500 --> 00:31:39.550
We're going to--
00:31:39.550 --> 00:31:41.830
OK, so first of
all, the first thing
00:31:41.830 --> 00:31:45.010
we do when we want to
take a set of points
00:31:45.010 --> 00:31:47.530
and stretch it along
an arbitrary direction,
00:31:47.530 --> 00:31:49.960
we pick that angle that
we want to stretch it
00:31:49.960 --> 00:31:51.880
on-- in this case, 45 degrees.
00:31:51.880 --> 00:31:55.150
And we write down a rotation
matrix corresponding
00:31:55.150 --> 00:31:58.490
to that rotation,
corresponding to that angle.
00:31:58.490 --> 00:32:01.780
So that's the first thing we do.
00:32:01.780 --> 00:32:04.550
So we've chosen 45
degrees as the angle
00:32:04.550 --> 00:32:06.040
we want to stretch on.
00:32:06.040 --> 00:32:08.080
So now we write down
a rotation matrix
00:32:08.080 --> 00:32:11.230
for a 45-degree rotation.
00:32:11.230 --> 00:32:12.730
Then what we're
going to do is we're
00:32:12.730 --> 00:32:15.820
going to take that set
of points and we're
00:32:15.820 --> 00:32:19.540
going to rotate it
by minus 45 degrees.
00:32:24.190 --> 00:32:27.210
So how do we do that?
00:32:27.210 --> 00:32:33.000
How do we take any one of those
vectors x and rotate it by--
00:32:33.000 --> 00:32:36.800
so that rotation
matrix is for plus 45.
00:32:36.800 --> 00:32:41.322
How do we rotate that
vector by minus 45?
00:32:41.322 --> 00:32:44.150
AUDIENCE: [INAUDIBLE] multiply
it by the [INAUDIBLE]..
00:32:44.150 --> 00:32:44.900
MICHALE FEE: Good.
00:32:44.900 --> 00:32:45.760
Say it.
00:32:45.760 --> 00:32:47.510
AUDIENCE: Multiply by
the inverse of that.
00:32:47.510 --> 00:32:49.170
MICHALE FEE: Yeah, and
what's the inverse of a--
00:32:49.170 --> 00:32:49.770
AUDIENCE: Transpose.
00:32:49.770 --> 00:32:50.728
MICHALE FEE: Transpose.
00:32:50.728 --> 00:32:54.360
So we don't have to go to Matlab
and use the matrix
00:32:54.360 --> 00:32:55.470
inversion function.
00:32:55.470 --> 00:32:58.560
We can just do the transpose.
00:32:58.560 --> 00:33:03.570
OK, so we take that vector and
we multiply it by transpose.
00:33:03.570 --> 00:33:06.060
So that does a minus
45-degree rotation
00:33:06.060 --> 00:33:08.082
of all of those points.
00:33:08.082 --> 00:33:09.040
And then what do we do?
00:33:13.290 --> 00:33:14.250
Lina, you said it.
00:33:14.250 --> 00:33:14.760
Stretch it.
00:33:14.760 --> 00:33:16.708
Stretch it along?
00:33:16.708 --> 00:33:19.180
AUDIENCE: The x-axis?
00:33:19.180 --> 00:33:21.520
MICHALE FEE: The x-axis, good.
00:33:21.520 --> 00:33:25.040
What does that matrix
look like that does that?
00:33:25.040 --> 00:33:27.234
Just give me-- yup?
00:33:27.234 --> 00:33:29.165
AUDIENCE: 5, 0, 0, 1.
00:33:29.165 --> 00:33:30.040
MICHALE FEE: Awesome.
00:33:30.040 --> 00:33:30.580
That's it.
00:33:30.580 --> 00:33:36.480
So we're going to stretch
using a stretch matrix.
00:33:36.480 --> 00:33:39.220
So I use phi for
a rotation matrix,
00:33:39.220 --> 00:33:42.830
and I use lambda for a
stretch matrix, a stretch
00:33:42.830 --> 00:33:45.940
matrix along x or y.
00:33:45.940 --> 00:33:48.790
Lambda is a diagonal
matrix, which always just
00:33:48.790 --> 00:33:52.910
stretches or compresses
along the x or y direction.
00:33:52.910 --> 00:33:55.045
And then what do we do?
00:33:55.045 --> 00:33:56.320
AUDIENCE: [INAUDIBLE]
00:33:56.320 --> 00:33:57.070
MICHALE FEE: Good.
00:33:57.070 --> 00:34:00.220
By multiplying by?
00:34:00.220 --> 00:34:01.690
By this.
00:34:01.690 --> 00:34:03.700
Excellent.
00:34:03.700 --> 00:34:04.210
That's all.
00:34:04.210 --> 00:34:05.890
So how do we write this down?
00:34:05.890 --> 00:34:09.370
So, remember, here, we're sort
of marching through the recipe
00:34:09.370 --> 00:34:12.520
from left to right.
00:34:12.520 --> 00:34:16.070
When you write down matrices,
you go the other way.
00:34:16.070 --> 00:34:18.070
So when you do matrix
multiplication,
00:34:18.070 --> 00:34:22.300
you take your vector x and you
multiply it on the left side
00:34:22.300 --> 00:34:25.929
by phi transpose.
00:34:25.929 --> 00:34:28.810
And then you take that and you
multiply that on the left side
00:34:28.810 --> 00:34:30.969
by lambda.
00:34:30.969 --> 00:34:33.020
And then you take that.
00:34:33.020 --> 00:34:34.719
That now gives you these.
00:34:34.719 --> 00:34:38.210
And now to get the
final answer here,
00:34:38.210 --> 00:34:42.565
you multiply again on
the left side by phi.
00:34:42.565 --> 00:34:44.409
That's it.
00:34:44.409 --> 00:34:47.230
That's how you produce
an arbitrary stretch--
00:34:47.230 --> 00:34:49.630
a stretch or a
compression of a data
00:34:49.630 --> 00:34:52.480
in an arbitrary
direction, all right?
00:34:52.480 --> 00:34:55.449
You take the data, the vector.
00:34:55.449 --> 00:34:59.200
You multiply it by a
rotation matrix transpose,
00:34:59.200 --> 00:35:02.560
multiply it by a stretch
matrix, a diagonal matrix,
00:35:02.560 --> 00:35:07.150
and you multiply it
by a rotation matrix.
00:35:07.150 --> 00:35:09.376
Rotate, stretch, unrotate.
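The rotate-stretch-unrotate recipe can be written out as a short numerical sketch. This is a minimal NumPy version (NumPy rather than the MATLAB mentioned in lecture; the helper name stretch_along is just for illustration):

```python
import numpy as np

# Minimal sketch of the recipe: to stretch vectors by a factor s along an
# axis at angle theta, multiply by phi.T (rotate the data by -theta),
# then by the diagonal stretch matrix lambda, then by phi (rotate back).
def stretch_along(theta, s):
    c, si = np.cos(theta), np.sin(theta)
    phi = np.array([[c, -si], [si, c]])   # rotation matrix for theta
    lam = np.diag([s, 1.0])               # stretch along x by factor s
    return phi @ lam @ phi.T

# Stretch by a factor of two along the 45-degree axis, as in the lecture.
M = stretch_along(np.pi / 4, 2.0)
print(np.round(M, 3))                             # [[1.5 0.5], [0.5 1.5]]
print(np.round(M @ np.array([1.0, 1.0]), 3))      # on-axis vector doubles
print(np.round(M @ np.array([1.0, -1.0]), 3))     # orthogonal one unchanged
```

A vector along the 45-degree axis gets doubled, while a vector orthogonal to that axis is left alone, which is exactly the stretch we wanted.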
00:35:14.860 --> 00:35:18.610
So let's actually do
this for 45 degrees.
00:35:18.610 --> 00:35:22.850
So there's our rotation matrix--
00:35:22.850 --> 00:35:27.470
1, minus 1, 1, 1.
00:35:27.470 --> 00:35:31.590
The transpose is
1, 1, minus 1, 1.
00:35:31.590 --> 00:35:33.670
And here's our stretch matrix.
00:35:33.670 --> 00:35:37.320
In this case, it was
stretched by a factor of two.
00:35:37.320 --> 00:35:44.250
So we multiply x by phi
transpose, multiply by lambda,
00:35:44.250 --> 00:35:46.680
and then multiply by phi.
00:35:46.680 --> 00:35:49.320
So we can now write that down.
00:35:49.320 --> 00:35:51.600
If you just do
those three matrix
00:35:51.600 --> 00:35:54.580
multiplications-- those two
matrix multiplications, sorry,
00:35:54.580 --> 00:35:55.080
yes?
00:35:55.080 --> 00:35:56.370
One, two.
00:35:56.370 --> 00:35:58.920
Two matrix multiplications.
00:35:58.920 --> 00:36:02.797
You get a single matrix
that when you multiply it by
00:36:02.797 --> 00:36:05.175
x implements this stretch.
00:36:08.040 --> 00:36:10.000
Any questions about that?
00:36:10.000 --> 00:36:12.910
You should ask me now
if you don't understand,
00:36:12.910 --> 00:36:16.780
because I want you to be able
to do this for an arbitrary--
00:36:16.780 --> 00:36:20.980
so I'm going to
give you some angle,
00:36:20.980 --> 00:36:25.090
and I'll tell you,
construct a matrix that
00:36:25.090 --> 00:36:32.290
stretches data along a 30-degree
axis by a factor of five.
00:36:32.290 --> 00:36:35.410
You should be able to
write down that matrix.
00:36:35.410 --> 00:36:37.510
All right, so this is
what you're going to do,
00:36:37.510 --> 00:36:41.710
and that's what that matrix will
look like, something like that.
00:36:41.710 --> 00:36:48.880
Now, we can stretch these
data along a 45-degree axis
00:36:48.880 --> 00:36:50.350
by some factor.
00:36:50.350 --> 00:36:52.430
It's a factor of two here.
00:36:52.430 --> 00:36:53.500
How do we go back?
00:36:53.500 --> 00:36:57.340
How do we undo that stretch?
00:36:57.340 --> 00:37:01.780
So how do you take the inverse
of a product of a bunch
00:37:01.780 --> 00:37:03.480
of matrices like this?
00:37:03.480 --> 00:37:05.600
So the answer is very simple.
00:37:05.600 --> 00:37:10.420
If we want to take the inverse
of a product of three matrices,
00:37:10.420 --> 00:37:13.570
what we do is we just--
00:37:13.570 --> 00:37:16.790
it's, again, a product
of three matrices.
00:37:16.790 --> 00:37:20.830
It's a product of the inverse
of those three matrices,
00:37:20.830 --> 00:37:22.880
but you have to
reverse the order.
00:37:22.880 --> 00:37:25.660
So if you want to find the
inverse of matrix A times B
00:37:25.660 --> 00:37:31.270
times C, it's C inverse times
B inverse times A inverse.
00:37:31.270 --> 00:37:34.970
And you can prove that that's
correct as follows.
00:37:34.970 --> 00:37:42.520
So ABC inverse times ABC should
be the identity matrix, right?
00:37:42.520 --> 00:37:49.150
So let's replace this
by this result here.
00:37:49.150 --> 00:37:52.020
So C inverse B inverse
A inverse times
00:37:52.020 --> 00:37:54.790
ABC would be the
identity matrix.
00:37:54.790 --> 00:38:01.340
And you can see that right
here, A inverse times A is I.
00:38:01.340 --> 00:38:03.290
So you can get rid of that.
00:38:03.290 --> 00:38:06.740
B inverse times B is I.
00:38:06.740 --> 00:38:11.480
C inverse times C is I.
00:38:11.480 --> 00:38:14.900
So we just proved that
that is the correct way
00:38:14.900 --> 00:38:18.070
of taking the inverse of a
product of matrices, all right?
00:38:20.650 --> 00:38:26.050
So the inverse of
this kind of matrix
00:38:26.050 --> 00:38:30.210
that stretches data along
an arbitrary direction
00:38:30.210 --> 00:38:31.080
looks like this.
00:38:31.080 --> 00:38:37.050
It's phi transpose inverse
lambda inverse phi inverse.
00:38:37.050 --> 00:38:40.320
So let's figure out what
each one of those things is.
00:38:40.320 --> 00:38:44.100
So what is phi
transpose inverse,
00:38:44.100 --> 00:38:46.228
where phi is a rotation matrix?
00:38:46.228 --> 00:38:47.020
AUDIENCE: Just phi.
00:38:47.020 --> 00:38:48.940
MICHALE FEE: Phi, good.
00:38:48.940 --> 00:38:51.171
And what is phi inverse?
00:38:51.171 --> 00:38:52.464
AUDIENCE: [INAUDIBLE]
00:38:52.464 --> 00:38:53.760
MICHALE FEE: [INAUDIBLE].
00:38:53.760 --> 00:38:55.190
Good.
00:38:55.190 --> 00:38:58.320
And lambda inverse we'll
get to in a second.
00:38:58.320 --> 00:39:05.300
So the inverse of this
arbitrary rotated stretch matrix
00:39:05.300 --> 00:39:12.450
is just another rotated
stretch matrix, right?
00:39:12.450 --> 00:39:17.860
Where the lambda now has--
00:39:17.860 --> 00:39:21.370
lambda inverse is just
given by the inverse of each
00:39:21.370 --> 00:39:24.240
of those diagonal elements.
00:39:24.240 --> 00:39:28.815
So it's super easy to
find the inverse of one
00:39:28.815 --> 00:39:33.200
of these matrices that computes
this stretch in an arbitrary
00:39:33.200 --> 00:39:34.550
direction.
00:39:34.550 --> 00:39:36.800
You just keep the same phi.
00:39:36.800 --> 00:39:40.940
It's just phi times some
diagonal matrix times
00:39:40.940 --> 00:39:45.963
phi transpose, but the
diagonals are inverted.
00:39:45.963 --> 00:39:46.880
Does that make sense?
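As a quick numerical sketch of this claim (NumPy; the particular angle and stretch factor are just illustrative): inverting the rotated stretch matrix amounts to keeping the same phi and inverting the diagonal of lambda.

```python
import numpy as np

# The inverse of phi @ lambda @ phi.T is another rotated stretch matrix
# with the same phi, but with the diagonal elements of lambda inverted.
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])

lam = np.diag([2.0, 1.0])              # stretch by 2 along the rotated axis
M = phi @ lam @ phi.T

lam_inv = np.diag([1 / 2.0, 1 / 1.0])  # just invert the diagonals
M_inv = phi @ lam_inv @ phi.T

print(np.allclose(M_inv @ M, np.eye(2)))            # True: it undoes M
print(np.allclose(M_inv, np.linalg.inv(M)))         # True: same as inv(M)
```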
00:39:49.700 --> 00:39:51.110
All right, so
let's write it out.
00:39:51.110 --> 00:39:55.550
We're going to undo this
45-degree stretch that we just
00:39:55.550 --> 00:39:56.520
did.
00:39:56.520 --> 00:40:02.060
We're going to do it by
rotating, stretching by 1/2
00:40:02.060 --> 00:40:04.520
instead of stretching by two.
00:40:04.520 --> 00:40:09.060
So you can see that compresses
now along the x-axis.
00:40:09.060 --> 00:40:10.790
And then we rotate
back, and we're back
00:40:10.790 --> 00:40:14.380
to our original data.
00:40:14.380 --> 00:40:17.110
Any questions about that?
00:40:17.110 --> 00:40:19.570
It's really easy,
as long as you just
00:40:19.570 --> 00:40:25.000
think through what you're doing
as you go through those steps,
00:40:25.000 --> 00:40:25.870
all right?
00:40:25.870 --> 00:40:26.970
Any questions about that?
00:40:31.200 --> 00:40:32.410
OK.
00:40:32.410 --> 00:40:32.910
Wow.
00:40:37.910 --> 00:40:38.690
All right.
00:40:38.690 --> 00:40:41.090
So you can actually
just write those down
00:40:41.090 --> 00:40:46.100
and compute the
single matrix that
00:40:46.100 --> 00:40:55.710
implements this compression
along that 45-degree axis, OK?
00:40:55.710 --> 00:40:56.210
All right.
00:41:00.040 --> 00:41:02.470
So let me just show
you one other example.
00:41:02.470 --> 00:41:04.930
And I'll show you
something interesting
00:41:04.930 --> 00:41:09.680
that happens if you construct
a matrix that instead
00:41:09.680 --> 00:41:13.400
of stretching along a
45-degree axis does compression
00:41:13.400 --> 00:41:16.130
along a 45-degree axis.
00:41:16.130 --> 00:41:18.690
So here's our original data.
00:41:18.690 --> 00:41:25.155
Let's take that data and
rotate it by plus 45 degrees.
00:41:28.100 --> 00:41:33.720
Multiply by lambda-- that
compresses along the x-axis--
00:41:33.720 --> 00:41:39.150
and then rotate by
minus 45 degrees.
00:41:39.150 --> 00:41:44.670
So here's an example where we
can take data and compress it
00:41:44.670 --> 00:41:48.750
along an axis of minus
45 degrees, all right?
00:41:48.750 --> 00:41:50.130
So you can write this down.
00:41:50.130 --> 00:41:52.440
So we're going to say
we're going to compress
00:41:52.440 --> 00:41:54.630
along a minus 45 degree axis.
00:41:54.630 --> 00:41:57.450
We write down phi of minus 45.
00:41:57.450 --> 00:42:00.453
Notice that when you do this
compression or stretching,
00:42:00.453 --> 00:42:02.370
there are different ways
you can do it, right?
00:42:02.370 --> 00:42:03.930
You can take the data.
00:42:03.930 --> 00:42:08.190
You can rotate it this way and
then squish along this axis.
00:42:08.190 --> 00:42:12.090
Or you could rotate it this
way and squish along this axis,
00:42:12.090 --> 00:42:12.590
right?
00:42:15.770 --> 00:42:18.277
So there are choices
for how you do it.
00:42:18.277 --> 00:42:19.860
But in the end,
you're going to end up
00:42:19.860 --> 00:42:23.460
with the same matrix that
does all of those equivalent
00:42:23.460 --> 00:42:24.420
transformations.
00:42:24.420 --> 00:42:25.750
OK, so here we are.
00:42:25.750 --> 00:42:27.000
We're going to write this out.
00:42:27.000 --> 00:42:28.458
So we're writing
down a matrix that
00:42:28.458 --> 00:42:32.730
produces this compression
along a minus 45-degree axis.
00:42:32.730 --> 00:42:34.770
So there's phi of minus 45.
00:42:34.770 --> 00:42:37.720
There's lambda, a
compression along the x-axis.
00:42:37.720 --> 00:42:41.950
So here, it's 0.2, 0, 0, 1.
00:42:41.950 --> 00:42:44.250
And here's the phi transpose.
00:42:44.250 --> 00:42:52.310
So you write all that out, and
you get 0.6, 0.4, 0.4, 0.6.
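As a numerical check of this worked example, here's a NumPy sketch; multiplying the three matrices out gives the symmetric result [[0.6, 0.4], [0.4, 0.6]]:

```python
import numpy as np

# Compression by a factor of 0.2 along the minus-45-degree axis:
# M = phi(-45) @ lambda @ phi(-45).T
theta = -np.pi / 4
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])
lam = np.diag([0.2, 1.0])            # compress along x by a factor of 5

M = phi @ lam @ phi.T
print(np.round(M, 3))                # [[0.6 0.4], [0.4 0.6]]
```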
00:42:52.310 --> 00:42:53.540
Let me show you one more.
00:42:56.240 --> 00:43:03.980
What happens if we accidentally
take this data, we rotate it,
00:43:03.980 --> 00:43:09.068
and then we squish
the data to zero?
00:43:09.068 --> 00:43:10.496
Yes?
00:43:10.496 --> 00:43:16.210
AUDIENCE: [INAUDIBLE]
00:43:16.210 --> 00:43:17.350
MICHALE FEE: It doesn't.
00:43:17.350 --> 00:43:18.340
You can do either one.
00:43:21.780 --> 00:43:22.490
Let me go back.
00:43:32.940 --> 00:43:34.690
Let me just go back
to the very first one.
00:43:37.680 --> 00:43:42.390
So here, we rotated
clockwise and then
00:43:42.390 --> 00:43:46.020
stretched along the
x-axis and then unrotated.
00:43:46.020 --> 00:43:51.930
We could have taken these
data, rotated counterclockwise,
00:43:51.930 --> 00:43:56.695
stretched along the y-axis,
and then rotated back, right?
00:43:56.695 --> 00:43:57.570
Does that make sense?
00:44:01.240 --> 00:44:03.070
You'll still get
the same answer.
00:44:03.070 --> 00:44:07.750
You'll still get the same
answer for this matrix here.
00:44:11.940 --> 00:44:13.230
OK, now watch this.
00:44:19.560 --> 00:44:23.120
What happens if we take
these data, we rotate them,
00:44:23.120 --> 00:44:29.650
and then we compress
data all the way to zero?
00:44:29.650 --> 00:44:32.660
So by compressing
the data to a line,
00:44:32.660 --> 00:44:34.820
we're multiplying it by zero.
00:44:34.820 --> 00:44:40.440
We put a zero in this element of
the stretch matrix, all right?
00:44:40.440 --> 00:44:41.450
And what happens?
00:44:41.450 --> 00:44:46.120
The data get compressed
right to zero, OK?
00:44:46.120 --> 00:44:47.360
And then we can rotate back.
00:44:47.360 --> 00:44:49.460
So we've taken these data.
00:44:49.460 --> 00:44:53.150
We can write down a matrix
that takes those data
00:44:53.150 --> 00:45:00.310
and squishes them to zero
along some arbitrary direction.
00:45:00.310 --> 00:45:08.510
Now, can we take those data and
go back to the original data?
00:45:08.510 --> 00:45:10.220
Can we write down
a transformation
00:45:10.220 --> 00:45:13.310
that takes those and goes
back to the original data?
00:45:13.310 --> 00:45:15.119
Why not?
00:45:15.119 --> 00:45:16.877
AUDIENCE: Lambda
doesn't [INAUDIBLE]..
00:45:16.877 --> 00:45:17.960
MICHALE FEE: Say it again.
00:45:17.960 --> 00:45:19.340
AUDIENCE: Lambda
doesn't [INAUDIBLE]..
00:45:19.340 --> 00:45:20.090
MICHALE FEE: Good.
00:45:20.090 --> 00:45:22.412
What's another way
to think about that?
00:45:22.412 --> 00:45:24.180
AUDIENCE: We've
lost [INAUDIBLE]..
00:45:24.180 --> 00:45:26.310
MICHALE FEE: You've
lost that information.
00:45:26.310 --> 00:45:30.990
So in order to go back from
here to the original data,
00:45:30.990 --> 00:45:35.280
you have to have information
somewhere here that tells you
00:45:35.280 --> 00:45:40.240
how far out to stretch it
again when you try to go back.
00:45:40.240 --> 00:45:42.160
But in this case, we've
compressed everything
00:45:42.160 --> 00:45:46.260
to a line, and so
there's no information
00:45:46.260 --> 00:45:48.140
how to go back to
the original data.
00:45:51.610 --> 00:45:54.700
And how do you know
if you've done this?
00:45:54.700 --> 00:45:58.195
Well, you can take a look at
this matrix that you created.
00:46:00.720 --> 00:46:03.210
So let's say somebody
gave you this matrix.
00:46:03.210 --> 00:46:05.810
How would you tell
whether you could get
00:46:05.810 --> 00:46:07.100
back to the original data?
00:46:09.660 --> 00:46:11.890
Any ideas?
00:46:11.890 --> 00:46:13.042
Abiba?
00:46:13.042 --> 00:46:14.260
AUDIENCE: [INAUDIBLE]
00:46:14.260 --> 00:46:15.010
MICHALE FEE: Good.
00:46:15.010 --> 00:46:16.177
You look at the determinant.
00:46:16.177 --> 00:46:19.480
So if you calculate the
determinant of this matrix,
00:46:19.480 --> 00:46:21.100
the determinant is zero.
00:46:21.100 --> 00:46:23.620
And as soon as you see
a zero determinant,
00:46:23.620 --> 00:46:27.100
you know right away
that you can't go back.
00:46:27.100 --> 00:46:28.840
After you've made
this transformation,
00:46:28.840 --> 00:46:32.320
you can't go back to
the original data.
00:46:32.320 --> 00:46:36.010
And we're going to get into a
little more detail about why
00:46:36.010 --> 00:46:39.040
that is and what that means.
00:46:39.040 --> 00:46:43.660
And the reason here is that the
determinant of lambda is zero.
00:46:43.660 --> 00:46:46.780
The determinant of
a product of matrices
00:46:46.780 --> 00:46:49.210
like this is the product
of the determinants.
00:46:49.210 --> 00:46:51.940
And in this case, the
determinant of the lambda
00:46:51.940 --> 00:46:55.510
matrix is zero, and so the
determinant of the product
00:46:55.510 --> 00:46:57.910
is zero, OK?
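The determinant test can be sketched numerically as well (a NumPy illustration of the point above):

```python
import numpy as np

# Squashing one axis to zero makes lambda singular, so the whole rotated
# product is singular too: det(phi @ lam @ phi.T) equals
# det(phi) * det(lam) * det(phi.T), and det(lam) = 0.
theta = np.pi / 4
c, s = np.cos(theta), np.sin(theta)
phi = np.array([[c, -s], [s, c]])
lam = np.diag([0.0, 1.0])            # compress the x component to zero

M = phi @ lam @ phi.T
det_M = np.linalg.det(M)
print(det_M)                         # ~0 (up to floating point)
print(np.isclose(det_M,
                 np.linalg.det(phi) * np.linalg.det(lam)))   # True
```

A zero determinant is the flag that the transformation cannot be undone.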
00:46:57.910 --> 00:47:02.930
All right, so now let's
talk about basis sets.
00:47:02.930 --> 00:47:07.230
All right, so we can think of
vectors in abstract directions.
00:47:07.230 --> 00:47:11.190
So if I hold my arm
out here and tell you
00:47:11.190 --> 00:47:13.220
this is a vector--
there's the origin.
00:47:13.220 --> 00:47:15.390
The vector's pointing
in that direction.
00:47:15.390 --> 00:47:19.020
You don't need a
coordinate system
00:47:19.020 --> 00:47:21.540
to know which way I'm pointing.
00:47:21.540 --> 00:47:25.800
I don't need to tell
you my arm is pointing
00:47:25.800 --> 00:47:28.470
80 centimeters in
that direction and 40
00:47:28.470 --> 00:47:30.938
centimeters in that
direction and 10 centimeters
00:47:30.938 --> 00:47:31.980
in that direction, right?
00:47:31.980 --> 00:47:34.200
You don't need a
coordinate system
00:47:34.200 --> 00:47:38.930
to know which way
I'm pointing, right?
00:47:38.930 --> 00:47:44.870
But if I want to
quantify that vector so
00:47:44.870 --> 00:47:47.690
that-- if you want to quantify
that vector so that you can
00:47:47.690 --> 00:47:50.780
maybe tell somebody else
precisely which direction I'm
00:47:50.780 --> 00:47:55.890
pointing, you need to write
down those numbers, OK?
00:47:55.890 --> 00:48:00.280
So you can think of vectors
in abstract directions,
00:48:00.280 --> 00:48:05.040
but if you want to actually
quantify it or write it down,
00:48:05.040 --> 00:48:07.410
you need to choose
a coordinate system.
00:48:07.410 --> 00:48:10.170
And so to do this,
you choose a set
00:48:10.170 --> 00:48:13.890
of vectors, special
vectors, called a basis set.
00:48:13.890 --> 00:48:16.590
And now we just say,
here's a vector.
00:48:16.590 --> 00:48:21.510
How much is it pointing in
that direction, that direction,
00:48:21.510 --> 00:48:22.890
and that direction?
00:48:22.890 --> 00:48:24.870
And that's called a basis set.
00:48:24.870 --> 00:48:28.230
So we can write
down our vector now
00:48:28.230 --> 00:48:32.490
as a set of three numbers
that simply tell us
00:48:32.490 --> 00:48:35.520
how far that vector
is overlapped
00:48:35.520 --> 00:48:39.810
with three other vectors
that form the basis set.
00:48:39.810 --> 00:48:41.430
So the standard
way of doing this
00:48:41.430 --> 00:48:47.080
is to describe a vector as a
component in the x direction,
00:48:47.080 --> 00:48:51.400
which is a vector 1, 0, 0, sort
of in the standard notation;
00:48:51.400 --> 00:48:53.880
a component in the y
direction, which is 0,
00:48:53.880 --> 00:48:58.380
1, 0; and a component in
the z direction, 0, 0, 1.
00:48:58.380 --> 00:49:04.920
So we can write those vectors
as standard basis vectors.
00:49:04.920 --> 00:49:07.260
The numbers x, y,
and z here are called
00:49:07.260 --> 00:49:09.150
the coordinates of the vector.
00:49:09.150 --> 00:49:13.950
And the vectors e1, e2, and e3
are called the basis vectors.
00:49:13.950 --> 00:49:16.380
And this is how you
would write that down
00:49:16.380 --> 00:49:18.660
for a three-dimensional
vector, OK?
00:49:18.660 --> 00:49:20.640
Again, the little
hat here denotes
00:49:20.640 --> 00:49:25.600
that those are unit vectors
that have a length one.
00:49:25.600 --> 00:49:27.680
All right, so in order
to describe an arbitrary
00:49:27.680 --> 00:49:30.770
vector in a space
of n real numbers,
00:49:30.770 --> 00:49:36.620
Rn, the basis vectors each
need to have n numbers.
00:49:36.620 --> 00:49:39.410
And in order to describe an
arbitrary vector in that space,
00:49:39.410 --> 00:49:42.710
you need to have
n basis vectors.
00:49:42.710 --> 00:49:44.570
You need to have--
00:49:44.570 --> 00:49:47.630
in n dimensions, you need
to have n basis vectors,
00:49:47.630 --> 00:49:52.830
and each one of those basis vectors
has to have n numbers in it.
00:49:52.830 --> 00:49:55.130
So these vectors here--
00:49:55.130 --> 00:49:59.435
1, 0, 0; 0, 1, 0; and 0, 0, 1--
are called the standard basis.
00:50:03.120 --> 00:50:06.120
And each one of these vectors
has one element that's one
00:50:06.120 --> 00:50:07.200
and the rest are zero.
00:50:07.200 --> 00:50:08.340
That's the standard basis.
00:50:12.720 --> 00:50:16.450
The standard basis
has the property
00:50:16.450 --> 00:50:20.470
that any one of those vectors
dotted into itself is one.
00:50:20.470 --> 00:50:22.060
That's because
they're unit vectors.
00:50:22.060 --> 00:50:23.440
They have length one.
00:50:23.440 --> 00:50:28.960
So e sub i dot e sub i is the
length squared of the i-th vector.
00:50:28.960 --> 00:50:32.340
And if the length is one, then
the length squared is one.
00:50:32.340 --> 00:50:36.210
Each vector is orthogonal
to all the other vectors.
00:50:36.210 --> 00:50:41.440
That means that e1 dot e2
is zero, and e1 dot e3 is zero,
00:50:41.440 --> 00:50:43.770
and e2 dot e3 is zero.
00:50:43.770 --> 00:50:49.350
You can write that down as e sub i
dot e sub j equals zero for i
00:50:49.350 --> 00:50:52.580
not equal to j.
00:50:52.580 --> 00:50:54.470
You can write all
of those properties
00:50:54.470 --> 00:50:56.420
down in one equation--
00:50:56.420 --> 00:51:00.920
e sub i dot e sub
j equals delta i j.
00:51:00.920 --> 00:51:05.950
Delta i j is what's called
the Kronecker delta function.
00:51:05.950 --> 00:51:09.800
The Kronecker delta function is
a one if i equals j and a zero
00:51:09.800 --> 00:51:13.430
if i is not equal to j, OK?
00:51:13.430 --> 00:51:16.730
So it's a very compact way
of writing down this property
00:51:16.730 --> 00:51:19.070
that each vector
is a unit vector
00:51:19.070 --> 00:51:23.370
and each vector is orthogonal
to all the other vectors.
00:51:23.370 --> 00:51:28.140
And the set with that property
is called an orthonormal
00:51:28.140 --> 00:51:28.790
basis set.
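[The orthonormality property just stated is easy to check numerically. Here is a minimal NumPy sketch for the standard basis of R3; the Python translation is the editor's, since the course's examples are in MATLAB.]

```python
import numpy as np

# Standard basis of R^3: each e_i has a single 1 and zeros elsewhere.
E = np.eye(3)  # columns are e1, e2, e3

# All pairwise dot products e_i . e_j at once: this Gram matrix
# should equal the Kronecker delta, i.e. the identity matrix.
gram = E.T @ E
print(gram)  # identity matrix: ones on the diagonal, zeros elsewhere
```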
00:51:31.700 --> 00:51:37.600
All right, now, the standard
basis is not the only basis--
00:51:37.600 --> 00:51:39.110
sorry.
00:51:39.110 --> 00:51:41.850
I'm trying to do
x, y, and z here.
00:51:41.850 --> 00:51:45.510
So if you have x,
y, and z, that's
00:51:45.510 --> 00:51:48.690
not the only
orthonormal basis set.
00:51:48.690 --> 00:51:54.480
Any basis set that is a
rotation of those three vectors
00:51:54.480 --> 00:51:57.880
is also an orthonormal basis.
00:51:57.880 --> 00:52:02.720
Let's write down two other
orthogonal unit vectors.
00:52:02.720 --> 00:52:06.500
We can write down our
vector v in this other basis
00:52:06.500 --> 00:52:08.760
set as follows.
00:52:08.760 --> 00:52:13.910
We just take our vector v.
We can plot the basis vectors
00:52:13.910 --> 00:52:15.650
in this other basis.
00:52:15.650 --> 00:52:20.370
And we can simply project v
onto those other basis vectors.
00:52:20.370 --> 00:52:26.540
So we can project v onto f1,
and we can project v onto f2.
00:52:26.540 --> 00:52:32.030
So we can write v as a sum of
a vector in the direction of f1
00:52:32.030 --> 00:52:33.890
and a vector in the
direction of f2.
00:52:36.420 --> 00:52:42.720
You can write down this vector
v in this different basis set
00:52:42.720 --> 00:52:45.580
as a vector with two components.
00:52:45.580 --> 00:52:48.270
This is two dimensional.
00:52:48.270 --> 00:52:50.180
This is R2.
00:52:50.180 --> 00:52:53.010
You can write it down as
a two-component vector--
00:52:53.010 --> 00:52:56.460
v dot f1 and v dot f2.
00:52:56.460 --> 00:52:59.460
So that's a simple intuition
for what [AUDIO OUT]
00:52:59.460 --> 00:53:01.050
in two dimensions.
00:53:01.050 --> 00:53:05.370
We're going to develop the
formalism for doing this
00:53:05.370 --> 00:53:07.100
in arbitrary dimensions, OK?
00:53:07.100 --> 00:53:09.620
And it's very simple.
00:53:09.620 --> 00:53:14.100
All right, these
components here are
00:53:14.100 --> 00:53:20.240
called the coordinates of this
vector in the basis f.
00:53:20.240 --> 00:53:26.360
All right, now, basis
sets, or basis vectors,
00:53:26.360 --> 00:53:29.000
don't have to be
orthogonal to each other,
00:53:29.000 --> 00:53:31.750
and they don't
have to be normal.
00:53:31.750 --> 00:53:33.980
They don't have
to be unit vectors.
00:53:33.980 --> 00:53:37.220
You can write down
an arbitrary vector
00:53:37.220 --> 00:53:41.570
as a sum of
components that aren't
00:53:41.570 --> 00:53:43.350
orthogonal to each other.
00:53:43.350 --> 00:53:45.080
So you can write
down this vector v
00:53:45.080 --> 00:53:50.510
as a sum of a component
here in the f1 direction
00:53:50.510 --> 00:53:53.100
and a component in
the f2 direction,
00:53:53.100 --> 00:53:56.330
even if f1 and f2 are not
orthogonal to each other
00:53:56.330 --> 00:53:59.100
and even if they're
not unit vectors.
00:53:59.100 --> 00:54:02.930
So, again, v is expressed
as a linear combination
00:54:02.930 --> 00:54:05.360
of a vector in the f1
direction and a vector
00:54:05.360 --> 00:54:07.760
in the f2 direction.
00:54:07.760 --> 00:54:12.400
OK, so let's take a
vector and decompose it
00:54:12.400 --> 00:54:15.420
into an arbitrary
basis set f1 and f2.
00:54:18.120 --> 00:54:22.510
So v equals c1 f1 plus c2 f2.
00:54:22.510 --> 00:54:24.560
The coefficients here are
called the coordinates
00:54:24.560 --> 00:54:27.020
of the vector in this basis.
00:54:27.020 --> 00:54:30.710
And the vector v sub f--
00:54:30.710 --> 00:54:39.620
these numbers, c1 and c2, when
combined into this vector,
00:54:39.620 --> 00:54:44.840
is called the coordinate
vector of v in the basis f1
00:54:44.840 --> 00:54:46.880
and f2, OK?
00:54:46.880 --> 00:54:47.870
Does that make sense?
00:54:47.870 --> 00:54:49.440
Just some terminology.
00:54:52.630 --> 00:54:56.440
OK, so let's define
this basis, f1 and f2.
00:54:56.440 --> 00:55:00.550
We just pick two vectors,
an arbitrary two vectors.
00:55:00.550 --> 00:55:05.740
And I'll explain later that not
all choices of vectors work,
00:55:05.740 --> 00:55:08.030
but most of them do.
00:55:08.030 --> 00:55:11.080
So here are two vectors that
we can choose as a basis--
00:55:11.080 --> 00:55:17.105
so 1, 3, which is sort of
like this, and minus 2, 1
00:55:17.105 --> 00:55:17.980
is kind of like that.
00:55:22.442 --> 00:55:24.150
And we're going to
write down this vector
00:55:24.150 --> 00:55:26.070
v in this new basis.
00:55:26.070 --> 00:55:30.250
So we have a vector v that's
3, 5 in the standard basis,
00:55:30.250 --> 00:55:35.250
and we're going to rewrite it
in this new basis, all right?
00:55:35.250 --> 00:55:37.680
So we're going to find the
vector coordinates of v
00:55:37.680 --> 00:55:39.150
in the new basis.
00:55:39.150 --> 00:55:40.840
So we're going to
do this as follows.
00:55:40.840 --> 00:55:43.650
We're going to write v as a
linear combination of these two
00:55:43.650 --> 00:55:45.240
basis vectors.
00:55:45.240 --> 00:55:49.580
So c1 times f1--
00:55:49.580 --> 00:55:53.060
1, 3-- plus c2 times f2--
00:55:53.060 --> 00:55:56.240
minus 2, 1-- is equal to 3, 5.
00:55:56.240 --> 00:55:57.680
That make sense?
00:55:57.680 --> 00:55:58.370
So what is that?
00:55:58.370 --> 00:56:04.010
That is just a system
of equations, right?
00:56:04.010 --> 00:56:08.570
And what we're trying to
do is solve for c1 and c2.
00:56:08.570 --> 00:56:09.470
That's it.
00:56:09.470 --> 00:56:13.010
So we already did this
problem in the last lecture.
00:56:16.420 --> 00:56:18.440
So we have this
system of equations.
00:56:18.440 --> 00:56:23.060
We can write this down in the
following matrix notation.
00:56:23.060 --> 00:56:29.280
F times vf-- vf is
just c1 and c2--
00:56:29.280 --> 00:56:31.305
equals v. So there's F--
00:56:31.305 --> 00:56:32.970
1, 3; minus 2, 1.
00:56:32.970 --> 00:56:36.030
Those are our two basis vectors.
00:56:36.030 --> 00:56:41.130
Times c1 c2-- the vector
c1, c2-- is equal to 3, 5.
00:56:41.130 --> 00:56:43.620
And we solve for vf.
00:56:43.620 --> 00:56:46.080
In other words, we
solve for c1 and c2
00:56:46.080 --> 00:56:56.540
simply by multiplying v by
the inverse of this matrix F.
00:56:56.540 --> 00:57:02.820
So the coordinate vector
in this new basis set
00:57:02.820 --> 00:57:06.810
is just the old vector
times f inverse.
00:57:06.810 --> 00:57:08.430
And what is f inverse?
00:57:08.430 --> 00:57:16.750
F inverse is just the inverse of
the matrix that has the basis vectors
00:57:16.750 --> 00:57:18.175
as its columns.
00:57:24.330 --> 00:57:29.730
So the coordinates of this
vector in this new basis set
00:57:29.730 --> 00:57:35.260
are given by f inverse times v.
We can find the inverse of f.
00:57:35.260 --> 00:57:40.690
So if that's our f, we can
calculate the inverse of that.
00:57:40.690 --> 00:57:44.050
Remember, you flip
the diagonal elements.
00:57:44.050 --> 00:57:47.020
You multiply the
off-diagonals by minus 1,
00:57:47.020 --> 00:57:49.930
and you divide by
the determinant.
00:57:49.930 --> 00:58:01.000
So F inverse is this, F inverse times
v is that, and v sub f is just 13/7
00:58:01.000 --> 00:58:04.380
and minus 4/7.
00:58:04.380 --> 00:58:08.550
So that's just a different
way of writing v.
00:58:08.550 --> 00:58:10.710
So there's v in
the standard basis.
00:58:10.710 --> 00:58:15.210
There's v in this
new basis, all right?
00:58:15.210 --> 00:58:21.890
And all you do to go
from the standard basis
00:58:21.890 --> 00:58:25.520
to any arbitrary new basis
is multiply the vector
00:58:25.520 --> 00:58:26.270
by f inverse.
00:58:33.800 --> 00:58:38.550
And when you're actually
doing this in Matlab,
00:58:38.550 --> 00:58:39.940
this is really simple.
00:58:39.940 --> 00:58:43.800
You just write down
a matrix F that has
00:58:43.800 --> 00:58:46.530
the basis vectors in the columns.
00:58:46.530 --> 00:58:49.620
You just use the matrix
inverse function,
00:58:49.620 --> 00:58:52.710
and then you multiply
that by the data matrix,
00:58:52.710 --> 00:58:54.300
by the data vector.
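[The MATLAB recipe just described, put the basis vectors in the columns of F, invert, multiply, can be sketched in NumPy as follows, using the worked numbers from the example above:]

```python
import numpy as np

# Basis vectors from the example go in the columns of F.
F = np.array([[1.0, -2.0],
              [3.0,  1.0]])   # f1 = (1, 3), f2 = (-2, 1)

v = np.array([3.0, 5.0])      # v in the standard basis

# Coordinate vector of v in the new basis: v_f = F^{-1} v.
v_f = np.linalg.inv(F) @ v
print(v_f)                    # [13/7, -4/7]

# Multiplying by F transforms back to the standard basis.
print(F @ v_f)                # [3, 5]
```

[In practice np.linalg.solve(F, v) is preferred over computing the explicit inverse, but inv mirrors the MATLAB step described in the lecture.]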
00:58:54.300 --> 00:58:58.490
All right, so I'm just
going to summarize again.
00:58:58.490 --> 00:59:02.060
In order to find the coordinate
vector for v in this new basis,
00:59:02.060 --> 00:59:05.780
you construct a matrix
F, whose columns
00:59:05.780 --> 00:59:09.000
are just the elements
of the basis vectors.
00:59:09.000 --> 00:59:11.720
So if you have
two basis vectors,
00:59:11.720 --> 00:59:14.600
it's a 2 by 2-- remember, each
of those basis vectors--
00:59:14.600 --> 00:59:16.850
In two dimensions, there
are two basis vectors.
00:59:16.850 --> 00:59:20.180
Each has two numbers, so
this is a 2 by 2 matrix.
00:59:20.180 --> 00:59:24.200
In n dimensions, you
have n basis vectors.
00:59:24.200 --> 00:59:26.640
Each of the basis
vectors has n numbers.
00:59:26.640 --> 00:59:31.440
And so this matrix F is an
n by n matrix, all right?
00:59:31.440 --> 00:59:38.730
You know that you can write down
v as this basis times v sub f.
00:59:38.730 --> 00:59:41.310
You solve for v sub f by
multiplying both sides
00:59:41.310 --> 00:59:42.840
by f inverse, all right?
00:59:42.840 --> 00:59:45.720
That performs what's
called a change of basis.
00:59:50.100 --> 00:59:54.670
Now, that only works
if f has an inverse.
00:59:54.670 --> 00:59:59.550
So if you're going to choose
a new basis to write down
00:59:59.550 --> 01:00:02.250
your vector, you have to
be careful to pick one
01:00:02.250 --> 01:00:04.320
that has an inverse, all right?
01:00:04.320 --> 01:00:05.820
And I want to show
you what it looks
01:00:05.820 --> 01:00:08.640
like when you pick a basis
that doesn't have an inverse
01:00:08.640 --> 01:00:10.110
and what that means.
01:00:10.110 --> 01:00:14.620
All right, and that gets to the
idea of linear independence.
01:00:14.620 --> 01:00:20.140
All right, so, remember I said
that if in n dimensions, in Rn,
01:00:20.140 --> 01:00:25.390
in order to have a basis in Rn,
you have certain requirements?
01:00:25.390 --> 01:00:26.990
Not any vectors will work.
01:00:26.990 --> 01:00:29.920
So let's take a look
at these vectors.
01:00:29.920 --> 01:00:32.800
Will those work to describe an--
01:00:32.800 --> 01:00:35.890
will that basis set work
to describe an arbitrary
01:00:35.890 --> 01:00:37.435
vector in three dimensions?
01:00:37.435 --> 01:00:38.050
No?
01:00:38.050 --> 01:00:39.913
Why not?
01:00:39.913 --> 01:00:45.068
AUDIENCE: [INAUDIBLE] vectors,
so if you're [INAUDIBLE]..
01:00:45.068 --> 01:00:45.860
MICHALE FEE: Right.
01:00:45.860 --> 01:00:48.950
So the problem is in which
coordinate, which axis?
01:00:48.950 --> 01:00:49.700
AUDIENCE: Z-axis.
01:00:49.700 --> 01:00:50.700
MICHALE FEE: The z-axis.
01:00:50.700 --> 01:00:54.020
You can see that you have zeros
in all three of those vectors,
01:00:54.020 --> 01:00:56.690
OK?
01:00:56.690 --> 01:00:59.720
You can't describe any
vector with this basis
01:00:59.720 --> 01:01:03.169
that has a non-zero
component in the z direction.
01:01:08.710 --> 01:01:11.700
And the reason is that
any linear combination
01:01:11.700 --> 01:01:16.700
of these three vectors will
always lie in the xy plane.
01:01:16.700 --> 01:01:19.310
So you can't describe
any vector here
01:01:19.310 --> 01:01:25.720
that has a non-zero z
component, all right?
01:01:25.720 --> 01:01:28.330
So what we say is that
this set of vectors
01:01:28.330 --> 01:01:31.910
doesn't span all of R3.
01:01:31.910 --> 01:01:36.830
It only spans the
xy plane, which
01:01:36.830 --> 01:01:40.225
is what we call a
subspace of R3, OK?
01:01:44.990 --> 01:01:47.210
OK, so let's take a look
at these three vectors.
01:01:47.210 --> 01:01:48.770
The other thing to
notice is that you
01:01:48.770 --> 01:01:52.250
can write any one
of these vectors
01:01:52.250 --> 01:01:56.240
as a linear combination
of the other two.
01:01:56.240 --> 01:02:01.750
So you can write f3
as a sum of f1 and f2.
01:02:01.750 --> 01:02:03.850
The sum of those two vectors
is equal to that one.
01:02:03.850 --> 01:02:06.850
You can write f2 as f3 minus f1.
01:02:06.850 --> 01:02:09.940
So any of these vectors can be
written as a linear combination
01:02:09.940 --> 01:02:11.330
of the others.
01:02:11.330 --> 01:02:15.310
And so that set of vectors
is called linearly dependent.
01:02:19.180 --> 01:02:23.630
And any set of linearly
dependent vectors cannot form
01:02:23.630 --> 01:02:24.130
a basis.
01:02:26.880 --> 01:02:28.980
And how do you know
if a set of vectors
01:02:28.980 --> 01:02:33.480
that you choose for your
basis is linearly dependent?
01:02:33.480 --> 01:02:38.560
Well, again, you just find the
determinant of that matrix.
01:02:38.560 --> 01:02:44.030
And if it's zero, those
vectors are linearly dependent.
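[This determinant test can be run directly; a sketch with a hypothetical set of three vectors that, as in the example above, all have a zero z component (and indeed f3 = f1 + f2), so they are linearly dependent:]

```python
import numpy as np

# Columns are the candidate basis vectors f1, f2, f3.
# All three lie in the xy plane, and f3 = f1 + f2.
F = np.column_stack([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [1.0, 1.0, 0.0]])

# Zero determinant means linearly dependent: no inverse,
# so this set cannot serve as a basis for R^3.
print(np.linalg.det(F))  # 0 (up to floating point)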
01:02:44.030 --> 01:02:48.670
So what that corresponds to
is you're taking your data
01:02:48.670 --> 01:02:54.890
and when you transform
it into a new basis,
01:02:54.890 --> 01:02:58.220
if the determinant
of that matrix F
01:02:58.220 --> 01:03:01.580
is zero, then what you're doing
is you're taking those data
01:03:01.580 --> 01:03:05.763
and transforming them to a space
where they're being collapsed.
01:03:05.763 --> 01:03:07.430
Let's say if you're
in three dimensions,
01:03:07.430 --> 01:03:12.350
those data are being collapsed
onto a plane or onto a line,
01:03:12.350 --> 01:03:14.390
OK?
01:03:14.390 --> 01:03:18.510
And that means you can't undo
that transformation, all right?
01:03:18.510 --> 01:03:20.730
And the way to tell whether
you've got that problem
01:03:20.730 --> 01:03:23.862
is looking at the determinant.
01:03:23.862 --> 01:03:25.820
All right, let me show
you one other cool thing
01:03:25.820 --> 01:03:27.920
about the determinant.
01:03:27.920 --> 01:03:30.500
There's a very simple
geometrical interpretation
01:03:30.500 --> 01:03:33.320
of what the determinant is, OK?
01:03:33.320 --> 01:03:34.700
All right, sorry.
01:03:34.700 --> 01:03:37.580
So if f maps your
data onto a subspace,
01:03:37.580 --> 01:03:39.290
then the mapping
is not reversible.
01:03:39.290 --> 01:03:43.910
OK, so what does the
determinant correspond to?
01:03:43.910 --> 01:03:48.770
Let's say in two dimensions,
if I have two orthogonal unit
01:03:48.770 --> 01:03:52.670
vectors, you can
think of those vectors
01:03:52.670 --> 01:03:58.460
as kind of forming a
square in this space.
01:03:58.460 --> 01:04:01.470
Or in three dimensions, if I
have three orthogonal vectors,
01:04:01.470 --> 01:04:05.810
you can think of those vectors
as defining a cube, OK?
01:04:05.810 --> 01:04:07.700
And if they're unit
vectors, then they
01:04:07.700 --> 01:04:10.990
define a cube of volume one.
01:04:10.990 --> 01:04:16.010
Here, you have the
square of area one.
01:04:16.010 --> 01:04:21.454
So let's think about
this unit volume.
01:04:21.454 --> 01:04:26.120
If I transform those two
vectors or those three vectors
01:04:26.120 --> 01:04:30.710
in 3D space by a
matrix A, those vectors
01:04:30.710 --> 01:04:34.730
get rotated and transformed.
01:04:34.730 --> 01:04:38.450
They point in different
directions, and they define--
01:04:38.450 --> 01:04:42.150
it's no longer a cube, but they
define some sort of rhombus,
01:04:42.150 --> 01:04:43.720
OK?
01:04:43.720 --> 01:04:48.580
You can ask, what is the
volume of that rhombus?
01:04:48.580 --> 01:04:53.560
The volume of that rhombus
is just the determinant
01:04:53.560 --> 01:04:58.550
of that matrix A.
So now what happens
01:04:58.550 --> 01:05:03.180
if I have a cube in
three-dimensional space
01:05:03.180 --> 01:05:06.210
and I multiply it by a
matrix that transforms it
01:05:06.210 --> 01:05:10.230
into a rhombus that
has zero volume?
01:05:10.230 --> 01:05:12.240
So let's say I have
those three vectors.
01:05:12.240 --> 01:05:16.440
It transforms it into,
let's say, a square.
01:05:16.440 --> 01:05:20.200
The volume of that square
in three dimensional space
01:05:20.200 --> 01:05:22.830
is zero.
01:05:22.830 --> 01:05:25.800
So what that means is I'm
transforming my vectors
01:05:25.800 --> 01:05:28.770
into a space that
has zero volume
01:05:28.770 --> 01:05:30.640
in the original dimensions, OK?
01:05:30.640 --> 01:05:35.880
So I'm transforming things
from 3D into a 2D plane.
01:05:35.880 --> 01:05:39.210
And what that means is
I've lost information,
01:05:39.210 --> 01:05:40.260
and I can't go back.
01:05:44.430 --> 01:05:49.840
OK, notice that a rotation
matrix, if I take this cube
01:05:49.840 --> 01:05:53.620
and I rotate it, has
exactly the same volume
01:05:53.620 --> 01:05:55.940
as it did before I rotated it.
01:05:55.940 --> 01:06:00.400
And so you can always tell when
you have a rotation matrix,
01:06:00.400 --> 01:06:04.640
because the determinant of
a rotation matrix is one.
01:06:04.640 --> 01:06:11.050
So if you take a matrix A
and you find the determinant
01:06:11.050 --> 01:06:12.910
and you find that the
determinant is one,
01:06:12.910 --> 01:06:18.190
you know that you have
a pure rotation matrix.
01:06:18.190 --> 01:06:20.736
What does it mean if the
determinant is minus one?
01:06:24.310 --> 01:06:26.800
What it means is
you have a rotation,
01:06:26.800 --> 01:06:32.620
but that one of the axes
is inverted, is flipped.
01:06:32.620 --> 01:06:33.730
There's a mirror in there.
01:06:36.660 --> 01:06:39.470
So you can tell if you
have a pure rotation
01:06:39.470 --> 01:06:43.610
or if you have a rotation and
one of the axes is flipped.
01:06:43.610 --> 01:06:46.490
Because in the pure rotation,
the determinant is one.
01:06:46.490 --> 01:06:53.360
And in an impure rotation, you
have a rotation and a mirror
01:06:53.360 --> 01:06:53.860
flip.
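[Both determinant facts are easy to verify numerically; a sketch with an arbitrary 2-D rotation angle (the angle 0.7 is just an assumption for illustration):]

```python
import numpy as np

theta = 0.7  # any angle works

# Pure 2-D rotation: area is preserved, so the determinant is +1.
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(np.linalg.det(R))   # ~1.0

# Flip one axis (a mirror) before rotating: determinant becomes -1.
M = R @ np.diag([1.0, -1.0])
print(np.linalg.det(M))   # ~-1.0
```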
01:06:56.890 --> 01:07:02.750
All right, and I just want to
make a couple more comments
01:07:02.750 --> 01:07:05.990
about change of basis, OK?
01:07:05.990 --> 01:07:10.580
All right, so let's choose
a set of basis vectors
01:07:10.580 --> 01:07:13.370
for our new basis.
01:07:13.370 --> 01:07:17.470
Let's write those
into a matrix F.
01:07:17.470 --> 01:07:22.140
It's going to be our
matrix of basis vectors.
01:07:22.140 --> 01:07:24.990
If the determinant
is not equal to zero,
01:07:24.990 --> 01:07:27.300
then these vectors,
that set of vectors,
01:07:27.300 --> 01:07:29.790
are linearly independent.
01:07:29.790 --> 01:07:34.050
That means you cannot write one
of those vectors as a linear
01:07:34.050 --> 01:07:35.280
combination of--
01:07:35.280 --> 01:07:37.800
any one of those vectors
as a linear combination
01:07:37.800 --> 01:07:39.800
of the others.
01:07:39.800 --> 01:07:45.230
Those vectors form a complete
basis in that n dimensional
01:07:45.230 --> 01:07:47.820
space.
01:07:47.820 --> 01:07:50.960
The matrix F implements
a change of basis,
01:07:50.960 --> 01:07:54.110
and you can go from
the standard basis to F
01:07:54.110 --> 01:07:56.600
by multiplying your
vector by F inverse
01:07:56.600 --> 01:07:59.570
to get the coordinate
vector and your new basis.
01:07:59.570 --> 01:08:05.100
And you can go back from that
rotated or transformed basis
01:08:05.100 --> 01:08:10.440
back to the coordinate basis
by multiplying by F, OK?
01:08:10.440 --> 01:08:14.250
Multiply by F inverse
transforms to the new basis.
01:08:14.250 --> 01:08:16.200
Multiplying by F
transforms back.
01:08:19.319 --> 01:08:26.260
If that set of vectors is
an orthonormal basis, then--
01:08:26.260 --> 01:08:31.200
OK, so let's take this
matrix F that has columns
01:08:31.200 --> 01:08:32.729
that are the new basis vectors.
01:08:32.729 --> 01:08:38.630
And let's say that those
form an orthonormal basis.
01:08:38.630 --> 01:08:42.020
In that case, we can write
down-- so, in any case,
01:08:42.020 --> 01:08:46.100
we can write down the transpose
of this matrix, F transpose.
01:08:46.100 --> 01:08:51.210
And now the rows of that
matrix are the basis vectors.
01:08:51.210 --> 01:08:55.569
Notice that if we multiply
F transpose times F,
01:08:55.569 --> 01:08:59.990
we have basis vectors in
rows here and columns here.
01:08:59.990 --> 01:09:03.060
So what is F transpose
F for the case
01:09:03.060 --> 01:09:05.399
where these are
unit vectors that
01:09:05.399 --> 01:09:07.180
are orthogonal to each other?
01:09:07.180 --> 01:09:08.385
What is that product?
01:09:08.385 --> 01:09:09.260
AUDIENCE: [INAUDIBLE]
01:09:09.260 --> 01:09:09.479
MICHALE FEE: It's what?
01:09:09.479 --> 01:09:10.060
AUDIENCE: [INAUDIBLE]
01:09:10.060 --> 01:09:10.810
MICHALE FEE: Good.
01:09:10.810 --> 01:09:14.738
Because F1 dot F1 is one.
01:09:14.738 --> 01:09:17.840
F1 dot F2 is zero.
01:09:17.840 --> 01:09:21.330
F2 dot F1 is zero,
and F2 dot F2 is one.
01:09:21.330 --> 01:09:24.140
So that's equal to the
identity matrix, right?
01:09:26.880 --> 01:09:30.300
So F transpose equals F inverse.
01:09:30.300 --> 01:09:33.899
If the inverse of a matrix
is just its transpose,
01:09:33.899 --> 01:09:35.924
then that matrix is
a rotation matrix.
01:09:38.810 --> 01:09:41.100
So F is just the
rotation matrix.
01:09:41.100 --> 01:09:43.109
All right, now let's
see what happens.
01:09:43.109 --> 01:09:48.810
So that means the inverse of
F is just this F transpose.
01:09:48.810 --> 01:09:51.359
Let's do this coordinate--
let's [AUDIO OUT]
01:09:51.359 --> 01:09:54.310
change of basis for this case.
01:09:54.310 --> 01:09:58.680
So you can see that v sub f,
the coordinate vector in the new
01:09:58.680 --> 01:10:04.770
basis, is F transpose
v. Here's F transpose--
01:10:04.770 --> 01:10:07.020
the basis vectors
are in the rows--
01:10:07.020 --> 01:10:14.525
times v. This is just v
dot F1, v dot F2, right?
01:10:14.525 --> 01:10:20.830
So this shows how for
an orthonormal basis,
01:10:20.830 --> 01:10:25.270
the transpose, which
is the inverse of F--
01:10:25.270 --> 01:10:27.190
taking the transpose
of F times v
01:10:27.190 --> 01:10:29.290
is just taking the
dot product of v
01:10:29.290 --> 01:10:32.320
with each of the
basis vectors, OK?
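[A sketch tying these pieces together for a hypothetical orthonormal basis, the standard basis rotated by 30 degrees (the specific angle and test vector are assumptions for illustration):]

```python
import numpy as np

# An orthonormal basis: the standard basis rotated by 30 degrees.
theta = np.pi / 6
f1 = np.array([np.cos(theta), np.sin(theta)])
f2 = np.array([-np.sin(theta), np.cos(theta)])
F = np.column_stack([f1, f2])  # basis vectors in the columns

# F^T F is the identity, so the transpose IS the inverse.
print(F.T @ F)

v = np.array([3.0, 5.0])
v_f = F.T @ v  # change of basis via the transpose

# Each coordinate is just a dot product with a basis vector.
print(v_f, [v @ f1, v @ f2])
```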
01:10:32.320 --> 01:10:36.880
So that ties it back to what we
were showing before about how
01:10:36.880 --> 01:10:39.220
to do this change of basis, OK?
01:10:39.220 --> 01:10:42.400
Just tying up those two
ways of thinking about it.
01:10:45.190 --> 01:10:53.490
So, again, what
we've been developing
01:10:53.490 --> 01:10:56.720
when we talk about
change of basis
01:10:56.720 --> 01:11:02.500
are ways of rotating
vectors, rotating sets
01:11:02.500 --> 01:11:04.480
of data, into
different dimensions,
01:11:04.480 --> 01:11:07.780
into different basis
sets so that we
01:11:07.780 --> 01:11:11.510
can look at data from
different directions.
01:11:11.510 --> 01:11:14.210
That's all we're doing.
01:11:14.210 --> 01:11:16.370
And you can see
that when you look
01:11:16.370 --> 01:11:20.300
at data from different
directions, you can get--
01:11:20.300 --> 01:11:23.720
some views of data, you have
a lot of things overlapping,
01:11:23.720 --> 01:11:24.680
and you can't see them.
01:11:24.680 --> 01:11:28.010
But when you rotate those
data, now, all of a sudden,
01:11:28.010 --> 01:11:31.820
you can see things
clearly that used to be--
01:11:31.820 --> 01:11:36.590
things get separated in some
views, whereas in other views,
01:11:36.590 --> 01:11:39.980
things are kind of mixed up
and covering each other, OK?
01:11:39.980 --> 01:11:44.270
And that's exactly what neural
networks are doing when they're
01:11:44.270 --> 01:11:48.260
analyzing sensory stimuli.
01:11:48.260 --> 01:11:50.150
They're doing that
kind of rotations
01:11:50.150 --> 01:11:54.440
and untangling the
data to see what's
01:11:54.440 --> 01:11:58.400
there in that
high-dimensional data, OK?
01:11:58.400 --> 01:12:00.670
All right, that's it.