These video lectures of Professor Gilbert Strang teaching 18.06 were recorded in Fall 1999 and do not correspond precisely to the current edition of the textbook. However, this book is still the best reference for more information on the topics covered in each lecture.
Instructor/speaker: Prof. Gilbert Strang
OK, this is the lecture on linear transformations. Actually, linear algebra courses used to begin with this lecture, so you could say I'm beginning this course again by talking about linear transformations.
In a lot of courses, those come first before matrices. The idea of a linear transformation makes sense without a matrix, and physicists and other -- some people like it better that way. They don't like coordinates.
They don't want those numbers. They want to see what's going on with the whole space. But, for most of us, in the end, if we're going to compute anything, we introduce coordinates, and then every linear transformation will lead us to a matrix.
And then, to all the things that we've done about null space and row space, and determinant, and eigenvalues -- all will come from the matrix.
But, behind it -- in other words, behind this is the idea of a linear transformation. Let me give an example of a linear transformation. So, example.
Example one. A projection. I can describe a projection without telling you anything about any matrix. This will be a linear transformation that takes all of R^2, every vector in the plane, into a vector in the plane. And this is the way people describe a mapping. It takes every vector, and by what rule? So here's the plane, this is going to be my line, and I'm going to project every vector onto that line. So if I take a vector like b -- or let me call the vector v for the moment -- the projection, the linear transformation, is going to produce this vector as T(v). So T -- it's like a function.
Exactly like a function. You give me an input, the transformation produces the output.
So transformation, sometimes the word map, or mapping is used. A map between inputs and outputs. So this is one particular map, this is one example, a projection that takes every vector -- here, let me do another vector v, or let me do this vector w, what is T(w)? You see? There are no coordinates here.
I've drawn those axes, but I'm sorry I drew them. I'm going to remove them; that's the whole point, that we don't need axes. So let's get them out of there. I'm not a physicist, so I drew those axes. So the input is w, the output of the projection is, project on that line, T(w). OK.
Now, I could think of a lot of transformations T.
But, in this linear algebra course, I want it to be a linear transformation. So here are the rules for a linear transformation: T(v+w) = T(v) + T(w), and T(cv) = cT(v). See, these are exactly the two operations that we can do on vectors, adding and multiplying by scalars, and the transformation does something special with respect to those operations.
So, for example, the projection is a linear transformation because -- for example, if I wanted to check that one, if I took v to be twice as long, the projection would be twice as long.
If I took v to be minus -- if I changed from v to minus v, the projection would change to a minus.
So c equal to two, c equal to minus one, any c is OK. And actually, I can combine those two rules into one statement: T(cv+dw) = cT(v) + dT(w). Whatever the transformation does to a linear combination, it must produce the same combination of T(v) and T(w). And it's not hard to decide whether a transformation is linear or not.
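The two linearity checks for the projection can be tried numerically. This is only a sketch: the direction `a = (2, 1)` of the line and the test vectors are arbitrary choices made for illustration, and the formula used for the projection onto a line through the origin is the standard one, not something written on the board here.

```python
def project(v, a=(2.0, 1.0)):
    """Project the 2D vector v onto the line through the origin along a."""
    t = (v[0] * a[0] + v[1] * a[1]) / (a[0] ** 2 + a[1] ** 2)
    return (t * a[0], t * a[1])

def add(v, w):
    return (v[0] + w[0], v[1] + w[1])

def times(c, v):
    return (c * v[0], c * v[1])

def close(p, q, tol=1e-12):
    # Compare with a tolerance because the arithmetic is floating point.
    return all(abs(x - y) <= tol for x, y in zip(p, q))

v, w = (3.0, 1.0), (-1.0, 4.0)

# Rule 1: T(v + w) = T(v) + T(w)
additive = close(project(add(v, w)), add(project(v), project(w)))

# Rule 2: T(cv) = c T(v), tried for c = 2, c = -1 and one more value
homogeneous = all(
    close(project(times(c, v)), times(c, project(v))) for c in (2.0, -1.0, 0.5)
)

print(additive, homogeneous)
```

A numerical check on a few vectors does not prove linearity, but a single failure would be enough to disprove it.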
Let me give you an example so you can tell me the answer. Suppose my transformation is -- here's another example two.
Shift the whole plane. So here are all my vectors, my plane, and every vector v in the plane, I shift it over by, let's say, some vector v0.
Shift whole plane by v0. So every vector in the plane -- this was v, T(v) will be v+v0. There's T(v).
Here's v0. There's the typical v.
And there's T(v). You see what this transformation does? Takes this vector and adds to it. Adds a fixed vector to it.
Well, that seems like a pretty reasonable, simple transformation, but is it linear? The answer is no, it's not linear.
Which law is broken? Maybe both laws are broken.
Let's see. If I double the length of v, does the transformation produce something double -- do I double T(v)? No.
If I double the length of v, in this transformation, I'm just adding on the same one -- same v0, not two v0s, but only one v0 for every vector, so I don't get two times the transform. Do you see what I'm saying? That if I double this, then the transformation starts there and only goes one v0 out and doesn't double T(v). In fact, a linear transformation -- what is T of zero? That's just like a special case, but really worth noticing. The zero vector in a linear transformation must get transformed to zero.
It can't move, because, take any vector V here -- well, so you can see why T of zero is zero. Take v to be the zero vector, take c to be three. Then we'd have T of zero vector equaling three T of zero vector, the T of zero has to be zero.
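A small sketch of why the shift fails both tests. The shift vector `V0 = (3, 1)` and the test vector are made-up values used only for illustration.

```python
V0 = (3.0, 1.0)  # hypothetical fixed shift vector v0

def shift(v):
    """T(v) = v + v0: shift the whole plane by v0."""
    return (v[0] + V0[0], v[1] + V0[1])

v = (1.0, 2.0)

t_of_2v = shift((2 * v[0], 2 * v[1]))      # T(2v) = 2v + v0
two_t_of_v = tuple(2 * x for x in shift(v))  # 2 T(v) = 2v + 2 v0
t_of_zero = shift((0.0, 0.0))              # T(0) = v0, which is not zero

print(t_of_2v, two_t_of_v, t_of_zero)
```

The two doubled results differ by exactly one copy of v0, and the zero vector moves, so the shift is not linear unless v0 is zero.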
OK. So, this example is really a non-example. Shifting the whole plane is not a linear transformation. Or if I cooked up some formula that involved squaring, or the transformation that, also non-example, how about the transformation that, takes any vector and produces its length? So there's a transformation that takes any vector, say, any vector in R^3, let me just -- I'll just get a chance to use this notation again.
Suppose I think of the transformation that takes any vector in R^3 and produces this number.
So that, I could say, is a member of R^1, for example, if I wanted.
Or just real numbers. That's certainly not linear.
It's true that the zero vector goes to zero.
But if I double a vector, it does double the length, that's true. But suppose I multiply a vector by minus two. What happens to its length? It just doubles. It doesn't get multiplied by minus two. So when c is minus two in my requirement, I'm not satisfying that requirement.
So T of minus v is not minus T of v; it's not minus the length, it's just the length. OK, so that's another non-example. Projection was an example, let me give you another example.
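The length example can be checked the same way. The vector `(1, 2, 2)` is an arbitrary choice, picked because its length is a whole number.

```python
import math

def length(v):
    """T(v) = ||v||, a map from R^3 to the real numbers."""
    return math.sqrt(sum(x * x for x in v))

v = (1.0, 2.0, 2.0)
minus_two_v = tuple(-2 * x for x in v)

t_v = length(v)                  # 3.0
t_minus_two_v = length(minus_two_v)  # 6.0: the length doubles
required = -2 * t_v              # -6.0: what linearity would demand

print(t_v, t_minus_two_v, required)
```

Zero still goes to zero and positive scalars behave, so the failure only shows up for a negative c (and for sums of vectors pointing in different directions).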
I can stay here and have a -- this will be an example that is a linear transformation, a rotation.
Rotation by -- what shall we say? By 45 degrees. OK? So again, let me choose this, this will be a mapping from the whole plane of vectors into the whole plane of vectors, and here is the input vector v, and the output vector from this 45 degree rotation is just, rotate that thing by 45 degrees, T(v).
So every vector got rotated. You see that I can describe this without any coordinates. And see that it's linear.
If I doubled v, the rotation would just be twice as far out. If I had v+w, and if I rotated each of them and added, the answer's the same as if I add and then rotate. That's what the linear transformation is. OK, so those are two examples.
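Here is a sketch of the "rotate then add equals add then rotate" check. The lecture describes the rotation without coordinates; the cosine and sine formula below is the usual coordinate version in the standard basis, brought in as an assumption so that there is something to compute, and the vectors are arbitrary.

```python
import math

THETA = math.radians(45)

def rotate(v):
    """Rotate the 2D vector v counterclockwise by 45 degrees."""
    c, s = math.cos(THETA), math.sin(THETA)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])

def close(p, q, tol=1e-12):
    return all(abs(x - y) <= tol for x, y in zip(p, q))

v, w = (2.0, 0.0), (1.0, 3.0)

rotate_then_add = tuple(a + b for a, b in zip(rotate(v), rotate(w)))
add_then_rotate = rotate((v[0] + w[0], v[1] + w[1]))

twice_as_far = rotate((2 * v[0], 2 * v[1]))
doubled_output = tuple(2 * x for x in rotate(v))

print(close(rotate_then_add, add_then_rotate), close(twice_as_far, doubled_output))
```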
Two examples, projection and rotation, and I could invent more that are linear transformations where I haven't told you a matrix yet. Actually, the book has a picture of the action of linear transformations -- actually, the cover of the book has it. So, in this section seven point one, we can think of a -- actually, here let's take this linear transformation, rotation, suppose I have, as the cover of the book has, a house in R^2.
So instead of this, let me take a small house in R^2. So that's a whole lot of points. The idea is, with this linear transformation, that I can see what it does to everything at once.
I don't have to just take one vector at a time and see what T of V is, I can take all the vectors on the outline of the house, and see where they all go.
In fact, that will show me where the whole house goes.
So what will happen with this particular linear transformation? The whole house will rotate, so the result, if I can draw it, will be, the house will be sitting there.
OK. But suppose I give some other examples. Oh, let me give some examples that involve a matrix. Example three -- and this is important -- coming from a matrix that we always call A.
So the transformation will be, multiply by A.
There is a linear transformation. And a whole family of them, because every matrix produces a transformation by this simple rule, just multiply every vector by that matrix, and it's linear, right? To see it's linear, I have to check that A times v plus w equals Av plus Aw, which is fine, and I have to check that A times cv equals c times Av.
Check. Those are fine.
So there is a linear transformation. And if I take my favorite matrix A, and I apply it to all vectors in the plane, it will produce a bunch of outputs. See, the idea is now worth thinking of, like, the big picture.
The whole plane is transformed by matrix multiplication.
Every vector in the plane gets multiplied by A.
Let's take an example, and see what happens to the vectors of the house. So this is still a transformation from plane to plane, and let me take a particular matrix A -- well, if I cooked up a rotation matrix, this would be the right picture.
If I cooked up a projection matrix, the projection would be the picture. Let me just take some other matrix. Let me take the matrix one zero zero minus one. What happens to the house, to all vectors, and in particular, we can sort of visualize it if we look at the house -- so the house is not rotated any more, what do I get? What happens to all the vectors if I do this transformation? I multiply by this matrix. Well, of course, it's an easy matrix, it's diagonal.
The x component stays the same, the y component reverses sign, so that like the roof of that house, the point, the tip of the roof, has an x component which stays the same, but its y component reverses, and it's down here.
And, of course, what we get is, the house is, like, upside down.
Now, I have to put -- where does the door go? I guess the door goes upside down there, right? So here's the input, here's the input house, and this is the output. OK.
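The upside-down house can be reproduced by multiplying a few outline points by the matrix. The coordinates of the house below are invented for illustration; only the matrix comes from the lecture.

```python
A = [[1, 0],
     [0, -1]]

def apply(A, v):
    """Multiply the 2x2 matrix A by the vector v."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

# Hypothetical outline of a small house: two floor corners,
# two eave corners, and the tip of the roof.
house = [(-1, 0), (1, 0), (1, 1), (0, 2), (-1, 1)]

flipped = [apply(A, p) for p in house]
print(flipped)
```

Every x component is unchanged and every y component changes sign, so the roof tip at `(0, 2)` lands at `(0, -2)`.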
This idea of a linear transformation is like kind of the abstract description of matrix multiplication.
And what's our goal here? Our goal is to understand linear transformations, and the way to understand them is to find the matrix that lies behind them.
That's really the idea. Find the matrix that lies behind them. Um, and to do that, we have to bring in coordinates.
We have to choose a basis. So let me point out what's the story -- if we have a linear transformation -- so start with -- start. Suppose we have a linear transformation. Let -- from now on, let T stand for linear transformations.
I won't be interested in the nonlinear ones.
Only linear transformations I'm interested in.
OK. I start with a linear transformation T. Let's suppose its inputs are vectors in R^3. OK? And suppose its outputs are vectors in R^2, for example. OK.
What's an example of such a transformation, just before I go any further? Any matrix of the right size will do this. So what would be the right shape of a matrix? So, for example -- I'm wanting to give you an example, just because, here, I'm thinking of transformations that take three-dimensional space to two-dimensional space. And I want them to be linear, and the easy way to invent them is a matrix multiplication. So example, T of v should be any A v. Those transformations are linear, that's what 18.06 is about. And A should be what size, what shape of matrix should that be? I want V to have three components, because this is what the inputs have -- so here's the input in R^3, and here's the output in R^2.
So what shape of matrix? So this should be, I guess, a two by three matrix? Right? A two by three matrix will multiply a vector in R^3 -- you see I'm moving to coordinates so quickly, I'm not a true physicist here -- and produce an output in R^2, and it will be a linear transformation. OK.
So there's a whole lot of examples, every two by three matrix gives me an example, and basically, I want to show you that there are no other examples.
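As a quick illustration of the shapes involved, here is a sketch with one made-up two by three matrix. The entries of `A` and the vectors are arbitrary; any two by three matrix would serve.

```python
# A hypothetical 2 by 3 matrix: 2 rows (outputs in R^2), 3 columns (inputs in R^3).
A = [[1, 2, 3],
     [4, 5, 6]]

def T(v):
    """T(v) = Av, taking a vector in R^3 to a vector in R^2."""
    return tuple(sum(a * x for a, x in zip(row, v)) for row in A)

v = (1, 0, -1)
w = (0, 1, 1)
v_plus_w = tuple(a + b for a, b in zip(v, w))

print(T(v))         # a vector with two components
print(T(v_plus_w))  # equals T(v) + T(w), component by component
```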
Every linear transformation is associated with a matrix.
Now, let me come back to the idea of linear transformation.
Suppose I've got this linear transformation in my mind, and I want to tell you what it is.
Suppose I tell you what the transformation does to one vector. OK.
You know one thing, then. All right. So this is like the -- what I'm speaking about now is, how much information is needed to know the transformation? By knowing T, I -- to know T of v for all v. All inputs.
How much information do I have to give you so that you know what the transformation does to every vector? OK, I could tell you what the transformation -- so I could take a vector v1, one particular vector, tell you what the transformation does to it -- fine. But now you only know what the transformation does to one vector.
So you say, OK, that's not enough, tell me what it does to another vector.
So I say, OK, give me a vector, you give me a vector v2, and we see, what does the transformation do to v2? Now, do you only know what the transformation does to two vectors? Have I got to answer you about every vector in the whole input space, or, knowing what it does to v1 and v2, how much do you now know about the transformation? You know what the transformation does to a larger bunch of vectors than just these two, because you know what it does to every linear combination. You know what it does, now, to the whole plane of vectors with basis v1 and v2. I'm assuming v1 and v2 were independent. If they were dependent, if v2 was six times v1, then I didn't give you any new information in T of v2, you already knew it would be six times T of v1. So you can see what I'm headed for. If I know what the transformation does to every vector in a basis, then I know everything. So the information needed to know T of v for all inputs is T of v1, T of v2, up to T of vn, for a basis v1 up to vn. This is a basis for -- can I call it an input basis? It's a basis for the space of inputs.
The things that T is acting on. You see this point, that if I have a basis for the input space, and I tell you what the transformation does to every one of those basis vectors, that is all I'm allowed to tell you, and it's enough to know T of v for all v-s, because why? Because every v is some combination of these basis vectors, c1v1+...+cnvn, that's what a basis is, right? It spans the space.
And if I know what T does to this, and what T does to v2, and what T does to vn, then I know what T does to V.
By this linearity, it has to be c1 T of v1 plus ... plus cn T of vn. There's no choice.
So, the point of this comment is that if I know what T does to a basis, to each vector in a basis, then I know the linear transformation. The property of linearity tells me all the other vectors. All the other outputs.
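The point can be made concrete in the plane: given only T(v1) and T(v2), the output for any other v is forced. In this sketch the basis and the two outputs are assumptions chosen to match a projection onto the 45 degree line, and the coordinates c1, c2 are found with Cramer's rule, which is just one convenient way to solve the 2 by 2 system.

```python
# A hypothetical input basis and what T does to each basis vector.
v1, v2 = (1.0, 1.0), (1.0, -1.0)
T_v1, T_v2 = (1.0, 1.0), (0.0, 0.0)

def coordinates(v):
    """Solve c1*v1 + c2*v2 = v for (c1, c2) by Cramer's rule."""
    det = v1[0] * v2[1] - v2[0] * v1[1]
    c1 = (v[0] * v2[1] - v2[0] * v[1]) / det
    c2 = (v1[0] * v[1] - v[0] * v1[1]) / det
    return c1, c2

def T(v):
    """Linearity forces T(v) = c1*T(v1) + c2*T(v2)."""
    c1, c2 = coordinates(v)
    return (c1 * T_v1[0] + c2 * T_v2[0], c1 * T_v1[1] + c2 * T_v2[1])

print(coordinates((3.0, 1.0)))
print(T((3.0, 1.0)))
```

Nothing about T was used beyond its two values on the basis; the rest comes from linearity.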
OK. So now we see what we really need in a linear transformation, and we're ready to go to a matrix.
OK. What's the step now that takes us from a linear transformation that's free of coordinates to a matrix that's been created with respect to coordinates? The matrix is going to come from the coordinate system.
These are the coordinates. Coordinates mean a basis is decided. Once you decide on a basis -- this is where coordinates come from.
You decide on a basis, then every vector, these are the coordinates in that basis. There is one and only one way to express v as a combination of the basis vectors, and the numbers you need in that combination are the coordinates.
Let me write that down. So what are coordinates? Coordinates come from a basis. Coordinates come from a basis.
The coordinates of v, the coordinates of v are these numbers that tell you how much of each basis vector is in v.
If I change the basis, I change the coordinates, right? Now, we have always been assuming that we're working with the standard basis, right? We don't even think about this stuff, because if I give you the vector v equals three, two, four, you have been assuming completely -- and probably rightly -- that I had in mind the standard basis, that this vector was three times the first coordinate vector, and two times the second, and four times the third.
But you're not entitled -- I might have had some other basis in mind. This is like the standard basis. And then the coordinates are sitting right there in the vector.
But I could have chosen a different basis, like I might have had eigenvectors of a matrix, and I might have said, OK, that's a great basis, I'll use the eigenvectors of this matrix as my basis vectors.
Which are not necessarily these three, but some other basis.
So that was an example, this is the real thing, the coordinates are these numbers, I'll circle them again, the amounts of each basis. OK.
So, if I want to create a matrix that describes a linear transformation, now I'm ready to do that. OK, OK. So now what I plan to do is construct the matrix A that represents, or tells me about, a linear transformation, linear transformation T.
OK. So I really start with the transformation -- whether it's a projection or a rotation, or some strange movement of this house in the plane, or some transformation from n-dimensional space to -- or m-dimensional space to n-dimensional space.
n to m, I guess. Usually, we'll have T, we'll somehow transform n-dimensional space to m-dimensional space, and the whole point is that if I have a basis for n-dimensional space -- I guess I need two bases, really. I need an input basis to describe the inputs, and I need an output basis to give me coordinates -- to give me some numbers for the output.
So I've got to choose two bases.
Choose a basis v1 up to vn for the inputs, for the inputs in -- they came from R^n. So the transformation is taking every n-dimensional vector into some m-dimensional vector.
And I have to choose a basis, and I'll call them w1 up to wm, for the outputs. Those are guys in R^m.
Once I've chosen the bases, that settles the matrix -- I'm now working with coordinates. Every vector in R^n, every input vector, has some coordinates.
So here's what I do, here's what I do.
Can I say it in words? I take a vector v.
I express it in its basis, in the basis, so I get its coordinates. Then I'm going to multiply those coordinates by the right matrix A, and that will give me the coordinates of the output in the output basis.
I'd better write that down, that was a mouthful.
What I want -- I want a matrix A that does what the linear transformation does. And it does it with respect to these bases. So I want the matrix to be -- well, let's suppose -- look, let me take an example. Let me take the projection example.
The projection example. Suppose I take -- because we've got that -- we've got that projection in mind -- I can fit in here. Here's the projection example.
So the projection example, I'm thinking of n and m as two.
The transformation takes the plane, takes every vector in the plane, and, let me draw the plane, just so we remember it's a plane -- and there's the thing that I'm projecting onto, that's the line I'm projecting onto -- so the transformation takes every vector in the plane and projects it onto that line. So this is projection, so I'm going to do projection.
OK. But, I'm going to choose a basis that I like better than the standard basis. My basis -- in fact, I'll choose the same basis for inputs and for outputs, and the basis will be -- my first basis vector will be right on the line.
There's my first basis vector. Say, a unit vector, on the line. And my second basis vector will be a unit vector perpendicular to that line.
And I'm going to choose that as the output basis, also. And I'm going to ask you, what's the matrix? What's the matrix? How do I describe this transformation of projection with respect to this basis? OK? So what's the rule? I take any vector v, it's some combination of the first basis vector and the second basis vector. Now, what is T of v? Suppose the input is -- well, suppose the input is v1.
What's the output? v1, right? The projection leaves this one alone.
So we know what the projection does to this first basis vector, this guy, it leaves it. What does the projection do to the second basis vector? It kills it, sends it to zero. So what does the projection do to a combination? It kills this part, and this part, it leaves alone.
Now, all I want to do is find the matrix.
I now want to find the matrix that takes an input, c1 c2, the coordinates, and gives me the output, c1 0. You see that in this basis, the coordinates of the input were c1, c2, and the coordinates of the output are c1, 0.
And of course, not hard to find a matrix that will do that. The matrix that will do that is the matrix one, zero, zero, zero.
Because if I multiply input by that matrix A -- this is A times input coordinates -- and I'm hoping to get the output coordinates. And what do I get from that multiplication? I get the right answer, c1 and zero. So what's the point? So the first point is, there's a matrix that does the job. If there's a linear transformation out there, coordinate-free, no coordinates, and then I choose a basis for the inputs, and I choose a basis for the outputs, then there's a matrix that does the job.
And what's the job? It multiplies the input coordinates and produces the output coordinates.
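In code, the job of this matrix is a single multiplication on coordinates. The input coordinates `(3, 7)` are arbitrary; they mean 3 times the basis vector along the line plus 7 times the perpendicular one.

```python
# Matrix of the projection in the basis (along the line, perpendicular to the line).
A = [[1, 0],
     [0, 0]]

def multiply(A, c):
    """A times a coordinate vector c."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

input_coords = [3, 7]          # v = 3*v1 + 7*v2
output_coords = multiply(A, input_coords)
print(output_coords)           # [3, 0]: T(v) = 3*v1 + 0*v2
```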
Now, in this example -- let me repeat, I chose the input basis was the same as the output basis.
The input basis and output basis were both along the line, and perpendicular to the line. They're actually the eigenvectors of the projection. And, as a result, the matrix came out diagonal. In fact, it came out to be Lambda, the eigenvalue matrix. This is, like, the good basis. The eigenvector basis is the good basis; it leads to the diagonal matrix of eigenvalues, Lambda. And just as in this example, the eigenvectors of this linear transformation were along the line, and perpendicular. The eigenvalues were one and zero, and that's the matrix that we got.
OK. So that's a, like, the great choice of matrix, that's the choice a physicist would do when he had to finally -- he or she had to finally bring coordinates in unwillingly, the coordinates to be chosen, the good coordinates are the eigenvectors, because, if I did this projection in the standard basis -- which I could do, right? I could do the whole thing in the standard basis -- I better try, if I can do that. What are we calling -- so I'll have to tell you now which line we're projecting on.
Say, the 45 degree line. So say we're projecting onto the 45 degree line, and we use not the eigenvector basis, but the standard basis. The standard basis, v1, is one, zero, and v2 is zero, one. And again, I'll use the same basis for the outputs. Then I can find a matrix, and it will be the matrix that we would always think of, the projection matrix. Actually, it's the matrix that we learned about in chapter four -- do you remember, P was a, a transpose, over a transpose a, where a is a vector along the line? And I think, in this example, it will come out one-half, one-half, one-half, one-half. I believe that's the matrix that comes from our formula. And that's the matrix that will do the job. If I give you this input, one, zero, what's the output? The output is one-half, one-half. And that should be the right projection. And if I give you the input zero, one, the output is, again, one-half, one-half, again the projection. So that's the matrix, but not diagonal of course, because we didn't choose a great basis, we just chose the handiest basis.
Well, so the course has practically been about the handiest basis, and just dealing with the matrix that we got. And it's not that bad a matrix, it's symmetric, and it has this P squared equal P property, all those things are good.
But in the best basis, it's easy to see that P squared equals P, and it's symmetric, and it's diagonal.
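The standard-basis matrix can be computed from the chapter four formula and its properties checked directly. This sketch assumes `a = (1, 1)` as the vector along the 45 degree line; any nonzero multiple of it gives the same P.

```python
a = (1.0, 1.0)  # a vector along the 45 degree line (assumed choice)

def projection_matrix(a):
    """P = a a^T / (a^T a) for projection onto the line through a."""
    a_dot_a = sum(x * x for x in a)
    return [[ai * aj / a_dot_a for aj in a] for ai in a]

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = projection_matrix(a)
P_squared = matmul(P, P)
symmetric = all(P[i][j] == P[j][i] for i in range(2) for j in range(2))

print(P)          # [[0.5, 0.5], [0.5, 0.5]]
print(P_squared)  # the same matrix again
print(symmetric)
```

The first column of P is the projection of `(1, 0)` and the second is the projection of `(0, 1)`, both equal to one-half, one-half.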
So that's the idea then, is, do you see now how I'm associating a matrix to the transformation? I'd better write the rule down, I'd better write the rule down.
The rule to find the matrix A. All right, first column.
So, a rule to find A, we're given the bases.
Of course, we don't -- because there's no way we could construct the matrix until we're told what the bases are.
So we're given the input basis, and the output basis, v1 to vn, w1 to wm. Those are given.
Now, in the first column of A, how do I find that column? The first column of the matrix. So that should tell me what happens to the first basis vector.
So the rule is, apply the linear transformation to v1. To the first basis vector.
And then, I'll write it -- so that's the output, right? The input is v1, what's the output? The output is in the output space, it's some combination of these guys, and it's that combination that goes into the first column -- so, let me -- I'll put this word -- right, I'll say it in words again. How to find this matrix. Take the first basis vector. Apply the transformation, then it's in the output space, T of v1, so it's some combination of these outputs, this output basis.
So that combination, the coefficients in that combination, will be the first column: T(v1) = a11 w1 + a21 w2 + ... + am1 wm.
There are the numbers in the first column of the matrix.
Let me make the point by doing the second column.
Second column of A. What's the idea, now? I take the second basis vector, I apply the transformation to it, that's in -- now I get an output, so it's some combination in the output basis -- and that combination is the bunch of numbers that should go in the second column of the matrix.
OK. And so forth.
So I get a matrix, and the matrix I get does the right job. The matrix is constructed that way, and it follows the rules of matrix multiplication.
The result will be that if I give you the input coordinates, and I multiply by the matrix, so the outcome of all this is A times the input coordinates correctly reproduces the output coordinates. Why is this right? Let me just check the first column.
Suppose the input coordinates are one and all zeros.
What does that mean? What's the input? If the input coordinates are one and other -- and the rest zeros, then the input is v1, right? That's the vector that has coordinates one and all zeros.
OK? When I multiply A by the one and all zeros, I'll get the first column of A, I'll get these numbers. And, sure enough, those are the output coordinates for T of v1. So we made it right on the first column, we made it right on the second column, we made it right on all the basis vectors, and then it has to be right on every vector. OK.
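The column-by-column rule can be written as a short routine for transformations of the plane. This is a sketch under several assumptions: it handles only the 2 by 2 case, it finds output coordinates with Cramer's rule, the function names are invented, and the eigenvector basis is left unnormalized for simpler arithmetic (the lecture used unit vectors, which give the same matrix here).

```python
def project_45(v):
    """Projection onto the 45 degree line, written in standard coordinates."""
    s = (v[0] + v[1]) / 2.0
    return (s, s)

def coords(w, basis):
    """Coordinates (x, y) of w in a basis (b1, b2): x*b1 + y*b2 = w."""
    (a, c), (b, d) = basis
    det = a * d - b * c
    return ((w[0] * d - b * w[1]) / det, (a * w[1] - w[0] * c) / det)

def matrix_of(T, input_basis, output_basis):
    """Column j holds the output-basis coordinates of T(v_j)."""
    columns = [coords(T(v), output_basis) for v in input_basis]
    return [[col[i] for col in columns] for i in range(2)]

standard = ((1.0, 0.0), (0.0, 1.0))
eigen = ((1.0, 1.0), (-1.0, 1.0))  # along the line, perpendicular to it

A_standard = matrix_of(project_45, standard, standard)
A_eigen = matrix_of(project_45, eigen, eigen)

print(A_standard)  # the one-half, one-half matrix
print(A_eigen)     # diagonal, with the eigenvalues 1 and 0
```

The same transformation gives two different matrices, and the only thing that changed between them is the choice of basis.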
So there is a picture of the matrix for a linear transformation. Finally, let me give you another -- a different linear transformation.
The linear transformation that takes the derivative.
That's a linear transformation. Suppose the input space is all combinations c1 plus c2 x plus c3 x squared.
So the basis is these simple functions: one, x, and x squared.
Then what's the output? It's the derivative.
The output is the derivative, so the output is c2 + 2 c3 x.
And let's take as output basis, the vectors one and x.
So we're going from a three-dimensional space of inputs to a two-dimensional space of outputs by the derivative. And I don't know if you ever thought that the derivative is linear.
But if it weren't linear, taking derivatives would take forever, right? We are able to compute derivatives of functions exactly because we know it's a linear transformation, so that if we learn the derivatives of a few functions, like sine x and cos x and e to the x, and another little short list, then we can take all their combinations and we can do all the derivatives.
OK, now what's the matrix? What's the matrix? So I want the matrix to multiply these input vectors -- input coordinates, and give these output coordinates. So I just think, OK, what's the matrix that does it? I can follow my rule of construction, or I can see what the matrix is.
It should be a two by three matrix, right? And the matrix -- so I'm just figuring out, what do I want? No, I'll -- let me write it here. What do I want from my matrix? What should that matrix do? Well, I want to get c2 in the first output, so zero, one, zero will do it. I want to get two c3, so zero, zero, two will do it. That's the matrix for this linear transformation with those bases and those coordinates. You see, it just clicks, and the whole point is that the inverse matrix gives the inverse to the linear transformation, that the product of two matrices gives the right matrix for the product of two transformations -- matrix multiplication really came from linear transformations.
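The derivative matrix can be tried on one polynomial. The coefficients `(5, 3, 4)` are an arbitrary example standing for 5 + 3x + 4x^2.

```python
# Matrix of d/dx from the basis (1, x, x^2) to the basis (1, x).
A = [[0, 1, 0],
     [0, 0, 2]]

def differentiate(c):
    """Multiply A by the input coordinates (c1, c2, c3)."""
    return [sum(a * x for a, x in zip(row, c)) for row in A]

p = [5, 3, 4]             # 5 + 3x + 4x^2
dp = differentiate(p)
print(dp)                 # [3, 8], meaning 3 + 8x
```

The constant term is sent to zero, which is why the first column of the matrix is all zeros.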
I'd better pick up on that theme Monday after Thanksgiving.
And I hope you have a great holiday.
I hope Indian summer keeps going.
OK, see you on Monday.