Professor Strang starts this lecture asking the question “Which matrices can be completed to have a rank of 1?” He then provides several examples. In the second part, he introduces convolution and cyclic convolution.
Which matrices can be completed to have rank = 1?
Perfect answer: No cycles in a certain graph
Cyclic permutation \(P\) and circulant matrices
\(c_0 I + c_1 P + c_2 P^2 + \cdots\)
Start of Fourier analysis for vectors
Related section in textbook: IV.8 and IV.2
Instructor: Prof. Gilbert Strang
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
GILBERT STRANG: OK, let me start again. Thanks for helping with Professor Rao and working on the labs the last two times. I was saying the labs were really for his class, which is 90 minutes, so that we were kind of running out of time just answering those questions, getting the system to say, OK, keep going. Anyway, he's going to develop those. And I think they'll be a good thing.
So actually, the second lab, as I was watching it, I was thinking of a math question. And may I report to you on that math question? So you remember the lab was completing a matrix. You got some entries. And you can put in any other entries in the other positions. And the question was can you complete it to a rank 1 matrix?
Here's my question. What positions are OK? And what positions could you maybe not be able to complete to a rank 1 matrix? Examples are the best. So I'm looking at this question here.
OK, I'm looking for a rank 1 matrix of the form x column y transpose, or uv transpose, I guess. Those are the letters we're used to, uv transpose. And that means of course that aij is ui times vj. So we're able to choose-- we have m u's.
Let me ask a question. Could I complete-- so let me take m equal n equal 3. So a 3 by 3 matrix. I'm going to give you m plus n minus 1, which is 3 plus 3 minus 1, five positions.
And suppose I give non-zeros in which five positions? So where is that number m plus n minus 1 coming from? Well, because I have m u's and I have n v's for a rank 1 matrix-- i goes from 1 to m, j goes from 1 to n. So I have m plus n. But obviously I could require the first u to be a 1. In other words, when I'm multiplying those, there is one degree of freedom that is sort of repeated here, because if I'm just given-- anyway, you see it, that I can rescale the u so that the first u is a 1. And that leaves me m plus n minus 1.
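The count can be written out in symbols. This is only a sketch; the scalar \(c\) is introduced here for illustration and does not appear in the lecture:

\[
A = uv^{T}, \qquad a_{ij} = u_i v_j, \qquad (c\,u)\left(\tfrac{1}{c}\,v\right)^{T} = uv^{T} \quad \text{for any } c \neq 0 .
\]

The \(m + n\) numbers in \(u\) and \(v\) therefore describe \(A\) only up to this one rescaling. Choosing \(c\) so that \(u_1 = 1\) (possible when \(u_1 \neq 0\)) leaves \(m + n - 1\) free parameters.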
So could I give you these five numbers and could you complete that to a rank 1 matrix? So those are non-zeros. Why do I say non-zeros? Because if that number happened to be a zero, I'd have bad news there. If I prescribed that one and I prescribed it to be zero, then if every column has to be a multiple of that column, these would all have to be 0. So I don't want that. I want to be able to prescribe five numbers. And therefore, they better not be zero.
Could you complete the job with that choice of five positions? I think you could. Because how would you decide on that number? Well, I mean at least one way to look at it is, what would that number be there?
That number would be chosen. There would be one and only one choice for that that would give rank 1. In fact, that little 2 by 2 determinant would have to be? 0, right? All the columns are supposed to be multiples of one column. All the rows are supposed to be multiples of one row. All the 2 by 2s are rank 1. And rank 1 for a 2 by 2 means determinant 0.
So I would know what that one had to be. That number would have to be a12 times a21 over a11. Every 2 by 2 determinant has to be 0 here. If I want rank 1, I can't stand any 2 by 2s that are invertible matrices. So all the determinants have to be zero.
And, of course, then these three numbers would allow me to fill in there. Those three numbers would allow me to fill that. Those three numbers will allow me to fill that. It would be easy, easy. So that I can see that those are five positions that work.
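The fill-in rule can be sketched in a few lines of Python. This assumes the five prescribed positions are the first row and the first column, which is what the formula a12 times a21 over a11 suggests; the function name and the sample numbers are hypothetical.

```python
from fractions import Fraction

def complete_from_first_row_and_column(row, col):
    """Fill a rank 1 matrix from its first row and its first column.

    row[0] and col[0] are the same corner entry a11, which must be non-zero.
    Every other entry is forced by a zero 2 by 2 determinant:
    a[i][j] = a[i][0] * a[0][j] / a[0][0].
    """
    if row[0] != col[0]:
        raise ValueError("row[0] and col[0] must both be the corner entry a11")
    if row[0] == 0:
        raise ValueError("a11 must be non-zero")
    a11 = Fraction(row[0])
    return [[Fraction(ci) * Fraction(rj) / a11 for rj in row] for ci in col]

# Hypothetical numbers, chosen only for illustration.
A = complete_from_first_row_and_column(row=[2, 4, 6], col=[2, 3, 5])
for r in A:
    print([str(entry) for entry in r])
```

Exact fractions are used so that the zero-determinant check is not blurred by rounding.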
Now, give me five positions that won't work. So this is good. This is check. Now, I want one where I can't make it work.
So let me give that number, that number. So I'm looking for something that fails, a set of positions that fails. And then I want to be able to test, does the set of positions work for any non-zeros or does it not work?
What am I looking for here? A different one that works? OK, so what else would work, apart from that?
Let's see, I suppose I take that one. Now, what about that position there? I must not choose that one, right? I must not choose that one, because that would complete a 2 by 2. And if I gave the wrong numbers, the determinant wouldn't be zero. And I would be failed. So I must not give that position.
Maybe I could give that position. Maybe I could give this one. How's that? If I had those five, would I be able to complete? It looks good anyway. It looks good.
Let's see, how would I go about completing that? Let's see, I would know what that number had to be from this 2 by 2. From this 2 by 2, I would know that number. From this-- where now? This and this and this, no, that's not any good.
What do I do now? How would I get this number? Oh, OK, I guess I have to use-- this would complete that. Oh, this 2 by 2, oh, yeah, OK, no problem. That would complete that one, right?
But this one, I don't think I can get immediately. But I can get it. Once I know these, then obviously I can get that, right? So it's sort of nice combinatorial problem. Which five positions will work? Which five will not work? How do I tell?
Of course, that's for a 3 by 3. Yeah, so let me-- I had like a bigger example. Let me move up to 4 by 4.
So here comes the idea that I learned last night from a math professor who does combinatorics. So these people who do combinatorics, they know stuff that the rest of us don't.
So his view is these would be rows, one, two, three. I may add another row. Let me do a 4 by 4. And these would be columns, one, two, three, four. Now, I'm looking for seven-- so if I prescribe an entry, I'll put in an edge between them. So let me hype this one up to 4 by 4.
Here, a good one. Actually, I'll just draw this picture. So suppose I prescribe the 1, 1-- oh, yeah, and over here I'll show you what I've done. So that I'm putting in the 1, 1 position. I'm looking for 7, right? 4 plus 4 minus 1, seven numbers. OK.
Then, I'll put in, this is row 2 column 1. Row 2 column 1, that would be there. Then I'll put in a 2 to 4. So that would be row 2, column 4. Then I'll put in a 3 to 4. So that would be row 3, column 4.
And finally, I'll put in-- let's see, so how many have I done? One, two, three, four, and I've got to get up to seven, prescribed seven. Well, I'll put more in. So on row 4, I gave 2, 3, and 4.
And did I put in this one? Yes, I did.
So I've now prescribed seven numbers. Could I complete the matrix to a rank 1 matrix? You see the math question. If I'm given seven positions there-- this is another way to look at the same picture.
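For this particular pattern, one way to attempt the completion is to set the first u to 1 and then walk from known u's to v's and back along the prescribed entries. The sketch below assumes the seven positions are (1,1), (2,1), (2,4), (3,4), (4,2), (4,3), (4,4) as read off the board, that the prescribed values are non-zero, and that the positions connect all rows and columns without a cycle. The function name and the values 1 through 7 are hypothetical.

```python
from collections import deque
from fractions import Fraction

def complete_rank_one(m, n, known):
    """known maps 0-based (i, j) to a non-zero prescribed entry.

    Assumes the prescribed positions link every row and column with no
    cycle (m + n - 1 entries). Returns u, v with known[(i, j)] == u[i] * v[j].
    """
    u = [None] * m
    v = [None] * n
    u[0] = Fraction(1)                 # use the scaling freedom: first u is 1
    queue = deque([("row", 0)])
    while queue:
        kind, k = queue.popleft()
        for (i, j), a in known.items():
            if kind == "row" and i == k and v[j] is None:
                v[j] = Fraction(a) / u[i]      # a_ij = u_i * v_j gives v_j
                queue.append(("col", j))
            elif kind == "col" and j == k and u[i] is None:
                u[i] = Fraction(a) / v[j]      # a_ij = u_i * v_j gives u_i
                queue.append(("row", i))
    if None in u or None in v:
        raise ValueError("prescribed positions do not reach every row and column")
    return u, v

# Hypothetical values in the seven prescribed positions (0-based indices).
known = {(0, 0): 1, (1, 0): 2, (1, 3): 3, (2, 3): 4,
         (3, 1): 5, (3, 2): 6, (3, 3): 7}
u, v = complete_rank_one(4, 4, known)
A = [[ui * vj for vj in v] for ui in u]
```

Every entry of `A` is a product `u[i] * v[j]`, so the completed matrix has rank 1 by construction and agrees with the prescribed entries.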
This is I put in an edge for every x. So if I have seven x's, I've got seven edges. And this is called a bipartite graph. So it's a graph.
I've got four nodes here, four nodes here. So I have eight nodes. It's a bipartite graph because I have one part of nodes over there, one part of nodes here. Bipartite means two parts.
So the rows give me one part of the nodes. The columns give me the other part of the nodes. And all the connections go between the parts. I don't have any lines from a row to a row, because the whole code is that this tells me where those seven positions are.
Now, I'm just going to ask you, can I complete the graph? Can I complete the matrix? Can I complete a rank 1 matrix there? What do you think?
And the real question is, what's the rule? How can I see when I can't complete the matrix, something gets in the way? Here, what got in the way was this 2 by 2.
And actually, I asked the question on email, can I always complete it-- yeah, so that would be the question I could ask you. Can I always complete the graph-- complete the matrix to have rank 1 when I don't run into this? Is that all I have to avoid, a 2 by 2, where I'm given all four entries-- oh, yeah, this I was able to complete. Sorry to-- let's do one.
If I'm given those entries and maybe some others, I guess seven altogether, that's a failure. I can't prescribe any non-zeros in those seven positions, because if I prescribe any non-zeros, I probably won't have a zero determinant here. I won't have rank 1. That column will not be a multiple of that column. Whatever I do here, I've screwed up already. So that's a fail.
Let me take that picture and turn it into this picture, so rows, columns. So if I take these seven, that's a row 1-- row 1, 2, 3, 4, so row 1 goes to 1 and 2. Row 2 goes to 1 and 2. Row 3 goes to 4. And row 4 goes to 3 and 4.
That's a failure. And how do I know it's a failure? I want to now come up with the answer.
So if I give any seven positions or any m plus n minus 1 positions like that, I can create a graph like that, just following the rule that you saw. Those seven positions gave me seven edges in my graph. And it's a bipartite graph because every edge goes from this part over to this part. And that's a failure.
And the reason it's a failure is that I have here a cycle. If I go across, down, across, down, I come back where I started. That would be a cycle equals failure.
Everybody see I can't give these four numbers generally. Once I've given you three, I don't have any freedom left with that fourth one. And the way to see that in this picture is that there's a cycle.
So here is the combinatorics thing that Professor Postnikov told me about last night. You can complete to a rank 1 matrix if and only if no cycles. So that just answered my question perfectly. This one is one where I can't complete, and it's got a cycle. A cycle meaning you come back to where you started.
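The no-cycles test is easy to automate. A minimal sketch using union-find on the bipartite graph follows; the function name is hypothetical, the two position lists are the 4 by 4 patterns described in this lecture, and each position is assumed to be listed once.

```python
def has_cycle(m, n, positions):
    """positions: 1-based (row, column) pairs with prescribed entries.

    Rows are nodes 0..m-1 and columns are nodes m..m+n-1. An edge whose
    two endpoints are already connected closes a cycle.
    """
    parent = list(range(m + n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for i, j in positions:
        a, b = find(i - 1), find(m + j - 1)
        if a == b:
            return True
        parent[a] = b
    return False

works = [(1, 1), (2, 1), (2, 4), (3, 4), (4, 2), (4, 3), (4, 4)]
fails = [(1, 1), (1, 2), (2, 1), (2, 2), (3, 4), (4, 3), (4, 4)]
print(has_cycle(4, 4, works))   # False
print(has_cycle(4, 4, fails))   # True
```

The second pattern contains a fully prescribed 2 by 2 block, which shows up in the graph as a cycle of length four.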
So my question to him was, can you always complete it if there are no 2 by 2s in the way? And his answers told me that maybe the 2 by 2s could be OK, but there could be a bigger cycle, a longer cycle that would screw you up. So let me close with an example of that sort.
So this is going to be fail again. But no 2 by 2 is responsible. In other words, it's going to fail. I won't be able to complete this matrix, even though there aren't any completed 2 by 2s. The 2 by 2, I knew immediately was failure. So let me see if I can do that.
So here's a failure. This is rows 1, 2, 3, 4. And this is columns 1, 2, 3, 4. And I think I'm going to have 1 given there. And I sort of started this before. And 2 goes to 1 and 4. And 3 goes to 4. And 4 goes to-- let's see.
OK, now, I've only put in 2, 4, 5. And I'm allowed seven. But I think I'm already in trouble here. So let me draw this picture for that.
So this is rows 1, 2, 3, 4. And this is column 1, 2, 3, 4. And now, I've prescribed that one. Oh, no, that goes from-- oh, I've forgotten the right way to do the picture. My bipartite graph wasn't what it should have been. So the 1, 2, 3, 4.
OK, now I'm going to do the bipartite graph picture that goes with this picture. So 1 to 1. 2 to 1. 2 to 4.
But what I'm going to do-- let me just say what I'm going to do. I'm going to create a cycle over here of length 6. A cycle of length 4 is what I got from a 2 by 2. From my 2 by 2, that took me one way, back another, back another, back another. I completed a cycle. I came back to where I started with just four edges.
Now, I want to complete a cycle with six edges. So let me draw it in the picture here. Now, I'm going to put an edge from 4 back to 1. So 1 to 1, 1 to 4, 2 to 1, 2 to 4.
Have I got any-- I think 1, 2, ah, damn. I didn't want a short cycle. I want a bigger, longer cycle. Let me get the darn thing here.
So 2 to 4 is not what I want. So I start a cycle. OK, go somewhere. Go back. Go somewhere. Go back. Go somewhere. Go back. Now, there I got it. Length six. Length six.
So let me put the x's in where they belong. So 1 is connected to 1 and 4. That's right. 2 is connected to 2 and 4. 3 is not connected to anybody. Let's put that one in. And 4 is connected to 1 and 2. 4 is connected to 1 and 2. OK, I believe that picture goes with that picture.
Now, my claim is that there is no 2 by 2 in here that shows me immediately failure. But I will fail-- I can't live with those-- that looks like eight. Sorry, I only want seven. So shall I just take-- sorry?
AUDIENCE: You've got 3, 4 in the wrong spot.
GILBERT STRANG: 3, 4, and it shouldn't be there. All right, OK, thanks. Right. You see I'm not a combinatorics person. But it's so beautiful to have the inspiration to convert that picture to this picture and then realize that a problem in this picture is a cycle in this picture. That's the whole message.
That's the whole message, that when I'm looking here a cycle means that I've built in a requirement, which random non-zeros won't satisfy. So you see that anyway. You see the cycle, 1, 2, 3, 4, 5, 6, yeah. So there's a cycle there.
So somehow those six x's, whichever they are-- I guess all but the 3 by 3. So I could take away this part of the graph and just have 3 and 3, and there would be 6 numbers in there and that's too many.
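The requirement hidden in a six-cycle can be written out. Assuming the six prescribed positions are (1,1), (1,4), (2,2), (2,4), (4,1), (4,2), as read off the picture, and using \(a_{ij} = u_i v_j\):

\[
a_{11}\,a_{24}\,a_{42} = (u_1 v_1)(u_2 v_4)(u_4 v_2) = (u_1 v_4)(u_2 v_2)(u_4 v_1) = a_{14}\,a_{22}\,a_{41} .
\]

Six freely chosen non-zero numbers will generally violate this equation, even though no 2 by 2 block has all four of its entries prescribed.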
OK, I'm going to stop there exactly half way through that class. Well, I'm going to stop there with a question, which I don't know if I dare send another question to Professor Postnikov, who answered this one perfectly. But my question would be, when could you complete a rank 2 matrix? What positions could you fill in and be able to reach rank 2?
That would be trouble, right? I don't know where we would go with that. But for us, for 18.065, that's the natural question. Rank 1 is super special for us. And rank 2 or rank R would be the general case. So they're good math questions.
Now, topic 2, convolution. So the first lab in the course just happened to have a convolution and used that word and so on. But we didn't connect that with the lectures. So now, I'd like to talk about convolutions, which are extremely important.
And they are important in machine learning because they give you a set of weights connecting-- they're an efficient way to-- let me show you. Let me show you.
So a convolution matrix-- and this will be a cyclic convolution matrix. And the shorthand for cyclic convolution matrix is circulant. So what does a circulant look like? I'll call it C.
A circulant has constant diagonals. That's what convolution means matrix wise. Convolution means constant down each diagonal. And cyclic means complete, circle around again. The diagonals circle around. So I'll just show you what I mean.
So 4 by 4, let's say. So it has some entry to-- there's one diagonal. Here is another diagonal. So now I have constant diagonals. But if it's going to be a circulant matrix-- and this is the best family of matrices in the world. The algebra is just terrific for these matrices. And that's why they're the heart of signal processing.
So this is a constant diagonal matrix. But it's not yet a circulant. That diagonal circles around to be completed. Every diagonal has four entries here.
So let me take another one, say 1, 1. Then that would circle around to 1, 1. And this guy could be 0. There's a circulant matrix.
Do you understand then what the entry-- you only need four numbers, say the first column. If you prescribe the first column of a circulant matrix, then you've told Matlab, for example, all it needs to know. If you tell it one column, it can get all the other columns just by cyclic shift.
There's the first column, 2, 0, 1, 5. The second column is 2, 0, 1, 5 shifted down by one. The next column is again shifted down by one, 2, 0, 1, 5. And the last column is 2, 0, 1, 5. So they're all the same columns after a cyclic shift.
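A short sketch of building the circulant from its first column by cyclic shifts. The function name is hypothetical; only the Python standard library is used.

```python
def circulant_from_first_column(col):
    """Column j is the first column shifted down cyclically by j places."""
    n = len(col)
    return [[col[(i - j) % n] for j in range(n)] for i in range(n)]

C = circulant_from_first_column([2, 0, 1, 5])
for row in C:
    print(row)
# [2, 5, 1, 0]
# [0, 2, 5, 1]
# [1, 0, 2, 5]
# [5, 1, 0, 2]
```

Each diagonal is constant, and the diagonals that run off the right edge wrap around to the left.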
So the key matrix in this is really a cyclic shift matrix. Say 0, 1, 0, 0, 0, 0-- so it just has one non-zero diagonal. And then it's cyclic.
Do you see that in fact this matrix C, I can produce this matrix C from P and P squared? What would P squared be? And I'll write it here while keeping our eye on P.
So if I square that matrix, what matrix would I get? So this is a shift by one. If I shift by one again, that's multiplying it again by P. So that would give me P squared. What's in the-- OK, so what happens now? It's a shift by two I guess.
So this will be 0, 0, 1, 0. This will be 0, 0, 0, 1. Then it's cyclic still. So I'll put in the 1s there and then just fill in the 0s. So P squared, shift by two. You start out with this one.
Is that right? Yes. Yes. OK. OK.
Yeah, let's just make it multiply a vector, x0, x1, x2, x3. What it does is it puts-- well, I've started the numbering with 2 here. x2, x3, x0, and x1. So it's shifted it by two and cyclically. So the x2, x3 that got shifted off the bottom come back to the top.
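The shift and its square can be checked directly. In this sketch the numbers 10, 11, 12, 13 stand in for x0, x1, x2, x3, and the helper names are hypothetical.

```python
def cyclic_shift_matrix(n):
    """P has ones just above the main diagonal and a one in the bottom-left corner."""
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

P = cyclic_shift_matrix(4)
P2 = matmul(P, P)
x = [10, 11, 12, 13]          # stand-ins for x0, x1, x2, x3
print(P2)                     # rows 0010, 0001, 1000, 0100
print(matvec(P2, x))          # [12, 13, 10, 11], that is x2, x3, x0, x1
```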
So the first property is, of circulants-- so I suppose I have matrices C and D circulants. So fact 1, C times D is also a circulant. So if I multiply circulant matrices, I get more circulant matrices. And the identity is a circulant matrix. I have a little group of matrices, the best little group of matrices there is. I can multiply two guys in the group, and I get another one in the group.
Why is CD also a circulant? Let's see, can we see why that fact is true? I guess here's my way to look at it.
A circulant matrix-- let me put it here-- a circulant matrix, every circulant matrix is C0 times the identity circulant plus C1 times the single shift plus C2 times the double shift plus C3 times the triple shift. That's what it takes to put C0, C1, C2, and C3 on those diagonals.
C0I is obviously on the main diagonal. C1P-- remember what P is-- that puts C1 on the next diagonal. Then C2P squared, when I square this, I've shifted by one more. So it will put C2 on that diagonal. And finally, C3 would go in those positions.
So every circulant matrix is a polynomial in P, in a single shift. This is any combination of shifts. This is a single shift. That's a double shift. That's a triple shift. That's a zero shift. And if I take combinations I get a circulant matrix.
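The polynomial description can be turned into code almost word for word. A minimal sketch, with hypothetical helper names, that adds up c0 I + c1 P + c2 P squared + c3 P cubed for the example with 2, 5, 1, 0 on the diagonals:

```python
def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def cyclic_shift_matrix(n):
    return [[1 if j == (i + 1) % n else 0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def circulant_from_coefficients(c):
    """Return c[0]*I + c[1]*P + c[2]*P^2 + ... for the cyclic shift P."""
    n = len(c)
    P = cyclic_shift_matrix(n)
    power = identity(n)                  # P^0
    C = [[0] * n for _ in range(n)]
    for ck in c:
        for i in range(n):
            for j in range(n):
                C[i][j] += ck * power[i][j]
        power = matmul(power, P)         # next power of P
    return C

C = circulant_from_coefficients([2, 5, 1, 0])
print(C[0])                              # [2, 5, 1, 0]
print([C[i][0] for i in range(4)])       # first column: [2, 0, 1, 5]
```

With this P, the coefficients appear in order along the first row, and the first column reads 2, 0, 1, 5 as in the example above.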
So now, I can see why CD is also a circulant, because this is a polynomial in P. And this is a polynomial in P. And when I multiply those together, I get a polynomial in P.
But usually, if I multiply a third degree polynomial by another third degree polynomial-- so say I'm looking at 4 by 4 circulant matrices-- so this would be a polynomial in P of degree 3. And this would be a polynomial of degree 3. So that would give me a polynomial of degree 6.
But I don't want that. That somehow I have to define multiplication so that I want this to be degree-- these are all supposed to be 4 by 4 matrices, just like that one. That's twice the identity plus five P plus 1P squared plus 0P cubed.
And suppose I square that. Then, again, I'm going to get a circulant matrix. And I don't want it to go up to degree 6. I just want four terms in my polynomial. I just want the main diagonal and the next three diagonals and then cycling around completes the matrix.
So how to get degree 3? That's the question then. Can you just tell me the answer? So I'm multiplying-- yeah, let's just do an example. So I've C0 plus C1P plus C2P squared plus C3P cubed times D0 plus D1P plus D2P squared plus D3P cubed. So P is the 4 by 4 cyclic shift.
And I'm writing the 4 by 4 matrix that way. This should be the identity, of course. You knew that.
OK, so when I multiply those guys, why do I not get degree six? Why is that product-- like C3P cubed times D3P cubed, that gives me C3D3 P to the sixth power. So what's up? What do I do? Yeah.
AUDIENCE: Does P to the 4 equal to identity?
GILBERT STRANG: Yes, that's the key. That's the periodic part. P to the 4 equal the identity. Thank you. Yeah.
So the P to the sixth term is really a P squared term. The P to the fifth term is really a P term. The P to the fourth term is really a P to the zero term.
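In symbols, with the coefficients of the product called \(e_k\) (a name introduced here only for illustration), folding the high powers back with \(P^4 = I\) gives

\[
CD = \sum_{i=0}^{3}\sum_{j=0}^{3} c_i d_j\, P^{\,i+j} = \sum_{k=0}^{3} e_k P^{k},
\qquad
e_k = \sum_{i+j \,\equiv\, k \ (\mathrm{mod}\ 4)} c_i d_j .
\]

For instance, \(e_2 = c_0 d_2 + c_1 d_1 + c_2 d_0 + c_3 d_3\), where the last term is the \(P^6\) contribution.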
So I'm just doing cyclic convolution actually. Let me just say when I'm multiplying-- yeah, so now, I'm telling you what-- I can first tell you what convolution is and then what cyclic convolution is. And listen up to this, because it's a good thing to know.
So, first of all, convolution. So suppose I want to take the convolution of 3, 1, 2-- that's often the symbol for a convolution-- with 4, 6, 1. So what does that mean? I've got a vector.
So for me, what's hiding there is polynomial 3 plus x plus 2x squared times 4 plus 6x plus 1x squared. And I just multiply them out. And I get 5-- so I get 3 times 4 gives me a 12. 3 times 6x is 18x. But I've also got 4x from here. So I've got 22x.
So I just do that multiplication as a multiplication of polynomials. And what I'm doing is convolution with the vectors. And here's the way you wrote it in first grade.
So here's first grade. 2, 12, 8. You haven't learned to carry yet. So 12 goes in there as 12. Then 1 is 4, 6, 1. And 3 is 12, 18, 3.
And then you add up. So you have 2, 13, 17, 22, and 12. So that's the answer there. That's the convolution-- I guess I have to take it from-- yeah-- 12. 3 times four gave 12. Now, 3 times 6 plus 1 times 4 that's the 22. Oh, yeah, we already started here.
But now, what's after that 17? Why did I say 17?
AUDIENCE: 8 plus 6 plus 3.
GILBERT STRANG: Oh, 8 plus 6 plus 3. Thanks. 17. And then 12 plus 1, 13. And then 2. That's non-cyclic. So that's convolution.
Oh, if I want it to be non-cyclic, I don't put a circle in. So that's a symbol for ordinary convolution. If you do the Matlab command conv for convolution, if you gave it those vectors, it would give you a vector with five components as a result.
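Ordinary convolution is short enough to write out by hand. A minimal sketch in plain Python; the function name `convolve` is hypothetical and is not meant to reproduce any particular library call.

```python
def convolve(a, b):
    """Multiply the polynomials with coefficient lists a and b.

    The coefficient of x^(i+j) collects every product a[i] * b[j].
    """
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

print(convolve([3, 1, 2], [4, 6, 1]))   # [12, 22, 17, 13, 2]
```

Two vectors of length 3 give a result of length 3 + 3 - 1 = 5, matching the five components mentioned above.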
But now, let me make it cyclic. So what's going to happen now when I make it cyclic? So this represents 12, 22P, 17P squared, 13P cubed. And what's 13P cubed? If n is 3 and I'm doing 3 by 3 matrices, then 13P cubed is the same as? 13, right? P cubed is I.
So the 13 will go back there. And the 2 will be P fourth. And it will go back as P. So now with convolution, cyclic convolution gives 12 and 13, 25, 22 and 2, 24, and the 17. So I'm getting back a vector of length 3 just as I wanted to.
If I have a 3 by 3 circulant matrix, a cyclic convolution matrix, with those three diagonals and these three diagonals, then I want the answer to be a 3 by 3 matrix with these diagonals. So again, matrix multiplication of circulant matrices corresponds exactly to multiplying polynomials cyclically. And that's cyclic convolution.
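The correspondence between cyclic convolution and multiplying circulants can be checked numerically. A sketch with hypothetical helper names, where the circulant with coefficients c has c along its first row (the convention that matches the shift P used in this lecture):

```python
def cyclic_convolve(a, b):
    """Multiply polynomials and fold powers back using x^n = 1."""
    n = len(a)
    if len(b) != n:
        raise ValueError("both vectors must have the same length")
    out = [0] * n
    for i in range(n):
        for j in range(n):
            out[(i + j) % n] += a[i] * b[j]
    return out

def circulant(c):
    """c[0]*I + c[1]*P + ... : entry (i, j) is c[(j - i) mod n]."""
    n = len(c)
    return [[c[(j - i) % n] for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

a, b = [3, 1, 2], [4, 6, 1]
e = cyclic_convolve(a, b)
print(e)                                                 # [25, 24, 17]
print(matmul(circulant(a), circulant(b)) == circulant(e))  # True
```

The 13 and the 2 from the ordinary convolution wrap around and are added to the 12 and the 22.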
And then there's just one little trick to see if you've got the numbers. I believe that it should be true that if 3 plus 1 plus 2 is 6, 4 plus 6 plus 1 is 11, and I believe that those should add to what? What do I hope that they add to? 66.
I'm a little nervous about doing it, but I think we can. Yeah, so 49, 59, 66, check. Yeah, so the digits, the sum of the digits of one factor multiplies the sum of the digits of the other factor to give the sum of the digits in the convolution. And that would be true whether we're doing cyclic, which is bringing the 13 and the 2 back in or not cyclic, where we have 5 numbers.
That's actually a check that I don't know if you ever thought about that in second grade, multiplying these numbers, multiplying that number by that number and getting some answer, which probably half the class did not get right. Then fifth grade, they're kind of getting it. But if they knew a check, so they could have just added these numbers to get 11, added these numbers to get 6, multiplied those to get 66, and check that those added to 66. That never occurred to me I admit in second grade either. But now you know. Now you can pass second grade.
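The reason the check works is that adding up the coefficients of a polynomial is the same as evaluating it at \(x = 1\). A sketch of the arithmetic:

\[
(3 + x + 2x^{2})(4 + 6x + x^{2}) = 12 + 22x + 17x^{2} + 13x^{3} + 2x^{4}
\quad\Longrightarrow\quad
6 \cdot 11 = 12 + 22 + 17 + 13 + 2 = 66 .
\]

Folding the powers back cyclically only moves coefficients between positions, so the total is unchanged: \(25 + 24 + 17 = 66\).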
So this is the picture for cyclic convolution. And I just have a few minutes left to tell you about the eigenvalues and eigenvectors of these matrices. So what can you tell me about the eigenvalues-- or say, the eigenvectors?
I have a matrix P. It happens to be a permutation matrix. But it's a matrix.
Then I take this polynomial in the matrix. What can you tell me about eigenvectors of this matrix C? So my C looks like that. And I'm asking you for the eigenvectors of that. And I'm saying that that is a polynomial, sum of powers of P, like so. In fact, the numbers here just come off the diagonals. So what about the eigenvectors?
AUDIENCE: All 1s.
GILBERT STRANG: Well, that is an eigenvector. That's true. The vector of all 1s will be an eigenvector. With what eigenvalue actually?
AUDIENCE: C2 plus C1.
GILBERT STRANG: Yeah. So what I want you to see is the eigenvectors of C are the same as the eigenvectors of P. If I have an eigenvector of P, that's also an eigenvector of P squared and P cubed and of any combination. So eigenvectors of P are eigenvectors of C.
So then the question is, what are the eigenvectors of P? You see what I mean? It didn't really matter if I'm looking for eigenvectors what those numbers were. The point is that those numbers were constant on diagonals and that therefore it's built out of 1 matrix P, built out of 1 matrix P. So all I have to know is the eigenvectors of P. So that's the final step here. What are the eigenvectors and eigenvalues of P?
So eigenvectors, eigenvalues for P equal 1, 1, 1, and 1, say. And otherwise zeros. So somebody mentioned that the vector of 1, 1, 1, 1, that is an eigenvector, because if I do that multiplication, it shifts this vector down and cyclically around. But that just brings back the same vector. So lambda is 1. And that's an eigenvector with eigenvalue 1. But this is a 4 by 4 matrix.
AUDIENCE: 1 minus 1, 1 minus 1.
GILBERT STRANG: Ah, now you're guessing 1 minus 1, 1 minus 1. Interesting. Well, tell me an eigenvector for minus 1. Do you want to just alternate signs there? So let me do that multiplication times 1 minus 1, 1 minus 1.
Of course, I don't have to do that multiplication. I'm just going to shift this. So I'm going to shift that down, 1 minus 1 and cyclically around.
And now, what's the eigenvalue? It is an eigenvector. And what multiple of that vector gives that vector?
AUDIENCE: Minus 1.
GILBERT STRANG: Minus 1. So that was a good idea. But the rest of what you said was a bad idea, because the next eigenvector's got to be somewhere else. We've just got one eigenvalue there and one there. And we haven't got the other two. So what are the other two eigenvalues of this permutation matrix?
GILBERT STRANG: 1, that's right. We're in a circulant world. Draw a circle. So the eigenvalues are the four fourth roots of 1. Complex. So 1 and minus 1 you've got. These are the lambdas. But the other ones are i and minus i. Yeah. Yeah.
Why don't I, since time's up, let me leave until Friday the eigenvalues and eigenvectors in general, for arbitrary size. But that's a picture of the eigen-- we're close to the eigenvectors and eigenvalues of every-- all circulants have the same eigenvectors. All circulants have that eigenvector, 4 by 4. All circulants have that eigenvector 4 by 4. And we'll find the other two.
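Anticipating slightly, the four eigenvalues 1, minus 1, i, minus i can be checked numerically. The sketch below assumes the eigenvector for an eigenvalue lambda is (1, lambda, lambda squared, lambda cubed), which is a standard fact but is not derived in this lecture; the helper names are hypothetical, and the circulant with 2, 5, 1, 0 is the example used earlier.

```python
def circulant(c):
    """c[0]*I + c[1]*P + ... : entry (i, j) is c[(j - i) mod n]."""
    n = len(c)
    return [[c[(j - i) % n] for j in range(n)] for i in range(n)]

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def is_eigenpair(A, v, lam, tol=1e-12):
    Av = matvec(A, v)
    return all(abs(Av[k] - lam * v[k]) < tol for k in range(len(v)))

n = 4
coeffs = [2, 5, 1, 0]
P = circulant([0, 1, 0, 0])          # the cyclic shift
C = circulant(coeffs)
for lam in (1, -1, 1j, -1j):         # the four fourth roots of 1
    v = [lam ** k for k in range(n)]
    mu = sum(ck * lam ** k for k, ck in enumerate(coeffs))
    print(lam, is_eigenpair(P, v, lam), mu, is_eigenpair(C, v, mu))
```

Each line should report True twice: the same vector is an eigenvector of P with eigenvalue lambda and of C with eigenvalue c0 + c1 lambda + c2 lambda squared + c3 lambda cubed. For the all-ones vector that eigenvalue is the sum of the coefficients.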
OK, so I'll see you Friday for those and the convolution rule, which is the most important rule in signal processing. OK. Thanks.