Home » Courses » Mathematics » Linear Algebra » Video Lectures » Lecture 31: Change of basis; image compression
Flash and JavaScript are required for this feature.
Download the video from iTunes U or the Internet Archive.
These video lectures of Professor Gilbert Strang teaching 18.06 were recorded in Fall 1999 and do not correspond precisely to the current edition of the textbook. However, this book is still the best reference for more information on the topics covered in each lecture.
Strang, Gilbert. Introduction to Linear Algebra. 4th ed. Wellesley, MA: Wellesley-Cambridge Press, February 2009. ISBN: 9780980232714.
Instructor/speaker: Prof. Gilbert Strang
Lecture 31: Change of basis...
OK. So, coming nearer the end of the course, this lecture will be a mixture of the linear algebra that comes with a change of basis.
And a change of basis from one basis to another basis is something you really do in applications.
And, I would like to talk about those applications.
I got a little bit involved with compression.
Compressing a signal, compressing an image.
And that's exactly change-of-basis.
And then, the main theme in this chapter is th- the connection between a linear transformation, which doesn't have to have coordinates, and the matrix that tells us that transformation with respect to coordinates. So the matrix is the coordinate-based description of the linear transformation. Let me start out with the nice part, which is just to tell you something about image compression. Those of you -- well, everybody's going to meet compression, because you know that the amount of data that we're getting -- well, these lectures are compressed. So that, actually, probably you see my motion as jerky? Shall I use that word? Have you looked on the web? I should like to find a better word.
Compressed, let's say. So the complete signal is, of course, in those video cameras, and in the videotape, but that goes to the bottom of building nine, and out of that comes a jumpy motion because it uses a standard system for compressing images.
And, you'll notice that the stuff that sits on the board comes very clearly, but it's my motion that needs a whole lot of bits, right? So, and if I were to run up and back up there and back, that would need too many bits, and I'd be compressed even more. So, what does compression mean? Let me just think of a still image.
And of course, satellites, and computations of the climate, computations of combustion, the computers and sensors of all kinds are just giving us overwhelming amounts of data. The Web is, too.
Now, some compression can be done with no loss.
Lossless compression is possible just using, sort of, the fact that there are redundancies.
But I'm talking here about lossy compression.
So I'm talking about -- here's an image.
And what does an image consist of? It consists of a lot of little pixels, right? Maybe five hundred and twelve by five hundred and twelve.
Two to the ninth by two to the ninth pixels, and so this is pixel number one, one, so that's a pixel.
And if we're in black and white, the typical pixel would tell us a gray-scale, from zero to two fifty five.
So a pixel is usually a value of one of the xi, so this would be the i-th pixel, is -- it's usually a real number on a scale from zero to two fifty five.
In other words, two to the eighth possibilities. So usually, that's the standard, so that's eight -- eight bits.
But then we have that for every pixel, so we have five hundred and twelve squared pixels, we're really operating x is a vector in R^n, but what is n? n is five hundred and twelve squared.
That's our problem, right there.
A pixel is a vector that gives us the information about the image. I'm sorry.
The image that comes through is a vector of that length that -- that's the information that we have about the image, if it's a color image, we would have three times that length, because we'd need three coordinates to get color.
So it would be three times five hundred and twelve squared.
It's an enormous amount of information, and we couldn't send out the image for these lectures without compressing it.
It would overload the system. So it has to be compressed.
The standard compression, and still used with lectures is, called JPEG. I think that stands for Joint Photographic Experts Group. They established a system of compression. And I just want to tell you what it's about. It's a change-of-basis.
What basis do we have? The current basis we have is, you could say, the standard basis is, every pixel, give a value.
So that's like we have a vector x which is five hundred and twelve squared long and, in the i-th position, we get a number like one twenty one or something.
The pixel next to it might be one twenty four, maybe where my tie begins to enter, so if it was mostly blue shirt, this would be a slight difference in shading, but pretty close, then the tie would be a different color, so we might have quite a few pixels for the blue shirt, and a whole lot more for the blackboard, that are very close. And that's what are very correlated. And that's what gives us the possibility of compression. For example, before the lecture starts, if we had a blank blackboard, then there's an image, but it would make no sense to take that image and tell you what it is pixel by pixel.
I mean, there's a case in which all pixel values, all gray levels are the same -- or practically the same, depending on the erasing of the board, but extremely close -- and, so that's an image where the standard basis is lousy.
That's the basic fact, that the standard basis which gives the value of every pixel makes no use of the fact that we're getting a whole lot of pixels whose gray levels -- the neighboring pixels tend to have the same gray level as their neighbors. So how do we take advantage of that fact? Well, one basis vector that would be extremely nice to include in the basis would be a vector of all ones. That's not in our standard basis, so let me just write again, the standard basis is our one, and all the rest zeroes, zero, one, and all the rest, zeroes, everybody knows what these standard basis is.
Now, any other basis for R -- so this is -- for this very high-dimensional space -- now I'm going to speak about a better basis. Better basis -- and let me just emphasize, one vector that would be extremely nice to have in that basis is the vector of all ones.
Why is that? Let me just say again, because that vector of all ones, by itself, one vector is able to completely give the information on a solid image. Of course, our image won't be solid, it will have a mix of solid and signal.
So having that one vector in the basis is going to save us a whole lot. Now, the question is, what other vectors should be in the basis? The extreme vector in the basis might be a vector of one minus one, one minus one, one minus one.
That would be a vector that shows -- I mean, that's like a checkerboard vector, right? That's a vector that would, if the image was like a huge checkerboard of plus, minus, plus, minus, plus, minus, that vector would carry the whole signal.
But much more common would be maybe to have half the image, darker and the other half lighter.
So another vector that might be quite useful in here would be half ones and half minus ones. I'm just trying to get across the idea of that a basis could be where, that first of all, we've got the bases at our disposal.
Like, we're free to choose that.
And it's a billion-dollar decision what we choose.
So, and TV people would rather pre- would prefer one basis based on the way the signal is scanned, and movie people would prefer another, I mean, there's giant politics in this question that really reduces to a linear algebra problem, what basis to choose. I'll just mention the best known basis, which JPEG uses, -- let me put that here -- is the Fourier basis. So when you use the Fourier basis, that includes -- this is the constant vector, the D C vector if we're electrical engineers, the l- vector of all ones, so it would include one, one, one, one. Often eight by eight is a good choice. Eight by eight is a good choice. So, what do I mean by this eight by eight? I mean that the big signal, which is five twelve by five twelve, gets broken down, and JPEG does this, into eight by eight blocks. And we -- sort of, this is too much to deal with at once. So what JPEG does is take this eight by eight block, which is sixty four coefficients, sixty four, pixels, and changes the basis on that piece.
And then, now, let's see, I was going to write down Fourier, so you remember Fourier as this vector of all ones, and then, the vector -- oh, well, actually, I gave a lecture earlier about the Fourier matrix, this matrix whose columns are powers of a complex number w. I won't repeat that, because I don't want to go into the details of the Fourier basis, just to tell you how compression works.
So what happens in JPEG? What happens to the video, to each image, of these lectures? It gets broken into eight by eight blocks.
OK. Within each block, we have sixty four coefficients, sixty four basis vectors, sixty four pixels, and we change basis in sixty four dimensional space using these Fourier vectors. Just note, that was a lossless step. Let me emphasize.
In comes the signal x. We change basis.
This is the basis change. Change basis.
Choose a better basis. So it produces, the coefficients c. So sixty four pixels come in, sixty four coefficients come out.
Now comes the compression. Now come -- this was lossless.
It's just -- we know that R -- R sixty four has plenty of bases, and we've chosen one. Now, in that basis, we write the signal in that basis, and that's what my lecture -- that's the math part of my lecture.
Now here's the application part.
The next part is going to be the compression step.
And that's lossy. We're going to lose information. And what will actually happen at that step? Well, one thing we could do is just throw away the small coefficients.
So that's called thresholding, we set some threshold.
Every coefficient, every basis vector that's not in there more than the threshold value, and we set them threshold so that our eye can't see the difference, or can hardly see the difference, whether we throw away that little bit of that basis vector or keep it. So this compression step produces a compressed set of coefficients. I'll just keep going here.
So it keeps going, this compression step produces some coefficient c hat. And with many zeroes.
So that's where the compression came. Probably, there is enough of this vector of all ones -- we very seldom throw that away. Usually, its coefficient will be large. But the coefficient of something like this, that quickly alternative vector, there's probably very little of that in any smooth signal. That's high-frequency -- this is low-frequency, zero frequency.
This stuff is the highest frequency we could have, and if the noise, the jitter is producing that sort of output, but a smooth lecture like this one is, has very little of that highest frequency, very little noise in this lecture.
OK, so we throw away whatever there is, and we're left with just a few coefficients, and then we reconstruct a signal using those coefficients. We take those coefficients, times their basis vectors, but this sum doesn't have sixty four terms any more. Probably, it has about two or three terms. So that would -- say it has three terms. From sixty four down to three, that's compression of twenty one to one.
That's the kind of compression you're looking for.
And everybody is looking for that sort of compression.
Let's see, I guess I met the problem with the FBI and fingerprints. So there's a whole lot of still images. You know, with your thumb, you make these inky marks which go somewhere.
it used to go to Washington and get stored in a big file. So Washington had a file of thirty million murderers, cheaters on quizzes, other stuff, and actually, there was no way to retrieve them in time. So suppose you're at the police station, they say, OK, this person may have done this, check with Washington, have they got -- are his or her fingerprints on file? Well, Washington won't know the answer within a week if it's got filing cabinets full of fingerprints. So of course, the natural step is digitizing. So all fingerprints are now digitized, so now it's at least electronic, but still there's too much information in each one.
I mean, you can't search through that many, fingerprints if the digital image is five twelve squared by five twelve squared, if it's that many pixels.
So you get compressed. So the FBI had to decide what basis to choose for compression of fingerprints.
And then they built a big new facility in West Virginia, and that's where fingerprints now are sent.
So I think, if you get your fingerprints done now at the police station, if it's an up-to-date police station, it happens digitally, and the signal is sent digitally, and then in West Virginia, it's compressed and indexed. And then, if they want to find you, they can do it within minutes instead of within a week. OK.
So this compression comes up for signals, for images, for video -- which is, like these lectures -- there's another aspect. You could treat the video as one still image after another one, and compress each one, and then run them and make a video.
But that misses -- well, you can see why that's not optimal. In a video thing, you have a sequence of images, so video is really a sequence of images but what about one image to the next image? They're extremely correlated. I mean that I'm getting an image every split-second, and also, I'm moving slightly.
That's what's producing the, jumpy motion on the video. But I'm not, like, you know -- each image in the sequence is pretty close to the one before.
So you have to use, like, prediction and correction. I mean, the image of me one instant -- one time-step later, you would assume would be the same, and then plus a small correction. And you would only code and digitize the correction, and compress the correction. So a sequence of images that's highly correlated and the problem in compression is always to use this correlation, this fact that, in time, or in space, things don't change instantly, they're very often smooth changes, and, you can predict one value from the previous value.
OK. So those are applications which are pure linear algebra. I could, well, maybe you'll allow me to tell you, and the book describes, the new basis that's the competition for Fourier.
So the competition for Fourier is called wavelets, and I can describe what that basis is like, say, in the eight by eight case.
So the eight by eight wavelet basis is the vector of all ones, eight ones, then the vector of four ones and four minus ones, then the vector of two ones, and two minus ones, and four zeroes. And also the vector of four zeroes and two ones and two minus ones.
So now I'm up to four, and I need four more, right? For R^8? The next basis vector will be one minus one and six zeroes, and then three more like that, with the one minus one there, and there, and there. So those are eight vectors in eight-dimensional space, those are called wavelets, and it's a very simple wavelet choice, it's a more sophisticated choice.
This is a little jumpy, to jump between one and minus one. And, actually, you can see, now, suppose you compare the wavelet basis with the Fourier basis above.
How could I write this guy, which is in the Fourier basis, it's an eight -- it's a vector in R^8. How would I write that as a combination of the wavelet basis? Have I told you enough about the wavelet basis that you can see, how does this very fast guy -- what combination of the wavelet basis is that very fast guy? It would be this one -- it would be the sum of these four, right? That very fast guy will be that one minus one, and the next one, and the next one, and the next one. So this is the sum of those last four wavelets. This one, we've kept, and so on. So, each -- well, every -- well, that's what a basis does.
Every vector in R^8 is some combination of those, and for the linear algebra -- so the linear algebra is this step, find the coefficient. That's the step we want to take. What if I give you the basis, like this wavelet basis, and I give you the pixel -- so here are the pixel values, P1, P2, down to P8 -- what's the job? What's the linear algebra here? So these are the values, this is in the standard basis, right? Those are just the values at eight successive points. I guess I'm dropping down to one dimension, instead of eight by eight, I'm just going to take eight pixel values along that first top row. So what do I want to do? In standard basis, here are the pixel values.
I want to write that as a combination of c1 times this guy, plus c2 times this guy, plus c3, these are the coefficients, plus c4 times this one -- do you see what I'm doing? I want to write this vector P as a combination of c1 times the first wavelet plus c8 times the eighth wavelet. That's the transform step.
That's the lossless step. That's the step from P -- oh, I'm calling it P here, and I called it x there, so let me -- at the risk of moving, and therefore making this jumpy -- suppose the signal I'm now calling P, that a pixel values, and I'm looking for the coefficients. OK, tell me how to do it.
If I give you eight basis vectors, and I give you the input signal, and I ask for the coefficients, what do I do? What's the step? I'm trying to solve this, I want to know the eight coefficients, so I'm changing from the standard basis, which is just the eight gray-scale values to the wavelet basis, where the same vector is represented by eight numbers. It's got to take eight numbers to tell you a vector in R^8, and those eight numbers are the coefficients of the basis. Look, we've done this thing before. There is the equation in vector notation, we want to see it as a matrix.
This is a combination of columns of the wavelet matrix, right? This is P equals c1, c2, down to c8, and these guys are the columns.
I mean, this is the step that we're constantly taking in this course, the first basis vector goes in the first column, the second basis vector goes in the second column, and so on, the eight columns of this wavelet matrix are the eight basis vectors. This is a wavelet matrix W. So, the step to change basis -- so now I'm finally coming to this change-of-basis, so the change of basis that, let me stay with this board, but -- well, let me just go above it, here.
So the standard basis, we know, the wavelet basis we have here, and the transform is simply, solve the equations, P=W C. So the coefficients are W inverse P. Right.
This shows a critical point. A good basis has a nice, fast, inverse. So good basis means what? Eh? So this is like the billion-dollar competition, and it's not over yet. People are going to come up with better bases than these.
So a good basis will be, first good thing would be fast.
I have to be able to multiply by W fast, and multiply by W -- by its inverse fast. That's -- if a basis doesn't allow you to do that fast, then it's going to take so much time that you can't afford it. So these bases -- the Fourier basis, everybody said, OK, I know how to deal quickly with the Fourier basis, because we have something called the Fast Fourier Transform. So there's a FFT that came in my earlier lecture, and comes in the last chapter of the book, so change-of-basis is done -- if, for the Fourier basis, it's done fast by the FFT and there's a fast wavelet transform. I can change, for this wavelet example, this matrix is easy to invert.
It's just somebody had a smart idea in choosing that wavelet basis and inverting it, it has a nice inverse.
Actually, you can see why it has a nice inverse.
Do you see any property of these eight basis vectors? Well, I've only written five of them, but if you see that property for those five, you'll see it for the three remaining. Well, if I give you those eight vectors and ask, what's a nice property? Well, you would say, first, they're all ones and minus ones and zeroes. So every multiplication is very fast using -- just in binary. But what's the other great property of those vectors? Anybody see it? So, of course, when I think about a basis, one nice property -- I don't have to have it, but I'm happy if it's there -- is that they're orthogonal.
If the basis vectors are orthogonal, then I'm in good shape. And these are...
do you see? Take the dot product of that with that, you get four plus ones and four minus ones, you get zero. Take the dot product of that with that. You get two plus ones and two minus ones. Or the dot product of that with that. Two plus ones and two minus ones. You can easily check that that's an orthogonal basis. It's not orthonormal.
To fix it up, I should divide by the length, to make them unit vectors. Let's suppose I do that.
So somewhere in here, I've got to account for the fact that this has length square root of eight, that has length square root of four, that has length square root of two. But that's just a constant factor that's easy to -- so suppose we've done that.
Then, tell me what's W inverse? That's what chapter four, section four point four was about.
If we have orthonormal columns then the inverse is the same as the transpose. So if we have a fast way to multiply by W, which we do, the inverse is going to look just the same, and we'll have a fast way to do W inverse.
So that's the wavelet basis passes this requirement for fast. We can use it fast.
But there's a second requirement, is it any good? Because the the very fastest thing we could do is not to change basis at all. Right? The fastest thing would be, OK, stay with the standard basis, stay with eight pixel values.
But that was poor from compression point of view, right? Those eight pixel values, if I just took those eight numbers, I can't throw some of those away. If I throw away ninety percent -- if I compress ten to one, and throw away ninety percent of my pixel values, well, my picture's just gone dark. Whereas, the basis that was good, the wavelet basis or the Fourier basis, if I throw away c5, c6, c7, and c8, all I'm throwing away is little blips that are probably there in very small amounts. So the second property that we need is good compression. So first, it has to be fast, and secondly, a few basis vectors should come close to the signal. So a few is enough.
Can I write it that way? A few basis vectors are enough to reproduce the image just exactly as on a video of these 18.06 lectures. Uh, I don't know what the compression rate is, I'll ask, David, who does the compression -- and, by the way, I'll try to get the lectures, that are relevant for the quiz up onto the Web in time. So I'll send them a message today. So, he's using the Fourier basis because the JPEG -- so JPEG two thousand, which will be the next standard for image compression, will include wavelets. So, I mean, you're actually getting a kind of up-to-date, picture of where this big world of signal and image processing is.
That Fourier is what everybody knew, and what people automatically used, and the new one is wavelets, where this is the simplest set of wavelets.
And this isn't the one that the FBI uses, by the way, the FBI uses a smoother wavelet, instead of jumping from one to minus one, it's a smooth, Cutoff. and, that's what we'll be in in JPEG two thousand. OK, so that's that application.
Now, let me come to the math, the linear algebra part of the lecture. Well, we've actually seen a change-of-basis. So let -- let me just review that eh-eh change-of-basis idea, and then the i- and then the transformation to a matrix.
OK. So this, I hope you see that these applications are really big.
Now, I have to talk a little about change-of-basis, and a little about that. The matrix.
OK. OK.
OK. So change-of-basis.
Basically, forgive that put, OK, I have, I have my vector in one basis, and I want to change to a different one.
Actually, you saw it for the wavelet case.
So I need the -- let the matrix W, and the columns of W be the new basis vectors. Then the change-of-basis involves, just as it did there, W inverse.
So we have the vector, say, x, in the old basis, and that converts to a vector, let's say, c, in the new basis, and the relation is exactly what we had there, that x is W c. That's the step we have to take.
There's a matrix W that gives us a change-of-basis.
OK. What I want to do is think about transformations on matrices.
So here's the question to complete this lecture.
Suppose I have a linear transformation T.
So we would think of it as an eight -- as a n by n matrix.
And it's computed with respect to a certain basis.
So T -- no, I'm sorry. I've got the transformation T, period. That's taking eight-dimensional space to eight-dimensional space.
Now, let's get matrices in there.
OK. So, with respect to a first basis, say v1 up to v8, it has a matrix A.
I'm just setting up letters here.
With respect to a second basis, say, I'll make it u1 up to -- or w1, since I've used (w)s, w1 up to w8, it has a matrix B. And my question is, what's the connection between A and B? How is the matrix -- the transformation T is settled.
We could say, it's a rotation, for example. So that would be one transformation of eight-dimensional space, just spin it a little. Or project it. Or whatever linear transformation we've got.
Now, we have to remember -- my first step is to remind you how you create that matrix A. Then my second step is, we would use the same method to create B, but because it came from the same transformation, there's got to be a relation between A and B. What's the relation between A and B? And let me jump to the answer on that one. That if I have the same transformation, and I'm compute on its matrix in one basis, and then I computer it in another basis, those two matrices are similar.
So these two matrices are similar.
Now, do you remember what similar matrices meant? Similar. A is similar to -- the two matrices are similar. Similar.
And what do I mean by that? I mean that I take the matrix B, and I can compute it from the matrix A using some similarity, some matrix M on one side, and M inverse on the other.
And this M will be the change-of-basis matrix.
This part of the lecture is, admittedly, compressed. What I wanted you to -- it's really the conclusion that I want you to spot. Now, I have to go back and say, what does it mean for A to be the matrix of this transformation T. So I have to remind you what that meant, that was in the last lecture.
Then this is the conclusion that if I change to a different basis, we now know -- see, if I change to a different basis, two things happen. Every vector has new coordinates. There, the rule is this one, between the old coordinates and the new ones. Every matrix changes, every transformation has a new matrix. And the new matrix is related this way, the M could be the same as the W.
The M there would be the W here.
OK. So, can I, in the remaining minutes, recapture my lecture -- the end of my lecture that was just before Thanksgiving, about the matrix? OK. What's the matrix? And I'll just take one basis. So now this part is going to go onto this board here. What is the matrix? What is A? OK.
Using a basis v1 up to v8. Mm.
OK. What's the point? The point is, if I know what the transformation does to those eight basis vectors, I know it completely. I know T, I know everything about T, I know T completely from knowing T of V -- what T does to v1, what T does to v2, what T does to v8.
Why is that? It's because T is a linear transformation. So that if I know what these outputs are -- so these are the inputs v1 up to v8, these are the outputs from the transformation, like everyone rotated, everyone projected, whatever transformation I've done, then why is it that I know everything? How does linearity work? Why? This is because every x is some combination of these basis vectors, right? c1v1, c2v2, c8v8, they were a basis.
That's the whole point of a basis, that every vector is a combination of the basis vectors in exactly one way.
And then, what is T of x? The point is, I claim that we know T of x completely for every x, because every x is a combination of those -- and now we use the linear transformation part to say that the output from x has to be c1 times the output from v1 plus v2 times the output from v2, and so on.
Up through c8 times the output from v8.
So this is like just saying, OK.
We know everything when we know what T does to each basis vector. OK.
So those are the eight things we need. Now -- but we need these answers in this basis.
So this first output is some combination of the eight basis vectors. So write T acting on the first input -- in other words, write the first output as a combination of the basis vectors, say a11 v1 + a21 v2 and so on a81 v8. Write T of v2 as some combination a12 of v1, a22 of v2 and so on. I'm creating the matrix A, column by column. Those numbers go in the first column, these numbers go in the second column, the matrix A that thi- this -- this is our matrix that represents T in this basis is these numbers.
a11 down to a18, a21 down to a28, and so on. OK.
That's the recipe. In other words, if I give you a transformation, and a basis.
So that's what I have to give you.
The inputs are the basis and to tell you what the transformation is. And then, you tell me -- you compute T for each basis, expand that result in the basis, and that gives you the sixty four numbers that go into the matrix A. Let me suppose -- let's close with the best example of all. Suppose v1 to v8, this basis, is the eigenvectors.
Suppose we have an eigenvector basis so that T(vi) is in the same direction of vi. Now, my question is, what is A? Can you carry through the steps? Let's do them together, because we can do it in one minute.
So, we've chosen this perfect basis.
And, actually, with signal image processing, they might look for the eigenvectors.
But that would take more calculation time that just saying, OK, we'll use the wavelet basis.
Or, OK, we'll use the Fourier basis.
But the very best basis is the eigenvector basis.
OK, what's the matrix? So, what's the first column of the matrix? How do I get the first column? I take the first basis vector v1.
I opt -- I look to see, what does the transformation do to it? The output is lambda one v1.
I express that output as a combination so the first input is v1. Its output is lambda one v1.
Now write lambda one v1 as a combination of the basis vectors, well, it's already done.
It's just lambda one times the first basis vector and zero times the others. So this first column will have lambda one and zeroes. OK.
Second input is v2. Output is lambda two v2.
OK, write that output as a combination of the (v)s.
It's already done. It's just lambda two times the second v. So we need, in the second column, we have lambda two times the second v.
Well, you see what's coming, that in that basis, in the eigenvector basis, the matrix is diagonal.
So that's the perfect basis, that's the basis we'd love to have for image processing, but to find the eigenvectors of our pixel matrix would be too expensive.
So we do something cheaper and close, which is to choose a good basis like wavelets. OK, thanks.
So I'll -- quiz review on Wednesday, all day.
Thanks.
MIT OpenCourseWare makes the materials used in the teaching of almost all of MIT's subjects available on the Web, free of charge. With more than 2,200 courses available, OCW is delivering on the promise of open sharing of knowledge. Learn more »
© 2001–2015
Massachusetts Institute of Technology
Your use of the MIT OpenCourseWare site and materials is subject to our Creative Commons License and other terms of use.