Topics covered: Gradient; directional derivative; tangent plane
Instructor: Prof. Denis Auroux
Lecture 12: Gradient
Lecture Notes - Week 5 Summary (PDF)
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

OK, so remember last time, on Tuesday, we learned about the chain rule. For example, we saw that if we have a function w that depends on three variables, x, y, z, and x, y, z themselves depend on some variable, t, then you can find a formula for dw/dt by writing down dw/dt = w_x dx/dt + w_y dy/dt + w_z dz/dt. And the meaning of that formula is that the change in w is caused by changes in x, y, and z. x, y, and z change at rates dx/dt, dy/dt, dz/dt, and this causes the function to change accordingly. The partial derivatives tell you how sensitive w is to changes in each variable.

OK, so we are going to just rewrite this in a new notation. I'm going to rewrite this in a more concise form as dw/dt = gradient of w dot product with the velocity vector dr/dt. So, the gradient of w is a vector formed by putting together all of the partial derivatives. It's the vector <w_x, w_y, w_z> whose components are the partials. And, of course, it's a vector that depends on x, y, and z, right? These guys depend on x, y, z. So, it's actually one vector for each point, (x, y, z). You can talk about the gradient of w at some point, (x, y, z). So, at each point, it gives you a vector. That actually is what we will call later a vector field. We'll get back to that later. And dr/dt is just the velocity vector <dx/dt, dy/dt, dz/dt>.

OK, so the new definition for today is the definition of the gradient vector. And our goal will be to understand a bit better what this vector means. What does it measure? And what can we do with it? But you see that in terms of information content, it's really the same information that's already in the partial derivatives, or in the differential.
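As a quick numerical check of the formula dw/dt = gradient w dot dr/dt, here is a minimal sketch. The function w = xy + z^2 and the curve r(t) = (cos t, sin t, t) are assumptions chosen for illustration; they are not from the lecture.

```python
import math

# Hypothetical example (not from the lecture): w = x*y + z^2 along the
# curve r(t) = (cos t, sin t, t).
def w(x, y, z):
    return x * y + z ** 2

def grad_w(x, y, z):
    # Partial derivatives of w, computed by hand: <w_x, w_y, w_z>.
    return (y, x, 2 * z)

def r(t):
    return (math.cos(t), math.sin(t), t)

def velocity(t):
    # dr/dt = <dx/dt, dy/dt, dz/dt>
    return (-math.sin(t), math.cos(t), 1.0)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def dw_dt_chain(t):
    # Chain rule in gradient form: dw/dt = grad w . dr/dt
    return dot(grad_w(*r(t)), velocity(t))

def dw_dt_numeric(t, h=1e-6):
    # Central difference of t -> w(r(t)), used only as a comparison.
    return (w(*r(t + h)) - w(*r(t - h))) / (2 * h)

t0 = 0.7
print(dw_dt_chain(t0), dw_dt_numeric(t0))
```

The two printed numbers should agree to several decimal places, which is all the chain rule claims here.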
So, yes, and I should say, of course you can also use the gradient in other things like approximation formulas and so on. And so far, it's just notation. It's a way to rewrite things. But here's the first cool property of the gradient. I claim that the gradient vector is perpendicular to the level surface corresponding to setting the function, w, equal to a constant.

OK, so if I draw a contour plot of my function, actually, forget about z because I want to draw a two variable contour plot. So, say I have a function of two variables, x and y, then maybe it has some contour plot. And I'm saying, if I take the gradient of the function at this point, (x, y), I will have a vector. Well, if I draw that vector on top of the contour plot, it's going to end up being perpendicular to the level curve. Same thing if I have a function of three variables. Then, I can try to draw its contour plot. Of course, I can't really do it because the contour plot would be living in space with x, y, and z. But it would be a bunch of level surfaces, and the gradient vector would be a vector in space. That vector is perpendicular to the level surfaces.

So, let's try to see that on a couple of examples. Let's do a first example. What's the easiest case? Let's take a linear function of x, y, and z. So, I will take w = a1 x + a2 y + a3 z. Well, what's the gradient of this function? The first component will be a1. That's partial w partial x. Then a2, that's partial w partial y, and a3, partial w partial z. Now, what are the level surfaces of this? Well, if I set w equal to some constant, c, that means I look at the points where a1 x + a2 y + a3 z = c. What kind of surface is that? It's a plane. And we know how to find a normal vector to this plane just by looking at the coefficients. So, it's a plane with a normal vector that is exactly this gradient, <a1, a2, a3>. And, in fact, in a way, this is the only case you need to check because of linear approximations.
If you replace a function by its linear approximation, that means you will replace the level surfaces by their tangent planes. And then, you'll actually end up in this situation. But maybe that's not very convincing. So, let's do a second example.

Let's say we look at the function w = x^2 + y^2. OK, so now it's a function of just two variables because that way we'll be able to actually draw a picture for you. So what are the level sets of this function? Well, they're going to be circles, right? w = c is a circle, x^2 + y^2 = c. So, I should say, maybe, sorry, the level curve is a circle. So, the contour plot looks something like that. Now, what's the gradient vector? Well, partial w partial x is 2x, and partial w partial y is 2y. So, let's say I take a point, (x, y), and I try to draw my gradient vector. So, here at (x, y), I have to draw the vector <2x, 2y>. What does it look like? Well, it's going in that direction. It's parallel to the position vector for this point. It's actually twice the position vector. So, I guess it goes more or less like this. What's interesting, too, is that it is perpendicular to this circle. OK, so it's a general feature.

Actually, let me show you more examples. Oops, not the one I want. So, I don't know if you can see it so well. Well, hopefully you can. Here I have a contour plot of a function, and I have a blue vector. That's the gradient vector at the pink point on the plot. So, you can see, I can move the pink point, and the gradient vector, of course, changes because the gradient depends on x and y. But what doesn't change is that it's always perpendicular to the level curves. Anywhere I am, my gradient stays perpendicular to the level curve. OK, is that convincing? Is that visible for people who can't see blue? OK, so we have a lot of evidence, but let's try to prove the theorem because it will be interesting.
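The circle example can be checked numerically. This is a small sketch, assuming the level curve x^2 + y^2 = c is parametrized by an angle theta (a parametrization chosen here for illustration): at each sampled point, the gradient <2x, 2y> has zero dot product with the tangent vector of the circle.

```python
import math

def grad_w(x, y):
    # Gradient of w = x^2 + y^2, the lecture's second example.
    return (2 * x, 2 * y)

def circle_point_and_tangent(c, theta):
    # Point on the level curve x^2 + y^2 = c at angle theta, together
    # with the tangent vector obtained by differentiating in theta.
    radius = math.sqrt(c)
    point = (radius * math.cos(theta), radius * math.sin(theta))
    tangent = (-radius * math.sin(theta), radius * math.cos(theta))
    return point, tangent

def gradient_dot_tangent(c, theta):
    point, tangent = circle_point_and_tangent(c, theta)
    gx, gy = grad_w(*point)
    return gx * tangent[0] + gy * tangent[1]

# The dot product should vanish (up to rounding) all around the circle.
for theta in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(theta, gradient_dot_tangent(4.0, theta))
```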
So, first of all, sorry, any questions about the statement, the example, anything? Yes? Ah, very good question. Why is the gradient vector perpendicular in one direction rather than the other? So, we'll see the answer to that in a few minutes. But let me just tell you immediately which side it's pointing to: it's always pointing towards higher values of the function. OK, and we'll see that in maybe about half an hour. So, well, let me say it actually points towards higher values of w. OK, any other questions? I don't see any questions.

OK, so let's try to prove this theorem, at least this part of the theorem. We're not going to prove that just yet. That will come in a while. So, well, maybe we want to understand first what happens if we move inside the level curve, OK? So, let's imagine that we are taking a moving point that stays on the level curve or on the level surface. And then, we know, well, what happens is that the function stays constant. But we can also know how quickly the function changes using the chain rule up there. So, maybe the chain rule will actually be the key to understanding how the gradient vector and the motion on the level surface relate.

So, let's take a curve, r = r(t), that stays inside, well, maybe I should say on, the level surface w = c. So, let's think about what that means. Just to get you used to this idea, I'm going to draw a level surface of a function of three variables. OK, so it's a surface given by the equation w(x, y, z) = c for some constant c. And now I'm going to have a point on that, and it's going to move on that surface. So, I will have some parametric curve that lives on this surface. So, the question is, what's going to happen at any given time? Well, the first observation is about the velocity vector. What can I say about the velocity vector of this motion? It's going to be tangent to the level surface, right?
If I move along a curve, then at any point, my velocity is tangent to the curve. But if it's tangent to the curve, then it's also tangent to the surface because the curve is inside the surface. So, OK, it's getting a bit cluttered. Maybe I should draw a bigger picture. Let me do that right away here. So, I have my level surface, w = c. I have a curve on that, and at some point, I'm going to have a certain velocity. So, the claim is that the velocity, v = dr/dt, is tangent to the level w = c because it's tangent to the curve, and the curve is inside the level, OK?

Now, what else can we say? Well, the chain rule will tell us how the value of w changes. So, by the chain rule, we have dw/dt. The rate of change of the value of w as I move along this curve is given by the dot product between the gradient and the velocity vector. So, well, maybe I can rewrite it as gradient w dot v, and that should be, well, what should it be? What happens to the value of w as t changes? Well, it stays constant because we are moving on a curve that might be complicated, but it always stays on the level w = c. So, it's zero because w(t) = c, which is a constant. OK, is that convincing?

OK, so now if we have a dot product that's zero, that tells us that these two guys are perpendicular. So the gradient vector is perpendicular to v. OK, that's a good start. We know that the gradient is perpendicular to this vector that's tangent to the level surface. What about other vectors tangent to the level surface? Well, in fact, I could use any curve drawn on the level w = c. So, I could move, really, any way I wanted on that surface. In particular, I claim that I could have chosen my velocity vector to be any vector tangent to the surface. OK, so let's write this. So this is true for any curve, or, I'll say for any motion on the level surface, w = c.
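To see the argument in numbers, here is a sketch with an assumed function and an assumed curve, neither of which is from the lecture: w = x^2 + y^2 + z^2, and a curve that winds around the unit sphere, so it stays on the level surface w = 1. The chain rule value gradient w dot v should then be zero at every time.

```python
import math

def w(x, y, z):
    # Hypothetical function; its level surface w = 1 is the unit sphere.
    return x ** 2 + y ** 2 + z ** 2

def grad_w(x, y, z):
    return (2 * x, 2 * y, 2 * z)

def r(t):
    # A curve that stays on the level surface w = 1:
    # x^2 + y^2 + z^2 = cos^2(2t) + sin^2(2t) = 1 for every t.
    return (math.cos(t) * math.cos(2 * t),
            math.sin(t) * math.cos(2 * t),
            math.sin(2 * t))

def velocity(t):
    # dr/dt, differentiated by hand with the product rule.
    return (-math.sin(t) * math.cos(2 * t) - 2 * math.cos(t) * math.sin(2 * t),
            math.cos(t) * math.cos(2 * t) - 2 * math.sin(t) * math.sin(2 * t),
            2 * math.cos(2 * t))

def dw_dt(t):
    # Chain rule: dw/dt = grad w . v
    g = grad_w(*r(t))
    v = velocity(t)
    return sum(gi * vi for gi, vi in zip(g, v))

for t in (0.0, 0.3, 1.1, 2.5):
    # w stays equal to 1 and dw/dt stays equal to 0, up to rounding.
    print(t, w(*r(t)), dw_dt(t))
```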
So that means v can be any vector tangent to the level surface. See, for example, OK, let me draw one more picture. So I have my level surface. I'm drawing more and more levels, and they never quite look the same. But I have a point. And, at this point, I have the tangent plane to the level surface. OK, so this is the tangent plane to the level. Then, I can choose any vector in that tangent plane. Let's say I choose the one that goes in that direction. Then, I can actually find a curve that goes in that direction and stays on the level. So, here, that would be a curve that somehow goes from the right to the left, and of course it has to end up going up or something like that.

OK, so given any vector tangent to the level, let's call that vector v, we get that the gradient is perpendicular to v. So the gradient is perpendicular to this vector tangent to this curve, but also to any vector that I can draw tangent to my surface. So, what does that mean? Well, that means the gradient is actually perpendicular to the tangent plane or to the surface at this point. So, the gradient is perpendicular. And, well, here, I've illustrated things with a three-dimensional example, but really it works the same if you have only two variables. Then you have a level curve that has a tangent line, and the gradient is perpendicular to that line. OK, any questions? No?

OK, so, let's see. That's actually pretty neat because there is a nice application of this: now we know, actually, how to find the tangent plane to anything, pretty much. So, let's say that, for example, I want to find the tangent plane to the surface with equation, let's say, x^2 + y^2 - z^2 = 4 at the point (2, 1, 1). Let me write that. So, how do we do that?
Well, one way that we already know: if we solve this for z, so we can write z equals a function of x and y, then we know the tangent plane approximation for the graph of a function, z equals some function of x and y. But that doesn't look like it's the best way to do it. OK, the best way to do it, now that we have the gradient vector, is actually to directly say, oh, we know the normal vector to this plane. The normal vector will just be the gradient.

Oh, I think I have a cool picture to show. OK, so that's what it looks like. Here you have the surface x^2 + y^2 - z^2 = 4. That's called a hyperboloid because it looks like what you get when you spin a hyperbola around an axis. And here's a tangent plane at the given point. So, it doesn't look very tangent because it crosses the surface. But if you think about it, you will see it's really the plane that's approximating the surface in the best way that you can at this given point. It is really the tangent plane.

So, how do we find this plane? Well, you can plot it on a computer. That's not exactly how you would look for it in the first place. So, the way to do it is that we compute the gradient. So, a gradient of what? Well, a gradient of this function. OK, so I should say, this is the level set w = 4, where w = x^2 + y^2 - z^2. And so, we know that the gradient of this, well, what is it? 2x, then 2y, and then negative 2z: <2x, 2y, -2z>. So, at this given point, I guess we are at x equals two. So, that's four. And then, y and z are one. So, two, negative two: <4, 2, -2>. OK, and that's going to be the normal vector to the surface or to the tangent plane. That's one way to define the tangent plane: it has the same normal vector as the surface. Or that's one way to define the normal vector to the surface, if you prefer. Being perpendicular to the surface means that you are perpendicular to its tangent plane.
OK, so the equation is, well, 4x + 2y - 2z equals something, where to find that something we should just plug in that point. We'll get eight plus two minus two, so it looks like we'll get eight: 4x + 2y - 2z = 8. And, of course, we could simplify by dividing everything by two, but it's not very important here. OK, so now if you have a surface given by an evil equation, and a point on the surface, well, you know how to find the tangent plane to the surface at that point. OK, any questions? No.

OK, let me give just another way that we could have seen this. So, I claim, in fact, we could have done this without the gradient, or using the gradient in a somehow disguised way. So, here's another way. The other way to do it would be to start with a differential, OK? It's pretty much the same content, but let me write it as a differential: dw = 2x dx + 2y dy - 2z dz. So, at the given point, (2, 1, 1), this is 4 dx + 2 dy - 2 dz. Now, if we want to change this into an approximation formula, we can. We know that the change in w is approximately equal to 4 delta x + 2 delta y - 2 delta z.

OK, so when do we stay on the level surface? Well, we stay on the level surface when w doesn't change, so, when this becomes zero, OK? Now, what does this approximation sign mean? Well, it means for small changes in x, y, z, this guy will be close to that guy. It also means something else. Remember, these approximation formulas are linear approximations. They mean that we replace the function, actually, by the closest linear formula nearby. And so, in particular, if we set this equal to zero instead of approximately zero, it means we'll actually be moving on the tangent plane to the level set. Asking for strict equalities in approximations means that we replace the function by its tangent approximation. So -- [APPLAUSE] OK, so the level corresponds to delta w equals zero, and its tangent plane corresponds to 4 delta x + 2 delta y - 2 delta z = 0.
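The tangent plane computation for x^2 + y^2 - z^2 = 4 at (2, 1, 1) can be written out as a short sketch. The helper names `grad_w` and `tangent_plane` are hypothetical, introduced only for this illustration.

```python
def w(x, y, z):
    # The lecture's surface is the level set w = 4 of this function.
    return x ** 2 + y ** 2 - z ** 2

def grad_w(x, y, z):
    # Gradient <2x, 2y, -2z>, as computed in the lecture.
    return (2 * x, 2 * y, -2 * z)

def tangent_plane(point):
    # The normal vector is the gradient at the point. The plane is
    # normal . <x, y, z> = constant, with the constant fixed by
    # requiring that the point itself lies on the plane.
    normal = grad_w(*point)
    constant = sum(n * p for n, p in zip(normal, point))
    return normal, constant

point = (2, 1, 1)
print(w(*point))             # confirms the point is on the level set w = 4
print(tangent_plane(point))  # normal <4, 2, -2> and right-hand side 8
```

Subtracting the constant from both sides gives the same plane in the form 4(x - 2) + 2(y - 1) - 2(z - 1) = 0.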
That's what I'm trying to say, basically. And what's delta x? Well, that means it's the change in x. So, what's the change in x here? That means, well, we started with x equals two, and we moved to some other value, x. So, that's actually x - 2, right? That's how much x has changed compared to 2. So we get 4(x - 2) + 2(y - 1) - 2(z - 1) = 0. That's the equation of the tangent plane. It's the same equation as the one over there. These are just two different methods to get it. OK, so this one explains to you what's going on in terms of approximation formulas. This one goes right away, by using the gradient vector. So, in a way, with this one, you don't have to think nearly as much. But you can use either one. OK, questions? No?

OK, so let's move on to a new topic, which is another application of the gradient vector, and that is directional derivatives. OK, so let's say that we have a function of two variables, x and y. Well, we know how to compute partial w over partial x or partial w over partial y, which measure how w changes if I move in the direction of the x axis or in the direction of the y axis. So, what about moving in other directions? Well, of course, we've seen other approximation formulas and so on. But we can still ask, is there a derivative in every direction? And that's basically, yes, that's the directional derivative. OK, so these are derivatives in the direction of i hat or j hat, the vectors that go along the x or the y axis. So, what if we move in another direction, let's say, the direction of some unit vector, let's call it u? OK, so if I give you a unit vector, you can ask yourself, if I move in that direction, how quickly will my function change? So, let's look at a straight trajectory. What this should mean is I start at some point, (x, y), and there I have my vector u. And I'm going to move in a straight line in the direction of u.
And I have the graph of my function, and I'm asking myself how quickly does the value change when I move on the graph in that direction? OK, so let's look at a straight line trajectory. So, we have a position vector, r, that will depend on some parameter which I will call s, you'll see why very soon, in such a way that the derivative dr/ds is this given unit vector u hat. So, why do I use s for my parameter rather than t? Well, it's a convention. I'm moving at unit speed along this line. So that means that actually, I'm parameterizing things by the distance that I've traveled along a curve, sorry, along this line. So, here it's called s in the sense of arc length. Actually, it's not really an arc because it's a straight line, so it's the distance along the line. OK, so because we are parameterizing by distance, we are just using s as a convention to distinguish it from other situations.

And so, now, the question will be, what is dw/ds? What's the rate of change of w when I move like that? Well, of course we know the answer because that's a special case of the chain rule. So, that's how we will actually compute it. But in terms of what it means, it really means we are asking ourselves: we start at a point and we change the variables in a certain direction, which is not necessarily the x or the y direction, but really any direction. And then, what's the derivative in that direction? OK, does that make sense as a concept? Kind of? I see some faces that are not completely convinced. So, maybe I should show more pictures. Well, let me first write down a bit more and show you something. So I just want to give you the actual definition. Sorry, first of all, in case you wonder what this is all about, let's say the components of our unit vector are two numbers, a and b. Then, it means we'll move along the line where x(s) equals some initial value x0, the point at which we want the directional derivative, plus s times a, or I meant to say plus a times s: x(s) = x0 + as.
And y(s) = y0 + bs. And then, we plug that into w. And then we take the derivative. So, we have a notation for that, which is going to be dw/ds with a subscript u hat to indicate in which direction we are actually going to move. And that's called the directional derivative in the direction of u.

OK, so, let's see what it means geometrically. So, remember, we've seen things about partial derivatives, and we've seen that the partial derivatives are the slopes of slices of the graph by vertical planes that are parallel to the x or the y directions. OK, so, if I have a point, at any point, I can slice the graph of my function by two planes, one that's going along the x direction, one along the y direction. And then, I can look at the slices of the graph. Let me see if I can use that thing. So, we can look at the slices of the graph that are drawn here. In fact, we look at the tangent lines to the slices, and we look at the slope, and that gives us the partial derivatives. In case you are on that side and also want to see it, the pointer was here.

So, now, similarly, the directional derivative means, actually, we'll be slicing our graph by a vertical plane. That's not really colorful; let me use something more colorful. We'll be slicing things by a plane that is now in the direction of this vector, u, and we'll be looking at the slope of the slice of the graph. So, what that looks like here, and that's the same applet that you've used on your problem set, in case you are wondering. So, now, I'm picking a point on the contour plot. And, at that point, I slice the graph. So, here I'm starting by slicing in the direction of the x axis. So, in fact, what I'm measuring here by the slope of the slice is the partial in the x direction. It's really partial f partial x, which is also the directional derivative in the direction of i hat. And now, if I rotate the slice, then I have all of these planes. So, you see at the bottom left, I have the direction in which I'm going.
There's this, like, rotating line that tells you in which direction I'm going to be moving. And for each direction, I have a plane. And when I slice by that plane, so I have this direction here going maybe to the southwest, that gives me a slice of my graph by a vertical plane, and the slice has a certain slope. And the slope is going to be the directional derivative in that direction. OK, I think that's as graphic as I can get. OK, any questions about that? No?

OK, so let's see how we compute that guy. So, let me just write it again, in case you didn't hear me: it's the slope of the slice of the graph by a vertical plane that contains the given direction, that's parallel to the direction, u. So, how do we compute it? Well, we can use the chain rule. The chain rule implies that dw/ds is actually the gradient of w dot product with the velocity vector dr/ds. But remember, we said that we are going to be moving at unit speed in the direction of u. So, in fact, that's just gradient w dot product with the unit vector u. OK, so the formula that we remember is really: dw/ds in the direction of u is gradient w dot u. And maybe I should also say in words, this is the component of the gradient in the direction of u. And maybe that makes more sense. So, for example, the directional derivative in the direction of i hat is the component along the x axis. That's the same as, indeed, the partial derivative in the x direction. Things make sense. dw/ds in the direction of i hat is gradient w dot i hat, which is w_x, or maybe I should write, partial w partial x. OK, so that's basically what we need to know to compute these guys.

So now, let's go back to the gradient and see what this tells us about the gradient. [APPLAUSE] I see you guys are having fun. OK, OK, let's do a little bit of geometry here. That should calm you down. So, we said dw/ds in the direction of u is gradient w dot u.
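The two descriptions of the directional derivative, the slope along the line x(s) = x0 + as, y(s) = y0 + bs and the formula gradient w dot u, can be compared numerically. This is a sketch only: w = x^2 + y^2 is the lecture's earlier example, while the point (1, 2) and the unit vector u = <3/5, 4/5> are assumptions chosen for illustration.

```python
def w(x, y):
    return x ** 2 + y ** 2

def grad_w(x, y):
    return (2 * x, 2 * y)

def directional_derivative(point, u):
    # dw/ds in the direction of u = grad w . u  (u must be a unit vector).
    gx, gy = grad_w(*point)
    return gx * u[0] + gy * u[1]

def slope_along_line(point, u, h=1e-6):
    # Move along x(s) = x0 + a*s, y(s) = y0 + b*s and take a central
    # difference of w in the distance parameter s.
    x0, y0 = point
    a, b = u
    ahead = w(x0 + a * h, y0 + b * h)
    behind = w(x0 - a * h, y0 - b * h)
    return (ahead - behind) / (2 * h)

point = (1.0, 2.0)
u = (0.6, 0.8)  # a unit vector: 0.36 + 0.64 = 1
print(directional_derivative(point, u), slope_along_line(point, u))
```

Both values should be close to 4.4, since gradient w at (1, 2) is <2, 4> and <2, 4> dot <0.6, 0.8> = 1.2 + 3.2.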
That's the same as the length of gradient w, times the length of u, which happens to be one because we are taking a unit vector, times the cosine of the angle between the gradient and the given unit vector u. So, let's call this angle theta: dw/ds = |gradient w| cos(theta). OK, that's another way of saying we are taking the component of the gradient in the direction of u.

But now, what does that tell us? Well, let's try to figure out in which directions w changes the fastest, in which direction it increases the most or decreases the most, or doesn't actually change. So, when is this going to be the largest? If I fix a point, then the gradient vector at that point is given to me. But the question is, in which direction does w change the most quickly? Well, what I can change is the direction, and this will be the largest when the cosine is one. So, this is largest when the cosine of the angle is one. That means the angle is zero. That means u is actually in the direction of the gradient.

OK, so that's a new way to think about the direction of the gradient. The gradient is the direction in which the function increases the most quickly at that point. So, the direction of gradient w is the direction of fastest increase of w at the given point. And what is the magnitude of gradient w? Well, it's actually the directional derivative in that direction. OK, so if I go in that direction, which gives me the fastest increase, then the corresponding slope will be the length of the gradient.

And what's the direction of the fastest decrease? It's going in the opposite direction, right? I mean, if you are on a mountain, and you know that you are facing the mountain, that's the direction of fastest increase. The direction of fastest decrease is behind you, straight down. OK, so, the minimal value of dw/ds is achieved when cosine of theta is minus one. That means theta equals 180 degrees. That means u is in the direction of minus the gradient. It points opposite to the gradient.
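Here is a small sketch of that statement: scan unit vectors one degree at a time and see which direction gives the largest and smallest value of gradient w dot u. The gradient <3, 4> is an assumed value at some unspecified point, chosen because its length is 5.

```python
import math

def directional_derivative(grad, u):
    # dw/ds in the direction of the unit vector u.
    return grad[0] * u[0] + grad[1] * u[1]

def unit(v):
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

grad = (3.0, 4.0)  # assumed gradient at some point; its length is 5

# Try a unit vector at every whole degree and record the slope it gives.
slopes = []
for deg in range(360):
    theta = math.radians(deg)
    u = (math.cos(theta), math.sin(theta))
    slopes.append((directional_derivative(grad, u), deg))

best_slope, best_deg = max(slopes)    # direction of fastest increase
worst_slope, worst_deg = min(slopes)  # direction of fastest decrease
grad_deg = math.degrees(math.atan2(grad[1], grad[0]))

print("gradient points at about", round(grad_deg, 2), "degrees")
print("largest slope at", best_deg, "degrees; smallest at", worst_deg)
print("slope along the gradient:", directional_derivative(grad, unit(grad)))
print("length of the gradient:  ", math.hypot(grad[0], grad[1]))
```

The best sampled direction is the whole degree nearest to the gradient's own direction, the worst one is half a turn away, and the slope along the gradient matches its length.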
And, finally, when do we have dw/ds equals zero? So, in which direction does the function not change? Well, we have two answers to that. One is to just use the formula. So, that's when cosine theta equals zero. That means theta equals 90 degrees. That means that u is perpendicular to the gradient. The other way to think about it: the direction in which the value doesn't change is a direction that's tangent to the level surface. If we are not changing w, it means we are moving along the level. And that's the same thing as being tangent to the level.

So, let me just show that on the picture here. So, if I actually show you the gradient, you can't really see it here. I need to move it a bit. So, the gradient here is pointing straight up at the point that I have chosen. Now, if I choose a slice in a direction that's perpendicular to the gradient, so that's actually tangent to the level curve, then you see that my slice is flat. I don't actually have any slope. The directional derivative in a direction that's perpendicular to the gradient is basically zero. Now, if I rotate, then the slope sort of increases, increases, increases, and it becomes the largest when I'm going in the direction of the gradient. So, here, I have, actually, a pretty big slope. And now, if I keep rotating, then the slope will decrease again. Then it becomes zero when I'm perpendicular, and then it becomes negative. It's the most negative when I'm pointing away from the gradient, and then it becomes zero again when I'm back perpendicular.

OK, so for example, if I give you a contour plot, and I ask you to draw the direction of the gradient vector, well, at this point, for example, you would look at the picture. The gradient vector would be going perpendicular to the level. And it would be going towards higher values of the function. I don't know if you can see the labels, but the thing in the middle is a minimum. So, it will actually be pointing in this kind of direction.
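The flat slice can also be seen in a small numerical sketch. It reuses w = x^2 + y^2 from earlier in the lecture; the point (1, 2) and the step size are assumptions chosen for illustration. A short step perpendicular to the gradient barely changes w, while the same step along the gradient changes it by about the length of the gradient times the step.

```python
import math

def w(x, y):
    return x ** 2 + y ** 2

def grad_w(x, y):
    return (2 * x, 2 * y)

def unit(v):
    length = math.hypot(v[0], v[1])
    return (v[0] / length, v[1] / length)

def rotate_90(v):
    # Quarter turn counterclockwise: the result is perpendicular to v.
    return (-v[1], v[0])

def slope(point, u):
    # Directional derivative: grad w . u
    gx, gy = grad_w(*point)
    return gx * u[0] + gy * u[1]

def change_in_w(point, u, step):
    # Actual change in w after moving a distance `step` in direction u.
    return w(point[0] + step * u[0], point[1] + step * u[1]) - w(*point)

point = (1.0, 2.0)
along = unit(grad_w(*point))  # direction of the gradient
across = rotate_90(along)     # tangent to the level curve through the point
step = 1e-3

print("slope along gradient: ", slope(point, along))
print("slope across gradient:", slope(point, across))
print("change in w along:    ", change_in_w(point, along, step))
print("change in w across:   ", change_in_w(point, across, step))
```

The change across the gradient is not exactly zero because the straight step leaves the circular level curve slightly, but it is of the order of the step squared rather than the step.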
OK, so that's it for today.
© 2001–2015
Massachusetts Institute of Technology