Topics covered: Mean value theorem; Inequalities
Instructor: Prof. David Jerison
The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.
PROFESSOR: What we're going to talk about today is a continuation of last time. I want to review Newton's method because I want to talk to you about its accuracy. So if you remember, the way Newton's method works is this. If you have a curve and you want to know where it crosses the axis. And you don't know where this point is, this point which I'll call x here, what you do is you take a guess. Maybe you take a point x_0 here. And then you go down to this point on the graph, and you draw the tangent line. I'll draw these in a couple of different colors so that you can see the difference between them.
So here's a tangent line. It's coming out like that. And that one is going to get a little closer to our target point. But now the trick is, and this is rather hard to see because the scale gets small incredibly fast, is that if you go right up from that, and you do this same trick over again. That is, this is your second guess, x_1, and now you draw the second tangent line. Which is going to come down this way. That's really close. You can see here on the chalkboard, it's practically the same as the dot of x. So that's the next guess. Which is x_2. And I want to analyze, now, how close it gets. And just describe to you how it works. So let me just remind you of the formulas, too. It's worth having them in your head.
So the formula for the next one is this. And then the idea is just to repeat this process. Which has a fancy name, in algorithms, which is to iterate, if you like. So we repeat the process. And that means, for example, we generate x_2 from x_1 by the same formula. And we did this last time. And, more generally, the (n+1)st is generated from the nth guess, by this formula here. So what I'd like to do is just draw the picture of one step a little bit more closely. So I want to blow up the picture, which is above me there. That's a little too high. Where are my erasers? Got to get it a little lower than that, since I'm going to depict everything above the line here. So here's my curve coming down. And suppose that x_1 is here, so this is directly above it is this point here. And then as I drew it, this green tangent coming down like that. It's a little bit closer, and this was the place, x_2, and then here was x, our target, which is where the curve crosses as opposed to the straight tangent line crossing.
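The formula on the board is not captured in the transcript; it is presumably the standard Newton update, x_(n+1) = x_n - f(x_n) / f'(x_n). Here is a minimal sketch of the iteration, assuming last time's example f(x) = x^2 - 5. The helper name `newton` and the starting guess x_0 = 2 are illustrative choices, not taken from the lecture.

```python
def newton(f, fprime, x0, steps):
    """Return the list [x_0, x_1, ..., x_steps] of Newton iterates."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        # Follow the tangent line at (x, f(x)) down to the axis.
        xs.append(x - f(x) / fprime(x))
    return xs


f = lambda x: x * x - 5      # curve whose crossing point we want
fprime = lambda x: 2 * x     # its derivative

guesses = newton(f, fprime, 2.0, 5)   # x_0 = 2 is an assumed starting guess
for n, x in enumerate(guesses):
    print(n, x)
```

Each pass through the loop is one tangent-line step from the picture: x_1 from x_0, x_2 from x_1, and so on.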
So that's the picture that I want you to keep in mind. And now, we're just going to do a very qualitative kind of error analysis. So here's our error analysis. And we're starting out, the distance between x_1 and x is what we want to measure. In other words, how close we are to where we're heading. And so I've called that, I'm going to call that Error 1. That's x - x1, in absolute value. And then, E_2 would be x - x_2, in absolute value. And so forth. And, last time, when I was estimating the size of this-- So E_n would be whatever it was. Last time, remember, we did it for a specific case. So last time, I actually wrote down the numbers. And they were these numbers, maybe you could call them E_n, which was the absolute value of square root of 5 minus x_n. These are the sizes that I was writing down last time. And I just want to talk about in general what to expect. That worked amazingly well, and I want to show you that that's true pretty much in general.
So the first distance, again, is E_1, is this distance here. That's the E_1. And then this shorter distance, here, this little bit, which I'll mark maybe in green, is E_2. So how much shorter is E_2 than E_1? Well, the idea is pretty simple. It's that this distance and this vertical distance are probably about the same as the perpendicular distance. And this is basically the situation of a curve touching a tangent line. Then the separation is going to be quadratic. And that's basically what's going to happen. So, in other words the distance E_2 is going to be about the square of the distance E_1. And that's really the only feature of this that I want to point out. So, approximately, this is the situation that we're going to get. And so what that means is, and maybe thinking from last time, what we had was something roughly like this. You have an E_0, you have an E_1, you have an E_2, you have an E_3, and so forth. Maybe I'll write down E_4 here. And last time, this was about 10^(-1). So the expectation based on this rule is that the next error's the square of the previous one. So that's 10^(-2). The next one is the square of the previous one. So that's 10^(-4). And the next one is the square of that, that's 10^(-8). And this one is 10^(-16).
So the thing that's impressive about this list of numbers is you can see that the number of digits of accuracy doubles at each stage. The number of digits of accuracy doubles at each step. So, very, very quickly you get past the accuracy of your calculator, as you saw on your problem set. And this thing works beautifully. So, let me just summarize by saying that Newton's method works very well. By which I mean this kind of rate. And I want to be just slightly specific. There are really two conditions disguised in this, that are going on. One is that f' has to be not small. And f'' has to be not too big. That's roughly speaking what's going on. I'll explain these in just a second. And x_0 starts nearby. Nearby the target value x.
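To see the squaring rule in numbers, here is a small sketch for the square root of 5 example. The starting guess x_0 = 2 is an assumption, so the errors will not match the board's powers of 10 exactly, but the ratio E_(n+1) / E_n^2 should stay roughly constant, which is what "proportional to the square" means.

```python
import math

target = math.sqrt(5)
x = 2.0                       # assumed starting guess x_0
errors = [abs(target - x)]    # E_0

# Three Newton steps for f(x) = x^2 - 5; a fourth would hit
# the limit of double precision, so we stop here.
for _ in range(3):
    x = x - (x * x - 5) / (2 * x)
    errors.append(abs(target - x))

# If the error is squared at each step, these ratios stay of order 1.
ratios = [errors[n + 1] / errors[n] ** 2 for n in range(3)]

for n, e in enumerate(errors):
    print("E_%d = %.3e" % (n, e))
print("E_(n+1) / E_n^2:", ratios)
```

For this particular function the ratio works out to 1 / (2 x_n), so it settles near 0.22 as x_n approaches the root.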
So that's really what's going on here. So let me just illustrate to you. So I'm not going to explain this, except to say the reason why this second derivative gets involved is that it measures how curved the curve is, and that determines how far away you get. If the second derivative were 0, that would be the best possible case. Then we would get it on the nose. If the second derivative is not too big, that means the quadratic part is not too big. So we don't get away very far from the green line to the curve itself. The other thing to say is, as I said, that x_0 needs to start nearby. So I'll explain that by explaining what maybe could go wrong. So the ways the method can fail, and one example which actually would have happened in our example from last time, which was y = x^2 - 5, is suppose we'd started x_0 over here. Then this thing would have gone off to the left, and we would have landed on not the square root of 5 but the other root. So if it's too far away, then we get the wrong root.
So that's an example, explaining that the x_0 needs to start near the root that we're talking about. Otherwise, the method doesn't know which root you're asking for. It only knows where you started. So it may go off to the wrong place. OK, it can't read your mind. Yes, question.
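A quick sketch of the wrong-root failure for y = x^2 - 5. The starting point x_0 = -1 is an assumed stand-in for the "over here" in the picture; any negative start behaves the same way.

```python
import math


def newton_root(f, fprime, x0, steps=20):
    """Run a fixed number of Newton steps and return the last iterate."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / fprime(x)
    return x


f = lambda x: x * x - 5
fprime = lambda x: 2 * x

from_right = newton_root(f, fprime, 2.0)    # starts near +sqrt(5)
from_left = newton_root(f, fprime, -1.0)    # starts on the other side of 0

print(from_right, from_left, math.sqrt(5))
```

The method has no idea which root was wanted: the negative start marches to minus the square root of 5.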
PROFESSOR: Oh, good question. So the question was, what if the first error is larger than 1. Are you in trouble? And the answer is, absolutely, yes. If you have quadratic behavior, you can see. If you have a quadratic nearby, it's pretty close to the straight line. But far away, a parabola is miles from a straight line. It's way, way, way far away. So if you're foolish enough to start over here, you may have some trouble making progress. Actually, it isn't, when I-- that little wiggle there just meant proportional to. In fact, in the particular case of a parabola, it manages to get back. It saves itself. But there's no guarantee of that sort of thing. You really do want to start reasonably close. Yep.
PROFESSOR: What you have to do is you have to watch out. That is, it's hard to know what assumptions to make about x_0. You plug it into the machine and you see what you get. And either it works or it doesn't. You can tell that it's marching toward a specific place, and you can tell that that place probably is a zero, usually. But maybe it's not the one you were looking for. So in other words, you have to use your head. You run the program and then you see what it does. And if you're lucky-- the problem is, if you have no idea where the zero is, you may just wander around forever. As we'll see in a second.
So the next example here is the following. I said that f' has to be not too small. There's a real catastrophe hiding just inside this picture. Which is the transition between when you find the positive root and when you find the negative root here. Which is, if you're right down here. If you were foolish enough to start at 0, then what's going to happen is your tangent line is horizontal. It doesn't even meet the axis. So in the formula, you can see that's a catastrophe. Because there's an f' in the denominator. So that's 0. That's undefined. It's not surprising, it's consistent that the parallel line doesn't meet the axis. And you have no x_1. So if you like, another point here is that f' = 0 is a disaster. A disaster for the method. So say, if f'(x_0) = 0, then x_1 is undefined.
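The horizontal-tangent disaster shows up directly in code as a division by zero. This is a minimal sketch for y = x^2 - 5 with x_0 = 0; the helper name `newton_step` is illustrative.

```python
def newton_step(f, fprime, x):
    """One Newton step; fails when the tangent line is horizontal."""
    return x - f(x) / fprime(x)


f = lambda x: x * x - 5
fprime = lambda x: 2 * x

try:
    x1 = newton_step(f, fprime, 0.0)   # f'(0) = 0, so there is no x_1
    outcome = "defined"
except ZeroDivisionError:
    outcome = "undefined"

print("x_1 is", outcome)
```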
And finally, there's one other weird thing that can happen. Which is, which I'll just draw a picture of schematically. Which you can get from a wiggle. So this wiggle has three roots. The way I've drawn it. And it can be that you can start over here with your x_0. And draw your tangent line and go over here to an x_1. And then that tangent line will take you right back to the x_0. I didn't draw it quite right, but that's about right. So it goes over like this.
So let me draw the two tangent lines, so that you can see it properly. Sorry, I messed it up. So here are the two tangent lines. This guy and this guy. And it just goes back and forth. x_0 cycles to x_1, and x_1 goes back to x_0. We have a cycle. And it never goes anywhere. This is, the grass is always greener. It's over here, it thinks, oh, I really would prefer to go to this zero and then it thinks oh, I want to go back. And it goes back and forth, and back and forth.
Grass is always greener on the other side of the fence. Never, never gets anywhere. So those are the sorts of things that can go wrong with Newton's method. Nevertheless, it's fantastic. It works very well, in a lot of situations. And solves basically any equation that you can imagine, numerically.
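Here is one concrete wiggle that cycles. The function f(x) = x^3 - 5x is an assumed example, not the curve drawn in lecture; it has three roots (0 and plus or minus the square root of 5), and starting at x_0 = 1 the iterates bounce between 1 and -1 forever.

```python
def newton(f, fprime, x0, steps):
    """Return the list of Newton iterates x_0, ..., x_steps."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - f(x) / fprime(x))
    return xs


f = lambda x: x ** 3 - 5 * x       # assumed example with three roots
fprime = lambda x: 3 * x ** 2 - 5

cycle = newton(f, fprime, 1.0, 4)
print(cycle)
```

At x = 1 the tangent line hits the axis at -1, and at x = -1 the tangent line comes right back to 1, so no amount of iterating gets closer to any root.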
Next we're going to move on. We're going to move on to something which is a little theoretical. Which is the mean value theorem. And that will allow us in just a day or so to launch into the ideas of integration, which is the whole second half of the course. So let's get started with that. The mean value theorem will henceforth be abbreviated MVT.
So I don't have to write quite as much every time I refer to it. The mean value theorem, colloquially, says the following. If you go from Boston to LA, which I think a lot of Red Sox fans are going to want to do soon, so that's 3,000 miles. In 6 hours, then at some time you are going at a certain speed. The average of this speed. Average, so, speed, which in this case is what? So we're going at the average speed. That's 3,000 miles divided by 6 hours, so that's 500 miles per hour. Exactly.
So some time on your journey-- of course, some of the time you're going more than 500 miles an hour, sometimes you are going less. And some time you must've been going 500 miles an hour exactly. That's the mean value theorem. The reason why it's called mean value theorem is that word mean is the same as the word average.
So now I'm going to state it in math symbols, the same theorem. And it's a formula: (f(b) - f(a)) / (b - a) = f'(c). It says that the difference quotient - so this is the distance traveled divided by the time elapsed, that's the average speed - is equal to the infinitesimal speed for some time in between. So some c, which is in between a and b, a < c < b-- I'm not quite done. It's a real theorem, it has hypotheses. I've told you the conclusion first, but there are some hypotheses, they're straightforward hypotheses. Provided f is differentiable, that is, it has a derivative in the interval a < x < b. And continuous in a <= x <= b.
There has to be a sense that you can make out of the speed, or the rate of change of f at each intermediate point. And in order for the values at the ends to make sense, it has to be continuous. There has to be a link between the values at the ends and what's going on in between. If it were discontinuous, there would be no relation between the left and right values and the rest of the function. So here's the theorem, conclusion and its hypothesis. And it means what I said more colloquially up above. Now, I'm going to prove this theorem immediately. At least, give a geometric intuitive argument, which is not very different from the one that's given in a very systematic treatment. So here's the proof of the mean value theorem. It's really just a picture. So here's a place, and here's another place on the graph. And the graph is going along like this, let's say. And this line here is the secant line. So this is (a, f(a)) down here. And this is (b, f(b)) up there. And this segment is the secant, its slope is the slope that we're aiming for. The slope of that line is the left-hand side of this formula here. So we need to find something with that slope.
And what we need to find is a tangent line with that slope, because what's on the right-hand side is the slope of a tangent line. So here's how we construct it. We take a parallel line, down here. And then we just translate it up, leaving it parallel, we move it up. Towards this one. Until it touches. And where it touches, at this point of tangency, down there, I've just found my value of c. And you can see that if the tangent line is parallel to this line, that's exactly the equation we want. So this thing has slope f'(c). And this other one has slope equal to this complicated expression, (f(b) - f(a)) / (b - a).
That is almost the end of the proof. There's one problem. So, again, we move a parallel line up. Move up the parallel line until it touches. There's a little subtlety here, which I just want to emphasize. Which is that that dotted line keeps on going here. But when we bring it up, we're going to ignore what's happening outside of a. And beyond b, alright? So we're just going to ignore the rest of the graph. But there is one thing that can go wrong. So if it does not touch, then the picture looks like this. Here are the same two points. And the graph is all above. And we brought up our thing. And it went like that. So we didn't construct a tangent line. If this happens. So we're in trouble, in that point. In this situation, sorry. But there's a trick, which is a straightforward trick. Then bring the tangent lines down from the top. So parallel lines, sorry, not tangent lines. Parallel lines. From above. So, that's the whole story. That's how we cook up this point c, with the right properties.
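The sliding-parallel-line picture can be imitated numerically: the line first touches the graph where the gap between the graph and the secant line is largest. The sketch below assumes the example f(x) = x^3 on the interval from 0 to 2 and a simple grid search; both are illustrative choices, not part of the lecture.

```python
def f(x):
    return x ** 3


def fprime(x):
    return 3 * x ** 2


a, b = 0.0, 2.0
secant_slope = (f(b) - f(a)) / (b - a)


def gap(x):
    """Vertical distance from the secant line to the graph at x."""
    return f(x) - (f(a) + secant_slope * (x - a))


# Search strictly between a and b, ignoring the graph outside the interval.
n = 200000
grid = [a + (b - a) * k / n for k in range(1, n)]
c = max(grid, key=lambda x: abs(gap(x)))

print("c =", c, " f'(c) =", fprime(c), " secant slope =", secant_slope)
```

Taking the absolute value of the gap covers both cases in the proof: the line brought up from below and the line brought down from above. For this example the exact answer is c = 2 / sqrt(3), about 1.1547.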
I want to point out just one more theoretical thing. And then the rest, we're going to be drawing conclusions. So there's one more theoretical remark about the proof, which is something that is fairly important to understand. When you understand a proof, you should always be thinking about why the hypotheses are necessary. Where do I use the hypothesis. And I want to give you an example where the proof doesn't work to show you that the hypothesis is an important one. So the example is the following. I'll just take a function which is two straight lines like this. And if you try to perform this trick with these things, then it's going to come up and it's going to touch here. But the problem is that the tangent line is not defined here. There are lots of tangents, and there's no derivative at this point. So the derivative doesn't exist here. So this is the claim that one bad point ruins the proof. We need f' to exist at all-- so, f'(x) to exist at all x in between. Can't get away even with one defective point.
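A concrete version of the two-straight-lines example, assuming f(x) = |x| on the interval from -1 to 2 (the specific function and interval are illustrative). The secant slope is 1/3, but the only slopes the graph ever has are -1 and 1, so no c works.

```python
a, b = -1.0, 2.0
f = abs

secant_slope = (f(b) - f(a)) / (b - a)   # (2 - 1) / 3


def fprime(x):
    """Derivative of |x|; it does not exist at the kink x = 0."""
    if x == 0:
        raise ValueError("no derivative at the kink")
    return 1.0 if x > 0 else -1.0


# Collect every slope the graph takes at sample points away from the kink.
samples = [a + (b - a) * k / 1000 for k in range(1, 1000)]
slopes = {fprime(x) for x in samples if x != 0}

print("secant slope:", secant_slope, " tangent slopes:", sorted(slopes))
```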
Now it's time to draw some consequences. And the main consequence is going to have to do with applications to graphing. But we'll see tomorrow and for the rest of the course that this is even more significant. It's significant to all the rest of calculus. I'm going to list three consequences which you're quite familiar with already. So, the first one is if f' is positive, then f is increasing. And the second one is if f' is negative, then f is decreasing. And the last one seems like the simplest. But even this one alone is the key to everything. If f' = 0, then f is constant. These are three consequences, now, of the mean value theorem. And let me show you how they're proved.
I just told you that they were true, maybe, a while ago. And certainly I mentioned the first two. The last one was so simple that we maybe just swept it under the rug. You did use it on a problem set, once or twice. But it turns out that this actually requires proof, and we're going to give the proof right now. The way that the proof goes is simply to write down, to rewrite star. Rewrite our formula. Which says that (f(b) - f(a)) / (b - a) = f'(c). And you see I've written it from left to right here to say that the right-hand side information about the derivative is going to be giving the information about the function. That's the way I'm going to read it. In order to express this, though, I'm going to just rewrite it a couple of times here. So here's f(b) - f(a) = f'(c) (b - a), multiplying through by the denominator. And now I'm going to write it in another customary form for the mean value theorem. Which is f(b) = f(a) + f'(c) (b-a). So here's another version. I should probably have put this one in the box to begin with anyway. And, just changing it around algebraically, it's this fact here. They're the same thing. And now with the formula written in this form, I claim that I can check these three facts.
Let's start with the first one. I'm going to set things up always so that a < b. And that's the setup of the theorem. And so that means that b - a is positive. Which means that this factor over here is a positive number. If f' is positive, which is what happens in the first case, that's the assumption that we're making, then this is a positive number. And so f(b) > f(a). Which means that it's increasing. It goes up as the value goes up. Similarly, if f'(c) is negative, then this is a positive times a negative number, this is negative. f(b) < f(a). So it goes the other way. Maybe I'll write this way. And finally, if f'(c) = 0, then f(b) = f(a). Which if you apply it to all possible ends, means if you can do it for every interval, which you can, then that means that f is constant. It never gets to change values. Well you might have believed these facts already. But I just want to emphasize to you that this turns out to be the one key link between infinitesimals, between limits and these actual differences. Before, we were saying that the difference quotient was approximately equal to the derivative. Now we're saying that it's exactly equal to a derivative. Although we don't know exactly which point to use. It's some point in between. I'm going to be deducing some other consequences in a second, but let me stop for a second to make sure that everybody's on board. Especially since I've finished the blackboards here. Before we-- everybody happy?
PROFESSOR: I'm just going to repeat your question first. I'm a little bit confused, you said, about what guarantees that there's a point of tangency. That's what you said. So do you want to elaborate, or do you want to stop with what you just said? What is it that confuses you?
PROFESSOR: So I'm not claiming that there's only one point. This could wiggle a lot of times and it maybe touches at ten places. In other words, it's OK with me if it touches more than once. Then I just have more, the more the merrier. In other words, I don't want there necessarily only to be one. It could come down like this. And touch a second time. Is that what was concerning you? So in mathematics, when we claim that this is true for some point, we don't necessarily mean that it doesn't work for others. In fact, if the function is constant, this is 0 and in fact this equation is true for every c. That satisfies your question? The fact that this point exists actually is a touchy point. I just convinced you of it visually. It's a geometric issue, whether you're allowed to do this. Indeed, it has to do with the existence of tangent lines and more analysis than we can do in this class. Yeah. Another question.
PROFESSOR: Pardon me.
PROFESSOR: The question is, what's the difference between this and the linear approximation. And I think, let me see if I can describe that. I'll leave the theorem on the board. I'm going to get rid of the colloquial version of the theorem. And I'll try to describe to you the difference between this and the linear approximation. I was planning to do that in a while, but we'll do it right now since that's what you're asking. That's fine. So here's the situation. The linear approximation, so let's say comparison with linear approximation. They're very closely related. The linear approximation says the change in f over the change in x, that's the left-hand side of this thing, is approximately f'(a). For b near a, and b - a = delta x. This statement, which is in the box, which is sitting right up there, is the statement that this change in f is actually equal to something. Not approximately equal to it. It's equal to f' of some c. And the problem here is that we don't know exactly which c. This is for some c. Between a and b. Right, so. That's the difference between the two. And let me elaborate a little bit.
If you're trying to understand what (f(b) - f(a)) / (b - a) is, the mean value theorem is telling you for sure that it's equal to this f'(c). So that means it's less than or equal to the largest possible value on the-- largest value you can get, for sure. And this is on the whole interval. And I'm going to include the ends, because when you take a max it's sometimes achieved at the ends. And similarly, because it's f'(c), it's definitely bigger than or equal to the min on this same interval here. This is all you can say based on the mean value theorem. All you know is this. And colloquially, what that means is that the average speed is between the maximum and the minimum. Not very surprising. The mean value theorem is supposed to be very intuitively obvious. It's saying the average speed is trapped between the maximum speed and the minimum speed. For sure, that's something, that's why-- incidentally this wasn't really proved when Newton and Leibniz were around. But, let's write this so that you can read it. Average speed is between the max and the min. But nobody had any trouble, they didn't disbelieve it because it's a very natural thing.
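A numerical check of the trapping statement, min f' <= (f(b) - f(a)) / (b - a) <= max f', on an assumed example: f(x) = e^x on the interval from 0 to 1, whose derivative is again e^x. The helper name `derivative_bounds` and the sampling of f' on a grid (ends included) are illustrative.

```python
import math


def derivative_bounds(fprime, a, b, samples=1000):
    """Smallest and largest sampled values of f' on [a, b], ends included."""
    values = [fprime(a + (b - a) * k / samples) for k in range(samples + 1)]
    return min(values), max(values)


a, b = 0.0, 1.0
average_rate = (math.exp(b) - math.exp(a)) / (b - a)   # the "average speed"
lowest, highest = derivative_bounds(math.exp, a, b)    # min and max "speed"

print(lowest, "<=", average_rate, "<=", highest)
```

Here the minimum and maximum are both achieved at the ends of the interval, which is why the ends have to be included.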
Now if, for example, I take any kind of linear approximation, say, for instance, e^x is approximately 1 + x. Then I'm making the guess-- no, I don't want to say this yet. That's not going to explain it to you well enough. What we're saying, so this is the mean value here. This is what the mean value theorem says. And here's the linear approximation. The linear approximation is saying that the average speed is approximately the initial speed, or possibly the final speed. So if a is the left endpoint, then it's the initial speed. If it happens to be the right endpoint, if the value of x is to the left then it's the final speed. So those are the-- so you can see it's approximately right. Because the speed, when you're on a short interval, shouldn't be varying very much. The max and the min should be pretty close together. So that's why the linear approximation is reasonable. And this is telling you absolutely, it's no less than the min and no more than the max. Yeah.
PROFESSOR: The little kink?
PROFESSOR: If you approach from the top. So if it's still under here I can show you it again. Oh yeah, it's still there. Good.
PROFESSOR: Oh, the one with the wiggle on top? Yeah, this one you can't. Because there's nothing to touch and it also fails from the bottom because there's this bad point. From the top, it could work. It can certainly work both ways. So, for example. See if you're a machine, you maybe don't have a way of doing this. But if you're a human being you can spot all the places. There are a bunch of spots where the slope is right. And it's perfectly OK. All of them work.
PROFESSOR: It's not that the c is the same. It's just we've now found one, two, three, four, five c's for which it works.
STUDENT: [INAUDIBLE] PROFESSOR: If you're asked to find a c, so first of all that's kind of a phony question. There are some questions on your problem set which ask you to find a c. That actually is struggling to get you to understand what the statement of the mean value theorem is, but you should not pay a lot of attention to those questions. They're not very impressive. But, of course, you would have to find all the-- if it asked you to find one, you find one. If you can find some more, fine. You can pick whichever one you want. Mean value theorem just doesn't care. The mean value theorem doesn't care because actually, the mean value theorem is never used except to-- in real life, except in this context here. You can never nail down which c it is, so the only thing you can say is that you're going slower than the maximum speed and faster than the minimum speed. Sorry, say that again?
PROFESSOR: If you're asked for a specific c, you have to find a specific c. And it has to be in the range. In between, it has to be in here.
So now I want to tell you about another kind of application, which is really just a consequence of what I've described here. I should emphasize, by the way, this, probably, should be doing this. I guess we've never used this color here. It's popular. This is pink. So this one is so good. So since we're going to do this. So the reason why the exclamation points are temporary, this is such an obvious fact. But this is the way that you're going to want to use the mean value theorem, and this is the only way you need to understand the mean value theorem. On your test, or ever in your whole life. So this is the way it will be used. As I will make very clear when we review for the exam.
In practice what happens is you even forget about the mean value theorem, and what you remember is these three properties here. Which are themselves consequences of the mean value theorem. So these are the ones that I want to illustrate now. In my next discussion here. I'm just going to talk about inequalities. Inequalities are relationships between functions. And I'm going to prove a couple of them using the properties over there, the properties that functions with positive derivatives are increasing. Here's an example. e^x > 1 + x, where x > 0. The proof is the following. I consider-- So here's a proof. I consider the function f(x), which is the difference. e^x - (1+x). I observe that it starts at f(0) equal to, well, that's e^0 - (1+0), which is 0. And, it keeps on going. f'(x) is e^x, if I differentiate here, the 1 goes away. I get minus 1. That's the derivative of the function. And this function, because e^x > 1 for x positive, is positive. As x gets bigger and bigger, this rate of increase is positive. And therefore, three dots, that's therefore, f(x) is bigger than its starting place, for x > 0. If it's increasing, then that's-- in particular, it's increasing starting from 0. So this is true. Now, all I have to do is read what this inequality says. And what it says is that e^x, just plug in for f(x), which is right here, minus (1+x) is greater than the starting value, which was 0. Now, I put the thing that's negative on the other side. So that's the same thing as e^x > 1 + x. That's a typical inequality. And now, we'll use this principle again. Oh gee, I erased the wrong thing. I erased the statement and not the proof. Well, hide the proof.
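The three steps of that proof can be spot-checked numerically: f(0) = 0, f'(x) = e^x - 1 is positive for x > 0, and so f(x) = e^x - (1 + x) stays above 0. The sample points between 0 and 5 below are an arbitrary choice, and a check at sample points is an illustration, not a proof.

```python
import math


def f(x):
    return math.exp(x) - (1 + x)


def fprime(x):
    return math.exp(x) - 1


xs = [0.01 * k for k in range(1, 501)]   # sample points with 0 < x <= 5

starts_at_zero = f(0.0) == 0.0
rate_positive = all(fprime(x) > 0 for x in xs)
increasing = all(f(xs[i]) < f(xs[i + 1]) for i in range(len(xs) - 1))
inequality_holds = all(math.exp(x) > 1 + x for x in xs)

print(starts_at_zero, rate_positive, increasing, inequality_holds)
```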
The next thing I want to prove to you is that e^x > 1 + x + x^2 / 2. So, how do I do that? I introduce a function g(x), which is e^x minus this. And now, I'm just going to do exactly the same thing I did before. Which is, I get started with g(0), which is 1 - 1. Which is 0. And g'(x) is e^x minus - now, look at what happens when I differentiate this. The 1 goes away. The x gives me a 1, and the x^2 / 2 gives me a plus x. And this one is positive for x > 0, because of step 1. Because of the previous one that I did. So this one is increasing. g is increasing. Which says that g(x) > g(0). And if you just read that off, it's exactly the same as our inequality here. e^x > 1 + x + x^2 / 2. Now, you can keep on going with this essentially forever. And let me just write down what you get. You get e^x is greater than 1 plus x plus x^2 / 2, the next one turns out to be x^3 / (3*2). x^4 / (4*3*2). And you can do whatever you want. You can do others. And this is like the tortoise and the hare. This is the tortoise, and this is the hare, it's always ahead. But eventually, if you go infinitely far, it catches up. So this turns out to be exactly equal to e^x in the limit. And we'll talk about that maybe at the end of the course.
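The tortoise-and-hare picture can be watched in numbers: each longer sum is bigger than the last, all of them stay below e^x, and the gap shrinks. This sketch assumes x = 1 and the first ten sums; the helper name `partial_sum` is illustrative.

```python
import math


def partial_sum(x, n):
    """1 + x + x^2/2 + ... + x^n/n!, the sum through the x^n term."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))


x = 1.0
sums = [partial_sum(x, n) for n in range(1, 11)]

for n, s in enumerate(sums, start=1):
    print(n, s, math.exp(x) - s)   # the gap to e^x keeps shrinking
```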