
Download the video from iTunes U or the Internet Archive.

**Topics covered:** Floating point numbers, successive refinement, finding roots

**Instructor:** Prof. Eric Grimson, Prof. John Guttag

5: Floating Point Numbers, ...

ANNOUNCER: Open content is provided under a creative commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR JOHN GUTTAG: Good morning.

We should start with the confession, for those of you looking at this on OpenCourseWare, that I'm currently lecturing to an empty auditorium. This is the fifth lecture for 6.00 this term; we ran into some technical difficulties, which left us with a recording we weren't very satisfied with. So this is a redo, and you will hear no questions from the audience, and that's because there is no audience.

Nevertheless I will do my best to pretend. I've been told this is a little bit like giving a speech before the US Congress when C-SPAN is the only thing watching.

OK. Computers are supposed to be good for crunching numbers. And we've looked a little bit at numbers this term, but I now want to get into looking at them in more depth than we've been doing.

Python has two different kinds of numbers. So far, the only kind we've really paid any attention to is type int. And those were intended to mirror the integers, as we all learned about starting in elementary school. And they're good for things that you can count. Any place you'd use whole numbers.

Interestingly, Python, unlike some languages, has what are called arbitrary precision integers. By that, we mean, you can make numbers as big as you want them to.

Let's look at an example. We'll just take a for a variable name, and we'll set a to be two raised to the one-thousandth power. That, by the way, is a really big number. And now what happens if we try and display it? We get a lot of digits. You can see why I'm doing this on the screen instead of writing it on the blackboard.

I'm not going to ask you whether you believe this is the right answer, trust me, trust Python. I would like you to notice, at the very end of this is the letter L.

What does that mean? It means long. That's telling us that it's representing this particular integer in what it calls its internal long format.

You needn't worry about that. The only thing to say about it is, when you're dealing with long integers, it's a lot less efficient than when you're dealing with smaller numbers. And that's all it's warning you about by printing this L.

About two billion is the magic number. When you get over two billion, it's now going to deal with long integers, so if, for example, you're trying to deal with the US budget deficit, you will need long integers.

OK. Let's look at another interesting example. Suppose I set b equal to two raised to the nine hundred ninety-ninth power. I can display b, and it's a different number, considerably smaller, but again, ending in an L.

And now, what do you think we'll get if we try a divided by b? And remember, we're now doing integer division. Well, let's see.

We get 2L. Well, you'd expect it to be two, because if you think about the meaning of exponentiation, indeed, the difference between raising two to the nine hundred ninety-ninth power and to the one-thousandth power should be a factor of two, since two is the base we're raising to a power.

Why does it say 2L, right? Two is considerably less than two billion, and that's because once you get L, you stay L. Not particularly important, but kind of worth knowing.
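The lecture shows a Python 2 session, where the trailing L marked the internal long format; in Python 3 the two integer types have been unified and the L suffix is gone, but the arbitrary-precision behavior being described is unchanged. A quick check (Python 3, where // is integer division):

```python
a = 2 ** 1000          # a 302-digit integer, held exactly
b = 2 ** 999

print(len(str(a)))     # -> 302: every digit is stored, no overflow
print(a // b)          # -> 2, exactly, since 2**1000 // 2**999 == 2
```

The point survives the syntax change: integers grow as large as memory allows, and arithmetic on them stays exact.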

Well, why am I bothering you with this whole issue of how numbers are represented in the computer? In an ideal world, you would ignore this completely, and just say, numbers do what numbers are supposed to do.

But as we're about to see, sometimes in Python, and in fact in every programming language, things behave contrary to what your intuition suggests. And I want to spend a little time helping you understand why this happens. So let's look at a different kind of number.

And now we're going to look at what Python, and almost every other programming language, calls type float. Which is short for floating point. And that's the way that programming languages typically represent what we think of as real numbers.

So, let's look at an example. I'm going to set the variable x to be 0.1, 1/10, and now we're going to display x.

Huh? Take a look at this. Why isn't it .1? Why is it 0.1, a whole bunch of zeros, and then this mysterious one appearing at the end?

Is it because Python just wants to be obnoxious and is making life hard? No, it has to do with the way the numbers are represented inside the computer. Python, like almost every modern programming language, represents numbers using the IEEE floating point standard, and it's IEEE 754.

Never again will you have to remember that it's 754. I promise not to ask you that question on a quiz.

But that's what they do. This is a variant of scientific notation. Something you probably learned about in high school, as a way to represent very large numbers.

Typically, the way we do that, is we represent the numbers in the form of a mantissa and an exponent. So we represent a floating point number as a pair, of a mantissa and an exponent.

And because computers work in the binary system, it's unlike what you probably learned in high school, where we raise ten to some power. Here we'll always be raising two to some power. Maybe a little later in the term, if we talk about computer architecture, we'll get around to explaining why computers working binary, but for now, just assume that they do and in fact they always have.

All right. Purists prefer to refer to the mantissa as a significand, but I won't do that, because I'm an old guy and it was a mantissa when I first learned about it and I just can't break myself of the habit.

All right. So how does this work? Well, when we recognize so-- when we represent something, the mantissa is between one and two. Whoops. Strictly less than two, greater than or equal to one.

The exponent is in the range -1022 to +1023. So this lets us represent numbers up to about plus or minus 10 to the 308th. So, quite a large range of numbers.

Where did these magic things come from? You know, what-- kind of a strange numbers to see here. Well, it has to do with the fact that computers typically have words in them, and the words today in a modern computer are 64 bits. For many years they were 32 bits, before that they were 16 bits, before that they were 8 bits, they've continually grown, but we've been at 64 for a while and I think we'll be stuck at 64 for a while.

So as we do this, what we do is, we get one bit for the sign-- is it a positive or negative number?-- 11 for the exponent, and that leaves 52 for the mantissa. And that basically tells us how we're storing numbers.
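The 1 + 11 + 52 split can be seen directly by unpacking a float's 64 bits with the standard library's struct module. This is a sketch for normal (non-zero, finite) floats only, and the function name is mine, not something from the lecture:

```python
import struct

def unpack_double(f):
    """Split a normal, non-zero float into IEEE 754 sign, exponent, mantissa."""
    bits = int.from_bytes(struct.pack('>d', f), 'big')  # the raw 64 bits
    sign = bits >> 63                                   # 1 bit
    exponent = ((bits >> 52) & 0x7FF) - 1023            # 11 bits, biased by 1023
    mantissa = 1 + (bits & (2 ** 52 - 1)) / 2 ** 52     # 52 stored bits + implicit leading 1
    return sign, exponent, mantissa

s, e, m = unpack_double(0.1)
print(e)                 # -> -4, since 0.1 is stored as m * 2**-4
print(m * 2 ** e == 0.1)  # -> True: sign/exponent/mantissa reconstruct the stored value
```

Note the mantissa comes out in exactly the range the lecture states: greater than or equal to one, strictly less than two.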

Hi, are you here for the 600 lecture? There is none today, because we have a quiz this evening. It's now the time that the lecture would normally have started, and a couple of students who forgot that we have a quiz this evening, instead of a lecture, just strolled in, and now strolled out.

OK. You may never need to know these constants again, but it's worth knowing that they exist, and that basically, this gives us about the equivalent of seventeen decimal digits of precision. So we can represent numbers up to seventeen decimal digits long. This is an important concept to understand, that unlike the long ints where they can grow arbitrarily big, when we're dealing with floating points, if we need something more than seventeen decimal digits, in Python at least, we won't be able to get it. And that's true in many languages.

Now the good news is, this is an enormous number, and it's highly unlikely that ever in your life, you will need more precision than that.

All right. Now, let's go back to the 0.1 mystery that we started with, and ask ourselves why we have a problem representing that number in the computer, hence why we get something funny out when we try to print it back.

Well, let's look at an easier problem first. Let's look at representing the fraction 1/8. That has a nice representation. That's equal in decimal to 0.125, and we can represent it conveniently in both base 10 and base 2.

So if you want to represent it in base 10, what is it? What is that equal to? Well, we'll take a mantissa, 1.25, and now we need to multiply it by something that we can represent nicely, and in fact that will be times 10 to the -1. So the exponent would simply be -1, and we have a nice representation.

Suppose we want to represent it in base 2? What would it be? 1.0 times-- anybody?-- Well, 2 to the -3. So, in binary notation, that would be written as 0.001.

So you see, 1/8 is kind of a nice number. We can represent it nicely in either base 10 or base 2.

But how about that pesky fraction 1/10? Well, in base 10, we know how to represent it: it's 1 times 10 to the-- 10 to the what?-- 10 to the -1.

But in base 2, it's a problem. There is no finite binary number that exactly represents this decimal fraction. In fact, if we try and find the binary number, what we find is, we get an infinitely repeating series: 0.000110011001100..., et cetera. Stop at any finite number of bits, and you get only an approximation to the decimal fraction 1/10.

So on most computers, if you were to print the decimal value of the binary approximation-- and that's what we're printing here, on this screen, right? We think in decimal, so Python quite nicely for us is printing things in decimal-- it would have to display-- well, I'm not going to write it, it's a very long number, lots of digits. However, in Python, whenever we display something, it uses the built-in function repr, short for representation, which converts the internal representation, in this case of a number, to a string, and then displays that string, in this case on the screen. For floats, it rounds to seventeen digits. There's that magic number seventeen again. Hence, when it rounds to seventeen digits, we get exactly what you see at the bottom of the screen up there.
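Both claims can be confirmed from the Python prompt: 1/8 round-trips exactly, while 0.1 is stored as a nearby binary fraction. One caveat for readers trying this today: since Python 3.1, repr shows the shortest string that round-trips, so repr(0.1) itself now prints '0.1'; the decimal module will still reveal the exact stored value:

```python
from decimal import Decimal

# 1/8 = 0.001 in binary -- exactly representable, so it round-trips cleanly
print(Decimal(0.125))       # -> 0.125

# 1/10 has no finite binary expansion; the stored value is only close
print(Decimal(0.1))         # the exact stored value: 0.1000000000000000055511...
print(format(0.1, '.17g'))  # the same value rounded to seventeen significant digits
```

Decimal(0.1) takes the float's bits as-is, which is why it exposes the approximation rather than printing 0.1.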

Answer to the mystery, why does it display this? Now why should we care? Well, it's not so much that we care about what gets displayed, but we have to think about the implications, at least sometimes we have to think about the implications, of what this inexact representation of numbers means when we start doing more-or-less complex computations on those numbers.

So let's look at a little example here. I'll start by setting the variable s to 0.0. Notice I'm being careful to make it a float. And then for i in range, let's see, let's take 10, we'll increase s by 0.1.

All right, we've done that, and now, what happens when I print s? Well, again you don't get what your intuition says you should get. Notice the last two digits, which are eight and nine. Well, what's happening here?

What's happened, is the error has accumulated. I had a small error when I started, but every time I added it, the error got bigger and it accumulates. Sometimes you can get in trouble in a computation because of that.
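The drift is easy to reproduce, and the standard library's math.fsum (not mentioned in the lecture, but worth knowing) is one way around it, since it compensates for the round-off as it accumulates:

```python
import math

s = 0.0
for i in range(10):
    s += 0.1          # each addition carries a tiny representation error

print(s == 1.0)       # -> False: the per-step error has accumulated
print(repr(s))        # -> '0.9999999999999999'
print(math.fsum([0.1] * 10) == 1.0)  # -> True: fsum tracks the lost low-order bits
```

Ten additions were enough to push the error into the visible digits; longer computations can drift much further.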

Now what happens, by the way, if I print s? That's kind of an interesting question.

Notice that it prints one. And why is that? It's because the print command has done a rounding here. It automatically rounds.

And that's kind of good, but it's also kind of bad, because that means when you're debugging your program, you can get very confused. You say, it says it's one, why am I getting a different answer when I do the computation? And that's because it's not really one inside. So you have to be careful.

Now mostly, these round-off errors balance each other out. Some floats are slightly higher than they're supposed to be, some are slightly lower, and in most computations it all comes out in the wash and you get the right answer. Truth be told, most of the time, you can avoid worrying about these things. But, as we say in Latin, caveat computor. Sometimes you have to worry a little bit.

Now there is one thing about floating points about which you should always worry. And that's really the point I want to drive home, and that's about the meaning of double equal.

Let's look at an example of this. So we've before seen the use of import, so I'm going to import math, it gives me some useful mathematical functions, then I'm going to set the variable a to the square root of two.

Whoops. Why didn't this work? Because what I should have said is math dot sqrt of two, explaining to the interpreter that I want to get the function sqrt from the module math. So now I've got a here, and I can look at what a is-- yeah, some approximation to the square root of two.

Now here's the interesting question. Suppose I ask about the Boolean a times a equals equals two. Now in my heart, I think, if I've taken the square root of a number and then multiplied it by itself, I should get the original number back. After all, that's the meaning of square root.

But by now, you won't be surprised if the answer of this is false, because we know what we've stored is only an approximation to the square root. And that's kind of interesting. So we can see that, by, if I look at a times a, I'll get two point a whole bunch of zeros and then a four at the end.

So this means, if I've got a test in my program, in some sense it will give me the unexpected answer false. What this tells us is that it's very risky to ever use the built-in double equals to compare floating points, and in fact, you should never be testing for equality, you should always be testing for close enough.

So typically, what you want to do in your program, is ask the following question: is the absolute value of a times a minus 2.0 less than epsilon? If we could easily type Greek, we'd have written it that way, but we can't.

So that's some small value chosen to be appropriate for the application. Saying, if these two things are within epsilon of each other, then I'm going to treat them as equal.

And so what I typically do when I'm writing Python code that's going to deal with floating point numbers, and I do this from time to time, is I introduce a function called almost equal, or near, or pick your favorite word, that does this for me. And wherever I would normally have written x double equals y, instead I write near of x, y, and it computes it for me.
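A minimal version of such a helper might look like this; the name near and the default epsilon are my choices, and since Python 3.5 the standard library also offers math.isclose for the same job:

```python
import math

def near(x, y, epsilon=1e-9):
    """True if x and y are within epsilon of each other."""
    return abs(x - y) < epsilon

a = math.sqrt(2)
print(a * a == 2.0)              # -> False, as in the lecture
print(near(a * a, 2.0))          # -> True: within epsilon is close enough
print(math.isclose(a * a, 2.0))  # -> True (uses a relative tolerance by default)
```

The right epsilon is application-dependent: a tolerance that is fine for lengths in meters may be absurd for values near 1e-12, which is why isclose defaults to a relative rather than absolute tolerance.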

Not a big deal, but keep this in mind, or as soon as you start dealing with numbers, you will get very frustrated in trying to understand what your program does.

OK. Enough of numbers for a while, I'm sure some of you will find this a relief. I now want to get away from details of floating point, and talk about general methods again, returning to the real theme of the course of solving problems using computers.

Last week, we looked at the rather silly problem of finding the square root of a perfect square. Well, that's not usually what you need. Let's think about the more useful problem of finding the square root of a real number.

Well, you've just seen how you do that. You import math and you call sqrt. Let's pretend that we didn't know that trick, or let's pretend it's your job to introduce-- implement, rather-- math. And so, you need to figure out how to implement square root.

Why might this be a challenge? What are some of the issues?

And there are several. One is, what we've just seen might not be an exact answer. For example, the square root of two. So we need to worry about that, and clearly the way we're going to solve that, as we'll see, is using a concept similar to epsilon. In fact, we'll even call it epsilon.

Another problem with the method we looked at last time is, there we were doing exhaustive enumeration. We were enumerating all the possible answers, checking each one, and if it was good, stopping.

Well, the problem with reals, as opposed to integers, is we can't enumerate all guesses. And that's because the reals are uncountable. If I ask you to enumerate the positive integers, you'll say one, two, three, four, five. If I ask you to enumerate the reals, the positive reals, where do you start? One over a billion, plus who knows?

Now as we've just seen, in fact, since there's a limit to the precision of floating point, technically you can enumerate all the floating point numbers. And I say technically, because if you tried to do that, your computation would not terminate any time soon.

So even though in principle you could enumerate them, in fact you really can't. And so we think of the floating points, like the reals, as being innumerable-- or not innumerable, that is to say, as being uncountable. So we can't do that.

So we have to find something clever, because we're now searching a very large space of possible answers-- what you might technically call a large state space. So we're going to take our previous method of guess and check, and replace it by something called guess, check, and improve.

Previously, we just generated guesses in some systematic way, but without knowing that we were getting closer to the answer. Think of the original barnyard problem with the chickens and the heads and the legs, we just enumerated possibilities, but we didn't know that one guess was better than the previous guess.

Now, we're going to find a way to do the enumeration where we have good reason to believe, at least with high probability, that each guess is better than the previous guess.

This is what's called successive approximation. And that's a very important concept. Many problems are solved computationally using successive approximation.

Every successive approximation method has the same rough structure. You start with some guess, which would be the initial guess, you then iterate-- and in a minute I'll tell you why I'm doing it this particular way, over some range. I've chosen one hundred, but it doesn't have to be one hundred, just some number there-- if f of x, that is to say some function of my--

Whoops, I shouldn't have said x. My notes say x, but it's the wrong thing-- if f of x, f of the guess, is close enough, so for example, if when I square guess, I get close enough to the number who's root I'm-- square root I'm looking for, then I'll return the guess.

If it's not close enough, I'll get a better guess. If I do my, in this case, one hundred iterations, and I've not get-- gotten a guess that's good enough, I'm going to quit with some error. Saying, wow. I thought my method was good enough that a hundred guesses should've gotten me there. If it didn't, I may be wrong.

I always like to have some limit, so that my program can't spin off into the ether, guessing forever.
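The rough structure just described-- guess, check, improve, with an iteration cap so the program can't spin off into the ether-- can be sketched generically. Everything here, names included, is my paraphrase of that description, not code from the lecture, and the improve step shown is just one choice among many:

```python
def successive_approx(improve, close_enough, guess, max_iters=100):
    """Generic successive approximation with an iteration limit."""
    for _ in range(max_iters):
        if close_enough(guess):      # is f(guess) good enough?
            return guess
        guess = improve(guess)       # if not, get a better guess
    raise ValueError('iteration count exceeded')

# Example: square root of 2, improving by averaging guess with 2/guess
root = successive_approx(
    improve=lambda g: (g + 2 / g) / 2,
    close_enough=lambda g: abs(g * g - 2) < 1e-6,
    guess=1.0)
print(abs(root * root - 2) < 1e-6)   # -> True
```

Plugging in different improve and close_enough functions gives different concrete methods; the bi-section method coming next is one particular way of filling in the improve step.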

OK. Let's look at an example of that. So here's a successive approximation to the square root. I've called it square root bi. The bi is not a reference to the sexual preferences of the function, but a reference to the fact that this is an example of what's called a bi-section method.

The basic idea behind any bi-section method is the same, and we'll see lots of examples this semester: you have some linearly-arranged space of possible answers. And it has the property that if I take a guess somewhere, let's say there, I guess that's the answer to the question, and if it turns out that's not the answer, I can easily determine whether the answer lies to the left or the right of the guess.

So if I guess that 89.12 is the square root of a number, and it turns out not to be the square root of the number, I have a way of saying, is 89.12 too big or too small. If it was too big, then I know I'd better guess some number over here. It was too small, then I'd better guess some number over here.

Why do I call it bi-section? Because I'm dividing it in half, and in general as we'll see, when I know what my space of answers is, I always, as my next guess, choose something half-way along that line. So I made a guess, and let's say was too small, and I know the answer is between here and here, this was too small, I now know that the answer is between here and here, so my next guess will be in the middle.

The beauty of always guessing in the middle is, at each guess, if it's wrong, I get to throw out half of the state space. So I know how long it's going to take me to search the possibilities, in some sense, because I'm progressing logarithmically.

This is exactly what we saw when we looked at recursion in some sense, where we solved the problem by, at each step, solving a smaller problem. The same problem, but on a smaller solution space.

Now as it happens, I'm not using recursion in this implementation we have up on the screen, I'm doing it iteratively but the idea is the same. So we'll take a quick look at it now, then we'll quit and we'll come back to in the next lecture a little more thoroughly.

I'm going to warn you right now, that there's a bug in this code, and in the next lecture, we'll see if we can discover what that is.

So, it takes two arguments; x, the number whose square root we're looking for, and epsilon, how close we need to get. It assumes that x is non-negative, and that epsilon is greater than zero.

Why do we need to assume that epsilon is greater than zero? Well, if you made epsilon zero, and then, say, we're looking for the square root of two, we know we'll never get an answer. So, we want it to be positive. And then it returns y such that y times y is within epsilon of x. It's near, to use the terminology we used before.

The next thing we see in the program, is two assert statements. This is because I never trust the people who call my functions to do the right thing. Even though I said I'm going to assume certain things about x and epsilon, I'm actually going to test it. And so, I'm going to assert that x is greater than or equal to zero, and that epsilon is greater than zero.

What assert does, is it tests the predicate, say x greater than or equal to zero, if it's true, it does nothing, just goes on to the next statement. But if it's false, it prints a message, the string, which is my second argument here, and then the program just stops.

So rather than my function going off and doing something bizarre, for example running forever, it just stops with a message saying, you called me with arguments you shouldn't have called me with.
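The assert behavior just described is easy to see in isolation; this little function exists only for illustration (the name is mine):

```python
def sqrt_args_ok(x, epsilon):
    """Check the assumptions the square root function will rely on."""
    # assert tests its predicate: if true, execution just continues;
    # if false, it raises AssertionError carrying the message string.
    assert x >= 0, 'x should be non-negative'
    assert epsilon > 0, 'epsilon should be positive'
    return True

sqrt_args_ok(2, 0.001)        # passes silently
try:
    sqrt_args_ok(-1, 0.001)   # stops with the message instead of misbehaving
except AssertionError as err:
    print(err)                # -> x should be non-negative
```

Strictly speaking an uncaught AssertionError halts the program, which is the point: failing loudly at the door beats running forever inside.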

All right, so that's the specification and then my check of the assumptions. The next thing it does, is it looks for a range such that I believe I am assured that my answer lies between the ran-- these values, and I'm going to say, well, my answer will be no smaller than zero, and no bigger than x. Now, is this the tightest possible range? Maybe not, but I'm not too fussy about that. I'm just trying to make sure that I cover the space.

Then I'll start with a guess, and again I'm not going to worry too much about the guess, I'm going to take low plus high and divide by two, that is to say, choose something in the middle of this space, and then essentially do what we've got here.

So it's a little bit more involved here, I'm going to set my counter to one, just to keep checking, then say, while the absolute value of the guess squared minus x is greater than epsilon, that is to say, why my guess is not yet good enough, and the counter is not greater than a hundred, I'll get the next guess.

Notice, by the way, I have a print statement here which I've commented out. I sort of figured that my program would not work correctly the first time, and so when I first typed it in, I put in a print statement that would let me see what was happening each iteration through this loop, so that if it didn't work, I could get a sense of why not.

In the next lecture, when we look for the bug in this program, you will see me uncomment that print statement, but for now, we go to the next thing.

And we're here, we know the guess wasn't good enough, so I now say, if the guess squared was less than x, then I will change the low bound to be the guess. Otherwise, I'll change the high bound to be the guess. So I move either the low bound or I move the high bound, either way I'm cutting the search space in half each step. I'll get my new guess. I'll increment my counter, and off I go.

In the happy event that eventually I get a good enough guess, I'll exit the loop. When I exit the loop, I check: did I exit it because I exceeded the counter, without a good-enough guess? If so, I'll print the message iteration count exceeded. Otherwise, I'll print the result and return it.

Now again, if I were writing a square root function to be used in another program, I probably wouldn't bother printing the result and the number of iterations and all of that, but again, I'm doing that here for, because we want to see what it's doing.
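Putting the walkthrough together, here is a reconstruction of the function from the description, in Python 3 and without the diagnostic printing. One deliberate change, flagged here because the lecture's own version is said to contain a bug: this sketch widens the high bound to max(x, 1.0) rather than x, so that it also covers x < 1 (think of sqrt(0.25) = 0.5, which lies outside [0, 0.25]):

```python
def square_root_bi(x, epsilon):
    """Return y such that y*y is within epsilon of x.
       Assumes x >= 0 and epsilon > 0."""
    assert x >= 0, 'x must be non-negative'
    assert epsilon > 0, 'epsilon must be positive'
    low = 0.0
    high = max(x, 1.0)            # my change; the lecture's version uses high = x
    guess = (low + high) / 2.0    # start in the middle of the space
    ctr = 1
    while abs(guess ** 2 - x) > epsilon and ctr <= 100:
        # print(ctr, low, high, guess)   # debugging aid, normally commented out
        if guess ** 2 < x:
            low = guess           # answer lies to the right of the guess
        else:
            high = guess          # answer lies to the left of the guess
        guess = (low + high) / 2.0
        ctr += 1
    assert abs(guess ** 2 - x) <= epsilon, 'iteration count exceeded'
    return guess

print(square_root_bi(4, 0.001))   # -> 2.0: the first bisection of [0, 4] lands on it
print(square_root_bi(9, 0.001))   # close to 3, within epsilon
```

Each pass through the loop halves the interval [low, high], which is where the logarithmic convergence the lecture mentions comes from.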

All right. We'll run it a couple times and then I'll let you out for the day. Let's go do this. All right. We're here.

Well, notice when I run it, nothing happens. Why did nothing happen? Well, nothing happens because it was just a function. Functions don't do anything until I call them. So let's call it. Let's call square root bi with 4 and 0.001. Took only one iteration, that was pretty fast, estimated two as an answer; we're pretty happy with that answer. Let's try another example. Let's look at nine. I always like to, by the way, start with questions whose answer I know. We'll try and get a little bit more precise.

Well, all right. Here it took eighteen iterations. Didn't actually give me the answer three, which we know happens to be the answer, but it gave me something that was within epsilon of three, so it meets the specification, so I should be perfectly happy.

Let's look at another example. Try a bigger number here.

All right? So I've looked for the square root of a thousand, here it took twenty-nine iterations, we're kind of creeping up there, gave me an estimate.

Ah, let's look at our infamous example of two, see what it does here. Worked around.

Now, we can see it's actually working, and I'm getting answers that we believe are good-enough answers, but we also see that the speed of what we talk about as convergence-- how many iterations it takes, the number of iterations-- is variable, and it seems to be related to at least two things, and we'll see more about this in the next lecture. The size of the number whose square root we're looking for, and the precision to which I want the answer.

Next lecture, we'll look at what's wrong with this one, and I would ask you, between now and the next lecture, to think about it and see if you can find the bug yourself. We'll look first for the bug, and then after that, we'll look at a better method of finding the answer.

Thank you.

## Free Downloads

### Video

- iTunes U (MP4 - 98MB)
- Internet Archive (MP4 - 98MB)


### Subtitle

- English - US (SRT)
