Lecture 4: What's Significant in Laboratory Measurement? Error Analysis Lecture

Description: In today's lecture, Dr. Hewett discusses what's significant in laboratory measurement, how to take measurements in the lab, how to do calculations, and how to do data analysis when there's a lot of quantitative data, as in the upcoming Charles River lab.

Instructor: Dr. Sarah Hewett

[SQUEAKING]

[RUSTLING]

[CLICKING]

 

SARAH HEWETT: All right. I guess we can get started. So today, we're going to talk about what's significant in laboratory measurement and how to take measurements in the lab, how to do calculations with the lab, and how to do some of the data analysis when you have a lot of quantitative data, which you will have in the lab coming up when we do the Ellen Swallow Richards or the Charles River Lab.

One thing I wanted to point out-- so there's hand-outs in the back. We're going to be going through how to do the calculations and how to do all the statistics. And there's an example problem that we're going to work through as we go in the back. So if you want to make sure you have a copy of that, that might be helpful.

And one of the things that was pointed out this week in the lab is that in the lab manual, there is a typo left over from when we were in the old labs in Building 4. And so this Wednesday and Thursday, the TA session, the help session about how to write the ferrocene lab report, is going to be in the lab, not in 4-430 as it says in the lab manual.

So just come to lab as normal. The beginning of the lab period will be a quiz on the ferrocene lab, so anything procedure wise, calculations, reactions, all that good stuff, things that you should know, having done the ferrocene lab and hopefully having started to write up your report a little bit.

So for the first 20 minutes or so, it's going to be the quiz. And then your TAs will give a little bit of a lecture about what they're looking for in the ferrocene reports. So that's a good time to ask any questions-- if you're unsure about where to write anything or how to write anything up, or if you have anything that you've already written and you want them to look at. They will be in the lab to help you out with that.

And we have some more office hours, too, that got posted to Stellar. And I will have a slide about them at the end of this lecture, too, for more help. But, yeah, day four of ferrocene-- come take the quiz, then get some help on your reports. It is in the lab. You don't have to go anywhere else. Just come to lab as normal.

OK, so if we're going to be talking about measurement today, then we need to talk about what things you should be thinking about if you're going to measure something. So if you're going to measure the length of this piece of wood with this ruler, how long would you be able to say this piece of wood is?

AUDIENCE: There's no units.

SARAH HEWETT: There's no units. Yeah, that's a problem. What else is wrong?

AUDIENCE: It's not long enough.

SARAH HEWETT: It's not long enough. So what's the minimum we could say about this piece of wood?

AUDIENCE: [INAUDIBLE]

SARAH HEWETT: It's longer than 1 something. So this is not a great measuring tool. What about this? Any better? Slightly better. So now, what could we say about the length of the piece of wood?

AUDIENCE: Between 1.

AUDIENCE: Between 1 and 2.

SARAH HEWETT: It's between 1 and 2, probably a little closer to 2. But we can't really say too much more about it than that. And we still don't have any units. So this is just to get you thinking about what we need to be considering when we are choosing how we're going to measure something and when we were recording measurements, what information we can reasonably get from the measuring tools that we're using.

So to talk about measurements a little bit more, we're going to need to talk about some terminology and get this out of the way, so that we're all using the same language. And you've probably seen these terms before in your other science classes in high school and middle school, going way back.

But just as a quick refresher, so we're all on the same page, precision is how close repeated measurements are to one another. So if you measure the same thing a bunch of times, you would like the results to be the same. And you've probably seen these bulls-eye diagrams where if the results are all really close together, that is very precise.

And you can measure that with the standard deviation. So the standard deviation gives you an idea of relatively how close your points are to each other. Accuracy is how close a measurement is to the true value or the value that you actually are trying to measure.

So if you are throwing darts and you are trying to get the bulls-eye, then if you have high accuracy, then you have them all in the bulls-eye. If you have a high accuracy and high precision, then all of your darts always go to the center. You can have high accuracy where you're always around the target, but your points are scattered. And then you can have low accuracy and low precision where you're just all over the place.

We measure accuracy by the percent error. So it's the error relative to the result that you're trying to get. And the answers don't need to be close to each other to be accurate. Like I said, they can be spread out, but still all close to the target value. So these are some terms that you've probably heard. They're used a little bit differently in chemistry sometimes than you may use them colloquially. So we just want to go over that.

More terminology-- absolute error is how far away a measurement is from the true or accepted value. So if you are trying to measure, let's see, with this here, pipette, this pipette is designed to measure 10 milliliters. And you can look at it. And it has a tolerance of 0.04 milliliters.

So that's an absolute error. It is how far away your measurement will be from the true value if you're trying to measure 10 and you measured 10.04. Then your absolute error is 0.04 milliliters. So you just subtract what you got from what you intended to get.

The relative error is how far a measurement is from the true value, but it's relative to the quantity that you're trying to measure. So if you've measured 0.04 or 10.04 and you intended to measure 10, then your absolute error is the 0.04. To get your relative error, you'll make it a percent and divide it by the 10 that you intended to measure. So now, you have a 0.4% error.

One way to think about this is if you have an absolute error of 1 milliliter when you're measuring out 2.5 liters of material, that's pretty small. But if you have an absolute error of 1 milliliter when you're measuring out 2.5 milliliters of material, then that becomes a much bigger problem. So depending on the quantity that you're measuring, sometimes it's a little bit more useful to report the relative error than the absolute error and vice versa, depending on the application.
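As a rough illustration of the two definitions above, here is a minimal Python sketch. The function names are made up for this example; the numbers are the pipette reading and the 1 mL comparison from the lecture.

```python
def absolute_error(measured, true_value):
    """Absolute error: how far the measurement is from the true value (same units)."""
    return abs(measured - true_value)


def relative_error_percent(measured, true_value):
    """Relative error: the absolute error as a percentage of the true value."""
    return absolute_error(measured, true_value) / abs(true_value) * 100


# Pipette example: intended 10 mL, delivered 10.04 mL.
pipette_abs = absolute_error(10.04, 10.00)          # about 0.04 mL
pipette_rel = relative_error_percent(10.04, 10.00)  # about 0.4 %

# The same 1 mL absolute error on two very different target volumes (in mL).
rel_large = relative_error_percent(2501.0, 2500.0)  # 1 mL off on 2.5 L
rel_small = relative_error_percent(3.5, 2.5)        # 1 mL off on 2.5 mL

print(f"pipette: {pipette_abs:.2f} mL absolute, {pipette_rel:.1f} % relative")
print(f"1 mL off on 2.5 L:  {rel_large:.2f} %")
print(f"1 mL off on 2.5 mL: {rel_small:.0f} %")
```

The absolute error is identical in the last two cases, but the relative error differs by a factor of a thousand, which is why the choice of which one to report depends on the application.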

If you say, yeah, it had an error of 1 milliliter, and your friends are all saying, oh, you did such a great job, and then you say, I was only trying to measure 2 milliliters, then not so great. Other types of error are random error and systematic error. So random error is things that we can't really control for or identify. And it causes data to fluctuate relatively uniformly around a mean value.

And that's one of the things that's given on the uncertainty in the measuring apparatus that you're using. So this pipette measures 10 milliliters, plus or minus 0.04 milliliters. That is a random or indeterminate error. The data will fluctuate around 10 milliliters within 0.04. You know that. And you can control for that using statistical analyses, which we'll talk about in a little bit.

Systematic or determinate error is error that has a known cause. And that could be from faulty or poorly calibrated instruments, human error, or chemical behavior inside reactions. So if you intended to get 5 grams of product and you got less than 5 grams, then you know that there is some error associated with your experiment. And it could be because maybe some of your iron oxidized. Or maybe you spilled some of your ferrocene. Or maybe you were not taring the balance every time you used it.

So that'll introduce error into your measurement if you were using the equipment improperly. Or if you're reading from a burette and you're always reading in the wrong direction or you didn't read to enough significant figures, that's systematic error.

And so that's not necessarily from the glassware itself. That's from how you were using it. And these are types of error that you can talk about in your discussion section of your lab report. So you could talk about the error that is associated with the measurements that you can control and that you cannot control.

So the last term that we need to talk about in terms of when we're making measurements or doing things in the lab is, what does it mean for something to be significant? So you may have heard of the term "significant" figures or this is a significant result. What does it mean for a number or measurement to be significant? And what makes a measurement that we take significant in the lab? Yeah?

AUDIENCE: Ideally, significant means all the numbers that you're certain about plus one estimate or one final.

SARAH HEWETT: Exactly. Yeah. So the significant figures in a measurement are all of the numbers that you're certain of, plus the first uncertain digit. So a lot of times, you won't have markings at every-- maybe tenth of a milliliter or 1 milliliter, depending on the size of the glassware that you're using.

So you can be certain of some of the measurements. And then you're uncertain of others. So in our piece of wood, we were certain that it was bigger than 1, but we were uncertain of the next digit. So we could estimate that second digit. And that counts as significant.

A quick review of significant figures and how to use them. All non-zero digits in a number are significant. Zeros in between non-zero digits are also significant. Zeros to the left of a non-zero digit are not significant. Zeros to the right are only significant if there's a decimal place. And you can avoid ambiguity in that with scientific notation. So if we just go through these really quickly, how many significant figures does this have? Four.

AUDIENCE: Three

SARAH HEWETT: Three.

AUDIENCE: Six.

SARAH HEWETT: Six.

AUDIENCE: Two.

SARAH HEWETT: Two, maybe.

AUDIENCE: Maybe?

SARAH HEWETT: As written, yeah, you would probably call it two, because the zero is to the right. But if you wanted the zero to be significant, then you could write it like this. And now, how many significant figures do we have?

AUDIENCE: Three.

SARAH HEWETT: Three. And if the zero is not significant, then to avoid people being confused about it, you could write it like this. And we have two. So that's one way you can use scientific notation to get around ambiguity. Yes?

AUDIENCE: Yeah, I have a question. So for 590, if someone put a decimal point after the zero-- would it be three significant figures, or would it be two?

SARAH HEWETT: So that's a question that I've always gotten. I think, just conventionally speaking, if you wrote something as like this, maybe. More conventionally correct would be to write it as 5.90 times 10 to the 2. Because you don't usually have a decimal place if there's nothing after it.

And then if you wrote it like this, though, then you have four significant figures. And that's more than you want. So just go with the scientific notation, instead of having a hanging decimal place. Because that's not the typical way to write numbers. When you're doing math with significant figures, if you're adding and subtracting, then your final answer should have the same number of decimal places as the number you started with that has the smallest number of decimal places.

When you are multiplying and dividing, the answer should be rounded to the same number of significant figures as the number you started with that has the least number of significant figures. So when you are multiplying and dividing, you want to count up your sig figs. And when you are adding and subtracting, then you pay attention to your decimal places.
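The two rounding rules can be sketched in Python as follows. This is only an illustration: `round_sig` is a hypothetical helper written for this example, and the input values are made up. Floats do not remember trailing zeros, so the formatting step is what actually shows the right number of digits.

```python
import math


def round_sig(value, sig_figs):
    """Round a number to the given count of significant figures."""
    if value == 0:
        return 0.0
    exponent = math.floor(math.log10(abs(value)))  # position of the leading digit
    return round(value, sig_figs - 1 - exponent)


# Multiplying/dividing: keep the smallest number of significant figures.
# 12.28 has four sig figs and 0.49 has two, so the product keeps two.
product = round_sig(12.28 * 0.49, 2)

# Adding/subtracting: keep the smallest number of decimal places.
# 12.52 has two decimal places and 1.3 has one, so the sum keeps one.
total = round(12.52 + 1.3, 1)

print(f"12.28 x 0.49 = {product:.1f}")  # formatted to show two sig figs
print(f"12.52 + 1.3  = {total:.1f}")    # formatted to show one decimal place
```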

So if we go back to our piece of wood, then using that ruler that we had earlier, we can estimate. Maybe this is 1.8. So we have gradations every one unit. We still don't have units. Whatever this is. And then you can estimate this, one digit past that. And then our uncertainty is our gradations divided by 2.

So we know that we are within 0.5 of the correct answer. Because we can estimate that it's within this half of our interval. If we add more gradations to our ruler-- so if we have markings every 0.1, still no units, then we can say with certainty that our piece of wood is bigger than 1.7 and smaller than 1.8. So it's 1.7 something. You can estimate that last decimal place. And then your error, your uncertainty in that measurement, gets smaller.

This is important in the lab. Because you need to make sure that you're choosing the correct apparatus for the amount of uncertainty that you want in your measurement. So if you see something in your lab manual that says measure 1.2 grams, then this balance is totally fine. Because the balances-- if you've never noticed at the top, they have the uncertainty written on there. So the uncertainty is in this last decimal place here.

So if you want 1.2 grams, then you can get 1.2. And then your last digit is the uncertain one. You should always write down all of the digits that are on the balance screen. But when you're going to calculate uncertainty or do an error propagation, then you know that the last digit is where your uncertainty is. This is plus or minus 0.01 grams.

So we have three different types of balances in the lab. So you want to make sure that you're choosing the one that has the correct number of decimal places for the quantity that you are trying to measure. And the same thing goes for glassware. So the uncertainty in the glassware is frequently reported on the glassware itself. And there are standard tolerances for various sizes and grades of glassware.

So if you've ever seen glassware that's volumetric, it may say either an A or a B on it. So the A is the highest standard. And then B is a slightly higher tolerance. And there are standard things of those that you can look up. But it's usually written on the glassware. And so this is an example of the 10-milliliter pipette that I was showing you earlier.

So you can look at the pipette. And then on the top part here, it'll say 10 milliliters. And this one says plus or minus 0.04. This pipette says plus or minus 0.02. So that one is calibrated a little bit more precisely. These are also 10-milliliter pipettes. And you can look on the top part here to figure out what their tolerances are. These are both 0.06.

And one of the other things that you need to make sure when you are using different types of glassware is to figure out where the markings end. So these are both 10-milliliter pipettes. And they are designed to measure 10 milliliters within 0.06 milliliters.

This one-- I don't know if you can see it from here. Probably not, but maybe if I hold it up here. The markings-- and they go from 0 to 10. And it ends here. And there are no more markings at the tip. So that means the only time you're measuring something is when your liquid is within where the markings are. So if you're going to measure 10 milliliters, you need to start your liquid here, end it at 10. And don't go below that, because that quantity is not calibrated.

There are some pipettes like this one that are also designed to measure 10 milliliters, but these have gradations all the way to the end of the pipette. So to get 10 milliliters out of this one, you would fill it up to here and then drain it all the way out to the bottom. So you want to make sure that you check your glassware ahead of time to figure out how much of it you can use to measure liquids and what you get if you empty it all the way out, versus measuring by difference.

If you look at your graduated cylinders, your graduated cylinders are different sizes. And the markings come in different levels of precision. So this has markings every half of a milliliter. And its tolerance is listed at the top as plus or minus 0.3. The 50-milliliter graduated cylinder has markings every 1 milliliter. And at the top, it'll tell you that its tolerance is plus or minus 0.5 milliliter. So again, that's the 1 milliliter divided by 2 rule. So sometimes that holds. And sometimes it's a little bit different.

Volumetric glassware always has a smaller tolerance, so this is a 250-milliliter volumetric flask. And on here, it will tell you that its tolerance is plus or minus 0.12. So you should always look at these. And when you are making measurements in the lab, when you're writing things down in your lab manual, you should always write down what the uncertainty in the measurement that you're making is. So whether it's on one of the balances or a volume, always make sure that you write down what the uncertainty of the glassware that you're using is.

If you look at something like a beaker, this also has a tolerance on it. And it's plus or minus 5%. So that's a lot when you are looking at a 300-milliliter beaker. That's going to be a lot more uncertain than any of these other measuring implements. So it's good to know, before you choose what piece of glassware you're going to use to measure something, how precise you want your answer to be and how much uncertainty is acceptable in the measurement that you are making.

With burettes-- so this is a 50-milliliter burette. And it doesn't have a tolerance listed on it, but it has markings every tenth of a milliliter. So what would be the uncertainty in the burette? Point? So if the markings are every-- a 50-milliliter burette. The markings are every 0.1 milliliters. So the uncertainty is--

AUDIENCE: 0.05.

SARAH HEWETT: 0.05, so plus or minus 0.05 milliliters. In a 10-milliliter burette, the markings are every-- let's see. I think they're every 0.2. No. What are these? 0.05. Yeah, these are every 0.01 in a 10-milliliter burette. So what would the uncertainty be here?

AUDIENCE: 0.005.

SARAH HEWETT: Yeah. So make sure that you are looking at what glassware you're using, and you record the correct number of significant figures. Because it changes based on the type of burette and even within the same type of glassware, where they are made with varying levels of quality.
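The "gradations divided by 2" rule used for the rulers and burettes can be written out as a small Python sketch. The function name and the dictionary are illustrative only. When the glassware has a tolerance printed on it, as with the pipettes and graduated cylinders, that printed value is the one to record instead.

```python
def reading_uncertainty(gradation_spacing):
    """Estimate the uncertainty of a reading as half the spacing between markings."""
    return gradation_spacing / 2


# Marking spacings mentioned in the lecture (ruler has no units; the rest are mL).
spacings = {
    "ruler, marks every 1": 1.0,
    "ruler, marks every 0.1": 0.1,
    "50 mL burette": 0.1,
    "10 mL burette": 0.01,
}

uncertainties = {name: reading_uncertainty(s) for name, s in spacings.items()}

for name, u in uncertainties.items():
    print(f"{name}: +/- {u:g}")
```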

So we can make one measurement. And then we can figure out what the uncertainty is in that one measurement. But what happens if we take multiple measurements and have to do math with them? So if you look at your worksheet here, this is an example of something that you might do in the lab where you have done a series of titrations to calculate the concentration of some sort of HCl solution.

So the first question is, what is the uncertainty associated with a 50-milliliter burette? We already figured that out. And then if we were going to calculate the volume dispensed in each titration trial with the correct error, how would we do that? So one of the ways that you could think about adding up the error in these measurements is just straight up, adding the error.

So for this first example, we have 12.52 milliliters. This chalk is no good. And then to do volume by difference, we have 0.24 milliliters. And each of these has an uncertainty of plus or minus 0.05 milliliters. So if we added up the error in these measurements, then we would get 0.1 milliliters. But it is probably pretty unlikely that both of these measurements were off by the full 0.05 milliliters. So this would tend to overestimate the amount of error in your final answer.

So what we do is we propagate the error. And when you're adding and subtracting, you use the absolute errors. Because you should be adding quantities that have the same units. So units will be consistent throughout your calculation. And when you do this-- did anybody subtract this out? I don't know if anyone has a calculator. Got one?

AUDIENCE: Yeah.

SARAH HEWETT: Great. So if we subtract these two numbers, we get 12.28 milliliters. And then how do we propagate the error for this calculation? So our error is going to be the square root of the sum of the squares of the errors of each individual measurement. So each one is-- so if anyone has done that-- or you do it in your head. Why not?

The answer that you get is 0.070710678. So that's a lot of extra decimal places. And what we really care about in uncertainty is, where is the first decimal place that is uncertain? What is that first digit that we don't know for sure? And so we always round an error to the first significant figure. So the error associated with these measurements is just 0.07 mL. So to write this perfectly, you would write it as 12.28 plus or minus 0.07 milliliters.

And you can calculate the volumes for each of those trials. And then the error for all of them is going to be the same. Because the error for each of your direct measurements is going to be 0.05. So then we can talk about what happens to the error in a calculation when you are multiplying or dividing something, which is this next part.
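For anyone who wants to check the subtraction step on a computer, here is a minimal Python sketch of propagating absolute errors in quadrature. The function and variable names are chosen for this example; the readings are the first trial from the worksheet.

```python
import math


def propagate_add_sub(*absolute_errors):
    """Error of a sum or difference: square root of the sum of the squared absolute errors."""
    return math.sqrt(sum(err ** 2 for err in absolute_errors))


final_reading = 12.52    # mL, burette reading at the end of the titration
initial_reading = 0.24   # mL, burette reading at the start
reading_error = 0.05     # mL, uncertainty of each reading on the 50 mL burette

volume = final_reading - initial_reading                        # volume by difference
worst_case = reading_error + reading_error                      # simple addition, overestimates
volume_error = propagate_add_sub(reading_error, reading_error)  # about 0.0707 mL

print(f"worst case: +/- {worst_case:.1f} mL")
print(f"propagated: {volume:.2f} +/- {volume_error:.2f} mL")
```

The propagated error is smaller than the straight sum of 0.1 mL because it is unlikely that both readings are off by the full 0.05 mL in the same direction.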

So if you're going to use this value to calculate the concentration of the HCl in this fake experiment, then you will have to do some multiplication and division. And you'll be using different quantities that have different units. So you can't just use the absolute error, because they have different units.

So if you have your error with your burette reading, that'll be in milliliters. If you have your error with your concentration, that's in molarity. So we can't use the absolute error. So we'll have to use the relative error, which, if you remember, is the absolute error divided by the measurement that you've made. So you want to take a second and do out that calculation for how to calculate the concentration of the HCl. Then we can do it all together in a second.

And we'll just do it for the first trial. And then if you guys want to have more practice, you can do it for the others afterwards. So if we're going to set up this calculation for the concentration of the HCl solution, where would we start? What's the first thing that we have to do or to calculate in this? Yeah?

AUDIENCE: Figure out how much sodium hydroxide we have.

SARAH HEWETT: Yeah. In what units?

AUDIENCE: Moles.

SARAH HEWETT: Moles. So we can figure out how many moles of sodium hydroxide we have by taking our volume. And then we have our units-- our concentration is in moles per liter. So we need to turn this into liters. And then we can use our molarity to get moles. I guess we could do that as moles of NaOH in 1 liter. And then we can go from moles of NaOH to moles of HCl. What's our mole ratio?

AUDIENCE: One to one.

SARAH HEWETT: Too easy. One to one. Yeah. 1 mole. And then we have moles of HCl. And then the last step is to divide by our volume of HCl. Yes, which in this case is 0.0100. Yeah, liters. And that should give you, if anybody did it out, 0.60172 molar, hopefully. So that's our concentration.

And now, we have to figure out what the error associated with that concentration is. So we will look at the error involved in each of these terms. So we have our error associated with our burette measurement. So our first one is going to be the 0.07 over 12.28 squared plus-- what's the next error that we have?

AUDIENCE: [INAUDIBLE]

SARAH HEWETT: Which one? No, say it.

AUDIENCE: The last one.

SARAH HEWETT: Last one. Yeah, we have error associated with this measurement, right? And the error associated with that is 0.02 milliliters. So you have 0.02 milliliters over your 10 milliliters. And then what's the last error that we have? Yeah?

AUDIENCE: Concentration of NaOH.

SARAH HEWETT: The concentration of NaOH. The 0.04. So you have 0.04 over 0.49. That's how you will calculate your error for the concentration of the HCl. And if you do this out, you get 0.08. So then, how would we report these two together? That's the thing from before.

So what is this? What type of error is this? Is it an absolute or a relative error? Relative. So we calculated it from all of the relative errors. So this is a relative error. And it's also-- you could think of this as 8%. So in order to get it back into an absolute error that we can report with our 0.60172, then we have to multiply these two together. And if we go back to our sig figs from before-- in this calculation, how many significant figures should our final concentration answer have?

AUDIENCE: Two.

SARAH HEWETT: Two. So this has two sig figs. So we'll round this to 0.60. Then you can multiply that by your 0.08 relative error. And your final answer should look something like this, way down here-- 0.60 plus or minus 0.05 molar. Any questions about calculating errors and doing error propagation?
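The whole multiplication and division example can be reproduced with a short Python sketch. The helper name is hypothetical; the values (12.28 ± 0.07 mL of 0.49 ± 0.04 M NaOH, 10.00 ± 0.02 mL of HCl, 1:1 mole ratio) are the ones used in the worked problem.

```python
import math


def propagate_mul_div(*value_error_pairs):
    """Relative error of a product or quotient: relative errors combined in quadrature."""
    return math.sqrt(sum((err / value) ** 2 for value, err in value_error_pairs))


naoh_volume_ml, naoh_volume_err = 12.28, 0.07  # from the burette, by difference
naoh_conc, naoh_conc_err = 0.49, 0.04          # mol/L
hcl_volume_ml, hcl_volume_err = 10.00, 0.02    # from the pipette

moles_naoh = (naoh_volume_ml / 1000) * naoh_conc  # convert mL to L, then L x mol/L
moles_hcl = moles_naoh * 1                        # 1:1 mole ratio
hcl_conc = moles_hcl / (hcl_volume_ml / 1000)     # mol/L

relative_error = propagate_mul_div(
    (naoh_volume_ml, naoh_volume_err),
    (hcl_volume_ml, hcl_volume_err),
    (naoh_conc, naoh_conc_err),
)
absolute_error = hcl_conc * relative_error        # back to units of mol/L

print(f"concentration: {hcl_conc:.5f} M")
print(f"relative error: {relative_error:.2f} (about {relative_error * 100:.0f} %)")
print(f"reported: {hcl_conc:.2f} +/- {absolute_error:.2f} M")
```

Almost all of the 8% comes from the NaOH concentration term (0.04 over 0.49), which is why the glassware errors barely change the final answer here.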

So this is why it's important to make sure that you have the correct tolerances and the correct errors associated with all of your measurements. Because when you go to do all of your calculations and write your lab report up, it will be very hard to figure out your error if you are missing some. OK.

So now that we know how to deal with the various independent measurements, what happens when we make the same measurement more than once? And this goes back way-- probably to middle school when we talk about the mean or the average of a sample.

So in statistics land, they call all of the possible values that you could measure of something the population. And it's really hard to know the true mean of the population, unless you have access to the entire population and you've taken all possible measurements, which is pretty much impossible, especially in the case of 5.310. We're not going to be making all of the possible measurements there are for each quantity that we measure.

So in the lab, we can take the mean of a subset of the population. We call that the sample and then calculate our sample mean. So population mean, if you ever see in statistical literature, is denoted as mu. And then our sample mean is denoted as x bar, where you just add up all of your measurements, divide by the number of measurements you took. And that's your mean.

So we can calculate an average or a mean for the HCl solution on the back. If you didn't do the calculations from before, the concentrations that you should get are 0.60, 0.61, 0.61, and 0.64. So if you add all those up and divide by 4, then your mean concentration is 0.615 or rounded to the correct number of significant figures-- 0.62 molar.

The other thing that you can calculate is the standard deviation of your measurements. And so remember, that's a measure of precision or how close all of your measurements are to one another. So small standard deviation means that all of your measurements are very close together, very precise. And to calculate the standard deviation, again you could calculate the standard deviation of an entire population. It's the sigma up there. And then you need to know your population mean.

But in the lab, we are taking a smaller subset of the population. So we have to calculate a sample standard deviation, which is denoted as S. And the major difference here is that instead of dividing by N, which is the number of samples, you divide by N minus 1, so the number of measurements you took, minus 1.

So to calculate a standard deviation, you take each of your measurements, subtract the sample mean from it, square it, add those all up, divide by N minus 1, and then take the square root. And you can do this on a scientific calculator, a graphing calculator. Excel does it for you, any statistical program that you want. You don't have to calculate this by hand in your lab manual or in your lab report. You could just show the formula for it. And then you can calculate it using any sort of statistical program.

So if we calculate the standard deviation for our concentration of the HCl solution, then you should get-- not going to do this all out by hand right now, because we don't have a lot of time. But if you wanted to try it out later or check your answer for somewhere else, it should be 0.01732. It keeps going.

So the standard deviation is typically rounded to the same number of significant figures as the measurements that you are using to get the standard deviation. You'll see it reported pretty frequently like that. So we can report our standard deviation as 0.017.
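To check the mean and the sample standard deviation without doing it by hand, something like the following Python sketch works. It uses the four concentrations listed above, computes the sample standard deviation step by step with the N minus 1 formula, and compares it against `statistics.stdev` from the standard library, which uses the same N minus 1 definition.

```python
import math
import statistics

concentrations = [0.60, 0.61, 0.61, 0.64]  # mol/L, one value per titration trial

n = len(concentrations)
mean = sum(concentrations) / n  # sample mean, x bar

# Sample standard deviation: squared deviations from the mean,
# summed, divided by N - 1, then square-rooted.
squared_deviations = [(x - mean) ** 2 for x in concentrations]
s_by_hand = math.sqrt(sum(squared_deviations) / (n - 1))

s_library = statistics.stdev(concentrations)  # same N - 1 definition

print(f"mean = {mean:.3f} M")
print(f"s (by hand) = {s_by_hand:.5f} M")
print(f"s (library) = {s_library:.5f} M")
```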

Since we are taking a mean that is not the population mean, we know that we may not have exactly-- the mean of your sample will probably not be exactly the same as the population mean. But you can figure out how good of an estimate your mean is, of the overall population mean, by calculating the standard error of the mean. And that is done by taking the standard deviation of your sample and dividing by the square root of the number of data points that you have.

So if you want to get a better and better estimate of the true mean of your population, then you can keep increasing the N value. And eventually, it'll make your standard error smaller to a point, since it's the square root. Once you get a big enough number of N, then incrementally, you get less and less benefit from adding more and more data points.

But you can also make your standard error smaller by decreasing your standard deviation. And so if you can make more precise measurements, then that will also help improve the quality of your mean and how accurate it is toward the actual population mean.
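A short Python sketch of the standard error of the mean, using the four concentrations from the worksheet. The second part holds the standard deviation fixed and varies N, which is a hypothetical illustration of the diminishing returns described above, not data from the lab.

```python
import math
import statistics

concentrations = [0.60, 0.61, 0.61, 0.64]  # mol/L

s = statistics.stdev(concentrations)       # sample standard deviation
sem = s / math.sqrt(len(concentrations))   # standard error of the mean

print(f"s = {s:.5f} M, standard error = {sem:.5f} M")

# Hypothetical: if s stayed the same, how would the standard error shrink with N?
sem_by_n = {n: s / math.sqrt(n) for n in (4, 16, 64, 256)}
for n, value in sem_by_n.items():
    print(f"N = {n:3d}: standard error = {value:.5f} M")
```

Each fourfold increase in N only halves the standard error, so past a certain point more measurements buy very little.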

All of this is leading up to doing statistics. And all of statistics is essentially based on the Gaussian distribution or the normal distribution. And you've probably seen this before. In this graph, the x-axis is the number of standard deviations away from the mean. And then the y is the probability of finding a measurement in that space.

So you can see that the highest probability of all of your measurements is going to be very close to the mean. So if you're taking measurements accurately, they should be very close to the mean. And then as you get further and further away, there is less probability that some measurement you take will be three standard deviations away from the mean and so on.

So this number Z is calculated by subtracting the mean from your number and dividing by the standard deviation. So it's just essentially the number of standard deviations away from the mean your measurement is. Oh, back up. We cannot go back. There we go.

One way that we can use this is to figure out confidence levels of the measurements that we take or the means that we calculate. And so you can come to an error under the curve analysis here-- or an area under the curve analysis. And so for all measurements that are within one standard deviation, plus or minus from the mean, that accounts for 68% of the area there.

If you go within two standard deviations, you can say with 95% confidence that your mean is within that area. And then within three standard deviations, plus or minus, there's a 99% chance that your mean is within that area. So we can use this when we are calculating the accuracy or how confident we are that we have measured the population mean with the samples that we have taken in the lab.
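The Z value and the area-under-the-curve percentages can be checked with a few lines of Python. This sketch uses the standard identity that the fraction of a normal distribution within plus or minus k standard deviations is erf(k / sqrt(2)); the function names and the example numbers passed to `z_score` are made up for illustration.

```python
import math


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev


def fraction_within(k):
    """Fraction of a normal distribution within +/- k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


example_z = z_score(12.0, 10.0, 1.0)  # hypothetical measurement, mean, and sigma

areas = {k: fraction_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} standard deviation(s): {area * 100:.2f} %")
```

Carried to more digits, the three areas come out slightly above the rounded 68%, 95%, and 99% quoted in the lecture.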

We can do this because of the central limit theorem, which states that the distribution of sample means in a population will constitute a normal distribution, even if the population itself is not normally distributed. So if you take a bunch of samples or take a bunch of measurements in the lab, then you can use the normal distribution to statistically analyze them, even if you don't know the distribution of the population itself.

So we can use that information to calculate confidence levels of a mean that we calculate. So what that means is we can calculate how confident we are that our mean falls within a certain range of the actual population mean. And you can say it with different levels of confidence. You could say I'm 99% confident, and you'll get a wider range, or 95% confident, and you'll get a little bit of a smaller range.

And the range of the values is what we call the confidence interval. And you can calculate a confidence interval for the actual population mean. And that is your sample mean, plus or minus that Z score we were talking about, times the standard deviation of the population, divided by the square root of N.
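As a formula, that confidence interval can be written as below. The notation is an assumption for illustration: x-bar is the sample mean, z is the Z value for the chosen confidence level, sigma is the population standard deviation, and N is the number of measurements.

```latex
\mu = \bar{x} \pm \frac{z\,\sigma}{\sqrt{N}}
```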

To use Z as in the previous slide, you need to have a big enough sample size that your estimates of the mean and the standard deviation are a good enough substitute for the actual sigma, the population standard deviation. That's really challenging in lab situations. And in 5.310, we will not ever have a big enough sample size for that to be the case.

So as an alternative, we can use the t statistic. So to calculate a confidence interval for a mean that you calculate in lab, you can take your mean, plus or minus the t statistic, times your sample standard deviation, over the square root of N. And you can calculate t statistics using this formula if you have all of this information. Or you can get a table of t values for varying confidence levels and varying degrees of freedom, which is N minus 1, which is the number of measurements that you took, minus 1.

So if we're going to calculate a confidence interval for the mean of our HCl concentration, then we will have something like-- we need our average, plus or minus t times s over the square root of N. And you'll notice that this standard deviation over the square root of N is the standard error of the mean. So if you calculate that, then it'll make it slightly easier to calculate your confidence interval. Yeah.

And so to do that, we're going to need some t table, which gives you the t statistics. Our average of the mean was 0.62, plus or minus. So the t value-- if we have four measurements, then our degrees of freedom is going to be 3, so N minus 1. And then we want to use a two-tailed t-test. Because we don't know which way our measurement is going to vary. So the two-tail means that it could be higher or lower than the actual mean.

So you want to calculate one that has a value of 0.05. So that corresponds to the 95% confidence interval. And 3-- so 3.182 is our t value and then times 0.017. And what you get, if you do that out, is-- so when you're going to report your 95% confidence interval, you would add this number and subtract this number from your mean and then report it as a range in parentheses, so (0.593, 0.647). And so that's your 95% confidence interval.
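The arithmetic for this interval can be sketched in Python. One assumption is needed to reproduce the range quoted above: the 0.017 is treated here as the sample standard deviation, which is then divided by the square root of N = 4 to get the standard error of the mean. The variable names are illustrative.

```python
import math

mean = 0.62   # average from the lecture example
s = 0.017     # ASSUMPTION: treated here as the sample standard deviation
n = 4         # number of measurements, so 3 degrees of freedom
t = 3.182     # two-tailed t value for 95% confidence and 3 degrees of freedom

sem = s / math.sqrt(n)   # standard error of the mean
half_width = t * sem     # the "plus or minus" part of the interval
low, high = mean - half_width, mean + half_width

print(f"95% confidence interval: ({low:.3f}, {high:.3f})")
# 95% confidence interval: (0.593, 0.647)
```

Computing the standard error of the mean first, as suggested in lecture, leaves only one multiplication by t for each confidence level you want to report.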

And so what that's saying is that there is a 95% chance that the actual population mean lies between these two numbers. You are 95% sure that the actual mean is in there. So in your lab reports, when it says to calculate a 95% confidence interval, that is what we are looking for.

Another issue you may run across in your data, if you take repeated measurements of the same quantity, is that you may have outliers. And sometimes you can look at your data and say, wow, there's definitely an outlier here and other times-- but even if you can do that, it's nice to be able to mathematically demonstrate that it is, in fact, an outlier.

And so to avoid subjectivity in whether you're tossing out data points left or right, just to make yourself look better, you can use the Q-test to help decide whether a value can be kept in a data set or should be rejected as an outlier. And the way that the Q-test works is you take the result that you're worried about, the questionable value, and subtract its nearest neighbor from it, so whatever the next closest value that you measured was, and take the absolute value of that.

And then divide it by the spread of the data, which is just your highest value that you measured, minus your lowest value that you measured. And then you have to look at another table of standardized Q values. And if the Q that you calculate is greater than the Q in the table for the number of data points and the confidence level that you want, then you can reject that point as an outlier at that confidence level.
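The Q calculation described here can be sketched as a short Python function. This is one possible way to write it, not something from the lecture: the function name `q_statistic` is hypothetical, the readings are invented purely for illustration, and the function assumes the questionable value is whichever end of the sorted data sits farther from its neighbor.

```python
def q_statistic(values):
    """Return (suspect, Q) for the end value that sits farthest from its neighbor."""
    data = sorted(values)
    if len(data) < 3:
        raise ValueError("the Q-test needs at least three values")
    spread = data[-1] - data[0]      # highest value minus lowest value
    if spread == 0:
        raise ValueError("all values are identical")
    gap_low = data[1] - data[0]      # gap between the lowest value and its neighbor
    gap_high = data[-1] - data[-2]   # gap between the highest value and its neighbor
    if gap_high >= gap_low:
        return data[-1], gap_high / spread
    return data[0], gap_low / spread

# Hypothetical readings, invented for illustration only
readings = [10.1, 10.2, 10.3, 10.2, 11.0]
suspect, q = q_statistic(readings)
print(f"questionable value: {suspect}, Q = {q:.3f}")
```

The returned Q still has to be compared against a tabulated critical value for the same number of data points and the confidence level you want.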

So the last question on here is not really related to the first whole set of problems. But if you took the following measurements for a concentration of an NaOH solution, are there any outliers in that data set? So what's our Q calculation for this data set? Which result might be questionable? 2.86. Yeah, that seems to be a little bit higher than everybody else. So our questionable result is 2.86 minus--

AUDIENCE: 2.52.

SARAH HEWETT: 2.52 is its next closest neighbor divided by-- how do we calculate the spread of our data?

AUDIENCE: [INAUDIBLE]

SARAH HEWETT: So 2.86, which is our max. And our minimum is 2.38. So if you do that out, you get 0.34 divided by 0.48. So our Q value is 0.708. Now, if we look at a table of values for our Q-test-- so we can either calculate it at 90% confident that it's an outlier, 95% confident that it's an outlier, and all the way down.

So if we wanted to calculate at 95% confidence level that this point is an outlier, then we had how many data points in this data set? 6. So we'd look at 6. 95%. Our Q value is 0.625. So 0.708 is greater than 0.625. So is it an outlier or not? If the Q that you calculate is bigger than the Q in the table, then it is, indeed, an outlier. And we can confidently say that that does not belong in our data set. Ooh, we're working on it. Come on.

So you can see that if anything had changed, like if we had had fewer data points, then it would not be an outlier. Or if you wanted a higher confidence level, then it also would not have been an outlier. So it's a little more strict, how far away from the rest of the points it has to be before you can call it an outlier.
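The comparison can be sketched in Python using the numbers from this example. The critical values below are taken from a commonly used tabulation of two-sided Q values (Rorabacher's) and are assumed to match the table on the slide. The 0.625 entry does agree with the value quoted in lecture. Only the entries needed here are included, and the Q value is held fixed at 0.708 to show how the verdict depends on the table entry alone.

```python
# Critical Q values keyed by (number of data points, confidence level in %).
# ASSUMPTION: these match the table on the slide; only a few entries are listed.
Q_CRIT = {
    (5, 95): 0.710,
    (6, 90): 0.560,
    (6, 95): 0.625,
    (6, 99): 0.740,
}

suspect, nearest = 2.86, 2.52   # questionable value and its nearest neighbor
highest, lowest = 2.86, 2.38    # maximum and minimum of the data set
q = abs(suspect - nearest) / (highest - lowest)

def is_outlier(q_calc, n, confidence):
    """True when the calculated Q exceeds the tabulated critical value."""
    return q_calc > Q_CRIT[(n, confidence)]

print(round(q, 3))            # 0.708
print(is_outlier(q, 6, 95))   # True: 0.708 > 0.625
print(is_outlier(q, 6, 99))   # False: 0.708 < 0.740
print(is_outlier(q, 5, 95))   # False: 0.708 < 0.710
```

So the same Q passes the test at 95% with six points but fails it at 99%, or with only five points.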

Last thing we're going to talk about is doing the least squares regression. You'll make a bunch of calibration curves when you are doing different measurements with UV-Vis spectroscopy. So in UV-Vis, which you'll talk about in the next couple of labs, the absorbance of a sample at a certain wavelength, how much light it absorbs is related to its concentration.

And this is a calibration curve with the concentration of bovine serum albumin protein versus absorbance, which you guys will be making a similar one in the catalase lab coming up. So it assumes that one variable is known. So that's our concentration, that we have known concentrations, and that all of the variation in the y-axis is linearly related to our x values. So that's one assumption that you have to have before you are going to make a least squares linear regression.

The whole idea of a least squares regression is to get an equation for a line in the form of y equals mx plus b. So that's the point of all this. And you'll have different points. And you want to calculate a straight line through all of your points.

So some of the terminology that is associated with this is the residual value. And so that's your vertical distance between a point and the line of best fit. So this is your residual value. Each point has a residual value. And the way that the line of best fit is generated in the "least squares method" is they take the sum of the squares of all of these residual values and try to minimize it.

And so you can calculate various quantities. If you take all of the x values, subtract the average x from each of them, square them, and add them up, then you'll get the Sxx. You could do the same thing with the y values and then the x and the y values. The slope of the line is this value over this one. To get the y-intercept, you take the average y and subtract the slope times the average x. And then the r squared.
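The slide itself is not reproduced in the transcript, so the formulas below are the standard least squares definitions, assumed to be the ones being pointed at. Here x-bar and y-bar are the average x and y, m is the slope, and b is the y-intercept.

```latex
\begin{aligned}
S_{xx} &= \sum_i (x_i - \bar{x})^2, &
S_{yy} &= \sum_i (y_i - \bar{y})^2, &
S_{xy} &= \sum_i (x_i - \bar{x})(y_i - \bar{y}) \\
m &= \frac{S_{xy}}{S_{xx}}, &
b &= \bar{y} - m\,\bar{x}
\end{aligned}
```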

This was a misprint in your slide hand-outs. It should be 1 minus this quantity in the hand-outs. The 1 is not on there. And so the R squared value, which you've probably seen before, is the coefficient of determination. And that is a measure of how well this best fit line can explain the variation in y in a linear fashion. So essentially, how well do your points fit a linear relationship? Is that a good way to explain your data?
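To make the "1 minus this quantity" concrete, here is a minimal Python sketch of a straight-line fit. It assumes the standard definition of R squared, 1 minus the sum of squared residuals divided by Syy, since the slide's exact expression is not in the transcript. The function name and the calibration data are hypothetical, invented for illustration.

```python
def least_squares(xs, ys):
    """Return slope, intercept, and R^2 for a straight-line fit of ys against xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals: vertical distances from each point to the line
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_res / s_yy
    return slope, intercept, r_squared

# Hypothetical calibration data, invented for illustration (not from the lecture)
conc = [0.0, 1.0, 2.0, 3.0, 4.0]
absorbance = [0.02, 0.21, 0.39, 0.62, 0.80]
m, b, r2 = least_squares(conc, absorbance)
print(f"y = {m:.3f}x + {b:.3f}, R^2 = {r2:.4f}")
```

If the points fall exactly on a line, the residuals are all zero and R squared comes out to exactly 1.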

So the best way to generate this is in Excel. You could calculate all those quantities by hand if you want to. But Excel will do it for you or your other favorite mathematical software program. You graph your points as an x-y scatterplot. Then you can right click the data points. And it'll say add trend line. And then you can have options to display the equation and the R squared on the graph.

So you'll get your y equals mx plus b equation there and your R squared value there. So you can see that this line is-- this is not a very linear set of data. So your R squared, ideally, is close to 1. And this is close-ish to 1, but that is definitely not a linear relationship.

You can get it higher than that. And we'll talk about that more when you guys make your calibration curves for the bovine serum albumin. The Coomassie blue dye is only linear in certain regions of the calibration curve. So you'll have to calculate different lines for different pieces of the curve, depending on what your concentration is.

The last thing that we can do to get more information out of a set of linear data is to use the LINEST program in Excel. And if you've never done this before, you type in your data as a series of x and y points. And then instead of graphing it-- well, you can graph it, too.

But in addition to graphing it, you highlight a 2-column by 5-row area of empty cells. And while that's highlighted, in the first cell, you type in equals LINEST. And then it'll prompt you to highlight your y values, then highlight your x values. And then you can type true and true. If you type false for the first one, it'll set the y-intercept to zero. And we don't want to force our lines through zero.

So you will have your slope. And then if you press Control-Shift-Enter, even on a Mac-- so on a Mac, press Control, not Command. Press Control-Shift-Enter. And it'll give you this array of data. Once you get this array of data, don't try to change any of these values or Excel will get mad at you, so just leave it.

And this is the key to what each of the cells is telling you, because it won't give you this information. You can look it up online, or it'll be on these slides. So your first thing-- it'll just give you your slope and your intercept. And then it'll give you-- these are the two that we care about the most, are the standard deviation of the slope and the standard deviation of the y-intercept.

So sometimes in this course, you'll make graphs where the slope is representative of a certain quantity. Or the y-intercept is representative of a certain quantity if you're graphing any equation in linear form. So then you'll have the standard deviations of each of those measurements. Then it'll also give you your R squared and a bunch of other information that you want if you're doing different statistical tests.
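For anyone working outside Excel, the first three rows of the LINEST array (slope and intercept, their standard deviations, and R squared) can be reproduced with a short Python sketch. This uses the standard formulas for a straight-line fit with a free intercept. The function name `linest_like`, the dictionary keys, and the data are hypothetical and not from the lecture.

```python
import math

def linest_like(xs, ys):
    """Slope, intercept, their standard errors, and R^2 for a straight-line fit."""
    n = len(xs)
    if n < 3:
        raise ValueError("need at least three points to estimate the uncertainties")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    s_y = math.sqrt(ss_res / (n - 2))   # scatter of the points about the line
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": s_y / math.sqrt(s_xx),
        "se_intercept": s_y * math.sqrt(sum(x * x for x in xs) / (n * s_xx)),
        "r_squared": 1 - ss_res / s_yy,
    }

# Hypothetical data, invented for illustration only
fit = linest_like([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

In the Excel array, the slope and intercept sit in the first row, their standard deviations directly beneath them in the second row, and R squared in the first cell of the third row.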