The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

WILLIAM GREEN: So today we're going to talk about Bayesian parameter estimation, and parameter estimation in general.

So last time we were writing down the expressions for the probability of observing a mean measurement if you know what the model is. So let's try to do that again. Suppose I have a model that predicts some observable, and it depends on some knobs, and it depends on some parameters. And suppose, because I have great powers of faith, that I believe this model is 100% correct with every fiber of my being. And also because I have tremendous confidence in all the people who built my apparatus, and the knobs that I turn actually correspond to the real values, and I have tremendous confidence in all the literature that reports the parameter values. And so I'm absolutely certain that this is the truth.
So we'll start from a position of absolute certainty, and then we'll degrade into doubt as the lecture goes on. So let's start from the position of someone who has absolute faith that this model is true.

So I have a model, and I really believe this model. So for example, I believe that the kilogram weight at the SI institute in Paris weighs exactly one kilogram. I believe that with every fiber of my being. I'm completely confident that model is correct. So there are some things I'm really confident about. That's one. And maybe you guys have some things you really believe, too. So let's go with things we really believe.

So I plan to conduct some experiments that measure this observable and are related to this model. And so I'm going to do 10 repeats of measuring y. So I'm going to get the kilogram blob that's in Paris, and I'm going to stick it on my really expensive scale that I really believe is great, and I'm going to measure its weight. And then I'm going to put it back, and measure it again, and again, and again.
And I'm going to get another really great scale that I really believe is great, and I'm going to measure it there, too. So I've got a lot of repeats of measuring the weight of this kilogram, and I believe it's really a kilogram. But the stupid measurements don't say a kilogram. They say, you know, 1.0003, 0.99995, all kinds of numbers not equal to one kilogram.

So now I'm going to try to figure out what the probability is that I would have measured some particular value y. So what is the probability that my experimental mean is between some value, say, y and y plus dy? So that's a question for you. What's the probability?

Sorry, what?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: OK, so we think that the probability that y is in this interval, given that the model is true, and I know the theta values perfectly, and I know the x values perfectly, is equal to some integral of what? The bounds of the integral are probably y to y plus dy, do you believe that? What's the integrand?

Sorry, what?

AUDIENCE: The probability [INAUDIBLE] as a function of y.

WILLIAM GREEN: Right, so what is it?
AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: You wrote it down last time, I think. So this [INAUDIBLE] is large? Standard normal, right? So it should be one over sigma root of two pi. Does that sound OK? I mean here. It's probably the same, it's fine. Yep.

AUDIENCE: What does that notation mean, if your model is true?

WILLIAM GREEN: So this means: given that the model is true, and I know these theta values are exactly certain numbers, and the x values are exactly certain numbers, what's the probability that I would make a measurement whose average would fall in this interval? So this line means, given that this is true, what's the probability of that?

OK, is this right? Is this surprising? This is OK? So this is what I mean. So we say that our probability distribution converges to a Gaussian distribution; this is what we expect. So we expect n to have been large enough for this to be true. Yeah? This is very important. This is like the whole course, actually. This is the whole section, this one equation.
So I just wanted to make sure you really get what this says. And if you don't like the integral, you can make dy really small, and then it's just this times dy. OK?

Actually, this notation is like [INAUDIBLE]. I think I should do it this way. I should do this. This is a number. Let's get rid of the integral. Let's make dy really small. I'll make it [INAUDIBLE]. Is that all right?

So this is the probability density that we would observe, this is the experimental value y that we observe for the mean, and this is the little width of our tiny little interval. Is that all right? Yes?

AUDIENCE: So is sigma the [INAUDIBLE] on there?

WILLIAM GREEN: Ah, what is sigma? That's a great question. We didn't write down what sigma was. What is sigma?

AUDIENCE: Standard deviation?

WILLIAM GREEN: It's not the standard deviation exactly. Standard deviation of the mean, right? So there are two sigmas. We have the sigma of y, of the measurements, and its square is equal to the average value of y squared minus the square of the average value of y.
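As a quick numerical sketch of these quantities: the measurement data, the noise level, and the model prediction of exactly 1.0 below are all invented for illustration, but the formulas are the ones on the board, the variance from averages and the Gaussian density for the mean.

```python
import math
import random

# Hypothetical setup: the model predicts the weight is exactly 1.0,
# and we have n noisy repeat measurements (made-up data for illustration).
random.seed(0)
n = 10000
measurements = [1.0 + random.gauss(0.0, 0.003) for _ in range(n)]

# Variance of the measurements: <y^2> minus <y> squared, as on the board.
mean_y = sum(measurements) / n
mean_y2 = sum(y * y for y in measurements) / n
sigma_y = math.sqrt(mean_y2 - mean_y ** 2)

# The sigma of the *mean* is smaller: sigma_y divided by root n.
sigma_mean = sigma_y / math.sqrt(n)

# Gaussian probability density that the experimental mean lands at ybar,
# given that the model prediction is exactly correct.
def density_of_mean(ybar, model_value, sigma_mean):
    z = (ybar - model_value) / sigma_mean
    return math.exp(-0.5 * z * z) / (sigma_mean * math.sqrt(2.0 * math.pi))

p = density_of_mean(mean_y, 1.0, sigma_mean)
```

Multiplying `p` by a tiny width `dy` gives the probability of landing in that interval, which is the quantity the lecture keeps coming back to.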
So for however many experiments we do, we just compute the average of y squared and the average of y, and subtract them. That's the variance. And then the sigma that I used in that equation there is sigma y divided by the square root of n. And we call this the standard deviation of the mean; it's the uncertainty in the mean value of y. And the central limit theorem says that as long as n gets really large, we expect that this should converge to this. And we talked last time about how when n gets bigger, these averages don't really change; they're just the averages. But this number declines as n gets big, because of this factor of root n in the denominator.

And to understand that, suppose I measure the weight, and it should be around one kilogram, but in fact my measurements are scattered all over here. Lots of measurements. So they have a variance, something like this. But if I make a plot, as I run, of the running average: when I run the first two points, I get some average value here. After I run 27 more points, the average value is here.
After I run 1,000 repeats, the average value is here. It's getting pretty close to this, and the uncertainty in this number is getting smaller and smaller as I'm doing better and better averages, averaging more and more repeats. Does that make sense? OK.

So from this key equation, I can derive a lot of things. And it depends what you want to do. So one thing people do a lot is what's called model validation. And what does this mean? It means I have a model, and I believe it's true. I have some parameters, and I believe they're true. But there are some foolish skeptics out there who don't have the faith that I do. And they think that my model's baloney, or my parameter values are wrong, or something. And so to prove I'm right, I'm going to make some experiments. And I'm going to make a plot that shows that the experiment and model agree. Some of you might have done this in your life, yes? Everybody might make a parity plot or something. You've seen these things before. Now, this is like a confidence builder.
You're trying to get the skeptics out there to believe that there's some evidence to back up your faith that this model is perfect. And what you really want to know is: for the measurement that I make, the average of my 10,000 repeated measurements, I expect that this quantity should be pretty big, somehow, in some way. But then quantitatively saying what that means, exactly what's a good fit and what's a bad fit, is actually kind of a difficult question, and we'll come back to it. But that's a very common use of this equation: to try to do validation.

Now, because it's kind of complicated, most people don't actually do it. So instead, what they do is they just plot some data points, and they plot your model curve. And as long as they look good, then you're done. So that's the normal way it's done in the literature currently. But of course, that's completely unquantitative. It doesn't really say whether the model and the data really agree; it just means they look sort of like each other. So that's like a human qualitative thing.
Now, if the purpose of validation is just to convince humans, then you've served the purpose. But if your purpose is to try to quantitatively say something, then you really have to get into this equation, which usually is not done but would be the right thing to do for validation.

Now, the alternative view is disproving a model. And I'll just say that there are several ways this can happen. You can try to disprove a model, but you might also show that the theta values are incorrect. Or you might show that the experiment is wrong. These are all possibilities, reasons why the model and the data might not agree with each other.

So this equation only holds if the model is really true, if the parameter values are all perfectly correct, and if we know exactly what all the knob values are. If any of those things are not true, then you should see some discrepancy, and there should be a way to show it. And really what you're showing is that you observed some y that is very unlikely to be observed. The probability of observing that y is extremely small if all these other things were true.
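In code, that check is just evaluating the Gaussian density from before at the observed mean. The numbers below (model value, observed mean, sigma of the mean) are invented to make the point:

```python
import math

def density_of_mean(ybar, model_value, sigma_mean):
    """Gaussian density for the experimental mean, assuming the model,
    its parameters, and the knob settings are all exactly right."""
    z = (ybar - model_value) / sigma_mean
    return math.exp(-0.5 * z * z) / (sigma_mean * math.sqrt(2.0 * math.pi))

# Hypothetical numbers: model says 1.0 kg, measured mean 1.0003 kg,
# standard deviation of the mean 3e-5 kg.
model_value = 1.0
ybar = 1.0003
sigma_mean = 3e-5

z = abs(ybar - model_value) / sigma_mean        # 10 sigmas out
p = density_of_mean(ybar, model_value, sigma_mean)
p_peak = density_of_mean(model_value, model_value, sigma_mean)

# Compare the density at the observation to its peak: a tiny ratio says
# this observation would be astronomically unlikely if everything were true.
ratio = p / p_peak   # equals exp(-z^2 / 2)
```

A ratio this small does not tell you *what* is wrong, only that something is: model, parameters, knobs, or the measurement itself.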
So if all these things are true, and you compute this value, and this value is very tiny, then it makes you think that it's unlikely that you would have observed that. And therefore, you might try to use that as an argument to say that something must be wrong. The model's wrong, the parameters are wrong, the knobs are wrong, my y values are wrong; it could be any of those things.

So these are often the most exciting papers to publish. You publish a paper, you take some model that a lot of people believe, and you tell them they're full of baloney, it's completely wrong: my great experiment shows you are completely wrong. And so you'll see a lot of these in Nature. I should warn you, a lot of those get retracted later; there's a very high retraction rate in Nature. Because they want to publish papers like that, that show that the common view is incorrect, and sometimes it's true. But oftentimes the common view is actually correct, and there's something wrong with the experiment, or the interpretation, or how they computed this equation, or whatever.
And so it turns out the common view is perfectly fine, and it's just that the foolish authors went off on a tangent. And then six months later they have to publish a retraction: by the way, sorry, the paper was completely wrong. And so you see a lot of that. So that's a second kind of thing. And we'll talk more about that a little bit later, too.

And then another thing is that I'll relax my assumptions. So I'll say, well, I'm sure that the model is true, and I'm sure that my knob settings are perfect, and I know what they are. But I'm not really sure about all the parameters. And therefore I want to use the experiment to try to refine the parameter values. So I'm trying to take the y's that I measure and somehow infer something about the thetas. And this is a very common thing to do. So in my group, we've tried to measure the rate coefficient for a reaction. We believe there is a value of that theta, and in fact, we probably have an estimate of what it is. But we're not sure of the exact number, and we'd like to do an experiment to refine the number and get it more accurately determined.
So that's another useful thing to do. And this leads into two somewhat different points of view about this. One you've probably done already, called least-squares fitting. That's one view. And the other is this Bayesian view that I'll tell you about next. So there's sort of A and B: one that I'll call Bayesian, and one I'll call least squares. They're sort of related to each other, but not exactly the same conceptually. So I'll try to explain that.

So the Bayesian view is probabilistic, so it's actually pretty straightforward to write down. Remember that we wrote that the probability of A and B is equal to the probability of A times the probability of B given A, and it's also equal to the probability of B times the probability of A given B. And what we have here is one of these conditional probabilities: if the thetas have a certain value, this is a certain probability. So I should be able to use that formula somehow.
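As a quick sanity check, that product rule can be verified on a toy discrete joint distribution (the numbers are made up):

```python
# Toy check of the product rule P(A and B) = P(A) P(B|A) = P(B) P(A|B),
# using an invented joint distribution over two binary events.
joint = {
    (True, True): 0.10,
    (True, False): 0.30,
    (False, True): 0.15,
    (False, False): 0.45,
}

p_a = sum(p for (a, b), p in joint.items() if a)   # P(A) = 0.40
p_b = sum(p for (a, b), p in joint.items() if b)   # P(B) = 0.25
p_b_given_a = joint[(True, True)] / p_a            # P(B|A)
p_a_given_b = joint[(True, True)] / p_b            # P(A|B)

# Both factorizations recover the same joint probability P(A and B):
lhs = p_a * p_b_given_a
rhs = p_b * p_a_given_b

# Rearranging one against the other gives Bayes' theorem:
# P(A|B) = P(A) P(B|A) / P(B).
bayes = p_a * p_b_given_a / p_b
```

The rearrangement in the last line is exactly the algebra done next on the board, with theta and y in place of A and B.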
So I can write down that the probability of measuring y given theta is equal to the probability of y, times the probability of theta given y, divided by the probability of theta. So I just took this formula and plugged in y's and thetas instead of A's and B's. So I said these two are equal to each other, and rearranged it so I can rewrite it. This is the way we have it in here: the probability of measuring y given theta. Let's flip it around. So the probability of theta, given that we measured y, is equal to the probability of theta, times the probability of observing y if theta were true, divided by the probability of y. Terrible handwriting there. That's just algebra.

So this is what we want to know. We want to know: what's the probability distribution of the parameter values theta? Because some of them are uncertain. Now, before we started the experiment, we had some idea of what the ranges were for all the parameter values. Like, I'm trying to measure a rate coefficient.
I know from experience with other similar reactions, from a quantum chemistry calculation, from some indirect evidence, from some other more complicated experiment, I have some idea that this rate coefficient has to be in a certain range. Now, it could be pretty uncertain. It might be five orders of magnitude uncertain. But I know it's not less than zero. I know it can't be faster than the diffusion limit, how fast things can come together. So for sure I know some range, and oftentimes I know a much narrower range than that. So I have some information about these parameter values before I even start. Some of the parameters of the model I know perfectly, or pretty well. So maybe there's a Planck's constant, or the heat of formation of one of my chemicals, or something like that that shows up in the numbers, and I might know that parameter pretty accurately. Whereas the particular rate coefficient I care about is the thing I really don't know very well. So some of these have tight probability distributions ahead of time, and some of them have loose ones. And this thing has a name.
It's called the prior. And it's our prior information, before we did the experiment. And this one, after we've done the experiment, we're going to change. So we're going to say: previously, people thought that the parameters all lay in these certain ranges. And now I'm going to get a tighter range, because I have some additional experimental information. So this is called the posterior. Prior means before, posterior means after. So this is what I know about the parameter values before and after the experiment. And this is the formula that I have over there: it's the probability that, if the thetas had a certain value, I would have observed what I saw. Yeah?

AUDIENCE: Which one refers to which?

WILLIAM GREEN: Sorry, this is the prior, this is the posterior.

And those of you who are paying attention to notation realize I'm not doing this very nicely. Because these are continuous variables, and I'm writing capital P's, and they should not be capital P's; they should be probability density functions instead. So let's rewrite it nicely.
So the probability density of theta given y is equal to the probability density of theta initially, times the probability density of y given theta, [INAUDIBLE] density, divided by the probability density of y. All right? And what I just basically did: this is the correct equation; the previous one was this all multiplied by d theta and dy, and it shouldn't be done that way. So is this OK?

Now, this is the prior information I have about the parameter values. I know that they have to fall into some ranges. And really all I'm doing is correcting that information; I'm improving the information to tighten the distribution. So initially, I know that my rate constant, here's my rate constant, I know that it's got to be greater than zero, and I don't think it's really down there at zero anyway. I think it's somewhere in here. I really don't know much. And I really don't think it's all the way up at the diffusion limit, and there's no way it's higher than the diffusion limit. So that's my initial information that I have about the probability distribution of k.
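That kind of prior, zero below zero and above the diffusion limit with a broad hump somewhere in between, can be sketched numerically. Every number below (the ceiling, the best guess, the width) is invented for illustration:

```python
import math

# Hypothetical prior for a rate coefficient k: zero outside (0, K_DIFFUSION],
# a broad hump in between. All numbers are invented for illustration.
K_DIFFUSION = 1e10      # assumed diffusion-limited ceiling
K_GUESS = 1e6           # assumed prior best estimate
LOG_WIDTH = 2.0         # roughly two orders of magnitude of uncertainty

def prior_density(k):
    """Unnormalized prior: a Gaussian hump in log10(k), truncated to the
    physically allowed range (0, K_DIFFUSION]."""
    if k <= 0.0 or k > K_DIFFUSION:
        return 0.0
    z = (math.log10(k) - math.log10(K_GUESS)) / LOG_WIDTH
    return math.exp(-0.5 * z * z)

# Normalize on a grid of k values from 1 to 1e10 so the weights sum to one.
ks = [10 ** (0.01 * i) for i in range(0, 1001)]
weights = [prior_density(k) for k in ks]
total = sum(weights)
prior = [w / total for w in weights]
```

The exact shape is a judgment call; the point is only that even before the experiment, the prior rules out negative values and anything above the physical limit.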
So it's the rate coefficient I want to know, and I know it's bigger than zero, and I know it's less than infinity. And actually I know there's some physical limit; it can't be higher than something or other. And you can do this for any problem, right? I give you any parameter, and you should be able to tell me something about it. You might be uncertain by 20 orders of magnitude, but at least you have an error bar of some width. It can't be anything, right? A lot of parameters have to be positive, for example. You know that. And you usually know something. You might not think you know anything, but you do; you actually do know something before you start. So this is the P of theta to start with.

And after I've done the experiment, hopefully I'm going to know more about it. I might know that this quantity here is going to be like a Gaussian distribution. It might have a kind of goofball dependence on theta. I should comment on that. Notice how theta appears inside f. So theta's up in the exponent.
436 00:24:44,680 --> 00:24:49,520 It's sort of inside a Gaussian, but it's like processed by F, 437 00:24:49,520 --> 00:24:53,240 and so the observable might have a pretty goofball dependence 438 00:24:53,240 --> 00:24:55,240 on this rate coefficient. 439 00:24:55,240 --> 00:24:58,640 So this thing could be some weird thing. 440 00:24:58,640 --> 00:25:03,549 But for sure, when I change theta so this changes a lot, 441 00:25:03,549 --> 00:25:05,340 it's going to make a pretty big difference. 442 00:25:05,340 --> 00:25:08,700 Because it's up inside the exponent of a Gaussian, 443 00:25:08,700 --> 00:25:11,940 so it's going to drop off a lot somewhere. 444 00:25:11,940 --> 00:25:15,370 So I should get something that looks something like this 445 00:25:15,370 --> 00:25:18,210 maybe for my experiment. 446 00:25:18,210 --> 00:25:24,060 So this one is P of k initially, the prior. 447 00:25:24,060 --> 00:25:28,010 This one is P of y given k. 448 00:25:31,910 --> 00:25:34,010 And what this equation says is I want 449 00:25:34,010 --> 00:25:37,330 to multiply those two together. 450 00:25:37,330 --> 00:25:39,240 And so I'm going to multiply this times this, 451 00:25:39,240 --> 00:25:50,180 and I'm going to get some new thing that's 452 00:25:50,180 --> 00:25:55,432 something like that when I multiply this times that. 453 00:25:55,432 --> 00:25:58,340 Is that OK? 454 00:25:58,340 --> 00:26:03,090 And so that's my new numerator of this equation. 455 00:26:03,090 --> 00:26:05,226 Now this denominator doesn't make too much sense. 456 00:26:05,226 --> 00:26:06,600 This says, what's the probability 457 00:26:06,600 --> 00:26:13,035 that I measured the mean I measured, given nothing? 458 00:26:13,035 --> 00:26:14,910 So this is sort of like the prior probability 459 00:26:14,910 --> 00:26:16,750 that I would have measured it or something. 460 00:26:16,750 --> 00:26:17,833 I don't know what this is. 461 00:26:17,833 --> 00:26:22,100 So instead what people do is they say, forget this.
462 00:26:22,100 --> 00:26:27,234 But instead, let's multiply this by a constant that's 463 00:26:27,234 --> 00:26:29,400 going to normalize it to make it a probability density, 464 00:26:29,400 --> 00:26:31,370 so that it integrates to one. 465 00:26:34,370 --> 00:26:36,950 So that's the way Bayes' theorem is used. 466 00:26:36,950 --> 00:26:38,540 This is called Bayesian analysis. 467 00:26:42,140 --> 00:26:43,850 And so what it's telling you is how 468 00:26:43,850 --> 00:26:49,220 to take your experimental information as expressed 469 00:26:49,220 --> 00:26:59,010 in this formula and use all your previous information 470 00:26:59,010 --> 00:27:03,030 about the parameters, put them all together, 471 00:27:03,030 --> 00:27:06,490 and now we have cumulative information about everything. 472 00:27:06,490 --> 00:27:09,760 So we have some parameters that came into our problem, 473 00:27:09,760 --> 00:27:12,370 into my experiment, but from previous work, 474 00:27:12,370 --> 00:27:14,630 I also knew something about those parameters. 475 00:27:14,630 --> 00:27:16,330 Now I put it all together and I get 476 00:27:16,330 --> 00:27:19,180 a new probability distribution 477 00:27:19,180 --> 00:27:21,020 of those parameters. 478 00:27:21,020 --> 00:27:23,530 And if my experiment was really good, 479 00:27:23,530 --> 00:27:27,237 it would make this really tight [WHOOSHING SOUND]. 480 00:27:27,237 --> 00:27:29,070 And then when I multiply these two together, 481 00:27:29,070 --> 00:27:32,470 it's going to make this really sharp, 482 00:27:32,470 --> 00:27:35,380 and we have a really good value of k. 483 00:27:35,380 --> 00:27:37,250 So that's like the ideal case: if I 484 00:27:37,250 --> 00:27:39,440 have a really great, well-designed experiment 485 00:27:39,440 --> 00:27:44,090 executed perfectly with great precision, then I can do this.
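[EDITOR'S NOTE: The update just described — multiply the prior by the likelihood of the measured mean, then normalize by a constant so the density integrates to one — can be sketched numerically on a grid. This is an illustrative toy, not the example on the board: the model f(k), the prior width, and the measurement numbers are all invented.]

```python
import numpy as np

# Grid over the allowed range of the rate coefficient k (invented numbers).
k = np.linspace(0.01, 10.0, 2000)
dk = k[1] - k[0]

# Broad prior: k is positive and probably a few units, but very uncertain.
prior = np.exp(-0.5 * ((k - 4.0) / 3.0) ** 2)
prior /= prior.sum() * dk  # normalize to a probability density

def f(k):
    """Toy model prediction of the observable as a function of k."""
    return np.sqrt(k)

# Measured mean of the repeats and its standard error (invented numbers).
ybar, sigma_mean = 1.55, 0.05

# Likelihood of the measured mean: Gaussian about the model prediction.
likelihood = np.exp(-0.5 * ((ybar - f(k)) / sigma_mean) ** 2)

# Bayes: posterior is proportional to prior * likelihood; the denominator
# is replaced by whatever constant makes the result integrate to one.
posterior = prior * likelihood
posterior /= posterior.sum() * dk

def grid_std(p):
    """Standard deviation of a density tabulated on the k grid."""
    mean = (k * p).sum() * dk
    return np.sqrt(((k - mean) ** 2 * p).sum() * dk)
```

Because the likelihood here is much narrower than the prior, the posterior ends up far tighter than the prior, which is the "really sharp" case described above.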
486 00:27:44,090 --> 00:27:46,214 More generally, when the experiment isn't that ideal, 487 00:27:46,214 --> 00:27:47,630 I get some distribution like this. 488 00:27:50,210 --> 00:27:52,830 I still learn something compared to what I had before, 489 00:27:52,830 --> 00:27:54,440 but it might not be much. 490 00:27:54,440 --> 00:27:57,410 So now I can end up with some distribution that's 491 00:27:57,410 --> 00:27:59,630 a little tighter than before. 492 00:28:04,130 --> 00:28:07,190 So is this OK so far? 493 00:28:07,190 --> 00:28:11,040 All right, now this is super simple. 494 00:28:11,040 --> 00:28:13,200 I didn't have to solve anything; all 495 00:28:13,200 --> 00:28:17,120 I had to do was multiply two distributions together. 496 00:28:17,120 --> 00:28:20,810 So in some respects, this is what you should always do. 497 00:28:20,810 --> 00:28:22,962 All you do is you take your experiment, 498 00:28:22,962 --> 00:28:25,170 you multiply the probability distribution that corresponds 499 00:28:25,170 --> 00:28:27,334 to your experiment times the prior, 500 00:28:27,334 --> 00:28:29,750 and you get some posterior, and that's your new information 501 00:28:29,750 --> 00:28:31,340 about the distribution. 502 00:28:31,340 --> 00:28:33,870 And if I have a distribution like this, 503 00:28:33,870 --> 00:28:37,370 suppose this is my new distribution here, 504 00:28:37,370 --> 00:28:41,240 I can still get its central value, that's my mean value of k. 505 00:28:41,240 --> 00:28:44,010 I can get an estimate of the range of k. 506 00:28:44,010 --> 00:28:45,780 So I end up with a k plus or minus 507 00:28:45,780 --> 00:28:50,870 dk maybe, from just looking at the plot. 508 00:28:50,870 --> 00:28:53,670 In fact, I never even have to evaluate what this constant is 509 00:28:53,670 --> 00:28:54,540 in order to do this.
510 00:28:54,540 --> 00:28:57,830 I can just go look at the plot, see where the peak is, 511 00:28:57,830 --> 00:29:01,030 figure out the width, and I can report that now, 512 00:29:01,030 --> 00:29:03,460 because of my experiment, k plus or minus 513 00:29:03,460 --> 00:29:05,590 dk is more precisely determined than it was before. 514 00:29:09,150 --> 00:29:12,840 Now, a practical challenge with this 515 00:29:12,840 --> 00:29:15,980 is that theta is usually a lot of parameters. 516 00:29:15,980 --> 00:29:19,700 And I only drew the plot here in one dimension, 517 00:29:19,700 --> 00:29:23,280 but really it's a multi-dimensional plot. 518 00:29:23,280 --> 00:29:27,205 So really, what would it look like? Suppose I had two parameters. 519 00:29:27,205 --> 00:29:29,580 I had my k I care about, and I have some other parameter, 520 00:29:29,580 --> 00:29:34,870 theta 2, that also shows up in my model. 521 00:29:34,870 --> 00:29:39,170 And say, before I started, I knew 522 00:29:39,170 --> 00:29:43,570 theta 2 fell in this kind of range, 523 00:29:43,570 --> 00:29:47,950 and I knew k fell in this kind of range. 524 00:29:47,950 --> 00:29:51,880 So really before I started, if I think about what it looks like, 525 00:29:51,880 --> 00:29:57,440 I really had sort of a blobby rectangular contour 526 00:29:57,440 --> 00:30:02,460 plot, where I think it's more likely that the k 527 00:30:02,460 --> 00:30:05,506 value and the theta 2 value are somewhere in this range. 528 00:30:05,506 --> 00:30:08,130 And the most likely one is maybe somewhere in the middle there. 529 00:30:08,130 --> 00:30:09,780 But I really didn't know much. 530 00:30:09,780 --> 00:30:12,720 So it could be anywhere in this whole blob. 531 00:30:12,720 --> 00:30:15,960 Now, when I do the experiment, the experimental value 532 00:30:15,960 --> 00:30:19,240 depends on both k and theta 2.
533 00:30:19,240 --> 00:30:22,510 And commonly what'll happen is that the distribution 534 00:30:22,510 --> 00:30:25,190 from the experiment-- 535 00:30:25,190 --> 00:30:28,718 I need colored chalk here. 536 00:30:28,718 --> 00:30:31,100 Let's get rid of these guys. 537 00:30:31,100 --> 00:30:33,932 So this is my probability distribution, there's my prior. 538 00:30:33,932 --> 00:30:37,850 If I do the experiment, maybe I'll have something like this. 539 00:30:37,850 --> 00:30:40,510 That the experiment says that the guys 540 00:30:40,510 --> 00:30:47,094 have to be somewhere in a contour plot like this. 541 00:30:47,094 --> 00:30:48,510 Because I can get pretty good fits 542 00:30:48,510 --> 00:30:50,080 of the data with different values of k 543 00:30:50,080 --> 00:30:52,038 as long as I compensate with the value of theta 2. 544 00:30:54,780 --> 00:30:58,370 Now I multiply these two-dimensional functions. 545 00:30:58,370 --> 00:31:02,410 The original is a blob function, and this 546 00:31:02,410 --> 00:31:05,020 is a stretched-out blob. 547 00:31:05,020 --> 00:31:08,140 And if I multiply a stretched-out blob times a fat blob, 548 00:31:08,140 --> 00:31:09,970 I get some stretched-out blob that 549 00:31:09,970 --> 00:31:12,170 looks something like the intersection of these guys. 550 00:31:12,170 --> 00:31:14,587 And so I end up with some kind of blob like that. 551 00:31:14,587 --> 00:31:15,670 I'll draw it really thick. 552 00:31:15,670 --> 00:31:21,940 So this is my posterior, some kind of blob like this. 553 00:31:21,940 --> 00:31:25,060 So now I know a little bit more about these two parameters 554 00:31:25,060 --> 00:31:29,370 than I did before I started because of my experiment. 555 00:31:29,370 --> 00:31:30,417 Is this OK? 556 00:31:30,417 --> 00:31:32,500 I really can't say I know what the real value of k 557 00:31:32,500 --> 00:31:34,005 is, or the value of theta 2.
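[EDITOR'S NOTE: If both the fat prior blob and the stretched-out likelihood ridge are roughly Gaussian, the multiplication just drawn has a closed form: the precision matrices (inverse covariances) simply add. A small sketch with invented 2x2 covariances, just to show the posterior blob is tighter than either input:]

```python
import numpy as np

# Invented prior over (k, theta2): a fat, round blob.
C_prior = np.array([[4.0, 0.0],
                    [0.0, 4.0]])

# Invented likelihood covariance: a long, thin, tilted ridge, because the
# experiment only pins down a combination of k and theta2, not each one.
C_exp = np.array([[5.0, 4.9],
                  [4.9, 5.0]])

# Product of two Gaussians is a Gaussian whose precision is the sum:
#   posterior precision = prior precision + likelihood precision
precision_post = np.linalg.inv(C_prior) + np.linalg.inv(C_exp)
C_post = np.linalg.inv(precision_post)
```

The diagonal of C_post comes out smaller than the diagonal of either C_prior or C_exp: the intersection of the two blobs is narrower than both, exactly as in the chalk picture.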
558 00:31:34,005 --> 00:31:36,439 But I know that combinations of k 559 00:31:36,439 --> 00:31:38,730 and theta 2 that are sort of in this range, all of them 560 00:31:38,730 --> 00:31:40,438 will give me pretty good fits to my data, 561 00:31:40,438 --> 00:31:43,500 and also be consistent with all the previous information I have 562 00:31:43,500 --> 00:31:46,500 about those parameter values. 563 00:31:46,500 --> 00:31:48,880 Is that all right? 564 00:31:48,880 --> 00:31:51,394 Now, I drew it with two parameters. 565 00:31:51,394 --> 00:31:53,560 In a lot of models we have, we have five parameters, 566 00:31:53,560 --> 00:31:55,643 six parameters, seven parameters, nine parameters, 567 00:31:55,643 --> 00:31:56,540 14 parameters. 568 00:31:56,540 --> 00:31:58,360 We have a lot of parameters. 569 00:31:58,360 --> 00:32:01,660 And so then we try to make this plot, even how 570 00:32:01,660 --> 00:32:05,784 to display the plot is going to be a little problematic. 571 00:32:05,784 --> 00:32:06,700 But it's there, right? 572 00:32:06,700 --> 00:32:10,630 And somehow, we still narrowed down the hypervolume 573 00:32:10,630 --> 00:32:12,970 in the parameter space from whatever 574 00:32:12,970 --> 00:32:15,580 it was to begin with to now we know 575 00:32:15,580 --> 00:32:16,850 something a little bit better. 576 00:32:16,850 --> 00:32:19,389 We have a narrower range of the parameters that 577 00:32:19,389 --> 00:32:21,680 would be consistent with all the information available, 578 00:32:21,680 --> 00:32:24,400 including my new experiment. 579 00:32:24,400 --> 00:32:26,560 And then the next guy does his experiment, 580 00:32:26,560 --> 00:32:29,350 and he does an experiment that shows that these guys have 581 00:32:29,350 --> 00:32:32,320 to be somewhere in this range in order to be 582 00:32:32,320 --> 00:32:34,570 consistent with his experiment. 
583 00:32:34,570 --> 00:32:37,660 And so now I can narrow down the range 584 00:32:37,660 --> 00:32:39,490 to be something like that. 585 00:32:39,490 --> 00:32:41,240 And the next person does their experiment, 586 00:32:41,240 --> 00:32:42,540 and they get something else, and something else, 587 00:32:42,540 --> 00:32:43,360 and something else. 588 00:32:43,360 --> 00:32:46,030 And eventually by 2050, we have a pretty nice determination 589 00:32:46,030 --> 00:32:48,970 of the parameter values. 590 00:32:48,970 --> 00:32:51,760 So that's the advance of science, 591 00:32:51,760 --> 00:32:55,212 as drawn in chalk by Professor Green at the board. 592 00:32:59,890 --> 00:33:02,020 So this is a very important way to think 593 00:33:02,020 --> 00:33:04,740 about it: what you're doing when you do experiments 594 00:33:04,740 --> 00:33:09,730 is you're generally restricting the range of parameter space 595 00:33:09,730 --> 00:33:12,030 that's still consistent with everything. 596 00:33:12,030 --> 00:33:13,870 And when we say consistent, we mean 597 00:33:13,870 --> 00:33:16,286 that the probability that you would have observed what you 598 00:33:16,286 --> 00:33:18,174 did observe is reasonably high. 599 00:33:18,174 --> 00:33:20,590 We'll still have to come back to quantitatively figure out 600 00:33:20,590 --> 00:33:21,796 what reasonably high means. 601 00:33:25,600 --> 00:33:29,440 Now, when you did this before when you were kids, 602 00:33:29,440 --> 00:33:31,450 nobody mentioned the word Bayes, or Bayesian, 603 00:33:31,450 --> 00:33:34,630 or conditional probabilities, right? 604 00:33:34,630 --> 00:33:38,077 So they just said, oh, just do a least squares fit. 605 00:33:38,077 --> 00:33:39,410 How many of you did that before? 606 00:33:42,930 --> 00:33:45,660 So somebody told you before, forget this stuff, 607 00:33:45,660 --> 00:33:47,790 we're never even going to mention this stuff.
608 00:33:47,790 --> 00:33:49,581 We're just going to do a least squares fit. 609 00:33:56,300 --> 00:33:59,060 Now, where did the least squares fit idea come from? 610 00:33:59,060 --> 00:34:01,670 It came from looking at this formula and saying, 611 00:34:01,670 --> 00:34:06,020 you know, these are the deviations between the experiment 612 00:34:06,020 --> 00:34:11,719 and the model prediction, and I weight them somehow, 613 00:34:11,719 --> 00:34:13,061 and I take the square. 614 00:34:13,061 --> 00:34:14,810 And that's the thing I want to make small. 615 00:34:14,810 --> 00:34:20,239 If I have a high probability that what I observed really 616 00:34:20,239 --> 00:34:23,659 happened, or the probability I'm going to observe this, 617 00:34:23,659 --> 00:34:25,550 it's got to be that these guys have to be 618 00:34:25,550 --> 00:34:26,580 reasonably close to each other. 619 00:34:26,580 --> 00:34:27,610 If they're really different, 620 00:34:27,610 --> 00:34:29,485 it's going to be very small, because it's 621 00:34:29,485 --> 00:34:30,770 inside an exponential. 622 00:34:30,770 --> 00:34:32,659 And if those guys are really different, 623 00:34:32,659 --> 00:34:35,170 and the squared thing is really large, 624 00:34:35,170 --> 00:34:37,210 then the probability is incredibly small 625 00:34:37,210 --> 00:34:39,290 that I would have observed that. 626 00:34:39,290 --> 00:34:42,679 So we think that this thing should be small. 627 00:34:42,679 --> 00:34:47,600 And in fact, if I want to get the very best fit I can get, 628 00:34:47,600 --> 00:34:50,840 which means the probability was the highest of what 629 00:34:50,840 --> 00:34:54,460 I observed in the real observation or something, 630 00:34:54,460 --> 00:34:56,492 then if I'm free to adjust one of these thetas, 631 00:34:56,492 --> 00:34:57,950 I can adjust the theta to try to make 632 00:34:57,950 --> 00:35:01,057 this thing equal to zero, or as small as I can.
633 00:35:01,057 --> 00:35:03,390 So that's where the concept of least squares comes from. 634 00:35:07,290 --> 00:35:11,360 Now, when you're doing least squares, 635 00:35:11,360 --> 00:35:14,236 you almost always have multiple parameters, 636 00:35:14,236 --> 00:35:16,610 and therefore you're going to have to have multiple data. 637 00:35:16,610 --> 00:35:20,234 And they can't just be a repeat of one number. 638 00:35:20,234 --> 00:35:22,400 That can't be all your data; it's not sufficient to determine 639 00:35:22,400 --> 00:35:23,870 the parameters. 640 00:35:23,870 --> 00:35:25,710 So normally when you do an experiment, 641 00:35:25,710 --> 00:35:27,700 you have to change the knobs. 642 00:35:27,700 --> 00:35:30,660 You have to make measurements at a couple of different conditions. 643 00:35:30,660 --> 00:35:31,970 Like for example, kinetics. 644 00:35:31,970 --> 00:35:35,120 You often want the Arrhenius A factor and the Ea. 645 00:35:35,120 --> 00:35:37,092 And so I've got to run the experiment 646 00:35:37,092 --> 00:35:39,050 at more than one temperature or I'm never going 647 00:35:39,050 --> 00:35:39,960 to be able to figure that out. 648 00:35:39,960 --> 00:35:42,080 So I have to change the temperature in my reactor. 649 00:35:42,080 --> 00:35:43,829 Make some measurements at one temperature, 650 00:35:43,829 --> 00:35:46,310 and make some measurements at a different temperature. 651 00:35:46,310 --> 00:35:49,040 And for almost everything in life that you want to measure, 652 00:35:49,040 --> 00:35:50,540 you're going to have to do this. 653 00:35:50,540 --> 00:35:52,940 You vary the concentration of your enzyme 654 00:35:52,940 --> 00:35:55,796 if you want to see how the enzyme kinetics depend 655 00:35:55,796 --> 00:35:57,170 on something. You can't just keep 656 00:35:57,170 --> 00:35:59,630 running exactly the same condition over and over.
657 00:35:59,630 --> 00:36:01,657 You'll get that number really precise, 658 00:36:01,657 --> 00:36:03,740 but it's not enough information to really pin down 659 00:36:03,740 --> 00:36:05,280 the parameters in your model. 660 00:36:05,280 --> 00:36:08,870 So you're going to have to run several different experiments 661 00:36:08,870 --> 00:36:10,640 with different knob settings. 662 00:36:10,640 --> 00:36:13,920 Also, normally we don't just measure one quantity, one 663 00:36:13,920 --> 00:36:15,170 observable, in each experiment. 664 00:36:15,170 --> 00:36:17,790 We usually try to measure as many things as we can. 665 00:36:17,790 --> 00:36:19,527 So we actually have several observables 666 00:36:19,527 --> 00:36:21,860 at each knob setting, and we have several knob settings, 667 00:36:21,860 --> 00:36:23,152 so we have quite a lot of data. 668 00:36:23,152 --> 00:36:25,485 And each one of those is repeated a whole bunch of times 669 00:36:25,485 --> 00:36:28,400 so that we're confident that we can use this Gaussian formula. 670 00:36:28,400 --> 00:36:39,120 And so what we really have is the i-th observable measured 671 00:36:39,120 --> 00:36:47,150 at the l-th knob position. 672 00:36:47,150 --> 00:36:48,740 Well, I'm sorry, l's not good either, 673 00:36:48,740 --> 00:36:51,684 it's used in your notes for something else. 674 00:36:51,684 --> 00:36:54,460 M, there you go. 675 00:36:54,460 --> 00:36:56,000 The m-th knob position. 676 00:36:56,000 --> 00:36:59,380 Now, normally you have several knobs, so that's a vector. 677 00:36:59,380 --> 00:37:03,720 And we have a lot of observables we can measure at each position. 678 00:37:03,720 --> 00:37:08,340 So this thing is a measurement. 679 00:37:08,340 --> 00:37:12,650 And we repeated this multiple times so I can get the average. 680 00:37:12,650 --> 00:37:20,960 And we're also going to have a corresponding sigma i m, which 681 00:37:20,960 --> 00:37:25,230 is the standard error of the mean.
682 00:37:25,230 --> 00:37:27,350 So that's the standard deviation divided 683 00:37:27,350 --> 00:37:30,020 by the square root of the number of repeats 684 00:37:30,020 --> 00:37:33,560 for that particular experiment and that particular observable. 685 00:37:33,560 --> 00:37:36,260 So this is your incoming data set, 686 00:37:36,260 --> 00:37:44,750 and you also have your model, which predicts y model; 687 00:37:44,750 --> 00:37:54,878 it predicts the observable i to be equal to f i of x m, theta. 688 00:37:59,559 --> 00:38:01,100 So if you have certain knob settings, 689 00:38:01,100 --> 00:38:04,500 like a certain temperature, and you have your parameter values, 690 00:38:04,500 --> 00:38:07,610 then you can calculate what the model thinks 691 00:38:07,610 --> 00:38:09,895 should be the observable value, and then you 692 00:38:09,895 --> 00:38:14,570 can actually measure it and measure its variance. 693 00:38:14,570 --> 00:38:15,960 So that's the normal situation. 694 00:38:15,960 --> 00:38:19,250 And now you want to figure out, are there some values 695 00:38:19,250 --> 00:38:24,789 of the theta that make the model and the data agree? 696 00:38:24,789 --> 00:38:26,580 And that's the least squares fitting thing. 697 00:38:26,580 --> 00:38:32,093 So we can define a new quantity, 698 00:38:32,093 --> 00:38:43,170 the residual vector epsilon, indexed by k, 699 00:38:43,170 --> 00:38:45,220 to be consistent with Joe Scott's notes. 700 00:38:45,220 --> 00:38:47,822 AUDIENCE: Is k the same as m? 701 00:38:47,822 --> 00:38:49,280 WILLIAM GREEN: M is the knob positions, 702 00:38:49,280 --> 00:38:50,823 I'll tell you what k is in a second. 703 00:39:00,000 --> 00:39:02,890 m, sorry. 704 00:39:02,890 --> 00:39:03,780 Too many indices. 705 00:39:17,390 --> 00:39:21,240 OK, so this is the residual between the model prediction and the data. 706 00:39:21,240 --> 00:39:25,500 And now-- oh man, I'm sorry, [INAUDIBLE].
707 00:39:25,500 --> 00:39:31,730 K is an index over i and m. 708 00:39:31,730 --> 00:39:34,127 So k is just going to list all the data you got. 709 00:39:34,127 --> 00:39:36,210 Some of the data came from the same knob settings, 710 00:39:36,210 --> 00:39:37,835 some came from different knob settings. 711 00:39:37,835 --> 00:39:38,372 Yeah? 712 00:39:38,372 --> 00:39:42,660 AUDIENCE: So is x the m the y model i [INAUDIBLE]? 713 00:39:42,660 --> 00:39:43,910 WILLIAM GREEN: Thank you, yes. 714 00:39:49,110 --> 00:39:57,480 y model i, I guess this is now k. 715 00:40:06,280 --> 00:40:08,005 And so k is one of these indices that 716 00:40:08,005 --> 00:40:10,630 carry-- you can bind two indices together and put them together 717 00:40:10,630 --> 00:40:12,940 just like you did in your PDE problems. 718 00:40:12,940 --> 00:40:13,440 All right. 719 00:40:17,020 --> 00:40:20,541 Now, I wrote down this sigma. 720 00:40:20,541 --> 00:40:22,540 But actually if you're measuring multiple things 721 00:40:22,540 --> 00:40:24,070 in the same experiment, you should 722 00:40:24,070 --> 00:40:26,500 expect them to be correlated. 723 00:40:26,500 --> 00:40:29,010 So really what we should worry about 724 00:40:29,010 --> 00:40:34,930 is C, the covariance matrix, that we defined last time. 725 00:40:34,930 --> 00:40:38,750 So you should also compute that thing. 726 00:40:38,750 --> 00:40:45,170 And so what you should expect is the probability density 727 00:40:45,170 --> 00:40:51,800 that we would measure any particular residuals 728 00:40:51,800 --> 00:40:53,900 if the model is true. 729 00:40:53,900 --> 00:40:56,842 And if we have these certain parameters, theta, 730 00:40:56,842 --> 00:41:02,715 this should be equal to 2 pi to the negative K over 2, 731 00:41:02,715 --> 00:41:10,200 times the determinant of C to the negative 1/2, times the exponential 732 00:41:10,200 --> 00:41:22,180 of negative 1/2 epsilon transpose C inverse epsilon.
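[EDITOR'S NOTE: The density just written down, (2 pi)^(-K/2) |C|^(-1/2) exp(-1/2 epsilon^T C^-1 epsilon), is easy to mis-implement; here is a sketch of a numerically careful version, working in log space and using a linear solve instead of forming C inverse, with a sanity check against independent 1-D Gaussians when C is diagonal.]

```python
import numpy as np

def log_gauss_residuals(eps, C):
    """Log of the multivariate Gaussian density of the residual vector eps
    with covariance C:
        -(K/2) log(2 pi) - (1/2) log|C| - (1/2) eps^T C^-1 eps
    """
    K = len(eps)
    sign, logdet = np.linalg.slogdet(C)  # stable log-determinant
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # eps^T C^-1 eps without ever forming C^-1 explicitly:
    quad = eps @ np.linalg.solve(C, eps)
    return -0.5 * (K * np.log(2.0 * np.pi) + logdet + quad)
```

With a diagonal C this must reduce to the product of K independent one-dimensional Gaussians, which is a convenient check on the signs and factors.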
733 00:41:22,180 --> 00:41:26,671 So this is the multi-measurement version of the same equation 734 00:41:26,671 --> 00:41:27,170 here. 735 00:41:31,420 --> 00:41:35,410 So this is the quantity that we think 736 00:41:35,410 --> 00:41:40,150 should be small if we have good parameter values 737 00:41:40,150 --> 00:41:42,019 and we did a good experiment. 738 00:41:42,019 --> 00:41:43,810 Actually, even when we did bad experiments, 739 00:41:43,810 --> 00:41:46,210 it still should be small if we have good parameter values. 740 00:41:48,990 --> 00:41:51,360 And that's because if we did a bad experiment, 741 00:41:51,360 --> 00:41:53,330 we'll have a high variance or something, 742 00:41:53,330 --> 00:41:55,780 and then the C's will give us weightings 743 00:41:55,780 --> 00:41:57,420 that will reflect that. 744 00:41:57,420 --> 00:41:58,580 Yeah? 745 00:41:58,580 --> 00:42:00,980 AUDIENCE: [INAUDIBLE] 746 00:42:00,980 --> 00:42:02,900 WILLIAM GREEN: Is that-- 747 00:42:02,900 --> 00:42:07,700 AUDIENCE: So you have the next [INAUDIBLE] K [INAUDIBLE]. 748 00:42:07,700 --> 00:42:13,410 WILLIAM GREEN: Oh I'm sorry, this is the capital K, 749 00:42:13,410 --> 00:42:17,810 this is the number of data points. 750 00:42:17,810 --> 00:42:24,748 So little k is equal to 1 to capital K. 751 00:42:24,748 --> 00:42:28,140 AUDIENCE: So does capital K count for both experiments? 752 00:42:28,140 --> 00:42:32,330 WILLIAM GREEN: It's the number of distinct data values 753 00:42:32,330 --> 00:42:34,740 after you've already averaged over the repeats. 754 00:42:34,740 --> 00:42:39,960 So you do m experiments, and at each experiment you measure capital 755 00:42:39,960 --> 00:42:42,270 I observables. 756 00:42:42,270 --> 00:42:45,860 So it's like m times I. 757 00:42:45,860 --> 00:42:50,010 If you measured everything in every experiment, K is equal to I times m. 758 00:42:59,060 --> 00:43:04,130 Now there are two ways that people approach this in the literature.
759 00:43:04,130 --> 00:43:05,720 The fancy way is you say, you know, 760 00:43:05,720 --> 00:43:09,490 this covariance matrix comes in in a pretty important way 761 00:43:09,490 --> 00:43:11,870 into this probability distribution function. 762 00:43:11,870 --> 00:43:14,270 And so maybe I need to worry a lot about whether I really 763 00:43:14,270 --> 00:43:16,500 know the covariance matrix. 764 00:43:16,500 --> 00:43:22,440 And my uncertainty in the mean drops pretty fast 765 00:43:22,440 --> 00:43:26,020 as I do averaging, but I'm not so confident 766 00:43:26,020 --> 00:43:29,350 that my error in the covariance matrix is small. 767 00:43:29,350 --> 00:43:31,980 So what people do sometimes is they'll 768 00:43:31,980 --> 00:43:40,940 try to vary both C and theta, and try to get a best fit where 769 00:43:40,940 --> 00:43:41,750 they're varying C. 770 00:43:41,750 --> 00:43:43,458 But then they have additional constraints 771 00:43:43,458 --> 00:43:46,190 on C: C has to satisfy the equations I gave last time 772 00:43:46,190 --> 00:43:49,384 about how you calculate the covariance matrix from data. 773 00:43:49,384 --> 00:43:51,050 And so I was saying, well, I want this C 774 00:43:51,050 --> 00:43:54,050 to satisfy these equations pretty well, 775 00:43:54,050 --> 00:44:02,930 but the true covariance of the system 776 00:44:02,930 --> 00:44:04,370 is not the same as what I actually 777 00:44:04,370 --> 00:44:08,460 measure by just measuring, say, five repeats of an experiment. 778 00:44:08,460 --> 00:44:10,722 And so I might want to vary the C. 779 00:44:10,722 --> 00:44:14,120 If you try to vary the C, it turns out to be kind of complicated math, 780 00:44:14,120 --> 00:44:15,770 so not many people do it. 781 00:44:15,770 --> 00:44:17,646 Even though conceptually it makes some sense, 782 00:44:17,646 --> 00:44:19,686 you should worry about the fact that you're not really 783 00:44:19,686 --> 00:44:20,810 sure about the covariance.
784 00:44:20,810 --> 00:44:22,950 So what a lot of people do is they say, 785 00:44:22,950 --> 00:44:25,460 let's just use the C that's computed from the formulas 786 00:44:25,460 --> 00:44:27,180 I gave you last time, experimentally. 787 00:44:27,180 --> 00:44:34,190 So just say, let's just take C experimental, and put it in here. 788 00:44:34,190 --> 00:44:36,600 And now this is a constant. 789 00:44:36,600 --> 00:44:40,440 And now the only thing that varies in this problem 790 00:44:40,440 --> 00:44:43,290 is the thetas, which come in through the epsilons. 791 00:44:43,290 --> 00:44:47,290 Because the epsilons depend on theta. 792 00:44:47,290 --> 00:44:49,980 And so in that case, I can just try 793 00:44:49,980 --> 00:44:53,390 to maximize this probability. 794 00:44:53,390 --> 00:44:55,990 And what that happens to do is to minimize 795 00:44:55,990 --> 00:44:59,330 this quantity in the exponent. 796 00:44:59,330 --> 00:45:02,650 And so all I need to do is say, for example, 797 00:45:02,650 --> 00:45:11,500 theta best is equal to the arg min over theta of epsilon 798 00:45:11,500 --> 00:45:16,737 of theta transpose, C inverse, epsilon of theta. 799 00:45:22,550 --> 00:45:24,550 And so this is the least squares fitting problem 800 00:45:24,550 --> 00:45:26,320 that you guys have probably done before. 801 00:45:26,320 --> 00:45:27,695 And probably what you did was you 802 00:45:27,695 --> 00:45:30,580 assumed you had perfectly uncorrelated data, 803 00:45:30,580 --> 00:45:32,170 and all your errors were the same. 804 00:45:32,170 --> 00:45:35,380 And so C was the identity matrix, and you took it out. 805 00:45:35,380 --> 00:45:36,521 Probably did that before? 806 00:45:36,521 --> 00:45:39,290 Yeah, OK. 807 00:45:39,290 --> 00:45:42,710 That's pretty dangerous to do, I'd say.
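[EDITOR'S NOTE: For a model that happens to be linear in the parameters, y_model = X theta, this arg min has a closed-form solution: the generalized least squares normal equations. The numbers below are invented; with noiseless synthetic data the fit should recover the true theta exactly.]

```python
import numpy as np

# Hypothetical linear model: y_model = X @ theta (4 data points, 2 parameters).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
theta_true = np.array([2.0, -0.5])

# Invented experimental covariance with correlated neighboring measurements
# (this matrix is positive definite, as a covariance must be).
C = 0.01 * np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.5, 1.0, 0.5, 0.0],
                     [0.0, 0.5, 1.0, 0.5],
                     [0.0, 0.0, 0.5, 1.0]])

y = X @ theta_true  # noiseless synthetic data, for the sanity check

# Minimizing eps^T C^-1 eps with eps = y - X @ theta: setting the gradient
# to zero gives the normal equations (X^T C^-1 X) theta_best = X^T C^-1 y.
Cinv = np.linalg.inv(C)
theta_best = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)
```

Setting C to the identity here collapses this to the ordinary least squares fit discussed above, which is exactly the "dangerous" simplification: it throws away the relative weighting and the correlations.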
808 00:45:42,710 --> 00:45:45,400 What people do a lot, which is a little bit less dangerous, 809 00:45:45,400 --> 00:45:47,480 is at least say, well, you know, when 810 00:45:47,480 --> 00:45:53,540 I measure the concentration of species x by GC, 811 00:45:53,540 --> 00:45:57,020 I have an error bar of plus or minus 5%. 812 00:45:57,020 --> 00:46:00,080 And when I measure the temperature 813 00:46:00,080 --> 00:46:02,525 with my thermocouple, I have an error bar 814 00:46:02,525 --> 00:46:04,790 of plus or minus 2 degrees. 815 00:46:04,790 --> 00:46:08,360 And so the variances of these guys should be a lot different, 816 00:46:08,360 --> 00:46:11,180 temperature and GC signal. 817 00:46:11,180 --> 00:46:14,895 And therefore I definitely need to weight my deviations somehow. 818 00:46:14,895 --> 00:46:16,520 And really what you do is you keep 819 00:46:16,520 --> 00:46:19,551 the diagonal entries of this. 820 00:46:19,551 --> 00:46:20,300 That's often done. 821 00:46:20,300 --> 00:46:23,570 And we just forget the fact that they might be covariant. 822 00:46:23,570 --> 00:46:25,040 But if you've done the experiments, 823 00:46:25,040 --> 00:46:26,310 you actually do have enough information 824 00:46:26,310 --> 00:46:27,840 to compute this thing anyway, so you might as well just 825 00:46:27,840 --> 00:46:29,068 use the experimental value. 826 00:46:33,860 --> 00:46:35,340 So this is the least squares thing. 827 00:46:35,340 --> 00:46:38,502 And let's think, what the heck is this doing? 828 00:46:38,502 --> 00:46:40,960 We're saying, all of a sudden we grabbed all the parameters 829 00:46:40,960 --> 00:46:42,626 in the model, which might include things 830 00:46:42,626 --> 00:46:45,940 like the molecular weight of hydrogen or something. 831 00:46:45,940 --> 00:46:49,120 And we can find the very best values 832 00:46:49,120 --> 00:46:52,445 that would make our model match the data as best as 833 00:46:52,445 --> 00:46:52,945 possible.
834 00:46:55,630 --> 00:46:57,820 And in some sense, that's great, we 835 00:46:57,820 --> 00:47:00,310 know the best values of the parameters for our experiment. 836 00:47:00,310 --> 00:47:02,170 But of course, if we vary the molecular weight of hydrogen, 837 00:47:02,170 --> 00:47:04,330 it's going to screw up somebody else's experiment. 838 00:47:04,330 --> 00:47:05,830 Because somebody else did some other experiment 839 00:47:05,830 --> 00:47:07,913 that depended on the molecular weight of hydrogen, 840 00:47:07,913 --> 00:47:09,980 and they had to get some other value 841 00:47:09,980 --> 00:47:12,400 to match their experiment. 842 00:47:12,400 --> 00:47:15,820 So in this parameter set, anything 843 00:47:15,820 --> 00:47:17,380 I do to vary those parameters, I've got 844 00:47:17,380 --> 00:47:20,209 to watch out that maybe some of those parameters 845 00:47:20,209 --> 00:47:22,500 are involved with somebody else's model and [INAUDIBLE] 846 00:47:22,500 --> 00:47:24,010 some other experiments. 847 00:47:24,010 --> 00:47:27,230 And I'm not really free to vary them all freely. 848 00:47:27,230 --> 00:47:28,850 So this is the idea from the Bayesian view 849 00:47:28,850 --> 00:47:32,020 of having the prior information: 850 00:47:32,020 --> 00:47:35,500 you know some of the ranges on these thetas already, 851 00:47:35,500 --> 00:47:38,454 and for some of them you might know really sharp distributions, 852 00:47:38,454 --> 00:47:40,870 like the molecular weight of hydrogen. You might know that 853 00:47:40,870 --> 00:47:43,070 to a lot of decimal places. 854 00:47:43,070 --> 00:47:46,810 And so when people do this, normally you 855 00:47:46,810 --> 00:47:49,120 don't vary all of the thetas. 856 00:47:49,120 --> 00:47:51,450 Usually what you do is you select a set of thetas 857 00:47:51,450 --> 00:47:55,020 that you feel free to vary because they're so uncertain, 858 00:47:55,020 --> 00:47:58,380 and other thetas that you think, oh, I'd better not touch them.
859 00:47:58,380 --> 00:48:01,716 Because if I adjust them, I may go 860 00:48:01,716 --> 00:48:03,840 to crazy values that are inconsistent with somebody 861 00:48:03,840 --> 00:48:05,275 else's experiment. 862 00:48:05,275 --> 00:48:07,150 So a lot of times, like the molecular weights, 863 00:48:07,150 --> 00:48:08,080 you would not touch them. 864 00:48:08,080 --> 00:48:09,704 You would just say, I've got to just stick 865 00:48:09,704 --> 00:48:12,640 to the recommended values in the tables. 866 00:48:12,640 --> 00:48:14,890 I'm not free to vary the molecular weight of hydrogen, 867 00:48:14,890 --> 00:48:16,640 even though if I did, it would make my model 868 00:48:16,640 --> 00:48:18,460 match my experiment better. 869 00:48:18,460 --> 00:48:20,390 It makes my model and the experiment 870 00:48:20,390 --> 00:48:22,520 match more precisely. 871 00:48:22,520 --> 00:48:25,450 So deciding which parameters to vary in this 872 00:48:25,450 --> 00:48:27,490 is a really crucial thing. 873 00:48:30,610 --> 00:48:35,800 And a lot of the art of doing 874 00:48:35,800 --> 00:48:39,430 this has to do with that issue. 875 00:48:39,430 --> 00:48:42,130 Also, you don't have to keep the thetas 876 00:48:42,130 --> 00:48:43,310 in the form you have them. 877 00:48:43,310 --> 00:48:44,560 You could do a transformation. 878 00:48:44,560 --> 00:48:49,420 So you could change to, say, W's equal to, say, 879 00:48:49,420 --> 00:48:52,540 some matrix times the thetas, and I could express 880 00:48:52,540 --> 00:48:54,760 the equation in terms of the W's. 881 00:48:54,760 --> 00:48:57,640 So I could transform my original representation of the parameters 882 00:48:57,640 --> 00:48:59,670 into some other parameters. 883 00:48:59,670 --> 00:49:03,530 And oftentimes, your experiment might 884 00:49:03,530 --> 00:49:06,230 be really good at determining some of these W's, even 885 00:49:06,230 --> 00:49:09,800 if it might be incapable of determining any of the thetas. 
886 00:49:09,800 --> 00:49:13,830 So you often might know some linear combination 887 00:49:13,830 --> 00:49:15,830 of parameters, or maybe not linear combinations, 888 00:49:15,830 --> 00:49:18,590 some non-linear combination of parameters 889 00:49:18,590 --> 00:49:22,790 might actually be determinable very well from your experiment, 890 00:49:22,790 --> 00:49:24,950 even though you can't determine things separately. 891 00:49:24,950 --> 00:49:27,290 And this gets into the idea of dimensionless numbers. 892 00:49:27,290 --> 00:49:30,680 So your experiment might depend on some dimensionless number 893 00:49:30,680 --> 00:49:32,241 very sensitively. 894 00:49:32,241 --> 00:49:34,490 And you can be quite confident from your experimental data 895 00:49:34,490 --> 00:49:36,657 what the value of that dimensionless number must be. 896 00:49:36,657 --> 00:49:38,656 But if you look inside the dimensionless number, 897 00:49:38,656 --> 00:49:40,350 it depends on a lot of different things. 898 00:49:40,350 --> 00:49:41,990 And you might not have any information about them 899 00:49:41,990 --> 00:49:42,770 separately. 900 00:49:42,770 --> 00:49:44,600 All you know is that your experiment just 901 00:49:44,600 --> 00:49:48,410 tells you the value of that one parameter very accurately. 902 00:49:48,410 --> 00:49:51,080 So this is another big part of the art 903 00:49:51,080 --> 00:49:54,240 of doing the model versus data comparison: setting up 904 00:49:54,240 --> 00:49:58,050 your model in terms of parameters that you really 905 00:49:58,050 --> 00:50:00,900 can determine, and getting out all the ones you can't 906 00:50:00,900 --> 00:50:02,370 determine and fixing them. 907 00:50:02,370 --> 00:50:04,469 So generally we're really going to 908 00:50:04,469 --> 00:50:05,260 do this kind of thing. 
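The dimensionless-number point can be made concrete with a tiny assumed example (not from the lecture): a first-order reaction in an ideal plug-flow reactor, where the conversion depends only on the Damkohler number Da = k * tau. Any (k, tau) pair with the same product produces identical data, so only Da is identifiable, and it can be recovered directly from the measured conversion.

```python
import math

def conversion(k, tau):
    # First-order reaction, ideal plug-flow reactor:
    # X = 1 - exp(-k * tau) depends only on the group Da = k * tau.
    return 1.0 - math.exp(-k * tau)

# Two wildly different (k, tau) pairs with the same Da = 1.0
# give exactly the same observable, so k and tau are not
# separately identifiable from this measurement:
assert conversion(2.0, 0.5) == conversion(0.1, 10.0)

# But the dimensionless group itself is pinned down precisely:
X_meas = conversion(2.0, 0.5)
Da = -math.log(1.0 - X_meas)   # recovers Da = k * tau
```

This is exactly the situation described above: the experiment determines one combination of parameters very accurately while saying nothing about the pieces inside it.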
909 00:50:05,260 --> 00:50:13,050 But we're going to say that some thetas are fixed, 910 00:50:13,050 --> 00:50:21,780 and also we might change to a different representation, 911 00:50:21,780 --> 00:50:26,780 change to W's instead. 912 00:50:26,780 --> 00:50:27,950 Yeah? 913 00:50:27,950 --> 00:50:30,800 AUDIENCE: Can you explain where this transform-- 914 00:50:30,800 --> 00:50:32,470 I don't really know what's up with-- 915 00:50:32,470 --> 00:50:34,220 WILLIAM GREEN: Yeah, let's do an example. 916 00:50:34,220 --> 00:50:39,640 Suppose I was doing a reactor that had A in equilibrium with B. 917 00:50:39,640 --> 00:50:41,520 And I was really interested in kf, 918 00:50:41,520 --> 00:50:44,720 the forward rate for A going to B. I'm a kineticist, 919 00:50:44,720 --> 00:50:48,100 I love to know A goes to B. However, 920 00:50:48,100 --> 00:50:50,210 if I set up the experiment wrong, it 921 00:50:50,210 --> 00:50:53,140 might be that this reaction ran all the way to equilibrium. 922 00:50:53,140 --> 00:50:54,890 And what I see in the products is actually 923 00:50:54,890 --> 00:50:57,620 just the equilibrium ratio of A to B. 924 00:50:57,620 --> 00:51:03,630 So what I'm measuring might be something that depends 925 00:51:03,630 --> 00:51:06,110 really on kf over kr, and that might be the quantity 926 00:51:06,110 --> 00:51:07,136 I can really determine. 927 00:51:07,136 --> 00:51:08,635 Because that's the equilibrium constant. 928 00:51:11,160 --> 00:51:13,030 If I didn't think about it, I could just 929 00:51:13,030 --> 00:51:16,950 try to have the model fitting procedure just optimize 930 00:51:16,950 --> 00:51:18,760 to find the very best value of kf. 931 00:51:18,760 --> 00:51:21,700 And in that situation, it might have a lot of trouble, 932 00:51:21,700 --> 00:51:24,130 because it might be quite indeterminate what 933 00:51:24,130 --> 00:51:27,074 the kf is, because really all that matters is the ratio. 
934 00:51:27,074 --> 00:51:28,490 Also, if I think about this some more, 935 00:51:28,490 --> 00:51:33,080 suppose I run at short times, and I 936 00:51:33,080 --> 00:51:34,580 measure the time dependence. 937 00:51:34,580 --> 00:51:37,230 What I'm really measuring is kf plus kr. 938 00:51:37,230 --> 00:51:39,490 Do you remember we did the analysis of A 939 00:51:39,490 --> 00:51:43,930 goes to B, one of the early homework problems? 940 00:51:43,930 --> 00:51:48,680 The time constant was actually set by kf plus kr, not kf separately. 941 00:51:48,680 --> 00:51:51,210 And so if I measure the exponential decay 942 00:51:51,210 --> 00:51:53,507 time constant, I'm really determining kf plus kr, 943 00:51:53,507 --> 00:51:55,340 I might be able to determine that very well. 944 00:51:55,340 --> 00:51:57,170 Actually, in my lab, I can do a great job with this. 945 00:51:57,170 --> 00:51:58,670 I have an instrument that can measure 946 00:51:58,670 --> 00:52:00,378 the time constant of the exponential decay 947 00:52:00,378 --> 00:52:02,540 really precisely, but it's determining the sum. 948 00:52:02,540 --> 00:52:05,170 It's not determining either one of them separately. 949 00:52:05,170 --> 00:52:07,045 And I might have to do a separate experiment, 950 00:52:07,045 --> 00:52:09,200 say a thermo experiment, to get the ratio. 951 00:52:09,200 --> 00:52:12,000 And then from the two I can put them together and get the two 952 00:52:12,000 --> 00:52:13,914 values distinctly. 953 00:52:13,914 --> 00:52:15,330 So this would be an example of 954 00:52:15,330 --> 00:52:24,760 a W. My W1 is kf plus kr, so the matrix would be 955 00:52:24,760 --> 00:52:26,410 something like 956 00:52:26,410 --> 00:52:29,220 [1 1; 1 -1], 957 00:52:29,220 --> 00:52:32,860 where the first row adds these two guys, kf and kr. 958 00:52:32,860 --> 00:52:36,280 These are my two parameters, 1 plus 1. 959 00:52:36,280 --> 00:52:39,690 And I can determine W1 now very accurately. 
960 00:52:39,690 --> 00:52:44,820 Sorry, this is M, this is W. 961 00:52:44,820 --> 00:52:50,570 So now in terms of W, this has two parameters now, W1 and W2. 962 00:52:50,570 --> 00:52:52,790 I can't determine W2 from my experiment, 963 00:52:52,790 --> 00:52:54,497 but I can determine W1 really well. 964 00:52:54,497 --> 00:52:56,330 So then when I do the least squares fitting, 965 00:52:56,330 --> 00:52:58,010 I should vary W1. 966 00:52:58,010 --> 00:53:00,350 I can fit it to my experimental data, 967 00:53:00,350 --> 00:53:02,400 and just leave W2 fixed at some value. 968 00:53:02,400 --> 00:53:05,470 I can't do anything about it. 969 00:53:05,470 --> 00:53:06,291 Is that all right? 970 00:53:09,060 --> 00:53:11,630 Now, do you get the difference in these two points of view? 971 00:53:11,630 --> 00:53:16,650 These are, like, two completely different ways 972 00:53:16,650 --> 00:53:18,702 to look at the problem. 973 00:53:18,702 --> 00:53:20,660 You can think about it as, these parameters are 974 00:53:20,660 --> 00:53:23,150 free for me to vary, and I just have 975 00:53:23,150 --> 00:53:25,975 to be careful to select the ones I'm really free to vary. 976 00:53:25,975 --> 00:53:28,100 And that's the least squares fitting point of view. 977 00:53:28,100 --> 00:53:32,421 Or I could say, I'm not really determining anything 978 00:53:32,421 --> 00:53:34,670 in particular, all I'm doing is taking the whole range 979 00:53:34,670 --> 00:53:36,545 of uncertainty that we have about the parameters, 980 00:53:36,545 --> 00:53:41,080 and by my experiment, I narrow it down; that's the Bayesian view. 981 00:53:41,080 --> 00:53:42,790 So it's two different points of view. 982 00:53:42,790 --> 00:53:46,040 To do this one, I need to make sure I have enough data 983 00:53:46,040 --> 00:53:48,204 to determine something. 
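The A to B example above can be sketched end to end. This is an assumed toy implementation, with made-up rate constants and noiseless synthetic data: the decay fit pins down only W1 = kf + kr, and a separate equilibrium (thermo) measurement of Keq = kf/kr supplies the second combination needed to split the two rates.

```python
import numpy as np

# Toy A <=> B relaxation, starting from pure A:
# A(t) = A_eq + (A0 - A_eq) * exp(-(kf + kr) * t),  A_eq = A0 * kr / (kf + kr)
kf_true, kr_true = 3.0, 1.0
A0 = 1.0
t = np.linspace(0.0, 1.0, 50)
A_eq = A0 * kr_true / (kf_true + kr_true)
A = A_eq + (A0 - A_eq) * np.exp(-(kf_true + kr_true) * t)

# The exponential-decay fit determines only the sum W1 = kf + kr:
slope, _ = np.polyfit(t, np.log(A - A_eq), 1)
W1 = -slope                      # the decay rate, kf + kr

# A separate thermo measurement gives the ratio Keq = kf/kr = B_eq/A_eq:
Keq = (A0 - A_eq) / A_eq

# Only by combining the two measurements do we get each rate separately:
kr_fit = W1 / (1.0 + Keq)
kf_fit = W1 - kr_fit
```

Fitting kf directly to the decay curve would be badly ill-conditioned, because the data only constrain the sum; reparameterizing in terms of W1 and the ratio makes the well-determined and undetermined directions explicit.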
984 00:53:48,204 --> 00:53:49,620 So I have to have enough data to determine 985 00:53:49,620 --> 00:53:52,490 some parameter, at least one, otherwise 986 00:53:52,490 --> 00:53:54,212 there's no point in doing this. 987 00:53:54,212 --> 00:53:56,420 This one I can do even if I can't determine anything, 988 00:53:56,420 --> 00:54:00,230 because I could still narrow down the range of parameters. 989 00:54:00,230 --> 00:54:04,490 But this might be harder to report in a table. 990 00:54:04,490 --> 00:54:08,120 Because all I have at the end is a new probability density function 991 00:54:08,120 --> 00:54:11,130 over multiple parameters. 992 00:54:11,130 --> 00:54:11,820 All right? 993 00:54:11,820 --> 00:54:13,470 OK, we're done. 994 00:54:13,470 --> 00:54:16,220 See you guys on Friday.