The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

WILLIAM GREEN: All right, so I know some of you have managed to do the homework and some of you, I think, have not. Is this correct?

AUDIENCE: Yeah.

WILLIAM GREEN: OK. So I was wondering if someone who has managed to do their homework might comment on how small a mesh you need to converge.

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: It's about l? L? OK, so you need something on the order of l to converge. Is that correct? So if you're trying to do the problem using a mesh much bigger than l, you should probably try a tighter mesh. Yes?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: All right. Yes?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: Yes. Yes. All right. And has anyone managed to get the [INAUDIBLE] solution to actually be consistent with the [INAUDIBLE] solution?

AUDIENCE: Something like 3% or 4% or so.

WILLIAM GREEN: 3% or 4%, OK. And I assume that the [INAUDIBLE] is also using a mesh of similar size? Hard to tell?

AUDIENCE: I used like a triangular system--

WILLIAM GREEN: Yeah, yeah, but I mean, are they really, really tiny ones at the bottom? If you want, I can just blow it up and take a look to be sure. All right, and is backslash able to handle a million-by-million matrix?

AUDIENCE: Like 10 seconds with [INAUDIBLE].

WILLIAM GREEN: [INAUDIBLE] OK. So you need to do the sparse allocation. And MATLAB is so smart that it can just handle a million by million, which is pretty amazing, actually. That's a pretty big matrix.
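A minimal MATLAB sketch of that point (the tridiagonal test matrix and right-hand side here are assumptions, chosen purely for illustration): if the matrix is allocated as sparse, backslash solves a million-by-million system in seconds, whereas the dense version would not even fit in memory.

% Hedged sketch: sparse allocation lets backslash handle a huge system.
N = 1e6;                               % a million-by-million system
e = ones(N, 1);
A = spdiags([e, -2*e, e], -1:1, N, N); % sparse tridiagonal (illustrative)
b = ones(N, 1);                        % arbitrary right-hand side
tic; u = A \ b; toc                    % fast, because A is stored sparse
% full(A) would need roughly 8e12 bytes, so sparse storage is essential.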
All right, sorry, this is too loud. All right, so last time, we were doing some elementary things about probability. Actually, any more questions about the homework problem before we get started?

AUDIENCE: What's the answer?

WILLIAM GREEN: What's the answer? You could ask your classmates. Any other questions? All right.

So I had you confused a little bit with this formula for the probability of either A or B. I asked what the probability was, when I flipped two coins, that one of them would be a head. And I could see a lot of consternation. The general formula for this is

P(A or B) = P(A) + P(B) - P(A and B).

It can't just be the two of them added together, because if you have a 50% chance of a head for the penny and a 50% chance for the dime, that would add up to a 100% chance that you'll get a head, but you know that's not always true. So this is the formula.

And then the probability of A and B is often written in terms of the conditional probabilities: the probability of A times the probability that B would happen given that A already happened, which is also equal to the other way around,

P(A and B) = P(A) P(B|A) = P(B) P(A|B).

And this has to be read carefully. P(A|B) means B already happened, and then you want to know the probability of A given that B already happened. So it's sort of like--the way I think about it--this happened first, and now I'm checking the probability that the other thing is going to happen.
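Before the polymer example below, a quick brute-force check of these two formulas may help; the two fair coins are the only assumption in this hedged sketch.

% Hedged sketch: verify P(A or B) = P(A) + P(B) - P(A and B) by sampling.
nTrials = 1e6;
penny = rand(nTrials, 1) < 0.5;   % event A: head on the penny
dime  = rand(nTrials, 1) < 0.5;   % event B: head on the dime
pAorB  = mean(penny | dime);      % at least one head
pAandB = mean(penny & dime);      % both heads
fprintf('P(A or B) = %.3f vs P(A)+P(B)-P(A and B) = %.3f\n', ...
        pAorB, mean(penny) + mean(dime) - pAandB);  % both near 0.75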
Now, a nice little example of this is given in [INAUDIBLE] textbook. And I think it's nice enough that it's worthwhile to spend a few minutes talking about it. So [INAUDIBLE], who wrote the textbook, was not actually a numerical guy. He was a polymer chemist. And so he gave a nice polymer example.

So suppose you have a polymer where the monomers are some big molecule, and on one side they have a sort of acceptor group, and on the other side some kind of donor group--we'll call it D, I guess. These are the monomers. And so they can link together: the donor can react with the acceptor. So you can end up with things like this, and so on. So this is the monomer, this is the dimer, and then you could keep on [INAUDIBLE] like this.

And many, many, many of the materials you use every day--the fabrics in the seats that you're sitting on, the backs of the seats, your clothing, the binder holding the chalk together--all this stuff is made from polymers like this. So this is actually a pretty important practical problem.

And so you start with the monomers, and they react, where you have A reacting with D, over and over again. And we want to understand the statistics of what chain lengths we're going to make--maybe what the weight percent or the average molecular weight would be. Something like that would be the kind of thing we care about.

So a way to think about it is, if I've reacted this to some extent and I just grab a random polymer chain, any molecule in there, and I look and find, let's say, the unreacted D end--so any oligomer is going to have one unreacted D end. You can see no matter how long I make it, there will still be one unreacted D end. And I'm neglecting the possibility that this might circle around and make a loop. So assuming no loops, then any molecule I grab is going to have one unreacted D end.

So I grab a molecule. I start at the unreacted D end, and I look at the A that's next to it. And I say, is that A reacted or not? If it's a monomer, I grab the D, I look over here, and the A is unreacted. So the probability that it's a monomer is going to be equal to 1 - p, where p is the probability that an A has reacted. So it didn't react, just like that.
For a dimer, the A next to the D end has reacted. So the probability of a dimer is the probability that my nearest neighbor reacted and my next neighbor is unreacted, right? Is that OK?

So I can write it this way. I could say it's the probability that my nearest neighbor reacted, times a conditional probability: the probability that the next one is unreacted given that the nearest one reacted.

So far, so good? You guys are OK with this? So I grabbed a chain, and I'm trying to see if it's a dimer. I'm going to calculate the probability that this next acceptor group has reacted with a donor group. If it has reacted, then I'm going to check the next one after that. So this is the nearest neighbor, this is the next nearest neighbor, and I want that one to be unreacted. If both are true, then I have a dimer. If either one of those is false, [INAUDIBLE]. Is that OK?

So now I need to have a probability. So what's the probability that the nearest one has reacted? There's some probability that things have reacted. So this is going to be my p, the probability that things reacted. And I want the next one to be unreacted.

Now, there's a question: are these correlated or not? In reality, everything's correlated to everything, so probably they're correlated. But if we're trying to make a model and think about it, the fact that this thing reacted on this side doesn't really affect the other side if this is a big enough [INAUDIBLE]. So to a good approximation, this is independent of whether or not the first one reacted. So this is still going to have the ordinary probability of being unreacted, which would be 1 - p.

So I could write down that the probability of being a monomer is 1 - p. The probability of being a dimer is p(1 - p). What's the probability of being a trimer? p^2 (1 - p). And in general, the probability of being an n-mer is

P(n) = p^(n-1) (1 - p).
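As a quick hedged check of that formula (the conversion p and the sample size are assumed, for illustration only), one can simulate the walk down the chain directly: keep adding units as long as each successive A has reacted, which happens with probability p, and compare the histogram of lengths to p^(n-1)(1-p).

% Hedged sketch: simulate chain lengths and compare to P(n) = p^(n-1)*(1-p).
p = 0.8;  nChains = 1e5;
len = ones(nChains, 1);
for i = 1:nChains
    while rand < p              % the next A has reacted: add one more unit
        len(i) = len(i) + 1;
    end
end
for n = 1:5
    fprintf('n = %d  simulated %.4f  formula %.4f\n', ...
            n, mean(len == n), p^(n-1) * (1-p));
end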
So now you guys are statistical polymer chemists. This was worked out by a guy named Flory. He got the Nobel Prize; he's a pretty important guy. If you want to learn a lot about him, I think both Professor Cohen and Professor Rutledge teach classes that are basically "learn what Mr. Flory figured out." Well, maybe that's a little bit too strong, but pretty much. There's another guy named [INAUDIBLE] who did a bit too, so [INAUDIBLE] and Flory. Basically everything about polymers was worked out by these guys. And all they did was just probability theory, so it was a piece of cake.

And so this is the probability that you have an n-mer. So now we can compute things like, what is the expectation value of the chain length? How many guys link together? And that's defined to be

<n> = sum over n of n P(n),

which, in this case, is going to be

<n> = sum over n of n p^(n-1) (1 - p).

Now, for a lot of these kinds of simple series summations, there are formulas. Maybe in high school you studied series--I don't know if you remember--so you can look them up, and some of these have analytical formulas that are really simple. But you can just leave it this way too, because you can get a value numerically with MATLAB, no trouble.

You can also figure out the concentration of oligomers with n units in them. And that's going to be equal to the total concentration of polymer molecules times the probability that a molecule has n units. The probability, we just worked out. For the total concentration, a way to figure that out is to think that there's one polymer molecule per unreacted D end--and I'll call the monomer a polymer too; it's a polymer with one unit. So really, I want to know how many ends are unreacted. And that's going to be 1 - p times the amount of monomer I had to start with. It could be A or D; it doesn't matter.
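Numerically, that looks something like this; the conversion p and the initial monomer concentration A0 are assumed values, and the infinite sum is truncated at a large n. The computed mean matches the known geometric-series result <n> = 1/(1 - p).

% Hedged sketch: Flory distribution moments, with illustrative values.
p  = 0.95;                  % assumed extent of reaction
A0 = 1.0;                   % assumed initial monomer concentration
n  = (1:1e4)';              % truncate the infinite sum at a large n
Pn = p.^(n-1) * (1-p);      % probability a random chain is an n-mer
fprintf('sum(Pn) = %.6f (should be ~1)\n', sum(Pn));
nMean = sum(n .* Pn);       % number-average chain length
fprintf('<n> = %.3f vs analytic 1/(1-p) = %.3f\n', nMean, 1/(1-p));
cTotal = (1-p) * A0;        % one molecule per unreacted D end
cN = cTotal * Pn;           % concentration of each n-mer
fprintf('monomer concentration = %.4g\n', cN(1));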
It's like, how many of them--I started with a certain amount of free ends. What fraction of them have reacted? It's based on 1 - p. Yeah, it's 1 - p. So as p goes--well, yeah, it goes backwards. Yeah, as p goes to infinity--I think that's right. Yeah, when p is--well, I'm totally confused here now. Does 1 - p sound right? Maybe I did the reasoning backwards. This is definitely the right formula; I'm just confusing myself with my language. This is, at least for me, an endemic problem with probability: you can say things very glibly, but you've got to think about exactly what you mean.

So, the concentration of unreacted ends: initially, this was equal to A0; it was all unreacted ends. And as the process proceeds, as p increases, then at the end it's going to be very small. So this is right. And the concentration of unreacted ends is equal to the total concentration of polymers, the number of polymer molecules [INAUDIBLE]. So it's this times p^(n-1) times [INAUDIBLE].

All right, and this is called the Flory distribution. And that gives the concentrations of all your oligomers after you do a polymerization, if they're all uncorrelated and you don't form any loops.

It's often very important to know the width of the distribution. If you make a polymer, you want to make things as monodisperse as possible, because you'd really like to make a pure chemical. There's some polymer chain length which is optimal for your purpose. You want to try to make sure that the average value is equal to the value you want. So you want to keep running p up until you reach the point where the average chain length is the chain length that's optimal for your application. If you make the polymer too long, then it's going to be hard to dissolve, it's going to be hard to handle, and it's going to be solid.
If you make it too short, then it may not have the mechanical properties you need the polymer to have. So there's some optimal choice. So you typically run the conversion until p reaches a number such that this average is your optimal value, but then you care about the dispersion around that optimal value.

And particularly, the unreacted monomers that are left might be a problem, because they might leach out over time--they might still be liquids, or even gases that come out. There was this famous problem where people made baby bottles, and they had some leftover small molecules in the baby bottles. And then those can leach out into the milk, and the mothers don't appreciate that. So there are a lot of real practical problems about how to do this. So anyway, you'd be interested in the width of the distribution.

So we define what's called the variance. The variance of n is written sigma_n^2, and it's just defined to be the expectation value of n squared minus the square of the expectation value of n:

sigma_n^2 = <n^2> - <n>^2.

These two are always different, or almost always different, so it's not 0. So this is equal to

sigma_n^2 = sum over n of n^2 P(n) - (sum over n of n P(n))^2.

All right? And a lot of times in the polymer field, what they'll do is take the square root of this and compare sigma_n divided by the expectation value of n. This is a dimensionless number: sigma_n^2 has the dimensions of n^2, so sigma_n has the dimensions of n, and <n> has the dimensions of n, so the ratio is dimensionless. And I think they call that the dispersity of the polymer, something like that.
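Continuing the same hedged sketch (same assumed p and truncation), the variance and the dispersity can be checked directly; for this particular distribution the ratio works out analytically to sqrt(p).

% Hedged sketch, continuing with the same illustrative values as above.
p  = 0.95;
n  = (1:1e4)';
Pn = p.^(n-1) * (1-p);
nMean  = sum(n .* Pn);             % <n>
n2Mean = sum(n.^2 .* Pn);          % <n^2>
sigmaN = sqrt(n2Mean - nMean^2);   % sigma_n^2 = <n^2> - <n>^2
fprintf('dispersity sigma_n/<n> = %.4f vs sqrt(p) = %.4f\n', ...
        sigmaN / nMean, sqrt(p));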
Now, notice that when we write these averages this way, it's implicit that these sums are divided by the summation of the probabilities of n. But because these probabilities sum to 1, I can just leave that out. But sometimes it may be difficult for you to figure out exactly what the probabilities are, and you'll need a scaling factor to force that sum to be equal to 1. So sometimes people leave these sums in the denominator.

There's another thing you might care about, which would be, what's the weight percent of Pn--so, what fraction of the weight of the polymer is my particular oligomer? [INAUDIBLE] sorry, some special one, Pm. And I want to know its weight percent. So that's going to be equal to the weight of Pm in the mix divided by the total weight. So that's equal to the weight of an m-mer times the probability of m, divided by the total weight, which is going to be the weight of each of these species times the probability of each of them, summed:

w(m) = m P(m) / sum over n of n P(n).

And you can see this is different. This is not the same as P(m), right? It's not the same thing. So just watch out when you do this. And in fact, in the polymer world, they always have to say "I did weight average" or "I did number average," because they're different.

Is this OK? Yeah? So my general advice--at least for me, if I skip steps, I always get it wrong when I do probability. So don't skip steps. Do it one by one by one, what you really mean. Then you'll be OK.
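For the same illustrative Flory distribution, here is a hedged sketch of how different the two averages are; the monomer molecular weight cancels out of the weight fractions, so it never appears.

% Hedged sketch: number fraction vs. weight fraction, illustrative p.
p  = 0.95;
n  = (1:1e4)';
Pn = p.^(n-1) * (1-p);         % number fraction of n-mers
wn = n .* Pn / sum(n .* Pn);   % weight fraction of n-mers
[~, iW] = max(wn);             % weight fraction peaks near n = p/(1-p)
fprintf('number fraction peaks at n = 1; weight fraction peaks at n = %d\n', ...
        n(iW));
fprintf('number-average <n> = %.2f, weight-average = %.2f\n', ...
        sum(n .* Pn), sum(n .* wn));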
All right, now, this was a cute little example with discrete variables; it's easy to count everything. Very often, we care about probability distributions of continuous variables. And for those we have to use the probability density functions that I talked about last time, which have units in them.

And so, as we mentioned last time, if you want to know the probability that a continuous variable x is a member of the interval from x̂ to x̂ + dx, the probability that this is true is

P(x ∈ [x̂, x̂ + dx]) = Px(x̂) dx.

And so the quantity Px has units of 1 over x, whatever the units of x are. And then you have to multiply it by dx in order to get something dimensionless, which is what a probability is.

And there are some obvious properties, like that the integral of Px(x') dx' over all possible values of x has to be equal to 1. It's a probability, and this is the same as saying that the probability that x has some value, anywhere, is 1. So there's some [INAUDIBLE] you measure.

You can also have the probability that x is less than or equal to x'. And that's the integral from negative infinity to x' of Px(x) dx.

And the mean is just the integral of x Px(x) dx. And you can compute the average of x squared the same way. You can compute the average of anything.

You can put these together, and you get

sigma_x^2 = <x^2> - <x>^2.

So that's the variance of x.

You can also do this with any function. So you can say that the average value of a function is

<f> = integral of f(x) Px(x) dx.

This is the average value of a function of a random variable described by the probability density function Px. And then you can get things like

sigma_f^2 = integral of f(x)^2 Px(x) dx - <f>^2.

All right? Everything's OK? Yeah.
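All of those definitions can be evaluated by quadrature when the density is known; here is a minimal hedged sketch, with an exponential density and the function f assumed purely for illustration.

% Hedged sketch: moments of a continuous PDF by numerical integration.
lambda = 2.0;
Px = @(x) lambda * exp(-lambda * x);          % assumed density on [0, inf)
normP = integral(Px, 0, Inf);                 % should be 1
xMean = integral(@(x) x    .* Px(x), 0, Inf); % <x>   (= 1/lambda here)
x2    = integral(@(x) x.^2 .* Px(x), 0, Inf); % <x^2> (= 2/lambda^2 here)
varX  = x2 - xMean^2;                         % variance of x
f     = @(x) sin(x);                          % any function of interest
fMean = integral(@(x) f(x) .* Px(x), 0, Inf); % <f(x)>
fprintf('norm = %.4f, <x> = %.4f, var = %.4f, <f> = %.4f\n', ...
        normP, xMean, varX, fMean);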
All right, so a lot of times, people are going to say we do sampling from Px. So sampling from Px means that we have some probability density function, Px(x), and we want to have one value of x that we draw from that probability distribution. When we say it that way, we mean that we're more likely to find x's where Px has a high value, and we're less likely to draw an x value where Px has a low value. So that's what "sampling from" means. Now, you can do that mathematically using random number generators in MATLAB, for example, and we'll do that sometimes.

But you do it all the time when you do experiments. So the experiment has some probability density function for what you're going to observe, what you're going to measure. And you don't know what that distribution is, but every time you make a measurement, you're sampling from that distribution. So that's the key conceptual idea: there is a Px(x) out there for our measurement. Say you're trying to measure how tall I am. Every time you measure it, you're drawing from a distribution of experimental measurements of Professor Green's height. And there is some Px(x) that exists even though you don't know what it is. And each time you make the measurement, you're drawing numbers from that distribution.

And if you draw a lot of them, then you can do an average, and it should be an average that's close to this one. If you drew an infinite number of values, then you're sampling this distribution. You can make a histogram plot of the heights you measure of me, and it should have some shape that's similar to Px(x). Does that make sense? All right.

So actually, every day you're drawing from probability distributions; you just didn't know it. It's like [INAUDIBLE] the street: what's the probability the bus is going to hit me or not, that the bus driver is going to stop? I think there's a high probability, but I'm always a little worried, actually. I'm drawing a particular instance from that probability distribution about whether the bus driver's really going to stop or not. And if I sample enough times, I might be dead. But anyway, all right.

Often we have multiple variables. So you can define Px(x̂) for a vector x̂. So now I have multiple x's--more than one variable. And I want the probability density function of all of them: I'm going to measure this and this and this and this, all right? And this is equal to the probability that x1 is a member of the interval from x1 to x1 + dx1, and x2 is a member of the interval from x2 to x2 + dx2, and so on. That's what a probability density function means with multiple variables.
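Here is a small hedged sketch of that multivariable meaning; the correlated two-variable Gaussian is assumed just so there is something concrete to sample. The fraction of samples landing in a small box approaches the density at that point times dx1 dx2.

% Hedged sketch: sampling a two-variable PDF and checking the box meaning.
Sigma = [1.0, 0.6; 0.6, 0.5];   % assumed covariance of (x1, x2)
R = chol(Sigma);                % R' * R = Sigma
X = randn(1e6, 2) * R;          % each row is one sample of (x1, x2)
dx = 0.1;                       % small box [0,dx] x [0,dx]
pBox = mean(X(:,1) >= 0 & X(:,1) < dx & X(:,2) >= 0 & X(:,2) < dx);
pDens = dx^2 / (2*pi*sqrt(det(Sigma)));  % Px(0,0)*dx1*dx2 for this Gaussian
fprintf('fraction in box = %.5f vs Px(0,0) dx1 dx2 = %.5f\n', pBox, pDens);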
So this is very common for us, because we often measure more than one thing in an experiment, right? So you measure the flow rate and the temperature. You measure the yield and the absorption at some wavelength that corresponds to an impurity. When you do an experiment, you often measure multiple things. And so you're sampling from multiple observables simultaneously. And implicitly, you're sampling from some complicated PDF like this, even though you usually don't know the shape of the PDF to start with.

And so when you have this multiple-variable case, you can define a thing called the covariance matrix, where the elements of the matrix are

Cij = <xi xj> - <xi> <xj>.

And so you can see that, for example, sigma_i^2 = Cii: the diagonal elements are just the variances. But now we also have the covariances, because we measured, let's say, two things.

All right, so suppose we do N measurements and we compute the average of our repeats. So we just repeat the same measurements over and over. So suppose you measure my height and my weight. Every time I go to the medical clinic, they always measure my height, my weight, my blood pressure--you've got three numbers. And I could go back in there 47 times, and they'll do it 47 times. And if a different technician measured it using a different [INAUDIBLE] and a different scale, I might get a different number. Sometimes I forget to take my shoes off, so I'm a little bit taller than I would have been. So the numbers go up and down. They fluctuate, right? You'd expect that, right? If you looked at my medical chart, it's not the same number every time.

But you'd think, if everything's right in the world--I'm an old guy; I've been going to the medical clinic for a long time--that if I look at my chart and average all those numbers, it should be somewhere close to the true values of those numbers.
So I should have that the average values experimentally, which I just define to be the averages

x̄ = (1/N) sum over the N experiments of x_n,

where N is the number of experiments--so I can have these averages. And I would expect that as N goes to infinity, I hope that my experimental averages go to the same value of <x> that I would have gotten from the true probability distribution function. If I knew what Px(x) was and I evaluated the integral and got <x>, I think it should be the same as the experiment, as long as I did enough repeats. So this is almost like an article of faith here, yeah? It's what you'd expect.

Now, the interesting thing about this--I mean, probably you've done this a lot. You've probably done experiments and averaged some things before, right? If everybody in the class tried to measure how tall I was, you all wouldn't get the same number. But you'd think that if you took the average of the whole classroom, it might be pretty close to my true height, right?

So the key idea here is the experimental variance of the x measurements, which we define to be--maybe we should do this one at a time. [INAUDIBLE] Then I can have a vector of these for all the different measurements. So there's some error in my height, there's some error in my weight, there's some different error in my blood pressure measurement, and each should have its own variance. And I can have the covariances too.

OK, so these are all the experimental quantities. You've maybe even computed all of these before in your life. And we expect that this experimental average should go to the true mean as N goes to infinity. Now, what's going to happen to these variances as N goes to infinity? That's the really important question.
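In MATLAB, those experimental quantities are one-liners; the repeated clinic-style measurements below are fabricated, purely as a stand-in for real data.

% Hedged sketch: sample mean, variances, and covariance matrix of repeats.
N = 47;                                 % number of repeat visits
trueVals = [178, 75, 120];              % assumed true height/weight/BP
noise    = [1, 2, 8];                   % assumed measurement scatter
data = trueVals + randn(N, 3) .* noise; % one row per visit
xBar = mean(data);                      % experimental averages
s2   = var(data);                       % experimental variances
C    = cov(data);                       % covariance matrix; diag(C) = s2
fprintf('means: %.1f %.1f %.1f\n', xBar);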
So there's an amazing theorem called the central limit theorem of statistics. And what this theorem says is that as N gets large, and if the trials are uncorrelated and the x's aren't correlated with each other--which is the same as saying that Cij is equal to 0 off the diagonal--then the probability of observing a particular average x̄ is proportional to the Gaussian, the bell curve. All right?

So this is only true as N gets very large. It doesn't specify exactly how large N has to be, but it's true for any Px, any distribution function, probability distribution function. So everything becomes a bell curve if you look at the averages. And the variance of the mean, in that limit, goes to

sigma_i^2 (of the mean) -> (1/N) sigma_xi^2 (experimental).

And this is really important. What this says is that the width of this Gaussian distribution gets narrower and narrower as you increase the number of repeated experiments, or increase the number of samples. So this is really saying that the uncertainty in the mean is scaling as 1 over root N, where N is the number of samples or the number of experiments that are repeated.

Now, sigma, the variance itself, is not like that at all. That quantity, as you increase N, just goes to a constant. It goes to whatever the real variance is--which, if you're measuring me, might reflect how good your ruler is or something. It'll tell you roughly what the real variance is. And that number does not go to 0 as the number of repeats grows. I mean, I could get the whole student body at MIT to measure how tall I am, and they're still not going to have 0 variance. There's still going to be some variance, right? So that quantity stays constant as N increases, or goes to a constant value once it stabilizes--you have to have enough samples. But this other quantity, the uncertainty in the mean value, gets smaller and smaller and smaller, as 1 over the square root of N. Now, this is only true in the limit as N is large.
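Both claims are easy to see numerically. In this hedged sketch the underlying density is deliberately non-Gaussian (a uniform density, an assumed stand-in), and the spread of the averages still shrinks like 1 over root N while the sample variance itself settles to a constant.

% Hedged sketch of the central limit theorem with a uniform density.
nReps = 1e4;                          % how many independent averages to form
for N = [1, 10, 100, 1000]
    X = rand(N, nReps);               % each column: one experiment of N draws
    xBar = mean(X, 1);                % the N-sample averages
    fprintf('N = %4d  std of mean = %.4f  (sigma_x/sqrt(N) = %.4f)\n', ...
            N, std(xBar), sqrt(1/12)/sqrt(N));  % uniform: sigma_x^2 = 1/12
end
% histogram(xBar) at N = 1000 looks Gaussian even though Px is flat.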
Now, this is a huge problem, because experimentalists are lazy, and you don't want to do that many measurements. And it's hard to do a measurement. So for example, the Higgs boson was discovered, what, a year and a half ago, two years ago? And I think altogether they had like nine observations or something when they reported it, OK? So nine is not infinity. And so they don't have infinitely small error bars on that measurement. And in fact, who knows if it really looks like a Gaussian distribution from such a small sample, but they still reported a 90% confidence interval using the Gaussian distribution formula to figure out the confidence intervals.

So everybody does this. If N is big, it should be right. And you can prove mathematically that it's right, but the formula doesn't really tell you how big is big. So this is a general problem. And it leads to us oftentimes misestimating how accurate our results are, because we're going to use formulas that are based on assuming that we've averaged enough repeats that we're in this limit, where we can use the Gaussian formulas and get this nice limiting formula. But in fact, we haven't really reached that limit, because we haven't done enough repeats. So anyway, this is just the way life is. That's the way life is.

And I think there are even discussions in statistics journals about how to make corrections and use slightly better forms that capture the fact that your distribution of the mean doesn't narrow down to a beautiful Gaussian so fast--it has some stuff in the tails. People talk about that: low-probability events out in the tails of distributions, stuff like that. So that's a big field of statistics. I don't know too much about it, but it's very practical, because unfortunately, oftentimes in chemical engineering we make so few repeats that we have no chance to figure out what the tails are doing--maybe [INAUDIBLE] our tails. And so this is a big problem for trying to make sure you really have things right.

So I would say, in general, this is an optimistic estimate of what the uncertainty in the mean is.
Uncertainties are usually bigger. So you shouldn't be surprised if your data doesn't match a model as brilliantly well as predicted by this formula. Now, if it's off by some orders of magnitude, you might be a little alarmed--and that might be the normal situation, too. But anyway, if it's just off by a little bit, I wouldn't sweat it, because you probably haven't done enough repeats to be entitled to such a beautiful result as this.

We can write a similar--actually, so here I assumed that the x's are uncorrelated. That's almost never true. If you actually numerically evaluate the C's, usually they have off-diagonal elements. For example, my weight and my blood pressure are probably correlated, and so you wouldn't expect them to be totally uncorrelated. And so there's another formula like this--it's given in the notes by Joe Scott--that includes the covariance. And you just get a different form from what you'd expect, OK? And the covariances should also converge, roughly as 1 over N, if you have enough samples. So you should eventually get some covariance.

You can write very similar formulas like this for functions. So if I have a function f(x), and that's really what I care about--remember, I said that the average value of f is

<f> = integral of f(x) Px(x) dx.

And I could make these vectors if I want. And I could evaluate my function repeatedly, and I'd get some number. And I could compute the variance; I have a sigma_f. And this is something I like to do a lot of times.

Then we can do the experimental estimate of f--so we usually, or often, don't know what the probability distribution function is, so we'll try to evaluate this experimentally. This is going to be

f̄ = (1/N) sum over n of f(x_n),

the values of f at the n-th trial. And we could write a similar thing for sigma_f, which I just did right there. You can do the same thing; just make these experimental values now.
The experimental variance of the mean of f should go to

sigma_f̄^2 -> (1/N) sigma_f^2,

1 over N times the variance. And this is the sigma in the mean of f: 1 over N times the variance of f. All right, so this is the same beautiful thing: the uncertainty in the mean value of f narrows with the number of trials. So you have some original variance that you computed here, either experimentally or from the PDF--experimentally is fine. And now you want to know the uncertainty in the mean value, and that drops down with the number of trials, the number of things you average.

So this all leads in two directions. What we're going to talk about first is comparing models versus experiments, where we're sampling by doing the experiment. So that's one really important direction, maybe the most important one. But it also suggests ways you could do numerical integration.

So if I wanted to evaluate an integral that looks like this, the integral of f(x) Px(x) dx, and if I had some way to sample from Px, then one way to evaluate this numerical integral would be to--sorry, I made this a vector; [INAUDIBLE] a lot of species there, a lot of dimensions. If I want to evaluate this multiple integral--and it's a lot of integrals, one for every dimension of x--that would be very hard to do, right? We talked about it in [INAUDIBLE]: if you get more than about three or four of these integral signs, usually you're in big trouble trying to evaluate the integral.

But you can do it by what's called Monte Carlo sampling, where you sample from Px and just evaluate the value of f at the particular x points you pull as samples, and just repeat and average. And the average of those things should converge, according to this formula, as you increase the number of samples. And so that's the whole principle of Monte Carlo methods, and we'll come back to that a little bit later. And you can apply that to a lot of problems.
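A minimal hedged sketch of that principle (the five-dimensional Gaussian density and the function f are assumptions, chosen so the exact answer is known): the sample average converges to the integral with an error that shrinks like 1 over root N, regardless of the dimension.

% Hedged sketch: Monte Carlo evaluation of <f> = integral f(x) Px(x) dx.
d = 5;                          % five nested integrals for quadrature
f = @(X) sum(X.^2, 2);          % f(x) = |x|^2; exact <f> = d for N(0, I)
for nSamp = [1e2, 1e4, 1e6]
    X = randn(nSamp, d);        % samples drawn from Px = standard normal
    fBar = mean(f(X));          % Monte Carlo estimate of the integral
    fprintf('N = %7d  estimate = %.4f  (exact = %d)\n', nSamp, fBar, d);
end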
Basically, for any problem you have in numerics, you have a choice: you can use deterministic methods or stochastic methods. Deterministic methods, if you can do them, are usually the fastest and most accurate, but stochastic ones are often very easy to program, and sometimes they're actually the fastest way to do it. In particular, in this kind of case where we have lots of dimensions, many, many x's, it turns out that stochastic methods are a pretty good way to do it.

But we're going to talk mostly about [INAUDIBLE] data, because that's going to be important to all of you in your research. So let's talk about that for a minute. I'll just comment that there are really good notes posted on the [INAUDIBLE] website for all this material, so you should definitely read them. And the textbook has a lot of material too. It's maybe not as easy to read as the notes are, but there's plenty to learn, for sure.

So we generally have a situation where we have an experiment. And what do we have in the experiment? We have some knobs. These are things that we can change. So we can change some valve positions. We can change how much electricity goes into our heaters. We can change the setting on our back-pressure regulator. We can change the chemicals we pour into the system. So there are a lot of knobs that we control. And I'm going to call the knobs x.

And then we have parameters. And these are other things that affect the result of the experiment that we don't have control over. And I'm going to call those theta. So for example, if I do a kinetics experiment, it depends on the rate coefficients. I have no control over the rate coefficients; they're [INAUDIBLE] by God, as far as I know. So they're some numbers, but they definitely affect the result. And if a rate coefficient had a different value, I would get a different result in the kinetics experiment.

The molecular weight of sulfur--I have no control over that. That's just a parameter. But if I weigh something and it has a certain number of atoms of sulfur, that's going to be a very important parameter in determining the result.
795 00:46:08,150 --> 00:46:11,120 So we have these two things. 796 00:46:11,120 --> 00:46:20,550 And then we're going to have some measurables, things 797 00:46:20,550 --> 00:46:21,620 that we can measure. 798 00:46:21,620 --> 00:46:24,590 Let's call them y. 799 00:46:24,590 --> 00:46:28,740 And in general, we think that if we set the x values 800 00:46:28,740 --> 00:46:30,680 and we know the theta values, we should 801 00:46:30,680 --> 00:46:33,030 get some measurable values. 802 00:46:33,030 --> 00:46:37,070 And so there's a y that the model 803 00:46:37,070 --> 00:46:44,490 says, which is a function of the x's and the thetas. 804 00:46:44,490 --> 00:46:46,985 Now, I write this as a simple function like this. 805 00:46:46,985 --> 00:46:48,380 This might be really complicated. 806 00:46:48,380 --> 00:46:50,213 It might have partial differential equations 807 00:46:50,213 --> 00:46:51,284 embedded inside it. 808 00:46:51,284 --> 00:46:53,450 It might have all kinds of horrible stuff inside it. 809 00:46:53,450 --> 00:46:54,920 But you guys already know how to solve all these problems 810 00:46:54,920 --> 00:46:55,940 because you've done it. 811 00:46:55,940 --> 00:46:57,981 You've been in this class through seven homeworks 812 00:46:57,981 --> 00:46:58,940 already. 813 00:46:58,940 --> 00:47:00,654 And so no problem, right? 814 00:47:00,654 --> 00:47:01,820 So if I give you something-- 815 00:47:01,820 --> 00:47:02,810 I give you some knobs. 816 00:47:02,810 --> 00:47:06,705 I give you some parameters-- you can compute it, all right? 817 00:47:06,705 --> 00:47:09,080 And so then the question is-- that's what the model says. 818 00:47:09,080 --> 00:47:10,700 So we could make the forward prediction 819 00:47:10,700 --> 00:47:13,100 of what the model should say if I knew what the parameter 820 00:47:13,100 --> 00:47:17,090 values were, if I knew what the knob values were. 821 00:47:17,090 --> 00:47:22,440 And I want to-- 822 00:47:22,440 --> 00:47:28,040 oftentimes what I measure is y data, 823 00:47:28,040 --> 00:47:30,780 which is a function of the knobs; 824 00:47:30,780 --> 00:47:32,834 it's implicitly a function of the parameters. 825 00:47:32,834 --> 00:47:34,375 I have no control over them, so I'm not 826 00:47:34,375 --> 00:47:35,510 going to even put them in here. 827 00:47:35,510 --> 00:47:36,593 So I set the knobs I want. 828 00:47:36,593 --> 00:47:37,990 I get some data. 829 00:47:37,990 --> 00:47:40,450 I want these two to match each other. 830 00:47:40,450 --> 00:47:43,470 I think they should be the same thing if my model is true, 831 00:47:43,470 --> 00:47:43,970 yeah? 832 00:47:43,970 --> 00:47:45,718 So this is my model, really. 833 00:47:51,780 --> 00:47:54,424 But I don't think they should be exactly the same. 834 00:47:54,424 --> 00:47:56,590 I mean, just like when you try to measure my height, 835 00:47:56,590 --> 00:47:58,290 you don't get exactly the same numbers. 836 00:47:58,290 --> 00:48:02,050 So these y data are not going to be exactly the same numbers 837 00:48:02,050 --> 00:48:03,482 as my model would say. 838 00:48:03,482 --> 00:48:04,940 So now I have to cope with the fact 839 00:48:04,940 --> 00:48:08,610 that I have deviations between the data and the model. 840 00:48:08,610 --> 00:48:11,160 And how am I going to handle that, all right? 841 00:48:15,260 --> 00:48:18,560 And also, we have a set of these guys-- 842 00:48:18,560 --> 00:48:20,550 we typically do some repeats.
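[Editor's note: a tiny hypothetical example of such a model y of x and theta, in MATLAB. The first-order decay form and the parameter names k and C0 are invented for illustration; a real y_model might hide ODE or PDE solves behind this same interface.]

```matlab
% Hypothetical forward model y_model(x, theta): first-order decay.
% Knob x = time at which we sample; parameters theta = [k, C0]
% (rate coefficient and initial concentration, both invented here).
y_model = @(x, theta) theta(2) * exp(-theta(1) * x);

x     = [0 1 2 4 8];          % knob settings: measurement times
theta = [0.5, 2.0];           % assumed parameter values
y     = y_model(x, theta);    % forward prediction of the measurables
```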
843 00:48:20,550 --> 00:48:22,802 So we have like several numbers for each setting 844 00:48:22,802 --> 00:48:25,010 of the x's, and they don't even agree with each other 845 00:48:25,010 --> 00:48:25,820 because they're all different. 846 00:48:25,820 --> 00:48:27,361 Every time I repeated the experiment, 847 00:48:27,361 --> 00:48:29,150 I got some different result-- 848 00:48:29,150 --> 00:48:31,112 that's my y's-- for each x. 849 00:48:31,112 --> 00:48:32,570 And then I change the x a few times 850 00:48:32,570 --> 00:48:33,694 to different knob settings. 851 00:48:33,694 --> 00:48:35,160 Then I make some more measurements. 852 00:48:35,160 --> 00:48:37,370 And I have a whole bunch of y values 853 00:48:37,370 --> 00:48:40,350 that are all scattered numbers that maybe scatter 854 00:48:40,350 --> 00:48:42,660 around this model possibly, if I'm lucky, 855 00:48:42,660 --> 00:48:44,030 if the model's right. 856 00:48:44,030 --> 00:48:47,057 Usually I also don't know if the model's correct. 857 00:48:47,057 --> 00:48:49,390 So that's another thing to hold in the back of your mind: 858 00:48:49,390 --> 00:48:51,320 we're going into this whole comparison 859 00:48:51,320 --> 00:48:53,240 assuming the model's correct. 860 00:48:53,240 --> 00:48:55,640 And then we might, at the end, decide, hmm, maybe 861 00:48:55,640 --> 00:48:56,848 the model's not really right. 862 00:48:56,848 --> 00:48:58,340 I may have to go make a new model. 863 00:48:58,340 --> 00:49:01,110 So that's just a thing to keep in the back of your mind. 864 00:49:01,110 --> 00:49:02,950 But we'll be optimistic to start with, 865 00:49:02,950 --> 00:49:04,340 and we'll assume that the model is good. 866 00:49:04,340 --> 00:49:05,714 And our only challenge is we just 867 00:49:05,714 --> 00:49:11,060 don't have the right values of the thetas, maybe, in my model. 868 00:49:11,060 --> 00:49:12,370 And this is another thing, too. 869 00:49:12,370 --> 00:49:14,690 So the thetas are things like rate coefficients 870 00:49:14,690 --> 00:49:16,610 and molecular weights and viscosities 871 00:49:16,610 --> 00:49:19,387 and stuff that are like properties of the universe, 872 00:49:19,387 --> 00:49:20,720 and they're real numbers, maybe. 873 00:49:20,720 --> 00:49:23,120 They're also things like the length of my apparatus 874 00:49:23,120 --> 00:49:25,010 and stuff like that. 875 00:49:25,010 --> 00:49:28,809 But I don't know those numbers to perfect precision, right? 876 00:49:28,809 --> 00:49:30,350 The best number I can find, if I look 877 00:49:30,350 --> 00:49:32,984 in the database-- you know, you 878 00:49:32,984 --> 00:49:34,400 could find the speed of light 879 00:49:34,400 --> 00:49:36,525 to like 11 significant figures, but I don't know it 880 00:49:36,525 --> 00:49:38,240 to the 12th significant figure. 881 00:49:38,240 --> 00:49:40,144 So I don't know any of the numbers perfectly. 882 00:49:40,144 --> 00:49:42,060 And a lot of numbers I don't even know at all. 883 00:49:42,060 --> 00:49:43,280 So there are some rate coefficients 884 00:49:43,280 --> 00:49:45,170 that no one has ever measured or calculated 885 00:49:45,170 --> 00:49:47,070 in the history of the world. 886 00:49:47,070 --> 00:49:48,930 And my students have to deal with that a lot 887 00:49:48,930 --> 00:49:49,910 in the Green group. 888 00:49:49,910 --> 00:49:52,719 So a lot of these are quite uncertain. 889 00:49:52,719 --> 00:49:54,510 But there are some that are pretty certain.
890 00:49:54,510 --> 00:49:56,736 There's quite a big variance, actually, in how certainly you 891 00:49:56,736 --> 00:49:57,819 know the parameter values. 892 00:50:00,900 --> 00:50:07,140 So one idea, a very popular idea, is to say, you know, 893 00:50:07,140 --> 00:50:12,260 I have this deviation between the model and the experiment. 894 00:50:12,260 --> 00:50:15,490 So I want to sort of do a minimization, by varying, 895 00:50:15,490 --> 00:50:23,110 say, parameter values, of some measure of the error 896 00:50:23,110 --> 00:50:24,561 between the model and the data. 897 00:50:32,100 --> 00:50:35,376 Somehow, I want to minimize that. 898 00:50:35,376 --> 00:50:38,000 And I have to think about, well, what should I really minimize? 899 00:50:38,000 --> 00:50:44,010 And the popular thing to minimize is these guys squared, 900 00:50:44,010 --> 00:50:48,376 and actually to weight them by some kind of sigma 901 00:50:48,376 --> 00:50:49,500 for each one of these guys. 902 00:50:49,500 --> 00:50:51,441 So this is-- we should change the notation, 903 00:50:51,441 --> 00:50:52,190 make this clearer. 904 00:51:23,620 --> 00:51:30,890 These guys-- y model, and it's the i-th measurement 905 00:51:30,890 --> 00:51:33,853 that corresponds to the n-th experiment. 906 00:51:44,220 --> 00:51:49,320 So I think that the difference between what I measured 907 00:51:49,320 --> 00:51:51,690 and what the model calculated should be sort of scaled 908 00:51:51,690 --> 00:51:55,650 by the variance, right? 909 00:51:55,650 --> 00:51:58,530 So I would expect that this sum has 910 00:51:58,530 --> 00:52:00,990 a bunch of numbers that are sort of order of one, 911 00:52:00,990 --> 00:52:03,840 because I expect the deviation to be approximately the scale 912 00:52:03,840 --> 00:52:06,800 of the variance of my measurements. 913 00:52:06,800 --> 00:52:10,750 And if these deviations are much larger than the variance, 914 00:52:10,750 --> 00:52:12,330 then I think my model's not right. 915 00:52:12,330 --> 00:52:14,080 And what I'm going to try to do right here 916 00:52:14,080 --> 00:52:16,570 is adjust the thetas, the parameters, 917 00:52:16,570 --> 00:52:21,300 to try to force the model to agree better with my experiment. 918 00:52:21,300 --> 00:52:28,000 And this form looks a lot like this. 919 00:52:28,000 --> 00:52:29,880 Do you see this? 920 00:52:29,880 --> 00:52:31,950 You see I have a sum of the deviations 921 00:52:31,950 --> 00:52:36,480 between the experiment and a theoretical sort of thing, 922 00:52:36,480 --> 00:52:39,210 divided by some variance? 923 00:52:39,210 --> 00:52:42,800 And so this is the motivation, where this comes from: 924 00:52:42,800 --> 00:52:50,550 the probability that I would make 925 00:52:50,550 --> 00:52:55,470 this observation experimentally would be maximum 926 00:52:55,470 --> 00:53:00,460 if this quantity in the exponent is as small as possible. 927 00:53:00,460 --> 00:53:02,550 So I'm going to try to minimize that quantity, 928 00:53:02,550 --> 00:53:05,080 and that's exactly what I'm doing over here. 929 00:53:05,080 --> 00:53:06,502 Is that all right? 930 00:53:06,502 --> 00:53:07,960 OK, so next time when we come back, 931 00:53:07,960 --> 00:53:10,490 I'll talk more about how we actually do it.
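[Editor's note: a sketch of the minimization just set up, continuing the hypothetical decay model from the earlier note. The synthetic data, the sigma values, and the use of fminsearch are illustrative assumptions; the objective is the weighted sum of squares from the board. Minimizing it is the same as maximizing a Gaussian likelihood proportional to exp(-chi2/2), which is the connection the lecture is pointing at.]

```matlab
% Weighted least squares: adjust theta to minimize
% chi2(theta) = sum_i ((y_data_i - y_model_i(theta)) / sigma_i)^2.
y_model = @(x, theta) theta(2) * exp(-theta(1) * x);   % same toy model as above

x_data = [0 1 2 4 8];                 % knob settings used in the "experiment"
sigma  = 0.05 * ones(size(x_data));   % assumed measurement standard deviations
rng(1);                               % synthetic data: truth plus Gaussian noise
y_data = y_model(x_data, [0.5, 2.0]) + sigma .* randn(size(x_data));

chi2      = @(theta) sum(((y_data - y_model(x_data, theta)) ./ sigma).^2);
theta_fit = fminsearch(chi2, [1, 1]); % minimize, starting from a rough guess
fprintf('fitted k = %.3f, C0 = %.3f\n', theta_fit(1), theta_fit(2));
```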