The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

WILLIAM GREEN: All right, so I know some of you have managed to do the homework and some of you, I think, have not. Is this correct?

AUDIENCE: Yeah.

WILLIAM GREEN: OK. So I was wondering if someone who has managed to do their homework might comment on how small a mesh you need to converge.

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: It's about l? L? OK, so you need something on the order of l to converge. Is that correct? So if you're trying to do the problem using a mesh much bigger than l, you should probably try a tighter mesh. Yes?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: All right. Yes?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: Yes. Yes. All right. And has anyone managed to get the [INAUDIBLE] solution to actually be consistent with the [INAUDIBLE] solution?

AUDIENCE: Something like 3% or 4% or so.

WILLIAM GREEN: 3% or 4%, OK. And I assume that the [INAUDIBLE] is also using a mesh of similar size? Hard to tell?

AUDIENCE: I used like a triangular system--

WILLIAM GREEN: Yeah, yeah, but I mean, are they really, really tiny ones at the bottom? If you want, I can just blow it up and take a look to be sure. All right, and is backslash able to handle a million-by-million matrix?

AUDIENCE: Like 10 seconds with [INAUDIBLE].

WILLIAM GREEN: [INAUDIBLE] OK. So you need to do the sparse allocation. And MATLAB is so smart that it can just handle a million by million, which is pretty amazing, actually. That's a pretty big matrix.
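A minimal MATLAB sketch of that point (the tridiagonal test matrix and right-hand side here are assumptions, chosen purely for illustration): if the matrix is allocated as sparse, backslash solves a million-by-million system in seconds, whereas the dense version would not even fit in memory.

% Hedged sketch: sparse allocation lets backslash handle a huge system.
N = 1e6;                               % a million-by-million system
e = ones(N, 1);
A = spdiags([e, -2*e, e], -1:1, N, N); % sparse tridiagonal (illustrative)
b = ones(N, 1);                        % arbitrary right-hand side
tic; u = A \ b; toc                    % fast, because A is stored sparse
% full(A) would need roughly 8e12 bytes, so sparse storage is essential.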
All right, sorry, this is too loud. All right, so last time, we were doing some elementary things about probability. Actually, any more questions about the homework problem before we get started?

AUDIENCE: What's the answer?

WILLIAM GREEN: What's the answer? You could ask your classmates. Any other questions? All right.

So I had you confused a little bit with this formula for the probability of either A or B. I asked what the probability was, when I flipped two coins, that one of them would be a head. And I could see a lot of consternation. The general formula for this is

P(A or B) = P(A) + P(B) - P(A and B).

It can't just be the two of them added together, because if you have a 50% chance of a head for the penny and a 50% chance for the dime, that would add up to a 100% chance that you'll get a head, but you know that's not always true. So this is the formula.

And then the probability of A and B is often written in terms of the conditional probabilities: the probability of A times the probability that B would happen given that A already happened, which is also equal to the other way around,

P(A and B) = P(A) P(B|A) = P(B) P(A|B).

And this has to be read carefully. P(A|B) means B already happened, and then you want to know the probability of A given that B already happened. So it's sort of like--the way I think about it--this happened first, and now I'm checking the probability that the other thing is going to happen.
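Before the polymer example below, a quick brute-force check of these two formulas may help; the two fair coins are the only assumption in this hedged sketch.

% Hedged sketch: verify P(A or B) = P(A) + P(B) - P(A and B) by sampling.
nTrials = 1e6;
penny = rand(nTrials, 1) < 0.5;   % event A: head on the penny
dime  = rand(nTrials, 1) < 0.5;   % event B: head on the dime
pAorB  = mean(penny | dime);      % at least one head
pAandB = mean(penny & dime);      % both heads
fprintf('P(A or B) = %.3f vs P(A)+P(B)-P(A and B) = %.3f\n', ...
        pAorB, mean(penny) + mean(dime) - pAandB);  % both near 0.75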
Now, a nice little example of this is given in [INAUDIBLE] textbook. And I think it's nice enough that it's worthwhile to spend a few minutes talking about it. So [INAUDIBLE], who wrote the textbook, was not actually a numerical guy. He was a polymer chemist. And so he gave a nice polymer example.

So suppose you have a polymer where the monomers are some big molecule, and on one side they have a sort of acceptor group, and on the other side some kind of donor group--we'll call it D, I guess. These are the monomers. And so they can link together: the donor can react with the acceptor. So you can end up with things like this, and so on. So this is the monomer, this is the dimer, and then you could keep on [INAUDIBLE] like this.

And many, many, many of the materials you use every day--the fabrics in the seats that you're sitting on, the backs of the seats, your clothing, the binder holding the chalk together--all this stuff is made from polymers like this. So this is actually a pretty important practical problem.

And so you start with the monomers, and they react, where you have A reacting with D, over and over again. And we want to understand the statistics of what chain lengths we're going to make--maybe what the weight percent or the average molecular weight would be. Something like that would be the kind of thing we care about.

So a way to think about it is, if I've reacted this to some extent and I just grab a random polymer chain, any molecule in there, and I look and find, let's say, the unreacted D end--so any oligomer is going to have one unreacted D end. You can see no matter how long I make it, there will still be one unreacted D end. And I'm neglecting the possibility that this might circle around and make a loop. So assuming no loops, then any molecule I grab is going to have one unreacted D end.

So I grab a molecule. I start at the unreacted D end, and I look at the A that's next to it. And I say, is that A reacted or not? If it's a monomer, I grab the D, I look over here, and the A is unreacted. So the probability that it's a monomer is going to be equal to 1 - p, where p is the probability that an A has reacted. So it didn't react, just like that.
For a dimer, the A next to the D end has reacted. So the probability of a dimer is the probability that my nearest neighbor reacted and my next neighbor is unreacted, right? Is that OK?

So I can write it this way. I could say it's the probability that my nearest neighbor reacted, times a conditional probability: the probability that the next one is unreacted given that the nearest one reacted.

So far, so good? You guys are OK with this? So I grabbed a chain, and I'm trying to see if it's a dimer. I'm going to calculate the probability that this next acceptor group has reacted with a donor group. If it has reacted, then I'm going to check the next one after that. So this is the nearest neighbor, this is the next nearest neighbor, and I want that one to be unreacted. If both are true, then I have a dimer. If either one of those is false, [INAUDIBLE]. Is that OK?

So now I need to have a probability. So what's the probability that the nearest one has reacted? There's some probability that things have reacted. So this is going to be my p, the probability that things reacted. And I want the next one to be unreacted.

Now, there's a question: are these correlated or not? In reality, everything's correlated to everything, so probably they're correlated. But if we're trying to make a model and think about it, the fact that this thing reacted on this side doesn't really affect the other side if this is a big enough [INAUDIBLE]. So to a good approximation, this is independent of whether or not the first one reacted. So this is still going to have the ordinary probability of being unreacted, which would be 1 - p.

So I could write down that the probability of being a monomer is 1 - p. The probability of being a dimer is p(1 - p). What's the probability of being a trimer? p^2 (1 - p). And in general, the probability of being an n-mer is

P(n) = p^(n-1) (1 - p).
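As a quick hedged check of that formula (the conversion p and the sample size are assumed, for illustration only), one can simulate the walk down the chain directly: keep adding units as long as each successive A has reacted, which happens with probability p, and compare the histogram of lengths to p^(n-1)(1-p).

% Hedged sketch: simulate chain lengths and compare to P(n) = p^(n-1)*(1-p).
p = 0.8;  nChains = 1e5;
len = ones(nChains, 1);
for i = 1:nChains
    while rand < p              % the next A has reacted: add one more unit
        len(i) = len(i) + 1;
    end
end
for n = 1:5
    fprintf('n = %d  simulated %.4f  formula %.4f\n', ...
            n, mean(len == n), p^(n-1) * (1-p));
end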
So now you guys are statistical polymer chemists. This was worked out by a guy named Flory. He got the Nobel Prize; he's a pretty important guy. If you want to learn a lot about him, I think both Professor Cohen and Professor Rutledge teach classes that are basically "learn what Mr. Flory figured out." Well, maybe that's a little bit too strong, but pretty much. There's another guy named [INAUDIBLE] who did a bit too, so [INAUDIBLE] and Flory. Basically everything about polymers was worked out by these guys. And all they did was just probability theory, so it was a piece of cake.

And so this is the probability that you have an n-mer. So now we can compute things like, what is the expectation value of the chain length? How many guys link together? And that's defined to be

<n> = sum over n of n P(n),

which, in this case, is going to be

<n> = sum over n of n p^(n-1) (1 - p).

Now, for a lot of these kinds of simple series summations, there are formulas. Maybe in high school you studied series--I don't know if you remember--so you can look them up, and some of these have analytical formulas that are really simple. But you can just leave it this way too, because you can get a value numerically with MATLAB, no trouble.

You can also figure out the concentration of oligomers with n units in them. And that's going to be equal to the total concentration of polymer molecules times the probability that a molecule has n units. The probability, we just worked out. For the total concentration, a way to figure that out is to think that there's one polymer molecule per unreacted D end--and I'll call the monomer a polymer too; it's a polymer with one unit. So really, I want to know how many ends are unreacted. And that's going to be 1 - p times the amount of monomer I had to start with. It could be A or D; it doesn't matter.
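Numerically, that looks something like this; the conversion p and the initial monomer concentration A0 are assumed values, and the infinite sum is truncated at a large n. The computed mean matches the known geometric-series result <n> = 1/(1 - p).

% Hedged sketch: Flory distribution moments, with illustrative values.
p  = 0.95;                  % assumed extent of reaction
A0 = 1.0;                   % assumed initial monomer concentration
n  = (1:1e4)';              % truncate the infinite sum at a large n
Pn = p.^(n-1) * (1-p);      % probability a random chain is an n-mer
fprintf('sum(Pn) = %.6f (should be ~1)\n', sum(Pn));
nMean = sum(n .* Pn);       % number-average chain length
fprintf('<n> = %.3f vs analytic 1/(1-p) = %.3f\n', nMean, 1/(1-p));
cTotal = (1-p) * A0;        % one molecule per unreacted D end
cN = cTotal * Pn;           % concentration of each n-mer
fprintf('monomer concentration = %.4g\n', cN(1));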
It's like, how many of them--I started with a certain amount of free ends. What fraction of them have reacted? It's based on 1 - p. Yeah, it's 1 - p. So as p goes--well, yeah, it goes backwards. Yeah, as p goes to infinity--I think that's right. Yeah, when p is--well, I'm totally confused here now. Does 1 - p sound right? Maybe I did the reasoning backwards. This is definitely the right formula; I'm just confusing myself with my language. This is, at least for me, an endemic problem with probability: you can say things very glibly, but you've got to think about exactly what you mean.

So, the concentration of unreacted ends: initially, this was equal to A0; it was all unreacted ends. And as the process proceeds, as p increases, then at the end it's going to be very small. So this is right. And the concentration of unreacted ends is equal to the total concentration of polymers, the number of polymer molecules [INAUDIBLE]. So it's this times p^(n-1) times [INAUDIBLE].

All right, and this is called the Flory distribution. And that gives the concentrations of all your oligomers after you do a polymerization, if they're all uncorrelated and you don't form any loops.

It's often very important to know the width of the distribution. If you make a polymer, you want to make things as monodisperse as possible, because you'd really like to make a pure chemical. There's some polymer chain length which is optimal for your purpose. You want to try to make sure that the average value is equal to the value you want. So you want to keep running p up until you reach the point where the average chain length is the chain length that's optimal for your application. If you make the polymer too long, then it's going to be hard to dissolve, it's going to be hard to handle, and it's going to be solid.
If you make it too short, then it may not have the mechanical properties you need the polymer to have. So there's some optimal choice. So you typically run the conversion until p reaches a number such that this average is your optimal value, but then you care about the dispersion around that optimal value.

And particularly, the unreacted monomers that are left might be a problem, because they might leach out over time--they might still be liquids, or even gases that come out. There was this famous problem where people made baby bottles, and they had some leftover small molecules in the baby bottles. And then those can leach out into the milk, and the mothers don't appreciate that. So there are a lot of real practical problems about how to do this. So anyway, you'd be interested in the width of the distribution.

So we define what's called the variance. The variance of n is written sigma_n^2, and it's just defined to be the expectation value of n squared minus the square of the expectation value of n:

sigma_n^2 = <n^2> - <n>^2.

These two are always different, or almost always different, so it's not 0. So this is equal to

sigma_n^2 = sum over n of n^2 P(n) - (sum over n of n P(n))^2.

All right? And a lot of times in the polymer field, what they'll do is take the square root of this and compare sigma_n divided by the expectation value of n. This is a dimensionless number: sigma_n^2 has the dimensions of n^2, so sigma_n has the dimensions of n, and <n> has the dimensions of n, so the ratio is dimensionless. And I think they call that the dispersity of the polymer, something like that.
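Continuing the same hedged sketch (same assumed p and truncation), the variance and the dispersity can be checked directly; for this particular distribution the ratio works out analytically to sqrt(p).

% Hedged sketch, continuing with the same illustrative values as above.
p  = 0.95;
n  = (1:1e4)';
Pn = p.^(n-1) * (1-p);
nMean  = sum(n .* Pn);             % <n>
n2Mean = sum(n.^2 .* Pn);          % <n^2>
sigmaN = sqrt(n2Mean - nMean^2);   % sigma_n^2 = <n^2> - <n>^2
fprintf('dispersity sigma_n/<n> = %.4f vs sqrt(p) = %.4f\n', ...
        sigmaN / nMean, sqrt(p));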
Now, notice that when we write these averages this way, it's implicit that these sums are divided by the summation of the probabilities of n. But because these probabilities sum to 1, I can just leave that out. But sometimes it may be difficult for you to figure out exactly what the probabilities are, and you'll need a scaling factor to force that sum to be equal to 1. So sometimes people leave these sums in the denominator.

There's another thing you might care about, which would be, what's the weight percent of Pn--so, what fraction of the weight of the polymer is my particular oligomer? [INAUDIBLE] sorry, some special one, Pm. And I want to know its weight percent. So that's going to be equal to the weight of Pm in the mix divided by the total weight. So that's equal to the weight of an m-mer times the probability of m, divided by the total weight, which is going to be the weight of each of these species times the probability of each of them, summed:

w(m) = m P(m) / sum over n of n P(n).

And you can see this is different. This is not the same as P(m), right? It's not the same thing. So just watch out when you do this. And in fact, in the polymer world, they always have to say "I did weight average" or "I did number average," because they're different.

Is this OK? Yeah? So my general advice--at least for me, if I skip steps, I always get it wrong when I do probability. So don't skip steps. Do it one by one by one, what you really mean. Then you'll be OK.
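For the same illustrative Flory distribution, here is a hedged sketch of how different the two averages are; the monomer molecular weight cancels out of the weight fractions, so it never appears.

% Hedged sketch: number fraction vs. weight fraction, illustrative p.
p  = 0.95;
n  = (1:1e4)';
Pn = p.^(n-1) * (1-p);         % number fraction of n-mers
wn = n .* Pn / sum(n .* Pn);   % weight fraction of n-mers
[~, iW] = max(wn);             % weight fraction peaks near n = p/(1-p)
fprintf('number fraction peaks at n = 1; weight fraction peaks at n = %d\n', ...
        n(iW));
fprintf('number-average <n> = %.2f, weight-average = %.2f\n', ...
        sum(n .* Pn), sum(n .* wn));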
All right, now, this was a cute little example with discrete variables; it's easy to count everything. Very often, we care about probability distributions of continuous variables. And for those we have to use the probability density functions that I talked about last time, which have units in them.

And so, as we mentioned last time, if you want to know the probability that a continuous variable x is a member of the interval from x̂ to x̂ + dx, the probability that this is true is

P(x ∈ [x̂, x̂ + dx]) = Px(x̂) dx.

And so the quantity Px has units of 1 over x, whatever the units of x are. And then you have to multiply it by dx in order to get something dimensionless, which is what a probability is.

And there are some obvious properties, like that the integral of Px(x') dx' over all possible values of x has to be equal to 1. It's a probability, and this is the same as saying that the probability that x has some value, anywhere, is 1. So there's some [INAUDIBLE] you measure.

You can also have the probability that x is less than or equal to x'. And that's the integral from negative infinity to x' of Px(x) dx.

And the mean is just the integral of x Px(x) dx. And you can compute the average of x squared the same way. You can compute the average of anything.

You can put these together, and you get

sigma_x^2 = <x^2> - <x>^2.

So that's the variance of x.

You can also do this with any function. So you can say that the average value of a function is

<f> = integral of f(x) Px(x) dx.

This is the average value of a function of a random variable described by the probability density function Px. And then you can get things like

sigma_f^2 = integral of f(x)^2 Px(x) dx - <f>^2.

All right? Everything's OK? Yeah.
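All of those definitions can be evaluated by quadrature when the density is known; here is a minimal hedged sketch, with an exponential density and the function f assumed purely for illustration.

% Hedged sketch: moments of a continuous PDF by numerical integration.
lambda = 2.0;
Px = @(x) lambda * exp(-lambda * x);          % assumed density on [0, inf)
normP = integral(Px, 0, Inf);                 % should be 1
xMean = integral(@(x) x    .* Px(x), 0, Inf); % <x>   (= 1/lambda here)
x2    = integral(@(x) x.^2 .* Px(x), 0, Inf); % <x^2> (= 2/lambda^2 here)
varX  = x2 - xMean^2;                         % variance of x
f     = @(x) sin(x);                          % any function of interest
fMean = integral(@(x) f(x) .* Px(x), 0, Inf); % <f(x)>
fprintf('norm = %.4f, <x> = %.4f, var = %.4f, <f> = %.4f\n', ...
        normP, xMean, varX, fMean);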
All right, so a lot of times, people are going to say we do sampling from Px. So sampling from Px means that we have some probability density function, Px(x), and we want to have one value of x that we draw from that probability distribution. When we say it that way, we mean that we're more likely to find x's where Px has a high value, and we're less likely to draw an x value where Px has a low value. So that's what "sampling from" means. Now, you can do that mathematically using random number generators in MATLAB, for example, and we'll do that sometimes.

But you do it all the time when you do experiments. So the experiment has some probability density function for what you're going to observe, what you're going to measure. And you don't know what that distribution is, but every time you make a measurement, you're sampling from that distribution. So that's the key conceptual idea: there is a Px(x) out there for our measurement. Say you're trying to measure how tall I am. Every time you measure it, you're drawing from a distribution of experimental measurements of Professor Green's height. And there is some Px(x) that exists even though you don't know what it is. And each time you make the measurement, you're drawing numbers from that distribution.

And if you draw a lot of them, then you can do an average, and it should be an average that's close to this one. If you drew an infinite number of values, then you're sampling this distribution. You can make a histogram plot of the heights you measure of me, and it should have some shape that's similar to Px(x). Does that make sense? All right.

So actually, every day you're drawing from probability distributions; you just didn't know it. It's like [INAUDIBLE] the street: what's the probability the bus is going to hit me or not, that the bus driver is going to stop? I think there's a high probability, but I'm always a little worried, actually. I'm drawing a particular instance from that probability distribution about whether the bus driver's really going to stop or not. And if I sample enough times, I might be dead. But anyway, all right.

Often we have multiple variables. So you can define Px(x̂) for a vector x̂. So now I have multiple x's--more than one variable. And I want the probability density function of all of them: I'm going to measure this and this and this and this, all right? And this is equal to the probability that x1 is a member of the interval from x1 to x1 + dx1, and x2 is a member of the interval from x2 to x2 + dx2, and so on. That's what a probability density function means with multiple variables.
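Here is a small hedged sketch of that multivariable meaning; the correlated two-variable Gaussian is assumed just so there is something concrete to sample. The fraction of samples landing in a small box approaches the density at that point times dx1 dx2.

% Hedged sketch: sampling a two-variable PDF and checking the box meaning.
Sigma = [1.0, 0.6; 0.6, 0.5];   % assumed covariance of (x1, x2)
R = chol(Sigma);                % R' * R = Sigma
X = randn(1e6, 2) * R;          % each row is one sample of (x1, x2)
dx = 0.1;                       % small box [0,dx] x [0,dx]
pBox = mean(X(:,1) >= 0 & X(:,1) < dx & X(:,2) >= 0 & X(:,2) < dx);
pDens = dx^2 / (2*pi*sqrt(det(Sigma)));  % Px(0,0)*dx1*dx2 for this Gaussian
fprintf('fraction in box = %.5f vs Px(0,0) dx1 dx2 = %.5f\n', pBox, pDens);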
So this is very common for us, because we often measure more than one thing in an experiment, right? So you measure the flow rate and the temperature. You measure the yield and the absorption at some wavelength that corresponds to an impurity. When you do an experiment, you often measure multiple things. And so you're sampling from multiple observables simultaneously. And implicitly, you're sampling from some complicated PDF like this, even though you usually don't know the shape of the PDF to start with.

And so when you have this multiple-variable case, you can define a thing called the covariance matrix, where the elements of the matrix are

Cij = <xi xj> - <xi> <xj>.

And so you can see that, for example, sigma_i^2 = Cii: the diagonal elements are just the variances. But now we also have the covariances, because we measured, let's say, two things.

All right, so suppose we do N measurements and we compute the average of our repeats. So we just repeat the same measurements over and over. So suppose you measure my height and my weight. Every time I go to the medical clinic, they always measure my height, my weight, my blood pressure--you've got three numbers. And I could go back in there 47 times, and they'll do it 47 times. And if a different technician measured it using a different [INAUDIBLE] and a different scale, I might get a different number. Sometimes I forget to take my shoes off, so I'm a little bit taller than I would have been. So the numbers go up and down. They fluctuate, right? You'd expect that, right? If you looked at my medical chart, it's not the same number every time.

But you'd think, if everything's right in the world--I'm an old guy; I've been going to the medical clinic for a long time--that if I look at my chart and average all those numbers, it should be somewhere close to the true values of those numbers.
So I should have that the average values experimentally, which I just define to be the averages

x̄ = (1/N) sum over the N experiments of x_n,

where N is the number of experiments--so I can have these averages. And I would expect that as N goes to infinity, I hope that my experimental averages go to the same value of <x> that I would have gotten from the true probability distribution function. If I knew what Px(x) was and I evaluated the integral and got <x>, I think it should be the same as the experiment, as long as I did enough repeats. So this is almost like an article of faith here, yeah? It's what you'd expect.

Now, the interesting thing about this--I mean, probably you've done this a lot. You've probably done experiments and averaged some things before, right? If everybody in the class tried to measure how tall I was, you all wouldn't get the same number. But you'd think that if you took the average of the whole classroom, it might be pretty close to my true height, right?

So the key idea here is the experimental variance of the x measurements, which we define to be--maybe we should do this one at a time. [INAUDIBLE] Then I can have a vector of these for all the different measurements. So there's some error in my height, there's some error in my weight, there's some different error in my blood pressure measurement, and each should have its own variance. And I can have the covariances too.

OK, so these are all the experimental quantities. You've maybe even computed all of these before in your life. And we expect that this experimental average should go to the true mean as N goes to infinity. Now, what's going to happen to these variances as N goes to infinity? That's the really important question.
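In MATLAB, those experimental quantities are one-liners; the repeated clinic-style measurements below are fabricated, purely as a stand-in for real data.

% Hedged sketch: sample mean, variances, and covariance matrix of repeats.
N = 47;                                 % number of repeat visits
trueVals = [178, 75, 120];              % assumed true height/weight/BP
noise    = [1, 2, 8];                   % assumed measurement scatter
data = trueVals + randn(N, 3) .* noise; % one row per visit
xBar = mean(data);                      % experimental averages
s2   = var(data);                       % experimental variances
C    = cov(data);                       % covariance matrix; diag(C) = s2
fprintf('means: %.1f %.1f %.1f\n', xBar);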
So there's an amazing theorem called the central limit theorem of statistics. And what this theorem says is that as N gets large, and if the trials are uncorrelated and the x's aren't correlated with each other--which is the same as saying that Cij is equal to 0 off the diagonal--then the probability of observing a particular average x̄ is proportional to the Gaussian, the bell curve. All right?

So this is only true as N gets very large. It doesn't specify exactly how large N has to be, but it's true for any Px, any distribution function, probability distribution function. So everything becomes a bell curve if you look at the averages. And the variance of the mean, in that limit, goes to

sigma_i^2 (of the mean) -> (1/N) sigma_xi^2 (experimental).

And this is really important. What this says is that the width of this Gaussian distribution gets narrower and narrower as you increase the number of repeated experiments, or increase the number of samples. So this is really saying that the uncertainty in the mean is scaling as 1 over root N, where N is the number of samples or the number of experiments that are repeated.

Now, sigma, the variance itself, is not like that at all. That quantity, as you increase N, just goes to a constant. It goes to whatever the real variance is--which, if you're measuring me, might reflect how good your ruler is or something. It'll tell you roughly what the real variance is. And that number does not go to 0 as the number of repeats grows. I mean, I could get the whole student body at MIT to measure how tall I am, and they're still not going to have 0 variance. There's still going to be some variance, right? So that quantity stays constant as N increases, or goes to a constant value once it stabilizes--you have to have enough samples. But this other quantity, the uncertainty in the mean value, gets smaller and smaller and smaller, as 1 over the square root of N. Now, this is only true in the limit as N is large.
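Both claims are easy to see numerically. In this hedged sketch the underlying density is deliberately non-Gaussian (a uniform density, an assumed stand-in), and the spread of the averages still shrinks like 1 over root N while the sample variance itself settles to a constant.

% Hedged sketch of the central limit theorem with a uniform density.
nReps = 1e4;                          % how many independent averages to form
for N = [1, 10, 100, 1000]
    X = rand(N, nReps);               % each column: one experiment of N draws
    xBar = mean(X, 1);                % the N-sample averages
    fprintf('N = %4d  std of mean = %.4f  (sigma_x/sqrt(N) = %.4f)\n', ...
            N, std(xBar), sqrt(1/12)/sqrt(N));  % uniform: sigma_x^2 = 1/12
end
% histogram(xBar) at N = 1000 looks Gaussian even though Px is flat.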
Now, this is a huge problem, because experimentalists are lazy, and you don't want to do that many measurements. And it's hard to do a measurement. So for example, the Higgs boson was discovered, what, a year and a half ago, two years ago? And I think altogether they had like nine observations or something when they reported it, OK? So nine is not infinity. And so they don't have infinitely small error bars on that measurement. And in fact, who knows if it really looks like a Gaussian distribution from such a small sample, but they still reported a 90% confidence interval using the Gaussian distribution formula to figure out the confidence intervals.

So everybody does this. If N is big, it should be right. And you can prove mathematically that it's right, but the formula doesn't really tell you how big is big. So this is a general problem. And it leads to us oftentimes misestimating how accurate our results are, because we're going to use formulas that are based on assuming that we've averaged enough repeats that we're in this limit, where we can use the Gaussian formulas and get this nice limiting formula. But in fact, we haven't really reached that limit, because we haven't done enough repeats. So anyway, this is just the way life is. That's the way life is.

And I think there are even discussions in statistics journals about how to make corrections and use slightly better forms that capture the fact that your distribution of the mean doesn't narrow down to a beautiful Gaussian so fast--it has some stuff in the tails. People talk about that: low-probability events out in the tails of distributions, stuff like that. So that's a big field of statistics. I don't know too much about it, but it's very practical, because unfortunately, oftentimes in chemical engineering we make so few repeats that we have no chance to figure out what the tails are doing--maybe [INAUDIBLE] our tails. And so this is a big problem for trying to make sure you really have things right.

So I would say, in general, this is an optimistic estimate of what the uncertainty in the mean is.
Uncertainties are usually bigger. So you shouldn't be surprised if your data doesn't match a model as brilliantly well as predicted by this formula. Now, if it's off by some orders of magnitude, you might be a little alarmed--and that might be the normal situation, too. But anyway, if it's just off by a little bit, I wouldn't sweat it, because you probably haven't done enough repeats to be entitled to such a beautiful result as this.

We can write a similar--actually, so here I assumed that the x's are uncorrelated. That's almost never true. If you actually numerically evaluate the C's, usually they have off-diagonal elements. For example, my weight and my blood pressure are probably correlated, and so you wouldn't expect them to be totally uncorrelated. And so there's another formula like this--it's given in the notes by Joe Scott--that includes the covariance. And you just get a different form from what you'd expect, OK? And the covariances should also converge, roughly as 1 over N, if you have enough samples. So you should eventually get some covariance.

You can write very similar formulas like this for functions. So if I have a function f(x), and that's really what I care about--remember, I said that the average value of f is

<f> = integral of f(x) Px(x) dx.

And I could make these vectors if I want. And I could evaluate my function repeatedly, and I'd get some number. And I could compute the variance; I have a sigma_f. And this is something I like to do a lot of times.

Then we can do the experimental estimate of f--so we usually, or often, don't know what the probability distribution function is, so we'll try to evaluate this experimentally. This is going to be

f̄ = (1/N) sum over n of f(x_n),

the values of f at the n-th trial. And we could write a similar thing for sigma_f, which I just did right there. You can do the same thing; just make these experimental values now.
The experimental variance of the mean of f should go to

sigma_f̄^2 -> (1/N) sigma_f^2,

1 over N times the variance. And this is the sigma in the mean of f: 1 over N times the variance of f. All right, so this is the same beautiful thing: the uncertainty in the mean value of f narrows with the number of trials. So you have some original variance that you computed here, either experimentally or from the PDF--experimentally is fine. And now you want to know the uncertainty in the mean value, and that drops down with the number of trials, the number of things you average.

So this all leads in two directions. What we're going to talk about first is comparing models versus experiments, where we're sampling by doing the experiment. So that's one really important direction, maybe the most important one. But it also suggests ways you could do numerical integration.

So if I wanted to evaluate an integral that looks like this, the integral of f(x) Px(x) dx, and if I had some way to sample from Px, then one way to evaluate this numerical integral would be to--sorry, I made this a vector; [INAUDIBLE] a lot of species there, a lot of dimensions. If I want to evaluate this multiple integral--and it's a lot of integrals, one for every dimension of x--that would be very hard to do, right? We talked about it in [INAUDIBLE]: if you get more than about three or four of these integral signs, usually you're in big trouble trying to evaluate the integral.

But you can do it by what's called Monte Carlo sampling, where you sample from Px and just evaluate the value of f at the particular x points you pull as samples, and just repeat and average. And the average of those things should converge, according to this formula, as you increase the number of samples. And so that's the whole principle of Monte Carlo methods, and we'll come back to that a little bit later. And you can apply that to a lot of problems.
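A minimal hedged sketch of that principle (the five-dimensional Gaussian density and the function f are assumptions, chosen so the exact answer is known): the sample average converges to the integral with an error that shrinks like 1 over root N, regardless of the dimension.

% Hedged sketch: Monte Carlo evaluation of <f> = integral f(x) Px(x) dx.
d = 5;                          % five nested integrals for quadrature
f = @(X) sum(X.^2, 2);          % f(x) = |x|^2; exact <f> = d for N(0, I)
for nSamp = [1e2, 1e4, 1e6]
    X = randn(nSamp, d);        % samples drawn from Px = standard normal
    fBar = mean(f(X));          % Monte Carlo estimate of the integral
    fprintf('N = %7d  estimate = %.4f  (exact = %d)\n', nSamp, fBar, d);
end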
Basically, for any problem you have in numerics, you have a choice: you can use deterministic methods or stochastic methods. Deterministic methods, if you can do them, are usually the fastest and most accurate, but stochastic ones are often very easy to program, and sometimes they're actually the fastest way to do it. In particular, in this kind of case where we have lots of dimensions, many, many x's, it turns out that stochastic methods are a pretty good way to do it.

But we're going to talk mostly about [INAUDIBLE] data, because that's going to be important to all of you in your research. So let's talk about that for a minute. I'll just comment that there are really good notes posted on the [INAUDIBLE] website for all this material, so you should definitely read them. And the textbook has a lot of material too. It's maybe not as easy to read as the notes are, but there's plenty to learn, for sure.

So we generally have a situation where we have an experiment. And what do we have in the experiment? We have some knobs. These are things that we can change. So we can change some valve positions. We can change how much electricity goes into our heaters. We can change the setting on our back-pressure regulator. We can change the chemicals we pour into the system. So there are a lot of knobs that we control. And I'm going to call the knobs x.

And then we have parameters. And these are other things that affect the result of the experiment that we don't have control over. And I'm going to call those theta. So for example, if I do a kinetics experiment, it depends on the rate coefficients. I have no control over the rate coefficients; they're [INAUDIBLE] by God, as far as I know. So they're some numbers, but they definitely affect the result. And if a rate coefficient had a different value, I would get a different result in the kinetics experiment.

The molecular weight of sulfur--I have no control over that. That's just a parameter. But if I weigh something and it has a certain number of atoms of sulfur, that's going to be a very important parameter in determining the result.
795 00:46:08,150 --> 00:46:11,120 So we have these two things. 796 00:46:11,120 --> 00:46:20,550 And then we're going to have some measurables, things 797 00:46:20,550 --> 00:46:21,620 that we can measure. 798 00:46:21,620 --> 00:46:24,590 Let's call them y. 799 00:46:24,590 --> 00:46:28,740 And in general, we think that if we set the x values 800 00:46:28,740 --> 00:46:30,680 and we know the theta values, we should 801 00:46:30,680 --> 00:46:33,030 get some measurable values. 802 00:46:33,030 --> 00:46:37,070 And so there's a y that the model 803 00:46:37,070 --> 00:46:44,490 says, which is a function of the x's and the thetas. 804 00:46:44,490 --> 00:46:46,985 Now, I write this as a simple function like this. 805 00:46:46,985 --> 00:46:48,380 This might be really complicated. 806 00:46:48,380 --> 00:46:50,213 It might have partial differential equations 807 00:46:50,213 --> 00:46:51,284 embedded inside it. 808 00:46:51,284 --> 00:46:53,450 It might have all kinds of horrible stuff inside it. 809 00:46:53,450 --> 00:46:54,920 But you guys already know how to solve all these problems 810 00:46:54,920 --> 00:46:55,940 because you've done it. 811 00:46:55,940 --> 00:46:57,981 You've been in this class through seven homeworks 812 00:46:57,981 --> 00:46:58,940 already. 813 00:46:58,940 --> 00:47:00,654 And so no problem, right? 814 00:47:00,654 --> 00:47:01,820 So if I give you something-- 815 00:47:01,820 --> 00:47:02,810 I give you some knobs. 816 00:47:02,810 --> 00:47:06,705 I give you some parameters-- you can compute it, all right? 817 00:47:06,705 --> 00:47:09,080 And so then the question is-- that's what the model says. 818 00:47:09,080 --> 00:47:10,700 So we could make the forward prediction 819 00:47:10,700 --> 00:47:13,100 of what the model should say if I knew what the parameter 820 00:47:13,100 --> 00:47:17,090 values were, if I knew what the knob values were. 821 00:47:17,090 --> 00:47:22,440 And I want to-- 822 00:47:22,440 --> 00:47:28,040 oftentimes what I measure is y data, 823 00:47:28,040 --> 00:47:30,780 which is a function of the knobs; 824 00:47:30,780 --> 00:47:32,834 it's implicitly a function of the parameters. 825 00:47:32,834 --> 00:47:34,375 I have no control over them, so I'm not 826 00:47:34,375 --> 00:47:35,510 going to even put them in here. 827 00:47:35,510 --> 00:47:36,593 So I set the knobs I want. 828 00:47:36,593 --> 00:47:37,990 I get some data. 829 00:47:37,990 --> 00:47:40,450 I want these two to match each other. 830 00:47:40,450 --> 00:47:43,470 I think they should be the same thing if my model is true, 831 00:47:43,470 --> 00:47:43,970 yeah? 832 00:47:43,970 --> 00:47:45,718 So this is my model, really. 833 00:47:51,780 --> 00:47:54,424 But I don't think they should be exactly the same. 834 00:47:54,424 --> 00:47:56,590 I mean, just like when you try to measure my height, 835 00:47:56,590 --> 00:47:58,290 you don't get exactly the same numbers. 836 00:47:58,290 --> 00:48:02,050 So these y data are not going to be exactly the same numbers 837 00:48:02,050 --> 00:48:03,482 as my model would say. 838 00:48:03,482 --> 00:48:04,940 So now I have to cope with the fact 839 00:48:04,940 --> 00:48:08,610 that I have deviations between the data and the model. 840 00:48:08,610 --> 00:48:11,160 And how am I going to handle that, all right? 841 00:48:15,260 --> 00:48:18,560 And also, we have a set of these guys-- 842 00:48:18,560 --> 00:48:20,550 we typically do some repeats.
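[Editor's note: a tiny hypothetical example of such a model y of x and theta, in MATLAB. The first-order decay form and the parameter names k and C0 are invented for illustration; a real y_model might hide ODE or PDE solves behind this same interface.]

```matlab
% Hypothetical forward model y_model(x, theta): first-order decay.
% Knob x = time at which we sample; parameters theta = [k, C0]
% (rate coefficient and initial concentration, both invented here).
y_model = @(x, theta) theta(2) * exp(-theta(1) * x);

x     = [0 1 2 4 8];          % knob settings: measurement times
theta = [0.5, 2.0];           % assumed parameter values
y     = y_model(x, theta);    % forward prediction of the measurables
```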
843 00:48:20,550 --> 00:48:22,802 So we have like several numbers for each setting 844 00:48:22,802 --> 00:48:25,010 of the x's, and they don't even agree with each other 845 00:48:25,010 --> 00:48:25,820 because they're all different. 846 00:48:25,820 --> 00:48:27,361 Every time I repeated the experiment, 847 00:48:27,361 --> 00:48:29,150 I got some different result-- 848 00:48:29,150 --> 00:48:31,112 that's my y's-- for each x. 849 00:48:31,112 --> 00:48:32,570 And then I change the x a few times 850 00:48:32,570 --> 00:48:33,694 to different knob settings. 851 00:48:33,694 --> 00:48:35,160 Then I make some more measurements. 852 00:48:35,160 --> 00:48:37,370 And I have a whole bunch of y values 853 00:48:37,370 --> 00:48:40,350 that are all scattered numbers that maybe scatter 854 00:48:40,350 --> 00:48:42,660 around this model possibly, if I'm lucky, 855 00:48:42,660 --> 00:48:44,030 if the model's right. 856 00:48:44,030 --> 00:48:47,057 Usually I also don't know if the model's correct. 857 00:48:47,057 --> 00:48:49,390 So that's another thing to hold in the back of your mind: 858 00:48:49,390 --> 00:48:51,320 we're going into this whole comparison 859 00:48:51,320 --> 00:48:53,240 assuming the model's correct. 860 00:48:53,240 --> 00:48:55,640 And then we might, at the end, decide, hmm, maybe 861 00:48:55,640 --> 00:48:56,848 the model's not really right. 862 00:48:56,848 --> 00:48:58,340 I may have to go make a new model. 863 00:48:58,340 --> 00:49:01,110 So that's just a thing to keep in the back of your mind. 864 00:49:01,110 --> 00:49:02,950 But we'll be optimistic to start with, 865 00:49:02,950 --> 00:49:04,340 and we'll assume that the model is good. 866 00:49:04,340 --> 00:49:05,714 And our only challenge is we just 867 00:49:05,714 --> 00:49:11,060 don't have the right values of the thetas, maybe, in my model. 868 00:49:11,060 --> 00:49:12,370 And this is another thing, too. 869 00:49:12,370 --> 00:49:14,690 So the thetas are things like rate coefficients 870 00:49:14,690 --> 00:49:16,610 and molecular weights and viscosities 871 00:49:16,610 --> 00:49:19,387 and stuff that are like properties of the universe, 872 00:49:19,387 --> 00:49:20,720 and they're real numbers, maybe. 873 00:49:20,720 --> 00:49:23,120 They're also things like the length of my apparatus 874 00:49:23,120 --> 00:49:25,010 and stuff like that. 875 00:49:25,010 --> 00:49:28,809 But I don't know those numbers to perfect precision, right? 876 00:49:28,809 --> 00:49:30,350 The best number I can find, if I look 877 00:49:30,350 --> 00:49:32,984 in the database-- you know, you 878 00:49:32,984 --> 00:49:34,400 could find the speed of light 879 00:49:34,400 --> 00:49:36,525 to like 11 significant figures, but I don't know it 880 00:49:36,525 --> 00:49:38,240 to the 12th significant figure. 881 00:49:38,240 --> 00:49:40,144 So I don't know any of the numbers perfectly. 882 00:49:40,144 --> 00:49:42,060 And a lot of numbers I don't even know at all. 883 00:49:42,060 --> 00:49:43,280 So there are some rate coefficients 884 00:49:43,280 --> 00:49:45,170 that no one has ever measured or calculated 885 00:49:45,170 --> 00:49:47,070 in the history of the world. 886 00:49:47,070 --> 00:49:48,930 And my students have to deal with that a lot 887 00:49:48,930 --> 00:49:49,910 in the Green group. 888 00:49:49,910 --> 00:49:52,719 So a lot of these are quite uncertain. 889 00:49:52,719 --> 00:49:54,510 But there are some that are pretty certain.
890 00:49:54,510 --> 00:49:56,736 There's quite a big variance, actually, in how certainly you 891 00:49:56,736 --> 00:49:57,819 know the parameter values. 892 00:50:00,900 --> 00:50:07,140 So one idea, a very popular idea, is to say, you know, 893 00:50:07,140 --> 00:50:12,260 I have this deviation between the model and the experiment. 894 00:50:12,260 --> 00:50:15,490 So I want to sort of do a minimization, by varying, 895 00:50:15,490 --> 00:50:23,110 say, parameter values, of some measure of the error 896 00:50:23,110 --> 00:50:24,561 between the model and the data. 897 00:50:32,100 --> 00:50:35,376 Somehow, I want to minimize that. 898 00:50:35,376 --> 00:50:38,000 And I have to think about, well, what should I really minimize? 899 00:50:38,000 --> 00:50:44,010 And the popular thing to minimize is these guys squared, 900 00:50:44,010 --> 00:50:48,376 and actually to weight them by some kind of sigma 901 00:50:48,376 --> 00:50:49,500 for each one of these guys. 902 00:50:49,500 --> 00:50:51,441 So this is-- we should change the notation, 903 00:50:51,441 --> 00:50:52,190 make this clearer. 904 00:51:23,620 --> 00:51:30,890 These guys-- y model, and it's the i-th measurement 905 00:51:30,890 --> 00:51:33,853 that corresponds to the n-th experiment. 906 00:51:44,220 --> 00:51:49,320 So I think that the difference between what I measured 907 00:51:49,320 --> 00:51:51,690 and what the model calculated should be sort of scaled 908 00:51:51,690 --> 00:51:55,650 by the variance, right? 909 00:51:55,650 --> 00:51:58,530 So I would expect that this sum has 910 00:51:58,530 --> 00:52:00,990 a bunch of numbers that are sort of order of one, 911 00:52:00,990 --> 00:52:03,840 because I expect the deviation to be approximately the scale 912 00:52:03,840 --> 00:52:06,800 of the variance of my measurements. 913 00:52:06,800 --> 00:52:10,750 And if these deviations are much larger than the variance, 914 00:52:10,750 --> 00:52:12,330 then I think my model's not right. 915 00:52:12,330 --> 00:52:14,080 And what I'm going to try to do right here 916 00:52:14,080 --> 00:52:16,570 is adjust the thetas, the parameters, 917 00:52:16,570 --> 00:52:21,300 to try to force the model to agree better with my experiment. 918 00:52:21,300 --> 00:52:28,000 And this form looks a lot like this. 919 00:52:28,000 --> 00:52:29,880 Do you see this? 920 00:52:29,880 --> 00:52:31,950 You see I have a sum of the deviations 921 00:52:31,950 --> 00:52:36,480 between the experiment and a theoretical sort of thing, 922 00:52:36,480 --> 00:52:39,210 divided by some variance? 923 00:52:39,210 --> 00:52:42,800 And so this is the motivation, where this comes from: 924 00:52:42,800 --> 00:52:50,550 the probability that I would make 925 00:52:50,550 --> 00:52:55,470 this observation experimentally would be maximum 926 00:52:55,470 --> 00:53:00,460 if this quantity in the exponent is as small as possible. 927 00:53:00,460 --> 00:53:02,550 So I'm going to try to minimize that quantity, 928 00:53:02,550 --> 00:53:05,080 and that's exactly what I'm doing over here. 929 00:53:05,080 --> 00:53:06,502 Is that all right? 930 00:53:06,502 --> 00:53:07,960 OK, so next time when we come back, 931 00:53:07,960 --> 00:53:10,490 I'll talk more about how we actually do it.
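[Editor's note: a sketch of the minimization just set up, continuing the hypothetical decay model from the earlier note. The synthetic data, the sigma values, and the use of fminsearch are illustrative assumptions; the objective is the weighted sum of squares from the board. Minimizing it is the same as maximizing a Gaussian likelihood proportional to exp(-chi2/2), which is the connection the lecture is pointing at.]

```matlab
% Weighted least squares: adjust theta to minimize
% chi2(theta) = sum_i ((y_data_i - y_model_i(theta)) / sigma_i)^2.
y_model = @(x, theta) theta(2) * exp(-theta(1) * x);   % same toy model as above

x_data = [0 1 2 4 8];                 % knob settings used in the "experiment"
sigma  = 0.05 * ones(size(x_data));   % assumed measurement standard deviations
rng(1);                               % synthetic data: truth plus Gaussian noise
y_data = y_model(x_data, [0.5, 2.0]) + sigma .* randn(size(x_data));

chi2      = @(theta) sum(((y_data - y_model(x_data, theta)) ./ sigma).^2);
theta_fit = fminsearch(chi2, [1, 1]); % minimize, starting from a rough guess
fprintf('fitted k = %.3f, C0 = %.3f\n', theta_fit(1), theta_fit(2));
```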