The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

WILLIAM GREEN: So today we're going to talk about Bayesian parameter estimation, and parameter estimation in general.

So last time we were writing down the expressions for the probability of observing a mean measurement if you know what the model is. So let's try to do that again. Suppose I have a model that predicts some observable, and it depends on some knobs, and it depends on some parameters. And suppose, because I have great powers of faith, that I believe this model is 100% correct with every fiber of my being. And also because I have tremendous confidence in all the people who built my apparatus, and the knobs that I turn actually correspond to the real values, and I have tremendous confidence in all the literature that reports the parameter values. And so I'm absolutely certain that this is the truth.
So we'll start from a position of absolute certainty, and then we'll degrade into doubt as the lecture goes on. So let's start from the position of someone who has absolute faith that this model is true.

So I have a model, and I really believe this model. So for example, I believe that the kilogram weight at the SI institute in Paris weighs exactly one kilogram. I believe that with every fiber of my being. I'm completely confident that model is correct. So there are some things I'm really confident about. That's one. And maybe you guys have some things you really believe, too. So let's go with things we really believe.

So I plan to conduct some experiments that measure this observable and are related to this model. And so I'm going to do 10 repeats of measuring y. So I'm going to get the kilogram blob that's in Paris, and I'm going to stick it on my really expensive scale that I really believe is great, and I'm going to measure its weight. And then I'm going to put it back, and measure it again, and again, and again.
And I'm going to get another really great scale that I really believe is great, and I'm going to measure it there, too. So I've got a lot of repeats of measuring the weight of this kilogram, and I believe it's really a kilogram. But the stupid measurements don't say a kilogram. They say, you know, 1.0003, 0.99995, all kinds of numbers not equal to one kilogram.

So now I'm going to try to figure out what the probability is that I would have measured some particular value y. So what is the probability that my experimental mean is between some value, say, y and y plus dy? So that's a question for you. What's the probability?

Sorry, what?

AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: OK, so we think that the probability that y is in this interval, given that the model is true, and I know the theta values perfectly, and I know the x values perfectly, is equal to some integral of what? The bounds of the integral are probably y to y plus dy, do you believe that? What's the integrand?

Sorry, what?

AUDIENCE: The probability [INAUDIBLE] as a function of y.

WILLIAM GREEN: Right, so what is it?
AUDIENCE: [INAUDIBLE]

WILLIAM GREEN: You wrote it down last time, I think. So this [INAUDIBLE] is large? Standard normal, right? So it should be one over sigma root of two pi. Does that sound OK? I mean here. It's probably the same, it's fine. Yep.

AUDIENCE: What does that notation mean, if your model is true?

WILLIAM GREEN: So this means: given that the model is true, and I know these theta values are exactly certain numbers, and the x values are exactly certain numbers, what's the probability that I would make a measurement whose average would fall in this interval? So this line means, given that this is true, what's the probability of that?

OK, is this right? Is this surprising? This is OK? So this is what I mean. So we say that our probability distribution converges to a Gaussian distribution; this is what we expect. So we expect n to have been large enough for this to be true. Yeah? This is very important. This is like the whole course, actually. This is the whole section, this one equation.
So I just wanted to make sure you really get what this says. And if you don't like the integral, you can make dy really small, and then it's just this times dy. OK?

Actually, this notation is like [INAUDIBLE]. I think I should do it this way. I should do this. This is a number. Let's get rid of the integral. Let's make dy really small. I'll make it [INAUDIBLE]. Is that all right?

So this is the probability density that we would observe, this is the experimental value y that we observe for the mean, and this is the little width of our tiny little interval. Is that all right? Yes?

AUDIENCE: So is sigma the [INAUDIBLE] on there?

WILLIAM GREEN: Ah, what is sigma? That's a great question. We didn't write down what sigma was. What is sigma?

AUDIENCE: Standard deviation?

WILLIAM GREEN: It's not the standard deviation exactly. Standard deviation of the mean, right? So there are two sigmas. We have the sigma of y, of the measurements, and its square is equal to the average value of y squared minus the square of the average value of y.
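As a quick numerical sketch of these quantities: the measurement data, the noise level, and the model prediction of exactly 1.0 below are all invented for illustration, but the formulas are the ones on the board, the variance from averages and the Gaussian density for the mean.

```python
import math
import random

# Hypothetical setup: the model predicts the weight is exactly 1.0,
# and we have n noisy repeat measurements (made-up data for illustration).
random.seed(0)
n = 10000
measurements = [1.0 + random.gauss(0.0, 0.003) for _ in range(n)]

# Variance of the measurements: <y^2> minus <y> squared, as on the board.
mean_y = sum(measurements) / n
mean_y2 = sum(y * y for y in measurements) / n
sigma_y = math.sqrt(mean_y2 - mean_y ** 2)

# The sigma of the *mean* is smaller: sigma_y divided by root n.
sigma_mean = sigma_y / math.sqrt(n)

# Gaussian probability density that the experimental mean lands at ybar,
# given that the model prediction is exactly correct.
def density_of_mean(ybar, model_value, sigma_mean):
    z = (ybar - model_value) / sigma_mean
    return math.exp(-0.5 * z * z) / (sigma_mean * math.sqrt(2.0 * math.pi))

p = density_of_mean(mean_y, 1.0, sigma_mean)
```

Multiplying `p` by a tiny width `dy` gives the probability of landing in that interval, which is the quantity the lecture keeps coming back to.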
So for however many experiments we do, we just compute the average of y squared and the average of y, and subtract them. That's the variance. And then the sigma that I used in that equation there is sigma y divided by the square root of n. And we call this the standard deviation of the mean; it's the uncertainty in the mean value of y. And the central limit theorem says that as long as n gets really large, we expect that this should converge to this. And we talked last time about how when n gets bigger, these averages don't really change; they're just the averages. But this number declines as n gets big, because of this factor of root n in the denominator.

And to understand that, suppose I measure the weight, and it should be around one kilogram, but in fact my measurements are scattered all over here. Lots of measurements. So they have a variance, something like this. But if I make a plot, as I run, of the running average: when I run the first two points, I get some average value here. After I run 27 more points, the average value is here.
After I run 1,000 repeats, the average value is here. It's getting pretty close to this, and the uncertainty in this number is getting smaller and smaller as I'm doing better and better averages, averaging more and more repeats. Does that make sense? OK.

So from this key equation, I can derive a lot of things. And it depends what you want to do. So one thing people do a lot is what's called model validation. And what does this mean? It means I have a model, and I believe it's true. I have some parameters, and I believe they're true. But there are some foolish skeptics out there who don't have the faith that I do. And they think that my model's baloney, or my parameter values are wrong, or something. And so to prove I'm right, I'm going to make some experiments. And I'm going to make a plot that shows that the experiment and model agree. Some of you might have done this in your life, yes? Everybody might make a parity plot or something. You've seen these things before. Now, this is like a confidence builder.
You're trying to get the skeptics out there to believe that there's some evidence to back up your faith that this model is perfect. And what you really want to know is: for the measurement that I make, the average of my 10,000 repeated measurements, I expect that this quantity should be pretty big, somehow, in some way. But then quantitatively saying what that means, exactly what's a good fit and what's a bad fit, is actually kind of a difficult question, and we'll come back to it. But that's a very common use of this equation: to try to do validation.

Now, because it's kind of complicated, most people don't actually do it. So instead, what they do is they just plot some data points, and they plot your model curve. And as long as they look good, then you're done. So that's the normal way it's done in the literature currently. But of course, that's completely unquantitative. It doesn't really say whether the model and the data really agree; it just means they look sort of like each other. So that's like a human qualitative thing.
Now, if the purpose of validation is just to convince humans, then you've served the purpose. But if your purpose is to try to quantitatively say something, then you really have to get into this equation, which usually is not done but would be the right thing to do for validation.

Now, the alternative view is disproving a model. And I'll just say that there are several ways this can happen. You can try to disprove a model, but you might also show that the theta values are incorrect. Or you might show that the experiment is wrong. These are all possibilities, reasons why the model and the data might not agree with each other.

So this equation only holds if the model is really true, if the parameter values are all perfectly correct, and if we know exactly what all the knob values are. If any of those things are not true, then you should see some discrepancy, and there should be a way to show it. And really what you're showing is that you observed some y that is very unlikely to be observed. The probability of observing that y is extremely small if all these other things were true.
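In code, that check is just evaluating the Gaussian density from before at the observed mean. The numbers below (model value, observed mean, sigma of the mean) are invented to make the point:

```python
import math

def density_of_mean(ybar, model_value, sigma_mean):
    """Gaussian density for the experimental mean, assuming the model,
    its parameters, and the knob settings are all exactly right."""
    z = (ybar - model_value) / sigma_mean
    return math.exp(-0.5 * z * z) / (sigma_mean * math.sqrt(2.0 * math.pi))

# Hypothetical numbers: model says 1.0 kg, measured mean 1.0003 kg,
# standard deviation of the mean 3e-5 kg.
model_value = 1.0
ybar = 1.0003
sigma_mean = 3e-5

z = abs(ybar - model_value) / sigma_mean        # 10 sigmas out
p = density_of_mean(ybar, model_value, sigma_mean)
p_peak = density_of_mean(model_value, model_value, sigma_mean)

# Compare the density at the observation to its peak: a tiny ratio says
# this observation would be astronomically unlikely if everything were true.
ratio = p / p_peak   # equals exp(-z^2 / 2)
```

A ratio this small does not tell you *what* is wrong, only that something is: model, parameters, knobs, or the measurement itself.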
So if all these things are true, and you compute this value, and this value is very tiny, then it makes you think that it's unlikely that you would have observed that. And therefore, you might try to use that as an argument to say that something must be wrong. The model's wrong, the parameters are wrong, the knobs are wrong, my y values are wrong; it could be any of those things.

So these are often the most exciting papers to publish. You publish a paper, you take some model that a lot of people believe, and you tell them they're full of baloney, it's completely wrong: my great experiment shows you are completely wrong. And so you'll see a lot of these in Nature. I should warn you, a lot of those get retracted later; there's a very high retraction rate in Nature. Because they want to publish papers like that, that show that the common view is incorrect, and sometimes it's true. But oftentimes the common view is actually correct, and there's something wrong with the experiment, or the interpretation, or how they computed this equation, or whatever.
And so it turns out the common view is perfectly fine, and it's just that the foolish authors went off on a tangent. And then six months later they have to publish a retraction: by the way, sorry, the paper was completely wrong. And so you see a lot of that. So that's a second kind of thing. And we'll talk more about that a little bit later, too.

And then another thing is that I'll relax my assumptions. So I'll say, well, I'm sure that the model is true, and I'm sure that my knob settings are perfect, and I know what they are. But I'm not really sure about all the parameters. And therefore I want to use the experiment to try to refine the parameter values. So I'm trying to take the y's that I measure and somehow infer something about the thetas. And this is a very common thing to do. So in my group, we've tried to measure the rate coefficient for a reaction. We believe there is a value of that theta, and in fact, we probably have an estimate of what it is. But we're not sure of the exact number, and we'd like to do an experiment to refine the number and get it more accurately determined.
So that's another useful thing to do. And this leads into two somewhat different points of view about this. One you've probably done already, called least-squares fitting. That's one view. And the other is this Bayesian view that I'll tell you about next. So there's sort of A and B: one that I'll call Bayesian, and one I'll call least squares. They're sort of related to each other, but not exactly the same conceptually. So I'll try to explain that.

So the Bayesian view is probabilistic, so it's actually pretty straightforward to write down. Remember that we wrote that the probability of A and B is equal to the probability of A times the probability of B given A, and it's also equal to the probability of B times the probability of A given B. And what we have here is one of these conditional probabilities: if the thetas have a certain value, this is a certain probability. So I should be able to use that formula somehow.
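As a quick sanity check, that product rule can be verified on a toy discrete joint distribution (the numbers are made up):

```python
# Toy check of the product rule P(A and B) = P(A) P(B|A) = P(B) P(A|B),
# using an invented joint distribution over two binary events.
joint = {
    (True, True): 0.10,
    (True, False): 0.30,
    (False, True): 0.15,
    (False, False): 0.45,
}

p_a = sum(p for (a, b), p in joint.items() if a)   # P(A) = 0.40
p_b = sum(p for (a, b), p in joint.items() if b)   # P(B) = 0.25
p_b_given_a = joint[(True, True)] / p_a            # P(B|A)
p_a_given_b = joint[(True, True)] / p_b            # P(A|B)

# Both factorizations recover the same joint probability P(A and B):
lhs = p_a * p_b_given_a
rhs = p_b * p_a_given_b

# Rearranging one against the other gives Bayes' theorem:
# P(A|B) = P(A) P(B|A) / P(B).
bayes = p_a * p_b_given_a / p_b
```

The rearrangement in the last line is exactly the algebra done next on the board, with theta and y in place of A and B.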
So I can write down that the probability of measuring y given theta is equal to the probability of y, times the probability of theta given y, divided by the probability of theta. So I just took this formula and plugged in y's and thetas instead of A's and B's. So I said these two are equal to each other, and rearranged it so I can rewrite it. This is the way we have it in here: the probability of measuring y given theta. Let's flip it around. So the probability of theta, given that we measured y, is equal to the probability of theta, times the probability of observing y if theta were true, divided by the probability of y. Terrible handwriting there. That's just algebra.

So this is what we want to know. We want to know: what's the probability distribution of the parameter values theta? Because some of them are uncertain. Now, before we started the experiment, we had some idea of what the ranges were for all the parameter values. Like, I'm trying to measure a rate coefficient.
I know from experience with other similar reactions, from a quantum chemistry calculation, from some indirect evidence, from some other more complicated experiment, I have some idea that this rate coefficient has to be in a certain range. Now, it could be pretty uncertain. It might be five orders of magnitude uncertain. But I know it's not less than zero. I know it can't be faster than the diffusion limit, how fast things can come together. So for sure I know some range, and oftentimes I know a much narrower range than that. So I have some information about these parameter values before I even start. Some of the parameters of the model I know perfectly, or pretty well. So maybe there's a Planck's constant, or the heat of formation of one of my chemicals, or something like that that shows up in the numbers, and I might know that parameter pretty accurately. Whereas the particular rate coefficient I care about is the thing I really don't know very well. So some of these have tight probability distributions ahead of time, and some of them have loose ones. And this thing has a name.
It's called the prior. And it's our prior information, before we did the experiment. And this one, after we've done the experiment, we're going to change. So we're going to say: previously, people thought that the parameters all lay in these certain ranges. And now I'm going to get a tighter range, because I have some additional experimental information. So this is called the posterior. Prior means before, posterior means after. So this is what I know about the parameter values before and after the experiment. And this is the formula that I have over there: it's the probability that, if the thetas had a certain value, I would have observed what I saw. Yeah?

AUDIENCE: Which one refers to which?

WILLIAM GREEN: Sorry, this is the prior, this is the posterior.

And those of you who are paying attention to notation realize I'm not doing this very nicely. Because these are continuous variables, and I'm writing capital P's, and they should not be capital P's; they should be probability density functions instead. So let's rewrite it nicely.
So the probability density of theta given y is equal to the probability density of theta initially, times the probability density of y given theta, [INAUDIBLE] density, divided by the probability density of y. All right? And what I just basically did: this is the correct equation; the previous one was this all multiplied by d theta and dy, and it shouldn't be done that way. So is this OK?

Now, this is the prior information I have about the parameter values. I know that they have to fall into some ranges. And really all I'm doing is correcting that information; I'm improving the information to tighten the distribution. So initially, I know that my rate constant, here's my rate constant, I know that it's got to be greater than zero, and I don't think it's really down there at zero anyway. I think it's somewhere in here. I really don't know much. And I really don't think it's all the way up at the diffusion limit, and there's no way it's higher than the diffusion limit. So that's my initial information that I have about the probability distribution of k.
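That kind of prior, zero below zero and above the diffusion limit with a broad hump somewhere in between, can be sketched numerically. Every number below (the ceiling, the best guess, the width) is invented for illustration:

```python
import math

# Hypothetical prior for a rate coefficient k: zero outside (0, K_DIFFUSION],
# a broad hump in between. All numbers are invented for illustration.
K_DIFFUSION = 1e10      # assumed diffusion-limited ceiling
K_GUESS = 1e6           # assumed prior best estimate
LOG_WIDTH = 2.0         # roughly two orders of magnitude of uncertainty

def prior_density(k):
    """Unnormalized prior: a Gaussian hump in log10(k), truncated to the
    physically allowed range (0, K_DIFFUSION]."""
    if k <= 0.0 or k > K_DIFFUSION:
        return 0.0
    z = (math.log10(k) - math.log10(K_GUESS)) / LOG_WIDTH
    return math.exp(-0.5 * z * z)

# Normalize on a grid of k values from 1 to 1e10 so the weights sum to one.
ks = [10 ** (0.01 * i) for i in range(0, 1001)]
weights = [prior_density(k) for k in ks]
total = sum(weights)
prior = [w / total for w in weights]
```

The exact shape is a judgment call; the point is only that even before the experiment, the prior rules out negative values and anything above the physical limit.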
So it's the rate coefficient I want to know, and I know it's bigger than zero, and I know it's less than infinity. And actually I know there's some physical limit; it can't be higher than something or other. And you can do this for any problem, right? I give you any parameter, and you should be able to tell me something about it. You might be uncertain by 20 orders of magnitude, but at least you have an error bar of some width. It can't be anything, right? A lot of parameters have to be positive, for example. You know that. And you usually know something. You might not think you know anything, but you do; you actually do know something before you start. So this is the P of theta to start with.

And after I've done the experiment, hopefully I'm going to know more about it. I might know that this quantity here is going to be like a Gaussian distribution. It might have a kind of goofball dependence on theta. I should comment on that. Notice how theta appears inside f. So theta's up in the exponent.
436 00:24:44,680 --> 00:24:49,520 It's sort of inside a Gaussian, but it's like processed by F, 437 00:24:49,520 --> 00:24:53,240 and so the observable might have a pretty goofball dependence 438 00:24:53,240 --> 00:24:55,240 on this rate coefficient. 439 00:24:55,240 --> 00:24:58,640 So this thing could be some weird thing. 440 00:24:58,640 --> 00:25:03,549 But for sure, when I change theta so this changes a lot, 441 00:25:03,549 --> 00:25:05,340 it's going to make a pretty big difference. 442 00:25:05,340 --> 00:25:08,700 Because it's up inside the exponent of a Gaussian, 443 00:25:08,700 --> 00:25:11,940 so it's going to drop off a lot somewhere. 444 00:25:11,940 --> 00:25:15,370 So I should get something that looks something like this 445 00:25:15,370 --> 00:25:18,210 maybe for my experiment. 446 00:25:18,210 --> 00:25:24,060 So this one is P of k initially, the prior. 447 00:25:24,060 --> 00:25:28,010 This one is P of y given k. 448 00:25:31,910 --> 00:25:34,010 And what this equation says is I want 449 00:25:34,010 --> 00:25:37,330 to multiply those two together. 450 00:25:37,330 --> 00:25:39,240 And so I'm going to multiply this times this, 451 00:25:39,240 --> 00:25:50,180 and I'm going to get some new thing that's 452 00:25:50,180 --> 00:25:55,432 something like that when I multiply this times that. 453 00:25:55,432 --> 00:25:58,340 Is that OK? 454 00:25:58,340 --> 00:26:03,090 And so that's my new numerator of this equation. 455 00:26:03,090 --> 00:26:05,226 Now this denominator doesn't make too much sense. 456 00:26:05,226 --> 00:26:06,600 This says, what's the probability 457 00:26:06,600 --> 00:26:13,035 that I measured the mean I measured, given nothing? 458 00:26:13,035 --> 00:26:14,910 So this is sort of like the prior probability 459 00:26:14,910 --> 00:26:16,750 that I would have measured it or something. 460 00:26:16,750 --> 00:26:17,833 I don't know what this is. 461 00:26:17,833 --> 00:26:22,100 So instead what people do is they say, forget this.
462 00:26:22,100 --> 00:26:27,234 But instead, let's multiply this by a constant that's 463 00:26:27,234 --> 00:26:29,400 going to normalize it to make it a probability density, 464 00:26:29,400 --> 00:26:31,370 so that it integrates to one. 465 00:26:34,370 --> 00:26:36,950 So that's the way Bayes' theorem is used. 466 00:26:36,950 --> 00:26:38,540 This is called Bayesian analysis. 467 00:26:42,140 --> 00:26:43,850 And so what it's telling you is how 468 00:26:43,850 --> 00:26:49,220 to take your experimental information as expressed 469 00:26:49,220 --> 00:26:59,010 in this formula and use all your previous information 470 00:26:59,010 --> 00:27:03,030 about the parameters, put them all together, 471 00:27:03,030 --> 00:27:06,490 and now we have cumulative information about everything. 472 00:27:06,490 --> 00:27:09,760 So we have some parameters that came into our problem, 473 00:27:09,760 --> 00:27:12,370 into my experiment, but from previous work, 474 00:27:12,370 --> 00:27:14,630 I also knew something about those parameters. 475 00:27:14,630 --> 00:27:16,330 Now I put it all together and I get 476 00:27:16,330 --> 00:27:19,180 a new probability distribution 477 00:27:19,180 --> 00:27:21,020 of those parameters. 478 00:27:21,020 --> 00:27:23,530 And if my experiment was really good, 479 00:27:23,530 --> 00:27:27,237 it would make this really tight [WHOOSHING SOUND]. 480 00:27:27,237 --> 00:27:29,070 And then when I multiply these two together, 481 00:27:29,070 --> 00:27:32,470 it's going to make this really sharp, 482 00:27:32,470 --> 00:27:35,380 and we have a really good value of k. 483 00:27:35,380 --> 00:27:37,250 So that's like the ideal case: if I 484 00:27:37,250 --> 00:27:39,440 have a really great, well-designed experiment 485 00:27:39,440 --> 00:27:44,090 executed perfectly with great precision, then I can do this.
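[EDITOR'S NOTE: The update just described — multiply the prior by the likelihood of the measured mean, then normalize by a constant so the density integrates to one — can be sketched numerically on a grid. This is an illustrative toy, not the example on the board: the model f(k), the prior width, and the measurement numbers are all invented.]

```python
import numpy as np

# Grid over the allowed range of the rate coefficient k (invented numbers).
k = np.linspace(0.01, 10.0, 2000)
dk = k[1] - k[0]

# Broad prior: k is positive and probably a few units, but very uncertain.
prior = np.exp(-0.5 * ((k - 4.0) / 3.0) ** 2)
prior /= prior.sum() * dk  # normalize to a probability density

def f(k):
    """Toy model prediction of the observable as a function of k."""
    return np.sqrt(k)

# Measured mean of the repeats and its standard error (invented numbers).
ybar, sigma_mean = 1.55, 0.05

# Likelihood of the measured mean: Gaussian about the model prediction.
likelihood = np.exp(-0.5 * ((ybar - f(k)) / sigma_mean) ** 2)

# Bayes: posterior is proportional to prior * likelihood; the denominator
# is replaced by whatever constant makes the result integrate to one.
posterior = prior * likelihood
posterior /= posterior.sum() * dk

def grid_std(p):
    """Standard deviation of a density tabulated on the k grid."""
    mean = (k * p).sum() * dk
    return np.sqrt(((k - mean) ** 2 * p).sum() * dk)
```

Because the likelihood here is much narrower than the prior, the posterior ends up far tighter than the prior, which is the "really sharp" case described above.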
486 00:27:44,090 --> 00:27:46,214 More generally, when the experiment isn't that ideal, 487 00:27:46,214 --> 00:27:47,630 I get some distribution like this. 488 00:27:50,210 --> 00:27:52,830 I still learn something compared to what I had before, 489 00:27:52,830 --> 00:27:54,440 but it might not be much. 490 00:27:54,440 --> 00:27:57,410 So now I can end up with some distribution that's 491 00:27:57,410 --> 00:27:59,630 a little tighter than before. 492 00:28:04,130 --> 00:28:07,190 So is this OK so far? 493 00:28:07,190 --> 00:28:11,040 All right, now this is super simple. 494 00:28:11,040 --> 00:28:13,200 I didn't have to solve anything; all 495 00:28:13,200 --> 00:28:17,120 I had to do was multiply two distributions together. 496 00:28:17,120 --> 00:28:20,810 So in some respects, this is what you should always do. 497 00:28:20,810 --> 00:28:22,962 All you do is you take your experiment, 498 00:28:22,962 --> 00:28:25,170 you multiply the probability distribution that corresponds 499 00:28:25,170 --> 00:28:27,334 to your experiment times the prior, 500 00:28:27,334 --> 00:28:29,750 and you get some posterior, and that's your new information 501 00:28:29,750 --> 00:28:31,340 about the distribution. 502 00:28:31,340 --> 00:28:33,870 And if I have a distribution like this, 503 00:28:33,870 --> 00:28:37,370 suppose this is my new distribution here, 504 00:28:37,370 --> 00:28:41,240 I can still get its central value, that's my mean value of k. 505 00:28:41,240 --> 00:28:44,010 I can get an estimate of the range of k. 506 00:28:44,010 --> 00:28:45,780 So I end up with a k plus or minus 507 00:28:45,780 --> 00:28:50,870 dk maybe, from just looking at the plot. 508 00:28:50,870 --> 00:28:53,670 In fact, I never even have to evaluate what this constant is 509 00:28:53,670 --> 00:28:54,540 in order to do this.
510 00:28:54,540 --> 00:28:57,830 I can just go look at the plot, see where the peak is, 511 00:28:57,830 --> 00:29:01,030 figure out the width, and I can report that now, 512 00:29:01,030 --> 00:29:03,460 because of my experiment, k plus or minus 513 00:29:03,460 --> 00:29:05,590 dk is more precisely determined than it was before. 514 00:29:09,150 --> 00:29:12,840 Now, a practical challenge with this 515 00:29:12,840 --> 00:29:15,980 is that theta is usually a lot of parameters. 516 00:29:15,980 --> 00:29:19,700 And I only drew the plot here in one dimension, 517 00:29:19,700 --> 00:29:23,280 but really it's a multi-dimensional plot. 518 00:29:23,280 --> 00:29:27,205 So really, what would it look like? Suppose I had two parameters. 519 00:29:27,205 --> 00:29:29,580 I had my k I care about, and I have some other parameter, 520 00:29:29,580 --> 00:29:34,870 theta 2, that also shows up in my model. 521 00:29:34,870 --> 00:29:39,170 And say, before I started, I knew 522 00:29:39,170 --> 00:29:43,570 theta 2 fell in this kind of range, 523 00:29:43,570 --> 00:29:47,950 and I knew k fell in this kind of range. 524 00:29:47,950 --> 00:29:51,880 So really before I started, if I think about what it looks like, 525 00:29:51,880 --> 00:29:57,440 I really had sort of a blobby rectangular contour 526 00:29:57,440 --> 00:30:02,460 plot, where I think it's more likely that the k 527 00:30:02,460 --> 00:30:05,506 value and the theta 2 value are somewhere in this range. 528 00:30:05,506 --> 00:30:08,130 And the most likely one is maybe somewhere in the middle there. 529 00:30:08,130 --> 00:30:09,780 But I really didn't know much. 530 00:30:09,780 --> 00:30:12,720 So it could be anywhere in this whole blob. 531 00:30:12,720 --> 00:30:15,960 Now, when I do the experiment, the experimental value 532 00:30:15,960 --> 00:30:19,240 depends on both k and theta 2.
533 00:30:19,240 --> 00:30:22,510 And commonly what'll happen is that the distribution 534 00:30:22,510 --> 00:30:25,190 from the experiment-- 535 00:30:25,190 --> 00:30:28,718 I need colored chalk here. 536 00:30:28,718 --> 00:30:31,100 Let's get rid of these guys. 537 00:30:31,100 --> 00:30:33,932 So this is my probability distribution, there's my prior. 538 00:30:33,932 --> 00:30:37,850 If I do the experiment, maybe I'll have something like this. 539 00:30:37,850 --> 00:30:40,510 That the experiment says that the guys 540 00:30:40,510 --> 00:30:47,094 have to be somewhere in a contour plot like this. 541 00:30:47,094 --> 00:30:48,510 Because I can get pretty good fits 542 00:30:48,510 --> 00:30:50,080 of the data with different values of k 543 00:30:50,080 --> 00:30:52,038 as long as I compensate with the value of theta 2. 544 00:30:54,780 --> 00:30:58,370 Now I multiply these two-dimensional functions. 545 00:30:58,370 --> 00:31:02,410 The original is a blob function, and this 546 00:31:02,410 --> 00:31:05,020 is a stretched-out blob. 547 00:31:05,020 --> 00:31:08,140 And if I multiply a stretched-out blob times a fat blob, 548 00:31:08,140 --> 00:31:09,970 I get some stretched-out blob that 549 00:31:09,970 --> 00:31:12,170 looks something like the intersection of these guys. 550 00:31:12,170 --> 00:31:14,587 And so I end up with some kind of blob like that. 551 00:31:14,587 --> 00:31:15,670 I'll draw it really thick. 552 00:31:15,670 --> 00:31:21,940 So this is my posterior, some kind of blob like this. 553 00:31:21,940 --> 00:31:25,060 So now I know a little bit more about these two parameters 554 00:31:25,060 --> 00:31:29,370 than I did before I started because of my experiment. 555 00:31:29,370 --> 00:31:30,417 Is this OK? 556 00:31:30,417 --> 00:31:32,500 I really can't say I know what the real value of k 557 00:31:32,500 --> 00:31:34,005 is, or the value of theta 2.
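[EDITOR'S NOTE: If both the fat prior blob and the stretched-out likelihood ridge are roughly Gaussian, the multiplication just drawn has a closed form: the precision matrices (inverse covariances) simply add. A small sketch with invented 2x2 covariances, just to show the posterior blob is tighter than either input:]

```python
import numpy as np

# Invented prior over (k, theta2): a fat, round blob.
C_prior = np.array([[4.0, 0.0],
                    [0.0, 4.0]])

# Invented likelihood covariance: a long, thin, tilted ridge, because the
# experiment only pins down a combination of k and theta2, not each one.
C_exp = np.array([[5.0, 4.9],
                  [4.9, 5.0]])

# Product of two Gaussians is a Gaussian whose precision is the sum:
#   posterior precision = prior precision + likelihood precision
precision_post = np.linalg.inv(C_prior) + np.linalg.inv(C_exp)
C_post = np.linalg.inv(precision_post)
```

The diagonal of C_post comes out smaller than the diagonal of either C_prior or C_exp: the intersection of the two blobs is narrower than both, exactly as in the chalk picture.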
558 00:31:34,005 --> 00:31:36,439 But I know that combinations of k 559 00:31:36,439 --> 00:31:38,730 and theta 2 that are sort of in this range, all of them 560 00:31:38,730 --> 00:31:40,438 will give me pretty good fits to my data, 561 00:31:40,438 --> 00:31:43,500 and also be consistent with all the previous information I have 562 00:31:43,500 --> 00:31:46,500 about those parameter values. 563 00:31:46,500 --> 00:31:48,880 Is that all right? 564 00:31:48,880 --> 00:31:51,394 Now, I drew it with two parameters. 565 00:31:51,394 --> 00:31:53,560 In a lot of models we have, we have five parameters, 566 00:31:53,560 --> 00:31:55,643 six parameters, seven parameters, nine parameters, 567 00:31:55,643 --> 00:31:56,540 14 parameters. 568 00:31:56,540 --> 00:31:58,360 We have a lot of parameters. 569 00:31:58,360 --> 00:32:01,660 And so then we try to make this plot, even how 570 00:32:01,660 --> 00:32:05,784 to display the plot is going to be a little problematic. 571 00:32:05,784 --> 00:32:06,700 But it's there, right? 572 00:32:06,700 --> 00:32:10,630 And somehow, we still narrowed down the hypervolume 573 00:32:10,630 --> 00:32:12,970 in the parameter space from whatever 574 00:32:12,970 --> 00:32:15,580 it was to begin with to now we know 575 00:32:15,580 --> 00:32:16,850 something a little bit better. 576 00:32:16,850 --> 00:32:19,389 We have a narrower range of the parameters that 577 00:32:19,389 --> 00:32:21,680 would be consistent with all the information available, 578 00:32:21,680 --> 00:32:24,400 including my new experiment. 579 00:32:24,400 --> 00:32:26,560 And then the next guy does his experiment, 580 00:32:26,560 --> 00:32:29,350 and he does an experiment that shows that these guys have 581 00:32:29,350 --> 00:32:32,320 to be somewhere in this range in order to be 582 00:32:32,320 --> 00:32:34,570 consistent with his experiment. 
583 00:32:34,570 --> 00:32:37,660 And so now I can narrow down the range 584 00:32:37,660 --> 00:32:39,490 to be something like that. 585 00:32:39,490 --> 00:32:41,240 And the next person does their experiment, 586 00:32:41,240 --> 00:32:42,540 and they get something else, and something else, 587 00:32:42,540 --> 00:32:43,360 and something else. 588 00:32:43,360 --> 00:32:46,030 And eventually by 2050, we have a pretty nice determination 589 00:32:46,030 --> 00:32:48,970 of the parameter values. 590 00:32:48,970 --> 00:32:51,760 So that's the advance of science, 591 00:32:51,760 --> 00:32:55,212 as drawn in chalk by Professor Green at the board. 592 00:32:59,890 --> 00:33:02,020 So this is a very important way to think 593 00:33:02,020 --> 00:33:04,740 about it: what you're doing when you do experiments 594 00:33:04,740 --> 00:33:09,730 is you're generally restricting the range of parameter space 595 00:33:09,730 --> 00:33:12,030 that's still consistent with everything. 596 00:33:12,030 --> 00:33:13,870 And when we say consistent, we mean 597 00:33:13,870 --> 00:33:16,286 that the probability that you would have observed what you 598 00:33:16,286 --> 00:33:18,174 did observe is reasonably high. 599 00:33:18,174 --> 00:33:20,590 We'll still have to come back to quantitatively figure out 600 00:33:20,590 --> 00:33:21,796 what reasonably high means. 601 00:33:25,600 --> 00:33:29,440 Now, when you did this before when you were kids, 602 00:33:29,440 --> 00:33:31,450 nobody mentioned the word Bayes, or Bayesian, 603 00:33:31,450 --> 00:33:34,630 or conditional probabilities, right? 604 00:33:34,630 --> 00:33:38,077 So they just said, oh, just do a least squares fit. 605 00:33:38,077 --> 00:33:39,410 How many of you did that before? 606 00:33:42,930 --> 00:33:45,660 So somebody told you before, forget this stuff, 607 00:33:45,660 --> 00:33:47,790 we're never even going to mention this stuff.
608 00:33:47,790 --> 00:33:49,581 We're just going to do a least squares fit. 609 00:33:56,300 --> 00:33:59,060 Now, where did the least squares fit idea come from? 610 00:33:59,060 --> 00:34:01,670 It came from looking at this formula and saying, 611 00:34:01,670 --> 00:34:06,020 you know, these are the deviations between the experiment 612 00:34:06,020 --> 00:34:11,719 and the model prediction, and I weight them somehow, 613 00:34:11,719 --> 00:34:13,061 and I take the square. 614 00:34:13,061 --> 00:34:14,810 And that's the thing I want to make small. 615 00:34:14,810 --> 00:34:20,239 If I have a high probability that what I observed really 616 00:34:20,239 --> 00:34:23,659 happened, or the probability I'm going to observe this, 617 00:34:23,659 --> 00:34:25,550 it's got to be that these guys have to be 618 00:34:25,550 --> 00:34:26,580 reasonably close to each other. 619 00:34:26,580 --> 00:34:27,610 If they're really different, 620 00:34:27,610 --> 00:34:29,485 it's going to be very small, because it's 621 00:34:29,485 --> 00:34:30,770 inside an exponential. 622 00:34:30,770 --> 00:34:32,659 And if those guys are really different, 623 00:34:32,659 --> 00:34:35,170 and the squared thing is really large, 624 00:34:35,170 --> 00:34:37,210 then the probability is incredibly small 625 00:34:37,210 --> 00:34:39,290 that I would have observed that. 626 00:34:39,290 --> 00:34:42,679 So we think that this thing should be small. 627 00:34:42,679 --> 00:34:47,600 And in fact, if I want to get the very best fit I can get, 628 00:34:47,600 --> 00:34:50,840 which means the probability was the highest of what 629 00:34:50,840 --> 00:34:54,460 I observed in the real observation or something, 630 00:34:54,460 --> 00:34:56,492 then if I'm free to adjust one of these thetas, 631 00:34:56,492 --> 00:34:57,950 I can adjust the theta to try to make 632 00:34:57,950 --> 00:35:01,057 this thing equal to zero, or as small as I can.
633 00:35:01,057 --> 00:35:03,390 So that's where the concept of least squares comes from. 634 00:35:07,290 --> 00:35:11,360 Now, when you're doing least squares, 635 00:35:11,360 --> 00:35:14,236 you almost always have multiple parameters, 636 00:35:14,236 --> 00:35:16,610 and therefore you're going to have to have multiple data. 637 00:35:16,610 --> 00:35:20,234 And they can't just be a repeat of one number. 638 00:35:20,234 --> 00:35:22,400 That can't be all your data; it's not sufficient to determine 639 00:35:22,400 --> 00:35:23,870 the parameters. 640 00:35:23,870 --> 00:35:25,710 So normally when you do an experiment, 641 00:35:25,710 --> 00:35:27,700 you have to change the knobs. 642 00:35:27,700 --> 00:35:30,660 You have to make measurements at a couple of different conditions. 643 00:35:30,660 --> 00:35:31,970 Like for example, kinetics. 644 00:35:31,970 --> 00:35:35,120 You often want the Arrhenius A factor and the Ea. 645 00:35:35,120 --> 00:35:37,092 And so I've got to run the experiment 646 00:35:37,092 --> 00:35:39,050 at more than one temperature or I'm never going 647 00:35:39,050 --> 00:35:39,960 to be able to figure that out. 648 00:35:39,960 --> 00:35:42,080 So I have to change the temperature in my reactor. 649 00:35:42,080 --> 00:35:43,829 Make some measurements at one temperature, 650 00:35:43,829 --> 00:35:46,310 and make some measurements at a different temperature. 651 00:35:46,310 --> 00:35:49,040 And for almost everything in life that you want to measure, 652 00:35:49,040 --> 00:35:50,540 you're going to have to do this. 653 00:35:50,540 --> 00:35:52,940 You vary the concentration of your enzyme 654 00:35:52,940 --> 00:35:55,796 if you want to see how the enzyme kinetics depend 655 00:35:55,796 --> 00:35:57,170 on something. You can't just keep 656 00:35:57,170 --> 00:35:59,630 running exactly the same condition over and over.
657 00:35:59,630 --> 00:36:01,657 You'll get that number really precise, 658 00:36:01,657 --> 00:36:03,740 but it's not enough information to really pin down 659 00:36:03,740 --> 00:36:05,280 the parameters in your model. 660 00:36:05,280 --> 00:36:08,870 So you're going to have to run several different experiments 661 00:36:08,870 --> 00:36:10,640 with different knob settings. 662 00:36:10,640 --> 00:36:13,920 Also, normally we don't just measure one quantity, one 663 00:36:13,920 --> 00:36:15,170 observable, in each experiment. 664 00:36:15,170 --> 00:36:17,790 We usually try to measure as many things as we can. 665 00:36:17,790 --> 00:36:19,527 So we actually have several observables 666 00:36:19,527 --> 00:36:21,860 at each knob setting, and we have several knob settings, 667 00:36:21,860 --> 00:36:23,152 so we have quite a lot of data. 668 00:36:23,152 --> 00:36:25,485 And each one of those is repeated a whole bunch of times 669 00:36:25,485 --> 00:36:28,400 so that we're confident that we can use this Gaussian formula. 670 00:36:28,400 --> 00:36:39,120 And so what we really have is the i-th observable measured 671 00:36:39,120 --> 00:36:47,150 at the l-th knob position. 672 00:36:47,150 --> 00:36:48,740 Well, I'm sorry, l's not good either, 673 00:36:48,740 --> 00:36:51,684 it's used in your notes for something else. 674 00:36:51,684 --> 00:36:54,460 M, there you go. 675 00:36:54,460 --> 00:36:56,000 The m-th knob position. 676 00:36:56,000 --> 00:36:59,380 Now, normally you have several knobs, so that's a vector. 677 00:36:59,380 --> 00:37:03,720 And we have a lot of observables we can measure at each position. 678 00:37:03,720 --> 00:37:08,340 So this thing is a measurement. 679 00:37:08,340 --> 00:37:12,650 And we repeated this multiple times so I can get the average. 680 00:37:12,650 --> 00:37:20,960 And we're also going to have a corresponding sigma i m, which 681 00:37:20,960 --> 00:37:25,230 is the standard error of the mean.
682 00:37:25,230 --> 00:37:27,350 So that's the standard deviation divided 683 00:37:27,350 --> 00:37:30,020 by the square root of the number of repeats 684 00:37:30,020 --> 00:37:33,560 for that particular experiment and that particular observable. 685 00:37:33,560 --> 00:37:36,260 So this is your incoming data set, 686 00:37:36,260 --> 00:37:44,750 and you also have your model, which predicts y model; 687 00:37:44,750 --> 00:37:54,878 it predicts the observable i to be equal to f i of x m, theta. 688 00:37:59,559 --> 00:38:01,100 So if you have certain knob settings, 689 00:38:01,100 --> 00:38:04,500 like a certain temperature, and you have your parameter values, 690 00:38:04,500 --> 00:38:07,610 then you can calculate what the model thinks 691 00:38:07,610 --> 00:38:09,895 should be the observable value, and then you 692 00:38:09,895 --> 00:38:14,570 can actually measure it and measure its variance. 693 00:38:14,570 --> 00:38:15,960 So that's the normal situation. 694 00:38:15,960 --> 00:38:19,250 And now you want to figure out, are there some values 695 00:38:19,250 --> 00:38:24,789 of the theta that make the model and the data agree? 696 00:38:24,789 --> 00:38:26,580 And that's the least squares fitting thing. 697 00:38:26,580 --> 00:38:32,093 So we can define a new quantity, 698 00:38:32,093 --> 00:38:43,170 the residual vector epsilon, indexed by k, 699 00:38:43,170 --> 00:38:45,220 to be consistent with Joe Scott's notes. 700 00:38:45,220 --> 00:38:47,822 AUDIENCE: Is k the same as m? 701 00:38:47,822 --> 00:38:49,280 WILLIAM GREEN: M is the knob positions, 702 00:38:49,280 --> 00:38:50,823 I'll tell you what k is in a second. 703 00:39:00,000 --> 00:39:02,890 m, sorry. 704 00:39:02,890 --> 00:39:03,780 Too many indices. 705 00:39:17,390 --> 00:39:21,240 OK, so this is the residual between the model prediction and the data. 706 00:39:21,240 --> 00:39:25,500 And now-- oh man, I'm sorry, [INAUDIBLE].
707 00:39:25,500 --> 00:39:31,730 K is an index over i and m. 708 00:39:31,730 --> 00:39:34,127 So k is just going to list all the data you got. 709 00:39:34,127 --> 00:39:36,210 Some of the data came from the same knob settings, 710 00:39:36,210 --> 00:39:37,835 some came from different knob settings. 711 00:39:37,835 --> 00:39:38,372 Yeah? 712 00:39:38,372 --> 00:39:42,660 AUDIENCE: So is x the m the y model i [INAUDIBLE]? 713 00:39:42,660 --> 00:39:43,910 WILLIAM GREEN: Thank you, yes. 714 00:39:49,110 --> 00:39:57,480 y model i, I guess this is now k. 715 00:40:06,280 --> 00:40:08,005 And so k is one of these indices that 716 00:40:08,005 --> 00:40:10,630 carry-- you can bind two indices together and put them together 717 00:40:10,630 --> 00:40:12,940 just like you did in your PDE problems. 718 00:40:12,940 --> 00:40:13,440 All right. 719 00:40:17,020 --> 00:40:20,541 Now, I wrote down this sigma. 720 00:40:20,541 --> 00:40:22,540 But actually if you're measuring multiple things 721 00:40:22,540 --> 00:40:24,070 in the same experiment, you should 722 00:40:24,070 --> 00:40:26,500 expect them to be correlated. 723 00:40:26,500 --> 00:40:29,010 So really what we should worry about 724 00:40:29,010 --> 00:40:34,930 is C, the covariance matrix, that we defined last time. 725 00:40:34,930 --> 00:40:38,750 So you should also compute that thing. 726 00:40:38,750 --> 00:40:45,170 And so what you should expect is the probability density 727 00:40:45,170 --> 00:40:51,800 that we would measure any particular residuals 728 00:40:51,800 --> 00:40:53,900 if the model is true. 729 00:40:53,900 --> 00:40:56,842 And if we have these certain parameters, theta, 730 00:40:56,842 --> 00:41:02,715 this should be equal to 2 pi to the negative K over 2, 731 00:41:02,715 --> 00:41:10,200 times the determinant of C to the negative 1/2, times the exponential 732 00:41:10,200 --> 00:41:22,180 of negative 1/2 epsilon transpose C inverse epsilon.
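[EDITOR'S NOTE: The density just written down, (2 pi)^(-K/2) |C|^(-1/2) exp(-1/2 epsilon^T C^-1 epsilon), is easy to mis-implement; here is a sketch of a numerically careful version, working in log space and using a linear solve instead of forming C inverse, with a sanity check against independent 1-D Gaussians when C is diagonal.]

```python
import numpy as np

def log_gauss_residuals(eps, C):
    """Log of the multivariate Gaussian density of the residual vector eps
    with covariance C:
        -(K/2) log(2 pi) - (1/2) log|C| - (1/2) eps^T C^-1 eps
    """
    K = len(eps)
    sign, logdet = np.linalg.slogdet(C)  # stable log-determinant
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # eps^T C^-1 eps without ever forming C^-1 explicitly:
    quad = eps @ np.linalg.solve(C, eps)
    return -0.5 * (K * np.log(2.0 * np.pi) + logdet + quad)
```

With a diagonal C this must reduce to the product of K independent one-dimensional Gaussians, which is a convenient check on the signs and factors.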
733 00:41:22,180 --> 00:41:26,671 So this is the multi-measurement version of the same equation 734 00:41:26,671 --> 00:41:27,170 here. 735 00:41:31,420 --> 00:41:35,410 So this is the quantity that we think 736 00:41:35,410 --> 00:41:40,150 should be small if we have good parameter values 737 00:41:40,150 --> 00:41:42,019 and we did a good experiment. 738 00:41:42,019 --> 00:41:43,810 Actually, even when we did bad experiments, 739 00:41:43,810 --> 00:41:46,210 it still should be small if we have good parameter values. 740 00:41:48,990 --> 00:41:51,360 And that's because if we did a bad experiment, 741 00:41:51,360 --> 00:41:53,330 we'll have a high variance or something, 742 00:41:53,330 --> 00:41:55,780 and then the C's will give us weightings 743 00:41:55,780 --> 00:41:57,420 that will reflect that. 744 00:41:57,420 --> 00:41:58,580 Yeah? 745 00:41:58,580 --> 00:42:00,980 AUDIENCE: [INAUDIBLE] 746 00:42:00,980 --> 00:42:02,900 WILLIAM GREEN: Is that-- 747 00:42:02,900 --> 00:42:07,700 AUDIENCE: So you have the next [INAUDIBLE] K [INAUDIBLE]. 748 00:42:07,700 --> 00:42:13,410 WILLIAM GREEN: Oh I'm sorry, this is the capital K, 749 00:42:13,410 --> 00:42:17,810 this is the number of data points. 750 00:42:17,810 --> 00:42:24,748 So little k is equal to 1 to capital K. 751 00:42:24,748 --> 00:42:28,140 AUDIENCE: So does capital K count for both experiments? 752 00:42:28,140 --> 00:42:32,330 WILLIAM GREEN: It's the number of distinct data values 753 00:42:32,330 --> 00:42:34,740 after you've already averaged over the repeats. 754 00:42:34,740 --> 00:42:39,960 So you do m experiments, and at each experiment you measure capital 755 00:42:39,960 --> 00:42:42,270 I observables. 756 00:42:42,270 --> 00:42:45,860 So it's like m times I. 757 00:42:45,860 --> 00:42:50,010 If you measured everything in every experiment, K is equal to I times m. 758 00:42:59,060 --> 00:43:04,130 Now there are two ways that people approach this in the literature.
759 00:43:04,130 --> 00:43:05,720 The fancy way is you say, you know, 760 00:43:05,720 --> 00:43:09,490 this covariance matrix comes in in a pretty important way 761 00:43:09,490 --> 00:43:11,870 into this probability distribution function. 762 00:43:11,870 --> 00:43:14,270 And so maybe I need to worry a lot about whether I really 763 00:43:14,270 --> 00:43:16,500 know the covariance matrix. 764 00:43:16,500 --> 00:43:22,440 And my uncertainty in the mean drops pretty fast 765 00:43:22,440 --> 00:43:26,020 as I do averaging, but I'm not so confident 766 00:43:26,020 --> 00:43:29,350 that my error in the covariance matrix is small. 767 00:43:29,350 --> 00:43:31,980 So what people do sometimes is they'll 768 00:43:31,980 --> 00:43:40,940 try to vary both C and theta, and try to get a best fit where 769 00:43:40,940 --> 00:43:41,750 they're varying C. 770 00:43:41,750 --> 00:43:43,458 But then they have additional constraints 771 00:43:43,458 --> 00:43:46,190 on C: C has to satisfy the equations I gave last time 772 00:43:46,190 --> 00:43:49,384 about how you calculate the covariance matrix from data. 773 00:43:49,384 --> 00:43:51,050 And so I was saying, well, I want this C 774 00:43:51,050 --> 00:43:54,050 to satisfy these equations pretty well, 775 00:43:54,050 --> 00:44:02,930 but the true covariance of the system 776 00:44:02,930 --> 00:44:04,370 is not the same as what I actually 777 00:44:04,370 --> 00:44:08,460 measure by just measuring, say, five repeats of an experiment. 778 00:44:08,460 --> 00:44:10,722 And so I might want to vary the C. 779 00:44:10,722 --> 00:44:14,120 If you try to vary the C, it turns out to be kind of complicated math, 780 00:44:14,120 --> 00:44:15,770 so not many people do it. 781 00:44:15,770 --> 00:44:17,646 Even though conceptually it makes some sense, 782 00:44:17,646 --> 00:44:19,686 you should worry about the fact that you're not really 783 00:44:19,686 --> 00:44:20,810 sure about the covariance.
784 00:44:20,810 --> 00:44:22,950 So what a lot of people do is they say, 785 00:44:22,950 --> 00:44:25,460 let's just use the C that's computed from the formulas 786 00:44:25,460 --> 00:44:27,180 I gave you last time, experimentally. 787 00:44:27,180 --> 00:44:34,190 So just say, let's just take C experimental, and put it in here. 788 00:44:34,190 --> 00:44:36,600 And now this is a constant. 789 00:44:36,600 --> 00:44:40,440 And now the only thing that varies in this problem 790 00:44:40,440 --> 00:44:43,290 is the thetas, which come in through the epsilons. 791 00:44:43,290 --> 00:44:47,290 Because the epsilons depend on theta. 792 00:44:47,290 --> 00:44:49,980 And so in that case, I can just try 793 00:44:49,980 --> 00:44:53,390 to maximize this probability. 794 00:44:53,390 --> 00:44:55,990 And what that happens to do is to minimize 795 00:44:55,990 --> 00:44:59,330 this quantity in the exponent. 796 00:44:59,330 --> 00:45:02,650 And so all I need to do is say, for example, 797 00:45:02,650 --> 00:45:11,500 theta best is equal to the arg min over theta of epsilon 798 00:45:11,500 --> 00:45:16,737 of theta transpose, C inverse, epsilon of theta. 799 00:45:22,550 --> 00:45:24,550 And so this is the least squares fitting problem 800 00:45:24,550 --> 00:45:26,320 that you guys have probably done before. 801 00:45:26,320 --> 00:45:27,695 And probably what you did was you 802 00:45:27,695 --> 00:45:30,580 assumed you had perfectly uncorrelated data, 803 00:45:30,580 --> 00:45:32,170 and all your errors were the same. 804 00:45:32,170 --> 00:45:35,380 And so C was the identity matrix, and you took it out. 805 00:45:35,380 --> 00:45:36,521 Probably did that before? 806 00:45:36,521 --> 00:45:39,290 Yeah, OK. 807 00:45:39,290 --> 00:45:42,710 That's pretty dangerous to do, I'd say.
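[EDITOR'S NOTE: For a model that happens to be linear in the parameters, y_model = X theta, this arg min has a closed-form solution: the generalized least squares normal equations. The numbers below are invented; with noiseless synthetic data the fit should recover the true theta exactly.]

```python
import numpy as np

# Hypothetical linear model: y_model = X @ theta (4 data points, 2 parameters).
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
theta_true = np.array([2.0, -0.5])

# Invented experimental covariance with correlated neighboring measurements
# (this matrix is positive definite, as a covariance must be).
C = 0.01 * np.array([[1.0, 0.5, 0.0, 0.0],
                     [0.5, 1.0, 0.5, 0.0],
                     [0.0, 0.5, 1.0, 0.5],
                     [0.0, 0.0, 0.5, 1.0]])

y = X @ theta_true  # noiseless synthetic data, for the sanity check

# Minimizing eps^T C^-1 eps with eps = y - X @ theta: setting the gradient
# to zero gives the normal equations (X^T C^-1 X) theta_best = X^T C^-1 y.
Cinv = np.linalg.inv(C)
theta_best = np.linalg.solve(X.T @ Cinv @ X, X.T @ Cinv @ y)
```

Setting C to the identity here collapses this to the ordinary least squares fit discussed above, which is exactly the "dangerous" simplification: it throws away the relative weighting and the correlations.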
808 00:45:42,710 --> 00:45:45,400 What people do a lot, which is a little bit less dangerous, 809 00:45:45,400 --> 00:45:47,480 is at least say, well, you know, when 810 00:45:47,480 --> 00:45:53,540 I measure the concentration of species x by GC, 811 00:45:53,540 --> 00:45:57,020 I have an error bar of plus or minus 5%. 812 00:45:57,020 --> 00:46:00,080 And when I measure the temperature 813 00:46:00,080 --> 00:46:02,525 with my thermocouple, I have an error bar 814 00:46:02,525 --> 00:46:04,790 of plus or minus 2 degrees. 815 00:46:04,790 --> 00:46:08,360 And so the variances of these guys should be a lot different, 816 00:46:08,360 --> 00:46:11,180 temperature and GC signal. 817 00:46:11,180 --> 00:46:14,895 And therefore I definitely need to weight my deviations somehow. 818 00:46:14,895 --> 00:46:16,520 And really what you do is you keep 819 00:46:16,520 --> 00:46:19,551 the diagonal entries of this. 820 00:46:19,551 --> 00:46:20,300 That's often done. 821 00:46:20,300 --> 00:46:23,570 And we just forget the fact that they might be covariant. 822 00:46:23,570 --> 00:46:25,040 But if you've done the experiments, 823 00:46:25,040 --> 00:46:26,310 you actually do have enough information 824 00:46:26,310 --> 00:46:27,840 to compute this thing anyway, so you might as well just 825 00:46:27,840 --> 00:46:29,068 use the experimental value. 826 00:46:33,860 --> 00:46:35,340 So this is the least squares thing. 827 00:46:35,340 --> 00:46:38,502 And let's think, what the heck is this doing? 828 00:46:38,502 --> 00:46:40,960 We're saying, all of a sudden we grabbed all the parameters 829 00:46:40,960 --> 00:46:42,626 in the model, which might include things 830 00:46:42,626 --> 00:46:45,940 like the molecular weight of hydrogen or something. 831 00:46:45,940 --> 00:46:49,120 And we can find the very best values 832 00:46:49,120 --> 00:46:52,445 that would make our model match the data as best as 833 00:46:52,445 --> 00:46:52,945 possible.
834 00:46:55,630 --> 00:46:57,820 And in some sense, that's great, we 835 00:46:57,820 --> 00:47:00,310 know the best values of the parameters for our experiment. 836 00:47:00,310 --> 00:47:02,170 But of course, if we vary the molecular weight of hydrogen, 837 00:47:02,170 --> 00:47:04,330 it's going to screw up somebody else's experiment. 838 00:47:04,330 --> 00:47:05,830 Because somebody else did some other experiment 839 00:47:05,830 --> 00:47:07,913 that depended on the molecular weight of hydrogen, 840 00:47:07,913 --> 00:47:09,980 and they had to get some other value 841 00:47:09,980 --> 00:47:12,400 to match their experiment. 842 00:47:12,400 --> 00:47:15,820 So in this parameter set, anything 843 00:47:15,820 --> 00:47:17,380 I do to vary those parameters, I've got 844 00:47:17,380 --> 00:47:20,209 to watch out that maybe some of those parameters 845 00:47:20,209 --> 00:47:22,500 are involved with somebody else's model and [INAUDIBLE] 846 00:47:22,500 --> 00:47:24,010 some other experiments. 847 00:47:24,010 --> 00:47:27,230 And I'm not really free to vary them all freely. 848 00:47:27,230 --> 00:47:28,850 So this is the idea from the Bayesian view 849 00:47:28,850 --> 00:47:32,020 of having the prior information: 850 00:47:32,020 --> 00:47:35,500 you know some of the ranges on these thetas already, 851 00:47:35,500 --> 00:47:38,454 and for some of them you might know really sharp distributions, 852 00:47:38,454 --> 00:47:40,870 like the molecular weight of hydrogen. You might know that 853 00:47:40,870 --> 00:47:43,070 to a lot of decimal places. 854 00:47:43,070 --> 00:47:46,810 And so when people do this, normally you 855 00:47:46,810 --> 00:47:49,120 don't vary all of the thetas. 856 00:47:49,120 --> 00:47:51,450 Usually what you do is you select a set of thetas 857 00:47:51,450 --> 00:47:55,020 that you feel free to vary because they're so uncertain, 858 00:47:55,020 --> 00:47:58,380 and other thetas that you think, oh, I'd better not touch them.
859 00:47:58,380 --> 00:48:01,716 Because if I adjust them, I may go 860 00:48:01,716 --> 00:48:03,840 to crazy values that are inconsistent with somebody 861 00:48:03,840 --> 00:48:05,275 else's experiment. 862 00:48:05,275 --> 00:48:07,150 So a lot of times, like the molecular weights, 863 00:48:07,150 --> 00:48:08,080 you would not touch them. 864 00:48:08,080 --> 00:48:09,704 You would just say, I've got to just stick 865 00:48:09,704 --> 00:48:12,640 to the recommended values in the tables. 866 00:48:12,640 --> 00:48:14,890 I'm not free to vary the molecular weight of hydrogen, 867 00:48:14,890 --> 00:48:16,640 even though if I did, it would make my model 868 00:48:16,640 --> 00:48:18,460 match my experiment better. 869 00:48:18,460 --> 00:48:20,390 It makes my model and the experiment 870 00:48:20,390 --> 00:48:22,520 match more precisely. 871 00:48:22,520 --> 00:48:25,450 So deciding which parameters to vary in this 872 00:48:25,450 --> 00:48:27,490 is a really crucial thing. 873 00:48:30,610 --> 00:48:35,800 And a lot of the art of doing 874 00:48:35,800 --> 00:48:39,430 this has to do with that issue. 875 00:48:39,430 --> 00:48:42,130 Also, you don't have to keep the thetas 876 00:48:42,130 --> 00:48:43,310 in the form you have them. 877 00:48:43,310 --> 00:48:44,560 You could do a transformation. 878 00:48:44,560 --> 00:48:49,420 So you could change to, say, W's equal to, say, 879 00:48:49,420 --> 00:48:52,540 some matrix times the thetas, and I could express 880 00:48:52,540 --> 00:48:54,760 the equation in terms of the W's. 881 00:48:54,760 --> 00:48:57,640 So I could transform my original representation of the parameters 882 00:48:57,640 --> 00:48:59,670 into some other parameters. 883 00:48:59,670 --> 00:49:03,530 And oftentimes, your experiment might 884 00:49:03,530 --> 00:49:06,230 be really good at determining some of these W's, even 885 00:49:06,230 --> 00:49:09,800 if it might be incapable of determining any of the thetas. 
886 00:49:09,800 --> 00:49:13,830 So you often might know some linear combination 887 00:49:13,830 --> 00:49:15,830 of parameters, or maybe not linear combinations, 888 00:49:15,830 --> 00:49:18,590 some non-linear combination of parameters 889 00:49:18,590 --> 00:49:22,790 might actually be determinable very well from your experiment, 890 00:49:22,790 --> 00:49:24,950 even though you can't determine things separately. 891 00:49:24,950 --> 00:49:27,290 And this gets into the idea of dimensionless numbers. 892 00:49:27,290 --> 00:49:30,680 So your experiment might depend on some dimensionless number 893 00:49:30,680 --> 00:49:32,241 very sensitively. 894 00:49:32,241 --> 00:49:34,490 And you can be quite confident from your experimental data 895 00:49:34,490 --> 00:49:36,657 what the value of that dimensionless number must be. 896 00:49:36,657 --> 00:49:38,656 But if you look inside the dimensionless number, 897 00:49:38,656 --> 00:49:40,350 it depends on a lot of different things. 898 00:49:40,350 --> 00:49:41,990 And you might not have any information about them 899 00:49:41,990 --> 00:49:42,770 separately. 900 00:49:42,770 --> 00:49:44,600 All you know is that your experiment just 901 00:49:44,600 --> 00:49:48,410 tells you the value of that one parameter very accurately. 902 00:49:48,410 --> 00:49:51,080 So this is another big part of the art 903 00:49:51,080 --> 00:49:54,240 of doing the model versus data comparison: setting up 904 00:49:54,240 --> 00:49:58,050 your model in terms of parameters that you really 905 00:49:58,050 --> 00:50:00,900 can determine, and getting out all the ones you can't 906 00:50:00,900 --> 00:50:02,370 determine and fixing them. 907 00:50:02,370 --> 00:50:04,469 So generally we're really going to 908 00:50:04,469 --> 00:50:05,260 do this kind of thing. 
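The dimensionless-number point can be made concrete with a tiny assumed example (not from the lecture): a first-order reaction in an ideal plug-flow reactor, where the conversion depends only on the Damkohler number Da = k * tau. Any (k, tau) pair with the same product produces identical data, so only Da is identifiable, and it can be recovered directly from the measured conversion.

```python
import math

def conversion(k, tau):
    # First-order reaction, ideal plug-flow reactor:
    # X = 1 - exp(-k * tau) depends only on the group Da = k * tau.
    return 1.0 - math.exp(-k * tau)

# Two wildly different (k, tau) pairs with the same Da = 1.0
# give exactly the same observable, so k and tau are not
# separately identifiable from this measurement:
assert conversion(2.0, 0.5) == conversion(0.1, 10.0)

# But the dimensionless group itself is pinned down precisely:
X_meas = conversion(2.0, 0.5)
Da = -math.log(1.0 - X_meas)   # recovers Da = k * tau
```

This is exactly the situation described above: the experiment determines one combination of parameters very accurately while saying nothing about the pieces inside it.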
909 00:50:05,260 --> 00:50:13,050 But we're going to say that some thetas are fixed, 910 00:50:13,050 --> 00:50:21,780 and also we might change to a different representation, 911 00:50:21,780 --> 00:50:26,780 change to W's instead. 912 00:50:26,780 --> 00:50:27,950 Yeah? 913 00:50:27,950 --> 00:50:30,800 AUDIENCE: Can you explain where this transform-- 914 00:50:30,800 --> 00:50:32,470 I don't really know what's up with-- 915 00:50:32,470 --> 00:50:34,220 WILLIAM GREEN: Yeah, let's do an example. 916 00:50:34,220 --> 00:50:39,640 Suppose I was doing a reactor that had A in equilibrium with B. 917 00:50:39,640 --> 00:50:41,520 And I was really interested in kf, 918 00:50:41,520 --> 00:50:44,720 the forward rate for A going to B. I'm a kineticist, 919 00:50:44,720 --> 00:50:48,100 I love to know A goes to B. However, 920 00:50:48,100 --> 00:50:50,210 if I set up the experiment wrong, it 921 00:50:50,210 --> 00:50:53,140 might be that this reaction ran all the way to equilibrium. 922 00:50:53,140 --> 00:50:54,890 And what I see in the products is actually 923 00:50:54,890 --> 00:50:57,620 just the equilibrium ratio of A to B. 924 00:50:57,620 --> 00:51:03,630 So what I'm measuring might be something that depends 925 00:51:03,630 --> 00:51:06,110 really on kf over kr, and that might be the quantity 926 00:51:06,110 --> 00:51:07,136 I can really determine. 927 00:51:07,136 --> 00:51:08,635 Because that's the equilibrium constant. 928 00:51:11,160 --> 00:51:13,030 If I didn't think about it, I could just 929 00:51:13,030 --> 00:51:16,950 try to have the model fitting procedure just optimize 930 00:51:16,950 --> 00:51:18,760 to find the very best value of kf. 931 00:51:18,760 --> 00:51:21,700 And in that situation, it might have a lot of trouble, 932 00:51:21,700 --> 00:51:24,130 because it might be quite indeterminate what 933 00:51:24,130 --> 00:51:27,074 the kf is, because really all that matters is the ratio. 
934 00:51:27,074 --> 00:51:28,490 Also, if I think about this some more, 935 00:51:28,490 --> 00:51:33,080 suppose I run at short times, and I 936 00:51:33,080 --> 00:51:34,580 measure the time dependence. 937 00:51:34,580 --> 00:51:37,230 What I'm really measuring is kf plus kr. 938 00:51:37,230 --> 00:51:39,490 Do you remember we did the analysis of A 939 00:51:39,490 --> 00:51:43,930 goes to B, one of the early homework problems? 940 00:51:43,930 --> 00:51:48,680 The time constant was actually set by kf plus kr, not kf separately. 941 00:51:48,680 --> 00:51:51,210 And so if I measure the exponential decay 942 00:51:51,210 --> 00:51:53,507 time constant, I'm really determining kf plus kr, 943 00:51:53,507 --> 00:51:55,340 I might be able to determine that very well. 944 00:51:55,340 --> 00:51:57,170 Actually, in my lab, I can do a great job with this. 945 00:51:57,170 --> 00:51:58,670 I have an instrument that can measure 946 00:51:58,670 --> 00:52:00,378 the time constant of the exponential decay 947 00:52:00,378 --> 00:52:02,540 really precisely, but it's determining the sum. 948 00:52:02,540 --> 00:52:05,170 It's not determining either one of them separately. 949 00:52:05,170 --> 00:52:07,045 And I might have to do a separate experiment, 950 00:52:07,045 --> 00:52:09,200 say a thermo experiment, to get the ratio. 951 00:52:09,200 --> 00:52:12,000 And then from the two I can put them together and get the two 952 00:52:12,000 --> 00:52:13,914 values distinctly. 953 00:52:13,914 --> 00:52:15,330 So this would be an example of 954 00:52:15,330 --> 00:52:24,760 a W. My W1 is kf plus kr, so the matrix would be 955 00:52:24,760 --> 00:52:26,410 something like 956 00:52:26,410 --> 00:52:29,220 [1 1; 1 -1], 957 00:52:29,220 --> 00:52:32,860 where the first row adds these two guys, kf and kr. 958 00:52:32,860 --> 00:52:36,280 These are my two parameters, 1 plus 1. 959 00:52:36,280 --> 00:52:39,690 And I can determine W1 now very accurately. 
960 00:52:39,690 --> 00:52:44,820 Sorry, this is M, this is W. 961 00:52:44,820 --> 00:52:50,570 So now in terms of W, this has two parameters now, W1 and W2. 962 00:52:50,570 --> 00:52:52,790 I can't determine W2 from my experiment, 963 00:52:52,790 --> 00:52:54,497 but I can determine W1 really well. 964 00:52:54,497 --> 00:52:56,330 So then when I do the least squares fitting, 965 00:52:56,330 --> 00:52:58,010 I should vary W1. 966 00:52:58,010 --> 00:53:00,350 I can fit it to my experimental data, 967 00:53:00,350 --> 00:53:02,400 and just leave W2 fixed at some value. 968 00:53:02,400 --> 00:53:05,470 I can't do anything about it. 969 00:53:05,470 --> 00:53:06,291 Is that all right? 970 00:53:09,060 --> 00:53:11,630 Now, do you get the difference in these two points of view? 971 00:53:11,630 --> 00:53:16,650 These are, like, two completely different ways 972 00:53:16,650 --> 00:53:18,702 to look at the problem. 973 00:53:18,702 --> 00:53:20,660 You can think about it as, these parameters are 974 00:53:20,660 --> 00:53:23,150 free for me to vary, and I just have 975 00:53:23,150 --> 00:53:25,975 to be careful to select the ones I'm really free to vary. 976 00:53:25,975 --> 00:53:28,100 And that's the least squares fitting point of view. 977 00:53:28,100 --> 00:53:32,421 Or I could say, I'm not really determining anything 978 00:53:32,421 --> 00:53:34,670 in particular, all I'm doing is taking the whole range 979 00:53:34,670 --> 00:53:36,545 of uncertainty that we have about the parameters, 980 00:53:36,545 --> 00:53:41,080 and by my experiment, I narrow it down; that's the Bayesian view. 981 00:53:41,080 --> 00:53:42,790 So it's two different points of view. 982 00:53:42,790 --> 00:53:46,040 To do this one, I need to make sure I have enough data 983 00:53:46,040 --> 00:53:48,204 to determine something. 
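The A to B example above can be sketched end to end. This is an assumed toy implementation, with made-up rate constants and noiseless synthetic data: the decay fit pins down only W1 = kf + kr, and a separate equilibrium (thermo) measurement of Keq = kf/kr supplies the second combination needed to split the two rates.

```python
import numpy as np

# Toy A <=> B relaxation, starting from pure A:
# A(t) = A_eq + (A0 - A_eq) * exp(-(kf + kr) * t),  A_eq = A0 * kr / (kf + kr)
kf_true, kr_true = 3.0, 1.0
A0 = 1.0
t = np.linspace(0.0, 1.0, 50)
A_eq = A0 * kr_true / (kf_true + kr_true)
A = A_eq + (A0 - A_eq) * np.exp(-(kf_true + kr_true) * t)

# The exponential-decay fit determines only the sum W1 = kf + kr:
slope, _ = np.polyfit(t, np.log(A - A_eq), 1)
W1 = -slope                      # the decay rate, kf + kr

# A separate thermo measurement gives the ratio Keq = kf/kr = B_eq/A_eq:
Keq = (A0 - A_eq) / A_eq

# Only by combining the two measurements do we get each rate separately:
kr_fit = W1 / (1.0 + Keq)
kf_fit = W1 - kr_fit
```

Fitting kf directly to the decay curve would be badly ill-conditioned, because the data only constrain the sum; reparameterizing in terms of W1 and the ratio makes the well-determined and undetermined directions explicit.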
984 00:53:48,204 --> 00:53:49,620 So I have to have enough data to determine 985 00:53:49,620 --> 00:53:52,490 some parameter, at least one, otherwise 986 00:53:52,490 --> 00:53:54,212 there's no point in doing this. 987 00:53:54,212 --> 00:53:56,420 This one I can do even if I can't determine anything, 988 00:53:56,420 --> 00:54:00,230 because I could still narrow down the range of parameters. 989 00:54:00,230 --> 00:54:04,490 But this might be harder to report in a table. 990 00:54:04,490 --> 00:54:08,120 Because all I have at the end is a new probability density function 991 00:54:08,120 --> 00:54:11,130 over multiple parameters. 992 00:54:11,130 --> 00:54:11,820 All right? 993 00:54:11,820 --> 00:54:13,470 OK, we're done. 994 00:54:13,470 --> 00:54:16,220 See you guys on Friday.