DUANE BONING: OK, so last time we continued our discussion of design of experiments, especially looking at fractional factorial designs, some of the aliasing patterns that come up, and how that interplays with model construction: in particular, what terms of a model you can include and what you can't, as well as a few ideas on different kinds of patterns, things like the central composite pattern as well as fractional or full factorial.

What I want to do today is pick up a little bit more on response surface modeling, or RSM. We've already touched on some of this, but there are a couple of things I've alluded to that we haven't really shown you, things like how one gets confidence intervals on the estimates of coefficients in the model. Just like when we were doing estimation of statistical distributions, we said we want more than just an estimate of the mean or of the variance of a process; we would like to know in what range, say with 90% confidence, we think the true mean or true variance lies. Similarly, when we're fitting models and model coefficients, we'd like some notion of the range in which the true model coefficients likely lie, based on the data that we have. So I want to go over that a little bit.

And then we'll start talking about using these models for process optimization: combining a little bit of the response surface methodology with design of experiments, both in sequential fashion and in iterative fashion, where one might adapt the model on the fly based on additional experiments in order to drive the process, to seek out and find an optimum in the process. So that's the plan. I've noted here a reading assignment.
You can read all of chapter 8; it's actually interesting, but what I'm mostly focused on are the first three sections in May and Spanos, which talk about process modeling. They cover a lot of the material here on response surface models, model fitting, a little bit of regression, and then also using these things for optimization. There are also a couple of chapters with a little more advanced material on principal component analysis, which we may come back to a little bit later. OK, so that's the plan.

Here's a list of some of the fundamentals of regression. When we were talking about factorial and fractional factorial design, especially analyses formed out of contrasts, that simplified method using differences of different collections of the data, we found those were very useful, quick ways to estimate model effects, to fill those into ANOVA tables and decide if those effects are significant, and then also to relate those effects to model coefficient estimation.

I want to talk a little bit about the alternative perspective, which is regression as a way of fitting those coefficients. And we've already done some of that. What I'm going to illustrate here is our basic assumption, and what falls out of using minimization of squared error to fit, or estimate, the coefficients in a model. And I want to talk a little bit more about estimation. We've already touched on estimation using the normal equations, but especially I want to talk again about the variance in these coefficients, things like the confidence intervals for fitted coefficients.

I'm going to do this mostly in the context of a simplified perspective, a one-parameter model: I just have one input and one output. We'll build it up to a simple linear model, but all of these ideas also carry through for polynomial regression and for multiple inputs.
But I think it's a little bit easier to see and discuss in the context of a simplified model. We also talked the last couple of times about lack of fit, and I have a little example that carries us through the development of a model, looking for lack of fit, seeing lack of fit, and extending the model. So there's a small example embedded in here. In fact, that small example might look familiar to those of you who saw or took 2.853; it's actually the same model that I described in a very condensed lecture there on regression.

It's also important, I think, for us to get a little bit of terminology. You've probably run into measures of model goodness, an overall summary measure like R-squared, that attempt to capture how good the model is at describing what's going on with your data. Once one has done the ANOVA analysis, it's actually quite easy to calculate both the goodness-of-fit R-squared and the adjusted R-squared, as shown here, because both of these R-squared measures and the ANOVA look at the amount of variation in your data and the amount of variation expressed in your model, and use those to summarize how good the model is.

The first measure, this R-squared, is basically asking: if I were to simply model my output as the mean, how much better does a model that has more than the mean in it do in explaining the data? So essentially what we do is look at the sum of squared deviations around the mean; this is the total sum of squared deviations around the mean. And then we ask: how much of the sum of squared deviations is explained by the model, compared to the total deviations around the mean? What fraction of those is captured in the model?

So in other words, suppose there's really nothing going on except a flat dependency; that is, there is no slope with x.
As I vary x, nothing changes; then this simplified notion of R-squared is basically saying there is no dependency on x, and therefore the model explains essentially nothing. Now, it's funny, because we are ignoring the fact that you might also be fitting the mean value. But the notion captured in the R-squared is dependence on the input, dependence on x.

Now, the big gotcha with this simple measure is that I can always add model coefficients and fit more of my data, or at least I can do that ignoring replication. For example, we saw that with a two-input, one-output model and a full factorial, if I just have those four corner points, I can fit up to a second-order model with the interaction terms. With four data points, I could fit the mean, the two first-order terms, and the interaction: exactly four coefficients. And in that case, what would the R-squared be if I fit my data with all four coefficients? One. I would fit the data perfectly. Again, this is without replication. And therefore I'd have an R-squared of 1.

Now, is that really a perfect model? Well, kind of, but what you've done is use all of the degrees of freedom in the data to fit the model. We also don't have any notion of replication, which isn't really completely captured. So one way of penalizing ourselves for the use of these additional model terms is to take a different perspective, referred to as the adjusted R-squared, which essentially looks at the residual data. Rather than the deviations captured by the model, it's asking: what deviations are not captured by the model? What residuals would I have? This also has the side effect of penalizing us for the use of additional model coefficients, because we use up degrees of freedom in the model when we add model coefficients.
So very often people talk about the adjusted R-squared as the fair comparison between models, especially between a simplified model with fewer coefficients and a more complicated model with more coefficients. Essentially what we do is form the ratio of the mean square error of the residuals over the total mean square variance, if you will, captured by deviations around the mean, and then subtract that from 1. The way I like to think about it is: I start with the perfect model, and then consider the residual error, which could include both replication error and lack-of-fit error. Whatever fraction of the variation I don't capture (the residual sum of squared deviations divided by its degrees of freedom, my mean square error estimate, relative to my estimate of the true total variance around the mean), that's what I'm not modeling.

So essentially what we're doing is simply looking at what's not expressed in the model. The model can never capture pure replication error, so it's got that variance in it, but it might also have lack of fit in it.

Most statistical packages will report both of these numbers; you can also calculate them. But generally I like the adjusted R-squared as the better measure, in part because it feels to me a little more conceptual and comprehensive in terms of telling me what's not captured in the model: how much of the pure variation going on in the data is not in the model. However, you have to be really careful interpreting what that R-squared is telling you. It's not necessarily telling you that your model is good or bad. You might have a perfect model whose apparent R-squared is limited by inherent noise in the data.
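To make these two measures concrete, here is a minimal sketch in Python; the data, the predictions, and the coefficient count `p` are hypothetical stand-ins for some fitted model, just for illustration:

```python
import numpy as np

# Hypothetical observations and model predictions, for illustration only.
y     = np.array([2.1, 3.9, 6.2, 7.8, 10.1])   # observed outputs
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])   # predictions from some fitted model
n, p = len(y), 2                               # p = number of fitted coefficients

ss_total = np.sum((y - y.mean())**2)   # total SS of deviations around the mean
ss_resid = np.sum((y - y_hat)**2)      # SS not captured by the model

r2 = 1 - ss_resid / ss_total
# Adjusted R^2 divides each SS by its degrees of freedom,
# penalizing the model for each extra coefficient it uses.
r2_adj = 1 - (ss_resid / (n - p)) / (ss_total / (n - 1))
print(r2, r2_adj)
```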
To that last point: if, underlying everything, I've got a true systematic dependency, but I also have pure replication variance, that variance is going to limit how good your R-squared can possibly be, even if your model were perfect in terms of capturing the systematic dependency. I think there was a question lurking there in Singapore.

AUDIENCE: Yes. So for R-squared and adjusted R-squared, the closer those values are to 1, the better the model is?

DUANE BONING: Yes, definitely. Definitely closer to 1 is better.

AUDIENCE: OK.

DUANE BONING: But you have to be a little careful in interpreting, because even--

AUDIENCE: But Professor, what you just said is that the R-squared mixes both the error of the model and the error of the noise, so you can't really differentiate between those two.

DUANE BONING: That's right, that's right. And that's where a lack-of-fit analysis, and we'll go in and do one of those as well, is still important for being able to differentiate between those two sources of imperfection in the model. Yeah?

AUDIENCE: Also, you mentioned the second R-squared also being [INAUDIBLE].

DUANE BONING: Right.

AUDIENCE: If your main concern is fit, and having more coefficients is cheap, would you prefer R-squared or adjusted R-squared?

DUANE BONING: So the question is what I would prefer if fitting additional coefficients is cheap.

AUDIENCE: And fit is more important.

DUANE BONING: And fit is more important. I think I would still essentially look at the adjusted R-squared as the more representative description of the trade-off: adding coefficients improves my fit, but the adjusted R-squared doesn't get as much better, and in fact, if I start overfitting, it will tend to degrade slightly.
However, what I think is a better mechanism for actually making the decision about whether to include coefficients is the analysis of variance, looking at the significance of those model coefficients, both the significance and the magnitude. So I would tend to do the regression analysis together with the ANOVA. The R-squared is a nice aggregate measure, but it's not the thing that drives my decision-making so much. So I hope that helps. We'll see some examples of R-squared values that come out of some analyses.

Now, we said that regression, at least as it is most commonly used, is driven by minimization of a squared error measure. And this is just trying to illustrate what I'm talking about: where the residuals, the differences between my model and my data, may come from in the simple 1D case. We've already talked a bit about this, but I'm using a very, very simple model here, which has only one term. It doesn't even have a constant offset. It simply has a direct linear dependence of the output on the input. And I'm saying that the true model does have some noise in it, which is normally distributed. And I'm fitting, or estimating, that with some coefficient, a little b.

So this is my fit through my data, minimizing the squared deviations, or I'd like it to minimize the squared deviations. And again, we're saying that any difference between the model prediction and the data, essentially y-hat sub i minus y sub i for that data point, is a residual. That's an error. And it can come from two factors again: either lack of fit in the model, or the underlying noise in the data.

Now, last time, or maybe even two lectures ago, we talked about the use of regression numerically, if you will, or algebraically, to estimate this beta with the best b, based on minimization of the sum of squared errors. So we take each one of those residuals, square it, and then sum that over all of our data.
And it turns out what we're trying to do is find the b, the estimate beta-hat, that minimizes that sum of squared deviations. What's nice with linear models is that there's an algebraic way to find the b that does that minimization for us. But I also want to remind you that lurking inside of that minimization is an estimate of the total sum of squared residuals, SSR, which is what's lurking back there in that R-squared and adjusted R-squared. And if I divide that by the degrees of freedom, nu sub R, then I've also got my estimate of the variance in the underlying model, assuming no lack of fit.

So we said that with least squares estimation, I can form the set of linear equations. And requiring that the residuals be orthogonal to the input, the sum of the products of the residuals and the inputs should be 0. When you carry through the algebra for that, out pops the formula for the slope coefficient given our data: simply the sum of the products x sub i times y sub i over the sum of the x sub i squared. And as I said, here's our estimate of the underlying variance. That's our best, unbiased estimate of the process variance. In this case, we're only fitting one model coefficient, so I've got my total number of data points n, and the degrees of freedom are just n minus 1, since I've only got one model coefficient.

Now, the interesting thing that I've alluded to in a previous lecture but haven't shown you: I want more than just the best estimate of b. I'd like to have a confidence interval on b. Given the spread in the data and an underlying normal noise assumption, what do I think the range, say a 95% confidence interval, might be on my estimate of b? We can do that very simply by taking the formula for b and doing our variance-of-b calculation on that formula. It's just variance math, and that's what's broken out here.
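(For reference, the formulas being described, reconstructed from the definitions above for the no-intercept model \( y_i = \beta x_i + \epsilon_i \), are

$$
\hat{b} = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2},
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \hat{b}\,x_i\right)^2,
\qquad
\operatorname{Var}(\hat{b}) = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2},
$$

where \( \hat{b} \) comes from the orthogonality condition \( \sum_i x_i \,(y_i - \hat{b}\,x_i) = 0 \).)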
If I expand the b summation into a sum of those individual terms, I can then apply my normal variance math. Thinking of each of these elements as some constant times an underlying variable, the variance of that sum of terms is the value of each constant squared times the variance of each of those underlying variables. And when you go and do that, what you get is another formula down here for the variance in that coefficient b, based on the data that you've got.

So once I've got that, I've got my estimate for the variance, an estimate of what one standard deviation in b would be. And then you can express that with whatever confidence interval you want. So I might write that typically as b plus or minus one standard error, one standard deviation in b. One standard deviation, I can't remember, what does that correspond to, typically? About a 90% confidence interval? Plus or minus one standard deviation? 68%, thank you. The one I always remember is two standard errors; that's about 95% confidence. So if you want a 95% confidence interval, now you know how to formulate that; the multiplier might be 1.96 or whatever it is.

So there you have, falling nicely out of the basic mathematical formulation for minimizing the sum of squares, both the best estimate for your slope and a confidence interval on the slope.

By the way, if you base that on a relatively small number of data points, you should probably use a t distribution rather than a normal distribution. So that might change the 1.96 for a 95% confidence interval that we're used to.

This also lets us now go back and think again about another perspective on analysis of variance. In fact, you played with this a little bit, or saw it in a slightly different form, on the quiz.
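Pulling the slope estimate and its confidence interval together, here is a minimal sketch with hypothetical data; it uses the t distribution, as just recommended for small samples:

```python
import numpy as np
from scipy import stats

# Hypothetical (x, y) data for the no-intercept model y = b*x + noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
n = len(x)

b = np.sum(x * y) / np.sum(x**2)        # least-squares slope
resid = y - b * x
s2 = np.sum(resid**2) / (n - 1)         # noise variance estimate (one-coefficient fit)
se_b = np.sqrt(s2 / np.sum(x**2))       # standard error: Var(b) = sigma^2 / sum(x_i^2)

t_crit = stats.t.ppf(0.975, df=n - 1)   # roughly 1.96 for large n; wider when n is small
ci = (b - t_crit * se_b, b + t_crit * se_b)
print(b, ci)
# If this interval includes 0, the slope is not significant at the 5% level.
```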
There are two ways of thinking about the significance of whether some slope or model coefficient should be included in the model. The basic hypothesis question is: do I have enough evidence to suggest that that slope term is non-zero? If it might be 0, to some degree of confidence, then I shouldn't include it.

So one way of doing it is the ANOVA, with the ratio of variances in the F test. The other way is basically looking at the confidence interval for beta, say the 95% confidence interval. If that interval includes 0, that says that more than 5% of the time, based on just random variation in the data, I might have a zero coefficient there, in which case I cannot say that it is significantly different from 0. So you can also make the determination about whether to include a model coefficient based on your confidence interval for each individual term.

That's just alluding back to what we already know, but trying to make sure you see the connection, the alternative ways of looking at it: either in the ANOVA table, or, if you want to look at individual coefficient terms, the confidence intervals on those individual coefficients.

OK, let's do an example. Here's a very simple set of data. We've got some input, some x value; call that "age". And some y values; call that "income". And if I just plot the data -- let me get the data up here -- actually, what I've done here is use JMP. I don't know how many of you have played with JMP, but I love it because it's nice and interactive. It does a lot of regression analysis and lets me explore the data fairly interactively; I like it a lot better than Excel for doing some of these analyses. I think in an earlier problem set we gave you a pointer to where you can run it on Athena and so on.

And what this is doing is basically my analysis of variance for a very simple linear model without a constant term.
So I've just got one model coefficient. The table looks at the sum of squares and the mean square, looks at the residual from the remaining data points, and forms an F. That F ratio is huge, about 1,000, and the probability of observing that large an F by chance is minuscule. So I have great confidence that there is in fact a slope. And if I look down here at my income leverage residual versus the age parameter, which is basically just y sub i plotted against x sub i, I see a definite trend.

Now, in this nice plot, the solid line is my best fit, but it has also plotted, with the dashed lines, the confidence interval on the output; I think it's a 95% confidence interval on the output. Now, I told you how to get an estimate of the confidence interval for our b term. How do we get a confidence interval on the output? Well, what we're going to need to do is also carry the variance calculation through our formula for y and see how uncertainty in our data propagates through to uncertainty in our output.

But before we do that, we can also see here in the JMP output things like the parameter estimate for our age dependence. Our best guess for the age dependence is simply 0.5. It also shows us things like the standard error in these typical ANOVA tables, which we've ignored in the past if you've been looking at these. That can be used directly, as we talked about, to give me a confidence interval, depending on whatever level of alpha I want to estimate these things at. And it's also computing an individual t ratio for each of the coefficients. I've only got one here, but it's basically doing a one-by-one assessment of each of my model coefficients to see if it's significant. And in fact it is significant; it ends up being exactly the same probability, though that's not really shown here. Essentially, the t test and the F test are identical in this simple example.
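As a small check on that last point, here is a sketch showing that the per-coefficient t ratio squared equals the ANOVA F ratio for a single-coefficient model. The numbers are synthetic stand-ins; the lecture's actual age/income data isn't reproduced here:

```python
import numpy as np

# Synthetic stand-in data: income = 0.5 * age + noise.
rng = np.random.default_rng(0)
age = np.linspace(20, 60, 15)
income = 0.5 * age + rng.normal(0, 1.0, size=age.size)
n = age.size

b = np.sum(age * income) / np.sum(age**2)   # slope of the no-intercept fit
resid = income - b * age
ms_resid = np.sum(resid**2) / (n - 1)       # residual mean square

t_ratio = b / np.sqrt(ms_resid / np.sum(age**2))   # per-coefficient t test
F = (np.sum((b * age)**2) / 1) / ms_resid          # ANOVA F with 1 model df

print(t_ratio**2, F)   # identical for a single-coefficient model
```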
AUDIENCE: [INAUDIBLE] some subset of data, wouldn't it make sense to hold out part of the data and then use some for testing the model, seeing if it actually has predictive power? Because if you use that entire data set, then essentially--

DUANE BONING: That's an interesting point. So what you're saying is: if you have a fair amount of data, how about holding out some of the data, fitting the model on some portion of it, and then using the held-back data to test the model? I think, especially when you do nonlinear models, and I don't mean just polynomial but some other nonlinear dependence, that cross-validation is extremely common and very useful. Here, you could do that. And essentially what I think that's doing is allowing you to do a lack-of-fit versus noise estimate. In other words, conceptually what you're doing there is saying: here's what my model would have predicted, here's my data point, and there's a residual that I'm going to attribute to a mix of underlying random noise and model lack of fidelity.

I think it's more common to go ahead and use all of your data, because then you've got your aggregate measures and can run all of your tests with the highest resolution possible. But I suspect there's actually a relationship that's very close in there. I think it's a little better to use all of the data, because the more data you have, the better your estimates of underlying process variance are, so you can better differentiate lack of fit from noise. But I haven't thought about that very much, especially for the simple linear cases. It's an interesting approach.

So I want to come back to this lack of fit versus pure error, because we talked about often being able to do multiple runs at the same x values. In the data I've shown you here, we actually have a difficulty distinguishing between model lack of fit and underlying variance.
I had to basically make an assumption that my underlying model was truly linear. I'm basically assuming, if I go back even further here (where did my data go?), a model of the form y sub i equals beta times x sub i plus epsilon sub i. Why? I have really nothing except ideas of parsimony, simple models in general, and perhaps prior knowledge of the physics of the process to really say this is the form of the model. If you look at my data, why couldn't my model be that instead? It may well be. It might have a very complicated structure. That might be true. The problem is that in this happenstance data I don't have any replicates to give me an independent notion of underlying repeated-measurement noise, separate from model form.

And so that goes back to what we said: if we have multiple runs at the same x values, especially if we design an experiment so that we do that and we aren't just using this sort of happenstance data, then we can decompose the total residual error into lack-of-fit and pure replicate error, and start to be able to distinguish between model structure error and pure replication error. And we talked previously about being able to form the F test: the ratio of the variance explained by deviations of the model prediction from the replicate data, over the pure replication error, and then seeing how likely it would be to observe that ratio, using the F test in the ANOVA. We'll come back to that a little bit in an example.

This is a quick one. The previous example was a pure linear term without even a constant offset. We can also do models that have both a slope term and a constant term. And this is simply formulated here as a mean-centered model. If I take my data and ask what happens when x is at its mean, this term is 0. So a is not really an intercept; the a coefficient is the model value when x is at its mean.
I could similarly formulate it so that the a coefficient would be the value when x is 0. The point is that the same approach for estimating both a linear term and a constant offset term applies, and the same notion of getting not only estimates but also confidence intervals, based on the variances of those coefficients, applies. So we can also use this to get confidence intervals, not only on the slope term but also on the offset term.

Now, what's also nice is we can do the same math and look at the variance in our prediction of the output. I already alluded to that with the confidence intervals on that plot of y versus x in the one set of data. If I say, OK, this is my best estimate of the underlying linear model with an offset term, and I just do my variance math on it, I've got the variance of a sum of terms. And if you carry through that math: at each x sub i, the quantity x sub i minus x bar is just a constant, since x bar is a constant and x sub i is a constant. So in the variance math, when I look at the variance of that term, I've got that constant squared out in front of the variance of my b. We already calculated what the variance of the a term and the variance of the b term were; I can plug those in and get an overall estimate of the variance of each of the predicted y sub i terms in my model. And once I've got that single standard error, my single standard deviation, I can use the t or the normal distribution to get a confidence interval on the output.

So it's the same thing we did on the coefficients. I can also use it to tell me what kind of spread, what confidence I have in where the true output should lie when I'm predicting for any x value.
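Here is a minimal sketch of that propagation for the mean-centered model y = a + b(x - xbar), with hypothetical data; the variance formula in it is the standard result of the math just described:

```python
import numpy as np
from scipy import stats

# Hypothetical data for the mean-centered model y = a + b*(x - xbar) + noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 2.9, 4.1, 4.8, 6.2, 6.8])
n, xbar = len(x), x.mean()

sxx = np.sum((x - xbar)**2)
b = np.sum((x - xbar) * y) / sxx                       # slope
a = y.mean()                                           # model value at the mean of x
s2 = np.sum((y - (a + b * (x - xbar)))**2) / (n - 2)   # two fitted coefficients

# Variance of the predicted mean response at new points x0:
x0 = np.linspace(x.min() - 1, x.max() + 1, 50)
var_yhat = s2 * (1.0 / n + (x0 - xbar)**2 / sxx)

t_crit = stats.t.ppf(0.975, df=n - 2)
y0 = a + b * (x0 - xbar)
lower = y0 - t_crit * np.sqrt(var_yhat)   # 95% band: narrowest at x0 = xbar,
upper = y0 + t_crit * np.sqrt(var_yhat)   # widening as x0 moves away from the mean
```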
Now there's an interesting aspect to this. If I look at any particular x input value, notice that the denominator here is a sum over all of my data, so it ends up being just a constant; it doesn't change. But depending on what x I'm looking at, where I am in x, the size of the numerator term changes.

So for example, if I look at my mean, where my x sub i is equal to x bar, that numerator term goes to 0. And essentially what I've got in that case is that at the mean of my data, the variance in my output estimate is basically just related to the random noise in the data. But then as I get further and further from the mean, my confidence interval on my output spreads.

So what you will often see on data (this axis is my x data and this is my y) is that near the center of your data you've got the narrowest confidence intervals, and the further away I get in x from x bar, if I draw the dashed lines for a 95% confidence interval on the output, the wider my prediction interval becomes. Even though I may still be interpolating within the data I've got, my variance does spread as I get further and further away. Just an interesting fact.

All right, we're almost ready to do a polynomial example. I just want to point out, as we talked about previously, that we can include not only a constant term and a linear term; we can also include terms with a square polynomial, for example curvature in an x-squared term. One important fact: this is still linear in the coefficients. And what this means is that the least squares approach, least squares minimization, still applies. So you can still do least squares minimization to estimate your beta coefficients. And essentially what you do mechanically, say in something like Excel, is create an additional "fake" column of data, just taking your x.
You can almost think of it as equating that with an x2: think of the original input as x1, and build a new data column by taking each of your x values and squaring it; that becomes a new x sub 2 input. Then all you're doing is a linear fit in these multiple coefficients. So it looks exactly like what we did for multiple inputs, even when we have additional higher-order terms in x squared.

So let's look at a simple example that pulls these threads together: look at confidence, but also look at the case where I've got some replicate data, so we can get a little experience with this lack-of-fit idea. In this case, importantly, we've got runs where I've replicated my x values. I've got two runs with 20 grams of some kind of growth supplement, so I've got two different output values at that point. And I've got another point that is triply replicated.

What I'd like to do is try to fit a model, and here what we've got in the picture is an inkling, a foreshadowing, of some of the kinds of models and issues we might consider. If we look, I think you can see it here, the basic data is in black; these are the data points. So this is just my output; there's my triply replicated data; there's my x data. First off, I could try to fit that with just a mean. That's the red line, the pure mean of my data. The green line is a first-order fit, a slope coefficient and the mean, so two model terms. And you can see already that's not going to be a very good model. What we've got here, with the replicates, is enough data to perhaps detect that using our ANOVA machinery, and then perhaps build up to a second-order model; we can already get a sense that a quadratic model is going to fit the data a lot better.
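Mechanically, the "extra column" trick looks like this. The dose/response numbers here are hypothetical, echoing the replicated structure of the growth-supplement example only in spirit:

```python
import numpy as np

# Hypothetical data with replicated x values: two runs at 20 g, three at 30 g.
x = np.array([10.0, 15.0, 20.0, 20.0, 25.0, 30.0, 30.0, 30.0])
y = np.array([5.2, 8.9, 11.8, 12.4, 13.1, 12.2, 11.8, 12.5])

# Build the extra "fake" column: treat x**2 as a second input, then do an
# ordinary linear least-squares fit in the coefficients.
X = np.column_stack([np.ones_like(x), x, x**2])
(a, b1, b2), _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(a, b1, b2)   # mean/offset, linear, and quadratic coefficients
```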
Now, suppose I just try it. First off, you should always plot your actual data so you have a feel for what kind of model is going to be needed. If you were to actually plot this data, you would already see that you probably need a quadratic model, so you might go ahead and include that term up front. But let's say we had not done that, and we just tried to fit it with a very simple linear model. If we go through and do the ANOVA, now, because we do have replicated runs, I can split my overall residual sum of squared deviations into a lack-of-fit term and a sum of squared deviations of my replicated data around their own means. And I can then form a ratio of those two things.

What I've got is deviations from my model that are much larger. So this is a deviation; actually, that's not a good example, right there the deviation from the model is quite small. If I look right here, for example, this is my deviation from the model, and I don't have any replicate data there. Right here, I've got deviation from the linear model, and then I've also got pure replicate error. And you can start to see that the deviation from my model's best prediction is much, much larger. That's what shows up in this ratio of the two variances. If you follow through with the F test, that big a ratio is highly unlikely to occur by chance, given the noise spread. So if you actually go in and do the lack-of-fit analysis, it's already raising big red flags. Here's my red flag saying: look out, look out, you've got a lot of evidence of lack of fit.

What's interesting in this example is that if I were to just look at the significance of the individual model terms, it pops out that the mean is highly significant but the slope term is not.
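(For reference, the decomposition that raises that red flag can be written as follows: with n total runs at m distinct x settings, p fitted coefficients, and replicate group means \( \bar{y}_j \),

$$
SS_{\mathrm{resid}} = SS_{\mathrm{LOF}} + SS_{\mathrm{PE}},
\qquad
SS_{\mathrm{PE}} = \sum_{j=1}^{m}\sum_{k}\left(y_{jk} - \bar{y}_j\right)^2,
\qquad
F = \frac{SS_{\mathrm{LOF}} / (m - p)}{SS_{\mathrm{PE}} / (n - m)},
$$

and a large F signals lack of fit.)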
751 00:44:45,930 --> 00:44:47,280 So this would say-- 752 00:44:47,280 --> 00:44:48,960 if I weren't looking at lack of fit 753 00:44:48,960 --> 00:44:51,870 and paying attention to that red flag, 754 00:44:51,870 --> 00:44:57,270 I might be tempted to say a very wrong thing. 755 00:44:57,270 --> 00:45:00,570 I might be tempted to say there is a significant estimate 756 00:45:00,570 --> 00:45:07,710 of the mean that's non-zero, but given the spread in my data, 757 00:45:07,710 --> 00:45:10,530 I cannot conclude that there is a linear dependence 758 00:45:10,530 --> 00:45:12,390 on my input. 759 00:45:12,390 --> 00:45:18,090 My linear dependence on x could be 0. 760 00:45:18,090 --> 00:45:20,370 In other words, with that green line 761 00:45:20,370 --> 00:45:25,820 right here, that's a small slope that, given 762 00:45:25,820 --> 00:45:30,740 the spread in my data, is not justified to actually estimate 763 00:45:30,740 --> 00:45:32,060 as anything other than 0. 764 00:45:35,720 --> 00:45:36,690 Interesting, huh? 765 00:45:39,350 --> 00:45:41,930 So you really need to look at both. 766 00:45:41,930 --> 00:45:44,120 I'd have to be very careful because 767 00:45:44,120 --> 00:45:46,910 the extra explanatory power of the linear term 768 00:45:46,910 --> 00:45:48,860 is very, very minimal here. 769 00:45:48,860 --> 00:45:50,690 So I might think, OK, so I've really 770 00:45:50,690 --> 00:45:53,210 got no dependence at all, when what I've really 771 00:45:53,210 --> 00:45:54,380 got is lack of fit. 772 00:45:57,820 --> 00:45:58,620 Is that making sense? 773 00:46:01,210 --> 00:46:04,780 So what I might then do is say, OK, I am paying attention 774 00:46:04,780 --> 00:46:05,710 to that big red flag. 775 00:46:05,710 --> 00:46:06,790 I've got lack of fit. 776 00:46:06,790 --> 00:46:13,460 Maybe I better add a quadratic term, refit my data. 777 00:46:13,460 --> 00:46:19,300 So now if I look at the ANOVA for my model 778 00:46:19,300 --> 00:46:22,150 with the mean, with a term for the linear coefficient, 779 00:46:22,150 --> 00:46:25,960 and one for the quadratic, now what do I get? 780 00:46:25,960 --> 00:46:29,620 I return to breaking apart my residual 781 00:46:29,620 --> 00:46:34,330 and now looking and seeing how much deviation is there 782 00:46:34,330 --> 00:46:38,650 due to lack of fit compared to underlying replicate variance. 783 00:46:38,650 --> 00:46:40,880 And now that ratio is very small. 784 00:46:40,880 --> 00:46:44,620 So now I no longer have any evidence 785 00:46:44,620 --> 00:46:47,770 of lack of fit. That's good. 786 00:46:47,770 --> 00:46:50,320 And now I can return to deciding 787 00:46:50,320 --> 00:46:54,820 whether individual terms are significant. 788 00:46:54,820 --> 00:46:59,380 Now, we don't see the full F test here; it's an incomplete ANOVA. 789 00:46:59,380 --> 00:47:01,750 But what we would basically find here 790 00:47:01,750 --> 00:47:05,830 is the mean term is significant, the quadratic term 791 00:47:05,830 --> 00:47:08,197 is significant. 792 00:47:08,197 --> 00:47:09,280 How about the linear term? 793 00:47:12,160 --> 00:47:14,180 It's still not significant. 794 00:47:14,180 --> 00:47:17,620 So in fact, we've got a mean and a square term 795 00:47:17,620 --> 00:47:20,590 but no dependence on the linear term. 796 00:47:20,590 --> 00:47:22,090 You will typically see that.
797 00:47:22,090 --> 00:47:26,770 In fact, if these terms are truly orthogonal, 798 00:47:26,770 --> 00:47:29,590 then if I add terms, it should not change my estimates 799 00:47:29,590 --> 00:47:31,300 for the other terms. 800 00:47:31,300 --> 00:47:34,570 That's not quite true if you throw those missing terms 801 00:47:34,570 --> 00:47:36,820 into the noise. 802 00:47:36,820 --> 00:47:40,750 But the basic point here is I've now actually captured 803 00:47:40,750 --> 00:47:47,610 the dependence on x with this quadratic term. 804 00:47:47,610 --> 00:47:49,660 So you can do exactly the same thing. 805 00:47:49,660 --> 00:47:54,210 This is the same data using Excel. 806 00:47:54,210 --> 00:47:56,730 And you get the same kind of a table 807 00:47:56,730 --> 00:47:59,970 here with an x term and x squared term. 808 00:47:59,970 --> 00:48:03,390 And what's interesting here is you can also 809 00:48:03,390 --> 00:48:06,780 go in and look at estimates of the coefficients, 810 00:48:06,780 --> 00:48:11,010 the standard error, 95% confidence intervals. 811 00:48:11,010 --> 00:48:15,630 And I guess actually if you were to look at that 95% confidence 812 00:48:15,630 --> 00:48:18,960 interval for that x term, it looks like it actually 813 00:48:18,960 --> 00:48:22,170 is likely to be non-zero. 814 00:48:22,170 --> 00:48:24,810 So I did get that right. 815 00:48:28,040 --> 00:48:31,740 So actually you probably should include that term, 816 00:48:31,740 --> 00:48:34,530 even though the ratio is a little bit smaller. 817 00:48:34,530 --> 00:48:36,470 It is still significant. 818 00:48:36,470 --> 00:48:38,720 Now I also put this one up because it's also 819 00:48:38,720 --> 00:48:43,880 got estimates of your R-squared and adjusted R-squared, 820 00:48:43,880 --> 00:48:49,370 which gives you a nice feel. 821 00:48:49,370 --> 00:48:53,840 With an R-squared of around 0.9, 0.95, you start to feel 822 00:48:53,840 --> 00:48:54,462 pretty good 823 00:48:54,462 --> 00:48:55,670 about your model. 824 00:48:58,660 --> 00:49:00,880 So I don't know if you played around with Excel. 825 00:49:00,880 --> 00:49:06,850 So again, I encourage JMP, but if you do need to use Excel, 826 00:49:06,850 --> 00:49:08,620 there is-- 827 00:49:08,620 --> 00:49:12,400 under the data analysis tool, if you pull that down, 828 00:49:12,400 --> 00:49:15,010 you will also see the regression analysis. 829 00:49:15,010 --> 00:49:19,240 And it will let you indicate what your output columns are 830 00:49:19,240 --> 00:49:21,040 and what your input columns are. 831 00:49:21,040 --> 00:49:24,040 And it does just the least squares regression, pops out 832 00:49:24,040 --> 00:49:26,920 your ANOVA table for you. 833 00:49:26,920 --> 00:49:29,410 In that case, you actually have to construct 834 00:49:29,410 --> 00:49:32,800 by hand your x 835 00:49:32,800 --> 00:49:35,350 squared data if you want a polynomial fit. 836 00:49:35,350 --> 00:49:37,870 And that's what I've just illustrated here. 837 00:49:37,870 --> 00:49:41,200 You can't simply, unfortunately, at least in the version 838 00:49:41,200 --> 00:49:45,040 of Excel I have, say I want to try a polynomial model up 839 00:49:45,040 --> 00:49:49,450 to some order and have it just know to do that 840 00:49:49,450 --> 00:49:51,400 on the polynomial input data.
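That is, you build the x squared column yourself and regress on it. A minimal sketch of that step, which also reproduces the kinds of numbers such a tool reports -- coefficient standard errors, 95% confidence intervals, R-squared, and adjusted R-squared -- on the same stand-in data:

```python
import numpy as np
from scipy import stats

x = np.array([10.0, 15.0, 20.0, 20.0, 25.0, 25.0, 25.0, 30.0])
y = np.array([73.0, 87.0, 94.0, 91.0, 95.0, 94.0, 97.0, 92.0])
X = np.column_stack([np.ones_like(x), x, x**2])    # hand-built x^2 column

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
n, p = X.shape
resid = y - X @ beta
s2 = resid @ resid / (n - p)                       # residual variance estimate

# Var(beta_hat) = s^2 (X'X)^-1; standard errors are the sqrt of the diagonal.
cov = s2 * np.linalg.inv(X.T @ X)
se = np.sqrt(np.diag(cov))
t_crit = stats.t.ppf(0.975, n - p)                 # 95% two-sided
for name, b, s in zip(["const", "x", "x^2"], beta, se):
    print(f"{name}: {b:.4f} +/- {t_crit * s:.4f}")

# R^2 and adjusted R^2, as a regression tool would report them.
ss_tot = ((y - y.mean())**2).sum()
r2 = 1 - (resid @ resid) / ss_tot
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p)
print("R^2 =", r2, " adj R^2 =", r2_adj)
```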
841 00:49:51,400 --> 00:49:53,470 You actually have to create columns 842 00:49:53,470 --> 00:49:55,030 for each of the model coefficients 843 00:49:55,030 --> 00:49:56,478 that you want to estimate. 844 00:50:00,150 --> 00:50:03,780 Here's the same polynomial regression using the JMP 845 00:50:03,780 --> 00:50:07,470 package, again, with all of the lack of fit 846 00:50:07,470 --> 00:50:12,360 versus pure error, the x and x squared terms, 847 00:50:12,360 --> 00:50:16,440 t ratios, all of that, but basically the same analysis 848 00:50:16,440 --> 00:50:20,850 with the second order included. 849 00:50:20,850 --> 00:50:23,400 OK, so with that, I'm about to move on 850 00:50:23,400 --> 00:50:24,780 to process optimization. 851 00:50:24,780 --> 00:50:30,000 But I'd like to take any questions on regression, 852 00:50:30,000 --> 00:50:33,100 confidence intervals, confidence intervals on inputs, confidence 853 00:50:33,100 --> 00:50:34,200 intervals on outputs. 854 00:50:34,200 --> 00:50:35,430 Is that all? 855 00:50:35,430 --> 00:50:37,770 It's starting to feel-- 856 00:50:37,770 --> 00:50:40,950 are you confident in your understanding 857 00:50:40,950 --> 00:50:42,255 of confidence intervals? 858 00:50:42,255 --> 00:50:43,020 Yeah, question? 859 00:50:43,020 --> 00:50:44,660 AUDIENCE: What 860 00:50:44,660 --> 00:50:49,220 do you do if your inputs are correlated? 861 00:50:49,220 --> 00:50:51,350 DUANE BONING: OK, so the question was, what do you 862 00:50:51,350 --> 00:50:53,300 do if your inputs are correlated. 863 00:50:55,890 --> 00:51:02,540 So what is assumed in all of these fits 864 00:51:02,540 --> 00:51:05,090 is essentially you've got orthogonality. 865 00:51:05,090 --> 00:51:07,800 If we go back to the tables we were forming 866 00:51:07,800 --> 00:51:09,740 with full factorial and so on, we're 867 00:51:09,740 --> 00:51:13,950 assuming that each of your columns are orthogonal, 868 00:51:13,950 --> 00:51:17,570 which is to say we're assuming each of your coefficients 869 00:51:17,570 --> 00:51:22,580 in each of your different terms are uncorrelated or orthogonal. 870 00:51:22,580 --> 00:51:27,860 If they are orthogonal, and you do a least squares regression-- 871 00:51:27,860 --> 00:51:31,670 or if they are not orthogonal, that is, they are correlated, 872 00:51:31,670 --> 00:51:33,480 what happens? 873 00:51:33,480 --> 00:51:37,100 Well, what happens is you've got two model coefficients 874 00:51:37,100 --> 00:51:41,120 both trying to explain some amount of the same data. 875 00:51:41,120 --> 00:51:43,200 And they fight against each other. 876 00:51:43,200 --> 00:51:48,980 And it's almost random how that 877 00:51:48,980 --> 00:51:51,650 true underlying effect gets apportioned between, 878 00:51:51,650 --> 00:51:54,390 say, a beta 1 and a beta 2 term. 879 00:51:54,390 --> 00:51:56,630 In fact, with very, very tiny little perturbations, 880 00:51:56,630 --> 00:52:00,170 you can get a different mix of beta 1 and beta 2. 881 00:52:00,170 --> 00:52:03,230 And it turns out you might still be 882 00:52:03,230 --> 00:52:05,300 OK in terms of predicting an output 883 00:52:05,300 --> 00:52:08,400 because at least your model has both of them in there. 884 00:52:08,400 --> 00:52:11,390 But it really screws up your ability to decide 885 00:52:11,390 --> 00:52:17,980 whether that model term is significant or not.
886 00:52:17,980 --> 00:52:21,820 What you need to do is transform your data 887 00:52:21,820 --> 00:52:25,060 to get it into an orthogonal form 888 00:52:25,060 --> 00:52:28,750 to get rid of the correlation-- to basically create new 889 00:52:28,750 --> 00:52:32,980 model coefficients and new explanatory values, 890 00:52:32,980 --> 00:52:39,190 fake x values, that don't have the correlation in them. 891 00:52:39,190 --> 00:52:42,940 And the classic tool for doing that 892 00:52:42,940 --> 00:52:49,390 is principal component analysis, or some transformation 893 00:52:49,390 --> 00:52:55,900 of the data to a different basis than your original x1, x2, x3 894 00:52:55,900 --> 00:52:59,220 inputs. 895 00:52:59,220 --> 00:53:02,580 We might talk a little bit about multivariable things. 896 00:53:02,580 --> 00:53:08,100 I think we did a little bit with multivariate statistics and T 897 00:53:08,100 --> 00:53:12,240 squared charts and so on, but essentially 898 00:53:12,240 --> 00:53:15,180 a principal component or some other kind of transformation 899 00:53:15,180 --> 00:53:17,430 is needed on the data in order to then 900 00:53:17,430 --> 00:53:20,640 have individual coefficients that 901 00:53:20,640 --> 00:53:23,200 are not duplicating each other. 902 00:53:23,200 --> 00:53:27,060 If you look, I think it's section 8 point-- 903 00:53:27,060 --> 00:53:29,010 maybe 8.4-- 904 00:53:29,010 --> 00:53:31,800 the next one after what I assigned as a reading, that 905 00:53:31,800 --> 00:53:33,660 talks about principal component analysis 906 00:53:33,660 --> 00:53:36,270 and how you do that in process modeling. 907 00:53:36,270 --> 00:53:38,070 So you can read that section. 908 00:53:38,070 --> 00:53:42,510 It's actually very good, very interesting. 909 00:53:42,510 --> 00:53:47,990 Other questions on regression? 910 00:53:47,990 --> 00:53:48,490 Yeah? 911 00:53:48,490 --> 00:53:50,073 AUDIENCE: If there is a big difference 912 00:53:50,073 --> 00:53:52,360 between R-squared and adjusted R-squared, what 913 00:53:52,360 --> 00:53:54,010 is that telling us? 914 00:53:54,010 --> 00:53:58,860 In this case, it's essentially [INAUDIBLE] 0.9 and 0.8, 915 00:53:58,860 --> 00:54:01,192 or 0.7 [INAUDIBLE]. 916 00:54:03,413 --> 00:54:04,830 DUANE BONING: Yes, so the question 917 00:54:04,830 --> 00:54:06,330 is what if you have big differences 918 00:54:06,330 --> 00:54:09,570 between R-squared and adjusted R-squared. 919 00:54:09,570 --> 00:54:13,710 I think it's essentially telling you 920 00:54:13,710 --> 00:54:17,930 that the influence of additional model coefficients 921 00:54:17,930 --> 00:54:24,350 is really important, both-- 922 00:54:24,350 --> 00:54:26,060 this is very qualitative. 923 00:54:26,060 --> 00:54:27,860 But essentially, it's telling you 924 00:54:27,860 --> 00:54:31,850 there's more going on than just the mean response. 925 00:54:31,850 --> 00:54:34,490 So you're seeing a little bit of a mix of both-- 926 00:54:34,490 --> 00:54:37,370 the penalty of adding more model coefficients, 927 00:54:37,370 --> 00:54:40,580 but it's also telling you there's 928 00:54:40,580 --> 00:54:45,530 likely additional structure that you need the model 929 00:54:45,530 --> 00:54:47,090 to capture. 930 00:54:47,090 --> 00:54:48,410 But that's pretty qualitative. 931 00:54:48,410 --> 00:54:50,780 I think basically it's signaling that there's 932 00:54:50,780 --> 00:54:52,430 more than just mean-- 933 00:54:52,430 --> 00:54:54,230 mean deviations going on.
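To illustrate the earlier point about correlated inputs, here is a rough sketch of that decorrelation idea -- not the full principal component treatment in May and Spanos -- using synthetic inputs where x2 largely tracks x1:

```python
import numpy as np

# Hypothetical correlated inputs: x2 largely tracks x1, plus a little noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.9 * x1 + 0.1 * rng.normal(size=50)
y = 2.0 * x1 + rng.normal(scale=0.5, size=50)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)                  # center before rotating

# Principal components via SVD: rows of Vt are the new orthogonal axes.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                       # the "fake x" values -- uncorrelated

# Regress y on the scores instead of the raw, correlated inputs.
Z = np.column_stack([np.ones(len(y)), scores])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
print("coefficients on PC1, PC2:", beta[1:])
```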
934 00:54:57,480 --> 00:55:01,110 It sounded like there was a microphone 935 00:55:01,110 --> 00:55:03,054 question in Singapore? 936 00:55:03,054 --> 00:55:05,030 AUDIENCE: Question on slide 50. 937 00:55:09,650 --> 00:55:14,450 You mentioned we should not only see the mean but 938 00:55:14,450 --> 00:55:17,900 also focus on the lack of fit and the pure error. 939 00:55:17,900 --> 00:55:20,630 So why do you say that if we only see the mean, 940 00:55:20,630 --> 00:55:22,460 we may say it's a good model? 941 00:55:22,460 --> 00:55:23,990 Can you explain that again? 942 00:55:23,990 --> 00:55:25,365 DUANE BONING: Yeah, actually what 943 00:55:25,365 --> 00:55:27,830 I was saying in this example is that if I only 944 00:55:27,830 --> 00:55:34,710 looked at the mean, I might be hesitant to include any model 945 00:55:34,710 --> 00:55:37,120 terms beyond the mean. 946 00:55:37,120 --> 00:55:42,100 So I might not actually think it's a good model at all. 947 00:55:42,100 --> 00:55:45,990 So that part of your question, I'm not sure I quite understood 948 00:55:45,990 --> 00:55:49,260 or quite agreed with. 949 00:55:49,260 --> 00:55:50,280 But I do-- 950 00:55:50,280 --> 00:55:53,400 I guess maybe I'm just repeating myself, 951 00:55:53,400 --> 00:55:57,270 I think it is really critical to look for lack of fit 952 00:55:57,270 --> 00:55:59,730 because you need both perspectives. 953 00:55:59,730 --> 00:56:05,790 You need to look not only at model coefficients and terms 954 00:56:05,790 --> 00:56:08,430 and whether they should be included in the model, 955 00:56:08,430 --> 00:56:12,870 but you also have to be alert to whether you are missing terms. 956 00:56:16,010 --> 00:56:19,040 That's what the lack of fit enables you to do. 957 00:56:19,040 --> 00:56:22,490 This first one is basically saying the terms that are there, 958 00:56:22,490 --> 00:56:23,630 are they significant? 959 00:56:26,630 --> 00:56:29,110 So in some sense, this one is basically just leading 960 00:56:29,110 --> 00:56:32,920 you to throw away coefficients and throw away model terms. 961 00:56:32,920 --> 00:56:36,400 And this number two, the lack of fit, is telling you, 962 00:56:36,400 --> 00:56:38,650 hey, wait a second, there's stuff going on in the data 963 00:56:38,650 --> 00:56:40,120 that you're not explaining that's 964 00:56:40,120 --> 00:56:43,840 different than random noise, so maybe you 965 00:56:43,840 --> 00:56:46,600 should add model terms. 966 00:56:46,600 --> 00:56:48,670 And so you need both perspectives. 967 00:56:52,380 --> 00:56:58,310 OK, so I think we're ready to move on and look a little bit 968 00:56:58,310 --> 00:57:00,260 at process optimization. 969 00:57:00,260 --> 00:57:03,710 I want to touch on the most natural use of these sorts 970 00:57:03,710 --> 00:57:08,130 of models, which is we define an experimental design, 971 00:57:08,130 --> 00:57:10,280 we go gather the data, we build a model, 972 00:57:10,280 --> 00:57:12,260 and then we start playing with the model. 973 00:57:12,260 --> 00:57:15,620 I think of that as offline use of the model, 974 00:57:15,620 --> 00:57:19,160 using it to try to identify an optimal point.
975 00:57:19,160 --> 00:57:22,880 But it's not purely offline because I 976 00:57:22,880 --> 00:57:26,540 want to make the point that if you're predicting an optimum, 977 00:57:26,540 --> 00:57:30,770 you probably want to go back and run some confirming experiments 978 00:57:30,770 --> 00:57:34,640 and use those back with your physical process 979 00:57:34,640 --> 00:57:37,940 to check your model and maybe even iterate and improve 980 00:57:37,940 --> 00:57:39,050 your model. 981 00:57:39,050 --> 00:57:41,160 So that's one natural approach. 982 00:57:41,160 --> 00:57:42,470 And the other is-- 983 00:57:42,470 --> 00:57:47,070 that should be online use. 984 00:57:47,070 --> 00:57:52,440 So another clever approach is to actually build simplified 985 00:57:52,440 --> 00:57:56,340 models in a little part of the space, use that to tell me 986 00:57:56,340 --> 00:58:00,300 what direction to move in exploring my overall process 987 00:58:00,300 --> 00:58:05,340 space, and then dynamically build and improve my model. 988 00:58:05,340 --> 00:58:09,480 This is for the case when my real goal is getting to an optimum-- 989 00:58:09,480 --> 00:58:12,840 not having the perfect model covering all of my space, 990 00:58:12,840 --> 00:58:15,090 but rather getting to an optimum point. 991 00:58:15,090 --> 00:58:17,640 So I want to touch on both of these ideas, ways 992 00:58:17,640 --> 00:58:20,640 of using these sorts of simplified response surface 993 00:58:20,640 --> 00:58:22,650 models. 994 00:58:22,650 --> 00:58:27,250 And part of the point here is one important use 995 00:58:27,250 --> 00:58:31,450 of these models really is trying to find an optimal process 996 00:58:31,450 --> 00:58:35,230 output, or find the inputs that give me an optimal process 997 00:58:35,230 --> 00:58:36,310 output. 998 00:58:36,310 --> 00:58:39,160 And that optimal process output may 999 00:58:39,160 --> 00:58:41,170 have multiple characteristics about it 1000 00:58:41,170 --> 00:58:44,360 that are important for us. 1001 00:58:44,360 --> 00:58:48,220 One is I want to be close to a target value. 1002 00:58:48,220 --> 00:58:53,530 But the other is we may also want small sensitivity, 1003 00:58:53,530 --> 00:58:56,180 small deviations in my output. 1004 00:58:56,180 --> 00:58:58,480 And if we go back to our variation equation, 1005 00:58:58,480 --> 00:59:02,770 that may mean I want small deviations around noise factors 1006 00:59:02,770 --> 00:59:05,710 that I'm not controlling. 1007 00:59:05,710 --> 00:59:11,670 And I may also want relatively small sensitivity 1008 00:59:11,670 --> 00:59:13,950 even to some of my input parameters 1009 00:59:13,950 --> 00:59:15,810 because I'm going to fix them in my process. 1010 00:59:15,810 --> 00:59:20,500 And I'm not dynamically or in a feedback loop changing them. 1011 00:59:20,500 --> 00:59:24,690 So in some cases, I want this to also be small. 1012 00:59:24,690 --> 00:59:27,000 So we'll talk a little bit about ways 1013 00:59:27,000 --> 00:59:30,750 to mix in these and other objectives. 1014 00:59:30,750 --> 00:59:33,060 For right now, I'm going to mostly focus 1015 00:59:33,060 --> 00:59:40,710 on, say, trying to meet some set of target mean values.
1016 00:59:40,710 --> 00:59:43,710 But I can make the point that you can generalize 1017 00:59:43,710 --> 00:59:47,730 what I'm going to be talking about here by thinking 1018 00:59:47,730 --> 00:59:51,150 of some objective function, or some cost function, 1019 00:59:51,150 --> 00:59:55,620 or some goodness function that actually mixes in together 1020 00:59:55,620 --> 00:59:57,630 multiple objectives. 1021 00:59:57,630 --> 01:00:00,240 So for some of the objectives, you might have a cost function 1022 01:00:00,240 --> 01:00:06,120 that penalizes for deviations from the target, 1023 01:00:06,120 --> 01:00:08,220 or maybe a sum of squared deviations 1024 01:00:08,220 --> 01:00:10,930 from the target if I have multiple outputs. 1025 01:00:10,930 --> 01:00:15,450 It may also penalize me for larger x's-- 1026 01:00:15,450 --> 01:00:18,810 larger inputs-- because there's more cost 1027 01:00:18,810 --> 01:00:24,000 associated with using more gas if I have a higher gas 1028 01:00:24,000 --> 01:00:25,860 flow in some process. 1029 01:00:25,860 --> 01:00:27,840 And then I can also include other things, 1030 01:00:27,840 --> 01:00:34,410 like terms that penalize for sensitivity, these delta y's, 1031 01:00:34,410 --> 01:00:36,450 sensitivity in the output. 1032 01:00:36,450 --> 01:00:40,890 And I can keep throwing additional things in. 1033 01:00:40,890 --> 01:00:46,770 So if I've got in general some complicated objective function, 1034 01:00:46,770 --> 01:00:51,120 if I can formulate that and actually model, 1035 01:00:51,120 --> 01:00:57,120 either empirically or analytically, that cost 1036 01:00:57,120 --> 01:01:00,300 function as a function of my input, 1037 01:01:00,300 --> 01:01:04,650 or as a function utilizing the models that I already have, 1038 01:01:04,650 --> 01:01:07,710 I can then formulate an optimization function 1039 01:01:07,710 --> 01:01:09,180 or an optimization problem where I 1040 01:01:09,180 --> 01:01:11,130 might be trying to minimize that cost 1041 01:01:11,130 --> 01:01:15,563 or minimize that objective. 1042 01:01:15,563 --> 01:01:16,980 Or maybe I'm trying to maximize it 1043 01:01:16,980 --> 01:01:20,070 because I think of it as really a goodness function rather 1044 01:01:20,070 --> 01:01:21,690 than a penalty function. 1045 01:01:21,690 --> 01:01:26,460 But overall, I've got some complicated form for J 1046 01:01:26,460 --> 01:01:29,310 as a function of my factors. 1047 01:01:29,310 --> 01:01:32,820 My factors might be my actual inputs, 1048 01:01:32,820 --> 01:01:38,010 but they may also be noise factors, other factors that I 1049 01:01:38,010 --> 01:01:41,840 haven't explicitly modeled. 1050 01:01:41,840 --> 01:01:48,150 And we'll talk about robustness next week, or not next week, 1051 01:01:48,150 --> 01:01:49,830 on Thursday. 1052 01:01:49,830 --> 01:01:51,360 But right now, I just want to talk 1053 01:01:51,360 --> 01:01:57,690 about adjusting or searching for good input factors 1054 01:01:57,690 --> 01:02:03,430 to minimize or maximize some cost function with constraints. 1055 01:02:03,430 --> 01:02:05,850 So in general, you can think about different approaches 1056 01:02:05,850 --> 01:02:06,690 for this. 1057 01:02:06,690 --> 01:02:10,020 If I've got a full expression for y 1058 01:02:10,020 --> 01:02:16,530 as some function of x, and maybe J is some function of y, 1059 01:02:16,530 --> 01:02:21,570 I've overall got some function for my cost 1060 01:02:21,570 --> 01:02:24,240 as a function of my inputs.
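A minimal sketch of such a composite objective, where the fitted response model, the target, and all the weights are made-up stand-ins:

```python
import numpy as np

TARGET = 100.0    # hypothetical target output

def y_model(x):
    """Stand-in fitted response surface: y = b0 + b1*x + b2*x^2."""
    b0, b1, b2 = 40.0, 4.0, -0.07   # made-up coefficients
    return b0 + b1 * x + b2 * x**2

def cost(x, w_target=1.0, w_input=0.05, w_sens=0.5, dx=0.1):
    """Penalize target misses, expensive inputs, and local sensitivity."""
    miss = (y_model(x) - TARGET)**2                             # off-target
    usage = x                                                   # e.g. gas flow
    sens = ((y_model(x + dx) - y_model(x - dx)) / (2 * dx))**2  # |dy/dx|^2
    return w_target * miss + w_input * usage + w_sens * sens

# Crude grid search over the allowable input range.
xs = np.linspace(0.0, 50.0, 501)
print("best x:", xs[np.argmin([cost(x) for x in xs])])
```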
1061 01:02:24,240 --> 01:02:27,690 Then I can go in and try to minimize-- 1062 01:02:27,690 --> 01:02:32,910 really, set dJ/dx to 0-- and, 1063 01:02:32,910 --> 01:02:35,130 with some assumptions of convexity, 1064 01:02:35,130 --> 01:02:38,430 I can find an overall minimum, or at least a local minimum 1065 01:02:38,430 --> 01:02:40,480 or maximum, of that function. 1066 01:02:40,480 --> 01:02:43,080 So that's if I've got a full expression. 1067 01:02:43,080 --> 01:02:46,230 And we'll explore that a little bit. 1068 01:02:46,230 --> 01:02:48,870 Another approach is more of an incremental approach. 1069 01:02:48,870 --> 01:02:50,970 Rather than having the full expression 1070 01:02:50,970 --> 01:02:54,420 and leaping right to the optimum point 1071 01:02:54,420 --> 01:02:58,720 based on a local minimum or local maximum, 1072 01:02:58,720 --> 01:03:00,540 I may have to search for it. 1073 01:03:00,540 --> 01:03:04,255 I may have to iteratively explore the space. 1074 01:03:04,255 --> 01:03:05,880 And we'll talk a little bit about these 1075 01:03:05,880 --> 01:03:10,200 with hill climbing or steepest ascent and descent kinds 1076 01:03:10,200 --> 01:03:10,855 of problems. 1077 01:03:10,855 --> 01:03:12,480 And I've already mentioned a little bit 1078 01:03:12,480 --> 01:03:15,360 of this online versus offline. 1079 01:03:15,360 --> 01:03:17,460 So here's the simplest picture for one 1080 01:03:17,460 --> 01:03:19,230 of these optimization problems. 1081 01:03:19,230 --> 01:03:22,950 I've got my input x, and I've got my output y. 1082 01:03:22,950 --> 01:03:31,470 And what I'm looking for is a maximum for my output y. 1083 01:03:31,470 --> 01:03:33,810 And maybe here my cost function 1084 01:03:33,810 --> 01:03:39,850 is simply J equal to y, something like that. 1085 01:03:39,850 --> 01:03:43,500 So I'm not differentiating here too much between y and J. 1086 01:03:43,500 --> 01:03:45,450 I'm just simply saying what I'm looking 1087 01:03:45,450 --> 01:03:50,520 for is the overall maximum for this output. 1088 01:03:50,520 --> 01:03:55,650 And one knows from basic calculus 1089 01:03:55,650 --> 01:03:59,400 that the maximum will occur-- 1090 01:03:59,400 --> 01:04:02,310 unless I hit some constraints or some boundary 1091 01:04:02,310 --> 01:04:05,670 cases --will occur when I've got zero 1092 01:04:05,670 --> 01:04:09,780 slope in that function. 1093 01:04:09,780 --> 01:04:11,370 So how do I find it? 1094 01:04:11,370 --> 01:04:15,970 Well, one approach is, again, this analytic approach. 1095 01:04:15,970 --> 01:04:18,760 If I have a full expression, I can simply 1096 01:04:18,760 --> 01:04:20,680 recognize that that maximum occurs 1097 01:04:20,680 --> 01:04:26,320 where there is zero slope, solve for the x such 1098 01:04:26,320 --> 01:04:32,440 that the slope is 0, and I directly get to the answer. 1099 01:04:32,440 --> 01:04:35,920 But in order to do that, I need a full analytic model. 1100 01:04:35,920 --> 01:04:40,330 To do that, I need perhaps relatively small 1101 01:04:40,330 --> 01:04:43,900 or accurate increments in x, or assumptions 1102 01:04:43,900 --> 01:04:45,730 on the model form. 1103 01:04:45,730 --> 01:04:50,695 And especially if I have relatively sparse data points, 1104 01:04:50,695 --> 01:04:54,190 if I had say just these data points, 1105 01:04:54,190 --> 01:04:58,510 it's quite easy to miss the true optimum because 1106 01:04:58,510 --> 01:05:06,370 of noise or imperfections in my model fit.
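The analytic step itself is one line of algebra: for a fitted quadratic y = b0 + b1 x + b2 x^2, setting dy/dx = b1 + 2 b2 x to 0 gives x* = -b1 / (2 b2), a maximum when b2 is negative. A small sketch with hypothetical coefficients and an explicit check against the allowable input range:

```python
# Coefficients from a (hypothetical) fitted quadratic y = b0 + b1*x + b2*x^2.
b0, b1, b2 = 40.0, 4.0, -0.07

# dy/dx = b1 + 2*b2*x = 0  =>  x* = -b1 / (2*b2); a maximum since b2 < 0.
x_star = -b1 / (2 * b2)
y_star = b0 + b1 * x_star + b2 * x_star**2
print(f"optimum at x = {x_star:.2f}, predicted y = {y_star:.2f}")

# Respect input constraints: clip to the allowable range before trusting it.
x_lo, x_hi = 0.0, 50.0
x_star = min(max(x_star, x_lo), x_hi)
```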
1107 01:05:06,370 --> 01:05:09,570 So it can actually be a little bit tricky with small amounts 1108 01:05:09,570 --> 01:05:14,430 of data to find that, if I fit an overall analytic model 1109 01:05:14,430 --> 01:05:16,650 to a very small number of data points. 1110 01:05:18,900 --> 01:05:23,940 An alternative is a little bit of an iterative or a search 1111 01:05:23,940 --> 01:05:31,020 process, where we might actually add data and explore-- 1112 01:05:31,020 --> 01:05:37,770 either explore with experiments or explore a model-- in a smaller 1113 01:05:37,770 --> 01:05:43,770 space in each case, and sort of seek out the optimum point. 1114 01:05:43,770 --> 01:05:46,530 And the simple conceptual idea 1115 01:05:46,530 --> 01:05:50,340 here is that in some regions of my space, 1116 01:05:50,340 --> 01:05:54,660 I may have very good model fits, 1117 01:05:54,660 --> 01:05:56,730 with much less error than trying 1118 01:05:56,730 --> 01:06:00,060 to fit this overall quadratic to a small number of data points. 1119 01:06:00,060 --> 01:06:02,310 I may have relatively good model fit 1120 01:06:02,310 --> 01:06:04,275 in smaller regions of the space. 1121 01:06:06,990 --> 01:06:09,110 Remember that confidence interval on the output? 1122 01:06:09,110 --> 01:06:13,190 I said as we get further and further away from, say, 1123 01:06:13,190 --> 01:06:18,440 the center of our data, my confidence interval 1124 01:06:18,440 --> 01:06:21,380 on my output prediction gets wider and wider. 1125 01:06:21,380 --> 01:06:24,860 If I shrink my space, I get better estimates 1126 01:06:24,860 --> 01:06:27,480 of my model in a local space. 1127 01:06:27,480 --> 01:06:29,190 And so one approach here is to say, 1128 01:06:29,190 --> 01:06:30,900 I'm going to look in a local space 1129 01:06:30,900 --> 01:06:34,490 and get a good estimate of what the slope is. 1130 01:06:34,490 --> 01:06:39,270 Maybe it's a reduced order model that's only linear. 1131 01:06:39,270 --> 01:06:42,760 So I'm not even trying to fit additional curvature. 1132 01:06:42,760 --> 01:06:46,410 And then use that to say my output y 1133 01:06:46,410 --> 01:06:51,000 is increasing in this direction with x increasing. 1134 01:06:51,000 --> 01:06:55,200 And use that to project forward a small amount 1135 01:06:55,200 --> 01:07:00,030 and suggest a new x value to try. 1136 01:07:00,030 --> 01:07:06,660 So I'm projecting forward and taking additional steps to explore. 1137 01:07:06,660 --> 01:07:09,870 If I then do that and build an additional linear model-- 1138 01:07:09,870 --> 01:07:16,220 whoa --build an additional linear model here, 1139 01:07:16,220 --> 01:07:18,530 it might suggest another small step. 1140 01:07:18,530 --> 01:07:23,540 And as my linear model starts to have a slope term that shrinks, 1141 01:07:23,540 --> 01:07:26,690 that's telling me I'm getting somewhere closer 1142 01:07:26,690 --> 01:07:31,370 to an optimum point, or at least a local optimum point. 1143 01:07:31,370 --> 01:07:35,720 And at that point, that's signaling me that if I really 1144 01:07:35,720 --> 01:07:39,260 want improved accuracy at that point in space, 1145 01:07:39,260 --> 01:07:43,770 to really zero in on the maximum, I can do two things. 1146 01:07:43,770 --> 01:07:47,760 One is to still constrain my search space. 1147 01:07:47,760 --> 01:07:52,550 But also in this region, it's quite likely that my-- 1148 01:07:55,580 --> 01:07:57,240 it's quite likely-- 1149 01:07:57,240 --> 01:07:59,360 I don't want this.
1150 01:07:59,360 --> 01:08:00,900 I don't know what that was. 1151 01:08:00,900 --> 01:08:05,860 Oh, wow, something funky happened. 1152 01:08:05,860 --> 01:08:10,540 In this space, it's just like with that curvature model 1153 01:08:10,540 --> 01:08:12,520 that I showed you earlier: the linear term 1154 01:08:12,520 --> 01:08:14,800 is probably no longer very significant. 1155 01:08:14,800 --> 01:08:17,470 I really need the quadratic term. 1156 01:08:17,470 --> 01:08:20,500 So I might fit locally a quadratic model just 1157 01:08:20,500 --> 01:08:24,069 near the optimum, which allows me in a restricted space 1158 01:08:24,069 --> 01:08:27,160 to get an accurate model that really lets me zero in 1159 01:08:27,160 --> 01:08:31,180 on the optimum point. 1160 01:08:31,180 --> 01:08:33,490 So out here, a linear model might be good 1161 01:08:33,490 --> 01:08:34,870 enough; up in here, 1162 01:08:34,870 --> 01:08:37,840 I may need a beta 0 plus a beta 2 x 1163 01:08:37,840 --> 01:08:42,790 squared term, maybe still also with a linear term 1164 01:08:42,790 --> 01:08:44,510 here as well. 1165 01:08:44,510 --> 01:08:46,510 But I can basically build the model dynamically, 1166 01:08:46,510 --> 01:08:50,189 getting an accurate model near the optimum point. 1167 01:08:53,189 --> 01:08:55,689 Now, I showed you this in 1D, with a single input, 1168 01:08:55,689 --> 01:08:58,439 but you can also do this with two inputs, 1169 01:08:58,439 --> 01:09:02,760 where I've got a 3D model if this is an x1, this is an x2, 1170 01:09:02,760 --> 01:09:04,890 and this is a y. 1171 01:09:04,890 --> 01:09:07,020 But you can essentially think the same thing. 1172 01:09:07,020 --> 01:09:13,770 If I start out here in this space, locally it's linear. 1173 01:09:13,770 --> 01:09:17,490 I can use that to suggest the next step 1174 01:09:17,490 --> 01:09:22,790 to take, using a simplified linear model in this region. 1175 01:09:22,790 --> 01:09:29,390 And then as I hill climb up, as I get close to the optimum, 1176 01:09:29,390 --> 01:09:33,729 then again, now near the optimum, 1177 01:09:33,729 --> 01:09:37,189 in my x1 and x2 I may need a quadratic model 1178 01:09:37,189 --> 01:09:38,990 in those two inputs. 1179 01:09:38,990 --> 01:09:42,560 But I can extend the same idea to hill climbing 1180 01:09:42,560 --> 01:09:45,800 not only in one input, but two inputs, three inputs, 1181 01:09:45,800 --> 01:09:51,220 multiple inputs, in order to get to an optimum point. 1182 01:09:51,220 --> 01:09:52,720 So essentially what we're doing here 1183 01:09:52,720 --> 01:09:55,120 is, again, linear gradient modeling-- 1184 01:09:55,120 --> 01:09:59,590 it is often useful to still include an interaction term. 1185 01:09:59,590 --> 01:10:02,330 But essentially we're doing exactly that same thing. 1186 01:10:02,330 --> 01:10:05,840 And if my model itself is linear, 1187 01:10:05,840 --> 01:10:07,145 an interesting thing happens. 1188 01:10:10,780 --> 01:10:12,400 Where is my overall optimum? 1189 01:10:12,400 --> 01:10:17,270 If I'm trying to maximize y, 1190 01:10:17,270 --> 01:10:19,900 where's my maximum y going to occur? 1191 01:10:19,900 --> 01:10:23,390 It will always occur on a boundary, when I hit a limit 1192 01:10:23,390 --> 01:10:27,080 of my input x's. 1193 01:10:27,080 --> 01:10:30,880 So an important thing that I haven't talked much about 1194 01:10:30,880 --> 01:10:34,560 is also the notion of additional constraints.
1195 01:10:34,560 --> 01:10:39,460 We may be driving to an interior point like in this model, 1196 01:10:39,460 --> 01:10:41,460 but it's also possible that we may 1197 01:10:41,460 --> 01:10:46,410 be driving to either a corner point or some other boundary 1198 01:10:46,410 --> 01:10:52,020 point because of a constraint on my allowable ranges 1199 01:10:52,020 --> 01:10:53,040 for my x inputs. 1200 01:10:56,040 --> 01:10:58,110 There is another piece of terminology 1201 01:10:58,110 --> 01:11:02,490 that's sometimes used for these kinds of searches, 1202 01:11:02,490 --> 01:11:04,800 either steepest ascent or steepest descent, 1203 01:11:04,800 --> 01:11:07,620 whether you're climbing or looking for a local minimum. 1204 01:11:07,620 --> 01:11:10,680 And the basic point is, when I've got that simplified 1205 01:11:10,680 --> 01:11:15,470 linear model, perhaps with the linear interaction term 1206 01:11:15,470 --> 01:11:19,340 as well, you can think about the local gradient with respect 1207 01:11:19,340 --> 01:11:24,650 to x1 or the local gradient with respect to x2. 1208 01:11:24,650 --> 01:11:29,060 And now when you make your step, what you often want to do 1209 01:11:29,060 --> 01:11:34,460 is make the step in the overall steepest ascent or descent direction, 1210 01:11:34,460 --> 01:11:39,270 changing both your x1 and x2 parameters at the same time. 1211 01:11:39,270 --> 01:11:44,830 So this is simply showing that when I move 1212 01:11:44,830 --> 01:11:49,420 and hill climb, I may change x1 and x2 proportionally, 1213 01:11:49,420 --> 01:11:53,320 depending on the relative slope in those two coefficients. 1214 01:11:53,320 --> 01:11:54,970 And it's relatively easy once I've 1215 01:11:54,970 --> 01:11:57,880 got that model to decide what direction is 1216 01:11:57,880 --> 01:12:00,880 the overall steepest one. 1217 01:12:00,880 --> 01:12:04,820 Another point here is that with quadratic terms, 1218 01:12:04,820 --> 01:12:09,230 you can have complicated functions where your minima may 1219 01:12:09,230 --> 01:12:13,250 occur in the interior of the space, or your maxima 1220 01:12:13,250 --> 01:12:15,020 in the interior of the space. 1221 01:12:15,020 --> 01:12:20,450 But you can also have hyperbolic or inverse polynomial 1222 01:12:20,450 --> 01:12:25,040 kinds of relationships where, again, you 1223 01:12:25,040 --> 01:12:28,970 may have local minima or maxima with respect to one variable 1224 01:12:28,970 --> 01:12:31,790 depending on what you're doing with the other variable. 1225 01:12:31,790 --> 01:12:34,820 Or you may also have places where you end up with a maximum 1226 01:12:34,820 --> 01:12:36,930 again at your constraint points. 1227 01:12:36,930 --> 01:12:42,660 So in your search, you've got to account for both. 1228 01:12:42,660 --> 01:12:46,700 So I can summarize what we've done here 1229 01:12:46,700 --> 01:12:50,030 with a combined procedure for design of experiments 1230 01:12:50,030 --> 01:12:53,330 and optimization in the iterative fashion; 1231 01:12:53,330 --> 01:13:02,450 at the end, I'll allude to an evolutionary or incremental 1232 01:13:02,450 --> 01:13:04,050 kind of version. 1233 01:13:04,050 --> 01:13:08,540 So this is a summary of the last two or three lectures boiled 1234 01:13:08,540 --> 01:13:13,100 down into a reminder-- a summary of the basic process 1235 01:13:13,100 --> 01:13:16,610 or procedure for doing DOE and optimization.
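Pulling those hill-climbing ideas together -- a tiny local factorial, a linear fit for the local slopes, a step proportional to those slopes, and a stop when the slopes shrink -- here is a rough sketch in which the process, the noise level, and the step sizes are all hypothetical:

```python
import numpy as np

def process(x1, x2, rng):
    """Hypothetical noisy response with an interior maximum (made up)."""
    return 50 + 4*x1 + 6*x2 - 0.2*x1**2 - 0.3*x2**2 + rng.normal(scale=0.2)

rng = np.random.default_rng(2)
x = np.array([2.0, 2.0])            # starting operating point
h, step = 0.5, 1.0                  # local design half-width, step length

for _ in range(30):
    # Tiny 2^2 factorial around the current point, then a local linear fit.
    d = np.array([[-h, -h], [-h, h], [h, -h], [h, h]])
    X = np.column_stack([np.ones(4), d])            # columns: 1, dx1, dx2
    y = np.array([process(*(x + di), rng) for di in d])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    grad = beta[1:]                                 # local slopes (b1, b2)
    if np.linalg.norm(grad) < 0.3:                  # slopes have shrunk
        break                                       # time for a local quadratic
    x = x + step * grad / np.linalg.norm(grad)      # steepest-ascent step

print("stopped near:", x)
```

The step moves x1 and x2 together, in proportion to their fitted slopes; near the stopping point, a local quadratic fit would zero in on the optimum.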
1236 01:13:16,610 --> 01:13:18,770 We said originally our goal here is 1237 01:13:18,770 --> 01:13:22,070 to build a model, to do a design of experiments. 1238 01:13:22,070 --> 01:13:23,870 I do want to emphasize that depends 1239 01:13:23,870 --> 01:13:26,840 on some knowledge of the process, a little bit 1240 01:13:26,840 --> 01:13:28,790 of knowledge either experience based 1241 01:13:28,790 --> 01:13:30,980 or in the physics of the process. 1242 01:13:30,980 --> 01:13:33,380 Because you need that in order to do things 1243 01:13:33,380 --> 01:13:39,280 like decide what the important inputs are likely to be. 1244 01:13:39,280 --> 01:13:43,030 Now there are things you can do with the DOE to confirm that 1245 01:13:43,030 --> 01:13:46,630 or to expand your knowledge, like factor screening 1246 01:13:46,630 --> 01:13:47,420 experiments. 1247 01:13:47,420 --> 01:13:49,600 We talked about fractional factorials 1248 01:13:49,600 --> 01:13:52,390 with large numbers of factors, where you're just 1249 01:13:52,390 --> 01:13:56,080 trying to decide whether there is a main effect associated 1250 01:13:56,080 --> 01:13:57,550 with that factor. 1251 01:13:57,550 --> 01:14:01,700 But up front, defining the inputs is very important. 1252 01:14:01,700 --> 01:14:05,890 We also need to define limits on the inputs. 1253 01:14:05,890 --> 01:14:08,800 What space do we want to explore and build 1254 01:14:08,800 --> 01:14:11,380 a model over in our design of experiments? 1255 01:14:13,960 --> 01:14:17,350 So overall, we're going to need to first 1256 01:14:17,350 --> 01:14:19,210 decide on a DOE. 1257 01:14:19,210 --> 01:14:21,040 We'd go and run our experiments. 1258 01:14:21,040 --> 01:14:23,950 And then we're going to construct our response surface 1259 01:14:23,950 --> 01:14:24,610 model. 1260 01:14:24,610 --> 01:14:26,980 And if we're using it for the optimization, 1261 01:14:26,980 --> 01:14:28,630 I also want to make the point that you 1262 01:14:28,630 --> 01:14:32,470 need to think early on about what your overall optimization 1263 01:14:32,470 --> 01:14:36,130 or penalty function is, because that may strongly 1264 01:14:36,130 --> 01:14:42,130 influence your DOE and maybe even your factor selection. 1265 01:14:42,130 --> 01:14:45,760 So for example, if you believe that you're really 1266 01:14:45,760 --> 01:14:53,650 going to need an optimization that folds in things like noise 1267 01:14:53,650 --> 01:14:57,730 in addition to just trying to get to a target, 1268 01:14:57,730 --> 01:15:02,500 that can have a profound effect on the DOE that you explore. 1269 01:15:02,500 --> 01:15:05,420 And we'll talk about that on Thursday, 1270 01:15:05,420 --> 01:15:07,930 where you might do additional small experiments 1271 01:15:07,930 --> 01:15:10,600 at each point in the DOE in order 1272 01:15:10,600 --> 01:15:15,040 to build a sensitivity model of that delta y 1273 01:15:15,040 --> 01:15:18,790 as a function of some additional noise factors. 1274 01:15:18,790 --> 01:15:22,210 So depending on what it is you're 1275 01:15:22,210 --> 01:15:26,350 trying to achieve with your model, that can of course-- 1276 01:15:26,350 --> 01:15:29,920 I guess it's obvious-- affect 1277 01:15:29,920 --> 01:15:32,920 the structure of your model and the design of experiments 1278 01:15:32,920 --> 01:15:34,780 that you want to do. 1279 01:15:34,780 --> 01:15:36,880 So we've already talked about a lot of this.
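For the "decide on a DOE" step, generating the design matrix itself is mechanical; a minimal sketch of a two-level full factorial with replicated center points, where the factor names and ranges are made up:

```python
from itertools import product

# Hypothetical factors and their low/high settings.
factors = {"temp": (150.0, 200.0), "pressure": (1.0, 2.0), "flow": (10.0, 30.0)}

# 2^k full factorial: every combination of low and high levels.
runs = [dict(zip(factors, combo))
        for combo in product(*[(lo, hi) for lo, hi in factors.values()])]

# Add replicated center points to support curvature and pure-error checks.
center = {name: (lo + hi) / 2 for name, (lo, hi) in factors.items()}
runs += [center] * 3

# In practice you would also randomize the run order to block against drift.
for i, run in enumerate(runs, 1):
    print(i, run)
```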
1280 01:15:36,880 --> 01:15:39,730 Again, in summary, your DOE includes 1281 01:15:39,730 --> 01:15:43,600 decisions about what likely terms you think 1282 01:15:43,600 --> 01:15:46,090 might be in there, based on your knowledge of the physics. 1283 01:15:46,090 --> 01:15:48,040 Is it going to be mostly linear? 1284 01:15:48,040 --> 01:15:50,680 Might there be quadratic terms? 1285 01:15:50,680 --> 01:15:53,110 That can influence again the selection 1286 01:15:53,110 --> 01:15:55,810 of the high, low, and center points. 1287 01:15:55,810 --> 01:15:58,690 Do you need center points? Do you need three levels 1288 01:15:58,690 --> 01:16:01,610 for all factors? And so on. 1289 01:16:01,610 --> 01:16:04,480 And you also need to think about things like the noise factors. 1290 01:16:04,480 --> 01:16:08,020 We talked about these nuisance factors, if you will, 1291 01:16:08,020 --> 01:16:09,650 or additional noise factors, 1292 01:16:09,650 --> 01:16:13,420 so that you might randomize or block against those. 1293 01:16:13,420 --> 01:16:16,120 If they're not going to be explicitly in the model, 1294 01:16:16,120 --> 01:16:19,000 you don't want them aliasing with or confounding 1295 01:16:19,000 --> 01:16:22,800 with the terms you actually have. 1296 01:16:22,800 --> 01:16:25,860 The response surface modeling is actually a pretty easy piece, 1297 01:16:25,860 --> 01:16:29,730 especially if you use things like the regression 1298 01:16:29,730 --> 01:16:31,950 and the ANOVA approach. 1299 01:16:31,950 --> 01:16:35,760 Again, you can use contrasts, if you've 1300 01:16:35,760 --> 01:16:38,010 got a highly structured design of experiments, 1301 01:16:38,010 --> 01:16:41,030 for very rapid estimation of those terms. 1302 01:16:41,030 --> 01:16:43,590 But overall, the emphasis here is 1303 01:16:43,590 --> 01:16:49,260 you're trying to determine if there's significant variation 1304 01:16:49,260 --> 01:16:52,530 in your data, are individual terms significant, 1305 01:16:52,530 --> 01:16:53,950 are you missing terms. 1306 01:16:53,950 --> 01:16:57,180 So that lack of fit is extremely important. 1307 01:16:57,180 --> 01:17:01,590 And there's often a very interesting interplay 1308 01:17:01,590 --> 01:17:05,010 with the regression modeling. 1309 01:17:05,010 --> 01:17:08,640 In fact, an approach we haven't talked about much, 1310 01:17:08,640 --> 01:17:11,280 but it's essentially inherent in what 1311 01:17:11,280 --> 01:17:16,680 we've been talking about here, is also referred to as 1312 01:17:16,680 --> 01:17:20,910 step-wise 1313 01:17:20,910 --> 01:17:22,680 regression. 1314 01:17:22,680 --> 01:17:25,920 And some of the interactive tools like JMP 1315 01:17:25,920 --> 01:17:28,440 actually explicitly support this, 1316 01:17:28,440 --> 01:17:32,500 where one factor at a time, you look and say, 1317 01:17:32,500 --> 01:17:34,590 I would like to add a term or drop 1318 01:17:34,590 --> 01:17:38,340 a term based on cutoff decision points on significance, 1319 01:17:38,340 --> 01:17:39,260 and so on. 1320 01:17:39,260 --> 01:17:43,500 So you can build up an appropriate regression model 1321 01:17:43,500 --> 01:17:48,270 by dropping or adding terms as needed.
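A hedged sketch of that add/drop idea -- here backward elimination on t-test p-values, which is only one of several step-wise variants -- using the same stand-in data as before:

```python
import numpy as np
from scipy import stats

def p_values(X, y):
    """Two-sided t-test p-values for each OLS coefficient."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / (n - p)
    se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
    return 2 * stats.t.sf(np.abs(beta / se), n - p)

# Hypothetical candidate terms for a one-input polynomial model.
x = np.array([10.0, 15.0, 20.0, 20.0, 25.0, 25.0, 25.0, 30.0])
y = np.array([73.0, 87.0, 94.0, 91.0, 95.0, 94.0, 97.0, 92.0])
terms = {"const": np.ones_like(x), "x": x, "x^2": x**2}

# Backward elimination: drop the least significant non-constant term
# until everything remaining clears the cutoff.
names, cutoff = list(terms), 0.10
while len(names) > 1:
    X = np.column_stack([terms[n] for n in names])
    pv = dict(zip(names, p_values(X, y)))
    worst = max((n for n in names if n != "const"), key=pv.get)
    if pv[worst] <= cutoff:
        break
    names.remove(worst)

print("kept terms:", names)
```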
1322 01:17:48,270 --> 01:17:52,380 And we talked about this at a fairly high level, 1323 01:17:52,380 --> 01:17:55,230 about the optimization procedure, and again, just 1324 01:17:55,230 --> 01:17:58,500 ideas of defining your penalty function 1325 01:17:58,500 --> 01:18:01,380 and then searching for your optimum 1326 01:18:01,380 --> 01:18:04,270 either piece-wise or analytically. 1327 01:18:04,270 --> 01:18:06,470 I'll come back to this in just a second. 1328 01:18:06,470 --> 01:18:08,530 But I do want to emphasize that once you've 1329 01:18:08,530 --> 01:18:18,940 come to some expected optimum point, 1330 01:18:18,940 --> 01:18:24,370 you really should check that and confirm it, 1331 01:18:24,370 --> 01:18:30,100 because you're building your estimate of your model 1332 01:18:30,100 --> 01:18:32,270 based on relatively limited data, 1333 01:18:32,270 --> 01:18:33,910 especially in the factorial models, 1334 01:18:33,910 --> 01:18:37,690 perhaps with only one interior point or center point, based 1335 01:18:37,690 --> 01:18:41,050 mostly on extremal data. 1336 01:18:41,050 --> 01:18:43,600 And especially if you've driven your optimum 1337 01:18:43,600 --> 01:18:49,940 to some interior point using, say, the analytic form 1338 01:18:49,940 --> 01:18:53,240 of the response surface model rather than iteratively 1339 01:18:53,240 --> 01:18:57,530 or incrementally, you're making a lot of big assumptions 1340 01:18:57,530 --> 01:19:01,430 about the shape of the model right near your optimum, 1341 01:19:01,430 --> 01:19:04,790 like it's convex right at that optimum point. 1342 01:19:04,790 --> 01:19:09,230 So you really ought to go in and do a confirming experiment 1343 01:19:09,230 --> 01:19:13,960 right at or right near your optimum 1344 01:19:13,960 --> 01:19:16,480 in order to really test the model 1345 01:19:16,480 --> 01:19:21,740 and consider model error right at that point. 1346 01:19:21,740 --> 01:19:25,220 And that might actually drive you to improving the model 1347 01:19:25,220 --> 01:19:31,080 or exploring slightly different space right near that optimum. 1348 01:19:31,080 --> 01:19:32,810 Now, the one last thing I just want 1349 01:19:32,810 --> 01:19:37,340 to allude to is an alternative approach 1350 01:19:37,340 --> 01:19:43,770 here, which is often starting with some data point in a small space 1351 01:19:43,770 --> 01:19:47,760 and building your model iteratively or adaptively. 1352 01:19:47,760 --> 01:19:50,230 And next week, at the end of next week, 1353 01:19:50,230 --> 01:19:52,680 we'll have a guest lecturer, Dan Frey, 1354 01:19:52,680 --> 01:19:56,520 who has actually studied one-factor-at-a-time 1355 01:19:56,520 --> 01:20:00,870 incremental exploration and model building 1356 01:20:00,870 --> 01:20:04,260 for the purpose of optimization a great deal. 1357 01:20:04,260 --> 01:20:07,920 So he's going to lead us through an alternative approach 1358 01:20:07,920 --> 01:20:11,280 to actually doing full factorial models-- 1359 01:20:11,280 --> 01:20:15,270 trying to find the optimum by not defining up front 1360 01:20:15,270 --> 01:20:19,150 the whole DOE and running the whole thing, 1361 01:20:19,150 --> 01:20:23,790 but rather just walking around your multifactor space 1362 01:20:23,790 --> 01:20:27,600 in order to try to find the optimum point.
1363 01:20:27,600 --> 01:20:33,990 And that has some relationship to another approach that 1364 01:20:33,990 --> 01:20:38,820 is also in May and Spanos, in section 8.5, which I've just 1365 01:20:38,820 --> 01:20:42,000 mentioned to you but don't expect that you actually 1366 01:20:42,000 --> 01:20:45,810 have to know a lot about, which is evolutionary optimization. 1367 01:20:45,810 --> 01:20:49,830 That would say, build a local model, use that again 1368 01:20:49,830 --> 01:20:52,740 in a hill-climbing fashion to suggest where you 1369 01:20:52,740 --> 01:20:56,170 want to go for your next point. 1370 01:20:56,170 --> 01:20:59,130 Maybe in fact you simply pick one of those corners. 1371 01:20:59,130 --> 01:21:01,800 And then you build a DOE model around that. 1372 01:21:01,800 --> 01:21:04,920 And it might suggest you move your process 1373 01:21:04,920 --> 01:21:08,370 to another corner, in which case you build another model, 1374 01:21:08,370 --> 01:21:13,710 and so on, so that you can walk or evolutionarily 1375 01:21:13,710 --> 01:21:17,490 arrive at an optimum point in your process, 1376 01:21:17,490 --> 01:21:21,200 building local models along the way. 1377 01:21:21,200 --> 01:21:24,650 OK, so next time, the one additional topic 1378 01:21:24,650 --> 01:21:28,340 I want to mention in this space of optimization and process 1379 01:21:28,340 --> 01:21:32,370 optimization and DOE is this notion of robustness. 1380 01:21:32,370 --> 01:21:34,640 I'll allude to actually building models 1381 01:21:34,640 --> 01:21:37,640 that include the variance in them 1382 01:21:37,640 --> 01:21:40,440 and not just the overall output. 1383 01:21:40,440 --> 01:21:45,410 So we'll come back to that on Thursday-- enjoy. 1384 01:21:45,410 --> 01:21:47,750 In the meantime, I think you've got the problem set that 1385 01:21:47,750 --> 01:21:48,833 is due on Thursday. 1386 01:21:48,833 --> 01:21:50,750 And it's going to let you explore a little bit 1387 01:21:50,750 --> 01:21:53,840 more some of these DOE and response surface model kinds 1388 01:21:53,840 --> 01:21:54,800 of things. 1389 01:21:54,800 --> 01:21:56,500 So we'll see you on Thursday.