1
00:00:00,000 --> 00:00:02,430
The following content is
provided under a Creative

2
00:00:02,430 --> 00:00:03,730
Commons license.

3
00:00:03,730 --> 00:00:06,030
Your support will help
MIT OpenCourseWare

4
00:00:06,030 --> 00:00:10,060
continue to offer high quality
educational resources for free.

5
00:00:10,060 --> 00:00:12,660
To make a donation or to
view additional materials

6
00:00:12,660 --> 00:00:16,560
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:16,560 --> 00:00:17,874
at ocw.mit.edu.

8
00:00:22,118 --> 00:00:23,660
DUANE BONING: So
what I'm going to do

9
00:00:23,660 --> 00:00:27,180
is pick up a little bit where
we left off from last time.

10
00:00:27,180 --> 00:00:29,210
Last time we were
talking, primarily,

11
00:00:29,210 --> 00:00:30,830
about full factorial models.

12
00:00:30,830 --> 00:00:33,650
And then we dealt with a few
important additional issues

13
00:00:33,650 --> 00:00:34,730
in experimental design.

14
00:00:34,730 --> 00:00:39,320
So we were talking
about issues of blocking

15
00:00:39,320 --> 00:00:42,470
against nuisance factors,
kind of a practical issue,

16
00:00:42,470 --> 00:00:47,030
but that also got us into
a generic issue that's

17
00:00:47,030 --> 00:00:51,200
actually much more fundamental
and important of confounding.

18
00:00:51,200 --> 00:00:55,220
And I want to pick up on this
issue of confounding today

19
00:00:55,220 --> 00:01:00,680
because very often you will
want to do fewer experiments

20
00:01:00,680 --> 00:01:03,410
than a full factorial.

21
00:01:03,410 --> 00:01:07,160
That 2 to the k
grows very fast--

22
00:01:07,160 --> 00:01:10,860
grows exponentially fast with the
number of factors, for example.

23
00:01:10,860 --> 00:01:13,340
And so very often you
might ask the question,

24
00:01:13,340 --> 00:01:15,830
can I reduce the
number of experiments

25
00:01:15,830 --> 00:01:19,490
and still get the key
information that I want to?

26
00:01:19,490 --> 00:01:21,920
And so that's where
we'll really pick up

27
00:01:21,920 --> 00:01:26,780
with fractional factorial
designs today and understanding

28
00:01:26,780 --> 00:01:29,870
confounding and
aliasing patterns

29
00:01:29,870 --> 00:01:36,410
that come with different subsets
of a full factorial design.

30
00:01:36,410 --> 00:01:38,000
Then we'll touch on
some implications

31
00:01:38,000 --> 00:01:42,630
for model construction that fall
out pretty naturally from that.

32
00:01:42,630 --> 00:01:45,050
And then start talking
a little bit, hopefully

33
00:01:45,050 --> 00:01:48,920
if we have time, on process
optimization using some

34
00:01:48,920 --> 00:01:52,190
of these kinds of
design of experiments

35
00:01:52,190 --> 00:01:54,590
techniques and the models
that we're building.

36
00:01:57,650 --> 00:02:00,650
So as I was saying,
very often we

37
00:02:00,650 --> 00:02:05,120
want to run fewer than that
exponentially growing number

38
00:02:05,120 --> 00:02:08,990
of experiments, even if
it's just two-level, building

39
00:02:08,990 --> 00:02:11,150
simple linear models.

40
00:02:11,150 --> 00:02:14,870
Again, we've got a 2 to
the k exponential growth.

41
00:02:14,870 --> 00:02:17,660
And, as an example,
imagine we said

42
00:02:17,660 --> 00:02:21,380
we wanted to run less
than the full 2 to the k,

43
00:02:21,380 --> 00:02:22,850
say, for three inputs--

44
00:02:22,850 --> 00:02:26,240
so for three inputs, if we
run the full 2 to the k.

45
00:02:26,240 --> 00:02:28,940
And we wanted to form a
full linear regression

46
00:02:28,940 --> 00:02:32,300
model with interactions--
so it's still not quadratic,

47
00:02:32,300 --> 00:02:34,620
but it does have all
the interactions.

48
00:02:34,620 --> 00:02:36,740
This is what the
model looks like.

49
00:02:36,740 --> 00:02:38,630
And it's got, if
you count them up--

50
00:02:38,630 --> 00:02:42,150
it's got eight coefficients.
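
[Editor's note: the slide itself is not in the transcript. A standard way to write the three-input model with all interactions is sketched here. The beta subscript notation is an assumption, chosen to match the "beta 1, 2" naming used later in the lecture.]

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
        + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3
        + \beta_{123} x_1 x_2 x_3
```

Counting terms: one constant, three main effects, three two-factor interactions, and one three-factor interaction, which gives the eight coefficients.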

51
00:02:42,150 --> 00:02:47,000
So if we were to do less
than the full 2 to the 3

52
00:02:47,000 --> 00:02:49,670
or 8 experiments, we
obviously would not

53
00:02:49,670 --> 00:02:55,640
have enough data points to
uniquely fit every coefficient.

54
00:02:55,640 --> 00:02:59,000
So that's already giving
us the biggest clue.

55
00:02:59,000 --> 00:03:04,610
Is if you do less than
a full factorial model,

56
00:03:04,610 --> 00:03:07,130
then even for
linear experiments,

57
00:03:07,130 --> 00:03:11,220
you won't be able to fit
all of the coefficients.

58
00:03:11,220 --> 00:03:17,060
So what you end up having to do
is make a decision, a priori,

59
00:03:17,060 --> 00:03:20,660
that some of these factor
effects are going to be small.

60
00:03:20,660 --> 00:03:23,870
That is to say, we're going
to treat the coefficient

61
00:03:23,870 --> 00:03:26,630
as essentially zero.

62
00:03:26,630 --> 00:03:28,370
But then what's
happening with the data?

63
00:03:28,370 --> 00:03:30,840
What impact does that have
on other coefficients?

64
00:03:30,840 --> 00:03:33,660
So we can explore that slightly.

65
00:03:33,660 --> 00:03:34,850
Here's an example.

66
00:03:34,850 --> 00:03:39,560
We'll call this a 2 to
the 3 minus 1 experiment.

67
00:03:39,560 --> 00:03:43,790
And I'll talk about these half
fractions a little bit more.

68
00:03:43,790 --> 00:03:46,850
But instead of the full 2
to the 3 or 8 experiments,

69
00:03:46,850 --> 00:03:50,000
we'll do a half fraction here.

70
00:03:50,000 --> 00:03:53,210
We'll just do 4 experiments
instead of the 8.

71
00:03:53,210 --> 00:03:58,520
And so if I were to pick my
factor levels on x1 and x2

72
00:03:58,520 --> 00:04:00,800
and then think about what the--

73
00:04:00,800 --> 00:04:03,560
so I'm just doing--

74
00:04:03,560 --> 00:04:07,460
in my mind, I might think if
I were just doing 2 factors,

75
00:04:07,460 --> 00:04:16,400
I would have a full
factorial on 2 factors.

76
00:04:19,079 --> 00:04:24,980
And I also could calculate what
an interaction term would be.

77
00:04:24,980 --> 00:04:27,410
That's fine, that works well.

78
00:04:27,410 --> 00:04:32,660
I've got 4 coefficients
and a 2 factor experiment.

79
00:04:32,660 --> 00:04:36,950
And that I could uniquely fit
all of those coefficients.

80
00:04:36,950 --> 00:04:40,910
However, what if I were
to simply relabel that.

81
00:04:40,910 --> 00:04:44,180
And instead of thinking
of that as an interaction,

82
00:04:44,180 --> 00:04:49,710
what if instead I labeled
that an x3 column?

83
00:04:49,710 --> 00:04:52,850
Then I could fit the linear
term for an x3 model.

84
00:04:56,240 --> 00:05:00,620
So imagine I was
doing a 3 factor,

85
00:05:00,620 --> 00:05:03,470
but only worried
about main effects.

86
00:05:03,470 --> 00:05:05,150
I wasn't looking
for interaction.

87
00:05:05,150 --> 00:05:08,150
I didn't care about
the x1, x2 interaction.

88
00:05:08,150 --> 00:05:13,710
If I did that, and just defined
x3 as if it were that column,

89
00:05:13,710 --> 00:05:16,670
then I've got a 3
factor experiment,

90
00:05:16,670 --> 00:05:20,630
but I can only see main effects.

91
00:05:20,630 --> 00:05:22,520
But you see what's
going on here,

92
00:05:22,520 --> 00:05:25,010
I still have that
interaction term.

93
00:05:25,010 --> 00:05:27,680
I still have an
x1 x2 interaction.

94
00:05:27,680 --> 00:05:31,040
So the key idea of confounding,
that we saw before,

95
00:05:31,040 --> 00:05:34,670
is lurking in here.

96
00:05:34,670 --> 00:05:38,360
If I were to do
truly a third factor

97
00:05:38,360 --> 00:05:42,890
and set on my control knobs
for that experiment, my x3

98
00:05:42,890 --> 00:05:45,830
according to these
high-low settings,

99
00:05:45,830 --> 00:05:51,390
those would also give
me the same information,

100
00:05:51,390 --> 00:05:54,410
if you will, the same
combination settings for x1

101
00:05:54,410 --> 00:05:57,350
and x2, that I would
have used to help

102
00:05:57,350 --> 00:06:01,490
me detect the interaction
between x1 and x2.

103
00:06:01,490 --> 00:06:04,010
That is to say, the
x1, x2 interaction is

104
00:06:04,010 --> 00:06:08,900
confounded with a third
factor, if I were to do it.
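
[Editor's note: a minimal sketch of the relabeling just described, assuming coded levels of -1 for low and +1 for high. The variable names are illustrative and do not come from the slides.]

```python
from itertools import product

# Full 2^2 factorial in coded units (-1 = low, +1 = high), x1 varying fastest.
runs = [(x1, x2) for x2, x1 in product((-1, 1), repeat=2)]

# Interaction column for the two-factor model.
x1x2 = [x1 * x2 for x1, x2 in runs]

# Relabel the very same column as the settings of a third factor.
x3 = list(x1x2)

for (x1, x2), s in zip(runs, x3):
    print(f"x1={x1:+d}  x2={x2:+d}  x3=x1*x2={s:+d}")

# The two columns are identical, so a contrast computed from one is the
# same number as the contrast computed from the other.
aliased = (x3 == x1x2)
print("x3 aliased with x1*x2:", aliased)
```

With only these four runs, whatever effect is estimated from that column could be x3, the x1 x2 interaction, or a mix of both.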

105
00:06:08,900 --> 00:06:12,560
Just expanding on that a
little bit, if I were--

106
00:06:12,560 --> 00:06:13,790
this should be a hat.

107
00:06:13,790 --> 00:06:16,610
There's another little
weird font thing going on.

108
00:06:16,610 --> 00:06:24,530
If I were doing a 3 factor
experiment, truly a 3 factor,

109
00:06:24,530 --> 00:06:28,100
I could build up to a
linear model OK with those 4

110
00:06:28,100 --> 00:06:31,460
experimental parameters,
but I could not

111
00:06:31,460 --> 00:06:35,590
fit any interaction terms.

112
00:06:35,590 --> 00:06:38,790
So I have a choice
in my model building.

113
00:06:38,790 --> 00:06:43,410
Do I take my three factors and
look for just main effects?

114
00:06:43,410 --> 00:06:46,140
Or do I stick with
my two factors

115
00:06:46,140 --> 00:06:49,860
and look for an interaction
effect and not model at all--

116
00:06:49,860 --> 00:06:54,810
pretend the third
factor doesn't matter

117
00:06:54,810 --> 00:06:58,320
or try to keep that constant
and not let that enter into it?

118
00:06:58,320 --> 00:06:59,630
Question--

119
00:06:59,630 --> 00:07:03,486
AUDIENCE: If you
computed this [INAUDIBLE]

120
00:07:03,486 --> 00:07:05,970
you still can't say
for sure that it

121
00:07:05,970 --> 00:07:08,660
was x3 or [? x on ?] x2.

122
00:07:08,660 --> 00:07:12,466
So you describe it
[INAUDIBLE] you really

123
00:07:12,466 --> 00:07:14,840
have no way of knowing, right?

124
00:07:14,840 --> 00:07:16,220
DUANE BONING: So if--

125
00:07:16,220 --> 00:07:20,840
so the question-- by the way,
could we put the back screen

126
00:07:20,840 --> 00:07:23,720
also to the slides?

127
00:07:23,720 --> 00:07:25,490
Great, thanks.

128
00:07:25,490 --> 00:07:28,730
So the question is, we
can still distinguish

129
00:07:28,730 --> 00:07:32,375
x1, x2, and x3 or not?

130
00:07:32,375 --> 00:07:34,070
AUDIENCE: No-- x3--

131
00:07:34,070 --> 00:07:36,980
the effect that
we ascribe to x3--

132
00:07:36,980 --> 00:07:39,560
x1, x2.

133
00:07:39,560 --> 00:07:42,803
So we cannot differentiate
between the two.

134
00:07:42,803 --> 00:07:46,100
Or [INAUDIBLE] might tell
us we have a huge effect

135
00:07:46,100 --> 00:07:48,275
because of x3--

136
00:07:48,275 --> 00:07:49,932
a significant effect.

137
00:07:49,932 --> 00:07:52,370
So we're not really
sure if it's x3

138
00:07:52,370 --> 00:07:55,362
that's having an effect or
an interaction of x1, x2.

139
00:07:55,362 --> 00:07:57,570
DUANE BONING: Yeah, you guys
may not have heard that,

140
00:07:57,570 --> 00:08:00,410
but basically, just
had the restatement

141
00:08:00,410 --> 00:08:02,030
of exactly the issue.

142
00:08:02,030 --> 00:08:06,500
That it's confounding
between is it an x3 factor

143
00:08:06,500 --> 00:08:11,610
or is it a beta 1,
2, x1, x2 factor.

144
00:08:11,610 --> 00:08:14,520
You cannot distinguish
between the two.

145
00:08:14,520 --> 00:08:18,310
AUDIENCE: So should
we do something

146
00:08:18,310 --> 00:08:21,922
different to find that out?

147
00:08:21,922 --> 00:08:23,380
DUANE BONING: Yeah,
so the question

148
00:08:23,380 --> 00:08:25,690
is what would you do then?

149
00:08:25,690 --> 00:08:32,620
There's a priori knowledge
that may guide you to a belief

150
00:08:32,620 --> 00:08:34,929
that there is no
interaction effect.

151
00:08:34,929 --> 00:08:37,803
That there's no
possible physical way

152
00:08:37,803 --> 00:08:39,970
there could be an interaction
effect, in which case you

153
00:08:39,970 --> 00:08:42,190
should be safe.

154
00:08:42,190 --> 00:08:44,620
Or you might be
trying to say, I'm

155
00:08:44,620 --> 00:08:49,940
not even sure either of
these effects exists.

156
00:08:49,940 --> 00:08:53,170
And so I'm quite happy to
do just four experiments.

157
00:08:53,170 --> 00:09:00,370
Do an ANOVA that tells me
is either of those, just one

158
00:09:00,370 --> 00:09:02,770
or two, significant.

159
00:09:02,770 --> 00:09:04,510
In which case, now
I worry-- maybe

160
00:09:04,510 --> 00:09:10,150
then I add experimental
points to differentiate.

161
00:09:10,150 --> 00:09:12,340
So we'll talk a little
bit more about that.

162
00:09:12,340 --> 00:09:18,070
But that's, in a nutshell,
the confounding and aliasing

163
00:09:18,070 --> 00:09:20,680
thought process.

164
00:09:20,680 --> 00:09:22,510
And we'll look at
some rules of thumb

165
00:09:22,510 --> 00:09:30,480
that would lead one to believe,
for example, that I may have,

166
00:09:30,480 --> 00:09:34,180
in many cases, more of an
a priori belief that maybe

167
00:09:34,180 --> 00:09:37,300
the main effect will be
stronger and more important.

168
00:09:37,300 --> 00:09:42,340
And I doubt that the interaction
effect will be there.

169
00:09:42,340 --> 00:09:44,950
But those are kind of rules
of thumb and assumptions

170
00:09:44,950 --> 00:09:47,500
that if you can do the
experiments, you can check.

171
00:09:51,490 --> 00:09:54,370
So the point out of
this was simply--

172
00:09:54,370 --> 00:09:57,300
again, these should be models--

173
00:09:57,300 --> 00:09:58,530
y hat models.

174
00:10:02,160 --> 00:10:05,910
For exactly the same
table of settings,

175
00:10:05,910 --> 00:10:09,270
if I looked at it one way,
I could build a model where

176
00:10:09,270 --> 00:10:12,000
I've got some interaction term.

177
00:10:12,000 --> 00:10:16,830
Or I can look at
the main effect.

178
00:10:16,830 --> 00:10:18,720
Or I can give up--

179
00:10:18,720 --> 00:10:21,720
so this could have been
the x3 main effect.

180
00:10:21,720 --> 00:10:25,950
Or similarly, I could
say, get confused

181
00:10:25,950 --> 00:10:30,070
between some other main effect
and an interaction effect.

182
00:10:30,070 --> 00:10:33,210
In other words, I've got four
coefficients, four experiments.

183
00:10:33,210 --> 00:10:35,970
I've got some
confounding going on,

184
00:10:35,970 --> 00:10:38,490
but it's not clear actually.

185
00:10:38,490 --> 00:10:41,730
I've given you the
example for one of them

186
00:10:41,730 --> 00:10:43,080
in the previous two slides.

187
00:10:43,080 --> 00:10:44,940
But the same thing
holds true for any

188
00:10:44,940 --> 00:10:49,000
of the other interactions
with one of the main effects.

189
00:10:49,000 --> 00:10:53,040
So if I really
have three factors,

190
00:10:53,040 --> 00:10:55,620
I might really
have eight kinds--

191
00:10:55,620 --> 00:10:56,910
in a linear model--

192
00:10:56,910 --> 00:10:59,790
I might actually
have eight terms.

193
00:10:59,790 --> 00:11:04,110
And what I've done is
folded four of them onto

194
00:11:04,110 --> 00:11:06,990
or confounded them with four
of the other coefficients.

195
00:11:09,730 --> 00:11:14,020
And so then the question becomes
what's confounded with what?

196
00:11:14,020 --> 00:11:18,730
How do I structure and
how do I pick which

197
00:11:18,730 --> 00:11:22,360
subset of experiments to run?

198
00:11:22,360 --> 00:11:26,710
So this is, essentially,
just saying the same thing.

199
00:11:26,710 --> 00:11:30,640
The point is we can
carefully pick which subset,

200
00:11:30,640 --> 00:11:31,630
in many cases--

201
00:11:31,630 --> 00:11:34,990
this kind of a very small
experimental design.

202
00:11:34,990 --> 00:11:36,730
But in many cases,
we can pick which

203
00:11:36,730 --> 00:11:40,510
subset of the rows
we want to take,

204
00:11:40,510 --> 00:11:42,940
which have a fraction
we want to do,

205
00:11:42,940 --> 00:11:47,920
based on our belief
in which interactions

206
00:11:47,920 --> 00:11:52,000
are going to be least
likely or least important.

207
00:11:52,000 --> 00:11:55,310
So what we want to do is get a
little bit of a feel for that.

208
00:11:55,310 --> 00:11:59,500
So here's a little
bit larger picture,

209
00:11:59,500 --> 00:12:02,980
as well as now extending
some of our shortcut

210
00:12:02,980 --> 00:12:08,470
techniques, our terminology,
our little factor algebra

211
00:12:08,470 --> 00:12:15,340
that we've got for defining rows
and columns in our experiments.

212
00:12:15,340 --> 00:12:17,470
So if I were doing
a three factor

213
00:12:17,470 --> 00:12:23,290
experiment, a full 2 to the 3
array, this is our x matrix.

214
00:12:23,290 --> 00:12:27,670
The identity column--
the C column--

215
00:12:27,670 --> 00:12:29,060
you can sort of see I've--

216
00:12:29,060 --> 00:12:30,640
all low, all high.

217
00:12:30,640 --> 00:12:34,090
And then within that, I've got
the B columns, low and high,

218
00:12:34,090 --> 00:12:36,190
low and high, low and high.

219
00:12:36,190 --> 00:12:40,170
And then within each of those,
low-high, low-high, low-high,

220
00:12:40,170 --> 00:12:40,670
low-high.

221
00:12:40,670 --> 00:12:45,550
So that's, again, all
16 of the possible--

222
00:12:45,550 --> 00:12:49,780
excuse me, all eight of
the possible combinations.

223
00:12:49,780 --> 00:12:53,720
And then we can construct
our interaction terms.

224
00:12:53,720 --> 00:12:56,260
These are two
factor interactions.

225
00:12:56,260 --> 00:12:59,420
And then we've got a three
factor interaction as well.

226
00:12:59,420 --> 00:13:03,190
So that would be all eight
of our model coefficients.

227
00:13:03,190 --> 00:13:08,410
The column would tell me
how to form the contrasts

228
00:13:08,410 --> 00:13:14,170
for detecting or estimating
each of those interaction

229
00:13:14,170 --> 00:13:15,570
terms in the model.
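
[Editor's note: a small sketch of the full 2 to the 3 table and its contrast columns, assuming coded -1/+1 levels with C varying slowest and A fastest. The response values are made up purely for illustration, and the helper names are hypothetical.]

```python
from itertools import product

def full_factorial_2k3():
    """Return the 8 runs of a 2^3 design as dicts of coded column values."""
    rows = []
    # C varies slowest and A fastest, matching the ordering described above.
    for c, b, a in product((-1, 1), repeat=3):
        rows.append({
            "I": 1, "A": a, "B": b, "C": c,
            "AB": a * b, "AC": a * c, "BC": b * c, "ABC": a * b * c,
        })
    return rows

def effect(rows, y, name):
    """Contrast estimate: mean response at +1 minus mean response at -1."""
    hi = [yi for r, yi in zip(rows, y) if r[name] == 1]
    lo = [yi for r, yi in zip(rows, y) if r[name] == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

rows = full_factorial_2k3()
# Made-up responses generated from y = 10 + 2*A + 1*A*B (illustration only).
y = [10 + 2 * r["A"] + 1 * r["AB"] for r in rows]
for name in ("A", "B", "C", "AB", "AC", "BC", "ABC"):
    print(name, effect(rows, y, name))
```

Each effect estimated this way is twice the matching model coefficient, because the coded variable moves two units from -1 to +1. With all eight runs, the A and AB effects come out separately and every other effect is zero.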

230
00:13:18,510 --> 00:13:23,520
So that's just our baseline.

231
00:13:23,520 --> 00:13:26,800
And now let's consider
what happens--

232
00:13:26,800 --> 00:13:33,420
let's think for this full
2 to the 3 experiment.

233
00:13:33,420 --> 00:13:36,100
If I were to only do
half of the experiments--

234
00:13:36,100 --> 00:13:41,730
let's say I did the upper half--
the top four, the shaded four

235
00:13:41,730 --> 00:13:48,870
experiments only, what happens
in terms of what coefficients

236
00:13:48,870 --> 00:13:50,100
can we estimate?

237
00:13:50,100 --> 00:13:51,450
What ones can we not?

238
00:13:51,450 --> 00:13:52,200
Please come on in.

239
00:13:54,800 --> 00:13:57,540
So which coefficients can I
estimate and which ones can

240
00:13:57,540 --> 00:14:00,000
I not?

241
00:14:00,000 --> 00:14:02,310
And what's confounded with what?

242
00:14:05,460 --> 00:14:09,870
We just got some
visitors filing in.

243
00:14:09,870 --> 00:14:12,360
I think there is, perhaps,
some bench space over there,

244
00:14:12,360 --> 00:14:13,770
as well, if you need some.

245
00:14:16,950 --> 00:14:19,680
So everyone in Singapore wave.

246
00:14:19,680 --> 00:14:21,670
Wave to our guests.

247
00:14:21,670 --> 00:14:22,170
Great.

248
00:14:27,220 --> 00:14:32,710
So if we do just this
upper half fraction here,

249
00:14:32,710 --> 00:14:37,180
let's look at a couple of things
that are immediately obvious.

250
00:14:37,180 --> 00:14:39,930
One that's really obvious here--

251
00:14:39,930 --> 00:14:41,930
look at the C column.

252
00:14:41,930 --> 00:14:45,280
If we do those set
of four experiments,

253
00:14:45,280 --> 00:14:50,800
can you estimate what the
C effect is going to be?

254
00:14:50,800 --> 00:14:53,530
No, we haven't excited
that variable at all.

255
00:14:53,530 --> 00:14:57,670
It's all four experiments, we're
at the low setting of that.

256
00:14:57,670 --> 00:15:00,730
So it's clear, if I
picked those four,

257
00:15:00,730 --> 00:15:04,923
I'm basically making an upfront
decision that the C effect--

258
00:15:04,923 --> 00:15:06,340
I'm not going to
be able to model.

259
00:15:06,340 --> 00:15:07,798
I'm not going to
be able to fit it.

260
00:15:07,798 --> 00:15:09,550
I'm not going to know
if it's significant.

261
00:15:09,550 --> 00:15:14,560
I'm not exercising
that variable in a way

262
00:15:14,560 --> 00:15:16,360
with the right
combination of experiments

263
00:15:16,360 --> 00:15:22,150
to unambiguously say,
yes, there was a C effect.

264
00:15:22,150 --> 00:15:28,750
So one way that we can describe
that in our funky DOE algebra

265
00:15:28,750 --> 00:15:33,130
is to say that the C column
is equal to minus the identity

266
00:15:33,130 --> 00:15:34,250
column.

267
00:15:34,250 --> 00:15:38,350
It's just minus 1 times
the identity column.

268
00:15:38,350 --> 00:15:41,680
And we can also look at
some of the other columns.

269
00:15:41,680 --> 00:15:42,790
And here's a neat one.

270
00:15:42,790 --> 00:15:45,370
Let's say-- let's look at this.

271
00:15:45,370 --> 00:15:53,560
The AC column, right
here, and the A column.

272
00:15:53,560 --> 00:15:55,240
And again, in our
funky algebra, you

273
00:15:55,240 --> 00:15:58,270
can see already this
AC equals minus A.

274
00:15:58,270 --> 00:16:00,700
But if you look in one
to one correspondence,

275
00:16:00,700 --> 00:16:07,120
the AC column is exactly the
same, just with a minus sign.

276
00:16:07,120 --> 00:16:11,620
The same combinations
of levels as the--

277
00:16:11,620 --> 00:16:15,740
both the A and the AC have
the same combinations.

278
00:16:15,740 --> 00:16:19,840
So that's confounding
or aliasing

279
00:16:19,840 --> 00:16:21,400
between those two columns.

280
00:16:21,400 --> 00:16:23,980
So even if I run
this experiment,

281
00:16:23,980 --> 00:16:26,950
there's no way for
me to differentiate

282
00:16:26,950 --> 00:16:33,520
whether it was a main effect, an
A effect, or an AC interaction.

283
00:16:33,520 --> 00:16:36,970
Now the other point here is
with just this selection, again,

284
00:16:36,970 --> 00:16:39,970
of the top four
columns, the same thing

285
00:16:39,970 --> 00:16:41,680
happens to other columns.

286
00:16:41,680 --> 00:16:44,110
What's the B column
confounded with?

287
00:16:52,990 --> 00:16:53,830
No, we've got some--

288
00:16:57,298 --> 00:16:59,590
somebody leaning against the
wall trying to stay awake,

289
00:16:59,590 --> 00:17:00,090
I think.

290
00:17:04,480 --> 00:17:06,400
The disembodied
voice in the machine

291
00:17:06,400 --> 00:17:10,869
said to tell you that
that's happened before.

292
00:17:10,869 --> 00:17:13,869
So the B column is
also confounded with--

293
00:17:13,869 --> 00:17:15,310
let's see, what?

294
00:17:15,310 --> 00:17:17,950
Minus 1-- this one.

295
00:17:17,950 --> 00:17:22,480
And then finally, the
last one is the AB column

296
00:17:22,480 --> 00:17:23,740
is confounded with--

297
00:17:27,826 --> 00:17:31,520
let's see, that's just minus
1 of the ABC, it looks like.

298
00:17:31,520 --> 00:17:33,200
Do I have that right?

299
00:17:33,200 --> 00:17:35,150
Yeah.

300
00:17:35,150 --> 00:17:36,290
So that was our point here.

301
00:17:36,290 --> 00:17:38,630
If I had all eight
experiments, I

302
00:17:38,630 --> 00:17:41,510
could have fit eight
coefficients in the model.

303
00:17:41,510 --> 00:17:48,140
But four experiments-- I can
only fit four coefficients.

304
00:17:48,140 --> 00:17:51,710
And it's more important than
just fitting four coefficients.

305
00:17:51,710 --> 00:17:57,800
It's really that I folded
together the effects of two

306
00:17:57,800 --> 00:18:00,830
of those columns
into one coefficient

307
00:18:00,830 --> 00:18:03,440
that I have to
assign either to--

308
00:18:03,440 --> 00:18:05,330
sort of nebulously--
either to one

309
00:18:05,330 --> 00:18:08,450
or the other of the main effect,
or the interaction effect,

310
00:18:08,450 --> 00:18:10,520
or to some superposition
of the two.
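
[Editor's note: a sketch that reads the alias pairs off the table mechanically, assuming the upper half fraction is the four runs with C at its low level. The names are illustrative only.]

```python
from itertools import product

NAMES = ("I", "A", "B", "C", "AB", "AC", "BC", "ABC")

def columns(a, b, c):
    """Coded values of every model column for one run."""
    return (1, a, b, c, a * b, a * c, b * c, a * b * c)

# Half fraction: keep only the runs with C at its low level.
half = [columns(a, b, c) for c, b, a in product((-1, 1), repeat=3) if c == -1]

# Transpose so that cols[name] is the column over the 4 remaining runs.
cols = {name: tuple(run[i] for run in half) for i, name in enumerate(NAMES)}

aliases = []
for i, p in enumerate(NAMES):
    for q in NAMES[i + 1:]:
        if cols[p] == cols[q]:
            aliases.append(f"{p} = {q}")
        elif cols[p] == tuple(-v for v in cols[q]):
            aliases.append(f"{p} = -{q}")

print(aliases)
```

This prints the four pairs I = -C, A = -AC, B = -BC, and AB = -ABC. So the eight columns fold down to four distinguishable ones.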

311
00:18:14,740 --> 00:18:18,130
So this is just, basically,
saying the same thing.

312
00:18:18,130 --> 00:18:19,810
I'm just pointing out--

313
00:18:19,810 --> 00:18:21,747
looking at the columns,
you can see that.

314
00:18:21,747 --> 00:18:23,830
And if you were to actually
follow through and use

315
00:18:23,830 --> 00:18:27,370
our contrast math for picking
out and detecting what

316
00:18:27,370 --> 00:18:32,230
the contrast for effect A or
the contrast for effect AC

317
00:18:32,230 --> 00:18:35,650
would be, they're essentially
the same contrast.

318
00:18:35,650 --> 00:18:40,450
The same sum of the output rows
for that particular effect.

319
00:18:43,290 --> 00:18:46,680
Now, we have a shorthand
way of describing

320
00:18:46,680 --> 00:18:50,920
what this confounding pattern
is in our funky algebra.

321
00:18:50,920 --> 00:18:56,340
Which is to say, we equate
what columns are equal to what

322
00:18:56,340 --> 00:18:58,620
columns.

323
00:18:58,620 --> 00:19:06,030
And then do a little bit of our
algebra of multiplying a column

324
00:19:06,030 --> 00:19:09,270
factor or level
setting by each other

325
00:19:09,270 --> 00:19:12,030
until we get down to
the identity column

326
00:19:12,030 --> 00:19:13,930
on one side or the other.

327
00:19:13,930 --> 00:19:16,260
And then that becomes
a shorthand way

328
00:19:16,260 --> 00:19:19,260
of describing what the whole
confounding pattern is.

329
00:19:19,260 --> 00:19:21,990
So for example, I can
pick, actually, almost any

330
00:19:21,990 --> 00:19:24,090
of these confounding patterns.

331
00:19:24,090 --> 00:19:28,470
Looking at the AC and
the A column, where I

332
00:19:28,470 --> 00:19:30,840
detect that the AC and the A--

333
00:19:30,840 --> 00:19:36,330
this was the A column is equal
to minus of the AC column.

334
00:19:36,330 --> 00:19:41,460
Now if I multiply both
sides by the A column--

335
00:19:41,460 --> 00:19:43,230
A times minus AC--

336
00:19:46,520 --> 00:19:50,990
then on a element
by element basis,

337
00:19:50,990 --> 00:19:54,470
if I take the A column there
and multiply it by itself,

338
00:19:54,470 --> 00:19:56,690
every minus 1
multiplies by minus 1.

339
00:19:56,690 --> 00:20:00,240
And I get the identity.

340
00:20:00,240 --> 00:20:03,030
Over here I've got
A times a minus AC.

341
00:20:03,030 --> 00:20:04,950
And the A times A there also
becomes the identity.

342
00:20:04,950 --> 00:20:08,910
And I end up with just the
I equals minus C. Which

343
00:20:08,910 --> 00:20:12,240
I could have also looked
right directly at the column

344
00:20:12,240 --> 00:20:17,780
and said, OK, what is
aliased with the identity?

345
00:20:17,780 --> 00:20:22,110
But also so you get a little
familiar with that funky column

346
00:20:22,110 --> 00:20:24,830
math.
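
[Editor's note: the column algebra just described, written out. The only rule used is that any column multiplied element by element with itself gives the identity column.]

```latex
\begin{aligned}
A &= -AC \\
A \cdot A &= -(A \cdot A)\, C \\
I &= -C
\end{aligned}
```

Multiplying I = -C through by B or by AB gives B = -BC and AB = -ABC. The one short relation therefore summarizes the whole confounding pattern of this half fraction.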

347
00:20:24,830 --> 00:20:27,470
So that, basically,
is a shorthand way

348
00:20:27,470 --> 00:20:32,810
of either describing the
interaction effect or more

349
00:20:32,810 --> 00:20:35,600
usefully, if I
knew up front what

350
00:20:35,600 --> 00:20:42,170
interactions I was willing to
confound with what effect--

351
00:20:42,170 --> 00:20:48,410
If at the start I said, I really
believe that the BC interaction

352
00:20:48,410 --> 00:20:54,170
is going to be minor compared
to some other effect,

353
00:20:54,170 --> 00:20:57,020
then I could use that to
actually help guide which

354
00:20:57,020 --> 00:21:01,310
half of the experiment to pick.

355
00:21:01,310 --> 00:21:05,480
So for example, I could
have picked this half

356
00:21:05,480 --> 00:21:07,790
or I could have
picked this half.

357
00:21:07,790 --> 00:21:11,750
And each one of those would have
been consistent half fractions.

358
00:21:11,750 --> 00:21:15,350
And the choice of what
to pick depends on--

359
00:21:15,350 --> 00:21:20,840
for example, do I think
AC is aliased with A,

360
00:21:20,840 --> 00:21:22,820
maybe, because A and C
really shouldn't

361
00:21:22,820 --> 00:21:24,800
be interacting with each other?

362
00:21:24,800 --> 00:21:26,480
I'm happy to do that.

363
00:21:26,480 --> 00:21:28,850
But the AB interaction,
I might actually

364
00:21:28,850 --> 00:21:32,900
be wanting to detect that I'm
willing for that to confound

365
00:21:32,900 --> 00:21:36,860
with the B factor
because I think that that

366
00:21:36,860 --> 00:21:39,745
may be less important.

367
00:21:39,745 --> 00:21:41,120
AUDIENCE: Just
seems to be asking

368
00:21:41,120 --> 00:21:45,140
that A is a processes pressure
and that C is the temperature--

369
00:21:45,140 --> 00:21:46,610
for sure temperature.

370
00:21:46,610 --> 00:21:50,320
So in that case, the
interaction between pressure

371
00:21:50,320 --> 00:21:53,920
and temperature, AC,
if temperature is high

372
00:21:53,920 --> 00:21:55,870
and pressure is
low with the same,

373
00:21:55,870 --> 00:21:58,340
the interaction would
be the same as A?

374
00:21:58,340 --> 00:22:00,745
Because that A is equal to AC.

375
00:22:00,745 --> 00:22:01,495
DUANE BONING: Yes.

376
00:22:05,508 --> 00:22:08,050
AUDIENCE: It will just change
one parameter, just temperature

377
00:22:08,050 --> 00:22:08,563
[INAUDIBLE]

378
00:22:08,563 --> 00:22:09,980
DUANE BONING: But
what it's saying

379
00:22:09,980 --> 00:22:13,760
is that your model, then,
for what the A effect was,

380
00:22:13,760 --> 00:22:15,800
wasn't purely an A affect.

381
00:22:15,800 --> 00:22:19,890
It had a little bit of the
interaction lurking in it.

382
00:22:19,890 --> 00:22:24,770
So when you actually
run that experiment,

383
00:22:24,770 --> 00:22:26,930
if there is an
interaction, you can't

384
00:22:26,930 --> 00:22:31,010
tell whether it was due to
the main effect of pressure

385
00:22:31,010 --> 00:22:36,570
or also because of some
interaction with temperature.

386
00:22:36,570 --> 00:22:44,180
So it's not that you get both or
you only get one or the other,

387
00:22:44,180 --> 00:22:45,980
it's that they're
mixed together.

388
00:22:45,980 --> 00:22:47,160
They're confounded together.

389
00:22:50,580 --> 00:22:57,410
So I alluded to some of the
ideas of how you choose which

390
00:22:57,410 --> 00:22:59,210
design, based on what
confounding you're

391
00:22:59,210 --> 00:23:00,650
willing to live with.

392
00:23:00,650 --> 00:23:05,790
But there's also a few
additional guidelines

393
00:23:05,790 --> 00:23:07,020
that are at work.

394
00:23:07,020 --> 00:23:11,455
One is, there's this idea of
balance and orthogonality--

395
00:23:11,455 --> 00:23:13,080
that I'll talk about
in just a minute--

396
00:23:13,080 --> 00:23:19,290
that tells us you can't just
willy nilly pick random rows

397
00:23:19,290 --> 00:23:24,420
out of your matrix and be able
to use our design of experiments

398
00:23:24,420 --> 00:23:26,470
and analytic techniques.

399
00:23:26,470 --> 00:23:30,300
Things like the estimation
of the effects and so on only

400
00:23:30,300 --> 00:23:35,040
apply if I've got balanced
and orthogonal experiments.

401
00:23:35,040 --> 00:23:37,620
And that leads, also,
a bit, to this idea

402
00:23:37,620 --> 00:23:39,540
or very closely
related to the idea

403
00:23:39,540 --> 00:23:42,780
of getting enough
excitation of the inputs.

404
00:23:46,460 --> 00:23:52,370
So going back to our 2 to
the 3 full table here--

405
00:23:52,370 --> 00:23:56,470
this idea of balance,
first off, says

406
00:23:56,470 --> 00:24:00,880
that in whatever subset
of the design that you've

407
00:24:00,880 --> 00:24:03,700
got for the factors that
you're interested in,

408
00:24:03,700 --> 00:24:07,570
you want all of the columns to
have an equal number of plus

409
00:24:07,570 --> 00:24:09,150
and minus signs.

410
00:24:09,150 --> 00:24:16,400
Now, we saw if I did the
upper fraction for C,

411
00:24:16,400 --> 00:24:18,710
if I was trying to
deal with C, that's

412
00:24:18,710 --> 00:24:23,390
not balanced because I've got
four low settings and zero

413
00:24:23,390 --> 00:24:24,050
high settings.

414
00:24:24,050 --> 00:24:29,700
I have not at all
excited that input.

415
00:24:29,700 --> 00:24:32,400
But if you think back
to all of our algebra

416
00:24:32,400 --> 00:24:34,393
for dealing with
contrasts and being

417
00:24:34,393 --> 00:24:36,810
able to take the average of
this, and the average of that,

418
00:24:36,810 --> 00:24:39,810
and subtract the two
with the contrast,

419
00:24:39,810 --> 00:24:42,390
that's basically
the reason that we

420
00:24:42,390 --> 00:24:45,660
need this idea of
balance between the high

421
00:24:45,660 --> 00:24:48,850
and the low settings for
that particular factor.

422
00:24:48,850 --> 00:24:51,300
So to use our
shorthand approaches,

423
00:24:51,300 --> 00:24:55,340
you need an equal number
of the plus and minus signs

424
00:24:55,340 --> 00:24:57,410
in each of the
columns that you are

425
00:24:57,410 --> 00:24:59,150
trying to use in your model.
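
A minimal sketch of the balance check just described, assuming the full 2 to the 3 table is in standard order and that the "upper" half is the four runs with C at its low setting. The slide is not part of the transcript, so the row order and the names `runs`, `columns`, and `is_balanced` are illustrative assumptions.

```python
# Full 2^3 factorial; each run is (A, B, C) with levels -1 / +1.
# A varies fastest, C slowest, so the first four runs all have C = -1.
runs = [(a, b, c) for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)]


def columns(subset):
    """Split a list of runs into named factor columns."""
    a, b, c = zip(*subset)
    return {"A": a, "B": b, "C": c}


def is_balanced(column):
    """A +/-1 column is balanced when its plus and minus signs cancel."""
    return sum(column) == 0


upper_half = runs[:4]  # assumed "upper" half: C is low in every run
balance = {name: is_balanced(col) for name, col in columns(upper_half).items()}
print(balance)  # {'A': True, 'B': True, 'C': False}
```

In this subset C never changes, so there is no high-versus-low contrast to compute for it.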

426
00:25:02,210 --> 00:25:04,430
Turns out you can relax
that a little bit if you

427
00:25:04,430 --> 00:25:06,740
do some regression approaches.

428
00:25:06,740 --> 00:25:09,350
There's some other
risky, nasty things

429
00:25:09,350 --> 00:25:12,620
that happen like
you may not have

430
00:25:12,620 --> 00:25:15,530
the same amount of
variance or residual error

431
00:25:15,530 --> 00:25:18,770
at different points
in your experiment.

432
00:25:18,770 --> 00:25:20,900
But especially for the
shorthand approaches,

433
00:25:20,900 --> 00:25:25,310
you have to have this
notion of balance.

434
00:25:25,310 --> 00:25:29,210
The second idea
is orthogonality,

435
00:25:29,210 --> 00:25:32,570
which is basically
saying that what I need

436
00:25:32,570 --> 00:25:38,810
is for the element-wise
product of two

437
00:25:38,810 --> 00:25:42,110
columns, to sum up to 0.

438
00:25:42,110 --> 00:25:46,320
So for example, here,
the column A and B--

439
00:25:46,320 --> 00:25:48,210
this is a product of 1--

440
00:25:48,210 --> 00:25:51,300
product minus 1, product
minus 1, product of 1.

441
00:25:51,300 --> 00:25:53,670
Those sum up together to be 0.

442
00:25:53,670 --> 00:25:58,270
The A and B columns
are orthogonal.

443
00:25:58,270 --> 00:26:02,910
Which is another way of saying
those two columns are not

444
00:26:02,910 --> 00:26:05,050
confounded with each other.

445
00:26:05,050 --> 00:26:08,190
So think in a linear
algebraic sense.

446
00:26:08,190 --> 00:26:10,320
If the two vectors
are orthogonal,

447
00:26:10,320 --> 00:26:14,550
they are not mixing
together effects in any way.

448
00:26:14,550 --> 00:26:18,330
So if I have two
columns that I want

449
00:26:18,330 --> 00:26:21,120
to be able to model
separately and not

450
00:26:21,120 --> 00:26:23,880
have the coefficient
trying to mix in randomly

451
00:26:23,880 --> 00:26:27,390
some amount of one or
the other, but, in fact,

452
00:26:27,390 --> 00:26:30,900
to be identified with
that particular effect,

453
00:26:30,900 --> 00:26:35,070
those two columns
have to be orthogonal.

454
00:26:35,070 --> 00:26:37,020
So for example,
what that's telling

455
00:26:37,020 --> 00:26:42,690
me is if I were to pick
this upper half fraction,

456
00:26:42,690 --> 00:26:45,840
the A and the B
columns are orthogonal.

457
00:26:45,840 --> 00:26:48,190
They are not confounded
with each other.

458
00:26:48,190 --> 00:26:51,180
They are two main effects
that, at least with respect

459
00:26:51,180 --> 00:26:55,730
to each other, are separable.

460
00:26:55,730 --> 00:26:58,610
You could ask that same question
now to our other confounding

461
00:26:58,610 --> 00:26:59,480
patterns.

462
00:26:59,480 --> 00:27:00,220
What did we say?

463
00:27:00,220 --> 00:27:02,390
A is confounded with BC--

464
00:27:02,390 --> 00:27:05,670
was it?

465
00:27:05,670 --> 00:27:07,950
So A and BC--

466
00:27:07,950 --> 00:27:11,450
are those two orthogonal?

467
00:27:11,450 --> 00:27:13,040
Certainly, we know
by confounding,

468
00:27:13,040 --> 00:27:14,165
they're not supposed to be.

469
00:27:14,165 --> 00:27:18,430
And if you do that sum
of products, each one of those,

470
00:27:18,430 --> 00:27:20,560
I believe--

471
00:27:20,560 --> 00:27:22,060
AC, sorry.

472
00:27:22,060 --> 00:27:24,010
Good-- I was about
to say, that's weird.

473
00:27:24,010 --> 00:27:25,390
They look orthogonal.

474
00:27:25,390 --> 00:27:26,470
There we go.

475
00:27:26,470 --> 00:27:29,470
Each one-- the product
is minus 1 every time,

476
00:27:29,470 --> 00:27:31,060
and they all sum up to minus 4.

477
00:27:31,060 --> 00:27:33,010
They're not orthogonal.

478
00:27:33,010 --> 00:27:34,250
They are mixing in together.
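
A minimal sketch of the orthogonality check, assuming the same standard-order 2 to the 3 table and taking the upper half to be the four runs with C low. The helper `dot` and the variable names are illustrative, not from the lecture.

```python
# Full 2^3 factorial in standard order; the first four runs have C = -1.
runs = [(a, b, c) for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)]
upper_half = runs[:4]  # assumed "upper" half fraction


def dot(u, v):
    """Sum of the element-wise products of two columns."""
    return sum(x * y for x, y in zip(u, v))


A = [a for a, b, c in upper_half]
B = [b for a, b, c in upper_half]
AC = [a * c for a, b, c in upper_half]  # interaction column

print(dot(A, B))   # 0  -> A and B are orthogonal
print(dot(A, AC))  # -4 -> A and AC are confounded in this subset
```

With C fixed at minus 1, the AC column is just the A column with its sign flipped, which is why every product is minus 1.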

479
00:27:36,910 --> 00:27:38,530
So those are a
couple of these ideas

480
00:27:38,530 --> 00:27:40,750
of balance and
orthogonality that

481
00:27:40,750 --> 00:27:43,030
are in some sense the same--

482
00:27:43,030 --> 00:27:45,820
just another terminology for
talking about the same thing

483
00:27:45,820 --> 00:27:47,870
that we've already built
up some intuition about.

484
00:27:50,560 --> 00:27:55,910
So I think we've already
just gone through that.

485
00:27:55,910 --> 00:27:58,540
So if I were, in
this experiment,

486
00:27:58,540 --> 00:28:02,890
doing a half fraction and
picking the upper half,

487
00:28:02,890 --> 00:28:07,120
you can see things like A
and B are balanced, C is not.

488
00:28:07,120 --> 00:28:11,740
So I can't try to put
that into my model.

489
00:28:11,740 --> 00:28:17,650
A, B, and C are orthogonal.

490
00:28:17,650 --> 00:28:20,680
Let's see, is that right?

491
00:28:20,680 --> 00:28:22,420
Yes-- so if I were--

492
00:28:22,420 --> 00:28:24,720
except I already know
I'd better not try to model

493
00:28:24,720 --> 00:28:27,580
C because I don't have
sufficient excitation in there.

494
00:28:27,580 --> 00:28:30,430
But I could also then
ask, OK, what else

495
00:28:30,430 --> 00:28:32,780
is confounded with each other?

496
00:28:32,780 --> 00:28:35,410
And if there are places where
I don't have orthogonality,

497
00:28:35,410 --> 00:28:40,660
I might then be led to ask, is
there a different half fraction

498
00:28:40,660 --> 00:28:42,730
that I might want to pick.

499
00:28:42,730 --> 00:28:45,580
One might be this lower one.

500
00:28:45,580 --> 00:28:47,740
I've already looked
at the upper half.

501
00:28:47,740 --> 00:28:51,220
But couldn't there be other
meaningful combinations?

502
00:28:51,220 --> 00:28:53,110
And the answer is yes.

503
00:28:53,110 --> 00:28:55,810
So for example,
here's a better subset

504
00:28:55,810 --> 00:29:01,990
if I were looking for a
particular property that was

505
00:29:01,990 --> 00:29:04,330
annoying with the previous one.

506
00:29:04,330 --> 00:29:08,500
In particular, this fact
that I can't even model C.

507
00:29:08,500 --> 00:29:10,975
I'm not even exciting
a main effect.

508
00:29:13,900 --> 00:29:16,540
That feels weird.

509
00:29:16,540 --> 00:29:19,360
So a better half
fraction, then, one that at least

510
00:29:19,360 --> 00:29:22,360
lets me look at main effects,

511
00:29:22,360 --> 00:29:25,810
might be this shaded
blue half fraction.

512
00:29:25,810 --> 00:29:27,670
Now look what's going on.

513
00:29:27,670 --> 00:29:32,770
All three columns A,
B, and C are balanced--

514
00:29:32,770 --> 00:29:36,370
just the shaded blue parts.

515
00:29:36,370 --> 00:29:39,860
So I've got equal
numbers of high and low,

516
00:29:39,860 --> 00:29:42,950
and A, B, and C are
mutually orthogonal.

517
00:29:48,740 --> 00:29:52,990
I still have
confounding going on.

518
00:29:52,990 --> 00:29:56,400
I'm not sure quite
exactly what this means.

519
00:29:56,400 --> 00:30:01,670
I guess A-- let's see.

520
00:30:01,670 --> 00:30:05,280
I still have confounding
because, for example--

521
00:30:05,280 --> 00:30:09,020
So A is not orthogonal
with what column here?

522
00:30:09,020 --> 00:30:13,966
BC in this one.

523
00:30:13,966 --> 00:30:16,720
Because 1, 1 minus--

524
00:30:16,720 --> 00:30:18,820
so these are the same.

525
00:30:18,820 --> 00:30:22,540
So these two columns are
confounded with each other.

526
00:30:22,540 --> 00:30:26,280
So those are not
mutually orthogonal.

527
00:30:26,280 --> 00:30:29,160
And I could ask, OK,
what is a shorthand

528
00:30:29,160 --> 00:30:30,457
way of describing this?

529
00:30:30,457 --> 00:30:32,040
I could pick a couple
of these columns

530
00:30:32,040 --> 00:30:37,260
and say A equals BC and do,
again, my funky algebra.

531
00:30:37,260 --> 00:30:42,390
Multiply both sides by
A and get I equals ABC.

532
00:30:42,390 --> 00:30:44,580
And there's my
defining relationship

533
00:30:44,580 --> 00:30:46,510
for what's confounded with what.
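
The "funky algebra" can be sketched in a few lines, assuming effects are written as strings of factor letters and that any letter appearing twice cancels (A times A is the identity I). The function name `multiply` is an illustrative choice.

```python
def multiply(word1, word2):
    """Multiply two effect words; repeated letters cancel (A * A = I)."""
    letters = set(word1.replace("I", "")) ^ set(word2.replace("I", ""))
    return "".join(sorted(letters)) or "I"


# Start from A = BC and multiply both sides by A to get the defining relation.
print(multiply("A", "A"))   # I
print(multiply("A", "BC"))  # ABC

# Multiplying any effect by the defining word gives its alias.
defining_word = "ABC"
aliases = {e: multiply(e, defining_word) for e in ["A", "B", "C"]}
print(aliases)  # {'A': 'BC', 'B': 'AC', 'C': 'AB'}
```

The symmetric difference of the two letter sets is exactly the "squared letters drop out" rule.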

534
00:30:46,510 --> 00:30:52,420
Could also look up here,
again, and ABC is exactly--

535
00:30:52,420 --> 00:30:55,120
for the blue-- for that subset--

536
00:30:55,120 --> 00:30:57,070
for the blue element
is confounded

537
00:30:57,070 --> 00:30:58,350
with the identity column.

538
00:31:01,750 --> 00:31:04,750
What might be better about
this design or this subset

539
00:31:04,750 --> 00:31:06,670
than the previous one?

540
00:31:06,670 --> 00:31:09,100
Depending on what you're
looking for in the--

541
00:31:09,100 --> 00:31:12,730
what's better about this?

542
00:31:12,730 --> 00:31:14,590
Here we have, I guess,
shorthand for us--

543
00:31:14,590 --> 00:31:16,510
what the other aliases are.

544
00:31:16,510 --> 00:31:19,000
Again, we expect to have
four aliases because I'm only

545
00:31:19,000 --> 00:31:22,000
doing four experiments.

546
00:31:22,000 --> 00:31:23,500
What do you see
about these aliases?

547
00:31:28,630 --> 00:31:30,950
AUDIENCE: [INAUDIBLE]
high and low values

548
00:31:30,950 --> 00:31:35,880
for all the main effects
and one only third order,

549
00:31:35,880 --> 00:31:39,100
like ABC, is the one that
you have not excited at all,

550
00:31:39,100 --> 00:31:41,520
it's the one that [INAUDIBLE]

551
00:31:41,520 --> 00:31:44,590
DUANE BONING: Good--
so that's at least two

552
00:31:44,590 --> 00:31:48,460
of the three properties that I
really like about this design.

553
00:31:48,460 --> 00:31:52,930
If everybody didn't hear
it, he said, at least

554
00:31:52,930 --> 00:31:58,150
you're exciting A, B, and
C. Second, the only column

555
00:31:58,150 --> 00:32:00,700
that you're not exciting
at all is a third order

556
00:32:00,700 --> 00:32:03,520
interaction, which is fine.

557
00:32:03,520 --> 00:32:05,440
How likely is a third
order interaction?

558
00:32:05,440 --> 00:32:08,000
We'll chat about that
a little bit more.

559
00:32:08,000 --> 00:32:09,715
There's another
additional effect.

560
00:32:09,715 --> 00:32:14,300
There's another additional
really nice characteristic

561
00:32:14,300 --> 00:32:17,994
about looking at these
aliasing patterns here.

562
00:32:17,994 --> 00:32:20,050
AUDIENCE: Like, A
is [INAUDIBLE] BC

563
00:32:20,050 --> 00:32:23,480
so they're not measuring
exactly the same amount.

564
00:32:23,480 --> 00:32:25,130
DUANE BONING:
Right, they don't--

565
00:32:25,130 --> 00:32:26,570
I don't know quite what--

566
00:32:26,570 --> 00:32:29,480
don't overlap is a nice
way of describing it.

567
00:32:29,480 --> 00:32:33,290
I've got a main effect
that is confounded

568
00:32:33,290 --> 00:32:43,280
with an interaction of somebody
else's second order term.

569
00:32:43,280 --> 00:32:45,750
So it kind of gets around
that discomfort with A

570
00:32:45,750 --> 00:32:50,430
being confounded with AC
that you were describing.

571
00:32:50,430 --> 00:32:54,080
Now, I still have to worry about
it because if any second order

572
00:32:54,080 --> 00:32:59,660
effects are out there, if
AC is still active, now

573
00:32:59,660 --> 00:33:02,540
I'm not confusing it
with the A main effect.

574
00:33:02,540 --> 00:33:05,900
I am confusing it, though,
with the B main effect.

575
00:33:05,900 --> 00:33:12,690
But the nice thing is now I
can-- you'll see it a little bit

576
00:33:12,690 --> 00:33:13,190
later--

577
00:33:13,190 --> 00:33:18,620
I can also appeal to
some physical causality

578
00:33:18,620 --> 00:33:22,910
to also talk about the
likelihood of it being

579
00:33:22,910 --> 00:33:26,180
a main effect or an interaction
effect, when I finally

580
00:33:26,180 --> 00:33:29,240
analyze my overall experiment.

581
00:33:29,240 --> 00:33:30,950
What am I saying?

582
00:33:30,950 --> 00:33:33,050
Here's the peek forward at that.

583
00:33:33,050 --> 00:33:37,330
If I were to find that this--

584
00:33:37,330 --> 00:33:42,530
either the B main effect or
the AC interaction is at work,

585
00:33:42,530 --> 00:33:47,270
but I also did an ANOVA
and I found out already

586
00:33:47,270 --> 00:33:50,850
that this was not
significant, in other words,

587
00:33:50,850 --> 00:33:54,500
there's not an A main
effect, it's highly unlikely

588
00:33:54,500 --> 00:33:57,830
that there's an
interaction with A.

589
00:33:57,830 --> 00:33:59,870
If it doesn't have
an overall effect,

590
00:33:59,870 --> 00:34:02,570
why would it have a
subtle interaction?

591
00:34:02,570 --> 00:34:05,030
It's kind of unlikely.

592
00:34:05,030 --> 00:34:07,910
So by ordering
this, you actually

593
00:34:07,910 --> 00:34:11,150
can start to decode what's
going on in the process

594
00:34:11,150 --> 00:34:13,070
with a little bit
better visibility.

595
00:34:15,900 --> 00:34:20,489
So what we're actually
talking about here

596
00:34:20,489 --> 00:34:22,560
has some
additional terminology,

597
00:34:22,560 --> 00:34:25,350
which is design resolution.

598
00:34:25,350 --> 00:34:30,330
It's basically a characteristic
of the aliasing patterns

599
00:34:30,330 --> 00:34:36,060
and how decoupled you are able
to get between main effects

600
00:34:36,060 --> 00:34:39,030
and interaction effects or
second order interactions

601
00:34:39,030 --> 00:34:41,739
with other
interactions, and so on.

602
00:34:41,739 --> 00:34:46,530
And you will hear, sometimes,
descriptions of particular half

603
00:34:46,530 --> 00:34:51,150
fractions or other
experimental designs, DOEs,

604
00:34:51,150 --> 00:34:55,679
as being resolution three, or
resolution four, resolution

605
00:34:55,679 --> 00:34:56,790
five.

606
00:34:56,790 --> 00:35:01,440
And a resolution three
has a weaker ability

607
00:35:01,440 --> 00:35:06,600
to discern than a higher
resolution in your experiment.

608
00:35:06,600 --> 00:35:12,960
But a resolution three is
nice in that no main effect--

609
00:35:12,960 --> 00:35:16,450
an A factor-- is aliased
with another main effect.

610
00:35:16,450 --> 00:35:23,660
So A is not aliased with B.
But you will have main effects

611
00:35:23,660 --> 00:35:26,480
aliased with interactions.

612
00:35:26,480 --> 00:35:28,070
Let's say you didn't want that.

613
00:35:28,070 --> 00:35:30,830
You might need to
build a more powerful--

614
00:35:30,830 --> 00:35:35,270
do more experimental
combinations.

615
00:35:35,270 --> 00:35:38,810
And you could go to a resolution
four, which is essentially

616
00:35:38,810 --> 00:35:42,530
a design
where no main effect is

617
00:35:42,530 --> 00:35:44,210
aliased with
another main effect.

618
00:35:44,210 --> 00:35:47,630
And I'm not even
aliasing any main effect

619
00:35:47,630 --> 00:35:49,350
with any second
order interaction.

620
00:35:49,350 --> 00:35:52,550
I might be so worried that there
are second order interactions

621
00:35:52,550 --> 00:35:55,790
that I have to
make sure that I do

622
00:35:55,790 --> 00:35:59,150
enough experimental points to be
able to detect them separately.

623
00:35:59,150 --> 00:36:04,520
Do an ANOVA on significance
of each of them separately.

624
00:36:04,520 --> 00:36:08,540
The example that we did just
a second ago, what resolution

625
00:36:08,540 --> 00:36:11,170
do you think that is?

626
00:36:11,170 --> 00:36:12,470
It's only resolution three.

627
00:36:12,470 --> 00:36:14,080
We were clearly
seeing main effects aliased

628
00:36:14,080 --> 00:36:16,030
with second order effects.

629
00:36:16,030 --> 00:36:19,390
If I wanted to get to resolution
four, what would I need to do?

630
00:36:22,150 --> 00:36:24,160
It's the full
factorial, in this case.

631
00:36:24,160 --> 00:36:26,410
I'd have to go to
the full 2 to the 3.

632
00:36:26,410 --> 00:36:30,220
I couldn't pick a half fraction.

633
00:36:30,220 --> 00:36:33,420
So if there were
four factors, it

634
00:36:33,420 --> 00:36:39,070
turns out you get 16 columns.

635
00:36:39,070 --> 00:36:42,040
And now I can pick 16 rows.

636
00:36:42,040 --> 00:36:45,390
I can pick up 16
columns and rows.

637
00:36:45,390 --> 00:36:48,420
But I can pick, now, some
subset if I did a half fraction

638
00:36:48,420 --> 00:36:54,150
and still achieve a resolution
four if I wanted to.

639
00:36:54,150 --> 00:36:56,860
So this is just looking
back, again, at the column.

640
00:36:56,860 --> 00:37:01,170
Again, the main effects are
aliased with interactions only

641
00:37:01,170 --> 00:37:04,260
in this defined experiment.

642
00:37:04,260 --> 00:37:07,650
And that, you will sometimes
see referred to as a 2

643
00:37:07,650 --> 00:37:12,490
to the 3 minus 1 sub resolution
three experimental design.

644
00:37:12,490 --> 00:37:14,890
So now, if you see the
shorthand notation there,

645
00:37:14,890 --> 00:37:17,820
you look and go, oh, that's a
half fraction because it's a 2

646
00:37:17,820 --> 00:37:19,320
to the 3 minus 1.

647
00:37:19,320 --> 00:37:21,270
And it's resolution
three, telling you

648
00:37:21,270 --> 00:37:23,571
something about the aliasing.
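
One common way to read the resolution off a design is to take the length of the shortest word in its defining relation. A minimal sketch, assuming the defining relation is given as a list of words; the function name `resolution` and the four-factor word ABCD are illustrative assumptions.

```python
def resolution(defining_words):
    """Resolution = length of the shortest word in the defining relation.

    For fractions with more than one generator, `defining_words` must
    contain every word of the defining relation, products included.
    """
    return min(len(word) for word in defining_words)


print(resolution(["ABC"]))   # 3: main effects aliased with two-way interactions
print(resolution(["ABCD"]))  # 4: two-way interactions aliased with each other
```

So the I equals ABC half fraction of the three-factor experiment is the 2 to the 3 minus 1, resolution three design.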

649
00:37:23,571 --> 00:37:24,988
AUDIENCE: In the
previous chapter,

650
00:37:24,988 --> 00:37:26,944
is the difference
between resolution 4

651
00:37:26,944 --> 00:37:32,582
and 5 both have no [INAUDIBLE]

652
00:37:32,582 --> 00:37:34,290
DUANE BONING: So there
was a subtle point

653
00:37:34,290 --> 00:37:39,480
in here, which is no main effect
is aliased with a second order

654
00:37:39,480 --> 00:37:40,500
interaction.

655
00:37:40,500 --> 00:37:42,960
But I might have a
second order interaction

656
00:37:42,960 --> 00:37:47,040
aliased with another
second order interaction.

657
00:37:47,040 --> 00:37:50,460
The resolution five says,
no second order interactions

658
00:37:50,460 --> 00:37:53,525
are aliased with any other
second order interactions.

659
00:37:58,860 --> 00:38:02,310
So here we've been talking
about half fractions--

660
00:38:02,310 --> 00:38:07,470
2 to the 3 minus 1, especially
as the number of factors

661
00:38:07,470 --> 00:38:08,370
gets large.

662
00:38:08,370 --> 00:38:10,650
Let's say I have a design--

663
00:38:10,650 --> 00:38:13,500
I'm designing a process and
it's got eight different control

664
00:38:13,500 --> 00:38:14,010
knobs.

665
00:38:14,010 --> 00:38:15,960
That's a 2 to the
8th experiment.

666
00:38:15,960 --> 00:38:18,630
That's an awful lot of
different combinations.

667
00:38:18,630 --> 00:38:22,770
I might not want to just go
down to a 2 to the 8 minus 1.

668
00:38:22,770 --> 00:38:28,290
I might want to do even
smaller fractions.

669
00:38:28,290 --> 00:38:34,000
For example, if
I cut it in half,

670
00:38:34,000 --> 00:38:36,390
and then cut it in half again,
that's a quarter fraction.

671
00:38:36,390 --> 00:38:39,250
I'm only picking a quarter
of the full factorial.

672
00:38:39,250 --> 00:38:41,850
And again, there
would be a resolution

673
00:38:41,850 --> 00:38:43,200
associated with that.

674
00:38:43,200 --> 00:38:47,450
You could look and see what is
the aliasing pattern with that.

675
00:38:47,450 --> 00:38:50,820
And as you get to the
higher order models,

676
00:38:50,820 --> 00:38:55,200
you will often see
smaller fractions

677
00:38:55,200 --> 00:38:58,290
in early parts of
experimental design

678
00:38:58,290 --> 00:39:02,520
where you're just trying to see
are the main factors important.

679
00:39:02,520 --> 00:39:05,480
And might there be some
second order effects,

680
00:39:05,480 --> 00:39:08,400
some second order interactions,
at least detecting

681
00:39:08,400 --> 00:39:09,690
whether they exist.

682
00:39:09,690 --> 00:39:11,190
And then if they
do, you'll often

683
00:39:11,190 --> 00:39:13,740
then go back in and start
filling in the other parts

684
00:39:13,740 --> 00:39:15,045
of the experiment.

685
00:39:18,510 --> 00:39:20,010
Here's another half fraction.

686
00:39:20,010 --> 00:39:26,690
Trying to think-- this
is, again, for our 2

687
00:39:26,690 --> 00:39:28,830
to the 3 minus 1.

688
00:39:28,830 --> 00:39:30,260
This is a different set of four.

689
00:39:30,260 --> 00:39:33,290
It's just got a different
defining relationship.

690
00:39:33,290 --> 00:39:37,820
And you could, again, ask, in
this case, is that-- number

691
00:39:37,820 --> 00:39:39,950
one, is that a
legitimate half fraction?

692
00:39:39,950 --> 00:39:43,910
Could I legitimately pull
these four columns out--

693
00:39:43,910 --> 00:39:44,750
four rows out?

694
00:39:47,390 --> 00:39:49,580
What kind of balance do I have?

695
00:39:49,580 --> 00:39:51,110
What kind of orthogonality?

696
00:39:51,110 --> 00:39:53,300
Or what's aliased with what?

697
00:39:53,300 --> 00:39:56,000
And you can either
do it by looking at--

698
00:39:56,000 --> 00:39:57,620
you can detect or
decide what are

699
00:39:57,620 --> 00:40:01,430
the aliases by
looking at the columns

700
00:40:01,430 --> 00:40:04,430
or simply doing your
little math down here

701
00:40:04,430 --> 00:40:06,840
from the defining relationship.

702
00:40:06,840 --> 00:40:11,240
So if I start with I
equals AC and say, OK,

703
00:40:11,240 --> 00:40:15,740
what's aliased with C?

704
00:40:15,740 --> 00:40:22,685
So I do C times I equals C times AC, so C
equals A, because the C times

705
00:40:22,685 --> 00:40:25,300
C is the identity, and so on.

706
00:40:25,300 --> 00:40:27,410
And similarly, I
could say, OK, if I

707
00:40:27,410 --> 00:40:30,950
start with I equals
AC and multiply by B,

708
00:40:30,950 --> 00:40:35,970
I get B equals ABC.

709
00:40:35,970 --> 00:40:37,030
There's that.

710
00:40:37,030 --> 00:40:40,920
So there's a main effect
aliased with the third order.

711
00:40:40,920 --> 00:40:42,180
That's not too bad.

712
00:40:42,180 --> 00:40:46,950
I don't really like that
A aliasing with C. That's

713
00:40:46,950 --> 00:40:51,150
not even a resolution
three experimental pattern.
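
A small numeric check of this alias pattern, assuming the half fraction is the four runs of the 2 to the 3 table where the AC column is plus 1 (that is, I equals AC). The variable names are illustrative.

```python
# Full 2^3 factorial; each run is (A, B, C) with levels -1 / +1.
runs = [(a, b, c) for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)]

# Keep only the runs where A * C = +1, i.e. the fraction with I = AC.
half = [run for run in runs if run[0] * run[2] == 1]

A = [a for a, b, c in half]
B = [b for a, b, c in half]
C = [c for a, b, c in half]
ABC = [a * b * c for a, b, c in half]

print(len(half))  # 4
print(A == C)     # True: two main effects share one column
print(B == ABC)   # True: B is aliased with the third order term
```

Because the A and C columns are identical, no analysis of these four runs can tell an A effect from a C effect.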

714
00:40:51,150 --> 00:40:55,200
So I'd be very
careful in using this

715
00:40:55,200 --> 00:40:57,930
unless I really
believed, strangely,

716
00:40:57,930 --> 00:41:00,660
that C was not going
to have a main effect

717
00:41:00,660 --> 00:41:04,980
but I might be worried
about it or interested in it

718
00:41:04,980 --> 00:41:10,470
for the purposes of
looking for interactions.

719
00:41:10,470 --> 00:41:13,650
There is one other subtle place
where you might actually--

720
00:41:17,090 --> 00:41:18,270
nope, never mind.

721
00:41:21,050 --> 00:41:23,840
I've already alluded to a
couple of ways of deciding what

722
00:41:23,840 --> 00:41:25,400
aliasing pattern to choose.

723
00:41:25,400 --> 00:41:28,340
The most important one is your
knowledge of the process--

724
00:41:28,340 --> 00:41:29,720
your experience
with the process.

725
00:41:29,720 --> 00:41:31,680
What factors are
likely to actually,

726
00:41:31,680 --> 00:41:34,760
based on physical
causality, interact.

727
00:41:34,760 --> 00:41:37,850
But there are a few very
important rules of thumb

728
00:41:37,850 --> 00:41:39,470
that are worth mentioning here.

729
00:41:39,470 --> 00:41:42,810
They are these ideas of sparsity of
effects, hierarchy of effects,

730
00:41:42,810 --> 00:41:43,700
and inheritance.

731
00:41:43,700 --> 00:41:46,040
And we've already talked
qualitatively about them.

732
00:41:46,040 --> 00:41:47,630
But just to nail them down--

733
00:41:50,870 --> 00:41:57,230
if I have eight factors, the
experimenter, you, likely

734
00:41:57,230 --> 00:42:03,370
has a certain amount
of a priori knowledge

735
00:42:03,370 --> 00:42:09,530
that there's an ordering and
likelihood of those effects.

736
00:42:09,530 --> 00:42:12,820
But it's also the case that, if I
have eight different factors

737
00:42:12,820 --> 00:42:14,980
in my experiment,
it's highly likely

738
00:42:14,980 --> 00:42:17,770
that they don't all equally
influence the process.

739
00:42:20,590 --> 00:42:24,880
Most processes have
a top few factors

740
00:42:24,880 --> 00:42:27,760
that have most of the influence
or most of the effect.

741
00:42:27,760 --> 00:42:33,020
It's like a Pareto
type of a rule.

742
00:42:33,020 --> 00:42:36,540
So early on in
screening, in fact,

743
00:42:36,540 --> 00:42:39,170
you might be very happy doing
one of these half fractions,

744
00:42:39,170 --> 00:42:41,090
quarter fractions, and so on.

745
00:42:41,090 --> 00:42:43,160
The purpose of
which is really just

746
00:42:43,160 --> 00:42:47,420
to narrow down a really large
number of candidate effects

747
00:42:47,420 --> 00:42:49,880
down to a smaller
number that you're then

748
00:42:49,880 --> 00:42:55,280
going to model more accurately
and look for interactions

749
00:42:55,280 --> 00:42:58,130
among those large effects.

750
00:42:58,130 --> 00:43:01,640
So this sparsity
of effects is one

751
00:43:01,640 --> 00:43:05,330
of the things that's at work
in early screening experiments

752
00:43:05,330 --> 00:43:08,225
and is a good rule
of thumb, a rule

753
00:43:08,225 --> 00:43:13,270
or a generic effect to be
able to take advantage of.

754
00:43:13,270 --> 00:43:17,950
The second one is this notion of
hierarchy, which is basically--

755
00:43:17,950 --> 00:43:20,770
again, these main
effects are usually

756
00:43:20,770 --> 00:43:23,780
more dominant than
second order effects,

757
00:43:23,780 --> 00:43:29,530
which are more dominant than
third order interactions.

758
00:43:29,530 --> 00:43:34,300
Furthermore, usually
you have to--

759
00:43:34,300 --> 00:43:37,330
not have to-- but
usually, you will

760
00:43:37,330 --> 00:43:44,170
see that the main effect or
the lower order interaction

761
00:43:44,170 --> 00:43:48,340
has to be at work before you
have substantial interactions

762
00:43:48,340 --> 00:43:49,940
with some other factor.

763
00:43:49,940 --> 00:43:53,440
So it's going to be rare
for you to have a big AB

764
00:43:53,440 --> 00:43:59,560
effect and no main effect with
A and no main effect with B.

765
00:43:59,560 --> 00:44:03,970
So that's another rule
that you can often use.

766
00:44:03,970 --> 00:44:08,800
And the likelihood of
having these very high order

767
00:44:08,800 --> 00:44:10,540
interactions--

768
00:44:10,540 --> 00:44:13,990
the idea that you will have an
extra delta in an eight factor

769
00:44:13,990 --> 00:44:18,460
experiment attributable to
exactly the combination of all

770
00:44:18,460 --> 00:44:21,220
of those settings
being important

771
00:44:21,220 --> 00:44:23,260
or large enough to be
important is pretty small.

772
00:44:26,460 --> 00:44:29,860
And then, I guess, I've actually
mixed in the inheritance here

773
00:44:29,860 --> 00:44:30,360
already.

774
00:44:30,360 --> 00:44:34,130
This idea that I really
need both of these factors.

775
00:44:34,130 --> 00:44:35,730
So the hierarchy
is really just more

776
00:44:35,730 --> 00:44:40,290
saying the size of the
effect, schematically pictured

777
00:44:40,290 --> 00:44:41,970
by the size of these boxes.

778
00:44:41,970 --> 00:44:44,290
The main effect
tends to be larger.

779
00:44:44,290 --> 00:44:48,870
And then the inheritance is I
kind of need the lower order

780
00:44:48,870 --> 00:44:50,100
interactions to be at work.

781
00:44:53,660 --> 00:44:58,220
So here's just some
additional examples

782
00:44:58,220 --> 00:45:02,070
in the context of
half fractions.

783
00:45:02,070 --> 00:45:05,540
We've already almost
exhaustively explored the 2

784
00:45:05,540 --> 00:45:11,960
to the 3 minus 1 resolution
three kind of picture.

785
00:45:11,960 --> 00:45:14,180
Here's an example
defining relationship

786
00:45:14,180 --> 00:45:16,310
for resolution four.

787
00:45:16,310 --> 00:45:19,310
And here, you really have
to get up to four factors

788
00:45:19,310 --> 00:45:22,310
before having
enough leeway to be

789
00:45:22,310 --> 00:45:25,940
able to have that kind
of an interaction pattern

790
00:45:25,940 --> 00:45:28,790
where I would only have--

791
00:45:28,790 --> 00:45:31,160
so, for example, now
if I did the algebra,

792
00:45:31,160 --> 00:45:34,790
A would be confounded with BCD.

793
00:45:34,790 --> 00:45:43,130
Or if I multiplied AB on
one side, AB is equal to CD.

794
00:45:43,130 --> 00:45:44,930
So that was the
earlier question.

795
00:45:44,930 --> 00:45:49,310
There you can see, a two-factor
effect or two way interaction

796
00:45:49,310 --> 00:45:52,220
is confounded with another
two way interaction.

797
00:45:52,220 --> 00:45:55,160
But all main effects
are only confounded

798
00:45:55,160 --> 00:45:59,643
with three way interactions
and none with each other.
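
A minimal sketch of a four-factor half fraction with I equals ABCD, built by running a full 2 to the 3 in A, B, C and setting D equal to the ABC column. The generator D = ABC is the usual way to get this defining relation, but the construction and the helper `col` are illustrative assumptions, not taken from the slide.

```python
from itertools import product

# Full 2^3 in (A, B, C); D is generated as the product A * B * C.
design = [(a, b, c, a * b * c) for a, b, c in product((-1, 1), repeat=3)]

INDEX = {"A": 0, "B": 1, "C": 2, "D": 3}


def col(word):
    """Column of +/-1 values for an effect word such as 'AB' or 'BCD'."""
    values = []
    for run in design:
        value = 1
        for letter in word:
            value *= run[INDEX[letter]]
        values.append(value)
    return values


print(len(design))                # 8 runs, half of 16
print(set(col("ABCD")))           # {1}: ABCD matches the identity column
print(col("A") == col("BCD"))     # True: main effect with a three-way term
print(col("AB") == col("CD"))     # True: two-way with two-way
print(col("A") == col("B"))       # False: main effects stay separate
```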

799
00:45:59,643 --> 00:46:01,560
And here would be an
example you can play with

800
00:46:01,560 --> 00:46:06,570
to see what the aliasing pattern
would be with a resolution five

801
00:46:06,570 --> 00:46:09,975
kind of experiment.

802
00:46:09,975 --> 00:46:15,080
AUDIENCE: [INAUDIBLE]
the 4 minus 2 design--

803
00:46:15,080 --> 00:46:18,683
is that-- would that
have-- what's the identity?

804
00:46:18,683 --> 00:46:19,850
DUANE BONING: I think that--

805
00:46:25,920 --> 00:46:28,770
the identity-- I don't
remember offhand.

806
00:46:28,770 --> 00:46:33,090
I think we've got, actually,
an example right here that

807
00:46:33,090 --> 00:46:36,946
shows what the aliasing
pattern would be.

808
00:46:36,946 --> 00:46:40,430
You asked exactly the question.

809
00:46:40,430 --> 00:46:42,740
And so we can work it out.

810
00:46:42,740 --> 00:46:44,670
So here's an example.

811
00:46:44,670 --> 00:46:47,540
First off, we already
know it's probably not

812
00:46:47,540 --> 00:46:52,680
going to be a defining
relationship like this one

813
00:46:52,680 --> 00:46:58,220
because I've only got 2 to the
4 minus 2, which is 2 squared.

814
00:46:58,220 --> 00:47:02,870
I've only got four experiments
out of the 16 that I'm picking.

815
00:47:02,870 --> 00:47:05,600
So I've got an awful lot
of confounding going on.

816
00:47:05,600 --> 00:47:07,430
And the question would
be, I can actually

817
00:47:07,430 --> 00:47:10,820
build it and look and see what
the confounding pattern would

818
00:47:10,820 --> 00:47:13,490
be.

819
00:47:13,490 --> 00:47:17,660
And in particular, I might
want to pick it in such a way

820
00:47:17,660 --> 00:47:21,650
that I want to at least be able
to detect the four main effects

821
00:47:21,650 --> 00:47:25,230
and lots of other things
could be aliased in with that.

822
00:47:25,230 --> 00:47:29,600
So I could build it a
priori and say, OK, I'm

823
00:47:29,600 --> 00:47:36,800
willing for A to be aliased
with that and with that.

824
00:47:36,800 --> 00:47:40,700
So in this case, when you've
got a double half fraction,

825
00:47:40,700 --> 00:47:47,760
I'm going to have aliasing
between more than two columns.

826
00:47:47,760 --> 00:47:52,620
Now I actually have multi-way
aliasing between three columns.

827
00:47:52,620 --> 00:47:56,555
So in essence, what I've got
is something like A equals BCD

828
00:47:56,555 --> 00:47:59,690
equals ABC.

829
00:47:59,690 --> 00:48:06,020
And then you can work that
out as I is equal to ABCD,

830
00:48:06,020 --> 00:48:11,120
which, in this case, is also equal to BC.

831
00:48:11,120 --> 00:48:15,860
So it still has this
defining relationship,

832
00:48:15,860 --> 00:48:22,460
but I'm now only picking
four of the rows.

833
00:48:22,460 --> 00:48:25,850
So I don't know if
you guys can see that.

834
00:48:25,850 --> 00:48:27,440
It's kind of tiny.

835
00:48:27,440 --> 00:48:30,410
But this is all 16
of these columns.

836
00:48:30,410 --> 00:48:32,750
And, again, this is
the same combination

837
00:48:32,750 --> 00:48:35,300
that I just went through before.

838
00:48:35,300 --> 00:48:38,040
If you look, the A column here--

839
00:48:38,040 --> 00:48:41,210
these four-- minus
1, minus 1, 1,

840
00:48:41,210 --> 00:48:46,150
1 is now aliased
with both the BCD--

841
00:48:46,150 --> 00:48:54,590
where'd that go-- the BCD minus
1, minus 1, 1, 1, and the ABC

842
00:48:54,590 --> 00:49:01,400
column, minus 1, minus 1, 1, 1.

843
00:49:01,400 --> 00:49:05,900
So if I do that, I have folded
two other columns onto it

844
00:49:05,900 --> 00:49:08,435
in order to get down to
the double half fraction.

845
00:49:12,050 --> 00:49:16,280
And you can also explode
out in the other direction

846
00:49:16,280 --> 00:49:19,850
and see what are all of the
interactions and aliases that

847
00:49:19,850 --> 00:49:23,570
go on, again, either
looking at the columns--

848
00:49:23,570 --> 00:49:25,220
and here, I've
shaded the columns

849
00:49:25,220 --> 00:49:27,286
that are aliased with each other.

850
00:49:27,286 --> 00:49:29,300
So in each case,
we've got all four.

851
00:49:29,300 --> 00:49:32,000
Or you can do the column math.

852
00:49:32,000 --> 00:49:35,450
And this has another
nasty effect,

853
00:49:35,450 --> 00:49:42,950
which is those four columns
are not even resolution three.

854
00:49:45,885 --> 00:49:47,510
So going back to your
earlier question,

855
00:49:47,510 --> 00:49:51,590
I don't know what the defining
relationship is for a 2

856
00:49:51,590 --> 00:49:55,080
to the 4 minus 2
resolution three.

857
00:49:55,080 --> 00:49:58,900
I'm not sure what the
defining relationship is.

858
00:49:58,900 --> 00:50:01,130
Sounds like a good
problem set problem.

859
00:50:01,130 --> 00:50:01,880
Remember that one.

860
00:50:06,410 --> 00:50:09,740
So I think we've
explored aliasing.

861
00:50:09,740 --> 00:50:11,180
Do you understand aliasing?

862
00:50:11,180 --> 00:50:13,460
Any questions on the aliasing?

863
00:50:13,460 --> 00:50:15,590
Confounding?

864
00:50:15,590 --> 00:50:17,240
There is one more
aspect to it that I

865
00:50:17,240 --> 00:50:18,980
want to explore a
little bit, which

866
00:50:18,980 --> 00:50:22,490
is, what are the implications
for model construction?

867
00:50:22,490 --> 00:50:24,440
We've already alluded to this.

868
00:50:24,440 --> 00:50:27,440
So let's just sort
of work through it.

869
00:50:27,440 --> 00:50:33,230
But also, folding in and
remembering interaction terms,

870
00:50:33,230 --> 00:50:35,630
but also, potential
higher order terms

871
00:50:35,630 --> 00:50:41,090
and some implications that
arise if-- some of my factors,

872
00:50:41,090 --> 00:50:44,900
I think, might have
quadratic elements to them.

873
00:50:44,900 --> 00:50:49,190
You get, actually, then
complicated aliasing patterns

874
00:50:49,190 --> 00:50:50,550
in those cases.

875
00:50:50,550 --> 00:50:56,630
So a simple case, when I've just
got one input and I've got one

876
00:50:56,630 --> 00:50:59,740
output, but I think there
might be a quadratic effect,

877
00:50:59,740 --> 00:51:03,830
what we're seeing here is that I
cannot do just a two level full

878
00:51:03,830 --> 00:51:06,020
factorial to exercise that.

879
00:51:06,020 --> 00:51:07,970
As we talked about
last time, I have

880
00:51:07,970 --> 00:51:11,390
to add some kind of a
center point or some other--

881
00:51:11,390 --> 00:51:13,850
I really need all three
data points in order

882
00:51:13,850 --> 00:51:17,600
to be able to fit
a quadratic term.

883
00:51:20,360 --> 00:51:22,160
And we talked last
time about being

884
00:51:22,160 --> 00:51:25,490
able to do ANOVA
residual analysis

885
00:51:25,490 --> 00:51:29,450
to differentiate whether
that deviation compared

886
00:51:29,450 --> 00:51:35,430
to spread within a replication
error is significant.

887
00:51:35,430 --> 00:51:40,640
So I could decide whether that
coefficient is significant

888
00:51:40,640 --> 00:51:42,210
or not.

889
00:51:42,210 --> 00:51:47,330
If I were generalizing this
now, to more than one factor--

890
00:51:47,330 --> 00:51:53,750
last time we talked about
a two factor example.

891
00:51:53,750 --> 00:51:56,760
And we had up to
second order terms.

892
00:51:56,760 --> 00:52:00,380
But if we expand this
out to a full quadratic

893
00:52:00,380 --> 00:52:03,530
with all of the
interactions, what you see

894
00:52:03,530 --> 00:52:07,940
is a very rapid explosion in
the number of coefficients.

895
00:52:07,940 --> 00:52:12,110
Because I've got
my main effects--

896
00:52:12,110 --> 00:52:15,650
so this is still
just two factors,

897
00:52:15,650 --> 00:52:19,700
but now three level-- a full
factorial in three levels.

898
00:52:19,700 --> 00:52:21,200
I've got my average.

899
00:52:21,200 --> 00:52:22,340
I've got my main effect.

900
00:52:22,340 --> 00:52:26,810
I've got my interaction
between x1 and x2.

901
00:52:26,810 --> 00:52:32,790
But then I've got, also, x1
squared and my x2 squared.

902
00:52:32,790 --> 00:52:38,090
And then I've got the
interaction between x1 squared

903
00:52:38,090 --> 00:52:42,180
and x2, x1 and x2 squared.

904
00:52:42,180 --> 00:52:47,675
And if both of the
terms were squared,

905
00:52:47,675 --> 00:52:50,100
the question is, do I have--

906
00:52:50,100 --> 00:52:53,460
this is what the full
quadratic model with all

907
00:52:53,460 --> 00:52:55,260
of the interactions would be.

908
00:52:55,260 --> 00:52:59,400
Do I have enough data to
be able to fit this if I

909
00:52:59,400 --> 00:53:02,930
did a three squared problem?

910
00:53:05,950 --> 00:53:06,490
Why not?

911
00:53:11,620 --> 00:53:13,000
How many coefficients
do we have?

912
00:53:15,880 --> 00:53:17,463
AUDIENCE: You have one extra.

913
00:53:17,463 --> 00:53:18,880
DUANE BONING: Do
I have one extra?

914
00:53:18,880 --> 00:53:20,160
One, two, three--

915
00:53:20,160 --> 00:53:22,780
I have nine coefficients.

916
00:53:22,780 --> 00:53:25,540
How many experiments
is three squared?

917
00:53:25,540 --> 00:53:28,450
Nine experiments--
so it's exactly

918
00:53:28,450 --> 00:53:30,880
like what we saw with
the two level experiments.

919
00:53:30,880 --> 00:53:36,430
I have exactly, just barely,
the number to perfectly fit.

920
00:53:36,430 --> 00:53:38,950
So actually, if I do the
regression formulation,

921
00:53:38,950 --> 00:53:41,720
which I think is coming--

922
00:53:41,720 --> 00:53:44,740
ew, nasty-- think
these are supposed

923
00:53:44,740 --> 00:53:46,810
to be vertical dot, dot, dots.

924
00:53:46,810 --> 00:53:49,486
And these are supposed to be
horizontal dot, dot, dots.

925
00:53:49,486 --> 00:53:51,580
Weird font problem here.

926
00:53:51,580 --> 00:53:56,470
If I were to actually formulate
this for the three level,

927
00:53:56,470 --> 00:54:00,100
I would have exactly the
same matrix relationship.

928
00:54:00,100 --> 00:54:03,160
And I have exactly the same
number of rows and columns.

929
00:54:03,160 --> 00:54:06,550
And I would be
able to fit that--

930
00:54:06,550 --> 00:54:08,110
I guess that doesn't
come till later.

931
00:54:08,110 --> 00:54:10,360
I could solve that
directly and get

932
00:54:10,360 --> 00:54:14,260
beta is equal to X to the minus 1 times--

933
00:54:14,260 --> 00:54:15,940
my output.

934
00:54:15,940 --> 00:54:19,400
But I don't have
any replicate data.

935
00:54:19,400 --> 00:54:21,190
If I had replicate
data, then I would also

936
00:54:21,190 --> 00:54:26,230
need to do the pseudo
inverse exactly as before.

937
00:54:26,230 --> 00:54:28,290
So the
point here is if you

938
00:54:28,290 --> 00:54:34,020
want to build every possible
model in the full quadratic,

939
00:54:34,020 --> 00:54:39,230
I have to have a full factorial
in three levels, as well.

940
00:54:44,710 --> 00:54:47,632
So we can meaningfully talk--

941
00:54:47,632 --> 00:54:49,090
although you don't
see it that much

942
00:54:49,090 --> 00:54:51,790
in the literature and
we'll see why in a moment,

943
00:54:51,790 --> 00:54:56,000
you can talk about full
factorial with more than two levels.

944
00:54:56,000 --> 00:54:58,430
I'm talking about
full factorial 3

945
00:54:58,430 --> 00:55:02,300
to the k, where you've
got three levels per factor.

946
00:55:02,300 --> 00:55:05,050
But what you will see and I
want to touch on a little bit,

947
00:55:05,050 --> 00:55:08,050
are slightly
different designs that

948
00:55:08,050 --> 00:55:10,330
have a couple of
additional properties

949
00:55:10,330 --> 00:55:13,090
that might be a better
way to go than starting

950
00:55:13,090 --> 00:55:17,650
a priori with the full
three level factorial,

951
00:55:17,650 --> 00:55:22,270
doing every possible
combination of all three levels.

952
00:55:22,270 --> 00:55:25,690
But rather, start with
the two level experiment.

953
00:55:25,690 --> 00:55:32,320
And then if you start to see
some important interactions

954
00:55:32,320 --> 00:55:34,360
or some indication
that additional effects

955
00:55:34,360 --> 00:55:38,950
are needed, adding experimental
design points incrementally

956
00:55:38,950 --> 00:55:40,400
when it's easy to do that.

957
00:55:40,400 --> 00:55:43,630
It's not always easy to do that
in your experimental setting.

958
00:55:43,630 --> 00:55:45,860
AUDIENCE: How would
you detect that?

959
00:55:45,860 --> 00:55:47,680
Would you say other--

960
00:55:47,680 --> 00:55:49,060
observe other effects?

961
00:55:49,060 --> 00:55:54,220
DUANE BONING: Yes, so if I
do purely the corner model,

962
00:55:54,220 --> 00:55:58,120
is it possible for me to detect
if there might be curvature?

963
00:55:58,120 --> 00:55:59,100
No.

964
00:55:59,100 --> 00:56:01,750
So the first thing I would
do, if I wanted to detect,

965
00:56:01,750 --> 00:56:05,710
is at least add
some center points.

966
00:56:05,710 --> 00:56:07,510
And certainly, for
continuous parameters

967
00:56:07,510 --> 00:56:09,500
we talked about last
time, that makes sense.

968
00:56:09,500 --> 00:56:11,740
It doesn't make sense
for discrete parameters.

969
00:56:11,740 --> 00:56:14,110
Really, curvature is only
a term that makes sense

970
00:56:14,110 --> 00:56:15,450
with continuous parameters.

971
00:56:15,450 --> 00:56:18,160
So that's kind of the
domain I'm talking in here.

972
00:56:18,160 --> 00:56:22,600
So in fact, my rule
of thumb is I always

973
00:56:22,600 --> 00:56:27,010
have at least the center
points in my original design.

974
00:56:27,010 --> 00:56:30,400
I never do just a pure
two level factorial.

975
00:56:30,400 --> 00:56:34,900
I always add at least the center
points because they tell me--

976
00:56:34,900 --> 00:56:37,850
and I try to replicate at
least the center points

977
00:56:37,850 --> 00:56:43,420
so I can distinguish between
curvature and I can also, then,

978
00:56:43,420 --> 00:56:45,910
not fit exactly perfectly
every interaction,

979
00:56:45,910 --> 00:56:48,310
but I can also start
to ask questions

980
00:56:48,310 --> 00:56:50,650
about the significance
of these effects.

981
00:56:50,650 --> 00:56:54,770
Then I've got some indication
that there might be curvature.

982
00:56:54,770 --> 00:56:58,930
Now I might go in and start
to say, OK, there's curvature

983
00:56:58,930 --> 00:57:01,000
but I don't really
know the nature of it.

984
00:57:01,000 --> 00:57:06,160
Now I want to add these three
level points on more than one

985
00:57:06,160 --> 00:57:08,920
of the experimental factors.

986
00:57:08,920 --> 00:57:13,300
That's where I might then add
even more experimental points.

987
00:57:13,300 --> 00:57:18,670
But I'm always shocked
because a few of the books--

988
00:57:18,670 --> 00:57:21,640
you read Montgomery,
[INAUDIBLE],

989
00:57:21,640 --> 00:57:24,260
you read most of the
experimental design books.

990
00:57:24,260 --> 00:57:29,860
They rarely talk about and emphasize
the value of the center point.

991
00:57:29,860 --> 00:57:34,570
It's just absolutely
crucial in my mind.

992
00:57:34,570 --> 00:57:40,600
I always want to add at
least that off corner--

993
00:57:40,600 --> 00:57:43,030
one off corner point.

994
00:57:43,030 --> 00:57:45,290
And preferably some
replicates of that

995
00:57:45,290 --> 00:57:48,220
because it gives you
so much more power.

996
00:57:48,220 --> 00:57:49,450
And think about it.

997
00:57:49,450 --> 00:57:51,340
If I'm doing four
factors, I'm only

998
00:57:51,340 --> 00:57:53,260
adding one more
experimental combination.

999
00:57:53,260 --> 00:57:56,620
I'm not exploding
out the whole design.

1000
00:57:56,620 --> 00:58:01,150
It's a very cheap way
to learn an awful lot

1001
00:58:01,150 --> 00:58:02,820
more about your experiment.

1002
00:58:07,990 --> 00:58:09,490
So this is simply
making the point.

1003
00:58:09,490 --> 00:58:14,040
We just looked at
the 3 to the 2 case--

1004
00:58:14,040 --> 00:58:18,195
how many coefficients there
are in the full model.

1005
00:58:21,570 --> 00:58:26,580
In particular, if we do the
full model quadratic 3 to the 2,

1006
00:58:26,580 --> 00:58:28,750
we've already got
nine coefficients.

1007
00:58:28,750 --> 00:58:30,180
But if I add just
one more factor

1008
00:58:30,180 --> 00:58:34,950
and I'm still worried
about the full model

1009
00:58:34,950 --> 00:58:38,700
with all of the
interactions, the number

1010
00:58:38,700 --> 00:58:43,560
of experiments that I need
to do explodes rapidly.

1011
00:58:43,560 --> 00:58:48,225
Up here with only five
factors, I've got 243 model terms.

1012
00:58:52,280 --> 00:58:54,530
Do you really think each of
those model terms is going

1013
00:58:54,530 --> 00:58:58,610
to be nonzero significant?

1014
00:58:58,610 --> 00:59:02,300
Probably not-- so this is
also a wonderful opportunity

1015
00:59:02,300 --> 00:59:06,050
for saying, sparsity of effects,
hierarchy, all of those--

1016
00:59:06,050 --> 00:59:08,690
I'm going to alias some
of those in and discount

1017
00:59:08,690 --> 00:59:09,950
certain interactions.

1018
00:59:09,950 --> 00:59:16,730
And in fact, if I just do main
effects and the third order

1019
00:59:16,730 --> 00:59:21,510
term, but only on
the single effect.

1020
00:59:21,510 --> 00:59:25,940
So an x1 squared--

1021
00:59:25,940 --> 00:59:29,750
second order term, quadratic
model, x1 squared, and x2

1022
00:59:29,750 --> 00:59:32,030
squared, and so on--

1023
00:59:32,030 --> 00:59:35,690
then it only grows linearly
with the number of factors.

1024
00:59:35,690 --> 00:59:41,870
If I know a priori, I can
neglect those higher order

1025
00:59:41,870 --> 00:59:44,550
interactions.

1026
00:59:44,550 --> 00:59:47,210
So this is just working
out and giving an example,

1027
00:59:47,210 --> 00:59:50,450
now using some of our
earlier terminology.

1028
00:59:50,450 --> 00:59:56,720
Again, here I can refer to
the different combinations.

1029
00:59:56,720 --> 01:00:03,570
Again, I have my A and B. And
I can label the AB interaction.

1030
01:00:03,570 --> 01:00:08,930
That's that AB interaction
or the AB effect

1031
01:00:08,930 --> 01:00:10,790
goes with the Beta 1, 2.

1032
01:00:10,790 --> 01:00:15,320
This A2 is just an A squared, B
squared, A squared B, B squared

1033
01:00:15,320 --> 01:00:17,240
A, A squared B squared.

1034
01:00:17,240 --> 01:00:23,120
And you can, again, see,
now, the factor levels where

1035
01:00:23,120 --> 01:00:26,660
I've added 0 to indicate
the center level setting

1036
01:00:26,660 --> 01:00:29,210
for each of those.

1037
01:00:29,210 --> 01:00:31,970
But we can also
borrow, and import,

1038
01:00:31,970 --> 01:00:36,620
and use all of the same
aliasing terminology

1039
01:00:36,620 --> 01:00:38,220
that we had earlier.

1040
01:00:38,220 --> 01:00:42,900
So for example, if I only did--

1041
01:00:42,900 --> 01:00:46,100
I don't know a--

1042
01:00:46,100 --> 01:00:50,300
well, if I did a 3
to the 2 minus 1,

1043
01:00:50,300 --> 01:00:53,390
I guess that's a half fraction,
but kind of an odd one

1044
01:00:53,390 --> 01:00:58,880
because that means I
get to pick three rows.

1045
01:00:58,880 --> 01:01:02,120
I could pick-- is that right?

1046
01:01:02,120 --> 01:01:03,410
I think that is.

1047
01:01:03,410 --> 01:01:05,450
You'd have a lot of
aliasing going on.

1048
01:01:08,060 --> 01:01:11,810
So actually, it's not as easy
to talk about a half fraction.

1049
01:01:14,360 --> 01:01:18,767
And in fact, we'll see--

1050
01:01:18,767 --> 01:01:20,850
I can do these aliasing,
but I want to leap ahead.

1051
01:01:20,850 --> 01:01:23,060
Here we go.

1052
01:01:23,060 --> 01:01:25,640
A good way to try to
visualize these, which

1053
01:01:25,640 --> 01:01:27,410
works for a lower
number of factors

1054
01:01:27,410 --> 01:01:30,020
because it's not too
high dimensional a space,

1055
01:01:30,020 --> 01:01:38,060
is to actually plot out in your
x1 versus x2 experimental space

1056
01:01:38,060 --> 01:01:41,780
what design points you're
actually exercising

1057
01:01:41,780 --> 01:01:43,560
with some half fraction.

1058
01:01:43,560 --> 01:01:47,870
So for example, in this case,
I'm just doing say, these 2/3

1059
01:01:47,870 --> 01:01:50,870
and leaving off this
1/3 of the experiment.

1060
01:01:55,210 --> 01:01:57,070
In which case, what
I'm essentially doing

1061
01:01:57,070 --> 01:02:00,340
is giving up and saying, I'm
not going to do the high level

1062
01:02:00,340 --> 01:02:03,400
setting on my x2 factor.

1063
01:02:03,400 --> 01:02:05,650
And now you can start
to get a good feel,

1064
01:02:05,650 --> 01:02:08,140
nice intuitive feel
that goes together

1065
01:02:08,140 --> 01:02:09,640
with the mathematics,
of what you're

1066
01:02:09,640 --> 01:02:11,950
giving up when you do that.

1067
01:02:11,950 --> 01:02:14,560
If I were not doing
those experiments,

1068
01:02:14,560 --> 01:02:18,190
I just picked this subset,
what model coefficients am

1069
01:02:18,190 --> 01:02:21,250
I detecting or am I
going to be able to fit,

1070
01:02:21,250 --> 01:02:25,800
and what am I not
in that experiment?

1071
01:02:25,800 --> 01:02:27,380
Should be pretty intuitive.

1072
01:02:34,430 --> 01:02:38,870
What model coefficient
would I not be able to fit?

1073
01:02:42,590 --> 01:02:43,958
Next to what?

1074
01:02:43,958 --> 01:02:46,160
AUDIENCE: [INAUDIBLE]

1075
01:02:46,160 --> 01:02:48,260
DUANE BONING:
Yeah, for this one.

1076
01:02:48,260 --> 01:02:52,220
No worries-- I said, we are
going to pick these six points.

1077
01:02:52,220 --> 01:02:55,910
These are the six columns
going with these rows.

1078
01:02:55,910 --> 01:03:00,800
But I'm not doing
those three experiments

1079
01:03:00,800 --> 01:03:03,436
in my x1 and x2
factor [INAUDIBLE].

1080
01:03:06,220 --> 01:03:10,290
Can I do a quadratic
model in x1?

1081
01:03:10,290 --> 01:03:13,690
Sure, I've got three data points
projected along in that.

1082
01:03:13,690 --> 01:03:16,570
Can I do a quadratic
model in x2?

1083
01:03:16,570 --> 01:03:19,700
No, I'm only exercising
two different levels.

1084
01:03:19,700 --> 01:03:22,660
So I can only go up to
linear in that term.

1085
01:03:22,660 --> 01:03:24,220
And that's what I
mean by intuitive.

1086
01:03:24,220 --> 01:03:29,890
I think you can start to see
what, at least main order

1087
01:03:29,890 --> 01:03:33,670
effects as well as second
order terms, are at work.

1088
01:03:33,670 --> 01:03:36,250
It's a little more
subtle to see,

1089
01:03:36,250 --> 01:03:42,190
can I do an x1 and x2 or
an x1 squared and an x2?

1090
01:03:42,190 --> 01:03:45,100
I think you can see,
I've got combinations

1091
01:03:45,100 --> 01:03:47,050
to be able to do some
of those, but there may

1092
01:03:47,050 --> 01:03:48,612
be some confounding going on.

1093
01:03:48,612 --> 01:03:50,320
And then you can look
back at the columns

1094
01:03:50,320 --> 01:03:53,960
to see what kind of
confounding may be occurring.

1095
01:03:53,960 --> 01:03:58,000
So I haven't actually
done that on this

1096
01:03:58,000 --> 01:04:00,760
to figure out which is
confounded with what.

1097
01:04:00,760 --> 01:04:03,798
But let's see-- anybody--

1098
01:04:03,798 --> 01:04:05,590
it's probably so tiny
you don't have a hope

1099
01:04:05,590 --> 01:04:07,570
in the world of seeing it.

1100
01:04:07,570 --> 01:04:08,290
There is one.

1101
01:04:11,530 --> 01:04:15,730
B is equal to minus B
squared, in this case,

1102
01:04:15,730 --> 01:04:17,710
where B was my x2.

1103
01:04:17,710 --> 01:04:20,530
So this is basically
confounding and saying--

1104
01:04:20,530 --> 01:04:23,980
I was telling you,
I could not fit

1105
01:04:23,980 --> 01:04:28,180
the B squared-- the
quadratic term in x2.

1106
01:04:28,180 --> 01:04:32,140
If there actually is
curvature, where did it go?

1107
01:04:32,140 --> 01:04:35,470
It's hidden inside of
the beta 2 linear term.

1108
01:04:35,470 --> 01:04:39,340
It's confounded with
the B linear term.

1109
01:04:39,340 --> 01:04:42,730
So it's the same
kind of terminology.

1110
01:04:42,730 --> 01:04:55,440
Here's a different pattern
that looks almost the same

1111
01:04:55,440 --> 01:04:56,610
as the other one.

1112
01:04:56,610 --> 01:05:02,226
I can still fit quadratic
in x1, linear in x2--

1113
01:05:02,226 --> 01:05:04,890
in fact, it's not
even clear to me

1114
01:05:04,890 --> 01:05:08,500
right up front what I can't
fit with one or the other.

1115
01:05:08,500 --> 01:05:12,780
I kind of like having the center
point in the other design.

1116
01:05:12,780 --> 01:05:15,790
But now, what's the
difference between these two?

1117
01:05:15,790 --> 01:05:20,700
Actually, I think a lot of the
aliasing is relatively similar.

1118
01:05:20,700 --> 01:05:23,250
I can think of one reason
why I might pick the lower

1119
01:05:23,250 --> 01:05:26,460
design over the upper design.

1120
01:05:26,460 --> 01:05:29,930
AUDIENCE: Wouldn't it make
the effect more pronounced?

1121
01:05:29,930 --> 01:05:33,500
DUANE BONING: Yeah, I
think that in combination

1122
01:05:33,500 --> 01:05:35,870
with what I was thinking,
but I think you're right.

1123
01:05:35,870 --> 01:05:37,430
The statement was,
this would make

1124
01:05:37,430 --> 01:05:39,170
the effect more pronounced.

1125
01:05:39,170 --> 01:05:44,270
I'm also thinking it explores
a larger space of x2.

1126
01:05:44,270 --> 01:05:48,590
This one is just zeroing
in or zooming in on the 0

1127
01:05:48,590 --> 01:05:49,830
to low setting.

1128
01:05:49,830 --> 01:05:52,220
The other one goes all
the way and spans the low

1129
01:05:52,220 --> 01:05:53,580
to the high setting.

1130
01:05:53,580 --> 01:05:56,870
So I'm exploring or fitting
around the larger portion

1131
01:05:56,870 --> 01:05:58,110
of the space.

1132
01:05:58,110 --> 01:06:04,010
So I might prefer, for
that reason, the lower one.

1133
01:06:04,010 --> 01:06:06,210
It doesn't have a
pure center point,

1134
01:06:06,210 --> 01:06:08,900
but it's not clear that
the other one does either.

1135
01:06:11,762 --> 01:06:20,780
AUDIENCE: Is there [INAUDIBLE]

1136
01:06:20,780 --> 01:06:23,570
DUANE BONING: Yeah, so
the same ideas of balance

1137
01:06:23,570 --> 01:06:25,200
and orthogonality are at work.

1138
01:06:25,200 --> 01:06:27,680
But again, it's mostly
with respect to--

1139
01:06:30,320 --> 01:06:31,865
it's more subtle
now because it's

1140
01:06:31,865 --> 01:06:33,860
with respect to
either the linear term

1141
01:06:33,860 --> 01:06:37,130
or the quadratic term.

1142
01:06:37,130 --> 01:06:40,700
So these are both balanced
and orthogonal with respect

1143
01:06:40,700 --> 01:06:45,530
to things like the
main order effects.

1144
01:06:45,530 --> 01:06:49,860
But not orthogonal B to B
squared in these two cases.

1145
01:06:49,860 --> 01:06:51,590
So you can use the
same reasoning.

1146
01:06:51,590 --> 01:06:55,220
But when you've got
three levels, you've

1147
01:06:55,220 --> 01:06:57,470
got to expand it out
and think it's not just

1148
01:06:57,470 --> 01:06:59,810
the whole variable
or the interaction.

1149
01:06:59,810 --> 01:07:02,600
It's also the second
order term, possibly,

1150
01:07:02,600 --> 01:07:05,244
being nonorthogonal
with a lower order term.

1151
01:07:07,880 --> 01:07:08,690
Here's another one.

1152
01:07:14,500 --> 01:07:21,080
What do you think about that one
compared to the previous two?

1153
01:07:25,730 --> 01:07:27,240
This one looks
interesting to me.

1154
01:07:34,114 --> 01:07:35,832
AUDIENCE: It's
going to be bigger

1155
01:07:35,832 --> 01:07:39,515
because we can do
quadratic [INAUDIBLE]

1156
01:07:39,515 --> 01:07:42,170
and since the combinations
are less-- they

1157
01:07:42,170 --> 01:07:45,910
have less effect [INAUDIBLE].

1158
01:07:45,910 --> 01:07:47,373
DUANE BONING: Yep--

1159
01:07:47,373 --> 01:07:48,790
I'm not sure
everybody heard that.

1160
01:07:48,790 --> 01:07:52,780
But at least here, we're
exercising all three levels

1161
01:07:52,780 --> 01:07:55,990
of x1, all three levels of x2.

1162
01:07:55,990 --> 01:08:00,430
So we can fit quadratic
terms in each of the cases.

1163
01:08:00,430 --> 01:08:03,610
Where we might need
to be a little careful

1164
01:08:03,610 --> 01:08:05,680
is we're actually making
an interesting trade

1165
01:08:05,680 --> 01:08:09,040
off here-- going back,
also, to your earlier point.

1166
01:08:09,040 --> 01:08:13,210
We're giving up a little bit
of balance in this design.

1167
01:08:13,210 --> 01:08:16,270
In that for the
low setting of x1,

1168
01:08:16,270 --> 01:08:20,500
I've actually got more data
than for the high setting of x1.

1169
01:08:20,500 --> 01:08:23,439
And so if there's
noise effects, I'm

1170
01:08:23,439 --> 01:08:27,100
actually fitting, in effect--

1171
01:08:27,100 --> 01:08:30,069
if you do the regression, you'll
have a narrower confidence

1172
01:08:30,069 --> 01:08:32,229
interval over here
at the low setting

1173
01:08:32,229 --> 01:08:34,069
than you would at
the high setting.

1174
01:08:34,069 --> 01:08:36,859
So you actually have
to be a little careful.

1175
01:08:36,859 --> 01:08:38,920
You can do the regression math.

1176
01:08:38,920 --> 01:08:41,019
You have to be careful
in forming your contrast.

1177
01:08:45,160 --> 01:08:46,899
Normally, you get to
these higher order models,

1178
01:08:46,899 --> 01:08:50,200
you're probably throwing it
into a regression anyway.

1179
01:08:50,200 --> 01:08:52,600
But you also then have to be
careful in the interpolation

1180
01:08:52,600 --> 01:08:54,939
and use of the model in
different parts of the space

1181
01:08:54,939 --> 01:08:56,529
because its accuracy
is a little bit

1182
01:08:56,529 --> 01:08:59,109
different in different
parts of this space.

1183
01:08:59,109 --> 01:09:00,040
But I like this--

1184
01:09:00,040 --> 01:09:06,189
I kind of like this one as
well, even with those caveats.

1185
01:09:06,189 --> 01:09:09,819
But you can still use this and
the same aliasing terminology

1186
01:09:09,819 --> 01:09:13,420
to figure out which
coefficients I'm giving up.

1187
01:09:13,420 --> 01:09:14,883
What's aliasing with what.

1188
01:09:14,883 --> 01:09:17,300
And I'm not going to go through
that, but you can do that.

1189
01:09:17,300 --> 01:09:22,300
AUDIENCE: Would you get
a higher [INAUDIBLE]?

1190
01:09:22,300 --> 01:09:24,550
DUANE BONING: Well,
it's not clear you

1191
01:09:24,550 --> 01:09:27,040
would get an overall
higher or lower r squared.

1192
01:09:35,450 --> 01:09:38,050
You almost always-- and we'll
do a little bit more regression

1193
01:09:38,050 --> 01:09:38,550
later.

1194
01:09:38,550 --> 01:09:40,880
But you almost always,
if you add more terms,

1195
01:09:40,880 --> 01:09:42,359
you get a better r squared.

1196
01:09:42,359 --> 01:09:46,189
But then the question
is, is it a fair model?

1197
01:09:46,189 --> 01:09:48,170
So we can also talk
about an adjusted r

1198
01:09:48,170 --> 01:09:52,130
squared where you penalize
for the additional model--

1199
01:09:52,130 --> 01:09:55,653
the additional model terms.

1200
01:09:55,653 --> 01:09:57,070
And then this is
just pointing out

1201
01:09:57,070 --> 01:10:02,980
that you can still use
the linear algebraic--

1202
01:10:02,980 --> 01:10:08,860
either direct solution or pseudo
inverse solution if you've got

1203
01:10:08,860 --> 01:10:12,860
replicates to be able
to fit the model.

1204
01:10:12,860 --> 01:10:16,210
So these are the sorts
of things that come out.

1205
01:10:16,210 --> 01:10:19,720
This is for an x1, x2.

1206
01:10:19,720 --> 01:10:21,880
I don't know which is which--

1207
01:10:21,880 --> 01:10:24,130
x1, x2.

1208
01:10:24,130 --> 01:10:27,483
If, in fact, there are
true quadratic terms,

1209
01:10:27,483 --> 01:10:28,900
these are a little
different kinds

1210
01:10:28,900 --> 01:10:33,730
of surfaces than the
ruled surface, which

1211
01:10:33,730 --> 01:10:37,690
looked like it had kind of a
funky kind of curvature because

1212
01:10:37,690 --> 01:10:39,790
of an x1, x2 interaction.

1213
01:10:39,790 --> 01:10:43,690
But if I projected down
on any one variable,

1214
01:10:43,690 --> 01:10:45,100
it's always linear.

1215
01:10:45,100 --> 01:10:49,360
Here, if I were to do a
slice holding x1 constant,

1216
01:10:49,360 --> 01:10:52,030
I do, in fact, get
a true quadratic.

1217
01:10:52,030 --> 01:10:55,360
Now, the nice thing
about quadratic surfaces

1218
01:10:55,360 --> 01:11:03,190
is you can start to think about
an optimal point much more

1219
01:11:03,190 --> 01:11:04,930
easily within these.

1220
01:11:04,930 --> 01:11:08,470
Certainly, if the
space is large enough

1221
01:11:08,470 --> 01:11:13,120
to cover a minimum or a maximum,
there's a natural motion

1222
01:11:13,120 --> 01:11:16,450
if you're trying to minimize
or maximize your output

1223
01:11:16,450 --> 01:11:20,290
y of finding the optimum space.

1224
01:11:20,290 --> 01:11:22,480
Now, it's also possible
that if I had a smaller

1225
01:11:22,480 --> 01:11:26,770
space and the true
minimum or maximum

1226
01:11:26,770 --> 01:11:30,380
of the full equation were
outside, then I might have--

1227
01:11:30,380 --> 01:11:33,430
so for example, let's say
my space was constrained

1228
01:11:33,430 --> 01:11:36,280
to only right here,
then I might run

1229
01:11:36,280 --> 01:11:40,750
into the minimum or the maximum
at one of my boundaries.

1230
01:11:40,750 --> 01:11:43,090
But now we've got
this extra notion

1231
01:11:43,090 --> 01:11:45,580
that my min or maximum
might occur somewhere

1232
01:11:45,580 --> 01:11:47,800
in the interior of the space.

1233
01:11:47,800 --> 01:11:49,540
As opposed to with
linear models,

1234
01:11:49,540 --> 01:11:54,340
it's always at one or the
other of the boundaries.

1235
01:11:54,340 --> 01:11:56,860
So this starts to
get to the desire

1236
01:11:56,860 --> 01:11:59,470
for using this
kind of a model now

1237
01:11:59,470 --> 01:12:03,580
to find the optimum point, which
is one of the main reasons we

1238
01:12:03,580 --> 01:12:05,260
do experimental designs.

1239
01:12:05,260 --> 01:12:07,720
Not just to find out
which factors matter,

1240
01:12:07,720 --> 01:12:11,110
but build the model, and
then use it either in,

1241
01:12:11,110 --> 01:12:15,700
maybe, feedback control or
more often, just to set--

1242
01:12:15,700 --> 01:12:19,270
find the process settings or
find the design optimal point

1243
01:12:19,270 --> 01:12:21,295
in order to achieve
some criteria.

1244
01:12:24,370 --> 01:12:27,160
Now, I alluded already to
adding additional points.

1245
01:12:27,160 --> 01:12:31,180
If we did a full
factorial three level--

1246
01:12:31,180 --> 01:12:35,350
that's all nine
combinations of high-low.

1247
01:12:35,350 --> 01:12:39,730
And in the x1, x2 space,
that's all nine points.

1248
01:12:39,730 --> 01:12:43,390
There is an alternative
approach and probably one

1249
01:12:43,390 --> 01:12:46,670
of the most important
experimental designs

1250
01:12:46,670 --> 01:12:48,940
after two level full factorial.

1251
01:12:48,940 --> 01:12:52,000
It's referred to as a
central composite design.

1252
01:12:52,000 --> 01:12:55,060
And you often will design
up front to do this,

1253
01:12:55,060 --> 01:12:58,780
but also very often
it will be what

1254
01:12:58,780 --> 01:13:03,490
you extend a full factorial
corner point design with.

1255
01:13:03,490 --> 01:13:06,280
If I did my first
experiment with 2

1256
01:13:06,280 --> 01:13:14,820
to the 2, just pure full
factorial two levels,

1257
01:13:14,820 --> 01:13:20,540
and I got four tests, first
off, this is probably wrong.

1258
01:13:20,540 --> 01:13:24,870
The model is shown
to have a poor fit.

1259
01:13:24,870 --> 01:13:28,400
If I actually fit this with all
four points and the interaction,

1260
01:13:28,400 --> 01:13:31,310
I don't have enough
to detect that.

1261
01:13:31,310 --> 01:13:36,650
So somewhere in here
would be: add center points.

1262
01:13:36,650 --> 01:13:38,950
The power of the center point.

1263
01:13:38,950 --> 01:13:41,180
If you remember nothing
else from today,

1264
01:13:41,180 --> 01:13:43,010
the power of the center point.

1265
01:13:43,010 --> 01:13:45,740
The power of the center point.

1266
01:13:45,740 --> 01:13:48,920
Then I can start to
detect whether it actually

1267
01:13:48,920 --> 01:13:53,190
has a lack of fit
in a formal sense.
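One common way to formalize this check, sketched here in Python, compares the average of the corner runs with the average of replicated center runs. The response values are made up, `curvature_f_statistic` is a hypothetical name, and the sketch assumes the center point was run more than once so that the replicates give a noise estimate.
```python
import numpy as np
def curvature_f_statistic(corner_y, center_y):
    """F ratio: curvature sum of squares over pure-error mean square."""
    corner_y = np.asarray(corner_y, dtype=float)
    center_y = np.asarray(center_y, dtype=float)
    n_f, n_c = corner_y.size, center_y.size
    # With only linear and interaction terms, the corner average and the
    # center average estimate the same quantity, so a gap hints at curvature.
    diff = corner_y.mean() - center_y.mean()
    ss_curvature = n_f * n_c * diff**2 / (n_f + n_c)
    # Replicated center runs give a model-free estimate of the noise.
    ms_pure_error = center_y.var(ddof=1)
    return ss_curvature / ms_pure_error
# Made-up responses (assumption): four corner runs, four center replicates.
corner_y = [10.1, 12.0, 11.9, 14.2]
f_flat = curvature_f_statistic(corner_y, [12.0, 12.2, 11.9, 12.1])
f_curved = curvature_f_statistic(corner_y, [14.0, 14.2, 13.9, 14.1])
print(f_flat, f_curved)
```
The ratio would be compared against an F distribution with 1 and n_c - 1 degrees of freedom; a large value suggests the two-level model is missing curvature.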

1268
01:13:53,190 --> 01:13:56,300
And let's say then I decide
I want to go quadratic,

1269
01:13:56,300 --> 01:14:00,330
but I'm not quite sure
of the shape where

1270
01:14:00,330 --> 01:14:04,320
my min or maximum might be.

1271
01:14:04,320 --> 01:14:08,880
What we often do is add
our additional points.

1272
01:14:08,880 --> 01:14:12,890
So there was our original
four corner points.

1273
01:14:12,890 --> 01:14:15,920
Certainly, we wanted to
add the one at the center.

1274
01:14:15,920 --> 01:14:17,810
Should have done that already.

1275
01:14:17,810 --> 01:14:22,460
But now I can also decide
where to add my interaction

1276
01:14:22,460 --> 01:14:26,060
points different-- so
let's say I added this.

1277
01:14:26,060 --> 01:14:32,150
The full typical 3 to
the 2 would say, OK,

1278
01:14:32,150 --> 01:14:38,640
you add them at exactly
this 3 by 3 grid array.

1279
01:14:38,640 --> 01:14:43,000
I would add them at the center
points of each of these.

1280
01:14:43,000 --> 01:14:45,940
We can do something else
that's a little bit clever.

1281
01:14:45,940 --> 01:14:53,070
Which is instead, add
these interaction points

1282
01:14:53,070 --> 01:15:02,810
off the grid, but at the
location of an outer circle

1283
01:15:02,810 --> 01:15:07,970
equidistant or circumscribed
around the original points

1284
01:15:07,970 --> 01:15:10,340
of my cube-- my hypercube.

1285
01:15:14,630 --> 01:15:17,590
See I think-- there we go.

1286
01:15:17,590 --> 01:15:19,950
So what we would do
in that case here,

1287
01:15:19,950 --> 01:15:22,770
is these would be my
first four points.

1288
01:15:22,770 --> 01:15:24,090
I'd add my center point.

1289
01:15:24,090 --> 01:15:28,800
And now I add these
points off the grid.

1290
01:15:28,800 --> 01:15:32,280
There's something really clever,
really valuable in doing this--

1291
01:15:32,280 --> 01:15:35,340
some things that are subtle, and
kind of mathematical, and not

1292
01:15:35,340 --> 01:15:37,440
all that important, but
some other parts that

1293
01:15:37,440 --> 01:15:40,790
are really nice and intuitive.

1294
01:15:40,790 --> 01:15:43,490
I'll give you the subtle
mathematical part.

1295
01:15:43,490 --> 01:15:52,550
This actually maintains
a little bit of fitting--

1296
01:15:52,550 --> 01:15:54,500
if you go and you
do the regression,

1297
01:15:54,500 --> 01:15:56,990
the fact that all of
your off center points

1298
01:15:56,990 --> 01:15:59,690
are now equidistant
from the center point

1299
01:15:59,690 --> 01:16:04,680
means that your model variance,
no matter which direction you

1300
01:16:04,680 --> 01:16:06,430
go, is the same.

1301
01:16:06,430 --> 01:16:08,490
In other words, there's
a confidence interval

1302
01:16:08,490 --> 01:16:13,110
on your coefficients and an
interpolation or extrapolation

1303
01:16:13,110 --> 01:16:15,870
error that grows
as you go further

1304
01:16:15,870 --> 01:16:17,860
from the center of your design.

1305
01:16:17,860 --> 01:16:23,690
And by doing this, that
maintains a symmetry.

1306
01:16:23,690 --> 01:16:27,140
It doesn't matter which
direction you look.

1307
01:16:27,140 --> 01:16:28,940
You've got the
same model accuracy

1308
01:16:28,940 --> 01:16:32,810
purely as a function of
distance from the center.

1309
01:16:32,810 --> 01:16:36,660
So that's a nice
mathematical property.
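A small numerical check of this property, written as a sketch in Python: it evaluates the scaled prediction variance x'(X'X)^-1 x of a full quadratic model at points on a circle around the center. The design coordinates and the helper names are assumptions for illustration (two factors, one center run, axial distance square root of 2).
```python
import numpy as np
def quadratic_row(x1, x2):
    """Model terms: 1, x1, x2, x1^2, x2^2, x1*x2."""
    return np.array([1.0, x1, x2, x1 * x1, x2 * x2, x1 * x2])
def variance_spread_on_circle(design, radius=1.0, n_angles=12):
    """Max minus min of the scaled prediction variance around a circle."""
    X = np.array([quadratic_row(x1, x2) for x1, x2 in design])
    xtx_inv = np.linalg.inv(X.T @ X)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    values = []
    for t in angles:
        row = quadratic_row(radius * np.cos(t), radius * np.sin(t))
        values.append(float(row @ xtx_inv @ row))
    return max(values) - min(values)
a = np.sqrt(2.0)
corners = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
# Central composite: corners, center, and points at distance sqrt(2).
ccd = corners + [(0, 0), (-a, 0), (a, 0), (0, -a), (0, a)]
# Regular 3 by 3 grid for comparison.
grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]
ccd_spread = variance_spread_on_circle(ccd)
grid_spread = variance_spread_on_circle(grid)
print(ccd_spread, grid_spread)
```
For this two-factor case the central composite spread should come out as zero up to rounding, while the regular grid shows a direction-dependent variance.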

1310
01:16:36,660 --> 01:16:40,890
But there's another
more intuitive value

1311
01:16:40,890 --> 01:16:42,390
picking these corner points.

1312
01:16:46,250 --> 01:16:49,060
And the way I think
of it is if I just

1313
01:16:49,060 --> 01:16:53,080
picked those 3 by 3
on a regular grid,

1314
01:16:53,080 --> 01:16:56,800
I could then project
down to x1 or x2.

1315
01:16:56,800 --> 01:17:00,520
And then in terms of
exercising the x1 variable,

1316
01:17:00,520 --> 01:17:03,700
I've only got three levels.

1317
01:17:03,700 --> 01:17:07,520
Here, if I were to
project down my points--

1318
01:17:07,520 --> 01:17:10,870
so let me get rid of some
of this scribble here.

1319
01:17:10,870 --> 01:17:15,430
If I were to project down
onto just my x1 axis--

1320
01:17:15,430 --> 01:17:16,930
let's say I did the
experiment and I

1321
01:17:16,930 --> 01:17:20,860
found that x2 actually
was not that important

1322
01:17:20,860 --> 01:17:25,150
and I project down
all of these points.

1323
01:17:25,150 --> 01:17:31,740
That means for my
x1, I've actually

1324
01:17:31,740 --> 01:17:38,120
exercised at five
different levels of x1.

1325
01:17:38,120 --> 01:17:40,400
I've got more than just--

1326
01:17:40,400 --> 01:17:44,030
I've got extra
redundancy built in.

1327
01:17:44,030 --> 01:17:48,150
If I'm fitting just
a quadratic model,

1328
01:17:48,150 --> 01:17:54,220
I've got ways to actually check
whether the quadratic model is

1329
01:17:54,220 --> 01:17:54,880
sufficient.

1330
01:17:54,880 --> 01:18:00,030
I can do a lack of fit test
on even the quadratic model.

1331
01:18:00,030 --> 01:18:04,320
Whereas if I only had
exactly three samples of x1

1332
01:18:04,320 --> 01:18:06,390
and I add the
quadratic in, there's

1333
01:18:06,390 --> 01:18:08,730
no way for me to
ask the question,

1334
01:18:08,730 --> 01:18:12,140
might there be an
even higher order,

1335
01:18:12,140 --> 01:18:14,960
a logarithmic dependence,
or something else

1336
01:18:14,960 --> 01:18:17,540
that's subtle that's going on.

1337
01:18:17,540 --> 01:18:21,590
Because I fit exactly all
of the data that I have.
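To illustrate the point, here is a sketch in Python that fits a quadratic in x1 to a response that is really cubic, once at three levels and once at five. The cubic response and the function names are assumptions chosen only to make the effect visible.
```python
import numpy as np
def true_response(x):
    """Hypothetical underlying behavior that a quadratic cannot capture."""
    return x**3
def quadratic_sse(levels):
    """Residual sum of squares after a least-squares quadratic fit."""
    y = true_response(levels)
    coeffs = np.polyfit(levels, y, deg=2)
    residuals = y - np.polyval(coeffs, levels)
    return float(residuals @ residuals)
a = np.sqrt(2.0)
grid_levels = np.array([-1.0, 0.0, 1.0])        # 3 by 3 grid projected onto x1
ccd_levels = np.array([-a, -1.0, 0.0, 1.0, a])  # central composite projected onto x1
sse_grid = quadratic_sse(grid_levels)
sse_ccd = quadratic_sse(ccd_levels)
print(sse_grid, sse_ccd)
```
With three levels the three-coefficient quadratic passes through every point, so the residual is zero and reveals nothing; with five levels the leftover residual exposes the missing higher-order term.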

1338
01:18:21,590 --> 01:18:23,420
So the central composite
design, I think,

1339
01:18:23,420 --> 01:18:26,060
has this nice
intuitive feel of I'm

1340
01:18:26,060 --> 01:18:31,010
actually kind of exercising my
space a little more thoroughly

1341
01:18:31,010 --> 01:18:34,760
to be able to build a model
that can apply a little bit more

1342
01:18:34,760 --> 01:18:36,248
broadly.

1343
01:18:36,248 --> 01:18:37,040
Everybody see that?

1344
01:18:40,690 --> 01:18:43,450
So here's what the central
composite would look like.

1345
01:18:43,450 --> 01:18:47,110
What you end up with is the
original corner points.

1346
01:18:47,110 --> 01:18:48,970
You add the center point.

1347
01:18:48,970 --> 01:18:53,710
And now you can see, if you
were to do the geometry,

1348
01:18:53,710 --> 01:18:59,020
the distance there
is square root of 2.

1349
01:18:59,020 --> 01:19:03,460
So I'm going a full one
on both the x1 and the x2.

1350
01:19:03,460 --> 01:19:08,390
And you can extrapolate that
to a third order hypercube.

1351
01:19:08,390 --> 01:19:10,990
I think it ends
up being, instead

1352
01:19:10,990 --> 01:19:14,380
of square root of 2, that
distance is a square root of 3

1353
01:19:14,380 --> 01:19:15,230
and so on.

1354
01:19:15,230 --> 01:19:18,480
So you can find
what the distance is

1355
01:19:18,480 --> 01:19:22,890
and pick those
different corner points.

1356
01:19:22,890 --> 01:19:25,650
So what we'll do next time
is build on a little bit

1357
01:19:25,650 --> 01:19:30,120
this idea of using the model.

1358
01:19:30,120 --> 01:19:31,140
We built it.

1359
01:19:31,140 --> 01:19:32,370
We can assess it.

1360
01:19:32,370 --> 01:19:35,190
We can pick our points to build
different orders of model.

1361
01:19:35,190 --> 01:19:39,360
What we'd like to do is talk
a little bit about picking

1362
01:19:39,360 --> 01:19:43,500
one of these surfaces
and how I actually--

1363
01:19:43,500 --> 01:19:45,450
a couple of related,
but slightly different

1364
01:19:45,450 --> 01:19:53,100
ways of looking at optimizing
to find an optimum point either

1365
01:19:53,100 --> 01:19:55,410
after you've built
the whole model or--

1366
01:19:55,410 --> 01:19:59,580
the clever thing here when
you're on a process line--

1367
01:19:59,580 --> 01:20:02,400
doing it interactively
or incrementally.

1368
01:20:02,400 --> 01:20:05,760
Actually, picking your
design points almost one

1369
01:20:05,760 --> 01:20:09,510
at a time based on your current
view of the model driving

1370
01:20:09,510 --> 01:20:11,470
towards the optimum.

1371
01:20:11,470 --> 01:20:13,740
So instead of doing all
your experiments up front,

1372
01:20:13,740 --> 01:20:17,340
you might actually want to do
them in an evolutionary way

1373
01:20:17,340 --> 01:20:19,240
to try to find the optimum.

1374
01:20:19,240 --> 01:20:23,990
So we'll talk about both of
those approaches next time.

1375
01:20:23,990 --> 01:20:27,610
So again, the problem
set is due tomorrow.

1376
01:20:27,610 --> 01:20:32,470
I think we'll hold off and we'll
give ourselves, Hayden and me,

1377
01:20:32,470 --> 01:20:35,400
an extension on
issuing the new problem

1378
01:20:35,400 --> 01:20:38,440
set so you're not doubled
up, and we can think about it

1379
01:20:38,440 --> 01:20:40,120
a little bit more as well.

1380
01:20:40,120 --> 01:20:43,480
And issue that
tomorrow, as well.

1381
01:20:43,480 --> 01:20:46,320
We'll see you then on Tuesday.