1 00:00:00,000 --> 00:00:02,430 The following content is provided under a Creative 2 00:00:02,430 --> 00:00:03,730 Commons license. 3 00:00:03,730 --> 00:00:06,030 Your support will help MIT OpenCourseWare 4 00:00:06,030 --> 00:00:10,060 continue to offer high quality educational resources for free. 5 00:00:10,060 --> 00:00:12,660 To make a donation or to view additional materials 6 00:00:12,660 --> 00:00:16,560 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:16,560 --> 00:00:17,874 at ocw.mit.edu. 8 00:00:22,118 --> 00:00:23,660 DUANE BONING: So what I'm going to do 9 00:00:23,660 --> 00:00:27,180 is pick up a little bit where we left off from last time. 10 00:00:27,180 --> 00:00:29,210 Last time we were talking, primarily, 11 00:00:29,210 --> 00:00:30,830 about full factorial models. 12 00:00:30,830 --> 00:00:33,650 And then we dealt with a few important additional issues 13 00:00:33,650 --> 00:00:34,730 in experimental design. 14 00:00:34,730 --> 00:00:39,320 So we were talking about issues of blocking 15 00:00:39,320 --> 00:00:42,470 against nuisance factors, kind of a practical issue, 16 00:00:42,470 --> 00:00:47,030 but that also got us into a generic issue that's 17 00:00:47,030 --> 00:00:51,200 actually much more fundamental and important of confounding. 18 00:00:51,200 --> 00:00:55,220 And I want to pick up on this issue of confounding today 19 00:00:55,220 --> 00:01:00,680 because very often you will want to do fewer experiments 20 00:01:00,680 --> 00:01:03,410 than a full factorial. 21 00:01:03,410 --> 00:01:07,160 That 2 to the k grows very fast-- 22 00:01:07,160 --> 00:01:10,860 grows exponentially fast with a number of factors, for example. 23 00:01:10,860 --> 00:01:13,340 And so very often you might ask the question, 24 00:01:13,340 --> 00:01:15,830 can I reduce the number of experiments 25 00:01:15,830 --> 00:01:19,490 and still get the key information that I want to? 
26 00:01:19,490 --> 00:01:21,920 And so that's where we'll really pick up 27 00:01:21,920 --> 00:01:26,780 with fractional factorial designs today and understanding 28 00:01:26,780 --> 00:01:29,870 confounding and aliasing patterns 29 00:01:29,870 --> 00:01:36,410 that come with different subsets of a full factorial design. 30 00:01:36,410 --> 00:01:38,000 Then we'll touch on some implications 31 00:01:38,000 --> 00:01:42,630 for model construction that fall out pretty naturally from that. 32 00:01:42,630 --> 00:01:45,050 And then start talking a little bit, hopefully 33 00:01:45,050 --> 00:01:48,920 if we have time, on process optimization using some 34 00:01:48,920 --> 00:01:52,190 of these kinds of design of experiments 35 00:01:52,190 --> 00:01:54,590 techniques and the models that we're building. 36 00:01:57,650 --> 00:02:00,650 So as I was saying, very often we 37 00:02:00,650 --> 00:02:05,120 want to run fewer than that exponentially growing number 38 00:02:05,120 --> 00:02:08,990 of experiments, even if it's just two-level, building 39 00:02:08,990 --> 00:02:11,150 simple linear models. 40 00:02:11,150 --> 00:02:14,870 Again, we've got a 2 to the k exponential growth. 41 00:02:14,870 --> 00:02:17,660 And, as an example, imagine we said 42 00:02:17,660 --> 00:02:21,380 we wanted to run less than the full 2 to the k, 43 00:02:21,380 --> 00:02:22,850 say, for three inputs-- 44 00:02:22,850 --> 00:02:26,240 so for three inputs, if we run the full 2 to the k. 45 00:02:26,240 --> 00:02:28,940 And we wanted to form a full linear regression 46 00:02:28,940 --> 00:02:32,300 model with interactions-- so it's still not quadratic, 47 00:02:32,300 --> 00:02:34,620 but it does have all the interactions. 48 00:02:34,620 --> 00:02:36,740 This is what the model looks like. 49 00:02:36,740 --> 00:02:38,630 And it's got, if you count them up-- 50 00:02:38,630 --> 00:02:42,150 it's got eight coefficients.
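For reference, the three-input model with all interactions that the speaker points to can be written out as below. This is a reconstruction, not the slide itself: the coefficient names (beta with subscripts) are an assumption based on the "beta 1, 2, x1, x2" wording used later in the lecture, and the three-factor term is included because the count of eight coefficients requires it.

```latex
\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3
        + \beta_{12} x_1 x_2 + \beta_{13} x_1 x_3 + \beta_{23} x_2 x_3
        + \beta_{123} x_1 x_2 x_3
```

Counting terms gives one constant, three main effects, three two-factor interactions, and one three-factor interaction, for eight coefficients in total.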
51 00:02:42,150 --> 00:02:47,000 So if we were to do less than the full 2 to the 3 52 00:02:47,000 --> 00:02:49,670 or 8 experiments, we obviously would not 53 00:02:49,670 --> 00:02:55,640 have enough data points to uniquely fit every coefficient. 54 00:02:55,640 --> 00:02:59,000 So that's already giving us the biggest clue. 55 00:02:59,000 --> 00:03:04,610 Is if you do less than a full factorial model, 56 00:03:04,610 --> 00:03:07,130 then even for linear experiments, 57 00:03:07,130 --> 00:03:11,220 you won't be able to fit all of the coefficients. 58 00:03:11,220 --> 00:03:17,060 So what you end up having to do is make a decision, a priori, 59 00:03:17,060 --> 00:03:20,660 that some of these factor effects are going to be small. 60 00:03:20,660 --> 00:03:23,870 That is to say, we're going to treat the coefficient 61 00:03:23,870 --> 00:03:26,630 as essentially zero. 62 00:03:26,630 --> 00:03:28,370 But then what's happening with the data? 63 00:03:28,370 --> 00:03:30,840 What impact does that have on other coefficients? 64 00:03:30,840 --> 00:03:33,660 So we can explore that slightly. 65 00:03:33,660 --> 00:03:34,850 Here's an example. 66 00:03:34,850 --> 00:03:39,560 We'll call this a 2 to the 3 minus 1 experiment. 67 00:03:39,560 --> 00:03:43,790 And I'll talk about these half fractions a little bit more. 68 00:03:43,790 --> 00:03:46,850 But instead of the full 2 to the 3 or 8 experiments, 69 00:03:46,850 --> 00:03:50,000 we'll do a half fraction here. 70 00:03:50,000 --> 00:03:53,210 We'll just do 4 experiments instead of the 8. 71 00:03:53,210 --> 00:03:58,520 And so if I were to pick my factor levels on x1 and x2 72 00:03:58,520 --> 00:04:00,800 and then think about what the-- 73 00:04:00,800 --> 00:04:03,560 so I'm just doing-- 74 00:04:03,560 --> 00:04:07,460 in my mind, I might think if I were just doing 2 factors, 75 00:04:07,460 --> 00:04:16,400 I would have a full factorial on 2 factors.
76 00:04:19,079 --> 00:04:24,980 And I also could calculate what an interaction term would be. 77 00:04:24,980 --> 00:04:27,410 That's fine, that works well. 78 00:04:27,410 --> 00:04:32,660 I've got 4 coefficients and a 2 factor experiment. 79 00:04:32,660 --> 00:04:36,950 And that I could uniquely fit all of those coefficients. 80 00:04:36,950 --> 00:04:40,910 However, what if I were to simply relabel that. 81 00:04:40,910 --> 00:04:44,180 And instead of thinking of that as an interaction, 82 00:04:44,180 --> 00:04:49,710 what if instead I labeled that an x3 column? 83 00:04:49,710 --> 00:04:52,850 Then I could fit the linear term for an x3 model. 84 00:04:56,240 --> 00:05:00,620 So imagine I was doing a 3 factor, 85 00:05:00,620 --> 00:05:03,470 but only worried about main effects. 86 00:05:03,470 --> 00:05:05,150 I wasn't looking for interaction. 87 00:05:05,150 --> 00:05:08,150 I didn't care about the x1, x2 interaction. 88 00:05:08,150 --> 00:05:13,710 If I did that, and just defined x3 as if it were that column, 89 00:05:13,710 --> 00:05:16,670 then I've got a 3 factor experiment, 90 00:05:16,670 --> 00:05:20,630 but I can only see main effects. 91 00:05:20,630 --> 00:05:22,520 But you see what's going on here, 92 00:05:22,520 --> 00:05:25,010 I still have that interaction term. 93 00:05:25,010 --> 00:05:27,680 I still have an x1 x2 interaction. 94 00:05:27,680 --> 00:05:31,040 So the key idea of confounding, that we saw before, 95 00:05:31,040 --> 00:05:34,670 is lurking in here. 
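The relabeling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from the lecture: the helper names `half_fraction` and `effect` are hypothetical, and the response values come from a made-up process in which x3 truly has no effect but x1 and x2 interact.

```python
from itertools import product


def half_fraction():
    """Four runs of a 2^(3-1) design, with x3 set equal to the x1*x2 column."""
    return [(x1, x2, x1 * x2) for x2, x1 in product((-1, 1), repeat=2)]


def effect(column, y):
    """Average response at the high level minus average response at the low level."""
    hi = [yi for c, yi in zip(column, y) if c > 0]
    lo = [yi for c, yi in zip(column, y) if c < 0]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


runs = half_fraction()

# Hypothetical process (an assumption for illustration only):
# x3 has NO effect, but there is an x1*x2 interaction with coefficient 3.
y = [10 + 2 * x1 + 1 * x2 + 3 * x1 * x2 for x1, x2, x3 in runs]

x3_col = [run[2] for run in runs]
x3_effect = effect(x3_col, y)

for (x1, x2, x3), yi in zip(runs, y):
    print(f"x1={x1:+d}  x2={x2:+d}  x3={x3:+d}  y={yi}")
print("apparent x3 effect:", x3_effect)
```

Even though x3 does nothing in this made-up process, the four runs report a nonzero "x3 effect", because the x3 settings are the same column that would be used to detect the x1, x2 interaction.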
96 00:05:34,670 --> 00:05:38,360 If I were to do truly a third factor 97 00:05:38,360 --> 00:05:42,890 and set on my control knobs for that experiment, my x3 98 00:05:42,890 --> 00:05:45,830 according to these high-low settings, 99 00:05:45,830 --> 00:05:51,390 those would also give me the same information, 100 00:05:51,390 --> 00:05:54,410 if you will, the same combination settings for x1 101 00:05:54,410 --> 00:05:57,350 and x2, that I would have used to help 102 00:05:57,350 --> 00:06:01,490 me detect the interaction between x1 and x2. 103 00:06:01,490 --> 00:06:04,010 That is to say, the x1, x2 interaction is 104 00:06:04,010 --> 00:06:08,900 confounded with a third factor, if I were to do it. 105 00:06:08,900 --> 00:06:12,560 Just expanding on that a little bit, if I were-- 106 00:06:12,560 --> 00:06:13,790 this should be a hat. 107 00:06:13,790 --> 00:06:16,610 There's another little weird font thing going on. 108 00:06:16,610 --> 00:06:24,530 If I were doing a 3 factor experiment, truly a 3 factor, 109 00:06:24,530 --> 00:06:28,100 I could build up to a linear model OK with those 4 110 00:06:28,100 --> 00:06:31,460 experimental parameters, but I could not 111 00:06:31,460 --> 00:06:35,590 fit any interaction terms. 112 00:06:35,590 --> 00:06:38,790 So I have a choice in my model building. 113 00:06:38,790 --> 00:06:43,410 Do I take my three factors and look for just main effects? 114 00:06:43,410 --> 00:06:46,140 Or do I stick with my two factors 115 00:06:46,140 --> 00:06:49,860 and look for an interaction effect and not model at all-- 116 00:06:49,860 --> 00:06:54,810 pretend the third factor doesn't matter 117 00:06:54,810 --> 00:06:58,320 or try to keep that constant and not let that enter into it? 118 00:06:58,320 --> 00:06:59,630 Question-- 119 00:06:59,630 --> 00:07:03,486 AUDIENCE: If you computed this [INAUDIBLE] 120 00:07:03,486 --> 00:07:05,970 you still can't say for sure that it 121 00:07:05,970 --> 00:07:08,660 was x3 or [? x on ?] x2.
122 00:07:08,660 --> 00:07:12,466 So you describe it [INAUDIBLE] you really 123 00:07:12,466 --> 00:07:14,840 have no way of knowing, right? 124 00:07:14,840 --> 00:07:16,220 DUANE BONING: So if-- 125 00:07:16,220 --> 00:07:20,840 so the question-- by the way, could we put the back screen 126 00:07:20,840 --> 00:07:23,720 also to the slides? 127 00:07:23,720 --> 00:07:25,490 Great, thanks. 128 00:07:25,490 --> 00:07:28,730 So the question is, we can still distinguish 129 00:07:28,730 --> 00:07:32,375 x1, x2, and x3 or not? 130 00:07:32,375 --> 00:07:34,070 AUDIENCE: No-- x3-- 131 00:07:34,070 --> 00:07:36,980 the effect that we ascribe to x3-- 132 00:07:36,980 --> 00:07:39,560 x1, x2. 133 00:07:39,560 --> 00:07:42,803 So we cannot differentiate between the two. 134 00:07:42,803 --> 00:07:46,100 Or [INAUDIBLE] might tell us we have a huge effect 135 00:07:46,100 --> 00:07:48,275 because of x3-- 136 00:07:48,275 --> 00:07:49,932 a significant effect. 137 00:07:49,932 --> 00:07:52,370 So we're not really sure if it's x3 138 00:07:52,370 --> 00:07:55,362 that's having an effect or an interaction of x1, x2. 139 00:07:55,362 --> 00:07:57,570 DUANE BONING: Yeah, you guys may not have heard that, 140 00:07:57,570 --> 00:08:00,410 but basically, just had the restatement 141 00:08:00,410 --> 00:08:02,030 of exactly the issue. 142 00:08:02,030 --> 00:08:06,500 That it's confounding between is it an x3 factor 143 00:08:06,500 --> 00:08:11,610 or is it a beta 1, 2, x1, x2 factor. 144 00:08:11,610 --> 00:08:14,520 You cannot distinguish between the two. 145 00:08:14,520 --> 00:08:18,310 AUDIENCE: So should we do something 146 00:08:18,310 --> 00:08:21,922 different to find that out? 147 00:08:21,922 --> 00:08:23,380 DUANE BONING: Yeah, so the question 148 00:08:23,380 --> 00:08:25,690 is what would you do then? 149 00:08:25,690 --> 00:08:32,620 There's a priori knowledge that may guide you to a belief 150 00:08:32,620 --> 00:08:34,929 that there is no interaction effect. 
151 00:08:34,929 --> 00:08:37,803 That there's no possible physical way 152 00:08:37,803 --> 00:08:39,970 there could be an interaction effect, in which case you 153 00:08:39,970 --> 00:08:42,190 should be safe. 154 00:08:42,190 --> 00:08:44,620 Or you might be trying to say, I'm 155 00:08:44,620 --> 00:08:49,940 not even sure either of these effects exists. 156 00:08:49,940 --> 00:08:53,170 And so I'm quite happy to do just four experiments. 157 00:08:53,170 --> 00:09:00,370 Do an ANOVA that tells me is either of those, just one 158 00:09:00,370 --> 00:09:02,770 or two, significant. 159 00:09:02,770 --> 00:09:04,510 In which case, now I worry-- maybe 160 00:09:04,510 --> 00:09:10,150 then I add experimental points to differentiate. 161 00:09:10,150 --> 00:09:12,340 So we've talked a little bit more about that. 162 00:09:12,340 --> 00:09:18,070 But that's, in a nutshell, the confounding and aliasing 163 00:09:18,070 --> 00:09:20,680 thought process. 164 00:09:20,680 --> 00:09:22,510 And we'll look at some rules of thumb 165 00:09:22,510 --> 00:09:30,480 that would lead one to believe, for example, that I may have, 166 00:09:30,480 --> 00:09:34,180 in many cases, more of an a priori belief that maybe 167 00:09:34,180 --> 00:09:37,300 the main effect will be stronger and more important. 168 00:09:37,300 --> 00:09:42,340 And I doubt that the interaction effect will be there. 169 00:09:42,340 --> 00:09:44,950 But those are kind of rules of thumb and assumptions 170 00:09:44,950 --> 00:09:47,500 that if you can do the experiments, you can check. 171 00:09:51,490 --> 00:09:54,370 So the point out of this was simply-- 172 00:09:54,370 --> 00:09:57,300 again, these should be models-- 173 00:09:57,300 --> 00:09:58,530 y hat models. 174 00:10:02,160 --> 00:10:05,910 For exactly the same table of settings, 175 00:10:05,910 --> 00:10:09,270 if I looked at it one way, I could build a model where 176 00:10:09,270 --> 00:10:12,000 I've got some interaction term.
177 00:10:12,000 --> 00:10:16,830 Or I can look at the main effect. 178 00:10:16,830 --> 00:10:18,720 Or I can give up-- 179 00:10:18,720 --> 00:10:21,720 so this could have been the x3 main effect. 180 00:10:21,720 --> 00:10:25,950 Or similarly, I could say, get confused 181 00:10:25,950 --> 00:10:30,070 between some other main effect and an interaction effect. 182 00:10:30,070 --> 00:10:33,210 In other words, I've got four coefficients, four experiments. 183 00:10:33,210 --> 00:10:35,970 I've got some confounding going on, 184 00:10:35,970 --> 00:10:38,490 but it's not clear actually. 185 00:10:38,490 --> 00:10:41,730 I've given you the example for one of them 186 00:10:41,730 --> 00:10:43,080 in the previous two slides. 187 00:10:43,080 --> 00:10:44,940 But the same thing holds true for any 188 00:10:44,940 --> 00:10:49,000 of the other interactions with one of the main effects. 189 00:10:49,000 --> 00:10:53,040 So if I really have three factors, 190 00:10:53,040 --> 00:10:55,620 I might really have eight kinds-- 191 00:10:55,620 --> 00:10:56,910 in a linear model-- 192 00:10:56,910 --> 00:10:59,790 I might actually have eight terms. 193 00:10:59,790 --> 00:11:04,110 And what I've done is folded four of them onto 194 00:11:04,110 --> 00:11:06,990 or confounded them with four of the other coefficients. 195 00:11:09,730 --> 00:11:14,020 And so then the question becomes what's confounded with what? 196 00:11:14,020 --> 00:11:18,730 How do I structure and how do I pick which 197 00:11:18,730 --> 00:11:22,360 subset of experiments to run? 198 00:11:22,360 --> 00:11:26,710 So this is, essentially, just saying the same thing. 199 00:11:26,710 --> 00:11:30,640 The point is we can carefully pick which subset, 200 00:11:30,640 --> 00:11:31,630 in many cases-- 201 00:11:31,630 --> 00:11:34,990 this kind of a very small experimental design. 
202 00:11:34,990 --> 00:11:36,730 But in many cases, we can pick which 203 00:11:36,730 --> 00:11:40,510 subset of the rows we want to take, 204 00:11:40,510 --> 00:11:42,940 which half fraction we want to do, 205 00:11:42,940 --> 00:11:47,920 based on our belief in which interactions 206 00:11:47,920 --> 00:11:52,000 are going to be least likely or least important. 207 00:11:52,000 --> 00:11:55,310 So what we want to do is get a little bit of a feel for that. 208 00:11:55,310 --> 00:11:59,500 So here's a little bit larger picture, 209 00:11:59,500 --> 00:12:02,980 as well as now extending some of our shortcut 210 00:12:02,980 --> 00:12:08,470 techniques, our terminology, our little factor algebra 211 00:12:08,470 --> 00:12:15,340 that we've got for defining rows and columns in our experiments. 212 00:12:15,340 --> 00:12:17,470 So if I were doing a three factor 213 00:12:17,470 --> 00:12:23,290 experiment, a full 2 to the 3 array, this is our x matrix. 214 00:12:23,290 --> 00:12:27,670 The identity column-- the C column-- 215 00:12:27,670 --> 00:12:29,060 you can sort of see I've-- 216 00:12:29,060 --> 00:12:30,640 all low, all high. 217 00:12:30,640 --> 00:12:34,090 And then within that, I've got the B columns, low and high, 218 00:12:34,090 --> 00:12:36,190 low and high, low and high. 219 00:12:36,190 --> 00:12:40,170 And then within each of those, low-high, low-high, low-high, 220 00:12:40,170 --> 00:12:40,670 low-high. 221 00:12:40,670 --> 00:12:45,550 So that's, again, all 16 of the possible-- 222 00:12:45,550 --> 00:12:49,780 excuse me, all eight of the possible combinations. 223 00:12:49,780 --> 00:12:53,720 And then we can construct our interaction terms. 224 00:12:53,720 --> 00:12:56,260 These are two factor interactions. 225 00:12:56,260 --> 00:12:59,420 And then we've got a three factor interaction as well. 226 00:12:59,420 --> 00:13:03,190 So that would be all eight of our model coefficients.
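The table on the slide is not in the transcript, but the row ordering described above is enough to rebuild it. The following is a minimal sketch, assuming -1 for low and +1 for high and assuming the column order I, A, B, C, AB, AC, BC, ABC; the function names are hypothetical.

```python
from itertools import product
from math import prod

TERMS = ["I", "A", "B", "C", "AB", "AC", "BC", "ABC"]


def full_factorial():
    """Eight runs: C changes slowest, then B, and A alternates fastest."""
    return [dict(zip("CBA", levels)) for levels in product((-1, 1), repeat=3)]


def column(term, runs):
    """Element-wise product of the factor columns named in `term`; 'I' is all +1."""
    if term == "I":
        return [1] * len(runs)
    return [prod(run[f] for f in term) for run in runs]


runs = full_factorial()
table = {term: column(term, runs) for term in TERMS}

# Print the 8 x 8 matrix, one experiment per row.
print("  ".join(f"{t:>3}" for t in TERMS))
for i in range(len(runs)):
    print("  ".join(f"{table[t][i]:>+3d}" for t in TERMS))
```

Each interaction column is simply the row-by-row product of the main-effect columns it is named after, which is why the same table doubles as the recipe for the contrasts.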
227 00:13:03,190 --> 00:13:08,410 The column would tell me how to form the contrasts 228 00:13:08,410 --> 00:13:14,170 for detecting or estimating each of those interaction 229 00:13:14,170 --> 00:13:15,570 terms in the model. 230 00:13:18,510 --> 00:13:23,520 So that's just our baseline. 231 00:13:23,520 --> 00:13:26,800 And now let's consider what happens-- 232 00:13:26,800 --> 00:13:33,420 let's think for this full 2 to the 3 experiment. 233 00:13:33,420 --> 00:13:36,100 If I were to only do half of the experiments-- 234 00:13:36,100 --> 00:13:41,730 let's say I did the upper half-- the top four, the shaded four 235 00:13:41,730 --> 00:13:48,870 experiments only, what happens in terms of what coefficients 236 00:13:48,870 --> 00:13:50,100 can we estimate? 237 00:13:50,100 --> 00:13:51,450 What ones can we not? 238 00:13:51,450 --> 00:13:52,200 Please come on in. 239 00:13:54,800 --> 00:13:57,540 So which coefficients can I estimate and which ones can 240 00:13:57,540 --> 00:14:00,000 I not? 241 00:14:00,000 --> 00:14:02,310 And what's confounded with what? 242 00:14:05,460 --> 00:14:09,870 We just got some visitors filing in. 243 00:14:09,870 --> 00:14:12,360 I think there is, perhaps, some bench space over there, 244 00:14:12,360 --> 00:14:13,770 as well, if you need some. 245 00:14:16,950 --> 00:14:19,680 So everyone in Singapore wave. 246 00:14:19,680 --> 00:14:21,670 Wave to our guests. 247 00:14:21,670 --> 00:14:22,170 Great. 248 00:14:27,220 --> 00:14:32,710 So if we do just this upper half fraction here, 249 00:14:32,710 --> 00:14:37,180 let's look at a couple of things that are immediately obvious. 250 00:14:37,180 --> 00:14:39,930 One that's really obvious here-- 251 00:14:39,930 --> 00:14:41,930 look at the C column. 252 00:14:41,930 --> 00:14:45,280 If we do those set of four experiments, 253 00:14:45,280 --> 00:14:50,800 can you estimate what the C effect is going to be? 254 00:14:50,800 --> 00:14:53,530 No, we haven't excited that variable at all. 
255 00:14:53,530 --> 00:14:57,670 It's all four experiments, we're at the low setting of that. 256 00:14:57,670 --> 00:15:00,730 So it's clear, if I picked those four, 257 00:15:00,730 --> 00:15:04,923 I'm basically making an upfront decision that the C effect-- 258 00:15:04,923 --> 00:15:06,340 I'm not going to be able to model. 259 00:15:06,340 --> 00:15:07,798 I'm not going to be able to fit it. 260 00:15:07,798 --> 00:15:09,550 I'm not going to know if it's significant. 261 00:15:09,550 --> 00:15:14,560 I'm not exercising that variable in a way 262 00:15:14,560 --> 00:15:16,360 with the right combination of experiments 263 00:15:16,360 --> 00:15:22,150 to unambiguously say, yes, there was a C effect. 264 00:15:22,150 --> 00:15:28,750 So one way that we can describe that in our funky DOE algebra 265 00:15:28,750 --> 00:15:33,130 is to say that the C column is equal to minus the identity 266 00:15:33,130 --> 00:15:34,250 column. 267 00:15:34,250 --> 00:15:38,350 It's just minus 1 times the identity column. 268 00:15:38,350 --> 00:15:41,680 And we can also look at some of the other columns. 269 00:15:41,680 --> 00:15:42,790 And here's a neat one. 270 00:15:42,790 --> 00:15:45,370 Let's say-- let's look at this. 271 00:15:45,370 --> 00:15:53,560 The AC column, right here, and the A column. 272 00:15:53,560 --> 00:15:55,240 And again, in our funky algebra, you 273 00:15:55,240 --> 00:15:58,270 can see already this AC equals minus A. 274 00:15:58,270 --> 00:16:00,700 But if you look in one to one correspondence, 275 00:16:00,700 --> 00:16:07,120 the AC column is exactly the same, just with a minus sign. 276 00:16:07,120 --> 00:16:11,620 The same combinations of levels as the-- 277 00:16:11,620 --> 00:16:15,740 both the A and the AC have the same combinations. 278 00:16:15,740 --> 00:16:19,840 So that's confounding or aliasing 279 00:16:19,840 --> 00:16:21,400 between those two columns. 
280 00:16:21,400 --> 00:16:23,980 So even if I run this experiment, 281 00:16:23,980 --> 00:16:26,950 there's no way for me to differentiate 282 00:16:26,950 --> 00:16:33,520 whether it was a main effect, an A effect, or an AC interaction. 283 00:16:33,520 --> 00:16:36,970 Now the other point here is with just this selection, again, 284 00:16:36,970 --> 00:16:39,970 of the top four columns, the same thing 285 00:16:39,970 --> 00:16:41,680 happens to other columns. 286 00:16:41,680 --> 00:16:44,110 What's the B column confounded with? 287 00:16:52,990 --> 00:16:53,830 No, we've got some-- 288 00:16:57,298 --> 00:16:59,590 somebody leaning against the wall trying to stay awake, 289 00:16:59,590 --> 00:17:00,090 I think. 290 00:17:04,480 --> 00:17:06,400 The disembodied voice in the machine 291 00:17:06,400 --> 00:17:10,869 said to tell you that that's happened before. 292 00:17:10,869 --> 00:17:13,869 So the B column is also confounded with-- 293 00:17:13,869 --> 00:17:15,310 let's see, what? 294 00:17:15,310 --> 00:17:17,950 Minus 1-- this one. 295 00:17:17,950 --> 00:17:22,480 And then finally, the last one is the AB column 296 00:17:22,480 --> 00:17:23,740 is confounded with-- 297 00:17:27,826 --> 00:17:31,520 let's see, that's just minus 1 of the ABC, it looks like. 298 00:17:31,520 --> 00:17:33,200 Do I have that right? 299 00:17:33,200 --> 00:17:35,150 Yeah. 300 00:17:35,150 --> 00:17:36,290 So that was our point here. 301 00:17:36,290 --> 00:17:38,630 If I had all eight experiments, I 302 00:17:38,630 --> 00:17:41,510 could have fit eight coefficients in the model. 303 00:17:41,510 --> 00:17:48,140 But four experiments-- I can only fit four coefficients. 304 00:17:48,140 --> 00:17:51,710 And it's more important than just fitting four coefficients. 
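The four alias pairs just read off the slide can also be found mechanically by comparing columns. The sketch below assumes the "upper half" means the four runs with C at its low level (as the speaker states for the C column) and uses -1/+1 coding; the helper names are hypothetical.

```python
from itertools import combinations, product
from math import prod

TERMS = ["I", "A", "B", "C", "AB", "AC", "BC", "ABC"]


def column(term, runs):
    """Element-wise product of the factor columns named in `term`; 'I' is all +1."""
    if term == "I":
        return [1] * len(runs)
    return [prod(run[f] for f in term) for run in runs]


# Full 2^3 design, then keep only the four runs with C at its low level.
full = [dict(zip("CBA", levels)) for levels in product((-1, 1), repeat=3)]
upper_half = [run for run in full if run["C"] == -1]

cols = {t: column(t, upper_half) for t in TERMS}

# Two terms are aliased when their columns are equal or exact negatives.
aliases = []
for s, t in combinations(TERMS, 2):
    if cols[s] == cols[t]:
        aliases.append((s, "+", t))
    elif cols[s] == [-v for v in cols[t]]:
        aliases.append((s, "-", t))

for s, sign, t in aliases:
    print(f"{s} = {sign}{t}")
```

With four runs, the eight columns collapse into four pairs, which is the "eight coefficients folded onto four" picture from the lecture.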
305 00:17:51,710 --> 00:17:57,800 It's really that I folded together the effects of two 306 00:17:57,800 --> 00:18:00,830 of those columns into one coefficient 307 00:18:00,830 --> 00:18:03,440 that I have to assign either to-- 308 00:18:03,440 --> 00:18:05,330 sort of nebulously-- either to one 309 00:18:05,330 --> 00:18:08,450 or the other of the main effect, or the interaction effect, 310 00:18:08,450 --> 00:18:10,520 or to some superposition of the two. 311 00:18:14,740 --> 00:18:18,130 So this is just, basically, saying the same thing. 312 00:18:18,130 --> 00:18:19,810 I'm just pointing out-- 313 00:18:19,810 --> 00:18:21,747 looking at the columns, you can see that. 314 00:18:21,747 --> 00:18:23,830 And if you were to actually follow through and use 315 00:18:23,830 --> 00:18:27,370 our contrast map for picking out and detecting what 316 00:18:27,370 --> 00:18:32,230 the contrast for effect A or the contrast for effect AC 317 00:18:32,230 --> 00:18:35,650 would be, they're essentially the same contrast. 318 00:18:35,650 --> 00:18:40,450 The same sum of the output rows for that particular effect. 319 00:18:43,290 --> 00:18:46,680 Now, we have a shorthand way of describing 320 00:18:46,680 --> 00:18:50,920 what this confounding pattern is in our funky algebra. 321 00:18:50,920 --> 00:18:56,340 Which is to say, we equate what columns are equal to what 322 00:18:56,340 --> 00:18:58,620 columns. 323 00:18:58,620 --> 00:19:06,030 And then do a little bit of our algebra of multiplying a column 324 00:19:06,030 --> 00:19:09,270 factor or level setting by each other 325 00:19:09,270 --> 00:19:12,030 until we get down to the identity column 326 00:19:12,030 --> 00:19:13,930 on one side or the other. 327 00:19:13,930 --> 00:19:16,260 And then that becomes a shorthand way 328 00:19:16,260 --> 00:19:19,260 of describing what the whole confounding pattern is. 
329 00:19:19,260 --> 00:19:21,990 So for example, I can pick, actually, almost any 330 00:19:21,990 --> 00:19:24,090 of these confounding patterns. 331 00:19:24,090 --> 00:19:28,470 Looking at the AC and the A column, where I 332 00:19:28,470 --> 00:19:30,840 detect that the AC and the A-- 333 00:19:30,840 --> 00:19:36,330 this was the A column is equal to minus of the AC column. 334 00:19:36,330 --> 00:19:41,460 Now if I multiply both sides by the A column-- 335 00:19:41,460 --> 00:19:43,230 A times minus AC-- 336 00:19:46,520 --> 00:19:50,990 then on an element by element basis, 337 00:19:50,990 --> 00:19:54,470 if I take the A column there and multiply it by itself, 338 00:19:54,470 --> 00:19:56,690 every minus 1 multiplies by minus 1. 339 00:19:56,690 --> 00:20:00,240 And I get the identity. 340 00:20:00,240 --> 00:20:03,030 Over here I've got A times a minus A. 341 00:20:03,030 --> 00:20:04,950 And that also becomes the identity. 342 00:20:04,950 --> 00:20:08,910 And I end up with just the I equals minus C. Which 343 00:20:08,910 --> 00:20:12,240 I could have also looked right directly at the column 344 00:20:12,240 --> 00:20:17,780 and said, OK, what is aliased with the identity? 345 00:20:17,780 --> 00:20:22,110 But also so you get a little familiar with that funky column 346 00:20:22,110 --> 00:20:24,830 math. 347 00:20:24,830 --> 00:20:27,470 So that, basically, is a shorthand way 348 00:20:27,470 --> 00:20:32,810 of either describing the interaction effect or more 349 00:20:32,810 --> 00:20:35,600 usefully, if I knew up front what 350 00:20:35,600 --> 00:20:42,170 interactions I was willing to confound with what effect-- 351 00:20:42,170 --> 00:20:48,410 If at the start I said, I really believe that the BC interaction 352 00:20:48,410 --> 00:20:54,170 is going to be minor compared to some other effect, 353 00:20:54,170 --> 00:20:57,020 then I could use that to actually help guide which 354 00:20:57,020 --> 00:21:01,310 half of the experiment to pick.
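The column algebra walked through above reduces to one rule: any letter multiplied by itself becomes the identity. A small sketch of that rule follows, assuming a column "word" is represented as a (sign, letters) pair; this representation and the names `multiply` and `show` are illustrative choices, not notation from the lecture.

```python
def multiply(a, b):
    """Multiply two signed column words, e.g. (+1, 'A') * (-1, 'AC') -> (-1, 'C').

    A letter that appears in both words cancels, because a +/-1 column
    times itself is the identity column I (the empty word here).
    """
    sign_a, word_a = a
    sign_b, word_b = b
    letters = set(word_a) ^ set(word_b)  # symmetric difference drops squared letters
    return sign_a * sign_b, "".join(sorted(letters))


def show(term):
    """Render a signed word, writing the empty word as I."""
    sign, word = term
    return ("-" if sign < 0 else "") + (word or "I")


# Start from the observed alias A = -AC and multiply both sides by A.
lhs = multiply((1, "A"), (1, "A"))    # A * A
rhs = multiply((1, "A"), (-1, "AC"))  # A * (-AC)
print(show(lhs), "=", show(rhs))      # I = -C

# Multiplying any term by the defining word -C gives its alias partner.
for word in ["A", "B", "AB"]:
    print(word, "=", show(multiply((1, word), (-1, "C"))))
```

Going the other way is the useful direction for planning: once a defining relation such as I = -C is chosen, multiplying it by each term lists the whole confounding pattern before any experiment is run.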
355 00:21:01,310 --> 00:21:05,480 So for example, I could have picked this half 356 00:21:05,480 --> 00:21:07,790 or I could have picked this half. 357 00:21:07,790 --> 00:21:11,750 And each one of those would have been consistent half fractions. 358 00:21:11,750 --> 00:21:15,350 And the choice of what to pick depends on-- 359 00:21:15,350 --> 00:21:20,840 for example, do I think AC is an alias with A, 360 00:21:20,840 --> 00:21:22,820 maybe, because A and C really shouldn't 361 00:21:22,820 --> 00:21:24,800 be interacting with each other? 362 00:21:24,800 --> 00:21:26,480 I'm happy to do that. 363 00:21:26,480 --> 00:21:28,850 But the AB interaction, I might actually 364 00:21:28,850 --> 00:21:32,900 be wanting to detect that I'm willing for that to confound 365 00:21:32,900 --> 00:21:36,860 with the B factor because I think that that 366 00:21:36,860 --> 00:21:39,745 may be less important. 367 00:21:39,745 --> 00:21:41,120 AUDIENCE: Just seems to be asking 368 00:21:41,120 --> 00:21:45,140 that A is a process pressure and that C is the temperature-- 369 00:21:45,140 --> 00:21:46,610 for sure temperature. 370 00:21:46,610 --> 00:21:50,320 So in that case, the interaction between pressure 371 00:21:50,320 --> 00:21:53,920 and temperature, AC, if temperature is high 372 00:21:53,920 --> 00:21:55,870 and pressure is low with the same, 373 00:21:55,870 --> 00:21:58,340 the interaction would be the same as A? 374 00:21:58,340 --> 00:22:00,745 Because that A is equal to AC. 375 00:22:00,745 --> 00:22:01,495 DUANE BONING: Yes. 376 00:22:05,508 --> 00:22:08,050 AUDIENCE: It will just change one parameter, just temperature 377 00:22:08,050 --> 00:22:08,563 [INAUDIBLE] 378 00:22:08,563 --> 00:22:09,980 DUANE BONING: But what it's saying 379 00:22:09,980 --> 00:22:13,760 is that your model, then, for what the A effect was, 380 00:22:13,760 --> 00:22:15,800 wasn't purely an A effect. 381 00:22:15,800 --> 00:22:19,890 It had a little bit of the interaction lurking in it.
382 00:22:19,890 --> 00:22:24,770 So when you actually run that experiment, 383 00:22:24,770 --> 00:22:26,930 if there is an interaction, you can't 384 00:22:26,930 --> 00:22:31,010 tell whether it was due to the main effect of pressure 385 00:22:31,010 --> 00:22:36,570 or also because of some interaction with temperature. 386 00:22:36,570 --> 00:22:44,180 So it's not that you get both or you only get one or the other, 387 00:22:44,180 --> 00:22:45,980 it's that they're mixed together. 388 00:22:45,980 --> 00:22:47,160 They're confounded together. 389 00:22:50,580 --> 00:22:57,410 So I alluded to some of the ideas of how you choose which 390 00:22:57,410 --> 00:22:59,210 design, based on what confounding you're 391 00:22:59,210 --> 00:23:00,650 willing to live with. 392 00:23:00,650 --> 00:23:05,790 But there's also a few additional guidelines 393 00:23:05,790 --> 00:23:07,020 that are at work. 394 00:23:07,020 --> 00:23:11,455 One is, there's this idea of balance and orthogonality-- 395 00:23:11,455 --> 00:23:13,080 that I'll talk about in just a minute-- 396 00:23:13,080 --> 00:23:19,290 that tells us you can't just willy nilly pick random rows 397 00:23:19,290 --> 00:23:24,420 out of your matrix and be able to use our design of experiments 398 00:23:24,420 --> 00:23:26,470 analytic techniques. 399 00:23:26,470 --> 00:23:30,300 Things like the estimation of the effects and so on only 400 00:23:30,300 --> 00:23:35,040 apply if I've got balanced and orthogonal experiments. 401 00:23:35,040 --> 00:23:37,620 And that leads, also, a bit, to this idea 402 00:23:37,620 --> 00:23:39,540 or very closely related to the idea 403 00:23:39,540 --> 00:23:42,780 of getting enough excitation of the inputs.
404 00:23:46,460 --> 00:23:52,370 So going back to our 2 to the 3 full table here-- 405 00:23:52,370 --> 00:23:56,470 this idea of balance, first off, says 406 00:23:56,470 --> 00:24:00,880 that in whatever subset of the design that you've 407 00:24:00,880 --> 00:24:03,700 got for the factors that you're interested in, 408 00:24:03,700 --> 00:24:07,570 you want all of the columns to have an equal number of plus 409 00:24:07,570 --> 00:24:09,150 and minus signs. 410 00:24:09,150 --> 00:24:16,400 Now, we saw if I did the upper fraction for C, 411 00:24:16,400 --> 00:24:18,710 if I was trying to deal with C, that's 412 00:24:18,710 --> 00:24:23,390 not balanced because I've got four low settings and zero 413 00:24:23,390 --> 00:24:24,050 high settings. 414 00:24:24,050 --> 00:24:29,700 I have not at all excited that input. 415 00:24:29,700 --> 00:24:32,400 But if you think back to all of our algebra 416 00:24:32,400 --> 00:24:34,393 for dealing with contrasts and being 417 00:24:34,393 --> 00:24:36,810 able to take the average of this, and the average of that, 418 00:24:36,810 --> 00:24:39,810 and subtract the two with the contrast, 419 00:24:39,810 --> 00:24:42,390 that's basically the reason that we 420 00:24:42,390 --> 00:24:45,660 need this idea of balance between the high 421 00:24:45,660 --> 00:24:48,850 and the low settings for that particular factor. 422 00:24:48,850 --> 00:24:51,300 So to use our shorthand approaches, 423 00:24:51,300 --> 00:24:55,340 you need an equal number of the plus and minus signs 424 00:24:55,340 --> 00:24:57,410 in each of the columns that you are 425 00:24:57,410 --> 00:24:59,150 trying to use in your model. 426 00:25:02,210 --> 00:25:04,430 Turns out you can relax that a little bit if you 427 00:25:04,430 --> 00:25:06,740 do some regression approaches. 
428 00:25:06,740 --> 00:25:09,350 There's some other risky, nasty things 429 00:25:09,350 --> 00:25:12,620 that happen like you may not have 430 00:25:12,620 --> 00:25:15,530 the same amount of variance or residual error 431 00:25:15,530 --> 00:25:18,770 at different points in your experiment. 432 00:25:18,770 --> 00:25:20,900 But especially for the shorthand approaches, 433 00:25:20,900 --> 00:25:25,310 you have to have this notion of balance. 434 00:25:25,310 --> 00:25:29,210 The second idea is orthogonality, 435 00:25:29,210 --> 00:25:32,570 which is basically saying that what I need 436 00:25:32,570 --> 00:25:38,810 is for the sum of the product, element wise product of two 437 00:25:38,810 --> 00:25:42,110 columns, to sum up to 0. 438 00:25:42,110 --> 00:25:46,320 So for example, here, the column A and B-- 439 00:25:46,320 --> 00:25:48,210 this is a product of one-- 440 00:25:48,210 --> 00:25:51,300 product minus 1, product minus 1, product of 1. 441 00:25:51,300 --> 00:25:53,670 Those sum up together to be 0. 442 00:25:53,670 --> 00:25:58,270 The A and B columns are orthogonal. 443 00:25:58,270 --> 00:26:02,910 Which is another way of saying those two columns are not 444 00:26:02,910 --> 00:26:05,050 confounded with each other. 445 00:26:05,050 --> 00:26:08,190 So think in a linear algebraic sense. 446 00:26:08,190 --> 00:26:10,320 If the two vectors are orthogonal, 447 00:26:10,320 --> 00:26:14,550 they are not mixing together effects in any way. 448 00:26:14,550 --> 00:26:18,330 So if I have two columns that I want 449 00:26:18,330 --> 00:26:21,120 to be able to model separately and not 450 00:26:21,120 --> 00:26:23,880 have the coefficient trying to mix in randomly 451 00:26:23,880 --> 00:26:27,390 some amount of one or the other, but, in fact, 452 00:26:27,390 --> 00:26:30,900 to be identified with that particular effect, 453 00:26:30,900 --> 00:26:35,070 those two columns have to be orthogonal. 
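The balance and orthogonality checks described above can be sketched in a few lines of Python. This is only an illustration, assuming the usual coded levels of -1 and +1 and the full 2 to the 3 table; the helper names `is_balanced` and `dot` are hypothetical and do not come from the lecture slides.

```python
from itertools import product

# Full 2^3 design in coded units: each run is a tuple (A, B, C) of -1 / +1 levels.
runs = list(product([-1, 1], repeat=3))

def is_balanced(col):
    """A column is balanced when it has as many +1 entries as -1 entries."""
    return sum(col) == 0

def dot(col1, col2):
    """Sum of the element-wise products of two columns; 0 means orthogonal."""
    return sum(x * y for x, y in zip(col1, col2))

# Pull the three factor columns out of the run table.
A = [r[0] for r in runs]
B = [r[1] for r in runs]
C = [r[2] for r in runs]

balance = {name: is_balanced(col) for name, col in zip("ABC", (A, B, C))}
dots = {"AB": dot(A, B), "AC": dot(A, C), "BC": dot(B, C)}

print(balance)  # every factor is balanced in the full factorial
print(dots)     # every pair of factor columns sums to 0
```

In the full factorial every column passes both checks; the point of the checks is to apply them to whatever subset of rows is actually run.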
454 00:26:35,070 --> 00:26:37,020 So for example, what that's telling 455 00:26:37,020 --> 00:26:42,690 me is if I were to pick this upper half fraction, 456 00:26:42,690 --> 00:26:45,840 the A and the B columns are orthogonal. 457 00:26:45,840 --> 00:26:48,190 They are not confounded with each other. 458 00:26:48,190 --> 00:26:51,180 They are two main effects that, at least with respect 459 00:26:51,180 --> 00:26:55,730 to each other, are separable. 460 00:26:55,730 --> 00:26:58,610 You could ask that same question now to our other confounding 461 00:26:58,610 --> 00:26:59,480 patterns. 462 00:26:59,480 --> 00:27:00,220 What did we say? 463 00:27:00,220 --> 00:27:02,390 A is confounded with BC-- 464 00:27:02,390 --> 00:27:05,670 was it? 465 00:27:05,670 --> 00:27:07,950 So A and BC-- 466 00:27:07,950 --> 00:27:11,450 are those two orthogonal? 467 00:27:11,450 --> 00:27:13,040 Certainly, we know by confounding, 468 00:27:13,040 --> 00:27:14,165 they're not supposed to be. 469 00:27:14,165 --> 00:27:18,430 And if you do that product of sums, each one of those, 470 00:27:18,430 --> 00:27:20,560 I believe-- 471 00:27:20,560 --> 00:27:22,060 AC, sorry. 472 00:27:22,060 --> 00:27:24,010 Good-- I was about to say, that's weird. 473 00:27:24,010 --> 00:27:25,390 They look orthogonal. 474 00:27:25,390 --> 00:27:26,470 There we go. 475 00:27:26,470 --> 00:27:29,470 Each one-- the product is minus 1 every time 476 00:27:29,470 --> 00:27:31,060 they all sum up to minus 4. 477 00:27:31,060 --> 00:27:33,010 They're not orthogonal. 478 00:27:33,010 --> 00:27:34,250 They are mixing in together. 479 00:27:36,910 --> 00:27:38,530 So those are a couple of these ideas 480 00:27:38,530 --> 00:27:40,750 of balance and orthogonality that 481 00:27:40,750 --> 00:27:43,030 are in some sense the same-- 482 00:27:43,030 --> 00:27:45,820 just another terminology for talking about the same thing 483 00:27:45,820 --> 00:27:47,870 that we've already built up some intuition about. 
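The arithmetic for the upper half fraction can be reproduced with a short Python sketch. It assumes that the "upper half" on the slide is the four runs with C at its low (-1) level, which matches the "four low settings and zero high settings" remark; the variable names are illustrative only.

```python
from itertools import product

# Full 2^3 table, then keep only the runs where C is at its low level.
# Assumption: this is the "upper half" fraction shown on the slide.
full = list(product([-1, 1], repeat=3))
upper = [(a, b, c) for (a, b, c) in full if c == -1]

A = [a for a, b, c in upper]
B = [b for a, b, c in upper]
C = [c for a, b, c in upper]
AC = [a * c for a, b, c in upper]  # interaction column is the element-wise product

def dot(col1, col2):
    """Sum of element-wise products of two columns."""
    return sum(x * y for x, y in zip(col1, col2))

a_dot_b = dot(A, B)    # 0: A and B are still orthogonal to each other
c_sum = sum(C)         # -4: C is never at +1, so it is not balanced
a_dot_ac = dot(A, AC)  # -4: each product is -1, so A and AC are confounded

print(a_dot_b, c_sum, a_dot_ac)
```

Because C is fixed at -1, the AC column is just the negative of the A column, which is why every product is -1.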
484 00:27:50,560 --> 00:27:55,910 So I think we've already just went through that. 485 00:27:55,910 --> 00:27:58,540 So if I were, in this experiment, 486 00:27:58,540 --> 00:28:02,890 doing a half fraction and picking the upper column, 487 00:28:02,890 --> 00:28:07,120 you can see things like A and B are balanced, C is not. 488 00:28:07,120 --> 00:28:11,740 So I can't try to put that into my model. 489 00:28:11,740 --> 00:28:17,650 A, B, and C are orthogonal. 490 00:28:17,650 --> 00:28:20,680 Let's see, is that right? 491 00:28:20,680 --> 00:28:22,420 Yes-- so if I were-- 492 00:28:22,420 --> 00:28:24,720 except I'm already knowing I better not try model 493 00:28:24,720 --> 00:28:27,580 C because I don't have sufficient excitation in there. 494 00:28:27,580 --> 00:28:30,430 But I could also then ask, OK, what else 495 00:28:30,430 --> 00:28:32,780 is confounded with each other? 496 00:28:32,780 --> 00:28:35,410 And if there are places where I don't have orthogonality, 497 00:28:35,410 --> 00:28:40,660 I might then be led to is there a different half fraction 498 00:28:40,660 --> 00:28:42,730 that I might want to pick. 499 00:28:42,730 --> 00:28:45,580 One might be this lower one. 500 00:28:45,580 --> 00:28:47,740 I've already looked at the upper half. 501 00:28:47,740 --> 00:28:51,220 But couldn't there be other meaningful combinations? 502 00:28:51,220 --> 00:28:53,110 And the answer is yes. 503 00:28:53,110 --> 00:28:55,810 So for example, here's a better subset 504 00:28:55,810 --> 00:29:01,990 if I were looking for a particular property that was 505 00:29:01,990 --> 00:29:04,330 annoying with the previous one. 506 00:29:04,330 --> 00:29:08,500 In particular, this fact that I can't even model. 507 00:29:08,500 --> 00:29:10,975 I'm not even exciting a main effect. 508 00:29:13,900 --> 00:29:16,540 That feels weird. 509 00:29:16,540 --> 00:29:19,360 So a better half fraction, then, at least 510 00:29:19,360 --> 00:29:22,360 lets me look at main effects. 
511 00:29:22,360 --> 00:29:25,810 Might be this shaded blue half fraction. 512 00:29:25,810 --> 00:29:27,670 Now look what's going on. 513 00:29:27,670 --> 00:29:32,770 All three columns A, B, and C are balanced-- 514 00:29:32,770 --> 00:29:36,370 just the shaded blue parts. 515 00:29:36,370 --> 00:29:39,860 So I've got equal numbers of high and low, 516 00:29:39,860 --> 00:29:42,950 and A, B, and C are mutually orthogonal. 517 00:29:48,740 --> 00:29:52,990 I still have confounding going on. 518 00:29:52,990 --> 00:29:56,400 I'm not sure quite exactly what this means. 519 00:29:56,400 --> 00:30:01,670 I guess A-- let's see. 520 00:30:01,670 --> 00:30:05,280 I still have confounding because, for example-- 521 00:30:05,280 --> 00:30:09,020 So A is not orthogonal with what column here? 522 00:30:09,020 --> 00:30:13,966 BC in this one. 523 00:30:13,966 --> 00:30:16,720 Because 1, 1 minus-- 524 00:30:16,720 --> 00:30:18,820 so these are the same. 525 00:30:18,820 --> 00:30:22,540 So these two columns are confounded with each other. 526 00:30:22,540 --> 00:30:26,280 So those are not mutually orthogonal. 527 00:30:26,280 --> 00:30:29,160 And I could ask, OK, what is a shorthand 528 00:30:29,160 --> 00:30:30,457 way of describing this? 529 00:30:30,457 --> 00:30:32,040 I could pick a couple of these columns 530 00:30:32,040 --> 00:30:37,260 and say A equals BC and do, again, my funky algebra. 531 00:30:37,260 --> 00:30:42,390 Multiply both sides by A and get I equals ABC. 532 00:30:42,390 --> 00:30:44,580 And there's my defining relationship 533 00:30:44,580 --> 00:30:46,510 for what's confounded with what. 534 00:30:46,510 --> 00:30:52,420 Could also look up here, again, and ABC is exactly-- 535 00:30:52,420 --> 00:30:55,120 for the blue-- for that subset-- 536 00:30:55,120 --> 00:30:57,070 for the blue elements is confounded 537 00:30:57,070 --> 00:30:58,350 with the identity column.
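A small Python sketch can confirm the properties claimed for this half fraction. It assumes the shaded blue rows are the four runs where the product A times B times C equals +1, which is what the defining relationship I equals ABC implies; the names below are illustrative, not taken from the slide.

```python
from itertools import product

# Half fraction defined by I = ABC: keep runs where a*b*c == +1.
# Assumption: these are the "shaded blue" rows on the slide.
full = list(product([-1, 1], repeat=3))
blue = [(a, b, c) for (a, b, c) in full if a * b * c == 1]

A = [a for a, b, c in blue]
B = [b for a, b, c in blue]
C = [c for a, b, c in blue]
BC = [b * c for a, b, c in blue]
ABC = [a * b * c for a, b, c in blue]

def dot(col1, col2):
    """Sum of element-wise products of two columns."""
    return sum(x * y for x, y in zip(col1, col2))

balanced = [sum(col) == 0 for col in (A, B, C)]    # all main effects excited equally
main_dots = [dot(A, B), dot(A, C), dot(B, C)]      # main effects mutually orthogonal
a_equals_bc = (A == BC)                            # A is aliased with BC
abc_is_identity = all(v == 1 for v in ABC)         # ABC is never varied at all

print(balanced, main_dots, a_equals_bc, abc_is_identity)
```

The same check with B against AC, or C against AB, gives the other two alias pairs of this fraction.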
538 00:31:01,750 --> 00:31:04,750 What might be better about this design or this subset 539 00:31:04,750 --> 00:31:06,670 than the previous one? 540 00:31:06,670 --> 00:31:09,100 Depending on what you're looking for in the-- 541 00:31:09,100 --> 00:31:12,730 what's better about this? 542 00:31:12,730 --> 00:31:14,590 Here we have, I guess, shorthand for us-- 543 00:31:14,590 --> 00:31:16,510 what the other aliases are. 544 00:31:16,510 --> 00:31:19,000 Again, we expect to have four aliases because I'm only 545 00:31:19,000 --> 00:31:22,000 doing four experiments. 546 00:31:22,000 --> 00:31:23,500 What do you see about these aliases? 547 00:31:28,630 --> 00:31:30,950 AUDIENCE: [INAUDIBLE] high and low values 548 00:31:30,950 --> 00:31:35,880 for all the main effects and one only third order, 549 00:31:35,880 --> 00:31:39,100 like ABC, is the one that you have not excited at all, 550 00:31:39,100 --> 00:31:41,520 it's the one that [INAUDIBLE] 551 00:31:41,520 --> 00:31:44,590 DUANE BONING: Good-- so that's at least two 552 00:31:44,590 --> 00:31:48,460 of the three properties that I really like about this design. 553 00:31:48,460 --> 00:31:52,930 If everybody didn't hear it, he said, at least 554 00:31:52,930 --> 00:31:58,150 you're exciting A, B, and C. Second, the only column 555 00:31:58,150 --> 00:32:00,700 that you're not exciting at all is a third order 556 00:32:00,700 --> 00:32:03,520 interaction, which is fine. 557 00:32:03,520 --> 00:32:05,440 How likely is a third order interaction? 558 00:32:05,440 --> 00:32:08,000 We'll chat about that a little bit more. 559 00:32:08,000 --> 00:32:09,715 There's another additional effect. 560 00:32:09,715 --> 00:32:14,300 There's another additional really nice characteristic 561 00:32:14,300 --> 00:32:17,994 about looking at these aliasing patterns here. 562 00:32:17,994 --> 00:32:20,050 AUDIENCE: Like, A is [INAUDIBLE] BC 563 00:32:20,050 --> 00:32:23,480 so they're not measuring exactly the same amount. 
564 00:32:23,480 --> 00:32:25,130 DUANE BONING: Right, they don't-- 565 00:32:25,130 --> 00:32:26,570 I don't know quite what-- 566 00:32:26,570 --> 00:32:29,480 don't overlap is a nice way of describing it. 567 00:32:29,480 --> 00:32:33,290 I've got a main effect that is confounded 568 00:32:33,290 --> 00:32:43,280 with an interaction of somebody else's second order term. 569 00:32:43,280 --> 00:32:45,750 So it kind of gets around that discomfort with A 570 00:32:45,750 --> 00:32:50,430 being confounded with AC that you were describing. 571 00:32:50,430 --> 00:32:54,080 Now, I still have to worry about it because if any second order 572 00:32:54,080 --> 00:32:59,660 effects are out there, if AC is still active, now 573 00:32:59,660 --> 00:33:02,540 I'm not confusing it with the A main effect. 574 00:33:02,540 --> 00:33:05,900 I am confusing it, though, with the B main effect. 575 00:33:05,900 --> 00:33:12,690 But the nice thing is now I can-- you'll see it a little bit 576 00:33:12,690 --> 00:33:13,190 later-- 577 00:33:13,190 --> 00:33:18,620 I can also appeal to some physical causality 578 00:33:18,620 --> 00:33:22,910 to also talk about the likelihood of it being 579 00:33:22,910 --> 00:33:26,180 a main effect or an interaction effect, when I finally 580 00:33:26,180 --> 00:33:29,240 analyze my overall experiment. 581 00:33:29,240 --> 00:33:30,950 What am I saying? 582 00:33:30,950 --> 00:33:33,050 Here's the peek forward at that. 583 00:33:33,050 --> 00:33:37,330 If I were to find that this-- 584 00:33:37,330 --> 00:33:42,530 either the B main effect or the AC interaction is at work, 585 00:33:42,530 --> 00:33:47,270 but I also did an ANOVA and I found out already 586 00:33:47,270 --> 00:33:50,850 that A was not significant, in other words, 587 00:33:50,850 --> 00:33:54,500 there's not an A main effect, it's highly unlikely 588 00:33:54,500 --> 00:33:57,830 that there's an interaction with A.
589 00:33:57,830 --> 00:33:59,870 If it doesn't have an overall effect, 590 00:33:59,870 --> 00:34:02,570 why would it have a subtle interaction? 591 00:34:02,570 --> 00:34:05,030 It's kind of unlikely. 592 00:34:05,030 --> 00:34:07,910 So by ordering this, you actually 593 00:34:07,910 --> 00:34:11,150 can start to decode what's going on in the process 594 00:34:11,150 --> 00:34:13,070 with a little bit better visibility. 595 00:34:15,900 --> 00:34:20,489 So what we're actually talking about here, 596 00:34:20,489 --> 00:34:22,560 there is some additional terminology 597 00:34:22,560 --> 00:34:25,350 for, which is design resolution. 598 00:34:25,350 --> 00:34:30,330 It's basically a characteristic of the aliasing patterns 599 00:34:30,330 --> 00:34:36,060 and how decoupled you are able to get between main effects 600 00:34:36,060 --> 00:34:39,030 and interaction effects or second order interactions 601 00:34:39,030 --> 00:34:41,739 with other interactions, and so on. 602 00:34:41,739 --> 00:34:46,530 And you will hear, sometimes, descriptions of particular half 603 00:34:46,530 --> 00:34:51,150 fractions or other experimental designs, DOEs, 604 00:34:51,150 --> 00:34:55,679 as being resolution three, or resolution four, resolution 605 00:34:55,679 --> 00:34:56,790 five. 606 00:34:56,790 --> 00:35:01,440 And a resolution three is a weaker ability 607 00:35:01,440 --> 00:35:06,600 to discern than a higher resolution in your experiment. 608 00:35:06,600 --> 00:35:12,960 But a resolution three is nice in that no main effect-- 609 00:35:12,960 --> 00:35:16,450 an A factor-- is aliased with another main effect. 610 00:35:16,450 --> 00:35:23,660 So A is not aliased with B. But you will have main effects 611 00:35:23,660 --> 00:35:26,480 aliased with interactions. 612 00:35:26,480 --> 00:35:28,070 Let's say you didn't want that. 613 00:35:28,070 --> 00:35:30,830 You might need to build a more powerful-- 614 00:35:30,830 --> 00:35:35,270 do more experimental combinations.
615 00:35:35,270 --> 00:35:38,810 And you could go to a resolution four, which is essentially 616 00:35:38,810 --> 00:35:42,530 designed as an experiment where no main effect is 617 00:35:42,530 --> 00:35:44,210 an alias with another main effect. 618 00:35:44,210 --> 00:35:47,630 And I'm not even aliasing any main effect 619 00:35:47,630 --> 00:35:49,350 with its second order interaction. 620 00:35:49,350 --> 00:35:52,550 I might be so worried that there are second order interactions 621 00:35:52,550 --> 00:35:55,790 that I have to make sure that I do 622 00:35:55,790 --> 00:35:59,150 enough experimental points to be able to detect them separately. 623 00:35:59,150 --> 00:36:04,520 Do an ANOVA on significance of each of them separately. 624 00:36:04,520 --> 00:36:08,540 The example that we did just a second ago, what resolution 625 00:36:08,540 --> 00:36:11,170 do you think that is? 626 00:36:11,170 --> 00:36:12,470 It's only resolution three. 627 00:36:12,470 --> 00:36:14,080 We were clearly seeing interaction 628 00:36:14,080 --> 00:36:16,030 with the second order effect. 629 00:36:16,030 --> 00:36:19,390 If I wanted to get to resolution four, what would I need to do? 630 00:36:22,150 --> 00:36:24,160 It's the full factorial, in this case. 631 00:36:24,160 --> 00:36:26,410 I'd have to go to the full 2 to the 3. 632 00:36:26,410 --> 00:36:30,220 I couldn't pick a half fraction. 633 00:36:30,220 --> 00:36:33,420 So if there were four factors, it 634 00:36:33,420 --> 00:36:39,070 turns out you get 16 columns. 635 00:36:39,070 --> 00:36:42,040 And now I can pick 16 rows. 636 00:36:42,040 --> 00:36:45,390 I can pick up 16 columns and rows. 637 00:36:45,390 --> 00:36:48,420 But I can pick, now, some subset if I did a half fraction 638 00:36:48,420 --> 00:36:54,150 and still achieve a resolution four if I wanted to. 639 00:36:54,150 --> 00:36:56,860 So this is just looking back, again, at the column. 
640 00:36:56,860 --> 00:37:01,170 Again, the main effects are aliased with interactions only 641 00:37:01,170 --> 00:37:04,260 in this defined experiment. 642 00:37:04,260 --> 00:37:07,650 And that, you will sometimes see referred to as a 2 643 00:37:07,650 --> 00:37:12,490 to the 3 minus 1 sub resolution three experimental design. 644 00:37:12,490 --> 00:37:14,890 So now, if you see the shorthand notation there, 645 00:37:14,890 --> 00:37:17,820 you look and go, oh, that's a half fraction because it's a 2 646 00:37:17,820 --> 00:37:19,320 to the 3 minus 1. 647 00:37:19,320 --> 00:37:21,270 And it's resolution three, telling you 648 00:37:21,270 --> 00:37:23,571 something about the aliasing. 649 00:37:23,571 --> 00:37:24,988 AUDIENCE: In the previous chapter, 650 00:37:24,988 --> 00:37:26,944 is the difference between resolution 4 651 00:37:26,944 --> 00:37:32,582 and 5 both have no [INAUDIBLE] 652 00:37:32,582 --> 00:37:34,290 DUANE BONING: So there was a subtle point 653 00:37:34,290 --> 00:37:39,480 in here, which is that in resolution four no main effect is aliased with a second order 654 00:37:39,480 --> 00:37:40,500 interaction. 655 00:37:40,500 --> 00:37:42,960 But I might have a second order interaction 656 00:37:42,960 --> 00:37:47,040 aliased with another second order interaction. 657 00:37:47,040 --> 00:37:50,460 The resolution five says, no second order interactions 658 00:37:50,460 --> 00:37:53,525 are aliased with any other second order interactions. 659 00:37:58,860 --> 00:38:02,310 So here we've been talking about half fractions-- 660 00:38:02,310 --> 00:38:07,470 2 to the 3 minus 1, especially as the number of factors 661 00:38:07,470 --> 00:38:08,370 gets large. 662 00:38:08,370 --> 00:38:10,650 Let's say I have a design-- 663 00:38:10,650 --> 00:38:13,500 I'm designing a process and it's got eight different control 664 00:38:13,500 --> 00:38:14,010 knobs. 665 00:38:14,010 --> 00:38:15,960 That's a 2 to the 8th experiment.
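One common way to read off the resolution, which the lecture does not state explicitly, is to take the length of the shortest word in the defining relationship. The Python sketch below uses that convention; the function names `resolution` and `label`, and the four- and five-factor words, are assumptions made here for illustration.

```python
# Roman numerals conventionally used when writing design resolution.
ROMAN = {2: "II", 3: "III", 4: "IV", 5: "V"}

def resolution(defining_words):
    """Resolution taken as the length of the shortest word in the defining relation.

    `defining_words` must list every word equal to I (for a half fraction
    there is only one such word).
    """
    return min(len(word) for word in defining_words)

def label(k, p, defining_words):
    """Shorthand such as '2^(3-1) resolution III' for a 2^(k-p) design."""
    r = resolution(defining_words)
    return f"2^({k}-{p}) resolution {ROMAN.get(r, str(r))}"

res_abc = resolution(["ABC"])      # 3: main effects aliased with two-way interactions
res_abcd = resolution(["ABCD"])    # 4: two-way interactions aliased with each other
res_abcde = resolution(["ABCDE"])  # 5: two-way interactions aliased with three-way ones
name = label(3, 1, ["ABC"])

print(res_abc, res_abcd, res_abcde, name)
```

A word of length three, such as ABC, pairs each single letter with a two-letter word, which is why a resolution three design aliases main effects with second order interactions.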
666 00:38:15,960 --> 00:38:18,630 That's an awful lot of different combinations. 667 00:38:18,630 --> 00:38:22,770 I might not want to just go down a 2 to the 8 minus 1. 668 00:38:22,770 --> 00:38:28,290 I might want to do even lower half fractions. 669 00:38:28,290 --> 00:38:34,000 For example, if I cut it in half, 670 00:38:34,000 --> 00:38:36,390 and then cut it in half again, that's a quarter fraction. 671 00:38:36,390 --> 00:38:39,250 I'm only picking a quarter of the full factorial. 672 00:38:39,250 --> 00:38:41,850 And again, there would be a resolution 673 00:38:41,850 --> 00:38:43,200 associated with that. 674 00:38:43,200 --> 00:38:47,450 You could look and see what is the aliasing pattern with that. 675 00:38:47,450 --> 00:38:50,820 And as you get to the higher order models, 676 00:38:50,820 --> 00:38:55,200 you will often see increased half fractions 677 00:38:55,200 --> 00:38:58,290 in early parts of experimental design 678 00:38:58,290 --> 00:39:02,520 where you're just trying to see are the main factors important. 679 00:39:02,520 --> 00:39:05,480 And might there be some second order effects, 680 00:39:05,480 --> 00:39:08,400 some second order interactions, at least detecting 681 00:39:08,400 --> 00:39:09,690 whether they exist. 682 00:39:09,690 --> 00:39:11,190 And then if they do, you'll often 683 00:39:11,190 --> 00:39:13,740 then go back in and start filling in the other parts 684 00:39:13,740 --> 00:39:15,045 of the experiment. 685 00:39:18,510 --> 00:39:20,010 Here's another half fraction. 686 00:39:20,010 --> 00:39:26,690 Trying to think-- this is, again, for our 2 687 00:39:26,690 --> 00:39:28,830 to the 3 minus 1. 688 00:39:28,830 --> 00:39:30,260 This is a different set of four. 689 00:39:30,260 --> 00:39:33,290 It's just got a different defining relationship. 690 00:39:33,290 --> 00:39:37,820 And you could, again, ask, in this case, is that-- number 691 00:39:37,820 --> 00:39:39,950 one, is that a legitimate half fraction? 
692 00:39:39,950 --> 00:39:43,910 Could I legitimately pull these four columns out-- 693 00:39:43,910 --> 00:39:44,750 four rows out? 694 00:39:47,390 --> 00:39:49,580 What kind of balance do I have? 695 00:39:49,580 --> 00:39:51,110 What kind of orthogonality? 696 00:39:51,110 --> 00:39:53,300 Or what's aliased with what? 697 00:39:53,300 --> 00:39:56,000 And you can either do it by looking at-- 698 00:39:56,000 --> 00:39:57,620 you can detect or decide what are 699 00:39:57,620 --> 00:40:01,430 the aliases by looking at the columns 700 00:40:01,430 --> 00:40:04,430 or simply doing your little math down here 701 00:40:04,430 --> 00:40:06,840 from the defining relationship. 702 00:40:06,840 --> 00:40:11,240 So if I start with I equals AC and say, OK, 703 00:40:11,240 --> 00:40:15,740 what's aliased with C? 704 00:40:15,740 --> 00:40:22,685 So I do C times I equals CA C equals A because the C times 705 00:40:22,685 --> 00:40:25,300 C is the identity, and so on. 706 00:40:25,300 --> 00:40:27,410 And similarly, I could say, OK, if I 707 00:40:27,410 --> 00:40:30,950 start with I equals AC multiplied by B, 708 00:40:30,950 --> 00:40:35,970 I get B equals ABC. 709 00:40:35,970 --> 00:40:37,030 There's that. 710 00:40:37,030 --> 00:40:40,920 So there's a main effect aliased with the third order. 711 00:40:40,920 --> 00:40:42,180 That's not too bad. 712 00:40:42,180 --> 00:40:46,950 I don't really like that A aliasing with C. That's 713 00:40:46,950 --> 00:40:51,150 not even a resolution three experimental pattern. 714 00:40:51,150 --> 00:40:55,200 So I'd be very careful in using this 715 00:40:55,200 --> 00:40:57,930 unless I really believed, strangely, 716 00:40:57,930 --> 00:41:00,660 that C was not going to have a main effect 717 00:41:00,660 --> 00:41:04,980 but I might be worried about it or interested in it 718 00:41:04,980 --> 00:41:10,470 for the purposes of looking for interactions. 
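The column algebra used here, where any letter multiplied by itself gives the identity, can be sketched in Python as a symmetric difference of letter sets. The helper names `multiply` and `alias_of` are hypothetical; only the rule itself comes from the lecture.

```python
def multiply(word1, word2):
    """Multiply two effect words; repeated letters cancel because X times X is I."""
    letters = set(word1) ^ set(word2)  # symmetric difference drops shared letters
    return "".join(sorted(letters)) or "I"

def alias_of(effect, defining_word):
    """Alias of `effect` in a half fraction whose defining relation is I = defining_word."""
    return multiply(effect, defining_word)

# Half fraction with I = AC, as in the example just discussed.
alias_c = alias_of("C", "AC")  # 'A': two main effects aliased with each other
alias_b = alias_of("B", "AC")  # 'ABC': main effect aliased with the third order term

# Half fraction with I = ABC, the earlier resolution three example.
alias_a = alias_of("A", "ABC")  # 'BC'

print(alias_c, alias_b, alias_a)
```

Multiplying a word by itself returns "I", which is a quick sanity check that the cancellation rule is implemented correctly.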
719 00:41:10,470 --> 00:41:13,650 There is one other subtle place where you might actually-- 720 00:41:17,090 --> 00:41:18,270 nope, never mind. 721 00:41:21,050 --> 00:41:23,840 I've already alluded to a couple of ways of deciding what 722 00:41:23,840 --> 00:41:25,400 aliasing pattern to choose. 723 00:41:25,400 --> 00:41:28,340 The most important one is your knowledge of the process-- 724 00:41:28,340 --> 00:41:29,720 your experience with the process. 725 00:41:29,720 --> 00:41:31,680 What factors are likely to actually, 726 00:41:31,680 --> 00:41:34,760 based on physical causality, interact. 727 00:41:34,760 --> 00:41:37,850 But there are a few very important rules of thumb 728 00:41:37,850 --> 00:41:39,470 that are worth mentioning here. 729 00:41:39,470 --> 00:41:42,810 They are this idea of sparsity of effects, hierarchy of effects, 730 00:41:42,810 --> 00:41:43,700 and inheritance. 731 00:41:43,700 --> 00:41:46,040 And we've already talked qualitatively about them. 732 00:41:46,040 --> 00:41:47,630 But just to nail them down-- 733 00:41:50,870 --> 00:41:57,230 if I have eight factors, the experimenter, you, likely 734 00:41:57,230 --> 00:42:03,370 has a certain amount of a priori knowledge 735 00:42:03,370 --> 00:42:09,530 that there's an ordering in the likelihood of those effects. 736 00:42:09,530 --> 00:42:12,820 But it's also the case that, if I have eight different factors 737 00:42:12,820 --> 00:42:14,980 in my experiment, it's highly likely 738 00:42:14,980 --> 00:42:17,770 that they don't all equally influence the process. 739 00:42:20,590 --> 00:42:24,880 Most processes have a top few factors 740 00:42:24,880 --> 00:42:27,760 that have most of the influence or most of the effect. 741 00:42:27,760 --> 00:42:33,020 It's like a Pareto type of a rule.
742 00:42:33,020 --> 00:42:36,540 So early on in screening, in fact, 743 00:42:36,540 --> 00:42:39,170 you might be very happy doing one of these half fraction, 744 00:42:39,170 --> 00:42:41,090 quarter fraction, and so on. 745 00:42:41,090 --> 00:42:43,160 The purpose of which is really just 746 00:42:43,160 --> 00:42:47,420 to narrow down a really large number of candidate effects 747 00:42:47,420 --> 00:42:49,880 down to a smaller number that you're then 748 00:42:49,880 --> 00:42:55,280 going to model more accurately and look for interactions 749 00:42:55,280 --> 00:42:58,130 among those large effects. 750 00:42:58,130 --> 00:43:01,640 So this sparsity of effects is one 751 00:43:01,640 --> 00:43:05,330 of the things that's at work in early screening experiments 752 00:43:05,330 --> 00:43:08,225 and is a good rule of thumb a rule 753 00:43:08,225 --> 00:43:13,270 or a generic effect to be able to take advantage of. 754 00:43:13,270 --> 00:43:17,950 The second one is this notion of hierarchy, which is basically-- 755 00:43:17,950 --> 00:43:20,770 again, these main effects are usually 756 00:43:20,770 --> 00:43:23,780 more dominant than second order effects, 757 00:43:23,780 --> 00:43:29,530 which are more dominant than third order interactions. 758 00:43:29,530 --> 00:43:34,300 Furthermore, usually you have to-- 759 00:43:34,300 --> 00:43:37,330 not have to-- but usually, you will 760 00:43:37,330 --> 00:43:44,170 see that the main effect or the lower order interaction 761 00:43:44,170 --> 00:43:48,340 has to be at work before you have substantial interactions 762 00:43:48,340 --> 00:43:49,940 with some other factor. 763 00:43:49,940 --> 00:43:53,440 So it's going to be rare for you to have a big AB 764 00:43:53,440 --> 00:43:59,560 effect and no main effect with A and no main effect with B. 765 00:43:59,560 --> 00:44:03,970 So that's another rule that you can often use. 
766 00:44:03,970 --> 00:44:08,800 And the likelihood of having these very high order 767 00:44:08,800 --> 00:44:10,540 interactions-- 768 00:44:10,540 --> 00:44:13,990 the idea that you will have an extra delta in an eight factor 769 00:44:13,990 --> 00:44:18,460 experiment attributable to exactly the combination of all 770 00:44:18,460 --> 00:44:21,220 of those settings being important 771 00:44:21,220 --> 00:44:23,260 or large enough to be important is pretty small. 772 00:44:26,460 --> 00:44:29,860 And then, I guess, I've actually mixed in the inheritance here 773 00:44:29,860 --> 00:44:30,360 already. 774 00:44:30,360 --> 00:44:34,130 This idea that I really need both of these factors. 775 00:44:34,130 --> 00:44:35,730 So the hierarchy is really just more 776 00:44:35,730 --> 00:44:40,290 saying the size of the effect, schematically pictured 777 00:44:40,290 --> 00:44:41,970 by the size of these boxes. 778 00:44:41,970 --> 00:44:44,290 The main effect tends to be larger. 779 00:44:44,290 --> 00:44:48,870 And then the inheritance is I kind of need the lower order 780 00:44:48,870 --> 00:44:50,100 interactions be at work. 781 00:44:53,660 --> 00:44:58,220 So here's just some additional examples 782 00:44:58,220 --> 00:45:02,070 in the context of half fractions. 783 00:45:02,070 --> 00:45:05,540 We've already almost exhaustively explored the 2 784 00:45:05,540 --> 00:45:11,960 to the 3 minus 1 resolution three kind of picture. 785 00:45:11,960 --> 00:45:14,180 Here's an example defining relationship 786 00:45:14,180 --> 00:45:16,310 for resolution four. 
787 00:45:16,310 --> 00:45:19,310 And here, you really have to get up to four factors 788 00:45:19,310 --> 00:45:22,310 before having enough leeway to be 789 00:45:22,310 --> 00:45:25,940 able to have that kind of an aliasing pattern 790 00:45:25,940 --> 00:45:28,790 where I would only have-- 791 00:45:28,790 --> 00:45:31,160 so, for example, now if I did the algebra, 792 00:45:31,160 --> 00:45:34,790 A would be confounded with BCD. 793 00:45:34,790 --> 00:45:43,130 Or if I multiplied by AB on both sides, AB is equal to CD. 794 00:45:43,130 --> 00:45:44,930 So that was the earlier question. 795 00:45:44,930 --> 00:45:49,310 There you can see, a two-factor effect or two way interaction 796 00:45:49,310 --> 00:45:52,220 is confounded with another two way interaction. 797 00:45:52,220 --> 00:45:55,160 But all main effects are only confounded 798 00:45:55,160 --> 00:45:59,643 with three way interactions and none with each other. 799 00:45:59,643 --> 00:46:01,560 And here would be an example you can play with 800 00:46:01,560 --> 00:46:06,570 to see what the aliasing pattern would be with a resolution five 801 00:46:06,570 --> 00:46:09,975 kind of experiment. 802 00:46:09,975 --> 00:46:15,080 AUDIENCE: [INAUDIBLE] the 4 minus 2 design-- 803 00:46:15,080 --> 00:46:18,683 is that-- would that have-- what's the identity? 804 00:46:18,683 --> 00:46:19,850 DUANE BONING: I think that-- 805 00:46:25,920 --> 00:46:28,770 the identity-- I don't remember offhand. 806 00:46:28,770 --> 00:46:33,090 I think we've got, actually, an example right here that 807 00:46:33,090 --> 00:46:36,946 shows what the aliasing pattern would be. 808 00:46:36,946 --> 00:46:40,430 You asked exactly the question. 809 00:46:40,430 --> 00:46:42,740 And so we can work it out. 810 00:46:42,740 --> 00:46:44,670 So here's an example.
811 00:46:44,670 --> 00:46:47,540 First off, we already know it's probably not 812 00:46:47,540 --> 00:46:52,680 going to be a defining relationship like this one 813 00:46:52,680 --> 00:46:58,220 because I've only got 2 to the 4 minus 2, which is 2 squared. 814 00:46:58,220 --> 00:47:02,870 I've only got four experiments out of the 16 that I'm picking. 815 00:47:02,870 --> 00:47:05,600 So I've got an awful lot of confounding going on. 816 00:47:05,600 --> 00:47:07,430 And the question would be, I can actually 817 00:47:07,430 --> 00:47:10,820 build it and look and see what the confounding pattern would 818 00:47:10,820 --> 00:47:13,490 be. 819 00:47:13,490 --> 00:47:17,660 And in particular, I might want to pick it in such a way 820 00:47:17,660 --> 00:47:21,650 that I can at least detect the four main effects 821 00:47:21,650 --> 00:47:25,230 and lots of other things could be aliased in with that. 822 00:47:25,230 --> 00:47:29,600 So I could build it a priori and say, OK, I'm 823 00:47:29,600 --> 00:47:36,800 willing for A to be aliased with that and with that. 824 00:47:36,800 --> 00:47:40,700 So in this case, when you've got a double half fraction 825 00:47:40,700 --> 00:47:47,760 I'm going to have aliasing between more than two columns. 826 00:47:47,760 --> 00:47:52,620 Now I actually have multi-way aliasing between three columns. 827 00:47:52,620 --> 00:47:56,555 So in essence, what I've got is something like A equals BCD 828 00:47:56,555 --> 00:47:59,690 equals ABC. 829 00:47:59,690 --> 00:48:06,020 And then you can work that out as I is equal to ABCD. 830 00:48:06,020 --> 00:48:11,120 In this case, it is also equal to BC. 831 00:48:11,120 --> 00:48:15,860 So it still has this defining relationship, 832 00:48:15,860 --> 00:48:22,460 but I'm now only picking four of the rows. 833 00:48:22,460 --> 00:48:25,850 So I don't know if you guys can see that. 834 00:48:25,850 --> 00:48:27,440 It's kind of tiny.
835 00:48:27,440 --> 00:48:30,410 But this is all 16 of these columns. 836 00:48:30,410 --> 00:48:32,750 And, again, this is the same combination 837 00:48:32,750 --> 00:48:35,300 that I just went through before. 838 00:48:35,300 --> 00:48:38,040 If you look, the A column here-- 839 00:48:38,040 --> 00:48:41,210 these four-- minus 1, minus 1, 1, 840 00:48:41,210 --> 00:48:46,150 1 is now aliased with both the BCD-- 841 00:48:46,150 --> 00:48:54,590 where'd that go-- the BCD minus 1, minus 1, 1, 1, and the ABC 842 00:48:54,590 --> 00:49:01,400 column, minus 1, minus 1, 1, 1. 843 00:49:01,400 --> 00:49:05,900 So if I do that, I have folded two other columns onto it 844 00:49:05,900 --> 00:49:08,435 in order to get down to the full half fraction. 845 00:49:12,050 --> 00:49:16,280 And you can also explode out in the other direction 846 00:49:16,280 --> 00:49:19,850 and see what are all of the interactions and aliases that 847 00:49:19,850 --> 00:49:23,570 go on, again, either looking at the columns-- 848 00:49:23,570 --> 00:49:25,220 and here, I've shaded the columns 849 00:49:25,220 --> 00:49:27,286 that are alias with each other. 850 00:49:27,286 --> 00:49:29,300 So in each case, we've got all four. 851 00:49:29,300 --> 00:49:32,000 Or you can do the column math. 852 00:49:32,000 --> 00:49:35,450 And this has another nasty effect, 853 00:49:35,450 --> 00:49:42,950 which is those four columns are not even resolution three. 854 00:49:45,885 --> 00:49:47,510 So going back to your earlier question, 855 00:49:47,510 --> 00:49:51,590 I don't know what the defining relationship is for a 2 856 00:49:51,590 --> 00:49:55,080 to the 4 minus 2 resolution three. 857 00:49:55,080 --> 00:49:58,900 I'm not sure what the defining relationship is. 858 00:49:58,900 --> 00:50:01,130 Sounds like a good problem set problem. 859 00:50:01,130 --> 00:50:01,880 Remember that one. 860 00:50:06,410 --> 00:50:09,740 So I think we've explored aliasing. 
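The quarter fraction example can be checked with a Python sketch. It assumes the defining relationship is I equals ABCD equals BC, reconstructed from the statement that the A column matches both the BCD and the ABC columns; the third word AD is their product. The helper names are hypothetical.

```python
from itertools import product

def multiply(word1, word2):
    """Multiply two effect words; shared letters cancel because X times X is I."""
    letters = set(word1) ^ set(word2)
    return "".join(sorted(letters)) or "I"

# Assumed generators for the 2^(4-2) example: I = ABCD and I = BC.
generators = ["ABCD", "BC"]
# The full defining relation also contains the product of the generators (AD).
defining = set(generators) | {multiply(generators[0], generators[1])}

def alias_set(effect):
    """All effects that share a column with `effect` in this quarter fraction."""
    return sorted(multiply(effect, word) for word in defining)

# Build the four runs: keep rows where both generator columns equal +1.
rows = [r for r in product([-1, 1], repeat=4)
        if r[0] * r[1] * r[2] * r[3] == 1 and r[1] * r[2] == 1]

INDEX = {"A": 0, "B": 1, "C": 2, "D": 3}

def column(word):
    """Column of signs for an effect word, evaluated on the four selected runs."""
    out = []
    for r in rows:
        value = 1
        for letter in word:
            value *= r[INDEX[letter]]
        out.append(value)
    return out

a_aliases = alias_set("A")  # ['ABC', 'BCD', 'D']
print(a_aliases, column("A"), column("BCD"), column("ABC"), column("D"))
```

Under this assumed relationship the A column is identical to the D column, so two main effects share a column, which agrees with the remark that these columns are not even resolution three.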
861 00:50:09,740 --> 00:50:11,180 Do you understand aliasing? 862 00:50:11,180 --> 00:50:13,460 Any questions on the aliasing? 863 00:50:13,460 --> 00:50:15,590 Confounding? 864 00:50:15,590 --> 00:50:17,240 There is one more aspect to it that I 865 00:50:17,240 --> 00:50:18,980 want to explore a little bit, which 866 00:50:18,980 --> 00:50:22,490 is, what are the implications for model construction? 867 00:50:22,490 --> 00:50:24,440 We've already alluded to this. 868 00:50:24,440 --> 00:50:27,440 So let's just sort of work through it. 869 00:50:27,440 --> 00:50:33,230 But also, folding in and remembering interaction terms, 870 00:50:33,230 --> 00:50:35,630 but also, potential higher order terms 871 00:50:35,630 --> 00:50:41,090 and some implications that arise if-- some of my factors, 872 00:50:41,090 --> 00:50:44,900 I think, might have quadratic elements to it. 873 00:50:44,900 --> 00:50:49,190 You get, actually, then complicated aliasing patterns 874 00:50:49,190 --> 00:50:50,550 in those cases. 875 00:50:50,550 --> 00:50:56,630 So a simple case, when I just got one input and I've got one 876 00:50:56,630 --> 00:50:59,740 output, but I think there might be a quadratic effect, 877 00:50:59,740 --> 00:51:03,830 what we're seeing here is that I cannot do just a two level full 878 00:51:03,830 --> 00:51:06,020 factorial to exercise that. 879 00:51:06,020 --> 00:51:07,970 As we talked about last time, I have 880 00:51:07,970 --> 00:51:11,390 to add some kind of a center point or some other-- 881 00:51:11,390 --> 00:51:13,850 I really need all three data points in order 882 00:51:13,850 --> 00:51:17,600 to be able to fit a quadratic term. 883 00:51:20,360 --> 00:51:22,160 And we talked last time about being 884 00:51:22,160 --> 00:51:25,490 able to do ANOVA residual analysis 885 00:51:25,490 --> 00:51:29,450 to differentiate whether that deviation compared 886 00:51:29,450 --> 00:51:35,430 to spread within a replication error is significant. 
887 00:51:35,430 --> 00:51:40,640 So I could decide whether that coefficient is significant 888 00:51:40,640 --> 00:51:42,210 or not. 889 00:51:42,210 --> 00:51:47,330 If I were generalizing this now, to more than one factor-- 890 00:51:47,330 --> 00:51:53,750 last time we talked about a two factor example. 891 00:51:53,750 --> 00:51:56,760 And we had up to second order terms. 892 00:51:56,760 --> 00:52:00,380 But if we expand this out to a full quadratic 893 00:52:00,380 --> 00:52:03,530 with all of the interactions, what you see 894 00:52:03,530 --> 00:52:07,940 is a very rapid explosion in the number of coefficients. 895 00:52:07,940 --> 00:52:12,110 Because I've got my main effects-- 896 00:52:12,110 --> 00:52:15,650 so this is still just two factors, 897 00:52:15,650 --> 00:52:19,700 but now three level-- a full factorial in three levels. 898 00:52:19,700 --> 00:52:21,200 I've got my average. 899 00:52:21,200 --> 00:52:22,340 I've got my main effect. 900 00:52:22,340 --> 00:52:26,810 I've got my interaction between x1 and x2. 901 00:52:26,810 --> 00:52:32,790 But then I've got, also, x1 squared and my x2 squared. 902 00:52:32,790 --> 00:52:38,090 And then I've got the interaction between x1 squared 903 00:52:38,090 --> 00:52:42,180 and x2, x1 and x2 squared. 904 00:52:42,180 --> 00:52:47,675 And if both of the terms were squared, 905 00:52:47,675 --> 00:52:50,100 the question is, do I have-- 906 00:52:50,100 --> 00:52:53,460 this is what the full quadratic model with all 907 00:52:53,460 --> 00:52:55,260 of the interactions would be. 908 00:52:55,260 --> 00:52:59,400 Do I have enough data to be able to fit this if I 909 00:52:59,400 --> 00:53:02,930 did a three squared problem? 910 00:53:05,950 --> 00:53:06,490 Why not? 911 00:53:11,620 --> 00:53:13,000 How many coefficients do we have? 912 00:53:15,880 --> 00:53:17,463 AUDIENCE: You have one extra. 913 00:53:17,463 --> 00:53:18,880 DUANE BONING: Do I have one extra?
914 00:53:18,880 --> 00:53:20,160 One, two, three-- 915 00:53:20,160 --> 00:53:22,780 I have nine coefficients. 916 00:53:22,780 --> 00:53:25,540 How many experiments is three squared? 917 00:53:25,540 --> 00:53:28,450 Nine experiments-- so it's exactly 918 00:53:28,450 --> 00:53:30,880 like what we saw with the two level experiments. 919 00:53:30,880 --> 00:53:36,430 I have exactly, just barely, the number to perfectly fit. 920 00:53:36,430 --> 00:53:38,950 So actually, if I do the regression formulation, 921 00:53:38,950 --> 00:53:41,720 which I think is coming-- 922 00:53:41,720 --> 00:53:44,740 ew, nasty-- think these are supposed 923 00:53:44,740 --> 00:53:46,810 to be vertical dot, dot, dots. 924 00:53:46,810 --> 00:53:49,486 And these are supposed to be horizontal dot, dot, dots. 925 00:53:49,486 --> 00:53:51,580 Weird font problem here. 926 00:53:51,580 --> 00:53:56,470 If I were to actually formulate this for the three factor, 927 00:53:56,470 --> 00:54:00,100 I would have exactly the same matrix relationship. 928 00:54:00,100 --> 00:54:03,160 And I have exactly the same number of rows and columns. 929 00:54:03,160 --> 00:54:06,550 And I would be able to fit that-- 930 00:54:06,550 --> 00:54:08,110 I guess that doesn't come till later. 931 00:54:08,110 --> 00:54:10,360 I could solve that directly and get 932 00:54:10,360 --> 00:54:14,260 beta is equal to X inverse times 933 00:54:14,260 --> 00:54:15,940 my output. 934 00:54:15,940 --> 00:54:19,400 But I don't have any replicate data. 935 00:54:19,400 --> 00:54:21,190 If I had replicate data, then I would also 936 00:54:21,190 --> 00:54:26,230 need to do the pseudo inverse exactly as before. 937 00:54:26,230 --> 00:54:28,290 So the point here is if you 938 00:54:28,290 --> 00:54:34,020 want to build every possible model in the full quadratic, 939 00:54:34,020 --> 00:54:39,230 I have to have a full factorial in three levels, as well.
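[Editor's sketch, not part of the lecture] The nine-runs, nine-coefficients argument above can be illustrated by building the square model matrix for a 3 squared design and solving it directly. The coded levels -1, 0, 1, the made-up coefficient values, and the small exact solver are all assumptions for illustration; with replicate runs the matrix would no longer be square and a least-squares (pseudo-inverse) fit would be needed instead.

```python
from fractions import Fraction
from itertools import product

# Full 3^2 factorial in coded levels: 9 runs of (x1, x2).
runs = list(product([-1, 0, 1], repeat=2))

def model_row(x1, x2):
    """One row of the full quadratic model with all interactions (9 terms)."""
    return [1, x1, x2, x1 * x2, x1**2, x2**2, x1**2 * x2, x1 * x2**2, x1**2 * x2**2]

X = [model_row(x1, x2) for x1, x2 in runs]

def solve(A, b):
    """Gauss-Jordan elimination in exact fractions; raises if A is singular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular model matrix")
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Hypothetical "true" coefficients, used only to generate noise-free outputs.
true_beta = [5, 2, -1, 3, 4, 0, 1, -2, 6]
y = [sum(b * t for b, t in zip(true_beta, row)) for row in X]

beta = solve(X, y)  # equivalent to beta = X^{-1} y for the square 9x9 case
print([int(b) for b in beta])
```

The fit is exact, which also means there are no degrees of freedom left over to estimate error; that is the "just barely" in the spoken text.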
940 00:54:44,710 --> 00:54:47,632 So we can meaningfully talk-- 941 00:54:47,632 --> 00:54:49,090 although you don't see it that much 942 00:54:49,090 --> 00:54:51,790 in the literature and we'll see why in a moment, 943 00:54:51,790 --> 00:54:56,000 you can talk about full factorials with more than two levels. 944 00:54:56,000 --> 00:54:58,430 I'm talking about full factorial 3 945 00:54:58,430 --> 00:55:02,300 to the k, where you've got three levels per test. 946 00:55:02,300 --> 00:55:05,050 But what you will see and I want to touch on a little bit, 947 00:55:05,050 --> 00:55:08,050 are slightly different designs that 948 00:55:08,050 --> 00:55:10,330 have a couple of additional properties 949 00:55:10,330 --> 00:55:13,090 that might be a better way to go than starting 950 00:55:13,090 --> 00:55:17,650 a priori with the full three level factorial, 951 00:55:17,650 --> 00:55:22,270 doing every possible combination of all three levels. 952 00:55:22,270 --> 00:55:25,690 But rather, start with the two level experiment. 953 00:55:25,690 --> 00:55:32,320 And then if you start to see some important interactions 954 00:55:32,320 --> 00:55:34,360 or some indication that additional effects 955 00:55:34,360 --> 00:55:38,950 are needed, adding experimental design points incrementally 956 00:55:38,950 --> 00:55:40,400 when it's easy to do that. 957 00:55:40,400 --> 00:55:43,630 It's not always easy to do that in your experimental setting. 958 00:55:43,630 --> 00:55:45,860 AUDIENCE: How would you detect that? 959 00:55:45,860 --> 00:55:47,680 Would you say other-- 960 00:55:47,680 --> 00:55:49,060 observe other effects? 961 00:55:49,060 --> 00:55:54,220 DUANE BONING: Yes, so if I do purely the corner model, 962 00:55:54,220 --> 00:55:58,120 is it possible for me to detect if there might be curvature? 963 00:55:58,120 --> 00:55:59,100 No.
964 00:55:59,100 --> 00:56:01,750 So the first thing I would do, if I wanted to detect, 965 00:56:01,750 --> 00:56:05,710 is at least add some center points. 966 00:56:05,710 --> 00:56:07,510 And certainly, for continuous parameters 967 00:56:07,510 --> 00:56:09,500 we talked about last time, that makes sense. 968 00:56:09,500 --> 00:56:11,740 It doesn't make sense for discrete parameters. 969 00:56:11,740 --> 00:56:14,110 Really, curvature is only a term that makes sense 970 00:56:14,110 --> 00:56:15,450 with continuous parameters. 971 00:56:15,450 --> 00:56:18,160 So that's kind of the domain I'm talking in here. 972 00:56:18,160 --> 00:56:22,600 So in fact, my rule of thumb is I always 973 00:56:22,600 --> 00:56:27,010 have at least the center points in my original design. 974 00:56:27,010 --> 00:56:30,400 I never do just a pure two level factorial. 975 00:56:30,400 --> 00:56:34,900 I always add at least the center points because they tell me-- 976 00:56:34,900 --> 00:56:37,850 and I try to replicate at least the center points 977 00:56:37,850 --> 00:56:43,420 so I can distinguish between curvature and I can also, then, 978 00:56:43,420 --> 00:56:45,910 not fit exactly perfectly every interaction, 979 00:56:45,910 --> 00:56:48,310 but I can also start to ask questions 980 00:56:48,310 --> 00:56:50,650 about the significance of these effects. 981 00:56:50,650 --> 00:56:54,770 Then I've got some indication that there might be curvature. 982 00:56:54,770 --> 00:56:58,930 Now I might go in and start to say, OK, there's curvature 983 00:56:58,930 --> 00:57:01,000 but I don't really know the nature of it. 984 00:57:01,000 --> 00:57:06,160 Now I want to add these three level points on more than one 985 00:57:06,160 --> 00:57:08,920 of the experimental factors. 986 00:57:08,920 --> 00:57:13,300 That's where I might then add even more experimental points.
987 00:57:13,300 --> 00:57:18,670 But I'm always shocked because a few of the books-- 988 00:57:18,670 --> 00:57:21,640 you read Montgomery, [INAUDIBLE],, 989 00:57:21,640 --> 00:57:24,260 you read most of the experimental design books. 990 00:57:24,260 --> 00:57:29,860 They rarely talk and emphasize the value of the center point. 991 00:57:29,860 --> 00:57:34,570 It's just absolutely crucial in my mind. 992 00:57:34,570 --> 00:57:40,600 I always want to add at least that off corner-- 993 00:57:40,600 --> 00:57:43,030 one off corner point. 994 00:57:43,030 --> 00:57:45,290 And preferably some replicates of that 995 00:57:45,290 --> 00:57:48,220 because it gives you so much more power. 996 00:57:48,220 --> 00:57:49,450 And think about it. 997 00:57:49,450 --> 00:57:51,340 If I'm doing four factors, I'm only 998 00:57:51,340 --> 00:57:53,260 adding one more experimental combination. 999 00:57:53,260 --> 00:57:56,620 I'm not exploding out the whole design. 1000 00:57:56,620 --> 00:58:01,150 It's a very cheap way to learn an awful lot 1001 00:58:01,150 --> 00:58:02,820 more about your experiment. 1002 00:58:07,990 --> 00:58:09,490 So this is simply making the point. 1003 00:58:09,490 --> 00:58:14,040 We just looked at the 3 to the 2 case-- 1004 00:58:14,040 --> 00:58:18,195 how many coefficients there are in the full model. 1005 00:58:21,570 --> 00:58:26,580 In particular, if we do the full model quadratic 3 to the 2, 1006 00:58:26,580 --> 00:58:28,750 we already got nine coefficients. 1007 00:58:28,750 --> 00:58:30,180 But if I add just one more factor 1008 00:58:30,180 --> 00:58:34,950 and I'm still worried about the full model 1009 00:58:34,950 --> 00:58:38,700 with all of the interactions, the number 1010 00:58:38,700 --> 00:58:43,560 of experiments that I need to do explodes rapidly. 1011 00:58:43,560 --> 00:58:48,225 Up here with only five factors, I got 243 model terms. 
1012 00:58:52,280 --> 00:58:54,530 Do you really think each of those model terms is going 1013 00:58:54,530 --> 00:58:58,610 to be nonzero significant? 1014 00:58:58,610 --> 00:59:02,300 Probably not-- so this is also a wonderful opportunity 1015 00:59:02,300 --> 00:59:06,050 for saying, sparsity of effects, hierarchy, all of those-- 1016 00:59:06,050 --> 00:59:08,690 I'm going to alias some of those in and discount 1017 00:59:08,690 --> 00:59:09,950 certain interactions. 1018 00:59:09,950 --> 00:59:16,730 And in fact, if I just do main effects and the third order 1019 00:59:16,730 --> 00:59:21,510 term, but only on the single effect. 1020 00:59:21,510 --> 00:59:25,940 So an x1 squared-- 1021 00:59:25,940 --> 00:59:29,750 second order term, quadratic model, x1 squared, and x2 1022 00:59:29,750 --> 00:59:32,030 squared, and so on-- 1023 00:59:32,030 --> 00:59:35,690 then it only grows linearly with the number of factors. 1024 00:59:35,690 --> 00:59:41,870 If I know a priori, I can neglect those higher order 1025 00:59:41,870 --> 00:59:44,550 interactions. 1026 00:59:44,550 --> 00:59:47,210 So this is just working out and giving an example, 1027 00:59:47,210 --> 00:59:50,450 now using some of our earlier terminology. 1028 00:59:50,450 --> 00:59:56,720 Again, here I can refer to the different combinations. 1029 00:59:56,720 --> 01:00:03,570 Again, I have my A and B. And I can label the AB interaction. 1030 01:00:03,570 --> 01:00:08,930 That's that AB interaction or the AB effect 1031 01:00:08,930 --> 01:00:10,790 goes with the Beta 1, 2. 1032 01:00:10,790 --> 01:00:15,320 This A2 is just an a squared, B squared, A squared B, B squared 1033 01:00:15,320 --> 01:00:17,240 A, A square B squared. 1034 01:00:17,240 --> 01:00:23,120 And you can, again, see, now, the factor levels where 1035 01:00:23,120 --> 01:00:26,660 I've added 0 to indicate the center level setting 1036 01:00:26,660 --> 01:00:29,210 for each of those. 
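[Editor's sketch, not part of the lecture] The growth in model terms discussed above can be tabulated directly. The counts assume a full quadratic model with every interaction (each factor appears at power 0, 1, or 2, giving 3 to the k terms) versus a reduced model with only the mean, the main effects, and the pure squared terms; the function names are illustrative.

```python
def full_quadratic_terms(k):
    """Terms in the full quadratic model with all interactions: 3^k."""
    return 3 ** k

def reduced_quadratic_terms(k):
    """Mean + k main effects + k pure squared terms: 1 + 2k."""
    return 1 + 2 * k

for k in range(1, 6):
    print(k, full_quadratic_terms(k), reduced_quadratic_terms(k))
```

For five factors this gives 243 terms for the full model, matching the figure quoted in the lecture, against 11 for the reduced model.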
1037 01:00:29,210 --> 01:00:31,970 But we can also borrow, and import, 1038 01:00:31,970 --> 01:00:36,620 and use all of the same aliasing terminology 1039 01:00:36,620 --> 01:00:38,220 that we had earlier. 1040 01:00:38,220 --> 01:00:42,900 So for example, if I only did-- 1041 01:00:42,900 --> 01:00:46,100 I don't know a-- 1042 01:00:46,100 --> 01:00:50,300 well, if I did a 3 to the 2 minus 1, 1043 01:00:50,300 --> 01:00:53,390 I guess that's a half fraction, but kind of an odd one 1044 01:00:53,390 --> 01:00:58,880 because that means I get to pick three rows. 1045 01:00:58,880 --> 01:01:02,120 I could pick-- is that right? 1046 01:01:02,120 --> 01:01:03,410 I think that is. 1047 01:01:03,410 --> 01:01:05,450 You'd have a lot of aliasing going on. 1048 01:01:08,060 --> 01:01:11,810 So actually, it's not as easy to talk about a half fraction. 1049 01:01:14,360 --> 01:01:18,767 And in fact, we'll see-- 1050 01:01:18,767 --> 01:01:20,850 I can do these aliasing, but I want to leap ahead. 1051 01:01:20,850 --> 01:01:23,060 Here we go. 1052 01:01:23,060 --> 01:01:25,640 A good way to try to visualize these, which 1053 01:01:25,640 --> 01:01:27,410 works for lower number of factors 1054 01:01:27,410 --> 01:01:30,020 because it's not too high dimensional a space, 1055 01:01:30,020 --> 01:01:38,060 is actually plot out in your x1 versus x2 experimental space 1056 01:01:38,060 --> 01:01:41,780 what design points you're actually exercising 1057 01:01:41,780 --> 01:01:43,560 with some half fraction. 1058 01:01:43,560 --> 01:01:47,870 So for example, in this case, I'm just doing say, these 2/3 1059 01:01:47,870 --> 01:01:50,870 and leaving off this 1/3 of the experiment. 1060 01:01:55,210 --> 01:01:57,070 In which case, what I'm essentially doing 1061 01:01:57,070 --> 01:02:00,340 is giving up and saying, I'm not going to do the high level 1062 01:02:00,340 --> 01:02:03,400 setting on my x2 factor. 
1063 01:02:03,400 --> 01:02:05,650 And now you can start to get a good feel, 1064 01:02:05,650 --> 01:02:08,140 nice intuitive feel that goes together 1065 01:02:08,140 --> 01:02:09,640 with the mathematics, of what you're 1066 01:02:09,640 --> 01:02:11,950 giving up when you do that. 1067 01:02:11,950 --> 01:02:14,560 If I were not doing those experiments, 1068 01:02:14,560 --> 01:02:18,190 I just picked this subset, what model coefficients am 1069 01:02:18,190 --> 01:02:21,250 I detecting or am I going to be able to fit, 1070 01:02:21,250 --> 01:02:25,800 and what am I not in that experiment? 1071 01:02:25,800 --> 01:02:27,380 Should be pretty intuitive. 1072 01:02:34,430 --> 01:02:38,870 What model coefficient would I not be able to fit? 1073 01:02:42,590 --> 01:02:43,958 Next to what? 1074 01:02:43,958 --> 01:02:46,160 AUDIENCE: [INAUDIBLE] 1075 01:02:46,160 --> 01:02:48,260 DUANE BONING: Yeah, for this one. 1076 01:02:48,260 --> 01:02:52,220 No worries-- I said, we are going to pick these six points. 1077 01:02:52,220 --> 01:02:55,910 These are the six columns going with these rows. 1078 01:02:55,910 --> 01:03:00,800 But I'm not doing those three experiments 1079 01:03:00,800 --> 01:03:03,436 in my x1 and x2 factor [INAUDIBLE].. 1080 01:03:06,220 --> 01:03:10,290 Can I do a quadratic model in x1? 1081 01:03:10,290 --> 01:03:13,690 Sure, I got three data points projected along in that. 1082 01:03:13,690 --> 01:03:16,570 Can I do a quadratic model in x2? 1083 01:03:16,570 --> 01:03:19,700 No, I'm only exercising two different levels. 1084 01:03:19,700 --> 01:03:22,660 So I can only go up to linear in that term. 1085 01:03:22,660 --> 01:03:24,220 And that's what I mean by intuitive. 1086 01:03:24,220 --> 01:03:29,890 I think you can start to see what, at least main order 1087 01:03:29,890 --> 01:03:33,670 effects as well as second order terms, are at work. 
1088 01:03:33,670 --> 01:03:36,250 It's a little more subtle to see, 1089 01:03:36,250 --> 01:03:42,190 can I do an x1 and x2 or an x1 squared and an x2? 1090 01:03:42,190 --> 01:03:45,100 I think you can see, I've got combinations 1091 01:03:45,100 --> 01:03:47,050 to be able to do some of those, but there may 1092 01:03:47,050 --> 01:03:48,612 be some confounding going on. 1093 01:03:48,612 --> 01:03:50,320 And then you can look back at the columns 1094 01:03:50,320 --> 01:03:53,960 to see what kind of confounding may be occurring. 1095 01:03:53,960 --> 01:03:58,000 So I haven't actually done that on this 1096 01:03:58,000 --> 01:04:00,760 to figure out which is confounded with what. 1097 01:04:00,760 --> 01:04:03,798 But let's see-- anybody-- 1098 01:04:03,798 --> 01:04:05,590 it's probably so tiny you don't have a hope 1099 01:04:05,590 --> 01:04:07,570 in the world of seeing it. 1100 01:04:07,570 --> 01:04:08,290 There is one. 1101 01:04:11,530 --> 01:04:15,730 B is equal to minus B squared, in this case, 1102 01:04:15,730 --> 01:04:17,710 where B was my x2. 1103 01:04:17,710 --> 01:04:20,530 So this is basically confounding and saying-- 1104 01:04:20,530 --> 01:04:23,980 I was telling you, I could not fit 1105 01:04:23,980 --> 01:04:28,180 the B squared-- the quadratic term in x2. 1106 01:04:28,180 --> 01:04:32,140 If there actually is curvature, where did it go? 1107 01:04:32,140 --> 01:04:35,470 It's hidden inside of the B2 linear term. 1108 01:04:35,470 --> 01:04:39,340 It's confounded with the B linear term. 1109 01:04:39,340 --> 01:04:42,730 So it's the same kind of terminology. 1110 01:04:42,730 --> 01:04:55,440 Here's a different pattern that looks almost the same 1111 01:04:55,440 --> 01:04:56,610 as the other one. 
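[Editor's sketch, not part of the lecture] The B equals minus B squared confounding can be reproduced by dropping the high level of x2 from a 3 squared design. The coded levels -1, 0, 1 and the choice of which six runs are kept are assumptions based on the spoken description.

```python
from itertools import product

# Six runs: all three levels of x1, but only the low and center levels of x2.
runs = list(product([-1, 0, 1], [-1, 0]))

a = [x1 for x1, _ in runs]
a_sq = [x1**2 for x1, _ in runs]
b = [x2 for _, x2 in runs]
b_sq = [x2**2 for _, x2 in runs]

# B is confounded with B^2 if the columns are equal up to sign.
b_confounded = b == b_sq or b == [-v for v in b_sq]
a_confounded = a == a_sq or a == [-v for v in a_sq]

print("B column:  ", b)
print("B^2 column:", b_sq)
print("B confounded with B^2:", b_confounded)
print("A confounded with A^2:", a_confounded)
```

With only two levels of x2 the squared column carries no information beyond the linear one, so any real curvature in x2 is absorbed into the linear coefficient, while x1 keeps a separable quadratic term.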
1112 01:04:56,610 --> 01:05:02,226 I can still fit quadratic in x1, linear in x2-- 1113 01:05:02,226 --> 01:05:04,890 in fact, it's not even clear to me 1114 01:05:04,890 --> 01:05:08,500 right up front what I can't fit with one or the other. 1115 01:05:08,500 --> 01:05:12,780 I kind of like having the center point in the other design. 1116 01:05:12,780 --> 01:05:15,790 But now, what's the difference between these two? 1117 01:05:15,790 --> 01:05:20,700 Actually, I think a lot of the aliasing is relatively similar. 1118 01:05:20,700 --> 01:05:23,250 I can think of one reason why I might pick the lower 1119 01:05:23,250 --> 01:05:26,460 design over the upper design. 1120 01:05:26,460 --> 01:05:29,930 AUDIENCE: Wouldn't it make the effect more pronounced? 1121 01:05:29,930 --> 01:05:33,500 DUANE BONING: Yeah, I think that in combination 1122 01:05:33,500 --> 01:05:35,870 with what I was thinking, but I think you're right. 1123 01:05:35,870 --> 01:05:37,430 The statement was, this would make 1124 01:05:37,430 --> 01:05:39,170 the effect more pronounced. 1125 01:05:39,170 --> 01:05:44,270 I'm also thinking it explores a larger space of x2. 1126 01:05:44,270 --> 01:05:48,590 This one is just zeroing in or zooming in on the 0 1127 01:05:48,590 --> 01:05:49,830 to low setting. 1128 01:05:49,830 --> 01:05:52,220 The other one goes all the way and spans the low 1129 01:05:52,220 --> 01:05:53,580 to the high setting. 1130 01:05:53,580 --> 01:05:56,870 So I'm exploring or fitting around the larger portion 1131 01:05:56,870 --> 01:05:58,110 of the space. 1132 01:05:58,110 --> 01:06:04,010 So I might prefer, for that reason, the lower one. 1133 01:06:04,010 --> 01:06:06,210 It doesn't have a pure center point, 1134 01:06:06,210 --> 01:06:08,900 but it's not clear that the other one does either.
1135 01:06:11,762 --> 01:06:20,780 AUDIENCE: Is there [INAUDIBLE] 1136 01:06:20,780 --> 01:06:23,570 DUANE BONING: Yeah, so the same ideas of balance 1137 01:06:23,570 --> 01:06:25,200 and orthogonality are at work. 1138 01:06:25,200 --> 01:06:27,680 But again, it's mostly with respect to-- 1139 01:06:30,320 --> 01:06:31,865 it's more subtle now because it's 1140 01:06:31,865 --> 01:06:33,860 with respect to either the linear term 1141 01:06:33,860 --> 01:06:37,130 or the quadratic term. 1142 01:06:37,130 --> 01:06:40,700 So these are both balanced and orthogonal with respect 1143 01:06:40,700 --> 01:06:45,530 to things like the main order effects. 1144 01:06:45,530 --> 01:06:49,860 But not orthogonal B to B squared in these two cases. 1145 01:06:49,860 --> 01:06:51,590 So you can use the same reasoning. 1146 01:06:51,590 --> 01:06:55,220 But you've got three levels, you've 1147 01:06:55,220 --> 01:06:57,470 got to expand it out and think it's not just 1148 01:06:57,470 --> 01:06:59,810 the whole variable or the interaction. 1149 01:06:59,810 --> 01:07:02,600 It's also the second order term, possibly, 1150 01:07:02,600 --> 01:07:05,244 being nonorthogonal with a lower order term. 1151 01:07:07,880 --> 01:07:08,690 Here's another one. 1152 01:07:14,500 --> 01:07:21,080 What do you think about that one compared to the previous two? 1153 01:07:25,730 --> 01:07:27,240 This one looks interesting to me. 1154 01:07:34,114 --> 01:07:35,832 AUDIENCE: It's going to be bigger 1155 01:07:35,832 --> 01:07:39,515 because we can do quadratic [INAUDIBLE] 1156 01:07:39,515 --> 01:07:42,170 and since the combinations are less-- they 1157 01:07:42,170 --> 01:07:45,910 have less effect [INAUDIBLE]. 1158 01:07:45,910 --> 01:07:47,373 DUANE BONING: Yep-- 1159 01:07:47,373 --> 01:07:48,790 I'm not sure everybody heard that. 1160 01:07:48,790 --> 01:07:52,780 But at least here, we're exercising all three levels 1161 01:07:52,780 --> 01:07:55,990 of x1, all three levels of x2.
1162 01:07:55,990 --> 01:08:00,430 So we can fit quadratic terms in each of the cases. 1163 01:08:00,430 --> 01:08:03,610 Where we might need to be a little careful 1164 01:08:03,610 --> 01:08:05,680 is we're actually making an interesting trade 1165 01:08:05,680 --> 01:08:09,040 off here-- going back, also, to your earlier point. 1166 01:08:09,040 --> 01:08:13,210 We're giving up a little bit of balance in this design. 1167 01:08:13,210 --> 01:08:16,270 In that for the low setting of x1, 1168 01:08:16,270 --> 01:08:20,500 I've actually got more data than for the high setting of x1. 1169 01:08:20,500 --> 01:08:23,439 And so if there's noise effects, I'm 1170 01:08:23,439 --> 01:08:27,100 actually fitting, in effect-- 1171 01:08:27,100 --> 01:08:30,069 if you do the regression, you'll have a narrower confidence 1172 01:08:30,069 --> 01:08:32,229 interval over here at the low setting 1173 01:08:32,229 --> 01:08:34,069 than you would at the high setting. 1174 01:08:34,069 --> 01:08:36,859 So you actually have to be a little careful. 1175 01:08:36,859 --> 01:08:38,920 You can do the regression math. 1176 01:08:38,920 --> 01:08:41,019 You have to be careful in forming your contrast. 1177 01:08:45,160 --> 01:08:46,899 Normally, you get to these higher order models, 1178 01:08:46,899 --> 01:08:50,200 you're probably throwing it into a regression anyway. 1179 01:08:50,200 --> 01:08:52,600 But you also then have to be careful in the interpolation 1180 01:08:52,600 --> 01:08:54,939 and use of the model in different parts of the space 1181 01:08:54,939 --> 01:08:56,529 because its accuracy is a little bit 1182 01:08:56,529 --> 01:08:59,109 different in different parts of this space. 1183 01:08:59,109 --> 01:09:00,040 But I like this-- 1184 01:09:00,040 --> 01:09:06,189 I kind of like this one as well, even with those caveats.
1185 01:09:06,189 --> 01:09:09,819 But you can still use this and the same aliasing terminology 1186 01:09:09,819 --> 01:09:13,420 to figure out which coefficients I'm giving up. 1187 01:09:13,420 --> 01:09:14,883 What's aliasing with what. 1188 01:09:14,883 --> 01:09:17,300 And I'm not going to go through that, but you can do that. 1189 01:09:17,300 --> 01:09:22,300 AUDIENCE: Would you get a higher [INAUDIBLE]? 1190 01:09:22,300 --> 01:09:24,550 DUANE BONING: Well, it's not clear you 1191 01:09:24,550 --> 01:09:27,040 would get an overall higher or lower r squared. 1192 01:09:35,450 --> 01:09:38,050 You almost always-- and we'll do a little bit more regression 1193 01:09:38,050 --> 01:09:38,550 later. 1194 01:09:38,550 --> 01:09:40,880 But you almost always, if you add more terms, 1195 01:09:40,880 --> 01:09:42,359 you get a better r squared. 1196 01:09:42,359 --> 01:09:46,189 But then the question is, is it a fair model? 1197 01:09:46,189 --> 01:09:48,170 So we can also talk about an adjusted r 1198 01:09:48,170 --> 01:09:52,130 squared where you penalize for the additional model-- 1199 01:09:52,130 --> 01:09:55,653 the additional model terms. 1200 01:09:55,653 --> 01:09:57,070 And then this is just pointing out 1201 01:09:57,070 --> 01:10:02,980 that you can still use the linear algebraic-- 1202 01:10:02,980 --> 01:10:08,860 either direct solution or pseudo-inverse solution if you've got 1203 01:10:08,860 --> 01:10:12,860 replicates to be able to fit the model. 1204 01:10:12,860 --> 01:10:16,210 So these are the sorts of things that come out. 1205 01:10:16,210 --> 01:10:19,720 This is for an x1, x2. 1206 01:10:19,720 --> 01:10:21,880 I don't know which is which-- 1207 01:10:21,880 --> 01:10:24,130 x1, x2.
1208 01:10:24,130 --> 01:10:27,483 If, in fact, there are true quadratic terms, 1209 01:10:27,483 --> 01:10:28,900 these are a little different kinds 1210 01:10:28,900 --> 01:10:33,730 of surfaces than the ruled surface, which 1211 01:10:33,730 --> 01:10:37,690 looked like it had kind of a funky kind of curvature because 1212 01:10:37,690 --> 01:10:39,790 of an x1, x2 interaction. 1213 01:10:39,790 --> 01:10:43,690 But if I projected down on any one variable, 1214 01:10:43,690 --> 01:10:45,100 it's always linear. 1215 01:10:45,100 --> 01:10:49,360 Here, if I were to do a slice holding x1 constant, 1216 01:10:49,360 --> 01:10:52,030 I do, in fact, get a true quadratic. 1217 01:10:52,030 --> 01:10:55,360 Now, the nice thing about quadratic surfaces 1218 01:10:55,360 --> 01:11:03,190 is you can start to think about an optimal point much more 1219 01:11:03,190 --> 01:11:04,930 easily within these. 1220 01:11:04,930 --> 01:11:08,470 Certainly, if the space is large enough 1221 01:11:08,470 --> 01:11:13,120 to cover a minimum or a maximum, there's a natural motion 1222 01:11:13,120 --> 01:11:16,450 if you're trying to minimize or maximize your output 1223 01:11:16,450 --> 01:11:20,290 y of finding the optimum space. 1224 01:11:20,290 --> 01:11:22,480 Now, it's also possible that if I had a smaller 1225 01:11:22,480 --> 01:11:26,770 space and the true minimum or maximum 1226 01:11:26,770 --> 01:11:30,380 of the full equation were outside, then I might have-- 1227 01:11:30,380 --> 01:11:33,430 so for example, let's say my space was constrained 1228 01:11:33,430 --> 01:11:36,280 to only right here, then I might run 1229 01:11:36,280 --> 01:11:40,750 into the minimum or the maximum at one of my boundaries. 1230 01:11:40,750 --> 01:11:43,090 But now we've got this extra notion 1231 01:11:43,090 --> 01:11:45,580 that my min or maximum might occur somewhere 1232 01:11:45,580 --> 01:11:47,800 in the interior of the space.
1233 01:11:47,800 --> 01:11:49,540 As opposed to with linear models, 1234 01:11:49,540 --> 01:11:54,340 it's always at one or the other of the boundaries. 1235 01:11:54,340 --> 01:11:56,860 So this starts to get to the desire 1236 01:11:56,860 --> 01:11:59,470 for using this kind of a model now 1237 01:11:59,470 --> 01:12:03,580 to find the optimum point, which is one of the main reasons we 1238 01:12:03,580 --> 01:12:05,260 do experimental designs. 1239 01:12:05,260 --> 01:12:07,720 Not just to find out which factors matter, 1240 01:12:07,720 --> 01:12:11,110 but build the model, and then use it either in, 1241 01:12:11,110 --> 01:12:15,700 maybe, feedback control or more often, just to set-- 1242 01:12:15,700 --> 01:12:19,270 find the process settings or find the design optimal point 1243 01:12:19,270 --> 01:12:21,295 in order to achieve some criteria. 1244 01:12:24,370 --> 01:12:27,160 Now, I alluded already to adding additional points. 1245 01:12:27,160 --> 01:12:31,180 If we did a full factorial three level-- 1246 01:12:31,180 --> 01:12:35,350 that's all nine combinations of high-low. 1247 01:12:35,350 --> 01:12:39,730 And in the x1, x2 space, that's all nine points. 1248 01:12:39,730 --> 01:12:43,390 There is an alternative approach and probably one 1249 01:12:43,390 --> 01:12:46,670 of the most important experimental designs 1250 01:12:46,670 --> 01:12:48,940 after two level full factorial. 1251 01:12:48,940 --> 01:12:52,000 It's referred to as a central composite design. 1252 01:12:52,000 --> 01:12:55,060 And you often will design up front to do this, 1253 01:12:55,060 --> 01:12:58,780 but also very often it will be what 1254 01:12:58,780 --> 01:13:03,490 you extend a full factorial corner point design with. 1255 01:13:03,490 --> 01:13:06,280 If I did my first experiment with 2 1256 01:13:06,280 --> 01:13:14,820 to the 2, just pure full factorial two levels, 1257 01:13:14,820 --> 01:13:20,540 and I got four tests, first off, this is probably wrong. 
1258 01:13:20,540 --> 01:13:24,870 The model is shown to have a poor fit. 1259 01:13:24,870 --> 01:13:28,400 If I actually did this for all four points in the interaction, 1260 01:13:28,400 --> 01:13:31,310 I don't have enough to detect that. 1261 01:13:31,310 --> 01:13:36,650 So somewhere in here would be add center points. 1262 01:13:36,650 --> 01:13:38,950 The power of the center point. 1263 01:13:38,950 --> 01:13:41,180 If you remember nothing else from today, 1264 01:13:41,180 --> 01:13:43,010 the power of the center point. 1265 01:13:43,010 --> 01:13:45,740 The power of the center point. 1266 01:13:45,740 --> 01:13:48,920 Then I can start to detect whether it actually 1267 01:13:48,920 --> 01:13:53,190 has a lack of fit in a formal sense. 1268 01:13:53,190 --> 01:13:56,300 And let's say then I decide I want to go quadratic, 1269 01:13:56,300 --> 01:14:00,330 but I'm not quite sure of the shape where 1270 01:14:00,330 --> 01:14:04,320 my min or maximum might be. 1271 01:14:04,320 --> 01:14:08,880 What we often do is add our additional points. 1272 01:14:08,880 --> 01:14:12,890 So there was our original four corner points. 1273 01:14:12,890 --> 01:14:15,920 Certainly, we wanted to add the one at the center. 1274 01:14:15,920 --> 01:14:17,810 Should have done that already. 1275 01:14:17,810 --> 01:14:22,460 But now I can also decide where to add my interaction 1276 01:14:22,460 --> 01:14:26,060 points different-- so let's say I added this. 1277 01:14:26,060 --> 01:14:32,150 The full typical 3 to the 2 would say, OK, 1278 01:14:32,150 --> 01:14:38,640 you add them at exactly these 3 by 3 grid array. 1279 01:14:38,640 --> 01:14:43,000 I would add them at the center points of each of these. 1280 01:14:43,000 --> 01:14:45,940 We can do something else that's a little bit clever.
1281 01:14:45,940 --> 01:14:53,070 Which is instead, add these interaction points 1282 01:14:53,070 --> 01:15:02,810 off the grid, but at the location of an outer circle 1283 01:15:02,810 --> 01:15:07,970 equidistant or circumscribed around the original points 1284 01:15:07,970 --> 01:15:10,340 of my cube-- my hypercube. 1285 01:15:14,630 --> 01:15:17,590 See I think-- there we go. 1286 01:15:17,590 --> 01:15:19,950 So what we would do in that case here, 1287 01:15:19,950 --> 01:15:22,770 is these would be my first four points. 1288 01:15:22,770 --> 01:15:24,090 I'd add my center point. 1289 01:15:24,090 --> 01:15:28,800 And now I add these points off axis. 1290 01:15:28,800 --> 01:15:32,280 There's something really clever, really valuable in doing this-- 1291 01:15:32,280 --> 01:15:35,340 some things that are subtle, and kind of mathematical, and not 1292 01:15:35,340 --> 01:15:37,440 all that important, but some other parts that 1293 01:15:37,440 --> 01:15:40,790 are really nice and intuitive. 1294 01:15:40,790 --> 01:15:43,490 I'll give you the subtle mathematical part. 1295 01:15:43,490 --> 01:15:52,550 This actually maintains a little bit of fitting-- 1296 01:15:52,550 --> 01:15:54,500 if you go and you do the regression, 1297 01:15:54,500 --> 01:15:56,990 the fact that all of your off center points 1298 01:15:56,990 --> 01:15:59,690 are now equidistant from the center point 1299 01:15:59,690 --> 01:16:04,680 means that your model variance, no matter which direction you 1300 01:16:04,680 --> 01:16:06,430 go, is the same. 1301 01:16:06,430 --> 01:16:08,490 In other words, there's a confidence interval 1302 01:16:08,490 --> 01:16:13,110 on your coefficients and an interpolation or extrapolation 1303 01:16:13,110 --> 01:16:15,870 error that grows as you go further 1304 01:16:15,870 --> 01:16:17,860 from the center of your design. 1305 01:16:17,860 --> 01:16:23,690 And by doing this, that maintains a symmetry.
1306 01:16:23,690 --> 01:16:27,140 It doesn't matter which direction you look. 1307 01:16:27,140 --> 01:16:28,940 You've got the same model accuracy 1308 01:16:28,940 --> 01:16:32,810 purely as a function of distance from the center. 1309 01:16:32,810 --> 01:16:36,660 So that's a nice mathematical property. 1310 01:16:36,660 --> 01:16:40,890 But there's another more intuitive value 1311 01:16:40,890 --> 01:16:42,390 picking these corner points. 1312 01:16:46,250 --> 01:16:49,060 And the way I think of it is if I just 1313 01:16:49,060 --> 01:16:53,080 picked those 3 by 3 on a regular grid, 1314 01:16:53,080 --> 01:16:56,800 I could then project down to x1 or x2. 1315 01:16:56,800 --> 01:17:00,520 And then in terms of exercising the x1 variable, 1316 01:17:00,520 --> 01:17:03,700 I've only got three levels. 1317 01:17:03,700 --> 01:17:07,520 Here, if I were to project down my points-- 1318 01:17:07,520 --> 01:17:10,870 so let me get rid of some of this scribble here. 1319 01:17:10,870 --> 01:17:15,430 If I were to project down onto just my x1 axis-- 1320 01:17:15,430 --> 01:17:16,930 let's say I did the experiment and I 1321 01:17:16,930 --> 01:17:20,860 found that x2 actually was not that important 1322 01:17:20,860 --> 01:17:25,150 and I project down all of these points. 1323 01:17:25,150 --> 01:17:31,740 That means for my x1, I've actually 1324 01:17:31,740 --> 01:17:38,120 exercised at five different levels of x1. 1325 01:17:38,120 --> 01:17:40,400 I've got more than just-- 1326 01:17:40,400 --> 01:17:44,030 I've got extra redundancy built in. 1327 01:17:44,030 --> 01:17:48,150 If I'm fitting just a quadratic model, 1328 01:17:48,150 --> 01:17:54,220 I've got ways to actually check whether the quadratic model is 1329 01:17:54,220 --> 01:17:54,880 sufficient. 1330 01:17:54,880 --> 01:18:00,030 I can do a lack of fit test on even the quadratic model. 
1331 01:18:00,030 --> 01:18:04,320 Whereas if I only had exactly three samples of x1 1332 01:18:04,320 --> 01:18:06,390 and I add the quadratic in, there's 1333 01:18:06,390 --> 01:18:08,730 no way for me to ask the question, 1334 01:18:08,730 --> 01:18:12,140 might there be an even higher order, 1335 01:18:12,140 --> 01:18:14,960 a logarithmic dependence, or something else 1336 01:18:14,960 --> 01:18:17,540 that's subtle that's going on. 1337 01:18:17,540 --> 01:18:21,590 Because I fit exactly all of the data that I have. 1338 01:18:21,590 --> 01:18:23,420 So the central composite design, I think, 1339 01:18:23,420 --> 01:18:26,060 has this nice intuitive feel of I'm 1340 01:18:26,060 --> 01:18:31,010 actually kind of exercising my space a little more thoroughly 1341 01:18:31,010 --> 01:18:34,760 to be able to build a model that can apply a little bit more 1342 01:18:34,760 --> 01:18:36,248 broadly. 1343 01:18:36,248 --> 01:18:37,040 Everybody see that? 1344 01:18:40,690 --> 01:18:43,450 So here's what the central composite would look like. 1345 01:18:43,450 --> 01:18:47,110 What you end up with is the original corner points. 1346 01:18:47,110 --> 01:18:48,970 You add the center point. 1347 01:18:48,970 --> 01:18:53,710 And now you can see, if you were to do the geometry, 1348 01:18:53,710 --> 01:18:59,020 the distance there is square root of 2. 1349 01:18:59,020 --> 01:19:03,460 So I'm going a full one on both the x1 and the x2. 1350 01:19:03,460 --> 01:19:08,390 And you can extrapolate that to a third order hypercube. 1351 01:19:08,390 --> 01:19:10,990 I think it ends up being, instead 1352 01:19:10,990 --> 01:19:14,380 of square root of 2, that distance is a square root of 3 1353 01:19:14,380 --> 01:19:15,230 and so on. 1354 01:19:15,230 --> 01:19:18,480 So you can find what the distance is 1355 01:19:18,480 --> 01:19:22,890 and pick those different corner points.
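[Editor's sketch, not part of the lecture] The two-factor central composite design described above can be written out in coded units to check the geometry. This is a minimal illustration under stated assumptions: the corners sit at plus or minus 1, the extra points are placed on the x1 and x2 axes at a distance of square root of 2 from the center, and the function names central_composite_2d and distinct_levels are hypothetical helpers invented for this example. It compares the design against the regular 3 by 3 grid by counting how many distinct x1 levels each one exercises.

```python
import itertools
import math


def central_composite_2d(alpha=math.sqrt(2.0)):
    """Coded (x1, x2) points of a two-factor central composite design.

    Assumption for this sketch: corners at +/-1 and four extra points
    on the axes at distance alpha from the center.
    """
    corners = list(itertools.product((-1.0, 1.0), repeat=2))
    center = [(0.0, 0.0)]
    axial = [(-alpha, 0.0), (alpha, 0.0), (0.0, -alpha), (0.0, alpha)]
    return corners + center + axial


def distinct_levels(points, axis, ndigits=9):
    """Distinct settings seen on one axis after projecting the design onto it."""
    return sorted({round(p[axis], ndigits) for p in points})


ccd = central_composite_2d()
grid = list(itertools.product((-1.0, 0.0, 1.0), repeat=2))  # regular 3 by 3 grid

# Distances from the center: the center itself, plus one shared radius
# for every corner and axial point when alpha is sqrt(2).
radii = sorted({round(math.hypot(x1, x2), 9) for x1, x2 in ccd})

ccd_x1 = distinct_levels(ccd, axis=0)
grid_x1 = distinct_levels(grid, axis=0)

print("runs in CCD:", len(ccd), " runs in 3 by 3 grid:", len(grid))
print("distinct radii in CCD:", radii)
print("x1 levels, CCD:", ccd_x1)
print("x1 levels, 3 by 3 grid:", grid_x1)
```

Both designs use nine runs, but projecting onto x1 gives five levels for the central composite design and only three for the grid, which is what leaves room for a lack of fit check on the quadratic model.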
1356 01:19:22,890 --> 01:19:25,650 So what we'll do next time is build on a little bit 1357 01:19:25,650 --> 01:19:30,120 this idea of using the model. 1358 01:19:30,120 --> 01:19:31,140 We built it. 1359 01:19:31,140 --> 01:19:32,370 We can assess it. 1360 01:19:32,370 --> 01:19:35,190 We can pick our points to build different orders of model. 1361 01:19:35,190 --> 01:19:39,360 What we'd like to do is talk a little bit about picking 1362 01:19:39,360 --> 01:19:43,500 one of these surfaces and how I actually-- 1363 01:19:43,500 --> 01:19:45,450 a couple of related, but slightly different 1364 01:19:45,450 --> 01:19:53,100 ways of looking at optimizing to find an optimum point either 1365 01:19:53,100 --> 01:19:55,410 after you've built the whole model or-- 1366 01:19:55,410 --> 01:19:59,580 the clever thing here when you're on a process line-- 1367 01:19:59,580 --> 01:20:02,400 doing it interactively or incrementally. 1368 01:20:02,400 --> 01:20:05,760 Actually, picking your design points almost one 1369 01:20:05,760 --> 01:20:09,510 at a time based on your current view of the model driving 1370 01:20:09,510 --> 01:20:11,470 towards the optimum. 1371 01:20:11,470 --> 01:20:13,740 So instead of doing all your experiments up front, 1372 01:20:13,740 --> 01:20:17,340 you might actually want to do them in an evolutionary way 1373 01:20:17,340 --> 01:20:19,240 to try to find the optimum. 1374 01:20:19,240 --> 01:20:23,990 So we'll talk about both of those approaches next time. 1375 01:20:23,990 --> 01:20:27,610 So again, the problem set is due tomorrow. 1376 01:20:27,610 --> 01:20:32,470 I think we'll hold off and we'll give ourselves, Hayden and me, 1377 01:20:32,470 --> 01:20:35,400 an extension on issuing the new problem 1378 01:20:35,400 --> 01:20:38,440 set so you're not doubled up, and we can think about it 1379 01:20:38,440 --> 01:20:40,120 a little bit more as well. 1380 01:20:40,120 --> 01:20:43,480 And issue that tomorrow, as well. 
1381 01:20:43,480 --> 01:20:46,320 We'll see you then on Tuesday.