The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

DUANE BONING: OK. So today's lecture, in some sense is actually out of sequence by one from Thursday, but based on when I was able to get Dan Frey to be able to come in and give his lecture. This is a lecture today on variance estimation, which is kind of a topic that probably ought to come after Thursday's, because Thursday's is still in the area of design of experiments, response surface modeling, and optimization. So it's kind of a nice wrap-up to the last three or four lectures that we've been doing. So we'll come back to that on Thursday. But what I want to do today is this idea of nested variance components.

And the readings for this material are not in either of the textbooks. There's a separate chapter that is on the website under Readings. And it's chapter 3 from this book by David Drain, Statistical Methods for Industrial Process Control. I think he's still there, I'm not sure. He's a statistician for Intel. And actually, although the title doesn't indicate it, it's all about semiconductor manufacturing and statistical methods for that. But this idea of variance components is actually, it does come up, and I've never actually seen it covered in any of the standard statistics texts. So I think it's a very powerful and important idea, and that's why I wanted to talk about it today.

In addition to this book as a reading on the website, we also have spreadsheets. There's a spreadsheet with three different worksheets in it that are the two main examples out of the Drain book. He just pulls up the data, and then uses some statistics package to pull out things like the ANOVA table.
I actually show you the spreadsheet with all the intermediate calculations for both analysis of variance and this estimation of variance components in it. So you'll find that very useful. You may find it really useful as a template for some of the work on the problem set as well.

OK, so what I want to do is first off, refresh ourselves a little bit on analysis of variance. So in some sense, this is actually a little bit of a refresher, first on ANOVA, and then deepening of our understanding, hopefully, of ANOVA. So indirectly, it's a little bit of review for the quiz. But then I want to talk a little bit about a different assumption of the model, or the underlying problem that's being looked at, that leads away from standard ANOVA, towards this idea of nested variance.

And briefly, what nested variance is, is structures in which you may have more than one source of random variation at work. And a classic example in semiconductor manufacturing would be this spatial hierarchy of structures like chips: you have 50, 100, 1,000 chips on each wafer. And you may have chip-to-chip variance, for example. And then you may have a certain number of wafers in a lot or a boat of wafers that all are typically processed together. So I might have 24 wafers in each lot. And I may have wafer-to-wafer variance. Similarly, I may run multiple lots over time in one line. So I have lot-to-lot variation.

And very often, you're gathering lots and lots of data. And you're trying to say, OK, how much of it is die-to-die variation? How much of it is wafer-to-wafer variation? How much lot-to-lot? You want to separate out or decompose the total variation that you're seeing into each of these different components. The standard ANOVA isn't really set up to do that. And that's what I'm going to talk about today: try to show you what ANOVA assumes, which is looking for a fixed effect between different design points, or different replicates.
And then this notion of nested structures, where you have things within other things within other things, or groups within other groups within other groups. And so that's the key plan.

So what I want to do is again, refresh us a little bit on standard ANOVA, then be fairly explicit about what the model is that underlies these nested variance structures. And then we'll actually go back and work through how we extend the ideas of ANOVA to actually be able to estimate these different variances. And then finally, there are some implications of these variance structures for how you would design experiments. So for example, if you want to get the best estimate possible for that die-to-die variation, should you make more die measurements? Or should you have more wafer measurements? How would you allocate a total budget of measurements across that nested variance structure? So that's the plan.

And the key idea here, again, is there is a big difference, an important difference that I want to cover, between the standard ANOVA and these nested variance structures. So what is it in standard ANOVA we were doing? So there was a very basic question in the standard ANOVA, which is, we're basically asking: if I were just having a single source of random variation at work, and we make, actually, the assumption that we're sampling from a normal distribution, when I look from one group to another, do I see evidence that something besides just random sampling from that single normal distribution is at work? Is there a fixed effect that is large enough that I think it's more than just a random chance that I got an observed difference between the means of a couple of different groups?

And the basic approach mechanically to do that was, first off, we needed some estimate of the variance of just the natural variation, the pure replication error.
And what we basically did is treated each of our treatment groups, looked at each one, and said, OK, there's a local mean of that treatment group. What's the replication variance around that local mean? I wasn't changing the parameters now. But for each of the ones with the design fixed, I just looked at replication. So that was one estimate, that's our natural variation.

Second, we looked at the group-to-group deviations. And we said, either this is due to a fixed effect, a systematic fixed effect that's different between those two groups, or it's just random chance because of sampling.

And then we looked at the ratio of those two variances, and said, by random chance alone, what would I expect based on the size of the samples, the number of data points going into each of those? Look at the F ratio for the statistics associated with that ratio of variances to see if it is likely to have occurred by chance alone. All sounds familiar? You could do that in your sleep? OK.

So here pictorially is what was going on with the ANOVA example. And in fact, I think this was the actual data we used when we first introduced ANOVA a few weeks ago. It's the simplest possible ANOVA setup you could have, with four data points. And what's beautiful about four data points is we could actually do the whole calculation by hand.

So here was the situation. I had two different groups, group 1 and group 2, with the data points 3 and 5, and 7 and 9. And again, the basic idea is, we need to estimate, number 1, the within group variation. And here we have two different local means with two different estimates of variance. We pool those to get a pooled estimate of natural variation. And then we looked at the between group, the deviation between the means of the two groups. And we used that to estimate a group-to-group variance. And then we looked at the ratio between those two. And we saw it in this example.
I can't remember; I think it was fairly marginal with this few data points whether it was actually, in fact, statistically significant, whether this mean effect was real or not, because with so few data points, there's really quite a large spread in the variances that you might actually come up with by chance.

OK, so what I want to do is go back and mathematically identify what the implied models were with the standard ANOVA. This is kind of pedantic here, because I think we do understand this. But I'm going to extend this in a minute and contrast it with the nested variance. So I want to be very, very clear.

Our null hypothesis in ANOVA, what we think is happening if there's no fixed effects, is basically every data point simply is being drawn from the same distribution. It has some fixed mean, and then it's got some random variation. And each sample is being pulled from the same normal distribution with just one underlying variance at work. And recall that variance might be measurement variance together with process variance, but it's all identical. So that's what's happening if there is no fixed effect.

The alternative hypothesis that we're looking at, and we're trying to see is there evidence of, says basically that there is a fixed effect going on.

Now I'm going to introduce a little bit of notation, and I actually spent some time trying to do the best I could to make this notation consistent throughout all of the slides today. We'll see if I succeeded in that, because there were a lot of changes to notation. I'm going to use the i subscript here in this case to indicate what later we'll see is the outermost level of variance. So in this simple picture, as we go through, i indicates which subgroup I'm in. So in our simple data, I had two subgroups. I had group 1 and group 2, with two data points each. So the i subscript here could be either 1 or 2, indicating group 1 or 2.
And so the point is, there may be a fixed offset, either t1 or t2 from the grand mean, as a fixed effect associated with being a member of that group. And now the j is a subscript, with this funky little notation down here, that indicates this is data point j, the jth replicate within -- read those parens right there -- within subgroup i. So I had two replicates within each subgroup i. And the simple point here is that our alternative hypothesis is, there is this fixed offset, t1 or t2, at work, in addition to the random variation. And that's what we're trying to look at the estimate for.

But this is standard ANOVA. And so there is still this assumption that there really is only one random variation at work. There's only one random source of variation, and then there's this fixed effect on top of that that's repeatable. It's systematic.

Now, what I'm also going to do, this is a slide I added. I'm going to show a simple ANOVA table. This, in fact, we showed you earlier a few weeks ago for that very simple set of data. And then I'll go back to the previous slide, just to be very careful on some of the subscripts and notation that I'm using. But this is essentially the same data that I was talking about before. We have two groups. And within each group, we had two replicates.

And this part up here is the scratch worksheet that I might use to generate our ANOVA table down at the bottom. These are the sorts of calculations that either your statistics package would do, or you would do by hand. And it includes things like calculating the group average for each of those two groups. Remember, the average of 3 and 5 is 4, and the average of 7 and 9 is 8. And then some additional squared deviation calculations that go into, ultimately, the sum of squared deviations calculation that falls into the sum of squares column in your table. After which we divide by the degrees of freedom to get these mean square estimates.
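Written out compactly, the model and the worksheet quantities just described are as follows (a restatement rather than an extra slide, with a denoting the number of groups and n the total number of data points):

```latex
% Standard one-way ANOVA, fixed-effects model
% Null hypothesis:        x_{(i)j} = \mu + \epsilon_{(i)j},  with  \epsilon_{(i)j} \sim N(0, \sigma^2)
% Alternative hypothesis: x_{(i)j} = \mu + t_i + \epsilon_{(i)j},  with fixed group offsets t_i
\[
  SS_E = \sum_i \sum_j \bigl(x_{(i)j} - \bar{x}_i\bigr)^2, \qquad
  SS_G = \sum_i \sum_j \bigl(\bar{x}_i - \bar{\bar{x}}\bigr)^2, \qquad
  SS_D = \sum_i \sum_j \bigl(x_{(i)j} - \bar{\bar{x}}\bigr)^2 = SS_G + SS_E
\]
\[
  MS_E = \frac{SS_E}{n - a}, \qquad
  MS_G = \frac{SS_G}{a - 1}, \qquad
  F = \frac{MS_G}{MS_E}
\]
```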
So I've got these little notations, s sub d, s sub g, s sub e, and then sum of squared deviations, sum of squared g, sum of squared e. And that's what I've tried to detail out here, what these definitions are, back here in slide 7. So this is still intermediate calculations, standard ANOVA. This is still all review. But some of the calculations, some of the terms that go into that, with a little bit of this funky new notation.

So again, recall what we need to do is look at, OK, what is-- sometimes down here. Our first thing is our estimate of what just pure replicate error is. We're just looking for pure replicate error. And that's basically looking and saying, I've got my local mean per group i, group 1 or group 2. And then I've got the data within that. That's just the deviation of that individual point from its local mean. I can take the square of that deviation. And then I take the sum of that over all of my data, sum of squared deviations, as my ss sub e. That's my sum of squared deviations in all of my data from the local mean. That's not quite my estimate of variance. If I now take sse and divide it by the degrees of freedom, that gives me my mean square error, which is my estimate of the underlying variance. Right?

Now the other things that we do, we said, the second piece here is we also look at the fixed effect. So here, we're just looking at the deviations of the group mean from the grand mean. And so here's the local mean, and here's the grand mean. That has the double bar over it, if you can quite see the notation there. And so for that, I've also got a sum of squared errors here in ssg. This is the total sum of squared deviations, one for each data point. So each data point shares the same group mean. So I'll have multiple entries in the table. Because if I have multiple replicates, they all share the same group mean.
But I'm basically saying, OK, all of those contribute to group mean deviations in the total deviations in my data.

And then the last thing, I'm going to put a 0 by this. Because this last thing is sort of the total deviations in your data from the grand mean. This is just your raw sum of squared deviations. So I just calculate my grand mean, and I treat all of my data equally, whether it's within a group or not. And that's just throwing all of my data into one big bucket, calculating a grand mean, and saying, there's a total amount of sum of squared deviations in that data.

And what we want to do is actually take that total amount of deviations, and now divvy it up, and say, how much of it is going to be because of group-to-group variation, and how much of it is pure replication variation? That's where we're going to want to get to. But in standard ANOVA, what we simply do is remember that ssd is equal to the sum of ssg and sse. It was, in fact, sort of a crude attempt at divvying up the total squared deviations. But what we'll see is that it's actually not the right way to get an estimate of the underlying variances.

And so now in this table, that's all that was going on. I just applied all of these formulas to get the scratchpad to work, and then put that into the ANOVA table. And in this case, we saw the total sum of squared deviations from the grand mean. We have an estimate there of the mean square. We had the group-to-group sum of squares and the replicate sum of squares. We could use that to form an F test. And we see that there's only 89% confidence here; in other words, there's about an 11% chance that you would observe that big of a ratio by chance alone. So if I had a cutoff of 95% confidence or 90% confidence with that data, I would have said, sorry, I don't have strong enough evidence to say that there's actually a group effect. There's not a fixed effect here in this data. OK?
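For reference, here is a minimal Python sketch (the variable names are just illustrative) that reproduces these scratch-worksheet numbers and the roughly 89% confidence figure for the 3, 5, 7, 9 example:

```python
# One-way ANOVA by hand for the toy example: group 1 = {3, 5}, group 2 = {7, 9}.
from scipy import stats

groups = [[3.0, 5.0], [7.0, 9.0]]
all_data = [x for g in groups for x in g]

grand_mean = sum(all_data) / len(all_data)              # 6.0
group_means = [sum(g) / len(g) for g in groups]         # [4.0, 8.0]

# Sums of squared deviations: replicate (within group), group-to-group, and total.
ss_e = sum((x - mean) ** 2 for g, mean in zip(groups, group_means) for x in g)         # 4.0
ss_g = sum(len(g) * (mean - grand_mean) ** 2 for g, mean in zip(groups, group_means))  # 16.0
ss_d = sum((x - grand_mean) ** 2 for x in all_data)     # 20.0 = ss_g + ss_e

# Mean squares = sum of squares / degrees of freedom, then the F ratio.
a, n = len(groups), len(all_data)
ms_e = ss_e / (n - a)        # 2.0  (pooled replicate variance)
ms_g = ss_g / (a - 1)        # 16.0 (group-to-group mean square)
F = ms_g / ms_e              # 8.0

p = stats.f.sf(F, a - 1, n - a)
print(f"F = {F:.1f}, p = {p:.3f}, confidence = {100 * (1 - p):.0f}%")   # about 89% confidence
```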
So what's wrong with that? That's all great. There's nothing wrong with that. But if what you're really trying to do is deal with a different situation, a nested variance situation, that simple ANOVA can actually lead you astray.

So now, we're actually shifting to a different model. But it's a model that I think comes up quite a bit, which is, instead of saying there was a fixed effect, so that every time I had group 1, I would have a deviation t sub 1 for that group, what if every time instead, I had pulled another sample that I just lumped together as group 1? Maybe it's a new wafer. It, in fact, has a wafer random variation, coming from a different source than the replicate data within that wafer. So in other words, I'm moving away from a fixed effect group to group, to a random effect group to group. There's a different random model, a g sub i here, which is not the fixed effect. I guess we called that a t sub i in the previous model. But in fact, this g sub i is now also drawn from a different random variation, or a different normal distribution, in this case.

I have a very different model. Now, what I'm really interested in is dealing with a situation where there's two different variances at work. I still have replication variance, or the within group variance. So imagine again, I'm pulling now two different wafers. g1 is now wafer 1, g2 is now wafer 2. And within that, I measure two different chips. Now what I'm interested in knowing is what the chip-to-chip variance is, and what the wafer-to-wafer variance is, as if it were random. So that's the key difference. And what we want to do is still be able to decide, is there something significant going on? But really, what I want to do is estimate what those variances are.

So here's the same picture. This is the same data. But now, I have a different kind of picture.
I still need to estimate number one, which is the within group variances. But now instead of having a fixed effect, what I want to do-- let me call that 2 star. I want to attribute the group-to-group deviations that I observe as indications of, or samples of, group-to-group variances. So is this setup clear? OK.

So our real goal here is to estimate these two variances. And by the way, point estimates are not going to be good enough. We actually want confidence intervals on these estimates. So that's going to get a little bit tricky. But what we would like to actually do is decompose the total variance we observe: estimate what the total variance is, and decompose it into these two different sources.

So, here's our first attempt. I'll call this the naive attempt. Why don't we just reuse all those calculations we already had for ANOVA? Aren't they really telling us the same thing? Think of it. Back in ANOVA, we were already saying, OK, under the null hypothesis, we had replication variance, and we had a variance due to group-to-group variation.

AUDIENCE: [LAUGHS]

DUANE BONING: Can't I just use those directly as my estimates of the two variances? I had a ratio of those two. Why aren't those two just great estimates of those two variances? So our naive attempt, our first attempt, is let's just use what we already did. Number one, we had an estimate of pure replication error. Think of that as within group variance, right? Let's just reuse that. That's our estimate of within group variance. Number two, we had this between group thing. We had a group mean square, deviation of the group means. Why not simply estimate the group-to-group variance as the mean square of those group deviations? So, sum of squared group-to-group deviations divided by the degrees of freedom.
And in terms of total variance, if I assume, again, there are two different variances at work, there is within group and group-to-group, and they're independent, then my total variance should just be the sum of those. Why don't I simply use that sum of squared deviation divided by the degrees of freedom? It's just a total mean squared deviation. Why not use that as my estimate, just my total variance? Throw all my data into one big bucket, calculate total variance. Isn't that my best estimate of the total variance in the system? Seems natural, doesn't it? In fact, we can calculate the grand mean, I can calculate the grand variance. Isn't that a great estimate of total variance in the system under this model?

And well, the good news is, at least something still sticks. The within group variance estimate is still good. But we would be wrong, and I'll show you why, in this naive approach, with our estimate of between group variance. And bizarrely, we're even wrong on our estimate of total variance in the system, just using all of our data, undifferentiated. Yeah.

AUDIENCE: Are they supposed to be [INAUDIBLE] really the sum of the other two [INAUDIBLE]?

DUANE BONING: So the question here, for folks in Singapore, was, shouldn't the last one be equal to the sum of this one, a plus b? Shouldn't that equal the total? And the answer is no.

If you actually look back at the ANOVA table, let me do that. If you look back at the ANOVA table, let's look at it down here at the bottom. The sum of squares is the one that is conserved. So your total squared deviations, you break apart into within group or group-to-group. But by the time you divide it by degrees of freedom, the mean square or the variances do not add. And that's weird.

And it turns out, in our second problem, the nested variance structure, they should add. We're saying, there is two different sources of variance at work.
There's within-group variance, there's group-to-group variance, they're independent. So my total variance actually should be the sum of the two. And we're going to use that knowledge, actually, to help us with estimating things. So in fact, your insight is right, not for standard ANOVA, but it is right for the nested variance structure.

Now that I've broken down your confidence in using ANOVA, hopefully, we can rebuild what it is we really want to do. I think we've already talked about this a little bit. Again, I think that these nested variance structures actually arise a lot in any kind of process where there's any kind of batch processing at work, either in time or in space. So it's easiest, in my mind, maybe because I'm just mostly familiar with semiconductor manufacturing, to really see that. But I think it's much more generally true anytime I have some grouping of stuff -- of parts, material, whatever -- that I want to look at stuff within that group, and then group-to-group, and maybe then there's a hierarchy where there's a larger group of groups. And I want to estimate variances within that. That kind of hierarchical or nested structure, I think, comes up all the time.

And an important point that I haven't really explicitly said, but would like to mention here, is that the reason we're interested in estimating these variances differently is that very often, physically in the process, the source of the variation at each of those levels is actually different. It's a different, orthogonal, different kind of variation source. So for example, in the wafer case, the source of variation chip-to-chip may have to do with non-uniformity within the tool, maybe spatial or other kinds of random, in many cases even spatial systematic variations. But the point is, there's a different kind of set of physics governing how well-matched each of those chips are within one wafer. It's a different source of physics than every new wafer I put into that single wafer tool, from one time to the next.
That may also have deviations due to how well I can control some of the parameters of that process in time, from one run to the next. So it seems very natural that indeed, the variance, the underlying variation in those two cases, are different. They are orthogonal. They are, in some sense, additive in that sense. They are uncorrelated. And we'll use that assumption. But there's that underlying assumption in here. If, in fact, it's just arbitrary grouping where there's only one source of variation at work, and it's just random pulling of things to form, in fact, different groups, then that's more like the standard ANOVA. But when there's a nested structure, items within items within other items, usually there's a different source of physics causing the variation at each level.

OK, so again, our goal is to estimate each of these sources of variation, both point estimates and confidence intervals. Here's some examples. I think we've already talked about this. The within wafer versus, say, the wafer-to-wafer, or run-to-run variability.

OK, so let me build up a little bit of the explicit model for this case, especially with multiple layers of nesting. We'll start without nesting. Then we'll do one level of nesting. And then working up to the second example, we'll do a three-level structure, two levels of nesting. But we basically will have points within wafers within lots. So in fact, you can keep extending that. You could have lots within products, and products within fabs, and who knows? You could get further detailed.

But here's the simplest model. And this really is the pure variance case without any nesting at all. We basically have our individual measurements, there's an overall mean, and then there's some random variation occurring. And here I'm indicating that I'm taking multiple measurements, multiple samples, multiple replicates, with this m sub i.
And the point is, it's simply a 0 mean normal distribution with some variance. And as we've said, many of our assumptions still hold. And in fact, we often make these assumptions and use them, and we have to be careful in some of the spatial situations I've talked about, because they don't actually hold. In some sense, what we're assuming in this case is, I'm taking multiple measurements on the same wafer. But I'm assuming randomness in those measurements. I'm assuming each of those individual measurements, or each of those individual replicates, is IIND, identically and independently distributed from a normal distribution that's 0 mean, with all sharing the same variance.

And just as a precursor, we'll come back to this, actually, in one of the case studies. If I have within wafer variation, chip-to-chip variation, and I have a systematic variance, center to edge -- maybe it's always thinner in the middle of the wafer and thicker on the edge of the wafer -- that's a systematic effect that this model is not really good at capturing. So you've got to be careful. This is actually a fairly strong assumption of random sampling within that particular scenario.

OK, but this is the simple case. Now let's look at this variance structure. And here I'm using exactly the notation out of Drain's book. So when you read Drain's chapter, this should look familiar to you. In this case, now I've got an overall mean. And again, we've got a wafer effect. It's not a fixed effect, it's a random effect. Meaning every time I pull a new wafer out of my samples, all of the data points on it will share the same offset. It's an offset of w sub i. But w sub i itself, or wafer sub 1, or wafer sub 2, is itself drawn from a 0 mean normal distribution with variance sigma squared sub w. So the amount of wafer offset for that particular wafer is randomly sampled. And then within that, I make multiple measurements, I make j measurements within wafer sub i.
And each of those individual measurements is itself also randomly distributed. OK? So that's just repeating what our situation here was. But now, instead of using that generic model, I'm really trying to illustrate it with wafers and measurements within wafers.

So if I take this basic formula right here, and I basically ask, OK, now if I want to do variance calculations, if I do a variance on this, what is the variance I'm going to observe in my total data? I probably should have replicated that equation right up here: we had x sub ij is equal to mu plus w sub i plus m sub j of i. So if I threw all of my data into one big bucket, x sub ij, and I simply calculated the variance across all of my data -- not just my data, but in fact, if I had the total population, infinite numbers of measurements -- what would the variance of that be?

And we're saying that the w sub i and the measurements within that are independent, so these variances add; there's no correlation between them. The variance of a constant is 0. We have the wafer-to-wafer variance, so that's my sigma squared w. And I have my measurement variance. And so my total variance, my total true variance, is simply the sum of those two other independent variances. And so that's your earlier intuition that Nalish was talking about. So the individual variances are assumed to be independent. And again, that was not true in that naive attempt.

So how do we do it? How do I use the data to actually get good estimates of these independent variances? And here's the key idea. We've got sampling at work. Essentially within the replicates, I've got additional variances going on, because of measurement or replication variance. And that is contaminating or adding some noise to my estimate of the group-to-group variance.
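Collecting the nested model just described into symbols (roughly following Drain's notation):

```latex
% Nested variance model: measurement j within wafer i
\[
  x_{ij} = \mu + w_i + m_{(i)j}, \qquad
  w_i \sim N\left(0, \sigma_w^2\right), \qquad
  m_{(i)j} \sim N\left(0, \sigma_m^2\right),
\]
% with w_i and m_{(i)j} independent, so the true total variance is the sum of the two:
\[
  \mathrm{Var}\left(x_{ij}\right) = \sigma_T^2 = \sigma_w^2 + \sigma_m^2 .
\]
```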
So what we basically need to do is unwrap that, recognize that I've got multiple samples around that, and pull out that random variance away from our best estimate of the group-to-group variance. We need to account for the fact that I've got also sampling noise going on when I'm trying to estimate the group-to-group variance.

So what happens is, if I calculate the wafer average observed variance just from my data -- so if I observe w bar, I observe wafer average 1, wafer average 2, wafer average 3, and I look at the variances of these -- the observed variance in the wafer averages actually has the true wafer-to-wafer variance in it. But it's also got this sampling noise attached to it.

And so what we want to do, to get to the true variance of just the wafer-to-wafer variance, is subtract this off of the observed. So if this, now again, is my observed wafer-to-wafer average variance, I subtract off the sampling noise, my best estimate because of sampling. And that gives me my best estimate of the actual wafer-to-wafer variance.

Now, I added a slide here that actually shows a derivation of this. But before I go into the derivation, hopefully there's good intuition here. This is just the sampling contamination of noise, right? And for example, if I had a million replicate measurements, so if m were a million, I would be averaging out all of those small measurements, pure replication deviations. They are 0 mean. So on average, this term gets smaller and smaller with more and more sampling of the wafer. And what I would observe in that case, from my wafer-to-wafer average variance, is really, really close to the true one. This is only a problem when m is small, and I've got lots of replicate noise or measurement noise contaminating my estimate of the wafer-to-wafer variation. So hopefully there's some degree of intuition here that makes sense.
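As a sanity check on that intuition, here is a minimal simulation sketch in Python (the particular variances and sample sizes are just illustrative): the variance of the observed wafer averages carries an extra sigma_m^2/m of sampling noise, and subtracting that term off recovers the true wafer-to-wafer variance.

```python
# Simulate the nested model x_ij = mu + w_i + m_(i)j and examine the wafer averages.
import numpy as np

rng = np.random.default_rng(0)
mu, sigma_w, sigma_m = 10.0, 2.0, 1.5       # illustrative "true" values
n_wafers, m = 100_000, 5                    # many wafers, m measurements per wafer

wafer_offsets = rng.normal(0.0, sigma_w, size=(n_wafers, 1))   # w_i, shared within a wafer
noise = rng.normal(0.0, sigma_m, size=(n_wafers, m))           # m_(i)j, per measurement
x = mu + wafer_offsets + noise                                  # all observed data

wafer_means = x.mean(axis=1)                 # observed wafer averages
var_of_means = wafer_means.var(ddof=1)       # observed wafer-to-wafer variance

print(var_of_means)                          # ~ sigma_w^2 + sigma_m^2 / m = 4.0 + 0.45
print(sigma_w**2 + sigma_m**2 / m)

# Correction: subtract the sampling noise to estimate the true wafer-to-wafer variance.
sigma_m_sq_hat = x.var(axis=1, ddof=1).mean()        # pooled within-wafer variance
sigma_w_sq_hat = var_of_means - sigma_m_sq_hat / m
print(sigma_w_sq_hat)                        # ~ sigma_w^2 = 4.0
```

Making m very large drives the sigma_m^2/m term toward zero, which is the million-replicate intuition above.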
If I actually now go in and do the calculation -- first off, what I said is, we have three or four or five different wafers, and I have some number of measurements within each wafer. And I'm simply calculating for each of those wafers what the observed average is. Up here is the formula for the wafer average. And then down here, I'm just applying various mathematics to that formula.

And so what's going on here is, we can see, we've got our wafer average for wafer sub i, is simply 1 over the measurements for all of the j replicates, 1 through m. We've got m replicates of that within that wafer sub i. Which I've just expanded out -- I just plugged in x sub i in here. Now if I expand that summation out, I've got m replicates of mu. So that's my m mu. I've got m replicates, all with the same w sub i. So that's that right there. And then I've got my remaining part of my sum for my individual replicates. And just multiplying that out, my overall wafer average for that wafer sub i is mu plus the shared offset because of the sample for that particular wafer, and then all my measurement noise.

And now I can apply my variance to the observed -- this is the observed wafer-to-wafer average -- is the variance of the mean, which again, is 0, because that's just a constant. This is the true variance, sigma w squared. And then here, I've got m replicates, all with the same variance. So I've got a constant, which gives me, when I do my variance math, my 1 constant squared, my 1 over m squared. And then the sum gives me m equal variances. So that's where we get back to our basically 1 over m sampling variance.

So that's the derivation going on, if you actually want to see the whole detail. But again, all that that's trying to say is, if I actually look group-to-group in the observed means, it's got two things inside of it. It's got the true wafer variance. But it's also got noise.
769 00:43:49,900 --> 00:43:53,980 And the noise comes from the underlying measurement 770 00:43:53,980 --> 00:43:56,320 noise or the underlying replicate noise, 771 00:43:56,320 --> 00:43:59,680 reduced by the factor of number of replicates I have. 772 00:43:59,680 --> 00:44:03,580 My typical 1 over n reduction in variance. 773 00:44:03,580 --> 00:44:07,580 But that's what's contaminating my noise. 774 00:44:07,580 --> 00:44:12,820 So once I have that, there's one quick observation to make here. 775 00:44:17,980 --> 00:44:21,330 Before we actually use that to get back 776 00:44:21,330 --> 00:44:24,690 to our estimate of what the true variance is, is to go back 777 00:44:24,690 --> 00:44:27,270 to the earlier point we made. 778 00:44:27,270 --> 00:44:33,480 We said that the total variance, the true total variance 779 00:44:33,480 --> 00:44:35,850 should be the independent sum of these two 780 00:44:35,850 --> 00:44:38,190 sources of variance. 781 00:44:38,190 --> 00:44:42,120 But the same sampling contamination also 782 00:44:42,120 --> 00:44:45,720 occurs not just for what our observed wafer-to-wafer 783 00:44:45,720 --> 00:44:50,010 variance is, but if I were, in fact, to actually 784 00:44:50,010 --> 00:44:56,130 calculate my grand variance in just the data that I observed, 785 00:44:56,130 --> 00:44:59,520 I took all of my data, threw it into one pool, 786 00:44:59,520 --> 00:45:03,480 calculated a mean, and then did the sum of squared deviations 787 00:45:03,480 --> 00:45:06,750 of all of my data from the grand mean, divided it 788 00:45:06,750 --> 00:45:10,115 by the number of data points I had minus 1, 789 00:45:10,115 --> 00:45:12,490 I used one degree of freedom to calculate the grand mean. 790 00:45:12,490 --> 00:45:17,040 So I'm just estimating the total variance in all of my data. 791 00:45:17,040 --> 00:45:21,000 That's the sigma squared t observed. 792 00:45:21,000 --> 00:45:26,550 The observed total variance is different, 793 00:45:26,550 --> 00:45:35,760 is not equal to the actual true, total variance in my system 794 00:45:35,760 --> 00:45:37,090 at work. 795 00:45:37,090 --> 00:45:41,130 In fact, the observed total variance will always 796 00:45:41,130 --> 00:45:45,750 be smaller than the actual true total variance. 797 00:45:45,750 --> 00:45:50,820 And the reason is, if I look at the total observed variance, 798 00:45:50,820 --> 00:45:51,600 oops. 799 00:45:51,600 --> 00:45:55,300 I think this is an error here. 800 00:45:55,300 --> 00:45:57,510 I think this should be a d. 801 00:46:00,640 --> 00:46:03,790 That should be ss sub d. 802 00:46:03,790 --> 00:46:05,530 My total sum of squared deviations 803 00:46:05,530 --> 00:46:06,880 divided by m minus 1. 804 00:46:06,880 --> 00:46:11,330 I've also expanded it out into where that data is coming from. 805 00:46:11,330 --> 00:46:13,690 And the point is, I've got my wafer variances, 806 00:46:13,690 --> 00:46:15,120 I've got my measurement variances, 807 00:46:15,120 --> 00:46:19,940 and I've got some number of replications of each of those. 808 00:46:19,940 --> 00:46:24,310 And if I expand that out, essentially what I've got 809 00:46:24,310 --> 00:46:27,400 is multiple samples at work. 810 00:46:27,400 --> 00:46:32,500 When I make my observed calculation of total variance, 811 00:46:32,500 --> 00:46:34,840 there are factors that are multiplying 812 00:46:34,840 --> 00:46:37,510 the true underlying independent sources 813 00:46:37,510 --> 00:46:42,100 of variance, but with factors that are always smaller than 1.
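One way to make those smaller-than-1 factors concrete, under a balanced two-level random-effects model with g groups of m replicates each (n = g m total points), is the expectation

E\!\left[ s^2_{T,\mathrm{obs}} \right] = \sigma_m^2 + \frac{n-m}{n-1}\,\sigma_w^2 \;\le\; \sigma_m^2 + \sigma_w^2,

so the pooled grand-mean calculation always underweights the group-to-group component. The slide's expansion may group the terms differently, but the conclusion is the same: the naive pooled total variance is biased low.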
814 00:46:42,100 --> 00:46:47,230 In other words, I've got m kinds of replications going on 815 00:46:47,230 --> 00:46:51,520 with sampling from multiple samples from replication. 816 00:46:51,520 --> 00:46:54,580 And I always get that 1 over n reduction in variance. 817 00:46:54,580 --> 00:46:56,710 I've got that same thing happening, 818 00:46:56,710 --> 00:47:00,550 both within the measurement reduction, 819 00:47:00,550 --> 00:47:03,790 and within the wafer reduction. 820 00:47:03,790 --> 00:47:05,350 So the simple point here is, this 821 00:47:05,350 --> 00:47:11,620 is why that naive attempt to just use total variance 822 00:47:11,620 --> 00:47:18,160 and use that as my estimate of the true independent sum 823 00:47:18,160 --> 00:47:22,313 of variances at work, why that doesn't apply, 824 00:47:22,313 --> 00:47:23,230 why that doesn't work. 825 00:47:27,500 --> 00:47:31,680 So, now we've got a strategy here. 826 00:47:31,680 --> 00:47:36,350 What we're going to do is number one, estimate 827 00:47:36,350 --> 00:47:37,730 within group variances. 828 00:47:37,730 --> 00:47:39,800 That's still OK. 829 00:47:39,800 --> 00:47:44,090 Number two, we're going to see the observed 830 00:47:44,090 --> 00:47:49,460 group-to-group variance, but then account for sampling, 831 00:47:49,460 --> 00:47:54,530 subtract off the sigma squared m over m, from that, 832 00:47:54,530 --> 00:47:59,450 to get our best estimate of true group-to-group variance. 833 00:47:59,450 --> 00:48:01,040 Now I have group-to-group variance, 834 00:48:01,040 --> 00:48:03,200 I have within group variance, I can 835 00:48:03,200 --> 00:48:06,830 add those to get my best estimate of total variance. 836 00:48:06,830 --> 00:48:08,700 That's the strategy. 837 00:48:08,700 --> 00:48:13,740 So let's go back to our really simple nested variance example, 838 00:48:13,740 --> 00:48:16,885 and use that strategy not the naive approach, 839 00:48:16,885 --> 00:48:20,000 but replacing the naive approach and see 840 00:48:20,000 --> 00:48:22,460 how the numbers come out. 841 00:48:22,460 --> 00:48:25,730 So again, here this was step one of our strategy. 842 00:48:25,730 --> 00:48:30,620 The within group variance, that's still the same. 843 00:48:30,620 --> 00:48:35,840 The observed group-to-group variance 844 00:48:35,840 --> 00:48:39,440 is simply this mean, which was 4, 845 00:48:39,440 --> 00:48:45,920 and this mean, which was 8, from the grand mean, which was 6. 846 00:48:45,920 --> 00:48:49,850 So each of those group-to-group deviations is 2. 847 00:48:49,850 --> 00:48:55,250 And that's squared, I've got 4 plus 4 is 8. 848 00:48:55,250 --> 00:48:58,532 So that's my observed group-to-group variance. 849 00:48:58,532 --> 00:48:59,990 And that's actually the same number 850 00:48:59,990 --> 00:49:03,770 that we had calculated using regular ANOVA. 851 00:49:03,770 --> 00:49:06,710 And then we did the ratio between those two 852 00:49:06,710 --> 00:49:10,980 in order to decide if there was something significant going on. 853 00:49:10,980 --> 00:49:15,410 But now the point is, that was our contaminated estimate 854 00:49:15,410 --> 00:49:16,370 of group-to-group. 855 00:49:16,370 --> 00:49:18,290 That's our observed. 856 00:49:18,290 --> 00:49:22,340 It may be hard to see there, that's the g to g bar. 857 00:49:22,340 --> 00:49:25,640 That's our group-to-group observed averages.
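As a concrete sketch of this three-step strategy, here is the calculation in Python on hypothetical data chosen to match the numbers quoted in this simple example (two groups with means 4 and 8, within-group variance 2, m = 2 measurements per group); it also anticipates the correction described next.

import numpy as np

# Hypothetical two-group data consistent with the quoted numbers.
groups = [np.array([3.0, 5.0]), np.array([7.0, 9.0])]
m = len(groups[0])                                    # replicates per group

# Step 1: within-group (replicate/measurement) variance, pooled.
s2_within = np.mean([g.var(ddof=1) for g in groups])  # -> 2.0

# Step 2: observed group-to-group variance of the group means,
# then subtract the sampling contamination sigma_m^2 / m.
group_means = np.array([g.mean() for g in groups])
s2_gbar_obs = group_means.var(ddof=1)                 # -> 8.0
s2_group = s2_gbar_obs - s2_within / m                # -> 7.0

# Step 3: best estimate of total variance is the sum of components.
s2_total = s2_within + s2_group                       # -> 9.0

# Naive pooled estimate, for comparison: biased low (about 6.67 here).
s2_naive = np.concatenate(groups).var(ddof=1)

print(s2_within, s2_gbar_obs, s2_group, s2_total, s2_naive)

Running this gives within-group 2, observed group-to-group 8, corrected group-to-group 7, total 9, and a naive pooled estimate of about 6.7, matching the numbers discussed next.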
858 00:49:25,640 --> 00:49:30,530 And so now we want to account for the contamination, 859 00:49:30,530 --> 00:49:34,070 subtract off the sampling effect to get 860 00:49:34,070 --> 00:49:38,900 to our best estimate of the true group-to-group variance. 861 00:49:38,900 --> 00:49:42,050 And it's what we observed, the 8. 862 00:49:42,050 --> 00:49:49,940 But now, we subtract off the sampling noise to get to the true variance. 863 00:49:49,940 --> 00:49:52,940 So this is our sigma squared m divided 864 00:49:52,940 --> 00:49:56,280 by the number of measurements I had in each of those groups, 865 00:49:56,280 --> 00:50:00,150 the sampling effect, which was m equals 2, in this case. 866 00:50:00,150 --> 00:50:03,500 So I'm subtracting that component off. 867 00:50:03,500 --> 00:50:05,840 So I'm peeling that part out. 868 00:50:05,840 --> 00:50:09,890 And I get a best estimate now of the true group-to-group 869 00:50:09,890 --> 00:50:12,590 variance is 7. 870 00:50:12,590 --> 00:50:17,600 So now I've got, within group is 2, group-to-group is 7. 871 00:50:17,600 --> 00:50:21,740 My total is now the sum of those two, or 9. 872 00:50:25,290 --> 00:50:28,320 So that's different than what we had seen before. 873 00:50:28,320 --> 00:50:34,950 Our observed total variance, I can't remember what it was. 874 00:50:34,950 --> 00:50:40,300 But it was smaller than that. 875 00:50:40,300 --> 00:50:45,950 OK, that's pretty much the core of the idea. 876 00:50:45,950 --> 00:50:47,690 Let's just do a couple of examples. 877 00:50:47,690 --> 00:50:51,590 And these are examples out of Drain's book. 878 00:50:51,590 --> 00:50:54,740 This is the one starting on page 196. 879 00:50:54,740 --> 00:50:58,250 So it's a little bit more data than our four data points. 880 00:50:58,250 --> 00:51:01,490 But the basic idea is still there. 881 00:51:01,490 --> 00:51:05,540 And this example is looking at the resistivity variation 882 00:51:05,540 --> 00:51:07,760 across multiple wafers. 883 00:51:07,760 --> 00:51:11,750 So he has 6 different wafers, 1, 2, 3, 4, 5, and 6. 884 00:51:11,750 --> 00:51:15,780 And in each case, he's making three replicate measurements. 885 00:51:15,780 --> 00:51:17,090 So we have to be careful again. 886 00:51:17,090 --> 00:51:19,280 We're assuming that he's randomly 887 00:51:19,280 --> 00:51:26,120 sampling within the wafer to get replicate measurements of the 888 00:51:26,120 --> 00:51:28,130 within wafer variation. 889 00:51:28,130 --> 00:51:31,250 And then we're looking at wafer-to-wafer. 890 00:51:31,250 --> 00:51:35,810 Qualitatively, before we start going and applying 891 00:51:35,810 --> 00:51:41,870 all this machinery, what is this data basically telling you, 892 00:51:41,870 --> 00:51:42,800 qualitatively? 893 00:51:42,800 --> 00:51:46,610 And then we'll see if, in fact, the calculations come out 894 00:51:46,610 --> 00:51:49,190 with something that looks consistent with that. 895 00:51:49,190 --> 00:51:51,260 First off, do you think the within wafer 896 00:51:51,260 --> 00:51:55,860 variation is bigger, or is the wafer-to-wafer variation 897 00:51:55,860 --> 00:51:56,360 bigger? 898 00:51:59,170 --> 00:52:01,900 There's a total amount of deviation in this data. 899 00:52:01,900 --> 00:52:04,210 But where's the main source of this? 900 00:52:04,210 --> 00:52:07,300 Percentage-wise, what do you think the lion's share is? 901 00:52:07,300 --> 00:52:09,490 AUDIENCE: You have the wafer-to-wafer variance. 902 00:52:09,490 --> 00:52:11,073 DUANE BONING: Yeah, it's pretty clear.
903 00:52:11,073 --> 00:52:14,920 There's nice clustering within each wafer. 904 00:52:14,920 --> 00:52:17,990 There is some spread, we have to think about that. 905 00:52:17,990 --> 00:52:23,710 But it looks like there's perhaps bigger 906 00:52:23,710 --> 00:52:25,750 wafer-to-wafer deviations than there 907 00:52:25,750 --> 00:52:28,690 are within wafer deviations. 908 00:52:28,690 --> 00:52:33,490 So I'd kind of be looking to see if I'm decomposing these two 909 00:52:33,490 --> 00:52:34,900 sources of variance. 910 00:52:34,900 --> 00:52:37,615 I am expecting the wafer-to-wafer variance 911 00:52:37,615 --> 00:52:40,090 to be a little bit larger. 912 00:52:40,090 --> 00:52:42,340 But I'm not completely sure, because they're 913 00:52:42,340 --> 00:52:44,530 spread within both. 914 00:52:44,530 --> 00:52:46,510 So I'd like to do the right thing 915 00:52:46,510 --> 00:52:50,720 and get good estimates of these two things. 916 00:52:50,720 --> 00:52:54,880 So Drain then goes through, and does an ANOVA. 917 00:52:54,880 --> 00:53:00,490 By the way, you can still do the typical ANOVA, 918 00:53:00,490 --> 00:53:03,160 because in fact, a lot of the intermediate calculations 919 00:53:03,160 --> 00:53:08,200 you reuse in doing the estimate of variance. 920 00:53:08,200 --> 00:53:11,740 And you still want to ask the question, 921 00:53:11,740 --> 00:53:15,700 do I have evidence that the wafer-to-wafer variance 922 00:53:15,700 --> 00:53:19,220 is bigger than the within wafer variance? 923 00:53:19,220 --> 00:53:21,950 Is there group-to-group deviation going on? 924 00:53:21,950 --> 00:53:26,080 So the ANOVA table is still valid for asking that question. 925 00:53:26,080 --> 00:53:27,700 And that's what he does here. 926 00:53:27,700 --> 00:53:30,580 He's got total deviations from the grand mean. 927 00:53:30,580 --> 00:53:34,000 Remember he had 6 wafers, 3 measurements each, 928 00:53:34,000 --> 00:53:35,890 18 total measurements. 929 00:53:35,890 --> 00:53:40,510 So the sum of squares divided by 17 930 00:53:40,510 --> 00:53:44,320 is an estimate of observed total variation. 931 00:53:44,320 --> 00:53:48,460 And then he calculates the wafer-to-wafer sum of squares. 932 00:53:48,460 --> 00:53:52,210 He has the residual, because he's got three replicates 933 00:53:52,210 --> 00:53:53,020 at each case. 934 00:53:53,020 --> 00:53:56,920 He can form the f over that and look at the statistics 935 00:53:56,920 --> 00:53:58,280 associated with that. 936 00:53:58,280 --> 00:54:03,400 And that ratio, 20 times as much mean square 937 00:54:03,400 --> 00:54:07,900 from wafer-to-wafer compared to within wafer, basically saying, 938 00:54:07,900 --> 00:54:09,250 that's very significant. 939 00:54:09,250 --> 00:54:12,610 There is definitely a wafer-to-wafer effect. 940 00:54:12,610 --> 00:54:15,580 That's not just the same as sampling 941 00:54:15,580 --> 00:54:17,950 coming from within wafer. 942 00:54:17,950 --> 00:54:19,760 Highly significant. 943 00:54:19,760 --> 00:54:23,470 So this is standard ANOVA. 944 00:54:23,470 --> 00:54:30,700 And then he very nicely plops out the variance decomposition. 945 00:54:30,700 --> 00:54:32,680 The variance components. 946 00:54:36,400 --> 00:54:37,950 And notice what he's done here. 947 00:54:37,950 --> 00:54:42,390 Part of it is based on directly the observed results 948 00:54:42,390 --> 00:54:44,610 coming directly from the ANOVA. 
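Since Drain's actual resistivity readings are not reproduced in the transcript, here is a minimal sketch of that standard one-way ANOVA check in Python, on made-up data with the same shape (6 wafers, 3 replicate measurements each); the values are illustrative only.

from scipy.stats import f_oneway

# Hypothetical resistivity readings, 6 wafers x 3 replicate measurements.
wafers = [
    [10.1, 10.3, 10.2],
    [11.8, 12.0, 11.9],
    [ 9.5,  9.7,  9.6],
    [12.4, 12.6, 12.5],
    [10.9, 11.1, 11.0],
    [ 9.9, 10.1, 10.0],
]

# One-way ANOVA: F = MS_wafer / MS_error.  A large F (tiny p-value) says
# the wafer-to-wafer effect is real, not just within-wafer sampling noise.
F, p = f_oneway(*wafers)
print(F, p)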
949 00:54:44,610 --> 00:54:49,170 In fact, if I were to take the total sum of squares, 950 00:54:49,170 --> 00:54:51,030 divide it by its degrees of freedom, 951 00:54:51,030 --> 00:54:52,950 that is my mean square. 952 00:54:52,950 --> 00:54:59,430 And that's my sigma squared t observed. 953 00:54:59,430 --> 00:55:07,110 My wafer mean square, that's sigma squared w bar. 954 00:55:07,110 --> 00:55:09,870 The observed wafer-to-wafer variance, 955 00:55:09,870 --> 00:55:11,580 which was the sum of squares divided 956 00:55:11,580 --> 00:55:13,410 by its degrees of freedom. 957 00:55:13,410 --> 00:55:16,800 And then this is my sigma squared measurement. 958 00:55:16,800 --> 00:55:20,360 That's my random estimates. 959 00:55:23,550 --> 00:55:26,560 Notice again, these do not sum. 960 00:55:26,560 --> 00:55:32,370 [LAUGHS] These are the naive calculations. 961 00:55:32,370 --> 00:55:36,390 You do not want to use those for your variance component 962 00:55:36,390 --> 00:55:38,190 estimates. 963 00:55:38,190 --> 00:55:40,230 In order to get to that, you start 964 00:55:40,230 --> 00:55:45,735 with number one, random variation, the replicate error. 965 00:55:45,735 --> 00:55:47,430 That's a good estimate. 966 00:55:47,430 --> 00:55:52,380 But then you unwrap to get your best estimate 967 00:55:52,380 --> 00:56:01,140 of sigma squared wafer-to-wafer, by subtracting off and then 968 00:56:01,140 --> 00:56:03,850 accounting for the sampling effects. 969 00:56:03,850 --> 00:56:10,128 And then you sum those together to get sigma t squared. 970 00:56:10,128 --> 00:56:12,045 And what this has done out here in the percent 971 00:56:12,045 --> 00:56:16,170 is simply now assigned a percentage of wafer-to-wafer 972 00:56:16,170 --> 00:56:18,930 versus within wafer variance. 973 00:56:18,930 --> 00:56:22,530 And so in this case, about 87% of the observed variance 974 00:56:22,530 --> 00:56:25,380 is because of wafer-to-wafer. 975 00:56:25,380 --> 00:56:29,295 And only about 13% is within wafer variance. 976 00:56:31,950 --> 00:56:35,100 Now, how did he actually go and do this calculation? 977 00:56:35,100 --> 00:56:37,620 Well, it's those formulas that I gave you. 978 00:56:37,620 --> 00:56:42,990 Or he says, run PROC NESTED in SAS. 979 00:56:46,100 --> 00:56:48,130 That's all he gives you. 980 00:56:48,130 --> 00:56:49,670 OK? 981 00:56:49,670 --> 00:56:52,550 So what I tried to do in that spreadsheet 982 00:56:52,550 --> 00:56:55,940 that I posted on the website is actually 983 00:56:55,940 --> 00:56:59,090 go in and do the calculations that you 984 00:56:59,090 --> 00:57:03,800 need in order to get to those variance components, 985 00:57:03,800 --> 00:57:06,080 for this example. 986 00:57:06,080 --> 00:57:09,920 And it's basically just applying all these formulas and concepts 987 00:57:09,920 --> 00:57:12,740 that we've already been talking about. 988 00:57:12,740 --> 00:57:17,300 The tricky piece is actually appropriately accounting 989 00:57:17,300 --> 00:57:20,120 for sampling effects. 990 00:57:20,120 --> 00:57:23,990 How many samples go into the denominator 991 00:57:23,990 --> 00:57:27,500 to subtract off the sampling effect, the sigma squared 992 00:57:27,500 --> 00:57:28,880 over m? 993 00:57:28,880 --> 00:57:31,160 It's pretty easy in the two-level case. 994 00:57:31,160 --> 00:57:35,210 But now when I have measurements within wafers within lots, 995 00:57:35,210 --> 00:57:38,150 I've got two levels of sampling going on. 996 00:57:38,150 --> 00:57:41,670 And I've got to know what factors to use.
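For the balanced two-level case, the same correction can be written directly in terms of the ANOVA mean squares. In the usual parameterization, where the wafer sum of squares carries the factor of m,

E[\mathrm{MS}_E] = \sigma_m^2, \qquad E[\mathrm{MS}_{\mathrm{wafer}}] = \sigma_m^2 + m\,\sigma_w^2,

so \hat{\sigma}_m^2 = \mathrm{MS}_E, \hat{\sigma}_w^2 = (\mathrm{MS}_{\mathrm{wafer}} - \mathrm{MS}_E)/m, and \hat{\sigma}_T^2 = \hat{\sigma}_m^2 + \hat{\sigma}_w^2. If instead the wafer mean square is already expressed per wafer average (the sigma squared w bar on the slide), the correction is the direct subtraction of \hat{\sigma}_m^2/m shown earlier; the two forms are algebraically the same.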
997 00:57:41,670 --> 00:57:44,060 It's not just sigma squared over m. 998 00:57:44,060 --> 00:57:46,610 We'll see in a moment, it's sigma squared 999 00:57:46,610 --> 00:57:50,090 of something divided by m times the number of wafers. 1000 00:57:50,090 --> 00:57:54,560 And my spreadsheets try to help keep careful track of that. 1001 00:57:54,560 --> 00:57:56,480 In the one-level case, it's pretty easy. 1002 00:57:56,480 --> 00:58:01,500 Let me get to two levels in just a second. 1003 00:58:01,500 --> 00:58:03,950 But the other thing that essentially Drain's book 1004 00:58:03,950 --> 00:58:09,650 just pulls out of the air is the interval estimates, again based 1005 00:58:09,650 --> 00:58:11,750 on SAS PROC NESTED. 1006 00:58:14,430 --> 00:58:19,730 And there's no help at all in terms of where you come up 1007 00:58:19,730 --> 00:58:22,700 with interval estimates. 1008 00:58:22,700 --> 00:58:26,270 So basically, my best recommendation 1009 00:58:26,270 --> 00:58:33,550 is simply to use our concept of chi-squared distributions 1010 00:58:33,550 --> 00:58:38,800 with the appropriate estimate of the number of degrees 1011 00:58:38,800 --> 00:58:40,780 of freedom or the number of data points 1012 00:58:40,780 --> 00:58:46,960 going into the estimate of that variance piece. 1013 00:58:46,960 --> 00:58:51,070 And my spreadsheet also shows some of that for you. 1014 00:58:51,070 --> 00:58:53,170 By the way, if you actually do that, 1015 00:58:53,170 --> 00:58:56,800 it turns out that the book claims 1016 00:58:56,800 --> 00:59:01,390 for the total variance and the wafer variance and the error 1017 00:59:01,390 --> 00:59:06,130 variances, that those are 95% confidence intervals. 1018 00:59:06,130 --> 00:59:10,030 But I think those are actually 90% confidence intervals 1019 00:59:10,030 --> 00:59:14,080 if you do the calculation with the chi-squared formula 1020 00:59:14,080 --> 00:59:16,960 down here. 1021 00:59:16,960 --> 00:59:22,660 So I'm actually not sure exactly what's going into his tables. 1022 00:59:22,660 --> 00:59:25,990 I get slightly different answers. 1023 00:59:25,990 --> 00:59:29,140 But I think the best conservative estimate, 1024 00:59:29,140 --> 00:59:32,770 which may have a slight amount of extra overcounting 1025 00:59:32,770 --> 00:59:38,560 of variance, but it's a slightly larger confidence interval. 1026 00:59:38,560 --> 00:59:40,240 That is to say it's conservative, 1027 00:59:40,240 --> 00:59:43,750 you're not fooling yourself into thinking things 1028 00:59:43,750 --> 00:59:45,820 are significant when they're not, 1029 00:59:45,820 --> 00:59:50,770 is simply to use the chi-squared distribution. 1030 00:59:50,770 --> 00:59:55,090 And so that's my best recommendation 1031 00:59:55,090 --> 00:59:57,730 for the interval estimates. 1032 01:00:00,390 --> 01:00:04,410 OK, so we're pretty comfortable with two levels? 1033 01:00:04,410 --> 01:00:06,030 Let's do three levels. 1034 01:00:06,030 --> 01:00:07,050 Great fun. 1035 01:00:07,050 --> 01:00:09,040 It's the same idea. 1036 01:00:09,040 --> 01:00:14,820 But now I have not only measurements within wafers. 1037 01:00:14,820 --> 01:00:17,880 I have wafers within lots. 1038 01:00:17,880 --> 01:00:21,390 So I may have a random lot-to-lot effect. 1039 01:00:21,390 --> 01:00:24,240 So I pull 24 wafers, I do lots of processing. 1040 01:00:24,240 --> 01:00:27,390 I pull another 24 wafers, I do some processing.
1041 01:00:27,390 --> 01:00:31,710 There is a lot average that may be different from another lot 1042 01:00:31,710 --> 01:00:33,570 average, from different from another lot 1043 01:00:33,570 --> 01:00:38,620 average, because there's a lot-to-lot variance at work, 1044 01:00:38,620 --> 01:00:40,740 in addition now. 1045 01:00:40,740 --> 01:00:41,400 OK? 1046 01:00:41,400 --> 01:00:43,650 So this is two levels of nesting, 1047 01:00:43,650 --> 01:00:47,720 or a three-level variance structure. 1048 01:00:47,720 --> 01:00:54,350 Now what happens with the observed lot-to-lot variance? 1049 01:00:54,350 --> 01:00:56,010 It's the same idea. 1050 01:00:56,010 --> 01:01:01,190 But now we've got multiple levels of sampling going on. 1051 01:01:01,190 --> 01:01:05,030 You may not be able to see it, but over that l right 1052 01:01:05,030 --> 01:01:06,590 here, there is a bar. 1053 01:01:06,590 --> 01:01:11,510 Again, this is sigma squared l bar. 1054 01:01:11,510 --> 01:01:13,940 The observed lot-to-lot variation. 1055 01:01:13,940 --> 01:01:18,470 And what it's got in it is the true lot-to-lot variance. 1056 01:01:18,470 --> 01:01:25,170 But it's also got wafer-to-wafer variance noise added onto it. 1057 01:01:25,170 --> 01:01:27,320 And then on top of that, it's also 1058 01:01:27,320 --> 01:01:33,110 got replication within wafer variance noise added onto it. 1059 01:01:33,110 --> 01:01:37,490 Now, the good thing is, I've got multiple wafers. 1060 01:01:37,490 --> 01:01:40,250 Say I've got 24 wafers in each lot. 1061 01:01:40,250 --> 01:01:46,520 So the effect of the 24 wafer-to-wafer variance noise 1062 01:01:46,520 --> 01:01:51,710 gets reduced by my factor of w of equal to 24. 1063 01:01:51,710 --> 01:01:55,550 And similarly, also, within each of those wafers, 1064 01:01:55,550 --> 01:01:57,780 I may have 10 measurements. 1065 01:01:57,780 --> 01:02:03,990 And that noise is multiplicatively averaged out. 1066 01:02:03,990 --> 01:02:07,760 I've got a factor of the number of measurements, say it's 10, 1067 01:02:07,760 --> 01:02:10,340 and number of wafers in a lot, would say 1068 01:02:10,340 --> 01:02:13,010 it was 24 wafers in each lot. 1069 01:02:13,010 --> 01:02:17,030 That's an awful lot of data of measurement noise 1070 01:02:17,030 --> 01:02:20,300 that has lots of chances with a big denominator 1071 01:02:20,300 --> 01:02:22,370 here to average out. 1072 01:02:22,370 --> 01:02:26,750 So that factor starts to get smaller fairly rapidly. 1073 01:02:26,750 --> 01:02:31,550 The lowest levels start to become smaller. 1074 01:02:31,550 --> 01:02:35,680 But the basic strategy is going to be the same. 1075 01:02:35,680 --> 01:02:40,010 A phrase I've heard, or maybe Drain uses, is, 1076 01:02:40,010 --> 01:02:44,930 what we want to do is estimate the individual variance 1077 01:02:44,930 --> 01:02:48,170 components, three levels, but think of it 1078 01:02:48,170 --> 01:02:51,850 as peeling the onion from the inside out. 1079 01:02:55,300 --> 01:02:59,860 I'm confident at the innermost level of the variance 1080 01:02:59,860 --> 01:03:01,180 of measurements. 1081 01:03:01,180 --> 01:03:04,420 Once I have that, I have the observed wafer-to-wafer 1082 01:03:04,420 --> 01:03:06,640 variance, and I could subtract the sampling out. 1083 01:03:06,640 --> 01:03:09,310 So now I have the next inner level 1084 01:03:09,310 --> 01:03:12,040 of the onion, a good estimate for that. 
1085 01:03:12,040 --> 01:03:15,010 Using that, I can subtract out that sampling 1086 01:03:15,010 --> 01:03:19,430 from the outermost level of observed variances. 1087 01:03:19,430 --> 01:03:22,840 So it's the same strategy, we work from the inside 1088 01:03:22,840 --> 01:03:28,450 out to get estimates of the outer levels of variance. 1089 01:03:28,450 --> 01:03:29,025 Yeah. 1090 01:03:29,025 --> 01:03:30,817 AUDIENCE: This one gives a different answer 1091 01:03:30,817 --> 01:03:33,152 if you do two wafers and three measurements 1092 01:03:33,152 --> 01:03:35,323 or three wafers and two measurements. 1093 01:03:35,323 --> 01:03:36,365 DUANE BONING: Absolutely. 1094 01:03:36,365 --> 01:03:39,362 AUDIENCE: It's saying there's a better 1095 01:03:39,362 --> 01:03:43,405 way of doing things, like [INAUDIBLE] wafer 1096 01:03:43,405 --> 01:03:45,120 with fewer measurements. 1097 01:03:49,530 --> 01:03:50,810 DUANE BONING: Absolutely. 1098 01:03:50,810 --> 01:03:54,110 So what Nalish is saying here is, 1099 01:03:54,110 --> 01:03:58,580 you can imagine you will get different answers if I 1100 01:03:58,580 --> 01:04:02,550 had two measurements on each of 3 wafers, 6 total measurements, 1101 01:04:02,550 --> 01:04:03,870 for example. 1102 01:04:03,870 --> 01:04:05,840 Or if I took those same 6 measurements 1103 01:04:05,840 --> 01:04:09,590 and I did 3 measurements on only 2 wafers. 1104 01:04:09,590 --> 01:04:11,420 Those denominators are different. 1105 01:04:11,420 --> 01:04:16,340 And your precision of your estimates will be different. 1106 01:04:16,340 --> 01:04:21,470 Both your point estimates may be slightly different. 1107 01:04:21,470 --> 01:04:23,870 But also your confidence intervals 1108 01:04:23,870 --> 01:04:28,280 will be different, because in essence, the amount of noise 1109 01:04:28,280 --> 01:04:31,700 is averaging out differently in the two cases, 1110 01:04:31,700 --> 01:04:33,830 and so is the number of samples. 1111 01:04:33,830 --> 01:04:37,970 So that's jumping ahead about two more slides, 1112 01:04:37,970 --> 01:04:40,070 but it's exactly the point that this 1113 01:04:40,070 --> 01:04:42,710 does have an important implication on sampling. 1114 01:04:42,710 --> 01:04:46,250 How you construct your sampling plan, 1115 01:04:46,250 --> 01:04:48,980 how you allocate your total measurement budget 1116 01:04:48,980 --> 01:04:51,890 and total replication budget, depending 1117 01:04:51,890 --> 01:04:56,537 on which variance maybe you want to estimate most accurately. 1118 01:05:01,300 --> 01:05:04,420 Let me just qualitatively show you 1119 01:05:04,420 --> 01:05:09,520 the results here for the three-level example in Drain. 1120 01:05:09,520 --> 01:05:11,510 This is building on the two-level example, 1121 01:05:11,510 --> 01:05:14,320 so we're still looking at sheet resistance. 1122 01:05:14,320 --> 01:05:19,420 We now have 3 wafers within each lot. 1123 01:05:19,420 --> 01:05:24,940 So this is lot 1, lot 2, all the way up to 11 lots. 1124 01:05:24,940 --> 01:05:27,550 And then within each lot, the two little triangles 1125 01:05:27,550 --> 01:05:31,410 here, we're taking two measurements within each wafer. 1126 01:05:31,410 --> 01:05:38,530 Now qualitatively, what do you think is the biggest source 1127 01:05:38,530 --> 01:05:39,840 of variance in this data? 1128 01:05:47,420 --> 01:05:51,830 Is it within wafer, wafer-to-wafer, or lot-to-lot? 1129 01:05:55,686 --> 01:05:57,614 AUDIENCE: Wafer-to-wafer. 1130 01:05:57,614 --> 01:05:58,460 DUANE BONING: Yeah.
1131 01:05:58,460 --> 01:06:01,370 So I hear a vote, and I kind of concur with it, 1132 01:06:01,370 --> 01:06:04,030 wafer-to-wafer looks pretty big. 1133 01:06:04,030 --> 01:06:07,820 So for example, here's wafer-to-wafer, 1134 01:06:07,820 --> 01:06:09,890 another wafer-to-wafer. 1135 01:06:09,890 --> 01:06:12,590 Within wafer, it's pretty nicely clustered. 1136 01:06:12,590 --> 01:06:16,100 So I don't expect a big within-wafer variance. 1137 01:06:16,100 --> 01:06:17,720 Lot-to-lot's a little harder to see, 1138 01:06:17,720 --> 01:06:20,910 because I have to average these 3 wafers. 1139 01:06:20,910 --> 01:06:27,140 But it looks like there is some lot-to-lot variation. 1140 01:06:27,140 --> 01:06:29,270 But it looks a little bit smaller. 1141 01:06:29,270 --> 01:06:35,510 So it looks to me like sigma squared wafer is bigger 1142 01:06:35,510 --> 01:06:37,610 than sigma squared lot, which is bigger 1143 01:06:37,610 --> 01:06:41,250 than sigma squared measurement. 1144 01:06:41,250 --> 01:06:43,910 So let's see if that comes out of our variance components. 1145 01:06:48,030 --> 01:06:50,580 I've given you also this giant spreadsheet 1146 01:06:50,580 --> 01:06:52,090 table with all of that data. 1147 01:06:52,090 --> 01:06:53,730 And again, the estimates. 1148 01:06:53,730 --> 01:06:56,130 There's again, the standard ANOVA, and then 1149 01:06:56,130 --> 01:06:59,370 the splitting out into the variance components. 1150 01:06:59,370 --> 01:07:10,470 In the standard ANOVA, you can actually ask the question, 1151 01:07:10,470 --> 01:07:14,590 is there statistical evidence for wafer-to-wafer variation? 1152 01:07:14,590 --> 01:07:18,180 So it's basically, that ratio right there is 62. 1153 01:07:18,180 --> 01:07:24,030 And it's highly unlikely that that's by chance alone. 1154 01:07:24,030 --> 01:07:29,460 He also does, is there evidence for lot-to-lot variance? 1155 01:07:29,460 --> 01:07:33,160 And in the standard ANOVA, it looks pretty weak. 1156 01:07:33,160 --> 01:07:36,340 Given the amount of wafer-to-wafer variance, 1157 01:07:36,340 --> 01:07:39,190 the ANOVA table is saying, what you 1158 01:07:39,190 --> 01:07:43,660 may be observing for your lot-to-lot deviations, 1159 01:07:43,660 --> 01:07:46,270 is because there's big wafer-to-wafer variation. 1160 01:07:46,270 --> 01:07:52,120 It may not be a significant lot-to-lot variance. 1161 01:07:54,730 --> 01:07:56,350 So that's interesting. 1162 01:07:56,350 --> 01:07:58,160 Let's come back to that in a second. 1163 01:07:58,160 --> 01:07:59,200 Oops. 1164 01:07:59,200 --> 01:08:02,410 Now we can go in, this is just the ANOVA mean squares. 1165 01:08:02,410 --> 01:08:05,140 But then you do this unwrapping of the variance, 1166 01:08:05,140 --> 01:08:06,710 accounting for sampling. 1167 01:08:06,710 --> 01:08:10,210 And what he observes is, the pure replication variance 1168 01:08:10,210 --> 01:08:11,740 is pretty small. 1169 01:08:11,740 --> 01:08:14,320 The wafer variance is pretty large. 1170 01:08:14,320 --> 01:08:17,770 There's a small remaining lot-to-lot point 1171 01:08:17,770 --> 01:08:20,510 estimate of variance. 1172 01:08:20,510 --> 01:08:22,010 And then we have our total variance. 1173 01:08:22,010 --> 01:08:25,660 So if I decompose that, it looks like about 89% 1174 01:08:25,660 --> 01:08:31,600 is wafer-to-wafer, 3% is within wafer, very small within wafer. 1175 01:08:31,600 --> 01:08:34,240 And here at this point estimate is 1176 01:08:34,240 --> 01:08:37,979 8% of the variance, that's my best guess.
1177 01:08:37,979 --> 01:08:40,479 8% of the variance is coming from 1178 01:08:40,479 --> 01:08:43,120 separate lot-to-lot variance. 1179 01:08:43,120 --> 01:08:45,734 That's point estimates. 1180 01:08:45,734 --> 01:08:47,109 There's something nagging me here 1181 01:08:47,109 --> 01:08:50,740 about this ANOVA observation. 1182 01:08:50,740 --> 01:08:54,880 That lot-to-lot variance wasn't significant. 1183 01:08:54,880 --> 01:08:57,939 What if we looked at the confidence intervals? 1184 01:08:57,939 --> 01:09:00,250 This now looks at the interval estimates. 1185 01:09:00,250 --> 01:09:03,700 And what we see here is, here's our point estimate 1186 01:09:03,700 --> 01:09:06,160 for the replication. 1187 01:09:06,160 --> 01:09:10,479 And it's got a range from 1 to about 3. 1188 01:09:10,479 --> 01:09:12,939 Not too bad of a range for that. 1189 01:09:12,939 --> 01:09:17,859 My wafer variance, I had about 56 as my point estimate. 1190 01:09:17,859 --> 01:09:20,740 And that, based on the numbers of samples and everything 1191 01:09:20,740 --> 01:09:26,350 that I've got, might range from 33 to 113. 1192 01:09:26,350 --> 01:09:28,180 And here's the interesting thing. 1193 01:09:28,180 --> 01:09:32,470 If we do that point estimate for our lot variance, 1194 01:09:32,470 --> 01:09:36,640 but actually look at the chi-squared, or again, 1195 01:09:36,640 --> 01:09:39,340 this thing that's slightly different from chi-squared 1196 01:09:39,340 --> 01:09:41,140 but very close to it, 1197 01:09:41,140 --> 01:09:46,240 what you get in this case is a negative estimate for the lower 1198 01:09:46,240 --> 01:09:47,530 limit of lot variance. 1199 01:09:50,189 --> 01:09:51,510 Woohoo. 1200 01:09:51,510 --> 01:09:55,530 When a confidence interval intersects 0, that tells you 1201 01:09:55,530 --> 01:09:57,930 it might be 0. 1202 01:09:57,930 --> 01:10:01,320 So in fact, we would set the lower bound to 0. 1203 01:10:01,320 --> 01:10:03,150 If I still needed my best point estimate, 1204 01:10:03,150 --> 01:10:04,890 I would still stick with the 5. 1205 01:10:04,890 --> 01:10:09,630 But this is basically telling me, consistent with the ANOVA, 1206 01:10:09,630 --> 01:10:17,880 that I don't have more than 95% confidence that there's 1207 01:10:17,880 --> 01:10:20,840 a non-zero lot-to-lot variance at work. 1208 01:10:24,330 --> 01:10:29,340 And in fact, if I wanted to, I might go back 1209 01:10:29,340 --> 01:10:35,670 and say, I'm going to set and assume in a new model 1210 01:10:35,670 --> 01:10:38,880 that the lot-to-lot variance is 0. 1211 01:10:38,880 --> 01:10:41,550 And I'm going to attribute that, lump that together 1212 01:10:41,550 --> 01:10:44,520 with the wafer-to-wafer, and build 1213 01:10:44,520 --> 01:10:46,260 just a two-level nested variance model 1214 01:10:46,260 --> 01:10:49,350 where I don't include that as a separate variance source. 1215 01:10:49,350 --> 01:10:52,110 Since it wasn't significant, you might not 1216 01:10:52,110 --> 01:10:54,460 want to include that in your model. 1217 01:10:54,460 --> 01:10:55,880 Is there a question? 1218 01:10:55,880 --> 01:10:58,480 AUDIENCE: Yes. 1219 01:10:58,480 --> 01:11:00,820 So for this analysis, what's the degree 1220 01:11:00,820 --> 01:11:04,738 of freedom that you're going to use in the chi-square? 1221 01:11:04,738 --> 01:11:05,530 DUANE BONING: Yeah. 1222 01:11:05,530 --> 01:11:08,260 It's a little bit tricky, but it's in the spreadsheet.
1223 01:11:08,260 --> 01:11:15,520 Basically, what you do is if you look at the denominator 1224 01:11:15,520 --> 01:11:19,810 when you take the sampling effect, 1225 01:11:19,810 --> 01:11:23,830 so it might be m times w, or m times w minus 1, 1226 01:11:23,830 --> 01:11:26,260 because I have a grand mean. 1227 01:11:26,260 --> 01:11:29,380 When I'm doing that for the lot-to-lot variance, 1228 01:11:29,380 --> 01:11:31,550 that would be my degree of freedom. 1229 01:11:31,550 --> 01:11:33,670 So the degree of freedom that you use, 1230 01:11:33,670 --> 01:11:36,190 the n minus 1 in the chi-squared, 1231 01:11:36,190 --> 01:11:39,400 changes depending on which variance you're estimating, 1232 01:11:39,400 --> 01:11:41,290 which variance interval. 1233 01:11:41,290 --> 01:11:44,230 And I actually have both the definitions 1234 01:11:44,230 --> 01:11:46,740 with the variable names in the spreadsheet, 1235 01:11:46,740 --> 01:11:51,070 and then what the numbers are for this data. 1236 01:11:51,070 --> 01:11:52,690 So that is tricky. 1237 01:11:52,690 --> 01:11:58,810 But basically, any time you're estimating a variance, 1238 01:11:58,810 --> 01:12:01,090 you have a sum of squared deviations. 1239 01:12:01,090 --> 01:12:02,920 And then you take the mean square. 1240 01:12:02,920 --> 01:12:04,780 What's going down in the denominator 1241 01:12:04,780 --> 01:12:06,910 in the mean square estimate, that's 1242 01:12:06,910 --> 01:12:09,800 what you're using for the degree of freedom. 1243 01:12:09,800 --> 01:12:13,616 AUDIENCE: Are they still the same as in the ANOVA analysis? 1244 01:12:13,616 --> 01:12:17,110 DUANE BONING: Not quite. 1245 01:12:17,110 --> 01:12:19,960 Where they're really based is based on these things. 1246 01:12:24,930 --> 01:12:29,460 So you'll see that in the spreadsheet. 1247 01:12:29,460 --> 01:12:35,112 So the last point I wanted to make 1248 01:12:35,112 --> 01:12:38,760 has to do with, we've already talked about this a little bit, 1249 01:12:38,760 --> 01:12:41,040 how you allocate the measurement budget. 1250 01:12:41,040 --> 01:12:45,030 And the simple observation is, when you're out at an outer level, 1251 01:12:45,030 --> 01:12:49,380 if I'm trying to estimate the lot-to-lot variance, 1252 01:12:49,380 --> 01:12:51,640 what most strongly affects that? 1253 01:12:51,640 --> 01:12:55,170 And it's basically the data in the outermost level, 1254 01:12:55,170 --> 01:12:58,230 because the variance component of the innermost level 1255 01:12:58,230 --> 01:13:00,080 gets averaged away fairly quickly. 1256 01:13:03,090 --> 01:13:09,360 So the contamination of measurement variance 1257 01:13:09,360 --> 01:13:16,605 can be reduced if you pick your sampling plan wisely. 1258 01:13:19,470 --> 01:13:24,720 By the way, you might still actually 1259 01:13:24,720 --> 01:13:31,140 care about sigma squared in the observed mean-to-mean averages. 1260 01:13:31,140 --> 01:13:35,850 We said that it's not the best estimate of the true, say, 1261 01:13:35,850 --> 01:13:37,690 wafer-to-wafer average. 1262 01:13:37,690 --> 01:13:41,820 But if I were doing SPC, statistical process control 1263 01:13:41,820 --> 01:13:47,250 charting, based on, I observed some sampling plan 1264 01:13:47,250 --> 01:13:51,450 and I observe and I plot on my chart, an observed wafer 1265 01:13:51,450 --> 01:13:55,650 average, that may be what I want to control on.
1266 01:13:55,650 --> 01:13:57,660 And so I actually might still want 1267 01:13:57,660 --> 01:14:05,180 to use that, the sigma x bar, for setting of my control 1268 01:14:05,180 --> 01:14:08,030 limits, because that's the data that I'm charting. 1269 01:14:08,030 --> 01:14:12,110 So don't throw away our old idea of keeping track 1270 01:14:12,110 --> 01:14:14,690 of the observed wafer average. 1271 01:14:14,690 --> 01:14:17,120 Just recognize that it's actually 1272 01:14:17,120 --> 01:14:19,880 got a mix of a couple of variance components inside 1273 01:14:19,880 --> 01:14:22,150 of it. 1274 01:14:22,150 --> 01:14:26,410 OK, and then the last point here was simply the point 1275 01:14:26,410 --> 01:14:32,320 that Nalish already made, is, if you're really 1276 01:14:32,320 --> 01:14:36,760 looking for an outer level variance estimate, what 1277 01:14:36,760 --> 01:14:43,220 you want to do is push more data to the lower levels, 1278 01:14:43,220 --> 01:14:44,970 in order to reduce these things. 1279 01:14:44,970 --> 01:14:50,960 So for example, if I allocated almost all of my data 1280 01:14:50,960 --> 01:14:56,640 just to m, that reduces this factor a lot, 1281 01:14:56,640 --> 01:14:58,970 but not this factor very much. 1282 01:14:58,970 --> 01:15:01,160 So to get the biggest multiplicative 1283 01:15:01,160 --> 01:15:03,260 bang for the buck, what you want to do 1284 01:15:03,260 --> 01:15:09,140 is push it just barely outside of the factor 1285 01:15:09,140 --> 01:15:12,300 that you're trying to estimate. 1286 01:15:12,300 --> 01:15:14,510 So if I'm looking at lot-to-lot variance, 1287 01:15:14,510 --> 01:15:17,960 I need at least multiple w's to get rid of the wafer-to-wafer 1288 01:15:17,960 --> 01:15:19,550 effect. 1289 01:15:19,550 --> 01:15:21,410 And then I get the multiplicative effect 1290 01:15:21,410 --> 01:15:22,700 also with the m. 1291 01:15:22,700 --> 01:15:26,270 That already is multiplying up fairly rapidly. 1292 01:15:26,270 --> 01:15:31,820 But if I want to suppress this factor, 1293 01:15:31,820 --> 01:15:35,540 I need at least some number of wafer replicates. 1294 01:15:35,540 --> 01:15:38,360 On the other hand, if I think that variance is very small, 1295 01:15:38,360 --> 01:15:40,700 and this variance is very large, I 1296 01:15:40,700 --> 01:15:44,910 might allocate more to the m factor. 1297 01:15:44,910 --> 01:15:48,230 So this can influence your strategy 1298 01:15:48,230 --> 01:15:51,500 for how you pick your sampling plans when 1299 01:15:51,500 --> 01:15:53,840 you've got nested structures. 1300 01:15:53,840 --> 01:15:58,010 OK, so to summarize, we have been looking here 1301 01:15:58,010 --> 01:16:01,220 at nested variance structures with this weird grouping 1302 01:16:01,220 --> 01:16:07,310 within one group within another group within another group. 1303 01:16:07,310 --> 01:16:09,710 First off, you should be able to recognize when you've 1304 01:16:09,710 --> 01:16:11,880 got nested variance structures. 1305 01:16:11,880 --> 01:16:15,170 Second, hopefully now you've got at least a feel 1306 01:16:15,170 --> 01:16:19,920 for how you would estimate those separate variance components. 1307 01:16:19,920 --> 01:16:24,710 And then there is a little bit of implications on design plans 1308 01:16:24,710 --> 01:16:28,040 that hopefully you're alert to. 
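To tie the three-level procedure together, here is a minimal peel-the-onion sketch in Python on simulated data of roughly the same shape as the Drain example (11 lots, 3 wafers per lot, 2 measurements per wafer). The simulated values, the degrees-of-freedom choice, and the chi-squared interval form are assumptions for illustration, not the book's or the spreadsheet's exact recipe.

import numpy as np
from scipy.stats import chi2

# Simulated nested data: lots -> wafers -> measurements.
rng = np.random.default_rng(0)
n_lots, w, m = 11, 3, 2
sigma_l, sigma_w, sigma_m = 2.0, 7.0, 1.5          # "true" sigmas for the simulation
data = (50.0
        + rng.normal(0.0, sigma_l, (n_lots, 1, 1))   # lot-to-lot offsets
        + rng.normal(0.0, sigma_w, (n_lots, w, 1))   # wafer-to-wafer offsets
        + rng.normal(0.0, sigma_m, (n_lots, w, m)))  # replicate/measurement noise

# Peel the onion from the inside out.
s2_m = data.var(axis=2, ddof=1).mean()               # within-wafer (replicate) variance
wafer_means = data.mean(axis=2)                      # lot x wafer averages
s2_wbar_obs = wafer_means.var(axis=1, ddof=1).mean() # observed wafer-to-wafer, within lots
s2_w = s2_wbar_obs - s2_m / m                        # corrected wafer-to-wafer variance
lot_means = wafer_means.mean(axis=1)                 # lot averages
s2_lbar_obs = lot_means.var(ddof=1)                  # observed lot-to-lot variance
s2_l = s2_lbar_obs - s2_w / w - s2_m / (m * w)       # corrected lot-to-lot variance

# Rough chi-squared interval for the lot component; the df here
# (n_lots - 1) is a judgment call matching the number of lot means.
df = n_lots - 1
lo = df * max(s2_l, 0.0) / chi2.ppf(0.975, df)
hi = df * max(s2_l, 0.0) / chi2.ppf(0.025, df)
print(s2_m, s2_w, s2_l, (lo, hi))

If the corrected lot-to-lot estimate comes out near zero, or its interval reaches down to 0, that is the same signal discussed above: there may be no separate lot-to-lot component worth keeping in the model.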
1309 01:16:28,040 --> 01:16:30,500 So you will have a chance to play around with this at least 1310 01:16:30,500 --> 01:16:32,150 a little bit on the problem set, if you 1311 01:16:32,150 --> 01:16:33,830 haven't started that already. 1312 01:16:33,830 --> 01:16:35,100 Do look at the spreadsheet. 1313 01:16:35,100 --> 01:16:38,250 I think that will be a big help to you on that. 1314 01:16:38,250 --> 01:16:40,010 So with that, we'll end. 1315 01:16:40,010 --> 01:16:41,630 And I'll stick around for a minute, 1316 01:16:41,630 --> 01:16:45,032 because it sounds like there's a question in the Singapore end. 1317 01:16:45,032 --> 01:16:48,482 AUDIENCE: Yeah, I just have a question. 1318 01:16:48,482 --> 01:16:49,310 Should I ask now? 1319 01:16:49,310 --> 01:16:51,380 DUANE BONING: Yeah. 1320 01:16:51,380 --> 01:16:54,670 But you guys should feel free to go if you want here. 1321 01:16:54,670 --> 01:16:56,975 AUDIENCE: How do you tell whether it's a fixed effect 1322 01:16:56,975 --> 01:16:59,810 or if it's a nested variance? 1323 01:16:59,810 --> 01:17:01,320 DUANE BONING: Oh, good question. 1324 01:17:01,320 --> 01:17:04,940 How do you tell if it's a fixed effect or nested variance? 1325 01:17:04,940 --> 01:17:06,562 That's a model assumption. 1326 01:17:09,140 --> 01:17:17,660 So I think the basic idea is, if it's a fixed effect, 1327 01:17:17,660 --> 01:17:21,020 and I think I'm changing my group-to-group by, say, 1328 01:17:21,020 --> 01:17:24,680 a design, if wafer number 2 in the lot 1329 01:17:24,680 --> 01:17:28,250 always has a delta of some size as opposed 1330 01:17:28,250 --> 01:17:30,380 to being randomly sampled, that might 1331 01:17:30,380 --> 01:17:32,660 be a systematic fixed effect. 1332 01:17:32,660 --> 01:17:34,595 But just raw data, I don't know. 1333 01:17:34,595 --> 01:17:37,490 You actually have to look at the setup of the situation 1334 01:17:37,490 --> 01:17:44,630 to know whether each one is treated as a wafer replicate, 1335 01:17:44,630 --> 01:17:47,720 or if I'm doing something different to each wafer 1336 01:17:47,720 --> 01:17:48,650 intentionally. 1337 01:17:48,650 --> 01:17:50,570 That would be a fixed effect. 1338 01:17:50,570 --> 01:17:52,500 OK? 1339 01:17:52,500 --> 01:17:53,030 All right? 1340 01:17:53,030 --> 01:17:55,520 So Thursday, we'll see you on Thursday 1341 01:17:55,520 --> 01:17:59,260 with Dan Frey as a guest lecturer.