The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

DUANE BONING: OK. As you'll see with today's lecture, I made a small change in the lecture sequence. I basically combined the last part of Tuesday's lecture with what I was going to talk about next week, because they really go together better anyway, which is multivariate SPC together with a couple of the advanced control charts. And so next week we'll do the yield modeling lecture rather than try to squish that in today.

So what I want to do today is first have a very brief warm-up, just to get us back into the swing of things with conventional control charts, and then talk a little bit about these alternative control charts with a little bit of increased sensitivity for rapid detection of shifts, as well as drifts--but primarily targeted at shifts. And these are the moving average--the exponentially weighted moving average--and the cumulative sum chart. And then we'll move into multivariate SPC, which is a little bit of a tutorial on multivariate statistics in general. So that's the plan.

So to get us warmed up, we'll do a rapid design of a control chart--how to do this relatively quickly. I will tell you that we have a model, based on lots of historical data of this process, that when it is in control it's operating as a normally distributed process with a mean of 5 and a variance of 1. We're going to be taking samples of nine parts, and then the next nine, and so on. So what should the--here's the easy one, to wake you up in the morning--what should the center line be?

AUDIENCE: 5.

DUANE BONING: 5. OK. Everybody--how about the plus/minus 3 sigma?
We're quite happy with the 1-in-370 ARL with the normal, typical plus/minus 3 sigma control limits. Where should the plus/minus 3 sigma control limits be?

I hear a 6 and 4. Why?

AUDIENCE: [INAUDIBLE]

DUANE BONING: I'm sorry?

AUDIENCE: [INAUDIBLE]

DUANE BONING: OK. Which is?

AUDIENCE: Mu plus sigma [INAUDIBLE]

DUANE BONING: Mu plus or minus 3 sigma--3 sigma over the square root of the sample size. And so the 3 sigma and the 3, which comes from the root n, make it a nice plus/minus 1. And there are some data. OK, so I plotted 99. We ran the process 99 runs. We have 11 samples. Here's the question: is the process in control?

We have a yes. We have another yes.

I contend you don't know. It's a little bit tricky here.

AUDIENCE: Not the 1 [INAUDIBLE].

DUANE BONING: Yeah, I guess. That's actually not what I have in mind. You could go in and start down the path of looking and saying, OK, what's the probability of having, for example, three rising points in a row, and all the other [INAUDIBLE]. But I'll say none of that's happening. So I will say that the mean--there's nothing too strange happening with the mean that you can tell with this chart. Is the process in control?

AUDIENCE: [INAUDIBLE] variance [INAUDIBLE]--

DUANE BONING: Yeah.

AUDIENCE: [INAUDIBLE] variance is getting huge compared to earlier points.

DUANE BONING: Oh, so you're saying the variance is getting larger.

AUDIENCE: [INAUDIBLE] variance--yeah.

DUANE BONING: Maybe. It's hard to tell with this. This is a mean chart. All it's telling you is the tracking of the mean. We said that the process model is that we have this mean and this variance. And so I guess the main point here is you don't really know if you're just tracking the mean.
What's also happening with the variance within your sample? And in fact, on the next page, this is the raw data--all 99 points, where we grouped them 9 by 9 by 9 to get the samples. And if you saw this, you might be a little less comfortable. Now, you could always go back and look at raw data, and raw data is always really good to do. But what you would also want to do is a typical S chart, or a range chart, or something like that. And in this case, if you design the S chart with those a priori pieces of data--sample size is 9 and the process variance is 1--with the c4 correction factor and so on, all of that just out of the formula, you get these control limits. And sure enough, at about sample 10, we're above the control limit. Sample 11 is even more above the control limit. And that's indicative of, essentially, the increased variance of those two groups. And I will tell you, I generated this data with JMP. And sure enough, right at the--whatever it was--the 77th point, I re-sampled from a normal distribution with something like a variance of 1.9 or 2 or something like that. So the main point here was, you need to monitor both aspects--both the mean and the variance--to know whether your process model is still operative. We're saying that, for a normal distribution, the mean and variance are really telling you everything.

Now, the other sub-point in here is that, if you were actually looking at this run data, there is a lot of noise. There is inherent process noise. And part of the goal here is you're trying to detect a mean shift in the face of that noise. And so one way of thinking about sampling and calculating x bar is that, indeed, you are filtering. OK, so you want to filter to suppress the additional variance effect when you're looking for the mean shift. But again, you've got to also monitor for the variance. But this noise makes it a little bit tricky in terms of being able to detect the mean. So you want to filter.
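Here is a minimal Python sketch of the chart design just described, assuming the stated in-control model (mean 5, standard deviation 1) and samples of nine parts; the c4 bias-correction constant is computed from its standard definition.

```python
import math

mu, sigma, n = 5.0, 1.0, 9   # assumed in-control model and sample size from the warm-up

# x-bar chart: center line at mu, +/- 3*sigma/sqrt(n) control limits
xbar_cl = mu
xbar_ucl = mu + 3 * sigma / math.sqrt(n)
xbar_lcl = mu - 3 * sigma / math.sqrt(n)

# S chart: c4 corrects the bias of the sample standard deviation
c4 = math.sqrt(2.0 / (n - 1)) * math.gamma(n / 2) / math.gamma((n - 1) / 2)
s_cl = c4 * sigma
s_ucl = c4 * sigma + 3 * sigma * math.sqrt(1 - c4**2)
s_lcl = max(0.0, c4 * sigma - 3 * sigma * math.sqrt(1 - c4**2))

print(xbar_lcl, xbar_cl, xbar_ucl)                         # 4.0, 5.0, 6.0
print(round(s_lcl, 2), round(s_cl, 2), round(s_ucl, 2))    # about 0.23, 0.97, 1.71
```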
And one of the ideas of these alternative control charts is to explicitly think about the filtering action that you're doing--a little bit, almost, from a more signal processing point of view--but to not necessarily have to wait for an additional nine runs before you call that a sample, aggregate it, and then plot it, and then wait for another nine runs.

One alternative approach here is to take a running average. So I take a window, say, of nine points or measurements, calculate my statistics, and then, for each new run, move my window over and recalculate my statistics, so that I'm, in some sense, updating what my best guess of the current state of the process is after every run. And the advantages of this are that you can get more averages per unit of data. OK, you can go ahead and use the run data alone if you wanted to. You could actually plot, for example, the individual measurements, and even do a hybrid sort of thing where you did a running average just for the standard deviation. But the real advantage of these is they essentially improve your ability to detect. They improve the filtering action, potentially, because you can play with that window size to filter more, but also be able to react potentially faster.

So I'm going to describe a couple of these alternative charts, and I'm going to focus really on the running average detection of the mean. But the same idea also applies to filtering of the variance across multiple runs--so there are also running S charts, or moving range charts, and those sorts of things. So don't forget the lesson we just learned a minute ago: you've got to monitor both mean and variance. But for the next 10, 15 slides, I'm only going to talk about running averages on the average.

So one idea of the running average is a simple running average, or moving average, where we simply take some n values of the past and calculate, say, a simple average. And similarly, we can do that for the variance.
So in that case--the simplest picture--this simple moving average is just an equally weighted window. You pick your window size--maybe it's a window size of 9--and you calculate a new moving average statistic, which I've labeled m sub i here. And so that's a pretty trivial formula.

Now, how do you set the control limits on a moving average chart? Well, you want plus/minus 3 sigma control limits based on the variance that you expect in your statistic, right? So previously we'd have a sample size of 9, and we'd say, well, what's the variance? It's sigma over root n--that's the standard deviation associated with our normal sample size. And that same formula ends up working here, because we've got equal weighting, right? So if we simply apply that--again, assuming each of the points is truly in process control and I do not have correlation in time between them--it's very easy to see what the variance is.

Now, you'll notice something a little strange right here at the beginning. This is kind of an artifact of the startup of the chart. Let's say I have just started the use of the control chart and I've only got five samples so far, but I've got a moving average window of 9. Well, I don't have nine data points yet. I have to form my moving average with a window size of only 5 so far. And because I'm taking the variance divided by 5 instead of divided by 9, the variance of my average of a five-point sample is larger. And you can see, essentially, each sample until I get to my full window size of 9 has slightly larger control chart limits. OK, so that's what's going on there.

OK. And you can also sort of see here now the filtering action at work. Notice that each of these samples doesn't vary dramatically from the previous one. They are no longer uncorrelated from one sample to the next.
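A short sketch of that simple moving average and its limits, assuming the same in-control model; the window is truncated during startup, which is exactly what makes the first few control limits wider.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, window = 5.0, 1.0, 9
x = rng.normal(mu, sigma, size=100)  # simulated in-control run data (illustrative only)

moving_avg, ucl, lcl = [], [], []
for i in range(len(x)):
    w = min(i + 1, window)                       # truncated window until 9 points exist
    moving_avg.append(x[i - w + 1 : i + 1].mean())
    ucl.append(mu + 3 * sigma / np.sqrt(w))      # limits are wider while w < 9
    lcl.append(mu - 3 * sigma / np.sqrt(w))

print([round(u, 2) for u in ucl[:9]])            # 8.0, 7.12, ... tightening to 6.0
```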
In fact, with this equal weighting, every new time point shares eight of its nine data points--about 89% of the data--with the previous data point. So there's a huge amount of correlation from one point in time to the next. So you should not try to use things like the WECO rules associated with three data points in a row, or four or five data points in a row rising. Those things are right out, because this is a running control chart, or moving average control chart, and you've now got correlation. Its benefit, though, is that it is doing that filtering, and you can take whatever current point you get as sort of your best guess of what the current state of the overall mean is. And so you would trigger on alarms above or below the control limits.

So, simple moving average--this is relatively simple. Any questions on this before we do an unequally weighted moving average? Pretty clear.

AUDIENCE: I have a question, Professor.

DUANE BONING: Yes. Good. These questions have been wonderful, by the way. The last couple of classes, I think the questions have really been right on target. It's been a lot of fun. So we have high expectations for your question.

[LAUGHTER]

AUDIENCE: OK. So that's a lot of pressure, but my question is that, for the window size of 9, do the two windows next to each other overlap by 8 data points?

DUANE BONING: Right.

AUDIENCE: OK. That's it.

DUANE BONING: That's exactly right.

AUDIENCE: OK, thank you.

DUANE BONING: Now, one problem with that overlap of eight data points is, by the time you get to 8 points in the past, it may be--that's a long time ago, depending on how rapidly you're sampling or whatever. And compared to the data point that you just took, you might think that the data point you just took represents the current state of the process a lot better than the point 7, 8, or 9 time points ago.
And so there is a more general approach here, where you can pick some weighting for how much you want to fold in data from different points in time. And a classic way to do the weighting is to use an exponentially weighted moving average, where you might define the weighting function--these a sub i's here--I guess I've switched notation to w here--when I'm at the current sample point, or the current time point t, I might weight points backwards in time by index i--i going from 1 to 9 to 1,000, whatever--by a function looking something like this, with some r factor, where r is a weighting factor.

So what happens here is, I might think of r as being how much I believe my current data point. Maybe I believe my current data point by 90%, and then the next data point back, 1 minus r, is only 10%--so I'm only mixing in 10% of that. And then the next data point beyond that--it's maybe 0.1 times 0.1, it's only 0.01, so I'm only mixing in 1% of the previous data point. So very rapidly, I have this drop-off as a function of data points in the past--this is my i index--so that I much more heavily weight recent history, and then exponentially decay into the past.

But I still have this degree of freedom: I can pick r. I can pick the relative weighting of more recent data compared to old data. Now, one way of looking at this is, actually, I have access to all past data, and I would apply this out to 10 data points, or 50, or 100, with the appropriate weighting function. And I guess, with computer systems these days, it would be no problem to actually apply it in that fashion. But there's a beautiful thing about the exponentially weighted moving average, which is that I actually don't have to keep track of all of the raw data. There's another way of thinking about what the EWMA, this exponentially weighted moving average, is doing.
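A tiny numeric illustration of that weighting, assuming r = 0.9 as in the spoken example. The explicit weight on the point i steps back works out to r(1 - r)^i, which is the formal version of the "90%, then 10% of the rest, then 1% of the rest" mixing just described, and the weights sum toward 1.

```python
r = 0.9
weights = [r * (1 - r) ** i for i in range(10)]   # weight on the point i steps in the past
print([round(w, 5) for w in weights])             # 0.9, 0.09, 0.009, ... rapid exponential decay
print(round(sum(weights), 6))                     # approaches 1 as the window extends back
```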
And that's that I have my previous best estimate of the state of my variable--my previous best estimate of the mean--and now I have one new data point, and I use that one new data point to update my best estimate. My best estimate had embedded in it all of the past history, so now, with the additional new weight, I can just do a simple update of my best state estimate. And there is essentially a recursive formula that makes this explicit, and it's exactly the same thing, if you now apply it iteratively out to all history, as actually performing that computation of applying the weights on all past data.

So that is to say, let's say a sub i is my best estimate of whatever my statistic is--my mean. Again, I might weight my current data point, my most recent data point, say, 90%, which would be a really rapid update--I really believe my most recent data. And then my 1 minus r applies to my previous--oops--to my previous estimate. So I'm believing my previous estimate only 10%, and my new data point 90%. So it makes it a little bit easier in terms of the computation, but I think it's also conceptually nice in terms of what the action here is.

Now, one can go through and do a little bit of the mathematics. It's actually a little easier to do it on the full summation version, but you just do basic expectation math on the variance of an EWMA--again, drawing from a normal distribution that is in control--and one can estimate the standard deviation of this a statistic. And it's a little bit messy. It's got the underlying process variation in it, and the fact that you can still be doing sampling--for each x sub i, that might be a sample of n data points. Very often, you might simply be doing it one data point at a time, so n, very often, is just 1. And then there's this formula with r in it, including this 1 minus (1 minus r) to the 2t, where t is how far along you are.
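Here is a minimal sketch of that recursive update and the resulting control limits, assuming n measurements averaged per run and the time-dependent variance factor just described (the function name and interface are illustrative only).

```python
import numpy as np

def ewma_chart(x, mu0, sigma, r, n=1):
    """Recursive EWMA a_t = r*x_t + (1 - r)*a_(t-1), with +/-3 sigma_a control limits."""
    a = mu0                           # start the EWMA at the target (in-control) mean
    stats, ucl, lcl = [], [], []
    for t, xt in enumerate(x, start=1):
        a = r * xt + (1 - r) * a      # one-step update; no raw-data history is needed
        var_a = (sigma**2 / n) * (r / (2 - r)) * (1 - (1 - r) ** (2 * t))
        stats.append(a)
        ucl.append(mu0 + 3 * np.sqrt(var_a))
        lcl.append(mu0 - 3 * np.sqrt(var_a))
    return np.array(stats), np.array(ucl), np.array(lcl)
```

For large t the last factor goes to 1, so the steady-state limits settle at mu0 plus/minus 3 times (sigma over root n) times sqrt(r/(2 - r)), which is the multiplier discussed next.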
So it's kind of nasty, but one thing that's nice is, once you've started up your control chart and you're away from that little artifact that we saw with the moving average, where the variance limits change--once you get far enough away from that startup, your t is large, and this whole term just drops out. It drops out more rapidly for large values of r, because I'm really weighting more towards recent history, but even for small or modest values of r, after you get out 10, 20 data points, it typically drops away. So now you've got an estimate, for large t, of the variance--sigma a--of your statistic, and you can simply plot plus/minus 3 sigma, or whatever control limit you want, depending on your false alarm rate alpha. So this is pretty cool.

One quick observation here is, again, we still have a lot of degrees of flexibility around the choice of r. And if we look at this formula here, and the effect of r--I sort of plotted this out in this next chart. This effect of r on the sigma multiplier--oops, sorry--is this factor right here. So in other words, as r changes, how do my control limits move? What is this multiplier on my sigma over root n from my normal sampling? And you can look at it here. As r gets larger--as r approaches 1--well, that gets larger. Typically, you don't ever go above 1. You can't believe your most recent data point more than 100%--it's a ratio or a fraction, up to 100%. You can see that, as r gets larger, this multiplier starts to approach some limit. And finally, when r is equal to 1, this multiplier is simply 1.

In the limit, when r is equal to 1, what are we doing? We're doing a 100% update based on my most recent data point. In other words, I'm only looking at my most recent data point. My variance is simply the process variance coming from sampling.
This all reduces to sigma a simply being my sampling variance, whatever size sample I was using within each sample formation. So in that case, I have no factor, no multiplier, on that sigma a. But for other values of r, what's happening? For smaller values of r, what do we see? The sigma a is smaller than the sigma x over root n. We're getting a reduction in the variance of our estimate because of the filtering. So that was the whole idea: I can tighten my control limits, essentially, by the filtering action. So as I get smaller and smaller r's, I do more and more aggressive filtering with lots of past data. I get tighter and tighter limits, but, of course, I'm also suppressing recent data a little bit more heavily, so there is this trade-off.

Let's look at some examples. Oh, well, I guess I made these points already here. Again, the variance of this estimate for the mean--your best guess of the mean--is smaller than you would have had with an x bar chart, because the x bar chart, remember, for a sample size of n, is simply that. So we do get this suppression. And it is very nice that it also, in the limit, works out for both the sample size of 1 case--these formulas still hold--and also the full update case, r equals 1, where we have completely unfiltered data and you just plot your pure run data on your chart. So what happens if I use much smaller r's, like 0.1? I get much tighter control limits, but I've got potentially long delays then in detecting--I've got a lot of noise suppression, but again, I get back to long delays in detecting or responding to a true shift in the mean. So I want to look at a few examples here to give you a feel for the trade-off between different r's for some data, and compare it as well to a typical x bar chart.
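As a quick numeric check of that multiplier behavior, here is the large-t factor sqrt(r/(2 - r)) on sigma over root n, evaluated for a few values of r (a small illustrative sketch).

```python
import math

for r in (0.1, 0.3, 0.5, 1.0):
    # smaller r -> tighter limits; r = 1 recovers the plain x-bar limits
    print(r, round(math.sqrt(r / (2 - r)), 3))   # 0.229, 0.42, 0.577, 1.0
```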
OK, so here's a set of data, and we're seeing a comparison here with the x bar chart--you can see here the x bar upper and lower control limits. And sure enough, over 150 runs, where at, I don't know, about run 90, we make this shift in the process--we shift the whole process by half a standard deviation of the underlying process. And sure enough, by about, I don't know, run 140, we finally got a data point in the tail of the mean distribution--the regular x bar--that fell out. And we could actually calculate the average run length to detect a mean shift of 0.5 using some of the techniques we talked about last time. And we might not be happy with that very long ARL, and the question is, would an EWMA chart help us with this?

And this is an example where we're using about a 30% update based on each new data point, and plotted in purple--with the blue and the yellow as the upper and lower control limits--is the EWMA. And what you see is the effect of averaging. Now, I always find it's easiest to conceptualize the action of the EWMA chart as just kind of like a moving average--the equally weighted moving average, but weighted a little bit more towards recent history. But the whole idea is that I get that aggregation from any kind of a moving average, so that, right after this shift right here, when I take it into account--with an r of 0.3, I'm taking into account three, four, five data points, so I've got kind of an effective window weighted about 30% towards my most recent point but including some past history--all of the plus-delta-mu data points average together, and my normal process variation just cancels itself out. And so it's very intuitive, I think, that this EWMA is going to aggregate, or respond to, or sense the delta shift in the mean more than an individual x bar chart would have been doing--so that it responds, in this case, I don't know, after about 15 or 20 data points. You can start to see excursions beyond the control limit.
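A rough simulation in the same spirit (an assumption-laden sketch, not the lecture's JMP data): samples of size 5, a 0.5 sigma mean shift introduced at run 90, and r = 0.3, reusing the hypothetical ewma_chart from the earlier sketch.

```python
import numpy as np

rng = np.random.default_rng(1)
mu0, sigma, n, r = 0.0, 1.0, 5, 0.3
shift, shift_at, n_runs = 0.5 * sigma, 90, 150

# each run is a sample of n parts; the process mean shifts by 0.5 sigma at run 90
means = np.array([rng.normal(mu0 + (shift if t >= shift_at else 0.0), sigma, n).mean()
                  for t in range(n_runs)])

# classical x-bar alarms on the sample means
xbar_alarms = np.where(np.abs(means - mu0) > 3 * sigma / np.sqrt(n))[0]

# EWMA on the same sample means, using the ewma_chart sketch above
stats, ucl, lcl = ewma_chart(means, mu0, sigma, r, n=n)
ewma_alarms = np.where((stats > ucl) | (stats < lcl))[0]

print("first x-bar alarm at run:", xbar_alarms[0] if xbar_alarms.size else None)
print("first EWMA alarm at run:", ewma_alarms[0] if ewma_alarms.size else None)
```

For a persistent shift this small, the EWMA usually crosses its tighter limits many runs before the x bar chart does, which is the behavior described above.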
Question--

AUDIENCE: At the beginning of the chart, why are the control limits [INAUDIBLE] spreading [INAUDIBLE]? The last time, they were [INAUDIBLE]

DUANE BONING: Yes, yes--good point. I think you can go back to this formula and see if that works. I bet, if you look at the effect of the t term, formulaically or analytically--I think that, as t starts out being very small--so if t is 0, this (1 minus r) to the 2t term is essentially 1. Well, let's say t was 1 and r was 0.3--I've got 0.7 in there. I've got a reduction. So I think, for small t, I've only got a limited amount of data, and so I don't have the long tail in the EWMA. So with only a small amount of data, carrying a larger weight, my control limits, I think, are smaller. But then, as t gets larger, I've got a little bit more and more data--so I think, if I were to plot this 1 minus (1 minus r) to the 2t, it kind of has a funky shape that dies out. But I wouldn't worry too much about the startup times. There are artifacts when you start up a control chart, but when you've just started a control chart anyway, you're still feeling out the process, so I wouldn't worry too much about the startup times on these. Yeah?

AUDIENCE: [INAUDIBLE] How do you get this side [INAUDIBLE]

DUANE BONING: Yeah. So the question--if you didn't hear that in Singapore--was, when would you use an EWMA, or what are the factors in choosing an EWMA versus an x bar chart? And I think, if you've got a lot of noise and you're really worried about rapidly detecting a shift in the process, an EWMA--and we'll see the CUSUM in a minute--has improved sensitivity for rapid detection of the shift. But it comes at one really important cost, which is a loss in transparency in what's going on. The x bar, I think, conceptually, is a lot easier to understand, and to have everybody on the line understand what it is.
The data, if you also explode it out and show the full data--the x bar and the s--gives you a good feel directly for the statistics of the process. Here, you look at this and you lose touch, I think, a little bit with the statistics of--and the relationship to--your raw data. It's a little harder to get your mind around.

I don't know--10, 15 years ago, I fell in love with the EWMA chart and approach, not only for charting, but also for doing simple run-by-run control. And so I kind of like it, but I can empathize with it being a little bit harder to explain and maintain on the line. It's also got a little bit more calculation, and it's still true in some of the manufacturing lines--I know, when we went on the tour of some of the plants--very, very up-to-date plants in Singapore and other places around the world--people are still often taking measurements by hand, and doing the hand calculation, and writing it on a chart. This has an extra step: you've got to do an extra average. So in the modern day, when almost all the calculations are automated and the data collection is automated, I think there's even more reason to go with an EWMA.

So the thing I like best about the EWMA is I think of it as giving me my best current estimate based not only on just my last data point, but on a little bit of filtering, including past history. It's giving me my best educated estimate of the current state of the process. And why I like that for run-by-run control is, based on my current model--my recursively updated model of the current state of the process--I can make a decision. So here, the decision in SPC is: is the process in control or not? But you can think of the EWMA as giving you more information. It's not just telling you, am I in control or not, given the random spread?
But if you interpret this little purple data point here as my best estimate of where the mean has been in the recent past, then, if there is a little bit of drift or movement, maybe I do active control and not pure SPC control. I'm no longer just making the decision that nothing has changed. I'm saying there are small changes, and maybe I want to go and make a small modification or an adjustment on the equipment. Now, if you are in true statistical process control, that would be the wrong thing to do--you'd be over-adjusting--and we'll talk about that later. But that's another thing I like about the EWMA: very often, there is a little bit of time correlation in the process, and the EWMA is tracking those small changes and filtering out some of the noise. It's a good question. I'll take a question--somebody had a question in Singapore.

AUDIENCE: Yes. So a smaller r will lead to a narrower control limit, right?

DUANE BONING: That's right.

AUDIENCE: So with the narrower control limit, you will have a smaller beta?

DUANE BONING: So a smaller beta would be less chance--

AUDIENCE: Smaller beta?

DUANE BONING: Yes.

AUDIENCE: Yeah, [INAUDIBLE] type II error.

DUANE BONING: Right--but its power is then larger, so the chance of detecting a true out-of-control point increases. But there's a very important additional factor, which is that there's additional time lag. So I'm going to show a couple of examples, because you can see in other examples the effect of the filtering, and the time it takes to detect can go up.

AUDIENCE: OK.

DUANE BONING: So, another question here--

AUDIENCE: I was just curious, what sample size were you using for the x bar chart?

DUANE BONING: I don't know. I'm not sure what was used as a sample size. I think you're just seeing the plot of the sample, so the sample size is actually the same in both cases.
But I don't know what was being used. In some sense, it doesn't actually matter too much. The EWMA is looking--it's an EWMA over samples. I might have 1,000 parts an hour, and every 15 minutes I might do a sample of size 10. So I'm only getting a sample of size 10 out of lots more parts, and what's being plotted here, in some sense, is the sample. So I have an x bar where I might plot those samples. Here, what I'm doing--the EWMA has the flexibility to extend to n equals 1 sampling, if that's what I'm doing. But in some sense, it could also be used where each of those n equals 10 samples is the only data I look at. Each one of those is my point in time t, and my previous sample of size 10 is t minus 1. So in some sense, it's doing the whole sample-by-sample comparison.

So my point here was, both the EWMA and the x bar are using, in this derivation, the same sample size. I'm not differentiating in that. All I'm doing in this case, really, with the EWMA, is using multiple samples back in time, and the x bar only uses one. So in fact, if I go to r equals 1, the two converge--they are the same. The r equals 1 EWMA is exactly the same as the x bar.

AUDIENCE: But the larger my sample size is, the less of a spread I get between my two-- [INAUDIBLE]

DUANE BONING: They would both go together, though, because I have that factor in both. So back here on slide 12, if I change my sample size, both sigma a and my upper and lower control limits for the x bar shrink exactly the same. So I would scale both of them if I increase it.

So what I was talking about here is with different values of r. So this was an r equals 0.3. Here's a smaller r, r equals 0.1. So what's happening here? I'm only accepting my most recent data point at 10%, and mixing in past history at 90%. That's more averaging of the past.
I'm using more past history, and so, in this case, what you see is kind of a more slowly drifting, more heavily--excuse me--more heavily filtered version. And here, with the mean shift occurring about here, you can start to see, in this case, a mean shift of 0.5 with, I guess, a sample size of 5. You do need to know the sample size in order to know the effect of the mean shift and the ability to detect it with either chart. So somewhere in here--I guess around there--I got lucky, and I'm starting to sample now with the x bar picking up that detection. But with the EWMA, in some sense, I've still got 90% weight on the past. It takes 9 or 10 data points for the more recent data to outweigh the past data, and so I've actually got, in this case, a time lag in detecting the mean shift with the EWMA compared to the x bar.

So, a question here first--

AUDIENCE: I don't see how we can compare them, because they actually are different time scales. All the x bar will effectively--let us say the sample size is [INAUDIBLE]

DUANE BONING: So the question was, how do you compare the time scales on these? So here's what I think is going on--and this is sort of what I was talking about a second ago, but it's probably still a little confusing. I believe that this n equals 5 sample size means that, inside of this data point, if I were to explode it, it actually has five data points inside of it, and I'm only plotting the average of those five on here. This data point also has those same five data points, but then it also grabbed the data that was lurking in the previous one. It has its five data points and its five--whoops--I think it was trying to exit me out here. So while it's doing that, the point here was that the n equals 5 is not looking back five data points in the x bar--it's buried in each of the individual points--so that both the EWMA and the x bar are not affected differently by this n equals 5 sampling.
But then the time scale is correct in terms of--the x bar only looks at my most recent sample. The EWMA does look back in history, and that's why you've got this additional time lag. Was that getting at your question or your observation, or is your question a little different?

AUDIENCE: [INAUDIBLE]

DUANE BONING: OK. OK. And there was a question in Singapore.

AUDIENCE: Yes. So my question is sort of built on what [? Stephen ?] was asking. So for a smaller r, you have a higher probability of detecting a mean shift. But at the same time, for a smaller r, the time it takes for you to detect the mean shift is longer. How do these two coexist? You have a higher chance of detecting it, but it takes longer as well.

DUANE BONING: Yes. There's a trade-off. And so what you are basically given is a degree of freedom that you get to play with, or experiment with, or adapt to your process. So what are the factors in adapting it to your process? If your process is really noisy and you think it's going to be hard to detect a mean shift in the face of that noise, then you do more averaging--you'd have a smaller r. If you're trying to detect more quickly--maybe with a higher risk of a false alarm--then I'd have a larger r, so it responds faster, but doesn't filter as much.

So you can see here, with the r equals 0.1 versus the r equals 0.3--this is the same data, I believe. Is this the same data? I guess not--it's just another example. Here's an r equals 0.3 kind of case where, for this data, this looks like about the right trade-off. I'm getting reasonable filtering. The black dots are my EWMA.
788 00:43:45,980 --> 00:43:50,480 And for a mean shift here, I've got very rapid-- 789 00:43:50,480 --> 00:43:54,980 I've got rapid detection, much more rapid than, 790 00:43:54,980 --> 00:43:58,730 in fact, the x bar, which doesn't really 791 00:43:58,730 --> 00:44:00,890 trigger finally until way-- 792 00:44:00,890 --> 00:44:05,790 another 20, 25 runs later. 793 00:44:05,790 --> 00:44:10,070 So I think what you'd have to do is, again make the trade-offs 794 00:44:10,070 --> 00:44:13,340 between responsivity, the size of the shift 795 00:44:13,340 --> 00:44:15,440 that you're trying to detect, and how 796 00:44:15,440 --> 00:44:20,960 much cost you to have different delays in detecting the shift. 797 00:44:20,960 --> 00:44:23,540 So we've got two questions here-- one here first. 798 00:44:23,540 --> 00:44:26,490 AUDIENCE: [INAUDIBLE] It's almost 799 00:44:26,490 --> 00:44:28,808 like you are taking a larger sample size. 800 00:44:28,808 --> 00:44:34,290 [INAUDIBLE] You're taking the 5 samples for-- 801 00:44:34,290 --> 00:44:39,180 5 measurements per sample, plus 5 previously or slightly lower 802 00:44:39,180 --> 00:44:42,510 [INAUDIBLE] plus 5 [INAUDIBLE] plus 5 [INAUDIBLE] So 803 00:44:42,510 --> 00:44:44,850 you're just taking more samples per-- 804 00:44:44,850 --> 00:44:46,890 more measurements per data point, right? 805 00:44:46,890 --> 00:44:47,640 DUANE BONING: Yes. 806 00:44:47,640 --> 00:44:49,342 So that's a fair way of looking at it. 807 00:44:49,342 --> 00:44:50,800 That's a good way of looking at it. 808 00:44:50,800 --> 00:44:53,820 So just so everybody could hear, with the x bar, 809 00:44:53,820 --> 00:44:58,590 I take my n equals whatever sample, but with the EWMA, 810 00:44:58,590 --> 00:45:00,780 I still got that from my most recent update, 811 00:45:00,780 --> 00:45:02,550 but I'm doing additional filtering 812 00:45:02,550 --> 00:45:04,770 or averaging of past history in. 813 00:45:04,770 --> 00:45:07,590 And the net effective that is I've got a net aggregate-- 814 00:45:07,590 --> 00:45:10,800 more number of samples, if you will-- 815 00:45:10,800 --> 00:45:13,020 unequally weighted, but a more-- 816 00:45:13,020 --> 00:45:15,360 and therefore, that makes perfect sense 817 00:45:15,360 --> 00:45:19,440 that your control limits come down. 818 00:45:19,440 --> 00:45:21,720 That's a very good way of looking at it. 819 00:45:21,720 --> 00:45:23,740 AUDIENCE: So in the previous example, 820 00:45:23,740 --> 00:45:26,868 does EWMA also trigger before x bar? 821 00:45:26,868 --> 00:45:27,910 Because x bar [INAUDIBLE] 822 00:45:27,910 --> 00:45:30,118 DUANE BONING: Oh, yes-- good point, good point-- yes. 823 00:45:32,740 --> 00:45:35,445 AUDIENCE: Is there ever a case when x bar will trigger 824 00:45:35,445 --> 00:45:37,537 [INAUDIBLE] 825 00:45:37,537 --> 00:45:39,370 DUANE BONING: Good question-- are there ever 826 00:45:39,370 --> 00:45:42,100 situations where x bar might trigger still 827 00:45:42,100 --> 00:45:47,470 before an EWMA for an x bar chart? 828 00:45:47,470 --> 00:45:49,420 I don't quite know the trade-offs 829 00:45:49,420 --> 00:45:53,140 and where you get in the limits of the trade-off. 830 00:45:56,200 --> 00:45:58,510 You've always got chance at work, 831 00:45:58,510 --> 00:46:02,050 and I suspect that, with a large enough-- 832 00:46:02,050 --> 00:46:06,970 so the following scenario probably comes closest. 833 00:46:06,970 --> 00:46:13,030 Let's say you had a very large mean shift, but a very small r. 
834 00:46:13,030 --> 00:46:17,920 With an x bar, I react entirely to a sample drawn 835 00:46:17,920 --> 00:46:19,810 from the very large mean shift. 836 00:46:19,810 --> 00:46:23,590 Let's say it was a 2 sigma or a 3 sigma means shift. 837 00:46:23,590 --> 00:46:25,690 But I have a very small r. 838 00:46:25,690 --> 00:46:29,020 Maybe I'm only calling that 0.05. 839 00:46:29,020 --> 00:46:32,550 That 5% mix in of that most recent data point 840 00:46:32,550 --> 00:46:35,620 is probably swamped out by my past data still, 841 00:46:35,620 --> 00:46:38,920 and so in that case, I think my EWMA would likely not 842 00:46:38,920 --> 00:46:42,140 detect the mean shift. 843 00:46:42,140 --> 00:46:44,380 So I think of the EWMA as especially good 844 00:46:44,380 --> 00:46:46,420 for improving responsivity and detection 845 00:46:46,420 --> 00:46:49,900 ability for small effects, because it's aggregating, 846 00:46:49,900 --> 00:46:52,370 just like the CUSUM that we'll see in a second. 847 00:46:52,370 --> 00:46:56,020 But I think, if there are really big-- bigger effects, 848 00:46:56,020 --> 00:46:59,710 you lose by the filtering in the time response. 849 00:46:59,710 --> 00:47:02,005 Was that an example you're going to suggest? 850 00:47:02,005 --> 00:47:07,430 AUDIENCE: [INAUDIBLE] 851 00:47:07,430 --> 00:47:08,680 DUANE BONING: Oh, good point-- 852 00:47:08,680 --> 00:47:12,128 AUDIENCE: [INAUDIBLE] 853 00:47:12,128 --> 00:47:12,920 DUANE BONING: Yeah. 854 00:47:12,920 --> 00:47:19,118 AUDIENCE: [INAUDIBLE] 855 00:47:19,118 --> 00:47:19,910 DUANE BONING: Yeah. 856 00:47:19,910 --> 00:47:20,600 Yeah. 857 00:47:20,600 --> 00:47:25,150 So other rules here would help-- yeah, absolutely. 858 00:47:25,150 --> 00:47:32,370 OK, so I guess this is sort of saying a little bit 859 00:47:32,370 --> 00:47:33,060 the same thing. 860 00:47:33,060 --> 00:47:37,610 This, I think, is really good for small mean shifts-- 861 00:47:37,610 --> 00:47:38,630 that are persistent. 862 00:47:38,630 --> 00:47:42,950 It's not a one-time shot, that I've got a delta mu mean shift, 863 00:47:42,950 --> 00:47:45,060 and this allows us to aggregate. 864 00:47:45,060 --> 00:47:48,300 There are additional ideas we can use. 865 00:47:48,300 --> 00:47:49,800 Yeah-- question [INAUDIBLE] 866 00:47:49,800 --> 00:47:53,630 AUDIENCE: [INAUDIBLE] 867 00:47:53,630 --> 00:47:55,940 DUANE BONING: No, you would not want to-- 868 00:47:55,940 --> 00:47:58,130 so I think Hayden's point here was 869 00:47:58,130 --> 00:48:01,430 on the purple beta points you might apply, 870 00:48:01,430 --> 00:48:03,230 which was just an x bar. 871 00:48:03,230 --> 00:48:06,860 You might apply the WECO but you should not-- 872 00:48:06,860 --> 00:48:09,560 there might be variants you could derive 873 00:48:09,560 --> 00:48:14,100 some of the other WECO rules, but I don't 874 00:48:14,100 --> 00:48:15,600 know how you would do that. 875 00:48:15,600 --> 00:48:16,500 It's not intuitive. 876 00:48:16,500 --> 00:48:18,930 I think, for the EWMA, you pretty much 877 00:48:18,930 --> 00:48:23,220 are just trying to track the mean, 878 00:48:23,220 --> 00:48:28,350 and not additional trends or-- 879 00:48:28,350 --> 00:48:31,320 I would not use any of the additional WECO rules 880 00:48:31,320 --> 00:48:32,440 with an EWMA chart. 881 00:48:35,060 --> 00:48:37,370 So there's another chart that is very often used. 
882 00:48:37,370 --> 00:48:39,500 It's called a CUSUM. I just want to give you 883 00:48:39,500 --> 00:48:41,150 a quick feel for that, and then move 884 00:48:41,150 --> 00:48:47,840 to some multivariate issues, which is also targeted 885 00:48:47,840 --> 00:48:50,780 at this idea that, when I have a small mean 886 00:48:50,780 --> 00:48:54,650 shift, what I'd like to do is use the fact that it is 887 00:48:54,650 --> 00:48:58,010 a persistent mean shift that I'm then sampling around 888 00:48:58,010 --> 00:48:59,820 within the process noise. 889 00:48:59,820 --> 00:49:02,000 And if there were a simple way to aggregate 890 00:49:02,000 --> 00:49:06,500 the mean shift over many data points in the past, 891 00:49:06,500 --> 00:49:11,480 I'd build up a signal that's larger and might improve 892 00:49:11,480 --> 00:49:17,780 my ability to detect that effect and reduce the average run 893 00:49:17,780 --> 00:49:20,780 length for detecting that effect. 894 00:49:20,780 --> 00:49:24,200 And this CUSUM, or cumulative sum, 895 00:49:24,200 --> 00:49:28,200 is another way of thinking of it as a filter-- 896 00:49:28,200 --> 00:49:30,140 and in this case, just a discrete time 897 00:49:30,140 --> 00:49:34,730 integrator-- which is looking at the sum of deviations 898 00:49:34,730 --> 00:49:39,300 of each of my sample points from the assumed grand mean. 899 00:49:39,300 --> 00:49:42,350 So I'm not just filtering and plotting the average, 900 00:49:42,350 --> 00:49:43,880 but I'm saying-- 901 00:49:43,880 --> 00:49:48,837 asking the question, am I still in control at my assumed x bar 902 00:49:48,837 --> 00:49:51,800 bar, my mu 0? 903 00:49:51,800 --> 00:49:54,560 And if I'm not, then this statistic, 904 00:49:54,560 --> 00:49:59,780 this sum of being to one side of it, the c sub j 905 00:49:59,780 --> 00:50:03,140 here, will grow if I've got a plus mean shift, 906 00:50:03,140 --> 00:50:05,900 or become more and more negative and grow if I've 907 00:50:05,900 --> 00:50:08,780 got a negative mean shift-- 908 00:50:08,780 --> 00:50:12,740 so that I'll convert a mean shift to an integration 909 00:50:12,740 --> 00:50:14,570 of a mean shift. 910 00:50:14,570 --> 00:50:16,430 The mean shift is a step. 911 00:50:16,430 --> 00:50:21,350 The integration of a step response is a ramp. 912 00:50:21,350 --> 00:50:25,940 So what we would hope to do with a CUSUM chart 913 00:50:25,940 --> 00:50:28,970 is improve our mean shift sensitivity, 914 00:50:28,970 --> 00:50:32,060 and basically have charts that look like this. 915 00:50:32,060 --> 00:50:39,310 When I'm in control, I might wander around the mean. 916 00:50:39,310 --> 00:50:41,510 Sometimes I'll be above, sometimes below, 917 00:50:41,510 --> 00:50:45,460 so this c sub j, this cumulative statistic, 918 00:50:45,460 --> 00:50:47,110 does wander around a bit. 919 00:50:47,110 --> 00:50:50,050 But if I do have a mean shift, on average, 920 00:50:50,050 --> 00:50:53,560 I keep adding to that c sub j. 921 00:50:53,560 --> 00:50:55,330 Sometimes I might go down a little bit, 922 00:50:55,330 --> 00:50:58,580 because I still have something a little bit below the mean. 923 00:50:58,580 --> 00:51:02,530 And so what you create is this ramp. 924 00:51:02,530 --> 00:51:07,880 You create a slope caused by that mean shift. 925 00:51:07,880 --> 00:51:11,260 Now, one can also now apply some statistics 926 00:51:11,260 --> 00:51:19,330 and design control limits for that c sub j statistic, 927 00:51:19,330 --> 00:51:22,360 or use a couple of other approaches.
928 00:51:22,360 --> 00:51:28,430 And one approach is this so-called V-Mask that says it-- 929 00:51:28,430 --> 00:51:34,060 what I would like to do is, if, in fact, I've got a mean shift, 930 00:51:34,060 --> 00:51:35,980 this slope-- 931 00:51:35,980 --> 00:51:37,690 let me erase a little bit here-- 932 00:51:40,840 --> 00:51:48,410 whoops-- so there are two different approaches-- 933 00:51:48,410 --> 00:51:50,260 and I just want to briefly mention 934 00:51:50,260 --> 00:51:54,460 both of those-- for detecting things on these charts. 935 00:51:54,460 --> 00:51:59,026 One is draw a control limit for the aggregate of the C sub 936 00:51:59,026 --> 00:52:07,120 J. The other is to look and say, once I've got a mean shift, I 937 00:52:07,120 --> 00:52:11,980 start building data points with, on average, some slope that 938 00:52:11,980 --> 00:52:18,700 is different than a slope that I would expect by random chance 939 00:52:18,700 --> 00:52:24,460 alone, just forming the sum statistic from a normally 940 00:52:24,460 --> 00:52:28,440 sampled process. 941 00:52:28,440 --> 00:52:33,150 The tabular CUSUM-- this idea of actually looking 942 00:52:33,150 --> 00:52:35,890 at the statistics of-- 943 00:52:35,890 --> 00:52:38,490 let me skip this for a second-- 944 00:52:38,490 --> 00:52:41,490 this normalized statistic z and s, 945 00:52:41,490 --> 00:52:46,157 and looking and deciding on what control limits are, and so on-- 946 00:52:46,157 --> 00:52:47,490 I'm actually going to skip that. 947 00:52:47,490 --> 00:52:49,080 There's about five slides in here, 948 00:52:49,080 --> 00:52:53,520 but it's described very nicely in, I think, chapter 8.1 949 00:52:53,520 --> 00:52:55,320 in Montgomery. 950 00:52:55,320 --> 00:52:56,830 And I think that's pretty natural. 951 00:52:56,830 --> 00:53:04,500 It's just sort of a statistic-- a little bit confusing 952 00:53:04,500 --> 00:53:06,870 derivation of the statistics that-- 953 00:53:06,870 --> 00:53:11,760 I just trust the statisticians who have done right. 954 00:53:11,760 --> 00:53:14,740 But there's also another picture here, 955 00:53:14,740 --> 00:53:17,280 which is based on the slope idea. 956 00:53:17,280 --> 00:53:21,150 And you can similarly derive some formulas 957 00:53:21,150 --> 00:53:27,060 based on alpha and beta risks, what size of a delta 958 00:53:27,060 --> 00:53:31,260 shift in the mean you would hope to detect, 959 00:53:31,260 --> 00:53:35,250 and what kind of a slope you would expect to see, 960 00:53:35,250 --> 00:53:36,790 in those cases. 961 00:53:36,790 --> 00:53:39,900 And so you've got these two parameters that 962 00:53:39,900 --> 00:53:43,920 go towards the formulation of this V-Mask 963 00:53:43,920 --> 00:53:48,350 that you overlay on your data points 964 00:53:48,350 --> 00:53:52,820 based on an expected slope, and you need a certain number 965 00:53:52,820 --> 00:53:54,770 of data points in order to be able to build up 966 00:53:54,770 --> 00:53:56,000 your estimate of the slope. 967 00:53:56,000 --> 00:53:57,650 That's the dead zone. 968 00:53:57,650 --> 00:54:02,090 And what you do for each data point here is overlay 969 00:54:02,090 --> 00:54:05,600 the V-Mask and say, look back in time and see, 970 00:54:05,600 --> 00:54:12,140 did any of these points intersect with the mask-- 971 00:54:12,140 --> 00:54:15,230 have a higher slope than you would have expected? 972 00:54:15,230 --> 00:54:17,580 And then you update that on a running basis. 
973 00:54:17,580 --> 00:54:20,300 So in this case, here at time point 974 00:54:20,300 --> 00:54:23,960 18 or 19-- whatever that is-- everything looks fine. 975 00:54:23,960 --> 00:54:25,260 I go up. 976 00:54:25,260 --> 00:54:27,530 Everything is still looking fine. 977 00:54:27,530 --> 00:54:29,960 Still here, I'm starting to get some evidence maybe 978 00:54:29,960 --> 00:54:33,290 that things have changed, but statistically speaking, 979 00:54:33,290 --> 00:54:35,210 that's still within the range that I would 980 00:54:35,210 --> 00:54:38,340 expect with some alpha error. 981 00:54:38,340 --> 00:54:43,400 But then finally, I get to a new data point here at time 23, 982 00:54:43,400 --> 00:54:47,540 where, in aggregate, I've got some indication 983 00:54:47,540 --> 00:54:49,860 that the slope is too large. 984 00:54:49,860 --> 00:54:53,120 So this is kind of a intuitive graphical approach that's 985 00:54:53,120 --> 00:54:55,160 talking about the slope associated 986 00:54:55,160 --> 00:55:00,770 with a mean shift in the CUSUM statistic. 987 00:55:00,770 --> 00:55:01,700 Yes? 988 00:55:01,700 --> 00:55:05,210 AUDIENCE: Would you also perform a linear regression 989 00:55:05,210 --> 00:55:07,852 for the last three or four parts and check the [INAUDIBLE]?? 990 00:55:07,852 --> 00:55:09,560 DUANE BONING: So the question is, can you 991 00:55:09,560 --> 00:55:14,580 perform a linear regression and calculate the slope of that? 992 00:55:14,580 --> 00:55:17,900 And I think that's identical to what this is doing graphically. 993 00:55:17,900 --> 00:55:20,300 So in fact, if you were numerically-- 994 00:55:20,300 --> 00:55:24,200 or implementing an automatic version of this, 995 00:55:24,200 --> 00:55:28,790 that would probably be how you would implement it. 996 00:55:28,790 --> 00:55:31,430 That might even give you some more flexibility 997 00:55:31,430 --> 00:55:35,420 in terms of looking at the goodness of that fit, 998 00:55:35,420 --> 00:55:39,440 but I think it's essentially identical to this. 999 00:55:39,440 --> 00:55:41,120 OK. 1000 00:55:41,120 --> 00:55:43,580 So just to summarize these alternative charts, 1001 00:55:43,580 --> 00:55:46,250 there's a few simple messages. 1002 00:55:46,250 --> 00:55:48,830 Noisy data-- if you're trying to detect 1003 00:55:48,830 --> 00:55:50,930 a mean in the noisy data, you might 1004 00:55:50,930 --> 00:55:52,580 want to use some filtering. 1005 00:55:52,580 --> 00:55:55,100 We have the sample size that does some filtering, 1006 00:55:55,100 --> 00:55:57,050 but I might want to use more data 1007 00:55:57,050 --> 00:56:02,300 in the past and this linear discrete filtering, 1008 00:56:02,300 --> 00:56:08,210 either with the CUSUM as a running integrator or an EWMA-- 1009 00:56:08,210 --> 00:56:11,990 which is an iterative discrete filter-- 1010 00:56:11,990 --> 00:56:13,170 can be applied. 1011 00:56:13,170 --> 00:56:15,200 And I think, conceptually, they're 1012 00:56:15,200 --> 00:56:17,330 fairly nice in terms of that noise suppression 1013 00:56:17,330 --> 00:56:20,240 for detecting of the mean-- 1014 00:56:20,240 --> 00:56:22,610 got to remember that your choice of factors, 1015 00:56:22,610 --> 00:56:26,480 like r and the EWMA, depends on your process, and the mix 1016 00:56:26,480 --> 00:56:29,480 of noise, and how fast you want to respond to be 1017 00:56:29,480 --> 00:56:31,520 able to detect a mean shift. 
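To make that summary concrete, here is a minimal sketch in Python of the two filters just described: the EWMA recursion on subgroup means with its usual time-varying control limits (the standard textbook formula from Montgomery), and the plain cumulative-sum statistic the lecture defines, a running sum of deviations from the target. The target, sigma, subgroup size, r value, and injected shift below are illustrative assumptions, not the lecture's data.

import numpy as np

# Assumed in-control model (illustrative, not the lecture's data):
mu0, sigma, n = 1.0, 1.0, 5          # target mean, process sigma, subgroup size
sigma_xbar = sigma / np.sqrt(n)      # sigma of each plotted subgroup mean
r = 0.3                              # EWMA weight on the newest subgroup mean

rng = np.random.default_rng(1)
xbar = rng.normal(mu0, sigma_xbar, size=60)
xbar[30:] += 0.5 * sigma             # inject a small, persistent mean shift

# EWMA: A_t = r * xbar_t + (1 - r) * A_{t-1}, started at the target mu0.
A = np.empty_like(xbar)
prev = mu0
for i, x in enumerate(xbar):
    prev = r * x + (1 - r) * prev
    A[i] = prev

# Time-varying +/- 3 sigma limits for the EWMA statistic; they widen toward
# the asymptote mu0 +/- 3 * sigma_xbar * sqrt(r / (2 - r)).
t = np.arange(1, len(xbar) + 1)
half_width = 3 * sigma_xbar * np.sqrt(r / (2 - r) * (1 - (1 - r) ** (2 * t)))
ucl, lcl = mu0 + half_width, mu0 - half_width

# Plain CUSUM statistic from the lecture: running sum of deviations from mu0.
# (Detection would then use a V-mask or the tabular form in Montgomery 8.1.)
C = np.cumsum(xbar - mu0)

print(np.where(A > ucl)[0][:1], np.where(A < lcl)[0][:1])  # first EWMA alarm, if any

The r value carries the trade-off discussed above: a smaller r filters the noise harder but reacts to a shift more slowly.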
1018 00:56:31,520 --> 00:56:34,500 And then, going back to what we started with our little example 1019 00:56:34,500 --> 00:56:37,430 at the very beginning. 1020 00:56:37,430 --> 00:56:39,350 Noisy data need some filtering. 1021 00:56:39,350 --> 00:56:41,480 We've talked about it with respect to the mean-- 1022 00:56:41,480 --> 00:56:42,860 that, if I'm filtering that data, 1023 00:56:42,860 --> 00:56:45,980 you also still want to look back and monitor the variance 1024 00:56:45,980 --> 00:56:46,890 as well. 1025 00:56:46,890 --> 00:56:52,610 So you can apply these charts also to moving, S bar charts, 1026 00:56:52,610 --> 00:56:54,870 and so on. 1027 00:56:54,870 --> 00:56:56,270 So what I want to do next is move 1028 00:56:56,270 --> 00:56:59,510 a little bit to multivariate process control, 1029 00:56:59,510 --> 00:57:03,860 control charting when I've got more than one parameter that I 1030 00:57:03,860 --> 00:57:07,370 may be monitoring on the unit process or on the product. 1031 00:57:07,370 --> 00:57:09,800 And what's really interesting here 1032 00:57:09,800 --> 00:57:15,110 is there are essentially two classic mistakes that 1033 00:57:15,110 --> 00:57:17,282 are made again and again. 1034 00:57:17,282 --> 00:57:19,490 And I want to give you a feel for those two mistakes, 1035 00:57:19,490 --> 00:57:21,880 and strategies for avoiding them. 1036 00:57:21,880 --> 00:57:23,630 OK? 1037 00:57:23,630 --> 00:57:27,650 One classic problem is you've got more than one parameter 1038 00:57:27,650 --> 00:57:30,530 you're monitoring-- you implement more than one control 1039 00:57:30,530 --> 00:57:33,020 chart-- 1040 00:57:33,020 --> 00:57:34,830 seems natural. 1041 00:57:34,830 --> 00:57:37,730 But the problem is you can get many, many, many more false 1042 00:57:37,730 --> 00:57:43,610 alarms than you might have expected, or believed 1043 00:57:43,610 --> 00:57:46,190 you are going to get, because you're just 1044 00:57:46,190 --> 00:57:48,290 monitoring more parameters. 1045 00:57:48,290 --> 00:57:51,120 And so I want to talk a little bit about that. 1046 00:57:51,120 --> 00:57:53,540 That's even if there's no correlation 1047 00:57:53,540 --> 00:57:56,540 between your parameters-- 1048 00:57:56,540 --> 00:57:58,580 or especially if there's no correlation 1049 00:57:58,580 --> 00:58:00,320 between your parameters, you can get-- 1050 00:58:00,320 --> 00:58:05,060 just by the fact that you're monitoring more than one thing, 1051 00:58:05,060 --> 00:58:07,490 you've got to think about false alarm rate a little bit 1052 00:58:07,490 --> 00:58:08,580 differently. 1053 00:58:08,580 --> 00:58:10,130 And then the second classic problem 1054 00:58:10,130 --> 00:58:14,330 is, well, there is correlation among the parameters-- 1055 00:58:14,330 --> 00:58:15,830 the multiple parameters. 1056 00:58:15,830 --> 00:58:17,580 How do you deal with that? 1057 00:58:17,580 --> 00:58:20,420 And in fact, in this case, you often 1058 00:58:20,420 --> 00:58:23,500 run into the opposite risk. 1059 00:58:23,500 --> 00:58:25,600 Instead of too many false alarms, 1060 00:58:25,600 --> 00:58:28,810 you may have points of data or product 1061 00:58:28,810 --> 00:58:33,070 that is truly out of normal operation 1062 00:58:33,070 --> 00:58:34,990 that you may not detect. 
1063 00:58:34,990 --> 00:58:38,830 It's a highly unusual point that may 1064 00:58:38,830 --> 00:58:41,800 indicate something has changed in your process, 1065 00:58:41,800 --> 00:58:44,740 but if you're just monitoring with univariate control charts, 1066 00:58:44,740 --> 00:58:45,670 you'll never see it. 1067 00:58:48,940 --> 00:58:53,020 That is common mistake number two. 1068 00:58:53,020 --> 00:58:56,620 Multiple charts is common mistake number one, so let's take that first. 1069 00:58:56,620 --> 00:58:59,770 If I truly have independent parameters being modeled-- 1070 00:58:59,770 --> 00:59:03,250 monitored at my process step, and I set each one of them 1071 00:59:03,250 --> 00:59:07,270 up to have some acceptable alpha rate-- 1072 00:59:07,270 --> 00:59:08,440 false alarm rate-- 1073 00:59:08,440 --> 00:59:13,630 1 out of every 370 process steps going with a plus/minus 1074 00:59:13,630 --> 00:59:17,260 three-sigma-- 1075 00:59:17,260 --> 00:59:21,730 three-sigma control limits, alpha's about 0.0027-- 1076 00:59:21,730 --> 00:59:26,110 sometimes we talk about 0.003 as an approximation. 1077 00:59:26,110 --> 00:59:29,260 But now I've got p of those separate charts. 1078 00:59:29,260 --> 00:59:32,740 What is the aggregate false alarm probability? 1079 00:59:32,740 --> 00:59:35,020 Well, to be careful about it, if they are, in fact, 1080 00:59:35,020 --> 00:59:38,590 independent, that alpha prime, my aggregate false alarm rate, 1081 00:59:38,590 --> 00:59:43,010 is 1 minus (1 minus alpha) to the p. 1082 00:59:43,010 --> 00:59:47,710 This is the probability that I don't 1083 00:59:47,710 --> 00:59:52,210 get a false alarm for each of-- for all of the p control 1084 00:59:52,210 --> 00:59:55,180 charts all at the same time. 1085 00:59:55,180 --> 00:59:57,040 And then 1 minus that is the chance 1086 00:59:57,040 --> 00:59:58,960 that I did get one of them. 1087 00:59:58,960 --> 01:00:03,250 Now, for very small alpha, you do that expansion-- (1 minus x) 1088 01:00:03,250 --> 01:00:07,780 to the p for small x is simply 1 minus p x. So you do that 1089 01:00:07,780 --> 01:00:10,960 and you get sort of the nice intuitive result 1090 01:00:10,960 --> 01:00:14,740 that if I've got 10 independent control charts, 1091 01:00:14,740 --> 01:00:18,850 I've got about 10 times the rate of false alarm 1092 01:00:18,850 --> 01:00:19,990 that I would have from one. 1093 01:00:22,990 --> 01:00:23,800 So what do you do? 1094 01:00:32,560 --> 01:00:35,120 What would you do? 1095 01:00:35,120 --> 01:00:38,570 Let me say it this way. 1096 01:00:38,570 --> 01:00:42,500 You've decided, based on your cost analysis or whatever, 1097 01:00:42,500 --> 01:00:48,490 that you're willing, about once every 370 runs, 1098 01:00:48,490 --> 01:00:52,820 to have a false alarm somewhere on this process, 1099 01:00:52,820 --> 01:00:54,533 that you want to at least look at-- maybe 1100 01:00:54,533 --> 01:00:55,700 you look at it real quickly. 1101 01:00:55,700 --> 01:00:56,867 It doesn't take a long time. 1102 01:00:56,867 --> 01:01:02,090 But that's about how often you want the operator to stop 1103 01:01:02,090 --> 01:01:06,590 and take a look, on average, even when the process is 1104 01:01:06,590 --> 01:01:07,400 in control. 1105 01:01:07,400 --> 01:01:10,340 So that's your false alarm rate that you're willing to accept, 1106 01:01:10,340 --> 01:01:12,110 but you're monitoring 10 parameters. 1107 01:01:17,600 --> 01:01:19,350 [INAUDIBLE] some approaches you would use?
1108 01:01:23,145 --> 01:01:26,040 AUDIENCE: You have to increase your control limits. 1109 01:01:26,040 --> 01:01:27,700 DUANE BONING: Right, perfect-- 1110 01:01:27,700 --> 01:01:29,700 you would increase your control limits. 1111 01:01:29,700 --> 01:01:32,790 If I expand my control limits out a little bit, 1112 01:01:32,790 --> 01:01:38,850 I have smaller false alarm rate on each chart. 1113 01:01:38,850 --> 01:01:50,880 Well, I can basically decide what my aggregate false alarm 1114 01:01:50,880 --> 01:01:53,040 rate would be-- 1115 01:01:53,040 --> 01:01:54,930 1 out of 370. 1116 01:01:54,930 --> 01:01:59,640 And to set my false alarm rate on each individual chart, 1117 01:01:59,640 --> 01:02:03,420 just divide by how many charts I have. 1118 01:02:03,420 --> 01:02:10,080 Now, I might have a 1 in 3,700 false alarm rate on each chart, 1119 01:02:10,080 --> 01:02:15,180 and I use that new alpha to set my control limit. 1120 01:02:15,180 --> 01:02:19,380 They might end up being, in that case, plus minus 3.4 1121 01:02:19,380 --> 01:02:23,410 sigma control limits. 1122 01:02:23,410 --> 01:02:29,170 So the individual false alarm rate is lower on each one, 1123 01:02:29,170 --> 01:02:31,540 because really, the question you are asking-- 1124 01:02:31,540 --> 01:02:37,840 the key question was not that individual parameter, 1125 01:02:37,840 --> 01:02:40,840 but rather, the question, is my process in control? 1126 01:02:40,840 --> 01:02:48,320 In aggregate, do I have an event that looks unlikely-- 1127 01:02:48,320 --> 01:02:54,790 so unlikely that I have 1 minus alpha confidence, 1 minus 1128 01:02:54,790 --> 01:02:59,020 alpha prime confidence, that, in fact, something unusual 1129 01:02:59,020 --> 01:03:02,020 is going on in the process? 1130 01:03:02,020 --> 01:03:03,790 So that's a strategy. 1131 01:03:03,790 --> 01:03:07,180 I want to also point out-- and this is kind of a neat little 1132 01:03:07,180 --> 01:03:09,850 article that I found last year-- 1133 01:03:09,850 --> 01:03:13,390 that this is a more generic problem, 1134 01:03:13,390 --> 01:03:17,020 a more generic common mistake than just control charting. 1135 01:03:17,020 --> 01:03:21,820 It applies all the time, when people are doing significance 1136 01:03:21,820 --> 01:03:25,510 tests or hypothesis tests, but doing more than one of them 1137 01:03:25,510 --> 01:03:28,090 at a time. 1138 01:03:28,090 --> 01:03:30,890 Control charge is just a running hypothesis test. 1139 01:03:30,890 --> 01:03:33,280 Now, if I'm checking 10 hypotheses, 1140 01:03:33,280 --> 01:03:37,090 10 control charts, and 10 parameters. 1141 01:03:37,090 --> 01:03:41,320 My aggregate false alarm rate is higher. 1142 01:03:41,320 --> 01:03:43,570 And there's this neat little article 1143 01:03:43,570 --> 01:03:47,530 pulled from-- that I found in The Economist last February 1144 01:03:47,530 --> 01:03:51,490 that's called "Medical Statistics Signs of the Times." 1145 01:03:51,490 --> 01:03:53,020 The subtitle here-- you may not be 1146 01:03:53,020 --> 01:03:56,710 able to read it-- is, "Why So Much Medical Research is Rot." 1147 01:03:59,830 --> 01:04:02,200 I'll read you the first couple of sentences here. 1148 01:04:02,200 --> 01:04:05,140 People born under the astrological sign of Leo 1149 01:04:05,140 --> 01:04:07,870 are 15% more likely to be admitted 1150 01:04:07,870 --> 01:04:10,090 to hospital with gastric bleeding 1151 01:04:10,090 --> 01:04:13,240 than those born with the other 11 signs. 
1152 01:04:13,240 --> 01:04:16,990 Sagittarians are 38% more likely than others to land up 1153 01:04:16,990 --> 01:04:19,720 there because of a broken arm. 1154 01:04:19,720 --> 01:04:22,390 Those are the conclusions that many medical researchers 1155 01:04:22,390 --> 01:04:24,880 would be forced to make from a set of data presented 1156 01:04:24,880 --> 01:04:28,480 to the American Association for the Advancement of Science. 1157 01:04:28,480 --> 01:04:30,400 At least they would be forced to draw them, 1158 01:04:30,400 --> 01:04:33,370 if they applied the lax statistical methods 1159 01:04:33,370 --> 01:04:36,310 of their own work to the records of hospital admissions 1160 01:04:36,310 --> 01:04:38,840 in Ontario, Canada. 1161 01:04:38,840 --> 01:04:41,860 And the basic confusion drawn here in the red 1162 01:04:41,860 --> 01:04:44,980 arises because each result is tested separately 1163 01:04:44,980 --> 01:04:47,410 to see how likely in statistical terms 1164 01:04:47,410 --> 01:04:49,460 it was to have happen by chance. 1165 01:04:49,460 --> 01:04:51,460 If the likelihood is below a certain threshold-- 1166 01:04:51,460 --> 01:04:53,548 typically 95%-- 1167 01:04:53,548 --> 01:04:59,170 so we've got a 95% confidence, which says, 1 in 20 times, 1168 01:04:59,170 --> 01:05:02,890 by chance alone, I might see a signal that big-- 1169 01:05:02,890 --> 01:05:04,640 by chance alone. 1170 01:05:04,640 --> 01:05:06,550 And now I check 12-- 1171 01:05:06,550 --> 01:05:09,700 maybe astrological sign is your 12 things 1172 01:05:09,700 --> 01:05:12,290 I'm checking for as factors-- 1173 01:05:12,290 --> 01:05:17,680 and I've got a 1 in 20 chance, and I check 12 of them-- 1174 01:05:17,680 --> 01:05:20,200 probably I'm going to pop up and say, oh, 1175 01:05:20,200 --> 01:05:23,290 that one's significant. 1176 01:05:23,290 --> 01:05:27,190 And so that's exactly the same common mistake 1177 01:05:27,190 --> 01:05:29,920 that you can see with multiple control charts. 1178 01:05:29,920 --> 01:05:33,490 And so the point here would simply be you should-- 1179 01:05:33,490 --> 01:05:35,230 if you're checking multiple factors, 1180 01:05:35,230 --> 01:05:38,410 you got to have a stronger indication at any one factor-- 1181 01:05:38,410 --> 01:05:41,470 otherwise, you're just fishing, and by chance alone, you're 1182 01:05:41,470 --> 01:05:44,230 going to find stuff. 1183 01:05:44,230 --> 01:05:45,950 So that's kind of a neat example. 1184 01:05:45,950 --> 01:05:48,340 And so here was the point I made about how you fix it. 1185 01:05:48,340 --> 01:05:51,010 You just fix your aggregate alpha prime, 1186 01:05:51,010 --> 01:05:52,720 you have your individual alpha, and then 1187 01:05:52,720 --> 01:05:54,880 you pick your upper and lower control limits 1188 01:05:54,880 --> 01:05:57,430 based on that alpha. 1189 01:05:57,430 --> 01:06:00,340 There's a second common problem, and as I said, 1190 01:06:00,340 --> 01:06:06,010 it's the obverse of this, which is you very often assume 1191 01:06:06,010 --> 01:06:09,370 that, if I'm monitoring 10 process control parameters, 1192 01:06:09,370 --> 01:06:11,290 I assume they're all independent. 1193 01:06:11,290 --> 01:06:13,130 But very often, they are not. 
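As a quick numerical check of the fix described above (fix the aggregate alpha prime, back out a per-chart alpha, and widen each chart's limits), here is a small sketch; the 1-in-370 target and the ten charts are the lecture's running example, while the code itself is only an illustration.

from scipy.stats import norm

alpha_prime = 1 / 370   # acceptable aggregate false-alarm rate for the whole process step
p = 10                  # number of (assumed independent) parameters being charted

# Exact per-chart alpha so that 1 - (1 - alpha)**p equals alpha_prime;
# for small alpha this is essentially alpha_prime / p, the divide-by-p rule above.
alpha_each = 1 - (1 - alpha_prime) ** (1 / p)

# Two-sided limit for each individual chart, in sigmas of the plotted statistic.
k = norm.ppf(1 - alpha_each / 2)
print(alpha_each, k)    # noticeably wider than the single-chart +/- 3 sigma limits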
1194 01:06:13,130 --> 01:06:15,702 There is correlation among those parameters, 1195 01:06:15,702 --> 01:06:18,160 and I want to give you a little bit of a feel for what that 1196 01:06:18,160 --> 01:06:21,640 looks like, and what the risk is, 1197 01:06:21,640 --> 01:06:25,810 and what approaches there are for control charting, in particular, 1198 01:06:25,810 --> 01:06:27,360 for this. 1199 01:06:27,360 --> 01:06:30,670 So here's an example drawn from some data that 1200 01:06:30,670 --> 01:06:33,370 came from an LFM thesis a few years ago, 1201 01:06:33,370 --> 01:06:35,200 a body-in-white assembly. 1202 01:06:35,200 --> 01:06:39,070 So body in white-- this is when they are basically putting 1203 01:06:39,070 --> 01:06:45,390 together all of the different panels on the car and measuring 1204 01:06:45,390 --> 01:06:49,120 many, many points with all kinds of laser, 1205 01:06:49,120 --> 01:06:53,590 automated apparatus to be able to look at dimensional control 1206 01:06:53,590 --> 01:06:56,320 of these panels in the assembly-- 1207 01:06:56,320 --> 01:06:58,150 of some of these panels. 1208 01:06:58,150 --> 01:07:01,030 Other correlated data comes up all over the place-- 1209 01:07:01,030 --> 01:07:05,260 injection molding-- I think we're missing-- 1210 01:07:05,260 --> 01:07:06,850 I can't remember his name. 1211 01:07:06,850 --> 01:07:09,610 He asked a question last time about critical dimensions 1212 01:07:09,610 --> 01:07:10,810 on a semiconductor wafer. 1213 01:07:10,810 --> 01:07:14,110 Yes, they are often correlated. 1214 01:07:14,110 --> 01:07:16,070 But here's the body-in-white example, 1215 01:07:16,070 --> 01:07:18,610 and what I'm sort of showing are these little coordinate 1216 01:07:18,610 --> 01:07:22,760 measuring machine, CMM kinds of measurements. 1217 01:07:22,760 --> 01:07:26,720 I have a few of these different data points. 1218 01:07:26,720 --> 01:07:29,600 And the point is, if I think of this as a metal 1219 01:07:29,600 --> 01:07:33,320 panel in some way, and I'm measuring the position, say, 1220 01:07:33,320 --> 01:07:39,890 of these four corner points, you would expect there 1221 01:07:39,890 --> 01:07:42,710 to be correlation among them. 1222 01:07:42,710 --> 01:07:46,160 And certain kinds of distortion might change 1223 01:07:46,160 --> 01:07:48,060 relationships between them. 1224 01:07:48,060 --> 01:07:50,630 And in fact, you might even come up with other statistics, 1225 01:07:50,630 --> 01:07:53,450 like a y1, and a y3, and a y4, that 1226 01:07:53,450 --> 01:07:56,720 try to detect or look at tracking 1227 01:07:56,720 --> 01:07:59,630 of a couple of these parameters and deviations. 1228 01:07:59,630 --> 01:08:02,420 But the point is I wouldn't necessarily 1229 01:08:02,420 --> 01:08:05,090 want to monitor those four corner locations 1230 01:08:05,090 --> 01:08:09,250 as independent parameters. 1231 01:08:09,250 --> 01:08:10,250 Here's another example. 1232 01:08:10,250 --> 01:08:14,560 This comes from an ultrasonic welding set of data. 1233 01:08:14,560 --> 01:08:16,689 And you may not be able to see this. 1234 01:08:16,689 --> 01:08:21,529 This is a wonderful correlation plot of three, four, five, six, 1235 01:08:21,529 --> 01:08:25,450 seven, eight, 10 different parameters against each other. 1236 01:08:25,450 --> 01:08:29,740 There are different things, like P2 weld energy, P2 peak power. 1237 01:08:29,740 --> 01:08:31,660 These are simply scatter plots of data 1238 01:08:31,660 --> 01:08:33,300 drawn from that process.
1239 01:08:33,300 --> 01:08:36,710 You can see some of these parameters that track down-- 1240 01:08:36,710 --> 01:08:41,622 that's total weld energy versus P2 weld energy. 1241 01:08:41,622 --> 01:08:43,330 I don't even know what those things mean, 1242 01:08:43,330 --> 01:08:45,663 but they sure sound like they're going to be correlated, 1243 01:08:45,663 --> 01:08:46,670 don't they? 1244 01:08:46,670 --> 01:08:49,240 And sure enough, there's extremely strong correlation 1245 01:08:49,240 --> 01:08:50,710 in that data. 1246 01:08:50,710 --> 01:08:53,350 There's other cases where there's also 1247 01:08:53,350 --> 01:08:55,340 fairly strong correlation. 1248 01:08:55,340 --> 01:08:59,290 So again, these kinds of correlated data, 1249 01:08:59,290 --> 01:09:01,779 where knowing something about one of the measurements 1250 01:09:01,779 --> 01:09:06,939 tells you about the likelihood of the other measurement, 1251 01:09:06,939 --> 01:09:09,319 occur all the time. 1252 01:09:09,319 --> 01:09:12,550 So what if you have truly random independent variables? 1253 01:09:12,550 --> 01:09:14,600 What would a scatter plot look like, 1254 01:09:14,600 --> 01:09:17,350 and how do we go about setting the control limits? 1255 01:09:17,350 --> 01:09:18,580 That's pretty natural, right? 1256 01:09:18,580 --> 01:09:22,420 So here's an example where we had x1 and x2. 1257 01:09:22,420 --> 01:09:27,229 The scatterplot basically looks like a cloud of data-- 1258 01:09:27,229 --> 01:09:30,729 and if I added more data, you start to-- 1259 01:09:30,729 --> 01:09:32,710 depending on the scaling, if I were simply 1260 01:09:32,710 --> 01:09:38,200 to draw, say, a joint 95% confidence integral, 1261 01:09:38,200 --> 01:09:40,660 it's just kind of a cloud of data, 1262 01:09:40,660 --> 01:09:46,790 but there's no cross information between x1 and x2. 1263 01:09:46,790 --> 01:09:49,970 In other words, if I told you that x1 was right here-- 1264 01:09:49,970 --> 01:09:54,170 the value of x1 was this value-- 1265 01:09:54,170 --> 01:09:58,250 that really doesn't tell me very much about what value of x2 1266 01:09:58,250 --> 01:09:59,360 could be. 1267 01:09:59,360 --> 01:10:04,280 It still falls anywhere within its distribution. 1268 01:10:04,280 --> 01:10:07,310 These are two independent random variables. 1269 01:10:07,310 --> 01:10:09,530 And you can decide what the proper limits are. 1270 01:10:09,530 --> 01:10:13,730 Now you know enough about picking x1 and x2, 1271 01:10:13,730 --> 01:10:16,520 and I would do the alpha prime idea 1272 01:10:16,520 --> 01:10:20,510 to pick independent limits on those two cases. 1273 01:10:20,510 --> 01:10:23,030 This is an example scatterplot for 1274 01:10:23,030 --> 01:10:25,700 correlated random variables. 1275 01:10:25,700 --> 01:10:31,010 And in this case, it starts to look like an oval or a football 1276 01:10:31,010 --> 01:10:35,540 kind of shape, where there is statistical correlation 1277 01:10:35,540 --> 01:10:37,640 between these two parameters-- 1278 01:10:37,640 --> 01:10:43,760 by which I mean, if I tell you that x1 is in this range, 1279 01:10:43,760 --> 01:10:46,760 that tells you something about where the value of x2 1280 01:10:46,760 --> 01:10:48,200 is likely to fall. 1281 01:10:48,200 --> 01:10:51,380 It might not fall within its full natural range. 1282 01:10:51,380 --> 01:10:53,930 There is cross information or cross-correlation 1283 01:10:53,930 --> 01:10:55,470 between the two. 
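Here is a small simulation sketch of that cross-information idea; the unit variances and the 0.8 correlation are assumed numbers for illustration, not the welding or body-in-white data.

import numpy as np

rng = np.random.default_rng(0)

# Assumed in-control model: two parameters, unit variance, strong positive correlation.
mu = [0.0, 0.0]
cov = [[1.0, 0.8],
       [0.8, 1.0]]
x = rng.multivariate_normal(mu, cov, size=500)   # columns are x1 and x2

# Each marginal by itself looks like an ordinary N(0, 1) variable...
print(x.mean(axis=0), x.std(axis=0))
# ...but knowing x1 narrows down where x2 is likely to fall.
print(np.corrcoef(x[:, 0], x[:, 1])[0, 1])       # close to the assumed 0.8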
1284 01:10:55,470 --> 01:10:59,900 Now, if I were to plot x1 and x2 separately, 1285 01:10:59,900 --> 01:11:03,580 it looks an awful lot like the uncorrelated case. 1286 01:11:03,580 --> 01:11:08,550 So if I had independent control charts for the two variables, 1287 01:11:08,550 --> 01:11:10,010 essentially what I'm doing is just 1288 01:11:10,010 --> 01:11:13,520 projecting all of these data points. 1289 01:11:13,520 --> 01:11:19,010 I project all of these data points to get my range for x2, 1290 01:11:19,010 --> 01:11:22,010 and I project all of these data points. 1291 01:11:22,010 --> 01:11:25,730 And in fact, if I plot both of those, x1 and x2, 1292 01:11:25,730 --> 01:11:27,846 I would likely get-- 1293 01:11:27,846 --> 01:11:30,800 yikes-- don't want to do that-- 1294 01:11:30,800 --> 01:11:37,220 I would likely get two separate, normally distributed variables, 1295 01:11:37,220 --> 01:11:39,770 and I might be inclined to simply plot them 1296 01:11:39,770 --> 01:11:48,050 both as x bar and S kinds of control charts. 1297 01:11:48,050 --> 01:11:51,230 The risk here, though, when you do 1298 01:11:51,230 --> 01:11:56,180 that is you might miss outliers, data 1299 01:11:56,180 --> 01:12:02,660 points that do not fall in the normal or the natural variation 1300 01:12:02,660 --> 01:12:04,220 of the process-- 1301 01:12:04,220 --> 01:12:07,220 this being the natural variation of the process 1302 01:12:07,220 --> 01:12:10,460 that I will not detect. 1303 01:12:10,460 --> 01:12:14,340 This data point still looks like it's within that distribution. 1304 01:12:14,340 --> 01:12:17,450 In, fact, it's very close to the mean of x2. 1305 01:12:17,450 --> 01:12:22,020 And it's still within this distribution. 1306 01:12:22,020 --> 01:12:25,310 But within the natural variation of the process, 1307 01:12:25,310 --> 01:12:30,260 within that oval, its distance, in some sense, 1308 01:12:30,260 --> 01:12:34,760 from the central moments of that distribution-- 1309 01:12:34,760 --> 01:12:36,110 it's pretty distant. 1310 01:12:36,110 --> 01:12:38,990 It's not indicative of the normal operation. 1311 01:12:38,990 --> 01:12:41,660 It's a highly unlikely, by chance alone, data 1312 01:12:41,660 --> 01:12:46,510 point, and I would never see it on the control chart. 1313 01:12:46,510 --> 01:12:51,960 So there are multivariate charts that 1314 01:12:51,960 --> 01:12:57,510 let us get at and chart on a single chart-- 1315 01:12:57,510 --> 01:13:01,560 multiple factors that are correlated together-- 1316 01:13:01,560 --> 01:13:05,400 and detect those kinds of problems. 1317 01:13:05,400 --> 01:13:08,790 What I want to give you is a notion first 1318 01:13:08,790 --> 01:13:14,760 of a statistic that aggregates, in some sense-- 1319 01:13:14,760 --> 01:13:16,950 numerically aggregates exactly what 1320 01:13:16,950 --> 01:13:19,350 I was talking about qualitatively here, 1321 01:13:19,350 --> 01:13:21,720 that tells you this notion of, what 1322 01:13:21,720 --> 01:13:27,120 is the distance from the correlated variation 1323 01:13:27,120 --> 01:13:30,030 cloud of each of my measurement points? 1324 01:13:30,030 --> 01:13:33,720 And if I have a statistic and I know it's PDF, 1325 01:13:33,720 --> 01:13:37,410 then I can set control limits on that statistic. 1326 01:13:37,410 --> 01:13:40,680 And that statistic is the Hotelling T squared, 1327 01:13:40,680 --> 01:13:43,375 is one very important one. 
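A minimal sketch of that distance statistic for the case where the in-control mean vector and covariance matrix are treated as known, so the chi-squared limit discussed in a moment applies; the two-parameter model and the flagged point below are illustrative assumptions.

import numpy as np
from scipy import stats

# Assumed in-control model (illustrative): two correlated parameters.
mu = np.array([0.0, 0.0])
cov = np.array([[1.0, 0.8],
                [0.8, 1.0]])
cov_inv = np.linalg.inv(cov)

p = len(mu)
alpha = 0.0027                              # same 1-in-370 false-alarm rate as before
ucl = stats.chi2.ppf(1 - alpha, df=p)       # upper control limit on the chi-squared chart

def distance_stat(x):
    # Squared distance of x from mu, normalized by the covariance of the cloud.
    d = x - mu
    return d @ cov_inv @ d

# A point that is unremarkable on each univariate chart (both coordinates only
# 1.5 sigma from their means) but far off the correlated oval:
x_new = np.array([1.5, -1.5])
print(distance_stat(x_new), ucl, distance_stat(x_new) > ucl)   # alarms on the joint chart

With an estimated covariance matrix S in place of the true one, the same quadratic form becomes the Hotelling T squared statistic that the lecture turns to at the end.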
1328 01:13:43,375 --> 01:13:44,250 We can also monitor-- 1329 01:13:48,060 --> 01:13:51,000 have EWMA and CUSUM extensions of these, 1330 01:13:51,000 --> 01:13:53,850 which I'm not really going to talk much about. 1331 01:13:53,850 --> 01:13:57,810 So what we have to do here is have a slight extension 1332 01:13:57,810 --> 01:14:00,360 to our notion of a probability density function 1333 01:14:00,360 --> 01:14:04,133 to have joint probability density functions that tell me, 1334 01:14:04,133 --> 01:14:06,300 if I've got, say, two variables-- or three, or four, 1335 01:14:06,300 --> 01:14:09,300 whatever-- what is the probability density associated 1336 01:14:09,300 --> 01:14:10,510 with those? 1337 01:14:10,510 --> 01:14:13,110 And what we want is a single scale control 1338 01:14:13,110 --> 01:14:14,610 chart that aggregates these. 1339 01:14:17,260 --> 01:14:20,620 Now, I've got a little bit of mathematical notation 1340 01:14:20,620 --> 01:14:24,430 in here that I'm not going to go through the careful full 1341 01:14:24,430 --> 01:14:26,140 explanation or derivation of. 1342 01:14:26,140 --> 01:14:28,600 What I want to simply show you is 1343 01:14:28,600 --> 01:14:34,000 how a multivariate expression of the same variables we've 1344 01:14:34,000 --> 01:14:38,830 been using all along gives you a nice qualitative feel for what 1345 01:14:38,830 --> 01:14:40,970 this statistic looks like. 1346 01:14:40,970 --> 01:14:43,390 So if we have a vector of measurements-- 1347 01:14:43,390 --> 01:14:44,740 let's have p parameters-- 1348 01:14:44,740 --> 01:14:46,720 I'm measuring 10 different parameters-- 1349 01:14:46,720 --> 01:14:52,150 body in white or whatever it is for that particular process-- 1350 01:14:52,150 --> 01:14:56,650 and put those in an x-- underscore, an x vector. 1351 01:14:56,650 --> 01:14:59,190 And then I can define a vector of means. 1352 01:14:59,190 --> 01:15:02,590 So I have a set of means for all 10 parameters. 1353 01:15:02,590 --> 01:15:08,230 And I can also calculate-- if I have lots and lots 1354 01:15:08,230 --> 01:15:09,190 of historical data, 1355 01:15:09,190 --> 01:15:13,330 there is a true underlying covariance matrix 1356 01:15:13,330 --> 01:15:17,020 that talks about or characterizes how these data go 1357 01:15:17,020 --> 01:15:19,040 together with each other. 1358 01:15:19,040 --> 01:15:21,220 Now, I can estimate that covariance matrix 1359 01:15:21,220 --> 01:15:24,340 based on sample data, so I similarly 1360 01:15:24,340 --> 01:15:28,210 have a true mean and a true covariance. 1361 01:15:28,210 --> 01:15:35,260 Then I might estimate those with an x bar and an S-- 1362 01:15:35,260 --> 01:15:37,180 here I'm using a capital S-- 1363 01:15:37,180 --> 01:15:39,610 to be a sample covariance matrix. 1364 01:15:39,610 --> 01:15:42,430 And there are formulas in the book using 1365 01:15:42,430 --> 01:15:46,990 very simple linear-- matrix-- 1366 01:15:46,990 --> 01:15:49,960 linear algebra for how you would do that, 1367 01:15:49,960 --> 01:15:53,500 given a particular set of data. 1368 01:15:53,500 --> 01:15:57,550 And they look a lot like the univariate example. 1369 01:15:57,550 --> 01:16:00,100 But I did want to point out that there's 1370 01:16:00,100 --> 01:16:03,910 both embedded in this the notion of variance-- 1371 01:16:03,910 --> 01:16:08,200 how variable 2 varies together with variable 2-- 1372 01:16:08,200 --> 01:16:12,070 that's just our normal notion of variance.
1373 01:16:12,070 --> 01:16:14,110 And there's also covariance-- 1374 01:16:14,110 --> 01:16:18,265 that is, a measure of how these two variables move together. 1375 01:16:21,150 --> 01:16:26,540 We can actually write out what the PDFs probability density 1376 01:16:26,540 --> 01:16:28,830 functions-- are in these joint cases. 1377 01:16:28,830 --> 01:16:31,730 And if you write it in matrix form using 1378 01:16:31,730 --> 01:16:35,720 this covariance matrix and the mu vector, 1379 01:16:35,720 --> 01:16:39,930 it looks almost exactly the same as the univariate case. 1380 01:16:39,930 --> 01:16:44,840 Notice that, instead of variance in the univariate case, 1381 01:16:44,840 --> 01:16:51,230 now we have the covariance matrix. 1382 01:16:51,230 --> 01:16:58,820 And similarly, instead of just x and u being univariate values, 1383 01:16:58,820 --> 01:17:01,640 I've got those as vectors. 1384 01:17:01,640 --> 01:17:06,800 But the PDF looks almost exactly the same. 1385 01:17:06,800 --> 01:17:12,650 And qualitatively, you're getting an exponential 1386 01:17:12,650 --> 01:17:18,080 drop-off, this e drop-off with the square of the distance 1387 01:17:18,080 --> 01:17:21,080 of a point from the mean-- 1388 01:17:21,080 --> 01:17:23,130 the normalized distance from the point. 1389 01:17:23,130 --> 01:17:24,830 So if we look back at the univariate, 1390 01:17:24,830 --> 01:17:28,400 the whole idea is I subtract off the mean when I standardize, 1391 01:17:28,400 --> 01:17:31,610 divide by the natural variation, and that tells me 1392 01:17:31,610 --> 01:17:33,470 the shape of the bell curve-- 1393 01:17:33,470 --> 01:17:36,650 shape of the normal curve, how fast things drop off. 1394 01:17:36,650 --> 01:17:42,020 And I can also, for any particular multivariate u-- 1395 01:17:42,020 --> 01:17:46,880 multivariate data point x that has its 10 parameters in it-- 1396 01:17:46,880 --> 01:17:51,290 what's really cool is this formulation here 1397 01:17:51,290 --> 01:17:57,710 of the distance of that x vector from the mean point-- 1398 01:17:57,710 --> 01:18:00,740 the main vector squared-- 1399 01:18:00,740 --> 01:18:05,000 that squared distance, but weighted then or normalized 1400 01:18:05,000 --> 01:18:07,250 to the natural spread-- 1401 01:18:07,250 --> 01:18:10,370 that natural spread or width of that cloud-- 1402 01:18:10,370 --> 01:18:13,190 is just a distance measure-- 1403 01:18:13,190 --> 01:18:15,710 squared distance measure of each data 1404 01:18:15,710 --> 01:18:18,800 point from the central moment of the central point 1405 01:18:18,800 --> 01:18:22,160 of the distribution, scaled to how much natural spread 1406 01:18:22,160 --> 01:18:24,770 or scatter there is in the data. 1407 01:18:24,770 --> 01:18:26,075 OK, so that's the PDF. 1408 01:18:28,670 --> 01:18:32,450 Those are just the formulas for the sample use cases. 1409 01:18:32,450 --> 01:18:38,130 What I can now go in and do is ask the question, 1410 01:18:38,130 --> 01:18:43,400 what is the distribution associated with that distance 1411 01:18:43,400 --> 01:18:44,900 metric? 
1412 01:18:44,900 --> 01:18:48,500 And that distance metric is essentially-- 1413 01:18:48,500 --> 01:18:54,440 if I expand out that distance formula, 1414 01:18:54,440 --> 01:19:02,430 it is a sum of squared distances normalized to the correlation, 1415 01:19:02,430 --> 01:19:09,230 which starts to sound like a sum of squared unit normal 1416 01:19:09,230 --> 01:19:14,240 variables, which was our chi-squared distribution-- 1417 01:19:14,240 --> 01:19:17,790 sum of squared normals-- 1418 01:19:17,790 --> 01:19:22,490 so what's cool here is that the chi-squared distribution with p 1419 01:19:22,490 --> 01:19:23,960 parameters-- 1420 01:19:23,960 --> 01:19:25,340 p degrees of freedom-- 1421 01:19:25,340 --> 01:19:29,390 is the right statistic for measuring how likely it 1422 01:19:29,390 --> 01:19:32,870 is to see those distances-- the T squared, 1423 01:19:32,870 --> 01:19:33,890 those distance metrics. 1424 01:19:36,500 --> 01:19:39,230 So one can actually have the PDF. 1425 01:19:39,230 --> 01:19:43,370 One can have the distribution for that PDF, 1426 01:19:43,370 --> 01:19:49,610 and based on that, you can formulate or form a-- 1427 01:19:49,610 --> 01:19:53,270 that statistic based on my actual data x, 1428 01:19:53,270 --> 01:19:55,790 and then I can have an alpha 1429 01:19:55,790 --> 01:20:00,680 that I can pick my cutoff points from appropriately. 1430 01:20:00,680 --> 01:20:04,710 Oops-- I thought I had a PDF in here somewhere. 1431 01:20:04,710 --> 01:20:09,170 So when I'm making a control chart, 1432 01:20:09,170 --> 01:20:11,810 I use the same ideas that we saw before. 1433 01:20:11,810 --> 01:20:13,670 I have a false alarm rate. 1434 01:20:13,670 --> 01:20:16,130 In this case, I'm going to aggregate all my data 1435 01:20:16,130 --> 01:20:18,170 onto a single control chart so I can just 1436 01:20:18,170 --> 01:20:20,480 use my aggregate alpha. 1437 01:20:20,480 --> 01:20:26,545 And I pick the tail points on the chi-squared distribution. 1438 01:20:26,545 --> 01:20:28,670 Remember, a chi-squared might have looked something 1439 01:20:28,670 --> 01:20:35,280 like this, and I might pick way out here as my chi-squared 1440 01:20:35,280 --> 01:20:38,300 so that I've got my alpha out there in the tail. 1441 01:20:38,300 --> 01:20:44,180 That picks my cutoff point for my upper control limit. 1442 01:20:44,180 --> 01:20:50,360 Generally, you can also do that for a lower control limit. 1443 01:20:50,360 --> 01:20:53,570 Very often, if you do some-- apply some of these formulas, 1444 01:20:53,570 --> 01:20:56,360 the lower control limit is negative. 1445 01:20:56,360 --> 01:20:58,220 And you can't have a negative squared distance, 1446 01:20:58,220 --> 01:21:01,340 so you pick 0 as your lower control limit. 1447 01:21:01,340 --> 01:21:04,910 And you're really only monitoring for data points 1448 01:21:04,910 --> 01:21:07,550 that are larger than-- 1449 01:21:07,550 --> 01:21:13,610 I've got more variation in my system than I expected. 1450 01:21:13,610 --> 01:21:18,200 So just to close up on this univariate versus chi-squared 1451 01:21:18,200 --> 01:21:22,220 chart, here's an example-- 1452 01:21:22,220 --> 01:21:28,430 comparing the univariate case up here to the multivariate case 1453 01:21:28,430 --> 01:21:30,090 down here. 1454 01:21:30,090 --> 01:21:35,570 And what I'm plotting here is, again, the scatter of data. 1455 01:21:35,570 --> 01:21:38,840 You might see there's a slight shaded oval case.
1456 01:21:38,840 --> 01:21:40,610 And all the blue data points are what 1457 01:21:40,610 --> 01:21:44,840 I would do if I just projected those down to an x1 and x2, 1458 01:21:44,840 --> 01:21:47,720 and did univariate control charting. 1459 01:21:47,720 --> 01:21:52,640 If I then had this little red dot that fell outside 1460 01:21:52,640 --> 01:21:55,760 of the joint confidence interval-- 1461 01:21:55,760 --> 01:21:58,640 on my individual charts, notice where it falls-- 1462 01:21:58,640 --> 01:22:00,950 perfectly within the control limits-- 1463 01:22:00,950 --> 01:22:04,280 nothing that tells me that that point is unusual. 1464 01:22:04,280 --> 01:22:06,470 But if I then plot that data point-- 1465 01:22:06,470 --> 01:22:11,720 that red data point there and on the chi-squared chart-- 1466 01:22:11,720 --> 01:22:15,830 I calculated the chi statistic, that distance statistic-- 1467 01:22:15,830 --> 01:22:21,140 that would trigger, because its distance from the mean 1468 01:22:21,140 --> 01:22:21,935 is unusual. 1469 01:22:26,700 --> 01:22:32,460 Now, if I actually didn't know my true covariance matrix, 1470 01:22:32,460 --> 01:22:40,056 but I had to estimate it, I have to estimate sigma with an s. 1471 01:22:40,056 --> 01:22:42,420 I'm right back to this whole difference 1472 01:22:42,420 --> 01:22:45,180 between a t and a unit normal. 1473 01:22:45,180 --> 01:22:47,730 So just to close this out we often 1474 01:22:47,730 --> 01:22:51,930 talk about these as t-squared, Hotelling t-squared statistics, 1475 01:22:51,930 --> 01:22:55,660 but it's exactly the same idea. 1476 01:22:55,660 --> 01:22:58,950 And it's got its own cut-off points. 1477 01:22:58,950 --> 01:23:04,410 OK, so with that, I think we've pulled to a close on control 1478 01:23:04,410 --> 01:23:06,000 charting for the-- 1479 01:23:06,000 --> 01:23:07,860 at least for the time being. 1480 01:23:07,860 --> 01:23:11,460 Next time, we'll start diving in on yield modeling 1481 01:23:11,460 --> 01:23:13,733 and different uses of some statistics, 1482 01:23:13,733 --> 01:23:15,150 but I think what you've got now is 1483 01:23:15,150 --> 01:23:17,880 some exposure to EWMA, CUSUM. 1484 01:23:17,880 --> 01:23:21,450 You'll play around with those with the three or four problems 1485 01:23:21,450 --> 01:23:23,940 on this week's problem set. 1486 01:23:23,940 --> 01:23:25,860 And start looking at the quizzes. 1487 01:23:25,860 --> 01:23:29,370 We'll have the first quiz a week from today-- 1488 01:23:29,370 --> 01:23:31,955 in class. 1489 01:23:31,955 --> 01:23:33,580 AUDIENCE: Professor, I have a question. 1490 01:23:33,580 --> 01:23:34,020 DUANE BONING: Yeah. 1491 01:23:34,020 --> 01:23:34,860 Go ahead and break. 1492 01:23:34,860 --> 01:23:38,580 I'll take your question while people are dispersing. 1493 01:23:38,580 --> 01:23:41,120 AUDIENCE: So for the [INAUDIBLE] you taught us today, 1494 01:23:41,120 --> 01:23:42,870 are they going to be included in the quiz? 1495 01:23:45,660 --> 01:23:49,140 DUANE BONING: It's possible you'll see an EWMA or a CUSUM, 1496 01:23:49,140 --> 01:23:54,810 but they would only be fairly simple questions for those. 1497 01:23:54,810 --> 01:23:59,400 It's not likely you'll see a multivariate problem. 1498 01:23:59,400 --> 01:24:01,580 AUDIENCE: OK, thank you.