PROFESSOR: The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

So welcome back. We are now moving to a new chapter, which is going to have a little more of a statistical flavor when it comes to designing methods, all right? Because if you think about it, OK, some of you have probably attempted problem number two in the problem set. And you realized that maximum likelihood does not give you super trivial estimators, right? I mean, when you have an N(theta, theta) model, the thing you get is not something you could have guessed before you actually attempted to solve that problem. And so, in a way, we've already seen sophisticated methods. However, in many instances, the maximum likelihood estimator was just an average.
And in a way, even if we had this confirmation for maximum likelihood, that indeed that was the estimator maximum likelihood would spit out, and that our intuition was therefore pretty good, most of the statistical analysis, the use of the central limit theorem, all these things actually did not come in the building of the estimator, in the design of the estimator, but really in the analysis of the estimator. And you could say, well, if I know already that the best estimator is the average, I'm just going to use the average. I don't have to quantify how good it is. I just know it's the best I can do.

We're going to talk about tests. And we're going to talk about parametric hypothesis testing. So you should view this as follows: parametric means, well, it's about a parameter, like we did before. And hypothesis testing is on the same level as estimation. On the same level as "estimator" will be the word "test," OK? And when we devise a test, we're actually going to need to understand the random fluctuations that arise from the central limit theorem better, OK? It's not just going to be in the analysis.
It's also going to be in the design. And everything we've been doing before in understanding the behavior of an estimator is actually going to come in and be extremely useful in the actual design of tests, OK?

So as an example, I want to talk to you about some real data. I will not study this data, but this data actually exists. You can find it in R. It's the data from the so-called Credit Union Cherry Blossom Run, which is a 10 mile race. It takes place every year in D.C. It seems that some of the years are pretty nice. In 2009, there were about 15,000 participants. Pretty big race. And the average running time was 103.5 minutes, all right? So about an hour and three quarters.

And so, you can ask the following question, right? This is actual data, right? 103.5 is actually the average running time over all 15,000 runners. Now, in practice, this may not be something very suitable to compute. You might want to just sample a few runners and try to understand how they're behaving every year, without having to collect the entire data set.
And so, you could ask the question: well, let's say my budget is to ask maybe 10 runners what their running time was. I still want to be able to determine whether they were running faster in 2012 than in 2009. Why do I put 2012, and not 2016? Well, because the data set for 2012 is also available. So if you are interested and you know how to use R, just go and have fun with it.

So to answer this question, what we do is we select n runners, right? So n is a moderate number that's more manageable than 15,000. We select them from the 2012 race at random. That's where the random variable is going to come from, right? That's where we actually inject randomness into our problem.

So remember, this is an experiment. So really, in a way, the runners are the omegas. And I'm interested in measurements on those guys. So this is how I have a random variable. And this random variable here is measuring their running time. OK. If you look at the data set, there are all sorts of random variables you could measure about those random runners: country of origin, I don't know, height, age, a bunch of things. OK.
Here, the random variable of interest is the running time. OK. Everybody understands what the process is? OK.

So now I'm going to have to make some modeling assumptions. And here, I'm actually pretty lucky: I actually have all the data from a past year. I mean, this is not the data from 2012, which I also have but don't use. But I can actually use past data to try to understand what distribution I have, right? I mean, after all, running time is going to be rounded to something. Maybe I can think of it as a discrete random variable. Maybe I can think of it as an exponential random variable; those are positive numbers. I mean, there are many kinds of distributions I could think of for this modeling part. But it turns out that if you actually plot the histogram of those running times for all 15,000 runners in 2009, you are pretty happy to see that it really looks like a bell-shaped curve, which suggests that this should be a Gaussian.
So what you go on to do is you estimate the mean from past observations, which was actually 103.5, as we said. You estimate the variance, which was 373. And you just try to superimpose on the histogram a Gaussian PDF with mean 103.5 and variance 373. And you see that they actually look very much alike. And so here, you're pretty comfortable saying that the running time actually has a Gaussian distribution. All right?

So now, the x1 to xn, I'm going to say they're Gaussian, OK? I still need to specify two parameters. So what I want to know is: is the distribution the same as in past years? So I want to know, if I pick one of the random variables I'm looking at, say x1, does it have the same distribution in 2012 that it did in 2009? OK. And so, the question is: does x1 have a Gaussian distribution with mean 103.5 and variance 373? Is that clear? OK. So this question, which calls for a yes or no answer, is a hypothesis testing problem. I am testing a hypothesis.
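The fitting step just described (estimate the mean and the variance, then superimpose the Gaussian PDF on the histogram) can be sketched numerically. The actual race data is not reproduced here, so the snippet below simulates stand-in times from the fitted model; that substitution is an assumption made purely for illustration.

```python
import numpy as np

# Stand-in for the 2009 running times: the real data set ships with R,
# so here we simulate 15,000 times from the fitted N(103.5, 373).
rng = np.random.default_rng(0)
times = rng.normal(loc=103.5, scale=np.sqrt(373.0), size=15_000)

# The two moment estimates from the lecture.
mu_hat = times.mean()
var_hat = times.var()

# "Superimpose the curve" without plotting: compare the histogram's
# bin heights with the fitted Gaussian PDF at the bin centers.
def gaussian_pdf(t, mu, var):
    return np.exp(-(t - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

heights, edges = np.histogram(times, bins=40, density=True)
centers = (edges[:-1] + edges[1:]) / 2
max_gap = np.abs(heights - gaussian_pdf(centers, mu_hat, var_hat)).max()
print(round(mu_hat, 1), round(var_hat, 1), max_gap < 0.005)
```

On real data you would draw the two on the same axes; a small maximum gap between the bin heights and the fitted PDF plays the role of "they look very much alike."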
This is the basis of basically all of data-driven scientific inquiry. You ask questions. You formulate a scientific hypothesis. Knocking down this gene is going to cure melanoma: is this true? I'm going to try it. I'm going to observe some patients in whom I knock down this gene. I'm going to collect some measurements. And I'm going to try to answer this yes/no question, OK? It's different from the question: what is the mean running time for this year? OK.

So hypothesis testing is testing whether this hypothesis is true. The hypothesis in common English, as we just said, was: were runners running faster? All right? Anybody could formulate this hypothesis. Now, you go to a statistician, and he's like, oh, what you're really asking me is: does x1 have a Gaussian distribution with mean less than 103.5 and variance 373, right? That's really the question you're asking in statistical terms. And so, if you're asking whether this is the same as before, there are many ways it could fail to be the same as before.
There are basically three ways it could fail to be the same as before. It could be the case that x1 is no longer equal in expectation to 103.5, so the expectation has changed. Or the variance has changed. Or the distribution has changed. I mean, who knows? Maybe runners are now all running holding hands, and the distribution is now a point mass at one given point. OK. So you never know what could [INAUDIBLE].

Now of course, if you allow for any change, you will find change. And so what you have to do is factor in as much knowledge as you can. Make as many modeling assumptions as you can, so that you can let the data speak about your particular question. Here, your particular question is: are they running faster? So you're really only asking a question about the expectation. You really want to know if the expectation has changed. So as far as you're concerned, you're happy to make the assumption that the rest is unchanged. OK. And so, this is the question we're asking: is the expectation now less than 103.5?
Because you specifically asked whether runners were going faster this year, right? Whether they tend to go faster, rather than slower, all right? OK. So this is the question we're asking in mathematical terms.

So first, to do that, I need to basically fix the rest. And fixing the rest is actually part of the modeling assumptions. So I fix my variance to be 373, OK? I assume that the variance has not changed between 2009 and 2012. Now, this is an assumption. It turns out it's wrong: if you look at the data from 2012, this is not the correct assumption. But I'm just going to make it right now for the sake of argument, OK?

And there's also the fact that it's Gaussian. Now, this one is going to be hard to violate, right? I mean, where did this bell-shaped curve come from? Well, it's just natural when you measure a bunch of things. The central limit theorem appears in the small things of nature. I mean, that's the bedtime story you get about the central limit theorem. And that's why the bell-shaped curve is everywhere in nature.
It's the sum of the little independent things that are going on. And this Gaussian assumption, even if I wanted to relax it, there's not much else I could do. It is pretty robust across the years. All right.

So the only thing that we did not fix is the expectation of x1, which is now what I want to know. And since I don't know what it is, I'm going to call it mu. And it's going to be the variable of interest, all right? So it's just a number, mu. Whatever it is, I can try to estimate it, maybe using maximum likelihood estimation. Probably using the average, because this is Gaussian, and we know that the maximum likelihood estimator for the mean of a Gaussian is just the average. And now we only want to test whether mu is equal to 103.5, like it was in 2009, or, on the contrary, whether mu is not equal to 103.5, and more specifically, whether mu is actually strictly less than 103.5. That's the question you ask. Now, why am I writing "mu equal to 103.5 versus mu less than 103.5," and not "mu equal to 103.5 versus mu not equal to 103.5"?
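As an aside, the claim that the Gaussian maximum likelihood estimator of the mean (with known variance) is just the average can be checked numerically. The sample of running times below is made up for illustration; it is not taken from the actual 2012 data.

```python
import numpy as np

# Hypothetical sample of n = 10 running times in minutes.
x = np.array([95.2, 110.4, 101.7, 99.0, 120.3,
              88.5, 104.9, 97.6, 113.1, 100.8])
sigma2 = 373.0  # variance fixed by the modeling assumption

# Gaussian log-likelihood in mu (variance known), up to additive
# constants, evaluated on a fine grid of candidate means.
mu_grid = np.linspace(80.0, 130.0, 100_001)
log_lik = -((x[None, :] - mu_grid[:, None]) ** 2).sum(axis=1) / (2 * sigma2)

# The maximizer coincides with the sample average (up to grid spacing).
mu_mle = mu_grid[np.argmax(log_lik)]
print(mu_mle, x.mean())
```

The grid search is only a sanity check; setting the derivative of the log-likelihood to zero gives the sample average exactly.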
Why the one-sided version? Because you asked me a more precise question, so I'm going to be able to give you a more precise answer. And so, if your question is very specific, "are they running faster?", I'm going to factor that into what I write. If you just ask me, "is it the same?", I'm going to have to write "or is it different from 103.5?". And that's less information about what you're looking for, OK?

So by making all these modeling assumptions, the fact that the variance doesn't change, the fact that it's still Gaussian, I've actually reduced the "number" of ways the hypothesis can be violated. And I put "number" in quotes, because there are still infinitely many of them. But I'm limiting the number of ways the hypothesis can be violated, the number of possible alternative realities for this hypothesis, all right? For example, I'm saying there's no way mu can be larger than 103.5. I've already factored that in, OK? It could be larger. But in that case, all I'm going to be able to tell you is that it's not smaller. I'm not going to be able to tell you that it's actually larger, OK?
And the only way the hypothesis can be rejected now, the only way I can reject it, is if x belongs to a very specific family of distributions: if it has a distribution which is Gaussian with mean mu and variance 373, for some mu less than 103.5. All right? So we started with, basically, the reality: x1 follows N(103.5, 373), OK? And then there's everything else, right? So for example, here is x following some Exponential(0.1), OK? That's just another distribution. Those are all the possible distributions. What we said is: OK, first of all, let's keep only the Gaussian distributions, right? And second, we said, well, among those Gaussian distributions (with the reality maybe sitting at the boundary), let's only look at the Gaussians here. So these guys here are all the Gaussians with mean mu and variance 373, for mu less than 103.5, OK? So when you give me data, I'm going to be able to say: well, am I this guy, or am I one of those guys? Rather than searching through everything.
And the more you search, the easier it is for you to find something that fits the data better, right? And so, if I allow every possible distribution, then there's going to be something that, just by pure randomness, is actually going to look better for the data, OK?

So for example, say I draw 10 random variables, so n is equal to 10. And let's say they take 10 different values. Then it's actually more likely that those guys come from a discrete distribution that takes each of these values with probability 1 over 10 than from some Gaussian random variable, right? That fit would be perfect; I could explain the data exactly. If the numbers I got were, say, three values, 91, 95, and 102, then the most likely distribution for those guys is the discrete distribution that takes the value 91 with probability 1/3, 95 with probability 1/3, and 102 with probability 1/3, right? That's definitely the most likely distribution for this data. So if I allowed this, I would say: oh no, this is not distributed according to a Gaussian.
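This overfitting point can be illustrated with a quick computation. Formally, a discrete probability mass and a Gaussian density are not on the same footing, so take this only as a sketch of why an unrestricted family always "fits better": the empirical distribution assigns the three observations a far larger likelihood than even the best-fitting Gaussian does.

```python
import math

# The three observations from the example above.
x = [91.0, 95.0, 102.0]
n = len(x)

# Likelihood under the empirical (discrete uniform) distribution:
# each observed value gets probability mass 1/3.
emp_lik = (1.0 / n) ** n

# Best-fitting Gaussian: plug in the MLE mean and variance, then take
# the product of the densities at the data points.
mu = sum(x) / n
var = sum((xi - mu) ** 2 for xi in x) / n
gauss_lik = math.prod(
    math.exp(-(xi - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
    for xi in x
)

print(emp_lik, gauss_lik, emp_lik > gauss_lik)
```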
Instead, I would say it's distributed according to this very specific discrete distribution, which sits somewhere in the realm of all possible distributions, OK? So now we're just going to carve out all this stuff by making our assumptions. OK.

So here, in this particular example, just make a mental note that a little birdie told me that the reference number is 103.5, OK? That was the thing I was actually comparing to. In practice, it's actually seldom the case that you have such a reference to compare to, right? Here, I just happen to have the full data set of all the runners of 2009. But suppose I really just asked you: were runners faster in 2012 than in 2009? Here's $10 to perform your statistical analysis. What you're probably going to do is call maybe 10 runners from 2012, maybe 15 runners from 2009, ask them, and try to compare their means. There's no standard reference. You would not be able to come up with this 103.5, because this data may be expensive to get, or something. OK. So that is really more the standard case, all right?
Where you really compare two things with each other, but there's no actual ground truth number that you're comparing to. OK. We'll come back to that in a second; I'll tell you what the other example looks like. For now, let's just stick to this example. I tell you it's 103.5, OK?

Let's try to have our intuition work the same way as before. We said, well, averages work well. The average over these 10 guys should tell me what the mean is. So I can just say, well, x bar is going to be close to the true mean by the law of large numbers. So I'm going to check whether x bar is less than 103.5, and conclude that in that case, indeed, mu is less than 103.5, because those two quantities are close, right? I could do that. The problem is that this could go pretty wrong. Because if n is small, then I know that xn bar is not equal to mu. I know that xn bar is close to mu, but I also know that there's a pretty high chance that it's not equal to mu. In particular, I know it's going to be somewhere around 1 over root n away from mu, right?
1 over root n, with the root n coming from what? The CLT, right? That's the root n that comes from the CLT. In blunt words, the CLT tells me the sample mean is at distance about 1 over root n from the expectation. That's what it's telling me. So, 1 over root n. If I have 10 people in there, 1 over root 10 is not a huge number, right? It's like 1/3, pretty much. So a fluctuation of 1/3 around 103.5. If the true mean was actually 103.4, but my average was telling me 103.4 plus 1/3, I would actually come to two different conclusions, right?

So let's say that mu is equal to 103.4, OK? You're not supposed to know this, right? That's the hidden truth. OK. Now, I have n equal to 10. So I know that x bar n minus 103.4 is something of the order of 1 over the square root of 10, which is of the order of, say, 0.3. OK. So here, this is all hand wavy, OK? But that's what the central limit theorem tells me. What it means is that it is possible that x bar n is actually equal to 103.4 plus 0.3, which is equal to 103.7.
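The misfire risk just computed by hand can be checked by simulation. Note that the back-of-the-envelope above keeps track only of the 1 over root n rate; under the fitted model the sample mean actually fluctuates on the scale sigma over root n, which is sqrt(373/10), about 6 minutes for n = 10, so the naive rule is even shakier than the 0.3 figure suggests. The simulation below uses synthetic draws from the model.

```python
import numpy as np

rng = np.random.default_rng(1)
mu_true = 103.4         # the hidden truth: runners really are faster
sigma = np.sqrt(373.0)  # standard deviation from the fitted model
n = 10

# Draw many samples of size 10 and apply the naive rule:
# conclude "faster" iff the sample average is below 103.5.
xbars = rng.normal(mu_true, sigma, size=(100_000, n)).mean(axis=1)
misfire = np.mean(xbars >= 103.5)  # fraction reaching the wrong conclusion

print(round(float(misfire), 2))
```

With the true mean only 0.1 below the threshold and fluctuations of several minutes, the naive rule gets it wrong close to half the time, which is exactly why a buffer is needed.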
So while the truth is that mu is less than 103.5, I would conclude that mu is larger than 103.5, OK? And that's because I have not been very cautious, OK?

So what we want to do is have a little buffer to account for the fact that xn bar is not a precise value for the true mu. It's something that's about 1 over root n away from it. And so, what we want is a better heuristic that says: well, if I want to conclude that I'm less than 103.5, maybe I need to be less than 103.5 minus a little buffer, a buffer that goes to 0 as my sample size goes to infinity. That's what the law of large numbers tells me. And the central limit theorem tells me the rate: the buffer should go to 0 as n goes to infinity at the rate 1 over root n, right? That's basically what the central limit theorem tells me. So to make this intuition more precise, we need to understand those fluctuations. We need to put in something more precise than these little wiggles here, OK? We need to actually have the central limit theorem come in.

So here is the example of comparing two groups.
442 00:20:57,340 --> 00:21:00,700 So pharmaceutical companies use hypothesis 443 00:21:00,700 --> 00:21:03,167 testing to test if a drug is effective, right? 444 00:21:03,167 --> 00:21:04,000 That's what they do. 445 00:21:04,000 --> 00:21:06,170 They want to know, does my new drug work? 446 00:21:06,170 --> 00:21:09,175 And that's what the Food and Drug Administration 447 00:21:09,175 --> 00:21:11,560 is doing on a daily basis. 448 00:21:11,560 --> 00:21:18,660 They ask for extremely well-regulated clinical trials 449 00:21:18,660 --> 00:21:22,530 on a thousand people, and check, does this drug 450 00:21:22,530 --> 00:21:23,460 make a difference? 451 00:21:23,460 --> 00:21:24,900 Did everybody die? 452 00:21:24,900 --> 00:21:27,270 Does it make no difference? 453 00:21:27,270 --> 00:21:30,450 Should people pay $200 for a pill of sugar, right? 454 00:21:30,450 --> 00:21:33,030 So that's what people are actually asking. 455 00:21:33,030 --> 00:21:36,060 So to do so, of course, there is no ground truth about-- 456 00:21:36,060 --> 00:21:38,830 so there's actually a placebo effect. 457 00:21:38,830 --> 00:21:41,970 So it's not like actually giving a drug that does not work 458 00:21:41,970 --> 00:21:44,400 is going to have no effect on patients. 459 00:21:44,400 --> 00:21:47,320 It will have a small effect, but it's very hard to quantify. 460 00:21:47,320 --> 00:21:50,460 We know that it's there, but we don't know what it is. 461 00:21:50,460 --> 00:21:52,670 And so rather than saying, oh the ground truth 462 00:21:52,670 --> 00:21:56,280 is no improvement, the ground truth is the placebo effect. 463 00:21:56,280 --> 00:22:00,020 And we need to measure what the placebo effect is. 464 00:22:00,020 --> 00:22:01,730 So what we're going to do is we're 465 00:22:01,730 --> 00:22:04,730 going to split our patients into two groups. 466 00:22:04,730 --> 00:22:06,590 And there's going to be what's called a test 467 00:22:06,590 --> 00:22:10,020 group and a control group. 
468 00:22:10,020 --> 00:22:13,040 So the word test here is used in a different way 469 00:22:13,040 --> 00:22:14,220 than hypothesis testing. 470 00:22:14,220 --> 00:22:17,490 So we'll just call it typically the drug group. 471 00:22:17,490 --> 00:22:22,790 And so, I will refer to mu drug for this guy, OK? 472 00:22:22,790 --> 00:22:26,120 Now, let's say this is a cough syrup, OK? 473 00:22:26,120 --> 00:22:29,990 And when you have a cough syrup, the way 474 00:22:29,990 --> 00:22:34,520 you measure the efficacy of a cough syrup 475 00:22:34,520 --> 00:22:40,000 is to measure how many times you cough per minute, OK? 476 00:22:40,000 --> 00:22:42,740 And so, I define mu control as the number 477 00:22:42,740 --> 00:22:48,840 of expectorations per hour. 478 00:22:48,840 --> 00:22:50,500 So just the expected number, right? 479 00:22:50,500 --> 00:22:53,000 This is the number I don't know, because I don't have access 480 00:22:53,000 --> 00:22:55,430 to the entire population of people that will ever 481 00:22:55,430 --> 00:22:57,590 take this cough syrup. 482 00:22:57,590 --> 00:23:00,526 And so, I will call it mu control for the control group. 483 00:23:00,526 --> 00:23:02,900 So those are the people who have been actually given just 484 00:23:02,900 --> 00:23:05,330 like sugar, like maple syrup. 485 00:23:05,330 --> 00:23:09,440 And mu drug are those people who are given the actual syrup, OK? 486 00:23:09,440 --> 00:23:12,710 And you can imagine that maybe maple syrup will have an effect 487 00:23:12,710 --> 00:23:18,020 on expectorations per hour just because, well, it's just sweet 488 00:23:18,020 --> 00:23:19,040 and it helps, OK? 489 00:23:19,040 --> 00:23:21,290 And so, we don't know what this effect is going to be. 490 00:23:21,290 --> 00:23:24,920 We just want to measure if the drug is actually 491 00:23:24,920 --> 00:23:28,670 having a better impact on expectorations 492 00:23:28,670 --> 00:23:34,700 per hour than just pure maple syrup, OK? 
493 00:23:34,700 --> 00:23:38,385 So what we want to know is if mu drug is less than mu control. 494 00:23:38,385 --> 00:23:39,260 That would be enough. 495 00:23:39,260 --> 00:23:41,540 If we had access to all the populations 496 00:23:41,540 --> 00:23:44,390 that will ever take the syrup for all ages, 497 00:23:44,390 --> 00:23:46,760 then we would just measure, did this have an impact? 498 00:23:46,760 --> 00:23:49,880 And even if it's an ever so small impact, 499 00:23:49,880 --> 00:23:52,760 then it's good to release this cough syrup, 500 00:23:52,760 --> 00:23:55,430 assuming that it has no side effects or anything like this, 501 00:23:55,430 --> 00:23:58,370 because it's just better than maple syrup, OK? 502 00:23:58,370 --> 00:24:00,430 The problem is that we don't have access to this. 503 00:24:00,430 --> 00:24:03,110 And we're going to have to make this decision based on samples 504 00:24:03,110 --> 00:24:09,140 that give me imprecise knowledge about mu drug and mu control. 505 00:24:09,140 --> 00:24:10,870 So in this case, unlike the first case 506 00:24:10,870 --> 00:24:13,990 where we compared an unknown expected value 507 00:24:13,990 --> 00:24:17,530 to a fixed number, which was the 103.5, here, 508 00:24:17,530 --> 00:24:20,980 we're just comparing two unknown numbers with each other, OK? 509 00:24:20,980 --> 00:24:22,522 So there's two sources of randomness: 510 00:24:22,522 --> 00:24:23,896 trying to estimate the first one, 511 00:24:23,896 --> 00:24:25,492 and trying to estimate the second one. 512 00:24:29,190 --> 00:24:31,860 Before I move on, I just wanted to tell you I apologize. 513 00:24:31,860 --> 00:24:34,480 One of the graders was not able to finish grading his problem 514 00:24:34,480 --> 00:24:35,240 sets for today. 515 00:24:35,240 --> 00:24:39,390 So for those of you who are here just to pick up their homework, 516 00:24:39,390 --> 00:24:41,010 feel free to leave now. 
517 00:24:41,010 --> 00:24:45,071 Even if you have a name tag, I will pretend I did not read it. 518 00:24:45,071 --> 00:24:45,570 OK. 519 00:24:45,570 --> 00:24:47,300 So I'm sorry. 520 00:24:47,300 --> 00:24:49,970 You'll get it on Tuesday. 521 00:24:49,970 --> 00:24:53,540 And this will not happen again. 522 00:24:53,540 --> 00:24:54,510 OK. 523 00:24:54,510 --> 00:24:56,722 So for the clinical trials, now I'm 524 00:24:56,722 --> 00:24:57,930 going to collect information. 525 00:24:57,930 --> 00:25:00,138 I'm going to collect the data from the control group. 526 00:25:00,138 --> 00:25:03,420 And I'm going to collect data from the test group, all right? 527 00:25:03,420 --> 00:25:05,490 So my control group here. 528 00:25:05,490 --> 00:25:08,170 I don't have to collect the same number of people in the control 529 00:25:08,170 --> 00:25:09,600 group as in the drug group. 530 00:25:09,600 --> 00:25:12,520 Actually, for cough syrup, maybe it's not that important. 531 00:25:12,520 --> 00:25:14,790 But you can imagine that if you think 532 00:25:14,790 --> 00:25:20,160 you have the cure to a really annoying disease, 533 00:25:20,160 --> 00:25:23,910 it's actually hard to tell half of the people they 534 00:25:23,910 --> 00:25:26,460 will get a pill of nothing, OK? 535 00:25:26,460 --> 00:25:28,140 People tend to want to try the drug. 536 00:25:28,140 --> 00:25:28,942 They're desperate. 537 00:25:28,942 --> 00:25:30,900 And so, you have to have this sort of imbalance 538 00:25:30,900 --> 00:25:35,390 between who is getting the drug and who's not getting the drug. 539 00:25:35,390 --> 00:25:37,865 And people have to qualify for the clinical trials. 540 00:25:37,865 --> 00:25:39,240 There's lots of fluctuations that 541 00:25:39,240 --> 00:25:42,091 affect what the final numbers of people who are actually 542 00:25:42,091 --> 00:25:44,340 going to get the drug and who are going to get the control 543 00:25:44,340 --> 00:25:45,180 are going to be. 
544 00:25:45,180 --> 00:25:49,277 And so, it's not easy for you to make those two numbers equal. 545 00:25:49,277 --> 00:25:51,360 You'd like to have those numbers equal if you can, 546 00:25:51,360 --> 00:25:55,910 but not necessarily. 547 00:25:55,910 --> 00:25:59,270 And by the way, this is all part of some mystical science called 548 00:25:59,270 --> 00:26:00,440 "design of experiments." 549 00:26:00,440 --> 00:26:02,780 And in particular, you can imagine 550 00:26:02,780 --> 00:26:04,910 that if one of the series had higher variance, 551 00:26:04,910 --> 00:26:07,040 you would want more people in this group 552 00:26:07,040 --> 00:26:08,000 than the other group. 553 00:26:08,000 --> 00:26:10,982 Yeah? 554 00:26:10,982 --> 00:26:13,964 STUDENT: So when we're subtracting [INAUDIBLE] 555 00:26:13,964 --> 00:26:20,425 something that [INAUDIBLE] 0 [INAUDIBLE] to be satisfied. 556 00:26:20,425 --> 00:26:22,420 So that's on purpose [INAUDIBLE].. 557 00:26:22,420 --> 00:26:23,180 PROFESSOR: Yeah, that's on purpose. 558 00:26:23,180 --> 00:26:25,055 And I'll come to that in a second, all right? 559 00:26:25,055 --> 00:26:31,130 So basically, we're going to make it 560 00:26:31,130 --> 00:26:34,760 if your answer is, is this true? 561 00:26:34,760 --> 00:26:39,170 We're going to make it as hard as possible, but no harder, 562 00:26:39,170 --> 00:26:41,150 for you to say yes to this answer. 563 00:26:41,150 --> 00:26:43,230 Because, well, we'll see why. 564 00:26:45,900 --> 00:26:50,040 OK, so now we have two sets of data, the x's and the y's. 565 00:26:50,040 --> 00:26:51,810 The x's are the ones for the drug. 566 00:26:51,810 --> 00:26:55,230 And the y's are the data that I collected from the people who 567 00:26:55,230 --> 00:26:57,520 were just given a placebo, OK? 568 00:26:57,520 --> 00:26:59,280 And so, they're all IID random variables. 
569 00:26:59,280 --> 00:27:02,220 And here, since it's the number of expectorations, 570 00:27:02,220 --> 00:27:07,352 I'm making a blunt modeling assumption. 571 00:27:07,352 --> 00:27:08,810 I'm just going to say it's Poisson. 572 00:27:08,810 --> 00:27:11,990 And it's characterized only by the mean mu drug or the mean mu 573 00:27:11,990 --> 00:27:13,440 control, OK? 574 00:27:13,440 --> 00:27:15,090 I've just made an assumption here. 575 00:27:15,090 --> 00:27:16,750 It could be something different. 576 00:27:16,750 --> 00:27:19,260 But let's say it's a Poisson distribution. 577 00:27:19,260 --> 00:27:21,655 So now what I want to know is to test whether mu drug is 578 00:27:21,655 --> 00:27:22,530 less than mu control. 579 00:27:22,530 --> 00:27:23,880 We said that already. 580 00:27:23,880 --> 00:27:26,340 But the way we said it before was not as mathematical 581 00:27:26,340 --> 00:27:27,120 as it is now. 582 00:27:27,120 --> 00:27:29,520 Now we're actually making a test on the parameters 583 00:27:29,520 --> 00:27:30,540 of a Poisson distribution. 584 00:27:30,540 --> 00:27:32,610 Whereas before, we were just making tests 585 00:27:32,610 --> 00:27:36,410 on expected numbers, OK? 586 00:27:36,410 --> 00:27:39,650 So the heuristic-- again, if we try to apply the heuristic now. 587 00:27:39,650 --> 00:27:42,890 Rather than comparing x bar drug to some fixed number, 588 00:27:42,890 --> 00:27:46,306 I'm actually comparing x bar drug to some control. 589 00:27:46,306 --> 00:27:48,680 But now here, I need to have something that accounts for, 590 00:27:48,680 --> 00:27:51,200 not only the fluctuations of x bar drug, 591 00:27:51,200 --> 00:27:55,406 but also for the fluctuations of x bar control, OK? 592 00:27:55,406 --> 00:27:56,780 And so, now I need something that 593 00:27:56,780 --> 00:27:59,540 goes to 0 when both of those sample sizes go to infinity. 
594 00:27:59,540 --> 00:28:02,840 And typically, it should go to zero with 1 over root of n drug 595 00:28:02,840 --> 00:28:06,340 and 1 over square root of n control, OK? 596 00:28:06,340 --> 00:28:08,830 That's what the central limit theorems for both x bar 597 00:28:08,830 --> 00:28:11,320 drug and x bar control, 598 00:28:11,320 --> 00:28:15,261 two central limit theorems, are actually telling me. 599 00:28:15,261 --> 00:28:15,760 OK. 600 00:28:15,760 --> 00:28:17,840 And then we can conclude that this happens. 601 00:28:17,840 --> 00:28:19,570 And as you said, we're trying to make it 602 00:28:19,570 --> 00:28:21,700 a bit harder to conclude this. 603 00:28:21,700 --> 00:28:23,830 Because let's face it. 604 00:28:23,830 --> 00:28:26,800 If we were actually using too simple a heuristic, right? 605 00:28:30,114 --> 00:28:31,030 For simplicity, right? 606 00:28:35,880 --> 00:28:43,930 So I can rewrite x bar drug less than x bar control 607 00:28:43,930 --> 00:28:46,060 minus this something that goes to 0. 608 00:28:46,060 --> 00:28:54,650 I can write it as x bar drug minus x bar control less 609 00:28:54,650 --> 00:28:57,930 than something negative, OK? 610 00:28:57,930 --> 00:29:00,280 This little something, OK? 611 00:29:00,280 --> 00:29:02,990 So now let's look at those guys. 612 00:29:02,990 --> 00:29:06,470 This is the difference of two random variables. 613 00:29:06,470 --> 00:29:08,570 From the central limit theorem, they 614 00:29:08,570 --> 00:29:12,914 should be approximately Gaussian each. 615 00:29:12,914 --> 00:29:14,330 And actually, we're going to think 616 00:29:14,330 --> 00:29:15,660 of them as being independent. 617 00:29:15,660 --> 00:29:18,460 There's no reason why the people in the control group 618 00:29:18,460 --> 00:29:20,210 should have any effect on what's happening 619 00:29:20,210 --> 00:29:21,350 to the people in the test group. 620 00:29:21,350 --> 00:29:23,550 Those people probably don't even know each other. 
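Under that independence assumption, the two central limit theorems combine: the variance of the difference is the sum of the two variances. Here is a sketch, not from the lecture, with made-up numbers: a hypothetical Poisson rate of 2 expectorations per hour for both groups (the "scam" case, where the drug does nothing) and unequal group sizes.

```python
import math
import random

random.seed(1)

# Hypothetical setup: both groups cough at the same Poisson rate,
# i.e. mu_drug == mu_control and the drug has no effect.
mu = 2.0
n_drug, n_control = 120, 200  # the two groups need not be the same size

def poisson(lam):
    # Draw one Poisson(lam) variate (Knuth's method, fine for small lam).
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

xbar_drug = sum(poisson(mu) for _ in range(n_drug)) / n_drug
xbar_control = sum(poisson(mu) for _ in range(n_control)) / n_control

# For a Poisson, the variance equals the mean, so plug in the sample
# means; independent variances add when taking a difference.
se = math.sqrt(xbar_drug / n_drug + xbar_control / n_control)
z = (xbar_drug - xbar_control) / se
print(f"standardized difference: {z:.2f}")
```

When the two rates really are equal, this standardized difference behaves approximately like a standard Gaussian, which is exactly the centered-at-zero picture drawn next.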
621 00:29:23,550 --> 00:29:27,056 And so, when I look at this, this should look like a Gaussian 622 00:29:27,056 --> 00:29:28,430 with some mean and some variance; 623 00:29:28,430 --> 00:29:30,590 let's say I don't know what it is, OK? 624 00:29:30,590 --> 00:29:31,940 The mean I actually know. 625 00:29:31,940 --> 00:29:37,250 It's mu drug minus mu control, OK? 626 00:29:37,250 --> 00:29:39,950 So if I were to plot the PDF of this guy, 627 00:29:39,950 --> 00:29:41,210 it would look like this. 628 00:29:41,210 --> 00:29:42,890 I would have something which is centered 629 00:29:42,890 --> 00:29:45,590 at mu drug minus mu control. 630 00:29:48,280 --> 00:29:51,880 And it would look like this, OK? 631 00:29:51,880 --> 00:29:55,780 Now let's say that mu drug is actually equal to mu control. 632 00:29:55,780 --> 00:29:59,810 That this pharmaceutical company is a huge scam. 633 00:29:59,810 --> 00:30:04,810 And they really are trying to sell bottled corn 634 00:30:04,810 --> 00:30:07,580 syrup for $200 a pop, OK? 635 00:30:07,580 --> 00:30:08,980 So this is a huge scam. 636 00:30:08,980 --> 00:30:12,010 And the true difference is actually equal to 0. 637 00:30:12,010 --> 00:30:15,670 So this thing is really centered about 0, OK? 638 00:30:15,670 --> 00:30:18,760 Now, if we were not to do this, then basically, half 639 00:30:18,760 --> 00:30:20,890 of the time I would actually come up 640 00:30:20,890 --> 00:30:22,900 with a difference that's above this value. 641 00:30:22,900 --> 00:30:24,370 And half of the time I would have something that's 642 00:30:24,370 --> 00:30:27,070 below this value, which would mean that half of the scams 643 00:30:27,070 --> 00:30:31,040 would actually go through the FDA if I did not do this. 644 00:30:31,040 --> 00:30:33,340 So what I'm trying to do is to say, well, OK. 
645 00:30:33,340 --> 00:30:35,830 You have to be here, so that there is actually 646 00:30:35,830 --> 00:30:37,930 a very low probability that just by chance 647 00:30:37,930 --> 00:30:40,120 you end up being here. 648 00:30:40,120 --> 00:30:42,520 And we'll make all the statements extremely precise 649 00:30:42,520 --> 00:30:43,780 later on. 650 00:30:43,780 --> 00:30:46,060 But I think the drug thing makes it 651 00:30:46,060 --> 00:30:49,330 interesting to see why you're making it hard, 652 00:30:49,330 --> 00:30:51,220 because you don't want to allow people 653 00:30:51,220 --> 00:30:52,420 to sell a thing like that. 654 00:30:55,830 --> 00:30:58,620 Before we go more into the statistical thinking associated 655 00:30:58,620 --> 00:31:01,050 to tests, let's just see how we would 656 00:31:01,050 --> 00:31:02,460 do this quantification, right? 657 00:31:02,460 --> 00:31:04,410 I mean after all, this is what we probably 658 00:31:04,410 --> 00:31:07,590 are the most comfortable with at this point. 659 00:31:07,590 --> 00:31:10,650 So let's just try to understand this. 660 00:31:10,650 --> 00:31:16,060 And I'm going to make the statistician's favorite test, 661 00:31:16,060 --> 00:31:19,700 which is the thing that obviously you do at home all 662 00:31:19,700 --> 00:31:21,450 the time every time you get a new quarter, 663 00:31:21,450 --> 00:31:23,740 is testing whether it's a fair coin or not. 664 00:31:23,740 --> 00:31:24,240 All right? 665 00:31:24,240 --> 00:31:27,840 So this test, of course, exists only in textbooks. 666 00:31:27,840 --> 00:31:30,780 And I actually did not write this slide. 667 00:31:30,780 --> 00:31:32,670 I was too lazy to just replace all this stuff 668 00:31:32,670 --> 00:31:37,410 with the Cherry Blossom Run. 669 00:31:37,410 --> 00:31:38,440 So you have a coin. 670 00:31:38,440 --> 00:31:42,330 Now you have 80 observations, x1 to x80. 671 00:31:42,330 --> 00:31:45,420 So n is equal to 80. 
672 00:31:45,420 --> 00:31:53,820 I have x1 through xn, IID Bernoulli p. 673 00:31:53,820 --> 00:31:55,570 And I want to know if I have a fair coin. 674 00:31:55,570 --> 00:31:57,330 So in mathematical language, I want 675 00:31:57,330 --> 00:32:00,380 to know if p is equal to 1/2. 676 00:32:04,800 --> 00:32:07,680 Let's say this is just the heads, OK? 677 00:32:07,680 --> 00:32:09,030 And a biased coin? 678 00:32:09,030 --> 00:32:10,536 Well, maybe you would potentially 679 00:32:10,536 --> 00:32:11,910 be interested in whether it's biased 680 00:32:11,910 --> 00:32:13,034 in one direction or the other. 681 00:32:13,034 --> 00:32:15,900 But not being a fair coin is already somewhat 682 00:32:15,900 --> 00:32:17,360 of a discovery, OK? 683 00:32:17,360 --> 00:32:20,520 And so, you just want to know whether p is equal to 1/2 684 00:32:20,520 --> 00:32:25,280 or p is not equal to 1/2, OK? 685 00:32:25,280 --> 00:32:29,890 Now, if I were to apply the very naive first heuristic, 686 00:32:29,890 --> 00:32:32,080 to not reject this hypothesis 687 00:32:32,080 --> 00:32:35,710 when I run this thing 80 times, I would need 688 00:32:35,710 --> 00:32:40,360 to see exactly 40 heads and 40 tails. 689 00:32:40,360 --> 00:32:43,960 Now this is very unlikely to happen exactly. 690 00:32:43,960 --> 00:32:47,200 You're going to have close to 40 heads and close to 40 tails, 691 00:32:47,200 --> 00:32:49,510 but how close should those things be? 692 00:32:49,510 --> 00:32:50,310 OK? 693 00:32:50,310 --> 00:32:52,540 And so, the little something is going 694 00:32:52,540 --> 00:32:55,300 to be quantified by exactly this, OK? 695 00:32:55,300 --> 00:33:06,490 So now here, let's say that my experiment gave me 54 heads. 696 00:33:06,490 --> 00:33:07,541 That's 54? 697 00:33:07,541 --> 00:33:08,040 Yeah. 698 00:33:10,870 --> 00:33:21,100 Which means that my xn bar is 54 over 80, which is 0.68. 699 00:33:21,100 --> 00:33:21,970 All right? 700 00:33:21,970 --> 00:33:24,580 So I have this estimator. 
701 00:33:24,580 --> 00:33:26,620 Looks pretty large, right? 702 00:33:26,620 --> 00:33:29,950 It's much larger than 0.5, so it does look like, 703 00:33:29,950 --> 00:33:32,080 and my mom would certainly conclude, 704 00:33:32,080 --> 00:33:34,240 that this is a biased coin for sure, 705 00:33:34,240 --> 00:33:35,870 because she thinks I'm tricky. 706 00:33:35,870 --> 00:33:36,370 All right. 707 00:33:36,370 --> 00:33:40,110 So the question is, can this be due to chance? 708 00:33:40,110 --> 00:33:42,050 Can this be due to chance alone? 709 00:33:42,050 --> 00:33:45,490 Like what is the likelihood that a fair coin would actually 710 00:33:45,490 --> 00:33:51,800 end up being 54 times on heads rather than 40? 711 00:33:51,800 --> 00:33:52,900 OK? 712 00:33:52,900 --> 00:33:55,540 And so, what we do is we say, OK, I 713 00:33:55,540 --> 00:33:58,370 need to understand, what is the distribution of the number 714 00:33:58,370 --> 00:33:59,732 of times it comes up heads? 715 00:33:59,732 --> 00:34:01,190 And this is going to be a binomial, 716 00:34:01,190 --> 00:34:02,984 but it's a little annoying to play with. 717 00:34:02,984 --> 00:34:05,150 So we're going to use the central limit theorem that 718 00:34:05,150 --> 00:34:10,790 tells me that square root of n times xn bar minus p divided 719 00:34:10,790 --> 00:34:15,940 by square root of p 1 minus p is approximately distributed 720 00:34:15,940 --> 00:34:17,300 as an n 0,1. 721 00:34:17,300 --> 00:34:18,944 And here, since n is equal to 80, 722 00:34:18,944 --> 00:34:21,110 I'm pretty safe that this is actually going to work. 723 00:34:28,420 --> 00:34:33,300 And I can actually use Slutsky, 724 00:34:33,300 --> 00:34:34,510 and put xn bar here. 725 00:34:38,050 --> 00:34:40,980 Slutsky tells me that this is OK to do. 726 00:34:40,980 --> 00:34:41,480 All right. 727 00:34:41,480 --> 00:34:43,350 So now I'm actually going to compute this. 728 00:34:43,350 --> 00:34:44,870 So here, I know this. 
729 00:34:44,870 --> 00:34:46,280 This is square root of 80. 730 00:34:46,280 --> 00:34:48,860 This is a 0.68. 731 00:34:48,860 --> 00:34:50,210 What is this value here? 732 00:34:50,210 --> 00:34:51,179 We'll talk about it. 733 00:34:51,179 --> 00:34:53,090 Well, we're trying to understand what happens 734 00:34:53,090 --> 00:34:55,610 if it is a fair coin, right? 735 00:34:55,610 --> 00:35:02,220 So if fair, then p is equal to 0.5, right? 736 00:35:02,220 --> 00:35:05,160 So what I want to know is, what is the likelihood 737 00:35:05,160 --> 00:35:09,240 that a fair coin would give me 0.68? 738 00:35:09,240 --> 00:35:10,560 Let me finish. 739 00:35:10,560 --> 00:35:11,850 All right. 740 00:35:11,850 --> 00:35:14,190 What is the likelihood that a fair coin will 741 00:35:14,190 --> 00:35:17,470 allow me to do this, so I'm actually allowed to plug in p 742 00:35:17,470 --> 00:35:19,530 to be 0.5 here? 743 00:35:19,530 --> 00:35:25,000 Now, your question is, why do I not plug in p to be 0.5? 744 00:35:25,000 --> 00:35:25,640 But you can. 745 00:35:25,640 --> 00:35:26,140 All right. 746 00:35:26,140 --> 00:35:29,024 I just want to make you plug in p at one specific point, 747 00:35:29,024 --> 00:35:30,190 but you're absolutely right. 748 00:35:34,040 --> 00:35:34,540 OK. 749 00:35:34,540 --> 00:35:37,270 Let's forget about your question for one second. 750 00:35:37,270 --> 00:35:41,590 So now I'm going to have to look at square root of n times xn bar minus 0.5, divided 751 00:35:41,590 --> 00:35:45,300 by the square root of xn bar 1 minus xn bar. 752 00:35:45,300 --> 00:35:51,130 Then this thing is approximately Gaussian, n 0,1, 753 00:35:51,130 --> 00:35:52,270 if the coin is fair. 754 00:35:56,150 --> 00:36:01,120 Otherwise, I'm going to have a mean which is not zero here. 755 00:36:01,120 --> 00:36:04,730 If the coin is something else, whatever I get here, right? 756 00:36:07,772 --> 00:36:09,230 Let's just write it for one second. 757 00:36:23,520 --> 00:36:25,010 Let's do it. 
758 00:36:25,010 --> 00:36:27,510 So what is the distribution of this if p-- 759 00:36:27,510 --> 00:36:33,380 so that's p is equal to 0.5. 760 00:36:33,380 --> 00:36:33,930 OK? 761 00:36:33,930 --> 00:36:39,910 Now if p is equal to 0.6, then this thing is just, well, 762 00:36:39,910 --> 00:36:43,660 I know that this is equal to square root of n times xn 763 00:36:43,660 --> 00:36:52,140 bar minus 0.6, divided by the square root of xn bar 1 764 00:36:52,140 --> 00:36:55,740 minus xn bar, plus-- 765 00:36:55,740 --> 00:36:57,050 well, now the difference, 766 00:36:57,050 --> 00:37:00,710 which is square root of n times 0.6 minus 767 00:37:00,710 --> 00:37:07,520 0.5, divided by square root of xn bar 1 minus xn bar, right? 768 00:37:07,520 --> 00:37:13,790 Now if p is equal to 0.6, then this guy is n 0,1, 769 00:37:13,790 --> 00:37:17,950 but this guy is something different. 770 00:37:17,950 --> 00:37:22,155 This is just a number that depends on square root of n. 771 00:37:22,155 --> 00:37:24,130 It's actually pretty large. 772 00:37:24,130 --> 00:37:28,620 So if I want to use the fact that this guy has 773 00:37:28,620 --> 00:37:33,650 a normal distribution, I need to plug in the true value here. 774 00:37:33,650 --> 00:37:38,630 Now, the implicit question that I got was the following. 775 00:37:38,630 --> 00:37:43,630 It says, well, if you know what p is, then what's 776 00:37:43,630 --> 00:37:46,600 actually true is also this. 777 00:37:46,600 --> 00:37:51,450 If p is equal to 0.5, then since I 778 00:37:51,450 --> 00:37:57,180 know that root n xn bar minus p divided by square root of p 1 779 00:37:57,180 --> 00:38:01,590 minus p is some n 0, 1, it's also true 780 00:38:01,590 --> 00:38:06,570 that square root of n xn bar minus 0.5 781 00:38:06,570 --> 00:38:14,940 divided by square root of 0.5 1 minus 0.5 is n 0,1, right? 782 00:38:14,940 --> 00:38:15,750 I know what p is. 783 00:38:15,750 --> 00:38:18,620 I'm just going to make it appear. 
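That decomposition is nothing more than adding and subtracting 0.6 in the numerator. A quick numeric check, using the lecture's n = 80 and the rounded x bar = 0.68:

```python
import math

n, xbar = 80, 0.68
sd = math.sqrt(xbar * (1 - xbar))  # plug-in estimate of the standard deviation

# The statistic centered at 0.5 splits into a centered piece plus a drift.
lhs = math.sqrt(n) * (xbar - 0.5) / sd
centered = math.sqrt(n) * (xbar - 0.6) / sd  # approximately N(0,1) if p were 0.6
shift = math.sqrt(n) * (0.6 - 0.5) / sd      # deterministic drift, grows like sqrt(n)

# The identity holds exactly, term by term.
assert abs(lhs - (centered + shift)) < 1e-12
print(f"statistic = {lhs:.2f} = {centered:.2f} + {shift:.2f}")
```

The drift term here is already about 1.92 at n = 80, and it keeps growing like the square root of n, which is why using the wrong center destroys the standard Gaussian approximation.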
784 00:38:18,620 --> 00:38:19,460 OK. 785 00:38:19,460 --> 00:38:22,100 And so, what's actually nice about this particular 786 00:38:22,100 --> 00:38:27,460 [INAUDIBLE] experiment is that I can check if my assumption is 787 00:38:27,460 --> 00:38:31,386 valid by checking whether I'm actually-- 788 00:38:31,386 --> 00:38:32,760 so what I'm going to do right now 789 00:38:32,760 --> 00:38:36,697 is check whether this is likely to be a Gaussian or not, right? 790 00:38:36,697 --> 00:38:38,280 And there's two ways I can violate it. 791 00:38:38,280 --> 00:38:42,496 By violating the mean, but also by violating the variance. 792 00:38:42,496 --> 00:38:44,120 And here, what I did in the first case, 793 00:38:44,120 --> 00:38:46,340 I said, well, I'm not allowing you to check whether you've 794 00:38:46,340 --> 00:38:47,040 violated the variance. 795 00:38:47,040 --> 00:38:49,340 I'm just plugging in whatever variance you're getting. 796 00:38:49,340 --> 00:38:51,110 Whereas here, I'm saying, well, there's 797 00:38:51,110 --> 00:38:52,460 two ways you can violate it. 798 00:38:52,460 --> 00:38:55,430 And I'm just going to factor everything in. 799 00:38:55,430 --> 00:38:58,700 So now I can plug in this number. 800 00:38:58,700 --> 00:39:00,146 So this is 80. 801 00:39:00,146 --> 00:39:02,440 This is 0.68. 802 00:39:02,440 --> 00:39:04,210 So I can compute all this stuff. 803 00:39:04,210 --> 00:39:06,910 I can compute all this stuff here as well. 804 00:39:06,910 --> 00:39:10,210 And what I get in this case, if I plug the xn bar in, 805 00:39:10,210 --> 00:39:15,160 I get 3.45, OK? 806 00:39:15,160 --> 00:39:17,120 And now I claim that this makes it 807 00:39:17,120 --> 00:39:19,430 reasonable to reject the hypothesis that p 808 00:39:19,430 --> 00:39:21,790 is equal to 0.5. 809 00:39:21,790 --> 00:39:22,900 Can somebody tell me why? 810 00:39:27,819 --> 00:39:28,860 STUDENT: It's pretty big. 811 00:39:28,860 --> 00:39:30,235 PROFESSOR: Yeah, 3 is pretty big. 
812 00:39:30,235 --> 00:39:31,510 So it's very unlikely. 813 00:39:31,510 --> 00:39:33,730 So this number that I should see should 814 00:39:33,730 --> 00:39:39,460 look like the number I would get if I asked a computer to draw 815 00:39:39,460 --> 00:39:42,960 one random Gaussian for me. 816 00:39:42,960 --> 00:39:45,250 This number, when I draw one random Gaussian, 817 00:39:45,250 --> 00:39:49,510 is actually a number that with 99.7% probability will 818 00:39:49,510 --> 00:39:52,700 be between negative 3 and 3. 819 00:39:52,700 --> 00:39:55,540 With 95% it's going to be between negative 2 and 2. 820 00:40:01,990 --> 00:40:04,410 68% is between minus 1 and 1. 821 00:40:04,410 --> 00:40:07,560 And with like 90% it's between minus 1.65 and 1.65. 822 00:40:07,560 --> 00:40:10,860 So getting a 3.45 when you do this 823 00:40:10,860 --> 00:40:13,260 is extremely unlikely to happen, which 824 00:40:13,260 --> 00:40:17,260 means that you would have to be extremely unlucky for this 825 00:40:17,260 --> 00:40:17,890 to ever happen. 826 00:40:17,890 --> 00:40:19,750 Now, it can happen, right? 827 00:40:19,750 --> 00:40:25,000 It could be the case that you flip 80 coins and 80 of them 828 00:40:25,000 --> 00:40:27,505 are heads. 829 00:40:27,505 --> 00:40:29,130 With what probability does this happen? 830 00:40:32,680 --> 00:40:34,390 1 over 2 to the 80, right? 831 00:40:34,390 --> 00:40:39,890 You'd probably be better off playing the lottery 832 00:40:39,890 --> 00:40:41,140 with this kind of odds, right? 833 00:40:41,140 --> 00:40:43,840 I mean, this is just not going to happen, but it might happen. 834 00:40:43,840 --> 00:40:48,490 So we cannot remove completely the uncertainty, right? 835 00:40:48,490 --> 00:40:53,250 It's still possible that this is due to noise. 836 00:40:53,250 --> 00:40:55,640 But we're just trying to make all the cases that 837 00:40:55,640 --> 00:40:58,340 are very unlikely go away, OK? 
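To put a number on "extremely unlikely", here is a sketch computing the coin statistic and the two-sided Gaussian tail probability beyond it, using only the standard library. Note that with the exact x bar = 54/80 = 0.675, rather than the lecture's rounded 0.68, the statistic comes out near 3.34 instead of 3.45; the conclusion is the same.

```python
import math

n, heads = 80, 54
xbar = heads / n  # 0.675 exactly; the lecture rounds this to 0.68

# Standardized statistic under the fair-coin hypothesis p = 0.5,
# with the plug-in variance xbar * (1 - xbar).
z = math.sqrt(n) * (xbar - 0.5) / math.sqrt(xbar * (1 - xbar))

# P(|N(0,1)| > |z|) = erfc(|z| / sqrt(2)): the chance a fair coin
# looks at least this extreme, in either direction.
two_sided_tail = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f}, two-sided tail probability = {two_sided_tail:.4f}")
```

The tail probability lands below one in a thousand, which is the quantitative version of "too far in the tails" in the picture that follows.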
838 00:40:58,340 --> 00:41:03,350 And so, now I claim that 3.45 is very unlikely for a Gaussian. 839 00:41:03,350 --> 00:41:07,700 So if I were to draw the PDF of a standard Gaussian, right? 840 00:41:07,700 --> 00:41:09,320 So n 0, 1, right? 841 00:41:09,320 --> 00:41:12,880 So that's the PDF of n 0, 1. 842 00:41:16,740 --> 00:41:21,900 3.45 is basically here, OK? 843 00:41:21,900 --> 00:41:25,260 So it's just too far in the tails. 844 00:41:25,260 --> 00:41:26,030 Understood? 845 00:41:26,030 --> 00:41:30,320 Now I cannot say that the probability that the Gaussian 846 00:41:30,320 --> 00:41:33,810 is equal to 3.45 is small, right? 847 00:41:33,810 --> 00:41:35,660 I just cannot say that, because it's 0. 848 00:41:35,660 --> 00:41:37,770 And it's also 0 for the probability that it's 0, 849 00:41:37,770 --> 00:41:41,060 even though the most likely values are around 0. 850 00:41:41,060 --> 00:41:44,236 It's a continuous random variable. 851 00:41:44,236 --> 00:41:45,610 Any value you give me, it's going 852 00:41:45,610 --> 00:41:47,720 to happen with probability zero. 853 00:41:47,720 --> 00:41:51,190 So what we're going to say is, well, the fluctuations 854 00:41:51,190 --> 00:41:52,780 are larger than this number. 855 00:41:52,780 --> 00:41:55,030 The probability that I get anything worse 856 00:41:55,030 --> 00:41:57,040 than this is actually extremely small, right? 857 00:41:57,040 --> 00:42:00,970 Anything worse than this is just like farther than 3.45. 858 00:42:00,970 --> 00:42:03,710 And this is going to be what we control. 859 00:42:03,710 --> 00:42:04,300 All right? 860 00:42:04,300 --> 00:42:06,550 So in this case, I claim that it's quite reasonable 861 00:42:06,550 --> 00:42:07,740 to reject the hypothesis. 862 00:42:07,740 --> 00:42:10,830 Is everybody OK with this? 863 00:42:10,830 --> 00:42:12,440 Does everybody find this shocking? 864 00:42:12,440 --> 00:42:14,290 Or does everybody have no idea what's going on? 
865 00:42:14,290 --> 00:42:16,422 Do you have any questions? 866 00:42:16,422 --> 00:42:17,410 Yeah? 867 00:42:17,410 --> 00:42:19,386 STUDENT: Regarding the case of p, where 868 00:42:19,386 --> 00:42:21,362 minus p isn't close to xn. 869 00:42:21,362 --> 00:42:24,820 If you use 1 minus p as 0.5, then you're 870 00:42:24,820 --> 00:42:28,772 dividing by a larger number than you would if you used xn. 871 00:42:28,772 --> 00:42:32,230 So it feels like our true number is not 3.45. 872 00:42:32,230 --> 00:42:34,700 It's something a little bit smaller 873 00:42:34,700 --> 00:42:39,146 than 3.45 for the distribution to actually be like 1/2. 874 00:42:39,146 --> 00:42:40,628 Because it seems like we're adding 875 00:42:40,628 --> 00:42:43,574 an unnecessary extra error by using xn bar. 876 00:42:43,574 --> 00:42:45,074 And we're adding an error that makes 877 00:42:45,074 --> 00:42:50,014 it seem that our result was less likely than it actually was. 878 00:43:00,450 --> 00:43:03,490 PROFESSOR: That's correct. 879 00:43:03,490 --> 00:43:05,790 And you're right. 880 00:43:05,790 --> 00:43:07,540 I didn't want to plug in the p everywhere, 881 00:43:07,540 --> 00:43:09,670 but you should plug it in everywhere you can. 882 00:43:09,670 --> 00:43:11,010 That's for sure, OK? 883 00:43:11,010 --> 00:43:12,205 So let's agree on that. 884 00:43:12,205 --> 00:43:15,022 And that's true that it makes the number a little bigger. 885 00:43:15,022 --> 00:43:16,480 Let's compute how much we would get 886 00:43:16,480 --> 00:43:18,000 if we used 0.5 there. 887 00:43:20,760 --> 00:43:23,040 Well, I don't know what the square root of 80 is. 888 00:43:23,040 --> 00:43:26,370 Can somebody compute quickly? 889 00:43:26,370 --> 00:43:27,600 I'm not asking you to do it. 890 00:43:27,600 --> 00:43:46,150 But what I want is two times square root of 80 times 0.18. 891 00:43:46,150 --> 00:43:48,850 3.22. 892 00:43:48,850 --> 00:43:49,970 OK. 
893 00:43:49,970 --> 00:43:55,462 I can make the same cartoon picture with 3.22. 894 00:43:55,462 --> 00:43:56,170 But you're right. 895 00:43:56,170 --> 00:43:57,370 This is definitely more accurate. 896 00:43:57,370 --> 00:43:58,536 And I should have done this. 897 00:43:58,536 --> 00:44:02,350 I didn't want to make the message confusing, OK? 898 00:44:02,350 --> 00:44:02,850 All right. 899 00:44:02,850 --> 00:44:07,770 So now here's a second example that you can think of. 900 00:44:07,770 --> 00:44:11,520 So now I toss it 30 times. 901 00:44:11,520 --> 00:44:17,310 Still in the realm of the central limit theorem. 902 00:44:17,310 --> 00:44:23,910 I get 13 heads rather than 15. 903 00:44:23,910 --> 00:44:27,127 So I'm actually much closer to being exactly at half. 904 00:44:27,127 --> 00:44:28,710 So let's see if this is actually going 905 00:44:28,710 --> 00:44:29,918 to give me a plausible value. 906 00:44:32,580 --> 00:44:34,830 So I get 0.43 on average. 907 00:44:34,830 --> 00:44:40,950 If the truth was 0.5, I would get something like 0.77. 908 00:44:40,950 --> 00:44:44,700 And now I claim that 0.77 is a plausible realization 909 00:44:44,700 --> 00:44:46,950 for some standard Gaussian, OK? 910 00:44:46,950 --> 00:44:49,620 Now, 0.77 is going to look like it's here. 911 00:44:55,130 --> 00:44:57,830 So that could very well be something that just 912 00:44:57,830 --> 00:44:59,420 comes because of randomness. 913 00:44:59,420 --> 00:45:01,860 And again, think about it. 914 00:45:01,860 --> 00:45:06,740 If I told you you were expecting 15, and you saw 13, 915 00:45:06,740 --> 00:45:09,479 you're happy to put that on the account of randomness. 916 00:45:09,479 --> 00:45:11,270 Now of course, the question is going to be, 917 00:45:11,270 --> 00:45:12,770 where do I draw the line? 918 00:45:12,770 --> 00:45:13,760 Right? 919 00:45:13,760 --> 00:45:15,740 Is 12 the right number? 920 00:45:15,740 --> 00:45:16,840 Is 11? 921 00:45:16,840 --> 00:45:17,570 Is 10?
922 00:45:17,570 --> 00:45:18,070 What is it? 923 00:45:21,690 --> 00:45:24,770 So basically, the answer is it's whatever you want it to be. 924 00:45:24,770 --> 00:45:28,250 The problem is it's hard to think on this scale, right? 925 00:45:28,250 --> 00:45:30,332 What does it mean to think on this scale? 926 00:45:30,332 --> 00:45:31,790 If I can't think on this scale, I'm 927 00:45:31,790 --> 00:45:33,950 going to have to think on the scale of 80 of them. 928 00:45:33,950 --> 00:45:38,390 I'm going to have to think on the scale of running 100 coin 929 00:45:38,390 --> 00:45:39,170 flips. 930 00:45:39,170 --> 00:45:43,004 And so, this scale is a moving target all the time. 931 00:45:43,004 --> 00:45:44,420 Every time you have a new problem, 932 00:45:44,420 --> 00:45:45,961 you have to have a new scale in mind. 933 00:45:45,961 --> 00:45:47,660 And it's very difficult. 934 00:45:47,660 --> 00:45:50,840 The purpose of statistical analysis, 935 00:45:50,840 --> 00:45:53,810 and in particular this process 936 00:45:53,810 --> 00:45:55,670 that takes your x bar and turns it 937 00:45:55,670 --> 00:45:58,530 into something that should be standard Gaussian, 938 00:45:58,530 --> 00:46:01,010 is that it allows you to map the value of x bar 939 00:46:01,010 --> 00:46:06,350 into a scale that is the standard scale of the Gaussian. 940 00:46:06,350 --> 00:46:07,070 All right? 941 00:46:07,070 --> 00:46:09,050 Now, all you need to have in mind 942 00:46:09,050 --> 00:46:13,220 is, what is a large number or an unusually large number 943 00:46:13,220 --> 00:46:14,014 for a Gaussian? 944 00:46:14,014 --> 00:46:15,180 That's all you need to know. 945 00:46:18,880 --> 00:46:21,760 So here, by the way, 0.77 is not this one, 946 00:46:21,760 --> 00:46:26,860 because it was actually negative 0.77. 947 00:46:26,860 --> 00:46:28,660 So this one. 948 00:46:28,660 --> 00:46:29,160 OK. 949 00:46:29,160 --> 00:46:34,460 So I can be on the right or I can be on the left of zero.
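The standardization step just described, mapping the 30-toss example onto the Gaussian scale, can be sketched as follows. This is my own Python illustration, not from the lecture; the plug-in denominator sqrt(x̄(1 − x̄)) follows the formula discussed earlier:

```python
from math import sqrt

def standardized(heads, n, p0=0.5):
    """sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar)): approximately
    standard Gaussian under the null p = p0, for large n."""
    xbar = heads / n
    return sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))

t = standardized(13, 30)
print(round(t, 2))  # a small negative value, well inside the bulk of N(0, 1)
```

13 heads out of 30 lands well within the plausible range of a standard Gaussian, while the value from the earlier example was far out in the tails.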
950 00:46:34,460 --> 00:46:36,030 But they are still plausible. 951 00:46:36,030 --> 00:46:40,000 So understand, you could actually have in mind 952 00:46:40,000 --> 00:46:42,040 all the values that are plausible for a Gaussian 953 00:46:42,040 --> 00:46:43,415 and those that are not plausible, 954 00:46:43,415 --> 00:46:46,190 and draw the line based on what you think is the right number. 955 00:46:46,190 --> 00:46:49,690 So how large should a positive value of a Gaussian be 956 00:46:49,690 --> 00:46:52,990 before it becomes unreasonable for you? 957 00:46:52,990 --> 00:46:54,530 Is it 1? 958 00:46:54,530 --> 00:46:56,190 Is it 1.5? 959 00:46:56,190 --> 00:46:56,819 Is it 2? 960 00:46:56,819 --> 00:46:57,860 Stop me when I get there. 961 00:46:57,860 --> 00:46:59,900 Is it 2.5? 962 00:46:59,900 --> 00:47:00,557 Is it 3? 963 00:47:00,557 --> 00:47:02,348 STUDENT: I think 2.5 is definitely too big. 964 00:47:02,348 --> 00:47:03,014 PROFESSOR: What? 965 00:47:03,014 --> 00:47:04,808 STUDENT: Doesn't it depend on our prior? 966 00:47:04,808 --> 00:47:06,776 Let's say we already have really good evidence 967 00:47:06,776 --> 00:47:09,864 at this point [INAUDIBLE] 968 00:47:09,864 --> 00:47:12,030 PROFESSOR: Yeah, so this is not Bayesian statistics. 969 00:47:12,030 --> 00:47:14,160 So there's no such thing as a prior right now. 970 00:47:14,160 --> 00:47:15,150 We'll get there. 971 00:47:15,150 --> 00:47:18,360 You'll have your moment during one short chapter. 972 00:47:23,510 --> 00:47:25,580 So there's no prior here, right? 973 00:47:25,580 --> 00:47:27,380 It's really a matter of whether you think 974 00:47:27,380 --> 00:47:28,820 a Gaussian value is large or not. 975 00:47:28,820 --> 00:47:30,230 It's not a matter of coins. 976 00:47:30,230 --> 00:47:31,530 It's not a matter of anything. 977 00:47:31,530 --> 00:47:33,950 Now I've just reduced it to just one question. 978 00:47:33,950 --> 00:47:36,350 So forget about everything we just said.
979 00:47:36,350 --> 00:47:38,240 And I'm asking you, when do you decide 980 00:47:38,240 --> 00:47:43,010 that a number is too large to be reasonably drawn 981 00:47:43,010 --> 00:47:44,790 from a Gaussian? 982 00:47:44,790 --> 00:47:50,190 And this number is 2 or 1.96. 983 00:47:50,190 --> 00:47:53,010 And that's basically the number that you get from this quantile. 984 00:47:53,010 --> 00:47:55,560 We've seen the 1.96 before, right? 985 00:47:55,560 --> 00:47:59,920 It's actually q alpha over 2, where alpha is equal to 5%. 986 00:47:59,920 --> 00:48:01,620 That's a quantile of a Gaussian. 987 00:48:01,620 --> 00:48:05,130 So actually, what we do is we map it again. 988 00:48:05,130 --> 00:48:06,300 So we are now at the Gaussians. 989 00:48:06,300 --> 00:48:08,380 And then we map it again into some probabilities, 990 00:48:08,380 --> 00:48:10,796 which is the probability of being farther than this thing. 991 00:48:10,796 --> 00:48:12,460 And now probabilities, we can think. 992 00:48:12,460 --> 00:48:15,060 Probability is something that quantifies my error. 993 00:48:15,060 --> 00:48:17,220 And the question is what percentage of error 994 00:48:17,220 --> 00:48:18,300 am I willing to tolerate. 995 00:48:18,300 --> 00:48:20,310 And if I tell you 5%, that's something 996 00:48:20,310 --> 00:48:21,570 you can really envision. 997 00:48:21,570 --> 00:48:24,750 What it means is that if I were to do this test a million 998 00:48:24,750 --> 00:48:28,380 times, 5% of the time I would expose myself 999 00:48:28,380 --> 00:48:30,400 to making a mistake. 1000 00:48:30,400 --> 00:48:30,900 All right. 1001 00:48:30,900 --> 00:48:31,950 That's all it would say. 1002 00:48:31,950 --> 00:48:36,790 If you said, well, I don't want to account for 5%, 1003 00:48:36,790 --> 00:48:42,280 maybe I want 1%, then you have to move from 1.96 to 2.58.
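The mapping from a tolerated error level alpha to the Gaussian threshold q alpha over 2 can be sketched with the standard library's inverse CDF. This is my own illustration, not from the lecture:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian N(0, 1)

def threshold(alpha):
    """Two-sided threshold q_{alpha/2}: P(|Z| > q) = alpha for Z ~ N(0, 1)."""
    return Z.inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01, 0.0001):
    print(f"{alpha:>7}: {threshold(alpha):.2f}")
# 5% gives 1.96, 1% gives about 2.58, and 0.01% pushes past 3.8
```

This is exactly why stating 1%, 5%, 10% is easier than remembering 1.96, 2.58, and so on: the percentage is the thing you can envision, and the threshold is just its image under the inverse CDF.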
1004 00:48:42,280 --> 00:48:44,710 And then if you say I want 0.01%, 1005 00:48:44,710 --> 00:48:47,180 then you have to move to an even larger number. 1006 00:48:47,180 --> 00:48:48,100 So it depends. 1007 00:48:48,100 --> 00:48:51,610 But stating this number 1%, 5%, 10% 1008 00:48:51,610 --> 00:48:57,380 is much easier than seeing those numbers 1.96, 2.58, et cetera. 1009 00:48:57,380 --> 00:49:00,370 So we're just putting everything back on the same scale. 1010 00:49:00,370 --> 00:49:01,480 All right. 1011 00:49:01,480 --> 00:49:03,190 To conclude, this, again, as we said, 1012 00:49:03,190 --> 00:49:05,720 does not suggest that the coin is unfair. 1013 00:49:05,720 --> 00:49:08,260 Now, it might be that the coin is unfair. 1014 00:49:08,260 --> 00:49:10,330 We just don't have enough evidence to say that. 1015 00:49:10,330 --> 00:49:12,580 And that goes back to your question about, 1016 00:49:12,580 --> 00:49:17,420 why are we siding with the fact that we're 1017 00:49:17,420 --> 00:49:22,086 making it harder to conclude that the runners were faster? 1018 00:49:22,086 --> 00:49:23,210 And this is the same thing. 1019 00:49:23,210 --> 00:49:24,830 We're making it harder to conclude 1020 00:49:24,830 --> 00:49:26,540 that the coin is biased. 1021 00:49:26,540 --> 00:49:28,700 Because there is a status quo. 1022 00:49:28,700 --> 00:49:30,500 And we're trying to see if we have evidence 1023 00:49:30,500 --> 00:49:31,910 against the status quo. 1024 00:49:31,910 --> 00:49:35,380 The status quo for the runners is they ran the same speed. 1025 00:49:35,380 --> 00:49:37,880 The status quo for the coin, we can probably all agree, 1026 00:49:37,880 --> 00:49:39,560 is that the coin is fair. 1027 00:49:39,560 --> 00:49:41,150 The status quo for a drug? 1028 00:49:41,150 --> 00:49:43,940 I mean, again, unless you prove to me 1029 00:49:43,940 --> 00:49:45,920 that you're actually not a scammer, 1030 00:49:45,920 --> 00:49:48,959 the status quo is that this is maple syrup.
1031 00:49:48,959 --> 00:49:50,000 There's nothing in there. 1032 00:49:50,000 --> 00:49:51,060 Why would you? 1033 00:49:51,060 --> 00:49:53,060 I mean, if I let you get away with it, 1034 00:49:53,060 --> 00:49:55,810 you would put corn syrup. 1035 00:49:55,810 --> 00:49:58,720 It's cheaper. 1036 00:49:58,720 --> 00:49:59,840 OK. 1037 00:49:59,840 --> 00:50:01,171 So now let's move on to math. 1038 00:50:01,171 --> 00:50:01,670 All right. 1039 00:50:01,670 --> 00:50:04,460 So when I start doing mathematics, 1040 00:50:04,460 --> 00:50:06,690 I'm going to have to talk about random variables 1041 00:50:06,690 --> 00:50:08,270 and statistical models. 1042 00:50:08,270 --> 00:50:13,110 And here, there is actually a very simple thing, 1043 00:50:13,110 --> 00:50:15,800 which actually goes back to this picture. 1044 00:50:18,950 --> 00:50:27,230 A test is really asking me if my parameter 1045 00:50:27,230 --> 00:50:28,850 is in some region of the parameter set 1046 00:50:28,850 --> 00:50:30,766 or another region of the parameter set, right? 1047 00:50:30,766 --> 00:50:32,000 Yes/no. 1048 00:50:32,000 --> 00:50:37,490 And so, what I'm going to be given is a sample, x1, ..., xn. 1049 00:50:37,490 --> 00:50:38,120 I have a model. 1050 00:50:41,830 --> 00:50:46,460 And again, those can be braces depending on the day. 1051 00:50:46,460 --> 00:50:54,670 And so, now I'm going to give myself theta 0 and theta 1, 1052 00:50:54,670 --> 00:50:55,660 two disjoint subsets. 1053 00:50:58,590 --> 00:51:01,170 OK. 1054 00:51:01,170 --> 00:51:02,840 So capital theta here is the space 1055 00:51:02,840 --> 00:51:05,200 in which my parameter can live. 1056 00:51:05,200 --> 00:51:06,950 To make two disjoint subsets, I could just 1057 00:51:06,950 --> 00:51:11,356 split this guy in half, right? 1058 00:51:11,356 --> 00:51:13,730 I'm going to say, well, maybe it's this guy and this guy. 1059 00:51:13,730 --> 00:51:14,230 OK. 1060 00:51:14,230 --> 00:51:16,250 So this is theta 0.
1061 00:51:16,250 --> 00:51:18,830 And this is theta 1. 1062 00:51:18,830 --> 00:51:22,840 What it means when I split those two guys is that, in a test, 1063 00:51:22,840 --> 00:51:25,510 I'm actually going to focus only on theta 0 or theta 1. 1064 00:51:25,510 --> 00:51:28,010 And so, it means that a priori I've already 1065 00:51:28,010 --> 00:51:32,240 removed all the possibilities of theta being in this region. 1066 00:51:32,240 --> 00:51:33,540 What does it mean? 1067 00:51:33,540 --> 00:51:37,910 Go back to the example of runners. 1068 00:51:37,910 --> 00:51:41,180 This region here for the Cherry Blossom Run 1069 00:51:41,180 --> 00:51:44,420 is the set of parameters, where mu was larger 1070 00:51:44,420 --> 00:51:47,354 than 103.5, right? 1071 00:51:47,354 --> 00:51:48,020 We removed that. 1072 00:51:48,020 --> 00:51:49,880 We didn't even consider this possibility. 1073 00:51:49,880 --> 00:51:52,820 We said either it's less-- 1074 00:51:52,820 --> 00:51:53,320 sorry. 1075 00:51:53,320 --> 00:51:55,860 That's mu equal to 103.5. 1076 00:51:55,860 --> 00:51:59,770 And this was mu less than 103.5, OK? 1077 00:51:59,770 --> 00:52:03,950 But these guys were like if it happens, it happens. 1078 00:52:03,950 --> 00:52:06,980 I'm not making any statement about that case. 1079 00:52:06,980 --> 00:52:07,520 All right? 1080 00:52:07,520 --> 00:52:09,565 So now I take those two subsets. 1081 00:52:09,565 --> 00:52:11,690 And now I'm going to give them two different names, 1082 00:52:11,690 --> 00:52:15,050 because they're going to have an asymmetric role. 1083 00:52:15,050 --> 00:52:18,710 h0 is the null hypothesis. 1084 00:52:18,710 --> 00:52:23,630 And h1 is the alternative hypothesis. 1085 00:52:23,630 --> 00:52:27,030 h0 is the status quo. 1086 00:52:27,030 --> 00:52:29,560 h1 is what is typically considered 1087 00:52:29,560 --> 00:52:32,140 a scientific discovery. 1088 00:52:32,140 --> 00:52:36,550 So if you're a regulator, you're going to push towards h0.
1089 00:52:36,550 --> 00:52:39,832 If you're a scientist, you're going to push towards h1. 1090 00:52:39,832 --> 00:52:41,290 If you're a pharmaceutical company, 1091 00:52:41,290 --> 00:52:42,990 you're going to push towards h1. 1092 00:52:42,990 --> 00:52:43,900 OK? 1093 00:52:43,900 --> 00:52:47,410 And so, depending on whether you want to be conservative-- oh, 1094 00:52:47,410 --> 00:52:49,272 I can find evidence in a lot of data. 1095 00:52:49,272 --> 00:52:50,980 As soon as you give me three data points, 1096 00:52:50,980 --> 00:52:52,563 I'm going to be able to find evidence. 1097 00:52:52,563 --> 00:52:55,000 That means I'm going to tend to say, oh, it's h1. 1098 00:52:55,000 --> 00:52:58,010 But if you say you need a lot of data 1099 00:52:58,010 --> 00:53:00,260 before you can actually move away from the status quo, 1100 00:53:00,260 --> 00:53:01,870 that's h0, OK? 1101 00:53:01,870 --> 00:53:03,940 So think of h0 as being status quo, 1102 00:53:03,940 --> 00:53:08,480 h1 being some discovery that goes against the status quo. 1103 00:53:08,480 --> 00:53:08,980 All right? 1104 00:53:08,980 --> 00:53:12,310 So if we believe that the true theta is 1105 00:53:12,310 --> 00:53:17,330 in one of those, what we say is we want to test h0 against h1. 1106 00:53:17,330 --> 00:53:17,830 OK. 1107 00:53:17,830 --> 00:53:19,790 This is actually wording. 1108 00:53:19,790 --> 00:53:22,360 So remember, because this is how your questions are 1109 00:53:22,360 --> 00:53:23,770 going to be formulated. 1110 00:53:23,770 --> 00:53:26,320 And this is how you want to probably communicate 1111 00:53:26,320 --> 00:53:27,730 as a statistician. 1112 00:53:27,730 --> 00:53:29,347 So you're going to say I have the null 1113 00:53:29,347 --> 00:53:30,430 and I have an alternative. 1114 00:53:30,430 --> 00:53:32,830 I want to test h0 against h1.
1115 00:53:32,830 --> 00:53:34,480 I want to test the null hypothesis 1116 00:53:34,480 --> 00:53:36,220 against the alternative hypothesis, OK? 1117 00:53:39,540 --> 00:53:42,590 Now, the two hypotheses I forgot to say are actually this. 1118 00:53:42,590 --> 00:53:46,710 h0 is that theta belongs to theta 0. 1119 00:53:46,710 --> 00:53:50,911 And h1 is that theta belongs to theta 1. 1120 00:53:50,911 --> 00:53:51,410 OK. 1121 00:53:51,410 --> 00:53:53,280 So here, for example, theta was mu. 1122 00:53:53,280 --> 00:53:57,320 And that was mu equal to 103.5. 1123 00:53:57,320 --> 00:54:01,530 And this was mu less than 103.5. 1124 00:54:01,530 --> 00:54:02,480 OK? 1125 00:54:02,480 --> 00:54:06,360 So typically, they're not going to look like thetas and things 1126 00:54:06,360 --> 00:54:06,860 like that. 1127 00:54:06,860 --> 00:54:09,140 They're going to look like very simple things, where you take 1128 00:54:09,140 --> 00:54:11,180 your usual notation for your usual parameter 1129 00:54:11,180 --> 00:54:15,060 and you just say in mathematical terms what relationship this 1130 00:54:15,060 --> 00:54:16,910 should be satisfying, right? 1131 00:54:16,910 --> 00:54:18,320 For example, in the drug example, 1132 00:54:18,320 --> 00:54:25,150 that would be mu drug is equal to mu control. 1133 00:54:25,150 --> 00:54:30,380 And here, that would be mu drug less than mu control. 1134 00:54:30,380 --> 00:54:34,150 The number of expectorations for people 1135 00:54:34,150 --> 00:54:35,800 who take the drug for the cough syrup 1136 00:54:35,800 --> 00:54:38,300 is less than the number of expectorations of people 1137 00:54:38,300 --> 00:54:42,990 who take the corn syrup, OK? 1138 00:54:42,990 --> 00:54:45,090 So now, what do we want to do? 1139 00:54:47,890 --> 00:54:51,060 We've set up our hypothesis testing problem. 1140 00:54:51,060 --> 00:54:52,260 You're a scientist. 1141 00:54:52,260 --> 00:54:55,020 You've set up your problem.
1142 00:54:55,020 --> 00:54:58,170 Now what you're going to do is collect data. 1143 00:54:58,170 --> 00:55:00,660 And what you're going to try to find in this data 1144 00:55:00,660 --> 00:55:04,050 is evidence against h0. 1145 00:55:04,050 --> 00:55:06,060 And the alternative is going to guide you 1146 00:55:06,060 --> 00:55:08,010 into which direction you should be looking 1147 00:55:08,010 --> 00:55:10,200 for evidence against this guy. 1148 00:55:10,200 --> 00:55:11,290 All right? 1149 00:55:11,290 --> 00:55:13,820 And so, of course, the narrower the alternative, 1150 00:55:13,820 --> 00:55:15,570 the easier it is for you, because you just 1151 00:55:15,570 --> 00:55:19,410 have to look at the one possible candidate, right? 1152 00:55:19,410 --> 00:55:22,650 But typically, h1 is a big group, like less than. 1153 00:55:22,650 --> 00:55:27,990 Nobody tells you it's either 103.5 or 103. 1154 00:55:27,990 --> 00:55:32,460 People tell you it's either 103.5 or less than 103.5. 1155 00:55:32,460 --> 00:55:33,280 OK. 1156 00:55:33,280 --> 00:55:37,330 And so, what we want to do is to decide whether we reject h0. 1157 00:55:37,330 --> 00:55:40,480 So we look for evidence against h0 in the data, OK? 1158 00:55:44,320 --> 00:55:48,880 So as I said, h0 and h1 do not play a symmetric role. 1159 00:55:48,880 --> 00:55:51,700 It's very important to know which one you're 1160 00:55:51,700 --> 00:55:53,465 going to place as h0 and which one you're 1161 00:55:53,465 --> 00:55:54,340 going to place as h1. 1162 00:55:59,260 --> 00:56:01,480 If it's a close call, you're always 1163 00:56:01,480 --> 00:56:04,000 going to side with h0, OK? 1164 00:56:04,000 --> 00:56:05,860 So you have to be careful about those. 1165 00:56:05,860 --> 00:56:07,609 You have to keep in mind that if it's 1166 00:56:07,609 --> 00:56:10,410 a close call, if data does not carry a lot of evidence, 1167 00:56:10,410 --> 00:56:12,280 you're going to side with h0.
1168 00:56:12,280 --> 00:56:15,700 And so, you're actually never saying that h0 is true. 1169 00:56:15,700 --> 00:56:18,430 You're just saying I did not find evidence against h0. 1170 00:56:18,430 --> 00:56:21,400 You don't say I accept h0. 1171 00:56:21,400 --> 00:56:25,590 You say I failed to reject h0. 1172 00:56:25,590 --> 00:56:26,090 OK. 1173 00:56:26,090 --> 00:56:28,189 And so one of the things that you 1174 00:56:28,189 --> 00:56:29,980 want to keep in mind when you're doing this 1175 00:56:29,980 --> 00:56:32,960 is this innocent until proven guilty. 1176 00:56:32,960 --> 00:56:37,010 So if you come from a country, like America, 1177 00:56:37,010 --> 00:56:38,090 there's such a thing. 1178 00:56:38,090 --> 00:56:41,270 And in particular, lack of evidence 1179 00:56:41,270 --> 00:56:45,410 does not mean that you are not guilty, all right? 1180 00:56:45,410 --> 00:56:47,720 OJ Simpson was found not guilty. 1181 00:56:47,720 --> 00:56:50,270 He was not found innocent, OK? 1182 00:56:50,270 --> 00:56:52,100 And so, what basically happens 1183 00:56:52,100 --> 00:56:55,760 is that the prosecutor brings their evidence. 1184 00:56:55,760 --> 00:56:58,940 And then the jury has to decide whether they 1185 00:56:58,940 --> 00:57:07,400 were convinced that this person was guilty of anything. 1186 00:57:07,400 --> 00:57:11,820 And the question is, do you have enough evidence? 1187 00:57:11,820 --> 00:57:13,550 But if you don't have evidence, it's 1188 00:57:13,550 --> 00:57:17,257 not the burden of the defendant to prove that they're innocent. 1189 00:57:17,257 --> 00:57:18,590 Nobody's proving they're innocent. 1190 00:57:18,590 --> 00:57:20,120 I mean, sometimes it helps. 1191 00:57:20,120 --> 00:57:22,160 But you just have to make sure that there's not 1192 00:57:22,160 --> 00:57:24,560 enough evidence against you, OK? 1193 00:57:24,560 --> 00:57:26,370 And that's basically what it's doing. 1194 00:57:26,370 --> 00:57:28,980 You're h0 until proven h1.
1195 00:57:31,529 --> 00:57:32,820 So how are we going to do this? 1196 00:57:32,820 --> 00:57:37,200 Well, as I said, the role of estimators 1197 00:57:37,200 --> 00:57:40,460 in hypothesis testing is played by something called tests. 1198 00:57:40,460 --> 00:57:42,300 And a test is a statistic. 1199 00:57:42,300 --> 00:57:44,340 Can somebody remind me what a statistic is? 1200 00:57:47,190 --> 00:57:48,140 Yep? 1201 00:57:48,140 --> 00:57:50,050 STUDENT: The measure [INAUDIBLE] 1202 00:57:50,050 --> 00:57:53,290 PROFESSOR: Yeah, that's actually just one step more. 1203 00:57:53,290 --> 00:57:54,940 So it's a function of the observations. 1204 00:57:54,940 --> 00:57:56,734 And we require it to be measurable. 1205 00:57:56,734 --> 00:57:58,150 And as a rule of thumb, measurable 1206 00:57:58,150 --> 00:58:00,567 means if I give you data, you can actually compute it, OK? 1207 00:58:00,567 --> 00:58:02,649 If you don't see a [INAUDIBLE] or an [INAUDIBLE], 1208 00:58:02,649 --> 00:58:04,140 you don't have to think about it. 1209 00:58:04,140 --> 00:58:04,780 All right. 1210 00:58:08,950 --> 00:58:11,590 And so, what we do is we just have this test. 1211 00:58:11,590 --> 00:58:14,590 But now I'm actually asking only from this test 1212 00:58:14,590 --> 00:58:18,280 a yes/no answer, which I can code as 0, 1, right? 1213 00:58:18,280 --> 00:58:21,640 So as a rule of thumb, you say that, well, if the test 1214 00:58:21,640 --> 00:58:23,140 is equal to 0, then h0. 1215 00:58:23,140 --> 00:58:25,490 If the test is equal to 1, then h1. 1216 00:58:25,490 --> 00:58:27,604 And as we said, if the test is equal to 0, 1217 00:58:27,604 --> 00:58:29,020 it doesn't mean that h0 is true. 1218 00:58:29,020 --> 00:58:31,060 It means that I failed to reject h0. 1219 00:58:31,060 --> 00:58:33,390 And if the test is equal to 1, I reject h0. 1220 00:58:36,510 --> 00:58:38,750 So I have two possibilities. 1221 00:58:38,750 --> 00:58:39,640 I look at my data.
1222 00:58:39,640 --> 00:58:41,800 I turn it into a yes/no answer. 1223 00:58:41,800 --> 00:58:45,310 And the yes/no answer is really h0 or h1, OK? 1224 00:58:45,310 --> 00:58:49,260 Which one is the most likely, basically. 1225 00:58:49,260 --> 00:58:50,720 All right. 1226 00:58:50,720 --> 00:58:57,530 So in the coin flip example, our test statistic 1227 00:58:57,530 --> 00:59:00,850 is actually something that takes values 0, 1. 1228 00:59:00,850 --> 00:59:04,600 And anything, any function that takes values 0, 1229 00:59:04,600 --> 00:59:07,030 1 is an indicator function, OK? 1230 00:59:07,030 --> 00:59:11,507 So an indicator function is just a function. 1231 00:59:11,507 --> 00:59:13,090 So there's many ways you can write it. 1232 00:59:18,760 --> 00:59:20,440 So it's a 1 with a double bar. 1233 00:59:20,440 --> 00:59:21,940 If you aren't comfortable with this, 1234 00:59:21,940 --> 00:59:27,650 it's totally OK to write i of something, like i of a. 1235 00:59:27,650 --> 00:59:28,534 OK. 1236 00:59:28,534 --> 00:59:29,200 And that's what? 1237 00:59:29,200 --> 00:59:34,270 So a, here, is a statement, like an inequality, an equality, 1238 00:59:34,270 --> 00:59:38,600 some mathematical statement, OK? 1239 00:59:38,600 --> 00:59:39,700 Or not mathematical. 1240 00:59:39,700 --> 00:59:43,430 I mean, "a" can be, you know, my grandma is 20 years old, OK? 1241 00:59:43,430 --> 00:59:50,390 And so, this is basically 1 if a is true, and 0 if a is false. 1242 00:59:54,510 --> 00:59:56,260 That's the way you want to think about it. 1243 01:00:02,840 --> 01:00:05,420 This function takes only two values, and that's it. 1244 01:00:10,290 --> 01:00:12,080 So here's the example that we had.
1245 01:00:12,080 --> 01:00:17,330 We looked at whether the standardized xn 1246 01:00:17,330 --> 01:00:20,220 bar, the one that actually is approximately N(0, 1), 1247 01:00:20,220 --> 01:00:22,590 was larger than something in absolute value, 1248 01:00:22,590 --> 01:00:27,550 either very large or very small, but negative. 1249 01:00:27,550 --> 01:00:29,020 I'm going back to this picture. 1250 01:00:29,020 --> 01:00:31,270 We wanted to know if this guy was 1251 01:00:31,270 --> 01:00:35,660 either to the left of something or to the right of something, 1252 01:00:35,660 --> 01:00:36,160 right? 1253 01:00:36,160 --> 01:00:37,642 Was it in these regions? 1254 01:00:42,352 --> 01:00:49,250 Now this indicator, I can view this as a function of x bar. 1255 01:00:49,250 --> 01:00:52,100 What it does, it really splits the possible values 1256 01:00:52,100 --> 01:00:54,500 of x bar, which is just a real number, 1257 01:00:54,500 --> 01:00:56,180 into two groups. 1258 01:00:56,180 --> 01:00:59,030 The group on which they lead to a value which is 1. 1259 01:00:59,030 --> 01:01:00,590 And the group on which they lead 1260 01:01:00,590 --> 01:01:02,270 to a value which is 0, right? 1261 01:01:02,270 --> 01:01:05,420 So what it does is that I can actually think 1262 01:01:05,420 --> 01:01:09,140 of it as the real line, x bar. 1263 01:01:09,140 --> 01:01:13,010 And there's basically some values here, 1264 01:01:13,010 --> 01:01:14,927 where I'm going to get a 1. 1265 01:01:14,927 --> 01:01:16,260 Maybe I'm going to get a 0 here. 1266 01:01:16,260 --> 01:01:17,670 Maybe I'm going to get a 0. 1267 01:01:17,670 --> 01:01:18,840 Maybe I'm going to get a 1. 1268 01:01:18,840 --> 01:01:22,470 I'm just splitting all possible values of x bar. 1269 01:01:22,470 --> 01:01:25,350 And I see whether it spits out the side which is 0 1270 01:01:25,350 --> 01:01:26,880 or which is 1. 1271 01:01:26,880 --> 01:01:29,471 In this case, it's not clear, right?
1272 01:01:29,471 --> 01:01:31,095 I mean, the function is very nonlinear. 1273 01:01:31,095 --> 01:01:34,530 It's x bar minus 0.5 divided by the square root of x bar times 1 1274 01:01:34,530 --> 01:01:35,532 minus x bar. 1275 01:01:35,532 --> 01:01:36,990 If we put the p in the denominator, 1276 01:01:36,990 --> 01:01:38,070 that would be clear. 1277 01:01:38,070 --> 01:01:40,530 That would just be exactly something that looks like this. 1278 01:01:45,427 --> 01:01:46,760 The function would be like this. 1279 01:01:46,760 --> 01:01:49,370 It would be 1 if it's smaller than some value. 1280 01:01:49,370 --> 01:01:52,580 Then 0 if it's in between two values. 1281 01:01:52,580 --> 01:01:54,220 And then 1 again. 1282 01:01:54,220 --> 01:02:00,730 So that's psi, OK? 1283 01:02:00,730 --> 01:02:02,470 So this is 1, right? 1284 01:02:02,470 --> 01:02:03,130 This is 1. 1285 01:02:03,130 --> 01:02:04,540 And this is 0. 1286 01:02:04,540 --> 01:02:07,390 So if x bar is too small or if x bar is too large, 1287 01:02:07,390 --> 01:02:09,590 then I'm getting a value 1. 1288 01:02:09,590 --> 01:02:12,640 But if it's somewhere in between, I'm getting a value 0. 1289 01:02:12,640 --> 01:02:14,410 Now, if I have this weird function, 1290 01:02:14,410 --> 01:02:18,090 it's not clear how this happened. 1291 01:02:18,090 --> 01:02:20,434 So the picture here that I get is 1292 01:02:20,434 --> 01:02:27,080 that I have a weird non-linear function, right? 1293 01:02:27,080 --> 01:02:28,420 So that's x bar. 1294 01:02:28,420 --> 01:02:32,270 That's square root of n times x bar n minus 0.5, 1295 01:02:32,270 --> 01:02:36,980 divided by the square root of x bar n times 1 minus x bar n, right? 1296 01:02:36,980 --> 01:02:38,210 That's this function. 1297 01:02:38,210 --> 01:02:40,890 A priori, I have no idea what this function looks like. 1298 01:02:43,650 --> 01:02:45,320 We can probably analyze this function, 1299 01:02:45,320 --> 01:02:46,740 but let's pretend we don't know.
1300 01:02:46,740 --> 01:02:49,680 So it's like some crazy stuff like this. 1301 01:02:49,680 --> 01:02:56,320 And all I'm asking is whether in absolute value 1302 01:02:56,320 --> 01:02:59,510 it's larger than c, which means, is this function larger 1303 01:02:59,510 --> 01:03:01,383 than c or less than minus c? 1304 01:03:05,020 --> 01:03:07,100 The intervals on which I'm going to say 1 1305 01:03:07,100 --> 01:03:17,821 are this guy, this guy, this guy, and this guy. 1306 01:03:17,821 --> 01:03:18,320 OK. 1307 01:03:18,320 --> 01:03:20,897 And everywhere else, I'm saying 0. 1308 01:03:20,897 --> 01:03:21,980 Everybody agree with this? 1309 01:03:21,980 --> 01:03:24,930 This is what I'm doing. 1310 01:03:24,930 --> 01:03:27,370 Now of course, it's probably easier for you 1311 01:03:27,370 --> 01:03:29,350 to just package it into this nice thing that's 1312 01:03:29,350 --> 01:03:31,602 just either larger than c in absolute value, 1313 01:03:31,602 --> 01:03:33,810 or less than c. Otherwise, I'd have to plot this function. 1314 01:03:33,810 --> 01:03:36,950 In practice, you don't have to. 1315 01:03:36,950 --> 01:03:40,050 Now, here is what I am actually claiming. 1316 01:03:40,050 --> 01:03:42,200 So here, I actually defined for you a test. 1317 01:03:42,200 --> 01:03:44,500 And I started this lecture by saying, 1318 01:03:44,500 --> 01:03:46,250 oh, now we're going to do something better 1319 01:03:46,250 --> 01:03:47,880 than computing averages. 1320 01:03:47,880 --> 01:03:50,450 Now I'm telling you it's just computing an average. 1321 01:03:50,450 --> 01:03:52,910 And the thing is the test is not just 1322 01:03:52,910 --> 01:03:54,830 the specification of this x bar. 1323 01:03:54,830 --> 01:03:57,901 It's also the specification of this constant c. 1324 01:03:57,901 --> 01:03:58,400 All right?
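Putting the statistic and the constant c together, the whole test can be sketched as a single indicator. This is my own hedged Python sketch, not from the lecture; c = 1.96 corresponds to the 5% level discussed earlier:

```python
from math import sqrt

def psi(xbar, n, c=1.96, p0=0.5):
    """The test: returns 1 (reject H0: p = p0) iff |T_n| > c, where
    T_n = sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))."""
    t_n = sqrt(n) * (xbar - p0) / sqrt(xbar * (1 - xbar))
    return 1 if abs(t_n) > c else 0

print(psi(13 / 30, 30))   # 0: fail to reject, 13 heads in 30 is plausible
print(psi(0.7, 100))      # 1: reject, 70 heads in 100 is too far from fair
```

Note that the test is the pair: the statistic computed from the average, and the threshold c that encodes what "unusually large for a Gaussian" means.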
1325 01:03:58,400 --> 01:04:02,270 And the constant c was exactly where 1326 01:04:02,270 --> 01:04:07,360 our belief about what a large value for a Gaussian is. 1327 01:04:07,360 --> 01:04:09,200 That's exactly where it came in. 1328 01:04:09,200 --> 01:04:12,500 So this choice of c is basically a threshold 1329 01:04:12,500 --> 01:04:15,320 at which we decide above this threshold this isn't 1330 01:04:15,320 --> 01:04:17,104 likely to come from a Gaussian. 1331 01:04:17,104 --> 01:04:18,770 Below this threshold we decide that it's 1332 01:04:18,770 --> 01:04:20,660 likely to come from a Gaussian. 1333 01:04:20,660 --> 01:04:24,200 So we have to choose what this threshold is based 1334 01:04:24,200 --> 01:04:26,750 on what we think likely means. 1335 01:04:34,420 --> 01:04:37,490 Just a little bit more of those things. 1336 01:04:37,490 --> 01:04:39,520 So now we're going to have to characterize 1337 01:04:39,520 --> 01:04:43,030 what makes a good test, right? 1338 01:04:43,030 --> 01:04:44,770 Well, I'll come back to it in a second. 1339 01:04:44,770 --> 01:04:48,755 But you could have a test that says reject all the time. 1340 01:04:48,755 --> 01:04:50,380 And that's going to be bad test, right? 1341 01:04:50,380 --> 01:04:52,120 The FDA is not implementing a test 1342 01:04:52,120 --> 01:04:56,440 that says, yes all drugs work, now let's just go to Aruba, OK? 1343 01:04:56,440 --> 01:04:59,650 So people are trying to have something that 1344 01:04:59,650 --> 01:05:01,560 tries to work all the time. 1345 01:05:01,560 --> 01:05:03,700 Now FDA's not either saying, let's just 1346 01:05:03,700 --> 01:05:07,340 say that no drugs work, and let's go to Aruba, all right? 1347 01:05:07,340 --> 01:05:09,880 They're just trying to say the right thing 1348 01:05:09,880 --> 01:05:11,180 as often as possible. 1349 01:05:11,180 --> 01:05:13,300 And so, we're going to have to measure this. 
1350 01:05:13,300 --> 01:05:15,790 So the things that are associated to a test 1351 01:05:15,790 --> 01:05:17,440 are the rejection region. 1352 01:05:17,440 --> 01:05:21,550 And if you look at this x in E to the n, such that psi 1353 01:05:21,550 --> 01:05:25,420 of x is equal to 1, this is exactly this guy that I drew. 1354 01:05:29,024 --> 01:05:30,940 So here, I summarized the values of the sample 1355 01:05:30,940 --> 01:05:32,460 into their average. 1356 01:05:32,460 --> 01:05:35,080 But the values of the sample that I collect 1357 01:05:35,080 --> 01:05:38,691 will lead to a test that says 1. 1358 01:05:38,691 --> 01:05:39,190 All right? 1359 01:05:39,190 --> 01:05:40,610 So this is the rejection region. 1360 01:05:40,610 --> 01:05:43,990 If I collect a data point, technically I have-- 1361 01:05:43,990 --> 01:05:51,250 so I have E to the n, which is a big space like this. 1362 01:05:51,250 --> 01:05:52,540 So that's E to the n. 1363 01:05:52,540 --> 01:05:55,150 Think of it as being the space of the Xn's. 1364 01:05:55,150 --> 01:05:59,200 And I have a function that takes only value 0, 1. 1365 01:05:59,200 --> 01:06:01,962 So I can decompose it into this part 1366 01:06:01,962 --> 01:06:04,420 where it takes value 0 and the part where it takes value 1. 1367 01:06:04,420 --> 01:06:06,740 And those can be super complicated, right? 1368 01:06:06,740 --> 01:06:07,880 It can have a thing like this. 1369 01:06:07,880 --> 01:06:11,990 It can have some weird little islands where it takes value 1. 1370 01:06:11,990 --> 01:06:14,090 I can have some islands where it takes value 0. 1371 01:06:14,090 --> 01:06:16,237 I can have some weird stuff going on. 1372 01:06:16,237 --> 01:06:18,070 But I can always partition it into the part 1373 01:06:18,070 --> 01:06:20,570 where it takes value 0 and the part where it takes value 1. 
1374 01:06:20,570 --> 01:06:25,530 And the part where it takes value 1, if psi is equal to 1, 1375 01:06:25,530 --> 01:06:32,630 this is called the rejection region of the test, OK? 1376 01:06:32,630 --> 01:06:37,810 So just the samples that would lead me to rejecting. 1377 01:06:37,810 --> 01:06:42,460 And notice that this is the indicator of the rejection 1378 01:06:42,460 --> 01:06:44,010 region. 1379 01:06:44,010 --> 01:06:48,190 The test is the indicator of the rejection region. 1380 01:06:48,190 --> 01:06:52,530 So there's two ways you can make an error when there's a test. 1381 01:06:52,530 --> 01:06:56,460 Either the truth is in h0, and you're saying it's h1. 1382 01:06:56,460 --> 01:06:59,310 Or the truth is in h1, and you say it's h0. 1383 01:06:59,310 --> 01:07:04,710 And that's how we build in the asymmetry between h0 and h1. 1384 01:07:04,710 --> 01:07:06,480 We control only one of the two errors. 1385 01:07:06,480 --> 01:07:09,510 And we hope for the best for the second one. 1386 01:07:09,510 --> 01:07:13,370 So the type 1 error is the one that says, well, 1387 01:07:13,370 --> 01:07:16,640 if it is actually the status quo, but I claim 1388 01:07:16,640 --> 01:07:19,220 that there is a discovery-- if it's actually h0, 1389 01:07:19,220 --> 01:07:21,700 but I claim that I'm in h1, then I 1390 01:07:21,700 --> 01:07:25,560 commit a type 1 error. 1391 01:07:25,560 --> 01:07:27,870 And so the probability of type 1 error 1392 01:07:27,870 --> 01:07:29,730 is this function alpha of psi, which 1393 01:07:29,730 --> 01:07:34,920 is the probability of saying that psi is equal to 1 1394 01:07:34,920 --> 01:07:36,809 when theta is in h0. 1395 01:07:36,809 --> 01:07:38,850 Now, the problem is that this is not just a number, 1396 01:07:38,850 --> 01:07:41,790 because theta is just like moving all over h0, right? 1397 01:07:41,790 --> 01:07:45,980 There's many values that theta can be, right? 1398 01:07:45,980 --> 01:07:48,500 So theta is somewhere here. 
1399 01:07:52,854 --> 01:07:53,520 I erased it, OK. 1400 01:07:59,120 --> 01:08:00,335 All right. 1401 01:08:00,335 --> 01:08:02,210 For simplicity, we're going to think of theta 1402 01:08:02,210 --> 01:08:07,980 as being mu, and theta 0 as 103.5, OK? 1403 01:08:07,980 --> 01:08:11,720 And so, I know that this is theta 1. 1404 01:08:11,720 --> 01:08:18,990 And just this point here was theta 0, OK? 1405 01:08:18,990 --> 01:08:19,490 Agreed? 1406 01:08:19,490 --> 01:08:22,800 This is with the Cherry Blossom Run. 1407 01:08:22,800 --> 01:08:25,699 Now, here in this case, it's actually easy. 1408 01:08:25,699 --> 01:08:27,240 I need to compute this function alpha 1409 01:08:27,240 --> 01:08:37,144 of psi, which maps theta in theta 0 to p theta of psi 1410 01:08:37,144 --> 01:08:37,689 equals 1. 1411 01:08:37,689 --> 01:08:40,682 So that's the probability that I reject when theta is in h0. 1412 01:08:40,682 --> 01:08:42,390 Then there's only one of them to compute, 1413 01:08:42,390 --> 01:08:44,220 because theta can only take this one value. 1414 01:08:44,220 --> 01:08:46,840 So this is really 103.5. 1415 01:08:46,840 --> 01:08:47,340 OK. 1416 01:08:47,340 --> 01:08:48,964 So that's the probability that I reject 1417 01:08:48,964 --> 01:08:52,649 when the true mean was 103.5. 1418 01:08:52,649 --> 01:08:54,870 Now, if I was testing whether-- 1419 01:08:54,870 --> 01:08:57,660 if h0 was this entire guy here, all the 1420 01:08:57,660 --> 01:09:00,090 values larger than 103.5, then I would 1421 01:09:00,090 --> 01:09:03,450 have to compute this function for all possible values 1422 01:09:03,450 --> 01:09:06,220 of theta in there. 1423 01:09:06,220 --> 01:09:07,220 And guess what? 1424 01:09:07,220 --> 01:09:09,414 The worst case is when it's going to be here. 1425 01:09:09,414 --> 01:09:11,080 Because it's so close to the alternative 1426 01:09:11,080 --> 01:09:15,149 that that's where I'm making the most error possible. 
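When theta 0 is the single point 103.5, alpha of psi is just one number, and if we take the Gaussian approximation from the central limit theorem at face value, it has a closed form: under theta 0 the standardized average is approximately standard Gaussian, so the probability of rejecting is P(|Z| > c). A minimal sketch of that computation; the function names are made up for this illustration:

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def alpha(c):
    """P(psi = 1) when theta = 103.5: the standardized average is
    approximately N(0, 1) there, so this is P(|Z| > c) = 2 * (1 - Phi(c))."""
    return 2.0 * (1.0 - gaussian_cdf(c))

print(round(alpha(1.96), 3))  # prints 0.05: reject a true H0 about 5% of the time
```

Raising c shrinks this number, which is exactly the "what does likely mean" dial from earlier: a more demanding threshold rejects a true null less often.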
1427 01:09:15,149 --> 01:09:17,550 And then there's the type 2 error, 1428 01:09:17,550 --> 01:09:22,200 which is defined basically in a symmetric way. 1429 01:09:22,200 --> 01:09:25,089 The function that maps theta to the probability. 1430 01:09:25,089 --> 01:09:26,880 So that's the probability of type 2 errors. 1431 01:09:26,880 --> 01:09:30,330 The probability that I fail to reject h0, right? 1432 01:09:30,330 --> 01:09:34,080 If psi is equal to 0, I fail to reject h0. 1433 01:09:34,080 --> 01:09:39,840 But it actually came from h1, OK? 1434 01:09:39,840 --> 01:09:41,540 So in this example, it's clear. 1435 01:09:41,540 --> 01:09:45,629 If I'm here, like if the true mean was 100, 1436 01:09:45,629 --> 01:09:48,170 I'm looking at the probability that, when the true mean is actually 1437 01:09:48,170 --> 01:09:51,569 100, I say it's 103.5-- 1438 01:09:51,569 --> 01:09:53,504 or that it's not less than 103.5. 1439 01:09:53,504 --> 01:09:54,004 Yeah? 1440 01:09:54,004 --> 01:09:56,045 STUDENT: I'm just still confused by the notation. 1441 01:09:56,045 --> 01:09:59,471 When you say that [INAUDIBLE] theta sub 1 arrow r, 1442 01:09:59,471 --> 01:10:02,650 I'm not sure what that notation means. 1443 01:10:02,650 --> 01:10:04,650 PROFESSOR: Well, this just means it's a function 1444 01:10:04,650 --> 01:10:08,060 that maps theta 0 to r. 1445 01:10:08,060 --> 01:10:09,450 You've seen functions, right? 1446 01:10:09,450 --> 01:10:10,220 OK. 1447 01:10:10,220 --> 01:10:14,690 So that's just the way you write. 1448 01:10:14,690 --> 01:10:20,800 So that means that's a function f that goes from, say, r to r, 1449 01:10:20,800 --> 01:10:25,381 and that maps x to x squared. 1450 01:10:25,381 --> 01:10:25,880 OK. 1451 01:10:25,880 --> 01:10:27,921 So here, I'm just saying I don't have to consider 1452 01:10:27,921 --> 01:10:29,030 all possible values. 1453 01:10:29,030 --> 01:10:32,540 I'm only considering the values on theta 0. 
1454 01:10:32,540 --> 01:10:33,620 I put r actually. 1455 01:10:33,620 --> 01:10:36,320 I could restrict myself to the interval 0, 1, 1456 01:10:36,320 --> 01:10:38,160 because those are probabilities. 1457 01:10:38,160 --> 01:10:41,090 So it's just telling me where my function comes from 1458 01:10:41,090 --> 01:10:44,330 and where my function goes to. 1459 01:10:44,330 --> 01:10:47,240 And beta is a function, right? 1460 01:10:47,240 --> 01:10:52,990 So beta psi of theta is just the probability 1461 01:10:52,990 --> 01:10:55,610 that psi is equal to 1. 1462 01:10:55,610 --> 01:10:57,930 And I could define that for all thetas-- 1463 01:10:57,930 --> 01:10:58,430 sorry. 1464 01:10:58,430 --> 01:11:00,600 That psi is equal to 0 in this case. 1465 01:11:00,600 --> 01:11:02,810 And I could define that for all thetas. 1466 01:11:02,810 --> 01:11:05,240 But the only ones that lead to an error 1467 01:11:05,240 --> 01:11:06,812 are the thetas that are in h1. 1468 01:11:06,812 --> 01:11:08,270 I mean, I can define this function. 1469 01:11:08,270 --> 01:11:11,121 It's just not going to correspond to an error, OK? 1470 01:11:13,930 --> 01:11:18,960 And the power of a test is the smallest-- 1471 01:11:18,960 --> 01:11:22,101 so the power is basically 1 minus an error. 1472 01:11:22,101 --> 01:11:23,600 1 minus the probability of an error. 1473 01:11:23,600 --> 01:11:27,324 So it's the probability of making a correct decision, OK? 1474 01:11:27,324 --> 01:11:29,490 So it's the probability of making a correct decision 1475 01:11:29,490 --> 01:11:31,830 under h1, that's what the power is. 1476 01:11:31,830 --> 01:11:34,830 But again, this could be a function. 1477 01:11:34,830 --> 01:11:36,900 Because there's many ways that theta can be in h1 1478 01:11:36,900 --> 01:11:39,150 if h1 is an entire set of numbers. 1479 01:11:39,150 --> 01:11:42,200 For example, all the numbers that are less than 103.5. 
1480 01:11:42,200 --> 01:11:45,510 And so, what I'm doing here when I define the power of a test, 1481 01:11:45,510 --> 01:11:50,057 I'm looking at the smallest possible of those values, OK? 1482 01:11:50,057 --> 01:11:51,390 So I'm looking at this function. 1483 01:11:54,140 --> 01:11:57,028 Maybe I should actually expand a little more on this. 1484 01:12:02,700 --> 01:12:03,200 OK. 1485 01:12:03,200 --> 01:12:10,790 So beta psi of theta is the probability under theta 1486 01:12:10,790 --> 01:12:12,590 that psi is equal to 0, right? 1487 01:12:12,590 --> 01:12:18,710 That's the probability, for theta in theta 1, 1488 01:12:18,710 --> 01:12:21,110 which means under the alternative, that I 1489 01:12:21,110 --> 01:12:21,884 fail to reject. 1490 01:12:21,884 --> 01:12:23,300 And I really should reject, because theta 1491 01:12:23,300 --> 01:12:25,520 was actually in theta 1, OK? 1492 01:12:25,520 --> 01:12:29,150 So this thing here is the probability of type 2 error. 1493 01:12:29,150 --> 01:12:34,560 Now, this is 1 minus the probability that I did reject 1494 01:12:34,560 --> 01:12:36,960 and I should have rejected. 1495 01:12:36,960 --> 01:12:39,830 That's just taking the complement. 1496 01:12:39,830 --> 01:12:42,910 Because if psi is not equal to 0, then it's equal to 1. 1497 01:12:42,910 --> 01:12:44,620 So now if I rearrange this, it tells me 1498 01:12:44,620 --> 01:12:48,260 that the probability that psi is equal to 1-- 1499 01:12:48,260 --> 01:12:50,100 this is actually 1 minus beta psi of theta. 1500 01:12:54,440 --> 01:12:57,110 So that's true for all thetas in theta 1. 1501 01:12:57,110 --> 01:12:58,640 And what I'm saying is, well, this 1502 01:12:58,640 --> 01:13:00,904 is now a good thing, right? 1503 01:13:00,904 --> 01:13:02,570 This number being large is a good thing. 1504 01:13:02,570 --> 01:13:05,330 It means I should have rejected, and I rejected. 1505 01:13:05,330 --> 01:13:07,650 I want this to happen with large probability. 
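For the Gaussian-mean example, 1 minus beta psi of theta also has a closed form under the CLT approximation: when the true mean is mu, the standardized average is roughly N(m, 1) with mean shift m = sqrt(n) (mu - 103.5) / sigma. A sketch of the resulting function; sigma = 3 and n = 100 are assumed example values, not numbers from the lecture:

```python
import math

def gaussian_cdf(x):
    """Standard Gaussian CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_reject(mu, mu0=103.5, sigma=3.0, n=100, c=1.96):
    """1 - beta_psi(mu): probability of (correctly) rejecting when the true
    mean is mu.  The test statistic is approximately N(m, 1)."""
    m = math.sqrt(n) * (mu - mu0) / sigma  # mean shift under mu
    return 1.0 - (gaussian_cdf(c - m) - gaussian_cdf(-c - m))

# The further mu is below 103.5, the easier rejecting is; near the boundary,
# this probability sinks toward the level of the test.  The power is the
# worst case: the smallest of these values over the alternative.
for mu in (100.0, 102.5, 103.4):
    print(round(prob_reject(mu), 3))
```

This makes the "most conservative choice" concrete: the power is the infimum of prob_reject over theta 1, and it is dragged down by alternatives sitting right next to 103.5.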
1506 01:13:07,650 --> 01:13:09,025 And so, what I'm going to look at 1507 01:13:09,025 --> 01:13:11,894 is the most conservative choice of this number, right? 1508 01:13:11,894 --> 01:13:13,310 Rather than being super optimistic 1509 01:13:13,310 --> 01:13:16,970 and say, oh, but indeed if theta was actually equal to zero, 1510 01:13:16,970 --> 01:13:19,300 then I'm always going to conclude that-- 1511 01:13:19,300 --> 01:13:22,730 I mean, if mu is equal to 0, everybody runs in 0 seconds, 1512 01:13:22,730 --> 01:13:25,990 then with high probability I'm actually 1513 01:13:25,990 --> 01:13:27,770 going to make no mistake. 1514 01:13:27,770 --> 01:13:30,920 But really, I should look at the worst possible case, OK? 1515 01:13:30,920 --> 01:13:32,900 So what I'm looking at is basically 1516 01:13:32,900 --> 01:13:45,002 the smallest value it can take on theta 1, 1517 01:13:45,002 --> 01:13:53,490 which is called the power of psi. 1518 01:13:53,490 --> 01:13:55,850 Power of the test psi, OK? 1519 01:13:55,850 --> 01:13:58,930 So that's the smallest possible value it can take. 1520 01:14:01,610 --> 01:14:02,110 All right. 1521 01:14:02,110 --> 01:14:02,651 So I'm sorry. 1522 01:14:02,651 --> 01:14:05,310 This is a lot of definitions that you have to let sink in. 1523 01:14:05,310 --> 01:14:06,870 And it's not super pleasant. 1524 01:14:06,870 --> 01:14:09,180 But that's what testing is. 1525 01:14:09,180 --> 01:14:10,540 There's a lot of jargon. 1526 01:14:10,540 --> 01:14:12,440 Those are actually fairly simple things. 1527 01:14:12,440 --> 01:14:14,460 Just maybe you should get a sheet for yourself. 1528 01:14:14,460 --> 01:14:17,400 And say, these are the new terms that I learned. 1529 01:14:17,400 --> 01:14:19,584 What is a test, a rejection region? 1530 01:14:19,584 --> 01:14:21,000 Probability of type 1 error, probability 1531 01:14:21,000 --> 01:14:22,740 of type 2 error, and power. 1532 01:14:22,740 --> 01:14:23,910 Just make sure you know what those guys are. 
1533 01:14:23,910 --> 01:14:24,410 Oh. 1534 01:14:24,410 --> 01:14:27,162 And null and alternative hypothesis, OK? 1535 01:14:27,162 --> 01:14:28,620 And once you know all these things, 1536 01:14:28,620 --> 01:14:29,953 you know what I'm talking about. 1537 01:14:29,953 --> 01:14:31,320 You know what I'm referring to. 1538 01:14:31,320 --> 01:14:33,000 And this is just jargon. 1539 01:14:33,000 --> 01:14:35,610 But in the end, those are just probabilities. 1540 01:14:35,610 --> 01:14:38,369 I mean, these are natural quantities. 1541 01:14:38,369 --> 01:14:40,160 Just for some reason, people have been used 1542 01:14:40,160 --> 01:14:43,850 to using different terminology. 1543 01:14:43,850 --> 01:14:46,420 So just to illustrate. 1544 01:14:46,420 --> 01:14:48,090 When do I make a type 1 error? 1545 01:14:48,090 --> 01:14:51,880 And when do I not make a type 1 error? 1546 01:14:51,880 --> 01:14:56,500 So I make a type 1 error if h0 is true and I reject h0, right? 1547 01:14:56,500 --> 01:14:59,560 So the off diagonal blocks are when I make an error. 1548 01:14:59,560 --> 01:15:02,920 When I'm on the diagonal terms, h1 is true 1549 01:15:02,920 --> 01:15:05,530 and I reject h0, that's a correct decision. 1550 01:15:05,530 --> 01:15:08,200 When h0 is true and I fail to reject h0, 1551 01:15:08,200 --> 01:15:11,210 that's also the correct decision to make. 1552 01:15:11,210 --> 01:15:17,110 So I only make errors when I'm in one of the red blocks. 1553 01:15:17,110 --> 01:15:20,860 And one block is the type 1 error and the other block 1554 01:15:20,860 --> 01:15:21,730 is the type 2 error. 1555 01:15:21,730 --> 01:15:24,059 That's all it means, OK? 1556 01:15:24,059 --> 01:15:26,100 So you just have to know which one we call type 1. 1557 01:15:32,460 --> 01:15:36,960 I mean, this was chosen in a pretty ad hoc way. 1558 01:15:36,960 --> 01:15:40,440 So to conclude this lecture, let me ask you a few questions. 
1559 01:15:40,440 --> 01:15:46,780 If in a US court, the defendant is found, 1560 01:15:46,780 --> 01:15:49,601 let's just say for the sake of discussion, innocent or guilty. 1561 01:15:49,601 --> 01:15:50,100 All right? 1562 01:15:50,100 --> 01:15:51,516 It's really guilty or not guilty, 1563 01:15:51,516 --> 01:15:53,900 but let's say innocent or guilty. 1564 01:15:53,900 --> 01:15:56,106 When does the jury make a type 1 error? 1565 01:16:03,414 --> 01:16:03,914 Yep? 1566 01:16:07,840 --> 01:16:10,180 And he's guilty? 1567 01:16:10,180 --> 01:16:11,560 And he's innocent, right? 1568 01:16:11,560 --> 01:16:14,560 The status quo, everybody is innocent until proven guilty. 1569 01:16:14,560 --> 01:16:18,500 So our h0 is that the person is innocent. 1570 01:16:18,500 --> 01:16:21,510 And so, that means that h0 is innocent. 1571 01:16:21,510 --> 01:16:23,760 And so, we're looking at the probability of type 1 error, 1572 01:16:23,760 --> 01:16:25,560 so that's when we reject the fact that it's innocent. 1573 01:16:25,560 --> 01:16:27,752 So we conclude that this person is guilty, OK? 1574 01:16:27,752 --> 01:16:29,710 So type 1 error is when this person is innocent 1575 01:16:29,710 --> 01:16:31,090 and we conclude it's guilty. 1576 01:16:31,090 --> 01:16:32,131 What is the type 2 error? 1577 01:16:36,390 --> 01:16:38,160 Letting a guilty person go free, which 1578 01:16:38,160 --> 01:16:40,290 actually according to the constitution, 1579 01:16:40,290 --> 01:16:42,320 is the better of the two. 1580 01:16:42,320 --> 01:16:42,820 All right? 1581 01:16:42,820 --> 01:16:45,361 So what we're going to try to do is to control the first one, 1582 01:16:45,361 --> 01:16:47,870 and hope for the best for the second one. 1583 01:16:47,870 --> 01:16:51,260 How could the jury make sure that they make no type 1 1584 01:16:51,260 --> 01:16:52,500 error ever? 1585 01:16:57,870 --> 01:17:01,406 Always let the guy go free, right? 
1586 01:17:01,406 --> 01:17:03,030 What is the effect on the type 2 error? 1587 01:17:06,600 --> 01:17:08,430 Yeah, it's the worst possible, right? 1588 01:17:08,430 --> 01:17:12,550 I mean, basically, for every guy that's guilty, you let them go. 1589 01:17:12,550 --> 01:17:14,970 That's the worst you can do. 1590 01:17:14,970 --> 01:17:15,930 And same thing, right? 1591 01:17:15,930 --> 01:17:20,470 How can the jury make sure that there's no type 2 error? 1592 01:17:20,470 --> 01:17:21,140 Always convict. 1593 01:17:21,140 --> 01:17:22,890 What is the effect on the American budget? 1594 01:17:22,890 --> 01:17:25,135 What is the effect on the type 1 error? 1595 01:17:28,180 --> 01:17:28,680 Right. 1596 01:17:28,680 --> 01:17:31,710 So the effect is that basically the type 1 error is maximized. 1597 01:17:31,710 --> 01:17:33,540 So there's this trade-off between type 1 1598 01:17:33,540 --> 01:17:35,430 and type 2 error that's inherent. 1599 01:17:35,430 --> 01:17:39,030 And that's why we have this sort of multi objective thing. 1600 01:17:39,030 --> 01:17:41,530 We're trying to minimize two things at the same time. 1601 01:17:41,530 --> 01:17:44,340 And you can find many ad hoc ways, right? 1602 01:17:44,340 --> 01:17:46,730 So if you've taken any optimization, 1603 01:17:46,730 --> 01:17:49,040 trying to optimize two things when one is going up 1604 01:17:49,040 --> 01:17:51,540 while the other one is going down, the only thing you can do 1605 01:17:51,540 --> 01:17:53,370 is make ad hoc heuristics. 1606 01:17:53,370 --> 01:17:55,740 Maybe you try to minimize the sum of those two guys. 1607 01:17:55,740 --> 01:17:59,310 Maybe you try to minimize 1/3 of the first guy 1608 01:17:59,310 --> 01:18:00,940 plus 2/3 of the second guy. 1609 01:18:00,940 --> 01:18:03,390 Maybe you try to minimize the first guy plus the square 1610 01:18:03,390 --> 01:18:04,140 of the second guy. 
1611 01:18:04,140 --> 01:18:05,973 You can think of many ways, but none of them 1612 01:18:05,973 --> 01:18:07,470 is more justified than the other. 1613 01:18:07,470 --> 01:18:10,120 However, for statistical hypothesis testing, 1614 01:18:10,120 --> 01:18:12,790 there's one that's very well justified, which is just 1615 01:18:12,790 --> 01:18:15,990 constrain your type 1 error to be at most 1616 01:18:15,990 --> 01:18:18,460 at a level that you deem acceptable. 1617 01:18:18,460 --> 01:18:18,960 5%. 1618 01:18:24,510 --> 01:18:27,850 I want to convict at most 5% of innocent people. 1619 01:18:27,850 --> 01:18:29,850 That's what I deem reasonable. 1620 01:18:29,850 --> 01:18:33,570 And based on that, I'm going to try to convict as many people 1621 01:18:33,570 --> 01:18:37,170 as I can, all right? 1622 01:18:37,170 --> 01:18:39,780 So that's called the Neyman-Pearson paradigm, 1623 01:18:39,780 --> 01:18:42,240 and we'll talk about it next time. 1624 01:18:42,240 --> 01:18:43,140 All right. 1625 01:18:43,140 --> 01:18:44,990 Thank you.
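The recipe the lecture closes on — fix the type 1 error at an acceptable level first, then take whatever that leaves you on the type 2 side — can be sketched as follows. The 5% level matches the lecture, but mu0 = 103.5, sigma = 3, and n = 100 are assumed example values, and the function names are illustrative:

```python
from statistics import NormalDist

Z = NormalDist()  # standard Gaussian

def threshold(level):
    """Calibrate c so that P(|Z| > c) = level: the type 1 error is fixed first."""
    return Z.inv_cdf(1.0 - level / 2.0)

def power_at(mu, level, mu0=103.5, sigma=3.0, n=100):
    """With c chosen for the given level, the probability of rejecting when
    the true mean is mu; 1 minus this is the type 2 error there."""
    c = threshold(level)
    m = (n ** 0.5) * (mu - mu0) / sigma  # mean shift of the statistic under mu
    return 1.0 - (Z.cdf(c - m) - Z.cdf(-c - m))

print(round(threshold(0.05), 2))        # prints 1.96, the familiar Gaussian threshold
print(round(power_at(102.5, 0.05), 2))  # what a 5% level "buys" against mu = 102.5
```

Nothing here minimizes a weighted sum of the two errors; the level is a hard constraint, and the power against each alternative is whatever falls out. That asymmetry is the essence of the paradigm previewed for next time.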