The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or to view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PHILIPPE RIGOLLET: --of our limiting distribution, which happened to be Gaussian. But if the central limit theorem had told us that the limiting distribution of some average was something that looked like a Poisson or an exponential, then we would in the same way have taken the quantiles of the exponential distribution.

So let's go back to what we had. Generically, you have a set of observations X1 to Xn. Remember, for the kiss example they were denoted by R1 to Rn, because they were turning the head to the right; but let's just go back to X1 to Xn. In this case I'm going to assume they're i.i.d., and I'm going to make them Bernoulli with parameter p, and p is unknown, right?

So what did we do from here? Well, we said p is the expectation of Xi, and actually we didn't even think about it too much. We said, well, if I need to estimate the proportion of people who turn their head to the right when they kiss, I'm basically just going to compute the average. So our p hat was just Xn bar, which was 1/n times the sum from i = 1 to n of the Xi. The average of the observations was our estimate.

And then we wanted to build some confidence interval around this. So what we wanted to understand is how much this p hat fluctuates. It's an average of random variables, so it's a random variable, and we want to know what its distribution is. And if we know what the distribution is, then we actually know where it fluctuates: what the expectation is, around which value it tends to fluctuate, et cetera.
And what the central limit theorem told us was that if I take square root of n times (Xn bar minus p), where p is its expectation, and then divide by the standard deviation, square root of p(1 minus p), then this thing converges as n goes to infinity, in distribution (and we will say a little bit more about what that means), to some standard normal random variable. So that was the central limit theorem.

So what it means is that when I think of this as a random variable, when n is large enough it's going to look like a standard Gaussian. And so I understand perfectly its fluctuations. I know the probability of being in any given zone; I know that the mean is 0; I know a bunch of things. And then, in particular, what I was interested in was the probability that the absolute value of a Gaussian random variable exceeds q alpha over 2. We said that this was equal to what? Anybody? What was that?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Alpha, right? So that's the probability. By definition, q alpha over 2 is the number such that the area to the right of it is alpha over 2, and by symmetry the area to the left of minus q alpha over 2 is also alpha over 2. And so the probability that the absolute value exceeds q alpha over 2 is just the sum of the two gray areas, which is alpha.

So now I said that this was approximately equal, due to the central limit theorem, to the probability that the absolute value of square root of n times (Xn bar minus p) divided by square root of p(1 minus p) is larger than q alpha over 2. This thing, by virtue of the central limit theorem, is approximately equal to alpha.
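For concreteness, here is a minimal Python sketch (my own illustration, not part of the lecture, assuming numpy and scipy are available) that computes q alpha over 2 and checks that the two tails together have probability alpha:

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
# q_{alpha/2}: the point with mass alpha/2 to its right under N(0, 1)
q = norm.ppf(1 - alpha / 2)  # about 1.96 for alpha = 0.05

# The two gray areas: P(Z > q) + P(Z < -q), which should equal alpha
two_tails = (1 - norm.cdf(q)) + norm.cdf(-q)
print(q, two_tails)
```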
And then we just said, well, I'll solve for p. Has anyone attempted to solve the degree-two equation for p in the homework? Everybody has tried it? So essentially, this is going to be an equation in p. Sometimes we don't want to solve it; some of the p's we will replace by their worst possible value. For example, one of the tricks we had was that this value here, square root of p(1 minus p), is always at most one half, so that we could get a confidence interval that contains the confidence intervals for all possible values of p. Or we could actually solve for p. Do we all agree on the principle of what we did? So that's how you build confidence intervals.

Now let's step back for a second and see what was important in the building of this confidence interval. The really key thing is that I didn't tell you why I formed this thing, right? We started from X bar, and then I took some weird function of X bar that depended on p and n. And the reason is that when I take this function, the central limit theorem tells me that it converges to something that I know. But the very important thing about this something that I know is that it does not depend on anything that I don't know. For example, if I forgot to divide by square root of p(1 minus p), then this thing would have had a variance, which is p(1 minus p). If I didn't remove this p here, the mean would have been affected by p. And there's no table for a normal with mean p and variance 1. Yes?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Oh, where does the square root of n come from? So really you should view it like this. There's a sort of unwritten rule in math that you don't write a divided by (b over c); you write c times a divided by b, because it looks nicer. But the way you want to think about this is that this is (Xn bar minus p) divided by the square root of p(1 minus p) over n. And the reason is that square root of p(1 minus p) over n is actually the standard deviation of Xn bar. The square root of n comes from the variance of the average.
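To make the options above concrete, here is a sketch (again my own, with numbers in the spirit of the kiss example; treat them as illustrative) comparing the conservative interval that bounds square root of p(1 minus p) by 1/2, the plug-in interval that uses p hat, and the interval from actually solving the degree-two inequality in p:

```python
import numpy as np
from scipy.stats import norm

n, alpha = 124, 0.05          # illustrative sample size
p_hat = 80 / n                # illustrative proportion of right-turners
q = norm.ppf(1 - alpha / 2)

# 1) Conservative: sqrt(p(1 - p)) <= 1/2 for all p
m = q / (2 * np.sqrt(n))
conservative = (p_hat - m, p_hat + m)

# 2) Plug-in: replace p by p_hat in the standard deviation
sd = np.sqrt(p_hat * (1 - p_hat) / n)
plug_in = (p_hat - q * sd, p_hat + q * sd)

# 3) Solve n (p_hat - p)^2 <= q^2 p (1 - p), a quadratic in p
a = 1 + q**2 / n
b = -(2 * p_hat + q**2 / n)
c = p_hat**2
disc = np.sqrt(b**2 - 4 * a * c)
solved = ((-b - disc) / (2 * a), (-b + disc) / (2 * a))

print(conservative, plug_in, solved)
```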
So the key thing was that this limiting distribution did not depend on anything I don't know. And this is actually called a pivotal distribution. It's pivotal: I don't need to know anything, and I can read it in a table. Sometimes there are going to be complicated things, but now we have computers. The beauty of the Gaussian is that people have studied it to death, and you can open any stats textbook and find a table that tells you, for each value of alpha you're interested in, what q alpha over 2 is. But there might be some crazy distributions; as long as they don't depend on anything unknown, we might be able to simulate from them, and in particular compute what q alpha over 2 is for any possible value of alpha.

And so that's what we're going to be trying to do: finding pivotal distributions. How do we take this Xn bar, which is a good estimate, and turn it into something which, maybe exactly or asymptotically, does not depend on any unknown parameter?
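That simulation idea is easy to carry out. A minimal sketch, using as a stand-in a pivotal distribution we can sample from (here the absolute value of a standard Gaussian, so we can verify the answer against the table):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.05

# Pretend this is some pivotal distribution we can only sample from;
# as a stand-in, use |Z| with Z standard normal.
samples = np.abs(rng.standard_normal(1_000_000))

# The (1 - alpha) empirical quantile of |Z| is exceeded with
# probability alpha, so it plays the role of q_{alpha/2}
q_hat = np.quantile(samples, 1 - alpha)
print(q_hat)  # close to 1.96
```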
So here is one way we can do this. That's what we did for the kiss example, right? And I mentioned, for example, that in the extreme case when n is equal to 3 we would get a different thing, because there the CLT would not be valid. And what that means is that my pivotal distribution is actually not the normal distribution; it might be something else. And I said we can make exact computations. Well, let's see what it is. If I have three observations, I'm going to have X1, X2, X3, and I take the average of those guys. OK, so that's my estimate. How many values can this guy take?

It's a little bit of counting. Four values. How did you get to that number?

OK, so each of these guys can take value 0 or 1, right? So to count the values of the average, it's a little annoying, because I have to sum them. So basically, I have to count the number of 1's. How many 1's can I get? So let's look at that. We get 0, 0, 0. Then 0, 0, 1, and basically three outcomes that have just one 1 in there. So there's three of them. How many of them have exactly two 1's? Three, right? It's just the previous case where I swap the roles of the 0's and the 1's. So I get one outcome where the sum is 0, three where the sum is 1, three where the sum is 2, and then the one where I replace all the 0's by 1's, where the sum is 3. So the counts are 1, 3, 3, 1; someone is counting much faster than me. And those numbers you've probably seen before, right? 1, 3, 3, 1, remember?

And so essentially this average takes only four values: 0, 1/3, 2/3, and 1. Those are the four possible values it can take, and it's probably much easier to count that way. And so now, all I have to tell you if I want to describe the distribution of this random variable is the probability that it takes each of these values: the probability that X bar 3 takes the value 0, the probability that X bar 3 takes the value 1/3, et cetera. If I give you each of these, then you know exactly what the distribution is, and hopefully you can turn it into something you can compute.

Now the thing is that those probabilities will actually depend on the unknown p. Where is the unknown p here? What is the probability that X bar 3 is equal to 0, for example? I'm sorry?
AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Yeah, OK. So let's write it without making the computation. 1/8 is probably not the right answer, right? For example, if p is equal to 0, what is this probability? 1. If p is 1, what is this probability? 0. So it will depend on p.

So the probability that this thing is equal to 0 is just the probability that all three of those guys are equal to 0: the probability that X1 is equal to 0, and X2 is equal to 0, and X3 is equal to 0. Now my things are independent, so I can do what I actually want to do, which is to say that the probability of the intersection is the product of the probabilities. So it's just the probability that one of them is equal to 0, to the power 3. And the probability that any one of them is equal to 0 is just 1 minus p; so I get (1 minus p) to the power 3.

And then for the next value I get the probability... well, it's more complicated, because I have to decide which one it is. But those things are just the probabilities of a binomial random variable. If I look at X bar 3 and multiply it by 3, it's just a sum of independent Bernoullis with parameter p, so 3 times X bar 3 is a binomial with parameters 3 and p. And there are tables for binomials that tell you all this.
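Here is a short sketch, assuming scipy, that computes the exact distribution of X bar 3 by enumerating the eight outcomes and checks it against the Binomial(3, p) probabilities:

```python
from itertools import product
from scipy.stats import binom

p = 0.3  # one illustrative value of the unknown parameter

# Accumulate P(sum = k) over the 8 outcomes (x1, x2, x3)
probs = {k: 0.0 for k in range(4)}
for outcome in product([0, 1], repeat=3):
    pr = 1.0
    for x in outcome:
        pr *= p if x == 1 else 1 - p
    probs[sum(outcome)] += pr

# 3 * Xbar_3 is Binomial(3, p), so the two columns should agree
for k in range(4):
    print(k / 3, probs[k], binom.pmf(k, 3, p))
```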
Now the thing is, I want to invert this guy somehow. This thing depends on p; I don't like it, so I'm going to have to find ways to deal with it depending on p. I could make all these nasty computations and spend hours doing this, but there are tricks to go around it. There are upper bounds. Just like we said: well, maybe I don't want to solve the second-degree equation in p, because it's just going to capture maybe smaller-order terms, things that maybe won't make a huge difference numerically. You can check that in problem set one: does it make a huge difference numerically to solve the second-degree equation, or to just use the bound on p(1 minus p), or even to plug in p hat instead of p? Problem set one is there to make sure you see what magnitude of change you get by moving from one method to the other.

So what I wanted to get to is something where we can use a tool that is just a little more brute force, and that's Hoeffding's inequality. We saw it; that's what we finished on last time. Hoeffding's inequality is actually one of the most useful inequalities. If any one of you is doing anything related to algorithms, you've seen this inequality before. It's extremely convenient: it tells you something about bounded random variables, and in algorithms you typically deal with bounded things. And that's the case of Bernoulli random variables, right? They're bounded between 0 and 1.

And so when I apply Hoeffding's inequality, what it tells me is, for any given epsilon, what the probability is that Xn bar goes away from its expectation by more than epsilon. And we saw that this probability decreases somewhat similarly to the way a Gaussian tail decreases. So essentially Hoeffding's inequality is telling me this picture: when I have a Gaussian with mean mu, I know what its density looks like. If instead I take the average of some bounded random variables, then its probability density function, or maybe mass function (this thing might not even have a density, but let's think of it as a density just for simplicity), is going to be something that looks like that, but possibly rugged; sometimes it has to escape a little, just for the sake of having integral 1.
But Hoeffding is essentially telling me that the tails of this guy stay below the tails of the Gaussian. The probability that Xn bar exceeds mu by some amount is bounded by something that decays like the tail of a Gaussian. So really that's the picture you should have in mind. When I average bounded random variables, I get something that might be really rugged; it might not be smooth like a Gaussian, but its tails are always bounded by Gaussian tails. And what's nice is that when I start computing the probability of exceeding some number, the threshold I get, which would be the q alpha over 2 for the Gaussian, has a counterpart for this bounded average; call it r alpha over 2, like a q prime, a different q.

And I can do this without taking any limits: this is valid for any n. I don't need to go to infinity. Now this seems a bit magical, right? We discussed last time that we wanted n to be larger than 30 for the central limit theorem to kick in, and this one seems to tell me I can do it for any n. But there is a price to pay: I pick up this (b minus a) squared in the exponent. That's, sort of, telling me what the variance of the bounding Gaussian should be, and this is actually not as nice; I pay a factor compared to the variance of the Gaussian I would get from the CLT.
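Before solving for epsilon, a quick numerical sanity check of the inequality itself (my own sketch): estimate P(|Xn bar minus p| >= epsilon) for Bernoulli samples by Monte Carlo and compare it with the bound 2 exp(minus 2 n epsilon squared):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, eps = 30, 0.4, 0.15

# Monte Carlo estimate of P(|Xbar_n - p| >= eps)
xbars = rng.binomial(n, p, size=200_000) / n
tail = np.mean(np.abs(xbars - p) >= eps)

# Hoeffding bound for variables in [0, 1], where (b - a)^2 = 1
bound = 2 * np.exp(-2 * n * eps**2)
print(tail, bound)  # the estimate sits below the bound
```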
So let's try to solve it for our case. I just told you to try it; did anybody try to do it? So we started from this last time, right? And the reason was that we could say that the probability that this thing exceeds q alpha over 2 is alpha. That was using the CLT, so let's just keep it here and see what we would do differently.

What Hoeffding tells me is about the probability that Xn bar minus... well, what is mu in this case? It's p, right? It's just notation: mu was the mean, but we call it p in the case of Bernoullis... exceeds, let's call it epsilon for a second. So we said that this is bounded by what? Hoeffding tells me this is bounded by 2 times the exponential of minus 2n epsilon squared. The nice thing is that I pick up a factor n here. And what is (b minus a) squared for Bernoullis? 1. So I don't have a denominator here.

And I'm going to do exactly what I did before: I'm going to set this bound equal to alpha. If this equals alpha, then solving for epsilon gives me some number which plays the role of q alpha over 2, and then I can say that p is between Xn bar minus epsilon and Xn bar plus epsilon. OK, so let's do it. We have to solve the equation 2 exponential of minus 2n epsilon squared equals alpha. There's a 2 here, so I get alpha over 2 on one side; then I take logs on both sides, and I solve for epsilon. That means epsilon is equal to the square root of log(2 over alpha) divided by 2n. Yes?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Why is b minus a equal to 1? Well, let's just look. X lives in the interval [a, b]. I could take b to be 25 and a to be negative 42, but I'm going to try to be as sharp as I can. So what is the smallest interval you can think of in which a Bernoulli random variable always lives? What values does a Bernoulli random variable take? 0 and 1. So it takes values between 0 and 1.
And it actually attains those values; this is in fact the worst possible case for Hoeffding's inequality. So now I just get b minus a equal to 1. And so combining this and this implies that the probability that p lives between Xn bar minus the square root of log(2 over alpha) over 2n and Xn bar plus the square root of log(2 over alpha) over 2n is equal to... I mean, is at least. What is it at least equal to?

This bound controls the probability of being outside of the interval, right? It tells me the probability that Xn bar is far from p by more than epsilon, which is the probability of being outside of the interval I just wrote. So the probability of being inside is 1 minus the probability of being outside, which is at least 1 minus alpha. I just used the fact that the probability of the complement is 1 minus the probability of the set, and since I have an upper bound on the probability of the set, I have a lower bound on the probability of the complement.

So now it's a bit different from before. Let me get the earlier interval back. If we go back to the example where we took the worst case over p, we got Xn bar plus or minus q alpha over 2, divided by 2 square root of n. And now we have something that replaces this q alpha over 2, and it's essentially the square root of 2 log(2 over alpha). Because if I replace q alpha over 2 by the square root of 2 log(2 over alpha) in that formula, I get exactly this new margin.

And so the question is, what would you guess? Is this margin, the square root of log(2 over alpha) divided by 2n, smaller or larger than the other one, q alpha over 2 divided by 2 square root of n? Yes? Larger. Everybody agrees with this? Just qualitatively? Right, because we just made a very conservative statement.
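As a quick side-by-side comparison of the two margins (my own sketch):

```python
import numpy as np
from scipy.stats import norm

n, alpha = 100, 0.05
clt_margin = norm.ppf(1 - alpha / 2) / (2 * np.sqrt(n))   # ~ 0.098
hoeff_margin = np.sqrt(np.log(2 / alpha) / (2 * n))       # ~ 0.136

# Equivalently: Hoeffding replaces q_{alpha/2} by sqrt(2 log(2/alpha))
print(clt_margin, hoeff_margin, np.sqrt(2 * np.log(2 / alpha)))
```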
We did not use anything; this statement is true always, so its interval can only be wider. The reason why in statistics you use those assumptions, that n is large enough, that you have this independence that you like so much, so that the central limit theorem can kick in, is that all these things give you enough structure to make sharper and sharper decisions, more and more confident statements. And that's why there's all this junk science out there, because people make too many assumptions for their own good. They say, well, let's assume that everything is the way I love it, so that for sure, for any minor change, I will be able to say it's because I made an important scientific discovery, rather than, well, that was just chance. OK?

So now here's the fun moment. Let me tell you why we look at this. Who has seen different types of convergence in a probability or statistics class? [INAUDIBLE] students. So there are different types of convergence. For real numbers it's very simple: there's one convergence, xn tends to x. Once you start thinking about functions, well, maybe you have uniform convergence, you have pointwise convergence; if you've done some real analysis, you know there are different types of convergence you can think of. And for convergence of random variables there are also different types, but for a different reason. The question is, what do you do with the randomness? When you say that something converges to something, it probably means that you're willing to tolerate low-probability events on which it doesn't happen, and how you handle those creates the different types of convergence.

To be fair, in statistics the only convergence we really care about is convergence in distribution. That's this one, the one that comes from the central limit theorem. And it's actually the weakest possible one you could ask for.
Which is good, because that means it's going to happen more often. And why do we need only this? Because the only thing we really need is that when I compute probabilities on this random variable, they look like probabilities on that random variable.

So, for example, think of the following two random variables: X and minus X, where X is a standard Gaussian. So this is the same random variable, once with a negative sign. When I look at those two random variables, think of them as constant sequences. These two constant sequences do not go to the same limit, right? One is X, the other one is minus X. So unless X is the random variable always equal to 0, those two things are different. However, when I compute probabilities on this guy, and when I compute probabilities on that guy, they're the same, because X and minus X have the same distribution, just by symmetry of the Gaussian random variable.

And so you can see this is very weak. I'm not saying anything about the two random variables being close to each other each time I flip my coin. Maybe I press a key on my computer and ask, what is X? Well, it's 1.2. Then minus X is negative 1.2. Those things are far apart, and it doesn't matter, because the two have the same probabilities of whatever is happening. And that's all we care about in statistics. You need to realize that this is what's important, and that we have it really good here; it would be another matter if you really cared about convergence almost surely, which is probably the strongest you can think of.
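To see this numerically, a small sketch assuming numpy and scipy: samples of X and of minus X are indistinguishable as distributions, even though the variables are far apart realization by realization:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
x = rng.standard_normal(100_000)

# Same distribution: a two-sample Kolmogorov-Smirnov comparison
# sees no difference between X and -X
print(ks_2samp(x, -x).pvalue)      # large p-value

# But realization by realization they are far apart:
print(np.mean(np.abs(x - (-x))))   # about 2 E|X|, roughly 1.6, not 0
```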
So we're going to talk about different types of convergence, not just to reflect on how good our life is. The problem is that convergence in distribution is so weak that I cannot do everything I want with it. In particular, I cannot say that if Xn converges in distribution and Yn converges in distribution, then Xn plus Yn converges in distribution to the sum of their limits. I cannot do that; it's just too weak. Think of this example, and you see it prevents you from doing quite a lot of things: Xn converges in distribution to some N(0, 1), and minus Xn converges in distribution to some N(0, 1), but their sum is 0, and 0 certainly does not look like the sum of two independent Gaussian random variables, right? And so we need stronger conditions here and there, so that we can put things together. And we're going to have more complicated formulas. One of them, for example, is when I replace p by p hat in this denominator; we mentioned doing this at some point. There I would need p hat to go to p, but in a sense stronger than in distribution; I actually need this to happen in a stronger sense.

So here are the two strongest senses in which random variables can converge. The first one is almost surely. Who has already seen this notation, little omega, when talking about random variables? All right, so very few. So what is a random variable? A random variable is something that you measure on something that's random. The example I like to think of is: take a ball of snow and put it in the sun for some time. You come back; it's going to have a random shape, a random blob of something. But there's still a bunch of things you can measure on it. You can measure its volume. You can measure its inner temperature. You can measure its surface area. All these things are random variables, but the ball itself is omega. That's the thing on which you make your measurements. And a random variable is just a function of those omegas.

Now why do we make all these things fancy?
Because you cannot take any function. This function has to be what's called measurable; there are entire courses on measure theory, and not everything is measurable. That's why you have to be a little careful: you need some sort of nice properties, for example that the measure of the union of two things is less than the sum of the measures, things like that.

And so almost surely is telling you the following; that's the right-hand side. The probability of the set of omegas such that Tn of omega converges to T of omega is equal to 1. So it tells me that if I put together all the omegas for which the convergence holds, I get something of probability 1. There might be other omegas, forming a set of probability 0, where it fails, but the convergence happens for essentially every possible realization of the underlying thing. That's very strong. It essentially says randomness does not matter, because it's happening always.

Now convergence in probability allows you to squeeze a little bit of probability under the rug. It says: I want the convergence to hold, but I'm willing to let go of some little epsilon. So I'm willing to allow the absolute value of Tn minus T to be larger than epsilon, as long as the probability of that goes to 0 as n goes to infinity. But for each n this probability does not have to be 0, which is different from before, right? A positive probability here is fine for finite n. So it's a little weaker, and it's a slightly different statement. I'm not going to ask you to show that one is weaker than the other, but just know that these are two different types, and this one is actually much easier to check than that one.
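As an illustration of convergence in probability (a sketch with my own choice of example: Xn bar for i.i.d. Bernoulli(p), which converges in probability to p by the law of large numbers), we can watch P(|Xn bar minus p| > epsilon) shrink as n grows:

```python
import numpy as np

rng = np.random.default_rng(3)
p, eps = 0.5, 0.05

# Monte Carlo estimate of P(|Xbar_n - p| > eps) for growing n
for n in [10, 100, 1000, 10000]:
    xbars = rng.binomial(n, p, size=100_000) / n
    print(n, np.mean(np.abs(xbars - p) > eps))  # tends to 0
```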
Then there's something called convergence in Lp. This one embodies the following fact. If I give you a sequence of random variables with mean 0, and I tell you that their variance goes to 0 (think of Gaussian random variables with mean 0 and a variance that shrinks to 0), then the sequence converges to a spike at 0; it converges to 0, right? And the point is that to get this convergence, all I had to tell you was that the variance goes to 0. That is really what convergence in L2 is telling you. For a general limit (here what I described was a deterministic limit), Tn goes to a random variable T in L2 if the expectation of the squared distance between Tn and T goes to 0. But you don't have to limit yourself to the square. You can take the power 3; you can take the power 67.6, the power 9 pi; you can take whatever power you want, and it can be fractional. It has to be at least 1, and that's convergence in Lp. But we mostly care about integer p.

And then here's our star: convergence in distribution. That's the one that tells you that when I start computing probabilities on Tn, they're going to be very close to the probabilities on T. So Tn was this guy over there, for example, and T was the standard Gaussian. Now, this is not any probability; this is just the probability of being less than or equal to x. But if you remember your probability class, if you can compute those probabilities, you can compute any probabilities, just by subtracting and building things together. And I need this for all x's: you fix x, then you let n go to infinity, and I want this at every point x at which the cumulative distribution function of T is continuous. There might be jumps, and at those points I actually don't care.
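A companion sketch for convergence in distribution: the CDF of the standardized Bernoulli average, square root of n times (Xn bar minus p) over square root of p(1 minus p), approaching the standard normal CDF at a fixed point x:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
p, x = 0.3, 0.5  # fixed point x at which we compare the CDFs

for n in [5, 50, 500]:
    t = (rng.binomial(n, p, size=200_000) / n - p) \
        * np.sqrt(n / (p * (1 - p)))
    # P(T_n <= x) should approach Phi(x) as n grows
    print(n, np.mean(t <= x), norm.cdf(x))
```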
All right, so here I stated this for random variables. If you're interested, there are also random vectors; a random vector is just a table of random variables. You can talk about random matrices, and random whatever you want: every time you have an object that's just collecting real numbers, you can plug random variables in there. And all these definitions extend; where you see an absolute value, you'll see a norm, things like this.

I'm sure this might look a little scary, but really what we are going to use is only the last one, which, as you can see, is just telling you that the probabilities converge to the probabilities. Still, I'm going to need the other ones every once in a while. And the reason is... OK, so here are the important characterizations of convergence in distribution, which is our star convergence. Tn converges in distribution to T if and only if, for any function f that's continuous and bounded, the expectation of f of Tn converges to the expectation of f of T. So those two statements are equivalent; sometimes it's easier to check one, sometimes the other, but in this class you won't have to prove that something converges in distribution other than by combining existing convergence results.

And then the last one, which is equivalent to the above two. Anybody knows the name of this quantity, this expectation here? What is it called? The characteristic function, right? And this i is the complex i. So it's essentially telling me that rather than looking at all bounded continuous real functions, I can look at one specific family of complex functions: the functions that map T to e to the ixT, for x in R. That's a much smaller family; the set of all bounded continuous functions has many more elements than just this family. And one can show that restricting to this family is actually sufficient.
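A quick Monte Carlo illustration of that first characterization (my own sketch, with f chosen as a bounded continuous function): E[f(Tn)] approaches E[f(T)] when Tn is the standardized Bernoulli average and T is standard normal:

```python
import numpy as np

rng = np.random.default_rng(5)
p = 0.3

# f(t) = arctan(t - 1) is bounded and continuous
z = rng.standard_normal(500_000)
target = np.mean(np.arctan(z - 1))  # Monte Carlo E[f(T)], T ~ N(0, 1)

for n in [5, 50, 500]:
    t = (rng.binomial(n, p, size=500_000) / n - p) \
        * np.sqrt(n / (p * (1 - p)))
    print(n, np.mean(np.arctan(t - 1)), target)
```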
Those three characterizations are used all over the literature to prove things. In particular, if you're interested in digging a little deeper mathematically: the central limit theorem is so important that maybe you want to read about how to prove it. We're not going to prove it in this class. There are probably at least five different ways of proving it, but the most canonical one, the one you find in textbooks, is the one that uses the third characterization. You look at the characteristic function of square root of n times (Xn bar minus mu), you expand the thing, and in the end you get the characteristic function of a Gaussian.

Why a Gaussian? Why does it kick in? Well, what is the characteristic function of a Gaussian? Does anybody remember the characteristic function of a standard Gaussian?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Yeah, well, I mean, there are two pi's and stuff that go away, right? A Gaussian is a random variable; a characteristic function is a function, so it's not really the Gaussian itself, it just looks like it. Anybody knows what the actual formula is? Yeah.

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: E to the minus?

AUDIENCE: E to the minus x squared over 2.

PHILIPPE RIGOLLET: Exactly. E to the minus x squared over 2. And this x squared over 2 is actually just the second-order term in the Taylor expansion. And that's why the Gaussian is so important: it's just the second-order Taylor expansion. You can check it out; I think Terry Tao has some material on his blog, and there are a bunch of different proofs. But if you want to prove convergence in distribution, you are very likely going to use one of these three characterizations.
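For reference, a compressed version of that textbook argument, as a sketch (standard, with the error control glossed over): for i.i.d. Xi with mean mu and variance sigma squared, write Tn = square root of n times (Xn bar minus mu) over sigma. Then

```latex
\Phi_{T_n}(x)
  = \left( \mathbb{E}\!\left[ e^{\, i x (X_1 - \mu)/(\sigma \sqrt{n})} \right] \right)^{\!n}
  = \left( 1 - \frac{x^2}{2n} + o\!\left(\tfrac{1}{n}\right) \right)^{\!n}
  \;\xrightarrow[n \to \infty]{}\; e^{-x^2/2},
```

which is the characteristic function of N(0, 1), so Tn converges in distribution to a standard Gaussian by the third characterization.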
So let's move on. This is what I meant when I said that this convergence is weaker than that one: if you have convergence in one style, it implies convergence in another.

So the first claim is that if Tn converges almost surely (this "a.s." means almost surely), then it also converges in probability, and the two limits, this random variable T, are equal almost surely. Basically, it means that whatever you measure on one is going to be the same as what you measure on the other. So that is very strong: convergence almost surely is stronger than convergence in probability.

If you converge in Lp, then you also converge in Lq for any q less than p. So if you converge in L2, you also converge in L1. If you converge in L67, you converge in L2. If you converge in L infinity, you converge in Lp for any p. And again, the limits are equal. And then, convergence in distribution: when you converge in probability, you also converge in distribution.

OK, so almost surely implies probability; Lp implies probability; probability implies distribution. And note that here I did not write "and the limits are equal almost surely." Why? Because convergence in distribution is actually not telling you that your random variable converges to another random variable. It is telling you that the distribution of your random variable converges to a distribution. Think of this, guys: X and minus X. The central limit theorem tells me that I am converging to some standard Gaussian distribution, but am I converging to X or to minus X? It is not well identified; the limit is any random variable that has this distribution. So there is no way the limits are equal: their distributions are the same, but they are not the same limit. Is that clear for everyone? So in a way, convergence in distribution is really not convergence of a random variable towards another random variable. It is just telling you the limiting distribution of your random variable, which is enough for us.
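The X versus minus X point can be seen numerically. The sketch below (illustrative values only) draws X ~ N(0,1) and checks that X and −X have matching quantiles, so the same law, while |X − (−X)| stays large, so no convergence of random variables is taking place:

```python
# X and -X have the same distribution but are not the same random variable.
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)

qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.quantile(x, qs))    # quantiles of X
print(np.quantile(-x, qs))   # quantiles of -X: essentially identical

# But as random variables they stay far apart:
print("E|X - (-X)| =", np.abs(x - (-x)).mean())  # = 2 E|X|, about 1.6
```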
And one thing that is actually really nice is the continuous mapping theorem. This is one of the theorems we like, because it tells us that you can do what you feel like doing: if Tn goes to T, then f(Tn) goes to f(T), and this is true for any of those convergences except convergence in Lp. But you have to have f continuous, otherwise weird stuff can happen. This is going to be convenient, because here I do not just have X̄n minus p; I have a continuous function of it. Granted, it is only a linear function of X̄n minus p, but I could think of even crazier things to do, and it would still be true. If I took the square, it would converge to something whose distribution is the distribution of a squared Gaussian.
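Here is a small sketch of that last claim, with g(x) = x² and made-up parameters: the square of the standardized Bernoulli average should be distributed like the square of a standard Gaussian (a chi-squared with one degree of freedom), and we compare empirical quantiles against a simulated Z²:

```python
# Continuous mapping with g(x) = x^2: the squared standardized average
# should be distributed like the square of a standard Gaussian.
import numpy as np

rng = np.random.default_rng(2)
p, n, reps = 0.3, 200, 50_000
x = rng.binomial(1, p, size=(reps, n))
t_n = np.sqrt(n) * (x.mean(axis=1) - p) / np.sqrt(p * (1 - p))

z = rng.standard_normal(reps)  # Monte Carlo stand-in for N(0,1)
qs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(t_n ** 2, qs))  # close to the chi-squared_1 quantiles
print(np.quantile(z ** 2, qs))
```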
So these two slides are a mouthful; this particular slide is a mouthful. What I have had in my head since I was pretty much where you are sitting is this diagram. It is deliberately cropped, so you can start from any Lq you want, as large as you like. As you decrease the index, you keep implying, implying, implying, until you imply convergence in probability. Convergence almost surely implies convergence in probability, and everything flows to the sink, which is convergence in distribution. So everything implies convergence in distribution. Rather than memorizing those formulas, this diagram is really what you want to remember.

All right, so why do we bother learning about these things? Because of limits and operations; operations and limits. If I have a sequence of real numbers, and I know that Xn converges to X and Yn converges to Y, then I can start doing all my manipulations and things are happy: I can add stuff, I can multiply stuff. But that is not always true for convergence in distribution.

What is nice is that it is true for convergence almost surely. For convergence almost surely, everything works; it is just impossible to make it fail. Convergence in probability does not give you everything, but at least you can add and multiply: Un + Vn still converges to the sum of the limits, and Un Vn to the product of the limits. You can even take the ratio, if V is not 0 of course; if the limit is not 0, you also need Vn to be nonzero.

You can actually prove this last statement, right? It is a combination of the product rule and the continuous mapping theorem, because the function that maps x to 1/x is continuous everywhere except at 0. So 1/Vn converges to 1/V, and then I can multiply the two. So you actually knew that one.

But really this is not what matters, because this is something you will do whatever happens: if I do not tell you that you cannot do it, well, you will do it. In general, though, those operations do not apply to convergence in distribution, unless the pair itself is known to converge in distribution. Remember when I said these notions apply to vectors: you would need to say that the vector (Un, Vn) converges in distribution to the limiting vector. And since the cumulative distribution function is not defined for vectors, I would have to use one of the other criteria: convergence of characteristic functions, or convergence of expectations of bounded continuous functions of the random vector. Point 2 or point 3, but point 1 is not going to get you anywhere. This is too hard for us to deal with, so we are going to rely on something even better. There is something waiting for us at the end of this lecture, called Slutsky's theorem, which says that if V converges in probability while U converges in distribution, I can actually still do this.
I do not actually need both of them to converge in probability; only one of them needs to converge in probability for the statement, for the sum and for the product, to hold.

So let's go to another example. I just want to make sure that we keep doing statistics; every time we do a little too much probability, I am going to reset the pressure and start doing statistics again. All right, so assume you observe the inter-arrival times of the T at Kendall. This is not the arrival times, not 7:56, 8:15. No, it is really the inter-arrival times: say the next T is arriving in six minutes, say [INAUDIBLE] bound. And so you have these inter-arrival times, numbers like 3, 4, 5, 4, 3, et cetera. So I have this sequence of numbers. I am going to observe this, and I am going to try to infer from it the rate at which T's go out of the station.

I am going to assume that these things are mutually independent. That is probably not completely true. What it means is that two consecutive inter-arrival times are independent. You can make them independent if you want, but again, this independence assumption is there for us to be happy and safe. Unless someone comes with overwhelming proof that the data are far from independent, we are fine. It might fail, though: if a T is one hour late, meaning an inter-arrival time is one hour, then for the next T, either they fixed the problem and it is just 30 seconds behind, or they have not fixed it and it is another hour behind. So they are not exactly independent, but they approximately are when things work well.

And so now I need to model a random variable that is positive, and maybe not bounded above; people complain enough that this thing can be really large.
And one thing that people like for inter-arrival times is the exponential distribution. That is a positive random variable; it looks like a decaying exponential on the positive half-line, so it decays very fast towards 0. The probability of seeing very large values is exponentially small, and there is a parameter lambda that controls how the exponential is defined: the decay is e to the minus lambda times something. And we are going to assume they all have the same distribution, so they are IID: independent, and identically distributed with this exponential distribution with parameter lambda. I am going to try to learn something about lambda: what is an estimate of lambda, and can I build a confidence interval for lambda?

So we observe n inter-arrival times. As I said, the mutual independence is plausible but not completely justified. The fact that they are exponential is something people like in all of what is called queuing theory: exponentials arise a lot when you talk about inter-arrival times. It is not so much about the bus; where it is very important is call centers and servers, where tasks come in and people want to know how long it will take to serve a task. When I call a center, nobody knows how long I am going to stay on the phone with this person, but it turns out that, empirically, exponential distributions have been very good at modeling this.

And what it means is that you have this memoryless property. It is kind of crazy if you think about it. What does that thing say? Let's parse it. That is a probability conditioned on the fact that T1 is larger than t, where T1 is, say, the first inter-arrival time.
It means that, conditionally on the fact that I have been waiting for the first subway for more than t minutes, so I have been there t minutes already, the probability that I wait for s more minutes, which is the probability that T1 is larger than the time I have already waited, t, plus s, is the probability that I wait for s minutes total. It is completely memoryless: it does not remember how long you have been waiting; the probability does not change. You could have waited for two hours, and the probability that it takes another 10 minutes is the same as if you had been waiting for zero minutes. And that is something that is actually part of your problem set; it is very easy to compute. It is just an analytical property: you manipulate functions, and you see that this thing happens to be true, and that is something people like.
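For the record, the computation from the problem set is one line, using the exponential tail P(T1 > t) = e^{−λt}:

```latex
\[
\mathbb{P}(T_1 > t + s \mid T_1 > t)
  = \frac{\mathbb{P}(T_1 > t + s)}{\mathbb{P}(T_1 > t)}
  = \frac{e^{-\lambda(t+s)}}{e^{-\lambda t}}
  = e^{-\lambda s}
  = \mathbb{P}(T_1 > s).
\]
```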
And what we also like is that this thing is positive almost surely, which is good when you model arrival times. To be fair, we are not going to be that careful, because sometimes we will just assume that something follows a normal distribution. In particular, and I do not know if we will go into the details, a good thing you can model with a Gaussian distribution is heights of students. But technically, with positive probability, you can get a negative Gaussian random variable, right? The probability is maybe 10 to the minus 25, but it is positive. Still, it is good enough for our modeling. So positivity is nice, but it is not going to be required: when you model positive random variables, you do not always have to use distributions supported on the positive numbers. You can use distributions like the Gaussian.

So now, this exponential distribution: T1 to Tn have the same parameter, and that means that on average they have the same inter-arrival time; this lambda is tied to the expectation. What I am saying is that being identically distributed means I am in some sort of stationary regime, and that is not always true. I have to look at a shorter period of time, because at rush hour and at 11:00 PM those average inter-arrival times are clearly going to be different. So it means that I am really focusing, maybe, on rush hour.

Sorry, I said the expectation is lambda; it is actually 1 over lambda. I always mix up the two. All right, so you have the density of T1: f(t) is lambda e to the minus lambda t, on the positive real line. Whether you take t strictly positive or greater than or equal to 0 makes no difference. The lambda in front just ensures that when I integrate this function between 0 and infinity, I get 1. And you can see it decays like e to the minus lambda t. So if I were to draw it: at 0 it takes the value lambda, and then it decays like e to the minus lambda t. So there is a very small probability of being very large; of course, it depends on lambda.

Now, you can compute the expectation of this thing: you integrate t times f(t). This is part of the little sheet I gave you last time; it is one of the things you should be able to do blindfolded. And you get that the expectation of T1 is 1 over lambda. That is what comes out.
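Written out, the blindfold computation is one integration by parts:

```latex
\[
\mathbb{E}[T_1]
  = \int_0^\infty t\,\lambda e^{-\lambda t}\,dt
  = \Big[-t\,e^{-\lambda t}\Big]_0^\infty + \int_0^\infty e^{-\lambda t}\,dt
  = 0 + \frac{1}{\lambda}
  = \frac{1}{\lambda}.
\]
```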
As I actually tell many of my students, 99% of statistics is replacing expectations by averages. So what you are tempted to do is say: well, if on average I am supposed to see 1 over lambda, and I have 15 observations, I am just going to average those observations, and I should see something close to 1 over lambda. So statistics is about replacing expectations with averages, and that is what we do. So T̄n here, the average of the Ti's, is a pretty good estimator of 1 over lambda. And if I want an estimate of lambda, then I need to take 1 over T̄n.

So here is one estimator. I did it without much principle, except that I wanted to replace an expectation by an average, and then I fixed the problem that I was estimating 1 over lambda rather than lambda. You could come up with other estimators, but let's say this is my way of getting to that estimator, just like I did not give you any principled way of getting p hat, which was X̄n in the kiss example. It is the natural way to do it. Is everybody completely shocked by this approach?

All right, so let's do this. What can I say about the properties of this estimator lambda hat? Well, I know that T̄n is going to 1 over lambda by the law of large numbers. It is an average; it converges to the expectation, both almost surely and in probability. The first is the strong law of large numbers, the second is the weak law. I can apply the strong one; I have enough conditions. And hence, what do I apply so that 1 over T̄n actually goes to lambda? I said "hence"; what is that hence? What is it based on?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Yeah, the continuous mapping theorem, right? I have this function 1 over x, and I just apply it. If it were 1 over lambda squared, the same thing would happen, just because the function 1 over x is continuous away from 0.
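A minimal simulation sketch of this estimator, with an assumed rate λ = 0.2 (a mean inter-arrival time of five minutes, made up for illustration):

```python
# Plug-in estimation of the exponential rate: average, then invert.
# lambda = 0.2 (mean 5 minutes) is an assumption made up for illustration.
import numpy as np

rng = np.random.default_rng(3)
lam = 0.2

for n in (10, 100, 10_000):
    t = rng.exponential(scale=1 / lam, size=n)  # numpy uses scale = 1/lambda
    lam_hat = 1 / t.mean()  # continuous mapping applied to Tbar_n
    print(n, lam_hat)       # approaches 0.2 as n grows
```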
And now the central limit theorem is also telling me something about T̄n, right? It is telling me that if I take my average, remove its expectation, and rescale by the standard deviation, then this thing converges to some Gaussian random variable. But here I have this lambda to the negative 2, and that is because I did not tell you what you get when you compute the variance. So from this, let's extract it. If T follows an exponential distribution with parameter lambda, well, we know that the expectation of T is 1 over lambda. What is the variance of T? You should be able to read it from what is on the slide: 1 over lambda squared. That is what you read off as the variance, because the central limit theorem is really telling you the limiting distribution as n grows, but these two numbers you can read directly. If you look at the expectation of the numerator, the expectation of T̄n comes out as 1 over lambda, minus 1 over lambda; that is why you read the 0. And if you look at the variance of the numerator, you get n times the variance of the average; the variance of the average picks up a factor of 1 over n, so the n cancels, and I am left with a single variance, which is 1 over lambda squared.

We are not going to do that in detail, because, again, it is a pure calculus exercise. But if you compute the integral from 0 to infinity of lambda e to the minus lambda t, times t minus 1 over lambda squared, dt, you will see that this thing is 1 over lambda squared. How would I do this? Integration by parts, or you just know it.
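And for the record, the variance computation reads:

```latex
\[
\mathbb{E}[T^2] = \int_0^\infty t^2\,\lambda e^{-\lambda t}\,dt = \frac{2}{\lambda^2},
\qquad
\operatorname{Var}(T) = \mathbb{E}[T^2] - \big(\mathbb{E}[T]\big)^2
  = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\]
```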
All right, so this is what the central limit theorem tells me. If I solve this, so if I multiply by lambda and solve, it will give me something like a confidence interval for 1 over lambda. If we just think of 1 over lambda as being the p that I had before, this gives me a confidence interval for 1 over lambda. I am hiding a little bit under the rug the fact that I still have to define things, so let's actually go through it; I see some of you are uncomfortable with this, so let's just do it.

What we have just proved by the central limit theorem is that the probability that the absolute value of sqrt(n) (T̄n − 1/λ), divided by the standard deviation, exceeds q_{α/2} is approximately equal to alpha. Sorry, I did not write it correctly at first: I still have to divide by sqrt(1/λ²), which is the standard deviation, right? That is just the statement of the central limit theorem, and by "approximately equal" I mean as n goes to infinity.

And we said this is a bit ugly, so let's do it the way it should be done: multiply everything by lambda. So with probability 1 minus alpha, asymptotically, I have that the absolute value of sqrt(n) (λ T̄n − 1) is less than or equal to q_{α/2}. What that means is that I have minus q_{α/2}, less than sqrt(n) (λ T̄n − 1), less than q_{α/2}; let me divide through by sqrt(n). And so now what I get is that lambda is between (1 − q_{α/2}/sqrt(n)) / T̄n and (1 + q_{α/2}/sqrt(n)) / T̄n.

So it is kind of a weird shape, but it is still of the form 1/T̄n plus or minus something. Except this something depends on T̄n itself. And that is actually natural, because T̄n is not only giving me information about the mean; it is also giving me information about the variance. So it should definitely come into the size of my error bars, and this is the fairly natural way it comes in. Everybody agrees?
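A quick Monte Carlo check that this interval does what it promises, under the same illustrative λ = 0.2, with q_{α/2} ≈ 1.96 for α = 5% (n, reps, and the seed are made up):

```python
# Coverage of the interval [(1 - q/sqrt(n))/Tbar_n, (1 + q/sqrt(n))/Tbar_n]
# with q = q_{alpha/2} ~ 1.96 for alpha = 5%. Illustrative lambda and n.
import numpy as np

rng = np.random.default_rng(4)
lam, n, q, reps = 0.2, 200, 1.96, 20_000

t_bar = rng.exponential(scale=1 / lam, size=(reps, n)).mean(axis=1)
lo = (1 - q / np.sqrt(n)) / t_bar
hi = (1 + q / np.sqrt(n)) / t_bar
print("coverage:", np.mean((lo <= lam) & (lam <= hi)))  # near 0.95
```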
So now I have actually built a confidence interval. But what I want to show you with this example is: can I translate this into a central limit theorem for something that converges to lambda? I know that T̄n converges to 1 over lambda, but I also know that 1 over T̄n converges to lambda. So do I have a central limit theorem for 1 over T̄n? Technically no, right? Central limit theorems are about averages, and 1 over an average is not an average.

But there is something that statisticians like a lot, and it is called the Delta method. The Delta method is really telling you that you can take a function of an average, let it go to the function of the limit, and still have a central limit theorem. And the price to pay for this is a factor that depends on the derivative of the function.
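Stated compactly, the version of the Delta method used here is: if sqrt(n)(Zn − θ) converges in distribution to N(0, σ²) and g is continuously differentiable at θ, then

```latex
\[
\sqrt{n}\,\big(g(Z_n) - g(\theta)\big)
  \xrightarrow[n\to\infty]{(d)} \mathcal{N}\big(0,\; g'(\theta)^2\,\sigma^2\big).
\]
```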
So let's just go through this. Like the proof of the central limit theorem, and like many results in asymptotic statistics, this is just a Taylor expansion; here it is not even second order, it is first order. So I am just going to do a linear approximation of this function.

So let's do it, using the notation of the slide, Zn and theta. What I know is that sqrt(n) (Zn − θ) goes to some Gaussian. No, not standard: N(0, σ²). OK, so that is the assumption. And what I want to show is some convergence of g(Zn) to g(θ). I am not going to multiply by root n just yet; I first do a first-order Taylor expansion. What it tells me is that g(Zn) − g(θ) is equal to (Zn − θ) times g prime of, let's call it theta bar, where theta bar is somewhere between Zn and theta; if theta is less than Zn, you just permute those two. That is what the first-order Taylor expansion tells me: there exists a theta bar between the two values at which I am expanding such that those two things are equal. Is everybody shocked? No? So that is the standard Taylor expansion.

Now I am going to multiply by root n. And what is that going to be? That is going to be root n (Zn − θ), ah-ha, that is something I like, times g prime of theta bar. Now, the central limit theorem tells me that the first factor goes to what? Well, it goes to some N(0, σ²); that was the first line over there. This other factor, well, it is not clear, right? Actually, it is. Let's start with theta bar: what does theta bar go to? Well, I know that Zn is going to theta, just because that is my law of large numbers. Zn is going to theta, which means that theta bar is sandwiched between two values that converge to theta. So theta bar converges to theta itself as n goes to infinity. That is just the law of large numbers. Everybody agrees? Just because it is sandwiched, right? I have Zn, I have theta, and theta bar is somewhere in between. The picture might be reversed; Zn might be larger than theta. But the law of large numbers tells me that this endpoint is not moving while that endpoint is moving toward it. So when n is large, there is very little wiggle room for theta bar, and it can only go to theta. I call it the sandwich theorem, or just find your favorite food for it.

So theta bar goes to theta, and now I need to make an extra assumption, which is that g prime is continuous. And if g prime is continuous, then g prime of theta bar goes to g prime of theta. So this factor goes to g prime of theta.
But I have an issue here: now I have something that converges in distribution and something that converges in, say -- I mean, this one converges almost surely, or let's say in probability just to be safe -- and that one converges in distribution. And I want to combine them. But I do not have a slide that tells me I am allowed to take the product of something that converges in distribution with something that converges in probability. That does not exist. Actually, if anything, the slides told me: do not do anything with things that converge in distribution.

And that gets us to -- OK, I will come back to this in a second -- that gets us to something called Slutsky's theorem. Slutsky's theorem tells us that in very specific cases, you can do just that. So you have two sequences of random variables: Xn, which converges to X, and Yn, which converges to Y. But Y is not just anything; Y is not any random variable. Sorry, I forgot to mention this, and it is very important: Xn converges in distribution, Yn converges in probability. And we know that in general we cannot combine those two things, but Slutsky tells us that if the limit of Yn is a constant, meaning it is not a random variable but a deterministic number, say 2, just a fixed number, then you can combine them. Then you can sum them, and you can multiply them. Actually, you can do whatever combination you want, because it implies that the vector (Xn, Yn) converges in distribution to the vector (X, c). Here I just took two combinations that are very convenient for us, the sum and the product; I could do other things, like the ratio if c is not 0, things like that.
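A sketch of Slutsky's product statement in action, with made-up ingredients: Xn is a standardized mean of Uniform(0, 4) samples, so Xn → N(0, 1) in distribution, while Yn, the raw sample mean, goes to c = 2 in probability; the product should then look like N(0, 4):

```python
# Slutsky: X_n -> N(0,1) in distribution, Y_n -> 2 in probability,
# so X_n * Y_n -> N(0,4) in distribution. Illustrative ingredients only.
import numpy as np

rng = np.random.default_rng(5)
n, reps = 500, 20_000

u = rng.uniform(0, 4, size=(reps, n))      # mean 2, variance 4/3
x_n = np.sqrt(n) * (u.mean(axis=1) - 2) / np.sqrt(4 / 3)  # CLT factor
y_n = u.mean(axis=1)                       # law of large numbers factor

print("std of X_n * Y_n:", np.std(x_n * y_n))  # close to 2 = sqrt(4)
```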
So that is what Slutsky does for us. And something you are going to write a lot in your homework and your midterms is "by Slutsky." I know some people are very generous with their "by Slutsky": they do numerical applications, mu is equal to 6, and therefore by Slutsky mu squared is equal to 36. All right, so don't do that; write "by Slutsky" only when you are actually using Slutsky. But this is something very important for us, and it turns out you are going to feel like you can write "by Slutsky" all the time, because it is going to work for us all the time. Everything we are going to see involves combining things; since we only get convergence in distribution, arising from the central limit theorem, we have to rely on something that allows us to combine, and the only such thing we know is Slutsky. So we had better hope that this thing works.

So why does Slutsky work for us? Can somebody tell me why Slutsky lets me combine those two factors? This one converges in distribution; this one converges in probability, but to a deterministic number: g prime of theta is a deterministic number. I do not know what theta is, but it is certainly deterministic. All right, so I can combine them, multiply them; that is just the second line of the theorem, in particular. Everybody with me?

So now I am allowed to do this. You will see counterexample-type questions in your problem set, just so you can convince yourselves; it is always a good thing. I do not like to hand counterexamples out, because I think it is much better for you to come to the counterexample yourself. Like: what can go wrong if c is not a constant but a random variable? You can figure that out.

All right, so let's go back. We now have this Delta method, which tells us that we have a central limit theorem for functions of averages, and not just for averages. So the only price to pay is the derivative showing up there.
So, for example, if g is just a linear function, then I get a constant factor. If g is a quadratic function, then a theta squared shows up there. Things like that. So just think about what kinds of applications you could have for this.

Here, the function we are interested in is x maps to 1 over x. What is the derivative of this guy? What is the derivative of 1 over x? Negative 1 over x squared, right? That is the thing we are going to put in there. And so this is what we get.

So now, when I actually write this out: I want sqrt(n) (λ̂ − λ); that is my target. Here λ̂ is 1 over T̄n, and λ is 1 over (1/λ). So the function g(x) is 1/x in this case. So now I have this thing, and I know by the Delta method -- oh, and remember, we knew that sqrt(n) (T̄n − 1/λ) goes to some normal with mean 0 and variance 1 over lambda squared, so the sigma squared over there is 1 over lambda squared -- that this thing converges to some normal. What is the mean going to be? 0. And what is the variance? Well, I am going to pick up this guy, 1 over lambda squared, and then I am going to have to take g prime of what? Of 1 over lambda, right? That is my theta. So I need g prime of 1 over lambda. And what is g prime of 1 over lambda? We said g prime of x is negative 1 over x squared, so it is negative 1 over (1/λ) squared. It is nice that it gets squared, because g here is decreasing, and it would be annoying to have a negative variance. And so what I get eventually is a lambda squared up there, but then I square it again.
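Putting the pieces into the Delta method, with g(x) = 1/x, θ = 1/λ, and σ² = 1/λ²:

```latex
\[
g'(\theta)^2\,\sigma^2
  = \Big(-\frac{1}{(1/\lambda)^2}\Big)^{2} \cdot \frac{1}{\lambda^2}
  = \frac{\lambda^4}{\lambda^2}
  = \lambda^2,
\qquad\text{so}\qquad
\sqrt{n}\,(\hat\lambda - \lambda) \xrightarrow{(d)} \mathcal{N}(0,\lambda^2).
\]
```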
So what does this whole thing become? Can somebody tell me the final result? Lambda squared, right? It is lambda to the fourth divided by lambda squared. So that is what is written there.

And now I can do my good old computation for a confidence interval. All right, so let's just go from the second line. We know that the absolute value of λ̂ − λ is less than -- we have done this several times already -- q_{α/2}, the quantile of order alpha over 2, times lambda divided by square root of n. And so that means my confidence interval should be: lambda belongs to λ̂ plus or minus q_{α/2} λ / sqrt(n). So that is my confidence interval. But again, it is not usable, because -- sorry, the center is lambda hat -- we do not know lambda, so we do not know how to compute it.

So now I am going to request from the audience some remedies for this. What do you suggest we do? What is the laziest thing I can do? Anybody? Yeah.

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Replace lambda by lambda hat. What justifies me doing this?

AUDIENCE: [INAUDIBLE]

PHILIPPE RIGOLLET: Yeah, and Slutsky tells me I can actually do it, because Slutsky tells me -- where does this lambda come from, right? This lambda comes from here; that is the one in the denominator. So I could actually rewrite this entire thing as: sqrt(n) (λ̂ − λ) / λ converges to some N(0, 1). Now, if I replace this lambda by lambda hat, what I have is really the original quantity times λ / λ̂, and this converges to N(0, 1), right? And now what you are telling me is: this factor, I know it converges to N(0, 1), and this factor is converging to 1 by the law of large numbers. But this one is converging to 1, which happens to be a constant.
It converges in probability, so by Slutsky I can take the product and still maintain my convergence in distribution to a standard Gaussian. So you can always do this: every time you replace some p by p hat, as long as their ratio goes to 1, which is guaranteed by the law of large numbers, you are going to be fine. And that is where we are going to use Slutsky a lot: when we do plug-in, Slutsky is going to be our friend. OK, so we can do this, and that is one way.

The other way is to just solve for lambda, like we did before. So the first interval we got -- I do not know if I still have it somewhere; yes, that was the one -- was (1 ± q_{α/2}/sqrt(n)) / T̄n, and that is exactly the same as what we have here. So your solution gives us exactly this interval when we actually solve for lambda.

So this is what we get: lambda hat. We replace lambda by lambda hat, and we have our asymptotic confidence interval. And that is exactly where Slutsky's theorem comes in; at this point it is just telling us that we can actually do this. Are there any questions about what we did here?

So this derivation right here is exactly what I did on the board. Let me just show it with a little more space, so that we all understand, right? We know that sqrt(n) (λ̂ − λ) / λ, with the true lambda in the denominator, converges to some N(0, 1). So that was the CLT plus the Delta method; applying those two, we got here. And we know that lambda hat converges to lambda, in probability and almost surely. And what was that? That was the law of large numbers plus the continuous mapping theorem, right? Because we only knew that T̄n, which is 1 over lambda hat, converges to 1 over lambda, so we had to flip those things around.
1571 01:14:31,590 --> 01:14:33,920 And now what I said is that I apply Slutsky, 1572 01:14:33,920 --> 01:14:38,210 so I write square root of n times lambda hat minus lambda divided 1573 01:14:38,210 --> 01:14:42,260 by lambda hat, which is the suggestion that was made to me. 1574 01:14:42,260 --> 01:14:44,160 They said, I want this, but I would 1575 01:14:44,160 --> 01:14:45,910 want to show that it converges to some N(0, 1576 01:14:45,910 --> 01:14:49,970 1) so I can legitimately use q alpha over 2 with this one, 1577 01:14:49,970 --> 01:14:50,745 too. 1578 01:14:50,745 --> 01:14:53,120 And the way we said it is, well, this thing is actually 1579 01:14:53,120 --> 01:15:00,737 really the same quantity divided by lambda, times lambda divided by lambda hat. 1580 01:15:00,737 --> 01:15:02,320 So this thing that was proposed to me, 1581 01:15:02,320 --> 01:15:03,730 I can decompose it into the product 1582 01:15:03,730 --> 01:15:05,980 of those two random variables. 1583 01:15:05,980 --> 01:15:09,060 The first one here converges to the Gaussian 1584 01:15:09,060 --> 01:15:10,600 from the central limit theorem. 1585 01:15:10,600 --> 01:15:14,718 And the second one converges to 1 from this guy, 1586 01:15:14,718 --> 01:15:17,038 but in probability this time. 1587 01:15:20,620 --> 01:15:23,260 That was the ratio of two things converging in probability, 1588 01:15:23,260 --> 01:15:25,030 so we can actually get it. 1589 01:15:25,030 --> 01:15:26,753 And so now I apply Slutsky. 1590 01:15:31,180 --> 01:15:34,537 And Slutsky tells me that I can actually do that: 1591 01:15:34,537 --> 01:15:36,870 when I take the product of this thing that converges 1592 01:15:36,870 --> 01:15:40,010 to some standard Gaussian, and this thing that converges 1593 01:15:40,010 --> 01:15:43,380 in probability to 1, then their product actually 1594 01:15:43,380 --> 01:15:48,618 converges to still this standard Gaussian [INAUDIBLE] 1595 01:15:55,370 --> 01:15:58,880 Well, that's exactly what's done here, 1596 01:15:58,880 --> 01:16:02,340 and I think I'm getting there. 1597 01:16:02,340 --> 01:16:07,570 So in our case, OK, so just a remark on Slutsky's theorem. 1598 01:16:07,570 --> 01:16:09,070 So that's the last line. 1599 01:16:09,070 --> 01:16:11,850 So in the first example we used a problem-dependent trick, 1600 01:16:11,850 --> 01:16:13,980 which was to say, well, it turns out 1601 01:16:13,980 --> 01:16:16,380 that we knew that p is between 0 and 1. 1602 01:16:16,380 --> 01:16:18,960 So we had this p 1 minus p that was annoying to us. 1603 01:16:18,960 --> 01:16:21,240 We just said, let's just bound it by 1/4, 1604 01:16:21,240 --> 01:16:23,870 because that's going to be true for any value of p. 1605 01:16:23,870 --> 01:16:26,310 But here, lambda takes any value between 0 and infinity, 1606 01:16:26,310 --> 01:16:27,612 so we didn't have such a trick-- 1607 01:16:27,612 --> 01:16:29,820 unless somehow we knew that lambda was less 1608 01:16:29,820 --> 01:16:30,970 than something. 1609 01:16:30,970 --> 01:16:34,070 Maybe we know it, in which case we could use that. 1610 01:16:34,070 --> 01:16:36,844 But then in this case, we could actually also 1611 01:16:36,844 --> 01:16:39,010 have used Slutsky's theorem by doing plug-in, right? 1612 01:16:39,010 --> 01:16:41,890 So here this is my p 1 minus p that's replaced by p hat 1 1613 01:16:41,890 --> 01:16:43,060 minus p hat. 1614 01:16:43,060 --> 01:16:45,084 And Slutsky justifies it-- we did that 1615 01:16:45,084 --> 01:16:46,500 without really thinking last time.
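Written out in display form, the board decomposition just described is the following; this is a sketch in the lecture's notation, where (d) marks convergence in distribution and P convergence in probability.

    \sqrt{n}\,\frac{\hat\lambda - \lambda}{\hat\lambda}
      = \underbrace{\sqrt{n}\,\frac{\hat\lambda - \lambda}{\lambda}}_{\xrightarrow{(d)}\ \mathcal{N}(0,1)}
        \cdot
        \underbrace{\frac{\lambda}{\hat\lambda}}_{\xrightarrow{P}\ 1}
      \ \xrightarrow{(d)}\ \mathcal{N}(0,1) \quad \text{(Slutsky)}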
1616 01:16:46,500 --> 01:16:48,700 Again, Slutsky is what actually justifies 1617 01:16:48,700 --> 01:16:51,225 that this plug-in is valid, and it still allows me to use 1618 01:16:51,225 --> 01:16:52,940 this q alpha over 2 here. 1619 01:16:56,230 --> 01:16:58,180 All right, so that's the end of this lecture. 1620 01:16:58,180 --> 01:17:01,300 Tonight I will post the next set of slides, chapter two. 1621 01:17:01,300 --> 01:17:04,060 And, well, hopefully the video. 1622 01:17:04,060 --> 01:17:06,810 I'm not sure when it's going to come out.