The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

PROFESSOR: OK, so good afternoon. Today, we will review probability theory. I will mostly focus on-- I'll give you some distributions, the probability distributions that will be of interest to us throughout the course. And I will talk about the moment-generating function a little bit. Afterwards, I will talk about the law of large numbers and the central limit theorem.

Who has heard of all of these topics before? OK. That's good. Then I'll try to focus a little bit more on the advanced stuff, and a big part of it will be review for you.

So first of all, just to agree on terminology, let's review some definitions. We will talk about discrete and continuous random variables. Just to set up the notation, I will write a discrete random variable as X and a continuous random variable as Y for now. Each is given by its probability distribution: a discrete random variable is given by its probability mass function, which I will denote f sub X, and a continuous random variable is given by its probability density function, which I will denote f sub Y. So pmf and pdf.

Here, I just use subscripts because I wanted to distinguish f sub X and f sub Y. But when it's clear which random variable we're talking about, I'll just say f.

So what is this? A probability mass function is a function from the sample space to the non-negative reals such that the sum over all points in the domain equals 1. The probability density function is very similar: a function from the sample space to the non-negative reals, but now the integral over the domain equals 1. And it's pretty much safe to consider our sample space to be the real numbers for continuous random variables.
Later in the course, you will see some examples where it's not the real numbers. But for now, just consider it as the real numbers.

For example, a probability mass function: if X takes 1 with probability 1/3, minus 1 with probability 1/3, and 0 with probability 1/3, then the probability mass function is f_X(1) = f_X(-1) = f_X(0) = 1/3, just like that. An example of a continuous random variable is, let's say, f_Y(y) = 1 for all y in [0,1]; then this is the pdf of a uniform random variable where the sample space is [0,1]. So the first random variable picks one out of three numbers with equal probability, and the second picks one out of all the real numbers between 0 and 1 with equal probability.

These are just some basic things. You should be familiar with them, but I wrote them down just so that we agree on the notation. OK. Both of the boards don't slide. That's good.

A few more things. Expectation-- probability first. The probability of an event A can be computed either as the sum over all points in A of the probability mass function, P(A) = sum_{x in A} f_X(x), or as the integral over the set A, P(A) = integral_A f_Y(y) dy, depending on which kind of random variable you're using. And the expectation, or mean, is E[X] = sum over all x of x * f_X(x), and E[Y] = integral over the sample space of y * f_Y(y) dy.
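A minimal numeric sketch of the two examples above, assuming NumPy (the sample size is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Discrete X: takes -1, 0, 1, each with probability 1/3.
values = np.array([-1, 0, 1])
pmf = np.array([1/3, 1/3, 1/3])
print(pmf.sum())             # pmf sums to 1
print((values * pmf).sum())  # E[X] = 0

# Continuous Y: uniform on [0, 1], f_Y(y) = 1 on [0, 1].
samples = rng.uniform(0.0, 1.0, size=100_000)
print(samples.mean())        # close to E[Y] = 1/2
```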
PROFESSOR: And one more basic concept I'd like to review: two random variables X_1, X_2 are independent if the probability that X_1 is in A and X_2 is in B equals the product of the probabilities, P(X_1 in A and X_2 in B) = P(X_1 in A) * P(X_2 in B), for all events A and B. OK. All agreed?

As for independence, I will talk about independence of several random variables as well. There are several concepts of independence; the two most common are mutually independent events and pairwise independent events. Can somebody tell me the difference between these two for several variables? Yes?

AUDIENCE: So usually, independent means all the random variables are independent, like X_1 is independent with every other. But pairwise means X_1 and X_2 are independent, while X_1, X_2, and X_3 together may not be independent.

PROFESSOR: OK, yeah, that's good. So let's see-- for the example of three random variables, it might be the case that each pair is independent: X_1 is independent with X_2, X_1 is independent with X_3, and X_2 with X_3. But all together, they are not independent. What that means is that the product-type statement is not true: there are some events A_1, A_2, A_3 for which the probability of the intersection does not equal the product of the probabilities. But that's just a technical detail. We will mostly consider mutually independent random variables. So when we say that several random variables are independent, it means that whatever collection you take, they're all independent.

OK. So, a little bit more fun stuff in this overview. We defined random variables, and one of the most universal random variables, or distributions, is the normal distribution. It's a continuous random variable. A continuous random variable is said to have the normal distribution N(mu, sigma^2) if its probability density function is f(x) = 1/(sigma * sqrt(2 pi)) * e^{-(x - mu)^2 / (2 sigma^2)} for all real x. Here mu is the mean and sigma^2 is the variance. That's one of the most universal distributions, and the most important one as well.

So how does this distribution look? I'm sure you saw this bell curve before. For N(0, 1), say, it's centered at the origin, and it's symmetric about the origin.
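A short sketch, assuming NumPy and SciPy, checking the two properties just described for the normal density: it integrates to 1, and it is symmetric about its mean.

```python
import numpy as np
from scipy.integrate import quad

def normal_pdf(x, mu=0.0, sigma=1.0):
    # The N(mu, sigma^2) density written on the board.
    return np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

total, _ = quad(normal_pdf, -np.inf, np.inf)
print(total)                               # ~1.0: the density integrates to 1
print(normal_pdf(1.3), normal_pdf(-1.3))   # equal, by symmetry about 0
```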
PROFESSOR: So now let's look at our purpose. We want to model a financial product or a stock-- the price of a stock-- using some random variable. The first thing you can try is the normal distribution. Taking the price itself to be normal doesn't make sense, but we can say that the price at day n minus the price at day n minus 1, P_n - P_{n-1}, is normally distributed.

Is this a sensible definition? Not really, so it's not a good choice. You can model it like this, but it's not a good choice. There may be several reasons, but one reason is that it doesn't take into account the order of magnitude of the price itself. Let's say you have a stock price that goes something like that, and say it was $10 here and $50 here. Regardless of where the price is, this model says that the increment-- the absolute value of the increment-- is identically distributed at this point and at that point. But if you observe how prices actually behave, that's usually not what is normally distributed. What's normally distributed is the percentage by which the price changes daily. So this is not a sensible model, not a good model.

But still, we can use the normal distribution to come up with a pretty good model. Instead, what we want is the relative difference, (P_n - P_{n-1}) / P_{n-1}, to be normally distributed-- that is, the percentage change. The question is then: what is the distribution of the price?

This is not a very precise formulation, because I'm describing discrete increments while these are continuous random variables, and so on. But what I'm trying to say is that the normal distribution for the increments is not good enough. Instead, we want the percentage change to be normally distributed. And if that is the case, what will be the distribution of the price?

One thing I should mention is that in the additive model, if each increment is normally distributed, then the price at day n will still be a normal random variable, since a sum of independent normals is normal. So if there's no drift-- if the average daily increment is 0-- then no matter how far you go, your random variable will stay normally distributed. But here, with percentage changes, that will not be the case. So we want to see what the distribution of P_n will be in this case, as the sketch below illustrates.
PROFESSOR: To do that, let me formally write down what I want to say. I want to define a log-normal distribution-- a log-normal random variable Y-- such that log of Y is normally distributed.

To derive the probability density of this from the normal distribution, we can use the change of variables formula, which says the following: suppose X and Y are random variables such that P(X <= x) = P(Y <= h(x)) for all x, for some increasing function h. Then f_X(x) = f_Y(h(x)) * h'(x).

So let's try to fit this into our story. We want to have a random variable X such that log X is normally distributed, so you can put h(x) = log x here. If Y is normally distributed, then X will have the distribution we're interested in. So using this formula, we can find the probability density function of the log-normal distribution from the probability density of the normal. Let's do that.

AUDIENCE: [INAUDIBLE], right?

PROFESSOR: Yes. So the additive model is not a good choice. Locally, it might be a good choice, but taken over a long time, it won't be, because it will also take negative values, for example. If you just take that model, what's going to happen over a long period of time is that the price will hit the square root of n and negative square root of n lines infinitely often, and then it can go up to infinity, or it can go down to negative infinity eventually. So it will take negative values as well as positive values.

That's one reason, but there are several reasons why it's not a good choice. At a very small scale, it might be OK, because the base price doesn't change that much; whether you model in terms of ratios or in an absolute way, it doesn't matter that much.
But if you want to work at a somewhat larger scale, then it's not a very good choice. Other questions? Do you want me to add some explanation? OK.

So let me get this right. I want X to be the log-normal random variable, and I want Y to be the normal random variable. Then the probability that X is at most x equals the probability that Y is at most log x: P(X <= x) = P(Y <= log x). That's the definition of the log-normal distribution. Then, by the change of variables formula, the probability density function of X is equal to the probability density function of Y at log x, times the derivative of log x, which is 1 over x. So it becomes

f_X(x) = 1/(x sigma sqrt(2 pi)) * e^{-(log x - mu)^2 / (2 sigma^2)}.

So the log-normal distribution can also be defined as the distribution whose probability density function is this. You can use either definition. Let me just make sure that I didn't mess up in the middle. Yes. And this only works for x greater than 0. Yes?

AUDIENCE: [INAUDIBLE]?

PROFESSOR: Yeah. So all logs are natural logs; it should be ln. Thank you.

OK. So, question: what's the mean of this distribution? Yeah?

AUDIENCE: 1?

PROFESSOR: Not 1. It might be mu. Is it mu? Oh, sorry-- it might be e to the mu. Because log X, the normal distribution, had mean mu, log x = mu might be the center. If that's the case, x = e^mu would be the mean. Is that the case? Yes?

AUDIENCE: Can you get the mu minus [INAUDIBLE]?

PROFESSOR: Probably right. I don't remember what's there-- there is a correcting factor. I don't remember exactly what it is, but I think you're right.
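A Monte Carlo sanity check of the derivation, assuming NumPy; the correcting factor the exchange above alludes to is the standard fact that the log-normal mean is e^{mu + sigma^2/2}, not e^mu (the parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, 0.8

# X = e^Y with Y ~ N(mu, sigma^2).
x_samples = np.exp(rng.normal(mu, sigma, size=1_000_000))

print(x_samples.mean())           # empirical mean
print(np.exp(mu + sigma**2 / 2))  # ~2.27, matches the samples
print(np.exp(mu))                 # ~1.65, visibly different from the mean

# Compare the histogram to the derived density
# f_X(x) = 1/(x sigma sqrt(2 pi)) * e^{-(ln x - mu)^2 / (2 sigma^2)}.
hist, edges = np.histogram(x_samples, bins=200, range=(0.01, 10), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
f = np.exp(-(np.log(mids) - mu)**2 / (2 * sigma**2)) / (mids * sigma * np.sqrt(2 * np.pi))
print(np.max(np.abs(hist - f)))   # small: histogram tracks the derived pdf
```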
PROFESSOR: So one very important thing to remember is that the log-normal distribution is referred to in terms of the parameters mu and sigma, because those are the mu and sigma coming from the underlying normal distribution. But they are not the mean and variance of the log-normal anymore, because you've skewed the distribution. It's no longer centered at mu: log X is centered at mu, but when you take the exponential, it becomes skewed. And when you take the average, you'll see that the mean is no longer e to the mu. So mu doesn't give the mean, and sigma doesn't give the variance either; the variance isn't simply something like e to the sigma. Just remember: these are just parameters. They are no longer the mean and variance.

In your homework, one exercise will ask you to compute the mean and variance of this random variable. But really, try to have it stick in your mind that mu and sigma are no longer the mean and variance; that's only the case for normal random variables. The reason we still use mu and sigma is because of this derivation-- it's easy to describe the distribution in those terms.

OK. So the normal distribution and the log-normal distribution will probably be the distributions that you'll see the most throughout the course. But there are some other distributions that you'll also see. I will not talk about them in detail; they will appear in some exercise questions. For example, you have the Poisson distribution or exponential distributions. And all of these-- normal, log-normal, Poisson, exponential, and a lot more-- can be grouped into a family of distributions called the exponential family.
A distribution is said to belong to the exponential family if there exists a theta, a vector that parametrizes the distribution, such that the probability density function for this choice of parameter theta can be written as

f(x | theta) = h(x) * c(theta) * exp( sum_{i=1}^{k} w_i(theta) * t_i(x) ).

Yes-- so here, when I write only x, the function should depend only on x, not on theta; and when I write some function of theta, it should depend only on theta, not on x. So h(x) and the t_i(x) depend only on x, and c(theta) and the w_i(theta) depend only on theta.

That's an abstract definition, and it's not clear from the definition alone why it's so useful. But you're going to talk about some distributions from the exponential family later, right? Yeah. So you will see something about this. One good thing is that all distributions in the exponential family exhibit some nice statistical properties, which makes the family useful. That's too abstract, though, so let's see how the log-normal distribution actually falls into the exponential family.

AUDIENCE: So, let me just comment. With the notion of independent random variables, the probability density function of a collection of mutually independent random variables is the product of the probability densities of the individual variables. And so with this exponential family, if you have random variables from the same exponential family, products of this density function factor out into a very simple form. It doesn't get more complicated as you look at the joint density of many variables; in fact, it simplifies within the same exponential family. So that's where this becomes very useful.

PROFESSOR: So it's designed so that it factors out well when densities are multiplied. OK.
So-- sorry about that. Yeah, the log-normal distribution. Before taking h(x) = 1 over x, let's just rewrite the density in a different way. The density

f(x) = 1/(x sigma sqrt(2 pi)) * e^{-(log x - mu)^2 / (2 sigma^2)}

can be rewritten, by expanding the square in the exponent, as

1/x * 1/(sigma sqrt(2 pi)) * e^{ -(log x)^2 / (2 sigma^2) + mu log x / sigma^2 - mu^2 / (2 sigma^2) }.

Let's write it like that. Set h(x) = 1/x and theta = (mu, sigma). Then c(theta) = 1/(sigma sqrt(2 pi)) * e^{-mu^2 / (2 sigma^2)}. So you parametrize this family in terms of mu and sigma. Your h(x) here is 1 over x. Your c(theta) is this prefactor together with the last term in the exponent, because neither depends on x. And then you have to figure out what the w's and t's are. You can let t_1(x) = (log x)^2 and w_1(theta) = -1/(2 sigma^2). And similarly, you can let t_2(x) = log x and w_2(theta) = mu / sigma^2.

It's just a technicality, but at least you can see it really fits in.
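A quick numeric check, assuming NumPy, that the factorization above reproduces the log-normal density h(x) * c(theta) * exp(w_1 t_1(x) + w_2 t_2(x)):

```python
import numpy as np

mu, sigma = 0.5, 0.8
x = np.linspace(0.1, 5.0, 50)

# The density written directly.
direct = np.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * np.sqrt(2 * np.pi))

# The exponential-family pieces from the board.
h = 1 / x
c = np.exp(-mu**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
w1, t1 = -1 / (2 * sigma**2), np.log(x)**2
w2, t2 = mu / sigma**2, np.log(x)
factored = h * c * np.exp(w1 * t1 + w2 * t2)

print(np.allclose(direct, factored))  # True
```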
PROFESSOR: OK. So that's all about the distributions I want to talk about. Now let's talk about more interesting stuff, in my opinion-- I like this stuff better.

There are two main things that we're interested in when we have a random variable, at least for our purposes. Given a random variable, first, we want to study its statistics. Those will be represented by the k-th moments of the random variable, where the k-th moment is defined as the expectation of X to the k, E[X^k]. And a good way to study all the moments together in one function is the moment-generating function. The moment-generating function encodes all the k-th moments of a random variable, so it contains all the statistical information about the random variable. That's why the moment-generating function will be interesting to us: when you want to study a random variable, you don't have to consider each moment separately. It gives a unified view, a very good feel for your random variable. That will be our first topic.

Our second topic will be the long-term, or large-scale, behavior. So for example, assume you have one random variable with a normal distribution. If you just have a single random variable, you really have no control; the outcome can be anything according to that distribution. But if you have many independent random variables with the exact same distribution-- let's say 100 million of them-- and you plot how many samples fall at each point, you know the plot has to look very close to this bell curve: more dense here, sparser there, and sparser there. So you don't have individual control over each of the random variables, but when you look at the large scale, you know, at least with very high probability, that it has to look like this curve.

Those kinds of things are what we want to study. When we look at this long-term or large-scale behavior, what can we say? What kinds of events are guaranteed to happen with probability, let's say, 99.9%? And actually, some interesting things happen. As you might already know, the two typical theorems of this type will be the law of large numbers and the central limit theorem.

So let's start with our first topic: the moment-generating function. The moment-generating function of a random variable X, which I write as M sub X, is defined as M_X(t) = E[e^{tX}], where t is a parameter; t can be any real number. You have to be careful, though-- it doesn't always converge.
So, remark: the moment-generating function does not necessarily exist. For example, one of the distributions you already saw does not have one: the log-normal distribution does not have a moment-generating function. That's one thing you have to be careful about. It's not just some theoretical pathology; it actually happens for random variables that you encounter in real life. So be careful. And this will actually lead to some very interesting facts, which I will explain later.

Before going into that: first of all, why is it called the moment-generating function? It's because if you take the k-th derivative of this function and evaluate it at t = 0, it gives the k-th moment of your random variable: the k-th derivative of M_X at 0 equals E[X^k], for all non-negative integers k. That's where the name comes from.

And that gives a different way of writing the moment-generating function, like a Taylor expansion: M_X(t) = sum_{k=0}^{infinity} (t^k / k!) * E[X^k]. Because you know all the derivatives at 0, you know what the function should be. Of course, that's only if it exists; this series might not converge.
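To see the "generating" concretely, here is a small symbolic sketch, assuming SymPy, using the standard normal, whose moment-generating function e^{t^2/2} is a known closed form:

```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)  # MGF of X ~ N(0, 1)

# The k-th derivative at t = 0 is the k-th moment E[X^k].
for k in range(1, 5):
    print(k, sp.diff(M, t, k).subs(t, 0))
# prints: 1 -> 0, 2 -> 1, 3 -> 0, 4 -> 3, matching the N(0, 1) moments
```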
PROFESSOR: So if the moment-generating function exists, it pretty much classifies your random variable. Theorem: if two random variables X and Y have the same moment-generating function, then X and Y have the same distribution. I will not prove this theorem, but it says that the moment-generating function, if it exists, encodes really all the information about your random variable. You're not losing anything.

However, be very careful when applying this theorem. Remark: it does not imply that all random variables with identical k-th moments for all k have the same distribution. Do you see it? If X and Y have moment-generating functions and they're the same, then they have the same distribution. This remark looks a little bit contradictory to the theorem. It says it's not necessarily the case that two random variables with identical moments-- all k-th moments the same-- have the same distribution. That seems not to make sense in light of the theorem: the moment-generating function is defined in terms of the moments, so if two random variables have the same moments, they have the same moment-generating function; and if they have the same moment-generating function, they have the same distribution.

There is a hole in this argument. Even if they have the same moments, it doesn't necessarily imply that they have the same moment-generating function-- they might both fail to have moment-generating functions. That's the glitch. Be careful. So just remember that even if two random variables have the same moments, they don't necessarily have the same distribution, and one reason is that the moment-generating functions might not exist. If you look in Wikipedia, you'll see an example of two random variables where this happens.

So that's one thing we will use later. Another thing we will use later is a statement very similar to that theorem, but about a sequence of random variables. Theorem: suppose X_1, X_2, ... is a sequence of random variables whose moment-generating functions all exist, and, as n goes to infinity, M_{X_n}(t) tends to M_X(t), the moment-generating function of some random variable X, for all t.

Here, we're assuming that all the moment-generating functions exist. So again, the situation is: you have a sequence of random variables; their moment-generating functions exist; and at each point t, the value converges to the value of the moment-generating function of some other random variable X. And what should happen?
In light of the previous theorem, it should be the case that the distributions of this sequence get closer and closer to the distribution of this random variable X. To make that intuition formal, what we can conclude is: for all x, the probability that X_n is less than or equal to x tends to the probability that X is less than or equal to x. So in this sense, the distributions of these random variables converge to the distribution of that random variable.

It's just a technical issue-- you can think of it as saying these random variables converge to that random variable. If you take a graduate probability course, you'll see that there are several possible ways to define convergence. But that's just a technicality. The spirit here is really that the sequence converges if the moment-generating functions converge.

So as you can see from these two theorems, the moment-generating function, if it exists, is a really powerful tool that allows you to control the distribution. You'll see some applications later, in the central limit theorem. Any questions?

AUDIENCE: [INAUDIBLE]?

PROFESSOR: This one? Why?

AUDIENCE: Because it starts with t, and the right-hand side has nothing depending on t.

PROFESSOR: Ah, thank you. The derivative is evaluated at zero. Other questions? Other corrections?

AUDIENCE: When you say the moment-generating function doesn't exist, do you mean that it isn't analytic or that it doesn't converge?

PROFESSOR: It might not converge. So for the log-normal distribution it does not converge: for all positive t, the expectation is infinite.

AUDIENCE: [INAUDIBLE]?

PROFESSOR: Here? Yes. Pointwise convergence of the moment-generating functions implies this convergence of the distributions. No, no-- because the hypothesis is only pointwise, the conclusion is also rather weak. It's almost the weakest notion, convergence in distribution. OK.
The law of large numbers. So now we're talking about large-scale behavior. Let X_1 up to X_n be independent random variables with identical distributions. We don't really know what the distribution is, but we know that they're all the same. In short, I'll refer to this condition as i.i.d.-- independent, identically distributed-- random variables. And let the mean be mu and the variance be sigma squared. Let's also define X bar as the average of the n random variables, X bar = (1/n) * sum of the X_i. Then, for all positive epsilon, the probability that |X bar - mu| > epsilon tends to 0 as n goes to infinity.

So whenever you have independent, identically distributed random variables, if you take the average of a large enough number of samples, it will be very close to the mean-- which makes sense.

So what's an example of this? Before proving it, an example of this theorem in practice can be seen in a casino. If you're playing blackjack in a casino, you have a very small disadvantage when you're playing against the house. If you're playing the optimal strategy-- does anybody know the probability? It's about 48%, 49%. Say a 48% chance of winning. That means if you bet $1 at the beginning of each round, your expected share of the round is $0.48, and the casino's expected share is $0.52.

But the game is designed so that the variance is so big that this difference in means is hidden. From the player's point of view, you only see a very small sample, so it looks like the mean doesn't matter-- the variance takes over at short time scales. But from the casino's point of view, they're taking a very large n. For the casino, each round played anywhere in the house is one more sample, so they are taking an enormous value of n. And that means as long as they have the slightest advantage, they'll be winning money-- a huge amount of money. And most games played in casinos are designed like this.
It looks like the mean is really close to 50%, but the edge is hidden, because the game is designed so the variance is big. From the casino's point of view, though, they have enough players playing that the law of large numbers just makes them money.

The moral is: don't play blackjack. Play poker. The reason the law of large numbers doesn't apply, at least in this sense, to poker-- can anybody explain why? It's because in poker, you're playing against other players. If you have an advantage-- if your skill, assuming you believe there is skill in poker, is better than the other player's by, let's say, 5%-- then you have an edge over that player, so you can win money. In poker, you're not playing against the casino. But the casino still has to make money, so what they do instead is take a rake: for each round that the players play, they pay some fee to the casino. That's how the casino makes money at the poker table-- by accumulating those fees. They're not taking chances there. But from the player's point of view, if you're better than the other players, and the edge you have over them is larger than the fee that the casino charges you, then you can apply the law of large numbers to yourself and win.

And this goes beyond poker. If it's a hedge fund, or if you're doing high-frequency trading, that's the moral behind it: you have to believe that you have an edge. Even with a tiny edge, if you can make enough trades using a strategy that you believe is winning over time, then the law of large numbers will take it from there and bring you profit.

Of course, the problem is that when the variance is big, your belief starts to falter. At least, that was the case for me when I was playing poker. I believed that I had an edge, but when there's a real swing, it looks like your expectation is negative. That's when you have to believe in yourself-- when your faith in mathematics is being challenged. It really happened. I hope it doesn't happen to you.
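A hedged simulation of the casino discussion, assuming NumPy. The 48% figure is the rough one quoted above, and the even-money payout is a simplification: a single player's short session swings wildly, while the house's per-round average settles near its edge.

```python
import numpy as np

rng = np.random.default_rng(3)
p_win = 0.48  # player's chance of winning a $1 round (approximate, from lecture)

# One player's evening: 100 rounds. The sum can easily be positive.
one_player = np.where(rng.random(100) < p_win, 1, -1)
print(one_player.sum())

# The casino's view: millions of rounds across all players.
casino_view = np.where(rng.random(10_000_000) < p_win, 1, -1)
print(casino_view.mean())  # ~ -0.04 per dollar: the hidden house edge emerges
```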
PROFESSOR: Anyway, let's prove the law of large numbers. How do you prove it? The proof is quite easy.

First of all, one observation: the expectation of X bar is the expectation of (1/n) times the sum of the X_i, and by linearity of expectation, that becomes (1/n) times the sum of the E[X_i], which is mu. OK, that's good.

And then the variance-- what's the variance of X bar? That's the expectation of (X bar - mu)^2, which is the expectation of ((1/n) * sum_{i=1}^{n} (X_i - mu))^2. The 1/n is inside the square, so I can take it out as 1/n^2. By independence, the cross terms have expectation 0, so you're left summing n terms of sigma squared, divided by n squared. So the variance of X bar is sigma^2 / n.

That means averaging n terms does not affect your mean, but it does affect your variance: it divides your variance by n. If you take larger and larger n, your variance gets smaller and smaller.
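A quick simulation check, assuming NumPy, of the two identities just derived, E[X bar] = mu and Var(X bar) = sigma^2 / n, here with exponential X_i (so mu = 1 and sigma^2 = 1) and n = 100:

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 100, 50_000

# Each row is one experiment: the average of n i.i.d. exponential(1) draws.
xbars = rng.exponential(1.0, size=(trials, n)).mean(axis=1)

print(xbars.mean())  # ~ mu = 1
print(xbars.var())   # ~ sigma^2 / n = 0.01
```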
PROFESSOR: Using that, we can prove the statement. There's only one thing you have to notice: epsilon^2 times the probability that |X bar - mu| > epsilon is less than or equal to the variance of X bar. The reason this inequality holds is that the variance of X bar is defined as the expectation of (X bar - mu)^2, and on the event where |X bar - mu| is at least epsilon, the term (X bar - mu)^2 is at least epsilon^2; so the variance has to be at least epsilon^2 times the probability of that event. This is Chebyshev's inequality. And we know the variance is sigma^2 / n. So the probability that |X bar - mu| > epsilon is at most sigma^2 / (n * epsilon^2). That means if you take n to infinity, this goes to zero: the probability that you deviate from the mean by more than epsilon goes to 0.

You can actually read a little bit more out of the proof-- it also tells you something about the speed of convergence. Let's say you have a random variable X bar whose mean is 50, and your epsilon is 0.1, so you want to know the probability that you deviate from the mean by more than 0.1. Say you want to be 99% sure that |X bar - 50| < 0.1. In that case, you want the bound sigma^2 / (n * epsilon^2) to be 0.01. Plug in your variance and your epsilon, and that gives you a bound on n: if you have more than that number of trials, you can be 99% sure that you don't deviate from the mean by more than epsilon.

So that does give an estimate, but I should mention that it's a very bad estimate. There are much more powerful estimates that can be made here. I didn't really calculate it, but with this bound the order of magnitude looks close to millions, while in practice, if you use more powerful estimation tools, it should only be hundreds or at most thousands. The tool used there is something like moment-generating functions, but I will not go into it. Any questions? OK.
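The sample-size estimate above, made concrete as a sketch (the variance value is an assumption for illustration, not from the lecture): Chebyshev gives P(|X bar - mu| > eps) <= sigma^2 / (n * eps^2), so forcing that bound down to 0.01 pins down n.

```python
sigma2 = 25.0  # assumed variance of one sample, for illustration
eps = 0.1      # allowed deviation from the mean
delta = 0.01   # allowed failure probability (we want 99% confidence)

# Solve sigma^2 / (n * eps^2) <= delta for n.
n = sigma2 / (eps**2 * delta)
print(n)       # 250,000 trials suffice by this (crude) Chebyshev bound
```

As noted above, this bound is crude; exponential-type bounds built from moment-generating functions typically require far fewer trials.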
819 01:02:58,840 --> 01:03:01,580 So for example, the variance does not have to exist. 820 01:03:01,580 --> 01:03:06,480 It can be replaced by some other condition, and so on. 821 01:03:06,480 --> 01:03:08,860 But here, I just wanted it in a simple form 822 01:03:08,860 --> 01:03:11,350 so that it's easy to prove, 823 01:03:11,350 --> 01:03:14,274 and you at least get the spirit of what's happening. 824 01:03:20,480 --> 01:03:26,140 Now let's move on to the next topic-- the central limit theorem. 825 01:04:11,240 --> 01:04:16,880 So the weak law of large numbers says 826 01:04:16,880 --> 01:04:22,210 that if you have IID random variables, 1 over n times 827 01:04:22,210 --> 01:04:27,400 the sum of the X_i's converges to mu, the mean, in some weak sense. 828 01:04:31,210 --> 01:04:33,730 And the reason that happened was because this average had 829 01:04:33,730 --> 01:04:39,157 mean mu and variance sigma squared over n. 830 01:04:43,660 --> 01:04:49,730 We exploited the fact that the variance vanishes to get this. 831 01:04:49,730 --> 01:04:53,560 So the question is, what happens if you replace 1 over n 832 01:04:53,560 --> 01:04:54,903 by 1 over square root n? 833 01:04:59,250 --> 01:05:04,590 What happens for the random variable 834 01:05:04,590 --> 01:05:08,300 1 over square root n times the sum of the X_i's? 835 01:05:14,180 --> 01:05:16,990 The reason I'm making this choice of 1 over square root n 836 01:05:16,990 --> 01:05:19,310 is because if you make this choice, 837 01:05:19,310 --> 01:05:26,330 this new variable has mean mu and variance sigma squared, just 838 01:05:26,330 --> 01:05:28,770 as the X_i's do. 839 01:05:28,770 --> 01:05:34,981 So this is the same as X_i in those respects. 840 01:05:40,910 --> 01:05:44,330 Then what should it look like? 841 01:05:44,330 --> 01:05:46,730 If a random variable has the same mean and same variance 842 01:05:46,730 --> 01:05:52,120 as your original random variable, should the distribution 843 01:05:52,120 --> 01:05:54,795 of this look like the distribution of the X_i? 844 01:06:00,530 --> 01:06:01,290 Ah-- that's only if the mean mu is 0. 845 01:06:01,290 --> 01:06:04,170 Thank you very much. 846 01:06:04,170 --> 01:06:05,535 Let's take the case when the mean is 0. 847 01:06:13,160 --> 01:06:13,660 OK. 848 01:06:13,660 --> 01:06:17,620 For this special case, will it look like X_i, 849 01:06:17,620 --> 01:06:20,820 or will it not look like X_i? 850 01:06:20,820 --> 01:06:24,260 If it doesn't look like X_i, can we say anything interesting 851 01:06:24,260 --> 01:06:27,590 about the distribution of this? 852 01:06:27,590 --> 01:06:31,480 And the central limit theorem answers this question. 853 01:06:31,480 --> 01:06:34,980 When I first saw it, I thought it was really interesting. 854 01:06:34,980 --> 01:06:37,161 Because the normal distribution comes up here. 855 01:06:40,250 --> 01:06:42,050 And that's probably one of the reasons 856 01:06:42,050 --> 01:06:45,010 that the normal distribution is so universal. 857 01:06:45,010 --> 01:06:50,310 Because when you take many independent random variables 858 01:06:50,310 --> 01:06:53,270 and take the average in this sense, 859 01:06:53,270 --> 01:06:56,765 their distribution converges to a normal distribution. 860 01:06:56,765 --> 01:06:57,265 Yes? 861 01:06:57,265 --> 01:06:59,660 AUDIENCE: How did you get mean equals [INAUDIBLE]? 862 01:06:59,660 --> 01:07:00,970 PROFESSOR: I didn't get it-- 863 01:07:00,970 --> 01:07:02,678 I assumed it, for the case when the mean of the X_i is 0. Yeah.
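Before the formal statement, here is an illustrative simulation (not from the lecture) of the rescaled sum Y_n = (1/sqrt(n)) * sum of (X_i - mu) for a decidedly non-normal starting distribution. The uniform choice of X_i, the seed, and the sample sizes are all assumptions made just for the demo:

```python
import numpy as np

rng = np.random.default_rng(0)

# X_i ~ Uniform[0, 1]: mean mu = 1/2, variance sigma^2 = 1/12 ~ 0.0833.
mu, n, trials = 0.5, 500, 10_000

# Draw Y_n = (1/sqrt(n)) * sum_i (X_i - mu), repeated `trials` times.
x = rng.uniform(0.0, 1.0, size=(trials, n))
y = (x - mu).sum(axis=1) / np.sqrt(n)

# Y_n should look like N(0, sigma^2), not like Uniform[0, 1]:
print(y.mean(), y.var())                     # approximately 0 and 1/12
print((np.abs(y) <= np.sqrt(1 / 12)).mean()) # approx 0.683, the one-sigma
                                             # mass of a normal distribution
```

A uniform variable puts no mass outside [0, 1] and is flat inside it, so the bell shape that Y_n develops really is created by the rescaled averaging-- exactly the phenomenon the theorem below makes precise.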
864 01:07:29,600 --> 01:07:41,480 So, theorem: let X_1, X_2, up to X_n be 865 01:07:41,480 --> 01:07:51,960 IID random variables with mean-- this time-- mu and variance 866 01:07:51,960 --> 01:07:55,020 sigma squared. 867 01:07:55,020 --> 01:07:59,308 And let X-- or rather Y_n-- 868 01:08:01,940 --> 01:08:10,023 let Y_n be square root n times (1 over n times the sum of the X_i's, minus mu). 869 01:08:24,813 --> 01:08:41,080 Then the distribution of Y_n converges 870 01:08:41,080 --> 01:08:50,056 to that of the normal distribution with mean 0 and variance sigma squared. 871 01:08:55,050 --> 01:08:57,350 What this means-- I'll write it down again-- 872 01:08:57,350 --> 01:09:01,790 it means that for all x, the probability that Y_n 873 01:09:01,790 --> 01:09:03,790 is less than or equal to x converges 874 01:09:03,790 --> 01:09:07,722 to the probability that this normal distribution is less than 875 01:09:07,722 --> 01:09:08,910 or equal to x. 876 01:09:14,140 --> 01:09:16,220 What's really interesting here is, 877 01:09:16,220 --> 01:09:20,340 no matter what distribution you had in the beginning, 878 01:09:20,340 --> 01:09:24,090 if you average it out in this sense, 879 01:09:24,090 --> 01:09:25,965 then you converge to the normal distribution. 880 01:09:35,429 --> 01:09:37,720 Any questions about this statement, or any corrections? 881 01:09:40,490 --> 01:09:43,545 Any mistakes that I made? 882 01:09:43,545 --> 01:09:46,015 OK. 883 01:09:46,015 --> 01:09:47,003 Here's the proof. 884 01:09:50,970 --> 01:09:54,400 I will prove it when the moment-generating function 885 01:09:54,400 --> 01:09:54,900 exists. 886 01:09:54,900 --> 01:09:56,816 So assume that the moment-generating function 887 01:09:56,816 --> 01:09:58,010 exists. 888 01:09:58,010 --> 01:10:04,963 So this is a proof assuming the moment-generating function of the X_i exists. 889 01:10:16,810 --> 01:10:19,860 So remember that theorem from before. 890 01:10:19,860 --> 01:10:22,160 Try to recall the theorem that says: if you 891 01:10:22,160 --> 01:10:25,130 know that the moment-generating functions of the Y_n's converge 892 01:10:25,130 --> 01:10:29,250 to the moment-generating function of the normal, then 893 01:10:29,250 --> 01:10:30,210 we have the statement. 894 01:10:30,210 --> 01:10:31,400 The distributions converge. 895 01:10:31,400 --> 01:10:34,328 So that's the statement we're going to use. 896 01:10:34,328 --> 01:10:37,100 That means our goal is to prove that the moment-generating 897 01:10:37,100 --> 01:10:43,020 functions of these Y_n's converge to the moment-generating 898 01:10:43,020 --> 01:10:51,088 function of the normal for all t-- pointwise convergence. 899 01:10:56,360 --> 01:11:00,080 And this latter part is well known. 900 01:11:00,080 --> 01:11:01,455 I'll just write it down. 901 01:11:01,455 --> 01:11:06,094 It's known to be e to the t squared sigma squared over 2. 902 01:11:08,818 --> 01:11:11,173 That can just be computed. 903 01:11:18,610 --> 01:11:21,270 So we want to somehow show that the moment-generating function 904 01:11:21,270 --> 01:11:25,738 of this Y_n converges to that. 905 01:11:25,738 --> 01:11:29,440 The moment-generating function of Y_n 906 01:11:29,440 --> 01:11:36,102 is equal to the expectation of e to the t Y_n-- 907 01:11:42,544 --> 01:11:50,496 e to the t times 1 over square root n times the sum of (X_i minus mu). 908 01:11:54,490 --> 01:11:57,680 And then, because an exponential of a sum is a product of exponentials, 909 01:11:57,680 --> 01:11:59,403 this sum will split into a product. 910 01:12:02,650 --> 01:12:14,059 The product of-- let me write it better. 911 01:12:14,059 --> 01:12:19,240 It's the expectation of a product-- we didn't use independence yet.
912 01:12:19,240 --> 01:12:26,504 The sum becomes a product: the expectation of the product over i of e to the t times 1 over square root n times (X_i 913 01:12:26,504 --> 01:12:27,462 minus mu). 914 01:12:34,650 --> 01:12:36,380 And then, because they're independent, 915 01:12:36,380 --> 01:12:37,530 this product can come out of the expectation. 916 01:12:40,925 --> 01:12:49,996 It's equal to the product from 1 to n of the expectation of e to the t over 917 01:12:49,996 --> 01:12:50,984 square root n, times (X_i minus mu). 918 01:12:56,160 --> 01:12:56,660 OK. 919 01:12:56,660 --> 01:12:58,159 Now, they're identically distributed, 920 01:12:58,159 --> 01:13:00,900 so you just have to take the n-th power of one of them. 921 01:13:00,900 --> 01:13:03,923 That's equal to the expectation of e 922 01:13:03,923 --> 01:13:11,920 to the t over square root n, times (X_i minus mu), all to the n-th power. 923 01:13:11,920 --> 01:13:15,420 Now we'll do some estimation. 924 01:13:15,420 --> 01:13:19,450 So use the Taylor expansion of this. 925 01:13:19,450 --> 01:13:30,002 What we get is the expectation of 1, plus that term-- t over square root n 926 01:13:30,002 --> 01:13:36,990 times (X_i minus mu)-- plus 1 over 2 factorial times that squared-- 927 01:13:36,990 --> 01:13:43,760 t over square root n times (X_i minus mu), squared-- 928 01:13:43,760 --> 01:13:48,748 plus 1 over 3 factorial times that cubed, plus so on. 929 01:13:55,050 --> 01:13:57,990 Then that's equal to 1-- ah, all to the n-th power. 930 01:14:02,920 --> 01:14:06,890 By linearity of expectation, the 1 comes out. 931 01:14:06,890 --> 01:14:12,830 The second term is 0, because the X_i have mean mu. 932 01:14:12,830 --> 01:14:15,020 So that disappears. 933 01:14:15,020 --> 01:14:26,930 For this next term, we have 1 over 2, times t squared over n, times (X_i minus mu) 934 01:14:26,930 --> 01:14:29,370 squared. 935 01:14:29,370 --> 01:14:31,590 And (X_i minus mu) squared, when you take the expectation, 936 01:14:31,590 --> 01:14:35,550 will be sigma squared. 937 01:14:35,550 --> 01:14:39,720 And then for the terms after that-- because we're 938 01:14:39,720 --> 01:14:42,850 only interested in proving that, for fixed t, 939 01:14:42,850 --> 01:14:46,160 this converges-- we're only proving pointwise convergence-- 940 01:14:46,160 --> 01:14:49,030 you may consider t as a fixed number. 941 01:14:49,030 --> 01:14:52,540 So as n goes to infinity-- if n is really, really large-- 942 01:14:52,540 --> 01:14:56,730 all these remaining terms are of smaller order of magnitude 943 01:14:56,730 --> 01:15:00,830 than 1 over n-- they're little o of 1 over n. 944 01:15:00,830 --> 01:15:02,270 Something like that happens. 945 01:15:08,530 --> 01:15:11,250 And that's happening because t is fixed. 946 01:15:11,250 --> 01:15:14,260 We only have to prove it for fixed t. 947 01:15:14,260 --> 01:15:16,292 If we were saying something uniform in t, 948 01:15:16,292 --> 01:15:18,390 that would no longer be true. 949 01:15:18,390 --> 01:15:21,060 Now we go back to the exponential form. 950 01:15:21,060 --> 01:15:26,540 So this is pretty much just e to that term-- 951 01:15:26,540 --> 01:15:30,900 1 over 2, t squared sigma squared over n, 952 01:15:30,900 --> 01:15:37,370 plus little o of 1 over n-- all to the n-th power. 953 01:15:37,370 --> 01:15:42,980 Now, that n can be multiplied in to cancel out. 954 01:15:42,980 --> 01:15:46,640 And we see that it's e to the t squared sigma squared over 2, 955 01:15:46,640 --> 01:15:48,342 plus a little o of 1. 956 01:15:48,342 --> 01:15:50,370 So if you take n to go to infinity, 957 01:15:50,370 --> 01:15:55,840 that term disappears, and we've proved 958 01:15:55,840 --> 01:15:57,410 that it converges to that.
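Putting the whole computation just spoken into one display-- the same steps, under the same assumption that the moment-generating function exists, nothing new:

```latex
\begin{align*}
M_{Y_n}(t) &= \mathbb{E}\!\left[e^{tY_n}\right]
  = \mathbb{E}\!\left[\prod_{i=1}^{n} e^{\frac{t}{\sqrt{n}}(X_i-\mu)}\right]
  && \text{(exponential of a sum)}\\
&= \prod_{i=1}^{n}\mathbb{E}\!\left[e^{\frac{t}{\sqrt{n}}(X_i-\mu)}\right]
  = \left(\mathbb{E}\!\left[e^{\frac{t}{\sqrt{n}}(X_1-\mu)}\right]\right)^{\!n}
  && \text{(independent, identically distributed)}\\
&= \left(1 + \frac{t^2\sigma^2}{2n} + o\!\left(\tfrac{1}{n}\right)\right)^{\!n}
  \;\longrightarrow\; e^{t^2\sigma^2/2} = M_{N(0,\sigma^2)}(t)
  && \text{(Taylor expansion, fixed } t\text{)}.
\end{align*}
```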
959 01:16:00,100 --> 01:16:04,516 And then, by the theorem that I stated before, if we have this, 960 01:16:04,516 --> 01:16:06,182 we know that the distributions converge. 961 01:16:09,880 --> 01:16:10,500 Any questions? 962 01:16:13,760 --> 01:16:14,260 OK. 963 01:16:14,260 --> 01:16:15,515 I'll make one final remark. 964 01:16:29,009 --> 01:16:42,640 So suppose there is a random variable X whose mean we do not 965 01:16:42,640 --> 01:16:44,865 know-- whose mean is unknown. 966 01:16:53,670 --> 01:16:55,710 Our goal is to estimate the mean. 967 01:16:58,970 --> 01:17:02,730 And one way to do that is by taking many independent trials 968 01:17:02,730 --> 01:17:05,220 of this random variable. 969 01:17:05,220 --> 01:17:21,680 So take independent trials X_1, X_2, to X_n, and use 1 over n times 970 01:17:21,680 --> 01:17:22,250 (X_1 plus... 971 01:17:22,250 --> 01:17:23,565 plus X_n) as our estimator. 972 01:17:32,960 --> 01:17:34,990 Then the law of large numbers says that this 973 01:17:34,990 --> 01:17:36,750 will be very close to the mean. 974 01:17:36,750 --> 01:17:39,840 So if you take n to be large enough, 975 01:17:39,840 --> 01:17:42,100 you will more than likely get some value which 976 01:17:42,100 --> 01:17:44,190 is very close to the mean. 977 01:17:44,190 --> 01:17:47,050 And then the central limit theorem 978 01:17:47,050 --> 01:17:53,530 tells you how the distribution of this estimator 979 01:17:53,530 --> 01:17:55,915 is spread around the mean. 980 01:17:55,915 --> 01:17:57,920 So we don't know what the real value is, 981 01:17:57,920 --> 01:18:00,620 but we know that the distribution 982 01:18:00,620 --> 01:18:02,980 of the value that we will obtain here 983 01:18:02,980 --> 01:18:05,048 is something like a normal curve around the mean. 984 01:18:09,340 --> 01:18:17,080 And because the normal distribution has very small tails-- 985 01:18:17,080 --> 01:18:21,900 the tail probabilities are really small-- 986 01:18:21,900 --> 01:18:23,950 we will get really close, really fast. 987 01:18:27,290 --> 01:18:34,387 And this is known as the maximum likelihood estimator-- is it? 988 01:18:37,670 --> 01:18:38,310 OK, yeah. 989 01:18:38,310 --> 01:18:39,980 For some distributions, it's better 990 01:18:39,980 --> 01:18:44,080 to take some other estimator. 991 01:18:44,080 --> 01:18:47,280 Which is quite interesting. 992 01:18:47,280 --> 01:18:50,015 At least my intuition was to take this in every single case-- 993 01:18:50,015 --> 01:18:52,890 it looks like that would be a good choice. 994 01:18:52,890 --> 01:18:54,680 But it turns out that that's not the case; 995 01:18:54,680 --> 01:18:59,492 for some distributions, there's a better choice than this. 996 01:18:59,492 --> 01:19:03,210 And Peter will later talk about it. 997 01:19:06,340 --> 01:19:09,960 If you're interested in it, come back. 998 01:19:09,960 --> 01:19:13,960 And that's it for today. Any questions? 999 01:19:13,960 --> 01:19:17,875 So next Tuesday, we will have an outside speaker, 1000 01:19:17,875 --> 01:19:21,256 and it will be on bonds. 1001 01:19:21,256 --> 01:19:24,883 And I don't think anything from linear algebra will be needed there.
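To make that final remark concrete, here is a minimal sketch (not from the lecture) of estimating an unknown mean by the sample average (1/n)(X_1 + ... + X_n), with a CLT-based 99% normal-approximation interval. The exponential distribution, the seed, and n = 10,000 are assumed purely for the demo; the code uses only the sample, as if the mean were unknown:

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretend the mean is unknown; draw n independent trials of X.
# X ~ Exponential(rate 1) is an assumed example; its true mean is 1.
n = 10_000
x = rng.exponential(scale=1.0, size=n)

estimate = x.mean()                # (1/n)(X_1 + ... + X_n)
se = x.std(ddof=1) / np.sqrt(n)    # CLT: estimate is approx N(mu, sigma^2/n)
z = 2.576                          # two-sided 99% normal quantile

print(f"estimate = {estimate:.4f}")
print(f"99% interval ~ [{estimate - z * se:.4f}, {estimate + z * se:.4f}]")
```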