1 00:00:00 --> 00:00:00 2 00:00:00 --> 00:00:02 The following content is provided under a 3 00:00:02 --> 00:00:03 Creative Commons license. 4 00:00:03 --> 00:00:06 Your support will help MIT OpenCourseWare continue to 5 00:00:06 --> 00:00:10 offer high quality educational resources for free. 6 00:00:10 --> 00:00:13 To make a donation or view additional materials from 7 00:00:13 --> 00:00:17 hundreds of MIT courses, visit MIT OpenCourseWare 8 00:00:17 --> 00:00:21 at ocw.mit.edu. 9 00:00:21 --> 00:00:24 PROFESSOR: I want to pick up exactly where 10 00:00:24 --> 00:00:26 I left off last time. 11 00:00:26 --> 00:00:28 When I was talking about various sins one can 12 00:00:28 --> 00:00:31 commit with statistics. 13 00:00:31 --> 00:00:45 And I had been talking about the sin of data enhancement, 14 00:00:45 --> 00:00:49 where the basic idea there is, you take a piece of data, and 15 00:00:49 --> 00:00:51 you read much more into it than it implies. 16 00:00:51 --> 00:00:56 In particular, a very common thing people do with data 17 00:00:56 --> 00:01:01 is they extrapolate. 18 00:01:01 --> 00:01:08 I'd given you a couple of examples. 19 00:01:08 --> 00:01:17 In the real world, it's often not desirable to say that I 20 00:01:17 --> 00:01:21 have a point here, and a point here, therefore the next 21 00:01:21 --> 00:01:23 point will surely be here. 22 00:01:23 --> 00:01:27 And we can just extrapolate in a straight line. 23 00:01:27 --> 00:01:30 We before saw some examples where I had an algorithm to 24 00:01:30 --> 00:01:34 generate points, and we fit a curve to it, used the curve to 25 00:01:34 --> 00:01:36 predict future points, and discovered it was 26 00:01:36 --> 00:01:40 nowhere close. 27 00:01:40 --> 00:01:44 Unfortunately, we often see people do this sort of thing. 28 00:01:44 --> 00:01:49 One of my favorite stories is, William Ruckelshaus, who was 29 00:01:49 --> 00:01:52 head of the Environmental Protection Agency in 30 00:01:52 --> 00:01:54 the early 1970s. 
31 00:01:54 --> 00:01:58 And he had a press conference, spoke about the increased use 32 00:01:58 --> 00:02:03 of cars, and the decreased amount of carpooling. 33 00:02:03 --> 00:02:05 He was trying to get people to carpool, since at the time 34 00:02:05 --> 00:02:11 carpooling was on the way down, and I now quote, "each car 35 00:02:11 --> 00:02:16 entering the central city, sorry, in 1960," he said, "each 36 00:02:16 --> 00:02:20 car entering the central city had 1.7 people in it. 37 00:02:20 --> 00:02:25 By 1970, this had dropped to less than 1.2. 38 00:02:25 --> 00:02:30 If present trends continue, by 1980, more than 1 out of every 39 00:02:30 --> 00:02:36 10 cars entering the city will have no driver." Amazingly 40 00:02:36 --> 00:02:41 enough, the press reported this as a straight story, and talked 41 00:02:41 --> 00:02:45 about how we would be dramatically dropping. 42 00:02:45 --> 00:02:49 Of course, as it happened, it didn't occur. 43 00:02:49 --> 00:02:53 But it's just an example of how much trouble you can 44 00:02:53 --> 00:02:56 get into by extrapolating. 45 00:02:56 --> 00:03:00 The final sin I want to talk about is probably the most 46 00:03:00 --> 00:03:13 common, and it's called the Texas sharpshooter fallacy. 47 00:03:13 --> 00:03:17 Now before I get into that, are any of you here from Texas? 48 00:03:17 --> 00:03:19 All right, you're going to be offended. 49 00:03:19 --> 00:03:23 Let me think, OK, anybody here from Oklahoma? 50 00:03:23 --> 00:03:24 You'll like it. 51 00:03:24 --> 00:03:26 I'll dump on Oklahoma, it will be much better then. 52 00:03:26 --> 00:03:32 We'll talk about the Oklahoma sharpshooter fallacy. 53 00:03:32 --> 00:03:37 We won't talk about the BCS rankings, though. 54 00:03:37 --> 00:03:40 So the idea here is a pretty simple one. 
55 00:03:40 --> 00:03:45 This is a famous marksman who fires his gun randomly at the 56 00:03:45 --> 00:03:52 side of a barn, has a bunch of holes in it, then goes and 57 00:03:52 --> 00:03:59 takes a can of paint and draws bullseyes around all the places 58 00:03:59 --> 00:04:02 his bullets happened to hit. 59 00:04:02 --> 00:04:07 And people walk by the barn and say, God, he is good. 60 00:04:07 --> 00:04:13 So obviously, not a good thing, but amazingly easy 61 00:04:13 --> 00:04:16 to fall into this trap. 62 00:04:16 --> 00:04:17 So here's another example. 63 00:04:17 --> 00:04:24 In August of 2001, a paper which people took seriously 64 00:04:24 --> 00:04:27 appeared in a moderately serious journal called 65 00:04:27 --> 00:04:29 The New Scientist. 66 00:04:29 --> 00:04:33 And it announced that researchers in Scotland had 67 00:04:33 --> 00:04:38 proven that anorexics are likely to have been 68 00:04:38 --> 00:04:41 born in June. 69 00:04:41 --> 00:04:44 I'm sure you all knew that. 70 00:04:44 --> 00:04:45 How did they prove this? 71 00:04:45 --> 00:04:47 Or demonstrate this? 72 00:04:47 --> 00:04:59 They studied 446 women. 73 00:04:59 --> 00:05:04 Each of whom had been diagnosed anorexic. 74 00:05:04 --> 00:05:14 And they observed that about 30 percent more than 75 00:05:14 --> 00:05:33 average were born in June. 76 00:05:33 --> 00:05:36 Now, since the monthly average of births, if you divide this 77 00:05:36 --> 00:05:43 by 12, it's about 37, that tells us that 48 78 00:05:43 --> 00:05:47 were born in June. 79 00:05:47 --> 00:05:50 So at first sight, this seems significant, and in fact if you 80 00:05:50 --> 00:05:55 run tests, and ask what's the likelihood of that many more 81 00:05:55 --> 00:06:04 being born in 1 month, you'll find that it's quite unlikely. 82 00:06:04 --> 00:06:08 In fact, you'll find the probability of this happening 83 00:06:08 --> 00:06:16 is only about 3 percent, of it happening just by accident. 
84 00:06:16 --> 00:06:19 What's wrong with the logic here? 85 00:06:19 --> 00:06:19 Yes? 86 00:06:19 --> 00:06:24 STUDENT: They only studied diagnosed anorexics. 87 00:06:24 --> 00:06:26 PROFESSOR: No, because they were only interested in the 88 00:06:26 --> 00:06:30 question of when are anorexics born, so it made sense 89 00:06:30 --> 00:06:31 to only study those. 90 00:06:31 --> 00:06:35 Now maybe you're right, that we could study that, in fact, more 91 00:06:35 --> 00:06:37 people are born in June, period. 92 00:06:37 --> 00:06:38 That could be true. 93 00:06:38 --> 00:06:40 This would be one of the fallacies we looked 94 00:06:40 --> 00:06:42 at before, right? 95 00:06:42 --> 00:06:45 That there's a lurking variable which is just that people are 96 00:06:45 --> 00:06:47 more likely to be born in June. 97 00:06:47 --> 00:06:50 So that's certainly a possibility. 98 00:06:50 --> 00:06:52 What else? 99 00:06:52 --> 00:06:53 What else is the flaw? 100 00:06:53 --> 00:07:04 Where's the flaw in this logic? 101 00:07:04 --> 00:07:06 Well, what did they do? 102 00:07:06 --> 00:07:12 They participated in the Oklahoma sharpshooter fallacy. 103 00:07:12 --> 00:07:17 What they did is, they looked at 12 months, they took the 104 00:07:17 --> 00:07:22 month with the most births in it, which happened to be 105 00:07:22 --> 00:07:29 June, and calculated the probability of 3 percent. 106 00:07:29 --> 00:07:34 They didn't start with the hypothesis that it was June. 107 00:07:34 --> 00:07:36 They started with 12 months, and then they drew a 108 00:07:36 --> 00:07:39 bullseye around June. 109 00:07:39 --> 00:07:43 So the right question to ask is, what's the probability, not 110 00:07:43 --> 00:07:49 that June had 48 babies, but that at least one of the 111 00:07:49 --> 00:07:55 12 months had 48 babies. 112 00:07:55 --> 00:08:04 That probability is a lot higher than 3 percent, right? 113 00:08:04 --> 00:08:10 In fact, it's about 30 percent. 
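The difference between the two questions can be checked with a small Monte Carlo sketch. It is an illustration rather than the study's actual test: it assumes all 12 months are equally likely (ignoring month lengths and any seasonal birth pattern), and the function names, trial count, and seed are made up for this example. It estimates both the chance that one pre-chosen month (June) reaches 48 births out of 446 and the chance that at least one of the 12 months does.

```python
import random

def birth_month_counts(n_people, rng):
    """Assign each person a uniformly random birth month (0 = January)."""
    counts = [0] * 12
    for month in rng.choices(range(12), k=n_people):
        counts[month] += 1
    return counts

def estimate(n_people=446, threshold=48, trials=5000, seed=0):
    """Return (P(June >= threshold), P(some month >= threshold)) estimates."""
    rng = random.Random(seed)
    june_hits = 0
    any_hits = 0
    for _ in range(trials):
        counts = birth_month_counts(n_people, rng)
        if counts[5] >= threshold:      # index 5 is June: hypothesis fixed in advance
            june_hits += 1
        if max(counts) >= threshold:    # bullseye drawn after looking at the data
            any_hits += 1
    return june_hits / trials, any_hits / trials

p_june, p_any = estimate()
print("P(June has at least 48):      ", p_june)
print("P(some month has at least 48):", p_any)
```

Under these assumptions the second estimate should come out several times larger than the first, which is the point of the fallacy: the surprising-looking month was selected after the fact.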
114 00:08:10 --> 00:08:17 So what we see is, again perfectly reasonable 115 00:08:17 --> 00:08:21 statistical techniques, but not looking at 116 00:08:21 --> 00:08:23 things in the right way. 117 00:08:23 --> 00:08:26 And answering the wrong question. 118 00:08:26 --> 00:08:29 That make sense to everybody? 119 00:08:29 --> 00:08:31 And you can see why people can fall into this trap, right? 120 00:08:31 --> 00:08:36 It was a perfectly sensible, seemingly sensible argument. 121 00:08:36 --> 00:08:45 So the moral of this particular thing is, be very careful about 122 00:08:45 --> 00:08:49 looking at your data, drawing a conclusion, and then saying how 123 00:08:49 --> 00:08:53 probable was that to have occurred? 124 00:08:53 --> 00:08:56 Because again, you're probably, or maybe, drawing the 125 00:08:56 --> 00:08:59 bullseye around something that's already there. 126 00:08:59 --> 00:09:06 Now if they had taken another set of 446 anorexics, and again 127 00:09:06 --> 00:09:09 June was the month, then there would be some 128 00:09:09 --> 00:09:11 credibility in it. 129 00:09:11 --> 00:09:13 Because they would have started with the hypothesis, not that 130 00:09:13 --> 00:09:19 there existed a month, but that June was particularly likely. 131 00:09:19 --> 00:09:22 But then they would have to also check and make sure that 132 00:09:22 --> 00:09:25 June isn't just a popular month to be born, as 133 00:09:25 --> 00:09:27 was suggested earlier. 134 00:09:27 --> 00:09:30 All right, I could go on and on with this sort of 135 00:09:30 --> 00:09:32 thing, it's kind of fun. 136 00:09:32 --> 00:09:34 But I won't. 137 00:09:34 --> 00:09:36 Instead I'm going to torture you with yet 138 00:09:36 --> 00:09:39 one more simulation. 139 00:09:39 --> 00:09:43 You may be tempted at this point to just zone out. 140 00:09:43 --> 00:09:45 Try not to. 
141 00:09:45 --> 00:09:48 And as an added incentive for you to pay attention, I'm 142 00:09:48 --> 00:09:52 going to warn you that this particular simulation will 143 00:09:52 --> 00:09:58 appear in the final, or a variant of it. 144 00:09:58 --> 00:10:02 And what we'll be doing is, early next week we'll be 145 00:10:02 --> 00:10:06 distributing code, which we'll ask you to study, about two or 146 00:10:06 --> 00:10:11 three pages of code, and then on the final we'll be asking 147 00:10:11 --> 00:10:13 you questions about the code. 148 00:10:13 --> 00:10:14 Not that you have to memorize it, we'll 149 00:10:14 --> 00:10:16 give you a copy of it. 150 00:10:16 --> 00:10:20 But you should understand it before you walk in 151 00:10:20 --> 00:10:21 to take the final. 152 00:10:21 --> 00:10:24 Because there will not be time to look at that code for the 153 00:10:24 --> 00:10:30 first time during the quiz, and figure out what it's doing. 154 00:10:30 --> 00:10:38 OK, so let's look at it. 155 00:10:38 --> 00:10:41 I should also warn you that this code includes some Python 156 00:10:41 --> 00:10:46 concepts, at least one, that you have not yet seen. 157 00:10:46 --> 00:10:48 We'll see it briefly today. 158 00:10:48 --> 00:10:52 This is on purpose, because one of the things I hope you have 159 00:10:52 --> 00:10:56 learned to do this semester, is look up things you don't know, 160 00:10:56 --> 00:10:57 and figure out what they do. 161 00:10:57 --> 00:10:59 What they mean. 162 00:10:59 --> 00:11:03 Because we obviously can't, in any course, or even any set of 163 00:11:03 --> 00:11:05 courses, tell you everything you'll ever want 164 00:11:05 --> 00:11:07 to know in life. 165 00:11:07 --> 00:11:10 So intentionally, we've seeded some things in this program 166 00:11:10 --> 00:11:13 that will be unfamiliar, so during the time you're studying 167 00:11:13 --> 00:11:18 the program, get online, look it up, figure out what they do. 
168 00:11:18 --> 00:11:21 If you have trouble, we will be having office hours, where 169 00:11:21 --> 00:11:23 you can go and get some help. 170 00:11:23 --> 00:11:26 But the TAs will expect you to have at least tried to 171 00:11:26 --> 00:11:27 figure it out yourself. 172 00:11:27 --> 00:11:27 Yeah? 173 00:11:27 --> 00:11:30 STUDENT: Will the final be open note? 174 00:11:30 --> 00:11:32 PROFESSOR: Final will be open book, open notes, 175 00:11:32 --> 00:11:34 just like the quizzes. 176 00:11:34 --> 00:11:37 It will be the first two hours of the allotted time, we won't 177 00:11:37 --> 00:11:44 go the whole 3 hours, OK? 178 00:11:44 --> 00:11:48 So it won't be hugely longer than the quizzes. 179 00:11:48 --> 00:11:49 It would be a little bit longer. 180 00:11:49 --> 00:11:56 And again, very much in the same style of the quizzes. 181 00:11:56 --> 00:11:59 All right, let's look at this. 182 00:11:59 --> 00:12:03 Let's assume that you've won the lottery, and have serious 183 00:12:03 --> 00:12:09 money that you foolishly wish to invest in the stock market. 184 00:12:09 --> 00:12:15 There are two basic strategies to choose from, in investing. 185 00:12:15 --> 00:12:27 You can either have what's called an indexed portfolio, 186 00:12:27 --> 00:12:41 or a managed portfolio. 187 00:12:41 --> 00:12:45 Indexed portfolios, you basically say, I want to own 188 00:12:45 --> 00:12:49 all of the stocks that there are, and if the stock market 189 00:12:49 --> 00:12:51 goes up, I make money, if the stock market goes 190 00:12:51 --> 00:12:53 down, I lose money. 191 00:12:53 --> 00:12:55 I'm not going to be thinking I'm clever, and can pick 192 00:12:55 --> 00:12:57 winners and losers, I'm just betting on the 193 00:12:57 --> 00:13:00 market as a whole. 194 00:13:00 --> 00:13:03 They're attractive, in that a, they don't require 195 00:13:03 --> 00:13:05 a lot of thought. 
196 00:13:05 --> 00:13:09 And b, they have what's called a low expense ratio, since 197 00:13:09 --> 00:13:11 they're easy to implement, you don't pay anyone to be 198 00:13:11 --> 00:13:13 brilliant to implement it for you. 199 00:13:13 --> 00:13:17 So they're very low fees. 200 00:13:17 --> 00:13:21 A managed portfolio, you find somebody you think is really 201 00:13:21 --> 00:13:26 smart, and you pay them a fair amount of money, and in return 202 00:13:26 --> 00:13:30 they assert that they will pick winners for you, and in fact, 203 00:13:30 --> 00:13:33 you will outperform the stock market. 204 00:13:33 --> 00:13:36 And if it goes up 6 percent, well you'll go up 10 percent 205 00:13:36 --> 00:13:40 or more, and if it goes down, don't worry, I'm so smart 206 00:13:40 --> 00:13:45 your stocks won't go down. 207 00:13:45 --> 00:13:47 There's a lot of debate about which is the 208 00:13:47 --> 00:13:52 better of these two. 209 00:13:52 --> 00:13:55 And so now we're going to try and see if we can write a 210 00:13:55 --> 00:14:01 simulation that will give us some insight as to which of 211 00:14:01 --> 00:14:06 these might be better or worse. 212 00:14:06 --> 00:14:10 All right, so that's the basic problem. 213 00:14:10 --> 00:14:16 Now, as we know, and by the way we're not going to write a 214 00:14:16 --> 00:14:18 perfect simulation here, because we're going to try and 215 00:14:18 --> 00:14:21 do it in 40 minutes, or 30 minutes. 216 00:14:21 --> 00:14:24 And it would take at least an hour to do a perfect simulation 217 00:14:24 --> 00:14:26 of the stock market. 218 00:14:26 --> 00:14:29 All right. 219 00:14:29 --> 00:14:33 First thing we need to do is have some sort of a theory. 220 00:14:33 --> 00:14:36 When we did the spring, we had this theory of Hooke's Law that 221 00:14:36 --> 00:14:40 told us something, and we built a simulation, or built some 222 00:14:40 --> 00:14:42 tools around that theory. 
223 00:14:42 --> 00:14:47 Now we need to think about a model of the stock market. 224 00:14:47 --> 00:14:52 And the model we're going to use is based on what's 225 00:14:52 --> 00:15:00 called the Efficient Market Hypothesis. 226 00:15:00 --> 00:15:05 So the moral here, again, is whenever you're doing an 227 00:15:05 --> 00:15:07 implementation of a simulation, you do need to have some 228 00:15:07 --> 00:15:12 underlying theory about the model. 229 00:15:12 --> 00:15:14 What this model asserts is that markets are 230 00:15:14 --> 00:15:31 informationally efficient. 231 00:15:31 --> 00:15:36 That is to say, current prices reflect all publicly known 232 00:15:36 --> 00:15:43 information about each stock, and therefore are unbiased. 233 00:15:43 --> 00:15:47 That if people thought that the stock was underpriced, well 234 00:15:47 --> 00:15:49 people would buy more of it and the price would have 235 00:15:49 --> 00:15:51 risen already. 236 00:15:51 --> 00:15:53 If people thought the stock was overpriced, well, people would 237 00:15:53 --> 00:15:56 have tried to sell it, and it would have come down. 238 00:15:56 --> 00:15:59 So this is a very popular theory, believed by 239 00:15:59 --> 00:16:05 many famous economists today, and in the past. 240 00:16:05 --> 00:16:09 And says, OK, that effectively means that the market 241 00:16:09 --> 00:16:13 is memoryless. 242 00:16:13 --> 00:16:15 OK, that it doesn't matter what the price of the 243 00:16:15 --> 00:16:18 stock was yesterday. 244 00:16:18 --> 00:16:22 Today, it's priced given the best-known information, and 245 00:16:22 --> 00:16:29 so tomorrow it's equally likely to go up or down. 246 00:16:29 --> 00:16:32 Relative to the whole market, right? 247 00:16:32 --> 00:16:36 It's well known that over periods of multiple 248 00:16:36 --> 00:16:39 decades, the market has a tendency to go up. 
249 00:16:39 --> 00:16:42 And so there's an upward bias to the stock market, contrary 250 00:16:42 --> 00:16:46 to what you may have seen recently. 251 00:16:46 --> 00:16:49 But that no particular stock is more or less likely to 252 00:16:49 --> 00:16:53 outperform the market, because all the information is 253 00:16:53 --> 00:16:56 incorporated in the price. 254 00:16:56 --> 00:17:00 And that leads to a notion of being able to model 255 00:17:00 --> 00:17:04 the market, how? 256 00:17:04 --> 00:17:08 How would you model individual stocks if you believe 257 00:17:08 --> 00:17:14 this hypothesis? 258 00:17:14 --> 00:17:15 Somebody? 259 00:17:15 --> 00:17:16 What's going to happen? 260 00:17:16 --> 00:17:18 STUDENT: Random walk. 261 00:17:18 --> 00:17:20 PROFESSOR: Yes, exactly right. 262 00:17:20 --> 00:17:23 So we would model it as a random walk. 263 00:17:23 --> 00:17:26 In fact, there's a very famous book called A Random Walk Down 264 00:17:26 --> 00:17:30 Wall Street, that was one of the first to make 265 00:17:30 --> 00:17:34 this hypothesis. 266 00:17:34 --> 00:17:38 Now later, we may decide to abandon this model, but for 267 00:17:38 --> 00:17:43 the moment let's accept that. 268 00:17:43 --> 00:17:53 And let's think about how we're going to build the simulation. 269 00:17:53 --> 00:17:58 Whenever I think about how to build an interesting program, 270 00:17:58 --> 00:18:01 and I hope whenever you think about it, the first thing I 271 00:18:01 --> 00:18:05 think about is, what are the classes I might want to 272 00:18:05 --> 00:18:08 have, what are the types? 273 00:18:08 --> 00:18:12 And it seems pretty obvious that at least two of the 274 00:18:12 --> 00:18:18 things I'm going to want are stock and market. 275 00:18:18 --> 00:18:23 After all, I'm going to try and build a simulation of the stock 276 00:18:23 --> 00:18:26 market, so I might as well have the notion of a market, and 277 00:18:26 --> 00:18:31 probably the notion of a stock. 
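The random-walk model mentioned above can be sketched in a few lines. This is a minimal illustration, not the code from the lecture handout: the function name `random_walk`, the fixed 1 percent step, and the seed are assumptions chosen for the example. Each day the price moves up or down with equal probability, and the choice never looks at earlier days, which is what "memoryless" means here.

```python
import random

def random_walk(start_price, n_days, step=0.01, seed=None):
    """Simulate a memoryless price path (hypothetical helper).

    Each day the price moves up or down by `step` (as a fraction of the
    current price) with equal probability, independent of past moves.
    """
    rng = random.Random(seed)
    prices = [start_price]
    for _ in range(n_days):
        direction = rng.choice((-1, 1))   # equally likely up or down
        prices.append(prices[-1] * (1 + direction * step))
    return prices

walk = random_walk(100.0, 250, seed=1)
print("days simulated:", len(walk) - 1)
print("final price:   ", round(walk[-1], 2))
```

A real stock would not move by the same fixed percentage every day; a fuller model would draw each move from a per-stock distribution rather than a coin flip.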
278 00:18:31 --> 00:18:35 Which should I implement first? 279 00:18:35 --> 00:18:40 Well, my usual style of programming would be to 280 00:18:40 --> 00:18:43 implement the one that's lowest down in the 281 00:18:43 --> 00:18:48 hierarchy, near the bottom. 282 00:18:48 --> 00:18:50 I won't be able to show you what a market does unless I 283 00:18:50 --> 00:18:55 have stocks, but I can look at what an individual stock does 284 00:18:55 --> 00:18:58 without having a market. 285 00:18:58 --> 00:19:00 So why do I implement this first? 286 00:19:00 --> 00:19:03 Because it will be easier to unit test. 287 00:19:03 --> 00:19:10 I can build class stock, and I can test class stock, before 288 00:19:10 --> 00:19:16 I have a class market. 289 00:19:16 --> 00:19:30 So now let's look at it. 290 00:19:30 --> 00:19:38 Clean up the desktop a little bit. 291 00:19:38 --> 00:19:42 This is similar to, but not identical to, what you 292 00:19:42 --> 00:19:48 have in your handout. 293 00:19:48 --> 00:19:53 All right, so there's class stock. 294 00:19:53 --> 00:19:58 And I'm going to initialize it, create them, with 295 00:19:58 --> 00:19:59 an opening price. 296 00:19:59 --> 00:20:02 When a stock is first listed in the market, it 297 00:20:02 --> 00:20:06 comes with some price. 298 00:20:06 --> 00:20:10 I'm gonna keep as part of each stock, its history of prices, 299 00:20:10 --> 00:20:13 which we can initialize, well, I've initialized it as empty, 300 00:20:13 --> 00:20:17 but that's probably the wrong thing, right? 301 00:20:17 --> 00:20:26 I probably should have had it being the, starting here, 302 00:20:26 --> 00:20:32 right, the opening price. 303 00:20:32 --> 00:20:34 Now comes an interesting part. 304 00:20:34 --> 00:20:38 Self dot distribution. 305 00:20:38 --> 00:20:46 Well, I lied to you a little bit in my description of 306 00:20:46 --> 00:20:53 what it meant to have the Efficient Market Hypothesis. 
307 00:20:53 --> 00:20:59 I said that no stock is likely to outperform the market or 308 00:20:59 --> 00:21:02 underperform the market. 309 00:21:02 --> 00:21:06 But it's not quite true, because typically what they 310 00:21:06 --> 00:21:21 actually do is, they say it's adjusted for risk. 311 00:21:21 --> 00:21:27 It's clear that some stocks are more volatile than others. 312 00:21:27 --> 00:21:31 If you will buy stock in an electrical utility which has a 313 00:21:31 --> 00:21:34 guaranteed revenue stream, because no matter how bad the 314 00:21:34 --> 00:21:39 economy gets, a lot of people still use electricity, you 315 00:21:39 --> 00:21:43 don't expect it to fluctuate a lot. 316 00:21:43 --> 00:21:50 If you buy stock in a high tech company, that sells things on 317 00:21:50 --> 00:21:55 the internet, you might expect it to fluctuate enormously. 318 00:21:55 --> 00:21:59 Or if you buy stock in a retailer, you might expect 319 00:21:59 --> 00:22:04 it to go up or down more dramatically with the economy, 320 00:22:04 --> 00:22:09 and so in fact there is a notion of risk, and I'm not 321 00:22:09 --> 00:22:12 going to do this in this simulation, but usually people 322 00:22:12 --> 00:22:15 have to be paid to take risk. 323 00:22:15 --> 00:22:18 And so it's usually the case that you can get a higher 324 00:22:18 --> 00:22:20 return if you're willing to take more risk. 325 00:22:20 --> 00:22:27 We might or might not have time to come back to that. 326 00:22:27 --> 00:22:32 But more generally, the point is, that each stock actually 327 00:22:32 --> 00:22:36 behaves a little bit differently. 328 00:22:36 --> 00:22:39 There's a distribution of how it would move. 329 00:22:39 --> 00:22:45 So even if, on average, the stock is expected to not move 330 00:22:45 --> 00:22:51 at all from where it starts, some stocks will be expected to 331 00:22:51 --> 00:22:56 just trundle along without much change, not very volatile. 
332 00:22:56 --> 00:23:02 And other stocks might jump up and down a lot because 333 00:23:02 --> 00:23:03 they're very volatile. 334 00:23:03 --> 00:23:07 Even if the expected value is the same, they'd 335 00:23:07 --> 00:23:10 move around a lot. 336 00:23:10 --> 00:23:12 So how can we model this kind of thing? 337 00:23:12 --> 00:23:16 Well, we've already looked at the basic notion. 338 00:23:16 --> 00:23:19 Last time we looked at the notion, last lecture we looked 339 00:23:19 --> 00:23:25 at the idea of a distribution. 340 00:23:25 --> 00:23:29 And when we do a simulation, we're pulling the samples 341 00:23:29 --> 00:23:33 from some distribution. 342 00:23:33 --> 00:23:39 It could be normal, everything, that would be a Gaussian, where 343 00:23:39 --> 00:23:45 if you recall there was a mean, and a standard deviation, and 344 00:23:45 --> 00:23:49 most values were going to be close to the mean. 345 00:23:49 --> 00:23:52 Especially if there is a small standard deviation. 346 00:23:52 --> 00:23:56 If there's a large standard deviation it would be spread. 347 00:23:56 --> 00:23:59 Or it could be uniform, where every value was 348 00:23:59 --> 00:24:00 equally probable. 349 00:24:00 --> 00:24:03 We also looked at exponential. 350 00:24:03 --> 00:24:07 So we're going to assign to each stock, when we create 351 00:24:07 --> 00:24:11 it, a distribution. 352 00:24:11 --> 00:24:17 Some way of visualizing, or thinking about, where we draw 353 00:24:17 --> 00:24:20 the price changes from. 354 00:24:20 --> 00:24:26 This gets us into a new linguistic concept, which 355 00:24:26 --> 00:24:29 we'll see down here. 356 00:24:29 --> 00:24:33 You don't have this particular code on your handout, you do 357 00:24:33 --> 00:24:36 have a code that uses the same concept. 358 00:24:36 --> 00:24:39 So here's my unit test procedure. 359 00:24:39 --> 00:24:42 And here's where I'm going to create distributions. 360 00:24:42 --> 00:24:45 And I'm going to look at two. 
361 00:24:45 --> 00:24:49 A random-- a uniform, and a Gaussian. 362 00:24:49 --> 00:24:56 What lambda does is, it creates on the fly a function, 363 00:24:56 --> 00:24:59 as the program runs. 364 00:24:59 --> 00:25:03 That I can then pass around. 365 00:25:03 --> 00:25:08 So here, I'm going to look at the thing random dot uniform, 366 00:25:08 --> 00:25:13 for example, between minus volatility and plus volatility. 367 00:25:13 --> 00:25:16 So ignoring the lambda, what do we expect random 368 00:25:16 --> 00:25:19 dot uniform to do? 369 00:25:19 --> 00:25:25 With equal likelihood, in the range from minus volatility 370 00:25:25 --> 00:25:35 to plus volatility, it will return any value in here. 371 00:25:35 --> 00:25:39 But notice the previous line, where I am 372 00:25:39 --> 00:25:42 computing volatility. 373 00:25:42 --> 00:25:47 If I wanted every stock to have the same volatility, I could 374 00:25:47 --> 00:25:52 just do that, if you will, at the time I wrote my program. 375 00:25:52 --> 00:25:57 But here I want it to be determined, chosen at run time. 376 00:25:57 --> 00:26:03 So first, I choose a volatility randomly, from some 377 00:26:03 --> 00:26:06 distribution of possible volatilities from 0 to, 378 00:26:06 --> 00:26:12 in this case, 0.2. 379 00:26:12 --> 00:26:17 Think of this as the percentage move per day. 380 00:26:17 --> 00:26:22 So 2/10 of a percent, would be the move here. 381 00:26:22 --> 00:26:28 And then I'll create this function, this distribution d 382 00:26:28 --> 00:26:33 1, which will, whenever I call it, give me a random, a 383 00:26:33 --> 00:26:37 uniformly selected value between minus and 384 00:26:37 --> 00:26:42 plus volatility. 385 00:26:42 --> 00:26:48 Then when I create the stock, here, I can pass 386 00:26:48 --> 00:26:54 it in, pass in d 1. 387 00:26:54 --> 00:26:55 OK, it's a new concept. 
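The idea of building a distribution with lambda and handing it to a stock can be sketched as follows. This is a reconstruction from the spoken description, not the handout code: the class layout, the helper name `make_uniform_stock`, and the parameter names are assumptions. The history starts with the opening price, which the lecture suggests is probably the better initialization.

```python
import random

class Stock:
    """Minimal stock: a price, its history, and a source of price moves."""

    def __init__(self, price, distribution):
        self.price = price
        self.history = [price]            # start the history at the opening price
        self.distribution = distribution  # zero-argument function returning a move

def make_uniform_stock(opening_price, max_volatility=0.2):
    """Hypothetical helper: pick a volatility at run time, then build the stock."""
    volatility = random.uniform(0.0, max_volatility)
    # lambda creates a function on the fly; each call draws a fresh value
    # uniformly between -volatility and +volatility.
    d1 = lambda: random.uniform(-volatility, volatility)
    return Stock(opening_price, d1), volatility

stock, vol = make_uniform_stock(100.0)
samples = [stock.distribution() for _ in range(5)]
print("volatility:", vol)
print("sample moves:", samples)
```

One design point worth knowing when looking this up: a lambda captures the variable `volatility`, not a snapshot of its value. Here each call to `make_uniform_stock` gets its own `volatility`, so every stock keeps its own range; creating many lambdas in a single loop that reuses one variable would behave differently.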
388 00:26:55 --> 00:26:58 I don't expect you'll all immediately grab it, but you 389 00:26:58 --> 00:27:05 will need to understand it before the quiz comes along. 390 00:27:05 --> 00:27:08 And then I could also do a Gaussian one here, with the 391 00:27:08 --> 00:27:11 mean of 0 and the standard deviation of volatility 392 00:27:11 --> 00:27:13 divided by 2. 393 00:27:13 --> 00:27:15 Where do these parameters come from? 394 00:27:15 --> 00:27:17 I made them up out of whole cloth. 395 00:27:17 --> 00:27:20 Later we'll talk about how one could think about them 396 00:27:20 --> 00:27:24 more intelligently. 397 00:27:24 --> 00:27:29 Now what do I do with that? 398 00:27:29 --> 00:27:31 All right, we'll see that in a minute. 399 00:27:31 --> 00:27:38 But people understand what the basic idea here is. 400 00:27:38 --> 00:27:44 Now, I can set the price of a stock. 401 00:27:44 --> 00:27:47 And when I do that, I'll append it to history. 402 00:27:47 --> 00:27:50 I can, oh, these have got some remnants which 403 00:27:50 --> 00:27:52 we really don't need. 404 00:27:52 --> 00:27:58 I'll get rid of this which is just an uninteresting thing. 405 00:27:58 --> 00:28:05 And let's look at make move. 406 00:28:05 --> 00:28:06 Because this is the interesting thing. 407 00:28:06 --> 00:28:11 Make move is what we call to change the price of a stock, 408 00:28:11 --> 00:28:14 at the beginning or end of a day if you will. 409 00:28:14 --> 00:28:18 So the first thing it does, is it says, if self dot price is 410 00:28:18 --> 00:28:23 0, I'm just going to return. 411 00:28:23 --> 00:28:26 This is not the right thing to do, by the way. 412 00:28:26 --> 00:28:30 Again, there are some bugs in here. 413 00:28:30 --> 00:28:33 You won't find these bugs in your handout, right? 414 00:28:33 --> 00:28:35 Code is different in the handout. 415 00:28:35 --> 00:28:39 But I wanted to show these to you so we could think about them. 
416 00:28:39 --> 00:28:41 What I'm more interested in here than in the result 417 00:28:41 --> 00:28:46 of the simulation, is the process of creating it. 418 00:28:46 --> 00:28:48 So why did I put this here? 419 00:28:48 --> 00:28:53 Why did I say if self dot price equals 0 return? 420 00:28:53 --> 00:28:57 Because the first time I wrote the program, I didn't have 421 00:28:57 --> 00:29:01 anything like that here, and a stock could go to 0 422 00:29:01 --> 00:29:03 and then recover. 423 00:29:03 --> 00:29:05 Or even go to negative values. 424 00:29:05 --> 00:29:09 Well we know stock prices are never negative. 425 00:29:09 --> 00:29:11 And in fact we know if the price goes to 0, it's 426 00:29:11 --> 00:29:14 delisted from the exchange. 427 00:29:14 --> 00:29:21 So I said, all right, we better make a special case of that. it 428 00:29:21 --> 00:29:24 turns out, that this will be a bug, and I want you to think 429 00:29:24 --> 00:29:29 about why it's wrong for me to put this check here. 430 00:29:29 --> 00:29:33 The check needs to be somewhere in the program, but this is 431 00:29:33 --> 00:29:36 not the right place for it. 432 00:29:36 --> 00:29:40 So think about why I didn't leave it here. 433 00:29:40 --> 00:29:44 OK, then we'll get the old price, which we're going to try 434 00:29:44 --> 00:29:49 and remember, and now comes the interesting part. 435 00:29:49 --> 00:29:51 We're going to try and figure out how the 436 00:29:51 --> 00:29:54 price should change. 437 00:29:54 --> 00:29:59 So I'm first going to compute something called the base move. 438 00:29:59 --> 00:30:02 Think of this as kind of the basis from which we'll be 439 00:30:02 --> 00:30:05 computing the actual move. 440 00:30:05 --> 00:30:12 I'll draw something from the distribution, so this is 441 00:30:12 --> 00:30:15 interesting, I'm now calling self dot distribution, and 442 00:30:15 --> 00:30:20 remember this will be different for each stock. 
443 00:30:20 --> 00:30:23 It will return me some random value from either the uniform 444 00:30:23 --> 00:30:28 or the Gaussian distribution. 445 00:30:28 --> 00:30:30 With a different volatility for the stocks because that was 446 00:30:30 --> 00:30:35 also selected randomly, plus some market bias. 447 00:30:35 --> 00:30:37 Saying, well, the market on average will go up a little 448 00:30:37 --> 00:30:42 bit, or go down a little bit. 449 00:30:42 --> 00:30:46 And then I'll set the new price, if you will, self dot 450 00:30:46 --> 00:30:52 price, to self dot price times 1 plus the base move. 451 00:30:52 --> 00:30:53 So notice what this says. 452 00:30:53 --> 00:31:00 If the base move is 0, then the price doesn't change. 453 00:31:00 --> 00:31:02 So that makes sense. 454 00:31:02 --> 00:31:04 Interesting question. 455 00:31:04 --> 00:31:13 Why do you think I said self dot price times 1 plus the base 456 00:31:13 --> 00:31:16 move, rather than just adding the base move to the stock, 457 00:31:16 --> 00:31:18 price of the stock? 458 00:31:18 --> 00:31:22 Again, the first time I coded this, I had an addition there 459 00:31:22 --> 00:31:24 instead of a multiplication. 460 00:31:24 --> 00:31:31 What would the ramifications of an addition there be? 461 00:31:31 --> 00:31:35 That would say, how much the stock changed is independent 462 00:31:35 --> 00:31:39 of its current price. 463 00:31:39 --> 00:31:43 And when I ran that, I got weird results, because we know 464 00:31:43 --> 00:31:49 that a Google priced at, say, 300, is much more likely to 465 00:31:49 --> 00:31:58 move by 10 points in a day than a stock that's priced at $0.50. 466 00:31:58 --> 00:32:02 So in fact, it is the case, if you look at data, and by the 467 00:32:02 --> 00:32:04 way, that's the way I ended up setting a lot of these 468 00:32:04 --> 00:32:07 parameters and playing with it, was comparing what my 469 00:32:07 --> 00:32:11 simulation said to historical stock data. 
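The contrast between adding the base move and multiplying by one plus the base move can be made concrete with a short sketch. The two function names and the sample base move of 0.02 are assumptions for illustration; the prices 300 and 0.50 echo the ones used in the lecture.

```python
def additive_move(price, base_move):
    """Change is independent of the current price."""
    return price + base_move

def multiplicative_move(price, base_move):
    """Change is proportional to the current price."""
    return price * (1.0 + base_move)

base_move = 0.02  # hypothetical draw from a stock's distribution
for price in (0.50, 300.0):
    added = additive_move(price, base_move)
    scaled = multiplicative_move(price, base_move)
    print(f"start {price:>7.2f}  additive {added:>7.2f}  multiplicative {scaled:>7.2f}")
```

With the additive rule both stocks gain the same two cents, a large jump for the cheap stock and a negligible one for the expensive stock. With the multiplicative rule both move by the same percentage, so the expensive stock moves by more points.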
470 00:32:11 --> 00:32:14 And indeed it is the case that the price of the stock, the 471 00:32:14 --> 00:32:17 move, the amount of move, tends to be proportional 472 00:32:17 --> 00:32:20 to the price of the stock. 473 00:32:20 --> 00:32:22 Expensive stocks move more. 474 00:32:22 --> 00:32:25 Interestingly enough, the percentage moves are not much 475 00:32:25 --> 00:32:29 different between cheap stocks and expensive stocks. 476 00:32:29 --> 00:32:34 And that's why, I ended up using a multiplicative factor, 477 00:32:34 --> 00:32:47 rather than an additive factor. 478 00:32:47 --> 00:32:50 This is again a general lesson. 479 00:32:50 --> 00:32:53 As you build these kinds of simulations, or anything like 480 00:32:53 --> 00:32:58 this, you need to think through whether things should be 481 00:32:58 --> 00:32:59 multiplicative or additive. 482 00:32:59 --> 00:33:03 Because you get very different results, typically. 483 00:33:03 --> 00:33:07 Multiplicative is what you want to do if the amount of change 484 00:33:07 --> 00:33:13 is proportional to the current size, whether it's price or 485 00:33:13 --> 00:33:18 anything else, and additive if the change is independent of 486 00:33:18 --> 00:33:22 the current value, typically, is I think the general 487 00:33:22 --> 00:33:27 way to think about it. 488 00:33:27 --> 00:33:31 Now, you'll see this other kind of peculiar thing. 489 00:33:31 --> 00:33:39 So I've now set the price, and then I've got this test here. 490 00:33:39 --> 00:33:45 If mo, mo stands for momentum. 491 00:33:45 --> 00:33:53 I'm now exploring the question of whether or not stock prices 492 00:33:53 --> 00:34:03 are indeed memoryless, or the stock changes. 493 00:34:03 --> 00:34:09 And the fancy word for that is Poisson. 494 00:34:09 --> 00:34:14 People often model things as Poisson processes, which is to 495 00:34:14 --> 00:34:20 say, processes in which past behavior has no impact on 496 00:34:20 --> 00:34:24 future behavior, it's memoryless. 
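The multiplicative-versus-additive point made above can be sketched in a few lines of Python. This is only an illustration: the function names `additive_move` and `multiplicative_move` and the sample prices are assumptions for the example, not the code shown in lecture.

```python
def additive_move(price, base_move):
    """Change is independent of the current price (the first attempt described above)."""
    return price + base_move


def multiplicative_move(price, base_move):
    """Change is proportional to the current price: price * (1 + base_move)."""
    return price * (1.0 + base_move)


if __name__ == "__main__":
    # Hypothetical prices: an expensive stock and a cheap one.
    for price in (300.0, 0.50):
        # A 2% move scales with the price; a fixed 2-point drop does not,
        # and it pushes the cheap stock below zero.
        print(price, multiplicative_move(price, 0.02), additive_move(price, -2.0))
```

With the multiplicative form, a base move of 0 leaves the price unchanged and the same percentage move applies to cheap and expensive stocks alike; with the additive form, a fixed-size drop can drive a cheap stock negative.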
497 00:34:24 --> 00:34:29 And in fact, that's what the Efficient Market Hypothesis 498 00:34:29 --> 00:34:31 purports to say. 499 00:34:31 --> 00:34:35 It says that, since all the information is in the current 500 00:34:35 --> 00:34:38 price, you don't have to worry about whether it went up or 501 00:34:38 --> 00:34:45 down yesterday, to decide what it's going to do today. 502 00:34:45 --> 00:34:48 There are people who don't believe that, and instead 503 00:34:48 --> 00:34:52 argue that there is this notion called momentum. 504 00:34:52 --> 00:34:54 These are called momentum investors. 505 00:34:54 --> 00:34:58 And they say, what's most likely to happen today, is 506 00:34:58 --> 00:35:00 what happened yesterday. 507 00:35:00 --> 00:35:01 Or more likely. 508 00:35:01 --> 00:35:04 If the stock went up yesterday, it's more likely to go up 509 00:35:04 --> 00:35:08 today, than if it didn't go up yesterday. 510 00:35:08 --> 00:35:13 So I wasn't sure which religion I was willing to believe in, if 511 00:35:13 --> 00:35:17 either, so I added a parameter called, if you believe in 512 00:35:17 --> 00:35:24 momentum, then you should change the price by -- And here 513 00:35:24 --> 00:35:28 I just did something taking a Gaussian times the last change, 514 00:35:28 --> 00:35:32 and, in fact, added it in. 515 00:35:32 --> 00:35:35 So if it went up yesterday, it will more likely go up today, 516 00:35:35 --> 00:35:39 because I'm throwing in a positive number, otherwise 517 00:35:39 --> 00:35:41 a negative number. 518 00:35:41 --> 00:35:44 Notice that this is additive. 519 00:35:44 --> 00:35:48 Because it's dealing with yesterday's price. 520 00:35:48 --> 00:35:51 Change, with the change. 521 00:35:51 --> 00:35:56 OK, so that's why we're dealing with that. 522 00:35:56 --> 00:35:59 Now, here's where I should've put in this 523 00:35:59 --> 00:36:03 test that I had up here. 524 00:36:03 --> 00:36:05 Get it out from there. 
525 00:36:05 --> 00:36:08 Because what I want to do is, say if self dot price is less 526 00:36:08 --> 00:36:14 than 0.01, I'm going to set it to 0, just keep it there. 527 00:36:14 --> 00:36:19 That doesn't solve the problem we had before though, right? 528 00:36:19 --> 00:36:25 Then I'm going to append it, and keep the last 529 00:36:25 --> 00:36:28 change for future use. 530 00:36:28 --> 00:36:32 OK, people understand what's going on here? 531 00:36:32 --> 00:36:35 And then show history is just going to produce a plot. 532 00:36:35 --> 00:36:37 We've seen that a million times before. 533 00:36:37 --> 00:36:42 Any questions about this? 534 00:36:42 --> 00:36:45 Well, I have a question? 535 00:36:45 --> 00:36:46 Does it make any sense? 536 00:36:46 --> 00:36:48 Is it going to work at all? 537 00:36:48 --> 00:36:51 So now let's test it. 538 00:36:51 --> 00:37:01 So, I now have this unit test program called unit test stock. 539 00:37:01 --> 00:37:04 I originally did not make it a function, I had it in-line, and 540 00:37:04 --> 00:37:07 I realized that was really stupid, because I wanted 541 00:37:07 --> 00:37:10 to do it a lot of times. 542 00:37:10 --> 00:37:15 So it's got an internal procedure, internal function, 543 00:37:15 --> 00:37:20 local to the unit test, that runs the simulation. 544 00:37:20 --> 00:37:25 And it takes the set of stocks to simulate, a fig, figure 545 00:37:25 --> 00:37:29 number, this is going to print a bunch of graphs, and I want 546 00:37:29 --> 00:37:32 to say what graph it is, and whether or not I 547 00:37:32 --> 00:37:35 believe in big mo. 548 00:37:35 --> 00:37:41 It sets the mean to 0, and then for s in the stocks, it moves 549 00:37:41 --> 00:37:46 it, giving it the bias and the momentum, then it 550 00:37:46 --> 00:37:49 shows the history. 551 00:37:49 --> 00:37:53 And then computes the mean of, getting me the mean 552 00:37:53 --> 00:37:54 of all the stocks in it. 
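As a rough sketch of the move logic described above (base move, multiplicative update, optional momentum term, then the near-zero check and the history append), something like the following would do. The class and attribute names, and the Gaussian parameters `0.5, 0.5` in the momentum term, are assumptions made for illustration; the lecture's actual code may differ.

```python
import random


class Stock:
    def __init__(self, price, distribution):
        self.price = price
        self.history = [price]
        # Zero-argument callable returning a fractional move (assumed interface).
        self.distribution = distribution
        self.last_change = 0.0

    def make_move(self, market_bias, mo):
        old_price = self.price
        # Base move: a draw from this stock's distribution plus the market bias.
        base_move = self.distribution() + market_bias
        # Multiplicative update: the change is proportional to the current price.
        self.price = self.price * (1.0 + base_move)
        if mo:
            # Momentum is additive: a Gaussian factor times the previous change.
            # The parameters 0.5, 0.5 are illustrative assumptions.
            self.price += random.gauss(0.5, 0.5) * self.last_change
        # The near-zero check comes after the update, not at the top of the method.
        if self.price < 0.01:
            self.price = 0.0
        self.history.append(self.price)
        self.last_change = self.price - old_price
```

Note that every call appends to the history, whatever the price is. The sketch makes no claim to resolve the question the lecture leaves open about what the check still fails to handle.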
553 00:37:54 --> 00:37:57 We've seen this sort of thing many times before. 554 00:37:57 --> 00:37:59 I've then got some constants. 555 00:37:59 --> 00:38:02 By the way, I want to emphasize that I've named these constants 556 00:38:02 --> 00:38:04 to make it easier to change. 557 00:38:04 --> 00:38:10 Starting with 20 stocks, 100 days. 558 00:38:10 --> 00:38:15 And then what I do is, stocks 1 will be the 559 00:38:15 --> 00:38:18 empty list, stocks 2 is the empty list. 560 00:38:18 --> 00:38:27 Why do you think I'm starting with bias of 0? 561 00:38:27 --> 00:38:32 Because, what do you think the mean should be, if I simulate 562 00:38:32 --> 00:38:35 various things with the bias of 0? 563 00:38:35 --> 00:38:39 I start with $100 as the average price of the stock, 564 00:38:39 --> 00:38:42 what should the average price of the stock be? 565 00:38:42 --> 00:38:45 If my code is correct, what should the average price be, 566 00:38:45 --> 00:38:50 after say, 100 days, if there's no bias. 567 00:38:50 --> 00:38:51 Pardon? 568 00:38:51 --> 00:38:53 100, exactly. 569 00:38:53 --> 00:38:56 Since there's no upward or downward bias. 570 00:38:56 --> 00:39:00 They may fluctuate wildly, but if I look at enough stocks, the 571 00:39:00 --> 00:39:03 average should be right around 100. 572 00:39:03 --> 00:39:05 I don't know what the average would be if I 573 00:39:05 --> 00:39:07 chose a different bias. 574 00:39:07 --> 00:39:11 It's a little bit complicated, so I chose the simplest bias. 575 00:39:11 --> 00:39:15 Important lesson, so that there would be some predictability in 576 00:39:15 --> 00:39:19 the results, and I would have some, if you will, smoke test 577 00:39:19 --> 00:39:22 for knowing whether or not my code 578 00:39:22 --> 00:39:27 seemed to be working. 579 00:39:27 --> 00:39:31 All right, and initially, well, maybe initially, just to be 580 00:39:31 --> 00:39:39 simple I'm going to start momentum equal to false.
581 00:39:39 --> 00:39:41 Because, again, it seems simpler to have a model where 582 00:39:41 --> 00:39:43 there's no momentum. 583 00:39:43 --> 00:39:46 I'm looking for the simplest model possible for the 584 00:39:46 --> 00:39:49 first time I run it. 585 00:39:49 --> 00:39:51 And then we looked at this little loop before, for i in 586 00:39:51 --> 00:39:55 range number of stocks, I'm going to create two different 587 00:39:55 --> 00:40:00 lists of stocks, one where the moves, or distributions, are 588 00:40:00 --> 00:40:04 chosen from a uniform, and the other where they're Gaussian. 589 00:40:04 --> 00:40:07 Because I'm sort of curious as to, again, which is the right 590 00:40:07 --> 00:40:14 way to think about this, all right? 591 00:40:14 --> 00:40:18 And then, I'm going to just call it. 592 00:40:18 --> 00:40:22 We'll see what we get. 593 00:40:22 --> 00:40:22 So let's do it. 594 00:40:22 --> 00:40:25 Let's hope that all the changes I made have not introduced 595 00:40:25 --> 00:40:29 a syntax error. 596 00:40:29 --> 00:40:31 All right, well at least it did something. 597 00:40:31 --> 00:40:39 Let's see what it did. 598 00:40:39 --> 00:40:44 So the test on the left, you'll remember, was the one with test 599 00:40:44 --> 00:40:48 one, I believe, was the uniform distribution, and test 600 00:40:48 --> 00:40:51 two is the Gaussian. 601 00:40:51 --> 00:40:55 So, but let's, what should we do first? 602 00:40:55 --> 00:41:00 Well, let's do the smoke test number one: is the mean more 603 00:41:00 --> 00:41:02 or less what we expected? 604 00:41:02 --> 00:41:05 Well, it looks like it's dead on 100, which was our 605 00:41:05 --> 00:41:09 initial price in test two. 606 00:41:09 --> 00:41:12 And in test one it's a little bit above 100. 607 00:41:12 --> 00:41:16 But, we didn't do that many stocks, or that many days, 608 00:41:16 --> 00:41:24 so it's quite plausible that it's correct.
609 00:41:24 --> 00:41:28 But, just to be sure, not to be sure, but just to increase my 610 00:41:28 --> 00:41:46 confidence, I'm going to just run it again. 611 00:41:46 --> 00:41:50 Well, here I'm a little bit below 100 in test two, and 612 00:41:50 --> 00:41:53 test one a little bit below 100 as well. 613 00:41:53 --> 00:41:56 You remember last time was a little bit above 100. 614 00:41:56 --> 00:41:59 I feel pretty good about this, and in fact I ran it a lot 615 00:41:59 --> 00:42:01 of times in my office. 616 00:42:01 --> 00:42:05 And it just bounces around, hovering around 100. 617 00:42:05 --> 00:42:06 Of course, this is the wrong way to do it. 618 00:42:06 --> 00:42:09 I should really just put it in a nice test harness, where I 619 00:42:09 --> 00:42:14 run 100, 200, 1,000 trials, but I didn't want to bore 620 00:42:14 --> 00:42:15 you with that here. 621 00:42:15 --> 00:42:20 So we'll see that, OK, we passed the first smoke test. 622 00:42:20 --> 00:42:25 We seem to be where we expect to be. 623 00:42:25 --> 00:42:30 Well, let's try smoke test two. 624 00:42:30 --> 00:42:32 What else might we want to see, to see if we got 625 00:42:32 --> 00:42:34 things working properly? 626 00:42:34 --> 00:42:38 Well, I kind of ignored the notion of bias by making it 627 00:42:38 --> 00:42:49 0, so let's give it a big bias here. 628 00:42:49 --> 00:43:09 Assuming it will let me edit it. 629 00:43:09 --> 00:43:11 We just gotta start it up again, it's the 630 00:43:11 --> 00:43:34 safest thing to do. 631 00:43:34 --> 00:43:38 You wouldn't think I would have, I don't have -- all 632 00:43:38 --> 00:43:42 right, be that way about it. 633 00:43:42 --> 00:43:46 Fortunately, we've been through this before. 634 00:43:46 --> 00:43:49 We know if we relaunch the Finder. 635 00:43:49 --> 00:43:58 Who says Mac OS is flawless? 636 00:43:58 --> 00:44:07 All right, we were down here, and I was saying, let's try 637 00:44:07 --> 00:44:10 a larger, introduce a bias.
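The test harness mentioned above, running many trials instead of re-running by hand, might look something like this minimal sketch. It is self-contained and much simpler than the lecture's program: the function names, the fixed volatility of 0.01, and the use of Gaussian moves only are all assumptions for illustration.

```python
import random


def simulate_mean_price(num_stocks, num_days, bias,
                        start_price=100.0, volatility=0.01):
    """Mean final price of num_stocks stocks after num_days multiplicative moves."""
    total = 0.0
    for _ in range(num_stocks):
        price = start_price
        for _ in range(num_days):
            # Gaussian base move plus market bias, applied multiplicatively.
            price *= 1.0 + random.gauss(0.0, volatility) + bias
        total += price
    return total / num_stocks


def run_trials(num_trials, bias, num_stocks=20, num_days=100):
    """Average the per-trial mean price over many independent trials."""
    means = [simulate_mean_price(num_stocks, num_days, bias)
             for _ in range(num_trials)]
    return sum(means) / len(means)


if __name__ == "__main__":
    # Smoke test one: with no bias the average should hover around 100.
    print(run_trials(200, 0.0))
    # Smoke test two: a positive bias should push the average well above 100.
    print(run_trials(200, 0.005))
```

Averaging over a few hundred trials smooths out the run-to-run bouncing seen with a single run of 20 stocks, so a drift away from 100 at zero bias would be much easier to spot.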
638 00:44:10 --> 00:44:12 Again, we're trying to see if it does what 639 00:44:12 --> 00:44:13 we think it might do. 640 00:44:13 --> 00:44:17 So what do you think it should do with a bias? 641 00:44:17 --> 00:44:21 Where should the mean be now? 642 00:44:21 --> 00:44:24 Still around 100? 643 00:44:24 --> 00:44:26 Or higher, right? 644 00:44:26 --> 00:44:29 Because we've now put in a bias suggesting 645 00:44:29 --> 00:44:31 that it should go up. 646 00:44:31 --> 00:44:33 Oops. 647 00:44:33 --> 00:44:37 It wouldn't have hurt it. 648 00:44:37 --> 00:44:39 All right. 649 00:44:39 --> 00:44:49 So let's run it. 650 00:44:49 --> 00:44:54 Sure enough, for one, we see, test two, it's a little bit 651 00:44:54 --> 00:45:02 over 100, and for test one it's way over 100. 652 00:45:02 --> 00:45:11 Well, let's make sure it's not a fluke. 653 00:45:11 --> 00:45:21 Try it again. 654 00:45:21 --> 00:45:26 So, sure enough, changing the bias changed the price, 655 00:45:26 --> 00:45:31 and even changed it in the right direction. 656 00:45:31 --> 00:45:33 So we can feel pretty comfortable that it's doing 657 00:45:33 --> 00:45:35 something good with that. 658 00:45:35 --> 00:45:37 We could also feel pretty comfortable that that's 659 00:45:37 --> 00:45:40 probably way too high a bias, right? 660 00:45:40 --> 00:45:45 We would not expect that the mean should be over 160, or 661 00:45:45 --> 00:45:49 in one case, 150, after only 100 days trading, right? 662 00:45:49 --> 00:45:53 Things don't typically go up 50% in 100 days. 663 00:45:53 --> 00:46:03 They go down 50%, but -- All right, so that's good. 664 00:46:03 --> 00:46:06 Oh, let's look at something else now. 665 00:46:06 --> 00:46:10 Let's go back to a simpler bias here. 666 00:46:10 --> 00:46:13 We'll run it again. 667 00:46:13 --> 00:46:16 And think about, what's the difference between the 668 00:46:16 --> 00:46:26 Gaussian and the uniform? 669 00:46:26 --> 00:46:35 Can we deduce anything about those?
670 00:46:35 --> 00:46:37 Not, well, let me ask you. 671 00:46:37 --> 00:46:45 What do you think, yes or no? 672 00:46:45 --> 00:46:47 Anybody see anything interesting here? 673 00:46:47 --> 00:46:47 Yeah? 674 00:46:47 --> 00:46:52 STUDENT: The variance of the Gaussian seems to be less than 675 00:46:52 --> 00:46:54 the variance of the uniform. 676 00:46:54 --> 00:46:55 PROFESSOR: The variance of the Gaussian -- 677 00:46:55 --> 00:46:59 STUDENT: -- is less. 678 00:46:59 --> 00:47:02 PROFESSOR: So all right, that appears to be the case here. 679 00:47:02 --> 00:47:04 But let's run it again, as we've done with 680 00:47:04 --> 00:47:06 all the other tests. 681 00:47:06 --> 00:47:09 So we have a hypothesis. 682 00:47:09 --> 00:47:13 Let's not fall victim to the Oklahoma sharpshooter. 683 00:47:13 --> 00:47:19 We'll test our hypothesis, or at least examine it again, 684 00:47:19 --> 00:47:27 see if it's, in some sense, repeatable. 685 00:47:27 --> 00:47:32 Well, now what do we see? 686 00:47:32 --> 00:47:35 Doesn't seem to be true this time, right? 687 00:47:35 --> 00:47:37 Not obviously. 688 00:47:37 --> 00:47:42 So, we're not sure about this. 689 00:47:42 --> 00:47:45 So this is something that we would need to 690 00:47:45 --> 00:47:49 investigate further. 691 00:47:49 --> 00:47:52 And we would need, to have to look at it, and it's going to 692 00:47:52 --> 00:47:57 be very tricky, by the way, as to what the right answer is. 693 00:47:57 --> 00:48:04 But if you think about it, it would not be surprising if the 694 00:48:04 --> 00:48:09 Gaussians, at least, gave us some surprising, more extreme, 695 00:48:09 --> 00:48:12 results, than the uniform. 696 00:48:12 --> 00:48:16 Because the uniform, as we've set it up here, is bounded. 697 00:48:16 --> 00:48:22 The minimum and the maximum is bounded. 698 00:48:22 --> 00:48:26 With the Gaussian, there's a tail. 
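The bounded-versus-tail distinction just described can be checked with a short sketch. The volatility value and the helper name `max_abs_move` are assumptions for illustration, and the two distributions here are not matched in variance; the only point is that uniform draws cannot leave their interval while Gaussian draws occasionally do.

```python
import random


def max_abs_move(draw, n):
    """Largest absolute move seen in n draws from the given zero-argument callable."""
    return max(abs(draw()) for _ in range(n))


VOLATILITY = 0.01  # illustrative value, not taken from the lecture


def uniform_draw():
    # Bounded: every draw lies within [-VOLATILITY, VOLATILITY].
    return random.uniform(-VOLATILITY, VOLATILITY)


def gaussian_draw():
    # Unbounded: most draws are small, but the tails allow rare large moves.
    return random.gauss(0.0, VOLATILITY)


if __name__ == "__main__":
    print(max_abs_move(uniform_draw, 10000))
    print(max_abs_move(gaussian_draw, 10000))
```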
699 00:48:26 --> 00:48:29 And you might every once in a while get this, at least as 700 00:48:29 --> 00:48:34 we've done it in this case, this large move out at the end. 701 00:48:34 --> 00:48:36 You might not. 702 00:48:36 --> 00:48:40 There's nothing profound about this, other than the 703 00:48:40 --> 00:48:43 understanding that the details of how you set these things 704 00:48:43 --> 00:48:47 up can matter a lot. 705 00:48:47 --> 00:48:54 Well, the final thing I want to look at is momentum. 706 00:48:54 --> 00:49:07 So let's go back, and let's set mo to true here. 707 00:49:07 --> 00:49:10 Well, doesn't want us to set mo to true here. 708 00:49:10 --> 00:49:16 Ah, there it does. 709 00:49:16 --> 00:49:19 So, and now let's run it and see what happens. 710 00:49:19 --> 00:49:25 What do you think should happen? 711 00:49:25 --> 00:49:26 Anybody? 712 00:49:26 --> 00:49:31 STUDENT: [INAUDIBLE] 713 00:49:31 --> 00:49:36 PROFESSOR: I think you're right. 714 00:49:36 --> 00:49:41 These ones should curl, see if I can -- oh, not bad. 715 00:49:41 --> 00:49:48 Let's run it. 716 00:49:48 --> 00:49:53 Well, it's a little hard to see, but things 717 00:49:53 --> 00:49:56 tend to take off. 718 00:49:56 --> 00:49:59 Because once things start moving, they tend to 719 00:49:59 --> 00:50:05 move in that direction. 720 00:50:05 --> 00:50:10 All right. 721 00:50:10 --> 00:50:12 How do we go about choosing these parameters? 722 00:50:12 --> 00:50:14 How do we go about deciding what to do? 723 00:50:14 --> 00:50:18 Well, we play with it, the way I've been playing with it, and 724 00:50:18 --> 00:50:22 compare the results to some set of real data. 725 00:50:22 --> 00:50:26 And then we try and get our simulation to match the past, 726 00:50:26 --> 00:50:31 and hope that that will help it predict the future. 727 00:50:31 --> 00:50:33 We don't have enough time to go through all that, 728 00:50:33 --> 00:50:35 to do that a lot.
729 00:50:35 --> 00:50:39 I will be posting code that you can play with, and I suggest 730 00:50:39 --> 00:50:42 you go through exactly this kind of exercise. 731 00:50:42 --> 00:50:45 Because this is really the way that people do 732 00:50:45 --> 00:50:46 develop simulations. 733 00:50:46 --> 00:50:50 They don't, out of whole cloth, get it right the first time. 734 00:50:50 --> 00:50:54 They build them, they do what if games, they play with them, 735 00:50:54 --> 00:50:57 and then they try and adjust them to get them right. 736 00:50:57 --> 00:51:00 The nice thing here is, you can decide whether you believe 737 00:51:00 --> 00:51:04 momentum and see what it would mean, or not mean, etc. 738 00:51:04 --> 00:51:06 All right, one more lecture. 739 00:51:06 --> 00:51:09 See you guys next week. 740 00:51:09 --> 00:51:09