1 00:00:01,640 --> 00:00:04,040 The following content is provided under a Creative 2 00:00:04,040 --> 00:00:05,580 Commons license. 3 00:00:05,580 --> 00:00:07,880 Your support will help MIT OpenCourseWare 4 00:00:07,880 --> 00:00:12,270 continue to offer high quality educational resources for free. 5 00:00:12,270 --> 00:00:14,870 To make a donation or view additional materials 6 00:00:14,870 --> 00:00:18,830 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,830 --> 00:00:20,000 at ocw.mit.edu. 8 00:00:23,234 --> 00:00:27,560 TOMER ULLMAN: So so far, we've talked about just examples 9 00:00:27,560 --> 00:00:28,910 of running things forward. 10 00:00:28,910 --> 00:00:32,310 I hope I've given you some examples 11 00:00:32,310 --> 00:00:34,310 of different procedures that you can run forward 12 00:00:34,310 --> 00:00:36,290 to get some interesting stuff, whether it's 13 00:00:36,290 --> 00:00:38,720 a mixture of Gaussians, whether it's 14 00:00:38,720 --> 00:00:41,570 sort of this mixture of Gaussian plus uniform, 15 00:00:41,570 --> 00:00:43,619 whether it's just flipping the coin. 16 00:00:43,619 --> 00:00:46,160 But the question is, OK, I've written down my forward model-- 17 00:00:46,160 --> 00:00:48,590 and hopefully, you saw that, even if it was a little bit 18 00:00:48,590 --> 00:00:50,840 broken, even if you didn't get the full details, 19 00:00:50,840 --> 00:00:52,870 it wasn't that hard to write it down, right? 20 00:00:52,870 --> 00:00:54,620 Someone could say, listen, I think the way 21 00:00:54,620 --> 00:00:55,760 the bombing works is this. 22 00:00:55,760 --> 00:00:56,670 You're going to put a Gaussian. 23 00:00:56,670 --> 00:00:58,480 You're going to put another Gaussian maybe, 24 00:00:58,480 --> 00:00:59,420 or maybe you're going to put three. 25 00:00:59,420 --> 00:01:00,460 I don't know how many. 26 00:01:00,460 --> 00:01:01,340 And you can write that down. 
27 00:01:01,340 --> 00:01:03,923 And then you say, OK, I actually want to do inference on that. 28 00:01:03,923 --> 00:01:06,710 And that's when it becomes a little bit painful to do. 29 00:01:06,710 --> 00:01:09,740 And if only there was a way of running forward your model 30 00:01:09,740 --> 00:01:11,790 and, having written the forward direction, 31 00:01:11,790 --> 00:01:12,814 you can do inference. 32 00:01:12,814 --> 00:01:15,230 And it looks like we're talking about something completely 33 00:01:15,230 --> 00:01:17,660 different, but actually, it's not. 34 00:01:17,660 --> 00:01:19,610 We're basically going to run our models, 35 00:01:19,610 --> 00:01:21,359 but we're going to run our models in a way 36 00:01:21,359 --> 00:01:24,510 that it's going to do inference. 37 00:01:24,510 --> 00:01:25,220 So let's see. 38 00:01:25,220 --> 00:01:26,750 How would we possibly do that? 39 00:01:26,750 --> 00:01:30,140 So the basic syntax for any sort of query-- 40 00:01:30,140 --> 00:01:32,300 any sort of inference-- in Church 41 00:01:32,300 --> 00:01:34,790 is by stating the following procedure. 42 00:01:34,790 --> 00:01:38,450 You start out with saying query, where query is not itself 43 00:01:38,450 --> 00:01:38,970 a command. 44 00:01:38,970 --> 00:01:40,730 It's just that there are all sorts of queries. 45 00:01:40,730 --> 00:01:42,160 There's rejection query. 46 00:01:42,160 --> 00:01:44,420 There's mh-query-- Metropolis Hastings. 47 00:01:44,420 --> 00:01:47,187 There's explicit enumeration. 48 00:01:47,187 --> 00:01:48,770 But the point is, you would write down 49 00:01:48,770 --> 00:01:50,896 that particular query, then you would write down 50 00:01:50,896 --> 00:01:51,770 the generative model. 51 00:01:51,770 --> 00:01:53,840 This is a list of things-- 52 00:01:53,840 --> 00:01:55,510 the way that you think the world works. 53 00:01:55,510 --> 00:01:57,890 So here, for example, you would put in the London bombing 54 00:01:57,890 --> 00:01:58,390 example. 
55 00:01:58,390 --> 00:02:00,860 You would put in the list of things like, I don't know, 56 00:02:00,860 --> 00:02:02,639 it's either uniform or not uniform, 57 00:02:02,639 --> 00:02:04,180 it's either Gaussian or not Gaussian. 58 00:02:04,180 --> 00:02:06,890 You're going to put some uncertainty in priors 59 00:02:06,890 --> 00:02:08,947 and things like that. 60 00:02:08,947 --> 00:02:11,030 Once you finish defining your forward model of how 61 00:02:11,030 --> 00:02:12,800 you think the world works, you're 62 00:02:12,800 --> 00:02:15,560 going to ask it a particular thing. 63 00:02:15,560 --> 00:02:17,810 The penultimate statement that you're going to give it 64 00:02:17,810 --> 00:02:20,150 is what we want to know. 65 00:02:20,150 --> 00:02:21,960 For example, in this particular case, 66 00:02:21,960 --> 00:02:24,410 suppose that, before I started, I said, I 67 00:02:24,410 --> 00:02:27,560 don't know if the bombing is targeted or not. 68 00:02:27,560 --> 00:02:29,019 I'm 50/50 either way. 69 00:02:29,019 --> 00:02:29,810 I don't have to be. 70 00:02:29,810 --> 00:02:31,268 But let's say I'm 50/50 either way. 71 00:02:31,268 --> 00:02:33,380 So I say, I'm going to flip a coin. 72 00:02:33,380 --> 00:02:35,690 It's either targeted or it's not targeted. 73 00:02:35,690 --> 00:02:38,052 And that's going to come up either true or false. 74 00:02:38,052 --> 00:02:40,010 And what you're going to basically say-- you're 75 00:02:40,010 --> 00:02:41,510 going to query on that. 76 00:02:41,510 --> 00:02:44,600 You want to say, did the coin come up true or false 77 00:02:44,600 --> 00:02:47,270 given the data? 78 00:02:47,270 --> 00:02:49,070 So the last thing-- the ultimate statement 79 00:02:49,070 --> 00:02:50,960 that you're going to write-- is basically 80 00:02:50,960 --> 00:02:52,544 the conditional statement. 81 00:02:52,544 --> 00:02:54,710 And the conditional statement is basically the thing 82 00:02:54,710 --> 00:02:57,780 that has to evaluate as true. 
83 00:02:57,780 --> 00:02:59,780 The usual thing that we would write down there-- 84 00:02:59,780 --> 00:03:00,990 I'll give you some examples of that-- 85 00:03:00,990 --> 00:03:03,230 but what we would usually write is something like, 86 00:03:03,230 --> 00:03:06,620 given that the observed data matches the sample 87 00:03:06,620 --> 00:03:08,690 data from my model. 88 00:03:08,690 --> 00:03:11,330 So if you want to do something like the probability 89 00:03:11,330 --> 00:03:13,314 of a particular hypothesis given the data, 90 00:03:13,314 --> 00:03:15,480 this is the way that you would say, this is my data. 91 00:03:15,480 --> 00:03:16,640 This is what I know. 92 00:03:16,640 --> 00:03:17,690 And the way that it would know is, 93 00:03:17,690 --> 00:03:19,273 you're sort of running down a program, 94 00:03:19,273 --> 00:03:22,340 and you're constraining it to give you 95 00:03:22,340 --> 00:03:25,382 a sample that matches the actual thing that you see. 96 00:03:25,382 --> 00:03:26,840 In the London bombing example, what 97 00:03:26,840 --> 00:03:29,048 you would do is you would write something like query, 98 00:03:29,048 --> 00:03:29,924 a bunch of defines-- 99 00:03:29,924 --> 00:03:32,090 targeted bombing, random bombing, things like that-- 100 00:03:32,090 --> 00:03:33,320 I want to know-- 101 00:03:33,320 --> 00:03:34,790 is it targeted or random? 102 00:03:34,790 --> 00:03:35,817 How did the coin fall? 103 00:03:35,817 --> 00:03:37,400 And what you're going to do is, you're 104 00:03:37,400 --> 00:03:40,790 going to say, listen, run this model forward 105 00:03:40,790 --> 00:03:42,064 under the following condition. 106 00:03:42,064 --> 00:03:43,730 If I just run it forward, I would either 107 00:03:43,730 --> 00:03:45,060 get targeted or not. 108 00:03:45,060 --> 00:03:46,760 It would be 50/50-- 109 00:03:46,760 --> 00:03:48,440 under the condition that whatever 110 00:03:48,440 --> 00:03:51,660 this model samples has to match the actual data. 
111 00:03:51,660 --> 00:03:54,175 So I'm going to run it forward, but the thing 112 00:03:54,175 --> 00:03:55,550 that it needs to evaluate as true 113 00:03:55,550 --> 00:03:57,440 is that the samples I got from the model 114 00:03:57,440 --> 00:04:01,580 are equal to the actual data that I got. 115 00:04:01,580 --> 00:04:03,060 Now once you do that-- 116 00:04:03,060 --> 00:04:06,560 once you define that particular thing-- what you've done 117 00:04:06,560 --> 00:04:09,490 is change the probability distribution 118 00:04:09,490 --> 00:04:11,527 that the generative model describes. 119 00:04:11,527 --> 00:04:12,860 Remember how we talked earlier-- 120 00:04:12,860 --> 00:04:14,600 I was sort of trying to hammer it home-- 121 00:04:14,600 --> 00:04:16,550 that anything that you write down in Church 122 00:04:16,550 --> 00:04:18,350 is actually a probability distribution. 123 00:04:18,350 --> 00:04:19,904 You write down the program and you 124 00:04:19,904 --> 00:04:21,320 run it an infinite number of times 125 00:04:21,320 --> 00:04:22,930 and you get some distribution. 126 00:04:22,930 --> 00:04:25,970 Your generative model describes a particular distribution. 127 00:04:25,970 --> 00:04:28,070 If you condition that model on something, 128 00:04:28,070 --> 00:04:31,580 you get a different distribution. 129 00:04:31,580 --> 00:04:33,200 And that different distribution is now 130 00:04:33,200 --> 00:04:35,030 what you're going to sample from. 131 00:04:35,030 --> 00:04:37,070 You're going to sample from the posterior. 132 00:04:37,070 --> 00:04:39,470 You have some prior, you condition it on some data, 133 00:04:39,470 --> 00:04:41,930 and you're going to sample from the posterior. 
134 00:04:41,930 --> 00:04:44,374 And sampling from the posterior can be something like-- 135 00:04:44,374 --> 00:04:46,040 and we'll give some examples, like, I 136 00:04:46,040 --> 00:04:48,890 know how the world works in terms of their objects, 137 00:04:48,890 --> 00:04:51,500 and I know how light works, and I know how vision works. 138 00:04:51,500 --> 00:04:54,680 I don't know what the particular objects in this world are. 139 00:04:54,680 --> 00:04:56,420 That's what I want to know. 140 00:04:56,420 --> 00:04:59,120 I condition on the retinal display 141 00:04:59,120 --> 00:05:01,260 being equal to something. 142 00:05:01,260 --> 00:05:03,860 And now, my posterior probability distribution 143 00:05:03,860 --> 00:05:05,990 is going to basically sample from, say, 144 00:05:05,990 --> 00:05:09,616 your face or these objects or these chairs. 145 00:05:09,616 --> 00:05:11,490 Well, the same thing could work, for example, 146 00:05:11,490 --> 00:05:13,640 if you're trying to query a sentence. 147 00:05:13,640 --> 00:05:16,220 You're trying to parse a sentence from sound, 148 00:05:16,220 --> 00:05:21,410 or you're trying to predict how the next step in a physics 149 00:05:21,410 --> 00:05:23,870 engine is going to work, or many, many, many, many, many 150 00:05:23,870 --> 00:05:28,040 different other things that you can find in probmods.org. 151 00:05:28,040 --> 00:05:30,489 So like I said, the "what we know" is the condition. 152 00:05:30,489 --> 00:05:32,030 And if you set the condition to true, 153 00:05:32,030 --> 00:05:34,400 that's a sampling from the generative model, 154 00:05:34,400 --> 00:05:38,510 because you're always going to evaluate as true. 155 00:05:38,510 --> 00:05:40,640 Now, how could you possibly implement 156 00:05:40,640 --> 00:05:43,620 this sort of magical procedure? 
157 00:05:43,620 --> 00:05:45,920 So how could you take some probability distribution 158 00:05:45,920 --> 00:05:49,550 and change it into a different probability distribution that 159 00:05:49,550 --> 00:05:52,460 does what you want it to do? 160 00:05:52,460 --> 00:05:55,219 And there are many, many different ways of doing that. 161 00:05:55,219 --> 00:05:57,260 But the easiest way of doing that is by something 162 00:05:57,260 --> 00:05:59,870 called rejection query. 163 00:05:59,870 --> 00:06:03,690 How many of you know about rejection sampling? 164 00:06:03,690 --> 00:06:05,912 How many don't know about rejection sampling? 165 00:06:05,912 --> 00:06:07,120 OK. 166 00:06:07,120 --> 00:06:08,930 The way rejection sampling works is 167 00:06:08,930 --> 00:06:11,120 that I have some sort of distribution 168 00:06:11,120 --> 00:06:13,799 that I'm trying to sample from. 169 00:06:13,799 --> 00:06:15,590 And suppose that it's really hard to sample 170 00:06:15,590 --> 00:06:17,720 from that distribution exactly. 171 00:06:17,720 --> 00:06:21,860 So let's say that my distribution is this circle. 172 00:06:21,860 --> 00:06:24,800 And for whatever reason, it's really, really hard 173 00:06:24,800 --> 00:06:26,750 to sample from that circle. 174 00:06:26,750 --> 00:06:31,130 I don't want to try to define the probability distribution 175 00:06:31,130 --> 00:06:32,780 that describes this circle. 176 00:06:32,780 --> 00:06:34,700 It's really hard to sample from it. 177 00:06:34,700 --> 00:06:37,241 This is trivial, but there are probability distributions that 178 00:06:37,241 --> 00:06:38,810 are really hard to sample from. 179 00:06:38,810 --> 00:06:39,894 What do you do? 180 00:06:39,894 --> 00:06:41,810 You can construct a really simple distribution 181 00:06:41,810 --> 00:06:43,204 that you can sample from. 
182 00:06:43,204 --> 00:06:45,620 Let's say that it's really, really simple for me to sample 183 00:06:45,620 --> 00:06:50,540 from a uniform square that encompasses the circle. 184 00:06:50,540 --> 00:06:52,434 So now, I have some probability distribution. 185 00:06:52,434 --> 00:06:54,350 There's a uniform distribution over the square 186 00:06:54,350 --> 00:06:55,371 that I can sample from. 187 00:06:55,371 --> 00:06:56,870 What does it mean I can sample from? 188 00:06:56,870 --> 00:06:58,453 It means each time I run the procedure 189 00:06:58,453 --> 00:07:00,890 I get some point in the square. 190 00:07:00,890 --> 00:07:02,660 But I don't want the points in the square. 191 00:07:02,660 --> 00:07:06,140 I want only points from the circle. 192 00:07:06,140 --> 00:07:09,890 So what I would do is basically sample from the square, 193 00:07:09,890 --> 00:07:12,650 and each time it falls outside the circle, I'm going to say, 194 00:07:12,650 --> 00:07:14,880 throw that out. 195 00:07:14,880 --> 00:07:16,920 That's called rejection sampling. 196 00:07:16,920 --> 00:07:19,230 Because you sample from some sort of procedure 197 00:07:19,230 --> 00:07:20,880 that you know how to sample from, 198 00:07:20,880 --> 00:07:23,070 then you check that sample. 199 00:07:23,070 --> 00:07:25,680 And if that sample didn't meet your desiderata, 200 00:07:25,680 --> 00:07:26,820 you throw it away. 201 00:07:26,820 --> 00:07:30,330 And what you're left with is the circle-- 202 00:07:30,330 --> 00:07:32,400 the distribution that you're trying to get. 203 00:07:32,400 --> 00:07:34,770 So what is the simple thing and what 204 00:07:34,770 --> 00:07:37,410 is the hard thing in what we're describing so far? 205 00:07:37,410 --> 00:07:39,600 The simple thing is the generative model. 206 00:07:39,600 --> 00:07:42,210 It's relatively easy to sample from the generative model. 207 00:07:42,210 --> 00:07:43,480 We just wrote it down. 
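The circle-and-square picture can be sketched in a few lines. This is a Python illustration rather than the Church code used in the lecture, and the function name `sample_from_circle`, the unit radius, and the fixed seed are assumptions made for the example.

```python
import random

def sample_from_circle(rng):
    """Draw one point uniformly from the unit circle by rejection."""
    while True:
        # Easy proposal: a uniform point in the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        # Keep the point only if it landed inside the circle;
        # otherwise throw it out and propose again.
        if x * x + y * y <= 1.0:
            return (x, y)

rng = random.Random(0)  # seeded so the run is repeatable
points = [sample_from_circle(rng) for _ in range(1000)]
```

Every returned point lies inside the circle, and because the proposals are uniform over the square, the accepted points are uniform over the circle.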
208 00:07:43,480 --> 00:07:45,460 So we know how to sample from it. 209 00:07:45,460 --> 00:07:47,279 But we're looking for something else. 210 00:07:47,279 --> 00:07:49,320 We're looking for some sort of different program. 211 00:07:49,320 --> 00:07:51,030 We're looking for some sort of setting 212 00:07:51,030 --> 00:07:54,070 of a program that would generate the data that I saw, 213 00:07:54,070 --> 00:07:58,350 not just the generative model that I wrote. 214 00:07:58,350 --> 00:08:01,320 The way that we would do that is, in rejection query, 215 00:08:01,320 --> 00:08:04,960 we would sample from the generative model. 216 00:08:04,960 --> 00:08:09,610 We would check, does that fit what we know? 217 00:08:09,610 --> 00:08:12,237 Suppose I just sampled something. 218 00:08:12,237 --> 00:08:14,320 And then I check, does that apply to the condition 219 00:08:14,320 --> 00:08:15,170 that I want? 220 00:08:15,170 --> 00:08:16,795 So let's say I have a particular world. 221 00:08:20,520 --> 00:08:22,020 It's a sort of a silly world. 222 00:08:22,020 --> 00:08:24,960 But let's just make sure it works given the last time. 223 00:08:24,960 --> 00:08:26,020 OK. 224 00:08:26,020 --> 00:08:28,710 So what I'm going to do is, I'm going 225 00:08:28,710 --> 00:08:30,880 to describe a world in which there is Legolas, 226 00:08:30,880 --> 00:08:34,074 Gimli, and Arwen. 227 00:08:34,074 --> 00:08:36,240 Anyone get Lord of the Rings references or something 228 00:08:36,240 --> 00:08:36,740 like that? 229 00:08:36,740 --> 00:08:37,440 OK. 230 00:08:37,440 --> 00:08:39,510 Each one of them is going to take out 231 00:08:39,510 --> 00:08:42,059 a particular number of orcs. 232 00:08:42,059 --> 00:08:44,290 And let's say my generative model 233 00:08:44,290 --> 00:08:47,220 is that I don't know how many each one of them took out. 234 00:08:47,220 --> 00:08:50,132 Let's say that they take anything between zero and 20. 235 00:08:50,132 --> 00:08:51,090 They're having a brawl. 
236 00:08:51,090 --> 00:08:53,340 Each one of them is going to take out some number of orcs. 237 00:08:53,340 --> 00:08:55,131 So we're going to define the number of orcs 238 00:08:55,131 --> 00:08:57,180 that Legolas took out as some random integer-- 239 00:08:57,180 --> 00:08:58,050 20. 240 00:08:58,050 --> 00:08:59,480 OK? 241 00:08:59,480 --> 00:09:00,600 Gimli's the same. 242 00:09:00,600 --> 00:09:02,250 Arwen's the same. 243 00:09:02,250 --> 00:09:07,530 And we're going to also define the total number of orcs 244 00:09:07,530 --> 00:09:11,040 that they took out as just plus each one of these things. 245 00:09:11,040 --> 00:09:12,870 So we're going to look at the pile of orcs 246 00:09:12,870 --> 00:09:15,115 that they took out in the end. 247 00:09:15,115 --> 00:09:16,240 That's my generative model. 248 00:09:16,240 --> 00:09:17,110 That's it. 249 00:09:17,110 --> 00:09:20,250 What I'm going to wonder about is, 250 00:09:20,250 --> 00:09:23,070 how many orcs did Gimli take out? 251 00:09:23,070 --> 00:09:26,839 Not knowing anything, how many orcs did Gimli take out? 252 00:09:26,839 --> 00:09:28,630 Should we just switch to kill or something? 253 00:09:28,630 --> 00:09:30,920 I feel bad for the orcs. 254 00:09:30,920 --> 00:09:31,620 But OK. 255 00:09:31,620 --> 00:09:33,600 How many orcs did Gimli take out? 256 00:09:33,600 --> 00:09:34,530 Well, we don't know. 257 00:09:34,530 --> 00:09:35,680 We just said we don't know. 258 00:09:35,680 --> 00:09:37,800 It's a random integer between zero and 20. 259 00:09:37,800 --> 00:09:38,760 It's anyone's guess. 260 00:09:38,760 --> 00:09:39,990 If I just ran this model forward, 261 00:09:39,990 --> 00:09:42,000 it would give me any number between zero and 20. 262 00:09:42,000 --> 00:09:44,710 If I ran it 1,000 times, I would get a uniform distribution 263 00:09:44,710 --> 00:09:45,826 over zero and 20. 264 00:09:45,826 --> 00:09:47,450 Now, I'm going to give you a condition. 
265 00:09:47,450 --> 00:09:48,960 It's a simple condition. 266 00:09:48,960 --> 00:09:51,450 The total number of orcs that they all took out 267 00:09:51,450 --> 00:09:54,730 is greater than 45. 268 00:09:54,730 --> 00:09:57,570 Altogether, Gimli, Arwen, Legolas 269 00:09:57,570 --> 00:10:01,080 took out more than 45 orcs. 270 00:10:01,080 --> 00:10:04,454 Now, how many orcs do you think that Gimli took out? 271 00:10:04,454 --> 00:10:06,120 Now, the point is that you would somehow 272 00:10:06,120 --> 00:10:07,300 shift your distribution. 273 00:10:07,300 --> 00:10:08,550 This is a very simple problem. 274 00:10:08,550 --> 00:10:11,010 You could probably write it down on a notepad. 275 00:10:11,010 --> 00:10:13,500 But you're trying to do the posterior of the number of orcs 276 00:10:13,500 --> 00:10:15,350 that Gimli took out, given-- 277 00:10:15,350 --> 00:10:17,100 conditioned on-- the fact that all of them 278 00:10:17,100 --> 00:10:19,850 together took out more than 45. 279 00:10:19,850 --> 00:10:21,360 Is this making sense? 280 00:10:21,360 --> 00:10:22,020 OK. 281 00:10:22,020 --> 00:10:26,340 How would I write that down as a rejection query 282 00:10:26,340 --> 00:10:28,800 without using any syntax that you haven't seen already? 283 00:10:28,800 --> 00:10:30,635 Without using anything like query yet, 284 00:10:30,635 --> 00:10:32,010 I'm just going to use, basically, 285 00:10:32,010 --> 00:10:35,817 recursion to write down a rejection query for that. 286 00:10:35,817 --> 00:10:37,650 And later on, you can use a rejection query. 287 00:10:37,650 --> 00:10:39,420 But what I would do is, I would just say, 288 00:10:39,420 --> 00:10:41,430 here's a procedure that's going to give me 289 00:10:41,430 --> 00:10:43,980 back the number of orcs that Gimli took out 290 00:10:43,980 --> 00:10:49,300 conditioned on everyone taking out more than 45. 291 00:10:49,300 --> 00:10:51,547 So I write down a particular generative model. 
292 00:10:51,547 --> 00:10:53,380 Like, I know that Legolas took out somewhere 293 00:10:53,380 --> 00:10:55,820 between zero and 20, Gimli took out somewhere between zero and 20, 294 00:10:55,820 --> 00:10:57,695 Arwen took out somewhere between zero and 20, 295 00:10:57,695 --> 00:10:59,830 and they all took out the number of total orcs. 296 00:10:59,830 --> 00:11:01,380 Now I say this. 297 00:11:01,380 --> 00:11:05,700 If the total number of orcs is greater than 45, 298 00:11:05,700 --> 00:11:07,137 that's a good sample. 299 00:11:07,137 --> 00:11:08,470 So I'm going to go through this. 300 00:11:08,470 --> 00:11:09,840 I'm going to sample my program. 301 00:11:09,840 --> 00:11:11,670 I'm going to get, Gimli did-- 302 00:11:11,670 --> 00:11:12,461 I don't know-- 303 00:11:12,461 --> 00:11:12,960 15. 304 00:11:12,960 --> 00:11:14,850 Legolas did 10. 305 00:11:14,850 --> 00:11:17,640 So now, say, Arwen did 20 or something like that. 306 00:11:17,640 --> 00:11:19,920 So now we got to 45. 307 00:11:19,920 --> 00:11:21,180 Is that a good world? 308 00:11:21,180 --> 00:11:22,251 Yes, we're over 45. 309 00:11:22,251 --> 00:11:22,750 Fine. 310 00:11:22,750 --> 00:11:24,270 Give me back whatever it was for Gimli. 311 00:11:24,270 --> 00:11:25,686 I don't even remember what it was. 312 00:11:25,686 --> 00:11:27,270 Give me that back. 313 00:11:27,270 --> 00:11:30,270 Suppose it didn't add up to 45. 314 00:11:30,270 --> 00:11:32,190 Try again. 315 00:11:32,190 --> 00:11:35,430 This is basically the circle and square example from before. 316 00:11:35,430 --> 00:11:39,479 The if statement is telling us, if this matches my condition, 317 00:11:39,479 --> 00:11:40,770 sample randomly from the world. 318 00:11:40,770 --> 00:11:41,730 Sample from the square. 319 00:11:41,730 --> 00:11:43,440 Sample from the generative model. 320 00:11:43,440 --> 00:11:45,773 If the thing that you got matches the condition that you 321 00:11:45,773 --> 00:11:47,550 want, give me back the sample. 
322 00:11:47,550 --> 00:11:49,230 Give me back that answer. 323 00:11:49,230 --> 00:11:51,420 If it didn't, try again. 324 00:11:51,420 --> 00:11:54,270 And now, if we do this, and we repeat this procedure 325 00:11:54,270 --> 00:11:59,520 1,000 times, then it's no longer a uniform distribution. 326 00:11:59,520 --> 00:12:01,630 It's greater than five or whatever. 327 00:12:01,630 --> 00:12:02,880 It's going to be zero on that. 328 00:12:02,880 --> 00:12:04,830 Because if he took out zero, they're never 329 00:12:04,830 --> 00:12:07,550 going to get to 45, right? 330 00:12:07,550 --> 00:12:09,570 And it's probably going to be sort of skewed 331 00:12:09,570 --> 00:12:10,800 in this direction. 332 00:12:10,800 --> 00:12:12,930 So that's the posterior distribution 333 00:12:12,930 --> 00:12:15,450 on how many orcs Gimli took out conditioned on all of them 334 00:12:15,450 --> 00:12:17,250 taking out more than 45. 335 00:12:17,250 --> 00:12:18,360 That's amazing, you guys. 336 00:12:18,360 --> 00:12:22,380 You've just understood rejection query. 337 00:12:22,380 --> 00:12:26,970 You've just written down, in a few very simple lines of code, 338 00:12:26,970 --> 00:12:28,710 a sampling with rejection. 339 00:12:28,710 --> 00:12:31,320 And in fact, you can define all of conditioning 340 00:12:31,320 --> 00:12:32,205 using this thing. 341 00:12:32,205 --> 00:12:34,830 You don't have to get fancy with Metropolis Hastings and things 342 00:12:34,830 --> 00:12:37,380 like that if you're just trying to sort of prove things. 343 00:12:37,380 --> 00:12:39,900 If you're into computer science and things like that, 344 00:12:39,900 --> 00:12:42,864 you can just define conditioning using this. 345 00:12:42,864 --> 00:12:45,030 You're saying, I have some probability distribution. 346 00:12:45,030 --> 00:12:46,380 I'm trying to condition it. 347 00:12:46,380 --> 00:12:49,280 I get a different probability distribution-- the posterior. 
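The orc example can be sketched as follows. This is a Python illustration, not the Church code shown in the lecture: it uses a loop where the talk uses recursion, it assumes each count is uniform on the integers 0 to 20 inclusive, and the name `gimli_given_total_over` is made up for the example.

```python
import random
from collections import Counter

def gimli_given_total_over(threshold, rng):
    """Rejection sampler: Gimli's count, given that the total exceeds threshold."""
    while True:
        # Generative model: each fighter takes out 0..20 orcs (assumed inclusive).
        legolas = rng.randint(0, 20)
        gimli = rng.randint(0, 20)
        arwen = rng.randint(0, 20)
        total = legolas + gimli + arwen
        # Condition: accept this world only if the total is large enough;
        # otherwise discard it and sample the whole world again.
        if total > threshold:
            return gimli

rng = random.Random(0)  # seeded so the run is repeatable
samples = [gimli_given_total_over(45, rng) for _ in range(1000)]
histogram = Counter(samples)  # approximate posterior over Gimli's count
```

Under these assumptions the other two fighters can contribute at most 40, so every accepted sample has Gimli at 6 or more, and the histogram is weighted toward the high end instead of being uniform.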
348 00:12:49,280 --> 00:12:50,430 How well behaved is it? 349 00:12:50,430 --> 00:12:51,360 Can I write down? 350 00:12:51,360 --> 00:12:53,030 Things like that-- you can prove it. 351 00:12:53,030 --> 00:12:54,946 All the sort of things that you want to prove, 352 00:12:54,946 --> 00:12:57,960 you can prove using something like this construction. 353 00:12:57,960 --> 00:12:59,370 Why shouldn't you use that? 354 00:12:59,370 --> 00:13:01,620 So you can do that if you're into theoretical computer 355 00:13:01,620 --> 00:13:02,119 science. 356 00:13:02,119 --> 00:13:07,252 What is bad about rejection query? 357 00:13:07,252 --> 00:13:07,960 Does anyone know? 358 00:13:07,960 --> 00:13:08,800 Can you guess? 359 00:13:08,800 --> 00:13:09,300 Sorry? 360 00:13:09,300 --> 00:13:10,216 AUDIENCE: It's costly. 361 00:13:10,216 --> 00:13:11,764 TOMER ULLMAN: Costly in what sense? 362 00:13:11,764 --> 00:13:15,677 AUDIENCE: [INAUDIBLE] 363 00:13:15,677 --> 00:13:16,510 TOMER ULLMAN: Right. 364 00:13:16,510 --> 00:13:17,020 Exactly. 365 00:13:17,020 --> 00:13:20,430 Depending on the condition, it might be a very, very bad idea. 366 00:13:20,430 --> 00:13:22,340 Here, I could sort of do the condition, 367 00:13:22,340 --> 00:13:26,350 because I ran the model forward, and sometimes, I got over 45. 368 00:13:26,350 --> 00:13:29,350 So yeah, why is rejection query a particularly bad example? 369 00:13:29,350 --> 00:13:31,940 Because your condition might be really, 370 00:13:31,940 --> 00:13:34,660 really, really hard to satisfy. 371 00:13:34,660 --> 00:13:40,835 So if, for example, I change this to 59, 372 00:13:40,835 --> 00:13:41,710 what will happen now? 373 00:13:45,384 --> 00:13:47,050 I think because of the way I wrote this, 374 00:13:47,050 --> 00:13:48,091 it will never reach that. 375 00:13:48,091 --> 00:13:50,020 But thanks for that point. 376 00:13:50,020 --> 00:13:52,645 But let's change this to this. 
377 00:13:56,320 --> 00:13:59,860 Don't run this, by the way, because it will never stop. 378 00:13:59,860 --> 00:14:01,630 Or it will take a long time. 379 00:14:01,630 --> 00:14:04,480 So now, it's only going to be fulfilled if each one of them 380 00:14:04,480 --> 00:14:05,980 took out 20 orcs, right? 381 00:14:05,980 --> 00:14:08,620 They would all need to take out 20 orcs for the total number 382 00:14:08,620 --> 00:14:11,540 of orcs to be equal to 60. 383 00:14:11,540 --> 00:14:12,800 When is that going to happen? 384 00:14:12,800 --> 00:14:17,800 It's going to happen 1 in 20 times 1 in 20 times 1 in 20. 385 00:14:17,800 --> 00:14:21,520 You're going to waste a lot of samples on something 386 00:14:21,520 --> 00:14:23,120 that's never going to happen. 387 00:14:23,120 --> 00:14:25,487 And it doesn't even matter that much. 388 00:14:25,487 --> 00:14:27,820 And you can easily look at this and you can sort of say, 389 00:14:27,820 --> 00:14:30,320 well, obviously, Gimli took out more than this, 390 00:14:30,320 --> 00:14:33,530 because I know how to program and I can figure it out. 391 00:14:33,530 --> 00:14:36,312 But oftentimes, you will find that you can't exactly say. 392 00:14:36,312 --> 00:14:37,770 You look at some convoluted program 393 00:14:37,770 --> 00:14:40,270 and you won't exactly know how this should look 394 00:14:40,270 --> 00:14:43,050 or whether it's easy or whether it's hard. 395 00:14:43,050 --> 00:14:46,480 But in that sense, rejection query is probably a bad idea. 396 00:14:46,480 --> 00:14:48,620 Another example of why it's a bad idea 397 00:14:48,620 --> 00:14:49,870 is something like precision. 398 00:14:49,870 --> 00:14:52,324 So let's see. 399 00:14:52,324 --> 00:14:54,115 Don't run this, because it'll take forever. 400 00:14:57,310 --> 00:14:59,352 Let's do an estimate of pi. 401 00:14:59,352 --> 00:15:01,810 And again, I was sort of going to give this as an exercise. 
402 00:15:01,810 --> 00:15:05,810 But since we don't have a lot of time, 403 00:15:05,810 --> 00:15:08,744 here's one thing that you could do with rejection query, 404 00:15:08,744 --> 00:15:10,410 or you can do it with any sort of query. 405 00:15:16,360 --> 00:15:19,390 You could try to estimate pi. 406 00:15:19,390 --> 00:15:21,580 Literally, using that example that I just did, 407 00:15:21,580 --> 00:15:25,330 you could try to say, OK, sample from some square, 408 00:15:25,330 --> 00:15:29,270 only accept the Xs that are in the circle, 409 00:15:29,270 --> 00:15:32,314 and then sort of try to estimate how many samples you 410 00:15:32,314 --> 00:15:34,480 got in the circle out of the total number of samples 411 00:15:34,480 --> 00:15:35,500 you took. 412 00:15:35,500 --> 00:15:39,120 So run 1,000 samples, and see how many of those fall 413 00:15:39,120 --> 00:15:40,270 in the circle. 414 00:15:40,270 --> 00:15:43,060 And if you run 1,000, you'll get three point something. 415 00:15:43,060 --> 00:15:43,810 You can try this. 416 00:15:43,810 --> 00:15:45,429 I did it as an exercise for you. 417 00:15:45,429 --> 00:15:47,470 You can see I sort of set up some of this syntax. 418 00:15:47,470 --> 00:15:50,410 If you want to try this out later, please do. 419 00:15:50,410 --> 00:15:54,010 You run 10,000 samples, it's like 3.1. 420 00:15:54,010 --> 00:15:57,956 I think if you do it 100,000, it'll probably do 3.14-- 421 00:15:57,956 --> 00:15:59,080 maybe not that much better. 422 00:15:59,080 --> 00:16:02,380 Like, 100,000 samples, seriously, to get 3.14-- 423 00:16:02,380 --> 00:16:04,390 we all know it's 3.1415. 424 00:16:04,390 --> 00:16:06,700 Sometimes, if you can use math, you 425 00:16:06,700 --> 00:16:09,230 should use math, in some cases. 
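The pi exercise can be sketched like this. It is a Python illustration rather than the Church exercise set up in the lecture, and the function name `estimate_pi`, the unit circle, and the sample sizes are assumptions for the example. The fraction of square samples that land in the circle approximates the area ratio pi/4, so multiplying by 4 gives the estimate.

```python
import random

def estimate_pi(n_samples, rng):
    """Estimate pi from the fraction of square samples that fall in the circle."""
    inside = 0
    for _ in range(n_samples):
        # Uniform point in the square [-1, 1] x [-1, 1].
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:
            inside += 1
    # Area of unit circle / area of square = pi / 4.
    return 4.0 * inside / n_samples

rng = random.Random(0)  # seeded so the run is repeatable
estimates = {n: estimate_pi(n, rng) for n in (10, 1000, 100000)}
```

The error of such an estimate shrinks roughly like one over the square root of the number of samples, which is why going from 1,000 to 100,000 samples buys only about one more correct digit.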
426 00:16:09,230 --> 00:16:11,015 On the other hand-- 427 00:16:11,015 --> 00:16:13,390 this is, by the way, a reason why you should probably not 428 00:16:13,390 --> 00:16:15,306 use sampling in general when you can use math. 429 00:16:17,890 --> 00:16:19,879 We can solve it analytically-- 430 00:16:19,879 --> 00:16:22,170 if you're interested in precision and things like that. 431 00:16:22,170 --> 00:16:24,220 But suppose you're not interested in precision. 432 00:16:24,220 --> 00:16:28,930 I'm actually going to try and help that. 433 00:16:28,930 --> 00:16:31,330 So you can actually get a pretty good estimate 434 00:16:31,330 --> 00:16:33,734 from about 10 samples. 435 00:16:33,734 --> 00:16:35,400 If you just do 10 samples on this thing, 436 00:16:35,400 --> 00:16:40,080 you'll probably hit something like three as an estimate. 437 00:16:40,080 --> 00:16:43,152 You can do the histogram for where it will fall. 438 00:16:43,152 --> 00:16:45,110 Most of the samples-- like 70% of the samples-- 439 00:16:45,110 --> 00:16:47,600 will fall between 2.8 and 3.6. 440 00:16:47,600 --> 00:16:49,680 If that's what you care about, then that 441 00:16:49,680 --> 00:16:52,260 might be what all your vision system cares about, 442 00:16:52,260 --> 00:16:54,710 or different things that might require sampling. 443 00:16:54,710 --> 00:16:56,550 Then that's fine. 444 00:16:56,550 --> 00:16:59,130 And I'm not going to go too much into this, 445 00:16:59,130 --> 00:17:02,010 because, like I said, the dream of probabilistic programming 446 00:17:02,010 --> 00:17:03,724 is sort of to free you from thinking 447 00:17:03,724 --> 00:17:04,890 too much about the sampling. 448 00:17:04,890 --> 00:17:07,170 But those of you that are interested in sampling, 449 00:17:07,170 --> 00:17:09,420 that are interested in algorithmic learning and things 450 00:17:09,420 --> 00:17:11,910 like that, there's been a lot of research on exactly that. 
451 00:17:11,910 --> 00:17:15,714 When are we OK with just taking one sample? 452 00:17:15,714 --> 00:17:17,339 We write down some sort of model and we 453 00:17:17,339 --> 00:17:21,270 see how well the model does by taking one sample, 10 samples. 454 00:17:21,270 --> 00:17:23,130 What's the precision that you can get? 455 00:17:23,130 --> 00:17:25,440 And does that precision match people 456 00:17:25,440 --> 00:17:29,219 trying to perform a similar task of estimating that thing? 457 00:17:29,219 --> 00:17:31,260 That's, again, like the sampling hypothesis-- not 458 00:17:31,260 --> 00:17:34,140 sampling hypothesis for neurons, sampling hypothesis for the way 459 00:17:34,140 --> 00:17:35,750 people answer questions. 460 00:17:35,750 --> 00:17:38,460 They sample one or two or a few number 461 00:17:38,460 --> 00:17:41,130 of points from their generative model-- not that many. 462 00:17:41,130 --> 00:17:43,890 And the claim is that you can sort of 463 00:17:43,890 --> 00:17:46,920 see that they get better with time, that they probably 464 00:17:46,920 --> 00:17:49,410 don't take that much if you give them more time to think. 465 00:17:49,410 --> 00:17:51,076 It looks a bit like a sampling procedure 466 00:17:51,076 --> 00:17:52,126 that takes more samples. 467 00:17:52,126 --> 00:17:54,250 And it seems like you can get away with quite a lot 468 00:17:54,250 --> 00:17:57,810 if you just do 10 samples or 100 samples for pi. 469 00:17:57,810 --> 00:17:59,670 And there's this SMBC cartoon that I quite 470 00:17:59,670 --> 00:18:06,720 like, which is like, why shouldn't physicists 471 00:18:06,720 --> 00:18:07,640 teach geometry? 472 00:18:07,640 --> 00:18:09,390 And they're like, well, you know, how do I 473 00:18:09,390 --> 00:18:12,180 remember the value of pi? 474 00:18:12,180 --> 00:18:12,930 It's quite simple. 475 00:18:12,930 --> 00:18:15,570 I look at my fingers and there's five of them, 476 00:18:15,570 --> 00:18:16,410 and that's about pi. 
477 00:18:21,880 --> 00:18:24,340 What else could we do if we don't 478 00:18:24,340 --> 00:18:26,170 want to use rejection query, if we don't 479 00:18:26,170 --> 00:18:28,480 want to use rejection sampling? 480 00:18:28,480 --> 00:18:29,200 Suppose we don't. 481 00:18:29,200 --> 00:18:30,310 We probably don't. 482 00:18:30,310 --> 00:18:32,592 We could try to do exhaustive enumeration. 483 00:18:32,592 --> 00:18:34,300 If our model is small enough, we can just 484 00:18:34,300 --> 00:18:37,766 consider all the possibilities and explicitly score them. 485 00:18:37,766 --> 00:18:39,640 The other thing that we could do is something 486 00:18:39,640 --> 00:18:41,140 like Metropolis Hastings. 487 00:18:41,140 --> 00:18:44,330 How many of you are familiar with Metropolis Hastings? 488 00:18:44,330 --> 00:18:45,460 OK. 489 00:18:45,460 --> 00:18:47,290 Why don't we raise our hands to the degree 490 00:18:47,290 --> 00:18:49,456 that we are familiar with Metropolis Hastings, where 491 00:18:49,456 --> 00:18:51,660 here is really familiar, here is not familiar. 492 00:18:51,660 --> 00:18:52,432 OK. 493 00:18:52,432 --> 00:18:53,890 Metropolis Hastings-- I'm not going 494 00:18:53,890 --> 00:18:56,062 to go too much into the details. 495 00:18:56,062 --> 00:18:57,520 I'll just be doing it a disservice. 496 00:18:57,520 --> 00:19:02,140 But the way to think about it is to say, instead of just 497 00:19:02,140 --> 00:19:06,010 sampling at random from the entire space of things that 498 00:19:06,010 --> 00:19:10,150 could be really, really bad equally, I'm going to try 499 00:19:10,150 --> 00:19:14,260 and sample from something that I think is likely to be good. 500 00:19:14,260 --> 00:19:17,440 And the way I'm going to do that is, 501 00:19:17,440 --> 00:19:20,440 I'm going to construct, basically-- 502 00:19:20,440 --> 00:19:22,480 what's the best way to explain this? 503 00:19:22,480 --> 00:19:25,990 It's to say, I'm at a particular point in the space. 
504 00:19:25,990 --> 00:19:27,560 I've gotten my sample. 505 00:19:27,560 --> 00:19:28,310 I already have it. 506 00:19:28,310 --> 00:19:29,540 What should I do now? 507 00:19:29,540 --> 00:19:32,320 Rejection sampling just says, well, just sample another one 508 00:19:32,320 --> 00:19:35,080 from the generative model and see if that works. 509 00:19:35,080 --> 00:19:36,184 That's a bad idea. 510 00:19:36,184 --> 00:19:37,600 What you should actually do is try 511 00:19:37,600 --> 00:19:40,120 to use the sample that you already have 512 00:19:40,120 --> 00:19:44,542 and how good it is as a sample to inform your next move. 513 00:19:44,542 --> 00:19:46,000 So now, what you're going to do is, 514 00:19:46,000 --> 00:19:47,860 you're in this particular point in space, 515 00:19:47,860 --> 00:19:50,890 and you're going to move to a different point in space that 516 00:19:50,890 --> 00:19:53,447 depends on the point that you are now. 517 00:19:53,447 --> 00:19:55,780 So for example, if you're in some two-dimensional space, 518 00:19:55,780 --> 00:19:58,210 and you're over here, you're not going to sample 519 00:19:58,210 --> 00:19:59,844 from this square. 520 00:19:59,844 --> 00:20:02,260 You're going to sample from a point next to it, let's say. 521 00:20:02,260 --> 00:20:05,290 That's your proposal. 522 00:20:05,290 --> 00:20:07,510 You sample according to your proposal distribution. 523 00:20:07,510 --> 00:20:09,010 Your proposal distribution tells you 524 00:20:09,010 --> 00:20:11,500 where you should sample next. 525 00:20:11,500 --> 00:20:14,650 So you're here in the space of all possible programs 526 00:20:14,650 --> 00:20:15,864 or your theory space. 527 00:20:15,864 --> 00:20:18,280 Metropolis Hastings is more than just programs-- anything. 528 00:20:18,280 --> 00:20:20,020 You're in the space of possible things 529 00:20:20,020 --> 00:20:21,530 that you're trying to sample from. 530 00:20:21,530 --> 00:20:22,030 You're here. 
531 00:20:22,030 --> 00:20:23,680 You've got one sample. 532 00:20:23,680 --> 00:20:25,180 Now, you're sort of looking around, 533 00:20:25,180 --> 00:20:26,270 and you take another sample. 534 00:20:26,270 --> 00:20:28,436 You do that according to your proposal distribution. 535 00:20:28,436 --> 00:20:29,560 You jump there. 536 00:20:29,560 --> 00:20:31,030 And you evaluate this point. 537 00:20:31,030 --> 00:20:32,530 And the way that you evaluate it is, 538 00:20:32,530 --> 00:20:35,465 you just look, how well does this thing fit the data? 539 00:20:35,465 --> 00:20:37,090 And we can get into that-- but the ways 540 00:20:37,090 --> 00:20:40,540 you sort of score your model according to the data. 541 00:20:40,540 --> 00:20:43,590 And you say, well, this one fits the data pretty well. 542 00:20:43,590 --> 00:20:45,940 How well does this one fit the data? 543 00:20:45,940 --> 00:20:46,970 Not so great. 544 00:20:46,970 --> 00:20:49,110 So I should probably move over there. 545 00:20:49,110 --> 00:20:51,690 I should move to that point in program space. 546 00:20:51,690 --> 00:20:54,730 I'm going to move over here. 547 00:20:54,730 --> 00:20:56,260 Now I sample another point. 548 00:20:56,260 --> 00:20:57,180 I go over here. 549 00:20:57,180 --> 00:20:58,390 How well does this do? 550 00:20:58,390 --> 00:21:01,000 Not as great, but I might still accept it. 551 00:21:01,000 --> 00:21:03,610 The point is, you're going to accept and reject 552 00:21:03,610 --> 00:21:07,270 new points, new executions of the program 553 00:21:07,270 --> 00:21:09,700 according to how they predict the data, 554 00:21:09,700 --> 00:21:11,994 according to how well they answer the condition. 555 00:21:11,994 --> 00:21:13,660 If the condition is a simple true-false, 556 00:21:13,660 --> 00:21:16,127 that just means that you have an absolute yes or no.
557 00:21:16,127 --> 00:21:17,710 But many of these conditions are going 558 00:21:17,710 --> 00:21:21,550 to be something like, well, it's good if it matches it. 559 00:21:21,550 --> 00:21:26,600 You get some score from the likelihood and the prior. 560 00:21:26,600 --> 00:21:29,620 So you can score this new point in program space 561 00:21:29,620 --> 00:21:31,330 and either accept or reject it. 562 00:21:31,330 --> 00:21:35,110 And this thing of moving around in program space 563 00:21:35,110 --> 00:21:38,530 and sampling according to some new proposal distribution 564 00:21:38,530 --> 00:21:41,590 and accepting or rejecting and moving around like that 565 00:21:41,590 --> 00:21:46,220 is a lot more efficient in most cases than rejection sampling. 566 00:21:46,220 --> 00:21:48,970 And in the limit, if you keep on doing this-- 567 00:21:48,970 --> 00:21:52,139 if you keep on walking around, walking around, taking samples, 568 00:21:52,139 --> 00:21:53,680 accepting or rejecting them depending 569 00:21:53,680 --> 00:21:57,850 on how well this new point in program space does-- 570 00:21:57,850 --> 00:22:00,800 what you'll end up with is the posterior distribution 571 00:22:00,800 --> 00:22:03,430 that you're trying to sample from. 572 00:22:03,430 --> 00:22:06,950 And I should say, what is a point in program space? 573 00:22:06,950 --> 00:22:10,150 It just means a program that I have completely evaluated. 574 00:22:10,150 --> 00:22:13,600 Like in the case of the London bombing, it would be, 575 00:22:13,600 --> 00:22:17,080 I have two Gaussians, and this one's center is here, 576 00:22:17,080 --> 00:22:18,670 and this one's center is here. 577 00:22:18,670 --> 00:22:21,100 I sort of walk through all the things that I can sample. 578 00:22:21,100 --> 00:22:24,222 I've gotten one particular run of the program. 579 00:22:24,222 --> 00:22:25,930 And then I try to move to somewhere else.
580 00:22:25,930 --> 00:22:28,060 Like, I might change the center of this Gaussian. 581 00:22:28,060 --> 00:22:29,110 Or I might say, well, you know what? 582 00:22:29,110 --> 00:22:30,730 Actually, there aren't two Gaussians. 583 00:22:30,730 --> 00:22:31,000 Let's run it again. 584 00:22:31,000 --> 00:22:31,990 Let's run it again. 585 00:22:31,990 --> 00:22:34,180 There was actually 10. 586 00:22:34,180 --> 00:22:36,100 So what I do, the way I change, the way 587 00:22:36,100 --> 00:22:37,750 I move around in program space is 588 00:22:37,750 --> 00:22:40,120 to go to a particular point along the tree 589 00:22:40,120 --> 00:22:43,439 of the evaluation and say, what if I change that? 590 00:22:43,439 --> 00:22:44,480 What would I end up with? 591 00:22:44,480 --> 00:22:46,930 I sort of re-sample. 592 00:22:46,930 --> 00:22:47,656 And I re-sample. 593 00:22:47,656 --> 00:22:49,030 I end up with some other program. 594 00:22:49,030 --> 00:22:51,520 I basically say, how good is that? 595 00:22:51,520 --> 00:22:54,029 Yes, no-- and I accept or reject that. 596 00:22:54,029 --> 00:22:56,320 As I said, I'm doing this a little bit of a disservice. 597 00:22:56,320 --> 00:22:59,710 But if you keep that mental image in your head of something 598 00:22:59,710 --> 00:23:01,990 bouncing around and accepting or rejecting 599 00:23:01,990 --> 00:23:05,200 new proposals according to how well they do compared to one 600 00:23:05,200 --> 00:23:07,300 another in a sort of pairwise fashion, 601 00:23:07,300 --> 00:23:10,141 you won't go far wrong. 602 00:23:10,141 --> 00:23:12,640 What this also tells you is that it's a little bit important 603 00:23:12,640 --> 00:23:14,620 where you start out. 
604 00:23:14,620 --> 00:23:17,380 So if I start out in this particular point in program 605 00:23:17,380 --> 00:23:20,114 space, or in any space, and I look locally, 606 00:23:20,114 --> 00:23:22,030 I might accept or reject and things like that, 607 00:23:22,030 --> 00:23:24,600 but, actually, the really good stuff is over here. 608 00:23:24,600 --> 00:23:26,960 The high probability stuff is over here. 609 00:23:26,960 --> 00:23:29,220 But it'll take me a long time to get to that. 610 00:23:29,220 --> 00:23:30,805 Because I started out here. 611 00:23:30,805 --> 00:23:33,510 Does everyone sort of understand what I mean when I say "here?" 612 00:23:33,510 --> 00:23:35,970 So supposing you have a random square, 613 00:23:35,970 --> 00:23:38,370 and you're trying to sample from a probability 614 00:23:38,370 --> 00:23:41,380 distribution over the square, and in the corner 615 00:23:41,380 --> 00:23:42,790 there's much less probability. 616 00:23:42,790 --> 00:23:45,090 But you started on the corner for whatever reason. 617 00:23:45,090 --> 00:23:46,130 And now, you're trying to figure out 618 00:23:46,130 --> 00:23:48,330 how to get to those good samples in the center, 619 00:23:48,330 --> 00:23:50,827 but you can only move locally. 620 00:23:50,827 --> 00:23:52,410 You will eventually get to the center. 621 00:23:52,410 --> 00:23:53,868 If you run this long enough, you 622 00:23:53,868 --> 00:23:56,910 will eventually get to those good probability spaces. 623 00:23:56,910 --> 00:24:00,340 But it depends a lot on where you started out. 624 00:24:00,340 --> 00:24:02,730 And that just means that, oftentimes, in Metropolis 625 00:24:02,730 --> 00:24:04,410 Hastings and MCMC and things like that, 626 00:24:04,410 --> 00:24:06,290 you hear about burn-in, which is just 627 00:24:06,290 --> 00:24:10,500 to say, we want to get rid of the initial x samples, 628 00:24:10,500 --> 00:24:13,180 because those samples are going to be biased.
629 00:24:13,180 --> 00:24:15,390 They're going to depend on where we started out. 630 00:24:15,390 --> 00:24:18,060 And the hope is that, after x samples, 631 00:24:18,060 --> 00:24:19,140 we're no longer biased. 632 00:24:19,140 --> 00:24:21,570 We no longer remember where we started from. 633 00:24:21,570 --> 00:24:25,567 We're just sort of sampling around in the space. 634 00:24:25,567 --> 00:24:27,150 So the way Metropolis Hastings works-- 635 00:24:27,150 --> 00:24:29,600 and this is the backbone of inference in Church-- 636 00:24:29,600 --> 00:24:31,672 is, you would write down something like mh-query, 637 00:24:31,672 --> 00:24:33,630 then you would write down the number of samples 638 00:24:33,630 --> 00:24:37,140 that you want from your posterior distribution. 639 00:24:37,140 --> 00:24:38,940 You would write down the lag. 640 00:24:38,940 --> 00:24:42,652 The lag is just to say, forget every x steps. 641 00:24:42,652 --> 00:24:45,110 If you want to talk about this, we can talk about it later. 642 00:24:45,110 --> 00:24:46,220 It's not particularly interesting. 643 00:24:46,220 --> 00:24:47,590 These are just two numbers. 644 00:24:47,590 --> 00:24:49,881 And if you make them bigger, you will get more samples. 645 00:24:49,881 --> 00:24:53,119 You will get a better estimate of your posterior distribution. 646 00:24:53,119 --> 00:24:54,660 You write down some generative model, 647 00:24:54,660 --> 00:24:56,201 you write down what you want to know, 648 00:24:56,201 --> 00:24:58,940 and you write down what you actually know. 649 00:24:58,940 --> 00:25:03,036 And you do a random walk in the program evaluation space. 650 00:25:03,036 --> 00:25:04,410 Like Josh said, what's nice about 651 00:25:04,410 --> 00:25:06,810 this is that it's very, very, very, very general. 652 00:25:06,810 --> 00:25:08,790 This will work on any program, more or less, 653 00:25:08,790 --> 00:25:10,034 defined correctly. 
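The accept-or-reject walk, the burn-in, and the lag described above can be put together in a few lines. This is a minimal Python sketch of random-walk Metropolis over a single real number, not Church's `mh-query` and not a walk over program executions. The function name, the Gaussian proposal, the `step_size` parameter, and the toy target (a unit-variance bump centered at 3) are all assumptions chosen for illustration. Because the proposal is symmetric, the acceptance rule needs only the ratio of the two scores.

```python
import math
import random


def metropolis_hastings(log_score, start, num_samples, lag=1, burn_in=0,
                        step_size=1.0, seed=None):
    """Random-walk Metropolis: propose a point near the current one and
    accept or reject it by comparing scores; keep one state every `lag`
    steps after discarding the first `burn_in` steps."""
    rng = random.Random(seed)
    current = start
    current_score = log_score(current)
    samples = []
    step = 0
    while len(samples) < num_samples:
        # Proposal distribution: a local move around the current point.
        proposal = current + rng.gauss(0.0, step_size)
        proposal_score = log_score(proposal)
        # Always accept a better point; accept a worse one with
        # probability equal to the ratio of the two (unnormalized) scores.
        accept_prob = math.exp(min(0.0, proposal_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = proposal, proposal_score
        step += 1
        if step > burn_in and (step - burn_in) % lag == 0:
            samples.append(current)
    return samples


def toy_log_score(x):
    """Unnormalized log density of a Gaussian with mean 3 and variance 1."""
    return -0.5 * (x - 3.0) ** 2


if __name__ == "__main__":
    # Start far from the high-probability region to show why burn-in matters.
    draws = metropolis_hastings(toy_log_score, start=-20.0, num_samples=5000,
                                lag=5, burn_in=1000, seed=0)
    print(sum(draws) / len(draws))
```

Starting at -20 puts the chain in the "corner" of the space; with `burn_in=0` the first retained samples would still reflect that starting point.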
654 00:25:10,034 --> 00:25:12,450 You need to make some decisions, like how many samples you 655 00:25:12,450 --> 00:25:16,470 want to take, what the lag is, what the burn-in is. 656 00:25:16,470 --> 00:25:18,720 You can do all sorts of fanciness. 657 00:25:18,720 --> 00:25:20,000 You can do particle filtering. 658 00:25:20,000 --> 00:25:21,660 You could run several chains. 659 00:25:21,660 --> 00:25:24,282 You can do temperature annealing. 660 00:25:24,282 --> 00:25:25,740 You can do lots of different things 661 00:25:25,740 --> 00:25:27,720 that I just said and might not make a lot of sense. 662 00:25:27,720 --> 00:25:29,470 But the point is that this procedure could 663 00:25:29,470 --> 00:25:31,169 be made more or less fancy. 664 00:25:31,169 --> 00:25:33,210 One of the problems with it is, it takes a while. 665 00:25:33,210 --> 00:25:36,390 Like Josh said, there's a lot of better algorithms. 666 00:25:36,390 --> 00:25:38,610 If you know what your representation is, and it's 667 00:25:38,610 --> 00:25:40,720 something like a feedforward neural network, 668 00:25:40,720 --> 00:25:42,930 you probably shouldn't do Metropolis Hastings on it. 669 00:25:42,930 --> 00:25:45,150 There's a lot of very fast things 670 00:25:45,150 --> 00:25:46,860 that you could do, like gradient descent, 671 00:25:46,860 --> 00:25:48,234 and you don't need to wait around 672 00:25:48,234 --> 00:25:49,380 for this thing to happen. 673 00:25:52,030 --> 00:25:52,530 Let's see. 674 00:25:52,530 --> 00:25:54,750 So I think we have enough time to give you 675 00:25:54,750 --> 00:25:57,760 some examples of inference. 676 00:25:57,760 --> 00:25:59,970 Let's walk through some coin testing examples, 677 00:25:59,970 --> 00:26:01,803 a bit of intuitive physics, and a little bit 678 00:26:01,803 --> 00:26:02,980 of social reasoning. 679 00:26:02,980 --> 00:26:07,080 So suppose that I took a coin and I flipped it, 680 00:26:07,080 --> 00:26:08,670 and it came up heads.
681 00:26:08,670 --> 00:26:09,420 What do you think? 682 00:26:09,420 --> 00:26:11,760 Is this coin weird? 683 00:26:11,760 --> 00:26:12,260 No. 684 00:26:12,260 --> 00:26:13,340 It's OK to say no. 685 00:26:13,340 --> 00:26:16,176 What if I flipped and it got heads five times in a row. 686 00:26:16,176 --> 00:26:17,300 Do people think it's weird? 687 00:26:19,644 --> 00:26:21,560 Raise your hand to the degree that it's weird. 688 00:26:21,560 --> 00:26:24,220 Is it weird that it's five? 689 00:26:24,220 --> 00:26:26,412 If I flipped it 10 times and it came up heads, 690 00:26:26,412 --> 00:26:28,370 raise your hands to the degree that it's weird. 691 00:26:28,370 --> 00:26:30,490 15 times in a row, heads? 692 00:26:30,490 --> 00:26:32,110 20 times in a row, heads? 693 00:26:32,110 --> 00:26:34,670 OK, we more or less asymptoted somewhere between 10 and 15, 694 00:26:34,670 --> 00:26:36,190 which is exactly right. 695 00:26:36,190 --> 00:26:40,870 And the point here is something like, we have a particular 696 00:26:40,870 --> 00:26:43,390 prior over what we think the weight of the coin is. 697 00:26:43,390 --> 00:26:45,874 We're pretty sure that the coin is not biased. 698 00:26:45,874 --> 00:26:47,290 We're pretty sure that the coin is 699 00:26:47,290 --> 00:26:49,420 supposed to be equal weighted. 700 00:26:49,420 --> 00:26:51,460 But then we get more and more evidence, 701 00:26:51,460 --> 00:26:53,376 and we sort of figure out that, wait a minute, 702 00:26:53,376 --> 00:26:54,910 no, this might be a trick coin. 703 00:26:54,910 --> 00:26:56,254 This might be weird. 704 00:26:56,254 --> 00:26:57,670 And the point of the first example 705 00:26:57,670 --> 00:27:02,740 is to show you the basics of conditioning and inference 706 00:27:02,740 --> 00:27:05,750 and things like that using the coin example. 
707 00:27:05,750 --> 00:27:09,610 So what we would do is, we would take in-- 708 00:27:09,610 --> 00:27:12,460 and again, as I said, I'm slightly going to rush this. 709 00:27:12,460 --> 00:27:14,530 But I'll still try to explain it. 710 00:27:14,530 --> 00:27:16,630 We're going to define some observed data. 711 00:27:16,630 --> 00:27:18,310 Suppose our observed data is that we 712 00:27:18,310 --> 00:27:20,640 got five heads in a row. 713 00:27:20,640 --> 00:27:21,210 Is that five? 714 00:27:21,210 --> 00:27:21,710 Yes. 715 00:27:21,710 --> 00:27:23,180 It's five. 716 00:27:23,180 --> 00:27:25,420 And now we're going to say, OK, we're 717 00:27:25,420 --> 00:27:27,730 going to define something. 718 00:27:27,730 --> 00:27:30,280 We're going to define an inference procedure. 719 00:27:30,280 --> 00:27:32,170 We're going to call it samples. 720 00:27:32,170 --> 00:27:34,045 The way it's going to work is that it's going 721 00:27:34,045 --> 00:27:36,190 to give us 1,000 samples back. 722 00:27:36,190 --> 00:27:38,080 We said we're going to write mh-query, 723 00:27:38,080 --> 00:27:39,820 the number of samples, and now we're 724 00:27:39,820 --> 00:27:41,471 going to define a generative model. 725 00:27:41,471 --> 00:27:42,970 We're going to end up with the thing 726 00:27:42,970 --> 00:27:45,140 that we're actually interested in under a certain condition. 727 00:27:45,140 --> 00:27:46,630 So what's our model for the world-- 728 00:27:46,630 --> 00:27:48,550 for this simple world? 729 00:27:48,550 --> 00:27:52,660 Let's say my prior on this being a fair coin is very high. 730 00:27:52,660 --> 00:27:54,660 One means I'm absolutely sure it's a fair coin. 731 00:27:54,660 --> 00:27:57,400 Zero means it's not a fair coin. 732 00:27:57,400 --> 00:28:00,460 And we're going to put in a big prior on it being a fair coin. 733 00:28:00,460 --> 00:28:02,740 It's going to be 0.999. 
734 00:28:02,740 --> 00:28:04,690 And then we're going to basically say, 735 00:28:04,690 --> 00:28:05,740 somewhere in the beginning, the way 736 00:28:05,740 --> 00:28:07,720 I think the world works is that you're going to pull up 737 00:28:07,720 --> 00:28:09,761 a new coin off the mint, and you're going to say, 738 00:28:09,761 --> 00:28:11,030 is it a fair coin or not? 739 00:28:16,690 --> 00:28:18,710 999 out of 1,000 are fair. 740 00:28:18,710 --> 00:28:20,320 One is not. 741 00:28:20,320 --> 00:28:21,760 So we're basically saying this. 742 00:28:21,760 --> 00:28:23,710 We're going to say, is it a fair coin? 743 00:28:23,710 --> 00:28:28,051 And this is going to come up true 999 times out of 1,000. 744 00:28:28,051 --> 00:28:30,300 And it's going to come up false one time out of 1,000. 745 00:28:30,300 --> 00:28:31,925 Because what we're basically doing here 746 00:28:31,925 --> 00:28:34,350 is just flipping a coin. 747 00:28:34,350 --> 00:28:37,764 We're flipping a coin with a bias of this prior. 748 00:28:37,764 --> 00:28:39,180 So what we have here is this thing 749 00:28:39,180 --> 00:28:41,427 is going to come up this thing-- fair coin. 750 00:28:41,427 --> 00:28:43,260 It's going to come up without any knowledge, 751 00:28:43,260 --> 00:28:45,810 without any data, without seeing anything. 752 00:28:45,810 --> 00:28:48,570 Just, you took a coin off the mint. 753 00:28:48,570 --> 00:28:50,520 999 times out of 1,000, you think 754 00:28:50,520 --> 00:28:52,930 it's going to be fair, without seeing any data. 755 00:28:52,930 --> 00:28:54,750 Now you're going to create a coin. 756 00:28:54,750 --> 00:28:56,480 The coin is going to take in some weight. 757 00:28:56,480 --> 00:28:58,854 There's this procedure that you can flip and actually get 758 00:28:58,854 --> 00:29:00,050 heads or tails. 759 00:29:00,050 --> 00:29:03,480 And the way this coin is going to work is that, if it's fair, 760 00:29:03,480 --> 00:29:06,630 it's going to have a weight of 0.5. 
761 00:29:06,630 --> 00:29:09,640 If it's unfair-- and this is a very simple example-- 762 00:29:09,640 --> 00:29:12,510 it's going to have a weight of 0.95. 763 00:29:12,510 --> 00:29:15,780 So the fair coin comes up heads or tails equally likely. 764 00:29:15,780 --> 00:29:18,030 The unfair coin-- the trick coin-- comes up 765 00:29:18,030 --> 00:29:20,939 heads almost all the time. 766 00:29:20,939 --> 00:29:22,980 And again, you can define a different hypothesis. 767 00:29:22,980 --> 00:29:24,021 It doesn't really matter. 768 00:29:24,021 --> 00:29:26,160 But the point is, I defined some sort of coin. 769 00:29:26,160 --> 00:29:29,250 And now, I define some sort of hypothesized data. 770 00:29:29,250 --> 00:29:32,170 Well, the hypothesized data is just, I sample from this coin 771 00:29:32,170 --> 00:29:34,380 that I just made. 772 00:29:34,380 --> 00:29:36,990 And what I want to know is, is this a fair coin? 773 00:29:36,990 --> 00:29:40,110 Yes or no. 774 00:29:40,110 --> 00:29:41,670 The last statement is conditioned 775 00:29:41,670 --> 00:29:44,580 on this sample data, this illusory data, 776 00:29:44,580 --> 00:29:50,100 this imagined data being equal to the observed data. 777 00:29:50,100 --> 00:29:53,230 So now, you have some sort of program, 778 00:29:53,230 --> 00:29:55,710 and you're trying to figure out, did this come out 779 00:29:55,710 --> 00:29:57,240 to be a fair coin or not? 780 00:29:57,240 --> 00:30:00,310 When I did this here, if I didn't condition on anything, 781 00:30:00,310 --> 00:30:03,999 then it, 999 times out of 1,000, should give me back, 782 00:30:03,999 --> 00:30:05,040 yes, this is a fair coin. 783 00:30:05,040 --> 00:30:06,930 But I've now conditioned on some data. 784 00:30:06,930 --> 00:30:10,810 And the data is that it came up five times heads in a row. 785 00:30:10,810 --> 00:30:13,290 And if you do a histogram for that, 786 00:30:13,290 --> 00:30:17,070 then you'll find that it's still very likely to be a fair coin. 
787 00:30:17,070 --> 00:30:20,040 Because the prior is so strong. 788 00:30:20,040 --> 00:30:21,750 But now, we can change. 789 00:30:21,750 --> 00:30:26,162 We can change the data to add a few more heads here. 790 00:30:26,162 --> 00:30:27,870 And I think we're now more or less at 10. 791 00:30:27,870 --> 00:30:30,700 And it's starting to be like, well, 792 00:30:30,700 --> 00:30:32,160 is it a fair coin-- yes or no? 793 00:30:32,160 --> 00:30:36,284 Well, I'm like 60% sure that it's a fair coin now. 794 00:30:36,284 --> 00:30:37,200 What if I flipped it-- 795 00:30:37,200 --> 00:30:42,710 I don't know-- like 20 times or something like that, 796 00:30:42,710 --> 00:30:44,690 and I came up with that? 797 00:30:44,690 --> 00:30:48,650 And it's basically saying that it's 100% not a fair coin. 798 00:30:48,650 --> 00:30:49,380 This is false. 799 00:30:49,380 --> 00:30:51,211 It's basically saying, is it a fair coin? 800 00:30:51,211 --> 00:30:51,710 No. 801 00:30:55,940 --> 00:30:57,480 Even though the prior is strong-- 802 00:30:57,480 --> 00:30:58,940 even though if I just ran my generative model 803 00:30:58,940 --> 00:31:01,315 without any conditions, usually, it would be a fair coin, 804 00:31:01,315 --> 00:31:04,040 there is no way that I would run my generative model and sample 805 00:31:04,040 --> 00:31:04,905 it 20 times-- 806 00:31:04,905 --> 00:31:07,280 that coin flip-- and it would come up heads all the time, 807 00:31:07,280 --> 00:31:10,006 when the alternative is that it's going to come up heads. 808 00:31:10,006 --> 00:31:12,130 So now, you can sort of play around with this coin. 809 00:31:12,130 --> 00:31:13,088 This is a nice example. 
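The coin model is small enough that the posterior can be checked by hand with Bayes' rule, and the "run the generative model, keep only the runs that match the data" idea can be written out directly. The Python sketch below is not the Church program shown in the lecture, which uses `mh-query`; it substitutes an exact calculation and plain rejection sampling, and all function and constant names are assumptions for illustration. The numbers (prior 0.999, weights 0.5 and 0.95) are the ones stated above. With these numbers the exact posterior that the coin is fair is roughly 0.98 after 5 heads, roughly 0.62 after 10, and well below 0.01 after 20, which matches the behavior described in the demo.

```python
import random

PRIOR_FAIR = 0.999    # prior probability that a coin off the mint is fair
FAIR_WEIGHT = 0.5     # probability of heads for the fair coin
TRICK_WEIGHT = 0.95   # probability of heads for the trick coin


def posterior_fair(num_heads):
    """Exact P(fair | num_heads heads in a row) by Bayes' rule."""
    fair = PRIOR_FAIR * FAIR_WEIGHT ** num_heads
    trick = (1.0 - PRIOR_FAIR) * TRICK_WEIGHT ** num_heads
    return fair / (fair + trick)


def rejection_posterior_fair(num_heads, num_samples, seed=None):
    """Run the generative model forward and keep only the runs whose
    imagined flips equal the observed data (all heads)."""
    rng = random.Random(seed)
    accepted = []
    while len(accepted) < num_samples:
        is_fair = rng.random() < PRIOR_FAIR
        weight = FAIR_WEIGHT if is_fair else TRICK_WEIGHT
        flips = [rng.random() < weight for _ in range(num_heads)]
        if all(flips):                 # condition: sampled data == observed data
            accepted.append(is_fair)   # query: was this coin fair?
    return sum(accepted) / num_samples


if __name__ == "__main__":
    for n in (5, 10, 20):
        print(n, posterior_fair(n))
    print(rejection_posterior_fair(5, 2000, seed=0))
```

Rejection sampling is only practical here for short sequences: a run of 20 heads is so rare under the prior that almost every forward run is thrown away, which is one reason to prefer something like Metropolis Hastings.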
810 00:31:13,088 --> 00:31:16,392 And by the way, Josh, some version 811 00:31:16,392 --> 00:31:18,350 of this-- not much more complicated than that-- 812 00:31:18,350 --> 00:31:20,000 was a cognition paper a few years ago, 813 00:31:20,000 --> 00:31:21,375 where, basically, you gave people 814 00:31:21,375 --> 00:31:24,470 different sequences of coins, different sequences of numbers, 815 00:31:24,470 --> 00:31:26,750 and you started to see, where does it become weird? 816 00:31:26,750 --> 00:31:29,554 What hypothesis did they think is likely? 817 00:31:29,554 --> 00:31:31,970 And all that they did was, they gave it a more interesting 818 00:31:31,970 --> 00:31:33,800 hypothesis space. 819 00:31:33,800 --> 00:31:36,770 Instead of saying it's either a fair coin or a coin that's 820 00:31:36,770 --> 00:31:40,100 95% heads, they gave you a more general hypothesis space. 821 00:31:40,100 --> 00:31:42,511 Like, it could be a coin that comes mostly heads, mostly 822 00:31:42,511 --> 00:31:43,010 tails. 823 00:31:43,010 --> 00:31:44,840 Maybe it's a coin that does heads, tails, heads, tails, 824 00:31:44,840 --> 00:31:45,860 heads, tails, heads, tails. 825 00:31:45,860 --> 00:31:48,026 Or you can define the sort of alternative procedure, 826 00:31:48,026 --> 00:31:50,240 but that, you would change over here. 827 00:31:50,240 --> 00:31:53,990 The rest of this would stay more or less the same. 828 00:31:53,990 --> 00:31:57,860 That's a very simple way of getting some hypothesis tested. 829 00:31:57,860 --> 00:32:02,080 Let's do some very simple, intuitive physics. 830 00:32:02,080 --> 00:32:05,760 Let's try something like this. 831 00:32:05,760 --> 00:32:09,050 So in Church, you can basically animate physics forward. 832 00:32:12,452 --> 00:32:14,660 I guess I hadn't counted on, when it's a full screen, 833 00:32:14,660 --> 00:32:16,040 it sort of does that thing. 
834 00:32:16,040 --> 00:32:18,200 But what you can do is, you can define, 835 00:32:18,200 --> 00:32:19,730 basically, a two-dimensional world 836 00:32:19,730 --> 00:32:21,980 where you say, listen, here's a two-dimensional world. 837 00:32:21,980 --> 00:32:23,112 It's this big. 838 00:32:23,112 --> 00:32:24,320 I'm going to add some shapes. 839 00:32:24,320 --> 00:32:26,480 I'm going to put the shapes in random locations. 840 00:32:26,480 --> 00:32:28,239 I'm going to set gravity to something. 841 00:32:28,239 --> 00:32:29,780 And then I'm going to run it forward. 842 00:32:29,780 --> 00:32:30,350 What happens? 843 00:32:30,350 --> 00:32:31,310 AUDIENCE: Command minus. 844 00:32:31,310 --> 00:32:31,520 TOMER ULLMAN: Sorry? 845 00:32:31,520 --> 00:32:32,528 AUDIENCE: Command minus. 846 00:32:32,528 --> 00:32:34,194 TOMER ULLMAN: Command minus for running? 847 00:32:34,194 --> 00:32:34,936 OK. 848 00:32:34,936 --> 00:32:35,810 AUDIENCE: [INAUDIBLE] 849 00:32:35,810 --> 00:32:36,976 TOMER ULLMAN: Oh, of course. 850 00:32:36,976 --> 00:32:38,101 Thank you. 851 00:32:38,101 --> 00:32:38,600 Trivial. 852 00:32:38,600 --> 00:32:39,830 Thank you. 853 00:32:39,830 --> 00:32:41,280 Is that better, everybody? 854 00:32:41,280 --> 00:32:42,120 Yeah. 855 00:32:42,120 --> 00:32:42,620 OK. 856 00:32:42,620 --> 00:32:44,078 So I have this thing, and I'm going 857 00:32:44,078 --> 00:32:47,120 to hit simulate to try and see what happens. 858 00:32:47,120 --> 00:32:48,637 So I basically have these things. 859 00:32:48,637 --> 00:32:50,720 I guess, in this case, I didn't put any randomness 860 00:32:50,720 --> 00:32:51,740 on where they actually are. 861 00:32:51,740 --> 00:32:53,239 But you could easily imagine putting 862 00:32:53,239 --> 00:32:55,760 some randomness on where these blocks start out, 863 00:32:55,760 --> 00:32:56,820 where these blocks are. 864 00:32:56,820 --> 00:33:00,327 But the point is, this is just running it forward in physics. 
865 00:33:00,327 --> 00:33:01,410 Now what is that good for? 866 00:33:01,410 --> 00:33:04,250 Well, you could, for example, define a tower. 867 00:33:04,250 --> 00:33:06,430 A tower is just defining a bunch of blocks, 868 00:33:06,430 --> 00:33:07,570 one on top of the other. 869 00:33:07,570 --> 00:33:09,910 And now you can run it forward, see what happens. 870 00:33:09,910 --> 00:33:10,660 What do you think? 871 00:33:10,660 --> 00:33:12,620 Is this going to fall or not? 872 00:33:12,620 --> 00:33:13,300 Yes? 873 00:33:13,300 --> 00:33:13,800 No? 874 00:33:13,800 --> 00:33:15,650 Let's see. 875 00:33:15,650 --> 00:33:17,570 And you can simulate that forward. 876 00:33:17,570 --> 00:33:18,450 It fell. 877 00:33:18,450 --> 00:33:19,010 OK. 878 00:33:19,010 --> 00:33:20,060 Very nice. 879 00:33:20,060 --> 00:33:23,210 Now what can we do with that? 880 00:33:23,210 --> 00:33:25,400 And as I said, I'm going to zoom through this. 881 00:33:25,400 --> 00:33:28,710 What we can define is basically a bunch of towers like this. 882 00:33:28,710 --> 00:33:32,090 Each one of them is just saying, the blocks are like that. 883 00:33:32,090 --> 00:33:34,577 And all I'm going to do is slightly perturb them 884 00:33:34,577 --> 00:33:35,660 and see if they fall down. 885 00:33:35,660 --> 00:33:38,592 And I'm going to do that 1,000 times for each tower. 886 00:33:38,592 --> 00:33:40,550 And I'm going to do that for a bunch of towers. 887 00:33:40,550 --> 00:33:41,550 Some of them are stable. 888 00:33:41,550 --> 00:33:42,870 Some of them are not stable. 889 00:33:42,870 --> 00:33:44,744 So you can write down some Church code, which 890 00:33:44,744 --> 00:33:46,840 is basically, this is my world. 891 00:33:46,840 --> 00:33:47,760 There's some ground. 892 00:33:47,760 --> 00:33:49,051 The ground is just a rectangle. 893 00:33:49,051 --> 00:33:51,890 Here's a tower-- a stable tower. 
894 00:33:51,890 --> 00:33:53,390 The stable tower-- all that it means 895 00:33:53,390 --> 00:33:56,220 is that I'm creating some blocks in this particular order. 896 00:33:56,220 --> 00:33:57,560 Here's an almost stable tower. 897 00:33:57,560 --> 00:33:58,643 It's blocks in this order. 898 00:33:58,643 --> 00:33:59,690 Here's an unstable tower. 899 00:33:59,690 --> 00:34:01,130 It's blocks in that order. 900 00:34:01,130 --> 00:34:03,230 And now, what I'm going to do is, 901 00:34:03,230 --> 00:34:05,420 I'm going to run this tower many times, 902 00:34:05,420 --> 00:34:07,580 and I'm going to count up the number of times 903 00:34:07,580 --> 00:34:08,900 that it actually fell. 904 00:34:08,900 --> 00:34:12,260 And if you do that, you'll see that the stable tower didn't 905 00:34:12,260 --> 00:34:13,310 fall down. 906 00:34:13,310 --> 00:34:15,360 This is just saying, like, did it fall down-- 907 00:34:15,360 --> 00:34:16,460 false, true? 908 00:34:16,460 --> 00:34:19,219 This one didn't fall down any of the time. 909 00:34:19,219 --> 00:34:22,040 This one fell down some of the time. 910 00:34:22,040 --> 00:34:24,320 This one fell down all the time. 911 00:34:24,320 --> 00:34:25,340 It's a toy example. 912 00:34:25,340 --> 00:34:28,460 But it's actually a toy example of a very nice and interesting 913 00:34:28,460 --> 00:34:29,929 paper that came out very recently 914 00:34:29,929 --> 00:34:31,970 and shows something deep about intuitive physics. 915 00:34:35,360 --> 00:34:40,940 We can ask how hard it was to implement this thing. 916 00:34:40,940 --> 00:34:45,650 This is an implementation of liquid physics in not so 917 00:34:45,650 --> 00:34:47,810 much Church as webPPL. 918 00:34:47,810 --> 00:34:50,900 And what they were trying to do here is to sort of say, well, 919 00:34:50,900 --> 00:34:52,489 this is a bunch of water. 920 00:34:52,489 --> 00:34:53,780 Physics is frozen right now. 921 00:34:53,780 --> 00:34:56,190 Imagine that this is a big glob of water. 
922 00:34:56,190 --> 00:34:57,950 This is a cup over here. 923 00:34:57,950 --> 00:34:59,330 And this is some barrier. 924 00:34:59,330 --> 00:35:01,130 So if the water falls down on the barrier, 925 00:35:01,130 --> 00:35:03,949 it's going to go every which way. 926 00:35:03,949 --> 00:35:04,490 So let's see. 927 00:35:04,490 --> 00:35:14,720 If we run it, one of the questions that we can ask 928 00:35:14,720 --> 00:35:16,895 here-- and this has sort of been an example-- 929 00:35:16,895 --> 00:35:18,770 what I'm trying to show you here is that this 930 00:35:18,770 --> 00:35:20,180 is an active area of research. 931 00:35:20,180 --> 00:35:22,471 Even though it's sort of like Church-- it's 2D physics. 932 00:35:22,471 --> 00:35:23,060 It's simple. 933 00:35:23,060 --> 00:35:24,684 But what people have been doing 934 00:35:24,684 --> 00:35:27,770 is porting a liquid physics thing into Church. 935 00:35:27,770 --> 00:35:30,077 That took them a little bit of time. 936 00:35:30,077 --> 00:35:31,910 You probably want to talk to people who know 937 00:35:31,910 --> 00:35:34,370 what they're doing in that. 938 00:35:34,370 --> 00:35:37,677 Save yourself some time and talk to people in Noah Goodman's group. 939 00:35:37,677 --> 00:35:40,010 But what they did is, they ported a liquid physics 940 00:35:40,010 --> 00:35:41,464 implementation into Church. 941 00:35:41,464 --> 00:35:43,880 And they sort of said, OK, now we have some liquid physics 942 00:35:43,880 --> 00:35:45,421 and we can try to ask some questions. 943 00:35:45,421 --> 00:35:47,762 Like, suppose that this glob is going to fall down. 944 00:35:47,762 --> 00:35:49,970 We set it down over here and it's going to fall down. 945 00:35:49,970 --> 00:35:52,790 Where should we put this block in order to get as much 946 00:35:52,790 --> 00:35:56,239 of the liquid into that cup? 947 00:35:56,239 --> 00:35:57,530 That's an interesting question.
948 00:35:57,530 --> 00:35:59,279 It shows something about intuitive physics 949 00:35:59,279 --> 00:36:01,840 of liquids and things like that, more than just objects. 950 00:36:01,840 --> 00:36:03,692 So the way that you would do that is, even 951 00:36:03,692 --> 00:36:05,900 in a simple world like this, you would basically say, 952 00:36:05,900 --> 00:36:10,070 fine, put this thing somewhere, randomly uniform, 953 00:36:10,070 --> 00:36:12,350 conditioned on getting as much water 954 00:36:12,350 --> 00:36:14,090 into this thing as possible. 955 00:36:14,090 --> 00:36:18,230 And then try to figure out where this block should go. 956 00:36:18,230 --> 00:36:19,770 So you start out uniform, condition 957 00:36:19,770 --> 00:36:20,780 on as much water as possible. 958 00:36:20,780 --> 00:36:22,363 You'll get some posterior distribution 959 00:36:22,363 --> 00:36:24,140 of where to place this block. 960 00:36:24,140 --> 00:36:26,615 And just to show you what that looks like, let's try to-- 961 00:36:35,744 --> 00:36:38,355 so suppose that we actually tried to run this. 962 00:36:41,560 --> 00:36:43,390 So in this case, you don't want the block 963 00:36:43,390 --> 00:36:45,010 to go there, for example, right? 964 00:36:45,010 --> 00:36:47,301 Because most of the water is going to slosh over there. 965 00:36:47,301 --> 00:36:48,630 What if we put it over there? 966 00:36:48,630 --> 00:36:50,850 It's a little bit better. 967 00:36:50,850 --> 00:36:52,117 That's not so great. 968 00:36:52,117 --> 00:36:53,950 You could run it many, many different times. 969 00:36:53,950 --> 00:36:55,400 At this point, I hope most of you, 970 00:36:55,400 --> 00:36:57,710 even if you don't quite know what you're doing, 971 00:36:57,710 --> 00:37:01,630 you can see how you would go about writing the program 972 00:37:01,630 --> 00:37:02,860 to figure this out. 
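The placement question described above (start from a uniform prior over where the block goes, condition on most of the water landing in the cup, then look at the posterior) can be sketched in a few lines. This is a minimal Python sketch, not the WebPPL or Church code from the demo. The function `simulate_fraction_in_cup`, the "best" position 0.6, the noise level, and the 0.7 threshold are all made-up stand-ins for a real liquid simulator, and the conditioning is done with plain rejection sampling.

```python
import random


def simulate_fraction_in_cup(block_x, rng):
    """Hypothetical stand-in for the liquid physics engine (NOT real physics).

    Assumes, purely for illustration, that the block deflects the most water
    into the cup when it sits at x = 0.6, with some simulation noise.
    """
    base = max(0.0, 1.0 - 4.0 * abs(block_x - 0.6))
    noise = rng.gauss(0.0, 0.05)
    return min(1.0, max(0.0, base + noise))


def sample_block_position(rng, threshold=0.7):
    """Rejection sampling: draw from the uniform prior, keep only draws
    where at least `threshold` of the water ends up in the cup."""
    while True:
        block_x = rng.uniform(0.0, 1.0)        # uniform prior over placement
        if simulate_fraction_in_cup(block_x, rng) >= threshold:
            return block_x                      # condition satisfied


rng = random.Random(0)
posterior = [sample_block_position(rng) for _ in range(500)]
mean_x = sum(posterior) / len(posterior)
print(f"posterior mean block position: {mean_x:.2f}")
```

The only modelling decisions here are the prior and the condition. Everything else is "run the simulator forward and throw away the runs that do not match", which is the same pattern as the queries earlier in the lecture.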
973 00:37:02,860 --> 00:37:05,740 You would write down the physics world, assume most of that 974 00:37:05,740 --> 00:37:07,100 is taken care of for you. 975 00:37:07,100 --> 00:37:08,600 All you need to do is sort of figure 976 00:37:08,600 --> 00:37:12,130 out where to place this block, put some uniform distribution 977 00:37:12,130 --> 00:37:16,320 in this area, condition on most of the water landing here, 978 00:37:16,320 --> 00:37:18,820 and then just sample, and figure out where this thing should 979 00:37:18,820 --> 00:37:20,155 be in the world. 980 00:37:20,155 --> 00:37:21,990 And that's pretty cool. 981 00:37:21,990 --> 00:37:24,240 So here's another example. 982 00:37:24,240 --> 00:37:26,220 That was intuitive physics. 983 00:37:26,220 --> 00:37:29,094 Let's move on to intuitive psychology, 984 00:37:29,094 --> 00:37:31,635 which Josh was sort of getting at at the end of his lecture. 985 00:37:35,150 --> 00:37:38,710 And here's a very, very simple question, 986 00:37:38,710 --> 00:37:44,437 which is something like, suppose that you see an agent-- 987 00:37:44,437 --> 00:37:45,520 this guy with googly eyes. 988 00:37:45,520 --> 00:37:47,509 Can people see him from way down there? 989 00:37:47,509 --> 00:37:48,800 There's a guy with googly eyes. 990 00:37:48,800 --> 00:37:49,800 It doesn't matter. 991 00:37:49,800 --> 00:37:51,530 It's me. 992 00:37:51,530 --> 00:37:52,614 I'm right here. 993 00:37:52,614 --> 00:37:53,780 There's a banana over there. 994 00:37:53,780 --> 00:37:56,245 There's an apple over here. 995 00:37:56,245 --> 00:37:58,370 So there's the banana over there, apple over there, 996 00:37:58,370 --> 00:38:00,860 and I start walking over here. 997 00:38:00,860 --> 00:38:04,040 And now, someone asks you, why did Tomer go over there? 998 00:38:04,040 --> 00:38:06,307 And you say, well, I guess he wanted the banana. 999 00:38:06,307 --> 00:38:08,640 And they say, oh, but you don't have access to his goals.
1000 00:38:08,640 --> 00:38:10,100 How do you know he wanted the banana? 1001 00:38:10,100 --> 00:38:12,141 And you say, well, because he went to the banana. 1002 00:38:12,141 --> 00:38:14,154 Well, you're just being circular. 1003 00:38:14,154 --> 00:38:16,070 How would you actually solve a task like this? 1004 00:38:16,070 --> 00:38:16,945 It's sort of trivial. 1005 00:38:16,945 --> 00:38:19,070 And you could solve it through something like cues. 1006 00:38:19,070 --> 00:38:20,630 You could say something like, well, 1007 00:38:20,630 --> 00:38:22,502 the thing that you approach is your goal. 1008 00:38:22,502 --> 00:38:23,960 But another thing that you could do 1009 00:38:23,960 --> 00:38:26,126 is, you could say, well, I assume that the way Tomer 1010 00:38:26,126 --> 00:38:28,310 works is that he has goals. 1011 00:38:28,310 --> 00:38:29,750 I don't know what his goals are. 1012 00:38:29,750 --> 00:38:30,890 But I assume he has goals. 1013 00:38:30,890 --> 00:38:34,940 And I assume that he can plan to reach those goals in some sort 1014 00:38:34,940 --> 00:38:38,030 of semi-efficient manner. 1015 00:38:38,030 --> 00:38:39,830 And he has some beliefs about the world. 1016 00:38:39,830 --> 00:38:42,410 And he's going to carry out some sort of planning procedure 1017 00:38:42,410 --> 00:38:44,220 in order to get to his goals. 1018 00:38:44,220 --> 00:38:47,780 And if Tomer wanted the banana, the action he should take 1019 00:38:47,780 --> 00:38:48,880 is to do this. 1020 00:38:48,880 --> 00:38:51,470 It would be very unlikely for him to do this. 1021 00:38:51,470 --> 00:38:54,870 If he wanted the apple, he would do that, not that. 1022 00:38:54,870 --> 00:38:56,870 So you could sort of use this planning procedure 1023 00:38:56,870 --> 00:39:00,611 to set the knobs on your procedure. 1024 00:39:00,611 --> 00:39:02,110 Think of it like a generative model. 
1025 00:39:02,110 --> 00:39:03,526 Your generative model is something 1026 00:39:03,526 --> 00:39:07,040 that goes from goals or utilities and beliefs 1027 00:39:07,040 --> 00:39:08,839 to something like actions. 1028 00:39:08,839 --> 00:39:10,130 And you would define the goals. 1029 00:39:10,130 --> 00:39:11,921 Let's say you don't know what my goals are, 1030 00:39:11,921 --> 00:39:14,882 so you place some distribution over them. 1031 00:39:14,882 --> 00:39:16,340 And then you get to see my actions. 1032 00:39:16,340 --> 00:39:18,200 And you basically try to say, well, 1033 00:39:18,200 --> 00:39:22,310 in program space, what would be the setting of the goals 1034 00:39:22,310 --> 00:39:25,340 of Tomer, such that it would have produced the observed 1035 00:39:25,340 --> 00:39:25,905 actions? 1036 00:39:25,905 --> 00:39:27,530 And if you write down a model for that, 1037 00:39:27,530 --> 00:39:30,630 you'll find that if I set the goal for Tomer as banana, 1038 00:39:30,630 --> 00:39:32,510 I'll get the observed action, which is, 1039 00:39:32,510 --> 00:39:35,030 he walked towards the banana. 1040 00:39:35,030 --> 00:39:38,500 Similarly, for belief, if you know something like, 1041 00:39:38,500 --> 00:39:40,604 you know there are two boxes here. 1042 00:39:40,604 --> 00:39:42,020 You don't know what's inside them. 1043 00:39:42,020 --> 00:39:44,390 You know it's either a banana or an apple. 1044 00:39:44,390 --> 00:39:47,360 And you know that I really love bananas and I hate apples. 1045 00:39:47,360 --> 00:39:49,520 And you see me walking towards this box. 1046 00:39:49,520 --> 00:39:51,686 You can infer that, ah, he probably 1047 00:39:51,686 --> 00:39:53,060 thought there was a banana inside, 1048 00:39:53,060 --> 00:39:54,601 or he knew there was a banana inside.
1049 00:39:54,601 --> 00:39:57,245 And again, if you had some sort of planning procedure, 1050 00:39:57,245 --> 00:39:59,120 you would say, OK, it would make sense for me 1051 00:39:59,120 --> 00:40:02,090 to set his belief to be banana, because the outcome of that, 1052 00:40:02,090 --> 00:40:04,310 if I run the model forward with those settings, 1053 00:40:04,310 --> 00:40:06,930 would be for him to walk in that direction. 1054 00:40:06,930 --> 00:40:08,810 Now let me show you just one example 1055 00:40:08,810 --> 00:40:13,470 of what that sort of model would look like. 1056 00:40:13,470 --> 00:40:15,704 So this would be under intuitive psychology. 1057 00:40:15,704 --> 00:40:17,870 Those of you who are interested in sort of inference 1058 00:40:17,870 --> 00:40:21,042 over inference and, how do agents reason 1059 00:40:21,042 --> 00:40:23,000 about other agents, or goal inference or things 1060 00:40:23,000 --> 00:40:28,050 like that, you might want to take a look at this section. 1061 00:40:28,050 --> 00:40:29,940 And this is sort of super simple. 1062 00:40:29,940 --> 00:40:34,410 There's no probabilities, exactly, in the sense of, 1063 00:40:34,410 --> 00:40:37,020 it's going to be either this goal or that goal. 1064 00:40:37,020 --> 00:40:40,524 It's obviously something that can be modified if you want to. 1065 00:40:40,524 --> 00:40:42,690 Does everyone more or less see what's going on here? 1066 00:40:42,690 --> 00:40:43,939 Let me make that a bit bigger. 1067 00:40:46,810 --> 00:40:50,560 What I tried to write down is a model in which someone went 1068 00:40:50,560 --> 00:40:55,690 for an apple, and you're trying to figure out, 1069 00:40:55,690 --> 00:40:59,310 why did he go for that? 1070 00:40:59,310 --> 00:40:59,940 Yes. 1071 00:40:59,940 --> 00:41:00,440 Sorry. 1072 00:41:00,440 --> 00:41:01,810 It really should be over here. 1073 00:41:05,630 --> 00:41:09,480 Now, let me start out, actually, with something like planning. 
1074 00:41:09,480 --> 00:41:12,380 Let's write down the forward model before we do inference. 1075 00:41:12,380 --> 00:41:14,050 Before we do inference, let's write down 1076 00:41:14,050 --> 00:41:15,822 how we think the world works. 1077 00:41:15,822 --> 00:41:17,780 The thing that I said-- the way the world works 1078 00:41:17,780 --> 00:41:21,140 is that Tomer has some goals and some beliefs. 1079 00:41:21,140 --> 00:41:23,030 And given his goals, he'll take some action 1080 00:41:23,030 --> 00:41:24,346 to achieve his goals. 1081 00:41:24,346 --> 00:41:25,470 Let's write down that part. 1082 00:41:25,470 --> 00:41:26,780 That's the forward part. 1083 00:41:26,780 --> 00:41:28,363 If we can write down the forward part, 1084 00:41:28,363 --> 00:41:30,020 the inference part comes for free. 1085 00:41:30,020 --> 00:41:32,870 We just put that in an mh-query and say, 1086 00:41:32,870 --> 00:41:35,640 what's the goal that made the observed thing happen? 1087 00:41:35,640 --> 00:41:40,900 So what we would do is, we would write down something like, 1088 00:41:40,900 --> 00:41:42,512 what action should Tomer take? 1089 00:41:42,512 --> 00:41:43,220 Choose an action. 1090 00:41:43,220 --> 00:41:44,390 It's a procedure. 1091 00:41:44,390 --> 00:41:47,012 It's a procedure that takes on a particular goal. 1092 00:41:47,012 --> 00:41:49,220 Here, it's a particular condition that I can satisfy. 1093 00:41:49,220 --> 00:41:50,303 But it could be a utility. 1094 00:41:50,303 --> 00:41:51,440 It could be anything. 1095 00:41:51,440 --> 00:41:53,420 But it's, what action should I take, 1096 00:41:53,420 --> 00:41:56,760 given a particular goal, given how I think the world works? 1097 00:41:56,760 --> 00:41:58,340 That's a transition function. 1098 00:41:58,340 --> 00:42:00,080 If I take this action, what will happen? 1099 00:42:00,080 --> 00:42:04,430 I need to know that if I go left, from your perspective, 1100 00:42:04,430 --> 00:42:05,780 I'll get to the banana. 
1101 00:42:05,780 --> 00:42:09,060 That's the transition function for the world. 1102 00:42:09,060 --> 00:42:11,300 And I need some initial state. 1103 00:42:11,300 --> 00:42:14,720 And now, I sample some action. 1104 00:42:14,720 --> 00:42:16,340 I do an action at random. 1105 00:42:16,340 --> 00:42:18,230 I either go left or go right. 1106 00:42:18,230 --> 00:42:20,090 I sample it from some action prior. 1107 00:42:20,090 --> 00:42:22,260 Suppose it's completely uniform. 1108 00:42:22,260 --> 00:42:23,270 Define action. 1109 00:42:23,270 --> 00:42:25,220 It's simple action prior. 1110 00:42:25,220 --> 00:42:27,252 And suppose my prior is go left, go right, 1111 00:42:27,252 --> 00:42:28,210 with equal probability. 1112 00:42:28,210 --> 00:42:29,740 Everyone with me so far? 1113 00:42:29,740 --> 00:42:32,240 We're trying to get a procedure that will give us an action. 1114 00:42:32,240 --> 00:42:33,710 What action should I take? 1115 00:42:33,710 --> 00:42:34,940 Imagine you took an action. 1116 00:42:34,940 --> 00:42:36,450 It doesn't matter which one. 1117 00:42:36,450 --> 00:42:41,660 Now, what action did you end up with conditioned 1118 00:42:41,660 --> 00:42:44,720 on that action getting you to your goal? 1119 00:42:44,720 --> 00:42:47,390 This is a rejection query. 1120 00:42:47,390 --> 00:42:50,720 I'm trying to sample an action conditioned 1121 00:42:50,720 --> 00:42:52,860 on that action getting me to my goal. 1122 00:42:52,860 --> 00:42:54,290 So let's say I sample the action. 1123 00:42:54,290 --> 00:42:57,340 I hypothesize that I go that way. 1124 00:42:57,340 --> 00:42:59,100 Did I satisfy my goal? 1125 00:42:59,100 --> 00:42:59,600 No. 1126 00:42:59,600 --> 00:43:00,980 I ended up with the apple. 1127 00:43:00,980 --> 00:43:01,860 Do it again. 1128 00:43:01,860 --> 00:43:02,500 Run it again. 1129 00:43:02,500 --> 00:43:03,480 Run it again. 1130 00:43:03,480 --> 00:43:04,940 Now I sample the action "go here." 
1131 00:43:04,940 --> 00:43:06,230 Did I end up with a banana? 1132 00:43:06,230 --> 00:43:06,740 Yes. 1133 00:43:06,740 --> 00:43:07,280 OK. 1134 00:43:07,280 --> 00:43:08,363 So what action did I take? 1135 00:43:08,363 --> 00:43:10,240 I went, from your perspective, left. 1136 00:43:10,240 --> 00:43:11,750 Return that. 1137 00:43:11,750 --> 00:43:14,740 We've just written down a procedure for planning. 1138 00:43:14,740 --> 00:43:16,600 And it can be made much more complex 1139 00:43:16,600 --> 00:43:17,920 than that in a few short steps. 1140 00:43:17,920 --> 00:43:20,336 By complex, I don't mean that it's hard for you to follow. 1141 00:43:20,336 --> 00:43:22,270 I mean that it can take in multiple worlds, 1142 00:43:22,270 --> 00:43:24,340 multiple steps, utilities, probabilities, 1143 00:43:24,340 --> 00:43:25,407 things like that. 1144 00:43:25,407 --> 00:43:27,490 And it will spit out a sequence of actions for you 1145 00:43:27,490 --> 00:43:31,790 to go from x to y to get to your goal. 1146 00:43:31,790 --> 00:43:35,712 And it's written as planning as inference. 1147 00:43:35,712 --> 00:43:38,170 Now, there are many different types of planning procedures. 1148 00:43:38,170 --> 00:43:40,900 You could write down Markov decision processes. 1149 00:43:40,900 --> 00:43:42,610 You could write down rapidly exploring random trees. 1150 00:43:42,610 --> 00:43:44,860 I'm just throwing out names there for those of you who 1151 00:43:44,860 --> 00:43:46,151 are interested in these things. 1152 00:43:46,151 --> 00:43:47,966 There's lots of ways of doing planning. 1153 00:43:47,966 --> 00:43:49,840 This is one particular way of doing planning. 1154 00:43:49,840 --> 00:43:51,340 You could have done it many different ways. 1155 00:43:51,340 --> 00:43:53,410 But the point is that we assume you can even sort of wrap this 1156 00:43:53,410 --> 00:43:54,580 up in something.
1157 00:43:54,580 --> 00:43:56,740 Like I, as the observer-- 1158 00:43:56,740 --> 00:43:59,902 I as you-- don't need to know exactly how Tomer works. 1159 00:43:59,902 --> 00:44:02,110 I just need to know that there is some procedure such 1160 00:44:02,110 --> 00:44:05,230 that if I put into it a goal, and somehow, 1161 00:44:05,230 --> 00:44:09,220 how the world works, it will spit out some rational action. 1162 00:44:09,220 --> 00:44:11,720 It's preferable that I have some idea of how it works. 1163 00:44:11,720 --> 00:44:12,820 It doesn't need to be the right one. 1164 00:44:12,820 --> 00:44:14,860 It doesn't need to be the one that I actually use. 1165 00:44:14,860 --> 00:44:16,484 But you need to have some sort of sense 1166 00:44:16,484 --> 00:44:17,920 that I am planning somehow. 1167 00:44:17,920 --> 00:44:20,400 This is one way to plan. 1168 00:44:20,400 --> 00:44:21,940 And now, this is just showing you. 1169 00:44:21,940 --> 00:44:25,110 I put some uniform prior on the action prior. 1170 00:44:25,110 --> 00:44:26,560 I either go left or right. 1171 00:44:26,560 --> 00:44:28,060 The transition function of the world 1172 00:44:28,060 --> 00:44:29,620 is such that if you go left, you get an apple. 1173 00:44:29,620 --> 00:44:31,280 If you go right, you get a banana. 1174 00:44:31,280 --> 00:44:33,250 So that's from my perspective, I guess. 1175 00:44:33,250 --> 00:44:35,050 If you do anything else, you get nothing. 1176 00:44:35,050 --> 00:44:37,750 And then you sort of just say, my goal 1177 00:44:37,750 --> 00:44:39,790 is, did I get to the apple, let's say, 1178 00:44:39,790 --> 00:44:43,090 or did I get to the banana? 1179 00:44:43,090 --> 00:44:45,730 I put that into the choose action 1180 00:44:45,730 --> 00:44:49,114 and I'll end up going left. 1181 00:44:49,114 --> 00:44:50,780 Because my goal was to get to the apple. 1182 00:44:50,780 --> 00:44:51,950 The apple was on the left.
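The planning procedure walked through above (sample an action from the prior, run the transition function, keep the action only if it satisfies the goal) can be sketched outside of Church as well. This is a minimal Python sketch of that idea, not the code shown on screen. The names `action_prior`, `transition`, `choose_action`, and `infer_goal` are illustrative, and the goal-inference wrapper assumes a uniform prior over two possible goals.

```python
import random
from collections import Counter

ACTIONS = ["left", "right"]


def action_prior(rng):
    """Uniform prior over actions: go left or go right."""
    return rng.choice(ACTIONS)


def transition(state, action):
    """How the world works: left leads to the apple, right to the banana."""
    if action == "left":
        return "apple"
    if action == "right":
        return "banana"
    return "nothing"


def choose_action(goal_satisfied, transition_fn, state, rng):
    """Planning as inference via rejection sampling: sample an action,
    and return it only if the resulting outcome satisfies the goal."""
    while True:
        action = action_prior(rng)
        if goal_satisfied(transition_fn(state, action)):
            return action


def infer_goal(observed_action, rng, n_samples=1000):
    """Goal inference: sample a goal from a uniform prior, plan with it,
    and keep the goal only when the plan matches the observed action."""
    counts = Counter()
    for _ in range(n_samples):
        goal_food = rng.choice(["apple", "banana"])
        action = choose_action(
            lambda outcome: outcome == goal_food, transition, "start", rng
        )
        if action == observed_action:
            counts[goal_food] += 1
    return counts


rng = random.Random(0)
print(choose_action(lambda outcome: outcome == "apple", transition, "start", rng))
print(infer_goal("left", rng))
```

With a deterministic world like this one, the planner always goes left for the apple, so every goal sample that survives the observation "went left" is the apple goal. Adding noise to the transition function or to the planner would turn that into a graded posterior.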
1183 00:44:51,950 --> 00:44:54,325 I'm going to choose an action in order to go to the left. 1184 00:44:54,325 --> 00:44:57,710 This whole thing is just to show you that, in fact, this works. 1185 00:44:57,710 --> 00:45:00,740 If you sample it forward, it will give you the right action. 1186 00:45:00,740 --> 00:45:03,200 You can now wrap up this whole thing in something 1187 00:45:03,200 --> 00:45:06,620 that does goal inference, that doesn't know that my goal was 1188 00:45:06,620 --> 00:45:10,897 this, that puts a uniform prior on this thing, 1189 00:45:10,897 --> 00:45:12,980 and then runs forward many, many different samples 1190 00:45:12,980 --> 00:45:15,590 and comes to the conclusion that it must have been the apple, 1191 00:45:15,590 --> 00:45:18,080 because he went left. 1192 00:45:18,080 --> 00:45:21,840 And again, this example is fully written out for you over here, 1193 00:45:21,840 --> 00:45:23,390 as well as the belief inference. 1194 00:45:23,390 --> 00:45:25,790 This is an example of implicature. 1195 00:45:25,790 --> 00:45:28,690 How many of you know what implicature means, 1196 00:45:28,690 --> 00:45:32,060 like Gricean implicature? 1197 00:45:32,060 --> 00:45:36,272 It's the sort of thing where someone tells me, hey, 1198 00:45:36,272 --> 00:45:37,730 are you going to the party tonight? 1199 00:45:37,730 --> 00:45:41,060 And I say, I'm washing my hair. 1200 00:45:41,060 --> 00:45:44,330 Or you say something like, how good of a lecturer was John? 1201 00:45:44,330 --> 00:45:46,940 And I say, well, he was speaking English. 1202 00:45:46,940 --> 00:45:50,090 I'm not exactly telling you he was a bad lecturer. 1203 00:45:50,090 --> 00:45:51,390 But if he was a good one, 1204 00:45:51,390 --> 00:45:52,970 I would say it. 1205 00:45:52,970 --> 00:45:54,396 The fact that I didn't say-- 1206 00:45:54,396 --> 00:45:56,270 the fact that I chose to say something else-- 1207 00:45:56,270 --> 00:45:59,430 implies that he probably wasn't a good lecturer.
1208 00:45:59,430 --> 00:46:00,770 And this sort of happens a lot. 1209 00:46:00,770 --> 00:46:04,610 And the way that it works is that-- and this happens a lot in language 1210 00:46:04,610 --> 00:46:08,050 games, social games, reasoning about reasoning-- 1211 00:46:08,050 --> 00:46:09,320 I'm the speaker. 1212 00:46:09,320 --> 00:46:10,370 You're the listener. 1213 00:46:10,370 --> 00:46:11,840 I have some model of you. 1214 00:46:11,840 --> 00:46:13,410 You have some model of me. 1215 00:46:13,410 --> 00:46:16,010 I know that you know that I would have 1216 00:46:16,010 --> 00:46:18,230 said he was a good lecturer. 1217 00:46:18,230 --> 00:46:19,610 I know that you know that. 1218 00:46:19,610 --> 00:46:21,350 If I wanted to, I would have said that. 1219 00:46:21,350 --> 00:46:22,580 And I'm not. 1220 00:46:22,580 --> 00:46:25,730 So I know that you know that I know that. 1221 00:46:25,730 --> 00:46:27,490 And it works out such that you realize 1222 00:46:27,490 --> 00:46:28,740 that he's not a good lecturer. 1223 00:46:28,740 --> 00:46:29,990 And that sounds sort of convoluted. 1224 00:46:29,990 --> 00:46:31,281 But it's actually not that bad. 1225 00:46:31,281 --> 00:46:33,530 And I want to show you an example of how that works. 1226 00:46:33,530 --> 00:46:35,029 And this particular example is based 1227 00:46:35,029 --> 00:46:37,410 on the "some not all" example. 1228 00:46:37,410 --> 00:46:43,280 So this is the sort of thing like, I'm a TA in a class 1229 00:46:43,280 --> 00:46:47,980 and someone asks me, how did the students do on the test? 1230 00:46:47,980 --> 00:46:50,600 And I say, some of the students passed the test. 1231 00:46:50,600 --> 00:46:53,840 Do you think I mean, all the students passed the test? 1232 00:46:53,840 --> 00:46:54,500 No. 1233 00:46:54,500 --> 00:46:55,220 Now, why not? 1234 00:46:55,220 --> 00:46:57,890 Because some, in a sense, also means all.
1235 00:46:57,890 --> 00:46:59,810 All the students is true if some is something 1236 00:46:59,810 --> 00:47:03,364 like a logical thing that means greater than five 1237 00:47:03,364 --> 00:47:05,780 or greater than zero or greater than one-- whatever it is. 1238 00:47:05,780 --> 00:47:07,250 It can include all. 1239 00:47:07,250 --> 00:47:10,492 But if it was all, I would have said all. 1240 00:47:10,492 --> 00:47:12,200 And if it was one, I would have said one. 1241 00:47:12,200 --> 00:47:15,070 So people are likely to infer from me saying some, 1242 00:47:15,070 --> 00:47:16,216 that I probably mean-- 1243 00:47:16,216 --> 00:47:17,840 if there's 100 students and I say some, 1244 00:47:17,840 --> 00:47:20,940 they can give you a distribution over what I mean by that. 1245 00:47:20,940 --> 00:47:23,150 And that distribution depends on the alternatives-- 1246 00:47:23,150 --> 00:47:25,400 the alternative words I could have used, 1247 00:47:25,400 --> 00:47:26,960 which they know I didn't. 1248 00:47:26,960 --> 00:47:30,060 But they know I could have. 1249 00:47:30,060 --> 00:47:32,250 There's a nice example of scalar implicature 1250 00:47:32,250 --> 00:47:33,757 and how it would work in probmods. 1251 00:47:33,757 --> 00:47:36,090 What I want to show you is a slightly different example, 1252 00:47:36,090 --> 00:47:40,010 which is, again, the London bombing example. 1253 00:47:40,010 --> 00:47:42,490 But the way it would work is something like this. 1254 00:47:42,490 --> 00:47:46,950 So here's the background for implicature. 1255 00:47:46,950 --> 00:47:48,350 I came up with this yesterday. 1256 00:47:48,350 --> 00:47:49,650 I'm not quite sure it'll work. 1257 00:47:49,650 --> 00:47:51,730 But we'll see. 1258 00:47:51,730 --> 00:47:54,820 Imagine that things work like this. 1259 00:47:54,820 --> 00:47:56,680 The city of London is being bombed. 1260 00:47:56,680 --> 00:47:59,332 Again, sorry for the slightly dire things.
1261 00:47:59,332 --> 00:48:00,790 The city of London is being bombed, 1262 00:48:00,790 --> 00:48:04,990 and there are three places it could be bombed. 1263 00:48:04,990 --> 00:48:06,550 Again, it's this uniform square. 1264 00:48:06,550 --> 00:48:10,330 It could be bombed in the blue part-- 1265 00:48:10,330 --> 00:48:11,927 anywhere here. 1266 00:48:11,927 --> 00:48:13,510 If it's bombed there in the blue part, 1267 00:48:13,510 --> 00:48:16,517 I would say it was bombed outside London. 1268 00:48:16,517 --> 00:48:17,600 That means outside London. 1269 00:48:17,600 --> 00:48:19,000 It doesn't include these things. 1270 00:48:19,000 --> 00:48:20,470 It's just outside London. 1271 00:48:20,470 --> 00:48:22,387 If it was bombed anywhere in this red square-- 1272 00:48:22,387 --> 00:48:24,345 so imagine that this square is something like-- 1273 00:48:24,345 --> 00:48:27,250 I don't know-- zero to two, and over here, it's zero to one. 1274 00:48:27,250 --> 00:48:29,380 Anywhere in the red square is called London. 1275 00:48:29,380 --> 00:48:31,900 If a bomb fell there, I would say, a bomb during the blitz 1276 00:48:31,900 --> 00:48:33,176 dropped on London. 1277 00:48:33,176 --> 00:48:35,050 If someone asked me, where did the bomb fall, 1278 00:48:35,050 --> 00:48:36,220 I would say, in London. 1279 00:48:36,220 --> 00:48:38,890 But that includes this whole thing. 1280 00:48:38,890 --> 00:48:41,320 It also includes Big Ben. 1281 00:48:41,320 --> 00:48:44,234 Big Ben is in London. 1282 00:48:44,234 --> 00:48:46,150 Maybe some of you can see what I'm getting at. 1283 00:48:46,150 --> 00:48:48,525 So if it fell on Big Ben, I could say it fell on Big Ben. 1284 00:48:48,525 --> 00:48:50,775 I can say it fell in London, because that's also true. 1285 00:48:50,775 --> 00:48:52,733 If it fell here, I would say it fell in London. 1286 00:48:52,733 --> 00:48:54,910 If it fell here, I would say it fell outside London. 
1287 00:48:54,910 --> 00:48:57,520 Now, there's a general, and a staff sergeant walks up to him, 1288 00:48:57,520 --> 00:48:59,764 and he says, a bomb fell on London during the blitz. 1289 00:48:59,764 --> 00:49:01,430 And the general says, where did it fall? 1290 00:49:01,430 --> 00:49:02,730 And he says, it fell in London. 1291 00:49:02,730 --> 00:49:03,910 And the general says, OK. 1292 00:49:03,910 --> 00:49:05,170 Then he looks outside his window and he says, 1293 00:49:05,170 --> 00:49:06,440 good god, it hit Big Ben! 1294 00:49:06,440 --> 00:49:08,850 And he says, yes, I said it fell in London. 1295 00:49:08,850 --> 00:49:10,120 That's very weird. 1296 00:49:10,120 --> 00:49:11,600 We don't expect people to do that. 1297 00:49:11,600 --> 00:49:13,970 And Gricean implicature says we shouldn't do that. 1298 00:49:13,970 --> 00:49:15,400 But Grice said it in a way that's 1299 00:49:15,400 --> 00:49:17,290 like, you should give the maximal amount 1300 00:49:17,290 --> 00:49:20,600 of helpful information and not hold out other things. 1301 00:49:20,600 --> 00:49:24,290 How would that fall out of a particular model? 1302 00:49:24,290 --> 00:49:26,800 Well, the way it would fall out is something like this. 1303 00:49:26,800 --> 00:49:28,591 And again, I'll show you the code for that. 1304 00:49:28,591 --> 00:49:30,424 It's something like, there's the speaker. 1305 00:49:30,424 --> 00:49:31,840 The speaker is the staff sergeant. 1306 00:49:31,840 --> 00:49:33,520 He could choose one of three words-- 1307 00:49:33,520 --> 00:49:36,850 outside London, in London, dropped on Big Ben. 1308 00:49:36,850 --> 00:49:39,610 Let's say Ben, London, outside-- 1309 00:49:39,610 --> 00:49:40,540 something like that. 1310 00:49:40,540 --> 00:49:41,920 He could choose one of three. 1311 00:49:41,920 --> 00:49:47,500 Now, the fact that he decided to say London 1312 00:49:47,500 --> 00:49:48,930 could include Big Ben. 
1313 00:49:48,930 --> 00:49:52,930 But if it was Ben, he could have also said Ben. 1314 00:49:52,930 --> 00:49:56,600 He didn't, which implies that it probably fell over here. 1315 00:49:56,600 --> 00:49:59,590 So the last model that I wanted to show you was exactly that. 1316 00:49:59,590 --> 00:50:03,190 You sort of say, listen, there's some prior. 1317 00:50:03,190 --> 00:50:04,879 The bombs sort of fall anywhere. 1318 00:50:04,879 --> 00:50:06,170 And there's some distance here. 1319 00:50:06,170 --> 00:50:10,187 And let's say you start out with just random gibberish. 1320 00:50:10,187 --> 00:50:12,020 You can say either Ben or outside or inside. 1321 00:50:12,020 --> 00:50:13,180 It doesn't matter. 1322 00:50:13,180 --> 00:50:15,760 Regardless of what the world actually was, 1323 00:50:15,760 --> 00:50:18,640 this is just defining what each one of these words means. 1324 00:50:18,640 --> 00:50:22,030 To hit London means to be inside that small square I described. 1325 00:50:22,030 --> 00:50:23,800 To hit Ben means to be inside that smaller 1326 00:50:23,800 --> 00:50:25,330 square inside London. 1327 00:50:25,330 --> 00:50:27,894 To hit outside means to hit outside. 1328 00:50:27,894 --> 00:50:28,810 That's what they mean. 1329 00:50:28,810 --> 00:50:32,240 It just gives you back a true or false on a particular point 1330 00:50:32,240 --> 00:50:33,580 in a two-dimensional space. 1331 00:50:33,580 --> 00:50:36,550 Is it true or is it false that this happened? 1332 00:50:36,550 --> 00:50:38,974 Now, you have a speaker and a listener model. 1333 00:50:38,974 --> 00:50:40,390 And the way that the speaker works 1334 00:50:40,390 --> 00:50:42,160 is that he has a particular state in mind. 1335 00:50:42,160 --> 00:50:44,034 Like, he's looking at the state of the world. 1336 00:50:44,034 --> 00:50:45,100 The bomb fell here. 1337 00:50:45,100 --> 00:50:47,380 And he needs to communicate something.
1338 00:50:47,380 --> 00:50:51,442 And he's reasoning about the listener to a particular depth. 1339 00:50:51,442 --> 00:50:52,900 What he's going to do is he's going 1340 00:50:52,900 --> 00:50:56,477 to choose a word randomly, because our prior is random. 1341 00:50:56,477 --> 00:50:58,560 Kind of like before, we chose an action at random, 1342 00:50:58,560 --> 00:51:01,517 and we saw if it worked, he's going to use a word at random. 1343 00:51:01,517 --> 00:51:03,100 And he's going to choose the word such 1344 00:51:03,100 --> 00:51:07,010 that it's going to cause the right state in the listener. 1345 00:51:07,010 --> 00:51:09,080 So he needs a model of the listener. 1346 00:51:09,080 --> 00:51:10,380 What's the listener? 1347 00:51:10,380 --> 00:51:12,180 The listener is someone who takes in a word 1348 00:51:12,180 --> 00:51:13,405 and tries to figure out the state. 1349 00:51:13,405 --> 00:51:14,340 He doesn't know what happened. 1350 00:51:14,340 --> 00:51:15,200 Where did the bomb fall? 1351 00:51:15,200 --> 00:51:16,260 Someone gives him a word. 1352 00:51:16,260 --> 00:51:18,175 And he's trying to figure out the state. 1353 00:51:18,175 --> 00:51:19,800 So he's drawing a state from the prior. 1354 00:51:19,800 --> 00:51:21,250 And this prior is, it could be anywhere. 1355 00:51:21,250 --> 00:51:22,830 It could be anywhere in the square. 1356 00:51:22,830 --> 00:51:24,420 Where did it actually fall? 1357 00:51:24,420 --> 00:51:27,690 Well, it fell here, let's say, given that I 1358 00:51:27,690 --> 00:51:29,790 got this particular word. 1359 00:51:29,790 --> 00:51:31,920 But this word was generated by a speaker, 1360 00:51:31,920 --> 00:51:32,987 which I need a model of. 1361 00:51:32,987 --> 00:51:35,070 So there's this model of the speaker understanding 1362 00:51:35,070 --> 00:51:36,778 the listener, and a model of the listener 1363 00:51:36,778 --> 00:51:38,930 understanding the speaker up to a particular depth. 
1364 00:51:38,930 --> 00:51:41,310 And they need to bottom out at some point. 1365 00:51:41,310 --> 00:51:42,419 And it's not that hard. 1366 00:51:42,419 --> 00:51:44,710 I mean, it takes some time to wrap your head around it. 1367 00:51:44,710 --> 00:51:46,660 But it's written in about eight lines of code. 1368 00:51:46,660 --> 00:51:48,118 And that's why I said that Church-- 1369 00:51:48,118 --> 00:51:51,122 remember that caveat I gave you of, it's not a toy language. 1370 00:51:51,122 --> 00:51:52,080 It's under development. 1371 00:51:52,080 --> 00:51:54,604 But it's actually been doing some pretty interesting stuff. 1372 00:51:54,604 --> 00:51:55,770 This is one of those things. 1373 00:51:55,770 --> 00:51:57,510 These sorts of language games are really 1374 00:51:57,510 --> 00:51:59,575 hard to write in many other models-- 1375 00:51:59,575 --> 00:52:00,450 really hard to write. 1376 00:52:00,450 --> 00:52:01,380 And here, it's kind of trivial. 1377 00:52:01,380 --> 00:52:02,530 You can sort of see where to play with it. 1378 00:52:02,530 --> 00:52:04,440 I came up with this example yesterday. 1379 00:52:04,440 --> 00:52:07,554 And I asked Andreas, who is my go-to guy for these things, 1380 00:52:07,554 --> 00:52:08,970 and he thought it was interesting. 1381 00:52:08,970 --> 00:52:11,760 So you would run that. 1382 00:52:11,760 --> 00:52:15,720 And let's say that the speaker said it hit London. 1383 00:52:15,720 --> 00:52:17,220 What should the listener understand? 1384 00:52:17,220 --> 00:52:19,277 Where is this distribution over it? 1385 00:52:19,277 --> 00:52:20,860 And I've sort of just sampled from it. 1386 00:52:20,860 --> 00:52:22,276 I've sampled from his distribution 1387 00:52:22,276 --> 00:52:23,730 over where he thinks it fell. 1388 00:52:23,730 --> 00:52:26,520 And I just did a few samples, but you can sort of notice 1389 00:52:26,520 --> 00:52:29,940 the suspicious gap over here.
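The speaker and listener reasoning about each other up to a particular depth, and bottoming out at depth zero, might look something like the sketch below. It is Python, not the roughly eight lines of Church from the lecture, and it makes several assumptions for the sake of a runnable example: the continuous square is replaced by a coarse 6x6 grid of cells so that "the listener ends up in the right state" can be checked with plain equality, Big Ben is taken to be one of London's four cells, and conditioning is done by naive rejection sampling. All names are hypothetical.

```python
import random
from itertools import product

# Assumption: a 6x6 grid of cell centres stands in for the continuous square.
CELLS = [(x + 0.5, y + 0.5) for x, y in product(range(-3, 3), repeat=2)]


def in_london(s):
    return abs(s[0]) < 1 and abs(s[1]) < 1      # the four central cells


def in_ben(s):
    return 0 < s[0] < 1 and 0 < s[1] < 1        # one of London's cells


MEANINGS = {
    "ben": in_ben,
    "london": in_london,
    "outside": lambda s: not in_london(s),
}
WORDS = list(MEANINGS)


def listener(word, depth, rng):
    """Sample a state from the prior, keep it only if it explains the word."""
    while True:
        state = rng.choice(CELLS)
        if depth == 0:
            # Base case: the literal listener only checks the word is true.
            if MEANINGS[word](state):
                return state
        elif speaker(state, depth - 1, rng) == word:
            # Otherwise: would a speaker who saw this state have said this word?
            return state


def speaker(state, depth, rng):
    """Pick a word at random, keep it only if the listener recovers the state."""
    while True:
        word = rng.choice(WORDS)
        if listener(word, depth, rng) == state:
            return word


rng = random.Random(0)
samples = [listener("london", 1, rng) for _ in range(100)]
ben_hits = sum(in_ben(s) for s in samples)
print(ben_hits, "of", len(samples), "samples landed on Big Ben")
```

Under these assumptions every sample lands in London, and the Big Ben cell should come up noticeably less often than the one-in-four share a purely literal reading of "london" would give it. Nested rejection sampling is slow, which is in the spirit of the lecture's remark that the run might take a minute.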
1390 00:52:29,940 --> 00:52:31,380 And if I sample 100 points, you'll 1391 00:52:31,380 --> 00:52:34,770 notice that it's going to be in London-- 1392 00:52:34,770 --> 00:52:36,830 so between minus one and one. 1393 00:52:36,830 --> 00:52:38,740 This might take it a minute. 1394 00:52:38,740 --> 00:52:41,910 But what you'll end up seeing is that it's probably anywhere 1395 00:52:41,910 --> 00:52:42,930 in London. 1396 00:52:42,930 --> 00:52:44,859 If someone said to you, it fell in London, 1397 00:52:44,859 --> 00:52:46,650 then you infer that it's anywhere in London 1398 00:52:46,650 --> 00:52:47,749 except Big Ben. 1399 00:52:47,749 --> 00:52:50,040 Because if it was Big Ben, you would have said Big Ben. 1400 00:52:50,040 --> 00:52:50,790 It's not perfect. 1401 00:52:50,790 --> 00:52:52,706 I mean, there are some samples that get there. 1402 00:52:52,706 --> 00:52:55,420 It's actually some shifting distribution. 1403 00:52:55,420 --> 00:52:57,360 But yeah. 1404 00:52:57,360 --> 00:52:59,779 If you took some heat map of this thing, what 1405 00:52:59,779 --> 00:53:01,320 you would end up with is that there's 1406 00:53:01,320 --> 00:53:03,890 some sort of suspicious emptiness over here. 1407 00:53:03,890 --> 00:53:06,390 In this case, there's also a bit of an emptiness over there. 1408 00:53:06,390 --> 00:53:08,820 But in the limit, you'll get that. 1409 00:53:08,820 --> 00:53:12,210 So we did plan B, which was to zoom through a few things 1410 00:53:12,210 --> 00:53:13,230 very quickly. 1411 00:53:13,230 --> 00:53:15,930 I'm sure you guys didn't fully grok the details. 1412 00:53:15,930 --> 00:53:16,990 And that's OK. 1413 00:53:16,990 --> 00:53:18,720 What I wanted to do with plan B was 1414 00:53:18,720 --> 00:53:20,880 to give you a taste of what is possible 1415 00:53:20,880 --> 00:53:22,770 and how you would go about writing models.
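The remark that the heat map shows the emptiness over Big Ben "in the limit" can be checked without sampling noise by computing the listener's distribution exactly. The sketch below swaps the sampling used in the lecture for explicit enumeration over a coarse grid; the 6x6 grid, the choice of Big Ben as one of London's four cells, the uniform priors over cells and words, and all names are assumptions for illustration, and it is written in Python rather than Church.

```python
from functools import lru_cache
from itertools import product

# Assumption: a 6x6 grid of cell centres stands in for the continuous square.
CELLS = [(x + 0.5, y + 0.5) for x, y in product(range(-3, 3), repeat=2)]
BEN_CELL = (0.5, 0.5)


def in_london(s):
    return abs(s[0]) < 1 and abs(s[1]) < 1


MEANINGS = {
    "ben": lambda s: s == BEN_CELL,
    "london": in_london,
    "outside": lambda s: not in_london(s),
}
WORDS = list(MEANINGS)


def normalize(weights):
    total = sum(weights.values())
    return {key: value / total for key, value in weights.items()}


@lru_cache(maxsize=None)
def listener(word, depth):
    """Exact P(state | word); uniform priors cancel in the normalization."""
    if depth == 0:
        weights = {s: 1.0 if MEANINGS[word](s) else 0.0 for s in CELLS}
    else:
        weights = {s: speaker(s, depth - 1)[word] for s in CELLS}
    return normalize(weights)


@lru_cache(maxsize=None)
def speaker(state, depth):
    """Exact P(word | state): words weighted by how well they convey the state."""
    return normalize({w: listener(w, depth)[state] for w in WORDS})


literal = listener("london", 0)
posterior = listener("london", 1)
print(round(literal[BEN_CELL], 4))    # 0.25
print(round(posterior[BEN_CELL], 4))  # 0.0625
```

With these assumed numbers, hearing "london" drops the probability of the Big Ben cell from a quarter to about six percent, while each of the other three London cells rises to about 31 percent. The gap is not perfectly empty, which matches the observation that a few samples still land there.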
1416 00:53:22,770 --> 00:53:24,311 The important thing to remember here 1417 00:53:24,311 --> 00:53:26,520 is that probabilistic programs are 1418 00:53:26,520 --> 00:53:29,010 great tools for capturing lots of rich structure-- 1419 00:53:29,010 --> 00:53:31,230 anything from physics to psychology to language 1420 00:53:31,230 --> 00:53:33,854 games to grammar to vision. 1421 00:53:33,854 --> 00:53:35,520 Church is a particularly useful language 1422 00:53:35,520 --> 00:53:37,540 for teaching yourselves about these things. 1423 00:53:37,540 --> 00:53:38,915 There are a lot of different models 1424 00:53:38,915 --> 00:53:40,590 that you can play with on probmods.org. 1425 00:53:40,590 --> 00:53:42,720 You can write down a generative model very easily 1426 00:53:42,720 --> 00:53:44,700 to describe how you think the world works, 1427 00:53:44,700 --> 00:53:46,830 and then you can put that in an inference engine 1428 00:53:46,830 --> 00:53:50,261 and try to figure out what you actually saw. 1429 00:53:50,261 --> 00:53:50,760 OK. 1430 00:53:50,760 --> 00:53:52,920 So thank you. 1431 00:53:52,920 --> 00:53:54,770 [APPLAUSE]