1 00:00:01,680 --> 00:00:04,080 The following content is provided under a Creative 2 00:00:04,080 --> 00:00:05,620 Commons license. 3 00:00:05,620 --> 00:00:07,920 Your support will help MIT OpenCourseWare 4 00:00:07,920 --> 00:00:12,280 continue to offer high quality educational resources for free. 5 00:00:12,280 --> 00:00:14,910 To make a donation or view additional materials 6 00:00:14,910 --> 00:00:18,870 from hundreds of MIT courses, visit MIT OpenCourseWare 7 00:00:18,870 --> 00:00:21,820 at ocw.mit.edu. 8 00:00:21,820 --> 00:00:23,820 TOMER ULLMAN: What we're going to do in the last 9 00:00:23,820 --> 00:00:26,820 section is not so much have a lecture as a debate. 10 00:00:26,820 --> 00:00:28,980 And it's going to be a debate between myself 11 00:00:28,980 --> 00:00:30,150 and Laura Schulz. 12 00:00:30,150 --> 00:00:32,070 And hopefully at the end, all of you 13 00:00:32,070 --> 00:00:33,990 can join in for some sort of free for all. 14 00:00:33,990 --> 00:00:35,760 Although, of course, you're welcome to ask 15 00:00:35,760 --> 00:00:38,370 questions at any moment. 16 00:00:38,370 --> 00:00:40,830 Now, Laura and I have had this debate a few times now. 17 00:00:40,830 --> 00:00:43,230 It's a debate about where do new ideas come from. 18 00:00:43,230 --> 00:00:44,070 It's about theories. 19 00:00:44,070 --> 00:00:45,510 It's about imagination. 20 00:00:45,510 --> 00:00:47,760 The last time that we were supposed to have the debate 21 00:00:47,760 --> 00:00:50,540 was in SRCD, which is a child development conference. 22 00:00:50,540 --> 00:00:52,030 But Laura couldn't make it. 23 00:00:52,030 --> 00:00:54,000 So I ended up pulling a Monty Python man 24 00:00:54,000 --> 00:00:55,590 wrestles with himself routine where 25 00:00:55,590 --> 00:00:59,430 I was arguing both my point and Laura's point. 26 00:00:59,430 --> 00:01:01,950 So it's nice to have a sparring partner again 27 00:01:01,950 --> 00:01:04,260 after that experience. 
28 00:01:04,260 --> 00:01:06,877 I am still a little intimidated. 29 00:01:06,877 --> 00:01:09,210 Just to give you a sense of the caliber of my opponent-- 30 00:01:09,210 --> 00:01:11,293 the reason Laura didn't show up to our SRCD debate 31 00:01:11,293 --> 00:01:14,260 was that she was giving a talk of her own 32 00:01:14,260 --> 00:01:16,639 in something called TED. 33 00:01:16,639 --> 00:01:18,180 I don't know if you've heard of SRCD, 34 00:01:18,180 --> 00:01:20,520 but presumably you've heard of TED. 35 00:01:20,520 --> 00:01:21,300 OK. 36 00:01:21,300 --> 00:01:22,610 So what's this debate about? 37 00:01:22,610 --> 00:01:23,620 What am I talking about? 38 00:01:23,620 --> 00:01:26,010 Here's a 65 second prologue sort of setting it up. 39 00:01:26,010 --> 00:01:27,440 And then we'll get into it. 40 00:01:27,440 --> 00:01:29,910 The background to this debate is that, you know, 41 00:01:29,910 --> 00:01:32,370 we have this sort of really nice picture coming out 42 00:01:32,370 --> 00:01:33,750 of development and models. 43 00:01:33,750 --> 00:01:36,690 And sort of Laura and Josh have set up the perfect intro 44 00:01:36,690 --> 00:01:38,430 to that, which is from development 45 00:01:38,430 --> 00:01:41,220 we have this idea of children are sort of like scientists 46 00:01:41,220 --> 00:01:43,680 or hackers or what have you. 47 00:01:43,680 --> 00:01:46,770 They come up with these theories to explain the world. 48 00:01:46,770 --> 00:01:47,920 And they do it beautifully. 49 00:01:47,920 --> 00:01:49,620 And that's what they do in development. 50 00:01:49,620 --> 00:01:51,960 And then we have this idea coming out 51 00:01:51,960 --> 00:01:55,410 of computational land which is, well, 52 00:01:55,410 --> 00:01:57,300 what we mean by theories-- how do we actually 53 00:01:57,300 --> 00:01:58,740 capture that computationally? 54 00:01:58,740 --> 00:02:00,120 How do we formalize that? 
55 00:02:00,120 --> 00:02:02,820 Well, maybe it's something like hierarchical learning 56 00:02:02,820 --> 00:02:05,760 over a space of programs, right? 57 00:02:05,760 --> 00:02:07,740 Because programs are this thing 58 00:02:07,740 --> 00:02:08,952 that's kind of like theories. 59 00:02:08,952 --> 00:02:10,410 And they kind of explain the world. 60 00:02:10,410 --> 00:02:13,470 And you can learn them via Bayesian inference and programs 61 00:02:13,470 --> 00:02:15,970 over programs and learning over learning. 62 00:02:15,970 --> 00:02:19,420 And that's sort of a really nice picture. 63 00:02:19,420 --> 00:02:20,512 But a lot of people-- 64 00:02:20,512 --> 00:02:21,970 Josh I think was using the phrase-- 65 00:02:21,970 --> 00:02:25,320 or Nour was using the phrase-- "give us grief." 66 00:02:25,320 --> 00:02:27,960 A lot of people give grief to these computational models, 67 00:02:27,960 --> 00:02:28,560 correctly. 68 00:02:28,560 --> 00:02:29,850 They say something like, wait a minute. 69 00:02:29,850 --> 00:02:31,590 These computational models that you've described, 70 00:02:31,590 --> 00:02:32,964 like Josh was describing earlier, 71 00:02:32,964 --> 00:02:35,490 these hierarchical programs over programs, 72 00:02:35,490 --> 00:02:37,630 what's the theory space on that, right? 73 00:02:37,630 --> 00:02:39,690 Like, what's the space of all possible programs? 74 00:02:39,690 --> 00:02:40,190 Remind me. 75 00:02:40,190 --> 00:02:41,270 Isn't that infinite? 76 00:02:41,270 --> 00:02:43,590 And isn't infinite really, really big? 77 00:02:43,590 --> 00:02:45,840 What are you trying to suggest, that children are just 78 00:02:45,840 --> 00:02:48,150 searching through this giant space of programs? 79 00:02:48,150 --> 00:02:51,360 How could they possibly be doing that? 
80 00:02:51,360 --> 00:02:53,430 And what we'd like to argue and I'll spell out 81 00:02:53,430 --> 00:02:55,388 is that, well, we don't search through the space 82 00:02:55,388 --> 00:02:58,170 of all programs, we don't do that when we write out 83 00:02:58,170 --> 00:02:59,070 our programs, right? 84 00:02:59,070 --> 00:03:00,224 We somehow figure it out. 85 00:03:00,224 --> 00:03:01,890 We're writing out these infinite spaces. 86 00:03:01,890 --> 00:03:03,816 And yet we search them in our computers. 87 00:03:03,816 --> 00:03:05,190 That's what we're trying to suggest. 88 00:03:05,190 --> 00:03:07,231 The same way that we search them in our computers 89 00:03:07,231 --> 00:03:09,750 might be something like the way that children actually have 90 00:03:09,750 --> 00:03:11,040 to search for their theories. 91 00:03:11,040 --> 00:03:14,250 And it's hard, grueling work for computers and children. 92 00:03:14,250 --> 00:03:16,770 And the algorithms that we're, in particular, interested 93 00:03:16,770 --> 00:03:18,930 in-- there's all sorts of learning algorithms, of course. 94 00:03:18,930 --> 00:03:20,280 But one of the most influential learning 95 00:03:20,280 --> 00:03:22,620 algorithms that people have used for learning these programs 96 00:03:22,620 --> 00:03:24,711 and hierarchical programs over programs and things 97 00:03:24,711 --> 00:03:27,210 like that is just sort of this version of stochastic search. 98 00:03:27,210 --> 00:03:28,950 And I've told you a little bit about that-- those of you who 99 00:03:28,950 --> 00:03:31,800 went to the Church tutorial-- about Markov Chain Monte 100 00:03:31,800 --> 00:03:33,280 Carlo, and all that stuff. 101 00:03:33,280 --> 00:03:34,696 But the point is something-- well, 102 00:03:34,696 --> 00:03:37,320 the way to search large, complicated theory spaces 103 00:03:37,320 --> 00:03:39,336 is through something like stochastic search. 
104 00:03:39,336 --> 00:03:41,460 And, well, if it's something like stochastic search 105 00:03:41,460 --> 00:03:44,460 algorithms for our theories, maybe children 106 00:03:44,460 --> 00:03:48,750 are doing something like stochastic search, right? 107 00:03:48,750 --> 00:03:51,621 No is what Laura's going to say. 108 00:03:51,621 --> 00:03:53,370 Or she is going to say that can't possibly 109 00:03:53,370 --> 00:03:55,842 be the whole story, and here's what's missing. 110 00:03:55,842 --> 00:03:57,300 So does everyone sort of understand 111 00:03:57,300 --> 00:03:59,730 what the debate is going to be about more or less? 112 00:03:59,730 --> 00:04:02,190 I'm going to spell out, of course, all of this 113 00:04:02,190 --> 00:04:04,590 in the next few minutes. 114 00:04:04,590 --> 00:04:05,220 OK. 115 00:04:05,220 --> 00:04:07,350 So as I said, we're going to switch back and forth, 116 00:04:07,350 --> 00:04:07,770 Laura and I. 117 00:04:07,770 --> 00:04:08,910 The outline of the debate is that I'm 118 00:04:08,910 --> 00:04:10,701 going to give some background for like what 119 00:04:10,701 --> 00:04:12,780 good a theory is, or what makes a good theory-- 120 00:04:12,780 --> 00:04:13,950 Josh really sort of covered that. 121 00:04:13,950 --> 00:04:15,150 So I'll be zooming through it just 122 00:04:15,150 --> 00:04:17,160 to give you an example of what I'm talking about. 123 00:04:17,160 --> 00:04:19,618 Because I want to tell you what stochastic search in theory 124 00:04:19,618 --> 00:04:21,839 space looks like. 125 00:04:21,839 --> 00:04:24,220 After I do that, Laura is going to step in. 126 00:04:24,220 --> 00:04:26,040 And she's going to rudely interrupt 127 00:04:26,040 --> 00:04:27,120 me and say, no, no, no. 128 00:04:27,120 --> 00:04:28,310 That can't possibly be true. 129 00:04:28,310 --> 00:04:30,268 Here's all the things that are wrong with that. 130 00:04:30,268 --> 00:04:31,560 I will then give a rebuttal. 
131 00:04:31,560 --> 00:04:34,349 And then Laura will end and summarize. 132 00:04:34,349 --> 00:04:36,390 Beyond all these things, something I didn't write 133 00:04:36,390 --> 00:04:38,190 was all of you. 134 00:04:38,190 --> 00:04:40,240 Jump in and say what you think. 135 00:04:40,240 --> 00:04:40,740 OK. 136 00:04:40,740 --> 00:04:41,850 So what good is a theory? 137 00:04:41,850 --> 00:04:44,266 Like I said, Josh sort of went through this. 138 00:04:44,266 --> 00:04:46,890 I don't think you guys need that much convincing at this point. 139 00:04:46,890 --> 00:04:49,181 But by theory, I mean some sort of structured knowledge 140 00:04:49,181 --> 00:04:50,820 that goes beyond the data, compresses 141 00:04:50,820 --> 00:04:53,760 the data in some way, and is able to predict new things. 142 00:04:53,760 --> 00:04:55,350 My running example is going to be 143 00:04:55,350 --> 00:04:57,670 much simpler than character recognition or anything 144 00:04:57,670 --> 00:04:58,170 like that. 145 00:04:58,170 --> 00:05:01,990 It's going to be about magnets and, in particular, 146 00:05:01,990 --> 00:05:03,612 very simplified theory of magnetism. 147 00:05:03,612 --> 00:05:05,320 So suppose we bring a child into the lab. 148 00:05:05,320 --> 00:05:07,062 And we tell her, look at these blocks. 149 00:05:07,062 --> 00:05:08,020 They all look the same. 150 00:05:08,020 --> 00:05:08,780 Play with them. 151 00:05:08,780 --> 00:05:10,900 See if you can figure out what's going on here. 152 00:05:10,900 --> 00:05:11,500 OK? 153 00:05:11,500 --> 00:05:14,230 And unbeknownst to the child, but beknownst to you, 154 00:05:14,230 --> 00:05:15,730 is that these things are magnets. 155 00:05:15,730 --> 00:05:16,730 They're not just blocks. 156 00:05:16,730 --> 00:05:17,470 Some of them are metal. 157 00:05:17,470 --> 00:05:18,511 Some of them are magnets. 158 00:05:18,511 --> 00:05:19,660 Some of them are plastic. 159 00:05:19,660 --> 00:05:20,320 OK? 
160 00:05:20,320 --> 00:05:22,147 So she's going to start playing with them. 161 00:05:22,147 --> 00:05:23,980 And she's going to start noticing something. 162 00:05:23,980 --> 00:05:26,830 Like, sometimes none of these things do anything, right? 163 00:05:26,830 --> 00:05:27,940 They don't stick. 164 00:05:27,940 --> 00:05:30,730 But sometimes, huh, they do stick. 165 00:05:30,730 --> 00:05:33,012 And she starts collecting observations 166 00:05:33,012 --> 00:05:33,970 in something like this. 167 00:05:33,970 --> 00:05:35,200 Like, how could she explain the data? 168 00:05:35,200 --> 00:05:36,530 Well, she could explain the data in this. 169 00:05:36,530 --> 00:05:37,654 You can just write it down. 170 00:05:37,654 --> 00:05:39,370 She could like label these somehow. 171 00:05:39,370 --> 00:05:41,080 She could have all these, A to I. 172 00:05:41,080 --> 00:05:43,776 And she could say, well, A and B attract one another, 173 00:05:43,776 --> 00:05:46,150 B and A attract one another, B and C attract one another, 174 00:05:46,150 --> 00:05:47,160 and so on. 175 00:05:47,160 --> 00:05:47,800 OK? 176 00:05:47,800 --> 00:05:49,210 That's her theory. 177 00:05:49,210 --> 00:05:50,762 That's her explanation of the data. 178 00:05:50,762 --> 00:05:52,220 This is horrible, of course, right? 179 00:05:52,220 --> 00:05:54,810 Like, this is not an explanation in any sense of the word. 180 00:05:54,810 --> 00:05:56,320 It's just a table of data. 181 00:05:56,320 --> 00:05:58,079 But in one sense, why is it not a theory? 182 00:05:58,079 --> 00:06:00,620 It's because it's just writing down what you've already seen. 183 00:06:00,620 --> 00:06:02,530 It doesn't compress it in any way. 184 00:06:02,530 --> 00:06:04,070 And it can't predict anything new. 185 00:06:04,070 --> 00:06:06,880 If I now give you a new block X and I tell you, listen, 186 00:06:06,880 --> 00:06:10,070 X attracts B, what else can you tell me about it? 
187 00:06:10,070 --> 00:06:13,880 And she'll say, well, X attracts B. Yeah. 188 00:06:13,880 --> 00:06:14,380 OK. 189 00:06:14,380 --> 00:06:15,700 That's what my table tells me. 190 00:06:15,700 --> 00:06:16,990 You can't really predict anything new 191 00:06:16,990 --> 00:06:18,310 from a giant table like that. 192 00:06:18,310 --> 00:06:19,600 You want some sort of compression. 193 00:06:19,600 --> 00:06:20,560 You want some sort of theory. 194 00:06:20,560 --> 00:06:21,880 But what else could she come up with, right? 195 00:06:21,880 --> 00:06:23,210 There's all sorts of things you could come up with. 196 00:06:23,210 --> 00:06:25,251 But she could have come up with a very simplified 197 00:06:25,251 --> 00:06:26,417 theory that goes like this. 198 00:06:26,417 --> 00:06:28,000 Suppose that we imagine that there are 199 00:06:28,000 --> 00:06:29,140 certain things in the world. 200 00:06:29,140 --> 00:06:31,230 We're just going to call them, like, shmagnet and shmetal. 201 00:06:31,230 --> 00:06:32,646 Because I don't want to confuse it 202 00:06:32,646 --> 00:06:34,872 with actual magnets and actual metals. 203 00:06:34,872 --> 00:06:36,080 But she comes up with a name. 204 00:06:36,080 --> 00:06:38,050 She says there are several things in the world. 205 00:06:38,050 --> 00:06:40,133 And how do these things interact with one another? 206 00:06:40,133 --> 00:06:42,070 Well, let's just hypothesize some rules. 207 00:06:42,070 --> 00:06:44,170 And we can talk about how does she come up with these rules. 208 00:06:44,170 --> 00:06:46,461 How does she know that there's two things in the world? 209 00:06:46,461 --> 00:06:48,359 Let's say for some reason by dumb luck 210 00:06:48,359 --> 00:06:49,900 she hits upon this, which is actually 211 00:06:49,900 --> 00:06:51,280 a really good theory for explaining 212 00:06:51,280 --> 00:06:53,500 that, which is to say there are two things in the world. 
213 00:06:53,500 --> 00:06:55,541 And the way they interact is through these rules. 214 00:06:55,541 --> 00:06:57,880 If something is a magnet and another thing is a magnet, 215 00:06:57,880 --> 00:06:59,230 if X is a magnet and Y is a magnet, 216 00:06:59,230 --> 00:07:00,313 they're going to interact. 217 00:07:00,313 --> 00:07:01,660 They're going to stick. 218 00:07:01,660 --> 00:07:04,540 If X is a magnet and Y is a metal, they're going to stick. 219 00:07:04,540 --> 00:07:06,351 And interactions are symmetric. 220 00:07:06,351 --> 00:07:06,850 OK. 221 00:07:06,850 --> 00:07:09,040 So metals don't stick to metals, but metals stick to magnets. 222 00:07:09,040 --> 00:07:09,880 Magnets stick to metals. 223 00:07:09,880 --> 00:07:11,230 And the interactions are symmetric. 224 00:07:11,230 --> 00:07:12,460 And if you have these rules, you need 225 00:07:12,460 --> 00:07:15,160 to pay some overhead, obviously, for remembering the rules. 226 00:07:15,160 --> 00:07:19,100 But you can compress this, you know, n by n thing 227 00:07:19,100 --> 00:07:20,600 into just the vectors of remembering 228 00:07:20,600 --> 00:07:22,510 what are the magnets, what are the metals. 229 00:07:22,510 --> 00:07:24,590 So you've achieved some compression. 230 00:07:24,590 --> 00:07:26,798 And if I give you something new and I tell you, look, 231 00:07:26,798 --> 00:07:29,200 this is X, X attracts A, and you're like, wait a minute. 232 00:07:29,200 --> 00:07:30,230 A is a magnet. 233 00:07:30,230 --> 00:07:32,187 So this is either a magnet or a metal. 234 00:07:32,187 --> 00:07:34,270 You can probably predict a lot of different things 235 00:07:34,270 --> 00:07:34,930 about this thing. 236 00:07:34,930 --> 00:07:36,763 You can go and design your little experiment 237 00:07:36,763 --> 00:07:39,381 to figure out if this new thing is itself a magnet or a metal. 238 00:07:39,381 --> 00:07:39,880 Great. 239 00:07:39,880 --> 00:07:42,602 So this is wonderful. 
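The simplified theory above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the block labels and their kinds in `kinds`, and the helper names `sticks` and `consistent_kinds`, are invented here and do not come from the talk. The sketch shows the two properties the speaker asks of a theory: a short vector of labels plus rules stands in for the full n-by-n table, and a single observation about a new block X narrows down what X can be.

```python
# Hypothetical labels for the blocks; in the talk the child has to infer these.
kinds = {"A": "magnet", "B": "magnet", "C": "metal", "D": "metal", "E": "plastic"}

def sticks(x, y, kinds):
    """Rules of the simplified theory: magnet-magnet and magnet-metal pairs stick.

    Using a set for the pair makes the interaction symmetric by construction.
    """
    pair = {kinds[x], kinds[y]}
    return pair == {"magnet"} or pair == {"magnet", "metal"}

def consistent_kinds(observations, kinds):
    """Kinds a new block X could have, given (other_block, stuck) observations."""
    options = []
    for candidate in ("magnet", "metal", "plastic"):
        trial = dict(kinds, X=candidate)
        if all(sticks("X", other, trial) == stuck for other, stuck in observations):
            options.append(candidate)
    return options

# The full table has n * (n - 1) ordered entries; the theory stores only n labels.
table = {(x, y): sticks(x, y, kinds) for x in kinds for y in kinds if x != y}
print(len(table), "table entries versus", len(kinds), "labels")

# "X attracts A" leaves two options; testing X against a metal settles it.
print(consistent_kinds([("A", True)], kinds))
print(consistent_kinds([("A", True), ("C", True)], kinds))
```

The second call is the "little experiment" from the talk: once X is also tried against a known metal, only one kind remains consistent.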
240 00:07:42,602 --> 00:07:43,810 I won't go through something. 241 00:07:43,810 --> 00:07:45,250 Like, there's this added thing of finding out 242 00:07:45,250 --> 00:07:47,329 which one are the actual shmagnets and shmetals. 243 00:07:47,329 --> 00:07:49,120 And then you can predict the observed data. 244 00:07:49,120 --> 00:07:49,690 OK. 245 00:07:49,690 --> 00:07:52,390 So now, what we've done is we've set up this sort of space 246 00:07:52,390 --> 00:07:53,710 of possible theories, right? 247 00:07:53,710 --> 00:07:55,660 Imagine like all the possible logical theories 248 00:07:55,660 --> 00:07:57,534 that you could have written out, of all 249 00:07:57,534 --> 00:07:59,200 the possible logical predicates, there's 250 00:07:59,200 --> 00:08:00,490 an infinite amount of them. 251 00:08:00,490 --> 00:08:03,050 But now we've turned this into a rational inference problem, 252 00:08:03,050 --> 00:08:03,550 right? 253 00:08:03,550 --> 00:08:05,530 The problem for you as the learner 254 00:08:05,530 --> 00:08:07,870 is out of the space of all possible theories, just 255 00:08:07,870 --> 00:08:10,710 find the one that best explains the observed data, right? 256 00:08:10,710 --> 00:08:13,360 Where, by best we mean something like Bayes' law, right? 257 00:08:13,360 --> 00:08:16,126 Try to find the best theory to predict the data, 258 00:08:16,126 --> 00:08:17,500 whereby we mean the theory that's 259 00:08:17,500 --> 00:08:21,854 sort of the shortest and most compressed in itself a priori. 260 00:08:21,854 --> 00:08:23,020 You've all seen Bayes' rule. 261 00:08:23,020 --> 00:08:25,311 You've all heard Josh talk about this and other people. 262 00:08:25,311 --> 00:08:27,730 But it also explains the data itself the best. 263 00:08:27,730 --> 00:08:28,810 That's the problem. 264 00:08:28,810 --> 00:08:29,570 Go figure it out. 265 00:08:29,570 --> 00:08:30,700 We formalize it for you. 266 00:08:30,700 --> 00:08:31,750 We've solved it. 
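Written out, the inference problem described above is the usual form of Bayes' rule over theories. The symbols are assumptions chosen here for illustration: T is a candidate theory, D is the observed data, and the sum runs over all candidate theories T'.

```latex
P(T \mid D) \;=\; \frac{P(D \mid T)\, P(T)}{\sum_{T'} P(D \mid T')\, P(T')}
```

The prior P(T) is where "shortest and most compressed a priori" enters, and the likelihood P(D | T) is where "explains the data itself the best" enters.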
267 00:08:31,750 --> 00:08:34,880 And, of course, you know, this elides a lot of things. 268 00:08:34,880 --> 00:08:36,540 It doesn't really solve anything. 269 00:08:36,540 --> 00:08:38,480 But we'll get to that in a second. 270 00:08:38,480 --> 00:08:40,240 So you might wonder like how do we 271 00:08:40,240 --> 00:08:43,330 build a good space for possible theories. 272 00:08:43,330 --> 00:08:45,280 And one way to do that is to say, well, 273 00:08:45,280 --> 00:08:47,110 how do you generate all possible theories? 274 00:08:47,110 --> 00:08:50,127 How do you define a space of all possible theories? 275 00:08:50,127 --> 00:08:52,210 In this case, we're just going to go with grammar. 276 00:08:52,210 --> 00:08:54,209 But you could imagine something like a generator 277 00:08:54,209 --> 00:08:56,830 for all possible programs, right? 278 00:08:56,830 --> 00:09:00,089 So a grammar, to put it very, very, very shortly, 279 00:09:00,089 --> 00:09:01,630 is something that you can run forward 280 00:09:01,630 --> 00:09:02,680 and will generate a sentence. 281 00:09:02,680 --> 00:09:04,680 Or it's a way of sort of looking at the sentence 282 00:09:04,680 --> 00:09:07,897 and figuring out what its underlying rules are. 283 00:09:07,897 --> 00:09:08,980 Let's put it this way, OK? 284 00:09:08,980 --> 00:09:12,135 So in a grammar, you would start with a sort of a sentence node. 285 00:09:12,135 --> 00:09:13,510 And then you know that a sentence 286 00:09:13,510 --> 00:09:16,510 can go into several other nodes in some probability. 287 00:09:16,510 --> 00:09:20,080 For example, a sentence node can go into something 288 00:09:20,080 --> 00:09:21,730 like a verb phrase and a noun phrase. 289 00:09:21,730 --> 00:09:23,855 The noun phrase goes into a noun and another thing. 290 00:09:23,855 --> 00:09:25,720 And you sort of generate through this grammar 291 00:09:25,720 --> 00:09:27,162 until you end up with a sentence. 
292 00:09:27,162 --> 00:09:28,620 And it can be any sort of sentence. 293 00:09:28,620 --> 00:09:31,840 It can be the clownfish ascended the stairs darkly, right? 294 00:09:31,840 --> 00:09:33,730 And you can run this grammar forward 295 00:09:33,730 --> 00:09:36,010 and generate whole new sentences that you've never 296 00:09:36,010 --> 00:09:37,044 thought of before. 297 00:09:37,044 --> 00:09:38,710 And you could then also use that grammar 298 00:09:38,710 --> 00:09:39,709 to figure out sentences. 299 00:09:39,709 --> 00:09:42,400 So if you see the sentence, the clownfish ascended the stairs 300 00:09:42,400 --> 00:09:44,858 darkly, you can use something like a grammar to figure out, 301 00:09:44,858 --> 00:09:47,350 well, you know, how is this sentence constructed. 302 00:09:47,350 --> 00:09:49,430 This is a grammar for logical predicates. 303 00:09:49,430 --> 00:09:51,550 It just means that if we run it forward, 304 00:09:51,550 --> 00:09:54,160 it starts out with some sort of abstract thing like, you know, 305 00:09:54,160 --> 00:09:56,170 it's going to be a law or some sort of set 306 00:09:56,170 --> 00:09:57,274 of logical predicates. 307 00:09:57,274 --> 00:09:58,690 You run it forward, and it ends up 308 00:09:58,690 --> 00:10:01,260 not with a sentence, like the clownfish blah, blah, blah. 309 00:10:01,260 --> 00:10:03,990 It ends up with a particular law that's 310 00:10:03,990 --> 00:10:08,100 relating a few particular predicates like, you know-- 311 00:10:08,100 --> 00:10:11,400 sorry-- like X and Y interact symmetrically, right? 312 00:10:11,400 --> 00:10:12,900 Like, there are things in the world. 313 00:10:12,900 --> 00:10:14,910 There are X and Y. If X interacts with Y, 314 00:10:14,910 --> 00:10:17,020 Y interacts with X. That's a law. 315 00:10:17,020 --> 00:10:18,930 We run through the grammar, we generate that. 
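Running a grammar forward to produce laws can be sketched as follows. This is a toy grammar made up for illustration, not the grammar shown in the talk: the predicate names `p` and `q`, the fixed head `interacts(X, Y)`, and the continuation probability `p_more` are all assumptions. Each run expands Theory into one or more Laws, and each Law into a head and a body of one or more atoms, stopping at random.

```python
import random

PREDICATES = ["p", "q"]   # hypothetical unary predicates (think shmagnet, shmetal)
VARIABLES = ["X", "Y"]

def gen_atom(rng):
    """Atom -> Predicate(Variable)"""
    return f"{rng.choice(PREDICATES)}({rng.choice(VARIABLES)})"

def gen_body(rng, p_more=0.5):
    """Body -> Atom | Atom ^ Body, continuing with probability p_more."""
    atoms = [gen_atom(rng)]
    while rng.random() < p_more:
        atoms.append(gen_atom(rng))
    return atoms

def gen_law(rng):
    """Law -> interacts(X, Y) <- Body"""
    return "interacts(X, Y) <- " + " ^ ".join(gen_body(rng))

def gen_theory(rng, p_more=0.5):
    """Theory -> Law | Law Theory, continuing with probability p_more."""
    laws = [gen_law(rng)]
    while rng.random() < p_more:
        laws.append(gen_law(rng))
    return laws

rng = random.Random(0)  # seeded only so that repeated runs give the same theories
for _ in range(3):
    print(gen_theory(rng))
```

Because every expansion can continue with some probability, the grammar defines an infinite space of theories, while longer theories become less and less likely to be generated.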
316 00:10:18,930 --> 00:10:20,070 We could run through the grammar again, 317 00:10:20,070 --> 00:10:22,028 and it will say something completely different. 318 00:10:22,028 --> 00:10:23,825 Like, of all the things that I've related, 319 00:10:23,825 --> 00:10:25,200 there are such things as magnets. 320 00:10:25,200 --> 00:10:26,140 There are metals. 321 00:10:26,140 --> 00:10:28,290 And they should interact in this way or that way. 322 00:10:28,290 --> 00:10:29,490 Once we define a grammar like that, 323 00:10:29,490 --> 00:10:31,781 you can predict all sorts of different theories, right? 324 00:10:31,781 --> 00:10:33,900 I gave simplified magnetism, but you could also 325 00:10:33,900 --> 00:10:36,000 capture kinship with a set of four 326 00:10:36,000 --> 00:10:38,350 laws and several predicates. 327 00:10:38,350 --> 00:10:39,900 You could capture taxonomy. 328 00:10:39,900 --> 00:10:43,500 You could capture very, very simplified psychology 329 00:10:43,500 --> 00:10:44,921 of preference. 330 00:10:44,921 --> 00:10:45,420 OK. 331 00:10:45,420 --> 00:10:48,630 So as I said, this is another view of theory search. 332 00:10:48,630 --> 00:10:50,512 I haven't yet gotten to the algorithmic level 333 00:10:50,512 --> 00:10:52,470 of how do you actually search for these things. 334 00:10:52,470 --> 00:10:53,820 But I did want to set up the grammar, 335 00:10:53,820 --> 00:10:56,240 because it's going to be a little bit important later on. 336 00:10:56,240 --> 00:10:57,500 The grammar is just a way of saying, you know, 337 00:10:57,500 --> 00:10:58,541 you start with something. 338 00:10:58,541 --> 00:11:00,510 You generate it forward until you end up 339 00:11:00,510 --> 00:11:04,570 with a particular set of logical rules. 340 00:11:04,570 --> 00:11:09,300 So, again, to phrase the problem in a particular way 341 00:11:09,300 --> 00:11:12,670 of logical inference is to say we have the space of theories. 342 00:11:12,670 --> 00:11:14,130 It's an infinite space, right? 
343 00:11:14,130 --> 00:11:16,200 And before I've seen any data, there 344 00:11:16,200 --> 00:11:18,510 is some probability of what the right theory 345 00:11:18,510 --> 00:11:20,372 is to explain my data, where the data can 346 00:11:20,372 --> 00:11:22,830 be something like-- play with these blocks, figure them out. 347 00:11:22,830 --> 00:11:24,492 Before I've even seen the blocks, 348 00:11:24,492 --> 00:11:26,700 before I've seen anything that I need to explain yet, 349 00:11:26,700 --> 00:11:28,650 I have some sort of space of all possible theories. 350 00:11:28,650 --> 00:11:30,108 It can be all the logical theories, 351 00:11:30,108 --> 00:11:32,261 all the possible programs in the world. 352 00:11:32,261 --> 00:11:34,260 And I have some sort of probability distribution 353 00:11:34,260 --> 00:11:36,426 over which one of these are more likely than others, 354 00:11:36,426 --> 00:11:38,326 which comes from the prior. 355 00:11:38,326 --> 00:11:40,200 And the prior can be something on simplicity, 356 00:11:40,200 --> 00:11:43,980 like shorter programs are more likely, or the fewer free 357 00:11:43,980 --> 00:11:45,045 parameters the better. 358 00:11:45,045 --> 00:11:46,170 Something like that, right? 359 00:11:46,170 --> 00:11:49,344 So before I've seen any data, I can already score some programs 360 00:11:49,344 --> 00:11:50,760 as being unlikely, because they're 361 00:11:50,760 --> 00:11:52,710 too long and too ungainly. 362 00:11:52,710 --> 00:11:54,160 Does that make sense to everyone? 363 00:11:54,160 --> 00:11:54,792 OK. 364 00:11:54,792 --> 00:11:57,000 And that's just a representation of that in 2D space. 365 00:11:57,000 --> 00:11:58,874 It's saying, like, over here is theory space. 
366 00:11:58,874 --> 00:12:01,170 And the size of the hills over 2D space 367 00:12:01,170 --> 00:12:03,890 is how likely each point, each point in theory space, 368 00:12:03,890 --> 00:12:05,950 is a theory, where theory can be, 369 00:12:05,950 --> 00:12:08,880 as I said, a set of logical laws or program or anything. 370 00:12:08,880 --> 00:12:10,380 And the height of the hill over that 371 00:12:10,380 --> 00:12:11,610 is the amount of probability you should 372 00:12:11,610 --> 00:12:13,985 assign to that theory, how much you should believe in it. 373 00:12:13,985 --> 00:12:16,410 And as data comes in, what happens 374 00:12:16,410 --> 00:12:19,080 is you become more or less certain of certain programs. 375 00:12:19,080 --> 00:12:21,401 Like, suddenly you shift probability distributions 376 00:12:21,401 --> 00:12:21,900 around. 377 00:12:21,900 --> 00:12:24,525 You say, oh, these things that I didn't think were that likely, 378 00:12:24,525 --> 00:12:27,840 well, the data's really pushing me to accepting these theories. 379 00:12:27,840 --> 00:12:28,380 OK? 380 00:12:28,380 --> 00:12:29,970 And that's what learning is, right? 381 00:12:29,970 --> 00:12:31,136 That's the view of learning. 382 00:12:31,136 --> 00:12:33,210 You just start with particular probabilities. 383 00:12:33,210 --> 00:12:34,470 You get some data. 384 00:12:34,470 --> 00:12:36,870 And then you shift that around, and you 385 00:12:36,870 --> 00:12:37,980 get other probabilities. 386 00:12:37,980 --> 00:12:40,830 And that's sort of a very beautiful picture. 387 00:12:40,830 --> 00:12:43,682 And the only problem with it is that it is clearly false. 388 00:12:43,682 --> 00:12:45,390 And the reason it's false is because this 389 00:12:45,390 --> 00:12:46,265 is an infinite space. 390 00:12:46,265 --> 00:12:48,139 And there's no way that you could have scored 391 00:12:48,139 --> 00:12:49,237 the entire infinite space. 
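On a tiny, finite set of candidate theories, the "shift the probabilities around" picture can be computed directly. The sketch below is a toy under stated assumptions: the three candidate theories, their law counts, the prior of 2 to the minus number of laws, and the noise level are all invented for illustration, and the data are given with the kinds already known. It is exactly the exhaustive scoring that the talk says cannot be done over the full infinite space.

```python
# Hypothetical candidate theories: each predicts whether two kinds of block stick.
def only_magnets(a, b):
    return a == "magnet" and b == "magnet"

def magnets_and_metals(a, b):
    return "magnet" in (a, b) and "plastic" not in (a, b)

def everything_sticks(a, b):
    return True

# name -> (number of laws, prediction function); the law counts are assumptions.
THEORIES = {
    "only_magnets": (1, only_magnets),
    "magnets_and_metals": (3, magnets_and_metals),
    "everything_sticks": (1, everything_sticks),
}

def prior(n_laws):
    """Simplicity prior (unnormalized): each extra law halves the weight."""
    return 2.0 ** -n_laws

def likelihood(predict, data, noise=0.1):
    """Probability of the observations if each one is mispredicted with prob. noise."""
    p = 1.0
    for a, b, stuck in data:
        p *= (1.0 - noise) if predict(a, b) == stuck else noise
    return p

def posterior(data):
    """Bayes' rule over the finite set: prior times likelihood, then normalize."""
    scores = {name: prior(n) * likelihood(f, data) for name, (n, f) in THEORIES.items()}
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

data = [
    ("magnet", "magnet", True),
    ("magnet", "metal", True),
    ("metal", "metal", False),
    ("magnet", "plastic", False),
]
print(posterior([]))    # before any data: the shorter theories are favored
print(posterior(data))  # after data: mass shifts to the theory that fits
```

Before any data the longer theory is penalized by the prior; after four observations it carries most of the probability because it is the only one that predicts all of them.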
392 00:12:49,237 --> 00:12:51,570 There's no way that what children are doing, what adults 393 00:12:51,570 --> 00:12:53,310 are doing, what computers are doing 394 00:12:53,310 --> 00:12:56,460 is to instantly shift probability mass 395 00:12:56,460 --> 00:12:57,960 over this entire space, right? 396 00:12:57,960 --> 00:12:58,800 New data comes in. 397 00:12:58,800 --> 00:13:00,660 They're holding in this infinite space 398 00:13:00,660 --> 00:13:03,390 and shifting exactly the probabilities around, right? 399 00:13:03,390 --> 00:13:06,270 This view of learning is ridiculous, because, well, it's 400 00:13:06,270 --> 00:13:07,350 patently absurd. 401 00:13:07,350 --> 00:13:11,040 And also, people have taken issue with it 402 00:13:11,040 --> 00:13:13,980 even though it was never meant to be the story of learning 403 00:13:13,980 --> 00:13:15,792 that is happening in the real-- well, 404 00:13:15,792 --> 00:13:17,500 how should I put this, in the real world? 405 00:13:17,500 --> 00:13:20,125 Do you guys know the difference between the computational level 406 00:13:20,125 --> 00:13:23,010 and the algorithmic level when I say something like that? 407 00:13:23,010 --> 00:13:25,680 Show of hands-- who knows about Marr's three levels? 408 00:13:25,680 --> 00:13:26,190 OK. 409 00:13:26,190 --> 00:13:28,880 Who doesn't know about Marr's three levels of explanation? 410 00:13:28,880 --> 00:13:29,550 OK. 411 00:13:29,550 --> 00:13:32,100 The point is to say when you try to explain the phenomenon, 412 00:13:32,100 --> 00:13:34,350 you're going to give it several levels of explanation. 413 00:13:34,350 --> 00:13:36,360 You're going to give it sort of the functional level, what 414 00:13:36,360 --> 00:13:37,944 we might call the computational level. 415 00:13:37,944 --> 00:13:39,484 And then you're going to actually say 416 00:13:39,484 --> 00:13:41,130 how does this actually get implemented 417 00:13:41,130 --> 00:13:42,292 in an actual machine. 
418 00:13:42,292 --> 00:13:44,250 And there are many ways of implementing things. 419 00:13:44,250 --> 00:13:47,070 Like you might say, well, look, the general problem for vision 420 00:13:47,070 --> 00:13:50,170 is this, or this thing needs to be an addition function. 421 00:13:50,170 --> 00:13:52,360 These are the sort of things I wanted to do. 422 00:13:52,360 --> 00:13:53,735 But there are many different ways 423 00:13:53,735 --> 00:13:55,390 to implement that function. 424 00:13:55,390 --> 00:13:56,950 Some of them are better than others. 425 00:13:56,950 --> 00:13:58,180 That's the algorithmic level. 426 00:13:58,180 --> 00:14:01,440 How does this actually get implemented in an algorithm? 427 00:14:01,440 --> 00:14:02,940 Then you can implement the algorithm 428 00:14:02,940 --> 00:14:04,754 in many different mechanistic ways. 429 00:14:04,754 --> 00:14:07,170 You can go and implement it in neurons or in silicon chips 430 00:14:07,170 --> 00:14:07,830 or things like that. 431 00:14:07,830 --> 00:14:09,705 There are many different ways of implementing 432 00:14:09,705 --> 00:14:10,950 a particular algorithm. 433 00:14:10,950 --> 00:14:13,914 This view of, you know, you have some theories, 434 00:14:13,914 --> 00:14:15,330 you have some prior over theories, 435 00:14:15,330 --> 00:14:18,360 you shift the probabilities around as data comes in-- 436 00:14:18,360 --> 00:14:21,170 that's an explanation on the computational level. 437 00:14:21,170 --> 00:14:23,970 That's not an explanation of how we actually shift them around, 438 00:14:23,970 --> 00:14:26,640 how we actually search for these theories. 439 00:14:26,640 --> 00:14:29,010 And really the computational level 440 00:14:29,010 --> 00:14:31,417 sometimes gets grief, because people say, well, 441 00:14:31,417 --> 00:14:32,250 what are you saying? 
442 00:14:32,250 --> 00:14:33,750 Are you saying that Einstein somehow 443 00:14:33,750 --> 00:14:35,490 had the theory of relativity, we all 444 00:14:35,490 --> 00:14:37,470 had the theory of relativity, in our head? 445 00:14:37,470 --> 00:14:39,080 And his process of discovery was just 446 00:14:39,080 --> 00:14:40,920 to say, well, I believe in Newton's theory. 447 00:14:40,920 --> 00:14:44,070 And I had some low prior on relativity. 448 00:14:44,070 --> 00:14:47,160 But then the data came in, and I shifted my probability 449 00:14:47,160 --> 00:14:48,780 to the theory of relativity. 450 00:14:48,780 --> 00:14:50,826 That sounds not the way people actually learn. 451 00:14:50,826 --> 00:14:52,950 That doesn't sound like the way we discover things. 452 00:14:52,950 --> 00:14:55,260 That doesn't sound like the way we come up with new theories. 453 00:14:55,260 --> 00:14:57,420 That doesn't sound like the way that children learn. 454 00:14:57,420 --> 00:14:59,290 And as I said, that's not exactly what we think. 455 00:14:59,290 --> 00:15:01,290 And that's not what happens in computers either. 456 00:15:01,290 --> 00:15:03,090 That's not the algorithmic level. 457 00:15:03,090 --> 00:15:04,880 So what happens in the algorithmic level 458 00:15:04,880 --> 00:15:08,285 is you actually have to search for your theories, OK? 459 00:15:08,285 --> 00:15:10,160 And this is what happens in stochastic search 460 00:15:10,160 --> 00:15:10,900 in particular. 461 00:15:10,900 --> 00:15:12,080 What happens in stochastic search-- 462 00:15:12,080 --> 00:15:14,240 those of you who are in tutorial remember this-- 463 00:15:14,240 --> 00:15:15,780 you have some space of theories. 464 00:15:15,780 --> 00:15:19,540 OK, I'm still giving you the space of possible theories. 465 00:15:19,540 --> 00:15:21,740 Each dot in theory space is, let's say, a program.
466 00:15:21,740 --> 00:15:24,177 Or in this case, it's something like a theory in the sense 467 00:15:24,177 --> 00:15:25,010 I defined it before. 468 00:15:25,010 --> 00:15:28,370 It's a set of logical laws relating these predicates 469 00:15:28,370 --> 00:15:28,890 together. 470 00:15:28,890 --> 00:15:29,390 OK. 471 00:15:29,390 --> 00:15:31,142 So let me use-- 472 00:15:31,142 --> 00:15:32,600 I'm not sure you can follow my hand 473 00:15:32,600 --> 00:15:34,790 like this, right-- gaze detection or whatever. 474 00:15:34,790 --> 00:15:38,630 So theory A is, let's say, a theory that 475 00:15:38,630 --> 00:15:39,910 has three possible laws. 476 00:15:39,910 --> 00:15:43,460 It says if X is a P, P can be anything. 477 00:15:43,460 --> 00:15:46,160 And Y is a P. It's the same sort of thing. 478 00:15:46,160 --> 00:15:47,520 Then they're going to interact. 479 00:15:47,520 --> 00:15:51,069 The second law says if X is a P and Y is a Q, 480 00:15:51,069 --> 00:15:51,860 they will interact. 481 00:15:51,860 --> 00:15:52,490 It doesn't matter. 482 00:15:52,490 --> 00:15:53,960 You don't have to figure this theory out, right? 483 00:15:53,960 --> 00:15:56,501 I'm just trying to give you an example of what a theory could 484 00:15:56,501 --> 00:15:58,670 be theoretically. 485 00:15:58,670 --> 00:16:01,490 And they interact symmetrically. 486 00:16:01,490 --> 00:16:02,270 That's one dot. 487 00:16:02,270 --> 00:16:03,710 Let's call that dot number one. 488 00:16:03,710 --> 00:16:07,010 That's theory A. Here is dot number two, theory B. OK? 489 00:16:07,010 --> 00:16:09,689 You as the learner, as the stochastic search learner 490 00:16:09,689 --> 00:16:11,480 in the algorithmic level, don't have access 491 00:16:11,480 --> 00:16:12,521 to the full theory space. 492 00:16:12,521 --> 00:16:14,870 All you have is A. That's where you are right now. 493 00:16:14,870 --> 00:16:16,250 That's all you have of the world. 494 00:16:16,250 --> 00:16:16,750 OK.
495 00:16:16,750 --> 00:16:18,707 That's how you can try to explain the world. 496 00:16:18,707 --> 00:16:20,540 And what you can do is you can try proposing 497 00:16:20,540 --> 00:16:22,550 certain changes to your theory. 498 00:16:22,550 --> 00:16:24,740 You can try taking out a law or putting in a law 499 00:16:24,740 --> 00:16:27,080 or taking out a predicate, somehow sort of messing 500 00:16:27,080 --> 00:16:29,160 with your theory, hacking with your theory 501 00:16:29,160 --> 00:16:31,790 somehow, coming up with a new theory that 502 00:16:31,790 --> 00:16:34,310 gives you theory B. It's a different point 503 00:16:34,310 --> 00:16:35,511 in theory space. 504 00:16:35,511 --> 00:16:37,760 And what you do then is you compare your two theories. 505 00:16:37,760 --> 00:16:40,454 You say, how well does this predict the data? 506 00:16:40,454 --> 00:16:41,870 So and so, you give it some score. 507 00:16:41,870 --> 00:16:44,080 You say, how well does this theory predict the data? 508 00:16:44,080 --> 00:16:45,496 So and so, you give it some score. 509 00:16:45,496 --> 00:16:48,140 You say, how likely is the theory a priori? 510 00:16:48,140 --> 00:16:49,040 How short is it? 511 00:16:49,040 --> 00:16:50,110 How simple is it? 512 00:16:50,110 --> 00:16:50,690 OK. 513 00:16:50,690 --> 00:16:52,310 How short is this theory a priori? 514 00:16:52,310 --> 00:16:52,550 OK. 515 00:16:52,550 --> 00:16:54,110 So you get some score for this theory that's 516 00:16:54,110 --> 00:16:55,514 based on the data and the prior. 517 00:16:55,514 --> 00:16:56,930 You get some score for this theory 518 00:16:56,930 --> 00:16:58,220 based on the data and the prior. 519 00:16:58,220 --> 00:17:00,553 And you basically decide which one of these two theories 520 00:17:00,553 --> 00:17:01,310 to accept. 521 00:17:01,310 --> 00:17:03,590 You could either stay with your old theory 522 00:17:03,590 --> 00:17:05,124 before you proposed any changes.
523 00:17:05,124 --> 00:17:06,790 Or you could decide that, wait a minute, 524 00:17:06,790 --> 00:17:08,960 that theory that I just proposed, the new one, 525 00:17:08,960 --> 00:17:10,102 is actually a bit better. 526 00:17:10,102 --> 00:17:12,560 Or even if it's not better, it's not doing that much worse, 527 00:17:12,560 --> 00:17:13,450 so I'll jump to it. 528 00:17:13,450 --> 00:17:13,950 OK. 529 00:17:13,950 --> 00:17:15,970 That's the stochastic part in stochastic search 530 00:17:15,970 --> 00:17:18,770 or in, like, these Metropolis-Hastings algorithms. 531 00:17:18,770 --> 00:17:21,770 This way of sort of jumping around in the space 532 00:17:21,770 --> 00:17:23,599 is the stochastic search I'm talking about. 533 00:17:23,599 --> 00:17:25,099 You're here in theory space. 534 00:17:25,099 --> 00:17:26,720 You have one theory. 535 00:17:26,720 --> 00:17:28,339 You propose a change to that theory. 536 00:17:28,339 --> 00:17:29,840 And we'll get in a second to how you propose 537 00:17:29,840 --> 00:17:30,839 a change to that theory. 538 00:17:30,839 --> 00:17:31,885 You propose a change. 539 00:17:31,885 --> 00:17:32,510 You end up here. 540 00:17:32,510 --> 00:17:34,010 You say, should I move here? 541 00:17:34,010 --> 00:17:34,610 Let's check. 542 00:17:34,610 --> 00:17:36,050 Is this theory doing any better? 543 00:17:36,050 --> 00:17:37,880 If it's doing better, move there. 544 00:17:37,880 --> 00:17:41,690 If it's doing worse, well, maybe still move there 545 00:17:41,690 --> 00:17:43,940 depending on how much worse it's doing. 546 00:17:43,940 --> 00:17:46,940 And this process is sort of jumping around in theory space. 547 00:17:46,940 --> 00:17:49,550 Probabilistically proposing theories and accepting them 548 00:17:49,550 --> 00:17:55,240 is not that far from what MCMC is doing or optimization 549 00:17:55,240 --> 00:17:57,900 through MCMC. 550 00:17:57,900 --> 00:17:58,670 Oh, sorry. 551 00:17:58,670 --> 00:17:59,970 I'm clicking on this.
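[Editor's sketch, not from the lecture.] The propose, score, and accept-or-stay loop described above can be written down as a small Metropolis-style search. Everything named here is an assumption made for illustration: a "theory" is a set of laws saying which kinds of object interact, the score is a simplicity prior plus a noisy fit to the data, and the proposal toggles one law at a time (a symmetric move, so no proposal correction is needed). It is a toy under those assumptions, not the model used in the speaker's papers.

```python
import math
import random

# Hypothetical toy world: a law says that two kinds of object interact.
KINDS = ["magnet", "metal", "plastic"]
ALL_LAWS = [(a, b) for i, a in enumerate(KINDS) for b in KINDS[i:]]


def log_score(theory, data, noise=0.05, length_penalty=1.0):
    """Unnormalized log posterior: shorter theories get a higher prior,
    and each observation the theory predicts correctly raises the fit."""
    log_prior = -length_penalty * len(theory)
    log_likelihood = 0.0
    for a, b, interacted in data:
        law = (a, b) if (a, b) in ALL_LAWS else (b, a)
        predicted = law in theory
        log_likelihood += math.log(1 - noise if predicted == interacted else noise)
    return log_prior + log_likelihood


def propose(theory, rng):
    """Add or delete one randomly chosen law (toggle its membership)."""
    law = rng.choice(ALL_LAWS)
    return theory ^ {law}


def stochastic_search(data, steps=2000, seed=0):
    """Jump around theory space; return the best-scoring theory visited."""
    rng = random.Random(seed)
    current = frozenset()
    current_score = log_score(current, data)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = propose(current, rng)
        candidate_score = log_score(candidate, data)
        # Better theories are always accepted; worse ones are accepted
        # with a probability that shrinks as the score gap grows.
        accept_prob = math.exp(min(0.0, candidate_score - current_score))
        if rng.random() < accept_prob:
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
    return best


# Made-up observations: magnets attract magnets and metals, nothing else interacts.
TRUE_LAWS = {("magnet", "magnet"), ("magnet", "metal")}
DATA = [(a, b, (a, b) in TRUE_LAWS) for (a, b) in ALL_LAWS for _ in range(5)]
```

Calling `stochastic_search(DATA)` starts from the empty theory and only ever looks at the current theory and one proposed neighbor, which is the sense in which the learner "doesn't have access to the full theory space".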
552 00:17:59,970 --> 00:18:00,470 OK. 553 00:18:00,470 --> 00:18:02,570 And you can notice that this sort of search is somewhat 554 00:18:02,570 --> 00:18:04,986 different from-- this is the picture that Josh was showing 555 00:18:04,986 --> 00:18:06,080 before-- 556 00:18:06,080 --> 00:18:08,294 gradient descent or convex optimization, 557 00:18:08,294 --> 00:18:10,460 the sort of thing that you might do in neural networks. 558 00:18:10,460 --> 00:18:12,680 I'm not trying to suggest that this is exactly what happens 559 00:18:12,680 --> 00:18:13,430 in neural networks. 560 00:18:13,430 --> 00:18:15,513 I mean, the more complicated neural network stuff, 561 00:18:15,513 --> 00:18:18,620 the energy landscape, can be quite complicated. 562 00:18:18,620 --> 00:18:20,990 And they still need to do stochastic gradient descent. 563 00:18:20,990 --> 00:18:23,870 But it doesn't look as fully connected and as horrible 564 00:18:23,870 --> 00:18:27,230 as these sort of theories on top look like. 565 00:18:27,230 --> 00:18:29,765 Because the space there is much more-- 566 00:18:29,765 --> 00:18:31,140 I don't want to say well-defined. 567 00:18:31,140 --> 00:18:32,100 They're both well-defined. 568 00:18:32,100 --> 00:18:34,370 But it's sort of much easier to search through, right? 569 00:18:34,370 --> 00:18:36,800 It's sort of easy to get in these neural networks 570 00:18:36,800 --> 00:18:38,990 to know where you're going to sort of differentiate 571 00:18:38,990 --> 00:18:40,840 and quickly get to your target. 572 00:18:40,840 --> 00:18:42,320 You're not so much doing this sort 573 00:18:42,320 --> 00:18:44,247 of hard laborious stochastic search. 574 00:18:44,247 --> 00:18:45,830 You sort of have this notion of, well, 575 00:18:45,830 --> 00:18:47,420 it's immediately going to be the best 576 00:18:47,420 --> 00:18:49,210 direction to go in is this. 577 00:18:49,210 --> 00:18:50,049 OK? 578 00:18:50,049 --> 00:18:51,590 I'm just going to do gradient descent.
579 00:18:51,590 --> 00:18:53,060 I'm going to roll downhill. 580 00:18:53,060 --> 00:18:56,540 And then that's some sort of good point in neural network 581 00:18:56,540 --> 00:18:57,190 land. 582 00:18:57,190 --> 00:18:57,895 OK. 583 00:18:57,895 --> 00:19:00,020 So how do we actually propose alternative theories? 584 00:19:00,020 --> 00:19:03,970 Well, you say, look, I have some sort of, let's say-- 585 00:19:03,970 --> 00:19:05,150 let's do this. 586 00:19:05,150 --> 00:19:05,720 OK. 587 00:19:05,720 --> 00:19:07,310 You have some sort of particular rule. 588 00:19:07,310 --> 00:19:08,809 And then what you do-- remember when 589 00:19:08,809 --> 00:19:10,934 I said you have some sort of grammar over theories, 590 00:19:10,934 --> 00:19:13,058 kind of like a grammar in language for those of you 591 00:19:13,058 --> 00:19:13,700 who know? 592 00:19:13,700 --> 00:19:15,790 The grammar describes a particular tree 593 00:19:15,790 --> 00:19:17,990 that you walk through to end up with your theory. 594 00:19:17,990 --> 00:19:20,120 What you do to propose a change to it-- and this 595 00:19:20,120 --> 00:19:21,786 is true for both these logical theories, 596 00:19:21,786 --> 00:19:23,120 but also for programs. 597 00:19:23,120 --> 00:19:24,410 You go to any [INAUDIBLE]. 598 00:19:24,410 --> 00:19:27,160 You go to any sort of node in that tree that 599 00:19:27,160 --> 00:19:29,450 generated your program or generated your theory, 600 00:19:29,450 --> 00:19:30,200 and you change it. 601 00:19:30,200 --> 00:19:33,020 You sort of re-sample from it or, you know, cut it 602 00:19:33,020 --> 00:19:34,380 and regenerate. 
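[Editor's sketch, not from the lecture.] The "go to any node in the tree, cut it and regenerate" move can be illustrated with a tiny generative grammar. The grammar, the symbol names, and the helper functions below are all hypothetical; the lecture's actual grammar over theories is richer. The sketch only covers the proposal step, and a full sampler would still need to score the new theory and account for proposals that are not symmetric.

```python
import random

# Hypothetical grammar: a theory is one or more laws, and each law
# relates two predicates drawn from a small fixed vocabulary.
GRAMMAR = {
    "Theory": [["Law"], ["Law", "Theory"]],
    "Law": [["Pred", "Pred"]],
    "Pred": [["p"], ["q"], ["r"]],
}


def generate(symbol, rng):
    """Expand a symbol into a derivation tree of (symbol, children) pairs."""
    if symbol not in GRAMMAR:
        return (symbol, [])  # terminal symbol
    expansion = rng.choice(GRAMMAR[symbol])
    return (symbol, [generate(s, rng) for s in expansion])


def nonterminal_paths(tree, path=()):
    """Yield the path (sequence of child indices) to every nonterminal node."""
    symbol, children = tree
    if symbol in GRAMMAR:
        yield path
    for i, child in enumerate(children):
        yield from nonterminal_paths(child, path + (i,))


def regenerate_at(tree, path, rng):
    """Return a copy of the tree with the subtree at `path` re-sampled."""
    symbol, children = tree
    if not path:
        return generate(symbol, rng)
    new_children = list(children)
    new_children[path[0]] = regenerate_at(children[path[0]], path[1:], rng)
    return (symbol, new_children)


def propose(tree, rng):
    """Pick a random node in the derivation tree and regenerate below it."""
    path = rng.choice(list(nonterminal_paths(tree)))
    return regenerate_at(tree, path, rng)


def laws(tree):
    """Read the theory off the tree as a list of (predicate, predicate) laws."""
    symbol, children = tree
    if symbol == "Law":
        return [tuple(pred[1][0][0] for pred in children)]
    return [law for child in children for law in laws(child)]
```

Regenerating a `Pred` node changes a predicate, regenerating a `Law` node rewrites a whole law, and regenerating a `Theory` node can add or delete laws, so one move type covers the kinds of edits listed next.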
603 00:19:34,380 --> 00:19:38,040 And then what that ends up doing is it ends up, for example, 604 00:19:38,040 --> 00:19:40,470 adding a new rule or, for example, 605 00:19:40,470 --> 00:19:43,490 changing the predicate, or adding a new rule again, 606 00:19:43,490 --> 00:19:45,920 or deleting a rule, or deleting a predicate, 607 00:19:45,920 --> 00:19:48,140 or changing a predicate, deleting a predicate by sort 608 00:19:48,140 --> 00:19:50,640 of changing predicates, deleting rules, adding rules, adding 609 00:19:50,640 --> 00:19:51,650 predicates. 610 00:19:51,650 --> 00:19:54,490 I wasn't covering the full space. 611 00:19:54,490 --> 00:19:58,740 You jump around in theory land. 612 00:19:58,740 --> 00:19:59,240 OK. 613 00:19:59,240 --> 00:20:04,150 And this sort of dynamic of moving around in theory land 614 00:20:04,150 --> 00:20:05,440 stochastically-- 615 00:20:05,440 --> 00:20:09,074 one step forward, two steps back, you 616 00:20:09,074 --> 00:20:10,990 know, you don't know the target ahead of time, 617 00:20:10,990 --> 00:20:12,490 when you actually propose something, 618 00:20:12,490 --> 00:20:13,810 you proposed something new-- 619 00:20:13,810 --> 00:20:16,450 different learners might take different paths 620 00:20:16,450 --> 00:20:18,460 to get to the same target ultimately starting 621 00:20:18,460 --> 00:20:20,296 from an equal state of ignorance. 622 00:20:20,296 --> 00:20:21,920 You sort of propose different theories. 623 00:20:21,920 --> 00:20:22,780 You end up in different ways. 624 00:20:22,780 --> 00:20:24,405 But usually, if you get the right data, 625 00:20:24,405 --> 00:20:26,560 eventually you end up in the same spot. 626 00:20:26,560 --> 00:20:28,541 These discrete moments of sort of, you 627 00:20:28,541 --> 00:20:30,415 know, faffing about and sort of saying, well, 628 00:20:30,415 --> 00:20:31,910 this rule, that rule, that doesn't explain it. 629 00:20:31,910 --> 00:20:32,890 This explains it, doesn't explain it. 
630 00:20:32,890 --> 00:20:35,410 Suddenly, you propose something that sort of clicks into place. 631 00:20:35,410 --> 00:20:36,790 You say, ah, it's not that there's 632 00:20:36,790 --> 00:20:37,789 two things in the world. 633 00:20:37,789 --> 00:20:39,550 There's three things in the world, right? 634 00:20:39,550 --> 00:20:42,580 It's not just metals and plastics. 635 00:20:42,580 --> 00:20:44,760 There's actually metals and magnets and plastics. 636 00:20:44,760 --> 00:20:45,260 Ah. 637 00:20:45,260 --> 00:20:47,530 And suddenly, you rearrange the data and things like that, 638 00:20:47,530 --> 00:20:48,030 right? 639 00:20:48,030 --> 00:20:49,690 All these things, all these dynamics, 640 00:20:49,690 --> 00:20:52,090 have something of the flavor of children's learning. 641 00:20:52,090 --> 00:20:53,920 And I didn't give you the full spiel. 642 00:20:53,920 --> 00:20:56,204 And you can go and read some of our papers. 643 00:20:56,204 --> 00:20:57,620 There's also been other people who 644 00:20:57,620 --> 00:20:58,995 have been very interested in sort 645 00:20:58,995 --> 00:21:01,360 of looking at the dynamics of stochastic search, 646 00:21:01,360 --> 00:21:03,670 how well it predicts what children are doing. 647 00:21:03,670 --> 00:21:05,410 Among them, I'll just mention people 648 00:21:05,410 --> 00:21:07,690 like Liz Bonawitz and Stephanie Denison, 649 00:21:07,690 --> 00:21:10,840 and also Tom Griffiths, and finally Alison Gopnik. 650 00:21:10,840 --> 00:21:14,200 So I also have a TED speaker on my side. 651 00:21:14,200 --> 00:21:19,380 And not just any TED speaker-- it's Laura's former advisor. 652 00:21:19,380 --> 00:21:22,990 So at that point, it's a good point to hand it off to Laura 653 00:21:22,990 --> 00:21:24,900 and see what she has to say about this. 654 00:21:24,900 --> 00:21:27,752 But a midpoint summary is to say theories are useful, right? 655 00:21:27,752 --> 00:21:29,710 I think Josh has already convinced you of that.
656 00:21:29,710 --> 00:21:31,090 We want theories. 657 00:21:31,090 --> 00:21:33,340 The problem is that they define these rich structured 658 00:21:33,340 --> 00:21:35,530 complicated landscapes much more rich, much more 659 00:21:35,530 --> 00:21:36,850 complicated than anything that you might 660 00:21:36,850 --> 00:21:37,974 find in the neural network. 661 00:21:37,974 --> 00:21:39,840 Well, yes. 662 00:21:39,840 --> 00:21:43,030 And it's hard to search for these rich, complicated, fully 663 00:21:43,030 --> 00:21:45,850 connected landscapes, like the space of all programs 664 00:21:45,850 --> 00:21:48,975 or the space of all possible logic theories. 665 00:21:48,975 --> 00:21:50,350 And the way to sort through it is 666 00:21:50,350 --> 00:21:53,320 stochastic search, which can be horribly slow and wrong 667 00:21:53,320 --> 00:21:54,760 and things like that. 668 00:21:54,760 --> 00:21:57,610 But the claim is something like, what are you 669 00:21:57,610 --> 00:21:58,660 going to do, right? 670 00:21:58,660 --> 00:22:00,220 I mean, we want these rich theories. 671 00:22:00,220 --> 00:22:01,780 Rich theories define rich landscapes. 672 00:22:01,780 --> 00:22:03,520 And you just have to get away right now 673 00:22:03,520 --> 00:22:04,710 with stochastic search. 674 00:22:04,710 --> 00:22:06,670 Our algorithmic solution for these spaces 675 00:22:06,670 --> 00:22:08,680 is stochastic search in that rich landscape. 676 00:22:08,680 --> 00:22:11,020 And, well, why shouldn't that apply 677 00:22:11,020 --> 00:22:13,480 to what children are doing? 678 00:22:13,480 --> 00:22:15,880 So here with a why not is Laura. 679 00:22:19,065 --> 00:22:20,786 LAURA SCHULZ: I think I have one. 680 00:22:20,786 --> 00:22:22,340 I'm hooked up, right? 681 00:22:22,340 --> 00:22:24,230 OK.
682 00:22:24,230 --> 00:22:27,350 So when I first engaged in this debate 683 00:22:27,350 --> 00:22:30,890 with Tomer, I was stuck on this kind of a thought 684 00:22:30,890 --> 00:22:33,860 that what's going to happen here is-- in which, 685 00:22:33,860 --> 00:22:37,190 following an eloquent exposition of a formal model, 686 00:22:37,190 --> 00:22:39,650 attendant experiments, and quantitative data, 687 00:22:39,650 --> 00:22:43,490 Laura proceeds to wave her hands around. 688 00:22:43,490 --> 00:22:45,490 Someone was asking about intuitions. 689 00:22:45,490 --> 00:22:47,660 And I just had this strong compelling intuition 690 00:22:47,660 --> 00:22:49,460 this was completely wrong. 691 00:22:49,460 --> 00:22:51,380 But mainly, I was puzzled as to why I was 692 00:22:51,380 --> 00:22:52,880 stuck on this archaic locution. 693 00:22:52,880 --> 00:22:55,574 What was I doing with this "in which" situation? 694 00:22:55,574 --> 00:22:56,990 And it occurred to me the reason I 695 00:22:56,990 --> 00:23:02,750 was stuck on this particular locution is that I was thinking 696 00:23:02,750 --> 00:23:05,960 of a particular story that nicely illustrates 697 00:23:05,960 --> 00:23:08,920 exactly what I think the problem is with stochastic search. 698 00:23:08,920 --> 00:23:11,260 It's from a classic in developmental literature, 699 00:23:11,260 --> 00:23:14,360 the stories of, of course, A. A. Milne and Winnie the Pooh. 700 00:23:14,360 --> 00:23:17,270 And this is the particular problem of "in which" 701 00:23:17,270 --> 00:23:21,170 Christopher Robin and Pooh and all go on an expedition 702 00:23:21,170 --> 00:23:22,820 to the North Pole. 703 00:23:22,820 --> 00:23:24,410 And so they're organizing a search. 704 00:23:24,410 --> 00:23:27,740 And like Tomer's algorithms, they don't actually 705 00:23:27,740 --> 00:23:29,510 know where the North Pole is.
706 00:23:29,510 --> 00:23:33,365 And it turns out, as Christopher Robin confides in a moment 707 00:23:33,365 --> 00:23:36,230 to Rabbit, they also don't know what the North Pole is, also 708 00:23:36,230 --> 00:23:37,864 like Tomer's algorithms. 709 00:23:37,864 --> 00:23:39,530 But they do know something about the terrain. 710 00:23:39,530 --> 00:23:41,071 And they know something about search. 711 00:23:41,071 --> 00:23:43,100 You gather all the friends and relations. 712 00:23:43,100 --> 00:23:45,350 And you engage in an iterative process over and over 713 00:23:45,350 --> 00:23:46,760 and over again until you succeed, 714 00:23:46,760 --> 00:23:48,650 and you find yourself at the North Pole. 715 00:23:48,650 --> 00:23:51,320 And Eeyore is a little bit skeptical about this. 716 00:23:51,320 --> 00:23:53,370 He says, well, you can call this, you know, 717 00:23:53,370 --> 00:23:57,650 expo-whatever or gathering nuts in May. 718 00:23:57,650 --> 00:23:59,815 It's all the same to him. 719 00:23:59,815 --> 00:24:01,190 And at the end of the day, I actually 720 00:24:01,190 --> 00:24:03,200 think Christopher Robin and colleagues here 721 00:24:03,200 --> 00:24:06,890 have a certain set of advantages over Tomer and colleagues. 722 00:24:06,890 --> 00:24:08,930 And that is that it is a Hundred Acre Wood. 723 00:24:08,930 --> 00:24:10,620 And so if the North Pole's really there, 724 00:24:10,620 --> 00:24:11,840 and they engage in that process, and they 725 00:24:11,840 --> 00:24:13,590 have all of Rabbit's friends and relations, 726 00:24:13,590 --> 00:24:15,650 they probably will find it. 727 00:24:15,650 --> 00:24:20,000 But it also turns out that, unlike Tomer and colleagues, 728 00:24:20,000 --> 00:24:23,360 they do know something about what they're searching for. 729 00:24:23,360 --> 00:24:25,230 Now, they're wrong about this.
730 00:24:25,230 --> 00:24:28,280 But nonetheless, I think they access an important constraint 731 00:24:28,280 --> 00:24:30,786 on the search process and it actually 732 00:24:30,786 --> 00:24:31,910 helps them out quite a lot. 733 00:24:31,910 --> 00:24:37,200 So that's what I'm going to try to talk to you about today. 734 00:24:37,200 --> 00:24:39,350 So here are the issues with stochastic search. 735 00:24:39,350 --> 00:24:41,600 I think there are two big ones. 736 00:24:41,600 --> 00:24:43,580 Grant everything-- grant grammar, 737 00:24:43,580 --> 00:24:45,530 grant prior knowledge, grant templates, 738 00:24:45,530 --> 00:24:47,810 grant a bias towards simplicity. 739 00:24:47,810 --> 00:24:50,270 The problem with an infinite search space 740 00:24:50,270 --> 00:24:54,650 is infinite is a very, very, very big space to search. 741 00:24:54,650 --> 00:24:56,640 It's a very big space to search. 742 00:24:56,640 --> 00:25:01,200 And children seem to do remarkably well with it. 743 00:25:01,200 --> 00:25:04,040 So I'm going to, I guess, give you a toy fictional example. 744 00:25:04,040 --> 00:25:09,140 I'm going to give you a non-fiction example 745 00:25:09,140 --> 00:25:09,830 from my child. 746 00:25:09,830 --> 00:25:11,930 We were riding on an airplane. 747 00:25:11,930 --> 00:25:14,000 She was about 3 and 1/2 at the time. 748 00:25:14,000 --> 00:25:16,210 And she knows a lot about airplanes. 749 00:25:16,210 --> 00:25:18,650 She has a lot of folk physics, prior knowledge 750 00:25:18,650 --> 00:25:20,540 about airplanes. 751 00:25:20,540 --> 00:25:22,330 And she also knows a lot about phones. 752 00:25:22,330 --> 00:25:23,996 She's had a lot of experience of phones. 753 00:25:23,996 --> 00:25:25,520 But nothing in her prior knowledge 754 00:25:25,520 --> 00:25:26,780 predicted the announcement that you 755 00:25:26,780 --> 00:25:29,330 have to turn off your cell phone when you fly a plane, right? 756 00:25:29,330 --> 00:25:30,440 So this was surprising.
757 00:25:30,440 --> 00:25:32,523 And she immediately said, well, I know the answer. 758 00:25:32,523 --> 00:25:34,440 I know why you have to do that. 759 00:25:34,440 --> 00:25:36,170 And I know that she doesn't know anything 760 00:25:36,170 --> 00:25:38,315 about radio transmission or government bureaucracy. 761 00:25:38,315 --> 00:25:40,340 So I said, how do you know? 762 00:25:40,340 --> 00:25:41,240 Why do you think? 763 00:25:41,240 --> 00:25:45,300 And she said, well, because when the plane takes off, 764 00:25:45,300 --> 00:25:48,790 it's too noisy to hear. 765 00:25:48,790 --> 00:25:51,680 You know, that example is not especially clever 766 00:25:51,680 --> 00:25:54,050 or especially adorable. 767 00:25:54,050 --> 00:25:56,994 But what is really, really interesting about that answer 768 00:25:56,994 --> 00:25:58,910 is that although it is wrong-- it's even wrong 769 00:25:58,910 --> 00:26:01,520 as to the causal direction-- 770 00:26:01,520 --> 00:26:04,040 it is a good wrong answer compared 771 00:26:04,040 --> 00:26:05,960 to all of the other things that are 772 00:26:05,960 --> 00:26:08,240 consistent with her prior knowledge and the grammar 773 00:26:08,240 --> 00:26:11,630 of her intuitive theories that she didn't say. 774 00:26:11,630 --> 00:26:13,890 She didn't say, because airplanes are made of metal 775 00:26:13,890 --> 00:26:14,889 and so are phones. 776 00:26:14,889 --> 00:26:16,430 Because airplanes fly over the Earth, 777 00:26:16,430 --> 00:26:17,510 and the Earth has phones. 778 00:26:17,510 --> 00:26:19,550 Because airplanes are big and phones are small. 779 00:26:19,550 --> 00:26:21,592 Because airplanes-- her grandfather lives in Ohio 780 00:26:21,592 --> 00:26:24,050 and has led her to believe that everything is made in Ohio. 781 00:26:24,050 --> 00:26:26,200 Because airplanes and phones are both made in Ohio. 782 00:26:26,200 --> 00:26:28,610 Infinite is a very big space. 
783 00:26:28,610 --> 00:26:30,080 There are a lot of things that you 784 00:26:30,080 --> 00:26:32,864 could say consistent with prior knowledge 785 00:26:32,864 --> 00:26:34,280 where you're making random changes 786 00:26:34,280 --> 00:26:38,330 in your intuitive theories that are not even wrong. 787 00:26:38,330 --> 00:26:39,860 They're not even wrong. 788 00:26:39,860 --> 00:26:41,900 And so the real question is, how did she 789 00:26:41,900 --> 00:26:45,260 converge at a good wrong answer, at an answer 790 00:26:45,260 --> 00:26:48,130 that, although it is wrong, makes sense? 791 00:26:48,130 --> 00:26:53,420 It isn't just, I think, a toy problem. 792 00:26:53,420 --> 00:26:55,130 But here is the problem. 793 00:26:55,130 --> 00:26:57,110 There are innumerable logical constitutive 794 00:26:57,110 --> 00:26:59,900 causal and relational hypotheses consistent with the grammar 795 00:26:59,900 --> 00:27:01,200 of intuitive theories. 796 00:27:01,200 --> 00:27:03,650 How do we so rapidly, literally between the announcement 797 00:27:03,650 --> 00:27:05,191 and the next thing out of our mouths, 798 00:27:05,191 --> 00:27:08,240 converge on ones that, if they were true, 799 00:27:08,240 --> 00:27:09,542 might explain the data? 800 00:27:09,542 --> 00:27:10,500 They might not be true. 801 00:27:10,500 --> 00:27:12,710 But if they were, they could work. 802 00:27:12,710 --> 00:27:14,300 They could solve problems. 803 00:27:14,300 --> 00:27:17,300 And that I think is the really hard mystery. 804 00:27:17,300 --> 00:27:20,830 And again, it's not just a toy problem. 805 00:27:20,830 --> 00:27:24,380 Modeling even relatively simple well-understood problems 806 00:27:24,380 --> 00:27:24,920 takes time. 807 00:27:24,920 --> 00:27:26,977 I would often come across Josh's students 808 00:27:26,977 --> 00:27:28,060 wandering in the hallways. 809 00:27:28,060 --> 00:27:28,930 And I say, what are you doing? 
810 00:27:28,930 --> 00:27:31,175 They're like, oh, I'm waiting for my model to run. 811 00:27:31,175 --> 00:27:32,550 I'm like, waiting for your model? 812 00:27:32,550 --> 00:27:33,930 Computers are really fast. 813 00:27:33,930 --> 00:27:35,810 They're fast information processing. 814 00:27:35,810 --> 00:27:37,016 What is it doing? 815 00:27:37,016 --> 00:27:39,140 Well, it turns out it's generating a lot of answers 816 00:27:39,140 --> 00:27:40,460 that aren't even wrong. 817 00:27:40,460 --> 00:27:41,970 That's what it's doing, right? 818 00:27:41,970 --> 00:27:44,360 It's spending a lot of time sitting 819 00:27:44,360 --> 00:27:48,140 around sifting through things that aren't even right. 820 00:27:48,140 --> 00:27:50,859 Iterations are spent searching in hopeless places. 821 00:27:50,859 --> 00:27:53,150 And this is true of some of the best and the brightest. 822 00:27:53,150 --> 00:27:56,920 This was like a fantastic NIPS paper, a major advance. 823 00:27:56,920 --> 00:28:00,110 It's a probabilistic graphics program. 824 00:28:00,110 --> 00:28:04,005 And it's solving the really deep theory problem of CAPTCHAs. 825 00:28:04,005 --> 00:28:05,920 So it's trying to figure out in this case-- 826 00:28:05,920 --> 00:28:07,814 but it doesn't just do-- in fairness-- well, 827 00:28:07,814 --> 00:28:08,730 let me return to that. 828 00:28:08,730 --> 00:28:09,950 Let me show you what it does do. 829 00:28:09,950 --> 00:28:11,241 There's a rectangle over there. 830 00:28:11,241 --> 00:28:14,300 It wants to be able to figure out what is in that space. 831 00:28:14,300 --> 00:28:15,890 And it wants to model it. 832 00:28:15,890 --> 00:28:18,320 It wants to find a rectangle in the lower left-hand corner 833 00:28:18,320 --> 00:28:18,950 of a scene. 834 00:28:18,950 --> 00:28:19,580 So what does it do? 835 00:28:19,580 --> 00:28:21,110 It generates pixels all over the map 836 00:28:21,110 --> 00:28:22,318 until it finds the rectangle. 
837 00:28:22,318 --> 00:28:24,450 And I saw this in [INAUDIBLE]. 838 00:28:24,450 --> 00:28:26,164 And I said, why is it looking all-- 839 00:28:26,164 --> 00:28:27,580 can it at least confine its search 840 00:28:27,580 --> 00:28:28,790 to the lower left-hand corner? 841 00:28:28,790 --> 00:28:30,540 But, of course, the algorithm doesn't know 842 00:28:30,540 --> 00:28:31,820 from lower left-hand corners. 843 00:28:31,820 --> 00:28:33,266 It's a powerful algorithm. 844 00:28:33,266 --> 00:28:34,640 If you wanted to solve a CAPTCHA, 845 00:28:34,640 --> 00:28:36,140 you could do-- see look at it. 846 00:28:36,140 --> 00:28:37,140 It's all over the place. 847 00:28:37,140 --> 00:28:39,530 Now, this is a virtue. 848 00:28:39,530 --> 00:28:40,680 And, yeah, it converges. 849 00:28:40,680 --> 00:28:42,200 And that's just great, right? 850 00:28:42,200 --> 00:28:45,092 But why doesn't it search in the lower left-hand corner? 851 00:28:45,092 --> 00:28:46,550 Well, the answer is it doesn't know 852 00:28:46,550 --> 00:28:47,440 from lower left-hand corners. 853 00:28:47,440 --> 00:28:50,000 And it's a feature, not a bug, that it doesn't know from this. 854 00:28:50,000 --> 00:28:51,140 Because that means it doesn't just 855 00:28:51,140 --> 00:28:53,348 solve CAPTCHAs, which you can do with edge detection. 856 00:28:53,348 --> 00:28:55,370 It can find, you know, objects in the road. 857 00:28:55,370 --> 00:28:57,152 And it can do all kinds of other things 858 00:28:57,152 --> 00:28:58,610 that I'm sure Josh and Vikash would 859 00:28:58,610 --> 00:28:59,859 be happy to talk to you about. 860 00:28:59,859 --> 00:29:01,250 It's a pretty general thing. 861 00:29:01,250 --> 00:29:02,420 But it's not constrained. 862 00:29:02,420 --> 00:29:03,050 And that's a feature. 863 00:29:03,050 --> 00:29:04,500 But the interesting thing about humans, 864 00:29:04,500 --> 00:29:06,041 including human children, is they are 865 00:29:06,041 --> 00:29:07,524 both flexible and constrained.
866 00:29:07,524 --> 00:29:09,440 They can both solve a whole bunch of problems, 867 00:29:09,440 --> 00:29:11,900 and they can converge on them very quickly, right? 868 00:29:11,900 --> 00:29:14,930 Whereas, here, it trades off. 869 00:29:22,020 --> 00:29:23,870 And, again, that is a simpler problem. 870 00:29:23,870 --> 00:29:25,670 That is a square and a bunch of pixels. 871 00:29:25,670 --> 00:29:27,230 The kind of learning we're talking about when we're 872 00:29:27,230 --> 00:29:29,396 trying to talk about theory generation or real world 873 00:29:29,396 --> 00:29:32,480 learning is a really, really, really, really big space. 874 00:29:32,480 --> 00:29:34,190 So one problem is just how are you 875 00:29:34,190 --> 00:29:37,310 going to get at least to answers that, if they were true, 876 00:29:37,310 --> 00:29:38,960 might work? 877 00:29:38,960 --> 00:29:42,230 But if one problem is that the theory space is really big, 878 00:29:42,230 --> 00:29:45,440 the other problem is that human learners are not that dumb. 879 00:29:45,440 --> 00:29:46,780 We have a lot of knowledge. 880 00:29:46,780 --> 00:29:49,280 And we have a lot of knowledge that these algorithms are not 881 00:29:49,280 --> 00:29:51,230 making use of. 882 00:29:51,230 --> 00:29:53,170 And the question is, why not? 883 00:29:53,170 --> 00:29:54,920 And is there any way that we could develop 884 00:29:54,920 --> 00:29:57,200 models that did make use of it? 885 00:29:57,200 --> 00:30:00,650 In particular, we know a lot about our problems. 886 00:30:00,650 --> 00:30:02,420 Our problems are actually our friends. 887 00:30:02,420 --> 00:30:04,190 We know about our problems and our goals. 888 00:30:04,190 --> 00:30:07,220 And we know about our problems well before we 889 00:30:07,220 --> 00:30:10,250 can solve those problems. 
890 00:30:10,250 --> 00:30:13,670 An abstract representation of what the solution might 891 00:30:13,670 --> 00:30:16,160 look like, what it ought to do, what 892 00:30:16,160 --> 00:30:18,560 the criteria it's trying to satisfy are, 893 00:30:18,560 --> 00:30:20,960 could help constrain and guide the search. 894 00:30:20,960 --> 00:30:22,520 It matters about it though, not just 895 00:30:22,520 --> 00:30:24,860 that she had prior knowledge about airplanes and about 896 00:30:24,860 --> 00:30:27,050 phones, but that she had prior knowledge 897 00:30:27,050 --> 00:30:28,890 that the problem she was solving was 898 00:30:28,890 --> 00:30:32,390 an unpredicted incompatibility between airplanes and phones. 899 00:30:32,390 --> 00:30:35,060 You have to turn one off when the other is going on. 900 00:30:35,060 --> 00:30:37,475 That's information that's not in her general background 901 00:30:37,475 --> 00:30:37,975 knowledge. 902 00:30:37,975 --> 00:30:41,066 It's about the particular problem that she has. 903 00:30:41,066 --> 00:30:42,440 And the question is how could you 904 00:30:42,440 --> 00:30:49,820 use that to make good proposals and make better proposals? 905 00:30:49,820 --> 00:30:53,300 So the proposal I have is that when 906 00:30:53,300 --> 00:30:56,359 you know something about what you're looking for, 907 00:30:56,359 --> 00:30:57,650 then that can help you find it. 908 00:30:57,650 --> 00:30:58,640 And this is the kind of knowledge 909 00:30:58,640 --> 00:30:59,900 that Christopher Robin and colleagues 910 00:30:59,900 --> 00:31:01,490 had that Tomer and colleagues did not. 911 00:31:01,490 --> 00:31:04,730 They at least eventually decided that what they were looking for 912 00:31:04,730 --> 00:31:05,810 was, of course, a pole. 913 00:31:05,810 --> 00:31:06,980 They know what poles are. 914 00:31:06,980 --> 00:31:08,090 Therefore, when they find a pole, 915 00:31:08,090 --> 00:31:10,048 they can be quite confident that here they are. 
916 00:31:10,048 --> 00:31:12,500 And this is a good candidate solution to their problem. 917 00:31:12,500 --> 00:31:16,830 That solution is wrong, but at least it's wrong. 918 00:31:19,700 --> 00:31:20,320 OK. 919 00:31:20,320 --> 00:31:24,780 So the argument here, which I'm going 920 00:31:24,780 --> 00:31:26,880 to try to make slightly more precise, 921 00:31:26,880 --> 00:31:29,940 is that the form of the problem as an input to the algorithm 922 00:31:29,940 --> 00:31:33,630 should increase the probability that it proposes useful ideas. 923 00:31:33,630 --> 00:31:38,790 And you can consider this even in the simplest form 924 00:31:38,790 --> 00:31:41,385 in the kind of information that is contained in our question 925 00:31:41,385 --> 00:31:41,885 words. 926 00:31:41,885 --> 00:31:44,550 So I think it's an interesting feature about human cognition 927 00:31:44,550 --> 00:31:46,740 that we have a very, very small handful of question 928 00:31:46,740 --> 00:31:50,941 words, which we use to query the entire universe. 929 00:31:50,941 --> 00:31:51,690 And you know what? 930 00:31:51,690 --> 00:31:54,670 Those question words do a lot of work for us. 931 00:31:54,670 --> 00:31:58,410 When I tell you I'm asking a question about who, 932 00:31:58,410 --> 00:32:01,274 you might propose that we ought to be looking 933 00:32:01,274 --> 00:32:02,940 for some kind of answer that's something 934 00:32:02,940 --> 00:32:03,890 like a social network. 935 00:32:03,890 --> 00:32:06,210 And a social network might be more likely as an answer 936 00:32:06,210 --> 00:32:07,560 than a 2D map. 937 00:32:07,560 --> 00:32:10,500 Whereas, if I ask you a question about where, well, you really 938 00:32:10,500 --> 00:32:12,150 do want to consider 2D maps. 939 00:32:12,150 --> 00:32:14,670 If I'm asking when, you're talking about a timeline 940 00:32:14,670 --> 00:32:15,390 answer. 941 00:32:15,390 --> 00:32:19,350 If I ask you a why question, maybe it's a causal network.
942 00:32:19,350 --> 00:32:21,180 And if I'm asking you a how question, 943 00:32:21,180 --> 00:32:24,037 maybe it's a circuit diagram, right? 944 00:32:24,037 --> 00:32:26,370 You don't know anything about the content at this point. 945 00:32:26,370 --> 00:32:27,900 I could be asking you anything. 946 00:32:27,900 --> 00:32:30,690 But I ask you who was Christopher Columbus, 947 00:32:30,690 --> 00:32:33,120 and you answer 1492. 948 00:32:33,120 --> 00:32:35,730 That's the kind of thing that our algorithms are doing. 949 00:32:35,730 --> 00:32:38,190 That's not even wrong, right? 950 00:32:38,190 --> 00:32:40,110 It's consistent with your prior knowledge. 951 00:32:40,110 --> 00:32:42,270 And it's the kind of thing Watson does 952 00:32:42,270 --> 00:32:44,400 as good [INAUDIBLE] solutions. 953 00:32:44,400 --> 00:32:46,420 But it's not what children do. 954 00:32:46,420 --> 00:32:49,290 It's not what children do. 955 00:32:49,290 --> 00:32:53,310 I think that the issue is that this is actually 956 00:32:53,310 --> 00:32:54,830 a friendly amendment, right? 957 00:32:54,830 --> 00:32:57,500 Because in what we have shown time and again-- 958 00:32:57,500 --> 00:33:02,460 by we I mean not me, but computational modeling folks-- 959 00:33:02,460 --> 00:33:05,310 is that we can use lots of information 960 00:33:05,310 --> 00:33:08,670 out there for hypothesis evaluation, right? 961 00:33:08,670 --> 00:33:10,620 Once we have a theory and we have the data, 962 00:33:10,620 --> 00:33:13,320 we can select and use this information and say, well, 963 00:33:13,320 --> 00:33:15,600 does it answer the problem or not, right? 964 00:33:15,600 --> 00:33:16,710 Does it improve? 965 00:33:16,710 --> 00:33:18,490 Does it make better predictions? 966 00:33:18,490 --> 00:33:22,720 So we use this kind of information that we have. 967 00:33:22,720 --> 00:33:24,840 Even formally, we can say that we can use 968 00:33:24,840 --> 00:33:26,912 it to select among hypotheses. 
969 00:33:26,912 --> 00:33:29,370 We can use information about the structural form of problems 970 00:33:29,370 --> 00:33:30,360 to represent them. 971 00:33:30,360 --> 00:33:34,470 The question is, can we use the same kind of information 972 00:33:34,470 --> 00:33:35,970 to constrain the search space? 973 00:33:35,970 --> 00:33:37,136 And it's easy for me to say. 974 00:33:37,136 --> 00:33:38,790 I don't have to do that, right? 975 00:33:38,790 --> 00:33:43,710 But it's the kind of proposal that I think is missing. 976 00:33:43,710 --> 00:33:46,890 Because we have rich constraints that go far, far, far 977 00:33:46,890 --> 00:33:48,930 beyond our question words. 978 00:33:48,930 --> 00:33:50,370 The kinds of problems that we have 979 00:33:50,370 --> 00:33:52,170 and the criteria for solving them 980 00:33:52,170 --> 00:33:54,640 derive from all kinds of sources. 981 00:33:54,640 --> 00:33:56,610 We try to solve different kinds of problems-- 982 00:33:56,610 --> 00:34:01,939 navigation in some cases, explanation in other cases. 983 00:34:01,939 --> 00:34:03,480 And some of those are epistemic ends. 984 00:34:03,480 --> 00:34:04,620 I want to persuade you of something. 985 00:34:04,620 --> 00:34:06,300 I want to instruct you in something. 986 00:34:06,300 --> 00:34:09,420 I want to deceive you in some ways. 987 00:34:09,420 --> 00:34:11,520 But we also have all kinds of non-epistemic goals. 988 00:34:11,520 --> 00:34:12,440 I want to impress you. 989 00:34:12,440 --> 00:34:13,315 I want to soothe you. 990 00:34:13,315 --> 00:34:14,370 I want to entertain you. 991 00:34:14,370 --> 00:34:17,639 Each of these goals is actually a constraint 992 00:34:17,639 --> 00:34:21,026 on what is going to count as the solution. 993 00:34:21,026 --> 00:34:24,210 Our goals are innumerable. 994 00:34:24,210 --> 00:34:25,980 But there are only a small handful 995 00:34:25,980 --> 00:34:28,500 of ways you can solve any given goal.
996 00:34:28,500 --> 00:34:31,139 So when you're dealing with an infinite search space, 997 00:34:31,139 --> 00:34:34,139 having a goal, having a problem, actually 998 00:34:34,139 --> 00:34:37,199 could act as a constraint on how you search for the solution. 999 00:34:37,199 --> 00:34:39,810 And it is an interesting feature of human cognition 1000 00:34:39,810 --> 00:34:43,710 that our goals can be noble or venial. 1001 00:34:43,710 --> 00:34:46,199 They can be impressive or trivial. 1002 00:34:46,199 --> 00:34:49,080 And it may not matter with respect 1003 00:34:49,080 --> 00:34:50,850 to the solution we have. 1004 00:34:50,850 --> 00:34:54,090 We have analytic logic, because the medieval monks 1005 00:34:54,090 --> 00:34:58,530 wanted to find incontrovertible proof for the existence of God. 1006 00:34:58,530 --> 00:35:00,480 We don't hold onto their goal, but we 1007 00:35:00,480 --> 00:35:02,280 hold onto their solution. 1008 00:35:02,280 --> 00:35:07,590 We are here on the East Coast, in Massachusetts, 1009 00:35:07,590 --> 00:35:11,550 because of the search for the West Indies, right? 1010 00:35:11,550 --> 00:35:15,010 So our goals act as constraints on the solution 1011 00:35:15,010 --> 00:35:16,560 and on the search process. 1012 00:35:16,560 --> 00:35:18,070 And the importance of our goals may 1013 00:35:18,070 --> 00:35:21,600 be that they do exactly that, that they help leverage 1014 00:35:21,600 --> 00:35:27,330 some new search in a way that at least helps us make progress. 1015 00:35:27,330 --> 00:35:30,630 So the argument here is instead of stochastic search, 1016 00:35:30,630 --> 00:35:31,221 that we have-- 1017 00:35:31,221 --> 00:35:32,470 I don't call it goal oriented. 1018 00:35:32,470 --> 00:35:36,690 I call it goal constrained now-- goal-constrained hypothesis 1019 00:35:36,690 --> 00:35:38,790 generation.
1020 00:35:38,790 --> 00:35:41,940 And the idea here is that at least we 1021 00:35:41,940 --> 00:35:44,345 know something about where we want to go. 1022 00:35:44,345 --> 00:35:45,720 Now, this is not a total argument 1023 00:35:45,720 --> 00:35:47,100 against stochastic search. 1024 00:35:47,100 --> 00:35:49,500 It's just a way of getting stochastic search into a much 1025 00:35:49,500 --> 00:35:51,390 smaller search space, right? 1026 00:35:51,390 --> 00:35:53,970 Once you know what things count, then you can do everything 1027 00:35:53,970 --> 00:35:54,720 Tomer says you do. 1028 00:35:54,720 --> 00:35:55,804 I actually agree with him. 1029 00:35:55,804 --> 00:35:57,928 But you don't want to do it over all possibilities, 1030 00:35:57,928 --> 00:35:59,910 because you know a lot more than that, right? 1031 00:35:59,910 --> 00:36:01,480 You know what kinds of things are going to count. 1032 00:36:01,480 --> 00:36:03,540 You should do it over that space, not over-- should 1033 00:36:03,540 --> 00:36:04,740 look in the left-hand corner. 1034 00:36:04,740 --> 00:36:06,360 Then you can iterate all the pixels you want. 1035 00:36:06,360 --> 00:36:08,151 But if the thing's on the left-hand corner, 1036 00:36:08,151 --> 00:36:11,400 that's where you ought to be looking. 1037 00:36:11,400 --> 00:36:15,120 I'm going to give you a corollary to this, which 1038 00:36:15,120 --> 00:36:17,930 is if you don't have any idea what the search space is, 1039 00:36:17,930 --> 00:36:19,820 you are going to resort to an extremely 1040 00:36:19,820 --> 00:36:22,064 inefficient, extremely frustrating search 1041 00:36:22,064 --> 00:36:24,230 and, actually, the kinds of conditions under which I 1042 00:36:24,230 --> 00:36:25,670 think human beings quit. 1043 00:36:25,670 --> 00:36:29,150 So I will give you an example from my personal experience, 1044 00:36:29,150 --> 00:36:30,380 as Jessica Sommerville knows. 
1045 00:36:30,380 --> 00:36:33,186 Because we were trying to get my child's booster seat-- 1046 00:36:33,186 --> 00:36:35,060 because children now need booster seats until 1047 00:36:35,060 --> 00:36:36,660 they're 14-- 1048 00:36:36,660 --> 00:36:38,510 re-attached from my plane flight. 1049 00:36:38,510 --> 00:36:39,530 Couldn't do it, right? 1050 00:36:39,530 --> 00:36:41,113 One thing goes down, two things go up. 1051 00:36:41,113 --> 00:36:42,260 It's a spatial problem. 1052 00:36:42,260 --> 00:36:44,470 We spent 10 minutes, two PhDs, on it-- 1053 00:36:44,470 --> 00:36:46,777 threw the thing out and just had her ride 1054 00:36:46,777 --> 00:36:48,860 in the bottom of the booster seat without the top. 1055 00:36:48,860 --> 00:36:52,730 If I have a 1,000-piece puzzle and I have to find a puzzle piece, 1056 00:36:52,730 --> 00:36:55,280 people who are good at puzzles know 1057 00:36:55,280 --> 00:36:58,360 before they find that piece something about what 1058 00:36:58,360 --> 00:36:59,360 it's going to look like. 1059 00:36:59,360 --> 00:37:01,526 Like, OK, well, the edge has to be angled like this. 1060 00:37:01,526 --> 00:37:03,980 It has to have a concavity here and a convexity there. 1061 00:37:03,980 --> 00:37:05,380 And that's what I'm looking for. 1062 00:37:05,380 --> 00:37:07,070 Me-- as soon as I look away from the piece, 1063 00:37:07,070 --> 00:37:08,690 I have no idea what I'm looking for. 1064 00:37:08,690 --> 00:37:11,034 And so I do what Tomer would do. 1065 00:37:11,034 --> 00:37:12,950 I do a stochastic search over all the puzzles. 1066 00:37:12,950 --> 00:37:15,140 And with 1,000 pieces and many permutations, 1067 00:37:15,140 --> 00:37:16,970 that is not a good way to solve a puzzle. 1068 00:37:16,970 --> 00:37:19,820 As a result, I never do puzzles, right? 1069 00:37:19,820 --> 00:37:21,770 As entertainment, I just don't understand. 1070 00:37:21,770 --> 00:37:24,846 But if you do know, it's very satisfying, right?
1071 00:37:24,846 --> 00:37:26,220 You know what you're looking for. 1072 00:37:26,220 --> 00:37:28,011 And now you can constrain your search space 1073 00:37:28,011 --> 00:37:31,250 much more effectively to those kinds of things. 1074 00:37:31,250 --> 00:37:33,320 Indeed, when I say we are smarter than that, 1075 00:37:33,320 --> 00:37:34,490 human beings know. 1076 00:37:34,490 --> 00:37:37,170 We have metacognitive principles around these kinds of things, 1077 00:37:37,170 --> 00:37:37,670 right? 1078 00:37:37,670 --> 00:37:39,740 This knowledge is in human minds, 1079 00:37:39,740 --> 00:37:41,390 what it might mean for us to think 1080 00:37:41,390 --> 00:37:43,520 that a problem is tractable. 1081 00:37:43,520 --> 00:37:45,110 What does that word mean, right? 1082 00:37:45,110 --> 00:37:46,910 Sometimes it means we have the financial resources 1083 00:37:46,910 --> 00:37:48,326 or the technology to carry it off. 1084 00:37:48,326 --> 00:37:50,690 But often, it means we have a well-posed problem. 1085 00:37:50,690 --> 00:37:52,280 We don't know the answer, but we know 1086 00:37:52,280 --> 00:37:53,609 what the answer needs to do. 1087 00:37:53,609 --> 00:37:55,400 We know what the answer needs to look like. 1088 00:37:55,400 --> 00:37:59,120 We have criteria for what would count. 1089 00:37:59,120 --> 00:38:02,060 At least we have a precise enough representation 1090 00:38:02,060 --> 00:38:04,880 of the problem to effectively and efficiently guide 1091 00:38:04,880 --> 00:38:05,380 the search. 1092 00:38:05,380 --> 00:38:06,880 And I think that's the kind of thing 1093 00:38:06,880 --> 00:38:09,517 we would ideally like our models and algorithms to have. 1094 00:38:09,517 --> 00:38:11,600 At that point, we may have to bounce around a lot. 1095 00:38:11,600 --> 00:38:16,470 But we're bouncing in a pretty well-defined space. 
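[Editor's sketch, not part of the talk.] The contrast between searching everywhere and searching a well-defined space can be illustrated with a toy example. This is only a minimal sketch under stated assumptions: the grid size, the target location, and the function names `random_search` and `mean_steps` are all hypothetical, and blind random proposal stands in for whatever stochastic search the real models use.

```python
import random


def random_search(target, region, rng, max_steps=100_000):
    """Propose random cells from `region` until `target` is hit.

    `region` is ((x_lo, x_hi), (y_lo, y_hi)) with exclusive upper bounds.
    Returns the number of proposals made (capped at `max_steps`).
    """
    (x_lo, x_hi), (y_lo, y_hi) = region
    for step in range(1, max_steps + 1):
        guess = (rng.randrange(x_lo, x_hi), rng.randrange(y_lo, y_hi))
        if guess == target:
            return step
    return max_steps


def mean_steps(target, region, trials=200, seed=0):
    """Average number of proposals over repeated searches (seeded for repeatability)."""
    rng = random.Random(seed)
    return sum(random_search(target, region, rng) for _ in range(trials)) / trials


SIZE = 40                       # hypothetical 40 x 40 grid of candidate answers
TARGET = (3, 5)                 # hypothetical target sitting in the "lower left"
FULL = ((0, SIZE), (0, SIZE))   # the searcher knows nothing about the problem
LOWER_LEFT = ((0, SIZE // 2), (0, SIZE // 2))  # the searcher knows where to look

full_mean = mean_steps(TARGET, FULL)
corner_mean = mean_steps(TARGET, LOWER_LEFT)
print(f"unconstrained: {full_mean:.0f} proposals on average")
print(f"constrained:   {corner_mean:.0f} proposals on average")
```

The search procedure is identical in both runs; only the space it bounces around in changes. Since each proposal hits with probability one over the number of cells, the constrained run needs about a quarter as many proposals in expectation.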
1096 00:38:16,470 --> 00:38:18,620 So to the degree that this is true, 1097 00:38:18,620 --> 00:38:21,980 it actually explains a lot of otherwise peculiar features 1098 00:38:21,980 --> 00:38:24,440 of human cognition. 1099 00:38:24,440 --> 00:38:27,080 For instance, we have this weird sense 1100 00:38:27,080 --> 00:38:28,900 that we're on the right track, right? 1101 00:38:28,900 --> 00:38:30,450 Well, what does it mean to be on the right track? 1102 00:38:30,450 --> 00:38:32,750 It surely doesn't mean you're better at explaining the data, 1103 00:38:32,750 --> 00:38:33,250 right? 1104 00:38:33,250 --> 00:38:34,905 You may be nowhere close to having 1105 00:38:34,905 --> 00:38:37,280 an answer that makes better predictions or gets it right. 1106 00:38:37,280 --> 00:38:39,350 But you're like, oh, that's a really good idea, you know? 1107 00:38:39,350 --> 00:38:40,040 You get excited. 1108 00:38:40,040 --> 00:38:41,540 Or you're like, no, that's a non-answer. 1109 00:38:41,540 --> 00:38:42,289 What does it mean? 1110 00:38:42,289 --> 00:38:44,240 It might mean that at least it fits 1111 00:38:44,240 --> 00:38:45,770 the abstract form of solution. 1112 00:38:45,770 --> 00:38:47,540 If it were true, it might work. 1113 00:38:50,130 --> 00:38:52,427 We can tell our students in an undergraduate class 1114 00:38:52,427 --> 00:38:54,260 that that was a great idea even when we know 1115 00:38:54,260 --> 00:38:55,642 it's been disproven, right? 1116 00:38:55,642 --> 00:38:56,600 So it's actually false. 1117 00:38:56,600 --> 00:38:57,480 It doesn't explain. 1118 00:38:57,480 --> 00:38:58,936 It's just not true. 1119 00:38:58,936 --> 00:39:01,060 We still think it's a great idea in virtue of what? 1120 00:39:01,060 --> 00:39:03,590 Not its fit to the data, right? 1121 00:39:03,590 --> 00:39:05,120 Not how well it's predicting things. 1122 00:39:05,120 --> 00:39:07,460 It's actually false and still good. 1123 00:39:07,460 --> 00:39:09,580 What could that possibly mean, right?
1124 00:39:09,580 --> 00:39:11,780 It must mean there's some other constraint 1125 00:39:11,780 --> 00:39:16,100 on hypothesis generation that we are sensitive to. 1126 00:39:16,100 --> 00:39:18,620 So I want to suggest that there are actually two 1127 00:39:18,620 --> 00:39:20,587 constraints for our hypotheses. 1128 00:39:20,587 --> 00:39:22,670 One is how well they fit prior knowledge and data. 1129 00:39:22,670 --> 00:39:23,960 That's the one we know something about. 1130 00:39:23,960 --> 00:39:25,400 That is, for instance, truth. 1131 00:39:25,400 --> 00:39:28,900 But Stephen Colbert in his infinite wisdom 1132 00:39:28,900 --> 00:39:29,900 proposed something else. 1133 00:39:29,900 --> 00:39:32,233 He said, well, we're also sensitive to this thing called 1134 00:39:32,233 --> 00:39:34,642 truthiness, right? 1135 00:39:34,642 --> 00:39:36,350 You know, like how good the story sounds, 1136 00:39:36,350 --> 00:39:37,266 how good the argument. 1137 00:39:37,266 --> 00:39:39,290 And of course, in politics, you know, 1138 00:39:39,290 --> 00:39:41,900 he makes massive and effective fun of this. 1139 00:39:41,900 --> 00:39:45,530 But I think this is a feature of human cognition, not a bug. 1140 00:39:45,530 --> 00:39:48,470 I think it is extremely important that we can generate 1141 00:39:48,470 --> 00:39:50,930 ideas that are truthy, right? 1142 00:39:50,930 --> 00:39:52,730 Because they're plausible. 1143 00:39:52,730 --> 00:39:53,600 They're interesting. 1144 00:39:53,600 --> 00:39:54,570 They're informative. 1145 00:39:54,570 --> 00:39:55,445 They tell a good story. 1146 00:39:55,445 --> 00:39:58,470 They may be false, but it could be worse than false. 1147 00:39:58,470 --> 00:39:59,912 It could be not even false. 1148 00:40:03,610 --> 00:40:07,780 So I want to make an important point here. 1149 00:40:07,780 --> 00:40:12,400 Generating new ideas is not just about Einstein versus Newton.
1150 00:40:12,400 --> 00:40:19,810 And it's not about going from an undifferentiated concept 1151 00:40:19,810 --> 00:40:22,420 of heat and temperature to a modern scientific one. 1152 00:40:22,420 --> 00:40:26,140 It's not just about radical conceptual change. 1153 00:40:26,140 --> 00:40:29,800 This is the stuff of ordinary everyday thought. 1154 00:40:29,800 --> 00:40:32,900 It is our ability to reliably make up 1155 00:40:32,900 --> 00:40:37,780 new relevant answers to basically any ad hoc question. 1156 00:40:37,780 --> 00:40:39,010 The answers may be trivial. 1157 00:40:39,010 --> 00:40:40,510 They may be false. 1158 00:40:40,510 --> 00:40:41,771 But they are genuinely new. 1159 00:40:41,771 --> 00:40:44,020 In that we didn't have them until we thought of them, 1160 00:40:44,020 --> 00:40:45,430 they're genuinely made up. 1161 00:40:45,430 --> 00:40:47,552 We didn't learn them from evidence or testimony. 1162 00:40:47,552 --> 00:40:48,760 And they answer the question. 1163 00:40:48,760 --> 00:40:50,230 They're not non-sequiturs. 1164 00:40:50,230 --> 00:40:51,712 And I think this is important. 1165 00:40:51,712 --> 00:40:53,170 And this is only possible if we can 1166 00:40:53,170 --> 00:40:56,000 use the form of our problems to guide search. 1167 00:40:56,000 --> 00:40:57,430 So let me give you a few examples. 1168 00:40:57,430 --> 00:41:00,290 What's a good name for a theater company? 1169 00:41:00,290 --> 00:41:01,210 None of you know. 1170 00:41:01,210 --> 00:41:02,650 You haven't thought about that problem before. 1171 00:41:02,650 --> 00:41:04,025 But now you're thinking about it. 1172 00:41:04,025 --> 00:41:06,230 Well, what's a good name for a theater company? 1173 00:41:06,230 --> 00:41:08,175 How do you get stripes on peppermints? 1174 00:41:08,175 --> 00:41:10,300 This is not a problem you walked in thinking about. 1175 00:41:10,300 --> 00:41:12,799 None of you are working on this as your independent project.
1176 00:41:12,799 --> 00:41:13,960 But you can think about it. 1177 00:41:13,960 --> 00:41:15,880 And you already know enough, knowing nothing 1178 00:41:15,880 --> 00:41:19,540 about theater and nothing about peppermints maybe, to know what 1179 00:41:19,540 --> 00:41:21,310 the constraints are, what counts, right? 1180 00:41:21,310 --> 00:41:24,160 That's the kind of information you already have. 1181 00:41:24,160 --> 00:41:26,590 It's not prior knowledge about theater companies 1182 00:41:26,590 --> 00:41:27,220 or peppermints. 1183 00:41:27,220 --> 00:41:29,803 It's prior knowledge about what's going to work for a solution. 1184 00:41:29,803 --> 00:41:32,910 So for instance, you know that McDonald's is a nonstarter. 1185 00:41:32,910 --> 00:41:36,340 And [INAUDIBLE] or whatever is also not a good name 1186 00:41:36,340 --> 00:41:37,870 for a theater company. 1187 00:41:37,870 --> 00:41:39,820 You know that for getting stripes on peppermints 1188 00:41:39,820 --> 00:41:42,160 you don't want to do it with a spatter [INAUDIBLE]. 1189 00:41:42,160 --> 00:41:44,290 You don't want to just spray things at it, right? 1190 00:41:44,290 --> 00:41:45,930 Because that wouldn't even count as a solution. 1191 00:41:45,930 --> 00:41:47,980 That's not the kind of thing you're looking for. 1192 00:41:47,980 --> 00:41:50,650 You're looking for something more like fresh ink. 1193 00:41:50,650 --> 00:41:51,160 It's new. 1194 00:41:51,160 --> 00:41:51,730 It's novel. 1195 00:41:51,730 --> 00:41:52,090 It's familiar. 1196 00:41:52,090 --> 00:41:53,050 It makes some reference. 1197 00:41:53,050 --> 00:41:55,508 You're looking for something more like a pendulum approach, 1198 00:41:55,508 --> 00:41:58,599 which at least could generate periodicity. 1199 00:41:58,599 --> 00:42:00,640 I'm sure they don't use a pendulum to spray paint 1200 00:42:00,640 --> 00:42:01,600 peppermints. 1201 00:42:01,600 --> 00:42:02,600 So are you. 1202 00:42:02,600 --> 00:42:05,340 But it's not a bad answer, right?
1203 00:42:05,340 --> 00:42:08,590 So is there any evidence that kids can do this, 1204 00:42:08,590 --> 00:42:10,030 that information contained in only 1205 00:42:10,030 --> 00:42:11,696 that strict form of the problem can help 1206 00:42:11,696 --> 00:42:13,824 learners converge on solutions? 1207 00:42:13,824 --> 00:42:14,740 We wanted to find out. 1208 00:42:14,740 --> 00:42:19,420 So I'm going to show a little baby attempt to start 1209 00:42:19,420 --> 00:42:21,010 getting at this problem. 1210 00:42:21,010 --> 00:42:22,390 We gave kids a machine. 1211 00:42:22,390 --> 00:42:26,890 And we gave kids some things that could work the machine. 1212 00:42:26,890 --> 00:42:31,390 And the machine had two visual effects. 1213 00:42:31,390 --> 00:42:34,520 You could make a ball appear to flow up and down on that screen 1214 00:42:34,520 --> 00:42:35,020 there. 1215 00:42:35,020 --> 00:42:36,340 Or you could make the ball appear at the bottom 1216 00:42:36,340 --> 00:42:37,810 and then flash up at the top, right? 1217 00:42:37,810 --> 00:42:39,502 Because we put a computer behind, right? 1218 00:42:39,502 --> 00:42:41,168 So woo, the ball is moving continuously, 1219 00:42:41,168 --> 00:42:42,820 or the ball is moving discretely. 1220 00:42:42,820 --> 00:42:44,410 And the affordances, as you might note, 1221 00:42:44,410 --> 00:42:46,990 are also continuous or discrete, right? 1222 00:42:46,990 --> 00:42:48,760 There's a rolly ball, or there's this 1223 00:42:48,760 --> 00:42:51,042 peg that you can move back and forth continuously. 1224 00:42:51,042 --> 00:42:52,750 Or there's a peg you can pull in and out, 1225 00:42:52,750 --> 00:42:55,180 or a drawer you can pull in and out. 1226 00:42:55,180 --> 00:42:59,860 So there are continuous and discrete affordances and also 1227 00:42:59,860 --> 00:43:01,220 continuous and discrete effects.
1228 00:43:01,220 --> 00:43:04,080 So we also had an auditory tone that varied continuously 1229 00:43:04,080 --> 00:43:08,340 and an auditory tone that went from high to low. 1230 00:43:08,340 --> 00:43:10,600 Everyone got it? 1231 00:43:10,600 --> 00:43:14,301 And we just showed the children all the parts 1232 00:43:14,301 --> 00:43:15,550 that connected to the machine. 1233 00:43:15,550 --> 00:43:18,429 And then we showed the kids the effects, 1234 00:43:18,429 --> 00:43:19,720 but we hid the affordances. 1235 00:43:19,720 --> 00:43:22,510 And we said, well, which part made the ball go? 1236 00:43:22,510 --> 00:43:24,980 And we asked them either about the visual or auditory 1237 00:43:24,980 --> 00:43:25,480 affordance. 1238 00:43:25,480 --> 00:43:29,140 They'd seen no covariation data, prior knowledge-- you know, 1239 00:43:29,140 --> 00:43:30,560 agnostic about all of this. 1240 00:43:30,560 --> 00:43:32,270 And the question is, well, would they 1241 00:43:32,270 --> 00:43:33,970 say, well, I'm trying to solve a problem 1242 00:43:33,970 --> 00:43:36,980 about a continuous effect. 1243 00:43:36,980 --> 00:43:39,430 I should use a continuous affordance. 1244 00:43:39,430 --> 00:43:42,010 I'm trying to solve a problem about a discrete effect. 1245 00:43:42,010 --> 00:43:44,201 I should use a discrete affordance. 1246 00:43:44,201 --> 00:43:46,450 Would they use something about the form of the problem 1247 00:43:46,450 --> 00:43:52,280 to answer the solution if they knew nothing else about it? 1248 00:43:52,280 --> 00:43:54,640 So there's no fact of the matter here, right? 1249 00:43:54,640 --> 00:43:58,210 Because, obviously, we're not using either of these really. 1250 00:43:58,210 --> 00:44:00,880 But the prediction was that they would indeed 1251 00:44:00,880 --> 00:44:03,190 make this kind of mapping. 1252 00:44:03,190 --> 00:44:04,720 And they did so.
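[Editor's sketch, not part of the talk.] The predicted mapping can be written down as a toy filter: with no covariation data, keep only the candidate causes whose abstract form matches the form of the effect. This is a hedged illustration, not the experimenters' model; the `Item` class, the `candidate_causes` function, and the item labels are hypothetical names chosen to mirror the machine described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    form: str  # abstract form of the item: "continuous" or "discrete"


# Hypothetical labels for the four parts described in the talk.
AFFORDANCES = [
    Item("rolling ball", "continuous"),
    Item("sliding peg", "continuous"),
    Item("pull-out peg", "discrete"),
    Item("pull-out drawer", "discrete"),
]


def candidate_causes(effect, affordances):
    """Keep only the affordances whose abstract form matches the effect's form."""
    return [a for a in affordances if a.form == effect.form]


floating_ball = Item("ball drifts up and down", "continuous")
jumping_ball = Item("ball jumps from bottom to top", "discrete")

print([a.name for a in candidate_causes(floating_ball, AFFORDANCES)])
print([a.name for a in candidate_causes(jumping_ball, AFFORDANCES)])
```

Nothing in the filter says which part is actually wired to the machine. It only shrinks the set of answers that would count as a solution, which is the sense of "constraint" used in the talk.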
1253 00:44:04,720 --> 00:44:08,349 Now, what you might worry about was that, well, 1254 00:44:08,349 --> 00:44:09,640 there is no fact of the matter. 1255 00:44:09,640 --> 00:44:11,710 So they're just doing some kind of cross-modal mapping, right? 1256 00:44:11,710 --> 00:44:13,330 If you don't have a way to answer the problem, 1257 00:44:13,330 --> 00:44:15,430 they're just saying something like, well, you 1258 00:44:15,430 --> 00:44:17,680 know, this has this property, so does this. 1259 00:44:17,680 --> 00:44:20,982 Let me go ahead and make a mapping from one to the other. 1260 00:44:20,982 --> 00:44:22,690 It's a weird kind of cross-modal mapping. 1261 00:44:22,690 --> 00:44:24,273 Because, usually, you integrate things 1262 00:44:24,273 --> 00:44:27,010 that are in a single stimulus actually contained together, 1263 00:44:27,010 --> 00:44:28,485 like the sound of a ball dropping 1264 00:44:28,485 --> 00:44:29,860 and the sight of a ball dropping, 1265 00:44:29,860 --> 00:44:30,950 not two different things. 1266 00:44:30,950 --> 00:44:33,640 But maybe it's something kind of like that. 1267 00:44:33,640 --> 00:44:36,449 So if they're actually using the form of a problem, 1268 00:44:36,449 --> 00:44:37,990 then if you change the problems, they 1269 00:44:37,990 --> 00:44:39,830 should generate different solutions. 1270 00:44:39,830 --> 00:44:45,170 So what we did in the second experiment is we said-- 1271 00:44:45,170 --> 00:44:46,550 oh. 1272 00:44:46,550 --> 00:44:47,426 Yeah. 1273 00:44:47,426 --> 00:44:49,300 So what we did here is we showed the children 1274 00:44:49,300 --> 00:44:52,510 the continuous visual stimuli, and then we 1275 00:44:52,510 --> 00:44:55,731 asked them to generate the continuous auditory stimuli. 1276 00:44:55,731 --> 00:44:56,230 OK. 1277 00:44:56,230 --> 00:44:58,210 So all of these are now continuous. 1278 00:44:58,210 --> 00:45:00,770 They could still just go ahead and make the continuous map. 
1279 00:45:00,770 --> 00:45:03,140 But if you represent the problem as changing from vision 1280 00:45:03,140 --> 00:45:06,210 to audition, that feels like a discrete problem, right? 1281 00:45:06,210 --> 00:45:08,040 You're completely changing modalities. 1282 00:45:08,040 --> 00:45:09,870 So now in this case, you might expect 1283 00:45:09,870 --> 00:45:13,284 the kids to resist making just the continuous mapping 1284 00:45:13,284 --> 00:45:15,450 if they're using something about the kind of problem 1285 00:45:15,450 --> 00:45:17,783 they have to constrain how they search for the solution. 1286 00:45:17,783 --> 00:45:20,200 Now, again, there's no real right answer here. 1287 00:45:20,200 --> 00:45:23,070 But what you see is the kids shifted their responses 1288 00:45:23,070 --> 00:45:24,460 in response to this. 1289 00:45:24,460 --> 00:45:27,300 So this is just a tiny bit of suggestion 1290 00:45:27,300 --> 00:45:30,150 that when there's no differentiating prior knowledge 1291 00:45:30,150 --> 00:45:32,362 and there's no differentiating evidence, 1292 00:45:32,362 --> 00:45:34,320 children take into account what kind of problem 1293 00:45:34,320 --> 00:45:36,980 they're trying to solve, and what the information is 1294 00:45:36,980 --> 00:45:41,280 in the problem itself that can help constrain their search 1295 00:45:41,280 --> 00:45:42,756 for a solution. 1296 00:45:42,756 --> 00:45:44,880 I'm going to give you another example from some more 1297 00:45:44,880 --> 00:45:47,520 recent work by Pedro Tsividis in our lab. 1298 00:45:47,520 --> 00:45:54,060 He varied the dynamics of a scene. 1299 00:45:54,060 --> 00:45:56,550 He had here some bugs. 1300 00:45:56,550 --> 00:45:58,530 And in one scene, those bugs in green, 1301 00:45:58,530 --> 00:46:00,930 they varied periodically.
1302 00:46:00,930 --> 00:46:03,990 So they just went from having very few spots 1303 00:46:03,990 --> 00:46:05,730 to having a whole lot of spots to having 1304 00:46:05,730 --> 00:46:08,670 very few spots to having a whole lot of spots, right? 1305 00:46:08,670 --> 00:46:10,590 That's what these green bugs do. 1306 00:46:10,590 --> 00:46:13,980 And the other bugs just got faster over time. 1307 00:46:13,980 --> 00:46:15,480 So those longer vectors are supposed 1308 00:46:15,480 --> 00:46:18,476 to be indicating the bugs are getting faster continuously. 1309 00:46:18,476 --> 00:46:19,850 And then he said, well, you know, 1310 00:46:19,850 --> 00:46:21,330 here are some bugs in these rooms. 1311 00:46:21,330 --> 00:46:23,280 And I have two kinds of lights in the room. 1312 00:46:23,280 --> 00:46:25,410 One set of lights looks like those on top. 1313 00:46:25,410 --> 00:46:28,170 The other set of lights looks like those on the bottom. 1314 00:46:28,170 --> 00:46:31,180 Can you tell me which lights are responsible for the behavior 1315 00:46:31,180 --> 00:46:31,770 of which bugs? 1316 00:46:31,770 --> 00:46:34,540 Again, a sort of very similar kind of problem. 1317 00:46:34,540 --> 00:46:37,050 No fact of the matter. But if children can represent 1318 00:46:37,050 --> 00:46:39,150 something about the abstract form of the problem 1319 00:46:39,150 --> 00:46:41,850 and use that to constrain their search for the solution, 1320 00:46:41,850 --> 00:46:44,880 then the periodic lights should reflect the periodic change 1321 00:46:44,880 --> 00:46:46,280 in the bug's behavior. 1322 00:46:46,280 --> 00:46:48,690 The continuous lights should reflect the continuous change 1323 00:46:48,690 --> 00:46:49,410 in the bug's behavior. 1324 00:46:49,410 --> 00:46:51,951 We are, of course, not using the words periodic or continuous 1325 00:46:51,951 --> 00:46:53,490 with them or anything like that.
1326 00:46:53,490 --> 00:46:55,014 They have to pull that out and say, 1327 00:46:55,014 --> 00:46:56,430 this is the kind of problem it is. 1328 00:46:56,430 --> 00:46:58,860 This is a feature of a possible solution. 1329 00:46:58,860 --> 00:47:00,660 Let's go ahead and make that mapping. 1330 00:47:00,660 --> 00:47:06,864 And indeed, the kids are doing that well above chance. 1331 00:47:06,864 --> 00:47:08,280 And obviously, in this case, we're 1332 00:47:08,280 --> 00:47:10,170 giving the kids some possible solutions. 1333 00:47:10,170 --> 00:47:11,880 They're not generating it whole cloth. 1334 00:47:11,880 --> 00:47:13,440 But minimally, it's a different way 1335 00:47:13,440 --> 00:47:15,960 of thinking about problems and search. 1336 00:47:15,960 --> 00:47:18,660 They're not using most of our sort of traditional ways 1337 00:47:18,660 --> 00:47:19,470 of figuring it out. 1338 00:47:19,470 --> 00:47:21,719 They're just using something about the kind of problem 1339 00:47:21,719 --> 00:47:23,820 they have and what's available in the problem 1340 00:47:23,820 --> 00:47:26,560 to help sort out the solution. 1341 00:47:26,560 --> 00:47:29,290 So is this analogical reasoning, right? 1342 00:47:29,290 --> 00:47:30,480 It feels kind of an analogy. 1343 00:47:30,480 --> 00:47:32,520 But it's a funny kind of analogy. 1344 00:47:32,520 --> 00:47:34,290 Because what it isn't is a mapping 1345 00:47:34,290 --> 00:47:38,190 between a known problem and a known solution to a new problem 1346 00:47:38,190 --> 00:47:40,110 and a new solution. 1347 00:47:40,110 --> 00:47:43,170 It's rather a mapping from the kind of problem 1348 00:47:43,170 --> 00:47:46,050 you have to the kind of solution you have. 1349 00:47:46,050 --> 00:47:48,460 It's using, again, the problem or the query itself 1350 00:47:48,460 --> 00:47:49,140 as your friend. 1351 00:47:49,140 --> 00:47:51,860 It has information in it about how it wants to be solved. 
1352 00:47:51,860 --> 00:47:53,610 How are you going to use that to solve it? 1353 00:47:53,610 --> 00:47:55,620 And, again, the virtue being that 1354 00:47:55,620 --> 00:48:00,960 even if your answer is wrong, if it were true, it would work. 1355 00:48:00,960 --> 00:48:02,970 So the argument also, of course, is 1356 00:48:02,970 --> 00:48:05,940 that this applies to any possible goal we might have 1357 00:48:05,940 --> 00:48:08,640 including those cases where it's just not at all obvious how 1358 00:48:08,640 --> 00:48:10,420 an analogy would apply. 1359 00:48:10,420 --> 00:48:12,996 So what is a good name for a theater company, right? 1360 00:48:12,996 --> 00:48:14,370 You're using something about what 1361 00:48:14,370 --> 00:48:17,255 would count as a solution to constrain something about what 1362 00:48:17,255 --> 00:48:18,630 you think would be a good answer. 1363 00:48:18,630 --> 00:48:21,870 But there's not an easy way to tell that story 1364 00:48:21,870 --> 00:48:24,720 as analogical reasoning. 1365 00:48:24,720 --> 00:48:26,670 So the argument is children seem 1366 00:48:26,670 --> 00:48:30,290 to have data-independent criteria for the evaluation 1367 00:48:30,290 --> 00:48:31,470 of hypotheses. 1368 00:48:31,470 --> 00:48:34,110 And these criteria extend beyond simplicity, grammaticality, 1369 00:48:34,110 --> 00:48:36,630 or compatibility with prior knowledge. 1370 00:48:36,630 --> 00:48:40,440 They consider the extent to which a hypothesis fulfills 1371 00:48:40,440 --> 00:48:42,900 the abstract goals of a solution to a problem, 1372 00:48:42,900 --> 00:48:46,350 not just the degree to which it fits the data.
1373 00:48:46,350 --> 00:48:49,920 And I will suggest that this is maybe deeply part 1374 00:48:49,920 --> 00:48:52,055 of an important mystery of human cognition, which 1375 00:48:52,055 --> 00:48:54,806 is-- our most powerful learners spend a lot of their time 1376 00:48:54,806 --> 00:48:57,180 doing something that has defied most of our best attempts 1377 00:48:57,180 --> 00:49:00,900 to explain it, which is they've spent a lot of time, 1378 00:49:00,900 --> 00:49:03,060 many hours a day, just pretending and making up 1379 00:49:03,060 --> 00:49:05,480 stories. 1380 00:49:05,480 --> 00:49:07,580 Stories have some interesting properties. 1381 00:49:07,580 --> 00:49:10,590 They do not have to be true, right? 1382 00:49:10,590 --> 00:49:13,010 They don't have to fit the world or fit the data. 1383 00:49:13,010 --> 00:49:16,370 But they do have to set up a problem and a solution 1384 00:49:16,370 --> 00:49:18,110 that, if true, would solve the problem. 1385 00:49:18,110 --> 00:49:19,360 Most of play has that. 1386 00:49:19,360 --> 00:49:22,220 It is not at all obvious why you would think it was important 1387 00:49:22,220 --> 00:49:25,190 that you want to balance a twisty on the top of the candle 1388 00:49:25,190 --> 00:49:28,220 stick in order to shoot pee through it. 1389 00:49:28,220 --> 00:49:29,575 But it's a problem. 1390 00:49:29,575 --> 00:49:30,200 And guess what? 1391 00:49:30,200 --> 00:49:32,600 If you can set up the problem and find the solution, 1392 00:49:32,600 --> 00:49:35,437 you've just accessed a really important ability 1393 00:49:35,437 --> 00:49:37,520 about setting up problems and setting up solutions 1394 00:49:37,520 --> 00:49:40,730 that, if they worked, would solve that kind of problem. 1395 00:49:40,730 --> 00:49:42,230 And, of course, imagination, most 1396 00:49:42,230 --> 00:49:44,120 of our narratives, most of our fictions, 1397 00:49:44,120 --> 00:49:46,800 have at least those sets of properties. 
1398 00:49:46,800 --> 00:49:49,778 So I'm going to go ahead and stop there. 1399 00:49:49,778 --> 00:49:55,222 [APPLAUSE] 1400 00:49:55,222 --> 00:49:55,930 TOMER ULLMAN: OK. 1401 00:49:55,930 --> 00:49:58,346 So if you remember the structure we said at the beginning, 1402 00:49:58,346 --> 00:50:00,280 there will now be a short rebuttal. 1403 00:50:00,280 --> 00:50:02,560 And then Laura will give an even shorter summary, 1404 00:50:02,560 --> 00:50:04,020 and then the free for all. 1405 00:50:04,020 --> 00:50:06,610 So most of you are probably thinking all sorts of ways 1406 00:50:06,610 --> 00:50:07,510 that Laura is wrong. 1407 00:50:07,510 --> 00:50:10,250 But wait, let me get through it first, 1408 00:50:10,250 --> 00:50:12,240 and then see if I didn't cover something. 1409 00:50:14,697 --> 00:50:16,780 But, actually, my response to this, to all of what 1410 00:50:16,780 --> 00:50:21,280 Laura's said, is not you're wrong, but you're right. 1411 00:50:21,280 --> 00:50:22,689 So you're right, Laura. 1412 00:50:22,689 --> 00:50:23,230 You're right. 1413 00:50:23,230 --> 00:50:25,771 Other people in the audience who think that stochastic search 1414 00:50:25,771 --> 00:50:26,315 by itself-- 1415 00:50:26,315 --> 00:50:28,440 if you have some sort of infinite theory space that 1416 00:50:28,440 --> 00:50:30,481 was supposed to account for all possible problems 1417 00:50:30,481 --> 00:50:32,410 and for any new problem what you did 1418 00:50:32,410 --> 00:50:34,630 was to just completely at random try 1419 00:50:34,630 --> 00:50:38,020 to search through that space anew, then that wouldn't work-- 1420 00:50:38,020 --> 00:50:38,696 that's fine. 
1421 00:50:38,696 --> 00:50:40,570 And I agree that there's something inherently 1422 00:50:40,570 --> 00:50:42,736 wrong about an algorithm that can take some problem, 1423 00:50:42,736 --> 00:50:44,950 like why these two blocks are sticking together, 1424 00:50:44,950 --> 00:50:47,410 and say, well, maybe it's because the moon is bigger 1425 00:50:47,410 --> 00:50:49,260 than a piece of cheese, right? 1426 00:50:49,260 --> 00:50:52,890 Like, as Laura said, it just seems like it's not even wrong. 1427 00:50:52,890 --> 00:50:55,420 Or maybe it's because people have more than two children 1428 00:50:55,420 --> 00:50:56,140 on average. 1429 00:50:56,140 --> 00:50:56,806 No. 1430 00:50:56,806 --> 00:50:58,180 And there's also something wrong. 1431 00:50:58,180 --> 00:51:00,942 Laura didn't quite get to this or maybe not emphasize it. 1432 00:51:00,942 --> 00:51:02,650 But she emphasizes it sometimes, which is 1433 00:51:02,650 --> 00:51:04,222 there's something wrong about-- 1434 00:51:04,222 --> 00:51:06,180 well, she did a little, but-- an algorithm that 1435 00:51:06,180 --> 00:51:08,035 makes sort of dumb proposals. 1436 00:51:08,035 --> 00:51:10,120 Dumb proposals of all sorts of things-- 1437 00:51:10,120 --> 00:51:13,480 things like you try to explain something in theory space, 1438 00:51:13,480 --> 00:51:16,290 and you say, well, maybe it's because of X. And you check it, 1439 00:51:16,290 --> 00:51:19,000 and it's not X. And you say, oh well, oh well. 1440 00:51:19,000 --> 00:51:22,521 Maybe it's because of X, right? 1441 00:51:22,521 --> 00:51:24,520 There's something wrong about stochastic search. 1442 00:51:24,520 --> 00:51:25,990 Although, I have to say, Laura, you have an eight-year-old. 1443 00:51:25,990 --> 00:51:29,020 And, you know, when we gave this first, I had a two-year-old. 
1444 00:51:29,020 --> 00:51:31,420 And I actually think maybe it's X, maybe it's X, 1445 00:51:31,420 --> 00:51:35,192 maybe it's X is not such a crazy way of describing 1446 00:51:35,192 --> 00:51:36,400 what two-year-olds are doing. 1447 00:51:36,400 --> 00:51:37,396 LAURA SCHULZ: Well, [INAUDIBLE]. 1448 00:51:37,396 --> 00:51:39,390 So, you know, like, what if there's noise? 1449 00:51:39,390 --> 00:51:40,181 TOMER ULLMAN: Yeah. 1450 00:51:40,181 --> 00:51:41,850 Yeah, yeah. 1451 00:51:41,850 --> 00:51:42,580 OK. 1452 00:51:42,580 --> 00:51:44,980 But I do want to give an actual response. 1453 00:51:44,980 --> 00:51:50,700 So, you know, I think I have some responses for Laura. 1454 00:51:50,700 --> 00:51:53,200 And these are responses that, importantly, you know, they'll 1455 00:51:53,200 --> 00:51:54,460 try to address what Laura is saying. 1456 00:51:54,460 --> 00:51:55,840 They'll try to take it into account. 1457 00:51:55,840 --> 00:51:58,390 They'll try to give new answers to it that will importantly 1458 00:51:58,390 --> 00:52:00,160 leave her unhappy. 1459 00:52:00,160 --> 00:52:03,400 So what I'm going to try to do is take a page out 1460 00:52:03,400 --> 00:52:07,980 of Hannibal Barca at the Battle of Cannae and try to envelop. 1461 00:52:07,980 --> 00:52:12,652 So I'm going to highlight work by other people. 1462 00:52:12,652 --> 00:52:14,860 From one direction is work by Steve Piantadosi, which 1463 00:52:14,860 --> 00:52:17,020 is about, you know, the algorithmic search part 1464 00:52:17,020 --> 00:52:18,520 of the problem, where it's too slow, 1465 00:52:18,520 --> 00:52:19,990 students are wandering around the halls 1466 00:52:19,990 --> 00:52:21,750 doing nothing waiting for it to converge. 1467 00:52:21,750 --> 00:52:24,477 What if we just did it really, really fast? 1468 00:52:24,477 --> 00:52:26,060 Another way to address this is to say, 1469 00:52:26,060 --> 00:52:27,580 what if we made actually good proposals?
1470 00:52:27,580 --> 00:52:27,790 OK. 1471 00:52:27,790 --> 00:52:30,460 The problem of making proposals that are just ridiculous-- what 1472 00:52:30,460 --> 00:52:32,085 if we made proposals that actually take 1473 00:52:32,085 --> 00:52:33,810 the data into account a little bit 1474 00:52:33,810 --> 00:52:36,351 or the previous data, the stuff that we're trying to explain? 1475 00:52:38,890 --> 00:52:41,782 Another way to address what Laura is saying 1476 00:52:41,782 --> 00:52:43,990 is to say, well, maybe we can make better primitives. 1477 00:52:43,990 --> 00:52:46,840 And better primitives mean that your search space is actually 1478 00:52:46,840 --> 00:52:51,580 more confined to the right sort of things. 1479 00:52:51,580 --> 00:52:54,430 And finally, I'm going to highlight some new thoughts 1480 00:52:54,430 --> 00:52:58,600 by myself about ad hoc spaces and how we might construct them 1481 00:52:58,600 --> 00:53:01,690 to get at this problem of this thing, 1482 00:53:01,690 --> 00:53:04,500 like how would you come up with a new name for theater company 1483 00:53:04,500 --> 00:53:05,380 or a name for a new theater company? 1484 00:53:05,380 --> 00:53:06,970 So I'll go through these somewhat fast. 1485 00:53:06,970 --> 00:53:08,860 You're welcome to come and talk about any of them. 1486 00:53:08,860 --> 00:53:10,359 And I'll point you to the people who 1487 00:53:10,359 --> 00:53:13,540 are actually doing the work, which I've said before. 1488 00:53:13,540 --> 00:53:14,610 So let's see. 1489 00:53:14,610 --> 00:53:16,030 This is just sort of the rebuttal. 1490 00:53:16,030 --> 00:53:18,010 One of the rebuttals is to say, well, here's 1491 00:53:18,010 --> 00:53:20,890 this work by Steve Piantadosi, which is, you know, 1492 00:53:20,890 --> 00:53:23,560 introspection is really actually a poor guide. 
1493 00:53:23,560 --> 00:53:27,610 So when [INAUDIBLE] was giving that beautiful example of cell 1494 00:53:27,610 --> 00:53:29,744 phones, why you have to shut them off on airplanes, 1495 00:53:29,744 --> 00:53:31,660 you say, oh, and she came up with this example 1496 00:53:31,660 --> 00:53:33,620 that's not true. 1497 00:53:33,620 --> 00:53:35,890 But if it were true, it would be nice. 1498 00:53:35,890 --> 00:53:38,492 Well, maybe she went through a billion other things 1499 00:53:38,492 --> 00:53:39,700 before she came up with that? 1500 00:53:39,700 --> 00:53:41,350 Laura's like, she did it like that. 1501 00:53:41,350 --> 00:53:43,960 But we don't have a good sense for what is actually fast, 1502 00:53:43,960 --> 00:53:45,940 what is actually slow. 1503 00:53:45,940 --> 00:53:49,330 And there might be a case where we don't actually introspect 1504 00:53:49,330 --> 00:53:51,175 about a lot of things. 1505 00:53:51,175 --> 00:53:53,110 The things that bubble up into consciousness 1506 00:53:53,110 --> 00:53:54,970 that you might actually accept or reject 1507 00:53:54,970 --> 00:53:56,920 rely on actually a ton of other proposals 1508 00:53:56,920 --> 00:53:59,334 that don't even bubble up into introspection, 1509 00:53:59,334 --> 00:54:01,000 that you're making very quickly and just 1510 00:54:01,000 --> 00:54:02,920 rejecting even before that. 1511 00:54:02,920 --> 00:54:05,140 And Steve came up with this really interesting way. 1512 00:54:05,140 --> 00:54:07,140 You guys have probably heard about deep learning 1513 00:54:07,140 --> 00:54:08,650 and sort of the GPU revolution 1514 00:54:08,650 --> 00:54:11,000 for deep learning and things like that. 1515 00:54:11,000 --> 00:54:12,720 So the point is if instead of using 1516 00:54:12,720 --> 00:54:16,220 CPUs we use graphics processing units, 1517 00:54:16,220 --> 00:54:19,060 then we can run stochastic search algorithms in parallel 1518 00:54:19,060 --> 00:54:19,890 in some cases.
1519 00:54:19,890 --> 00:54:21,514 And once you can make them in parallel, 1520 00:54:21,514 --> 00:54:22,810 then you can put them on a GPU. 1521 00:54:22,810 --> 00:54:24,400 And once you put them on a GPU, a GPU 1522 00:54:24,400 --> 00:54:26,154 is sort of a way of taking something 1523 00:54:26,154 --> 00:54:27,320 that's supposed to be a CPU. 1524 00:54:27,320 --> 00:54:28,900 And instead of having a CPU that can 1525 00:54:28,900 --> 00:54:31,370 do something sort of complicated on a single task, 1526 00:54:31,370 --> 00:54:34,630 you can do a lot of really simple tasks in parallel 1527 00:54:34,630 --> 00:54:35,380 very fast. 1528 00:54:35,380 --> 00:54:37,110 So if you could make that proposal, 1529 00:54:37,110 --> 00:54:39,007 that thing I said before-- take a theory, 1530 00:54:39,007 --> 00:54:40,840 make a change to it-- if you could make that 1531 00:54:40,840 --> 00:54:43,060 into the sort of thing that you could put on a GPU, 1532 00:54:43,060 --> 00:54:44,920 then you can make a ton of those proposals 1533 00:54:44,920 --> 00:54:46,530 very quickly in parallel. 1534 00:54:46,530 --> 00:54:48,280 And that's sort of what he figured out how 1535 00:54:48,280 --> 00:54:49,750 to do for a bunch of spaces. 1536 00:54:49,750 --> 00:54:51,280 It's much faster than the CPU. 1537 00:54:51,280 --> 00:54:53,530 And the main advantage is that it's also much cheaper. 1538 00:54:53,530 --> 00:54:55,612 And you can cram a whole bunch of them together. 1539 00:54:55,612 --> 00:54:57,070 And you can get to something like-- 1540 00:54:57,070 --> 00:54:58,000 I forget the exact numbers. 1541 00:54:58,000 --> 00:55:00,010 You can make like a million of these theory proposals 1542 00:55:00,010 --> 00:55:00,550 a second. 1543 00:55:00,550 --> 00:55:02,090 And that's just with today's technology, right? 1544 00:55:02,090 --> 00:55:04,210 We don't know what's coming around the corner 1545 00:55:04,210 --> 00:55:06,100 a few years from now.
1546 00:55:06,100 --> 00:55:08,242 You know, Steve plus GPUs is awesome. 1547 00:55:08,242 --> 00:55:10,450 And you could think of it like these various problems 1548 00:55:10,450 --> 00:55:12,310 like you're trying to fit these data points on the bottom. 1549 00:55:12,310 --> 00:55:13,264 Can people see that? 1550 00:55:13,264 --> 00:55:14,680 This is sort of a classic problem. 1551 00:55:14,680 --> 00:55:15,430 You have some data points. 1552 00:55:15,430 --> 00:55:17,020 You're trying to fit a polynomial to it. 1553 00:55:17,020 --> 00:55:19,240 And you're trying to say, well, how will we do that? 1554 00:55:19,240 --> 00:55:21,970 The truth is there's a lot of very clever ways of doing that. 1555 00:55:21,970 --> 00:55:23,344 But let's assume that you're even 1556 00:55:23,344 --> 00:55:25,469 doing random search in polynomial space-- 1557 00:55:25,469 --> 00:55:27,010 not the sort of thing you want to do. 1558 00:55:27,010 --> 00:55:28,340 Those of you who have been to the tutorial, 1559 00:55:28,340 --> 00:55:30,227 I mentioned that if you have an actual better 1560 00:55:30,227 --> 00:55:32,560 way than stochastic search, you should probably do that. 1561 00:55:32,560 --> 00:55:33,730 But suppose you didn't know and you 1562 00:55:33,730 --> 00:55:35,021 wanted to do stochastic search. 1563 00:55:35,021 --> 00:55:37,330 You could still do a million moves a second 1564 00:55:37,330 --> 00:55:39,800 and quickly converge on something like that line 1565 00:55:39,800 --> 00:55:40,461 that you see. 1566 00:55:40,461 --> 00:55:40,960 OK. 1567 00:55:40,960 --> 00:55:42,876 And that line is actually taken from, I think, 1568 00:55:42,876 --> 00:55:45,430 Galileo Galilei's data for how things slide on a hill. 
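The brute-force idea described here can be sketched in a few lines. This is only an illustration under stated assumptions: the hypothesis space is a single exponent p in y = x ** p, the data are made up and noise-free (they are not Galileo's measurements), and the names `loss` and `stochastic_search` are hypothetical. It runs serially on a CPU and is not Piantadosi's GPU implementation.

```python
import random

def loss(exponent, data):
    """Sum of squared errors for the hypothesis y = x ** exponent."""
    return sum((y - x ** exponent) ** 2 for x, y in data)

def stochastic_search(data, steps=20000, step_size=0.1, seed=0):
    """Random local search over one exponent.

    Each step proposes a small random change to the current guess and
    keeps it only if it fits the data better.
    """
    rng = random.Random(seed)
    current = rng.uniform(0.0, 4.0)          # arbitrary starting guess
    current_loss = loss(current, data)
    for _ in range(steps):
        proposal = current + rng.gauss(0.0, step_size)
        proposal_loss = loss(proposal, data)
        if proposal_loss < current_loss:     # greedy acceptance rule
            current, current_loss = proposal, proposal_loss
    return current

# Made-up data that follows y = x ** 2 exactly (an assumption for the demo).
data = [(x, x ** 2) for x in (1.0, 2.0, 3.0, 4.0, 5.0)]
best = stochastic_search(data)
print(f"best exponent found: {best:.3f}")
```

Most proposals are rejected, which is the point being made in the talk: the search is wasteful, but if proposals are cheap enough, it can still settle near the right exponent.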
1569 00:55:45,430 --> 00:55:47,810 I'm not saying Galileo Galilei sat around and said, 1570 00:55:47,810 --> 00:55:49,976 maybe it's x to the square, maybe it's x to the 2.1, 1571 00:55:49,976 --> 00:55:51,900 maybe it's x to the 2.3, maybe it's x to the 2.1, 1572 00:55:51,900 --> 00:55:53,170 maybe it's x to the 2, and then finally 1573 00:55:53,170 --> 00:55:55,044 converged like after a million moves to that. 1574 00:55:55,044 --> 00:55:56,960 That's not exactly scientific discovery. 1575 00:55:56,960 --> 00:55:58,420 But for a lot of everyday thinking, 1576 00:55:58,420 --> 00:56:00,530 you might actually be proposing things very fast 1577 00:56:00,530 --> 00:56:01,651 and rejecting them. 1578 00:56:01,651 --> 00:56:02,150 OK. 1579 00:56:02,150 --> 00:56:03,240 That's Steve. 1580 00:56:03,240 --> 00:56:05,610 Some things from Owen Lewis about making maybe smarter 1581 00:56:05,610 --> 00:56:08,110 proposals-- and this gets at that point of, like, maybe it's 1582 00:56:08,110 --> 00:56:09,000 an x, maybe not. 1583 00:56:09,000 --> 00:56:11,620 So suppose that I'm trying to teach you a concept. 1584 00:56:11,620 --> 00:56:12,220 OK. 1585 00:56:12,220 --> 00:56:13,630 I'm trying to teach you a particular concept. 1586 00:56:13,630 --> 00:56:15,280 I'm going to give you some positive examples 1587 00:56:15,280 --> 00:56:15,920 of the concept. 1588 00:56:15,920 --> 00:56:16,170 OK? 1589 00:56:16,170 --> 00:56:17,710 This is the sort of thing that psychologists really 1590 00:56:17,710 --> 00:56:18,580 like to study. 1591 00:56:18,580 --> 00:56:19,120 OK? 1592 00:56:19,120 --> 00:56:20,190 So this is a room-- 1593 00:56:20,190 --> 00:56:22,750 no, Roomba's an actual thing. 1594 00:56:22,750 --> 00:56:23,770 Blick gets overused. 1595 00:56:23,770 --> 00:56:26,050 Can someone give me a nonsense term? 1596 00:56:26,050 --> 00:56:26,960 This is a Jabberwock. 1597 00:56:26,960 --> 00:56:27,460 OK? 1598 00:56:27,460 --> 00:56:28,330 This is a Jabberwock.
1599 00:56:28,330 --> 00:56:30,580 I'm going to give you another example of a Jabberwock. 1600 00:56:30,580 --> 00:56:32,500 Who thinks they know what Jabberwocks are? 1601 00:56:32,500 --> 00:56:34,334 You have some sense of what a Jabberwock is? 1602 00:56:34,334 --> 00:56:34,833 OK. 1603 00:56:34,833 --> 00:56:36,130 Huh, that's also a Jabberwock. 1604 00:56:36,130 --> 00:56:37,090 Wait a minute. 1605 00:56:37,090 --> 00:56:38,170 OK. 1606 00:56:38,170 --> 00:56:40,440 The sense that I had for maybe what a Jabberwock is 1607 00:56:40,440 --> 00:56:41,717 is maybe not that great. 1608 00:56:41,717 --> 00:56:42,550 That's a Jabberwock. 1609 00:56:42,550 --> 00:56:43,383 That's a Jabberwock. 1610 00:56:43,383 --> 00:56:44,230 That's a Jabberwock. 1611 00:56:44,230 --> 00:56:45,130 That's a Jabberwock. 1612 00:56:45,130 --> 00:56:45,644 OK. 1613 00:56:45,644 --> 00:56:47,560 And you might think at this point, well, fine. 1614 00:56:47,560 --> 00:56:51,100 Jabberwocks are either squares of any color or red circles. 1615 00:56:51,100 --> 00:56:52,600 Or maybe they're squares or circles. 1616 00:56:52,600 --> 00:56:53,474 I don't know exactly. 1617 00:56:53,474 --> 00:56:55,300 You're building up some sort of theory 1618 00:56:55,300 --> 00:56:57,040 for that concept, which can be described 1619 00:56:57,040 --> 00:57:00,310 in something like a grammar for your current hypothesis. 1620 00:57:00,310 --> 00:57:02,110 You might say it's either a red circle, 1621 00:57:02,110 --> 00:57:03,770 or it's the square of any color. 1622 00:57:03,770 --> 00:57:04,270 OK. 1623 00:57:04,270 --> 00:57:06,394 And that's sort of your grammar for these concepts. 1624 00:57:06,394 --> 00:57:09,897 And now you could sort of change that grammar, right? 1625 00:57:09,897 --> 00:57:11,980 You could sort of excise these nodes and that tree 1626 00:57:11,980 --> 00:57:13,370 to come up with new things. 1627 00:57:13,370 --> 00:57:15,286 Why would you want to come up with new things? 
1628 00:57:15,286 --> 00:57:16,820 Because, look, that's a Jabberwock. 1629 00:57:16,820 --> 00:57:17,080 OK. 1630 00:57:17,080 --> 00:57:18,329 I just gave you a new example. 1631 00:57:18,329 --> 00:57:19,970 It's something you didn't know before. 1632 00:57:19,970 --> 00:57:21,280 It's sort of confounding with your theory. 1633 00:57:21,280 --> 00:57:23,350 You have to come up with a new theory on the fly. 1634 00:57:23,350 --> 00:57:23,860 OK? 1635 00:57:23,860 --> 00:57:26,489 Theory-- again, used in a very, very minimal sense here. 1636 00:57:26,489 --> 00:57:28,780 But if you accept that this is something like a theory, 1637 00:57:28,780 --> 00:57:30,640 you have to come up with a new theory for explaining 1638 00:57:30,640 --> 00:57:32,800 why that's a Jabberwock and all the other things that you've 1639 00:57:32,800 --> 00:57:33,790 seen are Jabberwocks. 1640 00:57:33,790 --> 00:57:34,615 What is a bad idea? 1641 00:57:34,615 --> 00:57:39,477 A bad idea is to just cut and generate randomly, right? 1642 00:57:39,477 --> 00:57:41,685 You might come up with something like it's a triangle 1643 00:57:41,685 --> 00:57:42,643 or something like that. 1644 00:57:42,643 --> 00:57:45,241 But you might come up with, well, maybe it's a square. 1645 00:57:45,241 --> 00:57:46,240 No, we already did that. 1646 00:57:46,240 --> 00:57:48,520 Well, maybe it's, you know, just triangles. 1647 00:57:48,520 --> 00:57:49,870 Maybe it's just a square. 1648 00:57:49,870 --> 00:57:51,340 Like, you could spend all this time 1649 00:57:51,340 --> 00:57:54,250 not taking into account your previous theory and the fact 1650 00:57:54,250 --> 00:57:56,080 that your new example had something 1651 00:57:56,080 --> 00:57:59,260 to do with triangles, something to do with red triangles. 1652 00:57:59,260 --> 00:58:01,450 You want to be able to make proposals that take 1653 00:58:01,450 --> 00:58:04,520 into account this new data. 
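The contrast between regenerating at random and proposing with the new example in mind can be sketched as follows. Everything here is an assumption for illustration: a theory is encoded as a set of (color, shape) clauses with `None` as a wildcard, and the function names are hypothetical. It is not Owen Lewis's algorithm, only a toy version of the distinction.

```python
import random

COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "triangle"]

def covers(hypothesis, example):
    """True if any (color, shape) clause matches; None acts as a wildcard."""
    color, shape = example
    return any(c in (None, color) and s in (None, shape) for c, s in hypothesis)

def blind_proposal(rng):
    """Regenerate a single clause at random, ignoring the theory and the data."""
    return {(rng.choice(COLORS + [None]), rng.choice(SHAPES + [None]))}

def data_aware_proposal(hypothesis, example, rng):
    """Keep the current theory and add a random clause that covers the new example."""
    color, shape = example
    candidates = [(color, shape), (None, shape), (color, None)]
    return set(hypothesis) | {rng.choice(candidates)}

rng = random.Random(0)
theory = {("red", "circle"), (None, "square")}  # red circle, or a square of any color
new_example = ("red", "triangle")               # the surprising Jabberwock

proposal = data_aware_proposal(theory, new_example, rng)
print(sorted(proposal, key=str))
```

The data-aware proposal is still random, and it may still be the wrong theory, but it can never forget the old examples or fail to explain the new one, which a blind proposal often does.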
1654 00:58:04,520 --> 00:58:06,610 Does everyone understand what the problem 1655 00:58:06,610 --> 00:58:07,754 we're trying to get at is? 1656 00:58:07,754 --> 00:58:10,420 So what Owen has done is to sort of take these stochastic search 1657 00:58:10,420 --> 00:58:12,711 algorithms and say, if you get a new piece of data that 1658 00:58:12,711 --> 00:58:15,190 contradicts, that sort of interferes with what you had 1659 00:58:15,190 --> 00:58:16,960 before, how would you make proposals 1660 00:58:16,960 --> 00:58:19,390 that must take into account this new data? 1661 00:58:19,390 --> 00:58:21,100 I'm going to recut and generate. 1662 00:58:21,100 --> 00:58:22,750 But I'm going to identify the places 1663 00:58:22,750 --> 00:58:25,210 in my theory that would take into account this new piece 1664 00:58:25,210 --> 00:58:25,750 of data. 1665 00:58:25,750 --> 00:58:27,970 And I'm going to make smarter proposals. 1666 00:58:27,970 --> 00:58:29,200 They might still be wrong. 1667 00:58:29,200 --> 00:58:31,492 And there's better and worse ways of doing this. 1668 00:58:31,492 --> 00:58:33,700 It's still going to be a randomized search in theory 1669 00:58:33,700 --> 00:58:34,199 space. 1670 00:58:34,199 --> 00:58:37,140 But it's at least going to take into account this new data. 1671 00:58:37,140 --> 00:58:38,132 OK. 1672 00:58:38,132 --> 00:58:40,090 And that's just-- I'm afraid I don't have time. 1673 00:58:40,090 --> 00:58:40,860 But, look, it works. 1674 00:58:40,860 --> 00:58:42,340 And it's much better than just bouncing around 1675 00:58:42,340 --> 00:58:44,560 completely at random, as you might guess. 1676 00:58:44,560 --> 00:58:48,850 Another response, this is work by Eyal Dechter. 1677 00:58:48,850 --> 00:58:51,200 Let me skip over this for a little bit. 1678 00:58:51,200 --> 00:58:53,290 This is work by Eyal Dechter, which is to say, 1679 00:58:53,290 --> 00:58:55,701 what if you wanted to use better and better primitives? 1680 00:58:55,701 --> 00:58:56,200 OK.
1681 00:58:56,200 --> 00:58:58,699 So before we have this notion of you're just bouncing around 1682 00:58:58,699 --> 00:59:01,152 in theory space, you're making all sorts of notions, 1683 00:59:01,152 --> 00:59:02,110 let me put it this way. 1684 00:59:02,110 --> 00:59:03,910 Suppose that you're searching through the entire space 1685 00:59:03,910 --> 00:59:04,690 of programs. 1686 00:59:04,690 --> 00:59:05,257 OK? 1687 00:59:05,257 --> 00:59:07,090 And the only thing that you had to work with 1688 00:59:07,090 --> 00:59:09,381 is something like that lambda expression before, right? 1689 00:59:09,381 --> 00:59:12,340 You didn't have the notion of plus, minus, multiplication, 1690 00:59:12,340 --> 00:59:13,741 sine waves, things like that. 1691 00:59:13,741 --> 00:59:16,240 And you're trying to figure out something about an equation. 1692 00:59:16,240 --> 00:59:17,260 And you work through it. 1693 00:59:17,260 --> 00:59:18,510 And there's a way of doing it. 1694 00:59:18,510 --> 00:59:20,385 There's a way of, like, generating functions 1695 00:59:20,385 --> 00:59:21,760 that rely on other functions that 1696 00:59:21,760 --> 00:59:24,093 rely on other functions in the complicated way that will 1697 00:59:24,093 --> 00:59:25,510 give you the plus function. 1698 00:59:25,510 --> 00:59:26,140 OK? 1699 00:59:26,140 --> 00:59:26,840 And you do that. 1700 00:59:26,840 --> 00:59:27,690 And then you generate a lot. 1701 00:59:27,690 --> 00:59:30,070 And you somehow manage to find out the sinus function. 1702 00:59:30,070 --> 00:59:31,944 And you finally figure out that this function 1703 00:59:31,944 --> 00:59:35,841 you're trying to describe is sinus x plus sinus y. 1704 00:59:35,841 --> 00:59:37,090 Let's say something like that. 1705 00:59:37,090 --> 00:59:37,300 OK? 1706 00:59:37,300 --> 00:59:38,480 The exact example doesn't matter. 1707 00:59:38,480 --> 00:59:40,870 But you worked really hard, and you figured that out. 
1708 00:59:40,870 --> 00:59:42,319 Now, you get a new example. 1709 00:59:42,319 --> 00:59:44,110 And underneath the hood, it's actually just 1710 00:59:44,110 --> 00:59:45,610 sinus x minus sinus y. 1711 00:59:45,610 --> 00:59:48,855 Or let's say it's sinus y plus sinus z or something like that. 1712 00:59:48,855 --> 00:59:50,230 And now you start all over again. 1713 00:59:50,230 --> 00:59:50,770 You're like, fine. 1714 00:59:50,770 --> 00:59:52,630 OK, lambda something, something, something. 1715 00:59:52,630 --> 00:59:54,963 Like, if you could only use the fact that you've already 1716 00:59:54,963 --> 00:59:56,770 discovered the sine function, you've 1717 00:59:56,770 --> 00:59:59,170 already discovered plus and minus and things like that, 1718 00:59:59,170 --> 01:00:01,990 and now when you come to try and explain a new problem, 1719 01:00:01,990 --> 01:00:04,250 you actually have a lot of previous knowledge. 1720 01:00:04,250 --> 01:00:04,750 OK? 1721 01:00:04,750 --> 01:00:06,550 So when you're trying to describe why airplanes take off 1722 01:00:06,550 --> 01:00:07,480 and how they do that, you're actually 1723 01:00:07,480 --> 01:00:09,110 going to rely on previous knowledge. 1724 01:00:09,110 --> 01:00:11,318 You're not going to search through your entire theory 1725 01:00:11,318 --> 01:00:12,820 space starting from nowhere. 1726 01:00:12,820 --> 01:00:14,530 You're going to rely on primitives 1727 01:00:14,530 --> 01:00:16,146 that have been useful before. 1728 01:00:16,146 --> 01:00:18,145 So you might notice that actually plus and minus 1729 01:00:18,145 --> 01:00:21,520 and sines and things like that and cosine and exponentiation 1730 01:00:21,520 --> 01:00:22,690 are really useful. 1731 01:00:22,690 --> 01:00:24,580 Let's save those as primitives.
1732 01:00:24,580 --> 01:00:27,010 So that next time that we make random proposals in theory 1733 01:00:27,010 --> 01:00:29,530 space, you can think of it as making 1734 01:00:29,530 --> 01:00:32,210 like a whole bunch of moves at once that were useful. 1735 01:00:32,210 --> 01:00:32,710 OK. 1736 01:00:32,710 --> 01:00:34,210 Like, a whole bunch of stuff at once 1737 01:00:34,210 --> 01:00:36,564 that was useful that shows up all the time-- 1738 01:00:36,564 --> 01:00:37,730 you want to make that again. 1739 01:00:37,730 --> 01:00:38,230 OK? 1740 01:00:38,230 --> 01:00:40,300 So I try that, for example. 1741 01:00:40,300 --> 01:00:42,269 And the examples that we gave in theory space-- 1742 01:00:42,269 --> 01:00:44,560 like you might find out that actually a lot of theories 1743 01:00:44,560 --> 01:00:47,710 use transitivity, or a lot of theories use reflection, right? 1744 01:00:47,710 --> 01:00:50,320 In the particular magnet case, if X attracts Y, 1745 01:00:50,320 --> 01:00:52,460 then Y will attract X. In general, 1746 01:00:52,460 --> 01:00:55,210 the law of if X blahs Y, then Y will 1747 01:00:55,210 --> 01:00:57,940 blah X, that turns out to be useful in a lot of domains, 1748 01:00:57,940 --> 01:00:59,920 if only there was a way of reusing them, right? 1749 01:00:59,920 --> 01:01:01,000 There is. 1750 01:01:01,000 --> 01:01:03,130 And what Eyal was doing was to basically use 1751 01:01:03,130 --> 01:01:05,990 an exploration-compression algorithm, the EC algorithm. 1752 01:01:05,990 --> 01:01:08,692 And what it does is it tries to encapsulate useful concepts. 1753 01:01:08,692 --> 01:01:10,400 And he used it on a whole bunch of stuff. 1754 01:01:10,400 --> 01:01:11,941 One of the nice domains he used it on 1755 01:01:11,941 --> 01:01:13,720 was these circuit diagrams. 1756 01:01:13,720 --> 01:01:16,320 Have any of you actually had to solve circuit diagrams? 1757 01:01:16,320 --> 01:01:19,080 This is the sort of stuff that people at MIT do.
1758 01:01:19,080 --> 01:01:21,737 You're given a particular input-output function. 1759 01:01:21,737 --> 01:01:23,320 And again, like I said, under the hood 1760 01:01:23,320 --> 01:01:25,750 it might be something like you're just told something 1761 01:01:25,750 --> 01:01:27,820 like here's X and Y, OK? 1762 01:01:27,820 --> 01:01:30,520 X and Y can each be 1 or 0. 1763 01:01:30,520 --> 01:01:31,100 OK? 1764 01:01:31,100 --> 01:01:33,730 And now I'm going to give you combinations of values of X 1765 01:01:33,730 --> 01:01:35,240 and Y and what that spits out. 1766 01:01:35,240 --> 01:01:35,740 OK. 1767 01:01:35,740 --> 01:01:37,870 So X and Y are both 1-- 1768 01:01:37,870 --> 01:01:38,920 light turns on. 1769 01:01:38,920 --> 01:01:40,270 X and Y are both 0-- 1770 01:01:40,270 --> 01:01:41,050 light turns on. 1771 01:01:41,050 --> 01:01:43,580 X is 1, Y is 0-- light turns off. 1772 01:01:43,580 --> 01:01:44,080 OK. 1773 01:01:44,080 --> 01:01:46,930 And you're trying to find out some sort of circuit that 1774 01:01:46,930 --> 01:01:50,020 will explain this behavior, some sort of combination logic 1775 01:01:50,020 --> 01:01:54,660 gates, like ANDs and ORs and NANDs and things like that. 1776 01:01:54,660 --> 01:01:55,160 OK? 1777 01:01:55,160 --> 01:01:57,659 Do people sort of understand the problem for these circuits? 1778 01:01:57,659 --> 01:02:00,580 And you can get a long list of things, X, Y, Z, T, some sort 1779 01:02:00,580 --> 01:02:01,610 of complicated behavior. 1780 01:02:01,610 --> 01:02:03,160 You might not even get the full behavior. 1781 01:02:03,160 --> 01:02:05,320 And you're trying to find sort of the minimal set 1782 01:02:05,320 --> 01:02:07,830 of logical predicates, or in this case circuits, that 1783 01:02:07,830 --> 01:02:09,380 will explain that behavior. 1784 01:02:09,380 --> 01:02:11,620 And now suppose that you only have 1785 01:02:11,620 --> 01:02:14,220 the gate NAND to work with. 
1786 01:02:14,220 --> 01:02:16,083 Do people know what NAND is? 1787 01:02:16,083 --> 01:02:16,927 OK. 1788 01:02:16,927 --> 01:02:18,010 It's a sort of logic gate. 1789 01:02:18,010 --> 01:02:20,680 It's a very simple logic gate that you can build up 1790 01:02:20,680 --> 01:02:22,300 all the other logic gates from. 1791 01:02:22,300 --> 01:02:25,790 But it will take you a while to build up AND from NAND or OR 1792 01:02:25,790 --> 01:02:26,290 from NAND. 1793 01:02:26,290 --> 01:02:27,340 But you can do it. 1794 01:02:27,340 --> 01:02:29,620 And so what he did was, and his colleagues, they 1795 01:02:29,620 --> 01:02:31,394 started out sort of giving this algorithm 1796 01:02:31,394 --> 01:02:32,560 a lot of different problems. 1797 01:02:32,560 --> 01:02:35,184 Like, here's a bunch of circuit diagrams that you need to solve 1798 01:02:35,184 --> 01:02:37,810 or circuit problems that you need to find the diagram for. 1799 01:02:37,810 --> 01:02:40,120 What the algorithm was doing was that each time it 1800 01:02:40,120 --> 01:02:41,850 solved a problem or set of problems, 1801 01:02:41,850 --> 01:02:44,350 it would go back and look, huh, which parts of this 1802 01:02:44,350 --> 01:02:46,120 can I encapsulate, right? 1803 01:02:46,120 --> 01:02:48,340 Which parts of this can I sort of use again? 1804 01:02:48,340 --> 01:02:51,140 I can carve off a chunk of something that was useful. 1805 01:02:51,140 --> 01:02:52,640 And now, when I make a new proposal, 1806 01:02:52,640 --> 01:02:56,230 I'm not going to say put a NAND here or a NAND there. 1807 01:02:56,230 --> 01:02:58,750 Stochastically, I'm going to sort of put in a whole chunk 1808 01:02:58,750 --> 01:03:00,140 that I've already used before. 1809 01:03:00,140 --> 01:03:00,640 OK? 1810 01:03:00,640 --> 01:03:02,598 I'm going to sort of call that a new primitive. 1811 01:03:02,598 --> 01:03:04,500 Cut out this part of the space. 
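[Editor's sketch, not part of the talk.] The circuit problem described above can be made concrete with a few lines of code. This is a minimal sketch, assuming 0/1 integers for signals; the helper names `not_`, `and_`, `or_`, and `light` are hypothetical. It starts from NAND alone, builds each further gate as a chunk on top of earlier chunks, and checks one candidate circuit against the three observations given in the lecture. It does not implement the stochastic search or the EC algorithm itself.

```python
def nand(a, b):
    """The only primitive gate: outputs 0 only when both inputs are 1."""
    return 1 - (a & b)

# Each definition below is a chunk built from NAND and from earlier chunks.
def not_(a):
    return nand(a, a)

def and_(a, b):
    return not_(nand(a, b))

def or_(a, b):
    return nand(not_(a), not_(b))

def light(x, y):
    """Candidate circuit: the light is on when x and y agree."""
    return or_(and_(x, y), and_(not_(x), not_(y)))

# The three observations from the lecture: (x, y) -> light on (1) or off (0).
observations = {(1, 1): 1, (0, 0): 1, (1, 0): 0}
consistent = all(light(x, y) == out for (x, y), out in observations.items())
print(consistent)  # True
```

Written out in raw NANDs, `light` is a long expression. Written in terms of `and_`, `or_`, and `not_`, it is three chunks deep, which is the sense in which saved chunks shorten the programs a search has to reach.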
1812 01:03:04,500 --> 01:03:07,000 Under the hood, it's actually an AND or something like that. 1813 01:03:07,000 --> 01:03:09,130 And it discovers-- so what it discovers is not an AND. 1814 01:03:09,130 --> 01:03:10,866 What it discovers is this really useful thing 1815 01:03:10,866 --> 01:03:13,240 that they called E2, which doesn't appear in logic books. 1816 01:03:13,240 --> 01:03:16,070 But it turns out to be really useful for certain diagrams, 1817 01:03:16,070 --> 01:03:18,910 which is take an input, split it into two, 1818 01:03:18,910 --> 01:03:21,790 do something on this part, do something on that part, 1819 01:03:21,790 --> 01:03:22,860 and recombine it. 1820 01:03:22,860 --> 01:03:24,610 It turns out to be a hugely useful concept 1821 01:03:24,610 --> 01:03:25,540 for circuit diagrams. 1822 01:03:25,540 --> 01:03:26,950 And this thing discovers it. 1823 01:03:26,950 --> 01:03:29,500 And once it discovers it, it sort of reuses it. 1824 01:03:29,500 --> 01:03:32,020 And what that does is it turns an infinite and unmanageable 1825 01:03:32,020 --> 01:03:34,401 space into an infinite but a bit more manageable one. 1826 01:03:34,401 --> 01:03:34,900 OK? 1827 01:03:34,900 --> 01:03:36,069 So your space is infinite. 1828 01:03:36,069 --> 01:03:38,110 You're not going to search the full length of it. 1829 01:03:38,110 --> 01:03:40,600 Imagine that this is a space of all possible programs. 1830 01:03:40,600 --> 01:03:42,610 As you go down, the programs get way too long. 1831 01:03:42,610 --> 01:03:44,090 You're never going to reach them. 1832 01:03:44,090 --> 01:03:45,530 But some of them are really good. 1833 01:03:45,530 --> 01:03:46,720 Some of them are really good explanations. 1834 01:03:46,720 --> 01:03:48,250 And the only way to get to them is 1835 01:03:48,250 --> 01:03:50,020 if you had some sort of way of chunking 1836 01:03:50,020 --> 01:03:52,990 the problem, of saying, yes, it looks like a long program.
1837 01:03:52,990 --> 01:03:56,020 But actually half of it I've already used before 1838 01:03:56,020 --> 01:03:57,310 to solve a different thing. 1839 01:03:57,310 --> 01:04:00,511 And half of it is less long than two times. 1840 01:04:00,511 --> 01:04:02,260 So you might discover, you know, you might 1841 01:04:02,260 --> 01:04:03,430 have an effective search area. 1842 01:04:03,430 --> 01:04:05,513 You find out all the problems you can solve there. 1843 01:04:08,440 --> 01:04:13,210 Yeah, this is an interesting choice of color. 1844 01:04:13,210 --> 01:04:15,730 So imagine that this blue thing over here 1845 01:04:15,730 --> 01:04:18,290 is describing, within the space of all possible programs, 1846 01:04:18,290 --> 01:04:20,440 the sort of programs that you want to find. 1847 01:04:20,440 --> 01:04:22,870 So there's the effective search area. 1848 01:04:22,870 --> 01:04:25,030 It only covers part of this blue thing. 1849 01:04:25,030 --> 01:04:26,696 You can think of it like the probability 1850 01:04:26,696 --> 01:04:27,860 is really high over there. 1851 01:04:27,860 --> 01:04:29,680 You really want to find all of them. 1852 01:04:29,680 --> 01:04:31,480 But by searching that small space 1853 01:04:31,480 --> 01:04:34,180 and, within that small space, finding the right primitives 1854 01:04:34,180 --> 01:04:36,490 and encapsulating them, you can now actually 1855 01:04:36,490 --> 01:04:38,652 search the rest of the space more efficiently. 1856 01:04:38,652 --> 01:04:41,110 And the rest of the space sort of compresses and compresses 1857 01:04:41,110 --> 01:04:43,594 until it's all within your effective search area. 1858 01:04:43,594 --> 01:04:45,010 Do people sort of understand that? 1859 01:04:45,010 --> 01:04:46,270 It's sort of there were long programs 1860 01:04:46,270 --> 01:04:48,010 before that you never would have gotten to.
1861 01:04:48,010 --> 01:04:49,480 But by searching these small spaces 1862 01:04:49,480 --> 01:04:51,021 that you could search through before, 1863 01:04:51,021 --> 01:04:52,840 discovering the useful parts there, 1864 01:04:52,840 --> 01:04:55,369 these new things that seemed really long before 1865 01:04:55,369 --> 01:04:56,160 are actually short. 1866 01:04:56,160 --> 01:04:59,180 Because they can be described by just a few chunks. 1867 01:04:59,180 --> 01:05:00,024 OK? 1868 01:05:00,024 --> 01:05:01,440 This is really interesting work. 1869 01:05:01,440 --> 01:05:03,240 And I encourage you all to read it, those 1870 01:05:03,240 --> 01:05:04,620 of you who find it interesting. 1871 01:05:04,620 --> 01:05:06,370 The last thing I'm going to do-- and then that'll leave 10 1872 01:05:06,370 --> 01:05:08,184 minutes for discussion, which is great-- 1873 01:05:08,184 --> 01:05:09,600 is this problem that I guess maybe 1874 01:05:09,600 --> 01:05:11,160 it's really the heart of what Laura is getting at. 1875 01:05:11,160 --> 01:05:13,020 I think she was not satisfied by any of these things. 1876 01:05:13,020 --> 01:05:14,790 And she was sort of pointing out, well, fine, you 1877 01:05:14,790 --> 01:05:16,373 can do stochastic search all you want. 1878 01:05:16,373 --> 01:05:19,020 But the really hard problem is constructing the space itself 1879 01:05:19,020 --> 01:05:20,750 on the fly. 1880 01:05:20,750 --> 01:05:22,500 You're not going to use one infinite space 1881 01:05:22,500 --> 01:05:23,583 for all possible problems. 1882 01:05:23,583 --> 01:05:26,432 You're going to use the right spaces for the right problem. 1883 01:05:26,432 --> 01:05:27,390 And how do you do that? 1884 01:05:30,234 --> 01:05:32,400 In this case, we're going to do, give me a good name 1885 01:05:32,400 --> 01:05:34,211 for a romantic drama. 1886 01:05:34,211 --> 01:05:34,710 All right.
1887 01:05:34,710 --> 01:05:37,270 And your search space is going to be imagined that-- can 1888 01:05:37,270 --> 01:05:38,520 people see sort of the border? 1889 01:05:38,520 --> 01:05:40,656 Like, there's this whole space of uselessness. 1890 01:05:40,656 --> 01:05:42,030 And what you really want to do is 1891 01:05:42,030 --> 01:05:44,460 focus in on that tiny part of useful things. 1892 01:05:44,460 --> 01:05:46,920 If only there was a way of just on the fly, 1893 01:05:46,920 --> 01:05:48,690 you know, zooming in on that thing 1894 01:05:48,690 --> 01:05:50,485 and then bouncing around in that. 1895 01:05:50,485 --> 01:05:53,910 And the point is to say, well, when we construct the space, 1896 01:05:53,910 --> 01:05:55,424 we can just use previous examples. 1897 01:05:55,424 --> 01:05:57,090 I don't think it's the case that we just 1898 01:05:57,090 --> 01:05:59,071 knew something necessarily completely 1899 01:05:59,071 --> 01:06:00,695 new in these sort of everyday thinking. 1900 01:06:00,695 --> 01:06:01,280 Well, maybe. 1901 01:06:01,280 --> 01:06:03,030 We can argue about that in the discussion. 1902 01:06:03,030 --> 01:06:04,863 What you actually start out with is actually 1903 01:06:04,863 --> 01:06:07,800 taking a few examples that you find relevant in some way 1904 01:06:07,800 --> 01:06:09,940 and using those examples to then construct 1905 01:06:09,940 --> 01:06:12,564 your space on the fly, right? 1906 01:06:12,564 --> 01:06:13,980 You might think about things like, 1907 01:06:13,980 --> 01:06:16,950 what other romantic dramas do I remember in the past? 1908 01:06:16,950 --> 01:06:18,220 What do they share in common? 1909 01:06:18,220 --> 01:06:20,670 What movie names do I know of in the past-- 1910 01:06:20,670 --> 01:06:22,830 quickly finding the sort of relevant thing for all 1911 01:06:22,830 --> 01:06:25,082 these things, and then having the space for those, 1912 01:06:25,082 --> 01:06:26,790 and then searching around stochastically. 
1913 01:06:26,790 --> 01:06:29,010 Because you're not going to do better than stochastic search. 1914 01:06:29,010 --> 01:06:31,385 There will come a point where you're just bouncing around 1915 01:06:31,385 --> 01:06:32,370 at random. 1916 01:06:32,370 --> 01:06:35,509 So I used this actually, forgive me, a paper title for SRCD 1917 01:06:35,509 --> 01:06:37,050 and came up with some amusing things. 1918 01:06:37,050 --> 01:06:39,460 You guys can play with that online if you want. 1919 01:06:39,460 --> 01:06:42,699 But let's do, give me a good name for a new romantic drama. 1920 01:06:42,699 --> 01:06:45,240 So as I said, what you would do is you would just think about 1921 01:06:45,240 --> 01:06:47,650 all the romantic dramas that you know, like The Climbers, 1922 01:06:47,650 --> 01:06:51,150 Christine of The Big Tops, Cupid's-- these are all actual 1923 01:06:51,150 --> 01:06:53,460 romantic dramas pulled off of Wikipedia-- 1924 01:06:53,460 --> 01:06:55,380 then use those to construct your space. 1925 01:06:55,380 --> 01:06:58,310 Don't care about all the things that could happen in the world. 1926 01:06:58,310 --> 01:06:58,830 OK? 1927 01:06:58,830 --> 01:07:00,150 And what do we mean construct your space? 1928 01:07:00,150 --> 01:07:01,600 Well, there's a bunch of ways to look the space. 1929 01:07:01,600 --> 01:07:03,660 What ideally we would want and what I didn't do, 1930 01:07:03,660 --> 01:07:05,550 but what we're thinking of, is to construct 1931 01:07:05,550 --> 01:07:07,320 a very, very simple grammar which 1932 01:07:07,320 --> 01:07:08,695 instead of all possible sentences 1933 01:07:08,695 --> 01:07:10,890 is a grammar for movie titles. 1934 01:07:10,890 --> 01:07:14,260 And this grammar usually tends to generate things like the, 1935 01:07:14,260 --> 01:07:14,760 right? 1936 01:07:14,760 --> 01:07:15,990 The something something. 1937 01:07:15,990 --> 01:07:16,950 And [AUDIO OUT] long. 1938 01:07:16,950 --> 01:07:17,940 And then it just stops. 
1939 01:07:17,940 --> 01:07:18,540 OK? 1940 01:07:18,540 --> 01:07:20,831 And it turns out that something like the adjective noun 1941 01:07:20,831 --> 01:07:23,010 is a really good way of generating names for pubs-- 1942 01:07:23,010 --> 01:07:26,490 The White Queen, The Blond Tiger, The Bleeding Bottle, 1943 01:07:26,490 --> 01:07:29,470 I don't know, something. 1944 01:07:29,470 --> 01:07:30,040 Right? 1945 01:07:30,040 --> 01:07:31,500 That's really useful if only it could do that. 1946 01:07:31,500 --> 01:07:33,100 If it could construct these tiny grammars, 1947 01:07:33,100 --> 01:07:35,183 it'll still give you an infinite number of things. 1948 01:07:35,183 --> 01:07:37,350 But, you know, [AUDIO OUT] movie names. 1949 01:07:37,350 --> 01:07:40,440 Or things like verbing proper name 1950 01:07:40,440 --> 01:07:43,350 turns out to be a really good thing for like, you know, 1951 01:07:43,350 --> 01:07:47,790 Amy Stopping, Interrupting Timmy. 1952 01:07:47,790 --> 01:07:49,820 It's so bad. 1953 01:07:49,820 --> 01:07:52,170 And you could find that from looking at these things. 1954 01:07:52,170 --> 01:07:54,150 And just to show you how much I think 1955 01:07:54,150 --> 01:07:58,540 that this is, you know, actually not that bad of a problem, 1956 01:07:58,540 --> 01:07:59,850 I did not do this grammar thing. 1957 01:07:59,850 --> 01:08:01,890 I did something even simpler, which 1958 01:08:01,890 --> 01:08:04,530 is to take all the other names that I could find on Wikipedia 1959 01:08:04,530 --> 01:08:07,900 for different movie genres throughout the ages, 1960 01:08:07,900 --> 01:08:10,940 and then I looked at things like romantic dramas. 1961 01:08:10,940 --> 01:08:13,770 And what I did was construct a very simple n-gram which 1962 01:08:13,770 --> 01:08:16,020 just takes those words and just sort of does 1963 01:08:16,020 --> 01:08:18,000 random walk on those words. 1964 01:08:18,000 --> 01:08:18,810 OK?
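[Editor's sketch, not part of the talk.] The "very simple n-gram" with a random walk over words could look roughly like the bigram model below. This is a sketch under assumptions, not the speaker's actual code: the talk does not say which n was used, the two-title corpus `TITLES` is a stand-in for the Wikipedia list, and `build_bigrams` and `sample_title` are hypothetical names.

```python
import random
from collections import defaultdict

START, END = "<s>", "</s>"

def build_bigrams(titles):
    """Map each word to the list of words that followed it in some title."""
    table = defaultdict(list)
    for title in titles:
        words = [START] + title.split() + [END]
        for prev, nxt in zip(words, words[1:]):
            table[prev].append(nxt)
    return table

def sample_title(table, rng, max_words=6):
    """Random walk from START, following observed word-to-word transitions."""
    word, out = START, []
    while len(out) < max_words:
        word = rng.choice(table[word])
        if word == END:
            break
        out.append(word)
    return " ".join(out)

# Stand-in corpus (assumption): two titles that appear in the talk's quiz.
TITLES = ["Legend of Paris", "Land of Roses"]
table = build_bigrams(TITLES)
rng = random.Random(0)
print([sample_title(table, rng) for _ in range(3)])
```

With this tiny corpus the walk can only produce recombinations such as "Legend of Roses" or "Land of Paris". With a larger genre-specific list the same walk produces both the plausible and the silly titles described in the talk.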
1965 01:08:18,810 --> 01:08:21,233 And you could imagine complicating this immediately 1966 01:08:21,233 --> 01:08:22,649 by taking something like embedding 1967 01:08:22,649 --> 01:08:24,120 those words in a high-dimensional space 1968 01:08:24,120 --> 01:08:25,880 and actually picking words that aren't close to that. 1969 01:08:25,880 --> 01:08:27,750 So you could get new words that were never in there before. 1970 01:08:27,750 --> 01:08:28,875 I'm not even doing that. 1971 01:08:28,875 --> 01:08:30,500 I'm doing something ridiculously simple 1972 01:08:30,500 --> 01:08:31,999 that I don't think people are doing. 1973 01:08:31,999 --> 01:08:34,710 But let me show you how reasonable it is. 1974 01:08:34,710 --> 01:08:35,220 OK? 1975 01:08:35,220 --> 01:08:37,109 And what I'm going to compare it to 1976 01:08:37,109 --> 01:08:39,410 is some stuff that we asked people on Mechanical Turk 1977 01:08:39,410 --> 01:08:41,160 to give us names for a new romantic drama. 1978 01:08:44,140 --> 01:08:45,930 Ah, the only thing I forgot was the right 1979 01:08:45,930 --> 01:08:47,040 labeling for these things. 1980 01:08:47,040 --> 01:08:48,600 So Laura, what do you think? 1981 01:08:48,600 --> 01:08:51,380 Is this from Turk, or is this my algorithm? 1982 01:08:54,380 --> 01:08:57,939 LAURA SCHULZ: I am 50/50 [AUDIO OUT]. 1983 01:08:57,939 --> 01:09:00,592 TOMER ULLMAN: So how about we do it, not by show of hands, 1984 01:09:00,592 --> 01:09:01,800 but people just shout it out. 1985 01:09:01,800 --> 01:09:05,935 Like, if it's, I don't know, Turk or-- 1986 01:09:05,935 --> 01:09:08,939 I'm looking for a short word which is like a Turk or Tomer. 1987 01:09:08,939 --> 01:09:11,600 Let's do it that way, so Tomer just standing in for Tomer's 1988 01:09:11,600 --> 01:09:13,392 simple silly algorithm. 1989 01:09:13,392 --> 01:09:14,850 So who thinks that this was created 1990 01:09:14,850 --> 01:09:17,102 by someone, an actual human, on Mechanical Turk?
1991 01:09:17,102 --> 01:09:19,560 And who thinks it was created by Tomer mechanically running 1992 01:09:19,560 --> 01:09:20,460 through an algorithm? 1993 01:09:20,460 --> 01:09:21,090 OK. 1994 01:09:21,090 --> 01:09:24,560 So in 3, 2, 1 you're either going to shout Turk or Tomer. 1995 01:09:24,560 --> 01:09:26,142 3, 2, 1. 1996 01:09:26,142 --> 01:09:27,750 [INTERPOSING VOICES] 1997 01:09:27,750 --> 01:09:28,500 TOMER ULLMAN: OK. 1998 01:09:28,500 --> 01:09:31,140 That was actually someone at Mechanical Turk. 1999 01:09:31,140 --> 01:09:32,010 Let's do this again. 2000 01:09:32,010 --> 01:09:36,607 Girls In Ships for a romantic drama, 3, 2, 1-- 2001 01:09:36,607 --> 01:09:38,399 AUDIENCE: Tomer. 2002 01:09:38,399 --> 01:09:39,960 TOMER ULLMAN: This was an algorithm. 2003 01:09:39,960 --> 01:09:42,270 Value Of Love, 3, 2, 1-- 2004 01:09:42,270 --> 01:09:43,870 AUDIENCE: Turk. 2005 01:09:43,870 --> 01:09:45,600 TOMER ULLMAN: That was Turk, good. 2006 01:09:45,600 --> 01:09:47,860 Endless Love, 3, 2, 1-- 2007 01:09:47,860 --> 01:09:48,540 AUDIENCE: Turk. 2008 01:09:48,540 --> 01:09:49,140 TOMER ULLMAN: Good. 2009 01:09:49,140 --> 01:09:49,639 OK. 2010 01:09:49,639 --> 01:09:51,279 How about Legend of Paris? 2011 01:09:51,279 --> 01:09:52,370 3, 2, 1-- 2012 01:09:52,370 --> 01:09:53,935 [INTERPOSING VOICES] 2013 01:09:53,935 --> 01:09:55,060 TOMER ULLMAN: Nobody knows. 2014 01:09:55,060 --> 01:09:57,050 This is actually me. 2015 01:09:57,050 --> 01:09:57,550 OK. 2016 01:09:57,550 --> 01:10:00,730 Who's enjoying this and wants to do a few more? 2017 01:10:00,730 --> 01:10:02,980 Land of Roses, 3, 2, 1-- 2018 01:10:02,980 --> 01:10:03,904 AUDIENCE: Tomer. 2019 01:10:03,904 --> 01:10:05,290 TOMER ULLMAN: Tomer. 2020 01:10:05,290 --> 01:10:07,960 And finally, Those We Meet Again, 3, 2, 1-- 2021 01:10:07,960 --> 01:10:08,790 AUDIENCE: Turk. 2022 01:10:08,790 --> 01:10:10,390 TOMER ULLMAN: No, it wasn't me. 2023 01:10:10,390 --> 01:10:11,290 Oh, sorry, one more. 
2024 01:10:11,290 --> 01:10:13,065 Love Lightly, 3, 2, 1-- 2025 01:10:13,065 --> 01:10:13,690 AUDIENCE: Turk. 2026 01:10:13,690 --> 01:10:14,740 TOMER ULLMAN: Yeah, Turk. 2027 01:10:14,740 --> 01:10:16,198 It seems like Turkers were actually 2028 01:10:16,198 --> 01:10:18,760 doing better than the algorithm, which is romantic is love. 2029 01:10:18,760 --> 01:10:20,218 And I'm just going to put something 2030 01:10:20,218 --> 01:10:22,210 with love in the title. 2031 01:10:22,210 --> 01:10:24,490 So who wants to do this for action movies, and then 2032 01:10:24,490 --> 01:10:26,580 we'll stop? 2033 01:10:26,580 --> 01:10:27,269 OK. 2034 01:10:27,269 --> 01:10:28,310 Let's do this for action. 2035 01:10:28,310 --> 01:10:29,830 How about The Chase? 2036 01:10:29,830 --> 01:10:30,760 3, 2, 1-- 2037 01:10:30,760 --> 01:10:32,110 AUDIENCE: Turk. 2038 01:10:32,110 --> 01:10:34,345 TOMER ULLMAN: Yes, how about Who, The Annihilation? 2039 01:10:34,345 --> 01:10:36,580 [LAUGHTER] 2040 01:10:36,580 --> 01:10:38,080 TOMER ULLMAN: OK, that's me. 2041 01:10:38,080 --> 01:10:39,190 The Oversight? 2042 01:10:39,190 --> 01:10:40,330 Turk. 2043 01:10:40,330 --> 01:10:41,450 The Edge. 2044 01:10:41,450 --> 01:10:42,250 AUDIENCE: Turk. 2045 01:10:42,250 --> 01:10:43,070 TOMER ULLMAN: Turk. 2046 01:10:43,070 --> 01:10:45,070 Jack Death? 2047 01:10:45,070 --> 01:10:45,910 Tomer. 2048 01:10:45,910 --> 01:10:46,960 Among Heroes. 2049 01:10:46,960 --> 01:10:47,892 AUDIENCE: Turk. 2050 01:10:47,892 --> 01:10:49,760 TOMER ULLMAN: No, it was me. 2051 01:10:49,760 --> 01:10:51,420 Swordmen in China Three? 2052 01:10:51,420 --> 01:10:52,348 AUDIENCE: Tomer. 2053 01:10:52,348 --> 01:10:53,740 TOMER ULLMAN: Tomer. 2054 01:10:53,740 --> 01:10:54,306 And The Hit? 2055 01:10:54,306 --> 01:10:54,930 AUDIENCE: Turk. 2056 01:10:54,930 --> 01:10:55,620 TOMER ULLMAN: People on Turk. 2057 01:10:55,620 --> 01:10:57,710 You can probably [AUDIO OUT] than four in each one.
2058 01:10:57,710 --> 01:11:00,335 And, again, people are like, The Oversight, The Hit, The Chase, 2059 01:11:00,335 --> 01:11:00,879 The Edge. 2060 01:11:00,879 --> 01:11:02,170 That's the only thing they did. 2061 01:11:02,170 --> 01:11:04,740 They actually came up with some clever stuff as well. 2062 01:11:04,740 --> 01:11:06,272 But, you know, it's interesting. 2063 01:11:06,272 --> 01:11:07,480 And, of course, I'm cheating. 2064 01:11:07,480 --> 01:11:09,729 Because the algorithm did a bunch of really dumb stuff 2065 01:11:09,729 --> 01:11:13,250 that I didn't put in here, like Hunchback of Monte Cristo, 2066 01:11:13,250 --> 01:11:15,500 Get it Did, Bell of a Lesser God, 2067 01:11:15,500 --> 01:11:18,010 Eagles Shooting Heroes, Tomb Raider, 2068 01:11:18,010 --> 01:11:23,890 The Raging God Of Violence, and Legend of Legend, my favorite. 2069 01:11:23,890 --> 01:11:27,566 But my point is to say, you know, in the same sense 2070 01:11:27,566 --> 01:11:29,190 that Joshua's saying, you know, imagine 2071 01:11:29,190 --> 01:11:30,981 that you could use something like a ConvNet 2072 01:11:30,981 --> 01:11:33,640 to quickly seed your proposals-- imagine if you could think 2073 01:11:33,640 --> 01:11:36,730 of like a random dumb algorithm that could then [AUDIO OUT] 2074 01:11:36,730 --> 01:11:38,430 and say Legend of something. 2075 01:11:38,430 --> 01:11:39,680 And then you start to say, no. 2076 01:11:39,680 --> 01:11:40,930 That's not really a great idea. 2077 01:11:40,930 --> 01:11:42,304 What you're trying to get at here 2078 01:11:42,304 --> 01:11:44,200 is not 100% accuracy with these silly things, 2079 01:11:44,200 --> 01:11:45,740 but something like 1 in 5. 2080 01:11:45,740 --> 01:11:49,600 1 in 5 is better than 1 in 0, or 1 in a million 2081 01:11:49,600 --> 01:11:52,490 or something like that, which is what Laura was pointing out. 2082 01:11:52,490 --> 01:11:52,990 OK.
2083 01:11:52,990 --> 01:11:55,570 So as I said, we still have a long, long way 2084 01:11:55,570 --> 01:11:57,790 to go to model children to meet Laura's critique. 2085 01:11:57,790 --> 01:11:58,999 It's hard to say what's hard. 2086 01:11:58,999 --> 01:12:01,248 I think that's what I was trying to hint at with Steve 2087 01:12:01,248 --> 01:12:02,050 Piantadosi's point. 2088 01:12:02,050 --> 01:12:03,520 We don't really know what's easy. 2089 01:12:03,520 --> 01:12:05,200 We don't really know what's hard. 2090 01:12:05,200 --> 01:12:07,414 But people in development and in computational land 2091 01:12:07,414 --> 01:12:09,580 should continue to care about stochastic algorithms. 2092 01:12:09,580 --> 01:12:11,205 And people in computational land should 2093 01:12:11,205 --> 01:12:14,440 continue to care about children to everyone's benefit. 2094 01:12:14,440 --> 01:12:16,170 And that's it-- so, Laura. 2095 01:12:16,170 --> 01:12:18,062 LAURA SCHULZ: This will be very short. 2096 01:12:18,062 --> 01:12:21,850 [APPLAUSE] 2097 01:12:21,850 --> 01:12:23,310 So I didn't know. 2098 01:12:23,310 --> 01:12:24,310 Or rather, I did know. 2099 01:12:24,310 --> 01:12:25,250 But I didn't know it was going to be 2100 01:12:25,250 --> 01:12:27,560 part of this debate about Max Siegel's thing. 2101 01:12:27,560 --> 01:12:29,390 So I'll say something briefly about that. 2102 01:12:29,390 --> 01:12:35,216 But you've had three really good approaches to each of these. 2103 01:12:35,216 --> 01:12:36,590 So I'll speak briefly about them. 2104 01:12:36,590 --> 01:12:38,970 I think what Owen is doing is totally great, 2105 01:12:38,970 --> 01:12:41,390 but still driven in some sense by the data, 2106 01:12:41,390 --> 01:12:43,790 not by the question, right? 
2107 01:12:43,790 --> 01:12:46,370 And I think the point that I'm going to make just continually 2108 01:12:46,370 --> 01:12:49,580 here is that the way we think is driven by the goals 2109 01:12:49,580 --> 01:12:52,100 that we have, right? 2110 01:12:52,100 --> 01:12:55,200 And each of these solutions in some ways 2111 01:12:55,200 --> 01:12:58,100 is failing to use what is most salient to it as humans, 2112 01:12:58,100 --> 01:12:59,700 which is we have problems. 2113 01:12:59,700 --> 01:13:01,970 We have questions, right? 2114 01:13:01,970 --> 01:13:04,850 And what I would like to do is see us move to a case 2115 01:13:04,850 --> 01:13:07,700 where it's not just the data that's causing 2116 01:13:07,700 --> 01:13:09,257 us to generate new ideas. 2117 01:13:09,257 --> 01:13:11,090 And we're not just trying to deal with that. 2118 01:13:11,090 --> 01:13:16,720 It is actually the information in the problem itself. 2119 01:13:16,720 --> 01:13:19,570 Similarly, I think what Eyal's doing is totally beautiful. 2120 01:13:19,570 --> 01:13:21,240 And the representational compression 2121 01:13:21,240 --> 01:13:23,310 is really, really interesting. 2122 01:13:23,310 --> 01:13:26,050 But a lot of learning problems can't be solved. 2123 01:13:26,050 --> 01:13:28,542 Most of the ones I was gesturing at 2124 01:13:28,542 --> 01:13:30,000 are not really problems of changing 2125 01:13:30,000 --> 01:13:31,166 the representational format. 2126 01:13:31,166 --> 01:13:33,630 It matters hugely that we have an Arabic numeral 2127 01:13:33,630 --> 01:13:35,430 system instead of a Roman numeral system. 2128 01:13:35,430 --> 01:13:37,920 That changes the kinds of problems that we can solve. 2129 01:13:37,920 --> 01:13:42,100 And so that represents a huge advancement. 2130 01:13:42,100 --> 01:13:44,400 And for many kinds of problems, it 2131 01:13:44,400 --> 01:13:47,160 will make search much more efficient. 
2132 01:13:47,160 --> 01:13:49,390 But a lot of problems just don't have that property. 2133 01:13:49,390 --> 01:13:50,940 So it's, in some sense, an answer 2134 01:13:50,940 --> 01:13:53,970 to a different kind of problem. 2135 01:13:53,970 --> 01:13:57,840 Steve's proposal-- what can you say, right? 2136 01:13:57,840 --> 01:13:58,699 It could be true. 2137 01:13:58,699 --> 01:14:00,240 There are a billion, billion neurons. 2138 01:14:00,240 --> 01:14:01,614 You get more synaptic connections 2139 01:14:01,614 --> 01:14:03,450 than there are stars in the known universe. 2140 01:14:03,450 --> 01:14:06,310 Of course, it could be true. 2141 01:14:06,310 --> 01:14:08,760 That's what an expedition means-- a long line 2142 01:14:08,760 --> 01:14:10,532 of everybody, says Pooh. 2143 01:14:10,532 --> 01:14:13,550 But it's not as good a story if it's true. 2144 01:14:13,550 --> 01:14:14,700 So it could be true. 2145 01:14:14,700 --> 01:14:16,950 You could do a billion things really really fast 2146 01:14:16,950 --> 01:14:19,130 and just think about the ones that you arrive at. 2147 01:14:19,130 --> 01:14:24,090 But I think the jury's out on that one. 2148 01:14:24,090 --> 01:14:26,750 Max and Tomer and this-- 2149 01:14:26,750 --> 01:14:28,709 and while ago I think Sam Gershman also, right? 2150 01:14:28,709 --> 01:14:30,208 So Sam Gershman came up to us and we 2151 01:14:30,208 --> 01:14:32,490 spent a while talking about how you would invent what 2152 01:14:32,490 --> 01:14:35,700 we were affectionately calling a bullshit generator, our ability 2153 01:14:35,700 --> 01:14:37,317 to [AUDIO OUT]. 2154 01:14:37,317 --> 01:14:39,150 Somebody asked me about anything, you know-- 2155 01:14:39,150 --> 01:14:40,800 tell me about Ionic and Doric columns, 2156 01:14:40,800 --> 01:14:42,510 you remember something from sixth grade. 2157 01:14:42,510 --> 01:14:44,990 And you start talking, right? 2158 01:14:44,990 --> 01:14:46,760 So the question is, what can you do? 
2159 01:14:46,760 --> 01:14:48,510 And I think this is a really nice attempt. 2160 01:14:48,510 --> 01:14:50,790 And I think the idea of seeding it from past examples 2161 01:14:50,790 --> 01:14:54,090 to help construct a search space is a really beautiful idea. 2162 01:14:54,090 --> 01:14:58,140 Again, the question is, how do you make that? 2163 01:14:58,140 --> 01:15:01,860 My feeling is still we can do something 2164 01:15:01,860 --> 01:15:03,600 that works for those kinds of problems 2165 01:15:03,600 --> 01:15:04,800 where we have past examples. 2166 01:15:04,800 --> 01:15:07,880 We can do it for any kind of problem. 2167 01:15:07,880 --> 01:15:09,360 And so what I really want to push 2168 01:15:09,360 --> 01:15:11,760 for is use the problem, right? 2169 01:15:11,760 --> 01:15:13,499 Use the information in the problem. 2170 01:15:13,499 --> 01:15:15,540 Because for those problems, like romantic movies, 2171 01:15:15,540 --> 01:15:16,740 we have some existing examples. 2172 01:15:16,740 --> 01:15:17,640 We can seed the search space. 2173 01:15:17,640 --> 01:15:18,150 We can do that. 2174 01:15:18,150 --> 01:15:20,191 And for my theater company example, it's perfect. 2175 01:15:20,191 --> 01:15:22,620 But for the peppermint example, not so much, right? 2176 01:15:22,620 --> 01:15:24,870 You're not going to seed the search space from examples 2177 01:15:24,870 --> 01:15:27,210 of candies, or what you know about the construction 2178 01:15:27,210 --> 01:15:27,720 of candies. 2179 01:15:27,720 --> 01:15:30,303 If you did, it wouldn't generate the pendulum answer, which we 2180 01:15:30,303 --> 01:15:32,010 think is a good wrong answer. 2181 01:15:32,010 --> 01:15:34,230 So it's not just that. 2182 01:15:34,230 --> 01:15:36,600 It is I have a problem of a particular kind. 2183 01:15:36,600 --> 01:15:39,480 It is going to be satisfied by some kind of an answer and not 2184 01:15:39,480 --> 01:15:40,050 others.
2185 01:15:40,050 --> 01:15:43,662 How can I use that to help my [INAUDIBLE]? 2186 01:15:43,662 --> 01:15:47,850 So that's I think the end of what I have to say. 2187 01:15:47,850 --> 01:15:50,930 And we'll return to questions.