The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOSHUA TENENBAUM: So we saw in Laura's talk this introduction to the idea of the child as scientist, all the different ways that children's learning seems to follow the many different practices that scientists use to learn about the world, not just data analysis, in a sense. However important various kinds of statistics on a grand scale may be for learning, whether it's Hebbian learning or backprop, we know that that's not all children do, just like analyzing patterns of data is not all that scientists do.

And then Laura added this other cool dimension, thinking about the costs and rewards of information, when it is worth it or not. And you could say maybe she was suggesting that we should develop the metaphor, expand it a little bit, from the child as scientist to maybe the child as -- oh, oops, sorry -- lab PI, or maybe even NSF center director. Because as everyone knows, but Tommy certainly can tell you, whether you're a lab PI who's just gotten tenure or the director of an NSF center, you have to make hard-nosed pragmatic decisions about what is achievable given the costs and which research questions are really worth going after and devoting your time and other resources to. And that's a very important part of science. And it's an important part of intuitive knowledge.

I want to add another practical dimension to things, a way of bringing out, fleshing out, this idea of the child as scientist. You can think of all of these as metaphors. But they're things that we can formalize. And if our goal is to make computational models of these things, and ultimately to get some kind of theoretical handle on them, then these metaphors are helping us. By adding in the costs and benefits, you bring in a utility calculus.
And there's not just a naive utility calculus; there's a formal mathematical utility calculus of these kinds of decisions. Julian Jara-Ettinger, who Laura mentioned was driving a lot of the work towards the end of the talk, has done some really interesting actual mathematical, computational models of these issues, as has [? Choi ?] and the other students you talked about.

So the direction I want to push here is what you might call the child as hacker. This is trying to make the connection back to this idea of formalizing common sense knowledge and intuitive theories as probabilistic programs. Or, just more generally, the idea of a program -- some combination of algorithms, data structures, and networks of functions that can describe interesting causal processes in the world, like, for example, your intuitive physics or your intuitive psychology. That's an idea that we talked a lot about earlier in the week, or last week, whenever it was. And the idea is that if your knowledge is something like a program, or a set of programs, that you can, say, run forward to simulate physics, like we had last time, then learning has to be something like building a program, or hacking.

This is, again, a research program that I wish I had. Laura talked at the end about the research she wished she had -- and it's not really just a wish for her; she actually is working towards that research program. This isn't just an empty wish either. It's something that we're working on. Just as Laura had that wonderful list of all the things scientists do in their practices to learn about the world, I think you could make a similar list of all the things that you do in your programming or hacking. By hacking I don't mean breaking into a secure system, but modifying your code to make it more awesome. And I use awesome very deliberately, because awesome is a multi-dimensional term. It's just awesome.
But it could be faster, more accurate, more efficient, more elegant, more generalizable, more easily communicated to other people, more easily modularly combined with other code to do something even more awesome. I think there's a deep sense in which that aesthetic behind hacking and making awesome code, in both an individual and a social setting, is a really powerful way to think about many of the cognitive activities behind learning. And it goes together with the idea of the child as scientist if the form of your, quote, "intuitive science" is computer programs or something like programs.

So we've been working on a few projects where we've been trying to capture this idea and to say, well, what would it mean to describe computationally this idea of learning as either synthesizing programs, or modifying programs, or making more awesome programs in your mind. And I'll just show you a few examples of this. I'll show you our good, successful case studies, places where we've made this idea work. But the bottom line, to foreshadow it, is that this is really, really hard. And to get it to work for the kinds of knowledge that, say, Laura was talking about, or that Liz was talking about, the real stuff of children's knowledge, is still very, very open. And we want to, basically, build up to engaging with why that problem is so hard, from this to what Tomer and then Laura will talk about later in the afternoon.

But here are a few at least early success stories that we've worked on. One goes back to this idea that I presented in my lectures last week. And it connects, again, to something that Laura was saying. Here is a very basic kind of learning: the problem of learning generalizable concepts at all from very sparse evidence -- one-shot learning, again, something you heard about from Tommy and a number of the other speakers. We've all been trying to wrap our heads around this.
How can you learn, say, any concept at all from very, very little data, maybe just one or a few examples? So you saw this kind of thing last time. And I briefly mentioned how we had tried to capture this problem by building this tree-structured hypothesis space. You could think of that as a kind of program induction, if you think that there's something like an evolutionary program which generated these objects, and you're trying to find the sub-procedure of it that generated just these kinds of objects. But that's not at all how we were able to model this. We had a much simpler model.

Let me show you briefly some work that we did in our group a couple of years ago. It's really just getting out into publication now. This is work that was mostly done by two people -- Ruslan Salakhutdinov, who is now a professor at Toronto, although about to move to Carnegie Mellon, I think, and Brenden Lake. Ruslan is a machine learning person, also very well known for deep learning. And it's really mostly Brenden Lake's work that I'll talk about; he is now a post-doc at NYU.

And again, where we think we're building up to is trying to learn something like the program of an intuitive physics or intuitive psychology. But here we're just talking about learning object concepts. And we've been doing this work with a data set of handwritten characters, the ones you see on the right here. I'll just put it up in contrast, or by comparison, to, say, this other much more famous data set of handwritten characters, the MNIST data set. How many people have seen the MNIST data set, maybe in some of the previous talks? How many people have actually used it? Yeah, it's a great data set to use. It's driven a lot of basic machine learning research, including deep learning. Yann LeCun originally collected this data set and put it out there. And Geoffrey Hinton did most of the development.
The stuff that now wins object recognition challenges was done on this data set. But not only that -- also a lot of Bayesian stuff and probabilistic generative models. Now, the thing about that data set, though, is that it has a very small number of classes, just the digits 0 through 9, and a huge number of examples, roughly 10,000 examples in each class, or maybe 6,000 examples, something like that. But we wanted to construct a data set which was similar in some ways in its complexity and scale, but where we had many, many more concepts and, perhaps, many fewer examples.

So here we got people to write characters by hand in 50 different alphabets. And it's a really cool data set. The total data set has 1,623 concepts. You could call them handwritten characters, or you could just call them simple visual concepts, as a sort of warm-up for bigger problems of, say, natural objects. And there are 20 examples per class. So there are roughly 30,000 total data points in this data set, very much like MNIST.

You can see, just to illustrate here, there are many different alphabets that have very different forms. You can see both the similarities and differences between alphabets here. So in that sense, there's kind of a hierarchical structure. Each one of these is a character in an alphabet. But there's also the higher-level concept of, say, a Sanskrit form, as distinct from, say, Tagalog, or Hebrew, or Braille. There are also some made-up alphabets. One of the neat things about this domain is that you can make up new concepts, and you can make up whole concepts of concepts, like whole new alphabets. You can do one-shot learning in it.

So let's just try this out here for a second. You remember the tufa demo; we can do the same kind of thing here. Let's take these characters. Anybody know the alphabet that this is? OK, that's good -- most of you have not seen these before, and it's good to know who has.
But we'll run this experiment on the rest of you. So here's one example of a concept -- call it a tufa if you like. And I'll just run my mouse over these other ones, and you just clap when I get to the other example of the same class, OK?

[SOUND OF CLAPS]

OK, very good. Yeah, people are basically perfect at this. I mean, again, it's very fast and almost perfect. And again, you saw me talk a little about this last time. Just like with natural objects, not only can you learn one of these concepts from one example and generalize it to others, but you can use that knowledge in various other ways. So you can parse these things into parts -- we think that's part of what you're doing. You can generate new examples. So here are three different people all drawing the same character; in fact, the whole data set was generated that way. You can also make higher-level generalizations, recombining the parts into totally new concepts, the way there's that weird kind of unicycle thing over there -- a unimotorcycle. Here, I can show you 10 characters in a new alphabet, and you can make up hypothetical, if perhaps incorrect, examples in it.

Again, I'm just going to show you a couple of case studies of where this idea of learning as program synthesis might work. So the idea here is that, as you might see, these are three characters down at the bottom. And this is just a very schematic diagram of how our model tries to represent these as simple kinds of programs. Think about how you would draw, say, that character down at the bottom. Just try to draw it in midair. How would you draw that one in the lower left there? Are many of you doing something like this? Is that what you are doing? OK, yeah. So basically, everyone does that.
And you can describe that as having two large parts or two strokes, where you pick up your pen between strokes. And one of the strokes has two sub-strokes, where you stop your pen. And there's a consistent relationship: the second stroke has to begin somewhere in a particular general region of the first stroke. And basically, that's the model's representation of concepts -- parts, subparts, and simple relations -- which, you can see, might arguably scale up to more interesting kinds of natural objects.

And the basic idea is that you represent that as a program. It's a generative program. It's kind of like a motor program, but it's more abstract. We think that when you see these characters, and many other concepts, you represent something about how you might create them. But that doesn't mean it's in your muscles. You could use your other hand. You could use your toe. Or you could even just think about it in your imagination.

So the model basically tries to induce these simple -- think about them as maybe simple hierarchical plans, simple action programs. And it does it by having a program-generating program that can itself have parameters that can be learned from data. So this right here is a program called GenerateType. A type means a character concept; each of those three things is a different type. This is a program which generates a program that generates the actual character. The second level of program is called GenerateToken. That's a program which draws a particular instance of a character. And just like you can draw many examples of any concept, you can call that function many times -- GenerateToken, GenerateToken, GenerateToken. So your concept of a character is a generative function. And in order to learn this, you have, basically, a prior on those programs that comes from a program-generating program. That's the GenerateType program.
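[Editor's note: here is a minimal, hypothetical sketch of that two-level structure -- a type-level program that samples a character concept and returns a token-level function for drawing noisy instances of it. The names GenerateType and GenerateToken come from the talk, but the data structures, priors, and noise model below are illustrative placeholders, not the published Bayesian Program Learning model.]

import random

def generate_type(max_strokes=3, max_substrokes=3):
    """GenerateType: sample a character concept as a small generative 'program'."""
    concept = []
    for s in range(random.randint(1, max_strokes)):
        n_sub = random.randint(1, max_substrokes)
        # each sub-stroke is a placeholder control point in the unit square
        substrokes = [(random.random(), random.random()) for _ in range(n_sub)]
        # relation: where this stroke attaches relative to the strokes before it
        relation = "start" if s == 0 else random.choice(["along", "at-end", "independent"])
        concept.append({"substrokes": substrokes, "relation": relation})

    def generate_token(noise=0.02):
        """GenerateToken: draw one noisy instance (token) of this concept."""
        return [
            {
                "relation": stroke["relation"],
                "substrokes": [
                    (x + random.gauss(0, noise), y + random.gauss(0, noise))
                    for (x, y) in stroke["substrokes"]
                ],
            }
            for stroke in concept
        ]

    return generate_token

# One concept, many tokens -- like three different people writing the same character.
draw_character = generate_type()
tokens = [draw_character() for _ in range(3)]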
So there are a lot of details behind how this works. But basically, the model does a kind of learning to learn from a held-out, unsupervised background set, and learns the parameters of this program-generating program, which characterize how we draw things in general, what characters look like in general. And then, when you see a new character, like this one, effectively what the model is doing is parsing it into its parts, and subparts, and relations. But that parsing is, basically, the program synthesis -- it's pretty much the same thing. You're looking at the output of some program and asking, what would be the best simple set of parts, and subparts, and relations that could draw that? And then I'm going to infer the most likely one, and then use that as a generalizable template, or program, that I can then generate other characters with.

So here, maybe to illustrate really concretely: if you were to see this character here -- well, here's one instance of one class, and here's an instance of another class. Again, I have no idea which alphabet this is. Now, what about this one? Is it class 1 or class 2? What do you think? 1, yeah. Anybody think it's class 2? OK. So how do we know it's class 1? Well, at the pixel level, it doesn't look anything like it. So this is, again, an example of some of the issues that Tommy was talking about -- a really severe kind of invariance. But it's not just translation or scale invariance, although it does have some of that. It also has this kind of interesting within-class invariance. It's a rather different shape. It's been distorted somewhat.
With a program, there's a powerful way to capture that. You can say, well, here's something like the program for generating this one -- one stroke like that, and then these other two things shown in red and green -- and here's a program that you might induce to generate that one. And then the question is, which of these two programs, these simple hierarchical motor programs, is more likely to generate that character? Now, it turns out that it's incredibly unlikely to generate any particular character from one of these programs. These are the log scores, the log probabilities. So this one is like 2 to the negative 758, and this one is like 2 to the negative 1,880. I don't know if it's base e -- it's maybe 2 or e, but whatever. So each of these is very small. But this one is like 1,000 orders of magnitude more likely than that one. And that makes sense, right? It's just easier to think intuitively about generating this shape as a distortion of that one.

So that's basically what the system does. And it's able to do this remarkable thing that you were able to do too -- this one-shot learning of a concept. Here's just another illustration of this. We show people one example of a new character in an alphabet they don't know and ask them to pick out the other one. Everybody see where it is here? It's not that easy, but it's doable. Down here, right. So people are better than 95% correct at this. This is the error rate -- so the error rate is less than 5% for humans, and also for this model. But for a range of more standard deep learning models, the errors are higher. This one here is, basically, an ImageNet- or MNIST-type model, a really sort of massive convolutional classifier. The best deep learning model is actually one built for this problem, what's called a Siamese ConvNet. And that can do somewhat better. But it's still more than twice as bad as people.
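[Editor's note: a hedged sketch of the comparison just described -- score the new image under each candidate induced program and pick the class with the higher log probability. The scoring function is a stand-in for the real parse-and-render machinery, and the two log scores are simply the numbers quoted above, treated as base-2, to show the size of the ratio.]

import math

def classify_by_program_fit(image, candidate_programs, log_prob_under_program):
    """Pick the class whose induced program best explains the new image.

    log_prob_under_program(image, program) is a stand-in for the real machinery
    (parse the image into strokes, render the candidate program, score the pixels);
    all that matters here is the comparison between the log scores.
    """
    scores = {label: log_prob_under_program(image, program)
              for label, program in candidate_programs.items()}
    return max(scores, key=scores.get), scores

# With the (base-2) log scores quoted in the talk, the comparison looks like this:
log_p_class1 = -758    # log2 P(image | program induced for class 1)
log_p_class2 = -1880   # log2 P(image | program induced for class 2)
# Both are astronomically small, but class 1 is 2**(1880 - 758) = 2**1122 times
# more likely -- around 10**338, i.e. hundreds of decimal orders of magnitude.
decimal_orders = (log_p_class1 - log_p_class2) * math.log10(2)
print(round(decimal_orders))  # ~338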
So we think this is one place where, at least on a hard classification problem, you can see that deep learning still isn't quite there. And even the best thing here -- this was a network that was specifically worked out by one of Ruslan's students over about a year to solve exactly this problem on this data set. And it substantially improved over a standard deep learning classifier, which substantially improved over a different deep learning model that Ruslan and I both worked on. So there's definitely been some improvement here. And never bet against deep learning -- I can't guarantee that somebody won't spend their PhD working out something that could do this well. But still, it's a case where there's some room to push beyond where, for example, a pure pattern recognition approach might go.

But maybe more interesting is, again, going back to all the things that we, and kids, use our knowledge for. We don't just classify the world. We understand it. We generate new things. We imagine new things. So here's a place where you can use your generative program in a way that none of these networks do, at least not by nature -- maybe you could think of some way to get them to do it. And this is to say, not just classify, but produce, imagine new examples.

So here's an illustration of this, where we gave people an example of one of these new concepts. And then we said, draw another example of the same concept. Don't just copy it -- make up another example of the concept. And what you can see here is a set of nine examples that nine different people drew in response to that query. And then you can also see, on the other side, nine examples of our program doing the same thing. Can anybody tell which is the people and which is the program? Let's try this out. So which is the machine for this character, the left or the right? How many people say the left? Raise your hand. How many people say the right?
About 50-50, very good. How many people say this is the machine for this one? How many people say this is the machine? Maybe a slight preference there. How many people say this is the machine? How many people say this is the machine? How many people say this is the machine? Some people really like the left. How many people say that's the machine? Basically, it's 50-50 for all of them. Here's the right answer -- you can decide if you were right or not. Here's another set. Again, I hope it's clear that this is not an easy task. And in fact, people are basically at chance. We've done a bunch of studies of this, and most people just can't tell. People on average are about 50% correct. You basically just can't tell.

So it's an example of a kind of Turing test that a certain interesting program-learning program is able to pass. At a level that's confusable with humans, this system is able to learn simple programs for visual concepts -- and not just classify with them, but use them to create new things. You can even create new things at the higher level that I mentioned. So here, the task, which, again, people and machines are roughly similar on, is to be given 10 examples, each of a different concept within a higher-level concept like an alphabet, and then draw new characters in that alphabet. And we give people only a few seconds to do this, so they don't get too artistic. But again, you can see that the machine is able to do this, and people are kind of similar.

So let me say, that was a success story, a place where the idea of learning as program induction kind of works. What about something more like what we're really most deeply interested in -- children's learning? Like the ability, for example, to understand goal-directed action -- cases we've talked a lot about. Or intuitive physics -- again, cases we've talked about.
And it's part of our research program for this center, something we'd love all of you, if you're interested, to help work on. It's a very big problem: how do you characterize the knowledge that kids are learning over the first few years, and the learning mechanisms that build it? We'd like to think of it in a similar way. Could we say there's some intuitive physics program, and intuitive-physics-program-learning programs, that are building out knowledge for these kinds of problems? We don't know how to do it. But again, here are some of the steps we've been starting to take.

So this is work that Tomer did as part of his PhD, and it's something that he's continuing to do with Liz and others as part of his post-doc. We're showing people -- again, it's much like what you saw from me and from Laura. We're really interested in learning from sparse data, because all data is sparse in a sense. But in the lab, you push things to the limit. So you study really sparse things, like one-shot learning of a visual concept. Or here, we've been interested in what you can learn about the laws of physics from just watching something for five seconds.

So we show people videos like this. Think of it as watching hockey pucks on an air hockey table -- an overhead view of some things bouncing around. And you can see that they're kind of Newtonian in some sense. They bounce off of each other. It looks like there's some inertia, inertial collisions. But you might notice that there are some other interesting things going on that are not just F equals ma, like other interesting kinds of forces. And I'll show you other ones -- Tomer made a whole awesome set of these movies. Hopefully, you've got some idea of what's going on there: interesting forces of attraction and repulsion, different kinds of things. So here, each of those can be described as a program. And here's a program-generating program, if you like.
So it's the same kind of idea as in the handwritten character model I showed you. That model is not learning in a blank-slate way from scratch. It knows about objects, parts, and subparts. What it has to learn is, for the domain of handwritten characters, what the parts and relations are like; and then, for the particular new thing you're learning, like this particular new concept, what its particular parts and relations are. So there are these several levels of learning, where the big picture of objects and parts is not learned, the specifics for the domain of handwritten characters -- the idea of what strokes look like -- are learned from a sort of background set, and then your ability to do one-shot learning, or learning from very sparse data, of a new concept takes all that prior knowledge, some of which is wired in and some of which is previously learned, and brings it to bear to generate a new program from very sparse data.

So you have the same kind of thing here. We were wiring in, in a sense, F equals ma, the most general laws of physics. And then we're also wiring in the possibility that there could be kinds of things, and forces that they exert on each other -- some kinds of things exert certain kinds of forces on others -- and that there could be latent properties, things like mass and friction. And then what the model is trying to do is, basically, learn about these particular properties. What's the mass of this kind of object? What's the friction of this kind of surface? Which objects exert which kinds of forces on each other? Is there something like gravity blowing everything to the left, or the right, or down?

What this is showing here is the same kind of plot you saw from me last time. It's a plot of people versus model, based on a whole bunch of different conditions of the sort you saw, where people are judging these different physical properties.
People are making graded judgments of how likely it is, basically, that the scene has one of these properties or another. There's the model on the x-axis and people on the y-axis. And what you can see is a sort of OK, decent fit. We characterize this experiment as a kind of mixed success. I mean, it's sort of shocking that people can learn anything at all -- how much could you learn about the laws of physics from five seconds of observation? Well, it's also kind of shocking that Newton could learn about the laws of physics by looking at what amounts to, in the history of the universe, about five seconds or less worth of data that people had collected on the planets going around. So it is the nature of both science and intuitive theory building that you can get so much from so little. But people are not Newton here. They're just using intuition. They're making quick responses. And they're OK -- there's a correlation, but it's not perfect by any means. One of the things we're working on right now is looking at what happens if, unlike Newton, you can go in and actually intervene and push these planets around. Hopefully you'll do better. But stay tuned for that.

The basic thing here, though, is that people can learn something from this. But the way our model works is not very satisfying for us as a view of program induction or program construction, because it basically knows too much -- it has the whole form of the program and is just estimating some parameters. It's like one of the things you do as a hacker, as a coder: you have your code and you tune some parameters, or you try to decide whether this function or that one is the right one to use. And this model is doing that. But nowhere is it actually writing new code, in a sense. And that's the really hard problem that I wanted to leave you with, to set up what we're going to do for the rest of the afternoon.
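[Editor's note: a minimal sketch of the kind of inference just described -- the form of the dynamics is fixed and wired into a simulator, and learning is estimating latent parameters such as mass, friction, and pairwise forces by comparing simulated trajectories to the observed five-second clip. All names, grids, and functions below are hypothetical stand-ins, not the actual model from Tomer's work.]

import itertools

# Hypothetical grids over latent properties of the "hockey puck" worlds. The general
# dynamics (F = m*a, collisions) are assumed to be wired into `simulate`; only these
# parameters are inferred from the observed clip.
MASSES = [1.0, 3.0, 9.0]
FRICTIONS = [0.0, 0.05, 0.2]
PAIR_FORCES = [-1.0, 0.0, 1.0]   # repulsion, none, attraction

def infer_physics_parameters(observed_trajectories, simulate, discrepancy):
    """Grid search over latent physical parameters of a fixed simulator.

    `simulate(params)` and `discrepancy(simulated, observed)` are stand-ins for the
    real forward model and likelihood. The point is the inference pattern: the form
    of the laws is fixed, and learning is parameter estimation, not writing new code.
    """
    best_params, best_score = None, float("inf")
    for mass, friction, force in itertools.product(MASSES, FRICTIONS, PAIR_FORCES):
        params = {"mass": mass, "friction": friction, "pair_force": force}
        score = discrepancy(simulate(params), observed_trajectories)
        if score < best_score:
            best_params, best_score = params, score
    return best_params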
If you wanted to not just tune the parameters and figure out the strength or existence of different forces, but actually write the form of the laws, how would you do it? What's the right hypothesis space? You'd need programs that don't just generate programs but actually write the code of them, in a sense. And what's an effective algorithm for searching the space of these theories? It's very, very difficult. I think -- Tomer, are you going to show this figure at all? Yes -- so mostly I'll leave this to Tomer. But there's a very striking contrast between the nice optimization landscapes for, say, neural networks, or most any standard scalable machine learning algorithm, whether it's trained by gradient descent or convex optimization, and the kinds of landscapes for optimization and search that you have if you're searching over a space of programs.

If you want to see our early attempts to do something like learning the form of a program, look, for example, at the work Charles Kemp did -- part of his thesis, published in PNAS a few years ago -- where he used, basically, generative grammars for graphs. Think about the problem -- so Laura mentioned Darwin. How did Darwin figure out something about evolution without understanding any of the mechanisms? Or the more basic problem of figuring out that species should be generated by some kind of branching tree process versus other kinds of processes. Remember last time, when I talked about various kinds of structured probabilistic models -- tree structures, or spaces, or chains for threshold reasoning. So Charles did some really nice work, basically using the idea of a program for generating graphical models, like a grammar that grows graphs. And he showed how you could take data drawn from different domains, like, say, those data sets you saw before of animals and their properties -- we spent an hour on that last time.
So Charles showed how you could induce not only a tree structure but the higher-level fact that there is a tree structure -- namely, that a rule that generates trees is the right abstract principle to, say, give you the structure of species in biology, whereas other rules would generate other kinds of structures. So, for example, he took similar data matrices for how Supreme Court judges voted and was able to infer a left-right, liberal-conservative spectrum. Or data on the proximities between cities, and figure out a sort of cylinder, like a latitude-and-longitude map of the world, just from the distances between cities. Or take faces and figure out a low-dimensional space as the right way to think about faces.

So in some sense, this was really cool. We were really excited: hey, we have a way to learn these simple programs which generate structures, which themselves generate the data. It's where that idea of hierarchical Bayes meets up with this idea of program induction, or learning a program.

And it even captured -- OK, this is really the last slide I'll show -- it even captured something that caught all of our imaginations. We use this phrase "the blessing of abstraction" to tie back into one more theme of Laura's, which is this idea that when kids are building up abstract concepts, there's a sense in which, unlike, say, a lot of maybe traditional machine learning methods, or a lot of traditional ideas in philosophy about the origins of abstract knowledge, it's not like you just get the concrete stuff first and layer on the more abstract stuff. There's a sense, often, in children's learning as in science, in which the big picture comes in first. The abstract idea comes first, and then you fill in the details. So, for example, Darwin figured out, in some sense, the big picture. He figured out the idea that there was some kind of branching process that generated species, and that it was random.
Not a nice, perfect Linnaean seven-layer hierarchy, but some kind of random branching process. And he didn't know what the mechanisms were that gave rise to it. Similarly, Newton figured out something about the law of gravitation and everything else in his laws, though he didn't know the mechanisms that gave rise to gravity. He didn't even know g -- he didn't even know the value of the gravitational constant. That couldn't be estimated until about 100 years later. But somehow he was able to get the abstract form. And these nice models that Charles Kemp built were also able to do that -- for example, to figure out from very little data that animals should be generated by some kind of tree structure, as opposed to, say, the simpler model of just a bunch of flat clusters. That model was able to figure that out, over here on the right, from just a small fraction of the data. And then, with all the rest of the data, it was able to figure out the right tree, in a sense.

And we called this "the blessing of abstraction" -- this idea that often, in these hierarchical program-learning programs, you could get the high-level idea before you got the lower-level idea, and then fill in the details. And I still think there's something fundamentally right about this as a picture of children's learning, both representationally and mechanistically -- that this dynamic of sometimes getting the big picture first and using it as a constraint to fill in the details is fundamentally right. But actually understanding how to do this algorithmically -- how to search the space of programs for anything that looks like an intuitive causal theory of physics, and how to relate that to the dynamics of how children actually learn -- that's the big open question that I will now hand over to our other speakers.
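[Editor's note: to make the structural form discovery idea above a bit more concrete, here is a toy sketch of the comparison -- candidate forms (tree, chain, flat clusters) are each fit to the data and scored by fit minus a complexity penalty, so the winning form can emerge even from a small fraction of the data. This is only the shape of the computation, not Kemp's actual grammar-based algorithm; every function and name is a hypothetical stand-in.]

import math

def score_form(log_likelihood_fn, data, n_params):
    """Toy model-selection score for one candidate structural form.

    `log_likelihood_fn(data)` stands in for 'fit the best tree / chain / set of flat
    clusters to the data and return its log-likelihood'; the BIC-style penalty stands
    in for integrating over structures of that form.
    """
    n = max(len(data), 1)
    return log_likelihood_fn(data) - 0.5 * n_params * math.log(n)

def best_form(data, candidate_forms):
    """candidate_forms: {form_name: (log_likelihood_fn, effective_n_params)}."""
    scores = {name: score_form(ll, data, k) for name, (ll, k) in candidate_forms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical usage:
# forms = {"tree": (tree_loglik, 30), "chain": (chain_loglik, 12), "clusters": (cluster_loglik, 8)}
# winner, scores = best_form(animal_property_matrix, forms)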