The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOSHUA TENENBAUM: So we saw in Laura's talk this introduction to the idea of the child as scientist, all the different ways that children's learning seems to follow the many different practices that scientists use to learn about the world, not just data analysis, in a sense. However important various kinds of statistics on a grand scale may be for learning, whether it's Hebbian learning or backprop, we know that that's not all children do, just like analyzing patterns of data is not all that scientists do.

And then Laura added this other cool dimension, thinking about the costs and rewards of information, when it is worth it or not. And you could say maybe she was suggesting that we should develop the metaphor, expand it a little bit, from the child as scientist to maybe the child as -- oh, oops, sorry -- lab PI, or maybe even NSF center director. Because as everyone knows, but Tommy certainly can tell you, whether you're a lab PI who's just gotten tenure or the director of an NSF center, you have to make hard-nosed pragmatic decisions about what is achievable given the costs and which research questions are really worth going after and devoting your time and other resources to. And that's a very important part of science. And it's an important part of intuitive knowledge.

I want to add another practical dimension to things, a way of bringing out, fleshing out, this idea of the child as scientist. You can think of all of these as metaphors. But they're things that we can formalize. And if our goal is to make computational models of these things, and ultimately to get some kind of theoretical handle on them, then these metaphors are helping us. By adding in the costs and benefits, you bring in a utility calculus.
And there's not just a naive utility calculus; there's a formal mathematical utility calculus of these kinds of decisions. Julian Jara-Ettinger, who Laura mentioned was driving a lot of the work towards the end of the talk, has done some really interesting actual mathematical, computational models of these issues, as has [? Choi ?] and the other students you talked about.

So the direction I want to push here is what you might call the child as hacker. This is trying to make the connection back to this idea of formalizing common sense knowledge and intuitive theories as probabilistic programs. Or, just more generally, the idea of a program -- some combination of algorithms, data structures, and networks of functions that can describe interesting causal processes in the world, like, for example, your intuitive physics or your intuitive psychology. That's an idea that we talked a lot about earlier in the week, or last week, whenever it was. And the idea is that if your knowledge is something like a program, or a set of programs, that you can, say, run forward to simulate physics, like we had last time, then learning has to be something like building a program, or hacking.

This is, again, a research program that I wish I had. Laura talked at the end about the research she wished she had -- and it's not really just a wish for her; she actually is working towards that research program. This isn't just an empty wish either. It's something that we're working on. Just as Laura had that wonderful list of all the things scientists do in their practices to learn about the world, I think you could make a similar list of all the things that you do in your programming or hacking. By hacking I don't mean breaking into a secure system, but modifying your code to make it more awesome. And I use awesome very deliberately, because awesome is a multi-dimensional term. It's just awesome.
But it could be faster, more accurate, more efficient, more elegant, more generalizable, more easily communicated to other people, more easily modularly combined with other code to do something even more awesome. I think there's a deep sense in which that aesthetic behind hacking and making awesome code, in both an individual and a social setting, is a really powerful way to think about many of the cognitive activities behind learning. And it goes together with the idea of the child as scientist if the form of your, quote, "intuitive science" is computer programs or something like programs.

So we've been working on a few projects where we've been trying to capture this idea and to say, well, what would it mean to describe computationally this idea of learning as either synthesizing programs, or modifying programs, or making more awesome programs in your mind. And I'll just show you a few examples of this. I'll show you our good, successful case studies, places where we've made this idea work. But the bottom line, to foreshadow it, is that this is really, really hard. And to get it to work for the kinds of knowledge that, say, Laura was talking about, or that Liz was talking about, the real stuff of children's knowledge, is still very, very open. And we want to, basically, build up to engaging with why that problem is so hard, from this to what Tomer and then Laura will talk about later in the afternoon.

But here are a few at least early success stories that we've worked on. One goes back to this idea that I presented in my lectures last week. And it connects, again, to something that Laura was saying. Here is a very basic kind of learning: the problem of learning generalizable concepts at all from very sparse evidence -- one-shot learning, again, something you heard about from Tommy and a number of the other speakers. We've all been trying to wrap our heads around this.
How can you learn, say, any concept at all from very, very little data, maybe just one or a few examples? So you saw this kind of thing last time. And I briefly mentioned how we had tried to capture this problem by building this tree-structured hypothesis space. You could think of that as a kind of program induction, if you think that there's something like an evolutionary program which generated these objects, and you're trying to find the sub-procedure of it that generated just these kinds of objects. But that's not at all how we were able to model this. We had a much simpler model.

Let me show you briefly some work that we did in our group a couple of years ago. It's really just getting out into publication now. This is work that was mostly done by two people -- Ruslan Salakhutdinov, who is now a professor at Toronto, although about to move to Carnegie Mellon, I think, and Brenden Lake. Ruslan is a machine learning person, also very well known for deep learning. And it's really mostly Brenden Lake's work that I'll talk about; he is now a post-doc at NYU.

And again, where we think we're building up to is trying to learn something like the program of an intuitive physics or intuitive psychology. But here we're just talking about learning object concepts. And we've been doing this work with a data set of handwritten characters, the ones you see on the right here. I'll just put it up in contrast, or by comparison, to, say, this other much more famous data set of handwritten characters, the MNIST data set. How many people have seen the MNIST data set, maybe in some of the previous talks? How many people have actually used it? Yeah, it's a great data set to use. It's driven a lot of basic machine learning research, including deep learning. Yann LeCun originally collected this data set and put it out there. And Geoffrey Hinton did most of the development.
The stuff that now wins object recognition challenges was done on this data set. But not only that -- also a lot of Bayesian stuff and probabilistic generative models. Now, the thing about that data set, though, is that it has a very small number of classes, just the digits 0 through 9, and a huge number of examples, roughly 10,000 examples in each class, or maybe 6,000 examples, something like that. But we wanted to construct a data set which was similar in some ways in its complexity and scale, but where we had many, many more concepts and, perhaps, many fewer examples.

So here we got people to write characters by hand in 50 different alphabets. And it's a really cool data set. The total data set has 1,623 concepts. You could call them handwritten characters, or you could just call them simple visual concepts, as a sort of warm-up for bigger problems of, say, natural objects. And there are 20 examples per class. So there are roughly 30,000 total data points in this data set, very much like MNIST.

You can see, just to illustrate here, there are many different alphabets that have very different forms. You can see both the similarities and differences between alphabets here. So in that sense, there's kind of a hierarchical structure. Each one of these is a character in an alphabet. But there's also the higher-level concept of, say, a Sanskrit form, as distinct from, say, Tagalog, or Hebrew, or Braille. There are also some made-up alphabets. One of the neat things about this domain is that you can make up new concepts, and you can make up whole concepts of concepts, like whole new alphabets. You can do one-shot learning in it.

So let's just try this out here for a second. You remember the tufa demo; we can do the same kind of thing here. Let's take these characters. Anybody know the alphabet that this is? OK, that's good -- most of you have not seen these before, and it's good to know who has.
But we'll run this experiment on the rest of you. So here's one example of a concept -- call it a tufa if you like. And I'll just run my mouse over these other ones, and you just clap when I get to the other example of the same class, OK?

[SOUND OF CLAPS]

OK, very good. Yeah, people are basically perfect at this. I mean, again, it's very fast and almost perfect. And again, you saw me talk a little about this last time. Just like with natural objects, not only can you learn one of these concepts from one example and generalize it to others, but you can use that knowledge in various other ways. So you can parse these things into parts -- we think that's part of what you're doing. You can generate new examples. So here are three different people all drawing the same character; in fact, the whole data set was generated that way. You can also make higher-level generalizations, recombining the parts into totally new concepts, the way there's that weird kind of unicycle thing over there -- a unimotorcycle. Here, I can show you 10 characters in a new alphabet, and you can make up hypothetical, if perhaps incorrect, examples in it.

Again, I'm just going to show you a couple of case studies of where this idea of learning as program synthesis might work. So the idea here is that, as you might see, these are three characters down at the bottom. And this is just a very schematic diagram of how our model tries to represent these as simple kinds of programs. Think about how you would draw, say, that character down at the bottom. Just try to draw it in midair. How would you draw that one in the lower left there? Are many of you doing something like this? Is that what you are doing? OK, yeah. So basically, everyone does that.
And you can describe that as having two large parts or two strokes, where you pick up your pen between strokes. And one of the strokes has two sub-strokes, where you stop your pen. And there's a consistent relationship: the second stroke has to begin somewhere in a particular general region of the first stroke. And basically, that's the model's representation of concepts -- parts, subparts, and simple relations -- which, you can see, might arguably scale up to more interesting kinds of natural objects.

And the basic idea is that you represent that as a program. It's a generative program. It's kind of like a motor program, but it's more abstract. We think that when you see these characters, and many other concepts, you represent something about how you might create them. But that doesn't mean it's in your muscles. You could use your other hand. You could use your toe. Or you could even just think about it in your imagination.

So the model basically tries to induce these simple -- think about them as maybe simple hierarchical plans, simple action programs. And it does it by having a program-generating program that can itself have parameters that can be learned from data. So this right here is a program called GenerateType. A type means a character concept; each of those three things is a different type. This is a program which generates a program that generates the actual character. The second level of program is called GenerateToken. That's a program which draws a particular instance of a character. And just like you can draw many examples of any concept, you can call that function many times -- GenerateToken, GenerateToken, GenerateToken. So your concept of a character is a generative function. And in order to learn this, you have, basically, a prior on those programs that comes from a program-generating program. That's the GenerateType program.
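[Editor's note: here is a minimal, hypothetical sketch of that two-level structure -- a type-level program that samples a character concept and returns a token-level function for drawing noisy instances of it. The names GenerateType and GenerateToken come from the talk, but the data structures, priors, and noise model below are illustrative placeholders, not the published Bayesian Program Learning model.]

import random

def generate_type(max_strokes=3, max_substrokes=3):
    """GenerateType: sample a character concept as a small generative 'program'."""
    concept = []
    for s in range(random.randint(1, max_strokes)):
        n_sub = random.randint(1, max_substrokes)
        # each sub-stroke is a placeholder control point in the unit square
        substrokes = [(random.random(), random.random()) for _ in range(n_sub)]
        # relation: where this stroke attaches relative to the strokes before it
        relation = "start" if s == 0 else random.choice(["along", "at-end", "independent"])
        concept.append({"substrokes": substrokes, "relation": relation})

    def generate_token(noise=0.02):
        """GenerateToken: draw one noisy instance (token) of this concept."""
        return [
            {
                "relation": stroke["relation"],
                "substrokes": [
                    (x + random.gauss(0, noise), y + random.gauss(0, noise))
                    for (x, y) in stroke["substrokes"]
                ],
            }
            for stroke in concept
        ]

    return generate_token

# One concept, many tokens -- like three different people writing the same character.
draw_character = generate_type()
tokens = [draw_character() for _ in range(3)]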
So there are a lot of details behind how this works. But basically, the model does a kind of learning to learn from a held-out, unsupervised background set, and learns the parameters of this program-generating program, which characterize how we draw things in general, what characters look like in general. And then, when you see a new character, like this one, effectively what the model is doing is parsing it into its parts, and subparts, and relations. But that parsing is, basically, the program synthesis -- it's pretty much the same thing. You're looking at the output of some program and asking, what would be the best simple set of parts, and subparts, and relations that could draw that? And then I'm going to infer the most likely one, and then use that as a generalizable template, or program, that I can then generate other characters with.

So here, maybe to illustrate really concretely: if you were to see this character here -- well, here's one instance of one class, and here's an instance of another class. Again, I have no idea which alphabet this is. Now, what about this one? Is it class 1 or class 2? What do you think? 1, yeah. Anybody think it's class 2? OK. So how do we know it's class 1? Well, at the pixel level, it doesn't look anything like it. So this is, again, an example of some of the issues that Tommy was talking about -- a really severe kind of invariance. But it's not just translation or scale invariance, although it does have some of that. It also has this kind of interesting within-class invariance. It's a rather different shape. It's been distorted somewhat.
With a program, there's a powerful way to capture that. You can say, well, here's something like the program for generating this one -- one stroke like that, and then these other two things shown in red and green -- and here's a program that you might induce to generate that one. And then the question is, which of these two programs, these simple hierarchical motor programs, is more likely to generate that character? Now, it turns out that it's incredibly unlikely to generate any particular character from one of these programs. These are the log scores, the log probabilities. So this one is like 2 to the negative 758, and this one is like 2 to the negative 1,880. I don't know if it's base e -- it's maybe 2 or e, but whatever. So each of these is very small. But this one is like 1,000 orders of magnitude more likely than that one. And that makes sense, right? It's just easier to think intuitively about generating this shape as a distortion of that one.

So that's basically what the system does. And it's able to do this remarkable thing that you were able to do too -- this one-shot learning of a concept. Here's just another illustration of this. We show people one example of a new character in an alphabet they don't know and ask them to pick out the other one. Everybody see where it is here? It's not that easy, but it's doable. Down here, right. So people are better than 95% correct at this. This is the error rate -- so the error rate is less than 5% for humans, and also for this model. But for a range of more standard deep learning models, the errors are higher. This one here is, basically, an ImageNet- or MNIST-type model, a really sort of massive convolutional classifier. The best deep learning model is actually one built for this problem, what's called a Siamese ConvNet. And that can do somewhat better. But it's still more than twice as bad as people.
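[Editor's note: a hedged sketch of the comparison just described -- score the new image under each candidate induced program and pick the class with the higher log probability. The scoring function is a stand-in for the real parse-and-render machinery, and the two log scores are simply the numbers quoted above, treated as base-2, to show the size of the ratio.]

import math

def classify_by_program_fit(image, candidate_programs, log_prob_under_program):
    """Pick the class whose induced program best explains the new image.

    log_prob_under_program(image, program) is a stand-in for the real machinery
    (parse the image into strokes, render the candidate program, score the pixels);
    all that matters here is the comparison between the log scores.
    """
    scores = {label: log_prob_under_program(image, program)
              for label, program in candidate_programs.items()}
    return max(scores, key=scores.get), scores

# With the (base-2) log scores quoted in the talk, the comparison looks like this:
log_p_class1 = -758    # log2 P(image | program induced for class 1)
log_p_class2 = -1880   # log2 P(image | program induced for class 2)
# Both are astronomically small, but class 1 is 2**(1880 - 758) = 2**1122 times
# more likely -- around 10**338, i.e. hundreds of decimal orders of magnitude.
decimal_orders = (log_p_class1 - log_p_class2) * math.log10(2)
print(round(decimal_orders))  # ~338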
So we think this is one place where, at least on a hard classification problem, you can see that deep learning still isn't quite there. And even the best thing here -- this was a network that was specifically worked out by one of Ruslan's students over about a year to solve exactly this problem on this data set. And it substantially improved over a standard deep learning classifier, which substantially improved over a different deep learning model that Ruslan and I both worked on. So there's definitely been some improvement here. And never bet against deep learning -- I can't guarantee that somebody won't spend their PhD working out something that could do this well. But still, it's a case where there's some room to push beyond where, for example, a pure pattern recognition approach might go.

But maybe more interesting is, again, going back to all the things that we, and kids, use our knowledge for. We don't just classify the world. We understand it. We generate new things. We imagine new things. So here's a place where you can use your generative program in a way that none of these networks do, at least not by nature -- maybe you could think of some way to get them to do it. And this is to say, not just classify, but produce, imagine new examples.

So here's an illustration of this, where we gave people an example of one of these new concepts. And then we said, draw another example of the same concept. Don't just copy it -- make up another example of the concept. And what you can see here is a set of nine examples that nine different people drew in response to that query. And then you can also see, on the other side, nine examples of our program doing the same thing. Can anybody tell which is the people and which is the program? Let's try this out. So which is the machine for this character, the left or the right? How many people say the left? Raise your hand. How many people say the right?
About 50-50, very good. How many people say this is the machine for this one? How many people say this is the machine? Maybe a slight preference there. How many people say this is the machine? How many people say this is the machine? How many people say this is the machine? Some people really like the left. How many people say that's the machine? Basically, it's 50-50 for all of them. Here's the right answer -- you can decide if you were right or not. Here's another set. Again, I hope it's clear that this is not an easy task. And in fact, people are basically at chance. We've done a bunch of studies of this, and most people just can't tell. People on average are about 50% correct. You basically just can't tell.

So it's an example of a kind of Turing test that a certain interesting program-learning program is able to pass. At a level that's confusable with humans, this system is able to learn simple programs for visual concepts -- and not just classify with them, but use them to create new things. You can even create new things at the higher level that I mentioned. So here, the task, which, again, people and machines are roughly similar on, is to be given 10 examples, each of a different concept within a higher-level concept like an alphabet, and then draw new characters in that alphabet. And we give people only a few seconds to do this, so they don't get too artistic. But again, you can see that the machine is able to do this, and people are kind of similar.

So let me say, that was a success story, a place where the idea of learning as program induction kind of works. What about something more like what we're really most deeply interested in -- children's learning? Like the ability, for example, to understand goal-directed action -- cases we've talked a lot about. Or intuitive physics -- again, cases we've talked about.
And it's part of our research program for this center, something we'd love all of you, if you're interested, to help work on. It's a very big problem: how do you characterize the knowledge that kids are learning over the first few years, and the learning mechanisms that build it? We'd like to think of it in a similar way. Could we say there's some intuitive physics program, and intuitive-physics-program-learning programs, that are building out knowledge for these kinds of problems? We don't know how to do it. But again, here are some of the steps we've been starting to take.

So this is work that Tomer did as part of his PhD, and it's something that he's continuing to do with Liz and others as part of his post-doc. We're showing people -- again, it's much like what you saw from me and from Laura. We're really interested in learning from sparse data, because all data is sparse in a sense. But in the lab, you push things to the limit. So you study really sparse things, like one-shot learning of a visual concept. Or here, we've been interested in what you can learn about the laws of physics from just watching something for five seconds.

So we show people videos like this. Think of it as watching hockey pucks on an air hockey table -- an overhead view of some things bouncing around. And you can see that they're kind of Newtonian in some sense. They bounce off of each other. It looks like there's some inertia, inertial collisions. But you might notice that there are some other interesting things going on that are not just F equals ma, like other interesting kinds of forces. And I'll show you other ones -- Tomer made a whole awesome set of these movies. Hopefully, you've got some idea of what's going on there: interesting forces of attraction and repulsion, different kinds of things. So here, each of those can be described as a program. And here's a program-generating program, if you like.
So it's the same kind of idea as in the handwritten character model I showed you. That model is not learning in a blank-slate way from scratch. It knows about objects, parts, and subparts. What it has to learn is, for the domain of handwritten characters, what the parts and relations are like; and then, for the particular new thing you're learning, like this particular new concept, what its particular parts and relations are. So there are these several levels of learning, where the big picture of objects and parts is not learned, the specifics for the domain of handwritten characters -- the idea of what strokes look like -- are learned from a sort of background set, and then your ability to do one-shot learning, or learning from very sparse data, of a new concept takes all that prior knowledge, some of which is wired in and some of which is previously learned, and brings it to bear to generate a new program from very sparse data.

So you have the same kind of thing here. We were wiring in, in a sense, F equals ma, the most general laws of physics. And then we're also wiring in the possibility that there could be kinds of things, and forces that they exert on each other -- some kinds of things exert certain kinds of forces on others -- and that there could be latent properties, things like mass and friction. And then what the model is trying to do is, basically, learn about these particular properties. What's the mass of this kind of object? What's the friction of this kind of surface? Which objects exert which kinds of forces on each other? Is there something like gravity blowing everything to the left, or the right, or down?

What this is showing here is the same kind of plot you saw from me last time. It's a plot of people versus model, based on a whole bunch of different conditions of the sort you saw, where people are judging these different physical properties.
People are making graded judgments of how likely it is, basically, that the scene has one of these properties or another. There's the model on the x-axis and people on the y-axis. And what you can see is a sort of OK, decent fit. We characterize this experiment as a kind of mixed success. I mean, it's sort of shocking that people can learn anything at all -- how much could you learn about the laws of physics from five seconds of observation? Well, it's also kind of shocking that Newton could learn about the laws of physics by looking at what amounts to, in the history of the universe, about five seconds or less worth of data that people had collected on the planets going around. So it is the nature of both science and intuitive theory building that you can get so much from so little. But people are not Newton here. They're just using intuition. They're making quick responses. And they're OK -- there's a correlation, but it's not perfect by any means. One of the things we're working on right now is looking at what happens if, unlike Newton, you can go in and actually intervene and push these planets around. Hopefully you'll do better. But stay tuned for that.

The basic thing here, though, is that people can learn something from this. But the way our model works is not very satisfying for us as a view of program induction or program construction, because it basically knows too much -- it has the whole form of the program and is just estimating some parameters. It's like one of the things you do as a hacker, as a coder: you have your code and you tune some parameters, or you try to decide whether this function or that one is the right one to use. And this model is doing that. But nowhere is it actually writing new code, in a sense. And that's the really hard problem that I wanted to leave you with, to set up what we're going to do for the rest of the afternoon.
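[Editor's note: a minimal sketch of the kind of inference just described -- the form of the dynamics is fixed and wired into a simulator, and learning is estimating latent parameters such as mass, friction, and pairwise forces by comparing simulated trajectories to the observed five-second clip. All names, grids, and functions below are hypothetical stand-ins, not the actual model from Tomer's work.]

import itertools

# Hypothetical grids over latent properties of the "hockey puck" worlds. The general
# dynamics (F = m*a, collisions) are assumed to be wired into `simulate`; only these
# parameters are inferred from the observed clip.
MASSES = [1.0, 3.0, 9.0]
FRICTIONS = [0.0, 0.05, 0.2]
PAIR_FORCES = [-1.0, 0.0, 1.0]   # repulsion, none, attraction

def infer_physics_parameters(observed_trajectories, simulate, discrepancy):
    """Grid search over latent physical parameters of a fixed simulator.

    `simulate(params)` and `discrepancy(simulated, observed)` are stand-ins for the
    real forward model and likelihood. The point is the inference pattern: the form
    of the laws is fixed, and learning is parameter estimation, not writing new code.
    """
    best_params, best_score = None, float("inf")
    for mass, friction, force in itertools.product(MASSES, FRICTIONS, PAIR_FORCES):
        params = {"mass": mass, "friction": friction, "pair_force": force}
        score = discrepancy(simulate(params), observed_trajectories)
        if score < best_score:
            best_params, best_score = params, score
    return best_params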
If you wanted to not just tune the parameters and figure out the strength or existence of different forces, but actually write the form of the laws, how would you do it? What's the right hypothesis space? You'd need programs that don't just generate programs but actually write the code of them, in a sense. And what's an effective algorithm for searching the space of these theories? It's very, very difficult. I think -- Tomer, are you going to show this figure at all? Yes -- so mostly I'll leave this to Tomer. But there's a very striking contrast between the nice optimization landscapes for, say, neural networks, or most any standard scalable machine learning algorithm, whether it's trained by gradient descent or convex optimization, and the kinds of landscapes for optimization and search that you have if you're searching over a space of programs.

If you want to see our early attempts to do something like learning the form of a program, look, for example, at the work Charles Kemp did -- part of his thesis, published in PNAS a few years ago -- where he used, basically, generative grammars for graphs. Think about the problem -- so Laura mentioned Darwin. How did Darwin figure out something about evolution without understanding any of the mechanisms? Or the more basic problem of figuring out that species should be generated by some kind of branching tree process versus other kinds of processes. Remember last time, when I talked about various kinds of structured probabilistic models -- tree structures, or spaces, or chains for threshold reasoning. So Charles did some really nice work, basically using the idea of a program for generating graphical models, like a grammar that grows graphs. And he showed how you could take data drawn from different domains, like, say, those data sets you saw before of animals and their properties -- we spent an hour on that last time.
So Charles showed how you could induce not only a tree structure but the higher-level fact that there is a tree structure -- namely, that a rule that generates trees is the right abstract principle to, say, give you the structure of species in biology, whereas other rules would generate other kinds of structures. So, for example, he took similar data matrices for how Supreme Court judges voted and was able to infer a left-right, liberal-conservative spectrum. Or data on the proximities between cities, and figure out a sort of cylinder, like a latitude-and-longitude map of the world, just from the distances between cities. Or take faces and figure out a low-dimensional space as the right way to think about faces.

So in some sense, this was really cool. We were really excited: hey, we have a way to learn these simple programs which generate structures, which themselves generate the data. It's where that idea of hierarchical Bayes meets up with this idea of program induction, or learning a program.

And it even captured -- OK, this is really the last slide I'll show -- it even captured something that caught all of our imaginations. We use this phrase "the blessing of abstraction" to tie back into one more theme of Laura's, which is this idea that when kids are building up abstract concepts, there's a sense in which, unlike, say, a lot of maybe traditional machine learning methods, or a lot of traditional ideas in philosophy about the origins of abstract knowledge, it's not like you just get the concrete stuff first and layer on the more abstract stuff. There's a sense, often, in children's learning as in science, in which the big picture comes in first. The abstract idea comes first, and then you fill in the details. So, for example, Darwin figured out, in some sense, the big picture. He figured out the idea that there was some kind of branching process that generated species, and that it was random.
Not a nice, perfect Linnaean seven-layer hierarchy, but some kind of random branching process. And he didn't know what the mechanisms were that gave rise to it. Similarly, Newton figured out something about the law of gravitation and everything else in his laws, though he didn't know the mechanisms that gave rise to gravity. He didn't even know g -- he didn't even know the value of the gravitational constant. That couldn't be estimated until about 100 years later. But somehow he was able to get the abstract form. And these nice models that Charles Kemp built were also able to do that -- for example, to figure out from very little data that animals should be generated by some kind of tree structure, as opposed to, say, the simpler model of just a bunch of flat clusters. That model was able to figure that out, over here on the right, from just a small fraction of the data. And then, with all the rest of the data, it was able to figure out the right tree, in a sense.

And we called this "the blessing of abstraction" -- this idea that often, in these hierarchical program-learning programs, you could get the high-level idea before you got the lower-level idea, and then fill in the details. And I still think there's something fundamentally right about this as a picture of children's learning, both representationally and mechanistically -- that this dynamic of sometimes getting the big picture first and using it as a constraint to fill in the details is fundamentally right. But actually understanding how to do this algorithmically -- how to search the space of programs for anything that looks like an intuitive causal theory of physics, and how to relate that to the dynamics of how children actually learn -- that's the big open question that I will now hand over to our other speakers.
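[Editor's note: to make the structural form discovery idea above a bit more concrete, here is a toy sketch of the comparison -- candidate forms (tree, chain, flat clusters) are each fit to the data and scored by fit minus a complexity penalty, so the winning form can emerge even from a small fraction of the data. This is only the shape of the computation, not Kemp's actual grammar-based algorithm; every function and name is a hypothetical stand-in.]

import math

def score_form(log_likelihood_fn, data, n_params):
    """Toy model-selection score for one candidate structural form.

    `log_likelihood_fn(data)` stands in for 'fit the best tree / chain / set of flat
    clusters to the data and return its log-likelihood'; the BIC-style penalty stands
    in for integrating over structures of that form.
    """
    n = max(len(data), 1)
    return log_likelihood_fn(data) - 0.5 * n_params * math.log(n)

def best_form(data, candidate_forms):
    """candidate_forms: {form_name: (log_likelihood_fn, effective_n_params)}."""
    scores = {name: score_form(ll, data, k) for name, (ll, k) in candidate_forms.items()}
    return max(scores, key=scores.get), scores

# Hypothetical usage:
# forms = {"tree": (tree_loglik, 30), "chain": (chain_loglik, 12), "clusters": (cluster_loglik, 8)}
# winner, scores = best_form(animal_property_matrix, forms)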