The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOSH TENENBAUM: I'm going to be talking about computational cognitive science. In the brains, minds, and machines landscape, this is connecting the minds and the machines part. And I really want to try to emphasize both some conceptual themes and some technical themes that are complementary to a lot of what you've seen for the first week or so of the class. That's going to include ideas of generative models and ideas of probabilistic programs, which we'll see a little bit here and a lot more in the tutorial in the afternoon. And on the cognitive side, maybe we could sum it up by calling it common sense.

Since this is meant to be a broad introduction-- and I'm going to try to cover from some very basic, fundamental things that people in this field were doing maybe 10 or 15 years ago up until the state-of-the-art current research-- I want to try to give that whole broad sweep. And I also want to try to give a bit of a philosophical introduction at the beginning, to set this in context with the other things you're seeing in the summer school.

I think it's fair to say that there are two different notions of intelligence that are both important and are both interesting to members of this center in the summer school. The two different notions are what I think you could call classifying, recognizing patterns in data, and what you could call explaining, understanding, modeling the world. So, again, there's the notion of classification, pattern recognition, finding patterns in data, and maybe patterns that connect data to some task you're trying to solve. And then there's this idea of intelligence as explaining, understanding, building a model of the world that you can use to plan on and solve problems with.
I'm going to emphasize here notions of explanation, because I think they are absolutely central to intelligence, certainly in any sense that we mean when we talk about humans, and because they get kind of underemphasized in a lot of recent work in machine learning, AI, neural networks, and so on. Most of the techniques that you've seen so far in other parts of the class, and will continue to see, I think it's fair to say, sort of fall under the broad idea of trying to classify and recognize patterns in data. And there's good reason why there's been a lot of attention on these recently, particularly coming from the more brain side: because it's much easier, when you go and look in the brain, to understand how neural circuits do things like classifying and recognizing patterns. And it's also, I think, at least with certain kinds of current technology, much easier to get machines to do this, right? All the excitement in deep neural networks is all about this, right?

But what I want to try to convince you of here, and illustrate with a lot of different kinds of examples, is how both of these kinds of approaches are probably necessary, essential, to understanding the mind. I won't really bother to try to convince you that the pattern recognition approach is essential, because I take that for granted. But both are essential, and, also, they essentially need each other. I'll try to illustrate a couple of ways in which they really each solve the problems that the other one needs solved-- so ways in which ideas like deep neural networks for doing really fast pattern recognition can help to make the sort of explaining, understanding view of intelligence much quicker and maybe much lower energy, but also ways in which the sort of explaining, understanding view of intelligence can make the pattern recognition view much richer, much more flexible.

So what do we really mean? What's the difference between classification and explanation? Or what makes a good explanation?
So we're talking about intelligence as trying to explain your experience in the world-- basically, to build a model that is, in some sense, a kind of actionable causal model. And there are a bunch of virtues here, these bullet points under explanation-- a bunch of things we could say about what makes a good explanation of the world, or a good model. I won't say too much abstractly; I'll mostly try to illustrate this over the morning. But like any kind of model, whether it's the more pattern recognition, classification style or these more explanatory-type models, ideas of compactness and unification are important, right? You want to explain a lot with a little. OK? There's a term, if anybody has read David Deutsch's book The Beginning of Infinity-- he talks about this view, in a certain form, of good explanations as being hard to vary, non-arbitrary. OK. That's sort of in common with any way of describing or explaining the world.

But some key features of the models we're going to talk about-- one is that they're generative. What we mean by generative is that they generate the world, right? In some sense, their output is the world, your experience. They're trying to explain the stuff you observe by positing some hidden, unobservable, but really important, causal, actionable, deep stuff. They don't model a task. That's really important. Because if you're used to something like, you know, end-to-end training of a deep neural network for classification, where there's an objective function and a task, and the task is to map from things you experience and observe in the world to how you should behave, that's sort of the opposite view, right? These are things whose output is not behavior on a task, but whose output is the world you see. Because what they're trying to do is produce or generate explanations. And that means they have to come into contact.
They have to basically explain the stuff you see. OK. Now, these models are not just generative in this sense, but they're causal. And, again, I'm using these terms intuitively; I'll get more precise later on. But what I mean by that is the hidden or latent variables that generate the stuff you observe are, in some form, trying to get at the actual causal mechanisms in the world-- the things that, if you were then to go act on the world, you could intervene on and move around and succeed in changing the world the way you want. Because that's the point of having one of these rich models: so that you can use it to act intelligently, right?

And, again, this is a contrast with an approach that's trying to find and classify patterns that are useful for performing some particular task: oh, when I see this, I should do this; when I see that, I should do that, right? That's good for one task. But these are meant to be good for an endless array of tasks. Not any task, but, in some important sense, a kind of unbounded set of tasks, where, given a goal-- which is different from your model of the world; you have your goal, you have your model of the world-- you use that model to plan some sequence of actions to achieve your goal. And if you change the goal, you get a different plan. But the model is the invariant, right? And it's invariant because it captures what's really going on causally.

And then maybe the most important, but hardest to really get a handle on, theme-- although, again, we'll try to do this by the end of today-- is that they're compositional in some way. They consist of parts which have independent meaning, or which have some notion of meaning, and then ways of hooking those together to form larger wholes.
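To make "generative," "causal," and "compositional" a little more concrete before going on, here is a minimal sketch in the spirit of the probabilistic programs we'll look at in the afternoon tutorial. It is not code from the lecture: the toy scene model, the bump-shaped parts, and all the numbers are invented for illustration. The latent scene is a composition of parts, the render function plays the role of the causal process that produces what you observe, and inference runs the program many times to find latent scenes that explain the data.

```python
import math
import random

# A toy generative model written as an ordinary program: latent causes
# (how many bump-shaped parts, and where they are) produce an observed
# 1D "image". Illustrative sketch only; every modeling choice is made up.

def sample_scene():
    # Compositional latent structure: a scene is just a list of parts.
    n_parts = random.randint(1, 3)
    return sorted(random.uniform(0, 10) for _ in range(n_parts))

def render(scene, xs):
    # Causal/forward model: the parts combine (here, by summing) to
    # produce the signal you actually get to observe.
    return [sum(math.exp(-(x - b) ** 2) for b in scene) for x in xs]

def likelihood(observed, predicted, noise=0.1):
    sq_err = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.exp(-sq_err / (2 * noise ** 2))

def infer_num_parts(observed, xs, n_samples=20000):
    # Inference by weighted sampling ("analysis by synthesis"): guess
    # latent scenes, keep the ones whose rendering explains the data,
    # and read off a posterior over how many parts there are.
    weights = {}
    for _ in range(n_samples):
        scene = sample_scene()
        w = likelihood(observed, render(scene, xs))
        weights[len(scene)] = weights.get(len(scene), 0.0) + w
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()} if total else {}

xs = [i * 0.5 for i in range(21)]
observed = render([2.0, 7.5], xs)      # data secretly generated by two parts
print(infer_num_parts(observed, xs))   # posterior mass should land on 2
```

The same forward program that generates the data is what gets inverted at inference time, which is the sense in which the model's output is the world rather than a task.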
And that compositionality gives a kind of flexibility or extensibility that is fundamental, important to intelligence-- the ability not just to, say, learn from little data, but to be able to take what you've learned in some tasks and use it instantly, immediately, on tasks you've never had any training for. It's, I think, really only with this kind of model-building view of intelligence that you can do that.

I'll give one other motivating example-- just because it will appear in different forms throughout the talk-- of the difference between classification and explanation as ways of thinking about the world, thinking about, in particular, planets and the orbits of objects in the solar system. That could include objects, basically, on any one planet, like ours. But think about the problem of describing the motions of the planets around the sun. Well, there are some phenomena. You can make observations. You could observe them in various ways. Go back to the early stages of modern science, when the data by which the phenomena were represented were, you know, things like just measurements of those light spots in the sky, over nights, over years.

So here are two ways to capture the regularities in the data. You could think about Kepler's laws or Newton's laws. Just to remind you, these are Kepler's laws, and these are Newton's laws. I won't really go through the details; probably all of you know these or have some familiarity. The key thing is that Kepler's laws are laws about patterns of motion in space and time. They specify the shape of the orbits, the shape of the path that the planets trace out in the solar system-- not in the sky, but in the actual 3D world-- the idea that the orbits of the planets are ellipses with the sun at one focus.
And then they give some other mathematical regularities that describe, in a sense, how fast the planets go around the sun as a function of the size of the orbit, and the fact that they go faster at some places and slower at other places in the orbit, right? OK. But in a very important sense, they don't explain why the planets do these things, right? These are patterns which, if I were to give you a set of data, a path, and I said, is this a possible planet or not-- maybe there's an undiscovered planet, and this is possibly that, or maybe this is some other thing like a comet-- you could use to classify and say, yeah, that's a planet, not a comet, right? And, you know, you could use them to predict, right? If you've observed a planet over some period of time in the sky, then you could use Kepler's laws to basically fit an ellipse and figure out where it's going to be later on. That's great. But they don't explain.

In contrast, Newton's laws work like this. Again, there are several different kinds of laws. There are, classically, Newton's laws of motion. These ideas about inertia, and F equals ma, and every action produces an equal and opposite reaction, again, don't say anything about planets. But they really say everything about force. They talk about how forces work and how forces interact and combine and compose-- compositional-- to produce motion or, in particular, to produce the change of motion: that's acceleration, or the second derivative of position. And then there's this other law, the law of gravitational force-- universal gravitation-- which specifies how you get one particular force, the force we call gravity, as a function of the masses of the two bodies, the squared distance between them, and some unknown constant, right? And the idea is you put these things together and you get Kepler's laws.
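To make "put these things together and you get Kepler's laws" concrete, here is a minimal numerical sketch. It is mine, not something from the lecture; the units and initial conditions are arbitrary (G times the sun's mass set to 1). The code knows nothing about ellipses-- it only steps F = ma forward under an inverse-square gravitational force-- and a Kepler-style orbit comes out: the planet's distance from the sun oscillates between a fixed perihelion and aphelion instead of spiraling in or out.

```python
import math

# Integrate Newton's second law plus universal gravitation for a single
# planet around a fixed sun. Units and starting conditions are arbitrary
# choices for illustration.

GM = 1.0                 # gravitational constant times the sun's mass
x, y = 1.0, 0.0          # initial position of the planet
vx, vy = 0.0, 0.8        # initial speed below circular speed -> an ellipse
dt = 0.0005

radii = []
for _ in range(200000):
    r = math.hypot(x, y)
    # a = F/m, with F = -G M m r_hat / r^2  (inverse-square attraction)
    ax, ay = -GM * x / r**3, -GM * y / r**3
    # Semi-implicit (symplectic) Euler keeps the orbit from drifting.
    vx, vy = vx + ax * dt, vy + ay * dt
    x, y = x + vx * dt, y + vy * dt
    radii.append(math.hypot(x, y))

print("closest approach:", round(min(radii), 3))
print("farthest distance:", round(max(radii), 3))
# A closed ellipse with the sun at one focus shows up as the radius
# bouncing between these two fixed values, orbit after orbit.
```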
You can derive the fact that the planets have to go that way from the combination of these laws of motion and the law of gravitational force. So there's a sense in which the explanation is deeper, in that you can derive the patterns from the explanation. But it's a lot more than that. Because these laws don't just explain the motions of the planets around the sun, but a huge number of other things. For example, they don't just explain the orbits of the planets, but also other things in the solar system. You can use them to describe comets. You can use them to describe the moons going around the planets. And you can use them to explain why the moon goes around the Earth and not around the sun, in that sense, right?

You can use them to explain not just the motions of the really big things in the solar system, but the really little things-- like, you know, this-- and to explain why, when I drop this, or when Newton famously did or didn't drop an apple, or had an apple drop on his head, right? That, superficially, seems to be a very different pattern, right? It's something going down in your current frame of reference. But the very same laws describe exactly that, and explain why the moon goes around the Earth but the bottle or the apple goes down in my current experience of the world.

In terms of things like causal and actionable ideas, they explain how you could get a man to the moon and back again, or how you could build a rocket to escape the gravitational field-- to not only get off the ground the way we're all on the ground, but to get out of orbiting around one thing and get to orbiting some other thing, right? And it's all about compositionality as well as causality. In order to escape the Earth's gravitational field, or get to the moon and back again, there are a lot of things you have to do. But one of the key things you have to do is generate some significant force to oppose, and be stronger than, gravity. And, you know, Newton really didn't know how to do that.
But some years later, people figured out, you know, by chemistry and other things-- explosions, rockets-- how to do some other kind of physics which could generate a force that was powerful enough for an object the size of a rocket to go against gravity, to get to where you need to be, and then to get back. So the idea of a causal model, which in this case is the one based on forces, and compositionality-- the ability to take the general laws of forces, laws about one particular kind of force that's generated by this mysterious thing called mass, and some other kinds of forces generated by exploding chemicals, and put those all together-- is hugely powerful. And, of course, this, as an expression of human intelligence-- you know, the moon shot is a classic metaphor. Demis used it in his talk. And I think if we really want to understand the way intelligence works in the human mind and brain that could lead to this, you have to go back to the roots of intelligence. You've heard me say this before, and I'm going to do this more later today. We want to go back to the roots of intelligence in even very young children, where you already see all of this happening, right? OK. So that's the big picture.

I'll just point you-- if you want to learn more about the history of this idea, a really nice thing to read is this book by Kenneth Craik. He was an English scientist, sort of a contemporary of Turing, who also died tragically early, although from different tragic causes. He was, you know, one of the first people to start thinking about this topic of brains, minds, and machines-- cybernetics-type ideas, using math to describe how the brain works, how the mind might work in a brain. As you'll see when you read this quote, he didn't even really know what a computer was, because it was pre-Turing, right? But he wrote this wonderful, very short book. And I'll just quote here from one of the chapters. The book was called The Nature of Explanation.
And it was sort of both a philosophical study of that-- how explanation works in science, like some of the ideas I was just going through-- but also really arguing, in very common-sense and compelling ways, why this is a key idea for understanding how the mind and the brain work. And he wasn't just talking about humans. You know, these ideas have their greatest expression in some form, their most powerful expression, in the human mind. But they're also important for understanding other intelligent brains.

So he says here, "One of the most fundamental properties of thought is its power of predicting events. It enables us, for instance, to design bridges with a sufficient factor of safety instead of building them haphazard and waiting to see whether they collapse. If the organism carries a small-scale model of external reality and of its own possible actions within its head, it is able to try out various alternatives, conclude which is the best of them, react to future situations before they arise, utilize the knowledge of past events in dealing with the present and future, and in every way to react in a much fuller, safer, and more competent manner to the emergencies which face it."

So he's really summing up what intelligence is about-- building a model of the world that you can manipulate and plan on and improve, think about, reason about, all that. And then he makes this very nice analogy, a kind of cognitive technology analogy: "Most of the greatest advances of modern technology have been instruments which extended the scope of our sense organs, our brains, or our limbs-- such as telescopes and microscopes, wireless, calculating machines, typewriters, motor cars, ships, and airplanes." Right? He's writing in 1943-- or that's when the book was published; he was writing a little before that, right? He didn't even have the word computer.
Or, back then, computer meant something different-- people who did calculations, basically. But it's the same idea; that's what he's talking about. He's talking about a computer, though he doesn't yet quite have the language to describe it. "Is it not possible, therefore, that our brains themselves utilize comparable mechanisms to achieve the same ends and that these mechanisms can parallel phenomena in the external world as a calculating machine can parallel the development of strains in a bridge?" What he's saying is that the brain is this amazing kind of calculating machine that, in some form, can parallel the development of forces in all sorts of different systems in the world-- and not only forces. And, again, he doesn't have the vocabulary in English, or the math, really, to describe it formally. That's, you know, why this is such an exciting time to be doing all the things we're doing: because now we're really starting to have the vocabulary and the technology to make good on this idea. OK. So that's it for the big-picture philosophical introduction.

Now, I'll try to get more concrete with the questions that have motivated not only me, but many cognitive scientists. Why are we thinking about these issues of explanation? And what are our concrete handles? Let's give a couple of examples of ways we can study intelligence in this form. And I like to say that the big question of our field-- it's big enough that it can fold in most, if not all, of our big questions-- is this one: how does the mind get so much out of so little? So across cognition, wherever you look, our minds are building these rich models of the world that go way beyond the data of our senses. That's this extension of our sense organs that Craik was talking about, right? From data that is altogether way too sparse, noisy, and ambiguous in all sorts of ways, we build models that allow us to go beyond our experience, to plan effectively. How do we do it?
And you could add-- and I do want to go in this direction, because it is part of how we relate the mind to the brain, or these more explanatory models to the more pattern classification models-- that we also have to ask not only how you get such a rich model of the world from so little data, but how you do it so quickly. How do you do it so flexibly? How do you do it with so little energy, right? Metabolic energy is an incredible constraint on computation in the mind and brain.

So just to give some examples-- again, these are ones that will keep coming up here. They've come up in our work. But they're key ones that allow us to take the perspective that you're seeing today and bring it into contact with the other perspectives you're seeing in the summer school. So let's look at visual scene perception. This is just a snapshot of images I got searching on Google Images for, I think, object detection, right? And we've seen a lot of examples of these kinds of things. You can go to the iCub and see its trainable object detectors. We'll see more of this when Amnon, the Mobileye guy, comes and tells us about really cool things they've done to do object detection for self-driving cars. You saw a lot of this kind of thing in robotics before. OK.

So what's the basic idea, the state of the art, in a lot of higher-level computer vision? It's getting a system that learns to put boxes around regions of an image that contain some object of interest that you can label with a word, like person or pedestrian or car or horse, or various parts of things. Like, you might not just put a box around the bicycle, but you might put a box around the wheel, handlebar, seat, and so on. OK. And in some sense, you know, this is starting to get at some aspect of computer vision, right? Several people have quoted David Marr, who said, you know, vision is figuring out what is where from images, right?
But Marr meant something that goes way beyond this, way beyond putting boxes in images with single-word labels. And I think you just have to, you know, look around you to see that your brain's ability to reconstruct the world-- the whole three-dimensional world with all the objects and surfaces in it-- goes so far beyond putting a few boxes around some parts of the image, right? Even put aside the fact that when you actually do this in real time on a real system, you know, the mistakes and the gaps are just glaring, right? Even if you could do this, even if you could put a box around all the things that we could easily label, you look around the world, and you see so many objects and surfaces out there, all actionable. This is what I mean when I talk about causality, right?

Think about it: if somebody told me that there was some treasure hidden behind the chair that has Timothy Goldsmith's name on it, I know I could go around looking for the chair. I think I saw it over there, right? And I know exactly what I'd have to do. I'd have to go there and lift up the thing, right? That's just one of the many plans I could make given what I see in this world. If I didn't know that that was Timothy Goldsmith's chair-- somewhere over there, there's the Lily chair, right? OK. So I know that there are chairs here, and there are little name tags on them. I could go around, make my way through looking at the tags, find the one that says Lily, and then, again, know what I have to do to go look for the treasure buried under it, right? That's just one of, really, this endless number of tasks that you can do with the model of the world around you that you've built from visual perception.
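That "one model, endless tasks" point is easy to put in code. Here is a minimal sketch-- mine, not the lecture's; the little grid-world room, the chair locations, and the goals are all invented-- in which a single causal model of the room and of what moving does serves arbitrary goals: hand the same planner a different goal, and the same model yields a different plan.

```python
from collections import deque

# One invariant world model, many goals: a toy room as a grid, a causal
# "what happens if I step this way" model, and a generic planner.
# The room size, chair tags, and positions are invented for illustration.

WORLD = {
    "size": (5, 5),
    "chairs": {"Timothy Goldsmith": (4, 1), "Lily": (0, 4)},  # tag -> spot
}
MOVES = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}

def step(state, action):
    # Causal transition model: what an action does to where I am.
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    w, h = WORLD["size"]
    return (nx, ny) if 0 <= nx < w and 0 <= ny < h else (x, y)

def plan(start, goal_test):
    # Generic breadth-first planner: the model stays fixed, only the
    # goal changes from task to task.
    frontier, seen = deque([(start, [])]), {start}
    while frontier:
        state, actions = frontier.popleft()
        if goal_test(state):
            return actions
        for a in MOVES:
            nxt = step(state, a)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, actions + [a]))
    return None

start = (2, 0)
for tag in ("Timothy Goldsmith", "Lily"):
    target = WORLD["chairs"][tag]
    route = plan(start, lambda s, t=target: s == t)
    print(f"treasure under the {tag} chair:", route + ["lift the chair"])
```

Swapping the goal from one chair to the other changes the plan but touches nothing in the model, which is the sense in which the model is the invariant.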
And we don't need to get into a debate here-- we can do this in a few minutes if you want-- about the difference between, say, what Jim DiCarlo might call core object recognition, or the kind of stuff that Winrich is studying, where, you know, you show a monkey just a single object against maybe a cluttered background, or a single face, for 100 or 200 milliseconds, and you ask a very important question: what can you get in 100 milliseconds in that kind of limited scene? That's a very important question. And the convergence of visual neuroscience on that problem has enabled us to really understand a lot about the circuits that drive the first initial pass of some aspects of high-level vision, right? But that is really only getting at the classification or pattern detection part of the problem. And the other part of the problem-- figuring out the stuff in the world that causes what you see, which is really the actionable part of things, to guide your actions in the world-- we really are still quite far from understanding, at least with those kinds of methods.

Just to give a few examples-- some of my favorite kinds of hard object detection examples, but ones that show that your brain is really doing this kind of thing even from a single image. You know, it doesn't require a lot of extensive exploration. So let's do some person detection problems here. Here are a few images. Let's just start with the one in the upper left. You tell me-- here, I'll point with this, so you can see it on the screen-- how many people are in this upper left image? Just tell me.

AUDIENCE: Three.

AUDIENCE: About 18.

JOSH TENENBAUM: About 18? OK. Yeah, that's a good answer. There are somewhere between 20 and 30 or something. Yeah. That was even more precise than I was expecting. OK. Now, I don't know.
This would be a good project, if somebody is still looking for a project. If you take the best person detector that you can find out there, or that you can build from however much training data you can find labeled on the web, how many of those people is it going to detect? You know, my guess is, at best, it's going to detect just five or six-- just the bicyclists in the front row. Does that seem fair to say? Even that will be a challenge, right? Whereas not only do you have no trouble detecting the bicyclists in the front row, but all the other ones back there, too, even though for many of them all you can see is a little bit of their face or neck, or sometimes even just that funny helmet that bicyclists wear. But your ability to make sense of that depends on understanding a lot of causal stuff in the world-- the three-dimensional structure of the world, the three-dimensional structure of bodies in the world, some of the behaviors that bicyclists tend to engage in, and so on.

Or take the scene in the upper right there. How many people are in that scene?

AUDIENCE: 350.

JOSH TENENBAUM: 350. Maybe a couple of hundred or something. Yeah, I guess. Were you counting all this time?

AUDIENCE: No.

JOSH TENENBAUM: No. That was a good estimate. Yeah, OK. The scene in the lower left, how many people are there?

AUDIENCE: 100?

JOSH TENENBAUM: 100-something, yeah. The scene in the lower right?

AUDIENCE: Zero.

JOSH TENENBAUM: Zero. Was anybody tempted to say two? Were you tempted to say two as a joke or seriously? Both are valid responses.

AUDIENCE: [INAUDIBLE]

JOSH TENENBAUM: Yeah. OK. So, again, how do we solve all those problems, including knowing-- that one in the bottom, maybe it takes a second or so-- but knowing that, you know, there's actually zero there.
You know, it's the hats, the graduation hats, that are the cues to people in the other scenes. But here, again, because we know something about physics, and the fact that people need to breathe-- or just tend to not bury themselves all the way up to the tippy top of their head, unless it's like some kind of Samuel Beckett play or something, Graduation Endgame-- then, you know, there's almost certainly nobody in that scene. OK.

Now, all of those problems, again, are really way beyond what current computer vision can do, or really wants to do. But I think, you know, the aspect of scene understanding that really taps into this notion of intelligence, of explaining and modeling the causal structure of the world, should be able to do all that. Because we can, right? But here's a problem, one that motivates us on the vision side, that's somewhere in between those sort of ridiculously-hard-by-current-standards problems and the ones that, you know, people can do now. This is the kind of problem that I've been trying to put out there for the computer vision community to think about in a serious way, because it's a big challenge, but it's not ridiculously hard. OK.

So here, this is a scene of an airplane full of computer vision researchers, in fact, going to last year's CVPR conference. And, again, how many people are in the scene?

AUDIENCE: 20?

JOSH TENENBAUM: 20, 50? Yeah, something like that. Again, you know, more than 10, less than 500, right? You could count. Well, you can count, actually. Let's try that. So, you know, just do this mentally along with me. Just touch, in your mind, all the people. You know, 1, 2, 3, 4-- well, it's too hard to do it with the mouse. Da, da, da, da, da-- you know, at some point it gets a little bit hard to see exactly how many people are standing in the back by the restroom. OK.
But it's amazing how much you can, with just the slightest little bit of effort, pick out all the people, even though most of them are barely visible. And it's not only that. It's not just that you can pick them out. While you only see a very small part of their bodies, you know where all the rest of their body is, to some degree-- well enough to be able to predict and act if you needed to, right?

So to sort of probe this, here's a kind of little experiment we can do. Let's take this guy here. See, you've just got his head. And though you see his head, think about where the rest of his body is. And in particular, think about where his right hand is in the scene. You can't see his right hand, but in some sense, you know where it is. I'll move the cursor, and you just hum when I get to where you think his right hand is-- if you could see it, like if everything was transparent.

AUDIENCE: Yeah.

AUDIENCE: Yeah.

JOSH TENENBAUM: OK. Somewhere around there. All right, how about let's take this guy. You can see his scalp only, and maybe a bit of his shoulder. Think about his left big toe. OK? Think about that. And just hum when I get to where his left big toe is.

AUDIENCE: Yeah.

AUDIENCE: Yeah.

JOSH TENENBAUM: Somewhere there, yeah. All right, so you can see we did an instant experiment. You don't even need Mechanical Turk. It's like recording from neurons, only you're each being a neuron, and you're humming instead of spiking. But it's amazing how much you can learn about your brain just by doing things like that. You've got a whole probability distribution right there, right? And that's a meaningful distribution. You weren't just hallucinating, right? You were using a model, a causal model, of how bodies work and how other three-dimensional structures work to solve that problem. OK.
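One cheap way to mimic what the audience just did is to write down a crude causal model of a seated body, anchor it at the only thing that is visible (the head), and sample the unobserved joints: each sample is one "hum," and together they form the probability distribution over where the hidden hand could be. This is only an illustrative sketch added here; the skeleton proportions and angle ranges are invented, not anything measured from the slide.

```python
import math
import random

# A crude 2D stick-figure model of a seated person, rooted at the one
# thing we can actually see: the head. Sampling the unobserved joint
# angles from rough priors and pushing them through the skeleton gives
# a distribution over where the hidden hand could be. All proportions
# and angle ranges below are invented for illustration.

HEAD = (0.0, 0.0)                          # observed head position
NECK, UPPER_ARM, FOREARM = 0.2, 0.3, 0.3   # rough segment lengths

def sample_hand():
    # Seated posture: torso roughly vertical, hanging below the head.
    torso = math.radians(random.gauss(-90, 8))
    shoulder = (HEAD[0] + NECK * math.cos(torso),
                HEAD[1] + NECK * math.sin(torso))
    # Unobserved degrees of freedom: where the arm happens to be.
    upper = torso + math.radians(random.uniform(-40, 40))
    elbow = (shoulder[0] + UPPER_ARM * math.cos(upper),
             shoulder[1] + UPPER_ARM * math.sin(upper))
    fore = upper + math.radians(random.uniform(0, 110))  # elbow bends one way
    return (elbow[0] + FOREARM * math.cos(fore),
            elbow[1] + FOREARM * math.sin(fore))

hands = [sample_hand() for _ in range(5000)]
mean_x = sum(x for x, _ in hands) / len(hands)
mean_y = sum(y for _, y in hands) / len(hands)
print(f"most likely hand region, relative to the head: ({mean_x:.2f}, {mean_y:.2f})")
print("but it's a spread-out distribution, not a point: heights from",
      round(min(y for _, y in hands), 2), "to", round(max(y for _, y in hands), 2))
```

A better model would condition on more of what's visible, but the shape of the computation-- run the causal model forward many times, consistent with what you see-- is the point.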
This isn't just about bodies, right? Our ability to detect objects-- like to detect all the books on my bookshelf there, again, most of which are barely visible, just a few pixels, a small part of each book, or the glasses in this tabletop scene there, right? I don't really know any other way you can do this. Any standard machine-learning-based book detector is not going to detect most of those books. Any standard glass detector is not going to detect most of those glasses. And yet you can do it. And I don't think there's any alternative to saying that, in some sense-- and we'll talk more about it in a little bit-- you're kind of inverting the graphics process. In computer science now, we call it graphics; we maybe used to call it optics. But the way light bounces off the surfaces of objects in the world and comes into your eye, that's a causal process that your visual system is in some way able to invert-- to model, and go from the observable to the unobservable stuff, just like Newton was doing with astronomical data. OK. Enough on vision for now, sort of.

Let's go from actually just perceiving this stuff out there in the world to forming concepts and generalizing. So a problem that I've studied a lot, that a lot of us have studied in this field, is the problem of learning concepts-- in particular, one very particular kind of concept, which is object kinds, categories of objects, things we could label with a word. It's one of the most obvious forms of interesting learning that you see in young children, part of learning language. But it's not just about language. And the striking thing when you look at, say, a child learning words-- in particular, let's say, words that label kinds of objects, like chair or horse or bottle or ball-- is how little data of a certain kind-- labels-- or how little task-relevant data is required. A lot of other data is probably used in some way, right?
774 00:29:57,654 --> 00:29:59,070 And, again, this is a theme you've 775 00:29:59,070 --> 00:30:00,929 heard from a number of the other speakers. 776 00:30:00,929 --> 00:30:02,970 But just to give you some of my favorite examples 777 00:30:02,970 --> 00:30:04,860 of how we can learn object concepts from just 778 00:30:04,860 --> 00:30:06,362 one or a few examples, well, here's 779 00:30:06,362 --> 00:30:08,070 an example from some experimental stimuli 780 00:30:08,070 --> 00:30:11,880 we use where we just made up a whole little world of objects. 781 00:30:11,880 --> 00:30:16,604 And in this world, I can teach you a new name, let's say tufa, 782 00:30:16,604 --> 00:30:17,770 and give you a few examples. 783 00:30:17,770 --> 00:30:18,900 And, again, you can now go through. 784 00:30:18,900 --> 00:30:21,150 We can try this as a little experiment here and just 785 00:30:21,150 --> 00:30:22,470 say, you know, yes or no. 786 00:30:22,470 --> 00:30:23,830 For each of these objects, is it a tufa? 787 00:30:23,830 --> 00:30:25,038 So how about this, yes or no? 788 00:30:25,038 --> 00:30:25,645 AUDIENCE: Yes. 789 00:30:25,645 --> 00:30:26,520 JOSH TENENBAUM: Here? 790 00:30:26,520 --> 00:30:27,175 AUDIENCE: No. 791 00:30:27,175 --> 00:30:28,050 JOSH TENENBAUM: Here? 792 00:30:28,050 --> 00:30:28,592 AUDIENCE: No. 793 00:30:28,592 --> 00:30:29,466 JOSH TENENBAUM: Here? 794 00:30:29,466 --> 00:30:30,150 AUDIENCE: No. 795 00:30:30,150 --> 00:30:30,420 JOSH TENENBAUM: Here? 796 00:30:30,420 --> 00:30:30,962 AUDIENCE: No. 797 00:30:30,962 --> 00:30:31,836 JOSH TENENBAUM: Here? 798 00:30:31,836 --> 00:30:32,420 AUDIENCE: Yes. 799 00:30:32,420 --> 00:30:33,294 JOSH TENENBAUM: Here? 800 00:30:33,294 --> 00:30:33,960 AUDIENCE: No. 801 00:30:33,960 --> 00:30:34,835 JOSH TENENBAUM: Here? 802 00:30:34,835 --> 00:30:35,733 AUDIENCE: Yes. 803 00:30:35,733 --> 00:30:37,206 No. 804 00:30:37,206 --> 00:30:38,679 No. 805 00:30:38,679 --> 00:30:39,661 No. 806 00:30:39,661 --> 00:30:41,625 No. 807 00:30:41,625 --> 00:30:42,620 Yes. 808 00:30:42,620 --> 00:30:43,770 JOSH TENENBAUM: Yeah. 809 00:30:43,770 --> 00:30:44,270 OK. 810 00:30:44,270 --> 00:30:47,090 So first of all, how long did it take you for each one? 811 00:30:47,090 --> 00:30:48,590 I mean, it basically didn't take you 812 00:30:48,590 --> 00:30:52,550 any longer than it takes in one of Winrich's experiments to get 813 00:30:52,550 --> 00:30:54,110 the spike seeing the face. 814 00:30:54,110 --> 00:30:56,030 So you learned this concept, and now you 815 00:30:56,030 --> 00:30:57,680 can just use it right away. 816 00:30:57,680 --> 00:31:01,680 It's far less than a second of actual visual processing. 817 00:31:01,680 --> 00:31:04,040 And there was a little bit of a latency. 818 00:31:04,040 --> 00:31:07,550 This one's a little more uncertain here, right? 819 00:31:07,550 --> 00:31:10,220 And you saw that in that it took you maybe almost twice as 820 00:31:10,220 --> 00:31:12,165 long to make that decision. 821 00:31:12,165 --> 00:31:13,470 OK. 822 00:31:13,470 --> 00:31:16,460 That's the kind of thing we'd like to be able to explain. 823 00:31:16,460 --> 00:31:18,919 And that means how can you get a whole concept? 824 00:31:18,919 --> 00:31:20,210 It's a whole new kind of thing. 825 00:31:20,210 --> 00:31:21,560 You don't really know much about it. 826 00:31:21,560 --> 00:31:23,060 Maybe you know it's some kind of weird plant 827 00:31:23,060 --> 00:31:23,893 on this weird thing. 
828 00:31:23,893 --> 00:31:27,320 But you've got a whole new concept and a whole entry 829 00:31:27,320 --> 00:31:30,920 into a whole, probably, system of concepts. 830 00:31:30,920 --> 00:31:33,740 Again, several notions of being quick-- sample complexity, 831 00:31:33,740 --> 00:31:35,570 as we say, just one or a few examples, 832 00:31:35,570 --> 00:31:37,340 but also the speed-- the speed in which 833 00:31:37,340 --> 00:31:38,990 you formed that concept and the speed 834 00:31:38,990 --> 00:31:41,810 in which you're able to deploy it in now recognizing 835 00:31:41,810 --> 00:31:44,660 and detecting things. 836 00:31:44,660 --> 00:31:47,360 Just to give one other real world example, so it's not just 837 00:31:47,360 --> 00:31:50,030 we make things up-- but, for example, here's an object. 838 00:31:50,030 --> 00:31:52,130 Just how many know what this thing is? 839 00:31:52,130 --> 00:31:53,414 Raise your hand if you do. 840 00:31:53,414 --> 00:31:55,330 How many people don't know what this thing is? 841 00:31:55,330 --> 00:31:55,850 OK. 842 00:31:55,850 --> 00:31:56,990 Good. 843 00:31:56,990 --> 00:31:59,870 So this is a piece of rock climbing equipment. 844 00:31:59,870 --> 00:32:01,010 It's called a cam. 845 00:32:01,010 --> 00:32:02,781 I won't tell you anything more than that. 846 00:32:02,781 --> 00:32:04,280 Well, maybe I'll tell you one thing, 847 00:32:04,280 --> 00:32:07,370 because it's kind of useful. 848 00:32:07,370 --> 00:32:09,880 Well, I mean, you may or may not even need to-- 849 00:32:09,880 --> 00:32:10,640 yeah. 850 00:32:10,640 --> 00:32:12,059 This strap here is not technically 851 00:32:12,059 --> 00:32:13,350 part of the piece of equipment. 852 00:32:13,350 --> 00:32:14,210 But it doesn't really matter. 853 00:32:14,210 --> 00:32:14,709 OK. 854 00:32:14,709 --> 00:32:17,439 So anyway, I've given you one example of this new kind 855 00:32:17,439 --> 00:32:18,480 of thing for most of you. 856 00:32:18,480 --> 00:32:20,390 And now, you can look at a complex scene 857 00:32:20,390 --> 00:32:22,700 like this climber's equipment rack. 858 00:32:22,700 --> 00:32:26,220 And tell me, are there any cams in this scene? 859 00:32:26,220 --> 00:32:26,900 AUDIENCE: Yes. 860 00:32:26,900 --> 00:32:28,191 JOSH TENENBAUM: Where are they? 861 00:32:28,191 --> 00:32:29,735 AUDIENCE: On top. 862 00:32:29,735 --> 00:32:30,610 JOSH TENENBAUM: Yeah. 863 00:32:30,610 --> 00:32:31,430 The top. 864 00:32:31,430 --> 00:32:31,930 Like here? 865 00:32:31,930 --> 00:32:32,814 AUDIENCE: No. 866 00:32:32,814 --> 00:32:33,700 Next to there. 867 00:32:33,700 --> 00:32:34,120 JOSH TENENBAUM: Here. 868 00:32:34,120 --> 00:32:34,620 Yeah. 869 00:32:34,620 --> 00:32:35,530 Right, exactly. 870 00:32:35,530 --> 00:32:38,267 How about this scene, any? 871 00:32:38,267 --> 00:32:39,608 AUDIENCE: No. 872 00:32:39,608 --> 00:32:40,950 AUDIENCE: [INAUDIBLE] 873 00:32:40,950 --> 00:32:42,010 JOSH TENENBAUM: There's none of that-- 874 00:32:42,010 --> 00:32:42,968 well, there's a couple. 875 00:32:42,968 --> 00:32:45,760 Anyone see the ones over up in the upper right up here? 876 00:32:45,760 --> 00:32:46,630 AUDIENCE: Yeah. 877 00:32:46,630 --> 00:32:46,990 JOSH TENENBAUM: Yeah. 878 00:32:46,990 --> 00:32:47,823 They're hard to see. 879 00:32:47,823 --> 00:32:49,780 They're really dark and shaded, right? 880 00:32:52,242 --> 00:32:54,700 But when I draw your attention to it, and then you're like, 881 00:32:54,700 --> 00:32:55,200 oh yeah. 882 00:32:55,200 --> 00:32:56,827 I see that, right? 
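As a rough illustration of how a concept like "tufa" (or the cam) could be acquired from just one or a few positive examples, here is a minimal sketch in the spirit of Bayesian concept learning; it is not the actual model from this line of work. The candidate categories, their sizes, and the uniform prior are all invented; the key ingredient is the size principle, which favors the smallest hypothesis consistent with the examples.

```python
# A minimal sketch of one-shot / few-shot word learning as Bayesian inference.
# Hypotheses are nested candidate categories; all names and sizes are invented.

hypotheses = {
    "tufas_only":       {"tufa1", "tufa2", "tufa3"},
    "tufa_like_plants": {"tufa1", "tufa2", "tufa3", "plantA", "plantB"},
    "all_plants":       {"tufa1", "tufa2", "tufa3", "plantA", "plantB",
                         "treeA", "treeB", "treeC"},
}
prior = {"tufas_only": 1 / 3, "tufa_like_plants": 1 / 3, "all_plants": 1 / 3}

def posterior(examples):
    """Bayes' rule with the size principle: examples are assumed to be sampled
    from the true category, so small consistent categories get likelihood
    (1/|h|)^n and are favored as more examples come in."""
    scores = {}
    for h, members in hypotheses.items():
        if all(x in members for x in examples):
            scores[h] = prior[h] * (1 / len(members)) ** len(examples)
        else:
            scores[h] = 0.0
    z = sum(scores.values())
    return {h: s / z for h, s in scores.items()}

def prob_in_concept(obj, examples):
    """Generalization: probability that a new object falls under the new word."""
    post = posterior(examples)
    return sum(p for h, p in post.items() if obj in hypotheses[h])

print(posterior(["tufa1", "tufa2", "tufa3"]))
print(prob_in_concept("plantA", ["tufa1", "tufa2", "tufa3"]))
```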
883 00:32:56,827 --> 00:32:58,660 So part of why I give these examples is they 884 00:32:58,660 --> 00:33:00,970 show how the several examples I've 885 00:33:00,970 --> 00:33:02,470 been giving, like the object concept 886 00:33:02,470 --> 00:33:04,511 learning thing, interact with vision, right? 887 00:33:04,511 --> 00:33:06,010 I think your ability to solve tasks 888 00:33:06,010 --> 00:33:09,880 like this rests on your ability to form this abstract concept 889 00:33:09,880 --> 00:33:11,620 of this physical object. 890 00:33:11,620 --> 00:33:14,737 And notice all these ones, they're different colors. 891 00:33:14,737 --> 00:33:16,820 The physical details of the objects are different. 892 00:33:16,820 --> 00:33:19,270 It's only a category of object that's preserved. 893 00:33:19,270 --> 00:33:22,060 But your ability to recognize these things in the real world 894 00:33:22,060 --> 00:33:24,250 depends on, also, the ability to recognize them 895 00:33:24,250 --> 00:33:26,666 in very different viewpoints under very different lighting 896 00:33:26,666 --> 00:33:27,250 conditions. 897 00:33:27,250 --> 00:33:30,130 And if we want to explain how you can do this-- again, 898 00:33:30,130 --> 00:33:33,100 to go back to composability and compositionality-- 899 00:33:33,100 --> 00:33:35,620 we need to understand how you can put together 900 00:33:35,620 --> 00:33:39,017 the kind of causal model of how scenes are formed-- 901 00:33:39,017 --> 00:33:41,350 the one that vision is inverting, this inverse graphics thing-- 902 00:33:41,350 --> 00:33:42,766 with the causal model of something 903 00:33:42,766 --> 00:33:47,560 about how object concepts work, and compose them together 904 00:33:47,560 --> 00:33:50,260 to be able to learn a new concept of an object 905 00:33:50,260 --> 00:33:55,175 that you can also recognize new instances of the kind of thing 906 00:33:55,175 --> 00:33:57,550 in new viewpoints and under different lighting conditions 907 00:33:57,550 --> 00:34:00,160 than the really wonderfully perfect example I gave you here 908 00:34:00,160 --> 00:34:02,230 with nice lighting and a nice viewpoint. 909 00:34:02,230 --> 00:34:04,900 We can push this to quite an extreme. 910 00:34:04,900 --> 00:34:06,970 Like, in that scene in the upper right, 911 00:34:06,970 --> 00:34:08,150 do you see any cams there? 912 00:34:08,150 --> 00:34:09,020 AUDIENCE: Yeah. 913 00:34:09,020 --> 00:34:09,894 JOSH TENENBAUM: Yeah. 914 00:34:09,894 --> 00:34:11,440 How many are there? 915 00:34:11,440 --> 00:34:12,864 AUDIENCE: [INAUDIBLE] 916 00:34:12,864 --> 00:34:14,739 JOSH TENENBAUM: Quite a lot, yeah, and, like, 917 00:34:14,739 --> 00:34:16,010 all occluded and cluttered. 918 00:34:16,010 --> 00:34:16,510 Yeah. 919 00:34:16,510 --> 00:34:18,560 Amazing that you can do this. 920 00:34:18,560 --> 00:34:20,949 And as we'll see in a little bit, what we 921 00:34:20,949 --> 00:34:23,199 do with our object concepts-- 922 00:34:23,199 --> 00:34:25,570 and these are other ways to show this notion 923 00:34:25,570 --> 00:34:27,110 of a generative model-- 924 00:34:27,110 --> 00:34:28,360 we don't just classify things. 925 00:34:28,360 --> 00:34:30,790 But we can use them for all sorts of other tasks, right? 926 00:34:30,790 --> 00:34:33,550 We can use them to generate or imagine new instances. 927 00:34:33,550 --> 00:34:35,320 We can parse an object out into parts. 928 00:34:35,320 --> 00:34:37,960 This is another novel, but real object-- 929 00:34:37,960 --> 00:34:41,415 the Segway personal thing.
930 00:34:41,415 --> 00:34:43,540 Which, again, probably all of you know this, right? 931 00:34:43,540 --> 00:34:46,550 How many people have seen those Segways before, right? 932 00:34:46,550 --> 00:34:47,050 OK. 933 00:34:47,050 --> 00:34:48,460 But you all probably remember the first time 934 00:34:48,460 --> 00:34:49,239 you saw one on the street. 935 00:34:49,239 --> 00:34:50,020 And whoa, that's really cool. 936 00:34:50,020 --> 00:34:50,936 What's that new thing? 937 00:34:50,936 --> 00:34:53,449 And then somebody tells you, and now you know, right? 938 00:34:53,449 --> 00:34:56,679 But it's partly related to your ability to parse out the parts. 939 00:34:56,679 --> 00:34:58,930 If somebody says, oh, my Segway has a flat tire, 940 00:34:58,930 --> 00:35:00,670 you kind of know what that means and what 941 00:35:00,670 --> 00:35:03,130 you could do, at least in principle, to fix it, right? 942 00:35:07,840 --> 00:35:09,600 You can take different kinds of things 943 00:35:09,600 --> 00:35:12,010 in some category like vehicles and imagine 944 00:35:12,010 --> 00:35:13,690 ways of combining the parts to make yet 945 00:35:13,690 --> 00:35:16,610 other new either real or fanciful vehicles, like that C 946 00:35:16,610 --> 00:35:18,380 to the lower right there. 947 00:35:18,380 --> 00:35:21,040 These are all things you do from very little data 948 00:35:21,040 --> 00:35:23,650 from these object concepts. 949 00:35:23,650 --> 00:35:27,010 Moving on and then both back to some examples 950 00:35:27,010 --> 00:35:30,310 you saw Tomer and I talk about on the first day 951 00:35:30,310 --> 00:35:32,500 in our brief introduction and what 952 00:35:32,500 --> 00:35:35,470 we'll get to more by the end of today, examples like these. 953 00:35:35,470 --> 00:35:37,360 So Tomer already showed you the scene 954 00:35:37,360 --> 00:35:41,470 of the red and the blue ball chasing each other around. 955 00:35:41,470 --> 00:35:43,270 I won't rehearse that example. 956 00:35:43,270 --> 00:35:46,400 I'll show you another scene that is more famous. 957 00:35:46,400 --> 00:35:46,900 OK. 958 00:35:46,900 --> 00:35:49,067 Well, so for the people who haven't seen it, 959 00:35:49,067 --> 00:35:50,650 you can never watch it too many times. 960 00:35:50,650 --> 00:35:53,066 Again, like that one, it's just some shapes moving around. 961 00:35:53,066 --> 00:35:56,050 It was done in the 1940s, that golden age 962 00:35:56,050 --> 00:35:58,690 for cognitive science as well as many other things. 963 00:35:58,690 --> 00:36:03,310 And much lower technology of animation, it's 964 00:36:03,310 --> 00:36:06,222 like stop-action animation on a table top. 965 00:36:06,222 --> 00:36:07,930 But just like the scene on the left which 966 00:36:07,930 --> 00:36:09,760 is done with computer animation, just 967 00:36:09,760 --> 00:36:12,610 from the motion of a few shapes in this two-dimensional world, 968 00:36:12,610 --> 00:36:14,350 you get so much. 969 00:36:14,350 --> 00:36:15,600 First of all, you get physics. 970 00:36:15,600 --> 00:36:17,755 Let's watch it again. 971 00:36:17,755 --> 00:36:19,180 It looks like there's a collision. 972 00:36:19,180 --> 00:36:20,380 It's just objects, shapes moving. 973 00:36:20,380 --> 00:36:22,190 But it looks like one thing is banging into another. 974 00:36:22,190 --> 00:36:24,110 And it looks like they're characters, right? 975 00:36:24,110 --> 00:36:26,430 It looks like the big one is kind of bullying the other one. 
976 00:36:26,430 --> 00:36:28,240 It's sort of backed him up against the wall 977 00:36:28,240 --> 00:36:29,240 scaring them off, right? 978 00:36:29,240 --> 00:36:31,270 Does you guys see that? 979 00:36:31,270 --> 00:36:32,830 The other one was hiding. 980 00:36:32,830 --> 00:36:34,960 Now, this one goes in to go after him. 981 00:36:34,960 --> 00:36:36,820 It starts to get a little scary, right? 982 00:36:36,820 --> 00:36:39,224 Cue the scary music if it was a silent movie. 983 00:36:39,224 --> 00:36:40,390 Doo, doo, doo, doo, doo, OK. 984 00:36:40,390 --> 00:36:42,160 You can watch the end of it on YouTube if you want. 985 00:36:42,160 --> 00:36:43,210 It's quite famous. 986 00:36:43,210 --> 00:36:45,290 So I won't show you the end of it. 987 00:36:45,290 --> 00:36:47,620 But in case you're getting nervous, don't worry. 988 00:36:47,620 --> 00:36:51,820 It ends happily, at least for two of the three characters. 989 00:36:51,820 --> 00:36:54,730 From some combination of all your experiences in your life 990 00:36:54,730 --> 00:36:57,100 and whatever evolution genetics gave you 991 00:36:57,100 --> 00:36:58,780 before you came out into the world, 992 00:36:58,780 --> 00:37:00,740 you've built up a model that allows you to understand this. 993 00:37:00,740 --> 00:37:02,410 And then it's a separate, but very interesting, 994 00:37:02,410 --> 00:37:03,430 question and harder one. 995 00:37:03,430 --> 00:37:05,239 How do you get to that point, right? 996 00:37:05,239 --> 00:37:07,030 The question of the development of the kind 997 00:37:07,030 --> 00:37:08,530 of commonsense knowledge that allows 998 00:37:08,530 --> 00:37:10,945 you to parse out just the motion into both forces, 999 00:37:10,945 --> 00:37:13,390 you know, one thing hitting another thing, and then 1000 00:37:13,390 --> 00:37:16,892 the whole mental state structure and the sort of social who's 1001 00:37:16,892 --> 00:37:18,100 good and who's bad on there-- 1002 00:37:18,100 --> 00:37:19,130 I mean, because, again, most people 1003 00:37:19,130 --> 00:37:21,088 when they see this and think about a little bit 1004 00:37:21,088 --> 00:37:24,160 see some of the characters as good and others as bad. 1005 00:37:24,160 --> 00:37:27,450 How that knowledge develops is extremely interesting. 1006 00:37:27,450 --> 00:37:30,340 We're going to see a lot more of the more experiments, how 1007 00:37:30,340 --> 00:37:34,090 we study this kind of thing in young children, next week. 1008 00:37:34,090 --> 00:37:36,132 And we'll talk more about the learning next week. 1009 00:37:36,132 --> 00:37:37,631 We'll see how much of that I get to. 1010 00:37:37,631 --> 00:37:39,267 What I want to talk about here is 1011 00:37:39,267 --> 00:37:41,350 sort of general issues of how the knowledge works, 1012 00:37:41,350 --> 00:37:43,510 how you deploy it, how you make the inferences 1013 00:37:43,510 --> 00:37:45,110 with the knowledge, and a little bit about learning. 1014 00:37:45,110 --> 00:37:46,690 Maybe we'll see if we have time for that at the end. 1015 00:37:46,690 --> 00:37:48,273 But they'll be more of that next week. 1016 00:37:48,273 --> 00:37:52,974 I think it's important to understand what the models are, 1017 00:37:52,974 --> 00:37:55,390 these generative models that you're building of the world, 1018 00:37:55,390 --> 00:37:56,890 before you actually study learning. 1019 00:37:56,890 --> 00:37:59,380 I think there's a danger if you study learning. 
1020 00:37:59,380 --> 00:38:02,680 Without having the right target of learning, you might be-- 1021 00:38:02,680 --> 00:38:06,580 to take a classic analogy-- 1022 00:38:06,580 --> 00:38:08,560 trying to get to the moon by climbing trees. 1023 00:38:13,460 --> 00:38:15,080 How about this? 1024 00:38:15,080 --> 00:38:18,755 Just to give one example that is familiar, 1025 00:38:18,755 --> 00:38:20,630 because we saw this wonderful talk by Demis-- 1026 00:38:20,630 --> 00:38:24,350 and I think many people had seen the DeepMind work. 1027 00:38:29,520 --> 00:38:32,720 And I hope everybody here saw Demis' talk. 1028 00:38:32,720 --> 00:38:35,210 This is just a couple of slides from their Nature paper, 1029 00:38:35,210 --> 00:38:38,510 where, again, they had this deep Q-network, which 1030 00:38:38,510 --> 00:38:40,970 is I think a great example of trying 1031 00:38:40,970 --> 00:38:44,450 to see how far you can go with this pattern recognition idea, 1032 00:38:44,450 --> 00:38:45,020 right? 1033 00:38:45,020 --> 00:38:47,186 In a sense, what this network does, if you remember, 1034 00:38:47,186 --> 00:38:49,650 is it has a bunch of sort of convolutional layers 1035 00:38:49,650 --> 00:38:50,900 and of fully connected layers. 1036 00:38:50,900 --> 00:38:52,080 But it's mapping. 1037 00:38:52,080 --> 00:38:55,370 It's learning a feedforward mapping from images 1038 00:38:55,370 --> 00:38:57,170 to joystick action. 1039 00:38:57,170 --> 00:38:59,180 So it's a perfect example of trying 1040 00:38:59,180 --> 00:39:01,907 to solve interesting problems of intelligence. 1041 00:39:01,907 --> 00:39:03,740 I think that the problems of video gaming AI 1042 00:39:03,740 --> 00:39:05,936 are really cool ones. 1043 00:39:05,936 --> 00:39:07,310 With this pattern classification, 1044 00:39:07,310 --> 00:39:09,018 they're basically trying to find patterns 1045 00:39:09,018 --> 00:39:10,790 of pixels in Atari video games that 1046 00:39:10,790 --> 00:39:12,375 are diagnostic of whether you should 1047 00:39:12,375 --> 00:39:14,000 move your joystick this way or that way 1048 00:39:14,000 --> 00:39:16,664 or press the button this way or that way, right? 1049 00:39:16,664 --> 00:39:18,080 And they showed that that can give 1050 00:39:18,080 --> 00:39:20,990 very competitive performance with humans when you give it 1051 00:39:20,990 --> 00:39:23,660 enough training data and with clever training algorithms, 1052 00:39:23,660 --> 00:39:25,250 right? 1053 00:39:25,250 --> 00:39:28,160 But I think there's also an important sense in which what 1054 00:39:28,160 --> 00:39:30,680 this is doing is quite different from what 1055 00:39:30,680 --> 00:39:32,660 humans are doing when they're learning 1056 00:39:32,660 --> 00:39:34,310 to play one of these games. 1057 00:39:34,310 --> 00:39:36,800 And, you know, Demis, I think is quite aware of this. 1058 00:39:36,800 --> 00:39:38,490 He made some of these points in his talk 1059 00:39:38,490 --> 00:39:40,760 and, informally, afterwards, right? 1060 00:39:40,760 --> 00:39:42,710 There's all sorts of things that a person 1061 00:39:42,710 --> 00:39:44,960 brings to the problem of learning an Atari video game, 1062 00:39:44,960 --> 00:39:48,470 just like your question of what do you bring to learning this. 
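For concreteness, here is a rough sketch in PyTorch (not DeepMind's code) of the kind of feedforward mapping just described: a stack of recent game frames goes through convolutional layers and then fully connected layers, and comes out as one estimated value per joystick action. The layer sizes follow the published description of the network as I understand it, but treat the details here as approximate.

```python
# A rough sketch of a deep Q-network: stacked frames in, one value per action out.
import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        self.features = nn.Sequential(              # convolutional pattern-recognition layers
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.head = nn.Sequential(                  # fully connected layers
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),              # one Q-value per joystick/button action
        )

    def forward(self, frames):                      # frames: (batch, 4, 84, 84) grayscale
        return self.head(self.features(frames))

q_net = DQN(n_actions=18)                           # Atari exposes up to 18 discrete actions
q_values = q_net(torch.zeros(1, 4, 84, 84))
action = q_values.argmax(dim=1)                     # act greedily on the predicted values
```

Nothing in this mapping represents platforms, igloos, or birds as objects with causal roles; that is exactly the contrast being drawn with how a person approaches the same game.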
1063 00:39:48,470 --> 00:39:51,150 But I think from a cognitive point of view, 1064 00:39:51,150 --> 00:39:52,790 the real problem of intelligence is 1065 00:39:52,790 --> 00:39:55,454 to understand how learning works with the knowledge 1066 00:39:55,454 --> 00:39:56,870 that you have and how you actually 1067 00:39:56,870 --> 00:39:58,190 build up that knowledge. 1068 00:39:58,190 --> 00:40:00,770 I think that at least the current DeepMind 1069 00:40:00,770 --> 00:40:03,360 system, the one that was published a few months ago, 1070 00:40:03,360 --> 00:40:06,020 is not really getting at that question. 1071 00:40:06,020 --> 00:40:08,630 It's trying to see how much you can do without really 1072 00:40:08,630 --> 00:40:10,290 having a causal model of the world. 1073 00:40:10,290 --> 00:40:12,590 But as I think Demis showed in his talk, 1074 00:40:12,590 --> 00:40:17,415 that's a direction, among many others, 1075 00:40:17,415 --> 00:40:20,409 that I think they realized they need to go in. 1076 00:40:20,409 --> 00:40:21,950 A nice way to illustrate this is just 1077 00:40:21,950 --> 00:40:24,160 to look at one particular video game. 1078 00:40:24,160 --> 00:40:25,880 This is a game called Frostbite. 1079 00:40:25,880 --> 00:40:29,180 It's one of the ones down here on this chart, 1080 00:40:29,180 --> 00:40:31,700 which the DeepMind system did particularly 1081 00:40:31,700 --> 00:40:34,370 poorly on in terms of getting only 1082 00:40:34,370 --> 00:40:37,550 about 6% performance relative to humans. 1083 00:40:37,550 --> 00:40:40,264 But I think it's interesting and informative. 1084 00:40:40,264 --> 00:40:42,680 And it really gets to the heart of all of the things we're 1085 00:40:42,680 --> 00:40:43,850 talking about here. 1086 00:40:43,850 --> 00:40:48,290 To contrast how the DeepMind system, as well as other attempts 1087 00:40:48,290 --> 00:40:52,670 to do sort of powerful, scalable deep reinforcement learning, learn this game, 1088 00:40:52,670 --> 00:40:54,620 I'll show you another more recent result 1089 00:40:54,620 --> 00:40:56,540 from a different group in a second. 1090 00:40:56,540 --> 00:40:59,870 Contrast how those systems learn to play this video game 1091 00:40:59,870 --> 00:41:02,600 with how a human child might learn to play a game, 1092 00:41:02,600 --> 00:41:04,820 like that kid over there who's watching his older 1093 00:41:04,820 --> 00:41:07,230 brother play a game, right? 1094 00:41:07,230 --> 00:41:11,690 So the DeepMind system, you know, 1095 00:41:11,690 --> 00:41:17,060 gets about 1,000 hours of game play experience, right? 1096 00:41:17,060 --> 00:41:20,390 And then it chops that up in various interesting ways 1097 00:41:20,390 --> 00:41:22,580 with the replay that Demis talked about, right? 1098 00:41:22,580 --> 00:41:24,920 But when we talk about getting so much from so little, 1099 00:41:24,920 --> 00:41:28,550 the basic data is about 1,000 hours of experience. 1100 00:41:28,550 --> 00:41:33,020 But I would venture that a kid learns a lot more 1101 00:41:33,020 --> 00:41:34,657 from a lot less, right? 1102 00:41:34,657 --> 00:41:36,740 The way a kid actually learns to play a video game 1103 00:41:36,740 --> 00:41:40,175 is not by trial and error for 1,000 hours, right? 1104 00:41:40,175 --> 00:41:42,800 I mean, there might be a little bit of trial and error on their part. 1105 00:41:42,800 --> 00:41:44,000 But, often, it might be just watching 1106 00:41:44,000 --> 00:41:46,140 someone else play and say, wow, that's awesome. 1107 00:41:46,140 --> 00:41:47,040 I'd like to do that.
1108 00:41:47,040 --> 00:41:47,540 Can I play? 1109 00:41:47,540 --> 00:41:47,900 My turn. 1110 00:41:47,900 --> 00:41:49,550 My turn-- and wrestling for the joystick and then 1111 00:41:49,550 --> 00:41:50,570 seeing what you can do. 1112 00:41:50,570 --> 00:41:52,370 And it only takes a minute, really, 1113 00:41:52,370 --> 00:41:54,200 to figure out if this game is fun, 1114 00:41:54,200 --> 00:41:55,616 interesting, if it's something you 1115 00:41:55,616 --> 00:41:59,100 want to do, and to sort of get the basic hang of things, 1116 00:41:59,100 --> 00:42:00,890 at least of what you should try to do. 1117 00:42:00,890 --> 00:42:02,520 That's not to say to be able to do it. 1118 00:42:02,520 --> 00:42:05,420 So I mean, unless you saw me give a talk, 1119 00:42:05,420 --> 00:42:06,960 has anybody played this game before? 1120 00:42:06,960 --> 00:42:07,460 OK. 1121 00:42:07,460 --> 00:42:11,300 So perfect example-- let's watch a minute of this game 1122 00:42:11,300 --> 00:42:13,430 and see if you can figure out what's going on. 1123 00:42:13,430 --> 00:42:15,920 Think about how you learn to play this game, right? 1124 00:42:15,920 --> 00:42:17,780 Imagine you're watching somebody else play. 1125 00:42:17,780 --> 00:42:19,670 This is a video of not the DeepMind system, 1126 00:42:19,670 --> 00:42:22,070 but of an expert human game player, 1127 00:42:22,070 --> 00:42:23,660 a really good human playing this, 1128 00:42:23,660 --> 00:42:25,265 like that kid's older brother. 1129 00:42:25,265 --> 00:42:30,710 [VIDEO PLAYBACK] 1130 00:43:11,767 --> 00:43:12,350 [END PLAYBACK] 1131 00:43:12,350 --> 00:43:13,250 OK. 1132 00:43:13,250 --> 00:43:14,340 Maybe you've got the idea. 1133 00:43:14,340 --> 00:43:17,600 So, again, only people who haven't seen before, 1134 00:43:17,600 --> 00:43:19,295 so how does this game work? 1135 00:43:19,295 --> 00:43:21,170 So probably everybody noticed, and it's maybe 1136 00:43:21,170 --> 00:43:22,720 so obvious you didn't even mention it, 1137 00:43:22,720 --> 00:43:24,050 but every time he hits a platform, 1138 00:43:24,050 --> 00:43:24,970 there's a beep, right? 1139 00:43:24,970 --> 00:43:26,120 And the platform turns blue. 1140 00:43:26,120 --> 00:43:27,140 Did everybody notice that? 1141 00:43:27,140 --> 00:43:27,640 Right. 1142 00:43:27,640 --> 00:43:30,710 So it only takes like one or two of those, maybe even just one. 1143 00:43:30,710 --> 00:43:33,350 Like, beep, beep, woop, woop, and you get that right away. 1144 00:43:33,350 --> 00:43:34,730 That's an important causal thing. 1145 00:43:34,730 --> 00:43:37,700 And it just happened that this guy is so good, 1146 00:43:37,700 --> 00:43:38,940 and he starts right away. 1147 00:43:38,940 --> 00:43:40,790 So he goes, ba, ba ba, ba, ba, and he's 1148 00:43:40,790 --> 00:43:42,200 doing it about once a second. 1149 00:43:42,200 --> 00:43:43,970 And so there's an illusory correlation. 1150 00:43:43,970 --> 00:43:46,410 And the same part of your brain that figures out 1151 00:43:46,410 --> 00:43:49,521 the actually important and true causal thing going on, 1152 00:43:49,521 --> 00:43:51,020 the first thing I mentioned, figures 1153 00:43:51,020 --> 00:43:53,060 out this other thing, which is just a slight illusion. 1154 00:43:53,060 --> 00:43:54,500 But if you started playing it yourself, 1155 00:43:54,500 --> 00:43:56,480 you would quickly notice that that wasn't true, right? 1156 00:43:56,480 --> 00:43:57,827 Because you'd start off there. 
1157 00:43:57,827 --> 00:43:59,910 Maybe you would have thought of that for a minute. 1158 00:43:59,910 --> 00:44:01,090 But then you'd start off playing. 1159 00:44:01,090 --> 00:44:03,170 And very quickly, you'd see you're sitting there 1160 00:44:03,170 --> 00:44:03,950 trying to decide what to do. 1161 00:44:03,950 --> 00:44:05,783 Because you're not as expert as this person. 1162 00:44:05,783 --> 00:44:08,107 And the temperature's going down anyway. 1163 00:44:08,107 --> 00:44:10,190 So, again, you would figure that out very quickly. 1164 00:44:10,190 --> 00:44:11,240 What else is going on in this game? 1165 00:44:11,240 --> 00:44:12,936 AUDIENCE: He has to build an igloo. 1166 00:44:12,936 --> 00:44:15,316 JOSH TENENBAUM: He has to build an igloo, yeah. 1167 00:44:15,316 --> 00:44:16,440 How does he build an igloo? 1168 00:44:16,440 --> 00:44:18,300 AUDIENCE: Just by [INAUDIBLE]. 1169 00:44:18,300 --> 00:44:18,510 JOSH TENENBAUM: Right. 1170 00:44:18,510 --> 00:44:20,260 Every time he hits one of those platforms, 1171 00:44:20,260 --> 00:44:21,820 a brick comes into play. 1172 00:44:21,820 --> 00:44:26,160 And then what, when you say he has to build an igloo? 1173 00:44:26,160 --> 00:44:28,395 AUDIENCE: [INAUDIBLE] 1174 00:44:28,395 --> 00:44:29,270 JOSH TENENBAUM: Yeah. 1175 00:44:29,270 --> 00:44:30,938 And then what happens? 1176 00:44:30,938 --> 00:44:31,906 AUDIENCE: [INAUDIBLE] 1177 00:44:31,906 --> 00:44:33,842 JOSH TENENBAUM: What, sir? 1178 00:44:33,842 --> 00:44:38,200 AUDIENCE: [INAUDIBLE] 1179 00:44:38,200 --> 00:44:40,070 JOSH TENENBAUM: Right. 1180 00:44:40,070 --> 00:44:41,380 He goes in. 1181 00:44:41,380 --> 00:44:44,320 The level ends, he gets some score for. 1182 00:44:44,320 --> 00:44:45,360 What about these things? 1183 00:44:45,360 --> 00:44:48,430 What are these, those little dust on the screen? 1184 00:44:48,430 --> 00:44:50,125 AUDIENCE: Avoid them. 1185 00:44:50,125 --> 00:44:51,250 JOSH TENENBAUM: Avoid them. 1186 00:44:51,250 --> 00:44:51,749 Yeah. 1187 00:44:51,749 --> 00:44:53,537 How do you know? 1188 00:44:53,537 --> 00:44:55,328 AUDIENCE: He doesn't actually [INAUDIBLE].. 1189 00:44:55,328 --> 00:44:56,240 AUDIENCE: We haven't seen an example. 1190 00:44:56,240 --> 00:44:57,260 JOSH TENENBAUM: Yeah. 1191 00:44:57,260 --> 00:44:58,470 Well, an example of what? 1192 00:44:58,470 --> 00:45:00,010 We don't know what's going to happen if he hits one. 1193 00:45:00,010 --> 00:45:00,290 AUDIENCE: We assume [INAUDIBLE]. 1194 00:45:00,290 --> 00:45:02,250 JOSH TENENBAUM: But somehow, we assume-- well, 1195 00:45:02,250 --> 00:45:03,249 it's just an assumption. 1196 00:45:03,249 --> 00:45:07,224 I think we very reasonably infer that there's something bad 1197 00:45:07,224 --> 00:45:08,390 will happen if he hits them. 1198 00:45:08,390 --> 00:45:10,431 Now, do you remember of some of the other objects 1199 00:45:10,431 --> 00:45:11,840 that we saw on the second screen? 1200 00:45:11,840 --> 00:45:13,460 There were these fish, yeah. 1201 00:45:13,460 --> 00:45:16,106 What happens if he hits those? 1202 00:45:16,106 --> 00:45:17,510 AUDIENCE: He gets more points 1203 00:45:17,510 --> 00:45:18,410 JOSH TENENBAUM: He gets points, yeah. 1204 00:45:18,410 --> 00:45:20,490 And he went out of his way to actually get them. 1205 00:45:20,490 --> 00:45:20,990 OK. 1206 00:45:20,990 --> 00:45:24,132 So you basically figured it out, right? 
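Just to make the contrast vivid, the kind of thing the audience extracted in that minute might be written down, very roughly, as a little symbolic world model: objects, causal rules, goals, and subgoals. The sketch below is invented notation, not a real system; the rules are simply the ones called out in the exchange above.

```python
# A sketch (invented notation) of the structured model a person seems to build
# from a minute of watching Frostbite, which can then support planning.
world_model = {
    "objects": ["player", "ice_platform", "bird", "fish", "igloo"],
    "causal_rules": [
        {"if": ("player", "lands_on", "white_platform"),
         "then": ["platform_turns_blue", "beep", "igloo_gains_brick"]},
        {"if": ("player", "touches", "bird"),
         "then": ["something_bad"]},          # inferred, never actually observed
        {"if": ("player", "touches", "fish"),
         "then": ["gain_points"]},
        {"if": ("igloo", "is", "complete"),
         "then": ["can_enter_igloo", "level_ends", "gain_points"]},
    ],
    "goal": "finish_level",
    "subgoals": ["turn_all_platforms_blue", "complete_igloo", "enter_igloo"],
}

def useful_actions(model):
    """Toy 'planning': pick out the events worth causing, given the rules."""
    good = {"igloo_gains_brick", "gain_points", "can_enter_igloo", "level_ends"}
    return [rule["if"] for rule in model["causal_rules"]
            if good & set(rule["then"])]

print(useful_actions(world_model))
```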
1207 00:45:24,132 --> 00:45:26,090 It only took you really literally just a minute 1208 00:45:26,090 --> 00:45:29,789 of watching this game to figure out a lot. 1209 00:45:29,789 --> 00:45:31,580 Now, if you actually went to go and play it 1210 00:45:31,580 --> 00:45:33,170 after a minute of experience, you 1211 00:45:33,170 --> 00:45:34,460 wouldn't be that good, right? 1212 00:45:34,460 --> 00:45:37,430 It turns out that it's hard to coordinate all these moves. 1213 00:45:37,430 --> 00:45:40,760 But you would be kind of excited and frustrated, 1214 00:45:40,760 --> 00:45:43,640 which is the experience of a good video game, right? 1215 00:45:43,640 --> 00:45:45,916 Anybody remember the Flappy Bird phenomenon? 1216 00:45:45,916 --> 00:45:46,630 AUDIENCE: Yeah 1217 00:45:46,630 --> 00:45:47,000 JOSH TENENBAUM: Right. 1218 00:45:47,000 --> 00:45:48,920 This was this, like, sensation, this game that 1219 00:45:48,920 --> 00:45:50,300 was like the stupidest game. 1220 00:45:50,300 --> 00:45:52,190 I mean, it seemed like it should be trivial, 1221 00:45:52,190 --> 00:45:53,240 and yet it was really hard. 1222 00:45:53,240 --> 00:45:54,410 But, again, you just watch it for a second, 1223 00:45:54,410 --> 00:45:55,370 you know exactly what you're supposed to do. 1224 00:45:55,370 --> 00:45:56,953 You think you can do it, but it's just 1225 00:45:56,953 --> 00:45:59,010 hard to get the rhythms down for most people. 1226 00:45:59,010 --> 00:46:00,200 And certainly, this game is a little bit hard 1227 00:46:00,200 --> 00:46:01,310 to time the rhythms. 1228 00:46:01,310 --> 00:46:03,350 But what you do when you play this game is 1229 00:46:03,350 --> 00:46:05,690 you get, from one minute, you build that whole model 1230 00:46:05,690 --> 00:46:08,330 of the world, the causal relations, the goals, 1231 00:46:08,330 --> 00:46:09,860 the subgoals. 1232 00:46:09,860 --> 00:46:11,810 And you can formulate clearly what 1233 00:46:11,810 --> 00:46:13,460 are the right kinds of plans. 1234 00:46:13,460 --> 00:46:15,610 But to actually implement them in real time, 1235 00:46:15,610 --> 00:46:18,050 but without getting killed is a little bit harder. 1236 00:46:18,050 --> 00:46:20,030 And you could say that, you know, 1237 00:46:20,030 --> 00:46:21,570 when the child is learning to walk 1238 00:46:21,570 --> 00:46:23,278 there's a similar kind of thing going on, 1239 00:46:23,278 --> 00:46:26,150 except usually without the danger of getting killed, just 1240 00:46:26,150 --> 00:46:27,740 danger falling over a little bit. 1241 00:46:27,740 --> 00:46:29,540 OK. 1242 00:46:29,540 --> 00:46:31,366 Contrast that learning dynamics-- which, 1243 00:46:31,366 --> 00:46:32,990 again, I'm just describing anecdotally. 
1244 00:46:32,990 --> 00:46:35,120 One of the things we'd like to do actually as one 1245 00:46:35,120 --> 00:46:37,100 of our center activities and it's a possible project 1246 00:46:37,100 --> 00:46:39,350 for students, either in our center or some of you guys 1247 00:46:39,350 --> 00:46:41,550 if you're interested-- it's a big possible project-- 1248 00:46:41,550 --> 00:46:43,300 is to actually measure this, like actually 1249 00:46:43,300 --> 00:46:46,290 study what do people learn from just a minute or two 1250 00:46:46,290 --> 00:46:48,890 or very, very quick learning experience 1251 00:46:48,890 --> 00:46:51,710 with these kinds of games, whether they're 1252 00:46:51,710 --> 00:46:53,377 adults like us who've played other games 1253 00:46:53,377 --> 00:46:54,835 or even young children who've never 1254 00:46:54,835 --> 00:46:56,030 played a video game before. 1255 00:46:56,030 --> 00:46:59,157 But I think what we will find is the kind of learning dynamic 1256 00:46:59,157 --> 00:46:59,990 that I'm describing. 1257 00:46:59,990 --> 00:47:01,323 It will be tricky to measure it. 1258 00:47:01,323 --> 00:47:03,197 But I'm sure we can. 1259 00:47:03,197 --> 00:47:05,780 And it'll be very different from the kind of learning dynamics 1260 00:47:05,780 --> 00:47:08,170 that you get from these deep reinforcement networks. 1261 00:47:08,170 --> 00:47:11,074 Here, this is an example of their learning curves 1262 00:47:11,074 --> 00:47:12,740 which comes not from the DeepMind paper, 1263 00:47:12,740 --> 00:47:14,864 but from some slightly more recent work from Pieter 1264 00:47:14,864 --> 00:47:16,687 Abbeel's group which basically builds 1265 00:47:16,687 --> 00:47:18,770 on the same architecture, but shows how to improve 1266 00:47:18,770 --> 00:47:22,400 the exploration part of it in order to improve dramatically 1267 00:47:22,400 --> 00:47:24,470 on some games, including this Frostbite game. 1268 00:47:24,470 --> 00:47:28,440 So this is the learning curve for this game you just saw. 1269 00:47:28,440 --> 00:47:32,180 The black dashed line is the DeepMind system 1270 00:47:32,180 --> 00:47:33,140 from the Nature paper. 1271 00:47:33,140 --> 00:47:35,390 And they will tell you that their current system 1272 00:47:35,390 --> 00:47:36,020 is much better. 1273 00:47:36,020 --> 00:47:37,730 So I don't know how much better. 1274 00:47:37,730 --> 00:47:41,930 But, anyway, just to be fair, right? 1275 00:47:41,930 --> 00:47:46,520 And, again, I'm essentially criticizing these approaches 1276 00:47:46,520 --> 00:47:48,210 saying, from a human point of view, 1277 00:47:48,210 --> 00:47:48,990 they're very different from humans. 1278 00:47:48,990 --> 00:47:51,950 That's not to take away from the really impressive engineering 1279 00:47:51,950 --> 00:47:53,952 in AI, machine learning accomplishments 1280 00:47:53,952 --> 00:47:55,160 that these systems are doing. 1281 00:47:55,160 --> 00:47:56,831 I think they are really interesting. 1282 00:47:56,831 --> 00:47:57,830 They're really valuable. 1283 00:47:57,830 --> 00:48:02,240 They have scientific value as well as engineering value. 1284 00:48:02,240 --> 00:48:05,450 I just want to draw the contrast between what they're doing 1285 00:48:05,450 --> 00:48:07,910 and some other really important scientific and engineering 1286 00:48:07,910 --> 00:48:09,493 questions that are the ones that we're 1287 00:48:09,493 --> 00:48:12,330 trying to talk about here. 1288 00:48:12,330 --> 00:48:14,692 So the DeepMind system is the black dashed line. 
1289 00:48:14,692 --> 00:48:17,150 And then the red and blue curves are two different versions 1290 00:48:17,150 --> 00:48:19,622 of the system from Pieter Abbeel's group, which 1291 00:48:19,622 --> 00:48:21,080 is basically the same architecture, 1292 00:48:21,080 --> 00:48:23,240 but it just explores a little bit better. 1293 00:48:23,240 --> 00:48:26,960 And you can see that the x-axis is the amount of experience. 1294 00:48:26,960 --> 00:48:28,020 It's in training epochs. 1295 00:48:28,020 --> 00:48:29,540 But I think, if I understand correctly, 1296 00:48:29,540 --> 00:48:30,740 it's roughly proportional to like 1297 00:48:30,740 --> 00:48:31,948 hours of gameplay experience. 1298 00:48:31,948 --> 00:48:34,850 So 100 is like 100 hours. 1299 00:48:34,850 --> 00:48:37,580 At the end, the DeepQ network in the Nature paper 1300 00:48:37,580 --> 00:48:38,550 trained up for 1,000. 1301 00:48:38,550 --> 00:48:41,150 And you're showing there the asymptote. 1302 00:48:41,150 --> 00:48:43,550 That's the horizontal dashed line. 1303 00:48:43,550 --> 00:48:47,250 And then this line here is what it does 1304 00:48:47,250 --> 00:48:48,860 after about 100 iterations. 1305 00:48:48,860 --> 00:48:50,960 And you can see it's basically asymptoted 1306 00:48:50,960 --> 00:48:53,880 in that after 10 times as much, there's a time lapse here, 1307 00:48:53,880 --> 00:48:54,380 right? 1308 00:48:54,380 --> 00:48:55,640 10 times as much, it gets up to about there. 1309 00:48:55,640 --> 00:48:56,300 OK. 1310 00:48:56,300 --> 00:48:59,810 And impressively, Abbeel's group system does much better. 1311 00:48:59,810 --> 00:49:01,970 After only 100 hours, it's already 1312 00:49:01,970 --> 00:49:04,190 twice as good as that system. 1313 00:49:04,190 --> 00:49:06,680 But, again, contrast this with humans, both 1314 00:49:06,680 --> 00:49:09,050 what a human would do and also where 1315 00:49:09,050 --> 00:49:10,820 the human knowledge is, right? 1316 00:49:14,260 --> 00:49:15,860 I mean, the human game player that you 1317 00:49:15,860 --> 00:49:20,060 saw in here, by the time it's finished the first screen, 1318 00:49:20,060 --> 00:49:24,290 is already like up here, so after about a minute of play. 1319 00:49:24,290 --> 00:49:27,290 Now, again, you wouldn't be able to be that good after a minute. 1320 00:49:27,290 --> 00:49:29,750 But essentially, the difference between these systems 1321 00:49:29,750 --> 00:49:34,130 is that the DeepQ network never gets past the first screen even 1322 00:49:34,130 --> 00:49:35,480 with 1,000 hours. 1323 00:49:35,480 --> 00:49:38,390 And this other one gets past the first screen in 100 hours, 1324 00:49:38,390 --> 00:49:40,760 kind of gets to about the second screen. 1325 00:49:40,760 --> 00:49:42,800 It's sort of midway through the second screen. 1326 00:49:42,800 --> 00:49:44,480 In this domain, it's really interesting 1327 00:49:44,480 --> 00:49:47,862 to think about not what happens scientifically. 1328 00:49:47,862 --> 00:49:49,820 It's really interesting to think about not what 1329 00:49:49,820 --> 00:49:52,644 happens when you had 1,000 hours of experience 1330 00:49:52,644 --> 00:49:55,060 with no prior knowledge, because humans just don't do that 1331 00:49:55,060 --> 00:49:56,740 on this or really any other task that we 1332 00:49:56,740 --> 00:49:57,890 can study experimentally. 1333 00:49:57,890 --> 00:50:00,820 But you can study what humans do in the first minute, which is 1334 00:50:00,820 --> 00:50:02,742 just this blip like right here. 
1335 00:50:02,742 --> 00:50:05,200 I think if we could get the right learning curve, you know, 1336 00:50:05,200 --> 00:50:07,283 what you'd see is that humans are going like this. 1337 00:50:07,283 --> 00:50:09,830 And they may asymptote well before any of these systems do. 1338 00:50:09,830 --> 00:50:11,710 But the interesting human learning part 1339 00:50:11,710 --> 00:50:15,090 is what's going on in the first minute, more or less 1340 00:50:15,090 --> 00:50:19,090 or the first hour, with all of the knowledge that you bring 1341 00:50:19,090 --> 00:50:21,579 to this task as well as how did you build up 1342 00:50:21,579 --> 00:50:22,370 all that knowledge. 1343 00:50:22,370 --> 00:50:24,190 So you want to talk about learning to learn 1344 00:50:24,190 --> 00:50:26,440 and multiple task learning, so that's all there, too. 1345 00:50:26,440 --> 00:50:27,760 I'm just saying in this one game that's 1346 00:50:27,760 --> 00:50:29,230 what you can study I think, or that's 1347 00:50:29,230 --> 00:50:31,510 where the heart of the matter is of human intelligence 1348 00:50:31,510 --> 00:50:32,996 in this setting. 1349 00:50:32,996 --> 00:50:34,370 And I think we should study that. 1350 00:50:34,370 --> 00:50:35,440 So, you know, what I've been trying 1351 00:50:35,440 --> 00:50:37,940 to do here for the last hour is motivate the kinds of things 1352 00:50:37,940 --> 00:50:39,700 we should study if we want to understand 1353 00:50:39,700 --> 00:50:42,790 the aspect of intelligence that we could call explaining, 1354 00:50:42,790 --> 00:50:45,710 understanding, the heart of building causal 1355 00:50:45,710 --> 00:50:46,550 models of the world. 1356 00:50:46,550 --> 00:50:47,380 We can do it. 1357 00:50:47,380 --> 00:50:49,360 But we have to do it a little bit differently. 1358 00:50:49,360 --> 00:50:52,240 In a flash, that's the first problem, I started with. 1359 00:50:52,240 --> 00:50:55,390 How do we learn a generalizable concept from just one example? 1360 00:50:55,390 --> 00:50:56,920 How can we discover causal relations 1361 00:50:56,920 --> 00:50:59,260 from just a single observed event, like that, you know, 1362 00:50:59,260 --> 00:51:02,710 jumping on the block and the beep and so on, 1363 00:51:02,710 --> 00:51:05,720 which sometimes can go wrong like any other perceptual 1364 00:51:05,720 --> 00:51:06,220 process? 1365 00:51:06,220 --> 00:51:07,030 You can have illusions. 1366 00:51:07,030 --> 00:51:08,980 You can see an accident that isn't quite right. 1367 00:51:08,980 --> 00:51:11,521 And then you move your head, and you see something different. 1368 00:51:11,521 --> 00:51:14,095 Or you go into the game, and you realize that it's not just 1369 00:51:14,095 --> 00:51:16,220 touching blocks that makes the temperature go down, 1370 00:51:16,220 --> 00:51:17,950 but it's just time. 1371 00:51:17,950 --> 00:51:21,135 How do we see forces, physics, and see inside of other minds 1372 00:51:21,135 --> 00:51:22,510 even if they're just a few shapes 1373 00:51:22,510 --> 00:51:24,260 moving around in two dimensions? 1374 00:51:24,260 --> 00:51:27,670 How do we learn to play games and act in a whole new world 1375 00:51:27,670 --> 00:51:30,735 in just under a minute, right? 
1376 00:51:30,735 --> 00:51:32,110 And then there's all the problems 1377 00:51:32,110 --> 00:51:34,151 of language, which I'm not going to go into, 1378 00:51:34,151 --> 00:51:35,650 like understanding what we're saying 1379 00:51:35,650 --> 00:51:36,899 and what you're reading here-- 1380 00:51:36,899 --> 00:51:38,290 also, versions of these problems. 1381 00:51:38,290 --> 00:51:41,200 And our goal in our field is to understand this in engineering 1382 00:51:41,200 --> 00:51:43,090 terms, to have a computational framework that 1383 00:51:43,090 --> 00:51:45,790 explains how this is even possible and, in particular, 1384 00:51:45,790 --> 00:51:47,450 then how people do it. 1385 00:51:47,450 --> 00:51:48,650 OK. 1386 00:51:48,650 --> 00:51:51,901 Now, you know, in some sense cognitive scientists 1387 00:51:51,901 --> 00:51:54,400 and researchers, we're not the first people to work on this. 1388 00:51:54,400 --> 00:51:56,800 Philosophers have talked about this kind of thing 1389 00:51:56,800 --> 00:51:59,455 for thousands of years in the Western tradition. 1390 00:51:59,455 --> 00:52:02,375 It's a version of the problem of induction, the problem of how 1391 00:52:02,375 --> 00:52:04,250 do you know the sun is going to rise tomorrow 1392 00:52:04,250 --> 00:52:07,210 or just generalizing from experience. 1393 00:52:07,210 --> 00:52:09,400 And for as long as people have studied this problem, 1394 00:52:09,400 --> 00:52:11,770 the answer has always been clear in some form 1395 00:52:11,770 --> 00:52:13,840 that, again, it has to be about the knowledge 1396 00:52:13,840 --> 00:52:16,420 that you bring to the situation that gives you 1397 00:52:16,420 --> 00:52:18,460 the constraints that allows you to fill in 1398 00:52:18,460 --> 00:52:20,300 from this very sparse data. 1399 00:52:20,300 --> 00:52:24,354 But, again, if you're dissatisfied with that 1400 00:52:24,354 --> 00:52:26,020 is the answer, of course, you should be. 1401 00:52:26,020 --> 00:52:26,920 That's not really the answer. 1402 00:52:26,920 --> 00:52:28,670 That just raises the real problems, right? 1403 00:52:28,670 --> 00:52:30,220 And these are the problems that I 1404 00:52:30,220 --> 00:52:32,350 want to try to address in the more 1405 00:52:32,350 --> 00:52:37,230 substantive part of the morning, which is these questions here. 1406 00:52:37,230 --> 00:52:40,380 So how do you actually use knowledge 1407 00:52:40,380 --> 00:52:41,850 to guide learning from sparse data? 1408 00:52:41,850 --> 00:52:43,400 What form does it take? 1409 00:52:43,400 --> 00:52:44,834 How can we describe the knowledge? 1410 00:52:44,834 --> 00:52:46,500 And how can we explain how it's learned? 1411 00:52:46,500 --> 00:52:49,230 How is that knowledge itself constructed 1412 00:52:49,230 --> 00:52:51,150 from other kinds of experiences you 1413 00:52:51,150 --> 00:52:54,660 have combined with whatever, you know, your genes have set up 1414 00:52:54,660 --> 00:52:55,710 for you? 1415 00:52:55,710 --> 00:52:58,140 And I'm going to be talking about this approach. 1416 00:52:58,140 --> 00:52:59,925 And you know, again, really think 1417 00:52:59,925 --> 00:53:01,800 of this as the introduction to the whole day. 1418 00:53:01,800 --> 00:53:03,674 Because you're going to see a couple of hours 1419 00:53:03,674 --> 00:53:06,900 from me and then also from Tomer more hands on in the afternoon. 1420 00:53:06,900 --> 00:53:08,012 This is our approach. 1421 00:53:08,012 --> 00:53:09,720 You can give it different kinds of names. 
1422 00:53:09,720 --> 00:53:11,303 I guess I called it generative models, 1423 00:53:11,303 --> 00:53:13,830 because that's what Tommy likes to call it in CBMM. 1424 00:53:13,830 --> 00:53:14,620 And that's fine. 1425 00:53:14,620 --> 00:53:16,320 Like any other approach, you know, 1426 00:53:16,320 --> 00:53:19,290 there's no one word that captures what it's about. 1427 00:53:19,290 --> 00:53:20,910 But these are the key ideas that we're 1428 00:53:20,910 --> 00:53:23,100 going to be talking about. 1429 00:53:23,100 --> 00:53:25,620 We're going to talk a lot about generative models 1430 00:53:25,620 --> 00:53:26,670 in a probabilistic sense. 1431 00:53:26,670 --> 00:53:29,250 So what it means to have a generative model 1432 00:53:29,250 --> 00:53:34,620 is to be able to describe the joint distribution in some form 1433 00:53:34,620 --> 00:53:37,750 over your observable data together with some kind of latent variables, 1434 00:53:37,750 --> 00:53:38,250 right? 1435 00:53:38,250 --> 00:53:39,770 And then you can do probabilistic inference 1436 00:53:39,770 --> 00:53:41,370 or Bayesian inference, which means 1437 00:53:41,370 --> 00:53:43,504 conditioning on some of the outputs 1438 00:53:43,504 --> 00:53:45,420 of that generative model and making inferences 1439 00:53:45,420 --> 00:53:47,580 about the latent structure, the hidden variables, 1440 00:53:47,580 --> 00:53:48,900 as well as the other things. 1441 00:53:48,900 --> 00:53:52,500 But crucially, there's lots of probabilistic models, 1442 00:53:52,500 --> 00:53:55,050 but these ones have very particular kinds of structures, 1443 00:53:55,050 --> 00:53:55,550 right? 1444 00:53:55,550 --> 00:53:57,930 So the probabilities are not just defined 1445 00:53:57,930 --> 00:53:59,040 in statisticians' terms. 1446 00:53:59,040 --> 00:54:01,710 But they're defined on some kind of interestingly structured 1447 00:54:01,710 --> 00:54:03,780 representation that can actually capture 1448 00:54:03,780 --> 00:54:06,347 the causal and compositional things we're talking about, 1449 00:54:06,347 --> 00:54:08,430 that can capture the causal structure of the world 1450 00:54:08,430 --> 00:54:12,660 in a composable way that can support the kind of flexibility 1451 00:54:12,660 --> 00:54:14,800 of learning and planning that we're talking about. 1452 00:54:14,800 --> 00:54:18,030 So a key part of how you do this sort of work 1453 00:54:18,030 --> 00:54:20,940 is to understand how to build probabilistic models 1454 00:54:20,940 --> 00:54:25,050 and do inference over various kinds of richly structured 1455 00:54:25,050 --> 00:54:26,502 symbolic representations. 1456 00:54:26,502 --> 00:54:27,960 And this is the sort of thing which 1457 00:54:27,960 --> 00:54:30,720 is a fairly new technical advance, right? 1458 00:54:30,720 --> 00:54:33,300 If you look in the history of AI as well as 1459 00:54:33,300 --> 00:54:34,971 in cognitive science, there's been 1460 00:54:34,971 --> 00:54:37,470 a lot of back and forth between people emphasizing these two 1461 00:54:37,470 --> 00:54:40,380 big ideas, the ideas of statistics and symbols 1462 00:54:40,380 --> 00:54:41,462 if you like, right? 1463 00:54:41,462 --> 00:54:43,170 And there's a long history of people sort 1464 00:54:43,170 --> 00:54:45,820 of saying one of these is going to explain everything 1465 00:54:45,820 --> 00:54:48,000 and the other one is not going to explain very much 1466 00:54:48,000 --> 00:54:50,220 or isn't even real, right?
1467 00:54:50,220 --> 00:54:52,819 For example, some of the debates between Chomsky, in language 1468 00:54:52,819 --> 00:54:55,110 and cognitive science, and the people who came before him 1469 00:54:55,110 --> 00:54:57,870 and the people who came after him had this character, right? 1470 00:54:57,870 --> 00:54:59,880 Or some of the debates in AI in the first wave 1471 00:54:59,880 --> 00:55:03,560 of neural networks, people like Minsky, for example, 1472 00:55:03,560 --> 00:55:06,900 and some of the neural network people 1473 00:55:06,900 --> 00:55:09,090 like Jay McClelland initially-- 1474 00:55:09,090 --> 00:55:13,601 I mean, I'm mixing up chronology there. 1475 00:55:13,601 --> 00:55:14,100 I'm sorry. 1476 00:55:14,100 --> 00:55:16,920 But you know, you see this every time whether it's 1477 00:55:16,920 --> 00:55:18,630 in the '60s or the '80s or now. 1478 00:55:18,630 --> 00:55:23,040 You know, there's a discourse in our field, which 1479 00:55:23,040 --> 00:55:24,290 is a really interesting one. 1480 00:55:24,290 --> 00:55:26,389 I think, ultimately, we have to go beyond it. 1481 00:55:26,389 --> 00:55:27,930 And what's so exciting is that we are 1482 00:55:27,930 --> 00:55:29,500 starting to go beyond it. 1483 00:55:29,500 --> 00:55:31,892 But there's been this discourse of people really saying, 1484 00:55:31,892 --> 00:55:33,600 you know, the heart of human intelligence 1485 00:55:33,600 --> 00:55:35,814 is some kind of rich symbolic structures. 1486 00:55:35,814 --> 00:55:37,980 Oh, and there's some other people who said something 1487 00:55:37,980 --> 00:55:38,970 about statistics. 1488 00:55:38,970 --> 00:55:42,411 But that's like trivial or uninteresting or never going 1489 00:55:42,411 --> 00:55:42,910 to amount to anything. 1490 00:55:42,910 --> 00:55:45,160 And then some other people often responding 1491 00:55:45,160 --> 00:55:47,160 to those first people-- it's very much a back 1492 00:55:47,160 --> 00:55:49,870 and forth debate. 1493 00:55:49,870 --> 00:55:53,590 It gets very acrimonious and emotional saying, you know, 1494 00:55:53,590 --> 00:55:55,680 no, those symbols are magical, mysterious things, 1495 00:55:55,680 --> 00:55:59,580 completely ridiculous, totally useless, never worked. 1496 00:55:59,580 --> 00:56:03,240 It's really all about statistics. 1497 00:56:03,240 --> 00:56:05,370 And somehow something kind of maybe like symbols 1498 00:56:05,370 --> 00:56:06,630 will emerge from those. 1499 00:56:06,630 --> 00:56:08,760 And I think we as a field are learning 1500 00:56:08,760 --> 00:56:10,710 that neither of those extreme views 1501 00:56:10,710 --> 00:56:12,820 is going to get us anywhere really quite honestly 1502 00:56:12,820 --> 00:56:14,910 and that we have to understand-- among other things. 1503 00:56:14,910 --> 00:56:16,110 It's not the only thing we have to understand. 1504 00:56:16,110 --> 00:56:17,651 But a big thing we have to understand 1505 00:56:17,651 --> 00:56:19,200 and are starting to understand is 1506 00:56:19,200 --> 00:56:22,020 how to do probabilistic inference over richly 1507 00:56:22,020 --> 00:56:23,910 structured symbolic objects.
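To give a tiny, concrete picture of what probabilistic inference over a richly structured symbolic object can mean, here is a minimal sketch: a toy probabilistic grammar defines a distribution over parse trees and the word strings they yield, and conditioning on the observed beginning of a sentence gives a posterior over the full symbolic structures that could have produced it. The grammar and the probabilities are invented for illustration.

```python
# A minimal sketch: a probability distribution over symbolic structures (parse
# trees), plus Bayesian conditioning by enumeration. Grammar is invented.
from itertools import product

RULES = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("dogs",), 0.5), (("cats",), 0.5)],
    "VP": [(("V",), 0.4), (("V", "NP"), 0.6)],
    "V":  [(("chase",), 0.5), (("sleep",), 0.5)],
}

def expand(symbol):
    """Enumerate (tree, words, probability) for everything this symbol can generate."""
    if symbol not in RULES:                      # terminal word
        yield symbol, [symbol], 1.0
        return
    for rhs, p_rule in RULES[symbol]:
        child_options = [list(expand(child)) for child in rhs]
        for combo in product(*child_options):    # every way of expanding the children
            tree = (symbol, [c[0] for c in combo])
            words = [w for c in combo for w in c[1]]
            prob = p_rule
            for c in combo:
                prob *= c[2]
            yield tree, words, prob

def posterior_given_prefix(prefix):
    """Condition on the observed first words; infer the full symbolic structure."""
    consistent = [(tree, words, p) for tree, words, p in expand("S")
                  if words[:len(prefix)] == prefix]
    z = sum(p for _, _, p in consistent)
    return [(words, tree, p / z) for tree, words, p in consistent]

for words, tree, p in posterior_given_prefix(["dogs"]):
    print(round(p, 3), words, tree)
```

In a real probabilistic programming system the brute-force enumeration would be replaced by more general inference, but the shape of the computation is the same: condition on what you observe, infer the latent symbolic structure.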
1508 00:56:23,910 --> 00:56:27,150 And that means both using interesting symbolic structures 1509 00:56:27,150 --> 00:56:29,460 to define the priors for probabilistic inference, 1510 00:56:29,460 --> 00:56:32,400 but also-- and this moves more into the third topic-- 1511 00:56:32,400 --> 00:56:34,710 being able to think about learning 1512 00:56:34,710 --> 00:56:37,140 interesting symbolic representations as a kind 1513 00:56:37,140 --> 00:56:38,430 of probabilistic inference. 1514 00:56:38,430 --> 00:56:41,727 And to do that, we need to combine statistics and symbols 1515 00:56:41,727 --> 00:56:43,560 with some kind of notion of what's sometimes 1516 00:56:43,560 --> 00:56:45,730 called hierarchical probabilistic models. 1517 00:56:45,730 --> 00:56:48,750 Or it's a certain kind of recursive generative model 1518 00:56:48,750 --> 00:56:51,345 where you don't just have a generative model that 1519 00:56:51,345 --> 00:56:53,220 has some latent variables which then generate 1520 00:56:53,220 --> 00:56:54,540 your observable experience, but where 1521 00:56:54,540 --> 00:56:56,010 you have hierarchies of these things-- 1522 00:56:56,010 --> 00:56:58,260 so generative models for generative models or priors 1523 00:56:58,260 --> 00:56:58,920 on priors. 1524 00:56:58,920 --> 00:57:01,420 If you've heard of hierarchical Bayes or hierarchical models 1525 00:57:01,420 --> 00:57:03,414 and statistics, it's a version of the idea. 1526 00:57:03,414 --> 00:57:05,580 But it's sort of a more general version of that idea 1527 00:57:05,580 --> 00:57:08,940 where the hypothesis space and priors for Bayesian 1528 00:57:08,940 --> 00:57:11,220 inference that, you know, you see in the simplest 1529 00:57:11,220 --> 00:57:13,290 version of Bayes' rule, are not considered 1530 00:57:13,290 --> 00:57:17,960 to be just some fixed thing that you write down and wire up 1531 00:57:17,960 --> 00:57:18,720 and that's it. 1532 00:57:18,720 --> 00:57:20,220 But rather, they themselves could 1533 00:57:20,220 --> 00:57:23,036 be generated by some higher level or more abstract 1534 00:57:23,036 --> 00:57:24,660 probabilistic model, a hypothesis space 1535 00:57:24,660 --> 00:57:26,610 of hypothesis spaces, or priors on priors, 1536 00:57:26,610 --> 00:57:29,410 or a generative model for generative models. 1537 00:57:29,410 --> 00:57:31,620 And, again, there's a long history of that idea. 1538 00:57:31,620 --> 00:57:33,990 So, for example, some really interesting early work 1539 00:57:33,990 --> 00:57:36,330 on grammar induction in the 1960s 1540 00:57:36,330 --> 00:57:38,300 introduced something called grammar grammar, 1541 00:57:38,300 --> 00:57:40,880 where it used the grammar, a formal grammar, 1542 00:57:40,880 --> 00:57:45,774 to give a hypothesis base for grammars of languages, right? 1543 00:57:45,774 --> 00:57:47,690 But, again, what we're understanding how to do 1544 00:57:47,690 --> 00:57:51,410 is to combine this notion of a kind of recursive abstraction 1545 00:57:51,410 --> 00:57:52,797 with statistics and symbols. 1546 00:57:52,797 --> 00:57:54,380 And you put all those things together, 1547 00:57:54,380 --> 00:57:56,180 and you get a really powerful tool kit 1548 00:57:56,180 --> 00:57:57,860 for thinking about intelligence. 1549 00:57:57,860 --> 00:58:00,590 There's one other version of this big picture which you'll 1550 00:58:00,590 --> 00:58:03,740 hear about both in the morning and in the afternoon, which 1551 00:58:03,740 --> 00:58:05,880 is this idea of probabilistic programs. 
1552 00:58:05,880 --> 00:58:09,290 So when I would give this kind of tutorial introduction about 1553 00:58:09,290 --> 00:58:10,850 five years ago-- oops, sorry-- 1554 00:58:10,850 --> 00:58:13,190 I would say all of this. 1555 00:58:13,190 --> 00:58:15,590 But one of the really exciting recent developments 1556 00:58:15,590 --> 00:58:17,420 in the last few years is, in a sense, 1557 00:58:17,420 --> 00:58:20,520 a kind of unified language that puts all these things together. 1558 00:58:20,520 --> 00:58:22,490 So we can have a lot fewer words on the slide 1559 00:58:22,490 --> 00:58:24,781 and just say, oh, it's all a big probabilistic program. 1560 00:58:24,781 --> 00:58:27,200 I mean, that's simplifying a lot and leaving 1561 00:58:27,200 --> 00:58:28,610 out a lot of important stuff. 1562 00:58:28,610 --> 00:58:30,830 But the language of probabilistic programs 1563 00:58:30,830 --> 00:58:33,080 that you're going to see in little bits in my talks 1564 00:58:33,080 --> 00:58:35,450 and much more in the tutorial later on 1565 00:58:35,450 --> 00:58:37,760 is powerful partly-- or really 1566 00:58:37,760 --> 00:58:38,960 mainly-- 1567 00:58:38,960 --> 00:58:41,360 because it gives a unifying language and set 1568 00:58:41,360 --> 00:58:42,780 of tools for all of these things, 1569 00:58:42,780 --> 00:58:44,900 including probabilistic models defined 1570 00:58:44,900 --> 00:58:47,580 over all sorts of interesting symbolic structures. 1571 00:58:47,580 --> 00:58:50,540 In fact, any computable model-- any probabilistic model 1572 00:58:50,540 --> 00:58:53,660 defined on any representation that's computable-- 1573 00:58:53,660 --> 00:58:55,950 can be expressed as a probabilistic program. 1574 00:58:55,950 --> 00:58:59,450 It's where Turing-universal computation meets probability. 1575 00:58:59,450 --> 00:59:01,850 And everything about hierarchical models-- 1576 00:59:01,850 --> 00:59:03,890 generative models for generative models, 1577 00:59:03,890 --> 00:59:06,890 priors on priors, hypothesis spaces of hypothesis spaces-- 1578 00:59:06,890 --> 00:59:08,591 can be very naturally expressed in terms 1579 00:59:08,591 --> 00:59:11,090 of probabilistic programs, where basically you have programs 1580 00:59:11,090 --> 00:59:12,600 that generate other programs. 1581 00:59:12,600 --> 00:59:15,024 So if your model is a program, and it's 1582 00:59:15,024 --> 00:59:16,440 a probabilistic generative model-- 1583 00:59:16,440 --> 00:59:17,510 so it's a probabilistic program-- 1584 00:59:17,510 --> 00:59:19,337 and you want to put down a generative model 1585 00:59:19,337 --> 00:59:21,170 for generative models that can make learning 1586 00:59:21,170 --> 00:59:24,170 into inference recursively at higher levels of abstraction, 1587 00:59:24,170 --> 00:59:27,260 you just add a little bit more to the probabilistic program. 1588 00:59:27,260 --> 00:59:30,170 And so it's both a very beautiful and an extremely 1589 00:59:30,170 --> 00:59:33,362 useful model-building tool kit. 1590 00:59:33,362 --> 00:59:34,820 Now, there are a few other ideas that 1591 00:59:34,820 --> 00:59:37,281 go along with these things which I won't talk about.
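[Editor's note: as a toy illustration of the "programs that generate other programs" idea just described, here is a short sketch, again in plain Python rather than any of the probabilistic programming languages covered in the tutorial; every name in it is hypothetical. The generative model first samples a small arithmetic expression-- itself an executable program-- and then runs that program on a few inputs with observation noise, so inferring the program from data is learning treated as inference one level up.]

```python
import random

# A toy "program that generates programs": the generative model samples a
# small arithmetic expression (itself runnable code), then executes it on a
# few inputs with Gaussian observation noise. Purely illustrative; this is
# not the lecture's actual tool kit.

def sample_expression(depth=0):
    # With some probability, return a leaf (the input x or a small constant);
    # otherwise recurse and build a larger expression.
    if depth >= 2 or random.random() < 0.4:
        return "x" if random.random() < 0.5 else str(random.randint(1, 9))
    left = sample_expression(depth + 1)
    right = sample_expression(depth + 1)
    op = random.choice(["+", "*"])
    return "({} {} {})".format(left, op, right)

def run_generative_model():
    # Sample a program (the hypothesis), then generate observations
    # by running it with a little noise added.
    expr = sample_expression()
    program = eval("lambda x: " + expr)
    data = [(x, program(x) + random.gauss(0, 0.1)) for x in range(5)]
    return expr, data

expr, data = run_generative_model()
print("sampled program:", expr)
print("noisy observations:", [(x, round(y, 2)) for x, y in data])
```

[A real probabilistic programming language makes this pattern first-class, but the recursive structure-- a stochastic program whose output is the definition of another stochastic program-- is the core of the idea.]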
1592 00:59:37,281 --> 00:59:38,780 What I'm going to try 1593 00:59:38,780 --> 00:59:39,890 to do for the rest of the morning, 1594 00:59:39,890 --> 00:59:41,431 and what you'll see in the afternoon, 1595 00:59:41,431 --> 00:59:44,840 is just to give you various examples and ways to do things 1596 00:59:44,840 --> 00:59:46,370 with the ideas on these slides. 1597 00:59:46,370 --> 00:59:47,480 Now, there's some other stuff which 1598 00:59:47,480 --> 00:59:48,350 we won't say that much about-- 1599 00:59:48,350 --> 00:59:50,510 although I think Tomer, who just walked in-- hey-- 1600 00:59:50,510 --> 00:59:52,310 will talk a little about MCMC, right? 1601 00:59:52,310 --> 00:59:54,632 And we'll say a little bit about item four, 1602 00:59:54,632 --> 00:59:56,840 because it goes back to these questions I started off 1603 00:59:56,840 --> 00:59:58,412 with, which are also very pressing. 1604 00:59:58,412 --> 00:59:59,870 And they're really interesting ones 1605 00:59:59,870 --> 01:00:02,570 for where neural networks meet up with generative models. 1606 01:00:02,570 --> 01:00:05,660 You know, how can we do inference and learning so fast-- 1607 01:00:05,660 --> 01:00:08,210 not just from few examples-- that's what this stuff is 1608 01:00:08,210 --> 01:00:08,810 about-- 1609 01:00:08,810 --> 01:00:13,670 but very quickly in terms of time? 1610 01:00:13,670 --> 01:00:16,080 So we will say a little bit about that. 1611 01:00:16,080 --> 01:00:20,530 But all of these-- every item, every component of this approach-- 1612 01:00:20,530 --> 01:00:22,280 is a whole research area in and of itself. 1613 01:00:22,280 --> 01:00:24,655 There are people who spend their entire career these days 1614 01:00:24,655 --> 01:00:29,060 focusing on how to make item four work, and other people who 1615 01:00:29,060 --> 01:00:32,870 focus on how to use these kinds of rich probabilistic models 1616 01:00:32,870 --> 01:00:35,480 to guide planning and decision making, 1617 01:00:35,480 --> 01:00:37,090 or how to relate them to the brain. 1618 01:00:37,090 --> 01:00:40,074 Any one of these you could spend more than a career on. 1619 01:00:40,074 --> 01:00:41,990 But what's exciting to us is that with a bunch 1620 01:00:41,990 --> 01:00:44,600 of smart people working on these, and kind of developing 1621 01:00:44,600 --> 01:00:46,990 common languages to link up these questions, 1622 01:00:46,990 --> 01:00:50,570 I think we really are poised to make progress in my lifetime 1623 01:00:50,570 --> 01:00:52,840 and even more in yours.