The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

JOSH TENENBAUM: I'm just going to give a bunch of examples of things that we in our field have done. Most of them are things I've played some role in; maybe it was a thesis project of a student. But they're meant to be representative of a broader set of work that many people have contributed to in developing this toolkit. And we're going to start from the beginning, in a sense, with some very simple things we did to look at ways in which probabilistic generative models can inform people's basic cognitive processes, and then build up to more interesting kinds of symbolically structured models, hierarchical models, and ultimately to these probabilistic programs for common sense.

When I say a lot of people have been doing this, here is just a small sample of those people. Every year or two I try to update this slide, but it's very much historically dated with people I knew when I was in grad school. There's a lot of really great work by younger people whose names haven't appeared on this slide, so those dot-dot-dots are extremely serious, and a lot of the best stuff is not included here. But over the last couple of decades, across basically all the different areas of cognitive science, covering basically all the different things that cognition does, there's been great progress building serious mathematical, reverse-engineering models, in the sense that they are quantitative models of human cognition phrased in the terms of engineering, the same terms you would use to build a robot to do these things, at least in principle. And what's been developing is this toolkit of probabilistic generative models.
I want to start off by telling you a little bit about some work I did together with Tom Griffiths. Tom is now a senior faculty member at Berkeley, one of the leaders in this field, as well as a leading person in machine learning, actually. One of the great things he's done is to take inspiration from human learning and develop fundamentally new kinds of probabilistic models, in nonparametric Bayes in particular. But when Tom was a grad student, we worked together. He was my first student; we're almost the same age, so at this point we're more like senior colleagues than student and advisor. I'll tell you about some work we did back when he was a student and I was just starting off.

We were both trying to tackle this problem, trying to see what the prospects are for understanding even very basic cognitive intuitions, like senses of similarity or the most basic kinds of causal discovery intuitions like the ones we were talking about before, using some idea of probabilistic inference in a generative model. At the time-- remember, in the introduction I was talking about how there's been this back-and-forth discourse over the decades, with people saying, yes, rah rah, statistics, and then, statistics, that's trivial and uninteresting. At the time we started doing this, at least in cognitive psychology, the idea that cognition could be seen as some kind of sophisticated statistical inference was very much not a popular one. But we thought it was fundamentally right in some ways. This was work we were doing in the early 2000s, when it was already very clear in machine learning and AI how transformative these ideas were in starting to build intelligent machines. So it seemed clear to us that it was at least a good hypothesis worth exploring and taking much more seriously than psychologists had before--
the hypothesis that this could also describe basic aspects of human thinking.

So I'll give you a couple of examples of what we did. Here's a simple kind of causal inference from coincidences, much like what you saw going on in the video game. There's no time in this one; it's really mostly just space, or maybe a little bit of time. The motivation was not a video game. To put a real-world context on it, imagine what are sometimes called cancer clusters or rare disease clusters. You can often read about these in the newspaper: somebody has seen some evidence suggestive of a possibly hidden environmental cause-- maybe a toxic chemical leak or something-- that seems to be responsible for a set of cases. Or maybe they don't have a cause in mind; they just see a suspicious coincidence, a few cases of some very rare disease that seem surprisingly clustered in space and time.

So for example, let's say this is one square mile of a city, and each dot represents one case of some very rare disease that occurred in the span of a year. You look at this and you might think, well, it doesn't look like those dots are completely uniformly, randomly distributed over the area. Maybe there's some weird thing going on in the upper left, the northwest corner-- who knows what-- making people sick. So let me just ask you: on a scale of 0 to 10, where 10 means you're sure there's something going on, some special cause in some part of this map, and 0 means you're quite sure there's nothing going on, it's just random, what do you say? To what extent does this give evidence for some hidden cause? Give me a number between 0 and 10.

AUDIENCE: 5.

JOSH TENENBAUM: OK, great. 5, 2, 7-- I heard a few examples of each of those. Perfect. That's exactly what people do. You could do the same thing on Mechanical Turk, get 10 times as much data, and pay a lot more, and it would be the same.
I'll show you the data in a second, but here's the model we built. It's a very simple kind of generative model with a hidden cause, of the sort that various people in statistics have worked with for a while. We're basically modeling the data as a mixture. Since it's a generative model, we have to model the whole data set. When we say there's a hidden cause, we don't necessarily mean that everything is caused by it. It's just that the data you see in this picture is a mixture of whatever the normal random process is, plus possibly some spatially localized cause with an unknown position, unknown extent-- maybe it's a very big region-- and unknown intensity-- maybe it's causing a lot of cases, maybe not that many.

The hypothesis space is maybe best visualized like this. Each of these squares is a different hypothesis: a mixture density, or mixture model, combining whatever the normal uniform process is that causes the disease independent of location with some Gaussian bump, which can vary in location, size, and intensity, and which is the possible hidden cause of some of these cases. What the model we proposed says is that your sense of spatial coincidence-- when you look at a pattern of dots and you see, oh, it looks like there's a hidden cluster there somewhere-- is basically you trying to see whether something like one of those hypotheses on the right is going on, as opposed to the null hypothesis of pure randomness. So we take the log likelihood ratio, comparing the probability of the data under the hypothesis that there's some interesting hidden cause, one of the things on the right, versus the alternative hypothesis that it's just random, which is the simple, completely uniform density. What makes this a little bit interesting computationally is that there's an infinite number of these possibilities on the right: an infinite number of different locations, sizes, and intensities for the Gaussian, and you have to integrate over all of them.
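To make that computation concrete, here is a minimal sketch in Python of the kind of calculation involved. It is not the published model: the priors over cluster center, width, and mixing weight, the use of a mixing weight as a stand-in for the cluster's intensity, and the Monte Carlo approximation of the integral are all illustrative assumptions.

```python
import numpy as np

def log_likelihood_uniform(points, area=1.0):
    # Null hypothesis: each point falls uniformly in a unit-area region.
    return len(points) * np.log(1.0 / area)

def log_likelihood_cluster(points, rng, n_samples=20000):
    # Alternative hypothesis: a mixture of the uniform background and a
    # Gaussian "bump" with unknown center, width, and mixing weight.
    # Integrate over those unknowns by Monte Carlo, sampling them from
    # illustrative priors (uniform center, log-uniform width, uniform
    # weight) -- assumptions for this sketch only.
    centers = rng.uniform(0.0, 1.0, size=(n_samples, 2))
    widths = np.exp(rng.uniform(np.log(0.02), np.log(0.5), size=n_samples))
    weights = rng.uniform(0.0, 1.0, size=n_samples)

    log_liks = np.zeros(n_samples)
    for i in range(n_samples):
        # Density of each point under this particular cluster hypothesis
        # (the Gaussian is not truncated to the square -- an approximation).
        d2 = np.sum((points - centers[i]) ** 2, axis=1)
        gauss = np.exp(-d2 / (2 * widths[i] ** 2)) / (2 * np.pi * widths[i] ** 2)
        mix = weights[i] * gauss + (1 - weights[i]) * 1.0  # uniform density = 1
        log_liks[i] = np.sum(np.log(mix))
    # Average the likelihood (not the log-likelihood) over the prior samples.
    return np.logaddexp.reduce(log_liks) - np.log(n_samples)

rng = np.random.default_rng(0)
points = rng.uniform(0, 1, size=(6, 2))                  # hypothetical data
points[:4] = 0.2 + 0.05 * rng.standard_normal((4, 2))    # four points bunched up

log_bf = log_likelihood_cluster(points, rng) - log_likelihood_uniform(points)
print(f"log evidence ratio in favor of a hidden cluster: {log_bf:.2f}")
```

The model's prediction for a rating task like the one above is monotonically related to this log ratio: the more tightly some of the points bunch up relative to how many there are, the larger it gets.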
Again, I'm not going to give a whole lot of mathematical detail here, but you can read about this in the papers we have on it. For those of you who are familiar with latent variable models, effectively what you're doing is integrating, either analytically or in a simulation, over all the possible models, and computing on average how much the evidence supports something like what you see on the right, one of those cluster possibilities, versus just the uniform density.

Now what I'm showing you is that model compared to people's judgments in an experiment. In this experiment, we showed people patterns like the one you just saw-- the one you saw is this one here-- but across the different stimuli we varied parameters we thought would be relevant: how many points there were in total, how strong the cluster was in various ways, whether it was very tightly clustered or very spread out, and the relative number of points in the cluster versus outside it. So what you can see here, for example, is a very similar geometry, except this is a biggish cluster, with basically four points that look clustered and two that don't, and in these cases we just make the four points more and more tightly clustered. Here, we go from having no points that look clustered to having almost all of the points look clustered, varying the ratio of clustered to non-clustered points. Here, we just change the overall number. Notice that this one is basically the same as this one: in both of these, we've got four clustered points and two seemingly non-clustered ones, and here we just scale up from four and two to eight and four.
And here we scale it down to two and one, and there are various other manipulations. What you can see is that these have systematic effects on people's judgments. What I'm calling the data here is the average of about 150 people who made the same judgment you did, 0 to 10. The one I gave you was this one here, and the average judgment was almost exactly five. And if you look at the variance, it looks just like what you saw here: some people say two or three, some people say seven. I chose one that was right in the middle.

The interesting thing is that, while you maybe felt like you were guessing-- and if you just listened to what everyone else was shouting out, maybe it sounded like random numbers-- that's not what you're doing. On that one it looks like it, because it's right at the threshold. But if you look across all these different patterns, what you see is that sometimes people give much higher numbers than others, and sometimes much lower. And the details of that variation, both within these different manipulations and across them, are almost perfectly captured by this very simple probabilistic generative model with a latent cause. The model predictions are shown here: basically, a high bar means strong evidence in favor of the hidden-latent-cause hypothesis-- some cluster, one or more-- and a low bar means strong evidence for the alternative hypothesis. The scale is a bit arbitrary-- it's a log probability ratio scale-- so I won't comment on it, but importantly, it's the same scale across all of these. So a big difference here is the same big difference in both cases.
And I think this is fairly good evidence that this model is capturing your sense of spatial coincidence, and showing that it's not just random or arbitrary, but actually a very rational measure of how much evidence the data give for a hidden cause. Here's the same model applied to a different data set that we had actually collected a few years before, which varies the same kinds of parameters but has a lot more points. The same model works in those cases, too; the differences are a little more subtle with more points.

So I'll give you one other example of this sort of thing. Like the one I just showed you, we're taking a fairly simple statistical model. This one, as you'll see, isn't even really causal-- the one I just showed you at least is causal. The advantage of this other one is that it's a kind of textbook statistics example, but one where people do something more interesting than what's in the textbook-- although you can extend the textbook analysis to make it look like what people do. And unlike in the clustering case, you can actually measure the relevant empirical statistics. Instead of just positing a simple model of what a latent environmental cause would be like, you can go out and measure all the relevant probability distributions, and compare people not just with a notional model but with what, in some stronger sense, is the rational thing to do if you were doing some kind of intuitive Bayesian inference.

This is, again, work that Tom Griffiths did with me, in and then after grad school. We asked people to make the following kind of everyday prediction. Suppose you read about a movie that's made $60 million to date: how much money will it make in total? Or you see that something's been baking in the oven for 34 minutes: how long until it's ready? You meet someone who's 78 years old: how long will they live?
Your friend quotes to you from line 17 of his favorite poem: how long is the poem? Or you meet a US congressman who has served for 11 years: how long will he serve in total? In each of these cases, you're encountering some phenomenon or event in the world with some unknown total extent or duration. We'll call that t_total. All we know is that t_total is somewhere between zero and infinity. We might have a prior on it, as you'll see in a second, but we don't know very much about this particular t_total, except that you get one example, one piece of data, some t, which we'll just assume is randomly sampled between zero and t_total. So all we know is that whatever these observations are, they are something randomly chosen, less than the total extent or duration of these events. And now we can ask: what can you guess about the total extent or duration from that one observation? In mathematical terms, there is some unknown interval from zero up to some maximal value, you can put a prior on what that interval is, and you have to guess the interval from one point sampled randomly within it.

It's also very similar-- and this is another reason we studied it-- to the problem of learning a concept from one example: learning what horses are from one example, or learning what that piece of rock climbing equipment is-- what a cam is-- from one example, or what a tufa is from one example. You can think of there being some region in the space of all possible objects, some set out there, and you get one or a few sample points and have to figure out the extent of the region. It's basically the same kind of problem, mathematically. But what's cool about this case is that we can measure the priors for these different classes of events and compare people with an optimal Bayesian inference.
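To make that optimal Bayesian inference explicit, here is a compact way to write down the computation being described, with the uniform-sampling assumption just stated as the likelihood:

$$
p(t_{\mathrm{total}} \mid t) \;\propto\; p(t \mid t_{\mathrm{total}})\, p(t_{\mathrm{total}}),
\qquad
p(t \mid t_{\mathrm{total}}) =
\begin{cases}
1/t_{\mathrm{total}}, & 0 \le t \le t_{\mathrm{total}},\\
0, & \text{otherwise},
\end{cases}
$$

so the posterior is proportional to $p(t_{\mathrm{total}})/t_{\mathrm{total}}$ for $t_{\mathrm{total}} \ge t$, and the prediction $t^{*}$ is read out as the posterior median, the value satisfying $P(t_{\mathrm{total}} > t^{*} \mid t) = 1/2$, which is the estimator used for the model curves described below.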
And you see something kind of striking. I'm showing two different kinds of data here. On the top are just empirical statistics of events you can measure in the world-- nothing behavioral, nothing about cognition. On the bottom I'm showing some behavioral data and comparing it with model predictions that are based on the statistics measured on top. Each column is one of these classes of events, like movie grosses in dollars. You can get that data from IMDb, the Internet Movie Database. You can see that most movies make $100 million or less-- it's roughly a power law-- but a few movies make hundreds of millions, maybe even a billion dollars these days. Similarly, poems have a power-law distribution of length: most poems are pretty short, they fit on a page or less, but there are some epic, multi-page poems of many hundreds of lines, and they fall off with a long tail. Lifespans and movie runtimes are kind of unimodal, almost Gaussian-- not exactly. The histogram bars show the empirical statistics we measured from public data, and the red curves show the best fit of a simple parametric model, like the Gaussian or power-law distributions I'm mentioning. For the House of Representatives, how long people serve has a kind of gamma shape-- a particular gamma called an Erlang-- with a bit of an incumbency effect. Cake baking times-- remember, we asked how long this cake is going to bake for-- don't have any simple parametric form when you go and look in cookbooks, but you can see there's something systematic there: a lot of things are supposed to bake for exactly an hour, there's a shorter but broad mode, and then there are a few epic 90-minute cakes out there. So that's all the empirical statistics.
Now, what you're seeing on the bottom: on the y-axis, the vertical axis, you have the median of a bunch of human predictions for the total extent of one of these things-- for example, your guess of the total length of a poem given that, basically, there is a line 17 in it. On the x-axis is the one data point, the one value of t; all you know is that it's somewhere between zero and t_total. Different groups of subjects were given five different values, so you see five black dots, which correspond to what five different subgroups of subjects said for each of those t values. The black and red curves are the model fits, which come from taking a certain kind of Bayesian optimal prediction, where the prior is what's specified on top-- that's the prior on t_total-- and the likelihood is uniform: t is just a uniform random sample from zero up to t_total. You put those together to compute a posterior, and the particular estimator we're using is what's called the posterior median: we're looking at the median of the posterior and comparing that with the median of the human subjects. And what you can see is that it's almost a perfect fit. It doesn't really matter whether you take the red curve, which comes from approximating the prior with one of those simple parametric models, or the black one, which comes from just taking the empirical histogram-- although for the cake baking times you can really only use the empirical one, because there is no simple parametric form; that's why you just see a jagged black line there. But it's interesting that it's almost a perfect fit.
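As a concrete illustration of that calculation, here is a minimal sketch in Python of the posterior-median prediction using an empirical prior over a grid of candidate totals. The grid and the power-law-shaped prior are made up for illustration; they are not the actual IMDb or cookbook statistics used in the experiments.

```python
import numpy as np

def posterior_median_prediction(t_obs, prior_values, prior_probs):
    """Predict t_total from one observation t_obs.

    prior_values: ascending grid of candidate t_total values (e.g. bin centers)
    prior_probs:  empirical prior probability of each candidate
    Likelihood: t_obs is uniform on [0, t_total], i.e. 1/t_total if t_total >= t_obs.
    """
    likelihood = np.where(prior_values >= t_obs, 1.0 / prior_values, 0.0)
    posterior = likelihood * prior_probs
    posterior /= posterior.sum()
    # Posterior median: smallest candidate whose cumulative posterior reaches 0.5.
    cdf = np.cumsum(posterior)
    return prior_values[np.searchsorted(cdf, 0.5)]

# Illustrative (made-up) power-law-ish prior over movie grosses, in millions.
grosses = np.arange(1, 1001, dtype=float)
prior = grosses ** -1.5
prior /= prior.sum()

for t in [10, 60, 200]:
    print(t, "->", posterior_median_prediction(t, grosses, prior))
```

With a power-law prior the predicted total scales multiplicatively with the observation, whereas a near-Gaussian prior (like lifespans) would produce the flatter, then converging, prediction function described next.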
There are one or two cases we found where this model doesn't work-- just like somebody asked about in Demis's talk-- sometimes dramatically, sometimes a little bit, and they're all interesting, but I don't have time to talk about them; that's one of the things I decided to skip. If you'd like to talk about it later, I'm happy to do that. But most of the time, in most of the cases we've studied, these are representative. And I think all of the failure cases are quite interesting ones; they point to things we need to go beyond.

But the interesting thing isn't just that the curves fit the data; it's that the actual shape of the prediction function is different in each case. Depending on the prior for these different classes of events, you get a qualitatively different prediction function. Sometimes it's linear, sometimes it's nonlinear, sometimes it has some weird shape. And, quite surprisingly to us, people seem to be sensitive to that. They seem to predict in ways that reflect not only the optimal Bayesian thing to do, but the optimal Bayesian thing to do from the correct prior. I certainly don't want to suggest that people always do this. But it was very interesting to us that, for a bunch of everyday events-- and really, the places where this analysis works best are ones where we think people plausibly have had the relevant experiences-- they seem to be sensitive both to the statistics, in the sense of what's actually going on in the world, and to doing the right statistical prediction with them.

So that's what we did ten years ago or so; that was the state of the art for us. And then we wanted to know: can we take these ideas and scale them up to some more interesting cognitive problems, like, say, learning words for object categories? And we did some of that. I'll show you a little bit of it before showing you what I think was missing there. In a lot of ways, this is a harder problem, though it's very similar, as I said.
It's basically like the problem I just showed you, where there was an unknown total extent or duration and you got one random sample from it. Here, imagine the space of all possible objects-- it could be a manifold, or described by a bunch of knobs. These particular objects are all generated from a computer program; if they were real biological things, they would be generated from DNA or whatever it is. But there's some huge, maybe interestingly structured, space of all possible objects, and within that space there's some subset, some region, somehow described, that is the set of tufas. And somehow you're able to grasp that subset, more or less-- to get its boundaries, to be able to say yes or no as you did at the beginning of the lecture-- from just, in this case, a few points, three points, randomly sampled from somewhere in that region. It would work just as well if I showed you only one of them.

So in some sense it's the same problem, but it's much harder, because in the prediction case the space was one-dimensional-- it was just a number-- whereas here we don't know the dimensionality of the space of objects, and we don't know how to describe the regions. There, we knew how to describe the regions: they were just intervals with a lower bound at zero and an upper bound at some unknown value, and the hypothesis space of possible regions was just all the possible upper bounds on the event's duration. Here, we don't know how to describe the space, we don't know how to describe the regions that correspond to object concepts, and we don't know how to put a prior on those hypotheses.

But in some work that I did with Fei Xu, who is also now a professor at Berkeley-- we were colleagues and friends in graduate school-- we did what we could at the time. We made some guesses about what that space might be like, what the hypothesis space might be like, how to put priors on it, and so on.
We used exactly the same likelihood, the very simple idea that the observed examples are a uniform random draw from some subset of the world, and you have to figure out what that subset is. And we were able to make some progress. What we did was say, well, perhaps it's like in biology. How many people saw Surya Ganguli's lecture yesterday morning? Cool. I sort of tailored this assuming you'd probably seen that, because there are a lot of similarities and parallels, which is neat-- it's, again, part of engaging generative models with neural networks. So, as you saw him do, you'll get my version of this.

As he also mentioned, there are actual processes in the world that generate objects like this. We know that evolution produces basically tree-structured groups, which we call species, or genera, or just taxa: groups of organisms that have a common evolutionary descent. That's how a biologist might describe it. And these days we know a lot about the mechanisms that produce that. Even going back one or two hundred years, say to Darwin, we knew something about the mechanisms, even if we didn't know the genetic details: ideas of mutation, variation, and natural selection as a kind of mechanistic account, right up there with Newton and forces. In any case, scientists can describe some process that generates trees. And maybe people have some intuitions-- just as they seem to have intuitions about the statistics of everyday events-- about the causal processes in the world that give rise to groups and subgroups, and they can use that to set up a hypothesis space.
The way we went about this is: we have no idea how to describe people's internal mental models of these things, but there are simple ways to get at this picture by asking people to judge similarity and doing hierarchical clustering. So this is a tree we built up by getting a subjective similarity metric from people and then doing hierarchical clustering, which we thought could roughly approximate the internal hierarchy that our mental models impose on these objects. Were you raising your hand, or just-- no? OK, cool.

We ultimately found this dissatisfying, because we don't really know what the features are, we don't really know if this is the right tree, or how people built it up. But it actually worked pretty well, in the sense that we could build up this tree and then assume that the hypotheses for concepts corresponded to branches of the tree. Then, to put it intuitively, the way you do this learning from one or a few examples-- let's say you see those few tufas over there-- is that you're basically asking: those were randomly drawn from some internal branch of the tree, some subtree; which subtree is it? And intuitively, if you see those things and say they're randomly drawn from some branch, maybe it's the one I've circled. That sounds like a better bet than, for example, this one here, or maybe this one, which would include one of these examples but not the others-- so that's probably unlikely. And it's probably a better bet than, say, this branch, or this branch, or these ones, which are logically compatible but somehow would have been a suspicious coincidence: if the set of tufas had really been this branch here, or this one here, it would have been quite a coincidence that the first three examples you saw were all clustered over there in one corner.
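Here is a minimal sketch of that kind of computation, under illustrative assumptions: the dissimilarity matrix and item names are made up, the tree is built with ordinary average-linkage clustering, every subtree is treated as a candidate hypothesis under a flat prior, and the likelihood is the size principle (examples drawn uniformly from the hypothesis's extension), so smaller branches that still contain all the examples win out.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, to_tree
from scipy.spatial.distance import squareform

names = ["tufa1", "tufa2", "tufa3", "blicket", "dax", "wug"]
n = len(names)

# Made-up dissimilarities: the three tufas are close, everything else is far.
D = np.full((n, n), 0.9)
D[:3, :3] = 0.2
np.fill_diagonal(D, 0.0)

# Build a tree by average-linkage hierarchical clustering.
Z = linkage(squareform(D, checks=False), method="average")
root = to_tree(Z)

def subtrees(node):
    # Enumerate every subtree; each is a candidate extension of the new word.
    yield node
    if not node.is_leaf():
        yield from subtrees(node.get_left())
        yield from subtrees(node.get_right())

examples = {0, 1, 2}            # indices of the observed tufas
scores = {}
for node in subtrees(root):
    leaves = set(node.pre_order(lambda leaf: leaf.id))
    if not examples <= leaves:
        continue                 # hypothesis must contain all the examples
    prior = 1.0                  # flat prior over branches (an assumption)
    likelihood = (1.0 / len(leaves)) ** len(examples)   # size principle
    scores[frozenset(leaves)] = prior * likelihood

total = sum(scores.values())
for leaves, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(sorted(names[i] for i in leaves), round(score / total, 3))
```

The tightest branch containing all three examples gets most of the posterior, which is the "suspicious coincidence" logic: broader, logically compatible branches are penalized because three random draws from them would have been unlikely to land together.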
And what we showed was that that kind of model-- where the suspicious coincidence falls out of the same kind of math I've just been showing you for the causal clustering example and for the interval prediction, the same Bayesian machinery but now with this tree-structured hypothesis space-- did a pretty good job of capturing people's judgments. We gave people one or a few examples of these concepts, where the examples could be more narrowly or broadly spread, just like in the clustering case, though less extensively varied. We did this with adults and we did it with kids. I won't really go into the details, but if you're interested, check out the various Xu and Tenenbaum papers; that's the main one there.

And the model kind of worked. But ultimately we found it dissatisfying, because we couldn't really explain-- we didn't really know-- what the hypothesis space was or how people were building up this tree. So we did a few things. We-- meaning I, with some other people-- turned to other problems where we had a better idea of the feature space and the hypothesis space, but where the same kinds of ideas could be explored and developed. And then ultimately-- I'll show you this maybe before lunch, or maybe after-- we went back and tackled the problem of learning concepts from examples in other settings where we could get a better handle on really knowing what representations people were using, and where we could also compare with machines in much more compelling, apples-to-apples ways. In some sense, here there's no machine, as far as I know, that can solve this problem as well as our model. On the other hand, it's very much like the issue that came up when we were talking about the deep learning-- I guess maybe it was with you, Tyler, or with you, Leo-- the deep reinforcement learning network.
A machine that's looking at this just as pixels is missing so much of what we bring to it: we see these things as three-dimensional objects. And just like the cam in rock climbing, or any of those other examples I gave before, I think that's essential to what people are doing. The generative model we build, this tree, is based not on pixels, or even on ConvNet features, but on a sense of the three-dimensional object, its parts, and their relations to each other. So fundamentally, until we know how to perceive objects better, this is not going to be comparable between humans and machines on equal terms. I'll show you a little later some still quite interesting, but simpler, visual concepts that you can also learn and generalize from one example, but where humans and machines are comparable on equal terms.

But first I want to tell you a little about yet another cognitive judgment which, like the word learning or concept learning cases, involves generalizing from a few examples and using prior knowledge, but where maybe we have some way of capturing people's prior knowledge by using the right combination of statistical inference over some kind of symbolically structured model. You can already see the narrative here. The examples I gave first don't require any symbolic structure. All that stuff I was talking about at the beginning, about how we have to combine sophisticated statistical inference with sophisticated symbolic representations-- you don't need any of that there. All the representations could just be counting up numbers or using simple probability distributions that statisticians have worked with for over a hundred years. Once we start to go here, now we have to define a model with some interesting structure, like a branching tree structure, and so on.
And as you'll see, we can quickly get to much more interesting causal, compositionally structured generative models in similar kinds of tasks. In particular, for a few years we were very interested in these property induction tasks. This happened to be-- I think it was a coincidence, or maybe we were both influenced by Susan Carey, actually-- the same domain as the work Surya talked about, the work he was trying to explain as a theoretician. Remember, Surya and Andrew Saxe were trying to give a theory of the neural network models that Jay McClelland and Tim Rogers had built in the early 2000s, around the same time we were doing this work. And those were inspired by some of Susan Carey's work on children's intuitive biology, as well as by other people in cognitive psychology-- for example, Lance Rips, Smith and Medin, Dan Osherson. Many, many cognitive psychologists have studied things like this.

They often talked about this as a kind of inductive reasoning, or property induction. It might look different from the task I've given you before, but actually it's deeply related.
765 00:29:35,800 --> 00:29:38,221 There's another reason why I wanted to cover this. 766 00:29:38,221 --> 00:29:39,720 We worked on these things because we 767 00:29:39,720 --> 00:29:42,095 wanted to be able to engage with the same kinds of things 768 00:29:42,095 --> 00:29:44,100 that people like Jay McClelland and Tom Mitchell 769 00:29:44,100 --> 00:29:46,180 were thinking about, coming from different perspectives. 770 00:29:46,180 --> 00:29:47,610 Remember, Tom Mitchell showed you 771 00:29:47,610 --> 00:29:52,710 his way of classifying brain representations of semantics 772 00:29:52,710 --> 00:29:56,550 with matrices of objects and 20-question-like features that 773 00:29:56,550 --> 00:29:58,920 included things like is it hairy, or is it alive, 774 00:29:58,920 --> 00:30:03,020 or does it lay eggs, or is it bigger than a car, 775 00:30:03,020 --> 00:30:06,190 or bigger than a breadbox, or whatever. 776 00:30:06,190 --> 00:30:07,910 Any one of these things-- 777 00:30:07,910 --> 00:30:09,960 basically, we're getting at the same thing. 778 00:30:09,960 --> 00:30:11,510 Here there's just what's-- 779 00:30:11,510 --> 00:30:13,010 often these experiments with humans 780 00:30:13,010 --> 00:30:15,176 were done with so-called blank predicates, something 781 00:30:15,176 --> 00:30:17,840 that sounded vaguely biological, but was basically made up, 782 00:30:17,840 --> 00:30:19,850 or that most people wouldn't know much about. 783 00:30:19,850 --> 00:30:22,220 Does anyone know anything about T9 hormones? 784 00:30:22,220 --> 00:30:24,299 I hope so, because I made it up. 785 00:30:24,299 --> 00:30:26,090 But some of them were just done with things 786 00:30:26,090 --> 00:30:28,740 that were real, but not known to most people. 787 00:30:28,740 --> 00:30:31,072 So if I tell you that gorillas and seals both have 788 00:30:31,072 --> 00:30:33,530 T9 hormones, you might think it's sort of fairly plausible 789 00:30:33,530 --> 00:30:35,738 that horses have T9 hormones, maybe more so than if I 790 00:30:35,738 --> 00:30:38,060 hadn't told you anything. 791 00:30:38,060 --> 00:30:40,010 Maybe you think that argument is more 792 00:30:40,010 --> 00:30:41,990 plausible than the one on the right; given 793 00:30:41,990 --> 00:30:45,007 that gorillas and seals have T9 hormones, that anteaters 794 00:30:45,007 --> 00:30:45,590 have T9 hormones. 795 00:30:45,590 --> 00:30:48,020 So maybe you think horses are somehow 796 00:30:48,020 --> 00:30:50,629 more similar to gorillas and seals than anteaters are. 797 00:30:50,629 --> 00:30:51,170 I don't know. 798 00:30:51,170 --> 00:30:51,670 Maybe. 799 00:30:51,670 --> 00:30:52,670 Maybe a little bit. 800 00:30:52,670 --> 00:30:54,020 If I made that bees-- 801 00:30:54,020 --> 00:30:56,305 gorillas and seals have T9 hormones. 802 00:30:56,305 --> 00:30:58,430 Does that make you think it's likely that bees have 803 00:30:58,430 --> 00:31:02,030 T9 hormones, or pine trees? 804 00:31:02,030 --> 00:31:03,620 The farther the conclusion category 805 00:31:03,620 --> 00:31:06,620 gets from the premises, the less plausible it seems. 806 00:31:06,620 --> 00:31:09,260 Maybe the one on the lower right also seems not very plausible, 807 00:31:09,260 --> 00:31:10,652 or not as plausible.
808 00:31:10,652 --> 00:31:12,110 Because if I tell you that gorillas 809 00:31:12,110 --> 00:31:14,240 have T9 hormones, chimps, monkeys, and baboons 810 00:31:14,240 --> 00:31:16,130 all have T9 on hormones, maybe you 811 00:31:16,130 --> 00:31:18,140 think that it's only primates or something. 812 00:31:18,140 --> 00:31:19,640 So they're not a very-- it's, again, 813 00:31:19,640 --> 00:31:21,140 one of these typicality-suspicious 814 00:31:21,140 --> 00:31:23,750 coincidence businesses. 815 00:31:23,750 --> 00:31:25,670 So again, you can think of it as-- you 816 00:31:25,670 --> 00:31:27,380 can do these experiments in various ways. 817 00:31:27,380 --> 00:31:28,963 I won't really go through the details, 818 00:31:28,963 --> 00:31:30,770 but it basically involves giving people 819 00:31:30,770 --> 00:31:33,229 a bunch of different sets of examples, just like-- 820 00:31:33,229 --> 00:31:35,270 I mean, in some sense, the important thing to get 821 00:31:35,270 --> 00:31:37,160 is that abstractly it has the same character of all 822 00:31:37,160 --> 00:31:38,420 the other tasks you've seen. 823 00:31:38,420 --> 00:31:40,820 You're giving people one or a few examples, which we're 824 00:31:40,820 --> 00:31:44,150 going to treat as random draws from some concept, 825 00:31:44,150 --> 00:31:46,940 or some region in some larger space. 826 00:31:46,940 --> 00:31:49,400 In this case, the examples are the different premise 827 00:31:49,400 --> 00:31:51,200 categories, like gorillas and seals 828 00:31:51,200 --> 00:31:55,280 are examples of the concept of having T9 hormones. 829 00:31:55,280 --> 00:31:57,310 Or gorillas, chimps, monkeys, and baboons 830 00:31:57,310 --> 00:31:58,880 are an example of a concept. 831 00:31:58,880 --> 00:32:01,460 We're going to put a prior on possible extents 832 00:32:01,460 --> 00:32:04,430 of that concept, and then ask what kind of inferences 833 00:32:04,430 --> 00:32:06,867 people make from that prior, to figure out 834 00:32:06,867 --> 00:32:08,450 what other things are in that concept. 835 00:32:08,450 --> 00:32:11,630 So are horses in that same concept? 836 00:32:11,630 --> 00:32:12,382 Or are anteaters? 837 00:32:12,382 --> 00:32:14,840 Or are horses in it more or less, depending on the examples 838 00:32:14,840 --> 00:32:15,470 you give? 839 00:32:15,470 --> 00:32:17,750 And what's the nature of that prior? 840 00:32:17,750 --> 00:32:19,820 And what's good about this is that, 841 00:32:19,820 --> 00:32:24,740 kind of like the everyday prediction task-- 842 00:32:24,740 --> 00:32:27,290 the lines of the poems, or the movie grosses, or the cake 843 00:32:27,290 --> 00:32:29,210 baking-- we can actually sort of go out and measure 844 00:32:29,210 --> 00:32:30,960 some features that are plausibly relevant, 845 00:32:30,960 --> 00:32:33,200 to set up a plausibly relevant prior, 846 00:32:33,200 --> 00:32:36,860 unlike the interesting object cases. 847 00:32:36,860 --> 00:32:38,829 But like the interesting object cases, 848 00:32:38,829 --> 00:32:41,120 there are some interesting hierarchical and other kinds 849 00:32:41,120 --> 00:32:42,740 of causal compositional structures 850 00:32:42,740 --> 00:32:44,360 that people seem to be using that we 851 00:32:44,360 --> 00:32:46,680 can capture in our models. 852 00:32:46,680 --> 00:32:50,240 So here, again, the kinds of experiments-- these features 853 00:32:50,240 --> 00:32:54,220 were generated many years ago by Osherson and colleagues. 
854 00:32:54,220 --> 00:32:56,270 But it's very similar to the 20 questions game 855 00:32:56,270 --> 00:32:57,290 that Tom Mitchell used. 856 00:32:57,290 --> 00:32:58,820 And I don't remember if Surya talked 857 00:32:58,820 --> 00:33:00,230 about where these features came from, 858 00:33:00,230 --> 00:33:02,570 that he talked a lot about a matrix of objects and features. 859 00:33:02,570 --> 00:33:04,778 I don't know if he talked about where they come from. 860 00:33:04,778 --> 00:33:07,250 But actually, psychologists spent a while coming up 861 00:33:07,250 --> 00:33:09,110 with ways to get people to just tell you 862 00:33:09,110 --> 00:33:10,820 a bunch of features of animals. 863 00:33:10,820 --> 00:33:13,790 This is, again, it's meant to capture the knowledge 864 00:33:13,790 --> 00:33:17,210 that maybe a kid would get from maybe plausibly reading 865 00:33:17,210 --> 00:33:18,700 books and going to the zoo. 866 00:33:18,700 --> 00:33:20,042 We know that elephants are gray. 867 00:33:20,042 --> 00:33:20,750 They're hairless. 868 00:33:20,750 --> 00:33:21,590 They have tough skin. 869 00:33:21,590 --> 00:33:22,100 They're big. 870 00:33:22,100 --> 00:33:24,740 They have a bulbous body shape. 871 00:33:24,740 --> 00:33:25,642 They have long legs. 872 00:33:25,642 --> 00:33:27,600 These are all mostly relative to other animals. 873 00:33:27,600 --> 00:33:28,910 They have a tail. 874 00:33:28,910 --> 00:33:29,580 They have tusks. 875 00:33:29,580 --> 00:33:31,639 They might be smelly, compared to other animals-- 876 00:33:31,639 --> 00:33:33,680 smellier than average is sort of what that means. 877 00:33:33,680 --> 00:33:35,130 They walk, as opposed to fly. 878 00:33:35,130 --> 00:33:36,752 They're slow, as opposed to fast. 879 00:33:36,752 --> 00:33:38,210 They're strong, as opposed to weak. 880 00:33:38,210 --> 00:33:39,644 It's that kind of business. 881 00:33:39,644 --> 00:33:41,810 So basically what that gives you is this big matrix. 882 00:33:41,810 --> 00:33:43,393 Again, the same kind of thing that you 883 00:33:43,393 --> 00:33:46,130 saw in Surya's talk, the same kind of thing 884 00:33:46,130 --> 00:33:47,750 that Tom Mitchell is using to help 885 00:33:47,750 --> 00:33:49,130 classify things, the same kind of thing 886 00:33:49,130 --> 00:33:50,963 that basically everybody in machine learning 887 00:33:50,963 --> 00:33:55,190 uses-- a matrix of data with objects, maybe as rows, 888 00:33:55,190 --> 00:33:57,530 and features, or attributes, as columns. 889 00:33:57,530 --> 00:33:59,330 And the problem here is-- 890 00:33:59,330 --> 00:34:02,145 the problem of learning is to say-- 891 00:34:02,145 --> 00:34:04,520 the problem of learning and generalizing from one example 892 00:34:04,520 --> 00:34:06,920 is to take a new property, which is a new concept, which 893 00:34:06,920 --> 00:34:08,810 is like a new column here, to get 894 00:34:08,810 --> 00:34:11,659 one or a few examples of that concept, which is basically 895 00:34:11,659 --> 00:34:14,600 just filling in one or a few entries in that column, 896 00:34:14,600 --> 00:34:16,850 and figure out how to fill in the others, to decide, 897 00:34:16,850 --> 00:34:19,790 do you or don't you have that property, somehow 898 00:34:19,790 --> 00:34:21,800 building knowledge that you can generalize 899 00:34:21,800 --> 00:34:24,080 from your prior experience, which 900 00:34:24,080 --> 00:34:26,360 could be captured by, say, all the other features 901 00:34:26,360 --> 00:34:27,780 that you know about objects. 
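To make this setup concrete, here is a minimal sketch in Python of the object-by-feature matrix and the sparse new column being described. The animal names, feature names, and values are made up for illustration; this is not the actual Osherson feature data.

```python
# A toy object-by-feature matrix of the kind described above (values made up
# for illustration; this is not the real feature-norm data).
animals = ["elephant", "horse", "gorilla", "seal", "anteater", "bee"]
feature_names = ["is gray", "is big", "lives in water", "has hair", "has a tail"]
features = {
    "elephant": [1, 1, 0, 0, 1],
    "horse":    [0, 1, 0, 1, 1],
    "gorilla":  [0, 1, 0, 1, 0],
    "seal":     [1, 0, 1, 1, 1],
    "anteater": [0, 0, 0, 1, 1],
    "bee":      [0, 0, 0, 1, 0],
}

# A new property ("has T9 hormones") is a new, mostly empty column: we only
# observe it for the premise categories.
observed_new_property = {"gorilla": 1, "seal": 1}

# The induction problem: fill in the missing entries of that column -- e.g.,
# does "horse" have it? -- using whatever structure the known features
# support. The models discussed next differ in how they turn the known
# matrix into a prior over columns like this one.
```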
902 00:34:27,780 --> 00:34:29,405 So that's the way that you might set up 903 00:34:29,405 --> 00:34:32,030 this problem, which again, looks like a lot of other problems 904 00:34:32,030 --> 00:34:34,550 of, say, semi-supervised learning or sparse matrix 905 00:34:34,550 --> 00:34:35,189 completion. 906 00:34:35,189 --> 00:34:36,980 It's a problem in which we can, or at least 907 00:34:36,980 --> 00:34:38,592 we thought we could, compare humans 908 00:34:38,592 --> 00:34:40,550 and many different algorithms, and even theory, 909 00:34:40,550 --> 00:34:42,020 like from Surya's talk. 910 00:34:42,020 --> 00:34:43,760 And that seemed very appealing to us. 911 00:34:43,760 --> 00:34:45,480 What we thought, though, that people were doing, 912 00:34:45,480 --> 00:34:47,355 which is maybe a little different than what-- 913 00:34:47,355 --> 00:34:49,940 or somewhat different-- well, quite different than what Jay 914 00:34:49,940 --> 00:34:51,199 McClelland thought people were doing-- 915 00:34:51,199 --> 00:34:53,389 maybe a little bit more like what Susan Carey or some 916 00:34:53,389 --> 00:34:55,722 of the earlier psychologists thought people were doing-- 917 00:34:55,722 --> 00:34:57,110 was something like this. 918 00:34:57,110 --> 00:34:59,890 That the way we solve this problem, 919 00:34:59,890 --> 00:35:02,380 the way we bridged from our prior experience to new things 920 00:35:02,380 --> 00:35:06,860 we wanted to learn was not, say, by just computing 921 00:35:06,860 --> 00:35:08,860 the second order of statistics and correlations, 922 00:35:08,860 --> 00:35:11,276 and compressing that through some bottleneck hidden layer, 923 00:35:11,276 --> 00:35:12,970 but by building a more interesting 924 00:35:12,970 --> 00:35:15,850 structured probabilistic model that was, in some form, 925 00:35:15,850 --> 00:35:16,450 causal-- 926 00:35:16,450 --> 00:35:18,640 in some form-- in some form, compositional 927 00:35:18,640 --> 00:35:22,140 and hierarchical-- something kind of like this. 928 00:35:22,140 --> 00:35:26,800 And this is a good example of a hierarchical generative model. 929 00:35:26,800 --> 00:35:29,110 There's three layers of structure here. 930 00:35:29,110 --> 00:35:31,620 The bottom layer is the observable layer. 931 00:35:31,620 --> 00:35:34,510 So the arrows in these generative models point down, 932 00:35:34,510 --> 00:35:37,750 often, usually, where the thing on the bottom is the thing 933 00:35:37,750 --> 00:35:39,820 you observe, the data of your experience. 934 00:35:39,820 --> 00:35:42,460 And then the stuff above it are various levels of structure 935 00:35:42,460 --> 00:35:45,440 that your mind is positing to explain it. 936 00:35:45,440 --> 00:35:47,230 So here we have two levels of structure. 937 00:35:47,230 --> 00:35:50,590 The level above this is sort of this tree in your head. 938 00:35:50,590 --> 00:35:54,130 The idea-- it's like a certain kind of graph structure, 939 00:35:54,130 --> 00:35:56,654 where the objects, or the species, are the leaf nodes. 940 00:35:56,654 --> 00:35:58,570 And there's some internal nodes corresponding, 941 00:35:58,570 --> 00:36:00,760 maybe to higher level taxa, or groups, or something. 942 00:36:00,760 --> 00:36:02,860 You might have words for these, too, like mammal, 943 00:36:02,860 --> 00:36:04,930 or primate, or animal. 
944 00:36:04,930 --> 00:36:07,000 And the idea is that there's some kind 945 00:36:07,000 --> 00:36:08,890 of probabilistic model that you can describe, 946 00:36:08,890 --> 00:36:12,070 maybe even a causal one on top of that symbolic structure, 947 00:36:12,070 --> 00:36:16,000 that tree, that produces the data that's more directly 948 00:36:16,000 --> 00:36:17,622 observable, the observable features, 949 00:36:17,622 --> 00:36:19,330 including the things you've only sparsely 950 00:36:19,330 --> 00:36:20,867 observed and want to fill in. 951 00:36:20,867 --> 00:36:23,200 And then you might also have higher levels of structure. 952 00:36:23,200 --> 00:36:24,970 Like if you want to explain, how did you 953 00:36:24,970 --> 00:36:27,080 learn that tree in the first place, 954 00:36:27,080 --> 00:36:29,957 maybe it's because you have some kind of generative model 955 00:36:29,957 --> 00:36:31,040 for that generative model. 956 00:36:31,040 --> 00:36:32,680 So here I'm just using words to describe it, 957 00:36:32,680 --> 00:36:34,388 but I'll show you some other stuff in a-- 958 00:36:34,388 --> 00:36:37,600 or I'll show you something more formal a little bit later. 959 00:36:37,600 --> 00:36:40,120 But you could say, well, maybe the way 960 00:36:40,120 --> 00:36:42,100 I figure out that there's a tree structure is 961 00:36:42,100 --> 00:36:44,206 by having a hypothesis-- 962 00:36:44,206 --> 00:36:45,580 the way I figure out that there's 963 00:36:45,580 --> 00:36:49,480 that particular tree-structured graphical model of this domain 964 00:36:49,480 --> 00:36:51,520 is by having the more general hypothesis 965 00:36:51,520 --> 00:36:54,310 that there is some latent hierarchy of species. 966 00:36:54,310 --> 00:36:57,040 And I just have to figure out which one it is. 967 00:36:57,040 --> 00:36:59,380 So you could formulate this as a hierarchical inference 968 00:36:59,380 --> 00:37:00,940 by saying that what we're calling 969 00:37:00,940 --> 00:37:02,500 the form, the form of the model, it's 970 00:37:02,500 --> 00:37:05,230 like a hypothesis space of models, which are themselves 971 00:37:05,230 --> 00:37:08,890 hypothesis spaces of possible observed patterns of feature 972 00:37:08,890 --> 00:37:10,060 correlation. 973 00:37:10,060 --> 00:37:11,860 And that, that higher level knowledge, 974 00:37:11,860 --> 00:37:14,020 puts some kind of a generative model on these graph 975 00:37:14,020 --> 00:37:15,910 structures, where each graph structure then 976 00:37:15,910 --> 00:37:17,830 puts a generative model on the data you can observe. 977 00:37:17,830 --> 00:37:19,210 And then you could have even higher levels 978 00:37:19,210 --> 00:37:20,080 of this sort of thing. 979 00:37:20,080 --> 00:37:22,180 And then learning could go on at any or all levels 980 00:37:22,180 --> 00:37:25,399 of this hierarchy, higher than the level of experience. 981 00:37:25,399 --> 00:37:27,940 So just to show you a little bit about how this kind of thing 982 00:37:27,940 --> 00:37:31,870 works, what we're calling the probability of the data 983 00:37:31,870 --> 00:37:34,420 given the structure is actually exactly the same, really, 984 00:37:34,420 --> 00:37:37,300 as the model that Surya and Andrew Saxe used. 985 00:37:37,300 --> 00:37:40,960 The difference is that we were suggesting-- 986 00:37:40,960 --> 00:37:42,642 may be right, may be wrong-- 987 00:37:42,642 --> 00:37:44,350 that something like this generative model 988 00:37:44,350 --> 00:37:46,655 was actually in your head. 
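Written out generically, the three-level hierarchy just described factors into a prior over forms, a prior over structures given a form, and a likelihood of the observed data given a structure. This is a schematic statement of the idea, not the exact notation of the original papers:

\[
P(F, S, D) = P(F)\, P(S \mid F)\, P(D \mid S),
\qquad
P(S, F \mid D) \propto P(D \mid S)\, P(S \mid F)\, P(F),
\]

where F is the form (for example, "some latent tree over species"), S is a particular structure (a particular tree), and D is the observed object-feature matrix; learning can then happen at the level of S, of F, or both.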
989 00:37:46,655 --> 00:37:49,270 Surya presented a very simple abstraction 990 00:37:49,270 --> 00:37:52,480 of evolutionary branching process, a kind of diffusion 991 00:37:52,480 --> 00:37:54,940 over the tree, where properties could turn on or off. 992 00:37:54,940 --> 00:37:57,700 And we built basically that same kind of model. 993 00:37:57,700 --> 00:38:00,880 And we said, maybe you have something in your head 994 00:38:00,880 --> 00:38:05,230 as a model of, again, the distribution of properties, 995 00:38:05,230 --> 00:38:08,630 or features, or attributes over the leaf nodes of the tree. 996 00:38:08,630 --> 00:38:11,222 So if you have this kind of statistical model. 997 00:38:11,222 --> 00:38:13,180 If you think that there's something like a tree 998 00:38:13,180 --> 00:38:16,990 structure, and properties are produced over the leaf nodes 999 00:38:16,990 --> 00:38:19,739 by some kind of switching, on-and-off, mutation-like 1000 00:38:19,739 --> 00:38:22,030 process, then you can do something like in this picture 1001 00:38:22,030 --> 00:38:22,530 here. 1002 00:38:22,530 --> 00:38:25,030 You can take an observe a set of features in that matrix 1003 00:38:25,030 --> 00:38:26,440 and learn the best tree. 1004 00:38:26,440 --> 00:38:28,760 You can figure out that thing I'm showing on the top, 1005 00:38:28,760 --> 00:38:30,970 that structure, which is, in some sense, the best 1006 00:38:30,970 --> 00:38:34,000 guess of a tree structure-- a latent tree structure-- which 1007 00:38:34,000 --> 00:38:37,060 if you then define some kind of diffusion mutation 1008 00:38:37,060 --> 00:38:40,810 process over that tree, would produce with high probability 1009 00:38:40,810 --> 00:38:43,930 distributions of features like those shown there. 1010 00:38:43,930 --> 00:38:45,460 If I gave you a very different tree 1011 00:38:45,460 --> 00:38:47,450 it would produce other patterns of correlation. 1012 00:38:47,450 --> 00:38:48,927 And it's just like Surya said, it 1013 00:38:48,927 --> 00:38:51,010 can be all captured by the second order statistics 1014 00:38:51,010 --> 00:38:52,420 of feature correlations. 1015 00:38:52,420 --> 00:38:54,190 The nice thing about this is that now this 1016 00:38:54,190 --> 00:38:56,420 also gives a distribution on new properties. 1017 00:38:56,420 --> 00:38:57,280 So if I observe-- 1018 00:38:57,280 --> 00:38:59,302 because each column is conditionally independent 1019 00:38:59,302 --> 00:39:00,010 given that model. 1020 00:39:00,010 --> 00:39:01,690 Each column is an independent sample 1021 00:39:01,690 --> 00:39:04,400 from that generative model. 1022 00:39:04,400 --> 00:39:06,340 And the idea is if I observe a new property, 1023 00:39:06,340 --> 00:39:08,840 and I want to say, well, which other things have this, well, 1024 00:39:08,840 --> 00:39:11,740 I can make a guess on using that probabilistic model. 1025 00:39:11,740 --> 00:39:14,020 I can say, all right, given that I 1026 00:39:14,020 --> 00:39:16,180 know the value of this function over the tree, 1027 00:39:16,180 --> 00:39:18,851 this stochastic process, at some points, what 1028 00:39:18,851 --> 00:39:21,100 do I think the most likely values are at other points? 1029 00:39:21,100 --> 00:39:23,599 And basically, what you get is, again, like in the diffusion 1030 00:39:23,599 --> 00:39:25,750 process, a kind of similarity-based generalization 1031 00:39:25,750 --> 00:39:28,960 with a tree-structured metric, that nearby points in the tree 1032 00:39:28,960 --> 00:39:30,450 are likely to have the same value. 
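As a rough illustration of the kind of model being described, here is a brute-force sketch in Python: a small hand-made tree (not one learned from data), a simple switching process in which the property flips along each branch with some fixed probability, and the resulting Bayesian generalization from observed leaves to a query leaf. The tree, the flip probability, and the species names are all assumptions for illustration; the actual work learned the tree from the feature matrix and used a proper branch-length diffusion process.

```python
import itertools

# A tiny hand-made tree, just for illustration:
#            root
#           /    \
#       groupA   groupB
#       /    \   /    \
#  gorilla seal horse anteater
nodes = ["root", "groupA", "groupB", "gorilla", "seal", "horse", "anteater"]
edges = [("root", "groupA"), ("root", "groupB"),
         ("groupA", "gorilla"), ("groupA", "seal"),
         ("groupB", "horse"), ("groupB", "anteater")]

FLIP = 0.1    # assumed probability the property switches along any one branch
P_ROOT = 0.5  # assumed prior probability the property is "on" at the root

def joint_probability(assign):
    """Probability of one full on/off assignment to every node under the
    simple switching ("mutation-like") process over the tree."""
    p = P_ROOT if assign["root"] == 1 else 1.0 - P_ROOT
    for parent, child in edges:
        p *= FLIP if assign[parent] != assign[child] else 1.0 - FLIP
    return p

def generalize(query, observed_positive):
    """P(query leaf has the property | the observed leaves have it),
    by brute-force enumeration over all 2^7 node assignments."""
    numerator = denominator = 0.0
    for values in itertools.product([0, 1], repeat=len(nodes)):
        assign = dict(zip(nodes, values))
        if any(assign[leaf] == 0 for leaf in observed_positive):
            continue
        p = joint_probability(assign)
        denominator += p
        if assign[query] == 1:
            numerator += p
    return numerator / denominator

# Nearby leaves get more generalization than distant ones:
print(generalize("seal", ["gorilla"]))      # relatively high
print(generalize("anteater", ["gorilla"]))  # noticeably lower
```

The qualitative behavior is the one described next: leaves close in the tree to the observed examples get strong generalization, while distant leaves get much less.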
1033 00:39:30,450 --> 00:39:33,077 So in particular, things that are near to, say, species one 1034 00:39:33,077 --> 00:39:35,326 and nine are probably going to have the same property, 1035 00:39:35,326 --> 00:39:36,640 and others maybe less so. 1036 00:39:36,640 --> 00:39:38,150 And you build that model. 1037 00:39:38,150 --> 00:39:40,000 And it's really quite striking how much 1038 00:39:40,000 --> 00:39:41,292 it matches people's intuitions. 1039 00:39:41,292 --> 00:39:43,666 So now you're seeing the kinds of plots I was showing you 1040 00:39:43,666 --> 00:39:44,350 before, where-- 1041 00:39:44,350 --> 00:39:46,060 all my data plots look like this. 1042 00:39:46,060 --> 00:39:48,310 Whenever I'm showing the scatterplot, by default, 1043 00:39:48,310 --> 00:39:52,300 the y-axis is the average of a bunch of people's judgments, 1044 00:39:52,300 --> 00:39:55,660 and the x-axis is the model predictions on the same units 1045 00:39:55,660 --> 00:39:56,950 or scale. 1046 00:39:56,950 --> 00:39:58,330 And each of these scatterplots is 1047 00:39:58,330 --> 00:40:00,090 from a different experiment-- not done 1048 00:40:00,090 --> 00:40:02,610 by us, done by other people, like Osherson and Smith 1049 00:40:02,610 --> 00:40:03,840 from a couple of decades ago. 1050 00:40:06,360 --> 00:40:08,460 But they all sort of have the same kind of form, 1051 00:40:08,460 --> 00:40:10,800 where each dot is a different set of examples, 1052 00:40:10,800 --> 00:40:12,160 or a different argument. 1053 00:40:12,160 --> 00:40:14,550 And what typically varied within an experiment-- 1054 00:40:14,550 --> 00:40:15,750 you vary the examples. 1055 00:40:15,750 --> 00:40:18,810 And you fix constant the conclusion category. 1056 00:40:18,810 --> 00:40:21,240 And you see, basically, how much evidential support 1057 00:40:21,240 --> 00:40:23,130 to different sets of two or three examples 1058 00:40:23,130 --> 00:40:25,200 gives to a certain conclusion. 1059 00:40:25,200 --> 00:40:27,480 And it's really, again, quite striking that-- 1060 00:40:27,480 --> 00:40:29,070 sometimes in a more categorical way, 1061 00:40:29,070 --> 00:40:31,110 sometimes in a more graded way-- but basically, 1062 00:40:31,110 --> 00:40:32,820 people's average judgments here just 1063 00:40:32,820 --> 00:40:37,950 line up quite well with the sort of Bayesian inference 1064 00:40:37,950 --> 00:40:41,190 on this tree-structured generative model. 1065 00:40:41,190 --> 00:40:45,120 These are just examples of the kinds of stimuli here. 1066 00:40:45,120 --> 00:40:46,080 Now, we can compare. 1067 00:40:46,080 --> 00:40:47,280 One of the reasons why we were interested 1068 00:40:47,280 --> 00:40:49,654 in this was to compare, again, many different approaches. 1069 00:40:49,654 --> 00:40:52,110 So here I'm going to show you a comparison with just 1070 00:40:52,110 --> 00:40:53,340 a variant of our approach. 1071 00:40:53,340 --> 00:40:55,710 It's the same kind of hierarchical Bayesian model, 1072 00:40:55,710 --> 00:40:57,540 but now the structure isn't a tree, 1073 00:40:57,540 --> 00:41:00,360 it's a low-dimensional Euclidean space. 1074 00:41:00,360 --> 00:41:03,967 You can define the same kinds of proximity smoothness thing. 1075 00:41:03,967 --> 00:41:06,300 I mean, again, it's more a standard in machine learning. 1076 00:41:06,300 --> 00:41:10,500 It's related to Gaussian processes. 1077 00:41:10,500 --> 00:41:12,164 It's much more like neural networks. 
1078 00:41:12,164 --> 00:41:14,580 You could think of this as kind of like a Bayesian version 1079 00:41:14,580 --> 00:41:17,430 of a bottleneck hidden layer with two dimensions, 1080 00:41:17,430 --> 00:41:18,930 or a small number of dimensions. 1081 00:41:18,930 --> 00:41:21,974 The pictures that Surya showed you were all higher dimensions 1082 00:41:21,974 --> 00:41:23,640 than two dimensions in the latent space, 1083 00:41:23,640 --> 00:41:25,800 or the hidden variable space, of the neural network, 1084 00:41:25,800 --> 00:41:26,760 the hidden layer space. 1085 00:41:26,760 --> 00:41:28,920 But when he compress it down to two dimensions, 1086 00:41:28,920 --> 00:41:31,090 it looks pretty good. 1087 00:41:31,090 --> 00:41:32,340 So it's the same kind of idea. 1088 00:41:32,340 --> 00:41:37,370 Now what you're saying is you're going to find, 1089 00:41:37,370 --> 00:41:39,600 not the best tree that explains all these features, 1090 00:41:39,600 --> 00:41:41,520 but the best two-dimensional space. 1091 00:41:41,520 --> 00:41:43,680 Maybe it looks like this. 1092 00:41:43,680 --> 00:41:45,270 Where, again, the probabilistic model 1093 00:41:45,270 --> 00:41:47,594 says that things which are relatively-- things that 1094 00:41:47,594 --> 00:41:49,260 are closer in this two-dimensional space 1095 00:41:49,260 --> 00:41:51,218 are more likely to have the same feature value. 1096 00:41:51,218 --> 00:41:53,850 So you're basically explaining all the pairwise feature 1097 00:41:53,850 --> 00:41:57,340 correlations by distance in this space. 1098 00:41:57,340 --> 00:41:58,230 It's similar. 1099 00:41:58,230 --> 00:42:01,890 Importantly it's not as causal and compositional. 1100 00:42:01,890 --> 00:42:04,770 The tree models something about, possibly, the causal processes 1101 00:42:04,770 --> 00:42:06,570 of how organisms come to be. 1102 00:42:06,570 --> 00:42:08,870 If I told you that, oh, there's this-- 1103 00:42:08,870 --> 00:42:13,315 that I told you about a subspecies, like whatever-- 1104 00:42:13,315 --> 00:42:15,600 what's a good example-- 1105 00:42:15,600 --> 00:42:16,860 different breeds of dogs. 1106 00:42:16,860 --> 00:42:20,310 Or I told you that, oh, well, there's not just wolves, 1107 00:42:20,310 --> 00:42:22,710 but there's the gray-tailed wolf and the red-tailed wolf. 1108 00:42:22,710 --> 00:42:23,376 Red-tailed wolf? 1109 00:42:23,376 --> 00:42:23,940 I don't know. 1110 00:42:23,940 --> 00:42:26,340 Again, they're probably similar, but they might-- 1111 00:42:26,340 --> 00:42:28,452 one red-tailed wolf, whatever that is, more 1112 00:42:28,452 --> 00:42:29,910 similar to another red-tailed wolf, 1113 00:42:29,910 --> 00:42:31,409 probably has more features in common 1114 00:42:31,409 --> 00:42:33,420 than with a gray-tailed wolf, and probably more 1115 00:42:33,420 --> 00:42:35,004 to the gray-tailed wolf than to a dog. 1116 00:42:35,004 --> 00:42:37,461 The nice thing about a tree is I can tell you these things, 1117 00:42:37,461 --> 00:42:38,610 and you can, in your mind-- 1118 00:42:38,610 --> 00:42:40,530 maybe you'll never forget that there's a red-tailed wolf. 1119 00:42:40,530 --> 00:42:41,029 There isn't. 1120 00:42:41,029 --> 00:42:41,920 I just made it up. 
1121 00:42:41,920 --> 00:42:44,677 But if you ever find yourself thinking 1122 00:42:44,677 --> 00:42:47,010 about red-tailed wolves and whether their properties are 1123 00:42:47,010 --> 00:42:50,040 more or less similar to each other than to gray-tailed 1124 00:42:50,040 --> 00:42:52,529 wolves, or less so to dogs, or so on, 1125 00:42:52,529 --> 00:42:54,070 it's because I just said some things, 1126 00:42:54,070 --> 00:42:55,736 and you grew out your tree in your mind. 1127 00:42:55,736 --> 00:43:00,187 That's a lot harder to do in a low-dimensional space. 1128 00:43:00,187 --> 00:43:01,770 And it turns out that, that model also 1129 00:43:01,770 --> 00:43:03,000 fits this data less well. 1130 00:43:03,000 --> 00:43:05,140 So here I'm just showing two of those experiments. 1131 00:43:05,140 --> 00:43:06,806 Some of them are well fit by that model, 1132 00:43:06,806 --> 00:43:09,182 but others are less well fit. 1133 00:43:09,182 --> 00:43:10,890 Now, that's not to say that they wouldn't 1134 00:43:10,890 --> 00:43:11,710 be good for other things. 1135 00:43:11,710 --> 00:43:12,930 So we also did some experiments. 1136 00:43:12,930 --> 00:43:14,250 This was experiments that we did. 1137 00:43:14,250 --> 00:43:15,930 Oh, actually, I forgot to say, really importantly, 1138 00:43:15,930 --> 00:43:18,060 this was all worked done by Charles Kemp, who's 1139 00:43:18,060 --> 00:43:19,860 now a professor at CMU. 1140 00:43:19,860 --> 00:43:24,870 And it was part of the stuff that he did in his PhD thesis. 1141 00:43:24,870 --> 00:43:28,430 So we were interested in this as a way, not to study trees, 1142 00:43:28,430 --> 00:43:30,680 but to study a range of different kinds of structures. 1143 00:43:30,680 --> 00:43:33,280 And it is true, going back, I guess, to the question 1144 00:43:33,280 --> 00:43:35,280 you asked, this is what I was referring to about 1145 00:43:35,280 --> 00:43:36,390 low-dimensional manifolds. 1146 00:43:36,390 --> 00:43:38,520 There are some kinds of knowledge representations 1147 00:43:38,520 --> 00:43:40,830 we have which might have a low-dimensional spatial 1148 00:43:40,830 --> 00:43:44,340 structure, in particular, like mental maps of the world. 1149 00:43:44,340 --> 00:43:48,726 So our intuitive models of the Earth's surface, and things 1150 00:43:48,726 --> 00:43:50,850 which might be distributed over the Earth's surface 1151 00:43:50,850 --> 00:43:53,250 spatially, a two-dimensional map is probably 1152 00:43:53,250 --> 00:43:55,290 a good one for that. 1153 00:43:55,290 --> 00:43:58,590 So here we considered a similar kind of concept 1154 00:43:58,590 --> 00:44:02,794 learning from a few examples task, where we said-- 1155 00:44:02,794 --> 00:44:03,960 but now we put it like this. 1156 00:44:03,960 --> 00:44:05,459 We said, suppose that a certain kind 1157 00:44:05,459 --> 00:44:07,170 of Native American artifact has been 1158 00:44:07,170 --> 00:44:09,240 found in sites near city x. 1159 00:44:09,240 --> 00:44:12,480 How likely is it also to be found in sites near city y? 1160 00:44:12,480 --> 00:44:16,650 Or we could say sites near city x and y, how about city z. 1161 00:44:16,650 --> 00:44:21,390 And we told people that different Native American 1162 00:44:21,390 --> 00:44:25,200 tribes maybe had-- some lived in a very small area, some 1163 00:44:25,200 --> 00:44:26,487 lived in a very big area. 1164 00:44:26,487 --> 00:44:28,320 Some lived in one place, some another place. 1165 00:44:28,320 --> 00:44:30,309 Some lived here, and then moved there. 
1166 00:44:30,309 --> 00:44:32,850 We just told people very vague things that tap into people's 1167 00:44:32,850 --> 00:44:35,730 probably badly remembered, and very distorted, 1168 00:44:35,730 --> 00:44:39,930 versions of American history that would basically suggest 1169 00:44:39,930 --> 00:44:42,090 that there should be some kind of similar kind 1170 00:44:42,090 --> 00:44:44,010 of spatial diffusion process, but now 1171 00:44:44,010 --> 00:44:46,320 in your 2D mental map of cities. 1172 00:44:46,320 --> 00:44:50,040 So again, there's no claim that there's any reality to this, 1173 00:44:50,040 --> 00:44:51,137 or fine-grained reality. 1174 00:44:51,137 --> 00:44:53,220 But we thought it would sort of roughly correspond 1175 00:44:53,220 --> 00:44:55,410 to people's internal causal generative 1176 00:44:55,410 --> 00:44:58,800 models of archeology. 1177 00:44:58,800 --> 00:45:00,900 Again, I think it says something about the way 1178 00:45:00,900 --> 00:45:02,566 human intelligence works that none of us 1179 00:45:02,566 --> 00:45:05,504 are archaeologists, probably, but we still have these ideas. 1180 00:45:05,504 --> 00:45:07,920 And it turned out that, here, a spatially structured model 1181 00:45:07,920 --> 00:45:08,940 actually works a lot better. 1182 00:45:08,940 --> 00:45:10,356 Again, it shouldn't be surprising. 1183 00:45:10,356 --> 00:45:13,816 It's just showing that actually, the way-- the judgments 1184 00:45:13,816 --> 00:45:16,440 people make when they're making inferences from a few examples, 1185 00:45:16,440 --> 00:45:17,981 just like you saw with predicting 1186 00:45:17,981 --> 00:45:20,070 the everyday events, but now in the much more 1187 00:45:20,070 --> 00:45:22,230 interestingly structured domain, is 1188 00:45:22,230 --> 00:45:25,200 sensitive to the different kinds of environmental statistics. 1189 00:45:25,200 --> 00:45:28,950 There it was different power laws versus Gaussians for cake 1190 00:45:28,950 --> 00:45:34,080 baking-- or for movie grosses versus lifetimes or something. 1191 00:45:34,080 --> 00:45:35,580 Here it's other stuff. 1192 00:45:35,580 --> 00:45:38,250 It's more interestingly structured kinds of knowledge. 1193 00:45:38,250 --> 00:45:40,050 But you see the same kind of picture. 1194 00:45:40,050 --> 00:45:41,760 And we thought that was interesting, 1195 00:45:41,760 --> 00:45:43,979 and again, suggests some of the ways 1196 00:45:43,979 --> 00:45:46,020 that we are starting to put these tools together, 1197 00:45:46,020 --> 00:45:48,019 putting together probabilistic generative models 1198 00:45:48,019 --> 00:45:50,340 with some kind of interestingly structured knowledge. 1199 00:45:50,340 --> 00:45:52,140 Now, again, as you saw from Surya, 1200 00:45:52,140 --> 00:45:54,740 and as Jay McClelland and Tim Rogers worked on, 1201 00:45:54,740 --> 00:45:56,490 you can try to capture a lot of this stuff 1202 00:45:56,490 --> 00:45:57,240 with neural networks. 1203 00:45:57,240 --> 00:45:59,781 The neat thing about the neural networks that these guys have 1204 00:45:59,781 --> 00:46:03,090 worked on is that exactly the same neural network can 1205 00:46:03,090 --> 00:46:05,640 capture this kind of thing, and it 1206 00:46:05,640 --> 00:46:07,380 can capture this kind of thing. 1207 00:46:07,380 --> 00:46:11,400 So you can train the very same hidden multilayer neural 1208 00:46:11,400 --> 00:46:16,260 network with one matrix of objects and features.
1209 00:46:16,260 --> 00:46:17,790 And the very same neural network can 1210 00:46:17,790 --> 00:46:21,416 predict the tree-structured patterns for animals 1211 00:46:21,416 --> 00:46:23,790 and their properties, as well as the spatially-structured 1212 00:46:23,790 --> 00:46:27,090 patterns for Native American artifacts and their cities. 1213 00:46:27,090 --> 00:46:31,380 The catch is that it doesn't do either of them that well. 1214 00:46:31,380 --> 00:46:35,900 It doesn't do as well as the tree-structured models do 1215 00:46:35,900 --> 00:46:36,630 for peop-- 1216 00:46:36,630 --> 00:46:37,950 when I say either, it doesn't do that well, I 1217 00:46:37,950 --> 00:46:39,450 mean, in capturing people's judgments. 1218 00:46:39,450 --> 00:46:41,670 It doesn't do as well as the best tree-structured models do 1219 00:46:41,670 --> 00:46:43,919 for people's concepts of animals and their properties. 1220 00:46:43,919 --> 00:46:46,890 And it doesn't do as well as the best spacial structures. 1221 00:46:46,890 --> 00:46:50,130 But again, it's in the same spirit as the DeepMind networks 1222 00:46:50,130 --> 00:46:51,550 for playing lots of Atari games. 1223 00:46:51,550 --> 00:46:53,550 The idea there is to have the same network solve 1224 00:46:53,550 --> 00:46:54,734 all these different tasks. 1225 00:46:54,734 --> 00:46:56,650 And in some sense, I think that's a good idea. 1226 00:46:56,650 --> 00:46:58,709 I just think that the architecture should 1227 00:46:58,709 --> 00:47:00,000 have a more flexible structure. 1228 00:47:00,000 --> 00:47:02,100 So we would also say, in some sense, 1229 00:47:02,100 --> 00:47:04,560 the same architecture is solving all these different tasks. 1230 00:47:04,560 --> 00:47:07,200 It's just that this is one setting of it. 1231 00:47:07,200 --> 00:47:10,260 And this is another setting of it. 1232 00:47:10,260 --> 00:47:12,720 And where they differ is in the kind of structure 1233 00:47:12,720 --> 00:47:14,220 that-- well, they differ in the fact 1234 00:47:14,220 --> 00:47:16,502 that they explicitly represent structure in the world. 1235 00:47:16,502 --> 00:47:18,960 And they explicitly represent different kinds of structure. 1236 00:47:18,960 --> 00:47:21,043 And they explicitly represent that different kinds 1237 00:47:21,043 --> 00:47:23,895 of structure are appropriate to different kinds of domains 1238 00:47:23,895 --> 00:47:26,520 in the world and our intuitions about the causal processes that 1239 00:47:26,520 --> 00:47:28,470 are at work producing the data. 1240 00:47:28,470 --> 00:47:29,970 And I think that, again, that's sort 1241 00:47:29,970 --> 00:47:32,136 of the difference between the pattern classification 1242 00:47:32,136 --> 00:47:33,840 and the understanding or explaining 1243 00:47:33,840 --> 00:47:37,680 view of intelligence. 1244 00:47:37,680 --> 00:47:40,320 The explanations, of course, go a lot beyond different ways 1245 00:47:40,320 --> 00:47:42,120 that similarity can be structured. 1246 00:47:42,120 --> 00:47:44,220 So one of the kind of nice things-- oh, 1247 00:47:44,220 --> 00:47:46,380 and I guess another-- 1248 00:47:46,380 --> 00:47:48,360 two other points beyond that. 1249 00:47:48,360 --> 00:47:50,070 One is that to get the neural networks 1250 00:47:50,070 --> 00:47:52,319 to do that, you have to train them with a lot of data. 
1251 00:47:52,319 --> 00:47:55,380 Remember, Surya, as Tommy pushed him on in that talk, 1252 00:47:55,380 --> 00:47:57,570 Surya was very concerned with modeling 1253 00:47:57,570 --> 00:48:01,650 the dynamics of learning in the sense of the optimization time 1254 00:48:01,650 --> 00:48:04,410 course, how the weights change over time. 1255 00:48:04,410 --> 00:48:06,750 But he was usually looking at infinite data. 1256 00:48:06,750 --> 00:48:09,300 So he was assuming that you had, effectively, 1257 00:48:09,300 --> 00:48:11,940 an infinite number of columns of any of these matrices. 1258 00:48:11,940 --> 00:48:13,920 So you could perfectly compute the statistics. 1259 00:48:13,920 --> 00:48:15,920 And another important thing about the difference 1260 00:48:15,920 --> 00:48:18,503 being the neural network models and the ones I was showing you 1261 00:48:18,503 --> 00:48:21,030 is that, suppose you want to train the model, not 1262 00:48:21,030 --> 00:48:24,780 on an infinite matrix, but on a small finite one, 1263 00:48:24,780 --> 00:48:26,460 and maybe one with missing data. 1264 00:48:26,460 --> 00:48:28,710 It's a lot harder to get the-- the neural network will 1265 00:48:28,710 --> 00:48:32,940 do a much poorer job capturing the structure than these more 1266 00:48:32,940 --> 00:48:33,760 structured models. 1267 00:48:33,760 --> 00:48:35,750 And again, in a way that's familiar with-- 1268 00:48:35,750 --> 00:48:38,820 have you guys talked about bias-variance dilemma? 1269 00:48:38,820 --> 00:48:42,249 So it's that same kind of idea that you probably 1270 00:48:42,249 --> 00:48:43,290 heard about from Lorenzo. 1271 00:48:43,290 --> 00:48:45,520 Was it Lorenzo or one of the machin learni-- 1272 00:48:45,520 --> 00:48:46,020 OK. 1273 00:48:46,020 --> 00:48:48,019 So it's that same kind of idea, but now applying 1274 00:48:48,019 --> 00:48:50,370 in this interesting case of structured estimation 1275 00:48:50,370 --> 00:48:52,440 of generative models for the world, 1276 00:48:52,440 --> 00:48:56,340 that if you have relatively little data, and sparse data, 1277 00:48:56,340 --> 00:48:59,822 then having a more structured inductive bi-- 1278 00:48:59,822 --> 00:49:02,280 having the inductive bias that comes from a more structured 1279 00:49:02,280 --> 00:49:06,030 representation is going to be much more valuable when 1280 00:49:06,030 --> 00:49:08,935 you have sparse and noisy data. 1281 00:49:08,935 --> 00:49:11,310 The key-- and again, this is something that Charles and I 1282 00:49:11,310 --> 00:49:13,590 were really interested in-- is we wanted to-- 1283 00:49:13,590 --> 00:49:16,080 like the DeepMind people, like the connectionists, 1284 00:49:16,080 --> 00:49:19,080 we wanted to build general purpose semantic cognition, 1285 00:49:19,080 --> 00:49:23,340 wanted to build general purpose learning and reasoning systems. 1286 00:49:23,340 --> 00:49:25,260 And we wanted to somehow figure out 1287 00:49:25,260 --> 00:49:27,843 how you could have the best of both worlds, how you could have 1288 00:49:27,843 --> 00:49:31,170 a system that relatively quickly could come 1289 00:49:31,170 --> 00:49:34,380 to get the right kind of strong constraint-inductive bias 1290 00:49:34,380 --> 00:49:37,110 in some domain, and a different one for a different domain, 1291 00:49:37,110 --> 00:49:38,700 yet could learn in a flexible way 1292 00:49:38,700 --> 00:49:41,580 to capture the different structure in different domains. 1293 00:49:41,580 --> 00:49:43,060 More on that in a little bit. 
1294 00:49:43,060 --> 00:49:45,150 But the other thing I wanted to talk about here 1295 00:49:45,150 --> 00:49:47,970 is just ways in which our mental models, 1296 00:49:47,970 --> 00:49:49,860 our causal and compositional ones, 1297 00:49:49,860 --> 00:49:51,991 go beyond just similarity. 1298 00:49:51,991 --> 00:49:53,490 I guess, since time is short-- well, 1299 00:49:53,490 --> 00:49:55,698 I was planning to go through this relatively quickly. 1300 00:49:55,698 --> 00:49:57,902 But anyway, mostly I'll just gesture towards this. 1301 00:49:57,902 --> 00:49:59,360 And if you're interested, you could 1302 00:49:59,360 --> 00:50:02,200 read the papers that Charles has, or his thesis. 1303 00:50:02,200 --> 00:50:05,690 But here, there's a long history of asking people 1304 00:50:05,690 --> 00:50:07,460 to make these kind of judgments, in which 1305 00:50:07,460 --> 00:50:10,730 the basis for the judgment isn't something like similarity, 1306 00:50:10,730 --> 00:50:12,599 but some other kind of causal reasoning. 1307 00:50:12,599 --> 00:50:14,390 So for example, consider these things here. 1308 00:50:14,390 --> 00:50:17,234 Poodles can bite through wire, therefore German shepherds 1309 00:50:17,234 --> 00:50:18,150 can bite through wire. 1310 00:50:18,150 --> 00:50:19,889 Is that a strong argument or weak? 1311 00:50:19,889 --> 00:50:22,430 Compare that with, dobermans can bite through wire, therefore 1312 00:50:22,430 --> 00:50:24,080 German shepherds can bite through wire. 1313 00:50:24,080 --> 00:50:26,600 So how many people think that the top argument is a stronger 1314 00:50:26,600 --> 00:50:27,890 one? 1315 00:50:27,890 --> 00:50:30,320 How many people think the bottom line is a stronger one? 1316 00:50:30,320 --> 00:50:31,100 So that's typical. 1317 00:50:31,100 --> 00:50:33,980 About twice as many people prefer the top one. 1318 00:50:33,980 --> 00:50:36,560 Because intuitively-- do I have a little thing 1319 00:50:36,560 --> 00:50:37,370 that will appear? 1320 00:50:37,370 --> 00:50:40,262 Intuitively, anyone want to explain why you thought so? 1321 00:50:44,198 --> 00:50:45,680 AUDIENCE: Poodles are really small. 1322 00:50:45,680 --> 00:50:47,330 JOSH TENENBAUM: Poodles are small or weak. 1323 00:50:47,330 --> 00:50:47,630 Yes. 1324 00:50:47,630 --> 00:50:49,296 And German shepherds are big and strong. 1325 00:50:49,296 --> 00:50:50,817 And what about dobermans? 1326 00:50:50,817 --> 00:50:52,900 AUDIENCE: They're just as big as German shepherds. 1327 00:50:52,900 --> 00:50:53,260 JOSH TENENBAUM: Yeah. 1328 00:50:53,260 --> 00:50:53,860 That's right. 1329 00:50:53,860 --> 00:50:56,291 So they're more similar to German shepherds, 1330 00:50:56,291 --> 00:50:57,790 because they're both big and strong. 1331 00:50:57,790 --> 00:51:01,327 But notice that something very different is going on here. 1332 00:51:01,327 --> 00:51:02,410 It's not about similarity. 1333 00:51:02,410 --> 00:51:04,180 It's sort of anti-similarity. 1334 00:51:04,180 --> 00:51:05,990 But it's not just anti-similarity. 1335 00:51:05,990 --> 00:51:09,160 Suppose I said, German shepherds can bite through wire, 1336 00:51:09,160 --> 00:51:11,320 therefore poodles can bite through wire. 1337 00:51:11,320 --> 00:51:12,489 Is that a good argument? 1338 00:51:12,489 --> 00:51:13,030 AUDIENCE: No. 1339 00:51:13,030 --> 00:51:13,630 It's an argument against. 1340 00:51:13,630 --> 00:51:13,750 JOSH TENENBAUM: No. 1341 00:51:13,750 --> 00:51:15,850 It's sort of a terrible argument, right? 
1342 00:51:15,850 --> 00:51:18,970 So there's some kind of asymmetric dimensional 1343 00:51:18,970 --> 00:51:20,380 reasoning going on. 1344 00:51:20,380 --> 00:51:23,470 Or similarly, if I said, which of these seems better 1345 00:51:23,470 --> 00:51:27,120 intuitively; Salmon carry some bacteria, 1346 00:51:27,120 --> 00:51:29,350 therefore grizzly bears are likely to carry it, 1347 00:51:29,350 --> 00:51:31,180 versus grizzly bears carry this, therefore 1348 00:51:31,180 --> 00:51:32,890 salmon are likely to carry it. 1349 00:51:32,890 --> 00:51:36,280 How many people say salmon, therefore grizzly bears? 1350 00:51:36,280 --> 00:51:38,560 How many people say grizzly bears, therefore salmon? 1351 00:51:38,560 --> 00:51:39,870 How do you know? 1352 00:51:39,870 --> 00:51:41,170 Those who-- yeah, you're right. 1353 00:51:41,170 --> 00:51:43,128 I mean, you're right in that's what people say. 1354 00:51:43,128 --> 00:51:44,427 I don't know if it's right. 1355 00:51:44,427 --> 00:51:45,260 Again, I made it up. 1356 00:51:45,260 --> 00:51:47,170 But why did you say that, those of you who said salmon? 1357 00:51:47,170 --> 00:51:48,430 AUDIENCE: Bears eat salmon. 1358 00:51:48,430 --> 00:51:49,180 JOSH TENENBAUM: Bears eat salmon. 1359 00:51:49,180 --> 00:51:49,680 Yeah. 1360 00:51:49,680 --> 00:51:52,600 So assuming that's true, so we're told or see on TV, 1361 00:51:52,600 --> 00:51:53,140 then yeah. 1362 00:51:53,140 --> 00:51:55,199 So anyway, these are these different kinds 1363 00:51:55,199 --> 00:51:56,365 of things that are going on. 1364 00:51:56,365 --> 00:51:58,736 And to cut to the chase, what we showed 1365 00:51:58,736 --> 00:52:01,360 is that you could capture these different patterns of reasoning 1366 00:52:01,360 --> 00:52:04,190 with, again, the same kind of thing, but different. 1367 00:52:04,190 --> 00:52:08,230 It's also a hierarchical generative model. 1368 00:52:08,230 --> 00:52:10,480 It also has, the key level of the hierarchy 1369 00:52:10,480 --> 00:52:13,630 is some kind of directed graphical structure 1370 00:52:13,630 --> 00:52:16,012 that generates distribution on observable properties. 1371 00:52:16,012 --> 00:52:18,220 But it's a fundamentally different kind of structure. 1372 00:52:18,220 --> 00:52:20,680 It's not just a tree or a space. 1373 00:52:20,680 --> 00:52:22,420 It might be a different kind of graph 1374 00:52:22,420 --> 00:52:23,890 and a different kind of process. 1375 00:52:23,890 --> 00:52:26,740 So to be a little bit more technical, the things 1376 00:52:26,740 --> 00:52:29,400 I showed you with the tree and the low-dimensional space, 1377 00:52:29,400 --> 00:52:31,540 they had a different geometry to the graph, 1378 00:52:31,540 --> 00:52:34,090 but the same stochastic process operating over it. 1379 00:52:34,090 --> 00:52:36,730 It was, in both cases, basically a diffusion process. 1380 00:52:36,730 --> 00:52:39,292 Whereas to get the kinds of reasoning that you saw here, 1381 00:52:39,292 --> 00:52:40,750 you need a different kind of graph. 1382 00:52:40,750 --> 00:52:44,020 In one case it's like a chain to capture a dimension of strength 1383 00:52:44,020 --> 00:52:45,230 or size, say. 1384 00:52:45,230 --> 00:52:47,494 In the other case, it's some kind of food web thing. 1385 00:52:47,494 --> 00:52:48,160 It's not a tree. 1386 00:52:48,160 --> 00:52:50,860 It's that kind of directed network. 1387 00:52:50,860 --> 00:52:52,750 But you also need a different process. 
1388 00:52:52,750 --> 00:52:54,580 So the ways-- the kind of probability model 1389 00:52:54,580 --> 00:52:55,840 you define over it is different. 1390 00:52:55,840 --> 00:52:57,460 And it's easy to see on the-- 1391 00:52:57,460 --> 00:53:00,970 for example-- on the reasoning with these threshold things, 1392 00:53:00,970 --> 00:53:02,710 like the strength properties, if you 1393 00:53:02,710 --> 00:53:07,870 compare a 1D chain with just symmetric diffusion, 1394 00:53:07,870 --> 00:53:09,280 you get a much worse fit to people's 1395 00:53:09,280 --> 00:53:11,410 judgments than if you'd used what we called this drift 1396 00:53:11,410 --> 00:53:13,576 threshold thing, which is basically a way of saying, 1397 00:53:13,576 --> 00:53:14,890 OK, I don't know. 1398 00:53:14,890 --> 00:53:17,407 There's some mapping from strength to being 1399 00:53:17,407 --> 00:53:18,490 able to bite through wire. 1400 00:53:18,490 --> 00:53:19,820 I don't know exactly what it is. 1401 00:53:19,820 --> 00:53:21,340 But the higher up you go on one, it's 1402 00:53:21,340 --> 00:53:23,006 probably more likely that you can bite-- 1403 00:53:23,006 --> 00:53:24,497 that you can do the other. 1404 00:53:24,497 --> 00:53:26,830 So that provides a wonderful model of people's judgments 1405 00:53:26,830 --> 00:53:28,660 on these kinds of tasks. 1406 00:53:28,660 --> 00:53:30,700 But that sort of diffusion process, 1407 00:53:30,700 --> 00:53:34,510 like if it was like mutation in biology, then that 1408 00:53:34,510 --> 00:53:35,980 would provide a very bad model. 1409 00:53:35,980 --> 00:53:38,320 That's the second row here. 1410 00:53:38,320 --> 00:53:41,830 Similarly, this sort of directed kind of noisy transmission 1411 00:53:41,830 --> 00:53:46,890 process on a food web does a great job of modeling people's 1412 00:53:46,890 --> 00:53:48,730 judgments about diseases, but not 1413 00:53:48,730 --> 00:53:51,520 a very good way of modeling people's judgments 1414 00:53:51,520 --> 00:53:53,230 about these biological properties. 1415 00:53:53,230 --> 00:53:55,300 But the tree models you saw before that 1416 00:53:55,300 --> 00:53:56,770 do a great job of modeling people's 1417 00:53:56,770 --> 00:53:58,520 judgments about the properties of animals, 1418 00:53:58,520 --> 00:54:01,420 they do a lousy job of modeling these disease judgments. 1419 00:54:01,420 --> 00:54:04,060 So we have this picture emerging that, at the time, 1420 00:54:04,060 --> 00:54:05,950 was very satisfying to us. 1421 00:54:05,950 --> 00:54:08,380 That, hey, we can take this domain of, say, 1422 00:54:08,380 --> 00:54:10,870 animals and their properties, or the various things we 1423 00:54:10,870 --> 00:54:14,110 can reason about, and there's a lot of different ways 1424 00:54:14,110 --> 00:54:15,940 we can reason about just this one domain. 1425 00:54:15,940 --> 00:54:18,942 And by building these structured probabilistic models 1426 00:54:18,942 --> 00:54:20,650 with different kinds of graph structures 1427 00:54:20,650 --> 00:54:22,990 that capture different kinds of causal processes, 1428 00:54:22,990 --> 00:54:25,690 we could really describe a lot of different kinds 1429 00:54:25,690 --> 00:54:27,640 of reasoning. 1430 00:54:27,640 --> 00:54:29,249 And we saw this as part of a theme 1431 00:54:29,249 --> 00:54:31,040 that a lot of other people were working on. 1432 00:54:31,040 --> 00:54:32,590 So this is-- I mentioned this before, 1433 00:54:32,590 --> 00:54:35,440 but now I'm just sort of throwing it all out there.
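To make the asymmetry concrete, here is a deliberately simplified sketch in Python of a threshold-style model on a 1D strength ordering. It is not the drift process actually used in that work, just about the simplest version that reproduces the poodle/German shepherd asymmetry; the strength values and the threshold grid are made-up numbers.

```python
# A simplified threshold model over a 1D "strength" ordering (illustrative
# numbers only; the work described above used a drift process over a chain).
strength = {"poodle": 1, "doberman": 4, "german_shepherd": 5}

# Hypotheses: the property ("can bite through wire") holds for every animal
# whose strength exceeds some unknown threshold t, with a uniform prior on t.
thresholds = [0, 1, 2, 3, 4, 5]

def argument_strength(conclusion, premise):
    """P(conclusion animal has the property | premise animal has it)."""
    consistent = [t for t in thresholds if strength[premise] > t]
    agree = [t for t in consistent if strength[conclusion] > t]
    return len(agree) / len(consistent)

# "Poodles can bite through wire, therefore German shepherds can":
print(argument_strength("german_shepherd", premise="poodle"))  # 1.0
# "German shepherds can bite through wire, therefore poodles can":
print(argument_strength("poodle", premise="german_shepherd"))  # 0.2
```

A symmetric diffusion process along the same chain would give the same answer in both directions, which is exactly why it fits these asymmetric judgments badly.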
1434 00:54:35,440 --> 00:54:36,970 A lot of people at the time-- again, 1435 00:54:36,970 --> 00:54:40,690 this is maybe somewhere between 5 to 10 years ago-- 1436 00:54:40,690 --> 00:54:43,930 more like six or seven years ago-- 1437 00:54:43,930 --> 00:54:46,930 we're extremely interested in this general view 1438 00:54:46,930 --> 00:54:49,540 of common sense reasoning and semantic cognition 1439 00:54:49,540 --> 00:54:51,599 by basically taking big matrices and boiling them 1440 00:54:51,599 --> 00:54:53,140 down to some kind of graph structure. 1441 00:54:53,140 --> 00:54:55,699 In some form, that's what Tom Mitchell was doing, 1442 00:54:55,699 --> 00:54:57,490 not just in the talk you saw, but remember, 1443 00:54:57,490 --> 00:54:59,380 he said there's this other stuff he does-- 1444 00:54:59,380 --> 00:55:01,960 this thing called NELL, the Never Ending Language Learner. 1445 00:55:01,960 --> 00:55:03,460 I'm showing a little glimpse of that 1446 00:55:03,460 --> 00:55:06,730 up there from a New York Times piece on it in the upper right. 1447 00:55:06,730 --> 00:55:09,220 In some ways, in a sort of at least more implicit way, 1448 00:55:09,220 --> 00:55:11,560 it's what the neural networks that Jay McClelland, Tim 1449 00:55:11,560 --> 00:55:14,156 Rogers, Surya were talking about do. 1450 00:55:14,156 --> 00:55:16,030 And we thought-- you know, we had good reason 1451 00:55:16,030 --> 00:55:17,830 to think that our approach was more 1452 00:55:17,830 --> 00:55:20,630 like what people were doing than some of these others. 1453 00:55:20,630 --> 00:55:23,380 But I then came to see-- and this was around the time when 1454 00:55:23,380 --> 00:55:25,367 CBMM was actually getting started-- 1455 00:55:25,367 --> 00:55:26,950 that none of these were going to work. 1456 00:55:26,950 --> 00:55:29,680 Like the whole thing was just not going to work. 1457 00:55:29,680 --> 00:55:32,370 Liz was one of the main people who convinced me of this. 1458 00:55:32,370 --> 00:55:34,120 But you could just read the New York Times 1459 00:55:34,120 --> 00:55:37,210 article on Tom Mitchell's piece, and you can see what's missing. 1460 00:55:37,210 --> 00:55:40,750 So there's Tom, remember. 1461 00:55:40,750 --> 00:55:43,510 This was an article from 2010. 1462 00:55:43,510 --> 00:55:46,070 Just to set the chronology right, that was right around-- 1463 00:55:46,070 --> 00:55:48,490 a little bit after Charles had finished all that nice work 1464 00:55:48,490 --> 00:55:51,440 I showed you, which again, I still think is valuable. 1465 00:55:51,440 --> 00:55:53,260 I think it is capturing something 1466 00:55:53,260 --> 00:55:55,390 about what's going on. 1467 00:55:55,390 --> 00:55:57,520 It was very appealing to people, like at Google, 1468 00:55:57,520 --> 00:56:00,490 because these knowledge graphs are very much like the way, 1469 00:56:00,490 --> 00:56:02,380 around the same time, Google was starting 1470 00:56:02,380 --> 00:56:04,690 to try to put more semantics into web search-- 1471 00:56:04,690 --> 00:56:07,330 again, connected to the work that Tom was doing. 1472 00:56:07,330 --> 00:56:09,820 And there was this nice article in the New York 1473 00:56:09,820 --> 00:56:12,850 Times talking about how they built their system by reading 1474 00:56:12,850 --> 00:56:13,780 the web. 1475 00:56:13,780 --> 00:56:15,387 But the best part of it was describing 1476 00:56:15,387 --> 00:56:16,970 one of the mistakes their system made. 1477 00:56:16,970 --> 00:56:18,949 So let me just show this to you. 
1478 00:56:18,949 --> 00:56:20,740 About knowledge that's obvious to a person, 1479 00:56:20,740 --> 00:56:22,540 but not to a computer-- again, it's 1480 00:56:22,540 --> 00:56:25,566 Tom Mitchell himself describing this. 1481 00:56:25,566 --> 00:56:27,940 And the challenge of, that's where NELL has to be headed, 1482 00:56:27,940 --> 00:56:30,610 is how to make the things that are obvious to people 1483 00:56:30,610 --> 00:56:33,160 obvious to computers. 1484 00:56:33,160 --> 00:56:35,590 He gives this example of a bug that 1485 00:56:35,590 --> 00:56:38,560 happened in NELL's early life. 1486 00:56:38,560 --> 00:56:40,980 The research team noticed that-- 1487 00:56:40,980 --> 00:56:42,990 oh, let's skip down there. 1488 00:56:42,990 --> 00:56:44,330 So, a particular example-- 1489 00:56:44,330 --> 00:56:47,920 when Dr. Mitchell scanned the baked goods category recently, 1490 00:56:47,920 --> 00:56:50,050 he noticed a clear pattern. 1491 00:56:50,050 --> 00:56:51,760 NELL was at first quite accurate, easily 1492 00:56:51,760 --> 00:56:54,420 identifying all kinds of pies, breads, cakes, and cookies 1493 00:56:54,420 --> 00:56:55,570 as baked goods. 1494 00:56:55,570 --> 00:56:58,620 But things went awry after NELL's noun phrase classifier 1495 00:56:58,620 --> 00:57:00,370 decided internet cookies was a baked good. 1496 00:57:03,100 --> 00:57:06,100 NELL had read the sentence "I deleted my internet cookies." 1497 00:57:06,100 --> 00:57:07,960 And again, think of that as, it's kind 1498 00:57:07,960 --> 00:57:09,340 of like a simple proposition. 1499 00:57:09,340 --> 00:57:10,180 It's like, OK. 1500 00:57:10,180 --> 00:57:12,184 But the way it parses that is cookies 1501 00:57:12,184 --> 00:57:14,350 are things that can be deleted, the same way you can 1502 00:57:14,350 --> 00:57:16,040 say horses have T9 hormones. 1503 00:57:16,040 --> 00:57:17,500 It's basically just a matrix. 1504 00:57:17,500 --> 00:57:21,220 And the concept is internet cookies. 1505 00:57:21,220 --> 00:57:23,620 And then there's the property of can be deleted, 1506 00:57:23,620 --> 00:57:25,545 or something like that. 1507 00:57:25,545 --> 00:57:27,920 And it knows something about natural language processing. 1508 00:57:27,920 --> 00:57:28,644 So it can see-- 1509 00:57:28,644 --> 00:57:30,060 and it's trying to be intelligent. 1510 00:57:30,060 --> 00:57:30,980 Oh, internet cookies. 1511 00:57:30,980 --> 00:57:33,160 Well, maybe like chocolate chip cookies and oatmeal raisin 1512 00:57:33,160 --> 00:57:34,743 cookies, those were a kind of cookies. 1513 00:57:34,743 --> 00:57:36,170 Basically, that's what it did. 1514 00:57:36,170 --> 00:57:37,670 Or no, actually did the opposite. 1515 00:57:37,670 --> 00:57:39,340 [LAUGHS] 1516 00:57:39,340 --> 00:57:42,740 It said-- when it read "I deleted my files," 1517 00:57:42,740 --> 00:57:44,740 it decided files was probably a baked good, too. 1518 00:57:44,740 --> 00:57:46,990 Well, first it decided internet cookies was a baked good, 1519 00:57:46,990 --> 00:57:47,710 like those other cookies. 1520 00:57:47,710 --> 00:57:49,450 And then it decided that files were a baked goods. 1521 00:57:49,450 --> 00:57:51,450 And it started this whole avalanche of mistakes, 1522 00:57:51,450 --> 00:57:52,776 Dr. Mitchell said. 1523 00:57:52,776 --> 00:57:54,400 He corrected the internet cookies error 1524 00:57:54,400 --> 00:57:56,827 and restarted NELL's bakery education. 1525 00:57:56,827 --> 00:57:57,910 [LAUGHS] I mean, like, OK. 
1526 00:57:57,910 --> 00:58:02,080 Now rerun without that problem. 1527 00:58:02,080 --> 00:58:04,365 So the point, the lesson Tom draws from this, 1528 00:58:04,365 --> 00:58:05,740 and that the article talks about, 1529 00:58:05,740 --> 00:58:07,630 is, oh, well, we still need some assistance. 1530 00:58:07,630 --> 00:58:10,010 We have to go back and, by hand, set these things. 1531 00:58:10,010 --> 00:58:11,902 But the key thing is that, really-- 1532 00:58:11,902 --> 00:58:13,360 I think the message this is telling 1533 00:58:13,360 --> 00:58:16,360 us is no human child would ever make this mistake. 1534 00:58:16,360 --> 00:58:17,735 Human children learn in this way. 1535 00:58:17,735 --> 00:58:19,401 They don't need this kind of assistance. 1536 00:58:19,401 --> 00:58:21,310 It's true that, as Tom says, you and I don't 1537 00:58:21,310 --> 00:58:22,730 learn in isolation either. 1538 00:58:22,730 --> 00:58:24,688 So, all of the things we've been talking about, 1539 00:58:24,688 --> 00:58:27,280 about learning from prior knowledge and so on, are true. 1540 00:58:27,280 --> 00:58:30,010 But there's a basic kind of common sense thing 1541 00:58:30,010 --> 00:58:32,710 that this is missing, which is that by the time 1542 00:58:32,710 --> 00:58:34,300 a child is learning anything about computers, 1543 00:58:34,300 --> 00:58:37,600 and files, and so on, 1544 00:58:37,600 --> 00:58:40,300 they understand well before that, 1545 00:58:40,300 --> 00:58:43,030 like back in early infancy, from say, work that Liz has done, 1546 00:58:43,030 --> 00:58:47,020 and many others, that cookies, in the sense of baked goods, 1547 00:58:47,020 --> 00:58:49,960 are physical objects, a kind of food, a thing you eat. 1548 00:58:49,960 --> 00:58:52,441 Files, email-- not a physical object. 1549 00:58:52,441 --> 00:58:54,190 And there's all sorts of interesting stuff 1550 00:58:54,190 --> 00:58:57,640 to understand about how kids learn that a book can be both-- 1551 00:58:57,640 --> 00:59:00,100 that a novel is both a story and also 1552 00:59:00,100 --> 00:59:02,560 a physical object, and a lot of that stuff. 1553 00:59:02,560 --> 00:59:05,380 But there's a basic common sense understanding 1554 00:59:05,380 --> 00:59:08,170 of the world as consisting of physical objects, 1555 00:59:08,170 --> 00:59:09,921 and for example, agents and their goals. 1556 00:59:09,921 --> 00:59:11,670 You heard a little bit about this from us, 1557 00:59:11,670 --> 00:59:13,270 from me and Tomer, on the first day. 1558 00:59:13,270 --> 00:59:15,060 And that's where I want to turn next.
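To make the "big matrix" framing concrete, here is a minimal sketch of the failure mode being described. This is not NELL's actual algorithm, and the entities, properties, and weights are invented for illustration: knowledge is a noun-phrase-by-property matrix, category membership is guessed from row similarity plus surface form, and "internet cookies" drifts into the baked-goods cluster.

```python
# A minimal sketch (not NELL's actual algorithm) of the "big matrix" framing:
# rows are noun phrases, columns are extracted properties, and category
# membership is guessed from row similarity plus surface form.
# All entities, properties, and numbers are made up for illustration.

import math

# Column labels for the property vectors below (documentation only).
PROPERTIES = ["is sweet", "is baked", "can be eaten", "can be deleted"]

#                          sweet  baked  eaten  deleted
CONCEPTS = {
    "chocolate chip cookies": [1, 1, 1, 0],
    "oatmeal raisin cookies": [1, 1, 1, 0],
    "bread":                  [0, 1, 1, 0],
    # Only evidence ever read: "I deleted my internet cookies."
    "internet cookies":       [0, 0, 0, 1],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def surface_overlap(a, b):
    # Text-driven extraction also sees the shared head noun ("... cookies"),
    # which acts like an extra shared feature.
    return 1.0 if a.split()[-1] == b.split()[-1] else 0.0

baked_goods = ["chocolate chip cookies", "oatmeal raisin cookies", "bread"]
query = "internet cookies"

score = sum(
    0.5 * cosine(CONCEPTS[query], CONCEPTS[b]) + 0.5 * surface_overlap(query, b)
    for b in baked_goods
) / len(baked_goods)

print(f"similarity of '{query}' to the baked goods cluster: {score:.2f}")
# The property vectors alone give ~0 similarity, but the shared word "cookies"
# pulls the score up. Once "internet cookies" is wrongly added to the cluster,
# "can be deleted" starts looking like a baked-good property, "files" gets
# pulled in next, and you get the avalanche the article describes. Nothing in
# the matrix says a cookie is a physical object you eat and a file is not.
```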
1559 00:59:15,060 --> 00:59:17,810 And this is just one of many examples where we realized that, 1560 00:59:17,810 --> 00:59:21,610 as cool as this system is, as great as all this stuff is, 1561 00:59:21,610 --> 00:59:24,670 just trying to approach semantic knowledge and common sense 1562 00:59:24,670 --> 00:59:27,460 reasoning as some kind of big matrix completion, 1563 00:59:27,460 --> 00:59:30,580 without a much more fundamental grasp of the ways in which 1564 00:59:30,580 --> 00:59:32,824 the world is real to a human mind, 1565 00:59:32,824 --> 00:59:34,990 well before a child is learning anything about language 1566 00:59:34,990 --> 00:59:36,930 or any of this higher level stuff, 1567 00:59:36,930 --> 00:59:39,850 was just not going to work, in the same way 1568 00:59:39,850 --> 00:59:42,100 that I think if you want to build a system that learns 1569 00:59:42,100 --> 00:59:44,100 to play a video game, even remotely like the way 1570 00:59:44,100 --> 00:59:44,830 a human does, 1571 00:59:44,830 --> 00:59:47,165 there's a lot of more basic stuff you have to build on. 1572 00:59:47,165 --> 00:59:49,040 And it's the same basic stuff, I would argue. 1573 00:59:49,040 --> 00:59:50,920 A cool thing about Atari video games 1574 00:59:50,920 --> 00:59:54,310 is that, even though they were very low resolution, 1575 00:59:54,310 --> 00:59:59,140 very low-bit color displays, with very big pixels, what 1576 00:59:59,140 --> 01:00:01,750 makes your ability to learn that game work is 1577 01:00:01,750 --> 01:00:04,840 the same kind of thing that makes the ability, even as 1578 01:00:04,840 --> 01:00:07,030 a young child, to not make this mistake. 1579 01:00:07,030 --> 01:00:09,370 And it's the kind of thing that Liz and people 1580 01:00:09,370 --> 01:00:11,530 in her field of developmental psychology-- 1581 01:00:11,530 --> 01:00:14,710 in particular, infant research-- have been studying really 1582 01:00:14,710 --> 01:00:16,960 excitingly for a couple of decades. 1583 01:00:16,960 --> 01:00:20,144 That, I think, is as transformative for the topic 1584 01:00:20,144 --> 01:00:22,060 of intelligence in brains, minds, and machines 1585 01:00:22,060 --> 01:00:23,090 as anything. 1586 01:00:23,090 --> 01:00:24,880 So that's what motivated the work we've 1587 01:00:24,880 --> 01:00:27,213 been doing in the last few years and the main work we're 1588 01:00:27,213 --> 01:00:28,480 trying to do in the center. 1589 01:00:28,480 --> 01:00:31,830 And it also goes hand-in-hand with the ways in which 1590 01:00:31,830 --> 01:00:34,810 we've realized that we have to take what we've learned how 1591 01:00:34,810 --> 01:00:37,570 to do with building probabilistic models over interesting 1592 01:00:37,570 --> 01:00:39,200 symbolically-structured representations 1593 01:00:39,200 --> 01:00:42,550 and so on, but move way beyond what you could call-- 1594 01:00:42,550 --> 01:00:45,379 I mean, we need better, even more interesting, 1595 01:00:45,379 --> 01:00:46,420 symbolic representations. 1596 01:00:46,420 --> 01:00:48,760 In particular, we need to move beyond graphs 1597 01:00:48,760 --> 01:00:52,441 and stochastic processes defined over graphs to programs. 1598 01:00:52,441 --> 01:00:54,190 So that's where the probabilistic programs 1599 01:00:54,190 --> 01:00:55,790 come back into the mix. 1600 01:00:55,790 --> 01:00:57,100 So again, you already saw this. 1601 01:00:57,100 --> 01:00:58,641 And I'm trying to close the loop back 1602 01:00:58,641 --> 01:00:59,860 to what we're doing in CBMM.
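To give a flavor of what moving "beyond graphs to programs" means, here is a toy sketch, my own illustration rather than any particular model from the lecture: a graph stores static relations you can only look up, while a probabilistic program describes a process (here, an agent taking noisy, roughly rational steps toward a goal) that you can run forward and invert to infer the latent goal from observed motion.

```python
# A toy contrast (illustration only): graph-style knowledge is a lookup table,
# program-style knowledge is a generative process you can simulate and invert.

import math
import random

# Graph-style knowledge: which nodes relate to which. Shown only for contrast;
# there is nothing here to simulate.
knowledge_graph = {("agent", "moves_toward"): "goal"}

# Program-style knowledge: agents choose actions that (noisily) move them
# toward a latent goal. Inference inverts the program: given observed motion,
# which goal was being pursued?
GOALS = {"red ball": (10.0, 0.0), "blue ball": (0.0, 10.0)}

def step_toward(pos, goal, noise=0.5):
    """One noisy, roughly rational unit step toward a goal position."""
    dx, dy = goal[0] - pos[0], goal[1] - pos[1]
    norm = math.hypot(dx, dy) or 1.0
    return (pos[0] + dx / norm + random.gauss(0, noise),
            pos[1] + dy / norm + random.gauss(0, noise))

def likelihood(path, goal, noise=0.5):
    """How probable is the observed path if the agent wanted this goal?"""
    logp = 0.0
    for p, q in zip(path, path[1:]):
        mx, my = step_toward(p, goal, noise=0.0)   # expected next position
        logp += -((q[0] - mx) ** 2 + (q[1] - my) ** 2) / (2 * noise ** 2)
    return math.exp(logp)

# An observed path drifting rightward looks like pursuit of the red ball.
observed = [(0.0, 0.0), (1.1, 0.2), (2.0, -0.1), (3.2, 0.3)]
scores = {g: likelihood(observed, xy) for g, xy in GOALS.items()}
total = sum(scores.values())
for g, s in scores.items():
    print(f"P(goal = {g} | path) ~ {s / total:.2f}")
```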
1603 01:00:59,860 --> 01:01:01,960 I've given you about 10 to 15 years 1604 01:01:01,960 --> 01:01:04,669 of background in our field of how we got to this, why 1605 01:01:04,669 --> 01:01:06,460 we think this is interesting and important, 1606 01:01:06,460 --> 01:01:08,350 and why we think we need to-- 1607 01:01:08,350 --> 01:01:11,170 why we've developed a certain toolkit of ideas, 1608 01:01:11,170 --> 01:01:15,100 and why we think we needed to keep extending it. 1609 01:01:15,100 --> 01:01:17,830 And I think, as you saw before, and as you'll see, 1610 01:01:17,830 --> 01:01:19,512 this also, in some ways-- 1611 01:01:19,512 --> 01:01:21,970 I think we're getting more and more to the interesting part 1612 01:01:21,970 --> 01:01:22,871 of common sense. 1613 01:01:22,871 --> 01:01:25,120 But in another way, we're getting back to the problems 1614 01:01:25,120 --> 01:01:26,500 I started off with and what a lot 1615 01:01:26,500 --> 01:01:27,880 of other people at this summer school 1616 01:01:27,880 --> 01:01:29,380 have an interest in, which is things 1617 01:01:29,380 --> 01:01:31,810 like much more basic aspects of visual perception. 1618 01:01:31,810 --> 01:01:35,020 I think the heart of real intelligence and common sense 1619 01:01:35,020 --> 01:01:37,050 reasoning that we're talking about 1620 01:01:37,050 --> 01:01:40,180 is directly connected to vision and other sense modalities, 1621 01:01:40,180 --> 01:01:42,580 and how we get around in the world and plan our actions, 1622 01:01:42,580 --> 01:01:45,157 and the very basic kinds of goal-directed social 1623 01:01:45,157 --> 01:01:47,240 understanding that you saw in those little videos 1624 01:01:47,240 --> 01:01:49,027 of the red and blue ball, or that you 1625 01:01:49,027 --> 01:01:51,360 see if you're trying to do action recognition and action 1626 01:01:51,360 --> 01:01:52,450 understanding. 1627 01:01:52,450 --> 01:01:55,080 So in some sense, it's gotten more cognitive. 1628 01:01:55,080 --> 01:01:58,080 But also, by getting to the root of our common sense 1629 01:01:58,080 --> 01:02:01,350 knowledge, it makes better contact with vision, 1630 01:02:01,350 --> 01:02:02,980 with neuroscience research. 1631 01:02:02,980 --> 01:02:05,429 And so I think it's a super exciting development 1632 01:02:05,429 --> 01:02:07,470 in what we're doing for the larger Brains, Minds, 1633 01:02:07,470 --> 01:02:09,150 and Machines agenda. 1634 01:02:09,150 --> 01:02:10,770 So again, now we're saying, OK, let's 1635 01:02:10,770 --> 01:02:13,920 try to understand the way in which-- 1636 01:02:13,920 --> 01:02:16,530 even these kids playing with blocks, 1637 01:02:16,530 --> 01:02:19,290 the world is real to them. 1638 01:02:19,290 --> 01:02:21,209 It's not just a big matrix of data. 1639 01:02:21,209 --> 01:02:22,500 That is a thing in their hands. 1640 01:02:22,500 --> 01:02:24,083 And they have an understanding of what 1641 01:02:24,083 --> 01:02:27,540 a thing is before they start compiling lists of properties. 1642 01:02:27,540 --> 01:02:29,430 And they're playing with somebody else. 1643 01:02:29,430 --> 01:02:31,890 That hand is attached to a person, who has goals. 1644 01:02:31,890 --> 01:02:35,410 It's not just a big matrix of rows and columns. 1645 01:02:35,410 --> 01:02:38,430 It's an agent with goals, and even a mind.
1646 01:02:38,430 --> 01:02:41,610 And they understand those things before they 1647 01:02:41,610 --> 01:02:44,605 start to learn a lot of other things, like words for objects, 1648 01:02:44,605 --> 01:02:49,000 and advanced game-playing behavior, and so on. 1649 01:02:49,000 --> 01:02:50,820 And when we want to talk about learning, 1650 01:02:50,820 --> 01:02:52,695 we still are interested in one-shot learning, 1651 01:02:52,695 --> 01:02:55,440 or very rapid learning from a few examples. 1652 01:02:55,440 --> 01:02:58,950 And we're still interested in how prior knowledge guides 1653 01:02:58,950 --> 01:03:01,390 that, and how that knowledge can be built. 1654 01:03:01,390 --> 01:03:03,080 But we want to do it in this context. 1655 01:03:03,080 --> 01:03:04,410 We want to study it in the context of, say, 1656 01:03:04,410 --> 01:03:06,210 how you learn how magnets work, or how you learn 1657 01:03:06,210 --> 01:03:08,760 how a touchscreen device works-- really interesting kinds 1658 01:03:08,760 --> 01:03:12,900 of grounded physical causes. 1659 01:03:12,900 --> 01:03:15,990 So this is what we have, or what I've come 1660 01:03:15,990 --> 01:03:18,889 to call the common sense core. 1661 01:03:18,889 --> 01:03:21,180 Liz, are you going to talk about core knowledge at all? 1662 01:03:21,180 --> 01:03:22,596 So there's a phrase that Liz likes 1663 01:03:22,596 --> 01:03:23,820 to use called core knowledge. 1664 01:03:23,820 --> 01:03:25,620 And this is definitely meant to evoke that. 1665 01:03:25,620 --> 01:03:27,780 And it's inspired by it. 1666 01:03:27,780 --> 01:03:29,269 I guess I changed it a little bit, 1667 01:03:29,269 --> 01:03:30,810 because I wanted it to mean something 1668 01:03:30,810 --> 01:03:32,080 a little bit different. 1669 01:03:32,080 --> 01:03:34,410 And I think, again, to anticipate a little bit, 1670 01:03:34,410 --> 01:03:35,789 the main difference is-- 1671 01:03:35,789 --> 01:03:36,330 I don't know. 1672 01:03:36,330 --> 01:03:38,667 What's the main difference? 1673 01:03:38,667 --> 01:03:40,500 The main difference is that, in the same way 1674 01:03:40,500 --> 01:03:42,090 that lots of people look at me and say, oh, he's 1675 01:03:42,090 --> 01:03:44,190 the Bayesian guy, lots of people look at Liz 1676 01:03:44,190 --> 01:03:47,340 and say, oh, she's the nativist gal or something. 1677 01:03:47,340 --> 01:03:49,590 And it's true that, compared to a lot of other people, 1678 01:03:49,590 --> 01:03:51,330 I tend to be more interested in, and have 1679 01:03:51,330 --> 01:03:53,070 done more work prominently associated 1680 01:03:53,070 --> 01:03:54,507 with, Bayesian inference. 1681 01:03:54,507 --> 01:03:56,590 But by no means do I think that's the whole story. 1682 01:03:56,590 --> 01:03:58,860 And part of what I tried to show you, and will keep showing you, 1683 01:03:58,860 --> 01:03:59,880 is ways in which that's only really 1684 01:03:59,880 --> 01:04:01,005 the beginning of the story.
1685 01:04:01,005 --> 01:04:02,670 And Liz is prominently associated, 1686 01:04:02,670 --> 01:04:05,850 and you'll see some of this, with really fascinating 1687 01:04:05,850 --> 01:04:10,290 discoveries that key high level concepts, key kinds 1688 01:04:10,290 --> 01:04:13,230 of real knowledge, are present, in some sense, 1689 01:04:13,230 --> 01:04:16,890 as early as you can look, and in some form, I think, 1690 01:04:16,890 --> 01:04:20,490 very plausibly, have to be due to some kind of innately 1691 01:04:20,490 --> 01:04:23,070 unfolding genetic program that builds a mind the same way it 1692 01:04:23,070 --> 01:04:23,820 builds a brain. 1693 01:04:23,820 --> 01:04:25,140 But, as we'll hear from her, that's, 1694 01:04:25,140 --> 01:04:27,223 in some ways, only the beginning, or only one part 1695 01:04:27,223 --> 01:04:29,130 of a much richer, more interesting story 1696 01:04:29,130 --> 01:04:31,450 that she's been developing. 1697 01:04:31,450 --> 01:04:33,274 But for that, among other reasons, 1698 01:04:33,274 --> 01:04:34,690 I'm calling it something a little different. 1699 01:04:34,690 --> 01:04:36,990 And I'm trying to emphasize the connection to what people in AI 1700 01:04:36,990 --> 01:04:38,156 call common sense reasoning. 1701 01:04:38,156 --> 01:04:41,050 Because I really do think this is the heart of common sense. 1702 01:04:41,050 --> 01:04:43,380 It's this intuitive physics and intuitive psychology. 1703 01:04:43,380 --> 01:04:47,610 So again, you saw us already give an intro to this. 1704 01:04:47,610 --> 01:04:49,950 Maybe what I'll just do is show you a little bit more 1705 01:04:49,950 --> 01:04:53,692 of the-- well, are you going to talk about the stuff at all? 1706 01:04:53,692 --> 01:04:54,525 LIZ SPELKE: I guess. 1707 01:04:54,525 --> 01:04:55,020 Yeah. 1708 01:04:55,020 --> 01:04:56,061 JOSH TENENBAUM: Well, OK. 1709 01:04:56,061 --> 01:04:58,599 So this is work-- some of this is based on Liz's work. 1710 01:04:58,599 --> 01:05:00,390 Some of this is based on the work of Renée Baillargeon, 1711 01:05:00,390 --> 01:05:02,950 a close colleague of hers, and many other people out there. 1712 01:05:02,950 --> 01:05:04,800 And I wasn't really going to go into the details. 1713 01:05:04,800 --> 01:05:07,230 And maybe, Liz, we can decide whether you want to do this 1714 01:05:07,230 --> 01:05:07,740 or not. 1715 01:05:07,740 --> 01:05:10,920 But what they've shown is that, even prior 1716 01:05:10,920 --> 01:05:14,280 to the time when kids are learning words for objects, 1717 01:05:14,280 --> 01:05:17,400 all of this stuff with infants, two months, four months, eight 1718 01:05:17,400 --> 01:05:18,840 months-- 1719 01:05:18,840 --> 01:05:23,520 at this age, kids have, at best, some vague statistical 1720 01:05:23,520 --> 01:05:26,460 associations of words to kinds of objects. 1721 01:05:26,460 --> 01:05:28,860 But they already have a great deal 1722 01:05:28,860 --> 01:05:30,630 of much more abstract understanding 1723 01:05:30,630 --> 01:05:33,120 of physical objects. 1724 01:05:33,120 --> 01:05:36,732 So I won't-- maybe I should not go into the details of it. 1725 01:05:36,732 --> 01:05:38,940 But you saw it in that nice video of the baby playing 1726 01:05:38,940 --> 01:05:39,523 with the cups. 1727 01:05:42,015 --> 01:05:44,364 And there are really interesting, sort 1728 01:05:44,364 --> 01:05:45,780 of rough, developmental timelines.
1729 01:05:45,780 --> 01:05:47,946 One of the things we're trying to figure out in CBMM 1730 01:05:47,946 --> 01:05:51,120 is to actually get a much, much clearer picture of this. 1731 01:05:51,120 --> 01:05:54,120 But at least if you look across a bunch of different studies, 1732 01:05:54,120 --> 01:05:57,090 sometimes by one lab, sometimes by multiple labs, 1733 01:05:57,090 --> 01:06:00,360 you see ways in which, say, going from two months to five 1734 01:06:00,360 --> 01:06:01,979 months, or five months to 12 months, 1735 01:06:01,979 --> 01:06:04,020 kids seem to-- their intuitive physics of objects 1736 01:06:04,020 --> 01:06:06,210 is getting a little bit more sophisticated. 1737 01:06:06,210 --> 01:06:12,180 So for example, they tend to understand-- 1738 01:06:12,180 --> 01:06:14,880 in some form, they understand a little bit of how collisions 1739 01:06:14,880 --> 01:06:19,830 conserve momentum, a little bit, by five months or six months-- 1740 01:06:19,830 --> 01:06:22,410 according to one of Baillargeon's studies-- 1741 01:06:22,410 --> 01:06:25,230 in the sense that if they see a ball roll down a ramp 1742 01:06:25,230 --> 01:06:28,830 and hit another one, and the second one goes 1743 01:06:28,830 --> 01:06:31,395 a certain distance, if a bigger object comes, 1744 01:06:31,395 --> 01:06:33,520 they're not too surprised if this one goes farther. 1745 01:06:33,520 --> 01:06:35,800 But if a little object hits it, then they are surprised. 1746 01:06:35,800 --> 01:06:37,966 So they expect a bigger object to be able to move it 1747 01:06:37,966 --> 01:06:39,270 more than a little object. 1748 01:06:39,270 --> 01:06:41,340 But a two-month-old doesn't understand that. 1749 01:06:41,340 --> 01:06:43,890 Although a two-month-old does understand-- this is, again, 1750 01:06:43,890 --> 01:06:45,000 from Liz's work-- 1751 01:06:45,000 --> 01:06:47,970 that if an object is occluded by a screen, 1752 01:06:47,970 --> 01:06:50,010 it hasn't disappeared, and that if an object 1753 01:06:50,010 --> 01:06:52,980 is rolling towards a wall, and that wall looks solid, 1754 01:06:52,980 --> 01:06:56,174 that the object can't go through it, and that if it somehow-- 1755 01:06:56,174 --> 01:06:58,590 when the screen is removed, as you see on the upper left-- 1756 01:06:58,590 --> 01:07:00,840 appears on the other side of the wall, that's 1757 01:07:00,840 --> 01:07:01,954 very surprising to them. 1758 01:07:01,954 --> 01:07:04,620 I think-- I'm sure what Liz will talk about, among other things, 1759 01:07:04,620 --> 01:07:07,710 are the methods they use, the looking time methods to reveal 1760 01:07:07,710 --> 01:07:09,795 this. 1761 01:07:09,795 --> 01:07:11,490 And I think there's really-- 1762 01:07:11,490 --> 01:07:14,200 this is one of the two main insights that I, 1763 01:07:14,200 --> 01:07:15,630 and I think our whole field, need 1764 01:07:15,630 --> 01:07:17,850 to learn from developmental psychology: 1765 01:07:17,850 --> 01:07:20,730 how much of a basic understanding of physics 1766 01:07:20,730 --> 01:07:23,390 like this is present very early. 1767 01:07:23,390 --> 01:07:25,710 And it doesn't matter whether it's-- 1768 01:07:25,710 --> 01:07:28,140 in some sense, it doesn't matter for the points 1769 01:07:28,140 --> 01:07:31,440 I want to make here, how much or in what way this is innate, 1770 01:07:31,440 --> 01:07:34,719 or how the genetics and the experience interact. 1771 01:07:34,719 --> 01:07:35,760 I mean, that does matter.
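As a toy illustration of how an expectation like the collision result might be formalized (this is not the actual infant stimuli or any published model, and all the numbers are invented): simulate the struck ball's travel under a crude, noisy momentum-transfer rule, and treat outcomes the simulator rarely produces as surprising, which is the intuition behind reading surprise off looking times.

```python
# A toy "noisy physics" sketch of the collision expectation described above.
# My own illustration, not the actual stimuli or a published model: simulate
# how far the struck ball should travel for a big vs. small striker, and call
# outcomes the simulator rarely produces "surprising".

import random

def simulate_travel(striker_mass, struck_mass=1.0, speed=5.0, noise=0.2):
    """Crude momentum transfer plus noise: how far does the struck ball slide?"""
    transfer = striker_mass / (striker_mass + struck_mass)  # heavier striker, more push
    return max(0.0, speed * transfer * 4.0 * (1.0 + random.gauss(0, noise)))

def surprise(observed_distance, striker_mass, n=20000, tol=3.0):
    """1 minus the fraction of simulations near the observation; higher = more surprising."""
    hits = sum(abs(simulate_travel(striker_mass) - observed_distance) < tol
               for _ in range(n))
    return 1.0 - hits / n

# Big striker sending the ball far: unsurprising. Small striker doing the same: surprising.
print("big striker, ball goes far   -> surprise", round(surprise(15.0, striker_mass=4.0), 2))
print("small striker, ball goes far -> surprise", round(surprise(15.0, striker_mass=0.3), 2))
```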
1772 01:07:35,760 --> 01:07:37,801 And that question, of what's innate and how genetics and experience interact, is something we want to understand, 1773 01:07:37,801 --> 01:07:39,450 and we are hoping to try to understand 1774 01:07:39,450 --> 01:07:41,830 in the hopefully not-too-distant future. 1775 01:07:41,830 --> 01:07:43,404 But for the purpose of understanding 1776 01:07:43,404 --> 01:07:44,820 what is the heart of common sense, 1777 01:07:44,820 --> 01:07:47,070 how are we going to build these causal, compositional, 1778 01:07:47,070 --> 01:07:49,149 generative models to really get at intelligence, 1779 01:07:49,149 --> 01:07:51,690 the main thing is that it should be about this kind of stuff. 1780 01:07:51,690 --> 01:07:52,800 That's the main focus. 1781 01:07:52,800 --> 01:07:56,340 And then the other big insight from developmental psychology, 1782 01:07:56,340 --> 01:07:58,260 which has to do with how we build this stuff, 1783 01:07:58,260 --> 01:08:02,130 is this idea sometimes called the child as scientist. 1784 01:08:02,130 --> 01:08:04,950 The basic idea is that this early commonsense 1785 01:08:04,950 --> 01:08:07,227 knowledge is something like a scientific theory, 1786 01:08:07,227 --> 01:08:09,810 something like a good scientific theory, the way Newton's laws 1787 01:08:09,810 --> 01:08:12,330 are a better scientific theory than Kepler's laws 1788 01:08:12,330 --> 01:08:14,910 because of how they capture the causal structure of the world 1789 01:08:14,910 --> 01:08:15,965 in a compositional way. 1790 01:08:15,965 --> 01:08:17,340 That's another way to sum up what 1791 01:08:17,340 --> 01:08:20,850 I'm trying to say about children's early knowledge. 1792 01:08:20,850 --> 01:08:23,520 But also, the way children build their knowledge is something 1793 01:08:23,520 --> 01:08:25,710 like the way scientists build their knowledge, which 1794 01:08:25,710 --> 01:08:30,700 is, well, they do experiments, of course. 1795 01:08:30,700 --> 01:08:32,649 We normally call that play. 1796 01:08:32,649 --> 01:08:34,474 That's one of Laura Schulz's big ideas. 1797 01:08:34,474 --> 01:08:36,140 But it's not just about the experiments. 1798 01:08:36,140 --> 01:08:38,410 I mean, Newton didn't really do any experiments. 1799 01:08:38,410 --> 01:08:39,077 He just thought. 1800 01:08:39,077 --> 01:08:41,451 And another thing you'll hear from Laura, and also 1801 01:08:41,451 --> 01:08:43,870 from Tomer, is that a lot of children's learning 1802 01:08:43,870 --> 01:08:46,810 looks less like, say, stochastic gradient descent, and more 1803 01:08:46,810 --> 01:08:49,479 like scratching your head and trying to make sense of, 1804 01:08:49,479 --> 01:08:51,337 well, that's really funny. 1805 01:08:51,337 --> 01:08:52,420 Why does this happen here? 1806 01:08:52,420 --> 01:08:53,890 Why does that happen over there? 1807 01:08:53,890 --> 01:08:57,250 Or how can I explain what seemed to be 1808 01:08:57,250 --> 01:09:00,790 diverse patterns of phenomena with some common underlying 1809 01:09:00,790 --> 01:09:02,859 principles, and making analogies between things, 1810 01:09:02,859 --> 01:09:04,420 and then trying out, oh, well, if that's right, 1811 01:09:04,420 --> 01:09:06,020 then it would make this prediction. 1812 01:09:06,020 --> 01:09:08,380 And the kid doesn't have to be conscious of that the way 1813 01:09:08,380 --> 01:09:10,720 scientists maybe are.
1814 01:09:10,720 --> 01:09:13,990 That process of coming up with theories 1815 01:09:13,990 --> 01:09:16,630 and considering variations, trying them out, 1816 01:09:16,630 --> 01:09:19,450 seeing what kinds of new experiences you can create 1817 01:09:19,450 --> 01:09:20,459 for yourself-- 1818 01:09:20,459 --> 01:09:22,000 call them experiments, or call them 1819 01:09:22,000 --> 01:09:24,010 just games, or playing with a toy, 1820 01:09:24,010 --> 01:09:28,210 but that dynamic is the real heart of how children learn 1821 01:09:28,210 --> 01:09:30,700 and build their knowledge from the early stages 1822 01:09:30,700 --> 01:09:33,939 to what we come to have as adults. 1823 01:09:33,939 --> 01:09:36,640 Those two insights of what we start with and how we grow, 1824 01:09:36,640 --> 01:09:39,790 I think, are hugely powerful and hugely important 1825 01:09:39,790 --> 01:09:42,404 for anything we want to do in capturing-- making machines 1826 01:09:42,404 --> 01:09:43,779 that learn like humans, or making 1827 01:09:43,779 --> 01:09:46,112 computational models that really get at the heart of how 1828 01:09:46,112 --> 01:09:47,850 we come to be smart.
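One way to read the contrast with stochastic gradient descent in computational terms is as search over discrete theories rather than nudging continuous weights. The sketch below is my own toy illustration, with invented observations and candidate rules, loosely echoing the earlier magnets example: propose small symbolic hypotheses, score how well each explains what has been seen, and keep the best explanation.

```python
# A toy sketch of "learning as theory search" rather than gradient descent.
# Everything here (observations and candidate rules) is invented for
# illustration.

# Observations: (object, does it stick to the fridge?)
observations = [
    ("iron key", True), ("steel spoon", True),
    ("plastic cup", False), ("wooden block", False), ("iron nail", True),
]

# Candidate "theories" a learner might entertain, as simple predicates.
# The tuple in "small things stick" just encodes which objects this learner
# happens to consider small.
theories = {
    "everything sticks":  lambda obj: True,
    "small things stick": lambda obj: obj in ("iron key", "iron nail", "plastic cup"),
    "metal things stick": lambda obj: obj.startswith(("iron", "steel")),
}

def score(theory):
    """How many observations does this theory predict correctly?"""
    return sum(theory(obj) == sticks for obj, sticks in observations)

best = max(theories, key=lambda name: score(theories[name]))
for name in theories:
    print(f"{name:20s} explains {score(theories[name])}/{len(observations)}")
print("best current theory:", best)
```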