1
00:00:01,680 --> 00:00:04,080
The following content is
provided under a Creative

2
00:00:04,080 --> 00:00:05,620
Commons license.

3
00:00:05,620 --> 00:00:07,920
Your support will help
MIT OpenCourseWare

4
00:00:07,920 --> 00:00:12,280
continue to offer high quality
educational resources for free.

5
00:00:12,280 --> 00:00:14,910
To make a donation or
view additional materials

6
00:00:14,910 --> 00:00:18,870
from hundreds of MIT courses,
visit MIT OpenCourseWare

7
00:00:18,870 --> 00:00:21,820
at ocw.mit.edu.

8
00:00:21,820 --> 00:00:23,820
TOMER ULLMAN: What we're
going to do in the last

9
00:00:23,820 --> 00:00:26,820
section is not so much
have a lecture as a debate.

10
00:00:26,820 --> 00:00:28,980
And it's going to be a
debate between myself

11
00:00:28,980 --> 00:00:30,150
and Laura Schulz.

12
00:00:30,150 --> 00:00:32,070
And hopefully at
the end, all of you

13
00:00:32,070 --> 00:00:33,990
can join in for some
sort of free for all.

14
00:00:33,990 --> 00:00:35,760
Although, of course,
you're welcome to ask

15
00:00:35,760 --> 00:00:38,370
questions at any moment.

16
00:00:38,370 --> 00:00:40,830
Now, Laura and I have had
this debate a few times now.

17
00:00:40,830 --> 00:00:43,230
It's a debate about where
do new ideas come from.

18
00:00:43,230 --> 00:00:44,070
It's about theories.

19
00:00:44,070 --> 00:00:45,510
It's about imagination.

20
00:00:45,510 --> 00:00:47,760
The last time that we were
supposed to have the debate

21
00:00:47,760 --> 00:00:50,540
was in SRCD, which is a
child development conference.

22
00:00:50,540 --> 00:00:52,030
But Laura couldn't make it.

23
00:00:52,030 --> 00:00:54,000
So I ended up pulling
a Monty Python man

24
00:00:54,000 --> 00:00:55,590
wrestles with
himself routine where

25
00:00:55,590 --> 00:00:59,430
I was arguing both my
point and Laura's point.

26
00:00:59,430 --> 00:01:01,950
So it's nice to have a
sparring partner again

27
00:01:01,950 --> 00:01:04,260
after that experience.

28
00:01:04,260 --> 00:01:06,877
I am still a little intimidated.

29
00:01:06,877 --> 00:01:09,210
Just to give you a sense of
the caliber of my opponent--

30
00:01:09,210 --> 00:01:11,293
the reason Laura didn't
show up to our SRCD debate

31
00:01:11,293 --> 00:01:14,260
was that she was giving
a talk of her own

32
00:01:14,260 --> 00:01:16,639
in something called TED.

33
00:01:16,639 --> 00:01:18,180
I don't know if
you've heard of SRCD,

34
00:01:18,180 --> 00:01:20,520
but presumably
you've heard of TED.

35
00:01:20,520 --> 00:01:21,300
OK.

36
00:01:21,300 --> 00:01:22,610
So what's this debate about?

37
00:01:22,610 --> 00:01:23,620
What am I talking about?

38
00:01:23,620 --> 00:01:26,010
Here's a 65 second prologue
sort of setting it up.

39
00:01:26,010 --> 00:01:27,440
And then we'll get into it.

40
00:01:27,440 --> 00:01:29,910
The background to this
debate is that, you know,

41
00:01:29,910 --> 00:01:32,370
we have this sort of really
nice picture coming out

42
00:01:32,370 --> 00:01:33,750
of development and models.

43
00:01:33,750 --> 00:01:36,690
And sort of Laura and Josh
have set up the perfect intro

44
00:01:36,690 --> 00:01:38,430
to that, which is
from development

45
00:01:38,430 --> 00:01:41,220
we have this idea of children
are sort of like scientists

46
00:01:41,220 --> 00:01:43,680
or hackers or what have you.

47
00:01:43,680 --> 00:01:46,770
They come up with these
theories to explain the world.

48
00:01:46,770 --> 00:01:47,920
And they do it beautifully.

49
00:01:47,920 --> 00:01:49,620
And that's what they
do in development.

50
00:01:49,620 --> 00:01:51,960
And then we have
this idea coming out

51
00:01:51,960 --> 00:01:55,410
of computational
land which is, well,

52
00:01:55,410 --> 00:01:57,300
what we mean by theories--
how do we actually

53
00:01:57,300 --> 00:01:58,740
capture that computationally?

54
00:01:58,740 --> 00:02:00,120
How do we formalize that?

55
00:02:00,120 --> 00:02:02,820
Well, maybe it's something
like hierarchical learning

56
00:02:02,820 --> 00:02:05,760
over a space of programs, right?

57
00:02:05,760 --> 00:02:07,740
Because programs
are this thing

58
00:02:07,740 --> 00:02:08,952
that's kind of like theories.

59
00:02:08,952 --> 00:02:10,410
And they kind of
explain the world.

60
00:02:10,410 --> 00:02:13,470
And you can learn them via
Bayesian inference and programs

61
00:02:13,470 --> 00:02:15,970
over programs and
learning over learning.

62
00:02:15,970 --> 00:02:19,420
And that's sort of a
really nice picture.

63
00:02:19,420 --> 00:02:20,512
But a lot of people--

64
00:02:20,512 --> 00:02:21,970
Josh I think was
using the phrase--

65
00:02:21,970 --> 00:02:25,320
or Nour was using the
phrase-- "give us grief."

66
00:02:25,320 --> 00:02:27,960
A lot of people give grief
to these computational models

67
00:02:27,960 --> 00:02:28,560
correctly.

68
00:02:28,560 --> 00:02:29,850
They say something
like, wait a minute.

69
00:02:29,850 --> 00:02:31,590
These computational models
that you've described,

70
00:02:31,590 --> 00:02:32,964
like Josh was
describing earlier,

71
00:02:32,964 --> 00:02:35,490
these hierarchical
programs over programs,

72
00:02:35,490 --> 00:02:37,630
what's the theory
space on that, right?

73
00:02:37,630 --> 00:02:39,690
Like, what's the space
of all possible programs?

74
00:02:39,690 --> 00:02:40,190
Remind me.

75
00:02:40,190 --> 00:02:41,270
Isn't that infinite?

76
00:02:41,270 --> 00:02:43,590
And isn't infinite
really, really big?

77
00:02:43,590 --> 00:02:45,840
What are you trying to
suggest, that children are just

78
00:02:45,840 --> 00:02:48,150
searching through this
giant space of programs?

79
00:02:48,150 --> 00:02:51,360
How could they
possibly be doing that?

80
00:02:51,360 --> 00:02:53,430
And what we'd like to
argue and I'll spell out

81
00:02:53,430 --> 00:02:55,388
is that, well, we don't
search through the space

82
00:02:55,388 --> 00:02:58,170
of all programs, we don't
do that when we write out

83
00:02:58,170 --> 00:02:59,070
our programs, right?

84
00:02:59,070 --> 00:03:00,224
We somehow figure it out.

85
00:03:00,224 --> 00:03:01,890
We're writing out
these infinite spaces.

86
00:03:01,890 --> 00:03:03,816
And yet we search
them in our computers.

87
00:03:03,816 --> 00:03:05,190
That's what we're
trying to suggest.

88
00:03:05,190 --> 00:03:07,231
The same way that we search
them in our computers

89
00:03:07,231 --> 00:03:09,750
might be something like the
way the children actually have

90
00:03:09,750 --> 00:03:11,040
to search for their theories.

91
00:03:11,040 --> 00:03:14,250
And it's hard, grueling work
for computers and children.

92
00:03:14,250 --> 00:03:16,770
And the algorithms that we're,
in particular, interested

93
00:03:16,770 --> 00:03:18,930
in-- there's all sorts of
learning algorithms, of course.

94
00:03:18,930 --> 00:03:20,280
But one of the most
influential learning

95
00:03:20,280 --> 00:03:22,620
algorithms that people have
used for learning these programs

96
00:03:22,620 --> 00:03:24,711
and hierarchical programs
over programs and things

97
00:03:24,711 --> 00:03:27,210
like that is just sort of this
version of stochastic search.

98
00:03:27,210 --> 00:03:28,950
And I've told you a little bit
about that-- those of you who

99
00:03:28,950 --> 00:03:31,800
went to the Church tutorial--
about Markov Chain Monte

100
00:03:31,800 --> 00:03:33,280
Carlo, and all that stuff.

101
00:03:33,280 --> 00:03:34,696
But the point is
something-- well,

102
00:03:34,696 --> 00:03:37,320
the way to search large,
complicated theory spaces

103
00:03:37,320 --> 00:03:39,336
is through something
like stochastic search.

104
00:03:39,336 --> 00:03:41,460
And, well, if it's something
like stochastic search

105
00:03:41,460 --> 00:03:44,460
algorithms for our
theories, maybe children

106
00:03:44,460 --> 00:03:48,750
are doing something like
stochastic search, right?
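
A minimal sketch of stochastic search in the Metropolis style, one member of the Markov Chain Monte Carlo family mentioned here. The ten numbered states stand in for candidate theories. `TOY_SCORES`, `metropolis`, and the step-to-a-neighbor proposal are illustrative assumptions, not the models discussed in the talk.

```python
import random
from collections import Counter

# Hypothetical unnormalized scores for ten stand-in "theories" (states 0..9).
TOY_SCORES = [1, 2, 4, 8, 50, 8, 4, 2, 1, 1]

def metropolis(scores, steps, rng, start=0):
    """Random walk over states that visits high-scoring states more often."""
    state = start
    visits = Counter()
    for _ in range(steps):
        # Symmetric proposal: step to the left or right neighbor on a ring.
        proposal = (state + rng.choice((-1, 1))) % len(scores)
        # Always accept uphill moves; accept downhill moves with
        # probability equal to the ratio of the two scores.
        if rng.random() < scores[proposal] / scores[state]:
            state = proposal
        visits[state] += 1
    return visits

visits = metropolis(TOY_SCORES, steps=20000, rng=random.Random(0))
best_state, _ = visits.most_common(1)[0]
print("most visited state:", best_state)
```

The search never enumerates or normalizes the whole space. It only compares the current state with one proposed neighbor at a time, which is what makes this style of search usable on very large spaces.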

107
00:03:48,750 --> 00:03:51,621
No is what Laura's going to say.

108
00:03:51,621 --> 00:03:53,370
Or she is going to say
that can't possibly

109
00:03:53,370 --> 00:03:55,842
be the whole story, and
here's what's missing.

110
00:03:55,842 --> 00:03:57,300
So does everyone
sort of understand

111
00:03:57,300 --> 00:03:59,730
what the debate is going
to be about more or less?

112
00:03:59,730 --> 00:04:02,190
I'm going to spell out,
of course, all of this

113
00:04:02,190 --> 00:04:04,590
in the next few minutes.

114
00:04:04,590 --> 00:04:05,220
OK.

115
00:04:05,220 --> 00:04:07,350
So as I said, we're going
to switch back and forth,

116
00:04:07,350 --> 00:04:07,770
Laura and I.

117
00:04:07,770 --> 00:04:08,910
The outline of the
debate is that I'm

118
00:04:08,910 --> 00:04:10,701
going to give some
background for like what

119
00:04:10,701 --> 00:04:12,780
good a theory is, or
presenting a good theory--

120
00:04:12,780 --> 00:04:13,950
Josh really sort
of covered that.

121
00:04:13,950 --> 00:04:15,150
So I'll be zooming
through it just

122
00:04:15,150 --> 00:04:17,160
to give you an example of
what I'm talking about.

123
00:04:17,160 --> 00:04:19,618
Because I want to tell you what
stochastic search in theory

124
00:04:19,618 --> 00:04:21,839
space looks like.

125
00:04:21,839 --> 00:04:24,220
After I do that, Laura
is going to step in.

126
00:04:24,220 --> 00:04:26,040
And she's going to
rudely interrupt

127
00:04:26,040 --> 00:04:27,120
me and say, no, no, no.

128
00:04:27,120 --> 00:04:28,310
That can't possibly be true.

129
00:04:28,310 --> 00:04:30,268
Here's all the things
that are wrong with that.

130
00:04:30,268 --> 00:04:31,560
I will then give a rebuttal.

131
00:04:31,560 --> 00:04:34,349
And then Laura will
end and summarize.

132
00:04:34,349 --> 00:04:36,390
Beyond all these things,
something I didn't write

133
00:04:36,390 --> 00:04:38,190
was all of you.

134
00:04:38,190 --> 00:04:40,240
Jump in and say what you think.

135
00:04:40,240 --> 00:04:40,740
OK.

136
00:04:40,740 --> 00:04:41,850
So what good is a theory?

137
00:04:41,850 --> 00:04:44,266
Like I said, Josh sort
of went through this.

138
00:04:44,266 --> 00:04:46,890
I don't think you guys need that
much convincing at this point.

139
00:04:46,890 --> 00:04:49,181
But by theory, I mean some
sort of structured knowledge

140
00:04:49,181 --> 00:04:50,820
that goes beyond
the data, compresses

141
00:04:50,820 --> 00:04:53,760
the data in some way, and is
able to predict new things.

142
00:04:53,760 --> 00:04:55,350
My running example
is going to be

143
00:04:55,350 --> 00:04:57,670
much simpler than character
recognition or anything

144
00:04:57,670 --> 00:04:58,170
like that.

145
00:04:58,170 --> 00:05:01,990
It's going to be about
magnets and, in particular,

146
00:05:01,990 --> 00:05:03,612
very simplified
theory of magnetism.

147
00:05:03,612 --> 00:05:05,320
So suppose we bring
a child into the lab.

148
00:05:05,320 --> 00:05:07,062
And we tell her,
look at these blocks.

149
00:05:07,062 --> 00:05:08,020
They all look the same.

150
00:05:08,020 --> 00:05:08,780
Play with them.

151
00:05:08,780 --> 00:05:10,900
See if you can figure
out what's going on here.

152
00:05:10,900 --> 00:05:11,500
OK?

153
00:05:11,500 --> 00:05:14,230
And unbeknownst to the
child, but beknownst to you,

154
00:05:14,230 --> 00:05:15,730
is that these
things are magnets.

155
00:05:15,730 --> 00:05:16,730
They're not just blocks.

156
00:05:16,730 --> 00:05:17,470
Some of them are metal.

157
00:05:17,470 --> 00:05:18,511
Some of them are magnets.

158
00:05:18,511 --> 00:05:19,660
Some of them are plastic.

159
00:05:19,660 --> 00:05:20,320
OK?

160
00:05:20,320 --> 00:05:22,147
So she's going to start
playing with them.

161
00:05:22,147 --> 00:05:23,980
And she's going to start
noticing something.

162
00:05:23,980 --> 00:05:26,830
Like, sometimes none of these
things do anything, right?

163
00:05:26,830 --> 00:05:27,940
They don't stick.

164
00:05:27,940 --> 00:05:30,730
But sometimes,
huh, they do stick.

165
00:05:30,730 --> 00:05:33,012
And she starts
collecting observations

166
00:05:33,012 --> 00:05:33,970
in something like this.

167
00:05:33,970 --> 00:05:35,200
Like, how could she
explain the data?

168
00:05:35,200 --> 00:05:36,530
Well, she could explain
the data in this.

169
00:05:36,530 --> 00:05:37,654
You can just write it down.

170
00:05:37,654 --> 00:05:39,370
She could like
label these somehow.

171
00:05:39,370 --> 00:05:41,080
She could have
all these, A to I.

172
00:05:41,080 --> 00:05:43,776
And she could say, well, A
and B attract one another,

173
00:05:43,776 --> 00:05:46,150
B and A attract one another,
B and C attract one another,

174
00:05:46,150 --> 00:05:47,160
and so on.

175
00:05:47,160 --> 00:05:47,800
OK?

176
00:05:47,800 --> 00:05:49,210
That's her theory.

177
00:05:49,210 --> 00:05:50,762
That's her explanation
of the data.

178
00:05:50,762 --> 00:05:52,220
This is horrible,
of course, right?

179
00:05:52,220 --> 00:05:54,810
Like, this is not an explanation
in any sense of the word.

180
00:05:54,810 --> 00:05:56,320
It's just a table of data.

181
00:05:56,320 --> 00:05:58,079
But in one sense, why
is it not a theory?

182
00:05:58,079 --> 00:06:00,620
It's because it's just writing
down what you've already seen.

183
00:06:00,620 --> 00:06:02,530
It doesn't compress
it in any way.

184
00:06:02,530 --> 00:06:04,070
And it can't predict
anything new.

185
00:06:04,070 --> 00:06:06,880
If I now give you a new block
X and I tell you, listen,

186
00:06:06,880 --> 00:06:10,070
X attracts B, what else
can you tell me about it?

187
00:06:10,070 --> 00:06:13,880
And she'll say, well,
X attracts B. Yeah.

188
00:06:13,880 --> 00:06:14,380
OK.

189
00:06:14,380 --> 00:06:15,700
That's what my table tells me.

190
00:06:15,700 --> 00:06:16,990
You can't really
predict anything new

191
00:06:16,990 --> 00:06:18,310
from a giant table like that.

192
00:06:18,310 --> 00:06:19,600
You want some sort
of compression.

193
00:06:19,600 --> 00:06:20,560
You want some sort of theory.

194
00:06:20,560 --> 00:06:21,880
But what else could she
come up with, right?

195
00:06:21,880 --> 00:06:23,210
There's all sorts of things
you could come up with.

196
00:06:23,210 --> 00:06:25,251
But she could have come
up with a very simplified

197
00:06:25,251 --> 00:06:26,417
theory that goes like this.

198
00:06:26,417 --> 00:06:28,000
Suppose that we
imagine that there are

199
00:06:28,000 --> 00:06:29,140
certain things in the world.

200
00:06:29,140 --> 00:06:31,230
We're just going to call them,
like, shmagnet and shmetal.

201
00:06:31,230 --> 00:06:32,646
Because I don't
want to confuse it

202
00:06:32,646 --> 00:06:34,872
with actual magnets
and actual metals.

203
00:06:34,872 --> 00:06:36,080
But she comes up with a name.

204
00:06:36,080 --> 00:06:38,050
She says there are several
things in the world.

205
00:06:38,050 --> 00:06:40,133
And how do these things
interact with one another?

206
00:06:40,133 --> 00:06:42,070
Well, let's just
hypothesize some rules.

207
00:06:42,070 --> 00:06:44,170
And we can talk about how does
she come up with these rules.

208
00:06:44,170 --> 00:06:46,461
How does she know that there's
two things in the world?

209
00:06:46,461 --> 00:06:48,359
Let's say for some
reason by dumb luck

210
00:06:48,359 --> 00:06:49,900
she hits upon this,
which is actually

211
00:06:49,900 --> 00:06:51,280
a really good theory
for explaining

212
00:06:51,280 --> 00:06:53,500
that, which is to say there
are two things in the world.

213
00:06:53,500 --> 00:06:55,541
And the way they interact
is through these rules.

214
00:06:55,541 --> 00:06:57,880
If something is a magnet and
another thing is a magnet,

215
00:06:57,880 --> 00:06:59,230
if X is a magnet
and Y is a magnet,

216
00:06:59,230 --> 00:07:00,313
they're going to interact.

217
00:07:00,313 --> 00:07:01,660
They're going to stick.

218
00:07:01,660 --> 00:07:04,540
If X is a magnet and Y is a
metal, they're going to stick.

219
00:07:04,540 --> 00:07:06,351
And interactions are symmetric.

220
00:07:06,351 --> 00:07:06,850
OK.

221
00:07:06,850 --> 00:07:09,040
So metals don't stick to metals,
but metals stick to magnets.

222
00:07:09,040 --> 00:07:09,880
Magnets stick to metals.

223
00:07:09,880 --> 00:07:11,230
And the interactions
are symmetric.

224
00:07:11,230 --> 00:07:12,460
And if you have
these rules, you need

225
00:07:12,460 --> 00:07:15,160
to pay some overhead, obviously,
for remembering the rules.

226
00:07:15,160 --> 00:07:19,100
But you can compress this,
you know, n by n thing

227
00:07:19,100 --> 00:07:20,600
into just the vectors
of remembering

228
00:07:20,600 --> 00:07:22,510
what are the magnets,
what are the metals.

229
00:07:22,510 --> 00:07:24,590
So you've achieved
some compression.

230
00:07:24,590 --> 00:07:26,798
And if I give you something
new and I tell you, look,

231
00:07:26,798 --> 00:07:29,200
this is X, X attracts A, and
you're like, wait a minute.

232
00:07:29,200 --> 00:07:30,230
A is a magnet.

233
00:07:30,230 --> 00:07:32,187
So this is either a
magnet or a metal.

234
00:07:32,187 --> 00:07:34,270
You can probably predict
a lot of different things

235
00:07:34,270 --> 00:07:34,930
about this thing.

236
00:07:34,930 --> 00:07:36,763
You can go and design
your little experiment

237
00:07:36,763 --> 00:07:39,381
to figure out if this new thing
is itself a magnet or a metal.

238
00:07:39,381 --> 00:07:39,880
Great.

239
00:07:39,880 --> 00:07:42,602
So this is wonderful.

240
00:07:42,602 --> 00:07:43,810
I won't go through something.

241
00:07:43,810 --> 00:07:45,250
Like, there's this added
thing of finding out

242
00:07:45,250 --> 00:07:47,329
which one are the actual
shmagnets and shmetals.

243
00:07:47,329 --> 00:07:49,120
And then you can predict
the observed data.
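
The contrast between the lookup table and the rule-based theory can be sketched as below. The assignment of kinds to blocks in `LABELS` is made up for illustration (the talk does not say which block is which), and `sticks`, `predicted_table`, and `possible_kinds` are hypothetical helper names.

```python
from itertools import combinations

# Hypothetical assignment of kinds to blocks; not taken from the talk.
LABELS = {"A": "magnet", "B": "magnet", "C": "metal",
          "D": "metal", "E": "plastic", "F": "plastic"}

def sticks(kind_x, kind_y):
    """The simplified theory: magnet-magnet and magnet-metal pairs stick.
    Using a set of the two kinds makes the rule symmetric."""
    kinds = {kind_x, kind_y}
    return kinds == {"magnet"} or kinds == {"magnet", "metal"}

def predicted_table(labels):
    """Rebuild the full pairwise table from one label per block plus the rules."""
    return {(x, y): sticks(labels[x], labels[y])
            for x, y in combinations(sorted(labels), 2)}

def possible_kinds(attracted_block, labels):
    """Kinds a new block could have, given that it attracts a known block."""
    return [kind for kind in ("magnet", "metal", "plastic")
            if sticks(kind, labels[attracted_block])]

table = predicted_table(LABELS)
print(len(LABELS), "labels reproduce", len(table), "pairwise entries")
print(possible_kinds("A", LABELS))  # the new block X attracts A
```

With these assumed labels, a new block that attracts A comes out as either a magnet or a metal, which matches the inference described above. The table alone could only repeat the single observation it was given.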

244
00:07:49,120 --> 00:07:49,690
OK.

245
00:07:49,690 --> 00:07:52,390
So now, what we've done is
we've set up this sort of space

246
00:07:52,390 --> 00:07:53,710
of possible theories, right?

247
00:07:53,710 --> 00:07:55,660
Imagine like all the
possible logical theories

248
00:07:55,660 --> 00:07:57,534
that you could have
written out, of all

249
00:07:57,534 --> 00:07:59,200
the possible logical
predicates, there's

250
00:07:59,200 --> 00:08:00,490
an infinite amount of them.

251
00:08:00,490 --> 00:08:03,050
But now we've turned this into
a rational inference problem,

252
00:08:03,050 --> 00:08:03,550
right?

253
00:08:03,550 --> 00:08:05,530
The problem for
you as the learner

254
00:08:05,530 --> 00:08:07,870
is out of the space of all
possible theories, just

255
00:08:07,870 --> 00:08:10,710
find the one that best explains
the observed data, right?

256
00:08:10,710 --> 00:08:13,360
Where, by best we mean something
like Bayes' law, right?

257
00:08:13,360 --> 00:08:16,126
Try to find the best
theory to predict the data,

258
00:08:16,126 --> 00:08:17,500
whereby we mean
the theory that's

259
00:08:17,500 --> 00:08:21,854
sort of the shortest and most
compressed in itself a priori.

260
00:08:21,854 --> 00:08:23,020
You've all seen Bayes' rule.

261
00:08:23,020 --> 00:08:25,311
You've all heard Josh talk
about this and other people.

262
00:08:25,311 --> 00:08:27,730
But it also explains the
data itself the best.
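
In symbols, the scoring rule described here can be written as follows. The notation is assumed for illustration: T is a candidate theory and D is the observed data.

```latex
P(T \mid D) \;\propto\; P(D \mid T)\, P(T)
```

Here P(T) is the prior, which favors shorter, more compressed theories before any data is seen, and P(D | T) measures how well the theory explains the data.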

263
00:08:27,730 --> 00:08:28,810
That's the problem.

264
00:08:28,810 --> 00:08:29,570
Go figure it out.

265
00:08:29,570 --> 00:08:30,700
We formalize it for you.

266
00:08:30,700 --> 00:08:31,750
We've solved it.

267
00:08:31,750 --> 00:08:34,880
And, of course, you know,
this elides a lot of things.

268
00:08:34,880 --> 00:08:36,540
It doesn't really
solve anything.

269
00:08:36,540 --> 00:08:38,480
But we'll get to
that in a second.

270
00:08:38,480 --> 00:08:40,240
So you might wonder
like how do we

271
00:08:40,240 --> 00:08:43,330
build a good space
for possible theories.

272
00:08:43,330 --> 00:08:45,280
And one way to do
that is to say, well,

273
00:08:45,280 --> 00:08:47,110
how do you generate
all possible theories?

274
00:08:47,110 --> 00:08:50,127
How do you define a space
of all possible theories?

275
00:08:50,127 --> 00:08:52,210
In this case, we're just
going to go with a grammar.

276
00:08:52,210 --> 00:08:54,209
But you could imagine
something like a generator

277
00:08:54,209 --> 00:08:56,830
for all possible
programs, right?

278
00:08:56,830 --> 00:09:00,089
So a grammar, to put it
very, very, very shortly,

279
00:09:00,089 --> 00:09:01,630
is something that
you can run forward

280
00:09:01,630 --> 00:09:02,680
and will generate a sentence.

281
00:09:02,680 --> 00:09:04,680
Or it's a way of sort of
looking at the sentence

282
00:09:04,680 --> 00:09:07,897
and figuring out what
its underlying rules are.

283
00:09:07,897 --> 00:09:08,980
Let's put it this way, OK?

284
00:09:08,980 --> 00:09:12,135
So in a grammar, you would start
with a sort of a sentence node.

285
00:09:12,135 --> 00:09:13,510
And then you know
that a sentence

286
00:09:13,510 --> 00:09:16,510
can go into several other
nodes in some probability.

287
00:09:16,510 --> 00:09:20,080
For example, a sentence
node can go into something

288
00:09:20,080 --> 00:09:21,730
like a verb phrase
and a noun phrase.

289
00:09:21,730 --> 00:09:23,855
The noun phrase goes into
a noun and another thing.

290
00:09:23,855 --> 00:09:25,720
And you sort of generate
through this grammar

291
00:09:25,720 --> 00:09:27,162
until you end up
with a sentence.

292
00:09:27,162 --> 00:09:28,620
And it can be any
sort of sentence.

293
00:09:28,620 --> 00:09:31,840
It can be the clownfish ascended
the stairs darkly, right?

294
00:09:31,840 --> 00:09:33,730
And you can run
this grammar forward

295
00:09:33,730 --> 00:09:36,010
and generate whole new
sentences that you've never

296
00:09:36,010 --> 00:09:37,044
thought of before.

297
00:09:37,044 --> 00:09:38,710
And you could then
also use that grammar

298
00:09:38,710 --> 00:09:39,709
to figure out sentences.

299
00:09:39,709 --> 00:09:42,400
So if you see the sentence, the
clownfish ascended the stairs

300
00:09:42,400 --> 00:09:44,858
darkly, you can use something
like a grammar to figure out,

301
00:09:44,858 --> 00:09:47,350
well, you know, how is
this sentence constructed.
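
Running a grammar forward, as described above, can be sketched with a toy probabilistic grammar. The rules, vocabulary, and probabilities in `GRAMMAR` are illustrative assumptions, chosen only so that the example sentence from the talk is one possible output.

```python
import random

# Toy grammar: each symbol maps to a list of (probability, expansion) pairs.
# All rules and probabilities here are made up for illustration.
GRAMMAR = {
    "S":   [(1.0, ["NP", "VP"])],
    "NP":  [(1.0, ["Det", "N"])],
    "VP":  [(0.5, ["V", "NP"]), (0.5, ["V", "NP", "Adv"])],
    "Det": [(1.0, ["the"])],
    "N":   [(0.5, ["clownfish"]), (0.5, ["stairs"])],
    "V":   [(1.0, ["ascended"])],
    "Adv": [(1.0, ["darkly"])],
}

def generate(symbol, rng):
    """Expand a symbol top-down until only words remain."""
    if symbol not in GRAMMAR:
        return [symbol]  # a terminal word
    weights = [p for p, _ in GRAMMAR[symbol]]
    expansions = [rhs for _, rhs in GRAMMAR[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in chosen:
        words.extend(generate(child, rng))
    return words

rng = random.Random(0)
for _ in range(3):
    print(" ".join(generate("S", rng)))
```

Swapping the words for predicates and the sentence node for a "law" node gives the same kind of generator over logical theories rather than English sentences.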

302
00:09:47,350 --> 00:09:49,430
This is a grammar for
logical predicates.

303
00:09:49,430 --> 00:09:51,550
It just means that
if we run it forward,

304
00:09:51,550 --> 00:09:54,160
it starts out with some sort of
abstract thing like, you know,

305
00:09:54,160 --> 00:09:56,170
it's going to be a law
or some sort of set

306
00:09:56,170 --> 00:09:57,274
of logical predicates.

307
00:09:57,274 --> 00:09:58,690
You run it forward,
and it ends up

308
00:09:58,690 --> 00:10:01,260
not with a sentence, like the
clownfish blah, blah, blah.

309
00:10:01,260 --> 00:10:03,990
It ends up with a
particular law that's

310
00:10:03,990 --> 00:10:08,100
relating a few particular
predicates like, you know--

311
00:10:08,100 --> 00:10:11,400
sorry-- like X and Y interact
symmetrically, right?

312
00:10:11,400 --> 00:10:12,900
Like, there are
things in the world.

313
00:10:12,900 --> 00:10:14,910
There are X and Y. If
X interacts with Y,

314
00:10:14,910 --> 00:10:17,020
Y interacts with
X. That's a law.

315
00:10:17,020 --> 00:10:18,930
We run through the
grammar, we generate that.

316
00:10:18,930 --> 00:10:20,070
We could run through
the grammar again,

317
00:10:20,070 --> 00:10:22,028
and it will say something
completely different.

318
00:10:22,028 --> 00:10:23,825
Like, of all the things
that I've related,

319
00:10:23,825 --> 00:10:25,200
there are such
things as magnets.

320
00:10:25,200 --> 00:10:26,140
There are metals.

321
00:10:26,140 --> 00:10:28,290
And they should interact
in this way or that way.

322
00:10:28,290 --> 00:10:29,490
Once we define a
grammar like that,

323
00:10:29,490 --> 00:10:31,781
you can predict all sorts of
different theories, right?

324
00:10:31,781 --> 00:10:33,900
I gave simplified magnetism,
but you could also

325
00:10:33,900 --> 00:10:36,000
capture kinship
with a set of four

326
00:10:36,000 --> 00:10:38,350
laws and several predicates.

327
00:10:38,350 --> 00:10:39,900
You could capture taxonomy.

328
00:10:39,900 --> 00:10:43,500
You could capture very,
very simplified psychology

329
00:10:43,500 --> 00:10:44,921
of preference.

330
00:10:44,921 --> 00:10:45,420
OK.

331
00:10:45,420 --> 00:10:48,630
So as I said, this is another
view of theory search.

332
00:10:48,630 --> 00:10:50,512
I haven't yet gotten to
the algorithmic level

333
00:10:50,512 --> 00:10:52,470
of how do you actually
search for these things.

334
00:10:52,470 --> 00:10:53,820
But I did want to
set up the grammar,

335
00:10:53,820 --> 00:10:56,240
because it's going to be a
little bit important later on.

336
00:10:56,240 --> 00:10:57,500
The grammar is just a
way of saying, you know,

337
00:10:57,500 --> 00:10:58,541
you start with something.

338
00:10:58,541 --> 00:11:00,510
You generate it forward
until you end up

339
00:11:00,510 --> 00:11:04,570
with a particular
set of logical rules.

340
00:11:04,570 --> 00:11:09,300
So, again, to phrase the
problem in a particular way

341
00:11:09,300 --> 00:11:12,670
of logical inference is to say
we have the space of theories.

342
00:11:12,670 --> 00:11:14,130
It's an infinite space, right?

343
00:11:14,130 --> 00:11:16,200
And before I've
seen any data, there

344
00:11:16,200 --> 00:11:18,510
is some probability
of what the right theory

345
00:11:18,510 --> 00:11:20,372
is to explain my data,
where the data can

346
00:11:20,372 --> 00:11:22,830
be something like-- play with
these blocks, figure them out.

347
00:11:22,830 --> 00:11:24,492
Before I've even
seen the blocks,

348
00:11:24,492 --> 00:11:26,700
before I've seen anything
that I need to explain yet,

349
00:11:26,700 --> 00:11:28,650
I have some sort of space
of all possible theories.

350
00:11:28,650 --> 00:11:30,108
It can be all the
logical theories,

351
00:11:30,108 --> 00:11:32,261
all the possible
programs in the world.

352
00:11:32,261 --> 00:11:34,260
And I have some sort of
probability distribution

353
00:11:34,260 --> 00:11:36,426
over which one of these are
more likely than others,

354
00:11:36,426 --> 00:11:38,326
which comes from the prior.

355
00:11:38,326 --> 00:11:40,200
And the prior can be
something on simplicity,

356
00:11:40,200 --> 00:11:43,980
like shorter programs are
more likely, or the less free

357
00:11:43,980 --> 00:11:45,045
parameters the better.

358
00:11:45,045 --> 00:11:46,170
Something like that, right?

359
00:11:46,170 --> 00:11:49,344
So before I've seen any data, I
can already score some programs

360
00:11:49,344 --> 00:11:50,760
as being unlikely,
because they're

361
00:11:50,760 --> 00:11:52,710
too long and too ungainly.

362
00:11:52,710 --> 00:11:54,160
Does that make
sense to everyone?

363
00:11:54,160 --> 00:11:54,792
OK.

364
00:11:54,792 --> 00:11:57,000
And that's just a representation
of that in 2D space.

365
00:11:57,000 --> 00:11:58,874
It's saying, like, over
here is theory space.

366
00:11:58,874 --> 00:12:01,170
And the size of the
hills over 2D space

367
00:12:01,170 --> 00:12:03,890
is how likely each point,
each point in theory space,

368
00:12:03,890 --> 00:12:05,950
is a theory, where
theory can be,

369
00:12:05,950 --> 00:12:08,880
as I said, a set of logical
laws or program or anything.

370
00:12:08,880 --> 00:12:10,380
And the height of
the hill over that

371
00:12:10,380 --> 00:12:11,610
is the amount of
probability you should

372
00:12:11,610 --> 00:12:13,985
assign to that theory, how
much you should believe in it.

373
00:12:13,985 --> 00:12:16,410
And as data comes
in, what happens

374
00:12:16,410 --> 00:12:19,080
is you become more or less
certain of certain programs.

375
00:12:19,080 --> 00:12:21,401
Like, suddenly you shift
probability distributions

376
00:12:21,401 --> 00:12:21,900
around.

377
00:12:21,900 --> 00:12:24,525
You say, oh, these things that
I didn't think were that likely,

378
00:12:24,525 --> 00:12:27,840
well, the data's really pushing
me to accepting these theories.

379
00:12:27,840 --> 00:12:28,380
OK?

380
00:12:28,380 --> 00:12:29,970
And that's what
learning is, right?

381
00:12:29,970 --> 00:12:31,136
That's the view of learning.

382
00:12:31,136 --> 00:12:33,210
You just start with
particular probabilities.

383
00:12:33,210 --> 00:12:34,470
You get some data.

384
00:12:34,470 --> 00:12:36,870
And then you shift
that around, and you

385
00:12:36,870 --> 00:12:37,980
get other probabilities.
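
A minimal sketch of this picture of learning, restricted to three candidate theories so that everything can be scored exhaustively. The theories, their description lengths, the block labels, and the noise level `EPS` are all assumptions made for illustration.

```python
EPS = 0.05  # assumed chance that an observation contradicts a theory

# Hypothetical block labels used by the "magnet rules" theory.
LABELS = {"A": "magnet", "B": "magnet", "C": "metal", "D": "metal"}

def magnet_rules(x, y):
    kinds = {LABELS[x], LABELS[y]}
    return kinds == {"magnet"} or kinds == {"magnet", "metal"}

# name -> (assumed description length, prediction function)
THEORIES = {
    "nothing sticks":    (1, lambda x, y: False),
    "everything sticks": (1, lambda x, y: True),
    "magnet rules":      (3, magnet_rules),
}

def likelihood(predict, data):
    """Probability of the observations if the theory were true."""
    p = 1.0
    for (x, y), stuck in data:
        p *= (1 - EPS) if predict(x, y) == stuck else EPS
    return p

def posterior(data):
    """Simplicity prior (2 ** -length) times likelihood, then normalize."""
    scores = {name: 2.0 ** (-length) * likelihood(predict, data)
              for name, (length, predict) in THEORIES.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}

observations = [(("A", "B"), True), (("C", "D"), False), (("A", "C"), True)]
print(posterior([]))            # before any data: shorter theories win
print(posterior(observations))  # after data: mass shifts to "magnet rules"
```

The normalizing sum in `posterior` runs over every candidate, so this exhaustive scoring is only possible because the toy space has three theories in it.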

386
00:12:37,980 --> 00:12:40,830
And that's sort of a
very beautiful picture.

387
00:12:40,830 --> 00:12:43,682
And the only problem with it
is that it is clearly false.

388
00:12:43,682 --> 00:12:45,390
And the reason it's
false is because this

389
00:12:45,390 --> 00:12:46,265
is an infinite space.

390
00:12:46,265 --> 00:12:48,139
And there's no way that
you could have scored

391
00:12:48,139 --> 00:12:49,237
the entire infinite space.

392
00:12:49,237 --> 00:12:51,570
There's no way that what
children are doing, what adults

393
00:12:51,570 --> 00:12:53,310
are doing, what
computers are doing

394
00:12:53,310 --> 00:12:56,460
is to instantly shift
probability mass

395
00:12:56,460 --> 00:12:57,960
over this entire space, right?

396
00:12:57,960 --> 00:12:58,800
New data comes in.

397
00:12:58,800 --> 00:13:00,660
They're holding in
this infinite space

398
00:13:00,660 --> 00:13:03,390
and shifting exactly the
probabilities around, right?

399
00:13:03,390 --> 00:13:06,270
This view of learning is
ridiculous, because, well, it's

400
00:13:06,270 --> 00:13:07,350
patently absurd.

401
00:13:07,350 --> 00:13:11,040
And also, people have
taken issue with it

402
00:13:11,040 --> 00:13:13,980
even though it was never meant
to be the story of learning

403
00:13:13,980 --> 00:13:15,792
that is happening
in the real-- well,

404
00:13:15,792 --> 00:13:17,500
how should I put this,
in the real world?

405
00:13:17,500 --> 00:13:20,125
Do you guys know the difference
between the computational level

406
00:13:20,125 --> 00:13:23,010
and the algorithmic level when
I say something like that?

407
00:13:23,010 --> 00:13:25,680
Show of hands-- who knows
about Marr's three levels?

408
00:13:25,680 --> 00:13:26,190
OK.

409
00:13:26,190 --> 00:13:28,880
Who doesn't know about Marr's
three levels of explanation?

410
00:13:28,880 --> 00:13:29,550
OK.

411
00:13:29,550 --> 00:13:32,100
The point is to say when you
try to explain the phenomenon,

412
00:13:32,100 --> 00:13:34,350
you're going to give it
several levels of explanation.

413
00:13:34,350 --> 00:13:36,360
You're going to give it sort
of the functional level, what

414
00:13:36,360 --> 00:13:37,944
we might call the
computational level.

415
00:13:37,944 --> 00:13:39,484
And then you're
going to actually say

416
00:13:39,484 --> 00:13:41,130
how does this actually
get implemented

417
00:13:41,130 --> 00:13:42,292
in an actual machine.

418
00:13:42,292 --> 00:13:44,250
And there are many ways
of implementing things.

419
00:13:44,250 --> 00:13:47,070
Like you might say, well, look,
the general problem for vision

420
00:13:47,070 --> 00:13:50,170
is this, or this thing needs
to be an addition function.

421
00:13:50,170 --> 00:13:52,360
These are the sort of
things I want it to do.

422
00:13:52,360 --> 00:13:53,735
But there are many
different ways

423
00:13:53,735 --> 00:13:55,390
to implement that function.

424
00:13:55,390 --> 00:13:56,950
Some of them are
better than others.

425
00:13:56,950 --> 00:13:58,180
That's the algorithmic level.

426
00:13:58,180 --> 00:14:01,440
How does this actually get
implemented in an algorithm?
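
The addition example can make the distinction concrete. In the sketch below, the computational level is the specification "return the sum of two numbers", and the two functions are different algorithms that both satisfy it. The function names are hypothetical, and the bitwise version assumes non-negative integers.

```python
def add_by_counting(a, b):
    """Algorithm 1: start at a and count up by one, b times."""
    total = a
    for _ in range(b):
        total += 1
    return total

def add_bitwise(a, b):
    """Algorithm 2: binary addition with carries (non-negative ints only)."""
    while b:
        carry = (a & b) << 1  # positions where both bits are 1 carry over
        a = a ^ b             # add the bits without the carry
        b = carry             # add the carry on the next pass
    return a

# Same function at the computational level, different algorithms.
print(add_by_counting(19, 23), add_bitwise(19, 23))
```

Both functions compute the same thing, but counting takes a number of steps proportional to b while the bitwise version takes a number of passes bounded by the bit length, which is the sense in which some algorithms are better than others.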

427
00:14:01,440 --> 00:14:02,940
Then you can implement
the algorithm

428
00:14:02,940 --> 00:14:04,754
in many different
mechanistic ways.

429
00:14:04,754 --> 00:14:07,170
You can go and implement it
in neurons or in silicon chips

430
00:14:07,170 --> 00:14:07,830
or things like that.

431
00:14:07,830 --> 00:14:09,705
There are many different
ways of implementing

432
00:14:09,705 --> 00:14:10,950
a particular algorithm.

433
00:14:10,950 --> 00:14:13,914
This view of, you know,
you have some theories,

434
00:14:13,914 --> 00:14:15,330
you have some prior
over theories,

435
00:14:15,330 --> 00:14:18,360
you shift the probabilities
around as data comes in--

436
00:14:18,360 --> 00:14:21,170
that's an explanation on
the computational level.

437
00:14:21,170 --> 00:14:23,970
That's not an explanation of how
we actually shift them around,

438
00:14:23,970 --> 00:14:26,640
how we actually search
for these theories.
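
A minimal sketch of the computational-level picture just described: a fixed set of theories, a prior over them, and probabilities that shift as data comes in. The theory names and all the numbers are hypothetical, chosen only for illustration. The sketch says nothing about how a learner would find the theories in the first place, which is the speaker's point.

```python
def posterior(prior, likelihoods):
    """Bayes' rule over a fixed, fully enumerated set of theories."""
    unnormalized = {t: prior[t] * likelihoods[t] for t in prior}
    total = sum(unnormalized.values())
    return {t: p / total for t, p in unnormalized.items()}

# Hypothetical numbers: a strong prior on the old theory, but the new
# data are far more probable under the new theory.
prior = {"old_theory": 0.99, "new_theory": 0.01}
likelihoods = {"old_theory": 0.001, "new_theory": 0.5}

post = posterior(prior, likelihoods)
print(post)
```

Both theories are on the table from the start here. The update only moves probability between them.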

439
00:14:26,640 --> 00:14:29,010
And really the
computational level

440
00:14:29,010 --> 00:14:31,417
sometimes gets grief,
because people say, well,

441
00:14:31,417 --> 00:14:32,250
what are you saying?

442
00:14:32,250 --> 00:14:33,750
Are you saying that
Einstein somehow

443
00:14:33,750 --> 00:14:35,490
had the theory of
relativity, we all

444
00:14:35,490 --> 00:14:37,470
had the theory of
relativity, in our head?

445
00:14:37,470 --> 00:14:39,080
And his process of
discovery was just

446
00:14:39,080 --> 00:14:40,920
to say, well, I believe
in Newton's theory.

447
00:14:40,920 --> 00:14:44,070
And I had some low
prior on relativity.

448
00:14:44,070 --> 00:14:47,160
But then the data came in,
and I shifted my probability

449
00:14:47,160 --> 00:14:48,780
to the theory of relativity.

450
00:14:48,780 --> 00:14:50,826
That sounds not the way
people actually learn.

451
00:14:50,826 --> 00:14:52,950
That doesn't sound like
the way we discover things.

452
00:14:52,950 --> 00:14:55,260
That doesn't sound like the way
we come up with new theories.

453
00:14:55,260 --> 00:14:57,420
That doesn't sound like the
way that children learn.

454
00:14:57,420 --> 00:14:59,290
And as I said, that's not
exactly what we think.

455
00:14:59,290 --> 00:15:01,290
And that's not what happens
in computers either.

456
00:15:01,290 --> 00:15:03,090
That's not the
algorithmic level.

457
00:15:03,090 --> 00:15:04,880
So what happens in
the algorithmic level

458
00:15:04,880 --> 00:15:08,285
is you actually have to
search for your theories, OK?

459
00:15:08,285 --> 00:15:10,160
And this is what happens
in stochastic search

460
00:15:10,160 --> 00:15:10,900
in particular.

461
00:15:10,900 --> 00:15:12,080
What happens in
stochastic search--

462
00:15:12,080 --> 00:15:14,240
those of you who are in
tutorial remember this--

463
00:15:14,240 --> 00:15:15,780
you have some space of theories.

464
00:15:15,780 --> 00:15:19,540
OK, I'm still giving you the
space of possible theories.

465
00:15:19,540 --> 00:15:21,740
Each dot in theory space
is, let's say, a program.

466
00:15:21,740 --> 00:15:24,177
Or in this case, it's something
like a theory in the sense

467
00:15:24,177 --> 00:15:25,010
I defined it before.

468
00:15:25,010 --> 00:15:28,370
It's a set of logical laws
relating these predicates

469
00:15:28,370 --> 00:15:28,890
together.

470
00:15:28,890 --> 00:15:29,390
OK.

471
00:15:29,390 --> 00:15:31,142
So let me use--

472
00:15:31,142 --> 00:15:32,600
I'm not sure you
can follow my hand

473
00:15:32,600 --> 00:15:34,790
like this, right-- gaze
detection or whatever.

474
00:15:34,790 --> 00:15:38,630
So theory A is, let's
say, a theory that

475
00:15:38,630 --> 00:15:39,910
has three possible laws.

476
00:15:39,910 --> 00:15:43,460
It says if X is a
P, P can be anything.

477
00:15:43,460 --> 00:15:46,160
And Y is a P. It's the
same sort of thing.

478
00:15:46,160 --> 00:15:47,520
Then they're going to interact.

479
00:15:47,520 --> 00:15:51,069
The second law says if
X is a P and Y is a Q,

480
00:15:51,069 --> 00:15:51,860
they will interact.

481
00:15:51,860 --> 00:15:52,490
It doesn't matter.

482
00:15:52,490 --> 00:15:53,960
You don't have to figure
this theory out, right?

483
00:15:53,960 --> 00:15:56,501
I'm just trying to give you an
example of what a theory could

484
00:15:56,501 --> 00:15:58,670
be theoretically.

485
00:15:58,670 --> 00:16:01,490
And they interact symmetrically.

486
00:16:01,490 --> 00:16:02,270
That's one dot.

487
00:16:02,270 --> 00:16:03,710
Let's call that dot number one.

488
00:16:03,710 --> 00:16:07,010
That's theory A. Here is dot
number two, theory B. OK?

489
00:16:07,010 --> 00:16:09,689
You as the learner, as the
stochastic search learner

490
00:16:09,689 --> 00:16:11,480
in the algorithmic
level, don't have access

491
00:16:11,480 --> 00:16:12,521
to the full theory space.

492
00:16:12,521 --> 00:16:14,870
All you have is A. That's
where you are right now.

493
00:16:14,870 --> 00:16:16,250
That's all you
have of the world.

494
00:16:16,250 --> 00:16:16,750
OK.

495
00:16:16,750 --> 00:16:18,707
That's how you can try
to explain the world.

496
00:16:18,707 --> 00:16:20,540
And what you can do is
you can try proposing

497
00:16:20,540 --> 00:16:22,550
certain changes to your theory.

498
00:16:22,550 --> 00:16:24,740
You can try taking out a
law or putting in a law

499
00:16:24,740 --> 00:16:27,080
or taking out a predicate,
somehow sort of messing

500
00:16:27,080 --> 00:16:29,160
with your theory,
hacking with your theory

501
00:16:29,160 --> 00:16:31,790
somehow, coming up
with a new theory that

502
00:16:31,790 --> 00:16:34,310
gives you theory B.
It's a different point

503
00:16:34,310 --> 00:16:35,511
in theory space.

504
00:16:35,511 --> 00:16:37,760
And what you do then is you
compare your two theories.

505
00:16:37,760 --> 00:16:40,454
You say, how well does
this predict the data?

506
00:16:40,454 --> 00:16:41,870
So and so, you
give it some score.

507
00:16:41,870 --> 00:16:44,080
You say, how well does this
theory predict the data?

508
00:16:44,080 --> 00:16:45,496
So and so, you
give it some score.

509
00:16:45,496 --> 00:16:48,140
You say, how likely is
the theory a priori?

510
00:16:48,140 --> 00:16:49,040
How short is it?

511
00:16:49,040 --> 00:16:50,110
How simple is it?

512
00:16:50,110 --> 00:16:50,690
OK.

513
00:16:50,690 --> 00:16:52,310
How short is this
theory a priori?

514
00:16:52,310 --> 00:16:52,550
OK.

515
00:16:52,550 --> 00:16:54,110
So you get some score
for this theory that's

516
00:16:54,110 --> 00:16:55,514
based on the data and the prior.

517
00:16:55,514 --> 00:16:56,930
You get some score
for this theory

518
00:16:56,930 --> 00:16:58,220
based on the data and the prior.

519
00:16:58,220 --> 00:17:00,553
And you basically decide which
one of these two theories

520
00:17:00,553 --> 00:17:01,310
to accept.

521
00:17:01,310 --> 00:17:03,590
You could either stay
with your old theory

522
00:17:03,590 --> 00:17:05,124
before you proposed any changes.

523
00:17:05,124 --> 00:17:06,790
Or you could decide
that, wait a minute,

524
00:17:06,790 --> 00:17:08,960
that theory that I just
proposed, the new one,

525
00:17:08,960 --> 00:17:10,102
is actually a bit better.

526
00:17:10,102 --> 00:17:12,560
Or even if it's not better,
it's not doing that much worse,

527
00:17:12,560 --> 00:17:13,450
so I'll jump to it.

528
00:17:13,450 --> 00:17:13,950
OK.

529
00:17:13,950 --> 00:17:15,970
That's the stochastic
part in stochastic search

530
00:17:15,970 --> 00:17:18,770
or in, like, these
Metropolis-Hastings algorithms.

531
00:17:18,770 --> 00:17:21,770
This way of sort of
jumping around in the space

532
00:17:21,770 --> 00:17:23,599
is the stochastic search
I'm talking about.

533
00:17:23,599 --> 00:17:25,099
You're here in theory space.

534
00:17:25,099 --> 00:17:26,720
You have one theory.

535
00:17:26,720 --> 00:17:28,339
You propose a change
to that theory.

536
00:17:28,339 --> 00:17:29,840
And we'll get in a
second to how you propose

537
00:17:29,840 --> 00:17:30,839
a change to that theory.

538
00:17:30,839 --> 00:17:31,885
You propose a change.

539
00:17:31,885 --> 00:17:32,510
You end up here.

540
00:17:32,510 --> 00:17:34,010
You say, should I move here?

541
00:17:34,010 --> 00:17:34,610
Let's check.

542
00:17:34,610 --> 00:17:36,050
Is this theory doing any better?

543
00:17:36,050 --> 00:17:37,880
If it's doing
better, move there.

544
00:17:37,880 --> 00:17:41,690
If it's doing worse, well,
maybe still move there

545
00:17:41,690 --> 00:17:43,940
depending on how much
worse it's doing.

546
00:17:43,940 --> 00:17:46,940
And this process is sort of
jumping around in theory space.

547
00:17:46,940 --> 00:17:49,550
Probabilistically proposing
theories and accepting them

548
00:17:49,550 --> 00:17:55,240
is not that far from what
MCMC is doing or optimization

549
00:17:55,240 --> 00:17:57,900
through MCMC.
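
The propose, score, and accept-or-reject loop described above can be sketched roughly as follows. This is an illustrative toy, not the model from the talk: the candidate laws, the observations, the noise level, and the length penalty are all assumptions made up for the example. A theory is a set of laws, its score is a log prior that favors short theories plus a log likelihood of the data, and a worse proposal is still accepted with a probability that shrinks the worse it is.

```python
import math
import random

# Hypothetical candidate laws: "objects of these two kinds interact".
LAWS = [("magnet", "magnet"), ("magnet", "metal"), ("metal", "metal"),
        ("magnet", "plastic"), ("metal", "plastic"), ("plastic", "plastic")]

# Hypothetical observations: (pair of kinds, did they interact?).
DATA = [(("magnet", "magnet"), True), (("magnet", "metal"), True),
        (("metal", "metal"), False), (("magnet", "plastic"), False),
        (("metal", "plastic"), False), (("plastic", "plastic"), False)]

def log_score(theory, noise=0.05, length_penalty=1.0):
    """Log prior (shorter is better) plus log likelihood of the data."""
    log_prior = -length_penalty * len(theory)
    log_lik = 0.0
    for pair, interacted in DATA:
        predicted = pair in theory
        log_lik += math.log(1 - noise if predicted == interacted else noise)
    return log_prior + log_lik

def propose(theory, rng):
    """Add or remove one randomly chosen law (a symmetric proposal)."""
    law = rng.choice(LAWS)
    return theory.symmetric_difference({law})

def stochastic_search(steps=2000, seed=0):
    rng = random.Random(seed)
    current = frozenset()          # start with no laws at all
    best = current
    for _ in range(steps):
        candidate = propose(current, rng)
        delta = log_score(candidate) - log_score(current)
        # Better: always move. Worse: move with probability exp(delta).
        if delta >= 0 or rng.random() < math.exp(delta):
            current = candidate
        if log_score(current) > log_score(best):
            best = current
    return best

print(sorted(stochastic_search()))
```

Because the proposal here is symmetric, the plain Metropolis acceptance rule is enough. An asymmetric proposal would need the Hastings correction.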

550
00:17:57,900 --> 00:17:58,670
Oh, sorry.

551
00:17:58,670 --> 00:17:59,970
I'm clicking on this.

552
00:17:59,970 --> 00:18:00,470
OK.

553
00:18:00,470 --> 00:18:02,570
And you can notice that this
sort of search is somewhat

554
00:18:02,570 --> 00:18:04,986
different from-- this is the
picture that Josh was showing

555
00:18:04,986 --> 00:18:06,080
before--

556
00:18:06,080 --> 00:18:08,294
gradient descent or
convex optimization,

557
00:18:08,294 --> 00:18:10,460
the sort of thing that you
might do in neural networks.

558
00:18:10,460 --> 00:18:12,680
I'm not trying to suggest that
this is exactly what happens

559
00:18:12,680 --> 00:18:13,430
in neural networks.

560
00:18:13,430 --> 00:18:15,513
I mean, the more complicated
neural network stuff,

561
00:18:15,513 --> 00:18:18,620
the energy landscape,
can be quite complicated.

562
00:18:18,620 --> 00:18:20,990
And they still need to do
stochastic gradient descent.

563
00:18:20,990 --> 00:18:23,870
But it doesn't look as fully
connected and as horrible

564
00:18:23,870 --> 00:18:27,230
as these sort of theories
on top look like.

565
00:18:27,230 --> 00:18:29,765
Because the space
there is much more--

566
00:18:29,765 --> 00:18:31,140
I don't want to
say well-defined.

567
00:18:31,140 --> 00:18:32,100
They're both well-defined.

568
00:18:32,100 --> 00:18:34,370
But it's sort of much easier
to search through, right?

569
00:18:34,370 --> 00:18:36,800
It's sort of easy to get
in these neural networks

570
00:18:36,800 --> 00:18:38,990
to know where you're going
to sort of differentiate

571
00:18:38,990 --> 00:18:40,840
and quickly get to your target.

572
00:18:40,840 --> 00:18:42,320
You're not so much
doing this sort

573
00:18:42,320 --> 00:18:44,247
of hard laborious
stochastic search.

574
00:18:44,247 --> 00:18:45,830
You sort of have
this notion of, well,

575
00:18:45,830 --> 00:18:47,420
it's immediately
going to be the best

576
00:18:47,420 --> 00:18:49,210
direction to go in is this.

577
00:18:49,210 --> 00:18:50,049
OK?

578
00:18:50,049 --> 00:18:51,590
I'm just going to
do gradient descent.

579
00:18:51,590 --> 00:18:53,060
I'm going to roll downhill.

580
00:18:53,060 --> 00:18:56,540
And then that's some sort of
good point in neural network

581
00:18:56,540 --> 00:18:57,190
land.
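
For contrast, here is a minimal sketch of "rolling downhill" on a smooth objective. The objective, the step size, and the starting point are assumptions picked for illustration; real neural network training is far more involved. The point is only that the gradient tells you which direction to move at every step, so nothing has to be proposed at random.

```python
def gradient_descent(grad, x0, learning_rate=0.1, steps=200):
    """Repeatedly step against the gradient from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= learning_rate * grad(x)
    return x

# Toy convex objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2.0 * (x - 3.0), x0=-10.0)
print(minimum)
```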

582
00:18:57,190 --> 00:18:57,895
OK.

583
00:18:57,895 --> 00:19:00,020
So how do we actually
propose alternative theories?

584
00:19:00,020 --> 00:19:03,970
Well, you say, look, I have
some sort of, let's say--

585
00:19:03,970 --> 00:19:05,150
let's do this.

586
00:19:05,150 --> 00:19:05,720
OK.

587
00:19:05,720 --> 00:19:07,310
You have some sort
of particular rule.

588
00:19:07,310 --> 00:19:08,809
And then what you
do-- remember when

589
00:19:08,809 --> 00:19:10,934
I said you have some sort
of grammar over theories,

590
00:19:10,934 --> 00:19:13,058
kind of like a grammar in
language for those of you

591
00:19:13,058 --> 00:19:13,700
who know?

592
00:19:13,700 --> 00:19:15,790
The grammar describes
a particular tree

593
00:19:15,790 --> 00:19:17,990
that you walk through to
end up with your theory.

594
00:19:17,990 --> 00:19:20,120
What you do to propose a
change to it-- and this

595
00:19:20,120 --> 00:19:21,786
is true for both these
logical theories,

596
00:19:21,786 --> 00:19:23,120
but also for programs.

597
00:19:23,120 --> 00:19:24,410
You go to any [INAUDIBLE].

598
00:19:24,410 --> 00:19:27,160
You go to any sort of
node in that tree that

599
00:19:27,160 --> 00:19:29,450
generated your program
or generated your theory,

600
00:19:29,450 --> 00:19:30,200
and you change it.

601
00:19:30,200 --> 00:19:33,020
You sort of re-sample from
it or, you know, cut it

602
00:19:33,020 --> 00:19:34,380
and regenerate.

603
00:19:34,380 --> 00:19:38,040
And then what that ends up doing
is it ends up, for example,

604
00:19:38,040 --> 00:19:40,470
adding a new rule
or, for example,

605
00:19:40,470 --> 00:19:43,490
changing the predicate, or
adding a new rule again,

606
00:19:43,490 --> 00:19:45,920
or deleting a rule, or
deleting a predicate,

607
00:19:45,920 --> 00:19:48,140
or changing a predicate,
deleting a predicate by sort

608
00:19:48,140 --> 00:19:50,640
of changing predicates, deleting
rules, adding rules, adding

609
00:19:50,640 --> 00:19:51,650
predicates.

610
00:19:51,650 --> 00:19:54,490
I wasn't covering
the full space.

611
00:19:54,490 --> 00:19:58,740
You jump around in theory land.
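
The proposal move described above, picking a node in the tree that generated the theory and regenerating from it, might look roughly like this. The toy grammar (THEORY, LAW, PRED), the predicate names, and the 0.5 probability of adding another law are hypothetical and stand in for the richer grammar over theories used in the talk.

```python
import random

PREDICATES = ["p", "q", "r"]   # hypothetical predicate names

def generate(symbol, rng):
    """Sample a derivation tree for `symbol` from the toy grammar."""
    if symbol == "THEORY":
        children = [generate("LAW", rng)]
        if rng.random() < 0.5:                 # THEORY -> LAW THEORY
            children.append(generate("THEORY", rng))
        return ("THEORY", children)
    if symbol == "LAW":                        # LAW -> interacts(PRED, PRED)
        return ("LAW", [generate("PRED", rng), generate("PRED", rng)])
    return ("PRED", rng.choice(PREDICATES))    # PRED -> p | q | r

def nodes(tree, path=()):
    """Return the path (tuple of child indices) to every node."""
    found = [path]
    symbol, body = tree
    if symbol != "PRED":
        for i, child in enumerate(body):
            found.extend(nodes(child, path + (i,)))
    return found

def regenerate_at(tree, path, rng):
    """Copy the tree, resampling the subtree found at `path`."""
    symbol, body = tree
    if not path:
        return generate(symbol, rng)
    new_children = list(body)
    new_children[path[0]] = regenerate_at(body[path[0]], path[1:], rng)
    return (symbol, new_children)

def propose(tree, rng):
    """Pick any node uniformly at random and regenerate from it."""
    return regenerate_at(tree, rng.choice(nodes(tree)), rng)

def laws(tree):
    """Flatten a derivation tree into readable laws."""
    symbol, body = tree
    if symbol == "LAW":
        return [f"interacts({body[0][1]}, {body[1][1]})"]
    if symbol == "THEORY":
        return [law for child in body for law in laws(child)]
    return []

rng = random.Random(1)
theory = generate("THEORY", rng)
print(laws(theory), "->", laws(propose(theory, rng)))
```

Regenerating a PRED node changes a predicate, regenerating a LAW node rewrites a rule, and regenerating a THEORY node can add or delete rules. This kind of proposal is generally not symmetric, so a full Metropolis-Hastings sampler would have to correct for that in its acceptance rule.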

612
00:19:58,740 --> 00:19:59,240
OK.

613
00:19:59,240 --> 00:20:04,150
And this sort of dynamic of
moving around in theory land

614
00:20:04,150 --> 00:20:05,440
stochastically--

615
00:20:05,440 --> 00:20:09,074
one step forward,
two steps back, you

616
00:20:09,074 --> 00:20:10,990
know, you don't know the
target ahead of time,

617
00:20:10,990 --> 00:20:12,490
when you actually
propose something,

618
00:20:12,490 --> 00:20:13,810
you propose something new--

619
00:20:13,810 --> 00:20:16,450
different learners might
take different paths

620
00:20:16,450 --> 00:20:18,460
to get to the same target
ultimately starting

621
00:20:18,460 --> 00:20:20,296
from an equal
state of ignorance.

622
00:20:20,296 --> 00:20:21,920
You sort of propose
different theories.

623
00:20:21,920 --> 00:20:22,780
You end up in different ways.

624
00:20:22,780 --> 00:20:24,405
But usually, if you
get the right data,

625
00:20:24,405 --> 00:20:26,560
eventually you end
up in the same spot.

626
00:20:26,560 --> 00:20:28,541
These discrete moments
of sort of, you

627
00:20:28,541 --> 00:20:30,415
know, faffing about and
sort of saying, well,

628
00:20:30,415 --> 00:20:31,910
this rule, that rule,
that doesn't explain it.

629
00:20:31,910 --> 00:20:32,890
This explains it,
doesn't explain it.

630
00:20:32,890 --> 00:20:35,410
Suddenly, you propose something
that sort of clicks into place.

631
00:20:35,410 --> 00:20:36,790
You say, ah, it's
not that there's

632
00:20:36,790 --> 00:20:37,789
two things in the world.

633
00:20:37,789 --> 00:20:39,550
There's three things
in the world, right?

634
00:20:39,550 --> 00:20:42,580
It's not just
metals and plastics.

635
00:20:42,580 --> 00:20:44,760
There's actually metals
and magnets and plastics.

636
00:20:44,760 --> 00:20:45,260
Ah.

637
00:20:45,260 --> 00:20:47,530
And suddenly, you rearrange
the data and things like that,

638
00:20:47,530 --> 00:20:48,030
right?

639
00:20:48,030 --> 00:20:49,690
All these things,
all these dynamics,

640
00:20:49,690 --> 00:20:52,090
have something of the flavor
of children's learning.

641
00:20:52,090 --> 00:20:53,920
And I didn't give
you the full spiel.

642
00:20:53,920 --> 00:20:56,204
And you can go and read
some of our papers.

643
00:20:56,204 --> 00:20:57,620
There's also been
other people who

644
00:20:57,620 --> 00:20:58,995
have been very
interested in sort

645
00:20:58,995 --> 00:21:01,360
of looking at the dynamics
of stochastic search,

646
00:21:01,360 --> 00:21:03,670
how well it predicts
what children are doing.

647
00:21:03,670 --> 00:21:05,410
Among them, I'll
just mention people

648
00:21:05,410 --> 00:21:07,690
like Liz Bonawitz and
Stephanie Denison,

649
00:21:07,690 --> 00:21:10,840
and also Tom Griffiths,
and finally Alison Gopnik.

650
00:21:10,840 --> 00:21:14,200
So I also have a TED
speaker on my side.

651
00:21:14,200 --> 00:21:19,380
And not just any TED speaker--
it's Laura's former advisor.

652
00:21:19,380 --> 00:21:22,990
So at that point, it's a good
point to hand it off to Laura

653
00:21:22,990 --> 00:21:24,900
and see what she has
to say about this.

654
00:21:24,900 --> 00:21:27,752
But a midpoint summary is to
say theories are useful, right?

655
00:21:27,752 --> 00:21:29,710
I think Josh has already
convinced you of that.

656
00:21:29,710 --> 00:21:31,090
We want theories.

657
00:21:31,090 --> 00:21:33,340
The problem is that they
define these rich, structured,

658
00:21:33,340 --> 00:21:35,530
complicated landscapes,
much more rich, much more

659
00:21:35,530 --> 00:21:36,850
complicated than
anything that you might

660
00:21:36,850 --> 00:21:37,974
find in the neural network.

661
00:21:37,974 --> 00:21:39,840
Well, yes.

662
00:21:39,840 --> 00:21:43,030
And it's hard to search for
these rich, complicated, fully

663
00:21:43,030 --> 00:21:45,850
connected landscapes, like
the space of all programs

664
00:21:45,850 --> 00:21:48,975
or the space of all
possible logic theories.

665
00:21:48,975 --> 00:21:50,350
And the way to
sort through it is

666
00:21:50,350 --> 00:21:53,320
stochastic search, which can
be horribly slow and wrong

667
00:21:53,320 --> 00:21:54,760
and things like that.

668
00:21:54,760 --> 00:21:57,610
But the claim is something
like, what are you

669
00:21:57,610 --> 00:21:58,660
going to do, right?

670
00:21:58,660 --> 00:22:00,220
I mean, we want
these rich theories.

671
00:22:00,220 --> 00:22:01,780
Rich theories define
rich landscapes.

672
00:22:01,780 --> 00:22:03,520
And you just have to
get away right now

673
00:22:03,520 --> 00:22:04,710
with stochastic search.

674
00:22:04,710 --> 00:22:06,670
Our algorithmic solution
for these spaces

675
00:22:06,670 --> 00:22:08,680
is stochastic search
in that rich landscape.

676
00:22:08,680 --> 00:22:11,020
And, well, why
shouldn't that apply

677
00:22:11,020 --> 00:22:13,480
to what children are doing?

678
00:22:13,480 --> 00:22:15,880
So here with a why not is Laura.

679
00:22:19,065 --> 00:22:20,786
LAURA SCHULZ: I
think I have one.

680
00:22:20,786 --> 00:22:22,340
I'm hooked up, right?

681
00:22:22,340 --> 00:22:24,230
OK.

682
00:22:24,230 --> 00:22:27,350
So when I first
engaged in this debate

683
00:22:27,350 --> 00:22:30,890
with Tomer, I was stuck
on this kind of a thought

684
00:22:30,890 --> 00:22:33,860
that what's going to
happen here is-- in which,

685
00:22:33,860 --> 00:22:37,190
following an eloquent
exposition of a formal model,

686
00:22:37,190 --> 00:22:39,650
attendant experiments,
and quantitative data,

687
00:22:39,650 --> 00:22:43,490
Laura proceeds to
wave her hands around.

688
00:22:43,490 --> 00:22:45,490
Someone was asking
about intuitions.

689
00:22:45,490 --> 00:22:47,660
And I just had this strong
compelling intuition

690
00:22:47,660 --> 00:22:49,460
this was completely wrong.

691
00:22:49,460 --> 00:22:51,380
But mainly, I was
puzzled as to why I was

692
00:22:51,380 --> 00:22:52,880
stuck on this archaic locution.

693
00:22:52,880 --> 00:22:55,574
What was I doing with
this "in which" situation?

694
00:22:55,574 --> 00:22:56,990
And it occurred
to me the reason I

695
00:22:56,990 --> 00:23:02,750
was stuck on this particular
locution is that I was thinking

696
00:23:02,750 --> 00:23:05,960
of a particular story
that nicely illustrates

697
00:23:05,960 --> 00:23:08,920
exactly what I think the problem
is with stochastic search.

698
00:23:08,920 --> 00:23:11,260
It's from a classic in
developmental literature,

699
00:23:11,260 --> 00:23:14,360
the stories, of course, of A.
A. Milne and Winnie the Pooh.

700
00:23:14,360 --> 00:23:17,270
And this is the particular
problem of "in which"

701
00:23:17,270 --> 00:23:21,170
Christopher Robin and Pooh
and all go on an expedition

702
00:23:21,170 --> 00:23:22,820
to the North Pole.

703
00:23:22,820 --> 00:23:24,410
And so they're
organizing a search.

704
00:23:24,410 --> 00:23:27,740
And like Tomer's algorithms,
they don't actually

705
00:23:27,740 --> 00:23:29,510
know where the North Pole is.

706
00:23:29,510 --> 00:23:33,365
And it turns out, as Christopher
Robin confides in a moment

707
00:23:33,365 --> 00:23:36,230
to Rabbit, they also don't know
what the North Pole is, also

708
00:23:36,230 --> 00:23:37,864
like Tomer's algorithms.

709
00:23:37,864 --> 00:23:39,530
But they do know something
about the terrain.

710
00:23:39,530 --> 00:23:41,071
And they know
something about search.

711
00:23:41,071 --> 00:23:43,100
You gather all the
friends and relations.

712
00:23:43,100 --> 00:23:45,350
And you engage in an iterative
process over and over

713
00:23:45,350 --> 00:23:46,760
and over again
until you succeed,

714
00:23:46,760 --> 00:23:48,650
and you find yourself
at the North Pole.

715
00:23:48,650 --> 00:23:51,320
And Eeyore is a little
bit skeptical about this.

716
00:23:51,320 --> 00:23:53,370
He says, well, you can
call this you know,

717
00:23:53,370 --> 00:23:57,650
expo-whatever or
gathering nuts in May.

718
00:23:57,650 --> 00:23:59,815
It's all the same to him.

719
00:23:59,815 --> 00:24:01,190
And at the end of the
day, I actually

720
00:24:01,190 --> 00:24:03,200
think Christopher Robin
and colleagues here

721
00:24:03,200 --> 00:24:06,890
have a certain set of advantages
over Tomer and colleagues.

722
00:24:06,890 --> 00:24:08,930
And that is that it is
a Hundred Acre Wood.

723
00:24:08,930 --> 00:24:10,620
And so if the North
Pole's really there,

724
00:24:10,620 --> 00:24:11,840
and they engage in
that process, and they

725
00:24:11,840 --> 00:24:13,590
have all of Rabbit's
friends and relations,

726
00:24:13,590 --> 00:24:15,650
they probably will find it.

727
00:24:15,650 --> 00:24:20,000
But it also turns out that,
unlike Tomer and colleagues,

728
00:24:20,000 --> 00:24:23,360
they do know something about
what they're searching for.

729
00:24:23,360 --> 00:24:25,230
Now, they're wrong about this.

730
00:24:25,230 --> 00:24:28,280
But nonetheless, I think they
access an important constraint

731
00:24:28,280 --> 00:24:30,786
on the search
process that actually

732
00:24:30,786 --> 00:24:31,910
helps them out quite a lot.

733
00:24:31,910 --> 00:24:37,200
So that's what I'm going to
try to talk to you about today.

734
00:24:37,200 --> 00:24:39,350
So here are the issues
with stochastic search.

735
00:24:39,350 --> 00:24:41,600
I think there are two big ones.

736
00:24:41,600 --> 00:24:43,580
Grant everything--
grant grammar,

737
00:24:43,580 --> 00:24:45,530
grant prior knowledge,
grant templates,

738
00:24:45,530 --> 00:24:47,810
grant a bias towards simplicity.

739
00:24:47,810 --> 00:24:50,270
The problem with an
infinite search space

740
00:24:50,270 --> 00:24:54,650
is infinite is a very, very,
very big space to search.

741
00:24:54,650 --> 00:24:56,640
It's a very big space to search.

742
00:24:56,640 --> 00:25:01,200
And children seem to do
remarkably well with it.

743
00:25:01,200 --> 00:25:04,040
So I'm going to, I guess, give
you a toy fictional example.

744
00:25:04,040 --> 00:25:09,140
I'm going to give you
a non-fiction example

745
00:25:09,140 --> 00:25:09,830
from my child.

746
00:25:09,830 --> 00:25:11,930
We were riding on an airplane.

747
00:25:11,930 --> 00:25:14,000
She was about 3 and
1/2 at the time.

748
00:25:14,000 --> 00:25:16,210
And she knows a lot
about airplanes.

749
00:25:16,210 --> 00:25:18,650
She has a lot of folk
physics, prior knowledge

750
00:25:18,650 --> 00:25:20,540
about airplanes.

751
00:25:20,540 --> 00:25:22,330
And she also knows
a lot about phones.

752
00:25:22,330 --> 00:25:23,996
She's had a lot of
experience of phones.

753
00:25:23,996 --> 00:25:25,520
But nothing in her
prior knowledge

754
00:25:25,520 --> 00:25:26,780
predicted the
announcement that you

755
00:25:26,780 --> 00:25:29,330
have to turn off your cell phone
when you fly a plane, right?

756
00:25:29,330 --> 00:25:30,440
So this was surprising.

757
00:25:30,440 --> 00:25:32,523
And she immediately said,
well, I know the answer.

758
00:25:32,523 --> 00:25:34,440
I know why you have to do that.

759
00:25:34,440 --> 00:25:36,170
And I know that she
doesn't know anything

760
00:25:36,170 --> 00:25:38,315
about radio transmission
or government bureaucracy.

761
00:25:38,315 --> 00:25:40,340
So I said, how do you know?

762
00:25:40,340 --> 00:25:41,240
Why do you think?

763
00:25:41,240 --> 00:25:45,300
And she said, well, because
when the plane takes off,

764
00:25:45,300 --> 00:25:48,790
it's too noisy to hear.

765
00:25:48,790 --> 00:25:51,680
You know, that example
is not especially clever

766
00:25:51,680 --> 00:25:54,050
or especially adorable.

767
00:25:54,050 --> 00:25:56,994
But what is really, really
interesting about that answer

768
00:25:56,994 --> 00:25:58,910
is that although it is
wrong-- it's even wrong

769
00:25:58,910 --> 00:26:01,520
as to the causal direction--

770
00:26:01,520 --> 00:26:04,040
it is a good wrong
answer compared

771
00:26:04,040 --> 00:26:05,960
to all of the other
things that are

772
00:26:05,960 --> 00:26:08,240
consistent with her prior
knowledge and the grammar

773
00:26:08,240 --> 00:26:11,630
of her intuitive theories
that she didn't say.

774
00:26:11,630 --> 00:26:13,890
She didn't say, because
airplanes are made of metal

775
00:26:13,890 --> 00:26:14,889
and so are phones.

776
00:26:14,889 --> 00:26:16,430
Because airplanes
fly over the Earth,

777
00:26:16,430 --> 00:26:17,510
and the Earth has phones.

778
00:26:17,510 --> 00:26:19,550
Because airplanes are
big and phones are small.

779
00:26:19,550 --> 00:26:21,592
Because airplanes-- her
grandfather lives in Ohio

780
00:26:21,592 --> 00:26:24,050
and has led her to believe that
everything is made in Ohio.

781
00:26:24,050 --> 00:26:26,200
Because airplanes and phones
are both made in Ohio.

782
00:26:26,200 --> 00:26:28,610
Infinite is a very big space.

783
00:26:28,610 --> 00:26:30,080
There are a lot
of things that you

784
00:26:30,080 --> 00:26:32,864
could say consistent
with prior knowledge

785
00:26:32,864 --> 00:26:34,280
where you're making
random changes

786
00:26:34,280 --> 00:26:38,330
in your intuitive theories
that are not even wrong.

787
00:26:38,330 --> 00:26:39,860
They're not even wrong.

788
00:26:39,860 --> 00:26:41,900
And so the real
question is, how did she

789
00:26:41,900 --> 00:26:45,260
converge at a good wrong
answer, at an answer

790
00:26:45,260 --> 00:26:48,130
that, although it is
wrong, makes sense?

791
00:26:48,130 --> 00:26:53,420
It isn't just, I
think, a toy problem.

792
00:26:53,420 --> 00:26:55,130
But here is the problem.

793
00:26:55,130 --> 00:26:57,110
There are innumerable
logical, constitutive,

794
00:26:57,110 --> 00:26:59,900
causal and relational hypotheses
consistent with the grammar

795
00:26:59,900 --> 00:27:01,200
of intuitive theories.

796
00:27:01,200 --> 00:27:03,650
How do we so rapidly, literally
between the announcement

797
00:27:03,650 --> 00:27:05,191
and the next thing
out of our mouths,

798
00:27:05,191 --> 00:27:08,240
converge on ones that,
if they were true,

799
00:27:08,240 --> 00:27:09,542
might explain the data?

800
00:27:09,542 --> 00:27:10,500
They might not be true.

801
00:27:10,500 --> 00:27:12,710
But if they were,
they could work.

802
00:27:12,710 --> 00:27:14,300
They could solve problems.

803
00:27:14,300 --> 00:27:17,300
And that I think is the
really hard mystery.

804
00:27:17,300 --> 00:27:20,830
And again, it's not
just a toy problem.

805
00:27:20,830 --> 00:27:24,380
Modeling even relatively
simple well-understood problems

806
00:27:24,380 --> 00:27:24,920
takes time.

807
00:27:24,920 --> 00:27:26,977
I would often come
across Josh's students

808
00:27:26,977 --> 00:27:28,060
wandering in the hallways.

809
00:27:28,060 --> 00:27:28,930
And I say, what are you doing?

810
00:27:28,930 --> 00:27:31,175
They're like, oh, I'm
waiting for my model to run.

811
00:27:31,175 --> 00:27:32,550
I'm like, waiting
for your model?

812
00:27:32,550 --> 00:27:33,930
Computers are really fast.

813
00:27:33,930 --> 00:27:35,810
They're fast
information processing.

814
00:27:35,810 --> 00:27:37,016
What is it doing?

815
00:27:37,016 --> 00:27:39,140
Well, it turns out it's
generating a lot of answers

816
00:27:39,140 --> 00:27:40,460
that aren't even wrong.

817
00:27:40,460 --> 00:27:41,970
That's what it's doing, right?

818
00:27:41,970 --> 00:27:44,360
It's spending a
lot of time sitting

819
00:27:44,360 --> 00:27:48,140
around sifting through things
that aren't even right.

820
00:27:48,140 --> 00:27:50,859
Iterations are spent
searching in hopeless places.

821
00:27:50,859 --> 00:27:53,150
And this is true of some of
the best and the brightest.

822
00:27:53,150 --> 00:27:56,920
This was like a fantastic
NIPS paper, a major advance.

823
00:27:56,920 --> 00:28:00,110
It's a probabilistic
graphics program.

824
00:28:00,110 --> 00:28:04,005
And it's solving the really
deep theory problem of CAPTCHAs.

825
00:28:04,005 --> 00:28:05,920
So it's trying to figure
out in this case--

826
00:28:05,920 --> 00:28:07,814
but it doesn't just
do-- in fairness-- well,

827
00:28:07,814 --> 00:28:08,730
let me return to that.

828
00:28:08,730 --> 00:28:09,950
Let me show you what it does do.

829
00:28:09,950 --> 00:28:11,241
There's a rectangle over there.

830
00:28:11,241 --> 00:28:14,300
It wants to be able to figure
out what is in that space.

831
00:28:14,300 --> 00:28:15,890
And it wants to model it.

832
00:28:15,890 --> 00:28:18,320
It wants to find a rectangle
in the lower left-hand corner

833
00:28:18,320 --> 00:28:18,950
of a scene.

834
00:28:18,950 --> 00:28:19,580
So what does it do?

835
00:28:19,580 --> 00:28:21,110
It generates pixels
all over the map

836
00:28:21,110 --> 00:28:22,318
until it finds the rectangle.

837
00:28:22,318 --> 00:28:24,450
And I saw this in [INAUDIBLE].

838
00:28:24,450 --> 00:28:26,164
And I said, why is
it looking all--

839
00:28:26,164 --> 00:28:27,580
can it at least
confine its search

840
00:28:27,580 --> 00:28:28,790
to the lower left-hand corner?

841
00:28:28,790 --> 00:28:30,540
But, of course, the
algorithm doesn't know

842
00:28:30,540 --> 00:28:31,820
from lower left-hand corners.

843
00:28:31,820 --> 00:28:33,266
It's a powerful algorithm.

844
00:28:33,266 --> 00:28:34,640
If you wanted to
solve a CAPTCHA,

845
00:28:34,640 --> 00:28:36,140
you could do-- see, look at it.

846
00:28:36,140 --> 00:28:37,140
It's all over the place.

847
00:28:37,140 --> 00:28:39,530
Now, this is a virtue.

848
00:28:39,530 --> 00:28:40,680
And, yeah, it converges.

849
00:28:40,680 --> 00:28:42,200
And that's just great, right?

850
00:28:42,200 --> 00:28:45,092
But why doesn't it search
in the lower left-hand corner?

851
00:28:45,092 --> 00:28:46,550
Well, the answer
is it doesn't know

852
00:28:46,550 --> 00:28:47,440
from lower left-hand corners.

853
00:28:47,440 --> 00:28:50,000
And it's a feature, not a bug,
that it doesn't know from this.

854
00:28:50,000 --> 00:28:51,140
Because that means
it doesn't just

855
00:28:51,140 --> 00:28:53,348
solve CAPTCHAs, which you
can do with edge detection.

856
00:28:53,348 --> 00:28:55,370
It can find, you know,
objects in the road.

857
00:28:55,370 --> 00:28:57,152
And it can do all
kinds of other things

858
00:28:57,152 --> 00:28:58,610
that I'm sure Josh
and Vikash would

859
00:28:58,610 --> 00:28:59,859
be happy to talk to you about.

860
00:28:59,859 --> 00:29:01,250
It's a pretty general thing.

861
00:29:01,250 --> 00:29:02,420
But it's not constrained.

862
00:29:02,420 --> 00:29:03,050
And that's a feature.

863
00:29:03,050 --> 00:29:04,500
But the interesting
thing about humans,

864
00:29:04,500 --> 00:29:06,041
including human
children, is they are

865
00:29:06,041 --> 00:29:07,524
both flexible and constrained.

866
00:29:07,524 --> 00:29:09,440
They can both solve a
whole bunch of problems,

867
00:29:09,440 --> 00:29:11,900
and they can converge on
them very quickly, right?

868
00:29:11,900 --> 00:29:14,930
Whereas, here, it trades off.

869
00:29:22,020 --> 00:29:23,870
And, again, that is
a simpler problem.

870
00:29:23,870 --> 00:29:25,670
That is a square and
a bunch of pixels.

871
00:29:25,670 --> 00:29:27,230
The kind of learning we're
talking about when we're

872
00:29:27,230 --> 00:29:29,396
trying to talk about theory
generation or real world

873
00:29:29,396 --> 00:29:32,480
learning is a really, really,
really, really big space.

874
00:29:32,480 --> 00:29:34,190
So one problem is
just how are you

875
00:29:34,190 --> 00:29:37,310
going to get at least to
answers that, if they were true,

876
00:29:37,310 --> 00:29:38,960
might work?

877
00:29:38,960 --> 00:29:42,230
But if one problem is that the
theory space is really big,

878
00:29:42,230 --> 00:29:45,440
the other problem is that human
learners are not that dumb.

879
00:29:45,440 --> 00:29:46,780
We have a lot of knowledge.

880
00:29:46,780 --> 00:29:49,280
And we have a lot of knowledge
that these algorithms are not

881
00:29:49,280 --> 00:29:51,230
making use of.

882
00:29:51,230 --> 00:29:53,170
And the question is, why not?

883
00:29:53,170 --> 00:29:54,920
And is there any way
that we could develop

884
00:29:54,920 --> 00:29:57,200
models that did make use of it?

885
00:29:57,200 --> 00:30:00,650
In particular, we know a
lot about our problems.

886
00:30:00,650 --> 00:30:02,420
Our problems are
actually our friends.

887
00:30:02,420 --> 00:30:04,190
We know about our
problems and our goals.

888
00:30:04,190 --> 00:30:07,220
And we know about our
problems well before we

889
00:30:07,220 --> 00:30:10,250
can solve those problems.

890
00:30:10,250 --> 00:30:13,670
An abstract representation
of what the solution might

891
00:30:13,670 --> 00:30:16,160
look like, what it
ought to do, what

892
00:30:16,160 --> 00:30:18,560
the criteria it's
trying to satisfy are,

893
00:30:18,560 --> 00:30:20,960
could help constrain
and guide the search.

894
00:30:20,960 --> 00:30:22,520
It matters, though, not just
it though, not just

895
00:30:22,520 --> 00:30:24,860
that she had prior knowledge
about airplanes and about

896
00:30:24,860 --> 00:30:27,050
phones, but that she
had prior knowledge

897
00:30:27,050 --> 00:30:28,890
that the problem
she was solving was

898
00:30:28,890 --> 00:30:32,390
an unpredicted incompatibility
between airplanes and phones.

899
00:30:32,390 --> 00:30:35,060
You have to turn one off
when the other is going on.

900
00:30:35,060 --> 00:30:37,475
That's information that's
not in her general background

901
00:30:37,475 --> 00:30:37,975
knowledge.

902
00:30:37,975 --> 00:30:41,066
It's about the particular
problem that she has.

903
00:30:41,066 --> 00:30:42,440
And the question
is how could you

904
00:30:42,440 --> 00:30:49,820
use that to make good proposals
and make better proposals?

905
00:30:49,820 --> 00:30:53,300
So the proposal I
have is that when

906
00:30:53,300 --> 00:30:56,359
you know something about
what you're looking for,

907
00:30:56,359 --> 00:30:57,650
then that can help you find it.

908
00:30:57,650 --> 00:30:58,640
And this is the
kind of knowledge

909
00:30:58,640 --> 00:30:59,900
that Christopher
Robin and colleagues

910
00:30:59,900 --> 00:31:01,490
had that Tomer and
colleagues did not.

911
00:31:01,490 --> 00:31:04,730
They at least eventually decided
that what they were looking for

912
00:31:04,730 --> 00:31:05,810
was, of course, a pole.

913
00:31:05,810 --> 00:31:06,980
They know what poles are.

914
00:31:06,980 --> 00:31:08,090
Therefore, when
they find a pole,

915
00:31:08,090 --> 00:31:10,048
they can be quite confident
that here they are.

916
00:31:10,048 --> 00:31:12,500
And this is a good candidate
solution to their problem.

917
00:31:12,500 --> 00:31:16,830
That solution is wrong,
but at least it's wrong.

918
00:31:19,700 --> 00:31:20,320
OK.

919
00:31:20,320 --> 00:31:24,780
So the argument
here, which I'm going

920
00:31:24,780 --> 00:31:26,880
to try to get
slightly more precise,

921
00:31:26,880 --> 00:31:29,940
is that the form of the problem
as an input to the algorithm

922
00:31:29,940 --> 00:31:33,630
should increase the probability
that it proposes useful ideas.

923
00:31:33,630 --> 00:31:38,790
And you can consider this
even in the simplest form

924
00:31:38,790 --> 00:31:41,385
in the kind of information that
is contained in our question

925
00:31:41,385 --> 00:31:41,885
words.

926
00:31:41,885 --> 00:31:44,550
So I think it's an interesting
feature about human cognition

927
00:31:44,550 --> 00:31:46,740
that we have a very, very
small handful of question

928
00:31:46,740 --> 00:31:50,941
words, which we use to
query the entire universe.

929
00:31:50,941 --> 00:31:51,690
And you know what?

930
00:31:51,690 --> 00:31:54,670
Those question words do
a lot of work for us.

931
00:31:54,670 --> 00:31:58,410
When I tell you I'm asking
a question about who,

932
00:31:58,410 --> 00:32:01,274
you might propose that
we ought to be looking

933
00:32:01,274 --> 00:32:02,940
for some kind of
answer that's something

934
00:32:02,940 --> 00:32:03,890
like a social network.

935
00:32:03,890 --> 00:32:06,210
And a social network might
be more likely as an answer

936
00:32:06,210 --> 00:32:07,560
than a 2D map.

937
00:32:07,560 --> 00:32:10,500
Whereas, if I ask you a question
about where, well, you really

938
00:32:10,500 --> 00:32:12,150
do want to consider 2D maps.

939
00:32:12,150 --> 00:32:14,670
If I'm asking when, you're
talking about a timeline

940
00:32:14,670 --> 00:32:15,390
answer.

941
00:32:15,390 --> 00:32:19,350
If I ask you a why question,
maybe it's a causal network.

942
00:32:19,350 --> 00:32:21,180
And if I'm asking
you a how question,

943
00:32:21,180 --> 00:32:24,037
maybe it's a circuit
diagram, right?

944
00:32:24,037 --> 00:32:26,370
You don't know anything about
the content at this point.

945
00:32:26,370 --> 00:32:27,900
I could be asking you anything.

946
00:32:27,900 --> 00:32:30,690
But I ask you who was
Christopher Columbus,

947
00:32:30,690 --> 00:32:33,120
and you answer 1492.

948
00:32:33,120 --> 00:32:35,730
That's the kind of thing that
our algorithms are doing.

949
00:32:35,730 --> 00:32:38,190
That's not even wrong, right?

950
00:32:38,190 --> 00:32:40,110
It's consistent with
your prior knowledge.

951
00:32:40,110 --> 00:32:42,270
And it's the kind
of thing Watson does

952
00:32:42,270 --> 00:32:44,400
as good [INAUDIBLE] solutions.

953
00:32:44,400 --> 00:32:46,420
But it's not what children do.

954
00:32:46,420 --> 00:32:49,290
It's not what children do.
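
A minimal sketch of the question-word idea above, assuming a small fixed set of structural forms and a made-up `boost` weight (neither comes from the talk). It only shows how a question word could shift prior probability toward one form of answer before any content is known; the names `FORMS`, `QUESTION_TO_FORM`, and `form_prior` are hypothetical.

```python
# Hypothetical illustration: a question word reweights a prior over answer forms.
FORMS = ["social network", "2D map", "timeline", "causal network", "circuit diagram"]

# Mapping taken from the examples in the talk (who/where/when/why/how).
QUESTION_TO_FORM = {
    "who": "social network",
    "where": "2D map",
    "when": "timeline",
    "why": "causal network",
    "how": "circuit diagram",
}


def form_prior(question_word, boost=10.0):
    """Return a normalized prior over FORMS, favoring the form tied to the question word.

    `boost` is an assumed weight, not a value from the talk. Unknown question
    words fall back to a uniform prior.
    """
    favored = QUESTION_TO_FORM.get(question_word.lower())
    weights = {form: (boost if form == favored else 1.0) for form in FORMS}
    total = sum(weights.values())
    return {form: weight / total for form, weight in weights.items()}


prior_who = form_prior("who")
prior_where = form_prior("where")
```

Under these assumptions, a "who" question makes a social network the most probable form, so an answer like "1492" (a point on a timeline) starts out unlikely before any evidence is considered.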

955
00:32:49,290 --> 00:32:53,310
I think that the issue
is that this is actually

956
00:32:53,310 --> 00:32:54,830
a friendly amendment, right?

957
00:32:54,830 --> 00:32:57,500
Because what we have
shown time and again--

958
00:32:57,500 --> 00:33:02,460
by we I mean not me, but
computational modeling folks--

959
00:33:02,460 --> 00:33:05,310
is that we can use
lots of information

960
00:33:05,310 --> 00:33:08,670
out there for hypothesis
evaluation, right?

961
00:33:08,670 --> 00:33:10,620
Once we have a theory
and we have the data,

962
00:33:10,620 --> 00:33:13,320
we can select and use this
information and say, well,

963
00:33:13,320 --> 00:33:15,600
does it answer the
problem or not, right?

964
00:33:15,600 --> 00:33:16,710
Does it improve?

965
00:33:16,710 --> 00:33:18,490
Does it make better predictions?

966
00:33:18,490 --> 00:33:22,720
So we use this kind of
information that we have.

967
00:33:22,720 --> 00:33:24,840
Even formally, we can
say that we can use

968
00:33:24,840 --> 00:33:26,912
it to select among hypotheses.

969
00:33:26,912 --> 00:33:29,370
We can use information about
the structural form of problems

970
00:33:29,370 --> 00:33:30,360
to represent them.

971
00:33:30,360 --> 00:33:34,470
The question is, can we use
the same kind of information

972
00:33:34,470 --> 00:33:35,970
to constrain the search space?

973
00:33:35,970 --> 00:33:37,136
And it's easy for me to say.

974
00:33:37,136 --> 00:33:38,790
I don't have to do that, right?

975
00:33:38,790 --> 00:33:43,710
But it's the kind of proposal
that I think is missing.

976
00:33:43,710 --> 00:33:46,890
Because we have rich constraints
that go far, far, far

977
00:33:46,890 --> 00:33:48,930
beyond our question words.

978
00:33:48,930 --> 00:33:50,370
The kinds of
problems that we have

979
00:33:50,370 --> 00:33:52,170
and the criteria
for solving them

980
00:33:52,170 --> 00:33:54,640
derive from all
kinds of sources.

981
00:33:54,640 --> 00:33:56,610
We try to solve different
kinds of problems--

982
00:33:56,610 --> 00:34:01,939
navigation in some cases,
explanation in other cases.

983
00:34:01,939 --> 00:34:03,480
And some of those
are epistemic ends.

984
00:34:03,480 --> 00:34:04,620
I want to persuade
you of something.

985
00:34:04,620 --> 00:34:06,300
I want to instruct
you in something.

986
00:34:06,300 --> 00:34:09,420
I want to deceive
you in some ways.

987
00:34:09,420 --> 00:34:11,520
But we also have all kinds
of non-epistemic goals.

988
00:34:11,520 --> 00:34:12,440
I want to impress you.

989
00:34:12,440 --> 00:34:13,315
I want to soothe you.

990
00:34:13,315 --> 00:34:14,370
I want to entertain you.

991
00:34:14,370 --> 00:34:17,639
Each of these goals is
actually a constraint

992
00:34:17,639 --> 00:34:21,026
on what is going to
count as the solution.

993
00:34:21,026 --> 00:34:24,210
Our goals are innumerable.

994
00:34:24,210 --> 00:34:25,980
But there are only
a small handful

995
00:34:25,980 --> 00:34:28,500
of ways you can
solve any given goal.

996
00:34:28,500 --> 00:34:31,139
So when you're dealing with
an infinite search space,

997
00:34:31,139 --> 00:34:34,139
having a goal, having
a problem, actually

998
00:34:34,139 --> 00:34:37,199
could act as a constraint on
how you search for the solution.

999
00:34:37,199 --> 00:34:39,810
And it is an interesting
feature of human cognition

1000
00:34:39,810 --> 00:34:43,710
that our goals can
be noble or venial.

1001
00:34:43,710 --> 00:34:46,199
They can be
impressive or trivial.

1002
00:34:46,199 --> 00:34:49,080
And it may not
matter with respect

1003
00:34:49,080 --> 00:34:50,850
to the solution we have.

1004
00:34:50,850 --> 00:34:54,090
We have analytic logic,
because the medieval monks

1005
00:34:54,090 --> 00:34:58,530
wanted to find incontrovertible
proof for the existence of God.

1006
00:34:58,530 --> 00:35:00,480
We don't hold onto
their goal, but we

1007
00:35:00,480 --> 00:35:02,280
hold onto their solution.

1008
00:35:02,280 --> 00:35:07,590
We are here on the East
Coast of Massachusetts,

1009
00:35:07,590 --> 00:35:11,550
because of the search for
the West Indies, right?

1010
00:35:11,550 --> 00:35:15,010
So our goals act as
constraints on the solution

1011
00:35:15,010 --> 00:35:16,560
and on the search process.

1012
00:35:16,560 --> 00:35:18,070
And the importance
of our goals may

1013
00:35:18,070 --> 00:35:21,600
be that they do exactly
that, that they help leverage

1014
00:35:21,600 --> 00:35:27,330
some new search in a way that
at least helps us make progress.

1015
00:35:27,330 --> 00:35:30,630
So the argument here is
instead of stochastic search,

1016
00:35:30,630 --> 00:35:31,221
that we have--

1017
00:35:31,221 --> 00:35:32,470
I don't call it goal oriented.

1018
00:35:32,470 --> 00:35:36,690
I call it goal constrained now--
goal-constrained hypothesis

1019
00:35:36,690 --> 00:35:38,790
generation.

1020
00:35:38,790 --> 00:35:41,940
And the idea here
is that at least we

1021
00:35:41,940 --> 00:35:44,345
know something about
where we want to go.

1022
00:35:44,345 --> 00:35:45,720
Now, this is not
a total argument

1023
00:35:45,720 --> 00:35:47,100
against stochastic search.

1024
00:35:47,100 --> 00:35:49,500
It's just a way of getting
stochastic search into a much

1025
00:35:49,500 --> 00:35:51,390
smaller search space, right?

1026
00:35:51,390 --> 00:35:53,970
Once you know what things count,
then you can do everything

1027
00:35:53,970 --> 00:35:54,720
Tomer says you do.

1028
00:35:54,720 --> 00:35:55,804
I actually agree with him.

1029
00:35:55,804 --> 00:35:57,928
But you don't want to do
it over all possibilities,

1030
00:35:57,928 --> 00:35:59,910
because you know a lot
more than that, right?

1031
00:35:59,910 --> 00:36:01,480
You know what kinds of
things are going to count.

1032
00:36:01,480 --> 00:36:03,540
You should do it over that
space, not over-- should

1033
00:36:03,540 --> 00:36:04,740
look in the left-hand corner.

1034
00:36:04,740 --> 00:36:06,360
Then you can iterate
all the pixels you want.

1035
00:36:06,360 --> 00:36:08,151
But if the thing's on
the left-hand corner,

1036
00:36:08,151 --> 00:36:11,400
that's where you
ought to be looking.
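
A rough sketch of the point about confining the search, under simplifying assumptions: the target is a single cell on a 32 x 32 grid, proposals are uniform random guesses (not the inference algorithm shown in the demo), and the "goal knowledge" is just a bounding region. The names `find_target`, `mean_steps`, and the grid size are hypothetical.

```python
import random


def find_target(target, region, rng, max_steps=100_000):
    """Guess cells uniformly at random inside `region` until `target` is hit.

    `region` is ((x_min, x_max), (y_min, y_max)) with exclusive upper bounds.
    Returns the number of proposals made (capped at max_steps).
    """
    (x0, x1), (y0, y1) = region
    for step in range(1, max_steps + 1):
        guess = (rng.randrange(x0, x1), rng.randrange(y0, y1))
        if guess == target:
            return step
    return max_steps


def mean_steps(target, region, trials=200, seed=0):
    """Average number of proposals over several independent searches."""
    rng = random.Random(seed)
    return sum(find_target(target, region, rng) for _ in range(trials)) / trials


SIZE = 32
target = (3, 5)  # assumed target location, origin at the lower left
whole_image = ((0, SIZE), (0, SIZE))
lower_left = ((0, SIZE // 2), (0, SIZE // 2))

unconstrained = mean_steps(target, whole_image)  # search everywhere
constrained = mean_steps(target, lower_left)     # search only the lower-left quadrant
```

The search inside the quadrant is still stochastic; it just runs over a space a quarter of the size, so on average it needs about a quarter as many proposals.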

1037
00:36:11,400 --> 00:36:15,120
I'm going to give you a
corollary to this, which

1038
00:36:15,120 --> 00:36:17,930
is if you don't have any idea
what the search space is,

1039
00:36:17,930 --> 00:36:19,820
you are going to
resort to an extremely

1040
00:36:19,820 --> 00:36:22,064
inefficient, extremely
frustrating search

1041
00:36:22,064 --> 00:36:24,230
and, actually, the kinds
of conditions under which I

1042
00:36:24,230 --> 00:36:25,670
think human beings quit.

1043
00:36:25,670 --> 00:36:29,150
So I will give you an example
from my personal experience,

1044
00:36:29,150 --> 00:36:30,380
as Jessica Sommerville knows.

1045
00:36:30,380 --> 00:36:33,186
Because we were trying to
get my child's booster seat--

1046
00:36:33,186 --> 00:36:35,060
because children now
need booster seats until

1047
00:36:35,060 --> 00:36:36,660
they're 14--

1048
00:36:36,660 --> 00:36:38,510
re-attached from
my plane flight.

1049
00:36:38,510 --> 00:36:39,530
Couldn't do it, right?

1050
00:36:39,530 --> 00:36:41,113
One thing goes down,
two things go up.

1051
00:36:41,113 --> 00:36:42,260
It's a spatial problem.

1052
00:36:42,260 --> 00:36:44,470
We spent 10 minutes,
two PhDs, on it--

1053
00:36:44,470 --> 00:36:46,777
threw the thing out
and just had her ride

1054
00:36:46,777 --> 00:36:48,860
in the bottom of the booster
seat without the top.

1055
00:36:48,860 --> 00:36:52,730
If I have a 1,000-piece puzzle and
I have to find a puzzle piece,

1056
00:36:52,730 --> 00:36:55,280
people who are good
at puzzles know

1057
00:36:55,280 --> 00:36:58,360
before they find that
piece something about what

1058
00:36:58,360 --> 00:36:59,360
it's going to look like.

1059
00:36:59,360 --> 00:37:01,526
Like, OK, well, the edge
has to be angled like this.

1060
00:37:01,526 --> 00:37:03,980
It has to have a concavity
here and a convexity there.

1061
00:37:03,980 --> 00:37:05,380
And that's what I'm looking for.

1062
00:37:05,380 --> 00:37:07,070
Me-- soon as I look
away from the piece,

1063
00:37:07,070 --> 00:37:08,690
I have no idea what
I'm looking for.

1064
00:37:08,690 --> 00:37:11,034
And so I do what Tomer would do.

1065
00:37:11,034 --> 00:37:12,950
I do a stochastic search
over all the puzzles.

1066
00:37:12,950 --> 00:37:15,140
And with 1,000 pieces
and many permutations,

1067
00:37:15,140 --> 00:37:16,970
that is not a good
way to solve a puzzle.

1068
00:37:16,970 --> 00:37:19,820
As a result, I never
do puzzles, right?

1069
00:37:19,820 --> 00:37:21,770
As entertainment, I
just don't understand.

1070
00:37:21,770 --> 00:37:24,846
But if you do know, it's
very satisfying, right?

1071
00:37:24,846 --> 00:37:26,220
You know what
you're looking for.

1072
00:37:26,220 --> 00:37:28,011
And now you can constrain
your search space

1073
00:37:28,011 --> 00:37:31,250
much more effectively to
those kinds of things.

1074
00:37:31,250 --> 00:37:33,320
Indeed, when I say we
are smarter than that,

1075
00:37:33,320 --> 00:37:34,490
human beings know.

1076
00:37:34,490 --> 00:37:37,170
We have metacognitive principles
around these kinds of things,

1077
00:37:37,170 --> 00:37:37,670
right?

1078
00:37:37,670 --> 00:37:39,740
This knowledge is
in human minds,

1079
00:37:39,740 --> 00:37:41,390
what it might mean
for us to think

1080
00:37:41,390 --> 00:37:43,520
that a problem is tractable.

1081
00:37:43,520 --> 00:37:45,110
What does that word mean, right?

1082
00:37:45,110 --> 00:37:46,910
Sometimes it means we have
the financial resources

1083
00:37:46,910 --> 00:37:48,326
or the technology
to carry it off.

1084
00:37:48,326 --> 00:37:50,690
But often, it means we
have a well-posed problem.

1085
00:37:50,690 --> 00:37:52,280
We don't know the
answer, but we know

1086
00:37:52,280 --> 00:37:53,609
what the answer needs to do.

1087
00:37:53,609 --> 00:37:55,400
We know what the answer
needs to look like.

1088
00:37:55,400 --> 00:37:59,120
We have criteria for
what would count.

1089
00:37:59,120 --> 00:38:02,060
At least we have a precise
enough representation

1090
00:38:02,060 --> 00:38:04,880
of the problem to effectively
and efficiently guide

1091
00:38:04,880 --> 00:38:05,380
the search.

1092
00:38:05,380 --> 00:38:06,880
And I think that's
the kind of thing

1093
00:38:06,880 --> 00:38:09,517
we would ideally like our
models and algorithms to have.

1094
00:38:09,517 --> 00:38:11,600
At that point, we may have
to bounce around a lot.

1095
00:38:11,600 --> 00:38:16,470
But we're bouncing in a
pretty well-defined space.

1096
00:38:16,470 --> 00:38:18,620
So to the degree
that this is true,

1097
00:38:18,620 --> 00:38:21,980
it actually explains a lot of
otherwise peculiar features

1098
00:38:21,980 --> 00:38:24,440
of human cognition.

1099
00:38:24,440 --> 00:38:27,080
For instance, we
have this weird sense

1100
00:38:27,080 --> 00:38:28,900
that we're on the
right track, right?

1101
00:38:28,900 --> 00:38:30,450
Well, what does it mean
to be on the right track?

1102
00:38:30,450 --> 00:38:32,750
It surely doesn't mean you're
better at explaining the data,

1103
00:38:32,750 --> 00:38:33,250
right?

1104
00:38:33,250 --> 00:38:34,905
You may be nowhere
close to having

1105
00:38:34,905 --> 00:38:37,280
an answer that makes better
predictions or gets it right.

1106
00:38:37,280 --> 00:38:39,350
But you're like, oh, that's
a really good idea, you know?

1107
00:38:39,350 --> 00:38:40,040
You get excited.

1108
00:38:40,040 --> 00:38:41,540
Or you're like, no,
that's a non-answer.

1109
00:38:41,540 --> 00:38:42,289
What does it mean?

1110
00:38:42,289 --> 00:38:44,240
It might mean that
at least it fits

1111
00:38:44,240 --> 00:38:45,770
the abstract form of solution.

1112
00:38:45,770 --> 00:38:47,540
If it were true, it might work.

1113
00:38:50,130 --> 00:38:52,427
We can tell our students
in an undergraduate class

1114
00:38:52,427 --> 00:38:54,260
that that was a great
idea even when we know

1115
00:38:54,260 --> 00:38:55,642
it's been disproven, right?

1116
00:38:55,642 --> 00:38:56,600
So it's actually false.

1117
00:38:56,600 --> 00:38:57,480
It doesn't explain.

1118
00:38:57,480 --> 00:38:58,936
It's just not true.

1119
00:38:58,936 --> 00:39:01,060
We still think it's a great
idea in virtue of what?

1120
00:39:01,060 --> 00:39:03,590
Not its fit to the data, right?

1121
00:39:03,590 --> 00:39:05,120
Not how well it's
predicting things.

1122
00:39:05,120 --> 00:39:07,460
It's actually false
and still good.

1123
00:39:07,460 --> 00:39:09,580
What could that
possibly mean, right?

1124
00:39:09,580 --> 00:39:11,780
It must mean there's
some other constraint

1125
00:39:11,780 --> 00:39:16,100
on hypothesis generation
that we are sensitive to.

1126
00:39:16,100 --> 00:39:18,620
So I want to suggest that
there are actually two

1127
00:39:18,620 --> 00:39:20,587
constraints for our hypotheses.

1128
00:39:20,587 --> 00:39:22,670
One is how well they fit
prior knowledge and data.

1129
00:39:22,670 --> 00:39:23,960
That's the one we
know something about.

1130
00:39:23,960 --> 00:39:25,400
That is, for instance, truth.

1131
00:39:25,400 --> 00:39:28,900
But Stephen Colbert
in his infinite wisdom

1132
00:39:28,900 --> 00:39:29,900
proposed something else.

1133
00:39:29,900 --> 00:39:32,233
He said, well, we're also
sensitive to this thing called

1134
00:39:32,233 --> 00:39:34,642
truthiness, right?

1135
00:39:34,642 --> 00:39:36,350
You know, like how
good the story sounds,

1136
00:39:36,350 --> 00:39:37,266
how good the argument.

1137
00:39:37,266 --> 00:39:39,290
And of course, in
politics, you know,

1138
00:39:39,290 --> 00:39:41,900
he makes massive and
effective fun of this.

1139
00:39:41,900 --> 00:39:45,530
But I think this is a feature
of human cognition, not a bug.

1140
00:39:45,530 --> 00:39:48,470
I think it is extremely
important that we can generate

1141
00:39:48,470 --> 00:39:50,930
ideas that are truthy, right?

1142
00:39:50,930 --> 00:39:52,730
Because they're plausible.

1143
00:39:52,730 --> 00:39:53,600
They're interesting.

1144
00:39:53,600 --> 00:39:54,570
They're informative.

1145
00:39:54,570 --> 00:39:55,445
They tell a good story.

1146
00:39:55,445 --> 00:39:58,470
They may be false, but it
could be worse than false.

1147
00:39:58,470 --> 00:39:59,912
It could be not even false.

1148
00:40:03,610 --> 00:40:07,780
So I want to make an
important point here.

1149
00:40:07,780 --> 00:40:12,400
Generating new ideas is not just
about Einstein versus Newton.

1150
00:40:12,400 --> 00:40:19,810
And it's not about going from
an undifferentiated concept

1151
00:40:19,810 --> 00:40:22,420
of heat and temperature to
a modern scientific one.

1152
00:40:22,420 --> 00:40:26,140
It's not just about radical
conceptual change.

1153
00:40:26,140 --> 00:40:29,800
This is the stuff of
ordinary everyday thought.

1154
00:40:29,800 --> 00:40:32,900
It is our ability
to reliably make up

1155
00:40:32,900 --> 00:40:37,780
new relevant answers to
basically any ad hoc question.

1156
00:40:37,780 --> 00:40:39,010
The answers may be trivial.

1157
00:40:39,010 --> 00:40:40,510
They may be false.

1158
00:40:40,510 --> 00:40:41,771
But they are genuinely new.

1159
00:40:41,771 --> 00:40:44,020
In that we didn't have them
until we thought of them,

1160
00:40:44,020 --> 00:40:45,430
they're genuinely made up.

1161
00:40:45,430 --> 00:40:47,552
We didn't learn them from
evidence or testimony.

1162
00:40:47,552 --> 00:40:48,760
And they answer the question.

1163
00:40:48,760 --> 00:40:50,230
They're not non-sequiturs.

1164
00:40:50,230 --> 00:40:51,712
And I think this is important.

1165
00:40:51,712 --> 00:40:53,170
And this is only
possible if we can

1166
00:40:53,170 --> 00:40:56,000
use the form of our
problems to guide search.

1167
00:40:56,000 --> 00:40:57,430
So let me give you
a few examples.

1168
00:40:57,430 --> 00:41:00,290
What's a good name
for a theater company?

1169
00:41:00,290 --> 00:41:01,210
None of you know.

1170
00:41:01,210 --> 00:41:02,650
You haven't thought about
that problem before.

1171
00:41:02,650 --> 00:41:04,025
But now you're
thinking about it.

1172
00:41:04,025 --> 00:41:06,230
Well, what's a good name
for a theater company?

1173
00:41:06,230 --> 00:41:08,175
How do you get stripes
on peppermints?

1174
00:41:08,175 --> 00:41:10,300
This is not a problem you
walked in thinking about.

1175
00:41:10,300 --> 00:41:12,799
None of you are working on this
as your independent project.

1176
00:41:12,799 --> 00:41:13,960
But you can think about it.

1177
00:41:13,960 --> 00:41:15,880
And you already know
enough, knowing nothing

1178
00:41:15,880 --> 00:41:19,540
about theater and nothing about
peppermints maybe, to know what

1179
00:41:19,540 --> 00:41:21,310
the constraints,
what counts, right?

1180
00:41:21,310 --> 00:41:24,160
That's the kind of
information you already have.

1181
00:41:24,160 --> 00:41:26,590
It's not prior knowledge
about theater companies

1182
00:41:26,590 --> 00:41:27,220
or peppermints.

1183
00:41:27,220 --> 00:41:29,803
It's prior knowledge about what's
going to work for a solution.

1184
00:41:29,803 --> 00:41:32,910
So for instance, you know that
McDonald's is a nonstarter.

1185
00:41:32,910 --> 00:41:36,340
And [INAUDIBLE] or whatever
is also not a good name

1186
00:41:36,340 --> 00:41:37,870
for a theater company.

1187
00:41:37,870 --> 00:41:39,820
You know that getting
stripes on peppermints

1188
00:41:39,820 --> 00:41:42,160
you don't want to do it
with a spatter [INAUDIBLE].

1189
00:41:42,160 --> 00:41:44,290
You don't want to just
spray things at it, right?

1190
00:41:44,290 --> 00:41:45,930
Because that wouldn't
even count as a solution.

1191
00:41:45,930 --> 00:41:47,980
That's not the kind of
thing you're looking for.

1192
00:41:47,980 --> 00:41:50,650
You're looking for something
more like Fresh Ink.

1193
00:41:50,650 --> 00:41:51,160
It's new.

1194
00:41:51,160 --> 00:41:51,730
It's novel.

1195
00:41:51,730 --> 00:41:52,090
It's familiar.

1196
00:41:52,090 --> 00:41:53,050
It makes some reference.

1197
00:41:53,050 --> 00:41:55,508
You're looking for something
more like a pendulum approach,

1198
00:41:55,508 --> 00:41:58,599
which at least could
generate periodicity.

1199
00:41:58,599 --> 00:42:00,640
I'm sure they don't use
a pendulum to spray paint

1200
00:42:00,640 --> 00:42:01,600
peppermints.

1201
00:42:01,600 --> 00:42:02,600
So are you.

1202
00:42:02,600 --> 00:42:05,340
But it's not a
bad answer, right?

1203
00:42:05,340 --> 00:42:08,590
So is there any evidence
that kids can do this,

1204
00:42:08,590 --> 00:42:10,030
that information
contained in only

1205
00:42:10,030 --> 00:42:11,696
the abstract form of
the problem can help

1206
00:42:11,696 --> 00:42:13,824
learners converge on solutions?

1207
00:42:13,824 --> 00:42:14,740
We wanted to find out.

1208
00:42:14,740 --> 00:42:19,420
So I'm going to show a
little baby attempt to start

1209
00:42:19,420 --> 00:42:21,010
getting at this problem.

1210
00:42:21,010 --> 00:42:22,390
We gave kids a machine.

1211
00:42:22,390 --> 00:42:26,890
And we gave kids some things
that could work the machine.

1212
00:42:26,890 --> 00:42:31,390
And the machine had
two visual effects.

1213
00:42:31,390 --> 00:42:34,520
You could make a ball appear to
flow up and down on that screen

1214
00:42:34,520 --> 00:42:35,020
there.

1215
00:42:35,020 --> 00:42:36,340
Or you could make the
ball appear at the bottom

1216
00:42:36,340 --> 00:42:37,810
and then flash up
at the top, right?

1217
00:42:37,810 --> 00:42:39,502
Because we put a
computer behind, right?

1218
00:42:39,502 --> 00:42:41,168
So woo, the ball is
moving continuously,

1219
00:42:41,168 --> 00:42:42,820
or the ball is
moving discretely.

1220
00:42:42,820 --> 00:42:44,410
And the affordances,
as you might note,

1221
00:42:44,410 --> 00:42:46,990
are also continuous
or discrete, right?

1222
00:42:46,990 --> 00:42:48,760
There's a rolly
ball, or there's this

1223
00:42:48,760 --> 00:42:51,042
peg that you can move back
and forth continuously.

1224
00:42:51,042 --> 00:42:52,750
Or there's a peg you
can pull in and out,

1225
00:42:52,750 --> 00:42:55,180
or a drawer you can
pull in and out.

1226
00:42:55,180 --> 00:42:59,860
So there's continuous and
discrete affordances and also

1227
00:42:59,860 --> 00:43:01,220
continuous and discrete effects.

1228
00:43:01,220 --> 00:43:04,080
So we also had an auditory
tone that varied continuously

1229
00:43:04,080 --> 00:43:08,340
and an auditory tone that
went from high to low.

1230
00:43:08,340 --> 00:43:10,600
Everyone got it?

1231
00:43:10,600 --> 00:43:14,301
And we just showed the
children all the parts

1232
00:43:14,301 --> 00:43:15,550
that connected to the machine.

1233
00:43:15,550 --> 00:43:18,429
And then we showed
the kids the effects,

1234
00:43:18,429 --> 00:43:19,720
but we had hid the affordances.

1235
00:43:19,720 --> 00:43:22,510
And we said, well, which
part made the ball go?

1236
00:43:22,510 --> 00:43:24,980
And we asked them either
about the visual or auditory

1237
00:43:24,980 --> 00:43:25,480
affordance.

1238
00:43:25,480 --> 00:43:29,140
They'd seen no covariation
data, prior knowledge-- you know,

1239
00:43:29,140 --> 00:43:30,560
agnostic about all of this.

1240
00:43:30,560 --> 00:43:32,270
And the question
is, well, would they

1241
00:43:32,270 --> 00:43:33,970
say, well, I'm trying
to solve a problem

1242
00:43:33,970 --> 00:43:36,980
about a continuous effect.

1243
00:43:36,980 --> 00:43:39,430
I should use a
continuous affordance.

1244
00:43:39,430 --> 00:43:42,010
I'm trying to solve a problem
about a discrete effect.

1245
00:43:42,010 --> 00:43:44,201
I should use a
discrete affordance.

1246
00:43:44,201 --> 00:43:46,450
Would they use something
about the form of the problem

1247
00:43:46,450 --> 00:43:52,280
to answer the solution if they
knew nothing else about it?

1248
00:43:52,280 --> 00:43:54,640
So there's no fact of
the matter here, right?

1249
00:43:54,640 --> 00:43:58,210
Because, obviously, we're not
using either of these really.

1250
00:43:58,210 --> 00:44:00,880
But the prediction was
that they would indeed

1251
00:44:00,880 --> 00:44:03,190
make this kind of mapping.

1252
00:44:03,190 --> 00:44:04,720
And they did so.

1253
00:44:04,720 --> 00:44:08,349
Now, what you might worry
about was that, well,

1254
00:44:08,349 --> 00:44:09,640
there is no fact of the matter.

1255
00:44:09,640 --> 00:44:11,710
So they're just doing some kind
of cross-modal mapping, right?

1256
00:44:11,710 --> 00:44:13,330
If you don't have a way
to answer the problem,

1257
00:44:13,330 --> 00:44:15,430
they're just saying
something like, well, you

1258
00:44:15,430 --> 00:44:17,680
know, this has this
property, so does this.

1259
00:44:17,680 --> 00:44:20,982
Let me go ahead and make a
mapping from one to the other.

1260
00:44:20,982 --> 00:44:22,690
It's a weird kind of
cross-modal mapping.

1261
00:44:22,690 --> 00:44:24,273
Because, usually,
you integrate things

1262
00:44:24,273 --> 00:44:27,010
that are in a single stimulus
actually contained together,

1263
00:44:27,010 --> 00:44:28,485
like the sound of
a ball dropping

1264
00:44:28,485 --> 00:44:29,860
and the sight of
a ball dropping,

1265
00:44:29,860 --> 00:44:30,950
not two different things.

1266
00:44:30,950 --> 00:44:33,640
But maybe it's something
kind of like that.

1267
00:44:33,640 --> 00:44:36,449
So if they're actually
using the form of a problem,

1268
00:44:36,449 --> 00:44:37,990
then if you change
the problems, they

1269
00:44:37,990 --> 00:44:39,830
should generate
different solutions.

1270
00:44:39,830 --> 00:44:45,170
So what we did in the second
experiment is we said--

1271
00:44:45,170 --> 00:44:46,550
oh.

1272
00:44:46,550 --> 00:44:47,426
Yeah.

1273
00:44:47,426 --> 00:44:49,300
So what we did here is
we showed the children

1274
00:44:49,300 --> 00:44:52,510
the continuous visual
stimuli, and then we

1275
00:44:52,510 --> 00:44:55,731
asked them to generate the
continuous auditory stimuli.

1276
00:44:55,731 --> 00:44:56,230
OK.

1277
00:44:56,230 --> 00:44:58,210
So all of these
are now continuous.

1278
00:44:58,210 --> 00:45:00,770
They could still just go ahead
and make the continuous map.

1279
00:45:00,770 --> 00:45:03,140
But if you represent the
problem as changing from visual

1280
00:45:03,140 --> 00:45:06,210
to audition, that feels like
a discrete problem, right?

1281
00:45:06,210 --> 00:45:08,040
You're completely
changing modalities.

1282
00:45:08,040 --> 00:45:09,870
So now in this case,
you might expect

1283
00:45:09,870 --> 00:45:13,284
the kids to resist making
just the continuous mapping

1284
00:45:13,284 --> 00:45:15,450
if they're using something
about the kind of problem

1285
00:45:15,450 --> 00:45:17,783
they have to constrain how
they search for the solution.

1286
00:45:17,783 --> 00:45:20,200
Now, again, there's no
real right answer here.

1287
00:45:20,200 --> 00:45:23,070
But what you see is the
kids shifted their responses

1288
00:45:23,070 --> 00:45:24,460
in response to this.

1289
00:45:24,460 --> 00:45:27,300
So this is just a
tiny bit of suggestion

1290
00:45:27,300 --> 00:45:30,150
that when there's no
differentiating prior knowledge

1291
00:45:30,150 --> 00:45:32,362
and there's no
differentiating evidence,

1292
00:45:32,362 --> 00:45:34,320
children take into account
what kind of problem

1293
00:45:34,320 --> 00:45:36,980
they're trying to solve,
and what the information is,

1294
00:45:36,980 --> 00:45:41,280
and the problem itself that
can help constrain their search

1295
00:45:41,280 --> 00:45:42,756
for a solution.

1296
00:45:42,756 --> 00:45:44,880
I'm going to give you
another example from some more

1297
00:45:44,880 --> 00:45:47,520
recent work by Pedro
Tsividis in our lab.

1298
00:45:47,520 --> 00:45:54,060
He varied the
dynamics of a scene.

1299
00:45:54,060 --> 00:45:56,550
He had here some bugs.

1300
00:45:56,550 --> 00:45:58,530
And in one scene,
those bugs in green,

1301
00:45:58,530 --> 00:46:00,930
they varied periodically.

1302
00:46:00,930 --> 00:46:03,990
So they just went from
having very few spots

1303
00:46:03,990 --> 00:46:05,730
to having a whole lot
of spots to having

1304
00:46:05,730 --> 00:46:08,670
a very few spots to having
a whole lot of spots, right?

1305
00:46:08,670 --> 00:46:10,590
That's what these green bugs do.

1306
00:46:10,590 --> 00:46:13,980
And the other bugs just
got faster over time.

1307
00:46:13,980 --> 00:46:15,480
So those longer
vectors are supposed

1308
00:46:15,480 --> 00:46:18,476
to be indicating the bugs are
getting faster continuously.

1309
00:46:18,476 --> 00:46:19,850
And then he said,
well, you know,

1310
00:46:19,850 --> 00:46:21,330
here are some bugs
in these rooms.

1311
00:46:21,330 --> 00:46:23,280
And I have two kinds
of lights in the room.

1312
00:46:23,280 --> 00:46:25,410
One set of lights looks
like those on top.

1313
00:46:25,410 --> 00:46:28,170
The other set of lights looks
like those on the bottom.

1314
00:46:28,170 --> 00:46:31,180
Can you tell me which lights
are responsible for the behavior

1315
00:46:31,180 --> 00:46:31,770
of which bugs?

1316
00:46:31,770 --> 00:46:34,540
Again, a sort of very
similar kind of problem.

1317
00:46:34,540 --> 00:46:37,050
No fact of the matter. But
if children can represent

1318
00:46:37,050 --> 00:46:39,150
something about the
abstract form of the problem

1319
00:46:39,150 --> 00:46:41,850
and use that to constrain
their search for the solution,

1320
00:46:41,850 --> 00:46:44,880
then the periodic lights should
reflect the periodic change

1321
00:46:44,880 --> 00:46:46,280
in the bugs' behavior.

1322
00:46:46,280 --> 00:46:48,690
The continuous lights should
reflect the continuous change

1323
00:46:48,690 --> 00:46:49,410
in the bugs' behavior.

1324
00:46:49,410 --> 00:46:51,951
We are, of course, not using
the words periodic or continuous

1325
00:46:51,951 --> 00:46:53,490
with us or anything like that.

1326
00:46:53,490 --> 00:46:55,014
They have to pull
that out and say,

1327
00:46:55,014 --> 00:46:56,430
this is the kind
of problem it is.

1328
00:46:56,430 --> 00:46:58,860
This is a feature of
a possible solution.

1329
00:46:58,860 --> 00:47:00,660
Let's go ahead and
make that mapping.

1330
00:47:00,660 --> 00:47:06,864
And indeed, the kids are
doing that well above chance.

1331
00:47:06,864 --> 00:47:08,280
And obviously, in
this case, we're

1332
00:47:08,280 --> 00:47:10,170
giving the kids some
possible solutions.

1333
00:47:10,170 --> 00:47:11,880
They're not generating
it whole cloth.

1334
00:47:11,880 --> 00:47:13,440
But minimally, it's
a different way

1335
00:47:13,440 --> 00:47:15,960
of thinking about
problems and search.

1336
00:47:15,960 --> 00:47:18,660
They're not using most of
our sort of traditional ways

1337
00:47:18,660 --> 00:47:19,470
of figuring it out.

1338
00:47:19,470 --> 00:47:21,719
They're just using something
about the kind of problem

1339
00:47:21,719 --> 00:47:23,820
they have and what's
available in the problem

1340
00:47:23,820 --> 00:47:26,560
to help sort out the solution.

1341
00:47:26,560 --> 00:47:29,290
So is this analogical
reasoning, right?

1342
00:47:29,290 --> 00:47:30,480
It feels kind of like an analogy.

1343
00:47:30,480 --> 00:47:32,520
But it's a funny
kind of analogy.

1344
00:47:32,520 --> 00:47:34,290
Because what it
isn't is a mapping

1345
00:47:34,290 --> 00:47:38,190
between a known problem and a
known solution to a new problem

1346
00:47:38,190 --> 00:47:40,110
and a new solution.

1347
00:47:40,110 --> 00:47:43,170
It's rather a mapping
from the kind of problem

1348
00:47:43,170 --> 00:47:46,050
you have to the kind
of solution you have.

1349
00:47:46,050 --> 00:47:48,460
It's using, again, the
problem or the query itself

1350
00:47:48,460 --> 00:47:49,140
as your friend.

1351
00:47:49,140 --> 00:47:51,860
It has information in it about
how it wants to be solved.

1352
00:47:51,860 --> 00:47:53,610
How are you going to
use that to solve it?

1353
00:47:53,610 --> 00:47:55,620
And, again, the
virtue being that

1354
00:47:55,620 --> 00:48:00,960
even if your answer is wrong,
if it were true, it would work.

1355
00:48:00,960 --> 00:48:02,970
So the argument
also, of course, is

1356
00:48:02,970 --> 00:48:05,940
that this applies to any
possible goal we might have

1357
00:48:05,940 --> 00:48:08,640
including those cases where
it's just not at all obvious how

1358
00:48:08,640 --> 00:48:10,420
an analogy would apply.

1359
00:48:10,420 --> 00:48:12,996
So what is a good name for
a theater company, right?

1360
00:48:12,996 --> 00:48:14,370
You're using
something about what

1361
00:48:14,370 --> 00:48:17,255
would count as a solution to
constrain something about what

1362
00:48:17,255 --> 00:48:18,630
you think would
be a good answer.

1363
00:48:18,630 --> 00:48:21,870
But there's not an easy
way to tell that story

1364
00:48:21,870 --> 00:48:24,720
as analogical reasoning.

1365
00:48:24,720 --> 00:48:26,670
So the argument
is children seem

1366
00:48:26,670 --> 00:48:30,290
to have data-independent
criteria for the evaluation

1367
00:48:30,290 --> 00:48:31,470
of hypotheses.

1368
00:48:31,470 --> 00:48:34,110
And these criteria extend beyond
simplicity, grammaticality,

1369
00:48:34,110 --> 00:48:36,630
or compatibility
with prior knowledge.

1370
00:48:36,630 --> 00:48:40,440
They consider the extent to
which a hypothesis fulfills

1371
00:48:40,440 --> 00:48:42,900
the abstract goals of a
solution of a problem,

1372
00:48:42,900 --> 00:48:46,350
not just the degree to
which it fits the data.

1373
00:48:46,350 --> 00:48:49,920
And I will suggest that
this is maybe deeply part

1374
00:48:49,920 --> 00:48:52,055
of an important mystery
of human cognition, which

1375
00:48:52,055 --> 00:48:54,806
is-- our most powerful learners
spend a lot of their time

1376
00:48:54,806 --> 00:48:57,180
doing something that has defied
most of our best attempts

1377
00:48:57,180 --> 00:49:00,900
to explain it, which is
they spend a lot of time,

1378
00:49:00,900 --> 00:49:03,060
many hours a day, just
pretending and making up

1379
00:49:03,060 --> 00:49:05,480
stories.

1380
00:49:05,480 --> 00:49:07,580
Stories have some
interesting properties.

1381
00:49:07,580 --> 00:49:10,590
They do not have
to be true, right?

1382
00:49:10,590 --> 00:49:13,010
They don't have to fit
the world or fit the data.

1383
00:49:13,010 --> 00:49:16,370
But they do have to set up
a problem and a solution

1384
00:49:16,370 --> 00:49:18,110
that, if true, would
solve the problem.

1385
00:49:18,110 --> 00:49:19,360
Most of play has that.

1386
00:49:19,360 --> 00:49:22,220
It is not at all obvious why
you would think it was important

1387
00:49:22,220 --> 00:49:25,190
that you want to balance a
twisty on the top of the candle

1388
00:49:25,190 --> 00:49:28,220
stick in order to
shoot pee through it.

1389
00:49:28,220 --> 00:49:29,575
But it's a problem.

1390
00:49:29,575 --> 00:49:30,200
And guess what?

1391
00:49:30,200 --> 00:49:32,600
If you can set up the problem
and find the solution,

1392
00:49:32,600 --> 00:49:35,437
you've just accessed a
really important ability

1393
00:49:35,437 --> 00:49:37,520
about setting up problems
and setting up solutions

1394
00:49:37,520 --> 00:49:40,730
that, if they worked, would
solve that kind of problem.

1395
00:49:40,730 --> 00:49:42,230
And, of course,
imagination, most

1396
00:49:42,230 --> 00:49:44,120
of our narratives,
most of our fictions,

1397
00:49:44,120 --> 00:49:46,800
have at least those
sets of properties.

1398
00:49:46,800 --> 00:49:49,778
So I'm going to go
ahead and stop there.

1399
00:49:49,778 --> 00:49:55,222
[APPLAUSE]

1400
00:49:55,222 --> 00:49:55,930
TOMER ULLMAN: OK.

1401
00:49:55,930 --> 00:49:58,346
So if you remember the structure
we said at the beginning,

1402
00:49:58,346 --> 00:50:00,280
there will now be
a short rebuttal.

1403
00:50:00,280 --> 00:50:02,560
And then Laura will give
an even shorter summary,

1404
00:50:02,560 --> 00:50:04,020
and then the free for all.

1405
00:50:04,020 --> 00:50:06,610
So most of you are probably
thinking all sorts of ways

1406
00:50:06,610 --> 00:50:07,510
that Laura is wrong.

1407
00:50:07,510 --> 00:50:10,250
But wait, let me get
through it first,

1408
00:50:10,250 --> 00:50:12,240
and then see if I
didn't cover something.

1409
00:50:14,697 --> 00:50:16,780
But, actually, my response
to this, to all of what

1410
00:50:16,780 --> 00:50:21,280
Laura's said, is not you're
wrong, but you're right.

1411
00:50:21,280 --> 00:50:22,689
So you're right, Laura.

1412
00:50:22,689 --> 00:50:23,230
You're right.

1413
00:50:23,230 --> 00:50:25,771
Other people in the audience
who think that stochastic search

1414
00:50:25,771 --> 00:50:26,315
by itself--

1415
00:50:26,315 --> 00:50:28,440
if you have some sort of
infinite theory space that

1416
00:50:28,440 --> 00:50:30,481
was supposed to account
for all possible problems

1417
00:50:30,481 --> 00:50:32,410
and for any new
problem what you did

1418
00:50:32,410 --> 00:50:34,630
was to just completely
at random try

1419
00:50:34,630 --> 00:50:38,020
to search through that space
anew, then that wouldn't work--

1420
00:50:38,020 --> 00:50:38,696
that's fine.

1421
00:50:38,696 --> 00:50:40,570
And I agree that there's
something inherently

1422
00:50:40,570 --> 00:50:42,736
wrong about an algorithm
that can take some problem,

1423
00:50:42,736 --> 00:50:44,950
like why these two blocks
are sticking together,

1424
00:50:44,950 --> 00:50:47,410
and say, well, maybe it's
because the moon is bigger

1425
00:50:47,410 --> 00:50:49,260
than a piece of cheese, right?

1426
00:50:49,260 --> 00:50:52,890
Like, as Laura said, it just
seems like it's not even wrong.

1427
00:50:52,890 --> 00:50:55,420
Or maybe it's because people
have more than two children

1428
00:50:55,420 --> 00:50:56,140
on average.

1429
00:50:56,140 --> 00:50:56,806
No.

1430
00:50:56,806 --> 00:50:58,180
And there's also
something wrong.

1431
00:50:58,180 --> 00:51:00,942
Laura didn't quite get to this
or maybe didn't emphasize it.

1432
00:51:00,942 --> 00:51:02,650
But she emphasizes it
sometimes, which is

1433
00:51:02,650 --> 00:51:04,222
there's something wrong about--

1434
00:51:04,222 --> 00:51:06,180
well, she did a little,
but-- an algorithm that

1435
00:51:06,180 --> 00:51:08,035
makes sort of dumb proposals.

1436
00:51:08,035 --> 00:51:10,120
Dumb proposals of
all sorts of things--

1437
00:51:10,120 --> 00:51:13,480
things like you try to explain
something in theory space,

1438
00:51:13,480 --> 00:51:16,290
and you say, well, maybe it's
because of X. And you check it,

1439
00:51:16,290 --> 00:51:19,000
and it's not X. And you
say, oh well, oh well.

1440
00:51:19,000 --> 00:51:22,521
Maybe it's because of X, right?

1441
00:51:22,521 --> 00:51:24,520
There's something wrong
about stochastic search.

1442
00:51:24,520 --> 00:51:25,990
Although, I have to say, Laura,
you have an eight-year-old.

1443
00:51:25,990 --> 00:51:29,020
And, you know, when we gave this
first, I had a two-year-old.

1444
00:51:29,020 --> 00:51:31,420
And I actually think maybe
it's X, maybe it's X,

1445
00:51:31,420 --> 00:51:35,192
maybe it's X is not such
a crazy way of describing

1446
00:51:35,192 --> 00:51:36,400
what two-year-olds are doing.

1447
00:51:36,400 --> 00:51:37,396
LAURA SCHULZ: Well, [INAUDIBLE].

1448
00:51:37,396 --> 00:51:39,390
So, you know, like,
what if there's noise?

1449
00:51:39,390 --> 00:51:40,181
TOMER ULLMAN: Yeah.

1450
00:51:40,181 --> 00:51:41,850
Yeah, yeah.

1451
00:51:41,850 --> 00:51:42,580
OK.

1452
00:51:42,580 --> 00:51:44,980
But I do want to give
an actual response.

1453
00:51:44,980 --> 00:51:50,700
So, you know, I think I have
some responses for Laura.

1454
00:51:50,700 --> 00:51:53,200
And these are responses that,
importantly, you know, they'll

1455
00:51:53,200 --> 00:51:54,460
try to address what
Laura is saying.

1456
00:51:54,460 --> 00:51:55,840
They'll try to take
it into account.

1457
00:51:55,840 --> 00:51:58,390
They'll try to give new answers
to it that will importantly

1458
00:51:58,390 --> 00:52:00,160
leave her unhappy.

1459
00:52:00,160 --> 00:52:03,400
So what I'm going to try
to do is take a page out

1460
00:52:03,400 --> 00:52:07,980
of Hannibal Barca at the Battle
of Cannae and try to envelop.

1461
00:52:07,980 --> 00:52:12,652
So I'm going to highlight
work by other people.

1462
00:52:12,652 --> 00:52:14,860
From one direction is work
by Steve Piantadosi, which

1463
00:52:14,860 --> 00:52:17,020
is about making these, you
know, algorithmic search part

1464
00:52:17,020 --> 00:52:18,520
of the problem,
whether it's too slow,

1465
00:52:18,520 --> 00:52:19,990
students are wandering
around the halls

1466
00:52:19,990 --> 00:52:21,750
doing nothing waiting
for it to converge.

1467
00:52:21,750 --> 00:52:24,477
What if we just did it
really, really fast?

1468
00:52:24,477 --> 00:52:26,060
Another way to address
this is to say,

1469
00:52:26,060 --> 00:52:27,580
what if we made
actually good proposals?

1470
00:52:27,580 --> 00:52:27,790
OK.

1471
00:52:27,790 --> 00:52:30,460
The problem of making proposals
that are just ridiculous-- what

1472
00:52:30,460 --> 00:52:32,085
if we made proposals
that actually take

1473
00:52:32,085 --> 00:52:33,810
the data into
account a little bit

1474
00:52:33,810 --> 00:52:36,351
or the previous data, the stuff
that we're trying to explain?

1475
00:52:38,890 --> 00:52:41,782
Another way to address
what Laura is saying

1476
00:52:41,782 --> 00:52:43,990
is to say, well, maybe we
can make better primitives.

1477
00:52:43,990 --> 00:52:46,840
And better primitives mean that
your search space is actually

1478
00:52:46,840 --> 00:52:51,580
more confined to the
right sort of things.

1479
00:52:51,580 --> 00:52:54,430
And finally, I'm going to
highlight some new thoughts

1480
00:52:54,430 --> 00:52:58,600
by myself about ad hoc spaces
and how we might construct them

1481
00:52:58,600 --> 00:53:01,690
to get at this
problem of this thing,

1482
00:53:01,690 --> 00:53:04,500
like how would you come up with
a new name for a theater company

1483
00:53:04,500 --> 00:53:05,380
or a name for a new
theater company?

1484
00:53:05,380 --> 00:53:06,970
So I'll go through
these somewhat fast.

1485
00:53:06,970 --> 00:53:08,860
You're welcome to come and
talk about any of them.

1486
00:53:08,860 --> 00:53:10,359
And I'll point you
to the people who

1487
00:53:10,359 --> 00:53:13,540
are actually doing the work,
which I've said before.

1488
00:53:13,540 --> 00:53:14,610
So let's see.

1489
00:53:14,610 --> 00:53:16,030
This is just sort
of the rebuttal.

1490
00:53:16,030 --> 00:53:18,010
One of the rebuttals
is to say, well, here's

1491
00:53:18,010 --> 00:53:20,890
this work by Steve Piantadosi,
which is, you know,

1492
00:53:20,890 --> 00:53:23,560
introspection is really
actually a poor guide.

1493
00:53:23,560 --> 00:53:27,610
So when [INAUDIBLE] was giving
that beautiful example of cell

1494
00:53:27,610 --> 00:53:29,744
phones, why you have to
shut them off on airplanes,

1495
00:53:29,744 --> 00:53:31,660
you say, oh, and she
came up with this example

1496
00:53:31,660 --> 00:53:33,620
that it's not true.

1497
00:53:33,620 --> 00:53:35,890
But if it were true,
it would be nice.

1498
00:53:35,890 --> 00:53:38,492
Well, maybe she went through
a billion other things

1499
00:53:38,492 --> 00:53:39,700
before she came up with that?

1500
00:53:39,700 --> 00:53:41,350
Laura's like, she
did it like that.

1501
00:53:41,350 --> 00:53:43,960
But we don't have a good sense
for what is actually fast,

1502
00:53:43,960 --> 00:53:45,940
what is actually slow.

1503
00:53:45,940 --> 00:53:49,330
And there might be a case where
we don't actually introspect

1504
00:53:49,330 --> 00:53:51,175
about a lot of things.

1505
00:53:51,175 --> 00:53:53,110
The things that bubble
up into consciousness

1506
00:53:53,110 --> 00:53:54,970
that you might actually
accept or reject

1507
00:53:54,970 --> 00:53:56,920
rely on actually a
ton of other proposals

1508
00:53:56,920 --> 00:53:59,334
that they don't even bubble
up into introspection

1509
00:53:59,334 --> 00:54:01,000
that you're making
very quickly and just

1510
00:54:01,000 --> 00:54:02,920
rejecting even before that.

1511
00:54:02,920 --> 00:54:05,140
And Steve came up with this
really interesting way.

1512
00:54:05,140 --> 00:54:07,140
You guys have probably
heard about deep learning

1513
00:54:07,140 --> 00:54:08,650
and the sort of
the GPU revolution

1514
00:54:08,650 --> 00:54:11,000
for deep learning
and things like that.

1515
00:54:11,000 --> 00:54:12,720
So the point is, if
instead of using

1516
00:54:12,720 --> 00:54:16,220
CPUs we use
graphics processing units,

1517
00:54:16,220 --> 00:54:19,060
then we can make stochastic
search algorithms in parallel

1518
00:54:19,060 --> 00:54:19,890
in some cases.

1519
00:54:19,890 --> 00:54:21,514
And once you can make
them in parallel,

1520
00:54:21,514 --> 00:54:22,810
then you can put them on a GPU.

1521
00:54:22,810 --> 00:54:24,400
And once you put
them on a GPU, a GPU

1522
00:54:24,400 --> 00:54:26,154
is sort of a way
of taking something

1523
00:54:26,154 --> 00:54:27,320
that's supposed to be a CPU.

1524
00:54:27,320 --> 00:54:28,900
And instead of
having a CPU that can

1525
00:54:28,900 --> 00:54:31,370
do something sort of
complicated on a sort of task,

1526
00:54:31,370 --> 00:54:34,630
you can do a lot of really
simple tasks in parallel

1527
00:54:34,630 --> 00:54:35,380
very fast.

1528
00:54:35,380 --> 00:54:37,110
So if you could
make that proposal,

1529
00:54:37,110 --> 00:54:39,007
that thing I said
before-- take a theory,

1530
00:54:39,007 --> 00:54:40,840
make a change to it--
if you could make that

1531
00:54:40,840 --> 00:54:43,060
into the sort of thing that
you could put on a GPU,

1532
00:54:43,060 --> 00:54:44,920
then you can make a
ton of those proposals

1533
00:54:44,920 --> 00:54:46,530
very quickly in parallel.

1534
00:54:46,530 --> 00:54:48,280
And that's sort of
what he figured out how

1535
00:54:48,280 --> 00:54:49,750
to do for a bunch of spaces.

1536
00:54:49,750 --> 00:54:51,280
It's much faster than the CPU.

1537
00:54:51,280 --> 00:54:53,530
And the main advantage is
that it's also much cheaper.

1538
00:54:53,530 --> 00:54:55,612
And you can cram a
whole bunch of them together.

1539
00:54:55,612 --> 00:54:57,070
And you can get to
something like--

1540
00:54:57,070 --> 00:54:58,000
I forget the exact numbers.

1541
00:54:58,000 --> 00:55:00,010
You can make like a million
of these theory proposals

1542
00:55:00,010 --> 00:55:00,550
a second.

1543
00:55:00,550 --> 00:55:02,090
And that's just with
today's technology, right?

1544
00:55:02,090 --> 00:55:04,210
We don't know what's
coming around the corner

1545
00:55:04,210 --> 00:55:06,100
a few years from now.

1546
00:55:06,100 --> 00:55:08,242
You know, Steve plus
GPUs is awesome.

1547
00:55:08,242 --> 00:55:10,450
And you could think of it
like these various problems

1548
00:55:10,450 --> 00:55:12,310
like you're trying to fit these
data points on the bottom.

1549
00:55:12,310 --> 00:55:13,264
Can people see that?

1550
00:55:13,264 --> 00:55:14,680
This is sort of a
classic problem.

1551
00:55:14,680 --> 00:55:15,430
You have some data points.

1552
00:55:15,430 --> 00:55:17,020
You're trying to fit
a polynomial to it.

1553
00:55:17,020 --> 00:55:19,240
And you're trying to say,
well, how will we do that?

1554
00:55:19,240 --> 00:55:21,970
The truth is there's a lot of
very clever ways of doing that.

1555
00:55:21,970 --> 00:55:23,344
But let's assume
that you're even

1556
00:55:23,344 --> 00:55:25,469
doing random search
in polynomial space--

1557
00:55:25,469 --> 00:55:27,010
not the sort of
thing you want to do.

1558
00:55:27,010 --> 00:55:28,340
Those of you who have
been to the tutorial,

1559
00:55:28,340 --> 00:55:30,227
I mentioned that if you
have an actual better

1560
00:55:30,227 --> 00:55:32,560
way than stochastic search,
you should probably do that.

1561
00:55:32,560 --> 00:55:33,730
But suppose you
didn't know and you

1562
00:55:33,730 --> 00:55:35,021
wanted to do stochastic search.

1563
00:55:35,021 --> 00:55:37,330
You could still do a
million moves a second

1564
00:55:37,330 --> 00:55:39,800
and quickly converge on
something like that line

1565
00:55:39,800 --> 00:55:40,461
that you see.

1566
00:55:40,461 --> 00:55:40,960
OK.

1567
00:55:40,960 --> 00:55:42,876
And that line is actually
taken from, I think,

1568
00:55:42,876 --> 00:55:45,430
Galileo Galilei's data for
how things slide on a hill.

1569
00:55:45,430 --> 00:55:47,810
I'm not saying Galileo Galilei
sat around and said,

1570
00:55:47,810 --> 00:55:49,976
maybe it's x to the square,
maybe it's x to the 2.1,

1571
00:55:49,976 --> 00:55:51,900
maybe it's x to the 2.3,
maybe it's x to the 2.1,

1572
00:55:51,900 --> 00:55:53,170
maybe it's x to the
2, and then finally

1573
00:55:53,170 --> 00:55:55,044
converged like after a
million moves to that.

1574
00:55:55,044 --> 00:55:56,960
That's not exactly
scientific discovery.

1575
00:55:56,960 --> 00:55:58,420
But for a lot of
everyday thinking,

1576
00:55:58,420 --> 00:56:00,530
you might actually be
proposing things very fast

1577
00:56:00,530 --> 00:56:01,651
and rejecting them.
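
A minimal sketch of the kind of blind stochastic search over exponents described above. It assumes a one-parameter family y = x**p, synthetic data generated from p = 2 (not Galileo's measurements), and a simple accept-if-better rule. The name `fit_exponent` and its parameters are hypothetical, and this plain-Python loop says nothing about how the GPU version works.

```python
import random

def fit_exponent(xs, ys, steps=5000, step_size=0.1, seed=0):
    """Stochastic search for p in y = x**p: propose a random tweak, keep it if it fits better."""
    rng = random.Random(seed)

    def error(p):
        # Sum of squared errors between the data and the curve x**p.
        return sum((x ** p - y) ** 2 for x, y in zip(xs, ys))

    p = 1.0                                          # arbitrary starting guess (assumption)
    best = error(p)
    for _ in range(steps):
        candidate = p + rng.gauss(0.0, step_size)    # blind random proposal
        candidate_error = error(candidate)
        if candidate_error < best:                   # accept only improvements
            p, best = candidate, candidate_error
    return p

# Synthetic data from y = x**2, standing in for the data points on the slide.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [x ** 2 for x in xs]
p_hat = fit_exponent(xs, ys)
print(round(p_hat, 3))
```

Each proposal is cheap and independent of any insight about the data, which is why the speed of proposing and rejecting matters so much for this style of search.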

1578
00:56:01,651 --> 00:56:02,150
OK.

1579
00:56:02,150 --> 00:56:03,240
That's Steve.

1580
00:56:03,240 --> 00:56:05,610
Some things from Owen Lewis
about making maybe smarter

1581
00:56:05,610 --> 00:56:08,110
proposals-- and this gets at
that point of, like, maybe it's

1582
00:56:08,110 --> 00:56:09,000
X, maybe not.

1583
00:56:09,000 --> 00:56:11,620
So suppose that I'm trying
to teach you a concept.

1584
00:56:11,620 --> 00:56:12,220
OK.

1585
00:56:12,220 --> 00:56:13,630
I'm trying to teach you
a particular concept.

1586
00:56:13,630 --> 00:56:15,280
I'm going to give you
some positive examples

1587
00:56:15,280 --> 00:56:15,920
of the concept.

1588
00:56:15,920 --> 00:56:16,170
OK?

1589
00:56:16,170 --> 00:56:17,710
This is the sort of thing
that psychologists really

1590
00:56:17,710 --> 00:56:18,580
like to study.

1591
00:56:18,580 --> 00:56:19,120
OK?

1592
00:56:19,120 --> 00:56:20,190
So this is a room--

1593
00:56:20,190 --> 00:56:22,750
no, Roomba's an actual thing.

1594
00:56:22,750 --> 00:56:23,770
Blick gets overused.

1595
00:56:23,770 --> 00:56:26,050
Can someone give
me a nonsense term?

1596
00:56:26,050 --> 00:56:26,960
This is a Jabberwock.

1597
00:56:26,960 --> 00:56:27,460
OK?

1598
00:56:27,460 --> 00:56:28,330
This is a Jabberwock.

1599
00:56:28,330 --> 00:56:30,580
I'm going to give you another
example of a Jabberwock.

1600
00:56:30,580 --> 00:56:32,500
Who thinks they know
what Jabberwocks are?

1601
00:56:32,500 --> 00:56:34,334
You have some sense of
what a Jabberwock is?

1602
00:56:34,334 --> 00:56:34,833
OK.

1603
00:56:34,833 --> 00:56:36,130
Huh, that's also a Jabberwock.

1604
00:56:36,130 --> 00:56:37,090
Wait a minute.

1605
00:56:37,090 --> 00:56:38,170
OK.

1606
00:56:38,170 --> 00:56:40,440
The sense that I had for
maybe what a Jabberwock is

1607
00:56:40,440 --> 00:56:41,717
is maybe not that great.

1608
00:56:41,717 --> 00:56:42,550
That's a Jabberwock.

1609
00:56:42,550 --> 00:56:43,383
That's a Jabberwock.

1610
00:56:43,383 --> 00:56:44,230
That's a Jabberwock.

1611
00:56:44,230 --> 00:56:45,130
That's a Jabberwock.

1612
00:56:45,130 --> 00:56:45,644
OK.

1613
00:56:45,644 --> 00:56:47,560
And you might think at
this point, well, fine.

1614
00:56:47,560 --> 00:56:51,100
Jabberwocks are either squares
of any color or red circles.

1615
00:56:51,100 --> 00:56:52,600
Or maybe they're
squares or circles.

1616
00:56:52,600 --> 00:56:53,474
I don't know exactly.

1617
00:56:53,474 --> 00:56:55,300
You're building up
some sort of theory

1618
00:56:55,300 --> 00:56:57,040
for that concept,
which can be described

1619
00:56:57,040 --> 00:57:00,310
in something like a grammar
for your current hypothesis.

1620
00:57:00,310 --> 00:57:02,110
You might say it's
either a red circle,

1621
00:57:02,110 --> 00:57:03,770
or it's a square of any color.

1622
00:57:03,770 --> 00:57:04,270
OK.

1623
00:57:04,270 --> 00:57:06,394
And that's sort of your
grammar for these concepts.

1624
00:57:06,394 --> 00:57:09,897
And now you could sort of
change that grammar, right?

1625
00:57:09,897 --> 00:57:11,980
You could sort of excise
these nodes in that tree

1626
00:57:11,980 --> 00:57:13,370
to come up with new things.

1627
00:57:13,370 --> 00:57:15,286
Why would you want to
come up with new things?

1628
00:57:15,286 --> 00:57:16,820
Because, look,
that's a Jabberwock.

1629
00:57:16,820 --> 00:57:17,080
OK.

1630
00:57:17,080 --> 00:57:18,329
I just gave you a new example.

1631
00:57:18,329 --> 00:57:19,970
It's something you
didn't know before.

1632
00:57:19,970 --> 00:57:21,280
It's sort of conflicting
with your theory.

1633
00:57:21,280 --> 00:57:23,350
You have to come up with
a new theory on the fly.

1634
00:57:23,350 --> 00:57:23,860
OK?

1635
00:57:23,860 --> 00:57:26,489
Theory-- again, used in a
very, very minimal sense here.

1636
00:57:26,489 --> 00:57:28,780
But if you accept that this
is something like a theory,

1637
00:57:28,780 --> 00:57:30,640
you have to come up with a
new theory for explaining

1638
00:57:30,640 --> 00:57:32,800
why that's a Jabberwock and all
the other things that you've

1639
00:57:32,800 --> 00:57:33,790
seen are Jabberwocks.

1640
00:57:33,790 --> 00:57:34,615
What is a bad idea?

1641
00:57:34,615 --> 00:57:39,477
A bad idea is to just cut
and generate randomly, right?

1642
00:57:39,477 --> 00:57:41,685
You might come up with
something like it's a triangle

1643
00:57:41,685 --> 00:57:42,643
or something like that.

1644
00:57:42,643 --> 00:57:45,241
But you might come up with,
well, maybe it's a square.

1645
00:57:45,241 --> 00:57:46,240
No, we already did that.

1646
00:57:46,240 --> 00:57:48,520
Well, maybe it's, you
know, just triangles.

1647
00:57:48,520 --> 00:57:49,870
Maybe it's just a square.

1648
00:57:49,870 --> 00:57:51,340
Like, you could
spend all this time

1649
00:57:51,340 --> 00:57:54,250
not taking into account your
previous theory and the fact

1650
00:57:54,250 --> 00:57:56,080
that your new
example had something

1651
00:57:56,080 --> 00:57:59,260
to do with triangles, something
to do with red triangles.

1652
00:57:59,260 --> 00:58:01,450
You want to be able to
make proposals that take

1653
00:58:01,450 --> 00:58:04,520
into account this new data.

1654
00:58:04,520 --> 00:58:06,610
Does everyone understand
what the problem

1655
00:58:06,610 --> 00:58:07,754
we're trying to get at is?

1656
00:58:07,754 --> 00:58:10,420
So what Owen has done is to sort
of take these stochastic search

1657
00:58:10,420 --> 00:58:12,711
algorithms and say, if you
get a new piece of data that

1658
00:58:12,711 --> 00:58:15,190
contradicts, that sort of
interferes with what you had

1659
00:58:15,190 --> 00:58:16,960
before, how would
you make proposals

1660
00:58:16,960 --> 00:58:19,390
that must take into
account this new data?

1661
00:58:19,390 --> 00:58:21,100
I'm going to recut and generate.

1662
00:58:21,100 --> 00:58:22,750
But I'm going to
identify the places

1663
00:58:22,750 --> 00:58:25,210
in my theory that would take
into account this new piece

1664
00:58:25,210 --> 00:58:25,750
of data.

1665
00:58:25,750 --> 00:58:27,970
And I'm going to make
smarter proposals.

1666
00:58:27,970 --> 00:58:29,200
They might still be wrong.

1667
00:58:29,200 --> 00:58:31,492
And there's better and
worse ways of doing this.

1668
00:58:31,492 --> 00:58:33,700
It's still going to be a
randomized search in theory

1669
00:58:33,700 --> 00:58:34,199
space.

1670
00:58:34,199 --> 00:58:37,140
But it's at least going to take
into account this new data.
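
A minimal sketch contrasting a blind proposal with a proposal that takes the new example into account, in the spirit of the Jabberwock example above. It assumes a toy representation where a hypothesis is a list of (color, shape) clauses and the surprising example is a red triangle. All names here (`covers`, `blind_proposal`, `data_driven_proposal`) are hypothetical, and this is not Owen Lewis's actual algorithm.

```python
import random

COLORS = ["red", "green", "blue"]
SHAPES = ["circle", "square", "triangle"]
ANY = "any"

def covers(hypothesis, example):
    """A hypothesis is a list of (color, shape) clauses; ANY matches every value."""
    color, shape = example
    return any(c in (ANY, color) and s in (ANY, shape) for c, s in hypothesis)

def blind_proposal(hypothesis, rng):
    """Replace one randomly chosen clause with a random one, ignoring the data."""
    new_clause = (rng.choice(COLORS + [ANY]), rng.choice(SHAPES + [ANY]))
    proposal = list(hypothesis)
    proposal[rng.randrange(len(proposal))] = new_clause
    return proposal

def data_driven_proposal(hypothesis, example, rng):
    """Add a random clause, drawn only from clauses that cover the new example."""
    color, shape = example
    candidates = [(color, shape), (ANY, shape), (color, ANY)]
    return list(hypothesis) + [rng.choice(candidates)]

rng = random.Random(0)
current = [("red", "circle"), (ANY, "square")]   # "red circles or squares of any color"
surprise = ("red", "triangle")                   # new positive example the theory misses

smart = data_driven_proposal(current, surprise, rng)
blind_hits = sum(covers(blind_proposal(current, rng), surprise) for _ in range(1000))
print(covers(smart, surprise), blind_hits)
```

The data-driven proposal is still random and may still be the wrong generalization, but it never wastes a move on a hypothesis that fails to explain the example that triggered the revision.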

1671
00:58:37,140 --> 00:58:38,132
OK.

1672
00:58:38,132 --> 00:58:40,090
And that's just-- I'm
afraid I don't have time.

1673
00:58:40,090 --> 00:58:40,860
But, look, it works.

1674
00:58:40,860 --> 00:58:42,340
And it's much better
than just bouncing around

1675
00:58:42,340 --> 00:58:44,560
completely around random
as you might guess.

1676
00:58:44,560 --> 00:58:48,850
Another response, this
work by Eyal Dechter.

1677
00:58:48,850 --> 00:58:51,200
Let me skip over this
for a little bit.

1678
00:58:51,200 --> 00:58:53,290
This is work by Eyal
Dechter, which is to say,

1679
00:58:53,290 --> 00:58:55,701
what if you wanted to use
better and better primitives?

1680
00:58:55,701 --> 00:58:56,200
OK.

1681
00:58:56,200 --> 00:58:58,699
So before we had this notion
of you're just bouncing around

1682
00:58:58,699 --> 00:59:01,152
in theory space, you're
making all sorts of notions,

1683
00:59:01,152 --> 00:59:02,110
let me put it this way.

1684
00:59:02,110 --> 00:59:03,910
Suppose that you're searching
through the entire space

1685
00:59:03,910 --> 00:59:04,690
of programs.

1686
00:59:04,690 --> 00:59:05,257
OK?

1687
00:59:05,257 --> 00:59:07,090
And the only thing that
you had to work with

1688
00:59:07,090 --> 00:59:09,381
is something like that lambda
expression before, right?

1689
00:59:09,381 --> 00:59:12,340
You didn't have the notion of
plus, minus, multiplication,

1690
00:59:12,340 --> 00:59:13,741
sine waves, things like that.

1691
00:59:13,741 --> 00:59:16,240
And you're trying to figure out
something about an equation.

1692
00:59:16,240 --> 00:59:17,260
And you work through it.

1693
00:59:17,260 --> 00:59:18,510
And there's a way of doing it.

1694
00:59:18,510 --> 00:59:20,385
There's a way of, like,
generating functions

1695
00:59:20,385 --> 00:59:21,760
that rely on other
functions that

1696
00:59:21,760 --> 00:59:24,093
rely on other functions in
the complicated way that will

1697
00:59:24,093 --> 00:59:25,510
give you the plus function.

1698
00:59:25,510 --> 00:59:26,140
OK?

1699
00:59:26,140 --> 00:59:26,840
And you do that.

1700
00:59:26,840 --> 00:59:27,690
And then you generate a lot.

1701
00:59:27,690 --> 00:59:30,070
And you somehow manage to
find out the sine function.

1702
00:59:30,070 --> 00:59:31,944
And you finally figure
out that this function

1703
00:59:31,944 --> 00:59:35,841
you're trying to describe
is sine x plus sine y.

1704
00:59:35,841 --> 00:59:37,090
Let's say something like that.

1705
00:59:37,090 --> 00:59:37,300
OK?

1706
00:59:37,300 --> 00:59:38,480
The exact example
doesn't matter.

1707
00:59:38,480 --> 00:59:40,870
But you worked really hard,
and you figured that out.

1708
00:59:40,870 --> 00:59:42,319
Now, you get a new example.

1709
00:59:42,319 --> 00:59:44,110
And underneath the
hood, it's actually just

1710
00:59:44,110 --> 00:59:45,610
sine x minus sine y.

1711
00:59:45,610 --> 00:59:48,855
Or let's say it's sine y plus
sine z or something like that.

1712
00:59:48,855 --> 00:59:50,230
And now you start
all over again.

1713
00:59:50,230 --> 00:59:50,770
You're like, fine.

1714
00:59:50,770 --> 00:59:52,630
OK, lambda something,
something, something.

1715
00:59:52,630 --> 00:59:54,963
Like, if you could only use
the fact that you've already

1716
00:59:54,963 --> 00:59:56,770
discovered the sine
function, you've

1717
00:59:56,770 --> 00:59:59,170
already discovered plus and
minus and things like that,

1718
00:59:59,170 --> 01:00:01,990
and now when you come to try
and explain a new problem,

1719
01:00:01,990 --> 01:00:04,250
you actually have a lot
of previous knowledge.

1720
01:00:04,250 --> 01:00:04,750
OK?

1721
01:00:04,750 --> 01:00:06,550
So when you're trying to
describe why airplanes take off

1722
01:00:06,550 --> 01:00:07,480
and how they do
that, you're actually

1723
01:00:07,480 --> 01:00:09,110
going to rely on
previous knowledge.

1724
01:00:09,110 --> 01:00:11,318
You're not going to search
through your entire theory

1725
01:00:11,318 --> 01:00:12,820
space starting from nowhere.

1726
01:00:12,820 --> 01:00:14,530
You're going to
rely on primitives

1727
01:00:14,530 --> 01:00:16,146
before that have been useful.

1728
01:00:16,146 --> 01:00:18,145
So you might notice that
actually plus and minus

1729
01:00:18,145 --> 01:00:21,520
and sines and things like that
and cosine and exponentiation

1730
01:00:21,520 --> 01:00:22,690
are really useful.

1731
01:00:22,690 --> 01:00:24,580
Let's save those as primitives.

1732
01:00:24,580 --> 01:00:27,010
So that next time that we make
random proposals in theory

1733
01:00:27,010 --> 01:00:29,530
space, you can think
of it as making

1734
01:00:29,530 --> 01:00:32,210
like a whole bunch of moves
at once that were useful.

1735
01:00:32,210 --> 01:00:32,710
OK.

1736
01:00:32,710 --> 01:00:34,210
Like, a whole bunch
of stuff at once

1737
01:00:34,210 --> 01:00:36,564
that was useful that
shows up all the time--

1738
01:00:36,564 --> 01:00:37,730
you want to make that again.

1739
01:00:37,730 --> 01:00:38,230
OK?

1740
01:00:38,230 --> 01:00:40,300
So I try that, for example.

1741
01:00:40,300 --> 01:00:42,269
And the examples that we
gave in theory space--

1742
01:00:42,269 --> 01:00:44,560
like you might find out that
actually a lot of theories

1743
01:00:44,560 --> 01:00:47,710
use transitivity, or a lot of
theories use reflection, right?

1744
01:00:47,710 --> 01:00:50,320
In the particular magnet
case, if X attracts Y,

1745
01:00:50,320 --> 01:00:52,460
then Y will attract
X, in general,

1746
01:00:52,460 --> 01:00:55,210
the law of if X
blahs Y, then Y will

1747
01:00:55,210 --> 01:00:57,940
blah X, that turns out to be
useful in a lot of domains

1748
01:00:57,940 --> 01:00:59,920
if only there was a way
of reusing them, right?

1749
01:00:59,920 --> 01:01:01,000
There is.

1750
01:01:01,000 --> 01:01:03,130
And what Eyal was doing
was to basically use

1751
01:01:03,130 --> 01:01:05,990
an explanation compression
algorithm, the EC algorithm.

1752
01:01:05,990 --> 01:01:08,692
And what it does is it tries
to encapsulate useful concepts.

1753
01:01:08,692 --> 01:01:10,400
And he used it on a
whole bunch of stuff.

1754
01:01:10,400 --> 01:01:11,941
One of the nice
domains he used it on

1755
01:01:11,941 --> 01:01:13,720
was these circuit diagrams.

1756
01:01:13,720 --> 01:01:16,320
Have any of you actually had
to solve circuit diagrams?

1757
01:01:16,320 --> 01:01:19,080
This is the sort of stuff
that people at MIT do.

1758
01:01:19,080 --> 01:01:21,737
You're given a particular
input-output function.

1759
01:01:21,737 --> 01:01:23,320
And again, like I
said, under the hood

1760
01:01:23,320 --> 01:01:25,750
it might be something like
you're just told something

1761
01:01:25,750 --> 01:01:27,820
like here's X and Y, OK?

1762
01:01:27,820 --> 01:01:30,520
X and Y can each be 1 or 0.

1763
01:01:30,520 --> 01:01:31,100
OK?

1764
01:01:31,100 --> 01:01:33,730
And now I'm going to give you
combinations of values of X

1765
01:01:33,730 --> 01:01:35,240
and Y and what that spits out.

1766
01:01:35,240 --> 01:01:35,740
OK.

1767
01:01:35,740 --> 01:01:37,870
So X and Y are both 1--

1768
01:01:37,870 --> 01:01:38,920
light turns on.

1769
01:01:38,920 --> 01:01:40,270
X and Y are both 0--

1770
01:01:40,270 --> 01:01:41,050
light turns on.

1771
01:01:41,050 --> 01:01:43,580
X is 1, Y is 0--
light turns off.

1772
01:01:43,580 --> 01:01:44,080
OK.

1773
01:01:44,080 --> 01:01:46,930
And you're trying to find
out some sort of circuit that

1774
01:01:46,930 --> 01:01:50,020
will explain this behavior,
some sort of combination of logic

1775
01:01:50,020 --> 01:01:54,660
gates, like ANDs and ORs and
NANDs and things like that.

1776
01:01:54,660 --> 01:01:55,160
OK?

1777
01:01:55,160 --> 01:01:57,659
Do people sort of understand
the problem for these circuits?

1778
01:01:57,659 --> 01:02:00,580
And you can get a long list of
things, X, Y, Z, T, some sort

1779
01:02:00,580 --> 01:02:01,610
of complicated behavior.

1780
01:02:01,610 --> 01:02:03,160
You might not even
get the full behavior.

1781
01:02:03,160 --> 01:02:05,320
And you're trying to find
sort of the minimal set

1782
01:02:05,320 --> 01:02:07,830
of logical predicates, or
in this case circuits, that

1783
01:02:07,830 --> 01:02:09,380
will explain that behavior.

1784
01:02:09,380 --> 01:02:11,620
And now suppose
that you only have

1785
01:02:11,620 --> 01:02:14,220
the gate NAND to work with.

1786
01:02:14,220 --> 01:02:16,083
Do people know what NAND is?

1787
01:02:16,083 --> 01:02:16,927
OK.

1788
01:02:16,927 --> 01:02:18,010
It's a sort of logic gate.

1789
01:02:18,010 --> 01:02:20,680
It's a very simple logic
gate that you can build up

1790
01:02:20,680 --> 01:02:22,300
all the other logic gates from.

1791
01:02:22,300 --> 01:02:25,790
But it will take you a while
to build up AND from NAND or OR

1792
01:02:25,790 --> 01:02:26,290
from NAND.

1793
01:02:26,290 --> 01:02:27,340
But you can do it.

1794
01:02:27,340 --> 01:02:29,620
And so what he did was,
and his colleagues, they

1795
01:02:29,620 --> 01:02:31,394
started out sort of
giving this algorithm

1796
01:02:31,394 --> 01:02:32,560
a lot of different problems.

1797
01:02:32,560 --> 01:02:35,184
Like, here's a bunch of circuit
diagrams that you need to solve

1798
01:02:35,184 --> 01:02:37,810
or circuit problems that you
need to find the diagram for.

1799
01:02:37,810 --> 01:02:40,120
What the algorithm was
doing was that each time it

1800
01:02:40,120 --> 01:02:41,850
solved a problem
or set of problems,

1801
01:02:41,850 --> 01:02:44,350
it would go back and look,
huh, which parts of this

1802
01:02:44,350 --> 01:02:46,120
can I encapsulate, right?

1803
01:02:46,120 --> 01:02:48,340
Which parts of this can
I sort of use again?

1804
01:02:48,340 --> 01:02:51,140
I can carve off a chunk of
something that was useful.

1805
01:02:51,140 --> 01:02:52,640
And now, when I
make a new proposal,

1806
01:02:52,640 --> 01:02:56,230
I'm not going to say put a
NAND here or a NAND there.

1807
01:02:56,230 --> 01:02:58,750
Stochastically, I'm going to
sort of put in a whole chunk

1808
01:02:58,750 --> 01:03:00,140
that I've already used before.

1809
01:03:00,140 --> 01:03:00,640
OK?

1810
01:03:00,640 --> 01:03:02,598
I'm going to sort of call
that a new primitive.

1811
01:03:02,598 --> 01:03:04,500
Cut out this part of the space.

1812
01:03:04,500 --> 01:03:07,000
Under the hood, it's actually
an AND or something like that.

1813
01:03:07,000 --> 01:03:09,130
And it discovers-- so
what it discovers is not an AND.

1814
01:03:09,130 --> 01:03:10,866
What it discovers is this
really useful thing

1815
01:03:10,866 --> 01:03:13,240
that they called E2, which
doesn't appear in logic books.

1816
01:03:13,240 --> 01:03:16,070
But it turns out to be really
useful for certain diagrams,

1817
01:03:16,070 --> 01:03:18,910
which is take an input,
split it into two,

1818
01:03:18,910 --> 01:03:21,790
do something on this part,
do something on that part,

1819
01:03:21,790 --> 01:03:22,860
and recombine it.

1820
01:03:22,860 --> 01:03:24,610
It turns out to be a
hugely useful concept

1821
01:03:24,610 --> 01:03:25,540
for circuit diagrams.

1822
01:03:25,540 --> 01:03:26,950
And this thing discovers it.

1823
01:03:26,950 --> 01:03:29,500
And once it discovers
it, it sort of reuses it.

1824
01:03:29,500 --> 01:03:32,020
And what that does is it turns
an infinite and unmanageable

1825
01:03:32,020 --> 01:03:34,401
space into infinite, but
a bit more manageable.

1826
01:03:34,401 --> 01:03:34,900
OK?

1827
01:03:34,900 --> 01:03:36,069
So your space is infinite.

1828
01:03:36,069 --> 01:03:38,110
You're not going to search
the full length of it.

1829
01:03:38,110 --> 01:03:40,600
Imagine that this is a space
of all possible programs.

1830
01:03:40,600 --> 01:03:42,610
As you go down, the
programs get way too long.

1831
01:03:42,610 --> 01:03:44,090
You're never going
to reach them.

1832
01:03:44,090 --> 01:03:45,530
But some of them
are really good.

1833
01:03:45,530 --> 01:03:46,720
Some of them are really
good explanations.

1834
01:03:46,720 --> 01:03:48,250
And the only way
to get to them is

1835
01:03:48,250 --> 01:03:50,020
if you had some sort
of way of chunking

1836
01:03:50,020 --> 01:03:52,990
the problem, of saying, yes,
it looks like a long program.

1837
01:03:52,990 --> 01:03:56,020
But actually half of it
I've already used before

1838
01:03:56,020 --> 01:03:57,310
to solve a different thing.

1839
01:03:57,310 --> 01:04:00,511
And half of it is less
long than two times.

1840
01:04:00,511 --> 01:04:02,260
So you might discover,
you know, you might

1841
01:04:02,260 --> 01:04:03,430
have an effective search area.

1842
01:04:03,430 --> 01:04:05,513
You find out all the
problems you can solve there.

1843
01:04:08,440 --> 01:04:13,210
Yeah, this is an interesting
choice of color.

1844
01:04:13,210 --> 01:04:15,730
So imagine that this
blue thing over here

1845
01:04:15,730 --> 01:04:18,290
is describing, within the
space of all possible programs,

1846
01:04:18,290 --> 01:04:20,440
the sort of programs
that you want to find.

1847
01:04:20,440 --> 01:04:22,870
So there's the
effective search area.

1848
01:04:22,870 --> 01:04:25,030
They only cover part
of this blue thing.

1849
01:04:25,030 --> 01:04:26,696
You can think of it
like the probability

1850
01:04:26,696 --> 01:04:27,860
is really high over there.

1851
01:04:27,860 --> 01:04:29,680
You really want to
find all of them.

1852
01:04:29,680 --> 01:04:31,480
But by searching
that small space

1853
01:04:31,480 --> 01:04:34,180
and, within that small space,
finding the right primitives

1854
01:04:34,180 --> 01:04:36,490
and encapsulating them,
you can now actually

1855
01:04:36,490 --> 01:04:38,652
search more efficiently
the rest of the space.

1856
01:04:38,652 --> 01:04:41,110
And the rest of the space sort
of compresses and compresses

1857
01:04:41,110 --> 01:04:43,594
until it's all within your
effective search area.

1858
01:04:43,594 --> 01:04:45,010
Do people sort of
understand that?

1859
01:04:45,010 --> 01:04:46,270
It's sort of there
were long programs

1860
01:04:46,270 --> 01:04:48,010
before that you never
would have gotten to.

1861
01:04:48,010 --> 01:04:49,480
But by searching
these small spaces

1862
01:04:49,480 --> 01:04:51,021
that you could search
through before,

1863
01:04:51,021 --> 01:04:52,840
discovering the
useful parts there,

1864
01:04:52,840 --> 01:04:55,369
these new things that
seem really long before

1865
01:04:55,369 --> 01:04:56,160
are actually short.

1866
01:04:56,160 --> 01:04:59,180
Because they can be described
by just a few chunks.

1867
01:04:59,180 --> 01:05:00,024
OK?

1868
01:05:00,024 --> 01:05:01,440
This is really
interesting work.

1869
01:05:01,440 --> 01:05:03,240
And I encourage you
all to read it, those

1870
01:05:03,240 --> 01:05:04,620
of you who find it interesting.

1871
01:05:04,620 --> 01:05:06,370
The last thing I'm going to
do-- and then that'll leave 10

1872
01:05:06,370 --> 01:05:08,184
minutes to discussion,
which is great--

1873
01:05:08,184 --> 01:05:09,600
is this problem
that I guess maybe

1874
01:05:09,600 --> 01:05:11,160
it's really the heart of
what Laura is getting at.

1875
01:05:11,160 --> 01:05:13,020
I think she was not satisfied
by any of these things.

1876
01:05:13,020 --> 01:05:14,790
And she was sort of pointing
out, well, fine, you

1877
01:05:14,790 --> 01:05:16,373
can do stochastic
search all you want.

1878
01:05:16,373 --> 01:05:19,020
But the really hard problem is
constructing the space itself

1879
01:05:19,020 --> 01:05:20,750
on the fly.

1880
01:05:20,750 --> 01:05:22,500
You're not going to
use one infinite space

1881
01:05:22,500 --> 01:05:23,583
for all possible problems.

1882
01:05:23,583 --> 01:05:26,432
You're going to use the right
spaces for the right problem.

1883
01:05:26,432 --> 01:05:27,390
And how do you do that?

1884
01:05:30,234 --> 01:05:32,400
In this case, we're going
to do, give me a good name

1885
01:05:32,400 --> 01:05:34,211
for a romantic drama.

1886
01:05:34,211 --> 01:05:34,710
All right.

1887
01:05:34,710 --> 01:05:37,270
And your search space is going
to be-- imagine that-- can

1888
01:05:37,270 --> 01:05:38,520
people see sort of the border?

1889
01:05:38,520 --> 01:05:40,656
Like, there's this whole
space of uselessness.

1890
01:05:40,656 --> 01:05:42,030
And what you really
want to do is

1891
01:05:42,030 --> 01:05:44,460
focus in on that tiny
part of useful things.

1892
01:05:44,460 --> 01:05:46,920
If only there was a
way of just on the fly,

1893
01:05:46,920 --> 01:05:48,690
you know, zooming
in on that thing

1894
01:05:48,690 --> 01:05:50,485
and then bouncing
around in that.

1895
01:05:50,485 --> 01:05:53,910
And the point is to say, well,
when we construct the space,

1896
01:05:53,910 --> 01:05:55,424
we can just use
previous examples.

1897
01:05:55,424 --> 01:05:57,090
I don't think it's
the case that we just

1898
01:05:57,090 --> 01:05:59,071
do something
necessarily completely

1899
01:05:59,071 --> 01:06:00,695
new in this sort of
everyday thinking.

1900
01:06:00,695 --> 01:06:01,280
Well, maybe.

1901
01:06:01,280 --> 01:06:03,030
We can argue about
that in the discussion.

1902
01:06:03,030 --> 01:06:04,863
What you actually start
out with is actually

1903
01:06:04,863 --> 01:06:07,800
taking a few examples that
you find relevant in some way

1904
01:06:07,800 --> 01:06:09,940
and using those examples
to then construct

1905
01:06:09,940 --> 01:06:12,564
your space on the fly, right?

1906
01:06:12,564 --> 01:06:13,980
You might think
about things like,

1907
01:06:13,980 --> 01:06:16,950
what other romantic dramas
do I remember in the past?

1908
01:06:16,950 --> 01:06:18,220
What do they share in common?

1909
01:06:18,220 --> 01:06:20,670
What movie names do I
know of in the past--

1910
01:06:20,670 --> 01:06:22,830
quickly finding the sort
of relevant thing for all

1911
01:06:22,830 --> 01:06:25,082
these things, and then
having the space for those,

1912
01:06:25,082 --> 01:06:26,790
and then searching
around stochastically.

1913
01:06:26,790 --> 01:06:29,010
Because you're not going to do
better than stochastic search.

1914
01:06:29,010 --> 01:06:31,385
There will come a point where
you're just bouncing around

1915
01:06:31,385 --> 01:06:32,370
at random.

1916
01:06:32,370 --> 01:06:35,509
So I used this actually, forgive
me, for a paper title for SRCD

1917
01:06:35,509 --> 01:06:37,050
and came up with
some amusing things.

1918
01:06:37,050 --> 01:06:39,460
You guys can play with
that online if you want.

1919
01:06:39,460 --> 01:06:42,699
But let's do, give me a good
name for a new romantic drama.

1920
01:06:42,699 --> 01:06:45,240
So as I said, what you would do
is you would just think about

1921
01:06:45,240 --> 01:06:47,650
all the romantic dramas that
you know, like The Climbers,

1922
01:06:47,650 --> 01:06:51,150
Christine of The Big Tops,
Cupid's-- these are all actual

1923
01:06:51,150 --> 01:06:53,460
romantic dramas pulled
off of Wikipedia--

1924
01:06:53,460 --> 01:06:55,380
then use those to
construct your space.

1925
01:06:55,380 --> 01:06:58,310
Don't care about all the things
that could happen in the world.

1926
01:06:58,310 --> 01:06:58,830
OK?

1927
01:06:58,830 --> 01:07:00,150
And what do we mean
construct your space?

1928
01:07:00,150 --> 01:07:01,600
Well, there's a bunch of
ways to look at the space.

1929
01:07:01,600 --> 01:07:03,660
What ideally we would
want and what I didn't do,

1930
01:07:03,660 --> 01:07:05,550
but what we're thinking
of, is to construct

1931
01:07:05,550 --> 01:07:07,320
a very, very simple
grammar which

1932
01:07:07,320 --> 01:07:08,695
instead of all
possible sentences

1933
01:07:08,695 --> 01:07:10,890
is a grammar for movie titles.

1934
01:07:10,890 --> 01:07:14,260
And this grammar usually tends
to generate things like the,

1935
01:07:14,260 --> 01:07:14,760
right?

1936
01:07:14,760 --> 01:07:15,990
The something something.

1937
01:07:15,990 --> 01:07:16,950
And [AUDIO OUT] long.

1938
01:07:16,950 --> 01:07:17,940
And then it just stops.

1939
01:07:17,940 --> 01:07:18,540
OK?

1940
01:07:18,540 --> 01:07:20,831
And it turns out that something
like the adjective noun

1941
01:07:20,831 --> 01:07:23,010
is a really good way of
generating names for pubs--

1942
01:07:23,010 --> 01:07:26,490
The White Queen, The Blond
Tiger, The Bleeding Bottle,

1943
01:07:26,490 --> 01:07:29,470
I don't know, something.

1944
01:07:29,470 --> 01:07:30,040
Right?

1945
01:07:30,040 --> 01:07:31,500
That's really useful if
only it could do that.

1946
01:07:31,500 --> 01:07:33,100
If it could construct
these tiny grammars,

1947
01:07:33,100 --> 01:07:35,183
it'll still give you an
infinite number of things.

1948
01:07:35,183 --> 01:07:37,350
But, you know,
[AUDIO OUT] movie names.

1949
01:07:37,350 --> 01:07:40,440
Or things like
verbing proper name

1950
01:07:40,440 --> 01:07:43,350
turns out to be a really good
thing for like, you know,

1951
01:07:43,350 --> 01:07:47,790
Amy Stopping,
Interrupting Timmy.

1952
01:07:47,790 --> 01:07:49,820
It's so bad.

1953
01:07:49,820 --> 01:07:52,170
And you could find that from
looking at these things.

1954
01:07:52,170 --> 01:07:54,150
And just to show
you how much I think

1955
01:07:54,150 --> 01:07:58,540
that this is, you know, actually
not that bad of a problem,

1956
01:07:58,540 --> 01:07:59,850
I did not do this grammar thing.

1957
01:07:59,850 --> 01:08:01,890
I did something
even simpler, which

1958
01:08:01,890 --> 01:08:04,530
is to take all the other names
that I could find on Wikipedia

1959
01:08:04,530 --> 01:08:07,900
for different movie genres
throughout the ages,

1960
01:08:07,900 --> 01:08:10,940
and then I looked at things
like romantic dramas.

1961
01:08:10,940 --> 01:08:13,770
And what I did was construct
a very simple n-gram which

1962
01:08:13,770 --> 01:08:16,020
just takes those words
and just sort of does

1963
01:08:16,020 --> 01:08:18,000
a random walk on those words.

1964
01:08:18,000 --> 01:08:18,810
OK?

1965
01:08:18,810 --> 01:08:21,233
And you could imagine
complicating this immediately

1966
01:08:21,233 --> 01:08:22,649
by taking something
like embedding

1967
01:08:22,649 --> 01:08:24,120
those words in the
high-dimensional space

1968
01:08:24,120 --> 01:08:25,880
and actually picking words
that are close to that.

1969
01:08:25,880 --> 01:08:27,750
So you could get new words that
were never in there before.

1970
01:08:27,750 --> 01:08:28,875
I'm not even doing that.

1971
01:08:28,875 --> 01:08:30,500
I'm doing something
ridiculously simple

1972
01:08:30,500 --> 01:08:31,999
that I don't think
people are doing.

1973
01:08:31,999 --> 01:08:34,710
But let me show you
how reasonable it is.

1974
01:08:34,710 --> 01:08:35,220
OK?

1975
01:08:35,220 --> 01:08:37,109
And what I'm going
to compare it to

1976
01:08:37,109 --> 01:08:39,410
is some stuff where we asked
people on Mechanical Turk

1977
01:08:39,410 --> 01:08:41,160
to give us names for
a new romantic drama.

1978
01:08:44,140 --> 01:08:45,930
Ah, the only thing I
forgot was the right

1979
01:08:45,930 --> 01:08:47,040
labeling for these things.

1980
01:08:47,040 --> 01:08:48,600
So Laura, what do you think?

1981
01:08:48,600 --> 01:08:51,380
Is this from Turk, or
is this my algorithm?

1982
01:08:54,380 --> 01:08:57,939
LAURA SCHULZ: I am
50/50 [AUDIO OUT].

1983
01:08:57,939 --> 01:09:00,592
TOMER ULLMAN: So how about we'll
have, not by show of hands,

1984
01:09:00,592 --> 01:09:01,800
but people just shout it out.

1985
01:09:01,800 --> 01:09:05,935
Like, if it's, I
don't know, Turk or--

1986
01:09:05,935 --> 01:09:08,939
I'm looking for a short word
which is like a Turk or Tomer.

1987
01:09:08,939 --> 01:09:11,600
Let's do it that way, so Tomer
just standing in for Tomer's

1988
01:09:11,600 --> 01:09:13,392
simple silly algorithm.

1989
01:09:13,392 --> 01:09:14,850
So who thinks that
this was created

1990
01:09:14,850 --> 01:09:17,102
by someone, an actual
human on Mechanical Turk?

1991
01:09:17,102 --> 01:09:19,560
And who thinks it was created
by Tomer mechanically running

1992
01:09:19,560 --> 01:09:20,460
through an algorithm?

1993
01:09:20,460 --> 01:09:21,090
OK.

1994
01:09:21,090 --> 01:09:24,560
So in 3, 2, 1 you're either
going to shout Turk or Tomer.

1995
01:09:24,560 --> 01:09:26,142
3, 2, 1.

1996
01:09:26,142 --> 01:09:27,750
[INTERPOSING VOICES]

1997
01:09:27,750 --> 01:09:28,500
TOMER ULLMAN: OK.

1998
01:09:28,500 --> 01:09:31,140
That was actually someone
at Mechanical Turk.

1999
01:09:31,140 --> 01:09:32,010
Let's do this again.

2000
01:09:32,010 --> 01:09:36,607
Girls In Ships for a
romantic drama, 3, 2, 1--

2001
01:09:36,607 --> 01:09:38,399
AUDIENCE: Tomer.

2002
01:09:38,399 --> 01:09:39,960
TOMER ULLMAN: This
was an algorithm.

2003
01:09:39,960 --> 01:09:42,270
Value Of Love, 3, 2, 1--

2004
01:09:42,270 --> 01:09:43,870
AUDIENCE: Turk.

2005
01:09:43,870 --> 01:09:45,600
TOMER ULLMAN: That
was Turk, good.

2006
01:09:45,600 --> 01:09:47,860
Endless Love, 3, 2, 1--

2007
01:09:47,860 --> 01:09:48,540
AUDIENCE: Turk.

2008
01:09:48,540 --> 01:09:49,140
TOMER ULLMAN: Good.

2009
01:09:49,140 --> 01:09:49,639
OK.

2010
01:09:49,639 --> 01:09:51,279
How about Legend of Paris?

2011
01:09:51,279 --> 01:09:52,370
3, 2, 1--

2012
01:09:52,370 --> 01:09:53,935
[INTERPOSING VOICES]

2013
01:09:53,935 --> 01:09:55,060
TOMER ULLMAN: Nobody knows.

2014
01:09:55,060 --> 01:09:57,050
This is actually me.

2015
01:09:57,050 --> 01:09:57,550
OK.

2016
01:09:57,550 --> 01:10:00,730
Who's enjoying this and
wants to do a few more?

2017
01:10:00,730 --> 01:10:02,980
Land of Roses, 3, 2, 1--

2018
01:10:02,980 --> 01:10:03,904
AUDIENCE: Tomer.

2019
01:10:03,904 --> 01:10:05,290
TOMER ULLMAN: Tomer.

2020
01:10:05,290 --> 01:10:07,960
And finally, Those We
Meet Again, 3, 2, 1--

2021
01:10:07,960 --> 01:10:08,790
AUDIENCE: Turk.

2022
01:10:08,790 --> 01:10:10,390
TOMER ULLMAN: No, it wasn't me.

2023
01:10:10,390 --> 01:10:11,290
Oh, sorry, one more.

2024
01:10:11,290 --> 01:10:13,065
Love Lightly, 3, 2, 1--

2025
01:10:13,065 --> 01:10:13,690
AUDIENCE: Turk.

2026
01:10:13,690 --> 01:10:14,740
TOMER ULLMAN: Yeah, Turk.

2027
01:10:14,740 --> 01:10:16,198
It seems like
Turkers were actually

2028
01:10:16,198 --> 01:10:18,760
doing better than the algorithm,
which is romantic is love.

2029
01:10:18,760 --> 01:10:20,218
And I'm just going
to put something

2030
01:10:20,218 --> 01:10:22,210
with love in the title.

2031
01:10:22,210 --> 01:10:24,490
So who wants to do this for
action movies, and then

2032
01:10:24,490 --> 01:10:26,580
we'll stop?

2033
01:10:26,580 --> 01:10:27,269
OK.

2034
01:10:27,269 --> 01:10:28,310
Let's do this for action.

2035
01:10:28,310 --> 01:10:29,830
How about The Chase?

2036
01:10:29,830 --> 01:10:30,760
3, 2, 1--

2037
01:10:30,760 --> 01:10:32,110
AUDIENCE: Turk.

2038
01:10:32,110 --> 01:10:34,345
TOMER ULLMAN: Yes, how
about Who, The Annihilation?

2039
01:10:34,345 --> 01:10:36,580
[LAUGHTER]

2040
01:10:36,580 --> 01:10:38,080
TOMER ULLMAN: OK, that's me.

2041
01:10:38,080 --> 01:10:39,190
The Oversight?

2042
01:10:39,190 --> 01:10:40,330
Turk.

2043
01:10:40,330 --> 01:10:41,450
The Edge.

2044
01:10:41,450 --> 01:10:42,250
AUDIENCE: Turk.

2045
01:10:42,250 --> 01:10:43,070
TOMER ULLMAN: Turk.

2046
01:10:43,070 --> 01:10:45,070
Jack Death?

2047
01:10:45,070 --> 01:10:45,910
Tomer.

2048
01:10:45,910 --> 01:10:46,960
Among Heroes.

2049
01:10:46,960 --> 01:10:47,892
AUDIENCE: Turk.

2050
01:10:47,892 --> 01:10:49,760
TOMER ULLMAN: No, it was me.

2051
01:10:49,760 --> 01:10:51,420
Swordmen in China Three?

2052
01:10:51,420 --> 01:10:52,348
AUDIENCE: Tomer.

2053
01:10:52,348 --> 01:10:53,740
TOMER ULLMAN: Tomer.

2054
01:10:53,740 --> 01:10:54,306
And The Hit?

2055
01:10:54,306 --> 01:10:54,930
AUDIENCE: Turk.

2056
01:10:54,930 --> 01:10:55,620
TOMER ULLMAN: People on Turk.

2057
01:10:55,620 --> 01:10:57,710
You can probably [AUDIO OUT]
than four in each one.

2058
01:10:57,710 --> 01:11:00,335
And, again, people are like, The
Oversight, The Hit, The Chase,

2059
01:11:00,335 --> 01:11:00,879
The Edge.

2060
01:11:00,879 --> 01:11:02,170
That's the only thing they did.

2061
01:11:02,170 --> 01:11:04,740
They actually came up with
some clever stuff as well.

2062
01:11:04,740 --> 01:11:06,272
But, you know, it's interesting.

2063
01:11:06,272 --> 01:11:07,480
And, of course, I'm cheating.

2064
01:11:07,480 --> 01:11:09,729
Because the algorithm did a
bunch of really dumb stuff

2065
01:11:09,729 --> 01:11:13,250
that I didn't put in here,
like Hunchback of Monte Cristo,

2066
01:11:13,250 --> 01:11:15,500
Get it Did, Bell
of a Lesser God,

2067
01:11:15,500 --> 01:11:18,010
Eagles Shooting
Heroes, Tomb Raider,

2068
01:11:18,010 --> 01:11:23,890
The Raging God Of Violence, and
Legend of Legend, my favorite.

2069
01:11:23,890 --> 01:11:27,566
But my point is to say,
you know, in the same sense

2070
01:11:27,566 --> 01:11:29,190
that Joshua's saying,
you know, imagine

2071
01:11:29,190 --> 01:11:30,981
that you could use
something like a ConvNet

2072
01:11:30,981 --> 01:11:33,640
to quickly seed your proposals--
imagine if you could think

2073
01:11:33,640 --> 01:11:36,730
of like a random dumb algorithm
that could then [AUDIO OUT]

2074
01:11:36,730 --> 01:11:38,430
and say Legend of something.

2075
01:11:38,430 --> 01:11:39,680
And then you start to say, no.

2076
01:11:39,680 --> 01:11:40,930
That's not really a great idea.

2077
01:11:40,930 --> 01:11:42,304
What you're trying
to get at here

2078
01:11:42,304 --> 01:11:44,200
is not 100% accuracy
with these silly things,

2079
01:11:44,200 --> 01:11:45,740
but something like 1 in 5.

2080
01:11:45,740 --> 01:11:49,600
1 in 5 is better than 1
in 0, or 1 in a million

2081
01:11:49,600 --> 01:11:52,490
or something like that, which
is what Laura was pointing out.

2082
01:11:52,490 --> 01:11:52,990
OK.

2083
01:11:52,990 --> 01:11:55,570
So as I said, we still
have a long, long way

2084
01:11:55,570 --> 01:11:57,790
to go to model children
to meet Laura's critique.

2085
01:11:57,790 --> 01:11:58,999
It's hard to say what's hard.

2086
01:11:58,999 --> 01:12:01,248
I think that's what I was
trying to hint at with Steve

2087
01:12:01,248 --> 01:12:02,050
Piantadosi's point.

2088
01:12:02,050 --> 01:12:03,520
We don't really
know what's easy.

2089
01:12:03,520 --> 01:12:05,200
We don't really
know what's hard.

2090
01:12:05,200 --> 01:12:07,414
But people in development
and in computational land

2091
01:12:07,414 --> 01:12:09,580
should continue to care
about stochastic algorithms.

2092
01:12:09,580 --> 01:12:11,205
And people in
computational land should

2093
01:12:11,205 --> 01:12:14,440
continue to care about
children to everyone's benefit.

2094
01:12:14,440 --> 01:12:16,170
And that's it-- so, Laura.

2095
01:12:16,170 --> 01:12:18,062
LAURA SCHULZ: This
will be very short.

2096
01:12:18,062 --> 01:12:21,850
[APPLAUSE]

2097
01:12:21,850 --> 01:12:23,310
So I didn't know.

2098
01:12:23,310 --> 01:12:24,310
Or rather, I did know.

2099
01:12:24,310 --> 01:12:25,250
But I didn't know
it was going to be

2100
01:12:25,250 --> 01:12:27,560
part of this debate
about Max Siegel's thing.

2101
01:12:27,560 --> 01:12:29,390
So I'll say something
briefly about that.

2102
01:12:29,390 --> 01:12:35,216
But you've had three really good
approaches to each of these.

2103
01:12:35,216 --> 01:12:36,590
So I'll speak
briefly about them.

2104
01:12:36,590 --> 01:12:38,970
I think what Owen is
doing is totally great,

2105
01:12:38,970 --> 01:12:41,390
but still driven in
some sense by the data,

2106
01:12:41,390 --> 01:12:43,790
not by the question, right?

2107
01:12:43,790 --> 01:12:46,370
And I think the point that I'm
going to make just continually

2108
01:12:46,370 --> 01:12:49,580
here is that the way we
think is driven by the goals

2109
01:12:49,580 --> 01:12:52,100
that we have, right?

2110
01:12:52,100 --> 01:12:55,200
And each of these
solutions in some ways

2111
01:12:55,200 --> 01:12:58,100
is failing to use what is
most salient to us as humans,

2112
01:12:58,100 --> 01:12:59,700
which is we have problems.

2113
01:12:59,700 --> 01:13:01,970
We have questions, right?

2114
01:13:01,970 --> 01:13:04,850
And what I would like to
do is see us move to a case

2115
01:13:04,850 --> 01:13:07,700
where it's not just
the data that's causing

2116
01:13:07,700 --> 01:13:09,257
us to generate new ideas.

2117
01:13:09,257 --> 01:13:11,090
And we're not just
trying to deal with that.

2118
01:13:11,090 --> 01:13:16,720
It is actually the information
in the problem itself.

2119
01:13:16,720 --> 01:13:19,570
Similarly, I think what Eyal's
doing is totally beautiful.

2120
01:13:19,570 --> 01:13:21,240
And the representational
compression

2121
01:13:21,240 --> 01:13:23,310
is really, really interesting.

2122
01:13:23,310 --> 01:13:26,050
But a lot of learning
problems can't be solved.

2123
01:13:26,050 --> 01:13:28,542
Most of the ones
I was gesturing at

2124
01:13:28,542 --> 01:13:30,000
are not really
problems of changing

2125
01:13:30,000 --> 01:13:31,166
the representational format.

2126
01:13:31,166 --> 01:13:33,630
It matters hugely that
we have an Arabic numeral

2127
01:13:33,630 --> 01:13:35,430
system instead of a
Roman numeral system.

2128
01:13:35,430 --> 01:13:37,920
That changes the kinds of
problems that we can solve.

2129
01:13:37,920 --> 01:13:42,100
And so that represents
a huge advancement.

2130
01:13:42,100 --> 01:13:44,400
And for many kinds
of problems, it

2131
01:13:44,400 --> 01:13:47,160
will make search
much more efficient.

2132
01:13:47,160 --> 01:13:49,390
But a lot of problems just
don't have that property.

2133
01:13:49,390 --> 01:13:50,940
So it's, in some
sense, an answer

2134
01:13:50,940 --> 01:13:53,970
to a different kind of problem.

2135
01:13:53,970 --> 01:13:57,840
Steve's proposal-- what
can you say, right?

2136
01:13:57,840 --> 01:13:58,699
It could be true.

2137
01:13:58,699 --> 01:14:00,240
There are a billion,
billion neurons.

2138
01:14:00,240 --> 01:14:01,614
You get more
synaptic connections

2139
01:14:01,614 --> 01:14:03,450
than there are stars
in the known universe.

2140
01:14:03,450 --> 01:14:06,310
Of course, it could be true.

2141
01:14:06,310 --> 01:14:08,760
That's what an expedition
means-- a long line

2142
01:14:08,760 --> 01:14:10,532
of everybody, says Pooh.

2143
01:14:10,532 --> 01:14:13,550
But it's not as good
a story if it's true.

2144
01:14:13,550 --> 01:14:14,700
So it could be true.

2145
01:14:14,700 --> 01:14:16,950
You could do a billion
things really, really fast

2146
01:14:16,950 --> 01:14:19,130
and just think about the
ones that you arrive at.

2147
01:14:19,130 --> 01:14:24,090
But I think the jury's
out on that one.

2148
01:14:24,090 --> 01:14:26,750
Max and Tomer and this--

2149
01:14:26,750 --> 01:14:28,709
and a while ago, I think,
Sam Gershman also, right?

2150
01:14:28,709 --> 01:14:30,208
So Sam Gershman
came up to us and we

2151
01:14:30,208 --> 01:14:32,490
spent a while talking about
how you would invent what

2152
01:14:32,490 --> 01:14:35,700
we were affectionately calling a
bullshit generator, our ability

2153
01:14:35,700 --> 01:14:37,317
to [AUDIO OUT].

2154
01:14:37,317 --> 01:14:39,150
Somebody asked me about
anything, you know--

2155
01:14:39,150 --> 01:14:40,800
tell me about Ionic
and Doric columns,

2156
01:14:40,800 --> 01:14:42,510
you remember something
from sixth grade.

2157
01:14:42,510 --> 01:14:44,990
And you start talking, right?

2158
01:14:44,990 --> 01:14:46,760
So the question is,
what can you do?

2159
01:14:46,760 --> 01:14:48,510
And I think this is a
really nice attempt.

2160
01:14:48,510 --> 01:14:50,790
And I think the idea of
seeding it from past examples

2161
01:14:50,790 --> 01:14:54,090
to help construct a search space
is a really beautiful idea.

2162
01:14:54,090 --> 01:14:58,140
Again, the question is,
how do you make that?

2163
01:14:58,140 --> 01:15:01,860
My feeling is still
we can do something

2164
01:15:01,860 --> 01:15:03,600
that works for those
kinds of problems

2165
01:15:03,600 --> 01:15:04,800
where we have past examples.

2166
01:15:04,800 --> 01:15:07,880
We can't do it for
any kind of problem.

2167
01:15:07,880 --> 01:15:09,360
And so what I
really want to push

2168
01:15:09,360 --> 01:15:11,760
for is use the problem, right?

2169
01:15:11,760 --> 01:15:13,499
Use the information
in the problem.

2170
01:15:13,499 --> 01:15:15,540
Because for those problems,
like romantic movies,

2171
01:15:15,540 --> 01:15:16,740
we have some existing examples.

2172
01:15:16,740 --> 01:15:17,640
We can seed the search space.

2173
01:15:17,640 --> 01:15:18,150
We can do that.

2174
01:15:18,150 --> 01:15:20,191
And for my theater company
example, it's perfect.

2175
01:15:20,191 --> 01:15:22,620
But for the peppermint
example, not so much, right?

2176
01:15:22,620 --> 01:15:24,870
You're not going to seed the
search space from examples

2177
01:15:24,870 --> 01:15:27,210
of candies, or what you
know about the construction

2178
01:15:27,210 --> 01:15:27,720
of candies.

2179
01:15:27,720 --> 01:15:30,303
If you did, it wouldn't generate
the pendulum answer, which we

2180
01:15:30,303 --> 01:15:32,010
think is a good wrong answer.

2181
01:15:32,010 --> 01:15:34,230
So it's not just that.

2182
01:15:34,230 --> 01:15:36,600
It is I have a problem
of a particular kind.

2183
01:15:36,600 --> 01:15:39,480
It is going to be satisfied by
some kind of an answer and not

2184
01:15:39,480 --> 01:15:40,050
others.

2185
01:15:40,050 --> 01:15:43,662
How can I use that to
help my [INAUDIBLE]?

2186
01:15:43,662 --> 01:15:47,850
So that's, I think, the end
of what I have to say.

2187
01:15:47,850 --> 01:15:50,930
And we'll return to questions.