The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

BORIS KATZ: When a scientist approaches a complex phenomenon, though, or when an engineer looks at a difficult problem, they usually break it up into pieces and try to understand each piece and solve it separately. Understanding intelligence is one such very, very complex, in fact extraordinarily complex, problem, and over the years this divide-and-conquer approach has produced a number of very successful fields like computer vision, natural language processing, cognitive science, neuroscience, machine vision, and so on. But we need to remember that most cognitive tasks that humans perform actually go across modalities; that is, they span these established fields.
And the goal of our thrust is to bring together techniques from all these fields and create new models for solving intelligence tasks; we would also like to understand how these tasks operate in the brain.

So I will start with one task, which is scene recognition. What does scene recognition involve? Well, in order to recognize a scene, a machine needs to do some type of verification: is that a street lamp? It needs to do detection, for example, detecting the people in the scene. It needs to do identification: is this particular building the Potala Palace, somewhere in Tibet? It needs to do object categorization: look at this image and tell me where the mountains, trees, buildings, street lamps, vendors, people, and so forth are. We should also be able to recognize activity: what is this person doing? Or what are these two guys doing here?

Well, currently our machines are pretty bad at all of these tasks. I understand that there has been quite a lot of progress made recently in machine learning. And I've also seen some claims that machines perform better than humans in some visual tasks.
However, I think we should take these claims with a grain of salt. First, there is nothing amazing about machines doing certain things better than humans. People have been building such machines for millennia. Humans needed tools to build the pyramids. They built tools to carry heavy things, to lift them, to go faster, and so on. Well, you'll tell me, no, no, no, we are talking about intelligent tasks. Well, for $5 you can buy a calculator that multiplies numbers much better than you do. For $100, you can build a gadget that has a huge lookup table and plays chess much better than any of you.

So when you hear that a computer can distinguish between 20 breeds of dogs, or something like that, better than you do, I don't think you should assume that the vision problem is solved.

Well, understand, I'm not saying that because we have a dramatically better solution. Not at all. My point is that the problems of real visual understanding and real language understanding are extraordinarily hard, and we need to be patient, try to understand why, and eventually find better solutions. So, back to the visual understanding problem.
So, as I said, machines are bad at these things. But humans are absolutely awesome. You have absolutely no trouble doing verification, detection, identification, categorization. And you can do much more than that. You can recognize spatial and temporal relationships between objects. You can do event recognition. You can explain things. You can look at the image that I showed you and tell me what past events caused the scene to look the way it does. You can look at that scene and say what future events might occur in it. You can fill gaps. You can hallucinate things. You could look at a scene that you've barely seen and tell me what objects not shown in it might be present there, and what events not visible in the scene could have occurred.

So why are machines falling short? Well, in part, our visual system is tuned to process structures typically found in the world, but our machines have no such tuning. They don't know enough about the world, and they don't know what structures and events make sense and typically happen in the world.
So I will show you a blurry video. And I wonder whether some of you who didn't see it before could figure out what's going on.

Who has not seen this video? Could you tell me what you saw?

AUDIENCE: A person was talking on the phone, then switched to working on his computer.

BORIS KATZ: Right. Well, this is amazing. Even with almost no pixels there, you still recognize what's going on, because you know what typically happens in the world. Well, of course, it was sort of a joke, and people sometimes make mistakes, too. So here's the unblurred video.

[AUDIENCE LAUGHTER]

Well, but, all jokes aside.

AUDIENCE: [INAUDIBLE]

[AUDIENCE LAUGHTER]

BORIS KATZ: Jokes aside, though, it would have been extraordinary if our machines could make mistakes like this. Not even close.

Well, so we may want to ask ourselves some questions. How is this knowledge that you seem to have obtained? And how can we pass this knowledge to our computers? How can we determine whether this computer knowledge is correct?
Our partial answer is: using language. And I bolded the word partial here because clearly there are many other ways humans obtain knowledge about the world, but today I will be talking about language, and I will show you what is needed to give knowledge to the machine.

So we have a proposal. We would like to create a knowledge base that contains descriptions of objects, their properties, and the relations between them as they are typically found in the world, and we want to make this knowledge base available to a scene-recognition system. And to test the performance of the system, we will ask natural language questions.

One of us here decided to see what questions people actually ask, and he set up an Amazon Mechanical Turk experiment where he showed people hundreds of images and asked them to write down, to generate, questions about these images. And I will show you a couple. Here's a scene, and here are the questions that people ask: you know, how many men are in the picture? What's in the cart? What does the number on the sign say? Is there any luggage?
What is the color of the shirt of some lady? Another example. Who is winning, yellow or red? Well, the answer is, of course, red, but how did you do that? Well, you need to know that this is a sporting event, which all of you do. That it involves, in this particular sporting event, winners and losers, which you also know. You need to know that this sort of shorthand, yellow and red, means the people wearing those colors rather than the colors themselves. You need to know to pay no attention to people wearing red, or maybe blue, in the audience. And you also need to know that a participant on the floor is likely a loser, not a winner. That's a lot of knowledge.

So, back to our proposal. We want to try to give the machine at least some of that knowledge using language, and for that, of course, we need tools. And over the years, we've built a system called START, which, in fact, contains some tools that could be helpful for this task. And I will be happy to share the API with you so that you could use the system and maybe try to see what to do with the knowledge that you give it.
So there are only three tools on this slide. One is going from language to structure. To provide machines with knowledge, we give the machine a bunch of sentences, texts, paragraphs, and those will be converted into some kind of semantic representation. And I will show some details of that. We also want to go in the other direction. We want a machine to explain what it does or describe its knowledge using language. So we have a generator that does that, that goes from semantic representation to language. And we want to test the machine, because it's very important to know whether what you taught the machine is what it actually understood. We want to ask questions, give queries. Those will be converted into semantic representations that will be matched against what the machine knows. And the computer will either give you a language response or perform some actions, which will indicate that it understood what you asked.
So we will go through these tools, and I will describe the START system to you in detail, but I just want you to remember that this is a disciplined engineering enterprise, and these are the tools that I want to give you and other people so that you could start thinking more deeply about human abilities and modalities, like vision and language and others.

Here are some of the building blocks of the START system. We need to parse language, we need to come up with a semantic representation, and we need to generate, match, reply, and so forth. So let's go through that very quickly.

Most of you, somewhere in middle school, learned about parse trees. Linguists love them. They're beautiful. This is an example of a sentence from Tom Sawyer: Tom greeted his aunt, who was sitting by an open window in a pleasant rearward apartment. And linguists like to argue about the exact representations, but pretty much all of them will agree that something like this represents the structure, the syntactic structure, of the sentence.
Well, parse trees are beautiful and nice, but they're really horrible if you want to store them, if you want to match them, if you want to retrieve from them. And so we use the information found in parse trees, but we developed a different representation, which we call the ternary expression representation, which is a more semantic representation of language. It is syntax-driven, but it highlights semantic relations which humans find important. And because we give this knowledge to computers, we made it very efficient for indexing, for matching, and for retrieval. It's also reversible, and I'll explain in a second what I mean by that. We implemented it as a nested set of subject-relation-object tuples.

So here is a different sentence. Say you have an image. You may recognize one of the characters there in the back; it's Andrei Barbu. And say you want to describe, in language, what you see here. And you say something like: the person who picked up the yellow lemon placed it in the bowl on the table.
Using this subject-relation-object structure, and first parsing the sentence, you could create this ternary expression. And you can see: the person picked up the lemon; that same person placed that lemon in the bowl on the table; and that lemon happens to be yellow. To make it a little bit easier for you to see what's going on, and convenient for both humans and machines, we created a sort of topologically equivalent linearization of that knowledge graph, as a set of triples. They are a little bit misleading here, just due to their simplicity, because all the words here, of course, need to have an index: if you have, say, a tall person and a short person, you will have to distinguish between them. So you need indices, but for simplicity I didn't show them here. The verbs also have indices, so that when the word 'placed' appears again as part of the relation 'in the bowl', you know it refers to the same event: the person placed the lemon in the bowl.

So this is all about representation, and we distinguish at least three types of ternary expressions.
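The linearized triple representation just described can be sketched in code. This is a minimal illustration; the triple spellings, the "+1" index notation, and the query helper are my own assumptions, not START's actual syntax:

```python
# A minimal sketch of the ternary (subject, relation, object) representation
# described in the lecture. Relation names and "+n" indices are illustrative.

# "The person who picked up the yellow lemon placed it in the bowl on the table."
# Each word carries an index so two mentions of "person" stay distinct, and
# verb occurrences are indexed so other triples can attach to the right event.
triples = [
    ("person+1", "pick-up+1", "lemon+1"),      # the person picked up the lemon
    ("person+1", "place+1",   "lemon+1"),      # that same person placed that lemon
    ("place+1",  "in",        "bowl+1"),       # the placing happened in the bowl
    ("bowl+1",   "on",        "table+1"),      # the bowl is on the table
    ("lemon+1",  "has-property", "yellow"),    # that lemon happens to be yellow
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the given (possibly wildcard) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# What did person+1 do to lemon+1?
print(query(subject="person+1", obj="lemon+1"))
# -> [('person+1', 'pick-up+1', 'lemon+1'), ('person+1', 'place+1', 'lemon+1')]
```

Indexing the verb occurrence ("place+1") is what lets the "in the bowl" triple attach to the placing event rather than to the lemon or the person.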
The first type you see here is the syntactic structure of the sentence. We also have syntactic features: the fact that the sitting was in the past tense, and moreover in the progressive, 'was sitting'; and also what kind of article things have, for example, the window had an article, an indefinite one. And there are also lexical features that don't change from sentence to sentence, such as the fact that Tom is a proper noun, and so forth.

Well, I told you that our representation is reversible. We need to be able to teach machines to talk to us, and there are many reasons to do that. Some of them are shown on this slide. You want your robot or your computer to explain what it does, possibly remotely. You want your machine or your robot to answer questions which are complex, and the robot may want to ask you for clarification. You want to keep track of conversation history and state, engage in mixed-initiative dialogue, offer related information. All these things need to happen in dialogue, and therefore your computer must be able to speak to you in a language that you understand.
In fact, I find that the biggest problem with the learning systems that we have today is that some of them can work quite robustly and sometimes give you good results, but you have no idea why. You press a button, you say, aha, here's the number, and you put it in the paper. More recently, people have started looking at why a system does what it does, but, again, it's done with numbers. It would be really wonderful if our learning systems could tell us why they came up with their conclusions. So we need language, and so we built a START generator that goes from those same expressions and creates natural language.

This is why we call this representation reversible. Given a set of ternary expressions, the machine will create a sentence: the person who picked up the yellow lemon placed it in the bowl on the table. But, of course, this is a little bit silly, just parroting the same sentence back. You want the machine, for example, as I said, to ask you a question, or to indicate a negative statement, or to rephrase things from different pieces that it knows about. So, in fact, our generator is very flexible.
So here is an example where, say, by observing the world, the robot adds more information to this representation, indicated here in blue. And now, from the original sentence, which was 'the person who picked up the yellow lemon placed it in the bowl', by just adding a couple of new relations, the generator will be able to ask the human a question: will the person who placed the yellow lemon in the bowl on the table pick it up soon? And so forth.

All right, so we have talked about parse trees, about semantic representation, about generation. So what do we do with all that? Suppose that you gave some knowledge to your machine, here that Tom Sawyer assertion, and somebody asked, was anyone sitting by an open window? Well, what needs to happen is that this question gets converted into a ternary representation, as I indicated, and this knowledge base, we assume, holds the knowledge from the original assertion, plus a million other assertions, of course. So we need to match the representation of the query against the knowledge base, and the machine will say, aha, here is the match.
Well, it's very simple here: window needs to match window, open needs to match open, sit needs to match sit, and then the word 'anyone', which is a sort of wildcard word, needs to match 'aunt'. But in reality, of course, people ask questions that do not follow so closely what the machine knows, and so our matcher needs to be much more sophisticated. This is just the graphical, sort of knowledge-graph, representation of that match.

So START distinguishes several kinds of term matching. As I showed, it could be a lexical match. Of course, it knows synonymy. It knows hyponymy, which goes one way: a car is a vehicle. And as you can imagine, the match also needs to go one way. If I say I bought a car, it also means that I bought a vehicle. But if I say I bought a vehicle, it's not necessarily true that I bought a car, because I may have bought a truck. But that aside, it's pretty easy to do matching on the level of terms, of words. A much more complex problem is to match on the level of structure. And I will show you some examples of this problem.
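The one-way behavior of hyponymy matching can be sketched as follows. The tiny taxonomy and the function name are hypothetical; START's real matcher is far richer than this:

```python
# A minimal sketch of one-way term matching over a hyponym hierarchy,
# as described in the lecture. The taxonomy below is an illustrative toy.

# child -> parent ("a car is a vehicle")
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "vehicle": "thing",
             "aunt": "person", "person": "thing"}

def term_matches(asserted, queried):
    """True if an assertion about `asserted` answers a query about `queried`.

    The match is deliberately one-way: "I bought a car" supports a query
    about buying a vehicle, but "I bought a vehicle" does not support a
    query about buying a car, since it might have been a truck.
    """
    term = asserted
    while term is not None:
        if term == queried:
            return True
        term = HYPERNYMS.get(term)  # climb to the next hypernym, if any
    return False

print(term_matches("car", "vehicle"))   # assertion more specific: True
print(term_matches("vehicle", "car"))   # assertion more general: False
```

The same machinery lets the asserted 'aunt' satisfy a query about 'anyone', treated here as matching the general term 'person'.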
But by now, you must have figured out that I love to stare at English sentences. I hope you do too, a little bit. If not, please try. So here, let's consider a couple of verbs. Here is the verb 'surprise'. Consider these two sentences: the patient surprised the doctor with his fast recovery; the patient's fast recovery surprised the doctor. Now, for you, who are used to understanding language so quickly, it's even hard to hear what's different about the sentences. But if you actually do the parsing, you will see that the parse trees are dramatically different, and therefore our ternary representations will be very different. So we need to find a way to tell the machine: yes, that's the same thing. Linguists call these things syntactic alternations.

A different verb, like 'load', shows a different alternation: the crane loaded the ship with containers, or the crane loaded containers onto the ship. Again, they mean pretty much the same thing, but the surface representation is different. The next one is in the form of a question.
Did Iran provide Syria with weapons, or did Iran provide weapons to Syria? Let's see if this works for every verb in the universe. So let's try to take, say, the 'surprise' alternation and use it with the verb 'load', or the other way around. I hope you're bearing with me. Linguists put stars in front of bad sentences. So here I tried to use the verb 'surprise' with the alternation that 'load' allows: you use the same 'onto', and it says, the patient surprised fast recovery onto the doctor. It makes absolutely no sense. Here I tried to apply the 'surprise' alternation to the verb 'load', and it says, the crane's containers loaded the ship. Again, complete gibberish. And the same below: did Iran's weapons provide Syria? So it looks like a really horrible story. Every English verb, it seems, has its own way of expressing these alternations. But fortunately, this is not the case. Let's go back to the verb 'surprise' and look at verbs similar to it. You could use the same alternation with the verb 'confuse'.
431 00:24:46,700 --> 00:24:49,070 You can say the patient confused the doctor 432 00:24:49,070 --> 00:24:51,170 with his slow recovery, which will convert 433 00:24:51,170 --> 00:24:54,020 into the patient's slow recovery confused the doctor. 434 00:24:54,020 --> 00:24:55,700 You can say the same thing with anger, 435 00:24:55,700 --> 00:25:00,730 disappoint, embarrass, frighten, impress, please, threaten. 436 00:25:00,730 --> 00:25:03,930 And what is really amazing and very interesting about it 437 00:25:03,930 --> 00:25:10,210 is that this syntactic alternation works 438 00:25:10,210 --> 00:25:13,720 the same way for verbs of the same meaning, 439 00:25:13,720 --> 00:25:15,520 of the same semantic class. 440 00:25:15,520 --> 00:25:19,330 And this particular class is called the emotional reaction 441 00:25:19,330 --> 00:25:20,220 verbs. 442 00:25:20,220 --> 00:25:24,190 And it's a large semantic class of about 300 verbs, 443 00:25:24,190 --> 00:25:26,470 and they all behave identically from the point 444 00:25:26,470 --> 00:25:28,300 of view of these alternations. 445 00:25:28,300 --> 00:25:31,084 And it's true for all the other alternations that I showed you. 446 00:25:31,084 --> 00:25:32,500 So that, of course, is good news 447 00:25:32,500 --> 00:25:35,590 because it makes an interesting connection between syntax 448 00:25:35,590 --> 00:25:40,240 and semantics, but it also allows 449 00:25:40,240 --> 00:25:44,980 us to build lexicons that are more compact and easier 450 00:25:44,980 --> 00:25:46,030 to deal with. 451 00:25:46,030 --> 00:25:50,560 And one can imagine creating this verb class membership 452 00:25:50,560 --> 00:25:53,950 automatically by looking at a large corpus. 453 00:25:53,950 --> 00:25:57,450 And this is how, presumably, children 454 00:25:57,450 --> 00:26:01,500 learn these verb classes and these alternations.
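The verb-class idea above can be sketched in a few lines of Python. This is only an illustration, not the actual START code: the semantic class is a hand-written set, and the input frames stand in for whatever a real parser would produce.

```python
# A minimal sketch of how one shared semantic class lets a single
# rule normalize the alternation for a whole family of verbs.
# We assume the parser has already produced a flat frame like
# {"subject": ..., "verb": ..., "object": ..., "with": ...}.

EMOTIONAL_REACTION = {"surprise", "confuse", "anger",
                      "disappoint", "embarrass", "frighten",
                      "impress", "please", "threaten"}

def normalize(frame):
    """Map both surface forms of the alternation onto one
    canonical ternary: (stimulus, verb, experiencer)."""
    verb = frame["verb"]
    if verb in EMOTIONAL_REACTION and frame.get("with"):
        # "The patient surprised the doctor with his fast recovery"
        # -> the with-phrase (the recovery) is the real stimulus.
        return (frame["with"], verb, frame["object"])
    # "The patient's fast recovery surprised the doctor"
    return (frame["subject"], verb, frame["object"])

a = normalize({"subject": "patient", "verb": "surprise",
               "object": "doctor", "with": "fast recovery"})
b = normalize({"subject": "fast recovery", "verb": "surprise",
               "object": "doctor", "with": None})
assert a == b  # both surface forms match the same canonical triple
```

Because membership in `EMOTIONAL_REACTION` drives the rule, adding a new verb of the same class costs one set entry, which is the compactness benefit mentioned above.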
455 00:26:01,500 --> 00:26:04,240 All right, so now that you know how to match, 456 00:26:04,240 --> 00:26:05,897 and it's not just a trivial match, 457 00:26:05,897 --> 00:26:07,980 like I showed here, but a more sophisticated match 458 00:26:07,980 --> 00:26:11,590 on the level of structure as well, 459 00:26:11,590 --> 00:26:13,850 let's see what we can do after the match happens. 460 00:26:13,850 --> 00:26:16,720 So here's the same sentence and the same question. 461 00:26:16,720 --> 00:26:19,030 Was anybody sitting by an open window? 462 00:26:19,030 --> 00:26:22,810 We retrieved the structure and then 463 00:26:22,810 --> 00:26:26,980 we could tell our generator, go and generate the sentence, 464 00:26:26,980 --> 00:26:28,630 and it will do that. 465 00:26:28,630 --> 00:26:31,150 Tom's aunt was sitting by an open window 466 00:26:31,150 --> 00:26:34,990 in a pleasant rearward apartment. 467 00:26:34,990 --> 00:26:36,250 Well, it's not that interesting. 468 00:26:36,250 --> 00:26:39,910 It's sort of parroting: I tell it ABC, 469 00:26:39,910 --> 00:26:44,392 and it tells me back ABC if I ask, 470 00:26:44,392 --> 00:26:47,489 who B'd C, or something like that. 471 00:26:47,489 --> 00:26:49,530 If we want to build a question-answering system, 472 00:26:49,530 --> 00:26:52,540 we want it to be able to, in response to a question, 473 00:26:52,540 --> 00:26:56,590 understand it, go somewhere, find the right answer, 474 00:26:56,590 --> 00:26:59,020 and give it back to you. 475 00:26:59,020 --> 00:27:04,030 And we built that, and we do it in a general way: 476 00:27:04,030 --> 00:27:08,260 our system can execute a procedure in response 477 00:27:08,260 --> 00:27:13,040 to a match to obtain the answer from the data source.
478 00:27:13,040 --> 00:27:15,820 So an example is here and I can show you some screenshots 479 00:27:15,820 --> 00:27:19,070 or, in fact, if you like, we can play with the system live 480 00:27:19,070 --> 00:27:20,590 and you'll see what it does. 481 00:27:20,590 --> 00:27:23,160 So it executes a procedure to obtain an answer from the data 482 00:27:23,160 --> 00:27:24,260 source. 483 00:27:24,260 --> 00:27:28,605 If you say who directed Gone With the Wind, a match 484 00:27:28,605 --> 00:27:32,320 will happen between what you ask 485 00:27:32,320 --> 00:27:36,010 and what the system knows, some script will get executed, 486 00:27:36,010 --> 00:27:38,830 and the machine will go to some data source, find the answer, 487 00:27:38,830 --> 00:27:40,390 and give it back to you. 488 00:27:40,390 --> 00:27:41,380 So how is this done? 489 00:27:41,380 --> 00:27:47,180 Well, in order to explain that, I need two more ideas. 490 00:27:47,180 --> 00:27:49,850 One is the natural language annotation idea. 491 00:27:49,850 --> 00:27:53,830 So annotations are sentences and phrases 492 00:27:53,830 --> 00:27:56,626 that describe the content of retrievable information 493 00:27:56,626 --> 00:27:58,750 segments. This graphic, in a sort of cute way, 494 00:27:58,750 --> 00:28:02,110 shows these sentence-level, or phrase-level, labels 495 00:28:02,110 --> 00:28:06,700 on some data, and they describe the retrievable information 496 00:28:06,700 --> 00:28:07,210 segments. 497 00:28:07,210 --> 00:28:10,780 Annotations are then matched against submitted queries, 498 00:28:10,780 --> 00:28:13,360 and a successful match results in 499 00:28:13,360 --> 00:28:17,860 either retrieval of that information or some procedure 500 00:28:17,860 --> 00:28:20,901 to retrieve that information.
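The annotation idea can be sketched as a toy program. This is an invented illustration, not START's implementation: real annotations are parsed structures, not regular expressions, and the movie table is a stand-in data source.

```python
# A toy sketch of the natural-language-annotation idea: each
# retrievable segment (or procedure) is labeled with a language
# pattern, and an incoming question that matches the pattern
# triggers either retrieval or execution of the procedure.
import re

# Stand-in data source.
MOVIES = {"gone with the wind": "Victor Fleming"}

# Each annotation pairs a pattern with a procedure to run on match.
ANNOTATIONS = [
    (re.compile(r"who directed (?P<title>.+)", re.IGNORECASE),
     lambda m: MOVIES[m.group("title").lower()]),
]

def answer(question):
    for pattern, procedure in ANNOTATIONS:
        m = pattern.match(question)
        if m:
            return procedure(m)  # successful match -> run the procedure
    return None  # no annotation matched

print(answer("Who directed Gone with the Wind"))
```

The point is the indirection: the pattern describes what a segment can answer, while the procedure knows how to fetch it, so adding a new data source only adds annotations.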
501 00:28:20,901 --> 00:28:25,210 And a special case of this procedure 502 00:28:25,210 --> 00:28:30,490 is done using our object-property-value data 503 00:28:30,490 --> 00:28:32,800 model. 504 00:28:32,800 --> 00:28:36,760 This technique can connect language 505 00:28:36,760 --> 00:28:42,160 to arbitrary procedures, but as I said, 506 00:28:42,160 --> 00:28:47,260 let's consider the many semi-structured information 507 00:28:47,260 --> 00:28:49,540 sources available on the web that 508 00:28:49,540 --> 00:28:54,820 can be modeled using this object-property-value model. 509 00:28:54,820 --> 00:28:59,230 Well, what kind of repositories 510 00:28:59,230 --> 00:29:00,130 have this property? 511 00:29:00,130 --> 00:29:01,840 If you think about it, almost anything 512 00:29:01,840 --> 00:29:05,330 that humans create on the web, which is semi-structured, 513 00:29:05,330 --> 00:29:06,190 is like that. 514 00:29:06,190 --> 00:29:09,580 If you have a site that has a bunch of countries 515 00:29:09,580 --> 00:29:12,490 with properties like populations, areas, 516 00:29:12,490 --> 00:29:17,290 capitals, birthrates, and so forth, the country is 517 00:29:17,290 --> 00:29:20,140 an object, the word population is a property, 518 00:29:20,140 --> 00:29:24,250 and the value is the actual value of that property. 519 00:29:24,250 --> 00:29:26,680 You can have people with their birth dates, 520 00:29:26,680 --> 00:29:30,560 you can have cities with maps and elevations and so forth. 521 00:29:30,560 --> 00:29:32,980 So in a sense, this object-property-value model 522 00:29:32,980 --> 00:29:38,500 makes it possible to view and use large segments of the web 523 00:29:38,500 --> 00:29:40,770 as a database. 524 00:29:40,770 --> 00:29:44,120 And schematically, here's how START uses this model.
525 00:29:44,120 --> 00:29:46,480 A user asks the language part of the system 526 00:29:46,480 --> 00:29:49,820 a question; the system needs to understand the question, 527 00:29:49,820 --> 00:29:53,530 understand where the answer might be found, 528 00:29:53,530 --> 00:29:55,810 and what the object and property 529 00:29:55,810 --> 00:29:57,970 implicit in the question are. 530 00:29:57,970 --> 00:29:59,700 After START does that, it 531 00:29:59,700 --> 00:30:03,790 has a friend called Omnibase, and it says, go get it. 532 00:30:03,790 --> 00:30:05,500 Go to this site. 533 00:30:05,500 --> 00:30:11,700 Go for that symbol called France and go get the population. 534 00:30:11,700 --> 00:30:14,580 And it will go to some world factbook 535 00:30:14,580 --> 00:30:18,580 and get the population. 536 00:30:18,580 --> 00:30:22,740 So this is how the system works, and here's 537 00:30:22,740 --> 00:30:24,440 an example of such a question. 538 00:30:24,440 --> 00:30:26,640 Here, the question, it's a screenshot, 539 00:30:26,640 --> 00:30:29,276 is, does Russia border on Moldova? 540 00:30:29,276 --> 00:30:33,180 The system says, aha, you want to find out 541 00:30:33,180 --> 00:30:35,880 what countries border Moldova, and find out 542 00:30:35,880 --> 00:30:38,940 whether Russia is among them. 543 00:30:38,940 --> 00:30:43,260 And then it actually checks that and it tells you, no, Russia 544 00:30:43,260 --> 00:30:45,240 does not border Moldova, because it doesn't 545 00:30:45,240 --> 00:30:48,690 find Russia in this response. 546 00:30:48,690 --> 00:30:51,920 And just for comparison, if you ask the same question 547 00:30:51,920 --> 00:30:55,430 of a search engine, it will give you 548 00:30:55,430 --> 00:31:01,020 24 million results, today maybe 240 million results, 549 00:31:01,020 --> 00:31:02,940 and none of them really answers the question.
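The object-property-value view described above can be sketched with a plain dictionary. The data here is a toy stand-in for a scraped site; the real Omnibase wraps live semi-structured pages behind the same kind of lookup interface.

```python
# A sketch of the object-property-value model: a semi-structured
# source becomes a function from (object, property) to value.
# The numbers and lists below are illustrative stand-in data.
FACTBOOK = {
    ("France", "population"): 67_000_000,            # illustrative
    ("Moldova", "borders"): ["Romania", "Ukraine"],
}

def get(obj, prop):
    """Omnibase-style lookup: (object, property) -> value."""
    return FACTBOOK[(obj, prop)]

# "What is the population of France?" -> one lookup.
pop = get("France", "population")

# "Does Russia border on Moldova?" -> a membership test
# over the value of a single property.
assert "Russia" not in get("Moldova", "borders")
```

Once every source exposes this `get(object, property)` shape, the language side only has to recover the object and property implicit in the question, which is exactly the division of labor between START and Omnibase.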
550 00:31:09,000 --> 00:31:13,130 Well, I just want to tell you that the ability to understand 551 00:31:13,130 --> 00:31:15,020 something really helps. 552 00:31:15,020 --> 00:31:17,210 In this case, the ability to understand language 553 00:31:17,210 --> 00:31:19,880 gives you a lot of power. 554 00:31:19,880 --> 00:31:22,890 You can do a lot by searching through keywords, 555 00:31:22,890 --> 00:31:24,690 and you can retrieve a lot of documents, 556 00:31:24,690 --> 00:31:30,090 and this is how pretty much all modern systems work. 557 00:31:30,090 --> 00:31:32,960 But if you want to do something a little bit more complex, 558 00:31:32,960 --> 00:31:34,460 it would be nice to understand something. 559 00:31:34,460 --> 00:31:37,790 So here's an example of a complex question. 560 00:31:37,790 --> 00:31:40,340 Who is the president of the fourth largest country 561 00:31:40,340 --> 00:31:42,560 married to? 562 00:31:42,560 --> 00:31:46,520 Well, if you can analyze this question into pieces, 563 00:31:46,520 --> 00:31:51,140 then you can very quickly figure out that, 564 00:31:51,140 --> 00:31:53,560 just by throwing the pieces at your knowledge base, 565 00:31:53,560 --> 00:31:55,250 you cannot resolve it. 566 00:31:55,250 --> 00:31:59,180 But we've built a very nice syntax-based algorithm 567 00:31:59,180 --> 00:32:04,700 that allows us to decompose complex questions into sequences 568 00:32:04,700 --> 00:32:06,860 of simpler questions and understand in which order 569 00:32:06,860 --> 00:32:07,940 to ask them. 570 00:32:07,940 --> 00:32:10,070 So the machine will say, oh, first 571 00:32:10,070 --> 00:32:12,350 I need to find out what's the fourth largest country, 572 00:32:12,350 --> 00:32:14,840 then who its president is, and then, with that, 573 00:32:14,840 --> 00:32:16,490 who he is married to. 574 00:32:16,490 --> 00:32:20,140 And, very quickly, this is how, schematically, it's done.
575 00:32:20,140 --> 00:32:22,170 This is sort of an under-the-hood 576 00:32:22,170 --> 00:32:24,990 Ternary expression representation of the question. 577 00:32:24,990 --> 00:32:27,530 The machine says, oh, too hard, let's first 578 00:32:27,530 --> 00:32:30,515 find out what the fourth largest country is. 579 00:32:30,515 --> 00:32:32,060 It's China. 580 00:32:32,060 --> 00:32:34,700 Then it's still hard, 581 00:32:34,700 --> 00:32:37,010 so let's find out who the president of China 582 00:32:37,010 --> 00:32:41,810 is, find the name, and then the next step is just a table lookup. 583 00:32:41,810 --> 00:32:42,950 Who is he married to? 584 00:32:42,950 --> 00:32:45,270 And it gives you an answer. 585 00:32:51,280 --> 00:32:52,230 Some other examples. 586 00:32:52,230 --> 00:32:54,970 In what city was the fifth president of the US born? 587 00:32:54,970 --> 00:32:57,850 It finds James Monroe and gives you the city. 588 00:33:00,480 --> 00:33:02,960 What books did the author of War and Peace write? 589 00:33:02,960 --> 00:33:06,790 It finds Leo Tolstoy and finds his books from different sources. 590 00:33:11,080 --> 00:33:16,860 So the technologies that I described, 591 00:33:16,860 --> 00:33:22,260 the object-property-value data model, our Ternary expression 592 00:33:22,260 --> 00:33:29,990 representation, complex question answering, and the 593 00:33:29,990 --> 00:33:33,440 natural language annotation representation.
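The decomposition walked through above can be sketched as three chained lookups, each answer feeding the next sub-question. The tables are toy stand-ins for the knowledge sources the real system consults (the names reflect the mid-2000s example in the talk and are illustrative).

```python
# A sketch of answering a nested question by decomposing it into
# an ordered sequence of simpler questions.
COUNTRY_BY_AREA_RANK = {4: "China"}        # "the fourth largest country"
HEAD_OF_STATE = {"China": "Hu Jintao"}     # illustrative, mid-2000s
SPOUSE = {"Hu Jintao": "Liu Yongqing"}

def answer_nested():
    """Who is the president of the fourth largest country married to?"""
    country = COUNTRY_BY_AREA_RANK[4]      # sub-question 1
    president = HEAD_OF_STATE[country]     # sub-question 2
    return SPOUSE[president]               # sub-question 3: a plain lookup
```

The machine's job is not just splitting the question but ordering the pieces: sub-question 2 cannot be posed until sub-question 1 has been answered.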
594 00:33:33,440 --> 00:33:37,840 They, over the years, inspired a bunch of companies 595 00:33:37,840 --> 00:33:41,650 and a bunch of technologies, starting with Ask Jeeves, which, 596 00:33:41,650 --> 00:33:45,010 I guess, existed before you guys were even born, 597 00:33:45,010 --> 00:33:47,420 to Wolfram Alpha, which pretty much took 598 00:33:47,420 --> 00:33:49,880 [INAUDIBLE] wholesale, to more recently, 599 00:33:49,880 --> 00:33:53,470 Google QA, which started doing really wonderful things using 600 00:33:53,470 --> 00:33:57,720 this idea. Everybody had the idea that you should 601 00:33:57,720 --> 00:33:59,300 go from the surface form of the question. 602 00:33:59,300 --> 00:34:02,020 If you have a question, you throw your question 603 00:34:02,020 --> 00:34:04,300 onto the web and you get some answer, 604 00:34:04,300 --> 00:34:07,640 but it doesn't work with high precision. 605 00:34:07,640 --> 00:34:11,320 So the idea that you need to curate knowledge and build 606 00:34:11,320 --> 00:34:13,600 some huge repository of knowledge 607 00:34:13,600 --> 00:34:16,630 was picked up by these companies, and certainly now 608 00:34:16,630 --> 00:34:22,290 all these companies do quite decent question answering. 609 00:34:22,290 --> 00:34:25,760 And the same is true for Watson and Siri, 610 00:34:25,760 --> 00:34:30,520 and I was involved in some of these things, 611 00:34:30,520 --> 00:34:34,370 so I will show you. 612 00:34:34,370 --> 00:34:36,730 Let's see. 613 00:34:36,730 --> 00:34:38,139 Right, so let's start with this. 614 00:34:41,560 --> 00:34:46,090 About 10 years ago, on top of START, 615 00:34:46,090 --> 00:34:53,431 we built a system that was connected to a cell phone. 616 00:34:53,431 --> 00:34:55,389 I don't know how many of you remember the world 617 00:34:55,389 --> 00:34:58,120 without smartphones, but that was 618 00:34:58,120 --> 00:34:59,450 when smartphones weren't there. 619 00:34:59,450 --> 00:35:01,840 There was no such thing as an iPhone.
620 00:35:01,840 --> 00:35:04,090 So there's a vanilla phone that, all it did, it 621 00:35:04,090 --> 00:35:06,520 made phone calls. 622 00:35:06,520 --> 00:35:09,950 Of course, it also had a camera and unlimited text. 623 00:35:09,950 --> 00:35:11,800 But it really didn't do much more, 624 00:35:11,800 --> 00:35:15,890 and we decided it was time to connect it to language. 625 00:35:15,890 --> 00:35:22,860 So we convinced a company to fund us to do that, 626 00:35:22,860 --> 00:35:28,980 and we built a system called StartMobile. 627 00:35:28,980 --> 00:35:31,770 And this is an intelligent phone assistant, 628 00:35:31,770 --> 00:35:35,320 which could, at the time, retrieve general purpose 629 00:35:35,320 --> 00:35:39,250 information, provide access to computational services, 630 00:35:39,250 --> 00:35:42,400 perform an action on another phone, trigger apparatus, 631 00:35:42,400 --> 00:35:46,000 like a camera, on a phone, and receive instructions. 632 00:35:46,000 --> 00:35:49,330 And, talking about YouTube, 633 00:35:49,330 --> 00:35:53,890 we have a video that shows the system in action. 634 00:35:53,890 --> 00:35:55,270 That video is quite old. 635 00:35:55,270 --> 00:35:58,660 It's from the beginning of 2006, and at the time, 636 00:35:58,660 --> 00:36:03,370 we did not connect it to speech, but you 637 00:36:03,370 --> 00:36:05,970 can see what it does: the user 638 00:36:05,970 --> 00:36:08,430 was typing in questions for the system 639 00:36:08,430 --> 00:36:11,570 in that particular video. 640 00:36:11,570 --> 00:36:14,110 So there's no narration, so if you read the captions, 641 00:36:14,110 --> 00:36:15,670 you'll figure out what is going on. 642 00:36:41,242 --> 00:36:42,950 So here's my former student, who actually 643 00:36:42,950 --> 00:36:49,260 went to Google to transition our technology eventually. 644 00:36:49,260 --> 00:36:54,140 And he's not sure whether he needs to take his coat or not.
645 00:36:54,140 --> 00:36:57,500 [JAZZ MUSIC] 646 00:36:57,500 --> 00:36:59,250 Again, this is very dated, of course, 647 00:36:59,250 --> 00:37:01,470 because now the temperature is almost uniform, 648 00:37:01,470 --> 00:37:03,280 but, again, that was 10 years ago. 649 00:37:16,920 --> 00:37:19,020 Is there any sound? 650 00:37:19,020 --> 00:37:21,532 AUDIENCE: Yes. 651 00:37:21,532 --> 00:37:22,990 BORIS KATZ: All right, those of you 652 00:37:22,990 --> 00:37:24,573 that know Cambridge know that station. 653 00:37:39,041 --> 00:37:39,540 Where am I? 654 00:37:39,540 --> 00:37:44,500 The GPS just came about and we were lucky to connect it 655 00:37:44,500 --> 00:37:47,580 and so now the guy gets the map and knows where 656 00:37:47,580 --> 00:37:50,220 to find where he needs to go. 657 00:37:50,220 --> 00:37:52,960 So this is our data center for those who haven't seen it. 658 00:37:52,960 --> 00:37:54,520 This is where CSAIL is. 659 00:37:54,520 --> 00:37:56,700 Again, it's dated, because right now this lawn 660 00:37:56,700 --> 00:37:59,580 is a huge building, but it wasn't there at the time. 661 00:38:06,041 --> 00:38:08,010 Oh it says here, trying to reach my mother. 662 00:38:08,010 --> 00:38:10,030 I don't know why it shows you this stuff, but. 663 00:38:10,030 --> 00:38:16,950 AUDIENCE: [INAUDIBLE] 664 00:38:16,950 --> 00:38:18,810 BORIS KATZ: He's worried about his mother 665 00:38:18,810 --> 00:38:22,470 and so he decides to tell her, remind my mother 666 00:38:22,470 --> 00:38:24,050 to take her medicine at 3:00 p.m. 667 00:38:26,419 --> 00:38:28,210 And we'll see what happens with that later. 668 00:38:52,190 --> 00:38:55,240 Take a higher resolution picture using flash in 10 seconds. 669 00:38:55,240 --> 00:38:57,920 I don't think any phones can do it even today for some reason. 670 00:38:57,920 --> 00:39:00,405 I don't know why it's so hard. 671 00:39:00,405 --> 00:39:01,800 [AUDIENCE LAUGHS] 672 00:39:01,800 --> 00:39:03,347 AUDIENCE: For a selfie. 
673 00:39:03,347 --> 00:39:04,680 BORIS KATZ: Right, for a selfie. 674 00:39:04,680 --> 00:39:05,460 Very good, yeah. 675 00:39:11,687 --> 00:39:13,610 All right, so his friend is busy and he 676 00:39:13,610 --> 00:39:16,606 wants to entertain himself, I guess, 677 00:39:16,606 --> 00:39:18,230 but he doesn't quite know how to do it. 678 00:39:28,330 --> 00:39:30,049 How do I use the radio on my phone? 679 00:39:35,537 --> 00:39:36,037 Well. 680 00:39:39,530 --> 00:39:43,159 All right, now he knows. 681 00:39:43,159 --> 00:39:46,540 [MUSIC PLAYING] 682 00:39:54,760 --> 00:39:56,190 All right, mother's health. 683 00:39:59,613 --> 00:40:02,153 [AUDIENCE LAUGHS] 684 00:40:02,153 --> 00:40:04,910 Right, so, exactly. 685 00:40:04,910 --> 00:40:08,240 So a delayed action happened on her phone, 686 00:40:08,240 --> 00:40:10,710 so we inserted the thing on her phone 687 00:40:10,710 --> 00:40:16,090 and then she got this warning, and that's my staff. 688 00:40:16,090 --> 00:40:17,881 They all turned out to be very good actors. 689 00:40:40,020 --> 00:40:42,030 All right, so this is the last thing. 690 00:40:42,030 --> 00:40:46,140 Traveling, she is going back. 691 00:40:46,140 --> 00:40:48,480 She now has a car. 692 00:40:48,480 --> 00:40:51,810 And this is the last thing that I'll show you. 693 00:40:51,810 --> 00:40:55,310 [JAZZ MUSIC] 694 00:41:09,810 --> 00:41:13,380 How do I get from here to Frederica's house? 695 00:41:13,380 --> 00:41:16,740 Well, if you think about it, this is a very hard question. 696 00:41:16,740 --> 00:41:19,400 You need to know that here is here 697 00:41:19,400 --> 00:41:23,760 and go to GPS and find the location.
698 00:41:23,760 --> 00:41:25,830 You need to know Frederica's house 699 00:41:25,830 --> 00:41:28,530 from your list of contacts and that you need to go there, 700 00:41:28,530 --> 00:41:31,260 and you need to then send it to, in that case, we sent it to, 701 00:41:31,260 --> 00:41:33,240 I believe it was MapQuest, I don't even 702 00:41:33,240 --> 00:41:38,051 know if it exists now, to actually give the directions. 703 00:41:38,051 --> 00:41:39,450 Well, anyway, so. 704 00:41:42,430 --> 00:41:47,340 Well, it was a little bit of a sad story actually. 705 00:41:47,340 --> 00:41:49,080 So we built the system. 706 00:41:49,080 --> 00:41:52,770 The company that I mentioned was Nokia. 707 00:41:52,770 --> 00:41:56,017 We showed them the demo. 708 00:41:56,017 --> 00:41:57,350 They were very excited about it. 709 00:41:57,350 --> 00:42:01,080 They said, well, can we put START on the phone? 710 00:42:01,080 --> 00:42:06,390 Because in that application, the signals 711 00:42:06,390 --> 00:42:09,250 were sent to MIT from the phone and the answers 712 00:42:09,250 --> 00:42:11,670 were sent back to the phone. 713 00:42:11,670 --> 00:42:13,960 I said, well, it doesn't seem right. 714 00:42:13,960 --> 00:42:18,230 START is large and there was no internet connection 715 00:42:18,230 --> 00:42:21,290 that could take care of that. 716 00:42:21,290 --> 00:42:23,550 They said, no, no, no, how big is your system? 717 00:42:23,550 --> 00:42:28,460 Can you talk to people in the company to put it through a LISP compiler, 718 00:42:28,460 --> 00:42:31,210 to put it on our chip, and so forth? 719 00:42:31,210 --> 00:42:32,160 We need it on the phone. 720 00:42:32,160 --> 00:42:35,790 Unfortunately, the word cloud hadn't been invented yet. 721 00:42:35,790 --> 00:42:38,640 Maybe I would have been more eloquent in explaining to them 722 00:42:38,640 --> 00:42:43,420 why they didn't need to have the system on the phone. 723 00:42:43,420 --> 00:42:47,790 And so they didn't want to use it the way it was.
724 00:42:47,790 --> 00:42:52,950 We wrote a paper, showed it to them, 725 00:42:52,950 --> 00:42:57,450 said, do something about this or it will be too late. 726 00:42:57,450 --> 00:43:01,440 And right at that time, Apple released its first iPhone. 727 00:43:04,140 --> 00:43:07,170 So I go to a senior vice president 728 00:43:07,170 --> 00:43:11,006 and say, look, these guys are ahead of you. 729 00:43:11,006 --> 00:43:12,630 You should decide about it because they 730 00:43:12,630 --> 00:43:16,260 will do what I gave you. 731 00:43:16,260 --> 00:43:23,490 He asks, how many iPhones did Apple sell last month? 732 00:43:23,490 --> 00:43:27,090 I said I read somewhere it was like a couple of thousand. 733 00:43:27,090 --> 00:43:28,620 And he starts laughing hysterically. 734 00:43:28,620 --> 00:43:34,600 He said, we, my company, ship one million phones every day. 735 00:43:34,600 --> 00:43:36,100 Why do we care about Apple, he said. 736 00:43:39,930 --> 00:43:42,180 Well, so, we gave this talk. 737 00:43:42,180 --> 00:43:46,920 That was September 2007 by then. 738 00:43:46,920 --> 00:43:52,290 In December, somebody started a company called Siri, 739 00:43:52,290 --> 00:43:55,290 and then two years later, Siri was bought by Apple, 740 00:43:55,290 --> 00:43:56,950 and the rest is history. 741 00:43:56,950 --> 00:44:00,240 And Nokia was sold pretty much at a yard sale 742 00:44:00,240 --> 00:44:03,600 and doesn't exist anymore. 743 00:44:03,600 --> 00:44:05,950 So be visionary. 744 00:44:05,950 --> 00:44:10,140 Don't think that you know what you are doing all the time. 745 00:44:10,140 --> 00:44:16,770 Yeah, people often ask me to say a few words about Jeopardy!. 746 00:44:16,770 --> 00:44:23,430 The question that the IBM team was hoping to answer 747 00:44:23,430 --> 00:44:26,200 was actually a very important question.
748 00:44:26,200 --> 00:44:30,180 Can we create a computer system to compete against the best 749 00:44:30,180 --> 00:44:32,760 humans in a task which is normally 750 00:44:32,760 --> 00:44:35,693 thought to require a high level of human intelligence? 751 00:44:40,430 --> 00:44:42,980 I was involved with them from the very beginning 752 00:44:42,980 --> 00:44:47,300 for various reasons which I will not go into. 753 00:44:47,300 --> 00:44:50,250 They put together a wonderful team, some really good people, 754 00:44:50,250 --> 00:44:51,250 very devoted people. 755 00:44:51,250 --> 00:44:54,980 They spent four or five years of their life, pretty much, 756 00:44:54,980 --> 00:44:58,130 totally devoted to that. 757 00:44:58,130 --> 00:45:01,800 And they built a system, and these are the kind of, 758 00:45:01,800 --> 00:45:02,740 I guess-- 759 00:45:02,740 --> 00:45:06,230 I don't know if any of you know what Jeopardy! is, but 760 00:45:06,230 --> 00:45:09,830 pretty much, people ask the question, 761 00:45:09,830 --> 00:45:14,340 which, for various reasons, is formulated not as a question 762 00:45:14,340 --> 00:45:16,770 but as an assertion, mostly 763 00:45:16,770 --> 00:45:19,640 with demonstrative pronouns like this and these, 764 00:45:19,640 --> 00:45:23,624 and you need to give an answer, which they call the question, 765 00:45:23,624 --> 00:45:24,290 for some reason. 766 00:45:24,290 --> 00:45:25,020 There's a gimmick. 767 00:45:25,020 --> 00:45:27,228 You have to say what is envelope instead of envelope, 768 00:45:27,228 --> 00:45:28,990 but let's not pay attention to that. 769 00:45:28,990 --> 00:45:30,470 It doesn't matter. 770 00:45:30,470 --> 00:45:31,520 And this is very hard.
771 00:45:31,520 --> 00:45:33,560 To push one of these paper products 772 00:45:33,560 --> 00:45:36,170 is to stretch the established limits, 773 00:45:36,170 --> 00:45:39,160 and you need to figure out that to push the envelope 774 00:45:39,160 --> 00:45:41,200 means to stretch established limits. 775 00:45:41,200 --> 00:45:44,000 This is an idiom, for those of you who are not native speakers. 776 00:45:44,000 --> 00:45:45,340 And the answer is envelope. 777 00:45:45,340 --> 00:45:50,270 A simpler question is, the chapels of these colleges were 778 00:45:50,270 --> 00:45:52,750 designed by this architect. 779 00:45:52,750 --> 00:45:56,400 And you need to figure out that Christopher Wren is the answer. 780 00:46:01,800 --> 00:46:04,590 Of course, many questions involve question 781 00:46:04,590 --> 00:46:05,940 decomposition. 782 00:46:05,940 --> 00:46:08,430 So here's an example of a real question. 783 00:46:08,430 --> 00:46:11,790 Of the four countries in the world that the US does not 784 00:46:11,790 --> 00:46:16,100 have diplomatic relations with, the one that's farthest north. 785 00:46:16,100 --> 00:46:19,800 So it's pretty much asking several questions. 786 00:46:19,800 --> 00:46:23,079 One is the sort of inner sub-question: 787 00:46:23,079 --> 00:46:24,870 the four countries in the world that the US 788 00:46:24,870 --> 00:46:27,930 doesn't have relations with. And the outer sub-question 789 00:46:27,930 --> 00:46:32,340 is, now that you know these four countries, which 790 00:46:32,340 --> 00:46:33,390 is the farthest north? 791 00:46:33,390 --> 00:46:35,040 You do a little bit of arithmetic 792 00:46:35,040 --> 00:46:37,660 and you find the answer is North Korea.
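The inner/outer decomposition for this clue can be sketched in two lines. The latitudes below are approximate capital-city latitudes, used only to illustrate the "little bit of arithmetic" step.

```python
# Inner sub-question: the four countries without US diplomatic
# relations at the time, paired with approximate capital latitudes.
NO_US_RELATIONS = {
    "Bhutan": 27.5, "Cuba": 23.1, "Iran": 35.7, "North Korea": 39.0,
}

# Outer sub-question: of those four, which is farthest north?
answer = max(NO_US_RELATIONS, key=NO_US_RELATIONS.get)
assert answer == "North Korea"
```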
793 00:46:37,660 --> 00:46:42,090 And of course, this is very similar to what START did years 794 00:46:42,090 --> 00:46:45,590 before: you pretty much decompose the questions 795 00:46:45,590 --> 00:46:47,580 and you solve them separately, as I showed you 796 00:46:47,580 --> 00:46:48,330 a few minutes ago. 797 00:46:52,260 --> 00:46:54,270 So Watson actually took a bunch of ideas 798 00:46:54,270 --> 00:46:57,210 from START, the Ternary expression representation, 799 00:46:57,210 --> 00:46:59,880 the natural language annotations idea, 800 00:46:59,880 --> 00:47:03,180 the object-property-value data model, and the question 801 00:47:03,180 --> 00:47:06,390 decomposition model, and applied them 802 00:47:06,390 --> 00:47:10,300 when they could really analyze the question 803 00:47:10,300 --> 00:47:13,020 syntactically, where the question was not too convoluted, 804 00:47:13,020 --> 00:47:15,570 and when there was a semi-structured resource, 805 00:47:15,570 --> 00:47:19,860 or several resources, to find an answer. 806 00:47:19,860 --> 00:47:22,050 But many questions, of course, were not like that, 807 00:47:22,050 --> 00:47:24,960 and stretching the envelope is one example. 808 00:47:24,960 --> 00:47:28,080 And so Watson used some statistical machine 809 00:47:28,080 --> 00:47:32,700 learning approaches, and they did quite a good job 810 00:47:32,700 --> 00:47:36,930 of looking at a lot of data to resolve and answer 811 00:47:36,930 --> 00:47:38,940 these questions. 812 00:47:38,940 --> 00:47:41,790 Their pipeline is, really, miles long, 813 00:47:41,790 --> 00:47:45,080 because each of these bullets 814 00:47:45,080 --> 00:47:46,540 has a bunch of sub-bullets, 815 00:47:46,540 --> 00:47:49,990 each with a bunch of bullets for the tasks that they were doing, 816 00:47:49,990 --> 00:47:54,990 but on a very high level, they needed to do content acquisition. 817 00:47:54,990 --> 00:47:57,490 Pretty much, all right, there was a problem.
818 00:47:57,490 --> 00:48:02,160 The company behind Jeopardy! told them the web cannot be part 819 00:48:02,160 --> 00:48:04,610 of that, because Google knows everything, 820 00:48:04,610 --> 00:48:06,940 so what is it that you guys are doing? 821 00:48:06,940 --> 00:48:10,410 So what would you do if somebody tells you you 822 00:48:10,410 --> 00:48:11,400 cannot use the web? 823 00:48:14,250 --> 00:48:15,410 AUDIENCE: [INAUDIBLE] 824 00:48:15,410 --> 00:48:16,410 BORIS KATZ: What's that? 825 00:48:16,410 --> 00:48:18,410 AUDIENCE: [INAUDIBLE] 826 00:48:18,410 --> 00:48:21,980 BORIS KATZ: Well, you pretty much take the web 827 00:48:21,980 --> 00:48:24,060 and put it in a box. 828 00:48:24,060 --> 00:48:26,930 And this is what IBM did. 829 00:48:26,930 --> 00:48:29,870 They took every interesting repository, 830 00:48:29,870 --> 00:48:32,810 every database, every encyclopedia, every newspaper 831 00:48:32,810 --> 00:48:37,520 collection, I forget whether blogs existed at the time, 832 00:48:37,520 --> 00:48:40,820 and just had a lot of clusters and a lot 833 00:48:40,820 --> 00:48:42,440 of memory, and everything was there. 834 00:48:42,440 --> 00:48:46,040 So now they could tell the company, no web. 835 00:48:46,040 --> 00:48:48,154 We are smart without the web. 836 00:48:48,154 --> 00:48:49,320 So that was the first thing. 837 00:48:49,320 --> 00:48:53,420 Then there were some wonderful natural language 838 00:48:53,420 --> 00:48:57,090 processing people, so they did question answering, 839 00:48:57,090 --> 00:48:58,670 they searched the documents.
840 00:48:58,670 --> 00:49:03,410 So what they really did, they took the clue, as they call 841 00:49:03,410 --> 00:49:08,300 the question, threw it not on the web but on their web, 842 00:49:08,300 --> 00:49:12,080 found tens of thousands of documents 843 00:49:12,080 --> 00:49:17,680 that even loosely match these keywords, 844 00:49:17,680 --> 00:49:19,910 and then the real work just started. 845 00:49:19,910 --> 00:49:22,790 They had this kind of filtering, that kind of filtering, 846 00:49:22,790 --> 00:49:24,870 this kind of answer generation, that kind. 847 00:49:24,870 --> 00:49:28,010 They would score it, they would weigh new evidence, 848 00:49:28,010 --> 00:49:30,200 they would do it again, they would do ranking, 849 00:49:30,200 --> 00:49:32,930 and they would decide how confident they are about that, 850 00:49:32,930 --> 00:49:35,360 and then they would decide whether it's worth it. 851 00:49:35,360 --> 00:49:37,430 They spent an incredible amount of time 852 00:49:37,430 --> 00:49:40,820 figuring out how much money to wager. 853 00:49:40,820 --> 00:49:43,040 I don't actually know much about Jeopardy!, 854 00:49:43,040 --> 00:49:47,030 but apparently you have to tell them how good your answer is. 855 00:49:47,030 --> 00:49:50,780 Well, you need to come up with a number: how expensive you 856 00:49:50,780 --> 00:49:53,060 are, how much you will make if you win, 857 00:49:53,060 --> 00:49:57,240 and how much you lose if you lose. 858 00:49:57,240 --> 00:50:07,040 And so they did it all and they built a wonderful system. 859 00:50:07,040 --> 00:50:09,740 In the beginning, when I started going there, 860 00:50:09,740 --> 00:50:11,450 it was very, very slow. 861 00:50:11,450 --> 00:50:16,520 It ran on a single processor and really took two hours 862 00:50:16,520 --> 00:50:19,870 to go through this pipeline. 863 00:50:19,870 --> 00:50:23,210 But that, if you think about it, is a very parallelizable problem.
864 00:50:23,210 --> 00:50:25,760 You could send it all out, and in that case, 865 00:50:25,760 --> 00:50:30,610 I think by the end it was several thousand cores, 866 00:50:30,610 --> 00:50:33,260 and they easily reduced the time to three seconds, 867 00:50:33,260 --> 00:50:39,460 which was passable and doable for the competition. 868 00:50:39,460 --> 00:50:43,100 And so they won, as you all know. 869 00:50:43,100 --> 00:50:46,100 It's a great system that nails the state 870 00:50:46,100 --> 00:50:50,240 of the art in natural language, in QA, in information 871 00:50:50,240 --> 00:50:52,790 retrieval, in machine learning. 872 00:50:52,790 --> 00:50:56,020 It's a great piece of engineering. 873 00:50:56,020 --> 00:51:00,946 It reignited, no doubt about it, public interest in AI. 874 00:51:00,946 --> 00:51:04,950 It brought new talented people into our field. 875 00:51:04,950 --> 00:51:06,300 So this is all great news. 876 00:51:09,080 --> 00:51:13,880 But let's look at some of the blunders 877 00:51:13,880 --> 00:51:19,970 that occurred, both before the competition and after. 878 00:51:19,970 --> 00:51:22,040 I have a whole collection of those. 879 00:51:22,040 --> 00:51:23,870 I'll just show you a couple. 880 00:51:23,870 --> 00:51:25,850 This one actually happened before the competition, 881 00:51:25,850 --> 00:51:27,510 so they were able to fix the problem. 882 00:51:27,510 --> 00:51:29,600 So the question, again, it's called 883 00:51:29,600 --> 00:51:33,160 a clue, in the category of letters, was: in the late 40s, 884 00:51:33,160 --> 00:51:39,230 a mother wrote to this artist that his picture, number nine, 885 00:51:39,230 --> 00:51:41,960 looked like her son's finger paintings. 886 00:51:41,960 --> 00:51:44,840 Well, for those who are quick at this, 887 00:51:44,840 --> 00:51:47,380 I'm sure you know that it's Jackson Pollock, 888 00:51:47,380 --> 00:51:51,090 but Watson answered Rembrandt, for some stupid reasons.
889 00:51:51,090 --> 00:51:54,760 It failed to recognize that "late 40s" referred to the 1940s, 890 00:51:54,760 --> 00:51:58,700 or rather it thought the picture was made in a previous century, 891 00:51:58,700 --> 00:52:06,820 and apparently the number nine 892 00:52:06,820 --> 00:52:10,010 appeared in a bunch of documents related to Rembrandt, 893 00:52:10,010 --> 00:52:12,900 and so it said Rembrandt. 894 00:52:12,900 --> 00:52:16,850 Another, more famous blunder, because it happened 895 00:52:16,850 --> 00:52:20,540 at the competition: the category was US cities, 896 00:52:20,540 --> 00:52:23,540 and the clue was: its 897 00:52:23,540 --> 00:52:27,860 largest airport is named for a World War II hero, 898 00:52:27,860 --> 00:52:32,210 and its second largest for a World War II battle. 899 00:52:32,210 --> 00:52:34,430 And again, those of you quick at this 900 00:52:34,430 --> 00:52:38,060 will know that in Chicago there is O'Hare Airport, 901 00:52:38,060 --> 00:52:41,690 named for a famous World War II hero. 902 00:52:41,690 --> 00:52:43,880 And the second airport is called Midway, 903 00:52:43,880 --> 00:52:46,520 after a famous battle in the Second World War. 904 00:52:46,520 --> 00:52:50,230 And Watson presses the button and says Toronto. 905 00:52:50,230 --> 00:52:53,540 And there's a sort of gasp in the audience, 906 00:52:53,540 --> 00:52:56,270 and in front of tens or hundreds of millions 907 00:52:56,270 --> 00:52:59,757 of television sets around the world. 908 00:52:59,757 --> 00:53:01,465 And again, there are some stupid reasons.
909 00:53:04,030 --> 00:53:06,430 Watson did machine learning, as I said, 910 00:53:06,430 --> 00:53:10,220 and it statistically figured out that the category part 911 00:53:10,220 --> 00:53:14,240 of the clue, in this case US cities, 912 00:53:14,240 --> 00:53:17,210 might not be that important, so it should 913 00:53:17,210 --> 00:53:18,530 pay less attention to it. 914 00:53:18,530 --> 00:53:22,460 It also knew that Toronto has a team playing in baseball-- 915 00:53:22,460 --> 00:53:23,100 is that true? 916 00:53:23,100 --> 00:53:23,900 Yes. 917 00:53:23,900 --> 00:53:27,300 In the US baseball league. And it knew that, in fact, 918 00:53:27,300 --> 00:53:30,320 one of Toronto's airports is named for a hero, 919 00:53:30,320 --> 00:53:32,600 although it's a World War I hero. 920 00:53:32,600 --> 00:53:36,350 So it put it all together and said Toronto. 921 00:53:36,350 --> 00:53:41,420 In any case, it won anyway because it did an amazing job 922 00:53:41,420 --> 00:53:43,700 of answering many more questions, 923 00:53:43,700 --> 00:53:46,970 but the question for us is whether this is what 924 00:53:46,970 --> 00:53:48,350 we should all be striving for. 925 00:53:48,350 --> 00:53:51,470 I'm certainly all in favor of building awesome systems, 926 00:53:51,470 --> 00:53:55,430 and they did, and I explained to you why I think it's good, 927 00:53:55,430 --> 00:54:02,300 but IBM has not created a machine that thinks like us. 928 00:54:02,300 --> 00:54:06,110 And Watson's success didn't bring us 929 00:54:06,110 --> 00:54:09,620 even an inch closer to understanding 930 00:54:09,620 --> 00:54:12,720 human intelligence. 931 00:54:12,720 --> 00:54:15,930 And the positive news, of course, 932 00:54:15,930 --> 00:54:20,360 is that those blunders should remind us 933 00:54:20,360 --> 00:54:22,370 that the problem is waiting to be solved, 934 00:54:22,370 --> 00:54:27,480 and you guys are in a good position to try to do that.
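The failure mode he describes, a learned weighting that discounts the category and lets weaker evidence dominate, can be illustrated with a toy linear evidence scorer. All the features, scores, and weights below are invented for this sketch; they are not Watson's actual features or model.

```python
# Toy illustration of the failure mode described above: when a learned
# weighting discounts the category, weaker evidence can dominate.
# All features and weights are invented, not Watson's actual model.

# Hypothetical evidence for each candidate. Note the hero-airport
# feature does not capture WHICH war the hero fought in.
features = {
    "Chicago": {"category_match": 1.0, "hero_airport": 1.0,
                "battle_airport": 1.0, "doc_cooccurrence": 0.3},
    "Toronto": {"category_match": 0.0, "hero_airport": 1.0,
                "doc_cooccurrence": 0.9},
}

def best(weights):
    """Return the candidate with the highest weighted evidence score."""
    def score(cand):
        return sum(weights.get(f, 0.0) * v for f, v in features[cand].items())
    return max(features, key=score)

# Training decided the category is often unreliable, so its weight is tiny:
learned = {"category_match": 0.1, "hero_airport": 0.4,
           "battle_airport": 0.4, "doc_cooccurrence": 1.0}
print(best(learned))   # prints: Toronto  (1.3 beats Chicago's 1.2)

# Had the category carried real weight, Chicago would have won:
trusting = dict(learned, category_match=1.0)
print(best(trusting))  # prints: Chicago  (2.1 beats Toronto's 1.3)
```

The point of the sketch is that no single feature is "wrong": each is a reasonable statistical signal, yet their weighted combination confidently produces an answer that violates an obvious constraint a human would never ignore.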
935 00:54:27,480 --> 00:54:31,720 And that should be our next big challenge.