The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high-quality educational resources for free. To make a donation or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

BORIS KATZ: When a scientist approaches a complex phenomenon, though, or when an engineer looks at a difficult problem, they usually break it up into pieces and try to understand each piece and solve it separately. Understanding intelligence is one such very, very complex, in fact extraordinarily complex, problem, and over the years this divide-and-conquer approach has produced a number of very successful fields like computer vision, natural language processing, cognitive science, neuroscience, machine vision, and so on. But we need to remember that most cognitive tasks that humans perform actually go across modalities; that is, they span these established fields.
And the goal of our thrust is to bring together techniques from all these fields and create new models for solving intelligence tasks; we would also like to understand how these tasks operate in the brain.

So I will start with one task, which is scene recognition. What does scene recognition involve? Well, in order to recognize a scene, a machine needs to do some type of verification: is that a street lamp? It needs to do detection, for example, detecting the people in the scene. It needs to do identification: is this particular building the Potala Palace, somewhere in Tibet? It needs to do object categorization: look at this image and tell me where the mountains, trees, buildings, street lamps, vendors, people, and so forth are. We should also be able to recognize activity: what is this person doing? Or what are these two guys doing here?

Well, currently our machines are pretty bad at all of these tasks. I understand that there has been quite a lot of progress made recently in machine learning. And I've also seen some claims that machines perform better than humans in some visual tasks.
However, I think we should take these claims with a grain of salt. First, there is nothing amazing about machines doing certain things better than humans. People have been building such machines for millennia. Humans needed tools to build the pyramids. They built tools to carry heavy things, to lift them, to go faster, and so on. Well, you'll tell me, no, no, no, we are talking about intelligent tasks. Well, for $5 you can buy a calculator that multiplies numbers much better than you do. For $100, you can build a gadget that has a huge lookup table and plays chess much better than any of you.

So when you hear that a computer can distinguish between 20 breeds of dogs, or something like that, better than you do, I don't think you should assume that the vision problem is solved.

Well, understand, I'm not saying that because we have a dramatically better solution. Not at all. My point is that the problems of real visual understanding and real language understanding are extraordinarily hard, and we need to be patient, try to understand why, and eventually find better solutions. So, back to the visual understanding problem.
So, as I said, machines are bad at these things. But humans are absolutely awesome. You have absolutely no trouble doing verification, detection, identification, categorization. And you can do much more than that. You can recognize spatial and temporal relationships between objects. You can do event recognition. You can explain things. You can look at the image that I showed you and tell me what past events caused the scene to look the way it does. You can look at that scene and say what future events might occur in it. You can fill gaps. You can hallucinate things. You could look at a scene that you've barely seen and tell me what objects not shown in it might be present there, and what events not visible in the scene could have occurred.

So why are machines falling short? Well, in part, our visual system is tuned to process structures typically found in the world, but our machines have no such tuning. They don't know enough about the world, and they don't know what structures and events make sense and typically happen in the world.
So I will show you a blurry video. And I wonder whether some of you who didn't see it before could figure out what's going on.

Who has not seen this video? Could you tell me what you saw?

AUDIENCE: A person was talking on the phone, then switched to working on his computer.

BORIS KATZ: Right. Well, this is amazing. Even with almost no pixels there, you still recognize what's going on, because you know what typically happens in the world. Well, of course, it was sort of a joke, and people sometimes make mistakes, too. So here's the unblurred video.

[AUDIENCE LAUGHTER]

Well, but, all jokes aside.

AUDIENCE: [INAUDIBLE]

[AUDIENCE LAUGHTER]

BORIS KATZ: Jokes aside, though, it would have been extraordinary if our machines could make mistakes like this. Not even close.

Well, so we may want to ask ourselves some questions. How is this knowledge that you seem to have obtained? And how can we pass this knowledge to our computers? How can we determine whether this computer knowledge is correct?
Our partial answer is: using language. And I bolded the word partial here because clearly there are many other ways humans obtain knowledge about the world, but today I will be talking about language, and I will show you what is needed to give knowledge to the machine.

So we have a proposal. We would like to create a knowledge base that contains descriptions of objects, their properties, and the relations between them as they are typically found in the world, and we want to make this knowledge base available to a scene-recognition system. And to test the performance of the system, we will ask natural language questions.

One of us here decided to see what questions people actually ask, and he set up an Amazon Mechanical Turk experiment where he showed people hundreds of images and asked them to write down, to generate, questions about these images. And I will show you a couple. Here's a scene, and here are the questions that people ask: you know, how many men are in the picture? What's in the cart? What does the number on the sign say? Is there any luggage?
What is the color of the shirt of some lady? Another example. Who is winning, yellow or red? Well, the answer is, of course, red, but how did you do that? Well, you need to know that this is a sporting event, which all of you do. That it involves, in this particular sporting event, winners and losers, which you also know. You need to know that this sort of shorthand, yellow and red, means the people wearing those colors rather than the colors themselves. You need to know to pay no attention to people wearing red, or maybe blue, in the audience. And you also need to know that a participant on the floor is likely a loser, not a winner. That's a lot of knowledge.

So, back to our proposal. We want to try to give the machine at least some of that knowledge using language, and for that, of course, we need tools. And over the years, we've built a system called START, which, in fact, contains some tools that could be helpful for this task. And I will be happy to share the API with you so that you could use the system and maybe try to see what to do with the knowledge that you give it.
So there are only three tools on this slide. One is going from language to structure. To provide machines with knowledge, we give the machine a bunch of sentences, texts, paragraphs, and those will be converted into some kind of semantic representation. And I will show some details of that. We also want to go in the other direction. We want a machine to explain what it does or describe its knowledge using language. So we have a generator that does that, that goes from semantic representation to language. And we want to test the machine, because it's very important to know whether what you taught the machine is what it actually understood. We want to ask questions, give queries. Those will be converted into semantic representations that will be matched against what the machine knows. And the computer will either give you a language response or perform some actions, which will indicate that it understood what you asked.
So we will go through these tools, and I will describe the START system to you in detail, but I just want you to remember that this is a disciplined engineering enterprise, and these are the tools that I want to give you and other people so that you could start thinking more deeply about human abilities and modalities, like vision and language and others.

Here are some of the building blocks of the START system. We need to parse language, we need to come up with a semantic representation, and we need to generate, match, reply, and so forth. So let's go through that very quickly.

Most of you, somewhere in middle school, learned about parse trees. Linguists love them. They're beautiful. This is an example of a sentence from Tom Sawyer: Tom greeted his aunt, who was sitting by an open window in a pleasant rearward apartment. And linguists like to argue about the exact representations, but pretty much all of them will agree that something like this represents the structure, the syntactic structure, of the sentence.
Well, parse trees are beautiful and nice, but they're really horrible if you want to store them, if you want to match them, if you want to retrieve from them. And so we use the information found in parse trees, but we developed a different representation, which we call the ternary expression representation, which is a more semantic representation of language. It is syntax-driven, but it highlights semantic relations which humans find important. And because we give this knowledge to computers, we made it very efficient for indexing, for matching, and for retrieval. It's also reversible, and I'll explain in a second what I mean by that. We implemented it as a nested set of subject-relation-object tuples.

So here is a different sentence. Say you have an image. You may recognize one of the characters there in the back; it's Andrei Barbu. And say you want to describe, in language, what you see here. And you say something like: the person who picked up the yellow lemon placed it in the bowl on the table.
Using this subject-relation-object structure, and first parsing the sentence, you could create this ternary expression. And you can see: the person picked up the lemon; that same person placed that lemon in the bowl on the table; and that lemon happens to be yellow. To make it a little bit easier for you to see what's going on, and convenient for both humans and machines, we created a sort of topologically equivalent linearization of that knowledge graph, as a set of triples. They are a little bit misleading here, just due to their simplicity, because all the words here, of course, need to have an index: if you have, say, a tall person and a short person, you will have to distinguish between them. So you need indices, but for simplicity I didn't show them here. The verbs also have indices, so that when the word 'placed' appears again as part of the relation 'in the bowl', you know it refers to the same event: the person placed the lemon in the bowl.

So this is all about representation, and we distinguish at least three types of ternary expressions.
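The linearized triple representation just described can be sketched in code. This is a minimal illustration; the triple spellings, the "+1" index notation, and the query helper are my own assumptions, not START's actual syntax:

```python
# A minimal sketch of the ternary (subject, relation, object) representation
# described in the lecture. Relation names and "+n" indices are illustrative.

# "The person who picked up the yellow lemon placed it in the bowl on the table."
# Each word carries an index so two mentions of "person" stay distinct, and
# verb occurrences are indexed so other triples can attach to the right event.
triples = [
    ("person+1", "pick-up+1", "lemon+1"),      # the person picked up the lemon
    ("person+1", "place+1",   "lemon+1"),      # that same person placed that lemon
    ("place+1",  "in",        "bowl+1"),       # the placing happened in the bowl
    ("bowl+1",   "on",        "table+1"),      # the bowl is on the table
    ("lemon+1",  "has-property", "yellow"),    # that lemon happens to be yellow
]

def query(subject=None, relation=None, obj=None):
    """Return all triples matching the given (possibly wildcard) pattern."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)]

# What did person+1 do to lemon+1?
print(query(subject="person+1", obj="lemon+1"))
# -> [('person+1', 'pick-up+1', 'lemon+1'), ('person+1', 'place+1', 'lemon+1')]
```

Indexing the verb occurrence ("place+1") is what lets the "in the bowl" triple attach to the placing event rather than to the lemon or the person.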
The first type you see here is the syntactic structure of the sentence. We also have syntactic features: the fact that the sitting was in the past tense, and moreover in the progressive, 'was sitting'; and also what kind of article things have, for example, the window had an article, an indefinite one. And there are also lexical features that don't change from sentence to sentence, such as the fact that Tom is a proper noun, and so forth.

Well, I told you that our representation is reversible. We need to be able to teach machines to talk to us, and there are many reasons to do that. Some of them are shown on this slide. You want your robot or your computer to explain what it does, possibly remotely. You want your machine or your robot to answer questions which are complex, and the robot may want to ask you for clarification. You want to keep track of conversation history and state, engage in mixed-initiative dialogue, offer related information. All these things need to happen in dialogue, and therefore your computer must be able to speak to you in a language that you understand.
In fact, I find that the biggest problem with the learning systems that we have today is that some of them can work quite robustly and sometimes give you good results, but you have no idea why. You press a button, you say, aha, here's the number, and you put it in the paper. More recently, people have started looking at why a system does what it does, but, again, it's done with numbers. It would be really wonderful if our learning systems could tell us why they came up with their conclusions. So we need language, and so we built a START generator that goes from those same expressions and creates natural language.

This is why we call this representation reversible. Given a set of ternary expressions, the machine will create a sentence: the person who picked up the yellow lemon placed it in the bowl on the table. But, of course, this is a little bit silly, just parroting the same sentence back. You want the machine, for example, as I said, to ask you a question, or to indicate a negative statement, or to rephrase things from different pieces that it knows about. So, in fact, our generator is very flexible.
So here is an example where, say, by observing the world, the robot adds more information to this representation, indicated here in blue. And now, from the original sentence, which was 'the person who picked up the yellow lemon placed it in the bowl', by just adding a couple of new relations, the generator will be able to ask the human a question: will the person who placed the yellow lemon in the bowl on the table pick it up soon? And so forth.

All right, so we have talked about parse trees, about semantic representation, about generation. So what do we do with all that? Suppose that you gave some knowledge to your machine, here that Tom Sawyer assertion, and somebody asked, was anyone sitting by an open window? Well, what needs to happen is that this question gets converted into a ternary representation, as I indicated, and this knowledge base, we assume, holds the knowledge from the original assertion, plus a million other assertions, of course. So we need to match the representation of the query against the knowledge base, and the machine will say, aha, here is the match.
Well, it's very simple here: window needs to match window, open needs to match open, sit needs to match sit, and then the word 'anyone', which is a sort of wildcard word, needs to match 'aunt'. But in reality, of course, people ask questions that do not follow so closely what the machine knows, and so our matcher needs to be much more sophisticated. This is just the graphical, sort of knowledge-graph, representation of that match.

So START distinguishes several kinds of term matching. As I showed, it could be a lexical match. Of course, it knows synonymy. It knows hyponymy, which goes one way: a car is a vehicle. And as you can imagine, the match also needs to go one way. If I say I bought a car, it also means that I bought a vehicle. But if I say I bought a vehicle, it's not necessarily true that I bought a car, because I may have bought a truck. But that aside, it's pretty easy to do matching on the level of terms, of words. A much more complex problem is to match on the level of structure. And I will show you some examples of this problem.
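The one-way behavior of hyponymy matching can be sketched as follows. The tiny taxonomy and the function name are hypothetical; START's real matcher is far richer than this:

```python
# A minimal sketch of one-way term matching over a hyponym hierarchy,
# as described in the lecture. The taxonomy below is an illustrative toy.

# child -> parent ("a car is a vehicle")
HYPERNYMS = {"car": "vehicle", "truck": "vehicle", "vehicle": "thing",
             "aunt": "person", "person": "thing"}

def term_matches(asserted, queried):
    """True if an assertion about `asserted` answers a query about `queried`.

    The match is deliberately one-way: "I bought a car" supports a query
    about buying a vehicle, but "I bought a vehicle" does not support a
    query about buying a car, since it might have been a truck.
    """
    term = asserted
    while term is not None:
        if term == queried:
            return True
        term = HYPERNYMS.get(term)  # climb to the next hypernym, if any
    return False

print(term_matches("car", "vehicle"))   # assertion more specific: True
print(term_matches("vehicle", "car"))   # assertion more general: False
```

The same machinery lets the asserted 'aunt' satisfy a query about 'anyone', treated here as matching the general term 'person'.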
But by now, you must have figured out that I love to stare at English sentences. I hope you do too, a little bit. If not, please try. So here, let's consider a couple of verbs. Here is the verb 'surprise'. Consider these two sentences: the patient surprised the doctor with his fast recovery; the patient's fast recovery surprised the doctor. Now, for you, who are used to understanding language so quickly, it's even hard to hear what's different about the sentences. But if you actually do the parsing, you will see that the parse trees are dramatically different, and therefore our ternary representations will be very different. So we need to find a way to tell the machine: yes, that's the same thing. Linguists call these things syntactic alternations.

A different verb, like 'load', shows a different alternation: the crane loaded the ship with containers, or the crane loaded containers onto the ship. Again, they mean pretty much the same thing, but the surface representation is different. The next one is in the form of a question.
Did Iran provide Syria with weapons, or did Iran provide weapons to Syria? Let's see if this works for every verb in the universe. So let's try to take, say, the 'surprise' alternation and use it with the verb 'load', or the other way around. I hope you're bearing with me. Linguists put stars in front of bad sentences. So here I tried to use the verb 'surprise' with the alternation that 'load' allows: you use the same 'onto', and it says, the patient surprised fast recovery onto the doctor. It makes absolutely no sense. Here I tried to apply the 'surprise' alternation to the verb 'load', and it says, the crane's containers loaded the ship. Again, complete gibberish. And the same below: did Iran's weapons provide Syria? So it looks like a really horrible story. Every English verb, it seems, has its own way of expressing these alternations. But fortunately, this is not the case. Let's go back to the verb 'surprise' and look at verbs similar to it. You could use the same alternation with the verb 'confuse'.
431 00:24:46,700 --> 00:24:49,070 You can say the patient confused the doctor 432 00:24:49,070 --> 00:24:51,170 with his slow recovery, which will convert 433 00:24:51,170 --> 00:24:54,020 into the patient's slow recovery confused the doctor. 434 00:24:54,020 --> 00:24:55,700 You can say the same thing with anger, 435 00:24:55,700 --> 00:25:00,730 disappoint, embarrass, frighten, impress, please, threaten. 436 00:25:00,730 --> 00:25:03,930 And what is really amazing and very interesting about it 437 00:25:03,930 --> 00:25:10,210 is that this syntactic alternation works 438 00:25:10,210 --> 00:25:13,720 the same way for verbs of the same meaning, 439 00:25:13,720 --> 00:25:15,520 of the same semantic class. 440 00:25:15,520 --> 00:25:19,330 And this particular class is called the emotional reaction 441 00:25:19,330 --> 00:25:20,220 verbs. 442 00:25:20,220 --> 00:25:24,190 And it's a large semantic class of about 300 verbs, 443 00:25:24,190 --> 00:25:26,470 and they all behave identically from the point 444 00:25:26,470 --> 00:25:28,300 of view of these alternations. 445 00:25:28,300 --> 00:25:31,084 And it's true for all the other alternations that I showed you. 446 00:25:31,084 --> 00:25:32,500 So that, of course, is good news 447 00:25:32,500 --> 00:25:35,590 because it makes an interesting connection between syntax 448 00:25:35,590 --> 00:25:40,240 and semantics, but it also allows 449 00:25:40,240 --> 00:25:44,980 us to build lexicons that are more compact and easier 450 00:25:44,980 --> 00:25:46,030 to deal with. 451 00:25:46,030 --> 00:25:50,560 And one can imagine creating this verb class membership 452 00:25:50,560 --> 00:25:53,950 automatically by looking at a large corpus. 453 00:25:53,950 --> 00:25:57,450 And this is how, presumably, children 454 00:25:57,450 --> 00:26:01,500 learn these verb classes and these alternations.
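The verb-class idea above can be sketched in a few lines of Python. This is only an illustration, not the actual START code: the semantic class is a hand-written set, and the input frames stand in for whatever a real parser would produce.

```python
# A minimal sketch of how one shared semantic class lets a single
# rule normalize the alternation for a whole family of verbs.
# We assume the parser has already produced a flat frame like
# {"subject": ..., "verb": ..., "object": ..., "with": ...}.

EMOTIONAL_REACTION = {"surprise", "confuse", "anger",
                      "disappoint", "embarrass", "frighten",
                      "impress", "please", "threaten"}

def normalize(frame):
    """Map both surface forms of the alternation onto one
    canonical ternary: (stimulus, verb, experiencer)."""
    verb = frame["verb"]
    if verb in EMOTIONAL_REACTION and frame.get("with"):
        # "The patient surprised the doctor with his fast recovery"
        # -> the with-phrase (the recovery) is the real stimulus.
        return (frame["with"], verb, frame["object"])
    # "The patient's fast recovery surprised the doctor"
    return (frame["subject"], verb, frame["object"])

a = normalize({"subject": "patient", "verb": "surprise",
               "object": "doctor", "with": "fast recovery"})
b = normalize({"subject": "fast recovery", "verb": "surprise",
               "object": "doctor", "with": None})
assert a == b  # both surface forms match the same canonical triple
```

Because membership in `EMOTIONAL_REACTION` drives the rule, adding a new verb of the same class costs one set entry, which is the compactness benefit mentioned above.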
455 00:26:01,500 --> 00:26:04,240 All right, so now that you know how to match, 456 00:26:04,240 --> 00:26:05,897 and it's not just a trivial match, 457 00:26:05,897 --> 00:26:07,980 like I showed here, but a more sophisticated match 458 00:26:07,980 --> 00:26:11,590 on the level of structure as well, 459 00:26:11,590 --> 00:26:13,850 let's see what we can do after the match happens. 460 00:26:13,850 --> 00:26:16,720 So here's the same sentence and the same question. 461 00:26:16,720 --> 00:26:19,030 Was anybody sitting by an open window? 462 00:26:19,030 --> 00:26:22,810 We retrieved the structure and then 463 00:26:22,810 --> 00:26:26,980 we could tell our generator, go and generate the sentence, 464 00:26:26,980 --> 00:26:28,630 and it will do that. 465 00:26:28,630 --> 00:26:31,150 Tom's aunt was sitting by an open window 466 00:26:31,150 --> 00:26:34,990 in a pleasant rearward apartment. 467 00:26:34,990 --> 00:26:36,250 Well, it's not that interesting. 468 00:26:36,250 --> 00:26:39,910 It's sort of parroting: I tell it ABC, 469 00:26:39,910 --> 00:26:44,392 and it tells me back ABC if I ask, 470 00:26:44,392 --> 00:26:47,489 who B'd C, or something like that. 471 00:26:47,489 --> 00:26:49,530 If we want to build a question-answering system, 472 00:26:49,530 --> 00:26:52,540 we want it to be able to, in response to a question, 473 00:26:52,540 --> 00:26:56,590 understand it, go somewhere, find the right answer, 474 00:26:56,590 --> 00:26:59,020 and give it back to you. 475 00:26:59,020 --> 00:27:04,030 And we built that, and we do it in a general way: 476 00:27:04,030 --> 00:27:08,260 our system can execute a procedure in response 477 00:27:08,260 --> 00:27:13,040 to a match to obtain the answer from the data source.
478 00:27:13,040 --> 00:27:15,820 So an example is here and I can show you some screenshots 479 00:27:15,820 --> 00:27:19,070 or, in fact, if you like, we can play with the system live 480 00:27:19,070 --> 00:27:20,590 and you'll see what it does. 481 00:27:20,590 --> 00:27:23,160 So it executes a procedure to obtain an answer from the data 482 00:27:23,160 --> 00:27:24,260 source. 483 00:27:24,260 --> 00:27:28,605 If you say who directed Gone With the Wind, a match 484 00:27:28,605 --> 00:27:32,320 will happen between what you ask 485 00:27:32,320 --> 00:27:36,010 and what the system knows, some script will get executed, 486 00:27:36,010 --> 00:27:38,830 and the machine will go to some data source, find the answer, 487 00:27:38,830 --> 00:27:40,390 and give it back to you. 488 00:27:40,390 --> 00:27:41,380 So how is this done? 489 00:27:41,380 --> 00:27:47,180 Well, in order to explain that, I need two more ideas. 490 00:27:47,180 --> 00:27:49,850 One is the natural language annotation idea. 491 00:27:49,850 --> 00:27:53,830 So annotations are sentences and phrases 492 00:27:53,830 --> 00:27:56,626 that describe the content of retrievable information 493 00:27:56,626 --> 00:27:58,750 segments. This graphic, in a sort of cute way, 494 00:27:58,750 --> 00:28:02,110 shows these sentence-level, or phrase-level, labels 495 00:28:02,110 --> 00:28:06,700 on some data, and they describe the retrievable information 496 00:28:06,700 --> 00:28:07,210 segments. 497 00:28:07,210 --> 00:28:10,780 Annotations are then matched against submitted queries, 498 00:28:10,780 --> 00:28:13,360 and a successful match results in 499 00:28:13,360 --> 00:28:17,860 either retrieval of that information or some procedure 500 00:28:17,860 --> 00:28:20,901 to retrieve that information.
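The annotation idea can be sketched as a toy program. This is an invented illustration, not START's implementation: real annotations are parsed structures, not regular expressions, and the movie table is a stand-in data source.

```python
# A toy sketch of the natural-language-annotation idea: each
# retrievable segment (or procedure) is labeled with a language
# pattern, and an incoming question that matches the pattern
# triggers either retrieval or execution of the procedure.
import re

# Stand-in data source.
MOVIES = {"gone with the wind": "Victor Fleming"}

# Each annotation pairs a pattern with a procedure to run on match.
ANNOTATIONS = [
    (re.compile(r"who directed (?P<title>.+)", re.IGNORECASE),
     lambda m: MOVIES[m.group("title").lower()]),
]

def answer(question):
    for pattern, procedure in ANNOTATIONS:
        m = pattern.match(question)
        if m:
            return procedure(m)  # successful match -> run the procedure
    return None  # no annotation matched

print(answer("Who directed Gone with the Wind"))
```

The point is the indirection: the pattern describes what a segment can answer, while the procedure knows how to fetch it, so adding a new data source only adds annotations.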
501 00:28:20,901 --> 00:28:25,210 And a special case of this procedure 502 00:28:25,210 --> 00:28:30,490 is done using our object-property-value data 503 00:28:30,490 --> 00:28:32,800 model. 504 00:28:32,800 --> 00:28:36,760 This technique can connect language 505 00:28:36,760 --> 00:28:42,160 to arbitrary procedures, but as I said, 506 00:28:42,160 --> 00:28:47,260 let's consider the many semi-structured information 507 00:28:47,260 --> 00:28:49,540 sources available on the web that 508 00:28:49,540 --> 00:28:54,820 can be modeled using this object-property-value model. 509 00:28:54,820 --> 00:28:59,230 Well, what kind of repositories 510 00:28:59,230 --> 00:29:00,130 have this property? 511 00:29:00,130 --> 00:29:01,840 If you think about it, almost anything 512 00:29:01,840 --> 00:29:05,330 that humans create on the web, which is semi-structured, 513 00:29:05,330 --> 00:29:06,190 is like that. 514 00:29:06,190 --> 00:29:09,580 If you have a site that has a bunch of countries 515 00:29:09,580 --> 00:29:12,490 with properties like populations, areas, 516 00:29:12,490 --> 00:29:17,290 capitals, birthrates, and so forth, the country is 517 00:29:17,290 --> 00:29:20,140 an object, the word population is a property, 518 00:29:20,140 --> 00:29:24,250 and the value is the actual value of that property. 519 00:29:24,250 --> 00:29:26,680 You can have people with their birth dates, 520 00:29:26,680 --> 00:29:30,560 you can have cities with maps and elevations and so forth. 521 00:29:30,560 --> 00:29:32,980 So in a sense, this object-property-value model 522 00:29:32,980 --> 00:29:38,500 makes it possible to view and use large segments of the web 523 00:29:38,500 --> 00:29:40,770 as a database. 524 00:29:40,770 --> 00:29:44,120 And schematically, here's how START uses this model.
525 00:29:44,120 --> 00:29:46,480 A user asks the language part of the system 526 00:29:46,480 --> 00:29:49,820 a question; the system needs to understand the question, 527 00:29:49,820 --> 00:29:53,530 understand where the answer might be found, 528 00:29:53,530 --> 00:29:55,810 and what the object and property 529 00:29:55,810 --> 00:29:57,970 implicit in the question are. 530 00:29:57,970 --> 00:29:59,700 After START does that, it 531 00:29:59,700 --> 00:30:03,790 has a friend called Omnibase, and it says, go get it. 532 00:30:03,790 --> 00:30:05,500 Go to this site. 533 00:30:05,500 --> 00:30:11,700 Go for that symbol called France and go get the population. 534 00:30:11,700 --> 00:30:14,580 And it will go to some world factbook 535 00:30:14,580 --> 00:30:18,580 and get the population. 536 00:30:18,580 --> 00:30:22,740 So this is how the system works, and here's 537 00:30:22,740 --> 00:30:24,440 an example of such a question. 538 00:30:24,440 --> 00:30:26,640 Here, the question, it's a screenshot, 539 00:30:26,640 --> 00:30:29,276 is, does Russia border on Moldova? 540 00:30:29,276 --> 00:30:33,180 The system says, aha, you want to find out 541 00:30:33,180 --> 00:30:35,880 what countries border Moldova, and find out 542 00:30:35,880 --> 00:30:38,940 whether Russia is among them. 543 00:30:38,940 --> 00:30:43,260 And then it actually checks that and it tells you, no, Russia 544 00:30:43,260 --> 00:30:45,240 does not border Moldova, because it doesn't 545 00:30:45,240 --> 00:30:48,690 find Russia in this response. 546 00:30:48,690 --> 00:30:51,920 And just for comparison, if you ask the same question 547 00:30:51,920 --> 00:30:55,430 of a search engine, it will give you 548 00:30:55,430 --> 00:31:01,020 24 million results, today maybe 240 million results, 549 00:31:01,020 --> 00:31:02,940 and none of them really answers the question.
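The object-property-value view described above can be sketched with a plain dictionary. The data here is a toy stand-in for a scraped site; the real Omnibase wraps live semi-structured pages behind the same kind of lookup interface.

```python
# A sketch of the object-property-value model: a semi-structured
# source becomes a function from (object, property) to value.
# The numbers and lists below are illustrative stand-in data.
FACTBOOK = {
    ("France", "population"): 67_000_000,            # illustrative
    ("Moldova", "borders"): ["Romania", "Ukraine"],
}

def get(obj, prop):
    """Omnibase-style lookup: (object, property) -> value."""
    return FACTBOOK[(obj, prop)]

# "What is the population of France?" -> one lookup.
pop = get("France", "population")

# "Does Russia border on Moldova?" -> a membership test
# over the value of a single property.
assert "Russia" not in get("Moldova", "borders")
```

Once every source exposes this `get(object, property)` shape, the language side only has to recover the object and property implicit in the question, which is exactly the division of labor between START and Omnibase.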
550 00:31:09,000 --> 00:31:13,130 Well, I just want to tell you that the ability to understand 551 00:31:13,130 --> 00:31:15,020 something really helps. 552 00:31:15,020 --> 00:31:17,210 In this case, the ability to understand language 553 00:31:17,210 --> 00:31:19,880 gives you a lot of power. 554 00:31:19,880 --> 00:31:22,890 You can do a lot by searching through keywords, 555 00:31:22,890 --> 00:31:24,690 and you can retrieve a lot of documents, 556 00:31:24,690 --> 00:31:30,090 and this is how pretty much all modern systems work. 557 00:31:30,090 --> 00:31:32,960 But if you want to do something a little bit more complex, 558 00:31:32,960 --> 00:31:34,460 it would be nice to understand something. 559 00:31:34,460 --> 00:31:37,790 So here's an example of a complex question. 560 00:31:37,790 --> 00:31:40,340 Who is the president of the fourth largest country 561 00:31:40,340 --> 00:31:42,560 married to? 562 00:31:42,560 --> 00:31:46,520 Well, if you can analyze this question into pieces, 563 00:31:46,520 --> 00:31:51,140 then you can very quickly figure out that, 564 00:31:51,140 --> 00:31:53,560 just by throwing the pieces at your knowledge base, 565 00:31:53,560 --> 00:31:55,250 you cannot resolve it. 566 00:31:55,250 --> 00:31:59,180 But we've built a very nice syntax-based algorithm 567 00:31:59,180 --> 00:32:04,700 that allows us to decompose complex questions into sequences 568 00:32:04,700 --> 00:32:06,860 of simpler questions and understand in which order 569 00:32:06,860 --> 00:32:07,940 to ask them. 570 00:32:07,940 --> 00:32:10,070 So the machine will say, oh, first 571 00:32:10,070 --> 00:32:12,350 I need to find out what's the fourth largest country, 572 00:32:12,350 --> 00:32:14,840 then who its president is, and then, with that, 573 00:32:14,840 --> 00:32:16,490 who he is married to. 574 00:32:16,490 --> 00:32:20,140 And, very quickly, this is how, schematically, it's done.
575 00:32:20,140 --> 00:32:22,170 This is sort of an under-the-hood 576 00:32:22,170 --> 00:32:24,990 Ternary expression representation of the question. 577 00:32:24,990 --> 00:32:27,530 The machine says, oh, too hard, let's first 578 00:32:27,530 --> 00:32:30,515 find out what the fourth largest country is. 579 00:32:30,515 --> 00:32:32,060 It's China. 580 00:32:32,060 --> 00:32:34,700 Then it's still hard, 581 00:32:34,700 --> 00:32:37,010 so let's find out who the president of China 582 00:32:37,010 --> 00:32:41,810 is, find the name, and then the next step is just a table lookup. 583 00:32:41,810 --> 00:32:42,950 Who is he married to? 584 00:32:42,950 --> 00:32:45,270 And it gives you an answer. 585 00:32:51,280 --> 00:32:52,230 Some other examples. 586 00:32:52,230 --> 00:32:54,970 In what city was the fifth president of the US born? 587 00:32:54,970 --> 00:32:57,850 It finds James Monroe and gives you the city. 588 00:33:00,480 --> 00:33:02,960 What books did the author of War and Peace write? 589 00:33:02,960 --> 00:33:06,790 It finds Leo Tolstoy and finds his books from different sources. 590 00:33:11,080 --> 00:33:16,860 So the technologies that I described, 591 00:33:16,860 --> 00:33:22,260 the object-property-value data model, our Ternary expression 592 00:33:22,260 --> 00:33:29,990 representation, complex question answering, and the 593 00:33:29,990 --> 00:33:33,440 natural language annotation representation.
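The decomposition walked through above can be sketched as three chained lookups, each answer feeding the next sub-question. The tables are toy stand-ins for the knowledge sources the real system consults (the names reflect the mid-2000s example in the talk and are illustrative).

```python
# A sketch of answering a nested question by decomposing it into
# an ordered sequence of simpler questions.
COUNTRY_BY_AREA_RANK = {4: "China"}        # "the fourth largest country"
HEAD_OF_STATE = {"China": "Hu Jintao"}     # illustrative, mid-2000s
SPOUSE = {"Hu Jintao": "Liu Yongqing"}

def answer_nested():
    """Who is the president of the fourth largest country married to?"""
    country = COUNTRY_BY_AREA_RANK[4]      # sub-question 1
    president = HEAD_OF_STATE[country]     # sub-question 2
    return SPOUSE[president]               # sub-question 3: a plain lookup
```

The machine's job is not just splitting the question but ordering the pieces: sub-question 2 cannot be posed until sub-question 1 has been answered.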
594 00:33:33,440 --> 00:33:37,840 They, over the years, inspired a bunch of companies 595 00:33:37,840 --> 00:33:41,650 and a bunch of technologies, starting with Ask Jeeves, which, 596 00:33:41,650 --> 00:33:45,010 I guess, existed before you guys were even born, 597 00:33:45,010 --> 00:33:47,420 to Wolfram Alpha, which pretty much took 598 00:33:47,420 --> 00:33:49,880 [INAUDIBLE] wholesale, to more recently, 599 00:33:49,880 --> 00:33:53,470 Google QA, which started doing really wonderful things using 600 00:33:53,470 --> 00:33:57,720 this idea. Everybody had the idea that you should 601 00:33:57,720 --> 00:33:59,300 go from the surface form of the question. 602 00:33:59,300 --> 00:34:02,020 If you have a question, you throw your question 603 00:34:02,020 --> 00:34:04,300 onto the web and you get some answer, 604 00:34:04,300 --> 00:34:07,640 but it doesn't work with high precision. 605 00:34:07,640 --> 00:34:11,320 So the idea that you need to curate knowledge and build 606 00:34:11,320 --> 00:34:13,600 some huge repository of knowledge 607 00:34:13,600 --> 00:34:16,630 was picked up by these companies, and certainly now 608 00:34:16,630 --> 00:34:22,290 all these companies do quite decent question answering. 609 00:34:22,290 --> 00:34:25,760 And the same is true for Watson and Siri, 610 00:34:25,760 --> 00:34:30,520 and I was involved in some of these things, 611 00:34:30,520 --> 00:34:34,370 so I will show you. 612 00:34:34,370 --> 00:34:36,730 Let's see. 613 00:34:36,730 --> 00:34:38,139 Right, so let's start with this. 614 00:34:41,560 --> 00:34:46,090 About 10 years ago, on top of START, 615 00:34:46,090 --> 00:34:53,431 we built a system that was connected to a cell phone. 616 00:34:53,431 --> 00:34:55,389 I don't know how many of you remember the world 617 00:34:55,389 --> 00:34:58,120 without smartphones, but that was 618 00:34:58,120 --> 00:34:59,450 when smartphones weren't there. 619 00:34:59,450 --> 00:35:01,840 There was no such thing as an iPhone.
620 00:35:01,840 --> 00:35:04,090 So there's a vanilla phone that, all it did, it 621 00:35:04,090 --> 00:35:06,520 made phone calls. 622 00:35:06,520 --> 00:35:09,950 Of course, it also had a camera and unlimited text. 623 00:35:09,950 --> 00:35:11,800 But it really didn't do much more, 624 00:35:11,800 --> 00:35:15,890 and we decided it was time to connect it to language. 625 00:35:15,890 --> 00:35:22,860 So we convinced a company to fund us to do that, 626 00:35:22,860 --> 00:35:28,980 and we built a system called StartMobile. 627 00:35:28,980 --> 00:35:31,770 And this is an intelligent phone assistant, 628 00:35:31,770 --> 00:35:35,320 which could, at the time, retrieve general purpose 629 00:35:35,320 --> 00:35:39,250 information, provide access to computational services, 630 00:35:39,250 --> 00:35:42,400 perform an action on another phone, trigger apparatus, 631 00:35:42,400 --> 00:35:46,000 like a camera, on a phone, and receive instructions. 632 00:35:46,000 --> 00:35:49,330 And, talking about YouTube, 633 00:35:49,330 --> 00:35:53,890 we have a video that shows the system in action. 634 00:35:53,890 --> 00:35:55,270 That video is quite old. 635 00:35:55,270 --> 00:35:58,660 It's from the beginning of 2006, and at the time, 636 00:35:58,660 --> 00:36:03,370 we did not connect it to speech, but you 637 00:36:03,370 --> 00:36:05,970 can see what it does: the user 638 00:36:05,970 --> 00:36:08,430 was typing in questions for the system 639 00:36:08,430 --> 00:36:11,570 in that particular video. 640 00:36:11,570 --> 00:36:14,110 So there's no narration, so if you read the captions, 641 00:36:14,110 --> 00:36:15,670 you'll figure out what is going on. 642 00:36:41,242 --> 00:36:42,950 So here's my former student, who actually 643 00:36:42,950 --> 00:36:49,260 went to Google to transition our technology eventually. 644 00:36:49,260 --> 00:36:54,140 And he's not sure whether he needs to take his coat or not.
645 00:36:54,140 --> 00:36:57,500 [JAZZ MUSIC] 646 00:36:57,500 --> 00:36:59,250 Again, this is very dated, of course, 647 00:36:59,250 --> 00:37:01,470 because now the temperature is almost uniform, 648 00:37:01,470 --> 00:37:03,280 but, again, that was 10 years ago. 649 00:37:16,920 --> 00:37:19,020 Is there any sound? 650 00:37:19,020 --> 00:37:21,532 AUDIENCE: Yes. 651 00:37:21,532 --> 00:37:22,990 BORIS KATZ: All right, those of you 652 00:37:22,990 --> 00:37:24,573 that know Cambridge know that station. 653 00:37:39,041 --> 00:37:39,540 Where am I? 654 00:37:39,540 --> 00:37:44,500 The GPS just came about and we were lucky to connect it 655 00:37:44,500 --> 00:37:47,580 and so now the guy gets the map and knows where 656 00:37:47,580 --> 00:37:50,220 to find where he needs to go. 657 00:37:50,220 --> 00:37:52,960 So this is our data center for those who haven't seen it. 658 00:37:52,960 --> 00:37:54,520 This is where CSAIL is. 659 00:37:54,520 --> 00:37:56,700 Again, it's dated, because right now this lawn 660 00:37:56,700 --> 00:37:59,580 is a huge building, but it wasn't there at the time. 661 00:38:06,041 --> 00:38:08,010 Oh it says here, trying to reach my mother. 662 00:38:08,010 --> 00:38:10,030 I don't know why it shows you this stuff, but. 663 00:38:10,030 --> 00:38:16,950 AUDIENCE: [INAUDIBLE] 664 00:38:16,950 --> 00:38:18,810 BORIS KATZ: He's worried about his mother 665 00:38:18,810 --> 00:38:22,470 and so he decides to tell her, remind my mother 666 00:38:22,470 --> 00:38:24,050 to take her medicine at 3:00 p.m. 667 00:38:26,419 --> 00:38:28,210 And we'll see what happens with that later. 668 00:38:52,190 --> 00:38:55,240 Take a higher resolution picture using flash in 10 seconds. 669 00:38:55,240 --> 00:38:57,920 I don't think any phones can do it even today for some reason. 670 00:38:57,920 --> 00:39:00,405 I don't know why it's so hard. 671 00:39:00,405 --> 00:39:01,800 [AUDIENCE LAUGHS] 672 00:39:01,800 --> 00:39:03,347 AUDIENCE: For a selfie. 
673 00:39:03,347 --> 00:39:04,680 BORIS KATZ: Right, for a selfie. 674 00:39:04,680 --> 00:39:05,460 Very good, yeah. 675 00:39:11,687 --> 00:39:13,610 All right, so his friend is busy and he 676 00:39:13,610 --> 00:39:16,606 wants to entertain himself, I guess, 677 00:39:16,606 --> 00:39:18,230 but he doesn't quite know how to do it. 678 00:39:28,330 --> 00:39:30,049 How do I use the radio on my phone? 679 00:39:35,537 --> 00:39:36,037 Well. 680 00:39:39,530 --> 00:39:43,159 All right, now he knows. 681 00:39:43,159 --> 00:39:46,540 [MUSIC PLAYING] 682 00:39:54,760 --> 00:39:56,190 All right, mother's health. 683 00:39:59,613 --> 00:40:02,153 [AUDIENCE LAUGHS] 684 00:40:02,153 --> 00:40:04,910 Right, so, exactly. 685 00:40:04,910 --> 00:40:08,240 So a delayed action happened on her phone, 686 00:40:08,240 --> 00:40:10,710 so we inserted the thing on her phone 687 00:40:10,710 --> 00:40:16,090 and then she got this warning, and that's my staff. 688 00:40:16,090 --> 00:40:17,881 They all turned out to be very good actors. 689 00:40:40,020 --> 00:40:42,030 All right, so this is the last thing. 690 00:40:42,030 --> 00:40:46,140 Traveling, she is going back. 691 00:40:46,140 --> 00:40:48,480 She now has a car. 692 00:40:48,480 --> 00:40:51,810 And this is the last thing that I'll show you. 693 00:40:51,810 --> 00:40:55,310 [JAZZ MUSIC] 694 00:41:09,810 --> 00:41:13,380 How do I get from here to Frederica's house? 695 00:41:13,380 --> 00:41:16,740 Well, if you think about it, this is a very hard question. 696 00:41:16,740 --> 00:41:19,400 You need to know that here is here 697 00:41:19,400 --> 00:41:23,760 and go to GPS and find the location.
698 00:41:23,760 --> 00:41:25,830 You need to know Frederica's house 699 00:41:25,830 --> 00:41:28,530 from your list of contacts and that you need to go there, 700 00:41:28,530 --> 00:41:31,260 and you need to then send it to, in that case, we sent it to, 701 00:41:31,260 --> 00:41:33,240 I believe it was MapQuest, I don't even 702 00:41:33,240 --> 00:41:38,051 know if it exists now, to actually give the directions. 703 00:41:38,051 --> 00:41:39,450 Well, anyway, so. 704 00:41:42,430 --> 00:41:47,340 Well, it was a little bit of a sad story actually. 705 00:41:47,340 --> 00:41:49,080 So we built the system. 706 00:41:49,080 --> 00:41:52,770 The company that I mentioned was Nokia. 707 00:41:52,770 --> 00:41:56,017 We showed them the demo. 708 00:41:56,017 --> 00:41:57,350 They were very excited about it. 709 00:41:57,350 --> 00:42:01,080 They said, well, can we put START on the phone? 710 00:42:01,080 --> 00:42:06,390 Because in that application, the signals 711 00:42:06,390 --> 00:42:09,250 were sent to MIT from the phone and the answers 712 00:42:09,250 --> 00:42:11,670 were sent back to the phone. 713 00:42:11,670 --> 00:42:13,960 I said, well, it doesn't seem right. 714 00:42:13,960 --> 00:42:18,230 START is large and there was no internet connection 715 00:42:18,230 --> 00:42:21,290 that could take care of that. 716 00:42:21,290 --> 00:42:23,550 They said, no, no, no, how big is your system? 717 00:42:23,550 --> 00:42:28,460 Can you talk to people in the company to put it through a LISP compiler, 718 00:42:28,460 --> 00:42:31,210 to put it on our chip, and so forth? 719 00:42:31,210 --> 00:42:32,160 We need it on the phone. 720 00:42:32,160 --> 00:42:35,790 Unfortunately, the word cloud hadn't been invented yet. 721 00:42:35,790 --> 00:42:38,640 Maybe I would have been more eloquent in explaining to them 722 00:42:38,640 --> 00:42:43,420 why they didn't need to have the system on the phone. 723 00:42:43,420 --> 00:42:47,790 And so they didn't want to use it the way it was.
724 00:42:47,790 --> 00:42:52,950 We wrote a paper, showed it to them, 725 00:42:52,950 --> 00:42:57,450 said, do something about this or it will be too late. 726 00:42:57,450 --> 00:43:01,440 And right at that time, Apple released its first iPhone. 727 00:43:04,140 --> 00:43:07,170 So I go to a senior vice president 728 00:43:07,170 --> 00:43:11,006 and say, look, these guys are ahead of you. 729 00:43:11,006 --> 00:43:12,630 You should decide about it because they 730 00:43:12,630 --> 00:43:16,260 will do what I gave you. 731 00:43:16,260 --> 00:43:23,490 He asks, how many iPhones did Apple sell last month? 732 00:43:23,490 --> 00:43:27,090 I said I read somewhere it was like a couple of thousand. 733 00:43:27,090 --> 00:43:28,620 And he starts laughing hysterically. 734 00:43:28,620 --> 00:43:34,600 He said, we, my company, ship one million phones every day. 735 00:43:34,600 --> 00:43:36,100 Why do we care about Apple, he said. 736 00:43:39,930 --> 00:43:42,180 Well, so, we gave this talk. 737 00:43:42,180 --> 00:43:46,920 That was September 2007 by then. 738 00:43:46,920 --> 00:43:52,290 In December, somebody started a company called Siri, 739 00:43:52,290 --> 00:43:55,290 and then two years later, Siri was bought by Apple, 740 00:43:55,290 --> 00:43:56,950 and the rest is history. 741 00:43:56,950 --> 00:44:00,240 And Nokia was sold pretty much at a yard sale 742 00:44:00,240 --> 00:44:03,600 and doesn't exist anymore. 743 00:44:03,600 --> 00:44:05,950 So be visionary. 744 00:44:05,950 --> 00:44:10,140 Don't think that you know what you are doing all the time. 745 00:44:10,140 --> 00:44:16,770 Yeah, people often ask me to say a few words about Jeopardy!. 746 00:44:16,770 --> 00:44:23,430 The question that the IBM team was hoping to answer 747 00:44:23,430 --> 00:44:26,200 was actually a very important question.
748 00:44:26,200 --> 00:44:30,180 Can we create a computer system to compete against the best 749 00:44:30,180 --> 00:44:32,760 humans in a task which is normally 750 00:44:32,760 --> 00:44:35,693 thought to require a high level of human intelligence? 751 00:44:40,430 --> 00:44:42,980 I was involved with them from the very beginning 752 00:44:42,980 --> 00:44:47,300 for various reasons which I will not go into. 753 00:44:47,300 --> 00:44:50,250 They put together a wonderful team, some really good people, 754 00:44:50,250 --> 00:44:51,250 very devoted people. 755 00:44:51,250 --> 00:44:54,980 They spent four or five years of their life, pretty much, 756 00:44:54,980 --> 00:44:58,130 totally devoted to that. 757 00:44:58,130 --> 00:45:01,800 And they built a system, and these are the kind of, 758 00:45:01,800 --> 00:45:02,740 I guess-- 759 00:45:02,740 --> 00:45:06,230 I don't know if any of you know what Jeopardy! is, but 760 00:45:06,230 --> 00:45:09,830 pretty much, people ask the question, 761 00:45:09,830 --> 00:45:14,340 which, for various reasons, is formulated not as a question 762 00:45:14,340 --> 00:45:16,770 but as an assertion, mostly 763 00:45:16,770 --> 00:45:19,640 with demonstrative pronouns like this and these, 764 00:45:19,640 --> 00:45:23,624 and you need to give an answer, which they call the question, 765 00:45:23,624 --> 00:45:24,290 for some reason. 766 00:45:24,290 --> 00:45:25,020 There's a gimmick. 767 00:45:25,020 --> 00:45:27,228 You have to say what is envelope instead of envelope, 768 00:45:27,228 --> 00:45:28,990 but let's not pay attention to that. 769 00:45:28,990 --> 00:45:30,470 It doesn't matter. 770 00:45:30,470 --> 00:45:31,520 And this is very hard.
771 00:45:31,520 --> 00:45:33,560 To push one of these paper products 772 00:45:33,560 --> 00:45:36,170 is to stretch the established limits, 773 00:45:36,170 --> 00:45:39,160 and you need to figure out that to push the envelope 774 00:45:39,160 --> 00:45:41,200 means to stretch established limits. 775 00:45:41,200 --> 00:45:44,000 This is an idiom, for those of you who are not native speakers. 776 00:45:44,000 --> 00:45:45,340 And the answer is envelope. 777 00:45:45,340 --> 00:45:50,270 A simpler question is, the chapels of these colleges were 778 00:45:50,270 --> 00:45:52,750 designed by this architect. 779 00:45:52,750 --> 00:45:56,400 And you need to figure out that Christopher Wren is the answer. 780 00:46:01,800 --> 00:46:04,590 Of course, many questions involve question 781 00:46:04,590 --> 00:46:05,940 decomposition. 782 00:46:05,940 --> 00:46:08,430 So here's an example of a real question. 783 00:46:08,430 --> 00:46:11,790 Of the four countries in the world that the US does not 784 00:46:11,790 --> 00:46:16,100 have diplomatic relations with, the one that's farthest north. 785 00:46:16,100 --> 00:46:19,800 So it's pretty much asking several questions. 786 00:46:19,800 --> 00:46:23,079 One is the sort of inner sub-question: 787 00:46:23,079 --> 00:46:24,870 the four countries in the world that the US 788 00:46:24,870 --> 00:46:27,930 doesn't have relations with. And the outer sub-question 789 00:46:27,930 --> 00:46:32,340 is, now that you know these four countries, which 790 00:46:32,340 --> 00:46:33,390 is the farthest north? 791 00:46:33,390 --> 00:46:35,040 You do a little bit of arithmetic 792 00:46:35,040 --> 00:46:37,660 and you find the answer is North Korea.
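The inner/outer decomposition for this clue can be sketched in two lines. The latitudes below are approximate capital-city latitudes, used only to illustrate the "little bit of arithmetic" step.

```python
# Inner sub-question: the four countries without US diplomatic
# relations at the time, paired with approximate capital latitudes.
NO_US_RELATIONS = {
    "Bhutan": 27.5, "Cuba": 23.1, "Iran": 35.7, "North Korea": 39.0,
}

# Outer sub-question: of those four, which is farthest north?
answer = max(NO_US_RELATIONS, key=NO_US_RELATIONS.get)
assert answer == "North Korea"
```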
793 00:46:37,660 --> 00:46:42,090 And of course, this is very similar to what START did years 794 00:46:42,090 --> 00:46:45,590 before: you pretty much decompose the questions 795 00:46:45,590 --> 00:46:47,580 and you solve them separately, as I showed you 796 00:46:47,580 --> 00:46:48,330 a few minutes ago. 797 00:46:52,260 --> 00:46:54,270 So Watson actually took a bunch of ideas 798 00:46:54,270 --> 00:46:57,210 from START, the Ternary expression representation, 799 00:46:57,210 --> 00:46:59,880 the natural language annotations idea, 800 00:46:59,880 --> 00:47:03,180 the object-property-value data model, and the question 801 00:47:03,180 --> 00:47:06,390 decomposition model, and applied them 802 00:47:06,390 --> 00:47:10,300 when they could really analyze the question 803 00:47:10,300 --> 00:47:13,020 syntactically, where the question was not too convoluted, 804 00:47:13,020 --> 00:47:15,570 and when there was a semi-structured resource, 805 00:47:15,570 --> 00:47:19,860 or several resources, to find an answer. 806 00:47:19,860 --> 00:47:22,050 But many questions, of course, were not like that, 807 00:47:22,050 --> 00:47:24,960 and stretching the envelope is one example. 808 00:47:24,960 --> 00:47:28,080 And so Watson used some statistical machine 809 00:47:28,080 --> 00:47:32,700 learning approaches, and they did quite a good job 810 00:47:32,700 --> 00:47:36,930 of looking at a lot of data to resolve and answer 811 00:47:36,930 --> 00:47:38,940 these questions. 812 00:47:38,940 --> 00:47:41,790 Their pipeline is, really, miles long, 813 00:47:41,790 --> 00:47:45,080 because each of these bullets 814 00:47:45,080 --> 00:47:46,540 has a bunch of sub-bullets, 815 00:47:46,540 --> 00:47:49,990 each with a bunch of bullets for the tasks that they were doing, 816 00:47:49,990 --> 00:47:54,990 but on a very high level, they needed to do content acquisition. 817 00:47:54,990 --> 00:47:57,490 Pretty much, all right, there was a problem.
818 00:47:57,490 --> 00:48:02,160 The company behind Jeopardy! told them the web cannot be part 819 00:48:02,160 --> 00:48:04,610 of that, because Google knows everything, 820 00:48:04,610 --> 00:48:06,940 so what is it that you guys are doing? 821 00:48:06,940 --> 00:48:10,410 So what would you do if somebody tells you you 822 00:48:10,410 --> 00:48:11,400 cannot use the web? 823 00:48:14,250 --> 00:48:15,410 AUDIENCE: [INAUDIBLE] 824 00:48:15,410 --> 00:48:16,410 BORIS KATZ: What's that? 825 00:48:16,410 --> 00:48:18,410 AUDIENCE: [INAUDIBLE] 826 00:48:18,410 --> 00:48:21,980 BORIS KATZ: Well, you pretty much take the web 827 00:48:21,980 --> 00:48:24,060 and put it in a box. 828 00:48:24,060 --> 00:48:26,930 And this is what IBM did. 829 00:48:26,930 --> 00:48:29,870 They took every interesting repository, 830 00:48:29,870 --> 00:48:32,810 every database, every encyclopedia, every newspaper 831 00:48:32,810 --> 00:48:37,520 collection, I forget whether blogs existed at the time, 832 00:48:37,520 --> 00:48:40,820 and just had a lot of clusters and a lot 833 00:48:40,820 --> 00:48:42,440 of memory, and everything was there. 834 00:48:42,440 --> 00:48:46,040 So now they could tell the company, no web. 835 00:48:46,040 --> 00:48:48,154 We are smart without the web. 836 00:48:48,154 --> 00:48:49,320 So that was the first thing. 837 00:48:49,320 --> 00:48:53,420 Then there were some wonderful natural language 838 00:48:53,420 --> 00:48:57,090 processing people, so they did question answering, 839 00:48:57,090 --> 00:48:58,670 they searched the documents.
840 00:48:58,670 --> 00:49:03,410 So what they really did, they took the clue, as they call 841 00:49:03,410 --> 00:49:08,300 the question, threw it not on the web but on their web, 842 00:49:08,300 --> 00:49:12,080 found tens of thousands of documents 843 00:49:12,080 --> 00:49:17,680 that even loosely match these keywords, 844 00:49:17,680 --> 00:49:19,910 and then the real work just started. 845 00:49:19,910 --> 00:49:22,790 They had this kind of filtering, that kind of filtering, 846 00:49:22,790 --> 00:49:24,870 this kind of answer generation, that kind. 847 00:49:24,870 --> 00:49:28,010 They would score it, they would weigh new evidence, 848 00:49:28,010 --> 00:49:30,200 they would do it again, they would do ranking, 849 00:49:30,200 --> 00:49:32,930 and they would decide how confident they are about that, 850 00:49:32,930 --> 00:49:35,360 and then they would decide whether it's worth it. 851 00:49:35,360 --> 00:49:37,430 They spent an incredible amount of time 852 00:49:37,430 --> 00:49:40,820 figuring out how much money to wager. 853 00:49:40,820 --> 00:49:43,040 I don't actually know much about Jeopardy!, 854 00:49:43,040 --> 00:49:47,030 but apparently you have to tell them how good your answer is. 855 00:49:47,030 --> 00:49:50,780 Well, you need to come up with a number: how expensive you 856 00:49:50,780 --> 00:49:53,060 are, how much you will make if you win, 857 00:49:53,060 --> 00:49:57,240 and how much you lose if you lose. 858 00:49:57,240 --> 00:50:07,040 And so they did it all and they built a wonderful system. 859 00:50:07,040 --> 00:50:09,740 In the beginning, when I started going there, 860 00:50:09,740 --> 00:50:11,450 it was very, very slow. 861 00:50:11,450 --> 00:50:16,520 It ran on a single processor and really took two hours 862 00:50:16,520 --> 00:50:19,870 to go through this pipeline. 863 00:50:19,870 --> 00:50:23,210 But that, if you think about it, is a very parallelizable problem.
864 00:50:23,210 --> 00:50:25,760 You could send it all out, and in that case, 865 00:50:25,760 --> 00:50:30,610 I think by the end it was several thousand cores, 866 00:50:30,610 --> 00:50:33,260 and they easily reduced the time to three seconds, 867 00:50:33,260 --> 00:50:39,460 which was passable and doable for the competition. 868 00:50:39,460 --> 00:50:43,100 And so they won, as you all know. 869 00:50:43,100 --> 00:50:46,100 It's a great system that nails the state 870 00:50:46,100 --> 00:50:50,240 of the art in natural language, in QA, in information 871 00:50:50,240 --> 00:50:52,790 retrieval, in machine learning. 872 00:50:52,790 --> 00:50:56,020 It's a great piece of engineering. 873 00:50:56,020 --> 00:51:00,946 It reignited, no doubt about it, public interest in AI. 874 00:51:00,946 --> 00:51:04,950 It brought new talented people into our field. 875 00:51:04,950 --> 00:51:06,300 So this is all great news. 876 00:51:09,080 --> 00:51:13,880 But let's look at some of the blunders 877 00:51:13,880 --> 00:51:19,970 that occurred, both before the competition and after. 878 00:51:19,970 --> 00:51:22,040 I have a whole collection of those. 879 00:51:22,040 --> 00:51:23,870 I'll just show you a couple. 880 00:51:23,870 --> 00:51:25,850 This one actually happened before the competition, 881 00:51:25,850 --> 00:51:27,510 so they were able to fix the problem. 882 00:51:27,510 --> 00:51:29,600 So the question, again, it's called 883 00:51:29,600 --> 00:51:33,160 a clue, in the category of letters, was: in the late 40s, 884 00:51:33,160 --> 00:51:39,230 a mother wrote to this artist that his picture, number nine, 885 00:51:39,230 --> 00:51:41,960 looked like her son's finger paintings. 886 00:51:41,960 --> 00:51:44,840 Well, for those who are quick at this, 887 00:51:44,840 --> 00:51:47,380 I'm sure you know that it's Jackson Pollock, 888 00:51:47,380 --> 00:51:51,090 but Watson answered Rembrandt, for some stupid reasons.
889 00:51:51,090 --> 00:51:54,760 It failed to recognize that "late 40s" referred to the 1940s, 890 00:51:54,760 --> 00:51:58,700 or rather it thought the picture was made in a previous century, 891 00:51:58,700 --> 00:52:06,820 and apparently the number nine 892 00:52:06,820 --> 00:52:10,010 appeared in a bunch of documents related to Rembrandt, 893 00:52:10,010 --> 00:52:12,900 and so it said Rembrandt. 894 00:52:12,900 --> 00:52:16,850 Another, more famous blunder, because it happened 895 00:52:16,850 --> 00:52:20,540 at the competition: the category was US cities, 896 00:52:20,540 --> 00:52:23,540 and the clue was: its 897 00:52:23,540 --> 00:52:27,860 largest airport is named for a World War II hero, 898 00:52:27,860 --> 00:52:32,210 and its second largest for a World War II battle. 899 00:52:32,210 --> 00:52:34,430 And again, those of you quick at this 900 00:52:34,430 --> 00:52:38,060 will know that in Chicago there is O'Hare Airport, 901 00:52:38,060 --> 00:52:41,690 named for a famous World War II hero. 902 00:52:41,690 --> 00:52:43,880 And the second airport is called Midway, 903 00:52:43,880 --> 00:52:46,520 after a famous battle in the Second World War. 904 00:52:46,520 --> 00:52:50,230 And Watson presses the button and says Toronto. 905 00:52:50,230 --> 00:52:53,540 And there's a sort of gasp in the audience, 906 00:52:53,540 --> 00:52:56,270 and in front of tens or hundreds of millions 907 00:52:56,270 --> 00:52:59,757 of television sets around the world. 908 00:52:59,757 --> 00:53:01,465 And again, there are some stupid reasons.
909 00:53:04,030 --> 00:53:06,430 Watson did machine learning, as I said, 910 00:53:06,430 --> 00:53:10,220 and it statistically figured out that the category part 911 00:53:10,220 --> 00:53:14,240 of the clue, in this case US cities, 912 00:53:14,240 --> 00:53:17,210 might not be that important, so it should 913 00:53:17,210 --> 00:53:18,530 pay less attention to it. 914 00:53:18,530 --> 00:53:22,460 It also knew that Toronto has a team playing in baseball-- 915 00:53:22,460 --> 00:53:23,100 is that true? 916 00:53:23,100 --> 00:53:23,900 Yes. 917 00:53:23,900 --> 00:53:27,300 In the US baseball league. And it knew that, in fact, 918 00:53:27,300 --> 00:53:30,320 one of Toronto's airports is named for a hero, 919 00:53:30,320 --> 00:53:32,600 although it's a World War I hero. 920 00:53:32,600 --> 00:53:36,350 So it put it all together and said Toronto. 921 00:53:36,350 --> 00:53:41,420 In any case, it won anyway because it did an amazing job 922 00:53:41,420 --> 00:53:43,700 of answering many more questions, 923 00:53:43,700 --> 00:53:46,970 but the question for us is whether this is what 924 00:53:46,970 --> 00:53:48,350 we should all be striving for. 925 00:53:48,350 --> 00:53:51,470 I'm certainly all in favor of building awesome systems, 926 00:53:51,470 --> 00:53:55,430 and they did, and I explained to you why I think it's good, 927 00:53:55,430 --> 00:54:02,300 but IBM has not created a machine that thinks like us. 928 00:54:02,300 --> 00:54:06,110 And Watson's success didn't bring us 929 00:54:06,110 --> 00:54:09,620 even an inch closer to understanding 930 00:54:09,620 --> 00:54:12,720 human intelligence. 931 00:54:12,720 --> 00:54:15,930 And the positive news, of course, 932 00:54:15,930 --> 00:54:20,360 is that those blunders should remind us 933 00:54:20,360 --> 00:54:22,370 that the problem is waiting to be solved, 934 00:54:22,370 --> 00:54:27,480 and you guys are in a good position to try to do that.
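The failure mode he describes, a learned weighting that discounts the category and lets weaker evidence dominate, can be illustrated with a toy linear evidence scorer. All the features, scores, and weights below are invented for this sketch; they are not Watson's actual features or model.

```python
# Toy illustration of the failure mode described above: when a learned
# weighting discounts the category, weaker evidence can dominate.
# All features and weights are invented, not Watson's actual model.

# Hypothetical evidence for each candidate. Note the hero-airport
# feature does not capture WHICH war the hero fought in.
features = {
    "Chicago": {"category_match": 1.0, "hero_airport": 1.0,
                "battle_airport": 1.0, "doc_cooccurrence": 0.3},
    "Toronto": {"category_match": 0.0, "hero_airport": 1.0,
                "doc_cooccurrence": 0.9},
}

def best(weights):
    """Return the candidate with the highest weighted evidence score."""
    def score(cand):
        return sum(weights.get(f, 0.0) * v for f, v in features[cand].items())
    return max(features, key=score)

# Training decided the category is often unreliable, so its weight is tiny:
learned = {"category_match": 0.1, "hero_airport": 0.4,
           "battle_airport": 0.4, "doc_cooccurrence": 1.0}
print(best(learned))   # prints: Toronto  (1.3 beats Chicago's 1.2)

# Had the category carried real weight, Chicago would have won:
trusting = dict(learned, category_match=1.0)
print(best(trusting))  # prints: Chicago  (2.1 beats Toronto's 1.3)
```

The point of the sketch is that no single feature is "wrong": each is a reasonable statistical signal, yet their weighted combination confidently produces an answer that violates an obvious constraint a human would never ignore.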
935 00:54:27,480 --> 00:54:31,720 And that should be our next big challenge.