The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GABRIEL KREIMAN: What I'd like to do today is give a very brief introduction to neural circuits: why we study them, how we study them, and the possibilities that come out of understanding biological codes and trying to translate those ideas into computational codes. Then I will be a bit more specific, and discuss some initial attempts at studying the computational role of feedback signals. And then I'll switch gears and talk for a few minutes about a couple of things that are not necessarily related to anything we've done any real work on, but that I'm particularly excited about in the context of open questions, challenges, and opportunities, and what I think will happen over the next several years in the field, in the hope of inspiring several of you to actually solve some of these open questions.

So one of the reasons why I'm very excited about studying biology and studying brains is that our brains are the product of millions of years of evolution. And through evolution, we have discovered how to do things that are interesting, fast, efficient. And so if we can understand the biological codes, if we can understand the machinery by which we do all of these amazing feats, then in principle, we should be able to take some of these biological codes and write computer code that will do all of those things in similar ways. In the same way that we can write algorithms to compute the square root of 2, there could be algorithms that dictate how we see, how we recognize objects, how we recognize auditory events. In short, the answer to all of these Turing questions, in some sense, is hidden somewhere here inside our brain.
So the question is, how can we listen to neurons and circuits, decode their activity, maybe even write information into the brain, and then try to translate all of these ideas into computational codes?

There are a lot of fascinating properties that biological codes exhibit. Needless to say, we're not quite there yet in terms of computers and robots. Our hardware and software work for many decades. I think it's very unlikely that your amazing iPhone 6 or 5 or 7, whatever it is, will last four, five, six, seven, eight, nine decades. None of our computers will last that long. Our biological hardware does. There's amazing parallel computation going on in our brains; this is quite distinct from the way we think about algorithms and computation in other domains now. Our brains have a reprogrammable architecture: the same chunk of tissue can be used for several different purposes, and through learning and through our experiences, we can modify those architectures. A thing that has been quite interesting, and that maybe we'll come back to, is the notion of being able to do single-shot learning, as opposed to some machine learning algorithms that require lots and lots of data to train. We can easily discover structure in data. The notion of fault tolerance and robustness to transformations is an essential one; robustness is arguably a fundamental property of biology, and one that has been very, very hard to implement in computational circuitry. And for engineers, the whole issue of how to have different systems integrate information and interact with each other has been, and continues to be, a fundamental challenge. Our brains do that all the time. Walking down the street, we can integrate visual information with auditory information, with our goals, our plans, what we're interested in doing, our social interactions, and so on. So why do we want to study neural circuits?
I think we are in a golden era right now, because we can begin to explore the answers to some of these Turing questions in brains at the biological level. We can study high-level cognitive phenomena at the level of neurons and circuits of neurons, and I'll give you a few examples of that later on. More recently, and I'll come back to this towards the end, we've had the opportunity to begin to manipulate, disrupt, and interact with neural circuits at unprecedented resolution. We can begin to turn on and off specific subsets of neurons, and that has tremendously accelerated our ability to test theories at the neural level. And then again, the notion being that empirical findings can be translated into computational algorithms; that is, if we really understand how biology solves the problem, in principle, we should be able to write mathematical equations, and then write code that mimics some of those computations. Some examples of that we'll talk about in the context of the visual system, both in my presentation and in Jim DiCarlo's presentation.

This is just advertising for a couple of books that I find interesting and relevant in computational neuroscience. I'm not going to have time to do any justice to the entire field of computational neuroscience at all. All these slides will be in Dropbox, in case anyone wants to learn more about computational neuroscience; these are tremendous books. Larry Abbott is the author of this one, and he'll be talking tonight.

So how do we study biological circuitry? I realize that this is deja vu and very well known for many of you. But in general, we have a variety of techniques to probe the function of brain circuits. This is showing the temporal resolution and the spatial resolution of the different techniques used to study neural circuits.
These range all the way from techniques that have limited spatial and temporal resolution, such as PET and fMRI, through techniques that have very high temporal resolution but relatively poor spatial resolution, all the way to techniques that allow us to interrogate the function of individual channels within neurons. Most of what I'm going to talk about today is what we refer to as the neural circuit level, somewhere in between single neurons and ensembles of neurons recorded with the local field potential. This gives us a resolution of milliseconds, which is where we think a lot of the computations in the cortex are happening, and where we think we can begin to elucidate how neurons interact with each other.

So to start from the very beginning, we need to understand what a neuron does. Again, many of you are quite familiar with this. But the basic, fundamental understanding of what a neuron does is to integrate information: it receives information through its dendrites, integrates that information, and decides whether to fire a spike or not.

Interestingly, some of the basic intuitions about neuronal function were essentially conceived by a Spaniard, Ramón y Cajal. He wanted to be an artist. His parents told him that he could not become an artist; he had to become a clinician, a medical doctor. So he followed the tradition and became a medical doctor. But then he said, well, what I really like doing is drawing. And so he bought a microscope, he put it in his kitchen, and he spent a good chunk of his life drawing, essentially. He would look at neurons, and he would draw their shapes. And that's essentially how neuroscience started. Just from this beautiful and amazing array of drawings of neurons, he conjectured the basic flow of information: the notion that information is integrated through the dendrites, that all of this integration happens in the soma, and that from there, neurons decide whether to fire a spike or not. Nothing more, nothing less.
That's essentially the fundamental unit of computation in our brains.

How do we think about and model those processes? There's a family of different types of models that people have used to describe what a neuron does. These models differ in terms of their biological accuracy and their computational complexity. Perhaps one of the most used is the integrate-and-fire neuron. This is a very simple RC circuit: it basically integrates current, and then, through a threshold, the neuron decides whether or not to fire a spike. This essentially treats neurons as point masses. There are people out there who have argued that you need more and more detail: you need to know exactly how many dendrites you have, and the position of each dendrite, and on and on and on. The exact resolution at which we should study neural systems is a fundamental open question. We don't know what the right level of abstraction is. There are people who think about brains in the context of blood flow, and millions and millions of neurons averaged together. There are people who think that we actually need to pay attention to the exact details of how every single dendrite integrates information, and so on. For many of us, this is a sufficient level of abstraction: the notion that there's a neuron that can integrate information. So we would like to push this notion that we can think about models with single neurons, and see how far we can go, understanding that we are ignoring a lot of the inner complexity of what's happening inside a neuron itself.

So very, very briefly, just to push the notion that this is not rocket science: it's very, very easy to build these integrate-and-fire model simulations. I know many of you do this on a daily basis. This is the equation of the RC circuit. There's current that flows through the capacitance, and there's current that flows through the resistance, which, in this RC circuit, we think of as composed of the ion channels in the membranes of the neurons.
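Here is a minimal MATLAB sketch of that kind of leaky integrate-and-fire simulation. All parameter values are illustrative assumptions, not the code from the slide:

```matlab
% Minimal leaky integrate-and-fire neuron (illustrative values).
dt = 0.1e-3;                    % time step, s
T  = 0.5;                       % total duration, s
t  = 0:dt:T;
C  = 1e-9;                      % membrane capacitance, F
R  = 100e6;                     % membrane resistance, ohm
Vrest = -70e-3;                 % resting potential, V
Vth   = -54e-3;                 % spike threshold, V
I  = 0.25e-9 * ones(size(t));   % constant input current, A

V = Vrest * ones(size(t));
spikes = [];
for k = 1:numel(t)-1
    % RC circuit: C dV/dt = -(V - Vrest)/R + I
    dV = (-(V(k) - Vrest)/R + I(k)) / C;
    V(k+1) = V(k) + dt * dV;
    if V(k+1) >= Vth            % threshold crossed: emit a spike, reset
        spikes(end+1) = t(k+1); %#ok<SAGROW>
        V(k+1) = Vrest;
    end
end
plot(t, V);
xlabel('time (s)'); ylabel('membrane potential (V)');
```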
And this is all there is to it in terms of a lot of the simulations that we use to understand the function of neurons. And again, just to tell you that there's nothing scary or fundamentally difficult about this, here are just a couple of lines in MATLAB that you can take a look at if you've never done this kind of simulation. This is a very simple, and perhaps even somewhat wrong, simulation of an integrate-and-fire neuron. But it shows that it's relatively simple to build models of individual neurons that have these fundamental properties of being able to integrate information and decide when to fire a spike.

The fundamental questions that we really want to tackle in CBMM have to do with putting together lots of neurons, and understanding the function of circuits. It's not enough to understand individual neurons; we need to understand how they interact together. We want to understand what is there, who's there, what they are doing to whom, and when, and why. We really need to understand the activity of multiple neurons together in the form of circuitry.

So, just a handful of basic definitions. If we have a circuit like this, where we start connecting multiple neurons together, and information flows in this direction, we refer to the connections between neurons that go in this direction as feed-forward. We refer to the connections that flow in the opposite direction as feedback, and I use the term recurrent connections for the horizontal connections within a particular layer. This is just to fix the nomenclature for the discussion that will come next, and also this afternoon in Jim DiCarlo's presentation.
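To make that nomenclature concrete, here is a toy two-layer rate network in MATLAB with separate feed-forward, recurrent, and feedback weight matrices. Everything here (layer sizes, random weights, rectified-linear rates) is an invented stand-in, not a model of any real circuit:

```matlab
% Toy illustration of the connection nomenclature: Wff carries
% feed-forward input to the layer above, Wrec connects neurons within
% a layer (horizontal/recurrent), Wfb carries feedback from above.
n1 = 8; n2 = 4;                 % neurons in layers 1 and 2
Wff  = randn(n2, n1) * 0.10;    % layer 1 -> layer 2 (feed-forward)
Wrec = randn(n1, n1) * 0.05;    % within layer 1 (recurrent)
Wfb  = randn(n1, n2) * 0.05;    % layer 2 -> layer 1 (feedback)
x  = rand(n1, 1);               % external input drive to layer 1
r1 = max(0, x);                 % layer 1 firing rates (rectified)
for step = 1:5                  % iterate the loop for a few steps
    r2 = max(0, Wff * r1);
    r1 = max(0, x + Wrec * r1 + Wfb * r2);
end
```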
Through a lot of anatomical work, we have begun to elucidate some of the basic connectivity between neurons in the cortex. This is the primary example, cited extremely often, of what we understand about the connectivity between different areas in the macaque monkey. We don't have a diagram like this for the human brain; most of the detailed anatomical work has been done in macaque monkeys. Each of these boxes represents a brain area, and this encapsulates our understanding of who talks to whom, or which area talks to which other area, in visual cortex. There are a lot of different parts of cortex that represent visual information. Here at the bottom, we have the retina. Information from the retina flows through to the LGN. From the LGN, information goes to primary visual cortex, sitting right here. And from there, there's a cascade, largely parallel and at the same time hierarchical, of a conglomerate of multiple areas that are fundamental in processing visual information. We'll talk about some of these areas next, and also this afternoon, when Jim discusses the fundamental computations involved in visual object recognition.

One of the fundamental clues as to how we know that a given region is a visual area, and that it is important for vision, has come from anatomical lesions, mostly in monkeys, but in some cases in humans as well. If you make lesions in some of these areas, depending on exactly where you make that lesion, people either become completely blind, or they have a particular scotoma, a chunk of the visual field where they cannot see, or they have higher-order deficits in visual recognition. As an example, the primary visual cortex was discovered by people who were [INAUDIBLE] studying the trajectory of bullets in soldiers during World War I, by discovering that some of those people had a blind part of their visual field, topographically organized depending on the particular trajectory of the bullet through their occipital cortex. And that's how we came to think about V1 as fundamental in visual processing.

It is not a perfect hierarchy. It's not that there is A, then B, then C, then D. Right?
For a number of reasons. One is that there are lots of parallel connections; there are lots of different stages that are connected to each other. One of the ways to define a hierarchy is by looking at the timing of the responses in different areas. If you look at the average latency of the response in each of these areas, you'll find that there's an approximate hierarchy. Information gets out of the retina at approximately 50 milliseconds, reaches the LGN at about 60 or so milliseconds, and so on. So there's approximately a 10-millisecond cost per step in terms of the average latency. However, if you start looking at the distributions, you'll see that it's not a strict hierarchy. For example, the early neurons in area V4 may fire before the late neurons in V1. And that shows you that the circuitry is far more complex than just a simple hierarchy.

One way to put some order into this seemingly complex and chaotic circuitry, one simplification, is that there are two main pathways: the so-called what pathway and the so-called where pathway. The what pathway is essentially the ventral pathway; it's mostly involved in object recognition, trying to understand what is there. The dorsal pathway, the where pathway, is mostly involved in motion, and in being able to detect where objects are, stereo, and so on. Again, this is not a strict division, but it's a pretty good approximation that many of us have used in thinking about the fundamental computations in these areas.

Now, we often think about these boxes, but of course, there's a huge amount of complexity within each of them. If we zoom in on one of these areas, we discover that there's a complex hierarchy of computations. There are multiple different layers; the cortex is essentially a six-layer structure. And there are specific rules. People have referred to this as a canonical microcircuitry.
There's a specific set of rules in terms of how information flows from one layer to another within each of these cortical structures. To a first approximation, this canonical circuitry is common to most of these areas. The rules about which layer receives information first, and which layers send information to other areas, are more or less constant throughout the cortical circuitry. This doesn't mean that we understand this circuitry well, or what each of these connections is doing; we certainly don't. But these are initial steps toward deciphering some of this basic biological connectivity, which has fundamental computational implications for visual processing.

So our lab has been very interested in what we call the first-order approximation, or immediate approximation, to visual object recognition: the notion that we can recognize objects very fast, and that this can be explained, essentially, as a bottom-up hierarchical process. Jim DiCarlo is going to talk about this extensively this afternoon, so I'm going to essentially skip that, and jump into more recent work that we've done trying to think about top-down connections. But let me briefly say why we think that the first pass of visual information can be semi-seriously approximated by this purely bottom-up processing. One reason is that at the behavioral level, we can recognize objects very, very fast. There's a series of psychophysical experiments demonstrating that if I show you an object, recognition can happen within about 150 milliseconds or so. We also know that the physiological signals underlying visual object recognition happen very fast. Within about 100 to 150 milliseconds, we can find neurons that show very selective responses to complex objects, and again, you'll see examples of that this afternoon.
The behavior and the physiology have inspired generations of computational models that are purely bottom-up, with no recurrence, and that can be quite successful at visual recognition. To a first approximation, the recent excitement about deep convolutional networks can be traced back to some of these ideas, and to some of these basic, biologically inspired computations that are purely bottom-up. So to summarize, and I'm not going to give any more details: we think that the first 100 milliseconds or so of visual processing can be approximated by this purely bottom-up, semi-hierarchical sequence of computations.

And this leaves open a fundamental question, which is: why do we have all these massive feedback connections? We know that in cortex, there are actually more recurrent and feedback connections than feed-forward ones. What I'd like to talk about today is a couple of ideas about what all of those feedback connections may be doing. This is an anatomical study looking at a lot of the boxes that I showed you before, and showing how many of the connections to any given area come from each of the other areas. For example, if we take just primary visual cortex, this is saying that a good fraction of the connections to primary visual cortex actually come from V2, that is, from the next stage of processing, rather than from V1 itself. All in all, if you quantify, for a given neuron in V1, how many signals are coming from a bottom-up source, that is, from the LGN, versus how many signals are coming from other V1 neurons or from higher visual areas, it turns out that there are more horizontal and top-down projections than bottom-up ones. So what are they doing? If we can approximate the first 100 milliseconds or so of vision so well with bottom-up hierarchies, what are all these feedback signals doing?
This brings me to three examples that I'd like to discuss today of recent work that we've done to develop some initial principles for thinking about what these feedback connections could be doing in visual recognition. I'll start with an example of trying to understand the basic, fundamental unit of feedback, these canonical computations, by looking at the feedback that happens from V2 to V1 in the visual system. Next, I'm going to give you an example of what happens during visual search, where we also think that feedback signals may be playing a fundamental role: a Where's Waldo kind of task, where you have to search for objects in the environment. And finally, I will talk about pattern completion, how you can recognize objects that are heavily occluded, where we also think that feedback signals may be playing an important role.

Before I go on to describe what we're seeing the feedback from V2 to V1 may be doing, let me describe very quickly the classical work that Hubel and Wiesel did, which got them the Nobel Prize, recording the activity of neurons in primary visual cortex. They started working in kittens, and then subsequently in monkeys, and discovered that there are neurons that show orientation tuning, meaning that they respond very vigorously to bars of a particular orientation. These are spikes; each of these marks corresponds to an action potential, the fundamental language of computation in cortex. This neuron responds quite vigorously when the cat is seeing a bar of this orientation, and essentially, there's no firing at all with this type of stimulus in the receptive field. This was fundamental because it transformed our understanding of the essential computations in primary visual cortex in terms of filtering the initial stimulus. This is what we now describe with Gabor functions.
And if you look at deep convolutional networks, many of them, if not perhaps all of them, start with some sort of filtering operation, either Gabor filters or something that resembles this type of orientation selectivity, which we think is a fundamental aspect of how we start to process information in the visual field.

One of the beautiful things that Hubel and Wiesel did was not only to make these discoveries, but also to come up with very simple graphical models of how they thought this could come about. And this remains today one of the fundamental ways in which we think about how orientation tuning may come about. If you record the activity of neurons in the retina or in the LGN, you'll find what are called center-surround receptive fields. These are circularly symmetric receptive fields, with an area in the center that excites the neuron, and an area in the surround that inhibits the neuron. What they conjectured is that if you put together multiple LGN cells whose receptive fields are aligned along a certain orientation, and you simply combine all of them, you simply add the responses of all of those neurons, you can get a neuron in primary visual cortex that has orientation tuning. This is a problem that's far from solved, despite the four or five decades we've had to work on it; there are many, many models of how orientation tuning comes about. But this remains one of the basic bottom-up, feed-forward ideas of how you can actually build orientation tuning from very simple receptive fields. It has informed a lot of our thinking about how basic computations can give rise to orientation tuning in a purely bottom-up fashion.
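As an illustration of that conjecture, here is a small MATLAB sketch that sums center-surround (difference-of-Gaussians) subunits along an axis to produce an elongated, oriented receptive field. The sizes, spacings, and weights are made-up illustrative values:

```matlab
% Sketch: an oriented "simple cell" receptive field as a sum of
% center-surround (difference-of-Gaussians) LGN-like subunits whose
% centers are aligned along a 45-degree axis.
[x, y] = meshgrid(-20:20, -20:20);
dog = @(cx, cy) exp(-((x-cx).^2 + (y-cy).^2)/(2*2^2)) ...
              - 0.5*exp(-((x-cx).^2 + (y-cy).^2)/(2*4^2));
rf = zeros(size(x));
for d = -10:5:10                   % subunit centers spaced along the axis
    rf = rf + dog(d*cosd(45), d*sind(45));
end
imagesc(rf); axis image; colormap gray;
title('Summed LGN-like subunits yield an oriented receptive field');
```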
In primary visual cortex, in addition to the so-called simple cells, there are complex cells that show invariance to the exact position or the exact phase of the oriented bar within the receptive field. That's illustrated here. This is a simple cell: it has orientation tuning, meaning that it responds more vigorously to this orientation than to this one. However, if you change the phase or the position of the oriented bar within the receptive field, the response decreases significantly. Contrast that with this complex cell, which not only has orientation tuning, meaning that it fires more vigorously to this orientation than to this one, but also has phase invariance, meaning that the response is more or less the same regardless of the exact phase or the exact position of the stimulus within the receptive field. And again, the notion that they postulated is that we can build these complex cells by a summation of the activity of multiple simple cells. If you imagine that you have multiple simple cells whose receptive fields are centered at these different positions, you can add them up and create complex cells. These fundamental operations of simple and complex cells in primary visual cortex can be traced to the roots of a lot of the bottom-up hierarchical models. A lot of the deep convolutional networks today essentially have variations on these themes: filtering steps, nonlinear computations that give you invariance, and a concatenation of these filtering and invariance steps along the visual hierarchy.
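A minimal MATLAB sketch of that second step: a "complex cell" response obtained by pooling over simple cells that share an orientation but differ in phase. The Gabor parameters and the random stimulus patch are invented; the max pooling here follows the HMAX-style models discussed later, whereas a sum of squared responses (an energy model) is another common choice:

```matlab
% Sketch: phase-invariant "complex cell" response as the max over
% simple cells with the same orientation but different phases.
[x, y] = meshgrid(-10:10, -10:10);
theta = 45; lambda = 8; sigma = 3;         % orientation, wavelength, width
xr = x*cosd(theta) + y*sind(theta);
gabor = @(phase) exp(-(x.^2 + y.^2)/(2*sigma^2)) ...
               .* cos(2*pi*xr/lambda + phase);
stim = double(rand(21) > 0.5);             % stand-in stimulus patch
phases = [0 pi/2 pi 3*pi/2];
r = zeros(size(phases));
for k = 1:numel(phases)
    r(k) = sum(sum(gabor(phases(k)) .* stim));  % simple-cell responses
end
complexResponse = max(abs(r));             % invariant to the exact phase
```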
Following up on this idea, I would like to understand the basics of what kind of information is provided by signals from V2 to V1. To do that, we have been collaborating with Richard Born at Harvard Medical School, who has a way of implanting cryoloops. This is a device that can be implanted in monkeys over areas V2 and V3 to lower the temperature, and thus reduce or essentially eliminate activity in areas V2 and V3. That means we can study V1 without activity in areas V2 and V3; we can study V1 sans feedback.

So this is an example of recordings from a neuron in this area. This is the normal activity that you get from the neuron; here is when they present a visual stimulus; this is spontaneous activity. Each of these dots corresponds to a spike, and each of these lines corresponds to a repetition of the stimulus. This is the traditional way of showing raster plots of neuronal responses. So you see the spontaneous activity, you present the stimulus, and there's an increase in the response of this neuron, as you might expect. Actually, I'm sorry, this actually starts here: this is the spontaneous activity, and this is the response. Now here, they turn on their pump and start lowering the temperature. And you see that within a couple of minutes, they significantly reduce the responses; they largely (not completely, but largely) silence activity in areas V2 and V3. And this is reversible, so when they turn the pumps off, activity comes back. So the question is, what happens in primary visual cortex when you don't have feedback from V2 and V3?

The first thing they characterized is that some of the basic properties of V1 do not change. This is consistent with the simple models that I just told you about, where orientation tuning in primary visual cortex is largely dictated by the bottom-up inputs, by the signals from the LGN. The conjecture from that would be that if you silence V2 and V3, nothing would happen to orientation tuning in primary visual cortex. And that's essentially what they're showing here. These are example neurons. This is showing orientation selectivity, and this is showing direction selectivity: what happens when you move an oriented bar within the receptive field. So this axis is showing the direction, and this one the mean normalized response of a neuron. This is the preferred direction, the direction or orientation that gives the maximum response.
The blue curve corresponds to when you don't have activity in V2 and V3; red corresponds to the control data. And essentially, the tuning of the neuron was not altered; the orientation preferred by this neuron was not altered. The same thing goes for direction selectivity. So the basic properties of orientation tuning and direction selectivity did not change.

Let me say a few words about the dynamics of the responses. Here, what I'm showing you is the mean normalized responses as a function of time. Time 0 is when the stimulus is turned on. As I told you already, by about 50 milliseconds or so, you get a vigorous response in primary visual cortex. And if we compare the orange and the blue curves, we see that this initial response is largely identical. So the initial response of these V1 neurons is not affected by the absence of feedback from V2. We start to see effects, a change in the firing rate, largely at about 60 milliseconds or so after stimulus presentation. So in a highly oversimplified cartoon, I think of this as a bottom-up, Hubel-and-Wiesel-like response driven by the LGN, with signals from V2 to V1 coming back about 10 milliseconds later; that's when we start seeing some of these feedback-related effects.

I told you that some of the basic properties do not change; we interpret those as being dictated largely by bottom-up signals. The dynamics do change: the initial response is unaffected, while the later part of the response is affected. Now I want to show one thing that does change, and for that, I need to explain what an area summation curve is. If you present a stimulus of this size within the receptive field of a neuron, you get a certain response. As you start increasing the size of the stimulus, you get a more vigorous response. Size matters; the larger, the better, up to a point. There comes a point where the response of the neuron starts decreasing again.
So larger is not always better. A little bit larger is better, but a stimulus of this size has an overall inhibitory effect on the response of the neuron. This is called surround suppression. These curves have been characterized in areas like primary visual cortex, and also in earlier areas, for a very long time. It turns out that when you do this type of experiment in the absence of feedback, the effect of surround suppression does not disappear; that is, you still have a peak in the response as a function of stimulus size. But there is a reduced amount of surround suppression. When you don't have feedback, there's less suppression: you have a larger response to bigger stimuli. So we think that one of the fundamental computations that feedback is providing here is an integration, happening in V2, over multiple neurons in V1, followed by inhibition of the activity of neurons in area V1 to provide some of the suppression. This is partly the reason why our neurons are not very excited about a uniform stimulus, like a blank wall. Our neurons are interested in changes, and part of that, we think, is dictated by this feedback from V2 to V1.

We can model these center-surround interactions as a ratio of two Gaussian curves, two forces: one that increases the response, and a normalization term that suppresses the response when the stimulus is too large. There are a number of parameters here, but essentially, you can think of this as a ratio of Gaussians (ROG): one Gaussian dictating the center response, and the other the surround response. And to make a long story short, we can fit the data from the monkey with this extremely simple ratio-of-Gaussians model, and we can show that the main parameter that feedback seems to be acting upon is what we call Wn, that is, this normalization factor here.
So this tuning factor dictates the strength of the surround division from V2 to V1, and we think that's one of the fundamental things being affected by feedback. We wouldn't think of this as the gain; we think of it as the spatial extent over which V2 can exert its action on primary visual cortex. We think that's the main thing being affected here.

This type of spatial effect may be important in another role that has been ascribed to feedback, which is the ability to direct attention to specific locations in the environment. I want to come back to this question, and ask under what conditions, and how, feedback can also provide feature-specific signals from one area to another. For that, I'm going to switch to another task, a completely different prep, which is the Where's Waldo task, the task of visual search: how do we search for particular objects in the environment? Here, it's not sufficient to focus on a specific location; we need to be able to search for specific features. We need to be able to bias our visual responses toward specific features of the stimulus that we're searching for. In a famous Where's Waldo task like this one, you need to be able to search for specific features. It's not enough to send feedback from V2 to V1 and change the sizes of the receptive fields, or direct attention to a specific location. Another version that I'm not going to talk about, with a theme related to visual search, is feature-based attention, where you're paying attention to a particular face, a particular color, a particular feature that is not necessarily localized in space, as our friend here has studied quite extensively. People always like to know the answer of where he is. OK.
752 00:31:29,210 --> 00:31:31,550 So let me tell you about a computational model 753 00:31:31,550 --> 00:31:34,280 and some behavioral data that we have collected 754 00:31:34,280 --> 00:31:38,300 to try to get at this question of how feedback signals can 755 00:31:38,300 --> 00:31:40,640 be relevant for visual search. 756 00:31:40,640 --> 00:31:45,290 The initial part of this computational model 757 00:31:45,290 --> 00:31:48,940 is essentially the HMAX type of architecture 758 00:31:48,940 --> 00:31:52,100 that has been pioneered by Tommy Poggio and several people 759 00:31:52,100 --> 00:31:55,040 in his lab, most notably, people like Max Riesenhuber 760 00:31:55,040 --> 00:31:56,510 and Thomas Serre. 761 00:31:56,510 --> 00:31:58,340 I was thinking that by this time, 762 00:31:58,340 --> 00:32:00,620 people would have described this in more detail. 763 00:32:00,620 --> 00:32:02,450 I'm going to go through this very quickly. 764 00:32:02,450 --> 00:32:04,280 Again, today in the afternoon, we'll 765 00:32:04,280 --> 00:32:07,320 have more discussion about this family of models. 766 00:32:07,320 --> 00:32:09,350 So this family of models essentially 767 00:32:09,350 --> 00:32:12,830 goes through a series of linear and non-linear computations 768 00:32:12,830 --> 00:32:16,730 in a hierarchical way, inspired by the basic definitions 769 00:32:16,730 --> 00:32:18,830 of simple and complex cells that I 770 00:32:18,830 --> 00:32:22,370 described in the work of Hubel and Wiesel. 771 00:32:22,370 --> 00:32:26,040 So basically, what these models do is they take an image. 772 00:32:26,040 --> 00:32:27,590 These are pixels. 773 00:32:27,590 --> 00:32:28,760 There's a filtering step. 774 00:32:28,760 --> 00:32:32,294 This filtering step involves Gabor filtering of the image. 775 00:32:32,294 --> 00:32:33,710 In this particular case, there are 776 00:32:33,710 --> 00:32:36,020 four different orientations. 777 00:32:36,020 --> 00:32:37,880 And what you get here is a map 778 00:32:37,880 --> 00:32:42,470 of the visual input after this linear filtering process. 779 00:32:42,470 --> 00:32:46,700 The next step in this model is a local max operation. 780 00:32:46,700 --> 00:32:50,180 This pools neurons that have identical feature 781 00:32:50,180 --> 00:32:53,360 preferences, but slightly different scales 782 00:32:53,360 --> 00:32:54,680 in their receptive fields, 783 00:32:54,680 --> 00:32:57,530 or slightly different positions in their receptive fields. 784 00:32:57,530 --> 00:33:00,410 And this max operation, this non-linear operation, 785 00:33:00,410 --> 00:33:03,440 is what gives you invariance. 786 00:33:03,440 --> 00:33:07,070 So now you can get a response to the same feature, irrespective 787 00:33:07,070 --> 00:33:09,770 of the exact scale or the exact position 788 00:33:09,770 --> 00:33:12,080 within the receptive field. 789 00:33:12,080 --> 00:33:15,165 These were labeled S1 and C1, initially 790 00:33:15,165 --> 00:33:16,650 in models by Fukushima. 791 00:33:16,650 --> 00:33:19,130 And this type of nomenclature was carried on later 792 00:33:19,130 --> 00:33:21,030 by Tommy and many others. 793 00:33:21,030 --> 00:33:24,410 And this is directly inspired by the simple and complex cells 794 00:33:24,410 --> 00:33:26,840 that I very briefly showed you previously 795 00:33:26,840 --> 00:33:29,660 in the recordings of Hubel and Wiesel.
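For intuition about those S1/C1 stages, here is a minimal sketch, assuming only NumPy and SciPy: a small Gabor filter bank (the S1-like linear step) followed by local max pooling over position (a C1-like non-linearity). The filter parameters and pool size are arbitrary choices for illustration, not the values used in HMAX.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, size=11, sigma=2.5, wavelength=5.0):
    """Odd-phase Gabor filter at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.sin(2 * np.pi * xr / wavelength)

def s1_layer(image, n_orientations=4):
    """S1-like step: Gabor filtering at several orientations."""
    thetas = [i * np.pi / n_orientations for i in range(n_orientations)]
    return [np.abs(convolve2d(image, gabor_kernel(t), mode='same'))
            for t in thetas]

def c1_layer(s1_maps, pool=4):
    """C1-like step: local max pooling for position tolerance."""
    pooled = []
    for m in s1_maps:
        h, w = m.shape[0] // pool, m.shape[1] // pool
        m = m[:h * pool, :w * pool].reshape(h, pool, w, pool)
        pooled.append(m.max(axis=(1, 3)))   # max over each pool x pool block
    return pooled

image = np.random.rand(64, 64)          # stand-in for an input image
c1 = c1_layer(s1_layer(image))
print([m.shape for m in c1])            # four 16x16 orientation maps
```

The max over a neighborhood is the key non-linearity here: the C1 response is preserved when a preferred feature shifts slightly within the pool, which is the position tolerance described above.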
796 00:33:29,660 --> 00:33:32,090 These filtering and max operations 797 00:33:32,090 --> 00:33:35,150 are repeated throughout the hierarchy again and again. 798 00:33:35,150 --> 00:33:37,160 So here's another layer that has a filtering 799 00:33:37,160 --> 00:33:40,850 step and a nonlinear max step. 800 00:33:40,850 --> 00:33:44,690 In this case, this filtering here is not a Gabor filter. 801 00:33:44,690 --> 00:33:47,870 We don't really understand very well what neurons in V2 and V4 802 00:33:47,870 --> 00:33:48,590 are doing. 803 00:33:48,590 --> 00:33:51,020 One of the types of filters that have been used, 804 00:33:51,020 --> 00:33:54,380 and that we are using here, is a radial basis function, 805 00:33:54,380 --> 00:33:56,630 where the properties of a neuron in this case 806 00:33:56,630 --> 00:34:02,150 are dictated by patches taken randomly from natural images. 807 00:34:02,150 --> 00:34:04,430 All of this is purely feed-forward. 808 00:34:04,430 --> 00:34:06,928 All of this is essentially the basic ingredient 809 00:34:06,928 --> 00:34:08,469 of the type of convolutional networks 810 00:34:08,469 --> 00:34:11,449 that have been used for object recognition. 811 00:34:11,449 --> 00:34:13,190 You can have more layers. 812 00:34:13,190 --> 00:34:15,199 You can have different types of computations. 813 00:34:15,199 --> 00:34:17,179 The basic properties are essentially 814 00:34:17,179 --> 00:34:19,370 the ones that are described briefly here. 815 00:34:19,370 --> 00:34:21,920 What I really want to talk about is not the feed-forward part, 816 00:34:21,920 --> 00:34:23,210 but this part of the model. 817 00:34:23,210 --> 00:34:26,550 Now if I ask you, where's Waldo, you need to do something: 818 00:34:26,550 --> 00:34:29,510 you need to be able to somehow look at this information, 819 00:34:29,510 --> 00:34:32,300 and be able to bias your responses, 820 00:34:32,300 --> 00:34:36,949 or bias the model, towards regions of the visual space 821 00:34:36,949 --> 00:34:39,949 that have features that resemble what you're looking for. 822 00:34:39,949 --> 00:34:42,949 Your car, your keys, Waldo. 823 00:34:42,949 --> 00:34:45,260 So the way we do that is, first, in this case, 824 00:34:45,260 --> 00:34:47,093 I'm going to show you what happens if you're 825 00:34:47,093 --> 00:34:48,780 looking for the top hat here. 826 00:34:48,780 --> 00:34:50,210 So first, we have a representation 827 00:34:50,210 --> 00:34:51,920 in the model of the top hat. 828 00:34:51,920 --> 00:34:53,350 This is the hat here. 829 00:34:53,350 --> 00:34:56,360 And we have a representation in our vocabulary 830 00:34:56,360 --> 00:34:59,390 of how units in the highest echelons of this model 831 00:34:59,390 --> 00:35:00,290 represent this hat. 832 00:35:00,290 --> 00:35:02,750 So we have a representation of the features 833 00:35:02,750 --> 00:35:07,580 that compose this object at a high level in this model. 834 00:35:07,580 --> 00:35:10,490 We use that representation to modulate, 835 00:35:10,490 --> 00:35:13,490 in a multiplicative fashion, the entire image. 836 00:35:13,490 --> 00:35:15,500 Essentially, we bias the responses 837 00:35:15,500 --> 00:35:19,190 in the entire image based on the particular features 838 00:35:19,190 --> 00:35:21,560 that we are searching for.
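Here is a minimal sketch of that multiplicative, feature-based bias, under the assumption that the image has already been encoded as a stack of feature maps (for example, the C1 maps from the earlier sketch) and that the target is summarized by one weight per feature; the names feature_maps and target_weights are illustrative, not the model's actual variables.

```python
import numpy as np

def feature_modulation(feature_maps, target_weights):
    """Multiplicatively bias every location by how strongly its
    features match the target's feature profile.

    feature_maps:   array of shape (n_features, H, W)
    target_weights: array of shape (n_features,), the target object's
                    representation in the same feature vocabulary
    Returns a single (H, W) map: high where target-like features occur.
    """
    biased = feature_maps * target_weights[:, None, None]  # broadcast weights
    return biased.sum(axis=0)                              # pool over features

# Hypothetical usage with random stand-ins
maps = np.random.rand(4, 16, 16)         # e.g., 4 orientation maps
target = np.array([0.1, 0.2, 1.5, 0.1])  # target dominated by feature 3
attention_map = feature_modulation(maps, target)
```

Note that the same weights multiply every spatial location, which is exactly the space-spanning property of feature-based modulation discussed next.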
839 00:35:21,560 --> 00:35:24,170 This is inspired by many physiological experiments that 840 00:35:24,170 --> 00:35:27,680 have shown, to a good approximation, 841 00:35:27,680 --> 00:35:29,270 that this type of modulation in feature- 842 00:35:29,270 --> 00:35:32,750 based attention occurs across different parts 843 00:35:32,750 --> 00:35:33,590 of the visual field. 844 00:35:33,590 --> 00:35:36,080 That is, if you're searching for red objects, 845 00:35:36,080 --> 00:35:39,152 neurons that like red will enhance their response 846 00:35:39,152 --> 00:35:40,610 throughout the entire visual field. 847 00:35:40,610 --> 00:35:43,520 So we have the entire visual field modulated 848 00:35:43,520 --> 00:35:48,770 by the pattern of features that we're searching for here. 849 00:35:48,770 --> 00:35:52,150 After that, we have a normalization step. 850 00:35:52,150 --> 00:35:54,270 This normalization step is critical 851 00:35:54,270 --> 00:35:57,300 in order to discount purely bottom-up effects. 852 00:35:57,300 --> 00:35:59,870 We don't want the competition between different objects 853 00:35:59,870 --> 00:36:03,140 to be purely dictated by which object is brighter, 854 00:36:03,140 --> 00:36:03,770 for example. 855 00:36:03,770 --> 00:36:06,680 So we normalize after modulating 856 00:36:06,680 --> 00:36:09,650 with the features that we are searching for. 857 00:36:09,650 --> 00:36:13,340 That gives us a map of the image, where each area has been 858 00:36:13,340 --> 00:36:15,560 essentially compared to this feature set 859 00:36:15,560 --> 00:36:16,910 that we're looking for. 860 00:36:16,910 --> 00:36:19,190 And then we have a winner-take-all mechanism that 861 00:36:19,190 --> 00:36:21,680 dictates where the model will pay attention, 862 00:36:21,680 --> 00:36:23,840 or where the model will fixate first-- 863 00:36:23,840 --> 00:36:27,800 where the model thinks that a particular object is located. 864 00:36:27,800 --> 00:36:30,920 OK, so what happens when we have this feedback that's 865 00:36:30,920 --> 00:36:33,680 feature-specific, and that modulates the responses based 866 00:36:33,680 --> 00:36:35,900 on the target object that we're searching for. 867 00:36:35,900 --> 00:36:38,120 In these two images, either in object 868 00:36:38,120 --> 00:36:42,180 arrays or when objects are embedded in complex scenes, 869 00:36:42,180 --> 00:36:44,300 we're searching for this top hat. 870 00:36:44,300 --> 00:36:46,730 And the largest response in the model 871 00:36:46,730 --> 00:36:50,090 is indeed at the location where the object is. 872 00:36:50,090 --> 00:36:52,070 In these other two images, the model 873 00:36:52,070 --> 00:36:54,410 is searching for this accordion here. 874 00:36:54,410 --> 00:36:56,570 And again, the model was able to find that 875 00:36:56,570 --> 00:37:00,110 by this comparison of the features with the stimulus. 876 00:37:00,110 --> 00:37:03,420 More generally, these are object array images. 877 00:37:03,420 --> 00:37:06,170 This is the number of fixations required 878 00:37:06,170 --> 00:37:08,960 to find the object in these object array images. 879 00:37:08,960 --> 00:37:11,330 So one would correspond to the first fixation. 880 00:37:11,330 --> 00:37:14,670 If the model does not find the object in the first location, 881 00:37:14,670 --> 00:37:16,520 there's what's called inhibition of return.
882 00:37:16,520 --> 00:37:18,470 So we make sure the model does not come back 883 00:37:18,470 --> 00:37:20,600 to the same location, and the model will 884 00:37:20,600 --> 00:37:24,230 look at the second best possible location in the image. 885 00:37:24,230 --> 00:37:27,870 And it will keep on searching until it finds the object. 886 00:37:27,870 --> 00:37:31,460 So the model performs at 60% correct on the first fixation. 887 00:37:31,460 --> 00:37:33,380 And eventually, after five fixations, 888 00:37:33,380 --> 00:37:37,160 it can find the object almost always, right here. 889 00:37:37,160 --> 00:37:39,560 This is what you would expect from random search-- 890 00:37:39,560 --> 00:37:42,020 if you were to randomly fixate on different objects-- 891 00:37:42,020 --> 00:37:44,130 so the model is doing much better than that. 892 00:37:44,130 --> 00:37:45,830 And then for the aficionados, there's 893 00:37:45,830 --> 00:37:48,980 a whole plethora of purely bottom-up models that 894 00:37:48,980 --> 00:37:50,882 don't have feedback whatsoever. 895 00:37:50,882 --> 00:37:52,340 This is a family of models that was 896 00:37:52,340 --> 00:37:54,920 pioneered by people like Laurent Itti and Christof Koch. 897 00:37:54,920 --> 00:37:56,812 These are saliency-based models. 898 00:37:56,812 --> 00:37:59,270 Although you cannot see them, there are a couple of other points 899 00:37:59,270 --> 00:37:59,990 in here. 900 00:37:59,990 --> 00:38:03,080 None of those models can find the object either. 901 00:38:03,080 --> 00:38:05,720 It's not that these objects that we're searching for 902 00:38:05,720 --> 00:38:07,550 are more salient, and therefore, that's 903 00:38:07,550 --> 00:38:09,290 why the model is finding them. 904 00:38:09,290 --> 00:38:11,340 We really need something more than just 905 00:38:11,340 --> 00:38:13,460 pure bottom-up saliency. 906 00:38:13,460 --> 00:38:14,960 We did a psychophysical experiment. 907 00:38:14,960 --> 00:38:17,690 We asked, well, this is how the model searches for Waldo. 908 00:38:17,690 --> 00:38:19,220 How do humans search for objects 909 00:38:19,220 --> 00:38:20,910 under the same conditions. 910 00:38:20,910 --> 00:38:22,530 So we had multiple objects. 911 00:38:22,530 --> 00:38:24,980 Subjects had to make a saccade to a target object. 912 00:38:24,980 --> 00:38:27,770 To make a long story short, this is the cumulative performance 913 00:38:27,770 --> 00:38:30,110 of the model versus the number of fixations 914 00:38:30,110 --> 00:38:32,420 under these conditions, and the model 915 00:38:32,420 --> 00:38:35,780 is reasonable in terms of how well humans do. 916 00:38:35,780 --> 00:38:39,867 This is data from every single individual subject in the task. 917 00:38:39,867 --> 00:38:41,450 I'm going to skip some of the details. 918 00:38:41,450 --> 00:38:45,110 You can compare the errors that the model is making, 919 00:38:45,110 --> 00:38:47,960 how consistent people are with themselves and with respect 920 00:38:47,960 --> 00:38:48,710 to other subjects, 921 00:38:48,710 --> 00:38:50,960 and how good the model is with respect to humans. 922 00:38:50,960 --> 00:38:53,630 The long and short of it is that the model is far from perfect. 923 00:38:53,630 --> 00:38:55,280 We don't think that we have captured 924 00:38:55,280 --> 00:38:58,010 everything we need to understand about visual search.
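Putting the search pieces together, here is a minimal sketch of the loop just described: normalize the modulated map, pick the winner-take-all location, and apply inhibition of return until the target is found. The suppression radius, the stopping rule, and the attention_map input (for example, from the earlier sketch) are illustrative assumptions, not the published model's parameters.

```python
import numpy as np

def search(attention_map, target_location, radius=2, max_fixations=10):
    """Winner-take-all search with inhibition of return.

    attention_map:   (H, W) map of target-feature evidence
    target_location: (row, col) of the true target, used to stop the loop
    Returns the list of fixations made until the target is found.
    """
    amap = attention_map / attention_map.max()   # crude normalization
    fixations = []
    for _ in range(max_fixations):
        r, c = np.unravel_index(np.argmax(amap), amap.shape)  # WTA winner
        fixations.append((r, c))
        if abs(r - target_location[0]) <= radius and \
           abs(c - target_location[1]) <= radius:
            break                                 # target fixated
        # inhibition of return: suppress the visited neighborhood
        r0, c0 = max(r - radius, 0), max(c - radius, 0)
        amap[r0:r + radius + 1, c0:c + radius + 1] = -np.inf
    return fixations

# Hypothetical usage
amap = np.random.rand(16, 16)
amap[10, 5] += 2.0                      # strong evidence at the target
print(search(amap, target_location=(10, 5)))
```

Each pass fixates the current best location and then rules it out, so the model visits the second-best location next, exactly as in the description above.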
925 00:38:58,010 --> 00:39:00,500 Some people alluded before, for example, to the notion 926 00:39:00,500 --> 00:39:03,080 that the model doesn't have these major changes 927 00:39:03,080 --> 00:39:05,870 with eccentricity, the fovea, and so on. 928 00:39:05,870 --> 00:39:07,640 There's a long way to go, but we think that we've 929 00:39:07,640 --> 00:39:10,280 captured some of the essential initial ingredients 930 00:39:10,280 --> 00:39:11,240 of visual search, 931 00:39:11,240 --> 00:39:15,110 and that this is one example of how visual feedback signals can 932 00:39:15,110 --> 00:39:18,530 influence this bottom-up hierarchy for recognition. 933 00:39:18,530 --> 00:39:21,320 I want to very quickly move on to a third example 934 00:39:21,320 --> 00:39:24,410 that I wanted to give you of how feedback can help 935 00:39:24,410 --> 00:39:25,790 in terms of visual recognition. 936 00:39:25,790 --> 00:39:28,870 What are other functions that feedback could be playing. 937 00:39:28,870 --> 00:39:31,700 And for that, I'd like to discuss the work that Hanlin 938 00:39:31,700 --> 00:39:34,190 did here, and also Bill Lotter in the lab, 939 00:39:34,190 --> 00:39:36,440 in terms of how we can recognize objects 940 00:39:36,440 --> 00:39:37,790 that are partially occluded. 941 00:39:37,790 --> 00:39:39,170 This happens all the time. 942 00:39:39,170 --> 00:39:42,080 So you walk around and see objects in the world. 943 00:39:42,080 --> 00:39:44,300 You can also encounter objects where you only 944 00:39:44,300 --> 00:39:45,930 have partial information, and you have 945 00:39:45,930 --> 00:39:47,400 to perform pattern completion. 946 00:39:47,400 --> 00:39:49,340 Pattern completion is a fundamental aspect 947 00:39:49,340 --> 00:39:50,120 of intelligence. 948 00:39:50,120 --> 00:39:52,520 We do that in all sorts of scenarios. 949 00:39:52,520 --> 00:39:54,440 It's not just restricted to vision. 950 00:39:54,440 --> 00:39:57,350 All of you can probably complete all of these patterns. 951 00:39:57,350 --> 00:39:59,410 We use pattern completion in social scenarios 952 00:39:59,410 --> 00:40:00,330 as well, right? 953 00:40:00,330 --> 00:40:02,152 You make inferences from partial knowledge 954 00:40:02,152 --> 00:40:04,110 about people's intentions, and what they're doing, 955 00:40:04,110 --> 00:40:06,120 and what they're trying to do, OK? 956 00:40:06,120 --> 00:40:09,340 So we want to study this problem of how you complete patterns, 957 00:40:09,340 --> 00:40:12,510 how you extrapolate from partial, limited information, 958 00:40:12,510 --> 00:40:14,885 in the context of visual recognition. 959 00:40:14,885 --> 00:40:16,260 There are a lot of different ways 960 00:40:16,260 --> 00:40:19,640 in which one can present partially occluded objects. 961 00:40:19,640 --> 00:40:21,120 Here are just a few of them. 962 00:40:21,120 --> 00:40:23,490 What Hanlin did was use a paradigm called 963 00:40:23,490 --> 00:40:25,579 bubbles that's shown here. 964 00:40:25,579 --> 00:40:27,870 Essentially, it's like looking at the world like this. 965 00:40:27,870 --> 00:40:29,850 You only have small windows through which 966 00:40:29,850 --> 00:40:31,140 you can see the object. 967 00:40:31,140 --> 00:40:34,759 Performance can be titrated to make the task harder or easier. 968 00:40:34,759 --> 00:40:36,300 So if you have a lot of bubbles, it's 969 00:40:36,300 --> 00:40:39,750 relatively easy to recognize that this is a toy school bus.
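For intuition about the bubbles paradigm, here is a minimal sketch of how such stimuli can be generated: the image is revealed only through a few Gaussian windows placed at random locations. The bubble count and width are the knobs that titrate difficulty; the specific values here are arbitrary, not those used in Hanlin's experiments.

```python
import numpy as np

def bubbles_mask(shape, n_bubbles=4, sigma=8.0, rng=None):
    """Build a [0, 1] mask of n Gaussian 'bubbles' at random locations."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape)
    for _ in range(n_bubbles):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        mask += np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2))
    return np.clip(mask, 0, 1)

image = np.random.rand(128, 128)     # stand-in for an object image
occluded_image = image * bubbles_mask(image.shape, n_bubbles=4)
# Fewer or narrower bubbles -> less of the object visible -> harder task
```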
970 00:40:39,750 --> 00:40:41,250 If you have only four bubbles, it's 971 00:40:41,250 --> 00:40:42,820 actually pretty challenging. 972 00:40:42,820 --> 00:40:46,950 So we can titrate the difficulty of this task. 973 00:40:46,950 --> 00:40:50,760 Very quickly, let me start by showing you psychophysics 974 00:40:50,760 --> 00:40:51,540 performance here. 975 00:40:51,540 --> 00:40:54,570 This is how subjects perform as a function 976 00:40:54,570 --> 00:40:58,110 of the amount of occlusion in the image-- as a function of how 977 00:40:58,110 --> 00:41:00,760 many pixels you're showing for these images. 978 00:41:00,760 --> 00:41:04,470 And what you see here is that with 60% occlusion, 979 00:41:04,470 --> 00:41:06,420 performance is extremely high. 980 00:41:06,420 --> 00:41:08,910 Performance essentially drops to chance level 981 00:41:08,910 --> 00:41:10,940 when the object is more and more occluded. 982 00:41:10,940 --> 00:41:13,230 There is a significant amount of robustness 983 00:41:13,230 --> 00:41:14,726 in human performance. 984 00:41:14,726 --> 00:41:16,350 For example, if you have a little bit more 985 00:41:16,350 --> 00:41:18,510 than 10% of the pixels of the object, 986 00:41:18,510 --> 00:41:21,360 people can still recognize it reasonably well. 987 00:41:21,360 --> 00:41:24,280 So this is all behavioral data. 988 00:41:24,280 --> 00:41:27,540 Let me show you very quickly what Hanlin discovered 989 00:41:27,540 --> 00:41:31,110 by doing invasive recordings in human patients 990 00:41:31,110 --> 00:41:32,550 while the subjects were performing 991 00:41:32,550 --> 00:41:36,060 this recognition of objects that are partially occluded. 992 00:41:36,060 --> 00:41:38,610 It's illegal to put electrodes in the human brain 993 00:41:38,610 --> 00:41:41,430 in healthy people, so we work with subjects 994 00:41:41,430 --> 00:41:44,470 who have pharmacologically intractable epilepsy. 995 00:41:44,470 --> 00:41:46,440 So in subjects that have seizures, 996 00:41:46,440 --> 00:41:48,930 the neurosurgeons need to implant electrodes: first, in order 997 00:41:48,930 --> 00:41:51,180 to localize the seizures, 998 00:41:51,180 --> 00:41:54,690 and second, in order to ensure that when they do a resection, 999 00:41:54,690 --> 00:41:56,700 and they take out the part of the brain that's 1000 00:41:56,700 --> 00:41:58,710 responsible for the seizures, they're not 1001 00:41:58,710 --> 00:42:02,380 going to interfere with other functions, such as language. 1002 00:42:02,380 --> 00:42:05,190 These patients stay in the hospital for about one week. 1003 00:42:05,190 --> 00:42:07,620 And during this one week, we have a unique opportunity 1004 00:42:07,620 --> 00:42:11,280 to go inside a human brain, and record physiological data. 1005 00:42:11,280 --> 00:42:12,700 Depending on the type of patient, 1006 00:42:12,700 --> 00:42:15,160 we use different types of electrodes. 1007 00:42:15,160 --> 00:42:17,850 This is what some people refer to as ECoG electrodes-- 1008 00:42:17,850 --> 00:42:19,720 electrocorticographic signals. 1009 00:42:19,720 --> 00:42:21,185 These are field potential signals, 1010 00:42:21,185 --> 00:42:23,310 very different from the spikes that I was showing you 1011 00:42:23,310 --> 00:42:25,250 before.
1012 00:42:25,250 --> 00:42:28,420 These are aggregate measures, probably of tens of thousands, 1013 00:42:28,420 --> 00:42:31,350 if not millions of neurons, where we have very, very 1014 00:42:31,350 --> 00:42:34,350 high temporal resolution at the millisecond level, but very 1015 00:42:34,350 --> 00:42:38,100 poor spatial resolution, only being able to localize things 1016 00:42:38,100 --> 00:42:40,980 at the millimeter level or so. 1017 00:42:40,980 --> 00:42:43,500 With these, we can pinpoint specific locations 1018 00:42:43,500 --> 00:42:45,840 only within approximately one millimeter, 1019 00:42:45,840 --> 00:42:48,750 but we have very high signal-to-noise-ratio signals that are 1020 00:42:48,750 --> 00:42:50,700 dictated by the visual input. 1021 00:42:50,700 --> 00:42:53,426 An example of those signals is shown here. 1022 00:42:53,426 --> 00:42:55,050 These are intracranial field potentials 1023 00:42:55,050 --> 00:42:56,490 as a function of time. 1024 00:42:56,490 --> 00:42:58,250 This is the onset of the stimulus. 1025 00:42:58,250 --> 00:43:00,330 And in these 39 different repetitions, 1026 00:43:00,330 --> 00:43:03,510 when Hanlin is showing this unoccluded face, 1027 00:43:03,510 --> 00:43:06,507 we see a very vigorous change, quite systematic 1028 00:43:06,507 --> 00:43:07,590 from one trial to another. 1029 00:43:07,590 --> 00:43:10,200 All of those gray traces are single trials, 1030 00:43:10,200 --> 00:43:13,470 similar to the raster plot that I was showing you before. 1031 00:43:13,470 --> 00:43:16,740 So now I'm going to show you a couple of single trials. 1032 00:43:16,740 --> 00:43:19,440 We're showing individual images where 1033 00:43:19,440 --> 00:43:20,760 objects are partially occluded. 1034 00:43:20,760 --> 00:43:23,310 In this case, there's only about 15% 1035 00:43:23,310 --> 00:43:26,070 of the pixels of the face that are being shown. 1036 00:43:26,070 --> 00:43:28,170 And we see that despite the fact that we're 1037 00:43:28,170 --> 00:43:31,500 covering 85%, more or less, of that image, 1038 00:43:31,500 --> 00:43:34,170 we still see a pretty consistent physiological signal. 1039 00:43:34,170 --> 00:43:35,950 The signals are clearly not identical. 1040 00:43:35,950 --> 00:43:38,340 For example, this one looks somewhat different. 1041 00:43:38,340 --> 00:43:40,410 There's a lot of variability from one trial to another. 1042 00:43:40,410 --> 00:43:43,230 But again, these are just single trials showing that there still 1043 00:43:43,230 --> 00:43:45,600 is selectivity for this shape, despite the fact 1044 00:43:45,600 --> 00:43:48,700 that we are only showing a small fraction of it. 1045 00:43:48,700 --> 00:43:50,940 These are all the trials in which these five 1046 00:43:50,940 --> 00:43:52,439 different faces were presented. 1047 00:43:52,439 --> 00:43:53,730 Each line corresponds to a trial. 1048 00:43:53,730 --> 00:43:54,930 These are raster plots. 1049 00:43:54,930 --> 00:43:58,150 As you can see, the data are extremely clear. 1050 00:43:58,150 --> 00:43:59,490 There's no processing here. 1051 00:43:59,490 --> 00:44:01,920 These are raw data, single trials. 1052 00:44:01,920 --> 00:44:04,350 These are single trials with the partial images. 1053 00:44:04,350 --> 00:44:06,790 You again can see there's a vigorous response here. 1054 00:44:06,790 --> 00:44:08,920 The responses are not as nicely and neatly 1055 00:44:08,920 --> 00:44:11,250 aligned here, in part because all of these images 1056 00:44:11,250 --> 00:44:12,070 are different.
1057 00:44:12,070 --> 00:44:14,111 All of the bubble locations are different. 1058 00:44:14,111 --> 00:44:17,070 As I just showed you, there's a lot of variability here. 1059 00:44:17,070 --> 00:44:19,870 If you actually fix the bubble locations-- that 1060 00:44:19,870 --> 00:44:22,800 is, you repeatedly present the same image multiple times, still 1061 00:44:22,800 --> 00:44:25,005 in pseudorandom order, but the same image-- 1062 00:44:25,005 --> 00:44:26,880 you see that the signals are more consistent. 1063 00:44:26,880 --> 00:44:29,610 Not as consistent as this one, but certainly more consistent. 1064 00:44:29,610 --> 00:44:33,210 Again, a very clear selective response, 1065 00:44:33,210 --> 00:44:37,380 tolerant to a tremendous amount of occlusion in the image. 1066 00:44:37,380 --> 00:44:39,690 Interestingly, the latency of the response 1067 00:44:39,690 --> 00:44:43,060 is significantly longer compared to the whole images. 1068 00:44:43,060 --> 00:44:45,330 So if you look at, for example, 200 milliseconds, 1069 00:44:45,330 --> 00:44:47,340 you see that the responses start significantly 1070 00:44:47,340 --> 00:44:49,780 before 200 milliseconds for the whole images. 1071 00:44:49,780 --> 00:44:52,339 All of the responses here start after 200 milliseconds. 1072 00:44:52,339 --> 00:44:53,880 We spent a significant amount of time 1073 00:44:53,880 --> 00:44:55,504 trying to characterize this and showing 1074 00:44:55,504 --> 00:44:57,150 that pattern completion, the ability 1075 00:44:57,150 --> 00:44:59,420 to recognize objects that are occluded, 1076 00:44:59,420 --> 00:45:03,410 involves a significant delay at the physiological level. 1077 00:45:03,410 --> 00:45:06,530 If you use the purely bottom-up architecture and try to do 1078 00:45:06,530 --> 00:45:08,300 this in silico, 1079 00:45:08,300 --> 00:45:10,850 this bottom-up model does not perform very well. 1080 00:45:10,850 --> 00:45:12,680 The performance deteriorates quite rapidly 1081 00:45:12,680 --> 00:45:15,200 when you start having significant occlusion. 1082 00:45:15,200 --> 00:45:18,440 I'm going to skip this and just very quickly describe some 1083 00:45:18,440 --> 00:45:20,900 of the initial steps that Bill Lotter has 1084 00:45:20,900 --> 00:45:24,560 been taking, trying to add recurrency to the models-- 1085 00:45:24,560 --> 00:45:27,770 trying to have both feedback connections as well 1086 00:45:27,770 --> 00:45:30,290 as recurrent connections within each layer, 1087 00:45:30,290 --> 00:45:33,380 to try to get a model that will be able to perform pattern 1088 00:45:33,380 --> 00:45:35,840 completion, and therefore use these feedback 1089 00:45:35,840 --> 00:45:38,000 signals to allow us to extrapolate 1090 00:45:38,000 --> 00:45:41,030 from prior information about these objects. 1091 00:45:41,030 --> 00:45:44,200 Bill will be here Friday or Monday, I'm not sure. 1092 00:45:44,200 --> 00:45:47,210 So you should talk to him more about these models. 1093 00:45:47,210 --> 00:45:49,920 Essentially, they belong to the family of HMAX. 1094 00:45:49,920 --> 00:45:52,350 They belong to a family of convolutional networks, 1095 00:45:52,350 --> 00:45:54,410 where you have filtering operations, threshold 1096 00:45:54,410 --> 00:45:56,690 and saturation, pooling, and normalization. 1097 00:45:56,690 --> 00:45:59,060 Jim will say more about this family of models 1098 00:45:59,060 --> 00:46:00,380 today in the afternoon. 1099 00:46:00,380 --> 00:46:02,240 These are purely bottom-up models.
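To give a feel for why recurrence helps with occlusion, here is a minimal sketch of recurrent pattern completion-- a classic Hopfield-style network, not Bill's actual model: stored binary patterns act as attractors, and iterating the recurrent dynamics fills in the hidden entries from the visible ones. All sizes and the occlusion fraction are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store random binary (+/-1) patterns with a Hebbian outer-product rule
n_units, n_patterns = 200, 10
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))
W = sum(np.outer(p, p) for p in patterns) / n_units
np.fill_diagonal(W, 0.0)                  # no self-connections

def complete(state, n_steps):
    """Recurrent cleanup: iterate the dynamics toward a stored attractor."""
    state = state.copy()
    for _ in range(n_steps):
        state = np.sign(W @ state)
        state[state == 0] = 1.0            # break ties deterministically
    return state

# Occlude a stored pattern: hide 90% of its entries (set them to 0)
target = patterns[0]
occluded = target.copy()
occluded[rng.random(n_units) < 0.9] = 0.0

recovered = complete(occluded, n_steps=10)
print("fraction of units recovered:", (recovered == target).mean())
```

The design point is that the visible units seed the dynamics, and the recurrent weights pull the state toward the nearest stored pattern, which is the extrapolation-from-prior-information idea in miniature.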
1100 00:46:02,240 --> 00:46:04,820 And what Bill has been doing is adding recurrent 1101 00:46:04,820 --> 00:46:07,400 and feedback connections, retraining these models based 1102 00:46:07,400 --> 00:46:09,770 on these recurrent and feedback connections, 1103 00:46:09,770 --> 00:46:11,600 and then comparing their performance 1104 00:46:11,600 --> 00:46:13,740 with human psychophysics. 1105 00:46:13,740 --> 00:46:16,429 So this is the behavioral data that I showed you before. 1106 00:46:16,429 --> 00:46:18,470 This is the performance of the feedforward model. 1107 00:46:18,470 --> 00:46:22,310 This is the recurrent model that he was able to train. 1108 00:46:22,310 --> 00:46:24,710 Another way to try to get at whether feedback 1109 00:46:24,710 --> 00:46:26,480 is relevant for pattern completion 1110 00:46:26,480 --> 00:46:28,430 is to use backward masking. 1111 00:46:28,430 --> 00:46:30,860 Backward masking means that you present an image, 1112 00:46:30,860 --> 00:46:32,360 and immediately after that image, 1113 00:46:32,360 --> 00:46:35,360 within a few milliseconds, you present noise. 1114 00:46:35,360 --> 00:46:36,650 You present a mask. 1115 00:46:36,650 --> 00:46:39,200 And people have argued that masking essentially 1116 00:46:39,200 --> 00:46:40,634 interrupts feedback processing. 1117 00:46:40,634 --> 00:46:42,050 Essentially, it allows you to have 1118 00:46:42,050 --> 00:46:45,222 a bottom-up flow of information, but it stops feedback. 1119 00:46:45,222 --> 00:46:47,180 I don't think this is entirely rigorous. 1120 00:46:47,180 --> 00:46:49,160 I think that the story is probably far more complicated 1121 00:46:49,160 --> 00:46:49,730 than that. 1122 00:46:49,730 --> 00:46:51,920 But to a first approximation, you present a picture, 1123 00:46:51,920 --> 00:46:54,200 you have a bottom-up stream, you put a mask, 1124 00:46:54,200 --> 00:46:57,750 and you interrupt all the subsequent feedback processing. 1125 00:46:57,750 --> 00:47:00,020 So if you do that at the behavioral level, 1126 00:47:00,020 --> 00:47:02,990 you can show that when stimuli are masked, particularly 1127 00:47:02,990 --> 00:47:05,990 if the interval is very short, you can significantly impair 1128 00:47:05,990 --> 00:47:07,540 pattern completion performance. 1129 00:47:07,540 --> 00:47:10,340 So if the mask comes within 25 milliseconds 1130 00:47:10,340 --> 00:47:12,440 of the actual stimulus, performance 1131 00:47:12,440 --> 00:47:14,930 in recognizing these heavily occluded objects 1132 00:47:14,930 --> 00:47:16,490 is significantly impaired. 1133 00:47:16,490 --> 00:47:19,970 We interpreted this to indicate that feedback may be 1134 00:47:19,970 --> 00:47:23,010 needed for pattern completion. 1135 00:47:23,010 --> 00:47:27,350 This is Bill's instantiation of that recurrent model. 1136 00:47:27,350 --> 00:47:29,090 Because he has recurrency now, he also 1137 00:47:29,090 --> 00:47:30,620 has time in these models. 1138 00:47:30,620 --> 00:47:32,000 So he can also present the image, 1139 00:47:32,000 --> 00:47:35,030 present the mask to the model, and compare the performance 1140 00:47:35,030 --> 00:47:37,490 of the computational model as a function 1141 00:47:37,490 --> 00:47:41,514 of the occlusion in the unmasked and masked conditions.
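In the same toy setting, one can mimic the masking manipulation under the loose assumption that a backward mask mainly truncates recurrent processing; this reuses complete(), occluded, and target from the sketch above and is only meant to illustrate the logic of the comparison, not Bill's actual model.

```python
# A crude masking analogue: the mask is modeled as cutting off recurrent
# processing after a short interval, i.e., fewer cleanup iterations.
for label, steps in [("unmasked (10 steps)", 10), ("masked (1 step)", 1)]:
    frac = (complete(occluded, steps) == target).mean()
    print(f"{label}: fraction recovered = {frac:.2f}")
# With heavy occlusion, the extra recurrent iterations typically recover
# more of the pattern; the exact numbers depend on the random draw.
```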
1142 00:47:41,514 --> 00:47:43,930 So to summarize this-- and there are still two or three more 1143 00:47:43,930 --> 00:47:45,230 slides that I want to show-- 1144 00:47:45,230 --> 00:47:48,410 I've given you three examples of potential ways 1145 00:47:48,410 --> 00:47:50,880 in which feedback signals can be important. 1146 00:47:50,880 --> 00:47:53,480 The first one has to do with the effects of feedback 1147 00:47:53,480 --> 00:47:56,270 on surround suppression, going from V2 to V1. 1148 00:47:56,270 --> 00:47:58,730 We think that by doing these types of experiments, combined 1149 00:47:58,730 --> 00:48:00,970 with computational models, to understand what 1150 00:48:00,970 --> 00:48:02,780 the fundamental computations are, 1151 00:48:02,780 --> 00:48:04,610 we can begin to elucidate some of the steps 1152 00:48:04,610 --> 00:48:06,770 by which feedback can exert its role. 1153 00:48:06,770 --> 00:48:08,780 We hope to come up with the essential alphabet 1154 00:48:08,780 --> 00:48:11,690 of computations-- similar to the filtering and normalization 1155 00:48:11,690 --> 00:48:14,540 operations-- that are implemented by feedback. 1156 00:48:14,540 --> 00:48:16,970 The second example was feedback being 1157 00:48:16,970 --> 00:48:19,190 able to carry feature-specific signals that dictate what 1158 00:48:19,190 --> 00:48:22,700 we do in visual search tasks. And the last example 1159 00:48:22,700 --> 00:48:25,130 was our preliminary work trying to use feedback, 1160 00:48:25,130 --> 00:48:27,980 as well as recurrent connections, to perform pattern completion 1161 00:48:27,980 --> 00:48:31,820 and extrapolate from prior information. 1162 00:48:31,820 --> 00:48:33,500 So the last thing I wanted to do is just 1163 00:48:33,500 --> 00:48:36,647 flash a few more slides about a couple 1164 00:48:36,647 --> 00:48:39,230 of things that are happening in neuroscience and computational 1165 00:48:39,230 --> 00:48:41,870 neuroscience that I think are tremendously 1166 00:48:41,870 --> 00:48:43,690 exciting. 1167 00:48:43,690 --> 00:48:46,550 If I were young again, these are some of the things 1168 00:48:46,550 --> 00:48:50,010 that I would definitely be very, very excited to follow up on. 1169 00:48:50,010 --> 00:48:52,940 So the notion that we'll be able to go inside brains 1170 00:48:52,940 --> 00:48:55,370 and read our biological code, and eventually write down 1171 00:48:55,370 --> 00:48:58,310 computer code, and build amazing machines is, I think, 1172 00:48:58,310 --> 00:49:00,000 very appealing and sexy. 1173 00:49:00,000 --> 00:49:02,750 But at the same time, it's a far cry, right? 1174 00:49:02,750 --> 00:49:05,420 We're a long way from being able to take biological codes 1175 00:49:05,420 --> 00:49:08,000 and translate them into computational codes. 1176 00:49:08,000 --> 00:49:10,100 It's really extremely tragic. 1177 00:49:10,100 --> 00:49:11,780 So here are three reasons why I think 1178 00:49:11,780 --> 00:49:15,800 there's optimism that this may not be as crazy as it sounds. 1179 00:49:15,800 --> 00:49:18,320 We're beginning to have tremendous information 1180 00:49:18,320 --> 00:49:20,939 about wiring diagrams at exquisite resolution. 1181 00:49:20,939 --> 00:49:22,730 There are a lot of people who are seriously 1182 00:49:22,730 --> 00:49:25,950 thinking about providing us with maps of which 1183 00:49:25,950 --> 00:49:27,800 neuron talks to which other neuron. 1184 00:49:27,800 --> 00:49:30,030 And this was never available before.
1185 00:49:30,030 --> 00:49:32,197 So we are now beginning to have detailed connectivity information 1186 00:49:32,197 --> 00:49:34,821 at much higher resolution than ever before. 1187 00:49:34,821 --> 00:49:36,830 The second one is the strength in numbers. 1188 00:49:36,830 --> 00:49:38,239 For decades, we've been recording 1189 00:49:38,239 --> 00:49:39,780 the activity of one neuron at a time, 1190 00:49:39,780 --> 00:49:41,360 maybe a few neurons at a time. 1191 00:49:41,360 --> 00:49:44,000 Now there are many different ideas and techniques out there 1192 00:49:44,000 --> 00:49:45,680 by which we can listen to and monitor 1193 00:49:45,680 --> 00:49:48,006 the activity of multiple neurons simultaneously. 1194 00:49:48,006 --> 00:49:49,880 And I think this is going to be game changing 1195 00:49:49,880 --> 00:49:51,921 for neurophysiology, but also for the possibility 1196 00:49:51,921 --> 00:49:55,390 of computational models that are inspired by biology. 1197 00:49:55,390 --> 00:49:57,890 And the third one is a series of techniques mostly developed 1198 00:49:57,890 --> 00:50:00,510 by people like Ed Boyden and Karl Deisseroth 1199 00:50:00,510 --> 00:50:03,090 to do optogenetics, and to manipulate these circuits 1200 00:50:03,090 --> 00:50:04,770 with unprecedented resolution. 1201 00:50:04,770 --> 00:50:07,330 So let me expand on that for one second. 1202 00:50:07,330 --> 00:50:08,880 This is C. elegans. 1203 00:50:08,880 --> 00:50:11,760 This is an electron microscopy image from which one 1204 00:50:11,760 --> 00:50:13,445 can characterize the circuitry. 1205 00:50:13,445 --> 00:50:15,570 So it turns out that the pioneering work of Sydney 1206 00:50:15,570 --> 00:50:17,460 Brenner a couple of decades ago has 1207 00:50:17,460 --> 00:50:21,380 led to mapping the connectivity of each one of its 302 neurons-- 1208 00:50:21,380 --> 00:50:24,167 exactly which other neurons each neuron is connected with. 1209 00:50:24,167 --> 00:50:26,250 And this is represented in that rather complex way 1210 00:50:26,250 --> 00:50:27,680 in this diagram here. 1211 00:50:27,680 --> 00:50:29,340 Well, it turns out that people are 1212 00:50:29,340 --> 00:50:33,810 beginning to do these types of heroic experiments 1213 00:50:33,810 --> 00:50:34,500 in cortex. 1214 00:50:34,500 --> 00:50:36,900 So we're beginning to have initial insights 1215 00:50:36,900 --> 00:50:38,430 about connectivity-- about how neurons 1216 00:50:38,430 --> 00:50:41,710 are wired with each other at this resolution in cortex. 1217 00:50:41,710 --> 00:50:45,110 We're nowhere near being able to have this for humans. 1218 00:50:45,110 --> 00:50:47,020 Not even for other species, mice, and so on. 1219 00:50:47,020 --> 00:50:48,209 Not even Drosophila yet. 1220 00:50:48,209 --> 00:50:50,250 There's a huge amount of [INAUDIBLE] and interest 1221 00:50:50,250 --> 00:50:53,160 in the community in having a very detailed map. 1222 00:50:53,160 --> 00:50:55,710 So the question for you, for the young and next generation, 1223 00:50:55,710 --> 00:50:57,376 is what are we going to do with these maps. 1224 00:50:57,376 --> 00:51:00,210 If I give you a fantastic, detailed wiring diagram 1225 00:51:00,210 --> 00:51:02,520 of a chunk of cortex, how is that 1226 00:51:02,520 --> 00:51:05,520 going to transform our ability to make inferences and build 1227 00:51:05,520 --> 00:51:07,382 new computational models.
1228 00:51:07,382 --> 00:51:09,090 The second one has to do with our ability 1229 00:51:09,090 --> 00:51:11,300 to record from more and more neurons. 1230 00:51:11,300 --> 00:51:13,466 This is other work I didn't have time to talk about. 1231 00:51:13,466 --> 00:51:16,180 This is also work that Hanlin did with Matias Ison and Itzhak 1232 00:51:16,180 --> 00:51:16,680 Fried. 1233 00:51:16,680 --> 00:51:18,960 These are recordings of spikes from human cortex, 1234 00:51:18,960 --> 00:51:21,120 again, in patients that have epilepsy. 1235 00:51:21,120 --> 00:51:23,990 I'm just flashing this slide because I had it handy. 1236 00:51:23,990 --> 00:51:25,140 These are 300 neurons. 1237 00:51:25,140 --> 00:51:27,991 This is not a simultaneously recorded population. 1238 00:51:27,991 --> 00:51:30,240 These are cases where we can record from a few neurons 1239 00:51:30,240 --> 00:51:31,827 at a time, using microwires. 1240 00:51:31,827 --> 00:51:33,660 This is different from the type of recording 1241 00:51:33,660 --> 00:51:34,860 that I showed you before. 1242 00:51:34,860 --> 00:51:37,180 These are actual spikes that we can record. 1243 00:51:37,180 --> 00:51:40,590 And these 380 neurons were recorded in a different task. 1244 00:51:40,590 --> 00:51:43,200 So recording from these 318 neurons 1245 00:51:43,200 --> 00:51:45,912 took us about three to four years. 1246 00:51:45,912 --> 00:51:47,370 There are more and more people that 1247 00:51:47,370 --> 00:51:49,930 are using either two-photon imaging 1248 00:51:49,930 --> 00:51:53,550 and/or massive multielectrode arrays that are beginning 1249 00:51:53,550 --> 00:51:56,640 to be able to record the activity of hundreds of neurons 1250 00:51:56,640 --> 00:51:57,750 simultaneously. 1251 00:51:57,750 --> 00:52:01,530 My good friend and crazy inventor, Ed Boyden, 1252 00:52:01,530 --> 00:52:04,440 believes that we will be able to record from 100,000 neurons 1253 00:52:04,440 --> 00:52:05,370 simultaneously. 1254 00:52:05,370 --> 00:52:07,710 Of course, he is far more grandiose than I am, 1255 00:52:07,710 --> 00:52:10,224 and he can think big at this kind of scale. 1256 00:52:10,224 --> 00:52:12,390 But even think about the possibility of recording 1257 00:52:12,390 --> 00:52:14,850 from 1,000 or 5,000 neurons simultaneously, so 1258 00:52:14,850 --> 00:52:16,890 that in a week or a month, one may 1259 00:52:16,890 --> 00:52:18,630 be able to gather a tremendous amount 1260 00:52:18,630 --> 00:52:20,150 of data from a very large population. 1261 00:52:20,150 --> 00:52:22,410 This is going to be transformative. 1262 00:52:22,410 --> 00:52:25,140 Three decades ago, in the field of molecular biology, 1263 00:52:25,140 --> 00:52:26,817 people would sequence a single gene, 1264 00:52:26,817 --> 00:52:28,650 and they would publish the entire sequence-- 1265 00:52:28,650 --> 00:52:31,329 ACCGG-- and so on. 1266 00:52:31,329 --> 00:52:32,370 That was the whole paper. 1267 00:52:32,370 --> 00:52:34,119 A grad student would spend five years just 1268 00:52:34,119 --> 00:52:35,519 sequencing a single gene. 1269 00:52:35,519 --> 00:52:37,560 Now we have the possibility of downloading entire genomes, 1270 00:52:37,560 --> 00:52:39,337 thanks to advances in technology. 1271 00:52:39,337 --> 00:52:40,920 I suspect that a lot of our recordings 1272 00:52:40,920 --> 00:52:42,270 will become obsolete. 1273 00:52:42,270 --> 00:52:45,330 We'll be able to listen to the activity of thousands 1274 00:52:45,330 --> 00:52:47,070 of neurons simultaneously.
1275 00:52:47,070 --> 00:52:48,810 And again, it's for your generation 1276 00:52:48,810 --> 00:52:50,610 to think about how this will transform 1277 00:52:50,610 --> 00:52:54,000 our understanding of how quickly we can read biological codes. 1278 00:52:54,000 --> 00:52:56,540 In the unlikely event that you think that that's not enough, 1279 00:52:56,540 --> 00:52:58,350 here's one more thing that I think 1280 00:52:58,350 --> 00:53:01,740 is transforming how we can decipher biological codes. 1281 00:53:01,740 --> 00:53:04,080 And that's, again, Ed Boyden, using techniques 1282 00:53:04,080 --> 00:53:06,090 that are referred to as optogenetics, where 1283 00:53:06,090 --> 00:53:10,320 you can manipulate the activity of specific types of neurons. 1284 00:53:10,320 --> 00:53:12,517 I flashed a lot of computational models today-- 1285 00:53:12,517 --> 00:53:14,850 a lot of hypotheses about what different connections may 1286 00:53:14,850 --> 00:53:15,440 be doing. 1287 00:53:15,440 --> 00:53:17,023 At some point, we will be able to test 1288 00:53:17,023 --> 00:53:19,900 some of those hypotheses with unprecedented resolution. 1289 00:53:19,900 --> 00:53:23,220 So if somebody wanted to know what neurons in V2 are doing, 1290 00:53:23,220 --> 00:53:24,720 what kind of feedback they're providing, 1291 00:53:24,720 --> 00:53:27,890 we may be able to silence only the neurons in V2 that 1292 00:53:27,890 --> 00:53:29,997 provide feedback to V1, in a clean manner, 1293 00:53:29,997 --> 00:53:31,455 without affecting, for example, all 1294 00:53:31,455 --> 00:53:34,986 of the other feed-forward processes, and so on. 1295 00:53:34,986 --> 00:53:36,360 So the amount of specificity that 1296 00:53:36,360 --> 00:53:39,131 can be derived from these types of techniques is enormous. 1297 00:53:39,131 --> 00:53:40,380 So that's all I wanted to say. 1298 00:53:40,380 --> 00:53:43,560 So because we have very high specificity in our ability 1299 00:53:43,560 --> 00:53:45,150 to manipulate circuits, because we'll 1300 00:53:45,150 --> 00:53:47,525 be able to record the activity of many, many more neurons 1301 00:53:47,525 --> 00:53:49,290 simultaneously, and because we'll 1302 00:53:49,290 --> 00:53:51,120 have more and more detailed diagrams, 1303 00:53:51,120 --> 00:53:54,030 I think that the dream of being able to read out and decode 1304 00:53:54,030 --> 00:53:57,300 biological codes, and translate those into computational codes, 1305 00:53:57,300 --> 00:53:59,130 is less crazy than it may sound. 1306 00:53:59,130 --> 00:54:02,480 We think that in the next several years and decades, 1307 00:54:02,480 --> 00:54:04,230 smart people like you will be able to make 1308 00:54:04,230 --> 00:54:06,810 this tremendous transformation and discover 1309 00:54:06,810 --> 00:54:08,760 specific algorithms of intelligence 1310 00:54:08,760 --> 00:54:12,720 by taking direct inspiration from biology. 1311 00:54:12,720 --> 00:54:14,370 So that's what's illustrated here. 1312 00:54:14,370 --> 00:54:16,150 We'll be happy to keep on fighting. 1313 00:54:16,150 --> 00:54:17,780 Andrei and I will fight. 1314 00:54:17,780 --> 00:54:20,580 We will be happy to keep on fighting about Eva and how 1315 00:54:20,580 --> 00:54:22,200 amazing she is and isn't. 1316 00:54:22,200 --> 00:54:24,510 What I tried to describe is that by really understanding 1317 00:54:24,510 --> 00:54:26,580 biological codes, we'll be able to write 1318 00:54:26,580 --> 00:54:28,110 amazing computational code. 1319 00:54:28,110 --> 00:54:29,400 I put a lot of arrows here.
1320 00:54:29,400 --> 00:54:30,810 I'm not claiming QED. 1321 00:54:30,810 --> 00:54:32,730 I'm not saying that we solved the problem. 1322 00:54:32,730 --> 00:54:36,030 There's a huge amount of work that we still need to do here.