The following content is provided under a Creative Commons license. Your support will help MIT OpenCourseWare continue to offer high quality educational resources for free. To make a donation, or view additional materials from hundreds of MIT courses, visit MIT OpenCourseWare at ocw.mit.edu.

GABRIEL KREIMAN: What I'd like to do today is give a very brief introduction to neural circuits: why we study them, how we study them, and the possibilities that come out of understanding biological codes and trying to translate those ideas into computational codes. Then I will be a bit more specific, and discuss some initial attempts at studying the computational role of feedback signals. And then I'll switch gears and talk for a few minutes about a couple of things that are not necessarily related to anything we've done any real work on, but that I'm particularly excited about in the context of open questions, challenges, and opportunities, and what I think will happen over the next several years in the field, in the hope of inspiring several of you to actually solve some of these open questions.

So one of the reasons why I'm very excited about studying biology and studying brains is that our brains are the product of millions of years of evolution. And through evolution, we have discovered how to do things that are interesting, fast, efficient. And so if we can understand the biological codes, if we can understand the machinery by which we do all of these amazing feats, then in principle, we should be able to take some of these biological codes and write computer code that will do all of those things in similar ways. In the same way that we can write algorithms to compute the square root of 2, there could be algorithms that dictate how we see, how we recognize objects, how we recognize auditory events. In short, the answer to all of these Turing questions, in some sense, is hidden somewhere here inside our brain.
So the question is, how can we listen to neurons and circuits, decode their activity, maybe even write information into the brain, and then try to translate all of these ideas into computational codes?

There are a lot of fascinating properties that biological codes exhibit. Needless to say, we're not quite there yet in terms of computers and robots. Our hardware and software work for many decades. I think it's very unlikely that your amazing iPhone 6 or 5 or 7, whatever it is, will last four, five, six, seven, eight, nine decades. None of our computers will last that long. Our biological hardware does. There's amazing parallel computation going on in our brains; this is quite distinct from the way we think about algorithms and computation in other domains now. Our brains have a reprogrammable architecture: the same chunk of tissue can be used for several different purposes, and through learning and through our experiences, we can modify those architectures. A thing that has been quite interesting, and that maybe we'll come back to, is the notion of being able to do single-shot learning, as opposed to some machine learning algorithms that require lots and lots of data to train. We can easily discover structure in data. The notion of fault tolerance and robustness to transformations is an essential one; robustness is arguably a fundamental property of biology, and one that has been very, very hard to implement in computational circuitry. And for engineers, the whole issue of how to have different systems integrate information and interact with each other has been, and continues to be, a fundamental challenge. Our brains do that all the time. Walking down the street, we can integrate visual information with auditory information, with our goals, our plans, what we're interested in doing, our social interactions, and so on. So why do we want to study neural circuits?
I think we are in a golden era right now, because we can begin to explore the answers to some of these Turing questions in brains at the biological level. We can study high-level cognitive phenomena at the level of neurons and circuits of neurons, and I'll give you a few examples of that later on. More recently, and I'll come back to this towards the end, we've had the opportunity to begin to manipulate, disrupt, and interact with neural circuits at unprecedented resolution. We can begin to turn on and off specific subsets of neurons, and that has tremendously accelerated our ability to test theories at the neural level. And then again, the notion being that empirical findings can be translated into computational algorithms; that is, if we really understand how biology solves the problem, in principle, we should be able to write mathematical equations, and then write code that mimics some of those computations. Some examples of that we'll talk about in the context of the visual system, both in my presentation and in Jim DiCarlo's presentation.

This is just advertising for a couple of books that I find interesting and relevant in computational neuroscience. I'm not going to have time to do any justice to the entire field of computational neuroscience at all. All these slides will be in Dropbox, in case anyone wants to learn more about computational neuroscience; these are tremendous books. Larry Abbott is the author of this one, and he'll be talking tonight.

So how do we study biological circuitry? I realize that this is deja vu and very well known for many of you. But in general, we have a variety of techniques to probe the function of brain circuits. This is showing the temporal resolution and the spatial resolution of the different techniques used to study neural circuits.
These range all the way from techniques that have limited spatial and temporal resolution, such as PET and fMRI, through techniques that have very high temporal resolution but relatively poor spatial resolution, all the way to techniques that allow us to interrogate the function of individual channels within neurons. Most of what I'm going to talk about today is what we refer to as the neural circuit level, somewhere in between single neurons and ensembles of neurons recorded with the local field potential. This gives us a resolution of milliseconds, which is where we think a lot of the computations in the cortex are happening, and where we think we can begin to elucidate how neurons interact with each other.

So to start from the very beginning, we need to understand what a neuron does. Again, many of you are quite familiar with this. But the basic, fundamental understanding of what a neuron does is to integrate information: it receives information through its dendrites, integrates that information, and decides whether to fire a spike or not.

Interestingly, some of the basic intuitions about neuronal function were essentially conceived by a Spaniard, Ramón y Cajal. He wanted to be an artist. His parents told him that he could not become an artist; he had to become a clinician, a medical doctor. So he followed the tradition and became a medical doctor. But then he said, well, what I really like doing is drawing. And so he bought a microscope, he put it in his kitchen, and he spent a good chunk of his life drawing, essentially. He would look at neurons, and he would draw their shapes. And that's essentially how neuroscience started. Just from this beautiful and amazing array of drawings of neurons, he conjectured the basic flow of information: the notion that information is integrated through the dendrites, that all of this integration happens in the soma, and that from there, neurons decide whether to fire a spike or not. Nothing more, nothing less.
That's essentially the fundamental unit of computation in our brains.

How do we think about and model those processes? There's a family of different types of models that people have used to describe what a neuron does. These models differ in terms of their biological accuracy and their computational complexity. Perhaps one of the most used is the integrate-and-fire neuron. This is a very simple RC circuit: it basically integrates current, and then, through a threshold, the neuron decides whether or not to fire a spike. This essentially treats neurons as point masses. There are people out there who have argued that you need more and more detail: you need to know exactly how many dendrites you have, and the position of each dendrite, and on and on and on. The exact resolution at which we should study neural systems is a fundamental open question. We don't know what the right level of abstraction is. There are people who think about brains in the context of blood flow, and millions and millions of neurons averaged together. There are people who think that we actually need to pay attention to the exact details of how every single dendrite integrates information, and so on. For many of us, this is a sufficient level of abstraction: the notion that there's a neuron that can integrate information. So we would like to push this notion that we can think about models with single neurons, and see how far we can go, understanding that we are ignoring a lot of the inner complexity of what's happening inside a neuron itself.

So very, very briefly, just to push the notion that this is not rocket science: it's very, very easy to build these integrate-and-fire model simulations. I know many of you do this on a daily basis. This is the equation of the RC circuit. There's current that flows through the capacitance, and there's current that flows through the resistance, which, in this RC circuit, we think of as composed of the ion channels in the membranes of the neurons.
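Here is a minimal MATLAB sketch of that kind of leaky integrate-and-fire simulation. All parameter values are illustrative assumptions, not the code from the slide:

```matlab
% Minimal leaky integrate-and-fire neuron (illustrative values).
dt = 0.1e-3;                    % time step, s
T  = 0.5;                       % total duration, s
t  = 0:dt:T;
C  = 1e-9;                      % membrane capacitance, F
R  = 100e6;                     % membrane resistance, ohm
Vrest = -70e-3;                 % resting potential, V
Vth   = -54e-3;                 % spike threshold, V
I  = 0.25e-9 * ones(size(t));   % constant input current, A

V = Vrest * ones(size(t));
spikes = [];
for k = 1:numel(t)-1
    % RC circuit: C dV/dt = -(V - Vrest)/R + I
    dV = (-(V(k) - Vrest)/R + I(k)) / C;
    V(k+1) = V(k) + dt * dV;
    if V(k+1) >= Vth            % threshold crossed: emit a spike, reset
        spikes(end+1) = t(k+1); %#ok<SAGROW>
        V(k+1) = Vrest;
    end
end
plot(t, V);
xlabel('time (s)'); ylabel('membrane potential (V)');
```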
And this is all there is to it in terms of a lot of the simulations that we use to understand the function of neurons. And again, just to tell you that there's nothing scary or fundamentally difficult about this, here are just a couple of lines in MATLAB that you can take a look at if you've never done this kind of simulation. This is a very simple, and perhaps even somewhat wrong, simulation of an integrate-and-fire neuron. But it shows that it's relatively simple to build models of individual neurons that have these fundamental properties of being able to integrate information and decide when to fire a spike.

The fundamental questions that we really want to tackle in CBMM have to do with putting together lots of neurons, and understanding the function of circuits. It's not enough to understand individual neurons; we need to understand how they interact together. We want to understand what is there, who's there, what they are doing to whom, and when, and why. We really need to understand the activity of multiple neurons together in the form of circuitry.

So, just a handful of basic definitions. If we have a circuit like this, where we start connecting multiple neurons together, and information flows in this direction, we refer to the connections between neurons that go in this direction as feed-forward. We refer to the connections that flow in the opposite direction as feedback, and I use the term recurrent connections for the horizontal connections within a particular layer. This is just to fix the nomenclature for the discussion that will come next, and also this afternoon in Jim DiCarlo's presentation.
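To make that nomenclature concrete, here is a toy two-layer rate network in MATLAB with separate feed-forward, recurrent, and feedback weight matrices. Everything here (layer sizes, random weights, rectified-linear rates) is an invented stand-in, not a model of any real circuit:

```matlab
% Toy illustration of the connection nomenclature: Wff carries
% feed-forward input to the layer above, Wrec connects neurons within
% a layer (horizontal/recurrent), Wfb carries feedback from above.
n1 = 8; n2 = 4;                 % neurons in layers 1 and 2
Wff  = randn(n2, n1) * 0.10;    % layer 1 -> layer 2 (feed-forward)
Wrec = randn(n1, n1) * 0.05;    % within layer 1 (recurrent)
Wfb  = randn(n1, n2) * 0.05;    % layer 2 -> layer 1 (feedback)
x  = rand(n1, 1);               % external input drive to layer 1
r1 = max(0, x);                 % layer 1 firing rates (rectified)
for step = 1:5                  % iterate the loop for a few steps
    r2 = max(0, Wff * r1);
    r1 = max(0, x + Wrec * r1 + Wfb * r2);
end
```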
Through a lot of anatomical work, we have begun to elucidate some of the basic connectivity between neurons in the cortex. This is the primary example, cited extremely often, of what we understand about the connectivity between different areas in the macaque monkey. We don't have a diagram like this for the human brain; most of the detailed anatomical work has been done in macaque monkeys. Each of these boxes represents a brain area, and this encapsulates our understanding of who talks to whom, or which area talks to which other area, in visual cortex. There are a lot of different parts of cortex that represent visual information. Here at the bottom, we have the retina. Information from the retina flows through to the LGN. From the LGN, information goes to primary visual cortex, sitting right here. And from there, there's a cascade, largely parallel and at the same time hierarchical, of a conglomerate of multiple areas that are fundamental in processing visual information. We'll talk about some of these areas next, and also this afternoon, when Jim discusses the fundamental computations involved in visual object recognition.

One of the fundamental clues as to how we know that a given region is a visual area, and that it is important for vision, has come from anatomical lesions, mostly in monkeys, but in some cases in humans as well. If you make lesions in some of these areas, depending on exactly where you make that lesion, people either become completely blind, or they have a particular scotoma, a chunk of the visual field where they cannot see, or they have higher-order deficits in visual recognition. As an example, the primary visual cortex was discovered by people who were [INAUDIBLE] studying the trajectory of bullets in soldiers during World War I, by discovering that some of those people had a blind part of their visual field, topographically organized depending on the particular trajectory of the bullet through their occipital cortex. And that's how we came to think about V1 as fundamental in visual processing.

It is not a perfect hierarchy. It's not that there is A, then B, then C, then D. Right?
For a number of reasons. One is that there are lots of parallel connections; there are lots of different stages that are connected to each other. One of the ways to define a hierarchy is by looking at the timing of the responses in different areas. If you look at the average latency of the response in each of these areas, you'll find that there's an approximate hierarchy. Information gets out of the retina at approximately 50 milliseconds, reaches the LGN at about 60 or so milliseconds, and so on. So there's approximately a 10-millisecond cost per step in terms of the average latency. However, if you start looking at the distributions, you'll see that it's not a strict hierarchy. For example, the early neurons in area V4 may fire before the late neurons in V1. And that shows you that the circuitry is far more complex than just a simple hierarchy.

One way to put some order into this seemingly complex and chaotic circuitry, one simplification, is that there are two main pathways: the so-called what pathway and the so-called where pathway. The what pathway is essentially the ventral pathway; it's mostly involved in object recognition, trying to understand what is there. The dorsal pathway, the where pathway, is mostly involved in motion, and in being able to detect where objects are, stereo, and so on. Again, this is not a strict division, but it's a pretty good approximation that many of us have used in thinking about the fundamental computations in these areas.

Now, we often think about these boxes, but of course, there's a huge amount of complexity within each of them. If we zoom in on one of these areas, we discover that there's a complex hierarchy of computations. There are multiple different layers; the cortex is essentially a six-layer structure. And there are specific rules. People have referred to this as a canonical microcircuitry.
There's a specific set of rules in terms of how information flows from one layer to another within each of these cortical structures. To a first approximation, this canonical circuitry is common to most of these areas. The rules about which layer receives information first, and which layers send information to other areas, are more or less constant throughout the cortical circuitry. This doesn't mean that we understand this circuitry well, or what each of these connections is doing; we certainly don't. But these are initial steps toward deciphering some of this basic biological connectivity, which has fundamental computational implications for visual processing.

So our lab has been very interested in what we call the first-order approximation, or immediate approximation, to visual object recognition: the notion that we can recognize objects very fast, and that this can be explained, essentially, as a bottom-up hierarchical process. Jim DiCarlo is going to talk about this extensively this afternoon, so I'm going to essentially skip that, and jump into more recent work that we've done trying to think about top-down connections. But let me briefly say why we think that the first pass of visual information can be semi-seriously approximated by this purely bottom-up processing. One reason is that at the behavioral level, we can recognize objects very, very fast. There's a series of psychophysical experiments demonstrating that if I show you an object, recognition can happen within about 150 milliseconds or so. We also know that the physiological signals underlying visual object recognition happen very fast. Within about 100 to 150 milliseconds, we can find neurons that show very selective responses to complex objects, and again, you'll see examples of that this afternoon.
The behavior and the physiology have inspired generations of computational models that are purely bottom-up, with no recurrence, and that can be quite successful at visual recognition. To a first approximation, the recent excitement about deep convolutional networks can be traced back to some of these ideas, and to some of these basic, biologically inspired computations that are purely bottom-up. So to summarize, and I'm not going to give any more details: we think that the first 100 milliseconds or so of visual processing can be approximated by this purely bottom-up, semi-hierarchical sequence of computations.

And this leaves open a fundamental question, which is: why do we have all these massive feedback connections? We know that in cortex, there are actually more recurrent and feedback connections than feed-forward ones. What I'd like to talk about today is a couple of ideas about what all of those feedback connections may be doing. This is an anatomical study looking at a lot of the boxes that I showed you before, and showing how many of the connections to any given area come from each of the other areas. For example, if we take just primary visual cortex, this is saying that a good fraction of the connections to primary visual cortex actually come from V2, that is, from the next stage of processing, rather than from V1 itself. All in all, if you quantify, for a given neuron in V1, how many signals are coming from a bottom-up source, that is, from the LGN, versus how many signals are coming from other V1 neurons or from higher visual areas, it turns out that there are more horizontal and top-down projections than bottom-up ones. So what are they doing? If we can approximate the first 100 milliseconds or so of vision so well with bottom-up hierarchies, what are all these feedback signals doing?
This brings me to three examples that I'd like to discuss today of recent work that we've done to develop some initial principles for thinking about what these feedback connections could be doing in visual recognition. I'll start with an example of trying to understand the basic, fundamental unit of feedback, these canonical computations, by looking at the feedback that happens from V2 to V1 in the visual system. Next, I'm going to give you an example of what happens during visual search, where we also think that feedback signals may be playing a fundamental role: a Where's Waldo kind of task, where you have to search for objects in the environment. And finally, I will talk about pattern completion, how you can recognize objects that are heavily occluded, where we also think that feedback signals may be playing an important role.

Before I go on to describe what we're seeing the feedback from V2 to V1 may be doing, let me describe very quickly the classical work that Hubel and Wiesel did, which got them the Nobel Prize, recording the activity of neurons in primary visual cortex. They started working in kittens, and then subsequently in monkeys, and discovered that there are neurons that show orientation tuning, meaning that they respond very vigorously to bars of a particular orientation. These are spikes; each of these marks corresponds to an action potential, the fundamental language of computation in cortex. This neuron responds quite vigorously when the cat is seeing a bar of this orientation, and essentially, there's no firing at all with this type of stimulus in the receptive field. This was fundamental because it transformed our understanding of the essential computations in primary visual cortex in terms of filtering the initial stimulus. This is what we now describe with Gabor functions.
And if you look at deep convolutional networks, many of them, if not perhaps all of them, start with some sort of filtering operation, either Gabor filters or something that resembles this type of orientation selectivity, which we think is a fundamental aspect of how we start to process information in the visual field.

One of the beautiful things that Hubel and Wiesel did was not only to make these discoveries, but also to come up with very simple graphical models of how they thought this could come about. And this remains today one of the fundamental ways in which we think about how orientation tuning may come about. If you record the activity of neurons in the retina or in the LGN, you'll find what are called center-surround receptive fields. These are circularly symmetric receptive fields, with an area in the center that excites the neuron, and an area in the surround that inhibits the neuron. What they conjectured is that if you put together multiple LGN cells whose receptive fields are aligned along a certain orientation, and you simply combine all of them, you simply add the responses of all of those neurons, you can get a neuron in primary visual cortex that has orientation tuning. This is a problem that's far from solved, despite the four or five decades we've had to work on it; there are many, many models of how orientation tuning comes about. But this remains one of the basic bottom-up, feed-forward ideas of how you can actually build orientation tuning from very simple receptive fields. It has informed a lot of our thinking about how basic computations can give rise to orientation tuning in a purely bottom-up fashion.
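As an illustration of that conjecture, here is a small MATLAB sketch that sums center-surround (difference-of-Gaussians) subunits along an axis to produce an elongated, oriented receptive field. The sizes, spacings, and weights are made-up illustrative values:

```matlab
% Sketch: an oriented "simple cell" receptive field as a sum of
% center-surround (difference-of-Gaussians) LGN-like subunits whose
% centers are aligned along a 45-degree axis.
[x, y] = meshgrid(-20:20, -20:20);
dog = @(cx, cy) exp(-((x-cx).^2 + (y-cy).^2)/(2*2^2)) ...
              - 0.5*exp(-((x-cx).^2 + (y-cy).^2)/(2*4^2));
rf = zeros(size(x));
for d = -10:5:10                   % subunit centers spaced along the axis
    rf = rf + dog(d*cosd(45), d*sind(45));
end
imagesc(rf); axis image; colormap gray;
title('Summed LGN-like subunits yield an oriented receptive field');
```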
In primary visual cortex, in addition to the so-called simple cells, there are complex cells that show invariance to the exact position or the exact phase of the oriented bar within the receptive field. That's illustrated here. This is a simple cell: it has orientation tuning, meaning that it responds more vigorously to this orientation than to this one. However, if you change the phase or the position of the oriented bar within the receptive field, the response decreases significantly. Contrast that with this complex cell, which not only has orientation tuning, meaning that it fires more vigorously to this orientation than to this one, but also has phase invariance, meaning that the response is more or less the same regardless of the exact phase or the exact position of the stimulus within the receptive field. And again, the notion that they postulated is that we can build these complex cells by a summation of the activity of multiple simple cells. If you imagine that you have multiple simple cells whose receptive fields are centered at these different positions, you can add them up and create complex cells. These fundamental operations of simple and complex cells in primary visual cortex can be traced to the roots of a lot of the bottom-up hierarchical models. A lot of the deep convolutional networks today essentially have variations on these themes: filtering steps, nonlinear computations that give you invariance, and a concatenation of these filtering and invariance steps along the visual hierarchy.
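A minimal MATLAB sketch of that second step: a "complex cell" response obtained by pooling over simple cells that share an orientation but differ in phase. The Gabor parameters and the random stimulus patch are invented; the max pooling here follows the HMAX-style models discussed later, whereas a sum of squared responses (an energy model) is another common choice:

```matlab
% Sketch: phase-invariant "complex cell" response as the max over
% simple cells with the same orientation but different phases.
[x, y] = meshgrid(-10:10, -10:10);
theta = 45; lambda = 8; sigma = 3;         % orientation, wavelength, width
xr = x*cosd(theta) + y*sind(theta);
gabor = @(phase) exp(-(x.^2 + y.^2)/(2*sigma^2)) ...
               .* cos(2*pi*xr/lambda + phase);
stim = double(rand(21) > 0.5);             % stand-in stimulus patch
phases = [0 pi/2 pi 3*pi/2];
r = zeros(size(phases));
for k = 1:numel(phases)
    r(k) = sum(sum(gabor(phases(k)) .* stim));  % simple-cell responses
end
complexResponse = max(abs(r));             % invariant to the exact phase
```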
Following up on this idea, I would like to understand the basics of what kind of information is provided by signals from V2 to V1. To do that, we have been collaborating with Richard Born at Harvard Medical School, who has a way of implanting cryoloops. This is a device that can be implanted in monkeys over areas V2 and V3 to lower the temperature, and thus reduce or essentially eliminate activity in areas V2 and V3. That means we can study V1 without activity in areas V2 and V3; we can study V1 sans feedback.

So this is an example of recordings from a neuron in this area. This is the normal activity that you get from the neuron; here is when they present a visual stimulus; this is spontaneous activity. Each of these dots corresponds to a spike, and each of these lines corresponds to a repetition of the stimulus. This is the traditional way of showing raster plots of neuronal responses. So you see the spontaneous activity, you present the stimulus, and there's an increase in the response of this neuron, as you might expect. Actually, I'm sorry, this actually starts here: this is the spontaneous activity, and this is the response. Now here, they turn on their pump and start lowering the temperature. And you see that within a couple of minutes, they significantly reduce the responses; they largely (not completely, but largely) silence activity in areas V2 and V3. And this is reversible, so when they turn the pumps off, activity comes back. So the question is, what happens in primary visual cortex when you don't have feedback from V2 and V3?

The first thing they characterized is that some of the basic properties of V1 do not change. This is consistent with the simple models that I just told you about, where orientation tuning in primary visual cortex is largely dictated by the bottom-up inputs, by the signals from the LGN. The conjecture from that would be that if you silence V2 and V3, nothing would happen to orientation tuning in primary visual cortex. And that's essentially what they're showing here. These are example neurons. This is showing orientation selectivity, and this is showing direction selectivity: what happens when you move an oriented bar within the receptive field. So this axis is showing the direction, and this one the mean normalized response of a neuron. This is the preferred direction, the direction or orientation that gives the maximum response.
The blue curve corresponds to when you don't have activity in V2 and V3; red corresponds to the control data. And essentially, the tuning of the neuron was not altered; the orientation preferred by this neuron was not altered. The same thing goes for direction selectivity. So the basic properties of orientation tuning and direction selectivity did not change.

Let me say a few words about the dynamics of the responses. Here, what I'm showing you is the mean normalized responses as a function of time. Time 0 is when the stimulus is turned on. As I told you already, by about 50 milliseconds or so, you get a vigorous response in primary visual cortex. And if we compare the orange and the blue curves, we see that this initial response is largely identical. So the initial response of these V1 neurons is not affected by the absence of feedback from V2. We start to see effects, a change in the firing rate, largely at about 60 milliseconds or so after stimulus presentation. So in a highly oversimplified cartoon, I think of this as a bottom-up, Hubel-and-Wiesel-like response driven by the LGN, with signals from V2 to V1 coming back about 10 milliseconds later; that's when we start seeing some of these feedback-related effects.

I told you that some of the basic properties do not change; we interpret those as being dictated largely by bottom-up signals. The dynamics do change: the initial response is unaffected, while the later part of the response is affected. Now I want to show one thing that does change, and for that, I need to explain what an area summation curve is. If you present a stimulus of this size within the receptive field of a neuron, you get a certain response. As you start increasing the size of the stimulus, you get a more vigorous response. Size matters; the larger, the better, up to a point. There comes a point where the response of the neuron starts decreasing again.
So larger is not always better. A little bit larger is better, but a stimulus of this size has an overall inhibitory effect on the response of the neuron. This is called surround suppression. These curves have been characterized in areas like primary visual cortex, and also in earlier areas, for a very long time. It turns out that when you do this type of experiment in the absence of feedback, the effect of surround suppression does not disappear; that is, you still have a peak in the response as a function of stimulus size. But there is a reduced amount of surround suppression. When you don't have feedback, there's less suppression: you have a larger response to bigger stimuli. So we think that one of the fundamental computations that feedback is providing here is an integration, happening in V2, over multiple neurons in V1, followed by inhibition of the activity of neurons in area V1 to provide some of the suppression. This is partly the reason why our neurons are not very excited about a uniform stimulus, like a blank wall. Our neurons are interested in changes, and part of that, we think, is dictated by this feedback from V2 to V1.

We can model these center-surround interactions as a ratio of two Gaussian curves, two forces: one that increases the response, and a normalization term that suppresses the response when the stimulus is too large. There are a number of parameters here, but essentially, you can think of this as a ratio of Gaussians (ROG): one Gaussian dictating the center response, and the other the surround response. And to make a long story short, we can fit the data from the monkey with this extremely simple ratio-of-Gaussians model, and we can show that the main parameter that feedback seems to be acting upon is what we call Wn, that is, this normalization factor here.
So this tuning factor dictates the strength of the surround division from V2 to V1, and we think that's one of the fundamental things being affected by feedback. We wouldn't think of this as the gain; we think of it as the spatial extent over which V2 can exert its action on primary visual cortex. We think that's the main thing being affected here.

This type of spatial effect may be important in another role that has been ascribed to feedback, which is the ability to direct attention to specific locations in the environment. I want to come back to this question, and ask under what conditions, and how, feedback can also provide feature-specific signals from one area to another. For that, I'm going to switch to another task, a completely different prep, which is the Where's Waldo task, the task of visual search: how do we search for particular objects in the environment? Here, it's not sufficient to focus on a specific location; we need to be able to search for specific features. We need to be able to bias our visual responses toward specific features of the stimulus that we're searching for. In a famous Where's Waldo task like this one, you need to be able to search for specific features. It's not enough to send feedback from V2 to V1 and change the sizes of the receptive fields, or direct attention to a specific location. Another version that I'm not going to talk about, with a theme related to visual search, is feature-based attention, where you're paying attention to a particular face, a particular color, a particular feature that is not necessarily localized in space, as our friend here has studied quite extensively. People always like to know the answer of where he is. OK.
752 00:31:29,210 --> 00:31:31,550 So let me tell you about a computational model 753 00:31:31,550 --> 00:31:34,280 and some behavioral data that we have collected 754 00:31:34,280 --> 00:31:38,300 to try to get at this question of how feedback signals can 755 00:31:38,300 --> 00:31:40,640 be relevant for visual search. 756 00:31:40,640 --> 00:31:45,290 The initial part of this computational model 757 00:31:45,290 --> 00:31:48,940 is essentially the HMAX type of architecture 758 00:31:48,940 --> 00:31:52,100 that has been pioneered by Tommy Poggio and several people 759 00:31:52,100 --> 00:31:55,040 in his lab, most notably, people like Max Riesenhuber 760 00:31:55,040 --> 00:31:56,510 and Thomas Serre. 761 00:31:56,510 --> 00:31:58,340 I was thinking that by this time, 762 00:31:58,340 --> 00:32:00,620 people would have described this in more detail. 763 00:32:00,620 --> 00:32:02,450 I'm going to go through this very quickly. 764 00:32:02,450 --> 00:32:04,280 Again, today in the afternoon, we'll 765 00:32:04,280 --> 00:32:07,320 have more discussion about this family of models. 766 00:32:07,320 --> 00:32:09,350 So this family of models essentially 767 00:32:09,350 --> 00:32:12,830 goes through a series of linear and non-linear computations 768 00:32:12,830 --> 00:32:16,730 in a hierarchical way, inspired by the basic definitions 769 00:32:16,730 --> 00:32:18,830 of simple and complex cells that I 770 00:32:18,830 --> 00:32:22,370 described in the work of Hubel and Wiesel. 771 00:32:22,370 --> 00:32:26,040 So basically, what these models do is they take an image. 772 00:32:26,040 --> 00:32:27,590 These are pixels. 773 00:32:27,590 --> 00:32:28,760 There's a filtering step. 774 00:32:28,760 --> 00:32:32,294 This filtering step involves Gabor filtering of the image. 775 00:32:32,294 --> 00:32:33,710 In this particular case, there are 776 00:32:33,710 --> 00:32:36,020 four different orientations. 777 00:32:36,020 --> 00:32:37,880 And what you get here is a map 778 00:32:37,880 --> 00:32:42,470 of the visual input after this linear filtering process. 779 00:32:42,470 --> 00:32:46,700 The next step in this model is a local max operation. 780 00:32:46,700 --> 00:32:50,180 This pools neurons that have identical feature 781 00:32:50,180 --> 00:32:53,360 preferences, but slightly different scales 782 00:32:53,360 --> 00:32:54,680 in their receptive fields, 783 00:32:54,680 --> 00:32:57,530 or slightly different positions in their receptive fields. 784 00:32:57,530 --> 00:33:00,410 And this max operation, this non-linear operation, 785 00:33:00,410 --> 00:33:03,440 is what gives you invariance. 786 00:33:03,440 --> 00:33:07,070 So now you can get a response to the same feature, irrespective 787 00:33:07,070 --> 00:33:09,770 of the exact scale or the exact position 788 00:33:09,770 --> 00:33:12,080 within the receptive field. 789 00:33:12,080 --> 00:33:15,165 These were labeled S1 and C1, initially 790 00:33:15,165 --> 00:33:16,650 in models by Fukushima. 791 00:33:16,650 --> 00:33:19,130 And this type of nomenclature was carried on later 792 00:33:19,130 --> 00:33:21,030 by Tommy and many others. 793 00:33:21,030 --> 00:33:24,410 And this is directly inspired by the simple and complex cells 794 00:33:24,410 --> 00:33:26,840 that I very briefly showed you previously 795 00:33:26,840 --> 00:33:29,660 in the recordings of Hubel and Wiesel.
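For intuition about those S1/C1 stages, here is a minimal sketch, assuming only NumPy and SciPy: a small Gabor filter bank (the S1-like linear step) followed by local max pooling over position (a C1-like non-linearity). The filter parameters and pool size are arbitrary choices for illustration, not the values used in HMAX.

```python
import numpy as np
from scipy.signal import convolve2d

def gabor_kernel(theta, size=11, sigma=2.5, wavelength=5.0):
    """Odd-phase Gabor filter at orientation theta (radians)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.sin(2 * np.pi * xr / wavelength)

def s1_layer(image, n_orientations=4):
    """S1-like step: Gabor filtering at several orientations."""
    thetas = [i * np.pi / n_orientations for i in range(n_orientations)]
    return [np.abs(convolve2d(image, gabor_kernel(t), mode='same'))
            for t in thetas]

def c1_layer(s1_maps, pool=4):
    """C1-like step: local max pooling for position tolerance."""
    pooled = []
    for m in s1_maps:
        h, w = m.shape[0] // pool, m.shape[1] // pool
        m = m[:h * pool, :w * pool].reshape(h, pool, w, pool)
        pooled.append(m.max(axis=(1, 3)))   # max over each pool x pool block
    return pooled

image = np.random.rand(64, 64)          # stand-in for an input image
c1 = c1_layer(s1_layer(image))
print([m.shape for m in c1])            # four 16x16 orientation maps
```

The max over a neighborhood is the key non-linearity here: the C1 response is preserved when a preferred feature shifts slightly within the pool, which is the position tolerance described above.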
796 00:33:29,660 --> 00:33:32,090 These filtering and max operations 797 00:33:32,090 --> 00:33:35,150 are repeated throughout the hierarchy again and again. 798 00:33:35,150 --> 00:33:37,160 So here's another layer that has a filtering 799 00:33:37,160 --> 00:33:40,850 step and a nonlinear max step. 800 00:33:40,850 --> 00:33:44,690 In this case, this filtering here is not a Gabor filter. 801 00:33:44,690 --> 00:33:47,870 We don't really understand very well what neurons in V2 and V4 802 00:33:47,870 --> 00:33:48,590 are doing. 803 00:33:48,590 --> 00:33:51,020 One of the types of filters that have been used, 804 00:33:51,020 --> 00:33:54,380 and that we are using here, is a radial basis function, 805 00:33:54,380 --> 00:33:56,630 where the properties of a neuron in this case 806 00:33:56,630 --> 00:34:02,150 are dictated by patches taken randomly from natural images. 807 00:34:02,150 --> 00:34:04,430 All of this is purely feed-forward. 808 00:34:04,430 --> 00:34:06,928 All of this is essentially the basic ingredient 809 00:34:06,928 --> 00:34:08,469 of the type of convolutional networks 810 00:34:08,469 --> 00:34:11,449 that have been used for object recognition. 811 00:34:11,449 --> 00:34:13,190 You can have more layers. 812 00:34:13,190 --> 00:34:15,199 You can have different types of computations. 813 00:34:15,199 --> 00:34:17,179 The basic properties are essentially 814 00:34:17,179 --> 00:34:19,370 the ones that are described briefly here. 815 00:34:19,370 --> 00:34:21,920 What I really want to talk about is not the feed-forward part, 816 00:34:21,920 --> 00:34:23,210 but this part of the model. 817 00:34:23,210 --> 00:34:26,550 Now if I ask you, where's Waldo, you need to do something: 818 00:34:26,550 --> 00:34:29,510 you need to be able to somehow look at this information, 819 00:34:29,510 --> 00:34:32,300 and be able to bias your responses, 820 00:34:32,300 --> 00:34:36,949 or bias the model, towards regions of the visual space 821 00:34:36,949 --> 00:34:39,949 that have features that resemble what you're looking for. 822 00:34:39,949 --> 00:34:42,949 Your car, your keys, Waldo. 823 00:34:42,949 --> 00:34:45,260 So the way we do that is, first, in this case, 824 00:34:45,260 --> 00:34:47,093 I'm going to show you what happens if you're 825 00:34:47,093 --> 00:34:48,780 looking for the top hat here. 826 00:34:48,780 --> 00:34:50,210 So first, we have a representation 827 00:34:50,210 --> 00:34:51,920 in the model of the top hat. 828 00:34:51,920 --> 00:34:53,350 This is the hat here. 829 00:34:53,350 --> 00:34:56,360 And we have a representation in our vocabulary 830 00:34:56,360 --> 00:34:59,390 of how units in the highest echelons of this model 831 00:34:59,390 --> 00:35:00,290 represent this hat. 832 00:35:00,290 --> 00:35:02,750 So we have a representation of the features 833 00:35:02,750 --> 00:35:07,580 that compose this object at a high level in this model. 834 00:35:07,580 --> 00:35:10,490 We use that representation to modulate, 835 00:35:10,490 --> 00:35:13,490 in a multiplicative fashion, the entire image. 836 00:35:13,490 --> 00:35:15,500 Essentially, we bias the responses 837 00:35:15,500 --> 00:35:19,190 in the entire image based on the particular features 838 00:35:19,190 --> 00:35:21,560 that we are searching for.
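Here is a minimal sketch of that multiplicative, feature-based bias, under the assumption that the image has already been encoded as a stack of feature maps (for example, the C1 maps from the earlier sketch) and that the target is summarized by one weight per feature; the names feature_maps and target_weights are illustrative, not the model's actual variables.

```python
import numpy as np

def feature_modulation(feature_maps, target_weights):
    """Multiplicatively bias every location by how strongly its
    features match the target's feature profile.

    feature_maps:   array of shape (n_features, H, W)
    target_weights: array of shape (n_features,), the target object's
                    representation in the same feature vocabulary
    Returns a single (H, W) map: high where target-like features occur.
    """
    biased = feature_maps * target_weights[:, None, None]  # broadcast weights
    return biased.sum(axis=0)                              # pool over features

# Hypothetical usage with random stand-ins
maps = np.random.rand(4, 16, 16)         # e.g., 4 orientation maps
target = np.array([0.1, 0.2, 1.5, 0.1])  # target dominated by feature 3
attention_map = feature_modulation(maps, target)
```

Note that the same weights multiply every spatial location, which is exactly the space-spanning property of feature-based modulation discussed next.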
839 00:35:21,560 --> 00:35:24,170 This is inspired by many physiological experiments that 840 00:35:24,170 --> 00:35:27,680 have shown, to a good approximation, 841 00:35:27,680 --> 00:35:29,270 that this type of modulation in feature- 842 00:35:29,270 --> 00:35:32,750 based attention occurs across different parts 843 00:35:32,750 --> 00:35:33,590 of the visual field. 844 00:35:33,590 --> 00:35:36,080 That is, if you're searching for red objects, 845 00:35:36,080 --> 00:35:39,152 neurons that like red will enhance their response 846 00:35:39,152 --> 00:35:40,610 throughout the entire visual field. 847 00:35:40,610 --> 00:35:43,520 So we have the entire visual field modulated 848 00:35:43,520 --> 00:35:48,770 by the pattern of features that we're searching for here. 849 00:35:48,770 --> 00:35:52,150 After that, we have a normalization step. 850 00:35:52,150 --> 00:35:54,270 This normalization step is critical 851 00:35:54,270 --> 00:35:57,300 in order to discount purely bottom-up effects. 852 00:35:57,300 --> 00:35:59,870 We don't want the competition between different objects 853 00:35:59,870 --> 00:36:03,140 to be purely dictated by which object is brighter, 854 00:36:03,140 --> 00:36:03,770 for example. 855 00:36:03,770 --> 00:36:06,680 So we normalize after modulating 856 00:36:06,680 --> 00:36:09,650 with the features that we are searching for. 857 00:36:09,650 --> 00:36:13,340 That gives us a map of the image, where each area has been 858 00:36:13,340 --> 00:36:15,560 essentially compared to this feature set 859 00:36:15,560 --> 00:36:16,910 that we're looking for. 860 00:36:16,910 --> 00:36:19,190 And then we have a winner-take-all mechanism that 861 00:36:19,190 --> 00:36:21,680 dictates where the model will pay attention, 862 00:36:21,680 --> 00:36:23,840 or where the model will fixate first-- 863 00:36:23,840 --> 00:36:27,800 where the model thinks that a particular object is located. 864 00:36:27,800 --> 00:36:30,920 OK, so what happens when we have this feedback that's 865 00:36:30,920 --> 00:36:33,680 feature-specific, and that modulates the responses based 866 00:36:33,680 --> 00:36:35,900 on the target object that we're searching for. 867 00:36:35,900 --> 00:36:38,120 In these two images, either in object 868 00:36:38,120 --> 00:36:42,180 arrays or when objects are embedded in complex scenes, 869 00:36:42,180 --> 00:36:44,300 we're searching for this top hat. 870 00:36:44,300 --> 00:36:46,730 And the largest response in the model 871 00:36:46,730 --> 00:36:50,090 is indeed at the location where the object is. 872 00:36:50,090 --> 00:36:52,070 In these other two images, the model 873 00:36:52,070 --> 00:36:54,410 is searching for this accordion here. 874 00:36:54,410 --> 00:36:56,570 And again, the model was able to find that 875 00:36:56,570 --> 00:37:00,110 by this comparison of the features with the stimulus. 876 00:37:00,110 --> 00:37:03,420 More generally, these are object array images. 877 00:37:03,420 --> 00:37:06,170 This is the number of fixations required 878 00:37:06,170 --> 00:37:08,960 to find the object in these object array images. 879 00:37:08,960 --> 00:37:11,330 So one would correspond to the first fixation. 880 00:37:11,330 --> 00:37:14,670 If the model does not find the object in the first location, 881 00:37:14,670 --> 00:37:16,520 there's what's called inhibition of return.
882 00:37:16,520 --> 00:37:18,470 So we make sure the model does not come back 883 00:37:18,470 --> 00:37:20,600 to the same location, and the model will 884 00:37:20,600 --> 00:37:24,230 look at the second best possible location in the image. 885 00:37:24,230 --> 00:37:27,870 And it will keep on searching until it finds the object. 886 00:37:27,870 --> 00:37:31,460 So the model performs at 60% correct on the first fixation. 887 00:37:31,460 --> 00:37:33,380 And eventually, after five fixations, 888 00:37:33,380 --> 00:37:37,160 it can find the object almost always, right here. 889 00:37:37,160 --> 00:37:39,560 This is what you would expect from random search-- 890 00:37:39,560 --> 00:37:42,020 if you were to randomly fixate on different objects-- 891 00:37:42,020 --> 00:37:44,130 so the model is doing much better than that. 892 00:37:44,130 --> 00:37:45,830 And then for the aficionados, there's 893 00:37:45,830 --> 00:37:48,980 a whole plethora of purely bottom-up models that 894 00:37:48,980 --> 00:37:50,882 don't have feedback whatsoever. 895 00:37:50,882 --> 00:37:52,340 This is a family of models that was 896 00:37:52,340 --> 00:37:54,920 pioneered by people like Laurent Itti and Christof Koch. 897 00:37:54,920 --> 00:37:56,812 These are saliency-based models. 898 00:37:56,812 --> 00:37:59,270 Although you cannot see them, there are a couple of other points 899 00:37:59,270 --> 00:37:59,990 in here. 900 00:37:59,990 --> 00:38:03,080 None of those models can find the object either. 901 00:38:03,080 --> 00:38:05,720 It's not that these objects that we're searching for 902 00:38:05,720 --> 00:38:07,550 are more salient, and therefore, that's 903 00:38:07,550 --> 00:38:09,290 why the model is finding them. 904 00:38:09,290 --> 00:38:11,340 We really need something more than just 905 00:38:11,340 --> 00:38:13,460 pure bottom-up saliency. 906 00:38:13,460 --> 00:38:14,960 We did a psychophysical experiment. 907 00:38:14,960 --> 00:38:17,690 We asked, well, this is how the model searches for Waldo. 908 00:38:17,690 --> 00:38:19,220 How do humans search for objects 909 00:38:19,220 --> 00:38:20,910 under the same conditions. 910 00:38:20,910 --> 00:38:22,530 So we had multiple objects. 911 00:38:22,530 --> 00:38:24,980 Subjects had to make a saccade to a target object. 912 00:38:24,980 --> 00:38:27,770 To make a long story short, this is the cumulative performance 913 00:38:27,770 --> 00:38:30,110 of the model versus the number of fixations 914 00:38:30,110 --> 00:38:32,420 under these conditions, and the model 915 00:38:32,420 --> 00:38:35,780 is reasonable in terms of how well humans do. 916 00:38:35,780 --> 00:38:39,867 This is data from every single individual subject in the task. 917 00:38:39,867 --> 00:38:41,450 I'm going to skip some of the details. 918 00:38:41,450 --> 00:38:45,110 You can compare the errors that the model is making, 919 00:38:45,110 --> 00:38:47,960 how consistent people are with themselves and with respect 920 00:38:47,960 --> 00:38:48,710 to other subjects, 921 00:38:48,710 --> 00:38:50,960 and how good the model is with respect to humans. 922 00:38:50,960 --> 00:38:53,630 The long and short of it is that the model is far from perfect. 923 00:38:53,630 --> 00:38:55,280 We don't think that we have captured 924 00:38:55,280 --> 00:38:58,010 everything we need to understand about visual search.
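Putting the search pieces together, here is a minimal sketch of the loop just described: normalize the modulated map, pick the winner-take-all location, and apply inhibition of return until the target is found. The suppression radius, the stopping rule, and the attention_map input (for example, from the earlier sketch) are illustrative assumptions, not the published model's parameters.

```python
import numpy as np

def search(attention_map, target_location, radius=2, max_fixations=10):
    """Winner-take-all search with inhibition of return.

    attention_map:   (H, W) map of target-feature evidence
    target_location: (row, col) of the true target, used to stop the loop
    Returns the list of fixations made until the target is found.
    """
    amap = attention_map / attention_map.max()   # crude normalization
    fixations = []
    for _ in range(max_fixations):
        r, c = np.unravel_index(np.argmax(amap), amap.shape)  # WTA winner
        fixations.append((r, c))
        if abs(r - target_location[0]) <= radius and \
           abs(c - target_location[1]) <= radius:
            break                                 # target fixated
        # inhibition of return: suppress the visited neighborhood
        r0, c0 = max(r - radius, 0), max(c - radius, 0)
        amap[r0:r + radius + 1, c0:c + radius + 1] = -np.inf
    return fixations

# Hypothetical usage
amap = np.random.rand(16, 16)
amap[10, 5] += 2.0                      # strong evidence at the target
print(search(amap, target_location=(10, 5)))
```

Each pass fixates the current best location and then rules it out, so the model visits the second-best location next, exactly as in the description above.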
925 00:38:58,010 --> 00:39:00,500 Some people alluded before, for example, to the notion 926 00:39:00,500 --> 00:39:03,080 that the model doesn't have these major changes 927 00:39:03,080 --> 00:39:05,870 with eccentricity, the fovea, and so on. 928 00:39:05,870 --> 00:39:07,640 There's a long way to go, but we think that we've 929 00:39:07,640 --> 00:39:10,280 captured some of the essential initial ingredients 930 00:39:10,280 --> 00:39:11,240 of visual search, 931 00:39:11,240 --> 00:39:15,110 and that this is one example of how visual feedback signals can 932 00:39:15,110 --> 00:39:18,530 influence this bottom-up hierarchy for recognition. 933 00:39:18,530 --> 00:39:21,320 I want to very quickly move on to a third example 934 00:39:21,320 --> 00:39:24,410 that I wanted to give you of how feedback can help 935 00:39:24,410 --> 00:39:25,790 in terms of visual recognition. 936 00:39:25,790 --> 00:39:28,870 What are other functions that feedback could be playing. 937 00:39:28,870 --> 00:39:31,700 And for that, I'd like to discuss the work that Hanlin 938 00:39:31,700 --> 00:39:34,190 did here, and also Bill Lotter in the lab, 939 00:39:34,190 --> 00:39:36,440 in terms of how we can recognize objects 940 00:39:36,440 --> 00:39:37,790 that are partially occluded. 941 00:39:37,790 --> 00:39:39,170 This happens all the time. 942 00:39:39,170 --> 00:39:42,080 So you walk around and see objects in the world. 943 00:39:42,080 --> 00:39:44,300 You can also encounter objects where you only 944 00:39:44,300 --> 00:39:45,930 have partial information, and you have 945 00:39:45,930 --> 00:39:47,400 to perform pattern completion. 946 00:39:47,400 --> 00:39:49,340 Pattern completion is a fundamental aspect 947 00:39:49,340 --> 00:39:50,120 of intelligence. 948 00:39:50,120 --> 00:39:52,520 We do that in all sorts of scenarios. 949 00:39:52,520 --> 00:39:54,440 It's not just restricted to vision. 950 00:39:54,440 --> 00:39:57,350 All of you can probably complete all of these patterns. 951 00:39:57,350 --> 00:39:59,410 We use pattern completion in social scenarios 952 00:39:59,410 --> 00:40:00,330 as well, right? 953 00:40:00,330 --> 00:40:02,152 You make inferences from partial knowledge 954 00:40:02,152 --> 00:40:04,110 about people's intentions, and what they're doing, 955 00:40:04,110 --> 00:40:06,120 and what they're trying to do, OK? 956 00:40:06,120 --> 00:40:09,340 So we want to study this problem of how you complete patterns, 957 00:40:09,340 --> 00:40:12,510 how you extrapolate from partial, limited information, 958 00:40:12,510 --> 00:40:14,885 in the context of visual recognition. 959 00:40:14,885 --> 00:40:16,260 There are a lot of different ways 960 00:40:16,260 --> 00:40:19,640 in which one can present partially occluded objects. 961 00:40:19,640 --> 00:40:21,120 Here are just a few of them. 962 00:40:21,120 --> 00:40:23,490 What Hanlin did was use a paradigm called 963 00:40:23,490 --> 00:40:25,579 bubbles that's shown here. 964 00:40:25,579 --> 00:40:27,870 Essentially, it's like looking at the world like this. 965 00:40:27,870 --> 00:40:29,850 You only have small windows through which 966 00:40:29,850 --> 00:40:31,140 you can see the object. 967 00:40:31,140 --> 00:40:34,759 Performance can be titrated to make the task harder or easier. 968 00:40:34,759 --> 00:40:36,300 So if you have a lot of bubbles, it's 969 00:40:36,300 --> 00:40:39,750 relatively easy to recognize that this is a toy school bus.
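For intuition about the bubbles paradigm, here is a minimal sketch of how such stimuli can be generated: the image is revealed only through a few Gaussian windows placed at random locations. The bubble count and width are the knobs that titrate difficulty; the specific values here are arbitrary, not those used in Hanlin's experiments.

```python
import numpy as np

def bubbles_mask(shape, n_bubbles=4, sigma=8.0, rng=None):
    """Build a [0, 1] mask of n Gaussian 'bubbles' at random locations."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    mask = np.zeros(shape)
    for _ in range(n_bubbles):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        mask += np.exp(-((yy - cy)**2 + (xx - cx)**2) / (2 * sigma**2))
    return np.clip(mask, 0, 1)

image = np.random.rand(128, 128)     # stand-in for an object image
occluded_image = image * bubbles_mask(image.shape, n_bubbles=4)
# Fewer or narrower bubbles -> less of the object visible -> harder task
```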
970 00:40:39,750 --> 00:40:41,250 If you have only four bubbles, it's 971 00:40:41,250 --> 00:40:42,820 actually pretty challenging. 972 00:40:42,820 --> 00:40:46,950 So we can titrate the difficulty of this task. 973 00:40:46,950 --> 00:40:50,760 Very quickly, let me start by showing you psychophysics 974 00:40:50,760 --> 00:40:51,540 performance here. 975 00:40:51,540 --> 00:40:54,570 This is how subjects perform as a function 976 00:40:54,570 --> 00:40:58,110 of the amount of occlusion in the image-- as a function of how 977 00:40:58,110 --> 00:41:00,760 many pixels you're showing for these images. 978 00:41:00,760 --> 00:41:04,470 And what you see here is that with 60% occlusion, 979 00:41:04,470 --> 00:41:06,420 performance is extremely high. 980 00:41:06,420 --> 00:41:08,910 Performance essentially drops to chance level 981 00:41:08,910 --> 00:41:10,940 when the object is more and more occluded. 982 00:41:10,940 --> 00:41:13,230 There is a significant amount of robustness 983 00:41:13,230 --> 00:41:14,726 in human performance. 984 00:41:14,726 --> 00:41:16,350 For example, if you have a little bit more 985 00:41:16,350 --> 00:41:18,510 than 10% of the pixels of the object, 986 00:41:18,510 --> 00:41:21,360 people can still recognize it reasonably well. 987 00:41:21,360 --> 00:41:24,280 So this is all behavioral data. 988 00:41:24,280 --> 00:41:27,540 Let me show you very quickly what Hanlin discovered 989 00:41:27,540 --> 00:41:31,110 by doing invasive recordings in human patients 990 00:41:31,110 --> 00:41:32,550 while the subjects were performing 991 00:41:32,550 --> 00:41:36,060 this recognition of objects that are partially occluded. 992 00:41:36,060 --> 00:41:38,610 It's illegal to put electrodes in the human brain 993 00:41:38,610 --> 00:41:41,430 in healthy people, so we work with subjects 994 00:41:41,430 --> 00:41:44,470 who have pharmacologically intractable epilepsy. 995 00:41:44,470 --> 00:41:46,440 So in subjects that have seizures, 996 00:41:46,440 --> 00:41:48,930 the neurosurgeons need to implant electrodes: first, in order 997 00:41:48,930 --> 00:41:51,180 to localize the seizures, 998 00:41:51,180 --> 00:41:54,690 and second, in order to ensure that when they do a resection, 999 00:41:54,690 --> 00:41:56,700 and they take out the part of the brain that's 1000 00:41:56,700 --> 00:41:58,710 responsible for the seizures, they're not 1001 00:41:58,710 --> 00:42:02,380 going to interfere with other functions, such as language. 1002 00:42:02,380 --> 00:42:05,190 These patients stay in the hospital for about one week. 1003 00:42:05,190 --> 00:42:07,620 And during this one week, we have a unique opportunity 1004 00:42:07,620 --> 00:42:11,280 to go inside a human brain, and record physiological data. 1005 00:42:11,280 --> 00:42:12,700 Depending on the type of patient, 1006 00:42:12,700 --> 00:42:15,160 we use different types of electrodes. 1007 00:42:15,160 --> 00:42:17,850 This is what some people refer to as ECoG electrodes-- 1008 00:42:17,850 --> 00:42:19,720 electrocorticographic signals. 1009 00:42:19,720 --> 00:42:21,185 These are field potential signals, 1010 00:42:21,185 --> 00:42:23,310 very different from the spikes that I was showing you 1011 00:42:23,310 --> 00:42:25,250 before.
1012 00:42:25,250 --> 00:42:28,420 These are aggregate measures, probably of tens of thousands, 1013 00:42:28,420 --> 00:42:31,350 if not millions of neurons, where we have very, very 1014 00:42:31,350 --> 00:42:34,350 high temporal resolution at the millisecond level, but very 1015 00:42:34,350 --> 00:42:38,100 poor spatial resolution, only being able to localize things 1016 00:42:38,100 --> 00:42:40,980 at the millimeter level or so. 1017 00:42:40,980 --> 00:42:43,500 With these, we can pinpoint specific locations 1018 00:42:43,500 --> 00:42:45,840 only within approximately one millimeter, 1019 00:42:45,840 --> 00:42:48,750 but we have very high signal-to-noise-ratio signals that are 1020 00:42:48,750 --> 00:42:50,700 dictated by the visual input. 1021 00:42:50,700 --> 00:42:53,426 An example of those signals is shown here. 1022 00:42:53,426 --> 00:42:55,050 These are intracranial field potentials 1023 00:42:55,050 --> 00:42:56,490 as a function of time. 1024 00:42:56,490 --> 00:42:58,250 This is the onset of the stimulus. 1025 00:42:58,250 --> 00:43:00,330 And in these 39 different repetitions, 1026 00:43:00,330 --> 00:43:03,510 when Hanlin is showing this unoccluded face, 1027 00:43:03,510 --> 00:43:06,507 we see a very vigorous change, quite systematic 1028 00:43:06,507 --> 00:43:07,590 from one trial to another. 1029 00:43:07,590 --> 00:43:10,200 All of those gray traces are single trials, 1030 00:43:10,200 --> 00:43:13,470 similar to the raster plot that I was showing you before. 1031 00:43:13,470 --> 00:43:16,740 So now I'm going to show you a couple of single trials. 1032 00:43:16,740 --> 00:43:19,440 We're showing individual images where 1033 00:43:19,440 --> 00:43:20,760 objects are partially occluded. 1034 00:43:20,760 --> 00:43:23,310 In this case, there's only about 15% 1035 00:43:23,310 --> 00:43:26,070 of the pixels of the face that are being shown. 1036 00:43:26,070 --> 00:43:28,170 And we see that despite the fact that we're 1037 00:43:28,170 --> 00:43:31,500 covering 85%, more or less, of that image, 1038 00:43:31,500 --> 00:43:34,170 we still see a pretty consistent physiological signal. 1039 00:43:34,170 --> 00:43:35,950 The signals are clearly not identical. 1040 00:43:35,950 --> 00:43:38,340 For example, this one looks somewhat different. 1041 00:43:38,340 --> 00:43:40,410 There's a lot of variability from one trial to another. 1042 00:43:40,410 --> 00:43:43,230 But again, these are just single trials showing that there still 1043 00:43:43,230 --> 00:43:45,600 is selectivity for this shape, despite the fact 1044 00:43:45,600 --> 00:43:48,700 that we are only showing a small fraction of it. 1045 00:43:48,700 --> 00:43:50,940 These are all the trials in which these five 1046 00:43:50,940 --> 00:43:52,439 different faces were presented. 1047 00:43:52,439 --> 00:43:53,730 Each line corresponds to a trial. 1048 00:43:53,730 --> 00:43:54,930 These are raster plots. 1049 00:43:54,930 --> 00:43:58,150 As you can see, the data are extremely clear. 1050 00:43:58,150 --> 00:43:59,490 There's no processing here. 1051 00:43:59,490 --> 00:44:01,920 These are raw data, single trials. 1052 00:44:01,920 --> 00:44:04,350 These are single trials with the partial images. 1053 00:44:04,350 --> 00:44:06,790 You again can see there's a vigorous response here. 1054 00:44:06,790 --> 00:44:08,920 The responses are not as nicely and neatly 1055 00:44:08,920 --> 00:44:11,250 aligned here, in part because all of these images 1056 00:44:11,250 --> 00:44:12,070 are different.
1057 00:44:12,070 --> 00:44:14,111 All of the bubble locations are different. 1058 00:44:14,111 --> 00:44:17,070 As I just showed you, there's a lot of variability here. 1059 00:44:17,070 --> 00:44:19,870 If you actually fix the bubble locations-- that 1060 00:44:19,870 --> 00:44:22,800 is, you repeatedly present the same image multiple times, still 1061 00:44:22,800 --> 00:44:25,005 in pseudorandom order, but the same image-- 1062 00:44:25,005 --> 00:44:26,880 you see that the signals are more consistent. 1063 00:44:26,880 --> 00:44:29,610 Not as consistent as this one, but certainly more consistent. 1064 00:44:29,610 --> 00:44:33,210 Again, a very clear selective response, 1065 00:44:33,210 --> 00:44:37,380 tolerant to a tremendous amount of occlusion in the image. 1066 00:44:37,380 --> 00:44:39,690 Interestingly, the latency of the response 1067 00:44:39,690 --> 00:44:43,060 is significantly longer compared to the whole images. 1068 00:44:43,060 --> 00:44:45,330 So if you look at, for example, 200 milliseconds, 1069 00:44:45,330 --> 00:44:47,340 you see that the responses start significantly 1070 00:44:47,340 --> 00:44:49,780 before 200 milliseconds for the whole images. 1071 00:44:49,780 --> 00:44:52,339 All of the responses here start after 200 milliseconds. 1072 00:44:52,339 --> 00:44:53,880 We spent a significant amount of time 1073 00:44:53,880 --> 00:44:55,504 trying to characterize this and showing 1074 00:44:55,504 --> 00:44:57,150 that pattern completion, the ability 1075 00:44:57,150 --> 00:44:59,420 to recognize objects that are occluded, 1076 00:44:59,420 --> 00:45:03,410 involves a significant delay at the physiological level. 1077 00:45:03,410 --> 00:45:06,530 If you use the purely bottom-up architecture and try to do 1078 00:45:06,530 --> 00:45:08,300 this in silico, 1079 00:45:08,300 --> 00:45:10,850 this bottom-up model does not perform very well. 1080 00:45:10,850 --> 00:45:12,680 The performance deteriorates quite rapidly 1081 00:45:12,680 --> 00:45:15,200 when you start having significant occlusion. 1082 00:45:15,200 --> 00:45:18,440 I'm going to skip this and just very quickly describe some 1083 00:45:18,440 --> 00:45:20,900 of the initial steps that Bill Lotter has 1084 00:45:20,900 --> 00:45:24,560 been taking, trying to add recurrency to the models-- 1085 00:45:24,560 --> 00:45:27,770 trying to have both feedback connections as well 1086 00:45:27,770 --> 00:45:30,290 as recurrent connections within each layer, 1087 00:45:30,290 --> 00:45:33,380 to try to get a model that will be able to perform pattern 1088 00:45:33,380 --> 00:45:35,840 completion, and therefore use these feedback 1089 00:45:35,840 --> 00:45:38,000 signals to allow us to extrapolate 1090 00:45:38,000 --> 00:45:41,030 from prior information about these objects. 1091 00:45:41,030 --> 00:45:44,200 Bill will be here Friday or Monday, I'm not sure. 1092 00:45:44,200 --> 00:45:47,210 So you should talk to him more about these models. 1093 00:45:47,210 --> 00:45:49,920 Essentially, they belong to the family of HMAX. 1094 00:45:49,920 --> 00:45:52,350 They belong to a family of convolutional networks, 1095 00:45:52,350 --> 00:45:54,410 where you have filtering operations, threshold 1096 00:45:54,410 --> 00:45:56,690 and saturation, pooling, and normalization. 1097 00:45:56,690 --> 00:45:59,060 Jim will say more about this family of models 1098 00:45:59,060 --> 00:46:00,380 today in the afternoon. 1099 00:46:00,380 --> 00:46:02,240 These are purely bottom-up models.
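To give a feel for why recurrence helps with occlusion, here is a minimal sketch of recurrent pattern completion-- a classic Hopfield-style network, not Bill's actual model: stored binary patterns act as attractors, and iterating the recurrent dynamics fills in the hidden entries from the visible ones. All sizes and the occlusion fraction are arbitrary illustration choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Store random binary (+/-1) patterns with a Hebbian outer-product rule
n_units, n_patterns = 200, 10
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))
W = sum(np.outer(p, p) for p in patterns) / n_units
np.fill_diagonal(W, 0.0)                  # no self-connections

def complete(state, n_steps):
    """Recurrent cleanup: iterate the dynamics toward a stored attractor."""
    state = state.copy()
    for _ in range(n_steps):
        state = np.sign(W @ state)
        state[state == 0] = 1.0            # break ties deterministically
    return state

# Occlude a stored pattern: hide 90% of its entries (set them to 0)
target = patterns[0]
occluded = target.copy()
occluded[rng.random(n_units) < 0.9] = 0.0

recovered = complete(occluded, n_steps=10)
print("fraction of units recovered:", (recovered == target).mean())
```

The design point is that the visible units seed the dynamics, and the recurrent weights pull the state toward the nearest stored pattern, which is the extrapolation-from-prior-information idea in miniature.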
1100 00:46:02,240 --> 00:46:04,820 And what Bill has been doing is adding recurrent 1101 00:46:04,820 --> 00:46:07,400 and feedback connections, retraining these models based 1102 00:46:07,400 --> 00:46:09,770 on these recurrent and feedback connections, 1103 00:46:09,770 --> 00:46:11,600 and then comparing their performance 1104 00:46:11,600 --> 00:46:13,740 with human psychophysics. 1105 00:46:13,740 --> 00:46:16,429 So this is the behavioral data that I showed you before. 1106 00:46:16,429 --> 00:46:18,470 This is the performance of the feedforward model. 1107 00:46:18,470 --> 00:46:22,310 This is the recurrent model that he was able to train. 1108 00:46:22,310 --> 00:46:24,710 Another way to try to get at whether feedback 1109 00:46:24,710 --> 00:46:26,480 is relevant for pattern completion 1110 00:46:26,480 --> 00:46:28,430 is to use backward masking. 1111 00:46:28,430 --> 00:46:30,860 Backward masking means that you present an image, 1112 00:46:30,860 --> 00:46:32,360 and immediately after that image, 1113 00:46:32,360 --> 00:46:35,360 within a few milliseconds, you present noise. 1114 00:46:35,360 --> 00:46:36,650 You present a mask. 1115 00:46:36,650 --> 00:46:39,200 And people have argued that masking essentially 1116 00:46:39,200 --> 00:46:40,634 interrupts feedback processing. 1117 00:46:40,634 --> 00:46:42,050 Essentially, it allows you to have 1118 00:46:42,050 --> 00:46:45,222 a bottom-up flow of information, but it stops feedback. 1119 00:46:45,222 --> 00:46:47,180 I don't think this is entirely rigorous. 1120 00:46:47,180 --> 00:46:49,160 I think that the story is probably far more complicated 1121 00:46:49,160 --> 00:46:49,730 than that. 1122 00:46:49,730 --> 00:46:51,920 But to a first approximation, you present a picture, 1123 00:46:51,920 --> 00:46:54,200 you have a bottom-up stream, you put a mask, 1124 00:46:54,200 --> 00:46:57,750 and you interrupt all the subsequent feedback processing. 1125 00:46:57,750 --> 00:47:00,020 So if you do that at the behavioral level, 1126 00:47:00,020 --> 00:47:02,990 you can show that when stimuli are masked, particularly 1127 00:47:02,990 --> 00:47:05,990 if the interval is very short, you can significantly impair 1128 00:47:05,990 --> 00:47:07,540 pattern completion performance. 1129 00:47:07,540 --> 00:47:10,340 So if the mask comes within 25 milliseconds 1130 00:47:10,340 --> 00:47:12,440 of the actual stimulus, performance 1131 00:47:12,440 --> 00:47:14,930 in recognizing these heavily occluded objects 1132 00:47:14,930 --> 00:47:16,490 is significantly impaired. 1133 00:47:16,490 --> 00:47:19,970 We interpreted this to indicate that feedback may be 1134 00:47:19,970 --> 00:47:23,010 needed for pattern completion. 1135 00:47:23,010 --> 00:47:27,350 This is Bill's instantiation of that recurrent model. 1136 00:47:27,350 --> 00:47:29,090 Because he has recurrency now, he also 1137 00:47:29,090 --> 00:47:30,620 has time in these models. 1138 00:47:30,620 --> 00:47:32,000 So he can also present the image, 1139 00:47:32,000 --> 00:47:35,030 present the mask to the model, and compare the performance 1140 00:47:35,030 --> 00:47:37,490 of the computational model as a function 1141 00:47:37,490 --> 00:47:41,514 of the occlusion in the unmasked and masked conditions.
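In the same toy setting, one can mimic the masking manipulation under the loose assumption that a backward mask mainly truncates recurrent processing; this reuses complete(), occluded, and target from the sketch above and is only meant to illustrate the logic of the comparison, not Bill's actual model.

```python
# A crude masking analogue: the mask is modeled as cutting off recurrent
# processing after a short interval, i.e., fewer cleanup iterations.
for label, steps in [("unmasked (10 steps)", 10), ("masked (1 step)", 1)]:
    frac = (complete(occluded, steps) == target).mean()
    print(f"{label}: fraction recovered = {frac:.2f}")
# With heavy occlusion, the extra recurrent iterations typically recover
# more of the pattern; the exact numbers depend on the random draw.
```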
1142 00:47:41,514 --> 00:47:43,930 So to summarize this-- and there are still two or three more 1143 00:47:43,930 --> 00:47:45,230 slides that I want to show-- 1144 00:47:45,230 --> 00:47:48,410 I've given you three examples of potential ways 1145 00:47:48,410 --> 00:47:50,880 in which feedback signals can be important. 1146 00:47:50,880 --> 00:47:53,480 The first one has to do with the effects of feedback 1147 00:47:53,480 --> 00:47:56,270 on surround suppression, going from V2 to V1. 1148 00:47:56,270 --> 00:47:58,730 We think that by doing these types of experiments, combined 1149 00:47:58,730 --> 00:48:00,970 with computational models, to understand what 1150 00:48:00,970 --> 00:48:02,780 the fundamental computations are, 1151 00:48:02,780 --> 00:48:04,610 we can begin to elucidate some of the steps 1152 00:48:04,610 --> 00:48:06,770 by which feedback can exert its role. 1153 00:48:06,770 --> 00:48:08,780 We hope to come up with the essential alphabet 1154 00:48:08,780 --> 00:48:11,690 of computations-- similar to the filtering and normalization 1155 00:48:11,690 --> 00:48:14,540 operations-- that are implemented by feedback. 1156 00:48:14,540 --> 00:48:16,970 The second example was feedback being 1157 00:48:16,970 --> 00:48:19,190 able to carry feature-specific signals that dictate what 1158 00:48:19,190 --> 00:48:22,700 we do in visual search tasks. And the last example 1159 00:48:22,700 --> 00:48:25,130 was our preliminary work trying to use feedback, 1160 00:48:25,130 --> 00:48:27,980 as well as recurrent connections, to perform pattern completion 1161 00:48:27,980 --> 00:48:31,820 and extrapolate from prior information. 1162 00:48:31,820 --> 00:48:33,500 So the last thing I wanted to do is just 1163 00:48:33,500 --> 00:48:36,647 flash a few more slides about a couple 1164 00:48:36,647 --> 00:48:39,230 of things that are happening in neuroscience and computational 1165 00:48:39,230 --> 00:48:41,870 neuroscience that I think are tremendously 1166 00:48:41,870 --> 00:48:43,690 exciting. 1167 00:48:43,690 --> 00:48:46,550 If I were young again, these are some of the things 1168 00:48:46,550 --> 00:48:50,010 that I would definitely be very, very excited to follow up on. 1169 00:48:50,010 --> 00:48:52,940 So the notion that we'll be able to go inside brains 1170 00:48:52,940 --> 00:48:55,370 and read our biological code, and eventually write down 1171 00:48:55,370 --> 00:48:58,310 computer code, and build amazing machines is, I think, 1172 00:48:58,310 --> 00:49:00,000 very appealing and sexy. 1173 00:49:00,000 --> 00:49:02,750 But at the same time, it's a far cry, right? 1174 00:49:02,750 --> 00:49:05,420 We're a long way from being able to take biological codes 1175 00:49:05,420 --> 00:49:08,000 and translate them into computational codes. 1176 00:49:08,000 --> 00:49:10,100 It's really extremely tragic. 1177 00:49:10,100 --> 00:49:11,780 So here are three reasons why I think 1178 00:49:11,780 --> 00:49:15,800 there's optimism that this may not be as crazy as it sounds. 1179 00:49:15,800 --> 00:49:18,320 We're beginning to have tremendous information 1180 00:49:18,320 --> 00:49:20,939 about wiring diagrams at exquisite resolution. 1181 00:49:20,939 --> 00:49:22,730 There are a lot of people who are seriously 1182 00:49:22,730 --> 00:49:25,950 thinking about providing us with maps of which 1183 00:49:25,950 --> 00:49:27,800 neuron talks to which other neuron. 1184 00:49:27,800 --> 00:49:30,030 And this was never available before.
1185 00:49:30,030 --> 00:49:32,197 So we are now beginning to have detailed connectivity information 1186 00:49:32,197 --> 00:49:34,821 at much higher resolution than ever before. 1187 00:49:34,821 --> 00:49:36,830 The second one is the strength in numbers. 1188 00:49:36,830 --> 00:49:38,239 For decades, we've been recording 1189 00:49:38,239 --> 00:49:39,780 the activity of one neuron at a time, 1190 00:49:39,780 --> 00:49:41,360 maybe a few neurons at a time. 1191 00:49:41,360 --> 00:49:44,000 Now there are many different ideas and techniques out there 1192 00:49:44,000 --> 00:49:45,680 by which we can listen to and monitor 1193 00:49:45,680 --> 00:49:48,006 the activity of multiple neurons simultaneously. 1194 00:49:48,006 --> 00:49:49,880 And I think this is going to be game changing 1195 00:49:49,880 --> 00:49:51,921 for neurophysiology, but also for the possibility 1196 00:49:51,921 --> 00:49:55,390 of computational models that are inspired by biology. 1197 00:49:55,390 --> 00:49:57,890 And the third one is a series of techniques mostly developed 1198 00:49:57,890 --> 00:50:00,510 by people like Ed Boyden and Karl Deisseroth 1199 00:50:00,510 --> 00:50:03,090 to do optogenetics, and to manipulate these circuits 1200 00:50:03,090 --> 00:50:04,770 with unprecedented resolution. 1201 00:50:04,770 --> 00:50:07,330 So let me expand on that for one second. 1202 00:50:07,330 --> 00:50:08,880 This is C. elegans. 1203 00:50:08,880 --> 00:50:11,760 This is an electron microscopy image from which one 1204 00:50:11,760 --> 00:50:13,445 can characterize the circuitry. 1205 00:50:13,445 --> 00:50:15,570 So it turns out that the pioneering work of Sydney 1206 00:50:15,570 --> 00:50:17,460 Brenner a couple of decades ago has 1207 00:50:17,460 --> 00:50:21,380 led to mapping the connectivity of each one of its 302 neurons-- 1208 00:50:21,380 --> 00:50:24,167 exactly which other neurons each neuron is connected with. 1209 00:50:24,167 --> 00:50:26,250 And this is represented in that rather complex way 1210 00:50:26,250 --> 00:50:27,680 in this diagram here. 1211 00:50:27,680 --> 00:50:29,340 Well, it turns out that people are 1212 00:50:29,340 --> 00:50:33,810 beginning to do these types of heroic experiments 1213 00:50:33,810 --> 00:50:34,500 in cortex. 1214 00:50:34,500 --> 00:50:36,900 So we're beginning to have initial insights 1215 00:50:36,900 --> 00:50:38,430 about connectivity-- about how neurons 1216 00:50:38,430 --> 00:50:41,710 are wired with each other at this resolution in cortex. 1217 00:50:41,710 --> 00:50:45,110 We're nowhere near being able to have this for humans. 1218 00:50:45,110 --> 00:50:47,020 Not even for other species, mice, and so on. 1219 00:50:47,020 --> 00:50:48,209 Not even Drosophila yet. 1220 00:50:48,209 --> 00:50:50,250 There's a huge amount of [INAUDIBLE] and interest 1221 00:50:50,250 --> 00:50:53,160 in the community in having a very detailed map. 1222 00:50:53,160 --> 00:50:55,710 So the question for you, for the young and next generation, 1223 00:50:55,710 --> 00:50:57,376 is what are we going to do with these maps. 1224 00:50:57,376 --> 00:51:00,210 If I give you a fantastic, detailed wiring diagram 1225 00:51:00,210 --> 00:51:02,520 of a chunk of cortex, how is that 1226 00:51:02,520 --> 00:51:05,520 going to transform our ability to make inferences and build 1227 00:51:05,520 --> 00:51:07,382 new computational models.
1228 00:51:07,382 --> 00:51:09,090 The second one has to do with our ability 1229 00:51:09,090 --> 00:51:11,300 to record from more and more neurons. 1230 00:51:11,300 --> 00:51:13,466 This is other work I didn't have time to talk about. 1231 00:51:13,466 --> 00:51:16,180 This is also work that Hanlin did with Matias Ison and Itzhak 1232 00:51:16,180 --> 00:51:16,680 Fried. 1233 00:51:16,680 --> 00:51:18,960 These are recordings of spikes from human cortex, 1234 00:51:18,960 --> 00:51:21,120 again, in patients that have epilepsy. 1235 00:51:21,120 --> 00:51:23,990 I'm just flashing this slide because I had it handy. 1236 00:51:23,990 --> 00:51:25,140 These are 300 neurons. 1237 00:51:25,140 --> 00:51:27,991 This is not a simultaneously recorded population. 1238 00:51:27,991 --> 00:51:30,240 These are cases where we can record from a few neurons 1239 00:51:30,240 --> 00:51:31,827 at a time, using microwires. 1240 00:51:31,827 --> 00:51:33,660 This is different from the type of recording 1241 00:51:33,660 --> 00:51:34,860 that I showed you before. 1242 00:51:34,860 --> 00:51:37,180 These are actual spikes that we can record. 1243 00:51:37,180 --> 00:51:40,590 And these 380 neurons were recorded in a different task. 1244 00:51:40,590 --> 00:51:43,200 So recording from these 318 neurons 1245 00:51:43,200 --> 00:51:45,912 took us about three to four years. 1246 00:51:45,912 --> 00:51:47,370 There are more and more people that 1247 00:51:47,370 --> 00:51:49,930 are using either two-photon imaging 1248 00:51:49,930 --> 00:51:53,550 and/or massive multielectrode arrays that are beginning 1249 00:51:53,550 --> 00:51:56,640 to be able to record the activity of hundreds of neurons 1250 00:51:56,640 --> 00:51:57,750 simultaneously. 1251 00:51:57,750 --> 00:52:01,530 My good friend and crazy inventor, Ed Boyden, 1252 00:52:01,530 --> 00:52:04,440 believes that we will be able to record from 100,000 neurons 1253 00:52:04,440 --> 00:52:05,370 simultaneously. 1254 00:52:05,370 --> 00:52:07,710 Of course, he is far more grandiose than I am, 1255 00:52:07,710 --> 00:52:10,224 and he can think big at this kind of scale. 1256 00:52:10,224 --> 00:52:12,390 But even think about the possibility of recording 1257 00:52:12,390 --> 00:52:14,850 from 1,000 or 5,000 neurons simultaneously, so 1258 00:52:14,850 --> 00:52:16,890 that in a week or a month, one may 1259 00:52:16,890 --> 00:52:18,630 be able to gather a tremendous amount 1260 00:52:18,630 --> 00:52:20,150 of data from a very large population. 1261 00:52:20,150 --> 00:52:22,410 This is going to be transformative. 1262 00:52:22,410 --> 00:52:25,140 Three decades ago, in the field of molecular biology, 1263 00:52:25,140 --> 00:52:26,817 people would sequence a single gene, 1264 00:52:26,817 --> 00:52:28,650 and they would publish the entire sequence-- 1265 00:52:28,650 --> 00:52:31,329 ACCGG-- and so on. 1266 00:52:31,329 --> 00:52:32,370 That was the whole paper. 1267 00:52:32,370 --> 00:52:34,119 A grad student would spend five years just 1268 00:52:34,119 --> 00:52:35,519 sequencing a single gene. 1269 00:52:35,519 --> 00:52:37,560 Now we have the possibility of downloading entire genomes, 1270 00:52:37,560 --> 00:52:39,337 thanks to advances in technology. 1271 00:52:39,337 --> 00:52:40,920 I suspect that a lot of our recordings 1272 00:52:40,920 --> 00:52:42,270 will become obsolete. 1273 00:52:42,270 --> 00:52:45,330 We'll be able to listen to the activity of thousands 1274 00:52:45,330 --> 00:52:47,070 of neurons simultaneously.
1275 00:52:47,070 --> 00:52:48,810 And again, it's for your generation 1276 00:52:48,810 --> 00:52:50,610 to think about how this will transform 1277 00:52:50,610 --> 00:52:54,000 our understanding of how quickly we can read biological codes. 1278 00:52:54,000 --> 00:52:56,540 In the unlikely event that you think that that's not enough, 1279 00:52:56,540 --> 00:52:58,350 here's one more thing that I think 1280 00:52:58,350 --> 00:53:01,740 is transforming how we can decipher biological codes. 1281 00:53:01,740 --> 00:53:04,080 And that's, again, Ed Boyden, using techniques 1282 00:53:04,080 --> 00:53:06,090 that are referred to as optogenetics, where 1283 00:53:06,090 --> 00:53:10,320 you can manipulate the activity of specific types of neurons. 1284 00:53:10,320 --> 00:53:12,517 I flashed a lot of computational models today-- 1285 00:53:12,517 --> 00:53:14,850 a lot of hypotheses about what different connections may 1286 00:53:14,850 --> 00:53:15,440 be doing. 1287 00:53:15,440 --> 00:53:17,023 At some point, we will be able to test 1288 00:53:17,023 --> 00:53:19,900 some of those hypotheses with unprecedented resolution. 1289 00:53:19,900 --> 00:53:23,220 So if somebody wanted to know what neurons in V2 are doing, 1290 00:53:23,220 --> 00:53:24,720 what kind of feedback they're providing, 1291 00:53:24,720 --> 00:53:27,890 we may be able to silence only the neurons in V2 that 1292 00:53:27,890 --> 00:53:29,997 provide feedback to V1, in a clean manner, 1293 00:53:29,997 --> 00:53:31,455 without affecting, for example, all 1294 00:53:31,455 --> 00:53:34,986 of the other feed-forward processes, and so on. 1295 00:53:34,986 --> 00:53:36,360 So the amount of specificity that 1296 00:53:36,360 --> 00:53:39,131 can be derived from these types of techniques is enormous. 1297 00:53:39,131 --> 00:53:40,380 So that's all I wanted to say. 1298 00:53:40,380 --> 00:53:43,560 So because we have very high specificity in our ability 1299 00:53:43,560 --> 00:53:45,150 to manipulate circuits, because we'll 1300 00:53:45,150 --> 00:53:47,525 be able to record the activity of many, many more neurons 1301 00:53:47,525 --> 00:53:49,290 simultaneously, and because we'll 1302 00:53:49,290 --> 00:53:51,120 have more and more detailed diagrams, 1303 00:53:51,120 --> 00:53:54,030 I think that the dream of being able to read out and decode 1304 00:53:54,030 --> 00:53:57,300 biological codes, and translate those into computational codes, 1305 00:53:57,300 --> 00:53:59,130 is less crazy than it may sound. 1306 00:53:59,130 --> 00:54:02,480 We think that in the next several years and decades, 1307 00:54:02,480 --> 00:54:04,230 smart people like you will be able to make 1308 00:54:04,230 --> 00:54:06,810 this tremendous transformation and discover 1309 00:54:06,810 --> 00:54:08,760 specific algorithms of intelligence 1310 00:54:08,760 --> 00:54:12,720 by taking direct inspiration from biology. 1311 00:54:12,720 --> 00:54:14,370 So that's what's illustrated here. 1312 00:54:14,370 --> 00:54:16,150 We'll be happy to keep on fighting. 1313 00:54:16,150 --> 00:54:17,780 Andrei and I will fight. 1314 00:54:17,780 --> 00:54:20,580 We will be happy to keep on fighting about Eva and how 1315 00:54:20,580 --> 00:54:22,200 amazing she is and isn't. 1316 00:54:22,200 --> 00:54:24,510 What I tried to describe is that by really understanding 1317 00:54:24,510 --> 00:54:26,580 biological codes, we'll be able to write 1318 00:54:26,580 --> 00:54:28,110 amazing computational code. 1319 00:54:28,110 --> 00:54:29,400 I put a lot of arrows here.
1320 00:54:29,400 --> 00:54:30,810 I'm not claiming QED. 1321 00:54:30,810 --> 00:54:32,730 I'm not saying that we solved the problem. 1322 00:54:32,730 --> 00:54:36,030 There's a huge amount of work that we still need to do here.