WEBVTT

00:00:01.680 --> 00:00:04.080
The following content is
provided under a Creative

00:00:04.080 --> 00:00:05.620
Commons license.

00:00:05.620 --> 00:00:07.920
Your support will help
MIT OpenCourseWare

00:00:07.920 --> 00:00:12.280
continue to offer high quality
educational resources for free.

00:00:12.280 --> 00:00:14.910
To make a donation, or
view additional materials

00:00:14.910 --> 00:00:18.870
from hundreds of MIT courses,
visit MIT OpenCourseWare

00:00:18.870 --> 00:00:21.470
at ocw.mit.edu.

00:00:21.470 --> 00:00:22.970
GABRIEL KREIMAN:
What I'd like to do

00:00:22.970 --> 00:00:25.170
today is give a very
brief introduction

00:00:25.170 --> 00:00:28.550
to neural circuits, why we
study them, how we study them,

00:00:28.550 --> 00:00:30.270
and the possibilities
that come out

00:00:30.270 --> 00:00:31.830
of understanding
biological codes,

00:00:31.830 --> 00:00:35.340
and trying to translate those
ideas into computational codes.

00:00:35.340 --> 00:00:37.260
Then I will be a
bit more specific,

00:00:37.260 --> 00:00:39.270
and discuss some
initial attempts

00:00:39.270 --> 00:00:42.510
at studying the computational
role of feedback signals.

00:00:42.510 --> 00:00:44.610
And then I'll switch
gears and talk

00:00:44.610 --> 00:00:47.280
for a few minutes about a
couple of things that are not

00:00:47.280 --> 00:00:49.200
necessarily related
to things that we've

00:00:49.200 --> 00:00:52.950
done any real work on,
but I'm particularly

00:00:52.950 --> 00:00:55.740
excited about in the context
of open questions, challenges,

00:00:55.740 --> 00:00:57.450
and opportunities,
and what I think

00:00:57.450 --> 00:01:00.540
will happen over the next
several years in the field.

00:01:00.540 --> 00:01:02.640
In the hope of
inspiring several of you

00:01:02.640 --> 00:01:06.850
to actually solve some of these
open questions in the field.

00:01:06.850 --> 00:01:10.320
So one of the reasons why I'm
very excited about studying

00:01:10.320 --> 00:01:13.560
biology and studying
brains is that our brains

00:01:13.560 --> 00:01:16.080
are the product of millions
of years of evolution.

00:01:16.080 --> 00:01:18.180
And through evolution,
we have discovered

00:01:18.180 --> 00:01:22.109
how to do things that are
interesting, fast, efficient.

00:01:22.109 --> 00:01:24.150
And so if we can understand
the biological codes,

00:01:24.150 --> 00:01:25.800
if we can understand
the machinery

00:01:25.800 --> 00:01:28.620
by which we do all of
these amazing feats, then

00:01:28.620 --> 00:01:30.570
in principle, we
should be able to take

00:01:30.570 --> 00:01:32.460
some of these biological
codes, and write

00:01:32.460 --> 00:01:35.034
computer code that will
do all of those things

00:01:35.034 --> 00:01:35.700
in similar ways.

00:01:35.700 --> 00:01:38.130
In similar ways that we can
write algorithms to compute

00:01:38.130 --> 00:01:39.720
the square root
of 2, there could

00:01:39.720 --> 00:01:41.790
be algorithms that
dictate how we

00:01:41.790 --> 00:01:43.980
see, how we can
recognize objects,

00:01:43.980 --> 00:01:46.950
how we can recognize
auditory events.

00:01:46.950 --> 00:01:49.710
In short, the answer to all
of these Turing questions,

00:01:49.710 --> 00:01:52.140
in some sense, is
hidden somewhere here

00:01:52.140 --> 00:01:53.160
inside our brain.

00:01:53.160 --> 00:01:56.330
So the question is, how can we
listen to neurons and circuits,

00:01:56.330 --> 00:01:58.410
decode their activity,
and maybe even write

00:01:58.410 --> 00:02:00.390
in information in
the brain, and then

00:02:00.390 --> 00:02:04.770
trying to translate all of these
ideas into computational codes.

00:02:04.770 --> 00:02:07.140
So there's a lot of
fascinating properties

00:02:07.140 --> 00:02:08.729
that biological codes cover.

00:02:08.729 --> 00:02:10.320
Needless to say,
we're not quite there

00:02:10.320 --> 00:02:12.990
yet in terms of
computers and robots.

00:02:12.990 --> 00:02:16.570
So our hardware and software
work for many decades.

00:02:16.570 --> 00:02:21.210
I think it's very unlikely
that your amazing iPhone 6 or 5

00:02:21.210 --> 00:02:24.790
or 7, whatever it is, will last
four, five, six, seven, eight,

00:02:24.790 --> 00:02:25.750
nine decades.

00:02:25.750 --> 00:02:28.140
None of our computers
will last that long.

00:02:28.140 --> 00:02:29.790
Our hardware does.

00:02:29.790 --> 00:02:31.620
There's amazing
parallel computation

00:02:31.620 --> 00:02:32.832
going on in our brains.

00:02:32.832 --> 00:02:34.290
This is quite
distinct from the way

00:02:34.290 --> 00:02:37.620
we think about algorithms and
computation in other domains

00:02:37.620 --> 00:02:38.610
now.

00:02:38.610 --> 00:02:41.160
Our brains have a
reprogrammable architecture.

00:02:41.160 --> 00:02:43.320
The same chunk of
tissue can be used

00:02:43.320 --> 00:02:44.760
for several different purposes.

00:02:44.760 --> 00:02:46.634
Through learning and
through our experiences,

00:02:46.634 --> 00:02:49.530
we can modify those
architectures.

00:02:49.530 --> 00:02:51.690
A thing that has been
quite interesting,

00:02:51.690 --> 00:02:53.630
and that maybe
we'll come back to,

00:02:53.630 --> 00:02:56.640
is the notion of being able
to do single shot learning, as

00:02:56.640 --> 00:02:59.490
opposed to some machine
learning algorithms that require

00:02:59.490 --> 00:03:01.860
lots and lots of data to train.

00:03:01.860 --> 00:03:05.220
We can easily discover
structure in data.

00:03:05.220 --> 00:03:07.290
The notion of fault
tolerance and robustness

00:03:07.290 --> 00:03:10.650
to transformations
is an essential one.

00:03:10.650 --> 00:03:12.900
Robustness is arguably
a fundamental property

00:03:12.900 --> 00:03:16.410
of biology and one that has been
very, very hard to implement

00:03:16.410 --> 00:03:18.320
in computational circuitry.

00:03:18.320 --> 00:03:20.130
And for engineers,
the whole issue

00:03:20.130 --> 00:03:23.040
about how to have different
systems integrate information,

00:03:23.040 --> 00:03:25.560
and interact with each
other, has been and continues

00:03:25.560 --> 00:03:27.270
to be a fundamental challenge.

00:03:27.270 --> 00:03:28.860
And our brains do
that all the time.

00:03:28.860 --> 00:03:30.235
We're walking down
the street, we

00:03:30.235 --> 00:03:31.620
can integrate
visual information,

00:03:31.620 --> 00:03:34.320
with auditory information, with
our targets, our plans, what

00:03:34.320 --> 00:03:38.760
we're interested in doing, our
social interactions, and so on.

00:03:38.760 --> 00:03:40.890
So why do we want to
study neural circuits?

00:03:40.890 --> 00:03:42.630
So I think we are
in the golden era

00:03:42.630 --> 00:03:45.510
right now, because we can begin
to explore the answers to some

00:03:45.510 --> 00:03:49.560
of these Turing questions in
brains at the biological level.

00:03:49.560 --> 00:03:53.100
So we can study high
level cognitive phenomena

00:03:53.100 --> 00:03:55.230
at the level of neurons,
and circuits of neurons.

00:03:55.230 --> 00:03:58.830
And I'll give you a few
examples of that later on.

00:03:58.830 --> 00:04:01.620
More recently, and I'll come
back to this towards the end,

00:04:01.620 --> 00:04:03.390
we've had the
opportunity to begin

00:04:03.390 --> 00:04:06.960
to manipulate, and disrupt, and
interact with neural circuits

00:04:06.960 --> 00:04:09.300
at unprecedented resolution.

00:04:09.300 --> 00:04:12.060
So we can begin
to turn on and off

00:04:12.060 --> 00:04:14.100
specific subsets of neurons.

00:04:14.100 --> 00:04:17.430
And that has tremendously
accelerated our possibility

00:04:17.430 --> 00:04:21.000
to test theories at
the neural level.

00:04:21.000 --> 00:04:24.122
And then again, the notion being
that empirical findings can

00:04:24.122 --> 00:04:26.205
be translated into
computational algorithms-- that

00:04:26.205 --> 00:04:29.564
is, if we really understand
how biology solves the problem,

00:04:29.564 --> 00:04:31.230
in principle, we
should be able to write

00:04:31.230 --> 00:04:34.950
mathematical equations, and
then write code that mimics

00:04:34.950 --> 00:04:36.190
some of those computations.

00:04:36.190 --> 00:04:38.190
And some of the
examples of that, we

00:04:38.190 --> 00:04:40.320
talk about in the visual
system in my presentation,

00:04:40.320 --> 00:04:42.670
but also in Jim
DiCarlo's presentation.

00:04:42.670 --> 00:04:44.670
These are just advertising
for a couple of books

00:04:44.670 --> 00:04:47.100
that I find interesting
and relevant

00:04:47.100 --> 00:04:48.350
in computational neuroscience.

00:04:48.350 --> 00:04:50.532
I'm not going to have
time to do any justice

00:04:50.532 --> 00:04:52.490
to the entire field of
computational neuroscience

00:04:52.490 --> 00:04:52.989
at all.

00:04:52.989 --> 00:04:55.340
So all these slides
will be in Dropbox,

00:04:55.340 --> 00:04:56.970
so if anyone wants
to learn more about

00:04:56.970 --> 00:04:58.140
computational neuroscience.

00:04:58.140 --> 00:05:00.135
There are a lot of
tremendous books.

00:05:00.135 --> 00:05:01.760
Larry Abbott is the
author of this one,

00:05:01.760 --> 00:05:04.910
and he'll be talking tonight.

00:05:04.910 --> 00:05:06.930
So how do we study
biological circuitry?

00:05:06.930 --> 00:05:09.620
And I realize that this is
deja vu and very well known

00:05:09.620 --> 00:05:10.730
for many of you.

00:05:10.730 --> 00:05:12.770
But in general,
we have a variety

00:05:12.770 --> 00:05:15.890
of techniques to probe the
function of brain circuits.

00:05:15.890 --> 00:05:18.410
And this is showing
the temporal resolution

00:05:18.410 --> 00:05:20.660
of different techniques,
and the spatial resolution

00:05:20.660 --> 00:05:24.050
of different techniques used
to study neural circuits.

00:05:24.050 --> 00:05:26.090
All the way from
techniques that have

00:05:26.090 --> 00:05:28.370
limited spatial and
temporal resolution,

00:05:28.370 --> 00:05:30.770
such as PET and fMRI--

00:05:30.770 --> 00:05:33.590
techniques that have very
high temporal resolution,

00:05:33.590 --> 00:05:35.810
but relatively poor
spatial resolution--

00:05:35.810 --> 00:05:37.310
all the way to
techniques that allow

00:05:37.310 --> 00:05:40.460
us to interrogate the function
of individual channels

00:05:40.460 --> 00:05:41.560
within neurons.

00:05:41.560 --> 00:05:43.670
So most of what I'm
going to talk about today

00:05:43.670 --> 00:05:46.310
is what we refer to as the
neural circuit level, somewhere

00:05:46.310 --> 00:05:49.580
in between single neurons
and then ensembles of neurons

00:05:49.580 --> 00:05:51.140
recording the local
field potential,

00:05:51.140 --> 00:05:54.350
which give us the resolution
of milliseconds, where we think

00:05:54.350 --> 00:05:56.780
a lot of the computations
in the cortex are happening,

00:05:56.780 --> 00:05:59.900
and where we think we can
begin to elucidate how neurons

00:05:59.900 --> 00:06:02.810
interact with each other.

00:06:02.810 --> 00:06:04.490
So to start from
the very beginning,

00:06:04.490 --> 00:06:06.350
we need to understand
what a neuron does.

00:06:06.350 --> 00:06:10.140
And again, many of you are
quite familiar with this.

00:06:10.140 --> 00:06:12.260
But the basic
fundamental understanding

00:06:12.260 --> 00:06:15.200
of what a neuron does is to
integrate information-- it receives

00:06:15.200 --> 00:06:17.240
information through
its dendrites,

00:06:17.240 --> 00:06:19.310
integrates that
information, and decides

00:06:19.310 --> 00:06:22.760
whether to fire a spike or not.

00:06:22.760 --> 00:06:25.610
Interestingly, some of the
basic intuitions about neuron

00:06:25.610 --> 00:06:29.300
function were essentially
conceived by a Spaniard,

00:06:29.300 --> 00:06:30.320
Ramón y Cajal.

00:06:30.320 --> 00:06:31.730
He wanted to be an artist.

00:06:31.730 --> 00:06:34.700
His parents told him that he
could not become an artist,

00:06:34.700 --> 00:06:37.130
he had to become a
clinician, a medical doctor.

00:06:37.130 --> 00:06:38.940
So he followed the tradition.

00:06:38.940 --> 00:06:40.530
He became a medical doctor.

00:06:40.530 --> 00:06:43.700
But then he said, well, what I
really like doing is drawing.

00:06:43.700 --> 00:06:46.760
And so he bought a microscope,
he put it in his kitchen,

00:06:46.760 --> 00:06:50.100
and he spent a good chunk of
his life drawing, essentially.

00:06:50.100 --> 00:06:53.810
So he would look at neurons,
and he would draw their shapes.

00:06:53.810 --> 00:06:56.540
And that's essentially
how neuroscience started.

00:06:56.540 --> 00:06:59.450
Just from these beautiful
and amazing array

00:06:59.450 --> 00:07:03.050
of drawings of neurons, he
conjectured the basic flow

00:07:03.050 --> 00:07:03.740
of information.

00:07:03.740 --> 00:07:05.739
This notion that there is
integration of information

00:07:05.739 --> 00:07:07.640
through dendrites, all
of this integration

00:07:07.640 --> 00:07:08.990
happens in the soma.

00:07:08.990 --> 00:07:11.990
And from there, neurons decide
whether to fire a spike or not.

00:07:11.990 --> 00:07:13.430
Nothing more, nothing less.

00:07:13.430 --> 00:07:16.290
That's essentially
the fundamental unit

00:07:16.290 --> 00:07:18.950
of computation in our brains.

00:07:18.950 --> 00:07:22.670
How do we think about and
model those processes?

00:07:22.670 --> 00:07:24.830
There's a family of
different types of models

00:07:24.830 --> 00:07:28.100
that people have used to
describe what a neuron does.

00:07:28.100 --> 00:07:31.940
These models differ in terms
of their biological accuracy,

00:07:31.940 --> 00:07:34.550
and their computational
complexity.

00:07:34.550 --> 00:07:37.880
One of the most used ones is
perhaps an integrate and fire

00:07:37.880 --> 00:07:38.780
neuron.

00:07:38.780 --> 00:07:41.750
This is a very
simple RC circuit.

00:07:41.750 --> 00:07:45.560
It basically integrates current,
and then through a threshold,

00:07:45.560 --> 00:07:49.670
the neuron decides when to
fire or not to fire a spike.

00:07:49.670 --> 00:07:53.330
This is essentially treating
neurons as point masses.

00:07:53.330 --> 00:07:55.970
There are people out there
who have argued that you

00:07:55.970 --> 00:07:57.152
need more and more detail.

00:07:57.152 --> 00:07:59.360
You need to know exactly
how many dendrites you have,

00:07:59.360 --> 00:08:00.910
and the position
of each dendrite,

00:08:00.910 --> 00:08:02.870
and on and on and on and on.

00:08:02.870 --> 00:08:04.700
What's the exact
resolution at which we

00:08:04.700 --> 00:08:08.750
should study neural systems is
a fundamental open question.

00:08:08.750 --> 00:08:11.150
We don't know what's the
right level of abstraction.

00:08:11.150 --> 00:08:14.120
There are people who think about
brains in the context of blood

00:08:14.120 --> 00:08:17.129
flow, and millions and millions
of neurons averaged together.

00:08:17.129 --> 00:08:18.920
There are people who
think that we actually

00:08:18.920 --> 00:08:22.130
need to pay attention to
the exact details of how

00:08:22.130 --> 00:08:25.580
every single dendrite integrates
information, and so on.

00:08:25.580 --> 00:08:27.875
For many of us, this
is a sufficient level

00:08:27.875 --> 00:08:28.500
of abstraction.

00:08:28.500 --> 00:08:31.760
The notion that there's a neuron
that can integrate information.

00:08:31.760 --> 00:08:33.799
So we would like
to push this notion

00:08:33.799 --> 00:08:36.860
that we can think about
models with single neurons,

00:08:36.860 --> 00:08:39.080
and see how far we can go,
understanding that we are

00:08:39.080 --> 00:08:43.039
ignoring a lot of the inner
complexity of what's happening

00:08:43.039 --> 00:08:45.890
inside a neuron itself.

00:08:45.890 --> 00:08:47.660
So very, very
briefly just to push

00:08:47.660 --> 00:08:50.580
the notion that this
is not rocket science.

00:08:50.580 --> 00:08:53.810
It's very, very easy to build
these integrate-and-fire model

00:08:53.810 --> 00:08:54.340
simulations.

00:08:54.340 --> 00:08:57.810
I know many of you do
this on a daily basis.

00:08:57.810 --> 00:09:00.970
This is the equation
of the RC circuit.

00:09:00.970 --> 00:09:03.800
There's current that flows
through a capacitance.

00:09:03.800 --> 00:09:07.690
There's current that flows
through the resistance, which,

00:09:07.690 --> 00:09:10.880
this RC circuit, we think of
as composed of the ion channels

00:09:10.880 --> 00:09:12.750
in the membranes of the neurons.

00:09:12.750 --> 00:09:15.350
And this is all there
is to it in terms

00:09:15.350 --> 00:09:18.410
of a lot of the simulation
that we use to understand

00:09:18.410 --> 00:09:20.404
the function of neurons.
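
NOTE
For reference, a common way to write the leaky integrate-and-fire (RC circuit)
equation described here. The slide is not part of the transcript, so the symbols
are assumptions: C is the membrane capacitance, R the membrane resistance,
V_rest the resting potential, V_th the spike threshold, V_reset the reset value,
and I(t) the injected current.
```latex
C \frac{dV}{dt} = -\frac{V - V_{\mathrm{rest}}}{R} + I(t),
\qquad V \leftarrow V_{\mathrm{reset}} \ \text{(and a spike is emitted) whenever } V \ge V_{\mathrm{th}}
```
Read as a current balance: the capacitive current on the left plus the
resistive (leak) current equals the injected current.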

00:09:20.404 --> 00:09:22.070
And again, just to
tell you that there's

00:09:22.070 --> 00:09:25.560
nothing scary or fundamentally
difficult about this,

00:09:25.560 --> 00:09:27.479
here's just a couple
of lines in MATLAB

00:09:27.479 --> 00:09:29.270
that you can take a
look at if you've never

00:09:29.270 --> 00:09:30.830
done this kind of simulation.

00:09:30.830 --> 00:09:33.950
This is a very simple and
perhaps even somewhat wrong

00:09:33.950 --> 00:09:37.610
simulation of an
integrate-and-fire neuron.

00:09:37.610 --> 00:09:40.730
But just to tell you that it's
relatively simple to build

00:09:40.730 --> 00:09:42.380
models of individual
neurons that

00:09:42.380 --> 00:09:43.880
have these
fundamental properties

00:09:43.880 --> 00:09:46.760
of being able to integrate
information, and decide

00:09:46.760 --> 00:09:48.020
when to fire a spike.
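
NOTE
A minimal sketch of the kind of simulation mentioned here, written in Python
rather than the MATLAB shown on the slide (the slide code is not part of the
transcript). It uses forward-Euler integration of the RC equation with a
threshold and reset. The function name simulate_lif and every parameter value
are illustrative assumptions, not the lecture's.
```python
# Leaky integrate-and-fire neuron, forward-Euler integration.
# Units are chosen to be mutually consistent: nA, nF, MOhm, mV, ms.
def simulate_lif(current_na, dt_ms=0.1, c_nf=1.0, r_mohm=10.0,
                 v_rest_mv=-70.0, v_thresh_mv=-54.0, v_reset_mv=-80.0):
    """Return (voltage trace in mV, spike times in ms) for an input current in nA."""
    v = v_rest_mv
    trace, spike_times = [], []
    for step, i_in in enumerate(current_na):
        # C dV/dt = -(V - V_rest)/R + I
        dv = (-(v - v_rest_mv) / r_mohm + i_in) * dt_ms / c_nf
        v += dv
        # Threshold crossing: record a spike and reset the membrane potential.
        if v >= v_thresh_mv:
            spike_times.append(step * dt_ms)
            v = v_reset_mv
        trace.append(v)
    return trace, spike_times
# 200 ms of constant 2 nA input. The steady state without a threshold would be
# -70 + 2 * 10 = -50 mV, which is above threshold, so the neuron fires repeatedly.
steps = 2000
trace, spikes = simulate_lif([2.0] * steps)
print(f"{len(spikes)} spikes in {steps * 0.1:.0f} ms")
```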

00:09:48.020 --> 00:09:50.330
The fundamental
questions that we really

00:09:50.330 --> 00:09:53.180
want to tackle in
CBMM have to do

00:09:53.180 --> 00:09:55.100
with putting together
lots of neurons,

00:09:55.100 --> 00:09:57.290
and understanding the
function of circuits.

00:09:57.290 --> 00:09:59.340
It's not enough to understand
individual neurons.

00:09:59.340 --> 00:10:01.900
We need to understand how
they interact together.

00:10:01.900 --> 00:10:04.150
We want to understand
what is there,

00:10:04.150 --> 00:10:07.780
who's there, what are they doing
to whom, and when, and why.

00:10:07.780 --> 00:10:11.290
We really need to understand
the activity of multiple neurons

00:10:11.290 --> 00:10:14.270
together in the
form of circuitry.

00:10:14.270 --> 00:10:16.860
So just a handful of
basic definitions.

00:10:16.860 --> 00:10:18.760
If we have a
circuitry like this,

00:10:18.760 --> 00:10:21.820
where we start connecting
multiple neurons together,

00:10:21.820 --> 00:10:25.660
information flows here in this
circuitry in this direction.

00:10:25.660 --> 00:10:28.780
We refer to the
connections between neurons

00:10:28.780 --> 00:10:31.225
that go in this direction
as feed-forward.

00:10:31.225 --> 00:10:33.850
We refer to the connections that
flow in the opposite direction

00:10:33.850 --> 00:10:36.610
as feedback, and I use the
word recurrent connections

00:10:36.610 --> 00:10:40.130
for the horizontal connections
within a particular layer.
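
NOTE
The nomenclature above can be sketched as a small classifier. This is only an
illustration: the function name connection_type and the integer level numbers
are hypothetical, with a higher number standing for a later processing stage.
```python
def connection_type(source_level: int, target_level: int) -> str:
    """Classify a connection by the hierarchy levels it links."""
    if target_level > source_level:
        return "feed-forward"   # along the direction of information flow
    if target_level < source_level:
        return "feedback"       # against the direction of information flow
    return "recurrent"          # horizontal, within the same layer
# Hypothetical level numbers for illustration: LGN = 0, V1 = 1, V2 = 2.
print(connection_type(0, 1))  # LGN to V1
print(connection_type(2, 1))  # V2 to V1
print(connection_type(1, 1))  # V1 to V1
```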

00:10:40.130 --> 00:10:41.770
So this is just to
fix the nomenclature

00:10:41.770 --> 00:10:45.220
for the discussion that
will come next, and also

00:10:45.220 --> 00:10:49.990
today in the afternoon with
Jim DiCarlo's presentation.

00:10:49.990 --> 00:10:52.300
Throughout a lot
of anatomical work,

00:10:52.300 --> 00:10:55.810
we have begun to elucidate
some of the basic connectivity

00:10:55.810 --> 00:10:58.130
between neurons in the cortex.

00:10:58.130 --> 00:11:00.130
And this is the
primary example that

00:11:00.130 --> 00:11:03.580
has been cited extremely
often of what we understand

00:11:03.580 --> 00:11:06.070
about the connectivity
between different areas

00:11:06.070 --> 00:11:07.600
in the macaque monkey.

00:11:07.600 --> 00:11:10.360
We don't have a diagram like
this for the human brain.

00:11:10.360 --> 00:11:12.220
Most of the detailed
anatomical work

00:11:12.220 --> 00:11:14.740
has been done in
macaque monkeys.

00:11:14.740 --> 00:11:18.850
So each of these boxes here
represents a brain area,

00:11:18.850 --> 00:11:20.770
and this encapsulates
our understanding

00:11:20.770 --> 00:11:22.480
of who talks to
whom, or which area

00:11:22.480 --> 00:11:25.490
talks to which other area
in terms of visual cortex.

00:11:25.490 --> 00:11:27.550
There's a lot of
different parts of cortex

00:11:27.550 --> 00:11:29.680
that represent
visual information.

00:11:29.680 --> 00:11:31.900
Here at the bottom,
we have the retina.

00:11:31.900 --> 00:11:35.410
Information from the retina
flows through to the LGN.

00:11:35.410 --> 00:11:38.710
From the LGN, information
goes to primary visual cortex,

00:11:38.710 --> 00:11:40.310
sitting right here.

00:11:40.310 --> 00:11:42.310
And from there,
there's a cascade

00:11:42.310 --> 00:11:45.580
that is largely parallel, and
at the same time, hierarchical,

00:11:45.580 --> 00:11:48.180
of a conglomerate
of multiple areas

00:11:48.180 --> 00:11:52.190
that are fundamental in
processing visual information.

00:11:52.190 --> 00:11:54.206
We'll talk about some
of these areas next.

00:11:54.206 --> 00:11:56.080
And we'll also talk
about some of these areas

00:11:56.080 --> 00:11:58.150
today in the afternoon
when Jim discusses

00:11:58.150 --> 00:12:00.400
what are the fundamental
computations involved

00:12:00.400 --> 00:12:04.480
in visual object recognition.

00:12:04.480 --> 00:12:06.064
One of the fundamental
clues as to how

00:12:06.064 --> 00:12:07.813
do we understand, how
do we know that this

00:12:07.813 --> 00:12:09.310
is a particular
visual area, how do

00:12:09.310 --> 00:12:12.730
we know that this is
important for our vision,

00:12:12.730 --> 00:12:14.770
has come from
anatomical lesions.

00:12:14.770 --> 00:12:18.080
Mostly in monkeys, but in
some cases, in humans as well.

00:12:18.080 --> 00:12:20.320
So if you make lesions
in some of these areas,

00:12:20.320 --> 00:12:22.510
depending on exactly where
you make that lesion,

00:12:22.510 --> 00:12:25.030
people either become
completely blind,

00:12:25.030 --> 00:12:26.740
or they have a
particular scotoma,

00:12:26.740 --> 00:12:29.510
a particular chunk of the visual
field where they cannot see.

00:12:29.510 --> 00:12:31.420
Or they have more
high order types

00:12:31.420 --> 00:12:35.950
of deficits in terms
of visual recognition.

00:12:35.950 --> 00:12:38.110
As an example, the
primary visual cortex

00:12:38.110 --> 00:12:40.870
was discovered by people who
were of the [INAUDIBLE] they

00:12:40.870 --> 00:12:43.780
were studying the trajectory
of bullets in soldiers

00:12:43.780 --> 00:12:46.990
during World War I.
And by discovering

00:12:46.990 --> 00:12:49.570
that some of those
people had a blind part

00:12:49.570 --> 00:12:52.750
to their visual field, and that
was topographically organized

00:12:52.750 --> 00:12:55.000
depending on the particular
trajectory of the bullet

00:12:55.000 --> 00:12:57.080
through their occipital cortex.

00:12:57.080 --> 00:13:00.520
And that's how we came to
think about V1 as fundamental

00:13:00.520 --> 00:13:02.200
in visual processing.

00:13:02.200 --> 00:13:03.890
It is not a perfect hierarchy.

00:13:03.890 --> 00:13:06.250
It's not that there is
A, B, C, D. Right?

00:13:06.250 --> 00:13:07.250
For a number of reasons.

00:13:07.250 --> 00:13:10.080
One is that there are lots
of parallel connections.

00:13:10.080 --> 00:13:12.130
There are lots of
different stages

00:13:12.130 --> 00:13:14.090
that are connected
to each other.

00:13:14.090 --> 00:13:17.620
And one of the ways
to define a hierarchy

00:13:17.620 --> 00:13:20.680
is by looking at the
timing of the responses

00:13:20.680 --> 00:13:22.700
in different areas.

00:13:22.700 --> 00:13:26.334
So if you look at the average
latency of the response in each

00:13:26.334 --> 00:13:28.000
of these areas, you'll
find that there's

00:13:28.000 --> 00:13:29.470
an approximate hierarchy.

00:13:29.470 --> 00:13:32.680
Information gets out of the
retina approximately at 50

00:13:32.680 --> 00:13:34.210
milliseconds.

00:13:34.210 --> 00:13:36.760
About 60 or so milliseconds
in LGN, and so on.

00:13:36.760 --> 00:13:40.000
So it's approximately
a 10 millisecond cost

00:13:40.000 --> 00:13:42.620
per step in terms of
the average latency.
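
NOTE
The arithmetic behind the approximate hierarchy can be sketched as follows,
using only the figures quoted here: about 50 ms at the retina, about 60 ms at
the LGN, and roughly 10 ms more per step. The list of stages after the LGN is
an assumed ordering for illustration, and these are rough averages only.
```python
# Rough mean response latency per stage, as a linear estimate.
STAGES = ["retina", "LGN", "V1", "V2", "V4", "IT"]  # order after LGN is assumed
RETINA_LATENCY_MS = 50.0
COST_PER_STEP_MS = 10.0
def mean_latency_ms(stage: str) -> float:
    """Retina latency plus a fixed cost for each later stage."""
    return RETINA_LATENCY_MS + COST_PER_STEP_MS * STAGES.index(stage)
for stage in STAGES:
    print(f"{stage:>6}: ~{mean_latency_ms(stage):.0f} ms")
```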

00:13:42.620 --> 00:13:44.740
However, if you start
looking at the distribution,

00:13:44.740 --> 00:13:46.690
you'll see that it's
not a strict hierarchy.

00:13:46.690 --> 00:13:51.070
For example, there are
neurons in area V4 that

00:13:51.070 --> 00:13:52.730
are-- the early neurons
in V4 may fire

00:13:52.730 --> 00:13:55.150
before the late neurons in V1.

00:13:55.150 --> 00:13:58.300
And that shows you that the
circuitry is far more complex

00:13:58.300 --> 00:14:00.460
than just a simple hierarchy.

00:14:00.460 --> 00:14:02.680
One way to put some
order into this seemingly

00:14:02.680 --> 00:14:05.650
complex and chaotic
circuitry, one simplification

00:14:05.650 --> 00:14:07.430
is that there are
two main pathways.

00:14:07.430 --> 00:14:09.220
One is the so-called
what pathway.

00:14:09.220 --> 00:14:11.430
The other one is the
so-called where pathway.

00:14:11.430 --> 00:14:14.380
The what pathway essentially
is the ventral pathway.

00:14:14.380 --> 00:14:16.510
It's mostly involved
in object recognition,

00:14:16.510 --> 00:14:18.300
trying to understand
what is there.

00:14:18.300 --> 00:14:20.320
The dorsal pathway,
the where pathway,

00:14:20.320 --> 00:14:22.720
is mostly involved in
motion, and being

00:14:22.720 --> 00:14:26.080
able to detect where objects
are, stereo, and so on.

00:14:26.080 --> 00:14:28.100
Again, this is not
a strict division,

00:14:28.100 --> 00:14:30.520
but it's a pretty good
approximation that many of us

00:14:30.520 --> 00:14:33.250
have used in terms of
thinking about the fundamental

00:14:33.250 --> 00:14:35.970
computations in these areas.

00:14:35.970 --> 00:14:38.397
Now we often think
about these boxes,

00:14:38.397 --> 00:14:40.480
but of course, there's a
huge amount of complexity

00:14:40.480 --> 00:14:42.130
within each of these boxes.

00:14:42.130 --> 00:14:45.040
So if we zoom in on
one of these areas,

00:14:45.040 --> 00:14:47.230
we discover that there's
a complex hierarchy

00:14:47.230 --> 00:14:48.520
of computations.

00:14:48.520 --> 00:14:50.100
There are multiple
different layers.

00:14:50.100 --> 00:14:53.180
The cortex is essentially
a six layer structure.

00:14:53.180 --> 00:14:54.970
And there are specific rules.

00:14:54.970 --> 00:14:57.970
People have referred to this
as a canonical microcircuitry.

00:14:57.970 --> 00:15:01.030
There's a specific set of rules
in terms of how information

00:15:01.030 --> 00:15:04.060
flows from one layer to
another in terms of each

00:15:04.060 --> 00:15:06.100
of these cortical structures.

00:15:06.100 --> 00:15:09.610
To a first approximation,
this canonical circuitry

00:15:09.610 --> 00:15:12.190
is common to most
of these areas.

00:15:12.190 --> 00:15:13.600
There are these
rules about which

00:15:13.600 --> 00:15:15.580
layer receives information
first, and sends

00:15:15.580 --> 00:15:17.320
information to areas, that
are more or less

00:15:17.320 --> 00:15:20.530
constant throughout
the cortical circuitry.

00:15:20.530 --> 00:15:23.620
This doesn't mean that we
understand this circuitry well,

00:15:23.620 --> 00:15:25.480
or what each of these
connections is doing.

00:15:25.480 --> 00:15:26.900
We certainly don't.

00:15:26.900 --> 00:15:30.100
But these are initial steps
to sort of decipher some

00:15:30.100 --> 00:15:33.270
of this basic biological
connectivity that

00:15:33.270 --> 00:15:36.150
has fundamental computational
properties for vision

00:15:36.150 --> 00:15:38.480
processing.

00:15:38.480 --> 00:15:40.570
So our lab has been
very interested in what

00:15:40.570 --> 00:15:42.610
we call the first
order approximation

00:15:42.610 --> 00:15:45.790
or immediate approximation
to visual object recognition.

00:15:45.790 --> 00:15:48.790
The notion that we can
recognize objects very fast,

00:15:48.790 --> 00:15:51.250
and that this can be
explained, essentially,

00:15:51.250 --> 00:15:54.460
as the bottom-up
hierarchical process.

00:15:54.460 --> 00:15:57.310
Jim DiCarlo is going to
talk about this extensively

00:15:57.310 --> 00:16:00.430
this afternoon, so I'm going to
essentially skip that, and jump

00:16:00.430 --> 00:16:02.860
into more recent work that
we've done trying to think

00:16:02.860 --> 00:16:04.820
about top-down connections.

00:16:04.820 --> 00:16:06.490
But just let me
briefly say why we

00:16:06.490 --> 00:16:09.070
think that the first pass
of visual information

00:16:09.070 --> 00:16:12.100
can be semi-seriously
approximated by this purely

00:16:12.100 --> 00:16:13.510
bottom-up processing.

00:16:13.510 --> 00:16:15.010
One is that at the
behavioral level,

00:16:15.010 --> 00:16:17.470
we can recognize
objects very, very fast.

00:16:17.470 --> 00:16:19.570
There's a series of
psychophysical experiments

00:16:19.570 --> 00:16:21.760
that demonstrate that
if I show you an object,

00:16:21.760 --> 00:16:26.110
recognition can happen within
about 150 milliseconds or so.

00:16:26.110 --> 00:16:28.000
We know that the
physiological signals

00:16:28.000 --> 00:16:30.220
underlying visual
object recognition also

00:16:30.220 --> 00:16:31.780
happen very fast.

00:16:31.780 --> 00:16:34.040
Within about 100 to
150 milliseconds,

00:16:34.040 --> 00:16:37.030
we can find neurons that
show very selective responses

00:16:37.030 --> 00:16:39.740
to complex objects, and again,
you'll see examples of that

00:16:39.740 --> 00:16:42.220
this afternoon.

00:16:42.220 --> 00:16:45.340
The behavior and the physiology
have inspired generations

00:16:45.340 --> 00:16:48.090
of computational models that are
purely bottom-up, where there

00:16:48.090 --> 00:16:51.880
is no recurrency, and that can
be quite successful in terms

00:16:51.880 --> 00:16:53.670
of visual recognition.

00:16:53.670 --> 00:16:56.440
To a first approximation,
the recent excitement

00:16:56.440 --> 00:16:59.320
with deep convolutional
networks can be traced back

00:16:59.320 --> 00:17:01.960
to some of these ideas, and
some of these basic biologically

00:17:01.960 --> 00:17:04.911
inspired computations
that are purely bottom-up.

00:17:04.911 --> 00:17:06.369
So to summarize--
and I'm not going

00:17:06.369 --> 00:17:09.089
to give any more details--
we think that the first 100

00:17:09.089 --> 00:17:12.069
milliseconds or so
of visual processing

00:17:12.069 --> 00:17:15.099
can be approximated by
these purely bottom-up,

00:17:15.099 --> 00:17:19.480
semi-hierarchical
sequence of computations.

00:17:19.480 --> 00:17:23.060
And this leaves open a
fundamental question,

00:17:23.060 --> 00:17:27.520
which is, why do we have all these
massive feedback connections?

00:17:27.520 --> 00:17:29.500
We know that in cortex,
there are actually

00:17:29.500 --> 00:17:31.840
more recurrent and
feedback connections

00:17:31.840 --> 00:17:33.000
than feed-forward ones.

00:17:33.000 --> 00:17:34.680
And what I'd like
to talk about today

00:17:34.680 --> 00:17:37.690
is a couple of ideas of what all
of those feedback connections

00:17:37.690 --> 00:17:38.810
may be doing.

00:17:38.810 --> 00:17:42.940
So this is an anatomical study
looking at a lot of the boxes

00:17:42.940 --> 00:17:44.980
that I showed you
before, and showing

00:17:44.980 --> 00:17:47.140
how many of the connections
to any given area

00:17:47.140 --> 00:17:49.360
come from one of
these other areas.

00:17:49.360 --> 00:17:52.160
For example, if we take
just primary visual cortex,

00:17:52.160 --> 00:17:54.430
this is saying that
a good fraction

00:17:54.430 --> 00:17:56.770
of the connections to
primary visual cortex

00:17:56.770 --> 00:17:57.970
actually come from V2.

00:17:57.970 --> 00:18:00.490
That's from the next
stage of processing,

00:18:00.490 --> 00:18:02.620
rather than from V1 itself.

00:18:02.620 --> 00:18:05.830
All in all, if you quantify
for a given neuron in V1,

00:18:05.830 --> 00:18:08.260
how many signals are coming
from a bottom-up source that

00:18:08.260 --> 00:18:10.840
is, from LGN, versus how
many signals are coming

00:18:10.840 --> 00:18:14.050
from other V1 neurons or
from higher visual areas,

00:18:14.050 --> 00:18:16.600
it turns out that there are
more horizontal and top-down

00:18:16.600 --> 00:18:18.402
projections than bottom-up ones.

00:18:18.402 --> 00:18:19.360
So what are they doing?

00:18:19.360 --> 00:18:21.609
If we can approximate the
first 100 milliseconds or so

00:18:21.609 --> 00:18:23.970
of vision so well with
bottom-up hierarchies,

00:18:23.970 --> 00:18:27.560
what are all these
feedback signals doing?

00:18:27.560 --> 00:18:29.800
So this brings me
to three examples

00:18:29.800 --> 00:18:32.050
that I'd like to discuss
today of recent work

00:18:32.050 --> 00:18:34.990
that we've done to take some
initial principles in thinking

00:18:34.990 --> 00:18:37.450
about what these feedback
connections could be doing

00:18:37.450 --> 00:18:40.270
in terms of visual recognition.

00:18:40.270 --> 00:18:42.190
So I'll start by
giving you an example

00:18:42.190 --> 00:18:44.980
of trying to understand
the basic fundamental unit

00:18:44.980 --> 00:18:45.930
of feedback.

00:18:45.930 --> 00:18:47.760
That is these
canonical computations,

00:18:47.760 --> 00:18:51.400
and by looking at the feedback
that happens from V2 to V1

00:18:51.400 --> 00:18:53.594
in the visual system.

00:18:53.594 --> 00:18:55.510
Next, I'm going to give
you an example of what

00:18:55.510 --> 00:18:58.120
happens during a visual
search, where we also

00:18:58.120 --> 00:19:01.030
think that feedback signals may
be playing a fundamental role,

00:19:01.030 --> 00:19:03.550
if you have to do a Where's
Waldo kind of task, where

00:19:03.550 --> 00:19:06.220
you have to search for objects
in the environment.

00:19:06.220 --> 00:19:08.470
And finally, I will talk
about pattern completion, how

00:19:08.470 --> 00:19:11.290
you can recognize objects that
are heavily occluded, where

00:19:11.290 --> 00:19:13.490
we also think that
feedback signals may

00:19:13.490 --> 00:19:16.340
be playing an important role.

00:19:16.340 --> 00:19:18.100
So before I go on
to describe what

00:19:18.100 --> 00:19:21.430
we think the feedback
from V2 to V1 may be doing,

00:19:21.430 --> 00:19:23.500
let me describe very
quickly classical work

00:19:23.500 --> 00:19:26.590
that Hubel and Wiesel did
that got them the Nobel Prize

00:19:26.590 --> 00:19:28.090
by recording the
activity of neurons

00:19:28.090 --> 00:19:30.530
in primary visual cortex.

00:19:30.530 --> 00:19:32.410
They started working
in kittens, and then

00:19:32.410 --> 00:19:35.140
subsequently in
monkeys, and discovered

00:19:35.140 --> 00:19:38.170
that there are neurons that
show orientation tuning, meaning

00:19:38.170 --> 00:19:40.390
that they respond
very vigorously.

00:19:40.390 --> 00:19:42.470
These are spikes,
each of these marks

00:19:42.470 --> 00:19:44.020
corresponds to an
action potential,

00:19:44.020 --> 00:19:47.420
the fundamental language
of computation in cortex.

00:19:47.420 --> 00:19:49.510
And this neuron responds
quite vigorously

00:19:49.510 --> 00:19:51.934
when the cat was seeing a
bar of this orientation.

00:19:51.934 --> 00:19:53.350
And essentially,
there's no firing

00:19:53.350 --> 00:19:55.540
at all with this
type of stimulus

00:19:55.540 --> 00:19:57.340
in the receptive field.

00:19:57.340 --> 00:20:00.630
This was fundamental because it
transformed our understanding

00:20:00.630 --> 00:20:03.990
of the essential computations
in primary visual cortex

00:20:03.990 --> 00:20:07.290
in terms of filtering
the initial stimulus.

00:20:07.290 --> 00:20:10.412
This is what we now
describe with Gabor functions.

00:20:10.412 --> 00:20:12.370
And if you look at deep
convolutional networks,

00:20:12.370 --> 00:20:14.820
many of them, if not
perhaps all of them,

00:20:14.820 --> 00:20:17.100
start with some sort
of filtering operation

00:20:17.100 --> 00:20:20.220
that is either Gabor filters
or resembles this type

00:20:20.220 --> 00:20:23.550
of orientation that we think
is a fundamental aspect of how

00:20:23.550 --> 00:20:28.102
we start to process information
in the visual field.

00:20:28.102 --> 00:20:30.310
One of the beautiful things
that Hubel and Wiesel did

00:20:30.310 --> 00:20:32.580
is not only to make
these discoveries,

00:20:32.580 --> 00:20:36.540
but also to come up with very
simple graphical models of how

00:20:36.540 --> 00:20:38.550
they thought this
could come about.

00:20:38.550 --> 00:20:41.040
And this remains today one
of the fundamental ways

00:20:41.040 --> 00:20:43.680
in which we think about how
orientation tuning may

00:20:43.680 --> 00:20:44.970
come about.

00:20:44.970 --> 00:20:47.880
If you record the activity
of neurons in the retina

00:20:47.880 --> 00:20:50.640
or in the LGN,
you'll find what's

00:20:50.640 --> 00:20:52.470
called center surround
receptive fields.

00:20:52.470 --> 00:20:56.340
These are circularly
symmetric receptive fields,

00:20:56.340 --> 00:20:59.550
with an area in the center
that excites the neuron,

00:20:59.550 --> 00:21:03.250
and an area in the surround
that inhibits the neuron.

00:21:03.250 --> 00:21:06.390
What they conjectured is that if
you put together multiple LGN

00:21:06.390 --> 00:21:10.920
cells, whose receptive
fields are aligned

00:21:10.920 --> 00:21:14.440
along a certain orientation, and
you simply combine all of them,

00:21:14.440 --> 00:21:17.610
you simply add the responses
of all of those neurons,

00:21:17.610 --> 00:21:19.980
you can get a neuron in
the primary visual cortex

00:21:19.980 --> 00:21:22.000
that has orientation tuning.

00:21:22.000 --> 00:21:22.500
This

00:21:22.500 --> 00:21:25.200
is a problem that's far from
solved, despite the fact

00:21:25.200 --> 00:21:26.964
that we have had four
or five decades.

00:21:26.964 --> 00:21:28.380
There are many,
many models of how

00:21:28.380 --> 00:21:30.360
orientation tuning comes about.

00:21:30.360 --> 00:21:33.210
But this remains one of the
basic bottom-up feed-forward

00:21:33.210 --> 00:21:35.310
ideas of how you
can actually build

00:21:35.310 --> 00:21:38.540
orientation tuning from very
simple receptive fields.

00:21:38.540 --> 00:21:40.380
This has informed a
lot of our thinking

00:21:40.380 --> 00:21:43.020
about how basic
computations can give rise

00:21:43.020 --> 00:21:47.760
to orientation tuning in a
purely bottom-up fashion.
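
NOTE
Editor's sketch, not part of the lecture: a toy version of the Hubel and Wiesel feed-forward idea described above, in which a V1 simple cell adds the responses of LGN cells whose center-surround receptive fields are aligned along one axis. The difference-of-Gaussians weights, the grid size, the field positions, and every function name below are assumptions chosen for illustration, not parameters from the original work.
```python
import math
def dog_weight(dx, dy, sigma_center=1.0, sigma_surround=2.0):
    # Center-surround weight: excitatory center Gaussian minus inhibitory surround Gaussian.
    r2 = dx * dx + dy * dy
    center = math.exp(-r2 / (2 * sigma_center ** 2)) / (2 * math.pi * sigma_center ** 2)
    surround = math.exp(-r2 / (2 * sigma_surround ** 2)) / (2 * math.pi * sigma_surround ** 2)
    return center - surround
def lgn_response(image, cx, cy):
    # Weighted sum of the image under one LGN receptive field, rectified to stay non-negative.
    total = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            total += dog_weight(x - cx, y - cy) * value
    return max(0.0, total)
def simple_cell_response(image, lgn_centers):
    # Hubel-Wiesel style pooling: add the responses of the aligned LGN cells.
    return sum(lgn_response(image, cx, cy) for cx, cy in lgn_centers)
def bar_image(size, vertical):
    # One-pixel-wide bright bar through the middle of a dark image.
    mid = size // 2
    return [[1.0 if (x if vertical else y) == mid else 0.0 for x in range(size)] for y in range(size)]
SIZE = 21
LGN_CENTERS = [(10, y) for y in (4, 7, 10, 13, 16)]  # five fields stacked along a vertical line
r_vertical = simple_cell_response(bar_image(SIZE, True), LGN_CENTERS)
r_horizontal = simple_cell_response(bar_image(SIZE, False), LGN_CENTERS)
print("vertical bar:", round(r_vertical, 3), "horizontal bar:", round(r_horizontal, 3))
```
In this toy setup the vertical bar drives all five LGN fields at once, while the horizontal bar drives only the one it crosses, so the summed response is orientation tuned without any feedback.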

00:21:47.760 --> 00:21:49.470
In primary visual
cortex, in addition

00:21:49.470 --> 00:21:52.230
to the so-called simple
cells, are complex cells

00:21:52.230 --> 00:21:54.900
that show invariance
to the exact position

00:21:54.900 --> 00:21:57.540
or the exact phase
of the oriented bar

00:21:57.540 --> 00:21:59.340
within the receptive field.

00:21:59.340 --> 00:22:00.820
And that's illustrated here.

00:22:00.820 --> 00:22:03.000
So this is a simple cell.

00:22:03.000 --> 00:22:05.700
So this simple cell
has orientation tuning,

00:22:05.700 --> 00:22:09.120
meaning that it responds more
vigorously to this orientation

00:22:09.120 --> 00:22:10.920
than to this orientation.

00:22:10.920 --> 00:22:14.580
However, if you change the phase
or the position of the oriented

00:22:14.580 --> 00:22:17.370
bar within the receptive
field, the response

00:22:17.370 --> 00:22:19.410
decreases significantly.

00:22:19.410 --> 00:22:22.440
In contrast to this
complex cell that not only

00:22:22.440 --> 00:22:24.360
has orientation
tuning, meaning that it

00:22:24.360 --> 00:22:27.570
fires more vigorously to this
orientation than to this one,

00:22:27.570 --> 00:22:30.730
but also has phase invariance,
meaning that the response is

00:22:30.730 --> 00:22:33.840
more or less the same,
regardless of the exact phase

00:22:33.840 --> 00:22:35.460
or the exact position
of the stimulus

00:22:35.460 --> 00:22:37.330
within the receptive field.

00:22:37.330 --> 00:22:39.570
And again, the notion
that they postulated

00:22:39.570 --> 00:22:42.420
is that we can build
these complex cells

00:22:42.420 --> 00:22:44.809
by a summation of activity
of multiple simple cells.

00:22:44.809 --> 00:22:46.350
So again, if you
imagine now that you

00:22:46.350 --> 00:22:50.160
have multiple simple cells
with different receptive fields

00:22:50.160 --> 00:22:52.950
that are centered at
these different positions,

00:22:52.950 --> 00:22:56.310
you can add them up, and
create complex cells.
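
NOTE
Editor's sketch, not part of the lecture: a one-dimensional toy of the idea that a complex cell can be built by adding simple cells whose receptive fields sit at different positions. The three-tap receptive field, the rectification, and all names are assumptions made for illustration only.
```python
def simple_cell(stimulus, center):
    # Excitatory center with inhibitory flanks, followed by rectification.
    weights = {center - 1: -0.5, center: 1.0, center + 1: -0.5}
    drive = sum(w * stimulus[i] for i, w in weights.items() if 0 <= i < len(stimulus))
    return max(0.0, drive)
def complex_cell(stimulus, centers):
    # Add the rectified outputs of simple cells centered at neighboring positions.
    return sum(simple_cell(stimulus, c) for c in centers)
def bar(position, length=25):
    # One-pixel bar at the given position on a dark background.
    return [1.0 if i == position else 0.0 for i in range(length)]
CENTERS = range(10, 15)
POSITIONS = range(10, 15)
simple_curve = [simple_cell(bar(p), 12) for p in POSITIONS]
complex_curve = [complex_cell(bar(p), CENTERS) for p in POSITIONS]
print("simple cell :", simple_curve)
print("complex cell:", complex_curve)
```
The simple cell responds only when the bar sits exactly on its center, whereas the summed unit gives the same response for every bar position covered by its inputs, which is the position (phase) invariance described above.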

00:22:56.310 --> 00:22:59.070
These fundamental operations
of simple and complex cells

00:22:59.070 --> 00:23:02.190
in primary visual cortex
can be somehow traced

00:23:02.190 --> 00:23:05.850
to the root of a lot of the
bottom-up hierarchical models.

00:23:05.850 --> 00:23:08.160
A lot of the deep
convolutional networks today

00:23:08.160 --> 00:23:10.650
essentially have variations
on these kinds of themes,

00:23:10.650 --> 00:23:13.410
of filtering steps,
nonlinear computations

00:23:13.410 --> 00:23:15.720
that give you invariance,
and a concatenation

00:23:15.720 --> 00:23:18.420
of these filtering
and invariance steps

00:23:18.420 --> 00:23:22.510
along the visual hierarchy.

00:23:22.510 --> 00:23:25.240
So in following
up with this idea,

00:23:25.240 --> 00:23:29.490
I would like to understand
the basics of what's

00:23:29.490 --> 00:23:33.120
the kind of information that's
provided when you have signals

00:23:33.120 --> 00:23:35.730
from V2 to V1.

00:23:35.730 --> 00:23:37.680
To do that, we have
been collaborating

00:23:37.680 --> 00:23:41.160
with Richard Born at Harvard
Medical School, who has

00:23:41.160 --> 00:23:43.980
a way of implanting cryo loops.

00:23:43.980 --> 00:23:48.330
This is a device that can be
implanted in monkeys in areas

00:23:48.330 --> 00:23:50.970
V2, and V3, lower
the temperature,

00:23:50.970 --> 00:23:54.150
and thus reduce or
essentially eliminate activity

00:23:54.150 --> 00:23:55.980
from areas V2 and V3.

00:23:55.980 --> 00:23:59.730
So that means that we can
study V1 without activity

00:23:59.730 --> 00:24:01.270
in areas V2 and V3.

00:24:01.270 --> 00:24:04.500
We can study V1 sans feedback.

00:24:04.500 --> 00:24:07.080
So this is an
example of recordings

00:24:07.080 --> 00:24:09.270
of a neuron in this area.

00:24:09.270 --> 00:24:13.170
This is the normal activity
that you get from the neuron.

00:24:13.170 --> 00:24:15.120
Here is when they present
a visual stimulus.

00:24:15.120 --> 00:24:16.770
This is the spontaneous activity.

00:24:16.770 --> 00:24:19.020
Each of these dots
corresponds to a spike.

00:24:19.020 --> 00:24:21.150
Each of these lines
corresponds to a repetition

00:24:21.150 --> 00:24:22.439
of the stimulus.

00:24:22.439 --> 00:24:24.480
This is a traditional way
of showing raster plots

00:24:24.480 --> 00:24:26.500
for neuron responses.

00:24:26.500 --> 00:24:28.530
So you see that this is
the spontaneous activity.

00:24:28.530 --> 00:24:29.790
You present the stimulus.

00:24:29.790 --> 00:24:32.910
There's an increase in the
response of this neuron,

00:24:32.910 --> 00:24:35.797
as you might expect.

00:24:35.797 --> 00:24:36.630
Actually, I'm sorry.

00:24:36.630 --> 00:24:38.234
This actually starts here.

00:24:38.234 --> 00:24:40.650
So this is the spontaneous
activity, this is the response.

00:24:40.650 --> 00:24:42.270
Now here, they
turn on their pump.

00:24:42.270 --> 00:24:44.350
They start lowering
the temperature.

00:24:44.350 --> 00:24:46.770
And you see within
a couple of minutes,

00:24:46.770 --> 00:24:49.710
they essentially significantly
reduce the responses.

00:24:49.710 --> 00:24:51.630
They largely silence--
not completely--

00:24:51.630 --> 00:24:55.860
but largely silence
activity in areas V2 and V3.

00:24:55.860 --> 00:24:59.010
And these are reversible, so
when they turn the pumps off,

00:24:59.010 --> 00:25:00.090
activity comes back in.

00:25:00.090 --> 00:25:03.000
So the question is, what
happens in primary visual cortex

00:25:03.000 --> 00:25:08.560
when you don't have
feedback from V2 and V3.

00:25:08.560 --> 00:25:10.650
So the first thing
they have characterized

00:25:10.650 --> 00:25:15.765
is that some of the basic
properties of V1 do not change.

00:25:15.765 --> 00:25:18.360
It's consistent with
the simple models

00:25:18.360 --> 00:25:22.080
that I just told you about, where
the orientation tuning

00:25:22.080 --> 00:25:24.990
in the primary visual
cortex is largely

00:25:24.990 --> 00:25:27.240
dictated by the
bottom-up inputs,

00:25:27.240 --> 00:25:29.135
by the signals from the LGN.

00:25:29.135 --> 00:25:30.510
The conjecture
from that would be

00:25:30.510 --> 00:25:32.640
that if you silence
V2 and V3, nothing

00:25:32.640 --> 00:25:34.410
would happen with
orientation tuning

00:25:34.410 --> 00:25:36.070
in primary visual cortex.

00:25:36.070 --> 00:25:38.820
And that's essentially
what they're showing here.

00:25:38.820 --> 00:25:40.544
These are example neurons.

00:25:40.544 --> 00:25:42.210
This is showing
orientation selectivity.

00:25:42.210 --> 00:25:43.830
This is showing
direction selectivity,

00:25:43.830 --> 00:25:45.630
what happens when
you move an oriented

00:25:45.630 --> 00:25:47.380
bar within the receptive field.

00:25:47.380 --> 00:25:49.170
So this is showing
the direction.

00:25:49.170 --> 00:25:51.420
This is showing the
mean normalized response

00:25:51.420 --> 00:25:52.020
of a neuron.

00:25:52.020 --> 00:25:54.720
This is the preferred direction,
the direction or orientation

00:25:54.720 --> 00:25:57.390
that gives a maximum response.

00:25:57.390 --> 00:26:01.020
The blue curve corresponds
to when you don't

00:26:01.020 --> 00:26:02.820
have activity in V2 and V3.

00:26:02.820 --> 00:26:04.670
Red corresponds to
their control data.

00:26:04.670 --> 00:26:08.560
And essentially, the tuning
of the neuron was not altered.

00:26:08.560 --> 00:26:12.540
The orientation preferred by
this neuron was not altered.

00:26:12.540 --> 00:26:15.160
The same thing goes for
direction selectivity.

00:26:15.160 --> 00:26:17.640
So the basic properties
of orientation tuning

00:26:17.640 --> 00:26:20.340
and direction selectivity
did not change.

00:26:20.340 --> 00:26:23.570
Let me say a few words about
the dynamics of the responses.

00:26:23.570 --> 00:26:27.210
So here, what I'm showing you
is the mean normalized responses

00:26:27.210 --> 00:26:28.430
as a function of time.

00:26:28.430 --> 00:26:31.360
Time 0 is when the
stimulus is turned on.

00:26:31.360 --> 00:26:34.440
As I told you already, by
about 50 milliseconds or so,

00:26:34.440 --> 00:26:37.890
you get a vigorous response
in primary visual cortex.

00:26:37.890 --> 00:26:40.710
And if we compare the
orange and the blue curves,

00:26:40.710 --> 00:26:44.500
we see that this initial
response is largely identical.

00:26:44.500 --> 00:26:47.400
So the initial response
of these V1 neurons

00:26:47.400 --> 00:26:52.500
is not affected by the
absence of feedback from V2.

00:26:52.500 --> 00:26:53.900
We start to see
effects, we start

00:26:53.900 --> 00:26:56.430
to see a change in
the firing rate here.

00:26:56.430 --> 00:27:02.380
Largely at about 60 milliseconds
or so after presentation.

00:27:02.380 --> 00:27:04.830
So in a highly
oversimplified cartoon,

00:27:04.830 --> 00:27:08.740
I think of this as a bottom-up
Hubel and Wiesel like response,

00:27:08.740 --> 00:27:10.350
driven by LGN.

00:27:10.350 --> 00:27:13.380
And signals from V2
to V1 coming back

00:27:13.380 --> 00:27:15.217
about 10 milliseconds later.

00:27:15.217 --> 00:27:17.550
And that's when we started
seeing some of these feedback

00:27:17.550 --> 00:27:19.380
related effects.

00:27:19.380 --> 00:27:22.260
I told you that some of the
basic properties do not change.

00:27:22.260 --> 00:27:24.600
We interpret this as
being dictated largely

00:27:24.600 --> 00:27:25.860
by bottom-up signals.

00:27:25.860 --> 00:27:27.180
The dynamics do change.

00:27:27.180 --> 00:27:29.200
The initial response
is unaffected.

00:27:29.200 --> 00:27:31.320
The later part of the
response is affected.

00:27:31.320 --> 00:27:33.510
I want to say one
thing that does change.

00:27:33.510 --> 00:27:35.760
And for that, I need to
explain what an area summation

00:27:35.760 --> 00:27:37.470
curve is.

00:27:37.470 --> 00:27:39.720
So if you present the
stimulus within the receptive

00:27:39.720 --> 00:27:43.440
field of a neuron of this size,
you get a certain response.

00:27:43.440 --> 00:27:46.830
As you start increasing
the size of this stimulus,

00:27:46.830 --> 00:27:48.540
you get a more
vigorous response.

00:27:48.540 --> 00:27:49.690
Size matters.

00:27:49.690 --> 00:27:51.300
The larger, the better--

00:27:51.300 --> 00:27:51.990
to a point.

00:27:51.990 --> 00:27:54.270
There comes a point
where it turns out

00:27:54.270 --> 00:27:58.030
that the response of the
neurons starts decreasing again.

00:27:58.030 --> 00:28:00.480
So larger is not always better.

00:28:00.480 --> 00:28:01.880
A little bit larger is better.

00:28:01.880 --> 00:28:04.680
This size has an
inhibitory effect

00:28:04.680 --> 00:28:06.420
overall on the
response of the neuron.

00:28:06.420 --> 00:28:08.190
This is called
surround suppression.

00:28:08.190 --> 00:28:11.100
And these curves have been
characterized in areas

00:28:11.100 --> 00:28:12.360
like primary visual cortex.

00:28:12.360 --> 00:28:16.030
Also in earlier areas
for a very long time.

00:28:16.030 --> 00:28:19.140
It turns out that when you
do this type of experiment

00:28:19.140 --> 00:28:22.380
in the absence of feedback, the
effect of surround suppression

00:28:22.380 --> 00:28:23.800
does not disappear.

00:28:23.800 --> 00:28:27.330
That is, you still have a peak
in the response as a function

00:28:27.330 --> 00:28:28.620
of stimulus size.

00:28:28.620 --> 00:28:31.207
But there is a reduced amount
of surround suppression.

00:28:31.207 --> 00:28:32.790
That is, when you
don't have feedback,

00:28:32.790 --> 00:28:33.831
there's less suppression.

00:28:33.831 --> 00:28:36.820
You have a larger response
for bigger stimuli.

00:28:36.820 --> 00:28:39.600
So we think that one of the
fundamental computations

00:28:39.600 --> 00:28:41.880
that feedback is
providing here is

00:28:41.880 --> 00:28:44.670
this integration from
multiple neurons in V1

00:28:44.670 --> 00:28:46.200
that happens in V2.

00:28:46.200 --> 00:28:50.160
And then inhibition to
activity of neurons in area V1

00:28:50.160 --> 00:28:51.990
to provide some of
the suppression.

00:28:51.990 --> 00:28:54.060
This is partly the reason
why our neurons are not

00:28:54.060 --> 00:28:57.600
very excited about a uniform
stimulus, like a blank wall.

00:28:57.600 --> 00:29:00.420
Our neurons are interested
in changes, and part of that,

00:29:00.420 --> 00:29:04.560
we think, is dictated by
this feedback from V2 to V1.

00:29:04.560 --> 00:29:08.090
We can model these center
surround interactions

00:29:08.090 --> 00:29:11.950
as a ratio of two Gaussian
curves, two forces.

00:29:11.950 --> 00:29:14.380
One is the one that
increases the response.

00:29:14.380 --> 00:29:16.110
The other one is a
normalization term

00:29:16.110 --> 00:29:19.040
that suppresses the response
when the stimulus is too large.

00:29:19.040 --> 00:29:20.604
There's a number
of parameters here.

00:29:20.604 --> 00:29:22.020
Essentially, you
can think of this

00:29:22.020 --> 00:29:24.840
as a ratio of Gaussians, ROGs.

00:29:24.840 --> 00:29:26.820
There's a ratio of
two Gaussian curves.

00:29:26.820 --> 00:29:28.590
One dictating the
center response.

00:29:28.590 --> 00:29:30.380
The other one, the
surround response.

00:29:30.380 --> 00:29:32.190
And to make a long
story short, we

00:29:32.190 --> 00:29:34.230
can fit the data
from the monkey

00:29:34.230 --> 00:29:37.500
with this extremely simple
ratio of Gaussians model.

00:29:37.500 --> 00:29:39.420
And we can show that
the main parameter

00:29:39.420 --> 00:29:43.800
that feedback seems to be
acting upon is what we call Wn--

00:29:43.800 --> 00:29:47.020
that is this
normalization factor here.

00:29:47.020 --> 00:29:50.610
So that the tuning
factor that dictates

00:29:50.610 --> 00:29:54.444
the strength of the surrounding
division from V2 to V1--

00:29:54.444 --> 00:29:56.610
we think that's one of the
fundamental things that's

00:29:56.610 --> 00:29:59.607
being affected by feedback.

00:29:59.607 --> 00:30:01.190
So we would think
of this as the gain.

00:30:01.190 --> 00:30:03.560
We think of this as
the spatial extent

00:30:03.560 --> 00:30:05.690
over which V2
can exert its action

00:30:05.690 --> 00:30:07.100
on primary visual cortex.

00:30:07.100 --> 00:30:10.310
We think that's the main
thing that's affected here.
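
NOTE
Editor's sketch, not part of the lecture: one common way to write a ratio-of-Gaussians area summation curve, with a center term divided by one plus a normalization term. Each term is the squared integral of a Gaussian up to the stimulus size, which is the error function squared. The exact functional form used in the study is not given in the talk, so this form, the gains k_c and k_n, the width w_c, and all numeric values are assumptions; only the name w_n mirrors the "Wn" parameter mentioned by the speaker.
```python
import math
def rog_response(size, k_c=1.0, w_c=1.0, k_n=2.0, w_n=3.0):
    # Center drive grows with stimulus size and saturates over the spatial extent w_c.
    center = math.erf(size / w_c) ** 2
    # Normalization drive grows over a wider spatial extent w_n and divides the response.
    normalization = math.erf(size / w_n) ** 2
    return k_c * center / (1.0 + k_n * normalization)
def area_summation_curve(sizes, **params):
    return [rog_response(s, **params) for s in sizes]
def suppression_index(curve):
    # Fraction of the peak response lost at the largest stimulus size.
    peak = max(curve)
    return (peak - curve[-1]) / peak
SIZES = [0.25 * i for i in range(1, 41)]  # stimulus sizes 0.25 .. 10.0, arbitrary units
curve = area_summation_curve(SIZES)
peak_size = SIZES[curve.index(max(curve))]
print("peak at size", peak_size, "suppression index", round(suppression_index(curve), 3))
for w_n in (2.0, 3.0, 6.0):
    si = suppression_index(area_summation_curve(SIZES, w_n=w_n))
    print("w_n =", w_n, "suppression index =", round(si, 3))
```
With these assumed values the response rises with size, peaks, and then declines, which is the surround suppression shape described above. The final loop only shows that changing w_n reshapes the curve; it makes no claim about the fitted values in the monkey data.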

00:30:10.310 --> 00:30:13.400
This type of spatial
effect may be

00:30:13.400 --> 00:30:16.910
important in another role that
has been ascribed to feedback,

00:30:16.910 --> 00:30:19.310
which is the ability
to direct attention

00:30:19.310 --> 00:30:21.620
to specific locations
in the environment.

00:30:21.620 --> 00:30:23.120
I want to come back
to this question

00:30:23.120 --> 00:30:25.520
here, and ask, under
what conditions,

00:30:25.520 --> 00:30:28.910
and how can feedback also
provide important

00:30:28.910 --> 00:30:31.465
feature-specific signals from
one area to another.

00:30:31.465 --> 00:30:32.840
And for that, I'm
going to switch

00:30:32.840 --> 00:30:35.850
to another task, another
completely different prep,

00:30:35.850 --> 00:30:37.430
which is the
Where's Waldo task--

00:30:37.430 --> 00:30:38.690
the task of visual search.

00:30:38.690 --> 00:30:41.670
How do we search for particular
objects in the environment.

00:30:41.670 --> 00:30:45.429
And here, it's not sufficient
to focus on a specific location,

00:30:45.429 --> 00:30:47.720
but we need to be able to
search for specific features.

00:30:47.720 --> 00:30:50.750
We need to be able to
bias our visual responses

00:30:50.750 --> 00:30:52.430
for specific features
of the stimulus

00:30:52.430 --> 00:30:54.140
that we're searching for.

00:30:54.140 --> 00:30:56.400
So this is a famous sort
of Where's Waldo task.

00:30:56.400 --> 00:30:58.830
You need to be able to
search for specific features.

00:30:58.830 --> 00:31:02.180
It's not enough to be able to
send feedback from V2 to V1,

00:31:02.180 --> 00:31:05.260
and direct attention, or change
the sizes of the receptive

00:31:05.260 --> 00:31:08.480
fields, or direct attention
to a specific location.

00:31:08.480 --> 00:31:09.950
Another version
that I'm not going

00:31:09.950 --> 00:31:13.580
to talk about, but that
has a related theme that relates

00:31:13.580 --> 00:31:15.860
to visual search is
feature based attention,

00:31:15.860 --> 00:31:18.604
when you're actually paying
attention to a particular face,

00:31:18.604 --> 00:31:21.020
to a particular color, to a
particular feature that is not

00:31:21.020 --> 00:31:24.410
necessarily localized in
space, as our friend here has

00:31:24.410 --> 00:31:26.090
studied quite significantly.

00:31:26.090 --> 00:31:28.710
People always like to know
the answer of where he is at.

00:31:28.710 --> 00:31:29.210
OK.

00:31:29.210 --> 00:31:31.550
So let me tell you about
a computational model

00:31:31.550 --> 00:31:34.280
and some behavioral data
that we have collected

00:31:34.280 --> 00:31:38.300
to try to get at this question
of how feedback signals can

00:31:38.300 --> 00:31:40.640
be relevant for visual search.

00:31:40.640 --> 00:31:45.290
This initial part of
this computational model

00:31:45.290 --> 00:31:48.940
is essentially the HMAX
type of architecture

00:31:48.940 --> 00:31:52.100
that has been pioneered by
Tommy Poggio and several people

00:31:52.100 --> 00:31:55.040
in his lab, most notably,
people like Max Riesenhuber

00:31:55.040 --> 00:31:56.510
and Thomas Serre.

00:31:56.510 --> 00:31:58.340
I was thinking
that by this time,

00:31:58.340 --> 00:32:00.620
people would have described
this in more detail.

00:32:00.620 --> 00:32:02.450
I'm going to go through
these very quickly.

00:32:02.450 --> 00:32:04.280
Again, today in the
afternoon, we'll

00:32:04.280 --> 00:32:07.320
have more discussion about
this family of models.

00:32:07.320 --> 00:32:09.350
So this family of
models essentially

00:32:09.350 --> 00:32:12.830
goes through a series of linear
and non-linear computations

00:32:12.830 --> 00:32:16.730
in a hierarchical way, inspired
by the basic definition

00:32:16.730 --> 00:32:18.830
of simple and
complex cells that I

00:32:18.830 --> 00:32:22.370
described in the work
of Hubel and Wiesel.

00:32:22.370 --> 00:32:26.040
So basically, what these models
do is they take an image.

00:32:26.040 --> 00:32:27.590
These are pixels.

00:32:27.590 --> 00:32:28.760
There's a filtering step.

00:32:28.760 --> 00:32:32.294
This filtering step involves
Gabor filtering of the image.

00:32:32.294 --> 00:32:33.710
In this particular
case, there are

00:32:33.710 --> 00:32:36.020
four different orientations.

00:32:36.020 --> 00:32:37.880
And what you
get here is a map

00:32:37.880 --> 00:32:42.470
of the visual input after
this linear filtering process.

00:32:42.470 --> 00:32:46.700
The next step in this model
is a local max operation.

00:32:46.700 --> 00:32:50.180
This is pooling neurons that
have identical feature

00:32:50.180 --> 00:32:53.360
preferences, but
slightly different scale

00:32:53.360 --> 00:32:54.680
in the receptive fields.

00:32:54.680 --> 00:32:57.530
Or slightly different positions
in their receptive fields.

00:32:57.530 --> 00:33:00.410
And this max operation,
this non-linear operation

00:33:00.410 --> 00:33:03.440
is giving you invariance
to the specific feature.

00:33:03.440 --> 00:33:07.070
So now you can get a response to
the same feature, irrespective

00:33:07.070 --> 00:33:09.770
of the exact scale
or the exact position

00:33:09.770 --> 00:33:12.080
within the receptive field.
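
NOTE
Editor's sketch, not part of the lecture: a toy version of the first two stages described above, a Gabor filtering step at four orientations followed by a local max over position. The kernel size, sigma, wavelength, aspect ratio, the use of the absolute value, and a single pooling region covering the whole image are all assumptions for illustration. Pooling over scale is not modeled, and this is not the HMAX reference implementation.
```python
import math
def gabor_kernel(theta, half=3, sigma=2.0, wavelength=4.0, gamma=0.5):
    # Zero-mean Gabor patch; with this convention theta = 0 prefers vertical bars.
    kernel = {}
    for y in range(-half, half + 1):
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2 * sigma ** 2))
            kernel[(x, y)] = envelope * math.cos(2 * math.pi * xr / wavelength)
    mean = sum(kernel.values()) / len(kernel)
    return {pos: w - mean for pos, w in kernel.items()}
def s1_map(image, kernel, half=3):
    # Filtering step: absolute filter response at every position where the kernel fits.
    size = len(image)
    out = {}
    for cy in range(half, size - half):
        for cx in range(half, size - half):
            out[(cx, cy)] = abs(sum(w * image[cy + dy][cx + dx] for (dx, dy), w in kernel.items()))
    return out
def c1_response(s1):
    # Pooling step: max over all positions in the pooling region.
    return max(s1.values())
def vertical_bar(column, size=17):
    return [[1.0 if x == column else 0.0 for x in range(size)] for _ in range(size)]
THETAS = [0.0, math.pi / 4, math.pi / 2, 3 * math.pi / 4]  # four orientations
KERNELS = [gabor_kernel(t) for t in THETAS]
def c1_vector(image):
    return [c1_response(s1_map(image, k)) for k in KERNELS]
c1_a = c1_vector(vertical_bar(8))
c1_b = c1_vector(vertical_bar(10))
s1_a = s1_map(vertical_bar(8), KERNELS[0])[(8, 8)]
s1_b = s1_map(vertical_bar(10), KERNELS[0])[(8, 8)]
print("C1, bar at column 8 :", [round(v, 3) for v in c1_a])
print("C1, bar at column 10:", [round(v, 3) for v in c1_b])
print("S1 at one fixed position:", round(s1_a, 3), round(s1_b, 3))
```
The S1 value at a fixed position changes when the bar shifts by two pixels, while the pooled C1 values stay the same and remain largest for the vertical-preferring filter. That is the sense in which the max operation buys tolerance to position.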

00:33:12.080 --> 00:33:15.165
These were labeled
S1 and C1, initially

00:33:15.165 --> 00:33:16.650
in models by Fukushima.

00:33:16.650 --> 00:33:19.130
And this type of nomenclature
was carried on later

00:33:19.130 --> 00:33:21.030
by Tommy and many others.

00:33:21.030 --> 00:33:24.410
And this is directly inspired
by the simple and complex cells

00:33:24.410 --> 00:33:26.840
that I very briefly
showed you previously

00:33:26.840 --> 00:33:29.660
in the recordings
of Hubel and Wiesel.

00:33:29.660 --> 00:33:32.090
These filtering
and max operations

00:33:32.090 --> 00:33:35.150
are repeated throughout the
hierarchy again and again.

00:33:35.150 --> 00:33:37.160
So here's another layer
that has a filtering

00:33:37.160 --> 00:33:40.850
step and a nonlinear max step.

00:33:40.850 --> 00:33:44.690
In this case, this filtering
here is not a Gabor filter.

00:33:44.690 --> 00:33:47.870
We don't really understand very
well what neurons in V2 and V4

00:33:47.870 --> 00:33:48.590
are doing.

00:33:48.590 --> 00:33:51.020
One of the types of
filters that have been used

00:33:51.020 --> 00:33:54.380
and that we are using here
is a radial basis function,

00:33:54.380 --> 00:33:56.630
where the properties of
a neuron in this case

00:33:56.630 --> 00:34:02.150
are dictated by patches taken
randomly from natural images.

00:34:02.150 --> 00:34:04.430
All of this is
purely feed-forward.

00:34:04.430 --> 00:34:06.928
All of this is essentially
the basic ingredient

00:34:06.928 --> 00:34:08.469
of the type of
convolutional networks

00:34:08.469 --> 00:34:11.449
that have been used for
object recognition.

00:34:11.449 --> 00:34:13.190
You can have more layers.

00:34:13.190 --> 00:34:15.199
You can have different
types of computations.

00:34:15.199 --> 00:34:17.179
The basic properties
are essentially

00:34:17.179 --> 00:34:19.370
the ones that are
described briefly here.

00:34:19.370 --> 00:34:21.920
What I really want to talk
about is not the former part,

00:34:21.920 --> 00:34:23.210
but this part of the model.

00:34:23.210 --> 00:34:26.550
Now if I ask you, where's Waldo,
you need to do something,

00:34:26.550 --> 00:34:29.510
you need to be able to somehow
look at this information,

00:34:29.510 --> 00:34:32.300
and be able to
bias your responses

00:34:32.300 --> 00:34:36.949
or bias the model towards
regions of the visual space

00:34:36.949 --> 00:34:39.949
that have features that resemble
what you're looking for.

00:34:39.949 --> 00:34:42.949
Your car, your keys, Waldo.

00:34:42.949 --> 00:34:45.260
So the way we do that
is first, in this case,

00:34:45.260 --> 00:34:47.093
I'm going to show you
what happens if you're

00:34:47.093 --> 00:34:48.780
looking for the top hat here.

00:34:48.780 --> 00:34:50.210
So first, we have
a representation

00:34:50.210 --> 00:34:51.920
in the model of the top hat.

00:34:51.920 --> 00:34:53.350
This is the hat here.

00:34:53.350 --> 00:34:56.360
And we have a representation
in our vocabulary

00:34:56.360 --> 00:34:59.390
of how units in the highest
echelons of this model

00:34:59.390 --> 00:35:00.290
represent this hat.

00:35:00.290 --> 00:35:02.750
So we have a representation
of the features

00:35:02.750 --> 00:35:07.580
that compose this object at
a high level in this model.

00:35:07.580 --> 00:35:10.490
We use that representation
to modulate,

00:35:10.490 --> 00:35:13.490
in a multiplicative
fashion, the entire image.

00:35:13.490 --> 00:35:15.500
Essentially, we
bias the responses

00:35:15.500 --> 00:35:19.190
in the entire image based
on the particular features

00:35:19.190 --> 00:35:21.560
that we are searching for.

00:35:21.560 --> 00:35:24.170
This is inspired by many
physiological experiments that

00:35:24.170 --> 00:35:27.680
have shown that to a
good approximation,

00:35:27.680 --> 00:35:29.270
this type of
modulation in feature

00:35:29.270 --> 00:35:32.750
based attention has been
observed across different parts

00:35:32.750 --> 00:35:33.590
of the visual field.

00:35:33.590 --> 00:35:36.080
That is, if you're
searching for red objects,

00:35:36.080 --> 00:35:39.152
neurons that like red will
enhance their response

00:35:39.152 --> 00:35:40.610
throughout the
entire visual field.

00:35:40.610 --> 00:35:43.520
So have the entire
visual field modulated

00:35:43.520 --> 00:35:48.770
by the pattern of features that
we're searching for in here.

00:35:48.770 --> 00:35:52.150
After that, we have
a normalization step.

00:35:52.150 --> 00:35:54.270
This normalization
step is critical

00:35:54.270 --> 00:35:57.300
in order to discount
purely bottom-up effects.

00:35:57.300 --> 00:35:59.870
We don't want the competition
between different objects

00:35:59.870 --> 00:36:03.140
to be purely dictated by
which object is brighter,

00:36:03.140 --> 00:36:03.770
for example.

00:36:03.770 --> 00:36:06.680
So we normalize that
after modulating that

00:36:06.680 --> 00:36:09.650
with the features
that we are searching.

00:36:09.650 --> 00:36:13.340
That gives us a map of the
image, where each area has been

00:36:13.340 --> 00:36:15.560
essentially compared
to this feature set

00:36:15.560 --> 00:36:16.910
that we're looking for.

00:36:16.910 --> 00:36:19.190
And then we have a winner
take all mechanism that

00:36:19.190 --> 00:36:21.680
dictates where the model
will pay attention to,

00:36:21.680 --> 00:36:23.840
or where the model
will fixate on first.

00:36:23.840 --> 00:36:27.800
Where the model thinks that a
particular object is located.
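
NOTE
A minimal sketch of the three steps just described: feature-based modulation, normalization, then winner-take-all. It is an illustration under stated assumptions, not the model from the lecture: the names attention_map and winner_take_all, the dot-product modulation, and the divisive form of the normalization are all hypothetical choices.
```python
# Hypothetical sketch: top-down feature modulation, normalization, winner-take-all.
# All function names and the exact arithmetic are assumptions for illustration.
def attention_map(responses, target_features, eps=1e-9):
    """responses[y][x] is a list of non-negative feature responses at one location."""
    amap = []
    for row in responses:
        out_row = []
        for feats in row:
            # Top-down modulation: weight each bottom-up response by the target features.
            modulated = sum(f * t for f, t in zip(feats, target_features))
            # Normalization: divide by the total bottom-up drive so that raw
            # brightness alone does not decide the competition.
            out_row.append(modulated / (sum(feats) + eps))
        amap.append(out_row)
    return amap
def winner_take_all(amap):
    """Return the (row, col) of the largest value in the map."""
    best = max((v, (y, x)) for y, row in enumerate(amap) for x, v in enumerate(row))
    return best[1]
```
With this normalization, a bright location dominated by a non-target feature can lose to a dimmer location that matches the target features.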

00:36:27.800 --> 00:36:30.920
OK so what happens when we
have this feedback that's

00:36:30.920 --> 00:36:33.680
feature specific, and that
modulates the responses based

00:36:33.680 --> 00:36:35.900
on the target object
that we're searching for?

00:36:35.900 --> 00:36:38.120
In these two images,
either in object

00:36:38.120 --> 00:36:42.180
arrays or when objects are
embedded in complex scenes,

00:36:42.180 --> 00:36:44.300
we're searching for
this top object.

00:36:44.300 --> 00:36:46.730
And the largest
response in the model

00:36:46.730 --> 00:36:50.090
is indeed in the location
of where the object is.

00:36:50.090 --> 00:36:52.070
In these other two
images, the model

00:36:52.070 --> 00:36:54.410
is searching for
this accordion here.

00:36:54.410 --> 00:36:56.570
And again, the model
was able to find that

00:36:56.570 --> 00:37:00.110
by this comparison of the
features with the stimulus.

00:37:00.110 --> 00:37:03.420
More generally, these
are object array images.

00:37:03.420 --> 00:37:06.170
This is the number
of fixations required

00:37:06.170 --> 00:37:08.960
to find the object in
these object array images.

00:37:08.960 --> 00:37:11.330
So one would correspond
to the first fixation.

00:37:11.330 --> 00:37:14.670
If the model does not find the
object in the first location,

00:37:14.670 --> 00:37:16.520
there's what's called
inhibition of return.

00:37:16.520 --> 00:37:18.470
So we make sure the
model does not come back

00:37:18.470 --> 00:37:20.600
to the same location,
and the model will

00:37:20.600 --> 00:37:24.230
look at the second best
possible location in the image.

00:37:24.230 --> 00:37:27.870
And it will keep on searching
until it finds the object.
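
NOTE
A minimal sketch of the search loop just described: fixate the strongest location, and if it is not the target, suppress it (inhibition of return) and move to the next best. The function name search, the max_fixations parameter, and the choice to remove a visited location outright are assumptions for illustration, not details given in the lecture.
```python
# Hypothetical sketch: repeated winner-take-all with inhibition of return.
def search(amap, target_location, max_fixations=5):
    """Return the list of fixated (row, col) locations, stopping at the target."""
    # Work on a copy so the caller's map is not modified.
    remaining = {(y, x): v for y, row in enumerate(amap) for x, v in enumerate(row)}
    fixations = []
    while remaining and len(fixations) < max_fixations:
        # Winner-take-all: fixate the currently strongest location.
        loc = max(remaining, key=remaining.get)
        fixations.append(loc)
        if loc == target_location:
            break
        # Inhibition of return: the visited location leaves the competition.
        del remaining[loc]
    return fixations
```
The length of the returned list is the number of fixations used, which is the quantity on the horizontal axis of the performance curves.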

00:37:27.870 --> 00:37:31.460
So the model performs in the
first fixation at 60% correct.

00:37:31.460 --> 00:37:33.380
And eventually,
after five fixations,

00:37:33.380 --> 00:37:37.160
it can find the object
almost always right in here.

00:37:37.160 --> 00:37:39.560
This is what you would
expect by random search,

00:37:39.560 --> 00:37:42.020
if you were to randomly
fixate on different objects.

00:37:42.020 --> 00:37:44.130
So the model is doing
much better than that.

00:37:44.130 --> 00:37:45.830
And then for the
aficionados, there's

00:37:45.830 --> 00:37:48.980
a whole plethora of purely
bottom-up models that

00:37:48.980 --> 00:37:50.882
don't have feedback whatsoever.

00:37:50.882 --> 00:37:52.340
This is a family
of models that was

00:37:52.340 --> 00:37:54.920
pioneered by people like
Laurent Itti and Christof Koch.

00:37:54.920 --> 00:37:56.812
These are saliency based models.

00:37:56.812 --> 00:37:59.270
Although you cannot see, there
are a couple of other points

00:37:59.270 --> 00:37:59.990
in here.

00:37:59.990 --> 00:38:03.080
All of those models cannot
find the object either.

00:38:03.080 --> 00:38:05.720
It's not that these objects
that we're searching for

00:38:05.720 --> 00:38:07.550
are more salient,
and therefore, that's

00:38:07.550 --> 00:38:09.290
why the model is finding them.

00:38:09.290 --> 00:38:11.340
We really need
something more than just

00:38:11.340 --> 00:38:13.460
bottom-up pure saliency.

00:38:13.460 --> 00:38:14.960
We did a psychophysical
experiment.

00:38:14.960 --> 00:38:17.690
We asked, well, this is how
the model searches for Waldo.

00:38:17.690 --> 00:38:19.220
How will humans
search for objects

00:38:19.220 --> 00:38:20.910
under the same conditions?

00:38:20.910 --> 00:38:22.530
So we had multiple objects.

00:38:22.530 --> 00:38:24.980
Subjects have to make a
saccade to a target object.

00:38:24.980 --> 00:38:27.770
To make a long story short, this
is the cumulative performance

00:38:27.770 --> 00:38:30.110
of the model and the
number of fixations

00:38:30.110 --> 00:38:32.420
under these conditions,
and the model

00:38:32.420 --> 00:38:35.780
does reasonably well in terms
of how well humans do.

00:38:35.780 --> 00:38:39.867
This is data from every single
individual subject in the task.

00:38:39.867 --> 00:38:41.450
I'm going to skip
some of the details.

00:38:41.450 --> 00:38:45.110
You can compare the errors
that the model is making.

00:38:45.110 --> 00:38:47.960
How consistent people are
with themselves with respect

00:38:47.960 --> 00:38:48.710
to other subjects.

00:38:48.710 --> 00:38:50.960
How good it is with
respect to humans.

00:38:50.960 --> 00:38:53.630
The long story is the
model is far from perfect.

00:38:53.630 --> 00:38:55.280
We don't think that
we have captured

00:38:55.280 --> 00:38:58.010
everything we need to
understand about visual search.

00:38:58.010 --> 00:39:00.500
As some people alluded to
before, for example, the notion

00:39:00.500 --> 00:39:03.080
that the model doesn't
have these major changes

00:39:03.080 --> 00:39:05.870
with eccentricity, and
the fovea, and so on.

00:39:05.870 --> 00:39:07.640
A long way to go, but
we think that we've

00:39:07.640 --> 00:39:10.280
captured some of the
essential initial ingredients

00:39:10.280 --> 00:39:11.240
of visual search.

00:39:11.240 --> 00:39:15.110
And that this is one example of
how visual feedback signals can

00:39:15.110 --> 00:39:18.530
influence this bottom-up
hierarchy for recognition.

00:39:18.530 --> 00:39:21.320
I want to very quickly
move on to a third example

00:39:21.320 --> 00:39:24.410
that I wanted to give you
of how feedback can help

00:39:24.410 --> 00:39:25.790
in terms of visual recognition.

00:39:25.790 --> 00:39:28.870
What are other functions that
feedback could be playing?

00:39:28.870 --> 00:39:31.700
And for that, I'd like to
discuss the work that Hanlin

00:39:31.700 --> 00:39:34.190
did here, and also,
Bill Lotter in the lab,

00:39:34.190 --> 00:39:36.440
in terms of how we
can recognize objects

00:39:36.440 --> 00:39:37.790
that are partially occluded.

00:39:37.790 --> 00:39:39.170
This happens all the time.

00:39:39.170 --> 00:39:42.080
So you walk around and
see objects in the world.

00:39:42.080 --> 00:39:44.300
You can also encounter
objects where you can only

00:39:44.300 --> 00:39:45.930
find partial
information, and you have

00:39:45.930 --> 00:39:47.400
to do pattern completion.

00:39:47.400 --> 00:39:49.340
Pattern completion is
a fundamental aspect

00:39:49.340 --> 00:39:50.120
of intelligence.

00:39:50.120 --> 00:39:52.520
We do that in all
sorts of scenarios.

00:39:52.520 --> 00:39:54.440
It's not just
restricted to vision.

00:39:54.440 --> 00:39:57.350
All of you can probably
complete all of these patterns.

00:39:57.350 --> 00:39:59.410
We use pattern completion
in social scenarios

00:39:59.410 --> 00:40:00.330
as well, right?

00:40:00.330 --> 00:40:02.152
You make inferences
from partial knowledge

00:40:02.152 --> 00:40:04.110
about their intentions,
and what they're doing,

00:40:04.110 --> 00:40:06.120
and what they're
trying to do, OK?

00:40:06.120 --> 00:40:09.340
So we want to study this problem
of how you complete patterns,

00:40:09.340 --> 00:40:12.510
how you extrapolate from
partial limited information

00:40:12.510 --> 00:40:14.885
in the context of
visual recognition.

00:40:14.885 --> 00:40:16.260
There are a lot
of different ways

00:40:16.260 --> 00:40:19.640
in which one can present
partially occluded objects.

00:40:19.640 --> 00:40:21.120
Here are just a few of them.

00:40:21.120 --> 00:40:23.490
What Hanlin did was
use a paradigm called

00:40:23.490 --> 00:40:25.579
bubbles that's shown here.

00:40:25.579 --> 00:40:27.870
Essentially, it's like looking
at the world like this.

00:40:27.870 --> 00:40:29.850
You only have small
windows through which

00:40:29.850 --> 00:40:31.140
you can see the object.

00:40:31.140 --> 00:40:34.759
Performance can be titrated to
make the task harder or easier.

00:40:34.759 --> 00:40:36.300
So if you have a
lot of bubbles, it's

00:40:36.300 --> 00:40:39.750
relatively easy to recognize
that this is a toy school bus.

00:40:39.750 --> 00:40:41.250
If you have only
four bubbles, it's

00:40:41.250 --> 00:40:42.820
actually pretty challenging.

00:40:42.820 --> 00:40:46.950
So we can titrate performance
on the difficulty of this task.
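
NOTE
A minimal sketch of a bubbles-style occlusion mask: the image is visible only through a few small windows, and the number of windows sets the difficulty. The Gaussian window shape, the sigma value, the random placement, and the names bubble_mask and visible_fraction are assumptions for illustration; the lecture does not specify these details.
```python
import math
import random
# Hypothetical sketch of a "bubbles" mask; window shape and size are assumed.
def bubble_mask(height, width, n_bubbles, sigma=2.0, seed=0):
    """Return mask[y][x] in [0, 1]; 1 means fully visible, 0 fully occluded."""
    rng = random.Random(seed)
    centers = [(rng.randrange(height), rng.randrange(width)) for _ in range(n_bubbles)]
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for cy, cx in centers:
                d2 = (y - cy) ** 2 + (x - cx) ** 2
                # Keep the strongest window covering this pixel.
                mask[y][x] = max(mask[y][x], math.exp(-d2 / (2 * sigma ** 2)))
    return mask
def visible_fraction(mask):
    """Mean visibility, a rough proxy for how much of the image is shown."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))
```
Multiplying an image by such a mask pixel by pixel gives the occluded stimulus; fewer bubbles means a smaller visible fraction and a harder task.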

00:40:46.950 --> 00:40:50.760
Very quickly, let me start
by showing you psychophysics

00:40:50.760 --> 00:40:51.540
performance here.

00:40:51.540 --> 00:40:54.570
This is how subjects
perform as a function

00:40:54.570 --> 00:40:58.110
of the amount of occlusion in
the image as a function of how

00:40:58.110 --> 00:41:00.760
many pixels you're
showing for these images.

00:41:00.760 --> 00:41:04.470
And what you see here is
that with 60% occlusion,

00:41:04.470 --> 00:41:06.420
performance is extremely high.

00:41:06.420 --> 00:41:08.910
Performance essentially
drops to chance level

00:41:08.910 --> 00:41:10.940
when the object is
more and more occluded.

00:41:10.940 --> 00:41:13.230
There is a significant
amount of robustness

00:41:13.230 --> 00:41:14.726
in human performance.

00:41:14.726 --> 00:41:16.350
For example, you have
a little bit more

00:41:16.350 --> 00:41:18.510
than 10% of the
pixels in the object,

00:41:18.510 --> 00:41:21.360
and people can still recognize
them reasonably well.

00:41:21.360 --> 00:41:24.280
So this is all behavioral data.

00:41:24.280 --> 00:41:27.540
Let me show you very quickly
what Hanlin discovered

00:41:27.540 --> 00:41:31.110
by doing invasive
recordings in human patients

00:41:31.110 --> 00:41:32.550
while the subjects
were performing

00:41:32.550 --> 00:41:36.060
this recognition of objects
that are partially occluded.

00:41:36.060 --> 00:41:38.610
It's illegal to put
electrodes in the human brain

00:41:38.610 --> 00:41:41.430
in normal people, so
we work with subjects

00:41:41.430 --> 00:41:44.470
that have pharmacologically
intractable epilepsy.

00:41:44.470 --> 00:41:46.440
So in subjects
that have seizures,

00:41:46.440 --> 00:41:48.930
the neurosurgeons need to
implant electrodes in order,

00:41:48.930 --> 00:41:51.180
A, to localize the seizures.

00:41:51.180 --> 00:41:54.690
And B, in order to ensure
that when they do a resection,

00:41:54.690 --> 00:41:56.700
and they take out the
part of the brain that's

00:41:56.700 --> 00:41:58.710
responsible for seizures,
that they're not

00:41:58.710 --> 00:42:02.380
going to interfere with other
functions, such as language.

00:42:02.380 --> 00:42:05.190
These patients stay in the
hospital for about one week.

00:42:05.190 --> 00:42:07.620
And during this one week,
we have a unique opportunity

00:42:07.620 --> 00:42:11.280
to go inside a human brain,
and record physiological data.

00:42:11.280 --> 00:42:12.700
Depending on the
type of patient,

00:42:12.700 --> 00:42:15.160
we use different
types of electrodes.

00:42:15.160 --> 00:42:17.850
This is what some people
refer to as ECoG electrodes.

00:42:17.850 --> 00:42:19.720
Electrocorticographic signals.

00:42:19.720 --> 00:42:21.185
These are field
potential signals,

00:42:21.185 --> 00:42:23.310
very different from the
ones that I was showing you

00:42:23.310 --> 00:42:25.250
in the little spikes before.

00:42:25.250 --> 00:42:28.420
These are aggregate measures,
probably of tens of thousands,

00:42:28.420 --> 00:42:31.350
if not millions of neurons,
where we have very, very

00:42:31.350 --> 00:42:34.350
high temporal resolution at
the millisecond level, but very

00:42:34.350 --> 00:42:38.100
poor spatial resolution, only
being able to localize things

00:42:38.100 --> 00:42:40.980
at the millimeter level or so.

00:42:40.980 --> 00:42:43.500
With these, we can
pinpoint specific locations

00:42:43.500 --> 00:42:45.840
within approximately
one millimeter,

00:42:45.840 --> 00:42:48.750
but have very high signal to
noise ratio signals that are

00:42:48.750 --> 00:42:50.700
dictated by the visual input.

00:42:50.700 --> 00:42:53.426
An example of those
signals is shown here.

00:42:53.426 --> 00:42:55.050
These are intracranial
field potentials

00:42:55.050 --> 00:42:56.490
as a function of time.

00:42:56.490 --> 00:42:58.250
This is the onset
of the stimulus.

00:42:58.250 --> 00:43:00.330
And in these 39
different repetitions,

00:43:00.330 --> 00:43:03.510
when Hanlin is showing
this unoccluded face,

00:43:03.510 --> 00:43:06.507
we see a very vigorous
change, quite systematic

00:43:06.507 --> 00:43:07.590
from one trial to another.

00:43:07.590 --> 00:43:10.200
All of those gray traces
are single trials,

00:43:10.200 --> 00:43:13.470
similar to the raster plot
that I was showing you before.

00:43:13.470 --> 00:43:16.740
So now I'm going to show you
a couple of single trials.

00:43:16.740 --> 00:43:19.440
We're showing
individual images where

00:43:19.440 --> 00:43:20.760
objects are partially occluded.

00:43:20.760 --> 00:43:23.310
In this case, there's
only about 15%

00:43:23.310 --> 00:43:26.070
of the pixels of the face
that are being shown.

00:43:26.070 --> 00:43:28.170
And we see that despite
the fact that we're

00:43:28.170 --> 00:43:31.500
covering 85%, more or
less, of that image,

00:43:31.500 --> 00:43:34.170
we still see a pretty
consistent physiological signal.

00:43:34.170 --> 00:43:35.950
The signals are
clearly not identical.

00:43:35.950 --> 00:43:38.340
For example, this one
looks somewhat different.

00:43:38.340 --> 00:43:40.410
There's a lot of variability
from one to another.

00:43:40.410 --> 00:43:43.230
But again, these are just single
trials showing that there still

00:43:43.230 --> 00:43:45.600
is selectivity for this
shape, despite the fact

00:43:45.600 --> 00:43:48.700
that we are only showing a
small fraction of this thing.

00:43:48.700 --> 00:43:50.940
These are all the trials
in which these five

00:43:50.940 --> 00:43:52.439
different faces were presented.

00:43:52.439 --> 00:43:53.730
Each line corresponds to a trial.

00:43:53.730 --> 00:43:54.930
These are raster plots.

00:43:54.930 --> 00:43:58.150
As you can see, the data
are extremely clear.

00:43:58.150 --> 00:43:59.490
There's no processing here.

00:43:59.490 --> 00:44:01.920
This is raw data, single trials.

00:44:01.920 --> 00:44:04.350
These are single trials
with the partial images.

00:44:04.350 --> 00:44:06.790
You again can see there's
a vigorous response here.

00:44:06.790 --> 00:44:08.920
The responses are not
as nicely and neatly

00:44:08.920 --> 00:44:11.250
aligned here, in part
because all of these images

00:44:11.250 --> 00:44:12.070
are different.

00:44:12.070 --> 00:44:14.111
All of the locations of
the bubbles are different.

00:44:14.111 --> 00:44:17.070
As I just showed you, there's
a lot of variability here.

00:44:17.070 --> 00:44:19.870
If you actually fix the
bubble locations-- that

00:44:19.870 --> 00:44:22.800
is, you repeatedly present the
same image multiple times still

00:44:22.800 --> 00:44:25.005
in pseudorandom order,
but the same image,

00:44:25.005 --> 00:44:26.880
you see that the signals
are more consistent.

00:44:26.880 --> 00:44:29.610
Not as consistent as this one,
but certainly more consistent.

00:44:29.610 --> 00:44:33.210
Again, very clear
selective response

00:44:33.210 --> 00:44:37.380
tolerant to a tremendous amount
of occlusion in the image.

00:44:37.380 --> 00:44:39.690
Interestingly, the
latency of the response

00:44:39.690 --> 00:44:43.060
is significantly later
compared to the whole images.

00:44:43.060 --> 00:44:45.330
So if you look at, for
example, 200 milliseconds,

00:44:45.330 --> 00:44:47.340
you see that the responses
started significantly

00:44:47.340 --> 00:44:49.780
before 200 milliseconds
for the whole images.

00:44:49.780 --> 00:44:52.339
All of the responses here
start after 200 milliseconds.

00:44:52.339 --> 00:44:53.880
We spent a significant
amount of time

00:44:53.880 --> 00:44:55.504
trying to characterize
this and showing

00:44:55.504 --> 00:44:57.150
that pattern
completion, the ability

00:44:57.150 --> 00:44:59.420
to recognize objects
that are occluded,

00:44:59.420 --> 00:45:03.410
involves a significant delay
at the physiological level.

00:45:03.410 --> 00:45:06.530
If you use a purely bottom-up
architecture and try to do

00:45:06.530 --> 00:45:08.300
this in silico--

00:45:08.300 --> 00:45:10.850
this bottom-up model does
not perform very well.

00:45:10.850 --> 00:45:12.680
The performance
deteriorates quite rapidly

00:45:12.680 --> 00:45:15.200
when you start having
significant occlusion.

00:45:15.200 --> 00:45:18.440
I'm going to skip this and just
very quickly argue about some

00:45:18.440 --> 00:45:20.900
of the initial steps
that Bill Lotter has

00:45:20.900 --> 00:45:24.560
been doing, trying to add
recurrency to the models.

00:45:24.560 --> 00:45:27.770
Trying to have both
feedback connections as well

00:45:27.770 --> 00:45:30.290
as recurrent connections
within each layer

00:45:30.290 --> 00:45:33.380
to try to get a model that
will be able to perform pattern

00:45:33.380 --> 00:45:35.840
completion, and therefore,
use these feedback

00:45:35.840 --> 00:45:38.000
signals to allow
us to extrapolate

00:45:38.000 --> 00:45:41.030
from previous information
about these objects.

00:45:41.030 --> 00:45:44.200
Bill will be here Friday
or Monday, I'm not sure.

00:45:44.200 --> 00:45:47.210
So you should talk to him
more about these models.

00:45:47.210 --> 00:45:49.920
Essentially, they belong
to the family of HMAX.

00:45:49.920 --> 00:45:52.350
They belong to a family
of convolutional networks,

00:45:52.350 --> 00:45:54.410
where you have filter
operations, threshold

00:45:54.410 --> 00:45:56.690
and saturation, pooling,
and normalization.

00:45:56.690 --> 00:45:59.060
Jim will say more about
this family of models

00:45:59.060 --> 00:46:00.380
today in the afternoon.

00:46:00.380 --> 00:46:02.240
These are purely
bottom-up models.

00:46:02.240 --> 00:46:04.820
And what Bill has been doing
is adding recurrent

00:46:04.820 --> 00:46:07.400
and feedback connections,
retraining these models based

00:46:07.400 --> 00:46:09.770
on these recurrent and
feedback connections,

00:46:09.770 --> 00:46:11.600
and then comparing
their performance

00:46:11.600 --> 00:46:13.740
with human psychophysics.

00:46:13.740 --> 00:46:16.429
So this is the behavioral
data that I showed you before.

00:46:16.429 --> 00:46:18.470
This is the performance
of the feedforward model.

00:46:18.470 --> 00:46:22.310
This is the recurrent model
that he was able to train.

00:46:22.310 --> 00:46:24.710
Another way to try to
get at whether feedback

00:46:24.710 --> 00:46:26.480
is relevant for
pattern completion

00:46:26.480 --> 00:46:28.430
is to use backward masking.

00:46:28.430 --> 00:46:30.860
Backward masking means
that you present an image,

00:46:30.860 --> 00:46:32.360
and immediately
after that image,

00:46:32.360 --> 00:46:35.360
within a few milliseconds,
you present noise.

00:46:35.360 --> 00:46:36.650
You present a mask.

00:46:36.650 --> 00:46:39.200
And people have argued
that masking essentially

00:46:39.200 --> 00:46:40.634
interrupts feedback processing.

00:46:40.634 --> 00:46:42.050
Essentially, it
allows you to have

00:46:42.050 --> 00:46:45.222
a bottom-up flow of
information-- stops feedback.

00:46:45.222 --> 00:46:47.180
I don't think this is
extremely rigorous.

00:46:47.180 --> 00:46:49.160
I think that the story is
probably far more complicated

00:46:49.160 --> 00:46:49.730
than that.

00:46:49.730 --> 00:46:51.920
But to a first approximation,
you present a picture,

00:46:51.920 --> 00:46:54.200
you have a bottom-up
stream, you put a mask,

00:46:54.200 --> 00:46:57.750
and you interrupt all the
subsequent feedback processing.

00:46:57.750 --> 00:47:00.020
So if you do that at
the behavioral level,

00:47:00.020 --> 00:47:02.990
you can show that when stimuli
are masked, particularly

00:47:02.990 --> 00:47:05.990
if the interval is very short,
you can significantly impair

00:47:05.990 --> 00:47:07.540
pattern completion performance.

00:47:07.540 --> 00:47:10.340
So if the mask comes
within 25 milliseconds

00:47:10.340 --> 00:47:12.440
of the actual
stimulus, performance

00:47:12.440 --> 00:47:14.930
in recognizing these
heavily occluded objects

00:47:14.930 --> 00:47:16.490
is significantly impaired.

00:47:16.490 --> 00:47:19.970
We interpreted this to
indicate that feedback may be

00:47:19.970 --> 00:47:23.010
needed for pattern completion.

00:47:23.010 --> 00:47:27.350
This is Bill's instantiation
of that recurrent model.

00:47:27.350 --> 00:47:29.090
Because he has
recurrency now, he also

00:47:29.090 --> 00:47:30.620
has time in these models.

00:47:30.620 --> 00:47:32.000
So he can also
present the image,

00:47:32.000 --> 00:47:35.030
present the mask to the model,
and compare the performance

00:47:35.030 --> 00:47:37.490
of the computational
model as a function

00:47:37.490 --> 00:47:41.514
of the occlusion in the unmasked
and the masked conditions.

00:47:41.514 --> 00:47:43.930
So to summarize this-- and
there's still two or three more

00:47:43.930 --> 00:47:45.230
slides that I want to show--

00:47:45.230 --> 00:47:48.410
I've given you three
examples of potential ways

00:47:48.410 --> 00:47:50.880
in which feedback
signals can be important.

00:47:50.880 --> 00:47:53.480
The first one has to do
with the effects of feedback

00:47:53.480 --> 00:47:56.270
on surround suppression,
going from V2 to V1.

00:47:56.270 --> 00:47:58.730
We think that by doing this
type of experiment combined

00:47:58.730 --> 00:48:00.970
with the computational
models to understand what

00:48:00.970 --> 00:48:02.780
are the fundamental
computations,

00:48:02.780 --> 00:48:04.610
we can begin to elucidate
some of the steps

00:48:04.610 --> 00:48:06.770
by which feedback
can exert its role.

00:48:06.770 --> 00:48:08.780
We hope to come up with
the essential alphabet

00:48:08.780 --> 00:48:11.690
of computations similar to the
filtering and normalization

00:48:11.690 --> 00:48:14.540
operations that are
implemented by feedback.

00:48:14.540 --> 00:48:16.970
The second example
was feedback as being

00:48:16.970 --> 00:48:19.190
able to have features
that dictate what

00:48:19.190 --> 00:48:22.700
we do in visual search
tasks. And the last example

00:48:22.700 --> 00:48:25.130
involves our preliminary
work, trying to use feedback,

00:48:25.130 --> 00:48:27.980
as well as recurrent connections
to perform pattern completion

00:48:27.980 --> 00:48:31.820
and extrapolate from
prior information.

00:48:31.820 --> 00:48:33.500
So the last thing I
wanted to do is just

00:48:33.500 --> 00:48:36.647
flash a few more
slides about a couple

00:48:36.647 --> 00:48:39.230
of things that are happening in
neuroscience and computational

00:48:39.230 --> 00:48:41.870
neuroscience that I
think are tremendously

00:48:41.870 --> 00:48:43.690
exciting for people.

00:48:43.690 --> 00:48:46.550
If I were young again,
these are some of the things

00:48:46.550 --> 00:48:50.010
that I would definitely be very,
very excited to follow up on.

00:48:50.010 --> 00:48:52.940
So the notion that we'll
be able to go inside brains

00:48:52.940 --> 00:48:55.370
and read out biological code,
and eventually write down

00:48:55.370 --> 00:48:58.310
computer code, and build
amazing machines is, I think,

00:48:58.310 --> 00:49:00.000
very appealing and sexy.

00:49:00.000 --> 00:49:02.750
But at the same time,
it's a far cry, right?

00:49:02.750 --> 00:49:05.420
We're a long way from being
able to take biological codes

00:49:05.420 --> 00:49:08.000
and translate that into
computational codes.

00:49:08.000 --> 00:49:10.100
It's really extremely tragic.

00:49:10.100 --> 00:49:11.780
So here are three
reasons why I think

00:49:11.780 --> 00:49:15.800
there's optimism that this may
not be as crazy as it sounds.

00:49:15.800 --> 00:49:18.320
We're beginning to have
tremendous information

00:49:18.320 --> 00:49:20.939
about wiring diagrams
at exquisite resolution.

00:49:20.939 --> 00:49:22.730
There are a lot of
people who are seriously

00:49:22.730 --> 00:49:25.950
thinking about providing
us with maps about which

00:49:25.950 --> 00:49:27.800
neuron talks to
which other neuron.

00:49:27.800 --> 00:49:30.030
And this was not
present ever before.

00:49:30.030 --> 00:49:32.197
So we are now beginning to
have detailed information

00:49:32.197 --> 00:49:34.821
about connectivity at much higher
resolution than ever before.

00:49:34.821 --> 00:49:36.830
The second one is the
strength in numbers.

00:49:36.830 --> 00:49:38.239
For decades, we've
been recording

00:49:38.239 --> 00:49:39.780
the activity of one
neuron at a time,

00:49:39.780 --> 00:49:41.360
maybe a few neurons at a time.

00:49:41.360 --> 00:49:44.000
Now there are many different
ideas and techniques out there

00:49:44.000 --> 00:49:45.680
by which we can
listen to and monitor

00:49:45.680 --> 00:49:48.006
the activity of multiple
neurons simultaneously.

00:49:48.006 --> 00:49:49.880
And I think this is
going to be game changing

00:49:49.880 --> 00:49:51.921
for neurophysiology, but
also for the possibility

00:49:51.921 --> 00:49:55.390
of computational models that
are inspired by biology.

00:49:55.390 --> 00:49:57.890
And the third one is a series
of techniques mostly developed

00:49:57.890 --> 00:50:00.510
by people like Ed Boyden
and Karl Deisseroth

00:50:00.510 --> 00:50:03.090
to do optogenetics, and to
manipulate these circuits

00:50:03.090 --> 00:50:04.770
with unprecedented resolution.

00:50:04.770 --> 00:50:07.330
So let me expand on
that for one second.

00:50:07.330 --> 00:50:08.880
This is the C. elegans.

00:50:08.880 --> 00:50:11.760
This is an electron microscopy
image of how one

00:50:11.760 --> 00:50:13.445
can categorize the circuitry.

00:50:13.445 --> 00:50:15.570
So it turns out that this
pioneering work of Sydney

00:50:15.570 --> 00:50:17.460
Brenner a couple
of decades ago has

00:50:17.460 --> 00:50:21.380
led to mapping the connectivity
of each one of the 302 neurons.

00:50:21.380 --> 00:50:24.167
How exactly for each neuron,
who it's connected with.

00:50:24.167 --> 00:50:26.250
And this is represented
in that rather complex way

00:50:26.250 --> 00:50:27.680
in this diagram here.

00:50:27.680 --> 00:50:29.340
Well, it turns out
that people are

00:50:29.340 --> 00:50:33.810
beginning to do these types
of heroic experiments

00:50:33.810 --> 00:50:34.500
in cortex.

00:50:34.500 --> 00:50:36.900
So we're beginning to
have initial insights

00:50:36.900 --> 00:50:38.430
about connectivity
about how neurons

00:50:38.430 --> 00:50:41.710
are wired with each other at
this resolution in cortex.

00:50:41.710 --> 00:50:45.110
We're nowhere near being able
to have these for humans.

00:50:45.110 --> 00:50:47.020
Not even other species,
mice, and so on.

00:50:47.020 --> 00:50:48.209
Not even Drosophila yet.

00:50:48.209 --> 00:50:50.250
There's a huge amount of
[INAUDIBLE] and interest

00:50:50.250 --> 00:50:53.160
in the community of having
a very detailed map.

00:50:53.160 --> 00:50:55.710
So the question for you for
the young and next generation,

00:50:55.710 --> 00:50:57.376
what are we going to
do with these maps?

00:50:57.376 --> 00:51:00.210
If I give you a fantastic
detailed wiring diagram

00:51:00.210 --> 00:51:02.520
of a chunk of
cortex, how is that

00:51:02.520 --> 00:51:05.520
going to transform our ability
to make inferences, and build

00:51:05.520 --> 00:51:07.382
new computational models?

00:51:07.382 --> 00:51:09.090
The second one has to
do with our ability

00:51:09.090 --> 00:51:11.300
to start recording
from more and more neurons.

00:51:11.300 --> 00:51:13.466
This is other work that I didn't
have time to talk about.

00:51:13.466 --> 00:51:16.180
This is work also that Hanlin
did with Matias Ison and Itzhak

00:51:16.180 --> 00:51:16.680
Fried.

00:51:16.680 --> 00:51:18.960
These are recordings of
spikes from human cortex,

00:51:18.960 --> 00:51:21.120
again, in patients
that have epilepsy.

00:51:21.120 --> 00:51:23.990
I'm just flashing this slide
because I had it handy.

00:51:23.990 --> 00:51:25.140
These are 300 neurons.

00:51:25.140 --> 00:51:27.991
This is not a simultaneously
recorded population.

00:51:27.991 --> 00:51:30.240
These are cases where we can
record from a few neurons

00:51:30.240 --> 00:51:31.827
at a time using micro wires now.

00:51:31.827 --> 00:51:33.660
This is different from
the type of recording

00:51:33.660 --> 00:51:34.860
that I showed you before.

00:51:34.860 --> 00:51:37.180
These are actual spikes
that we can record.

00:51:37.180 --> 00:51:40.590
And these 380 neurons
is in a different task.

00:51:40.590 --> 00:51:43.200
So recording from
these 318 neurons

00:51:43.200 --> 00:51:45.912
took us about three
to four years of time.

00:51:45.912 --> 00:51:47.370
There are more and
more people that

00:51:47.370 --> 00:51:49.930
are using either
two photon imaging

00:51:49.930 --> 00:51:53.550
and/or massive multielectrode
arrays that are beginning

00:51:53.550 --> 00:51:56.640
to be able to record the
activity of hundreds of neurons

00:51:56.640 --> 00:51:57.750
simultaneously.

00:51:57.750 --> 00:52:01.530
My good friend and crazy
inventor, Ed Boyden,

00:52:01.530 --> 00:52:04.440
believes that we will be able
to record from 100,000 neurons

00:52:04.440 --> 00:52:05.370
simultaneously.

00:52:05.370 --> 00:52:07.710
Of course, he is far
more grandiose than I am,

00:52:07.710 --> 00:52:10.224
and he can think big
at this kind of scale.

00:52:10.224 --> 00:52:12.390
But even to think about the
possibility of recording

00:52:12.390 --> 00:52:14.850
from 1,000 or 5,000
neurons simultaneously so

00:52:14.850 --> 00:52:16.890
that in a week or
a month, one may

00:52:16.890 --> 00:52:18.630
be able to have a
tremendous amount

00:52:18.630 --> 00:52:20.150
of data from a very large population.

00:52:20.150 --> 00:52:22.410
This is going to
be transformative.

00:52:22.410 --> 00:52:25.140
Three decades ago in the
field of molecular biology,

00:52:25.140 --> 00:52:26.817
people would sequence
a single gene,

00:52:26.817 --> 00:52:28.650
and they would publish
the entire sequence--

00:52:28.650 --> 00:52:31.329
ACCGG-- and so on.

00:52:31.329 --> 00:52:32.370
That was the whole paper.

00:52:32.370 --> 00:52:34.119
A grad student would
spend five years just

00:52:34.119 --> 00:52:35.519
sequencing a single gene.

00:52:35.519 --> 00:52:37.560
Now we have the possibility
of downloading genomes

00:52:37.560 --> 00:52:39.337
thanks to advances in technology.

00:52:39.337 --> 00:52:40.920
I suspect that a lot
of our recordings

00:52:40.920 --> 00:52:42.270
will become obsolete.

00:52:42.270 --> 00:52:45.330
We'll be able to listen to
the activity of thousands

00:52:45.330 --> 00:52:47.070
of neurons simultaneously.

00:52:47.070 --> 00:52:48.810
And again, it's
for your generation

00:52:48.810 --> 00:52:50.610
to think about how
this will transform

00:52:50.610 --> 00:52:54.000
our understanding of how quickly
we can read biological codes.

00:52:54.000 --> 00:52:56.540
In the unlikely event that you
think that that's not enough,

00:52:56.540 --> 00:52:58.350
here's one more
thing that I think

00:52:58.350 --> 00:53:01.740
is transforming how we can
decipher biological codes.

00:53:01.740 --> 00:53:04.080
And that's again, Ed
Boyden using techniques

00:53:04.080 --> 00:53:06.090
that are referred to
as optogenetics, where

00:53:06.090 --> 00:53:10.320
you can manipulate the activity
of specific types of neurons.

00:53:10.320 --> 00:53:12.517
I flashed a lot of
computational models today.

00:53:12.517 --> 00:53:14.850
A lot of hypotheses about
what different connections may

00:53:14.850 --> 00:53:15.440
be doing.

00:53:15.440 --> 00:53:17.023
At some point, we
will be able to test

00:53:17.023 --> 00:53:19.900
some of those hypotheses with
unprecedented resolution.

00:53:19.900 --> 00:53:23.220
So if somebody wanted to
know what is this neuron V2,

00:53:23.220 --> 00:53:24.720
what kind of feedback
it's providing,

00:53:24.720 --> 00:53:27.890
we may be able to silence
only neurons in V2 that

00:53:27.890 --> 00:53:29.997
provide feedback to
V1 in a clean manner

00:53:29.997 --> 00:53:31.455
without affecting,
for example, all

00:53:31.455 --> 00:53:34.986
of the other feed-forward
processes, and so on.

00:53:34.986 --> 00:53:36.360
So the amount of
specificity that

00:53:36.360 --> 00:53:39.131
can be derived from these types
of techniques is enormous.

00:53:39.131 --> 00:53:40.380
So that's all I wanted to say.

00:53:40.380 --> 00:53:43.560
So because we have very high
specificity in our ability

00:53:43.560 --> 00:53:45.150
to manipulate
circuits, because we'll

00:53:45.150 --> 00:53:47.525
be able to record the activity
of many, many more neurons

00:53:47.525 --> 00:53:49.290
simultaneously,
and because we'll

00:53:49.290 --> 00:53:51.120
have more and more
detailed diagrams,

00:53:51.120 --> 00:53:54.030
I think that the dream of being
able to read out and decode

00:53:54.030 --> 00:53:57.300
biological codes, and translate
those into computational codes

00:53:57.300 --> 00:53:59.130
is less crazy than it may sound.

00:53:59.130 --> 00:54:02.480
We think that in the next
several years and decades,

00:54:02.480 --> 00:54:04.230
smart people like you
will be able to make

00:54:04.230 --> 00:54:06.810
this tremendous
transformation and discover

00:54:06.810 --> 00:54:08.760
specific algorithms
about intelligence

00:54:08.760 --> 00:54:12.720
by taking direct
inspiration from biology.

00:54:12.720 --> 00:54:14.370
So that's what's
illustrated here.

00:54:14.370 --> 00:54:16.150
We'll be happy to
keep on fighting.

00:54:16.150 --> 00:54:17.780
Andrei and I will fight.

00:54:17.780 --> 00:54:20.580
We will be happy to keep on
fighting about Eva and how

00:54:20.580 --> 00:54:22.200
amazing she is or she isn't.

00:54:22.200 --> 00:54:24.510
What I tried to describe is
that by really understanding

00:54:24.510 --> 00:54:26.580
biological codes,
we'll be able to write

00:54:26.580 --> 00:54:28.110
amazing computational code.

00:54:28.110 --> 00:54:29.400
I put a lot of arrows here.

00:54:29.400 --> 00:54:30.810
I'm not claiming QED.

00:54:30.810 --> 00:54:32.730
I'm not saying that
we solved the problem.

00:54:32.730 --> 00:54:36.030
There's a huge amount of
work that we need in here.