WEBVTT
00:00:09.388 --> 00:00:11.680
MICHALE FEE: All right, let's
go ahead and get started.
00:00:11.680 --> 00:00:13.180
So we're starting
a new topic today.
00:00:13.180 --> 00:00:15.450
This is actually one of
my favorite lectures,
00:00:15.450 --> 00:00:20.430
one of my favorite subjects
in computational neuroscience.
00:00:20.430 --> 00:00:23.590
All right, so brief recap
of what we've been doing.
00:00:23.590 --> 00:00:27.750
So we've been working on circuit
models of neural networks.
00:00:27.750 --> 00:00:30.330
And we've been
working on what we
00:00:30.330 --> 00:00:32.430
call a rate model,
in which we replaced
00:00:32.430 --> 00:00:35.670
all the spikes of a
neuron with, essentially,
00:00:35.670 --> 00:00:39.000
a single number
that characterizes
00:00:39.000 --> 00:00:41.550
the rate at which
a neuron fires.
00:00:41.550 --> 00:00:45.960
We introduced a simple
network in which
00:00:45.960 --> 00:00:48.450
we have an input neuron
and an output neuron
00:00:48.450 --> 00:00:52.110
with a synaptic connection
of weight w between them.
00:00:52.110 --> 00:00:56.730
And that synaptic connection
leads to a synaptic input
00:00:56.730 --> 00:00:59.400
that's proportional
to w times the firing
00:00:59.400 --> 00:01:00.870
rate of the input neuron.
00:01:00.870 --> 00:01:03.690
And then we talked about
how we can characterize
00:01:03.690 --> 00:01:06.660
the output, the firing
rate of the output neuron,
00:01:06.660 --> 00:01:12.030
as some nonlinear function
of the total input
00:01:12.030 --> 00:01:15.330
to this output neuron.
00:01:15.330 --> 00:01:18.660
We've talked about
different F-I curves.
00:01:18.660 --> 00:01:22.530
We've talked about having
what's called a binary threshold
00:01:22.530 --> 00:01:25.360
unit, which has zero firing
below some threshold.
00:01:25.360 --> 00:01:28.500
And then actually, there
are different versions
00:01:28.500 --> 00:01:30.270
of the binary threshold unit.
00:01:30.270 --> 00:01:33.870
Sometimes the
firing rate is zero
00:01:33.870 --> 00:01:36.030
for inputs below the threshold.
00:01:36.030 --> 00:01:40.110
And in other models,
we use a minus 1.
00:01:40.110 --> 00:01:44.490
And then a constant firing rate
of one above that threshold.
00:01:44.490 --> 00:01:46.950
And we also talked
about linear neurons,
00:01:46.950 --> 00:01:49.410
where we can write down the
firing rate of the output
00:01:49.410 --> 00:01:54.000
neuron just as a weighted
sum of the inputs.
00:01:54.000 --> 00:01:56.130
And remember that
these neurons are
00:01:56.130 --> 00:01:59.880
kind of special in that they
can have negative firing
00:01:59.880 --> 00:02:05.050
rates, which is not really
biophysically plausible,
00:02:05.050 --> 00:02:09.690
but mathematically, it's very
convenient to have neurons
00:02:09.690 --> 00:02:10.530
like this.
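The F-I curves recapped above can be written as plain functions. This is a minimal sketch, not the lecture's own code. The function names `binary_threshold` and `linear_unit` are hypothetical. The sketch also assumes a threshold of zero by default and counts an input exactly at threshold as "above" it.

```python
def binary_threshold(x, theta=0.0, low=0.0):
    """Binary threshold unit: returns `low` (0 or -1) below theta, and 1 at or above it."""
    return 1.0 if x >= theta else low

def linear_unit(x):
    """Linear unit: the output rate equals the input, so negative 'rates' are allowed."""
    return x

print(binary_threshold(-0.5), binary_threshold(-0.5, low=-1.0), binary_threshold(2.0))
print(linear_unit(-0.5))
```

The `low` argument covers both versions of the binary unit mentioned above: zero below threshold, or minus 1 below threshold.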
00:02:10.530 --> 00:02:14.490
So we took this simple model
and we expanded it to the case
00:02:14.490 --> 00:02:17.790
where we have many input
neurons and many output neurons.
00:02:17.790 --> 00:02:24.090
So now we have a vector of input
firing rates, u, and a vector
00:02:24.090 --> 00:02:25.590
of output firing rates, v.
00:02:25.590 --> 00:02:27.860
And for the case
of linear neurons,
00:02:27.860 --> 00:02:29.850
we talked about how
you can write down
00:02:29.850 --> 00:02:33.180
the vector of firing
rates of the output neurons
00:02:33.180 --> 00:02:38.910
simply as a matrix product of a
weight matrix times the vector
00:02:38.910 --> 00:02:39.990
of input firing rates.
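For linear neurons, the output is the matrix product v = W u. Each output rate is the dot product of one row of W with the input vector. The sketch below is illustrative and uses plain lists. The two example matrices are hypothetical: one stretches along the first axis, and the other is a 90-degree rotation.

```python
def matvec(W, u):
    """Output rates v = W u: entry a is the dot product of row a of W with u."""
    return [sum(w_ab * u_b for w_ab, u_b in zip(row, u)) for row in W]

# Hypothetical weights that stretch the first input direction by a factor of 2.
W_stretch = [[2.0, 0.0],
             [0.0, 1.0]]
print(matvec(W_stretch, [1.0, 3.0]))   # [2.0, 3.0]

# Hypothetical weights that rotate the input vector by 90 degrees.
W_rotate = [[0.0, -1.0],
            [1.0, 0.0]]
print(matvec(W_rotate, [1.0, 0.0]))    # [0.0, 1.0]
```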
00:02:39.990 --> 00:02:45.970
And we talked about how this
can produce transformations
00:02:45.970 --> 00:02:47.660
of this vector of
input firing rates.
00:02:47.660 --> 00:02:51.430
So in this high-dimensional
space of inputs,
00:02:51.430 --> 00:02:54.850
we can imagine stretching
that input vector
00:02:54.850 --> 00:02:58.960
along different directions to
amplify certain directions that
00:02:58.960 --> 00:03:01.240
may be more important
than others.
00:03:01.240 --> 00:03:02.740
We talked about how
you can do that,
00:03:02.740 --> 00:03:06.350
stretch in arbitrary directions,
not just along the axes.
00:03:06.350 --> 00:03:10.390
And we talked about
how that matrix of weights
00:03:10.390 --> 00:03:14.510
can produce
a rotation.
00:03:14.510 --> 00:03:17.230
So we can have
some set of inputs
00:03:17.230 --> 00:03:20.140
where, let's say,
we have clusters
00:03:20.140 --> 00:03:22.187
of different input
values corresponding
00:03:22.187 --> 00:03:23.020
to different things.
00:03:23.020 --> 00:03:27.430
And you can rotate that
to put certain features
00:03:27.430 --> 00:03:29.380
in particular output neurons.
00:03:29.380 --> 00:03:31.240
So now you can
discriminate one class
00:03:31.240 --> 00:03:33.280
of objects from another
class of objects
00:03:33.280 --> 00:03:36.370
by looking at just
one dimension and not
00:03:36.370 --> 00:03:39.940
the whole
high-dimensional space.
00:03:39.940 --> 00:03:43.980
So today, we're going to look
at a new kind of network called
00:03:43.980 --> 00:03:45.910
a recurrent neural
network, where not
00:03:45.910 --> 00:03:50.650
only do we have inputs
to our output neurons
00:03:50.650 --> 00:03:54.400
from an input layer, but
we also have connections
00:03:54.400 --> 00:03:56.770
between the neurons
in the output layer.
00:03:56.770 --> 00:04:01.750
So these neurons in a recurrent
network talk to each other.
00:04:01.750 --> 00:04:07.160
And that gives these networks
some really cool properties.
00:04:07.160 --> 00:04:10.090
So we're going to
develop the math
00:04:10.090 --> 00:04:11.890
and describe how
these things work
00:04:11.890 --> 00:04:15.100
to develop an intuition for
how recurrent networks respond
00:04:15.100 --> 00:04:16.180
to their inputs.
00:04:16.180 --> 00:04:20.019
We're going to get into
some of the computations
00:04:20.019 --> 00:04:22.330
that recurrent networks can do.
00:04:22.330 --> 00:04:26.980
They can act as amplifiers
in particular directions.
00:04:26.980 --> 00:04:30.250
They can act as integrators, so
they can accumulate information
00:04:30.250 --> 00:04:31.540
over time.
00:04:31.540 --> 00:04:33.430
They can generate sequences.
00:04:33.430 --> 00:04:35.620
They can act as
short-term memories
00:04:35.620 --> 00:04:39.170
of either continuous variables
or discrete variables.
00:04:39.170 --> 00:04:43.420
It's a very powerful kind
of circuit architecture.
00:04:43.420 --> 00:04:46.930
And on top of that, in order to
describe these mathematically,
00:04:46.930 --> 00:04:49.840
we're going to use all of
the linear algebra tools
00:04:49.840 --> 00:04:51.920
that we've been
developing so far.
00:04:51.920 --> 00:04:57.190
So, hopefully, a bunch of things
will kind of connect together.
00:04:57.190 --> 00:05:00.360
OK, so mathematical description
of recurrent networks.
00:05:00.360 --> 00:05:02.117
We're going to
talk about dynamics
00:05:02.117 --> 00:05:03.700
in these recurrent
networks, and we're
00:05:03.700 --> 00:05:06.460
going to start with
the very simplest kind
00:05:06.460 --> 00:05:09.100
of recurrent network
called an autapse network.
00:05:09.100 --> 00:05:13.360
Then we're going to extend
that to the general case
00:05:13.360 --> 00:05:16.450
of recurrent connectivity.
00:05:16.450 --> 00:05:18.520
And then we're going to
talk about how recurrent
00:05:18.520 --> 00:05:20.570
networks store memories.
00:05:20.570 --> 00:05:25.720
So we'll start talking about
specific circuit models
00:05:25.720 --> 00:05:28.570
for storing short-term memories.
00:05:28.570 --> 00:05:33.280
And I'll touch on recurrent
networks for decision-making.
00:05:33.280 --> 00:05:38.620
And this will kind of lead
into the last few lectures
00:05:38.620 --> 00:05:41.920
of the class, where
we look at
00:05:41.920 --> 00:05:45.610
specific cases of
how networks
00:05:45.610 --> 00:05:48.140
can store memories.
00:05:48.140 --> 00:05:50.500
OK, mathematical description.
00:05:50.500 --> 00:05:53.980
All right, so the first
thing that we need to do is--
00:05:53.980 --> 00:05:56.740
the really cool thing
about recurrent networks
00:05:56.740 --> 00:06:01.060
is that their activity
can evolve over time.
00:06:01.060 --> 00:06:05.500
So we need to talk about
dynamics, all right?
00:06:05.500 --> 00:06:08.170
The feed-forward networks
that we've been talking about,
00:06:08.170 --> 00:06:11.000
we just put in an input.
00:06:11.000 --> 00:06:14.210
It gets weighted by
synaptic strength,
00:06:14.210 --> 00:06:17.570
and we get a firing
rate in the output,
00:06:17.570 --> 00:06:19.160
just sort of instantaneously.
00:06:19.160 --> 00:06:21.500
We've been thinking of it as:
you put in an input,
00:06:21.500 --> 00:06:23.090
and you get an output.
00:06:23.090 --> 00:06:25.070
In general, neural
networks don't do that.
00:06:25.070 --> 00:06:27.740
You put in an input, and
things change over time
00:06:27.740 --> 00:06:30.380
until you settle at
some output, maybe,
00:06:30.380 --> 00:06:33.830
or it starts doing something
interesting, all right?
00:06:33.830 --> 00:06:38.120
So the time course
of the activity
00:06:38.120 --> 00:06:39.770
becomes very
important, all right?
00:06:39.770 --> 00:06:43.180
So neurons don't respond
instantaneously to inputs.
00:06:43.180 --> 00:06:45.200
There are synaptic delays.
00:06:45.200 --> 00:06:48.140
There is integration
of the membrane potential.
00:06:48.140 --> 00:06:49.970
Things change over time.
00:06:49.970 --> 00:06:53.300
And a specific example of
this that we saw in the past
00:06:53.300 --> 00:06:55.280
is that if you have
an input spike,
00:06:55.280 --> 00:06:59.480
you can produce a postsynaptic
current that jumps up abruptly
00:06:59.480 --> 00:07:01.910
as the synaptic
conductance turns on.
00:07:01.910 --> 00:07:04.670
And then the
synaptic conductance
00:07:04.670 --> 00:07:09.140
decays away as the
neurotransmitter unbinds
00:07:09.140 --> 00:07:10.670
from the neurotransmitter
receptor,
00:07:10.670 --> 00:07:15.160
and you get a synaptic current
that decays away over time, OK?
00:07:15.160 --> 00:07:19.720
So that's a simple kind of time
dependence that you would get.
00:07:19.720 --> 00:07:22.850
And that could lead
to time dependence
00:07:22.850 --> 00:07:26.000
in the firing rate
of the output neuron.
00:07:26.000 --> 00:07:28.190
OK, dendritic
propagation, membrane
00:07:28.190 --> 00:07:33.470
time constant, other examples
of how things can take time
00:07:33.470 --> 00:07:35.060
in a neural network.
00:07:35.060 --> 00:07:36.680
All right, so we're
going to model
00:07:36.680 --> 00:07:39.450
the firing rate of our output
neuron in the following way.
00:07:39.450 --> 00:07:42.170
If we have an input
firing rate that's zero
00:07:42.170 --> 00:07:46.400
and then steps up to some
constant and then steps down,
00:07:46.400 --> 00:07:51.740
we're going to model the output,
the firing rate of the output
00:07:51.740 --> 00:07:54.375
neuron, using exactly the
same kind of first order
00:07:54.375 --> 00:07:56.000
linear differential
equation that we've
00:07:56.000 --> 00:07:59.120
been using all along for
the membrane potential,
00:07:59.120 --> 00:08:00.817
for the Hodgkin-Huxley
gating variables.
00:08:00.817 --> 00:08:02.900
The same kind of differential
equation that you've
00:08:02.900 --> 00:08:04.808
seen over and over again.
00:08:04.808 --> 00:08:07.100
So that's the differential
equation we're going to use.
00:08:07.100 --> 00:08:11.330
We're going to say that the
time derivative of the firing
00:08:11.330 --> 00:08:14.210
rate of the output neuron
times the time constant
00:08:14.210 --> 00:08:17.960
is just equal to minus the
firing rate of the output
00:08:17.960 --> 00:08:19.740
neuron plus v infinity.
00:08:19.740 --> 00:08:22.697
And so you know that the
solution to this equation
00:08:22.697 --> 00:08:24.530
is that the firing rate
of the output neuron
00:08:24.530 --> 00:08:31.280
will just relax exponentially
to some new v infinity.
00:08:31.280 --> 00:08:34.520
And the v infinity
that we're going to use
00:08:34.520 --> 00:08:38.870
is just this nonlinear function
of the weighted input
00:08:38.870 --> 00:08:41.870
to our neuron.
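The equation tau dv/dt = -v + v_inf, with v_inf = F(w u), can be integrated numerically to check that the rate relaxes exponentially. This is a minimal forward-Euler sketch. The helper `simulate_rate`, the step input, tau = 10, and the time units are all assumptions made for illustration. F defaults to the identity, which corresponds to a linear neuron.

```python
import math

def simulate_rate(u_of_t, w=1.0, F=lambda x: x, tau=10.0, dt=0.01, T=100.0):
    """Forward-Euler integration of tau dv/dt = -v + F(w u(t)), starting at v = 0."""
    v, t, trace = 0.0, 0.0, []
    for _ in range(int(round(T / dt))):
        v_inf = F(w * u_of_t(t))        # steady-state rate for the current input
        v += dt / tau * (-v + v_inf)    # relax toward v_inf with time constant tau
        t += dt
        trace.append((t, v))
    return trace

# Input steps from 0 to 1 at t = 0 and then stays on.
trace = simulate_rate(lambda t: 1.0)
t_tau, v_tau = trace[999]               # after one time constant (t = 10)
print(v_tau, 1 - math.exp(-1))          # numerical vs. analytic 1 - e^(-t/tau)
```

After one time constant, the rate should be close to 1 - 1/e of its final value, as expected for exponential relaxation.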
00:08:41.870 --> 00:08:47.260
So we're going to take the
formalism that we developed
00:08:47.260 --> 00:08:49.690
for our feed-forward
networks to say,
00:08:49.690 --> 00:08:51.460
what is the firing
rate of the output
00:08:51.460 --> 00:08:54.040
neuron as a function
of the inputs?
00:08:54.040 --> 00:08:56.380
And we're going to use
that firing rate that we've
00:08:56.380 --> 00:09:01.810
been using before as the
v infinity for our network
00:09:01.810 --> 00:09:03.220
with dynamics.
00:09:03.220 --> 00:09:04.330
Any questions about that?
00:09:07.150 --> 00:09:10.310
All right, so that becomes
our differential equation now
00:09:10.310 --> 00:09:14.390
for this recurrent
network, all right?
00:09:14.390 --> 00:09:17.510
So it's just a first order
linear differential equation,
00:09:17.510 --> 00:09:21.080
where the v infinity, the steady
state firing rate of the output
00:09:21.080 --> 00:09:25.520
neuron, is just this nonlinear
function of the weighted sum
00:09:25.520 --> 00:09:28.710
of all the inputs.
00:09:28.710 --> 00:09:31.460
All right, and actually, for
most of what we do today,
00:09:31.460 --> 00:09:35.160
we're going to just take
the case of a linear neuron.
00:09:35.160 --> 00:09:35.660
All right.
00:09:41.370 --> 00:09:42.630
So this I've already said.
00:09:42.630 --> 00:09:44.460
This I've already said.
00:09:44.460 --> 00:09:47.650
And actually, what I'm doing
here is just extending this.
00:09:47.650 --> 00:09:50.580
So this was the case for
a single output neuron
00:09:50.580 --> 00:09:52.080
and a single input neuron.
00:09:52.080 --> 00:09:54.300
What we're doing now is
we're just extending this
00:09:54.300 --> 00:09:58.290
to the case where we have
a vector of input neurons
00:09:58.290 --> 00:10:01.950
with a firing rate represented
by a firing rate vector u,
00:10:01.950 --> 00:10:04.260
and a vector of output
neurons with a firing rate
00:10:04.260 --> 00:10:08.310
vector v. And we're just going
to use this same differential
00:10:08.310 --> 00:10:11.220
equation, but we're going to
write it in vector notation.
00:10:11.220 --> 00:10:14.040
So each one of
these output neurons
00:10:14.040 --> 00:10:17.030
has an equation
like this, and we're
00:10:17.030 --> 00:10:21.255
going to combine them all
together into a single vector equation.
00:10:21.255 --> 00:10:22.130
Does that make sense?
00:10:25.570 --> 00:10:28.730
All right, so there
is our vector notation
00:10:28.730 --> 00:10:33.460
of the activity in
this recurrent network.
00:10:33.460 --> 00:10:37.880
Sorry, I forgot to put the
recurrent connections in there.
00:10:37.880 --> 00:10:42.200
So the time dependence
is really simple
00:10:42.200 --> 00:10:44.930
in this feed-forward
network, right?
00:10:44.930 --> 00:10:47.810
So in a feed-forward
network, the dynamics
00:10:47.810 --> 00:10:48.860
just look like this.
00:10:51.570 --> 00:10:53.330
But in a recurrent
network, this thing
00:10:53.330 --> 00:10:56.630
can get really interesting and
start doing interesting stuff.
00:10:56.630 --> 00:11:00.530
All right, so let's add
recurrent connections now
00:11:00.530 --> 00:11:06.075
and add these recurrent
connections to our equation.
00:11:08.910 --> 00:11:12.210
So in addition to
this weight matrix
00:11:12.210 --> 00:11:15.067
w that describes the
connections from the input
00:11:15.067 --> 00:11:16.650
layer to the output
layer, we're going
00:11:16.650 --> 00:11:19.470
to have another
weight matrix that
00:11:19.470 --> 00:11:22.500
describes the connections
between the neurons
00:11:22.500 --> 00:11:25.740
in the output layer.
00:11:25.740 --> 00:11:28.380
And this weight
matrix, of course,
00:11:28.380 --> 00:11:30.690
has to be able to
describe a connection
00:11:30.690 --> 00:11:35.040
from any one of these neurons
to any other of these neurons.
00:11:35.040 --> 00:11:37.260
And so this weight
matrix is going
00:11:37.260 --> 00:11:41.310
to be indexed by
two neurons.
00:11:41.310 --> 00:11:42.840
Each weight,
00:11:42.840 --> 00:11:45.210
each synaptic strength,
is going to be
00:11:45.210 --> 00:11:49.380
a function of
two things:
00:11:49.380 --> 00:11:51.220
the identity of the
postsynaptic neuron
00:11:51.220 --> 00:11:53.190
and the identity of
the presynaptic neuron.
00:11:53.190 --> 00:11:55.880
Does that make sense?
00:11:55.880 --> 00:11:58.310
OK, so there are
two kinds of input--
00:11:58.310 --> 00:12:03.630
a feed-forward input
from the input layer
00:12:03.630 --> 00:12:07.510
and a recurrent input due to
connections within the output
00:12:07.510 --> 00:12:08.010
layer.
00:12:13.340 --> 00:12:14.420
Any questions about that?
00:12:21.300 --> 00:12:24.840
OK, so there is the
equation now that
00:12:24.840 --> 00:12:29.760
describes the time rate
of change of the firing
00:12:29.760 --> 00:12:31.740
rates in the output layer.
00:12:31.740 --> 00:12:35.310
It's just this first order
linear differential equation.
00:12:35.310 --> 00:12:43.290
And v infinity is just
this nonlinear function
00:12:43.290 --> 00:12:50.220
of the inputs, of the net input
to this neuron, to each neuron.
00:12:50.220 --> 00:12:53.190
And the net input to
this set of neurons
00:12:53.190 --> 00:12:57.090
is a contribution from
the feed-forward inputs,
00:12:57.090 --> 00:13:01.590
given by this weight matrix
w, and this contribution
00:13:01.590 --> 00:13:07.820
from the recurrent inputs,
given by this weight matrix, m.
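The right-hand side of tau dv/dt = -v + F(W u + M v) has two input terms: the feed-forward term W u and the recurrent term M v. The sketch below evaluates this expression directly. It is illustrative only. Choosing a threshold-linear (rectified) F is an assumption, since the lecture leaves F general, and the weights and rates are made up.

```python
def relu(x):
    """Assumed threshold-linear F: zero for negative net input, linear above zero."""
    return max(0.0, x)

def rate_rhs(v, u, W, M, F):
    """Return tau * dv/dt = -v + F(W u + M v), with F applied to each neuron's net input."""
    ff = [sum(w * x for w, x in zip(row, u)) for row in W]    # feed-forward input W u
    rec = [sum(m * x for m, x in zip(row, v)) for row in M]   # recurrent input M v
    return [-v_a + F(ff_a + rec_a) for v_a, ff_a, rec_a in zip(v, ff, rec)]

W = [[1.0, 0.0], [0.0, 1.0]]    # each input neuron drives one output neuron
M = [[0.0, -2.0], [0.5, 0.0]]   # hypothetical: neuron 1 inhibits 0, neuron 0 excites 1
u = [1.0, 0.0]
v = [1.0, 1.0]
print(rate_rhs(v, u, W, M, relu))   # [-1.0, -0.5]
```

In this example, neuron 0 has a net input of -1, so the rectified F silences it. Neuron 1 has no feed-forward input and is driven only by the recurrent connection.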
00:13:07.820 --> 00:13:14.540
So that is the crux
of it, all right?
00:13:14.540 --> 00:13:21.320
So I want to make sure that
we understand where we are.
00:13:21.320 --> 00:13:23.916
Does anybody have any
questions about that?
00:13:23.916 --> 00:13:26.760
No?
00:13:26.760 --> 00:13:29.760
All right, then I'll push ahead.
00:13:29.760 --> 00:13:31.950
All right, so what is this?
00:13:31.950 --> 00:13:33.270
So we've seen this before.
00:13:33.270 --> 00:13:37.050
This product of
this weight matrix
00:13:37.050 --> 00:13:40.830
times this vector of
input firing rates
00:13:40.830 --> 00:13:42.550
just looks like this.
00:13:42.550 --> 00:13:49.800
You can see that the input to
this neuron, this first output
00:13:49.800 --> 00:13:54.420
neuron, is just the dot
product of the weights
00:13:54.420 --> 00:13:59.190
onto the first neuron,
that is, the first row
00:13:59.190 --> 00:14:01.680
of the weight matrix,
with the vector
00:14:01.680 --> 00:14:04.950
of input
firing rates.
00:14:07.650 --> 00:14:10.590
And the feed-forward
contribution to this neuron
00:14:10.590 --> 00:14:14.610
is just the dot product of that
row of this input weight
00:14:14.610 --> 00:14:20.770
matrix with the vector of
input firing rates, and so on.
00:14:20.770 --> 00:14:25.480
If we look at the recurrent
input to these neurons,
00:14:25.480 --> 00:14:28.630
the recurrent input
to this first neuron
00:14:28.630 --> 00:14:31.000
is just going to
be the dot product
00:14:31.000 --> 00:14:35.140
of this row of the
recurrent weight matrix
00:14:35.140 --> 00:14:39.250
and the vector of firing
rates in the output layer.
00:14:43.580 --> 00:14:46.210
The recurrent input
to the second neuron
00:14:46.210 --> 00:14:49.930
is going to be the dot product
of this row of the weight
00:14:49.930 --> 00:14:53.160
matrix and the vector
of firing rates.
00:14:56.130 --> 00:14:56.938
Yes?
00:14:56.938 --> 00:14:58.750
AUDIENCE: So I guess
I'm a little confused,
00:14:58.750 --> 00:15:01.890
because I thought it was
from A. Oh, to A. OK.
00:15:01.890 --> 00:15:04.410
MICHALE FEE: Yeah,
it's always post, pre.
00:15:04.410 --> 00:15:07.650
Post, pre in a weight matrix.
00:15:14.260 --> 00:15:16.060
That's because of the way
I'm defining this notation:
00:15:16.060 --> 00:15:20.890
we write down
these vectors as columns.
00:15:23.740 --> 00:15:31.990
This vector is a column
matrix, a column vector.
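The "post, pre" convention means that entry M[a][b] is the synapse from presynaptic neuron b onto postsynaptic neuron a. As a result, row a of M collects every synapse onto neuron a. The sketch below uses a hypothetical two-neuron matrix and a hypothetical helper, `recurrent_input`, to show this.

```python
# Recurrent weight matrix indexed M[post][pre]: row a holds all synapses onto neuron a.
# Hypothetical example: neuron 0 projects onto neuron 1 with weight 0.8; nothing else.
M = [[0.0, 0.0],
     [0.8, 0.0]]
v = [5.0, 2.0]   # firing rates of the two neurons (a column vector, stored as a list)

def recurrent_input(M, v, post):
    """Recurrent input onto neuron `post`: the dot product of row `post` of M with v."""
    return sum(M[post][pre] * v[pre] for pre in range(len(v)))

print(recurrent_input(M, v, 0), recurrent_input(M, v, 1))
```

Neuron 0 receives no recurrent input. Neuron 1 receives 0.8 times the rate of neuron 0.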
00:15:31.990 --> 00:15:37.840
All right, so we're going
to make one simplification
00:15:37.840 --> 00:15:39.880
to this.
00:15:39.880 --> 00:15:44.020
When we work with the
recurrent networks,
00:15:44.020 --> 00:15:47.230
we're usually going to
simplify this input.
00:15:47.230 --> 00:15:53.050
And rather than write down this
complex feed-forward component,
00:15:53.050 --> 00:15:55.820
writing this out as
this matrix product,
00:15:55.820 --> 00:15:59.950
we're just going to
simplify the math.
00:15:59.950 --> 00:16:04.090
And rather than carry
around this w times u,
00:16:04.090 --> 00:16:10.760
we're just going to replace
that with a vector of inputs
00:16:10.760 --> 00:16:12.700
onto each one of
those neurons, OK?
00:16:12.700 --> 00:16:17.380
So we're just going to pretend
that the input to this neuron
00:16:17.380 --> 00:16:21.377
is just coming
from one input, OK?
00:16:21.377 --> 00:16:22.960
And the input to
this neuron is coming
00:16:22.960 --> 00:16:24.610
from another single input.
00:16:24.610 --> 00:16:28.720
And so we're just going to
replace that feed-forward input
00:16:28.720 --> 00:16:30.820
onto this network
with this vector h.
00:16:33.660 --> 00:16:35.940
So that's the
equation that we're
00:16:35.940 --> 00:16:40.060
going to use moving
forward, all right?
00:16:40.060 --> 00:16:42.130
Just simplifies
things a little bit so
00:16:42.130 --> 00:16:45.050
we're not carrying
around this w times u.
00:16:47.560 --> 00:16:50.860
So now, that's our
equation that we're
00:16:50.860 --> 00:16:54.350
going to use to describe
this recurrent network.
00:16:54.350 --> 00:16:57.137
This is a system of
coupled equations.
00:16:57.137 --> 00:16:57.970
What does that mean?
00:16:57.970 --> 00:17:01.540
You can see that the time
derivative of the firing
00:17:01.540 --> 00:17:05.349
rate of this first neuron
is given by a contribution
00:17:05.349 --> 00:17:08.920
from the input layer
and a contribution
00:17:08.920 --> 00:17:13.040
from other neurons
in the output layer.
00:17:13.040 --> 00:17:16.190
So the time rate of
change of this neuron
00:17:16.190 --> 00:17:18.950
depends on the activity
in all the other neurons
00:17:18.950 --> 00:17:20.050
in the network.
00:17:20.050 --> 00:17:21.800
And the time rate of
change in this neuron
00:17:21.800 --> 00:17:24.650
depends on the activity
of all the other neurons
00:17:24.650 --> 00:17:25.290
in the network.
00:17:25.290 --> 00:17:28.174
So that's a set of
coupled equations.
00:17:28.174 --> 00:17:30.390
And in general,
00:17:30.390 --> 00:17:33.230
you know, it's not obvious,
when you look at it,
00:17:33.230 --> 00:17:35.360
what the solution is, all right?
00:17:35.360 --> 00:17:42.200
So we're going to develop the
tools to solve this equation
00:17:42.200 --> 00:17:46.640
and get some intuition about
how networks like this behave
00:17:46.640 --> 00:17:50.090
in response to their inputs.
00:17:50.090 --> 00:17:51.620
So the first thing
we're going to do
00:17:51.620 --> 00:17:58.800
is to simplify this network
to the case of linear neurons.
00:17:58.800 --> 00:18:01.810
So we don't have a nonlinearity.
00:18:01.810 --> 00:18:04.080
The neurons just fire.
00:18:04.080 --> 00:18:06.690
Their firing rate is just
linear in their input.
00:18:09.360 --> 00:18:12.750
And so that's the equation
for the linear case.
00:18:12.750 --> 00:18:14.400
All we've done is
we've just gotten rid
00:18:14.400 --> 00:18:16.830
of this non-linear function f.
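For the linear network tau dv/dt = -v + M v + h, the coupling means each neuron's steady state depends on every other neuron. When the network is stable, the steady state is the solution of (I - M) v = h. The sketch below integrates the equations with forward Euler and compares the result with that solution. It assumes a hypothetical two-neuron network with mutual excitation of 0.5 and input only onto neuron 0. The 2x2 solve uses Cramer's rule.

```python
def simulate_linear(M, h, tau=10.0, dt=0.1, steps=5000):
    """Forward-Euler integration of tau dv/dt = -v + M v + h, starting from v = 0."""
    n = len(h)
    v = [0.0] * n
    for _ in range(steps):
        Mv = [sum(M[a][b] * v[b] for b in range(n)) for a in range(n)]
        v = [v[a] + dt / tau * (-v[a] + Mv[a] + h[a]) for a in range(n)]
    return v

def steady_state_2x2(M, h):
    """Solve (I - M) v = h for a 2x2 M by Cramer's rule (assumes I - M is invertible)."""
    a, b = 1 - M[0][0], -M[0][1]
    c, d = -M[1][0], 1 - M[1][1]
    det = a * d - b * c
    return [(d * h[0] - b * h[1]) / det, (a * h[1] - c * h[0]) / det]

M = [[0.0, 0.5], [0.5, 0.0]]   # hypothetical mutual excitation
h = [1.0, 0.0]                 # only neuron 0 receives external input
print(simulate_linear(M, h), steady_state_2x2(M, h))
```

Neuron 1 receives no external input, yet it settles at a nonzero rate (2/3) because of its coupling to neuron 0.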
00:18:19.470 --> 00:18:24.180
All right, so now let's
take a very simple case
00:18:24.180 --> 00:18:27.810
of a recurrent network
and use this equation
00:18:27.810 --> 00:18:29.970
to see how it
behaves, all right?
00:18:29.970 --> 00:18:34.080
So the simplest case
of a recurrent network
00:18:34.080 --> 00:18:39.120
is the case where the recurrent
connections within this layer
00:18:39.120 --> 00:18:41.100
are given by
00:18:41.100 --> 00:18:43.980
a weight matrix that
is a diagonal matrix.
00:18:43.980 --> 00:18:45.690
Now, what does
that correspond to?
00:18:45.690 --> 00:18:50.160
What that corresponds to is
this neuron making a connection
00:18:50.160 --> 00:18:56.340
onto itself with a synapse of
weight lambda one, right there.
00:18:56.340 --> 00:18:59.670
And that kind of
recurrent connection
00:18:59.670 --> 00:19:03.420
of a neuron onto itself
is called an autapse,
00:19:03.420 --> 00:19:06.770
like an auto synapse.
00:19:06.770 --> 00:19:08.760
And we're going to put
one of those autapses
00:19:08.760 --> 00:19:12.150
on each one of these
neurons in our output layer,
00:19:12.150 --> 00:19:15.460
in our recurrent layer.
00:19:15.460 --> 00:19:18.540
So now we can write
down the equation
00:19:18.540 --> 00:19:21.540
for this network, all right?
00:19:21.540 --> 00:19:26.150
And what we're going to
do is simply replace--
00:19:26.150 --> 00:19:28.260
sorry, let me just bring
up that equation again.
00:19:28.260 --> 00:19:30.030
Sorry, there's the equation.
00:19:30.030 --> 00:19:33.570
And we're simply going to
replace this weight matrix
00:19:33.570 --> 00:19:36.870
m, this recurrent weight matrix,
with that diagonal matrix
00:19:36.870 --> 00:19:40.510
that I just showed you.
00:19:40.510 --> 00:19:42.250
So there it is.
00:19:42.250 --> 00:19:45.820
So the time rate of change of
this vector of output neurons
00:19:45.820 --> 00:19:48.990
is just minus v plus this
diagonal matrix times
00:19:48.990 --> 00:19:51.480
v, plus the inputs.
00:19:55.570 --> 00:19:58.630
So now you can see
that if we write out
00:19:58.630 --> 00:20:03.130
the equation separately for each
one of these output neurons--
00:20:03.130 --> 00:20:06.210
so here it is in
vector notation.
00:20:06.210 --> 00:20:12.600
We can just write that out for
each one of our output neurons.
00:20:12.600 --> 00:20:14.880
So there's a separate
equation like this
00:20:14.880 --> 00:20:18.170
for each one of these neurons.
00:20:18.170 --> 00:20:20.570
But you can see that
these are all uncoupled.
00:20:20.570 --> 00:20:23.060
So we can understand how
this network responds just
00:20:23.060 --> 00:20:27.990
by studying this equation
for one of those neurons.
00:20:27.990 --> 00:20:29.000
OK, so let's do that.
00:20:29.000 --> 00:20:31.820
We have an independent equation.
00:20:31.820 --> 00:20:34.700
The firing rate change--
00:20:34.700 --> 00:20:37.820
the time derivative of the
firing rate of neuron one
00:20:37.820 --> 00:20:40.510
depends only on the
firing rate of neuron one.
00:20:40.510 --> 00:20:44.000
It doesn't depend on
any other neurons.
00:20:44.000 --> 00:20:45.790
As you can see,
it's not connected
00:20:45.790 --> 00:20:47.860
to any of the other neurons.
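With a diagonal (autapse-only) weight matrix, component a of -v + M v + h reduces to -(1 - lambda_a) v_a + h_a. Changing the other neurons' rates therefore has no effect on neuron a. The sketch below checks this numerically. The lambda values and rates are hypothetical.

```python
def linear_rhs(v, M, h):
    """Return tau * dv/dt = -v + M v + h for a linear recurrent network."""
    n = len(v)
    return [-v[a] + sum(M[a][b] * v[b] for b in range(n)) + h[a] for a in range(n)]

# Hypothetical autapse strengths lambda_1, lambda_2, lambda_3 on the diagonal of M.
lams = [0.0, 0.5, -1.0]
M_diag = [[lams[a] if a == b else 0.0 for b in range(3)] for a in range(3)]
h = [1.0, 1.0, 1.0]

v = [0.2, 0.4, 0.6]
v_other = [0.2, 9.0, -3.0]   # same rate for neuron 0; the others changed arbitrarily
print(linear_rhs(v, M_diag, h)[0], linear_rhs(v_other, M_diag, h)[0])
```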
00:20:47.860 --> 00:20:50.420
OK, so let's write
this equation.
00:20:50.420 --> 00:20:53.420
And let's see what that
equation looks like.
00:20:53.420 --> 00:20:55.340
So we're going to rewrite
this a little bit.
00:20:55.340 --> 00:21:00.940
We're just going to factor
out the v sub a right here.
00:21:00.940 --> 00:21:05.574
This parameter,
1 minus lambda a,
00:21:05.574 --> 00:21:08.770
controls what kind of
solutions this equation has.
00:21:08.770 --> 00:21:11.793
And there are three different
cases that we need to consider.
00:21:11.793 --> 00:21:13.210
We need to consider
the case where
00:21:13.210 --> 00:21:17.350
1 minus lambda is greater
than zero, equal to zero,
00:21:17.350 --> 00:21:19.830
or less than zero.
00:21:19.830 --> 00:21:23.820
Those three different values of
that parameter 1 minus lambda
00:21:23.820 --> 00:21:26.963
give three different kinds of
solutions to this equation.
00:21:26.963 --> 00:21:28.380
We're going to
start with the case
00:21:28.380 --> 00:21:31.730
where lambda is less than one.
00:21:31.730 --> 00:21:35.020
And if lambda is less than
1, then this term right
00:21:35.020 --> 00:21:38.200
here is greater than zero.
00:21:38.200 --> 00:21:42.120
If we do that, then we
can rewrite this equation
00:21:42.120 --> 00:21:42.760
as follows.
00:21:42.760 --> 00:21:45.780
We're going to divide both
sides of this equation
00:21:45.780 --> 00:21:50.090
by 1 minus lambda, and
that's what we have here.
00:21:50.090 --> 00:21:52.970
And you can see that this
equation starts looking
00:21:52.970 --> 00:21:57.240
very familiar, very simple.
00:21:57.240 --> 00:22:00.560
We have a first order linear
differential equation, where
00:22:00.560 --> 00:22:04.790
we have a time constant here,
tau over 1 minus lambda,
00:22:04.790 --> 00:22:09.560
and a v infinity here, which is
the input, the effective input
00:22:09.560 --> 00:22:13.260
onto that neuron, divided
by 1 minus lambda.
00:22:13.260 --> 00:22:18.482
So that's tau dv dt equals
minus v plus v infinity.
00:22:21.260 --> 00:22:24.470
But now you can see
that the time constant
00:22:24.470 --> 00:22:28.380
and the v infinity
depend on lambda,
00:22:28.380 --> 00:22:35.450
depend on the strength of
that connection, all right?
00:22:35.450 --> 00:22:39.110
And the solution to that we've
seen before, to this equation.
00:22:39.110 --> 00:22:43.690
It's just exponential
relaxation toward v infinity.
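For lambda < 1, the solution with a step input h switched on at t = 0 and v(0) = 0 is v(t) = v_inf (1 - exp(-t / tau_eff)), where tau_eff = tau / (1 - lambda) and v_inf = h / (1 - lambda). The sketch below compares this closed form with a forward-Euler integration of tau dv/dt = -(1 - lambda) v + h. The helper names and the parameter values are illustrative assumptions.

```python
import math

def autapse_params(lam, h, tau):
    """Effective time constant and steady state for tau dv/dt = -(1 - lam) v + h."""
    if lam >= 1:
        raise ValueError("exponential relaxation to v_inf requires lambda < 1")
    return tau / (1 - lam), h / (1 - lam)

def autapse_response(t, lam, h, tau):
    """Closed-form v(t) for a step input h switched on at t = 0, with v(0) = 0."""
    tau_eff, v_inf = autapse_params(lam, h, tau)
    return v_inf * (1 - math.exp(-t / tau_eff))

def autapse_euler(t_end, lam, h, tau, dt=0.001):
    """Forward-Euler integration of the same equation, as a numerical check."""
    v = 0.0
    for _ in range(int(round(t_end / dt))):
        v += dt / tau * (-(1 - lam) * v + h)
    return v

# Hypothetical values: lambda = 0.5, h = 1, tau = 10, evaluated at t = 20.
print(autapse_response(20.0, 0.5, 1.0, 10.0), autapse_euler(20.0, 0.5, 1.0, 10.0))
```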
00:22:43.690 --> 00:22:45.490
OK, so here's our v infinity.
00:22:45.490 --> 00:22:47.170
There's our tau.
00:22:47.170 --> 00:22:51.730
This is true for the case
of lambda less than one, but
00:22:51.730 --> 00:22:55.210
let's just look at these
solutions for the case
00:22:55.210 --> 00:22:58.850
of lambda between zero and one.
00:22:58.850 --> 00:23:04.740
So I'm going to plot v as a
function of time when we have
00:23:04.740 --> 00:23:09.630
an input that goes from zero
and then steps up and then
00:23:09.630 --> 00:23:12.340
is held constant.
00:23:12.340 --> 00:23:15.180
All right, so let's look at
the case of lambda equals zero.
00:23:15.180 --> 00:23:18.540
So this lambda zero
means there's no autapse.
00:23:18.540 --> 00:23:21.280
It's just not connected.
00:23:21.280 --> 00:23:23.560
So you can see
that, in this case,
00:23:23.560 --> 00:23:24.970
the solution is very simple.
00:23:24.970 --> 00:23:29.100
It's just exponential relaxation
toward v infinity. v infinity
00:23:29.100 --> 00:23:35.880
is just given by h, the
input, and tau is just
00:23:35.880 --> 00:23:39.540
the original tau
over 1 minus 0, right?
00:23:39.540 --> 00:23:43.638
So it's just exponential
relaxation to h.
00:23:46.830 --> 00:23:47.810
That make sense?
00:23:51.100 --> 00:23:56.990
And it relaxes with a
time constant tau, tau m.
00:23:56.990 --> 00:23:59.480
We're going to now turn up
the synapse a little bit
00:23:59.480 --> 00:24:04.250
so that it has a
little bit of strength.
00:24:04.250 --> 00:24:08.200
You see that what happens
when lambda is 0.5,
00:24:08.200 --> 00:24:10.910
that v infinity gets bigger.
00:24:10.910 --> 00:24:12.400
v infinity goes to 2h.
00:24:12.400 --> 00:24:12.900
Why?
00:24:12.900 --> 00:24:16.010
Because it's h divided
by 1 minus 0.5.
00:24:16.010 --> 00:24:19.630
So it's h over 0.5, so 2h.
00:24:19.630 --> 00:24:21.310
And what happens to
the time constant?
00:24:21.310 --> 00:24:25.710
Well, it becomes two tau.
00:24:25.710 --> 00:24:28.800
All right, and if we make
lambda equal to 0.3--
00:24:28.800 --> 00:24:29.930
sorry, 0.66.
00:24:29.930 --> 00:24:31.340
We turn it up a little bit.
00:24:31.340 --> 00:24:36.600
You can see that the response
of this neuron gets even bigger.
00:24:36.600 --> 00:24:38.480
So you can see that
what's happening
00:24:38.480 --> 00:24:42.890
is that when we start
letting this neuron feed back
00:24:42.890 --> 00:24:47.970
to itself, positive feedback,
the response of the neuron
00:24:47.970 --> 00:24:51.020
to a fixed input--
00:24:51.020 --> 00:24:52.680
the input is the same
for all of those.
00:24:52.680 --> 00:24:55.380
The response of the
neuron gets bigger.
00:24:55.380 --> 00:24:59.130
And so having positive feedback
of that neuron onto itself
00:24:59.130 --> 00:25:02.130
through an autapse just
amplifies the response
00:25:02.130 --> 00:25:03.480
of this neuron to its input.
00:25:09.080 --> 00:25:11.930
Now, let's consider
the case where--
00:25:11.930 --> 00:25:14.630
so positive feedback
amplifies the response.
00:25:14.630 --> 00:25:16.190
And what else does it do?
00:25:16.190 --> 00:25:18.530
It slows the response down.
00:25:18.530 --> 00:25:21.590
The time constants are
getting longer, which
00:25:21.590 --> 00:25:23.570
means the response is slower.
00:25:27.305 --> 00:25:30.320
All right, let's look
at what happens when
00:25:30.320 --> 00:25:32.960
the lambdas are less than zero.
00:25:32.960 --> 00:25:37.085
What does lambda less than
zero correspond to here?
00:25:37.085 --> 00:25:37.960
AUDIENCE: [INAUDIBLE]
00:25:37.960 --> 00:25:41.470
MICHALE FEE: Yeah, which
is, in neurons, what
00:25:41.470 --> 00:25:43.294
does that correspond to?
00:25:43.294 --> 00:25:44.800
AUDIENCE: [INAUDIBLE]
00:25:44.800 --> 00:25:46.100
MICHALE FEE: Inhibition.
00:25:46.100 --> 00:25:48.685
So this neuron, when
you put an input in,
00:25:48.685 --> 00:25:50.980
it tries to activate the neuron.
00:25:50.980 --> 00:25:52.955
But that neuron inhibits itself.
00:25:52.955 --> 00:25:54.580
So what do you think's
going to happen?
00:25:54.580 --> 00:25:58.120
So positive feedback
made the response bigger.
00:25:58.120 --> 00:26:01.130
Here, the neuron is kind
of inhibiting itself.
00:26:01.130 --> 00:26:02.960
So what's going to happen?
00:26:02.960 --> 00:26:05.620
You put in that same
h that we had before,
00:26:05.620 --> 00:26:09.612
what's going to happen
when we have inhibition?
00:26:09.612 --> 00:26:11.070
AUDIENCE: Response
is [INAUDIBLE].
00:26:11.070 --> 00:26:12.000
MICHALE FEE: What's that?
00:26:12.000 --> 00:26:12.900
AUDIENCE: The response
is going to be smaller.
00:26:12.900 --> 00:26:15.060
MICHALE FEE: The response will
just be smaller, that's right.
00:26:15.060 --> 00:26:16.030
So let's look at that.
00:26:16.030 --> 00:26:17.850
So here's the firing
rate of this neuron
00:26:17.850 --> 00:26:20.820
as a function of time
for a step input.
00:26:20.820 --> 00:26:23.070
You can see for a
lambda equals zero,
00:26:23.070 --> 00:26:25.305
we're going to respond
with an amount h.
00:26:27.890 --> 00:26:29.590
But if we put in--
00:26:29.590 --> 00:26:30.920
with a time constant tau.
00:26:30.920 --> 00:26:33.740
If we put in a lambda
of negative one--
00:26:33.740 --> 00:26:36.080
that means you put
this input in--
00:26:36.080 --> 00:26:39.280
that neuron starts
inhibiting itself,
00:26:39.280 --> 00:26:42.130
and you can see the
response is smaller.
00:26:42.130 --> 00:26:44.290
But another thing
that's really interesting
00:26:44.290 --> 00:26:46.690
is that you can see that
the response of the neuron
00:26:46.690 --> 00:26:48.190
is actually faster.
00:26:52.100 --> 00:26:55.645
So if the feedback-- if
the lambda is minus one,
00:26:55.645 --> 00:27:00.350
you can see that v infinity
is h over 1 minus negative 1.
00:27:00.350 --> 00:27:02.860
So it's h over 2.
00:27:02.860 --> 00:27:03.760
All right, and so on.
00:27:03.760 --> 00:27:06.430
The more we turn up that
inhibition, the more
00:27:06.430 --> 00:27:09.070
suppressed the
neuron is, the weaker
00:27:09.070 --> 00:27:11.420
the response of that
neuron to its input,
00:27:11.420 --> 00:27:14.110
but the faster it is.
00:27:14.110 --> 00:27:17.500
So negative feedback suppresses
the response of the neuron
00:27:17.500 --> 00:27:19.300
and speeds up the response.
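To check this numerically, here is a rough forward-Euler simulation. It is an illustrative sketch: the step size, the time constant, and the lambda values 0, -1, and -3 are arbitrary choices. It measures the final response and the time to reach about 63% of it; stronger negative feedback should give a smaller and faster response, with v_inf = h / (1 - lambda) and tau_eff = tau / (1 - lambda).

```python
import numpy as np

def simulate_autapse(lam, h=1.0, tau=10.0, dt=0.01, t_max=100.0):
    """Forward-Euler integration of tau dv/dt = -v + lam*v + h from v(0) = 0."""
    n = int(round(t_max / dt))
    t = np.arange(n + 1) * dt
    v = np.zeros(n + 1)
    for k in range(n):
        v[k + 1] = v[k] + dt / tau * (-v[k] + lam * v[k] + h)
    return t, v

def rise_time(t, v):
    """First time at which v reaches 1 - 1/e (about 63%) of its final value."""
    target = (1.0 - np.exp(-1.0)) * v[-1]
    return t[np.argmax(v >= target)]

results = {}
for lam in (0.0, -1.0, -3.0):
    t, v = simulate_autapse(lam)
    results[lam] = (v[-1], rise_time(t, v))
    print(f"lambda = {lam:+.0f}: final v = {v[-1]:.3f}, rise time = {results[lam][1]:.2f}")
```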
00:27:26.708 --> 00:27:28.750
OK, now, there's one other
really important thing
00:27:28.750 --> 00:27:32.860
about recurrent networks
in this regime, where
00:27:32.860 --> 00:27:36.610
this lambda is less than one.
00:27:36.610 --> 00:27:39.610
And that is that
the activity always
00:27:39.610 --> 00:27:43.080
relaxes back to zero when
you turn the input off.
00:27:43.080 --> 00:27:46.660
OK, so you put a step
input in, the neuron
00:27:46.660 --> 00:27:50.320
responds, relaxing exponentially
toward v infinity.
00:27:50.320 --> 00:27:53.820
But when you turn the
input off, the network
00:27:53.820 --> 00:27:56.040
relaxes back to zero, OK?
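A small sketch of the input switching on and then off, assuming the same autapse equation with arbitrary illustrative parameters. Only the lambda < 1 regime is simulated: in each case the activity settles near h / (1 - lambda) while the input is on and decays back toward zero once it is removed.

```python
import numpy as np

def simulate_on_off(lam, h=1.0, tau=10.0, dt=0.01, t_off=100.0, t_max=300.0):
    """Forward-Euler integration of tau dv/dt = -v + lam*v + h(t), where h(t) = h
    for t < t_off and 0 afterwards; starts from v(0) = 0."""
    n = int(round(t_max / dt))
    k_off = int(round(t_off / dt))
    v = np.zeros(n + 1)
    for k in range(n):
        h_k = h if k < k_off else 0.0      # step input, switched off at t_off
        v[k + 1] = v[k] + dt / tau * (-v[k] + lam * v[k] + h_k)
    return v, k_off

results = {}
for lam in (0.5, 0.0, -1.0):
    v, k_off = simulate_on_off(lam)
    results[lam] = (v[k_off], v[-1])
    print(f"lambda = {lam:+.1f}: v when input turns off = {v[k_off]:.3f}, v at end = {v[-1]:.1e}")
```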
00:28:05.760 --> 00:28:10.080
So now let's go to
the more general case
00:28:10.080 --> 00:28:12.020
of recurrent connections.
00:28:12.020 --> 00:28:13.890
Oh, and first, I
just want to show you
00:28:13.890 --> 00:28:19.870
how we actually show graphically
how a neuron responds--
00:28:19.870 --> 00:28:22.820
sorry, how one of
these networks responds.
00:28:22.820 --> 00:28:25.050
And a typical way
that we do that is we
00:28:25.050 --> 00:28:29.430
plot the firing rate of one
neuron versus the firing
00:28:29.430 --> 00:28:31.260
rate of another neuron.
00:28:31.260 --> 00:28:34.510
That's called a
state-space trajectory.
00:28:34.510 --> 00:28:38.820
And we plot that response
as a function of time
00:28:38.820 --> 00:28:40.760
after we put in an input.
00:28:40.760 --> 00:28:44.100
So we can put an input in
described as some vector.
00:28:44.100 --> 00:28:47.670
So we put in some h1
and h2, and we then
00:28:47.670 --> 00:28:50.430
plot the response
of the neuron--
00:28:50.430 --> 00:28:54.750
the response of the network
in this output state space.
00:28:54.750 --> 00:28:57.400
So let me show you an example
of what that looks like.
00:28:57.400 --> 00:29:04.170
So here is the output
of this little network
00:29:04.170 --> 00:29:06.660
for different kinds of inputs.
00:29:06.660 --> 00:29:09.210
So Daniel made this nice
little movie for us.
00:29:12.170 --> 00:29:16.250
Here, you can see that if you
put an input into neuron one,
00:29:16.250 --> 00:29:17.420
neuron one responds.
00:29:17.420 --> 00:29:20.180
If you put a negative
input into neuron one,
00:29:20.180 --> 00:29:21.700
the neuron goes negative.
00:29:21.700 --> 00:29:25.140
If you put an input into neuron
two, the neuron responds.
00:29:25.140 --> 00:29:29.640
And if you put a negative input
into neuron two, it responds.
00:29:29.640 --> 00:29:33.680
Now, why did it respond
bigger in this direction than
00:29:33.680 --> 00:29:34.978
in this direction?
00:29:39.370 --> 00:29:42.320
AUDIENCE: That's [INAUDIBLE].
00:29:42.320 --> 00:29:43.070
MICHALE FEE: Good.
00:29:43.070 --> 00:29:47.858
Because neuron one had--
00:29:47.858 --> 00:29:48.830
AUDIENCE: Positive?
00:29:48.830 --> 00:29:51.000
MICHALE FEE: Positive feedback.
00:29:51.000 --> 00:29:53.330
And neuron two had
negative feedback.
00:29:53.330 --> 00:29:59.600
So neuron one, this neuron
one, amplified its input
00:29:59.600 --> 00:30:01.100
and gave a big response.
00:30:01.100 --> 00:30:05.400
Neuron two suppressed the
response to its input,
00:30:05.400 --> 00:30:06.650
and so it had a weak response.
00:30:12.750 --> 00:30:14.390
Let's look at another
interesting case.
00:30:14.390 --> 00:30:17.180
Let's put an input
into these neurons--
00:30:17.180 --> 00:30:20.090
not one at a time,
but simultaneously.
00:30:24.450 --> 00:30:28.440
So now we're going to put an
input into both neurons one
00:30:28.440 --> 00:30:29.760
and two simultaneously.
00:30:37.570 --> 00:30:38.550
It's like Spirograph.
00:30:38.550 --> 00:30:44.430
Did you guys play
with Spirograph?
00:30:44.430 --> 00:30:45.570
It's kind of weird, right?
00:30:45.570 --> 00:30:47.940
It's like making little
butterflies for spring.
00:30:52.140 --> 00:30:53.850
So why does the output--
00:30:53.850 --> 00:30:56.100
why does the response
of this network
00:30:56.100 --> 00:31:01.530
to an input, positive input to
both h1 and h2, look like this?
00:31:01.530 --> 00:31:04.680
Let's just break this down into
one of these little branches.
00:31:04.680 --> 00:31:05.910
We start at zero.
00:31:05.910 --> 00:31:09.420
We put an input into h1
and h2, and the response
00:31:09.420 --> 00:31:14.930
goes quickly like this and
then relaxes up to here.
00:31:14.930 --> 00:31:16.702
So why is that?
00:31:16.702 --> 00:31:18.190
Lena?
00:31:18.190 --> 00:31:23.646
AUDIENCE: [INAUDIBLE] so
there was [INAUDIBLE] and then
00:31:23.646 --> 00:31:27.012
because it's negative,
it's shorter.
00:31:27.012 --> 00:31:27.720
MICHALE FEE: Yup.
00:31:27.720 --> 00:31:30.965
The response in the v2
direction is weak but fast.
00:31:30.965 --> 00:31:31.590
AUDIENCE: Yeah.
00:31:31.590 --> 00:31:34.570
MICHALE FEE: So it
goes up quickly.
00:31:34.570 --> 00:31:37.680
And then the response
in the v1 direction is?
00:31:37.680 --> 00:31:39.060
AUDIENCE: Slow, but [INAUDIBLE].
00:31:39.060 --> 00:31:39.810
MICHALE FEE: Good.
00:31:39.810 --> 00:31:41.360
That's it.
00:31:41.360 --> 00:31:43.090
It's slow, but [AUDIO OUT].
00:31:43.090 --> 00:31:46.150
It's amplified in this
direction, suppressed
00:31:46.150 --> 00:31:46.900
in this direction.
00:31:46.900 --> 00:31:49.630
But the response is fast
this way and slow this way.
00:31:49.630 --> 00:31:51.310
So it traces this out.
00:31:51.310 --> 00:31:56.420
Now, when you turn the input
off, again, it relaxes.
00:31:56.420 --> 00:32:02.100
v2 relaxes quickly back to
zero, and v1 relaxes slowly
00:32:02.100 --> 00:32:02.800
back to zero.
00:32:02.800 --> 00:32:06.900
So it kind of traces out
this kind of hysteretic loop.
00:32:10.400 --> 00:32:13.900
It's not really hysteresis.
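The loop can be reproduced with a small sketch: two neurons that each have only an autapse, with lambda1 = 0.5 for neuron one (positive feedback) and lambda2 = -1 for neuron two (negative feedback). These values, the time constant, and the input timing are assumptions chosen for illustration, not the parameters used in the movie. Plotting v[:, 0] against v[:, 1] (for example with matplotlib) would trace out one branch of the loop.

```python
import numpy as np

def simulate_autapse_pair(lams, h, tau=10.0, dt=0.01, t_on=150.0, t_max=300.0):
    """Two neurons with autapses only: tau dv/dt = -v + diag(lams) v + h(t).
    The input h is applied for t < t_on and removed afterwards."""
    lams = np.asarray(lams, dtype=float)
    h = np.asarray(h, dtype=float)
    n = int(round(t_max / dt))
    k_on = int(round(t_on / dt))
    v = np.zeros((n + 1, 2))            # row k is the state-space point (v1, v2) at time k*dt
    for k in range(n):
        h_k = h if k < k_on else np.zeros(2)
        v[k + 1] = v[k] + dt / tau * (-v[k] + lams * v[k] + h_k)
    return v, k_on

lams = (0.5, -1.0)                      # assumed: neuron 1 excites itself, neuron 2 inhibits itself
v, k_on = simulate_autapse_pair(lams, h=(1.0, 1.0))
v_inf = 1.0 / (1.0 - np.array(lams))    # steady state for h = (1, 1)

k10 = int(round(10.0 / 0.01))           # 10 time units after a switch
frac_on = v[k10] / v_inf                # fraction of steady state reached after input on
frac_off = v[k_on + k10] / v[k_on]      # fraction remaining after input off
print("fraction reached 10 units after input on :", frac_on)
print("fraction left 10 units after input off  :", frac_off)
```

Neuron two (fast, weak) gets most of the way to its steady state and back before neuron one (slow, amplified) does, which is why the path out differs from the path back.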
00:32:13.900 --> 00:32:15.820
Then it's exactly
the mirror image when
00:32:15.820 --> 00:32:17.710
you put in a negative input.
00:32:17.710 --> 00:32:24.610
And when you put in h1
positive and h2 negative,
00:32:24.610 --> 00:32:28.210
it just looks like
a mirror image.
00:32:28.210 --> 00:32:30.470
All right, so any
questions about that?
00:32:30.470 --> 00:32:31.218
Yes, Lena?
00:32:31.218 --> 00:32:34.086
AUDIENCE: If there was nothing,
like no kind of amplified
00:32:34.086 --> 00:32:37.440
or [INAUDIBLE], would it
just be like a [INAUDIBLE]?
00:32:37.440 --> 00:32:39.200
MICHALE FEE: Yeah,
so if you took out
00:32:39.200 --> 00:32:42.964
the recurrent connections, what
would it look like?
00:32:42.964 --> 00:32:44.200
AUDIENCE: An x?
00:32:44.200 --> 00:32:45.690
MICHALE FEE: Yeah, the output--
00:32:45.690 --> 00:32:50.060
so let's say that you just
literally set those to zero.
00:32:50.060 --> 00:32:58.130
Then the gain matrix will be
the identity matrix, right?
00:32:58.130 --> 00:33:00.570
You get the output as
a function of input.
00:33:00.570 --> 00:33:02.130
Let's just go back
to the equation.
00:33:02.130 --> 00:33:03.650
You can always, always
get the answer
00:33:03.650 --> 00:33:04.820
by looking at the equation.
00:33:10.330 --> 00:33:13.000
Too many animations.
00:33:13.000 --> 00:33:14.340
No, it's a very good question.
00:33:14.340 --> 00:33:14.910
Here we go.
00:33:14.910 --> 00:33:16.430
There it is right there.
00:33:16.430 --> 00:33:20.190
So you're asking about-- let's
just ask about the steady state
00:33:20.190 --> 00:33:21.080
response.
00:33:21.080 --> 00:33:23.540
So we can set dv
dt equal to zero.
00:33:23.540 --> 00:33:26.540
And you're asking, what is v?
00:33:26.540 --> 00:33:31.200
And you're saying, let's
set lambda to zero, right?
00:33:31.200 --> 00:33:35.130
We're going to set all these
diagonal elements to zero.
00:33:35.130 --> 00:33:37.740
And so now v equals h.
00:33:47.940 --> 00:33:49.350
OK, great question.
00:33:49.350 --> 00:33:54.390
Now, let's go to the case
of fully recurrent networks.
00:33:54.390 --> 00:33:57.330
We've been working with
this simplified case of just
00:33:57.330 --> 00:34:00.350
having neurons have autapses.
00:34:00.350 --> 00:34:03.290
And the reason we've been doing
that is because the answer
00:34:03.290 --> 00:34:06.380
you get for the autapse
kind of captures
00:34:06.380 --> 00:34:09.080
almost all the intuition
that you need to have.
00:34:09.080 --> 00:34:10.820
What we're going to
do is we're going
00:34:10.820 --> 00:34:14.270
to take a fully
recurrent neural network,
00:34:14.270 --> 00:34:17.150
and we're going to do a
mathematical trick that
00:34:17.150 --> 00:34:19.310
just turns it into
an autapse network.
00:34:22.340 --> 00:34:25.280
And the answer for the
fully recurrent network
00:34:25.280 --> 00:34:30.113
is just going to be just as
simple as what you saw here.
00:34:30.113 --> 00:34:31.280
All right, so let's do that.
00:34:31.280 --> 00:34:33.620
Let's take this fully
recurrent network.
00:34:33.620 --> 00:34:36.980
Our weight matrix m now,
instead of just having
00:34:36.980 --> 00:34:39.935
diagonal elements, also
has off-diagonal elements.
00:34:42.820 --> 00:34:44.820
And I'll say that one
of the things that we're
00:34:44.820 --> 00:34:47.580
going to do today is just
consider the simplest
00:34:47.580 --> 00:34:51.239
case of this fully
recurrent network, where
00:34:51.239 --> 00:34:55.889
the connections are symmetric,
where a connection from v1
00:34:55.889 --> 00:35:00.180
to v2 is equal to the connection
from v2 to v1, all right?
00:35:00.180 --> 00:35:04.050
We're going to do that
because that's the next thing
00:35:04.050 --> 00:35:06.150
to do to build our
intuition, and it's
00:35:06.150 --> 00:35:12.360
also mathematically simpler
than the fully general case, OK?
00:35:12.360 --> 00:35:15.390
So we saw how the
behavior of this network
00:35:15.390 --> 00:35:17.610
is very simple if m is diagonal.
00:35:20.273 --> 00:35:21.690
So what we're going
to do is we're
00:35:21.690 --> 00:35:26.030
going to take this
arbitrary matrix m,
00:35:26.030 --> 00:35:28.760
and we're going to
just make it diagonal.
00:35:28.760 --> 00:35:31.140
So let's do that.
00:35:31.140 --> 00:35:35.720
So we're going to rewrite
our weight matrix m as--
00:35:35.720 --> 00:35:46.910
so we're going to rewrite m
in this form, where this phi--
00:35:46.910 --> 00:35:52.210
sorry, where this lambda
is a diagonal matrix.
00:35:52.210 --> 00:35:54.340
So we're going to
take this network
00:35:54.340 --> 00:35:58.200
with recurrent connections
between different neurons
00:35:58.200 --> 00:36:03.780
in the network, and we're
going to transform it
00:36:03.780 --> 00:36:07.325
into sort of an equivalent
network that just has autapses.
00:36:11.310 --> 00:36:13.950
So how do we write
m in this form,
00:36:13.950 --> 00:36:17.310
with a rotation matrix
times a diagonal matrix
00:36:17.310 --> 00:36:19.870
times a rotation matrix?
00:36:19.870 --> 00:36:26.170
We just solve this
eigenvalue equation, OK?
00:36:26.170 --> 00:36:27.620
Does that make sense?
00:36:27.620 --> 00:36:29.630
We're just going to do
exactly the same thing
00:36:29.630 --> 00:36:36.570
we did in PCA, where we
found the covariance matrix.
00:36:36.570 --> 00:36:39.870
And we rewrote the
covariance matrix like this.
00:36:39.870 --> 00:36:42.300
Now we're going to
take a weight matrix
00:36:42.300 --> 00:36:46.830
of this recurrent
network, and we're
00:36:46.830 --> 00:36:51.090
going to rewrite it in
exactly the same way.
00:36:51.090 --> 00:36:55.110
So that process is called
diagonalizing the weight
00:36:55.110 --> 00:36:57.860
matrix.
00:36:57.860 --> 00:37:04.040
So the diagonal elements of lambda
here are the eigenvalues of m.
00:37:06.740 --> 00:37:12.320
And the columns of phi
are the eigenvectors of m.
00:37:15.606 --> 00:37:22.590
And we're going to use these
quantities, these elements,
00:37:22.590 --> 00:37:27.450
to build a new network that
has the same properties
00:37:27.450 --> 00:37:32.400
as our recurrent network.
00:37:32.400 --> 00:37:34.740
So let me just show
you how we do that.
00:37:34.740 --> 00:37:38.190
So remember that what
this eigenvalue--
00:37:38.190 --> 00:37:43.470
this is an eigenvalue equation
written in matrix notation.
00:37:43.470 --> 00:37:51.340
What this means is this is a set
of eigenvalue equations that
00:37:51.340 --> 00:37:54.888
have-- it's a set of
n eigenvalue equations
00:37:54.888 --> 00:37:56.430
like this, where
there's one of these
00:37:56.430 --> 00:37:58.280
for each neuron in the network.
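In symbols (a summary in standard notation, writing f_alpha for the columns of phi and lambda_alpha for the diagonal entries of lambda), the matrix equation and the n separate eigenvalue equations it stands for are as follows. The last form, M = Phi Lambda Phi^T, uses Phi^T = Phi^{-1}, which holds for the symmetric M considered below.

```latex
M\Phi = \Phi\Lambda
\quad\Longleftrightarrow\quad
M f_\alpha = \lambda_\alpha f_\alpha, \quad \alpha = 1, \dots, n,
\qquad
\Phi = \begin{pmatrix} f_1 & f_2 & \cdots & f_n \end{pmatrix},
\quad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n),
\qquad
M = \Phi \Lambda \Phi^{T}
```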
00:37:58.280 --> 00:38:00.390
OK, so let me just
go through that.
00:38:00.390 --> 00:38:02.340
OK, so here's the
eigenvalue equation.
00:38:02.340 --> 00:38:08.180
If M is a symmetric matrix,
then the eigenvalues are real
00:38:08.180 --> 00:38:10.440
and phi is a rotation matrix.
00:38:10.440 --> 00:38:14.150
And the eigenvectors give us
an orthogonal basis, all right?
00:38:14.150 --> 00:38:16.450
So everybody remember this
from a few lectures ago?
00:38:19.090 --> 00:38:21.070
If M is symmetric--
and this is why
00:38:21.070 --> 00:38:23.420
we're going to, at
this point on, consider
00:38:23.420 --> 00:38:26.390
just the case where
M is symmetric,
00:38:26.390 --> 00:38:30.970
then the eigenvectors, the
columns of that matrix phi,
00:38:30.970 --> 00:38:37.600
give us an orthogonal set of
vectors, and they're unit vectors.
00:38:37.600 --> 00:38:41.710
So it satisfies this
orthonormal condition.
00:38:41.710 --> 00:38:45.010
And phi transpose phi is
an identity matrix, which
00:38:45.010 --> 00:38:48.670
means phi is a rotation matrix.
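A quick numerical check of these properties, as a sketch: NumPy's numpy.linalg.eigh plays the role of MATLAB's eig for a symmetric matrix, returning real eigenvalues in ascending order and a matrix whose columns are orthonormal eigenvectors. The random symmetric matrix is purely a hypothetical example. Strictly speaking, phi is an orthogonal matrix; its determinant can come out as -1, which adds a reflection, but it still preserves lengths and angles.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
M = (A + A.T) / 2                        # hypothetical symmetric weight matrix

lam, Phi = np.linalg.eigh(M)             # real eigenvalues (ascending), eigenvectors as columns
Lam = np.diag(lam)

eig_eq = np.allclose(M @ Phi, Phi @ Lam)           # M Phi = Phi Lambda
orthonormal = np.allclose(Phi.T @ Phi, np.eye(4))  # Phi^T Phi = I
reconstructs = np.allclose(Phi @ Lam @ Phi.T, M)   # M = Phi Lambda Phi^T

x = rng.normal(size=4)
length_kept = np.isclose(np.linalg.norm(Phi.T @ x), np.linalg.norm(x))  # lengths preserved
print(lam, eig_eq, orthonormal, reconstructs, length_kept)
```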
00:38:48.670 --> 00:38:51.670
OK, so now what we're
going to do is rewrite.
00:38:51.670 --> 00:38:53.950
The first thing we're going
to do to use this trick
00:38:53.950 --> 00:38:57.220
to rewrite our
matrix, our network,
00:38:57.220 --> 00:39:01.000
is to rewrite the
vector of firing rates v
00:39:01.000 --> 00:39:01.950
in this new basis.
00:39:01.950 --> 00:39:02.950
What are we going to do?
00:39:02.950 --> 00:39:07.120
We'll take the vector, and
all we're going to do
00:39:07.120 --> 00:39:11.170
is to rewrite that vector
in this new basis set.
00:39:11.170 --> 00:39:14.710
We're just going to do a change
of basis of our firing rate
00:39:14.710 --> 00:39:17.950
vector into a new
basis set that's
00:39:17.950 --> 00:39:20.170
given by the columns of phi.
00:39:23.507 --> 00:39:25.090
Another way of saying
it is that we're
00:39:25.090 --> 00:39:30.190
going to rotate this firing rate
vector v using the phi rotation
00:39:30.190 --> 00:39:32.170
matrix.
00:39:32.170 --> 00:39:35.620
So we're going to project v
onto each one of those new basis
00:39:35.620 --> 00:39:36.170
vectors.
00:39:36.170 --> 00:39:39.280
So there's v in
the standard basis.
00:39:39.280 --> 00:39:42.100
There's our new
basis, f1 and f2.
00:39:42.100 --> 00:39:45.160
We're going to project
v onto f1 and f2
00:39:45.160 --> 00:39:52.300
and write down that scalar
projection, c1 and c2.
00:39:52.300 --> 00:39:56.230
So we're going to write down
the scalar projection of v
00:39:56.230 --> 00:39:59.270
onto each one of
those basis vectors.
00:39:59.270 --> 00:40:01.870
So we can write
that c sub alpha--
00:40:01.870 --> 00:40:04.040
that's the alpha-th component--
00:40:04.040 --> 00:40:13.180
is just v dot the
alpha-th basis vector.
00:40:13.180 --> 00:40:16.510
So now we can express v
as a linear combination
00:40:16.510 --> 00:40:18.010
in this new basis.
00:40:21.240 --> 00:40:26.050
So it's c1 times f1 plus
c2 times f2 plus c3--
00:40:26.050 --> 00:40:27.870
that's supposed to be a three--
00:40:27.870 --> 00:40:29.360
times f3 and so on.
00:40:32.860 --> 00:40:35.980
And of course, remember,
we're doing all of this
00:40:35.980 --> 00:40:38.570
because we want to
understand the dynamics.
00:40:38.570 --> 00:40:40.910
So these things
are time dependent.
00:40:40.910 --> 00:40:45.100
So v changes in time.
00:40:45.100 --> 00:40:48.290
We're not going to be changing
our basis vectors in time.
00:40:48.290 --> 00:40:50.260
So if we want to write
down a time dependent v,
00:40:50.260 --> 00:40:51.940
it's really these
coefficients that
00:40:51.940 --> 00:40:56.445
are changing in time, right?
00:40:56.445 --> 00:40:59.170
Does that make sense?
00:40:59.170 --> 00:41:03.742
So we can now write our vector
v, our firing rate vector,
00:41:03.742 --> 00:41:09.750
as a sum of contributions in
all these different directions
00:41:09.750 --> 00:41:11.070
corresponding to the new basis.
00:41:14.280 --> 00:41:16.170
And each one of
those coefficients c
00:41:16.170 --> 00:41:20.680
is just the time-dependent
v projected onto one
00:41:20.680 --> 00:41:21.940
of those basis vectors.
00:41:28.450 --> 00:41:30.320
Any questions?
00:41:30.320 --> 00:41:32.050
No?
00:41:32.050 --> 00:41:33.410
OK.
00:41:33.410 --> 00:41:39.140
And remember, we can write
that in matrix notation using
00:41:39.140 --> 00:41:42.490
this formalism that we developed
in the lecture on basis sets.
00:41:42.490 --> 00:41:47.570
So v is just phi c, and c
is just phi transpose v.
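The change of basis fits in a few lines of NumPy. This is a sketch with a hypothetical 2-by-2 symmetric weight matrix and a hypothetical firing-rate vector; phi comes from numpy.linalg.eigh.

```python
import numpy as np

M = np.array([[1.0, 2.0],
              [2.0, -1.0]])           # hypothetical symmetric weight matrix
lam, Phi = np.linalg.eigh(M)          # columns of Phi are the basis vectors f_1, f_2

v = np.array([3.0, 1.0])              # hypothetical firing-rate vector in the neuron basis
c = Phi.T @ v                         # c = Phi^T v: scalar projections onto the new basis
c_by_dot = np.array([v @ Phi[:, a] for a in range(2)])  # same thing, c_alpha = v . f_alpha
v_back = Phi @ c                      # v = Phi c = c_1 f_1 + c_2 f_2

print(c, np.allclose(c, c_by_dot), np.allclose(v_back, v))
```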
00:41:47.570 --> 00:41:49.640
So we're just taking
this vector v,
00:41:49.640 --> 00:41:52.220
and we're rotating it
into a new basis set,
00:41:52.220 --> 00:41:53.720
and we can rotate it back.
00:41:56.433 --> 00:41:58.100
All right, so now
what we're going to do
00:41:58.100 --> 00:42:03.390
is we're going to take this v
expressed in this new basis set
00:42:03.390 --> 00:42:08.870
and we're going to rewrite our
equation in that new basis set.
00:42:11.770 --> 00:42:12.500
Watch this.
00:42:12.500 --> 00:42:14.560
This is so cool.
00:42:14.560 --> 00:42:16.145
All right, you ready?
00:42:16.145 --> 00:42:18.520
We're going to take this, and
we're going to plug it in here.
00:42:22.140 --> 00:42:28.170
So dv dt is phi dc dt.
00:42:28.170 --> 00:42:31.786
v is just phi c.
00:42:31.786 --> 00:42:36.550
M v is M phi c, and
h doesn't change.
00:42:36.550 --> 00:42:40.645
So now what is that?
00:42:45.270 --> 00:42:48.215
Do you remember?
00:42:48.215 --> 00:42:49.610
AUDIENCE: Phi [INAUDIBLE].
00:42:49.610 --> 00:42:50.600
MICHALE FEE: Right.
00:42:50.600 --> 00:42:57.440
We got phi as the solution
to the eigenvalue equation.
00:42:57.440 --> 00:43:00.010
What was the
eigenvalue equation?
00:43:00.010 --> 00:43:06.150
The eigenvalue equation was
m phi equals phi lambda.
00:43:06.150 --> 00:43:09.870
So the phi here,
this rotation matrix,
00:43:09.870 --> 00:43:13.980
is the solution to this
equation, all right?
00:43:13.980 --> 00:43:18.870
So we're given m,
and we're saying
00:43:18.870 --> 00:43:20.950
we're going to find
a phi and a lambda
00:43:20.950 --> 00:43:26.110
such that we can write m
phi is equal to phi lambda.
00:43:26.110 --> 00:43:32.220
So when we take that matrix m
and we run eig on it in Matlab,
00:43:32.220 --> 00:43:37.150
Matlab sends us back a phi and
a lambda such that this equation
00:43:37.150 --> 00:43:37.650
is true.
00:43:41.120 --> 00:43:43.805
So literally, we can
take the weight matrix
00:43:43.805 --> 00:43:47.860
m, stick it into Matlab,
and get a phi and a lambda
00:43:47.860 --> 00:43:51.790
such that m phi is
equal to phi lambda.
00:43:51.790 --> 00:43:59.020
So m phi is equal to what?
00:43:59.020 --> 00:43:59.800
Phi lambda.
00:44:04.020 --> 00:44:06.810
That becomes this.
00:44:06.810 --> 00:44:11.350
Now, all of a sudden, this
thing is just going to simplify.
00:44:14.480 --> 00:44:16.265
So how would we
simplify this equation?
00:44:19.270 --> 00:44:23.080
We can get rid of all of these
things, all of these phi's,
00:44:23.080 --> 00:44:24.220
by doing what?
00:44:24.220 --> 00:44:25.555
How do you get rid of phi's?
00:44:25.555 --> 00:44:27.430
AUDIENCE: Multiply
[INAUDIBLE] phi transpose.
00:44:27.430 --> 00:44:29.710
MICHALE FEE: You multiply
by phi transpose, exactly.
00:44:29.710 --> 00:44:32.470
So we're going to multiply
each term in this equation
00:44:32.470 --> 00:44:35.760
by phi transpose.
00:44:35.760 --> 00:44:37.860
So what do you have?
00:44:37.860 --> 00:44:42.990
Phi transpose phi, phi transpose
phi, phi transpose phi.
00:44:42.990 --> 00:44:46.780
What is phi transpose
phi equal to?
00:44:46.780 --> 00:44:48.700
The identity matrix.
00:44:48.700 --> 00:44:51.220
Because it's a rotation
matrix, phi transpose
00:44:51.220 --> 00:44:54.730
is just the inverse of phi.
00:44:54.730 --> 00:44:58.550
So phi inverse phi is just
equal to the identity matrix.
00:44:58.550 --> 00:45:00.680
And all those things disappear.
00:45:00.680 --> 00:45:02.800
And you're left
with this equation--
00:45:02.800 --> 00:45:09.370
tau dc dt equals minus c
plus lambda c plus hf.
00:45:09.370 --> 00:45:10.450
And what is hf?
00:45:10.450 --> 00:45:13.580
hf is just h rotated
into the new basis set.
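Written out step by step (a reconstruction of the spoken derivation, assuming the network equation tau dv/dt = -v + Mv + h used throughout), the substitution v = Phi c and the multiplication by Phi transpose go like this:

```latex
\begin{aligned}
\tau \frac{dv}{dt} &= -v + M v + h \\
\tau \Phi \frac{dc}{dt} &= -\Phi c + M \Phi c + h
  && \text{(substitute } v = \Phi c\text{)} \\
\tau \Phi \frac{dc}{dt} &= -\Phi c + \Phi \Lambda c + h
  && \text{(use } M\Phi = \Phi\Lambda\text{)} \\
\tau \frac{dc}{dt} &= -c + \Lambda c + h_f, \qquad h_f = \Phi^{T} h
  && \text{(multiply on the left by } \Phi^{T}\text{; } \Phi^{T}\Phi = I\text{)} \\
\tau \frac{dc_\alpha}{dt} &= -c_\alpha + \lambda_\alpha c_\alpha + h_{f,\alpha}
  && \text{(one autapse equation per mode } \alpha\text{)}
\end{aligned}
```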
00:45:16.370 --> 00:45:20.980
So this is the equation
for a recurrent network
00:45:20.980 --> 00:45:28.120
with just autapses,
which we just understood.
00:45:28.120 --> 00:45:30.610
We just wrote down what
the solution is, right?
00:45:30.610 --> 00:45:33.130
And we plotted it for
different values of lambda.
00:45:40.380 --> 00:45:44.260
So now let's just look at
what some of these look like.
00:45:44.260 --> 00:45:52.360
So we've rewritten our weight
matrix in a new basis set.
00:45:52.360 --> 00:45:55.540
We've rebuilt our network
in a new basis set,
00:45:55.540 --> 00:45:59.860
in a rotated basis set
where everything simplifies.
00:45:59.860 --> 00:46:02.380
So we've taken this
complicated network
00:46:02.380 --> 00:46:07.540
with recurrent connections
and we've rewritten it
00:46:07.540 --> 00:46:10.600
in a new network, where
each of these neurons
00:46:10.600 --> 00:46:13.480
in our new network
corresponds to what's
00:46:13.480 --> 00:46:18.820
called a mode of the
fully recurrent network.
00:46:22.200 --> 00:46:28.850
So the activities c alpha, c1
and c2, of the network modes
00:46:28.850 --> 00:46:33.770
represent activity
in a particular linear combination
00:46:33.770 --> 00:46:35.180
of these neurons.
00:46:35.180 --> 00:46:40.360
So we're going to go
through what that means now.
00:46:40.360 --> 00:46:42.970
So the first thing I want
to do is just calculate
00:46:42.970 --> 00:46:46.960
what the steady state
response is in this network.
00:46:46.960 --> 00:46:48.770
And I'll just do
it mathematically,
00:46:48.770 --> 00:46:51.550
and then I'll show you what
it looks like graphically.
00:46:54.400 --> 00:46:56.320
So there's our original
network equation.
00:46:56.320 --> 00:47:00.380
We've rewritten it as a set
of differential equations
00:47:00.380 --> 00:47:03.320
for the modes of this network.
00:47:06.470 --> 00:47:10.270
I'm just rewriting this
by putting an I here,
00:47:10.270 --> 00:47:12.040
minus I times c.
00:47:12.040 --> 00:47:14.150
That's the only
change I made here.
00:47:14.150 --> 00:47:15.475
I just rewrote it like this.
00:47:20.450 --> 00:47:21.670
Let's find a steady state.
00:47:21.670 --> 00:47:24.760
So we're going to set
dc dt equal to zero.
00:47:24.760 --> 00:47:28.850
We're going to ask, what
is c in steady state?
00:47:28.850 --> 00:47:33.310
So we're going to call
that c infinity, all right?
00:47:33.310 --> 00:47:37.520
I minus lambda times c infinity
equals phi transpose h.
00:47:37.520 --> 00:47:38.740
OK, don't panic.
00:47:38.740 --> 00:47:41.480
It's all going to be
very simple in a second.
00:47:41.480 --> 00:47:47.230
c infinity is just I minus
lambda inverse phi transpose h.
00:47:47.230 --> 00:47:49.300
But I is diagonal.
00:47:49.300 --> 00:47:50.560
Lambda is diagonal.
00:47:50.560 --> 00:47:53.730
So I minus lambda
inverse is just the--
00:47:53.730 --> 00:47:58.600
it's a diagonal matrix whose
elements are one
00:47:58.600 --> 00:48:00.145
over each of those
diagonal elements, 1 over 1 minus lambda alpha.
00:48:04.290 --> 00:48:06.870
Now let's calculate v
infinity. v infinity
00:48:06.870 --> 00:48:09.430
is just phi times c infinity.
00:48:09.430 --> 00:48:12.390
So here, we're multiplying
on the left by phi.
00:48:12.390 --> 00:48:14.710
That's just v infinity.
00:48:14.710 --> 00:48:16.750
So v infinity is just this.
00:48:16.750 --> 00:48:18.330
So what is this?
00:48:18.330 --> 00:48:21.960
This just says v
infinity is some matrix--
00:48:21.960 --> 00:48:23.940
it's a rotated stretch matrix--
00:48:23.940 --> 00:48:25.170
times the input.
00:48:25.170 --> 00:48:30.500
So v infinity is just
this matrix times h.
00:48:30.500 --> 00:48:32.050
And now let's look
at what that is.
00:48:34.580 --> 00:48:37.610
v infinity is a matrix times h.
00:48:37.610 --> 00:48:39.830
We're going to call that matrix g.
00:48:39.830 --> 00:48:42.600
g is a gain matrix.
00:48:42.600 --> 00:48:45.270
We're going to think of that
as a gain times the input.
00:48:45.270 --> 00:48:50.800
So it's just a matrix
operation on the input.
00:48:50.800 --> 00:48:55.390
This matrix has exactly
the same eigenvectors as m.
00:48:55.390 --> 00:48:59.290
And the eigenvalues are
just 1 over 1 minus lambda.
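A numerical sketch of this claim, using a hypothetical 3-by-3 symmetric weight matrix whose eigenvalues are all below 1. The gain matrix built from the modes should match the direct steady-state solution of (I - M) v_inf = h, which follows from setting dv/dt = 0 in tau dv/dt = -v + Mv + h.

```python
import numpy as np

M = np.array([[0.2, 0.5, 0.0],
              [0.5, 0.1, 0.3],
              [0.0, 0.3, -0.4]])                     # hypothetical symmetric weights
lam, Phi = np.linalg.eigh(M)

I = np.eye(3)
G_modes = Phi @ np.diag(1.0 / (1.0 - lam)) @ Phi.T   # gain matrix built from the modes
G_direct = np.linalg.inv(I - M)                      # from (I - M) v_inf = h

h = np.array([1.0, -0.5, 2.0])                       # hypothetical input
v_inf = G_modes @ h
print("eigenvalues of M:", lam)
print("same gain matrix:", np.allclose(G_modes, G_direct))
print("v_inf:", v_inf)
# G shares M's eigenvectors; its eigenvalues are 1 / (1 - lambda)
print("eigenvector check:", np.allclose(G_modes @ Phi, Phi @ np.diag(1.0 / (1.0 - lam))))
```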
00:49:01.870 --> 00:49:03.350
Hang in there.
00:49:03.350 --> 00:49:07.060
So what this means is that
if an input is parallel
00:49:07.060 --> 00:49:09.940
to one of the eigenvectors
of the weight matrix,
00:49:09.940 --> 00:49:12.520
that means the output is
parallel to the input.
00:49:16.640 --> 00:49:19.240
So if the input is
in the direction
00:49:19.240 --> 00:49:25.720
of one of the eigenvectors,
v infinity is g times f.
00:49:25.720 --> 00:49:28.651
But g times f--
00:49:28.651 --> 00:49:31.310
f is an eigenvector
of g. And what that means
00:49:31.310 --> 00:49:35.900
is that v infinity is parallel
to f with a scaling factor
00:49:35.900 --> 00:49:39.310
1 over 1 minus lambda.
00:49:39.310 --> 00:49:39.810
All right?
00:49:39.810 --> 00:49:41.030
So hang in there.
00:49:41.030 --> 00:49:43.720
I'm going to show you
what this looks like.
00:49:43.720 --> 00:49:48.480
So in steady state, the output
will be parallel to the input
00:49:48.480 --> 00:49:50.490
if the input is in
the direction of one
00:49:50.490 --> 00:49:52.950
of the eigenvectors
of the network.
00:49:57.610 --> 00:50:00.750
So if the input is in
the direction of one
00:50:00.750 --> 00:50:02.370
of the eigenvectors
of the network,
00:50:02.370 --> 00:50:07.770
that means you're activating
only one mode of the network.
00:50:07.770 --> 00:50:11.155
And only that one mode responds,
and none of the other modes
00:50:11.155 --> 00:50:11.655
respond.
00:50:15.840 --> 00:50:17.760
The response of
the network will be
00:50:17.760 --> 00:50:20.340
in the direction of
that input, and it
00:50:20.340 --> 00:50:24.480
will be amplified or
suppressed by this gain factor.
00:50:24.480 --> 00:50:28.260
And the time constant will
also be increased or decreased
00:50:28.260 --> 00:50:30.350
by that factor.
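A sketch of the full recurrent network driven along each eigenvector in turn, assuming a hypothetical 2-by-2 symmetric weight matrix with eigenvalues +0.5 and -0.5 and arbitrary simulation parameters. If the claim holds, the trajectory stays on the input's axis, settles at a gain of 1 / (1 - lambda), and reaches about 63% of that value after tau / (1 - lambda).

```python
import numpy as np

def simulate_network(M, h, tau=10.0, dt=0.01, t_max=200.0):
    """Forward-Euler integration of tau dv/dt = -v + M v + h from v(0) = 0 (step input h)."""
    n = int(round(t_max / dt))
    v = np.zeros((n + 1, len(h)))
    for k in range(n):
        v[k + 1] = v[k] + dt / tau * (-v[k] + M @ v[k] + h)
    return v

M = np.array([[0.3, 0.4],
              [0.4, -0.3]])           # hypothetical symmetric weights, eigenvalues -0.5 and +0.5
lam, Phi = np.linalg.eigh(M)
tau, dt = 10.0, 0.01

results = []
for a in range(2):
    f = Phi[:, a]                                                # input along one eigenvector
    v = simulate_network(M, h=f, tau=tau, dt=dt)
    off_axis = np.max(np.abs(v[:, 0] * f[1] - v[:, 1] * f[0]))   # 2-D cross product with f
    gain = v[-1] @ f                                             # steady-state amplitude along f
    k_eff = int(round(tau / (1.0 - lam[a]) / dt))                # index of t = tau / (1 - lambda)
    frac = (v[k_eff] @ f) / gain                                 # fraction reached by then
    results.append((lam[a], gain, off_axis, frac))
    print(f"lambda = {lam[a]:+.1f}: gain = {gain:.3f}, 1/(1-lambda) = {1/(1-lam[a]):.3f}, "
          f"max off-axis = {off_axis:.1e}, fraction at tau_eff = {frac:.3f}")
```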
00:50:30.350 --> 00:50:32.370
So now let's look at--
so I just kind of whizzed
00:50:32.370 --> 00:50:33.370
through a bunch of math.
00:50:33.370 --> 00:50:36.400
Let's look at what this
looks like graphically
00:50:36.400 --> 00:50:39.050
for a few simple cases.
00:50:39.050 --> 00:50:41.440
And then I think it will
become much more clear.
00:50:41.440 --> 00:50:43.600
Let's just look at
a simple network,
00:50:43.600 --> 00:50:47.740
where we have two neurons
with an excitatory connection
00:50:47.740 --> 00:50:51.520
from neuron one to neuron
two, an excitatory connection
00:50:51.520 --> 00:50:53.440
from neuron two to neuron one.
00:50:53.440 --> 00:50:56.640
And we're going to
make that weight 0.8.
00:50:56.640 --> 00:51:00.220
OK, so what does the weight
matrix M look like?
00:51:00.220 --> 00:51:03.409
Just tell me what the
entries are for M.
00:51:03.409 --> 00:51:05.630
AUDIENCE: Does it
not have the autapse?
00:51:05.630 --> 00:51:09.370
MICHALE FEE: No, so
there's no connection
00:51:09.370 --> 00:51:13.450
of any of these neurons
onto themselves.
00:51:13.450 --> 00:51:15.640
AUDIENCE: So you have,
like, zeros on the diagonal.
00:51:15.640 --> 00:51:17.098
MICHALE FEE: Zeros
on the diagonal.
00:51:17.098 --> 00:51:18.080
Good.
00:51:18.080 --> 00:51:19.840
AUDIENCE: All the diagonals.
00:51:19.840 --> 00:51:20.720
MICHALE FEE: Good.
00:51:20.720 --> 00:51:22.120
Like that?
00:51:22.120 --> 00:51:23.030
Good.
00:51:23.030 --> 00:51:26.510
Connection from neuron
one to itself is zero.
00:51:26.510 --> 00:51:32.330
The convention is
post, pre: row, column.
00:51:32.330 --> 00:51:37.220
So onto neuron one
from neuron two is 0.8.
00:51:37.220 --> 00:51:40.460
Onto neuron two from
neuron one is 0.8.
00:51:40.460 --> 00:51:43.310
And neuron two onto
neuron two is zero.
00:51:46.400 --> 00:51:51.070
So now we are just going to
diagonalize this weight matrix.
00:51:51.070 --> 00:51:58.660
We're going to find the
eigenvectors and eigenvalues.
00:51:58.660 --> 00:52:02.380
The eigenvectors are
the columns of phi.
00:52:02.380 --> 00:52:04.865
And the eigenvalues are the
diagonal elements of lambda.
00:52:08.140 --> 00:52:10.720
Let's take a look at what
those eigenvectors are.
00:52:10.720 --> 00:52:13.860
So this vector here is f1.
00:52:13.860 --> 00:52:16.370
This vector here is
another eigenvector, f2.
00:52:19.780 --> 00:52:20.785
And how did I get this?
00:52:24.260 --> 00:52:26.993
How did I get this from this?
00:52:26.993 --> 00:52:27.910
How would you do that?
00:52:27.910 --> 00:52:32.044
If I gave you this matrix,
how would you find phi?
00:52:32.044 --> 00:52:33.940
AUDIENCE: Eig M.
00:52:33.940 --> 00:52:37.580
MICHALE FEE: Good,
eig of M. Now,
00:52:37.580 --> 00:52:39.700
remember in the
last lecture when
00:52:39.700 --> 00:52:45.350
we were talking about some
simple cases of matrices
00:52:45.350 --> 00:52:49.240
that are really easy to
find the eigenvectors of?
00:52:49.240 --> 00:52:53.350
If you have a symmetric matrix,
where the diagonal elements are
00:52:53.350 --> 00:52:56.260
equal to each other,
the eigenvectors
00:52:56.260 --> 00:53:01.070
are always 45 degrees
here and 45 degrees there.
00:53:01.070 --> 00:53:07.000
And the eigenvalues are just the
diagonal elements plus or minus
00:53:07.000 --> 00:53:08.270
the off-diagonal elements.
00:53:08.270 --> 00:53:15.460
So the eigenvalues here
are 0.8 and minus 0.8.
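The lecture finds phi and lambda with `eig(M)`; `numpy.linalg.eigh` is an analogous Python routine. The sketch below instead uses the shortcut just stated, assuming a 2x2 symmetric matrix of the form [[a, b], [b, a]], and checks the defining property M f = lambda f for each mode. The function names are hypothetical.

```python
import math

# Sketch, assuming a 2x2 symmetric matrix [[a, b], [b, a]] (equal diagonal elements).
# Its unit eigenvectors lie along (1, 1) and (-1, 1); eigenvalues are a + b and a - b.
def eig_sym_equal_diag(a, b):
    s = 1 / math.sqrt(2)
    f1, f2 = [s, s], [-s, s]
    return [a + b, a - b], [f1, f2]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

a, b = 0.0, 0.8
M = [[a, b], [b, a]]
lams, fs = eig_sym_equal_diag(a, b)
for lam, f in zip(lams, fs):
    # Check M f = lambda f, up to floating-point rounding.
    assert all(math.isclose(x, lam * y, abs_tol=1e-12) for x, y in zip(matvec(M, f), f))
print(lams)  # [0.8, -0.8]
```

Mode one (0.8) has both neurons firing together; mode two (-0.8) has them firing with opposite sign.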
00:53:15.460 --> 00:53:22.350
All right, so those are the two
eigenvectors of this matrix,
00:53:22.350 --> 00:53:23.055
of this network.
00:53:25.830 --> 00:53:29.860
Those are the modes
of the network.
00:53:29.860 --> 00:53:34.410
Notice that one of the modes
corresponds to neuron one
00:53:34.410 --> 00:53:37.560
and neuron two firing together.
00:53:37.560 --> 00:53:40.650
The other mode corresponds
to neuron one and neuron
00:53:40.650 --> 00:53:43.200
two firing with opposite sign--
00:53:46.680 --> 00:53:50.020
minus one, one.
00:53:50.020 --> 00:53:54.550
So the lambda-- the diagonal
elements of the lambda matrix
00:53:54.550 --> 00:53:56.120
are the eigenvalues.
00:53:56.120 --> 00:54:04.410
They're 0.8 and minus
0.8, a plus or minus b.
00:54:04.410 --> 00:54:07.500
Now, this gain
factor, what this says
00:54:07.500 --> 00:54:11.960
is that if I have an input
in the direction of f1,
00:54:11.960 --> 00:54:14.650
the response is going to
be amplified by a gain.
00:54:14.650 --> 00:54:17.720
And remember, we just derived,
on the previous slide,
00:54:17.720 --> 00:54:20.990
that that gain factor
is just 1 over 1
00:54:20.990 --> 00:54:26.360
minus the eigenvalue
for that eigenvector.
00:54:26.360 --> 00:54:34.270
In this case, the eigenvalue
for mode one is 0.8.
00:54:34.270 --> 00:54:38.680
So 1 over 1 minus 0.8 is 5.
00:54:38.680 --> 00:54:43.180
So the gain in this
direction is 5.
00:54:43.180 --> 00:54:47.650
The gain for an input
in this direction
00:54:47.650 --> 00:54:56.380
is 1 over 1 minus negative
0.8, which is 1 over 1.8.
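The gain arithmetic above, as a short sketch: each mode's steady-state gain is 1 / (1 - lambda), which only describes a stable steady state when lambda is less than one. The function name `mode_gain` is hypothetical.

```python
# Sketch: steady-state gain of a mode with eigenvalue lam, valid for lam < 1.
def mode_gain(lam):
    if lam >= 1:
        raise ValueError("no stable steady state for lambda >= 1")
    return 1.0 / (1.0 - lam)

print(round(mode_gain(0.8), 6))   # 5.0
print(round(mode_gain(-0.8), 6))  # 0.555556  (= 1 / 1.8)
```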
00:54:56.380 --> 00:54:58.640
Does that make sense?
00:54:58.640 --> 00:55:00.370
OK, let's keep going,
because I think
00:55:00.370 --> 00:55:01.960
it will make even
more sense once we
00:55:01.960 --> 00:55:04.195
see how the network
responds to its inputs.
00:55:10.910 --> 00:55:12.780
So zero input.
00:55:12.780 --> 00:55:16.220
Now we're going to put an input
in the direction of this mode
00:55:16.220 --> 00:55:16.870
one.
00:55:16.870 --> 00:55:20.890
And you can see the
mode responds a lot.
00:55:20.890 --> 00:55:23.000
Put a negative input
in, it responds a lot.
00:55:23.000 --> 00:55:27.910
If we put an input in this
direction or this direction,
00:55:27.910 --> 00:55:37.340
the response is suppressed
by a factor of about 0.5.
00:55:37.340 --> 00:55:39.440
Because here, the gain is small.
00:55:39.440 --> 00:55:41.360
Here, the gain is big.
00:55:41.360 --> 00:55:43.414
So you see what's happening?
00:55:43.414 --> 00:55:50.070
This network looks just
like an autapse network,
00:55:50.070 --> 00:55:53.910
but where we've taken this
input and output space and just
00:55:53.910 --> 00:56:00.410
rotated it into a new coordinate
system, into this new basis.
00:56:00.410 --> 00:56:00.969
Yes?
00:56:00.969 --> 00:56:02.636
AUDIENCE: Why did it
kind of loop around
00:56:02.636 --> 00:56:04.650
on the one side [INAUDIBLE]?
00:56:04.650 --> 00:56:08.970
MICHALE FEE: OK, it's because
these things are relaxing
00:56:08.970 --> 00:56:10.710
exponentially back to zero.
00:56:10.710 --> 00:56:12.630
And we got a little
bit impatient
00:56:12.630 --> 00:56:16.560
and started the next input
before it had quite gone away.
00:56:16.560 --> 00:56:19.320
OK, good question.
00:56:19.320 --> 00:56:21.750
It's just that if you really
wait for a long time for it
00:56:21.750 --> 00:56:24.240
to settle, then the movie
just takes a long time.
00:56:24.240 --> 00:56:26.850
But maybe it would
be better to do that.
00:56:26.850 --> 00:56:30.510
So input this way and this
way lead to a large response,
00:56:30.510 --> 00:56:35.280
because those inputs activate
mode one, which has a big gain.
00:56:35.280 --> 00:56:38.540
Inputs in this direction
and this direction
00:56:38.540 --> 00:56:41.450
have a small response,
because they activate
00:56:41.450 --> 00:56:46.150
mode two, which has small gain.
00:56:46.150 --> 00:56:51.730
But notice that when
you activate mode one--
00:56:51.730 --> 00:56:54.230
when you put an input
in this direction,
00:56:54.230 --> 00:56:58.000
it only activates mode one.
00:56:58.000 --> 00:57:01.060
And it doesn't activate
mode two at all.
00:57:01.060 --> 00:57:03.830
If you put an input
in this direction,
00:57:03.830 --> 00:57:06.070
then it only activates
mode two, and it doesn't
00:57:06.070 --> 00:57:07.780
activate mode one at all.
00:57:11.220 --> 00:57:15.360
So it's just like the
autapse network, but rotated.
00:57:18.490 --> 00:57:27.830
So now let's do the case
where we have an input that
00:57:27.830 --> 00:57:29.840
activates both modes.
00:57:29.840 --> 00:57:33.300
So let's say we put an
input in this direction.
00:57:33.300 --> 00:57:37.730
What does that direction
correspond to, with h pointing up?
00:57:37.730 --> 00:57:41.330
What does that input mean
here in terms of h1 and h2?
00:57:46.560 --> 00:57:49.590
Let's say we just put
an input-- remember,
00:57:49.590 --> 00:57:55.050
this is a plot on
axes h1 versus h2.
00:57:55.050 --> 00:57:57.570
So this input
vector h corresponds
00:57:57.570 --> 00:58:04.220
to just putting an input
on h2, into this neuron.
00:58:04.220 --> 00:58:08.120
So you can see that when we
put an input in this direction,
00:58:08.120 --> 00:58:09.500
we're activating--
00:58:09.500 --> 00:58:14.210
that input has a projection
onto mode one and mode two.
00:58:14.210 --> 00:58:16.070
So we're activating both modes.
00:58:19.200 --> 00:58:23.280
You can see that the
input h has a projection
00:58:23.280 --> 00:58:27.900
onto f1 and projection onto f2.
00:58:27.900 --> 00:58:28.860
So what you do is--
00:58:34.090 --> 00:58:36.340
well, here, I'm just showing
you what the steady state
00:58:36.340 --> 00:58:39.490
response is mathematically.
00:58:39.490 --> 00:58:42.280
Let me just show you
what that looks like.
00:58:42.280 --> 00:58:46.250
What this says is that if we
put an h in this direction,
00:58:46.250 --> 00:58:50.140
it's going to activate
a little bit of mode one
00:58:50.140 --> 00:58:54.940
with a big gain and a
little bit of mode two
00:58:54.940 --> 00:58:56.380
with a very small gain.
00:58:56.380 --> 00:59:01.880
And so the steady state response
will be the sum of those two.
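A sketch of that sum for the "straight up" input h = (0, 1), assuming linear rate neurons so the steady state solves (I - M) v = h. Each mode contributes f_i (f_i . h) / (1 - lambda_i); the result is cross-checked against solving the 2x2 system directly. Function names are hypothetical.

```python
import math

s = 1 / math.sqrt(2)
modes = [(0.8, (s, s)), (-0.8, (-s, s))]   # (lambda_i, unit eigenvector f_i)
M = [[0.0, 0.8], [0.8, 0.0]]
h = (0.0, 1.0)                              # input onto neuron two only

def steady_state_by_modes(modes, h):
    """Sum over modes: f_i * (f_i . h) / (1 - lambda_i)."""
    v = [0.0, 0.0]
    for lam, f in modes:
        amp = (f[0] * h[0] + f[1] * h[1]) / (1.0 - lam)   # amplified mode amplitude
        v = [v[0] + amp * f[0], v[1] + amp * f[1]]
    return v

def steady_state_direct(M, h):
    """Solve (I - M) v = h for a 2x2 M by Cramer's rule."""
    a, b = 1 - M[0][0], -M[0][1]
    c, d = -M[1][0], 1 - M[1][1]
    det = a * d - b * c
    return [(d * h[0] - b * h[1]) / det, (a * h[1] - c * h[0]) / det]

print([round(x, 4) for x in steady_state_by_modes(modes, h)])  # [2.2222, 2.7778]
print([round(x, 4) for x in steady_state_direct(M, h)])        # [2.2222, 2.7778]
```

The response points almost along (1, 1): mode one contributes (2.5, 2.5), while mode two adds only about (-0.28, 0.28).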
00:59:01.880 --> 00:59:04.240
It'll be up here.
00:59:04.240 --> 00:59:09.120
So the steady state response
to this input in this direction
00:59:09.120 --> 00:59:10.810
is going to be over here.
00:59:10.810 --> 00:59:11.670
Why?
00:59:11.670 --> 00:59:16.680
Because that input activates
mode one and mode two both.
00:59:16.680 --> 00:59:20.180
But the response
of mode one is big,
00:59:20.180 --> 00:59:23.150
and the response of mode
two is really small.
00:59:23.150 --> 00:59:24.830
And so the steady
state response is
00:59:24.830 --> 00:59:29.180
going to be way
over here because
00:59:29.180 --> 00:59:32.330
of the big response, the
amplified response of mode one,
00:59:32.330 --> 00:59:35.750
which is in this direction, OK?
00:59:35.750 --> 00:59:37.442
So when we put an
input straight up,
00:59:37.442 --> 00:59:38.900
the response of
the network's going
00:59:38.900 --> 00:59:40.760
to be all the way over here.
00:59:40.760 --> 00:59:43.640
How is it going to get there?
00:59:43.640 --> 00:59:44.390
Let's take a look.
00:59:52.570 --> 00:59:55.110
We're going to put an input--
00:59:55.110 --> 00:59:58.283
sorry, that was first
in this direction.
00:59:58.283 --> 00:59:59.700
Now let's see what
happens when we
00:59:59.700 --> 01:00:01.720
put an input in this direction.
01:00:01.720 --> 01:00:06.150
You can see the response is
really big along the mode one
01:00:06.150 --> 01:00:08.250
direction, in this
direction, and it's
01:00:08.250 --> 01:00:12.550
really small in this direction.
01:00:12.550 --> 01:00:18.380
So input up in the upward
direction onto just this neuron
01:00:18.380 --> 01:00:21.690
produces a large
response in mode one,
01:00:21.690 --> 01:00:24.020
which is this way, and
a very small response
01:00:24.020 --> 01:00:26.570
in mode two, which is this way.
01:00:26.570 --> 01:00:32.380
The response in mode two is
very fast, because the gain,
01:00:32.380 --> 01:00:37.192
the 1 over 1 minus
lambda, is small,
01:00:37.192 --> 01:00:39.810
which makes the
time constant faster
01:00:39.810 --> 01:00:43.230
and the response smaller.
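To make the speed difference concrete: each mode obeys tau dc/dt = -(1 - lambda) c + f . h, so it relaxes with an effective time constant tau / (1 - lambda), which is tau times the same gain factor. The value tau = 10 ms below is a hypothetical choice for illustration only.

```python
# Sketch: effective relaxation time constant of a mode with eigenvalue lam < 1.
def effective_tau(tau, lam):
    return tau / (1.0 - lam)

tau = 10.0  # ms, hypothetical value
for lam in (0.8, -0.8):
    print(f"lambda={lam:+.1f}  gain={1 / (1 - lam):.3f}  tau_eff={effective_tau(tau, lam):.2f} ms")
```

With this assumed tau, mode one relaxes with a 50 ms time constant and mode two with about 5.6 ms.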
01:00:43.230 --> 01:00:45.990
So, again, it's just
like the response
01:00:45.990 --> 01:00:50.422
of the autapse
network, but rotated
01:00:50.422 --> 01:00:51.630
into a new coordinate system.
01:00:56.670 --> 01:00:58.290
All right, any
questions about that?
01:01:02.610 --> 01:01:06.060
So you can see we basically
understood everything
01:01:06.060 --> 01:01:09.660
we needed to know about
recurrent networks
01:01:09.660 --> 01:01:17.920
just by understanding simple
networks with just autapses.
01:01:17.920 --> 01:01:21.830
And all these more
complicated networks
01:01:21.830 --> 01:01:25.190
are just nothing
but rotated versions
01:01:25.190 --> 01:01:27.710
of the response of a
network with just autapses.
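One way to check the "rotated autapse network" claim numerically: simulate the full network tau dv/dt = -v + M v + h with forward Euler, and separately simulate each mode as an independent autapse-like unit, tau dc_k/dt = -(1 - lambda_k) c_k + f_k . h. Rotating the mode amplitudes back with v = sum_k c_k f_k should reproduce the network exactly (up to rounding), because M is symmetric with orthonormal eigenvectors. The step size, duration, and tau are assumptions.

```python
import math

tau, dt, steps = 10.0, 0.1, 600          # ms; illustrative values (assumptions)
M = [[0.0, 0.8], [0.8, 0.0]]
h = [0.0, 1.0]
s = 1 / math.sqrt(2)
modes = [(0.8, [s, s]), (-0.8, [-s, s])]

v = [0.0, 0.0]                            # full network, neuron coordinates
c = [0.0, 0.0]                            # decoupled mode amplitudes
proj = [f[0] * h[0] + f[1] * h[1] for _, f in modes]

for _ in range(steps):
    Mv = [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]
    v = [v[i] + dt / tau * (-v[i] + Mv[i] + h[i]) for i in range(2)]
    c = [c[k] + dt / tau * (-(1 - modes[k][0]) * c[k] + proj[k]) for k in range(2)]

# Rotate mode amplitudes back into neuron coordinates: v = sum_k c_k f_k
v_from_modes = [sum(c[k] * modes[k][1][i] for k in range(2)) for i in range(2)]
print([round(x, 6) for x in v], [round(x, 6) for x in v_from_modes])
```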
01:01:36.998 --> 01:01:38.040
Any questions about that?
01:01:41.990 --> 01:01:44.350
OK, let's do another
network now where
01:01:44.350 --> 01:01:46.210
we have inhibitory connections.
01:01:46.210 --> 01:01:50.350
That's called mutual inhibition.
01:01:50.350 --> 01:01:52.870
And let's make that
inhibition minus 0.8.
01:01:52.870 --> 01:01:55.690
The weight matrix is just
zeros on the diagonals,
01:01:55.690 --> 01:01:57.940
because there's no autapse here.
01:01:57.940 --> 01:02:03.230
And minus 0.8 on
the off-diagonals.
01:02:03.230 --> 01:02:10.338
What are the eigenvectors for
this matrix, for this network?
01:02:10.338 --> 01:02:11.790
AUDIENCE: The same.
01:02:11.790 --> 01:02:13.890
MICHALE FEE: Yeah,
because the diagonal
01:02:13.890 --> 01:02:15.430
elements are equal
to each other,
01:02:15.430 --> 01:02:18.070
and the off-diagonal elements
are equal to each other.
01:02:18.070 --> 01:02:21.990
It's a symmetric network
with equal diagonal elements.
01:02:21.990 --> 01:02:26.440
The eigenvectors are
always at 45 degrees.
01:02:26.440 --> 01:02:28.000
And what are the eigenvalues?
01:02:30.940 --> 01:02:34.370
AUDIENCE: [INAUDIBLE]
01:02:34.370 --> 01:02:36.458
MICHALE FEE: Well,
the two numbers
01:02:36.458 --> 01:02:37.500
are going to be the same.
01:02:37.500 --> 01:02:44.320
It's zero plus or minus
negative 0.8,
01:02:44.320 --> 01:02:47.400
which is just 0.8
and minus 0.8, right?
01:02:47.400 --> 01:02:47.910
Good.
01:02:47.910 --> 01:02:51.390
So the eigenvalues are
just 0.8 and minus 0.8.
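A small sketch confirming which eigenvalue goes with which eigenvector in the mutual-inhibition network M = [[0, -0.8], [-0.8, 0]]. For a unit eigenvector f, the eigenvalue is f . (M f); the helper also asserts that M f really is parallel to f. The helper name `eigenvalue_along` is hypothetical.

```python
import math

M = [[0.0, -0.8], [-0.8, 0.0]]   # mutual inhibition, no autapses
s = 1 / math.sqrt(2)

def eigenvalue_along(M, f):
    """Eigenvalue for unit eigenvector f, after checking that M f is parallel to f."""
    Mf = [M[0][0] * f[0] + M[0][1] * f[1], M[1][0] * f[0] + M[1][1] * f[1]]
    lam = f[0] * Mf[0] + f[1] * Mf[1]
    assert all(math.isclose(Mf[i], lam * f[i], abs_tol=1e-12) for i in range(2)), "not an eigenvector"
    return lam

for name, f in (("(1, 1)", [s, s]), ("(-1, 1)", [-s, s])):
    lam = eigenvalue_along(M, f)
    print(name, round(lam, 6), "gain", round(1 / (1 - lam), 4))
```

Compared with the excitatory network, the eigenvalues have swapped eigenvectors: the (1, 1) mode is now suppressed and the (-1, 1) mode is amplified.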
01:02:51.390 --> 01:02:55.590
But the eigenvalues correspond
to different eigenvectors.
01:02:55.590 --> 01:02:59.760
So now the eigenvalue for the
mode in the 1,
01:02:59.760 --> 01:03:04.170
1 direction is now
minus 0.8, which
01:03:04.170 --> 01:03:09.270
means it's suppressing the
response in this direction.
01:03:09.270 --> 01:03:12.980
And the eigenvalue for the
eigenvector in the minus 1,
01:03:12.980 --> 01:03:16.530
1 direction is now
close to 1, which
01:03:16.530 --> 01:03:20.880
means that mode has a lot
of recurrent feedback.
01:03:20.880 --> 01:03:25.480
And so its response in this
direction is going to be big.
01:03:25.480 --> 01:03:26.970
It's going to be amplified.
01:03:26.970 --> 01:03:31.580
So unlike the case where we had
positive recurrent synapses,
01:03:31.580 --> 01:03:35.070
where we had amplification
in this direction, now
01:03:35.070 --> 01:03:37.920
we're going to
have amplification
01:03:37.920 --> 01:03:39.565
in this direction.
01:03:39.565 --> 01:03:40.440
Does that make sense?
01:03:43.500 --> 01:03:44.730
Think of it this way--
01:03:44.730 --> 01:03:48.320
if we go back to
this network here,
01:03:48.320 --> 01:03:51.950
you can see that when
these two neurons--
01:03:51.950 --> 01:03:56.690
when this neuron is active, it
tends to activate this neuron.
01:03:56.690 --> 01:03:58.230
And when this
neuron is active,
01:03:58.230 --> 01:04:00.150
it tends to activate
that neuron.
01:04:00.150 --> 01:04:05.150
So this network, if you were to
activate one of these neurons,
01:04:05.150 --> 01:04:08.480
it tends to drive the
other neuron also.
01:04:08.480 --> 01:04:13.040
And so the activity of those two
neurons likes to go together.
01:04:13.040 --> 01:04:15.520
When one is big, the
other one wants to be big.
01:04:15.520 --> 01:04:21.800
And that's why there's a lot
of gain in this direction.
01:04:21.800 --> 01:04:23.110
Does that make sense?
01:04:23.110 --> 01:04:26.440
With these recurrent
excitatory connections,
01:04:26.440 --> 01:04:29.860
it's hard to make
this neuron fire
01:04:29.860 --> 01:04:32.470
and make that neuron not fire.
01:04:32.470 --> 01:04:36.300
And that's why the response is
suppressed in this direction,
01:04:36.300 --> 01:04:36.910
OK?
01:04:36.910 --> 01:04:41.920
With this network, when
this neuron is active,
01:04:41.920 --> 01:04:43.860
it's trying to
suppress that neuron.
01:04:47.100 --> 01:04:49.050
When that neuron has
positive firing rate,
01:04:49.050 --> 01:04:51.870
it's trying to make that neuron
have a negative firing rate.
01:04:51.870 --> 01:04:53.850
When that neuron is
negative, it tries
01:04:53.850 --> 01:04:55.470
to make that one go positive.
01:04:55.470 --> 01:04:58.140
And so this network
likes to have
01:04:58.140 --> 01:05:04.990
one firing positive and the
other neuron going negative.
01:05:04.990 --> 01:05:06.550
And so that's what happens.
01:05:06.550 --> 01:05:16.580
What you find is that if you put
an input into the first neuron,
01:05:16.580 --> 01:05:20.330
it tends to suppress the
activity in the second neuron,
01:05:20.330 --> 01:05:21.980
in v2.
01:05:21.980 --> 01:05:27.390
If you put input
into neuron two,
01:05:27.390 --> 01:05:29.220
it tends to suppress
the activity,
01:05:29.220 --> 01:05:32.040
or make v1 go negative.
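A sketch of those steady states, assuming linear rate neurons (rates may go negative, as in the lecture) so that v = (I - M)^(-1) h. The 2x2 solve uses Cramer's rule; `steady_state` is a hypothetical helper name.

```python
def steady_state(M, h):
    """Solve (I - M) v = h for a 2x2 M (Cramer's rule); assumes all eigenvalues of M < 1."""
    a, b = 1 - M[0][0], -M[0][1]
    c, d = -M[1][0], 1 - M[1][1]
    det = a * d - b * c
    return [(d * h[0] - b * h[1]) / det, (a * h[1] - c * h[0]) / det]

M = [[0.0, -0.8], [-0.8, 0.0]]   # mutual inhibition
print([round(x, 4) for x in steady_state(M, [1.0, 0.0])])  # [2.7778, -2.2222]
print([round(x, 4) for x in steady_state(M, [0.0, 1.0])])  # [-2.2222, 2.7778]
```

Input into one neuron drives the other neuron's rate negative, so the response lies mostly along the (1, -1) / (-1, 1) direction.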
01:05:32.040 --> 01:05:36.810
So it's, again, exactly
like the autapse network,
01:05:36.810 --> 01:05:42.590
but just, in this case, rotated
minus 45 degrees instead
01:05:42.590 --> 01:05:44.950
of plus 45 degrees, OK?
01:05:51.750 --> 01:05:55.000
Any questions about that?
01:05:55.000 --> 01:05:55.780
All right.
01:05:55.780 --> 01:05:59.830
So now let's talk about how--
01:05:59.830 --> 01:06:00.989
yes, Linda?
01:06:00.989 --> 01:06:03.489
AUDIENCE: So we just did, those
were all symmetric matrices,
01:06:03.489 --> 01:06:04.390
right?
01:06:04.390 --> 01:06:05.098
MICHALE FEE: Yes.
01:06:05.098 --> 01:06:08.397
AUDIENCE: So [INAUDIBLE]
can we not do this strategy
01:06:08.397 --> 01:06:09.380
if it's not symmetric?
01:06:09.380 --> 01:06:11.690
MICHALE FEE: You can do it
for non-symmetric matrices,
01:06:11.690 --> 01:06:15.260
but non-symmetric
matrices start doing
01:06:15.260 --> 01:06:17.330
all kinds of other
cool stuff that
01:06:17.330 --> 01:06:20.730
is a topic for another day.
01:06:20.730 --> 01:06:25.650
So symmetric matrices
are special in that they
01:06:25.650 --> 01:06:31.380
have very simple dynamics.
01:06:31.380 --> 01:06:37.930
They just relax to a
steady state solution.
01:06:37.930 --> 01:06:40.980
Weight matrices that are
not symmetric, or even
01:06:40.980 --> 01:06:43.230
anti-symmetric, tend to
do really cool things
01:06:43.230 --> 01:06:46.590
like oscillating.
01:06:46.590 --> 01:06:50.670
And we'll get to that in
another lecture, all right?
01:06:50.670 --> 01:06:55.170
OK, so now let's talk about
using recurrent networks
01:06:55.170 --> 01:06:57.150
to store memories.
01:06:57.150 --> 01:07:00.360
So, remember, all of
the cases we've just
01:07:00.360 --> 01:07:03.960
described, all of the
networks we've just described,
01:07:03.960 --> 01:07:08.340
had the property that the
lambdas were less than one.
01:07:08.340 --> 01:07:10.190
So what we've been
looking at are
01:07:10.190 --> 01:07:13.970
networks for which
lambda is less than one
01:07:13.970 --> 01:07:18.320
and they're symmetric
weight matrices.
01:07:18.320 --> 01:07:20.008
So that was kind
of a special case,
01:07:20.008 --> 01:07:21.800
but it's a good case
for building intuition
01:07:21.800 --> 01:07:24.050
about what goes on.
01:07:24.050 --> 01:07:25.800
But now we're going
to start branching out
01:07:25.800 --> 01:07:30.310
into more interesting behavior.
01:07:33.040 --> 01:07:37.090
So let's take a look at what
happens to our equation.
01:07:37.090 --> 01:07:41.170
This is now our equation
for the different modes of a network.
01:07:41.170 --> 01:07:43.930
What happens to this equation
when lambda is actually
01:07:43.930 --> 01:07:46.670
equal to one?
01:07:46.670 --> 01:07:52.210
So when lambda is equal to one,
this term goes to zero, right?
01:07:52.210 --> 01:07:58.170
So we can just cross this
out and rewrite our equation
01:07:58.170 --> 01:08:06.710
as tau dc dt equals f1 dot h.
01:08:06.710 --> 01:08:09.238
So what is this?
01:08:09.238 --> 01:08:10.280
What does that look like?
01:08:13.850 --> 01:08:21.130
What's the solution to c for
this differential equation?
01:08:21.130 --> 01:08:25.420
Does this exponentially
relax toward a v infinity?
01:08:29.640 --> 01:08:31.840
What is v infinity here?
01:08:31.840 --> 01:08:34.770
It's not even defined.
01:08:34.770 --> 01:08:38.399
If you set dc dt equal to
zero, there's not even a c
01:08:38.399 --> 01:08:39.899
to solve for, right?
01:08:39.899 --> 01:08:41.399
So what is this?
01:08:46.290 --> 01:08:49.890
The derivative of c
is just equal to--
01:08:49.890 --> 01:08:55.238
if we put in an input
that's constant, what is c?
01:08:55.238 --> 01:08:57.510
AUDIENCE: [INAUDIBLE]
01:08:57.510 --> 01:09:00.290
MICHALE FEE: This is
an integrator, right?
01:09:00.290 --> 01:09:04.609
This c, the solution
to this equation,
01:09:04.609 --> 01:09:10.960
is that c is the
integral of this input.
01:09:10.960 --> 01:09:16.960
c is some initial c plus
the integral over time.
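A minimal forward-Euler sketch of the lambda = 1 case, tau dc/dt = f1 . h, with a constant projected input that is switched off after 20 ms. The values of tau, dt, and the pulse timing are assumptions; the pulse is timed by step count to avoid floating-point comparisons on time.

```python
tau, dt = 10.0, 0.01       # ms, illustrative values (assumptions)
n_on = 2000                # input on for the first 2000 steps (20 ms)
c = 0.0                    # start at c1 = 0
trace = []
for n in range(6000):      # 60 ms in total
    h_proj = 1.0 if n < n_on else 0.0   # f1 . h: constant input, then off
    c += dt / tau * h_proj              # the (lambda - 1) c term vanishes when lambda = 1
    trace.append(c)

print(round(trace[999], 3), round(trace[1999], 3), round(trace[-1], 3))  # 1.0 2.0 2.0
```

c ramps up linearly while the input is on and then holds its value once the input is removed.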
01:09:22.279 --> 01:09:25.180
So if we have an input--
01:09:25.180 --> 01:09:28.050
and again, what
we're plotting here
01:09:28.050 --> 01:09:34.370
is the activity of one of
the modes of our network, c1,
01:09:34.370 --> 01:09:37.430
which is a function
of the projection
01:09:37.430 --> 01:09:42.350
of the input along the
eigenvector of mode one.
01:09:42.350 --> 01:09:46.189
So we're going to plot h, which
is just how much the input
01:09:46.189 --> 01:09:50.000
overlaps with mode one.
01:09:50.000 --> 01:09:53.810
And as a function of time,
let's start at c1 equals zero.
01:09:53.810 --> 01:09:54.890
What will this look like?
01:09:59.710 --> 01:10:02.250
This will just
increase linearly.
01:10:02.250 --> 01:10:03.514
And then what happens?
01:10:06.993 --> 01:10:08.120
What happens here?
01:10:13.650 --> 01:10:14.357
Raymundo?
01:10:14.357 --> 01:10:15.690
AUDIENCE: It just stays constant.
01:10:15.690 --> 01:10:18.400
MICHALE FEE: Good.
01:10:18.400 --> 01:10:21.730
We've been through that,
like, 100 times in this class.
01:10:25.220 --> 01:10:33.600
Now, what's special about
this network is that remember,
01:10:33.600 --> 01:10:37.350
when lambda was less
than one, the network
01:10:37.350 --> 01:10:39.120
would respond to the input.
01:10:39.120 --> 01:10:41.370
And then what would it do
when we took the input away?
01:10:44.830 --> 01:10:47.300
It would decay back to zero.
01:10:47.300 --> 01:10:51.070
But this network does
something really special.
01:10:51.070 --> 01:10:53.620
This network, you put
an input in and then
01:10:53.620 --> 01:10:58.360
take the input away, this
network stays active.
01:10:58.360 --> 01:11:02.920
It remembers what the input was.
01:11:02.920 --> 01:11:06.220
Whereas, if you have a network
where lambda is less than one,
01:11:06.220 --> 01:11:12.237
the network very quickly
forgets what the input was.
01:11:12.237 --> 01:11:14.570
All right, what happens when
lambda is greater than one?
01:11:14.570 --> 01:11:18.920
So when lambda is greater
than one, this term is now--
01:11:18.920 --> 01:11:20.780
this thing inside
the parentheses
01:11:20.780 --> 01:11:23.490
is negative, multiplied
by a negative number.
01:11:23.490 --> 01:11:27.290
This whole coefficient in front
of the c1 becomes positive.
01:11:27.290 --> 01:11:31.050
So we're just going to write
it as lambda minus one.
01:11:31.050 --> 01:11:33.980
And so this becomes positive.
01:11:33.980 --> 01:11:35.750
And what does that
solution look like?
01:11:35.750 --> 01:11:37.670
Does anyone know
what that looks like?
01:11:37.670 --> 01:11:40.760
dc dt equals a positive
number times c.
01:11:49.790 --> 01:11:50.990
Nobody?
01:11:50.990 --> 01:11:53.495
Are we all just sleepy?
01:11:57.278 --> 01:11:57.820
What happens?
01:12:00.700 --> 01:12:04.950
So if this is negative, if this
coefficient were negative, dc--
01:12:04.950 --> 01:12:07.650
if c is positive, then
dc dt is negative,
01:12:07.650 --> 01:12:11.620
and it relaxes to zero, right?
01:12:11.620 --> 01:12:13.270
Let's think about
this for a minute.
01:12:13.270 --> 01:12:15.380
What happens if this
quantity is positive?
01:12:15.380 --> 01:12:16.740
So if c is positive--
01:12:19.760 --> 01:12:20.720
cover that up.
01:12:20.720 --> 01:12:24.090
If this is positive
and c is positive,
01:12:24.090 --> 01:12:26.760
then dc dt is positive.
01:12:26.760 --> 01:12:31.790
So that means if c is positive,
it just keeps getting bigger,
01:12:31.790 --> 01:12:32.360
right?
01:12:32.360 --> 01:12:36.320
And so what happens is you
get exponential growth.
01:12:36.320 --> 01:12:39.670
So if we now take an input and
we put it into this network,
01:12:39.670 --> 01:12:41.740
where lambda is
greater than one,
01:12:41.740 --> 01:12:44.550
you get exponential growth.
01:12:44.550 --> 01:12:47.000
And now what happens when
you turn that input off?
01:12:53.800 --> 01:12:55.720
Does it go away?
01:13:06.031 --> 01:13:07.013
What happens?
01:13:12.420 --> 01:13:14.400
Can someone draw with their hand
what happens here?
01:13:18.360 --> 01:13:20.690
So just look at the equation.
01:13:20.690 --> 01:13:26.450
Again, h dot f1 is zero
here, so that's gone.
01:13:26.450 --> 01:13:28.190
This is positive.
01:13:28.190 --> 01:13:30.270
c is positive.
01:13:30.270 --> 01:13:31.140
So what is dc dt?
01:13:33.950 --> 01:13:34.490
Good.
01:13:34.490 --> 01:13:35.550
It's positive.
01:13:35.550 --> 01:13:36.290
And so what is--
01:13:36.290 --> 01:13:36.850
AUDIENCE: [INAUDIBLE]
01:13:36.850 --> 01:13:38.100
MICHALE FEE: It keeps growing.
01:13:41.710 --> 01:13:43.620
So you can see that
this network also
01:13:43.620 --> 01:13:49.020
remembers that it had input.
01:13:51.940 --> 01:13:54.550
So this network
also has a memory.
01:13:54.550 --> 01:13:58.990
So anytime you have lambda
less than one, the network
01:13:58.990 --> 01:14:01.060
just-- as soon as
the input goes away,
01:14:01.060 --> 01:14:03.190
the network activity
goes to zero,
01:14:03.190 --> 01:14:06.220
and it just completely forgets
that it ever had input.
01:14:06.220 --> 01:14:09.860
Whereas, as long as lambda is
equal to or greater than one,
01:14:09.860 --> 01:14:15.640
then this network remembers
that it had input.
01:14:15.640 --> 01:14:18.790
So if lambda is less than
one, then the network
01:14:18.790 --> 01:14:23.330
relaxes exponentially back to
zero after the input goes away.
01:14:23.330 --> 01:14:26.680
If you have lambda equal to
one, you have an integrator,
01:14:26.680 --> 01:14:29.350
and the network
activity persists
01:14:29.350 --> 01:14:31.550
after the input goes away.
01:14:31.550 --> 01:14:33.550
And if lambda is greater than one,
you get exponential growth, and
01:14:33.550 --> 01:14:36.400
the network activity
also persists
01:14:36.400 --> 01:14:37.700
after the input goes away.
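The three regimes can be compared with one forward-Euler sketch of a single linear mode, tau dc/dt = -(1 - lambda) c + h(t), driven by a unit pulse for 20 ms. The values lambda = 0.5, 1.0, 1.2, tau = 10 ms, and the timing are illustrative assumptions; `mode_after_pulse` is a hypothetical helper.

```python
def mode_after_pulse(lam, tau=10.0, dt=0.01, t_on=20.0, t_end=60.0):
    """Euler-integrate tau dc/dt = -(1 - lam) c + h(t) with a unit pulse on [0, t_on)."""
    c, n_on, n_end = 0.0, round(t_on / dt), round(t_end / dt)
    c_at_off = None
    for n in range(n_end):
        if n == n_on:
            c_at_off = c                  # activity at the moment the input is removed
        h = 1.0 if n < n_on else 0.0
        c += dt / tau * (-(1.0 - lam) * c + h)
    return c_at_off, c

for lam in (0.5, 1.0, 1.2):
    c_off, c_end = mode_after_pulse(lam)
    print(f"lambda={lam}: at input offset {c_off:.3f}, 40 ms later {c_end:.3f}")
```

For lambda below one the activity decays back toward zero, at lambda equal to one it holds steady, and above one it keeps growing.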
01:14:40.770 --> 01:14:45.020
And so that right there
is one of the best
01:14:45.020 --> 01:14:52.560
models for short-term
memory in the brain.
01:14:52.560 --> 01:14:58.440
The idea that you have
neurons that get input,
01:14:58.440 --> 01:15:02.700
become activated, and
then hold that memory
01:15:02.700 --> 01:15:08.340
by reactivating themselves and
holding their own activity high
01:15:08.340 --> 01:15:11.310
through recurrent excitation.
01:15:11.310 --> 01:15:14.430
But that excitation
has to be big enough
01:15:14.430 --> 01:15:17.520
to either just barely
maintain the activity
01:15:17.520 --> 01:15:22.240
or continue increasing
their activity.
01:15:22.240 --> 01:15:25.870
OK, now, that's not
necessarily such a great model
01:15:25.870 --> 01:15:26.710
for a memory, right?
01:15:26.710 --> 01:15:28.990
Because we can't have
neurons whose activity is
01:15:28.990 --> 01:15:31.750
exploding exponentially, right?
01:15:31.750 --> 01:15:32.980
So that's not so great.
01:15:32.980 --> 01:15:40.070
But it is quite commonly
thought that in neural networks
01:15:40.070 --> 01:15:43.400
involved in memory, the lambda
is actually greater than one.
01:15:43.400 --> 01:15:46.020
And how would we
rescue this situation?
01:15:46.020 --> 01:15:48.890
How would we save our
network from having neurons
01:15:48.890 --> 01:15:51.004
that blow up exponentially?
01:15:53.610 --> 01:15:59.430
Well, remember, this
was the solution
01:15:59.430 --> 01:16:02.880
for a network with
linear neurons.
01:16:02.880 --> 01:16:07.450
But neurons in the brain are
not really linear, are they?
01:16:07.450 --> 01:16:09.370
They have firing
rates that saturate.
01:16:09.370 --> 01:16:12.140
At higher inputs, firing
rates tend [AUDIO OUT].
01:16:12.140 --> 01:16:12.640
Why?
01:16:12.640 --> 01:16:14.995
Because sodium channels
become inactivated,
01:16:14.995 --> 01:16:18.850
and the neurons can't
respond that fast, right?
01:16:29.430 --> 01:16:31.650
All right, this
I've already said.
01:16:31.650 --> 01:16:37.760
So we use what are called
saturating non-linearities.
01:16:37.760 --> 01:16:41.000
So it's very common
to write down
01:16:41.000 --> 01:16:45.230
models in which we can still
have neurons that are--
01:16:45.230 --> 01:16:47.380
we can still have them
approximately linear.
01:16:47.380 --> 01:16:50.510
So it's quite common to
have neurons that are
01:16:50.510 --> 01:16:52.460
linear for small [INAUDIBLE].
01:16:52.460 --> 01:16:55.130
They can go plus and
minus, but they saturate
01:16:55.130 --> 01:16:57.170
on the plus side or the minus side.
01:16:57.170 --> 01:17:00.050
So now you can have
an input to a neuron
01:17:00.050 --> 01:17:03.230
that activates the neuron.
01:17:03.230 --> 01:17:08.670
You can see what happens is you
start activating this neuron.
01:17:08.670 --> 01:17:14.730
It keeps activating itself,
even as the input goes away.
01:17:14.730 --> 01:17:17.790
But now, what happens
is that activity
01:17:17.790 --> 01:17:20.490
starts getting up into the
regime where the neuron can't
01:17:20.490 --> 01:17:23.130
fire any faster.
01:17:23.130 --> 01:17:28.070
And so the activity becomes
stable at some high value
01:17:28.070 --> 01:17:29.400
of firing.
01:17:29.400 --> 01:17:31.040
Does that make sense?
01:17:31.040 --> 01:17:32.600
And this kind of
neuron, for example,
01:17:32.600 --> 01:17:38.330
can remember a plus input, or
it can remember a minus input.
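A sketch of this saturating memory, assuming a single neuron with an autapse, tau dv/dt = -v + tanh(w v + h(t)). The tanh nonlinearity (approximately linear near zero, saturating at plus and minus one) and w = 2 are assumptions; with a slope of one at the origin, the effective lambda in the linear regime is w = 2, greater than one. A brief positive or negative pulse leaves the neuron at a stable high or low rate, the nonzero solutions of v = tanh(2 v).

```python
import math

def autapse_memory(h_amp, w=2.0, tau=10.0, dt=0.01, t_on=10.0, t_end=200.0):
    """Euler-integrate tau dv/dt = -v + tanh(w v + h(t)); h is a pulse of size h_amp."""
    v, n_on = 0.0, round(t_on / dt)
    for n in range(round(t_end / dt)):
        h = h_amp if n < n_on else 0.0
        v += dt / tau * (-v + math.tanh(w * v + h))
    return v

v_plus, v_minus = autapse_memory(+1.0), autapse_memory(-1.0)
print(round(v_plus, 4), round(v_minus, 4))
```

Without any input, v stays at zero, an unstable fixed point; the pulse sign selects which saturated state the neuron remembers.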
01:17:41.415 --> 01:17:42.290
Does that make sense?
01:17:42.290 --> 01:17:46.050
So that's how we can
build a simple network
01:17:46.050 --> 01:17:52.950
with a neuron that can
remember its previous inputs
01:17:52.950 --> 01:17:56.700
with a lambda that's
greater than one.
01:17:56.700 --> 01:18:00.540
And this right here,
that basic thing,
01:18:00.540 --> 01:18:05.730
is one of the models for
how the hippocampus stores
01:18:05.730 --> 01:18:08.820
memories, that you have
hippocampal neurons that
01:18:08.820 --> 01:18:11.490
connect to each other
with a lot of recurrent
01:18:11.490 --> 01:18:13.500
connections [AUDIO OUT]
in the hippocampus
01:18:13.500 --> 01:18:15.960
has a lot of
recurrent connections.
01:18:15.960 --> 01:18:20.100
And the idea is that those
neurons activate each other,
01:18:20.100 --> 01:18:25.060
but then those neurons saturate
so they can't fire any faster,
01:18:25.060 --> 01:18:29.230
and now you can have a stable
memory of some prior input.
01:18:37.020 --> 01:18:38.870
And I think we
should stop there.
01:18:38.870 --> 01:18:42.080
But there are other
very interesting topics
01:18:42.080 --> 01:18:50.990
that we're going to get to
on how these kind of networks
01:18:50.990 --> 01:18:54.290
can also make
decisions and how they
01:18:54.290 --> 01:18:58.190
can store continuous memories--
not just discrete memories,
01:18:58.190 --> 01:19:00.740
plus or minus, on
or off, but can
01:19:00.740 --> 01:19:07.540
store a value for a long period
of time using this integrator.
01:19:07.540 --> 01:19:10.050
OK, so we'll stop there.